Contents

Tap any chapter to start reading.

Chapter 1 Topic Models

From bag-of-words to LDA: discovering latent themes in corpora of tweets, earnings calls, and news headlines. Coherence, perplexity, choosing K, and reading topics as financial-market signals.

Chapter 2 Sentiment Analysis

Lexicon-based scoring (Loughran–McDonald, VADER), supervised classifiers, and transformer-based sentiment (FinBERT, RoBERTa). Building sentiment indices from Twitter and earnings calls; predicting returns and economic surprises.

Chapter 3 Large Language Models for Social Media

Tokenization, embeddings, attention — how LLMs read text. Using pretrained models for classification, semantic search, and zero-shot tagging of social-media content. Prompting patterns, RAG, and evaluating LLM outputs at scale.


How to read this book

Every Python code block in this book runs live in your browser. Click into any cell, edit it, press the ▶ Run button, and see the output. The Python engine downloads once, when you open your first chapter; after that, cells run without another download.

Tips for self-study
  • Chapter 1 (topic models) and Chapter 2 (sentiment) are independent — read in either order.
  • Chapter 3 (LLMs) assumes you already understand how text is represented numerically; if you skip Chapter 1, at least skim the bag-of-words and embedding sections.
  • The companion book Hands-On Large Language Models (Alammar & Grootendorst) is the reference for Chapter 3 — chapter numbers roughly match.
  • Each chapter ends with a small case study, in chapter order: train a topic model on Reuters headlines, build a sentiment index, fine-tune a classifier on labeled tweets.

Prof. Xuhu Wan · HKUST ISOM 5640 · Introduction to Text Analytics for News and Social Media