Contents
Tap any chapter to start reading.
Chapter 1: Topic Models. From bag-of-words to LDA: discovering latent themes in corpora of tweets, earnings calls, and news headlines. Coherence, perplexity, choosing K, and reading topics as financial-market signals.
Chapter 2: Sentiment Analysis. Lexicon-based scoring (Loughran–McDonald, VADER), supervised classifiers, and transformer-based sentiment (FinBERT, RoBERTa). Building sentiment indices from Twitter and earnings calls; predicting returns and economic surprises.
Chapter 3: Large Language Models for Social Media. Tokenization, embeddings, attention: how LLMs read text. Using pretrained models for classification, semantic search, and zero-shot tagging of social-media content. Prompting patterns, RAG, and evaluating LLM outputs at scale.
How to read this book
Every Python code block in this book runs live in your browser. Click into any cell, edit it, press the ▶ Run button, and see the output. The Python engine downloads once on the first chapter — after that, everything is instant.
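Here is a taste of what those cells look like: a toy lexicon-based sentiment scorer in the spirit of Chapter 2. The six-word lexicon is purely illustrative (it is not Loughran–McDonald or VADER), but the cell is self-contained and runnable.

```python
import re

# Toy illustrative lexicon: +1 for "positive" tokens, -1 for "negative" ones.
LEXICON = {"beat": 1, "strong": 1, "growth": 1,
           "miss": -1, "weak": -1, "decline": -1}

def score(text: str) -> int:
    """Sum lexicon values over lowercase word tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return sum(LEXICON.get(tok, 0) for tok in tokens)

print(score("Strong growth, earnings beat estimates"))  # 3
print(score("Revenue miss amid weak demand"))           # -2
```

Try editing the lexicon or the sentences and re-running; the chapters build from exactly this kind of hands-on tinkering.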
- Chapter 1 (topic models) and Chapter 2 (sentiment) are independent — read in either order.
- Chapter 3 (LLMs) assumes you already understand how text is represented numerically; if you skip Chapter 1, at least skim the bag-of-words and embedding sections.
- The companion book Hands-On Large Language Models (Alammar & Grootendorst) is the reference for Chapter 3 — chapter numbers roughly match.
- Each chapter ends with a small case study: build a sentiment index, train a topic model on Reuters headlines, fine-tune a classifier on labeled tweets.