Chapter 1: Introduction to NLP

Chapter 2: Text Representation

What word embedding is

Simple encoding methods

What TF-IDF is

Shining applications of BoW and TF-IDF

Coding – BoW

Coding – Bag-of-N-grams

Coding – TF-IDF

Chapter 3: Text Wrangling and Preprocessing

Key steps in NLP preprocessing

Coding with spaCy

Coding with NLTK

Coding with Gensim

Building a pipeline with spaCy

Part 2: Latent Semantic Analysis/Latent Semantic Indexing

Chapter 4: Latent Semantic Analysis with scikit-learn

Understanding matrix operations

Understanding a transformation matrix

Understanding eigenvectors and eigenvalues

An introduction to SVD

Coding TruncatedSVD with scikit-learn

Using TruncatedSVD for LSI with real data

Chapter 5: Cosine Similarity

What is cosine similarity?

How cosine similarity is used in images

How to compute cosine similarity with scikit-learn

Chapter 6: Latent Semantic Indexing with Gensim

Performing text preprocessing

Performing word embedding with BoW and TF-IDF

Modeling with Gensim

Using the coherence score to find the optimal number of topics

Saving the model for production

Using the model as an information retrieval tool

Part 3: Word2Vec and Doc2Vec

Chapter 7: Using Word2Vec

Introduction to Word2Vec

Introduction to Skip-Gram (SG)

Introduction to CBOW

Using a pretrained model for semantic search

Adding and subtracting words/concepts

Visualizing Word2Vec with TensorBoard

Training your own Word2Vec model in CBOW and Skip-Gram

Visualizing your Word2Vec model with t-SNE

Comparing Word2Vec with Doc2Vec, GloVe, and fastText

Chapter 8: Doc2Vec with Gensim

The real-world applications of Doc2Vec

From Word2Vec to Doc2Vec

PV-DBOW

PV-DM

Doc2Vec modeling with Gensim

Putting the model into production

Tips on building a good Doc2Vec model

Part 4: Topic Modeling with Latent Dirichlet Allocation

Chapter 9: Understanding Discrete Distributions

The basics of discrete probability distributions

Bernoulli distributions

Binomial distributions

Multinomial distributions

Beta distributions

Dirichlet distributions

What is generative modeling?

Chapter 10: Latent Dirichlet Allocation

Understanding the idea behind LDA

Understanding the structure of LDA

Variational inference

Variational E-M

Variational E-M versus Gibbs sampling

Chapter 11: LDA Modeling

Experimenting with LDA modeling

Text preprocessing

Building LDA models with a different number of topics

Determining the optimal number of topics

Using the model to score new documents

Chapter 12: LDA Visualization

Data visualization with pyLDAvis

Designing an infographic

Chapter 13: The Ensemble LDA for Model Stability

From LDA to Ensemble LDA

The process of Ensemble LDA

Understanding DBSCAN and CBDBSCAN

Building an Ensemble LDA model with Gensim

Part 5: Comparison and Applications

Chapter 14: LDA and BERTopic

Understanding the Transformer model

Understanding BERT

Describing how BERTopic works

Building a BERTopic model

Reviewing the results of BERTopic

Visualizing the BERTopic model

Predicting new documents

Using the modular property of BERTopic

Comparing BERTopic with LDA

Chapter 15: Real-World Use Cases

Word2Vec for medical fraud detection

Comparing LDA/NMF/BERTopic on Twitter/X posts

Interpretable text classification from electronic health records

BERTopic for legal documents

Word2Vec for 10-K financial documents to the SEC