Transformers for Natural Language Processing - Second Edition

By: Denis Rothman

Overview of this book

Transformers are...well...transforming the world of AI. There are many platforms and models out there, but which ones best suit your needs? Transformers for Natural Language Processing, 2nd Edition, guides you through the world of transformers, highlighting the strengths of different models and platforms, while teaching you the problem-solving skills you need to tackle model weaknesses. You'll use Hugging Face to pretrain a RoBERTa model from scratch, from building the dataset to defining the data collator to training the model. If you're looking to fine-tune a pretrained model, including GPT-3, then Transformers for Natural Language Processing, 2nd Edition, shows you how with step-by-step guides. The book investigates machine translations, speech-to-text, text-to-speech, question-answering, and many more NLP tasks. It provides techniques to solve hard language problems and may even help with fake news anxiety (read chapter 13 for more details). You'll see how cutting-edge platforms, such as OpenAI, have taken transformers beyond language into computer vision tasks and code creation using DALL-E 2, ChatGPT, and GPT-4. By the end of this book, you'll know how transformers work and how to implement them and resolve issues like an AI detective.

Designing a universal text-to-text model

Google’s NLP technical revolution started in 2017 with Vaswani et al. (2017), the original Transformer. Attention Is All You Need toppled 30+ years of artificial intelligence belief in RNNs and CNNs applied to NLP tasks. It took us from the stone age of NLP/NLU to the 21st century in a long-overdue evolution.

Chapter 7, The Rise of Suprahuman Transformers with GPT-3 Engines, summed up a second revolution that boiled up and erupted between Google’s Vaswani et al. (2017) original Transformer and OpenAI’s Brown et al. (2020) GPT-3 transformers. The original Transformer was focused on performance to prove that attention was all we needed for NLP/NLU tasks.

OpenAI’s second revolution, through GPT-3, focused on taking transformer models from fine-tuned pretrained models to few-shot trained models that required no fine-tuning. The second revolution was to show that a machine can learn a language and apply it to...