Train Large Language Models Faster - Parallelism Deep Dive
Overview of this book
This course offers an in-depth exploration of parallelism in Large Language Model (LLM) training. Beginning with foundational IT concepts like cloud computing, GPUs, and network communication, the course introduces various parallelism techniques such as data parallelism, model parallelism, hybrid approaches, and pipeline parallelism, explaining their benefits and trade-offs. You’ll then apply these strategies in hands-on demos using real-world datasets like MNIST and WikiText.
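As a taste of the hands-on demos, here is a minimal sketch of data parallelism on MNIST using PyTorch's DistributedDataParallel; the tiny model, hyperparameters, and launch command are illustrative assumptions, not the course's exact code.

```python
# Minimal data-parallelism sketch with PyTorch DDP on MNIST.
# Model size, batch size, and epochs are illustrative assumptions.
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler
from torchvision import datasets, transforms

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE for each process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                          nn.Linear(256, 10)).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # replicas sync gradients

    dataset = datasets.MNIST("data", train=True, download=True,
                             transform=transforms.ToTensor())
    # DistributedSampler hands each rank a disjoint shard of the data.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=64, sampler=sampler)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    for epoch in range(2):
        sampler.set_epoch(epoch)  # reshuffle shards each epoch
        for images, labels in loader:
            images = images.cuda(local_rank)
            labels = labels.cuda(local_rank)
            optimizer.zero_grad()
            loss = loss_fn(model(images), labels)
            loss.backward()  # gradient all-reduce happens here
            optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with: torchrun --nproc_per_node=NUM_GPUS this_script.py
```

Each process trains an identical model replica on its own data shard, and gradients are averaged across GPUs during the backward pass, which is the essence of data parallelism covered in the course.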
As you progress, you’ll work on true parallelism with multiple GPUs through platforms like Runpod.io, and dive into essential topics such as fault tolerance, scalability, and checkpointing strategies. These lessons ensure your training systems are resilient and optimized for large-scale machine learning workflows. With insights into GPU architectures and advanced tools like DeepSpeed, you'll be equipped to handle the complexities of training massive models efficiently.
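For a flavor of the checkpointing material, here is a hedged sketch of periodic checkpointing with resume; the directory layout, file-naming scheme, and restore logic are assumptions for illustration, not the course's exact approach.

```python
# Sketch of periodic checkpointing for fault tolerance.
# The path and naming scheme below are illustrative assumptions.
import os
import torch

CKPT_DIR = "checkpoints"  # hypothetical checkpoint directory

def save_checkpoint(model, optimizer, step):
    """Write model + optimizer state so a crashed run can resume at `step`."""
    os.makedirs(CKPT_DIR, exist_ok=True)
    # Zero-padded step numbers keep lexicographic order == numeric order.
    path = os.path.join(CKPT_DIR, f"step_{step:08d}.pt")
    torch.save({"step": step,
                "model": model.state_dict(),
                "optimizer": optimizer.state_dict()}, path)

def load_latest_checkpoint(model, optimizer):
    """Restore the newest checkpoint; return the step to resume from."""
    files = sorted(os.listdir(CKPT_DIR)) if os.path.isdir(CKPT_DIR) else []
    if not files:
        return 0  # no checkpoint yet: start training from scratch
    state = torch.load(os.path.join(CKPT_DIR, files[-1]), map_location="cpu")
    model.load_state_dict(state["model"])
    optimizer.load_state_dict(state["optimizer"])
    return state["step"]
```

In a training loop you would call save_checkpoint every N steps and load_latest_checkpoint once at startup; saving the optimizer state alongside the model is what lets a preempted run resume exactly where it left off.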
Whether you're an AI researcher or a data scientist, this course provides the knowledge and practical experience needed to accelerate LLM training and build scalable, efficient AI systems. Through a combination of theoretical lessons and hands-on applications, you’ll master parallelism techniques and become proficient in building and optimizing high-performance LLM training pipelines.
Table of Contents (16 chapters)
Introduction
Strategies for Parallelizing LLMs - Deep Dive
Fundamental IT Concepts
GPU Architecture for LLM Training Deep Dive
Deep Learning and Machine Learning - Deep Dive
Large Language Models - Fundamentals of AI and LLMs
Parallel Computing Fundamentals & Parallelism in LLM Training
Types of Parallelism in LLM Training - Data, Model, and Hybrid Parallelism
Types of Parallelism - Pipeline and Tensor Parallelism
Tensor Parallelism - Deep Dive
HANDS-ON: Strategies for Parallelism - Data Parallelism Deep Dive
HANDS-ON: Data Parallelism with the WikiText Dataset & DeepSpeed Memory Optimization
Running TRUE Parallelism on Multiple GPU Systems - Runpod.io
Fault Tolerance and Scalability & Advanced Checkpointing Strategies - Deep Dive
Advanced Topics and Emerging Trends
Wrap up and Next Steps