Exploring Deepfakes

By: Bryan Lyon, Matt Tora

Overview of this book

Applying deepfakes will allow you to tackle a wide range of scenarios creatively. Learning from experienced authors will help you to intuitively understand what is going on inside the model. You’ll learn what deepfakes are and what makes them different from other machine learning techniques, and understand the entire process from beginning to end, from finding faces to preparing them, training the model, and performing the final swap. We’ll discuss various uses for face replacement before we begin building our own pipeline. Spending some extra time thinking about how you collect your input data can make a huge difference to the quality of the final video. We’ll look at the importance of this data and guide you through simple concepts for understanding what your data needs in order to be successful. No discussion of deepfakes can avoid the controversial, unethical uses for which the technology initially became known. We’ll go over some potential issues, and talk about the value that deepfakes can bring to a variety of educational and artistic use cases, from video game avatars to filmmaking. By the end of the book, you’ll understand what deepfakes are, how they work at a fundamental level, and how to apply those techniques to your own needs.
Table of Contents (15 chapters)

Part 1: Understanding Deepfakes
Part 2: Getting Hands-On with the Deepfake Process
Part 3: Where to Now?

Text-guided image generation

Text-guided image generation is an interesting category of generative AI. Researchers at OpenAI released a paper called Learning Transferable Visual Models From Natural Language Supervision (https://arxiv.org/abs/2103.00020), though I prefer the summary title they posted on their blog: CLIP: Connecting Text and Images. CLIP was mentioned in Chapter 8, Applying the Lessons of Deepfakes, but we’ll talk about it some more here.

CLIP

CLIP is actually a pair of neural network encoders: one is trained on images while the other is trained on text. So far, this isn’t very unusual. The real trick comes from how the two are linked. Essentially, both encoders are passed data from the same image; the image encoder gets the image, the text encoder gets the image’s description, and then the encodings they generate are compared to each other. This training methodology effectively trains two separate models to create the same output given matching inputs: an image and its description.
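To make that comparison concrete, here is a minimal sketch of the contrastive objective behind this training scheme, written in PyTorch. It illustrates the idea rather than reproducing OpenAI’s actual training code; the function name, the temperature value, and the assumption that row i of each batch comes from the same image are our own choices for the example.

```python
import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(image_emb: torch.Tensor,
                                text_emb: torch.Tensor,
                                temperature: float = 0.07) -> torch.Tensor:
    """Pull matched image/text embeddings together, push mismatched ones apart.

    image_emb, text_emb: (batch, dim) tensors where row i of each
    came from the same image (the image itself and its description).
    """
    # Project both encoders' outputs onto the unit sphere so the
    # dot products below are cosine similarities.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)

    # (batch, batch) similarity matrix: entry [i, j] compares image i
    # against caption j. Matched pairs sit on the diagonal.
    logits = image_emb @ text_emb.t() / temperature

    targets = torch.arange(logits.shape[0], device=logits.device)

    # Symmetric cross-entropy: each image should pick out its own
    # caption, and each caption should pick out its own image.
    loss_images = F.cross_entropy(logits, targets)
    loss_texts = F.cross_entropy(logits.t(), targets)
    return (loss_images + loss_texts) / 2
```

Minimizing this loss pushes the two encoders toward a shared embedding space: a trained CLIP model can then score how well any caption matches any image, which is what makes it so useful for guiding image generation.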