Book Image

Generative Adversarial Networks Cookbook

By : Josh Kalin
Book Image

Generative Adversarial Networks Cookbook

By: Josh Kalin

Overview of this book

Developing Generative Adversarial Networks (GANs) is a complex task, and it is often hard to find code that is easy to understand. This book leads you through eight different examples of modern GAN implementations, including CycleGAN, simGAN, DCGAN, and 2D image to 3D model generation. Each chapter contains useful recipes to build on a common architecture in Python, TensorFlow and Keras to explore increasingly difficult GAN architectures in an easy-to-read format. The book starts by covering the different types of GAN architecture to help you understand how the model works. This book also contains intuitive recipes to help you work with use cases involving DCGAN, Pix2Pix, and so on. To understand these complex applications, you will take different real-world data sets and put them to use. By the end of this book, you will be equipped to deal with the challenges and issues that you may face while working with GAN models, thanks to easy-to-follow code solutions that you can implement right away.
Table of Contents (17 chapters)
Title Page
Copyright and Credits
About Packt

What does a GAN output?

So, we've seen the different structures and types of GANs. We know that GANs can be used for a variety of tasks. But, what does a GAN actually output? Similar to the structure of a neural network (deep or otherwise), we can expect that the GAN will be able to output any value that a neural network can produce. This can take the form of a value, an image, or many other types of variables. Nowadays, we usually use the GAN architecture to apply and modify images.

How to do it...

Let's take a few examples to explore the power of GANs. One of the great parts about this section is that you will be able to implement every one of these architectures by the end of this book. Here are the topics we'll cover in the next section:

  • Working with limited data – style transfer
  • Dreaming new scenes – DCGAN
  • Enhancing simulated data – SimGAN

How it works...

There are three core sections we want to discuss here that involve typical applications of GANs: style transfer, DCGAN, and enhancing simulated data.

Working with limited data – style transfer

Have you ever seen a neural network that was able to easily convert a photo into a famous painter's style, such as Monet? GAN architecture is often employed for this type of network, called style transfer, and we'll learn how to do style transfer in one of our recipes in this book. This represents one of the simplest applications of  generative adversarial network architecture that we can apply quickly. A simple example of the power of this particular architecture is shown here: 

Image A represents in the input and Image B represents the style transferred image. The <style> has been applied to this input image.

One of the unique things about these agents is that they require fewer examples than the typical deep learning techniques you may be familiar with. With famous painters, there aren't that many training examples for each of their styles, which produces a very limited dataset and it took more advanced techniques in the past to replicate their painting styles. Today, this technique will allow all of us to find our inner Monet.

Dreaming new scenes – DCGAN

We talked about the network dreaming a new scene. Here's another powerful example of the GAN architecture. The Deep Convolution Generative Adversarial Network (DCGAN) architecture allows a neural network to operate in the opposite direction of a typical classifier. An input phrase goes into the network and produces an image output. The network that produces output images is attempting to beat a discriminator based on a classic CNN architecture.

Once the generator gets past a certain point, the discriminator stops training ( and the following image shows how we go from an input to an output image with the DCGAN architecture:

Image A represents in the input and Image B represents the style transferred image; the input image now represents the conversion of the input to the new output space

Ultimately, the DCGAN takes in a set of random numbers (or numbers derived from a word, for instance) and produces an image. DCGANs are fun to play with because they learn relationships between an input and their corresponding label. If we attempted to use a word the model has never seen, it'll still produce an output image. I wonder what types of image the model will give us for words it has never seen.

Enhancing simulated data – simGAN

Apple recently released the simGAN paper focused on making simulated images look real-how? They used a particular GAN architecture, called simGAN, to improve images of eyeballs. Why is this problem interesting? Imagine realistic hands with no models needed. It provides a whole new avenue and revenue stream for many companies once these techniques can be replicated in real life. Using the simGAN architecture, you'll notice that the actual network architectures aren't that complicated:

A simple example of the simGAN architecture. The architecture and implementation will be discussed at length 

The real secret sauce is in the loss function that the Apple developers used to train the networks. A loss function is how the GAN is able to know when to stop training the GAN. Here’s the powerful piece to this architecture: labeled real data can be expensive to produce or generate. In terms of time and cost, simulated data with perfect labels is easy to produce and the trade space is controllable.