So, we've seen the different structures and types of GANs. We know that GANs can be used for a variety of tasks. But what does a GAN actually output? Because a GAN is built from neural networks (deep or otherwise), it can output anything that a neural network can produce: a single value, an image, or many other types of variables. Nowadays, we usually use the GAN architecture to generate and modify images.
Let's take a few examples to explore the power of GANs. One of the great parts about this section is that you will be able to implement every one of these architectures by the end of this book. Here are the topics we'll cover in the next section:
- Working with limited data – style transfer
- Dreaming new scenes – DCGAN
- Enhancing simulated data – SimGAN
There are three core sections we want to discuss here that involve typical applications of GANs: style transfer, DCGAN, and enhancing simulated data.
Have you ever seen a neural network that was able to easily convert a photo into a famous painter's style, such as Monet? GAN architecture is often employed for this type of network, called style transfer, and we'll learn how to do style transfer in one of our recipes in this book. This represents one of the simplest applications of generative adversarial network architecture that we can apply quickly. A simple example of the power of this particular architecture is shown here:
Image A represents the input and Image B represents the style-transferred image. The <style> has been applied to this input image.
One of the unique things about these networks is that they require fewer examples than the typical deep learning techniques you may be familiar with. Famous painters left relatively few works in each of their styles, so the training dataset is very limited, and in the past it took more advanced techniques to replicate their painting styles. Today, this technique will allow all of us to find our inner Monet.
We talked about the network dreaming a new scene. Here's another powerful example of the GAN architecture. The Deep Convolutional Generative Adversarial Network (DCGAN) architecture allows a neural network to operate in the opposite direction of a typical classifier: instead of taking in an image and producing a label, it takes in an input, such as a phrase, and produces an image. The network that produces the output images, the generator, is attempting to beat a discriminator based on a classic CNN architecture.
Once the generator gets past a certain point, the discriminator stops training (https://www.slideshare.net/enakai/dcgan-how-does-it-work). The following image shows how we go from an input to an output image with the DCGAN architecture:
Image A represents the input and Image B represents the style-transferred image; the input image now represents the conversion of the input to the new output space.
Ultimately, the DCGAN takes in a set of random numbers (or numbers derived from a word, for instance) and produces an image. DCGANs are fun to play with because they learn relationships between inputs and their corresponding labels. If we attempted to use a word the model has never seen, it would still produce an output image. I wonder what types of images the model will give us for words it has never seen.
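To make the "random numbers in, image out" idea concrete, here is a minimal sketch of the data flow through a generator, written with NumPy only. It is not a DCGAN implementation: the names `toy_generator`, `upsample2x`, and `params`, the 100-value input, and the 28x28 output size are all assumptions chosen for illustration, the weights are random rather than trained, and a real DCGAN would learn its upsampling with transposed convolutions instead of the fixed nearest-neighbour step used here.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour upsampling of an (H, W, C) array to (2H, 2W, C)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def toy_generator(z, params):
    """Map a vector of random numbers z to a 28x28x1 image in [-1, 1]."""
    # Project the input vector to a small 7x7 feature map with 8 channels (ReLU).
    h = np.maximum(z @ params["w_proj"], 0.0).reshape(7, 7, 8)
    # Grow the map 7x7 -> 14x14 -> 28x28. This fixed upsampling stands in for
    # the learned transposed convolutions a real DCGAN would use.
    h = upsample2x(upsample2x(h))
    # Mix the 8 channels down to 1 and squash the values with tanh.
    return np.tanh(h @ params["w_out"])

rng = np.random.default_rng(0)
latent_dim = 100  # hypothetical size of the random input vector

# Untrained, randomly initialised weights (illustration only).
params = {
    "w_proj": rng.normal(0.0, 0.02, size=(latent_dim, 7 * 7 * 8)),
    "w_out": rng.normal(0.0, 0.02, size=(8, 1)),
}

z = rng.normal(size=latent_dim)   # the "set of random numbers"
image = toy_generator(z, params)
print(image.shape)                # (28, 28, 1)
```

With untrained weights the output is just noise. The point of the sketch is only the shape of the computation: every input vector, including one the model has never seen, is mapped to some image.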
Apple recently released the SimGAN paper, which focuses on making simulated images look real. How? They used a particular GAN architecture, called SimGAN, to improve images of eyeballs. Why is this problem interesting? Imagine realistic hands with no models needed. It provides a whole new avenue and revenue stream for many companies once these techniques can be replicated in real life. Using the SimGAN architecture, you'll notice that the actual network architectures aren't that complicated:
A simple example of the SimGAN architecture. The architecture and implementation will be discussed at length.
The real secret sauce is in the loss function that the Apple developers used to train the networks. A loss function measures how far a network's output is from what we want, and it is the signal that drives each training update of the GAN. Here's the powerful piece of this architecture: labeled real data can be expensive to produce or generate. In terms of time and cost, simulated data with perfect labels is easy to produce, and the trade space is controllable.
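As a rough illustration of what such a loss can look like, the sketch below combines two terms in the spirit of the SimGAN paper: an adversarial term that rewards refined images the discriminator believes are real, and a self-regularization term that keeps each refined image close to its simulated input so the original labels stay valid. The function name `refiner_loss`, the weight `lam=0.1`, the use of a mean rather than a sum, and the stand-in data are all assumptions for illustration, not the implementation used by the paper's authors.

```python
import numpy as np

def refiner_loss(disc_prob_real, refined, synthetic, lam=0.1):
    """Adversarial term plus L1 self-regularization, returned as a scalar."""
    eps = 1e-8  # avoids log(0)
    # Adversarial term: small when the discriminator assigns a high
    # probability of "real" to the refined images.
    adversarial = -np.mean(np.log(disc_prob_real + eps))
    # Self-regularization: mean absolute difference between the refined
    # images and the simulated images they were made from.
    self_reg = np.mean(np.abs(refined - synthetic))
    return adversarial + lam * self_reg

rng = np.random.default_rng(0)
synthetic = rng.uniform(size=(4, 8, 8))   # stand-in batch of simulated images
disc_prob_real = np.full(4, 0.7)          # hypothetical discriminator outputs

small_edit = synthetic + 0.01             # refined images that barely change
large_edit = synthetic + 0.5              # refined images that change a lot

loss_small = refiner_loss(disc_prob_real, small_edit, synthetic)
loss_large = refiner_loss(disc_prob_real, large_edit, synthetic)
print(loss_small < loss_large)            # True
```

With the discriminator's opinion held fixed, the refinement that drifts further from the simulated input is penalized more, which is what lets the perfect labels of the simulated data carry over to the refined images.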