
Modern Generative AI with ChatGPT and OpenAI Models

By: Valentina Alto
4.9 (8)

Overview of this book

Generative AI models and AI language models are becoming increasingly popular due to their unparalleled capabilities. This book will provide you with insights into the inner workings of LLMs and guide you through creating your own language models. You’ll start with an introduction to the field of generative AI, helping you understand how these models are trained to generate new data. Next, you’ll explore use cases where ChatGPT can boost productivity and enhance creativity. You’ll learn how to get the best from your ChatGPT interactions by improving your prompt design and leveraging zero-, one-, and few-shot learning capabilities. The use cases are divided into clusters for marketers, researchers, and developers, which will help you apply what you learn in this book to your own challenges faster. You’ll also discover enterprise-level scenarios that leverage OpenAI models’ APIs available on the Azure infrastructure, covering both generative models such as GPT-3 and embedding models such as Ada. For each scenario, you’ll find an end-to-end implementation in Python, using Streamlit as the frontend and the LangChain SDK to facilitate the integration of models into your applications. By the end of this book, you’ll be well equipped to navigate the generative AI field and start using ChatGPT and OpenAI models’ APIs in your own projects.
Table of Contents (17 chapters)

Part 1: Fundamentals of Generative AI and GPT Models
Part 2: ChatGPT in Action
Part 3: OpenAI for Enterprises

The history and current status of research

In the previous sections, we had an overview of the most recent and cutting-edge technologies in the field of generative AI, all developed in recent years. However, research in this field can be traced back several decades.

We can mark the beginning of research in the field of generative AI in the 1960s, when Joseph Weizenbaum developed the chatbot ELIZA, one of the first examples of an NLP system. It was a simple rule-based interaction system aimed at entertaining users with responses based on text input, and it paved the way for further developments in both NLP and generative AI. However, we know that modern generative AI is a subfield of DL and, although the first Artificial Neural Networks (ANNs) were introduced in the 1940s, researchers faced several challenges, including limited computing power and a lack of understanding of the biological basis of the brain. As a result, ANNs did not gain much attention until the 1980s when, in addition to new hardware and neuroscience developments, the advent of the backpropagation algorithm facilitated the training phase of ANNs. Indeed, before backpropagation, training neural networks was difficult because there was no efficient way to calculate the gradient of the error with respect to the parameters, or weights, associated with each neuron. Backpropagation made it possible to automate the training process, and thus enabled the practical application of ANNs.
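To make this concrete, here is a minimal toy sketch (not from the book; all values are hypothetical) of what backpropagation computes for a single sigmoid neuron: the chain rule gives the gradient of a squared-error loss with respect to the weight and bias, and gradient descent uses those gradients to update both.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy setup: one input, one neuron y = sigmoid(w*x + b), target output 1.0
x, target = 1.5, 1.0
w, b, lr = 0.1, 0.0, 0.5   # initial weight, bias, learning rate

for _ in range(200):
    # Forward pass
    z = w * x + b
    y = sigmoid(z)
    # Backward pass: chain rule for the loss L = (y - target)^2
    dL_dy = 2.0 * (y - target)
    dy_dz = y * (1.0 - y)        # derivative of the sigmoid
    dL_dw = dL_dy * dy_dz * x    # gradient w.r.t. the weight
    dL_db = dL_dy * dy_dz        # gradient w.r.t. the bias
    # Gradient-descent update
    w -= lr * dL_dw
    b -= lr * dL_db

# After training, the neuron's output approaches the target
y_final = sigmoid(w * x + b)
```

With many neurons arranged in layers, backpropagation applies exactly this chain rule layer by layer, which is what made training deep networks tractable.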

Then, by the 2000s and 2010s, the advancement in computational capabilities, together with the huge amount of available data for training, yielded the possibility of making DL more practical and available to the general public, with a consequent boost in research.

In 2013, Kingma and Welling introduced a new model architecture, called Variational Autoencoders (VAEs), in their paper Auto-Encoding Variational Bayes. VAEs are generative models based on the concept of variational inference. They provide a way of learning a compact representation of data by encoding it into a lower-dimensional space called the latent space (with the encoder component) and then decoding it back into the original data space (with the decoder component).

The key innovation of VAEs is the introduction of a probabilistic interpretation of the latent space. Instead of learning a deterministic mapping of the input to the latent space, the encoder maps the input to a probability distribution over the latent space. This allows VAEs to generate new samples by sampling from the latent space and decoding the samples into the input space.

For example, let’s say we want to train a VAE that can create new pictures of cats and dogs that look like they could be real.

To do this, the VAE first takes in a picture of a cat or a dog and compresses it into a smaller set of numbers in the latent space, which represent the most important features of the picture. These numbers are called latent variables.

Then, the VAE takes these latent variables and uses them to create a new picture that looks like it could be a real cat or dog picture. This new picture may have some differences from the original pictures, but it should still look like it belongs in the same group of pictures.

The VAE gets better at creating realistic pictures over time by comparing its generated pictures to the real pictures and adjusting its latent variables to make the generated pictures look more like the real ones.
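The encode-sample-decode cycle described above can be sketched in a few lines. This is a toy illustration (not from the book): the weights are random stand-ins for a trained encoder and decoder, and the dimensions are hypothetical, but the flow mirrors a real VAE, including the probabilistic latent space sampled via the reparameterization trick.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions: 4-value "images", 2-dimensional latent space
input_dim, latent_dim = 4, 2

# Random weights stand in for a trained encoder and decoder
W_mu = rng.normal(size=(input_dim, latent_dim))
W_logvar = rng.normal(size=(input_dim, latent_dim))
W_dec = rng.normal(size=(latent_dim, input_dim))

def encode(x):
    # The encoder maps the input to a *distribution* over the latent space,
    # parameterized by a mean and a log-variance per latent dimension
    return x @ W_mu, x @ W_logvar

def sample(mu, logvar):
    # Reparameterization trick: z = mu + sigma * eps, with eps ~ N(0, 1)
    eps = rng.normal(size=mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z):
    # The decoder maps a latent sample back to the input space
    return z @ W_dec

x = rng.normal(size=(1, input_dim))        # a toy "picture"
mu, logvar = encode(x)                     # latent distribution parameters
x_new = decode(sample(mu, logvar))         # a new sample in the input space
```

Sampling different values of `eps` and decoding them is exactly how a trained VAE produces new pictures that resemble, but differ from, the training data.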

VAEs paved the way toward fast development within the field of generative AI. In fact, only one year later, Generative Adversarial Networks (GANs) were introduced by Ian Goodfellow. Unlike the VAE architecture, whose main elements are the encoder and the decoder, GANs consist of two Neural Networks – a generator and a discriminator – which work against each other in a zero-sum game.

The generator creates fake data (in the case of images, it creates a new image) that is meant to look like real data (for example, an image of a cat). The discriminator takes in both real and fake data, and tries to distinguish between them – it’s the critic in our art forger example.

During training, the generator tries to create data that can fool the discriminator into thinking it’s real, while the discriminator tries to become better at distinguishing between real and fake data. The two parts are trained together in a process called adversarial training.

Over time, the generator gets better at creating fake data that looks like real data, while the discriminator gets better at distinguishing between real and fake data. Eventually, the generator becomes so good at creating fake data that even the discriminator can’t tell the difference between real and fake data.
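The zero-sum game described above can be sketched numerically. This is a toy illustration (not from the book): the "data" are one-dimensional numbers, the generator is a learnable shift of random noise, and the discriminator is a simple logistic critic; all names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy "real" data: scalars clustered around 4.0
real = rng.normal(loc=4.0, scale=0.5, size=8)

def generator(noise, theta):
    # The generator turns random noise into fake data by shifting it by theta
    return noise + theta

def discriminator(x, w, b):
    # The discriminator returns the probability that x is real
    return sigmoid(w * x + b)

theta = 0.0        # generator parameter (untrained, so fakes sit near 0)
w, b = 1.0, -2.0   # discriminator parameters

fake = generator(rng.normal(size=8), theta)

# Discriminator loss: binary cross-entropy, classify real as 1 and fake as 0
d_loss = -np.mean(np.log(discriminator(real, w, b)) +
                  np.log(1.0 - discriminator(fake, w, b)))

# Generator loss: make the discriminator score the fakes as real
g_loss = -np.mean(np.log(discriminator(fake, w, b)))
```

In adversarial training, gradient steps alternately decrease `d_loss` (a sharper critic) and `g_loss` (more convincing fakes), which is the back-and-forth that eventually produces fakes the discriminator cannot tell apart from real data.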

Here is an example of human faces entirely generated by a GAN:

Figure 1.8 – Examples of photorealistic GAN-generated faces (taken from Progressive Growing of GANs for Improved Quality, Stability, and Variation, 2017: https://arxiv.org/pdf/1710.10196.pdf)

Both models – VAEs and GANs – are meant to generate brand-new data that is indistinguishable from original samples, and their architectures have improved since their conception, side by side with the development of new models such as PixelCNN, proposed by Van den Oord and his team, and WaveNet, developed by Google DeepMind, leading to advances in audio and speech generation.

Another great milestone was achieved in 2017, when a new architecture, called the Transformer, was introduced by Google researchers in the paper Attention Is All You Need. It was revolutionary in the field of language generation since it allowed for parallel processing while retaining memory of the context of language, outperforming previous language models founded on Recurrent Neural Networks (RNNs) or Long Short-Term Memory (LSTM) frameworks.

Transformers were indeed the foundation for massive language models such as Bidirectional Encoder Representations from Transformers (BERT), introduced by Google in 2018, which soon became the baseline in NLP experiments.

Transformers are also the foundation of all the Generative Pre-trained Transformer (GPT) models introduced by OpenAI, including GPT-3, the model behind ChatGPT.

Although there was a significant amount of research and achievements in those years, it was not until the second half of 2022 that the general attention of the public shifted toward the field of generative AI.

Not by chance, 2022 has been dubbed the year of generative AI. This was the year when powerful AI models and tools became widespread among the general public: diffusion-based image services (MidJourney, DALL-E 2, and Stable Diffusion), OpenAI’s ChatGPT, text-to-video (Make-a-Video and Imagen Video), and text-to-3D (DreamFusion, Magic3D, and Get3D) tools were all made available to individual users, sometimes also for free.

This had a disruptive impact for two main reasons:

  • Once generative AI models became widely available to the public, every individual user or organization could experiment with them and appreciate their potential, even without being a data scientist or ML engineer.
  • The output of those new models and their embedded creativity were objectively stunning, and often concerning, raising an urgent call for adaptation by both individuals and governments.

Hence, in the very near future, we will probably witness a spike in the adoption of AI systems for both individual usage and enterprise-level projects.