Introducing Davinci, Babbage, Curie, and Ada
The massive dataset that is used for training GPT-3 is the primary reason why it's so powerful. However, bigger is only better when it's necessary—and more power comes at a cost. For those reasons, OpenAI provides multiple models to choose from. Today there are four primary models available, along with a model for content filtering and instruct models.
The available models, or engines (as they're also called), are named Davinci, Babbage, Curie, and Ada. Of the four, Davinci is the largest and most capable, and it can perform any task that any other engine can perform. Curie is the next most capable engine, followed by Babbage, which can do anything that Ada can do. Ada is the least capable engine, but it is the fastest and lowest-cost.
When you're getting started, and when you first test a new prompt, you'll usually want to begin with Davinci and then try Ada, Babbage, or Curie to see if one of them can complete the task faster or more cost-effectively. The following is an overview of each engine and the types of tasks that might be best suited to each. However, keep in mind that you'll want to test each engine against your own prompts. Even though the smaller engines might not be trained with as much data, they are all still general-purpose models.
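Before the overview, the start-with-Davinci workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `pick_engine`, `complete`, and `is_acceptable` are hypothetical names, not part of any OpenAI library. `complete(engine, prompt)` stands in for whatever API call you use, and `is_acceptable(text)` is your own quality check.

```python
from typing import Callable, Optional

# Smaller engines in the order the text ranks them, least costly first.
SMALLER_ENGINES = ["ada", "babbage", "curie"]
BASELINE_ENGINE = "davinci"


def pick_engine(
    prompt: str,
    complete: Callable[[str, str], str],
    is_acceptable: Callable[[str], bool],
) -> Optional[str]:
    """Return the least costly engine whose completion passes the check.

    Both callables are hypothetical stand-ins supplied by the caller.
    """
    # Step 1: confirm the prompt works on the most capable engine.
    if not is_acceptable(complete(BASELINE_ENGINE, prompt)):
        return None  # the prompt itself probably needs more work

    # Step 2: try the smaller engines, cheapest first.
    for engine in SMALLER_ENGINES:
        if is_acceptable(complete(engine, prompt)):
            return engine

    # Step 3: nothing smaller was good enough, so stay with Davinci.
    return BASELINE_ENGINE


def fake_complete(engine: str, prompt: str) -> str:
    # Stand-in for a real API call: pretend only Curie and Davinci succeed.
    return "positive" if engine in ("curie", "davinci") else "???"


print(pick_engine("Classify: I love it", fake_complete, lambda t: t == "positive"))
# prints "curie"
```

Passing the completion function in as an argument keeps the sketch independent of any particular client library, so the same loop can be exercised with a fake like `fake_complete` before spending tokens on real requests.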
Davinci is the most capable model and can do anything that any other model can do, and much more—often with fewer instructions.
Davinci is able to solve logic problems, determine cause and effect, understand the intent of text, produce creative content, explain character motives, and handle complex summarization tasks.
Curie tries to balance power and speed. It can do anything that Babbage can do, but it's also capable of handling more complex classification tasks and more nuanced tasks such as summarization, sentiment analysis, chatbot applications, and question answering.
Babbage is a bit more capable than Ada but not quite as fast. It can perform all the same tasks as Ada, but it can also handle somewhat more involved classification tasks, and it's well suited for semantic search tasks that rank how well documents match a search query.
Ada is usually the fastest and least costly model. It's best for less nuanced tasks, for example, parsing text, reformatting text, and simpler classification tasks. The more context you provide to Ada, the better it will likely perform.
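The overview above can be condensed into a small lookup table. The following is a rough sketch, not an official mapping: the task labels are paraphrased from the descriptions above, and `ENGINE_HINTS` and `suggest_engine` are hypothetical names chosen for illustration. Treat the result as a starting point for testing, since the smaller engines are still general-purpose models.

```python
# Task hints paraphrased from the engine descriptions in the text.
# The labels are illustrative keys, not values defined by any API.
ENGINE_HINTS = {
    "ada": ["parsing text", "reformatting text", "simple classification"],
    "babbage": ["moderate classification", "semantic search"],
    "curie": [
        "complex classification",
        "summarization",
        "sentiment analysis",
        "chatbot",
        "question answering",
    ],
    "davinci": [
        "logic problems",
        "cause and effect",
        "intent",
        "creative content",
        "character motives",
        "complex summarization",
    ],
}

# Engines from least to most capable, as ranked in the text.
ENGINE_ORDER = ("ada", "babbage", "curie", "davinci")


def suggest_engine(task: str) -> str:
    """Return the smallest engine whose hint list contains the task."""
    wanted = task.strip().lower()
    for engine in ENGINE_ORDER:
        if wanted in ENGINE_HINTS[engine]:
            return engine
    # Unknown task: begin with the most capable engine, as the text advises.
    return "davinci"


print(suggest_engine("Semantic Search"))  # prints "babbage"
print(suggest_engine("summarization"))    # prints "curie"
```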
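Content filtering model
OpenAI also provides a content filtering model that is intended to flag potentially sensitive or unsafe text.
Instruct models
These are models that are built on top of the Davinci and Curie models. Instruct models are tuned to make it easier to tell the API what you want it to do. With clear instructions, they can often produce better results than the associated core model.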
A snapshot in time
A final note to keep in mind about all of the engines is that they are all a snapshot in time, meaning the data used to train them cuts off on the date the model was built. So, GPT-3 is not working with up-to-the-minute or even up-to-the-day data—it's likely weeks or months old. OpenAI is planning to add more continuous training in the future, but today this is a consideration to keep in mind.
All of the GPT-3 models are extremely powerful and capable of generating text that is often indistinguishable from human-written text. This holds tremendous potential for all kinds of applications. In most cases, that's a good thing. However, not all potential use cases are good.