You have seen the letters GPT everywhere. But what do they actually mean, and why do they matter?
GPT stands for Generative Pre-trained Transformer. Each word tells you something real about how these models work.
Generative
The model generates new content. It does not retrieve pre-written answers from a database. It produces text from scratch, one small piece at a time (each piece, called a token, is roughly a word or part of a word), based on what it has learned.
This is different from a search engine, which finds existing pages. A generative model creates something new each time.
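To make "one piece at a time" concrete, here is a minimal sketch in Python. It is a toy, not how a real GPT model is implemented: the table NEXT_WORD_PROBS and its probabilities are invented for illustration, and the function always picks the most likely next word, whereas real models work on tokens and usually sample with some randomness.

```python
# Hypothetical lookup table: for each word, made-up probabilities
# for which word comes next. A real model computes these with a
# neural network instead of storing them in a table.
NEXT_WORD_PROBS = {
    "the": {"cat": 0.6, "dog": 0.4},
    "cat": {"sat": 0.7, "ran": 0.3},
    "dog": {"ran": 0.8, "sat": 0.2},
    "sat": {"down": 1.0},
    "ran": {"away": 1.0},
}


def generate(start, max_words=5):
    """Build a sentence one word at a time, starting from `start`."""
    words = [start]
    for _ in range(max_words - 1):
        options = NEXT_WORD_PROBS.get(words[-1])
        if not options:
            break  # the toy table has no continuation for this word
        # Greedy choice: take the most likely next word.
        next_word = max(options, key=options.get)
        words.append(next_word)
    return " ".join(words)


print(generate("the"))
```

Each pass through the loop looks only at the text produced so far and adds one more word. That loop, repeated, is the basic shape of text generation.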
Pre-trained
The model learned from a huge amount of text before you ever talked to it. This process is called pre-training.
During pre-training, the model read enormous amounts of text from the internet, books, and other sources. It learned patterns: which words tend to follow which, how arguments are structured, how code is formatted, how different topics connect.
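The simplest version of "which words tend to follow which" can be sketched by counting word pairs. This is only an illustration under strong assumptions: the function name count_followers and the tiny corpus are invented here, and real pre-training adjusts the weights of a neural network rather than keeping explicit counts.

```python
from collections import Counter, defaultdict


def count_followers(text):
    """For every word, count which words appear right after it."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    # Pair each word with the word that follows it.
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers


# A made-up, ten-word "training set".
corpus = "the cat sat on the mat and the cat slept"
followers = count_followers(corpus)

# The word most often seen after "the", with its count.
print(followers["the"].most_common(1))
```

Even this crude counter picks up a pattern from the text it has seen. Pre-training does something far richer, at a vastly larger scale.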
By the time you send your first message, all of that learning is already baked into the model. Your conversation just gives it context for what to generate next.
Transformer
This is the technical part. A Transformer is a specific architecture, a design for how the model is built, that was introduced by Google researchers in a 2017 paper called “Attention Is All You Need.”
Before Transformers, language models struggled with long text. Earlier designs, such as recurrent neural networks, read text one word at a time and had trouble remembering what was said at the start of a paragraph by the time they reached the end.
Transformers addressed this with a mechanism called attention, which lets the model weigh which parts of the text are most relevant when producing each next word. You will learn more about attention in a later lesson.
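As a small preview, the core calculation behind attention (called scaled dot-product attention in the 2017 paper) can be sketched in a few lines of Python. Treat this as a simplified illustration: the query, key, and value vectors below are made-up numbers, while a real Transformer learns how to produce them and runs many such calculations in parallel.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Weigh each position by how well its key matches the query."""
    scale = math.sqrt(len(query))
    # Relevance score per position: scaled dot product of query and key.
    scores = [
        sum(q * k for q, k in zip(query, key)) / scale
        for key in keys
    ]
    weights = softmax(scores)
    # Output: weighted average of the value vectors.
    dim = len(values[0])
    output = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(dim)
    ]
    return weights, output


# Three earlier positions in the text, each with an invented key and value.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
query = [1.0, 0.0]  # what the current position is "looking for"

weights, output = attention(query, keys, values)
print(weights)
print(output)
```

The weights say how much each earlier position contributes. Positions whose keys match the query get more weight, no matter how far back in the text they are.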
Why the name matters
Understanding GPT helps you understand today's whole family of AI language models. Claude, Gemini, LLaMA, Mistral: these are all variations on the same basic architecture. They are all large language models built on Transformers. GPT is just OpenAI’s name for its version.
When people talk about “AI” in the context of chatbots, they are almost always talking about Transformer-based language models.
GPT stands for Generative Pre-trained Transformer. Generative means the model creates new text. Pre-trained means it learned from a massive dataset before you ever used it. Transformer is the underlying model architecture that lets it handle language effectively. These three ideas together describe how most modern AI chatbots work.