How LLMs Work

What Is a Large Language Model?

You keep hearing the term “large language model,” often shortened to LLM. ChatGPT runs on one. So do Claude from Anthropic, Gemini from Google, and many other AI chatbots you encounter today. But what actually is an LLM?

The easiest way to separate the terms is this: ChatGPT, Claude, and Gemini are products you can talk to. An LLM is the model doing the language work underneath.

Think of it like a very experienced reader

Imagine someone who has read hundreds of millions of documents: novels, textbooks, articles, forum posts, code, recipes, legal contracts, transcripts, you name it.

After all that reading, they have developed an intuition for how language works. They can finish a sentence in a way that sounds natural. They can explain a concept, translate between styles, summarise an argument, or write in a specific tone.

They did not memorise everything word-for-word. They absorbed patterns. That is roughly what an LLM does.

What “large” means

The word “large” refers to the scale of two things:

  1. The training data. LLMs learn from enormous amounts of text. GPT-4 is estimated to have been trained on trillions of words.
  2. The model itself. LLMs have billions of parameters, which are the internal numbers the model adjusts during learning. Having more parameters generally means the model can capture more complex patterns.
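To make "parameters" more concrete, here is a rough sketch of how quickly they add up. It assumes a plain fully connected layer, where every input connects to every output through one weight and each output has one bias. The layer width and layer count are illustrative only, not the configuration of any real model, and the function name is hypothetical.

```python
def dense_layer_params(n_in: int, n_out: int) -> int:
    """Count parameters in a fully connected layer:
    one weight per input-output pair, plus one bias per output."""
    weights = n_in * n_out
    biases = n_out
    return weights + biases


# Illustrative sizes only (hypothetical, not a real model's configuration).
width = 4096
one_layer = dense_layer_params(width, width)
print(f"One {width}x{width} layer: {one_layer:,} parameters")

# Stacking many such layers quickly reaches the billions.
layers = 100
print(f"{layers} such layers: {layers * one_layer:,} parameters")
```

A single layer of this size already holds about 16.8 million adjustable numbers, and a hundred of them come to roughly 1.7 billion.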

Large does not mean better in every way. Bigger models cost more to run and are slower. But at the scales used by modern LLMs, larger models tend to be more capable.
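One reason bigger models cost more to run is that every parameter has to sit in memory while the model works. A back-of-the-envelope sketch, assuming each parameter is stored as a 16-bit number (2 bytes) and ignoring all other memory a running model needs; real deployments may use more or fewer bytes per parameter. The model sizes are illustrative, and the function name is hypothetical.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed just to hold the weights, in gigabytes
    (1 GB = 10**9 bytes). Defaults to 16-bit (2-byte) parameters."""
    return num_params * bytes_per_param / 1e9


# Illustrative model sizes, in billions of parameters.
for billions in (7, 70, 400):
    gb = weight_memory_gb(billions * 1e9)
    print(f"{billions}B parameters -> about {gb:,.0f} GB of weights")
```

The memory needed grows in direct proportion to the parameter count, which is why a model ten times larger needs roughly ten times the hardware just to load.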

What “language model” means

A language model is a system trained to understand and generate text. Specifically, it learns to predict what comes next given what has come before.
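The idea of "predicting what comes next" can be sketched in its simplest possible form by counting. The toy below uses a word-pair (bigram) count model, a far simpler technique than an LLM and not how LLMs are built: it records which word follows which in a tiny made-up corpus, then predicts the follower it saw most often. The corpus and function names are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict


def train_bigram(text: str) -> dict[str, Counter]:
    """For every word, count which words follow it in the text."""
    words = text.lower().split()
    follows: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1
    return follows


def predict_next(follows: dict[str, Counter], word: str) -> str | None:
    """Return the word seen most often after `word`, or None if never seen."""
    counts = follows.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


corpus = "the cat sat on the mat . the dog sat on the rug . the cat sat on the mat ."
model = train_bigram(corpus)
print(predict_next(model, "sat"))    # on
print(predict_next(model, "on"))     # the
print(predict_next(model, "zebra"))  # None
```

An LLM replaces the counting table with a neural network that takes far more of the preceding text into account, but the job is the same: given the text so far, predict what comes next.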

During training, the model sees a sentence with the last word hidden, tries to predict it, checks how close it came, and adjusts its parameters to do better next time. Repeat that billions of times across trillions of words and you get a system that is surprisingly good at language.

How an LLM learns

  1. See text with a word hidden: “The cat sat on the ___”
  2. Predict the missing word
  3. Check against the real word
  4. Adjust and repeat, billions of times
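The four steps above can be sketched as a tiny training loop. This is a toy under stated assumptions: the "model" is just a table of scores, one for each pair of previous word and candidate next word, rather than a neural network, and it only looks one word back. What it shares with real LLM training is the loop itself: predict a probability for every possible next word, measure the error with cross-entropy loss (how little probability went to the real word), and nudge the parameters with a gradient-based update to reduce that error. The corpus, learning rate, and names are hypothetical.

```python
import math
import random


def softmax(scores: list[float]) -> list[float]:
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


corpus = "the cat sat on the mat . the dog sat on the rug .".split()
vocab = sorted(set(corpus))
index = {w: i for i, w in enumerate(vocab)}

# Parameters: one score per (previous word, next word) pair, all starting at 0.
params = {w: [0.0] * len(vocab) for w in vocab}

# 1. Training examples: the previous word, and the "hidden" word that really came next.
examples = list(zip(corpus, corpus[1:]))

learning_rate = 0.5
random.seed(0)
losses = []
for epoch in range(200):
    random.shuffle(examples)
    total_loss = 0.0
    for prev, hidden in examples:
        probs = softmax(params[prev])            # 2. predict
        target = index[hidden]
        total_loss += -math.log(probs[target])   # 3. check: low probability on the real word = high loss
        for j in range(len(vocab)):              # 4. adjust every score against the gradient
            gradient = probs[j] - (1.0 if j == target else 0.0)
            params[prev][j] -= learning_rate * gradient
    losses.append(total_loss / len(examples))


def predict(prev: str) -> tuple[str, float]:
    """Most likely next word after `prev`, with its probability."""
    probs = softmax(params[prev])
    best = max(range(len(vocab)), key=lambda j: probs[j])
    return vocab[best], probs[best]


print(f"average loss: first pass {losses[0]:.3f}, last pass {losses[-1]:.3f}")
word, p = predict("sat")
print(f"after 'sat' the model predicts {word!r} with probability {p:.2f}")
```

Run it and the average loss falls across passes. After "sat" the model ends up putting almost all its probability on "on", while after "the" it spreads probability across the four different words that followed "the" in the corpus, because the data genuinely supports all of them.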

LLMs versus traditional software

Traditional software follows rules. A calculator does exactly what it was programmed to do. If no one wrote a rule for something, the program cannot handle it.

LLMs work differently. They do not have explicit rules for every situation. Instead, they have patterns absorbed from training. That is why they can handle questions no one specifically programmed an answer for. It is also why they sometimes get things wrong in unexpected ways.
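A toy illustration of the rule-based side of this contrast: a hand-written calculator (hypothetical code, not any real product) answers exactly the inputs its rules cover and rejects everything else, even a request a person would consider identical.

```python
import operator

# Hypothetical rule table: every supported operation must be written out explicitly.
RULES = {"+": operator.add, "-": operator.sub, "*": operator.mul, "/": operator.truediv}


def calculate(expression: str) -> float:
    """Evaluate '<number> <operator> <number>', e.g. '2 + 3'.
    Any input outside these rules is rejected."""
    parts = expression.split()
    if len(parts) != 3 or parts[1] not in RULES:
        raise ValueError(f"no rule for: {expression!r}")
    left, op, right = parts
    return RULES[op](float(left), float(right))


print(calculate("2 + 3"))  # 5.0
try:
    calculate("two plus three")
except ValueError as err:
    print(err)  # no rule for: 'two plus three'
```

An LLM has no such table. It would attempt "two plus three" from patterns absorbed in training, which is both the flexibility and the occasional unpredictability described above.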

What LLMs are good at

  • Following instructions in natural language
  • Explaining, summarising, and rewriting text
  • Writing code and spotting bugs
  • Translating between languages
  • Answering questions on topics covered in their training data

What LLMs struggle with

  • Facts that require real-time information
  • Precise arithmetic and counting
  • Reliable reasoning through complex chains of logic
  • Knowing when they do not know something

A large language model is a system trained on enormous amounts of text to predict and generate language. It learns patterns rather than rules, which makes it flexible but not always reliable. Most AI chatbots you use today are built on LLMs.