
How Do Chatbots Generate Answers?

When you send a message to ChatGPT, an answer appears in seconds. But what is actually happening between the moment you hit send and the moment words start appearing on screen?

It is not a lookup

The model does not search a database of pre-written answers and return the best match. It does not find and copy text from the web (unless a browsing tool is attached). It generates text from scratch, one piece at a time.

Think of it like autocomplete, taken very far

Your phone’s keyboard suggests the next word as you type. It is not thinking about what you want to say. It is just predicting which word tends to follow, based on patterns in text it has seen before.

A language model does the same thing, but at a much larger scale and with much more context. It looks at everything you have written, everything in the conversation so far, and asks: given all of this, what comes next?

Then it picks a token, which is a word or a piece of a word. Then it picks the next one. Then the next. That is how the response is built: one token at a time, from left to right.
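
The loop described above can be sketched with a toy model. This is a minimal illustration, not how a real chatbot is implemented: the tiny corpus, the function names, and the choice to look only at the previous token are all assumptions made to keep the example short. A real language model conditions on the whole context and learns from far more text.

```python
from collections import Counter, defaultdict

# Toy training text (an assumption for this sketch); tokens are split on spaces.
CORPUS = "the capital of france is paris . the capital of italy is rome ."

def build_bigram_counts(text):
    """Count how often each token follows each other token."""
    tokens = text.split()
    counts = defaultdict(Counter)
    for current, following in zip(tokens, tokens[1:]):
        counts[current][following] += 1
    return counts

def generate(counts, prompt, max_new_tokens=10):
    """Append one token at a time, always taking the most frequent follower."""
    tokens = prompt.split()
    for _ in range(max_new_tokens):
        candidates = counts.get(tokens[-1])
        if not candidates:
            break  # the toy model has never seen this token followed by anything
        next_token = candidates.most_common(1)[0][0]
        tokens.append(next_token)
        if next_token == ".":
            break  # treat the full stop as the end of the reply
    return " ".join(tokens)

counts = build_bigram_counts(CORPUS)
print(generate(counts, "the capital"))
```

Each pass through the loop adds exactly one token and then feeds the longer text back in, which is the same left-to-right pattern the lesson describes.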

Why answers vary

You may have noticed that if you ask the same question twice, you can get slightly different answers. That is intentional.

The model does not always pick the single most likely next word. It samples from a range of plausible options. This makes responses feel more natural and less robotic. A setting called temperature controls how adventurous this sampling is. Higher temperature means more variation and creativity. Lower temperature means more predictable, consistent output.
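
To make temperature concrete, here is a small sketch of one common way it is applied: the model's raw scores are divided by the temperature before being turned into probabilities. The candidate tokens and their scores are invented for illustration, and the function names are hypothetical. Real systems often combine temperature with other sampling settings.

```python
import math
import random

def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; temperature must be positive."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_next_token(tokens, scores, temperature, rng):
    """Pick one token at random, weighted by its probability."""
    probs = softmax_with_temperature(scores, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Made-up candidates and scores for the token after "The capital of France is"
tokens = ["Paris", "located", "a", "famous"]
scores = [5.0, 2.0, 1.0, 0.5]

rng = random.Random(0)  # fixed seed so repeated runs match
for temperature in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(scores, temperature)
    picks = [sample_next_token(tokens, scores, temperature, rng) for _ in range(5)]
    print(temperature, [round(p, 3) for p in probs], picks)
```

At a low temperature almost all of the probability sits on the top-scoring token, so the picks rarely change. At a higher temperature the probabilities are spread more evenly, so less likely tokens are chosen more often.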

Building a reply, token by token

You: What is the capital of France?
The
The capital
The capital of
The capital of France
The capital of France is
The capital of France is Paris.

The role of your prompt

The model generates its response based on everything in the context: your message, any system instructions from the app, and earlier messages in the conversation. The more clearly you frame what you want, the better the model can aim its next-token predictions toward a useful answer.
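
As a rough sketch of what "everything in the context" means, the example below joins system instructions, earlier messages, and the new message into one block of text for the model to continue. The `System:`, `User:`, and `Assistant:` labels and the function name are assumptions for illustration. Real apps use model-specific formats that differ from this.

```python
def build_context(system_instructions, history, user_message):
    """Flatten everything the model will see into one block of text."""
    lines = [f"System: {system_instructions}"]
    for speaker, text in history:
        lines.append(f"{speaker}: {text}")
    lines.append(f"User: {user_message}")
    lines.append("Assistant:")  # the model continues writing from here
    return "\n".join(lines)

# Hypothetical conversation so far
history = [
    ("User", "I am planning a trip to France."),
    ("Assistant", "Great! Which cities are you considering?"),
]
context = build_context(
    "You are a concise travel guide.",
    history,
    "What is the capital?",
)
print(context)
```

The question "What is the capital?" is vague on its own, but the earlier mention of France is part of the same text, so the model's next-token predictions can take it into account.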

This is why prompting matters. You will explore that properly in a later lesson.

What the model does not have

On its own, the model has no memory between conversations. Each time you start a new chat, it starts fresh with no knowledge of previous sessions, unless the app adds a memory feature that feeds saved notes back into the context. Within a single conversation, it can refer back to what was said. Once you start a new conversation, that context is gone.
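
One way to picture this is that the conversation history lives in the app, not in the model. The sketch below uses a stand-in function in place of a real model. `fake_model` and `Conversation` are hypothetical names, and the name-matching logic exists only to show that the reply depends entirely on the messages passed in.

```python
PREFIX = "User: My name is "

def fake_model(messages):
    """Stand-in for a real model: it can only use what is in `messages`."""
    for line in messages:
        if line.startswith(PREFIX):
            name = line[len(PREFIX):].rstrip(".")
            return f"Your name is {name}."
    return "I do not know your name."

class Conversation:
    """Keeps the message list that is sent to the model on every turn."""

    def __init__(self):
        self.messages = []  # a new conversation starts empty

    def send(self, text):
        self.messages.append(f"User: {text}")
        reply = fake_model(self.messages)
        self.messages.append(f"Assistant: {reply}")
        return reply

first_chat = Conversation()
first_chat.send("My name is Ada.")
print(first_chat.send("What is my name?"))

new_chat = Conversation()
print(new_chat.send("What is my name?"))
```

The first chat can answer because the earlier message is still in its list. The new chat has an empty list, so the same question gets no useful answer.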

It also has no ability to act in the world on its own. It generates text. Anything that goes beyond text (searching, clicking, running code) requires extra tools built on top of the model.
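
The division of labour can be sketched like this: the model only produces text, and surrounding code decides whether that text is a request to run a tool. Everything here is hypothetical, including the `TOOL add 19 23` text format, the stand-in model, and the tool table. Real products define their own formats for tool requests.

```python
def fake_model_with_tools(context):
    """Stand-in for a real model: it only ever returns text."""
    if "Tool result:" in context:
        result = context.rsplit("Tool result:", 1)[1].strip()
        return f"The answer is {result}."
    return "TOOL add 19 23"  # made-up text format for requesting a tool

def add(a, b):
    return a + b

TOOLS = {"add": add}  # tools the app is willing to run on the model's behalf

def run_turn(user_message):
    context = f"User: {user_message}"
    reply = fake_model_with_tools(context)
    if reply.startswith("TOOL "):
        # The app, not the model, parses the request and runs the tool.
        _, name, *args = reply.split()
        result = TOOLS[name](*(int(a) for a in args))
        # The result goes back into the context as more text.
        context += f"\nTool result: {result}"
        reply = fake_model_with_tools(context)
    return reply

print(run_turn("What is 19 plus 23?"))
```

The addition happens in ordinary code outside the model. The model's only contributions are the text that asks for the tool and the text of the final reply.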

You now understand this

Chatbots generate answers one token at a time, choosing each next piece of text based on everything before it. There is no lookup, no retrieval from a database of answers. The model predicts what comes next, samples from likely options, and builds a response token by token. Your prompt sets the direction for all of those predictions.