At the most fundamental level, AI models operate on numbers, not text. Without going too deep, here is the simplest way to understand tokens:
When you send text to an AI model, it is first split into tokens: chunks that may be whole words, parts of words, punctuation, or spaces, each mapped to a numeric ID. For example, the sentence “What is a Blackhole?” becomes 6 tokens. The exact conversion process is outside the scope of this post, but you can try it yourself using the tokenizer tool at https://platform.openai.com/tokenizer.
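If you prefer to check this programmatically, here is a minimal sketch using OpenAI's open-source tiktoken library. The exact IDs and counts depend on which encoding you pick, so treat the output as illustrative:

```python
# pip install tiktoken
import tiktoken

# cl100k_base is the encoding used by GPT-4-era models; other models
# use different encodings and will split text differently.
enc = tiktoken.get_encoding("cl100k_base")

text = "What is a Blackhole?"
token_ids = enc.encode(text)

print(token_ids)                 # a list of integer IDs, one per token
print(len(token_ids), "tokens")  # expect 6 for this sentence with this encoding

# Decode each ID back to its text chunk to see how the sentence was split.
for tid in token_ids:
    print(tid, repr(enc.decode([tid])))
```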
In simple terms, tokens are the basic numerical units that all AI models use to process and understand text.
AI models do not have persistent memory. Instead, they have something closer to a computer's temporary working memory, often compared to RAM. This is called the context window. The context window is measured in tokens and sets the maximum number of tokens the model can handle at one time. So if a model has a context window of 250,000 tokens, your entire prompt (input) plus the output tokens for each generation must fit within that limit.
The prompt + output must not exceed the context window.
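To make the arithmetic concrete, here is a small sketch of the budget check an application might run before sending a request. The numbers and the helper function are hypothetical; real APIs report actual token usage in their responses:

```python
CONTEXT_WINDOW = 250_000   # hypothetical model limit from the example above
MAX_OUTPUT_TOKENS = 4_000  # how many tokens we reserve for the reply

def fits_in_context(prompt_tokens: int) -> bool:
    """Return True if the prompt plus the reserved output fits in the window."""
    return prompt_tokens + MAX_OUTPUT_TOKENS <= CONTEXT_WINDOW

print(fits_in_context(10_000))   # True: 14,000 <= 250,000
print(fits_in_context(248_000))  # False: 252,000 > 250,000
```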
In a chatbot conversation, every message in your history contributes to the prompt. Many people assume the model only reads the last message, but in fact the full chat history, your latest message, the system instructions, and any attached files all combine to form the complete prompt. Each time you press send, the model reads everything in the conversation so far along with any new content.
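The sketch below, reusing the tiktoken encoding from earlier, shows how the token cost grows as a conversation accumulates. Real chat APIs add a few bookkeeping tokens per message, so treat the total as approximate:

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

# A toy conversation: system instructions plus the running chat history.
# On every turn, ALL of these messages are sent to the model again.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "What is a Blackhole?"},
    {"role": "assistant", "content": "A black hole is a region of spacetime..."},
    {"role": "user", "content": "How do they form?"},  # the latest message
]

# Approximate prompt size: the sum of tokens across every message.
# (Real APIs add a small per-message overhead, omitted here.)
prompt_tokens = sum(len(enc.encode(m["content"])) for m in messages)
print("Approximate prompt size:", prompt_tokens, "tokens")
```

This is why long conversations eventually hit the context window: every turn resends the whole history, so the prompt keeps growing even if each new message is short.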