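Every model has a context window. This is a hard limit on how much text it can handle at once, measured in tokens, not words, and shared between your input and the model's reply. When a conversation outgrows it, either the request fails or the oldest content is silently dropped to make room. This single mechanical fact explains a lot of the weird behavior you'll see in long conversations, such as the model "forgetting" instructions you gave at the start.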
In English, a token is roughly ¾ of a word, or about 4 characters. "Hello, world!" is 4 tokens. A 2,000-word document is about 2,700 tokens (2,000 ÷ 0.75). Claude's default window is 200K tokens and GPT-4o's is 128K; open-source models vary more, often 8K–32K, with some newer releases reaching 128K.
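Both rules of thumb (about ¾ of a word per token, or about 4 characters per token, in English) reduce to simple arithmetic. Here is a minimal sketch, assuming English prose; the helper names are illustrative, not from any library, and a real tokenizer will give somewhat different numbers.

```python
# Rough token estimates from the two English rules of thumb.
# These are heuristics, not a real tokenizer; actual counts vary by model.

WORDS_PER_TOKEN = 0.75  # "a token is roughly 3/4 of a word"
CHARS_PER_TOKEN = 4     # "about 4 characters per token"


def tokens_from_words(word_count: int) -> int:
    """Estimate tokens from a word count (English prose)."""
    return round(word_count / WORDS_PER_TOKEN)


def tokens_from_chars(text: str) -> int:
    """Estimate tokens from character length (English prose)."""
    return round(len(text) / CHARS_PER_TOKEN)


if __name__ == "__main__":
    print(tokens_from_words(2_000))            # 2667
    print(tokens_from_chars("Hello, world!"))  # 3 (a real tokenizer gives 4)
```

The second call shows the heuristic's limits: by character count "Hello, world!" comes out at about 3 tokens, while OpenAI's tokenizer splits it into 4. That is close enough for budgeting, not for exact counts.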
If you paste a 500-page PDF into a 128K-token window, something has to give: at 300–500 words per page, that's roughly 200K–330K tokens. Either the API rejects the request, or the framework silently trims it. Knowing your model's window is the first step to not being surprised.
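To make "silently trims" concrete, here is a hypothetical sketch of what a chat framework might do: walk back from the newest message and drop everything older once the token budget runs out. `trim_oldest` and `estimate_tokens` are illustrative names, and the ~4 characters per token estimate stands in for a real tokenizer.

```python
# Hypothetical sketch of silent trimming: keep the newest messages that fit
# and drop everything older. Token counts use the ~4 chars/token heuristic.


def estimate_tokens(text: str) -> int:
    # Ceiling division, so any non-empty message costs at least 1 token.
    return -(-len(text) // 4)


def trim_oldest(messages: list[str], window: int, reserve_for_output: int = 0) -> list[str]:
    """Return the newest run of messages that fits in the input budget."""
    budget = window - reserve_for_output
    kept: list[str] = []
    used = 0
    # Walk from newest to oldest, keeping messages while they still fit.
    for msg in reversed(messages):
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break  # this message and everything older is dropped
        kept.append(msg)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["a" * 40, "b" * 40, "c" * 40]  # ~10 tokens each
print(trim_oldest(history, window=25))    # the "a" message is gone
```

Real frameworks differ in the details; some keep the system prompt pinned or summarize old turns instead of discarding them. From your side the effect is the same: early messages stop reaching the model.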
Go to OpenAI's tokenizer and paste something you commonly ask about. Count the tokens. Now count your typical chat history. Are you anywhere near the limit? (Other model families use their own tokenizers, so their counts will differ somewhat.)
Rough-count the tokens in the text below without running any tool (use the heuristic: ~4 characters per token in English). Then tell me:
1. Estimated token count.
2. Whether it fits in a 32K-token context window with 4K reserved for output.
3. What I should cut if it doesn't fit.
Text:
"""
[PASTE HERE]
"""