Claude 4 and the long-context arms race
Anthropic just pushed context windows to a million tokens. We tested whether it's actually useful, or just a benchmark flex.
Anthropic shipped Claude 4 with a one-million-token context window. Google had already done it with Gemini. OpenAI is rumoured to be close.
The honest answer to "does it matter?" is: sometimes.
For pasting an entire codebase and asking architecture questions: yes, it's a step change. For pasting a 400-page legal document and asking targeted questions: yes, having the model read the full document in context is genuinely better than chunked RAG for many tasks, because no relevant passage can be dropped at the retrieval step.
For most chat-style use cases, you'll never hit even a 200K window, and a smaller, cheaper model is the right call.
The takeaway: context size isn't the headline. Reliability *within* the context — needle-in-haystack accuracy, instruction-following deep into the prompt — is what matters. Claude 4 is genuinely good there. The marketing-number arms race, less so.
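For readers who want to check "reliability within the context" themselves, a needle-in-haystack test can be sketched in a few lines. This is a minimal illustration, not the harness used for this article: the filler text, the needle sentence, and the `ask_model` callable are all hypothetical placeholders, and `fake_model` is a stand-in that deliberately ignores everything past its first 2,000 characters so the sketch runs without any API.

```python
from typing import Callable, Dict, List

# All of these strings are made-up test data, not from any real benchmark.
FILLER = "The quick brown fox jumps over the lazy dog."
NEEDLE = "The secret launch code is 7421."
QUESTION = "What is the secret launch code?"
EXPECTED = "7421"


def build_haystack(n_sentences: int, depth: float) -> str:
    """Return n_sentences of filler with NEEDLE inserted at a relative depth.

    depth=0.0 puts the needle at the very start, depth=1.0 at the very end.
    """
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    sentences = [FILLER] * n_sentences
    sentences.insert(round(depth * n_sentences), NEEDLE)
    return " ".join(sentences)


def run_sweep(
    ask_model: Callable[[str], str],
    n_sentences: int,
    depths: List[float],
) -> Dict[float, bool]:
    """Ask the same question with the needle at each depth; record hit or miss."""
    results = {}
    for depth in depths:
        prompt = build_haystack(n_sentences, depth) + "\n\n" + QUESTION
        results[depth] = EXPECTED in ask_model(prompt)
    return results


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in that only 'reads' the first 2,000 characters."""
    return EXPECTED if NEEDLE in prompt[:2000] else "I don't know."


if __name__ == "__main__":
    for depth, found in run_sweep(fake_model, 100, [0.0, 0.5, 1.0]).items():
        print(f"depth={depth:.1f} found={found}")
```

To test a real model, replace `fake_model` with a function that sends the prompt to whichever API you use and returns the reply text. Scaling `n_sentences` up toward the advertised window and sweeping more depths shows where recall starts to drop, which is the number that matters more than the window size itself.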