
Context is not memory
Why every long-running agent hits a wall at turn 50 — and how we stopped feeding ours garbage.
There's a moment, around the fiftieth turn, when a useful agent becomes useless.
You've seen it. A research session that started sharp is now hallucinating sources it cited three turns ago. A CAD agent that understood your part library yesterday is asking what SUS304 is. A coding agent proudly refactors the function you told it not to touch.
We call this Agent Dementia. It is not a bug. It is the predictable collapse of the only memory architecture the industry has agreed on: stuff everything into the context window and hope.
This piece is about why that architecture breaks, what replaces it, and what becomes possible when an agent can actually remember.
The crisis has a shape
Every team building long-running agents hits the same four walls.
Tool bloat. Raw tool inputs and outputs get crammed into the prompt. A single web search adds four thousand tokens of noise. By the time the agent has called ten tools, the mission statement is buried under system chatter. In our benchmarks, reasoning quality degraded by a factor of roughly 5.3 after fifty turns.
The session silo. When the chat ends, the memory dies. Tomorrow the agent will re-learn your codebase, your style, your goals, your last seventeen decisions. It is Groundhog Day, priced per token.
The token tax. Every irrelevant log the agent reads, it also pays for. Cost and quality move in opposite directions with every turn. Garbage in, garbage out, at API rates.
The similarity trap. Flat RAG retrieves what is mathematically similar, not what is logically relevant. Ask about a bolt hole tolerance and you get the three other documents that mentioned the word "bolt." The narrative of your work, the sequence of decisions that led you here, is absent.
The common mistake is to treat these as separate problems. They are the same problem in four costumes. The context window is being used as memory, and the context window is bad at being memory.
Context is working space. Memory is a database.
The human brain does not keep every experience in working memory. It couldn't if it tried. Working memory holds the last few seconds. The rest lives somewhere deeper, structured, retrieved on demand.
Agents need the same split.
We built CoMeT — Cognitive Memory Tree — to make that split explicit. The active context window stays small and fresh, holding only the last two or three turns plus compact references to everything else. Memory itself lives outside, in a hybrid index-plus-graph database, and is recalled through a tool call, not through stuffing.
When the agent needs historical context, it asks the memory system for exactly the resolution it needs. Orientation? Pull the summary. Reasoning? Pull the detailed summary. Precision? Pull the raw source.
Three resolutions, one call, no bloat.
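As a rough sketch of what a resolution-aware recall call could look like (the names, record shape, and API here are illustrative, not CoMeT's actual interface):

```python
from enum import Enum

class Resolution(Enum):
    SUMMARY = "summary"            # orientation: high-level overview
    DETAILED = "detailed_summary"  # reasoning: deeper context
    RAW = "raw_source"             # precision: the untouched original

def recall(store: dict, memory_id: str, resolution: Resolution) -> str:
    """Fetch one memory at the requested resolution.

    `store` maps memory ids to layered records; only the layer the
    agent asked for enters the context window.
    """
    record = store[memory_id]
    return record[resolution.value]

# Hypothetical layered store: one memory, three resolutions.
store = {
    "m1": {
        "summary": "Chose M8 bolts for the flange.",
        "detailed_summary": "M8 chosen over M6 for shear load margin; PCD 120 mm.",
        "raw_source": "<full tool transcript>",
    }
}

# The agent starts cheap and escalates only when it must.
print(recall(store, "m1", Resolution.SUMMARY))
```

The point of the shape: escalation is the agent's choice per question, so the raw transcript only pays its token cost when precision is actually needed.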
Two small models, doing the quiet work
Two lightweight models sit between the agent and its memory.
The Sensor watches. It continuously monitors the session, the browsing history, and the tool calls, making a real-time decision about whether an interaction holds enough value to persist. Most don't. The ones that do get passed along.
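In CoMeT the Sensor is a model making that keep-or-drop call; the heuristic below only illustrates the shape of the gate, with made-up fields and weights:

```python
def sensor_gate(interaction: dict, threshold: float = 0.5) -> bool:
    """Decide whether an interaction is worth persisting.

    Stand-in for the Sensor: a toy score replaces the model's
    judgment. Field names and weights are illustrative only.
    """
    score = 0.0
    if interaction.get("kind") == "decision":
        score += 0.6   # explicit decisions almost always persist
    if interaction.get("referenced_later"):
        score += 0.3   # anything the session came back to matters
    if len(interaction.get("text", "")) < 20:
        score -= 0.4   # very short chatter rarely does
    return score >= threshold

# Most interactions fail the gate; the few that pass go to the Compactor.
flagged = [i for i in [
    {"kind": "decision", "text": "Use SUS304 for the bracket."},
    {"kind": "chatter", "text": "ok"},
] if sensor_gate(i)]
```

The design point is that this runs continuously and cheaply, so the expensive model never sees the filtering work.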
The Compactor writes. It takes what the Sensor flagged and stores it across five layers:
Summary — the high-level overview, for orientation.
Detailed Summary — deeper context, for complex reasoning.
Trigger — the conditions under which this memory should activate.
Tags — metadata for rapid retrieval.
Raw Source — the untouched original, for precision when precision is what matters.
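The five layers above could be modeled as a single record, sketched here with illustrative field names and a pass-through writer standing in for the Compactor model:

```python
from dataclasses import dataclass

@dataclass
class MemoryRecord:
    summary: str            # high-level overview, for orientation
    detailed_summary: str   # deeper context, for complex reasoning
    trigger: str            # condition under which this memory activates
    tags: list[str]         # metadata for rapid retrieval
    raw_source: str         # untouched original, for precision

def compact(flagged: dict) -> MemoryRecord:
    """Stand-in for the Compactor: in CoMeT a small model writes each
    layer; here fields are copied through to show the record's shape."""
    return MemoryRecord(
        summary=flagged["summary"],
        detailed_summary=flagged["detail"],
        trigger=flagged["trigger"],
        tags=flagged["tags"],
        raw_source=flagged["raw"],
    )

# Hypothetical flagged interaction becoming a five-layer record.
rec = compact({
    "summary": "Regenerated flange with PCD 110 mm.",
    "detail": "PCD changed from 120 to 110; bolt circle recalculated.",
    "trigger": "user mentions the flange or its PCD",
    "tags": ["flange", "PCD", "session-42"],
    "raw": "<full CAD tool transcript>",
})
```

Storing all five layers up front is what lets recall later pick a resolution instead of re-summarizing on demand.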
Neither model is large. Neither needs to be. They exist to keep the bigger model focused.
The numbers
We ran CoMeT against a baseline of full-context stuffing on 119 turns of real conversation data, about 100,000 characters. Three findings mattered.
Prompt noise dropped from 71% to 27%. The lost-in-the-middle problem that haunts long contexts largely went away.
Retrieval accuracy held. CoMeT matched full-context accuracy at 5.2x lower cost. In session mode, it was 13.5x cheaper and missed only a single question.
Thirty percent of benchmark questions were resolved without ever touching the raw content. The summary layer was enough.
The deeper shift is architectural. Memory scaling moves from an O(n²) attention bottleneck into an O(n) retrieval process. The agent stops wasting compute on filtering its own logs and gets the cognitive breathing room back.
What becomes possible
We did not build CoMeT as an academic exercise. We built it because the product we are shipping — a CAD agent that takes a 2D engineering drawing and produces a manufacturing-grade parametric 3D model — does not work without it.
A single serious CAD session runs for hours. An engineer uploads a drawing, iterates on tolerances, asks the agent to regenerate a flange with a different PCD, compares two variants, references a part from last week's job, asks for GD&T annotations, and revises. Without memory, the agent forgets the first flange by the time it's working on the second. With CoMeT, it remembers the full trajectory, and the trajectory of every session before it.
This is what the manifesto meant when we said the factory shrinks to fit one person. It does not shrink because the tools are clever. It shrinks because the tools finally remember.
Where this goes
CoMeT is open source. We are releasing it as a memory protocol for any long-running agent — CAD, research, coding, sales, home management, anywhere an agent needs to run longer than a session.
Two directions we are exploring next.
A Memory Market, where users share memory maps and let other agents inherit accumulated workflows. A domain expert's months of engagement become a resource others can build on.
Memory-augmented reasoning, where the memory graph is treated as a reasoning substrate, not just a retrieval store. The agent thinks through its memory, not just from it.
Both need the memory protocol to work first. That is the problem we solved, so we can work on the next ones.
One more thing
There is a temptation, when the context window gets bigger, to believe the memory problem is going away. A million-token window will fix it. Ten million. A hundred million.
It won't.
Bigger context windows solve the storage question. They make the retrieval, relevance, and reasoning questions worse. An agent drowning in a hundred million tokens is not an agent with memory. It is an agent with access to a library and no index.
The real work is structural. The context window is for thinking. The memory is for knowing. Until agents understand the difference, they will keep forgetting why they started, fifty turns in, every time.
We started remembering. You can too.
CoMeT is open source — [GitHub link]. CoBrA is the runtime we built on top — early access for select engineers. Both are products of The Dimension Company, the CAD AI platform for defense and automotive.