MIT's RLMs: Breaking the 10M-Token Barrier Without Retraining Your LLMs

MIT researchers developed Recursive Language Models that let LLMs handle prompts of 10 million tokens and more without retraining.

MIT researchers just cracked the code on handling 10-million-token prompts without retraining models, by making LLMs act like Python programmers.

The breakthrough comes from Recursive Language Models (RLMs), developed at MIT CSAIL, which borrow the idea of out-of-core algorithms: instead of loading the entire prompt into the model's context window, the prompt is treated as external data that the model inspects programmatically, pulling in only the slices it needs. This enables analysis of prompts spanning millions of tokens without any model retraining.
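To make the idea concrete, here is a minimal sketch of what "prompt as external data" can look like in practice: the full prompt sits in a Python environment as an ordinary string, and the model reads it through small helper calls rather than through its context window. The helpers `peek` and `grep` below are illustrative assumptions, not functions from the MIT codebase.

```python
# Minimal sketch of the out-of-core idea: the prompt lives in the
# environment as a plain variable, and the model inspects it with
# code instead of reading it token by token. Names are illustrative.

def peek(prompt: str, start: int = 0, length: int = 2000) -> str:
    """Return a small window of the prompt for the model to read."""
    return prompt[start:start + length]

def grep(prompt: str, needle: str, context: int = 200) -> list[str]:
    """Find occurrences of a substring and return surrounding snippets."""
    hits, i = [], prompt.find(needle)
    while i != -1:
        hits.append(prompt[max(0, i - context):i + len(needle) + context])
        i = prompt.find(needle, i + 1)
    return hits

# The model never sees all 10M tokens at once; it issues calls like:
#   peek(prompt, 0)             -> skim the opening
#   grep(prompt, "revenue")     -> jump straight to relevant regions
```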

"There is an entropy argument that implies you need exponentially more data samples as you increase the effective context window size," explained Alex Zhang, co-author of the study.

The framework employs two agents: a root model (e.g., GPT-5) as an orchestrator and a recursive model (e.g., Qwen3-Coder) as a worker. This dual-agent system achieved 91.33% accuracy on the BrowseComp-Plus benchmark (10M+ tokens) compared to 0% for base models.
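The article doesn't spell out the orchestration loop, but a hedged sketch of the root/worker split might look like the following map-reduce-style pass, where `call_model` is a hypothetical stand-in for a real LLM API client and both model names are placeholders:

```python
# Hedged sketch of the dual-agent pattern described above: a root
# model plans and synthesizes, a recursive worker model reads chunks.
# `call_model` is a stand-in; replace it with an actual LLM client.

def call_model(model: str, prompt: str) -> str:
    """Hypothetical LLM call; wire up a real API client here."""
    raise NotImplementedError

def recursive_answer(query: str, prompt: str, chunk_size: int = 100_000) -> str:
    # Worker passes over the huge prompt one chunk at a time,
    # extracting only what is relevant to the query.
    notes = []
    for i in range(0, len(prompt), chunk_size):
        chunk = prompt[i:i + chunk_size]
        notes.append(call_model(
            "worker-model",  # e.g. Qwen3-Coder in the article's setup
            f"Question: {query}\nExcerpt:\n{chunk}\nRelevant facts only:",
        ))
    # Root model synthesizes the worker's notes into a final answer.
    return call_model(
        "root-model",  # e.g. GPT-5 in the article's setup
        f"Question: {query}\nNotes:\n" + "\n".join(notes),
    )
```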

The approach is also cost-efficient, running up to 3x cheaper than summarization baselines, though the researchers acknowledge that outlier queries can incur "long-tailed" costs.

The code is available on GitHub for experimentation, though Zhang clarified that RLMs work in tandem with standard retrieval methods like RAG rather than replacing them.

"When you look at customer engagement, it is not necessarily focused on marketing teams..."