Why is “long-context” research so unsatisfactory?
“Context rot” is (as defined by anthropic [1]):
[when] the number of tokens in the context window increases, the model’s ability to accurately recall information from that context decreases
But people have noticed that context rot is this weird thing that happens when your Claude Code history gets bloated, or you chat with ChatGPT for a long time - it’s almost like, as the conversation goes on, the model gets…dumber?
The natural solution is something along the lines of, “well maybe if I split the context into two model calls, then combine them in a third model call, I’d avoid this degradation issue”. Authors have taken this intuition as the basis for a recursive language model.
Recursive Language Models (RLMs)
A recursive language model is a thin wrapper around a LM that can spawn (recursive) LM calls for intermediate computation - from the perspective of the user or programmer, it is the same as a model call.
In other words, you query a RLM as an “API” like you would a LM, i.e. rlm.completion(messages) is a direct replacement for gpt5.completion(messages). RLM takes a context-centric view rather than a problem-centric view of input decomposition. This framing retains the functional view that we want a system that can answer a particular query over some associated context.
A recursive language model call replaces a language model call. It provides the user the illusion of near infinite context, while under the hood a language model manages, partitions, and recursively calls itself or another LM over the context accordingly to avoid context rot.

Under the hood, a RLM provides only the query to the LM (which we call the root LM, or LM with depth=0), and allows this LM to interact with an environment, which stores the (potentially huge) context.
We choose the environment to be a loop where the LM can write to and read the output of cells of a Python REPL Notebook (similar to a Jupyter Notebook environment) that is pre-loaded with the context as a variable in memory. The root LM has the ability to call a recursive LM (or LM with depth=1) inside the REPL environment as if it were a function in code, allowing it to naturally peek at, partition, grep through, and launch recursive sub-queries over the context.
Our instantiation of the RLM framework provides the root LM the ability to analyze the context in a Python notebook environment, and launch recursive LM calls (depth=1) over any string stored in a variable. The LM interacts by outputting code blocks, and it receives a (truncated) version of the output in its context. When it is done, it outputs a final answer with FINAL(…) tags or it can choose to use a string in the code execution environment with FINAL_VAR(…).
When the root LM is confident it has an answer, it can either directly output the answer as FINAL(answer), or it can build up an answer using the variables in its REPL environment, and return the string inside that answer as FINAL_VAR(final_ans_var).

This setup yields several benefits that are visible in practice:
- The context window of the root LM is rarely clogged — because it never directly sees the entire context, its input context grows slowly.
- The root LM has the flexibility to view subsets of the context, or naively recurse over chunks of it. For example, if the query is to find a needle-in-the-haystack fact or multi-hop fact, the root LM can use
regexqueries to roughly narrow the context, then launch recursive LM calls over this context. This is particularly useful for arbitrary long context inputs, where indexing a retriever is expensive on the fly! - The context can, in theory, be any modality that can be loaded into memory. The root LM has full control to view and transform this data, as well as ask sub-queries to a recursive LM.
What is happening under the hood?
Based on [2]:
Step 1: Context lives in the environment
# Instead of:
response = llm("Here's 10M tokens... answer this")
# RLM does:
context = "..." # 10M tokens stored as variable
# Model never sees this directly in its prompt!
Step 2: Model explores programmatically
# The RLM can do things like:
print(context[:1000]) # Peek at first 1000 chars
print(len(context)) # Check total length
# Search with regex
import re
matches = re.findall(r'festival.*celebration', context)# Filter smartly
relevant = [
line for line in context.split('\n')
if 'magic number' in line.lower()
]
Step 3: Spawn recursive sub-models
# When it finds something interesting:
chunk = context[50000:100000]
answer = llm_query(f"Extract dates: {chunk}")
# Batch process:
results = []
for i in range(0, len(context), 100000):
chunk = context[i:i+100000]
result = llm_query(f"Find patterns: {chunk}")
results.append(result)final = llm_query(f"Synthesize: {results}")
Step 4: Return the answer
FINAL(synthesized_answer)
Real world example
Let’s trace through a real example from the original paper [3]. Task: Answer using 1,000 documents (≈8.3M tokens):
“This vegetable stew uses fish… What are the first and last names of the person who won the beauty pageant at the thirteenth anniversary of this festival?”
What GPT-5 Does (Naive):
- Can’t fit 8.3M tokens
- Even chunked, loses connections
- Score: 0% (literally cannot attempt)
What RLM(GPT-5) Does:
Turn 1: Probe the corpus
# Search for keywords
festival_docs = []
for i, doc in enumerate(documents):
if re.search(r'festival|celebration', doc, re.I):
festival_docs.append((i, doc))print(f"Found {len(festival_docs)} docs")
print(documents[6][:500]) # Looks promising!
Turn 2: Deep dive with recursive call
doc = documents[6]
analysis = llm_query(f"""
Find info about:
- Festival name and location
- Beauty pageant winners
- Dates
Context: {doc}
""")
# Returns: "Maria Dalmacio won..."
Turn 3: Verify and return
verify = llm_query(f"Confirm Maria Dalmacio: {analysis}")
FINAL("Maria Dalmacio")
How RLMs Think: Emergent Behaviors
Without special training, RLMs develop smart strategies:
1. Smart Filtering
# Uses model priors:
matches = re.findall(r'La Union|festival', context)
# Narrows 1M+ lines to ~100 relevant ones
2. Intelligent Chunking
# Semantic batching:
for batch in chunks(lines, size=100):
label = llm_query(f"Classify: {batch}")
3. Answer Verification
# Fresh sub-LLMs avoid context rot:
answer = llm_query("What's the answer?")
verify = llm_query(f"Is '{answer}' correct?")
4. Long Output Generation
# Bypasses output limits:
parts = []
for section in data:
part = llm_query(f"Process {section}")
parts.append(part)
FINAL_VAR(parts) # Returns variable!
Everything has limitations!
1. Inefficient Decomposition
Qwen3-Coder: Thousands of sub-calls for simple tasks
GPT-5: Sometimes too conservative
Example: Qwen made a separate call per line (1000+ calls) when 10–20 batches would suffice.
2. Answer Verification Loops
Iteration 1: Correct answer
Iteration 2: Verify (correct)
Iteration 3: Verify again
…
Iteration 6: Returns wrong answer
3. Synchronous Execution
All sub-calls blocking/sequential. With async: 10–100x faster possible.
4. REPL Brittleness
Must wrap answers in FINAL() tags. Sometimes forgets or wraps intermediate thoughts.
Practical experimentation
After everything i presented above, i had to try and experiment with this new (potential) new paradigm in LLMs. There is an official repo for this at [4] This is what i experimented:
- My goal: Summarize content of a specific page present in a long PDF.
- Why: Long context are a challenge to LLMs. Achieving strong performance generally requires implementing a retrieval-augmented generation (RAG) system that retrieves only the sections of lengthy documents pertinent to the task.
Selected PDF for the experiment: On The Origin Of Species by Charles Darwin [5].
rlm = RLM(
backend="openai", # or "portkey", etc.
backend_kwargs={
"model_name": "gpt-5-mini",
"api_key": os.getenv("OPENAI_API_KEY"),
"base_url": "https://openrouter.ai/api/v1",
},
environment="local",
environment_kwargs={
"setup_code": f"import shutil; shutil.copy(r'{PDF_PATH}', 'On-the-Origin-of-Species.pdf')",
},
max_depth=1,
logger=logger,
verbose=True, # For printing to console with rich, disabled by default.
)
Question
The selected question was:
In your workspace there is a PDF file called ‘On-the-Origin-of-Species.pdf’. Open it using PyPDF2 and summarize the content on page 52.
Why?
- Tradicional LLMs can’t do this. LLM’s have no logic on pagination unless you pass all text together with pagination.
- Partial selection of content without looking at the whole document
- It enables the us to see the points where this approach is really good (execute code in a sandbox environment to find what it needs)
Steps executed
- First it checks the request
╭─ ▸ Code Execution (0.000s) ────────────────────────────────────────────────────────────────────────────────────╮
│ Code: │
│ # Inspect the provided context variable │
│ print("Context length:", len(context)) │
│ print("Context content:") │
│ print(context) │
│ │
│ Output: │
│ Context length: 135 │
│ Context content: │
│ In your workspace there is a PDF file called 'On-the-Origin-of-Species.pdf'. Open it using PyPDF2 and summarize │
│ the content on page 52. │
│ │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
- Open on the requested page
╭─ ▸ Code Execution (0.056s) ────────────────────────────────────────────────────────────────────────────────────╮
│ Code: │
│ # Attempt to find and read 'On-the-Origin-of-Species.pdf' and extract text from page 52. │
│ import os │
│ │
│ pdf_filename = 'On-the-Origin-of-Species.pdf' │
│ print("Working directory contents:", os.listdir('.')) │
│ │
│ if not os.path.isfile(pdf_filename): │
│ print(f"File '{pdf_filename}' not found in the current directory.") │
│ else: │
│ print(f"Found '{pdf_filename}'. Attempting to extract page 52 text...") │
│ │
│ page_text = None │
│ # Try PyPDF2 first, then pypdf as fallback │
│ try: │
│ import PyPDF2 │
│ print("Using PyPDF2 to read PDF...") │
│ with open(pdf_filename, 'rb') as f: │
│ reader = PyPDF2.PdfReader(f) │
│ num_pages = len(reader.pages) │
│ print("Number of pages in PDF:", num_pages) │
│ page_index = 51 # page 52 -> zero-based index 51 │
│ if page_index < num_pages: │
│ page = reader.pages[page_index] │
│ # PyPDF2 has .extract_text() │
│ try: │
│ page_text = page.extract_text() │
│ except Exception as e: │
│ print("PyPDF2 extract_text failed:", e) │
│ except Exception as e: │
│ print("PyPDF2 not available or failed to import:", e) │
│ │
│ if page_text is None: │
│ try: │
│ import pypdf │
│ print("Using pypdf to read PDF...") │
│ with open(pdf_filename, 'rb') as f: │
│ reader = pypdf.PdfReader(f) │
│ num_pages = len(reader.pages) │
│ print("Number of pages in PDF:", num_pages) │
│ page_index = 51 │
│ if page_index < num_pages: │
│ page = reader.pages[page_index] │
│ try: │
│ page_text = page.extract_text() │
│ except Exception as e: │
│ print("pypdf extract_text failed:", e) │
│ except Exception as e: │
│ print("pypdf not available or failed to import:", e) │
│ │
│ if page_text is None: │
│ try: │
│ # Try using PyMuPDF (fitz) if available │
│ import fitz │
│ print("Using PyMuPDF (fitz) to read PDF...") │
│ doc = fitz.open(pdf_filename) │
│ num_pages = doc.page_count │
│ print("Number of pages in PDF:", num_pages) │
│ page_index = 51 │
│ if page_index < num_pages: │
│ page = doc.load_page(page_index) │
│ page_text = page.get_text("text") │
│ except Exception as e: │
│ print("PyMuPDF (fitz) not available or failed to import:", e) │
│ │
│ if page_text is None: │
│ print("Unable to extract page text with available PDF libraries.") │
│ else: │
│ # Save the extracted text to a variable for later use │
│ extracted_page_text = page_text │
│ print("Extracted text (first 1000 chars):") │
│ print(extracted_page_text[:1000]) │
│ │
│ Output: │
│ Working directory contents: ['On-the-Origin-of-Species.pdf', 'context_0.txt'] │
│ Found 'On-the-Origin-of-Species.pdf'. Attempting to extract page 52 text... │
│ Using PyPDF2 to read PDF... │
│ Number of pages in PDF: 377 │
│ Extracted text (first 1000 chars): │
│ 3. STRUGGLE FOR EXISTENCE. │
│ Bears on natural selection. The term used in a wide sense. │
│ Geometrical powers of increase. Rapid increase of naturalised │
│ animals and plants. Nature of the checks to increase. │
│ Competition universal. Effects of climate. Protection from the │
│ number of individuals. Complex relations of all animals and │
│ plants throughout nature. Struggle for life most severe between │
│ individuals and varieties of the same species; often severe │
│ between species of the same genus. The relation of organism to │
│ organism the most important of all relations. │
│ Before entering on the subject of this chapter, I must make a │
│ few preliminary │
│ remarks, to show how the struggle for existence bears on Natural │
│ Selection. It │
│ has been seen in the last chapter that amongst organic beings in a state │
│ of nature │
│ there is some individual variability; indeed I am not aware that this │
│ has ever been │
│ disputed. It is immaterial for us whether a multitude of │
│ doubtful forms be called │
│ species or sub-species or varieties; │
│ │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
- Request to summarize and return the result
╭─ ▸ Code Execution (7.373s) ────────────────────────────────────────────────────────────────────────────────────╮
│ Code: │
│ # Use the available llm_query to produce a concise summary of the extracted page text. │
│ prompt = ( │
│ "You are given the OCR/extracted text of page 52 from Charles Darwin's " │
│ "\"On the Origin of Species\" (shown below). Provide a concise summary (3-6 sentences) " │
│ "highlighting the main topics, arguments, and their significance for Darwin's theory. " │
│ "Do not add any information not present on the page; focus only on the content provided.\n\n" │
│ "Page text:\n\n" + extracted_page_text │
│ ) │
│ │
│ page52_summary = llm_query(prompt) │
│ print("Summary produced:\n") │
│ print(page52_summary) │
│ │
│ Output: │
│ Summary produced: │
│ │
│ Darwin introduces the "struggle for existence" as key to understanding natural selection, after noting that │
│ individual variability and some well-marked varieties unquestionably exist. He argues that mere variability, │
│ while necessary, cannot by itself explain how the many exquisite adaptations and co-adaptations of organs, │
│ organisms, and conditions of life have been perfected, citing examples like the woodpecker, mistletoe, │
│ parasites, diving beetles, and plumed seeds. He also downplays the importance of rigidly ranking doubtful forms │
│ as species or varieties for the present argument. Finally, he poses the central question of how incipient │
│ varieties are ultimately converted into distinct species that differ far more from one another than do ordinary │
│ varieties. │
│ │
│ │
│ ↳ 1 sub-call(s) │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
- Final answer!
╭─ ★ Final Answer ────────────────────────────────────────────────────────────────────────────────────────────────╮
│ │
│ Darwin introduces the "struggle for existence" as key to understanding natural selection, after noting that │
│ individual variability and some well-marked varieties unquestionably exist. He argues that mere variability, │
│ while necessary, cannot by itself explain how the many exquisite adaptations and co-adaptations of organs, │
│ organisms, and conditions of life have been perfected, citing examples like the woodpecker, mistletoe, │
│ parasites, diving beetles, and plumed seeds. He also downplays the importance of rigidly ranking doubtful │
│ forms as species or varieties for the present argument. Finally, he poses the central question of how │
│ incipient varieties are ultimately converted into distinct species that differ far more from one another than │
│ do ordinary varieties. │
│ │
╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Conclusions
This is a very interesting approach that has real world potential for some use cases.
Recursive Language Models represent a compelling shift in how we approach long-context tasks, not by expanding what a model can hold in memory, but by giving it the tools to navigate large contexts intelligently. The practical experiment above demonstrates this clearly: rather than attempting to ingest an entire book, the RLM located, extracted, and summarized a specific page with precision, using only a handful of targeted operations.
This paradigm is not a replacement for RAG but more close to an agent with programmatic access to its own retrieval process. The tradeoffs are real: cost, latency, and decomposition quality all depend heavily on the underlying model’s reasoning ability. But for tasks where context structure is unknown at indexing time, or where on-the-fly exploration is required, RLMs offer a genuinely novel and practical alternative.
As models become more capable coders and the cost per token continues to fall, the inefficiencies that currently limit RLMs will shrink, making this a paradigm worth watching closely.
References
[1]: https://www.anthropic.com/engineering/effective-context-engineering-for-ai-agents
[3]: https://arxiv.org/pdf/2512.24601
[4]: https://github.com/alexzhang13/rlm?tab=readme-ov-file
[5]: https://www.vliz.be/docs/Zeecijfers/Origin_of_Species.pdf