
Researchers at MIT have introduced a new framework called SEAL that lets large language models (LLMs) generate their own synthetic training data and improve themselves without outside help.

SEAL works in two stages. First, the model learns to generate effective "self-edits" via reinforcement learning. These self-edits are natural-language instructions that define new training data and set optimization parameters. In the second stage, the system applies those instructions and updates its own weights through supervised fine-tuning.

The model proposes its own self-edits (SE), updates its weights, and is evaluated on the task. Reinforcement learning (RL) rewards edits that improve performance, yielding an updated policy (θt+1) with each cycle. | Image: Zweiger et al.
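In code, one cycle of this loop might look like the Python sketch below. Every name in it (generate_self_edit, apply_self_edit, evaluate) is a hypothetical stub standing in for the paper's components; the "model" is a plain dict and the evaluator is random, so this shows only the two-stage control flow, not SEAL's actual API.

```python
import random

# Toy sketch of one SEAL cycle. All helpers are hypothetical stubs,
# so the control flow runs end to end without any ML dependencies.

def generate_self_edit(context: str) -> str:
    """Stage 1: the model writes a natural-language training directive."""
    return f"Finetune on implications of: {context} (lr=1e-4, epochs=3)"

def apply_self_edit(weights: dict, edit: str) -> dict:
    """Stage 2: turn the directive into a weight update (stands in for a LoRA fine-tune)."""
    return {**weights, "history": weights["history"] + [edit]}

def evaluate(weights: dict) -> float:
    """Downstream-task score (random stub here)."""
    return random.random()

weights = {"history": []}
edit = generate_self_edit("a passage about photosynthesis")
candidate = apply_self_edit(weights, edit)
reward = evaluate(candidate) - evaluate(weights)  # signal for the RL outer loop
```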

A key component of SEAL is the ReST^EM algorithm, which acts as a filter: it keeps and reinforces only those self-edits that actually improve performance. The algorithm samples a batch of candidate edits, tests which ones work, and then trains the model on the successful variants alone. SEAL also uses Low-Rank Adaptation (LoRA), a technique that enables quick, lightweight updates without retraining the entire model.
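A minimal sketch of that filtering step is shown below, with stub callables standing in for the real model, fine-tuner, and task evaluator; none of these names come from the SEAL codebase.

```python
import random

def restem_step(model, propose, apply_edit, score, n_samples=8):
    """One ReST^EM-style round: sample self-edits, keep only the winners."""
    baseline = score(model)
    winners = []
    for _ in range(n_samples):
        edit = propose(model)                    # sample a candidate self-edit
        if score(apply_edit(model, edit)) > baseline:
            winners.append(edit)                 # keep only edits that beat the baseline
    return winners                               # the policy is then fine-tuned on these

# Toy usage with stubs; a real run would plug in an LLM, LoRA fine-tuning,
# and a benchmark evaluator instead.
winners = restem_step(
    model={"acc": 0.3},
    propose=lambda m: f"edit-{random.randint(0, 99)}",
    apply_edit=lambda m, e: {"acc": m["acc"] + random.uniform(-0.1, 0.2)},
    score=lambda m: m["acc"],
)
```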

The researchers put SEAL to the test in two scenarios. In the first, they used Qwen2.5-7B on a text comprehension task. The model generated logical inferences from text and then trained on its own outputs.

Synthetic implications generated from a text passage serve as training data for LoRA fine-tuning. | Image: Zweiger et al.
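The generation step might look roughly like the following; the prompt wording is a paraphrase for illustration, not the paper's exact template.

```python
PASSAGE = "..."  # a text passage from the comprehension benchmark

# The model is asked to spell out what follows from the passage; each
# generated implication then becomes one training example for the
# lightweight LoRA fine-tune.
prompt = (
    "Read the following passage and list the implications that follow "
    f"from it, one per line:\n\n{PASSAGE}"
)
implications = ["..."]  # model output, split into lines
train_examples = [{"text": line} for line in implications]
```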

SEAL reached an accuracy of 47 percent, beating the comparison method's 33.5 percent. The quality of its self-generated data even surpassed that of OpenAI's GPT-4.1, despite the underlying model being much smaller.

Reinforcement learning produces longer, more detailed self-edits (compared for QA self-edits before and after three RL iterations, alongside average edit lengths for base, prompt, and RL), which in turn boost performance. | Image: Zweiger et al.

In a second test, the team looked at few-shot prompting with Llama 3.2-1B on a reasoning task. Here, the model picked data processing techniques and training parameters from a preset toolkit. With SEAL, the model achieved a 72.5 percent success rate, compared to just 20 percent for self-edits generated without prior reinforcement learning.
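In this setting a self-edit is not free-form text but a selection from that fixed menu. A hypothetical example of what such a selection could look like (the field names and values are illustrative, not the paper's exact schema):

```python
# Hypothetical self-edit for the few-shot setting: the model chooses
# which data transformations and training settings to use from a
# predefined toolkit. Field names here are illustrative only.
self_edit = {
    "augmentations": ["rotate_90", "flip_horizontal", "transpose"],
    "learning_rate": 1e-4,
    "train_epochs": 5,
}
```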

"Catastrophic forgetting" remains a challenge

Despite the strong results, the researchers identified several limitations. The main issue is "catastrophic forgetting": as the model takes on new tasks, its performance on earlier ones degrades. Training is also resource-intensive, since each evaluation of a self-edit takes 30 to 45 seconds.

Heat map of model performance across successive self-edit iterations: each round leads to declining accuracy on earlier-learned passages. | Image: Zweiger et al.
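One way to quantify this effect, sketched below, is to re-evaluate the model on every earlier passage after each new self-edit update; `update` and `score` are hypothetical stand-ins for applying a self-edit and measuring task accuracy, in the spirit of the stubs above.

```python
def forgetting_curve(weights, passages, update, score):
    """After each update, score the model on all passages seen so far.

    Row i holds the scores on passages 0..i after the i-th update;
    falling values in earlier columns indicate forgetting, matching
    the pattern in the heat map above.
    """
    rows = []
    for i, passage in enumerate(passages):
        weights = update(weights, passage)
        rows.append([score(weights, p) for p in passages[: i + 1]])
    return rows
```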

Tackling the data wall

The MIT team sees SEAL as a step toward overcoming the so-called "data wall"—the point where all available human-written training data has been used up. Separately, researchers have also warned about the risk of "model collapse," where models degrade in quality when trained too heavily on low-quality AI-generated data. SEAL could enable ongoing learning and autonomous AI systems that keep adapting to new goals and information.

If models can teach themselves by absorbing new material—like scientific papers—and generating their own explanations and inferences, they could keep improving on rare or underrepresented topics. This kind of self-driven learning loop may help push language models past current limits.

The source code for SEAL is available on GitHub.

Summary
  • MIT researchers have developed SEAL, a framework that allows large language models to generate their own synthetic training data and improve themselves without outside input, using a process of self-editing and reinforcement learning.
  • In tests, SEAL enabled models to outperform comparison methods on text comprehension and reasoning tasks, with self-generated data even surpassing that of larger models like GPT-4.1, despite using much smaller underlying models.
  • The approach faces challenges such as "catastrophic forgetting," where models lose performance on earlier tasks as they learn new ones, and high computational demands, but it offers a pathway for language models to keep learning and adapting beyond the limits of existing human-written data.