
OpenAI is expanding its custom AI training offerings with a new method called Reinforcement Fine-Tuning (RFT). The technique aims to create specialized o1 models that can perform complex technical tasks with minimal training examples.


The new approach works differently from traditional supervised fine-tuning. Instead of merely learning to imitate the style and tone of its training data, the model can develop new ways of "thinking" through problems, according to OpenAI. Given a problem, the model takes time to reason through a solution, o1-style. An evaluation system then rates the answer, strengthening successful reasoning patterns while weakening incorrect ones.
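The loop described above can be sketched in a few lines. What follows is a deliberately simplified toy illustration: the "policy" is just a weight per candidate answer, and all names are assumptions for this sketch, not OpenAI's actual training code or API.

```python
import random

def rft_loop(candidate_answers, correct_answer, steps=100):
    # Toy "policy": one weight per answer the model might produce.
    weights = {a: 1.0 for a in candidate_answers}
    for _ in range(steps):
        # Sample an answer in proportion to its current weight
        # (standing in for the model generating a solution).
        answers, w = zip(*weights.items())
        answer = random.choices(answers, weights=w)[0]
        # Grade the answer: 1.0 if correct, 0.0 otherwise.
        reward = 1.0 if answer == correct_answer else 0.0
        # Reinforce reasoning that scored well; weaken what did not.
        weights[answer] *= 1.1 if reward > 0 else 0.9
    # Return the answer the "policy" now favors most.
    return max(weights, key=weights.get)
```

In a real RFT setup the policy is the model's parameters and the update is a reinforcement-learning gradient step, but the reinforce/weaken dynamic is the same.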

Figure: Example of an RFT medical training task showing the case description, instructions, and correct answer (gene FOXE3). Based on the patient's symptoms, the model must determine the most likely genetic cause and justify its answer. | Image: OpenAI
Figure: Flow chart of the grader evaluating the model's output (the FOXE3 gene plus additional candidates) and assigning a score of 0.7. The score is intended to reinforce the reasoning process that led toward the correct answer. | Image: OpenAI
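A grader like the one in the diagram could award partial credit based on where the correct gene appears in the model's ranked output. The function below is a hypothetical scoring rule for illustration only; OpenAI has not published its grader's exact logic.

```python
def grade_gene_ranking(predicted_genes, correct_gene):
    """Return a score in [0, 1]: full credit if the correct gene is
    ranked first, decreasing credit for lower ranks, zero if absent."""
    if correct_gene not in predicted_genes:
        return 0.0
    rank = predicted_genes.index(correct_gene)  # 0 = top prediction
    return 1.0 / (rank + 1)

# Correct gene ranked first scores 1.0; ranked second scores 0.5.
top_score = grade_gene_ranking(["FOXE3", "KDM5A"], "FOXE3")
```

A graded (rather than binary) reward gives the training signal more to work with: an answer that lists the right gene second is still closer to correct than one that misses it entirely.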

OpenAI says this approach works especially well for specialized fields like law, finance, engineering, and insurance that need deep technical knowledge. As an example, the company highlights its collaboration with Thomson Reuters, where they trained the compact o1 Mini model to work as a legal assistant.

Reinforcement learning for expert systems

Justin Ree, a bioinformatician at Berkeley Lab, used RFT to study rare genetic diseases. He trained the system using data extracted from hundreds of scientific papers that included symptoms and their associated genes.


Ree reports that the RFT-trained o1 Mini outperformed the standard o1 model at this task, despite being smaller and less expensive. He notes that the model's ability to explain its predictions makes it particularly useful.

In testing, the fine-tuned mini model achieved the highest precision in gene identification of the three variants, reaching up to 45 percent at the top end of the measured range.

Figure: Line chart comparing gene-identification accuracy for three model variants across different metrics. The fine-tuned mini model (o1-mini finetune) achieves the highest precision, up to 45 percent at the top end of the range. | Image: OpenAI

Early access program

OpenAI is now accepting organizations into its Reinforcement Fine-Tuning Research Program. The program is designed for organizations working on complex tasks that could benefit from AI assistance.

Participants will receive access to the RFT API and can help improve it through feedback before its public release. OpenAI plans to make RFT more widely available in early 2025.

Summary
  • OpenAI has introduced a new training method for its o1 AI models called Reinforcement Fine-Tuning (RFT), which goes beyond the previous supervised fine-tuning approach.
  • With RFT, the model is presented with a problem, given time to work out a solution, and then has its answer evaluated, reinforcing successful reasoning paths and weakening incorrect ones.
  • OpenAI says RFT is particularly well suited to domains requiring deep domain knowledge, such as law, finance, engineering, and insurance, because it allows the model to learn the correct solution paths relevant to the task at hand.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.