
OpenAI is expanding its fine-tuning program for o4-mini, introducing Reinforcement Fine-Tuning (RFT) for organizations. The method tailors models like o4-mini to highly specific tasks using a programmable grading system.


RFT is designed to help organizations tune language models for highly specialized domains, such as law, finance, or security. Instead of relying on fixed answers, RFT uses a programmable "grader" that scores each model response based on custom criteria like style, accuracy, or security. Multiple graders can be combined to reflect more nuanced objectives.
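As a rough illustration of how such a combined grader could be described, the sketch below pairs an exact-match check with a model-based quality score. The grader types loosely follow the kinds OpenAI has talked about (string checks, model-scored rubrics, weighted combinations), but the specific keys, template variables, and weights here are assumptions rather than the API schema.

```python
# Illustrative only: a combined grader expressed as a Python dict. The idea of
# mixing an exact-match check with a model-scored rubric follows the article;
# the exact field names and template syntax are assumptions.
grader = {
    "type": "multi",
    "graders": {
        # Did the model give the expected answer?
        "accuracy": {
            "type": "string_check",
            "input": "{{sample.output_text}}",
            "reference": "{{item.reference_answer}}",
            "operation": "eq",
        },
        # How well is the answer written for the target audience?
        "style": {
            "type": "score_model",
            "model": "gpt-4.1",  # hypothetical choice of judge model
            "input": [{
                "role": "user",
                "content": "Score this answer from 0 to 1 for clarity and tone: {{sample.output_text}}",
            }],
        },
    },
    # Weighted combination reflecting that accuracy matters more than style here.
    "calculate_output": "0.7 * accuracy + 0.3 * style",
}
```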

With this setup, the model learns to prioritize responses that earn higher scores from the grader. The approach builds on reinforcement learning, the same core technique behind OpenAI's reasoning models like o3. RFT is available to verified organizations starting today.

Fine-tuning with graders, checkpoints, and structured outputs

The RFT process is organized into five main steps: first, a grader is set up to define the criteria for strong answers; next, training and validation data are uploaded and the fine-tuning job is started. During training, the model produces several candidate answers for each prompt, each of which is scored by the grader, and a policy gradient algorithm updates the model to favor high-scoring responses.
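Conceptually, this resembles a standard policy-gradient loop. The toy sketch below is not OpenAI's implementation; it only shows the mechanic with an invented grader and a three-answer "policy": sample several candidates per prompt, score them, and shift probability toward answers with above-average scores.

```python
import math
import random

# Toy illustration of the RFT training mechanic (not OpenAI's implementation):
# sample several candidates per prompt, score each with a grader, and apply a
# REINFORCE-style update that favors answers with above-average scores.

CANDIDATES = ["yes", "no", "needs review"]
logits = {a: 0.0 for a in CANDIDATES}  # stand-in for a language model policy


def grader(prompt: str, answer: str) -> float:
    """Hypothetical grader: 1.0 for the answer we consider correct, else 0.0."""
    return 1.0 if answer == "needs review" else 0.0


def softmax() -> dict:
    z = max(logits.values())
    exps = {a: math.exp(v - z) for a, v in logits.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def train_step(prompt: str, k: int = 8, lr: float = 0.5) -> None:
    probs = softmax()
    answers = random.choices(CANDIDATES, weights=[probs[a] for a in CANDIDATES], k=k)
    rewards = [grader(prompt, a) for a in answers]
    baseline = sum(rewards) / k  # simple variance-reducing baseline
    grad = {a: 0.0 for a in CANDIDATES}
    for ans, r in zip(answers, rewards):
        adv = r - baseline
        for a in CANDIDATES:
            # d log p(ans) / d logit_a for a softmax policy
            grad[a] += adv * ((1.0 if a == ans else 0.0) - probs[a])
    for a in CANDIDATES:
        logits[a] += lr * grad[a] / k


for _ in range(100):
    train_step("Does this vendor request need a security review?")
print(softmax())  # probability mass concentrates on the grader-preferred answer
```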


OpenAI demonstrates RFT with a security example: a model is trained to answer questions about a company’s internal security policies, producing a JSON object with fields for "compliant" (yes, no, or "needs review") and "explanation." Both compliance and the quality of the explanation are graded. Training data must be in JSONL format and include the expected structured outputs.
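An individual training example for this setup might look like the sketch below. The "compliant" and "explanation" fields come from the article's description; the surrounding message structure, the reference-field layout, and the sample content are assumptions, not the exact schema.

```python
import json

# One illustrative JSONL training record for the security-policy example.
# The "compliant" / "explanation" fields follow the article; the wrapping
# "messages" structure and reference-field names are assumptions.
record = {
    "messages": [
        {"role": "user",
         "content": "May contractors access the staging database from personal laptops?"},
    ],
    # Reference values the grader can compare the model's JSON output against.
    "compliant": "no",
    "explanation": "Staging access is restricted to managed devices on the company VPN.",
}

with open("rft_train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")  # JSONL: one JSON object per line
```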

During training, OpenAI tracks metrics such as average reward on both training and validation sets. High-performing checkpoints can be tested individually or resumed as needed. RFT is fully integrated with OpenAI's evaluation tools.
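For completeness, here is a small sketch of how those checkpoints could be inspected programmatically. It assumes the checkpoints endpoint of the openai Python SDK; the job ID is a placeholder and the printed fields are from memory rather than verified against the current reference.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical job ID; list the checkpoints saved during an RFT run.
job_id = "ftjob-example123"
checkpoints = client.fine_tuning.jobs.checkpoints.list(job_id)

for cp in checkpoints.data:
    # Each checkpoint can be referenced as a model name for evals or inference.
    print(cp.step_number, cp.fine_tuned_model_checkpoint)
```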

Video: OpenAI

OpenAI first introduced RFT as an experimental technique in a research program in December 2024. Early results showed promise in specialized domains. OpenAI researcher Rohan Pandey says RFT could be especially valuable for vertical startups that train specialized agents on rare data.

Supervised fine-tuning now available for GPT-4.1 nano

Alongside the expanded RFT program for o4-mini, OpenAI is now offering supervised fine-tuning for GPT-4.1 nano, described as the fastest and most cost-effective model in the GPT-4.1 family. This enables organizations to make traditional adjustments using fixed input-response pairs.
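Supervised fine-tuning data is typically supplied as chat-formatted JSONL, where each line pairs an input conversation with the desired assistant reply. The sketch below shows what one such training example could look like; the prompt and reply are invented.

```python
import json

# One illustrative chat-formatted training example for supervised fine-tuning:
# a fixed input conversation paired with the desired assistant response.
example = {
    "messages": [
        {"role": "system", "content": "You answer in the company's house style: short and direct."},
        {"role": "user", "content": "How do I reset my VPN token?"},
        {"role": "assistant", "content": "Open the IT portal, choose 'Security devices', and click 'Reset token'."},
    ]
}

with open("sft_train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example) + "\n")
```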


Organizations that share their training data with OpenAI receive a 50% discount. Fine-tuned models are available through the standard API and can be integrated directly into existing applications.

Summary
  • OpenAI has been developing reinforcement fine-tuning (RFT) since December 2024, and it is now available to verified organizations using the o4-mini model.
  • RFT uses a programmable evaluation system, where a grader assigns numerical scores to model responses, helping the AI improve performance in specialized areas like law, finance, and security.
  • OpenAI has also introduced supervised fine-tuning for the GPT-4.1 nano model; supervised fine-tuning is geared toward style adaptation, while RFT targets optimization for specific tasks.