
OpenAI is expanding its fine-tuning program for o4-mini, introducing Reinforcement Fine-Tuning (RFT) for organizations. The method tailors models like o4-mini to highly specific tasks using a programmable grading system.


RFT is designed to help organizations tune language models for highly specialized domains, such as law, finance, or security. Instead of relying on fixed answers, RFT uses a programmable "grader" that scores each model response based on custom criteria like style, accuracy, or security. Multiple graders can be combined to reflect more nuanced objectives.
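As a rough illustration of how such a combined grader could be described, the sketch below pairs an exact-match check with a model-based quality score. The grader types loosely follow the kinds OpenAI has talked about (string checks, model-scored rubrics, weighted combinations), but the specific keys, template variables, and weights here are assumptions rather than the API schema.

```python
# Illustrative only: a combined grader expressed as a Python dict. The idea of
# mixing an exact-match check with a model-scored rubric follows the article;
# the exact field names and template syntax are assumptions.
grader = {
    "type": "multi",
    "graders": {
        # Did the model give the expected answer?
        "accuracy": {
            "type": "string_check",
            "input": "{{sample.output_text}}",
            "reference": "{{item.reference_answer}}",
            "operation": "eq",
        },
        # How well is the answer written for the target audience?
        "style": {
            "type": "score_model",
            "model": "gpt-4.1",  # hypothetical choice of judge model
            "input": [{
                "role": "user",
                "content": "Score this answer from 0 to 1 for clarity and tone: {{sample.output_text}}",
            }],
        },
    },
    # Weighted combination reflecting that accuracy matters more than style here.
    "calculate_output": "0.7 * accuracy + 0.3 * style",
}
```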

With this setup, the model learns to prioritize responses that earn higher scores from the grader. The approach builds on reinforcement learning, the same core technique behind OpenAI's reasoning models like o3. RFT is available to verified organizations starting today.

Fine-tuning with graders, checkpoints, and structured outputs

The RFT process is organized into five main steps: first, a grader is set up to define the criteria for strong answers; next, training and validation data are uploaded and the fine-tuning job is started. During training, the model produces several candidate answers for each prompt, each of which is scored by the grader, and a policy gradient algorithm updates the model to favor high-scoring responses.
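Conceptually, this resembles a standard policy-gradient loop. The toy sketch below is not OpenAI's implementation; it only shows the mechanic with an invented grader and a three-answer "policy": sample several candidates per prompt, score them, and shift probability toward answers with above-average scores.

```python
import math
import random

# Toy illustration of the RFT training mechanic (not OpenAI's implementation):
# sample several candidates per prompt, score each with a grader, and apply a
# REINFORCE-style update that favors answers with above-average scores.

CANDIDATES = ["yes", "no", "needs review"]
logits = {a: 0.0 for a in CANDIDATES}  # stand-in for a language model policy


def grader(prompt: str, answer: str) -> float:
    """Hypothetical grader: 1.0 for the answer we consider correct, else 0.0."""
    return 1.0 if answer == "needs review" else 0.0


def softmax() -> dict:
    z = max(logits.values())
    exps = {a: math.exp(v - z) for a, v in logits.items()}
    total = sum(exps.values())
    return {a: e / total for a, e in exps.items()}


def train_step(prompt: str, k: int = 8, lr: float = 0.5) -> None:
    probs = softmax()
    answers = random.choices(CANDIDATES, weights=[probs[a] for a in CANDIDATES], k=k)
    rewards = [grader(prompt, a) for a in answers]
    baseline = sum(rewards) / k  # simple variance-reducing baseline
    grad = {a: 0.0 for a in CANDIDATES}
    for ans, r in zip(answers, rewards):
        adv = r - baseline
        for a in CANDIDATES:
            # d log p(ans) / d logit_a for a softmax policy
            grad[a] += adv * ((1.0 if a == ans else 0.0) - probs[a])
    for a in CANDIDATES:
        logits[a] += lr * grad[a] / k


for _ in range(100):
    train_step("Does this vendor request need a security review?")
print(softmax())  # probability mass concentrates on the grader-preferred answer
```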


OpenAI demonstrates RFT with a security example: a model is trained to answer questions about a company’s internal security policies, producing a JSON object with fields for "compliant" (yes, no, or "needs review") and "explanation." Both compliance and the quality of the explanation are graded. Training data must be in JSONL format and include the expected structured outputs.
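An individual training example for this setup might look like the sketch below. The "compliant" and "explanation" fields come from the article's description; the surrounding message structure, the reference-field layout, and the sample content are assumptions, not the exact schema.

```python
import json

# One illustrative JSONL training record for the security-policy example.
# The "compliant" / "explanation" fields follow the article; the wrapping
# "messages" structure and reference-field names are assumptions.
record = {
    "messages": [
        {"role": "user",
         "content": "May contractors access the staging database from personal laptops?"},
    ],
    # Reference values the grader can compare the model's JSON output against.
    "compliant": "no",
    "explanation": "Staging access is restricted to managed devices on the company VPN.",
}

with open("rft_train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(record) + "\n")  # JSONL: one JSON object per line
```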

During training, OpenAI tracks metrics such as average reward on both training and validation sets. High-performing checkpoints can be tested individually or resumed as needed. RFT is fully integrated with OpenAI's evaluation tools.
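For completeness, here is a small sketch of how those checkpoints could be inspected programmatically. It assumes the checkpoints endpoint of the openai Python SDK; the job ID is a placeholder and the printed fields are from memory rather than verified against the current reference.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical job ID; list the checkpoints saved during an RFT run.
job_id = "ftjob-example123"
checkpoints = client.fine_tuning.jobs.checkpoints.list(job_id)

for cp in checkpoints.data:
    # Each checkpoint can be referenced as a model name for evals or inference.
    print(cp.step_number, cp.fine_tuned_model_checkpoint)
```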

Video: OpenAI

OpenAI first introduced RFT as an experimental technique in a research program in December 2024. Early results showed promise in specialized domains. OpenAI researcher Rohan Pandey says RFT could be especially valuable for vertical startups that train specialized agents on rare data.

Supervised fine-tuning now available for GPT-4.1 nano

Alongside the expanded RFT program for o4-mini, OpenAI is now offering supervised fine-tuning for GPT-4.1 nano, described as the fastest and most cost-effective model in the GPT-4.1 family. This enables organizations to make traditional adjustments using fixed input-response pairs.
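Supervised fine-tuning data is typically supplied as chat-formatted JSONL, where each line pairs an input conversation with the desired assistant reply. The sketch below shows what one such training example could look like; the prompt and reply are invented.

```python
import json

# One illustrative chat-formatted training example for supervised fine-tuning:
# a fixed input conversation paired with the desired assistant response.
example = {
    "messages": [
        {"role": "system", "content": "You answer in the company's house style: short and direct."},
        {"role": "user", "content": "How do I reset my VPN token?"},
        {"role": "assistant", "content": "Open the IT portal, choose 'Security devices', and click 'Reset token'."},
    ]
}

with open("sft_train.jsonl", "a", encoding="utf-8") as f:
    f.write(json.dumps(example) + "\n")
```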


Organizations that share their training data with OpenAI receive a 50% discount. Fine-tuned models are available through the standard API and can be integrated directly into existing applications.

Summary
  • OpenAI has been developing reinforcement fine-tuning (RFT) since December 2024, and it is now available to verified organizations using the o4-mini model.
  • RFT uses a programmable evaluation system, where a grader assigns numerical scores to model responses, helping the AI improve performance in specialized areas like law, finance, and security.
  • OpenAI has also introduced supervised fine-tuning for the GPT-4.1 nano model; supervised fine-tuning is geared toward style adaptation, while RFT targets optimization for specific tasks.