OpenAI introduces new functions for API fine-tuning and expands its program for customer-specific models.
OpenAI has announced new features for self-service fine-tuning of GPT-3.5 via its API, which has been used by thousands of companies to train hundreds of thousands of models since its launch in August 2023, the company says.
New features include saving checkpoints during each training epoch, a new Playground interface for comparing model quality and performance, support for third-party platform integrations (starting with Weights and Biases), calculation of metrics across the entire validation dataset at the end of each session, and various improvements to the Fine-Tuning Dashboard.
According to OpenAI, the most common use cases for fine-tuning include training a model to generate better code in a specific programming language, summarizing text in a specific format, or creating personalized content based on user behavior.
Indeed, a global job listing and job placement platform, used GPT-3.5 Turbo fine-tuning to send personalized recommendations to jobseekers, reducing costs and latency by reducing the number of tokens in the prompts by 80 percent and increasing the number of personalized job recommendations sent per month from one million to around 20 million.
OpenAI believes in customized AI models for companies - and is becoming a service provider
OpenAI is also continuing to develop its program for customer-specific models. As part of Assisted Fine-Tuning, the company's technical teams work with customers to implement techniques that go beyond the Fine-Tuning API, which is supposed to be particularly helpful for companies that need assistance in building efficient training data pipelines, evaluation systems, and customized parameters to maximize model performance for their use case.
According to OpenAI, after several weeks of collaborative work on GPT-4, South Korean telecommunications provider SK Telecom was able to increase call summary quality by 35 percent, intent recognition accuracy by 33 percent, and satisfaction scores from 3.6 to 4.5 (out of 5) compared to standard GPT-4.
Harvey, an AI tool for lawyers and an OpenAI investment, achieved an 83 percent increase in factual answers to legal questions by making adjustments throughout the training process, with lawyers preferring the outputs of the customized model compared to GPT-4 in 97 percent of cases.
An independent test of GPT-4 fine-tuning by the data analysis platform Supersimple found that while fine-tuning improves task performance, there are challenges.
In the case of Supersimple, which achieved a 56 percent performance improvement over GPT-3.5, the benefits of fine-tuning GPT-4 were less significant than those observed when switching from GPT-3 to GPT-3.5. Additionally, the fine-tuned GPT-4 continued to show weaknesses in answering broad and open-ended questions, and had significantly higher latency and cost compared to GPT-3.5.
Fine-tuning can help models better understand content and extend the existing knowledge and capabilities of a model for a given task. Experts debate whether learning from many examples directly in the prompt ("many-shot in-context learning") might be more efficient than the comparatively more complex fine-tuning. Either way, it is easier to test.