OpenAI aims to create AI benchmarks that better reflect real-world use cases

OpenAI has introduced a new initiative called the "Pioneers Program," aimed at developing AI benchmarks tailored to specific industries. The company says the goal is to create evaluation methods that better reflect real-world use cases in areas such as law, finance, and healthcare, domains where it believes existing benchmarks fall short.

According to OpenAI, current AI benchmarks are often flawed: they tend to measure tasks that are difficult to interpret or overly susceptible to manipulation. Similar criticisms have been directed at OpenAI itself. As reported previously, the company has faced scrutiny over its involvement in funding and promoting a prominent math evaluation dataset.

In the coming months, OpenAI plans to work with multiple companies to build domain-specific evaluation tools, which will eventually be released publicly. The first cohort includes select startups focused on practical AI applications. Participating companies will also have the opportunity to work with OpenAI on improving model performance through reinforcement fine-tuning, a method the company recently introduced for customizing models for specialized, expert-level tasks.