Read full article about: OpenAI wants to retire the AI coding benchmark that everyone has been competing on

OpenAI says the SWE-bench Verified programming benchmark has lost its value as a meaningful measure of AI coding ability. The company points to two main problems. First, at least 59.4 percent of the benchmark's tasks are flawed: their tests reject correct solutions because they enforce specific implementation details or check for functions that the task description never mentions.
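To make this kind of flaw concrete, here is a minimal, hypothetical Python sketch; it is not taken from SWE-bench or from OpenAI's analysis. A correct solution passes a test that checks only the requested behavior but fails a test that also demands a helper function the task never mentions. The task, `parse_duration`, and `_split_units` are invented for illustration.

```python
import re

# Hypothetical task description: "parse_duration should accept strings like '1h30m'
# and return the total number of seconds." Nothing else is specified.

def parse_duration(text: str) -> int:
    """A correct solution: sums every <number><unit> pair it finds."""
    seconds_per_unit = {"h": 3600, "m": 60, "s": 1}
    pairs = re.findall(r"(\d+)([hms])", text)
    return sum(int(amount) * seconds_per_unit[unit] for amount, unit in pairs)

# The submitted solution as a name -> object mapping (a stand-in for a module).
solution = {"parse_duration": parse_duration}

def behavioral_test(sol: dict) -> bool:
    """Checks only what the task asks for: the returned value."""
    return sol["parse_duration"]("1h30m") == 5400 and sol["parse_duration"]("45s") == 45

def overspecified_test(sol: dict) -> bool:
    """Also demands a helper, _split_units, that the task never mentions."""
    return behavioral_test(sol) and "_split_units" in sol

print(behavioral_test(solution))     # True
print(overspecified_test(solution))  # False
```

The second test is the pattern OpenAI objects to: the solution is functionally correct, yet it is scored as a failure.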

Many tasks and solutions have also leaked into leading models' training data. OpenAI reports that GPT-5.2, Claude Opus 4.5, and Gemini 3 Flash Preview could reproduce some original fixes from memory, meaning benchmark progress increasingly reflects what a model has seen, not how well it codes. OpenAI recommends SWE-bench Pro instead and is building its own non-public tests.
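One crude way to probe for the kind of memorization OpenAI describes is to measure how closely a model's patch matches the original fix. The Python sketch below uses the standard library's `difflib.SequenceMatcher`; the function names, the example fix, and the 0.9 threshold are assumptions for illustration, and this is not the method OpenAI used.

```python
import difflib

def overlap_ratio(generated: str, reference: str) -> float:
    """Character-level similarity in [0, 1] between a model's patch and the original fix."""
    return difflib.SequenceMatcher(None, generated, reference).ratio()

def looks_memorized(generated: str, reference: str, threshold: float = 0.9) -> bool:
    """Flag near-verbatim reproductions; the 0.9 cutoff is an arbitrary assumption."""
    return overlap_ratio(generated, reference) >= threshold

# Hypothetical original fix from a benchmark task.
reference_fix = "-    return a - b\n+    return a + b\n"

print(looks_memorized(reference_fix, reference_fix))                   # True
print(looks_memorized("def unrelated():\n    pass\n", reference_fix))  # False
```

High textual similarity alone is weak evidence, since short fixes can legitimately converge on the same code; it is only a starting signal.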

There's a possible strategic angle here: a "contaminated" benchmark can make rivals, especially open-source models, look better and skew rankings. SWE-bench Verified was long the gold standard for AI coding evaluation, with OpenAI, Anthropic, Google, and many Chinese open-weight models competing for small leads. AI benchmarks can provide useful signal, but their real-world value remains limited.

Read full article about: OpenAI partners with major consulting firms to push Frontier agent platform

OpenAI has launched a new partner program called "Frontier Alliances." The initiative aims to bring the company's recently introduced Frontier platform to large enterprise customers. Frontier lets businesses build AI agents that handle tasks independently, from processing customer inquiries and pulling CRM data to verifying policies. Details about the platform remain scarce at this point. For now, Frontier is only available to a select group of customers.

To get Frontier into major corporations, OpenAI has signed multi-year partnerships with Boston Consulting Group (BCG), McKinsey, Accenture, and Capgemini. BCG and McKinsey are taking on strategy, organizational restructuring, and rollout planning, while Accenture and Capgemini handle the technical side, integrating Frontier with existing systems and data infrastructure. All four partners are standing up dedicated teams that will be certified in OpenAI's technology.

OpenAI CEO Sam Altman warns "the world is not prepared" as OpenAI accelerates research using its own AI

Sam Altman says AGI is “pretty close” and superintelligence “not that far off.” Speaking at the Express Adda event in India, the OpenAI CEO suggested the company’s internal models are already accelerating its own research and that “the world is not prepared” for what’s coming.

OpenAI staff debated alerting Canadian police about violent ChatGPT logs months before a deadly school shooting

Jesse Van Rootselaar left digital warning signs across multiple platforms before her shooting rampage in Tumbler Ridge, including in ChatGPT conversations. About a dozen OpenAI employees debated internally whether to alert Canadian police. Management decided against it. The case exposes a dilemma facing the entire online industry and AI chatbot companies in particular.

Read full article about: OpenAI is building a $200 to $300 smart speaker that tells you when to go to bed

OpenAI's first smart speaker is expected to cost between $200 and $300. According to The Information, the device includes a camera and supports facial recognition for making purchases. It uses video to scan its surroundings and serve up proactive suggestions, like telling you to hit the sack early before a big meeting. A court filing from Vice President Peter Welinder puts the earliest ship date at February 2027.

The company's 200-plus-person hardware team is reportedly building out a whole product lineup. That includes smart glasses (mass production no earlier than 2028), prototypes of a smart lamp with no clear launch timeline, and an audio wearable called "Sweetpea" that's gunning for AirPods. There's also a stylus called "Gumdrop" in the works. Foxconn is reportedly handling manufacturing for the hardware lineup.

CEO Sam Altman has teased at least one device reveal for 2026. OpenAI isn't alone in this race. Companies like Meta and Apple are making similar bets on AI hardware as the next big computing platform.

Read full article about: Nvidia reportedly set to invest $30 billion in OpenAI

Nvidia is close to investing $30 billion in OpenAI, Reuters reports, citing a person familiar with the matter. The investment is part of a funding round in which OpenAI aims to raise more than $100 billion in total, a deal that would value the ChatGPT maker at roughly $830 billion and make it one of the largest private fundraises in history.

SoftBank and Amazon are also expected to participate in the round. OpenAI plans to spend a significant portion of the new capital on Nvidia chips needed to train and run its AI models.

According to the Financial Times, the investment replaces a deal announced in September, under which Nvidia was set to provide up to $100 billion to support OpenAI's chip usage in data centers. That original agreement took longer to finalize than expected.

Read full article about: New benchmark shows AI agents can exploit most smart contract vulnerabilities on their own

OpenAI and crypto investment firm Paradigm have built EVMbench, a benchmark that measures how well AI agents can find, fix, and exploit security vulnerabilities in Ethereum smart contracts. The dataset covers 120 vulnerabilities drawn from 40 real-world security audits.

In the most realistic test setup, AI agents interact with a local blockchain and have to carry out attacks entirely on their own.

The top-performing model, GPT-5.3-Codex, successfully exploited 72 percent of the vulnerabilities and fixed 41.5 percent. For detection, Claude Opus 4.6 came out ahead at 45.6 percent.

The biggest challenge for the AI agents isn't exploiting or fixing vulnerabilities but finding them in large codebases, the researchers say. When agents were given hints about where a vulnerability was located, exploit success rates jumped from 63 to 96 percent, and fix rates climbed from 39 to 94 percent.

With over $100 billion locked in smart contracts, the authors see both an opportunity for better security and a growing risk if these capabilities fall into the wrong hands.

Read full article about: OpenClaw developer Peter Steinberger joins OpenAI to build AI agents

Peter Steinberger, the developer behind the open-source project OpenClaw, is joining OpenAI. His focus will be on building the next generation of personal AI agents. OpenAI CEO Sam Altman called Steinberger a "genius with a lot of amazing ideas about the future of very smart agents interacting with each other to do very useful things for people." Altman expects this work to quickly become a core part of OpenAI's product lineup.

OpenClaw, which began as Steinberger's hobby project and took off over the past few weeks, will "live in a foundation as an open-source project" and will be supported by OpenAI, Altman says. He calls the future "extremely multi-agent."

Steinberger writes on his blog that he spoke with several large AI labs in San Francisco but ultimately chose OpenAI because it shares his vision. His goal is to build an agent that even his mother can use. Getting there, he says, requires fundamental changes, more security research, and access to the latest models.

What I want is to change the world, not build a large company and teaming up with OpenAI is the fastest way to bring this to everyone.

Peter Steinberger