Author: Matthias Bastian
It looks like humanity might flunk its own "final AI exam." According to FutureHouse, about 29 percent of the biology and chemistry questions in the AI benchmark Humanity's Last Exam (HLE) have answers that published literature shows to be incorrect or misleading. The error rate was uncovered through a combination of human review and AI-backed analysis.
HLE was built to push language models to their limits with especially tough questions, but the analysis suggests that many of its items are themselves misleading or wrong. The experts reviewing the questions spent only a few minutes on each, and a full accuracy check wasn't required. In response, FutureHouse has released a smaller, vetted version called "HLE Bio/Chem Gold" on Hugging Face.
Update: Registration is now open. Places will be allocated by lottery.
OpenAI has set the date for its next DevDay: October 6, 2025, in San Francisco. With more than 1,500 developers expected, the company says this will be its largest event of this kind so far. The agenda includes a live-streamed keynote, hands-on workshops featuring the latest models and tools, and more stages and demos than last year. Details are still under wraps, but OpenAI is taking sign-ups for updates.
Netflix used generative AI to produce a VFX scene in its Argentinian series “El Eternauta,” co-CEO Ted Sarandos said during the company’s earnings call. The AI-assisted sequence was finished ten times faster than it would have been with traditional methods, which would have been too expensive for the production, Sarandos said. Like every CEO, he claimed AI is meant to support creatives, not replace them. The scene also used virtual production tools.
An AI model from OpenAI placed second in the AtCoder Heuristics World Finals, an international competition for solving tough optimization problems. The model ran completely on its own for ten hours, competing under the same rules as the human participants. After a strong start, it briefly lost the lead and then regained it, only to be overtaken at the last moment by veteran competitor FakePsyho.
OpenAI says this marks the first time an AI has cracked the top three in a major programming and math competition. OpenAI hasn't revealed which specific model it used. The competition itself was sponsored by the company.