Matthias Bastian
Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.
Nearly 29 percent of "Humanity's Last Exam" chemistry/biology answers are wrong or misleading
It looks like humanity might flunk its own "final AI exam." According to FutureHouse, about 29 percent of the biology and chemistry questions in the AI benchmark Humanity's Last Exam (HLE) have answers that published literature shows to be incorrect or misleading. The error rate was uncovered through a combination of human review and AI-backed analysis.
HLE was built to push language models to their limits with especially tough questions, but the analysis suggests that many of its items are themselves misleading or wrong. Experts spent only a few minutes reviewing each question, and a full accuracy check wasn't required. In response, FutureHouse has released a smaller, vetted version called "HLE Bio/Chem Gold" on Hugging Face.
Source: FutureHouse
OpenAI will host its next DevDay on October 6, 2025, in San Francisco
Update: Registration is now open. Places will be allocated by lottery.
OpenAI has set the date for its next DevDay: October 6, 2025, in San Francisco. With over 1,500 developers expected, the company says this will be the largest event of its kind so far. The agenda includes a live-streamed keynote, hands-on workshops featuring the latest models and tools, and more stages and demos than last year. Details are still under wraps, but OpenAI is offering a sign-up for updates.
Source: OpenAI via X