Short

It looks like humanity might flunk its own "final AI exam." According to FutureHouse, about 29 percent of the biology and chemistry questions in the AI benchmark Humanity's Last Exam (HLE) have answers that are wrong or misleading when checked against published literature. FutureHouse uncovered these errors through a combination of human review and AI-assisted analysis.

HLE was built to push language models to their limits with especially difficult questions, but the analysis suggests that many of those questions are themselves wrong or misleading. During HLE's review process, experts spent only a few minutes per question, and a full accuracy check wasn't required. In response, FutureHouse has released a smaller, vetted subset called "HLE Bio/Chem Gold" on Hugging Face.

Short

The US Food and Drug Administration is relying on Elsa, a generative AI system, to help evaluate new drugs, even though insiders say it regularly fabricates studies.

"Anything that you don’t have time to double-check is unreliable. It hallucinates confidently," one current FDA employee told CNN, describing the AI system known as Elsa (Efficient Language System for Analysis), which is supposed to speed up drug approvals. Several staff members reported that Elsa frequently invents studies or misrepresents research data, a well-known issue with large language models. The FDA's head of AI, Jeremy Walsh, acknowledged the problem: "Elsa is no different from lots of [large language models] and generative AI. They could potentially hallucinate."

Despite these risks, Elsa is already being used to review clinical protocols and assess risks during inspections. The system operates in a regulatory gray area, since there are currently no binding rules for AI in US healthcare.

Short

Google is rolling out new AI-powered features in the Google Photos app. With the new "Photo to video" tool, users can turn individual photos into six-second video clips with subtle motion effects. The feature is powered by Google's Veo 2 model and is launching now in the US on both Android and iOS. Another addition, the "Remix" function, lets users transform photos into styles such as anime, comics, or 3D animation; it will roll out in the US over the coming weeks. Both tools automatically mark generated content with visible and invisible watermarks to improve traceability. Google is also adding a new "Create" tab to the app, which collects all creative tools in one place. The tab will start rolling out to US users in August.

Video: Google

Short

Google's latest AI features are now reaching billions of users each month.

AI Overviews, which display AI-generated summaries directly in Google Search, are now available in more than 200 countries and reach two billion users each month. For the kinds of searches where this feature appears, Google reports more than a 10% increase in search usage.

The Gemini app has reached 450 million monthly active users, with daily requests up more than 50% compared to the previous quarter. The new AI Mode, a chat interface built into Search, has already surpassed 100 million monthly active users in the US and India.

Google's text-to-video model Veo 3 has been used to generate more than 70 million videos since May. The Google Vids tool, built on Veo, is now approaching one million monthly active users.

Short

Update: Registration is now open. Places will be allocated by lottery.

OpenAI has set the date for its next DevDay: October 6, 2025, in San Francisco. With more than 1,500 developers expected, the company says this will be its largest DevDay so far. The agenda includes a livestreamed keynote, hands-on workshops featuring the latest models and tools, and more stages and demos than last year. Details are still under wraps, but developers can sign up for updates.
