
DeepMind suggests AI should occasionally assign humans busywork so we do not forget how to do our jobs

AI systems should sometimes delegate to humans tasks the AI could easily handle itself, just so people don't forget how to do their jobs. That's one of the more striking recommendations from a new Google DeepMind paper on how AI agents should delegate work.

OpenAI wants to retire the AI coding benchmark that everyone has been competing on

OpenAI says the SWE-bench Verified programming benchmark has lost its value as a meaningful measure of AI coding ability. The company points to two main problems. First, at least 59.4 percent of the benchmark's tasks are flawed: they reject correct solutions because they enforce specific implementation details or check functions not described in the task.

Second, many tasks and solutions have leaked into leading models' training data. OpenAI reports that GPT-5.2, Claude Opus 4.5, and Gemini 3 Flash Preview could reproduce some original fixes from memory, meaning benchmark progress increasingly reflects what a model has seen, not how well it codes. OpenAI recommends SWE-bench Pro instead and is building its own non-public tests.

There's a possible strategic angle here: a "contaminated" benchmark can make rivals, especially open-source models, look better and skew rankings. SWE-bench Verified was long the gold standard for AI coding evaluation, with OpenAI, Anthropic, Google, and many Chinese open-weight models competing for small leads. AI benchmarks can provide a useful signal, but how well those scores translate to real-world coding ability remains an open question.

AI agents are thriving in software development but barely exist anywhere else, Anthropic study finds

AI agents are supposed to revolutionize how we work. But Anthropic’s own data tells a different story: so far, that revolution is almost entirely limited to software engineering. And even there, users aren’t letting agents work nearly as autonomously as the technology would allow.

DeepMind veteran David Silver raises $1B seed round to build superintelligence without LLMs

Long-time DeepMind researcher David Silver is raising one billion dollars for his London-based AI start-up Ineffable Intelligence, the largest seed round in European start-up history. Instead of training on internet text like today’s LLMs, Silver is betting on reinforcement learning in simulated environments to build an “endlessly learning superintelligence.”

Alibaba's free Qwen3.5 signals that China's open-weight model race is far from slowing down

Chinese AI labs keep shipping new models at a rapid clip. Today it’s Alibaba’s turn with Qwen3.5, which tries to match top Western models using a hybrid architecture that combines linear attention and mixture-of-experts while keeping just 17 billion parameters active per query. And yes, it’s open weight.
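The key idea behind that "17 billion active parameters" figure is mixture-of-experts routing: the model holds many expert feed-forward blocks, but a router activates only the top-k per token, so most weights sit idle on any given forward pass. The following is a minimal illustrative sketch of top-k routing in plain Python; it is not Qwen3.5's actual code, and the toy experts and router weights are invented for the example.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def moe_layer(token, experts, router_weights, k=2):
    """Route `token` (a float, standing in for a hidden state) through
    only the top-k of `experts` (callables standing in for FFN blocks).

    Only k experts run, so the "active" parameter count is a fraction
    of the total, which is the efficiency trick behind MoE models.
    """
    scores = softmax([w * token for w in router_weights])
    top = sorted(range(len(experts)), key=lambda i: scores[i], reverse=True)[:k]
    norm = sum(scores[i] for i in top)
    # Output is a weighted sum over the selected experts only.
    return sum(scores[i] / norm * experts[i](token) for i in top)

# Four toy "experts" that just scale their input by different factors.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
out = moe_layer(0.5, experts, router_weights=[0.1, 0.9, 0.3, 0.7], k=2)
```

With these router weights, experts 1 and 3 score highest, so the output is a blend of just those two expert outputs (1.0 and 2.0); the other two experts never execute.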