
AI agents are thriving in software development but barely exist anywhere else, Anthropic study finds

AI agents are supposed to revolutionize how we work. But Anthropic’s own data tells a different story: so far, that revolution is almost entirely limited to software engineering. And even there, users aren’t letting agents work nearly as autonomously as the technology would allow.

DeepMind veteran David Silver raises $1B seed round to build superintelligence without LLMs

Long-time DeepMind researcher David Silver is raising a $1 billion seed round for his London-based AI start-up Ineffable Intelligence, the largest in European start-up history. Instead of training on internet text like today’s LLMs, Silver is betting on reinforcement learning in simulated environments to build an “endlessly learning superintelligence.”

Alibaba's free Qwen3.5 signals that China's open-weight model race is far from slowing down

Chinese AI labs keep shipping new models at a rapid clip. Today it’s Alibaba’s turn with Qwen3.5, which tries to match top Western models using a hybrid architecture that combines linear attention and mixture-of-experts while keeping just 17 billion parameters active per query. And yes, it’s open weight.
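Mixture-of-experts is the standard mechanism for keeping only a fraction of a model’s parameters active per query: a small router scores a pool of expert feed-forward networks and only the top-k experts actually run for each token. The sketch below is a toy NumPy illustration of that general routing idea, not Qwen3.5’s actual implementation; all dimensions and names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (made up for illustration; real models are far larger).
d_model, d_ff = 16, 32
n_experts, top_k = 8, 2  # only 2 of 8 experts run per token

# A router matrix plus one feed-forward weight pair per expert.
W_router = rng.normal(size=(d_model, n_experts))
experts = [
    (rng.normal(size=(d_model, d_ff)), rng.normal(size=(d_ff, d_model)))
    for _ in range(n_experts)
]

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route each token to its top-k experts and mix their outputs."""
    logits = x @ W_router                          # (tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -top_k:]  # indices of chosen experts
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        # Softmax over the selected experts' logits only.
        sel = logits[t, top[t]]
        gates = np.exp(sel - sel.max())
        gates /= gates.sum()
        for gate, idx in zip(gates, top[t]):
            W_in, W_out = experts[idx]
            hidden = np.maximum(x[t] @ W_in, 0.0)  # ReLU feed-forward
            out[t] += gate * (hidden @ W_out)
    return out

tokens = rng.normal(size=(4, d_model))
y = moe_layer(tokens)
print(y.shape)

# Per token, only the router and top_k experts are active, not all n_experts.
active = W_router.size + top_k * 2 * d_model * d_ff
total = W_router.size + n_experts * 2 * d_model * d_ff
print(f"active fraction: {active / total:.2f}")
```

This is why a model can have a large total parameter count while activating only a fixed, much smaller subset per query; the linear-attention half of the hybrid is a separate trick for cheaper sequence mixing and is not shown here.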

Anthropic recruits ex-Google data center veterans to build its own AI infrastructure empire

Anthropic is discussing building at least 10 gigawatts of data center capacity worth hundreds of billions of dollars, recruiting ex-Google managers and lining up Google as a financial backer to make it happen.

Google DeepMind upgrades Gemini 3 Deep Think for complex science and engineering tasks

Google DeepMind has upgraded its specialized thinking mode "Gemini 3 Deep Think" and made it available in the Gemini app and, via a Vertex AI early access program, as an API. The upgrade targets complex tasks in science, research, and engineering.

Google AI Ultra subscribers can access Deep Think through the Gemini app, while developers and researchers can sign up separately for the API program. According to Google DeepMind, the model tops several major benchmarks:

| Benchmark | Deep Think | Claude Opus 4.6 | GPT-5.2 | Gemini 3 Pro Preview |
|---|---|---|---|---|
| ARC-AGI-2 (logical reasoning) | 84.6% | 68.8% | 52.9% | 31.1% |
| Humanity's Last Exam (academic reasoning) | 48.4% | 40.0% | 34.5% | 37.5% |
| MMMU-Pro (multimodal reasoning) | 81.5% | 73.9% | 79.5% | 81.0% |
| Codeforces (coding/algorithms, Elo) | 3,455 | 2,352 | - | 2,512 |

While Deep Think leads by wide margins in logic and coding, the gap narrows significantly on MMMU-Pro: it scored 81.5 percent, barely ahead of Gemini 3 Pro Preview at 81.0 percent. This suggests the thinking upgrades focus heavily on abstract reasoning rather than visual processing. Deep Think also achieved gold medal-level results at the 2025 Physics and Chemistry Olympiads, and Google DeepMind has published examples of the model in scientific use.
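For rough intuition on what a roughly 900-point rating gap means: under the standard Elo model, the expected score of player A against player B is 1/(1 + 10^((R_B - R_A)/400)). Applying that textbook formula to the reported Codeforces ratings (Codeforces uses an Elo-like system, so this is only an approximation):

```python
def elo_expected(r_a: float, r_b: float) -> float:
    """Expected score of A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))

# Deep Think (3,455) vs Gemini 3 Pro Preview (2,512), ratings as reported.
p = elo_expected(3455, 2512)
print(f"{p:.3f}")  # expected score very close to 1.0
```

In other words, a gap that large implies Deep Think would be expected to come out ahead in well over 99 percent of head-to-head contests.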