Anthropic recruits ex-Google data center veterans to build its own AI infrastructure empire
Anthropic is discussing building at least 10 gigawatts of data center capacity worth hundreds of billions of dollars, recruiting ex-Google managers and lining up Google as a financial backer to make it happen.
Google DeepMind has upgraded its specialized thinking mode "Gemini 3 Deep Think" and made it available through the Gemini app and as an API via a Vertex AI early access program. The upgrade targets complex tasks in science, research, and engineering.
Google AI Ultra subscribers can access Deep Think through the Gemini app, while developers and researchers can sign up separately for the API program. According to Google DeepMind, the model tops several major benchmarks:
| Benchmark | Deep Think | Claude Opus 4.6 | GPT-5.2 | Gemini 3 Pro Preview |
| --- | --- | --- | --- | --- |
| ARC-AGI-2 (logical reasoning) | 84.6% | 68.8% | 52.9% | 31.1% |
| Humanity's Last Exam (academic reasoning) | 48.4% | 40.0% | 34.5% | 37.5% |
| MMMU-Pro (multimodal reasoning) | 81.5% | 73.9% | 79.5% | 81.0% |
| Codeforces (coding/algorithms, Elo) | 3,455 | 2,352 | - | 2,512 |
While Deep Think dominates in logic and coding, the gap narrows significantly on MMMU-Pro: it scored 81.5 percent, barely ahead of Gemini 3 Pro Preview at 81.0 percent. This suggests the thinking upgrades focus heavily on abstract reasoning rather than visual processing. Deep Think also achieved gold medal-level results at the 2025 Physics and Chemistry Olympiads. Google has also published examples of the model in scientific use.
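To put the Codeforces numbers in perspective: Elo-style ratings translate into head-to-head expectations via the standard Elo expected-score formula. The sketch below uses that generic formula (not Codeforces' exact rating machinery) with the ratings reported above:

```python
def elo_expected(r_a: float, r_b: float) -> float:
    """Standard Elo expected score for a player rated r_a against one rated r_b."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

# Deep Think (3,455) vs. Gemini 3 Pro Preview (2,512): a ~940-point gap
# implies Deep Think would be expected to win almost every matchup.
print(elo_expected(3455, 2512))
```

Under this formula, a 400-point gap already corresponds to roughly 10:1 odds, so a gap of nearly 1,000 points implies an expected score above 99 percent.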
Isomorphic Labs, Google DeepMind's AI medicine startup, has unveiled a new system called "Isomorphic Labs Drug Design Engine" (IsoDDE) that it says outperforms AlphaFold 3. According to the company, IsoDDE doubles AlphaFold 3's accuracy when predicting protein-ligand structures that differ significantly from the training data (see left graph below).
IsoDDE outperforms previous methods in structure prediction, binding pocket recognition, and binding strength prediction, according to Isomorphic Labs. | Image: Isomorphic Labs
Beyond structure prediction, IsoDDE can identify previously unknown docking sites on proteins in seconds based solely on their blueprint, with accuracy that Isomorphic Labs says approaches that of lab experiments. Isomorphic Labs also claims the system can estimate how strongly a drug binds to its target at a fraction of the time and cost of traditional methods. These capabilities could uncover new starting points for active compounds and speed up computational screening.
Best multimodal models still can't crack 50 percent on basic visual entity recognition
A new benchmark called WorldVQA tests whether multimodal AI models actually recognize what they see or just make it up. Even the best performer, Gemini 3 Pro, tops out at 47.4 percent when asked for specific details like exact species or product names instead of generic labels. Worse, the models are convinced they’re right even when they’re wrong.
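The core idea of scoring specific entity names rather than generic labels can be sketched as a simple exact-match loop. The normalization and data below are illustrative assumptions, not WorldVQA's actual implementation:

```python
def normalize(text: str) -> str:
    """Lowercase and drop punctuation so only the entity name is compared."""
    return "".join(ch for ch in text.lower() if ch.isalnum() or ch == " ").strip()

def score(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that exactly match the reference entity name."""
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

# Illustrative: a generic label ("bird") earns no credit; only the
# exact species or product name counts as a hit.
preds = ["Atlantic puffin", "bird", "Boeing 747"]
refs = ["Atlantic puffin", "Atlantic puffin", "Boeing 747-8"]
print(score(preds, refs))  # 1 of 3 exact matches
```

Note that under exact-match scoring, a confidently wrong specific answer and a vague generic one fail equally, which is why the overconfidence finding matters.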
Study finds AI reasoning models generate a "society of thought" with arguing voices inside their process
New research reveals that reasoning models like Deepseek-R1 simulate entire teams of experts when solving problems: some extraverted, some neurotic, all conscientious. This internal debate doesn’t just look like teamwork. It measurably boosts performance.
Google's PaperBanana uses five AI agents to auto-generate scientific diagrams
Researchers at Peking University and Google built a system that turns method descriptions into scientific diagrams automatically. Five specialized AI agents handle everything from finding reference images to quality control, tackling one of the last manual bottlenecks in academic publishing.
Anthropic's security training fails when Claude operates a graphical user interface
In pilot tests, Anthropic was able to get Claude Opus 4.6 to provide detailed instructions for making mustard gas inside an Excel spreadsheet and to maintain an accounting spreadsheet for a criminal gang - behaviors that occurred rarely or not at all in text-only interactions.
"We found some kinds of misuse behavior in these pilot evaluations that were absent or much rarer in text-only interactions," Anthropic writes in the Claude Opus 4.6 system card. "These findings suggest that our standard alignment training measures are likely less effective in GUI settings."
According to Anthropic, tests with the predecessor model Claude Opus 4.5 in the same environment showed "similar results" - meaning the problem persists across model generations and had previously gone unnoticed. The vulnerability apparently arises because models learn to reject malicious requests in conversation but do not fully transfer this behavior to agent-based tool use.