Claude can now jump between Excel and PowerPoint on its own

Anthropic now lets Claude move between Excel and PowerPoint on its own, for example running an analysis and then building a presentation directly from the results. At the same time, the company is expanding Cowork for enterprise customers with private plugin marketplaces that let admins create their own plugin collections and distribute them to specific teams. Plugins turn Claude into specialized AI agents for different departments, with new templates now available for HR, design, engineering, finance, and asset management, among others.

Anthropic is putting particular emphasis on finance: New MCP interfaces for FactSet and MSCI give Claude access to real-time market data and index analysis, while partners like S&P Global (Capital IQ Pro) and LSEG have contributed their own plugins.
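For readers unfamiliar with MCP: the Model Context Protocol carries tool calls as JSON-RPC 2.0 messages, so a data connector like the FactSet or MSCI ones exposes named tools that Claude invokes with a `tools/call` request. The sketch below only builds such a request. The tool name `get_index_constituents` and its `index` argument are hypothetical placeholders; the actual FactSet and MSCI servers define their own tools.

```python
import json


def make_tools_call(request_id: int, tool_name: str, arguments: dict) -> str:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {
            "name": tool_name,        # hypothetical tool exposed by a data connector
            "arguments": arguments,   # tool-specific arguments (also hypothetical)
        },
    }
    return json.dumps(message)


if __name__ == "__main__":
    # Hypothetical tool and argument names, for illustration only.
    print(make_tools_call(1, "get_index_constituents", {"index": "MSCI World"}))
```

The server replies with a JSON-RPC response carrying the same `id`, which is how the client matches results to requests.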

The update also adds new connections to third-party software including Google Workspace, Docusign, Salesforce Slack, FactSet, and others. Admins get more control over user access along with OpenTelemetry support for monitoring costs and usage. The Excel-PowerPoint feature is available as a research preview across all paid plans. Cowork is Anthropic's desktop tool for agent-based office work. Plugins were added at the end of January to make Claude a specialist for individual departments, though the tool has known security vulnerabilities.

Deepmind suggests AI should occasionally assign humans busywork so we do not forget how to do our jobs

AI systems should sometimes hand tasks to humans even when the AI could easily handle them itself, just so people don’t forget how to do their jobs. That’s one of the more striking recommendations from a new Google Deepmind paper on how AI agents should delegate work.

OpenAI ships API upgrades targeting voice reliability and agent speed for developers

OpenAI has shipped two API updates for developers. The first is gpt-realtime-1.5, a new model for the Realtime API designed to make voice commands more reliable. In internal testing, OpenAI saw roughly a ten percent improvement in transcribing numbers and letters, a five percent gain on audio reasoning tasks, and seven percent better instruction following. The audio model has also been updated to version 1.5.

The Responses API also now supports WebSockets. Instead of retransmitting the full context with every request, a client can keep a persistent connection open and send only new data as it comes in. According to OpenAI, this speeds up complex AI agents that make many tool calls by 20 to 40 percent.
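A back-of-the-envelope sketch of why this helps agents with many tool calls: in a stateless setup each turn resends the whole growing context, while a persistent connection only ships what is new. The token counts below are made-up assumptions, and OpenAI's 20 to 40 percent figure refers to end-to-end speed, which also depends on latency and server-side work, not just payload size.

```python
def payload_tokens(base_tokens: int, delta_tokens: int, turns: int) -> tuple[int, int]:
    """Total tokens sent by the client over an agent loop.

    base_tokens: size of the initial prompt (assumed)
    delta_tokens: size of each new tool result appended per turn (assumed)
    turns: number of requests in the loop
    """
    full_context = 0  # stateless: resend everything every turn
    incremental = 0   # persistent connection: send only what is new
    for k in range(turns):
        full_context += base_tokens + k * delta_tokens
        incremental += base_tokens if k == 0 else delta_tokens
    return full_context, incremental


if __name__ == "__main__":
    full, inc = payload_tokens(base_tokens=2000, delta_tokens=300, turns=10)
    print(f"resend full context: {full} tokens, incremental: {inc} tokens")
```

With these assumed numbers the stateless loop sends 33,500 tokens versus 4,700 incrementally, and the gap grows quadratically with the number of tool calls.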

Google, OpenAI, and Anthropic are all bracing for Deepseek's next big release

Chinese AI startup Deepseek has apparently trained its latest AI model on Nvidia's most powerful Blackwell chips, despite the US export ban. That's according to Reuters, citing a senior Trump administration official. The model is expected to drop next week. Rumors about chip smuggling had already been circulating since late last year.

The official says the Blackwell chips are believed to be in a data center in Inner Mongolia, and Deepseek is expected to scrub technical fingerprints of US chip usage before release. The official wouldn't say how Deepseek obtained the chips. Nvidia declined to comment, and neither Deepseek nor the US Department of Commerce responded to Reuters.

If the timing of these leaks is any indicator, Deepseek may be on the verge of another major splash. Google, OpenAI, and Anthropic have all been complaining about distillation attacks on their models by Chinese startups, and OpenAI recently moved to retire a well-known coding benchmark. Together, these moves suggest Deepseek is about to deliver strong results at rock-bottom prices once again. Back in January 2025, China's leading AI startup sent shockwaves through US tech stocks riding the AI bubble.

Anthropic accuses Deepseek, Moonshot, and MiniMax of stealing Claude's AI data through 16 million queries

Anthropic says it has caught Chinese AI labs Deepseek, Moonshot, and MiniMax running large-scale distillation attacks on Claude, a technique where a weaker model learns from the output of a stronger one. Over 24,000 fake accounts fired off more than 16 million queries targeting Claude's reasoning, programming, and tool usage capabilities. The labs used proxy services to bypass China's access restrictions.

| Lab | Requests | Targets |
| --- | --- | --- |
| Deepseek | 150,000+ | Extracting reasoning steps; reward model data for reinforcement learning; censorship-compliant answers on politically sensitive topics |
| Moonshot AI | 3.4 million+ | Agent-based reasoning, tool usage, programming, data analysis, computer vision; reconstructing Claude's thought processes |
| MiniMax | 13 million+ | Agent-based programming, tool usage and orchestration; pivoted to the new Claude model within 24 hours |
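A quick check of the per-lab figures: because every count is a lower bound, their sum is one too, and it lands just above the 16 million queries Anthropic cites. The snippet also shows how dominant MiniMax's share is.

```python
# Lower-bound request counts per lab, as reported by Anthropic.
requests = {
    "Deepseek": 150_000,
    "Moonshot AI": 3_400_000,
    "MiniMax": 13_000_000,
}

total = sum(requests.values())  # 16,550,000: consistent with "more than 16 million"
shares = {lab: count / total for lab, count in requests.items()}

# Print labs from largest to smallest share of the counted traffic.
for lab, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{lab:12} {share:6.1%}")
```

By these lower bounds, MiniMax alone accounts for roughly four-fifths of the counted traffic.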

Deepseek specifically targeted Claude's reasoning chain, extracting thought processes and censorship-compliant answers on sensitive topics. MiniMax ran the biggest campaign by far with over 13 million requests. When Anthropic shipped a new model, MiniMax pivoted within 24 hours and redirected nearly half its traffic to the updated system, Anthropic says.
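To make the distillation idea concrete: in its textbook form (Hinton et al., 2015), a student model is trained to match the teacher's temperature-softened output distribution by minimizing a KL divergence. Someone querying Claude through an API generally sees only generated text, not probabilities, so such campaigns typically fine-tune on collected prompt/response pairs instead. The sketch below shows only the underlying soft-target loss, in plain Python with made-up logits.

```python
import math


def softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Temperature-scaled softmax (max-subtracted for numerical stability)."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits: list[float],
                      student_logits: list[float],
                      temperature: float = 2.0) -> float:
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # soft targets from the stronger model
    q = softmax(student_logits, temperature)  # the weaker model's prediction
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]  # hypothetical logits
    print(distillation_loss(teacher, [2.0, 1.0, 0.1]))  # student matches teacher
    print(distillation_loss(teacher, [0.1, 1.0, 2.0]))  # student disagrees
```

The T² factor keeps gradient magnitudes comparable across temperatures, which is why it appears in the standard formulation.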

OpenAI and Google report similar attempts from Chinese labs. Anthropic is calling on the industry and policymakers to mount a coordinated response.

OpenAI wants to retire the AI coding benchmark that everyone has been competing on

OpenAI says the SWE-bench Verified programming benchmark has lost its value as a meaningful measure of AI coding ability. The company points to two main problems. First, at least 59.4 percent of the benchmark's tasks are flawed: their tests reject correct solutions because they enforce specific implementation details or check functions not described in the task.
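A toy illustration of this first problem, using a made-up task rather than a real SWE-bench instance: the task asks only for a `normalize` function, but one test also requires a helper, `normalize_all`, that the task never mentions (mirroring a hypothetical reference patch that happened to add it). A correct submission without the helper is then scored as a failure.

```python
from typing import Callable

# Hypothetical task: "Implement normalize(s), which strips surrounding
# whitespace and lowercases the string." Nothing else is specified.

Submission = dict[str, Callable]


def normalize(s: str) -> str:
    return s.strip().lower()


# Submission A mirrors the (hypothetical) reference patch, which also added a helper.
submission_a: Submission = {
    "normalize": normalize,
    "normalize_all": lambda items: [normalize(x) for x in items],
}
# Submission B is equally correct but implements only what the task asked for.
submission_b: Submission = {"normalize": lambda s: s.lower().strip()}


def fair_test(sub: Submission) -> bool:
    """Checks only the behavior the task describes."""
    f = sub.get("normalize")
    return f is not None and f("  Hello ") == "hello" and f("ABC") == "abc"


def flawed_test(sub: Submission) -> bool:
    """Also demands a function the task never described."""
    helper = sub.get("normalize_all")
    if helper is None:
        return False  # a correct solution is rejected here
    return fair_test(sub) and helper([" A", "b "]) == ["a", "b"]


if __name__ == "__main__":
    for name, sub in [("A", submission_a), ("B", submission_b)]:
        print(name, "fair:", fair_test(sub), "flawed:", flawed_test(sub))
```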

Many tasks and solutions have also leaked into leading models' training data. OpenAI reports that GPT-5.2, Claude Opus 4.5, and Gemini 3 Flash Preview could reproduce some original fixes from memory, meaning benchmark progress increasingly reflects what a model has seen, not how well it codes. OpenAI recommends SWE-bench Pro instead and is building its own non-public tests.
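One common heuristic for spotting this kind of memorization is to measure how many word n-grams of the original fix reappear verbatim in a model's output. The sketch below is a generic illustration, not the method OpenAI used; the choice of n=5 and whitespace tokenization are assumptions.

```python
def ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """All contiguous n-token windows in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def verbatim_overlap(generated: str, reference: str, n: int = 5) -> float:
    """Fraction of the reference fix's n-grams that the model reproduced verbatim."""
    ref = ngrams(reference.split(), n)
    if not ref:
        return 0.0  # reference too short to compare at this n
    gen = ngrams(generated.split(), n)
    return len(gen & ref) / len(ref)


if __name__ == "__main__":
    reference_fix = "if value is None : return default"  # hypothetical patch line
    model_output = "if value is None : return default"
    print(verbatim_overlap(model_output, reference_fix))
```

A score near 1.0 on a fix the model was never shown suggests it has seen the solution before; a low score proves little, since a model can also paraphrase what it memorized.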

There's a possible strategic angle here: a "contaminated" benchmark can make rivals, especially open-source models, look better and skew rankings. SWE-bench Verified was long the gold standard for AI coding evaluation, with OpenAI, Anthropic, Google, and many Chinese open-weight models competing for small leads. AI benchmarks can provide useful signal, but their real-world value remains limited.

OpenAI partners with major consulting firms to push Frontier agent platform

OpenAI has launched a new partner program called "Frontier Alliances." The initiative aims to bring the company's recently introduced Frontier platform to large enterprise customers. Frontier lets businesses build AI agents that handle tasks independently, from processing customer inquiries and pulling CRM data to verifying policies. Details about the platform remain scarce at this point. For now, Frontier is only available to a select group of customers.

To get Frontier into major corporations, OpenAI has signed multi-year partnerships with Boston Consulting Group (BCG), McKinsey, Accenture, and Capgemini. BCG and McKinsey are taking on strategy, organizational restructuring, and rollout planning, while Accenture and Capgemini handle the technical side, integrating Frontier with existing systems and data infrastructure. All four partners are standing up dedicated teams that will be certified in OpenAI's technology.