Google, OpenAI, and Anthropic are all bracing for Deepseek's next big release

Chinese AI startup Deepseek has apparently trained its latest AI model on Nvidia's most powerful Blackwell chips, despite the US export ban. That's according to Reuters, citing a senior Trump administration official. The model is expected to drop next week. Rumors about chip smuggling had already been circulating since late last year.

The official says the Blackwell chips are believed to be in a data center in Inner Mongolia, and Deepseek is expected to scrub technical fingerprints of US chip usage before release. The official wouldn't say how Deepseek obtained the chips. Nvidia declined to comment, and neither Deepseek nor the US Department of Commerce responded to Reuters.

If the timing of these leaks is any indicator, Deepseek may be on the verge of another major splash. Google, OpenAI, and Anthropic have all been complaining about distillation attacks on their models by Chinese startups, and OpenAI recently moved to retire a well-known coding benchmark. Together, these moves suggest Deepseek is about to deliver strong results at rock-bottom prices once again. Back in January 2025, China's leading AI startup sent shockwaves through US tech stocks riding the AI bubble.

Anthropic accuses Deepseek, Moonshot, and MiniMax of stealing Claude's AI data through 16 million queries

Anthropic says it has caught Chinese AI labs Deepseek, Moonshot, and MiniMax running large-scale distillation attacks on Claude, a technique where a weaker model learns from the output of a stronger one. Over 24,000 fake accounts fired off more than 16 million queries targeting Claude's reasoning, programming, and tool usage capabilities. The labs used proxy services to bypass China's access restrictions.
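For readers unfamiliar with the mechanics: in the classic distillation setup, a student model is trained to match a teacher's output distribution via a KL-divergence loss. The sketch below is a minimal illustration of that idea only, not what these labs actually ran — API access yields sampled text rather than raw logits, so API-based distillation typically trains on the teacher's generated outputs instead. All names and numbers here are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL divergence between teacher and student distributions.

    Minimizing this pushes the student toward the teacher's 'soft'
    knowledge about the relative likelihood of each answer.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# A student that matches the teacher's logits incurs (near-)zero loss;
# a student with a very different ranking incurs a much larger one.
teacher = [3.0, 1.0, 0.2]
matched = distillation_loss(teacher, [3.0, 1.0, 0.2])
mismatched = distillation_loss(teacher, [0.2, 1.0, 3.0])
```

The temperature parameter softens the distributions so low-probability answers still carry gradient signal, which is the standard motivation for logit-level distillation.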

| Lab | Requests | Targets |
| --- | --- | --- |
| Deepseek | 150,000+ | Extracting reasoning steps, reward model data for reinforcement learning, censorship-compliant answers on politically sensitive topics |
| Moonshot AI | 3.4 million+ | Agent-based reasoning, tool usage, programming, data analysis, computer vision, reconstructing Claude's thought processes |
| MiniMax | 13 million+ | Agent-based programming, tool usage and orchestration; pivoted to new Claude model within 24 hours |

Deepseek specifically targeted Claude's reasoning chain, extracting thought processes and censorship-compliant answers on sensitive topics. MiniMax ran the biggest campaign by far with over 13 million requests. When Anthropic shipped a new model, MiniMax pivoted within 24 hours and redirected nearly half its traffic to the updated system, Anthropic says.

OpenAI and Google report similar attempts from Chinese labs. Anthropic is calling on the industry and policymakers to mount a coordinated response.

OpenAI wants to retire the AI coding benchmark that everyone has been competing on

OpenAI says the SWE-bench Verified programming benchmark has lost its value as a meaningful measure of AI coding ability. The company points to two main problems. First, at least 59.4 percent of the benchmark's tasks are flawed, rejecting correct solutions because they enforce specific implementation details or check functions not described in the task.

Second, many tasks and solutions have leaked into leading models' training data. OpenAI reports that GPT-5.2, Claude Opus 4.5, and Gemini 3 Flash Preview could reproduce some original fixes from memory, meaning benchmark progress increasingly reflects what a model has seen, not how well it codes. OpenAI recommends SWE-bench Pro instead and is building its own non-public tests.
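OpenAI hasn't published its contamination methodology, but a common, simple probe for this kind of memorization — shown here purely as an illustrative sketch, with a hypothetical function name — is to measure verbatim word-level n-gram overlap between a model's generated patch and the known gold fix:

```python
def ngram_overlap(candidate: str, reference: str, n: int = 8) -> float:
    """Fraction of word-level n-grams from the reference that appear
    verbatim in the candidate. High overlap against a gold patch the
    model shouldn't have seen suggests memorization, not problem-solving.
    """
    cand_tokens = candidate.split()
    ref_tokens = reference.split()
    if len(ref_tokens) < n:
        return 0.0
    # Set of all n-grams in the candidate for O(1) membership checks.
    cand_ngrams = {tuple(cand_tokens[i:i + n])
                   for i in range(len(cand_tokens) - n + 1)}
    ref_ngrams = [tuple(ref_tokens[i:i + n])
                  for i in range(len(ref_tokens) - n + 1)]
    hits = sum(1 for g in ref_ngrams if g in cand_ngrams)
    return hits / len(ref_ngrams)
```

An identical patch scores 1.0, a fully independent solution near 0.0; real contamination audits would also control for boilerplate that any correct fix must contain.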

There's a possible strategic angle here: a "contaminated" benchmark can make rivals—especially open-source models—look better and skew rankings. SWE-bench Verified was long the gold standard for AI coding evaluation, with OpenAI, Anthropic, Google, and many Chinese open-weight models competing for small leads. AI benchmarks can provide useful signal, but their real-world value remains limited.

OpenAI partners with major consulting firms to push Frontier agent platform

OpenAI has launched a new partner program called "Frontier Alliances." The initiative aims to bring the company's recently introduced Frontier platform to large enterprise customers. Frontier lets businesses build AI agents that handle tasks independently, from processing customer inquiries and pulling CRM data to verifying policies. Details about the platform remain scarce at this point. For now, Frontier is only available to a select group of customers.

To get Frontier into major corporations, OpenAI has signed multi-year partnerships with Boston Consulting Group (BCG), McKinsey, Accenture, and Capgemini. BCG and McKinsey are taking on strategy, organizational restructuring, and rollout planning, while Accenture and Capgemini handle the technical side, integrating Frontier with existing systems and data infrastructure. All four partners are standing up dedicated teams that will be certified in OpenAI's technology.

Google wants to provide free Gemini AI training to all 6 million U.S. educators

Google for Education and the educational organization ISTE+ASCD are launching a joint initiative to provide free AI training to all six million teachers in the US. Google says it's the largest program of its kind. The courses cover how to use Google's AI products Gemini and NotebookLM, with the goal of helping teachers and their more than 74 million students use AI safely in the classroom. The modules are designed to be short and practical, with concrete examples teachers can apply directly to their lessons. The initiative is set to launch in the coming months. Those interested can sign up via a Google form.

There's a clear strategic play behind the effort, of course. Getting your products embedded in the education system early means getting young people comfortable with your ecosystem while they're still in school and keeping them there well into their professional lives. Competitors like OpenAI and Anthropic are running similar playbooks, though they tend to focus on university partnerships and enticing offers for students, such as free or discounted access to their AI models.

ChatGPT and Gemini voice bots are easy to trick into spreading falsehoods

Newsguard tested whether ChatGPT Voice (OpenAI), Gemini Live (Google), and Alexa+ (Amazon) repeat false claims in realistic-sounding audio, the kind easily shared on social media to spread disinformation.

Researchers tested 20 false claims across health, US politics, world news, and foreign disinformation, each with a neutral question, a leading question, and a malicious prompt to write a radio script with the false information. ChatGPT repeated falsehoods 22 percent of the time, Gemini 23 percent. With malicious prompts, those numbers jumped to 50 and 45 percent, respectively.

Fail rates for ChatGPT, Gemini, and Alexa+ audio bots by prompt type: neutral prompts put ChatGPT and Gemini both at 5 percent, leading prompts at 10 and 20 percent, and malicious prompts at 50 and 45 percent, respectively. Alexa+ stayed at 0 percent across all three prompt types. | Image: Newsguard

Amazon's Alexa+ was the clear outlier. It rejected every single false claim. Amazon Vice President Leila Rouhi says Alexa+ pulls from trusted news sources like AP and Reuters. OpenAI declined to comment, and Google didn't respond to two requests for comment. Full details on the methodology are available on Newsguardtech.com.

Google's Gemini 3.1 Pro Preview tops Artificial Analysis Intelligence Index at less than half the cost of its rivals

Google's Gemini 3.1 Pro Preview leads the Artificial Analysis Intelligence Index, four points ahead of Anthropic's Claude Opus 4.6, at less than half the cost. The model ranks first in six of ten categories, including agent-based coding, knowledge, scientific reasoning, and physics. Its hallucination rate dropped 38 percentage points compared to Gemini 3 Pro, which struggled in that area. The index rolls ten benchmarks into one overall score.

Artificial Analysis Intelligence Index scores: Gemini 3.1 Pro Preview leads with 57 points, followed by Claude Opus 4.6 at 53, Claude Sonnet 4.6 and GPT-5.2 at 51, and GLM-5 at 50; models like Kimi K2.5, Gemini 3 Flash, and Grok 4 follow with lower scores. | Image: Artificial Analysis

Running the full index test with Gemini costs $892, compared to $2,304 for GPT-5.2 and $2,486 for Claude Opus 4.6. Gemini used just 57 million tokens, well under GPT-5.2's 130 million. Open-source models like GLM-5 come in even cheaper at $547. When it comes to real-world agent tasks, though, Gemini 3.1 Pro still falls behind Claude Sonnet 4.6, Opus 4.6, and GPT-5.2.

As always, benchmarks only go so far. In our own internal fact-checking test, 3.1 Pro does significantly worse than Opus 4.6 or GPT-5.2, verifying only about a quarter of statements in initial tests, even fewer than Gemini 3 Pro, which was already weak here. So find your own benchmarks.

OpenAI CEO Sam Altman warns "the world is not prepared" as OpenAI accelerates research using its own AI

Sam Altman says AGI is "pretty close" and superintelligence "not that far off." Speaking at the Express Adda event in India, the OpenAI CEO suggested the company's internal models are already accelerating its own research and that "the world is not prepared" for what's coming.

Anthropic updates Claude Code with desktop features that automate more of the dev workflow

Anthropic is rolling out new desktop features for Claude Code that take development automation a step further. The AI can now spin up development servers and display running web apps right in the interface, spot errors, and fix them on its own.

There's also a new code review feature that checks changes and drops comments directly in the diff view. For GitHub projects, Claude keeps an eye on pull requests in the background, automatically fixes CI errors, and can even merge PRs on its own once tests pass. That means developers can move on to new tasks while Claude Code works through open PRs behind the scenes. Sessions pick up seamlessly across CLI, desktop, web, and mobile. All updates are available now.

OpenAI is building a $200 to $300 smart speaker that tells you when to go to bed

OpenAI's first smart speaker is expected to land between $200 and $300. According to The Information, the device packs a camera and facial recognition for purchases. It uses video to scan its surroundings and serve up proactive suggestions, like telling you to hit the sack early before a big meeting. A court filing from Vice President Peter Welinder puts the earliest ship date at February 2027.

The company's 200-plus-person hardware team is reportedly building out a whole product lineup. That includes smart glasses (mass production no earlier than 2028), prototypes of a smart lamp with no clear launch timeline, and an audio wearable called "Sweetpea" that's gunning for AirPods. There's also a stylus called "Gumdrop" in the works. Foxconn is reportedly handling manufacturing for the hardware lineup.

CEO Sam Altman has teased at least one device reveal for 2026. OpenAI isn't alone in this race. Companies like Meta and Apple are making similar bets on AI hardware as the next big computing platform.