
Matthias Bastian

Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.

Google's fastest and cheapest model Gemini 3.1 Flash-Lite got smarter but also tripled the price

Google DeepMind has released a preview of Gemini 3.1 Flash-Lite, the fastest and cheapest model in the Gemini 3 series. It's significantly more capable than its predecessor, but output costs have more than tripled.

OpenAI releases GPT-5.3 Instant for smoother everyday conversations and better search

OpenAI has released GPT-5.3 Instant, an update to the standard ChatGPT model. The new version aims to make everyday conversations feel more natural and useful. According to OpenAI, the model delivers more accurate answers, better web search results, and fewer unnecessary warnings and refusals. Hallucination rates drop by up to 26.8 percent for web searches and 19.7 percent for internal knowledge, depending on the scenario. The writing style also feels less robotic and preachy, OpenAI claims.

The system card shows some trade-offs on the safety front. GPT-5.3 Instant beats the older GPT-5.1 Instant on average when it comes to catching unauthorized content, but it actually performs worse than its direct predecessor, GPT-5.2 Instant. The model also takes a small hit on health-related queries (HealthBench) compared to the previous version.

GPT-5.3 Instant is rolling out now to all ChatGPT users and is available to developers via the API as "gpt-5.3-chat-latest." The outgoing GPT-5.2 Instant will stick around for paying users for another three months before OpenAI pulls the plug on June 3, 2026.

Anthropic pitched Claude for Pentagon drone swarm competition

Anthropic entered a $100 million Pentagon competition in early 2026, proposing to use Claude for voice-controlled autonomous drone swarm technology, Bloomberg reports.

The idea was to use Claude to translate a commander's spoken orders into digital instructions and coordinate drone fleets without using AI for autonomous targeting or weapons decisions. Humans would be able to monitor and shut down the system. This approach lines up with Anthropic's position in its ongoing dispute with the Pentagon, where the company has stressed that human oversight is essential for autonomous weapons because current AI models aren't reliable enough to operate without it.

Anthropic didn't win the contract. Instead, the Pentagon awarded it to SpaceX/xAI and two defense companies partnered with OpenAI.

Anthropic's new prompt forces ChatGPT to reveal everything it knows about you

Anthropic is capitalizing on OpenAI’s bad press with a new import function for Claude. A single prompt exports your saved context from ChatGPT or other chatbots, letting you transfer it straight to Claude’s memory.

ElevenLabs and Google dominate Artificial Analysis' updated speech-to-text benchmark

Artificial Analysis has released version 2.0 of its AA-WER speech-to-text benchmark. ElevenLabs' Scribe v2 leads with a word error rate of just 2.3 percent, followed by Google's Gemini 3 Pro (2.9%) and Mistral's Voxtral Small (3.0%). Google's Gemini 3 Flash (3.1%) and ElevenLabs' older Scribe v1 (3.2%) are close behind. Notably, Google didn't specifically train for transcription—the strong results come from Gemini's general multimodal capabilities. OpenAI's popular open-source Whisper Large v3 (4.2%) lands mid-pack, while Alibaba's Qwen3 ASR Flash (5.9%), Amazon's Nova 2 Omni (6.0%), and Rev AI (6.1%) bring up the rear.

Bar chart showing the AA-WER v2.0 overall ranking with word error rates ranging from 2.3% (Scribe v2) to 6.1% (Rev AI).
ElevenLabs' Scribe v2 tops the AA-WER v2.0 overall ranking with the lowest word error rate, followed by Google's Gemini 3 Pro and Mistral's Voxtral Small. | Image: Artificial Analysis

The results hold up in the separate AA-AgentTalk test for speech directed at voice assistants: Scribe v2 (1.6%) and Gemini 3 Pro (1.7%) pull well ahead, with AssemblyAI's Universal-3 Pro taking third at 2.3%.

Bar chart showing the AA-AgentTalk ranking with word error rates ranging from 1.6% (Scribe v2) to 6.1% (Rev AI).
ElevenLabs' Scribe v2 and Google's Gemini 3 Pro also dominate the AA-AgentTalk voice assistant test with the lowest error rates. | Image: Artificial Analysis
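The word error rate behind these rankings is the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and the model's output, divided by the reference length. A minimal sketch of the metric, without the text normalization (casing, punctuation) that benchmarks like AA-WER apply before scoring:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # Dynamic-programming table for edit distance over words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all remaining reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all remaining hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,        # deletion
                d[i][j - 1] + 1,        # insertion
                d[i - 1][j - 1] + cost, # substitution or match
            )
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, transcribing "the cat sat on the mat" as "the cat sat on mat" drops one of six reference words, giving a WER of about 0.17; Scribe v2's 2.3 percent corresponds to roughly one word error per 43 reference words.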
Even frontier LLMs from GPT-5 onward lose up to 33% accuracy when you chat too long

The latest generation of large language models—from GPT-5 onward—still struggles when tasks are spread across multiple conversation turns. Researcher Philippe Laban and his team tested current models on six tasks covering code, databases, actions, data-to-text, math, and summarization. Performance drops significantly when information is split across several messages (sharded) instead of a single prompt (concat).

Image: Laban et al.

Newer models did slightly better—performance degradation shrank from 39 to 33 percent—but the issue is far from solved. The biggest gains showed up in Python tasks, where some models only lost 10 to 20 percent. Laban suspects real-world losses could be even worse, since the tests used simple user simulations. Users who change their mind mid-conversation would likely cause steeper drops.

Technical tweaks like lowering temperature values don't fix the problem, the original study found. The researchers recommend starting a fresh conversation when things go sideways, ideally by having the model summarize all requests first and using that summary as the starting point for a new chat.
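That recommendation can be sketched as a simple wrapper: collect the user's sharded requests, ask the model to consolidate them into one self-contained task description, and seed a new conversation with that summary. Here `complete` is a placeholder for any chat-completion call, and the message format is an illustrative assumption, not a specific vendor API:

```python
def restart_with_summary(history: list[dict], complete) -> list[dict]:
    """Collapse a sharded multi-turn chat into a fresh single-prompt conversation.

    `complete` is a hypothetical callable: it takes a list of
    {"role": ..., "content": ...} messages and returns the model's reply text.
    """
    # Gather every requirement the user stated across turns.
    user_turns = [m["content"] for m in history if m["role"] == "user"]
    summary_request = (
        "Consolidate every requirement from these messages into one "
        "self-contained task description:\n\n"
        + "\n".join(f"- {t}" for t in user_turns)
    )
    summary = complete([{"role": "user", "content": summary_request}])
    # The summary becomes the opening prompt of a new chat, turning a
    # sharded task back into the concat-style single prompt the models handle better.
    return [{"role": "user", "content": summary}]
```

The point of the pattern is that the model sees all requirements in one prompt again, which is the concat condition where the study measured much smaller accuracy losses.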

OpenAI calls Stuart Russell a "doomer" in court after its CEO co-signed his AI extinction warning

Fear generates attention, and OpenAI usually knows how to use that. But in court, the company is trying to discredit an AI expert as a doomsday prophet, even though CEO Sam Altman spent years spreading the same warnings when they still served his own agenda.

OpenAI promises Canada tighter safety protocols after ChatGPT flagged a shooter's violent chats but never called police

In a letter to AI Minister Evan Solomon, OpenAI has promised the Canadian government it will tighten its safety protocols. The move follows a fatal shooting at a school in Tumbler Ridge, British Columbia, that killed eight people. The suspect, Jesse Van Rootselaar, had previously interacted with ChatGPT. An internal algorithm flagged the interactions as possible warnings of real-world violence, and OpenAI employees reviewed them. The company blocked the account but ultimately decided not to contact police.

According to the Wall Street Journal, OpenAI now plans to adopt more flexible criteria for sharing account data with authorities, establish direct lines of communication with Canadian law enforcement, and improve its systems for detecting evasion tactics. OpenAI Vice President Ann O'Leary said the account would have been reported under the new rules. Canada's Justice Minister Sean Fraser warned that new AI regulations could follow if OpenAI doesn't act quickly.