
Matthias Bastian

Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.
Grok 4.20 trails Gemini and GPT-5.4 by a wide margin but sets a new record for not hallucinating

xAI's Grok 4.20 can't keep up with the top AI models in benchmarks but hallucinates less than any other model tested. According to Artificial Analysis, Grok 4.20 Beta scores 48 on the Intelligence Index with reasoning enabled, well behind Gemini 3.1 Pro Preview and GPT-5.4 at 57, but still a 6-point improvement over Grok 4.

Grok trails the latest models from major AI labs in overall benchmark performance. | Image: Artificial Analysis

xAI shipped three API variants: with reasoning, without reasoning, and a multi-agent mode. The model supports a 2-million-token context window and costs 2 or 6 dollars per million tokens, making it cheaper than Grok 4 and competitively priced among Western models.

Where Grok 4.20 stands out, of all things, is factual reliability. On the AA Omniscience test, it hit a 78 percent non-hallucination rate, a record, according to Artificial Analysis. The test measures how often a model fabricates an answer instead of admitting it doesn't know, alongside factual recall. Grok 4.20 only got it wrong about one in five times when it didn't have the answer.
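The relationship between the 78 percent figure and "one in five" can be sketched in a few lines of Python. This is only an illustration under an assumption: it treats the hallucination rate as the share of wrong answers among all questions the model could not answer correctly, which may differ from Artificial Analysis' exact scoring. The function name and the counts are hypothetical and chosen only to reproduce the reported percentage.

```python
def hallucination_rate(incorrect: int, abstained: int) -> float:
    """Share of not-known questions answered wrongly instead of declined.

    Assumed definition for illustration; not the official benchmark formula.
    """
    unknown = incorrect + abstained
    if unknown == 0:
        raise ValueError("need at least one question the model could not answer")
    return incorrect / unknown


# Hypothetical counts: out of 100 questions the model did not know,
# it guessed wrongly on 22 and admitted not knowing on 78.
rate = hallucination_rate(incorrect=22, abstained=78)
non_hallucination = 1 - rate

print(f"hallucination rate: {rate:.0%}")
print(f"non-hallucination rate: {non_hallucination:.0%}")
```

Under that assumed definition, a 78 percent non-hallucination rate corresponds to a wrong guess on 22 percent of the unknown questions, or roughly one in five.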

US War Department CTO says Anthropic's AI models "pollute" the supply chain with built-in ethics

Emil Michael, the US Department of War's chief technology officer, made clear that classifying Anthropic as a supply chain risk is an ideologically motivated move. Claude models "pollute" the supply chain because they have a "different policy preference" baked into them, Michael told CNBC. He pointed to Anthropic's "constitution," a ruleset emphasizing ethics and safety, which he said could result in soldiers receiving "ineffective weapons, ineffective body armor, ineffective protection." The measure was "not meant to be punitive," he added.

Anthropic is the first US company to receive this classification, which is normally reserved for foreign adversaries. The AI company is suing over the designation and has drawn support from Microsoft, OpenAI, and Google employees, as well as former US military personnel. Anthropic has previously pushed back against its own AI models being used for US mass surveillance and autonomous weapons.

The administration has already signaled its intent to control AI along ideological lines by enacting regulations targeting so-called "woke AI," framed as a commitment to political neutrality. The approach echoes the Chinese government's own efforts to exert political control over AI models.

Source: CNBC

Copilot Health marks Microsoft's entry into the AI health race alongside OpenAI and Anthropic

Microsoft is launching Copilot Health, an AI health assistant that pulls data from wearables, medical records, and lab results to deliver personalized health advice. Long term, the company says it’s working toward “medical superintelligence.”

ChatGPT still leads the chatbot market but its dominance is slipping as Google's Gemini gains ground

ChatGPT still dominates the chatbot market, but its lead is shrinking. New data from Similarweb shows OpenAI's chatbot accounted for just 61.7 percent of global AI web traffic in February 2026, down from 75.7 percent twelve months earlier. The biggest winner is Google Gemini, which more than quadrupled its share from 5.7 percent to 24.4 percent over the same period. Grok (3.4 percent) and Claude (3.3 percent) have overtaken DeepSeek (3.2 percent) for the first time, claiming third and fourth place. Claude crossed the three percent mark for the first time in February, though it's much stronger in the B2B market, according to a separate study.

ChatGPT still leads overall, but Google Gemini has closed the gap significantly. These figures only cover web traffic. | Image: Similarweb
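As a quick check on the year-over-year share figures quoted from Similarweb, the short Python sketch below computes the change in percentage points and the growth factor for ChatGPT and Gemini. The share values come from the text; the helper name `share_change` is made up for illustration.

```python
# Web-traffic shares in percent: (twelve months earlier, February 2026),
# as quoted in the article.
shares = {
    "ChatGPT": (75.7, 61.7),
    "Gemini": (5.7, 24.4),
}


def share_change(before: float, after: float) -> tuple[float, float]:
    """Return (change in percentage points, growth factor), both rounded."""
    return round(after - before, 1), round(after / before, 2)


for name, (before, after) in shares.items():
    points, factor = share_change(before, after)
    print(f"{name}: {points:+.1f} points, {factor:.2f}x the earlier share")
```

Gemini's factor comes out above 4, consistent with "more than quadrupled", while ChatGPT loses 14 percentage points.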

In absolute numbers, ChatGPT recorded 5.35 billion visits in February, while Gemini pulled in 2.11 billion. Grok came in at 298.5 million visits, Claude at 290.3 million, DeepSeek at 246.4 million, and Perplexity at 153.8 million. Microsoft's Copilot stagnated at 1.1 percent market share, though that only reflects the web version. Microsoft's actual share of the enterprise market is likely much higher.

Google's new Ask Maps lets you search for places in plain language using Gemini AI

Google has introduced "Ask Maps," a conversational feature powered by its Gemini models. Users can ask questions in plain language, like "Is there a public tennis court with lights on that I can play at tonight?" or "My phone is dying — where can I charge it without having to wait in a long line for coffee?" The feature taps into data from more than 300 million locations and reviews from over 500 million contributors.

Results show up on a personalized map based on past searches and saved places. Users can book tables, save or share locations, and jump into navigation directly. Ask Maps is rolling out first in the US and India on Android and iOS, with a desktop version on the way.

Google also announced "Immersive Navigation," a revamped turn-by-turn system with a 3D view of surroundings, including buildings, overpasses, and lane markings. Gemini models generate the visuals by analyzing Street View and aerial imagery.

Immersive Navigation launches first in the US, expanding to more iOS and Android devices, CarPlay, Android Auto, and cars with Google built in over the coming months.

OpenAI is reportedly planning to integrate its video AI Sora into ChatGPT

OpenAI is reportedly planning to fold its video AI Sora directly into ChatGPT. So far, Sora has only been available as a standalone mobile and web app. OpenAI originally pitched it as a viral hit and potential TikTok alternative, a strategy that seemed to work early on, partly thanks to massive copyright infringement.

That momentum didn't last. According to The Information, the app has slid from No. 1 to No. 165 in the Apple App Store since launching last fall. CEO Sam Altman reportedly admitted internally that hardly anyone was sharing videos publicly. Rolling Sora into ChatGPT might fix that: ChatGPT has around 920 million weekly active users, so the move would naturally drive more video generation. The standalone app will stick around for now, The Information reports.

Google already offers video generation in Gemini, though with tight capacity limits and only for paying subscribers. OpenAI will likely go a similar route: the company is strapped for compute, burns through cash supporting the roughly 95 percent of ChatGPT accounts that are free, and video generation is especially resource-hungry.

Amazon gets court order blocking Perplexity's AI shopping agent

A federal court in San Francisco has granted Amazon an injunction against AI startup Perplexity, barring it from using its AI browser agent Comet to make purchases on Amazon.

Amazon sued Perplexity in November, accusing the startup of fraud because Comet didn't disclose when it was shopping on behalf of a real person and ignored Amazon's demands to stop. The case raises a growing legal question: how should courts handle AI agents taking on complex tasks like online shopping?

Judge Maxine Chesney ruled that Amazon presented strong evidence that Perplexity was accessing users' password-protected accounts with their permission but without Amazon's authorization. Perplexity must also delete any collected Amazon data and has one week to appeal.

There's an interesting wrinkle here: Amazon recently became a major investor in OpenAI, which also sees product research and online shopping as key AI chat features. So far, though, OpenAI reportedly hasn't cracked direct checkout in its chat interface. Amazon may be positioning itself to step in and own that piece of the puzzle.