
Matthias Bastian

Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.
New Artificial Analysis benchmark shows OpenAI, Anthropic, and Google locked in a three-way tie at the top

Artificial Analysis just released version 4.0 of its Intelligence Index, ranking AI models across multiple benchmarks. OpenAI's GPT-5.2 at its highest reasoning setting takes the top spot, with Anthropic's Claude Opus 4.5 and Google's Gemini 3 Pro close behind.

The index scores models across four equally weighted categories: Agents, Programming, Scientific Reasoning, and General. Results are less saturated this time, with top models peaking at 50 points compared to 73 in the previous version.

Artificial Analysis Intelligence Index v4.0: GPT-5.2 (xhigh) leads with 50 points, followed by Claude Opus 4.5 (49) and Gemini 3 Pro Preview (48). It's a tight race at the top. | Image: Artificial Analysis

The updated index swaps three older tests (AIME 2025, LiveCodeBench, and MMLU-Pro) for a fresh set: AA-Omniscience checks model knowledge across 40 topics while flagging hallucinations, GDPval-AA tests models on practical tasks across 44 professions, and CritPt tackles physics research problems.

Artificial Analysis says it ran all tests independently using a standardized approach, with full details available on its website.

AI industry finds its 2026 narrative as OpenAI and Microsoft argue users are the bottleneck, not models

The AI industry seems to have found its narrative for 2026: AI models are more capable than the people using them. Following Satya Nadella, OpenAI product head Fidji Simo has now weighed in. Her message: "AI models are capable of far more than how most people experience them day to day." OpenAI's goal for 2026 is closing the gap between what AI can do and how people actually use it. In Simo's view, the company that turns research into useful products will lead the market.

ChatGPT has over 800 million weekly active users and one million business customers, according to Simo. In 2026, OpenAI plans to evolve it from a chatbot to a more personal "super assistant," one that understands goals, stores context, and helps proactively. A leaked mid-2025 document describes how such a super assistant would compete for human attention.

For businesses, OpenAI wants to build an automated workflow platform, with Codex serving as an "automated teammate" for developers. To justify higher prices, OpenAI needs major AI agent improvements: the company is reportedly considering plans costing up to $20,000 per month.

Boston Dynamics unveils production Atlas designed for warehouses and factory floors

Boston Dynamics is turning its humanoid robot Atlas into a commercial product. The first fleet ships to Hyundai in 2026, where the 1.9-meter-tall robot will handle heavy lifting in warehouses and factories.

OpenAI loses top AI researcher Jerry Tworek after seven years

OpenAI is losing yet another senior researcher: Jerry Tworek is out after nearly seven years at the company. Tworek shared the news in a message to his team. He was a key player in building GPT-4, ChatGPT, and OpenAI's first AI coding models, while also helping push new scaling boundaries. Most recently, he ran the "Reasoning Models" team, working on AI systems that can handle complex logical reasoning. He was part of the core group behind the o1 and o3 models, the foundation for much of OpenAI's recent AI progress.

Tworek says he wants "to try and explore types of research that are hard to do at OpenAI." That sounds like a not-so-subtle dig at CEO Sam Altman's relentless focus on products and revenue, which has reportedly been causing tension among researchers. No word yet on where Tworek is headed next.

Abu Dhabi's TII claims its Falcon H1R 7B reasoning model matches rivals seven times its size

The Technology Innovation Institute (TII) from Abu Dhabi has released Falcon H1R 7B, a compact reasoning language model with 7 billion parameters. TII says the model matches the performance of competitors two to seven times its size across various benchmarks. As always, benchmark scores only loosely correlate with real-world performance, especially for smaller models. Falcon H1R 7B uses a hybrid Transformer-Mamba architecture, which lets it process data faster than comparable models.

Falcon H1R 7B scores 49.5 percent across four benchmarks, outperforming larger models like Qwen3 32B (46.2 percent) and Nemotron H 47B Reasoning (43.5 percent). | Image: Technology Innovation Institute (TII)

The model is available on Hugging Face as a full checkpoint and in a quantized version, along with a demo. TII released it under the Falcon LLM license, which allows free use, reproduction, modification, distribution, and commercial use. Users must follow the Acceptable Use Policy, which TII can update at any time.
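For readers who want to try the release, here is a minimal loading sketch using the Hugging Face transformers library. The repository id and generation settings are assumptions rather than values taken from TII's documentation, and a recent transformers release with Falcon-H1 support is assumed; check the model card for the exact repository name and recommended parameters.

```python
# Minimal sketch: loading a Falcon H1R checkpoint with Hugging Face transformers.
# The repository id is hypothetical -- verify the actual name on TII's Hugging Face page.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tiiuae/Falcon-H1R-7B"  # assumed id, not confirmed by the article

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use the checkpoint's native precision
    device_map="auto",    # requires the accelerate package; spreads layers across available devices
)

prompt = "Explain in two sentences why the sky is blue."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```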

Only 5 percent of ChatGPT's 900 million weekly users pay, and reportedly most aren't worth much to advertisers

Almost 90 percent of ChatGPT's roughly 900 million weekly users live outside the USA and Canada, according to The Information, citing data from tracking platform Sensor Tower. This creates a challenge for OpenAI's planned advertising business, since international users generate far less revenue. At Pinterest, for example, the average revenue per user in the USA is $7.64, compared to just 21 cents elsewhere.

India and Brazil rank among the largest ChatGPT markets alongside the USA, Japan, and France. Only about five percent of users pay for subscriptions. For emerging markets like India, OpenAI offers the cheaper "ChatGPT Go" plan at around $5 per month.

OpenAI plans to generate roughly $110 billion from free users by 2030, with advertising likely playing a major role. The company needs this aggressive revenue growth to meet its data center commitments.

Anthropic President Daniela Amodei says "the exponential continues until it doesn't"

"The exponential continues until it doesn't," says Anthropic President Daniela Amodei, quoting her colleagues. At Anthropic, the team believed every year that this pace couldn't possibly keep up, and yet it did, Amodei says in an interview with CNBC TV. But that's not guaranteed, she adds. Anthropic doesn't know the future either and could be wrong about this assumption.

Economically, things get more complicated, Amodei says (from 15:56). Even if the models keep improving, rolling them out in companies can stall for "human reasons": change management takes time, procurement processes move slowly, and specific use cases often remain unclear. The key question for whether AI is in a bubble comes down to whether the economy can absorb the technology as fast as it's advancing, she suggests.