
Best multimodal models still can't crack 50 percent on basic visual entity recognition

A new benchmark called WorldVQA tests whether multimodal AI models actually recognize what they see or just make it up. Even the best performer, Gemini 3 Pro, tops out at 47.4 percent when asked for specific details like exact species or product names instead of generic labels. Worse, the models are convinced they’re right even when they’re wrong.

Study finds AI reasoning models generate a "society of thought" with arguing voices inside their reasoning process

New research reveals that reasoning models like Deepseek-R1 simulate entire teams of experts when solving problems: some extraverted, some neurotic, all conscientious. This internal debate doesn’t just look like teamwork. It measurably boosts performance.

Google's PaperBanana uses five AI agents to auto-generate scientific diagrams

Researchers at Peking University and Google built a system that turns method descriptions into scientific diagrams automatically. Five specialized AI agents handle everything from finding reference images to quality control, tackling one of the last manual bottlenecks in academic publishing.

Voxtral Transcribe 2 offers speech recognition at $0.003 per minute

Mistral AI launches Voxtral Transcribe 2, undercutting competitors on speech recognition pricing. The second-generation speech recognition models start at $0.003 per minute and, according to Mistral, outperform GPT-4o mini Transcribe, Gemini 2.5 Flash, and Deepgram Nova in accuracy. The model family comes in two variants: Voxtral Mini Transcribe V2 for processing larger audio files, and Voxtral Realtime for real-time applications with latency under 200 milliseconds. Voxtral Realtime costs twice as much ($0.006 per minute) and uses a proprietary streaming architecture that transcribes audio as it arrives, aimed at voice assistants, live captioning, and call center analysis.

Both models support 13 languages, including German, English, and Chinese. New features include speaker recognition, word-level timestamps, and support for recordings up to three hours long. Voxtral Realtime is available as an open-weights model under Apache 2.0 on Hugging Face as well as via API, while Voxtral Mini Transcribe V2 is only accessible through Le Chat, the Mistral API, and a playground. Mistral released the first Voxtral generation in July 2025.
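For readers who want to try the batch variant, the sketch below shows how a transcription request might look over Mistral's HTTP API, assuming an OpenAI-style audio transcription endpoint. The endpoint path, model identifier, and response field are assumptions for illustration; Mistral's API documentation has the authoritative names. At the listed rate of $0.003 per minute, a one-hour recording would cost about $0.18.

```python
# Minimal sketch of a batch transcription call against Mistral's audio API.
# The endpoint path, model identifier, and response field below are
# assumptions for illustration only -- check Mistral's API documentation
# for the exact values.
import os
import requests

API_URL = "https://api.mistral.ai/v1/audio/transcriptions"  # assumed endpoint
MODEL = "voxtral-mini-transcribe-v2"                        # assumed model id


def transcribe(path: str) -> str:
    """Upload a local audio file and return the transcribed text."""
    with open(path, "rb") as audio:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
            files={"file": audio},  # multipart upload of the audio file
            data={"model": MODEL},
        )
    response.raise_for_status()
    return response.json()["text"]  # assumed response schema


if __name__ == "__main__":
    print(transcribe("meeting.mp3"))
```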

Anthropic partners with leading research institutes to tackle biology's data bottleneck

Anthropic has announced two partnerships with major US research institutions to develop AI agents for biological research. The Allen Institute and the Howard Hughes Medical Institute (HHMI) will serve as founding partners in the initiative. According to Anthropic, "modern biological research generates data at unprecedented scale," but turning it into "validated biological insights remains a fundamental bottleneck." The company says manual processes "can't keep pace with the data being produced."

HHMI will develop specialized AI agents at the Janelia Research Campus that connect experimental knowledge to scientific instruments and analysis pipelines. The Allen Institute is working on multi-agent systems for data integration and experiment design that could "compress months of manual analysis into hours." According to Anthropic, these systems "are designed to amplify scientific intuition rather than replace it, keeping researchers in control of scientific direction while handling computational complexity."

The move extends Anthropic's push into scientific applications. The company recently launched Cowork, a feature designed for office work that gives Claude access to local files. OpenAI is also targeting the research market with Prism, an AI workspace for scientific writing.