Alibaba's Qwen3-Coder-Next delivers solid coding performance in a compact package

Alibaba has released Qwen3-Coder-Next, a new open-weight AI model for programming agents and local development. Trained on 800,000 verifiable tasks, the model has 80 billion parameters in total but activates only 3 billion at any time. Despite this small footprint, Alibaba says it matches or outperforms much larger open-source models on coding benchmarks, scoring above 70 percent on SWE-Bench Verified with the SWE-Agent framework. As always, benchmark results are only a rough indicator of real-world performance.
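The 80-billion-total versus 3-billion-active split comes from a sparse mixture-of-experts design, in which a router runs only a few experts per token. A toy sketch of the parameter arithmetic; all sizes below are illustrative round numbers, not Qwen3-Coder-Next's actual configuration:

```python
# Toy sketch of sparse mixture-of-experts parameter accounting:
# only a handful of experts run per token, so the "active" parameter
# count is a small fraction of the total. Numbers are hypothetical.

def active_fraction(total_experts: int, active_experts: int,
                    expert_params: int, shared_params: int) -> float:
    """Fraction of parameters used per token in a sparse MoE model."""
    total = shared_params + total_experts * expert_params
    active = shared_params + active_experts * expert_params
    return active / total

# Hypothetical sizes chosen so total ~= 80B and active ~= 3B:
frac = active_fraction(total_experts=512, active_experts=10,
                       expert_params=150_000_000,
                       shared_params=1_500_000_000)
print(f"{frac:.1%} of parameters active per token")  # → 3.8% of parameters active per token
```

This is why such models can run on local hardware: per-token compute scales with the active parameters, while memory still has to hold the full set.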

Figure: Performance on coding agent benchmarks. Qwen3-Coder-Next competes with much larger models across multiple coding benchmarks while using only 3 billion active parameters. | Image: Qwen

The model supports 256,000 tokens of context and works with development environments like Claude Code, Qwen Code, Qoder, Kilo, Trae, and Cline. Local tools like Ollama, LMStudio, MLX-LM, llama.cpp, and KTransformers also support it. Qwen3-Coder-Next is available on Hugging Face and ModelScope under the Apache 2.0 license. More details in the blog post and technical report.

OpenAI hires Anthropic's Dylan Scandinaro to lead AI safety as "extremely powerful models" loom

OpenAI has filled its "Head of Preparedness" position with Dylan Scandinaro, who previously worked on AI safety at competitor Anthropic. CEO Sam Altman announced the hire on X, calling Scandinaro "by far the best candidate" for the role. With OpenAI working on "extremely powerful models," Altman said strong safety measures are essential.

In his own post, Scandinaro acknowledged the technology's major potential benefits but also warned of "risks of extreme and even irrecoverable harm." OpenAI recently disclosed that a new coding model received a "high" risk rating in its cybersecurity evaluations.

There’s a lot of work to do, and not much time to do it!

Dylan Scandinaro

Scandinaro's Anthropic background adds an interesting layer. The company was founded by former OpenAI employees concerned about OpenAI's product focus and what they saw as insufficient safety measures, and has since become known as one of the more safety-conscious AI developers. Altman says he plans to work with Scandinaro to implement changes across the company.


A new platform lets AI agents pay humans to do the real-world work they can't

On Rentahuman.ai, AI agents can hire people for real-world tasks, from holding signs to picking up packages. It sounds absurd, but it shows what happens when language models stop just talking and start taking action.

Anthropic partners with leading research institutes to tackle biology's data bottleneck

Anthropic has announced two partnerships with major US research institutions to develop AI agents for biological research. The Allen Institute and the Howard Hughes Medical Institute (HHMI) will serve as founding partners in the initiative. According to Anthropic, "modern biological research generates data at unprecedented scale," but turning it into "validated biological insights remains a fundamental bottleneck." The company says manual processes "can't keep pace with the data being produced."

HHMI will develop specialized AI agents at the Janelia Research Campus that connect experimental knowledge to scientific instruments and analysis pipelines. The Allen Institute is working on multi-agent systems for data integration and experiment design that could "compress months of manual analysis into hours." According to Anthropic, these systems "are designed to amplify scientific intuition rather than replace it, keeping researchers in control of scientific direction while handling computational complexity."

The move extends Anthropic's push into scientific applications. The company recently launched Cowork, a feature designed for office work that gives Claude access to local files. OpenAI is also targeting the research market with Prism, an AI workspace for scientific writing.

Gemini models dominate new AI rankings for strategic board games

Google's Gemini models are outperforming the competition in board game benchmarks. Google DeepMind and Kaggle have expanded their "Game Arena" platform with two new games: Werewolf and Poker. The platform tests AI models across strategic games that measure different cognitive abilities: chess evaluates logical thinking, Werewolf tests social skills like communication and detecting deception, and Poker assesses how models handle risk and incomplete information.

These games provide objective ways to measure skills like planning and decision-making under uncertainty. Gemini 3 Pro and Gemini 3 Flash currently hold the top spots in all rankings. The Werewolf benchmark also serves security research: it tests whether models can detect manipulation without any real-world consequences. According to Google DeepMind CEO Demis Hassabis, the AI industry needs more rigorous tests to properly evaluate the latest models.

French prosecutors raid X's Paris offices over data and child abuse allegations

French prosecutors have raided the Paris offices of Elon Musk's platform X. The cybercrime unit is investigating multiple allegations, including unlawful data extraction and aiding the distribution of child sexual abuse material. Sexual deepfakes are also part of the investigation. Musk and former X CEO Linda Yaccarino have been summoned for hearings in April, according to the BBC. X has previously called the investigation politically motivated.

At the same time, the UK's Information Commissioner's Office (ICO) has opened an investigation into Musk's AI tool Grok. The probe focuses on whether personal data was used without consent to create sexualized images. The UK media regulator Ofcom and the European Commission are also continuing their reviews of the platform. X has not commented on the investigations.

Source: BBC