Andrej Karpathy says programming is "unrecognizable" now that AI agents actually work

Andrej Karpathy, former head of AI at Tesla and a founding member of OpenAI, says programming with AI agents has changed fundamentally over the past two months. According to Karpathy, AI agents barely worked before December 2025, but since then they've become reliable, thanks to better models and their ability to stay on task for longer stretches.

As an example, he describes how an AI agent independently built a video analysis dashboard for him one weekend: he typed the task in plain English, the agent worked for 30 minutes, solved problems on its own, and delivered a finished result. Three months ago, Karpathy says, that would have been an entire weekend project.

As a result, programming is becoming unrecognizable. You’re not typing computer code into an editor like the way things were since computers were invented, that era is over. You're spinning up AI agents, giving them tasks *in English* and managing and reviewing their work in parallel.

Karpathy via X

Karpathy also points out that these systems aren't perfect and still need human "high-level direction, judgement, taste, oversight, iteration, and hints and ideas." What makes his take especially notable is how recently he held the opposite view. As late as October 2025, he called the hype around AI agents exaggerated, saying the products were far from ready for real-world use. He fundamentally changed that opinion after the release of Opus 4.5 and Codex 5.2 in the winter and is now doubling down on it.

Alibaba's open Qwen 3.5 takes aim at GPT-5 mini and Claude Sonnet 4.5 at a fraction of the cost

Alibaba has expanded its Qwen 3.5 model series with four new models: Qwen3.5-Flash, Qwen3.5-35B-A3B, Qwen3.5-122B-A10B, and Qwen3.5-27B. According to Alibaba, they deliver stronger performance while using less compute. All four accept text, images, and video as input and generate text as output. The series started with the release of Qwen3.5-397B-A17B in mid-February.

The smaller Qwen3.5-35B-A3B outperforms its much larger predecessor, Qwen3-235B-A22B. That is a clear sign that better architecture, data quality, and reinforcement learning can matter more than raw model size. The 122B and 27B variants aim to close the remaining gap to top-tier models, particularly in complex agent scenarios.

Benchmarks published by Alibaba show the Qwen 3.5 models matching or outperforming leading Western models such as OpenAI's GPT-5 mini and gpt-oss-120b and Anthropic's Claude Sonnet 4.5. The largest of the new models, Qwen3.5-122B-A10B, leads several tests: it beats all competitors in agentic tool use (BFCL V4, 72.2) and agentic web search (BrowseComp, 63.8). On the HMMT math benchmark it scores 91.4, just behind GPT-5 mini (92.0). It also leads in visual reasoning (MMMU-Pro, 76.9) and document recognition (OmniDocBench, 89.8). Claude Sonnet 4.5, on the other hand, clearly outperforms all Qwen models in agentic terminal coding (49.4) and embodied reasoning (64.7). GPT-5 mini leads in multilingual knowledge (MMMLU, 90.0) and math. Notably, the small Qwen3.5-35B-A3B, with just 3 billion active parameters, keeps up with much larger models across many tests.
Alibaba's Qwen 3.5 models match or outperform leading Western models like OpenAI's GPT-5 mini, gpt-oss-120b, and Anthropic's Claude Sonnet 4.5 across multiple benchmarks. | Image: Alibaba
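The model names encode parameter counts, which is why a "35B" model can be so cheap to run. The sketch below assumes the usual Qwen naming convention: "35B-A3B" means 35 billion total parameters with about 3 billion active per token (a mixture-of-experts design, consistent with the "3 billion active parameters" mentioned above), and a name without an "A" suffix, such as Qwen3.5-27B, is assumed to denote a dense model. It then applies the common rule of thumb of roughly 2 FLOPs per active parameter per generated token. The helper names are hypothetical, and the estimate ignores attention cost; memory needs still scale with the total parameter count, since all experts typically have to be loaded.

```python
import re

# Assumed convention: "-<total>B" optionally followed by "-A<active>B" at the end of the name.
# Names without an "-A<n>B" suffix are treated as dense (all parameters active).
NAME_PATTERN = re.compile(r"-(\d+(?:\.\d+)?)B(?:-A(\d+(?:\.\d+)?)B)?$")


def parse_params(model_name: str) -> tuple[float, float]:
    """Return (total, active) parameter counts in billions (hypothetical helper)."""
    match = NAME_PATTERN.search(model_name)
    if match is None:
        raise ValueError(f"unrecognized model name: {model_name}")
    total = float(match.group(1))
    active = float(match.group(2)) if match.group(2) else total
    return total, active


def forward_gflops_per_token(active_billions: float) -> float:
    # Rule of thumb: ~2 FLOPs (one multiply, one add) per active parameter per token.
    # Billions of parameters * 2 gives GFLOPs; attention over the context is ignored.
    return 2 * active_billions


for name in ["Qwen3.5-35B-A3B", "Qwen3-235B-A22B", "Qwen3.5-122B-A10B", "Qwen3.5-27B"]:
    total, active = parse_params(name)
    print(f"{name}: {total:g}B total, {active:g}B active, "
          f"~{forward_gflops_per_token(active):g} GFLOPs/token")
```

Under these assumptions, Qwen3.5-35B-A3B needs roughly 6 GFLOPs per generated token versus roughly 44 for Qwen3-235B-A22B, about a seventh of the compute.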

All models are available on Hugging Face, ModelScope, and through Qwen Chat. They ship under the Apache License 2.0, a permissive open-source license that allows commercial use, modification, and redistribution. Qwen3.5-Flash is the hosted production version with a context length of one million tokens and built-in tools. The API costs $0.10 per million input tokens and $0.40 per million output tokens.
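To put the Qwen3.5-Flash prices in perspective, here is a minimal cost estimate assuming flat per-token billing at the rates quoted above. The function name is hypothetical, and the provider's actual billing may differ.

```python
# Quoted Qwen3.5-Flash API prices, in US dollars per million tokens.
INPUT_PRICE_PER_M = 0.10
OUTPUT_PRICE_PER_M = 0.40


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in dollars for one request, assuming flat per-token pricing."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Example: filling the full one-million-token context and getting 2,000 tokens back.
cost = request_cost(1_000_000, 2_000)
print(f"${cost:.4f}")
```

Under that assumption, even a request that fills the entire context window costs about ten cents.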

Perplexity Computer bundles rival AI models into one agentic workflow system for $200 a month

Perplexity has launched "Perplexity Computer," a new chat interface that pulls together multiple agentic AI models into a single system. Similar to Claude Cowork, but browser-based and with access to models from different providers, it handles entire workflows on its own.

Users describe the outcome they want, and the system spins up sub-agents for web research, document creation, data processing, or API calls. Perplexity argues that AI models are becoming increasingly specialized, so a complete workflow needs access to all of them. That is a convenient argument for a company built on top of other providers' models, though it isn't necessarily wrong.

Perplexity Computer currently runs Opus 4.6 as its core model, supplemented by Gemini, Grok, GPT-5.2, Nano Banana for images, and Veo 3.1 for video. Each task gets its own secure environment with a browser, file system, and tool connections. Perplexity Computer is available as part of the Max plan at $200 per month.
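Perplexity has not published how Computer assigns work, but the pattern it describes (a core model that delegates subtasks to specialist models, each running in its own environment) can be sketched as a simple routing table. The task kinds, the mapping, and the sandbox IDs below are hypothetical; only the model names come from the article.

```python
from dataclasses import dataclass

# Hypothetical routing table: specialist models for specific media types.
ROUTES = {
    "image": "Nano Banana",
    "video": "Veo 3.1",
}
# Default model for planning and for any task kind not routed elsewhere.
CORE_MODEL = "Opus 4.6"


@dataclass
class Subtask:
    kind: str    # e.g. "research", "document", "image", "video"
    prompt: str


@dataclass
class Assignment:
    subtask: Subtask
    model: str
    sandbox_id: str  # each task gets its own isolated environment (hypothetical ID scheme)


def assign(subtasks: list[Subtask]) -> list[Assignment]:
    """Route each subtask to a specialist model, falling back to the core model."""
    assignments = []
    for i, task in enumerate(subtasks):
        model = ROUTES.get(task.kind, CORE_MODEL)
        assignments.append(Assignment(task, model, sandbox_id=f"sandbox-{i}"))
    return assignments


plan = [
    Subtask("research", "Collect recent pricing data for three competitors"),
    Subtask("document", "Summarize the findings in a two-page brief"),
    Subtask("image", "Create a cover image for the brief"),
]
for a in assign(plan):
    print(a.subtask.kind, "->", a.model, "in", a.sandbox_id)
```

In a real system the core model would also decide how to split the user's request into subtasks; here the plan is hard-coded to keep the routing step visible.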

Google relaunches its AI creative studio Flow with new features and integrations

Google has relaunched and expanded its AI creative studio Flow. The company's image generation experiments, Whisk and ImageFX, are now being integrated directly into Flow, and starting in March, users will be able to transfer their existing projects and files. At the core is Google's image model Nano Banana, which lets users generate images and use them directly as the basis for videos with Veo.

Other new features include a lasso tool for targeted editing of image areas using text input, flexible media management with collections, and tools for extending clips and controlling camera movements. Google is aiming to combine text, image, and video creation into a single workflow.

Flow is available at flow.google and is free to use after signing up; paying users get higher usage limits and access to the full set of tools. According to Google, users have created over 1.5 billion images and videos since the platform launched last year.

Adobe's new Firefly "Quick Cut" tool turns raw footage into a rough edit from a text prompt

Adobe has added a new feature called "Quick Cut" to its Firefly AI creative platform. The tool lets video creators upload their own raw footage or generate new material with AI, then automatically produces an initial rough cut. Users describe what the video should be about in plain language—an interview, a product demo, a travel vlog—and Firefly builds a structured first edit from that description. Scripts or shot lists can also be added as optional input.

Quick Cut targets product reviewers, reporters, podcasters, and marketers. Firefly bundles AI models from Adobe, Google, OpenAI, and Runway into a single app. Through March 16, Adobe is offering unlimited image and video generation in up to 2K resolution on select subscription plans.

Anthropic refuses Pentagon demand to loosen military AI restrictions, faces Defense Production Act threat

Anthropic won't back down on its military AI restrictions, but the Pentagon is giving it little choice. According to Reuters, the AI company continues to refuse to loosen its safety guardrails for military use. The dispute centers on safeguards that prevent Anthropic's technology from being used for autonomous weapons control and domestic surveillance.

At a meeting between Anthropic CEO Dario Amodei and US Secretary of Defense Pete Hegseth, Hegseth delivered an ultimatum: if Anthropic doesn't comply by Friday, the Pentagon will either invoke the Defense Production Act, a law that can compel companies to cooperate, or classify Anthropic as a supply chain risk. According to Franklin Turner, a government contracts attorney at McCarter & English, such a move against Anthropic would be unprecedented and could trigger a wave of lawsuits.

Amodei argued that the existing safeguards don't interfere with current military operations. Meanwhile, the Pentagon is negotiating parallel AI contracts with Google, xAI, and OpenAI for battlefield applications, including autonomous drone swarms, robots, and cyberattacks. Elon Musk's xAI has already secured an agreement with the Pentagon to deploy on classified networks this week.

Claude Code sessions now accessible from any device

Claude Code users can now continue a locally running programming session from their smartphone, tablet, or browser. The session keeps running on the user's own machine, and no data moves to the cloud. Local files, servers, and project configurations all remain accessible. Users connect through claude.ai/code or the Claude app for iOS and Android and can switch seamlessly between terminal, browser, and phone. If the network drops, the session reconnects automatically, though it ends after roughly ten minutes offline.
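The reconnect behavior described here resembles a common pattern: retry with capped exponential backoff until a deadline passes. Anthropic hasn't documented its implementation, so this is only a sketch. The roughly ten-minute offline limit comes from the article, while the backoff delays and the function and parameter names are assumptions. The clock and sleep functions are injectable so the logic can be tested without actually waiting.

```python
import time
from typing import Callable

# From the article: the session ends after roughly ten minutes offline.
OFFLINE_LIMIT_S = 10 * 60


def reconnect(try_connect: Callable[[], bool],
              clock: Callable[[], float] = time.monotonic,
              sleep: Callable[[float], None] = time.sleep,
              base_delay_s: float = 1.0,   # assumed first retry delay
              max_delay_s: float = 30.0,   # assumed cap on the retry delay
              ) -> bool:
    """Retry with capped exponential backoff until connected or the offline limit passes.

    Returns True if the connection came back, False if the session should end.
    """
    offline_since = clock()
    delay = base_delay_s
    while clock() - offline_since < OFFLINE_LIMIT_S:
        if try_connect():
            return True
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # 1, 2, 4, 8, ... capped at max_delay_s
    return False
```

Capping the delay keeps retries frequent enough that a phone coming back online reconnects within about half a minute, while the deadline guarantees an abandoned session is eventually cleaned up.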

The feature is initially available as a research preview for Max subscribers, with Pro users next in line. Unlike Claude Code on the web, which has been running tasks in Anthropic's cloud environments since last year, remote control sessions run entirely on the user's own computer.

Anthropic is aggressively building out Claude Code, adding automated code reviews and GitHub integrations. The company is also raising $10 billion at a $350 billion valuation. Boris Cherny, who created Claude Code, says the new Claude Cowork tool was built almost entirely with Claude Code itself.

Claude can now jump between Excel and PowerPoint on its own

Anthropic now lets Claude switch between Excel and PowerPoint on its own, for example running an analysis and then building a presentation directly from the results. The company is also expanding Cowork for enterprise customers with private plugin marketplaces, letting admins curate and distribute plugin collections to specific teams. New templates cover HR, design, engineering, finance, asset management, and more.

In finance, new MCP interfaces for FactSet and MSCI provide real-time market data and index analysis; S&P Global (Capital IQ Pro) and LSEG have contributed their own plugins.
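MCP (Model Context Protocol) servers such as the FactSet and MSCI interfaces expose tools that a client like Claude can discover and call using JSON-RPC 2.0 messages. The sketch below builds the two core requests from the MCP specification, `tools/list` and `tools/call`. The tool name and arguments are hypothetical, since the actual FactSet and MSCI tool schemas aren't described here, and the transport layer (stdio or HTTP) is omitted.

```python
import json
from typing import Any, Optional


def jsonrpc_request(request_id: int, method: str,
                    params: Optional[dict] = None) -> str:
    """Serialize a JSON-RPC 2.0 request, the message format MCP uses."""
    message: dict[str, Any] = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Step 1: ask the server which tools it offers.
list_request = jsonrpc_request(1, "tools/list")

# Step 2: call one of them. "get_index_level" and its arguments are hypothetical.
call_request = jsonrpc_request(
    2, "tools/call",
    {"name": "get_index_level", "arguments": {"index": "MSCI World"}},
)

print(list_request)
print(call_request)
```

Because every provider speaks the same request format, a client only needs the tool list returned by each server to use FactSet, MSCI, or any other plugin without provider-specific code.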

New third-party integrations include Google Workspace, DocuSign, Salesforce, Slack, and FactSet. Admins gain finer user-access controls plus OpenTelemetry support for cost and usage monitoring. The Excel-PowerPoint feature is available as a research preview on all paid plans. Cowork is Anthropic's desktop tool for agent-based office work; plugins were added in late January but have known security vulnerabilities.