Less is more: Meta’s new image model, Pixio, beats more complex competitors at depth estimation and 3D reconstruction, despite having fewer parameters. It relies on a training method that had been considered outdated.
Meta brings Segment Anything to audio, letting editors pull sounds from video with a click or text prompt
Filtering a dog bark from street noise or isolating a sound source with a single click on a video: Meta’s SAM Audio brings the company’s visual segmentation approach to the audio world. The model lets users edit audio using text commands, clicks, or time markers. Code and weights are open source.
Zhipu AI has introduced GLM-4.7, a new model specialized in autonomous programming that uses "Preserved Thinking" to retain reasoning across long conversations. This capability works alongside the "Interleaved Thinking" feature introduced in GLM-4.5, which allows the system to pause and reflect before executing tasks. The model shows a significant performance jump over its predecessor, GLM-4.6, scoring 73.8 percent on the SWE-bench Verified test. Beyond writing code, Zhipu says GLM-4.7 excels at "vibe coding" - generating aesthetically pleasing websites and presentations. In a blog post, the company showcased several sites reportedly created from a single prompt.
Benchmark comparisons show a tight race between GLM-4.7 and commercial Western models from providers like OpenAI and Anthropic. | Image: Zhipu AI
The model is available through the Z.ai platform and OpenRouter, or as a local download on Hugging Face. It also integrates directly into coding workflows like Claude Code. Z.ai is positioning the release as a cost-effective alternative, claiming it costs just one-seventh as much as comparable models.
The Qwen team at Alibaba Cloud has released two new AI models that create or clone voices using text commands. The Qwen3-TTS-VD-Flash model lets users generate voices based on detailed descriptions, allowing them to precisely define characteristics like emotion and speaking tempo. For example, a user could request a "Male, middle-aged, booming baritone - hyper-energetic infomercial voice with rapid-fire delivery and exaggerated pitch rises, dripping with salesmanship." According to the Qwen team, the model outperforms OpenAI's gpt-4o-mini-tts API, which launched this spring.
The second release, Qwen3-TTS-VC-Flash, can copy voices from just three seconds of audio and reproduce them in ten languages. Qwen claims the model achieves a lower error rate than competitors like ElevenLabs or MiniMax. The AI is also capable of processing complex texts, imitating animal sounds, and extracting voices from recordings. Both models are accessible via the Alibaba Cloud API. You can try demos for the design model and the clone model on Hugging Face.
Google's open standard lets AI agents build user interfaces on the fly
Google’s new A2UI standard gives AI agents the ability to create graphical interfaces on the fly. Instead of just sending text, AIs can now generate forms, buttons, and other UI elements that blend right into any app.
OpenAI is significantly expanding the availability of ChatGPT Go, its budget-friendly subscription tier. Following a launch in India in August, the plan is now available in over 70 additional countries, including markets across Europe and South America, according to an updated support page. In Germany, the service costs 8 euros per month. Beyond extended access to the flagship model, the subscription adds capabilities for image generation, file analysis, and data evaluation, along with a larger context window for handling longer conversations. Users can also organize projects and build their own custom GPTs. However, the plan excludes access to Sora, the API, and older models like GPT-4o.
The broader rollout comes alongside a cost-saving adjustment to how the system handles queries. OpenAI recently removed the automatic model router for users on the free tier and the Go subscription. By default, the system now answers requests using the faster GPT-5.2 Instant. Users must manually switch to more powerful reasoning models when needed, as the automatic routing feature is now exclusive to the higher-priced plans.
Anthropic's AI store makes money while debating eternal transcendence
Anthropic’s autonomous kiosk is finally making money, but not without drama. In the second phase of Project Vend, stronger models, stricter processes, and an AI “CEO” turned losses into profits, while also exposing how easily AI agents can be manipulated, misunderstand authority, or ignore real‑world laws. The experiment shows that structure and guardrails matter more than raw intelligence when AI runs a business.