

ByteDance's StoryMem gives AI video models a memory so characters stop shapeshifting between scenes

ByteDance tackles one of AI video generation’s most persistent problems: characters that change appearance from scene to scene. The new StoryMem system remembers how characters and environments should look, keeping them consistent throughout an entire story.

AI reasoning models think harder on easy problems than hard ones, and researchers have a theory for why

If I spent more time thinking about a simple task than a complex one, and did worse on it, my boss would have some questions. But that's exactly what's happening with current reasoning models like DeepSeek-R1. A team of researchers took a closer look at the problem and proposed theoretical laws describing how AI models should ideally 'think.'

Meta brings Segment Anything to audio, letting editors pull sounds from video with a click or text prompt

Filtering a dog bark from street noise or isolating a sound source with a single click on a video: Meta’s SAM Audio brings the company’s visual segmentation approach to the audio world. The model lets users edit audio using text commands, clicks, or time markers. Code and weights are open source.

Zhipu AI challenges Western rivals with low-cost GLM-4.7

Zhipu AI has introduced GLM-4.7, a new model specialized in autonomous programming that uses "Preserved Thinking" to retain reasoning across long conversations. This capability works alongside the "Interleaved Thinking" feature introduced in GLM-4.5, which allows the system to pause and reflect before executing tasks. The model shows a significant performance jump over its predecessor, GLM-4.6, scoring 73.8 percent on the SWE-bench Verified test. Beyond writing code, Zhipu says GLM-4.7 excels at "vibe coding": generating aesthetically pleasing websites and presentations. In a blog post, the company showcased several sites reportedly created from a single prompt.

[Image: A table of benchmark results comparing GLM-4.7 with competitors; the model shows leading values in categories such as Reasoning, Code Agent, and General Agent. | Image: Zhipu AI]
Benchmark comparisons show a tight race between GLM-4.7 and commercial Western models from providers like OpenAI and Anthropic.

The model is available through the Z.ai platform and OpenRouter, or as a local download on Hugging Face. It also integrates directly into coding workflows like Claude Code. Z.ai is positioning the release as a cost-effective alternative, claiming it costs just one-seventh as much as comparable models.
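For developers who want to try it, OpenRouter exposes an OpenAI-compatible endpoint, so a standard chat-completions call should work. A minimal sketch follows; the model slug "z-ai/glm-4.7" is an assumption, so check OpenRouter's model list for the actual identifier.

```python
# Minimal sketch: calling GLM-4.7 through OpenRouter's OpenAI-compatible API.
# The model slug "z-ai/glm-4.7" is an assumption; verify it on OpenRouter.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_API_KEY",
)

response = client.chat.completions.create(
    model="z-ai/glm-4.7",  # assumed identifier
    messages=[
        {"role": "user", "content": "Write a Python function that merges two sorted lists."},
    ],
)
print(response.choices[0].message.content)
```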

Alibaba's new Qwen models can clone voices from three seconds of audio

The Qwen team at Alibaba Cloud has released two new AI models that create or clone voices using text commands. The Qwen3-TTS-VD-Flash model lets users generate voices based on detailed descriptions, allowing them to precisely define characteristics like emotion and speaking tempo. For example, a user could request a "Male, middle-aged, booming baritone - hyper-energetic infomercial voice with rapid-fire delivery and exaggerated pitch rises, dripping with salesmanship." According to Alibaba, the model outperforms OpenAI's gpt-4o-mini-tts API, which launched earlier this spring.

The second release, Qwen3-TTS-VC-Flash, can copy voices from just three seconds of audio and reproduce them in ten languages. Qwen claims the model achieves a lower error rate than competitors like ElevenLabs or MiniMax. The AI is also capable of processing complex texts, imitating animal sounds, and extracting voices from recordings. Both models are accessible via the Alibaba Cloud API. You can try demos for the design model and the clone model on Hugging Face.
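Alibaba's documented request format isn't reproduced here, so the following is only a hypothetical sketch of what a voice-clone call might look like over HTTP; the endpoint path, model id, and field names are placeholders, not the official Alibaba Cloud API.

```python
# Hypothetical sketch of a voice-clone request. The endpoint path, model id,
# and field names below are placeholders, not Alibaba Cloud's documented API.
import requests

API_KEY = "YOUR_ALIBABA_CLOUD_API_KEY"
ENDPOINT = "https://dashscope.aliyuncs.com/api/v1/services/audio/tts"  # placeholder

payload = {
    "model": "qwen3-tts-vc-flash",  # assumed model id
    "input": {
        "text": "Hello, this is my cloned voice speaking.",
        "reference_audio": "https://example.com/reference-3s.wav",  # ~3 s sample
        "language": "en",  # one of the ten supported languages
    },
}

response = requests.post(
    ENDPOINT,
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
    timeout=60,
)
response.raise_for_status()
print(response.json())  # would contain a URL or bytes for the generated audio
```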

Source: Qwen

Google's open standard lets AI agents build user interfaces on the fly

Google’s new A2UI standard gives AI agents the ability to create graphical interfaces on the fly. Instead of just sending text, AIs can now generate forms, buttons, and other UI elements that blend right into any app.
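The A2UI schema itself isn't detailed here, so the snippet below is only an illustrative sketch of the general idea, with made-up field names: instead of plain text, the agent returns a declarative description of UI components, and the host app renders them with its own native widgets.

```python
# Illustrative only: a made-up declarative UI payload in the spirit of A2UI.
# The real A2UI schema may differ; every field name here is an assumption.
agent_response = {
    "type": "ui",
    "components": [
        {"type": "text", "value": "Where should we ship your order?"},
        {"type": "input", "id": "address", "label": "Shipping address"},
        {
            "type": "button",
            "label": "Confirm",
            "action": {"event": "submit", "fields": ["address"]},
        },
    ],
}

# A host app would walk this structure and render each component with its own
# native widgets, then send the "submit" event back to the agent.
for component in agent_response["components"]:
    print(component["type"], "->", component.get("label") or component.get("value"))
```

The appeal of this pattern is that the same payload can be rendered natively on web, mobile, or desktop, while the agent stays platform-agnostic.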