ByteDance's StoryMem gives AI video models a memory so characters stop shapeshifting between scenes
ByteDance tackles one of AI video generation’s most persistent problems: characters that change appearance from scene to scene. The new StoryMem system remembers how characters and environments should look, keeping them consistent throughout an entire story.
AI reasoning models think harder on easy problems than hard ones, and researchers have a theory for why
If I spent more time thinking about a simple task than a complex one—and did worse on it—my boss would have some questions. But that’s exactly what’s happening with current reasoning models like DeepSeek-R1. A team of researchers took a closer look at the problem and proposed theoretical laws describing how AI models should ideally ‘think.’
Less is more: Meta’s new image model, Pixio, beats more complex competitors at depth estimation and 3D reconstruction despite having fewer parameters and relying on a training method that was widely considered outdated.
Meta brings Segment Anything to audio, letting editors pull sounds from video with a click or text prompt
Filtering a dog bark from street noise or isolating a sound source with a single click on a video: Meta’s SAM Audio brings the company’s visual segmentation approach to the audio world. The model lets users edit audio using text commands, clicks, or time markers. Code and weights are open source.
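Meta's announcement does not spell out a programming interface here, so the sketch below is purely illustrative of the three prompt types (text, click, time span); the package name `sam_audio` and every call in it are assumptions, not the released code.

```python
# Hypothetical sketch only: the package "sam_audio" and all calls below are
# assumptions for illustration, not Meta's published interface.
from sam_audio import SamAudioModel

model = SamAudioModel.from_pretrained("sam-audio-base")  # assumed checkpoint name

# 1) Text prompt: describe the sound you want to isolate.
bark = model.segment("street_scene.mp4", text="dog barking")

# 2) Click prompt: point at the sound source in a video frame (x, y, timestamp).
siren = model.segment("street_scene.mp4", click=(412, 230, 3.5))

# 3) Time-span prompt: isolate whatever is dominant between two timestamps.
speech = model.segment("street_scene.mp4", span=(10.0, 14.5))

bark.save("dog_bark_only.wav")
```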
Zhipu AI has introduced GLM-4.7, a new model specialized in autonomous programming that uses "Preserved Thinking" to retain reasoning across long conversations. This capability works alongside the "Interleaved Thinking" feature introduced in GLM-4.5, which allows the system to pause and reflect before executing tasks. The model shows a significant performance jump over its predecessor, GLM-4.6, scoring 73.8 percent on the SWE-bench Verified test. Beyond writing code, Zhipu says GLM-4.7 excels at "vibe coding" - generating aesthetically pleasing websites and presentations. In a blog post, the company showcased several sites reportedly created from a single prompt.
Benchmark comparisons show a tight race between GLM-4.7 and commercial Western models from providers like OpenAI and Anthropic.
The model is available through the Z.ai platform and OpenRouter, or as a local download on Hugging Face. It also integrates directly into coding workflows like Claude Code. Z.ai is positioning the release as a cost-effective alternative, claiming it costs just one-seventh as much as comparable models.
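Because OpenRouter exposes an OpenAI-compatible endpoint, calling the model from Python could look roughly like the minimal sketch below; the model slug `z-ai/glm-4.7` is an assumption based on Z.ai's naming, so check OpenRouter's model list for the actual identifier.

```python
# Minimal sketch: OpenRouter speaks the OpenAI chat-completions protocol,
# so the standard openai client can be pointed at it via base_url.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",
)

# The slug "z-ai/glm-4.7" is assumed from Z.ai's naming scheme; consult the
# OpenRouter model list for the real identifier.
response = client.chat.completions.create(
    model="z-ai/glm-4.7",
    messages=[
        {"role": "user", "content": "Write a landing page for a coffee shop as a single HTML file."},
    ],
)

print(response.choices[0].message.content)
```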
The Qwen team at Alibaba Cloud has released two new AI models that create or clone voices using text commands. The Qwen3-TTS-VD-Flash model lets users generate voices based on detailed descriptions, allowing them to precisely define characteristics like emotion and speaking tempo. For example, a user could request a "Male, middle-aged, booming baritone - hyper-energetic infomercial voice with rapid-fire delivery and exaggerated pitch rises, dripping with salesmanship." According to the company, the model outperforms OpenAI's gpt-4o-mini-tts API, which launched earlier this spring.
The second release, Qwen3-TTS-VC-Flash, can clone voices from just three seconds of audio and reproduce them in ten languages. Qwen claims the model achieves a lower error rate than competitors like ElevenLabs or MiniMax. The AI is also capable of processing complex texts, imitating animal sounds, and extracting voices from recordings. Both models are accessible via the Alibaba Cloud API. You can try demos for the design model and the clone model on Hugging Face.
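For orientation, a request to the voice-design model over the Alibaba Cloud API might be structured along these lines; the endpoint URL, model identifier, and field names below are assumptions for illustration, not the documented schema.

```python
# Schematic sketch only: endpoint, model name, and JSON fields are assumed
# for illustration and may differ from the actual Alibaba Cloud API schema.
import requests

API_KEY = "YOUR_ALIBABA_CLOUD_KEY"  # assumed auth scheme

payload = {
    "model": "qwen3-tts-vd-flash",  # assumed identifier
    "input": {
        "text": "Welcome back to the show, folks!",
        # Voice design: describe the speaker instead of picking a preset voice.
        "voice_description": (
            "Male, middle-aged, booming baritone - hyper-energetic infomercial "
            "voice with rapid-fire delivery and exaggerated pitch rises"
        ),
    },
}

resp = requests.post(
    "https://dashscope-intl.aliyuncs.com/api/v1/services/audio/tts",  # assumed endpoint
    headers={"Authorization": f"Bearer {API_KEY}"},
    json=payload,
    timeout=60,
)
resp.raise_for_status()

# Assumes the service returns raw audio bytes; the real API may instead
# return JSON containing a download URL.
with open("infomercial_voice.wav", "wb") as f:
    f.write(resp.content)
```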
Google's open standard lets AI agents build user interfaces on the fly
Google’s new A2UI standard gives AI agents the ability to create graphical interfaces on the fly. Instead of just sending text, AIs can now generate forms, buttons, and other UI elements that blend right into any app.
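Google's spec is not reproduced here, so the following is only an illustrative guess at what an agent-emitted UI description might look like; all field names are assumptions rather than the published A2UI schema.

```python
# Illustrative guess only: these field names are assumptions, not the real
# A2UI schema. The point is that the agent returns structured UI, not text.
import json

ui_message = {
    "type": "ui",
    "components": [
        {"kind": "text", "value": "Book a table"},
        {"kind": "text_input", "id": "guests", "label": "Number of guests"},
        {"kind": "date_picker", "id": "date", "label": "Date"},
        {"kind": "button", "id": "submit", "label": "Reserve", "action": "submit_form"},
    ],
}

# The host app renders this with its own widgets, so the form inherits the
# app's look and feel instead of arriving as a wall of chat text.
print(json.dumps(ui_message, indent=2))
```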