AI benchmarks are broken and the industry keeps using them anyway, study finds

Benchmarks are supposed to measure AI model performance objectively. But according to an analysis by Epoch AI, results depend heavily on how the test is run. The research organization identifies numerous variables that are rarely disclosed but significantly affect outcomes.

Researchers extract up to 96% of Harry Potter word-for-word from leading AI models

Harry Potter, Game of Thrones, 1984: researchers pulled nearly complete books from commercial language models. Two of the four systems tested didn’t even put up a fight. The findings could shape ongoing copyright lawsuits against AI companies.

ByteDance's StoryMem gives AI video models a memory so characters stop shapeshifting between scenes

ByteDance tackles one of AI video generation’s most persistent problems: characters that change appearance from scene to scene. The new StoryMem system remembers how characters and environments should look, keeping them consistent throughout an entire story.

AI reasoning models think harder on easy problems than hard ones, and researchers have a theory for why

If I spent more time thinking about a simple task than a complex one, and did worse on it, my boss would have some questions. But that's exactly what's happening with current reasoning models like DeepSeek-R1. A team of researchers took a closer look at the problem and proposed theoretical laws describing how AI models should ideally "think."