
Shengshu Technology and Tsinghua University unveil Vidu, their first Sora-like AI model for text-to-video creation, but it still falls short of OpenAI's impressive video debut.

Chinese AI company Shengshu Technology and Tsinghua University unveiled Vidu at the Zhongguancun Forum 2024 in Beijing. Vidu can create a 16-second HD video at 1080p resolution with a single click, and is "very close" to the level of OpenAI's Sora model, according to Shengshu Technology.

Compared to Sora, Vidu is supposed to better "understand and generate Chinese elements such as the panda and dragon", a claim that has yet to be proven in practice. Shengshu Technology also says that the core architecture of the model was developed in September 2022, before the launch of Sora, China Daily reports.

Video: Shengshu Technology via Reddit


Despite the confidence of its developers, the quality of Vidu seems to lag behind Sora. The most significant difference is that while Sora can generate continuous videos of up to one minute, Vidu currently manages only 16 seconds.

Although Shengshu Technology promises "exceptional consistency" within these scenes, meaning that the individual images build on each other logically, Vidu is still far from matching Sora's capabilities. One reason could be that developers in China have more limited access to GPUs than OpenAI does.

With Vidu, however, China is demonstrating its serious ambitions to catch up with or even surpass leading US companies such as OpenAI in the race for generative AI models. Closing that gap will require a significant increase in performance.

Sora is expected to be released this year, with OpenAI planning to scale the model further. Details on pricing and generation times are not yet known.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • With Vidu, China unveils its first AI model for text-to-video generation, which is said to be comparable to OpenAI's Sora, but still lags far behind the US competition.
  • Vidu can create 16-second HD videos with 1080p resolution at the touch of a button, and is said to be "very close" to Sora's level, which has yet to be proven in practice.
  • Despite the emphasis on "exceptional consistency" within scenes, Vidu's maximum 16-second video length falls far short of Sora's ability to create continuous videos of up to one minute.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.