Generative AI can already create photorealistic images, and the first models can generate video. ByteDance and Alibaba are now showing models that generate photorealistic video clips of animated people.
The two approaches differ slightly in architecture, but both essentially use diffusion models that animate a reference image according to a sequence of target poses, such as a dance routine. MagicAnimate, from ByteDance and the Show Lab at the National University of Singapore, and Animate Anyone, from Alibaba's Institute for Intelligent Computing, both generate short video clips of dancing people or cartoon characters from a single reference image and a motion sequence.
Using techniques such as ControlNet and temporal-consistency modeling, the generated videos are far more stable than those of other text-to-video or image-to-video models, beating the previous best result on the TikTok benchmark by almost 40 percent.
Both methods need only a single image and a motion sequence to generate a video; the reference image can show a real person, the Mona Lisa, or an AI-generated character. ByteDance's MagicAnimate can even animate several people simultaneously.
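Conceptually, both systems map one reference image plus a per-frame pose sequence to one output frame per pose, with an extra pass that keeps consecutive frames consistent. The sketch below is a purely illustrative NumPy mock of that interface; the names `render_frame` and `animate`, the blending arithmetic, and the exponential smoothing are all stand-ins of my own, not the actual model code:

```python
import numpy as np

def render_frame(reference: np.ndarray, pose: np.ndarray) -> np.ndarray:
    """Stand-in for the diffusion model: produce one frame conditioned
    on the reference image and a single pose map. Here we just blend
    the two arrays for illustration."""
    return 0.7 * reference + 0.3 * pose

def animate(reference: np.ndarray, poses: list, smooth: float = 0.5) -> list:
    """Generate one frame per pose, then enforce temporal consistency
    by exponentially smoothing consecutive frames -- a toy stand-in for
    the temporal-attention layers the real models use."""
    frames = [render_frame(reference, p) for p in poses]
    for i in range(1, len(frames)):
        frames[i] = smooth * frames[i] + (1 - smooth) * frames[i - 1]
    return frames

# One reference image, a short pose sequence -> one frame per pose.
ref = np.zeros((4, 4))
poses = [np.full((4, 4), v) for v in (0.2, 0.5, 0.9)]
clip = animate(ref, poses)
```

The point of the sketch is the shape of the problem: a single still image drives an arbitrarily long pose sequence, and frame-to-frame coupling is what separates these models from running an image generator once per frame.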
More opportunities for TikTok — and AI influencers
With these methods, AI influencers, who today exist mainly as static images and AI-generated text, could soon pick up on current TikTok trends or produce other short video clips. ByteDance could eventually offer the models directly to TikTok users.
The code for MagicAnimate is available on the project page on GitHub, along with a demo. The code for Animate Anyone is expected on GitHub soon; the team wants to make some improvements before the release.