
After months of speculation, Midjourney has launched its first video model - a move the company describes as an early milestone toward AI systems that can simulate entire 3D worlds in real time.

The new "Image-to-Video" feature lets users turn any Midjourney image into a short animated clip. Animation is handled through a new "Animate" button in the Midjourney web interface. Users can choose between an automatic mode, where the system determines the movement, and a manual mode, where they describe how the animation should unfold.

Demo reel for Midjourney's new animation feature. | Video: Midjourney

There are two main settings: "Low motion" works best for scenes with steady cameras and slow movement, while "High motion" animates both the camera and subject more aggressively - though this can sometimes produce less accurate results, Midjourney says.

Each video can be extended by about four seconds, up to four times total. Users can also tweak the original image prompt with each extension.

An image of a red-bearded man dancing in the rain was automatically animated and then manually extended with the follow-up prompt "dances and jumps." | Video: Midjourney Animate prompted by THE DECODER

Users can also animate images created outside of Midjourney by dragging them into the prompt bar and setting them as the "Start Frame." The desired motion is then described in a text prompt.

Prompt: "Turning like a wheel" | Video: Midjourney Animate prompted by THE DECODER

Midjourney hasn't published official specs for resolution, framerate, or bitrate, and there's no built-in upscaling yet. However, downloaded videos appear as 480p MP4 files at 24 frames per second.

Video generation costs about eight times as much as creating an image

The video feature is currently available only through the web interface. Each video job costs about eight times as much as an image job and generates four five-second clips. In practice, this comes out to roughly one image equivalent per second of video. Midjourney claims this is about 25 times cheaper than competing services.
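The pricing above reduces to simple arithmetic. The sketch below uses the figures reported in this article (a video job costs about 8x an image job and yields four five-second clips, extendable by about four seconds up to four times); treating one image job as the unit of cost, and assuming an image job yields the usual four-image grid, are simplifications for illustration:

```python
# Back-of-the-envelope cost math for Midjourney video jobs,
# using the figures reported above. Costs are in units of one image job.
IMAGE_JOB_COST = 1.0                   # baseline: one image job
VIDEO_JOB_COST = 8 * IMAGE_JOB_COST    # a video job costs ~8x an image job
CLIPS_PER_JOB = 4                      # each video job returns four clips
SECONDS_PER_CLIP = 5                   # each clip is about five seconds long

total_seconds = CLIPS_PER_JOB * SECONDS_PER_CLIP   # 20 s of video per job
cost_per_second = VIDEO_JOB_COST / total_seconds   # 0.4 image jobs per second

# Assumption: an image job typically produces a four-image grid, so per single
# image this is ~1.6 image-equivalents per second, the same ballpark as the
# "roughly one image equivalent per second" rule of thumb quoted above.
images_per_second = cost_per_second * 4

# Extensions add ~4 s each, up to four times, for a maximum clip length:
max_extended = SECONDS_PER_CLIP + 4 * 4            # 5 + 16 = 21 s

print(f"{cost_per_second:.2f} image jobs per second of video")
print(f"max clip length after extensions: {max_extended} s")
```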

For subscribers on the "Pro" tier or higher, Midjourney is also testing a "Video Relax Mode," which lets users generate videos without using their fast processing minutes, potentially lowering the cost per job. The company says pricing will be adjusted in the coming weeks based on demand and server load.

Midjourney describes this video model as a necessary intermediate step. The plan is to eventually combine video models, 3D elements, and real-time processing into a unified platform. Founder David Holz has long aimed to build a system capable of real-time world simulation. Lessons learned from building the video model are also feeding back into Midjourney's existing image tools.

Competition and Legal Pressure

Elsewhere in AI video, Google's new Veo 3 model is widely considered the frontrunner: it generates videos directly from text prompts, with no starting image required, and can add voices and sound effects.

Meanwhile, Disney and Universal have filed a joint lawsuit against Midjourney, claiming that the AI image generator creates unauthorized images of trademarked characters like Darth Vader and the Minions. The complaint, filed in U.S. District Court in California, accuses Midjourney of repeatedly copying copyrighted material despite previous warnings.

Both studios are seeking damages, a jury trial, and an order to block future use of protected characters. Similar copyright disputes involving Midjourney go back to 2023. So far, Midjourney has not responded publicly, and it's unclear what video data was used to train its new model.

Summary
  • Midjourney has launched version 1 of its video model, introducing an "image-to-video" feature that lets users turn images into short videos with a choice of animation modes.
  • Users can animate external images using a text prompt, and each video can be extended by about four seconds, with animations either generated automatically or customized.
  • Creating a video costs about eight times more than generating an image and results in four videos, each five seconds long. Founder David Holz describes this release as a step toward future systems capable of real-time 3D world simulation.
Matthias is the co-founder and publisher of THE DECODER, exploring how AI is fundamentally changing the relationship between humans and computers.