After months of speculation, Midjourney has launched its first video model - a move the company describes as an early milestone toward AI systems that can simulate entire 3D worlds in real time.
The new "Image-to-Video" feature lets users turn any Midjourney image into a short animated clip. Animation is triggered via an "Animate" button in the Midjourney web interface. Users can choose between an automatic mode, where the system determines the movement, and a manual mode, where they describe how the animation should unfold.
Demo reel for Midjourney's new animation feature. | Video: Midjourney
There are two main settings: "Low motion" works best for scenes with steady cameras and slow movement, while "High motion" animates both the camera and subject more aggressively - though this can sometimes produce less accurate results, Midjourney says.
Each video can be extended by about four seconds, up to four times total. Users can also tweak the original image prompt with each extension.
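Combined with the roughly five-second base clips each job produces, the extension limit puts a ceiling on total clip length. A quick sketch of the arithmetic (constant names are illustrative; the numbers are Midjourney's approximate figures):

```python
# Back-of-the-envelope ceiling on clip length, assuming a roughly
# five-second base clip plus up to four extensions of about four
# seconds each (Midjourney's stated approximate figures).

BASE_CLIP_SECONDS = 5   # initial clip length per job
EXTENSION_SECONDS = 4   # seconds added per extension
MAX_EXTENSIONS = 4      # extension limit per clip

max_length = BASE_CLIP_SECONDS + MAX_EXTENSIONS * EXTENSION_SECONDS
print(max_length)  # → 21
```

So a fully extended clip tops out at around 21 seconds, though Midjourney gives all of these figures only as approximations.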
An image of a red-bearded man dancing in the rain was automatically animated and then manually extended with the follow-up prompt "dances and jumps." | Video: Midjourney Animate prompted by THE DECODER
Users can also animate images created outside of Midjourney by dragging them into the prompt bar and setting them as the "Start Frame." The desired motion is then described in a text prompt.
Prompt: "Turning like a wheel" | Video: Midjourney Animate prompted by THE DECODER
Midjourney hasn't published official specs for resolution, framerate, or bitrate, and there's no built-in upscaling yet. However, downloaded videos appear as 480p MP4 files at 24 frames per second.
Video generation costs about eight times as much as creating an image
The video feature is currently available only through the web interface. Each video job costs about eight times as much as an image job and generates four five-second clips. In practice, this comes out to roughly one image equivalent per second of video. Midjourney claims this is about 25 times cheaper than competing services.
For subscribers on the "Pro" tier or higher, Midjourney is also testing a "Video Relax Mode," which lets users generate videos without using their fast processing minutes, potentially lowering the cost per job. The company says pricing will be adjusted in the coming weeks based on demand and server load.
Midjourney describes this video model as a necessary intermediate step. The plan is to eventually combine video models, 3D elements, and real-time processing into a unified platform. Founder David Holz has long aimed to build a system capable of real-time world simulation. Lessons learned from building the video model are also feeding back into Midjourney's existing image tools.
Competition and Legal Pressure
Elsewhere in AI video, Google's new Veo 3 model is widely considered the frontrunner. Veo 3 can generate videos directly from text prompts, with no need for a starting image, and can add voices and sound effects - capabilities Midjourney's model currently lacks.
Meanwhile, Disney and Universal have filed a joint lawsuit against Midjourney, claiming that the AI image generator creates unauthorized images of copyrighted characters like Darth Vader and the Minions. The complaint, filed in U.S. District Court in California, accuses Midjourney of repeatedly copying copyrighted material despite previous warnings.
Both studios are seeking damages, a jury trial, and an order to block future use of protected characters. Similar copyright disputes involving Midjourney go back to 2023. So far, Midjourney has not responded publicly, and it's unclear what video data was used to train its new model.