MVDream uses Stable Diffusion and NeRFs to generate some of the best 3D renderings yet from text prompts.

Researchers at ByteDance present MVDream (Multi-view Diffusion for 3D Generation), a diffusion model capable of generating high-quality 3D renderings from text prompts. Similar models already exist, but MVDream achieves comparatively high quality and avoids two core problems of alternative approaches.

These alternatives often struggle with the Janus problem and with content drift: a generated baby Yoda sprouts multiple faces, for example, or a generated plate of waffles changes the number and arrangement of its waffles depending on the viewing angle.

To solve these problems, ByteDance trains a diffusion model (in this case Stable Diffusion) not only on the usual prompt-image pairs but also on multiple views of the same 3D object. To build this training data, the researchers render a large dataset of 3D models from different perspectives and camera angles.
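The article doesn't detail the rendering setup. As a rough, hypothetical sketch (the function names, camera rig, and parameters here are assumptions, not ByteDance's code), such a multi-view dataset can be produced by placing virtual cameras at evenly spaced azimuths around each object and computing look-at extrinsics:

```python
import numpy as np

def look_at(eye, target=np.zeros(3), up=np.array([0.0, 0.0, 1.0])):
    """Build a 4x4 camera-to-world matrix that looks from `eye` toward `target`."""
    forward = target - eye
    forward /= np.linalg.norm(forward)
    right = np.cross(forward, up)
    right /= np.linalg.norm(right)
    cam_up = np.cross(right, forward)
    pose = np.eye(4)
    pose[:3, 0] = right
    pose[:3, 1] = cam_up
    pose[:3, 2] = -forward  # OpenGL convention: the camera looks down -z
    pose[:3, 3] = eye
    return pose

def sample_orbit_cameras(n_views=4, radius=2.0, elevation_deg=15.0):
    """Evenly spaced azimuths at a fixed elevation; a shared rig per object
    is an assumption made for this illustration."""
    elev = np.deg2rad(elevation_deg)
    poses = []
    for azimuth in np.linspace(0.0, 2 * np.pi, n_views, endpoint=False):
        eye = radius * np.array([
            np.cos(elev) * np.cos(azimuth),
            np.cos(elev) * np.sin(azimuth),
            np.sin(elev),
        ])
        poses.append(look_at(eye))
    return poses  # each pose would be handed to a renderer to produce one view

poses = sample_orbit_cameras()
print(poses[0].round(3))
```

Each rendered view, paired with its camera pose and the object's caption, then serves as one training example for the multi-view diffusion model.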


By seeing coherent views from different angles, the model learns to produce coherent 3D shapes instead of disjointed 2D images, the team says.

Video: ByteDance

MVDream to get even better with SDXL

Specifically, given a text prompt, the model generates images of an object from several consistent perspectives, which the team then uses to train a NeRF as a 3D representation of the object.
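In the paper this coupling is tighter than a simple two-stage pipeline (the diffusion model guides the NeRF via score distillation), but the basic idea the article describes, fitting a 3D representation to a handful of generated views, can be sketched as follows. The tiny MLP, toy volume renderer, and random stand-in images are all illustrative assumptions, not MVDream's actual pipeline:

```python
import torch

# A tiny implicit 3D model: maps a 3D point to (r, g, b, density).
mlp = torch.nn.Sequential(
    torch.nn.Linear(3, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 4),
)

def render_view(pose, res=16, n_steps=32, near=0.5, far=3.5):
    """Volume-render the MLP from a 4x4 camera-to-world pose (toy pinhole camera)."""
    ys, xs = torch.meshgrid(torch.linspace(-0.5, 0.5, res),
                            torch.linspace(-0.5, 0.5, res), indexing="ij")
    dirs_cam = torch.stack([xs, -ys, -torch.ones_like(xs)], dim=-1)  # image plane at z=-1
    dirs = dirs_cam.reshape(-1, 3) @ pose[:3, :3].T                  # rotate rays to world
    origins = pose[:3, 3].expand_as(dirs)
    ts = torch.linspace(near, far, n_steps)
    pts = origins[:, None] + dirs[:, None] * ts[None, :, None]       # (rays, steps, 3)
    rgb_sigma = mlp(pts)
    rgb = torch.sigmoid(rgb_sigma[..., :3])
    sigma = torch.relu(rgb_sigma[..., 3])
    delta = (far - near) / n_steps
    alpha = 1 - torch.exp(-sigma * delta)                            # per-step opacity
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[:, :1]),
                                     1 - alpha + 1e-10], dim=1), dim=1)[:, :-1]
    weights = alpha * trans                                          # volume rendering weights
    return (weights[..., None] * rgb).sum(dim=1).reshape(res, res, 3)

# `views` would be the diffusion model's generated images with their camera poses;
# random tensors stand in here so the sketch runs on its own.
views = [(torch.eye(4), torch.rand(16, 16, 3)) for _ in range(4)]
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)
for step in range(100):
    pose, target = views[step % len(views)]
    loss = torch.nn.functional.mse_loss(render_view(pose), target)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

Because all views come from the same multi-view-consistent generation, the photometric losses from different poses pull the 3D representation toward a single coherent shape.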

In direct comparison to alternative approaches, MVDream shows a significant jump in quality and avoids common artifacts such as the Janus problem or content drift.

Video: ByteDance


In an experiment, the team also shows that MVDream can learn new concepts via DreamBooth fine-tuning and then generate 3D views of a specific dog, for example.

Video: ByteDance

The team cites the still low resolution of 256 x 256 pixels and limited generalizability as limitations. However, ByteDance expects that both problems can be reduced or solved in the future by using larger diffusion models such as SDXL. To significantly improve the quality and style of 3D renderings, however, the team says that extensive training with a new dataset will likely be required.

More information and examples are available on the MVDream GitHub page.

Summary
  • ByteDance researchers are developing MVDream, a diffusion model that creates high-quality 3D renderings from text prompts while avoiding some of the major problems of the past.
  • To produce coherent 3D shapes instead of disjointed 2D images, the model trains on multiple views of 3D objects from different perspectives.
  • Limitations include the low resolution of 256 x 256 pixels and limited generalizability, but ByteDance expects that future larger diffusion models such as SDXL could reduce these problems.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.