MVDream creates impressive 3D renderings from text

Maximilian Schreiner

MVDream uses Stable Diffusion and NeRFs to generate some of the best 3D renderings yet from text prompts.

Researchers at ByteDance present MVDream (Multi-view Diffusion for 3D Generation), a diffusion model capable of generating high-quality 3D renderings from text prompts. Similar models already exist, but MVDream achieves comparatively high quality and avoids two core problems of alternative approaches.

Alternative approaches often struggle with the Janus problem and content drift: a generated baby Yoda ends up with multiple faces, or a generated plate of waffles changes the number and arrangement of the waffles depending on the viewing angle.

To solve this problem, ByteDance trains a diffusion model based on Stable Diffusion not only on the usual prompt-image pairs but also on multiple views of 3D objects. To this end, the researchers render a large dataset of 3D models from different perspectives and camera angles.

By seeing coherent views from different angles, the model learns to produce coherent 3D shapes instead of disjointed 2D images, the team says.
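The multi-view setup can be sketched in a few lines. The following is a toy illustration, not ByteDance's code: it builds one training sample by placing cameras at evenly spaced azimuth angles around an object and pairing the resulting poses with a text prompt (the function names and sample layout are assumptions for illustration).

```python
import numpy as np

def camera_pose(azimuth_deg, elevation_deg=0.0, radius=2.0):
    """Camera-to-world rotation and position for a camera orbiting the origin."""
    az = np.radians(azimuth_deg)
    el = np.radians(elevation_deg)
    # Camera position on a sphere around the object.
    pos = radius * np.array([np.cos(el) * np.cos(az),
                             np.cos(el) * np.sin(az),
                             np.sin(el)])
    # Look-at rotation: the camera's forward axis points at the origin.
    forward = -pos / np.linalg.norm(pos)
    right = np.cross(np.array([0.0, 0.0, 1.0]), forward)
    right /= np.linalg.norm(right)
    up = np.cross(forward, right)
    return np.stack([right, up, forward], axis=1), pos

def multiview_sample(prompt, n_views=4):
    """One training sample: a prompt paired with n_views camera poses."""
    azimuths = np.linspace(0.0, 360.0, n_views, endpoint=False)
    return {"prompt": prompt, "views": [camera_pose(a) for a in azimuths]}

sample = multiview_sample("a photo of an astronaut riding a horse", n_views=4)
```

In MVDream's training, each such sample would additionally carry the rendered image for every pose, so the diffusion model learns to denoise all views of one object jointly.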

Video: ByteDance

MVDream to get even better with SDXL

Specifically, given a text prompt, the model generates images of an object from several different perspectives, which the team then uses to train a NeRF as a 3D representation of the object.
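The second stage boils down to fitting one shared 3D representation so that its rendering matches the generated image from every camera. The sketch below is not MVDream's pipeline: it replaces the NeRF with a flat parameter vector and rendering with fixed linear projections, which reduces the multi-view fitting idea to least squares solved by gradient descent.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for a 3D representation: a flat parameter vector
# (a real NeRF is an MLP mapping position/direction to color and density).
scene = np.zeros(16)

# Four "views": each a fixed random projection of the scene, standing in
# for rendering from one camera, plus a target image per view.
true_scene = rng.normal(size=16)
projections = [rng.normal(size=(8, 16)) for _ in range(4)]
targets = [P @ true_scene for P in projections]

# Fit the shared representation so every view's rendering matches its
# target image -- the same multi-view supervision, as linear least squares.
lr = 0.004
for step in range(3000):
    grad = np.zeros_like(scene)
    for P, y in zip(projections, targets):
        grad += P.T @ (P @ scene - y)
    scene -= lr * grad

# Worst remaining per-view reconstruction error.
error = max(np.linalg.norm(P @ scene - y) for P, y in zip(projections, targets))
```

Because all views constrain the same parameters, the fit cannot drift between cameras, which is exactly the consistency a NeRF enforces in the real system.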

In direct comparison to alternative approaches, MVDream shows a significant jump in quality and avoids common artifacts such as the Janus problem or content drift.

Video: ByteDance

In an experiment, the team also shows that MVDream can learn new concepts via Dreambooth and then generate 3D views of a specific dog, for example.

Video: ByteDance

The team cites the still low resolution of 256 x 256 pixels and limited generalizability as limitations. However, ByteDance expects that both problems can be reduced or solved in the future by using larger diffusion models such as SDXL. To significantly improve the quality and style of 3D renderings, however, the team says that extensive training with a new dataset will likely be required.

More information and examples are available on the MVDream GitHub page.