Dreamfusion combines Google's large AI image model Imagen with NeRF's 3D capabilities.
Dreamfusion is the evolution of Dream Fields, a generative 3D AI system that Google unveiled in late 2021. For Dream Fields, Google combined OpenAI's image analysis model CLIP with Neural Radiance Fields (NeRF), which allow a neural network to store 3D models.
Dream Fields leveraged NeRF's ability to generate 3D views and combined it with CLIP's ability to evaluate the content of images. After a text input, an untrained NeRF model renders a view from a single random viewpoint, and CLIP evaluates how well it matches the text. This feedback is used as a correction signal for the NeRF model. The process is repeated up to 20,000 times from different viewpoints until a 3D model matching the text description emerges. Dreamfusion further develops this approach.
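One common way to obtain the "different viewpoints" in such a loop is to sample camera positions on a sphere around the object. The following is a minimal sketch of that step only. The radius, the elevation range, and the function name `sample_camera` are illustrative assumptions, not the settings used by Dream Fields.

```python
import math
import random

def sample_camera(radius=4.0, min_elev_deg=0.0, max_elev_deg=60.0, rng=random):
    """Return (azimuth_deg, elevation_deg, position) for a random camera on a sphere.

    The camera is assumed to look at the origin, where the object sits.
    All default values are illustrative assumptions.
    """
    azimuth = rng.uniform(0.0, 360.0)
    elevation = rng.uniform(min_elev_deg, max_elev_deg)
    az, el = math.radians(azimuth), math.radians(elevation)
    # Spherical to Cartesian coordinates, with y pointing up.
    position = (
        radius * math.cos(el) * math.sin(az),
        radius * math.sin(el),
        radius * math.cos(el) * math.cos(az),
    )
    return azimuth, elevation, position

rng = random.Random(0)  # fixed seed so the sketch is repeatable
cameras = [sample_camera(rng=rng) for _ in range(5)]
for azimuth, elevation, position in cameras:
    print(f"azimuth={azimuth:6.1f}  elevation={elevation:5.1f}  position={position}")
```

Each sampled camera would then be used to render one view, which the image model scores against the text.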
From 2D images to 3D models
Dreamfusion performs text-to-3D synthesis based on Imagen, Google's pre-trained 2D text-to-image diffusion model. Google replaces OpenAI's CLIP, which can also be used for 3D generation, with a new loss based on Imagen. According to Google, this loss "could enable many new applications of pre-trained diffusion models."
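The paper calls this Imagen-based loss Score Distillation Sampling (SDS): a view is rendered, noise is added, the frozen diffusion model predicts that noise, and the difference between predicted and actual noise serves as the update direction for the 3D parameters. The following is a toy one-dimensional sketch of that idea, not Google's implementation. It assumes a stand-in "diffusion model" that knows a Gaussian target exactly (`MU`, `S`), a trivial renderer in which the image is the parameter itself, and a constant weighting. All names and values are illustrative.

```python
import math
import random

MU, S = 2.0, 0.5  # toy "data distribution" N(MU, S^2) known to the stand-in model

def predict_noise(z_t, alpha, sigma):
    """Stand-in for the frozen diffusion model: exact noise estimate for Gaussian data."""
    return sigma * (z_t - alpha * MU) / (alpha**2 * S**2 + sigma**2)

def sds_step(theta, rng, lr=0.02):
    """One update: render, add noise, query the model, nudge theta."""
    x = theta                       # toy renderer: image equals parameter, dx/dtheta = 1
    t = rng.uniform(0.02, 0.98)     # random noise level
    alpha = math.cos(0.5 * math.pi * t)
    sigma = math.sin(0.5 * math.pi * t)
    eps = rng.gauss(0.0, 1.0)       # the noise that is actually added
    z_t = alpha * x + sigma * eps   # noised render
    # SDS-style gradient: (predicted noise - added noise) * dx/dtheta, with weight 1.
    grad = predict_noise(z_t, alpha, sigma) - eps
    return theta - lr * grad

rng = random.Random(0)
theta = -1.0  # arbitrary starting value, far from MU
for _ in range(5000):
    theta = sds_step(theta, rng)
print(f"theta after optimization: {theta:.2f} (target MU = {MU})")
```

Under these assumptions, `theta` drifts from its starting value toward `MU`, the value the stand-in model considers most likely. The model itself is never updated; only the rendered parameter changes.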
Because the loss relies only on a pre-trained 2D model, 3D generation does not require 3D training data, which is not available at the required scale. Instead, Dreamfusion optimizes the 3D representation using 2D views of the object from different perspectives, with Imagen judging how well each view matches the text. The research team used view-dependent prompts such as "front view" or "rear view" for this purpose. The process runs automatically.
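A view-dependent prompt can be built by appending a short view description to the text input, depending on where the camera sits. The sketch below shows one way this could look. The angle thresholds, the extra labels "side view" and "overhead view", the function name `view_prompt`, and the example prompt are assumptions for illustration, not values confirmed by the article.

```python
def view_prompt(prompt, azimuth_deg, elevation_deg, overhead_above_deg=60.0):
    """Append a coarse view description to the prompt based on the camera angles.

    Thresholds are illustrative assumptions.
    """
    if elevation_deg > overhead_above_deg:
        return f"{prompt}, overhead view"
    azimuth = azimuth_deg % 360.0  # normalize to [0, 360)
    if azimuth < 45.0 or azimuth >= 315.0:
        view = "front view"
    elif 135.0 <= azimuth < 225.0:
        view = "rear view"
    else:
        view = "side view"
    return f"{prompt}, {view}"

# Hypothetical example prompt, rendered from four camera directions.
for azimuth in (0, 90, 180, 270):
    print(view_prompt("a small wooden chair", azimuth, elevation_deg=20))
```

The camera angle used for rendering and the text passed to the image model then stay consistent, so the model is not asked to see a face on the back of a head.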
Compared to Dream Fields, Dreamfusion creates relightable 3D objects from text input with higher-quality appearance, depth, and surface normals. Multiple 3D models created with Dreamfusion can also be merged into one scene.
"Our approach requires no 3D training data and no modifications to the image diffusion model, demonstrating the effectiveness of pretrained image diffusion models as priors," Google's research team writes.
Exporting generated 3D models for standard 3D tools
The generated NeRF models can be exported as meshes using the Marching Cubes algorithm and then integrated into popular 3D renderers or modeling software.
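Marching Cubes turns a sampled density volume into vertices and triangles; implementations exist in common libraries, for example `skimage.measure.marching_cubes` in scikit-image. Once a mesh is available, it has to be written in a format that 3D tools can read. The sketch below assumes Wavefront OBJ as the target format, which the article does not specify, and uses a hand-written tetrahedron as a hypothetical stand-in for real Marching Cubes output.

```python
def mesh_to_obj(vertices, faces):
    """Serialize a triangle mesh as Wavefront OBJ text.

    vertices: list of (x, y, z) floats
    faces: list of (i, j, k) zero-based vertex indices
    """
    lines = [f"v {x:.6f} {y:.6f} {z:.6f}" for x, y, z in vertices]
    # OBJ face indices are one-based, so shift each index by one.
    lines += [f"f {i + 1} {j + 1} {k + 1}" for i, j, k in faces]
    return "\n".join(lines) + "\n"

# Hypothetical stand-in for Marching Cubes output: a single tetrahedron.
vertices = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
faces = [(0, 2, 1), (0, 1, 3), (0, 3, 2), (1, 2, 3)]

obj_text = mesh_to_obj(vertices, faces)
print(obj_text)
```

The resulting text can be saved with an `.obj` extension and opened in most modeling software.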
"We're excited to incorporate our methods with open-source models and enable a new future for 3D generation," wrote contributing Google Brain researcher Ben Poole on Twitter.
An overview of 3D models generated with Dreamfusion is available on GitHub.