
The leap from 2D to 3D poses challenges for existing diffusion methods. RenderDiffusion seems promising because it can render a 3D scene based on a single 2D image.

Diffusion methods have made great progress on 2D images over the past few months, and researchers are gradually seeing similar success with 3D objects. Google, for example, recently demonstrated 3DiM, which generates novel views of an object from a single 2D image.

Diffusion models currently achieve the best performance in both conditional and unconditional image generation, according to researchers at several UK universities and Adobe Research. So far, however, these models have not supported consistent 3D generation or reconstruction of objects from a single perspective.

Image: Titas Anciukevičius, Zexiang Xu, Matthew Fisher, Paul Henderson, Hakan Bilen, Niloy J. Mitra, Paul Guerrero

3D denoising

In their paper, the researchers present RenderDiffusion, which they describe as the first diffusion model for 3D generation and inference that can be trained using only monocular 2D supervision. The model generates a 3D scene from a single 2D image end-to-end, without relying on the multiview data required by approaches such as Gaudi.
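To make the idea of training with only 2D supervision concrete, here is a minimal, purely illustrative PyTorch sketch (not the authors' code): a denoiser maps a noised training image to a triplane-style 3D representation, a heavily simplified stand-in renderer projects that representation back to the input view, and the loss is computed entirely in 2D image space. The names TriplaneDenoiser and render_2d, the toy renderer, and the simplified noise schedule are assumptions for illustration only.

```python
# Illustrative sketch of 2D-only supervision for a 3D-aware denoiser.
# All module and function names here are hypothetical, not the paper's code.
import torch
import torch.nn as nn

class TriplaneDenoiser(nn.Module):
    """Toy stand-in for a network that maps a noisy 2D image to a
    triplane-style 3D representation. Real models use far larger encoders."""
    def __init__(self, img_ch=3, plane_ch=8):
        super().__init__()
        self.plane_ch = plane_ch
        self.encoder = nn.Sequential(
            nn.Conv2d(img_ch, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3 * plane_ch, 3, padding=1),  # three feature planes
        )

    def forward(self, noisy_img):
        feats = self.encoder(noisy_img)               # (B, 3*C, H, W)
        b, _, h, w = feats.shape
        return feats.view(b, 3, self.plane_ch, h, w)  # (B, 3 planes, C, H, W)

def render_2d(triplanes, camera):
    """Placeholder differentiable renderer: a real implementation ray-marches a
    NeRF-style field decoded from the triplanes using the camera parameters.
    Here we just mix the planes so the example runs end to end."""
    mixed = triplanes.mean(dim=1)                     # (B, C, H, W)
    return mixed[:, :3]                               # treat first 3 channels as RGB

model = TriplaneDenoiser()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

clean = torch.rand(4, 3, 32, 32)       # a batch of single-view training images
camera = None                          # camera parameters must be known per image

t = torch.rand(4, 1, 1, 1)             # noise level per sample
noise = torch.randn_like(clean)
noisy = (1 - t) * clean + t * noise    # simplified forward diffusion

triplanes = model(noisy)               # denoise *into 3D*
recon = render_2d(triplanes, camera)   # render back to the input view
loss = ((recon - clean) ** 2).mean()   # 2D reconstruction loss, no 3D labels needed
opt.zero_grad()
loss.backward()
opt.step()
```

The key design point this sketch tries to capture is that the only supervision signal is a pixel loss on the rendered view, so no 3D ground truth is ever required.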


At the heart of the method is a customized denoising architecture: at each diffusion step, the model produces an intermediate volumetric 3D representation of the scene, which can then be rendered from any viewpoint. This diffusion-based approach also allows 2D inpainting to be used to modify the generated 3D scenes.
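Continuing the sketch above (and reusing its hypothetical TriplaneDenoiser and render_2d), the reverse diffusion loop might look roughly like this: every step denoises the current image into a 3D representation, renders it back to the input view to continue the chain, and the final representation can be rendered from any other camera. The update rule below is a simplification for illustration, not the paper's exact sampler.

```python
import torch

# Reuses the hypothetical TriplaneDenoiser and render_2d from the training sketch above.
model = TriplaneDenoiser()
steps = 50

x = torch.randn(1, 3, 32, 32)              # start from pure noise (unconditional generation)
input_camera, novel_camera = None, None    # the method requires known camera parameters

with torch.no_grad():
    for i in reversed(range(1, steps + 1)):
        triplanes = model(x)                          # each step yields an explicit 3D representation
        x0_pred = render_2d(triplanes, input_camera)  # denoised estimate, rendered in the input view
        # Simplified update rule: blend toward the prediction as the noise level shrinks.
        x = (1 - 1 / i) * x + (1 / i) * x0_pred

    novel_view = render_2d(triplanes, novel_camera)   # render the final 3D scene from any viewpoint
```

Because known pixels can be re-imposed on x at every step, standard 2D inpainting masks carry over naturally to editing the generated 3D scene.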

Image: Titas Anciukevičius, Zexiang Xu, Matthew Fisher, Paul Henderson, Hakan Bilen, Niloy J. Mitra, Paul Guerrero

Compared with similar generative 3D models such as the GAN-based EG3D and PixelNeRF, which reconstructs scenes from one or a few input views, RenderDiffusion produces 3D objects that are more faithful to the input image as well as sharper and more detailed, the researchers write.

A major disadvantage of RenderDiffusion is that training images must be labeled with camera parameters. In addition, generation across different object categories remains difficult.

These limitations could be overcome by estimating camera parameters and object bounding boxes, and by using an object-centric coordinate system. In this way, the system could also generate scenes with multiple objects, the researchers write.

RenderDiffusion could enable "full 3D generation at scale"

The researchers see their paper as a significant contribution to the 3D industry: "We believe that our work promises to enable full 3D generation at scale when trained on massive image collections, thus circumventing the need to have large-scale 3D model collections for supervision."


Future work could add object and material editing to enable an "expressive 3D-aware 2D image editing workflow."

The researchers plan to publish their code and datasets on GitHub soon.

Summary
  • Diffusion models generate high-quality 2D images, but similarly capable systems for 3D generation do not yet exist.
  • This could change with the new RenderDiffusion method, which can render a volumetric 3D scene based on a single 2D image.
  • The research team believes their work will enable large-scale 3D content generation.