Summary

Researchers at ETH Zurich and Google Zurich have presented InseRF, a new method for generating objects and inserting them into 3D scenes.

The method uses a text prompt and a 2D bounding box in a reference view to generate new objects in a neural radiance field (NeRF). According to the researchers' experiments, InseRF outperforms existing methods and can insert consistent objects into NeRFs without requiring explicit 3D information as input.

InseRF combines advances in NeRFs with advances in generative AI, such as methods that turn a single image into a 3D model and methods for 3D editing.

InseRF relies on diffusion models and NeRFs

To integrate a new 3D object into a NeRF, InseRF starts with a 2D image of the 3D scene, on which the user marks the area where the change should be made. The change is described by a text prompt, e.g. "a teacup on a table". InseRF then uses a diffusion model to generate a teacup in this 2D view and estimates the depth of the generated object. This data is then used to update the NeRF and generate the 3D cup.

Image: Shahbazi, Claessens et al.
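The article does not spell out how the estimated depth is turned into a 3D position, so the following is only a minimal sketch of the general idea, assuming a simple pinhole camera for the reference view. The names `PinholeCamera`, `backproject`, `project`, and `bbox_center_to_3d` are hypothetical and are not taken from the InseRF code; the sketch lifts the center of the user's 2D bounding box to a 3D point using a depth value and shows that projecting this point back lands on the same pixel.

```python
from dataclasses import dataclass


@dataclass
class PinholeCamera:
    """Hypothetical pinhole intrinsics (in pixels) for the reference view."""
    fx: float  # focal length along x
    fy: float  # focal length along y
    cx: float  # principal point, x
    cy: float  # principal point, y


def backproject(cam, u, v, depth):
    """Lift pixel (u, v) with depth along the optical axis to a camera-space point."""
    x = (u - cam.cx) * depth / cam.fx
    y = (v - cam.cy) * depth / cam.fy
    return (x, y, depth)


def project(cam, point):
    """Project a camera-space 3D point back to pixel coordinates."""
    x, y, z = point
    return (cam.fx * x / z + cam.cx, cam.fy * y / z + cam.cy)


def bbox_center_to_3d(cam, bbox, depth):
    """Place the center of a 2D box (u_min, v_min, u_max, v_max) at the given depth."""
    u_min, v_min, u_max, v_max = bbox
    u = 0.5 * (u_min + u_max)
    v = 0.5 * (v_min + v_max)
    return backproject(cam, u, v, depth)


if __name__ == "__main__":
    # Assumed example values: a 640x480 view and a box around the new object.
    cam = PinholeCamera(fx=500.0, fy=500.0, cx=320.0, cy=240.0)
    anchor = bbox_center_to_3d(cam, (300, 200, 380, 280), depth=2.0)
    print("3D anchor in camera space:", anchor)
    print("Re-projected pixel:", project(cam, anchor))
```

In a real pipeline the depth would come from a depth estimator and the point would be transformed into the scene's world coordinates; both steps are omitted here.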

This process makes it possible to create a new 3D object in a scene that is consistent across multiple views and can be placed anywhere. According to the team, the method also overcomes the challenges of 3D-consistent creation and placement of objects in different views, which is a major hurdle for generative 2D models.

InseRF outperforms alternatives

The researchers tested InseRF on real indoor and outdoor scenes from the MipNeRF-360 and Instruct-NeRF2NeRF datasets. The results show that InseRF can modify a scene locally and insert 3D-consistent objects.

Video: ETH Zurich / Google

However, the performance of InseRF is limited by the capabilities of the underlying generative 2D and 3D models. Future improvements to these models could easily be applied to the InseRF pipeline. The team plans to test additional methods in the future, for example to improve shadows and to match the quality of the generated object to that of its environment.

Further examples and information can be found on the InseRF project page.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
  • Researchers at ETH Zurich and Google Zurich have developed InseRF, a method for inserting objects into 3D scenes using text prompts and 2D bounding boxes.
  • InseRF combines advances in Neural Radiance Fields (NeRFs) and generative AI to insert consistent objects into 3D scenes without the need for explicit 3D information.
  • The method has been successfully tested on real indoor and outdoor scenes and shows potential for future improvements in generative 2D and 3D models.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.