A new AI system developed by Google Research and Google DeepMind transforms photos into realistic 3D scenes in a matter of seconds, as long as it knows where the camera was positioned.

The system, called Bolt3D, turns photos into complete three-dimensional scenes in just 6.25 seconds on an Nvidia H100 GPU, a task that typically takes other systems minutes or hours.

Bolt3D first figures out where each pixel belongs in 3D space and what color it should be. A second model then determines how transparent each point should be and how it extends through space.
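
As a rough illustration of that two-step split, here is a minimal Python sketch. It is not Bolt3D's code: the `Gaussian` record, the `assemble_gaussians` helper, and the toy values are all hypothetical, and the per-pixel grids stand in for whatever the two models actually output. It only shows how position and color from one step could be paired with opacity and extent from another, one Gaussian per pixel.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Gaussian:
    """Hypothetical per-pixel record; real 3D Gaussians usually also carry a rotation."""
    position: Vec3  # step 1: where the pixel sits in 3D space
    color: Vec3     # step 1: RGB in [0, 1]
    opacity: float  # step 2: 0.0 = fully transparent, 1.0 = opaque
    scale: Vec3     # step 2: extent along each axis


def assemble_gaussians(positions, colors, opacities, scales) -> List[Gaussian]:
    """Pair up four row-major H x W grids into a flat list, one Gaussian per pixel."""
    gaussians = []
    for pos_row, col_row, op_row, sc_row in zip(positions, colors, opacities, scales):
        for pos, col, op, sc in zip(pos_row, col_row, op_row, sc_row):
            gaussians.append(Gaussian(pos, col, op, sc))
    return gaussians


# Toy 1 x 2 "image": made-up outputs standing in for the two models.
positions = [[(0.0, 0.0, 2.0), (0.1, 0.0, 2.1)]]
colors = [[(0.9, 0.1, 0.1), (0.1, 0.9, 0.1)]]
opacities = [[1.0, 0.3]]
scales = [[(0.05, 0.05, 0.05), (0.08, 0.08, 0.02)]]

gaussians = assemble_gaussians(positions, colors, opacities, scales)
print(len(gaussians), "Gaussians assembled")
```

Keeping one record per pixel means the output has the same grid shape as the input image, which is what lets the scene be stored as 2D grids.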

Figure: Overview of the Bolt3D method: input from multiple images and target poses, latent diffusion models for appearance and geometry, VAE decoder, geometry decoding, Gaussians for splatter images, and the resulting 3D Gaussian scene.
Bolt3D combines diffusion models, VAE decoders, and trained geometry decoding to create a renderable 3D scene from images. | Image: Szymanowicz et al.

The system relies on a technique called "Gaussian splatting" to store its data, organizing the 3D scene as three-dimensional Gaussian functions laid out in 2D grids. Each Gaussian stores a position, a color, a transparency value, and a spatial extent, letting users view the scene from any angle in real time. To keep files manageable, the system strips out transparent areas and compresses the remaining data.
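
To make the "strips out transparent areas" step concrete, here is a small Python sketch under stated assumptions. The `Splat` layout, the `prune_transparent` helper, and the `min_opacity` cutoff of 0.05 are illustrative choices, not values or names from the paper, and the compression step is left out because the article does not say how it works.

```python
from typing import List, NamedTuple


class Splat(NamedTuple):
    """One Gaussian in the grid (hypothetical, simplified layout)."""
    position: tuple  # (x, y, z) center in 3D
    color: tuple     # (r, g, b) in [0, 1]
    opacity: float   # 0.0 = invisible, 1.0 = fully opaque
    scale: tuple     # extent along each axis


def prune_transparent(grid: List[List[Splat]], min_opacity: float = 0.05) -> List[Splat]:
    """Flatten a 2D grid of splats, dropping those below the opacity cutoff."""
    return [s for row in grid for s in row if s.opacity >= min_opacity]


# Toy 2 x 2 grid: two visible splats, two (nearly) transparent ones.
grid = [
    [Splat((0.0, 0.0, 1.0), (0.8, 0.2, 0.2), 0.90, (0.1, 0.1, 0.1)),
     Splat((0.1, 0.0, 1.0), (0.0, 0.0, 0.0), 0.00, (0.1, 0.1, 0.1))],
    [Splat((0.0, 0.1, 1.2), (0.0, 0.0, 0.0), 0.02, (0.1, 0.1, 0.1)),
     Splat((0.1, 0.1, 1.2), (0.2, 0.6, 0.9), 0.50, (0.2, 0.2, 0.2))],
]

kept = prune_transparent(grid)
print(f"kept {len(kept)} of {sum(len(row) for row in grid)} splats")
```

On this toy grid the script prints `kept 2 of 4 splats`. Splats that contribute almost nothing to the rendered image are dropped before storage, so the file only pays for Gaussians a viewer could actually see.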

Video: Szymanowicz et al.

Breaking new ground in 3D generation

Tests show Bolt3D performing significantly better than existing fast methods like Flash3D and DepthSplat. While those systems can only blur areas they can't see, Bolt3D actually generates realistic content for hidden parts of scenes.

This capability comes from an AI model designed specifically for spatial data. The researchers found that regular models trained on photos alone couldn't handle the complexities of 3D information.

To build this capability, the team trained Bolt3D on about 300,000 3D scenes, using a mix of photo-based reconstructions and computer-generated models. This extensive dataset helps the system make educated guesses about parts of scenes it can't fully see.

Video: Szymanowicz et al.

The system still has its limitations. It struggles with very fine details (anything less than eight pixels wide), transparent materials like glass, and highly reflective surfaces. The quality of results also depends heavily on how the photos were taken and how large the final scene needs to be.

Even with these limitations, Bolt3D appears to be a step forward in 3D content creation. The paper suggests that its speed could make large-scale 3D scene generation practical for the first time. While there's no word yet on public availability, interested users can find more information and interactive demos on the project's website.

The development comes as Stability AI releases its own SPAR3D system, which can also generate 3D objects from single images very quickly. The key difference: while SPAR3D works with individual objects, Bolt3D can handle entire scenes.

Summary
  • Google Research and Google DeepMind have created an AI system called Bolt3D that generates realistic 3D scenes from photographs in just 6.25 seconds on an Nvidia H100 GPU, a significant improvement over previous methods that took minutes or hours.
  • Bolt3D works in two steps: first, one model predicts each pixel's 3D position and color, and then a second model determines each point's transparency and spatial extent. The data is stored in a "Gaussian splatting" format, allowing for real-time visualization.
  • The AI has been trained on about 300,000 3D scenes and can realistically fill in non-visible areas. However, it has limitations with fine structures smaller than eight pixels and struggles with glass and reflective surfaces.
Jonathan writes for THE DECODER about how AI tools can make our work and creative lives better.