
Tripo AI and Stability AI release an open-source image-to-3D model that generates 3D content in less than a second.

Researchers from Stability AI and Tripo AI have unveiled TripoSR, an AI model that reconstructs a 3D object from a single image in less than 0.5 seconds on an Nvidia A100. According to the team, TripoSR outperforms other open-source alternatives in both quality and speed.

Such models have great potential for the entertainment, gaming, industrial design, and architecture industries by enabling fast and efficient visualization of 3D objects.

TripoSR uses NeRF and a Vision Transformer

TripoSR takes a single RGB image as input, which serves as the basis for the subsequent 3D reconstruction. First, the image is processed by a pre-trained image encoder based on a vision transformer (DINOv1). This step converts the image into a set of latent vectors that encode both global and local features of the image and contain the information needed to reconstruct the 3D object.
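For readers who want a concrete picture of this first step, here is a minimal sketch of encoding a single RGB image into latent tokens with a DINO-style vision transformer via the Hugging Face transformers library. The checkpoint name and preprocessing shown here are illustrative assumptions; TripoSR's actual encoder and weights may differ.

```python
# Illustrative sketch only: encoding one RGB image into latent tokens with a
# DINO ViT from Hugging Face transformers. "facebook/dino-vitb16" is used here
# as an example checkpoint; TripoSR's actual encoder may differ.
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

processor = ViTImageProcessor.from_pretrained("facebook/dino-vitb16")
encoder = ViTModel.from_pretrained("facebook/dino-vitb16")

image = Image.open("object.png").convert("RGB")      # single RGB input image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = encoder(**inputs)

# One latent vector per image patch plus a [CLS] token; these tokens carry
# the global and local image features that the decoder conditions on.
latent_tokens = outputs.last_hidden_state            # shape: (1, num_tokens, hidden_dim)
print(latent_tokens.shape)
```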


A decoder then converts the latent vectors into a triplane NeRF representation, a 3D representation suitable for objects with complex shapes and textures. The decoder uses transformer layers that allow it to learn relationships between different parts of the triplane representation while integrating global and local image features.
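To make the triplane idea tangible: a 3D point is projected onto three orthogonal feature planes, features are sampled from each plane and aggregated, and a small MLP maps the result to color and density, as in a NeRF. The sketch below illustrates only that query step; all tensor shapes, the aggregation by summation, and the tiny MLP are assumptions for illustration, not TripoSR's actual implementation.

```python
# Minimal triplane-NeRF query sketch (illustrative; dimensions and MLP are assumptions).
import torch
import torch.nn.functional as F


def sample_plane(plane, coords_2d):
    # plane: (C, H, W) feature plane; coords_2d: (N, 2) in [-1, 1]
    grid = coords_2d.view(1, -1, 1, 2)                     # (1, N, 1, 2)
    feats = F.grid_sample(plane.unsqueeze(0), grid,
                          mode="bilinear", align_corners=True)
    return feats.squeeze(0).squeeze(-1).permute(1, 0)      # (N, C)


def query_triplane(planes, points, mlp):
    # planes: three (C, H, W) feature planes; points: (N, 3) in [-1, 1]
    f_xy = sample_plane(planes["xy"], points[:, [0, 1]])
    f_xz = sample_plane(planes["xz"], points[:, [0, 2]])
    f_yz = sample_plane(planes["yz"], points[:, [1, 2]])
    features = f_xy + f_xz + f_yz                          # aggregate the three plane features
    out = mlp(features)                                    # (N, 4): RGB + density
    return out[:, :3], out[:, 3]


# Toy usage with random planes and a tiny MLP.
C, H, W = 32, 64, 64
planes = {k: torch.randn(C, H, W) for k in ("xy", "xz", "yz")}
mlp = torch.nn.Sequential(torch.nn.Linear(C, 64), torch.nn.ReLU(), torch.nn.Linear(64, 4))
rgb, sigma = query_triplane(planes, torch.rand(1024, 3) * 2 - 1, mlp)
print(rgb.shape, sigma.shape)
```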

Unlike other approaches that require camera parameters, such as information about the camera's position in space, TripoSR "guesses" these parameters during training and inference. This increases the robustness of the model by eliminating the need for precise camera information.

To further improve performance, the team made additional optimizations, including curating a subset of realistic, high-quality 3D models from the Objaverse dataset for training.

Demo and a first ComfyUI node are already available

The source code and model weights of TripoSR are available for download under the MIT license, which permits use for commercial, personal, and research purposes.

Based on an image generated with Midjourney, TripoSR isolates the object and generates a simple 3D model. | Video: THE DECODER


A demo is available on Hugging Face, and a first community implementation for the ComfyUI interface already exists.

More examples, the code, and the model are available on GitHub and Hugging Face.
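For those who want to try the released checkpoint locally, the weights are hosted on the Hugging Face Hub under the stabilityai/TripoSR repository. The snippet below only downloads the files; the exact filenames are assumptions and may change, so check the model card and the repository's README (which documents a simple run script) before relying on them.

```python
# Illustrative: fetching the released TripoSR config and weights from the Hugging Face Hub.
# The repo id is "stabilityai/TripoSR"; the filenames below are assumptions and may differ.
from huggingface_hub import hf_hub_download

config_path = hf_hub_download(repo_id="stabilityai/TripoSR", filename="config.yaml")
weights_path = hf_hub_download(repo_id="stabilityai/TripoSR", filename="model.ckpt")
print(config_path, weights_path)
```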

Summary
  • Researchers at Tripo AI and Stability AI present TripoSR, an AI model that creates 3D models from a single image in less than 0.5 seconds and could be useful for applications in entertainment, gaming, industrial design, and architecture.
  • TripoSR processes an RGB image through a vision-transformer-based encoder, which converts it into latent vectors, and a decoder, which turns these vectors into a triplane NeRF representation for 3D reconstruction.
  • The model is available under the open-source MIT license, which permits commercial, personal, and research use.