Tripo AI and Stability AI release an open-source image-to-3D model that generates 3D content in less than a second.
Researchers from Stability AI and Tripo AI have unveiled TripoSR, an AI model that reconstructs a 3D object from a single image in less than 0.5 seconds on an Nvidia A100. According to the team, TripoSR outperforms other open-source alternatives both quantitatively and qualitatively.
Such models have great potential for the entertainment, gaming, industrial design, and architecture industries by enabling fast and efficient visualization of 3D objects.
TripoSR uses NeRF and Vision Transformer
TripoSR uses a single RGB image as input. This image serves as the basis for the subsequent 3D reconstruction. First, the image is processed by a pre-trained image encoder based on a vision transformer (DINOv1). This step converts the image into a set of latent vectors that encode both global and local features of the image. These vectors contain the information necessary to reconstruct the 3D object.
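To illustrate the first step of a vision transformer encoder, here is a minimal sketch of how an image is cut into patches that become the model's input tokens. The function name `patchify` and the tiny 4x4 example are illustrative assumptions, not TripoSR's actual code or settings; a real encoder would also project each patch into an embedding and run it through attention layers, which is omitted here.

```python
def patchify(image, patch_size):
    """Split an H x W image (nested lists of pixels) into flattened
    square patches, returned in row-major order."""
    height, width = len(image), len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image size must be divisible by patch size")

    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            # Collect all pixels of one patch into a single flat list.
            patch = [
                image[row][col]
                for row in range(top, top + patch_size)
                for col in range(left, left + patch_size)
            ]
            patches.append(patch)
    return patches


# Hypothetical toy input: a 4x4 single-channel "image" with pixel values 0..15.
image = [[row * 4 + col for col in range(4)] for row in range(4)]
tokens = patchify(image, 2)
print(len(tokens), "patches of", len(tokens[0]), "pixels each")
```

Each patch corresponds to one latent vector after encoding, which is how the encoder can retain local detail alongside global context.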
A decoder then converts the latent vectors into a triplane NeRF representation, a 3D representation suitable for objects with complex shapes and textures. The decoder uses transformer layers that allow it to learn relationships between different parts of the triplane representation while integrating global and local image features.
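The idea behind a triplane representation can be sketched as follows: a 3D point is projected onto three axis-aligned feature planes (XY, XZ, YZ), each plane is sampled with bilinear interpolation, and the three feature vectors are combined. This is a simplified sketch under assumptions: the names `bilinear` and `query_triplane`, the nested-list plane layout, and the choice to concatenate features are illustrative, and it is not TripoSR's implementation. In a full NeRF pipeline, a small neural network would then map the combined features to color and density; that step is left out.

```python
import math


def bilinear(plane, u, v):
    """Sample a feature plane (H x W x C nested lists) at continuous
    coordinates u (horizontal) and v (vertical), both in [0, 1]."""
    height, width = len(plane), len(plane[0])
    x = min(max(u, 0.0), 1.0) * (width - 1)
    y = min(max(v, 0.0), 1.0) * (height - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0

    features = []
    for c in range(len(plane[0][0])):
        top = plane[y0][x0][c] * (1 - fx) + plane[y0][x1][c] * fx
        bottom = plane[y1][x0][c] * (1 - fx) + plane[y1][x1][c] * fx
        features.append(top * (1 - fy) + bottom * fy)
    return features


def query_triplane(planes, point):
    """Look up features for a 3D point in [-1, 1]^3 by projecting it
    onto the XY, XZ, and YZ planes and concatenating the samples."""
    x, y, z = [(coord + 1.0) / 2.0 for coord in point]  # map to [0, 1]
    return (
        bilinear(planes["xy"], x, y)
        + bilinear(planes["xz"], x, z)
        + bilinear(planes["yz"], y, z)
    )


def constant_plane(value, size=2):
    """Hypothetical helper: a size x size plane with one constant channel."""
    return [[[value] for _ in range(size)] for _ in range(size)]


planes = {
    "xy": constant_plane(1.0),
    "xz": constant_plane(2.0),
    "yz": constant_plane(3.0),
}
print(query_triplane(planes, (0.0, 0.0, 0.0)))
```

Storing three 2D planes instead of a dense 3D grid keeps memory growth quadratic rather than cubic in resolution, which is one reason triplanes are a popular compact 3D representation.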
Unlike other approaches that require camera parameters, such as information about the camera's position in space, TripoSR "guesses" these parameters during training and inference. This increases the robustness of the model by eliminating the need for precise camera information.
To further improve performance, the team made additional optimizations, including curating the training data by selecting only realistic, high-quality 3D models from the Objaverse dataset.
Demo and a first ComfyUI node are already available
The source code and model weights of TripoSR are available for download under the MIT license, which permits use for commercial, personal, and research purposes.
Based on an image generated via Midjourney, TripoSR isolates the object from its background and generates a simple 3D model. | Video: THE DECODER
A demo is available on Hugging Face, and a first community implementation exists for the ComfyUI interface.
More examples, the code, and the model are available on GitHub and Hugging Face.