
Tripo AI and Stability AI release an open-source image-to-3D model that generates 3D content in less than a second.

Researchers from Stability AI and Tripo AI have unveiled TripoSR, an AI model that reconstructs a 3D object from a single image in less than 0.5 seconds on an Nvidia A100. According to the team, TripoSR outperforms other open-source alternatives in both quality and speed.

Such models have great potential for the entertainment, gaming, industrial design, and architecture industries by enabling fast and efficient visualization of 3D objects.

TripoSR uses NeRF and a Vision Transformer

TripoSR takes a single RGB image as input, which serves as the basis for the subsequent 3D reconstruction. First, the image is processed by a pre-trained image encoder based on a vision transformer (DINOv1). This step converts the image into a set of latent vectors that encode both global and local features of the image and contain the information needed to reconstruct the 3D object.
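For readers who want a concrete picture of this first step, here is a minimal sketch of encoding a single RGB image into latent tokens with a DINO-style vision transformer via the Hugging Face transformers library. The checkpoint name and preprocessing shown here are illustrative assumptions; TripoSR's actual encoder and weights may differ.

```python
# Illustrative sketch only: encoding one RGB image into latent tokens with a
# DINO ViT from Hugging Face transformers. "facebook/dino-vitb16" is used here
# as an example checkpoint; TripoSR's actual encoder may differ.
import torch
from PIL import Image
from transformers import ViTImageProcessor, ViTModel

processor = ViTImageProcessor.from_pretrained("facebook/dino-vitb16")
encoder = ViTModel.from_pretrained("facebook/dino-vitb16")

image = Image.open("object.png").convert("RGB")      # single RGB input image
inputs = processor(images=image, return_tensors="pt")

with torch.no_grad():
    outputs = encoder(**inputs)

# One latent vector per image patch plus a [CLS] token; these tokens carry
# the global and local image features that the decoder conditions on.
latent_tokens = outputs.last_hidden_state            # shape: (1, num_tokens, hidden_dim)
print(latent_tokens.shape)
```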


A decoder then converts the latent vectors into a triplane NeRF representation, a 3D representation suitable for objects with complex shapes and textures. The decoder uses transformer layers that allow it to learn relationships between different parts of the triplane representation while integrating global and local image features.
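To make the triplane idea tangible: a 3D point is projected onto three orthogonal feature planes, features are sampled from each plane and aggregated, and a small MLP maps the result to color and density, as in a NeRF. The sketch below illustrates only that query step; all tensor shapes, the aggregation by summation, and the tiny MLP are assumptions for illustration, not TripoSR's actual implementation.

```python
# Minimal triplane-NeRF query sketch (illustrative; dimensions and MLP are assumptions).
import torch
import torch.nn.functional as F


def sample_plane(plane, coords_2d):
    # plane: (C, H, W) feature plane; coords_2d: (N, 2) in [-1, 1]
    grid = coords_2d.view(1, -1, 1, 2)                     # (1, N, 1, 2)
    feats = F.grid_sample(plane.unsqueeze(0), grid,
                          mode="bilinear", align_corners=True)
    return feats.squeeze(0).squeeze(-1).permute(1, 0)      # (N, C)


def query_triplane(planes, points, mlp):
    # planes: three (C, H, W) feature planes; points: (N, 3) in [-1, 1]
    f_xy = sample_plane(planes["xy"], points[:, [0, 1]])
    f_xz = sample_plane(planes["xz"], points[:, [0, 2]])
    f_yz = sample_plane(planes["yz"], points[:, [1, 2]])
    features = f_xy + f_xz + f_yz                          # aggregate the three plane features
    out = mlp(features)                                    # (N, 4): RGB + density
    return out[:, :3], out[:, 3]


# Toy usage with random planes and a tiny MLP.
C, H, W = 32, 64, 64
planes = {k: torch.randn(C, H, W) for k in ("xy", "xz", "yz")}
mlp = torch.nn.Sequential(torch.nn.Linear(C, 64), torch.nn.ReLU(), torch.nn.Linear(64, 4))
rgb, sigma = query_triplane(planes, torch.rand(1024, 3) * 2 - 1, mlp)
print(rgb.shape, sigma.shape)
```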

Unlike other approaches that require camera parameters, such as information about the camera's position in space, TripoSR "guesses" these parameters during training and inference. This increases the robustness of the model by eliminating the need for precise camera information.

To further improve performance, the team made additional optimizations, including curating a subset of realistic, high-quality 3D models from the Objaverse dataset for training.

Demo and a first ComfyUI node are already available

The source code and model weights of TripoSR are available for download under the MIT license, which permits use for commercial, personal, and research purposes.

Based on an image generated with Midjourney, TripoSR isolates the object and generates a simple 3D model. | Video: THE DECODER


A demo is available on Hugging Face, and a first community implementation for the ComfyUI interface already exists.

More examples, the code, and the model are available on GitHub and Hugging Face.
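For those who want to try the released checkpoint locally, the weights are hosted on the Hugging Face Hub under the stabilityai/TripoSR repository. The snippet below only downloads the files; the exact filenames are assumptions and may change, so check the model card and the repository's README (which documents a simple run script) before relying on them.

```python
# Illustrative: fetching the released TripoSR config and weights from the Hugging Face Hub.
# The repo id is "stabilityai/TripoSR"; the filenames below are assumptions and may differ.
from huggingface_hub import hf_hub_download

config_path = hf_hub_download(repo_id="stabilityai/TripoSR", filename="config.yaml")
weights_path = hf_hub_download(repo_id="stabilityai/TripoSR", filename="model.ckpt")
print(config_path, weights_path)
```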

Summary
  • Researchers at Tripo AI and Stability AI present TripoSR, an AI model that creates 3D models from a single image in less than 0.5 seconds and could be useful for applications in entertainment, gaming, industrial design, and architecture.
  • TripoSR processes an RGB image through a vision-transformer-based encoder, which converts it into latent vectors, and a decoder, which turns these vectors into a triplane NeRF representation for 3D reconstruction.
  • The model is available under the open-source MIT license, which permits commercial, personal, and research use.