Zero123: Stability AI releases new model for text-to-3D from a single image

Maximilian Schreiner

Stability AI has released a new image model and workflow for creating better 3D models.

The new model is called Stable Zero123 and is the latest version in the Zero123 model series. Stable Zero123 does not generate 3D models directly. Rather, it is a central building block in a generative workflow that starts with a text prompt and ends with a 3D model. Specifically, it takes an image of an object and generates multiple new images of that object from different viewing angles.

These multi-view images can then be passed to another model, for example to condition a NeRF, which finally yields a 3D model.
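To make "different viewing angles" concrete, here is a minimal sketch of how a set of target viewpoints around an object could be laid out before asking a view-synthesis model for each one. The function name `orbit_cameras`, the default elevation and radius, and the coordinate convention (z up, object at the origin) are illustrative assumptions, not part of Stability AI's release.

```python
import math

def orbit_cameras(num_views, elevation_deg=15.0, radius=2.0):
    """Return camera poses evenly spaced on a circle around the origin.

    Hypothetical helper: the defaults and the z-up convention are assumptions.
    """
    elev = math.radians(elevation_deg)
    cameras = []
    for i in range(num_views):
        # Spread the azimuth angles evenly over a full turn.
        azim = 2.0 * math.pi * i / num_views
        # Convert spherical coordinates to a Cartesian camera position.
        x = radius * math.cos(elev) * math.cos(azim)
        y = radius * math.cos(elev) * math.sin(azim)
        z = radius * math.sin(elev)
        cameras.append({"azimuth_deg": math.degrees(azim), "position": (x, y, z)})
    return cameras

if __name__ == "__main__":
    for cam in orbit_cameras(4):
        print(cam)
```

Each returned pose would correspond to one requested novel view; the resulting images together with their poses are what a downstream 3D reconstruction step would consume.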

Stable Zero123 was trained with a huge 3D data set

According to Stability AI, Stable Zero123 achieves significantly better results than its predecessor, Zero123-XL. This is primarily due to a better training data set: the start-up filtered the Objaverse data set to keep only high-quality 3D models. During training and inference, Stable Zero123 receives not only the images but also estimated camera angles that support the model's predictions.
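As an illustration of what feeding camera angles to the model can look like, the sketch below encodes the change in viewpoint between a source image and a target view as a small vector of elevation difference, sine and cosine of the azimuth difference, and distance difference. This layout follows the relative-pose encoding commonly described for the Zero123 family; whether Stable Zero123 uses exactly this format is an assumption, and the function name is hypothetical.

```python
import math

def relative_pose_embedding(src, tgt):
    """Encode the viewpoint change from src to tgt.

    src and tgt are (elevation_deg, azimuth_deg, radius) tuples.
    Assumed layout: [d_elevation, sin(d_azimuth), cos(d_azimuth), d_radius].
    """
    d_elev = math.radians(tgt[0] - src[0])
    # Wrap the azimuth difference into [0, 360) before converting to radians.
    d_azim = math.radians((tgt[1] - src[1]) % 360.0)
    d_radius = tgt[2] - src[2]
    # sin/cos keeps the encoding continuous across the 0/360 degree boundary.
    return [d_elev, math.sin(d_azim), math.cos(d_azim), d_radius]

if __name__ == "__main__":
    # Rotate a quarter turn around the object at the same height and distance.
    print(relative_pose_embedding((0.0, 0.0, 1.5), (0.0, 90.0, 1.5)))
```

Using sine and cosine instead of the raw azimuth angle avoids a jump in the conditioning signal when the camera passes a full rotation.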

Stable Zero123 produces more consistent results than Zero123-XL. | Image: Stability AI

Together with other improvements, such as the ability to train with larger batches, Stability AI says this has led to a 40-fold increase in training efficiency compared to Zero123-XL.

Stable Zero123 plus threestudio for 3D generation

Stable Zero123 is released for research purposes only and is not intended for commercial use. Those interested in using Stability AI's 3D solutions for commercial products or purposes should contact the company directly.

To create 3D objects with Stable Zero123, the team is releasing the model with instructions on Hugging Face. Both the threestudio framework and the model are required. While the VRAM requirements for generating the new views are comparable to those of Stable Diffusion 1.5, generating the 3D objects takes significantly more time, and 24 gigabytes of VRAM is recommended.

Stable Zero123 is also available via the Stable 3D Private Preview for text-to-3D generation.
