OpenAI dominates the media with ChatGPT, but the company is also researching other generative AI models. A new paper shows a text-to-3D model.
In late 2022, OpenAI unveiled Point-E, a generative AI model for text-to-3D that received little attention given the enormous success of ChatGPT that same month. In part, this was because Point-E did not produce particularly impressive results.
With Point-E, OpenAI attempted to deliver a particularly fast text-to-3D model based on point clouds. Almost half a year later, the company's researchers are now presenting Shap-E, a direct successor.
Shap-E is extremely fast and a bit better
Unlike Point-E, Shap-E does not generate a point cloud. Instead, it directly generates the parameters of implicit functions that can be rendered as both textured meshes and NeRFs. Essentially, an encoder maps 3D assets to the parameters of these functions, and a diffusion model conditioned on text or image input generates those parameters for the desired 3D object.
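To make the idea of "generating the parameters of an implicit function" more concrete, here is a minimal sketch in Python with NumPy. It is not Shap-E's actual architecture: the tiny two-layer network, the names `unpack_params` and `implicit_fn`, and the random parameter vector standing in for the generator's output are all assumptions made for illustration. The point is only that a flat vector of numbers can define a function that returns density and color for any 3D coordinate, which a renderer could then query.

```python
import numpy as np

HIDDEN = 16  # hypothetical hidden width, chosen only for this sketch


def param_count(hidden=HIDDEN):
    """Number of values needed to define the tiny MLP below."""
    return 3 * hidden + hidden + hidden * 4 + 4


def unpack_params(theta, hidden=HIDDEN):
    """Split a flat parameter vector into weights and biases of a 2-layer MLP."""
    shapes = [(3, hidden), (hidden,), (hidden, 4), (4,)]
    parts, start = [], 0
    for shape in shapes:
        size = int(np.prod(shape))
        parts.append(theta[start:start + size].reshape(shape))
        start += size
    return parts


def implicit_fn(theta, xyz, hidden=HIDDEN):
    """Evaluate the implicit function at points xyz of shape (N, 3).

    Returns a non-negative density per point and an RGB color in [0, 1].
    """
    w1, b1, w2, b2 = unpack_params(theta, hidden)
    h = np.tanh(xyz @ w1 + b1)
    out = h @ w2 + b2
    density = np.logaddexp(0.0, out[:, 0])      # softplus keeps density >= 0
    rgb = 1.0 / (1.0 + np.exp(-out[:, 1:]))     # sigmoid keeps colors in [0, 1]
    return density, rgb


rng = np.random.default_rng(0)
# Stand-in for what a generative model would produce: here just random numbers.
theta = rng.normal(size=param_count())
points = rng.uniform(-1.0, 1.0, size=(5, 3))
density, rgb = implicit_fn(theta, points)
print(density.shape, rgb.shape)
```

In this toy setup, a different `theta` yields a different 3D field, so a model that outputs `theta` is in effect outputting a 3D object without ever producing points or triangles directly.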
As with its predecessor, the quality of Shap-E's renderings sometimes falls far short of alternatives such as DreamFusion, Dreamfields, Magic3D, Dream3D or CLIP-Mesh. However, while CLIP-Mesh needs 17 minutes, DreamFusion 12 hours and Dreamfields as much as 200 hours to produce a model on an Nvidia V100 GPU, Shap-E needs only 13 seconds with text input and about one minute with image input.
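For a sense of scale, the generation times quoted above can be turned into rough speed ratios. The short Python sketch below uses only the figures from this article (13 seconds for Shap-E with text input versus 17 minutes, 12 hours and 200 hours); the variable names are made up for illustration, and the ratios say nothing about output quality.

```python
# Generation times quoted in the article, converted to seconds.
SHAP_E_TEXT_SECONDS = 13

baselines_seconds = {
    "CLIP-Mesh": 17 * 60,
    "DreamFusion": 12 * 3600,
    "Dreamfields": 200 * 3600,
}

# How many times longer each method takes than Shap-E with text input.
speedups = {
    name: seconds / SHAP_E_TEXT_SECONDS
    for name, seconds in baselines_seconds.items()
}

for name, factor in speedups.items():
    print(f"{name}: takes about {factor:,.0f} times as long as Shap-E")
```

By these numbers, Shap-E with text input is roughly 78 times faster than CLIP-Mesh, about 3,300 times faster than DreamFusion and about 55,000 times faster than Dreamfields.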
Shap-E can be combined with DreamFusion
OpenAI says the results "highlight the potential of generating implicit representations, especially in domains like 3D where they can offer more flexibility than explicit representations."
However, Shap-E also has numerous limitations. For example, it struggles to assign multiple attributes to an object and to represent the correct number of objects. The team attributes these shortcomings to limited training data and believes they could be reduced by collecting or generating larger labeled 3D datasets. In addition, the quality of the generated objects is limited.
To achieve better results, Shap-E could be combined with optimization-based generative 3D techniques. For example, the team shows that an object generated by Shap-E can be used as a NeRF starting point and then refined with DreamFusion.
If OpenAI finds a suitable architecture, the company is likely to scale it up. Whether that will be Shap-E remains to be seen, but projects like Objaverse are already creating the large databases of labeled 3D data that such scaling would require.
The code and model are available on GitHub.