Summary

OpenAI dominates the media with ChatGPT, but the company is also researching other generative AI models. A new paper shows a text-to-3D model.

In late 2022, OpenAI unveiled Point-E, a generative AI model for text-to-3D that received little attention given the enormous success of ChatGPT that same month. In part, this was because Point-E did not produce particularly impressive results.

With Point-E, OpenAI attempted to deliver a particularly fast text-to-3D model based on point clouds. Almost half a year later, the company's researchers are now presenting Shap-E, a direct successor.

Shap-E is extremely fast and a bit better

Unlike Point-E, Shap-E does not generate a point cloud. Instead, it directly generates the parameters of implicit functions that can be rendered as both textured meshes and NeRFs. According to the paper, an encoder is first trained to map 3D assets to the parameters of such a function, and a diffusion model conditioned on text or images is then trained on the encoder's outputs to generate new 3D representations.
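To make the term "implicit function" more concrete, here is a minimal Python sketch. It is not Shap-E's code: the function names (`sphere_sdf`, `occupancy_grid`) and the hand-written sphere are assumptions chosen purely for illustration, whereas Shap-E generates the parameters of a learned function. The point it illustrates is that a shape defined as a function of 3D coordinates can be queried at any position and sampled at any resolution, while a point cloud is a fixed list of points.

```python
import math


def sphere_sdf(x, y, z, radius=0.5):
    """Toy implicit function: signed distance to a sphere at the origin.

    Negative values are inside the shape, positive values outside.
    """
    return math.sqrt(x * x + y * y + z * z) - radius


def occupancy_grid(sdf, resolution):
    """Sample an implicit function at the cell centers of a grid over [-1, 1]^3.

    Returns the set of (i, j, k) cell indices whose centers lie inside the shape.
    """
    step = 2.0 / resolution
    inside = set()
    for i in range(resolution):
        for j in range(resolution):
            for k in range(resolution):
                x = -1.0 + (i + 0.5) * step
                y = -1.0 + (j + 0.5) * step
                z = -1.0 + (k + 0.5) * step
                if sdf(x, y, z) < 0.0:
                    inside.add((i, j, k))
    return inside


# The same function can be sampled coarsely or finely without changing the shape.
coarse_cells = occupancy_grid(sphere_sdf, 8)
fine_cells = occupancy_grid(sphere_sdf, 32)
print(len(coarse_cells), len(fine_cells))
```

A renderer or mesh extractor works on the same principle: it queries the function wherever it needs a value instead of relying on a fixed set of stored points.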

Shap-E and Point-E produce similar results, but the former is slightly faster and can be more easily linked to other methods. | Image: OpenAI

As with its predecessor, the quality of Shap-E's renderings sometimes falls far short of alternatives such as DreamFusion, Dreamfields, Magic3D, Dream3D or CLIP-Mesh. However, while CLIP-Mesh needs 17 minutes, DreamFusion 12 hours and Dreamfields as much as 200 hours to generate a model on an Nvidia V100 GPU, Shap-E needs only 13 seconds with text input and only one minute with image input.
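The size of that gap is easier to grasp as a ratio. The short Python sketch below converts the generation times quoted above to seconds and divides them by Shap-E's 13 seconds for text input. The variable and function names are illustrative, and the figures are only the ones reported in this article.

```python
# Generation times per model on an Nvidia V100, as quoted in the article.
SHAP_E_TEXT_SECONDS = 13

baseline_seconds = {
    "CLIP-Mesh": 17 * 60,        # 17 minutes
    "DreamFusion": 12 * 3600,    # 12 hours
    "Dreamfields": 200 * 3600,   # 200 hours
}


def speedup(baseline, reference=SHAP_E_TEXT_SECONDS):
    """How many times longer the baseline takes than the reference."""
    return baseline / reference


for name, seconds in baseline_seconds.items():
    print(f"{name}: roughly {speedup(seconds):,.0f}x the time of Shap-E (text input)")
```

By these numbers, Shap-E with text input is roughly 78 times faster than CLIP-Mesh, about 3,300 times faster than DreamFusion and about 55,000 times faster than Dreamfields.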

Shap-E can be combined with DreamFusion

OpenAI says the results "highlight the potential of generating implicit representations, especially in domains like 3D where they can offer more flexibility than explicit representations."

However, Shap-E also has numerous limitations. For example, it struggles to assign multiple attributes to an object or to produce the correct number of objects. The team attributes these shortcomings to limited training data and believes they could be reduced by collecting and generating larger, labeled 3D datasets. In addition, the quality of the objects is limited.

To achieve better results, Shap-E could be combined with other, optimization-based generative 3D techniques. For example, the team shows that a 3D object generated by Shap-E can be further refined as a NeRF with DreamFusion.

If OpenAI finds a suitable architecture, it should be scaled up. Whether that will be Shap-E remains to be seen, but projects like Objaverse are creating large databases of labeled 3D data.


The code and model are available on GitHub.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
  • OpenAI's Shap-E is a text-to-3D model and a direct successor to Point-E.
  • Like its predecessor, Shap-E is orders of magnitude faster than other systems, but falls short in quality.
  • The code and model of Shap-E have been released by OpenAI.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.