Shap-E is OpenAI's fastest text-to-3D model to date

OpenAI dominates the media with ChatGPT, but the company is also researching other generative AI models. A new paper presents a text-to-3D model.

In late 2022, OpenAI unveiled Point-E, a generative AI model for text-to-3D that received little attention given the enormous success of ChatGPT that same month. In part, this was because Point-E did not produce particularly impressive results.

With Point-E, OpenAI attempted to deliver a particularly fast text-to-3D model based on point clouds. Almost half a year later, the company's researchers are now presenting Shap-E, a direct successor.

Shap-E is extremely fast and a bit better

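Unlike Point-E, Shap-E does not generate a point cloud. Instead, it directly generates the parameters of implicit functions that can be rendered as both textured meshes and NeRFs (neural radiance fields). The model is trained in two stages: first, an encoder learns to map existing 3D assets to the parameters of such implicit functions; second, a conditional diffusion model is trained on the encoder's outputs so that it can generate these parameters from text or image input.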
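What it means for one representation to be rendered as both a mesh and a NeRF can be illustrated with a toy example. An implicit function assigns a value to every point in 3D space, here a signed distance that is negative inside an object and positive outside. The Python sketch below is hypothetical and heavily simplified: instead of the neural network weights Shap-E actually generates, the function is defined by just four parameters (a sphere's center and radius), and the helpers `make_sphere_sdf`, `occupied_grid_points`, and `first_hit_along_ray` are illustrative names, not part of Shap-E. It shows that the same function can be queried on a regular grid, as mesh extraction does, or sampled point by point along a camera ray, the basic query pattern of ray-based renderers such as NeRFs.

```python
import math


def make_sphere_sdf(cx, cy, cz, radius):
    """Hypothetical stand-in for a generated implicit function.

    Returns a signed distance function for a sphere: negative inside,
    zero on the surface, positive outside. In Shap-E the parameters would
    be network weights; here they are only a center and a radius.
    """

    def sdf(x, y, z):
        return math.sqrt((x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2) - radius

    return sdf


def occupied_grid_points(sdf, n=8, extent=1.0):
    """Evaluate the function on an n x n x n grid spanning [-extent, extent],
    the way a mesh extractor such as marching cubes samples space, and count
    the grid points that lie inside the surface."""
    step = 2 * extent / (n - 1)
    inside = 0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                x = -extent + i * step
                y = -extent + j * step
                z = -extent + k * step
                if sdf(x, y, z) < 0:
                    inside += 1
    return inside


def first_hit_along_ray(sdf, origin, direction, t_max=5.0, steps=500):
    """Sample the function at evenly spaced points along a ray and return the
    distance t of the first sample inside the surface, or None if the ray
    misses. A NeRF renderer also queries points along rays, but integrates
    density and color instead of stopping at the first hit."""
    ox, oy, oz = origin
    dx, dy, dz = direction
    for s in range(steps + 1):
        t = t_max * s / steps
        if sdf(ox + t * dx, oy + t * dy, oz + t * dz) < 0:
            return t
    return None


if __name__ == "__main__":
    sphere = make_sphere_sdf(0.0, 0.0, 0.0, 0.5)
    print("grid points inside:", occupied_grid_points(sphere))
    print("ray hit at t =", first_hit_along_ray(sphere, (0.0, 0.0, -2.0), (0.0, 0.0, 1.0)))
```

Because the object exists only as a function, the same parameters serve both output formats; a real system would replace the sphere with a learned network and the simple hit test with proper marching cubes or volume rendering.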

Shap-E and Point-E produce similar results, but the former is slightly faster and can be more easily linked to other methods. | Image: OpenAI

As with its predecessor, the quality of Shap-E's renderings sometimes falls far short of alternatives such as DreamFusion, DreamFields, Magic3D, Dream3D, or CLIP-Mesh. Those methods are much slower, however: on an Nvidia V100 GPU, CLIP-Mesh needs 17 minutes, DreamFusion 12 hours, and DreamFields as much as 200 hours to generate a single object, while Shap-E needs only 13 seconds with text input and about one minute with image input.
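To put these timings in perspective, the short Python sketch below converts the per-object V100 times quoted above into seconds and divides them by Shap-E's 13 seconds for text input. The numbers are only those cited in this article; the resulting ratios are rough, since they ignore differences in output quality, and the names `REPORTED_SECONDS` and `speedup` are illustrative.

```python
# Per-object generation times on a single Nvidia V100, as quoted in the
# article, converted to seconds.
REPORTED_SECONDS = {
    "DreamFields": 200 * 3600,  # 200 hours
    "DreamFusion": 12 * 3600,   # 12 hours
    "CLIP-Mesh": 17 * 60,       # 17 minutes
}

SHAP_E_TEXT_SECONDS = 13   # text input
SHAP_E_IMAGE_SECONDS = 60  # image input, "about one minute"


def speedup(baseline_seconds, fast_seconds=SHAP_E_TEXT_SECONDS):
    """How many times faster the fast method is than the baseline."""
    return baseline_seconds / fast_seconds


if __name__ == "__main__":
    for name, secs in REPORTED_SECONDS.items():
        print(f"Shap-E (text) vs {name}: roughly {speedup(secs):,.0f}x faster")
```

Passing `SHAP_E_IMAGE_SECONDS` as the second argument gives the corresponding ratios for image input.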

Shap-E can be combined with DreamFusion

OpenAI says the results "highlight the potential of generating implicit representations, especially in domains like 3D where they can offer more flexibility than explicit representations."

That said, Shap-E also has numerous limitations: for example, it struggles to bind multiple attributes to the right object and to produce the correct number of objects. The team attributes these shortcomings to limited training data and believes they could be reduced by collecting or generating larger labeled 3D datasets. In addition, the generated objects often look rough or lack fine detail.

To achieve better results, Shap-E could be combined with optimization-based generative 3D techniques. For example, the team shows that an object generated by Shap-E can be refined as a NeRF using DreamFusion.

If OpenAI finds a suitable architecture, it will likely scale it up. Whether that architecture will be Shap-E remains to be seen, but projects like Objaverse are already building large databases of labeled 3D data that could supply the necessary training material.

The code and model are available on GitHub.
