AI method "Dream3D" creates detailed 3D objects from text

Dream3D is a text-to-3D model that uses Stable Diffusion, CLIP and NeRFs to create detailed 3D objects from text.

Generative AI models for 3D have been a major research focus since at least late 2021: In December 2021, Google showed Dream Fields, a generative AI model that combines OpenAI's CLIP with Neural Radiance Fields (NeRF). The method synthesizes 3D shapes from text descriptions: CLIP guides a randomly initialized NeRF network until its internal representation matches the text description.
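
The guidance signal in methods of this kind is commonly expressed as a similarity score between an embedding of a rendered image and an embedding of the text. The following is a minimal sketch of that idea, assuming both embeddings are already available as plain lists of numbers. The function names and the toy vectors are illustrative assumptions and are not taken from the Dream Fields code or from CLIP itself.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("a zero vector has no direction")
    return dot / (norm_a * norm_b)


def clip_guidance_loss(image_embedding, text_embedding):
    """Loss that shrinks as the render's embedding aligns with the text's."""
    return 1.0 - cosine_similarity(image_embedding, text_embedding)


# Toy embeddings (made up for illustration); real embeddings are much longer.
text = [1.0, 0.0, 0.0]
poor_render = [0.0, 1.0, 0.0]   # unrelated to the text
good_render = [0.9, 0.1, 0.0]   # close to the text

loss_poor = clip_guidance_loss(poor_render, text)
loss_good = clip_guidance_loss(good_render, text)
```

In an actual system, the optimizer would adjust the NeRF's parameters so that renderings from many camera angles lower this loss. Here the two hand-picked vectors only show that a better-matching render yields a smaller value.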

Less than a year later, researchers from Concordia University in Canada demonstrated CLIP-Mesh, a related method that does not use NeRFs. That same month, Google also showed Dreamfusion, a much-improved version of Dream Fields that relies on Google's large image model Imagen instead of CLIP. Nvidia has since presented GET3D, and OpenAI has presented Point-E.

Dream3D combines CLIP, Stable Diffusion and NeRFs for detailed models.

A new paper by researchers at ARC Lab, Tencent PCG, ShanghaiTech University, Shanghai Engineering Research Center of Intelligent Vision and Imaging, and Shanghai Engineering Research Center of Energy Efficient and Custom AI IC now shows Dream3D, a generative text-to-3D model that combines CLIP, Stable Diffusion, a 3D generator, and NeRFs.

Dream3D uses a 3D shape as a prior for the NeRF. | Image: Xu, Wang, Gao et al.

Dream3D first passes the text input to a fine-tuned Stable Diffusion model, which synthesizes a rendering-style image. A separate 3D generator then converts this image into a 3D shape.

Unlike other methods, this process uses only the portion of the text input relevant to the central shape: From "A park bench overgrown by vines", for example, only "A park bench" is used.

The resulting 3D shape initializes the NeRF, which is then optimized with CLIP guidance using the complete text input, as in other methods.
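
The order of these stages can be sketched in a few lines of Python. This is only an outline of the data flow as described in this article, not the authors' implementation: every stage is a hypothetical placeholder callable, and the names `Dream3DStages` and `text_to_3d` are assumptions made for illustration. The article does not say how the shape-relevant part of the prompt is extracted, so that step is hard-coded in the dummy below.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Dream3DStages:
    """Hypothetical stand-ins for the models named in the article."""
    extract_shape_prompt: Callable[[str], str]      # full prompt -> central shape phrase
    text_to_image: Callable[[str], Any]             # fine-tuned Stable Diffusion
    image_to_shape: Callable[[Any], Any]            # 3D shape generator
    init_nerf: Callable[[Any], Any]                 # NeRF initialized from the shape
    optimize_with_clip: Callable[[Any, str], Any]   # CLIP-guided optimization


def text_to_3d(prompt: str, stages: Dream3DStages) -> Any:
    # Only the central-shape part of the prompt drives the shape prior.
    shape_prompt = stages.extract_shape_prompt(prompt)
    image = stages.text_to_image(shape_prompt)
    shape = stages.image_to_shape(image)
    nerf = stages.init_nerf(shape)
    # The complete prompt is used for the final CLIP-guided optimization.
    return stages.optimize_with_clip(nerf, prompt)


# Dummy stages that only record what they receive, to make the flow visible.
dummy = Dream3DStages(
    extract_shape_prompt=lambda p: "A park bench",  # hard-coded for this example
    text_to_image=lambda p: f"image({p})",
    image_to_shape=lambda img: f"shape({img})",
    init_nerf=lambda s: f"nerf({s})",
    optimize_with_clip=lambda n, p: f"optimized({n} | {p})",
)

result = text_to_3d("A park bench overgrown by vines", dummy)
```

With the dummy stages, `result` is a string that nests the stage names in call order, which shows that the shortened prompt reaches the image and shape stages while the full prompt reaches only the final optimization.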

Dream3D is one of the best methods currently available

According to the team, Dream3D clearly outperforms older methods such as Dream Fields, PureCLIPNeRF or CLIP-Mesh. In fact, the NeRF renderings shown are detailed and match the text inputs.

"A car is burning." | Video: Xu, Wang, Gao et al.

"The Iron Throne in Game of Thrones." | Video: Xu, Wang, Gao et al.

"A minecraft car." | Video: Xu, Wang, Gao et al.

The advantage of initializing the NeRF with the generated 3D shape can be clearly seen. However, the team does not make a direct comparison with Google's recent Dreamfusion method.

But the use of 3D shapes as a prior for the NeRF also limits Dream3D:

"Despite the strong generation capability of Stable Diffusion, we cannot constrain it to avoid generating shape images that are out of the distribution of the 3D shape generator, since Stable Diffusion is trained on a mega-scale text-image dataset while the 3D shape generator can only generate a limited amount of shapes. Besides, the quality of text-to-shape synthesis in our framework highly depends on the generation capability of the 3D generator."

From the paper.

The researchers hope to introduce better 3D priors into the system in the future, extending Dream3D's functionality to more object categories. More examples are available on GitHub, where the code is also to be released soon.