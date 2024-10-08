AI research
Maximilian Schreiner

WonderWorld AI generates interactive 3D environments from photos in just 10 seconds

Yu, Duan et al.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.

Researchers at Stanford University and MIT have developed an AI system that can interactively generate 3D scenes from a single image.


Researchers at Stanford University and MIT have developed an AI system that can interactively generate 3D scenes from a single image in real-time. This new technology, called WonderWorld, enables users to build and explore virtual environments step-by-step by controlling the content and layout of generated scenes.

The biggest challenge in developing WonderWorld was achieving rapid 3D scene generation. While previous approaches often required dozens of minutes to hours to generate a single scene, WonderWorld can produce a new 3D environment within 10 seconds on an Nvidia A6000 GPU. This speed allows for real-time interaction, a significant advancement in the field.

Video: Yu, Duan et al.


WonderWorld starts from an input image and generates an initial 3D scene. It then enters a loop, alternating between generating scene images and building the corresponding FLAGS (Fast Layered Gaussian Surfels) representations. Users control where new scenes are generated by moving the camera, and they can use text prompts to specify what kind of scene they want.
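The interactive loop described above can be outlined in a few lines of Python. This is a minimal sketch, not WonderWorld's code: `Scene`, `generate_image`, `build_layers`, `run_session` and the `(x, z, yaw)` camera-pose format are hypothetical placeholders standing in for the actual image-generation and layered-reconstruction models.

```python
from dataclasses import dataclass, field

# Hypothetical camera pose format for illustration: (x, z, yaw in degrees).
Pose = tuple[float, float, float]


@dataclass
class Scene:
    """One generated scene: the prompt, the camera it was made for, and its layers."""
    prompt: str
    camera_pose: Pose
    layers: dict[str, list] = field(default_factory=dict)


def generate_image(world: list[Scene], camera_pose: Pose, prompt: str) -> str:
    # Placeholder for the image-generation step, which would condition on what the
    # existing world looks like from the new camera plus the user's text prompt.
    return f"image[{prompt} @ {camera_pose}]"


def build_layers(image: str) -> dict[str, list]:
    # Placeholder for turning an image into foreground / background / sky layers.
    return {"foreground": [image], "background": [image], "sky": [image]}


def run_session(input_image: str, user_steps: list[tuple[Pose, str]]) -> list[Scene]:
    # Initial scene built directly from the input photo.
    world = [Scene("input image", (0.0, 0.0, 0.0), build_layers(input_image))]
    # Interactive loop: each user step picks a camera pose and a text prompt.
    for camera_pose, prompt in user_steps:
        image = generate_image(world, camera_pose, prompt)
        world.append(Scene(prompt, camera_pose, build_layers(image)))
    return world


world = run_session(
    "photo.jpg",
    [((1.0, 0.0, 30.0), "a snowy village"), ((2.0, 0.5, 60.0), "a frozen lake")],
)
for scene in world:
    print(scene.prompt, scene.camera_pose, sorted(scene.layers))
```

The point of the structure is that each new scene is generated with the whole existing world as context, which is what keeps consecutive scenes connected.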

Illustration of how WonderWorld works: starting from one input image, several connected 3D scenes are generated step by step, guided by user input on the content and placement of new scenes.
WonderWorld generates interactively linked 3D scenes from a single input image. Users can control the content and layout of the generated environments. | Image: Yu, Duan et al.

The FLAGS representation consists of three layers: foreground, background, and sky. Each layer contains a set of "surfels" (surface elements), each defined by its 3D position, orientation, scale, opacity, and color. These surfels are initialized from estimated depth and normal maps and then optimized to produce the final scene.
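To make "initialized from estimated depth and normal maps" concrete, the sketch below back-projects each pixel of one layer into a surfel using a standard pinhole camera model. It is an illustration under stated assumptions, not the paper's initialization: storing orientation as a unit normal, using the pixel footprint `z / fx` as the initial radius, the starting opacity of 0.9, and the toy depth-threshold layer mask are all hypothetical choices.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Surfels:
    position: np.ndarray  # (N, 3) 3D centers in camera coordinates
    normal: np.ndarray    # (N, 3) unit normals, used here as the orientation (assumption)
    scale: np.ndarray     # (N,) disk radius in world units
    opacity: np.ndarray   # (N,) in [0, 1]
    color: np.ndarray     # (N, 3) RGB in [0, 1]


def init_surfels(rgb, depth, normals, fx, fy, cx, cy, mask=None):
    """Back-project the masked pixels of one layer into surfels with a pinhole camera."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]  # v = pixel row, u = pixel column
    if mask is None:
        mask = np.ones((h, w), dtype=bool)
    z = depth[mask]
    x = (u[mask] - cx) * z / fx
    y = (v[mask] - cy) * z / fy
    n = normals[mask].astype(float)
    n /= np.linalg.norm(n, axis=-1, keepdims=True)
    return Surfels(
        position=np.stack([x, y, z], axis=-1),
        normal=n,
        # One pixel spans roughly z / fx world units at depth z: a simple initial radius.
        scale=z / fx,
        # Hypothetical starting opacity; optimization would refine every attribute.
        opacity=np.full(z.shape, 0.9),
        color=rgb[mask].astype(float),
    )


# Toy 4x5 image: a flat wall 2 units away, facing the camera.
rgb = np.random.default_rng(0).random((4, 5, 3))
depth = np.full((4, 5), 2.0)
normals = np.zeros((4, 5, 3))
normals[..., 2] = -1.0
foreground = depth < 3.0  # hypothetical layer mask; a real system would segment the image
surfels = init_surfels(rgb, depth, normals, fx=10.0, fy=10.0, cx=2.0, cy=1.5, mask=foreground)
print(surfels.position.shape)
```

In a three-layer setup, the same function would simply be called once per layer with that layer's mask, giving separate surfel sets for foreground, background, and sky.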

To reduce geometric distortions at scene transitions, WonderWorld employs a guided depth diffusion process. This uses a pre-trained diffusion model for depth maps, adjusting the depth estimate to match the geometry of existing parts of the scene.
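The goal of the guided depth step is that a new depth estimate agrees with geometry that already exists. The sketch below does not reproduce WonderWorld's diffusion-based guidance; it shows a simpler, commonly used baseline for the same goal, a least-squares scale-and-shift fit over the pixels where existing depth is known. The function name `align_depth` and the toy data are hypothetical. A single global scale and shift cannot fix local mismatches, which is one plausible reason to steer the depth model itself instead.

```python
import numpy as np


def align_depth(pred, known, mask):
    """Fit scale s and shift t so that s * pred + t best matches `known` on `mask` (least squares)."""
    p = pred[mask]
    k = known[mask]
    A = np.stack([p, np.ones_like(p)], axis=-1)  # (N, 2) design matrix [pred, 1]
    (s, t), *_ = np.linalg.lstsq(A, k, rcond=None)
    # Apply the same global correction to every pixel, including the new region.
    return s * pred + t, s, t


# Toy example: the new depth estimate is off by a factor of 2 and an offset of 0.5.
pred = np.array([[1.0, 2.0], [3.0, 4.0]])
known = 2.0 * pred + 0.5
mask = np.array([[True, True], [True, False]])  # bottom-right pixel is newly generated
aligned, s, t = align_depth(pred, known, mask)
print(round(s, 3), round(t, 3))
```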

Team sees potential in game development

Experiments have shown that WonderWorld significantly outperforms previous methods for 3D scene generation in terms of speed and visual quality. In user studies, the generated scenes were rated as more visually convincing than those produced by other approaches.

Video: Yu, Duan et al.

Video: Yu, Duan et al.

However, the system does have some limitations. It can only create forward-facing surfaces, restricting user movement to about 45 degrees in the virtual world. The generated worlds currently look like paper cut-outs. The system also struggles with detailed objects like trees, which can lead to "holes" or "floating" elements when the viewing angle changes.
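Because every surfel is created facing the camera that produced its image, views rotated far from that direction see surfaces edge-on or miss their sides entirely. The sketch below turns the roughly 45-degree limit into a simple check on the angle between the original and the new viewing direction. Interpreting the limit this way, and the helper names `view_angle_deg` and `within_supported_range`, are assumptions for illustration only.

```python
import numpy as np


def view_angle_deg(original_dir, new_dir):
    """Angle in degrees between the original camera's viewing direction and a new one."""
    a = np.asarray(original_dir, dtype=float)
    b = np.asarray(new_dir, dtype=float)
    cos = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))
    # Clip guards against tiny floating-point overshoots outside [-1, 1].
    return float(np.degrees(np.arccos(np.clip(cos, -1.0, 1.0))))


def within_supported_range(original_dir, new_dir, max_deg=45.0):
    """True if the new view stays inside the roughly 45-degree range the article reports."""
    return view_angle_deg(original_dir, new_dir) <= max_deg


forward = [0.0, 0.0, 1.0]
print(within_supported_range(forward, [0.5, 0.0, 1.0]))  # about 26.6 degrees off-axis
print(within_supported_range(forward, [1.0, 0.0, 0.0]))  # 90 degrees: side view
```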

Despite these limitations, the researchers see significant potential for WonderWorld in various applications. Game developers could use it to build 3D worlds iteratively. It could generate larger and more diverse content for virtual reality experiences. In the long term, it could enable users to create freely explorable, dynamically evolving virtual worlds.

You can find more examples to try out for yourself on the WonderWorld project page.

Summary
  • Researchers at Stanford University and MIT have developed WonderWorld, an AI system that can interactively generate 3D scenes from a single image. Users can control the content and layout of the generated environments.
  • The system generates a new scene within 10 seconds on an Nvidia A6000 GPU, which is significantly faster than previous methods. It uses a FLAGS representation with three layers (foreground, background, and sky) built from so-called surfels, and guided depth diffusion to reduce geometric distortions at scene transitions.
  • Despite limitations such as generating only forward-facing surfaces, the researchers see potential in game development, virtual reality, and the creation of dynamic virtual worlds. In user studies, the generated scenes were rated as more visually convincing than those of other approaches.
Sources
Arxiv