
Stanford University and Google researchers have unveiled "Streetscapes," an AI system capable of generating realistic street views of entire cities.


A research team from Stanford University and Google Research has introduced a new AI system called "Streetscapes" that can create realistic street views of entire cities. The system generates long, continuous video sequences that simulate a drive through a virtual city, and the results can also be exported as 3D scenes via NeRF (Neural Radiance Fields).

Streetscapes is based on diffusion models, which are widely used in image and video generation. The system was trained on millions of real street views from Google Street View, learning how typical street scenes look.

As input, Streetscapes receives a street map, a height map of buildings, and a desired camera path through the virtual city. From these, it generates realistic video sequences step by step. The generated street views look remarkably realistic and contain many details such as windows, cobblestones, and vegetation, and light and shadows are rendered naturally.
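The researchers have not published code, but the input/output interface described above can be sketched roughly as follows. All names in the snippet (SceneLayout, generate_drive_through) are hypothetical placeholders for illustration, not the team's actual API, and the "rendering" step is only a blank stand-in for the diffusion model.

```python
# Illustrative sketch only: Streetscapes' code is not public, so these names
# are hypothetical stand-ins for the inputs and outputs the article describes.
from dataclasses import dataclass

import numpy as np


@dataclass
class SceneLayout:
    street_map: np.ndarray   # top-down map of roads and building footprints
    height_map: np.ndarray   # per-pixel building heights
    camera_path: np.ndarray  # (N, 6) camera poses along the desired drive


def generate_drive_through(layout: SceneLayout, frames: int = 100) -> list:
    """Render one frame per camera pose along the path (placeholder images)."""
    video = []
    for pose in layout.camera_path[:frames]:
        # In the real system, a diffusion model conditioned on the layout and
        # the current camera pose would synthesize each frame; here we only
        # emit a blank placeholder to illustrate the interface.
        video.append(np.zeros((512, 512, 3), dtype=np.uint8))
    return video
```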


A key component is a "Motion Module" that ensures consistent motion between consecutive frames. Temporal consistency is further improved by a new technique called "temporal imputation," in which each new frame is generated while taking the previously generated frames into account.
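As described here, temporal imputation boils down to an autoregressive loop: each new frame is denoised while a few previously generated frames are held fixed as context. The following minimal Python sketch illustrates that idea under those assumptions; denoise_with_context is a hypothetical placeholder, not the paper's actual model.

```python
import numpy as np


def denoise_with_context(noise, prev_frames):
    # Placeholder for the diffusion model: a real implementation would run the
    # reverse diffusion process while keeping the context frames fixed
    # ("imputed"), so each new frame stays consistent with what came before.
    if not prev_frames:
        return noise
    return 0.5 * noise + 0.5 * np.mean(prev_frames, axis=0)


def generate_sequence(num_frames=100, context=2):
    """Autoregressive generation: each frame is conditioned on the last few."""
    frames = []
    for _ in range(num_frames):
        noise = np.random.randn(64, 64, 3)  # start each frame from pure noise
        frames.append(denoise_with_context(noise, frames[-context:]))
    return frames
```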

This allows Streetscapes to generate longer video sequences than alternative approaches: up to 100 frames, with camera movements covering more than 170 meters. Streetscapes uses an architecture that has since been surpassed by other video generation models such as OpenAI's Sora, but according to the team, the underlying diffusion model is easily interchangeable, so future versions should deliver even better results.

Streetscapes can be controlled with text prompts

Besides generating street views, Streetscapes also enables creative applications. The appearance of the generated city can be controlled through text descriptions, for example to render different times of day or weather conditions. It is also possible to mix city layouts and architectural styles: for instance, the system can visualize Paris's street layout in the architectural style of New York City.
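Purely as an illustration, since the prompt interface is not public, such text control could look like an extra conditioning argument on top of the layout inputs sketched earlier. Both generate_with_prompt and the example prompts below are hypothetical.

```python
def generate_with_prompt(layout, prompt, frames=100):
    # In a text-conditioned diffusion model, the prompt would be embedded
    # (e.g. with a CLIP-style text encoder) and fed into every denoising step.
    # Here it is only printed to illustrate the kind of control described.
    print(f"Generating {frames} frames conditioned on: {prompt!r}")
    return generate_drive_through(layout, frames)


# Controls of the kind mentioned in the article:
# generate_with_prompt(layout, "Paris street layout in the style of New York City")
# generate_with_prompt(layout, "the same street at dusk, in heavy rain")
```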

The research team sees Streetscapes as an important step towards AI systems that can realistically generate not just individual objects, but entire, unlimited scenes. For the future, they plan to improve control over moving objects like cars. They also want to work on further increasing consistency between consecutive images.

More examples can be found on the project page.

Summary
  • Researchers at Stanford University and Google have developed Streetscapes, an AI system that can generate realistic street views of entire cities as video sequences. The system is based on diffusion models and has been trained on millions of Google Street View images.
  • The system generates realistic videos step by step from street maps, height maps, and desired camera paths. A "motion module" and a technique called "temporal imputation" ensure motion and temporal consistency between frames.
  • Streetscapes can generate up to 100 frames with camera movements covering more than 170 meters and enables creative applications, such as controlling the appearance of the city through text descriptions. The researchers plan to further improve control over moving objects and consistency between frames.