Streetscapes AI generates uncannily realistic Street View scenes of entire cities from scratch

Stanford University and Google researchers have unveiled "Streetscapes," an AI system capable of generating realistic street views of entire cities.

The system generates long, continuous video sequences that simulate a drive through a virtual city. These sequences can also be converted into 3D scenes using NeRF (neural radiance fields).

Streetscapes is based on diffusion models, which are widely used in image and video generation. The system was trained on millions of real street views from Google Street View, learning how typical street scenes look.

As input, Streetscapes receives a street map, a height map of the buildings, and a desired camera path through the virtual city. From these, it generates video sequences step by step. The resulting street views look strikingly realistic and contain many details such as windows, cobblestones, and vegetation. Lighting and shadows also look natural.
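To make the three inputs more concrete, here is a minimal Python sketch of how such a scene description could be bundled and sanity-checked. This is purely illustrative: the class name `SceneLayout`, its fields, and the grid encoding (1 for road, 0 for everything else) are assumptions made for this example and are not taken from the researchers' code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SceneLayout:
    """Hypothetical container for the three inputs named in the article."""
    street_map: List[List[int]]          # assumed encoding: 1 = road, 0 = not road
    height_map: List[List[float]]        # assumed: building height in metres per cell
    camera_path: List[Tuple[int, int]]   # assumed: one (row, col) grid position per frame

    def validate(self) -> None:
        """Raise ValueError if the maps disagree in size or the path leaves the road."""
        rows = len(self.street_map)
        cols = len(self.street_map[0]) if rows else 0
        if rows == 0 or cols == 0:
            raise ValueError("street_map must be non-empty")
        if any(len(row) != cols for row in self.street_map):
            raise ValueError("street_map rows must have equal length")
        if len(self.height_map) != rows or any(len(row) != cols for row in self.height_map):
            raise ValueError("height_map must have the same shape as street_map")
        for r, c in self.camera_path:
            if not (0 <= r < rows and 0 <= c < cols):
                raise ValueError(f"camera position {(r, c)} is outside the map")
            if self.street_map[r][c] != 1:
                raise ValueError(f"camera position {(r, c)} is not on a road")


# A 3x3 toy city: one horizontal road with buildings on both sides.
layout = SceneLayout(
    street_map=[[0, 0, 0], [1, 1, 1], [0, 0, 0]],
    height_map=[[12.0, 8.0, 15.0], [0.0, 0.0, 0.0], [6.0, 9.0, 20.0]],
    camera_path=[(1, 0), (1, 1), (1, 2)],
)
layout.validate()
```

In a real system the maps would be rendered into per-frame conditioning images along the camera path; the sketch only shows the kind of consistency check that such inputs invite.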

A key component is a "Motion Module" that ensures smooth movement and temporal consistency between consecutive images. A new technique called "Temporal Imputation" improves consistency further: each new image is generated with the previously generated images taken into account.
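The general idea of generating a long sequence piece by piece, with each new piece conditioned on frames that already exist, can be sketched in a few lines of Python. This is a toy illustration under loud assumptions, not the researchers' implementation: frames are plain numbers, `generate_window` is a hypothetical stand-in for a video diffusion sampler, and the window and overlap sizes are arbitrary.

```python
import random


def generate_window(known_frames, window_size, rng):
    """Hypothetical stand-in for a video diffusion sampler.

    Frames are plain floats. The known frames are copied unchanged,
    and the rest of the window continues from the last known value
    in small random steps.
    """
    frames = list(known_frames)
    last = frames[-1] if frames else 0.0
    while len(frames) < window_size:
        last = last + rng.uniform(-0.1, 0.1)
        frames.append(last)
    return frames


def generate_long_sequence(total_frames, window_size=8, overlap=4, seed=0):
    """Build a long sequence from overlapping windows (assumed scheme)."""
    if not 0 <= overlap < window_size:
        raise ValueError("overlap must be in the range [0, window_size)")
    rng = random.Random(seed)
    video = generate_window([], window_size, rng)
    while len(video) < total_frames:
        # The tail of what exists so far is handed to the next window as fixed context.
        context = video[-overlap:] if overlap else []
        window = generate_window(context, window_size, rng)
        # Only the newly generated frames are appended.
        video.extend(window[len(context):])
    return video[:total_frames]


video = generate_long_sequence(100)
print(len(video))
```

Because every window starts from frames the previous window produced, there is no abrupt jump at the window boundaries. A diffusion-based system would enforce this during denoising rather than by simple copying, which the sketch does not attempt to model.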

This allows Streetscapes to generate longer video sequences compared to alternative approaches: up to 100 frames with camera movements covering more than 170 meters. Streetscapes uses an architecture that has since been surpassed by other video generation models like OpenAI's Sora. According to the team, the underlying diffusion model is easily interchangeable, so future versions will deliver even better results.

Streetscapes can be controlled with text prompts

Besides generating street views, Streetscapes also enables creative applications. The appearance of the generated city can be controlled through text descriptions, for example to produce different times of day or weather conditions. Mixing city layouts and architectural styles is also possible: for instance, the system can render the street layout of Paris in the architectural style of New York City.

The research team sees Streetscapes as an important step towards AI systems that can realistically generate not just individual objects, but entire, unbounded scenes. In future work, they plan to improve control over moving objects such as cars and to further increase consistency between consecutive images.

More examples can be found on the project page.
