Stanford University and Google researchers have unveiled "Streetscapes," an AI system capable of generating realistic street views of entire cities.
The system, developed by a research team from Stanford University and Google Research, generates long, continuous video sequences that simulate a drive through a virtual city. The results can also be exported as 3D scenes via a NeRF (neural radiance field).
Streetscapes is based on diffusion models, which are widely used in image and video generation. The system was trained on millions of real street views from Google Street View, learning how typical street scenes look.
As input, Streetscapes receives a street map, a height map of the buildings, and a desired camera path through the virtual city. From these, it generates the video sequence step by step. The resulting street views include fine details such as windows, cobblestones, and vegetation, and light and shadows are rendered plausibly.
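To make the "desired camera path" input more concrete, here is a minimal sketch of how such a path could be prepared: a street centerline is resampled into evenly spaced camera positions. The function name `sample_camera_path` and the parameter `spacing_m` are hypothetical and not taken from the Streetscapes code, and the team's actual input format may differ.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y) position in meters on the street map


def sample_camera_path(polyline: List[Point], spacing_m: float) -> List[Point]:
    """Return camera positions spaced `spacing_m` apart along a polyline.

    Hypothetical helper for illustration only. The first vertex is always
    included; later positions are linearly interpolated on each segment.
    """
    if spacing_m <= 0:
        raise ValueError("spacing_m must be positive")
    if not polyline:
        return []

    samples = [polyline[0]]
    travelled = 0.0        # path length covered by completed segments
    next_at = spacing_m    # path length at which the next sample falls

    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        # Small tolerance so a sample landing exactly on a vertex is kept.
        while seg > 0 and next_at <= travelled + seg + 1e-9:
            t = (next_at - travelled) / seg
            samples.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            next_at += spacing_m
        travelled += seg

    return samples


# Example: an L-shaped route, one camera position every 5 meters.
route = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0)]
positions = sample_camera_path(route, spacing_m=5.0)
```

In a real pipeline each position would also carry a viewing direction and height; this sketch covers only the 2D positions.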
A key component is a "Motion Module" that handles movement and keeps consecutive images temporally consistent. Consistency is further improved by a new technique called "Temporal Imputation," in which each new image is generated with the previously generated images taken into account.
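The idea of generating each new image with the previous ones in view can be illustrated with a simplified sketch. It is not the authors' implementation: the real system works inside a diffusion model, whereas here a hypothetical `generate_window` callable stands in for the model and frames are plain numbers. The sketch shows only the control flow, in which each new window is handed the last few frames that already exist, so the sequence can grow beyond a single window.

```python
from typing import Callable, List, Sequence

Frame = float  # toy stand-in for an image


def generate_long_sequence(
    camera_path: Sequence[float],
    generate_window: Callable[[Sequence[float], List[Frame]], List[Frame]],
    window: int = 8,
    overlap: int = 2,
) -> List[Frame]:
    """Generate one frame per camera pose, window by window.

    `generate_window(poses, known)` is a hypothetical model call: it returns
    one frame per pose, where the first len(known) frames are already given.
    """
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")

    frames: List[Frame] = []
    while len(frames) < len(camera_path):
        # Frames from the previous window that the next window is conditioned on.
        known = frames[-overlap:] if frames and overlap else []
        begin = len(frames) - len(known)
        poses = camera_path[begin:begin + window]
        out = generate_window(poses, known)
        # Keep the existing frames untouched and append only the new ones.
        frames.extend(out[len(known):])
    return frames


def toy_generate_window(poses: Sequence[float], known: List[Frame]) -> List[Frame]:
    """Toy model: reuses the known frames and derives the rest from the pose."""
    return list(known) + [p * 10.0 for p in poses[len(known):]]


path = [float(i) for i in range(20)]
video = generate_long_sequence(path, toy_generate_window)
```

With a window of 8 and an overlap of 2, the 20 poses above are covered in three passes that contribute 8, 6, and 6 new frames.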
This allows Streetscapes to generate longer video sequences than alternative approaches: up to 100 frames, with camera movements covering more than 170 meters. Streetscapes uses an architecture that has since been surpassed by other video generation models such as OpenAI's Sora. According to the team, however, the underlying diffusion model is easily interchangeable, so future versions could deliver even better results.
Streetscapes can be controlled with text prompts
Besides generating street views, Streetscapes also enables creative applications. The appearance of the generated city can be controlled through text descriptions, for example to produce different times of day or weather conditions. Mixing city layouts and architectural styles is also possible: for instance, the system can visualize Parisian streets in the style of New York City.
The research team sees Streetscapes as an important step towards AI systems that can realistically generate not just individual objects, but entire, unlimited scenes. For the future, they plan to improve control over moving objects like cars. They also want to work on further increasing consistency between consecutive images.
More examples can be found on the project page.