Where GPT-4 Vision meets the road: Can it improve autonomous driving?

Can OpenAI's GPT-4 Vision improve autonomous driving? Chinese researchers have put the vision-language model on the road, so to speak.

If companies like Nvidia have their way, vision-language models like OpenAI's GPT-4 Vision (GPT-4V) could become a key building block for computer vision in industrial applications, robotics, and autonomous driving. In a new study, a team from Shanghai Artificial Intelligence Laboratory, GigaAI, East China Normal University, the Chinese University of Hong Kong, and WeRide.ai tested GPT-4V in autonomous driving scenarios.

Unlike the pure language model GPT-4, GPT-4V has strong image recognition capabilities and can, for example, describe image contents and provide context for them. The team tested GPT-4V in a range of tasks, from simple scene recognition to complex causal analysis and decision-making, under a variety of conditions.

GPT-4 Vision outperforms current systems in some applications

According to the team, the results show that GPT-4V outperforms existing autonomous driving systems in some respects, namely scene understanding and corner case analysis. The model also demonstrated that it can handle out-of-distribution scenarios, recognize intentions, and make informed decisions in real-world driving situations.

At the same time, the model shows weaknesses in areas that are particularly relevant to autonomous driving, especially spatial perception. For example, GPT-4V struggles to distinguish directions and does not recognize all traffic lights.

Would GPT-4 Vision make the right decision on the road?

Specifically, the model's capabilities were tested in various aspects of autonomous driving. For example, in the area of scene understanding, GPT-4V was able to recognize weather and lighting conditions, identify traffic lights and road signs in different countries, and estimate the positions and actions of other road users in photos taken by different types of cameras.

GPT-4V was also able to handle corner cases, such as an image of an aircraft making an emergency landing on a road or a complex construction site, and to understand and analyze panoramic and sequential images. It was also able to link road images with images from a navigation system.

In a final test, the team asked GPT-4V to act as a driver and make decisions based on its surroundings in real driving situations. The model received driving videos frame by frame, along with key vehicle speeds and other relevant information, and had to choose the required actions and justify its decisions. The system completed this task, albeit with a few errors.
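
To make the setup of this driver test more concrete, here is a minimal Python sketch of how such an input might be assembled. It is not the researchers' code: the `Frame` class, the `build_driver_prompt` function, the file names, and the prompt wording are all hypothetical, and the sketch only builds the text that would accompany the images. It does not call any model.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Frame:
    """One video frame plus the vehicle speed recorded with it (hypothetical structure)."""
    image_path: str   # placeholder path to the frame image
    speed_kmh: float  # ego vehicle speed at this frame


def build_driver_prompt(frames: List[Frame], note: str = "") -> str:
    """Assemble a text prompt that asks a vision-language model to act as the driver.

    The wording is an assumption for illustration, not the prompt used in the study.
    """
    lines = [
        "You are the driver of the ego vehicle.",
        "For the frames below, state the action to take and justify it.",
    ]
    # List the frames in order so the model can reason over the sequence.
    for i, frame in enumerate(frames, start=1):
        lines.append(
            f"Frame {i}: image={frame.image_path}, speed={frame.speed_kmh:.1f} km/h"
        )
    # Optional extra context, such as a navigation instruction.
    if note:
        lines.append(f"Additional information: {note}")
    return "\n".join(lines)


frames = [Frame("frame_001.jpg", 32.0), Frame("frame_002.jpg", 28.5)]
prompt = build_driver_prompt(frames, note="navigation says turn right")
print(prompt)
```

In a real pipeline, the images themselves would be sent alongside this text, and the model's answer would be checked against what the human driver actually did.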

GPT-4 Vision is promising but shows dangerous weaknesses

The team sees significant potential for systems such as GPT-4V to outperform existing autonomous driving systems in terms of scene understanding, intention detection, and decision-making. However, limitations in spatial perception and errors in traffic light recognition mean that GPT-4V is not currently suitable for real-world driving on its own.

Further research is needed to increase the robustness and applicability of GPT-4V and other vision models in different driving situations and conditions.

Further information and all data are available on GitHub.
