Researchers at Korea's Yonsei University have created a new AI system that tests actions before executing them on websites. The approach shows better results than previous methods at helping AI navigate the web.
The team first tested how well large language models could predict the outcomes of actions on websites. Even the newest AI models got it right only about 54 percent of the time. GPT-4 Turbo scored slightly higher than GPT-4o, while Claude 3.5 Sonnet performed "as badly as random guessing."
"These suggest that the world model, the ability to foresee the potential outcomes of actions taken, is absent in LLMs," the researchers explained.
Testing before doing
Instead of trial and error, the new system simulates possible actions first. The researchers developed what they call "transition-focused observation abstraction" to track important changes on websites.
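To make the "simulate first, act second" idea concrete, here is a minimal sketch in Python. It is not the researchers' code: the function names, the lookup table standing in for a learned world model, and the keyword-based scorer are all assumptions made for illustration.

```python
from typing import Callable, List


def choose_action(
    state: str,
    candidates: List[str],
    predict_outcome: Callable[[str, str], str],
    score: Callable[[str], float],
) -> str:
    """Return the candidate whose simulated outcome scores highest."""
    if not candidates:
        raise ValueError("need at least one candidate action")
    best_action, best_score = candidates[0], float("-inf")
    for action in candidates:
        # Ask the (hypothetical) world model what would happen,
        # without touching the real website.
        outcome = predict_outcome(state, action)
        value = score(outcome)
        if value > best_score:
            best_action, best_score = action, value
    return best_action


# Toy stand-in for a world model: a fixed table of predicted outcomes.
TOY_OUTCOMES = {
    "click 'Add to cart'": "cart count changes from 0 to 1",
    "click 'Sign in'": "login form appears",
    "scroll down": "no visible change",
}


def toy_predict(state: str, action: str) -> str:
    return TOY_OUTCOMES.get(action, "no visible change")


def toy_score(outcome: str) -> float:
    # Toy scorer: count goal-related words in the predicted outcome.
    goal_words = {"cart", "count"}
    return float(len(goal_words & set(outcome.split())))


best = choose_action("product page", list(TOY_OUTCOMES), toy_predict, toy_score)
print(best)  # click 'Add to cart'
```

In a real agent, both the predictor and the scorer would be models rather than lookup tables. The point of the sketch is only the control flow: every candidate is evaluated on a predicted outcome before anything is executed.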
The process works in three steps: First, it gathers data about how AI interacts with websites. Using GPT-4o-mini to generate prompts, the team gathered 14,000 training examples.
Second, it tracks changes between actions using the Hungarian algorithm to identify updates, deletions, and additions on web pages.
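The matching step can be sketched as an assignment problem: pair each element of the old page with an element of the new page at minimum total cost, and treat unpaired elements as deletions or additions. The Python below is an illustrative sketch, not the team's implementation. The `(tag, text)` element format, the cost values, and the padding with dummy rows and columns are assumptions.

```python
from typing import Dict, List, Tuple

Element = Tuple[str, str]  # (tag, text): a simplified, hypothetical page element
SKIP = 1                   # assumed cost of leaving an element unmatched


def hungarian(cost: List[List[int]]) -> List[int]:
    """Hungarian algorithm on a square matrix; returns the column for each row."""
    n = len(cost)
    INF = float("inf")
    u = [0] * (n + 1)      # row potentials
    v = [0] * (n + 1)      # column potentials
    match = [0] * (n + 1)  # match[j] = 1-based row assigned to column j
    way = [0] * (n + 1)    # back-pointers along the augmenting path
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        while j0:  # flip assignments along the augmenting path
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    result = [0] * n
    for j in range(1, n + 1):
        result[match[j] - 1] = j - 1
    return result


def pair_cost(a: Element, b: Element) -> int:
    if a == b:
        return 0   # unchanged element
    if a[0] == b[0]:
        return 1   # same tag, new text: a cheap "update"
    return 10      # different tags: costlier than delete plus add


def diff_elements(old: List[Element], new: List[Element]) -> Dict[str, list]:
    n, m = len(old), len(new)
    size = n + m  # pad so every element may stay unmatched
    cost = [[0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:
                cost[i][j] = pair_cost(old[i], new[j])
            elif i < n or j < m:
                cost[i][j] = SKIP  # real element paired with a dummy
    assignment = hungarian(cost)
    changes: Dict[str, list] = {"updated": [], "deleted": [], "added": []}
    matched_new = set()
    for i in range(n):
        j = assignment[i]
        if j < m:
            matched_new.add(j)
            if old[i] != new[j]:
                changes["updated"].append((old[i], new[j]))
        else:
            changes["deleted"].append(old[i])
    changes["added"] = [new[j] for j in range(m) if j not in matched_new]
    return changes


old_page = [("button", "Add to cart"), ("span", "Cart (0)"), ("a", "Sign in")]
new_page = [("span", "Cart (1)"), ("button", "Add to cart"), ("div", "Item added")]
changes = diff_elements(old_page, new_page)
```

For these two toy pages, the cart counter comes out as an update, the "Sign in" link as a deletion, and the "Item added" notice as an addition, while the unchanged button is matched and ignored even though it moved.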
Third, it translates technical changes into simple language, reducing data from about 4,000 tokens to a much smaller amount. This cuts computing costs and increases efficiency.
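As a rough illustration of why this step shrinks the input, the sketch below turns a structured list of page changes into a few short sentences. The fixed templates and the dictionary format are assumptions made for this example. The article does not describe how the researchers' system produces its descriptions.

```python
from typing import Dict


def describe_changes(changes: Dict[str, list]) -> str:
    """Render updates, deletions and additions as short plain-language sentences."""
    lines = []
    for old, new in changes.get("updated", []):
        lines.append(f'The {old[0]} text changed from "{old[1]}" to "{new[1]}".')
    for tag, text in changes.get("deleted", []):
        lines.append(f'The {tag} "{text}" was removed.')
    for tag, text in changes.get("added", []):
        lines.append(f'A new {tag} "{text}" appeared.')
    return " ".join(lines) if lines else "Nothing on the page changed."


# Hypothetical change record for a shopping page after one click.
example = {
    "updated": [(("span", "Cart (0)"), ("span", "Cart (1)"))],
    "deleted": [],
    "added": [("div", "Item added")],
}
summary = describe_changes(example)
print(summary)
```

A summary like this describes only what changed, so a model reading it does not need the full page before and after the action.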
In WebArena tests, which include common tasks like online shopping and using Reddit, the system achieved a 16.6 percent success rate, up from the previous 12.8 percent baseline.
Results varied significantly by category. GitLab page navigation improved by 181 percent, while map services showed a 92 percent gain. Online shopping saw the smallest improvement at 3 percent.
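The overall WebArena figures can be read either as a gain in percentage points or as a relative gain over the baseline, and the two numbers differ considerably. The short calculation below works through both for the 12.8 and 16.6 percent success rates reported in the WebArena tests. The function name is arbitrary.

```python
from typing import Tuple


def improvement(baseline: float, new: float) -> Tuple[float, float]:
    """Return (gain in percentage points, relative gain in percent)."""
    points = new - baseline
    relative = points / baseline * 100
    return round(points, 1), round(relative, 1)


points, relative = improvement(12.8, 16.6)
print(points, relative)  # 3.8 29.7
```

In other words, the overall success rate rose by 3.8 percentage points, which is roughly a 30 percent relative improvement over the baseline.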
When tested on Mind2Web's collection of 2,000 tasks across 137 websites, the system achieved a new record with 25.4 percent of tasks completed successfully.
Looking ahead
The researchers acknowledge that work remains, especially in processing visual information and planning multiple steps. They plan to focus on these areas in future research.
Web navigation could become a key part of how agent-based AI systems, which some see as the next big step for AI, interact with the Internet. Anthropic's "Claude Computer Use" and Google's "Project Jarvis" both aim to give AI similar capabilities for navigating the web more effectively.