Apple has developed a new AI system called Ferret-UI 2 that can read and control apps across iPhones, iPads, Android devices, web browsers, and Apple TV.
The system scored 89.73 in UI element recognition tests, well ahead of GPT-4o's 77.73. It also improves markedly on its predecessor, both in basic tasks such as text and button recognition and in more complex operations.
Understanding user intent
Instead of relying on specific click coordinates, Ferret-UI 2 aims to understand user intent. When given a command such as "Please confirm your input," the system can identify the appropriate button without requiring precise location data. Apple's research team used GPT-4o's visual capabilities to generate high-quality training data that helped the system better understand how UI elements relate to each other spatially.
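To make the idea concrete, here is a minimal sketch of coordinate-free, intent-based grounding. The element list, the word-overlap matcher, and all names are illustrative stand-ins; the actual model infers the target element directly from the screenshot and the instruction rather than from a pre-built element list.

```python
# Toy sketch of intent-based grounding: the caller supplies an instruction,
# not pixel coordinates, and the system resolves it to a UI element.
from dataclasses import dataclass

@dataclass
class UIElement:
    role: str                        # "button", "text field", ...
    label: str                       # visible text or accessibility label
    bbox: tuple[int, int, int, int]  # (x, y, w, h) from the perception step

def ground(instruction: str, elements: list[UIElement]) -> UIElement:
    """Pick the element whose label best matches the instruction.

    A stand-in for the model's intent matching: score elements by
    word overlap with the instruction.
    """
    words = set(instruction.lower().split())
    return max(elements, key=lambda e: len(words & set(e.label.lower().split())))

screen = [
    UIElement("button", "Cancel", (40, 900, 140, 48)),
    UIElement("button", "Confirm input", (220, 900, 140, 48)),
]
target = ground("Please confirm your input", screen)
print(target.label, target.bbox)  # Confirm input (220, 900, 140, 48)
```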
Ferret-UI 2 uses an adaptive architecture to recognize UI elements across platforms, including an algorithm that automatically balances image resolution against processing cost for each device type. According to the researchers, this approach is "both information-preserving and efficient for local encoding."
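The article doesn't spell out the algorithm, but a rough sketch of the idea, choosing a grid of fixed-resolution cells that matches a screen's aspect ratio within a compute budget, might look like this. The budget, the scoring rule, and the tie-break are all assumptions, not Apple's method:

```python
import math

def choose_grid(width: int, height: int, max_cells: int = 16) -> tuple[int, int]:
    """Pick a cols x rows grid of square, fixed-resolution cells.

    Minimizes aspect-ratio distortion (information loss from stretching
    the screen to fit the grid) while staying within a cell budget
    (processing cost). All parameters are illustrative.
    """
    aspect = width / height
    best_key, best_grid = None, (1, 1)
    for cols in range(1, max_cells + 1):
        for rows in range(1, max_cells // cols + 1):
            # how much the screen must be stretched to fit this grid
            err = abs(math.log((cols / rows) / aspect))
            key = (err, -(cols * rows))  # tie-break: prefer more cells
            if best_key is None or key < best_key:
                best_key, best_grid = key, (cols, rows)
    return best_grid

print(choose_grid(1920, 1080))  # (5, 3): wide grid for a 16:9 TV screen
print(choose_grid(1170, 2532))  # (2, 4): tall grid for a phone screen
```

Wide TV screens get wide grids and tall phone screens get tall ones, so each platform is encoded at a layout-appropriate resolution without blowing the compute budget.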
Testing showed strong cross-platform performance: models trained on iPhone data achieved 68 percent accuracy on iPads and 71 percent on Android devices. The system had more trouble transferring between mobile devices and TV or web interfaces, however, which the researchers attribute to differences in screen layouts.
Llama- and Gemma-based Ferret-UI models are available on Hugging Face, along with a demo.
Microsoft releases UI understanding tool as open source
Apple's work comes as other companies push forward with their own UI understanding systems. Anthropic recently released an updated Claude 3.5 Sonnet that can interact with user interfaces, while Microsoft open-sourced OmniParser, a tool that converts screen content into structured data for the same purpose.
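As a rough illustration of what such structured screen data can look like, a parsed screenshot reduced to a machine-readable element list might resemble the following. This mirrors the general idea, not OmniParser's actual output schema:

```python
# Hypothetical structured representation of one parsed screen.
parsed_screen = [
    {"type": "text",   "content": "Sign in to continue", "bbox": [24, 120, 380, 40]},
    {"type": "input",  "content": "Email",               "bbox": [24, 200, 380, 48]},
    {"type": "button", "content": "Sign in",             "bbox": [24, 280, 380, 48],
     "interactable": True},
]

# Downstream agents can then act on elements by meaning rather than raw pixels.
button = next(e for e in parsed_screen if e["type"] == "button")
print(button["content"], button["bbox"])
```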
Apple also recently unveiled CAMPHOR, a framework in which a master reasoning agent coordinates specialized AI agents to handle complex tasks. Combined with Ferret-UI 2, this technology could enable voice assistants like Siri to handle multi-step tasks that involve navigating apps or the web, such as finding and booking a specific restaurant, using only voice commands.
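A hedged sketch of that coordination pattern: a master agent plans subtasks and routes each to a specialist. Every name and the hard-coded plan below are illustrative, not Apple's implementation, where an LLM would handle the planning and routing:

```python
from typing import Callable

# Specialized agents, each handling one kind of subtask.
def search_agent(task: str, context: list) -> str:
    return f"found candidates for: {task}"

def booking_agent(task: str, context: list) -> str:
    return f"booked using {context[-1]!r}"

AGENTS: dict[str, Callable[[str, list], str]] = {
    "search": search_agent,
    "book": booking_agent,
}

def master_agent(request: str) -> str:
    """Decompose a request into subtasks and dispatch them in order.

    The plan is hard-coded to keep the sketch self-contained; a real
    master reasoning agent would generate it dynamically.
    """
    plan = [("search", f"restaurants matching '{request}'"),
            ("book", "a table at the top result")]
    context: list[str] = []
    for kind, task in plan:
        context.append(AGENTS[kind](task, context))
    return context[-1]

print(master_agent("quiet Italian place for two"))
```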