Apple Intelligence is efficient, but its "intelligence" is average

Apple's own benchmarks show that the company's on-device and cloud LLMs achieve middling performance compared to other AI models. To boost performance for certain tasks, Apple has an adapter strategy - and ChatGPT.

At the 2024 Worldwide Developers Conference, Apple showed off its new "Apple Intelligence" generative AI system, built into iOS 18, iPadOS 18, and macOS Sequoia. It has several generative AI models that take the user's context into account.

Apple says it uses licensed data, public data from its AppleBot web crawler, and human labeled and synthetic data to train its models. According to Apple, private user data is not part of the training set.

Apple's server model is on par with GPT-3.5 Turbo

Internal benchmarks pitting the models against open-source and commercial models of similar size, such as GPT-3.5, show that Apple's AI models, while efficient for their size, perform well below SOTA models such as GPT-4.

With around three billion parameters, the on-device model beats larger models like Phi-3-mini, Mistral-7B, and Gemma-7B. The server model performs similar to DBRX-Instruct, Mixtral-8x22B, and GPT-3.5-Turbo, Apple says.

The server model can't compete with GPT-4. So don't expect any new "wow" moments from Apple Intelligence. For now, Apple Intelligence's main advantage is how deeply it's embedded in Apple's operating systems.

Apple says it tested prompts with varying levels of difficulty in areas such as brainstorming, sorting, answering yes/no questions, coding, pulling out key points, math, answering open-ended questions, paraphrasing, security, summarizing, and writing for the results above. People rated the output of the models.

When testing writing skills on internal benchmarks just for summarizing and composing text, Apple's on-device models outperformed larger models. The server model is as good as GPT-4 Turbo on these tasks. The results are only for how the models perform without task-specific adapters (see below).

To improve the performance of the base models, Apple uses an adapter strategy. Adapters are small neural network modules that are optimized for specific tasks, such as summarization or spell checking, and can be controlled by the base model after being fine-tuned for that task.

Recommendation

AI in practice

OpenAI launches Codex: Autonomous AI agents for software development

Video: Apple

Apple says that adapters can make the base models better at specific tasks in a dynamic, memory-efficient way without losing the general knowledge of the model. In some cases, Apple's on-device model with an adapter is much better than Microsoft's Phi-3-mini at summarizing emails and notifications. Notably, user satisfaction is increased by about ten percent due to fewer bad summaries.

Other than that, Apple is relying on its team-up with OpenAI and adding ChatGPT to expand what Apple Intelligence can do. By adding ChatGPT to iOS and Siri, Apple can use OpenAI's more capable multimodal models like GPT-4o when they're needed, like for hard writing tasks in the text editor or complex requests to Siri. It's not yet clear exactly when and how many requests will be sent to OpenAI.

Join our community

Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

Apple Intelligence is efficient, but its "intelligence" is average

Apple's server model is on par with GPT-3.5 Turbo

OpenAI launches Codex: Autonomous AI agents for software development

OpenAI reportedly booked TSMC's A16 process for manufacturing AI chips

"Cat attack" on reasoning model shows how important context engineering is

Apple's claims about large reasoning models face fresh scrutiny from a new study

Cloudflare CEO Matthew Prince sees trouble ahead for the open web

Apple Intelligence is efficient, but its "intelligence" is average

Apple's server model is on par with GPT-3.5 Turbo

OpenAI launches Codex: Autonomous AI agents for software development

OpenAI reportedly booked TSMC's A16 process for manufacturing AI chips