
The latest version of Perplexity AI's search model Sonar has arrived, powered by Meta's Llama 3.3 70B and specialized hardware from Cerebras Systems.


According to the company's internal tests, Sonar outperforms models like GPT-4o mini and Claude 3.5 Haiku on user satisfaction. It even matches or sometimes exceeds the capabilities of premium models like GPT-4o and Claude 3.5 Sonnet, particularly in search-related tasks.

The team built Sonar on Meta's Llama 3.3 70B model, fine-tuning it with additional training to improve its search capabilities. This extra training focused on making responses more factually accurate and easier to read, Perplexity says. The company previously used a modified version of Llama 3.1 under the same Sonar name.

Specialized chips push response speeds to new levels

To make Sonar faster, Perplexity partnered with Cerebras Systems, which takes an unusual approach to chip design. Instead of creating multiple small processors, Cerebras turns entire silicon wafers into single, massive chips called "Wafer Scale Engines" (WSE). Running on this hardware, Sonar can process 1,200 tokens per second, allowing it to generate responses almost instantly. While the French AI startup Mistral recently achieved similar speeds with its "Flash Answers" feature, that wasn't specifically for search applications.
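
For a rough sense of what that throughput means in practice, the sketch below converts 1,200 tokens per second into a wait time for a single answer. The answer length used here is an illustrative assumption, not a figure from Perplexity.

    # Back-of-the-envelope: what 1,200 tokens/second means for one answer.
    # TYPICAL_ANSWER_TOKENS is an assumed value for illustration only.

    TOKENS_PER_SECOND = 1_200       # throughput Perplexity cites for Sonar on Cerebras hardware
    TYPICAL_ANSWER_TOKENS = 500     # assumed length of a typical search answer

    generation_time = TYPICAL_ANSWER_TOKENS / TOKENS_PER_SECOND
    print(f"~{generation_time:.2f} s to generate a {TYPICAL_ANSWER_TOKENS}-token answer")
    # -> ~0.42 s, which is why responses feel close to instantaneous

At that rate, even a long answer of 1,000 tokens or more would finish generating in well under a second.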


For now, Sonar access is limited to paying Pro users, though Perplexity plans to make it more widely available in the future. The company hasn't shared details about the financial aspects of its partnership with Cerebras.

Summary
  • Perplexity AI has released an updated version of its search model, Sonar, built on Meta's Llama 3.3 70B and specialized hardware from Cerebras Systems.
  • Internal tests show Sonar outperforming models like GPT-4o mini and Claude 3.5 Haiku in user satisfaction, and even matching or surpassing premium models like GPT-4o and Claude 3.5 Sonnet in search-related tasks.
  • Cerebras Systems' unique Wafer Scale Engines allow Sonar to process 1,200 tokens per second, enabling near-instant responses, though access is currently limited to paying Pro users.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.