The latest version of Perplexity AI's search model Sonar has arrived, powered by Meta's Llama 3.3 70B and specialized hardware.
In internal tests, the company says Sonar performs better than models like GPT-4o mini and Claude 3.5 Haiku when it comes to user satisfaction. It even matches or sometimes exceeds the capabilities of premium models like GPT-4o and Claude 3.5 Sonnet, particularly in search-related tasks.
The team built Sonar on Meta's Llama 3.3 70B model, fine-tuning it to improve its search capabilities. According to Perplexity, this additional training focused on making responses more factually accurate and easier to read. The company previously used a modified version of Llama 3.1 under the same Sonar name.
Specialized chips push response speeds to new levels
To make Sonar faster, Perplexity partnered with Cerebras Systems, which takes an unusual approach to chip design: rather than dicing a silicon wafer into many small processors, Cerebras uses the entire wafer as a single massive chip, the "Wafer Scale Engine" (WSE). Running on this hardware, Sonar processes 1,200 tokens per second, allowing it to generate responses almost instantly. The French AI startup Mistral recently achieved similar speeds with its "Flash Answers" feature, though not specifically for search applications.
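To put that throughput figure in perspective, a quick back-of-the-envelope calculation shows why responses feel near-instant. The ~300-token answer length below is our assumption for illustration, not a figure from Perplexity:

```python
# Back-of-the-envelope check of what 1,200 tokens/second means in practice.
THROUGHPUT_TOKENS_PER_SEC = 1200  # decoding speed reported for Sonar on Cerebras hardware


def generation_time(answer_tokens: int,
                    tokens_per_sec: int = THROUGHPUT_TOKENS_PER_SEC) -> float:
    """Seconds of pure decoding time for an answer of the given length."""
    return answer_tokens / tokens_per_sec


# A typical few-paragraph search answer might run ~300 tokens (assumption).
print(f"{generation_time(300):.2f} s")  # 0.25 s of decoding time
```

At that rate, even a long ~1,200-token answer would take only about a second to generate, so perceived latency is dominated by search and network overhead rather than the model itself.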
For now, Sonar access is limited to paying Pro users, though Perplexity plans to make the model more widely available in the future. The company hasn't shared the financial terms of its partnership with Cerebras.