
AI startup Inflection's new LLM closes in on GPT-4 with only 40% of training FLOPs

Matthias Bastian

AI startup Inflection introduces Inflection-2.5, a new large language model that comes close to GPT-4 with comparatively little training compute.

AI startup Inflection, which specializes in developing AI assistants, recently unveiled its latest LLM: Inflection-2.5, which is designed to catch up with leading models like GPT-4 while being more efficient.

The new model is integrated into Inflection's "Pi", an AI assistant that has been "designed to be empathetic, helpful, and safe," according to the startup. It is now available to all Pi users through Inflection's website.

More efficient AI training

According to Inflection, Inflection-2.5 achieves 94 percent of GPT-4's average benchmark performance while requiring only 40 percent of the estimated training compute (FLOPs). Inflection particularly highlights advances in STEM (science, technology, engineering, and mathematics).
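To put the two reported figures in relation, here is a minimal Python sketch of the arithmetic. The function name `relative_efficiency` is made up for illustration, and dividing one ratio by the other is only a rough back-of-the-envelope measure, not a metric Inflection itself reports.

```python
def relative_efficiency(performance_ratio: float, compute_ratio: float) -> float:
    """Performance retained per unit of training compute, relative to a baseline.

    Hypothetical helper for illustration only; both inputs are fractions
    of the baseline model's value (here: GPT-4).
    """
    if compute_ratio <= 0:
        raise ValueError("compute_ratio must be positive")
    return performance_ratio / compute_ratio


# Figures as reported in the article: 94 percent of GPT-4's average
# performance at 40 percent of the estimated training FLOPs.
ratio = relative_efficiency(0.94, 0.40)
print(f"Performance per unit of training compute vs. GPT-4: {ratio:.2f}x")
```

Under these assumptions, the ratio says only that the reported benchmark average was reached with less than half the compute. It says nothing about how the two models compare on tasks outside the benchmarks.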

On the popular MMLU language comprehension benchmark, Inflection-2.5 comes close to GPT-4, albeit with a more complex prompting scheme. The comparison with Inflection-1 is therefore flawed: its MMLU score of 72.7 was achieved with a simpler 5-shot prompt. Inflection-2 scored just under 80 percent on MMLU using the industry-standard 5-shot method.
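For readers unfamiliar with the term, "5-shot" means that the model is shown five solved examples before the actual question. The following Python sketch shows how such a prompt could be assembled. The function `build_few_shot_prompt`, the prompt layout, and the placeholder questions are assumptions for illustration, not the format Inflection or the MMLU authors actually use.

```python
LETTERS = "ABCD"


def format_item(question, options, answer=None):
    """Render one multiple-choice item; leave the answer open if none is given."""
    lines = [question]
    lines += [f"{LETTERS[i]}. {option}" for i, option in enumerate(options)]
    lines.append(f"Answer: {answer}" if answer else "Answer:")
    return "\n".join(lines)


def build_few_shot_prompt(examples, question, options):
    """Concatenate k solved examples, then the unsolved target question."""
    blocks = [format_item(q, opts, ans) for q, opts, ans in examples]
    blocks.append(format_item(question, options))
    return "\n\n".join(blocks)


# Five placeholder examples (hypothetical content, not real MMLU items).
demo_examples = [
    (f"Example question {n}?", ["w", "x", "y", "z"], "A") for n in range(1, 6)
]
prompt = build_few_shot_prompt(
    demo_examples, "Target question?", ["w", "x", "y", "z"]
)
print(prompt)
```

A more complex scheme, such as asking the model to reason step by step before answering, changes what is measured. This is why scores obtained with different prompting methods are hard to compare directly.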

On MT-Bench, which tests the conversational skills of AI models and how well they follow a prompt, Inflection-2.5 scored between GPT-3.5 and GPT-4.

Curiously, during the evaluation, Inflection found that nearly 25 percent of the examples in the Reasoning, Mathematics, and Coding sections had incorrect reference solutions. The startup has corrected these and published a revised dataset (MT-Bench Corrected). This is yet another indication of the limited validity of synthetic benchmarks.

The model was also tested on the Hungarian math exam, which is unlikely to be part of the training data, and on the Physics GRE, an entrance exam for graduate studies in physics. On the physics test, Inflection-2.5 scored at the 85th percentile of human test takers and, with an extended prompting method, came close to the maximum score, just behind GPT-4.

Inflection AI was founded by LinkedIn co-founder Reid Hoffman, DeepMind co-founder Mustafa Suleyman, and former DeepMind researcher Karén Simonyan. The startup's goal is to create an interface that allows people to talk to computers and give them complex tasks without having to learn to code.

The startup has received investments from companies such as Nvidia and Microsoft and from well-known individuals including Reid Hoffman, Bill Gates, and Eric Schmidt.
