Update
  • Reuters update on the MATH benchmark added.

Update from July 15, 2024:

According to another Reuters source, OpenAI has internally tested an AI that scored over 90 percent on the MATH benchmark, a collection of championship-level math problems. Reuters could not confirm whether this AI is the "Strawberry" project.

The MATH (Mathematics Aptitude Test of Heuristics) dataset is a benchmark that measures the performance of AI systems in solving complex mathematical problems. It contains problems from math competitions for high school and college students. For comparison, the original GPT-4 scored around 53 percent, while GPT-4o achieves 76.6 percent.

A score above 90 percent would indicate that the tested AI was able to correctly solve most of these challenging problems. It is an indicator of the system's advanced mathematical and possibly reasoning skills, if the problems were not simply memorized.


Original article from July 13, 2024:

OpenAI's secret "Strawberry" project teaches AI models autonomous Internet research skills

OpenAI is developing an AI technology with advanced reasoning capabilities, codenamed "Strawberry". According to a source, the project is similar to STaR, a method previously presented by Stanford researchers.

According to a Reuters report, OpenAI is working on a project called "Strawberry", previously known as Q* or Q-Star. The goal is to significantly enhance the reasoning abilities of the company's AI models.

Internal OpenAI documents reviewed by Reuters outline plans to use Strawberry models for autonomous web searches. The technology is said to allow the AI not only to generate answers, but also to plan ahead and "navigate the web autonomously" - referred to as "deep research". This could also be related to OpenAI's rumored Google Killer.

An insider told Reuters that Strawberry uses a special form of "post-training," in which pre-trained models are adapted for specific tasks. The exact details of this process remain unknown, but it involves a "deep research" dataset.

With Strawberry, OpenAI aims to improve its models' ability to plan and execute complex tasks over extended periods, so-called long-horizon tasks (LHT). To achieve this, the systems will be assisted by a "CUA", a computer-using agent that can independently take actions based on the AI's findings.

This is in line with OpenAI's vision that AI agents that first reason logically and then take action represent the next level of technology. According to the Reuters source, Strawberry is being specifically tested to take over tasks from software and machine learning engineers.

OpenAI's approach reportedly similar to Stanford research

OpenAI's approach is similar to a method introduced by Stanford researchers called "Self-Taught Reasoner" (STaR), according to Reuters' source. STaR aims to teach AI systems to read between the lines, thereby improving their logical reasoning abilities.

Quiet-STaR, an advancement of STaR presented in March 2024, trains language models to generate possible rationales for how a text continues at every point in that text. Through trial and error, the AI learns which rationales yield the best results. The longer the system can reason, the better the outcomes. Quiet-STaR could be abbreviated to "Q*".
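The trial-and-error idea behind STaR-style training can be illustrated with a deliberately tiny sketch. It is not the Stanford researchers' code and says nothing about how Strawberry works. It assumes a toy "model" that is just a set of weights over three hypothetical reasoning strategies. The loop samples a rationale for each problem, keeps only the rationales that lead to the known correct answer, and reinforces them. All names (`STRATEGIES`, `star_toy_loop`, `PROBLEMS`) are invented for illustration.

```python
import random

# Hypothetical toy setup: a "rationale" is just the name of a strategy.
STRATEGIES = {
    "add": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
    "subtract": lambda a, b: a - b,
}


def star_toy_loop(problems, rounds=10, seed=0):
    """Sample rationales, keep those that reach the right answer, reinforce them."""
    rng = random.Random(seed)
    # Stand-in for model parameters: one sampling weight per strategy.
    weights = {name: 1.0 for name in STRATEGIES}
    for _ in range(rounds):
        kept = []
        for a, b, answer in problems:
            names = list(weights)
            # "Generate" a rationale by sampling a strategy by its current weight.
            rationale = rng.choices(names, weights=[weights[n] for n in names], k=1)[0]
            # Filter step: keep the rationale only if its answer is correct.
            if STRATEGIES[rationale](a, b) == answer:
                kept.append(rationale)
        # Stand-in for fine-tuning: make the kept rationales more likely next round.
        for rationale in kept:
            weights[rationale] += 1.0
    return weights


# Toy problems whose correct answer is always the sum of the two numbers.
PROBLEMS = [(1, 2, 3), (3, 4, 7), (5, 1, 6), (2, 5, 7)]
weights = star_toy_loop(PROBLEMS)
print(weights)
```

In this sketch only the strategy that actually solves the problems gains weight, so it is sampled more often in later rounds. A real system would replace the weight table with a language model and the increment with a fine-tuning or reinforcement learning step.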

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.

Formerly known as Q*, Strawberry has been a topic of speculation in the AI community since last fall, when rumors of a potential breakthrough for OpenAI began to circulate. At the time, Q* was said to be capable of solving complex mathematical problems. OpenAI CEO Sam Altman indirectly confirmed Q*'s existence, calling it an "unfortunate leak."

Experts believe that Q*/Strawberry combines large language models with planning algorithms, similar to chess programs or poker AI. Reinforcement learning and the amount of computation spent at inference time are also likely to play a crucial role, another parallel to Quiet-STaR.

The extent of Strawberry's development remains unclear. However, it is clear that projects like Strawberry and Quiet-STaR are designed to enable the next generation of AI systems with enhanced understanding and reasoning capabilities.

Microsoft CTO Kevin Scott recently echoed this sentiment, stating that he already has access to the next generation of AI and promising significant advances in the area of reasoning.

Summary
  • OpenAI is developing an AI technology codenamed "Strawberry" that is meant to give its models enhanced reasoning capabilities, so that they can plan ahead and perform complex tasks autonomously.
  • The approach is reportedly similar to the "Self-Taught Reasoner" (STaR) method from Stanford researchers. Its successor Quiet-STaR, presented in March, has language models learn to generate possible justifications for text continuations and optimize their reasoning skills through trial and error.
  • Experts believe that Strawberry, also known as Q*, combines large language models with planning algorithms, reinforcement learning, and longer computation times during application to create AI systems that can understand better and think more independently.
Online journalist Matthias is the co-founder and publisher of THE DECODER. He believes that artificial intelligence will fundamentally change the relationship between humans and computers.