OpenAI has introduced a new feature called "Predicted Outputs" for its GPT-4o and GPT-4o-mini language models that makes AI-supported text processing much faster than before.

Initial testing of the new feature shows significant speed gains: Code editing responses come in two to four times faster compared to existing models, and large file modifications that used to take about 70 seconds can now be completed in about 20 seconds. OpenAI points to several key applications, including updating blog posts in documents, iterating on previous responses, and rewriting code in an existing file.

The system operates on a straightforward principle: developers can input an expected portion of the output ahead of time. This approach works especially well for repetitive tasks or small document changes, since the model needs to generate fewer new tokens. OpenAI states that as a general guideline, when 50 percent of output tokens can be saved, the processing time decreases by about 50 percent.
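As a rough sketch of how a developer supplies that expected output: OpenAI's Chat Completions API accepts a `prediction` field whose content the model can accept cheaply wherever the response matches it. The file contents and edit instruction below are illustrative, not from the article.

```python
# Sketch of a Predicted Outputs request body for OpenAI's Chat Completions API.
# The "prediction" field carries text the model is expected to largely reuse,
# e.g. the current file when asking for a small edit.

original_code = 'def greet(name):\n    print("hello, " + name)\n'

payload = {
    "model": "gpt-4o-mini",
    "messages": [
        {
            "role": "user",
            "content": (
                "Rename the function greet to welcome. "
                "Output only the full updated file.\n\n" + original_code
            ),
        }
    ],
    # Most of the file should come back unchanged, so we pass the current
    # contents as the prediction; matching spans are accepted instead of
    # being generated token by token, which is where the speedup comes from.
    "prediction": {"type": "content", "content": original_code},
}
```

With the official `openai` Python client, this payload would be sent as `client.chat.completions.create(**payload)`; an API key is required to actually run the request, and mismatched prediction tokens are still billed at completion rates.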

Predicted outputs are only useful for special use cases

The feature performs best when the prediction closely aligns with what the model would respond with, but it's less effective for generating entirely new content where meaningful predictions aren't possible. OpenAI has successfully tested the feature across multiple programming languages, including Python, JavaScript, Go, and C++.

However, the feature comes with certain restrictions. It's only available with the GPT-4o and GPT-4o-mini models and doesn't work with advanced API parameters like multiple outputs or function calls. OpenAI suggests that developers should first test the feature with controlled, predictable tasks to achieve maximum efficiency.

Additional details about the feature can be found in OpenAI's documentation.

Summary
  • OpenAI introduces Predicted Outputs, a feature for the GPT-4o and GPT-4o-mini models that accelerates AI-powered text processing. Responses arrive two to four times faster than with existing models, with large file edits now taking around 20 seconds instead of 70.
  • The system allows developers to pre-populate an expected portion of the output. Saving 50 percent of output tokens reduces latency by around 50 percent. Tests in Python, JavaScript, Go and C++ have been successful.
  • The feature is particularly suitable for repetitive tasks and minor document changes, but less so for completely new content. It does not support advanced API parameters such as multiple outputs or function calls, and is only available for GPT-4o and GPT-4o-mini.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.