OpenAI reportedly developing two AI agents to automate entire work processes

Matthias Bastian

DALL-E 3 prompted by THE DECODER

OpenAI is reportedly developing two types of AI agents to automate complex tasks.

According to The Information, one type of agent would take control of a user's device to perform tasks such as transferring data between documents and spreadsheets or filling out expense reports. The agent would handle all the clicks and fill out forms automatically.

The second type of AI agent is web-centric: It is designed to perform web-based tasks such as collecting public data, creating travel itineraries, or booking airline tickets.

This fits with earlier rumors that OpenAI plans to turn ChatGPT into a "super smart personal assistant for work."

Such an assistant could have in-depth knowledge of individual employees and their workplaces. It could then handle personal assistance tasks such as drafting emails or documents in the employee's style while incorporating the latest business data.

It is unclear whether the assistant will be sold as a standalone product or as part of a broader software suite.

Step by step to general-purpose AI assistants

OpenAI recently introduced the ability to combine the capabilities of different GPTs. The feature is a step toward OpenAI's goal of turning ChatGPT into a personalized, universal assistant. The next step would be for the underlying model to learn on its own which GPT should handle which request.

The recently launched Assistants API points in a similar direction. According to OpenAI CEO Sam Altman, assistants are a first step toward full-fledged AI agents and will gain new capabilities over time.

Google CEO Sundar Pichai also recently said that Google's Bard chatbot could evolve into an assistant that performs actions for users, rather than just responding to them.

Startups such as Adept and Imbue are also building AI agents that can, for example, operate web browsers. rabbit, the most hyped startup at CES 2024, is working on an action-oriented model it calls a Large Action Model (LAM), which is designed to operate human-facing interfaces on the user's behalf.

Sources: