Jack of All Trades, Master of None - how ChatGPT will change AI anyway

According to a new study, ChatGPT is a jack of all trades, master of none. But the chatbot will change artificial intelligence forever, the researchers say.

In a new paper, a team from Wrocław University of Science and Technology in Poland shows how OpenAI's ChatGPT performs on numerous natural language processing (NLP) benchmarks.

To do this, the researchers compared the chatbot with today's best AI models on 25 different tasks. Their conclusion: ChatGPT is a "jack of all trades, master of none."

Researchers develop a custom API to send over 38,000 requests to ChatGPT

So far, ChatGPT has mainly been tested on generative tasks, i.e., tasks that require the AI model to write or summarize text or to answer questions, e.g., in a legal or medical context. The Polish team, in contrast, focuses on the chatbot's analytical capabilities, especially its semantic and pragmatic understanding.

These include typical NLP problems: relatively simple classification tasks such as detecting humor or sarcasm, more complex ones such as assessing grammatical correctness or sentiment, tasks in which ambiguous words must be assigned their correct meaning, and tasks that test reasoning.
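To make a classification task like sarcasm detection answerable by a chatbot, it has to be phrased as a prompt with a fixed set of allowed answers, and the free-form reply has to be mapped back onto those labels. The following Python sketch illustrates that idea. The label set, the prompt wording, and the functions `build_prompt` and `parse_label` are hypothetical and are not the prompts used in the paper; the optional `examples` argument simply prepends labeled examples to the prompt.

```python
from typing import Optional, Sequence

# Hypothetical label set for a binary sarcasm-detection task.
LABELS = ("sarcastic", "not sarcastic")


def build_prompt(text: str, labels: Sequence[str] = LABELS,
                 examples: Sequence[tuple[str, str]] = ()) -> str:
    """Turn one classification item into a prompt that asks for exactly one label.

    `examples` optionally holds (text, label) pairs shown before the item;
    leave it empty for a prompt without examples.
    """
    options = ", ".join(f'"{label}"' for label in labels)
    parts = [f"Classify the text as one of: {options}. Answer with the label only."]
    for example_text, example_label in examples:
        parts.append(f"Text: {example_text}\nLabel: {example_label}")
    parts.append(f"Text: {text}\nLabel:")
    return "\n\n".join(parts)


def parse_label(reply: str, labels: Sequence[str] = LABELS) -> Optional[str]:
    """Map a free-form chatbot reply back onto the label set.

    Longer labels are checked first so that "not sarcastic" is not
    mistaken for "sarcastic". Returns None if no label is found.
    """
    normalized = reply.strip().lower()
    for label in sorted(labels, key=len, reverse=True):
        if label in normalized:
            return label
    return None


print(build_prompt("Oh great, another Monday.",
                   examples=[("I love sunny days.", "not sarcastic")]))
```

For instance, `parse_label("Not sarcastic.")` returns `"not sarcastic"` because longer labels are checked first, while a reply that contains no label yields `None`, which an evaluation could count as a missing or wrong answer.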

Such tasks are relevant not only for research but also for businesses, which can use AI to automatically classify product reviews or moderate content.

For each benchmark, the team creates custom prompts that instruct ChatGPT to answer in the required format. To handle the large volume of more than 38,000 prompts, the researchers use a custom PyGPT API and up to 20 OpenAI accounts.
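The article does not describe how the requests were spread over the accounts. One simple possibility is round-robin assignment, sketched below in Python: the hypothetical function `assign_round_robin` hands prompt IDs to the accounts in turn, so 38,000 prompts over 20 accounts gives each account 1,900 prompts. The account names are placeholders, and the sketch sends nothing to OpenAI; it only shows the bookkeeping.

```python
from itertools import cycle
from typing import Iterable


def assign_round_robin(prompt_ids: Iterable[int],
                       accounts: list[str]) -> dict[str, list[int]]:
    """Spread prompts evenly over several accounts, one account after another."""
    if not accounts:
        raise ValueError("need at least one account")
    # Start every account with an empty batch so none is missing from the result.
    batches: dict[str, list[int]] = {account: [] for account in accounts}
    # zip stops as soon as the prompts run out; cycle repeats the accounts forever.
    for prompt_id, account in zip(prompt_ids, cycle(accounts)):
        batches[account].append(prompt_id)
    return batches


# Hypothetical account names: 20 accounts, 38,000 prompt IDs.
accounts = [f"account-{i:02d}" for i in range(1, 21)]
batches = assign_round_robin(range(38_000), accounts)
print(len(batches), {len(batch) for batch in batches.values()})  # prints: 20 {1900}
```

A real pipeline would then send each account's batch separately; whether the researchers split their load this way, and for what reason, is an assumption here.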

ChatGPT is not yet on the level of state-of-the-art systems

On every one of the 25 benchmarks, ChatGPT was outperformed by the best available model for that task. On average, the specialized models achieved a quality of 73.7 percent, compared with 56.6 percent for ChatGPT. ChatGPT was particularly weak on tasks involving a "very subjective problem of emotional perception and individual interpretation of the content".

When the eight emotion-related tasks are excluded, ChatGPT's average quality rises to 69.7 percent and that of the specialized models to 80 percent. In some cases, adding examples to the prompt (few-shot prompting) improves ChatGPT's quality by a few percentage points.
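The reported averages can be compared directly. The Python sketch below computes the absolute gap in percentage points and ChatGPT's score as a share of the SOTA score, using only the figures quoted above. It also back-calculates the implied average for the eight emotion-related tasks; that step assumes the averages are plain unweighted means over tasks, which the article does not state, so those numbers are illustrative and not results from the paper. The helper names `gap` and `implied_subset_mean` are hypothetical.

```python
def gap(sota: float, chatgpt: float) -> tuple[float, float]:
    """Return the gap in percentage points and ChatGPT's score as a share of SOTA."""
    return sota - chatgpt, chatgpt / sota


def implied_subset_mean(mean_all: float, n_all: int,
                        mean_rest: float, n_rest: int) -> float:
    """Average of the excluded tasks, ASSUMING both averages are unweighted means."""
    n_excluded = n_all - n_rest
    if n_excluded <= 0:
        raise ValueError("no tasks were excluded")
    return (mean_all * n_all - mean_rest * n_rest) / n_excluded


# Average quality in percent, as quoted in the article.
ALL_TASKS = 25
NON_EMOTION_TASKS = 25 - 8  # the eight emotion-related tasks removed
SOTA_ALL, CHATGPT_ALL = 73.7, 56.6
SOTA_REST, CHATGPT_REST = 80.0, 69.7

for label, sota, chatgpt in [("all tasks", SOTA_ALL, CHATGPT_ALL),
                             ("without emotion tasks", SOTA_REST, CHATGPT_REST)]:
    points, share = gap(sota, chatgpt)
    print(f"{label}: {points:.1f} points behind, {share:.1%} of SOTA")

# Illustrative back-calculation only; valid just under the unweighted-mean assumption.
for label, mean_all, mean_rest in [("ChatGPT", CHATGPT_ALL, CHATGPT_REST),
                                   ("SOTA", SOTA_ALL, SOTA_REST)]:
    emotion = implied_subset_mean(mean_all, ALL_TASKS, mean_rest, NON_EMOTION_TASKS)
    print(f"implied emotion-task average, {label}: {emotion:.1f}")
```

Under that assumption, the gap shrinks from about 17 to about 10 percentage points once the emotion tasks are removed, and ChatGPT's implied emotion-task average lies far below its average on the other tasks, consistent with the weakness the researchers describe.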


The results of 25 benchmarks show where ChatGPT's strengths and weaknesses lie. | Image: Kocoń et al.

So ChatGPT still performs below the state-of-the-art (SOTA) models, but apart from the emotion-related tasks, the gap is not very large, the researchers conclude. ChatGPT is a jack of all trades that does not truly master any task.

ChatGPT will be "life-changing" and "AI-boosting"

The researchers therefore expect ChatGPT to be used in classical NLP applications as well. They see the bot's interactivity as a particular advantage; its drawbacks are lower accuracy and the system's beta status.

ChatGPT also offers a unique self-explanation capability that makes it easier for people to understand the bot's answers. According to the paper, this is an important aspect of explainable artificial intelligence (XAI). As a result, the researchers "strongly believe that ChatGPT can accelerate the development of various AI-related technologies and profoundly change our daily lives." They expect ChatGPT and similar AI systems to advance AI research and spark an "economic and social AI revolution."

The overview shows where the team expects big changes from ChatGPT and similar AI systems. | Image: Kocoń et al.

In the future, the team plans to test ChatGPT on more reasoning benchmarks and with a variety of prompt engineering methods.
