AI in practice

ChatGPT review: Does OpenAI's text AI beat our editorial team?

Maximilian Schreiner

Midjourney prompted by THE DECODER

We summarize our articles for our readers in short bullet points. But how do we compare to an AI?

When we designed THE DECODER, our team decided to offer our readers the main points of our articles as a short summary. Before launching the website, we also tested how large language models like OpenAI's GPT-3 could do the job. Our result: The AI summaries were okay, but kept missing important points in the text.

But OpenAI has continued to improve its GPT models with human feedback in recent months. The GPT-3.5 models, as OpenAI calls them, have been available since the beginning of 2022 and also form the basis for the live test of ChatGPT.

For ChatGPT, however, the model was additionally trained on human dialog data: OpenAI hired AI trainers who wrote both sides of the conversations, their own messages as well as those of an AI assistant. These AI trainers had access to suggestions that helped them write their responses.
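
To make that setup concrete, one such human-written training dialog might look roughly like the sketch below. This is purely illustrative: the message format is my assumption, not OpenAI's actual training data.

    # Illustrative only: an assumed message format for a single training dialog.
    # The AI trainer writes both sides of the conversation.
    training_dialog = [
        {
            "role": "user",  # the trainer playing the human user
            "content": "Summarize this article about GPT-3 in a few bullet points: ...",
        },
        {
            "role": "assistant",  # the trainer also writes the "ideal" assistant reply
            "content": "- GPT-3 is a large language model from OpenAI\n"
                       "- It generates text from natural-language prompts",
        },
    ]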

Does ChatGPT write better summaries than we do?

ChatGPT has attracted an enormous amount of attention since its release, as we show in our article about some of the best ChatGPT demos. Some of these demos show impressive summaries created by ChatGPT, and AI tools like Bearly.AI or SumUp put this capability just a click away in any browser.

So it's time to give AI summaries another shot. Below are five of our article summaries and what ChatGPT makes of the same texts. For all examples, the prompt "Summarize the text in a few bullet points" was placed in front of the article text, and the first output was used.
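
For anyone who wants to reproduce this setup outside the chat window, here is a minimal sketch using the OpenAI Python client. The client, the model name, and the API route are my assumptions for illustration; the test itself ran in the free ChatGPT web interface.

    # Minimal sketch of the prompting setup: instruction first, article text after.
    # Assumptions: openai>=1.0 Python client, OPENAI_API_KEY in the environment,
    # and "gpt-3.5-turbo" as a stand-in for the GPT-3.5 model behind ChatGPT.
    from openai import OpenAI

    client = OpenAI()
    article_text = "..."  # the full article text goes here

    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{
            "role": "user",
            "content": "Summarize the text in a few bullet points\n\n" + article_text,
        }],
    )

    # As in the test, the first output is used unedited.
    print(response.choices[0].message.content)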

Deepmind's "Dramatron" can write film and theater scripts


Video training for a robot hand allows it to learn human movements


Multimodal prompting is key for better generative AI


Nvidia moves into the financial sector with a multi-year AI partnership with Deutsche Bank


ChatGPT is a GPT-3 chatbot from OpenAI that you can test now


ChatGPT summaries: Prompt engineering for "key takeaways"

ChatGPT is "too wordy"? Maybe that explains the verbosity of the summaries, which is especially evident in the article about the partnership between Nvidia and Deutsche Bank. But as I show below, my prompt simply doesn't push ChatGPT to keep it short.

Using the prompt "List three key takeaways from this text", I can significantly shorten ChatGPT's output. Here is an example from the Nvidia article:

Summary ChatGPT

  1. Deutsche Bank and Nvidia have announced a multi-year partnership to improve financial services using artificial intelligence (AI).
  2. The partnership will include the development of intelligent avatars, large language models for risk management, and fraud detection, as well as benefits to Deutsche Bank's cloud transformation.
  3. Deutsche Bank will be using Nvidia's AI Enterprise software suite and Omniverse platform to create 3D virtual avatars for internal navigation and customer service, and to accelerate computing for risk management and scenario calculations.

That's 78 words instead of 112. With "List two key takeaways from this text", it's down to 45:

Summary ChatGPT

  1. Deutsche Bank and Nvidia have announced a multi-year innovation partnership to improve financial services with AI.
  2. The partnership will focus on developing applications including intelligent avatars, large language models for risk management, and fraud detection, and will also benefit Deutsche Bank's cloud transformation.
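
If you want to check the effect of the prompt wording more systematically, a small loop over the three prompt variants will do. Again, this is a sketch under assumptions (OpenAI Python client, "gpt-3.5-turbo" as a stand-in for ChatGPT's model); the numbers above come from the ChatGPT web interface.

    # Sketch: compare how the prompt wording changes the length of the summary.
    from openai import OpenAI

    client = OpenAI()
    article_text = "..."  # the Nvidia / Deutsche Bank article text

    prompts = [
        "Summarize the text in a few bullet points",
        "List three key takeaways from this text",
        "List two key takeaways from this text",
    ]

    for prompt in prompts:
        response = client.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": f"{prompt}\n\n{article_text}"}],
        )
        summary = response.choices[0].message.content
        # Rough word count, comparable to the 112 / 78 / 45 words quoted above.
        print(f"{prompt}: {len(summary.split())} words")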

OpenAI's ChatGPT could make my job easier, but it missteps at the end

So, in my small sample, ChatGPT seems to work as a tool for our editorial work. The summaries never missed a key point - at least in this run. Whether ChatGPT or other large language models can really make our job easier - and how economical that would be - is something only a long-term test can show.

Here are the three key takeaways ChatGPT gets from this article:

  1. OpenAI's GPT-3 language model was not able to provide accurate summaries of text, but with human feedback and additional training on dialog data, the improved GPT-3.5 model, also known as ChatGPT, has shown impressive summary capabilities in demos.
  2. Carnegie Mellon University researchers have developed a system called VideoDex, which allows robots to learn real-world interactions from internet videos of human interactions, rather than just sample data.
  3. Aleph Alpha's M-Vader generative AI model is able to generate new images based on multiple images and text input, rather than just text input alone. This multimodal prompting technique improves the quality and variety of the generated images.

While those could be takeaways a human picks up when reading this article, they are not what the article is about. Too bad - I'll write the summary myself, then.