ChatGPT scores equal to or better than students in 9 out of 32 university courses


A study confirms anecdotal reports: ChatGPT can achieve academic performance comparable to that of students.

A study published in Scientific Reports compared the performance of students and ChatGPT on the same tasks. In an experiment, instructors at New York University Abu Dhabi (NYUAD) were first asked to provide ten questions from their respective lectures, along with three randomly selected student answers to each question.

The researchers then used ChatGPT to generate three different answers to each question. The questions were entered directly into ChatGPT without any additional context in the prompt.

It is not clear from the study whether GPT-3.5 or GPT-4 was used, although GPT-4 is mentioned in the references. If GPT-3.5 was used, responses generated with GPT-4 could be of much higher quality, especially when it comes to reasoning.

ChatGPT is at least on the same level in 9 of 32 subjects

After the ChatGPT responses were generated, they were mixed with the student responses and scored by three different reviewers. ChatGPT performed as well as or better than human students in nine out of 32 subjects. These nine subjects were:

Data Structures
Introduction to Public Policy
Quantitative Synthetic Biology
Cyberwarfare
Object Oriented Programming
Structure and Properties of Civil Engineering Materials
Biopsychology
Climate Change
Management and Organizations

The AI was particularly convincing in areas where extensive factual knowledge was required. In the "Introduction to Public Policy" course, ChatGPT scored on average more than twice as high as the students. On the other hand, students outperformed ChatGPT in mathematical and economic tasks that required higher cognitive skills.

Image: Ibrahim, H., Liu, F., Asim, R. et al.

AI text detectors fail

The researchers also tested whether they could reliably distinguish human from machine text using OpenAI's AI text classifier, which the company has since withdrawn due to unreliability, and GPTZero.

The OpenAI tool misclassified five percent of human text as machine text, while GPTZero misclassified 18 percent. This is a disastrous result, considering the potential consequences for the students involved, who could be falsely accused of cheating.

Conversely, the OpenAI tool identified 49 percent of machine-generated text as human, compared to 32 percent for GPTZero. In both cases, the potential for AI text to pass as human text is high.
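To make these percentages concrete, the short Python sketch below converts the reported error rates into expected counts. The rates are the ones quoted above; the sample of 100 human-written and 100 machine-generated texts is a purely hypothetical assumption chosen for easy reading, and the function names are illustrative, not taken from the study.

```python
# Error rates reported in the article, as whole percentages.
HUMAN_FLAGGED_AS_AI = {"OpenAI classifier": 5, "GPTZero": 18}
AI_PASSED_AS_HUMAN = {"OpenAI classifier": 49, "GPTZero": 32}


def expected_errors(num_texts: int, error_percent: int) -> float:
    """Expected number of misclassified texts at a given error rate."""
    return num_texts * error_percent / 100


def summarize(num_human: int, num_ai: int) -> dict:
    """Expected false accusations and undetected AI texts per tool."""
    summary = {}
    for tool in HUMAN_FLAGGED_AS_AI:
        summary[tool] = {
            "false_accusations": expected_errors(num_human, HUMAN_FLAGGED_AS_AI[tool]),
            "undetected_ai": expected_errors(num_ai, AI_PASSED_AS_HUMAN[tool]),
        }
    return summary


if __name__ == "__main__":
    # Hypothetical sample sizes (assumption, not from the study).
    for tool, counts in summarize(num_human=100, num_ai=100).items():
        print(f"{tool}: {counts['false_accusations']:.0f} students wrongly flagged, "
              f"{counts['undetected_ai']:.0f} AI texts missed")
```

Under these assumed sample sizes, GPTZero would wrongly flag 18 of 100 human-written texts, while the OpenAI tool would miss 49 of 100 machine-generated ones.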

This finding is significant in light of a survey, also part of the study, of 1,601 students and teachers in Brazil, India, Japan, the United States and the United Kingdom. In that survey, 74 percent of students said they want to use ChatGPT for their work, while 70 percent of teachers want to report such use as plagiarism if they notice it.
