
A study examines the effectiveness of GenAI text detection tools. The result: The tools should not be used in schools or universities, the researchers say.


According to the study, conducted by researchers from British University Vietnam and James Cook University Singapore, GenAI text detection tools show significant weaknesses when confronted with manipulated, machine-generated content. The study, which tested the effectiveness of six leading detectors on 805 text samples, found that the already low accuracy of the detectors dropped from an average of 39.5% to 17.4% when confronted with manipulated content, such as intentionally added spelling and grammar errors.

The results show large differences between the tools, both in how accurately they detect AI-generated texts and in their false-positive rates. Copyleaks achieved the highest accuracy on both unmanipulated and manipulated content, but it also had the highest false-positive rate of the evaluated tools, at 50%. In contrast, GPT-2 Output Detector, ZeroGPT, Turnitin, and GPT Kit had a false-positive rate of 0%, meaning they did not classify any of the human-written control samples as AI-generated; in return, they missed at least half of the AI-generated texts.

Tool                    Accuracy (unmanipulated)   Accuracy (manipulated)   False-positive rate
Copyleaks               73.9%                      58.7%                    50%
Crossplag               54.3%                      32.4%                    30%
GPT-2 Output Detector   34.7%                      17.5%                    0%
ZeroGPT                 31.3%                      17.3%                    0%
GPTZero                 26.4%                      16.7%                    10%
Turnitin                50%                        7.9%                     0%
GPT Kit                 6%                         4.5%                     0%
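To compare how hard each tool was hit by the manipulation, the table's figures can be turned into absolute and relative drops. The following is a minimal Python sketch that uses only the numbers from the table, with no additional data; the names RESULTS and accuracy_drops are illustrative and do not come from the study.

```python
# Figures copied from the table: (unmanipulated accuracy, manipulated accuracy,
# false-positive rate), all in percent. Names are illustrative, not from the study.
RESULTS = {
    "Copyleaks": (73.9, 58.7, 50.0),
    "Crossplag": (54.3, 32.4, 30.0),
    "GPT-2 Output Detector": (34.7, 17.5, 0.0),
    "ZeroGPT": (31.3, 17.3, 0.0),
    "GPTZero": (26.4, 16.7, 10.0),
    "Turnitin": (50.0, 7.9, 0.0),
    "GPT Kit": (6.0, 4.5, 0.0),
}


def accuracy_drops(results):
    """Return (tool, drop in percentage points, relative drop in %),
    sorted from the largest to the smallest absolute drop."""
    rows = []
    for tool, (clean, manipulated, _fpr) in results.items():
        drop_pp = clean - manipulated             # absolute loss in percentage points
        relative = 100.0 * drop_pp / clean        # share of the original accuracy lost
        rows.append((tool, round(drop_pp, 1), round(relative, 1)))
    return sorted(rows, key=lambda row: row[1], reverse=True)


if __name__ == "__main__":
    for tool, drop_pp, relative in accuracy_drops(RESULTS):
        print(f"{tool:<22} -{drop_pp:>4} pp  ({relative}% relative)")
```

On the table's figures, Turnitin comes out first: its accuracy falls by 42.1 percentage points, about 84% of its unmanipulated score. The relative drop is included because a tool starting from a low baseline, such as GPT Kit, can only lose a few points in absolute terms.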

The test texts were generated with language models including GPT-4, Claude 2, and Bard. With newer models such as Gemini and especially Claude 3, the problem has likely grown.


Researchers advise against AI text detection tools

Text detection tools fail as soon as small changes are made to a text, and the tools with better detection rates are also more likely to classify human-written texts as AI-generated, the study finds.
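One common way such a trade-off arises can be illustrated with a generic detector that assigns each text a score and flags it as AI-generated above a threshold. This is an assumption for illustration only: the study does not describe how the tested tools decide internally, and the scores below are invented, not taken from the study. The sketch shows that lowering the threshold catches more AI texts but also flags more human ones.

```python
def rates(ai_scores, human_scores, threshold):
    """Return (detection rate on AI texts, false-positive rate on human texts), in percent,
    for a hypothetical detector that flags every text scoring at or above the threshold."""
    detected = sum(score >= threshold for score in ai_scores)
    false_positives = sum(score >= threshold for score in human_scores)
    return 100.0 * detected / len(ai_scores), 100.0 * false_positives / len(human_scores)


# Hypothetical "AI-likelihood" scores between 0 and 1; NOT data from the study.
AI_SCORES = [0.95, 0.85, 0.7, 0.6, 0.45, 0.35, 0.3, 0.2]
HUMAN_SCORES = [0.65, 0.5, 0.4, 0.25, 0.15, 0.1, 0.05, 0.05]

if __name__ == "__main__":
    # A strict threshold misses most AI texts; a lenient one accuses more humans.
    for threshold in (0.9, 0.5, 0.3):
        detection, fpr = rates(AI_SCORES, HUMAN_SCORES, threshold)
        print(f"threshold {threshold}: detection {detection:.1f}%, false positives {fpr:.1f}%")
```

With these made-up scores, a threshold of 0.9 detects 12.5% of the AI texts with no false positives, while a threshold of 0.3 detects 87.5% but wrongly flags 37.5% of the human texts.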

Due to these accuracy limitations and especially because of the potential for false accusations, the team concludes that these tools cannot currently be recommended for uncovering violations of academic integrity.

The study also points to inequality and inclusion issues that could arise from the consensual acceptance of GenAI tools in academic publishing. Certain groups of students and researchers could be disadvantaged, for example by limited internet access, the cost of paid GenAI tools, or other access barriers such as disabilities, which could deepen 'digital poverty', the researchers warn.

The team therefore recommends promoting discussions on academic integrity, for which text detection tools could serve as an impetus. In addition, alternative assessment methods and a positive use of GenAI tools to support the learning process are needed, they say.

Join our community
Join the DECODER community on Discord, Reddit or Twitter - we can't wait to meet you.
Summary
  • A study by researchers from British University Vietnam and James Cook University Singapore shows that GenAI text detection tools have significant weaknesses, especially in detecting manipulated, AI-generated texts. The average accuracy dropped from 39.5% to 17.4% when the content was slightly altered.
  • The evaluated tools showed large differences in both detection accuracy and susceptibility to false-positive results. While Copyleaks showed the highest accuracy, it also had the highest rate of falsely classifying human-written texts as AI-generated at 50%.
  • Due to the accuracy limitations and the potential for false accusations, the research team advises against using these tools to uncover violations of academic integrity at this time. Instead, they recommend focusing on discussions about academic integrity, alternative assessment methods, and a positive use of GenAI tools to support learning.
Max is managing editor at THE DECODER. As a trained philosopher, he deals with consciousness, AI, and the question of whether machines can really think or just pretend to.