AI-generated text is flooding education systems and the Internet. Reliable detection tools would give institutions more control. Two such tools are DetectGPT and GPTZeroX.
DetectGPT is being developed by a research team at Stanford University led by Eric Mitchell. The idea is that text generated by an LLM "tends to occupy negative curvature regions of the model’s log probability function." This property occurs in many large language models, the researchers say.
Based on this observation, DetectGPT defines a curvature-based criterion: it generates slight rephrasings (perturbations) of a passage and compares the model's log probability of the original text to that of the rephrasings. If the original scores noticeably higher than its perturbed versions, the passage was likely generated by the model. Simply put, machine-generated text follows a statistical pattern that DetectGPT can identify.
Quick q:
What do we expect the log probability function to look like in the neighborhood of a model sample?
We hypothesized that a model's samples are usually in local maxima of its log probability function, or more generally, in areas of negative curvature.
Spoiler: they are!
— Eric (@_eric_mitchell_) January 27, 2023
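To make the idea concrete, here is a minimal sketch of the perturbation-discrepancy test described in the paper. It is not the team's implementation: `log_prob` is a hypothetical helper that scores a text's average log probability under the source model (one way to compute it is sketched further below), and `perturb` stands in for the T5-based mask-filling the paper uses to produce rephrasings.

```python
# Minimal sketch of DetectGPT's perturbation-discrepancy test.
# `log_prob` and `perturb` are hypothetical helpers: the paper scores
# text with the source model and rephrases it via T5 mask-filling.
import statistics
from typing import Callable

def perturbation_discrepancy(
    text: str,
    log_prob: Callable[[str], float],   # avg. token log prob under the model
    perturb: Callable[[str], str],      # produces a slight rephrasing
    n: int = 20,
) -> float:
    """Large positive values suggest `text` sits near a local maximum
    of the model's log probability, i.e. is likely model-generated."""
    original = log_prob(text)
    perturbed = [log_prob(perturb(text)) for _ in range(n)]
    # Normalize by the spread of the perturbed scores, as in the paper.
    return (original - statistics.mean(perturbed)) / statistics.stdev(perturbed)
```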
According to the research team, their method correctly identifies AI-generated text 95 percent of the time in the scenarios they tested, outperforming existing zero-shot methods. On some datasets, DetectGPT also matches or significantly beats custom detection models trained on millions of examples, writes co-author Chelsea Finn.
How does it compare to generated-vs-human classifiers trained on LOTS of data?
DetectGPT matches the performance of large supervised classifiers on some datasets and substantially outperforms them on others.
— Chelsea Finn (@chelseabfinn) January 27, 2023
To achieve this performance, DetectGPT requires neither a separate classifier, nor a comparison dataset of real and generated text, nor explicit watermarking, which OpenAI is said to have in development. A few days earlier, computer scientist Tom Goldstein of the University of Maryland presented a paper on watermarking for large language models, describing promising detection rates along with numerous open questions.
AI text recognition has many open questions
Despite its high detection rate, DetectGPT still has notable limitations. Among other things, the method requires access to the model's log probabilities. API-served models such as GPT-3 provide this data, but the evaluation costs money because the text in question must first be processed by the model. DetectGPT is also more computationally intensive than other methods, since each passage has to be scored many times alongside its perturbations.
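For models with openly available weights, these log probabilities can be computed locally. Here is a minimal sketch using GPT-2 via Hugging Face Transformers; the model choice and helper name are illustrative, not part of DetectGPT:

```python
# Sketch: average per-token log probability of a text under GPT-2.
# Assumes `pip install transformers torch`; any causal LM would do.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def avg_log_prob(text: str) -> float:
    """Average per-token log probability of `text` under the model."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        # With labels=input_ids, the model returns the mean
        # cross-entropy loss, i.e. the negative average log prob.
        out = model(**inputs, labels=inputs["input_ids"])
    return -out.loss.item()

print(avg_log_prob("The quick brown fox jumps over the lazy dog."))
```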
In addition, there are many variables in text generation, such as humans post-processing an AI text, which is perhaps the most common scenario. According to Mitchell, detection still achieves 0.9 AUROC (the metric used; 0.5 is chance level and 1.0 is perfect) when 15 percent of the AI text has been modified. As the degree of modification increases, however, accuracy steadily decreases.
Responded to you elsewhere, but re: 2), we can still get pretty good results even if we "pre-perturb" the text before detection. With ~15% of the text replaced, we still have ~0.9 AUROC in our experiments.
— Eric (@_eric_mitchell_) January 29, 2023
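For context: AUROC measures ranking quality rather than plain accuracy. It is the probability that a randomly chosen AI-generated text receives a higher detector score than a randomly chosen human-written one. A small illustration with made-up numbers:

```python
# AUROC illustrated with hypothetical detector scores (not real data).
from sklearn.metrics import roc_auc_score

labels = [1, 1, 1, 0, 0, 0]              # 1 = AI-generated, 0 = human-written
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]  # detector's confidence per text

# 0.5 would be chance level, 1.0 a perfect ranking.
print(roc_auc_score(labels, scores))  # ~0.89 for these made-up scores
```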
Another open question is whether special prompts can make LLMs deliberately generate text that evades the detectors. The DetectGPT team has not investigated this scenario.
GPTZeroX for the education system
The GPTZero team is also introducing a new product: GPTZeroX is designed for the education sector and is said to be based on newer detection models than previous versions. The models are constantly updated, and there have been some recent breakthroughs, the team writes.
GPTZeroX provides API access for bulk text processing, scores a text holistically, and can highlight individual sentences it attributes to an AI. The system outputs a probability that a text was machine-generated. A scientific evaluation of GPTZero is not yet available.
The system is based on two statistical signals said to distinguish human authors: perplexity, a measure of how unpredictable a sentence is to a language model, and burstiness, the variation of that unpredictability across all sentences of a text.
The thesis: bots tend to produce uniformly predictable sentences, while humans vary their complexity within a text (see the sketch below). GPTZero's 22-year-old creator, Edward Tian, is a computer science major and journalism minor at Princeton.
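A rough sketch of what such measures could look like. GPTZero's exact formulas are not public, so the definitions below, per-sentence perplexity under GPT-2 and burstiness as the standard deviation of those perplexities, are assumptions for illustration only:

```python
# Illustrative perplexity/burstiness measures; GPTZero's actual
# definitions and models are not public.
import math
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

def perplexity(sentence: str) -> float:
    """exp of the average token cross-entropy: low = predictable."""
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        loss = model(**inputs, labels=inputs["input_ids"]).loss
    return math.exp(loss.item())

def burstiness(sentences: list[str]) -> float:
    """Assumed definition: std. deviation of per-sentence perplexities."""
    ppls = [perplexity(s) for s in sentences]
    mean = sum(ppls) / len(ppls)
    return (sum((p - mean) ** 2 for p in ppls) / len(ppls)) ** 0.5

sentences = ["The cat sat on the mat.", "Synergistic paradigms disrupt feline inertia."]
print([round(perplexity(s), 1) for s in sentences], round(burstiness(sentences), 1))
```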
AI text detectors are a help, not a solution
Aside from the open questions about reliability mentioned above, there are other good reasons why education in particular should not consider AI text detectors as a solution to the potential problem of large-scale AI plagiarism.
The most important reason is that there are legitimate purposes for using language models in writing, such as translation and stylistic improvement. Newer tools, such as DeepL Write, are explicitly designed to rework entire paragraphs according to common style rules, helping inexperienced writers produce more readable texts.
In the future, the content of a text could be entirely conceived by a human, while the text itself is largely written by a machine. DetectGPT researcher Eric Mitchell expects his team's tool to flag texts as machine-generated once they contain more than 30 percent AI text, a threshold that is quickly reached.
The education system would therefore be well advised to prepare for a future in which AI-generated text is ubiquitous, and to use detectors only as an additional check in difficult plagiarism cases. At worst, detectors will deter students from using such tools at all, for fear of being falsely labeled plagiarists, and the potential efficiency gains of these new tools would fall by the wayside.
Sam Altman, CEO of OpenAI, predicts that AI text detectors will have a half-life of a few months before methods are available to outsmart them.