OpenAI’s AI text classifier is designed to recognize AI-written text from OpenAI models and other language models. Its reliability is still limited.
OpenAI’s classifier is supposed to distinguish between AI-written text and text written by humans, but is not “fully reliable,” according to the company. In an evaluation on a “challenge set” of English texts, the classifier correctly labeled 26 percent of the AI-written texts as “likely written by AI” (true positives), OpenAI said. It mislabeled 9 percent of the human-written texts as AI-written (false positives).
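To illustrate what these two rates mean in practice, the following minimal sketch estimates how often a “likely written by AI” flag would be correct. The function name `flag_precision` and the share of AI-written texts in the pool (`ai_share`) are assumptions made for illustration; OpenAI reports only the 26 percent and 9 percent rates.

```python
def flag_precision(tpr: float, fpr: float, ai_share: float) -> float:
    """Fraction of flagged texts that are actually AI-written (Bayes' rule).

    tpr: share of AI-written texts that get flagged (true positive rate)
    fpr: share of human-written texts that get flagged (false positive rate)
    ai_share: assumed share of AI-written texts among all texts checked
    """
    flagged_ai = tpr * ai_share
    flagged_human = fpr * (1.0 - ai_share)
    total = flagged_ai + flagged_human
    if total == 0:
        raise ValueError("no texts would be flagged under these rates")
    return flagged_ai / total


TPR = 0.26  # reported by OpenAI for the challenge set
FPR = 0.09  # reported by OpenAI for the challenge set

# The AI shares below are hypothetical scenarios, not reported figures.
for ai_share in (0.5, 0.1):
    precision = flag_precision(TPR, FPR, ai_share)
    print(f"AI share {ai_share:.0%}: {precision:.2f} of flags are correct")
```

Under these assumptions, roughly three out of four flags would be correct if half of the checked texts were AI-written, but only about one in four if a tenth were.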
Measured by area under the curve (AUC), the classifier scored 0.97 on the validation set and 0.66 on the challenge set, compared to 0.95 on the validation set and 0.43 on the challenge set for a previously published classifier.
The classifier's performance decreased as the size of the generating language model increased. In other words, particularly large language models are more likely to generate human-like text without telltale patterns that a classifier can pick up on.
The reliability of the classifier increases with the length of the text. Currently, the minimum input size is 1,000 characters, which is about 150 to 250 words. Below this limit, the classifier is “very unreliable,” according to OpenAI.
Therefore, the web demo does not allow scoring below this limit. OpenAI recommends using the classifier for English texts only.
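A check like the one the web demo applies could look roughly like the following sketch. The names `MIN_CHARS` and `long_enough` are hypothetical, and it is an assumption that the limit is inclusive and that every character, including whitespace, counts toward it.

```python
MIN_CHARS = 1000  # minimum input size reported by OpenAI


def long_enough(text: str, min_chars: int = MIN_CHARS) -> bool:
    """Return True if the text meets the minimum length for scoring.

    Assumption: the limit is inclusive and whitespace counts as characters.
    """
    return len(text) >= min_chars


sample = "word " * 150  # 750 characters, i.e. 150 very short words
print(long_enough(sample))
```

The sample shows why the word count is only a rough guide: 150 short words can still fall below 1,000 characters.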
Evaluating texts in five categories
The classifier model was trained on pairs of human and AI texts on the same topic. The human texts “may not be representative of all types of texts written by humans,” according to OpenAI. They came from a Wikipedia dataset, the WebText dataset collected in 2019, and a set of human demonstrations collected as part of InstructGPT training.
OpenAI’s AI text classifier divides the input text into five categories:
- “Very unlikely to be AI-generated” corresponds to a classifier threshold of <0.1. About 5% of human-written text and 2% of AI-generated text from the challenge set receives this label.
- “Unlikely to be AI-generated” corresponds to a classifier threshold between 0.1 and 0.45. About 15% of human-written text and 10% of AI-generated text from the challenge set receives this label.
- “Unclear if it is AI written” corresponds to a classifier threshold between 0.45 and 0.9. About 50% of human-written text and 34% of AI-generated text from the challenge set receives this label.
- “Possibly AI-generated” corresponds to a classifier threshold between 0.9 and 0.98. About 21% of human-written text and 28% of AI-generated text from the challenge set receives this label.
- “Likely AI-generated” corresponds to a classifier threshold >0.98. About 9% of human-written text and 26% of AI-generated text from the challenge set receives this label.
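The mapping from classifier score to label can be sketched as follows. The function name `label_for_score` is hypothetical, and the handling of scores that fall exactly on 0.45 or 0.9 is an assumption, since the published ranges do not say which side those boundaries belong to. This is not OpenAI's implementation.

```python
def label_for_score(score: float) -> str:
    """Map a classifier score in [0, 1] to one of the five published labels.

    Assumption: a score exactly on 0.45 or 0.9 goes to the higher band.
    The outer bands follow the published "<0.1" and ">0.98" limits.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must be between 0 and 1")
    if score < 0.1:
        return "Very unlikely to be AI-generated"
    if score < 0.45:
        return "Unlikely to be AI-generated"
    if score < 0.9:
        return "Unclear if it is AI written"
    if score <= 0.98:
        return "Possibly AI-generated"
    return "Likely AI-generated"


for s in (0.05, 0.3, 0.7, 0.95, 0.99):
    print(s, "->", label_for_score(s))
```

Note how narrow the top band is: only scores above 0.98 yield “Likely AI-generated,” which is consistent with the low 26 percent true positive rate.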
OpenAI sees itself as contributing to the dialog about AI texts rather than offering a solution for the education system
“Classifiers based on neural networks are known to be poorly calibrated outside of their training data. For inputs that are very different from text in our training set, the classifier is sometimes extremely confident in a wrong prediction,” OpenAI writes.
OpenAI specifically notes that the tool has not yet been evaluated on student essays, automated disinformation campaigns, or chat transcripts. Nor has the classifier’s performance been tested on texts co-authored by AI and humans, which is likely one of the most common use cases for AI text processing.
OpenAI also acknowledges that AI-written text can easily be edited to beat classifiers. While models could be updated with known attacks, it is unclear whether AI text recognition would provide a long-term advantage, the company said.
These limitations also apply to other recently announced classifiers, such as DetectGPT and GPTZeroX. OpenAI CEO Sam Altman has previously questioned the usefulness of AI text detectors, which he said could have a half-life of just a few months.
The education system would be wise to prepare for a future in which AI-generated text is ubiquitous and detectors are used as an additional option for difficult cases of plagiarism, as I have argued elsewhere.
The AI text classifier is provided by OpenAI free of charge on the web. An OpenAI account is required to use it.