On Tuesday, OpenAI, the company behind the viral chatbot ChatGPT, released a tool that detects whether a chunk of text was written by AI or a human being. Unfortunately, it correctly identifies AI-written text only about 1 in 4 times.
“Our classifier is not fully reliable,” the company wrote in a blog post on its website. “We’re making [it] publicly available to get feedback on whether imperfect tools like this one are useful.”
OpenAI claimed that its detection tool correctly identifies 26% of AI-written text as “likely AI-written,” and incorrectly labels human-written text as AI-written 9% of the time.
Since its release in November, ChatGPT has become wildly popular around the world for responding to all kinds of questions with seemingly intelligent answers. Last week it was reported that ChatGPT had passed the final exam for the University of Pennsylvania’s Wharton School MBA program.
The bot has raised concerns, especially among academics, who are worried about high school and college students using it to do homework and complete assignments. Recently, a 22-year-old Princeton senior became the darling of professors everywhere after he set up a website that can detect whether a piece of writing was created using ChatGPT.
OpenAI seems aware of the problem. “We are engaging with educators in the US to learn what they are seeing in their classrooms and to discuss ChatGPT’s capabilities and limitations, and we will continue to broaden our outreach as we learn,” the company wrote in its announcement.
Still, by OpenAI’s own admission and BuzzFeed News’ completely unscientific testing, no one should be relying solely on the company’s detection tool just yet, because it kind of…blows.
We asked ChatGPT to write 300 words each on Joe Biden, Kim Kardashian, and Ron DeSantis, then used OpenAI’s own tool to detect whether an AI had written the text. We got three different results: The tool said the piece about Biden was “very unlikely” to be AI-generated, said the one on Kardashian was “possibly” AI-generated, and was “unclear” about whether the ChatGPT-generated piece on DeSantis was AI-generated.
Other people who played with the detection tool noticed it was messing up pretty spectacularly too. When the Intercept’s Sam Biddle pasted in a chunk of text from the Bible, OpenAI’s tool said that it was “likely” to be AI-generated.
It also determined that the US Declaration of Independence, drafted primarily by Thomas Jefferson alongside Benjamin Franklin, John Adams, Robert Livingston, and Roger Sherman, was “possibly” AI-generated.
In addition to being bad, the tool also has a laundry list of limitations — it’s even less reliable on texts below 1,000 characters, and it only works in English. It also isn’t good at detecting AI-generated text that humans have tweaked, something that most students using tools like ChatGPT to do their homework are probably doing already.
“It appears,” Biddle wrote, “that the problem of distinguishing human writing from software-generated text remains unsolved.”