
AI has become a crucial tool in most students’ daily lives, with many basic assignments and simple projects being shaped entirely by an artificial source rather than by the student. To combat this growing epidemic, teachers have begun using AI detection tools such as turnitin.com, gptzero.me, and even Google Classroom’s newest AI detection features. However, as more and more students are wrongly flagged for supposed AI use, the accuracy of these detectors has come into question.
According to prominent AI and tech influencers on Instagram and YouTube such as Jason West, Arturri Explores, and Nathan Gotch, originality.ai, gptzero.me, gowinston.ai, and panagram.com are the most reliable AI detection tools, and they are used by many educators and employers to scan essays and résumés for any sign of AI interference. To test their accuracy, I generated an eight-sentence paragraph using the prompt: “Generate me a human-like, opinionated paragraph, breaking all normal AI patterns. Try to be as undetectable as possible.”
The results weren’t very surprising. All of the AI detection websites did their jobs, with varying degrees of accuracy. GPTZero identified the text as 100% AI, whereas Panagram and GoWinston both stated that the text was likely AI with some human modifications. Originality said the text was likely an even mix of AI and human writing. However, the detectors’ real accuracy came into question when I typed a similar paragraph myself, with a similar pattern and opinion. GPTZero and GoWinston identified the text as “completely” AI, Panagram identified it as “AI with Human Modifications,” and, yet again, Originality judged it a 50/50 split. To make matters worse, when I reloaded the websites and pasted the exact same text into the detectors, both GoWinston and Panagram reported different percentages.
“AI detecting AI is kind of dumb,” junior Kenzie Loretta said. “Yes, it will catch the AI, but what it was trained on is what ChatGPT was trained on…human work.”
This pattern holds for other AI detection tools such as Uncheck, WingZoolio, and Scribbr, further undermining the detectors’ credibility in telling human work apart from AI, especially when they treat punctuation such as the Oxford comma or em-dashes as evidence of AI.
“There are people that I know who use AI on everything, and they never get flagged. I used one em-dash, and my whole paper got flagged, even though my teacher could see my computer the whole time,” Loretta said.

Loretta had an assignment flagged for AI use despite not using AI at all. Her experience is not unique: according to an analysis by Northern Illinois University, if a 1% false positive rate is applied to the 2.235 million first-year college students in the U.S. writing 10 papers each, over 220,000 essays could be wrongly flagged (1% of 22.35 million papers is about 223,500). Studies have shown that AI detection services can produce false positives anywhere from 11% to 66% of the time, which means students can receive failing grades for papers they wrote themselves.
“AI detectors are partially accurate, but even so, I will plug a flagged essay into multiple AI detection websites just to be sure,” English teacher Teresa Huber said.
Huber has seen many essays and assignments flagged for AI, and she thinks a disconnect between how students and teachers define cheating is to blame.
“I find that most students who get flagged didn’t use AI to write their papers, but they did use AI to generate ideas, and relied on AI to the extent that they get flagged,” Huber said. “I now ask my students ‘Tell me how you used AI,’ instead of ‘Did you use AI?’”
However, not all students use AI for help with their work, and not all teachers triple-check the results of their AI detection software.
Making the issue worse, a student who is wrongly flagged once is likely to keep getting flagged, because the same writing patterns that resemble AI show up in each of their assignments. Junior Taiya Viacrucis has had essays flagged as 100% AI three times across multiple classes.
“[My teacher] was monitoring me on Dyknow, all other websites were blocked, and we got only the class period to write and print out the essay,” Viacrucis said of the most recent flag. “There is literally no way I could have used AI.”
Luckily, most of Viacrucis’ teachers have been understanding, but some students are not as fortunate. According to NBC News, several students have filed lawsuits against their universities over wrongful accusations of AI use that led to failing grades and lowered GPAs.
“In a day and age where we consume more and more AI-generated writing, the language and structure tends to stick with us,” Viacrucis said.
Worse, organizations such as the Center for Democracy and Technology have reported racial bias in AI detection, saying that Black students are 20% more likely to be accused of using AI than their White or Hispanic counterparts. This bias is not inherently built into the detection software itself; it may lie with the people grading the papers, or with whoever decides where to draw the line between human and AI writing.
“It’s completely unfair,” Loretta said. “Saying that just because some kids type at higher vocabulary levels, or use better grammar, they must be using AI makes no sense.”
The inaccuracy of these AI detection systems also has lasting consequences. Students feel they have to avoid certain words or grammatical structures simply to avoid taking any chances, which leads to writing with more errors and less expression.
“People will talk about how lazy or dumb our generation is, and then flag every above-average IQ essay as AI,” sophomore Aarna Gupta said. “It forces [students] to reconsider how we are typing and what we are typing. We almost have to dumb our writing down so that we don’t get flagged.”
Collectively, students are being forced to rethink their own words and ideas simply because AI detectors are inaccurate. It seems that, in trying to encourage creative and critical thinking, schools have accidentally set off a widespread wave of wrongful accusations that stifles students’ thinking instead. There is no surefire way to tell whether a student is using AI, but rather than relying on AI detection tools to stop AI use, the best AI detector may be a comparison between a student’s test scores and their daily assignments.
“You can use AI in the real world, but right now we are just trying to help you learn,” Huber said.
