AI vs AI: How effective are Turnitin, ZeroGPT, GPTZero, and Writer AI in detecting text generated by ChatGPT, Perplexity, and Gemini?
DOI link to article: https://doi.org/10.37074/jalt.2025.8.1.9
Abstract:
AI chatbots and large language models (LLMs) have made a significant impact in a short time. Despite their benefits, they pose serious threats to academic integrity and ethics by generating human-like text that is very hard to detect. Various AI-detection tools have been developed to tackle this issue, but their effectiveness is questionable. This study investigates the performance of four AI-detection tools (Turnitin, ZeroGPT, GPTZero, and Writer AI) in detecting AI-generated text produced by three LLMs (ChatGPT, Perplexity, and Gemini). Furthermore, three adversarial techniques (editing through Grammarly, paraphrasing through Quillbot, and 10%-20% editing by a human expert) were applied to see their effects on the performance of the AI-detection tools. Turnitin turned out to be the most accurate and consistent tool, reporting a 100% AI score even with the adversarial techniques. ZeroGPT and GPTZero also reported relatively high AI scores, especially with the original files and with the first and third adversarial techniques (Grammarly editing and human editing). Among the three adversarial techniques, paraphrasing through Quillbot affected the performance of the other three AI-detection tools (ZeroGPT, GPTZero, and Writer AI) the most. Among the three LLMs, text generated through Perplexity was detected most accurately, while Gemini-generated text showed a relatively lower AI score. Most noteworthy, in many cases different files showed different AI scores even when the text was generated through the same LLM and checked with the same AI-detection tool, further highlighting the inconsistencies among AI-detection tools.