A recent mathematical proof suggests that reliably identifying whether text has been generated by artificial intelligence (AI) models such as ChatGPT may be impossible.
The ease with which AI can produce text that resembles human writing has led to problems such as cheating and disinformation campaigns. Proposed countermeasures include embedding a hidden watermark in AI-generated text or analyzing text for patterns that are unique to AI-generated content.
However, researchers at the University of Maryland, led by Soheil Feizi, have demonstrated that these techniques may not be dependable. They used AI-based paraphrasing tools to rewrite AI-generated text, both with and without watermarks, and found that paraphrasing significantly reduced the effectiveness of the watermark.
Furthermore, as language models improve, their output becomes increasingly similar to human writing, making detection even more difficult. When the researchers fed the reworded text into several text detectors, accuracy dropped considerably, to approximately 50%, no better than random guessing.
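To make concrete why roughly 50% accuracy on a balanced test set amounts to guessing: a detector that has lost all signal behaves like a coin flip. The sketch below is illustrative only, not the researchers' evaluation code; the `detect` function is a hypothetical stand-in for a real detector or watermark checker.

```python
import random

def detect(text: str) -> bool:
    """Hypothetical detector: returns True if it judges `text` to be AI-generated.
    Stubbed here as a coin flip to show what chance-level performance looks like."""
    return random.random() < 0.5

def accuracy(samples: list[tuple[str, bool]]) -> float:
    """Fraction of (text, is_ai_generated) samples the detector labels correctly."""
    correct = sum(detect(text) == is_ai for text, is_ai in samples)
    return correct / len(samples)

# Toy balanced corpus: half human-written, half (paraphrased) AI-generated.
samples = [(f"human sample {i}", False) for i in range(500)] + \
          [(f"paraphrased AI sample {i}", True) for i in range(500)]

print(f"Detector accuracy on a balanced set: {accuracy(samples):.2f}")  # hovers around 0.50
```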
The researchers also proved a mathematical “impossibility result” showing that, as the output of AI models becomes more similar to human writing, determining whether a given text was generated by AI will become increasingly challenging.
Detectors will either falsely flag human-written text as AI-generated or fail to catch text that AI did generate. Feizi notes that even the best possible detector will not be very effective, performing little better than a random guess at distinguishing human-written from AI-generated text.
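The shape of such an impossibility result can be sketched as follows; the notation and the exact form of the bound below are assumptions for illustration, not quoted from the article. If the distribution of AI-generated text approaches the distribution of human-written text, the best achievable detection performance is squeezed down toward chance level.

```latex
% Illustrative bound (assumed notation): D is any detector, \mathcal{M} and
% \mathcal{H} are the distributions of AI-generated and human-written text,
% and TV is their total variation distance.
\[
  \mathrm{AUROC}(D) \;\le\; \frac{1}{2}
    + \mathrm{TV}(\mathcal{M},\mathcal{H})
    - \frac{\mathrm{TV}(\mathcal{M},\mathcal{H})^{2}}{2}
\]
% As the model improves, TV(M, H) -> 0 and the right-hand side -> 1/2,
% i.e. no detector can do better than a random guess.
```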
As a result, Feizi believes it may never be possible to reliably determine whether a piece of text was written by a human or generated by AI, and that we should learn to accept this. Yulan He at King’s College London suggests that rather than concentrating on AI detectors, we should focus on understanding the implications of generative AI models and how to use them beneficially.