Comment by Al-Khwarizmi

Some tools claim accuracies in the 95%-99% range, but that is useless for many real applications. In teaching, for example, you really can't afford any false positives at all; otherwise you end up failing some students because a machine wrongly flagged their work as machine-generated.
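
To put rough numbers on that, here is a back-of-the-envelope sketch. The class size, essay count and false positive rates are made-up assumptions for illustration, and a headline "accuracy" figure is not the same thing as a false positive rate, so treat the 1% and 5% values as stand-ins only.

```python
def expected_false_flags(honest_submissions: int, false_positive_rate: float) -> float:
    """Expected number of human-written submissions wrongly flagged."""
    return honest_submissions * false_positive_rate


def prob_at_least_one_false_flag(honest_submissions: int, false_positive_rate: float) -> float:
    """Chance that at least one honest submission is flagged.

    Assumes detector errors are independent across submissions.
    """
    return 1.0 - (1.0 - false_positive_rate) ** honest_submissions


# Hypothetical course: 200 students, 5 essays each, all written by humans.
submissions = 200 * 5

for fpr in (0.01, 0.05):  # assumed false positive rates, not measured ones
    flagged = expected_false_flags(submissions, fpr)
    p_any = prob_at_least_one_false_flag(submissions, fpr)
    print(f"FPR {fpr:.0%}: ~{flagged:.0f} honest essays flagged, "
          f"P(at least one) = {p_any:.4f}")
```

Under those assumptions, somewhere between 10 and 50 honest essays get flagged over the course, and it is all but certain that at least one student is wrongly accused.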

And anyway, those accuracies tend to be measured on 100% human-written vs. 100% machine-generated texts from a single LLM... good luck with texts that mix human and LLM content, mix content from several LLMs, or come from an LLM asked to "mask" the output of another.

I think detection is a lost cause.