Comment by efavdb 3 days ago

Curious about the state of things here. Can we reliably tell if a text was LLM-generated? I just heard of a prof screening assignments for this, but I'm not sure how that would work.

arendtio 2 days ago

Well, I think it depends on how much effort the 'writer' is going to invest. If the writer simply tells the LLM to write something, you can be fairly certain it can be identified. However, I am not sure it can still be identified if the 'writer' provides extensive style instructions (e.g., samples of earlier works by the same author).

Anecdotal: A few weeks ago, I came across a story on HN where many commenters immediately recognized that an LLM had written the article, and the author had actually released his prompts and iterations. So it was not a one-shot prompt but more like ten iterations, and still many people could tell that an LLM wrote it.

jvanderbot 3 days ago

Of course there are people who will sell you a tool to do this. I sincerely doubt it's any good. But then again, they can apparently fingerprint human authors fairly well using statistics from their writing, so what do I know.

  • Al-Khwarizmi 2 days ago

    There are tools that claim accuracies in the 95%-99% range. That is useless for many real applications, though. In teaching, for example, you need essentially zero false positives: otherwise you end up failing some students because a machine unfairly marked their work as machine-generated.

    And anyway, those accuracies tend to be measured on 100% human-generated vs. 100% machine-generated texts from a single LLM... good luck with texts that contain a mix of human and LLM content, a mix of content from several LLMs, or an LLM asked to "mask" the output of another.

    I think detection is a lost cause.
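The false-positive point above can be made concrete with a quick base-rate calculation. This is a sketch with made-up numbers (100 students, 10% LLM use, a hypothetical detector with a 97% true-positive rate and 3% false-positive rate), not data about any real tool:

```python
def expected_flags(n_students, llm_rate, tpr, fpr):
    """Expected detector outcomes for a class of n_students.

    llm_rate: fraction of students who actually used an LLM
    tpr: detector's true-positive rate (flags actual LLM text)
    fpr: detector's false-positive rate (flags honest work)
    """
    llm_users = n_students * llm_rate
    honest = n_students - llm_users
    true_flags = llm_users * tpr      # correctly flagged
    false_flags = honest * fpr        # honest students flagged
    precision = true_flags / (true_flags + false_flags)
    return true_flags, false_flags, precision

# Illustrative numbers only:
tp, fp, prec = expected_flags(100, 0.10, 0.97, 0.03)
# ~9.7 correct flags vs ~2.7 false flags: roughly one in five
# flagged students would be innocent, despite "97% accuracy".
```

Because most submissions are honest, even a small false-positive rate produces a large share of wrongful accusations, which is why per-detector accuracy figures understate the problem.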