Comment by iak8god 2 days ago

The main concerns in Robyn's note, as I read them, are: 1) generative AI has polluted the web with text that was not written by humans, so it is no longer feasible to produce reliable word frequency data that reflects how humans use natural language; and 2) sources of natural language text that were previously accessible to researchers are now less accessible, because the owners of that content don't want others using it to create AI models without their permission. A third concern seems to be that support for, and practice of, NLP approaches other than generative AI is vanishing.

Making resources like wordfreq more visible won't exacerbate any of these concerns.