Comment by Miraltar a year ago

I guess it would be interesting, but differentiating pollution from language evolution seems very tricky, since getting a non-polluted corpus gets harder and harder.

Retr0id a year ago

Arguably it is a form of language evolution. I bet humans have started using "delve" more too, on average. I think the best we can do is look at the trends and think about potential causes.
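
As a rough sketch of what "looking at the trends" could mean in practice: given a corpus of `(year, text)` pairs (assumed to exist; none is named in the thread), count how often a word appears per million tokens in each year. The function name `word_rate_by_year`, the crude regex tokenizer, and the per-million normalization are illustrative choices, not an established method. A rising rate only shows the trend; it does not by itself separate LLM pollution from ordinary language change.

```python
import re
from collections import Counter

def word_rate_by_year(docs, forms):
    """Occurrences per million tokens of any word in `forms`, grouped by year.

    `docs` is an iterable of (year, text) pairs; `forms` is a set of lowercase
    word forms, e.g. {"delve", "delves", "delved", "delving"}.
    """
    totals = Counter()  # tokens seen per year
    hits = Counter()    # matching tokens per year
    for year, text in docs:
        tokens = re.findall(r"[a-z']+", text.lower())  # crude tokenizer
        totals[year] += len(tokens)
        hits[year] += sum(1 for t in tokens if t in forms)
    # Normalize so years with different amounts of text are comparable.
    return {y: 1_000_000 * hits[y] / totals[y] for y in sorted(totals) if totals[y]}

# Toy, made-up input purely for illustration.
docs = [
    (2021, "We will look into it."),
    (2024, "Let us delve into the data and delve deeper."),
]
print(word_rate_by_year(docs, {"delve", "delves", "delved", "delving"}))
```

Counting inflected forms together matters for a word like "delve", since "delving" and "delves" would otherwise be missed.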

rvnx a year ago

“Seamless”, “honed”, “unparalleled”, “delve” are now polluting the landscape because of monkeys repeating what ChatGPT says without even questioning what the words mean.
Everything is “seamless” nowadays. Like I am seamlessly commenting here.
Arguably, the meaning of these words evolves due to misuse too.

- oneeyedpigeon a year ago
  
  I see a lot of writing in my day-to-day, and the words that stick out most are things like "plethora" and "utilized". They're not terribly obscure, but they're just 'odd' and, maybe, formal enough to really stick out when overused.
  
- lobsterthief a year ago
  
  Btw can’t people just open their prompts by instructing LLMs not to use those words?
  
pavel_lishin a year ago

> I bet humans have started using "delve" more too, on average.
I wish there were a way to check.

- linhns a year ago
  
  I'm seeing more and more uses of it in this thread.

wpietri a year ago

One way to tackle it would be to use LLMs to generate synthetic corpora, so you have some good fingerprints for pollution. But even there I'm not sure how doable that is, given the speed at which LLMs are being updated. Even if I know a particular page was created in, say, January 2023, I may not be able to generate something similar now to see how suspect it is, because the precise setups of that moment may no longer be available.
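
A minimal sketch of the fingerprinting idea, assuming you already have a list of LLM-generated texts and a pre-LLM reference list (the generation step is not shown). It ranks words by a smoothed log-ratio of their rate in the synthetic text versus the reference; words far more common in the synthetic set are candidate markers. The function names `tokenize` and `marker_words`, the add-alpha smoothing, and the `top_n` cutoff are hypothetical choices for illustration, not a known detector.

```python
import math
import re
from collections import Counter

def tokenize(text):
    # Crude lowercase word tokenizer; a real study would use something sturdier.
    return re.findall(r"[a-z']+", text.lower())

def marker_words(synthetic_texts, reference_texts, top_n=10, alpha=1.0):
    """Rank words by smoothed log-ratio of their rate in synthetic vs. reference text."""
    syn = Counter(t for text in synthetic_texts for t in tokenize(text))
    ref = Counter(t for text in reference_texts for t in tokenize(text))
    vocab = set(syn) | set(ref)
    # Add-alpha smoothing keeps words absent from one corpus from giving log(0).
    syn_total = sum(syn.values()) + alpha * len(vocab)
    ref_total = sum(ref.values()) + alpha * len(vocab)
    scores = {
        w: math.log((syn[w] + alpha) / syn_total) - math.log((ref[w] + alpha) / ref_total)
        for w in vocab
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]

# Toy, made-up corpora purely for illustration.
synthetic = ["we delve into a seamless experience", "delve deeper into seamless tools"]
reference = ["we look into the experience", "look deeper into tools"]
print(marker_words(synthetic, reference, top_n=3))
```

This also illustrates the version problem raised above: the marker list is only as good as the model snapshot that produced the synthetic corpus, so it would need regenerating for each model you care about.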