Comment by woah
This could be very good for scaling data while avoiding copyright claims, since the copyright argument is a lot weaker (at least to the layman) if no memorization is happening. It may even open the door to Snow Crash-like distributed training, where people feed the model continuous streams of data from their computer use or even their daily lives without worrying about PII leakage.