Comment by ryandamm
This may not be a particularly popular opinion, but current copyright laws in the US are pretty clearly in favor of training an AI as a transformative act, and covered by fair use. (I did confirm this belief in conversation with an IP attorney earlier this week, by the way, though I myself am not a lawyer.)
The best-positioned lawsuits to win, like NYTimes vs. OpenAI/MS, are actually based on violating terms of use, rather than infringing at training time.
Emitting works that violate copyright is certainly possible, but you could argue that the additional entropy that has to be passed into the model (the text prompt, or the random seed in a diffusion model) is necessary for the infringement. Regardless, the current law would suggest that the infringing action happens at inference time, not training.
I'm not claiming that copyright should work that way, merely that it does today.
> Regardless, the current law would suggest that the infringing action happens at inference time, not training.
Zuckerberg downloading a large library of pirated articles does not violate any laws? I think you can get a life sentence for merely posting links to the library.