Comment by free_bip
I think your analogy is a massive stretch. `wc` is neither generative nor capable of having market effect.
Your second construction is generative, but likely worse than a Markov chain model, which also did not have any market effect.
We're talking about the models that have convinced every VC that they can make a trillion dollars by replacing millions of creative jobs.
It's not a stretch, because I'm not claiming they're the same thing; I'm incrementally walking the tech stack to try to find where we would want to draw the line. If something has to be generative in order to be a violation, that (for all but the most insane definitions of generative) clears `wc`, but what about publishing the DVD or Blu-ray encryption keys? Most of the "hacker" communities pretty clearly believe that isn't a violation of copyright. But is it a violation of copyright to distribute that key along with software that can use it to make a copy of a DVD? If not, why not? Is it because the user has to combine the key with the software and specifically direct that software to make a copy, so that the copy is the violation of copyright but the software and key combination is not?
If the combination of the decryption key and software that can use that key to copy a DVD is not a violation of copyright, does that imply that separately distributing a model and a piece of software that can use that model is also not a copyright violation? If it is a violation, what makes it different from the key + copy software combo?
If we decide that generative is a necessary component, is the line just wherever the generative model becomes useful? That seems arbitrary and unnecessarily restrictive. Google Books is an instructive example here: a search database that scanned many thousands of copyrighted works, digitized them, and then made that material searchable by anyone, even (intentionally) displaying verbatim copies (or even images) of parts of the works in question. This is unquestionably useful to people, and it very clearly reproduces portions of copyrighted works. Should the court cases be revisited and Google Books shut down for being useful?
If market effect is the key factor, how do we square that with the fact that a number of unquestionably market-impacting things are also considered fair use? Emulators are the classic example here, and modern retro gaming OSes like Recalbox or RetroPie certainly have measurable impacts on the market for things like nostalgia-bait mini SNES and Atari consoles. And yet emulators and their OSes remain fair use. Or again, let's go back to the combination of the DVD encryption keys and something like HandBrake. Everyone knows exactly what sort of copyright infringement most people commit with those things. And there are whole businesses dedicated to making a profit off of people doing just that (just try to tell anyone with a straight face that Plex servers are only being used to connect to legitimate streaming services and stream people's digitized home movies).
My point is that AI models touch on all of these areas that we have previously carved out as fair use, and AI models are useful tools that don't (despite claims to the contrary) clearly fall afoul of copyright law. So any argument that they do needs to think about where we draw the lines and which factors go into that decision. So far the courts have found training an AI model on legally obtained materials and distributing that model to be fair use, and they've explained how they got to that conclusion. So an argument to the contrary needs to draw a different line and explain why the line belongs there.