Comment by Workaccount2 6 hours ago
I think you are confused about how LLMs train and store information. These models aren't archives of code and text; they are surprisingly small, especially relative to their training datasets.
A recent decision in the Anthropic lawsuit also reaffirms that training on copyrighted material is not a violation of copyright.[1]
However, outputting copyrighted material would still be a violation, the same as it would be for a person.
Most artists can draw a Batman symbol. Copyright means they can't monetize that ability; it doesn't mean they can't look at bat symbols.
[1]https://www.npr.org/2025/06/25/nx-s1-5445242/federal-rules-i...
No, I'm quite aware of how LLMs work. They are statistical models. They have, however, already been caught reproducing source material verbatim. There is inherently no way to stop that when the only training data for a given output is a limited set of inputs: LLMs can and do exhibit extreme overfitting.
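To make the overfitting point concrete, here's a minimal sketch with a toy character-level n-gram model (illustrative only, nothing like a production LLM): when every context window appears exactly once in training, the maximum-likelihood model regurgitates its source verbatim.

    from collections import defaultdict, Counter

    # Toy character-level n-gram model trained on a single document.
    # When every length-n context appears exactly once in training, the
    # maximum-likelihood next-character distribution is a point mass, and
    # greedy decoding regurgitates the training text verbatim.
    TEXT = "It was a bright cold day in April, and the clocks were striking thirteen."
    N = 8  # context length; long enough that every context here is unique

    def train(text, n=N):
        counts = defaultdict(Counter)
        for i in range(len(text) - n):
            counts[text[i:i + n]][text[i + n]] += 1
        return counts

    def generate(counts, seed, max_len, n=N):
        out = seed
        while len(out) < max_len:
            ctx = out[-n:]
            if ctx not in counts:
                break  # no continuation seen in training
            out += counts[ctx].most_common(1)[0][0]  # greedy decoding
        return out

    model = train(TEXT)
    print(generate(model, TEXT[:N], len(TEXT)) == TEXT)  # True: verbatim recall

Real LLMs are vastly bigger and trained on vastly more data, but the failure mode is the same in kind: wherever the training data for some context is effectively one source, the model can reproduce that source.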
As for the Anthropic lawsuit, the piracy part of the case is continuing. Most models are built with pirated or unlicensed inputs. The part that was decided, although the decision imo was wrong, only covers whether someone CAN train a model.
At no point have I claimed you can't train one. The question is whether you can distribute one, and then use one. An LLM is not simplistic enough to be considered a phonebook, so they can't just handwave that away.
Saying an LLM can do that is like saying an artist can make a JPEG of a Batman symbol, and that it's totally okay for them to distribute it because the JPEG artifacts are transformative. LLMs are ultimately just a clever way of compressing data, and compressors are not transformative under the law; but possessing a compressor is not inherently illegal, nor is using one on copyrighted material for your own personal use.
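The compression framing isn't just a metaphor; it's the standard prediction-compression equivalence: a model that assigns probability p to the next symbol needs about -log2(p) bits to encode it (achievable with arithmetic coding). A toy sketch with an adaptive character-level model (again illustrative only, not how any real LLM works):

    import math
    from collections import defaultdict, Counter

    # Any probabilistic next-symbol model defines a compressor: each symbol
    # costs -log2(p) bits under arithmetic coding, so the total predicted
    # bits is the compressed size. Better prediction == better compression.
    def bits_under_model(text, n=3, alpha=0.5):
        counts = defaultdict(Counter)
        total_bits = 0.0
        vocab = sorted(set(text))
        for i in range(len(text)):
            ctx = text[max(0, i - n):i]
            c = counts[ctx]
            # Additive (Laplace-style) smoothing keeps unseen symbols codable.
            p = (c[text[i]] + alpha) / (sum(c.values()) + alpha * len(vocab))
            total_bits += -math.log2(p)
            c[text[i]] += 1  # adaptive: update the model as we code
        return total_bits

    text = "the quick brown fox jumps over the lazy dog. " * 20
    raw_bits = 8 * len(text)  # 8 bits/char baseline
    model_bits = bits_under_model(text)
    print(f"raw: {raw_bits} bits, model-coded: {model_bits:.0f} bits")

The same mechanism runs in reverse: a model that predicts a specific text well enough can reconstruct it, which is why memorization and compression are two faces of the same property.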