Comment by Lerc
Comment by Lerc 6 months ago
Based upon legal decisions in the past there is a clear argument that the distinction for fair use is whether a work is substantially different to another. You are allowed to write a book containg information you learned about from another book. There is threshold in academia regarding plagiarism that stands apart from the legal standing. The measure that was used in Gyles v Wilcox was if the new work could substitute for the old. Lord Hardwicke had the wisdom to defer to experts in the field as to what the standard should be for accepting something as meaningfully changed.
Recent decisions such as Andy Warhol Foundation for the Visual Arts, Inc. v. Goldsmith have walked a fine line with this. I feel like the supreme court got this one wrong because the work is far more notable as a Warhol than as a copy of a photograph, perhaps that substitution rule should be a two way street. If the original work cannot substitute for the copy, then clearly the copy must be transformative.
LLMs generating works verbatim might be an infringement of copyright (probably not), distributing those verbatim works without a licence certainly would be. In either case, it is probably considered a failure of the model, Open AI have certainly said that such reproductions shouldn't happen and they consider it a failure mode when it does. I haven't seen similar statements from other model producers, but it would not surprise me if this were the standard sentiment.
Humans looking at works and producing things in a similar style is allowed, indeed this is precisely what art movements are. The same transformative threshold applies. If you draw a cartoon mouse, that's ok, but if people look at it and go "It's Mickey mouse" then it's not. If it's Mickey to tiki Tu meke, it clearly is Mickey but it is also clearly transformative.
Models themselves are very clearly transformative. Copyright itself was conceived at a time when generated content was not considered possible so the notion of the output of a transformative work being a non transformative derivative of something else was never legally evaluated.
I think you may have something with that line of reasoning.
The threshold for transformative for fictional works is fairly high unfortunately. Fan fiction and reasonably distinct works with excessive inspiration are both copyright infringing. https://en.wikipedia.org/wiki/Tanya_Grotter
> Models themselves are very clearly transformative.
A near word for word copy of large sections of a work seems nowhere near that threshold. An MP3 isn’t even close to a 1:1 copy of a piece of music but the inherent differences are irrelevant, a neural network containing and allowing the extraction of information looks a lot like lossy compression.
Models could easily be transformative, but the justification needs to go beyond well obviously they are.