Comment by TGower 16 hours ago

People aren't buying Harry Potter action figures as a substitute for buying the book either, but copyright protects creators from other people swooping in and using their work in other mediums. There is obviously huge market demand for high-quality data for training LLMs: Meta just spent $15 billion on a data labeling company. Companies that train LLMs on copyrighted material without permission are doing so as a substitute for obtaining a license from the creator, in the same way that a pirate downloading a torrent is substituting for buying an ebook license.

ritz_labringue 14 hours ago

Harry Potter action figures trade almost entirely on J. K. Rowling’s expressive choices. Every unlicensed toy competes head‑to‑head with the licensed one and slices off a share of a finite pot of fandom spending. Copyright law treats that as classic market substitution and rightfully lets the author police it.

Dropping the novels into a machine‑learning corpus is a fundamentally different act. The text is not being resold, and the resulting model is not advertised as “official Harry Potter.” The books are just statistical nutrition, one ingredient among millions, much like the wide reading a human writer does before producing new work. No consumer is choosing between “Rowling’s novel” and “the tokens her novel contributed to an LLM,” so there’s no comparable displacement of demand.

In economic terms, the merch market is rivalrous and zero‑sum; the training market is non‑rivalrous and produces no direct substitute good. That asymmetry is why copyright doctrine (and fair‑use case law) treats toy knock‑offs and corpus building very differently.