Comment by aprilthird2021

Comment by aprilthird2021 16 hours ago

5 replies

Did you read the article? This exact point is made and then analyzed.

> Or maybe Meta added third-party sources—such as online Harry Potter fan forums, consumer book reviews, or student book reports—that included quotes from Harry Potter and other popular books.

> “If it were citations and quotations, you'd expect it to concentrate around a few popular things that everyone quotes or talks about,” Lemley said. The fact that Llama 3 memorized almost half the book suggests that the entire text was well represented in the training data.

gpm 16 hours ago

The article fails to mention or understand the volume of content here. Every, literally every, part of these books is quoted and "talked about" (in the sense of used in unlicensed derivative works).

And yes, I read the article before commenting. I don't appreciate the baseless insinuation to the contrary.

  • 1123581321 15 hours ago

    Agreed. It’s an obtuse quote by Lemley who can’t picture the enormous quantity of associations and crawled data, or at least wants to minimize the quantity. It’s hardly discussion-ending.

    Accusations of not reading the article are fair when someone brings up a “related” anecdote that was in the article. It’s not fair when someone is just disagreeing.

  • davidcbc 16 hours ago

    Even assuming you are correct, which I'm skeptical of, does this make it better?

    It's essentially the same thing: they are copying from a source that is violating copyright, whether that's a pirated book directly or a pirated book via fanfiction.

    • gpm 15 hours ago

      Generally I think it matters a great deal to get the facts right when discussing something with nuance.

      Is this specific fact required to make my beliefs consistent... Yes I think it is, but if you disagree with me in other ways it might not be important to your beliefs.

      Legally (note: not a lawyer) I'm generally of the opinion that

      A) Torrenting these books was probably copyright infringement on Meta's part. They should have done so legally by scanning lawfully acquired copies like Google did with Google Books.

      B) Everything else here that Meta did falls under the fair use and de minimis exceptions to copyright's prohibition on copying copyrighted works without a license.

      And if it were copying significant amounts of a work that appeared only once in its training set into the model, the de minimis argument would fall apart.

      Morally I'm of the opinion that copyright law's prohibition on deeply interacting with our cultural artifacts by creating derivative works is incredibly unfair and bad for society. This extends to a belief that the communities that do this should not be excluded from technological developments because their entire existence is unjustly outlawed.

      Incidentally I don't believe that browsing a site that complies with the DMCA and viewing what it lawfully serves you constitutes piracy, so I can't agree with your characterization of events either. The fanfiction was not pirated just because it was likely unlawful to produce in the US.
