Comment by paxys 16 hours ago

That may be relevant in the NYT vs OpenAI case, since NYT was supposedly able to reproduce entire articles in ChatGPT. Here Llama is predicting one sentence at a time when fed the previous one, with 50% accuracy, for 42% of the book. That can easily be written off as fair use.

gpm 15 hours ago

I'm pretty sure books.google.com does the exact same with much better reliability... and the US courts found that to be fair use. (Agreeing with parent comment)

  • pclmulqdq 15 hours ago

    If there is a circuit split between it and NYT vs OAI, the Google Books ruling (in the famously tech-friendly Ninth Circuit) may also find itself under review.

gamblor956 14 hours ago

> That can easily be written off as fair use.

No, it really couldn't. In fact, it's very persuasive evidence that Llama is straight up violating copyright.

It would be one thing to be able to "predict" a paragraph or two. It's another thing entirely to be able to predict 42% of a book that is several hundred pages long.

  • reedciccio 14 hours ago

    Is it Llama violating the "copyright" or is it the researcher pushing it to do so?

    • lern_too_spel 12 hours ago

      If you distribute a zip file of the book, are you violating copyright, or is it the person who unzips it?

      • TeMPOraL 8 hours ago

        If you walk through the N-gram database with a copy of Harry Potter in hand and observe that for N=7, you can find any piece of it in the database with above-average frequency, does that mean the N-gram database is violating copyright?

      • gamblor956 4 hours ago

        You are.

        Copyright is quite literally about the right to control the creation and distribution of copies.

        The creation of the unzipped file is not treated as a separate copy so the recipient would not be violating copyright just by unzipping the file you provided.

echelon 15 hours ago

> Here Llama is predicting one sentence at a time when fed the previous one, with 50% accuracy, for 42% of the book. That can easily be written off as fair use.

Is that fair use, or is that compression of the verbatim source?

  • TeMPOraL 3 hours ago

    It doesn't let you recover the text without knowing it in advance, so no.

    In particular, you can't iterate it sentence by sentence; you're unlikely to get past sentence 2 this way before it starts giving you back its own ideas.

    The whole thing is a sleight of hand, basically. There's 42% of the book there, in tiny pieces, which you can only identify if you know what you're looking for. The model itself does not.
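To make the distinction above concrete, here is a toy sketch of the two ways of measuring "how much of the book is in there". It is only an illustration, not the method used in the study being discussed. The names `toy_predict`, `memorized`, and `book` are hypothetical stand-ins; a real test would call an actual model and use a softer match than exact string equality.

```python
from typing import Callable, List


def teacher_forced_hits(sentences: List[str], predict: Callable[[str], str]) -> float:
    """Fraction of sentences predicted correctly when the predictor
    is always fed the TRUE previous sentence (you need the book in hand)."""
    pairs = list(zip(sentences, sentences[1:]))
    hits = sum(predict(prev) == nxt for prev, nxt in pairs)
    return hits / len(pairs)


def free_running_prefix(sentences: List[str], predict: Callable[[str], str]) -> int:
    """Number of sentences recovered when the predictor is fed its OWN
    previous output, starting from the first sentence only."""
    current = sentences[0]
    recovered = 0
    for expected in sentences[1:]:
        current = predict(current)
        if current != expected:
            break  # first miss: everything after this is the model's own text
        recovered += 1
    return recovered


# Hypothetical toy data: a 7-"sentence" book and a predictor that
# has memorized only some of the transitions.
book = ["s0", "s1", "s2", "s3", "s4", "s5", "s6"]
memorized = {"s0": "s1", "s2": "s3", "s3": "s4", "s5": "s6"}


def toy_predict(prev: str) -> str:
    # Assumption: an unmemorized prompt yields unrelated text.
    return memorized.get(prev, "<something else>")


coverage = teacher_forced_hits(book, toy_predict)
chain_length = free_running_prefix(book, toy_predict)
print(f"teacher-forced coverage: {coverage:.0%}")
print(f"sentences recovered when iterating: {chain_length}")
```

In this toy setup the teacher-forced number looks high (4 of 6 transitions), while iterating from the first sentence recovers a single sentence before going off the rails. That gap is the point being argued: a high per-sentence hit rate does not by itself mean the text can be extracted without already having it.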