Comment by paxys 16 hours ago

That may be relevant in the NYT vs OpenAI case, since NYT was supposedly able to reproduce entire articles in ChatGPT. Here Llama is predicting one sentence at a time when fed the previous one, with 50% accuracy, for 42% of the book. That can easily be written off as fair use.

gpm 15 hours ago

I'm pretty sure books.google.com does the exact same with much better reliability... and the US courts found that to be fair use. (Agreeing with parent comment)

pclmulqdq 15 hours ago

If there is a circuit split between it and NYT vs OAI, the Google Books ruling (in the famously tech-friendly ninth circuit) may also find itself under review.

gamblor956 14 hours ago

> That can easily be written off as fair use.

No, it really couldn't. In fact, it's very persuasive evidence that Llama is straight up violating copyright.

It would be one thing to be able to "predict" a paragraph or two. It's another thing entirely to be able to predict 42% of a book that is several hundred pages long.

reedciccio 14 hours ago

Is it Llama violating the "copyright" or is it the researcher pushing it to do so?

  lern_too_spel 12 hours ago
  
  If you distribute a zip file of the book, are you violating copyright, or is it the person who unzips it?
  
  TeMPOraL 8 hours ago
  
  If you walk through the N-gram database with a copy of Harry Potter in hand and observe that for N=7, you can find any piece of it in the database with above-average frequency, does that mean the N-gram database is violating copyright?
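
To make the N-gram lookup concrete, here is a minimal sketch of the check described above: slide an N-word window over a book you already have and count how many windows also occur in a corpus. The function names and the toy strings are made up for illustration, and the sketch only tests presence in the corpus, not the "above-average frequency" part.

```python
def ngrams(words, n):
    """Return the set of all n-word windows in a list of words."""
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def coverage(book_text, corpus_text, n=7):
    """Fraction of the book's distinct n-grams that also appear in the corpus."""
    book = ngrams(book_text.lower().split(), n)
    corpus = ngrams(corpus_text.lower().split(), n)
    if not book:
        # The book is shorter than n words, so there is nothing to look up.
        return 0.0
    return len(book & corpus) / len(book)


# Toy strings (assumption: stand-ins for a real book and a real N-gram database).
corpus = "the boy who lived under the stairs had never seen an owl before that day"
book = "the boy who lived under the stairs had never seen an owl"

print(coverage(book, corpus, n=7))
```

Note that the lookup only works because the caller supplies the book: the database answers "is this window present?" and nothing more.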
  
  gamblor956 4 hours ago
  
  You are.
  Copyright is quite literally about the right to control the creation and distribution of copies.
  The creation of the unzipped file is not treated as a separate copy, so the recipient would not be violating copyright just by unzipping the file you provided.

echelon 15 hours ago

> Here Llama is predicting one sentence at a time when fed the previous one, with 50% accuracy, for 42% of the book. That can easily be written off as fair use.

Is that fair use, or is that compression of the verbatim source?

TeMPOraL 3 hours ago

It doesn't let you recover the text without knowing it in advance, so no.
In particular, you can't iterate it sentence by sentence; you're unlikely to get past sentence 2 this way before it starts giving you back its own ideas.
The whole thing is a sleight of hand, basically. There's 42% of the book there, in tiny pieces, which you can only identify if you know what you're looking for. The model itself does not.
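
A rough back-of-the-envelope for the iteration point, taking the 50% per-sentence figure quoted upthread and assuming (this is my assumption, not something the study states) that each sentence is an independent coin flip. The function name is just for illustration.

```python
def chain_probability(per_sentence_accuracy, k):
    """Chance of getting k sentences in a row right, assuming independent tries."""
    return per_sentence_accuracy ** k


# 0.5 is the per-sentence accuracy mentioned upthread; independence is assumed.
for k in range(1, 6):
    print(k, chain_probability(0.5, k))
```

Under those assumptions, two sentences in a row come out right a quarter of the time, and five in a row about 3% of the time, so free-running the model from the first sentence falls off the text quickly.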