Comment by redox99 18 hours ago

Books3 was used in Llama1. We don't know if they used it later on.

My comparison was illustrative, an analogy. The copyright cartel is making a fruit-of-the-poisonous-tree type of argument. Whatever Meta is doing with LLMs now does the heavy lifting that parity files did back in the Usenet days. I wouldn't be surprised if BitTorrent or similar caching and distribution mechanisms started incorporating AI/LLMs to recognize an owl on the wire, draw the rest of the owl just in time in transit, and send only the diffs, or something like that.

The pictures are the same. All roads lead to Rome, so they say.

aprilthird2021 17 hours ago

All of the major AI models these days use "clean" datasets stripped of copyrighted material.

They also use data from the previous models, so I'm not sure how "clean" it really is.

dragonwriter 17 hours ago

> All of the major AI models these days use "clean" datasets stripped of copyrighted material.

Which of the major commercial models discloses its dataset? Or are you just trusting some unfalsifiable, self-serving PR characterization?

pclmulqdq 17 hours ago

All written text is copyrighted, with few exceptions like court transcripts. I own the copyright to this inane comment. I sincerely doubt that all copyrighted material is scrubbed.

- Tepix 17 hours ago
  
  Your brief comment is hardly copyrightable, which makes your point moot.
  