Comment by aspenmayer

Comment by aspenmayer a day ago


https://archive.is/OSQt6

If you've seen as many magnet links as I have, and your subconscious is similarly primed by the knowledge that Meta used torrents to download/leech (and possibly upload/seed) the dataset(s) used to train its LLMs, you might scroll down to the first picture in this article, a figure from the source paper, and find its resemblance to a common visual representation of torrent block download status uncanny.

Can't unsee it. For comparison (note the circled part):

https://superuser.com/questions/366212/what-do-all-these-dow...

Previously, related:

Extracting memorized pieces of books from open-weight language models - https://news.ycombinator.com/item?id=44108926 - May 2025