Comment by aschobel
Indeed! It is a form of massive lossy compression.
> Llama 3 70B was trained on 15 trillion tokens
That's roughly a 200x "compression" ratio, compared to 3-7x for traditional lossless text compression like bzip and friends.
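For concreteness, here's one plausible back-of-envelope for that figure (the ~4 bytes of raw text per token and fp32 weight storage are my assumptions, not something stated above):

```python
# Rough sketch of the ~200x figure.
# Assumptions: ~4 bytes of raw text per token, weights stored as fp32.
tokens = 15e12            # Llama 3 reported training tokens
bytes_per_token = 4       # rough average for English text
params = 70e9             # Llama 3 70B parameter count
bytes_per_param = 4       # fp32; halve this for bf16/fp16

corpus_bytes = tokens * bytes_per_token   # ~60 TB of raw text
model_bytes = params * bytes_per_param    # ~280 GB of weights

print(f"~{corpus_bytes / model_bytes:.0f}x")  # ~214x
```

With bf16 weights the ratio roughly doubles, so "roughly 200x" is the conservative end of the estimate.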
LLMs don't just compress; they generalize. If they could only recite Harry Potter perfectly but couldn't write code or explain math, they wouldn't be very useful.
But LLMs can't write code or explain math; they only plagiarize existing code and existing explanations of math.