Workaccount2 19 hours ago

>It is absolutely true, and AI cannot think, reason, comprehend anything it has not seen before. If you're getting answers, it has seen it elsewhere, or it is literally dumb, statistical luck.

How would you reconcile this with the fact that SOTA models are only a few TB in size? They're trained on exabytes of data, yet the final weights fit in a few TB.
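For scale, a quick back-of-envelope (the parameter count and precision are illustrative assumptions, not figures for any named model):

```python
# Weights of a hypothetical ~1T-parameter model stored in fp16.
params = 1e12
bytes_per_param = 2            # fp16/bf16

model_tb = params * bytes_per_param / 1e12
print(f"model size ~= {model_tb:.0f} TB")                         # ~= 2 TB

claimed_data_tb = 1e6          # 1 exabyte = 1,000,000 TB
print(f"data:model ratio ~= {claimed_data_tb / model_tb:.0f}:1")  # ~= 500000:1
```

Whatever the exact data figure turns out to be, the weights are a tiny fraction of what went in.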

Correct answers couldn't be dumb luck either; if they were, the models would pretty much only hallucinate (the space of wrong answers is many orders of magnitude larger than the space of correct answers), much like the early proto-GPT models did.
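To put numbers on that, here is a toy "dumb luck" sampler drawing tokens uniformly at random (the vocabulary size and answer length are assumed, but they're typical orders of magnitude):

```python
import math

vocab_size = 50_000            # assumed tokenizer vocabulary size
answer_len = 20                # tokens in a short factual answer

# Probability that a uniform random sampler emits one specific
# 20-token answer string:
log10_p = -answer_len * math.log10(vocab_size)
print(f"P(exact answer by luck) ~= 10^{log10_p:.0f}")   # ~= 10^-94
```

Real models are nothing like uniform samplers, which is the point: consistently landing on correct answers can't be chance.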

efavdb 18 hours ago

Could it be that there is a lot of redundancy in the training data?

daveguy 17 hours ago

> How would you reconcile this with the fact that SOTA models are only a few TB in size? Trained on exabytes of data, yet only a few TB in the end.

This is false. You are off by ~4 orders of magnitude in claiming these models are trained on exabytes of data; it is closer to 500 TB of more carefully curated data, at most. Contrary to popular belief, LLMs are not trained on "all of the data on the internet". I responded to another one of your posts making this same false claim here:

https://news.ycombinator.com/item?id=44283713
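The arithmetic behind "~4 orders of magnitude", reading "exabytes" as a few EB (the exact multiple is an assumption):

```python
import math

curated_tb = 500               # the ~500 TB upper bound above
one_eb_tb = 1e6                # 1 EB = 1,000,000 TB
claimed_eb = 5                 # assumed reading of "exabytes" (plural)

ratio = claimed_eb * one_eb_tb / curated_tb
print(f"overstatement ~= 10^{math.log10(ratio):.1f}")   # ~= 10^4.0
```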