Comment by godelski

I think a lot of people are really bad at evaluating world models. Fei-Fei is right here that they are multimodal, but really they must codify a physics. I don't mean "physics" but "a physics". I also think it's naïve to think this can be done through data alone. I mean, just ask a physicist...[0].

But the reason people are really bad at evaluating them is that the details dominate. What matters here is consistency. We need invariance to some things and equivariance to others. As evaluators we tend to be hopeful, so the subtle changes frame to frame are overlooked, though that's kinda the most important part. It can't just be similar to the last frame; it needs to be exactly the same. You need equivariance to translation, yet that's still not happening in any of these models (and it's not a limitation of attention or transformers). You're just going to have a really hard time getting all this data, even though by doing that you'll look like you're progressing because you're fitting it better. But in the end the models will need to create some compact formulation representing concepts such as motion. Or in other words, a physics. And it's not like physicists aren't known for being detail oriented and nitpicky over nuances. That is bred into them with good reason.
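A toy illustration of the invariance vs. equivariance distinction, as a minimal sketch only. It assumes a 1-D signal stands in for a frame and a circular shift stands in for translation. Every function name here is hypothetical, and none of it reflects how any actual world model is built or evaluated. Equivariant means "shift the input and the output shifts the same way"; invariant means "shift the input and the output doesn't change at all".

```python
import numpy as np

def shift(x, k):
    """Translate a 1-D signal by k samples, wrapping around at the edges."""
    return np.roll(x, k)

def circular_conv(x, kernel):
    """Circular convolution via FFT (kernel is zero-padded to len(x))."""
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(kernel, n=len(x))))

def total_mass(x):
    """A global sum, which should not care where things are."""
    return float(np.sum(x))

def equivariance_error(f, x, k):
    """Largest gap between f(shift(x)) and shift(f(x)); ~0 means equivariant."""
    return float(np.max(np.abs(f(shift(x, k)) - shift(f(x), k))))

rng = np.random.default_rng(0)
frame = rng.normal(size=64)              # stand-in for one "frame"
kernel = np.array([0.25, 0.5, 0.25])     # arbitrary smoothing kernel

conv = lambda x: circular_conv(x, kernel)
# Hypothetical counterexample: weighting by absolute position breaks the symmetry.
position_dependent = lambda x: x * np.arange(len(x))

print("conv equivariance error:", equivariance_error(conv, frame, 5))
print("position-dependent error:", equivariance_error(position_dependent, frame, 5))
print("invariance gap of total_mass:",
      abs(total_mass(shift(frame, 5)) - total_mass(frame)))
```

The first and third numbers should sit at floating-point noise, while the second is clearly nonzero. The point of a check like this is that "looks about right" doesn't count: the error either vanishes up to numerical precision or the symmetry isn't there.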

[0] https://m.youtube.com/watch?v=hV41QEKiMlM

ontouchstart 6 hours ago

The YouTube video tells a fascinating story. Who would be our Fermi today, someone who can tell the truth and save five years of work, billions of dollars, and the careers of Ph.D. students?

We wouldn’t expect an LLM to review a paper and tell us the truth like Fermi did. That is super-intelligence.

Thanks for sharing.


https://www.webofstories.com/play/freeman.dyson/94

This link has a transcript.