Comment by montebicyclelo 3 days ago

Reminds me of this [1] HN post from 9 months ago, where the author trained a neural network to do world emulation from video recordings of their local park — you can walk around in their interactive demo [2].

I don't have access to the DeepMind demo, but from the video it looks like it takes the idea up a notch.

(I don't know the exact lineage of these ideas, but a general observation is that it's a shame that it's the norm for blog posts / indie demos to not get cited.)

[1] https://news.ycombinator.com/item?id=43798757

[2] https://madebyoll.in/posts/world_emulation_via_dnn/demo/

ollin 3 days ago

Yup, similar concepts! Just at two opposite extremes of the compute/scaling spectrum.

- That forest trail world is ~5 million parameters, trained on 15 minutes of video, scoped to run on a five-year-old iPhone through a twenty-year-old API (WebGL GPGPU, i.e. OpenGL fragment shaders). It's the smallest '3D' world model I'm aware of.

- Genie 3 is (most likely) ~100 billion parameters trained on millions of hours of video and running across multiple TPUs. I would be shocked if it's not the largest-scale world model available to the public.
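To put the two ends of that spectrum side by side, here is a rough back-of-the-envelope sketch. The parameter counts are the approximate figures quoted above (the Genie 3 number is a guess, not a published spec), and the 2 bytes per parameter is purely an assumption for illustration; neither model's actual weight format is stated in this thread.

```python
# Rough scale comparison using the approximate figures from the comment above.
# ASSUMPTION: 16-bit weights (2 bytes per parameter); the real formats are unknown.
BYTES_PER_PARAM = 2

models = {
    "forest trail world": 5e6,        # ~5 million parameters
    "Genie 3 (thread's guess)": 100e9,  # ~100 billion parameters, unconfirmed
}

def weight_megabytes(params, bytes_per_param=BYTES_PER_PARAM):
    """Approximate size of the weights alone, in megabytes."""
    return params * bytes_per_param / 1e6

for name, params in models.items():
    print(f"{name}: ~{weight_megabytes(params):,.0f} MB of weights")

ratio = models["Genie 3 (thread's guess)"] / models["forest trail world"]
print(f"parameter ratio: ~{ratio:,.0f}x")
```

Under those assumptions the small model's weights fit in roughly 10 MB, while the large one would need on the order of 200 GB, which is consistent with one running in a phone browser and the other needing multiple accelerators.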

There are lots of neat intermediate-scale world models being developed as well (e.g. LingBot-World https://github.com/robbyant/lingbot-world, Waypoint 1 https://huggingface.co/blog/waypoint-1) so I expect we'll be able to play something of Genie quality locally on gaming GPUs within a year or two.

danielwmayer 2 days ago

When I looked down at just the boardwalk, I was immediately struck by how similar it felt to being on LSD. I am continually astounded by how similar these systems end up seeming to how our brain works. It may just be a happy coincidence, but I am pretty sold on there being true parallels that will only become more and more apparent.

  • ollin 2 days ago

    A lot of people mentioned this! The "dreamlike" comparison is common as well. In both cases, you have a network of neurons rendering an image approximating the real world :) so it sort of makes sense.

    Regarding the specific boiling-textures effect: there's a tradeoff in recurrent world models between jittering (constantly regenerating fine details to avoid accumulating error) and drifting (propagating fine details as-is, even when that leads to accumulating error and a simplified/oversaturated/implausible result). The forest trail world is tuned way towards jittering (you can pause with `p` and step frame-by-frame with `.` to see this). So if the effect resembles LSD, it's possible that LSD applies some similar random jitter/perturbation to the neurons within your visual cortex.
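The jitter/drift tradeoff described above can be illustrated with a toy sketch. This is not the forest trail model or any real world model: it tracks a single made-up "fine detail" value, and the `regen`, `step_error`, and `noise` parameters are hypothetical knobs invented for the illustration. Carrying the value forward adds a small systematic error every frame (drift), while regenerating it gives an unbiased but noisy value every frame (jitter).

```python
import random

def simulate(regen, steps=200, step_error=0.02, noise=0.3, seed=0):
    """Roll a toy 1-D 'fine detail' value forward through a recurrent loop.

    regen=0.0 -> pure drifting: carry the previous value forward as-is.
    regen=1.0 -> pure jittering: resample the detail from scratch each frame.
    Returns (final error vs. truth, mean frame-to-frame change).
    """
    rng = random.Random(seed)
    truth = 0.0
    x = truth
    flicker = 0.0
    for _ in range(steps):
        carried = x + step_error               # propagated detail picks up a small error
        fresh = truth + rng.gauss(0.0, noise)  # regenerated detail: unbiased but noisy
        new_x = (1.0 - regen) * carried + regen * fresh
        flicker += abs(new_x - x)              # how much the detail changed this frame
        x = new_x
    return abs(x - truth), flicker / steps

for regen in (0.0, 0.1, 1.0):
    drift, flicker = simulate(regen)
    print(f"regen={regen:.1f}  final error={drift:.3f}  mean flicker={flicker:.3f}")
```

With `regen=0.0` the error grows steadily while each frame barely changes; with `regen=1.0` the error stays bounded but the value "boils" from frame to frame. Intermediate settings trade one off against the other.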