Comment by LarsDu88
Yeah, which is pretty slow due to the need to autoregressively generate each image frame token in sequence. And leading diffusion models need to progressively denoise each frame. These are very expensive computationally. Generating the entire world using current techniques is incredibly expensive compared to rendering and rasterizing triangles, which is almost completely parallelized by comparison.
in a few years it's possible that this will run locally in real time