Comment by Rover222
I don't think you grasp what I'm saying? I'm talking about next token prediction to generate video frames.
In a few years it's possible that this will run locally in real time.
I don't think that will ever happen due to extreme hardware requirements. What I do see happening is that only an extremely low-fidelity scene is rendered, with just basic shapes and little or no texturing, which is then filled in by AI. DLSS taken to the extreme: not just resolution but the whole stack.
Yeah, which is pretty slow, since each image-frame token has to be generated autoregressively in sequence. And leading diffusion models need to progressively denoise each frame. Both are computationally very expensive. Generating the entire world with current techniques is incredibly costly compared to rendering and rasterizing triangles, which by comparison is almost completely parallelized.
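The sequential-vs-parallel cost gap can be sketched with toy numbers (all figures below are hypothetical, chosen only to illustrate the scaling, not real benchmark data):

```python
import math

def autoregressive_latency_ms(n_tokens, ms_per_token):
    # Each token conditions on all previous tokens, so the n forward
    # passes cannot run concurrently: latency grows linearly with n.
    latency = 0.0
    for _ in range(n_tokens):
        latency += ms_per_token  # one sequential forward pass per token
    return latency

def rasterized_latency_ms(n_triangles, ms_per_unit_of_work, n_parallel_units):
    # Triangles are independent, so the work divides across hardware
    # units and latency shrinks with the degree of parallelism.
    return math.ceil(n_triangles / n_parallel_units) * ms_per_unit_of_work

# A frame of 65,536 tokens generated one token at a time:
print(autoregressive_latency_ms(65_536, 0.5))        # 32768.0 ms per frame
# The same count of independent triangles spread over 8,192 parallel units:
print(rasterized_latency_ms(65_536, 0.5, 8_192))     # 4.0 ms per frame
```

The point of the sketch is only the shape of the two curves: the autoregressive cost is O(n) in token count no matter how much hardware you have, while the rasterization cost divides by the number of parallel units.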