Comment by jedwhite 5 days ago

That's an interesting insight about "stacking tricks" together. I'm curious where you found that approach hit limits, and what, if anything, gives you an advantage against others copying it. Getting real-time streaming with a 20B-parameter diffusion model at 20fps on a single GPU seems objectively impressive. It's hard to resist just saying "wow" looking at the demo, but I know that's not helpful here. It is clearly a substantial technical achievement, and I'm sure lots of other folks here would be interested in the limits of the approach and how generalizable it is.

sid-the-kid 5 days ago

Good question! Software gets democratized so fast that I am sure others will implement similar approaches soon. And, to be clear, some of our "speed upgrades" are pieced together from recent DiT papers. I do think getting everything running on a single GPU at this resolution and speed is totally new (as far as I have seen).

I think people will just copy it, and we just need to continue moving as fast as we can. I do think that a bit of a revolution is happening right now in real-time video diffusion models. So many great papers have been published in that area in the last 6 months. My guess is that many DiT models will be real time within 1 year.

  • jedwhite 5 days ago

    > I do think getting everything running on a single GPU at this resolution and speed is totally new

    Thanks, it seemed to be the case that this was really something new, but HN tends to be circumspect so I wanted to check. It's an interesting space and I try to stay current, but everything is moving so fast. But I was pretty sure I hadn't seen anyone do that. It's a huge achievement to do it first and make it work for real like this! So well done!

  • sid-the-kid 5 days ago

    One thing that is interesting: LLM pipelines have been highly optimized for speed (since speed is directly related to cost for companies). That is just not true for real-time DiTs. So, there is still lots of low-hanging fruit for how we (and others) can make things faster and better.

  • storystarling 5 days ago

    Curious about the memory bandwidth constraints here. 20B parameters at 20fps seems like it would saturate the bandwidth of a single GPU unless you are running int4. I assume this requires an H100?

    • andrew-w 5 days ago

      Yep, the model is running on Hopper architecture. Anything less was not sufficient in our experiments.
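
For anyone who wants to put rough numbers on the bandwidth question above, here is a back-of-envelope sketch. It is not a description of how the model in the thread actually runs. It assumes every forward pass reads all 20B weights from GPU memory once, counts weight reads only (no activations, caches, or attention traffic), and treats the number of forward passes per output frame as a free parameter, since the thread does not state it. The function name and the 3.35 TB/s figure (the approximate published peak for the H100 SXM) are assumptions added for illustration.

```python
# Back-of-envelope estimate of weight-read memory traffic only.
# Assumption: each forward pass streams every weight from memory once.
# Real pipelines may need less (several frames per pass, weight reuse)
# or more (activations, caches), so treat the output as a rough bound.

BYTES_PER_PARAM = {"fp16": 2.0, "fp8": 1.0, "int4": 0.5}

# Approximate published peak HBM bandwidth of an H100 SXM, in TB/s.
# Used only as a reference point; this is an assumption, not from the thread.
H100_SXM_PEAK_TB_S = 3.35


def weight_read_tb_per_s(params, precision, fps, passes_per_frame=1.0):
    """Return TB/s needed to stream the weights for the given frame rate.

    passes_per_frame is hypothetical: e.g. the number of denoising steps
    per frame, or a fraction if one pass produces several frames.
    """
    bytes_per_pass = params * BYTES_PER_PARAM[precision]
    return bytes_per_pass * fps * passes_per_frame / 1e12


if __name__ == "__main__":
    for precision in BYTES_PER_PARAM:
        for passes in (1, 4):
            need = weight_read_tb_per_s(20e9, precision, 20, passes)
            share = need / H100_SXM_PEAK_TB_S
            print(f"{precision:>4} x{passes} passes/frame: "
                  f"{need:.2f} TB/s ({share:.0%} of assumed peak)")
```

Under these assumptions, fp16 weights at one pass per frame need about 0.8 TB/s, and four passes per frame at fp16 would need about 3.2 TB/s, which is close to the assumed peak. That is roughly the concern raised above: lower precision or fewer passes per frame buys a lot of headroom.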