ks2048 4 days ago

Looks nice. I'm not sure if this is the place for it, but what I'm always searching for is a very concise table of the different GPUs available, with approximate compute power and costs. Lists such as Wikipedia's [1] are way too complicated.

[1] https://en.wikipedia.org/wiki/List_of_Nvidia_graphics_proces...

  • charles_irl 4 days ago

    Yeah, there's a tension between showing enough information to be useful for driving decisions and hiding enough to keep it digestible.

    For example, "compute capability" sounds like it'd be what you need, but it's actually more of a software versioning index :(

    Was thinking of splitting the difference by collecting the quoted arithmetic throughput (FLOP/s) and memory bandwidths from the manufacturer datasheets. But there are caveats there too, e.g. the dreaded "With sparsity" asterisk on the Tensor Core FLOP/s of recent generations.
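A minimal sketch of what normalizing those datasheet figures could look like: NVIDIA's "With sparsity" Tensor Core numbers assume 2:4 structured sparsity and are exactly 2x the dense throughput, so dividing by two recovers a comparable dense figure. The GPU names and TFLOP/s values below are made up for illustration, not real datasheet entries.

```python
def dense_tflops(quoted_tflops: float, with_sparsity: bool) -> float:
    """Return dense TFLOP/s given a datasheet figure.

    NVIDIA's 2:4 structured-sparsity numbers are 2x the dense rate,
    so the sparsity-asterisked figures are simply halved.
    """
    return quoted_tflops / 2 if with_sparsity else quoted_tflops

# Hypothetical entries: (name, quoted TFLOP/s, has the sparsity asterisk?)
gpus = [
    ("gpu_a", 1000.0, True),   # quoted "With sparsity"
    ("gpu_b", 500.0, False),   # quoted dense
]

for name, tflops, sparse in gpus:
    print(name, dense_tflops(tflops, sparse))
```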

    • shihab 4 days ago

      I was looking for a simple table recently, outlining, say, how the shared memory or total register file size per SM varies between generations (something like that Wiki table). It was surprisingly hard to find that info.

alberth 4 days ago

Thank you for this.

Any chance you could just make it a single long webpage (as opposed to making me click through one page at a time)?

For some reason on my iPad the links don’t always work the first time I click them.

petermcneeley 6 days ago

Great work. Nice aesthetic.

"These groups of threads, known as warps, are switched out on a per clock cycle basis — roughly one nanosecond. CPU thread context switches, on the other hand, take a few hundred to a few thousand clock cycles"

I would note that Intel's SMT does something very similar (2 HW threads). Others, like the Xeon Phi, would round-robin 4 threads on a single core.
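The back-of-envelope reason the quoted per-cycle warp switching matters can be sketched numerically: if a warp stalls on global memory for some hundreds of cycles while the scheduler can pick a new warp every cycle, you only need enough resident warps to cover the stall. The 400-cycle latency and 4-cycle compute burst below are assumed, illustrative figures, not measurements.

```python
def warps_to_hide_latency(mem_latency_cycles: int,
                          compute_cycles_per_warp: int) -> int:
    """Warps needed so one is always ready while the others wait on memory.

    Each warp computes for a few cycles, then stalls; ceil-divide the
    stall by the compute burst, plus the stalled warp itself.
    """
    return -(-mem_latency_cycles // compute_cycles_per_warp) + 1

# Assumed figures: ~400-cycle memory stall, 4 cycles of work per warp.
print(warps_to_hide_latency(400, 4))  # -> 101 resident warps
```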

  • zeusk 4 days ago

    SMT isn't really that, is it?

    SMT allows for concurrent execution of both threads (thus independent front-ends, especially for fetch and decode), and certain core resources are statically partitioned, unlike a warp being scheduled on an SM.

    I'm not a graphics expert but warps seem closer to run-time/dynamic VLIW than SMT.

    • petermcneeley 3 days ago

      In actual implementation they are very much like very wide SIMD on a CPU core. Each HW thread is a different warp, since each warp can execute different instructions.

      This mapping is so close that translation from GPU to CPU is relatively easy and performant.
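The "warp ≈ wide SIMD" analogy above can be sketched as masked vector execution: all 32 lanes of a warp run both sides of a branch in lockstep, and a per-lane predicate selects each lane's result, much like a SIMD blend on a CPU. This is an illustrative model in plain Python, not code from any real GPU-to-CPU translation layer.

```python
WARP_SIZE = 32

def warp_branch(xs):
    """Each 'lane' runs: y = x*2 if x is even, else x+1, in lockstep."""
    assert len(xs) == WARP_SIZE
    mask = [x % 2 == 0 for x in xs]          # per-lane predicate
    then_vals = [x * 2 for x in xs]          # both branch sides execute...
    else_vals = [x + 1 for x in xs]
    # ...and the mask selects per-lane results, like a SIMD blend
    return [t if m else e for m, t, e in zip(mask, then_vals, else_vals)]

print(warp_branch(list(range(32)))[:4])  # lanes 0..3 -> [0, 2, 4, 4]
```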

  • charles_irl 4 days ago

    Thanks!

    > intels SMT does do something very similar (2 hw threads)

    Yeah, that's a good point. One thing I learned from looking at both hardware stacks more closely was that they aren't as different as they seem at first -- lots of the same ideas or techniques get used, but in different ways.

TerraHertz 4 days ago

Thanks! As an old (retired) programmer I was hoping a good intro to GPUs would turn up. Now, I don't suppose you could add 'ink on paper' to the color options? Gray on light gray, with medium gray highlighting, is hard on old eyes. While I never want to see P7 phosphor green again. And I suppose a zipfile of the whole thing, for local reading and archive, would be out of the question?

byteknight 4 days ago

I absolutely love the look. Is it a template or custom?

  • charles_irl 4 days ago

    Custom! Took inspiration from lynx, lotus, and other classic terminal programs.