Comment by torginus

Comment by torginus 15 days ago

I wonder if it's feasible to hook up NAND flash with a high bandwidth link necessary for inference.

Each of these NAND chips has hundreds of dies of flash stacked inside, all hooked up to the same data line, so only one of them can talk at a time, and they still achieve >1GB/s of bandwidth. If you could hook them up in parallel, you could have 100s of GB/s of bandwidth per chip.

potatolicious 15 days ago

NAND is very, very slow relative to RAM, so you'd pay a huge performance penalty there. But maybe more importantly, my impression is that memory contents mutate pretty heavily during inference (you're not just storing the fixed weights), so I'd be pretty concerned about NAND wear. Mutating a single bit on a NAND chip a million times over just results in a large pile of dead NAND chips.


torginus 15 days ago

No, it's not slow. A single NAND chip in an SSD offers >1GB/s of bandwidth. Inside the chip there are 100+ dies actually holding the data, but in SSDs only one of them is active at a time when reading or writing.
You could probably make special NAND chips where all of them can be active at the same time, which means you could get 100GB/s+ of bandwidth out of a single chip.
This would be useless for data storage scenarios, but very useful when you have huge amounts of static data you need to read quickly.
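
The arithmetic behind that estimate can be sketched in a few lines. This is only a back-of-the-envelope illustration: the function name is hypothetical, the figures are the ones quoted in this comment, and it assumes each die could sustain the full 1 GB/s on its own link with no shared-bus or controller bottleneck, which is exactly the part real NAND packages do not offer today.

```python
def aggregate_bandwidth_gb_s(dies_per_chip: int, per_die_gb_s: float) -> float:
    """Ideal read bandwidth if every die in the package streamed at once.

    Assumes perfect parallelism: no shared data line, no controller limit.
    """
    if dies_per_chip < 1 or per_die_gb_s < 0:
        raise ValueError("need at least one die and a non-negative bandwidth")
    return dies_per_chip * per_die_gb_s


# Figures quoted in the comment (not measurements).
shared_bus = aggregate_bandwidth_gb_s(dies_per_chip=1, per_die_gb_s=1.0)
all_parallel = aggregate_bandwidth_gb_s(dies_per_chip=100, per_die_gb_s=1.0)

print(f"one die active at a time: {shared_bus:.0f} GB/s")
print(f"all dies active at once:  {all_parallel:.0f} GB/s")
```

The result scales linearly with the die count, so the estimate is only as good as the per-die figure and the assumption that a wide enough link can be built.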

- slickytail 15 days ago
  
  The memory bandwidth on an H100 is 3TB/s, for reference. This number is the limiting factor in the size of modern LLMs. 100GB/s isn't even in the realm of viability.
  
  
  torginus 14 days ago
  
  That bandwidth is for the whole GPU, which has 6 memory chips. But anyway, what I'm proposing isn't for the high end and training, but for making inference cheap.
  And I was somewhat conservative with the numbers: a modern budget SSD with a single NAND chip can do more than 5GB/s read speed.
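
To put the bandwidth figures from this exchange side by side, here is a rough sketch of what each would mean for token generation. It assumes decoding is purely memory-bandwidth bound and that every weight is read once per generated token, so the upper bound is simply bandwidth divided by model size. The 70 GB model size is a hypothetical example and the function name is made up for illustration; the bandwidth values are the ones quoted in the thread.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed when each token reads all weights once.

    Ignores compute time, caching, batching, and KV-cache traffic.
    """
    if model_size_gb <= 0:
        raise ValueError("model size must be positive")
    return bandwidth_gb_s / model_size_gb


MODEL_SIZE_GB = 70.0  # hypothetical: e.g. 70B parameters at 8 bits each

# Bandwidth figures as quoted in the thread, in GB/s.
quoted_bandwidths = {
    "H100 memory": 3000.0,
    "parallel NAND chip (proposed)": 100.0,
    "budget SSD, single NAND": 5.0,
}

for name, gb_s in quoted_bandwidths.items():
    rate = max_tokens_per_second(gb_s, MODEL_SIZE_GB)
    print(f"{name}: at most {rate:.2f} tokens/s")
```

Under these assumptions the gap between the options is just the ratio of their bandwidths, so several parallel chips would be needed to approach GPU-class speeds on a model of this size.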
  