lhl 2 days ago

Last year I had issues using MI300X for training, and when it did work, it was about 20-30% slower than H100. But I'm doing some OpenRLHF (transformers/DeepSpeed-based) DPO training atm w/ latest ROCm and PyTorch, and it seems to be doing OK — roughly matching an H200 in GPU-hour perf for small ~12h runs.

Note: my previous testing was on a single (8x) MI300X node, while currently I'm testing on just a single MI300X GPU, so it's not quite apples-to-apples. Multi-GPU/multi-node training is still a question mark; this is just a single data point.

fooker a day ago

It's even more jarring given that the H100 is about three years old now.

moralestapia 2 days ago

You mean a slower chip?

Their MI300s already beat them, with the 400s coming soon.

  • Vvector a day ago

    Chip speed isn't as important as good software

    • moralestapia a day ago

The software is the same — AMD is not doing its own LLMs.

      • jjice a day ago

I think the software they were referring to is CUDA and the developer experience around the Nvidia stack.