lhl 2 days ago

Last year I had issues using MI300X for training, and when it did work, it was about 20-30% slower than H100. But I'm doing some OpenRLHF (transformers/DeepSpeed-based) DPO training atm w/ latest ROCm and PyTorch, and it seems to be doing OK — roughly matching an H200 in GPU-hour perf for small ~12h runs.

Note: my previous testing was on a single (8x) MI300X node, while currently I'm testing on just a single MI300X GPU, so it's not quite apples-to-apples. Multi-GPU/multi-node training is still a question mark; this is just a single data point.

fooker a day ago

It's even more jarring given that the H100 is about three years old now.

moralestapia 2 days ago

You mean a slower chip?

Their MI300s already beat them, with the 400s coming soon.

  • Vvector a day ago

    Chip speed isn't as important as good software

    • moralestapia a day ago

The software is the same — AMD is not doing its own LLMs.

      • jjice a day ago

I think the software they were referring to is CUDA and the developer experience around the Nvidia stack.