Comment by Dwedit
Does using pure PyTorch improve performance on non-NVIDIA cards in any way? Or is PyTorch so highly optimized for CUDA that no other GPU vendors have a chance?
Does PyTorch work on Apple Silicon out of the box? Or do you need some Apple-specific package?
It is possible to run ML workloads on, for example, AMD devices via Vulkan. With newer extensions like cooperative matrix, and maybe in the future some scheduling magic exposed by the driver through a new extension, the remaining single-digit-percent gap CUDA has will evaporate.
I believe PyTorch works nicely with ROCm, but I don't know if it's nice to the point of being "on par" with CUDA.
PyTorch also runs great on Apple Silicon, though it's hard to compare directly because Apple's high-end GPUs can't compute anywhere near as much as NVIDIA's high-end stuff.
Edit: I'll also add that PyTorch still has one oddity on Apple Silicon, which is that it considers each tensor to be "owned" by a particular device, either a CPU or a GPU. Macs have unified memory, but PyTorch will still do a full copy when you "move" data between the CPU and GPU, because it just wasn't built for unified memory.
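A minimal sketch of that ownership model, assuming PyTorch is installed; it picks the "mps" backend when available and falls back to CPU-to-CPU otherwise, so the same code runs on any machine:

```python
import torch

t_cpu = torch.ones(3)

# Use Apple's Metal ("mps") backend when present, else stay on CPU.
device = "mps" if torch.backends.mps.is_available() else "cpu"

# Each tensor is owned by one device; .to() produces a new tensor on the
# target device. On a Mac this is a full copy even though the memory is
# physically unified.
t_gpu = t_cpu.to(device)

print(t_cpu.device)   # always cpu
print(t_gpu.device)   # mps:0 on Apple Silicon, cpu elsewhere
```

Note that `.to()` is a no-op (returns the same tensor) only when the source already lives on the target device; moving across the CPU/MPS boundary always allocates and copies.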