Comment by osigurdson

Comment by osigurdson 10 hours ago

0 replies

>> [1] what is a node? Typically it is a synonym for "server". In some configurations HPC schedulers allow node sharing

I'm sure they mean actual servers / not just cores. Even in traditional HPC it isn't abstracted to the level of individual cores usually since most HPC jobs care about memory bandwidth - even with Infiniband or other techniques throughput / latency is much worse than on a single machine. Of course, multiple machines are connected (usually using MPI / Infiniband) but important to try to minimize communication between nodes where possible.

For AI workloads, they are running GPUs - so 10K+ cores on a single device so even less likely to be talking about cores here.