Comment by porridgeraisin

Comment by porridgeraisin 6 days ago

I was referring to this portion of TFA

> CUDA cores are much more flexible than a TPU’s VPU: GPU CUDA cores use what is called a SIMT (Single Instruction Multiple Threads) programming model, compared to the TPU’s SIMD (Single Instruction Multiple Data) model.

adrian_b 6 days ago

This flexibility of CUDA is a software facility, which is independent of the hardware implementation.

For any SIMD processor one can write a compiler that translates a program written for the SIMT programming model into SIMD instructions. For example, for the Intel/AMD CPUs with SSE4/AVX/AVX-512 ISAs, there exists a compiler of this kind (ispc: https://github.com/ispc/ispc).

Reply View 1 reply

porridgeraisin 5 days ago

Thanks, I will look into that.
However, I'm still confused about the original statement. What I had thought was that
pre-volta GPUs, each thread in a warp has to execute in lock-step. Post-volta, they can all execute different instructions.
Obviously this is a surface level understanding. How do I reconcile this with what you wrote in the other comment and this one?

Reply View | 0 replies