Comment by zeusk
SMT isn't really that, is it?
SMT allows both threads to execute concurrently (hence independent front-end resources, especially for fetch and decode), and certain core resources are statically partitioned between them, unlike a warp being scheduled on an SM.
I'm not a graphics expert but warps seem closer to run-time/dynamic VLIW than SMT.
In actual implementations, warps are very much like very wide SIMD on a CPU core. Each HW thread corresponds to a different warp, since each warp can execute different instructions.
This mapping is so close that translating GPU code to run on a CPU is relatively easy, and the result performs well.
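To make the warp-as-wide-SIMD picture concrete, here is a rough Python sketch of how a divergent GPU-style kernel might be run on a CPU. It is only an illustration under assumptions: `run_warp`, `run_kernel`, and the even/odd branch are made-up names and logic, the lane loops stand in for single SIMD instructions, and the 32-lane width is the NVIDIA warp size.

```python
WARP_SIZE = 32  # lanes per warp (NVIDIA's warp width; assumed for this sketch)

def run_warp(warp_id, data, out):
    """Run one warp: one instruction stream applied across up to WARP_SIZE lanes."""
    base = warp_id * WARP_SIZE
    lanes = range(base, min(base + WARP_SIZE, len(data)))
    # Evaluating the branch condition per lane yields an active mask,
    # much like a SIMD compare producing a predicate register.
    mask = [data[i] % 2 == 0 for i in lanes]
    # "then" side: the warp steps through it once; masked-off lanes write nothing.
    for i, active in zip(lanes, mask):
        if active:
            out[i] = data[i] // 2
    # "else" side: same instruction stream, inverted mask.
    for i, active in zip(lanes, mask):
        if not active:
            out[i] = 3 * data[i] + 1

def run_kernel(data):
    """Hypothetical kernel launch: split the work into warps and run each one."""
    out = [0] * len(data)
    num_warps = (len(data) + WARP_SIZE - 1) // WARP_SIZE
    # Warps are independent of each other, so on a CPU each one could be
    # handed to its own hardware thread; here they simply run in sequence.
    for w in range(num_warps):
        run_warp(w, data, out)
    return out
```

Calling `run_kernel(list(range(70)))` processes three warps, the last one only partially filled. The point of the sketch is that divergence inside a warp becomes masking, while different warps are free to follow entirely different instruction streams.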