Comment by DiabloD3
(given the context of LLMs) Unless you're doing CPU-side inference for corner cases where GPU inference is worse, the lack of SIMD isn't a huge issue.
There are libraries for writing SIMD in Go now, but I think the better fix is being able to autovectorize during the LLVM IR optimization stage, so it's available to multiple languages.
I think LLVM has it now, it's just not very mature yet.
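For illustration, here's a minimal sketch of the kind of loop autovectorization targets: a straight-line reduction over contiguous slices with no branches or aliasing surprises. (Note the assumptions: the stock Go compiler, gc, does not autovectorize; an LLVM-backed toolchain such as gollvm could lower this loop to SIMD instructions.)

```go
package main

import "fmt"

// dot is a plain scalar loop. An LLVM-level autovectorizer can
// rewrite loops like this to process several elements per SIMD
// instruction, without any source-level intrinsics.
func dot(a, b []float32) float32 {
	var sum float32
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

func main() {
	a := []float32{1, 2, 3, 4}
	b := []float32{5, 6, 7, 8}
	fmt.Println(dot(a, b)) // 1*5 + 2*6 + 3*7 + 4*8 = 70
}
```

The appeal of doing this at the IR stage is exactly what the comment says: the same pass benefits every frontend that emits LLVM IR, instead of each language growing its own SIMD library.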