Comment by DiabloD3
(given the context of LLMs) Unless you're doing CPU-side inference for corner cases where GPU inference is worse, the lack of SIMD isn't a huge issue.
There are libraries for writing SIMD in Go now, but I think the better fix is being able to autovectorize during the LLVM IR optimization stage, so it's available to multiple languages.
I think LLVM has it now, it's just not very mature yet.
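For illustration, here's a minimal sketch of the kind of loop autovectorization targets: a straight-line reduction over contiguous slices with no branches or aliasing surprises. (Note the assumptions: the stock Go compiler, gc, does not autovectorize; an LLVM-backed toolchain such as gollvm could lower this loop to SIMD instructions.)

```go
package main

import "fmt"

// dot is a plain scalar loop. An LLVM-level autovectorizer can
// rewrite loops like this to process several elements per SIMD
// instruction, without any source-level intrinsics.
func dot(a, b []float32) float32 {
	var sum float32
	for i := range a {
		sum += a[i] * b[i]
	}
	return sum
}

func main() {
	a := []float32{1, 2, 3, 4}
	b := []float32{5, 6, 7, 8}
	fmt.Println(dot(a, b)) // 1*5 + 2*6 + 3*7 + 4*8 = 70
}
```

The appeal of doing this at the IR stage is exactly what the comment says: the same pass benefits every frontend that emits LLVM IR, instead of each language growing its own SIMD library.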