Comment by adastra22
For the record, for anyone reading this: I wrote a multithreaded, SIMD-heavy compute task in Go, and it suffered only a 5% slowdown compared with the original hand-optimized C++ version.
The low-level SIMD code was called through the C FFI bridge (cgo); Go was used for the rest of the program.