Comment by chubot
That looks more like a SIMD problem than a multi-core problem
You want bigger units of work for multiple cores, otherwise the coordination overhead will outweigh the work the application is doing
I think the Erlang runtime is probably the best use of functional programming and multiple cores. Since Erlang processes are shared nothing, I think they will scale to 64 or 128 cores just fine
Whereas the GC will be a bottleneck in most languages with shared memory ... you will stop scaling before using all your cores
But I don't think Erlang is as fine-grained as your example ...
Some related threads:
https://news.ycombinator.com/item?id=40130079
https://news.ycombinator.com/item?id=31176264
AFAIU Erlang is not that fast an interpreter; I thought the Pony Language was doing something similar (shared nothing?) with compiled code, but I haven't heard about it in awhile
There's some sharing used to avoid heavy copies, though GC runs at the process level. The implementation is tilted towards copying between isolated heaps over sharing, but it's also had performance work done over the years. (In fact, if I really want to cause a global GC pause bottleneck in Erlang, I can abuse persistent_term to do this.)