Comment by gdwatson
Yeah, I think the dream was more like, “The compiler looks at a map or filter operation and figures out whether it’s worth the overhead to parallelize it automatically.” And that turns out to be pretty hard, with potentially painful (and nondeterministic!) consequences for failure.
Maybe it would have been easier if CPU performance hadn’t ended up outstripping memory performance so much, or if cache coherency between cores weren’t so difficult.
Spawning threads or using a thread pool implicitly would be pretty bad: it would be difficult to reason about performance if the compiler were to make these choices for you.
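To make the "is it worth the overhead?" decision concrete, here is a rough Python sketch of the kind of cost model an auto-parallelizing map would need. Everything in it is an assumption for illustration: the names `should_parallelize` and `auto_map`, the `SPAWN_OVERHEAD_NS` constant, and the idea that a per-item cost estimate is available at all (getting that estimate right is the hard part).

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical cost model: the names and numbers below are illustrative
# assumptions, not measurements from any real compiler or runtime.
SPAWN_OVERHEAD_NS = 50_000  # assumed fixed cost of handing work to a pool


def should_parallelize(n_items, est_cost_per_item_ns, workers):
    """Return True if the estimated parallel time beats the sequential time."""
    sequential = n_items * est_cost_per_item_ns
    parallel = SPAWN_OVERHEAD_NS + sequential / workers
    return workers > 1 and parallel < sequential


def auto_map(fn, items, est_cost_per_item_ns, workers=4):
    """Map fn over items, picking sequential or pooled execution from the estimate."""
    items = list(items)
    if should_parallelize(len(items), est_cost_per_item_ns, workers):
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(fn, items))  # pool.map preserves input order
    return [fn(x) for x in items]
```

The caller can't tell from the call site which path ran, which is exactly the reasoning-about-performance problem. The model also ignores memory bandwidth and cache effects entirely, and in CPython threads generally won't speed up CPU-bound work because of the GIL, so this only illustrates the decision, not a real speedup.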