hocuspocus 2 days ago

Mostly the latter. Scala 3 is almost completely irrelevant to the big data space so far. Databricks took six years before upgrading their proprietary Spark runtime to Scala 2.13. Flink dropped the Scala API before even moving to 2.13. I don't know if Scio will seriously attempt the move to Scala 3. All of them suffer from Twitter libraries being abandoned, which isn't insurmountable but is still an annoyance.

And I don't think it matters anymore. I predict that the JVM will eventually be out of the equation. We're already seeing query engines being replaced by proprietary or open source equivalents in C++ or Rust. Large-scale distribution is less of a selling point with modern cloud computing. Do you really need 100 executors when you can get a bare metal instance with 192, 256 or 384 cores?

People want a dataframe API in Python because that's what the ML/DS/AI crowd knows. Queries and processing will be done in C++ or Rust, with little or even zero need for a distributed runtime. The JVM and Scala solve a problem that simply won't exist anymore.