Comment by wenc
DuckDB is not currently a supported package, but pandas and matplotlib are, which is good. If DuckDB and Polars were supported and ran well, I suspect many data jobs could benefit.
They wouldn't benefit in performance, because as you say they are already blazing fast. And I know what you mean: I rewrote a pure pandas (granted, old pre-2.0) transformation in DuckDB and compute time dropped from nearly an hour to single-digit minutes.
But having these in Graal would allow more types of applications to be deployed on JVM stacks. As sibling comments note, many data science models are written in Python but production stacks are in Java.
Why would they benefit? When DuckDB/Polars are being used correctly, all the work is happening in the native stack. It should already be very fast compared to the Python runtime.
I recently moved a large ETL process that was mostly Python runtime processing to pyarrow/Polars and wrote all the ETL logic in SQL. I've seen processes that used to take a week to run drop to about an hour (no exaggeration).
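To make the "work happens in the native stack" point concrete, here is a rough sketch of the pattern, not the actual ETL job. It uses the standard-library sqlite3 module purely as a stand-in so it runs without extra installs; the thread is about DuckDB and Polars, and this aggregate query is plain standard SQL that should carry over. The table name, column names, and sample rows are all made up for illustration.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample rows (customer, amount); not data from the thread.
ROWS = [("alice", 10.0), ("bob", 5.0), ("alice", 2.5), ("carol", 7.0), ("bob", 1.0)]


def totals_python_loop(rows):
    """Aggregate row by row inside the Python interpreter."""
    totals = defaultdict(float)
    for customer, amount in rows:
        totals[customer] += amount
    return sorted(totals.items())


def totals_sql(rows):
    """Express the same aggregation as one SQL statement run by the engine."""
    con = sqlite3.connect(":memory:")  # stand-in engine, assumed for the sketch
    try:
        con.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
        con.executemany("INSERT INTO orders VALUES (?, ?)", rows)
        cur = con.execute(
            "SELECT customer, SUM(amount) AS total "
            "FROM orders GROUP BY customer ORDER BY customer"
        )
        return cur.fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    # Both paths give the same totals; only where the loop runs differs.
    print(totals_python_loop(ROWS))
    print(totals_sql(ROWS))
```

Both functions return the same result. The difference is that the second one hands the grouping and summing to the engine's compiled code instead of looping in Python, which is where the large speedups on real data tend to come from.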