Comment by rsyring 2 days ago

Why would they benefit? When DuckDB/Polars are used correctly, all the work happens in the native stack, so it should already be much faster than doing the same work in the Python runtime.

I recently moved a large ETL process that did most of its processing in the Python runtime over to pyarrow/Polars and wrote all the ETL logic in SQL. I've seen processes that used to take a week to run drop to about an hour (no exaggeration).

wenc 2 days ago

They wouldn’t benefit in terms of performance because, as you say, they are already blazing fast as is. And I know what you mean: I rewrote a pure pandas transformation (granted, old pre-2.0 pandas) in DuckDB, and compute time dropped from nearly an hour to single-digit minutes.

But having these in Graal would allow more types of applications to be deployed in JVM stacks. As sibling comments note, many data science models are written in Python but production stacks are in Java.

  • rsyring 2 days ago

    > But having these in Graal would allow more types of applications to be deployed in JVM stacks...

    Ah...makes sense now. I was thinking along the lines of someone switching to the JVM for better performance, but being held back by the absence of those libraries.