Comment by crystal_revenge
Comment by crystal_revenge 20 hours ago
You can assert whatever you want, but Polars is a great answer. The performance improvements are secondary to me compared to the dramatic improvement in interface.
Today all serious DS work will ultimately become data engineering work anyway. The time when DS can just fiddle around in notebooks all day has passed.
Pandas is widely adopted and deeply integrated into the Python ecosystem. Meanwhile, Polars remains a small niche, and it's one of those hype technologies that will likely be dead in 3 years once most of its users realise that it offers them no actual practical advantages over Pandas.
If you are dealing with huge data sets, you are probably using Spark or something like Dask already where jobs can run in the cloud. If you need speed and efficiency on your local machine, you use NumPy outright. And if you really, really need speed, you rewrite it in C/C++.
Polars is trying to solve an issue that just doesn't exist for the vast majority of users.