Comment by kccqzy 5 days ago
Because in the real world, for code where performance is needed, you run the profiler and either find that the time is spent on I/O, or that the time is spent inside native code.
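For concreteness, here is a minimal sketch of that profiling step (the workload function is just a stand-in for whatever code you're investigating):

    import cProfile
    import pstats

    def workload():
        # stand-in for the real code under investigation
        return sum(i * i for i in range(10**6))

    cProfile.run("workload()", "profile.out")
    # Sort by cumulative time; in practice the top of this list is usually
    # either I/O waits or time spent inside native extension code.
    pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)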
Many simple scripts at my work that do little more than run argparse and fire off an HTTP request spend half a minute importing random stuff because of false dependencies and uncommon code paths. For some unit tests it's 45 seconds, substantially longer than the time taken to run the test logic itself.
In dev cycles most code is short-running.
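For what it's worth, CPython's `-X importtime` flag prints a per-module import-time breakdown, and pushing heavy imports down into the code paths that actually need them is one common workaround. A rough sketch of that pattern (the module names are only examples, not anything from the scripts above):

    import argparse

    def report(args):
        import pandas as pd  # heavy dependency paid for only when this subcommand runs
        print(pd.__version__)

    def ping(args):
        import urllib.request  # cheap, but same idea: import where it's used
        print(urllib.request.urlopen(args.url).status)

    parser = argparse.ArgumentParser()
    sub = parser.add_subparsers(dest="cmd", required=True)
    sub.add_parser("report").set_defaults(func=report)
    ping_cmd = sub.add_parser("ping")
    ping_cmd.add_argument("url")
    ping_cmd.set_defaults(func=ping)

    args = parser.parse_args()
    args.func(args)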
> Many simple scripts at my work [...] For some unit tests it's 45 seconds
> I spend a lot of time rewriting the python logic in C++, which makes it 100x faster
Nice! Your workplace didn't care to pick a better tool for the job in the past, and it seems not to care what you're doing at present, if you have to spend time rewriting the stuff in C++ instead of picking Nim and calling it a day, in a day.
Even better, in Nim these little CLI tools could use https://github.com/c-blake/cligen and have had colorized, auto-generated terminal help for many years now with much less dev effort than raw argparse. Start-up time of statically linked Nim programs is on the order of 100–500 microseconds, just like C programs.
Have you thought about packing that stuff into an executable, or precomputing or preloading it? There are techniques for each of those that help in some scenarios.
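The precomputing variant, as a rough sketch (the cache path and the startup work are made up for illustration): do the expensive setup once, pickle it, and skip it on every later run.

    import pathlib
    import pickle

    CACHE = pathlib.Path("startup_cache.pkl")  # hypothetical cache location

    def expensive_startup():
        # stand-in for setup work that is identical on every run
        return {i: i * i for i in range(10**5)}

    if CACHE.exists():
        data = pickle.loads(CACHE.read_bytes())
    else:
        data = expensive_startup()
        CACHE.write_bytes(pickle.dumps(data))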
If imports are slow, you need to not be writing Python in the first place, because you are either on limited hardware or you are writing a very performance-sensitive app.
I do a bit of performance work and find most often that things are mixed: there's enough CPU between syscalls that the hardware isn't being fully utilized, but there's enough I/O that the CPUs aren't pegged either. It is rare that the profiler finds an obvious hotspot that yields an easy win; usually it shows that with heavy refactoring you can make 10% of your load several times faster, and then you'll need to do the same for the next 10%, and so on. That is the more typical real world for me, and in that world Python is really awful compared to rewrite-it-in-Rust.
This "There are no hot spots, it's just a uniform glowing orange" situation is why Google picked C++ and then later Rust and to some extent why they picked Go too.
IRL you will have CPU-bottlenecked pure Python code too. But that alone is usually not enough to justify taking on the unknown risk of switching to a less well-supported interpreter. Worst case, you just put in the effort to convert the hot parts to multiprocessing.
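A minimal sketch of that last option, assuming the hot part can be split into independent chunks (the crunch function and the chunking are stand-ins):

    from multiprocessing import Pool

    def crunch(chunk):
        # stand-in for the CPU-bound pure-Python hot path
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        chunks = [range(i, i + 100_000) for i in range(0, 1_000_000, 100_000)]
        with Pool() as pool:  # one worker process per core by default
            partials = pool.map(crunch, chunks)
        print(sum(partials))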
Also, that engineer time you would spend optimizing for performance costs more than just throwing more hardware at it.
That's the thing with single-threaded CPU operations: you can't throw more hardware at them.
In this situation, "more hardware" would mean throwing a faster CPU at it.
It caps out quickly. If you have a newish Mac, you're already pretty much at the max.
This might have been your experience, but mine has been very different. In my experience a typical python workload is 50% importing python libraries, 45% slow python wrapper logic and 5% fast native code. I spend a lot of time rewriting the python logic in C++, which makes it 100x faster, so the resulting performance approaches "10% fast native logic, 90% useless python imports".