Comment by kccqzy 5 days ago
Because in the real world, for code where performance is needed, you run the profiler and either find that the time is spent on I/O, or that the time is spent inside native code.
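For concreteness, here is a minimal sketch of that profiling step (the workload function is just a stand-in for whatever code you're investigating):

    import cProfile
    import pstats

    def workload():
        # stand-in for the real code under investigation
        return sum(i * i for i in range(10**6))

    cProfile.run("workload()", "profile.out")
    # Sort by cumulative time; in practice the top of this list is usually
    # either I/O waits or time spent inside native extension code.
    pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)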
Many simple scripts at my work that do little more than run argparse and fire off an HTTP request spend half a minute importing random stuff because of false dependencies and uncommon code paths. For some unit tests it's 45 seconds, substantially longer than the time taken to run the test logic itself.
In dev cycles most code is short-running.
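For what it's worth, CPython's `-X importtime` flag prints a per-module import-time breakdown, and pushing heavy imports down into the code paths that actually need them is one common workaround. A rough sketch of that pattern (the module names are only examples, not anything from the scripts above):

    import argparse

    def report(args):
        import pandas as pd  # heavy dependency paid for only when this subcommand runs
        print(pd.__version__)

    def ping(args):
        import urllib.request  # cheap, but same idea: import where it's used
        print(urllib.request.urlopen(args.url).status)

    parser = argparse.ArgumentParser()
    sub = parser.add_subparsers(dest="cmd", required=True)
    sub.add_parser("report").set_defaults(func=report)
    ping_cmd = sub.add_parser("ping")
    ping_cmd.add_argument("url")
    ping_cmd.set_defaults(func=ping)

    args = parser.parse_args()
    args.func(args)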
> Many simple scripts at my work [...] For some unit tests it's 45 seconds
> I spend a lot of time rewriting the python logic in C++, which makes it 100x faster
Nice! Your workplace didn't care to pick a better tool for the job in the past, and it seems not to care what you're doing at present, if you have to spend time rewriting the stuff in C++ instead of picking Nim and calling it a day, in a day.
Even better, in Nim these little CLI tools could use https://github.com/c-blake/cligen and have had colorized, auto-generated terminal help for many years now with much less dev effort than raw argparse. Start-up time of statically linked Nim programs is on the order of 100–500 microseconds, just like C programs.
Have you thought about packing that stuff into an executable, or precomputing or preloading it? There are techniques for each of those that help in some scenarios.
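The precomputing variant, as a rough sketch (the cache path and the startup work are made up for illustration): do the expensive setup once, pickle it, and skip it on every later run.

    import pathlib
    import pickle

    CACHE = pathlib.Path("startup_cache.pkl")  # hypothetical cache location

    def expensive_startup():
        # stand-in for setup work that is identical on every run
        return {i: i * i for i in range(10**5)}

    if CACHE.exists():
        data = pickle.loads(CACHE.read_bytes())
    else:
        data = expensive_startup()
        CACHE.write_bytes(pickle.dumps(data))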
If imports are slow, you need to not be writing Python in the first place, because you are either on limited hardware or you are writing a very performance-sensitive app.
I do a bit of performance work and find most often that things are mixed: there's enough CPU between syscalls that the hardware isn't being fully utilized, but there's enough I/O that the CPUs aren't pegged either. It is rare that the profiler finds an obvious hotspot that yields an easy win; usually it shows that with heavy refactoring you can make 10% of your load several times faster, and then you'll need to do the same for the next 10%, and so on. That is the more typical real world for me, and in that world Python is really awful compared to rewrite-it-in-Rust.
This "There are no hot spots, it's just a uniform glowing orange" situation is why Google picked C++ and then later Rust and to some extent why they picked Go too.
IRL you will have CPU-bottlenecked pure Python code too. But that alone is usually not enough to justify taking on the unknown risk of switching to a less well-supported interpreter. Worst case, you just put in the effort to convert the hot parts to multiprocessing.
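A minimal sketch of that last option, assuming the hot part can be split into independent chunks (the crunch function and the chunking are stand-ins):

    from multiprocessing import Pool

    def crunch(chunk):
        # stand-in for the CPU-bound pure-Python hot path
        return sum(x * x for x in chunk)

    if __name__ == "__main__":
        chunks = [range(i, i + 100_000) for i in range(0, 1_000_000, 100_000)]
        with Pool() as pool:  # one worker process per core by default
            partials = pool.map(crunch, chunks)
        print(sum(partials))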
Also, that engineer time you would spend optimizing for performance costs more than just throwing more hardware at it.
That's the thing with single-threaded CPU operations: you can't throw more hardware at them.
In this situation, "more hardware" would mean throwing a faster CPU at it.
It caps out quickly. If you have a newish Mac, you're already pretty much at the max.
This might have been your experience, but mine has been very different. In my experience a typical python workload is 50% importing python libraries, 45% slow python wrapper logic and 5% fast native code. I spend a lot of time rewriting the python logic in C++, which makes it 100x faster, so the resulting performance approaches "10% fast native logic, 90% useless python imports".