Comment by modeless 5 days ago
What are the reasons why nobody uses pypy?
It doesn't play nice with a lot of popular Python libraries. In particular, many of them (NumPy, Pandas, TensorFlow, etc.) rely on CPython's C API, which can cause compatibility issues.
FWIW, PyPy has supported NumPy and Pandas since at least v5.9.
That said, of all the reasons stated here, this is the one that keeps me from using PyPy as my primary interpreter (lots of libraries are still missing).
Speaking only for myself, and in all sincerity: every year, there is some feature of the latest CPython version that makes a bigger difference to my work than faster execution would. This year I am looking forward to template strings, zstd, and deferred evaluation of annotations.
Keep in mind that the two scripts I used in my benchmark are written in pure Python, without any dependencies. This is the sweet spot for pypy. Once you start including dependencies with native code, the JIT becomes less effective. Nevertheless, the performance for pure Python code is out of this world, so I definitely intend to play more with it!
Because in the real world, for code where performance is needed, you run the profiler and either find that the time is spent on I/O, or that the time is spent inside native code.
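A minimal sketch of that workflow, with a hypothetical fetch_and_crunch() standing in for the real workload:

    import cProfile
    import pstats
    import urllib.request

    def fetch_and_crunch():
        # Hypothetical workload: an HTTP fetch (I/O wait) followed by work
        # that runs almost entirely in native code.
        data = urllib.request.urlopen("https://example.com").read()
        return sorted(data * 100)  # sorted() loops in C, not in the interpreter

    cProfile.run("fetch_and_crunch()", "profile.out")
    pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)

The profile for something like this is typically dominated by the socket read and the C-level sort rather than by Python bytecode, which is the point being made above.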
This might have been your experience, but mine has been very different. In my experience a typical python workload is 50% importing python libraries, 45% slow python wrapper logic and 5% fast native code. I spend a lot of time rewriting the python logic in C++, which makes it 100x faster, so the resulting performance approaches "10% fast native logic, 90% useless python imports".
Many simple scripts at my work that more or less just argparse and fire off an HTTP request spend half a minute importing random stuff because of false deps and uncommon codepaths. For some unit tests it's 45 seconds, substantially longer than the time taken to run the test logic.
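For what it's worth, `python -X importtime your_script.py` breaks that half minute down per module; a quick inline sketch of the same idea (the module names are just examples):

    import importlib
    import time

    # Time a few imports individually to find the offenders.
    for name in ("argparse", "json", "http.client"):
        t0 = time.perf_counter()
        importlib.import_module(name)
        print(f"{name}: {time.perf_counter() - t0:.3f}s")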
In dev cycles most code is short-running.
If imports are slow, you shouldn't be writing Python in the first place, because you are either on limited hardware or you are writing a very performance-sensitive app.
I do a bit of performance work and find most often that things are mixed: there's enough CPU between syscalls that the hardware isn't being fully utilized, but there's enough I/O that the CPUs aren't pegged either. It is rare that the profiler finds an obvious hotspot that yields an easy win; usually it shows that with heavy refactoring you can make 10% of your load several times faster, and then you'll need to do the same for the next 10%, and so on. That is the more typical real world for me, and in that world Python is really awful when compared to rewrite-it-in-Rust.
This "There are no hot spots, it's just a uniform glowing orange" situation is why Google picked C++ and then later Rust and to some extent why they picked Go too.
IRL you will have CPU-bottlenecked pure Python code too. But it's not enough to take on the unknown risk of switching to a less well-supported interpreter. Worst case, you just put in the effort to convert the hot parts to multiprocessing.
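A rough sketch of that worst case, with crunch() as a stand-in for whatever CPU-bound pure-Python function shows up hot:

    from multiprocessing import Pool

    def crunch(n):
        # Stand-in for a hot, CPU-bound, pure-Python function.
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        with Pool() as pool:  # one worker process per core by default
            results = pool.map(crunch, [10**6] * 8)  # sidesteps the GIL with processes
        print(results)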
Also, that engineer time you would spend optimizing for performance costs more than just throwing more hardware at it.
That's the thing with single-threaded CPU operations: you can't throw more hardware at it.
In this situation, "more hardware" would mean throwing a faster CPU at it.
It caps out quickly. If you have a newish Mac, you're already pretty much at the max.
We look periodically and pypy is usually unusable for us due to third-party library support. E.g. psycopg2, at least as of a couple years ago. Have not checked in a while.
pypy has a c-extension compatibility layer that allows running psycopg2 (via psycopg2cffi) and similar for numpy etc.
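If I remember the psycopg2cffi docs right, the shim is roughly this (worth double-checking against the current docs):

    # Register psycopg2cffi under the name "psycopg2" so that code and
    # libraries doing `import psycopg2` pick up the cffi build under PyPy.
    from psycopg2cffi import compat
    compat.register()

    import psycopg2  # now resolves to psycopg2cffi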
Because it hasn't been blessed by the PSF. Plus it's always behind, so if you want to use the newest version of framework X or package Y, you're SOL.
Python libraries used to brag about being pure Python and backwards compatible, but during the push to get everyone on 3.x that went away, and I think it is a shame.
I keep wondering the same. It's a significant speed-up in most cases and equally easy to (apt) install
For public projects I default the shebang to `env python3`, with a comment on the next line that people can switch to if they have pypy. People seem to rarely have it installed, but they always have Python 3 (often already shipped with the OS, but otherwise manually installed). I don't get it. Just a popularity / brand awareness thing, I guess?
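Concretely, the top of those scripts looks something like this:

    #!/usr/bin/env python3
    # If you have PyPy installed, swap the line above for: #!/usr/bin/env pypy3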
I think generally people who care about performance don't tend to write their code in Python to begin with, so the culture of python is much less performance sensitive than is typical even among other interpreted languages like perl, php, ruby or javascript. The people who do need performance, but are still using python, tend to rely on native libraries doing significant numerical calculations, and many of these libraries are not compatible with PyPy. The escape hatch there is to offload more and more of the computation into the native runtime rather than to optimize the python performance.
Because all the heavy number-crunching code is already written in C or Rust or as CUDA kernels, so the actual time spent running Python code is minuscule. If it starts to matter, I would probably reach for Cython first. PyPy is an extremely impressive project, but using it adds a lot of complexity to what is usually a glue language. It is a bit like writing a JIT for Bash.
The advantage of core python is that you import stuff and 99.999999% of the time it works.
With PyPy not so much.
I've never experienced any problems that could be attributed to the speed of my Python runtime. I use Python a lot for internal scripting and devops work, but never in a production environment that scaled beyond a few hundred users. I suspect most Python usecases are like that, and CPython is just the safest option.
It's not easily available in uv. Even if I installed it outside uv, it always seems significantly out of date. I'm running code in environments where uv lets me control every Python install, so I don't benefit from using an older release for compatibility.
Yeah I'm curious about this myself. Seems to utterly destroy CPython in every one of those benchmarks.
Because it turns out that optimizing the performance of a programming language designed for use cases where runtime performance doesn't matter ... doesn't matter.
There's currently talk of adding gigawatts of data center capacity to the grid just for use cases where Python dominates development. While a lot of that will be compiled into optimized kernels on CPU or GPU, it only takes a little bit of 1000x-slower code to add up to a significant chunk of processing time during training or inference.
What percentage of the CPU cycles are actually spent running Python though? My impression is _very_ low in production LLM workloads. I think significantly less than 1%. There are almost certainly better places to spend the effort, and if it did matter, I think they would replace Python with something like C++ or Rust.
A lot of Python use cases don't care about CPU performance at all.
In most cases where you do care about CPU performance, you're using numpy or scikit-learn or pandas or pytorch or tensorflow or nltk or some other Python library that's more or less just a wrapper around fast C, C++ or Fortran code. The performance of the interpreter almost doesn't matter for these use cases.
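A toy illustration: the interpreter only dispatches a couple of calls here, and the ten-million-element loop runs inside NumPy's compiled code, so swapping interpreters barely moves the needle.

    import numpy as np

    a = np.random.rand(10_000_000)  # allocation and fill happen in C
    total = a.sum()                 # the reduction loop also runs in C
    print(total)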
Also, those native libraries are a hassle to get to work with PyPy in my experience. So if any part of your program uses those libraries, it's way easier to just use CPython.
There are cases where the Python interpreter's bad performance does matter and where PyPy is a practical choice, and PyPy is absolutely excellent in those cases. They just sadly aren't common and convenient enough for PyPy to be that popular. (Though it's still not exactly unpopular.)