Comment by modeless 5 days ago
What are the reasons why nobody uses pypy?
It doesn't play nice with a lot of popular Python libraries. In particular, many of them (NumPy, Pandas, TensorFlow, etc.) rely on CPython's C API, which can cause compatibility issues.
FWIW, PyPy has supported NumPy and Pandas since at least v5.9.
That said, of all the reasons stated here, this is the one that keeps me from using PyPy as my primary interpreter (lots of libraries are still missing).
Speaking only for myself, and in all sincerity: every year, there is some feature of the latest CPython version that makes a bigger difference to my work than faster execution would. This year I am looking forward to template strings, zstd, and deferred evaluation of annotations.
Keep in mind that the two scripts I used in my benchmark are written in pure Python, without any dependencies. This is the sweet spot for pypy. Once you start including dependencies with native code, the JIT becomes less effective. Nevertheless, the performance for pure Python code is out of this world, so I definitely intend to play more with it!
Because in the real world, for code where performance is needed, you run the profiler and either find that the time is spent on I/O, or that the time is spent inside native code.
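A minimal sketch of that workflow, with a hypothetical fetch_and_crunch() standing in for the real workload:

    import cProfile
    import pstats
    import urllib.request

    def fetch_and_crunch():
        # Hypothetical workload: an HTTP fetch (I/O wait) followed by work
        # that runs almost entirely in native code.
        data = urllib.request.urlopen("https://example.com").read()
        return sorted(data * 100)  # sorted() loops in C, not in the interpreter

    cProfile.run("fetch_and_crunch()", "profile.out")
    pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)

The profile for something like this is typically dominated by the socket read and the C-level sort rather than by Python bytecode, which is the point being made above.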
This might have been your experience, but mine has been very different. In my experience a typical python workload is 50% importing python libraries, 45% slow python wrapper logic and 5% fast native code. I spend a lot of time rewriting the python logic in C++, which makes it 100x faster, so the resulting performance approaches "10% fast native logic, 90% useless python imports".
Many simple scripts at my work that more or less just argparse and fire off an HTTP request spend half a minute importing random stuff because of false deps and uncommon codepaths. For some unit tests it's 45 seconds, substantially longer than the time taken to run the test logic.
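For what it's worth, `python -X importtime your_script.py` breaks that half minute down per module; a quick inline sketch of the same idea (the module names are just examples):

    import importlib
    import time

    # Time a few imports individually to find the offenders.
    for name in ("argparse", "json", "http.client"):
        t0 = time.perf_counter()
        importlib.import_module(name)
        print(f"{name}: {time.perf_counter() - t0:.3f}s")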
In dev cycles most code is short-running.
If imports are slow, you shouldn't be writing Python in the first place, because you are either on limited hardware or you are writing a very performance-sensitive app.
I do a bit of performance work and find most often that things are mixed: there's enough CPU between syscalls that the hardware isn't being fully utilized, but there's enough I/O that the CPUs aren't pegged either. It is rare that the profiler finds an obvious hotspot that yields an easy win; usually it shows that with heavy refactoring you can make 10% of your load several times faster, and then you'll need to do the same for the next 10%, and so on. That is the more typical real world for me, and in that world Python is really awful when compared to rewrite-it-in-Rust.
This "There are no hot spots, it's just a uniform glowing orange" situation is why Google picked C++ and then later Rust and to some extent why they picked Go too.
IRL you will have CPU-bottlenecked pure Python code too. But it's not enough to take on the unknown risk of switching to a less well-supported interpreter. Worst case, you just put in the effort to convert the hot parts to multiprocessing.
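A rough sketch of that worst case, with crunch() as a stand-in for whatever CPU-bound pure-Python function shows up hot:

    from multiprocessing import Pool

    def crunch(n):
        # Stand-in for a hot, CPU-bound, pure-Python function.
        return sum(i * i for i in range(n))

    if __name__ == "__main__":
        with Pool() as pool:  # one worker process per core by default
            results = pool.map(crunch, [10**6] * 8)  # sidesteps the GIL with processes
        print(results)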
Also, that engineer time you would spend optimizing for performance costs more than just throwing more hardware at it.
That's the thing with single-threaded CPU operations: you can't throw more hardware at it.
In this situation, "more hardware" would mean throwing a faster CPU at it.
It caps out quickly. If you have a newish Mac, you're already pretty much at the max.
We look periodically and pypy is usually unusable for us due to third-party library support. E.g. psycopg2, at least as of a couple years ago. Have not checked in a while.
pypy has a c-extension compatibility layer that allows running psycopg2 (via psycopg2cffi) and similar for numpy etc.
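If I remember the psycopg2cffi docs right, the shim is roughly this (worth double-checking against the current docs):

    # Register psycopg2cffi under the name "psycopg2" so that code and
    # libraries doing `import psycopg2` pick up the cffi build under PyPy.
    from psycopg2cffi import compat
    compat.register()

    import psycopg2  # now resolves to psycopg2cffi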
Because it hasn't been blessed by the PSF. Plus it's always behind, so if you want to use the newest version of framework X or package Y, you're SOL.
Python libraries used to brag about being pure Python and backwards compatible, but during the push to get everyone on 3.x that went away, and I think it is a shame.
I keep wondering the same. It's a significant speed-up in most cases and equally easy to (apt) install
For public projects I default the shebang to `env python3`, with a comment on the next line that people can switch to if they have pypy. People seem to rarely have it installed, but they always have Python 3 (often already shipped with the OS, but otherwise manually installed). I don't get it. Just a popularity / brand awareness thing, I guess?
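Concretely, the top of those scripts looks something like this:

    #!/usr/bin/env python3
    # If you have PyPy installed, swap the line above for: #!/usr/bin/env pypy3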
I think generally people who care about performance don't tend to write their code in Python to begin with, so the culture of python is much less performance sensitive than is typical even among other interpreted languages like perl, php, ruby or javascript. The people who do need performance, but are still using python, tend to rely on native libraries doing significant numerical calculations, and many of these libraries are not compatible with PyPy. The escape hatch there is to offload more and more of the computation into the native runtime rather than to optimize the python performance.
Because all the heavy number-crunching code is already written in C or Rust or as CUDA kernels, so the actual time spent running Python code is minuscule. If it starts to matter, I would probably reach for Cython first. PyPy is an extremely impressive project, but using it adds a lot of complexity to what is usually a glue language. It is a bit like writing a JIT for Bash.
The advantage of core python is that you import stuff and 99.999999% of the time it works.
With PyPy not so much.
I've never experienced any problems that could be attributed to the speed of my Python runtime. I use Python a lot for internal scripting and devops work, but never in a production environment that scaled beyond a few hundred users. I suspect most Python usecases are like that, and CPython is just the safest option.
It's not easily available in uv. Even if I installed it outside uv, it always seems significantly out of date. I'm running code in environments where uv lets me control every Python install, so I don't benefit from using an older release for compatibility.
Yeah I'm curious about this myself. Seems to utterly destroy CPython in every one of those benchmarks.
Because it turns out that optimizing the performance of a programming language designed for use cases where runtime performance doesn't matter ... doesn't matter.
There's currently talk of adding gigawatts of data center capacity to the grid just for use cases where Python dominates development. While a lot of that will be compiled into optimized kernels on CPU or GPU, it only takes a little bit of 1000x-slower code to add up to a significant chunk of processing time during training or inference.
What percentage of the CPU cycles are actually spent running Python though? My impression is _very_ low in production LLM workloads. I think significantly less than 1%. There are almost certainly better places to spend the effort, and if it did matter, I think they would replace Python with something like C++ or Rust.
A lot of Python use cases don't care about CPU performance at all.
In most cases where you do care about CPU performance, you're using numpy or scikit-learn or pandas or pytorch or tensorflow or nltk or some other Python library that's more or less just a wrapper around fast C, C++ or Fortran code. The performance of the interpreter almost doesn't matter for these use cases.
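A toy illustration: the interpreter only dispatches a couple of calls here, and the ten-million-element loop runs inside NumPy's compiled code, so swapping interpreters barely moves the needle.

    import numpy as np

    a = np.random.rand(10_000_000)  # allocation and fill happen in C
    total = a.sum()                 # the reduction loop also runs in C
    print(total)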
Also, those native libraries are a hassle to get to work with PyPy in my experience. So if any part of your program uses those libraries, it's way easier to just use CPython.
There are cases where the Python interpreter's bad performance does matter and where PyPy is a practical choice, and PyPy is absolutely excellent in those cases. They just sadly aren't common and convenient enough for PyPy to be that popular. (Though it's still not exactly unpopular.)