Comment by Rochus 2 days ago

In case someone is interested, here are some benchmark results comparing GraalPy and others with JDK8 using the Are-we-fast-yet benchmark suite: https://stefan-marr.de/downloads/tmp/awfy-bun.html

And here is a table representation of all benchmarks and the geomean and median overall results: http://software.rochus-keller.ch/awfy-bun-summary.ods

The implementation of the same benchmark suite runs around a factor of 2.4 (geomean) faster on JDK8 than on GraalPython EE 22.3 Hotspot, and 41 times faster than CPython 3.11. GraalPython is thus about 17 times faster than CPython, and about two times faster than PyPy. The Graal Enterprise Edition (EE) seems to be a factor of 1.31 faster than the Community Edition (CE).
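
For anyone wanting to reproduce the summary table: the overall numbers are aggregates of per-benchmark speedup ratios. A minimal sketch of computing a geomean and median (with made-up runtimes for illustration, not the actual results linked above):

```python
import statistics

# Hypothetical per-benchmark runtimes in ms -- illustrative numbers only,
# not the actual Are-we-fast-yet measurements.
cpython_ms = {"DeltaBlue": 4100.0, "Richards": 5200.0, "Json": 1800.0}
graalpy_ms = {"DeltaBlue": 240.0, "Richards": 310.0, "Json": 95.0}

# Per-benchmark speedup ratio: baseline runtime / candidate runtime.
ratios = [cpython_ms[b] / graalpy_ms[b] for b in cpython_ms]

# Geomean is the standard aggregate for ratios; median is robust to outliers.
geomean = statistics.geometric_mean(ratios)  # requires Python 3.8+
median = statistics.median(ratios)

print(f"geomean speedup: {geomean:.2f}x, median: {median:.2f}x")
```

The geomean is used rather than an arithmetic mean because speedups are ratios and multiply rather than add; an arithmetic mean would overweight the benchmarks with the largest speedups.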

masklinn 2 days ago

Your mileage may very much vary; much like pypy, this is very inconsistent and highly dependent on your workload (as well as your dependencies).

My limited experience was that on re-heavy workloads pypy is several times slower than cpython (~3x compared to 3.10) and graal is even worse (~6x compared to 3.11).
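
(A regex-heavy workload of that kind can be approximated with a microbenchmark along these lines; the pattern and input are invented for illustration, not the actual code in question. Running the identical script under cpython, pypy, and graalpy gives a quick point of comparison:)

```python
import re
import time

# Invented pattern and input for illustration only.
PATTERN = re.compile(r"(\w+)=(\d+);")
TEXT = "key=123;val=456;" * 10_000

def run(iterations: int = 100) -> float:
    """Time repeated regex scanning over TEXT and return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(iterations):
        # Sum the numeric captures to force the matches to be consumed.
        total = sum(int(m.group(2)) for m in PATTERN.finditer(TEXT))
    return time.perf_counter() - start

if __name__ == "__main__":
    print(f"{run():.3f}s")
```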

  • mike_hearn 2 days ago

    Which version was that with? GraalVM can JIT compile regular expressions these days, with the same compiler as everything else. They implemented TRegex on top of Truffle so regex can be inlined and optimized like regular code.

    Performance does indeed depend on workload. There's a page that compares GraalPy vs CPython and Jython on the Python Performance Suite which aims to be "real world":

    https://www.graalvm.org/latest/reference-manual/python/Perfo...

    There the speedup is smaller, but this is partly because a lot of real world Python workloads these days spend all their time inside C or the GPU. Having a better implementation is still a good idea though, because it means more stuff can be done by researchers who don't know C++ well or at all. The point at which you're forced to get dedicated hackers involved to optimize gets pushed backwards if you can rely on a good JIT.

    • masklinn 2 days ago

      > Which version was that with?

      24.1. 23 may or may not have been worse, I didn’t take specific notes aside from “too slow to be acceptable”

  • Rochus 2 days ago

    That is why we should always use a standardized, controlled benchmark suite with well-defined rules that ensure fair cross-language comparisons on a representative, well-balanced workload. By focusing on a core set of language features and abstractions, Are-we-fast-yet allows for a more controlled comparison of language implementation performance, isolating the effects of compiler and runtime optimizations.

    This is especially important for scripting languages like Python, where a large part of the features are implemented in C or other native languages and called via FFI. That's why, for example, the benchmark implements its own collections, because we want to know how fast the interpreter is. Otherwise, as you have noticed, the result is randomly influenced by how much compute a particular application can delegate to the FFI.
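    As a sketch of what "implements its own collections" means: a pure-Python growable vector in the spirit of the suite's collection classes (the real Are-we-fast-yet Vector differs in detail). Using it instead of the C-implemented built-in list keeps the measured work inside the interpreter:

```python
class Vector:
    """A minimal pure-Python growable vector (illustrative sketch only)."""

    def __init__(self, capacity: int = 8):
        self._storage = [None] * capacity
        self._size = 0

    def append(self, element):
        # Double the backing storage when full, like a typical dynamic array.
        if self._size == len(self._storage):
            self._storage = self._storage + [None] * len(self._storage)
        self._storage[self._size] = element
        self._size += 1

    def at(self, index: int):
        return self._storage[index]

    def size(self) -> int:
        return self._size

v = Vector()
for i in range(100):
    v.append(i * i)
print(v.size(), v.at(10))
```

    Because append, at, and the growth logic are all interpreted Python rather than calls into native code, the benchmark measures the interpreter/JIT itself instead of the quality of the runtime's C library.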

    • masklinn 2 days ago

      > That's why, for example, the benchmark implements its own collections, because we want to know how fast the interpreter is. Otherwise, as you have noticed, the result is randomly influenced by how much compute a particular application can delegate to the FFI.

      That sounds like the exact opposite of what I would want as a user of the language: the benchmark completely abstracts the actual behaviour of the runtime, claiming purported gains which don’t come anywhere near manifesting when trying to run actual software.

      I’m not implementing my own collections when `dict` suffices, and I don’t really care that a pure python version of `re` runs faster in graal than in cpython, because I’m not using that.

      So what happens is I see claims that graalpython runs 17 times faster than cpython, I try it out, it runs 6 times slower instead, and I can only conclude that graal is a worthless pile of lies and I should stop caring.

      • Rochus 2 days ago

        If you don't know exactly what you are measuring, the measurement is worthless. We must therefore isolate the measurement subject for the measurement, and avoid uncontrollable influences as far as possible. This is how engineering works, and every engineer should also be aware of measurement errors. In addition, repeatability and falsifiability of the experiment and conclusions are required for scientific claims. The mere statement "too slow to be acceptable" or "worthless pile of lies" is not enough for this.

        A measurement method does not have to represent every practical application of the measured subject. In the present case, the measurement allows a statement to be made about the performance of the interpreter (CPython) in relation to the JIT compiler (GraalPy). Whether the technology is right for your specific application is another question.