Comment by Rochus
That is why we should always use a standardized, controlled benchmark suite with well-defined rules that ensure fair cross-language comparisons on a representative, well-balanced workload. By focusing on a core set of language features and abstractions, Are-we-fast-yet allows a more controlled comparison of language-implementation performance, isolating the effects of compiler and runtime optimizations.
This is especially important for scripting languages like Python, where much of the functionality is implemented in C or other native languages and reached via FFI calls. That is why, for example, the benchmark implements its own collections: we want to know how fast the interpreter itself is. Otherwise, as you have noticed, the result is arbitrarily influenced by how much compute a particular application can delegate to native code via the FFI.
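To make that concrete, here is a toy sketch (not code from Are-we-fast-yet; the `PurePythonMap` class and the timing harness below are invented for illustration). The first measurement mostly exercises CPython's C-implemented `dict`, while the second keeps all the work in the bytecode interpreter, which is the part the benchmark tries to isolate:

```python
# Toy illustration only -- not from Are-we-fast-yet. It contrasts a workload
# dominated by CPython's C-implemented dict with the same workload on a
# hand-rolled, pure-Python map that keeps all the work in the interpreter.
import time


class PurePythonMap:
    """Minimal open-addressing hash map written entirely in Python."""

    def __init__(self, capacity=2048):
        self._keys = [None] * capacity
        self._values = [None] * capacity
        self._capacity = capacity

    def put(self, key, value):
        i = hash(key) % self._capacity
        # Linear probing; assumes the map never fills up (toy code).
        while self._keys[i] is not None and self._keys[i] != key:
            i = (i + 1) % self._capacity
        self._keys[i] = key
        self._values[i] = value

    def get(self, key):
        i = hash(key) % self._capacity
        while self._keys[i] is not None:
            if self._keys[i] == key:
                return self._values[i]
            i = (i + 1) % self._capacity
        return None


def bench(label, put, get, n=1000):
    start = time.perf_counter()
    for k in range(n):
        put(k, k * 2)
    checksum = sum(get(k) for k in range(n))
    print(f"{label}: {time.perf_counter() - start:.6f}s (checksum {checksum})")


if __name__ == "__main__":
    d = {}
    bench("builtin dict (C code in CPython)", d.__setitem__, d.get)
    m = PurePythonMap(4096)
    bench("pure-Python map (interpreter only)", m.put, m.get)
```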
> That is why, for example, the benchmark implements its own collections: we want to know how fast the interpreter itself is. Otherwise, as you have noticed, the result is arbitrarily influenced by how much compute a particular application can delegate to native code via the FFI.
That sounds like the exact opposite of what I would want as a user of the language: the benchmark abstracts away the actual behaviour of the runtime, claiming gains that come nowhere near manifesting when you try to run actual software.
I’m not implementing my own collections when `dict` suffices, and I don’t really care that a pure python version of `re` runs faster in graal than in cpython, because I’m not using that.
So what happens is this: I see claims that graalpython runs 17 times faster than cpython, I try it out, and it runs 6 times slower instead, and I can only conclude that graal is a worthless pile of lies and I should stop caring.