Comment by masklinn 2 days ago

> That's why, for example, the benchmark implements its own collections, because we want to know how fast the interpreter is. Otherwise, as you have noticed, the result is randomly influenced by how much compute a particular application can delegate to the FFI.

That sounds like the exact opposite of what I would want as a user of the language: the benchmark abstracts away the actual behaviour of the runtime, claiming gains which come nowhere near manifesting when you try to run actual software.

I’m not implementing my own collections when `dict` suffices, and I don’t really care that a pure-Python version of `re` runs faster on GraalPy than on CPython, because I’m not using that.
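
To make that concrete (a rough sketch; `PurePyMap` is a hypothetical stand-in for a benchmark's hand-rolled collection, not anything from the actual suite): under CPython each built-in `dict` insert is a single call into C, so a JIT that accelerates Python bytecode barely moves it, while a pure-Python map is bytecode all the way down, which is exactly what such benchmarks reward.

```python
import timeit

class PurePyMap:
    """Toy open-addressing map in pure Python (hypothetical, for illustration).
    Every probe is Python bytecode, which is exactly what a JIT can optimise."""

    def __init__(self, size=1024):
        self._keys = [None] * size
        self._vals = [None] * size
        self._size = size

    def __setitem__(self, key, val):
        i = hash(key) % self._size
        # linear probing; assumes the map never fills up
        while self._keys[i] is not None and self._keys[i] != key:
            i = (i + 1) % self._size
        self._keys[i] = key
        self._vals[i] = val

def bench(factory):
    def run():
        m = factory()
        for k in range(500):
            m[k] = k
    return min(timeit.repeat(run, number=200, repeat=5))

print("built-in dict  :", bench(dict))       # mostly C under CPython
print("pure-Python map:", bench(PurePyMap))  # mostly bytecode, so JIT-friendly
```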

So what happens is I see claims that GraalPy runs 17 times faster than CPython, I try it out, it runs 6 times slower instead, and I can only conclude that Graal is a worthless pile of lies and I should stop caring.

Comment by Rochus 2 days ago

If you don't know exactly what you are measuring, the measurement is worthless. We must therefore isolate the subject of the measurement and avoid uncontrollable influences as far as possible. This is how engineering works, and every engineer should be aware of measurement error. In addition, scientific claims require repeatability and falsifiability of the experiment and its conclusions. The mere statement "too slow to be acceptable" or "worthless pile of lies" is not enough for that.

A measurement method does not have to represent every practical application of the measured subject. In the present case, the measurement allows a statement to be made about the performance of the interpreter (CPython) in relation to the JIT compiler (GraalPy). Whether the technology is right for your specific application is another question.
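
As a sketch of what "isolating the subject" can mean in practice (a hypothetical harness, not the benchmark under discussion; `fib` stands in for any pure-Python kernel): the warmup loop matters because a JIT such as GraalPy only reaches steady-state speed after repeated executions, whereas CPython runs at the same speed from the first iteration.

```python
import time

def fib(n):
    # stand-in for any pure-Python workload under test
    return n if n < 2 else fib(n - 1) + fib(n - 2)

def measure(workload, warmup=20, runs=10):
    """Warm up first so a JIT (e.g. GraalPy) reaches steady state,
    then report the best of several timed runs."""
    for _ in range(warmup):
        workload()
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        workload()
        times.append(time.perf_counter() - start)
    return min(times)

print(f"best of 10 runs: {measure(lambda: fib(25)):.4f} s")
```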