Comment by pron

Comment by pron 7 hours ago

6 replies

I was surprised to see that Java was slower than C++, but the Java code is run with `-XX:+UseSerialGC`, which is the slowest GC, meant to be used only on very small systems, and to optimise for memory footprint more than performance. Also, there's no heap size, which means it's hard to know what exactly is being measured. Java allows trading off CPU for RAM and vice-versa. It would be meaningful if an appropriate GC were used (Parallel, for this batch job) and with different heap sizes. If the rules say the program should take less than 8GB of RAM, then it's best to configure the heap to 8GB (or a little lower). Also, System.gc() shouldn't be invoked.

Don't know if that would make a difference, but that's how I'd run it, because in Java, the heap/GC configuration is an important part of the program and how it's actually executed.

Of course, the most recent JDK version should be used (I guess the most recent compiler version for all languages).

rockwotj 6 hours ago

It’s so hard to actually benchmark languages because it so much depends on the dataset, I am pretty sure with simdjson and some tricks I could write C++ (or Rust) that could top the leaderboard (see some of the techniques from the billion row challenge!).

tbh for silly benchmarks like this it will ultimately be hard to beat a language that compiles to machine code, due to jit warmup etc.

It’s hard to due benchmarks right, for example are you testing IO performance? are OS caches flushed between language runs? What kind of disk is used etc? Performance does not exist in a vacuum of just the language or algorithm.

  • pron 4 hours ago

    > due to jit warmup

    I think this harness actually uses JMH, which measures after warmup.

KerrAvon 5 hours ago

Why are you surprised? Java always suffers from abstraction penalty for running on a VM. You should be surprised (and skeptical) if Java ever beats C++ on any benchmark.

  • pron 4 hours ago

    The only "abstraction penalty" of "running on a VM" (by which I think you mean using a JIT compiler), is the warmup time of waiting for the JIT.

    • andersmurphy 29 minutes ago

      Its a statement of our times that this is getting down voted. JIT is so underrated.

  • woooooo 4 hours ago

    For the most naive code, if you're calling "new" multiple times per row, maybe Java benefits from out of band GC while C++ calls destructors and free() inline as things go out of scope?

    Of course, if you're optimizing, you'll reuse buffers and objects in either language.