JVM statistics cause garbage collection pauses (2015)
(evanjones.ca) 68 points by tosh 10 hours ago
> The pauses occur even [..] if you call mlock
I wonder how this is even possible. The only scenario I can think of involves a page fault on the page table itself (i.e., the page is locked into memory, but a page fault occurs during virtual-to-physical address translation). Does anyone know the real reason?
Probably because mapped pages, even if they are locked into memory, are not allowed to stay dirty forever. Does this help? https://stackoverflow.com/a/11024388 (In contrast, if you mlocked the pages but never wrote to them, you probably would not encounter read pauses.)
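For anyone who wants to poke at this, here is a minimal sketch that times individual stores to a single file-backed mapped page and reports the slowest one. It is an illustration under assumptions, not a reproduction of the article's setup: the function name and parameters are made up for this example, it does not call mlock (the Python standard library has no wrapper for it), and on an idle machine the worst case will most likely be a few microseconds. Any stall would only be expected to show up while the disk backing the file is under heavy write load.

```python
import mmap
import os
import tempfile
import time


def max_mapped_write_latency(path, size=4096, iterations=1000):
    """Write repeatedly to one mapped page; return the slowest write in seconds."""
    with open(path, "wb") as f:
        f.truncate(size)  # the backing file must be at least as large as the mapping

    worst = 0.0
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), size) as m:
            for i in range(iterations):
                start = time.perf_counter()
                m[0:8] = i.to_bytes(8, "little")  # a plain store that dirties the page
                elapsed = time.perf_counter() - start
                worst = max(worst, elapsed)
    return worst


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        worst = max_mapped_write_latency(os.path.join(d, "counter.bin"))
        print(f"slowest mapped write: {worst * 1e6:.1f} us")
```

Pointing `path` at a file on a disk-backed filesystem versus a tmpfs mount while something else saturates the disk is one way to compare the two cases.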
There is no law that says /tmp must be on tmpfs, and historically this wasn't done, because tmpfs is limited in size to some fraction of the kernel's memory, while /tmp may be used to store much larger files.
For example, GNU sort can sort arbitrarily large input files, which is implemented by splitting the input into sorted chunks that are written to a temporary directory, /tmp by default. But this is based on the assumption that /tmp can store significantly larger files than fit in memory, otherwise the point is moot. So using tmpfs makes /tmp useless for this type of operation.
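The chunk-and-merge approach described above can be sketched roughly as follows. This is not GNU sort's actual implementation, just the general external merge sort idea; the function name, `chunk_size`, and `tmp_dir` are hypothetical names chosen for the example, and it assumes input lines contain no embedded newlines.

```python
import heapq
import os
import tempfile
from itertools import islice


def _read_lines(handle):
    """Yield lines from an open chunk file without their trailing newline."""
    for line in handle:
        yield line.rstrip("\n")


def external_sort(lines, chunk_size=100_000, tmp_dir=None):
    """Yield the input lines in sorted order, holding one chunk in memory at a time."""
    chunk_paths = []
    it = iter(lines)
    try:
        # Phase 1: sort fixed-size chunks in memory and spill each to a temp file.
        while True:
            chunk = sorted(islice(it, chunk_size))
            if not chunk:
                break
            fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".chunk")
            with os.fdopen(fd, "w", encoding="utf-8") as f:
                f.writelines(line + "\n" for line in chunk)
            chunk_paths.append(path)

        # Phase 2: k-way merge of the sorted chunk files.
        files = [open(p, encoding="utf-8") for p in chunk_paths]
        try:
            yield from heapq.merge(*(_read_lines(f) for f in files))
        finally:
            for f in files:
                f.close()
    finally:
        # Remove the temporary chunk files even if the consumer stops early.
        for p in chunk_paths:
            os.remove(p)
```

With `tmp_dir=None` the chunks land in the system default temporary directory, which is exactly where the tmpfs question bites: the spilled chunks together are as large as the input, so they only help if that directory can hold more than RAM can.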
In the end, it's a trade-off between performance and disk space. I also prefer to mount /tmp on tmpfs for performance reasons, but you should not assume that this is the case on all systems.
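If a program wants to check rather than assume, one rough way on Linux is to look up /tmp in /proc/mounts. The sketch below is Linux-specific, and the helper names are made up for this example. A result of `None` means /tmp is not a separate mount point, so it lives on whatever filesystem holds its parent directory.

```python
def filesystem_type(mount_point, mounts_text):
    """Return the filesystem type mounted exactly at mount_point, or None."""
    fs_type = None
    for line in mounts_text.splitlines():
        # Each /proc/mounts line: device mount_point fs_type options dump pass
        fields = line.split()
        if len(fields) >= 3 and fields[1] == mount_point:
            fs_type = fields[2]  # a later mount shadows an earlier one
    return fs_type


def tmp_is_tmpfs(mounts_path="/proc/mounts"):
    """Report whether /tmp is a tmpfs mount according to the mounts table."""
    with open(mounts_path) as f:
        return filesystem_type("/tmp", f.read()) == "tmpfs"
```

Keeping the parsing separate from the file read makes it easy to try against a pasted mounts table from another machine.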
> Why would I want it on tmpfs?
It's now there in several distros by default. Not that it answers your question.
A bit more context in the mailing list:
> It's a non-issue with a pure ram-based file system. Or tmpfs with no swap.
https://mail.openjdk.org/pipermail/hotspot-runtime-dev/2015-...
In 2015 there was no ZGC. Today ZGC (an optional garbage collector optimized for latency) is designed to keep GC pauses below a millisecond.
I would check your answer. These are pauses due to time spent writing to diagnostic outputs, not traditional collection pauses. This affects both jstat and writes of GC logs (i.e., GC log writes will block the app in just the same way).
These modern garbage collectors are not simply free though. I got bored last year and went on a deep dive with GC params for Minecraft. For my needs I ended up with: -XX:+UseParallelGC -XX:MaxGCPauseMillis=300 -Xmx2G -Xms768M
When flying around in spectator mode, you'd see 3 to 4 processes using 100%. Changing to more modern collectors just added more load to the system. ZGC was the worst, with 16+ processes all using 100% cpu. With the ParallelGC, yes you'll get the occasional pause but at least my laptop is not burning hot fire.
Sadly in many cases no; it's not magic. This nirvana is restricted to cases where there is CPU bandwidth available (e.g. some cores idle) and plenty of free RAM. When either CPU or RAM is less plentiful... hello pauses my old friend.
This is why memory-bound services generally use languages without mandatory GC. Tail latency is a killer.
Rust's memory management does have some issues in practice (large synchronous drops) but they're relatively minor and easily addressed compared to mandatory GC.
For proper statistics use Visual VM or Flight Recorder, if using an OpenJDK derived JVM implementation.
Also note that not all JVMs are made alike, and there are plenty to choose from.