Comment by LadyCailin 4 days ago
We noticed this in our logs once! We service a huge amount of traffic, and as part of that, we log what is effectively an enum. We did a summarization of this field once, and noticed that there were a couple of “impossible” values being logged. One of my coworkers realized that the string that actually got logged was exactly one bit off from a valid string, and we came to the conclusion that we were probably seeing cosmic rays in action, either in our service, or in the logging service.
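For anyone who wants to run the same "exactly one bit off" check on their own logs, here is a minimal sketch. It is not the code from the story above: the enum values and the function names are made up for illustration, and it assumes the logged strings are compared as UTF-8 bytes.

```python
from typing import Iterable, List, Optional


def bit_distance(a: bytes, b: bytes) -> Optional[int]:
    """Count differing bits between two equal-length byte strings.

    Returns None when the lengths differ, since a single bit flip
    cannot change the length of a string.
    """
    if len(a) != len(b):
        return None
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def single_bit_neighbors(logged: str, valid_values: Iterable[str]) -> List[str]:
    """Return every valid value that is exactly one bit away from `logged`."""
    raw = logged.encode("utf-8")
    return [
        value
        for value in valid_values
        if bit_distance(raw, value.encode("utf-8")) == 1
    ]


# Hypothetical enum values, for illustration only.
VALID = ["PENDING", "ACTIVE", "CLOSED"]

# "ACTIVD" differs from "ACTIVE" only in the lowest bit of the last byte
# ('E' is 0x45, 'D' is 0x44).
print(single_bit_neighbors("ACTIVD", VALID))
print(single_bit_neighbors("BOGUS!", VALID))
```

An "impossible" value that has exactly one single-bit neighbor is a decent hint that a flipped bit is involved, although it cannot say where the flip happened (RAM, network, disk, or the logging pipeline).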
I had a similar story with my NAS, where one btrfs path got corrupted. I dropped into the btrfs IRC channel, and one of the devs noticed that the inconsistent value was one bit flip away from the right one. Incredibly, they were able to give me the right commands to fix it! Got to give credit where it is due: btrfs took the safe path and refused to touch the affected directory until it was fixed, and it has enough tooling to repair this kind of damage.
I wouldn't blame cosmic rays here, though; dying RAM is the more likely culprit. The NAS now runs ECC memory.