Comment by addaon
> One thing to remember is that unless you explicitly made a device to operate in space and even there beyond LEO, you quite probably might have seen no requirement for ECC memory for radiation reasons.
This does not match my experience (although, admittedly, I've been in the field only a couple of decades -- the hardware under discussion predates that). The problem with SEU-induced bit flips is not that errors happen, but that errors with unbounded behavior happen -- consider a bit flip in the program counter, especially in an architecture with variable-sized instructions.
This drives requirements around error detection, not correction. The three main tools here are lockstep processor cores, parity on small memories, and SECDED on large memories. SECDED ECC matters both because it can catch double errors that happen close together in time, and because memory scrubbing with single-error correction lets multiple errors spaced out in time be treated as separate events.
At the system level, the key insight is that detectable failures of a single ECU have to be handled anyway, because of non-transient statistical failures -- connector failures, tin whiskers, etc. The goal, then, is to convert substantially all failures to detectable failures, and then have defined failure behavior (often fail-silent). This leads to dual-dual redundancy architectures and similar, instead of triplex: each channel consists of two units that cross-check each other, and downstream units can assume that commands received from either channel are either correct or absent.
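To make the SECDED-plus-scrubbing and fail-silent points concrete, here is a toy sketch in Python. Everything in it is illustrative and assumed rather than taken from any real avionics design: it uses an extended Hamming(8,4) code on 4-bit words (real memories use much wider codes), and the names `encode`, `decode`, `scrub`, `channel_output`, and `downstream_select` are made up for this example.

```python
def encode(nibble):
    """Encode 4 data bits as an 8-bit extended Hamming (SECDED) codeword."""
    bits = [0] * 8  # index 0 = overall parity, indices 1..7 = Hamming(7,4)
    bits[3], bits[5], bits[6], bits[7] = [(nibble >> i) & 1 for i in range(4)]
    bits[1] = bits[3] ^ bits[5] ^ bits[7]
    bits[2] = bits[3] ^ bits[6] ^ bits[7]
    bits[4] = bits[5] ^ bits[6] ^ bits[7]
    bits[0] = sum(bits[1:]) & 1  # makes the parity of the whole word even
    return sum(b << i for i, b in enumerate(bits))


def decode(word):
    """Return (status, nibble); nibble is None when the word is uncorrectable."""
    bits = [(word >> i) & 1 for i in range(8)]
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos  # XOR of set positions is 0 for a valid codeword
    overall_odd = sum(bits) & 1
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        # Odd number of flips: treat as a single error at index `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        bits[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: double error, detect only.
        return "uncorrectable", None
    nibble = bits[3] | (bits[5] << 1) | (bits[6] << 2) | (bits[7] << 3)
    return status, nibble


def scrub(memory):
    """Rewrite correctable words in place; return indices of uncorrectable ones."""
    bad = []
    for i, word in enumerate(memory):
        status, nibble = decode(word)
        if status == "corrected":
            memory[i] = encode(nibble)
        elif status == "uncorrectable":
            bad.append(i)
    return bad


def channel_output(unit_a, unit_b):
    """One dual channel: emit a command only if both units agree, else stay silent."""
    return unit_a if unit_a == unit_b else None


def downstream_select(channel_1, channel_2):
    """Use the first non-silent channel; None means both channels went silent."""
    return channel_1 if channel_1 is not None else channel_2


if __name__ == "__main__":
    memory = [encode(0b0101)]
    memory[0] ^= 1 << 3        # first upset
    print(scrub(memory))       # scrubbing repairs it before the next upset
    memory[0] ^= 1 << 6        # second upset, later in time
    print(decode(memory[0]))   # still a single, correctable error
    memory[0] ^= 1 << 2        # a further upset with no scrub in between
    print(decode(memory[0]))   # now a double error: detected, not corrected
    print(downstream_select(channel_output(7, 6), channel_output(7, 7)))
```

The point of the sketch is the shape of the outcomes: a word is either good, repaired, or flagged as bad, and a channel either agrees with itself or says nothing. Nothing downstream ever has to reason about a silently wrong value, which is what makes the dual-dual arrangement workable.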
The incident report on the 2008 case specifically mentions SEU in memory. A certain level of EDAC is applied across the whole unit, but one case that was not covered was a failure of a memory module that produces a bit flip while remaining non-catastrophic in terms of operation.
Another under-appreciated point is that the devices in question used to be rebooted fairly often, which triggered self-test routines in addition to the run-time tests. That made no difference in the A330 case in 2008, but it did play a part in risk assessments missing certain things on the 787 some years later (and on the newer A380/A350 recently).