Comment by themerone
Comment by themerone 5 days ago
Gracefully handling hardware faults is a software problem. The Air France Flight 447 crash was the result of bad software and bad hardware.
Comment by themerone 5 days ago
Gracefully handling hardware faults is a software problem. The Air France Flight 447 crash was the result of bad software and bad hardware.
True. I would say, however, that every "concept" of airliner flight deck has its own gimmicks that can kill. The Airbus "dual input" is such a gimmick. Even though there was, for example, an AF accident with a 777 where there was hardware linkage between yokes and the two pilots were fighting... each other. Physically.
The official report doesn't identify the lack of sidestick linkage as a factor in the accident. Neither of the two pilots who were at the controls had any idea what was happening. Both pulled back on their sticks repeatedly right up to the moment of impact. The captain, who eventually realized (too late) that the plane was stalled, was standing behind them, and so would not have benefited from linked sticks.
There's a detailed breakdown here: https://admiralcloudberg.medium.com/the-long-way-down-the-cr...
There was more than one case where pilots would accidentally fight and break the linkage, or one would overpower the other.
One glider instructor talked about taking a stick with him in case of panicking student, so they could hit them hard enough so they would stop holding the controls.
Although the pitot tubes on AF447 were due to be replaced with a type more resistant to icing, nonetheless there's no such thing as a 100% reliable pitot tube and there were procedures to follow in the event of unreliable airspeed indication. Had they been followed the accident would not have happened. Instead the co-pilot held back on his stick until the aircraft fell out of the sky.
I don't believe there was any issue identified with the software of the plane.
Of topic for the thread, but on for the comment: I was working in an automotive project 3 years ago. It was all about safety, and one hypothesis was the processor could get overloaded. I was astonished no one in a grouo of 20 “senior sw architecs” had any idea about the concept of load shedding. The proposed solution was “in that case, reboot”.
Mind you whatever came out of that project is rolling on the street today.
Fail safe/fail soft
I still design this into many of the things I work on, especially if I’m working close to the metal on controller systems. At some point it becomes ridiculous / impossible but I’m often thinking about how a system would handle memory corruption, bit flips, invalid sensor data, etc. These days, somebody should design a triple redundant microcontroller that runs quorum on the gpio at the hardware level. It could be a 0.30 part instead of 0.10 one, but I would specify it just about everywhere. Add $3 to BOM cost to categorically eliminate an entire class of failure would be ramrodded by legal into just about every medical device, PLC, critical automotive system, etc one would think. Seems like a good gambit for a riscV startup, but what do I know.
Ok so, turns out there are a lot of MCUs like this, including a riscV triple core lockstep with ECC lol. No super cheap ones, but microchip makes the AVR-SD which leverages a pair of their AVR8 cores in lockstep with ECC flash and RAM. It’s ~$1, so I think I’ll pick that as my next toy project to play with. Turns out, Simpsons already did it.
Crashes caused by pilots failing to execute proper stall recovery procedures are surprisingly common, and similar accidents have happened before in aircraft with traditional control schemes, so I’m skeptical that there are any hardware changes that would have made much difference. The official report doesn’t identify the hardware or software as significant factors.
The moment to avoid the accident was probably the very first moment when Bonin entered a steep climb when the plane was already at 35,000 feet, only 2000 feet below the maximum altitude for its configuration. This was already a sufficiently insane thing to do that the other less senior pilot should have taken control, had CRM been functioning effectively. What actually happened is that both of the pilots in the cockpit at the start of the incident failed to identify that the plane was stalled despite the fact that (i) several stall warnings had sounded and (ii) the plane had climbed above its maximum altitude (where it would inevitably either stall or overspeed) and was now descending. It’s never very satisfying to blame pilots, but this was a monumental fuck up.
If the pilots genuinely disagree about control inputs there is not much that hardware or software can do to help. Even on aircraft with traditional mechanically linked control columns like the 737, the linkage will break if enough pressure is applied in opposite directions by each pilot (a protection against jamming).