Comment by mlyle

Comment by mlyle a year ago

You're a bit pessimistic, but beyond that I feel like you're missing the point a bit.

The purpose of a RTOS on big hardware is to provide bounded latency guarantees to many things with complex interactions, while keeping high system throughput (but not as good as a non-RTOS).

A small microcontroller can typically only service one interrupt in a guaranteed fast fashion. If you don't use interrupt priorities, it's a mess; and if you do, you start adding up latencies so that the lowest priority interrupt can end up waiting indefinitely.

So, we tend to move to bigger microcontrollers (or small microprocessors) and run RTOS on them for timing critical stuff. You can get latencies of several microseconds with hundreds of nanoseconds of jitter fairly easily.

But bigger RTOS are kind of annoying; you don't have the option to run all the world's software out there as lower priority tasks and their POSIX layers tend to be kind of sharp and inconvenient. With preempt-rt, you can have all the normal linux userland around, and if you don't have any bad performing drivers, you can do nearly as well as a "real" RTOS. So, e.g., I've run a 1.6KHz flight control loop for a large hexrotor on a Raspberry Pi 3 plus a machine vision stack based on python+opencv.

Note that wherever we are, we can still choose to do stuff in high priority interrupt handlers, with the knowledge that it makes latency worse for everything else. Sometimes this is worth it. On modern x86 it's about 300-600 cycles to get into a high priority interrupt handler if the processor isn't in a power saving state-- this might be about 100-200ns. It's also not mutually exclusive with using things like PIO-- on i.mx8 I've used their rather fancy DMA controller which is basically a Turing complete processor to do fancy things in the background while RT stuff of various priority runs on the processor itself.

kragen a year ago

thank you very much! mostly that is in keeping with my understanding, but the 100–200ns number is pretty shocking to me

Reply View 6 replies

mlyle a year ago

That's a best case number, based on warm power management, an operating system that isn't disabling interrupts, and the interrupt handler being warm in L2/L3 cache.
Note that things like PCIe MSI can add a couple hundred nanoseconds themselves if this is how the interrupt is arriving. If you need to load the interrupt handler out of SDRAM, add a couple hundred nanoseconds more, potentially.
And if you are using power management and let the system get into "colder" states, add tens of microseconds.

Reply View | 5 replies
- kragen a year ago
  
  hmm, i think what matters for hard-real-time performance is the worst-case number though, the wcet, not the best or average case number. not the worst-case number for some other system that is using power management, of course, but the worst-case number for the actual system that you're using. it sounds like you're saying it's hard to guarantee a number below a microsecond, but that a microsecond is still within reach?
  osamagirl69 (⸘‽) seems to be saying in https://news.ycombinator.com/item?id=41596304 that they couldn't get better than 10μs, which is an order of magnitude worse
  
  Reply View | 4 replies
  
  mlyle a year ago
  
  But you make the choices that affect these numbers. You choose whether you use power management; you choose whether you have higher priority interrupts, etc.
  > that they couldn't get better than 10μs,
  There are multiple things discussed here. In this subthread, we're talking about what happens on amd64 with no real operating system, a high priority interrupt, power management disabled and interrupts left enabled. You can design to consistently get 100ns with these constraints. You can also pay a few hundred nanoseconds more of taxes with slightly different constraints. This is the "apples and apples" comparison with an AVR microcontroller handling an interrupt.
  Whereas with rt-preempt, we're generally talking about the interrupt firing, a task getting queued, and then run, in a contended environment. If you do not have poorly behaving drivers enabled, the latency can be a few microseconds and the jitter can be a microsecond or a bit less.
  That is, we were talking about interrupt latency (absolute time) under various assumptions; osamagirl69 was talking about task jitter (variance in time) under different assumptions.
  You can, of course, combine these techniques; you can do stuff in top-half interrupt handlers in Linux, and if you keep the system "warm" you can service those quite fast. But you lose abstraction benefits and you make everything else on the system more latent.
  
  Reply View | 3 replies