Comment by femto

Comment by femto a year ago

14 replies

If you want to see the effect of the real-time kernel, build and run the cyclictest utility from the Linux Foundation.

https://wiki.linuxfoundation.org/realtime/documentation/howt...

It measures and displays the interrupt latency for each CPU core. Without the real-time patch, worst-case latency can reach double-digit milliseconds. With the real-time patch, the worst case drops to single-digit microseconds. (To get consistently low latency you will also have to turn off any power-saving states, as a transition between sleep states can hog the CPU, despite the RT kernel.) Cyclictest is an important tool if you're doing real-time with Linux.
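
For a feel of what such a measurement involves, here is a rough Python sketch of the same idea: sleep until a deadline, then record how late the wake-up was. It is not cyclictest, and the function names `measure_wakeup_latency` and `summarize` are made up for this illustration. Interpreter overhead will inflate the numbers, so treat them as a toy comparison only.

```python
import time


def measure_wakeup_latency(interval_us=1000, loops=200):
    """Return how late each timed sleep woke up, in microseconds."""
    interval_ns = interval_us * 1000
    samples = []
    deadline = time.monotonic_ns() + interval_ns
    for _ in range(loops):
        remaining_ns = deadline - time.monotonic_ns()
        if remaining_ns > 0:
            time.sleep(remaining_ns / 1e9)
        now = time.monotonic_ns()
        # Latency is the gap between when we asked to wake and when we did.
        samples.append((now - deadline) / 1000)
        # Schedule the next deadline relative to the actual wake-up time.
        deadline = now + interval_ns
    return samples


def summarize(samples):
    """Collapse the samples into min/avg/max, the figures one usually compares."""
    return {
        "min": min(samples),
        "avg": sum(samples) / len(samples),
        "max": max(samples),
    }


if __name__ == "__main__":
    stats = summarize(measure_wakeup_latency())
    print("wake-up latency (us): "
          f"min={stats['min']:.1f} avg={stats['avg']:.1f} max={stats['max']:.1f}")
```

The number to watch is `max`, not `avg`: run it while loading the machine and compare the worst case with and without the RT kernel.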

As an example, if you're doing processing for software defined radio, it's the difference between the system occasionally having "blips" and the system having rock solid performance, doing what it is supposed to every time. With the real-time kernel in place, I find I can do acid-test things, like running GNOME and LibreOffice on the same laptop as an SDR, and the SDR doesn't skip a beat. Without the real-time kernel it would be dropping packets all over the place.

aero-glide2 a year ago

Interestingly, whenever I touch my touchpad, the worst-case latency shoots up 20x, even with the RT patch. What could be causing this? And this is always on core 5.

  • femto a year ago

    Perhaps the code associated with the touchpad has a higher priority than the one you ran cyclictest with (80?). Does it still happen if you boost the priority of cyclictest to the highest possible, using the option:

    --priority=99

    Apply priority 99 with care to your own code. A tight endless loop with priority 99 will override pretty well everything else, so about the only way to escape will be to turn your computer off. Been there, done that :-)

    • snvzz a year ago

      The most important thing is to set the policy, described in sched(7), rather than the priority.

      Note that if you don't set the priority, the default policy is other (SCHED_OTHER), which is the standard one most processes get unless they request otherwise.

      If you set the priority without specifying a policy, the policy becomes fifo (SCHED_FIFO), a real-time policy that gives the process the CPU immediately and does not preempt it until it releases the CPU, unless a higher-priority real-time task becomes runnable.

      This implicit change in policy is why you see such brutal effect from setting priority.
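
To make the policy/priority distinction concrete, here is a small Python sketch for Linux using the standard `os` scheduling calls. The helper names `current_scheduling` and `request_fifo` are invented for this example, and switching to SCHED_FIFO normally needs root or CAP_SYS_NICE, so the sketch just reports failure if permission is denied.

```python
import os

# Human-readable names for the Linux scheduling policies exposed by Python.
POLICY_NAMES = {
    os.SCHED_OTHER: "SCHED_OTHER",
    os.SCHED_FIFO: "SCHED_FIFO",
    os.SCHED_RR: "SCHED_RR",
    os.SCHED_BATCH: "SCHED_BATCH",
    os.SCHED_IDLE: "SCHED_IDLE",
}


def current_scheduling(pid=0):
    """Return (policy name, static priority) for a process; pid 0 means this one."""
    policy = os.sched_getscheduler(pid)
    priority = os.sched_getparam(pid).sched_priority
    return POLICY_NAMES.get(policy, f"unknown({policy})"), priority


def request_fifo(priority, pid=0):
    """Try to switch a process to SCHED_FIFO; return False if not permitted."""
    lowest = os.sched_get_priority_min(os.SCHED_FIFO)
    highest = os.sched_get_priority_max(os.SCHED_FIFO)
    if not lowest <= priority <= highest:
        raise ValueError(f"priority must be in {lowest}..{highest}")
    try:
        # Policy and priority are set together; priority alone means little.
        os.sched_setscheduler(pid, os.SCHED_FIFO, os.sched_param(priority))
    except PermissionError:
        return False
    return True


if __name__ == "__main__":
    print("before:", current_scheduling())
    if request_fifo(80):
        print("after: ", current_scheduling())
    else:
        print("not permitted to use SCHED_FIFO (try root or CAP_SYS_NICE)")
```

An ordinary process reports SCHED_OTHER with priority 0. The same caution applies as above: a SCHED_FIFO task that never blocks can starve nearly everything else on that CPU.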

  • robocat a year ago

    Perhaps an SMM ring -2 touchpad driver?

    If you're developing anything on x86 that needs realtime - how do you disable SMM drivers causing unexpected latency?

    • jabl a year ago

      Buy HW that can be flashed with coreboot?

      And while it won't (completely) remove SMM, https://github.com/corna/me_cleaner might get rid of some stuff. I think that's more about getting rid of spyware and ring -1 security bugs than improving real-time behavior though.

  • angus-g a year ago

    Maybe a PS/2 touchpad that is triggering (a bunch of) interrupts? Not sure how hardware interrupts work with RT!

    • jabl a year ago

      One of the features of PREEMPT_RT is that it converts interrupt handlers to running in their own threads (with some exceptions, I believe), instead of being tacked on top of whatever thread context was active at the time like with the softirq approach the "normal" kernel uses. This allows the scheduler to better decide what should run (e.g. your RT process rather than serving interrupts for downloading cat pictures).
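
If you want to see those interrupt threads on your own machine, the sketch below walks /proc looking for kernel threads named `irq/<number>-<name>`, which is how threaded interrupt handlers are named, and reports each one's real-time priority. The function names are made up for this example, and it assumes a Linux /proc layout. On a kernel without threaded IRQs the list may be short or empty.

```python
import os
import re

# Threaded interrupt handlers show up as kernel threads named "irq/<n>-<name>".
IRQ_THREAD = re.compile(r"^irq/(\d+)-(.+)$")


def parse_irq_thread(comm):
    """Return (irq number, handler name) for an IRQ thread name, else None."""
    match = IRQ_THREAD.match(comm.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def list_irq_threads(proc="/proc"):
    """Return sorted (pid, irq, name, priority) tuples for visible IRQ threads."""
    found = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "comm")) as handle:
                parsed = parse_irq_thread(handle.read())
        except OSError:
            continue  # the task exited while we were scanning
        if parsed is None:
            continue
        pid = int(entry)
        try:
            priority = os.sched_getparam(pid).sched_priority
        except OSError:
            priority = None  # gone, or we are not allowed to look
        found.append((pid, parsed[0], parsed[1], priority))
    return sorted(found)


if __name__ == "__main__":
    for pid, irq, name, priority in list_irq_threads():
        print(f"pid={pid} irq={irq} handler={name} rtprio={priority}")
```

Comparing those priorities with the one given to your own RT task shows which interrupt threads are able to preempt it.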

  • monero-xmr a year ago

    Touchpad support is very poor in Linux. I use System76 and the touchpad is always a roll of the dice with every kernel upgrade, despite it being a "good" distro / vendor.

dijit a year ago

Quiet reminder that "real-time" is perhaps best thought of as "consistent-time".

The problem space is such that it doesn't necessarily mean "faster" or lower latency in any way, just that where there is latency, it's consistent.

  • amiga386 a year ago

    I always viewed it as "the computer needs to control things that are happening in real time and won't wait for it if it's late".

  • PhilipRoman a year ago

    Indeed, some of my colleagues worked on a medical device which must be able to reset itself in 10 seconds, in case something goes wrong. 10 seconds is plenty of time on average; the real problem is eliminating the remaining 0.01% of cases.

  • froh a year ago

    Consistent as in reliably bounded, that is.
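
The "bounded, not fast" point can be shown with a few lines of Python. The latency figures below are invented purely for illustration, not measurements: one system is quick on average but spikes in 0.01% of cases, the other is slower but steady. The helper names are likewise made up for this sketch.

```python
def mean(samples):
    """Average latency: the figure that hides rare spikes."""
    return sum(samples) / len(samples)


def miss_rate(samples, deadline):
    """Fraction of samples that arrived later than the deadline."""
    return sum(1 for s in samples if s > deadline) / len(samples)


def meets_deadline(samples, deadline):
    """Real-time view: every single sample must be within the bound."""
    return max(samples) <= deadline


# Hypothetical latencies in microseconds (made-up numbers for illustration).
fast_but_spiky = [5] * 9999 + [20000]   # one spike in 10,000 wake-ups
slow_but_steady = [50] * 10000          # never quick, never late
DEADLINE_US = 100

if __name__ == "__main__":
    for label, samples in (("fast but spiky", fast_but_spiky),
                           ("slow but steady", slow_but_steady)):
        print(f"{label}: mean={mean(samples):.1f} us, "
              f"worst={max(samples)} us, "
              f"miss rate={miss_rate(samples, DEADLINE_US):.2%}, "
              f"meets deadline={meets_deadline(samples, DEADLINE_US)}")
```

The spiky system wins on the average and still fails the deadline check; the steady one is several times slower on average and passes. That is the sense in which real-time means reliably bounded.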