Comment by rep_lodsb 6 days ago

It's possible that actually reading the register takes (significantly) more time than an empty countdown loop. A somewhat extreme example of that would be on x86, where accessing legacy I/O ports for e.g. the timer goes through a much lower-clocked emulated ISA bus.

However, a more likely explanation is the use of "volatile" (which only appears in the working version of the code). Without it, the compiler might even have completely removed the loop?

deng 6 days ago

> However, a more likely explanation is the use of "volatile" (which only appears in the working version of the code). Without it, the compiler might even have completely removed the loop?

No, because the loop calls cpu_relax(), which is a compiler barrier. It cannot be optimized away.

And yes, reading via the memory bus is much, much slower than a barrier. It's absolutely likely that reading 4 times from main memory on such an old embedded system takes several hundred cycles.

  • Karliss 6 days ago

    From what I understand the timer registers should be on APB(1) bus which operates at fixed 26MHz clock. That should be much closer to the scale of fast timer clocks compared to cpu_relax() and main CPU clock running somewhere in the range of 0.5-1GHz and potentially doing some dynamic frequency scaling for power saving purpose.

    The silliest part of this mess is that the 26 MHz clock for the APB1 bus is derived from the same source as the 13 MHz, 6.5 MHz, 3.25 MHz, and 1 MHz clocks usable by the fast timers.
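    The relationship is easy to check: each of those timer rates is an integer division of the same 26 MHz source. The divider values below are my own assumption chosen to reproduce the listed frequencies, not taken from the SoC's clock tree documentation:

    ```c
    #include <stdio.h>

    int main(void)
    {
        const double src_mhz = 26.0;            /* shared clock source */
        const int dividers[] = { 2, 4, 8, 26 }; /* hypothetical divider chain */

        /* 26/2 = 13, 26/4 = 6.5, 26/8 = 3.25, 26/26 = 1 */
        for (unsigned i = 0; i < sizeof dividers / sizeof dividers[0]; i++)
            printf("%.2f MHz\n", src_mhz / dividers[i]);
        return 0;
    }
    ```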

  • rep_lodsb 6 days ago

    You're right, didn't account for that. Though even when declared volatile, the counter variable would be on the stack, and thus already in the CPU cache (at least 32K according to the datasheet)?

    Looking at the assembly code for both versions of this delay loop might clear it up.

    • deng 6 days ago

      The only thing volatile does is ensure that the value is read from memory each time (which implicitly also forbids optimizations). Whether that memory is in a CPU cache is purely a hardware issue and outside the C specification. If you read something like a hardware register, you yourself need to make sure in some way that a hardware cache will not give you old values (by mapping it into a non-cached memory area, or by forcing a cache update). If you for-loop over something that acts as a compiler barrier, all that 'volatile' on the counter variable will do is potentially make the for-loop slower.

      There are really very few reasons to ever use 'volatile'. In fact, the Linux kernel even has its own documentation on why you should usually not use it:

      https://www.kernel.org/doc/html/latest/process/volatile-cons...
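      To make the comparison concrete, here is a sketch of the two loop styles under discussion. Both survive optimization, but for different reasons: one because the counter is volatile (every decrement becomes a real load and store), the other because of the compiler barrier alone. The function names are made up for illustration:

      ```c
      #include <stdio.h>

      /* Empty asm with a "memory" clobber: a pure compiler barrier. */
      #define barrier() __asm__ __volatile__("" ::: "memory")

      /* Kept because the counter is volatile: every i-- is a real load
       * and store to its stack slot, which is what can make this loop
       * slower than necessary. */
      static void delay_volatile(unsigned long n)
      {
          volatile unsigned long i = n;
          while (i--)
              ;
      }

      /* Kept because of the barrier alone; the counter itself can stay
       * in a register, so each iteration is cheap. */
      static void delay_barrier(unsigned long n)
      {
          unsigned long i = n;
          while (i--)
              barrier();
      }

      int main(void)
      {
          delay_volatile(100000);
          delay_barrier(100000);
          puts("ok");
          return 0;
      }
      ```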

      • sim7c00 6 days ago

        Doesn't volatile also ensure the compiler won't change the address used for the read (as it might otherwise optimise data layout)? (So you can be sure it won't read from the wrong place when using MMIO etc.?)

        • deng 6 days ago

          "volatile", according to the standard, simply is: "An object that has volatile-qualified type may be modified in ways unknown to the implementation or have other unknown side effects. Therefore any expression referring to such an object shall be evaluated strictly according to the rules of the abstract machine."

          Or, put more simply: don't assume anything you think you know about this object; just do as you're told.

          And yes, that for instance prohibits putting a value from a memory address into a register for further use, which would be a simple case of data optimization. Instead, a fresh retrieval from memory must be done on each access.

          However, whether your system has caching or an MMU is outside of the spec. The compiler does not care. If you tell the compiler to give you the byte at address 0x1000, it will do so. 'volatile' just forbids the compiler from deducing the value from already available knowledge. If a hardware cache or MMU messes with that, that's your problem, not the compiler's.
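          As a sketch of that "fresh retrieval on each access" rule: below, the register is simulated by an ordinary variable so the example can run anywhere. On real hardware the address would come from the SoC's memory map (and would need to be mapped uncached); TIMER_REG is a made-up name for illustration:

          ```c
          #include <stdint.h>
          #include <stdio.h>

          /* Simulated device register backed by ordinary memory. */
          static uint32_t fake_reg_storage;

          /* Accessing it through a volatile-qualified pointer means every
           * read and write must actually be performed at that address. */
          #define TIMER_REG (*(volatile uint32_t *)&fake_reg_storage)

          int main(void)
          {
              TIMER_REG = 42;          /* volatile store: must be emitted */
              uint32_t a = TIMER_REG;  /* volatile load */
              uint32_t b = TIMER_REG;  /* a second, separate load: the
                                          compiler may not reuse `a` here */
              printf("%u %u\n", a, b);
              return 0;
          }
          ```

          Note that on real MMIO this only constrains the compiler; the hardware cache/MMU configuration still has to be handled separately, as described above.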