amenghra a day ago

IMHO, if something isn’t part of the contract, it should be randomized. Eg if iteration order of maps isn’t guaranteed in your language, then your language should go out of its way to randomize it. Otherwise, you end up with brittle code: code that works fine until it doesn’t.
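A minimal Python sketch of the idea (the `shuffled_items` helper is made up, not any language's built-in): iterating in a deliberately shuffled order makes accidental dependence on ordering fail fast, while the set of items stays stable.

```python
import random

def shuffled_items(d):
    """Yield a dict's items in a deliberately randomized order,
    mimicking languages (e.g. Go) that randomize map iteration so
    callers can't come to depend on any particular order."""
    items = list(d.items())
    random.shuffle(items)
    yield from items

d = {"a": 1, "b": 2, "c": 3}
# The set of items is stable even though the order is not.
assert sorted(shuffled_items(d)) == [("a", 1), ("b", 2), ("c", 3)]
```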

bri3d a day ago

There are various compiler options like -ftrivial-auto-var-init to initialize uninitialized variables to specific (or random) values in some situations, but overall, randomizing (or zeroing) the full content of the stack in each function call would be a horrendous performance regression and isn't done for this reason.

  • neuroelectron a day ago

    There are fast instructions (e.g., REP STOSx, AVX zero stores, DC ZVA) and tricks (MTE, zero pages), but no magic CPU instruction exists that transparently and efficiently randomizes or zeros the stack on function calls. You'd think there would be one, and I bet there are on some specialized high-security systems, but I'm not sure where you would even find such a product. Telecom certainly isn't it.

    • db48x a day ago

      There are proposed CPU architectures that work that way, like the Mill <https://millcomputing.com/>. Whereas most CPUs support multiple calling conventions, the Mill enforces a single calling convention in hardware. There is a hardware `call` instruction that does all the work directly, along with a corresponding `ret` instruction for returning from a function call. It also uses its equivalent of the TLB to ensure that each function is only granted permission to read from that portion of the stack which contains its arguments; any attempt to read outside that region would result in a permission error that causes the read to return a NaR (Not a Result, akin to a floating point NaN).

      As an additional protection, new stack frames are implicitly zeroed as they are created. I assume this is done by filling the CPU cache with zeros for those addresses before continuing to execute the called function. No need to wait for actual zeros to be written to main memory.

      https://millcomputing.com/wiki/Protection#Protecting_Stacks

      • smj-edison 14 hours ago

        This is really interesting—how do stack references work in this design?

        • db48x an hour ago

          Technically I think you can read the “whole” stack; it’s only reads off of the ends of the stack as a whole that are prevented. However, note that the start of your current stack may not really be the start of the real stack.

          Consider the case of a system call, such as `read`. You’re in user space and you have some stack frames on the stack as usual. You allocate a buffer on the stack (there’s a CPU instruction for that; it basically just extends your “turf¹” to include more of the stack page, and zeros it as mentioned) to hold the data you want to read. You then call `read` with the `call` instruction, including the address of the buffer and the buffer size as arguments. So far everything is very straightforward.

          But `read` is actually in a different protection domain; it’s part of the kernel. The CPU uses metadata previously set up by the kernel to turn this into a “portal call”. After the portal call your thread will be given a different protection domain. In principle this is the kernel’s protection domain, but in reality the kernel might split that up in many complicated ways. What is relevant here is that the turf of this protection domain has been modified to include this new stack frame. From the perspective of `read`, the stack has just started; there are no prior frames. The reality is that this stack frame is still part of the stack of the caller, it’s only the turf that has changed. Those prior stack frames still exist, but they are unreadable. Worse, the buffer is also unreadable; it’s located at an address that is not part of the kernel’s turf.

          So obviously there needs to be another set of instructions for modifying turfs. The full set of obvious modifications is available, but the relevant one here is a temporary grant of read and/or write permissions to a function you are about to call. You would insert a `pass` instruction to pass along access to the buffer for the duration of the call. This access is automatically revoked after the call returns. (Ideally you wouldn’t actually have to do this manually for every portal call; instead you would call a non-portal `read` function in libc. This function’s job is to make the portal call, and whoever wrote it makes sure to include the `pass` instruction.)

          ¹ A turf is the set of addresses that a given thread running in a given protection domain can read and/or write.

    • mjevans a day ago

      You couldn't do random, but with a predictable performance hit to memory, cache, and write-line use, stack addresses COULD be isolated per program, per library, etc.

      It'd be expensive though; every context switch would require its own stack and pushing/restoring one more register. There's a GOOD reason programs don't work that way and are supposed not to rely on values outside of properly initialized (and not later clobbered) memory.

      • neuroelectron a day ago

        It should be efficient though; that's the point. Specialized hardware or instructions should be able to zero the stack in a single cycle, but instead it's much more expensive. Of course, the problem with this is that it could be used to hide things just as easily, making it impossible to reverse engineer an unknown exploit.

        • mjevans 14 hours ago

          Why would a specialized instruction be necessary? 'the stack' is stored in memory just like everything else.

          Expensive is the (very slow for modern CPUs) operation of _writing_ that change in value out to memory at its distant and slow speed compared to that at which the CPU operates, as well as the overhead of synchronizing that write to any other caches of those memory locations.

          Maybe you're thinking of the trick of a brand new page of mapped memory that is 'zeroed' but is in reality just a special 'all zeros' page in the virtual-to-physical memory lookup table? Those pages still need to be zeroed by real writes at some point, if they're ever used.

    • dwattttt a day ago

      CPUs already special-case xor reg,reg as zeroing the register, breaking any data dependency on it. If zeroing parts of the stack were common enough, I'd believe CPUs could be made that handled it efficiently (they already special-case the stack: push/pop).

  • smarks a day ago

    I'm a bit distant from this stuff, but it looks like C++26 will have something like -ftrivial-auto-var-init enabled by default. See the "safe by default" section of [1].

    For reference, the actual proposal that was accepted into C++26 is [2]. It discusses performance only in general, and it refers to an earlier analysis [3] for more details. This last reference describes regressions of around 0.5% in time and in code size. Earlier prototypes suggested larger regressions (perhaps even "horrendous") but more emphasis on compiler optimizations has brought the regression down considerably.

    Of course one's mileage may vary, and one might also consider a 0.5% regression unacceptable. However, the C++ committee seems to have considered this to be an acceptable tradeoff to remove a frequent cause of undefined behavior from C++.

    [1]: https://herbsutter.com/2024/08/07/reader-qa-what-does-it-mea...

    [2]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p27...

    [3]: https://open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2723r1...

  • canucker2016 17 hours ago

    Microsoft's Visual C++ compiler has the /Ge compiler option (see https://learn.microsoft.com/en-us/cpp/build/reference/ge-ena... ), deprecated since VC2005.

    This compiler option causes the compiler to emit a call to a stack probe function to ensure that a sufficient amount of stack space is available.

    Rather than just probe once for each stack page used, you can substitute a function that *FILLS* the stack frame with a particular value - something like 0xBAADF00D - and the value could be set to anything you wanted at runtime.

    This would get you similar behaviour to gcc/clang's -ftrivial-auto-var-init.

    Windows has started to auto-initialize most stack variables in the Windows kernel and several other areas.

        The following types are automatically initialized:
        
            Scalars (arrays, pointers, floats)
            Arrays of pointers
            Structures (plain-old-data structures)
        
        The following are not automatically initialized:
        
            Volatile variables
            Arrays of anything other than pointers (i.e. array of int, array of structures, etc.)
            Classes that are not plain-old-data
    
    
        During initial testing where we forcibly initialized all types of data on the stack we saw performance regressions of over 10% in several key scenarios.
    
        With POD structures only, performance was more reasonable. Compiler optimizations to eliminate redundant stores (both inside basic blocks and between basic blocks) were able to further drop the regression caused by POD structures from observable to noise-level for most tests.
    
        We plan on revisiting zero initializing all types (especially now that our optimizer has more powerful optimizations), we just haven’t gotten to it yet.
    
    see https://web.archive.org/web/20200518153645/https://msrc-blog...
frollogaston a day ago

Randomization at this level would be too expensive. There are tools that do this for debug purposes, and your stuff runs a lot slower in that mode.

  • throwaway2037 a day ago

    I had to Google to find the tidbit that I read about Perl years ago. I think this will affect iteration order of dicts.

        > Nov 22, 2012 — Perl 5.18 will introduce per process hash randomization and almost certainly will feature a new hash function.
  • foxhill a day ago

    it probably shouldn’t be a “release” thing. actually, certainly not. i do wonder how many bugs would never have seen the light of day if someone’s “set” actually turned out to be a sequence (i.e. allowed duplicate values), resulting in a debug build raising an assert.

    • Arainach a day ago

      Debug builds are worthless for catching issues. How many people actually run them? Perhaps developers run debug builds of individual binaries they're working on when they're trying to repro a bug, but my experience at every company of every size and position in the stack (including the Windows team) is that no one does their general purpose use on a debug build.

      • dontlaugh a day ago

        Especially in games, it’s common for only the highly optimised release builds to have playable performance.

      • frollogaston 12 hours ago

        Yeah, even their integration tests will probably run in opt mode.

abnercoimbre a day ago

Regarding contracts, there's an additional lesson here, quoting from the source:

> This is an interesting lesson in compatibility: even changes to the stack layout of the internal implementations can have compatibility implications if an application is bugged and unintentionally relies on a specific behavior.

I suppose this is why Linux kernel maintainers insist on never breaking user space.

  • cylemons 19 hours ago

    But the Linux equivalent here would be glibc, not the kernel.

tantalor a day ago

Nope. You have to remember https://www.hyrumslaw.com/

  With a sufficient number of users of an API,
  it does not matter what you promise in the contract:
  all observable behaviors of your system
  will be depended on by somebody.
If you promise randomization, then somebody will depend on that :)

And then you can never remove it!

  • scott_w 19 hours ago

    Semi-related: this type of thing is actually covered in the Site Reliability Engineering book by Google. They highlighted a case of a system that outperformed its SLO, so people depended on it having 100% uptime. They "fixed" this by injecting errors to bring availability back down toward the SLO, forcing downstream engineers to deal with the fact that the dependent service would sometimes fail for no reason.

    I know it's easier said than done everywhere, just found it to be an interesting parallel.

  • timewizard a day ago

    > If you promise randomization

    You don't. You say the order is undefined.

    • __float a day ago

      That isn't the point. In practice, if you provide randomness, it will be depended upon.

      • psnehanshu a day ago

        Why is that? Is that just bad coding habits?

        • simonask a day ago

          All of this is bad coding habits. That's why we're here.

ormax3 a day ago

one might argue that one of the advantages of languages like C is that you only pay for the features you choose to use: no unnecessary overhead like initializing unused variables

  • nayuki a day ago

    You can pay for those features in debug mode or in chaos monkey mode. It's okay to continue to not pay for them in release mode. Heck, Rust has this approach when it comes to handling integer overflow - fully checked in debug mode, silent wraparound in release mode.

  • pjc50 18 hours ago

    However, the compiler does not tell you this. We're back to the problem that it's possible to have a "working" C program that relies on UB and will therefore break at some point, but the tools will not yell at you for doing this. Whereas in Java or C# you get warnings or errors for using maybe-uninitialized variables.

    Also, scanf should be deprecated. Terrible API. Never use scanf or sscanf etc. We managed to get "gets()" deprecated, time to spread that to other parts of the API.

    atoi() or atof() etc. work OK, but really you need a parser.

gzalo a day ago

I agree, this can also detect brittle tests (e.g., test methods/classes that only pass if executed in a particular order). But applying it to all data could be expensive computation-wise

mras0 a day ago

Not really the ethos of C(++), though of course this particular bug would be easily caught by running a debug build (even 20 years ago). However, this being a game, "true" debug builds were probably too slow to be usable. That was at least my experience doing gamedev in that timeframe. Then again, code holding up for 20 years in that line of biz is more than sufficient anyway :)

  • mabster a day ago

    When I was doing gamedev about 5 years ago, we were still debugging with optimisation on. You get a class of bugs just from running at lower frame rates that don't happen in release.

codebje a day ago

I once updated a little shy of 1mloc of Perl 5.8 code to run on Perl 5.32 (ish). There were, overall, remarkably few issues that cropped up. One of these issues (that showed itself a few times) was more or less exactly this: the iteration order through a hash is not defined. It has never been defined, but in Perl 5.8 it was consistent: for the same insertion order of the same set of keys, a hash would always iterate in the same way. In a later Perl it was deliberately randomised, not just once, but on every iteration through the hash.

It turned out there were a few places that had assumed a predictable - not just stable, but deterministic - hash key iteration order. Mostly this showed up as tests that failed 50% of the time, which suggested to me that how annoying an error is to track down is inversely correlated with how often it appears in tests.

(Other issues were mostly due to the fact that Perl 5 is all but abandoned by its former community: a few CPAN modules are just gone, some are so far out of date that they can't be coerced to still work with other modules that have been updated over time.)

plutaniano a day ago

Aren't you just creating another contract? Users might write code that depends on it being random.

willcipriano a day ago

Then you are wasting runtime clock cycles randomizing lists.

  • Cthulhu_ a day ago

    Not necessarily; you can do a thing where it's randomized during development, testing and fuzzing but not in production builds or benchmarks so that the obvious "I rely on internal map order" bugs are spotted right away.
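A sketch of that split in Python, assuming a hypothetical APP_DEBUG environment toggle: the shuffle runs only in development and test builds, so production pays nothing.

```python
import os
import random

# Hypothetical toggle; in a production build this stays off.
DEBUG = os.environ.get("APP_DEBUG") == "1"

def iter_map(d):
    """Iterate a dict, shuffling in debug builds to expose any
    accidental reliance on iteration order early."""
    keys = list(d)
    if DEBUG:
        random.shuffle(keys)
    for k in keys:
        yield k, d[k]

# The contents are identical either way; only the order may differ.
assert dict(iter_map({"x": 1, "y": 2})) == {"x": 1, "y": 2}
```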

  • wat10000 a day ago

    You can get it pretty much for free by using a random salt with your hash function. This is also useful for avoiding DOS attacks using deliberate hash collisions to trigger quadratic behavior in your hash tables.
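A rough Python sketch of the salting idea (the names and bucket scheme are illustrative, not any real hash table's internals): with a per-process random salt, an attacker can't precompute keys that all land in one bucket.

```python
import hashlib
import os

SALT = os.urandom(16)  # fresh per process; unknown to an attacker

def bucket_of(key: bytes, nbuckets: int, salt: bytes = SALT) -> int:
    """Map a key to a bucket with a keyed hash (BLAKE2b accepts a salt)."""
    h = hashlib.blake2b(key, salt=salt, digest_size=8)
    return int.from_bytes(h.digest(), "little") % nbuckets

# Deterministic within a run (same salt), but the layout differs per run.
assert bucket_of(b"key", 64) == bucket_of(b"key", 64)
```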

  • nayuki a day ago

    Any sane language would design a list iterator to follow the order of the list. No, the difference is when you're iterating over orderless hash-based sets or maps/dictionaries. Many languages choose to leave the iteration order undefined. I think Python did that up to a point, but afterward they defined dictionaries (but not sets) to be iterated over in the order that keys were added. Also, some languages intentionally randomize the order per program run, to avoid things like users intentionally stuffing hash tables with colliding keys.
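For concreteness, the Python behavior described above (insertion-ordered dicts are guaranteed since 3.7, while sets make no ordering promise):

```python
# Dicts iterate in insertion order, guaranteed by the language spec
# since Python 3.7; sets expose no such guarantee.
d = {}
for k in ["banana", "apple", "cherry"]:
    d[k] = len(k)
assert list(d) == ["banana", "apple", "cherry"]  # insertion order kept
```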

    • masklinn 21 hours ago

      > Also, some languages intentionally randomize the order per program run, to avoid things like users intentionally stuffing hash tables with colliding keys.

      Most modern languages do that as part of hash-DoS mitigation. Python did that until it switched to a naturally ordered hashmap, then made insertion order part of the spec. Importantly, iteration order remains consistent within a process (possibly on a per-hashmap basis).

      Notably, Go will randomise the starting point of hashmap iteration on each iteration.
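A toy Python imitation of that Go behavior (Go's real implementation also varies bucket traversal; this only models the random starting offset):

```python
import random

def go_style_items(d):
    """Yield items starting at a random offset, so two iterations over
    the same map need not agree on order (Go-like, simplified)."""
    keys = list(d)
    if keys:
        start = random.randrange(len(keys))
        keys = keys[start:] + keys[:start]
    for k in keys:
        yield k, d[k]

m = {"a": 1, "b": 2, "c": 3}
# Contents are unaffected; only the starting point moves.
assert sorted(go_style_items(m)) == sorted(m.items())
```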

    • mabster a day ago

      Best change ever, that. Now it would also be nice if sets were ordered too.

      • throwaway2037 a day ago

        I am pretty sure there are trivial impls of sets with guaranteed iteration order in Python that use an underlying ordered map and a dummy value in each entry.
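One such trivial implementation: dict keys already iterate in insertion order, so the set entries can simply map to dummy values.

```python
def ordered_set(iterable):
    """Insertion-ordered 'set' built on dict keys; values are dummies."""
    return dict.fromkeys(iterable)

s = ordered_set(["b", "a", "b", "c"])
assert list(s) == ["b", "a", "c"]  # duplicates dropped, order preserved
```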

roseway4 a day ago

iirc, Go intentionally randomizes map ordering for just this reason.

  • withinboredom a day ago

    Yep, and then you get crash reports you can’t reproduce.

    • cristaloleg 21 hours ago

      Same can be said about pointer addresses (random for each run). But ASLR exists for a specific reason.