amenghra a day ago

IMHO, if something isn’t part of the contract, it should be randomized. Eg if iteration order of maps isn’t guaranteed in your language, then your language should go out of its way to randomize it. Otherwise, you end up with brittle code: code that works fine until it doesn’t.
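A minimal Python sketch of the idea (the `shuffled_items` helper is made up, not any language's built-in): iterating in a deliberately shuffled order makes accidental dependence on ordering fail fast, while the set of items stays stable.

```python
import random

def shuffled_items(d):
    """Yield a dict's items in a deliberately randomized order,
    mimicking languages (e.g. Go) that randomize map iteration so
    callers can't come to depend on any particular order."""
    items = list(d.items())
    random.shuffle(items)
    yield from items

d = {"a": 1, "b": 2, "c": 3}
# The set of items is stable even though the order is not.
assert sorted(shuffled_items(d)) == [("a", 1), ("b", 2), ("c", 3)]
```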

bri3d a day ago

There are various compiler options like -ftrivial-auto-var-init to initialize uninitialized variables to specific (or random) values in some situations, but overall, randomizing (or zeroing) the full content of the stack in each function call would be a horrendous performance regression and isn't done for this reason.

  • neuroelectron a day ago

    There are fast instructions (e.g., REP STOSx, AVX zero stores, DC ZVA) and tricks (MTE, zero pages), but no magic CPU instruction exists that transparently and efficiently randomizes or zeros the stack on function calls. You'd think there would be one, and I bet there are on some specialized high-security systems, but I'm not sure where you would even find such a product. Telecom certainly isn't it.

    • db48x a day ago

      There are proposed CPU architectures that work that way, like the Mill <https://millcomputing.com/>. Whereas most CPUs support multiple calling conventions, the Mill enforces a single calling convention in hardware. There is a hardware `call` instruction that does all the work directly, along with a corresponding `ret` instruction for returning from a function call. It also uses its equivalent of the TLB to ensure that each function is only granted permission to read from that portion of the stack which contains its arguments; any attempt to read outside that region would result in a permission error that causes the read to return a NaR (Not a Result, akin to a floating point NaN).

      As an additional protection, new stack frames are implicitly zeroed as they are created. I assume this is done by filling the CPU cache with zeros for those addresses before continuing to execute the called function. No need to wait for actual zeros to be written to main memory.

      https://millcomputing.com/wiki/Protection#Protecting_Stacks

      • smj-edison 14 hours ago

        This is really interesting—how do stack references work in this design?

        • db48x an hour ago

          Technically I think you can read the “whole” stack; it’s only reads off of the ends of the stack as a whole that are prevented. However, note that the start of your current stack may not really be the start of the real stack.

          Consider the case of a system call, such as `read`. You’re in user space and you have some stack frames on the stack as usual. You allocate a buffer on the stack (there’s a CPU instruction for that; it basically just extends your “turf¹” to include more of the stack page, and zeros it as mentioned) to hold the data you want to read. You then call `read` with the `call` instruction, including the address of the buffer and the buffer size as arguments. So far everything is very straightforward.

          But `read` is actually in a different protection domain; it’s part of the kernel. The CPU uses metadata previously set up by the kernel to turn this into a “portal call”. After the portal call your thread will be given a different protection domain. In principle this is the kernel’s protection domain, but in reality the kernel might split that up in many complicated ways. What is relevant here is that the turf of this protection domain has been modified to include this new stack frame. From the perspective of `read`, the stack has just started; there are no prior frames. The reality is that this stack frame is still part of the stack of the caller, it’s only the turf that has changed. Those prior stack frames still exist, but they are unreadable. Worse, the buffer is also unreadable; it’s located at an address that is not part of the kernel’s turf.

          So obviously there needs to be another set of instructions for modifying turfs. The full set of obvious modifications is available, but the relevant one here is a temporary grant of read and/or write permissions to a function you are about to call. You would insert a `pass` instruction to pass along access to the buffer for the duration of the call. This access is automatically revoked after the call returns. (Ideally you wouldn’t actually have to do this manually for every portal call; instead you would call a non-portal `read` function in libc. This function’s job is to make the portal call, and whoever wrote it makes sure to include the `pass` instruction.)

          ¹ A turf is the set of addresses that a given thread running in a given protection domain can read and/or write.

    • mjevans a day ago

      You couldn't do random, but with a predictable performance hit to memory, cache, and write-line use, stack addresses COULD be isolated per program, per library, etc.

      It'd be expensive though; every context switch would require its own stack and pushing/restoring one more register. There's a GOOD reason programs don't work that way and are supposed not to rely on values outside of properly initialized (and not later clobbered) memory.

      • neuroelectron a day ago

        It should be efficient though; that's the point. Specialized hardware or instructions should be able to zero the stack in a single cycle, but instead it's much more expensive. Of course, the problem with this is that it could be used to hide things just as easily, making it impossible to reverse engineer an unknown exploit.

        • mjevans 14 hours ago

          Why would a specialized instruction be necessary? 'the stack' is stored in memory just like everything else.

          Expensive is the (very slow for modern CPUs) operation of _writing_ that change in value out to memory at its distant and slow speed compared to that at which the CPU operates, as well as the overhead of synchronizing that write to any other caches of those memory locations.

          Maybe you're thinking of the trick of a brand new page of mapped memory that is 'zeroed' but is in reality just a special 'all zeros' page in the virtual-to-physical memory lookup table? Those pages still need to be zeroed by real writes at some point, if they're ever used.

    • dwattttt a day ago

      CPUs already special-case xor reg,reg as zeroing the register, breaking any data dependency on it. If zeroing parts of the stack were common enough, I'd believe CPUs could be made that handled it efficiently (they already special-case the stack: push/pop).

  • smarks a day ago

    I'm a bit distant from this stuff, but it looks like C++26 will have something like -ftrivial-auto-var-init enabled by default. See the "safe by default" section of [1].

    For reference, the actual proposal that was accepted into C++26 is [2]. It discusses performance only in general, and it refers to an earlier analysis [3] for more details. This last reference describes regressions of around 0.5% in time and in code size. Earlier prototypes suggested larger regressions (perhaps even "horrendous") but more emphasis on compiler optimizations has brought the regression down considerably.

    Of course one's mileage may vary, and one might also consider a 0.5% regression unacceptable. However, the C++ committee seems to have considered this to be an acceptable tradeoff to remove a frequent cause of undefined behavior from C++.

    [1]: https://herbsutter.com/2024/08/07/reader-qa-what-does-it-mea...

    [2]: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p27...

    [3]: https://open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2723r1...

  • canucker2016 17 hours ago

    Microsoft's Visual C++ compiler has the /Ge compiler option (see https://learn.microsoft.com/en-us/cpp/build/reference/ge-ena... ), deprecated since VC2005.

    This compiler option causes the compiler to emit a call to a stack probe function to ensure that a sufficient amount of stack space is available.

    Rather than just probe once for each stack page used, you can substitute a function that *FILLS* the stack frame with a particular value - something like 0xBAADF00D - and the value could be set to anything you wanted at runtime.

    This would get you similar behaviour to gcc/clang's -ftrivial-auto-var-init.

    Windows has started to auto-initialize most stack variables in the Windows kernel and several other areas.

        The following types are automatically initialized:
        
            Scalars (arrays, pointers, floats)
            Arrays of pointers
            Structures (plain-old-data structures)
        
        The following are not automatically initialized:
        
            Volatile variables
            Arrays of anything other than pointers (i.e. array of int, array of structures, etc.)
            Classes that are not plain-old-data
    
    
        During initial testing where we forcibly initialized all types of data on the stack we saw performance regressions of over 10% in several key scenarios.
    
        With POD structures only, performance was more reasonable. Compiler optimizations to eliminate redundant stores (both inside basic blocks and between basic blocks) were able to further drop the regression caused by POD structures from observable to noise-level for most tests.
    
        We plan on revisiting zero initializing all types (especially now that our optimizer has more powerful optimizations), we just haven’t gotten to it yet.
    
    see https://web.archive.org/web/20200518153645/https://msrc-blog...
frollogaston a day ago

Randomization at this level would be too expensive. There are tools that do this for debug purposes, and your stuff runs a lot slower in that mode.

  • throwaway2037 a day ago

    I had to Google to find the tidbit that I read about Perl years ago. I think this will affect iteration order of dicts.

        > Nov 22, 2012 — Perl 5.18 will introduce per process hash randomization and almost certainly will feature a new hash function.
  • foxhill a day ago

    it probably shouldn’t be a “release” thing. actually, certainly not. i do wonder how many bugs would never have seen the light of day if someone’s “set” actually turned out to be a sequence (i.e. allowed duplicate values), resulting in a debug build raising an assert.

    • Arainach a day ago

      Debug builds are worthless for catching issues. How many people actually run them? Perhaps developers run debug builds of individual binaries they're working on when they're trying to repro a bug, but my experience at every company of every size and position in the stack (including the Windows team) is that no one does their general purpose use on a debug build.

      • dontlaugh a day ago

        Especially in games, it’s common for only the highly optimised release builds to have playable performance.

      • frollogaston 12 hours ago

        Yeah, even their integration tests will probably run in opt mode.

abnercoimbre a day ago

Regarding contracts, there's an additional lesson here, quoting from the source:

> This is an interesting lesson in compatibility: even changes to the stack layout of the internal implementations can have compatibility implications if an application is bugged and unintentionally relies on a specific behavior.

I suppose this is why Linux kernel maintainers insist on never breaking user space.

  • cylemons 19 hours ago

    But the Linux equivalent here would be glibc, not the kernel.

tantalor a day ago

Nope. You have to remember https://www.hyrumslaw.com/

  With a sufficient number of users of an API,
  it does not matter what you promise in the contract:
  all observable behaviors of your system
  will be depended on by somebody.
If you promise randomization, then somebody will depend on that :)

And then you can never remove it!

  • scott_w 19 hours ago

    Semi-related: this type of thing is actually covered in the Site Reliability Engineering book by Google. They highlighted a case of a system that outperformed its SLO, so people depended on it having 100% uptime. They "fixed" this by injecting errors to bring availability back down toward the SLO, forcing downstream engineers to deal with the fact that the dependent service would sometimes fail for no reason.

    I know it's easier said than done everywhere, just found it to be an interesting parallel.

  • timewizard a day ago

    > If you promise randomization

    You don't. You say the order is undefined.

    • __float a day ago

      That isn't the point. In practice, if you provide randomness, it will be depended upon.

      • psnehanshu a day ago

        Why is that? Is that just bad coding habits?

        • simonask a day ago

          All of this is bad coding habits. That's why we're here.

ormax3 a day ago

one might argue that one of the advantages of languages like C is that you only pay for the features you choose to use: no unnecessary overhead like initializing unused variables

  • nayuki a day ago

    You can pay for those features in debug mode or in chaos monkey mode. It's okay to continue to not pay for them in release mode. Heck, Rust has this approach when it comes to handling integer overflow - fully checked in debug mode, silent wraparound in release mode.

  • pjc50 18 hours ago

    However, the compiler does not tell you this. We're back to the problem that it's possible to have a "working" C program that relies on UB and will therefore break at some point, but the tools will not yell at you for doing this. Whereas in Java or C# you get warnings or errors for using maybe-uninitialized variables.

    Also, scanf should be deprecated. Terrible API. Never use scanf or sscanf etc. We managed to get "gets()" deprecated, time to spread that to other parts of the API.

    atoi() or atof() etc. work OK, but really you need a parser.

gzalo a day ago

I agree, this can also detect brittle tests (e.g., test methods/classes that only pass if executed in a particular order). But applying it to all data could be expensive computation-wise

mras0 a day ago

Not really the ethos of C(++), though of course this particular bug would be easily caught by running a debug build (even 20 years ago). However, this being a game, "true" debug builds were probably too slow to be usable. That was at least my experience doing gamedev in that timeframe. Then again, code holding up for 20 years in that line of biz is more than sufficient anyway :)

  • mabster a day ago

    When I was doing gamedev about 5 years ago, we were still debugging with optimisation on. You get a class of bugs just from running at lower frame rates that don't happen in release.

codebje a day ago

I once updated a little shy of 1mloc of Perl 5.8 code to run on Perl 5.32 (ish). There were, overall, remarkably few issues that cropped up. One of these issues (that showed itself a few times) was more or less exactly this: the iteration order through a hash is not defined. It has never been defined, but in Perl 5.8 it was consistent: for the same insertion order of the same set of keys, a hash would always iterate in the same way. In a later Perl it was deliberately randomised, not just once, but on every iteration through the hash.

It turned out there were a few places that had assumed a predictable - not just stable, but deterministic - hash key iteration order. Mostly this showed up as tests that failed 50% of the time, which suggested to me that how annoying an error is to track down is inversely correlated with how often it appears in tests.

(Other issues were mostly due to the fact that Perl 5 is all but abandoned by its former community: a few CPAN modules are just gone, some are so far out of date that they can't be coerced to still work with other modules that have been updated over time.)

plutaniano a day ago

Aren't you just creating another contract? Users might write code that depends on it being random.

willcipriano a day ago

Then you are wasting runtime clock cycles randomizing lists.

  • Cthulhu_ a day ago

    Not necessarily; you can do a thing where it's randomized during development, testing and fuzzing but not in production builds or benchmarks so that the obvious "I rely on internal map order" bugs are spotted right away.
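A sketch of that split in Python, assuming a hypothetical APP_DEBUG environment toggle: the shuffle runs only in development and test builds, so production pays nothing.

```python
import os
import random

# Hypothetical toggle; in a production build this stays off.
DEBUG = os.environ.get("APP_DEBUG") == "1"

def iter_map(d):
    """Iterate a dict, shuffling in debug builds to expose any
    accidental reliance on iteration order early."""
    keys = list(d)
    if DEBUG:
        random.shuffle(keys)
    for k in keys:
        yield k, d[k]

# The contents are identical either way; only the order may differ.
assert dict(iter_map({"x": 1, "y": 2})) == {"x": 1, "y": 2}
```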

  • wat10000 a day ago

    You can get it pretty much for free by using a random salt with your hash function. This is also useful for avoiding DOS attacks using deliberate hash collisions to trigger quadratic behavior in your hash tables.
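A rough Python sketch of the salting idea (the names and bucket scheme are illustrative, not any real hash table's internals): with a per-process random salt, an attacker can't precompute keys that all land in one bucket.

```python
import hashlib
import os

SALT = os.urandom(16)  # fresh per process; unknown to an attacker

def bucket_of(key: bytes, nbuckets: int, salt: bytes = SALT) -> int:
    """Map a key to a bucket with a keyed hash (BLAKE2b accepts a salt)."""
    h = hashlib.blake2b(key, salt=salt, digest_size=8)
    return int.from_bytes(h.digest(), "little") % nbuckets

# Deterministic within a run (same salt), but the layout differs per run.
assert bucket_of(b"key", 64) == bucket_of(b"key", 64)
```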

  • nayuki a day ago

    Any sane language would design a list iterator to follow the order of the list. No, the difference is when you're iterating over orderless hash-based sets or maps/dictionaries. Many languages choose to leave the iteration order undefined. I think Python did that up to a point, but afterward they defined dictionaries (but not sets) to be iterated over in the order that keys were added. Also, some languages intentionally randomize the order per program run, to avoid things like users intentionally stuffing hash tables with colliding keys.
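For concreteness, the Python behavior described above (insertion-ordered dicts are guaranteed since 3.7, while sets make no ordering promise):

```python
# Dicts iterate in insertion order, guaranteed by the language spec
# since Python 3.7; sets expose no such guarantee.
d = {}
for k in ["banana", "apple", "cherry"]:
    d[k] = len(k)
assert list(d) == ["banana", "apple", "cherry"]  # insertion order kept
```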

    • masklinn 21 hours ago

      > Also, some languages intentionally randomize the order per program run, to avoid things like users intentionally stuffing hash tables with colliding keys.

      Most modern languages do that as part of hash-DoS mitigation. Python did that until it switched to a naturally ordered hashmap, then made insertion order part of the spec. Importantly, iteration order remains consistent within a process (possibly on a per-hashmap basis).

      Notably, Go will randomise the starting point of hashmap iteration on each iteration.
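A toy Python imitation of that Go behavior (Go's real implementation also varies bucket traversal; this only models the random starting offset):

```python
import random

def go_style_items(d):
    """Yield items starting at a random offset, so two iterations over
    the same map need not agree on order (Go-like, simplified)."""
    keys = list(d)
    if keys:
        start = random.randrange(len(keys))
        keys = keys[start:] + keys[:start]
    for k in keys:
        yield k, d[k]

m = {"a": 1, "b": 2, "c": 3}
# Contents are unaffected; only the starting point moves.
assert sorted(go_style_items(m)) == sorted(m.items())
```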

    • mabster a day ago

      Best change ever, that. Now it would also be nice if sets were ordered too.

      • throwaway2037 a day ago

        I am pretty sure there are trivial impls of sets with guaranteed iteration order in Python that use an underlying ordered map and a dummy value in each entry.
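One such trivial implementation: dict keys already iterate in insertion order, so the set entries can simply map to dummy values.

```python
def ordered_set(iterable):
    """Insertion-ordered 'set' built on dict keys; values are dummies."""
    return dict.fromkeys(iterable)

s = ordered_set(["b", "a", "b", "c"])
assert list(s) == ["b", "a", "c"]  # duplicates dropped, order preserved
```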

roseway4 a day ago

iirc, Go intentionally randomizes map ordering for just this reason.

  • withinboredom a day ago

    Yep, and then you get crash reports you can’t reproduce.

    • cristaloleg 21 hours ago

      Same can be said about pointer addresses (random for each run). But ASLR exists for a specific reason.