When O3 is 2x slower than O2
(cat-solstice.github.io)97 points by keyle 5 days ago
97 points by keyle 5 days ago
You can influence the choice of conditional moves (usually inserting them) with
__builtin_expect_with_probability(..., 0.5)
https://github.com/protocolbuffers/protobuf/commit/9f29f02a3...
Nice read. Last week I wrote a blog post about two noteworthy cases of O3 being slower than O2 in C++: https://barish.me/blog/cpp-o3-slower/
While working on my game, I tried implementing a custom priority queue.
It ended up being > 2x faster in debug build, but 2x-5x slower in the release build (??!!?) [1]. I haven't learned much about compilers/"lower level" C++, so I moved on at that point.
How it worked:
1.) The P.Q. created a vector and resized it to known bounds.
2.) The P.Q. kept tract of and updated the "active sorting range" each time an element was inserted or popped.
2B.) So each time an element is added, it uses the closest unused vector element and updates the ".end" range to sort
2C.) Each time an element was removed, it updated the ".start" range
3.) In theory this should have saved reallocation overhead.
[1] I believe Visual Studio uses -O0 for debug, and -O2 for release.
These kinds of things are really interesting, and a good example of the importance of performance testing in optimized builds. I encountered something similar and realized the issue was different sizes of structures in debug vs release which ended up causing the CPU cache lines to overlap and invalidate. Though that sounds unlikely here.
And this is why you go and look at the assembly in godbolt to see wtf is going on.
There's no need to resort when popping an item.
When adding an item, it gets added to the next unused vector element. The sorting range end offset gets updated.
Then it sorts (note you would actually need a custom sort since a PriorityQueue is a pair)
std::push_heap( vec.begin() + startOffset, vec.begin() + endOffset ) [1]
Adding an item would be something like: endOffset++; vec.insert( vec.begin() + endOffset, $value ); [1]
Or maybe I just used endOffset++; vec[endOffset] = $value; [1]
Popping an item: startOffset++;
[1] Im writing this from memory from my attempt many months ago. May have mistakes.. but should communicate the jistAs a denser gas, Ozone would have greater friction getting through small pores, so that would be one example?
-O3 -march=haswell
The second one is your problem. Haswell is 15 years old now. Almost nobody owns a CPU that old. -O3 makes a lot of architecture-dependent decisions, and tying yourself to an antique architecture gives you very bad results.
The author gives a godbolt link. It takes 5 minutes to add two compilers on raptorlake and see that it gives the same result.
https://godbolt.org/z/oof145zjb
So no, Haswell is not the problem. LLVM just doesn't know about the dependency thing.
Also, if you’re building binaries for redistribution, requiring Haswell or newer is kind of an aggressive choice.
I have a box that old that can run image diffusion models (I upgraded the GPU during covid.)
My point being that compiler authors do worry about “obsolete” targets because they’re widely used for compatibility reasons.
Oh, yeah. Looking at the code the comparison function is bad in a way that will hit issues with cmovs. The "else if" there is a nearly 100% predictable branch which follows a branch that's probably completely unpredictable. One optimization you get with -O3 is that branches not marked likely or unlikely can be turned into dataflow. The proper ordering for that code is to pull the equality check forward into an if statement and mark with "unlikely" if you're using -O3, and then return the ordering of the floating point comparison.
Since we're using Rust there's some decent syntactic sugar for this function you can use instead:
let cmp = |other: &Neighbor| -> Ordering {
other.dist.partial_cmp(&neighbor.dist).unwrap()
.then_with(|| other.id.cmp(&neighbor.id))
};
You probably won't get any CMOVs in -O3 with this function because there are NaN issues with the original code.My desktop computer is a Sandy Bridge from 2011. I still haven't seen any compelling reason to upgrade.
For what it's worth, you may be pleasantly surprised by the performance if you upgrade. I went from an Ivy Bridge processor to a Zen 3 processor, and I found that there were a lot of real world scenarios which got noticably faster. For example, AI turns in a late game Civ 6 map went from 30s to 15s. I can't promise you'll see good results, but it's worth considering.
What factors would be compelling to upgrade for?
Just curious, since perf alone doesn’t seem to be the factor.
https://browser.geekbench.com/processors/intel-core-i7-2600k
https://browser.geekbench.com/processors/intel-core-i9-14900...
Because number bigger doesn’t translate to higher perceived performance…
The only compelling reason that I want to upgrade my Sandy Lake chip is AVX2.
So it is instruction set not perf, sure there will be improved performance but most of the things that are actually performance issues is already handed off to the GPU.
On that note probably rebar and PCIe4 but those aren’t dramatic differences, if CPU is really a problem (renders/compilation) then it gets offloaded to different hardware.
This era of CPUs has held up surprisingly well. I built an Ivy Bridge desktop in 2012 that still sees daily productivity use (albeit with an NVMe and RAM upgrade).
Off-topic, but I seriously dislike the syntax of Rust. It is chaotic and mind-boggling to me. See the "insert" function.
Good post though.
This point has been litigated to death. Read this here: https://matklad.github.io/2023/01/26/rusts-ugly-syntax.html
Almost everything that people think is ugly about Rust's syntax exists for very specific reasons. Most of the time, imo Rust made a good decision, and is just making something explicit.
Some things take time to get used to (e.g. if let), but for most people that's less an issue of syntax, and more an issue of not understanding a powerful feature (e.g. pattern matching deconstructions).
The reasons do not matter here. It is still ugly / noisy / overly-complicated and probably could have been done better.
I understand pattern matching deconstructions, I have seen it in other languages. Funnily enough they were nowhere as ugly / noisy / complicated as Rust's is. Rust seems to have bolted on a lot of fancy shit that may be appealing to a lot of people and that is it.
In your link, the first one is fugly, the last one is fine. Maybe Rust just encourages ugly (i.e. complicated) code a bit too much.
As someone who mostly writes Clojure code professionally during the day, I agree, Rust's syntax is complicated for no good reason, ugly and overly-verbose. But then I think that about most Algol-like language too, not just Rust.
And despite that I do use Rust when I want something simple to deploy/deliver, as handing over a binary that just runs is such a nice experience, and it's real easy to make fast. As long as I don't have to maintain in long-term, Rust is fine for what it is.
> It is still ugly / noisy / overly-complicated and probably could have been done better.
I don't know, it feels like you're just saying that you don't like it, missed the point of the post, and are not giving us anything concrete. Can you list a very clear example of how you'd improve the syntax?
Again, see the post: You can remove things, but you're losing explicitness or other things. If you want a language that's more implicit, this is fine. I don't.
I'm actually fine with almost all the decisions that Rust made in terms of logic and concepts, but specifically don't like the synthax itself: the symbols, keywords like the consonant-only "fn" instead of "func" for instance, the fact that || {} starts a lambda instead of || -> void {}, the fact that you can return things by simply having them in an if branch. It's the main reason I don't use the language.
"if let" just breaks my brain, it feels backwards to me and takes me a minute each time to work out what's going on. In 40 years of programming I've never seen a syntactic construct that I found less intuitive. And it would probably be easily fixable, if it was more along the lines of "if x matches Some(let z)".
if let makes a lot more sense when you learn that a normal let expression also takes a pattern[1].
let Coordinates(x, y) = get_coords();
But this is intended for "exhaustive patterns". If you can't express an exhaustive pattern, like with an Option, then you can use let ... else let Some((x, y)) = get_coords() else { return };
if let is just an extension of this "let pattern" system.Once you internalize how patterns work (and they really work everywhere) it all starts to really make sense and feels a lot cleaner.
They could have at least put the types on the left. Types on the right are a good fit for mathematics, where everything is single-letter symbols, so your eyes do not have to go across the line to see the entire definition, but in programming function definitions often span an entire line on screen, so the most important information should be the first thing shown! Also most programmers do not have formal education in mathematics, so the whole "function : domain -> image" syntax is foreign to them. We really should strive to democratise programming for everyone, rather than make it isolated to a small clique of "properly" educated white men who all went to the same university.
The type of a variable and the return type of a function are the most important pieces of information regarding those, so they ought to be on the left. It also fits the data flow going right to left (i.e. how the '=' operator works). C's type declarations can get pretty gnarly, so there is definitely room for improvement there. I would say Java (and C#) got it as close to perfect as possible.
If you want to replace C/C++ you should make your language look as inviting as possible to users of the old one. I truly think this strange fetish for putting types on the right is what gives C programmers the ick when they see Rust for the first time and is hampering its adoption.
as a beginner rust programmer, I agree. it takes me way longer to parse someone else's rust code than it does for me to read C or C++, even though I have about the same amount of experience with them. in that example, I had to look up what "if let Err() =" does, because it's not intuitive to me. it seems like every time I read rust code, I have to learn about some strange feature that's probably convenient when you know it, but there's a billion of them and they add up to hard to read code until you've spent tons and tons of time with rust. it's just so much to memorize compared to other languages.
I have the opposite experience: C++ is what I have the most experience with by a very wide margin, but I find reading other people's rust code way easier than other people's C++ code. There's way more weird features, language extensions and other crazy stuff in C++ land.
I think both Rust and C++ are behemoths. I do not even know what "proper" "modern" C++ is.
It reminds me the experience I had when working with Scala, I really tried to like it but the mind-boggling amount of features created similar issues.
It took me about 2 years to feel somewhat comfortable but I'd still run into code where someone decided to use their own set of unconventional favourite features, requiring me to learn yet-another-way to do the same thing I had seen done in other ways.
I just got tired of it, didn't feel more productive nor enlightened...
It takes at most a week to get used to just about any syntax. It should absolutely never be a reason not to try a new language. That said, it would be lovely if Rust had a more ML-like, rather than C-like, syntax (being originally inspired by O’Caml), but that would likely not help attract devs in the intended target audience!
The insert function, for what it’s worth, has nonstandard formatting; the de facto standard rustfmt tool would use more newlines, making it clearer and less dense. The function also uses a couple of syntactic features like `if let`, which may seem unfamiliar at first but become second nature in a few days at most.
I should probably have used something less ambivalent than "getting used to". There's certainly more elegance in Rust's semantics than syntax, but the latter doesn't really differ much from C, except for much better for loops, a sane declaration syntax (this is a BIG deal), and of course all the features that C just doesn't have, like closures^* Some of the syntax noise just has to be there in some way, to support the real deal which is the semantics.
^* I don't particularly enjoy the Ruby-borrowed || either, particularly because Alt-7 is a terrible key combination to type, but oh well.
Every time a language developer chooses to give their new language a novel, let alone chaotic, syntax, they sin against the very premise of software development being an engineering profession. Can you imagine the madness if mechanical engineers had the same laissez-faire mentality towards the conventions and learned best practices of their field? Randomly chosen bolts to use left hand threads just because it suits their arbitrary whim, every thread has a unique pitch never seen before just to make the machine stand out from the crowd. Parts of the machine use hot rivets, not because the design demands it but because it scratches a retro aesthetic itch the designer had that day. Every time a new machine is designed it has a whole class of new never before seen quirky ways of being put together and any mechanic who pleads for standardization on common sense ways of putting a thing together are just told to git gud because the creative expression of the 'engineers' is paramount.
(This is how we end up with German cars, isn't it?)
It's of course a matter of perspective, goals, and assumed knowledge. Is it choosing to not follow best practices, or leaving outdated practices with unfixable downsides behind? Is it inventing new things or instead actually just taking already-existing things that you just happened to not know about? Is it making arbitrary changes, or necessary ones for a useful goal? Is an extra bit of temporary pain for existing engineers worth making things better for all future engineers? (not to say that Rust has necessarily made all the right decisions; but alas basically nothing is perfect, and things certainly won't get better if noone tries)
> Every time a language developer chooses to give their new language a novel, let alone chaotic, syntax, they sin against the very premise of software development being an engineering profession.
Exactly which syntax should every language be using then? Everyone will give you a different answer.
> Randomly chosen bolts to use left hand threads just because it suits their arbitrary whim
Claiming the syntax was chosen randomly on a whim is very much not true: https://matklad.github.io/2023/01/26/rusts-ugly-syntax.html
And there are times when it does make sense to use left-hand threads.
Just because someone looks at new syntax and doesn't immediately understand it doesn't mean that syntax doesn't exist for good reason.
Also, if your language's constructs and semantics don't exactly match those in other languages, then giving them the same syntax would be actively misleading, and therefore a design flaw.
Syntax is also only an issue for people who haven't taken the time to learn the language. If you haven't learned the language, then familiar syntax isn't going to help you anyway. You'll just think you understand something you don't.
It’s promises make me interested, but the syntax is my main turnoff. I am a Go person, and I think what brings people to go is the opposite of what brings people to Rust. I am willing to sacrifice some memory safety (because I maybe naively think I can manage to write software without many memory bugs) for the simplicity and dev experience that Go offers.
The first statement defines a closure. The second is an if-let statement. It's not chaotic, you're just unfamiliar with the syntax.
I actually find the Rust syntax very natural, more than C in some areas.
Arrays are initialized with curly braces but the type notation uses brackets.
typedef takes the identifier at the end of the statement.
The asterisk is used to de-reference but used to denote a reference in types.
While loops may take the condition after the block.
Everyone who tells me this, I would love them to write an useful project in Forth. ;)
Guess what? They would blame Forth, Common Lisp, etc., yet I could tell them the same things I am being told about Rust.
Forth and Common Lisp are demanding, but they demand different things of the novice programmer than Rust.
The latter's baroque syntax is not provably good or necessary, but I'd call it the other end of a compromise for safety and expressiveness. And once you learn it, you've learned much of the language by default.
Not so with with the first two! The mental models involved in reasoning about a piece of code require much more cognition on the part of the programmer.
tbh it's bit boring to get stuck on aesthetic of writing up something. It's a bit nauseous to see how this place is hostile to rust for no reason other than "it's not pretty". It's a joke at this point we could make caricature of HN viewpoint on rust. We get it, you don't like it.
HN is not some monolithic single perspective. There are plenty of Rust fans here, and they show up just as frequently as the complainers. In fact, there is overlap between the two groups. You can be a fan of a tech and still complain about its warts.
Oh come on, how many times have we seen "blazingly fast in Rust", "rewritten in Rust" etc.? Many times. We get it. You like Rust. This "rewrite in Rust" bullshit is just as nauseous and I am sick of seeing it, too. So what? In fact, it is quite a meme now.
Please, every time you see someone praise Rust (every other day), tell them the same as you have told me, just the opposite way around.
I find the short type names for integers and float hard to read. Somehow the size of the type is more important than if it is a signed integer, unsigned integer or a floating point number.
Using Vec for arrays is also annoying, repeating the mistake from C++.
> Using Vec for arrays is also annoying, repeating the mistake from C++.
Neither Rust nor C++ uses vectors as array, they're distinct things.
An array is a fixed-size object, which never allocates and its elements thus always retain their memory address. In Rust they're `[T; N]`, in C++ `T[N]` or more recently `std::array<T, N>`.
A vector on the other hand is dynamically sized object, that may (re)allocate to accomodate new elements, and the address of their elements can change. In Rust they're `Vec<T>`; in C++ `std::vector<T>`.
I'm aware of how they are used, but fundamentally there is nothing with the words "array" and "vector" that says that one has a fixed size and the other has a dynamic size. If anything, it should be the other way around. Using the name vector for dynamic arrays is annoying when doing any kind of math coding. Even the designer of STL says the name was a mistake.
I genuinely don't understand how you can find the type names for numbers hard to read, nor how you can say the size is treated as more important. The first thing in the type is what kind of number it is!
For me it is the usage of macros and traits everywhere.
Good luck if you want to get into the code of a library to understand what a function does. You have to go through 3 macros and 5 traits across 5 files. When it could have been just a couple function calls.
People don’t stop and think if they really need that trait or macro for five seconds, they just have to use it every time
That is poor practice IMO. The simple reality of the world is that not everyone uses such tools. Designing a language with the expectation they will is deliberately ignoring the needs of some users.
The insert function is very unidiomatic. Instead of defining a cmp closure, you would typically implement Ord and friends.
In the branchy function, id is only compared if distance is equal, and since distance is a random float, this almost never happens and the corresponding branch is nearly perfectly predicted. The branchless function always compares both id and distance, effectively doing twice the work. It's only part of the reason why there's a 2x performance difference, but I thought it was interesting.
I really don't know how LLVM picks between branches or conditional moves, but my guess is that it doesn't assume that float equality is any less likely than other conditions, and some optimization pass in O3 turns unpredictable branches into conditional moves. I base this on the fact that adding std::hint::unlikely to the "equal" branch produces the same assembly for the function in both modes.
https://godbolt.org/z/erGPKaPcx
Whether it's safe to assume in general that float equality is unlikely for the purpose of program optimization, I'll leave to the compiler engineers. If you know the data your program will be handling, adding hints will avoid these surprises.