Chrome's hidden X-Browser-Validation header reverse engineered
(github.com)
348 points by dsekz 3 days ago
Making it easier to reject "unapproved" or "unsupported" browsers and take away user freedom. Trying to make it harder for other browsers to compete.
That can be done already based on User-Agent, though. Other browsers don't spoof their agent strings to look like Chrome, and never have (or, they do, but only in the sense that everyone still claims to be Mozilla). And browsers have always (for obvious reasons) been very happy to identify themselves correctly to backend sites.
The purpose here is surely to detect sophisticated spoofing by non-user-browser software, like crawlers and robots. Robots are in fact required by the net's Geneva Convention equivalent to identify themselves and respect limitations, but obviously many don't.
I have a hard time understanding robot detection as an issue of "user freedom" or "browser competition".
> I have a hard time understanding robot detection as an issue of "user freedom" or "browser competition".
The big one is that running a browser other than Chrome (or Safari) could come to mean endless captchas, degrading the experience. "Chrome doesn't have as many captchas" is a pretty good hook.
> I have a hard time understanding robot detection as an issue of "user freedom" or "browser competition".
In the name of robot detection, you can lock down devices, require device attestation, prevent users from running non-standard devices/OS/software, and prevent them from accessing websites (Cloudflare dislikes non-Chrome browsers and hates non-standard ones; ReCaptcha locks you out if you're not on Chrome-like/Safari/Firefox). Web Environment Integrity[1] is also a good example of where robot detection ends up affecting the end user.
The purpose here isn't to deal with sophisticated spoofing. This is setting a couple of headers to fixed and easily discoverable values. It wouldn't stop a teenager with curl, let alone a sophisticated adversary. There's no counter-abuse value here at all.
It's quite hard to figure out what this is for, because the mechanism is so incredibly weak. Either it was implemented by some total idiots who did not bother talking at all to the thousands of people with counter-abuse experience that work at Google, or it is meant for some incredibly specific case where they think the copyright string actually provides a deterrent.
(If I had to guess, it's about protecting server APIs only meant for use by the Chrome browser, not about protecting any kind of interactive services used directly by end-users.)
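To make that concrete: once the fixed values are known, any HTTP client can replay them verbatim. A hypothetical sketch using Python's requests library; the header values below are illustrative placeholders, not real Chrome output:

    import requests

    # Static values copied from a real Chrome request would work just as
    # well; nothing ties them to the machine or binary that sends them.
    headers = {
        "User-Agent": "Mozilla/5.0 ... Chrome/126.0.0.0 Safari/537.36",
        "x-browser-copyright": "<Google's copyright string, copied verbatim>",
        "x-browser-validation": "<hash observed in a real Chrome request>",
    }
    resp = requests.get("https://example.com/", headers=headers)
    print(resp.status_code)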
I would imagine that this serves the same purpose as the way that early home consoles would check the inserted cartridge to see that it had a specific copyright message in it, because then you can't reproduce that message without violating the copyright.
In this case, you would need to reproduce a message that explicitly states that it's Google's copyright, and that you don't have the right to copy it ("All rights reserved."). Doing that might then give Google the legal evidence it needs to sue you.
In other words, a legal deterrence rather than a technical one.
It's easy to change the User Agent and we cannot handwave this fact away for the sake of argument.
> Why do you think Chrome bothers with these extra headers? Anti-spoofing, bot detection, integrity, or something else?
Bot detection. It's a menace to literally everyone. Not to piss anyone off, but if you haven't dealt with it, you don't have anything of value to scrape or get access to.
> Bot detection. It's a menace to literally everyone. Not to piss anyone off, but if you haven't dealt with it, you don't have anything of value to scrape or get access to.
What leads you to believe that bot developers are unable to set a request header?
They managed fine to set Chrome's user agent. Why do you think something like X-Browser-Validation is off limits?
Because you would need to reproduce an explicit Google copyright statement which states that you don't have the right to copy it ("All rights reserved.") in order to do it fully.
That presumably gives Google the legal ammunition it needs to sue you if you do it.
Bullshit. You don't have anything of value either. Scrapers will ram through _anything_, and figure out if it's useful later.
> 1. Do I understand it correctly that the validation header is individual for each installation?
I'm not sure how you got that impression. It's generated from fixed constants.
https://github.com/dsekz/chrome-x-browser-validation-header?...
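A minimal sketch of the scheme as the write-up describes it: a base64-encoded SHA-1 over a fixed, Chrome-wide API key concatenated with the User-Agent string. The key below is a placeholder, not the real constant:

    import base64
    import hashlib

    API_KEY = "<fixed constant extracted in the write-up>"  # placeholder
    USER_AGENT = "Mozilla/5.0 ... Chrome/126.0.0.0 Safari/537.36"

    def x_browser_validation(api_key: str, user_agent: str) -> str:
        # Same fixed inputs -> same header value for every install.
        digest = hashlib.sha1((api_key + user_agent).encode()).digest()
        return base64.b64encode(digest).decode()

    print(x_browser_validation(API_KEY, USER_AGENT))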
This should be somewhat alarming to anyone who already knows about WEI.
I wonder if "x-browser-copyright" is an attempt at trying to use the legal system to stifle competition and further their monopoly. If so, have they not heard of Sega v. Accolade ?
I'm a bit amused that they're using SHA-1. Why not MD5, CRC32, or (as the dumb security scanners would recommend) even SHA256?
I am also alarmed. Google has to split off its development of both Chrome and Android now, this crazy vertical integration is akin to a private company building and owning both the roads AND the cars. Sure, you can build other cars, but we just need to verify that your tires are safe before you can drive on OUR roads. It's fine as long as you build your car on our complete frame, you can still choose whatever color you like! Also, the car has ads.
Ok, but the road here is the internet; how much of that does google/alphabet actually own?
All of YouTube. The vast majority of email. All sources of revenue for ad-funded sites, basically, except for those ads pushed by Meta in their respective walled gardens. They are also the gatekeepers deciding what parts of the internet the users actually see, and they continuously work towards preventing people from actually visiting other sites by siphoning off information and keeping users on Google (AMP, AI summaries). The whole Play Store ecosystem is a walled garden which pretends to be open by building on an ostensibly open source OS but adding strict integrity checks on top, which gives Google the ultimate power to decide what is allowed to run on people's phones.
They don't have to own the servers and the pipes if they own all the clients, sources of revenue, distribution platforms and financial transaction systems.
> how much of that does google/alphabet actually own?
A ton. They have stakes in a bunch of submarine cables, their properties (YouTube, Maps, Google Search) make up a large share of Internet traffic, they are via Google Search the chief traffic source for most if not all websites, and they own a large CDN as well as one of the three dominant hyperscalers...
> I wonder if "x-browser-copyright" is an attempt at using the legal system to stifle competition and further their monopoly. If so, have they not heard of Sega v. Accolade?
My first thought was the Nintendo logo used for Gameboy game attestation.
I wonder what a court would make of the copyright header. What original work is copyright being claimed for here? The HTTP request? If I used Chrome to POST this comment, would Google be claiming copyright over the POST request?
SHA-1 is a head-scratcher for sure.
I can only assume it's the flawed logic that it's "reasonably secure, but shorter than sha256". Flawed because SHA1 is broken, and SHA256 is faster on most hardware, and you can just truncate your SHA256 output if you really want it to be shorter.
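For illustration, truncation really is a one-liner; a SHA-256 digest cut to 20 bytes matches SHA-1's output length without SHA-1's known collision weakness:

    import hashlib

    data = b"example input"
    sha1_digest = hashlib.sha1(data).digest()          # 20 bytes, broken
    sha256_trunc = hashlib.sha256(data).digest()[:20]  # 20 bytes
    assert len(sha1_digest) == len(sha256_trunc) == 20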
SHA-1 is broken for being used in digital signature algorithms or for any other application that requires collision resistance.
There are a lot of applications for which collision resistance is irrelevant and for which the use of SHA-1 is fine, for instance in some random number generators.
On the CPUs where I have tested this (with hardware instructions for both hashes, e.g. some Ryzen and some Aarch64), SHA-1 is faster than SHA-256, though the difference is not great.
In this case, collision resistance appears irrelevant. There is no point in finding other strings that will produce the same validation hash. The correct input strings can be obtained by reverse engineering anyway, which has been done by the author. Here the hash was used just for slight obfuscation.
The perf difference between SHA1 and SHA256 was marginal on the systems I tested (3950x, M1 Pro), which makes SHA256 a no-brainer to me if you're just picking between those two (collision resistance is nice to have even if you "don't need it").
You're right that collision resistance doesn't really matter here, but there's a fair chance SHA1 will end up deprecated or removed from whatever cryptography library you're using for it, at some point in the future.
WEI? As in Windows Experience Index? Can you elaborate?
Web Environment Integrity: https://en.wikipedia.org/wiki/Web_Environment_Integrity
Probably any cryptographic hash function would have done.
My suspicion is that what they're trying to do here is similar to e.g. the "Readium LCP" DRM for ebooks (previously discussed at [1]): A "secret key" and a "proprietary algorithm" might possibly bring this into DMCA scope in a way that using only a copyrighted string might not.
> have they not heard of Sega v. Accolade?
My mind went here immediately as well, but some details are subtly different. For example being a remote service instead of a locally-executed copy of software, Google could argue that they are materially relying on such representation to provide any service at all. Or that without access to the service's code, someone cannot prove this string is required in order to interoperate. It also wouldn't be the first time the current Supreme Court took advantage of slightly differing details as an excuse to reject longstanding precedent in favor of fascism.
I have to imagine Google added these headers to make it easier for them to identify agentic requests vs human requests. What angers me is that this is yet another signal that can be used to uniquely fingerprint users.
How does that work, though? I have a bunch of automated tasks I use to speed up my workflows, but they all run on top of the regular browser that I also use. I don't see how this war is winnable (not without tracking things like micro-movements of the mouse that might be caused by being a human, etc.).
It doesn't really meaningfully increase the fingerprinting surface. As the OP mentioned, the hash is generated from constants that are the same for all Chrome builds. The only thing it really does is help distinguish Chrome from other Chromium forks (e.g. Edge or Brave), but there are already enough proprietary bits inside Chrome that you can easily tell it apart.
> The only thing it really does is help distinguish Chrome from other Chromium forks (e.g. Edge or Brave)
You could already do that with the user agent string. What this does is distinguish between Chrome and something else pretending to be Chrome. Like, say, a Firefox user who is spoofing a Chrome user agent on a site that blocks, or reduces functionality for, the Firefox user agent.
Plenty of bots pretend to be Chrome via user agent, but if you look closely are actually running Headless Chromium. This is a very useful signal for fraud and abuse prevention.
I spoof User-Agent and TLS/browser fingerprints all day. These are the basics. None of this bothers me, tbh; I'm constantly running tests on lots of versions of Chrome, Firefox and Brave and haven't really seen any impact on bot detection. I do a lot of browser emulation of other browsers in Chrome. PerimeterX/HUMAN seems to be the only WAF that is really good at catching this.
User-agent discrimination has been happening for literally decades at this point, but you're right that this could make things worse.
User-agent discrimination is tolerable when it's Joe Webmaster doing it out of ignorance. It is not acceptable if it is being used by a company leveraging their dominant position in one market to gain an advantage over its competitors in another market. It's not acceptable even if it's not said company's expressed intent to do so but merely a "happy accident" that is getting "overlooked".
Indeed, even for those who require a round of mental gymnastics before they concede that monopolies are, like, "bad" or whatever, GP points out precisely how this would constitute "consumer harm".
Seems unnecessary.
The same policies also offer the ability to force-install an official Google "Endpoint Verification" Chrome extension which validates browser/OS integrity using Enterprise Chrome Extension APIs ("chrome.enterprise") [0] that are only available in force-installed enterprise extensions.
FWIW, in my years of managing enterprise chrome deployments, I haven't come across the feature to force people to use Chrome (there are a lot of settings, maybe I've missed this one). But, there definitely is the ability to prevent users from mixing their work and non-work gmail accounts in the same chrome profile.
[0] https://developer.chrome.com/docs/extensions/reference/api/e...
Edit: Okay, maybe one hole in my logic is the first sign-in experience. When signing into Google for the first time in a new Chrome browser, the force-installed extension wouldn't be there yet. Although Google could hypothetically still allow the login initially, but then abort/cancel the sign-in process as part of the login flow if the extension doesn't sync and install (indicating non-Chrome use).
This might be their “context aware” security feature. Which can prevent access to certain things based on device, browser, etc.
I don’t see why any of that can’t rely on a chrome extension implementation using the privileged APIs to verify OS, Browser, etc. Struggling to understand why they need special headers for any of this functionality.
Why would they think this was a good idea after losing the Chrome antitrust trial? I don't know what the intended purpose of this is, but I can see several ways it could be used in an anti-competitive way, although now that it has been reverse engineered, an extension could spoof it. On the other hand, I wonder if they intend to claim the header is a form of DRM and such spoofing is a DMCA violation...
http://en.wikipedia.org/wiki/Sega_Enterprises_Ltd._v._Accola... is the legal precedent that says trying to do that won't work, but then again maybe Google thinks it's invincible and can do whatever it wants after it ironically defeated Oracle in a case about interoperability and copyright.
Apple famously does this with this word soup in their SMC chips, and proceeded to bankrupt a company that sold Hackintoshes and shipped it in their EFI: https://en.wikipedia.org/wiki/Psystar_Corporation
    Our hard work
    by these words guarded
    please don't steal
    (c) Apple Computer Inc
Though one could argue that they would have probably bankrupted them anyway even if they hadn't done that.

I think it's difficult to argue that Google doesn't have the right and capability to build their own private internet; I just also think they'd like to make the entire internet their own private internet, and do away with the public internet, and I'd really prefer they not do that.
Two questions:
Which version of chrome is the first to implement these headers?
What are the potential effects of these headers on chromium forks, e.g. ungoogled chromium?
Not really. It's just an API key + the user agent. There is no mechanism to detect that the browser hasn't been tampered with. If you wanted to do that, you'd at least include a hash over the browser binary, or better yet the in-memory application binary.
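A sketch of that stronger variant, hashing the installed binary; the path and the idea of sending the digest along are hypothetical:

    import hashlib

    CHROME_PATH = "/opt/google/chrome/chrome"  # example path, varies by OS

    def binary_digest(path: str) -> str:
        # Stream the file so a large binary needn't fit in memory.
        h = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    print(binary_digest(CHROME_PATH))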
That would provide no extra capability. Anybody smart enough to modify the Chrome executable could just patch the hash generation to also return a static (but correct) hash.
And why should anyone with a sane mind (except for Googlers) allow this kind of validation bs to exist?
At this point I am fully convinced that Google is abusing Chrome's dominant position to push their own agenda and shape the whole Internet the way they want. Privacy sandbox, manifest v3, you name it.
Sadly nobody can do anything about it, so far. We'll yet need to see the outcome of the antitrust trial.
If you were using a user agent spoofing extension couldn't this be used to guess your "real" UA?
It looks like it's an SHA hash, so working backwards would probably be prohibitively irritating.
It's not all that small, although probably small enough to make a rainbow table or something.
You would have to maintain the code to generate character-perfect strings (or maybe just keep a very large library of the current most popular ones) and also make sure you have the up-to-date API key salt values (which they're probably going to start rotating regularly), which, as I said before, wouldn't be impossible, just prohibitively irritating to maintain for comparatively little benefit.
And besides, it won't be too long before people just start spoofing the hash too; that would probably take less time than getting the generator up and running.
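A sketch of that lookup-table idea, assuming the hashing scheme from the write-up and a hypothetical shortlist of popular User-Agent strings:

    import base64
    import hashlib

    API_KEY = "<fixed constant from the write-up>"  # placeholder

    # Hypothetical: a short list covers most real-world Chrome installs.
    POPULAR_UAS = [
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ... Chrome/126.0.0.0 ...",
        "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ... Chrome/126.0.0.0 ...",
    ]

    def validation(ua: str) -> str:
        digest = hashlib.sha1((API_KEY + ua).encode()).digest()
        return base64.b64encode(digest).decode()

    # Precompute hash -> UA once; inverting an observed header is one lookup.
    table = {validation(ua): ua for ua in POPULAR_UAS}
    print(table.get("<observed x-browser-validation value>"))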
Is an "api key" like this covered by copyright? Would that technically mean that spoofing this random sequence of numbers would require me to agree to whatever source license they offer it under, since I wouldn't know the random sequence unless I read it in their source?
That's an odd possibility.
Dug into chrome.dll and figured out how the x-browser-validation header is generated. Full write-up and PoC code here: https://github.com/dsekz/chrome-x-browser-validation-header
Why do you think Chrome bothers with these extra headers? Anti-spoofing, bot detection, integrity, or something else?