Comment by JohnMakin 2 days ago

31 replies

One of the crazier things an L4 Meta colleague of mine told me, that I still don’t believe entirely, is that Meta pretty much has their own fork of everything, even tools like git. Is this true?

tqi 2 days ago

Facebook actually doesn't use git, they use mercurial (https://graphite.dev/blog/why-facebook-doesnt-use-git).

That decision is also illustrative of why they end up forking most things - Facebook's usage patterns are at the far extreme end for almost any tool, and things that are non-issues with fewer engineers or a smaller codebase become complete blockers.
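
To make the scaling point concrete, here is a back-of-the-envelope sketch, not a measurement of any real tool. It assumes a hypothetical operation that does a fixed amount of work per tracked file; the 10 microsecond per-file cost and the file counts are made-up numbers for illustration only.

```python
# Hypothetical cost model: an operation that touches every tracked file once.
PER_FILE_COST_SECONDS = 10e-6  # assumed 10 microseconds per file, not measured


def estimated_seconds(file_count: int, per_file_cost: float = PER_FILE_COST_SECONDS) -> float:
    """Estimate wall-clock time for a linear, per-file operation."""
    return file_count * per_file_cost


if __name__ == "__main__":
    # Illustrative repository sizes, not figures from any real company.
    sizes = [
        ("small repo", 10_000),
        ("large repo", 1_000_000),
        ("huge monorepo", 100_000_000),
    ]
    for label, files in sizes:
        print(f"{label:>14}: {files:>11,} files -> {estimated_seconds(files):10.1f} s")
```

Under these assumed numbers the same operation goes from a tenth of a second to over a quarter of an hour, which is the kind of jump that turns a non-issue into a blocker.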

  • kridsdale3 2 days ago

    Yes, when I used to talk about this to interviewees, I described how every tool people commonly use sits somewhere on the Big-O curves for scaling. Most of the time we don't really care if a tool is O(n) or O(10 n) or whatever.

    At Meta, N tends to be hundreds of billions to hundreds of trillions.

    So your algorithm REALLY matters. And git has a Big-O that is worse than Mercurial, so we had to switch.

    • steventhedev a day ago

      I'm gonna disagree with you there. The difference was with stat patterns, and the person at Facebook who ran the tests had something wrong with the disk setup that was causing it to run slowly. They ignored multiple responses that reproduced very different results.

      Nail in the coffin on this was a benchmark GitHub ran two years ago that got the results that FB should have: git status within seconds.

      Facebook didn't use mercurial because of big O, they used it because of hubris and a bad disk config.

      • sangnoir a day ago

        > Facebook didn't use mercurial because of big O, they used it because of hubris and a bad disk config.

        Half-remembering a blog post I read - the git maintainers also wouldn't give Facebook the time of day on code changes to accommodate FB's requirements. Mercurial was more amenable. This also disproves the "Facebook has a fork of everything" claim, because they attempted to upstream the changes they wanted.

      • deadmutex a day ago

        This sounds plausible, but would love a source

        • steventhedev a day ago

          I should probably just write it up into a post, but the git mailing list at the time is the source (I remember reading it from the side a few months after convincing our VP R&D to switch from svn to git). We were chuckling around the same time that FB had to reallocate the stack on Galaxy S2 phones because they were somehow unaware of proguard or unable to have it work properly with their codegen.

          Anyways:

          1. Github benchmark: https://github.blog/engineering/infrastructure/improve-git-m...

          2. The original email thread: https://public-inbox.org/git/CB04005C.2C669%25joshua.redston...

          3. There's another email thread that gets linked everywhere - but in light of the prior thread, the numbers don't track: https://public-inbox.org/git/CB5074CF.3AD7A%25joshua.redston...

          I recall there being a message from someone either at AirBnB or Uber who mentioned that they have a similar monorepo but without the slow git status, but can't seem to find it now - it's likely on one of the other mailing list archives but didn't make it to this one.

          Point being that painting this as "the community was hostile" or "git is too slow for FB" is just disingenuous. The FB engineer barely communicated with the git team (at least publicly), and when there was communication, it was pushing a single benchmark that was deeply flawed. They then ignored feedback both on how to improve the performance of slow blame and commit by repacking checkpoint packfiles (a one-off effort) and on the fact that the benchmark numbers didn't make sense in absolute terms.

    • master_crab a day ago

      If git is blocking you, you are using it wrong. Lotta instances of people treating it as an artifact repository. Use it correctly with a branching strategy that works for your use case and it's bulletproof.

      Plenty of other customers with the same magnitude problems as Meta are using Git perfectly fine.

      • quicklime a day ago

        Who are the others with the same magnitude as Google and Meta’s monorepos?

      • KaiserPro a day ago

        > Plenty of other customers with the same magnitude problems as Meta are using Git perfectly fine.

        I mean, there aren't. There are perhaps three places that have the same scale problem.

        A monorepo for a place with about 50k developers, that has been operating at that scale for 5 years.

        The current checkout if not sparse would be >80gigs

        The commit rate is > 20 a second.

        no amount of branching strategy is going to help on that.

        I love git, I've used it professionally since 2010, but git is not a good fit for something _massive_

  • LarsDu88 2 days ago

    They use Sapling, an in-house clone of Mercurial that was open sourced 2 years ago.

  • herval a day ago

    FB uses mercurial _for most things_, but like any company that size, there's teams that use git and even teams that use perforce

ipsum2 2 days ago

Yep. Zeus is a fork of Zookeeper, Hack is a fork of PHP, etc. It's usually needed to make it work with the internal environment.

The few things that don't have forks are usually the open source projects like React or PyTorch, but even those have some custom features added to make it work with FB internals.

  • gcr 2 days ago

    This is also how things work at Google.

    Google also maintains a monorepo with "forks" of all software that they use. History diverges, but is occasionally synchronized for things like security updates etc.

    • zhengyi13 2 days ago

      Am I completely off-base/confused thinking that the GFE originally started life (like back under csilver) as a fork of boa[0]?

      [0]: http://www.boa.org/

      • lacker 2 days ago

        I thought it was GWS that originally started as a fork of boa.

  • grantsucceeded 2 days ago

    Few companies experienced the explosive growth FB did, though many will claim to have done so. As I recall, Hack made the existing PHP codebase scale to insane levels while the company reached escape velocity, before it could even attempt to transition away from or shrink the PHP codebase (I was an SRE, not a dev).

    Zeus likewise.

    • ipsum2 2 days ago

      You worked at FB, but you call yourself an SRE, not a PE? ;)

  • ahupp a day ago

    nit: HHVM was a completely new implementation of a runtime for a PHP-like language, it wasn't a fork of Zend.

jamra 2 days ago

Meta doesn't use git. It uses mercurial. It does fork it because they have a huge monorepo. They created a concept of stacked commits which is a way of not having branches. Each commit is in a stack and then merged into master. Lots of things built for scaling.
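
A toy model of the stacked-commit idea described above, to show the shape of the workflow rather than how Meta's tooling actually works. The `Commit` and `Repo` classes and the `land_bottom` helper are hypothetical names invented for this sketch, and real tools also rebase, review, and test each commit, which is omitted here.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Commit:
    """A single reviewable change (hypothetical model, not a real VCS object)."""
    title: str


@dataclass
class Repo:
    master: List[Commit] = field(default_factory=list)  # landed history
    stack: List[Commit] = field(default_factory=list)   # pending commits, bottom first

    def add_to_stack(self, title: str) -> None:
        """Put a new commit on top of the current stack instead of opening a branch."""
        self.stack.append(Commit(title))

    def land_bottom(self) -> Commit:
        """Land the oldest pending commit onto master; the rest stay stacked."""
        if not self.stack:
            raise ValueError("nothing to land")
        commit = self.stack.pop(0)
        self.master.append(commit)
        return commit


if __name__ == "__main__":
    repo = Repo()
    for title in ["add schema", "add API", "add UI"]:
        repo.add_to_stack(title)
    repo.land_bottom()
    print("master:", [c.title for c in repo.master])
    print("stack: ", [c.title for c in repo.stack])
```

In this sketch each commit lands on master one at a time from the bottom of the stack, so there is never a long-lived branch to merge.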

sdenton4 2 days ago

It wouldn't be terribly surprising. Forking everything provides a liiiitle bit of protection against things like the 'left pad' incident.
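
A rough sketch of why an internal copy helps, assuming a hypothetical resolver that checks an internal store before the public one. The `Registry` class, the `resolve` function, and the package contents are all invented for illustration and do not correspond to any real package manager's API.

```python
from typing import Dict, Optional


class Registry:
    """In-memory stand-in for a package source (hypothetical)."""

    def __init__(self, packages: Dict[str, str]) -> None:
        self._packages = dict(packages)

    def fetch(self, name: str) -> Optional[str]:
        """Return the package source, or None if it is not available."""
        return self._packages.get(name)

    def unpublish(self, name: str) -> None:
        """Simulate an author removing a package from this registry."""
        self._packages.pop(name, None)


def resolve(name: str, internal: Registry, public: Registry) -> str:
    """Prefer the internal copy; fall back to the public registry."""
    source = internal.fetch(name)
    if source is None:
        source = public.fetch(name)
    if source is None:
        raise LookupError(f"package {name!r} is not available anywhere")
    return source


if __name__ == "__main__":
    public = Registry({"left-pad": "public copy"})
    internal = Registry({"left-pad": "internal copy"})
    public.unpublish("left-pad")  # the upstream package disappears
    print(resolve("left-pad", internal, public))  # still resolves internally
```

The protection is only as good as the internal copy: a package that was never mirrored or forked still fails to resolve once it vanishes upstream.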

  • 3eb7988a1663 2 days ago

    Left pad was from the creator pulling the code from the public source forge, not from a destructive code change.

    I assume all of the big tech companies host internal mirrors of every single code dependency + tooling. Otherwise they could not guarantee that they can build all of their code.