Bento: Jupyter Notebooks at Meta

(engineering.fb.com)

218 points by Maro a year ago

sk11001 a year ago

I kind of love Meta for all the seemingly unnecessary internal stuff they do. They have so many projects that are absolutely not critical for them, maybe not even net positive, but they spend who knows how many hours building and maintaining them.


apwell23 a year ago

> Meta for all the seemingly unnecessary internal stuff they do.
Netflix would like to have a word.

- Narhem a year ago
  
  Netflix’s situation is caused by their business model.
  
  
  fwip a year ago
  
  Is it? It seems like 90% of what Netflix is (from a technical PoV), is a CDN + video playback. There's a lot more value in the content library they've negotiated and the business agreements with ISPs than there is in the software stack.
  Apologies if this response is delayed, 6 posts today is "too fast."
  
bbor a year ago

Internal startups have the same value proposition as external ones, I think; most fail, but every once in a while you hit a React or a Gmail.


talles a year ago

Tanya Rai - Introducing Bento: Jupyter Notebooks @ Facebook | JupyterCon 2020 : https://www.youtube.com/watch?v=f3UfVX4_PD4


fauria a year ago

Can this be downloaded somewhere?

Couldn't find any link in the open source site: https://opensource.fb.com/ nor the ELI5: https://developers.facebook.com/blog/post/2021/09/20/eli5-be...


tqi a year ago

TBH the value of bento over other notebook offerings was almost entirely how well it plays with the rest of the data and infra stack within facebook. It was super easy to go from raw data (entire DE and DI orgs responsible for ETL and cluster maintenance) to a cleaned up table (usually built by DEs) to an ad hoc table to support a specific use-case that could then be accessed via bento, analyzed, and then published / shared to anyone in the company.

jamra a year ago

If you use JupyterLite, you're using the same thing. Bento is just the internal Meta version, and the only potential benefit is the internal integration.

ipsum2 a year ago

Probably not. It's written in Hack, and heavily tied to internal frameworks, so it'll be practically impossible to extract into a standalone package, unless they do a "clean room" implementation (like they did for Sapling UI https://sapling-scm.com/docs/addons/isl/).
But it has some cool features that notebook developers can take inspiration from.

michaelmior a year ago

I don't believe Bento has been open-sourced.

- make3 a year ago
  
  interesting that they make external articles about it
  
  
  rovr138 a year ago
  
  "Oh that's cool.", "It'd be interesting to work on problems like that.", "That's a neat solution"
  If anyone's on the fence about applying, that could be enough to nudge them in the direction. If anyone's worked in similar areas, could be worth applying and looking at the team, etc.
  

quantadev a year ago

The original "Block Editor" (that Jupyter modeled itself after) is the one that's now called "Quanta", and has been around for decades in various forms and incarnations:

https://github.com/Clay-Ferguson/quantizr

I'm thinking that Jupyter might still not be "Tree Based" but that would be a heck of a leap in capability if they "fix" that.


chthonicdaemon a year ago

I always thought Jupyter was based on other notebook-style interfaces like Mathematica or Maple.

- quantadev a year ago
  
  I meant the "block editor" aspect, like how individual chunks of text and images can be independently selected and moved around or even shared with their own URL.
  I've long believed some system like that could and should some day replace even HTML and the web, and that it'll only happen if the Semantic Web ever takes off in a big way where chunks of stuff are "typed" (like a Type-Safe Web). Even Tim Berners Lee has been dreaming of this for decades, but the world is still stuck in HTML-land for the foreseeable future.
  
- mkl a year ago
  
  Yes, but it's closer to Sage (browser and Python based).
  I don't know what quantadev is thinking of, but Quanta seems totally different and not a programming notebook at all. Its README also claims "Quanta is a new kind of platform with a new kind of architecture", while quantadev claims it "has been around for decades".
  

tantalor a year ago

Glad to see people using the term "serverless" to mean "actually without a server" instead of what other places are doing.


talles a year ago

I wish more people used marimo, so much better than jupyter


akshayka a year ago

For the curious: https://github.com/marimo-team/marimo

mkl a year ago

Show HN from 8 months ago: https://news.ycombinator.com/item?id=38971966


mhh__ a year ago

I've been using Marimo along these lines recently. I'm a fan. So so glad to not use Jupyter.


woadwarrior01 a year ago

Having used both Bento and later Colaboratory, a few years ago, I think I liked the latter a lot more. Google's internal tools are usually much more polished and better-designed, perhaps because they've been around for much longer.


Fraterkes a year ago

A bit off-topic, but my problem with any notebook type of tool (i.e. you create a document that mixes code, the output of that code, and text/media) is that they always feel like they're meant to be these quick, off-the-cuff ways to present data. But when I try to use them they just feel awkward and slow. (I tried doing a Jupyter notebook with the VS Code plugin, and while everything was very polished, it felt like I was ponderously coding in Word or something. The same was true for R notebooks in RStudio. Maybe it's a better experience if you have a decently fast laptop.)


taeric a year ago

I'm assuming you've seen https://www.youtube.com/watch?v=7jiPeIFXb6U&t=61s? I know I found it far more amusing than I should have when it was released.
I will confess that I found Mathematica kind of neat back in the day. I never got as good with it as peers did. I'm curious if that would be different for me today.

- 3eb7988a1663 a year ago
  
  That video cannot be seen without watching Jeremy Howard's rebuttal: I Like Notebooks. I also believe this was the video that got him kicked out of a conference(?) because it was too confrontational? Which was just ugly for a guy who clearly loves being an educator.
  [0] https://www.youtube.com/watch?v=9Q6sLbz37gk
  
lamename a year ago

IME notebooks in VS Code are even worse (but improving). Jupyter lab is faster...but that depends on how fast you prefer ;)

- wenc a year ago
  
  I have the exact opposite experience: VS Code notebooks are much snappier and are possibly the best Jupyter implementation I’ve ever used (better and more responsive than vanilla Jupyter or JupyterLab).
  VS Code notebooks also support LSPs with refactoring, typing, etc. Black is supported. Step-by-step debugging is supported. Venv is built in.
  There are so many conveniences in VS Code that whenever I have to use JupyterLab I feel a lot of stuff is missing.
  
  
  3eb7988a1663 a year ago
  
  I agree with you that the VSCode experience feels superior. It integrates a lot of the other various IDE widgets into the notebook experience. Code formatting, variable definitions, spell checker, non-garbage tier code hints, etc. The little timer noting the time it takes to run a cell alone is a huge boon.
  My only complaint is how white space heavy the VSCode layout is by default. Probably can be customized, but I have never dug into it.
  
  adolph a year ago
  
  Killer feature of VS Code notebooks is Vim keybindings. It also manages movement between cells, so you have to be very aware of the current mode.
  
  
  dmurray a year ago
  
  Hitting Escape in normal mode takes you out of editing the cell and into "notebook manipulation mode" instead. This is so counter to the way Vim normally works - Esc should leave you in normal mode no matter where you started - that I found it almost unusable until I realised I could just remap that binding. I made it Shift-Esc and am very happy with it now.
  
Fraterkes a year ago

Also I always think it's a little sad that Jupyter was one of the best shots for Julia to get more mainstream attention, and instead the notebooks people write are basically exclusively Python.

- paddy_m a year ago
  
  Also the Julia people wrote their own notebook system called Pluto. Which is so on brand for them. It might be technically better, but they miss out on the whole jupyter ecosystem, further isolating the language.
  
wenc a year ago

Sounds like you’ve diagnosed your issue in the last line.
Notebooks are usually not inherently slow — I use Jupyter in VS Code running off a remote server and it’s snappy.
I have a MacBook Pro 2020.

bsimpson a year ago

I've only used them in Colab, which feels a lot like a Codepen. It's a self-contained scratchpad that's easily linkable to send to others.

zeofig a year ago

I have to admit that I hate them and view them as abominations. But that's just my personal opinion.

bsimpson a year ago

I love that notebooks started as a student hacking together a Python fork and now they're core infrastructure for all these places trying to make sense of GenAI.

kyrrewk a year ago

this is cool! wish there was a commercial product that did this. marimo does something similar, but you have to do the deployment yourself


mscolnick a year ago

marimo has a playground to run notebooks via WebAssembly - similar to Bento - without having to deploy yourself: https://marimo.app/


big-chungus4 a year ago

can I, a mere mortal, use it?


web3aj a year ago

The internal tools at Meta are incredible tbh. There’s an ecosystem of well-designed internal tools that talk to each other. That was my favorite part of working there.


Random_BSD_Geek a year ago

Polar opposite of my experience. To achieve the technical equivalent of changing a lightbulb, spend the entire day wrangling a dozen tools which are broken in different ways, maintained by teams that no longer exist or have completely rolled over, only to arrive at the finish line and discover we don't use those lightbulbs anymore. Move things and break fast.

- loeg a year ago
  
  IMO there's a mix of a few really good, widely used, well-supported tools as well as a long tail of random tiny tools where the original team is gone that are cruftier.
  
- extr a year ago
  
  Yeah 100%. I found it immensely frustrating to be using tools with no community (except internally), so-so documentation, and features that were clearly broken in a way that would be unacceptable for a regular consumer product. If you have a question or error not covered by an internal search or documentation, good luck, you'll need it. Literally part of the reason I left the company.
  
  
  landedgentry a year ago
  
  Well, you're supposed to read the code and figure it out. And if you can't, you're not good enough an engineer. According to people at Meta.
  
  
  zer0zzz a year ago
  
  Agreed. I often get my work done using open source build instructions and tools, and then when everything works I port it to internal infra. Other people do the opposite though, which for open-source-based codebases has the nasty side effect of the work having no upstreamable tests!
  
- uuddlrlrbaba a year ago
  
  Mmm breakfast
  
  
  grantsucceeded a year ago
  
  haha the reason I stayed as long as i did
  
- aprilthird2021 a year ago
  
  But you're both talking about different things. The tools are both often left in disuse, lacking documentation, etc. But they also have a really tight integration with each other that allows for unparalleled visibility and ability over enormous systems with many moving parts.
  
- bozhark a year ago
  
  Move Smooth and Fix Things (tm) is our nonprofit corporation’s version of this atrocious motto.
  
- ElonChrist a year ago
  
  [dead]
  
  
  ec109685 a year ago
  
  Large checkouts is a solved problem now https://github.com/facebook/sapling/blob/main/eden/fs/docs/O...
  
moandcompany a year ago

My opinion: Many Meta tools and processes seem like they were created by former Googlers that sought to recreate something they previously had at Google, during the Google->FB Exodus, but also changed aspects of the tool that were annoying or diverged from their needs. This is not a bad thing.
Since Bento doesn't appear to be usable by the public, a parallel that lets people get a feel for cross-tool integration would be Google's Colaboratory / Colab notebooks (https://colab.research.google.com/), which have many baked-in integrations driven by actual internal use (i.e. dogfooding).

- kridsdale3 a year ago
  
  As someone from both, I confirm/support your opinion 100%.
  
- mark_l_watson a year ago
  
  I agree, the paid for Pro version of Colab just seems to have the features I need. I often use it because it simply saves me time and hassles.
  
KaiserPro a year ago

You and I must be working in different areas.
For any kind of general Python/C++ work, it's a _massive_ pain.
The integrated debugger rarely works, and it's a 30-minute recompile to figure that out. The documentation for actually being efficient in build/run/test is basically "ask the old guy in the corner". You'd best hope they know and are willing to share.
The code search is great! The downside is that nobody bothers to document stuff, so that's all you've got. (Comments/docstrings are for weaklings, apparently.)
You want to use a common third-party library? You'd best hope it's already ingested, otherwise you're going to be spending the next few days trying to get that into the codebase. (Yes, there are auto tools; no, they don't always work.) Also, you're now on the hook to do security upgrades.

JohnMakin a year ago

One of the crazier things an L4 Meta colleague of mine told me, which I still don’t entirely believe, is that Meta pretty much has their own fork of everything, even tools like git. Is this true?

- tqi a year ago
  
  Facebook actually doesn't use git, they use mercurial (https://graphite.dev/blog/why-facebook-doesnt-use-git).
  That decision is also illustrative of why they end up forking most things: Facebook's usage patterns are at the far extreme end for almost any tool, and things that are non-issues with fewer engineers or a smaller codebase become complete blockers.
  
  
  kridsdale3 a year ago
  
  Yes when I used to talk about this to interviewees, I described that every tool people commonly use is somewhere on the Big-O curves for scaling. Most of the time we don't really care if a tool is O(n) or O(10 n) or whatever.
  At Meta, N tends to be hundreds of billions to hundreds of trillions.
  So your algorithm REALLY matters. And git has a Big-O that is worse than Mercurial, so we had to switch.
  
  
  LarsDu88 a year ago
  
  They use sapling. An in-house clone of mercurial that was open sourced 2 years ago
  
  
  herval a year ago
  
  FB uses mercurial _for most things_, but like any company that size, there's teams that use git and even teams that use perforce
  
- ipsum2 a year ago
  
  Yep. Zeus is a fork of Zookeeper, Hack is a fork of PHP, etc. It's usually needed to make it work with the internal environment.
  The few things that don't have forks are usually the open source projects like React or PyTorch, but even those have some custom features added to make it work with FB internals.
  
  
  gcr a year ago
  
  This is also how things work at Google.
  Google also maintains a monorepo with "forks" of all software that they use. History diverges, but is occasionally synchronized for things like security updates etc.
  
  
  grantsucceeded a year ago
  
  Few companies experienced the explosive growth fb did, though many will claim to have done so. Hack made the existing codebase of php scale to insane levels while reaching escape velocity for the overall company to even attempt to transition away or shrink the php codebase, as i recall (i was an SRE, not a dev)
  zeus likewise.
  
  
  ahupp a year ago
  
  nit: HHVM was a completely new implementation of a runtime for a PHP-like language, it wasn't a fork of Zend.
  
- jamra a year ago
  
  Meta doesn't use git. It uses mercurial. It does fork it because they have a huge monorepo. They created a concept of stacked commits which is a way of not having branches. Each commit is in a stack and then merged into master. Lots of things built for scaling.
  
- sdenton4 a year ago
  
  It wouldn't be terribly surprising. Forking everything provides a liiiitle bit of protection against things like the 'left pad' incident.
  
  3eb7988a1663 a year ago
  
  Left pad was from the creator pulling the code from the public source forge, not from a destructive code change.
  I assume all of the big tech companies host internal mirrors of every single code dependency + tooling. Otherwise they could not guarantee that they can build all of their code.
  
crabbone a year ago

A friend of mine is doing his PHD while being an intern at Meta. He does not share your excitement... at all. To summarize his complaints: a framework written a long while ago with design flaws that were cast in stone, that requires exorbitant effort to accomplish simple things (under the pretense of global integration that usually isn't needed, but even if it was needed, would still not work).

- almostgotcaught a year ago
  
  > A friend of mine is doing his PHD while being an intern at Meta
  I interned thrice as phd student at FB. your friend isn't entirely wrong but also just doesn't have enough experience to judge. all enormous companies are like this. FB is far and away better than almost all such companies (probably only with the exception of Google/Netflix).
  
  
  jonathanyc a year ago
  
  Agreed. I'm reading some complaints in the thread about being told to "just read the source code" for internal tools at Meta. When I worked at Apple we didn't even get the source code!
  
  
  crabbone a year ago
  
  I don't see why saying that Facebook's tools are bad should be invalidated by saying that Google's or others' tools are bad too. Google being bad doesn't vindicate or improve Facebook tools. There's no need for perspective: if it doesn't work well for what it's designed to do, then that's all there is to it.
  
  
  almostgotcaught a year ago
  
  > Google's or others' tools are bad too
  lol bruh read my response again - FB's and Google's and Amazon's tools are lightyears ahead of #ARBITRARY_F100_COMPANY. you haven't a clue what "bad" means if you've never worked in a place that has > 1000 engineers.
  
- sangnoir a year ago
  
  How long has he been interning? Is it long enough for him to have learned the timescale big-tech roadmaps operate on? If he wants a feature, he better write it himself (if his PR doesn't conflict with an upcoming rewrite, coming "soon"), or lobby to get it slotted for the second quarter of 2026.
  
  
  crabbone a year ago
  
  He started right about the time COVID started, so... about four years now, I think. I'm not sure if those were contiguous though.
  I'm not sure what your idea about PRs and features has to do with the above... he's not there to work on the internal infra framework. He's there for ML stuff. Unfortunately, the road to the latter goes through the former, but he's not really the kind of programmer who'd deal with Facebook's infrastructure and plumbing.
  The point is, it's inconvenient. Whether it's inconvenient because Facebook works on a five-year-plan basis or for whatever other reason doesn't really matter. It's just not good.
  I also have no problems admitting that all big companies (two in total, one being Google) I worked for so far had bad internal tools. I don't imagine Facebook is anything special in this respect. I just don't feel like it's necessary to justify it in any way. It's just a fact of life: large companies have a tendency to produce bad internal tools (but small often have none whatsoever!) It's a water is wet kind of thing...
  
  
  sangnoir a year ago
  
  > I'm not sure what your idea about PRs and features has to do with the above... he's not there to work on the internal infra framework.
  My idea is that if he's not making the monorepo codebase changes himself, he's going to have to wait an awfully long time for any non-trivial improvements he'd like, because the responsible teams have different priorities sketched out for the next calendar year. It's a function of organization size: unless you have the support of someone very high up on the org chart, ICs can't unilaterally adjust another team's priorities.
  
- slt2021 a year ago
  
  how else can you build empire as Engineering Manager and get promo?
  fork open source, then demand resources to maintain this monster.
  easiest promotion + job security.
  it's even called "Platform Engineering" these days
  
jchonphoenix a year ago

Meta tools are best in class when the requirement is scale, or when the external tools haven't matured yet.

Qshdg a year ago

Looking at some of the bureaucracy in their open source projects, I'd say that they need less tooling and more thinking. These tools help to keep spaghetti code bases from imploding totally.

baggiponte a year ago

Uuuh can you tell a bit more about wasabi, the Python LSP? Saw a post years ago and been eager to see whether it’d be open sourced (or why it wouldn’t).
