int3trap 5 hours ago

> In the steady state, a webserver would have almost no garbage collector activity

I recently wrote my own zero-allocation HTTP server, and while the above statement is possible to achieve, at some point you need to decide how to handle pipelined requests that aren't resolved synchronously. Depending on your appetite for memory consumption per connection, this often leads to allocations in the general case, though custom memory pools can alleviate some of the burden.

I didn't see anything in the article about that case specifically, which would have been interesting to hear given it's one of the challenges I've faced.

  • avsm 5 hours ago

    Good point; I've decided to simply not support HTTP/1.1 pipelines, and to have a connection pooling layer for HTTP/2 instead that takes care of this.

    OxCaml supports the effect system that we added from OCaml 5.0 onwards, which allows a fiber to suspend itself and be restarted via a one-shot continuation. So it's possible to have a pipelined connection stash away a continuation for a response calculation and be woken up later on when it's ready.
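
    Roughly how that looks with stock OCaml 5 effect handlers (a minimal sketch with made-up names, not my server's actual code): the handler stashes the one-shot continuation instead of resuming it, and something else continues it once the response body is ready.

        open Effect
        open Effect.Deep

        (* Performed by a request fiber whose response isn't ready yet. *)
        type _ Effect.t += Await_response : string Effect.t

        (* The stashed one-shot continuation: it must be resumed (or
           explicitly discarded) exactly once. *)
        let pending : (string, unit) continuation option ref = ref None

        let handle_request () =
          let body = perform Await_response in   (* fiber suspends here *)
          print_endline ("responding with: " ^ body)

        let () =
          match_with handle_request ()
            { retc = (fun () -> ());
              exnc = raise;
              effc = (fun (type a) (eff : a Effect.t) ->
                match eff with
                | Await_response ->
                  Some (fun (k : (a, _) continuation) ->
                    pending := Some k)   (* stash instead of resuming *)
                | _ -> None) };
          (* Later, when the response body is ready, wake the fiber up. *)
          match !pending with
          | Some k -> continue k "hello, world"
          | None -> ()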

    All continuations have to be either discarded explicitly or resumed exactly once; this can lead to memory leaks in OCaml 5, but OxCaml has an emerging lifetime system that guarantees this is safe: https://oxcaml.org/documentation/parallelism/01-intro/ or https://gavinleroy.com/oxcaml-tutorial-icfp25/ for a taste of that. Beware though; it's cutting edge stuff and the interfaces are still emerging, but it's great fun if you don't mind some pretty hardcore ML typing ;-) When it all settles down it should be very ergonomic to use, but right now you do get some interesting type errors.

    • int3trap 5 hours ago

      > So it's possible to have a pipelined connection stash away a continuation for a response calculation and be woken up later on when it's ready.

      Ahh, that's interesting. I think you still run into the issue where you have a case like this:

      1. You get 10 pipelined requests from a single connection with a post body to update some record in a Postgres table.

      2. All 10 requests are independent and can be resolved at the same time, so you should make use of Postgres pipelining and send them all as you receive them.

      3. When finishing the requests, you likely need the information provided in the request object. Let's assume there's a lot of data in the body, to the point where you've reached your per-connection buffer limit. You either allocate here to unblock the read, or you block new reads until all requests are completed, impacting response latency. The allocation is the better choice at that point (sketched below), but the heuristic that decides this in pursuit of peak performance is definitely nuanced, if not complicated.
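
      Schematically, the fork in step 3 might look something like this (illustrative types and names only, not my actual code): keep a body in the shared connection buffer while it fits, and spill it to a fresh heap allocation only when the buffer has to be reclaimed for further pipelined reads.

          (* Zero-allocation fast path vs. explicit spill to the heap. *)
          type body =
            | In_buffer of { off : int; len : int }  (* still in the connection buffer *)
            | Spilled of bytes                       (* copied out to unblock reads *)

          (* Called when the connection buffer must be reused for the next
             pipelined request while this body is still pending. *)
          let reclaim_buffer conn_buf = function
            | In_buffer { off; len } -> Spilled (Bytes.sub conn_buf off len)
            | Spilled _ as b -> b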

      It's a cool problem space though, so I'm always interested in learning how others attack it.

      • avsm an hour ago

        It is a cool problem space! What I'm doing is using a single buffer for body handling (since you dispatch that away and then reuse it for chunked encoding), so it never takes unbounded stack space. This might be a bit different in HTTP/3, where you can have multiple body transmissions multiplexed; I have to look into how that works (and it's over UDP as well).

        What we never need to do in OxCaml is keep a giant list of body buffers on the stack; with effects, we can fork the stack at any time, so the request object is shared naturally. The only way to free the stack is to return from a function, but you can have a tree of these that share values from earlier in the call chain.

boltzmann-brain 9 hours ago

it's a massive crime that, decades into FP, we still don't have a type system that can infer or constrain the number of copies and allocations a piece of code performs. Software would be massively better if we did - unnecessary copies and space leaks are some of the most performance-regressing bugs out there, and there simply isn't a natural way of unearthing them.

  • avsm 7 hours ago

    We do now, though, with OxCaml! The local (stack allocation) mode puts quite a strong constraint on the shape of the allocations that are possible.
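
    To give a flavour (a minimal sketch following the oxcaml.org locality docs, not code from the article): a value annotated local_ must stay within its region, so the compiler can stack-allocate it and will reject code that lets it escape.

        (* The pair is allocated on the stack frame rather than the heap. *)
        let sum_of_pair () =
          let local_ pair = (40, 2) in
          let (a, b) = pair in
          a + b

        (* Rejected by the compiler: returning pair would let the local
           allocation escape its region.

           let leak () =
             let local_ pair = (40, 2) in
             pair
        *)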

    Next on my TODO list is to hook up the various O(x)Caml memory profiling tools: we have statmemprof, which does statistical sampling, then the runtime events buffer, and (hopefully) stack activity from the compiler in OxCaml's case.

    This provides a pretty good automation loop for a performance optimising coding agent: it can choose between heap vs local, or copy vs reference, or fixed layout (for SIMD) vs fragmentation (for multicore NUMA) depending on the tasks at hand.

    Some references:

    - Statmemprof in OCaml : https://tarides.com/blog/2025-03-06-feature-parity-series-st...

    - "The saga of multicore OCaml" by Ron Minsky about how Jane Street viewed performance optimisation from the launch of OCaml 5.0 to where they are today with OxCaml https://www.youtube.com/watch?v=XGGSPpk1IB0

  • zozbot234 7 hours ago

    > infer or constrain the amount of copies and allocations a piece of code has

    That's exactly what substructural logics/type systems allow you to do. Affine and linear types are one example of substructural type systems, but you can also go further in limiting moves, exchanges/swaps, etc., which helps model scenarios where allocation and deallocation must be made explicit.

  • AlotOfReading 9 hours ago

    Allocations and copies are among the things substructural typing formalizes. It's how, e.g., Rust essentially eliminates implicit copies.

    • whatis991 6 hours ago

      I think I've heard Rust devs complaining about moves being implicit bitwise copies that were not optimized away.

      • AlotOfReading 6 hours ago

        Types with the Copy trait can do that; I'm just saying they're not really implicit copies, because it's a core, visible part of the language that the developer can control on all of their own types.

  • aseipp 7 hours ago

    There are ongoing projects like Granule[1] that are exploring more precise resource usage to be captured in types, in this case by way of graded modalities. There is of course still a tension in exposing too much of the implementation details via intensional types. But it's definitely an ongoing avenue of research.

    [1] http://granule-project.github.io/granule.html

    • boltzmann-brain 5 hours ago

      can Granule let me specify the following constraints on a function?

      - it will use O(n) space where n is some measure of one of the parameters (instead of n you could have some sort of function of multiple measures of multiple parameters)

      - same but time use instead of space use

      - same but number of copies

      - the size of an output will be the size of an input, or less than it

      - the allocated memory after the function runs is less than allocated memory before the function runs

      - given the body of a function, and given that all the functions used in the body have well defined complexities, the complexity of the function being defined with them is known or at least has a good upper bound that is provably true

  • 3836293648 6 hours ago

    There is discussion about this in the Rust world, though no attempts at implementation (and yet further from stabilisation)

  • zokier 5 hours ago

    Wouldn't such analysis in the general case run afoul of Rice's theorem?

smartmic 10 hours ago

From the article:

> I am also deeply sick and tired of maintaining large Python scripts recently, and crave the modularity and type safety of OCaml.

I can totally relate. Switching from Python to a purely functional language can feel like a rebirth.

  • voidUpdate 9 hours ago

    While Python isn't type-safe, you can use Pylance or similar in combination with type hinting to get your editor to yell at you if you do something bad type-wise. I've had it turned on for a while in a large web project and it's been very helpful; it almost feels type-safe again.

    • debugnik 7 hours ago

      It just isn't good enough. Any time Pyright gives up on type checking, which is often, it simply decays the type into one involving Any/"Unknown":

      Without strict settings, it will let you pass this value off as if it were of any other type and introduce a bug.

      But with strict settings, it will prevent you from recovering the actual type dynamically with type guards, because it flags the existence of the untyped expression itself, even if used in a sound way, which defeats the point of using a gradual checker.

      Gradual type systems can and should keep the typed fragment sound, not just give up or (figuratively) panic.

    • VorpalWay 8 hours ago

      > I've had it turned on for a while in a large web project and it's been very helpful, and almost feels type-safe again

      In my experience "almost" is doing a lot of heavy lifting here. Typing in Python certainly helps, but you can never quite trust it (or that the checker detects things correctly). And you can't trust that another developer didn't just write `dict` instead of `dict[int, str]` somewhere, which defaults to Any for both key and value. That will still type check (at least with mypy), and now you've lost safety.

      Using a statically typed language like C++ is way better, and moving to a language with an advanced type system like that of Rust is yet another massive improvement.

      • Balinares 7 hours ago

        Yeah, if you're going to use static type checks, which you should, you really want to run the checker in strict mode to catch oversights such as generic container types without a qualifier.

        Although I've found that much of the pain of static type checks in Python is really that a lot of popular modules expose incorrect type hints that need to be worked around, which really isn't a pleasant way to spend one's finite time on Earth.

        • girvo 3 hours ago

          > that a lot of popular modules expose incorrect type hints that need to be worked around

          TypeScript (and Flow to a lesser extent) had this problem once upon a time. It's a lot better today, so I imagine it will continue to improve.

  • IshKebab 8 hours ago

    OCaml isn't pure.

    • avsm 6 hours ago

      (author here) it's actually the module system of OCaml that's amazing for large-scale code, not the effects. I just find that after a certain scale, being able to manipulate module signatures independently makes refactoring of large projects a breeze.
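
      A tiny, contrived sketch of what I mean (my own illustration here, not code from the article): callers program against a signature, so the implementation behind it can be swapped or refactored without touching them.

          module type STORE = sig
            type t
            val create : unit -> t
            val put : t -> key:string -> string -> unit
            val get : t -> key:string -> string option
          end

          (* One implementation; it can be replaced wholesale as long as it
             still satisfies STORE. *)
          module Memory_store : STORE = struct
            type t = (string, string) Hashtbl.t
            let create () = Hashtbl.create 16
            let put t ~key v = Hashtbl.replace t key v
            let get t ~key = Hashtbl.find_opt t key
          end

          (* Client code depends only on the signature. *)
          module Make_cache (S : STORE) = struct
            let memoize store f key =
              match S.get store ~key with
              | Some v -> v
              | None ->
                let v = f key in
                S.put store ~key v;
                v
          end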

      Meanwhile, in Python, I just haven't figured out how to effectively do the same (even with uv, ruff and other affordances) without writing a ton of tests. I'm sure it's possible, but OCaml's spoilt me enough that I don't want to have to learn it any more :-)

    • pkal 7 hours ago

      I recently realized that "pure functional" has two meanings: one is no side effects (functional programmers, especially users of languages like Haskell, use it this way) and the other is that the language has no imperative fragment (the jump from ISWIM to SASL dropped the non-functional parts inherited from ALGOL 60). A question seems to be whether you want to view sequencing as syntactic sugar for lambda expressions or not.

      • nh2 6 hours ago

        Who uses the second meaning?

        In my experience, "purely functional" always means "you can express pure functions on the type level" (thus guaranteeing that it is referentially transparent and has no side effects) -- see https://en.wikipedia.org/wiki/Pure_function

      • NeutralForest 6 hours ago

        I'm working with Python and I'm sympathetic to the problem so I'd be curious if you have examples of what Python issues are fixed with OCaml.

        • rienbdj 5 hours ago

          A few ways in which Python is not really functional:

          - The scoping rules of Python are not lexical

          - Lambdas in Python are not multiline

          - Recursion is not a practical way to write code, due to stack overflows

          - Monkey patching

      • nesarkvechnep 4 hours ago

        Pure functional doesn't mean no side effects but controlled side effects.

ttoinou 11 hours ago

Does it look like functional programming anymore?

  • boltzmann-brain 9 hours ago

    Yes - high-performance Haskell code looks similar. There isn't much to be said there - it's a little less clean-looking because FP optimizes for the most useful scenario and trying to do highly advanced stuff like that will be more verbose. This is in contrast to OOP where everything is verbose, and sometimes high-perf stuff that falls into the shape of globals + mutation + goto looks very succinct.

  • seanhunter 9 hours ago

    Looks like 100% idiomatic normal OCaml to me.

    • unstruktured 8 hours ago

      Technically you are right, but there's too much mutation for my tastes, and probably for those of many other OCaml developers.

      • avsm 7 hours ago

        (author here) The mutation is only for performance-critical code. I'm first trying to match C/Rust performance in my code, and then transform it to more idiomatic functional code (which flambda2 in OxCaml can optimise).

        It's too difficult right now to directly jump to the functional version since I don't understand the flambda2 compiler well enough to predict what optimisations will work! OxCaml is stabilising more this year so that should get easier in time.

  • le-mark 11 hours ago

    I think there are more succinct snippets in here, and some of this more verbose exposition is for pedagogical purposes. I am not a fan of OCaml because tacking on the object syntax made it more verbose than SML (ugly imo). Looks like OxCaml continued the trend.

    • pjmlp 10 hours ago

      OxCaml is OCaml; it is only a set of language extensions that Jane Street expects to eventually be able to upstream, depending on experience.

      • le-mark 31 minutes ago

        Yes, much like the object extensions added to Caml.

  • cess11 9 hours ago

    Looks pretty ML-ish to me, even in a segment like this:

       let parse_int64 (local_ buf) (sp : span) : int64# =
         let mutable acc : int64# = #0L in
         let mutable i = 0 in
         let mutable valid = true in
         while valid && i < I16.to_int sp.#len do
           let c = Bytes.get buf (I16.to_int sp.#off + i) in
           match c with
           | '0' .. '9' ->
             acc <- I64.add (I64.mul acc #10L) (I64.of_int (Char.code c - 48));
             i <- i + 1
           | _ -> valid <- false
         done;
         acc

  • pjmlp 10 hours ago

    Depends on what one means as FP.

    When I learnt FP, the choice was between Lisp, Scheme, Miranda, Caml Light and Standard ML, depending on the assignment.

    Nowadays some folks consider FP === Haskell.

    • ttoinou 10 hours ago

      Even F# looks like good FP to me. But yes, in FP I expect something short, so I can clearly see the structure of the program: side effects, flow, and data.
