Notes by djb on using Fil-C
(cr.yp.to)318 points by transpute a day ago
318 points by transpute a day ago
> I had originally configured the server phoenix with only 12GB swap. I then had to restart ./build_all_fast_glibc.sh a few times because the Fil-C compilation ran out of memory. Switching to 36GB swap made everything work with no restarts; monitoring showed that almost 19GB swap (plus 12GB RAM) was used at one point. A larger server, 128 cores with 512GB RAM, took 8 minutes for Fil-C plus 6 minutes for musl, with no restarts needed.
Yikes that’s a lot of memory! Filc is doing a lot of static analysis apparently.
Yes, linking LLVM takes up a lot of memory. The documented guidance is to allow one link job per 15 GB of RAM [1].
[1] https://llvm.org/docs/CMake.html#frequently-used-llvm-relate...
And, fairly uniquely, LLVM has a LLVM_PARALLEL_LINK_JOBS setting that is distinct from the number of parallel jobs for everything else. I think I was using that 15 years ago.
I wish GCC had it. I have a quad core machine with 16 GB RAM that OOMs on building recent GCC -- 15 and HEAD for sure, can't remember whether 14 is affected. Enabling even 1 GB of swap makes it work. The culprit is four parallel link jobs needing ~4 GB each.
There are only four of them, so a -j8 build (e.g., with HT) is no worse.
For those who might miss it, the notes cite a new 64-bit version of cdb that supports exabyte databases
Also maybe of interest is that the new cdb subdomain is using pqconnect instead of dnscurve
> Also maybe of interest is that the new cdb subdomain is using pqconnect instead of dnscurve
This is not correct. There isn't a cdb subdomain because cdb.cr.yp.to doesn't have NS records, which is where DNSCurve fits in. If you have a DNSCurve resolver, then your queries for cdb.cr.yp.to will use DNSCurve and will be sent to the yp.to nameservers.
From there, if you have pqconnect, your http(s) connection to cdb.cr.yp.to will happen over pqconnect.
Maybe the confusion is because both DNSCurve and pqconnect encode pubkeys in DNS, but they do different things.
Here is DNSCurve:
$ dig +short ns yp.to
uz5jmyqz3gz2bhnuzg0rr0cml9u8pntyhn2jhtqn04yt3sm5h235c1.yp.to.
Here is pqconnect: $ dig +short cdb.cr.yp.to
pq1htvv9k4wkfcmpx6rufjlt1qrr4mnv0dzygx5mlrjdfsxczbnzun055g15fg1.yp.to.
131.193.32.108
Like CurveCP, pqconnect puts the pubkey into a CNAME.RFC 1034 Domain Concepts and Facilities November 1987 [Page 8]
"A domain is identified by a domain name, and consists of that part of the domain name space that is at or below the domain name which specifies the domain. A domain is a subdomain of another domain if it is contained within that domain. This relationship can be tested by seeing if the subdomain's name ends with the containing domain's name. For example, A.B.C.D is a subdomain of B.C.D, C.D, D, and " "."
1 cdb.cr.yp.to - regular DNS:
124 bytes, 1+2+0+0 records, response, noerror
query: 1 cdb.cr.yp.to
answer: cdb.cr.yp.to 30 CNAME pq1jbw2qzb2201xj6pyx177b8frqltf7t4wdpp32fhk0w3h70uytq5020w020l0.yp.to
answer: pq1jbw2qzb2201xj6pyx177b8frqltf7t4wdpp32fhk0w3h70uytq5020w020l0.yp.to 30 A 131.193.32.109
In the terminology of RFC1034, cdb.cr.yp.to, a CNAME, can be described as a subdomain of cr.yp.to and yp.to(NB. The pq1 portion is not a public key, it is a hash of a server's long-term public key)
Use of pqconnect at yp.to is probably old news but the cdb.cr.yp.to CNAME does appear to be new as of around 21 Oct
The notes on using Fil-C were submitted three days ago
https://news.ycombinator.com/item?id=45663435 (discussed 11d ago)
Is there a reason that some of the linked benchmarks, if I'm reading it right, have Fil-C running faster than C?[0] I assume it's just due to micro-benchmark variability but I'm curious. Some of them seem impossibly fast compared to C so I wonder if there are some correctness issue there.
Usually garbage collection does improve alot of benchmarks, just look at the hans boem gc benchmarks.
The two extreme outliers I see are labeled "aead/clx192q/opt,-O3" and "aead/schwaemm128128v2/opt,-Os" according to clicking on the points with devtools. aead/schwaemm128128v2/opt,-Os looks like it is almost at 0x. 1x is at about y = 659 and that test is at 769 out of I guess 780 based on the graph.
Classic example: https://devblogs.microsoft.com/oldnewthing/20180228-00/?p=98...
Can a program be written only partially in Fil-C? That is to say, can we link regular C and Fil+C object files in a single executable?
> There is no interoperability with Yolo-C (i.e. classic C). This is both a goal and the outcome of a non goal.
(worth reading, i think all the stuff Fil writes is both super informative & quite entertaining.)
Cool project! I take it the goal is that, overhead being acceptable, most C / C++ programmes don't actually "have to be" rewritten in something like Rust?
I wonder how / where Epic Games comes in?
Note that Fil-C is a garbage-collected language that is significantly slower than C.
It's not a target for writing new code (you'd be better off with C# or golang), but something like sandboxing with WASM, except that Fil-C crashes more precisely.
From the topic starter: "I've posted a graph showing nearly 9000 microbenchmarks of Fil-C vs. clang on cryptographic software (each run pinned to 1 core on the same Zen 4). Typically code compiled with Fil-C takes between 1x and 4x as many cycles as the same code compiled with clang"
Thus, Fil-C compiled code is 1 to 4 times as slow as plain C. This is not in the "significantly slower" ballpark, like where most interpreters are. The ROOT C/C++ interpreter is 20+ times slower than binary code, for example.
Cryptographic software is probably close to a best case scenario since there is very little memory management involved and runtime is dominated by computation in tight loops. As long as Fil-C is able to avoid doing anything expensive in the inner loops you get good performance.
Along with the sibling comment, microbenchmarks should not be used as authoritative data when the use case is full applications. For that matter, highly optimized Java or Go may be "1 to 4 times as slow as plain C". Fil-C has its merits, but they should be described carefully, just with any technology.
What does "significantly" mean to you? To my ear, "significantly" means "statistically significant".
A GC lang isn't necessarily significantly slower than C. You should qualify your statements. Moreover, this is a variant of C, which means that the programs are likely less liberal with heap allocations. It remains to be seen how much of a slowdown Fil-C imposes under normal operating conditions. Moreover, although it is indeed primarily suited for existing programs, its use in new programs isn't necessarily worse than, e.g., C# or Go. If performance is the deciding factor, probably use Rust, Zig, Nim, D, etc. .
What language do people considering c as an option for a new project consider? Rust is the obvious one we aren't going to discuss because then we won't be able to talk about anything else, Zig is probably almost as well loved and defended, but it isn't actually memory safe, just much easier to be memory safe. As you say, c# and go, also maybe f# and ocaml if we are just writing simple c style stuff none of those would look all that different. Go jhs some ub related to concurrency that people run into, but most of these simple utilities are either single threaded or fine grained parallel which is pretty easy to get right. Julia too maybe?
WASM is a sandbox. It doesn't obviate memory safety measures elsewhere. A program with a buffer overflow running in WASM can still be exploited to do anything that program can do within in WASM sandbox, e.g. disclose information it shouldn't. WASM ensures such a program can't escape its container, but memory safety bugs within a container can still be plenty harmful.
For those, like me, that didn’t know what Fil-C is:
> Fil-C is a fanatically compatible memory-safe implementation of C and C++. Lots of software compiles and runs with Fil-C with zero or minimal changes. All memory safety errors are caught as Fil-C panics. Fil-C achieves this using a combination of concurrent garbage collection and invisible capabilities (InvisiCaps). Every possibly-unsafe C and C++ operation is checked. Fil-C has no unsafe statement and only limited FFI to unsafe code.
The posted article has a detailed explanation of djb successfully compiling a bunch of C and C++ codebases.
I guess to get on board with this, it is my understanding you have to accept the premise of a Garbage Collector in the runtime?
Note that it is a garbage collector designed and implemented by one of the most experienced GC experts on earth. He previously designed and implemented WebKit's state of the art concurrent GC, for example. So—yes, but don't dismiss it too quickly.
If that's all you need, the state of the art is very available already through the JVM and the .NET CLR, as well as a handful others depending on your use case. Most of those also come with decent languages, and great facilities to leverage the GC to its maximum.
But GCs aren't magic and you will never get rid of all the overhead. Even if the CPU time is not noticeable in your use case, the memory usage fundamentally needs to be at least 2-4x the actual working set of your program for GCs to be efficient. That's fine for a lot of use cases, especially when RAM isn't scarce.
Most people who use C or C++ or Rust have already made this calculation and deemed the cost to be something they don't want to take on.
That's not to say Fil-C isn't impressive, but it fills a very particular niche. In short, if you're bothering with a GC anyway, why wouldn't you also choose a better language than C or C++?
It's amazing how much technical discourse revolves around impressions.
"Oh, it has a GC! GC bad!"
"No, this GC by smart guy, so good!"
"No, GC always bad!"
People aren't engaging with the technical substance. GC based systems and can be plenty good and fast. How do people think JavaScript works? And Go? It's like people just absorbed from the discursive background radiation the idea GC is slow without understanding why that might be or whether it's even true. Of course it's not.
The author of Fil-C does have some ideas to avoid a garbage collector [1], in summary: Use-after-free at worst means you might see an object of the same size, but you can not corrupt data structures (no pointer / integer confusion). This would be more secure than standard C, but less secure than Fil-C with GC.
Both Fil-C and CHERI rely on a concurrent GC/a GC-like task to find and invalidate all pointers to free()'d memory objects (in "quarantine") before putting them back into the memory pool.
The difference is that because Fil-C has bounds in each object's header, it only has to nullify it to remove access whereas in CHERI a quarantined object can still be accessed through any pointer that hasn't been invalidated yet.
I've seen discussions on adding an additional memory tag to CHERI for memory in quarantine, but I dunno what is best.
Fil-C relies on the compiler being trusted whereas CHERI does not. If we do, then perhaps we could come up with a hardware-accelerated system that is more lightweight than either.
I’m glad Phil’s work is finally getting the recognition it deserves.
There may be useful takeaways here for Rust’s “unsafe” mode - particularly for applications willing to accept the extra burden of statically linking Fil-C-compiled dependencies. Best of both worlds!
> particularly for applications willing to accept the extra burden of statically linking Fil-C-compiled dependencies. Best of both worlds!
As near as I can tell Fil-C doesn't support this, or any other sort of FFI, at all. Nor am I sure FFI would even make sense, it seems like an approach that has to take over the entire program so that it can track pointer provenance.
For securing and maintaining a complex legacy application it seems like a reasonable approach would be to move the majority into Fil-C, then hook the bits that don't fit up via RPC. Maybe some bits get formal verification, rewritten in Rust, ported to new platform APIs, whatever, but at least you get some safety for the whole app without a rewrite.
Related:
Fil-C: A memory-safe C implementation - https://news.ycombinator.com/item?id=45735877 - Oct 2025 (130 comments)
Safepoints and Fil-C - https://news.ycombinator.com/item?id=45258029 - Sept 2025 (44 comments)
Fil's Unbelievable Garbage Collector - https://news.ycombinator.com/item?id=45133938 - Sept 2025 (281 comments)
InvisiCaps: The Fil-C capability model - https://news.ycombinator.com/item?id=45123672 - Sept 2025 (2 comments)
Just some of the memory safety errors caught by Fil-C - https://news.ycombinator.com/item?id=43215935 - March 2025 (5 comments)
The Fil-C Manifesto: Garbage In, Memory Safety Out - https://news.ycombinator.com/item?id=42226587 - Nov 2024 (1 comment)
Rust haters, unite Fil-C aims to Make C Great Again - https://news.ycombinator.com/item?id=42219923 - Nov 2024 (6 comments)
Fil-C a memory-safe version of C and C++ - https://news.ycombinator.com/item?id=42158112 - Nov 2024 (1 comment)
Fil-C: Memory-Safe and Compatible C/C++ with No Unsafe Escape Hatches - https://news.ycombinator.com/item?id=41936980 - Oct 2024 (4 comments)
The Fil-C Manifesto: Garbage In, Memory Safety Out - https://news.ycombinator.com/item?id=39449500 - Feb 2024 (17 comments)
In addition, here are the major related subthreads from other submissions:
https://news.ycombinator.com/item?id=45568231 (Oct 2025)
https://news.ycombinator.com/item?id=45444224 (Oct 2025)
https://news.ycombinator.com/item?id=45235615 (Sept 2025)
https://news.ycombinator.com/item?id=45087632 (Aug 2025)
https://news.ycombinator.com/item?id=44874034 (Aug 2025)
https://news.ycombinator.com/item?id=43979112 (May 2025)
https://news.ycombinator.com/item?id=43948014 (May 2025)
https://news.ycombinator.com/item?id=43353602 (March 2025)
https://news.ycombinator.com/item?id=43195623 (Feb 2025)
https://news.ycombinator.com/item?id=43188375 (Feb 2025)
https://news.ycombinator.com/item?id=41899627 (Oct 2024)
https://news.ycombinator.com/item?id=41382026 (Aug 2024)
https://news.ycombinator.com/item?id=40556083 (June 2024)
https://news.ycombinator.com/item?id=39681774 (March 2024)
Thanks for the subthread discussion links, e.g. authors of LuaJIT and Fil-C, https://news.ycombinator.com/item?id=40556083 (June 2024)
malloc'd memory is zeroed in fil-c:
> *zgc_alloc*
> Allocate count bytes of zero-initialized memory. May allocate slightly more than count, based on the runtime's minalign (which is currently 16).
> This is a GC allocation, so freeing it is optional. Also, if you free it and then use it, your program is guaranteed to panic.
> libc's malloc just forwards to this. There is no difference between calling malloc and zgc_alloc.
Great to see some 3letter guy into this. This might be one of those rando things which gets posted on HN (and which doesn't involve me in the slightest), but a decade later is taking over the world. Rust and Go were like that.
Previously there was that Rust in APT discussion. A lot of this middle-aged linux infrastructure stuff is considered feature-complete and "done". Not many young people are coming in, so you either attract them with "heyy rewrite in rust" or maybe the best thing is to bottle it up and run in a VM.
>Great to see some 3letter guy into this
AFAIK, djb isn't for many "some 3letter guy" for over about thirty years but perhaps it's just age related issue with those less been around.
Just to be clear, I mean to venerate Bernstein for earning his 3letters, not to trivialize him.
Not to trivialise but being a 3 letter guy means being old. So, it's at best a celebration of achieving longevity and at worst a celebration of creaky joints and a short temper.
Interesting to see some bash curl being used by a renowned cryptologist...
Almost like it's actually fine.
https://medium.com/@ewindisch/curl-bash-a-victimless-crime-d...
It is definitely not fine. The argument seems to be that since you need to trust somebody, curl | bash is fine because you just trust whoever controls the webserver. I think this is missing the point.
Building tools is one thing, building a system like Postgres or Databases is going to be another thing.
Anyone really tried building PG or MySQL or such a complex system which heavily relies on IO operations and multi threading capabilities
I would really like to see Omarchy go this direction. A fully memory-safe userland for Omarchy is possible with existing techhnology.
For cultural reasons, I would like Omarchy—culturally—to adopt straightforward security as one of their goals, in addition to usability and beauty.
It's low hanging fruit, and a great way to further differentiate their Linux distribution.
I can't wait for all the delicious four-way flamewars. Choose your fighter!
1) Rewrite X in Rust
2) Recompile X using Fil-C
3) Recompile X for WASM
4) Safety is for babies
There are a lot of half baked Rust rewrites whose existence was justified on safety grounds and whose rationale is threatened now that HN has heard of Fil-C
It's not an either-or (well, except for this last item).
It seems sensible to not write new software in plain C. Rust is certainly a valid choice for a safer language, but in many cases overkill wrt how painful the rewrite is vs benefits gained from avoiding a higher-level memory-safe one like OCaml.
At the same time, "let's just rewrite everything!" is also madness. We have many battle-tested libraries written in C already. Something like Fil-C is badly needed to keep them working while improving safety.
And as for wasm, it's sort of orthogonal - whether you're writing in C or in Rust, the software may be bug-free, but sandboxing it may still be desirable e.g. as a matter of trust (or lack thereof). Also, cross-platform binaries would be nice to have in general.
We have a saying that jam is made of fruit that gave up the fight becoming a brandy.
Yeah since Fil-C is just an LLVM transform we could make Rust memory safe with it
Wish we were talking about making Fil-C required for apt, not Rust...
Those seems to be independent issues. Fil-C is about the best way to compile/run C code.
Rust would be about what language to use for new code.
Now that I have been programming in Rust for a couple of years, I don't want to go back to C (except for some hobby projects).
I agree. The main advantage of Fil-C is compatibility with C, in a secure way. The disadvantages are speed, and garbage collection. (Even thought, I read that garbage collection might not be needed in some cases; I would be very interested in knowing more details).
For new code, I would not use Fil-C. For kernel and low-level tools, other languages seem better. Right now, Rust is the only popular language in this space that doesn't have these disadvantages. But in my view, Rust also has issues, specially the borrow checker, and code verbosity. Maybe in the future there will be a language that resolves these issues as well (as a hobby, I'm trying to build such a language). But right now, Rust seems to be the best choice for the kernel (for code that needs to be fast and secure).
> disadvantages are speed, and garbage collection.
And size. About 10x increase both on disk and in memory
$ stat -c '%s %n' {/opt/fil,}/bin/bash
15299472 /opt/fil/bin/bash
1446024 /bin/bash
$ ps -eo rss,cmd | grep /bash
34772 /opt/fil/bin/bash
4256 /bin/bashFil-C is slow.
There is no C or C++ memory safe compiler with acceptable performance for kernels, rendering, games, etc. For that you need Rust.
The future includes Fil-C for legacy code that isn’t performance sensitive and Rust for new code that is.
How slow? In some contexts, the trade-off might be acceptable. From what I've seen in pizlonator's tweets, in some cases the difference in speed didn't seem drastic to me.
Yeah, I would happily run a bunch of my network services in this. I have loads of services that are public-facing doing a lot of complex parsing and rule evaluation and are mostly idle. For example my whole mailserver stack could probably benefit from this. My few messages an hour can run 2x slower. Maybe I would leave dovecot native since the attack surface before authentication is much lower and the performance difference would be more noticeable (mostly for things like searches).
That's my guess, yeah
Also, Fil-C's overheads are the lowest for programs that are pushing primitive bits around.
Fil-C's overheads are the highest for programs that chase pointers.
I'm guessing the CPU bound bits of apt (if there are any) are more of the former
Fil-C works because you recompile the whole C userspace. Unsafe Rust doesn't do that... and for many practical purposes you probably want to touch the non-safe-version of the C userspace.
Still, it's all LLVM, so perhaps unsafe Rust for Fil-space can be a thing, a useful one for catching (what would be) UBs even [Fil-C defines everything, so no UBs, but I'm assuming you want to eventually run it outside of Fil-space].
Now I actually wonder if Fil-C has an escape hatch somewhere for syscalls that it does not understand etc. Well it doesn't do inline assembly, so I shouldn't expect much... I wonder how far one needs to extend the asm clobber syntax for it to remotely come close to working.
at the bottom of the turtle stack, there's a yolo-c libc that does some syscall stuff:
> libyoloc.so. This is a mostly unmodified [musl/glibc] libc, compiled with Yolo-C. The only changes are to expose some libc internal functionality that is useful for implementing libpizlo.so. Note that libpizlo.so only relies on this library for system calls and a few low level functions. In the future, it's possible that the Fil-C runtime would not have a libc in Yolo Land, but instead libpizlo.so would make syscalls directly.
but mostly you are using a fil-c compiled libc:
> libc.so. This is a modified musl libc compiled with Fil-C. Most of the modifications are about replacing inline assembly for system calls with calls to libpizlo.so's syscall API.
That links here: https://github.com/pizlonator/fil-c/blob/deluge/filc/include...
Quotes from: https://fil-c.org/runtime
djb uses a surprisingly low amount of RAM (12GB) considering my laptop already has 64G which is possible to expand to 128G in the future
To summarize, he's sufficiently impressed with it that he's embarking on an attempt to rebuild an entire Debian system with it, and he's written some software (a GC shim library and build scripts) that are likely to be of interest to others who are attempting the same thing.