Working through 'Writing A C Compiler'
(jollygoodsw.wordpress.com)
169 points by AlexeyBrin a day ago
This looks great! Added that and this https://www.amazon.com/Engineering-Compiler-Keith-D-Cooper/d... to my list
> Sandler’s reference OCaml implementation is super useful for getting your bearings
The author also maintains a list of implementations created from the book: https://github.com/nlsandler/c-compiler-implementations
My personal experience of writing various DSL/general-purpose compilers (I've created at least four DSL compilers and one general-purpose one) is kinda different from the books I've read.
Scala is an awesome language which frees one from working on many boring details and makes it possible to keep the codebase tiny. With such an expressive language I can concentrate on the logic instead of thinking about minor things.
We have enough memory and CPU power to use worse-than-linear algorithms without noticeable performance impact.
Parsers aren't an issue at all these days; PEG combinators like fastparse allow one to be extremely productive.
I tend to stick to an immutable multi-staged pipeline with several immutable trees, use error-accumulating data structures (Either[NonEmptyList[Issue], T]), and explicitly express entity dependencies (e.g. type definitions) as graphs, which can be processed iteratively and in parallel.
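The error-accumulating idea can be sketched in Python, mimicking the Scala type Either[NonEmptyList[Issue], T] mentioned above. All names here (Ok, Err, the two checks, run_stages) are hypothetical illustrations, not the commenter's actual code: independent pipeline stages each report their issues, and the driver collects all of them instead of failing on the first.

```python
from dataclasses import dataclass

# Hypothetical sketch of Either[NonEmptyList[Issue], T]: a stage
# returns Ok(value) or Err(non-empty list of issues).
@dataclass
class Ok:
    value: object

@dataclass
class Err:
    issues: list

def check_names(tree):
    # Flag anything that is not a valid identifier.
    bad = [n for n in tree if not n.isidentifier()]
    return Err([f"bad name: {n}" for n in bad]) if bad else Ok(tree)

def check_duplicates(tree):
    # Flag names that appear more than once.
    dupes = sorted({n for n in tree if tree.count(n) > 1})
    return Err([f"duplicate: {n}" for n in dupes]) if dupes else Ok(tree)

def run_stages(tree, stages):
    # Accumulate issues across independent stages instead of failing fast.
    issues = []
    for stage in stages:
        result = stage(tree)
        if isinstance(result, Err):
            issues.extend(result.issues)
    return Err(issues) if issues else Ok(tree)
```

Running both checks over a bad input reports every problem at once, which is exactly what makes this structure pleasant for compiler diagnostics.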
The author of the book also has a series of blog entries: https://norasandler.com/2017/11/29/Write-a-Compiler.html
From what the blog author says (I haven’t looked into the book), the approach reminds me of
> Abdulaziz Ghuloum, 2006, An Incremental Approach to Compiler Construction http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf
Oh, that’s exactly what the book author’s blog mentions: https://norasandler.com/2017/11/29/Write-a-Compiler.html
You can also find this approach in this book that comes in Racket and Python flavors [0].
[0] https://mitpress.mit.edu/9780262047760/essentials-of-compila...
Crafting Interpreters asks the reader to use the visitor pattern, and that was quite a turn-off for me; I stopped there.
You can choose to implement it differently based on your implementation language. Data Classes and If statements work really well for this in Python, for example.
Statement Data Classes: https://github.com/alabhyajindal/plox/blob/main/stmt.py
If statements in the parser matching against them: https://github.com/alabhyajindal/plox/blob/main/parser.py#L3...
I recall back in the day, when I was an embedded systems programmer at the coal face, I had similar antipathy towards a pattern. I think it was called the State pattern or similar. It involved a class hierarchy and virtual functions, one for basically every (state, event) pair.
I preferred my simple C design, which used a lookup table to quickly access a structure for each (state, event) pair instead. The structure would provide an output state, a message to send, an action to emit, a bitmask to select zero or more other standard operations, and yes, a function to run in those rare cases where something really special was needed. All fields optional.
The point is I didn't need to write N x M functions, I just needed to edit a table. And I didn't need to understand any rocket science.
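The commenter's design was in C, but the table idea translates directly. A minimal Python sketch with invented states and events (nothing here is from the original system): one table entry per (state, event) pair, with all fields optional, so adding behaviour means editing the table rather than writing N x M functions.

```python
# Hypothetical table-driven state machine: one entry per (state, event)
# pair. Missing keys mean "do nothing for that aspect" (all fields optional).
TABLE = {
    ("idle", "start"):   {"next": "running", "message": "STARTED"},
    ("running", "tick"): {"next": "running", "action": "count"},
    ("running", "stop"): {"next": "idle", "message": "STOPPED"},
}

def step(state, event):
    entry = TABLE.get((state, event))
    if entry is None:
        return state, []  # unknown pair: stay in place, emit nothing
    outputs = [entry[k] for k in ("message", "action") if k in entry]
    return entry.get("next", state), outputs
```

In the C original the entry would also carry a bitmask of standard operations and an optional function pointer; the lookup-and-act shape is the same.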
This part confused me quite a bit so I turned it into the more verbose format by copy-pasting. I don’t like the boilerplate code generation either so I converted that part too. The whole book is still pretty interesting though.
Lolol weirdest reason to reject that book - 90% of production parsers are recursive descent parsers.
It probably has nothing to do with recursive descent parsing, which is intuitive, but with the visitor pattern as mentioned. I myself find it very distracting too.
> The crafting interpreting asks the reader to use the visitor pattern...
...or just a big old, plain jane switch statement.
In my current project I modified my ASDL generator to output a C AST instead of a C++ one, and the visitor pattern carried over until I realized a switch statement is just as good (or better) in C, so I ripped that part out of the template file. The choice was between writing a dispatch function that calls the various handlers based on the AST node type, or generating a struct full of function pointers plus a generated dispatch function that does the same. Same difference, really; one just has an added level of indirection.
The amazing part is I didn't rewrite the ASDL generator for the fifth time and just decided it's 'good enough' for what I need. Aside from one small C++ism, which is easily worked around and turned out not even to be needed in the C++ template, the thing is 100% language- and 'access pattern'-agnostic in generating the output code.
There was probably a point I was trying to make when I started typing, dunno?
My takeaway from your verbose description is:
- You don't need a visitor pattern if you have predetermined the data you are going to work with and all the operations on it (i.e., the open/closed principle does not apply.)
- For the same reason, you don't need dynamic dispatch, which is often how the visitor (and other) pattern(s) are implemented.
- The code is much simpler to understand (and debug) because it's all there in one place. It's also faster than the dynamic-dispatch version because it's all known at compile time.
- Personally: OOP is stupid, confusing, and inefficient; I think universities should only teach it as an optional course. These patterns are 50% lack of functional programming features and 50% sheer stupidity. Universities should go back to teaching real abstraction with Scheme and SICP, a MIPS-style assembly language, and stop confusing students.
I think I did something similar for an emulator. Instead of using a big switch I simply used a big array of function pointers, so if it is a BLAH opcode, the execution code simply calls fp_list[BLAH](op). But I guess it is a bit too much for CPUs that have tons of operations.
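The fp_list[BLAH](op) pattern can be sketched in Python with a list of handler functions indexed by opcode. The opcodes and handlers here are made up for illustration; the point is that dispatch is a single indexed lookup rather than a switch.

```python
# Hypothetical opcode table: index == opcode, the Python analogue of
# an array of C function pointers like fp_list[BLAH](op).
def op_inc(vm):  vm["acc"] += 1
def op_dec(vm):  vm["acc"] -= 1
def op_halt(vm): vm["halted"] = True

HANDLERS = [op_inc, op_dec, op_halt]

def run(program):
    vm = {"acc": 0, "halted": False}
    for opcode in program:
        if vm["halted"]:
            break
        HANDLERS[opcode](vm)  # one indexed lookup, no switch
    return vm["acc"]
```

Wrapping the lookup in a small dispatch function (as the next comment describes) adds one call but gives a single place to bounds-check the opcode.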
I actually use that pattern in the VM, just with a dispatch function instead of calling the function out of the array directly. The compiler (more than likely) inlines the dispatch call, and it lets me add some error checking on the opcode without it being scattered all over the instruction functions.
The Next Big Trick™ is to just embed the function pointer into the opcode itself and do away with the dispatching completely; getting rid of a single pointer dereference per opcode has to be worth at least a 0.01% speed gain, right? I'm kidding, of course, as the original copy-and-patch approach (using C labels to mark the boundaries of the code templates) should allow actual measurable gains in the single-digit range.
I love this book! I worked through a bunch of it during my winter break last year and found the incremental teaching style extremely rewarding. For readers of the book, Sandler’s reference OCaml implementation is super useful for getting your bearings. I was kind of thrown off by the use of TACKY as an IR, but it was nice to have a solid reference as I worked through the book. For those more experienced with compilers: what are some good resources for stuff like SSA and optimisation? I’ve looked at some of the resources here https://bernsteinbear.com/pl-resources/ but are there other canonical resources?