Comment by lioeters
The project looks great! I browsed the codebase and enjoyed how much documentation there is about the user-facing and internal workings. I'm not familiar with the subject matter but I do love me a DSL, so the language design aspect was interesting to learn about.
I was curious how the language compiles to C, what the resulting code does, and how one interacts with it. It took a while of reading to find it, so maybe this could be linked from places where compilation is mentioned. This part is my favorite, it's cool how it works. Especially since you mention "anti-abstraction", I like seeing how the DSL maps to C.
https://github.com/rafa-rrayes/SHDL/blob/master/docs/docs/ar...
> Compiles circuits to C so that they can run anywhere
Input (Base SHDL):
  component Buffer(A) -> (B) {
      n1: NOT;
      n2: NOT;
      connect {
          A -> n1.A;
          n1.O -> n2.A;
          n2.O -> B;
      }
  }
Output (C code):
  #include <stdint.h>
  #include <string.h>
  typedef struct {
      uint64_t NOT_O_0;
  } State;
  static inline State tick(State s, uint64_t A) {
      State n = s;
      // NOT gate inputs
      uint64_t NOT_0_A = 0ull;
      NOT_0_A |= ((uint64_t)-( (A & 1u) )) & 0x1ull;
      NOT_0_A |= ((uint64_t)-( ((s.NOT_O_0 >> 0) & 1u) )) & 0x2ull;
      // Evaluate NOT gates
      n.NOT_O_0 = (~NOT_0_A) & 0x3ull; // 2 active lanes
      return n;
  }
  static inline uint64_t extract_B(const State *s) {
      return (s->NOT_O_0 >> 1) & 1ull; // B from lane 1
  }
...
Thank you so much for taking the time to dig into the code and docs, it means so much to me!
Here’s the core idea behind how SHDL compiles to C.
At compile time, SHDL groups all gates of the same type together and packs them into uint64_t bitfields. Each individual gate occupies exactly one bit. If there are more than 64 gates of a given type, multiple uint64_t's are used.
So for example, if a circuit contains:
- 36 XOR gates
- 82 AND gates
- 1 NOT gate
The compiler will generate:
- 1 uint64_t for XOR (36 bits used, rest unused)
- 2 uint64_t's for AND (64 + 18 bits)
- 1 uint64_t for NOT
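The word counts above are just a ceiling division by 64. Here is a minimal sketch of that arithmetic; `words_for_gates` and `last_word_mask` are hypothetical helper names made up for illustration, not functions from the SHDL compiler.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how many uint64_t words are needed so that every
   gate of one type gets its own bit (ceiling division by 64). */
static size_t words_for_gates(size_t gate_count) {
    return (gate_count + 63u) / 64u;
}

/* Hypothetical helper: mask of the lanes actually in use in the last word.
   For 82 gates the last word holds 18 gates, so the mask has 18 low bits set. */
static uint64_t last_word_mask(size_t gate_count) {
    size_t used = gate_count % 64u;
    if (gate_count == 0) {
        return 0;                       /* no gates, no active lanes */
    }
    if (used == 0) {
        return ~0ull;                   /* last word is completely full */
    }
    return (1ull << used) - 1ull;       /* low `used` bits set */
}
```

With the numbers from the example, 36 XOR gates need 1 word, 82 AND gates need 2, and 1 NOT gate needs 1. The 2-gate Buffer in the parent comment would get the mask 0x3, which matches the `& 0x3ull` in the generated code.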
Each of these integers represents the state of all gates of that type at once.
The generated C code then works lane-wise: during `tick()`, it computes the inputs for all gates of a given type simultaneously using bitwise operations, and then evaluates them in parallel. Because everything is packed, a single ~, &, |, or ^ operates on up to 64 gates at once.
So instead of iterating gate-by-gate, the simulation step becomes something like:
- build input bitmasks
- apply one bitwise operation per gate type
- write the result back into the packed state
In other words, a full simulation step can advance dozens or hundreds of gates using just a handful of native CPU instructions. That’s the main reason the generated C code is both simple and fast.
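To make the lane-wise idea concrete, here is a small sketch that compares a gate-by-gate loop with the packed version for AND gates. All function names here are made up for illustration; only the `(uint64_t)-(bit) & mask` routing idiom is taken from the generated code quoted above.

```c
#include <stdint.h>

/* Route a 1-bit value into a given lane: 0 stays 0, 1 becomes a single
   set bit at position `lane`. Same idiom as the generated tick() code. */
static uint64_t route_to_lane(uint64_t value, unsigned lane) {
    return ((uint64_t)-(value & 1ull)) & (1ull << lane);
}

/* Mask with the low `active` bits set (active may be 0..64). */
static uint64_t lane_mask(unsigned active) {
    return active >= 64 ? ~0ull : ((1ull << active) - 1ull);
}

/* Reference: evaluate each AND gate one at a time. */
static uint64_t and_gates_loop(uint64_t a, uint64_t b, unsigned active) {
    uint64_t out = 0;
    for (unsigned i = 0; i < active && i < 64; i++) {
        uint64_t ai = (a >> i) & 1ull;   /* input A of gate i */
        uint64_t bi = (b >> i) & 1ull;   /* input B of gate i */
        out |= (ai & bi) << i;           /* output of gate i */
    }
    return out;
}

/* Packed: one & evaluates every gate in the word, then unused lanes
   are masked off. */
static uint64_t and_gates_packed(uint64_t a, uint64_t b, unsigned active) {
    return (a & b) & lane_mask(active);
}
```

Both functions return the same word for any inputs, but the packed one does it in a couple of instructions regardless of how many lanes are active.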
This also ties directly into the “anti-abstraction” idea, since there’s no hidden scheduler, no opaque simulator loop, and no dynamic dispatch. The DSL maps very explicitly to bit-level operations in C, and you can see exactly how a logical structure becomes executable code.
The final result is a compiled C shared library, which you can interact with from Python (or from anything else, if you want to build it yourself).
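As a rough idea of what driving the generated code from plain C could look like, here is a sketch that reuses the `State`, `tick`, and `extract_B` definitions from the Buffer output quoted above. The `settle_buffer` helper is hypothetical, and the real shared library's exported symbols may look different; this just compiles the generated functions into the same file.

```c
#include <stdint.h>

/* Copied from the generated Buffer output quoted in the parent comment. */
typedef struct {
    uint64_t NOT_O_0;
} State;

static inline State tick(State s, uint64_t A) {
    State n = s;
    uint64_t NOT_0_A = 0ull;
    NOT_0_A |= ((uint64_t)-( (A & 1u) )) & 0x1ull;                   /* n1.A <- A    */
    NOT_0_A |= ((uint64_t)-( ((s.NOT_O_0 >> 0) & 1u) )) & 0x2ull;    /* n2.A <- n1.O */
    n.NOT_O_0 = (~NOT_0_A) & 0x3ull;                                 /* 2 active lanes */
    return n;
}

static inline uint64_t extract_B(const State *s) {
    return (s->NOT_O_0 >> 1) & 1ull;   /* B from lane 1 */
}

/* Hypothetical driver: start from an all-zero state, hold input A steady
   for `ticks` steps, and return output B. */
static uint64_t settle_buffer(uint64_t A, unsigned ticks) {
    State s = {0};
    for (unsigned i = 0; i < ticks; i++) {
        s = tick(s, A);
    }
    return extract_B(&s);
}
```

Since each tick reads the previous state, the signal needs two ticks to pass through both NOT gates: starting from the zero state with A = 0, B still reads 1 after one tick and only becomes 0 after the second.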
I really appreciate you calling this out. Do you think I should make it clearer in the docs? Thanks again for the comment!