Comment by jcranmer 21 hours ago

> 1. It's the C programming language represented as SSA form and with some of the UB in the C spec given a strict definition.

This is becoming steadily less true over time, as LLVM IR is growing somewhat more divorced from C/C++, but that's probably a good way to start thinking about it if you're comfortable with C's corner case semantics.

(In terms of frontends, I've seen "Rust needs/wants this" as much as Clang these days, and Flang and Julia are also pretty relevant for some things.)

There's currently a working group in LLVM on building better, LLVM-based semantics, and the current topic du jour of that WG is a byte type proposal.

pizlonator 20 hours ago

> This is becoming steadily less true over time, as LLVM IR is growing somewhat more divorced from C/C++, but that's probably a good way to start thinking about it if you're comfortable with C's corner case semantics.

First of all, you're right. I'm going to reply with amusing pedantry, but I'm not really disagreeing.

I feel like in some ways LLVM is becoming more like C-in-SSA...

> and the current topic du jour of that WG is a byte type proposal.

That's a case of becoming more like C! C has pointer provenance and the idea that byte copies can copy "more" than just the 8 bits, somehow.

(The C provenance proposal may be in a state where it's not officially part of the spec - I'm not sure exactly - but it's effectively part of the language in the sense that a lot of us already consider it to be part of the language.)
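To make the byte-copy point concrete, here is a minimal sketch in C. The names `byte_copy` and `write_through_copied_pointer` are hypothetical, invented for illustration. It copies a pointer one `unsigned char` at a time and then writes through the copy, which only makes sense if the copy carries the original pointer's provenance along with its address bits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-written memcpy: moves n bytes, one unsigned char at a time. */
static void *byte_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    for (size_t i = 0; i < n; i++)
        d[i] = s[i];
    return dst;
}

/* Copies a pointer through byte_copy, then writes through the copy.
   Returns the final value of the pointed-to int. */
static int write_through_copied_pointer(void)
{
    int value = 42;
    int *original = &value;
    int *copy = NULL;

    /* Copy the pointer's object representation, not the pointer value itself. */
    byte_copy(&copy, &original, sizeof original);

    assert(copy == original); /* same address after the byte-wise copy */
    *copy = 7;                /* relies on the copy being allowed to access value */
    return value;
}
```

Real C programs expect `write_through_copied_pointer()` to return 7, and mainstream compilers deliver that. The open question in the thread is how an IR or a specification justifies it when each copied byte is modeled as a plain 8-bit integer.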

  • jcranmer 20 hours ago

    The C pointer provenance is still in TS form and is largely constructed by trying to retroactively justify the semantics of existing compilers (which all follow some form of pointer provenance, just not necessarily coherently). This is still an area where we have a decent idea of what we want the semantics to be but it's challenging to come up with a working formalization.

    I'd have to double-check, but my recollection is that the current TS doesn't actually require that you be able to implement user-written memcpy, rather it's just something that the authors threw their hands up and said "we hope compilers support this, but we can't specify how." In that sense, byte type is going beyond what C does.

    • pizlonator 20 hours ago

      > The C pointer provenance is still in TS form and is largely constructed by trying to retroactively justify the semantics of existing compilers

      That's my understanding too.

      > I'd have to double-check, but my recollection is that the current TS doesn't actually require that you be able to implement user-written memcpy, rather it's just something that the authors threw their hands up and said "we hope compilers support this, but we can't specify how."

      That's also my understanding.

      > In that sense, byte type is going beyond what C does.

      I disagree, but only because I probably define "C" differently than you.

      "C", to me, isn't what the spec describes. If you define "C" as what the spec describes, then almost zero C programs are "C". (Source: in the process of making Fil-C, I experimented with various points on the spectrum here and have high confidence that to compile any real C program you need to go far beyond what the spec promises.)

      To me, when we say "C", we are really talking about:

      - What real C programs expect to happen.

      - What real C compilers (like LLVM) make happen.

      In that sense, the byte type is a case of LLVM hardening the guarantee that it already makes to real C programs.

      So, LLVM having a byte type is a necessary component of LLVM supporting C-as-everyone-practically-uses-it.

      Also, I would guess that we wouldn't be talking about the byte type if it weren't for C. Type-safe languages with well-defined semantics have no need to let the user write a byte-copy loop that does the right thing when it copies data of arbitrary type.

      (Please correct me if I'm wrong, this is fun)

      • uecker 18 hours ago

        The C standard has a conformance model that distinguishes between "strictly conforming" and "conforming" C programs. Almost zero C programs are strictly conforming, but many are conforming.
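As a rough illustration of that distinction (a sketch with hypothetical function names, not text from the standard): a strictly conforming program must not produce output that depends on unspecified, undefined, or implementation-defined behavior, whereas a conforming program only has to be acceptable to some conforming implementation.

```c
#include <limits.h>

/* Implementation-defined result: right-shifting a negative signed int.
   A program whose output depends on this value is conforming,
   but not strictly conforming. */
static int shift_negative_one(void)
{
    return -1 >> 1;
}

/* Fully specified result: unsigned arithmetic wraps modulo UINT_MAX + 1,
   so this returns 1 on every conforming implementation. */
static int unsigned_wraps_to_max(void)
{
    return (0u - 1u) == UINT_MAX;
}
```

A program that prints `unsigned_wraps_to_max()` can stay strictly conforming, while one that prints `shift_negative_one()` cannot, even though essentially every compiler accepts both.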

    • uecker 18 hours ago

      Bytewise copy just works with the TS. What it does not support is tracking provenance across the copy and optimizing based on that. What we hope is that compilers drop these optimizations, because they are unsound.