Comment by pianoben a day ago

The trouble I have with this approach (which, conceptually, I agree with) is that it's damned hard to do anything with the parse results. Want to print that email_t? Then you're right back to char*, unless you somehow write your own I/O system that knows about your opaque conventions.

So you say, okay, I'll make an `email_to_string` function. Does it return a copy or a reference? Who frees it? etc, etc, and you're back to square one again. The idea is to keep char* and friends at "the edge", but I've never found a way to really achieve that.

Could just be my limitations as a C programmer, in which case I'd be thrilled to learn better.

lelanthran a day ago

Firstly, `parsing` is just a way to say "serialise from a string". The reverse operation can be done for every type you are creating. If the reverse operation (serialise to a string) does not exist in the interface then adding it gives you a single place to catch all the bugs.

I'm thinking of that recent git bug that occurred because the round-trip of `string -> type -> string` had an error (stripping out the CR character). Using a specific type for a value that is being round-tripped means that a bugfix needs to only be made in the parser function. Storing the value as simple strings would result in needing to put your fix everywhere.
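A minimal sketch of that round-trip idea, assuming a hypothetical `email_t` with one parse function and one serialise function. The names `email_parse`, `email_cstr`, `email_roundtrips` and the `EMAIL_MAX` limit are illustrative and not taken from the article, and the validity check is deliberately simplistic:

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define EMAIL_MAX 254 /* assumed length limit for this sketch */

/* Hypothetical wrapper: only email_parse() is meant to fill it in. */
typedef struct {
    char text[EMAIL_MAX + 1];
} email_t;

/* string -> type. Deliberately simplistic: exactly one '@' with text on
 * both sides, and no CR/LF. Real address validation is far stricter. */
static bool email_parse(const char *src, email_t *out)
{
    assert(src != NULL && out != NULL);

    size_t len = strlen(src);
    const char *at = strchr(src, '@');

    if (len > EMAIL_MAX || at == NULL || at == src || at[1] == '\0')
        return false;
    if (strchr(at + 1, '@') != NULL || strpbrk(src, "\r\n") != NULL)
        return false;

    memcpy(out->text, src, len + 1); /* +1 copies the terminator */
    return true;
}

/* type -> string. Returns a borrowed pointer that is valid for as long
 * as *e is; the caller must not free it. */
static const char *email_cstr(const email_t *e)
{
    assert(e != NULL);
    return e->text;
}

/* Round-trip property: parsing the serialised form yields the same value. */
static bool email_roundtrips(const email_t *e)
{
    email_t again;
    return email_parse(email_cstr(e), &again)
        && strcmp(email_cstr(e), email_cstr(&again)) == 0;
}
```

If a round-trip bug turns up (say, a stray CR), the fix goes into `email_parse` or `email_cstr` and every caller picks it up.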

> The trouble I have with this approach (which, conceptually, I agree with) is that it's damned hard to do anything with the parse results.

You're right - it is damn hard, but that is on purpose; if you're doing something with the email that boils down to "treat it like a `char *`" then the potential for error is large.

If you're forced to add in a new use-case to the `email_t` interface then you have reduced the space of potential errors.

For example:

> Want to print that email_t? Then you're right back to char*, unless you somehow write your own I/O system that knows about your opaque conventions.

is a bug waiting to surface, because it's an email, not a string, and if you decide to print an email that was read as a `char *` you might not get what you expect.

It's all a trade-off - if you want more flexibility with the value stored in a variable, then sure, you can have it but it comes at a cost: some code somewhere almost certainly will eventually use that flexibility to mismatch the type!

If you want to prevent type mismatches, then a lot of flexibility goes out the window.

  • jagged-chisel a day ago

    Linguistic nit: deserialize from a string, serialize to a string

    “Serialization” is the act of taking an internal data structure (of whatever shape and depth) and outputting it for transmission or storage. The opposite is “deserialization,” restoring the original shape and depth.

  • KerrAvon 18 hours ago

    But that’s where TFA breaks down: the whole point of this is to claim “hey, C does too have typesafety” —if you use a modern language with actual typesafety, you can just make the underlying representation accessible (probably read-only) and you don’t need a reverse conversion step. You can have type safety, efficiency, and ease of use. Just rip the band-aid off and move to Rust or Swift already.

    • lelanthran 9 hours ago

      > the whole point of this is to claim “hey, C does too have typesafety” —if you use a modern language with actual typesafety,

      I'm afraid the point was not some childish and immature comparison of C with modern languages.

      The point was to demonstrate what type safety there is, and how to use it. The advantages of modern languages are even acknowledged:

      > Much to the surprise of, well, everybody, C actually has type safety. Sure, it isn’t as enforceable as (for example) Rust… and, sure, if you are willing to do extra work you can bypass it,

      The entire point of TFA is actually in TFA:

      > The problem isn’t that C lacks type safety (it clearly enforces most types in most expressions), it’s that raw pointers do not encode semantics (e.g., a char * doesn’t tell you if it’s an email, a name, or a filename).

8organicbits a day ago

In the past I've taken inspiration from strncpy: the caller needs to allocate the memory. For the email example, you'd probably also want a function to tell you the length of the email string, but for other types there are clear size limits. This puts the caller in control of memory allocation, so they may be able to statically allocate, allocate in an arena, or use other methods which promote performance. The static approach is really nice when it works, because there's nothing to free.
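One way the caller-allocates convention could look, as a sketch only. The struct layout and the names `email_length` and `email_to_string` are assumptions for illustration; unlike `strncpy`, this version goes through `snprintf` so the output is always NUL-terminated when the buffer has room for at least one byte:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for a parsed email; a real one would only be built by a parser. */
typedef struct {
    char text[255];
} email_t;

/* Length of the string form, excluding the terminating NUL, so the
 * caller can size a buffer before asking for the text. */
static size_t email_length(const email_t *e)
{
    assert(e != NULL);
    return strlen(e->text);
}

/* Copies the string form into a caller-owned buffer of `cap` bytes.
 * Returns the length the full string needs (excluding the NUL), so a
 * return value >= cap means the output was truncated. */
static size_t email_to_string(const email_t *e, char *buf, size_t cap)
{
    assert(e != NULL);
    int n = snprintf(buf, cap, "%s", e->text);
    return n < 0 ? 0 : (size_t)n;
}
```

A caller can then use a stack or static buffer, for example `char buf[255]; email_to_string(&e, buf, sizeof buf);`, and there is nothing to free afterwards.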

dwattttt a day ago

email_t doesn't have to be opaque; if it's just a visible wrapper around char* then you can still do everything with it as a char* (that is, everything you do with strings).

The benefit is to avoid treating char*s as email_t, not avoiding treating email_t as char*.
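A sketch of what such a visible wrapper might look like. The field name `str` and the functions `email_parse` and `email_domain_length` are hypothetical, and the parse check is only a placeholder:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Visible wrapper: the field is public, so every string function still
 * works on e.str. */
typedef struct {
    const char *str;
} email_t;

/* Hypothetical parser, the intended way to obtain an email_t.
 * Returns a wrapper with str == NULL on failure. It borrows `s`, so the
 * caller must keep that string alive. */
static email_t email_parse(const char *s)
{
    email_t e = { NULL };
    if (s != NULL && strchr(s, '@') != NULL)
        e.str = s;
    return e;
}

/* Takes an email_t by value. Passing a bare char * here does not
 * compile, which is the direction of protection described above. */
static size_t email_domain_length(email_t e)
{
    assert(e.str != NULL);
    return strlen(strchr(e.str, '@') + 1);
}
```

The trade-off is that nothing stops someone from writing `email_t e = { "garbage" };` by hand, so a visible field relies on convention where an opaque type relies on the compiler.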

  • maxbond a day ago

    (Using a thin wrapper like this to add safety is called the newtype pattern, if anyone wants to know.)

    • tetha a day ago

      I was curious how this would look in C, and I found this article[1], which shows one way to do it, apparently with very little overhead.

      And as I just saw, Python also has a NewType[2] wrapper (in `typing` since Python 3.5.2). I'll have to see how that feels to handle.

      1: https://blog.nelhage.com/2010/10/using-haskells-newtype-in-c...

      2: https://typing.python.org/en/latest/spec/aliases.html#newtyp...

      • masklinn a day ago

        Python’s NewType is, confusingly, a very different thing: it’s a compile-time-only subtype of the original, rather than a Haskell-style newtype (which is an entirely separate type from its source).

    • mrkeen a day ago

      I've re-read the article again since getting a bunch of up and down votes across the comment section, and I think you've chosen a better name for this article than PdV. It really is just about using newtype wrappers.

  • bcrosby95 a day ago

    In the example code they explicitly put the struct in the c file so the char* is not available.

    If you're suggesting getting around this by casting an email_t* to char* then I wish you good luck on your adventures. There's some times you gotta do stuff like that but this ain't it.

    • dwattttt a day ago

      You could probably get away with the typecast if you satisfy the "common struct prefix" requirement, but that's nowhere near necessary.

      While the article does hide the internal char*, that's not strictly necessary to get the benefit of "parse, don't validate". Hide implementation details sure, but not everything is an implementation detail.

restalis a day ago

The main benefit for me with this approach is that the boundaries are not transparent anymore. Printing content is one such boundary. Your data is about to exit through there and you're summoned to handle that. The inconvenience that comes with it is like any other that appears when security comes into play. The same goes for data management responsibilities: who handles what, for how long, and with whom. Without data type distinctions everything is (more or less) common, with vague or broadly defined ownership.