Comment by hansvm 19 hours ago

One flaw I've seen in "Parse, Don't Validate" as it pertains to real codebases is that you end up with a combinatorial proliferation of types.

E.g., requiring that a string be base64, have a certain fixed length, and be provided by the user.

E.g., requiring that a file have the correct MIME type, not be too large, and contain no EXIF metadata.

If you really always need all n of those things then life isn't terrible (you can parse your data into some type representing the composition of all of them), but you often only need 1, 2, or 3 and simultaneously don't want to duplicate too much code or runtime work, leading to a combinatorial explosion of intermediate types and parsing code.

As one possible solution, I put together a POC in Zig [0] with one idea, where you abuse comptime to add arbitrary tagging to types, treating a type as valid if it has the subset of tags you care about. I'm very curious what other people do to appropriately model that sort of thing though.

[0] https://github.com/hmusgrave/pdv

deredede 15 hours ago

> E.g., requiring that a file have the correct MIME type, not be too large, and contain no EXIF metadata.

"Parse, don't validate" doesn't mean that you must encode everything in the type system -- in fact I'd argue you should usually only create new types for data (or pieces of data) that make sense for your business logic.

Here the type your business logic cares about is maybe "file valid for upload", and it is perfectly fine to have a function that takes a file, performs a bunch of checks on it, and returns a "file valid for upload" new type if it passes the checks.
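A minimal C sketch of that shape (all names here — `upload_file`, `valid_upload`, `check_upload`, and the specific checks — are made up for illustration, not a real API): the only way to obtain a `valid_upload` is to go through `check_upload`, so downstream code taking a `valid_upload` may assume the checks already passed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *mime;
    size_t size;
    bool has_exif;
} upload_file;

/* The "new type" the business logic cares about. In a real codebase
 * you'd make this opaque (definition hidden behind a header) so it
 * genuinely can't be constructed without going through the checks. */
typedef struct {
    upload_file file;
} valid_upload;

#define MAX_UPLOAD_SIZE (10 * 1024 * 1024)

/* Parse step: fills *out and returns true only if every check passes. */
bool check_upload(upload_file in, valid_upload *out) {
    if (strcmp(in.mime, "image/png") != 0) return false;
    if (in.size > MAX_UPLOAD_SIZE) return false;
    if (in.has_exif) return false;
    out->file = in;
    return true;
}
```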

jeremyscanvic 18 hours ago

You might be interested in Lean's way of doing things. They have normal types (e.g. numeric types) and subtypes (e.g. numbers less than zero). An element of the subtype "numbers less than zero" can be understood as a tuple containing the actual number (which has a normal numeric type) and a proof that this specific number is indeed less than zero.

https://lean-lang.org/doc/reference/latest/Basic-Types/Subty...
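In Lean 4 that tuple is literal: a subtype value is built with the anonymous constructor ⟨value, proof⟩. A small sketch (`NegInt` and `minusFive` are names invented for this example):

```lean
-- A subtype packaging an Int together with a proof that it is negative.
abbrev NegInt := { x : Int // x < 0 }

-- The anonymous constructor ⟨_, _⟩ takes the value and the proof;
-- `by decide` discharges the decidable proposition (-5 : Int) < 0.
def minusFive : NegInt := ⟨-5, by decide⟩

-- .val projects the underlying number, .property the proof.
#eval minusFive.val  -- -5
```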

tacitusarc 17 hours ago

Structs are representations of combinatorial types! In your file case, you could parse the input into a struct, and then accept or reject further processing based on the contents of that struct.
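One way to read that in C (field and function names are invented for illustration): run the checks once into a struct, then let each call site demand only the subset it cares about, without minting a new type per combination.

```c
#include <stdbool.h>

/* Result of parsing the upload once; each flag records one check. */
typedef struct {
    bool mime_ok;
    bool size_ok;
    bool no_exif;
} file_checks;

/* One consumer needs all three checks... */
bool ok_for_avatar(file_checks c) {
    return c.mime_ok && c.size_ok && c.no_exif;
}

/* ...another only cares about size; no new type per combination. */
bool ok_for_attachment(file_checks c) {
    return c.size_ok;
}
```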

Of course, it would be reasonable to claim that the accept/reject step is validation, but I believe “Parse, don’t validate” is about handling input, not an admonition to never perform validation.

  • mrkeen 17 hours ago

    In pure C however, you still get the types-in-source-code explosion, for lack of parametric polymorphism. You need an email_or_error and a name_or_error, etc. The alternative is to fake PP with a void*, but that's so ugly I think I'd scrap the whole effort and just use char*.

    > I believe “Parse, don’t validate” is about handling input, not an admonition to never perform validation.

    It's about validation happening at exactly one place in the code base (during the "parse" - even though it's not limited to string-processing), so that callers can't do the validation themselves - because callers will validate 0 times or n>1 times.

    • deredede 15 hours ago

      > You need an email_or_error and a name_or_error, etc.

      You don't need that. A practical solution is a generic `error` type that you return (with a special value for "no error") and `name` or `email` output arguments that only get set if there's no error.
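A sketch of that pattern in C (the error codes, `email`/`name` types, and parse functions are all hypothetical): one shared error enum serves every parser, and the parsed value comes back through an out parameter that is only written on success.

```c
#include <string.h>

typedef enum { ERR_NONE = 0, ERR_BAD_EMAIL, ERR_BAD_NAME } error;

typedef struct { char text[64]; } email;
typedef struct { char text[64]; } name;

/* Out parameter is written only when the input parses cleanly. */
error parse_email(const char *s, email *out) {
    if (strchr(s, '@') == NULL || strlen(s) >= sizeof out->text)
        return ERR_BAD_EMAIL;
    strcpy(out->text, s);
    return ERR_NONE;
}

error parse_name(const char *s, name *out) {
    if (s[0] == '\0' || strlen(s) >= sizeof out->text)
        return ERR_BAD_NAME;
    strcpy(out->text, s);
    return ERR_NONE;
}
```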

      • mrkeen 2 hours ago

        IOW return something for the caller to validate.

        • deredede an hour ago

          "Parse, don't validate" is a catchy way of saying "Instead of mixing data validation and data processing, ensure clean separation by first parsing 'input data' into 'valid data', and then only process 'valid data'".

          It doesn't mean you should completely eliminate `if` statements and error checking.