Comment by morshu9001

Comment by morshu9001 2 days ago

12 replies

These are only scalars that you'd encode into bytes. I guess it's slightly annoying that both ends have to agree on how to serialize rather than protobuf itself doing it, but it's not a big enough problem.

Also I don't see special ASN1 support for non-Unicode string encodings, only subsets of Unicode like ascii or printable ascii. It's a big can of worms once you bring in things like Latin-1.

zzo38computer 2 days ago

ASN.1 has support for ISO 2022 as well as ASCII and Unicode (ASCII is a subset of Unicode as well as a subset of ISO 2022). (My own nonstandard extensions add a few more (such as TRON character code and packed BCD), and the standard unrestricted character string type can be used if you really need arbitrary character sets.) (Unicode is not a very good character set, anyways.)

Also, DER allows to indicate the type of data within the file (unless you are using implicit types). Protobuf has only a limited case of this (you cannot always identify the types), and it requires different framing for different types. However, DER uses the same framing for all types, and strings are not inherently limited to 2GB by the file format.

Furthermore, there are other non-scalar types as well.

In any of these cases, you do not have to use all of the types (nor do you need to implement all of the types); you only need to use the types that are applicable for your use.

I will continue to use ASN.1; Protobuf is not good enough in my opinion.

  • cryptonector a day ago

    To be fair, if you don't need to support anything other than Unicode, then this is not an advantage, and over time we're all going to need non-Unicode less and less. That said I'm a big fan of ASN.1 (see my comment history).

    • morshu9001 21 hours ago

      I'm still confused how these ISO 2022 strings even work, and the ASN1 docs discourage using the UniversalString and GraphicString types. All these different string types are intimidating if I just want unicode/ascii, and even if I were using an obscure encoding, I'd use generic bytes instead of wanting asn1 to care about it.

      • cryptonector 21 hours ago

        GeneralString relies on control characters to "load" character sets into the C0 and C1 registers. This is madness -- specifically it's pre-Unicode madness, but before Unicode it made sense.

      • zzo38computer 20 hours ago

        > I'm still confused how these ISO 2022 strings even work

        There is C0, G0, C1, and G1 sets (C0 and C1 are control characters and G0 and G1 are graphic characters), and escape sequences are used to select the C or G set for bytes with or without the high bit set. Graphic string does not allow control characters and General string does allow control characters.

        You probably do not need all control characters; your schema should probably restrict which control characters are allowed in each context (although the ASN.1 schema format does not seem to have any way to do this). This way, you will only handle the control characters which are appropriate for your use.

        This is messy, although canonical form simplifies it by adding some restrictions (this is one of the reasons why DER is better than BER, in my opinion). TRON code is better and is much simpler than the working of ISO 2022. (Unicode has a different kind of mess; although decoding is simpler, actually handling the decoded characters in text is its own big mess for many reasons. Unicode is a stateful character set, even though the encoding is stateless; TRON code is the other way around (and with a significantly simpler stateful encoding than ISO 2022).)

        > the ASN1 docs discourage using the UniversalString and GraphicString types

        UniversalString is UTF-32BE and GraphicString is ISO 2022 without control characters. By knowing what they are, you should know in which circumstances they should be considered useful or not useful; I think that they should not be discouraged in general (although usually if you want Unicode, you would use UTF-8 rather than UTF-32, there are some circumstances where you might want to use UTF-32, such as if the data or program is already UTF-32 for other reasons).

        (The data type which probably should be avoided is the UTC time type, which is not Y2K compliant.)

        > All these different string types are intimidating if I just want unicode/ascii

        If you only want ASCII, use the IA5 type (or Visible if you do not want control characters); if you only want Unicode, use the UTF-8 string type (or Universal if you want UTF-32 instead for some reason). ("IA5" is another name for ASCII that as far as I can tell hardly anyone other than ITU uses.)

        However, Unicode is not a very good character set, and they should not force or expect you to use it.

        As I had mentioned before, you do not need to use or implement all of the ASN.1 data types; only use the ones appropriate for your application (so, if you do not like most of the types, then don't use those types). I also made up some additional nonstandard ASN.1 types (called ASN.1X), which also might be useful for some applications; you are not required to use or implement these either.