layer8 13 hours ago

No. As the RFC notes: “Silently deleting an ill-formed part of a string is a known security risk. Responding to that risk, Section 3.2 of [UNICODE] recommends dealing with ill-formed byte sequences by signaling an error or replacing problematic code points, ideally with "�" (U+FFFD, REPLACEMENT CHARACTER).”

I would almost always go for “signaling an error”.
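
For illustration, here's a minimal sketch of both options using only the Go standard library (nothing from Tim's reference library; the sanitize helper and its strict flag are just names I made up):

    package main

    import (
        "errors"
        "fmt"
        "strings"
        "unicode/utf8"
    )

    // sanitize shows the two strategies the RFC describes for input that
    // may contain ill-formed UTF-8: signal an error, or substitute U+FFFD
    // for each ill-formed sequence.
    func sanitize(s string, strict bool) (string, error) {
        if utf8.ValidString(s) {
            return s, nil
        }
        if strict {
            return "", errors.New("input contains ill-formed UTF-8")
        }
        return strings.ToValidUTF8(s, "\uFFFD"), nil
    }

    func main() {
        bad := "ok\xffbad"                // 0xFF can never appear in UTF-8
        fmt.Println(sanitize(bad, true))  // "", error
        fmt.Println(sanitize(bad, false)) // "ok�bad", <nil>
    }

Erroring keeps bad data out entirely; replacement at least keeps the damage visible instead of silently dropping bytes.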

Manfred 14 hours ago

My experience writing Unicode-related libraries is that people don't use features when you have to explain why and when to use them. I assume that's why Tim puts the emphasis on "working on something new".

CharlesW 13 hours ago

This RFC and its Go-language reference library are designed to be used by existing libraries that do serialization/sanitization/validation. This is hot off the press, so I'm sure Tim would appreciate it if you'd let your favorite library know it exists.

xdennis 13 hours ago

How is Unicode in any way related to JSON? JSON should just encode whatever dumb data someone wants to transport.

Unicode validation/cleanup should be done separately because it's needed in multiple places, not just JSON.

  • layer8 13 hours ago

    The contents of JSON strings don't admit arbitrary binary data. You need to use an encoding like Base64 for that purpose.
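
    As a quick sketch (standard library only, nothing specific to this RFC): Go's encoding/json already base64-encodes []byte values for exactly this reason.

        package main

        import (
            "encoding/json"
            "fmt"
        )

        func main() {
            // Arbitrary bytes that are not valid UTF-8.
            blob := []byte{0xFF, 0xFE, 0x00, 0x89}
            out, _ := json.Marshal(map[string][]byte{"data": blob})
            fmt.Println(string(out)) // {"data":"//4AiQ=="}
        }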

  • zzo38computer 5 hours ago

    JSON (unfortunately) requires strings to be Unicode. (JSON has other problems too, but Unicode is one of them.)

  • recursive 13 hours ago

    JSON is text. If you're not going to use Unicode to represent your text, you'll need some other way.

    • dcrazy 12 hours ago

      The current JSON spec mandates UTF-8, but practically speaking, encoding is a higher-level concern. I suspect many server implementations will respect the charset parameter of the Content-Type header on a POST request containing JSON.

    • ninkendo 11 hours ago

      So?

      All the letters in this string are “just text”:

          "\u0000\u0089\uDEAD\uD9BF\uDFFF"
      
      JSON itself allows escape sequences in a string that don't unescape to valid Unicode. That's fine, because the strings aren't required to represent any particular encoding: it's up to a layer higher than JSON to be opinionated about that.
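
      Parsers differ on what they do with that, for what it's worth. As one data point (a quick sketch, not a claim about every decoder), Go's encoding/json accepts the literal above and quietly substitutes U+FFFD for the escapes that don't decode to valid Unicode, at least in the releases I've checked:

          package main

          import (
              "encoding/json"
              "fmt"
          )

          func main() {
              var s string
              err := json.Unmarshal([]byte(`"\u0000\u0089\uDEAD\uD9BF\uDFFF"`), &s)
              // The lone surrogate \uDEAD comes back as U+FFFD, while
              // \uD9BF\uDFFF decodes as the pair U+7FFFF; no error is reported.
              fmt.Printf("%v %+q\n", err, s)
          }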

      I wouldn’t want my shell’s pipeline buffers to reject data they don’t like, so why should a JSON serializer?

      • recursive 11 hours ago

        I actually agree, now that I understand what you're talking about.