Comment by ape4
Comment by ape4 14 hours ago
Seems like libraries that serialize to JSON should have an option to filter out these bad characters.
Comment by ape4 14 hours ago
Seems like libraries that serialize to JSON should have an option to filter out these bad characters.
interesting. isn't in Go it is just `unicode.IsPrint(r rune)`? https://pkg.go.dev/unicode#IsPrint
JSON (unfortunately) requires strings to be Unicode. (JSON has other problems too, but Unicode is one of them.)
So?
All the letters in this string are “just text”:
"\u0000\u0089\uDEAD\uD9BF\uDFFF"
JSON itself allows putting sequences of escape characters in the string that don’t unescape to valid Unicode. That’s fine, because the strings aren’t required to represent any particular encoding: it’s up to a layer higher than JSON to be opinionated about that.I wouldn’t want my shell’s pipeline buffers to reject data it doesn’t like, why should a JSON serializer?
No. As the RFC notes: “Silently deleting an ill-formed part of a string is a known security risk. Responding to that risk, Section 3.2 of [UNICODE] recommends dealing with ill-formed byte sequences by signaling an error or replacing problematic code points, ideally with "�" (U+FFFD, REPLACEMENT CHARACTER).”
I would almost always go for “signaling an error”.