Comment by jerf
Comment by jerf a day ago
While the general question about string encoding is fine, unfortunately in a general-purpose and cross-platform language, a file interface that enforces Unicode correctness is actively broken, in that there are files out in the world it will be unable to interact with. If your language is enforcing that, and it doesn't have a fallback to a bag of bytes, it is broken, you just haven't encountered it. Go is correct on this specific API. I'm not celebrating that fact here, nor do I expect the Go designers are either, but it's still correct.
This is one of those things that kind of bugs me about, say, OsStr / OsString in Rust. In theory, it’s a very nice, principled approach to strings (must be UTF-8) and filenames (arbitrary bytes, almost, on Linux & Mac). In practice, the ergonomics around OsStr are horrible. They are missing most of the API that normal strings have… it seems like manipulating them is an afterthought, and it was assumed that people would treat them as opaque (which is wrong).
Go’s more chaotic approach to allow strings to have non-Unicode contents is IMO more ergonomic. You validate that strings are UTF-8 at the place where you care that they are UTF-8. (So I’m agreeing.)