Comment by 0x000xca0xfe 2 days ago
Thanks for your reply. I understand that encoding the character set in the type system is more explicit and can help find bugs.
But forcing all strings to be UTF-8 does not magically fix the issue you described. In practice I've often seen the opposite: now you have to write two code paths, one for UTF-8 and one for everything else, and the second one gets neglected because it is annoying to write. For example, I built the web server project in your other submission (very cool!) and gave it a tar file containing an entry with a non-UTF-8 name. There is no special handling: I simply get "error: invalid UTF-8 was detected in one or more arguments" and the application exits. It just refuses to work with non-UTF-8 files at all -- is this less sloppy?
Forcing UTF-8 does not "fix" compatibility in strange edge cases; it just breaks them all. The best approach is to treat data as opaque bytes unless there is a good reason not to. That is what Go does, so I think it is unfair to blame Go for this rather than the backup applications.
> It just refuses to work with non-UTF-8 files at all -- is this less sloppy?
You can debate whether it is sloppy, but I think a hard error is much better than silently corrupting data.
> The best approach is to treat data as opaque bytes unless there is a good reason not to
This doesn't seem like a good approach when dealing with strings, which are not just blobs of bytes. They have an encoding, and generally you want operations such as converting a string to upper/lowercase.