Comment by thomashabets2 2 days ago
Author here.
What I intended to say with this is that ignoring the problem of invalid UTF-8 (which could be valid ISO 8859-1) with no error handling, or the other way around, has lost me data in the past.
Compare this to Rust, where a path name is of a different type than a mere string. If you need to treat it like a string and you don't care if it's "a bit wrong" (because it's only being shown to the user), then you can call `.to_string_lossy()`. But it's much harder to accidentally skip handling that case when an exact name match does matter.
When exactness matters, `.to_str()` returns `Option<&str>`, so the caller is forced to deal with the situation that the file name may not be UTF-8.
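To make the difference concrete, here is a minimal sketch, assuming a Unix target (the only place `OsStrExt::from_bytes` exists). The helper names `display_name` and `exact_name` and the file name are made up for illustration; only `to_string_lossy()` and `to_str()` are real `std::path::Path` methods.

```rust
use std::ffi::OsStr;
use std::os::unix::ffi::OsStrExt; // Unix-only: build an OsStr from raw bytes
use std::path::Path;

/// Hypothetical helper: a name that is good enough to show to a user.
/// Bytes that are not valid UTF-8 are replaced with U+FFFD.
fn display_name(path: &Path) -> String {
    path.to_string_lossy().into_owned()
}

/// Hypothetical helper: the exact name, or None if it is not valid UTF-8.
/// The caller has to decide what to do in the None case.
fn exact_name(path: &Path) -> Option<&str> {
    path.to_str()
}

fn main() {
    // "café.txt" encoded as ISO 8859-1: the byte 0xE9 is not valid UTF-8 here.
    let latin1 = Path::new(OsStr::from_bytes(b"caf\xe9.txt"));

    // Fine for display, but no longer the real name.
    println!("shown to user: {}", display_name(latin1));

    // For exact matching the compiler makes us look at both cases.
    match exact_name(latin1) {
        Some(name) => println!("exact name: {name}"),
        None => println!("not UTF-8, must handle this explicitly"),
    }
}
```

The lossy version is only safe for output; using it to reopen or compare files is exactly the kind of silent mismatch I was complaining about.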
Being sloppy with file name encodings is how data is lost. Go is sloppy with strings of all kinds, file names included.
Thanks for your reply. I understand that encoding the character set in the type system is more explicit and can help find bugs.
But forcing all strings to be UTF-8 does not magically help with the issue you described. In practice I've often seen the opposite: Now you have to write two code paths, one for UTF-8 and one for everything else. And the second one is ignored in practice because it is annoying to write. For example, I built the web server project in your other submission (very cool!) and gave it a tar file that has a non-UTF-8 name. There is no special handling happening, I simply get "error: invalid UTF-8 was detected in one or more arguments" and the application exits. It just refuses to work with non-UTF-8 files at all -- is this less sloppy?
Forcing UTF-8 does not "fix" compatibility in strange edge cases; it just breaks them all. The best approach is to treat data as opaque bytes unless there is a good reason not to. That is what Go does, so I think it is unfair to blame Go for this particular reason instead of the backup applications.
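For comparison, a rough sketch of the opaque approach in Rust itself. This is not how the web server project parses its arguments; `paths_from_args` is a hypothetical helper, and the only assumption is that arguments are read with `std::env::args_os()` so they are never decoded as UTF-8.

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// Hypothetical helper: turn raw arguments into paths without decoding them.
/// The first item is skipped because it is the program name.
fn paths_from_args<I: IntoIterator<Item = OsString>>(args: I) -> Vec<PathBuf> {
    args.into_iter().skip(1).map(PathBuf::from).collect()
}

fn main() {
    // args_os() hands over the arguments as the OS provided them,
    // so a non-UTF-8 file name is accepted like any other.
    for path in paths_from_args(std::env::args_os()) {
        match std::fs::metadata(&path) {
            // display() is lossy, but it is only used for printing here.
            Ok(meta) => println!("{}: {} bytes", path.display(), meta.len()),
            Err(err) => eprintln!("{}: {}", path.display(), err),
        }
    }
}
```

The name is only converted at the point where it is printed; every file system call gets the original bytes.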