Comment by xyzzyz
Comment by xyzzyz 2 days ago
> There are a lot of operations that are valid and well-defined on binary strings, such as (...), and so on.
Every single thing you listed here is supported by &[u8] type. That's the point: if you want to operate on data without assuming it's valid UTF-8, you just use &[u8] (or allocating Vec<u8>), and the standard library offers what you'd typically want, except of the functions that assume that the string is valid UTF-8 (like e.g. iterating over code points). If you want that, you need to convert your &[u8] to &str, and the process of conversion forces you to check for conversion errors.
The problem is that there are so many functions that unnecessarily take `&str` rather than `&[u8]` because the expectation is that textual things should use `&str`.
So you naturally write another one of these functions that takes a `&str` so that it can pass to another function that only accepts `&str`.
Fundamentally no one actually requires validation (ie, walking over the string an extra time up front), we're just making it part of the contract because something else has made it part of the contract.