Comment by inferiorhuman a day ago
Problems arise when you try to take a slice of a string and end up picking an index (perhaps based on length) that would split a code point; the slice then panics at runtime. `String`/`str` offers an abstraction over Unicode scalar values (code points) via the `chars` iterator, but it all feels a bit messy to have the byte-based abstraction more or less be the default.
FWIW the docs indicate that working with grapheme clusters will never end up in the standard library.
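To make the slicing problem concrete, here is a minimal sketch. `truncate_on_boundary` is a hypothetical helper name, not something from the standard library; it only relies on `str::is_char_boundary` and `str::get`, which are.

```rust
/// Hypothetical helper: cut `s` down to at most `max_bytes` bytes
/// without ever splitting a code point.
fn truncate_on_boundary(s: &str, max_bytes: usize) -> &str {
    if max_bytes >= s.len() {
        return s;
    }
    let mut end = max_bytes;
    // Walk backwards until `end` sits on a char boundary.
    // Index 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}

fn main() {
    // U+00E9 (e with acute accent) takes two bytes in UTF-8.
    let s = "h\u{e9}llo";
    assert_eq!(s.len(), 6); // length in bytes
    assert_eq!(s.chars().count(), 5); // length in Unicode scalar values

    // `&s[..2]` would panic here: byte 2 is in the middle of U+00E9.
    // `get` reports the same problem as `None` instead of panicking.
    assert!(s.get(..2).is_none());
    assert_eq!(s.get(..3), Some("h\u{e9}"));

    // The helper backs off to the previous boundary.
    assert_eq!(truncate_on_boundary(s, 2), "h");
}
```

Note that this only protects code points. It can still split a grapheme cluster (say, a base letter followed by a combining mark), which is the part the standard library leaves to external crates.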
You can easily treat `&str` as bytes: just call `.as_bytes()` and you get `&[u8]`, no questions asked. The reason you don't want to treat `&str` as just bytes by default is that it's almost always the wrong thing to do. Moreover, it's the worst kind of wrong, because it works correctly 99% of the time, so you might not even realize you have a bug until much too late.
If your API takes `&str` and tries to do byte-based indexing, it should almost certainly be taking `&[u8]` instead.
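A rough sketch of what that looks like in practice. Both function names here (`count_byte`, `first_n_naive`) are made up for illustration; only `as_bytes` and `get` come from the standard library.

```rust
/// Hypothetical byte-oriented API: it only cares about bytes,
/// so it asks for `&[u8]` and callers opt in via `.as_bytes()`.
fn count_byte(data: &[u8], needle: u8) -> usize {
    data.iter().filter(|&&b| b == needle).count()
}

/// Hypothetical "works 99% of the time" API: it takes `&str` but
/// treats `n` as a byte offset. `get` is used so the failure shows
/// up as `None`; plain `&s[..n]` would panic instead.
fn first_n_naive(s: &str, n: usize) -> Option<&str> {
    s.get(..n)
}

fn main() {
    let ascii = "hello";
    let accented = "h\u{e9}llo"; // U+00E9 is two bytes in UTF-8

    // Byte-level work is explicit and has no surprises.
    assert_eq!(count_byte(ascii.as_bytes(), b'l'), 2);
    assert_eq!(count_byte(accented.as_bytes(), b'l'), 2);

    // The byte-offset-on-&str version is fine for ASCII input...
    assert_eq!(first_n_naive(ascii, 2), Some("he"));
    // ...and breaks the first time a multi-byte character shows up.
    assert_eq!(first_n_naive(accented, 2), None);
}
```

The point of the `&[u8]` signature is that the caller has to write `.as_bytes()` at the call site, so the switch from text to bytes is visible instead of hidden inside the function.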