Comment by xyzzyz
Comment by xyzzyz a day ago
The way Rust handles this is perfectly fine. String type promises its contents are valid UTF-8. When you create it from array of bytes, you have three options: 1) ::from_utf8, which will force you to handle invalid UTF-8 error, 2) ::from_utf8_lossy, which will replace invalid code points with replacement character code point, and 3) from_utf8_unchecked, which will not do the validity check and is explicitly marked as unsafe.
But there's no option to just construct the string with the invalid bytes. 3) is not for this purpose; it is for when you already know that it is valid.
If you use 3) to create a &str/String from invalid bytes, you can't safely use that string as the standard library is unfortunately designed around the assumption that only valid UTF-8 is stored.
https://doc.rust-lang.org/std/primitive.str.html#invariant
> Constructing a non-UTF-8 string slice is not immediate undefined behavior, but any function called on a string slice may assume that it is valid UTF-8, which means that a non-UTF-8 string slice can lead to undefined behavior down the road.