Comment by tialaramex a day ago
I don't think Rust gets this horribly wrong. &str is some bytes which we've agreed are UTF-8 encoded text. So, it's not a sequence of graphemes, though it does promise that it could be interpreted that way, and it is a sequence of bytes but not just any bytes.
In Rust "AbcdeF"[1] isn't a thing, it won't compile, but "AbcdeF"[1..=1] says we want the UTF-8 substring starting from byte 1 through to byte 1 and that compiles, and it'll work because that string does have a valid UTF-8 substring there, it's "b" -- However it'll panic if we try to "€300"[1..=1] because that's no longer a valid UTF-8 substring, that's nonsense.
For app dev this is too low level, but it's nice to have a string abstraction that's at home on a small embedded device, where it doesn't matter whether I can interpret flags, emoji with skin tone modifiers, or whatever else as a distinct single grapheme in Unicode, but we would like to do a bit better than "Only ASCII works in this device" in 2025.
> I don't think Rust gets this horribly wrong
> In Rust "AbcdeF"[1] isn't a thing, it won't compile, but "AbcdeF"[1..=1] says we want the UTF-8 substring starting from byte 1 through to byte 1 and that compiles, and it'll work because that string does have a valid UTF-8 substring there, it's "b" -- However it'll panic if we try to "€300"[1..=1]
I disagree. IMO, an API that uses byte offsets to take substrings of Unicode text (code points, or even larger units?) is already a bad idea, but on top of that, having it panic when the byte offsets don't happen to fall on code point/(extended) grapheme cluster boundaries?
How are you supposed to use that when, as you say, "we would like to do a bit better than 'Only ASCII works in this device' in 2025"?
I see there’s a better API that doesn’t throw (https://doc.rust-lang.org/std/primitive.str.html#method.get), but IMO it still isn’t as nice as Swift’s choice, because it still uses byte offsets.
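For reference, the non-panicking form from that link looks roughly like this (a small sketch):

    fn main() {
        // `str::get` returns Option<&str> instead of panicking,
        // giving None when the byte range doesn't fall on char boundaries.
        assert_eq!("AbcdeF".get(1..=1), Some("b"));
        assert_eq!("€300".get(1..=1), None); // byte 1 is inside the 3-byte '€'
    }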