Comment by mrheosuper a day ago
>We’ve seen four different lengths so far:
Number of UTF-8 code units (17 in this case)
Number of UTF-16 code units (7 in this case)
Number of UTF-32 code units or Unicode scalar values (5 in this case)
Number of extended grapheme clusters (1 in this case)
We would not have this problem if we all agreed to return the number of bytes instead.
Edit: My mistake. There would still be inconsistency between different encodings. My point is that if we all decided to report the number of bytes the string uses instead of the number of printable characters, we would not have the inconsistency between languages.
"number of bytes" is dependent on the text encoding.
UTF-8 code units _are_ bytes, which is one of the things that makes UTF-8 very nice and why it has won.
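A quick Python sketch of the point, for anyone who wants to check the numbers. It assumes the string in the quoted article is the facepalm emoji sequence U+1F926 U+1F3FC U+200D U+2642 U+FE0F (written as escapes here), which matches the 17/7/5 counts quoted above. The variable names are just for illustration.

```python
# Assumed example string: facepalm + skin tone + ZWJ + male sign + variation selector
s = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"

# (encoding name, bytes per code unit); the "-le" variants avoid a leading BOM
encodings = [("utf-8", 1), ("utf-16-le", 2), ("utf-32-le", 4)]

byte_counts = {}
unit_counts = {}
for name, unit_size in encodings:
    encoded = s.encode(name)
    byte_counts[name] = len(encoded)               # size in bytes for this encoding
    unit_counts[name] = len(encoded) // unit_size  # number of code units

for name, _ in encodings:
    print(f"{name}: {byte_counts[name]} bytes, {unit_counts[name]} code units")
# utf-8: 17 bytes, 17 code units
# utf-16-le: 14 bytes, 7 code units
# utf-32-le: 20 bytes, 5 code units
```

So "number of bytes" gives 17, 14, or 20 for the same string depending on the encoding, and only in UTF-8 does the byte count equal the code unit count. The grapheme cluster count (1) is left out because, as far as I know, the Python standard library has no grapheme segmentation; that would need a third-party package.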