Comment by tralarpa

Comment by tralarpa a day ago

Fascinating and annoying problem, indeed. In Java, the correct way to iterate over the characters (Unicode code points) of a string is to use the IntStream returned by String::codePoints (available since Java 8), but I bet 99.9999% of existing code iterates over 16-bit chars instead.
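
To make the difference concrete, here is a minimal sketch comparing the two views of the same string. The class and method names (CodePointDemo, countChars, countCodePoints, hexCodePoints) are made up for illustration; only String::length and String::codePoints are real API.

```java
import java.util.stream.Collectors;

class CodePointDemo {
    // Number of UTF-16 code units (what most existing code counts).
    static int countChars(String s) {
        return s.length();
    }

    // Number of Unicode code points, with surrogate pairs merged.
    static long countCodePoints(String s) {
        return s.codePoints().count();
    }

    // Renders each code point in U+XXXX notation, separated by spaces.
    static String hexCodePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // "a" followed by U+1F600, written as an explicit surrogate pair.
        String s = "a\uD83D\uDE00";
        System.out.println(countChars(s));       // 3
        System.out.println(countCodePoints(s));  // 2
        System.out.println(hexCodePoints(s));    // U+0061 U+1F600
    }
}
```

A char-based loop sees the two halves of the surrogate pair as separate values, while the code point stream sees one value for U+1F600.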

zahlman a day ago

This does not fix the problem. The emoji consists of multiple Unicode characters (in turn represented 1:1 by the integer "code point" values). There is much more to it than the problem of surrogate pairs.
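
A rough sketch of the point, using an example emoji of my own choosing (the four-person family ZWJ sequence, not necessarily the one under discussion). GraphemeDemo and its methods are hypothetical names; java.text.BreakIterator is the real JDK class for user-perceived character boundaries. How well it handles emoji sequences depends on the JDK version (as far as I know, extended grapheme cluster support arrived around JDK 20), so the cluster count for the emoji is printed without a claimed result.

```java
import java.text.BreakIterator;

class GraphemeDemo {
    // Counts user-perceived characters as reported by the JDK's
    // character-boundary BreakIterator (results vary by JDK version).
    static int countClusters(String s) {
        BreakIterator it = BreakIterator.getCharacterInstance();
        it.setText(s);
        int count = 0;
        while (it.next() != BreakIterator.DONE) {
            count++;
        }
        return count;
    }

    static long countCodePoints(String s) {
        return s.codePoints().count();
    }

    public static void main(String[] args) {
        // U+1F468 ZWJ U+1F469 ZWJ U+1F467 ZWJ U+1F466 (assumed example).
        String family = "\uD83D\uDC68\u200D\uD83D\uDC69\u200D\uD83D\uDC67\u200D\uD83D\uDC66";
        System.out.println(family.length());         // 11 UTF-16 code units
        System.out.println(countCodePoints(family)); // 7 code points
        System.out.println(countClusters(family));   // depends on the JDK version

        // "e" plus a combining acute accent: two code points, one cluster.
        String accented = "e\u0301";
        System.out.println(countCodePoints(accented)); // 2
        System.out.println(countClusters(accented));   // 1
    }
}
```

So iterating by code point already splits one visible emoji into seven pieces, none of which is a surrogate-pair problem.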

ivanjermakov 17 hours ago

A code point is not a grapheme cluster, and a cluster is not a character. I bet there is a "50 falsehoods about Unicode" list somewhere.