Comment by tralarpa
Fascinating and annoying problem, indeed. In Java, the correct way to iterate over the characters (Unicode scalar values) of a string is to use the IntStream provided by String::codePoints (since Java 8), but I bet 99.9999% of the existing code uses 16-bit chars.
This does not fix the problem. The emoji consists of multiple Unicode characters (in turn represented 1:1 by the integer "code point" values). There is much more to it than the problem of surrogate pairs.