Comment by amake
That has nothing to do with UTF-8; that's a Unicode issue, and one that's entirely unescapable if you are the Unicode Consortium and your goal is to be compatible with all legacy charsets.
That has nothing to do with UTF-8; that's a Unicode issue, and one that's entirely unescapable if you are the Unicode Consortium and your goal is to be compatible with all legacy charsets.
Yep, that's the point I was making - that choosing fixed 4-byte code-points doesn't significantly reduce the complexity of capturing everything that Unicode does.