Comment by degamad
It's not just the variable byte length that causes an issue, in some ways that's the easiest part of the problem. You also have to deal with code points that modify other code points, rather than being characters themselves. That's a huge part of the problem.
That has nothing to do with UTF-8; that's a Unicode issue, and one that's entirely unescapable if you are the Unicode Consortium and your goal is to be compatible with all legacy charsets.