Comment by ks2048
It's worth noting that Unicode already assigns every code point a "General Category" property, and several of its values cover exactly these kinds of "weird" characters.
https://en.wikipedia.org/wiki/Unicode_character_property#Gen...
For example, in Python:
import unicodedata
print(unicodedata.category(chr(0)))       # U+0000, the NUL control character
print(unicodedata.category(chr(0xdead)))  # U+DEAD, a lone surrogate code point
This prints "Cc" (control) and "Cs" (surrogate).
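
To go one step further, here is a rough sketch of using the category to scan a string for suspicious characters. The helper name find_weird_chars and the choice of which categories count as "weird" (the whole C* "Other" group) are my own assumptions, not anything Unicode prescribes.

```python
import unicodedata

# Assumption: treat the whole "Other" group as suspect:
# Cc = control, Cf = format, Cs = surrogate, Co = private use, Cn = unassigned.
SUSPECT_CATEGORIES = {"Cc", "Cf", "Cs", "Co", "Cn"}

def find_weird_chars(text):
    """Hypothetical helper: list (index, 'U+XXXX', category) for suspect characters."""
    hits = []
    for i, ch in enumerate(text):
        cat = unicodedata.category(ch)
        if cat in SUSPECT_CATEGORIES:
            hits.append((i, f"U+{ord(ch):04X}", cat))
    return hits

# NUL, ZERO WIDTH SPACE, and a lone surrogate hidden in otherwise plain text.
sample = "ok\x00\u200b\udead"
for hit in find_weird_chars(sample):
    print(hit)
```

For the sample above this should report U+0000 as Cc, U+200B as Cf, and U+DEAD as Cs. Note that ordinary tab and newline are also Cc, so a real filter would probably want to allow those explicitly.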