Comment by ks2048 14 hours ago

It's worth noting that Unicode already assigns every code point a "General Category", and several of those categories cover these kinds of "weird" characters.

https://en.wikipedia.org/wiki/Unicode_character_property#Gen...

e.g. in Python,

   import unicodedata
   print(unicodedata.category(chr(0)))       # U+0000 (NUL)
   print(unicodedata.category(chr(0xdead)))  # U+DEAD, a lone surrogate

Shows "Cc" (control) and "Cs" (surrogate).
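
If you wanted to use this to flag suspicious characters in a string, a rough sketch might look like the following. The set of categories and the name find_suspect_chars are my own picks for illustration, not anything standard. Note that "Cc" also covers ordinary newlines and tabs, so you would probably want to allow those in practice.

```python
import unicodedata

# Hypothetical choice of "weird" categories: control, format, surrogate,
# private use, unassigned. Adjust to taste.
SUSPECT_CATEGORIES = {"Cc", "Cf", "Cs", "Co", "Cn"}

def find_suspect_chars(text):
    """Return (index, code point, category) for each character in a suspect category."""
    hits = []
    for i, ch in enumerate(text):
        cat = unicodedata.category(ch)
        if cat in SUSPECT_CATEGORIES:
            hits.append((i, f"U+{ord(ch):04X}", cat))
    return hits

# U+200B (zero width space) is a format character; U+0000 is a control character.
print(find_suspect_chars("a\u200bb\x00c"))
# [(1, 'U+200B', 'Cf'), (3, 'U+0000', 'Cc')]
```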