Comment by zokier
Seems like a lot of these would be taken care of by normalization, though? Precomposed characters are a bit of a mess.
I do feel it is an error that unit/math symbols get changed; imho they should stay as-is through case conversions.
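To illustrate the normalization point, here is a quick sketch using Python's standard `unicodedata` module. It shows how NFC/NFD reconcile precomposed and decomposed spellings. It also shows that some unit symbols (Kelvin, Ohm, Angstrom) are canonically equivalent to ordinary letters, so NFC already replaces them before any case conversion happens. The micro sign only has a compatibility mapping, so NFC leaves it alone and only NFKC changes it.

```python
import unicodedata

# "é" can be stored precomposed (U+00E9) or as "e" + COMBINING ACUTE ACCENT (U+0301).
precomposed = "\u00e9"
decomposed = "e\u0301"
assert precomposed != decomposed  # different code point sequences

# NFC composes, NFD decomposes; after normalizing, both spellings compare equal.
assert unicodedata.normalize("NFC", decomposed) == precomposed
assert unicodedata.normalize("NFD", precomposed) == decomposed

# These unit symbols have canonical singleton decompositions, so NFC replaces them.
for sym in ("\u212a", "\u2126", "\u212b"):  # KELVIN SIGN, OHM SIGN, ANGSTROM SIGN
    nfc = unicodedata.normalize("NFC", sym)
    print(f"{unicodedata.name(sym)} -> {unicodedata.name(nfc)}")
# KELVIN SIGN -> LATIN CAPITAL LETTER K
# OHM SIGN -> GREEK CAPITAL LETTER OMEGA
# ANGSTROM SIGN -> LATIN CAPITAL LETTER A WITH RING ABOVE

# MICRO SIGN has only a compatibility mapping: NFC keeps it, NFKC does not.
micro = "\u00b5"
print(unicodedata.normalize("NFC", micro) == micro)  # True
print(unicodedata.name(unicodedata.normalize("NFKC", micro)))  # GREEK SMALL LETTER MU
```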
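For reference, here is what Python's built-in `str.lower()`/`str.upper()` currently do to a few of these symbols. The `lower_keep_symbols` helper is a hypothetical sketch of the "leave them as-is" behaviour, not an existing API. It works one character at a time, so it skips context-dependent rules such as Greek final sigma that `str.lower()` applies to whole strings.

```python
import unicodedata

# Unit/math symbols: KELVIN SIGN, OHM SIGN, ANGSTROM SIGN, MICRO SIGN.
SYMBOLS = ("\u212a", "\u2126", "\u212b", "\u00b5")

# Print every built-in case conversion that changes one of these symbols.
for sym in SYMBOLS:
    for label, converted in (("lower", sym.lower()), ("upper", sym.upper())):
        if converted != sym:
            print(f"{unicodedata.name(sym)}.{label}() -> {unicodedata.name(converted)}")
# KELVIN SIGN.lower() -> LATIN SMALL LETTER K
# OHM SIGN.lower() -> GREEK SMALL LETTER OMEGA
# ANGSTROM SIGN.lower() -> LATIN SMALL LETTER A WITH RING ABOVE
# MICRO SIGN.upper() -> GREEK CAPITAL LETTER MU


def lower_keep_symbols(s: str, keep: frozenset = frozenset(SYMBOLS)) -> str:
    """Hypothetical helper: lowercase s but leave the listed symbols untouched.

    Per-character only, so whole-string rules that str.lower() applies
    (e.g. Greek final sigma) are not reproduced here.
    """
    return "".join(c if c in keep else c.lower() for c in s)


print(lower_keep_symbols("300 \u212a, 50 \u2126"))
```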
These lists (and the future library) were made to test normalization and to break software that makes bad assumptions. I initially generated the list because I knew some of the assumptions in the parser I was writing were not solid, and sure enough, the tests broke it.
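As a rough illustration of the kind of assumptions such a list can break, here is a sketch that checks a few common ones against Python's built-in case methods. The function name and the specific checks are my own choice for illustration, not the actual test suite. The sample strings are the sharp s, the capital I with dot above, and the micro sign.

```python
def broken_assumptions(s: str) -> list[str]:
    """Return the naive case-conversion assumptions that fail for s (illustrative)."""
    failures = []
    if len(s.upper()) != len(s):
        failures.append("upper() preserves length")
    if len(s.lower()) != len(s):
        failures.append("lower() preserves length")
    if s.upper().lower() != s.lower():
        failures.append("upper().lower() == lower()")
    if s.lower() != s.casefold():
        failures.append("lower() == casefold()")
    return failures


for sample in ("abc", "\u00df", "\u0130", "\u00b5"):  # abc, ß, İ, µ
    print(repr(sample), broken_assumptions(sample))
```

For example, `"ß".upper()` is `"SS"` (two characters). `"İ".lower()` is `"i"` followed by U+0307 COMBINING DOT ABOVE. `"µ".upper().lower()` ends up as GREEK SMALL LETTER MU rather than the original MICRO SIGN.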
Someone pointed out the canonical source, which I'll have to look at more closely:
https://www.unicode.org/Public/16.0.0/ucd/CaseFolding.txt
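Data lines in that file have the form `<code>; <status>; <mapping>; # <name>`. Status `C` is common, `F` is full, `S` is simple, and `T` is Turkic. C+F gives full case folding and C+S gives simple folding. Below is a minimal parser sketch under that reading of the format. The function names are hypothetical, and the `SAMPLE` text is a short excerpt in the file's format rather than the whole file. Python's `str.casefold()` implements full case folding, so it is used as a cross-check.

```python
SAMPLE = """\
# Comment lines start with '#'
0041; C; 0061; # LATIN CAPITAL LETTER A
00B5; C; 03BC; # MICRO SIGN
00DF; F; 0073 0073; # LATIN SMALL LETTER SHARP S
212A; C; 006B; # KELVIN SIGN
"""


def parse_case_folding(lines, statuses=("C", "F")):
    """Parse CaseFolding.txt-style lines into {code point: folded string}.

    statuses=("C", "F") selects full folding; ("C", "S") selects simple folding.
    """
    table = {}
    for line in lines:
        data = line.split("#", 1)[0].strip()  # drop trailing name comment
        if not data:
            continue  # blank or comment-only line
        code, status, mapping = (field.strip() for field in data.split(";")[:3])
        if status in statuses:
            table[int(code, 16)] = "".join(chr(int(cp, 16)) for cp in mapping.split())
    return table


def fold(s, table):
    """Apply a parsed folding table character by character."""
    return "".join(table.get(ord(c), c) for c in s)


full = parse_case_folding(SAMPLE.splitlines())
print(fold("A\u00df\u212a", full), "A\u00df\u212a".casefold())
```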