Comment by Rendello
These lists (and the future library) were made to test normalization and to break software that makes bad assumptions. I initially generated the list because I knew that some of the assumptions made by the parser I was writing were not solid, and sure enough, the tests broke it.
Someone pointed out the canonical source, which I'll have to look at more closely: