Comment by Asparagirl 3 months ago
Thanks. The original data set, as provided by the VA, has all sorts of data errors and oddities in it. The major ones involving surnames include the inconsistent use of apostrophes in names like O’BRIEN, often written as O BRIEN, and/or vice versa, and the inconsistent formatting of MC and MAC names like MCMAHON as MC MAHON, and/or vice versa. There are also some names where the VA includes an errant dash that is not meant to be a hyphen, plus other mistakes.
So we try our best to help a user find the veteran even with the dirty data we have. For example, there is code here (using a common NPM package) to convert any accented letters a user types to the non-accented version of the same letter. For compound surnames, we also split the surname on a space or a hyphen and search both parts, but we skip any part that is three letters or fewer. It’s imperfect, but we have to work with the data we’ve got, and we can’t and shouldn’t normalize or clean the underlying file.
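For anyone curious, the search-side handling described above could look roughly like the sketch below. This is not the site's actual code: the function names `stripAccents` and `surnameSearchTerms` are made up for illustration, the built-in `String.prototype.normalize` stands in for the NPM package, and upper-casing the input and keeping the full typed surname as the first search term are both assumptions.

```javascript
// Hypothetical helpers, for illustration only (not the site's real code).

// Remove accent marks: NFD splits "é" into "e" plus a combining mark,
// and the regex then drops the combining marks (U+0300 to U+036F).
function stripAccents(input) {
  return input.normalize("NFD").replace(/[\u0300-\u036f]/g, "");
}

// Build the list of surname strings to search for.
function surnameSearchTerms(typed) {
  // Assumption: the data file stores surnames in upper case.
  const full = stripAccents(typed).trim().toUpperCase();
  const terms = [full]; // Assumption: the full surname is searched too.

  // Split compound surnames on spaces or hyphens.
  const parts = full.split(/[\s-]+/).filter((part) => part.length > 0);
  if (parts.length > 1) {
    for (const part of parts) {
      // Skip parts of three letters or fewer (VAN, MC, DE, ...).
      if (part.length > 3 && terms.indexOf(part) === -1) {
        terms.push(part);
      }
    }
  }
  return terms;
}
```

With this sketch, `surnameSearchTerms("García-López")` gives `["GARCIA-LOPEZ", "GARCIA", "LOPEZ"]`, while `surnameSearchTerms("Van Buren")` gives `["VAN BUREN", "BUREN"]` because VAN is too short to search on its own. Only the query is changed; the underlying file is left alone.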
>names like MCMAHON as MC MAHON, and/or vice versa.
My mom has a Van name, and it's hell trying to use government and insurance websites. They'll take the space out or add one in, regardless of what you used when signing up, and then fail to find the account during a lookup for things like password resets or activating the account that they created for her.
It'd seem logical that some sort of fuzzy matching for apostrophes and spaces would be built in, but I've yet to find a government site where that's the case.
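The kind of matching I mean doesn't need to be fancy. A minimal sketch, with hypothetical function names (`lookupKey`, `sameAccountName`) and assuming it's acceptable to compare names on a derived key while storing the name exactly as the user typed it:

```javascript
// Hypothetical sketch: compare names on a key that ignores case,
// whitespace, and straight or curly apostrophes.
function lookupKey(name) {
  return name.toUpperCase().replace(/[\s'\u2018\u2019]/g, "");
}

// Two spellings refer to the same account name if their keys match.
function sameAccountName(a, b) {
  return lookupKey(a) === lookupKey(b);
}
```

Here `sameAccountName("Van Buren", "VANBUREN")` and `sameAccountName("O'Brien", "O BRIEN")` are both true, so a password reset lookup would still find the account whichever way the site mangled the space.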