Comment by crote
Just because you aren't aware of it, doesn't mean it isn't there. There are plenty of less on-the-nose examples a model can accidentally train itself on.
Into horse riding? Probably a woman. Into motorcycles? Probably a man. Into musical theater? Probably a woman. Into football? Probably a man. Worked part-time for a few years in your 30s? Probably a woman. I could go on and on for hours, as there are relatively few hobbies and interests that have a truly gender-neutral audience.
The problem isn't the obvious biases. Everyone can see those, and filtering them out is trivial. It's the subtle proxy variables that are risky, as you have to be very careful to avoid accidentally training on them.
Ideally CVs should indeed be stripped of such data, but how do you propose we verify that we have stripped them of all potential proxies? And what's going to be left to train on after stripping?
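To make the verification question concrete, here is a rough sketch of one possible check, not a real audit: for each remaining column, see whether it predicts the protected attribute better than always guessing the majority. The function name `proxy_strength`, the column names, and the toy rows are all made up for illustration.

```python
from collections import Counter, defaultdict


def proxy_strength(rows, feature, protected):
    """Return (baseline, accuracy) for guessing `protected` from `feature` alone.

    baseline: accuracy of always guessing the most common protected value.
    accuracy: accuracy of guessing the most common protected value
              within each distinct value of `feature` (measured in-sample).
    """
    overall = Counter(row[protected] for row in rows)
    baseline = overall.most_common(1)[0][1] / len(rows)

    # Count protected values separately for each feature value.
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[feature]][row[protected]] += 1

    # Best possible guess per feature value is its majority class.
    correct = sum(c.most_common(1)[0][1] for c in by_value.values())
    return baseline, correct / len(rows)


# Hypothetical toy data, invented purely for this example.
rows = [
    {"hobby": "horse riding", "degree": "BSc", "gender": "F"},
    {"hobby": "horse riding", "degree": "MSc", "gender": "F"},
    {"hobby": "musical theater", "degree": "BSc", "gender": "F"},
    {"hobby": "football", "degree": "MSc", "gender": "F"},
    {"hobby": "football", "degree": "BSc", "gender": "M"},
    {"hobby": "football", "degree": "MSc", "gender": "M"},
    {"hobby": "motorcycles", "degree": "BSc", "gender": "M"},
    {"hobby": "horse riding", "degree": "MSc", "gender": "M"},
]

for column in ("hobby", "degree"):
    print(column, proxy_strength(rows, column, "gender"))
```

On this toy data `hobby` guesses gender correctly 75% of the time against a 50% baseline, while `degree` does no better than the baseline. The check is in-sample and looks at one column at a time, so it says nothing about combinations of columns that only act as a proxy together, and that is the hard part.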
I feel like what will be left is only the things that are good indicators of professional competence. Of all the fluff in a CV, a hobbies section is probably #1, not to mention a likely red flag for any CV reviewer.