crote 4 days ago

Let's say you are making a model to judge job applicants. You are aware that the training data is biased in favor of men, so you remove all explicit mentions of gender from their CVs and cover letters.

Upon evaluation, your model seems to accept everyone who mentions a "fraternity" and reject anyone who mentions a "sorority". Swapping out the words turns a strong reject into a strong accept, and vice versa.

But you removed any explicit mention of gender, so surely your model couldn't possibly be showing an anti-women bias, right?
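
A minimal version of that swap check, assuming some hypothetical black-box score_cv function that returns the model's accept score for a CV:

  def swap_check(cv_text, score_cv, pairs=(("fraternity", "sorority"),)):
      # Score the CV as-is, then with each gendered pair swapped,
      # and report how far the model's output moves.
      base = score_cv(cv_text)
      for a, b in pairs:
          swapped = cv_text.replace(a, "\0").replace(b, a).replace("\0", b)
          print(f"{a} <-> {b}: {base:.2f} -> {score_cv(swapped):.2f}")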

alternatex 4 days ago

Nothing in any CV I've written over the past decade has implied my gender, other than my name.

Who are these people putting gender-implicating data into a career-history doc? And if such CVs exist, they should be stripped of that data before processing.

The fraternity example is such a specific, one-in-a-thousand case.

  • crote 4 days ago

    Just because you aren't aware of it doesn't mean it isn't there. There are plenty of less on-the-nose signals a model can accidentally train itself on.

    Into horseriding? Probably a woman. Into motorcycles? Probably a man. Into musical theater? Probably a woman. Into football? Probably a man. Worked part-time for a few years in your 30s? Probably a woman. I could go on and on for hours, as there are relatively few hobbies and interests which have a truly gender-neutral audience.

    The problem isn't obvious bias; everyone can spot that, and filtering it out is trivial. It's the subtle proxy variables that are risky, as you have to be very careful to avoid accidentally training on them.

    Ideally CVs should indeed be stripped of such data, but how do you propose we verify that every potential proxy has been stripped? And what's going to be left to train on afterwards?
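
    The closest thing to a verification I can think of is a leakage probe: train a simple classifier to predict gender from the supposedly stripped text. A sketch with scikit-learn, assuming you still hold gender labels for an audit set; anything meaningfully above chance means proxies survived:

      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.linear_model import LogisticRegression
      from sklearn.model_selection import cross_val_score

      def leakage_score(stripped_cvs, genders):
          # If a simple probe predicts gender from the "stripped" CVs
          # better than chance, proxy features are still present.
          X = TfidfVectorizer(min_df=2).fit_transform(stripped_cvs)
          return cross_val_score(LogisticRegression(max_iter=1000),
                                 X, genders, cv=5).mean()

    Even then, a probe failing only shows that one probe failed, not that the proxies are gone.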

    • alternatex 2 days ago

      I feel like the only things left would be good indicators of professional competence. Of all the fluff in a CV, a hobbies section is probably #1, not to mention probably a red flag for any CV reviewer.

  • triceratops 4 days ago

    > Nothing in any CV I've written over the past decade has implied my gender, other than my name

    So you're not implying gender other than by implying gender? If humans can use names to classify people into genders, a model can do the same thing.
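
    It takes very little, too. A sketch, assuming a name-frequency table (name_counts here is hypothetical) built from something like the public SSA baby-name data:

      def guess_gender(first_name, name_counts):
          # name_counts: {"mary": {"F": 99000, "M": 400}, ...}
          counts = name_counts.get(first_name.lower())
          if not counts:
              return "unknown"
          # Return whichever gender dominates for this name.
          return max(counts, key=counts.get)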

    • alternatex 2 days ago

      It's information that's easy to strip before the CV is run through machine learning.

      The implication in the parent comment is that CVs are inherently bound up with gender, and I can't see that being the case for most.
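
      For the name itself, stripping really is trivial; a sketch, assuming it arrives as a structured field rather than buried in free text:

        def strip_identity_fields(cv):
            # Drop the obvious structured identity fields before
            # the text ever reaches the model.
            redacted = dict(cv)
            for field in ("name", "email", "phone", "photo"):
                redacted.pop(field, None)
            return redacted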