Comment by tbrownaw 5 days ago

> But the model designers were aware that features could be correlated with demographic groups in a way that would make them proxies.

There's a huge problem with people trying to use umbrella usage to predict flooding. Some people are trying to develop a computer model that uses rainfall instead, but watchdog groups have raised concerns that rainfall may be used as a proxy for umbrella usage.

(It seems rather strange to expect a statistical model trained for accuracy to infer a shadow variable and route its predictions through it, making itself less accurate, simply because that variable is easy for humans to observe directly and use as a lossy shortcut, or to use it to promote other goals that aren't part of the labels it's being trained on.)

> These are two sets of unavoidable tradeoffs: focusing on one fairness definition can lead to worse outcomes on others. Similarly, focusing on one group can lead to worse performance for other groups. In evaluating its model, the city made a choice to focus on false positives and on reducing ethnicity/nationality based disparities. Precisely because the reweighting procedure made some gains in this direction, the model did worse on other dimensions.

Nice to see an investigation that's serious enough to acknowledge this.

tripletao 5 days ago

They correctly note the existence of a tradeoff, but I don't find their statement of it very clear. Ideally, a model would be fair in the senses that:

1. In aggregate over any nationality, people face the same probability of a false positive.

2. Two people who are identical except for their nationality face the same probability of a false positive.

In general, it's impossible to achieve both properties. If the output and at least one other input correlate with nationality, then a model that ignores nationality fails (1). We can add back nationality and reweight to fix that, but then it fails (2).

This tradeoff is most frequently discussed in the context of statistical models, since those make it explicit. It applies to any decision process though, including human decisions.
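
A toy numeric sketch of why you can't have both (the data, feature name, and thresholds below are all made up for illustration, nothing from the article): one screening feature that correlates with both fraud and nationality forces a choice between equal group-level false positive rates and identical treatment of otherwise-identical people.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical setup: nationality g in {0, 1}; a "legitimate" risk feature x
# (say, an income-irregularity score) is shifted upward for group 1; actual
# fraud depends only on x.
g = rng.integers(0, 2, n)
x = rng.normal(loc=0.6 * g, scale=1.0, size=n)
fraud = rng.random(n) < 1.0 / (1.0 + np.exp(-(x - 2.0)))

def group_fpr(flagged, group):
    """Share of non-fraudulent people in `group` who get flagged."""
    mask = (g == group) & ~fraud
    return flagged[mask].mean()

# Model A: nationality-blind, one threshold on x for everyone.
flag_a = x > 1.0
print("blind model:  FPR group0=%.3f  group1=%.3f"
      % (group_fpr(flag_a, 0), group_fpr(flag_a, 1)))
# Group 1's FPR comes out higher, so property (1) fails.

# Model B: per-group thresholds chosen so each group's FPR is 5%.
thr = {grp: np.quantile(x[(g == grp) & ~fraud], 0.95) for grp in (0, 1)}
flag_b = x > np.where(g == 1, thr[1], thr[0])
print("aware model:  FPR group0=%.3f  group1=%.3f"
      % (group_fpr(flag_b, 0), group_fpr(flag_b, 1)))
# FPRs now match, but two people with identical x and different nationality
# face different thresholds, so property (2) fails.
```

Equalizing (1) needs the group label back in the pipeline somewhere, at which point (2) goes; that's the same tension the article's reweighting runs into.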

  • londons_explore 5 days ago

    > Two people who are identical except for their nationality face the same probability of a false positive

    It would be immoral to disadvantage one nationality over another. But we also cannot disadvantage one age group over another. Or one gender over another. Or one hair colour over another. Or one brand of car over another.

    So if we update this statement:

    > Two people who are identical except for any set of properties face the same probability of a false positive.

    With that new constraint, I don't believe it is possible to construct a model which outperforms a data-less coin flip.

    • drdaeman 4 days ago

      I think that's too big a jump, treating all properties as equivalent, as if the only way to make the system fair were to make it entirely blind to the applicant.

      We tend to distinguish between ascribed and achieved characteristics. It is considered unethical to discriminate based on things a person has no control over, such as their nationality, gender, age or natural hair color.

      However, things like a car brand are entirely dependent on one's own actions, and if there's a meaningful, statistically significant correlation between owning a Maserati and fraudulently applying for welfare, I'm not entirely sure it would be unethical to consider such a factor.

      And it also depends on what a false positive means for the person in question. Fairness (like most things social) is not binary, and while outright rejections can be very unfair, additional scrutiny can be less so, even though still not fair (it causes delays and extra stress). If things are working normally, I believe there's a sort of unspoken social agreement (ever-changing, of course, as times and circumstances evolve) on where the balance sits between fairness and the amount of abuse that can be afforded.

      • luckylion 4 days ago

        > It is considered to be unethical to discriminate upon things a person has no control over, such as their nationality, gender, age or natural hair color.

        Nationality and natural hair color I understand, but age and gender? A lot of behaviors are not evenly distributed. Riots after a football match? You're unlikely to find many elderly women (or men, but especially women) involved. Someone is fattening a child? That elderly woman you excluded for the riots suddenly becomes a prime suspect.

        > things like a car brand are entirely dependent on one's own actions

        If you assume perfect free will, sure. But do you?

      • belorn 4 days ago

        Could we look at what kinds of achieved characteristics exist that do not act as a proxy for an ascribed characteristic? I have a really hard time finding them. Culture and values are highly intertwined with behavior, and the bigger the impact a behavior has on a person's life, the stronger a proxy it seems to become.

        To take a few examples: employment characteristics have a strong relationship with gender, generally creating more false positives for women. Similarly, academic success will create more false positives for men. Where a person chooses to live is a strong proxy for socioeconomic factors, which in turn have gender as a major component.

        Welfare fraud itself also differs between men and women. The sums tend to be higher for men, while women make up the majority of welfare recipients. Women and men also tend to receive welfare at different times in their lives. It's even possible that car brand correlates with gender, which would then make it a proxy.

        In terms of defining fairness, I do find it interesting that the Analogue Process gave men an advantage, while both the initial and the reweighted model are the opposite and give women an even bigger advantage. The change in bias against men created by using the detection algorithms is actually about the same size as the change in bias against non-Dutch nationals between the initial model and the reweighted one.

    • Borealid 4 days ago

      I think the ethical desire is not to remove bias across all properties. Properties that result from an individual's conscious choices are allowed to be used as factors.

      One can't change one's race, but changing marital status is possible.

      Where it gets tricky is things like physical fitness or social groups...

  • like_any_other 4 days ago

    > Ideally, a model would be fair in the senses that: 1. In aggregate over any nationality, people face the same probability of a false positive.

    Why? We've been told time and time again that 'nations' don't really exist, they're just recent meaningless social constructs [1]. And 'races' exist even less [2]. So why is it any worse if a model is biased on nation or race, than on left-handedness or musical taste or what brand of car one drives? They're all equally meaningless, aren't they?

    [1] https://www.reddit.com/r/AskHistorians/comments/18ubjpv/the_...

    [2] https://www.scientificamerican.com/article/race-is-a-social-...

    • tripletao 4 days ago

      I'm making a mathematical statement, not a moral one. I chose "nationality" as my input because the linked article focused on that, but the statement applies equally to any other input.

      As already noted, any classifier better than a coin flip will disfavor some groups. The choice of which groups are acceptable to disfavor is political and somewhat arbitrary here. For example, these authors accept disfavoring people based on poverty ("sum of assets") or romantic relationship status ("single or partnered?"), but don't accept parenthood or nationality.

  • kurthr 5 days ago

    This is a really key result. You can't effectively be "blind" to a parameter that is significantly correlated with multiple inputs and with your output prediction. By using those inputs to minimize false positives you are not statistically blind to that parameter, and you can't correct the group statistics while staying blind.

    My suspicion is that in many situations you could build a detector/estimator which was fairly close to being blind without a significant total increase in false positives, but how much is too much?

    I'm actually more concerned that where I live even accuracy has ceased to be the point.
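
    A toy sketch of that first point (made-up numbers, nothing from the article): dropping the nationality column doesn't make a model blind to nationality when other inputs correlate with it, because the column can be partly reconstructed from them.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Hypothetical protected attribute plus two "neutral" inputs that each
# correlate with it (the coefficients are arbitrary).
nationality = rng.integers(0, 2, n)
neighborhood = rng.normal(0.8 * nationality, 1.0, n)
household_size = rng.normal(0.5 * nationality, 1.0, n)

# Crude reconstruction: standardize the proxies, sum them, threshold at zero.
z = lambda v: (v - v.mean()) / v.std()
guess = (z(neighborhood) + z(household_size)) > 0
print("nationality guessed from 'neutral' inputs only: "
      f"{(guess == nationality).mean():.1%} accuracy (chance would be 50%)")
```

    Equalizing the group statistics then requires using that information, explicitly or implicitly, somewhere in the pipeline, which is the opposite of blindness.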

  • tanewishly 4 days ago

    > 2. Two people who are identical except for their nationality face the same probability of a false positive.

    That seems to fall afoul of the Base Rate Fallacy. E.g., consider two groups of 10,000 people and a test for A vs. B. The first group has 9,999 A and 1 B, the second has 1 A and 9,999 B. Unless you make your test blatantly ineffective, you're going to have different false positive rates across the groups -- irrespective of the test's performance.
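
    Putting rough numbers on that (taking B as the condition the test flags, and assuming 95% sensitivity and a 5% per-person false-positive rate just for illustration):

```python
# Two groups of 10,000; B is the condition being screened for.
# Assumed test: flags 95% of true B, wrongly flags 5% of true A.
for name, n_a, n_b in [("group 1", 9_999, 1), ("group 2", 1, 9_999)]:
    false_pos = 0.05 * n_a
    true_pos = 0.95 * n_b
    print(f"{name}: P(random member gets a false positive) = "
          f"{false_pos / (n_a + n_b):.6f}, "
          f"share of flags that are wrong = {false_pos / (false_pos + true_pos):.4f}")
# group 1: ~5% of members get a false positive, and ~99.8% of flags are wrong
# group 2: ~0.0005% of members get a false positive, and ~0.0005% of flags are wrong
```

    So even with a fixed, base-rate-independent test, the chance that a random group member ends up with a false positive (roughly what definition (1) above measures) diverges between the groups.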

    • tripletao 4 days ago

      The linked article already notes that model accuracy degraded after the reweighting, ultimately contributing to the project's abandonment. (For completeness, they could also have considered nationality in the opposite direction, improving accuracy relative to the nominally blind baseline at the cost of yet more disparate false positives; but that's so politically unacceptable that it's not even mentioned.)

      My point is that even if we're willing to trade accuracy for "fairness", it's not possible for any classifier to satisfy both those definitions of fairness. By returning to human judgment they've obfuscated that problem but not solved it.

      • tanewishly 19 hours ago

        My point was that there is no test (or classifier) that can guarantee even that one definition of fairness by itself, irrespective of the base rate. If the classifier acts the same regardless of the base rate, there are always base rates (i.e., occurrence rates in the tested population) for which the classifier will fail the given definition.

        That illustrates that the given definition cannot hold universally, no matter what classifier you dream up -- unless your classifier depends on the base rate, that is, a classifier that gets more lenient when there's more fraud in the group. That seems undesirable if fairness is the goal.