Comment by tripletao 5 days ago
They correctly note the existence of a tradeoff, but I don't find their statement of it very clear. Ideally, a model would be fair in both of the following senses:
1. In aggregate over any nationality, people face the same probability of a false positive.
2. Two people who are identical except for their nationality face the same probability of a false positive.
In general, it's impossible to achieve both properties at once. If the outcome and at least one other input both correlate with nationality, then a model that ignores nationality fails (1). We can add nationality back and reweight per group to fix that, but the model then fails (2).
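Here's a minimal simulation sketching the tradeoff (my own toy setup and numbers, nothing from the article): nationality shifts a legitimate feature, so a nationality-blind threshold yields unequal false positive rates, while per-group thresholds equalize those rates at the cost of treating otherwise-identical people differently.

```python
import numpy as np

# Synthetic population with made-up parameters. Nationality g shifts a
# legitimate feature x, and the true outcome y depends on x, so x, y,
# and g all correlate.
rng = np.random.default_rng(0)
n = 200_000
g = rng.integers(0, 2, n)                             # nationality: 0 or 1
x = rng.normal(loc=g.astype(float), scale=1.0)        # feature shifted by g
y = rng.random(n) < 1.0 / (1.0 + np.exp(-(x - 0.5)))  # true outcome

def fpr(flag, group):
    """False positive rate within one nationality group."""
    negatives = ~y & (g == group)
    return (flag & negatives).sum() / negatives.sum()

# Model A ignores nationality: one threshold on x for everyone.
# It satisfies (2) by construction but fails (1), because the two
# groups' x-distributions differ.
flag_a = x > 1.0
print("A, FPR by group:", fpr(flag_a, 0), fpr(flag_a, 1))

# Model B uses nationality: per-group thresholds chosen so both groups
# see the same false positive rate. It satisfies (1) but fails (2),
# since two people with identical x but different g can be treated
# differently.
target = 0.10
thresholds = []
for group in (0, 1):
    neg_x = np.sort(x[~y & (g == group)])
    thresholds.append(neg_x[int((1.0 - target) * len(neg_x))])
flag_b = x > np.where(g == 0, thresholds[0], thresholds[1])
print("B, FPR by group:", fpr(flag_b, 0), fpr(flag_b, 1))
print("B, thresholds:  ", thresholds)
```

Model A prints clearly unequal per-group FPRs; Model B prints roughly equal FPRs but two different thresholds, which is exactly the failure of (2).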
This tradeoff is most frequently discussed in the context of statistical models, since those make it explicit. But it applies to any decision process, including human judgment.
> Two people who are identical except for their nationality face the same probability of a false positive
It would be immoral to disadvantage one nationality over another. But we also cannot disadvantage one age group over another. Or one gender over another. Or one hair colour over another. Or one brand of car over another.
So if we update this statement:
> Two people who are identical except for any set of properties face the same probability of a false positive.
With that new constraint, I don't believe it is possible to construct a model which outperforms a data-less coin flip.
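For what it's worth, here's why (my formalization, not anything the parent wrote): if the flag probability must be identical for any two people who differ in any set of properties, then it cannot depend on the feature vector at all:

$$\forall x, x':\quad P(\hat{y} = 1 \mid x) = P(\hat{y} = 1 \mid x') \;\;\Longrightarrow\;\; P(\hat{y} = 1 \mid x) \equiv p$$

So the only admissible decision rules are constant-probability flips, i.e. exactly a data-less (possibly biased) coin.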