Comment by tomp
Key point:
The model is considered fair if its performance is equal across these groups.
One can immediately see why this is problematic by considering an equivalent example in a less controversial (i.e. less emotionally charged) setting.
Should basketball performance be equal across racial or sex groups? How about marathon performance?
It’s not unusual for relevant features to be correlated with protected features. In the specific example above, being an immigrant is likely correlated with not knowing the local language, and therefore with being underemployed and hence more likely to apply for benefits.
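A quick simulation makes that correlation chain concrete. This is a minimal sketch, not a claim about real data: the group sizes, probabilities, and variable names (`immigrant`, `speaks_language`, `applies_for_benefits`) are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Protected feature: 1 = immigrant, 0 = not (assumed 20% base rate).
immigrant = rng.binomial(1, 0.2, n)

# "Relevant" feature correlated with the protected one:
# immigrants are assumed less likely to speak the local language.
speaks_language = rng.binomial(1, np.where(immigrant == 1, 0.4, 0.95))

# Outcome driven only by the relevant feature: non-speakers are
# assumed more likely to be underemployed and apply for benefits.
applies_for_benefits = rng.binomial(
    1, np.where(speaks_language == 1, 0.05, 0.30)
)

# Even though the outcome never looks at `immigrant` directly,
# base rates differ sharply across the protected groups.
for g in (0, 1):
    rate = applies_for_benefits[immigrant == g].mean()
    print(f"immigrant={g}: application rate ~ {rate:.3f}")
```

Dropping the protected feature from the data doesn't help here: the correlation flows through the relevant feature, so group base rates still diverge.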
Reply:
I think they're saying something more subtle.
In your basketball analogy, it's more like they have a model that predicts basketball performance, and they're saying that model should predict performance equally well across groups, not that the groups should themselves perform equally well.
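The two notions come apart cleanly in code. Below is a minimal sketch under assumed data: the groups differ in mean performance by construction, yet a model's predictive error (here, MAE of a stand-in prediction) is identical across groups, which is the kind of "equal performance" the reply is pointing at.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

group = rng.binomial(1, 0.5, n)              # protected group label
skill = rng.normal(loc=group * 0.5, size=n)  # groups differ in skill by construction
performance = skill + rng.normal(scale=0.5, size=n)

# Stand-in for a trained model's output: predict performance from skill.
prediction = skill

# Notion 1: equal *outcomes* across groups -- fails here by construction.
for g in (0, 1):
    print(f"group {g}: mean performance ~ {performance[group == g].mean():.2f}")

# Notion 2: equal *predictive accuracy* across groups -- holds here,
# because the error distribution is the same in both groups.
for g in (0, 1):
    mae = np.abs(prediction - performance)[group == g].mean()
    print(f"group {g}: MAE ~ {mae:.2f}")
```

Running this prints different mean performance per group but essentially identical MAE, so a fairness criterion stated over model error says nothing about whether the groups themselves perform equally.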