One conventional definition of group fairness is that the ML algorithms produce predictions where the FPR (or FNR or both) is the same across groups. Fixating on equating FPR etc. can harm the very groups we are trying to help. So it may be useful to rethink how to solve the problem of reducing unfairness.
One big reason why the FPR may vary across groups is that, given the data, some groups’ outcomes are less predictable than others. This may be because of the limitations of the data itself or because of the limitations of algorithms. For instance, Kearns and Roth in their book bring up the example of college admissions. The training data for college admissions is the decisions made by college counselors. College counselors may well be worse at predicting the success of minority students because they are less familiar with their schools, groups, etc., and this, in turn, may lead to algorithms performing worse on minority students. (Assume the algorithm to be human decision-makers and the point becomes immediately clear.)
One way to address worse performance may be to estimate the uncertainty of the prediction. This allows us to deal with people with wider confidence bounds separately from people with narrower confidence bounds. The optimal strategy for people with wider confidence bounds people may be to collect additional data to become more confident in those predictions. For instance, Komiyama and Noda propose something similar (pdf) to help overcome a lack of information during hiring. Or we may need to figure out a way to compensate people based on their uncertainty interval.
The average width of the uncertainty interval across groups may also serve as a reasonable way to diagnose this particular problem.