Limits of Harms From Affirmative Action

17 Nov

Stories abound about unqualified people getting admitted to highly selective places because of quotas. But the chances are that these are merely stories with no basis in fact. If an institution is highly selective and the number of applicants is sufficiently large, quotas are unlikely to lead to people with dramatically lower abilities being admitted, even when there are dramatic differences across groups. Relatedly, quotas are unlikely to have much of an impact on the average ability of the admitted cohort. If the point isn't obvious enough, it should be after the following simulation. Say the mean IQ of the two groups differs by 1 s.d. (roughly the difference between Black and White IQ in the US). Say the admitting institution takes only 1000 people. In the no-quota regime, the top 1000 people get admitted. In the quota regime, 20% of the seats are reserved for the second group. With this setup, we can compare the ability of the last admittee across the two conditions, as well as the mean ability of the admitted cohort.

# Set seed for reproducibility
set.seed(123)

# Simulate two standard normal distributions
group1 <- rnorm(1000000, mean = 0, sd = 1)  # Group 1
group2 <- rnorm(1000000, mean = -1, sd = 1)  # Group 2, mean 1 sd lower than Group 1

# Combine into a dataframe with a column identifying the groups
data <- data.frame(
  value = c(group1, group2),
  group = rep(c("Group 1", "Group 2"), each = 1000000)
)

# Pick top 800 values from Group 1 and top 200 values from Group 2
top_800_group1 <- head(sort(data$value[data$group == "Group 1"], decreasing = TRUE), 800)
top_200_group2 <- head(sort(data$value[data$group == "Group 2"], decreasing = TRUE), 200)

# Combine the selected values into the quota-regime cohort
combined_top_1000 <- c(top_800_group1, top_200_group2)

# IQ (in s.d. units) of the last few admittees: no-quota regime first, then quota regime
round(tail(head(sort(data$value, decreasing = TRUE), 1000)), 2)
[1] 3.11 3.11 3.10 3.10 3.10 3.10

round(tail(combined_top_1000), 2)
[1] 2.57 2.57 2.57 2.57 2.56 2.56

# Means
round(mean(head(sort(data$value, decreasing = TRUE), 1000)), 2)
[1] 3.37

round(mean(combined_top_1000), 2)
[1] 3.31

# How many people in top 1000 from Group 2 in no-quota?
sorted_data <- data[order(data$value, decreasing = TRUE), ]
top_1000 <- head(sorted_data, 1000)
sum(top_1000$group == "Group 2")
[1] 22

Under no quota, the least able person admitted is 3.1 s.d. above the mean; under the quota, the least able person admitted is 2.56 s.d. above the mean. The mean ability of the admitted cohort is virtually indistinguishable: 3.37 and 3.31 for the no-quota and quota conditions, respectively. Not to put too fine a point on it: the claim that quotas lead to gross misallocation of limited resources is likely grossly wrong. This isn't to say there isn't a rub. With a 1 s.d. difference, representation in the tails is grossly skewed. Without a quota, there would be just 22 people from Group 2 in the top 1000, so 178 people from Group 1 get bumped. This point about fairness is perhaps best thought of in the context of how much harm comes to those denied admission. Assuming enough supply across the range of selectivity (approximately true for higher education in the U.S., with colleges at various levels of selectivity), those denied admission at more exclusive institutions likely get admitted at slightly lower-ranked institutions and do nearly as well as they would have had they been admitted to the more exclusive ones. (See Dale and Krueger, etc.)
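
For a check that does not depend on a particular simulation run, the same cutoffs can be approximated analytically from normal quantiles. Here is a minimal sketch, assuming the same setup as above (two groups of one million applicants each, Group 2's mean 1 s.d. lower, 1000 seats):

# Quota regime: cutoffs for the top 800 of 1,000,000 N(0, 1) draws
# and the top 200 of 1,000,000 N(-1, 1) draws
qnorm(1 - 800 / 1e6)        # Group 1 cutoff, roughly 3.16
qnorm(1 - 200 / 1e6) - 1    # Group 2 cutoff, roughly 2.54

# No-quota regime: find the cutoff c at which the expected number of
# applicants above c, summed across both groups, equals 1000
f <- function(c) {
  1e6 * (pnorm(c, lower.tail = FALSE) +
         pnorm(c, mean = -1, lower.tail = FALSE)) - 1000
}
cutoff <- uniroot(f, c(2, 4))$root
cutoff                                               # roughly 3.10

# Expected number of Group 2 admittees under no quota
1e6 * pnorm(cutoff, mean = -1, lower.tail = FALSE)   # roughly 21

The analytic values line up with the simulated ones; the small difference in the Group 2 count (roughly 21 expected versus 22 simulated) is sampling noise.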

p.s. In countries like India, 25 years ago, supply at the top was fairly limited, and there were large discontinuous jumps in quality across institutions. Post liberalization of the education sector, this is likely no longer true.

p.p.s. What explains the large racial gap in SAT scores of admittees to Harvard? It likely stems from Harvard weighing factors such as athletic performance in its admission decisions.

Building Code: Making Government Code Publicly Available

16 May

Very little of the code that the government pays for is open-sourced. One reason is that private companies would rather the code remain under wraps so that errors never come to light, the price of producing the software is never debated, and they can continue to charge for similar work elsewhere.

Open-sourcing the code is likely to produce the following benefits:

  1. It will help us discover bugs.
  2. It will reduce the cost of building similar software. In a federal system, many local agencies produce (or buy) similar software to help administer similar services. Having the code open-sourced is likely to reduce the barrier to entry for firms bidding to build such software and will likely lead to lower costs over time.
  3. Freely available software under a generous license, e.g., queue management software, optimal staffing software, etc., benefits the economy as firms do not have to invest as much in building such systems.
  4. It will likely increase trust in the government. For instance, where software is used to estimate benefits, the auditability of the software is likely to lead to a modest increase in confidence in the correctness of how the law has been translated into code.

There are at least three ways to open source government code. First, firms like OpenGov that produce open-source software for the government are already helping bring some of the code online. But given that the space for government software is large, it will likely take many decades for a tangible proportion of the software to be open-sourced. Second, we can lobby the government to change the law so that companies (and agencies) are mandated to open source certain software they build for the government. But the prognosis is bleak, given that government contractors are likely lobbying hard against it. The third option is to use FOIA to request code and make it available on GitHub. I sense that this is a tenable option.