Building Together, Separately: Challenges of Software Development

23 Nov

With Atul Dhingra

Microservices, Macroproblems

A single page on DoorDash can make upward of 1,000 gRPC calls (see the interview). For many engineers, upward of a thousand network calls nicely illustrates the chaos and inefficiency unleashed by microservices. Engineers implicitly compare 1,000+ gRPC calls with the orders-of-magnitude-fewer calls that a system designed afresh by an architect today would make. A thousand-plus gRPC calls also seem like a perfect recipe for blowing up latency.

Suboptimality on these dimensions, however, may be optimal. First, cheap compute, bandwidth, and caching have made it economically feasible to address latency caused by superfluous network calls. And second, today, the main cost of software development is often developers, and microservices may reduce that cost (though we come back to this point).

The Appeal of Microservices

When lots of people work together on the ‘same’ thing, they can step on each other’s toes. For instance, if multiple developers work on the same file (or feature) at the same time, ‘conflicts’ can ensue, and ‘conflict resolution’ is painful.

The natural solutions to the problem are sequencing and compartmentalizing. Sequencing seems like an answer by a smart aleck who notes that we didn’t stipulate that lots of people should work together at the same time. Gotcha!

Compartmentalizing aims higher. Couldn’t specific teams work on specific pieces without getting in each other’s way? Such complete, Lego-like modularization has long been a dream of computer scientists. Many systems also seem perfectly modularizable. Say you are running a small publishing platform that sells ads for monetization. You could separate the system that produces the ML predictions driving ad personalization from the system (and codebase) that powers the UI. In fact, you can do a lot more. You can abstract away even the foundational layers of ad prediction: there could be a team devoted to content understanding and another to context understanding. (Before we move further, let’s take a moment to marvel at the fact that we can even conceive of small teams owning services. Thanks to ever-greater abstraction in software and access to cloud services, many deployments today are standardized, e.g., Lambda, Docker, and Kubernetes. We can even talk of full-stack engineers.)

But push further and the architecture shows strains. Each team now owns a service and is in charge of deploying, scaling, and maintaining it. As soon as a team has to manage a complex service, the limitations become clear. The move to microservices leaves many (generally small) teams with little expertise in the less standardized tasks and in triaging issues that span the stack. Naturally, small teams also don’t benefit as much from having their code reviewed by senior engineers in other parts of the organization. And multiple teams may end up solving the same problem.

Limits of Compartmentalization

At the extremum of microservices, each service is a black box. Services over the Internet are a good metaphor: outside of a contract (often instantiated just as an agreement over the API, the frequency of calls, and some uptime SLA), you don’t need to know anything more to use an Internet service. The appeal of this system is immediate. It provides a clear delineation of responsibilities. To make the separation complete, advocates of this vision also want no common infrastructure, e.g., no shared database, across services. The principle is fine in theory but impossible (or at least highly inefficient) in practice. Most companies benefit from shared services, including common databases. (The rise of cloud infrastructure with cloud-native distributed databases like DynamoDB has made some of the demand for this kind of firewalling moot.) The same goes for which OS and libraries you use. For instance, if the codebase is primarily ML models, some standardization around versions of Python, PyTorch, etc., is commonplace. Some standardization lets you capture the benefits of working together while still preserving the benefits of working separately.

Different Ways to Compartmentalize

Microservices are but one way to compartmentalize. They happen to be the preferred way for a world that prioritizes teams becoming experts in business problems rather than in specific aspects of software development. Microservices are, in effect, a product manager’s vision of compartmentalization. However, the issues with microservices, primarily each team needing to be expert in all portions of the software stack, provide a segue to other ways to compartmentalize. Computer scientists have traditionally compartmentalized over the technical stack, e.g., classically, backend and frontend teams. The CS compartmentalization prioritizes technical depth in certain areas. In practice, we often see a hybrid of the two approaches: many companies have infrastructure and data engineering teams alongside “pod-like” product or feature-focused teams. The rationale behind the popularity of the hybrid system is, again, that software benefits both from computer science expertise and from feature-specific problem understanding. The exact balance will be specific to the company (and where it is in its journey), but rarely is it at one end or the other.

Monorepo

The extremum of compartmentalization means not just no shared hardware but also no shared code. But once again, rarely is that an optimal arrangement. Having a single repository makes it easier to enforce standards, build a dependency graph (which can help with triaging issues around backward compatibility), and reuse code. It is useful to issue some clarifications:

  1. Microservices can (and often do) exist in monorepos.
  2. A monorepo doesn’t imply a single language; standards can be set at the repository and language levels.
  3. A monorepo doesn’t constrain deployment options; we can deploy services in a modular fashion or as part of some common release.

By the same token, having multiple repositories is no bar to enforcing standards (common CI/CD tests), building a dependency graph (achievable with a little organization, as the sketch below suggests), and reusing code (which can be shipped as libraries).
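
To make the dependency-graph point concrete, here is a minimal sketch in R (the language used elsewhere in these posts), assuming each service declares what it calls; the service names and edges below are made up for illustration.

library(igraph)

# Declared dependencies: each row says `from` calls `to`
deps <- data.frame(
  from = c("checkout", "checkout", "search", "ads"),
  to   = c("payments", "inventory", "inventory", "search")
)
g <- graph_from_data_frame(deps, directed = TRUE)

# Which services (transitively) depend on `inventory`? Useful for triaging
# backward compatibility before shipping a breaking change.
as_ids(subcomponent(g, "inventory", mode = "in"))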

Function Calling Vs. Network Calling

Function calling is better than network calling in three ways. First, function calling avoids the latency of network requests. Second, network calling adds network errors to the potential set of errors and hence makes root-causing harder. Third, with function calls it is easier to build a dependency graph, which enables checking for backward compatibility. The gap on this third count, however, is workable. For one, we generally write client libraries for APIs that wrap the network calls in functions. For two, we can explicitly ask callers for identification, e.g., a team ID passed as part of the network call. Third, and most commonly, API deprecation strategies are well developed, including adding deprecation statuses to the return object, which, along with publishing API specs, makes it the responsibility of the downstream customer to respond to breaking changes.
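
To illustrate the first two workarounds, here is a minimal sketch in R using the httr package; the endpoint, the X-Caller-Team header, and the deprecation field in the response are all hypothetical.

library(httr)

# A client library wraps the network call in a function and identifies
# the calling team explicitly.
get_user <- function(user_id, team_id) {
  resp <- GET(
    "https://api.example.com/v1/users",     # hypothetical endpoint
    query = list(id = user_id),
    add_headers("X-Caller-Team" = team_id)  # explicit caller identification
  )
  stop_for_status(resp)
  body <- content(resp, as = "parsed")
  # Surface any deprecation status included in the return object
  if (!is.null(body$deprecation)) {
    warning("Deprecated API; sunset date: ", body$deprecation$sunset_date)
  }
  body
}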

Lastly, some people point to another disadvantage of network calling: it is conventional today for everybody in the gRPC call graph to get a ping when a service goes down. This, however, ought to be addressable by building logging (distributed tracing) that pins the issue on a particular service.

Release Cadence

The smaller the release, the easier it is to triage what went wrong. When you make bulk changes, errors can go unnoticed. To give an example from ML, you could easily make two good feature changes and three bad ones and still have test performance tick in the right direction. Releasing may still be worthwhile, but the counterfactual is that with five good features (which we would get to if we identified the issues with the three bad ones), the performance would have been better still.

Releasing frequently, however, is not always an option. Release cadence is most strongly shaped by how the software is distributed and how many other services depend on it. For cloud-native software like Google Docs, releases can be fast. For mobile applications, you cannot release too frequently because updates are disruptive for the user; even frequent updates to Chrome feel exhausting. Developers of widely used OSes also have to be cognizant of the developers on their platforms: they need to give the developers of important applications enough lead time to test and amend their software. Small changes are good for detecting errors. But small releases mean frequent releases, and releasing too frequently can itself hinder error detection. If you release too frequently, it is not easy to figure out which version to roll back to, as problems don’t always surface in seconds. As a result, organizations often snap to a cadence that is a compromise between velocity and the width of the window needed to reliably surface problems from deployments.

Acknowledgment: This essay benefitted from discussions with Naresh Bhatti and Khurram Nasser.

References

  1. Celozi, Cezarre. 2020. Future-proofing: How DoorDash Transitioned from a Code Monolith to a Microservice Architecture. https://careersatdoordash.com/blog/how-doordash-transitioned-from-a-monolith-to-microservice
  2. NeetCodeIO. 2024. Microservices are Technical Debt. https://www.youtube.com/watch?v=LcJKxPXYudE
  3. Kolny, Marcin. 2023. Scaling up the Prime Video audio/video monitoring service and reducing costs by 90%. https://web.archive.org/web/20240415193548/https://www.primevideotech.com/video-streaming/scaling-up-the-prime-video-audio-video-monitoring-service-and-reducing-costs-by-90
  4. Potvin, Rachel, and Josh Levenberg. 2016. Why Google stores billions of lines of code in a single repository. Communications of the ACM 59(7): 78-87. https://research.google/pubs/why-google-stores-billions-of-lines-of-code-in-a-single-repository/

Unbiased Hiring

10 Sep

When hiring for technical positions, people often use high-precision heuristics, conditional on relevance, to decide whom to interview. Otherwise, the interviewing burden would be too great. The advent of GPT makes the use of high-precision heuristics yet more important, as interviews and resumes are less trustworthy now. The heuristics people often rely on are ~ network, fancy-pants school and company, and international student status (for junior applicants). But relying on such heuristics likely biases hiring against certain groups. There are at least three potential solutions that improve upon the status quo:

  1. Use scores from widely taken exams that are correlated with performance (after confirming the relationship), e.g., IQ, SAT, GRE, and use those to filter. (See the sketch after this list.)
  2. Use testing companies to offer proctored exams in topical areas. Companies like Karat are well placed to offer such testing. And working with companies like Coursera, which have strong incentives to make their certifications meaningful, may be a good idea. There is precedent for the larger idea: advanced software certifications offered by major companies are widely viewed as good signals of competence.
  3. Create a paid show-me-your-work period. This solution is the least attractive, as it is expensive for the company and for most applicants (except fresh graduates or the unemployed).
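
A toy sketch in R of the first option, i.e., confirm the relationship before filtering on it; all the data here is simulated, and the 80th-percentile cutoff is arbitrary.

# Simulated historical data linking test scores to on-the-job performance
set.seed(1)
past <- data.frame(score = rnorm(500))
past$performance <- 0.4 * past$score + rnorm(500)

# Confirm the score-performance relationship before using the score
cor.test(past$score, past$performance)

# Filter applicants using the validated score
applicants <- data.frame(id = 1:100, score = rnorm(100))
shortlist  <- applicants[applicants$score > quantile(applicants$score, 0.8), ]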

p.s. Many job postings are too generically worded and plausibly cause more unqualified people to apply. Making job descriptions more precise, e.g., “proficiency in pandas,” may help.

Less Known Facts About Muslims in India

18 May

There are two main concerns about Indian Muslims. First, many fear that Indian Muslims are greatly behind other groups. The second, deeper concern relates to Muslim women: many fear that sex gaps are much larger among Muslims than among other groups.

Let’s look at the facts.

  1. Education. The literacy rate of rural Muslim Indian women is a shade higher than that of rural Hindu women. Pair that with the fact that, in both urban and rural areas, the sex gap in literacy is greater among Hindus than among Muslims. (Note also how pleasing the numbers are among Sikhs.)

Source: Wikipedia (via NSO).

2. HH Expenditure. The Muslim/Hindu household expenditure ratio across rural India is 110 (see here). After adjusting for household size, it drops to 97. In no state except Delhi (which has a tiny rural population) is the ratio lower than 83, and in 15 out of 22 states (the data are from 2007), the Muslim/Hindu ratio is over 100. There is also significant regional variation: Muslims in rural Kerala have a higher average household expenditure than the mean household expenditure among rural Hindus in any state! The urban Muslim/Hindu expenditure ratios look starker, but the mean is 87. None of this accounts for the fact that Muslims are, on average, younger than Hindus.

3. Infant Mortality. According to Vaclav Smil, “Infant mortality is an excellent proxy for a wide range of conditions including income, quality of housing, nutrition, education, and investment in health care.” Indian Muslims have long enjoyed an advantage over Hindus (see here and here).

4. Share of Population and TFR.

  • The share of Hindus as a percent of the population has declined by nearly 8% over the decades. Compare that to India’s neighbors, especially East Pakistan/Bangladesh, but also Burma (see here).
  • One metric that is correlated with development is TFR. In the last 27 years, the TFR among Hindus has declined from 3.3 to 1.9, while among Muslims it has declined from 4.4 to 2.4 (see here).

Relative Status of Brahmins Across India in 1931

15 Apr

One of the common misunderstandings about caste in India is that the extent of Brahmin privilege is similar across India. One way we can investigate this is by examining literacy rates across castes. In the 1931 census, the Bombay region was a picture of Brahmin dominance, with the literacy rate of Brahmins 1.7x that of the next caste group (Lohana). In Madras, the dominance was less pronounced, with the literacy rate of Brahmins 1.25x that of the next caste group, the Nayars. But move to Punjab and the pattern reverses: Brahmins were no longer the most literate caste; the Khatri literacy rate was 1.7x that of Brahmins. (See also Bengal, where Brahmins were not the most literate caste.)

Punjab and Sind are also interesting for their small populations of Scheduled Castes, with just 4.5% of the population categorized as such compared to the average of about 14% (Appendix II of Ambedkar’s book on Pakistan). (This is thought to be partly a result of conversions to Sikhism and Islam.)

How Many People Does It Take …?

9 Mar

A 700,000-square-meter Hindu temple, the world’s second-largest, was recently inaugurated in New Jersey. The sect that built the temple is known for building many other elaborate temples; see, for instance, the temple in Delhi (TripAdvisor rates it as the second-biggest attraction in Delhi) or the temple in Abu Dhabi. What is more wondrous is that many of these temples have been built in the last two decades.

BAPS Temple in Robbinsville, NJ

BAPS is not the OG of non-state groups building astonishing religious monuments in the modern era. That title rests with the Bahai. The oldest extant Bahai house of worship dates back to 1953. Since then, the Bahai have built many masterpieces, including the Santiago Bahai Temple and the Lotus Temple in Delhi.

Such architectural feats make you think of tens of millions of wealthy followers. The reality is more modest and hence more impressive. “After over 100 years of growth, the organization [behind the Akshardham temples] has … over 1 million followers” (BAPS website). The Bahai have more followers (~ 8 million at the maximum), but they have been relentlessly persecuted over the last century. All of this makes me wonder what we could accomplish if more of us came together.

p.s. The flip side is the harm that relatively small groups of people can impose on the world. For instance, the Taliban forces are no more than a couple of hundred thousand.

p.p.s. It is striking that we don’t have parallel secular achievements that are non-state and non-billionaire funded. I suppose the closest we have is open source software (though a lot of the work on major projects is done within companies).

Take 80 mg Atorvastatin for Myalgia

4 Mar

Here’s a drug pamphlet with a table about side effects:

The same table can be found here: https://www.rxlist.com/lipitor-drug.htm#side_effects

As you can see, for a range of conditions, the rate at which patients experience side effects is greater in the Placebo arm than in the 80 mg arm. (Also note that patients seem to experience fewer side effects in the 80 mg arm compared to the 10 mg arm.)

It turns out it is fake news. (Note the ‘regardless of causality’ phrase dropped carelessly at the end of the title of the table.) (The likely cause is that they mixed data from multiple trials.)

Here’s an excerpt from the TNT trial that compares side effects in the 10 mg arm to the 80 mg arm:

Adverse events related to treatment occurred in 406 patients in the group given 80 mg of atorvastatin, as compared with 289 patients in the group given 10 mg of atorvastatin (8.1 percent vs. 5.8 percent, P<0.001). The respective rates of discontinuation due to treatment-related adverse events were 7.2 percent and 5.3 percent (P<0.001). Treatment-related myalgia was reported by 241 patients in the group given 80 mg of atorvastatin and by 234 patients in the group given 10 mg of atorvastatin (4.8 percent and 4.7 percent, respectively; P=0.72).

From https://www.nejm.org/doi/full/10.1056/NEJMoa050461

p.s. The other compelling thing that may go under the radar is the dramatic variability of symptoms in the Placebo arm that is implied by the data. But to get to nocebo, we would need a control group.

p.p.s. Impact of statin therapy on all-cause mortality:

From ASCOT-LLA

From the TNT trial.

From Living Instinctively to Living With History

4 Mar

Listening to my maternal grandparents narrate their experience of living with Muslims was confusing. According to them, Hindus and Muslims lived harmoniously. They also liked each other. Hindus and Muslims wouldn’t eat at each other’s houses or would use separate utensils, but that had less to do with discrimination and more to do with accommodating each other’s faiths. Even in their recollections of the Partition, I couldn’t detect bitterness. They narrated it as an adventure. But to many Hindus (and Muslims) today, it is hard to think of a time when Hindu-Muslim relations did not have a strong undercurrent of historic grievances and suspicion. Today, many Hindus have a long litany of grievances, of repeated Muslim invasions, destruction of temples, and such.

Naipaul’s India: A Million Mutinies Now may have an answer to the puzzle.* People may go from a time when the “wider world is unknown” because they are “without the means of understanding this world” to a time when they have the means and the politics that comes with that greater capacity, from living instinctively to living with grievances.

“… The British forces the correspondent William Howard Russell had seen at the siege of Lucknow had been made up principally of Scottish Highlanders and Sikhs. Less than 10 years before, the Sikhs had been defeated by the sepoy army of the British. Now, during the Mutiny, the Sikhs – still living as instinctively as other Indians, still fighting the internal wars of India, with almost no idea of the foreign imperial order they were serving – were on the British side.”

From India: A Million Mutinies Now by V. S. Naipaul

Here’s some color on the sepoy army:

“From Russell’s book I learned that the British name for the Indian sepoy, the soldier of the British East India Company who was now the mutineer, was ‘Pandy’. ‘Why Pandy? Well, because it is a very common name among the sepoys …’ It is in fact a brahmin name from this part of India. Brahmins here formed a substantial part of the Hindu population, and the British army in northern India was to some extent a brahmin army.

From India: A Million Mutinies Now by V. S. Naipaul

“people who – Pandy or Sikh, porter or camp-following…Hindu merchant – run with high delight to aid the foreigner to overcome their brethren. That idea of ‘brethren’ – an idea so simple to Russell that the word is used by him with clear irony – is very far from the people to whom he applies it. …The Hindus would have no loyalty except to their clan; they would have no higher idea of human association, no general idea of the responsibility of man to his fellow. And because of that missing large idea of human association, the country works blindly on ….

the India that will come into being at the end of the period of British rule will be better educated, more creative and full of possibility than the India of a century before; that it will have a larger idea of human association, and that out of this larger idea, and out of the encompassing humiliation of British rule, there will come to India the ideas of country.”

From India: A Million Mutinies Now by V. S. Naipaul

Elsewhere:

To awaken to history was to cease to live instinctively. It was to begin to see oneself and one’s group the way the outside world saw one; and it was to know a kind of rage. India was now full of this rage. There had been a general awakening. But everyone awakened first to his own group or community; every group thought itself unique in its awakening; and every group sought to separate its rage from the rage of other groups.

From India: A Million Mutinies Now by V. S. Naipaul

* The theory isn’t original to him. Others have pointed to how many Indians didn’t see themselves as part of a larger polity. The point also applies more broadly, to other groups.

A Benchmark For Benchmarks

30 Dec

Benchmark datasets like MNIST, ImageNet, etc., abound in machine learning. Such datasets stimulate work on a problem by providing an agreed-upon mark to beat. Many of the benchmark datasets, however, are constructed in an ad hoc manner. As a result, it is hard to understand why the best-performing models vary across different benchmark datasets (see here), to compare models, and to confidently prognosticate about performance on a new dataset. To address such issues, in the following paragraphs, we provide a framework for building a good benchmark dataset.

I am looking for feedback. Please let me know how I can improve this.

Inter-Group Prejudice

16 Dec

Prejudice is a bane of humanity. Unjustified aversive beliefs and affect are the primary proximal causes of aversive behavior toward groups. Such beliefs and sentiments cause aversive speech and physical violence. They also serve as justification for denying people rights and opportunities. Prejudice also creates a deadweight loss. For instance, many people refuse to trade with groups they dislike. Prejudice is the reason why so many people lead diminished lives.

So why do so many people have incorrect aversive beliefs about other groups (and commensurately, unjustified positive beliefs about their group)?

If you have suggestions about how to improve the essay, please email me.

Compensation With Currency With No Agreed Upon Value

14 Dec

Equity is an integral part of start-up compensation. However, employees and employers may disagree about the value of equity. Employers, for instance, may value equity higher than potential employees because they have access to better data or simply because they are more optimistic. One consequence of this disagreement is that some salary negotiations may fail. In the particular scenario that I highlight above, one way out of the quandary may be to endow the employee with options commensurate with their lower valuation and include a buy-back clause that kicks in if the employer’s prediction pans out (when the company is valued in the next round or during an exit). Another way to interpret this trade is as exchanging risk for a cap on the upside. Thus, this kind of strategy may also be useful where employees are more risk-averse than employers.
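
A stylized numeric sketch in R of the proposed arrangement; all the numbers (valuations, target pay) are hypothetical.

# Employer and candidate disagree about what the company is worth
employer_valuation <- 50e6
employee_valuation <- 25e6
target_equity_pay  <- 100e3  # equity value per year the candidate wants

# Fraction of the company needed per year under each party's valuation
frac_employer <- target_equity_pay / employer_valuation  # 0.2%
frac_employee <- target_equity_pay / employee_valuation  # 0.4%

# Grant at the employee's (larger) fraction; if the next round confirms
# the employer's valuation, the employer buys back the excess at that price
excess_fraction <- frac_employee - frac_employer
buyback_cost    <- excess_fraction * employer_valuation  # $100K in this example
c(frac_employer = frac_employer,
  frac_employee = frac_employee,
  buyback_cost  = buyback_cost)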

Optimally Suboptimal: Behavioral-Economic Product Features

14 Dec

Booking travel online feels like shopping in an Indian bazaar: a deluge of options, no credible information, aggressive hawkers (“recommendations” and “targeted ads”), and hours of frantic search that ends with purchasing something more out of exhaustion than conviction. Online travel booking is not unique in offering this miserable experience. Buying on Amazon feels like a similar sand trap. But why is that? Poor product management? A more provocative but perhaps more accurate answer is that the product experience, largely unchanged or becoming worse in the case of Amazon, is “optimal.” Many people enjoy the “hunt.” They love spending hours on end looking for a deal, comparing features, and collecting and interpreting wisps of information. To satiate this need, the “optimal” UI for a market may well be what you see on Amazon or travel booking sites. The lack of trustworthy information is a feature, not a bug.

The point applies more broadly. A range of products have features whose only purpose is to cater to behavioral quirks. Remember the spinning wheel on your tax-preparation software as it looks for all the opportunities to save you money? That travesty is in the service of convincing users that the software is ‘working hard.’ Take another example: many cake mixes sold today require you to add an egg. That ruse was invented to give housewives (primarily the women doing the cooking, say, 50 years ago) the feeling that they were cooking. One more: the permanent “sales” at Macy’s and at your local grocery store mean that everyone walks out feeling like a winner. And that means a greater likelihood of you coming back.

p.s. When users don’t trust the website, the utility of recommendations in improving consumer surplus is ~0 among sophisticated users.

Related: https://gojiberries.io/2023/09/09/not-recommended-why-current-content-recommendation-systems-fail-us/

Time Will Tell

23 Nov

Part of empirical social science is about finding fundamental truths about people. It is a difficult enterprise partly because scientists only observe data in a particular context. Neither cross-sectional variation nor data that goes back at best tens of years is often enough to come up with generalizable truths. Longer observation windows help clarify what is an essential truth and what is, at best, a contextual truth.

Support For Racially Justified and Targeted Affirmative Action

Sniderman and Carmines (1999) find that a large majority of Democrats and Republicans oppose racially justified and targeted affirmative action policies. They find that opposition to racially targeted affirmative action is not rooted in prejudice. Instead, they conjecture that it is rooted in adherence to the principle of equality. The authors don’t say it outright, but the reader can surmise that, in their view, opposition to racially justified and targeted affirmative action was likely to remain broad-based. It was a fair hypothesis. Except, 20 years later, a majority of Democrats support racially targeted and racially justified affirmative action in education and hiring (see here).

What’s the Matter with “What’s the Matter with What’s the Matter with Kansas”?

It isn’t clear Bartels was right about Kansas even in 2004 (see here) (and that isn’t to say Thomas Frank was right) but the thesis around education has taken a nosedive. See below.

Split Ticket Voting For Moderation

On the back of record split ticket voting, Fiorina (and others) theorized “divided government is the result of a conscious attempt by the voters to achieve moderate policy.” Except very quickly split ticket voting declined (with of course no commensurate radicalization of the population) (see here).

Effect of Daughters on Legislator Ideology

Having daughters was thought to lead politicians to vote more liberally (see here) but more data suggested that this stopped in the polarized era (see here). Yet more data suggested that there was no trend for legislators with daughters to vote liberally before the era covered by the first study (see here).

Why Social Scientists Fail to Predict Dramatic Social Changes

19 Nov

Soviet specialists are often derided for their inability to see the coming collapse of the Soviet Union. But they were not unique. If you look around, social scientists have very little handle on many of the big social changes that have happened over the past 70 or so years.

  1. Dramatic decline in smoking. “The percentage of adults who smoke tobacco has declined from 42% in 1965 (the first year the CDC measured this), to 12.5% in 2020.” (see here.)
  2. Large infrastructure successes in a corrupt, divided developing nation. Over the last 20 or so years, India has pulled off Aadhaar, UPI, FASTag, etc., and dramatically increased the number of electrified villages, the number of people with access to toilets, the length of highways, etc.
  3. Dramatic reductions in prejudice against Italians, the Irish, Asians, women, African Americans, LGBT people, etc. (see here, here, etc.)
  4. Dramatic decline in religion, e.g., church-going, etc., in the West.
  5. Dramatic decline in marriage. “According to the study, the marriage rate in 1970 was at 76.5%, and today, it stands at just over 31%.” (see here.)
  6. Obama or Trump. Not many in 2006 would have given good odds of America electing a black president. Or, in early 2016, of it electing Trump.

The list probably spans all the big social changes. How many would have bet on the success of China? Or, for that matter, Bangladesh, whose HDI is at par with or ahead of that of its more established South Asian neighbors? Or the dramatic liberalization underway in Saudi Arabia? After all, the conventional argument before MBS was that the Saudi monarchy had made a deal with the mullahs and that any change would be met with a strong backlash.

All of that raises the question: why? One reason social scientists fail to predict dramatic social change may be that they think the present reflects the equilibrium. Take racial attitudes: theories of racial prejudice have mostly been defined by the idea that prejudice is irreducible. The second reason may be that most data social scientists have is cross-sectional or collected over short periods, and there isn’t much you can see (especially about change) from small portholes. The primary evidence they have is about the lack of change, whereas the world, looked at over longer time spans, is defined by astounding change on many dimensions. The third reason may be that social scientists suffer from negativity bias. They are focused on explaining what’s wrong with the world and interpreting data in ways that highlight conventional anxieties. This means that they end up interrogating progress (which is a fine endeavor) but spend too little time acknowledging and explaining real progress. Ideology also likely plays a role. For instance, few notice the long-standing progressive racial bias in television; see here for a fun example of the interpretation gymnastics.

p.s. Often, social scientists not only fail to predict dramatic changes but also struggle to explain them years later. Worse, they do not seem to update their mental models based on the changes.

p.p.s. So what changes do I predict? I predict a dramatic decline in caste prejudice in India for the following reasons: 1. dramatic generational turnover, 2. urbanization, 3. uninformative last names (outside of local context and excluding at most 20% of last names; e.g., the last name ‘Kumar’, which means ‘boy’, is exceedingly common), 4. high intra-group variance in physical features, 5. the preferred strategy of a prominent political party being to minimize intra-Hindu religious differences, 6. the current media and religious elites being mostly against caste prejudice. I also expect fairly rapid declines in prejudice against women (though a far less steep decline than for caste) for some of the same reasons.

Against Complacency

19 Nov

Even the best placed among us are to be pitied. Human lives today are blighted by five things:

  1. Limited time. While we have made impressive gains in longevity over the last 100 years, our lives are still too short. 
  2. Less than excellent health. Limited lifespan is further blighted by ill-health. 
  3. Underinvestment. Think about Carl Sagan as your physics teacher, a full-time personal trainer to help you excel physically, a chef, abundant access to nutritious food, a mental health coach, and more. Or an even more effective digital or robotic analog.
  4. Limited opportunity to work on impactful things. Most economic production happens in areas where we are not (directly) working to dramatically enhance human welfare. Opportunities to work on meaningful things are further limited by economic constraints.
  5. Crude tools. The tools we work with are much too crude, which means that many of us are stuck executing on a low plane.

Deductions

  1. Given where we are in terms of human development, innovations in health and education are likely among the most impactful, though innovations in foundational technologies like AI and computation, which increase our ability to innovate, are probably more important still.
  2. Given that at least a third of the economy is government money in many countries, governments can dramatically affect what is produced, e.g., the pace at which we increase longevity, prevent really bad outcomes like an uninhabitable planet, etc.

Traveling Salesman

18 Nov

White-collar elites venerate travel, especially to exotic and faraway places. There is some justification for the fervor: traveling is pleasant. But veneration creates an umbra that hides some truths:

  1. Local travel is underappreciated. We likely underappreciate the novelty and beauty available locally.
  2. Virtual travel is underappreciated. We know all the ways virtual travel doesn’t measure up to the real experience. But we do not ponder enough how the gap between virtual and physical travel has closed, e.g., with high-resolution video, and how some aspects of virtual travel are better:
    1. Cost and convenience. The comfort of the sofa beats the heat and the cold, the crowds, and the fatigue.
    2. Knowledgeable guides. Access to knowledgeable guides online is much greater than offline. 
    3. New vistas. Drones give pleasing viewing angles unavailable to lay tourists.
    4. Access to less visited places. Intrepid YouTubers stream from places far off the tourist map, e.g., here.
  3. The tragedy of the commons. The more people travel, the less appealing it is for everyone because a) travelers change the character of a place and b) the crowds come in the way of enjoyment.
  4. The well-traveled are mistaken for being intellectually sophisticated. “Immersion therapy” can expand horizons by challenging perspectives. But for travel to be ‘improving,’ it often needs to be paired with books, needs to be longer, and the traveler needs to make an effort to learn the language.
  5. Traveling by air is extremely polluting. A round trip between LA and NYC emits .62 tons of CO2, which is about the same as the CO2 generated by driving 1,200 miles.

Limits of Harms From Affirmative Action

17 Nov

Stories abound about unqualified people getting admitted to highly selective places because of quotas. But chances are that these are merely stories with little basis in fact. If an institution is highly selective and the number of applicants is sufficiently large, quotas are unlikely to lead to people with dramatically lower abilities being admitted, even when there are dramatic differences across groups. Relatedly, quotas are unlikely to have much of an impact on the average ability of the admitted cohort. If the point isn’t obvious, it will be after the following simulation. Say the mean IQ of the two groups differs by 1 s.d. (the difference between Black and White IQ in the US). Say the admitting institution takes only 1,000 people. In the no-quota regime, the top 1,000 people get admitted. In the quota regime, 20% of the seats are reserved for the second group. With this framework, we can compare the ability of the last admittee, and the mean ability of the cohort, across the two conditions.

# Set seed for reproducibility
set.seed(123)

# Simulate two standard normal distributions
group1 <- rnorm(1000000, mean = 0, sd = 1)   # Group 1
group2 <- rnorm(1000000, mean = -1, sd = 1)  # Group 2, mean 1 s.d. lower than Group 1

# Combine into a dataframe with a column identifying the groups
data <- data.frame(
  value = c(group1, group2),
  group = rep(c("Group 1", "Group 2"), each = 1000000)
)

# Quota regime: top 800 values from Group 1 and top 200 values from Group 2
top_800_group1 <- head(sort(data$value[data$group == "Group 1"], decreasing = TRUE), 800)
top_200_group2 <- head(sort(data$value[data$group == "Group 2"], decreasing = TRUE), 200)

# Combine the selected values
combined_top_1000 <- c(top_800_group1, top_200_group2)

# Ability of the last few admittees (no-quota regime)
round(tail(head(sort(data$value, decreasing = TRUE), 1000)), 2)
#> [1] 3.11 3.11 3.10 3.10 3.10 3.10

# Ability of the last few admittees (quota regime)
round(tail(combined_top_1000), 2)
#> [1] 2.57 2.57 2.57 2.57 2.56 2.56

# Mean ability of the admitted cohort (no-quota regime)
round(mean(head(sort(data$value, decreasing = TRUE), 1000)), 2)
#> [1] 3.37

# Mean ability of the admitted cohort (quota regime)
round(mean(combined_top_1000), 2)
#> [1] 3.31

# How many people in the top 1000 are from Group 2 under no-quota?
sorted_data <- data[order(data$value, decreasing = TRUE), ]
top_1000 <- head(sorted_data, 1000)
sum(top_1000$group == "Group 2")
#> [1] 22

Under no-quota, the admittee with the least ability is 3.1 s.d. above the mean, while under quota, the admittee with the least ability is 2.56 s.d. above the mean. The mean ability of the admitted cohort is virtually indistinguishable: 3.37 and 3.31 for the no-quota and quota conditions, respectively. Not to put too fine a point on it: the claim that quotas lead to gross misallocation of limited resources is likely grossly wrong. This isn’t to say there isn’t a rub. With a 1 s.d. difference, representation in the tails is grossly skewed. Without a quota, there would be just 22 people from Group 2 in the top 1,000. So 178 people from Group 1 get bumped. This point about fairness is perhaps best thought of in the context of how much harm comes to those denied admission. Assuming enough supply across the range of selectivity (approximately true for higher education in the U.S., with colleges at various levels of selectivity), those denied admission at more exclusive institutions likely get admitted at slightly lower-ranked institutions and do nearly as well as they would have done at the more exclusive institutions (see Dale and Krueger, etc.).

p.s. In countries like India, 25 years ago, there was fairly limited supply at the top and large discontinuous jumps. Post liberalization of the education sector, this is likely no longer true.

p.p.s. What explains the large racial gap in the SAT scores of admittees to Harvard? It likely stems from Harvard weighing factors such as athletic performance in admission decisions.

Missing Market for Academics

16 Nov

There are a few different options for buying time with industry experts, e.g., https://officehours.com/, https://intro.co/, etc. However, there is no marketplace for buying academics’ time. Some surplus is likely lost as a result. For one, some academics want advice on what they write. To get advice, they have three choices: academic friends, reviewers, or interested academics at conferences or talks. All three have their problems. Or they have to resort to informal markets, as Kahneman did.

“He called a young psychologist he knew well and asked him to find four experts in the field of judgment and decision-making, and offer them $2,000 each to read his book and tell him if he should quit writing it. “I wanted to know, basically, whether it would destroy my reputation,” he says. He wanted his reviewers to remain anonymous, so they might trash his book without fear of retribution.”

https://www.vanityfair.com/news/2011/12/michael-lewis-201112

For what it’s worth, Kahneman’s book still had major errors. And that may be the point. Had he had access to a better market, with ratings on reviewers’ ability to assess quantitative material, he may not have had the errors. A fully fleshed-out market could let sellers price discriminate based on whether the author is a graduate student or a tenured professor at a top-ranked private university. Such a market may also prove a useful revenue stream for academics with time and talent who want additional money.

Reviewing is but one example. Advice on navigating the academic job market, research design, etc., can all be sold.

Striking Changes Among Democrats on Race and Gender

10 Nov

The election of Donald Trump led many to think that Republicans have changed, especially on race-related issues. But the data suggest that the big changes in public opinion on racial issues over the last decade or so have been among Democrats. Since 2012, Democrats have become strikingly more liberal on race, on issues related to women, and on LGBT issues.

Conditions Make It Hard for Blacks to Succeed

The percentage of Democrats strongly agreeing with the statement more than doubled between 2012 (~ 20%) and 2020 (~ 45%).

Source: ANES

Affirmative Action in Hiring/Promotion

The percentage of Democrats for affirmative action for Blacks in hiring/promotion nearly doubled between 2012 (~ 26%) and 2020 (~ 51%).

Source: ANES

Fun fact: Support for caste-based and gender-based reservations in India is 4x+ higher than support for race-based affirmative action in the US. See here.

Blacks Should Not Get Special Favors to Get Ahead

The percentage of Democrats strongly disagreeing with the statement nearly tripled between 2012 (~ 13%) and 2020 (~ 41%).

Source: ANES

See also Sniderman and Carmines, who show that support for the statement is not rooted in racial prejudice.

Feelings Towards Racial Groups

Democrats in 2020 felt more warmly toward Blacks, Hispanics, and Asians than toward Whites.

Source: ANES

White Democrats’ Feelings Towards Various Racial Groups

White Democrats in 2020 felt more warmly toward Asians, Blacks, and Hispanics than toward Whites.

Democrats’ Feelings Towards Gender Groups

Democrats felt 15 points more warmly toward feminists and LGBT in 2020 than in 2012.

Source: ANES

American PII: Lapses in Securing Confidential Data

23 Sep

At least 83% of Americans have had confidential data they shared with a company exposed in a breach (see here and here). The list of companies most frequently implicated in the loss of confidential data makes for sobering reading. Reputable companies like LinkedIn (Microsoft), Adobe, Dropbox, etc., are among the top 20 worst offenders.

Source: Pwned: The Risk of Exposure From Data Breaches

There are two other seemingly contradictory facts. First, many of the companies that haven’t been able to safeguard confidential data have some kind of highly regarded security certification like SOC-2 (see, e.g., here). The second is that many data breaches are caused by elementary errors, e.g., “the password cryptography was poorly done and many were quickly resolved back to plain text” (here).

The explanation for why companies with highly regarded security certifications fail to protect the data is probably mundane. Supporters of these certifications may rightly claim that these certifications dramatically reduce the chances of a breach without eliminating it. And a 1% error rate can easily lead to the observation we started with.

So, how do we secure data? Before discussing solutions, let me describe the current state. In many companies, PII data is spread across multiple databases. Data protection is based on processes set up for controlling access to data. The data may also be encrypted, but it generally isn’t. Many of these processes to secure the data are also auditable and certifications are granted based on audits.

Rather than relying on adherence to processes, a better bet might be to not let PII percolate across the system. The primary options for prevention are customer-side PII removal and ingestion-time PII removal. (Methods like differential privacy can be used at either end and in how automated data collection services are set up.) Beyond these systems, you need a way to handle cases where PII is shown in the product. One option is to hash the PII during ingest and look it up right before serving from a system that is yet more tightly access-controlled. All of these things are well known. Their lack of adoption is partly due to the fact that these services have yet to be abstracted out enough that adding them is as easy as editing a YAML file. And therein lies an opportunity.
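
A minimal sketch of hash-at-ingest in R, assuming the digest package. Raw PII is replaced by a salted hash before it enters the main data store; the hash-to-PII mapping lives in a separate, more tightly access-controlled vault. The names and the salt handling here are illustrative, not a hardened design.

library(digest)

# Assume the salt is injected as a secret, e.g., via an environment variable
salt <- Sys.getenv("PII_SALT")

hash_pii <- function(value) digest(paste0(salt, value), algo = "sha256")

# At ingest: replace raw PII with its hash; record the mapping in the vault
ingest_record <- function(record, vault) {
  h <- hash_pii(record$email)
  vault[[h]] <- record$email  # tightly access-controlled lookup store
  record$email <- h           # the rest of the system only ever sees the hash
  list(record = record, vault = vault)
}

# Right before serving: resolve the hash back to the raw value
resolve_pii <- function(h, vault) vault[[h]]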

Not Recommended: Why Current Content Recommendation Systems Fail Us

9 Sep

Recommendation systems paint a wonderful picture: the system automatically gets to know you and caters to your preferences. And that is indeed what happens, except that the picture is warped. The warping happens for three reasons. The first is that humans want more than immediate gratification. The systems, however, are designed to learn from signals that track behavior in an environment with strong temptation, and so they mostly learn “System 1 preferences.” The second reason is the use of the wrong proxy metric. One common objective function (on content aggregation platforms like YouTube) is maximizing customer retention (a surrogate for revenue and profits). (The objective function likely doesn’t vary between the subscriber and ad-based tiers.) And the conventional proxy for retention is time spent on the product. It doesn’t matter much how you achieve that; the easiest way is to sell fentanyl. The third problem is the lack of good data. Conventionally, the choices of people whose judgment I trust (and of the people whose judgments those people trust) are a great signal. But they do not make it directly into recommendations on platforms like YouTube, Netflix, etc. Worse, recommendations based on similarity in consumption don’t work as well because of the first point. And recommendations based on the likelihood of watching often reduce to recommending the most addictive content.

Solutions

  1. More Control. To resist temptation, humans plan ahead, e.g., don’t stock sugary snacks at home. By changing the environment, humans can more safely navigate the space during times when impulse control is weaker.
    • Rules. Let people write rules for the kinds of videos they don’t want to be offered. (A toy sketch follows this list.)
    • Source filtering. On X (formerly Twitter), for instance, you can curate your feed by choosing whom to follow. (X has ‘For You’ and ‘Following’ tabs; in the ‘Following’ tab, the user only sees tweets that the users they follow tweet or retweet.) (On YouTube, you can subscribe to channels, but the user sees more than the content produced by the channels they subscribe to.)
    • Time limits. Let people set time limits (for certain kinds of content).
    • Profiles. Offer a way to switch between profiles.
  2. Better Data
    • Get System 2 data. Get feedback at a later time on what people have viewed. For instance, in the history view, allow people to score their viewing history.
    • Network data. Only get content from people whose judgment you trust. This is different from source filtering (#1b), which proposes filtering on content producers.
  3. Information. Provide daily/weekly/monthly report cards on how much time was spent watching what kind of content, and at what times of the day/week the person respected their self-recorded, longer-term preferences.
  4. Storefronts. Let there be a marketplace of curation services (curators). And let people visit a ‘store’ (a particular version of curation) rather than the warehouse.
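
A toy sketch in R of user-written rules (solution 1) applied to candidate recommendations; the fields (topic, duration_min) and the rules themselves are made up for illustration.

# Candidate videos the system wants to offer
candidates <- data.frame(
  title        = c("News recap", "Prank compilation", "Lecture on causality"),
  topic        = c("news", "pranks", "education"),
  duration_min = c(12, 8, 55),
  stringsAsFactors = FALSE
)

# Rules the user wrote ahead of time: "don't offer me these"
rules <- list(
  function(item) item$topic == "pranks",                        # block a topic
  function(item) item$topic == "news" & item$duration_min > 30  # cap long news
)

# Drop any candidate that matches at least one rule
blocked <- Reduce(`|`, lapply(rules, function(r) r(candidates)))
candidates[!blocked, ]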

Acknowledgment. The article benefitted from discussion with Chris Alexiuk and Brian Whetter.