99 Problems: How to Solve Problems

7 Jun

“Data is the new oil,” according to Clive Humby. But we have yet to build an engine that uses the oil efficiently and doesn’t produce a ton of soot. Using data to discover and triage problems is especially polluting. Working with data for well over a decade, I have learned some tricks that produce less soot and more light. Here’s a synopsis of a few of them.

  1. Is the Problem Worth Solving? There is nothing worse than solving the wrong problem. You spend time and money and get less than nothing in return—you squander the opportunity to solve the right problem. So before you turn to solutions, find out if the problem is worth solving.

    To illustrate the point, let’s follow Goji. Goji runs a delivery business, and the business has an apparent problem: the company’s couriers have a habit of delivering late. At first blush, it seems like a big problem. But is it? To answer that, one good place to start is by quantifying how late the couriers arrive. Let’s say that most couriers arrive within 30 minutes of the appointment time. That seems promising, but we still can’t tell whether it is good or bad. To find out, we could ask the customers. But asking customers directly is a bad idea: even if customers don’t care about their deliveries running late, it doesn’t cost them a dime to say that they care. Finding out how much they care is better. Find out the least amount of money customers will happily accept in lieu of a delivery running 30 minutes late. It may turn out that most customers don’t care—they will happily accept some trivial amount in lieu of a late delivery. Or it may turn out that customers only care when you deliver frozen or hot food. This still doesn’t give you the full picture. To get yet more clarity on the size of the problem, check how your price-adjusted quality compares to that of other companies.

    Misestimating what customers will pay for something is just one of the ways to end up with the wrong problem. Often, the apparent problem is merely an artifact of measurement error. For instance, we may think the couriers arrive late because our mechanism for capturing arrival is imperfect—couriers deliver on time but forget to tap the button acknowledging they have delivered. Automated check-in based on geolocation may solve the problem. Or incentivizing couriers to tap the button promptly may solve it. Either way, the true problem is not late arrivals but mismeasurement.

    Wrong problems can be found in all parts of problem-solving. During software development, for instance, “[p]rogrammers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs,” according to Donald Knuth. (Knuth called the tendency “premature optimization.”) Worse, Knuth claims that “these attempts at efficiency actually have a strong negative impact” on how maintainable the code is.

    Often, however, you are not solving the wrong problem. You are just solving it at the wrong time. The conventional workflow of problem-solving is discovery, estimating the opportunity, estimating the investment, prioritizing, execution, and post-execution discovery, where you begin again. To find out what to focus on now, you need to get to prioritization. There are some rules of thumb, however, that can help you triage:
    1. Fix upstream problems before downstream problems. The fixes upstream may make the downstream improvements moot.
    2. Estimate the investment and returns against the optimal future workflow. If you don’t, you are committing to scrapping later a lot of what you build today.
    3. Even on the best day, estimating the return on investment is a single-decimal science.
    4. You may find that there is no way to solve the problem with the people you have.

  2. MECE: Management consultants swear by it, so it can’t be a good idea. Right? It turns out that it is. Relentlessly working to pare down the problem into independent parts is among the most important tricks of the trade. Let’s see it in action. After looking at the data, Goji finds that arriving late is a big problem. So you know that it is the right problem but don’t know why your couriers are failing. You apply MECE. You reason that it could be because you have ‘bad’ couriers. Or because you are setting good couriers up for failure. These mutually exclusive, collectively exhaustive parts can be broken down further. In fact, I think there is a law: the number of independent parts that a problem can be pared down into is always one more than you think it is. Here, for instance, you may be setting couriers up to fail by giving them too little lead time or by not providing them precise directions. If you go down yet another layer, the short lead time may be a result of you taking too long to start looking for a courier or because it takes you a long time to find the right courier. So on and so forth. There is no magic to this. But there is no science to it either. MECE tells you what to do but not how to do it. We discuss the how in subsequent points.

  3. Funnel or the Plinko: The layered approach to MECE reminds most data scientists of the ‘funnel.’ Start with 100% and draw your Sankey diagram, popularized by Minard’s map of Napoleon’s march to Russia.

    Funnels are powerful tools because they capture two important things: how much we lose at each step and where the losses come from. There is, however, one limitation of funnels—the need for categorical variables. When you have continuous variables, you need to decide smartly how to discretize. Following the example we have been using, the lead time we give our couriers to pick up an item and deliver it to the customer is one such continuous variable. Rather than break it into arbitrarily granular chunks, it is better to plot how lateness varies by lead time and then split at the places where the slope changes dramatically.
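
    Here is a minimal sketch of that approach, with simulated data standing in for a hypothetical deliveries table (the columns lead_time_mins and late are made up for illustration):

    ```python
    import numpy as np
    import pandas as pd

    # Simulated stand-in data: lead time (minutes) and whether the delivery ran late.
    rng = np.random.default_rng(0)
    lead_time = rng.uniform(5, 120, size=5_000)
    late = (rng.random(5_000) < np.clip(0.9 - lead_time / 100, 0.05, 0.9)).astype(int)
    deliveries = pd.DataFrame({"lead_time_mins": lead_time, "late": late})

    # Compute the lateness rate within narrow, provisional bins.
    deliveries["bin"] = pd.cut(deliveries["lead_time_mins"], bins=24)
    rates = deliveries.groupby("bin", observed=True)["late"].mean()

    # Inspect where the slope of the lateness curve changes sharply;
    # those are the natural places to cut the final, coarser categories.
    print(rates.diff())
    ```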

    There are three things to be cautious about when building and using funnels. The first is that funnels treat correlation as causation. The second is Simpson’s paradox, which deals with issues of aggregation in observational data. And the third is how the coarseness of the funnel can lead to mistaken inferences. For instance, you may not see the true impact of having too little time to find a courier because you raise prices where you have too little time.
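
    To make the Simpson’s paradox worry concrete, here is a toy example with made-up counts in which a ‘rush’ service looks worse in aggregate but better within every lead-time stratum, purely because it gets more short-lead orders:

    ```python
    import pandas as pd

    # Made-up counts: short-lead orders run late more often, and the rush
    # service receives a disproportionate share of short-lead orders.
    df = pd.DataFrame({
        "service": ["rush", "rush", "standard", "standard"],
        "lead": ["short", "long", "short", "long"],
        "orders": [900, 100, 100, 900],
        "late": [450, 10, 60, 135],
    })

    # In aggregate, rush looks much worse than standard (0.46 vs. 0.195)...
    agg = df.groupby("service")[["orders", "late"]].sum()
    print(agg["late"] / agg["orders"])

    # ...but within each lead-time stratum, rush is actually better.
    df["rate"] = df["late"] / df["orders"]
    print(df.pivot(index="service", columns="lead", values="rate"))
    ```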

  4. Systemic Thinking: It pays to know how the cookie is baked. Learn how the data flows through the system and what decisions we make at what point, with what data, under what assumptions, and to what purpose. The conventional tools are flow charts and process tracing. Keeping with our example, say we have a system that lets customers know when we are running late. And let’s assume that not only do we struggle to arrive on time, we also struggle to let people know when we are running late. An engineer may split the problem into an issue with detection or an issue with communication. The detection system may be made up of measuring where the courier is and estimating the time it takes to get to the destination. Either may be broken. And communication issues may stem from problems with sending emails or issues with delivery, e.g., email being flagged as spam.

  5. Sample Failures: One way to diagnose problems is to look at a few examples closely. This is a good way to understand what could be going wrong. For instance, it may allow you to discover that the locations you are getting from the couriers are wrong because locations received a minute apart are hundreds of miles apart. That can then lead you to the diagnosis that your application is installed on multiple devices and you cannot distinguish between the data emitted by the various devices. A minimal sketch of one such sanity check follows.
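
    This is a hedged sketch, assuming a hypothetical pings table with columns courier_id, ts, lat, and lon; it flags consecutive pings that imply physically impossible speeds:

    ```python
    import numpy as np
    import pandas as pd

    def haversine_km(lat1, lon1, lat2, lon2):
        """Great-circle distance in kilometers between two points."""
        lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
        a = (np.sin((lat2 - lat1) / 2) ** 2
             + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2)
        return 2 * 6371 * np.arcsin(np.sqrt(a))

    def impossible_jumps(pings: pd.DataFrame, max_kmh: float = 200) -> pd.DataFrame:
        """Flag consecutive pings by the same courier implying speeds above max_kmh."""
        pings = pings.sort_values(["courier_id", "ts"]).copy()
        g = pings.groupby("courier_id")
        km = haversine_km(g["lat"].shift(), g["lon"].shift(), pings["lat"], pings["lon"])
        hours = g["ts"].diff().dt.total_seconds() / 3600
        pings["kmh"] = km / hours
        return pings[pings["kmh"] > max_kmh]
    ```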

  6. Worst Case: When looking at examples, start with the worst errors. The intuition is simple: the worst errors are often the sites of the most obvious problems.

  7. Correlation Is Causation: To gain more traction, compare the worst with the best. Doing that allows you to see what is different between the two. The underlying idea is, of course, treating correlation as causation. That is a famous warning, and for good reason. But often enough, correlation points in the right direction.

  8. Exploit the Skew: The Pareto principle—the 80/20 rule—holds in many places. Look for it. Rather than tackle the entire pie, check whether the opportunity is concentrated in a few small places. It often is. Pursuing our example above, it could be that a small proportion of our couriers account for a majority of the late deliveries. Or it could be that a small number of incorrect addresses are causing most of our late deliveries by waylaying couriers. A quick way to check is sketched below.
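
    A minimal sketch of that check, assuming a hypothetical late_by_courier Series mapping courier ids to counts of late deliveries:

    ```python
    import pandas as pd

    def concentration(late_by_courier: pd.Series, top_share: float = 0.2) -> float:
        """Share of all late deliveries accounted for by the top `top_share` of couriers."""
        counts = late_by_courier.sort_values(ascending=False)
        top_n = max(1, int(len(counts) * top_share))
        return counts.iloc[:top_n].sum() / counts.sum()

    # A return value near 0.8 would be the classic 80/20 skew worth exploiting.
    ```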

  9. Under good conditions, how often do we fail? How do you know how big an issue a particular problem is? Say, for instance, you want to learn how big a role bad location data plays in your ability to notify customers. To do that, filter to cases where you have great location data and see how well you do. Then figure out the proportion of cases where you have great location data. A minimal sketch follows.
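
    This sketch assumes a hypothetical orders table with boolean columns good_location and notified:

    ```python
    import pandas as pd

    def location_data_diagnosis(orders: pd.DataFrame) -> dict:
        """Compare the notification rate under good location data with the overall rate."""
        good = orders[orders["good_location"]]
        return {
            "notify_rate_overall": orders["notified"].mean(),
            "notify_rate_good_location": good["notified"].mean(),
            "share_with_good_location": orders["good_location"].mean(),
        }

    # If the rate under good location data is close to the overall rate, bad
    # location data is probably not the binding constraint.
    ```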

  10. Dr. House: The good doctor was a big believer in differential diagnosis. Dr. House often eliminated potential causes by evaluating how patients responded to different treatment regimens. For instance, he would put people on a course of antibiotics to rule infection out. The more general strategy is experimentation: learn by doing something.

    Experimentation is a sine qua non where people are involved. The impact of code is easy to simulate. But we cannot simulate how much paying $10 per on-time delivery will increase on-time deliveries. We need to experiment. A sketch of the analysis follows.
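
    A minimal sketch of the analysis for such an experiment, with hypothetical counts of on-time deliveries in a randomized bonus arm and a control arm:

    ```python
    import numpy as np

    def two_proportion_ztest(hits_a, n_a, hits_b, n_b):
        """z-statistic for the difference in on-time rates between two arms."""
        p_a, p_b = hits_a / n_a, hits_b / n_b
        pooled = (hits_a + hits_b) / (n_a + n_b)
        se = np.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
        return (p_a - p_b) / se

    # Hypothetical results: 5,000 deliveries per arm, $10 bonus vs. control.
    z = two_proportion_ztest(hits_a=4_300, n_a=5_000, hits_b=4_150, n_b=5_000)
    print(z)  # |z| > 1.96 suggests the bonus genuinely moved on-time rates
    ```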

Trump Trumps All: Coverage of Presidents on Network Television News

4 May

With Daniel Weitzel.

The US government is a federal system, with substantial domains reserved for local and state governments. For instance, education, most parts of the criminal justice system, and a large chunk of regulation are under the purview of the states. Further, the national government has three co-equal branches: legislative, executive, and judicial. Given these facts, you would expect news coverage to be spread broadly across branches and levels of government. But there is a sharp skew in news coverage of politicians, with members of the executive branch, especially national politicians (and especially the president), covered far more often than other politicians (see here). Exploiting data from the Vanderbilt Television News Archive (VTNA), the largest publicly available database of TV news—over 1M broadcast abstracts spanning 1968 to 2019—we add body to the observation. We searched each abstract for references to the sitting president and coded each hit as 1. As the figure below shows, references to the president are common. Excluding Trump, on average, a sixth of all abstracts contain a reference to the sitting president. But Trump is different: 60%(!) of abstracts refer to Trump.
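
A hedged sketch of the kind of coding involved (the actual data and scripts are linked below), assuming a hypothetical abstracts table with a date and text per broadcast:

```python
import re
import pandas as pd

# Hypothetical input: one row per broadcast abstract.
abstracts = pd.DataFrame({
    "date": pd.to_datetime(["1987-05-01", "2018-03-12"]),
    "text": ["President Reagan addressed the nation...",
             "The White House said President Trump would..."],
})

# Presidential terms mapped to search patterns (two terms shown for illustration).
terms = [
    ("1981-01-20", "1989-01-20", r"\breagan\b"),
    ("2017-01-20", "2021-01-20", r"\btrump\b"),
]

def mentions_sitting_president(row) -> int:
    """1 if the abstract mentions whoever was president on the broadcast date."""
    for start, end, pattern in terms:
        if pd.Timestamp(start) <= row["date"] < pd.Timestamp(end):
            return int(bool(re.search(pattern, row["text"], flags=re.I)))
    return 0

abstracts["mentions_president"] = abstracts.apply(mentions_sitting_president, axis=1)
print(abstracts["mentions_president"].mean())
```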

Data and scripts can be found here.

Trading On Overconfidence

2 May

In Thinking, Fast and Slow, Kahneman recounts a time when he, Thaler, and Amos Tversky met a senior investment manager in 1984. Kahneman asked, “When you sell a stock, who buys it?”

“[The investor] answered with a wave in the vague direction of the window, indicating that he expected the buyer to be someone else very much like him. That was odd: What made one person buy, and the other person sell? What did the sellers think they knew that the buyers did not? [gs: and vice versa.]”

“… It is not unusual for more than 100M shares of a single stock to change hands in one day. Most of the buyers and sellers know that they have the same information; they exchange the stocks primarily because they have different opinions. The buyers think the price is too low and likely to rise, while the sellers think the price is high and likely to drop. The puzzle is why buyers and sellers alike think that the current price is wrong. What makes them believe they know more about what the price should be than the market does? For most of them, that belief is an illusion.”

Thinking, Fast and Slow, Daniel Kahneman

Note: Kahneman is not just saying that buyers and sellers have the same information but that they also know they have the same information.

There is a 1982 counterpart to Kahneman’s observation in the form of Paul Milgrom and Nancy Stokey’s paper on the No-Trade Theorem. “[If] [a]ll the traders in the market are rational, and thus they know that all the prices are rational/efficient; therefore, anyone who makes an offer to them must have special knowledge, else why would they be making the offer? Accepting the offer would make them a loser. All the traders will reason the same way, and thus will not accept any offers.”

Lost Years: From Lost Lives to Life Lost

2 Apr

The mortality rate is puzzling to mortals. A better number is the expected number of years lost. (A yet better number would be quality-adjusted years lost.) To make the calculation easier, Suriyan and I developed a Python package that uses SSA actuarial data and life tables to estimate the expected years lost.

We illustrate the use of the package by estimating the average number of years by which people’s lives are shortened due to the coronavirus (see Note 1 at the end of the article). Using data from Table 1 of the paper that gives the distribution of ages of people who died from COVID-19 in China, and with conservative assumptions (assuming the gender of each dead person to be male and taking the middle of each age range), we find that people’s lives are shortened by about 11 years on average. These estimates are conservative for one additional reason: there is likely an inverse correlation between dying and expected longevity. And note that given the bulk of the deaths are among older people, when people are more infirm, the quality-adjusted years lost are likely yet more modest. Given that the last life tables from China are from 1981, and given that life expectancy in China has risen substantially since then (though most gains come from reductions in childhood mortality, etc.), we also exploit recent data from the US, proceeding as if the deceased had the same life tables as Americans. Using the most recent SSA data, we find the number to be 16. Compare this to deaths from road accidents, the modal cause of death for ages 5–24 and 25–44 in the US. Assuming everyone who dies in a traffic accident is a man and assuming the age of death to be 25, we get ~52 years, roughly 3x as large as that of the coronavirus (see Note 3 at the end of the article). On the other hand, smoking on average shortens life by about seven years. (See Note 2 at the end of the article.)
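
To make the calculation concrete, here is a minimal sketch of the logic (not the package’s actual API), with made-up numbers standing in for real life tables:

```python
# Illustrative only: remaining life expectancy by age for males; the real
# numbers come from SSA period life tables.
remaining_years = {55: 25.0, 65: 17.0, 75: 10.0, 85: 5.0}

# Hypothetical age distribution of deaths (midpoints of age ranges -> share).
death_age_shares = {55: 0.2, 65: 0.3, 75: 0.3, 85: 0.2}

# Expected years lost = death-age-weighted average of remaining life expectancy.
expected_years_lost = sum(
    share * remaining_years[age] for age, share in death_age_shares.items()
)
print(expected_years_lost)
```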

8/4 Addendum: Using COVID-19 electronic death certification data (CEPIDC), we estimate, as above, the average number of years lost by people dying of the coronavirus. With conservative assumptions (assuming the gender of each dead person to be male and taking the middle of each age range), we find that people’s lives are shortened by about 9 years on average. Surprisingly, the average number of years lost remained steady at about 9 years between March and July 2020.

Note 1: Years lost is not sufficient to understand the impact of Covid-19. Covid-19 has had dramatic consequences on the quality of life and has had a large financial impact, among other things. It is useful to account for those when estimating the net impact of Covid-19.

Note 2: In the calculations above, we assume that all the deaths from Coronavirus have been observed. One could do the calculation differently by tracking life spans of people infected with Covid-19 and comparing it to a similar set of people who were never infected with Covid-19. Presumably, the average years lost for people who don’t die of Covid-19 when they are first infected is a lot lower. Thus, counting them would bring the average years lost way down.

Note 3: The net impact of Covid-19 on years lost in the short-term should plausibly account for years saved because of fewer traffic accidents, etc.

The Puzzle of Price Dispersion on Amazon

29 Mar

Price dispersion is an excellent indicator of transactional frictions. It isn’t that, absent price dispersion, we can confidently say that frictions are negligible. Frictions can be substantial even when price dispersion is zero. For instance, if search costs are high enough to make searching irrational, all the sellers will price the good at the buyer’s Willingness To Pay (WTP). Third-world tourist markets, full of hawkers selling the same thing at the same price, are good examples of that. But when price dispersion exists, we can be reasonably sure that there are frictions in transacting. This is what makes the existence of substantial price dispersion on Amazon compelling.

Amazon makes price discovery easy, controls some aspects of quality by kicking out sellers who don’t adhere to its policies, and provides reasonable indicators of service quality with its user ratings. But still, nearly all the items that I looked at showed substantial price dispersion. Take, for instance, the market for a bottle of Nature Made B12 vitamins. Prices go from $8.40 to nearly $30! With taxes, the dispersion is yet greater. If listing costs are non-zero, it is not immediately clear why sellers offering the product at $30 are in the market. It could be that the expected service quality of the $30 seller is higher, except that between the highest-priced seller and the next highest, the ratings of the highest-priced seller are lower (take a look at shipping speed as well). And I would imagine that the ratings (and the quality) of Amazon, which comes in with the lowest price, are the highest. More generally, I have a tough time thinking of aspects of service and quality that are worth so much that the range of prices for a branded bottle of vitamin pills runs from 1x to 4x.

One plausible explanation is that the lowest-priced seller has a non-zero probability of being out of stock. The more expensive, worse-quality sellers are there to catch these low-probability events, setting a price that is profitable for them. One way to think about it is that the marginal cost of additional supply rises in the way the listed prices show. If true, then there seems to be an opportunity to make money. And it is possible that Amazon is leaving money on the table.
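
A back-of-the-envelope version of that logic, with made-up numbers:

```python
# Hypothetical backstop seller: sells only when the low-price seller is out of stock.
p_stockout = 0.03        # assumed probability the low-price seller is unavailable
price, cost = 30.0, 8.0  # assumed sale price and unit cost
listing_cost = 0.25      # assumed per-period cost of keeping the listing live

expected_profit = p_stockout * (price - cost) - listing_cost
print(expected_profit)  # positive -> the $30 listing can be rational
```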

p.s. Sales of the Harry Potter boxed set show a similar pattern.

It Pays to Search

28 Mar

In Reinventing the Bazaar, John McMillan discusses how search costs affect the price the buyer pays. John writes:

“Imagine that all the merchants are quoting $1[5]. Could one of them do better by undercutting this price? There is a downside to price-cutting: a reduction in revenue from any customers who would have bought from this merchant even at the higher price. If information were freely available, the price-cutter would get a compensating boost in sales as additional customers flocked in. When search costs exist, however, such extra sales may be negligible. If you incur a search cost of 10 cents or more for each merchant you sample, and there are fifty sellers offering the urn, then even if you know there is someone out there who is willing to sell it at cost, so you would save $5, it does not pay you to look for him. You would be looking for a needle in a haystack. If you visited one more seller, you would have a chance of one in fifty of that seller being the price-cutter, so the return on average from that extra price quote would be 10 cents (or $5 multiplied by 1/50), which is the same as your cost of getting one more quote. It does not pay to search.”

Reinventing the Bazaar, John McMillan

John got it wrong. It pays to search. The cost and the expected payoff of the first quote are both 10 cents. But if the first quote is $15, the expected payoff of the second quote—(1/49)*$5, or about 10.2 cents—is greater than 10 cents. And so on.

Another way to solve it is to compute the expected number of additional quotes you need to reach the seller selling at $10. It is 25. Given you need to spend, on average, $2.50 to save $5, you will gladly search.

Yet another way to think about it is that the worst case is that you merely break even—when the $10 seller is the last one you get a quote from. In every other case, you come out ahead. The simulation below checks the arithmetic.
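
A quick Monte Carlo sketch of the setup in the quote (50 sellers, one price-cutter, 10 cents per quote):

```python
import random

def net_payoff(n_sellers=50, quote_cost=0.10, savings=5.0) -> float:
    """Search sellers in random order until the lone price-cutter is found."""
    order = random.sample(range(n_sellers), n_sellers)
    quotes = order.index(0) + 1  # seller 0 is the price-cutter
    return savings - quotes * quote_cost

trials = [net_payoff() for _ in range(100_000)]
print(sum(trials) / len(trials))  # ~$2.45: searching pays on average
```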

For the equilibrium price, you need to make assumptions. But if buyers know that there is a price cutter, they will all buy from him, which means the price cutter will end up the only seller remaining.

There are two related fun points. First, one reason markets are competitive on price even when true search costs are high is likely that people price their time remarkably low. Second, when some people spend a bunch of time looking for the cheapest deal, they incentivize high-priced sellers to lower their prices, making things better for everyone else.

Good News: Principles of Good Journalism

12 Mar

If fake news—deliberate disinformation, not uncongenial news—is one end of the spectrum, what is the other end?

To get at the question, we need a theory of what news should provide. A theory of news, in turn, needs a theory of citizenship, which prescribes the information people need to execute their role, and an empirically supported behavioral theory of how people get that information.

What a democracy expects of people varies by the conception of democracy. Some theories of democracy only require citizens to have enough information to pick the better candidate when differences in candidates are material. Others, like deliberative democracy, expect people to be well informed and to have thought through various aspects of policies.

I opt for deliberative democracy to guide expectations about people for two reasons. Not only does the theory best express the highest ideals of democracy, but it also has the virtue of answering a vital question well. If all news were equally profitable to produce and equally widely read, what kind of news would lead to the best political outcomes, as judged by idealized versions of people—people who have all the information and all the time to think through the issues?

There are two virtues of answering such a question. First, it offers a convenient place to start answering what we mean by ‘good’ news; we can bring in profitability and reader preferences later. Second, engaging with it uncovers some obvious aspects of ‘good’ news.

For news to positively affect political outcomes (not in the shallow, instrumental sense), the news has to be about politics. Rather than news about Kim Kardashian or opinions about the hottest boots this season, ‘good’ news is about policymakers, policy implementers, and policy.

News about politics is a necessary but not a sufficient condition. Switching from discussing Kim Kardashian’s dress to Hillary Clinton’s is very plausibly worse. Thus, we also want the news to be substantive, engaging with real issues rather than cosmetic concerns.

Substantively engaging with real issues is still no panacea. If the information is not correct, it will misinform rather than inform the debate. Thus, the third quality of ‘good’ news is correctness.

The criterion for “good” news is, however, not just correctness, but it is the correctness of interpretation. ‘Good’ news allows people to draw the right conclusions. For instance, reporting murder rates as say ‘a murder per hour’ without reporting the actual number of murders or comparing the probability of being murdered to other common threats to life may instill greater fear in people than ‘optimal.’ (Optimal, as judged by better-informed versions of ourselves who have been given time to think. We can also judge optimal by correctness—did people form unbiased, accurate beliefs after reading the news?)

Not all issues, however, lend themselves to objective appraisals of truth. To produce ‘good’ news on such issues, the best you can do is follow the right process. The primary tool journalists have in the production of news is the sources they use to report on stories. (While journalists increasingly use original data to report, the reliance on people is widespread.) Thus, the way to increase correctness is through manipulating aspects of sourcing. We can increase correctness by improving the quality of sources (e.g., sourcing more knowledgeable people with weak incentives to cook the books), increasing the diversity of sources (e.g., not just government officials but also major NGOs), and increasing the number of sources.

If we emphasize correctness, we may fall short on timeliness. News has to be timely enough to be useful, aside from being correct enough to guide policy and opinion.

News can be narrowly correct but may commit sins of omission. ‘Good’ news provides information on all sides of the issue. ‘Good’ news highlights and engages with all serious claims. It doesn’t give time to discredited claims for “balance.”

Second-to-last, news should be delivered in the right tone. Rather than speculative ad-hominem attacks, “good” news engages with arguments and data.

Lastly, news contributes to the public kitty only if it is original. Thus, ‘good’ news is original. (Plagiarism reduces the incentives for producing quality news because it eats into the profits.)

Feigning Competence: Checklists For Data Science

25 Jan

You may have heard that most published research is false (Ioannidis). But what you probably don’t know is that most corporate data science is also false.

Gaurav Sood

The returns on data science in most companies are likely sharply negative. There are a few reasons for that. First, as with any new ‘hot’ field, the skill level of the average worker is low. Second, the skill level of the people managing these workers is also low—most struggle to pose good questions, and when they stumble on one, they struggle to answer it well. Third, data science often fails silently (or there is enough corporate noise that most failures hide in plain sight), so the opportunity to learn from mistakes is small. And if that were not enough, many companies reward speed over correctness and, in doing so, often obtain neither.

How can we improve on the status quo? The obvious remedy for the first two issues is to increase the skill by improving training or creating specializations. And one remedy for the latter two points is to create incentives for doing things correctly.

Increasing training and creating specializations in data science is expensive and slow. Vital, but slow. Creating the right incentives for good data science work is not trivial either. There are at least two large forces lined up against it: incompetent supervisors and the fluid, collaborative nature of the work—work usually involves multiple people and a free exchange of ideas. Only the first is fixable—the second is a property of the work. And fixing it comes down to making technical competence a much more important criterion in hiring.

Aside from hiring more competent workers or increasing the competence of the workers you have, you can also simulate the effect with checklists—increase quality by creating a few “pause points”: times during a process when the person (or team) pauses and goes through a standard list of questions.

To give body to the boast, let me list some common sources of failures in DS and how checklists at different pause points may reduce failure.

  1. Learn what you will lose in translation. Good data science begins with a good understanding of the problem you are trying to solve. Once you understand the problem, you need to translate it into a suitable statistical analog, staying aware of what gets lost in the translation.
  2. Learn the limitations. Learn what data you would love to have to answer the question if money were no object. Use that ideal to understand how far you fall short of it, and then come to a judgment about whether the question can be answered reasonably with the data at hand.
  3. Learn how good the data are. You may think you have the data, but it is best to verify it. For instance, it is good practice to think through the extent to which a variable captures the quantity of interest.
  4. Learn the assumptions behind the formulas you use and test those assumptions to find the right thing to do. Thou shalt only use math formulas when you know their limitations. Having a good grasp of when formulas don’t work is essential. For instance, say the task is to describe a distribution. Someone may use the mean and standard deviation to describe it. But we know that sufficient statistics vary by distribution. For the binomial, it may just be p. A checklist for “describing” a variable (sketched in code after the list) can be:
    1. Check skew by plotting: averages are useful when distributions are symmetric and lots of observations sit close to the mean. If the distribution is skewed, you may want to report various percentiles instead.
    2. Count the missing values and learn what explains them.
    3. Check for unusual values and learn what explains the ‘unusual’ values.
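
    A minimal sketch of that checklist as code, assuming a numeric pandas Series:

    ```python
    import pandas as pd

    def describe_checklist(x: pd.Series) -> dict:
        """Run the three 'describe a variable' checks from the list above."""
        q1, q3 = x.quantile(0.25), x.quantile(0.75)
        iqr = q3 - q1
        return {
            "skew": x.skew(),  # large |skew| -> report percentiles, not just the mean
            "percentiles": x.quantile([0.05, 0.25, 0.5, 0.75, 0.95]).to_dict(),
            "share_missing": x.isna().mean(),
            # Crude 'unusual values' flag: points far outside the interquartile range.
            "n_unusual": int(((x < q1 - 3 * iqr) | (x > q3 + 3 * iqr)).sum()),
        }
    ```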

Ruling Out Explanations

22 Dec

The paper (pdf) makes the case that the primary reason for electoral cycles in dissents is priming. The paper notes three competing explanations: 1) caseload composition, 2) panel composition, and 3) the volume of caseloads. And it “rules them out” by regressing case type, panel composition, and caseload on quarters from the election (see Appendix Table D). The coefficients are uniformly small and insignificant. But is that enough to rule out the alternate explanations? No. Small coefficients don’t imply that there is no path from proximity to the election to dissent via the competing mediators (to use causal language). We could only conclude that the pathway doesn’t exist if there were a sharp null. The best you can do is bound the estimated effect.

Preference for Sons in the US: Evidence from Business Names

24 Nov

I estimate the preference for passing on businesses to sons by examining how common the words son and sons are, compared to daughter and daughters, in the names of businesses.

In the US, all businesses have to register with a state. And all states provide a way to search business names, in part so that new companies can pick names that haven’t been used before.

I begin by searching for son(s) and daughter(s) in states’ databases of business names. But the results of searching for son are inflated for three reasons:

  • son is part of many English words, from names such as Jason and Robinson to ordinary English words like mason (which can also be a name). 
  • son is a Korean name.
  • some businesses use the word son playfully. For instance, son is a homophone of sun, and some people use that to create names like son of a beach.

I address the first concern by using a regex that matches only the exact words son or sons. But not all states allow regex searches or let people download the full set of results. Where possible, I try to draw a lower bound. But still, some care is needed in interpreting the results.
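
For instance, a word-boundary pattern along these lines filters out Jason, Robinson, and mason:

```python
import re

# Match 'son' or 'sons' as standalone words, case-insensitively.
pattern = re.compile(r"\bsons?\b", flags=re.IGNORECASE)

names = ["Jason's Masonry LLC", "Smith & Sons Plumbing", "Son of a Beach Surf Shop"]
print([name for name in names if pattern.search(name)])
# ['Smith & Sons Plumbing', 'Son of a Beach Surf Shop']
# Note: playful uses like 'Son of a Beach' still match and need separate handling.
```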

Data and Scripts: https://github.com/soodoku/sonny_side

In all, I find that a conservative estimate of the son-to-daughter ratio ranges from 4:1 to 26:1 across states.