Automating Understanding, Not Just ML

27 Jun

Some of the most complex parts of machine learning are now largely automated. The modal ML person types in a few simple commands and, voilà, very complex operations are done. Some companies, like Microsoft (Azure) and DataRobot, also provide a UI for this. These UIs have generally not turned out well. Why? Because this kind of system does too little for the modal ML person and expects too much from everyone else. So the modal ML person doesn’t use it, and the people who do use it generally use it badly. The black box remains a black box. But not much is needed to place a lamp in it. Really, just two things are needed:

1. A data summarization and visualization engine, preferably with a chatbot-like feature that smartly guides people through the key points, including the problems. For instance, start with univariate summaries, highlighting ranges, missing data, sparsity, and such. Then, if it is a supervised problem, give people a bunch of loess plots, or describe the ‘best-fitting’ parametric approximations of y in plain English, such as, “people who eat 1 more cookie live 5 fewer minutes on average.”

2. An explanation engine, including an account of what explanations of predictions made from observational data do and do not mean. We already have reasonable implementations of this.
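To make the first item concrete, here is a minimal sketch in Python of the kind of output such an engine might start with. It assumes missing values are stored as None or NaN, uses the share of zeros as a crude stand-in for sparsity, and swaps in an ordinary least squares slope for the ‘best-fitting’ parametric approximation. The function names are hypothetical, not part of any existing product.

```python
import math
from statistics import mean


def univariate_summary(values):
    """Range, share missing, and share of zeros (a crude sparsity measure) for one column.

    Assumes missing values are encoded as None or float NaN.
    """
    n = len(values)
    present = [v for v in values
               if v is not None and not (isinstance(v, float) and math.isnan(v))]
    summary = {
        "n": n,
        "share_missing": (n - len(present)) / n if n else 0.0,
        "share_zero": sum(v == 0 for v in present) / len(present) if present else 0.0,
    }
    if present:
        summary.update(min=min(present), max=max(present), mean=mean(present))
    return summary


def plain_english_slope(x, y, x_label, y_label):
    """Fit y = a + b*x by ordinary least squares and describe b in one sentence."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    change = "increase" if b >= 0 else "decrease"
    return (f"On average, one more {x_label} is associated with a "
            f"{abs(b):.1f} {change} in {y_label}.")


# Hypothetical toy data: cookies eaten per day and minutes of life.
cookies = [0, 1, 2, 3, 4]
minutes = [100, 95, 90, 85, 80]
print(univariate_summary([1, None, 0, 3, float("nan")]))
print(plain_english_slope(cookies, minutes, "cookie", "minutes lived"))
# On average, one more cookie is associated with a 5.0 decrease in minutes lived.
```

Because the data are observational, the generated sentence says “associated with” rather than claiming a causal effect.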

When you have both, you have automated complexity thoughtfully, in a way that empowers people, rather than created a system that enables people to do fancy things badly.

Talking On a Tangent

22 Jun

What is the trend over the last X months? One estimate of the ‘trend’ over the last k time periods is what I call the ‘hold up the ends’ method. Look at t_k and t_0, take the difference between the two, and divide by the number of time periods. If t_k > t_0, you say that things are going up. If t_k < t_0, you say things are going down. And if they are the same, you say that things are flat.

But this method can elide important non-linearity. For instance, say unemployment went down in the first 9 months and then went up over the last 3 but ended with t_k < t_0. What is the trend? If by trend we mean the average slope over the last k time periods, and if there is no measurement error, then the ‘hold up the ends’ method is reasonable. If there is measurement error, we would want to smooth the time series before we hold up the ends.

Often people care about ‘consistency’ in the trend. One estimate of consistency is the proportion of comparisons between consecutive time periods in which the change has the same sign as the overall trend. Often people also care more about later time periods than earlier ones. And one could build on that intuition by weighting later changes more.
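The ideas above (holding up the ends, smoothing first, measuring consistency, and weighting later changes more) can be sketched in Python as follows. The trailing moving average and the linearly rising weights are illustrative choices, not something the method prescribes, and the unemployment numbers are made up.

```python
def hold_up_the_ends(series):
    """Average change per period: (last value - first value) / periods elapsed."""
    if len(series) < 2:
        raise ValueError("need at least two observations")
    return (series[-1] - series[0]) / (len(series) - 1)


def trailing_moving_average(series, window=3):
    """Smooth by averaging each value with the window - 1 values before it.

    The output is shorter than the input by window - 1 observations.
    """
    return [sum(series[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(series))]


def _sign(x):
    return (x > 0) - (x < 0)  # +1, -1, or 0


def consistency(series):
    """Share of period-to-period changes with the same sign as the overall trend."""
    changes = [b - a for a, b in zip(series, series[1:])]
    overall = _sign(series[-1] - series[0])
    return sum(_sign(c) == overall for c in changes) / len(changes)


def weighted_trend(series):
    """Weighted mean of period-to-period changes; weights rise linearly with time.

    With equal weights this equals hold_up_the_ends, since the changes telescope.
    """
    changes = [b - a for a, b in zip(series, series[1:])]
    weights = range(1, len(changes) + 1)
    return sum(w * c for w, c in zip(weights, changes)) / sum(weights)


# Hypothetical unemployment series (toy numbers): down for 9 months, up for 3.
u = [100, 98, 96, 94, 92, 90, 88, 86, 84, 82, 85, 88, 91]
print(hold_up_the_ends(u))                           # -0.75
print(hold_up_the_ends(trailing_moving_average(u)))  # -1.0
print(consistency(u))                                # 0.75
print(round(weighted_trend(u), 3))                   # 0.115
```

Holding up the ends says unemployment fell, and 75% of the monthly changes agree, but weighting recent months more flips the sign because of the three-month rise. Which summary is right depends on the question the trend is meant to answer.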

Targeting 101

22 Jun

Targeting Economics

Say that there is a company that makes more than one product, and users of any one of its products don’t use all of its products. In effect, the company has a captive audience. The company can run an ad in any of its products about one or more of the other products that a user doesn’t use. Should it consider targeting, i.e., showing different ads (or different numbers of ads) to different users? There are five things to consider:

  • Opportunity Cost: If the opportunity to show ads is limited, could the company make more profit by showing an ad about something else?
  • The Cost of Showing an Ad to an Additional User: The cost of serving an ad; it is close to zero in the digital economy.
  • The Cost of a Worse Product: As a result of seeing an irrelevant ad in the product, the user likes the product less. (The magnitude of the reduction depends on how disruptive the ad is and how irrelevant it is.) The company suffers in the end as its long-term profits are lower.
  • Poisoning the Well: Showing an irrelevant ad means that people are more likely to skip whatever ad you present next. It reduces the company’s ability to pitch other products successfully.
  • Profits: On the flip side of the ledger are expected profits. What are the expected profits from showing an ad? If you show a user an ad for a relevant product, they may not just buy and use the other product, but may also become less likely to switch from your stack. Further, they may even proselytize your product, netting you more users.

I formalize the problem here (pdf).
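The formalization itself is in the linked PDF and is not reproduced here. As a loose, hypothetical illustration of how the five considerations trade off, the sketch below scores one impression for one user. Every parameter name and number is invented, and the simple additive form is an assumption rather than the model in the PDF.

```python
from dataclasses import dataclass


@dataclass
class Impression:
    """Hypothetical per-user, per-impression quantities (all illustrative)."""
    p_relevant: float          # chance the ad is relevant to this user
    profit_if_relevant: float  # expected profit if relevant: adoption, retention, referrals
    serve_cost: float          # cost of serving one more ad (near zero online)
    product_harm: float        # lost long-term profit when an irrelevant ad sours the host product
    well_poisoning: float      # lost value of future pitches after an irrelevant ad
    opportunity_cost: float    # expected net value of the best other ad for this slot


def net_value(imp: Impression) -> float:
    """Expected gain from showing the ad, net of costs and the forgone alternative."""
    p_irrelevant = 1 - imp.p_relevant
    return (imp.p_relevant * imp.profit_if_relevant
            - imp.serve_cost
            - p_irrelevant * (imp.product_harm + imp.well_poisoning)
            - imp.opportunity_cost)


# Targeting: show the ad only to users for whom the net value is positive.
users = {
    "likely_fit": Impression(0.30, 20.0, 0.01, 2.0, 1.0, 0.5),
    "unlikely_fit": Impression(0.02, 20.0, 0.01, 2.0, 1.0, 0.5),
}
targets = [name for name, imp in users.items() if net_value(imp) > 0]
print(targets)  # ['likely_fit']
```

The per-user parameters are what make targeting worthwhile: if every user had the same numbers, the company would either show the ad to everyone or to no one.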

Firmly Against Posing Firmly

31 May

“What is crucial for you as the writer is to express your opinion firmly,” writes William Zinsser in “On Writing Well: An Informal Guide to Writing Nonfiction.” To emphasize the point, Bill repeats it at the end of the paragraph: “Take your stand with conviction.”

This advice is not for all writers—Bill particularly wants editorial writers to write with a clear point of view.

When Bill was an editorial writer for the New York Herald Tribune, he attended a daily editorial meeting to “discuss what editorials … to write for the next day and what position …[to] take.” Bill recollects,

“Frequently [they] weren’t quite sure, especially the writer who was an expert on Latin America.

“What about that coup in Uruguay?” the editor would ask. “It could represent progress for the economy,” the writer would reply, “or then again it might destabilize the whole political situation. I suppose I could mention the possible benefits and then—”

The editor would admonish such uncertainty with a curt “let’s not go peeing down both legs.”

Bill approves of taking a side. He likes what the editor is saying if not the language. He calls it the best advice he has received on writing columns. I don’t. Certainty should only come from one source: conviction born from thoughtful consideration of facts and arguments. Don’t feign certainty. Don’t discuss concerns in a perfunctory manner. And don’t discuss concerns at the end.

Surprisingly, Bill agrees with the last bit about not discussing concerns in a perfunctory manner at the end. But for a different reason. He thinks that “last-minute evasions and escapes [cancel strength].”

Don’t be a mug. If there are serious concerns, don’t wait until the end to note them. Note them as they come up.

Sigh-tations

1 May

In 2010, Google estimated that approximately 130M books had been published.

As a species, we still know very little about the world. But what we know already far exceeds what any of us can learn in a lifetime.

Scientists are acutely aware of this. They must specialize, as the chances of learning all the key facts about anything but the narrowest of domains are slim. They must also resort to shorthand to communicate what is known and what is new. The shorthand they use is citations. However, this vital building block of science is often rife with problems. The three key problems with how scientists cite are:

1. Cite in an imprecise manner: “This broad claim is supported by X.” Or, “Our results are consistent with XYZ.” (“Our results are consistent with” reflects directional thinking rather than thinking in terms of effect size. By that standard, all sorts of effects are consistent, even those 10x as large.) For an example of how I think work should be cited, see Table 1 of this paper.

2. Do not carefully read what they cite. This includes misstating key claims and approvingly citing retracted articles (see here). Relatedly, scientists do not closely scrutinize the papers they cite, and how closely they scrutinize a paper depends on how much they agree with its results (see the next point). (For a provocative example, see here.)

3. Cite in a motivated manner. Scientists ‘up’ the thesis of articles they agree with, for instance, by misstating correlation as causation. And they blow up minor methodological points in articles whose results are ‘inconsistent’ with their own. (A brief note on motivated citations: here.)

Bad Hombres: Bad People on the Other Side

8 Dec

Why do many people think that people on the other side are not well motivated? It could be because they think that the other side is less moral than them. And since opprobrium toward the morally defective is the bedrock of society, thinking that the people in the other group are less moral naturally leads people to censure the other group.

But it can’t be that each of two groups has better morals than the other. It can only be that people in each group think they are better. This much logic dictates. So, there has to be a self-serving aspect to moral standards. And this is what often leads people to think that the other side is less moral. Accepting this is not the same as accepting moral relativism. For even if we accept that some things are objectively more moral, say not being sexist or racist, some groups (those that espouse that a certain sex is superior or that certain races are better) will still think that they are better.

But how do people come to know other people’s morals? Some people infer morals from political aims. And that is a perfectly reasonable thing to do, as political aims reflect what we value. For instance, a Republican who values ‘life’ may think that Democrats are morally inferior because they support the right to abortion. But the inference is fraught with error. As matters stand, Democrats would also like women not to have to go through the painful decision of aborting a fetus. They just want women to have an easy and safe option should they need it.

Sometimes people infer morals from policies. But support for different policies can stem from having different information or different beliefs about causal claims. For instance, Democrats may support a carbon tax because they believe (correctly) that the world is warming and because they think that a carbon tax is the best way to reduce global warming and protect American interests. Republicans may dispute any part of that chain of logic. The point isn’t what is being disputed per se, but what people will infer about others if all they know is which policies those others support. Hanlon’s razor is often a good rule.

Why Do People (Re)-Elect Bad Leaders?

7 Dec

‘Why do people (re)-elect bad leaders?’ used to be a question that people only asked of third-world countries. No more. The recent election of unfit people to prominent positions in the U.S. and elsewhere has finally woken some American political scientists from their mildly racist reverie—the dream that they are somehow different.

So why do people (re)-elect bad leaders? One explanation that is often given is that people prefer leaders who share their ethnicity. The conventional explanation for preferring co-ethnics is that people expect co-ethnics (or everyone) to do better under a co-ethnic leader. But often enough, the expectation seems more like wishful thinking than anything else. After all, the unsuitability of some leaders is pretty clear.

If it is wishful thinking, then how do we expose it? More importantly, how do we fix it? Let’s for the moment assume that people care about everyone. And if they were to learn that the co-ethnic leader is much worse than someone else, they may switch votes. But what if people care about the welfare of co-ethnics more than others? The ‘good’ thing about bad leaders is that they are generally bad for everyone. So, if they knew better, they would still switch their vote.

You can verify these points using a behavioral trust game where people observe allocators of different ethnicities and different competence, and also observe welfare of both co-ethnics and others. You can also use the game to study some of the deepest concerns about ‘negative party ID’—that people will harm themselves to spite others.

Party Time

2 Dec

It has been nearly five years since the publication of Affect, Not Ideology: A Social Identity Perspective on Polarization. In that time, the paper has accumulated over 450 citations according to Google Scholar. (Citation counts on Google Scholar tend to be a bit optimistic.) So how does the paper hold up? Some reflections:

  • Disagreement over policy conditional on aims should not mean that you think that people you disagree with are not well motivated. But regrettably, it often does.
  • A lack of real differences doesn’t mean a lack of perceived differences. See here, here, here, and here.
  • The presence of real differences is no bar to liking another person or group. Nor does a lack of real differences come in the way of disliking another person or group. The history of racial and ethnic hatred will attest to the point. In fact, why small differences often serve as durable justifications for hatred is one of the oldest and deepest questions in all of social science. (Paraphrasing from Affectively Polarized?.) Evidence on the point:
    1. Sort of sorted but definitely polarized
    2. Assume partisan identity is slow-moving, as Green, Palmquist, and Schickler (2002), among others, show. Then add to it the fact that people still like their ‘own’ party a fair bit: thermometer ratings are a toasty 80 and haven’t budged. See the original paper.
    3. People like ideologically extreme elites of the party they identify with a fair bit (see here).
  • It may seem surprising to some that people can be so angry when they spend so little time on politics and know next to nothing about it. But it shouldn’t be. Information generally gets in the way of anger. Again, the history of racial bigotry is a good example.
  • The title of the paper is off in two ways. First, partisan affect can be caused by ideology. Not much partisan affect may be founded in ideological differences, but at least some of it is. (I always thought so.) Second, the paper does not offer a social identity perspective on polarization.
  • The effect that campaigns have on increasing partisan animus is still to be studied carefully. Certainly, ads play but a small role in it.
  • Evidence on the key take-home point—that partisans dislike each other a fair bit—continues to mount. The great thing is that people have measured partisan affect in many different ways, including using IAT and trust games. Evidence that IAT is pretty unreliable is reasonably strong, but trust games seem reasonable. Also, see my 2011 note on measuring partisan affect coldly.
  • Interpreting over-time changes is hard. That was always clear to us. But see Figure 1 here, which controls for a bunch of socio-demographic variables, and note that the paper also has over-time cross-country data to clarify inferences further.
  • If you assume that people learn about partisans from elites, reasoning what kinds of people would support this ideological extremist or another, it is easy to understand why people may like the opposing party less over time (though trends among independents should be parallel). The more curious thing is that people still like the party they identify with and approve of ideologically extreme elites of their party (see here).

Good NYT: Provision of ‘Not News’ in the NYT Over Time

29 Nov

The mainstream American news media is under siege from the political right, but for the wrong reasons. To hyper-partisan political elites, small inequities in slant matter a lot. But hyperventilating about small issues doesn’t magically turn them into serious problems. It just makes them loom larger and takes the focus away from much more serious problems. The big afflictions of ‘MSM’ are: vacuity; sensationalism; a preference for opinions over facts; a poor understanding of important issues and a lack of interest in covering them diligently; a lack of interest in what goes on outside American borders; a poor understanding of numbers; policy myopia (covering events rather than issues); and a fixation on breaking news.

Here we shed light on a small piece of the real estate: the provision of ‘not news.’ By ‘not news’ we mean news about cooking, travel, fashion, home decor, and such. We are very conservative in what we code as ‘not news,’ leaving news about health, anything local, and such in place.

We analyze the provision of ‘not news’ in The New York Times (NYT), the nation’s newspaper of record. NYT is both well regarded and popular. It has won more Pulitzer Prizes than any other newspaper. And it was the 30th most visited website in the U.S. (as of October 2017).

So, has the proportion of news stories about topics unrelated to politics or the economy, such as cooking, travel, fashion, music, etc., in NYT gone up over time? (For issues related to measurement, see here.) As the figure shows, the net provision of not news increased by 10 percentage points between 1987 and 2007.

Interested in learning about more ways in which NYT has changed over time? See here.

Missing Women on the Streets of Delhi

19 Nov

In 1990, Amartya Sen estimated that more than 100 million women were missing in South and West Asia, and China. His NYRB article shed light on sex-discrimination in parts of Asia, highlighting, among other things, pathologies like sex-selective abortion, biases in nutrition, healthcare, and schooling.

We aim to extend that line of inquiry and shed light on the question: “How many women are missing from public life?” In particular, we aim to estimate how many women are missing from the streets.

To estimate ‘missing women,’ we need a baseline. While there are some plausible ‘taste-based’ reasons for the sex ratio on the streets to differ from 50-50, for the current analysis, I assume that in a gender-equal society, roughly equal numbers of men and women are out on the streets. And I attribute any skew to the real (and perceived) threat of molestation, violence, and harassment, to patriarchy (e.g., control over whether wives, daughters, and sisters may go out), to discrimination in employment, and to similar factors.

Note About Data and Measurement

Of all the people out on the street over the course of a typical day in Delhi, what proportion are women? To answer that, I devised what I thought was a pretty reasonable sampling plan and a pretty clever data collection strategy (see here). Essentially, we would send people to random street locations at random times, ask them to take photos at head height, and then crowd-source counting the total number of people and the number of women in each picture.

The data we finally collected in this round bear little resemblance to the original data collection plan. As to why the data collection went off the rails, we have nothing to share publicly for now. The map of the places from which we collected data, though, lays bare the problems.

Data, Scripts, and Analyses are posted here.

Results

The data were collected between 2016-11-12 and 2017-01-11, between roughly 10 am and 7 pm. In all, we collected 1,958 photos from 196 locations. Pooling everyone in all the photos, about 81.5% of the people on the street were men. The average of the location-level proportions of men was 86.7%, which suggests that busier places have somewhat more women.
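The gap between the two averages comes from weighting. The pooled share weights every person equally, so busy locations count for more; the average of location-level shares weights every location equally. A minimal sketch in Python with made-up counts (not the study’s data), assuming counts are aggregated per location as (men, total people) pairs:

```python
def pooled_share(counts):
    """Share of men among everyone counted, pooling all locations (person-weighted)."""
    men = sum(m for m, _ in counts)
    people = sum(total for _, total in counts)
    return men / people


def mean_location_share(counts):
    """Unweighted mean of per-location shares of men (location-weighted)."""
    return sum(m / total for m, total in counts) / len(counts)


# Hypothetical (men, total people) counts for three locations: one busy, two quiet.
counts = [(70, 100), (9, 10), (19, 20)]
print(round(pooled_share(counts), 3))         # 0.754
print(round(mean_location_share(counts), 3))  # 0.85
```

Because the busy location has a smaller share of men, the pooled figure comes out lower than the location average, the same pattern as in the results above.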

Estimating Bias and Error in Perceptions of Group Composition

14 Nov

People’s reports of perceptions of the share of various groups in the population are typically biased. The bias is generally greater for smaller groups. The bias also appears to vary by how people feel about the group—they are likelier to think that the groups they don’t like are bigger—and by stereotypes about the groups (see here and here).

A new paper makes a remarkable claim: “explicit estimates are not direct reflections of perceptions, but systematic transformations of those perceptions. As a result, surveys and polls that ask participants to estimate demographic proportions cannot be interpreted as direct measures of participants’ (mis)information since a large portion of apparent error on any particular question will likely reflect rescaling toward a more moderate expected value…”

The claim is supported by a figure that takes the form of plotting a curve over averages. (It also reports results from other papers that base their inferences on similar figures.)

First, the evidence doesn’t seem adequate for the claim. Ideally, we want to plot curves within people and show that the curves are roughly the same. (I doubt that is the case.)

Second, it is one thing to claim that the reports of perceptions follow a particular rescaling formula and another to claim that people are aware of what they are doing. I doubt that people are.

Third, if the claim that ‘a large portion of apparent error on any particular question will likely reflect rescaling toward a more moderate expected value’ is true, then presenting people with correct information ought not to change how people think about groups, e.g., the perceived threat from immigrants. The calibrated error should be a much better moderator than the raw error. Again, I doubt it.

But I could be proven wrong about each. And I am ok with that. The goal is to learn the right thing, not to be proven right.

God, Weather, and News vs. Porn

22 Oct

The Internet is for porn (Avenue Q). So it makes sense to measure things on the Internet in porn units.

I jest, just a bit.

In Everybody Lies, Seth Stephens-Davidowitz points out that people search for porn more than for weather on Google. Data from Google Trends for the disbelievers.

But how do searches for news fare? Surprisingly well. And it seems the new president is causing interest in news to outstrip interest in porn. That is worrying if you take Posner’s point that people’s disinterest in politics is a sign that they think the system is working reasonably well. The last time searches for news exceeded searches for porn was when another Republican was in the White House!

How are searches for porn affected by Ramadan? For an answer, we turn to Google Trends data from Pakistan. But you may say that the trend is expected given that Ramadan is seen as a period of ritual purification. And that is a reasonable point. But you see the same thing with Eid-ul-Fitr and porn.

And in Ireland, of late, it seems searches for porn increase during Christmas.

Incentives to Care

11 Sep

A lot of people have their lives cut short because they eat too much and exercise too little. Worse, the quality of their shortened lives is typically much lower as a result of ‘avoidable’ illnesses that stem from ‘bad behavior.’ And that isn’t all. People who are not feeling great are unlikely to be as productive as those who are. Ill-health also imposes a significant psychological cost on loved ones. The net social cost is likely enormous.

One way to reduce such costly avoidable misery is to invest upfront. Teach people good habits and psychological skills early on, and they will be less likely to self-harm.

So why do we invest so little up front? Especially when we know that people are ill-informed (about the consequences of their actions) and myopic.

Part of the answer is that there are few incentives for anyone else to care. Health insurance companies don’t make their profits by caring. They make them by investing wisely. And by minimizing ‘avoidable’ short-term costs. If a member is unlikely to stick with a health plan for life, why invest in their long-term welfare? Or work to minimize negative externalities that may affect the next generation?

One way to make health insurers care is to rate them on the estimated quality-adjusted life years saved by the interventions they sponsor. That needs good interventions and good data science. And that is an opportunity. Another way is to get the government to invest heavily early on to address this market failure. Another version would be to get the government to subsidize care that reduces long-term costs.

Measuring Segregation

31 Aug

The dissimilarity index is a measure of segregation. It is defined as:

D = \frac{1}{2} \sum\limits_{i=1}^{n} \left| \frac{g_{i1}}{G_1} - \frac{g_{i2}}{G_2} \right|
where:

g_{i1} (g_{i2}) is the population of group 1 (group 2) in the ith area
G_1 (G_2) is the population of group 1 (group 2) in the larger area against which dissimilarity is being measured
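A direct translation of the formula into Python, as a sketch. It assumes the counts for both groups are given area by area, in the same order, and that the totals over those areas are the totals for the larger area.

```python
def dissimilarity_index(group1, group2):
    """D = 1/2 * sum_i |g_i1 / G_1 - g_i2 / G_2|.

    group1[i] and group2[i] are the counts of groups 1 and 2 in area i; G_1 and
    G_2 are the totals of each group over the larger area.
    """
    G1, G2 = sum(group1), sum(group2)
    return 0.5 * sum(abs(a / G1 - b / G2) for a, b in zip(group1, group2))


print(dissimilarity_index([10, 0], [0, 10]))  # 1.0: complete segregation
print(dissimilarity_index([5, 5], [5, 5]))    # 0.0: identical distributions
```

D is commonly read as the share of either group that would have to move for the two groups to be distributed identically across the areas.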

The measure suffers from a couple of issues:

  1. Concerns about lumpiness. Even in a small area, are black people at one end, white people at another?
  2. Choice of baseline. If the larger area (say, a state) is 95% white (Iowa is 91.3% white), dissimilarity is naturally likely to be small.

One way to address the concern about lumpiness is to provide an estimate of the spatial variance of the quantity of interest. But to measure variance, you need local measures of the quantity of interest. One way to arrive at local measures is as follows:

  1. Get the latitude and longitude of every address and create a distance matrix across all addresses. Start with Euclidean distances, though smarter measures that take account of physical features are a natural next step. (For those worried about computing super huge matrices, the good news is that the computation can be parallelized.)
  2. For each address, find the n closest addresses and estimate the quantity of interest. Where multiple houses are the same distance away, sample randomly or include all. One advantage of using the n closest addresses rather than the addresses in a particular area is that it naturally accounts for variations in density.

But once you have arrived at the local measure, why just report variance? Why not report means of compelling common-sense metrics, like the proportion of addresses (people) for whom the closest house has people of another race?
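The local measures can be sketched in Python by brute force. The sketch treats latitude and longitude as planar coordinates with Euclidean distance, as the first step suggests, and breaks ties by random ordering. The function names and the group labels “A” and “B” are hypothetical. The pairwise loop is O(N^2), which is why one would parallelize it for real address files.

```python
import math
import random
from statistics import pvariance


def nearest_neighbours(points, i, n, rng):
    """Indices of the n addresses closest to address i, excluding i itself.

    Candidates are shuffled before a stable sort, so ties are broken at random.
    """
    others = [j for j in range(len(points)) if j != i]
    rng.shuffle(others)
    others.sort(key=lambda j: math.dist(points[i], points[j]))
    return others[:n]


def local_shares(points, labels, target, n=5, seed=0):
    """For each address, the share of its n nearest neighbours labelled `target`."""
    rng = random.Random(seed)
    shares = []
    for i in range(len(points)):
        nbrs = nearest_neighbours(points, i, n, rng)
        shares.append(sum(labels[j] == target for j in nbrs) / len(nbrs))
    return shares


def share_nearest_is_other(points, labels, seed=0):
    """Proportion of addresses whose closest neighbour has a different label."""
    rng = random.Random(seed)
    hits = 0
    for i in range(len(points)):
        (j,) = nearest_neighbours(points, i, 1, rng)
        hits += labels[j] != labels[i]
    return hits / len(points)


# Hypothetical street: six evenly spaced houses, three of each group.
street = [(0, x) for x in range(6)]
groups = ["A", "A", "A", "B", "B", "B"]
shares = local_shares(street, groups, "A", n=2)
print(shares)                       # [1.0, 1.0, 0.5, 0.5, 0.0, 0.0]
print(round(pvariance(shares), 3))  # 0.167

# Hypothetical irregularly spaced addresses (no ties for the closest neighbour).
spots = [(0, 0), (0, 1), (0, 3), (0, 6), (0, 10)]
print(share_nearest_is_other(spots, ["A", "A", "B", "B", "A"]))  # 0.4
```

The variance of the local shares captures lumpiness, while the nearest-neighbour proportion is the kind of common-sense metric that is easy to explain to a lay reader.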

As for baseline numbers (generally just a couple of numbers): they are there to help you interpret. They can be brought in later.

Unstrapped

21 Aug

When strapped for time, some resort to wishful thinking, others to lashing out. Both are illogical. If you are strapped for time, it is either because you scheduled poorly or because you were a victim of unanticipated obligations. Both are understandable, but neither justifies ‘acting out.’ So don’t.

Whatever the reason, being strapped for time means either that some things won’t get done on time, or that you will have to work harder, or that you will need more resources (yours or someone else’s), or all three. And the only things to do are:

  1. Triage,
  2. Ask for help, and
  3. Communicate effectively with those affected

If you have landed in the soup because of poor scheduling, for instance, by not budgeting any time for things you haven’t scheduled, make a note. And improve.

And since it is never rational to worry (it is at best unproductive and at worst corrosive), avoid it like the plague.

How Do We Know?

17 Aug

How can fallible creatures like us know something? The scientific method is about answering that question well. To answer the question well, we have made at least three big innovations:

1. Empiricism. But no privileged observer. What you observe should be reproducible by all others.

2. Open to criticism: If you are not convinced by the method of observation or the claims being made, criticize. Offer reason or proof.

3. Mathematical Foundations: Reliance on math or formal logic to deduce what claims can be made if certain conditions are met.

These innovations, along with two more, have allowed us to ‘scale.’ Foremost among the innovations that allow us to scale is our ability to work together. And our ability to preserve information on stone, paper, and electrons allows us to collaborate with, and build on the work of, people who are now dead. The same principle that allows us to build structures as gargantuan as the Hoover Dam, and entire cities, allows us to learn about complex phenomena. And that takes us to the final principle of science.