## Dismissed Without Prejudice: Evaluating Prejudice Reduction Research

25 Sep

Prejudice is a blight on humanity. How to reduce prejudice, thus, is among the most important social scientific questions. In the latest assessment of research in the area, a follow-up to the 2009 Annual Review article, Betsy Paluck et al., however, paint a dim picture. In particular, they note three dismaying things:

Publication Bias

Table 1 (see below) makes for grim reading. While one could argue that the pattern is explained by the fact that lab research tends to have smaller samples and has especially powerful treatments, the numbers suggest—see the average s.e. of the first two rows (it may have been useful to produce a $sqrt(1/n)$ adjusted s.e.)—that publication bias very likely plays a large role. It is also shocking to know that just a fifth of the studies have treatment groups with 78 or more people.

Light Touch Interventions

The article is remarkably measured when talking about the rise of ‘light touch’ interventions—short exposure treatments. I would have described them as ‘magical thinking’ for they seem to be founded in the belief that we can make profound changes in people’s thinking on the cheap. This isn’t to say light-touch interventions can’t be worked into a regime that affects profound change—repeated light touches may work. However, as far as I could tell, no study tried multiple touches to see how the effect cumulates.

Near Contemporaneous Measurement of Dependent Variables

Very few papers judged the efficacy of the intervention a day or more after the intervention. Given the primary estimate of interest is longer-term effects, it is hard to judge the efficacy of the treatments in moving the needle on the actual quantity of interest.

Beyond what the paper notes, here are a couple more things to consider:

1. Perspective getting works better than perspective-taking. It would be good to explore this further in inter-group settings.
2. One way to categorize ‘basic research interventions’ is by decomposing the treatment into its primary aspects and then slowly building back up bundles based on data:
1. channel: f2f, audio (radio, etc.), visual (photos, etc.), audio-visual (tv, web, etc.), VR, etc.
2. respondent action: talk, listen, see, imagine, reflect, play with a computer program, work together with someone, play together with someone, receive a public scolding, etc.
3. source: peers, strangers, family, people who look like you, attractive people, researchers, authorities, etc.
4. message type: parable, allegory, story, graph, table, drama, etc.
5. message content: facts, personal stories, examples, Jonathan Haidt style studies that show some of the roots of our morality are based on poor logic, etc.

## The (Mis)Information Age: Provenance is Not Enough

31 Aug

The information age has bought both bounty and pestilence. Today, we are deluged with both correct and incorrect information. If we knew how to tell apart correct claims from incorrect, we would have inched that much closer to utopia. But the lack of nous in telling apart generally ‘obvious’ incorrect claims from correct claims has brought us close to the precipice of disarray. Thus, improving people’s ability to identify untrustworthy claims as such takes on urgency.

http://gojiberries.io/2020/08/31/the-misinformation-age-measuring-and-improving-digital-literacy/

Inferring the Quality of Evidence Behind the Claims: Fact Check and Beyond

One way around misinformation is to rely on an expert army that assesses the truth value of claims. However, assessing the truth value of a claim is hard. It needs expert knowledge and careful research. When validating, we have to identify with which parts are wrong, which parts are right but misleading, and which parts are debatable. All in all, it is a noisy and time-consuming process to vet a few claims. Fact check operations, hence, cull a small number of claims and try to validate those claims. As the rate of production of information increases, thwarting misinformation by checking all the claims seems implausibly expensive.

Rather than assess the claims directly, we can assess the process. Or, in particular, the residue of one part of the process for making the claim—sources. Except for claims based on private experience, e.g., religious experience, claims are based on sources. We can use the features of these sources to infer credibility. The first feature is the number of sources cited to make a claim. All else equal, the more number of sources saying the same thing, the greater the chances that the claim is true. None of this is to undercut a common observation: lots of people can be wrong about something. A harder test for veracity if a diverse set of people say the same thing. The third test is checking the credibility of the sources.

Relying on the residue is not a panacea. People can simply lie about the source. We want the source to verify what they have been quoted as saying. And in the era of cheap data, this can be easily enabled. Quotes can be linked to video interviews or automatic transcriptions electronically signed by the interviewee. The same system can be scaled to institutions. The downside is that the system may prove onerous. On the other hand, commonly, the same source is cited by many people so a public repository of verified claims and evidence can mitigate much of the burden.

But will this solve the problem? Likely not. For one, people can still commit sins of omission. For two, they can still draft things in misleading ways. For three, trust in sources may not be tied to correctness. All we have done is built a system for establishing provenance. And establishing the provenance is not enough. Instead, we need a system that incentivizes both correctness and presentation that makes correct interpretation highly likely. It is a high bar. But it is the bar—correct and liable to correctly interpreted.

To create incentives for publishing correct claims, we need to either 1. educate the population, which brings me to the previous post, or 2. find ways to build products and recommendations that incentivize correct claims. We likely need both.

## The (Mis)Information Age: Measuring and Improving ‘Digital Literacy’

31 Aug

The information age has bought both bounty and pestilence. Today, we are deluged with both correct and incorrect information. If we knew how to tell apart correct claims from incorrect, we would have inched that much closer to utopia. But the lack of nous in telling apart generally ‘obvious’ incorrect claims from correct claims has brought us close to the precipice of disarray. Thus, improving people’s ability to identify untrustworthy claims as such takes on urgency.

Before we find fixes, it is good to measure how bad things are and what things are bad. This is the task the following paper sets itself by creating a ‘digital literacy’ scale. (Digital literacy is an overloaded term. It means many different things, from the ability to find useful information, e.g., information about schools or government programs, to the ability to protect yourself against harm online (see here and here for how frequently people’s accounts are breached and how often they put themselves at risk of malware or phishing), to the ability to identify incorrect claims as such, which is how the paper uses it.)

Rather than build a skill assessment kind of a scale, the paper measures (really predicts) skills indirectly using some other digital literacy scales, whose primary purpose is likely broader. The paper validates the importance of various constituent items using variable importance and model fit kinds of measures. There are a few dangers of doing that:

1. Inference using surrogates is dangerous as the weakness of surrogates cannot be fully explored with one dataset. And they are liable not to generalize as underlying conditions change. We ideally want measures that directly measure the construct.
2. Variable importance is not the same as important variables. For instance, it isn’t clear why “recognition of the term RSS,” the “highest-performing item by far” has much to do with skill in identifying untrustworthy claims.

Some other work builds uncalibrated measures of digital literacy (conceived as in the previous paper). As part of an effort to judge the efficacy of a particular way of educating people about how to judge untrustworthy claims, the paper provides measures of trust in claims. The topline is that educating people is not hard (see the appendix for the description of the treatment). A minor treatment (see below) is able to improve “discernment between mainstream and false news headlines.”

Understandably, the effects of this short treatment are ‘small.’ The ITT short-term effect in the US is: “a decrease of nearly 0.2 points on a 4-point scale.” Later in the manuscript, the authors provide the substantive magnitude of the .2 pt net swing using a binary indicator of perceived headline accuracy: “The proportion of respondents rating a false headline as “very accurate” or “somewhat accurate” decreased from 32% in the control condition to 24% among respondents who were assigned to the media literacy intervention in wave 1, a decrease of 7 percentage points.” The .2 pt. net swing on a 4 point scale leading to a 7% difference is quite remarkable and generally suggests that there is a lot of ‘reverse’ intra-category movement that the crude dichotomization elides over. But even if we take the crude categories as the quantity of interest, a month later in the US, the 7 percent swing is down to 4 percent:

“…the intervention reduced the proportion of people endorsing false headlines as accurate from 33 to 29%, a 4-percentage-point effect. By contrast, the proportion of respondents who classified mainstream news as not very accurate or not at all accurate rather than somewhat or very accurate decreased only from 57 to 55% in wave 1 and 59 to 57% in wave 2.

Guess et al. 2020

The opportunity to mount more ambitious treatments remains sizable. So does the opportunity to more precisely understand what aspects of the quality of evidence people find hard to discern. And how we could release products that make their job easier.

## Another ANES Goof-em-up: VCF0731

30 Aug

At this point, it’s well established that the ANES CDF’s codebook is not to be trusted (I’m repeating “not to be trusted to include a second link!). Recently, I stumbled across another example of incorrect coding in the cumulative data file, this time in VCF0731 – Do you ever discuss politics with your family or friends?

The codebook reports 5 levels:

Do you ever discuss politics with your family or friends?

1. Yes
5. No

8. DK
9. NA

INAP. question not used

However, when we load the variable and examine the unique values:

# pulling anes-cdf from a GitHub repository
cdf <- rio::import("https://github.com/RobLytle/intra-party-affect/raw/master/data/raw/cdf-raw-trim.rds")

unique(cdf$VCF0731) ## [1] NA 5 1 6 7 We see a completely different coding scheme. We are left adrift, wondering “What is 6? What is 7?” Do 1 and 5 really mean “yes” and “no”? We may never know. For a survey that costs several million dollars to conduct, you’d think we could expect a double-checked codebook (or at least some kind of version control to easily fix these things as they’re identified). ## Survey Experiments With Truth: Learning From Survey Experiments 27 Aug Tools define science. Not only do they determine how science is practiced but also what questions are asked. Take survey experiments, for example. Since the advent of online survey platforms, which made conducting survey experiments trivial, the lure of convenience and internal validity has persuaded legions of researchers to use survey experiments to understand the world. Conventional survey experiments are modest tools. Paul Sniderman writes, “These three limitations of survey experiments—modesty of treatment, modesty of scale, and modesty of measurement—need constantly to be borne in mind when brandishing term experiment as a prestige enhancer.” I think we can easily collapse these in two — treatment (which includes ‘scale’ as he defines it— the amount of time) and measurement. Paul Sniderman Note: We can collapse these three concerns into two— treatment (which includes ‘scale’ as Paul defines it— the amount of time) and measurement. But skillful artisans have used this modest tool to great effect. Famously, Kahneman and Tversky used survey experiments, e.g., Asian Disease Problem, to shed light on how people decide. More recently, Paul Sniderman and Tom Piazza have used survey experiments to shed light on an unsavory aspect of human decision making: discrimination. Aside from shedding light on human decision making, researchers have also used survey experiments to understand what survey measures mean, e.g., Ahler and Sood The good, however, has come with the bad; insight has often come with irreflection. In particular, Paul Sniderman implicitly points to two common mistakes that people make: 1. Not Learning From the Control Group. The focus on differences in means means that we sometimes fail to reflect on what the data in the Control Group tells us about the world. Take the paper on partisan expressive responding, for instance. The topline from the paper is that expressive responding explains half of the partisan gap. But it misses the bigger story—the partisan differences in the Control Group are much smaller than what people expect, just about 6.5% (see here). (Here’s what I wrote in 2016.) 2. Not Putting the Effect Size in Context. A focus on significance testing means that we sometimes fail to reflect on the modesty of effect sizes. For instance, providing people$1 for a correct answer within the context of an online survey interview is a large premium. And if providing a dollar each on 12 (included) questions nudges people from an average of 4.5 correct responses to 5, it suggests that people are resistant to learning or impressively confident that what they know is right. Leaving $7 on the table tells us more than the .5, around which the paper is written. More broadly, researchers are obtuse to the point that sometimes what the results show is how impressively modest the movement is when you ratchet up the dosage. For instance, if an overwhelming number of African Americans favor Whites who have scored just a few points more than a Black student, it is a telling testament to their endorsement of meritocracy. ## Nothing to See Here: Statistical Power and “Oversight” 13 Aug “Thus, when we calculate the net degree of expressive responding by subtracting the acceptance effect from the rejection effect—essentially differencing off the baseline effect of the incentive from the reduction in rumor acceptance with payment—we find that the net expressive effect is negative 0.5%—the opposite sign of what we would expect if there was expressive responding. However, the substantive size of the estimate of the expressive effect is trivial. Moreover, the standard error on this estimate is 10.6, meaning the estimate of expressive responding is essentially zero. https://journals.uchicago.edu/doi/abs/10.1086/694258 (Note: This is not a full review of all the claims in the paper. There is more data in the paper than in the quote above. I am merely using the quote to clarify a couple of statistical points.) There are two main points: 1. The fact that estimate is close to zero and the s.e. is super fat are technically unrelated. The last line of the quote, however, seems to draw a relationship between the two. 2. The estimated effect sizes of expressive responding in the literature are much smaller than the s.e. Bullock et al. (Table 2) estimate the effect of expressive responding at about 4% and Prior et al. (Figure 1) at about ~ 5.5% (“Figure 1(a) shows, the model recovers the raw means from Table 1, indicating a drop in bias from 11.8 to 6.3.”). Thus, one reasonable inference is that the study is underpowered to reasonably detect expected effect sizes. ## Trump Trumps All: Coverage of Presidents on Network Television News 4 May With Daniel Weitzel. The US government is a federal system, with substantial domains reserved for local and state governments. For instance, education, most parts of the criminal justice system, and a large chunk of regulation are under the purview of the states. Further, the national government has three co-equal branches: legislative, executive, and judicial. Given these facts, you would expect news coverage to be broad in its coverage of branches and the levels of government. But there is a sharp skew in news coverage of politicians, with members of the executive branch, especially national politicians (and especially the president), covered far more often than other politicians (see here). Exploiting data from Vanderbilt Television News Archive (VTNA), the largest publicly available database of TV news—over 1M broadcast abstracts spanning 1968 and 2019—we add body to the observation. We searched for references to the president during their presidency and coded each hit as 1. As the figure below shows, references to the president are common. Excluding Trump, on average, a sixth of all articles contain a reference to the sitting president. But Trump is different: 60%(!) of abstracts refer to Trump. Data and scripts can be found here. ## Making an Impression: Learning from Google Ads 31 Oct Broadly, Google Ads works as follows: 1. Advertisers create an ad, choose keywords, and make a bid (on cost-per-click or CPC) (You can bid on cost-per-view and cost-per-impression also, but we limit our discussion to CPC.), 2. the Google Ads account team vets whether the keywords are related to the product being advertised, and 3. people see the ad from the winning bid when they search for a term that includes the keyword or when they browse content that is related to the keyword (some Google Ads are shown on sites that use Google AdSense). There is a further nuance to the last step. Generally, on popular keywords, Google has thousands of candidate ads to choose from. And Google doesn’t simply choose the ad from the winning bid. Instead, it uses data to choose an ad (or a few ads) that yield the most profit (Click Through Rate (CTR)*bid). (Google probably has a more complex user utility function and doesn’t show ads below a low predicted CTR*bid.) In all, who Google shows ads to depends on the predicted CTR and the money it will make per click. Given this setup, we can reason about the audience for an ad. First, the higher the bid, the broader the audience. Second, it is not clear how well Google can predict CTR per ad conditional on keyword bid especially when the ad run is small. And if that is so, we expect Google to show the ad with the highest bid to a random subset of people searching for the keyword or browsing content related to the keyword. Under such conditions, you can use the total number of impressions per demographic group as an indicator of interest in the keyword. For instance, if you make the highest bid on the keyword ‘election’ and you find that total number of impressions that your ad makes among people 65+ are 10x more than people between ages 18-24, under some assumptions, e.g., similar use of ad blockers, similar rates of clicking ads conditional on relevance (which would become same as predicted relevance), similar utility functions (that is younger people are not more sensitive to irritation from irrelevant ads than older people), etc., you can infer relative interest of 18-24 versus 65+ in elections. The other case where you can infer relative interest in a keyword (topic) from impressions is when ad markets are thin. For common keywords like ‘elections,’ Google generally has thousands of candidate ads for national campaigns. But if you only want to show your ad in a small geographic area or an infrequently searched term, the candidate set can be pretty small. If your ad is the only one, then your ad will be shown wherever it exceeds some minimum threshold of predicted CTR*bid. Assuming a high enough bid, you can take the total number of impressions of an ad as a proxy for total searches for the term and how often people browsed related content. With all of this in mind, I discuss results from a Google Ads campaign. More here. ## The Other Side 23 Oct Samantha Laine Perfas of the Christian Science Monitor interviewed me about the gap between perceptions and reality for her podcast ‘perception gaps’ over a month ago. You can listen to the episode here (Episode 2). The Monitor has also made the transcript of the podcast available here. Some excerpts: “Differences need not be, and we don’t expect them to be, reasons why people dislike each other. We are all different from each other, right. …. Each person is unique, but we somehow seem to make a big fuss about certain differences and make less of a fuss about certain other differences.” One way to fix it: If you know so little and assume so much, … the answer is [to] simply stop doing that. Learn a little bit, assume a little less, and see where the conversation goes. The interview is based on the following research: Related blog posts and think pieces: ## Don’t Expose Yourself! Discretionary Exposure to Political Information 10 Oct As the options have grown, so have the fears. Are the politically disinterested taking advantage of the nearly limitless options to opt out of news entirely? Are the politically interested siloing themselves into “echo chambers”? In an eponymous Oxford Research Encylopedia article, I discuss what we think we know, and some concerns about how we can know. Some key points: • Is the gap between how much the politically interested and politically disinterested know about politics increasing, as Post-broadcast Democracy posits? Figure 1 suggests not. • Quantity rather than ratio: “If the dependent variable is partisan affect, how ‘selective’ one is may not matter as much as the net imbalance in consumption—the difference between the number of congenial and uncongenial bits consumed…” • To measure how much political information a person is consuming, you must be able to distinguish political information from its complement. But what isn’t political information? “In this chapter, our focus is on consumption of varieties of political information. The genus is political information. And the species of this genus differ in congeniality, among other things. But what is political information? All information that influences people’s political attitudes or behaviors? If so, then limiting ourselves to news is likely too constraining. Popular television shows like The Handmaid’s Tale, Narcos, and Law and Order have clear political themes. … Shows like Will and Grace and The Cosby Show may be less clearly political, but they also have a political subtext.” (see Figure 4) … “Even if we limit ourselves to news, the domain is still not clear. Is news about a bank robbery relevant political information? What about Hillary Clinton’s haircut? To the extent that each of these affect people’s attitudes, they are arguably pertinent. “ • One of the challenges with inferring consumption based on domain level data is that domain level data are crude. Going to http://nytimes.com is not the same as reading political news. And measurement error may vary by the kind of person. For instance, say we label http://nytimes.com as political news. For the political junkie, the measurement error may be close to zero. For teetotalers, it may be close to 100% (see more). • Show people a few news headlines along with the news source (you can randomize the source). What can you learn from a few such ‘trials’? You cannot learn what proportion of news they get from a particular source. you can learn the preferences, but not reliably. More from the paper: “Given the problems with self-reports, survey instruments that rely on behavioral measures are plausibly better. … We coded congeniality trichotomously: congenial, neutral, or uncongenial. The correlations between trials are alarmingly low. The polychoric correlation between any two trials range between .06 to .20. And the correlation between choosing political news in any two trials is between -.01 and .05.” • Following up on the previous point: preference for a source which has a mean slant != preference for slanted news. “Current measures of [selective exposure] are beset with five broad problems. First is conceptual errors. For instance, people frequently equate preference for information from partisan sources with a preference for congenial information.” ## Code 44: How to Read Ahler and Sood 27 Jun This is a follow-up to the hilarious Twitter thread about the sequence of 44s. Numbers in Perry’s 538 piece come from this paper. First, yes 44s are indeed correct. (Better yet, look for yourself.) But what do the 44s refer to? 44 is the average of all the responses. When Perry writes “Republicans estimated the share at 46 percent,” (we have similar language in the paper, which is regrettable as it can be easily misunderstood), it doesn’t mean that every Republican thinks so. It may not even mean that the median Republican thinks so. See OA 1.7 for medians, OA 1.8 for distributions, but see also OA 2.8.1, Table OA 2.18, OA 2.8.2, OA 2.11 and Table OA 2.23. Key points = 1. Large majorities overestimate the share of party-stereotypical groups in the party, except for Evangelicals and Southerners. 2. Compared to what people think is the share of a group in the population, people still think the share of the group in the stereotyped party is greater. (But how much more varies a fair bit.) 3. People also generally underestimate the share of counter-stereotypical groups in the party. ## Bad Hombres: Bad People on the Other Side 8 Dec Why do many people think that people on the other side are not well motivated? It could be because they think that the other side is less moral than them. And since opprobrium toward the morally defective is the bedrock of society, thinking that the people in the other group are less moral naturally leads people to censure the other group. But it can’t be that two groups simultaneously have better morals than the other. It can only be that people in the groups think they are better. This much logic dictates. So, there has to be a self-serving aspect to moral standards. And this is what often leads people to think that the other side is less moral. Accepting this is not the same as accepting moral relativism. For even if we accept that some things are objectively more moral—not being sexist or racist say—some groups—those that espouse that a certain sex is superior or certain races are better—will still think that they are better. But how do people come to know of other people’s morals? Some people infer morals from political aims. And that is a perfectly reasonable thing to do as political aims reflect what we value. For instance, a Republican who values ‘life’ may think that Democrats are morally inferior because they support the right to abortion. But the inference is fraught with error. As matters stand, Democrats would also like women to not go through the painful decision of aborting a fetus. They just want there to be an easy and safe way for women should they need to. Sometimes people infer morals from policies. But support for different policies can stem from having different information or beliefs about causal claims. For instance, Democrats may support a carbon tax because they believe (correctly) the world is warming and because they think that the carbon tax is what will help reduce global warming the best and protect American interests. Republicans may dispute any part of that chain of logic. The point isn’t what is being disputed per se, but what people will infer about others if they just had information about the policies they support. Hanlon’s razor is often a good rule. ## Why Do People (Re)-Elect Bad Leaders? 7 Dec ‘Why do people (re)-elect bad leaders?’ used to be a question that people only asked of third-world countries. No more. The recent election of unfit people to prominent positions in the U.S. and elsewhere has finally woken some American political scientists from their mildly racist reverie—the dream that they are somehow different. So why do people (re)-elect bad leaders? One explanation that is often given is that people prefer leaders that share their ethnicity. The conventional explanation for preferring co-ethnics is that people expect co-ethnics (everyone) to do better under a co-ethnic leader. But often enough, the expectation seems more like wishful thinking than anything else. After all, the unsuitability of some leaders is pretty clear. If it is wishful thinking, then how do we expose it? More importantly, how do we fix it? Let’s for the moment assume that people care about everyone. And if they were to learn that the co-ethnic leader is much worse than someone else, they may switch votes. But what if people care about the welfare of co-ethnics more than others? The ‘good’ thing about bad leaders is that they are generally bad for everyone. So, if they knew better, they would still switch their vote. You can verify these points using a behavioral trust game where people observe allocators of different ethnicities and different competence, and also observe welfare of both co-ethnics and others. You can also use the game to study some of the deepest concerns about ‘negative party ID’—that people will harm themselves to spite others. ## Party Time 2 Dec It has been nearly five years since the publication of Affect, Not Ideology: A Social Identity Perspective on Polarization. In that time, the paper has accumulated over 450 citations according to Google Scholar. (Citation counts on Google Scholar tend to be a bit optimistic.) So how does the paper hold up? Some reflections: • Disagreement over policy conditional on aims should not mean that you think that people you disagree with are not well motivated. But regrettably, it often does. • Lack of real differences doesn’t mean a lack of perceived differences. See here, here, here, and here. • The presence of real differences is no bar to liking another person or group. Nor does a lack of real differences come in the way of disliking another person or group. History of racial and ethnic hatred will attest to the point. In fact, why small differences often serve as durable justifications for hatred is one of the oldest and deepest questions in all of social science. (Paraphrasing from Affectively Polarized?.) Evidence on the point: 1. Sort of sorted but definitely polarized 2. Assume partisan identity is slow moving as Green, Palmquist, and Schickler (2002) among others show. And then add to it the fact people still like their ‘own’ party a fair bit—thermometer ratings are a toasty 80 and haven’t budged. See the original paper. 3. People like ideologically extreme elites of the party they identify with a fair bit (see here). • It may seem surprising to some that people can be so angry when they spend so little time on politics and know next to nothing about it. But it shouldn’t be. Information generally gets in the way of anger. Again, the history of racial bigotry is a good example. • The title of the paper is off in two ways. First, partisan affect can be caused by ideology. Not much of partisan affect may be founded in ideological differences, but at least some of it is. (I always thought so.) Secondly, the paper does not offer a social identity perspective on polarization. • The effect that campaigns have on increasing partisan animus is still to be studied carefully. Certainly, ads play but a small role in it. • Evidence on the key take-home point—that partisans dislike each other a fair bit—continues to mount. The great thing is that people have measured partisan affect in many different ways, including using IAT and trust games. Evidence that IAT is pretty unreliable is reasonably strong, but trust games seem reasonable. Also see my 2011 note on measuring partisan affect coldly. • Interpreting over-time changes is hard. That was always clear to us. But see Figure 1 here that controls for a bunch of socio-demographic variables, and note that the paper also has over-time cross-country to clarify inferences further. • If you assume that people learn about partisans from elites, reasoning what kinds of people would support this ideological extremist or another, it is easy to understand why people may like the opposing party less over time (though trends among independents should be parallel). The more curious thing is that people still like the party they identify with and approve of ideologically extreme elites of their party (see here). ## Learning About [the] Loss (Function) 7 Nov One of the things we often want to learn is the actual loss function people use for discounting ideological distance between self and a legislator. Often people try to learn the loss function using over actual distances. But if the aim is to learn the loss function, perceived distance rather than actual distance is better. It is so because perceived = what the voter believes to be true. People can then use the function to simulate out scenarios if perceptions = fact. ## Confirmation Bias: Confirming Media Bias 31 Oct It used to be that searches for ‘Fox News Bias’ were far more common than searches for ‘CNN bias’. Not anymore. The other notable thing—correlated peaks around presidential elections. Note also that searches around midterm elections barely rise above the noise. ## God, Weather, and News vs. Porn 22 Oct The Internet is for porn (Avenue Q). So it makes sense to measure things on the Internet in porn units. I jest, just a bit. In Everybody Lies, Seth Stephens Davidowitz points out that people search for porn more than weather on GOOG. Data from Google Trends for the disbelievers. But how do searches for news fare? Surprisingly well. And it seems the new president is causing interest in news to outstrip interest in porn. Worrying, if you take Posner’s point that people’s disinterest in politics is a sign that they think the system is working reasonably well. The last time searches for news > porn was when another Republican was in the White House! How is the search for porn affected by Ramadan? For answer, we turn to Google Trends from Pakistan. But you may say that the trend is expected given Ramadan is seen as a period for ritual purification. And that is a reasonable point. But you see the same thing with Eid-ul-Fitr and porn. And in Ireland, of late, it seems searches for porn increase during Christmas. ## Measuring Segregation 31 Aug Dissimilarity index is a measure of segregation. It runs as follows: $\frac{1}{2} \sum\limits_{i=1}^{n} \frac{g_{i1}}{G_1} - \frac{g_{i2}}{G_2}$ where: $g_{i1}$ is population of $g_1$ in the ith area $G_{i1}$ is population of $g_1$ in the larger area from which dissimilarity is being measured against The measure suffers from a couple of issues: 1. Concerns about lumpiness. Even in a small area, are black people at one end, white people at another? 2. Choice of baseline. If the larger area (say a state) is 95\% white (Iowa is 91.3% White), dissimilarity is naturally likely to be small. One way to address the concern about lumpiness is to provide an estimate of the spatial variance of the quantity of interest. But to measure variance, you need local measures of the quantity of interest. One way to arrive at local measures is as follows: 1. Create a distance matrix across all addresses. Get latitude and longitude. And start with Euclidean distances, though smart measures that take account of physical features are a natural next step. (For those worried about computing super huge matrices, the good news is that computation can be parallelized.) 2. For each address, find n closest addresses and estimate the quantity of interest. Where multiple houses are similar distance apart, sample randomly or include all. One advantage of n closest rather than addresses in a particular area is that it naturally accounts for variations in density. But once you have arrived at the local measure, why just report variance? Why not report means of compelling common-sense metrics, like the proportion of addresses (people) for whom the closest house has people of another race? As for baseline numbers (generally just a couple of numbers): they are there to help you interpret. They can be brought in later. ## The Innumerate American 19 Feb In answering a question, scientists sometimes collect data that answers a different, sometimes yet more important question. And when that happens, scientists sometimes overlook the easter egg. This recently happened to me, or so I think. Kabir and I recently investigated the extent to which estimates of motivated factual learning are biased (see here). As part of our investigation, we measured numeracy. We asked American adults to answer five very simple questions (the items were taken from Weller et al. 2002): 1. If we roll a fair, six-sided die 1,000 times, on average, how many times would the die come up as an even number? — 500 2. There is a 1% chance of winning a$10 prize in the Megabucks Lottery. On average, how many people would win the \$10 prize if 1,000 people each bought a single ticket? — 10
3. If the chance of getting a disease is 20 out of 100, this would be the same as having a % chance of getting the disease. — 20
4. If there is a 10% chance of winning a concert ticket, how many people out of 1,000 would be expected to win the ticket? — 100
5. In the PCH Sweepstakes, the chances of winning a car are 1 in a 1,000. What percent of PCH Sweepstakes tickets win a car? — .1%

The average score was about 57%, and the standard deviation was about 30%. Nearly 80% (!) of the people couldn’t answer that 1 in a 1000 chance is .1% (see below). Nearly 38% couldn’t answer that a fair die would turn up, on average, an even number 500 times every 1000 rolls. 36% couldn’t calculate how many people out of a 1,000 would win if each had a 1% chance. And 34% couldn’t answer that 20 out of 100 means 20%.

If people have trouble answering these questions, it is likely that they struggle to grasp some of the numbers behind how the budget is allocated, or for that matter, how to craft their own family’s budget. The low scores also amply illustrate that the education system fails Americans.

Given the importance of numeracy in a wide variety of domains, it is vital that we pay greater attention to improving it. The problem is also tractable — with the advent of good self-learning tools, it is possible to intervene at scale. Solving it is also liable to be good business. Given numeracy is liable to improve people’s capacity to count calories, make better financial decisions, among other things, health insurance companies could lower premiums in lieu of people becoming more numerate, and lending companies could lower interest rates in exchange for increases in numeracy.