The (Mis)Information Age: Provenance is Not Enough

31 Aug

The information age has brought both bounty and pestilence. Today, we are deluged with both correct and incorrect information. If we knew how to tell correct claims from incorrect ones, we would have inched that much closer to utopia. But our lack of nous in telling apart even ‘obviously’ incorrect claims from correct ones has brought us close to the precipice of disarray. Thus, improving people’s ability to identify untrustworthy claims as such takes on urgency.

http://gojiberries.io/2020/08/31/the-misinformation-age-measuring-and-improving-digital-literacy/

Inferring the Quality of Evidence Behind the Claims: Fact Check and Beyond

One way around misinformation is to rely on an army of experts to assess the truth value of claims. However, assessing the truth value of a claim is hard. It needs expert knowledge and careful research. When validating, we have to identify which parts are wrong, which parts are right but misleading, and which parts are debatable. All in all, vetting even a few claims is a noisy and time-consuming process. Fact-checking operations, hence, cull a small number of claims and try to validate those. As the rate at which information is produced increases, thwarting misinformation by checking every claim seems implausibly expensive.

Rather than assess the claims directly, we can assess the process. Or, in particular, the residue of one part of the process for making the claim—sources. Except for claims based on private experience, e.g., religious experience, claims are based on sources. We can use the features of these sources to infer credibility. The first feature is the number of sources cited to make a claim. All else equal, the more sources saying the same thing, the greater the chance that the claim is true. None of this is to undercut a common observation: lots of people can be wrong about something. A second, harder test for veracity is whether a diverse set of people say the same thing. The third test is the credibility of the sources themselves.

Relying on this residue is not a panacea. People can simply lie about what a source said. We want sources to verify what they have been quoted as saying. And in the era of cheap data, this can be easily enabled. Quotes can be linked to video interviews or to automatic transcriptions electronically signed by the interviewee. The same system can be scaled to institutions. The downside is that the system may prove onerous. On the other hand, the same source is commonly cited by many people, so a public repository of verified claims and evidence can mitigate much of the burden.
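As a sketch of how such verification could work, here is a minimal version using an HMAC as a stand-in for a proper public-key signature; the key, transcript, and quote are all invented for illustration:

```python
import hashlib
import hmac

def sign_transcript(transcript: str, key: bytes) -> str:
    """Sign a transcript with the interviewee's key (an HMAC as a
    stand-in for a real digital signature)."""
    return hmac.new(key, transcript.encode("utf-8"), hashlib.sha256).hexdigest()

def verify_quote(quote: str, transcript: str, signature: str, key: bytes) -> bool:
    """A quote checks out only if it appears verbatim in a transcript
    whose signature matches the interviewee's key."""
    expected = sign_transcript(transcript, key)
    return hmac.compare_digest(expected, signature) and quote in transcript

key = b"interviewee-secret"  # hypothetical key held by the source
transcript = "We expect revenues to grow modestly next year."
sig = sign_transcript(transcript, key)

print(verify_quote("revenues to grow modestly", transcript, sig, key))  # True
print(verify_quote("revenues to double", transcript, sig, key))         # False
```

A public repository would store the transcript alongside its signature, so anyone quoting the source could link to a verifiable record.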

But will this solve the problem? Likely not. For one, people can still commit sins of omission. For two, they can still draft things in misleading ways. For three, trust in sources may not be tied to correctness. All we have done is build a system for establishing provenance. And establishing provenance is not enough. Instead, we need a system that incentivizes both correctness and a presentation that makes correct interpretation highly likely. It is a high bar. But it is the bar—correct and liable to be correctly interpreted.

To create incentives for publishing correct claims, we need to either 1. educate the population, which brings me to the previous post, or 2. find ways to build products and recommendation systems that incentivize correct claims. We likely need both.

Trump Trumps All: Coverage of Presidents on Network Television News

4 May

With Daniel Weitzel.

The US government is a federal system, with substantial domains reserved for local and state governments. For instance, education, most parts of the criminal justice system, and a large chunk of regulation are under the purview of the states. Further, the national government has three co-equal branches: legislative, executive, and judicial. Given these facts, you would expect news coverage to be broad, spanning branches and levels of government. But there is a sharp skew in news coverage of politicians, with members of the executive branch, especially national politicians (and especially the president), covered far more often than other politicians (see here). Exploiting data from the Vanderbilt Television News Archive (VTNA), the largest publicly available database of TV news—over 1M broadcast abstracts spanning 1968 to 2019—we add body to that observation. We searched each abstract for references to the sitting president and coded each hit as 1. As the figure below shows, references to the president are common. Excluding Trump, on average, a sixth of all abstracts contain a reference to the sitting president. But Trump is different: 60%(!) of abstracts refer to Trump.
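The coding step can be sketched as below; the term dates are real, but the abstract format and the matching rule are simplifying assumptions (the actual scripts are linked from the post):

```python
import re
from datetime import date

# Terms of office (start, end, search term); only two terms shown.
TERMS = [
    (date(2009, 1, 20), date(2017, 1, 20), "Obama"),
    (date(2017, 1, 20), date(2021, 1, 20), "Trump"),
]

def mentions_sitting_president(broadcast_date, abstract):
    """Code an abstract 1 if it references the president in office
    on the broadcast date, 0 otherwise."""
    for start, end, name in TERMS:
        if start <= broadcast_date < end:
            return int(bool(re.search(rf"\b{name}\b", abstract)))
    return 0

# Made-up abstracts standing in for the VTNA data.
abstracts = [
    (date(2018, 3, 1), "Trump announces new tariffs on steel."),
    (date(2018, 3, 1), "Local school board debates budget."),
    (date(2010, 6, 5), "Obama visits the Gulf coast."),
]
hits = [mentions_sitting_president(d, a) for d, a in abstracts]
share = sum(hits) / len(hits)
print(share)
```

The real analysis runs the same tally over a million-plus abstracts, one presidency at a time.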

Data and scripts can be found here.

Good News: Principles of Good Journalism

12 Mar

If fake news—deliberate disinformation, not uncongenial news—is one end of the spectrum, what is the other end of the spectrum?

To get at the question, we need a theory of what news should provide. A theory of news, in turn, needs a theory of citizenship, which prescribes the information people need to execute their role, and an empirically supported behavioral theory of how people get that information.

What a democracy expects of people varies by the conception of democracy. Some theories of democracy only require citizens to have enough information to pick the better candidate when differences in candidates are material. Others, like deliberative democracy, expect people to be well informed and to have thought through various aspects of policies.

I opt for deliberative democracy to guide expectations about people for two reasons. Not only does the theory best express the highest ideals of democracy, but it also has the virtue of answering a vital question well: if all news were equally profitable to produce and equally widely read, what kind of news would lead to the best political outcomes, as judged by idealized versions of people—people who have all the information and all the time to think through the issues?

There are two virtues of answering such a question. First, it offers a convenient place to start answering what we mean by ‘good’ news; we can bring in profitability and reader preferences later. Second, engaging with it uncovers some obvious aspects of ‘good’ news.

For news to positively affect political outcomes (and not merely in a shallow, instrumental sense), the news has to be about politics. Rather than news about Kim Kardashian or opinions about the hottest boots this season, ‘good’ news covers policymakers, policy implementers, and policy.

News about politics is a necessary but not a sufficient condition. Switching from discussing Kim Kardashian’s dress to Hillary Clinton’s is very plausibly worse. Thus, we also want the news to be substantive, engaging with real issues rather than cosmetic concerns.

Substantively engaging with real issues is still no panacea. If the information is not correct, it will misinform rather than inform the debate. Thus, the third quality of ‘good’ news is correctness.

The criterion for “good” news is, however, not just correctness, but it is the correctness of interpretation. ‘Good’ news allows people to draw the right conclusions. For instance, reporting murder rates as say ‘a murder per hour’ without reporting the actual number of murders or comparing the probability of being murdered to other common threats to life may instill greater fear in people than ‘optimal.’ (Optimal, as judged by better-informed versions of ourselves who have been given time to think. We can also judge optimal by correctness—did people form unbiased, accurate beliefs after reading the news?)

Not all issues, however, lend themselves to objective appraisals of truth. To produce ‘good’ news there, the best you can do is have the right process. The primary tool journalists have in the production of news is the sources they use to report on stories. (While journalists increasingly use original data in reporting, the reliance on people is widespread.) Thus, the way to increase correctness is by manipulating aspects of sourcing. We can increase correctness by increasing the quality of sources (e.g., sourcing more knowledgeable people with weak incentives to cook the books), the diversity of sources (e.g., not just government officials but also major NGOs), and the number of sources.

If we emphasize correctness, we may fall short on timeliness. News has to be timely enough to be useful, aside from being correct enough to guide policy and opinion correctly.

News can be narrowly correct but may commit sins of omission. ‘Good’ news provides information on all sides of the issue. ‘Good’ news highlights and engages with all serious claims. It doesn’t give time to discredited claims for “balance.”

Second-to-last, news should be delivered in the right tone. Rather than speculative ad-hominem attacks, “good” news engages with arguments and data.

Lastly, news contributes to the public kitty only if it is original. Thus, ‘good’ news is original. (Plagiarism reduces the incentives for producing quality news because it eats into the profits.)

Good NYT: Provision of ‘Not News’ in the NYT Over Time

29 Nov

The mainstream American news media is under siege from the political right, but for the wrong reasons. To hyper-partisan political elites, small inequities in slant matter a lot. But hyperventilating about small issues doesn’t magically turn them into serious problems; it just makes them loom larger and takes the focus away from the much more serious problems. The big afflictions of the ‘MSM’ are: vacuity; sensationalism; a preference for opinion over fact; a poor understanding of important issues and a disinterest in covering them diligently; disinterest in what goes on outside American borders; a poor understanding of numbers; policy myopia—covering events rather than issues; and a fixation with breaking news.

Here we shed light on a small piece of the real estate: the provision of ‘not news.’ By ‘not news’ we mean news about cooking, travel, fashion, home decor, and such. We are very conservative in what we code as ‘not news,’ leaving news about health, anything local, and such in place.

We analyze the provision of ‘not news’ in The New York Times (NYT), the nation’s newspaper of record. The NYT is both well-regarded and popular. It has won more Pulitzer Prizes than any other newspaper. And it is the 30th most visited website in the U.S. (as of October 2017).

So, has the proportion of news stories about topics unrelated to politics or the economy, such as cooking, travel, fashion, music, etc., in NYT gone up over time? (For issues related to measurement, see here.) As the figure shows, the net provision of not news increased by 10 percentage points between 1987 and 2007.

Interested in learning about more ways in which NYT has changed over time? See here.

Town Level Data on Cable Operators and Cable Channels

12 Sep

I am pleased to announce the release of TV and Cable Factbook Data (1997–2002; 1998 coverage is modest). Use of the data is restricted to research purposes.

Background

In 2007, Stefano DellaVigna and Ethan Kaplan published a paper that used data from Warren’s Factbook to identify the effect of the introduction of Fox News Channel on Republican vote share (link to paper). Since then, a variety of papers exploiting the same data and identification scheme have been published (see, for instance, Hopkins and Ladd, Clinton and Enamorado, etc.)

In 2012, I embarked on a similar project—trying to use the data to study the impact of the introduction of Fox News Channel on attitudes and behaviors related to climate change. However, I found the original data to be limited—DellaVigna and Kaplan had used a team of research assistants to manually code a small number of variables for a few years. So I worked on extending the data in two ways: adding more years and adding ‘all’ the data for each year. To that end, I developed custom software. The data collection and the parsing of a few thousand densely packed, inconsistently formatted pages (see below) into a usable CSV (see below) finished sometime early in 2014. (To make it easier to create a crosswalk with other geographical units, I merged the data with town lat/long (centroid) and elevation data from http://www.fallingrain.com/world/US/.)
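The crosswalk step amounts to a join on town identifiers. A toy sketch, where the column names, the (state, town) key, and all values are assumptions rather than the actual schema:

```python
# Parsed Factbook rows (invented examples).
factbook_rows = [
    {"state": "OH", "town": "Springfield", "operator": "Acme Cable"},
    {"state": "TX", "town": "Austin", "operator": "Lone Star Cable"},
]

# Town centroid lat/long and elevation, keyed on (state, town),
# as one might scrape from fallingrain.com (values invented).
geo = {
    ("OH", "Springfield"): {"lat": 39.92, "lon": -83.81, "elev_m": 298},
    ("TX", "Austin"): {"lat": 30.27, "lon": -97.74, "elev_m": 149},
}

missing = {"lat": None, "lon": None, "elev_m": None}
merged = []
for row in factbook_rows:
    key = (row["state"], row["town"])
    # Left join: keep every Factbook row, fill geo fields where matched.
    merged.append({**row, **geo.get(key, missing)})

print(merged[0]["lat"])
```

A left join keeps unmatched towns in the data, which matters because town-name spellings rarely line up perfectly across sources.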

[Sample page from the Factbook]

[Snapshot of the final CSV]

Soon after I finished the data collection, however, I became aware of a paper by Martin and Yurukoglu. They found some inconsistencies between the Nielsen data and the Factbook data (see Appendix C1 of paper), tracing the inconsistencies to delays in updating the Factbook data—“Updating is especially poor around [DellaVigna and Kaplan] sample year. Between 1999 and 2000, only 22% of observations were updated. Between 1998 and 1999, only 37% of observations were updated.” Based on their paper, I abandoned the plan to use the data, though I still believe the data can be used for a variety of important research projects, including estimating the impact of the introduction of Fox News. Based on that belief, I am releasing the data.

Ipso Facto: Analysis of Complaints to IPSO

11 Jun

The Independent Press Standards Organisation (IPSO) handles complaints about accuracy, etc., in the U.K. media. Against which media organizations are the most complaints filed? And against which organizations are complaints most often upheld? We answer these questions using data from the IPSO website. (The data and scripts behind the analysis are posted on GitHub.)

Between its formation in September 2014 and May 20, 2016, IPSO received 371 complaints. Expectedly, tabloid newspapers are well represented. Of the 371 complaints, The Telegraph alone received 35, or about 9.4% of the total. It was followed closely by The Mail with 31 complaints. The Times had 25 complaints filed against it, The Mirror and The Express 22 each, and The Sun 19.

Generally, fewer than half the complaints were completely or partly upheld. Topping the list were The Express and The Telegraph with 10 upheld complaints each. Following close behind were The Times with 8, The Mail with 6, and The Sun and the Daily Star with 4 each.
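The tallies come down to simple counting over the scraped complaint records. A sketch with invented records and an assumed outcome coding (anything containing "upheld" but not "not" counts as at least partly upheld):

```python
from collections import Counter

# Hypothetical scraped records: (publication, outcome).
complaints = [
    ("The Telegraph", "upheld"),
    ("The Telegraph", "not upheld"),
    ("The Mail", "partly upheld"),
    ("The Express", "upheld"),
    ("The Express", "upheld"),
]

total = Counter(pub for pub, _ in complaints)
upheld = Counter(pub for pub, outcome in complaints if "not" not in outcome)

# "Batting average": share of a publication's complaints fully or
# partly upheld.
batting = {pub: upheld[pub] / n for pub, n in total.items()}
print(batting["The Telegraph"])  # 0.5
```

The same counters, run over the full scrape, produce both the complaint totals and the upheld-complaint league table.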

See also the plot of batting average of media organizations with most complaints against them.

Some Facts About PolitiFact

27 May

I assessed PolitiFact on:

1. Imbalance in scrutiny: Do they vet statements by Democrats or Democratic-leaning organizations more than statements by Republicans or Republican-leaning organizations?

2. Batting average by party: roughly n_correct/n_checked, instantiated here as the mean PolitiFact rating.

To answer the questions, I scraped the data from PolitiFact and independently coded and appended data on the party of the person or organization covered. (Feel free to download the script for scraping and analyzing the data, scraped data and data linking people and organizations to party from the GitHub Repository.)

To date, PolitiFact has checked the veracity of 3,859 statements by 703 politicians and organizations. Of these, I was able to establish the partisanship of 554 people and organizations. I restrict the analysis to the 3,396 statements by organizations and people whose partisanship I could establish and who lean toward either the Republican or the Democratic party. I code PolitiFact’s 6-point True to Pants on Fire scale (true, mostly-true, half-true, barely-true, false, pants-fire) linearly so that it lies between 0 (pants-fire) and 1 (true).

Of the 3,396 statements, about 44% (n = 1,506) of the statements checked by PolitiFact are by Democrats or Democratic-leaning organizations. The rest, roughly 56% (n = 1,890), are by Republicans or Republican-leaning organizations. The average PolitiFact rating of statements by Democrats or Democratic-leaning organizations (the batting average) is .63; it is .49 for statements by Republicans or Republican-leaning organizations.
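The linear coding and the party-level batting average can be sketched as follows; the statement-level records here are made up for illustration, while the scale mapping is the one described above:

```python
from statistics import mean

# PolitiFact's 6-point scale coded linearly onto [0, 1].
SCALE = {"pants-fire": 0, "false": 1, "barely-true": 2,
         "half-true": 3, "mostly-true": 4, "true": 5}

def score(rating: str) -> float:
    return SCALE[rating] / 5

# Hypothetical statement-level data: (party, rating).
statements = [
    ("D", "mostly-true"), ("D", "half-true"), ("D", "true"),
    ("R", "barely-true"), ("R", "false"), ("R", "mostly-true"),
]

# Batting average: mean coded rating, by party.
batting = {
    party: mean(score(r) for p, r in statements if p == party)
    for party in ("D", "R")
}
print(batting)
```

Dividing by 5 rather than 6 puts ‘true’ exactly at 1 and ‘pants-fire’ exactly at 0, matching the coding in the text.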

To check whether the results are driven by some people receiving a lot of scrutiny, I tallied the total number of statements investigated for each person. Unsurprisingly, there is a large skew, with a few prominent politicians receiving the bulk of the attention. For instance, PolitiFact investigated more than 500 claims by Barack Obama alone. The figure below plots the total number of statements investigated for the thirty politicians receiving the most scrutiny.
[Figure: total statements investigated for the thirty most-scrutinized politicians]

If you take out Barack Obama, the share of statements by Democrats falls to 33.98%. More generally, limiting ourselves to the bottom 90% of politicians in terms of scrutiny received, the share of statements by Democrats is about 42.75%.

To analyze whether there is selection bias in covering politicians who say incorrect things more often, I estimated the correlation between the batting average and the total number of statements investigated. The correlation is very weak and does not appear to vary systematically by party. Accounting for the skew by taking the log of the total statements or by estimating a rank-ordered correlation has little effect. The figure below plots batting average as a function of total statements investigated.
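A rank-ordered correlation of the sort described can be computed as below; the data are invented, not the actual PolitiFact numbers:

```python
def rank(xs):
    """Ranks with ties broken by position (adequate for a sketch;
    real data would want average ranks for ties)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for pos, i in enumerate(order):
        r[i] = pos
    return r

def spearman(xs, ys):
    """Rank-ordered correlation: Pearson correlation of the ranks."""
    rx, ry = rank(xs), rank(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical per-politician data: statements investigated vs.
# mean rating (batting average).
n_checked = [500, 120, 60, 30, 10]
avg_rating = [0.55, 0.40, 0.62, 0.48, 0.58]
print(round(spearman(n_checked, avg_rating), 2))
```

Ranking first makes the measure insensitive to the heavy right skew in scrutiny, which is why logging the totals or rank-ordering gives similar answers.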

[Figure: batting average as a function of total statements investigated]

Caveats About Interpretation

To interpret the numbers, you need to make two assumptions:

1. The number of statements made by Republicans and Republican-leaning persons and organizations is the same as that made by people and organizations on the left.

2. Truthiness of statements by Republican and Republican-leaning persons and organizations is the same as that of left-leaning people and organizations.

Getting a Measure of Measures of Selective Exposure

24 Jul

Ideally, we would like to be able to place the ideology of each bit of information consumed in relation to the ideological location of the person. And we would like a time-stamped distribution of the bits consumed. We could then summarize various moments of that distribution (or of the distribution of ideological distances). And that would be that. (If we were worried about dimensionality, we could do it by topic.)

But lack of data means we must change the estimand. We must code each bit of information as merely congenial or uncongenial. This means taking directionality out of the equation. For a Republican at a 6 on a 1-to-7 liberal-to-conservative scale, consuming a bit of information at 5 is the same as consuming a bit at 7.

The conventional estimand then is a set of two ratios: (bits of congenial political information consumed)/(all political information consumed) and (bits of uncongenial information consumed)/(all political information consumed). Other reasonable formalizations exist, including the difference between congenial and uncongenial. (Note that there the denominator is absent, and reasonably so.)

To estimate these quantities, we must often make further assumptions. First, we must decide on the domain of political information. That domain is likely vast and increasing by the minute. We are all producers of political information now. (We always were but today we can easily access political opinions of thousands of lay people.) But see here for some thoughts on how to come up with the relevant domain of political information from passive browsing data.

Next, people often code ideology at the level of ‘source.’ The New York Times is ‘independent’ or ‘liberal’ and ‘Fox’ simply ‘conservative’ or perhaps more accurately ‘Republican-leaning.’ (Continuous measures of ideology — as estimated by Groseclose and Milyo or Gentzkow and Shapiro — are also assigned at the source level.) This is fine except that it means coding all bits of information consumed from a source as the same. And there are some attendant risks. We know that not all NYT articles are ‘liberal.’ In fact, we know much of it is not even political news. A toy example of how such measures can mislead:

Page views: 10 Fox, 10 CNN. Source-level estimate of congenial (Republican-leaning) consumption for a Republican: 10/20.
But say the Fox pages break down as 7R, 3D and the CNN pages as 5R, 5D.
Page-level estimate: (7 + 5)/20 = 12/20.
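The toy example above, in code (for a Republican viewer, so Fox is the congenial source):

```python
# 10 Fox and 10 CNN page views.
views = {"Fox": 10, "CNN": 10}
source_lean = {"Fox": "R", "CNN": "D"}

# Source-level coding: every page inherits its source's label,
# so only the Fox pages count as congenial.
source_level = (
    sum(n for s, n in views.items() if source_lean[s] == "R")
    / sum(views.values())
)

# Page-level coding: suppose 7 of the Fox pages and 5 of the CNN
# pages are actually R-leaning.
r_pages = {"Fox": 7, "CNN": 5}
page_level = sum(r_pages.values()) / sum(views.values())

print(source_level, page_level)  # 0.5 0.6
```

The gap between the two estimates is exactly the within-source heterogeneity that source-level coding throws away.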

If the measure of ideology is continuous, there are still risks. If we code all page views as carrying the mean ideology of the source, we assume that the person views a random sample of pages on the source (or some version of that). But that is too implausible an assumption. It is much more likely that a liberal reading the NYT stays away from David Brooks’s columns. If you account for such within-source self-selection, selective exposure measures based on source-level coding are going to be downwardly biased — that is, they will find people to be less selective than they are.

Discussion until now has focused on passive browsing data, eliding survey measures. There are two additional problems with survey measures. The first is the denominator: measures based on limited-choice experiments like the ones used by Iyengar and Hahn (2009) are poor measures of real-life behavior. In real life, we have far more choices, and inferences from such experiments can at best recover ordinal rankings. The second big problem is ‘expressive responding’: Republicans indicating they watch Fox News not because they do but because they want to convey that they do.

Where’s the Porn? Classifying Porn Domains Using a Calibrated Keyword Classifier

23 Jul

Aim: Given a very large list of unique domains, find domains carrying adult content.

In the 2004 comScore browsing data, for instance, there are about a million unique domains. Comparing a million unique domain names against a large database is doable. But access to such databases doesn’t often come cheap. So a hack.

Start with an exhaustive list of porn-related keywords. Here’s mine:

breast, boy, hardcore, 18, queen, blowjob, movie, video, love, play, fun, hot, gal, pee, 69, naked, teen, girl, cam, sex, pussy, dildo, adult, porn, mature, xxx, bbw, slut, whore, tit, sperm, gay, men, cheat, ass, booty, ebony, asian, brazilian, fuck, cock, cunt, lesbian, male, boob, cum, naughty

For the 2004 comScore data, this gives about 140k potential porn domains. Compare this list to the approximately 850k porn domains in the Shallalist. This leaves about 68k domains with uncertain status. For those, use one of the many URL classification APIs. Using the Trusted Source API, I get about 20k porn and 48k non-porn domains.
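A minimal sketch of the keyword pass, using a handful of keywords from the list above and made-up domain names. Substring matching deliberately over-captures, which is why the API check comes later:

```python
import re

# A few keywords from the full list; matched as substrings of the
# domain name.
KEYWORDS = ["porn", "xxx", "sex", "adult"]
pattern = re.compile("|".join(map(re.escape, KEYWORDS)))

def flag(domain: str) -> bool:
    """True if the domain contains any porn-related keyword."""
    return bool(pattern.search(domain.lower()))

domains = ["freepornsite.com", "sussexnews.co.uk", "example.org"]
flagged = [d for d in domains if flag(d)]
print(flagged)  # 'sussexnews' shows why the keyword pass over-captures
```

Flagged domains then go through the Shallalist comparison and the classification API; only domains that survive those checks are coded as adult.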

This gives us a lower bound on the number of adult domains. But it is perhaps much too low.

To estimate the error rates, take a large random sample (say 10,000 unique domains). Compare the results of the keyword-based pipeline to an API-based classification of all 10k domains. This gives an estimate of the false negative rate; the false positive rate is always ~0 because flagged domains are verified against the API. You can then learn from the list of false negatives to improve the keyword list, and redo everything. A couple of iterations can produce a sufficiently low false negative rate. (For the 2004 comScore data, a false negative rate of 5% is easily achieved.)
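The sampling step might look like this; the records and labels are invented, with the API’s label standing in for ground truth:

```python
# Labeled random sample: the pipeline's verdict vs. the API's label
# (treated here as ground truth).
sample = [
    {"domain": "hotvideos.net", "pipeline": "porn", "api_label": "porn"},
    {"domain": "nightlife-x.com", "pipeline": "non-porn", "api_label": "porn"},
    {"domain": "gardening.org", "pipeline": "non-porn", "api_label": "non-porn"},
    {"domain": "recipes.com", "pipeline": "non-porn", "api_label": "non-porn"},
]

# False negatives: adult domains the keyword pipeline missed.
actual_porn = [r for r in sample if r["api_label"] == "porn"]
false_negatives = [r for r in actual_porn if r["pipeline"] == "non-porn"]
fn_rate = len(false_negatives) / len(actual_porn)
print(fn_rate)
```

Each missed domain suggests candidate keywords for the next iteration (here, perhaps ‘nightlife’), after which the whole pipeline is rerun and the rate re-estimated.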

See PyDomains.