Subscribing To Unpopular Opinion

11 Dec

How does the move from advertising-supported content to a subscription model, e.g., NY Times, Substack luminaries, etc., change the content being produced? Ben Thompson mulls over the question in a new column. One of the changes he foresees is that the content will be increasingly geared toward subscribers—elites who are generally interested in “unique and provocative” content. The focus on unique and provocative can be problematic in at least three ways: 

  1. “Unique and provocative” doesn’t mean correct. And since people often mistake original, counterintuitive points for deep, correct, and widely applicable insights about the world, this is worrying. The other danger is that journalism will devolve into English literature.
  2. As soon as you are in the idea generation business, you pay less attention to “obvious” things, which are generally the things that deserve our most careful attention.
  3. There is a greater danger of people falling into silos. Ben quotes Jonah Peretti: “A subscription business model leads towards being a paper for a particular group and a particular audience and not for the broadest public.” Ben summarizes Peretti’s point as: “He’s alluding, in part, to the theory that the Times’s subscriber base wants to read a certain kind of news and opinion — middle/left of center, critical of Donald Trump, etc. — and that straying from that can cost it subscribers.”

There are other changes that a subscriber-driven model will wreak. The production of news will favor the concerns of the elites even more. And the demise of the “newspaper of record” will mean that a common understanding of what is important, and of how we see things, will continue to decline.

p.s. It is not lost on me that Ben’s newsletter is one such subscriber-driven outlet.

The (Mis)Information Age: Provenance is Not Enough

31 Aug

The information age has brought both bounty and pestilence. Today, we are deluged with both correct and incorrect information. If we knew how to tell correct claims apart from incorrect ones, we would have inched that much closer to utopia. But the lack of nous in telling generally ‘obvious’ incorrect claims apart from correct ones has brought us close to the precipice of disarray. Thus, improving people’s ability to identify untrustworthy claims as such takes on urgency.

Inferring the Quality of Evidence Behind the Claims: Fact Check and Beyond

One way around misinformation is to rely on an expert army that assesses the truth value of claims. However, assessing the truth value of a claim is hard. It needs expert knowledge and careful research. When validating, we have to identify which parts are wrong, which parts are right but misleading, and which parts are debatable. All in all, vetting even a few claims is a noisy and time-consuming process. Fact-checking operations, hence, cull a small number of claims and try to validate those. As the rate of production of information increases, thwarting misinformation by checking all the claims seems implausibly expensive.

Rather than assess the claims directly, we can assess the process or, in particular, the residue of one part of the process for making the claim: sources. Except for claims based on private experience, e.g., religious experience, claims are based on sources. We can use the features of these sources to infer credibility. The first feature is the number of sources cited to make a claim. All else equal, the more sources that say the same thing, the greater the chances that the claim is true. None of this is to undercut a common observation: lots of people can be wrong about something. A harder test for veracity is whether a diverse set of people say the same thing. The third test is checking the credibility of the sources.
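The three features can be summarized per claim with a few lines of code. This is a minimal sketch, not a method from the post: the `Source` fields, the use of distinct source categories as a crude stand-in for diversity, and the credibility scores are all hypothetical inputs, and assigning them well is the hard part.

```python
from dataclasses import dataclass


@dataclass
class Source:
    name: str
    kind: str           # hypothetical category, e.g. "government", "ngo", "academic"
    credibility: float  # hypothetical credibility score in [0, 1]


def source_features(sources: list[Source]) -> dict[str, float]:
    """Summarize the sources behind one claim: how many, how varied, how credible."""
    n = len(sources)
    kinds = {s.kind for s in sources}  # distinct categories as a crude diversity measure
    mean_cred = sum(s.credibility for s in sources) / n if n else 0.0
    return {"n_sources": n, "n_kinds": len(kinds), "mean_credibility": mean_cred}


claim_sources = [
    Source("Agency A", "government", 0.8),
    Source("NGO B", "ngo", 0.7),
    Source("Agency C", "government", 0.6),
]
print(source_features(claim_sources))
```

Keeping the three features separate, rather than folding them into one score, leaves open how much weight diversity should get relative to sheer count.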

Relying on the residue is not a panacea. People can simply lie about the source. We want the source to verify what they have been quoted as saying. And in the era of cheap data, this can be easily enabled. Quotes can be linked to video interviews or to automatic transcriptions electronically signed by the interviewee. The same system can be scaled to institutions. The downside is that the system may prove onerous. On the other hand, the same source is commonly cited by many people, so a public repository of verified claims and evidence can mitigate much of the burden.
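A minimal sketch of such a public repository, assuming each quote is stored as a SHA-256 fingerprint of the (source, quote) pair once the source has confirmed it. The `QuoteRegistry` class and its methods are hypothetical, and a plain hash is not an electronic signature: it only lets anyone check that a cited quote matches a confirmed one word for word.

```python
import hashlib


def quote_digest(source: str, text: str) -> str:
    """Fingerprint a (source, quote) pair; runs of whitespace are normalized first."""
    normalized = " ".join(text.split())
    return hashlib.sha256(f"{source}\n{normalized}".encode("utf-8")).hexdigest()


class QuoteRegistry:
    """Hypothetical public repository of quotes their sources have confirmed."""

    def __init__(self) -> None:
        self._verified: set[str] = set()

    def confirm(self, source: str, text: str) -> str:
        # Called once the source has verified the quote (e.g., via a signed transcript).
        digest = quote_digest(source, text)
        self._verified.add(digest)
        return digest

    def is_verified(self, source: str, text: str) -> bool:
        return quote_digest(source, text) in self._verified


registry = QuoteRegistry()
registry.confirm("Jane Doe", "We expect   costs to rise.")
print(registry.is_verified("Jane Doe", "We expect costs to rise."))  # True
print(registry.is_verified("Jane Doe", "We expect costs to fall."))  # False
```

The check passes for any exact quote the source confirmed, however it is framed around it, so it establishes provenance and nothing more.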

But will this solve the problem? Likely not. For one, people can still commit sins of omission. For two, they can still draft things in misleading ways. For three, trust in sources may not be tied to correctness. All we have done is build a system for establishing provenance, and establishing provenance is not enough. Instead, we need a system that incentivizes both correctness and presentation that makes correct interpretation highly likely. It is a high bar. But it is the bar: correct and liable to be correctly interpreted.

To create incentives for publishing correct claims, we need to either 1. educate the population, which brings me to the previous post, or 2. find ways to build products and recommendations that incentivize correct claims. We likely need both.

Trump Trumps All: Coverage of Presidents on Network Television News

4 May

With Daniel Weitzel.

The US government is a federal system, with substantial domains reserved for local and state governments. For instance, education, most parts of the criminal justice system, and a large chunk of regulation are under the purview of the states. Further, the national government has three co-equal branches: legislative, executive, and judicial. Given these facts, you would expect news coverage to be spread broadly across the branches and levels of government. But there is a sharp skew in news coverage of politicians, with members of the executive branch, especially national politicians (and especially the president), covered far more often than other politicians (see here). Exploiting data from the Vanderbilt Television News Archive (VTNA), the largest publicly available database of TV news (over 1M broadcast abstracts spanning 1968 to 2019), we add body to the observation. We searched for references to the president during their presidency and coded each hit as 1. As the figure below shows, references to the president are common. Excluding Trump, on average, a sixth of all abstracts contain a reference to the sitting president. But Trump is different: 60%(!) of abstracts refer to Trump.
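The coding step can be sketched as follows, assuming each abstract is paired with the surname of the president in office when it aired. The `president_share` function, the surname-only search, and the sample abstracts are illustrative assumptions, not the actual search terms behind the figure.

```python
import re
from collections import defaultdict


def president_share(abstracts):
    """abstracts: iterable of (sitting_president_surname, abstract_text).

    Returns, for each president, the share of abstracts aired during their
    term that mention them (an abstract counts once however many hits it has).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for president, text in abstracts:
        totals[president] += 1
        if re.search(rf"\b{re.escape(president)}\b", text, flags=re.IGNORECASE):
            hits[president] += 1
    return {p: hits[p] / totals[p] for p in totals}


sample = [
    ("Clinton", "President Clinton signs budget bill."),
    ("Clinton", "Storm hits Florida coast."),
    ("Clinton", "Senate debates tax cut."),
    ("Trump", "Trump comments on trade talks."),
    ("Trump", "Markets fall; Trump blames the Fed."),
]
print(president_share(sample))
```

A surname search is crude: during George W. Bush's term, "Bush" also matches references to his father, and during Bill Clinton's term, "Clinton" also matches Hillary Clinton, so a real run needs more careful search terms.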

Data and scripts can be found here.

Good News: Principles of Good Journalism

12 Mar

If fake news—deliberate disinformation, not uncongenial news—is one end of the spectrum, what is the other end of the spectrum?

To get at the question, we need a theory of what news should provide. A theory of news, in turn, needs a theory of citizenship, which prescribes the information people need to execute their role, and an empirically supported behavioral theory of how people get that information.

What a democracy expects of people varies by the conception of democracy. Some theories of democracy only require citizens to have enough information to pick the better candidate when differences in candidates are material. Others, like deliberative democracy, expect people to be well informed and to have thought through various aspects of policies.

I opt for deliberative democracy to guide expectations about people for two reasons. Not only does the theory best express the highest ideals of democracy, but it also has the virtue of answering a vital question well. If all news were equally profitable to produce and equally widely read, what kind of news would lead to the best political outcomes, as judged by idealized versions of people: people who have all the information and all the time to think through the issues?

There are two virtues of answering such a question. First, it offers a convenient place to start answering what we mean by ‘good’ news; we can bring in profitability and reader preferences later. Second, engaging with it uncovers some obvious aspects of ‘good’ news.

For news to positively affect political outcomes (not in the shallow, instrumental sense), the news has to be about politics. Rather than news about Kim Kardashian or opinions about the hottest boots this season, ‘good’ news is news about policymakers, policy implementers, and policy-relevant matters.

News about politics is a necessary but not a sufficient condition. Switching from discussing Kim Kardashian’s dress to Hillary Clinton’s is very plausibly worse. Thus, we also want the news to be substantive, engaging with real issues rather than cosmetic concerns.

Substantively engaging with real issues is still no panacea. If the information is not correct, it will misinform rather than inform the debate. Thus, the third quality of ‘good’ news is correctness.

The criterion for “good” news, however, is not just correctness but correctness of interpretation. ‘Good’ news allows people to draw the right conclusions. For instance, reporting murder rates as, say, ‘a murder per hour’ without reporting the actual number of murders, or without comparing the probability of being murdered to other common threats to life, may instill greater fear in people than is ‘optimal.’ (Optimal, as judged by better-informed versions of ourselves who have been given time to think. We can also judge optimal by correctness: did people form unbiased, accurate beliefs after reading the news?)
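A back-of-the-envelope conversion shows how much the ‘per hour’ framing hides. The population figure of 330 million is an assumption for illustration, not a number from the post.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760


def per_100k_from_hourly(events_per_hour: float, population: float) -> float:
    """Convert an 'N events per hour' framing into an annual rate per 100,000 people."""
    annual = events_per_hour * HOURS_PER_YEAR
    return annual / population * 100_000


# Assumed population of roughly 330 million, for illustration only.
rate = per_100k_from_hourly(1, 330_000_000)
print(f"{HOURS_PER_YEAR} murders a year, about {rate:.1f} per 100,000 people")
# prints: 8760 murders a year, about 2.7 per 100,000 people
```

The same fact stated as a per-capita rate invites comparison with other risks, which the ‘per hour’ version does not.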

Not all issues, however, lend themselves to objective appraisals of truth. To produce ‘good’ news on such issues, the best you can do is have the right process. The primary tool that journalists have in the production of news is the sources they use to report on stories. (While journalists increasingly use original data to report, the reliance on people is widespread.) Thus, the way to increase correctness is by manipulating aspects of sources. We can increase correctness by improving the quality of sources (e.g., sourcing more knowledgeable people with low incentives to cook the books), by increasing the diversity of sources (e.g., not just government officials but also, plausibly, major NGOs), and by increasing the number of sources.

If we emphasize correctness, we may fall short on timeliness. News has to be timely enough to be useful, aside from being correct enough to guide policy and opinion correctly.

News can be narrowly correct but may commit sins of omission. ‘Good’ news provides information on all sides of the issue. ‘Good’ news highlights and engages with all serious claims. It doesn’t give time to discredited claims for “balance.”

Second-to-last, news should be delivered in the right tone. Rather than speculative ad-hominem attacks, “good” news engages with arguments and data.

Lastly, news contributes to the public kitty only if it is original. Thus, ‘good’ news is original. (Plagiarism reduces the incentives for producing quality news because it eats into the profits.)

Good NYT: Provision of ‘Not News’ in the NYT Over Time

29 Nov

The mainstream American news media is under siege from the political right, but for the wrong reasons. To hyper-partisan political elites, small inequities in slant matter a lot. But hyperventilating about small issues doesn’t magically turn them into serious problems. It just makes them loom larger, and it takes the focus away from much more serious problems. The big afflictions of the ‘MSM’ are vacuity; sensationalism; a preference for opinions over facts; a poor understanding of important issues and a lack of interest in covering them diligently; a lack of interest in what goes on outside American borders; a poor understanding of numbers; policy myopia (covering events rather than issues); and a fixation with breaking news.

Here we shed light on a small piece of the real estate: the provision of ‘not news.’ By ‘not news’ we mean news about cooking, travel, fashion, home decor, and such. We are very conservative in what we code as ‘not news,’ leaving news about health, anything local, and such in place.

We analyze the provision of ‘not news’ in The New York Times (NYT), the nation’s newspaper of record. The NYT is both well-regarded and popular. It has won more Pulitzer Prizes than any other newspaper. And it is the 30th most visited website in the U.S. (as of October 2017).

So, has the proportion of news stories about topics unrelated to politics or the economy, such as cooking, travel, fashion, music, etc., in the NYT gone up over time? (For issues related to measurement, see here.) As the figure shows, the net provision of ‘not news’ increased by 10 percentage points between 1987 and 2007.
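The quantity tracked here is a difference in shares, hence percentage points. A minimal sketch, assuming each article carries a year and a desk label. The `NOT_NEWS_DESKS` set and the toy articles are hypothetical, and the post's own coding rules (which keep health and local news as news) are more careful.

```python
from collections import Counter

# Hypothetical desk labels treated as 'not news'.
NOT_NEWS_DESKS = {"cooking", "travel", "fashion", "home", "music"}


def not_news_share_by_year(articles):
    """articles: iterable of (year, desk). Returns {year: share of 'not news'}."""
    total = Counter()
    not_news = Counter()
    for year, desk in articles:
        total[year] += 1
        if desk.lower() in NOT_NEWS_DESKS:
            not_news[year] += 1
    return {y: not_news[y] / total[y] for y in sorted(total)}


articles = [(1987, "national"), (1987, "foreign"), (1987, "travel"), (1987, "business"),
            (2007, "national"), (2007, "fashion"), (2007, "cooking"), (2007, "business")]
shares = not_news_share_by_year(articles)
change_pp = 100 * (shares[2007] - shares[1987])  # percentage points, not percent
print(shares, f"change: {change_pp:.0f} percentage points")
```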

Interested in learning about more ways in which NYT has changed over time? See here.

Town Level Data on Cable Operators and Cable Channels

12 Sep

I am pleased to announce the release of TV and Cable Factbook Data (1997–2002; 1998 coverage is modest). Use of the data is restricted to research purposes.


In 2007, Stefano DellaVigna and Ethan Kaplan published a paper that used data from Warren’s Factbook to identify the effect of the introduction of Fox News Channel on Republican vote share (link to paper). Since then, a variety of papers exploiting the same data and identification scheme have been published (see, for instance, Hopkins and Ladd, Clinton and Enamorado, etc.)

In 2012, I embarked on a similar project: trying to use the data to study the impact of the introduction of Fox News Channel on attitudes and behaviors related to climate change. However, I found the original data to be limited. DellaVigna and Kaplan had used a team of research assistants to manually code a small number of variables for a few years. So I worked on extending the data in two ways: adding more years, and adding ‘all’ the data for each year. To that end, I developed custom software. The data collection, and the parsing of a few thousand densely packed, inconsistently formatted pages (see below) into a usable CSV (see below), finished sometime early in 2014. (To make it easier to create a crosswalk with other geographical units, I merged the data with Town lat/long (centroid) and elevation data from

Sample Page
Snapshot of the Final CSV

Soon after I finished the data collection, however, I became aware of a paper by Martin and Yurukoglu. They found some inconsistencies between the Nielsen data and the Factbook data (see Appendix C1 of paper), tracing the inconsistencies to delays in updating the Factbook data—“Updating is especially poor around [DellaVigna and Kaplan] sample year. Between 1999 and 2000, only 22% of observations were updated. Between 1998 and 1999, only 37% of observations were updated.” Based on their paper, I abandoned the plan to use the data, though I still believe the data can be used for a variety of important research projects, including estimating the impact of the introduction of Fox News. Based on that belief, I am releasing the data.
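One rough way to see the staleness problem within the Factbook alone is to compare consecutive editions and count how many records changed at all. This is a hypothetical sketch, not Martin and Yurukoglu's procedure (they compared against Nielsen data): the record keys and fields are made up, and an unchanged record may simply be accurate, so a low share is suggestive rather than conclusive.

```python
def share_updated(prev_year: dict, next_year: dict) -> float:
    """Share of records present in both editions whose contents changed.

    Each dict maps a hypothetical record key (e.g., a cable system ID) to a
    tuple of that record's fields in one Factbook edition.
    """
    common = prev_year.keys() & next_year.keys()
    if not common:
        return float("nan")
    changed = sum(prev_year[k] != next_year[k] for k in common)
    return changed / len(common)


# Hypothetical records: (town, subscribers, channels offered).
y1999 = {"sys1": ("Town A", 12000, 36), "sys2": ("Town B", 800, 24), "sys3": ("Town C", 5000, 40)}
y2000 = {"sys1": ("Town A", 12500, 40), "sys2": ("Town B", 800, 24), "sys3": ("Town C", 5000, 40)}
print(share_updated(y1999, y2000))
```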

Ipso Facto: Analysis of Complaints to IPSO

11 Jun

The Independent Press Standards Organisation (IPSO) handles complaints about accuracy etc. in the U.K. media. Against which media organization are most complaints filed? And against which organization are the complaints most often upheld? We answer these questions using data from the IPSO website. (The data and scripts behind the analysis are posted on GitHub.)

Between its formation in September 2014 and May 20th, 2016, IPSO received 371 complaints. Expectedly, tabloid newspapers are well represented. Of the 371 complaints, The Telegraph alone received 35, or about 9.4% of the total. It was followed closely by The Mail with 31 complaints. The Times had 25 complaints filed against it, The Mirror and The Express 22 each, and The Sun 19.

Generally, fewer than half of the complaints were completely or partly upheld. Topping the list were The Express and The Telegraph, with 10 upheld complaints each. Following close behind were The Times with 8 upheld complaints, The Mail with 6, and The Sun and the Daily Star with 4 each.

See also the plot of batting average of media organizations with most complaints against them.
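The batting average can be computed directly from the counts reported above, taking it to mean the share of complaints against an outlet that were upheld in full or in part (that definition is an assumption; the plot may be defined differently). The Mirror and the Daily Star are left out because the post gives only one of the two counts for each.

```python
# Complaints filed and complaints upheld (fully or partly), as reported above.
complaints = {"The Telegraph": 35, "The Mail": 31, "The Times": 25, "The Express": 22, "The Sun": 19}
upheld = {"The Telegraph": 10, "The Mail": 6, "The Times": 8, "The Express": 10, "The Sun": 4}

batting = {paper: upheld[paper] / complaints[paper] for paper in complaints}
for paper, rate in sorted(batting.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{paper:14s} {upheld[paper]:2d}/{complaints[paper]:2d} = {rate:.0%}")
```

On these counts, The Express has the highest share upheld even though The Telegraph had more complaints filed against it.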

Some Facts About PolitiFact

27 May

I assessed PolitiFact on:

1. Imbalance in scrutiny: Do they vet statements by Democrats or Democratic-leaning organizations more than statements by Republicans or Republican-leaning organizations?

2. Batting average by party: Roughly n_correct/n_checked, but instantiated here as mean Politifact rating.

To answer the questions, I scraped the data from PolitiFact and independently coded and appended data on the party of the person or organization covered. (Feel free to download the script for scraping and analyzing the data, scraped data and data linking people and organizations to party from the GitHub Repository.)

So far, PolitiFact has checked the veracity of 3,859 statements by 703 politicians and organizations. Of these, I was able to establish the partisanship of 554 people and organizations. I restrict the analysis to the 3,396 statements by people and organizations whose partisanship I could establish and who lean towards either the Republican or the Democratic party. I code PolitiFact’s 6-point True to Pants on Fire scale (true, mostly-true, half-true, barely-true, false, pants-fire) linearly so that it lies between 0 (pants-fire) and 1 (true).
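The linear coding maps the six ratings to evenly spaced scores, so each step up the scale adds 0.2. A minimal sketch; the label spellings follow the list above, and the two short rating lists are toy data, not PolitiFact's.

```python
RATINGS = ["pants-fire", "false", "barely-true", "half-true", "mostly-true", "true"]
# Map the six ratings linearly onto [0, 1]: pants-fire = 0, false = 0.2, ..., true = 1.
SCORE = {label: i / (len(RATINGS) - 1) for i, label in enumerate(RATINGS)}


def batting_average(ratings):
    """Mean linearly coded rating for a non-empty list of rating labels."""
    return sum(SCORE[r] for r in ratings) / len(ratings)


dem = ["true", "half-true", "mostly-true", "false"]         # toy data
rep = ["pants-fire", "half-true", "barely-true", "mostly-true"]  # toy data
print(round(batting_average(dem), 3), round(batting_average(rep), 3))
```

Linear coding treats the gap between ‘false’ and ‘pants-fire’ as equal to the gap between ‘mostly-true’ and ‘true’, which is a simplifying assumption rather than a property of the scale.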

Of the 3,396 statements, about 44% (n = 1506) are by Democrats or Democratic-leaning organizations. The remaining 56% or so (n = 1890) are by Republicans or Republican-leaning organizations. The average PolitiFact rating (batting average) of statements by Democrats or Democratic-leaning organizations is .63; it is .49 for statements by Republicans or Republican-leaning organizations.

To check whether the results are driven by some people receiving a lot of scrutiny, I tallied the total number of statements investigated for each person. Unsurprisingly, there is a large skew, with a few prominent politicians receiving a bulk of the attention. For instance, PolitiFact investigated more than 500 claims by Barack Obama alone. The figure below plots the total number of statements investigated for thirty politicians receiving the most scrutiny.

If you take out Barack Obama, the Democratic share of scrutiny falls to 33.98%. More generally, if we limit ourselves to the bottom 90% of politicians in terms of scrutiny received, the Democratic share is about 42.75%.
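The robustness check amounts to recomputing the Democratic share of checked statements on a subset of speakers. A sketch with made-up counts; it assumes ‘bottom 90%’ means the 90% of speakers with the fewest statements checked, with ties broken arbitrarily.

```python
def dem_share(counts, party, speakers=None):
    """Share of checked statements made by Democrats among `speakers` (default: all)."""
    speakers = list(counts) if speakers is None else speakers
    total = sum(counts[s] for s in speakers)
    dem = sum(counts[s] for s in speakers if party[s] == "D")
    return dem / total


# Hypothetical counts of statements checked per speaker, and party labels.
counts = {"Obama": 50, "R1": 20, "R2": 10, "D1": 8, "R3": 6,
          "D2": 5, "R4": 4, "D3": 3, "R5": 2, "D4": 2}
party = {s: "D" if s == "Obama" or s.startswith("D") else "R" for s in counts}

everyone = dem_share(counts, party)
without_obama = dem_share(counts, party, [s for s in counts if s != "Obama"])
ranked = sorted(counts, key=counts.get)          # least to most scrutinized
bottom_90 = ranked[: len(ranked) * 9 // 10]      # drop the top 10% of speakers
print(round(everyone, 3), round(without_obama, 3), round(dem_share(counts, party, bottom_90), 3))
```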

To analyze whether there is selection bias in covering politicians who say incorrect things more often, I estimated the correlation between the batting average and the total number of statements investigated. The correlation is very weak and does not appear to vary systematically by party. Accounting for the skew by taking the log of the total statements or by estimating a rank-ordered correlation has little effect. The figure below plots batting average as a function of total statements investigated.
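The three correlations (raw, on log counts, and rank-based) can be sketched in plain Python. `pearson`, `ranks`, and `spearman` are written out here for transparency, Spearman is used as one common rank-order correlation, and the data are hypothetical. On Python 3.12 or later, `statistics.correlation(x, y, method="ranked")` should give the same Spearman value.

```python
import math


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


n_checked = [520, 150, 60, 40, 12, 8, 5]                 # hypothetical
batting = [0.62, 0.55, 0.48, 0.60, 0.45, 0.70, 0.52]     # hypothetical
print(round(pearson(n_checked, batting), 3))
print(round(pearson([math.log(n) for n in n_checked], batting), 3))
print(round(spearman(n_checked, batting), 3))
```

Logging or ranking the counts keeps one heavily scrutinized speaker from dominating the raw correlation.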


Caveats About Interpretation

To interpret the numbers, you need to make two assumptions:

1. The number of statements made by Republicans and Republican-leaning persons and organizations is the same as that made by people and organizations on the left.

2. Truthiness of statements by Republican and Republican-leaning persons and organizations is the same as that of left-leaning people and organizations.

Getting a Measure of Measures of Selective Exposure

24 Jul

Ideally, we would like to be able to place the ideology of each bit of information consumed in relation to the ideological location of the person. And we would like a time-stamped distribution of the bits consumed. We could then summarize various moments of that distribution (or of the distribution of ideological distances). And that would be that. (If we were worried about dimensionality, we would do it by topic.)

But lack of data means we must change the estimand. We must code each bit of information as merely congenial or uncongenial. This means taking directionality out of the equation. For a Republican at 6 on a 1 to 7 liberal-to-conservative scale, consuming a bit of information at 5 is the same as consuming a bit at 7.

The conventional estimand then is a set of two ratios: (Bits of politically congenial information consumed)/(All political information) and (Bits of uncongenial information)/(All political information consumed). Other reasonable formalizations exist, including the difference between congenial and uncongenial. (Note that the denominator is absent, and reasonably so.)
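In symbols, writing C and U for the bits of congenial and uncongenial political information consumed and P for all political information consumed (labels introduced here), the two ratios and the difference are:

```latex
r_C = \frac{C}{P}, \qquad r_U = \frac{U}{P}, \qquad d = C - U
```

If every bit of political information is coded as one or the other, then P = C + U and the two ratios sum to 1.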

To estimate these quantities, we must often make further assumptions. First, we must decide on the domain of political information. That domain is likely vast and increasing by the minute. We are all producers of political information now. (We always were but today we can easily access political opinions of thousands of lay people.) But see here for some thoughts on how to come up with the relevant domain of political information from passive browsing data.

Next, people often code ideology at the level of ‘source.’ The New York Times is ‘independent’ or ‘liberal’ and ‘Fox’ simply ‘conservative’ or perhaps more accurately ‘Republican-leaning.’ (Continuous measures of ideology — as estimated by Groseclose and Milyo or Gentzkow and Shapiro — are also assigned at the source level.) This is fine except that it means coding all bits of information consumed from a source as the same. And there are some attendant risks. We know that not all NYT articles are ‘liberal.’ In fact, we know much of it is not even political news. A toy example of how such measures can mislead:

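Page views: 10 Fox, 10 CNN. Source-level estimate of the congenial share for a Republican (Fox coded R, CNN coded D): 10/20
But say the Fox pages are 7 R and 3 D, and the CNN pages are 5 R and 5 D
Page-level estimate: (7 + 5)/20 = 12/20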
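The toy example can be checked mechanically. This sketch assumes the reader is a Republican and that source-level coding labels Fox as R and CNN as D; `congenial_share` is a hypothetical helper.

```python
def congenial_share(pages, reader_party="R", source_lean=None):
    """Share of page views congenial to the reader.

    pages: list of (source, page_lean) tuples, page_lean in {"R", "D"}.
    If source_lean is given, every page is coded with its source's lean
    instead of its own (source-level coding).
    """
    congenial = 0
    for source, page_lean in pages:
        lean = source_lean[source] if source_lean else page_lean
        congenial += lean == reader_party
    return congenial / len(pages)


# The toy example: 10 Fox pages (7 R, 3 D) and 10 CNN pages (5 R, 5 D).
pages = ([("Fox", "R")] * 7 + [("Fox", "D")] * 3 +
         [("CNN", "R")] * 5 + [("CNN", "D")] * 5)
print(congenial_share(pages, source_lean={"Fox": "R", "CNN": "D"}))  # 0.5, i.e. 10/20
print(congenial_share(pages))                                        # 0.6, i.e. 12/20
```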

If the measure of ideology is continuous, there are still some risks. If we code all page views as the mean ideology of the source, we assume that the person views a random sample of pages on the source (or some version of that). But that assumption is implausible. It is much more likely that a liberal reading the NYT stays away from David Brooks’s columns. Given such within-source self-selection, selective exposure measures based on source-level coding are going to be downwardly biased, that is, they will find people less selective than they are.

Discussion until now has focused on passive browsing data, eliding survey measures. There are two additional problems with survey measures. One is about the denominator. Measures based on limited-choice experiments, like the ones used by Iyengar and Hahn 2009, are bad measures of real-life behavior. In real life, we just have far more choices. And inferences from such experiments can at best recover ordinal rankings. The second big problem with survey measures is ‘expressive responding’: Republicans indicating they watch Fox News not because they do but because they want to convey that they do.

Where’s the Porn? Classifying Porn Domains Using a Calibrated Keyword Classifier

23 Jul

Aim: Given a very large list of unique domains, find domains carrying adult content.

In the 2004 comScore browsing data, for instance, there are about a million unique domains. Comparing a million unique domain names against a large database is doable. But access to such databases doesn’t often come cheap. So a hack.

Start with an exhaustive search using porn-related keywords. Here’s my list:

breast, boy, hardcore, 18, queen, blowjob, movie, video, love, play, fun, hot, gal, pee, 69, naked, teen, girl, cam, sex, pussy, dildo, adult, porn, mature, sex, xxx, bbw, slut, whore, tit, pussy, sperm, gay, men, cheat, ass, booty, ebony, asian, brazilian, fuck, cock, cunt, lesbian, male, boob, cum, naughty

For the 2004 comScore data, this gives about 140k potential porn domains. Compare this list to the approximately 850k porn domains in the Shallalist. This leaves us with a list of 68k domains with uncertain status. Use one of the many URL classification APIs. Using Trusted Source API, I get about 20k porn and 48k non-porn.

This gives us a lower bound on the number of adult domains, but perhaps one that is much too low.

To estimate the false negatives, take a large random sample (say, 10,000 unique domains). Compare the results of the keyword search plus API elimination against an API search of all 10k domains. This will give you an estimate of the false negative rate. You can also learn from the list of false negatives to improve your keyword search, and redo everything. A couple of iterations can produce a sufficiently low false negative rate (the false positive rate is always ~0). (For the 2004 comScore data, a false negative rate of 5% is easily achieved.)
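The pipeline and the calibration step can be sketched as follows. Everything external is stubbed: `blocklist` stands in for the Shallalist, `fake_api` for a URL-classification service such as Trusted Source, and the `.example` domains and the short keyword list are made up. Substring matching is deliberately broad; the API step is what weeds out innocent hits like ‘essex’.

```python
# A few of the keywords listed above.
KEYWORDS = ["porn", "xxx", "sex", "adult", "cam"]


def keyword_candidates(domains):
    """Domains containing any keyword as a substring (deliberately over-inclusive)."""
    return {d for d in domains if any(k in d for k in KEYWORDS)}


def classify(domains, blocklist, api_is_adult):
    """Keyword screen, then a known-adult blocklist, then an API check for the rest."""
    candidates = keyword_candidates(domains)
    known = candidates & blocklist
    uncertain = candidates - blocklist
    return known | {d for d in uncertain if api_is_adult(d)}


def false_negative_rate(sample, blocklist, api_is_adult):
    """Run the API on every sampled domain and see what the pipeline missed."""
    truth = {d for d in sample if api_is_adult(d)}
    found = classify(sample, blocklist, api_is_adult)
    missed = truth - found
    return (len(missed) / len(truth) if truth else 0.0), missed


ADULT = {"hotcams.example", "xxxvideos.example", "nudebeach.example"}


def fake_api(domain):
    """Hypothetical stand-in for a URL-classification API."""
    return domain in ADULT


sample = ["hotcams.example", "xxxvideos.example", "nudebeach.example",
          "essexcounty.example", "classicalmusic.example"]
rate, missed = false_negative_rate(sample, {"xxxvideos.example"}, fake_api)
print(rate, missed)
```

Here the pipeline misses `nudebeach.example`; adding ‘nude’ to the keywords and rerunning is the kind of iteration described above.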

See PyDomains.

Where’s the news?: Classifying News Domains

23 Jul

We select an initial universe of news outlets (i.e., web domains) via the Open Directory Project (ODP), a collective of tens of thousands of editors who hand-label websites into a classification hierarchy. This gives 7,923 distinct domains labeled as: news, politics/news, politics/media, and regional/news. Since the vast majority of these news sites receive relatively little traffic, to simplify our analysis we restrict to the one hundred domains that attracted the largest number of unique visitors from our sample of toolbar users. This list of popular news sites includes every major national news source, well-known blogs and many regional dailies, and collectively accounts for over 98% of page views of news sites in the full ODP list (as estimated via our toolbar sample). The complete list of 100 domains is given in the Appendix.

From Filter Bubbles, Echo Chambers, and Online News Consumption by Flaxman, Goel and Rao.

When using rich browsing data, scholars often rely on ad hoc lists of domains to estimate consumption of certain kinds of media. Using these lists to estimate consumption raises three obvious concerns: 1) even sites classified as ‘news sites,’ such as the NYT, carry a fair bit of non-news; 2) (speaking categorically) there is the danger of ‘false positives’; and 3) (speaking categorically again) there is a danger of ‘false negatives.’

FGR address the first concern by exploiting the URL structure. They exploit the fact that the URL of an NYT story contains information about the section. (The classifier is assumed to be perfect but likely isn’t. False positive and negative rates for this kind of classification can be estimated using raw article data.) This leaves us with concerns about false positives and negatives at the domain level. Lists like those published by DMOZ appear to be curated well enough not to contain too many false positives. The real question is how to calibrate false negatives. Here’s one procedure. Take a large random sample of the browsing data (at least 10,000 unique domain names). Compare it to a large comprehensive database like Shallalist. Of the domains that aren’t in the database, query a URL classification service such as Trusted Source. (The initial step of comparing against Shallalist is to reduce the amount of querying.) Using the results, estimate the proportion of missing domain names (the net number of missing domain names is likely much, much larger). Also estimate missed visitation time, page views, etc.
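The within-domain step relies on the section appearing in the article URL. A minimal sketch, assuming URLs of the form `/YYYY/MM/DD/section/...`; other URL formats return `None`, and which sections count as news (`NEWS_SECTIONS`) is a hypothetical choice, not FGR's.

```python
from urllib.parse import urlparse

# Hypothetical choice of sections that count as news.
NEWS_SECTIONS = {"us", "world", "politics", "business", "nyregion"}


def nyt_section(url):
    """Return the section in a /YYYY/MM/DD/section/... path, or None."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) >= 4 and all(p.isdigit() for p in parts[:3]):
        return parts[3]
    return None


def is_news(url):
    return nyt_section(url) in NEWS_SECTIONS


print(nyt_section("https://www.nytimes.com/2017/10/01/us/politics/story.html"))  # us
print(is_news("https://www.nytimes.com/2017/10/01/travel/story.html"))          # False
```

Misclassification enters wherever a section mixes content (a news section carrying lifestyle features, say), which is why the error rates are worth estimating from raw article data.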

Liberal Politicians are Referred to More Often in News

8 Jul

The median Democrat referred to in television news is to the left of the House Democratic Median, and the median Republican politician referred to is to the left of the House Republican Median.

Click here for the aggregate distribution.

And here’s a plot of the top 50 politicians cited in news. The plot shows a strongly right-skewed distribution with a bias towards executives.

News data: UCLA Television News Archive, which includes closed-caption transcripts of all national, cable and local (Los Angeles) news from 2006 to early 2013. In all, there are 155,814 transcripts of news shows.

Politician data: Database on Ideology, Money in Politics, and Elections (see Bonica 2012).

Taking out data from local news channels or removing Obama does little to change the pattern in the aggregate distribution.
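One way to make the first comparison concrete is a mention-weighted median of politicians' ideology scores set against the unweighted House party median. This reading, the scores, and the mention counts below are hypothetical, and the sketch assumes a scale on which lower values are more liberal.

```python
import statistics


def weighted_median(values, weights):
    """Lower weighted median: first value at which cumulative weight reaches half."""
    half = sum(weights) / 2
    running = 0
    for value, weight in sorted(zip(values, weights)):
        running += weight
        if running >= half:
            return value
    raise ValueError("values must be non-empty")


# Hypothetical ideology scores (lower = more liberal).
house_dems = [-0.9, -0.7, -0.5, -0.4, -0.3]
# Hypothetical: Democrats mentioned in the news, and how often each is mentioned.
dem_scores = [-0.9, -0.7, -0.5, -0.4, -0.3]
dem_mentions = [40, 30, 5, 3, 2]

print(statistics.median(house_dems), weighted_median(dem_scores, dem_mentions))
```

Here the mention-weighted median (-0.9) sits to the left of the House median (-0.5) because the most-mentioned members are the most liberal ones.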

Measuring the Impact of Media

10 Nov

Measuring the impact of media accurately is challenging. Findings of minimal effects abound even though intuition tells us that an activity the average American engages in for over forty hours a week is likely to have a larger impact. These insignificant findings have typically been attributed to the frailty of survey self-reports of media exposure, though debilitating error in dependent variables has also been noted as a culprit. Others have noted weaknesses in research design and inadequate awareness of analytic techniques that allow one to compensate for error in measures as stumbling blocks.

Here are a few of the methods that have been used to overcome some of the problems in media research, along with some modest new proposals of my own:

  • Measurement
    Since measures are error-prone, one strategy has been to combine multiple measures. Multiple measures of a single latent concept can be combined using latent variable models, factor analysis, or even simple averaging. Care must be taken to check that errors across measures aren’t heavily correlated, for under such conditions improvements from combining multiple measures are likely to be weak or non-existent. In fact, deleterious effects are possible.

    Another point of worry is that measurement error can be correlated with irrelevant respondent characteristics. For instance, women guess less than men on knowledge questions. Hence responses to knowledge questions are a function of ability and propensity to guess when one doesn’t know (tallied here by gender). By conditioning on gender, we can recover better estimates of ability. Another application would be in handling satisficing.

  • Measurement of exposure
    Rather than use self-assessments of exposure, which have been shown to be correlated with confounding variables, one may want to track incidental consequences of exposure as a measure of exposure: for example, knowledge of the words of a campaign jingle, attributes of a character in a campaign commercial, the source (~channel) on which the campaign was shown, the program, etc. These measures factor in attention, in addition to exposure, which is useful. Unobtrusive monitoring of consumption is, of course, likely to be even more effective.

  • Measurement of Impact
    1. Increased exposure to positive images ought to change procedural memory and implicit associations. One can use IAT or AMP to assess the effect.
    2. Tracking Twitter and Facebook feeds for relevant information. These measures can be calibrated to opinion poll data to get a sense of what they mean.
  • Data Collection
    1. Data collection efforts need to reflect the half-life of the effect. Recent research indicates that some of the impacts of the media may be short-lived. Short-term effects may be increasingly consequential as people increasingly have the ability to act on their impulses, be it buying something, donating to a campaign, or finding more information about a product. Behavioral measures (e.g., website hits) corresponding to ads may thus be one way to track impact.
    2. Future ‘panels’ may contain solely passive monitoring of media use (both input and output) and consumption behavior.
  • Estimating recipient characteristics via secondary data
    1. Geocoded IP addresses can be used to harvest secondary demographic data (race, income, etc.) from census
    2. Para-data, like which browser and operating system the customer uses, are reasonable indicators of tech savvy. And these data are readily harvested.
    3. Datasets can be merged via matching or by exploiting correlation across items and by calibrating.

Sharing Information about Sharing Misinformation

16 May

The Internet has revolutionized the dissemination of misinformation. Easy availability of incorrect information, gullible and eager masses, and ease of sharing have created fertile conditions for misinformation epidemics.

While a fair proportion of misinformation is likely created deliberately, it may well spread inadvertently. Misinformation that people carry is often no different than fact to them. People are likely to share misinformation with the same enthusiasm as they would fact.

Attitude-congenial misinformation is more likely to be known (and accepted as fact) and more likely to be enthusiastically shared with someone who shares the same attitude (for social and personal rewards). Misinformation considered useful is also more likely to be shared, e.g., (mis)information about health-related topics.

The chance of acceptance of misinformation may be greater still if people know little about the topic, or if they have no reason to think that the information is motivated. Lastly, these epidemics are more likely to take place among those less familiar with technology.

Idealog: Internet Panel + Media Monitoring

4 Jan

Media scholars have long complained about the lack of good measures of media use. Survey self-reports have been shown to be notoriously unreliable, especially for news, where there is significant over-reporting, and without good measures, research lags. The same is true for most research in marketing.

Until recently, the state-of-the-art aggregate media use measures were Nielsen ratings, which put a ‘meter’ in a few households or asked people to keep a diary of what they saw. In short, the aggregate measures were pretty bad as well. Digital media, which allow for effortless tracking, and the rise of Internet polling, however, provide for the first time an opportunity to create ‘panels’ of respondents for whom we have near-perfect measures of media use. The proposal is quite simple: create a hybrid of Nielsen on steroids and YouGov/Polimetrix- or Knowledge Networks-style recruiting of individuals.

Logistics: Give people free cable and Internet (~ 80/month) in return for 2 hours of their time per month and monitoring of media consumption. Pay people who already have cable (~100/month) for installing a device and software. Recording channel information is enough for TV, but Internet equivalent of a channel—domain—clearly isn’t, as people can self-select within websites. So we only need to monitor the channel for TV but more for the Internet.

While the number of devices on which people browse the Internet and watch TV has multiplied, there generally remains only one ‘pipe’ per house. We can install a monitoring device at the central hub for cable, and automatically install software for anyone who connects to the Internet router, or do passive monitoring on the router. Monitoring can also be done through applications on mobile devices.

Monetizability: Consumer companies (say, Kellogg’s, Ford), communication researchers, and political hacks (e.g., to learn how many watched campaign ads) will all pay for it. The crucial (modest) innovation is the possibility of surveying people on a broad range of topics, in addition to getting great media use measures.

Addressing privacy concerns:

  1. Limit recording information to certain channels/websites, ones on which customers advertise, etc. This changing list can be made subject to approval by the individual.
  2. Provide a web interface where people can review or suppress the data before it is sent out. Of course, reconfirm that all data is anonymous to deter such censoring.

Ensuring privacy may lead to some censoring of the data, and we can try to adjust the data we get in a couple of ways:

  • Survey people on media use
  • Use Television Rating Points (TRP) by sociodemographics to weight data.
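Weighting by TRPs amounts to post-stratification: each panelist gets the ratio of their group's share in the target audience to that group's share in the panel. A minimal sketch; the age bands, target shares, and viewing indicators are hypothetical, and turning TRPs into target shares is left out. Groups missing from the panel cannot be represented at all, however they are weighted.

```python
from collections import Counter


def poststrat_weights(panel_groups, target_shares):
    """Weight each panelist so that weighted group shares match target shares.

    panel_groups:  one group label per panelist (e.g., an age band).
    target_shares: {group: share of the audience}, e.g. derived from TRPs
                   broken down by sociodemographics (hypothetical numbers below).
    """
    counts = Counter(panel_groups)
    n = len(panel_groups)
    return [target_shares[g] / (counts[g] / n) for g in panel_groups]


panel = ["18-34"] * 2 + ["35-54"] * 3 + ["55+"] * 5
targets = {"18-34": 0.3, "35-54": 0.4, "55+": 0.3}
w = poststrat_weights(panel, targets)
watched = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]  # hypothetical: saw a given ad
weighted = sum(wi * xi for wi, xi in zip(w, watched)) / sum(w)
print(round(sum(watched) / len(watched), 3), round(weighted, 3))
```

Here the panel over-represents the 55+ band, so reweighting raises the estimated share who saw the ad from 0.4 to about 0.477.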