Weighting to Multiple Datasets

27 Aug

Say there are two datasets: one that carries both attitudinal and demographic variables (Dataset 1), and another that carries just demographic variables (Dataset 2). Also assume that Dataset 2 is the larger and more accurate dataset for demographics (e.g., the CPS). Our goal is to weight a third dataset (Dataset 3) so that it is “closest” to the population at large on both socio-demographic characteristics and attitudinal variables. We can proceed in the following manner: weight Dataset 1 to Dataset 2, and then weight Dataset 3 to Dataset 1. This means multiplying the two sets of weights. One may also impute attitudes for the larger dataset (Dataset 2) using a prediction model built on Dataset 1, and then use the larger dataset to generalize to the population.
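As a toy illustration of the chaining logic, here is a minimal sketch in Python. All of the cell shares are made up, and the single adjustment cell (gender) and simple cell weighting, rather than full raking, are simplifications for exposition:

```python
# Hypothetical cell shares for one adjustment cell (gender).
p1 = {"f": 0.55, "m": 0.45}   # Dataset 1 (attitudes + demographics)
p2 = {"f": 0.51, "m": 0.49}   # Dataset 2 (gold-standard demographics, e.g., CPS)
p3 = {"f": 0.60, "m": 0.40}   # Dataset 3 (the dataset we want to weight)

w1 = {g: p2[g] / p1[g] for g in p1}   # weights Dataset 1 cells to Dataset 2
w3 = {g: p1[g] / p3[g] for g in p1}   # weights Dataset 3 cells to Dataset 1
w = {g: w3[g] * w1[g] for g in p1}    # chained (multiplied) weight for Dataset 3

# After weighting, Dataset 3's cell shares match the gold standard (Dataset 2).
weighted = {g: p3[g] * w[g] for g in p1}
```

The multiplication works because the intermediate shares cancel: p3 * (p1/p3) * (p2/p1) = p2.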

Star Trek: Trekking Uncertainly Between Utopia and Twentieth Century Earth

27 Aug

Star Trek (and its spin-offs) is justly applauded for including socially progressive ideas in both the themes of its stories and the cultural fabric of its counterfactual imagination of the future. For instance, women and minorities command positions of responsibility, those working for the ‘Federation’ take ethical questions seriously, and both ‘Data’ (an android) and empathy (via a ‘Betazoid’ counselor) play a central role in making command decisions (at least in one of the series).

There are some other pleasant aspects of the show. The background hum of a ship replaces the cacophonous noise that passes for a background score on many shows; order prevails; professionalism and intelligence are shown as being rewarded; backroom machinations are absent; and the thrill of exploration and discovery is elevated to a virtue.

However, there are a variety of places where either insufficient thought or distinctly twentieth-century considerations intrude. For one, the central protagonists belong to ‘Star Fleet’, the military (and peacekeeping) arm of the ‘Federation.’ More distressingly, this military arm seems to be run internally on many of the same time-worn principles as on earth in the twentieth century, including an extremely hierarchical code, uniform clothing, etc. The saving grace is that most members of Star Fleet are technical personnel. Still, the choice of conceptualizing the protagonists as belonging to the military wing (of an arguably peaceful organization) is somewhat troubling.

There are other ‘backward’ aspects. Inter-species stereotyping is common. For instance, the Ferengi are mostly shown as irredeemably greedy, the Romulans and Klingons as devoted to war, and the Borg and the Dominion as simply evil. While some episodes grapple with the issue, attributing psychological traits to entire cultures and worlds is relatively common. Further, regrettably, the uniforms of women in some of the series are noticeably tighter.

More forgivably, perhaps, there is an almost exclusive focus on people in command. This is perhaps necessitated by the demands of creating drama beyond the interpersonal, most easily achieved by focusing on important situations that affect the fate of many – the kinds of situations only people in command confront (in the hierarchical institutional format shown). The hierarchical structure and the need for drama often create some absurdity. Since those in command have to be shown ‘commanding’, the captain of the ship is shown giving the largely superfluous order of ‘engage’ (akin to asking the driver to ‘drive’ when he knows he has to drive you to the destination) in a theatrical fashion. Similarly, given the level of automation and technological sophistication shown, opportunities for showing heroism often have to be contrived. Hence many of the ‘missions’ are tremendously low-tech.

Where does this leave us? Nowhere in particular but perhaps with just a slightly better appreciation of some of the ‘tensions’ between how the show is often imagined by ‘nerds’ (as a vision of utopia) and what the show is really about.

Size Matters, Significantly

26 Aug

Achieving statistical significance is entirely a matter of sample size. In the frequentist world, we can always distinguish between two samples if we have enough data (except, of course, if the samples are exactly the same). On the other hand, we may fail to reject even large differences when sample sizes are small. For example, over 13 Deliberative Polls (list at the end), the correlation between the proportion of attitude indices showing significant change and the size of the participant sample is .81 (the rank-ordered correlation is .71). This sharp correlation is suggestive evidence that the average effect is roughly equal across polls (and hence power matters).
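The dependence of ‘significance’ on sample size is easy to see in simulation. A sketch in Python, with an assumed constant effect of 0.2 SD (the numbers are illustrative, not from the Deliberative Poll data):

```python
import numpy as np

rng = np.random.default_rng(0)
effect = 0.2  # assumed constant true effect, in SD units

def share_significant(n, trials=500):
    # share of trials in which a two-sample test rejects at roughly the 5% level
    hits = 0
    for _ in range(trials):
        pre = rng.normal(0, 1, n)
        post = rng.normal(effect, 1, n)
        t = (post.mean() - pre.mean()) / np.sqrt(pre.var(ddof=1) / n + post.var(ddof=1) / n)
        hits += abs(t) > 1.96
    return hits / trials

small, large = share_significant(50), share_significant(500)
# the same true effect is 'significant' far more often at n = 500 than at n = 50
```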

When the conservative thing to do is to reject the null, for example, in “representativeness” analyses designed to see if the experimental sample differs from the control, one may want to go for large sample sizes, say something about the substantive size of the differences, or ‘adjust’ results for the differences. If we don’t, samples can look more ‘representative’ simply because the sample size shrinks. So, for instance, the rank-ordered correlation between the proportion of significant differences between non-participants and participants and the size of the smaller (participant) sample, across the 13 polls, is .5. This somewhat low correlation is slightly surprising. It is partly a result of the negative correlation between the size of the participant pool and the average size of the differences.

Polls included: Texas Utilities: (CPL, WTU, SWEPCO, HLP, Entergy, SPS, TU, EPE), Europolis 2009, China Zeguo, UK Crime, Australia Referendum, and NIC

Adjusting for Covariate Imbalance in Experiments with SUTVA Violations

25 Aug

Consider the following scenario: the control group is 50% female while the participant sample is 60% female. Also assume that this discrepancy is solely a matter of chance, and that the effect of the experiment varies by gender. To estimate the effect of the experiment, one needs to adjust for the discrepancy, which can be done via matching, regression, etc.

If the effect of the experiment depends on the nature of the participant pool, such adjustments won’t be enough. Part of the effect of Deliberative Polls is a consequence of the pool of respondents. The pool is expected to matter only in small-group deliberation. Given that people are randomly assigned to small groups, one can exploit the natural variation across groups to estimate how, say, the proportion of females in a group impacts attitudes (the dependent variable of interest). If that relationship is minimal, no adjustments beyond the usual are needed. If, however, there is a strong relationship, one may want to adjust as follows: predict attitudes under simulated groups drawn from a weighted sample, with the probability of selection proportional to the weight. This will give us a distribution, which is appropriate, as women may be allocated to small groups in a variety of ways.
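A minimal sketch of the simulation step, with assumed numbers throughout (the coefficient on group proportion female, the sample, and the weights are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
b_group_female = 0.5   # assumed effect of a group's proportion female on attitudes
female = np.array([1] * 6 + [0] * 4)       # hypothetical participant sample
weights = np.where(female == 1, 0.8, 1.3)  # hypothetical weights to the population
p = weights / weights.sum()                # selection prob. proportional to weight

sims = []
for _ in range(1000):
    pool = rng.choice(female, size=12, replace=True, p=p)  # simulated pool
    groups = pool.reshape(3, 4)                            # random small groups of 4
    prop_f = groups.mean(axis=1)                           # proportion female per group
    sims.append((b_group_female * prop_f).mean())          # predicted attitude shift

# The spread of `sims` is the distribution over the ways women may be
# allocated to small groups.
lo, hi = np.percentile(sims, [2.5, 97.5])
```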

There are many caveats, beginning with limitations of data in estimating the impact of group characteristics on individual attitudes, especially if effects are heterogeneous. Where proportions of subgroups are somewhat small, inadequate variation across small groups can result.

This procedure can be generalized to a variety of cases where the effect is determined by the participant pool except where each participant interacts with the entire sample (or a large proportion of it). Reliability of the generalization will depend on getting good estimates.

Poor Browsers and Internet Surveys

14 Jul

Given,

  1. older browsers are likelier to display the survey incorrectly.
  2. the type of browser can be a proxy for a respondent’s proficiency in using computers, and for the speed of their Internet connection.

People using older browsers may abandon surveys at higher rates than those using more modern browsers.

Using data from a large Internet survey, we test whether people who use older browsers abandon surveys at higher rates, and whether their surveys have larger amounts of missing data.

Read more here: https://github.com/soodoku/poor_browser

Elite Lawyers!

11 Jul

(Based on data from the 111th Congress)

Law is the most popular degree on Capitol Hill (it has been the case for a long time). Nearly 52% of senators, and 36% of congressional representatives, have a degree in law. There are some differences across parties and across houses, with Republicans likelier to have a law degree than Democrats in the Senate (58% to 48%), and the reverse holding true in the House, where a greater share of Democrats holds law degrees than Republicans (40% to 32%). Less than 10% of members of Congress have a degree in the natural sciences or engineering. Nearly 8% have a degree from Harvard, making Harvard’s the largest alumni contingent at the Capitol. Yale is a distant second, with less than half the number that went to Harvard.

Data and Script

Does Children’s Sex Cause Partisanship?

26 May

More women identify themselves as Democrats than as Republicans. The disparity is yet greater among single women. It is possible (perhaps even likely) that this difference in partisan identification is due to (perceived) policy positions of Republicans and Democrats.

Now let’s do a thought experiment: imagine a couple about to have a kid. Also assume that the couple doesn’t engage in sex selection. Two things can happen – the couple can have a son or a daughter. It is possible that having a daughter persuades a parent to change his or her policy preferences in a direction perceived as more congenial to women. It is also possible that having a son has the opposite impact – persuading parents to adopt more male-congenial political preferences. Overall, it is possible that the sex of the child makes a difference to parents’ policy preferences. With panel data, one can identify both movements. With cross-sectional data, one can only identify the difference between those who had a son and those who had a daughter.

Let’s test this using cross-sectional data from Jennings and Stoker’s “Study of Political Socialization: Parent-Child Pairs Based on Survey of Youth Panel and Their Offspring, 1997.”

Let’s assume that a couple’s partisan affiliation doesn’t impact the gender of their kid.

The number of kids, however, is determined by personal choice, which in turn may be impacted by ideology, income, etc. For example, it is likely that conservatives have more kids, as they are less likely to believe in contraception, etc. This is also supported by the data. (Ideology is a post-treatment variable. This may not matter if the impact of having a daughter is the same in magnitude as the impact of having a son, and if there are similar numbers of each across people.)

Hence, one may conceptualize “treatment” as the gender of the kids, conditional on the number of kids.

Understandably, we only study people who have one or more kids.

Conditional on the number of kids, the more daughters a respondent has, the less likely the respondent is to identify as a Republican (b = -.342, p < .01) when the dependent variable is a Republican/Democrat dichotomy; the relationship holds, and indeed becomes stronger, if the dependent variable is instead coded as an ordinal trichotomy (Republican, Independent, Democrat) and an ordered multinomial is estimated.
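A sketch of the estimation on simulated data. Every number below, including the “true” coefficient, is an assumption for illustration, not an estimate from the Jennings and Stoker study:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000
n_kids = rng.integers(1, 5, n)                # 1 to 4 kids per respondent
daughters = rng.binomial(n_kids, 0.49)        # daughters among those kids
xb = 0.2 - 0.34 * daughters + 0.05 * n_kids   # assumed true model
rep = rng.binomial(1, 1 / (1 + np.exp(-xb)))  # 1 = identifies as Republican

X = np.column_stack([np.ones(n), daughters, n_kids])

def fit_logit(X, y, iters=25):
    # Newton-Raphson (IRLS) for a logistic regression
    b = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1 / (1 + np.exp(-X @ b))
        b = b + np.linalg.solve(X.T @ (X * (p * (1 - p))[:, None]), X.T @ (y - p))
    return b

b_hat = fit_logit(X, rep)
# b_hat[1], the coefficient on daughters conditional on number of kids, is negative
```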

Future:

If what we observe is true, then as party stances evolve, the impact of a child’s gender on a parent’s policy preferences should vary. One should also be able to test this cross-nationally.

Some other findings:

  1. The probability of having a son (limiting to live births in the U.S.) is about .51. This natural rate varies slightly by income: daughters are more likely to be born to people with lower incomes. However, the effect of income is extremely modest in the U.S. The live-birth ratio is marginally rebalanced by the higher child mortality rate among males. As a result, among those aged 0–21, the ratio of males to females is about equal in the U.S.

    In the sample, there are significantly more daughters than sons. The female/male ratio is 1.16. This is ‘significantly’ unusual.

  2. If families are less likely to have kids after the birth of a boy, the number of kids will be negatively correlated with proportion sons. Among people with just one kid, the number of sons is indeed greater than number of daughters, though the difference is insignificant. Overall correlation between proportion sons and number of kids is also very low (corr. = -.041).
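The claim that a female/male ratio of 1.16 is ‘significantly’ unusual can be checked with a simple z-test. The sample size below is hypothetical; only the 1.16 ratio and the roughly .51 population probability of a son come from the text above:

```python
import math

n = 1500                      # assumed number of kids in the sample
k = round(n * 1.16 / 2.16)    # daughters implied by a female/male ratio of 1.16
p0 = 0.49                     # population share of daughters (1 - .51, from above)
z = (k - n * p0) / math.sqrt(n * p0 * (1 - p0))
# z comes out around 3.7, so a surplus of daughters this large is unlikely
# to be chance at any conventional significance level
```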

Reducing Errors in Survey Analysis

22 May

Analysis of survey data is hard to automate because of the immense variability across survey instruments—different variables, differently coded, and named in ways that often defy even the most fecund imagination. What often replaces complete automation is ad-hoc automation—quickly coded functions, e.g., recoding a variable to lie within a particular range, applied by intelligent people frustrated by the lack of complete automation and bored by the repetitiveness of the task. Ad-hoc automation attracts mistakes, as functions are often coded without rigor, and useful alerts and warnings are usually missing.

One way to reduce mistakes is to prevent them from happening. Carefully coded functions with robust error checking and handling, alerts, and passive verbose outputs that are cognizant of our own biases, and bounded attention, can reduce mistakes. Functions that are used most frequently typically need the most attention.

Let’s use the example of recoding a variable to lie between 0 and 1 in R to illustrate how to code a function. Some things to consider:

  1. Data type: Is the variable numeric, ordinal, or categorical? Let’s say we want to constrain our function to handle only numeric variables. Some numeric variables may be coded as ‘character.’ We may want to seamlessly deal with these issues, and possibly issue warnings (or passive outputs) when improper data types are used.
  2. Range: The range that the variable takes in the data may not span the entire domain. We want to account for that, but perhaps seamlessly by printing out the range that the variable takes and by also allowing the user to input the true range.
  3. Missing Values: A variety of functions we may rely on when recoding a variable may fail (quietly) when confronted with missing values, for example, range(x). We may want to alert the user to the issue but still handle missing values seamlessly.
  4. A user may not see the actual data, so we may want to show the user some of the data by default. Efficient summaries of the data (fivenum, mean, median, etc.) or displaying a few initial items may be useful.

A function that addresses some of the issues:


zero1 <- function(x, minx = NA, maxx = NA) {
  # Coerce character vectors to numeric, warning if coercion introduces NAs
  if (is.character(x)) {
    x <- suppressWarnings(as.numeric(x))
    if (anyNA(x)) warning("NAs introduced by coercion from character.")
  }
  stopifnot(is.numeric(x))
  print(head(x)) # display the first few items
  print(paste("Range:", paste(range(x, na.rm = TRUE), collapse = " "))) # observed range
  # Use the user-supplied range where given; fall back on the observed range
  if (is.na(minx)) minx <- min(x, na.rm = TRUE)
  if (is.na(maxx)) maxx <- max(x, na.rm = TRUE)
  (x - minx)/(maxx - minx)
}

These tips also apply to canned functions available in R (and to those writing them) and to functions in other statistical packages that do not normally display alerts or other secondary information that may reduce mistakes. One can always build on canned functions. For instance, the recode function (car package) can be wrapped so that, by default, it passively displays the correlation between the recoded variable and the original variable.

In addition to writing better functions, one may also want to check post hoc. But a caveat about post hoc checks: post hoc checks are only good at detecting aberrations among the variables you test, and they are costly.

  1. Using prior knowledge:

    1. Identify beforehand how some variables relate to each other. For example, education is typically correlated with political knowledge, race with partisan preferences, etc. Test these hypotheses. In some cases, these can also be diagnostic of sampling biases.
    2. Over an experiment, you may have hypotheses about how variables change across time. For example, constraint typically increases across attitude indices over the course of a treatment designed to produce learning. Test these priors.
  2. Characteristics of the coded variable: If using multiple datasets, check to see if the number of levels of a categorical variable are the same across each dataset. If not, investigate. Cross-tabulations across merged data are a quick way to diagnose problems, which can range from varying codes for missing data to missing levels.

Sort of Sorted but Definitely Cold

18 May

By now students of American Politics have all become accustomed to seeing graphs of DW-NOMINATE scores showing ideological polarization in Congress. Here are the equivalent graphs (we assume two dimensions) at the mass-level.

Data are from the 2004 ANES. Social and Cultural Preferences are from Confirmatory Factor Analysis over relevant items.
[Figures: cult, sw, cult.therm, sw.therm, 3d.therm (preference distributions and R/D thermometer ratings conditional on preferences)]

Here’s how to interpret the graphs:

1) There is a large overlap in preference profiles of Rs and Ds.

2) Conditional on the same preferences, there is a large gap in thermometer ratings. Absent partisan bias, the same preferences should yield about the same R-D thermometer ratings. And this gap is not particularly responsive to changes in preferences within parties.

Sharing Information about Sharing Misinformation

16 May

The Internet has revolutionized the dissemination of misinformation. The easy availability of incorrect information, gullible and eager masses, and the ease of sharing have created fertile conditions for misinformation epidemics.

While a fair proportion of misinformation is likely created deliberately, it may well spread inadvertently. Misinformation that people carry is often no different than fact to them. People are likely to share misinformation with the same enthusiasm as they would fact.

Attitude congenial misinformation is more likely to be known (and accepted as fact), and more likely to be enthusiastically shared with someone who shares the same attitude (for social, and personal rewards). Misinformation considered useful is also more likely to be shared, e.g. (mis)-information about health-related topics.

The chance of acceptance of misinformation may be greater still if people know little about the topic, or if they have no reason to think that the information is motivated. Lastly, these epidemics are more likely to take place among those less familiar with technology.

Cricket: An Unfairly Random Game?

7 May

In many cricket matches, it is claimed that there is a clear advantage to bowling (batting) first. The advantage is pointed to by commentators, and by captains of the competing teams in the pre-toss interview. And sometimes in the post-match interview.

The opportunity to bowl or bat first is decided by a coin toss. While this method of deciding on who is advantaged is fair on average, the system isn’t fair in any one game. At first glance, the imbalance seems inevitable. After all, someone has to bat first. One can, however, devise a baseball-like system where short innings are interspersed. If that violates the nature of the game too much, one can easily create pitches that don’t deteriorate appreciably over the course of a game. Or, one can come up with an estimate of the advantage and adjust scores accordingly (something akin to an adjustment issued when matches are shortened due to rain).

But before we move to seriously consider these solutions, one may ask about the evidence.

Data are from nearly five thousand one-day international matches.

The team that wins the toss wins the match approximately 49.3% of the time. With 5335 matches, we cannot rule out that the true proportion is 50%. Thus, counter to intuition, the effect of winning the toss is, on average, at best minor. This may be so because it is impossible to predict well in advance the advantage of bowling or batting first. Or it may simply be because teams are bad at predicting it, perhaps because they use bad heuristics.
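The ‘cannot rule out 50%’ claim follows from a back-of-the-envelope z-test on the numbers quoted above:

```python
import math

n = 5335
k = round(0.493 * n)        # matches won by the toss winner (~49.3%)
se = math.sqrt(n * 0.25)    # sd of the win count if the true proportion is .5
z = (k - n / 2) / se
# |z| is about 1.0, well short of 1.96, so 50% cannot be rejected
```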

[Figure: time (toss winners’ match outcomes over time)]

No effects across the entire sample may hide subgroup effects. It is often claimed that the toss is more crucial in day-night matches, due to dew and the lower visibility of the white ball under lights. And the data show as much.

[Figure: daynight (toss advantage in day-night matches)]

It may well be the case that the toss is more important in test matches than in one-day matches.

Nudging

5 May

Nudging the mood?
Important, consequential decisions in life are hostage to our mood. What we intend to do (and actually do) often varies by mood. Mood, in turn, can vary for a variety of exogenous reasons – negative swings can be caused by ill health (a headache, or allergies) and positive swings by a nice thing said by someone you meet by accident. This variation is proof of our irrationality. The irrational aspect is not just the misattribution of ill health to mood, but that mood affects our decisions at all. Being aware of the relationship between mood and decisions can allow one to choose better. Given the central place mood occupies in decision making, it is likely that a nudge that affects mood would be powerful.

End of a nudge
One of the paper-towel dispensers I use has the following sticker: ‘These come from trees.’ This is a famous ‘nudge’ (in Sunstein/Thaler terminology). So far so good. Till perhaps a few months ago, I always read the sticker when I used the dispenser. Yesterday I noticed that I had stopped noticing the sticker. This contrasts with my behavior towards the hotel notes about saving water – which I still read. I think that is so partly because there is so much time in a hotel room. Nudges for quick everyday decisions perhaps need to change over time.

On (Modest) Differences In Racial Distribution of Voting Eligible Population and Registered Voters in California

13 Apr

Each election cycle, many hands are waved and spit is launched into the air when the topic of registration rates of Latinos (and other minorities) comes up. And indeed, registration rates of Latinos substantially lag those of Whites. In California, the percentage of eligible Latinos who are registered is 62.8%, whereas the percentage of eligible Whites registered to vote is approximately 72.9%.

This somewhat large difference in registration rates doesn’t automatically translate into (equally) wide distortions in the racial distribution of the eligible population and the registered-voter population. For example, while self-identified Whites constitute 62.8% of the VEP, they constitute marginally more, 64.2%, of the voting-eligible respondents who self-identify as having registered to vote.

Here’s the math:

Assume VEP Pop. = 100
Whites = 63/100; of these 72% register = 45
Latinos = 23/100; of these 62% register = 14
Rest = 14/100; of these 62% register = 9
New Registered Population = 45 + 14 + 9 = 68
Registered: Whites = 66.2; Latinos = 20.6

Source: PPIC Survey (September 2010).
Note: CPS 2008, Secretary of State data confirm this. Voting day population estimates from Exit Poll also show no large distortions.

Some simple math:
For a two-category case, say the proportion in category a = pa
Proportion in category b = 1 - pa

Assume the response rate for category a = qa, and for category b, qb = c*qa

Initial Ratio = pa/(1 - pa)
Final Ratio = pa*qa/[(1 - pa)*qb]

So between time 1 and 2, the ratio changes by a factor of qa/qb = 1/c


T1 Diff. = pa - (1 - pa) = 2pa - 1
T2 Diff. = [pa*qa - (1 - pa)*qb]/[pa*qa + (1 - pa)*qb]
= [pa(qa + qb) - qb]/[pa(qa - qb) + qb]
= [pa*qa(1 + c) - c*qa]/[pa*qa(1 - c) + c*qa]

T2 Diff. - T1 Diff. = [pa*qa(1 + c) - c*qa]/[pa*qa(1 - c) + c*qa] - (2pa - 1)
= [pa*qa(1 + c) - c*qa + pa*qa(1 - c) + c*qa - 2pa(pa*qa(1 - c) + c*qa)]/[pa*qa(1 - c) + c*qa]
= [2pa*qa - 2pa^2*qa + 2pa^2*qa*c - 2pa*c*qa]/[pa*qa(1 - c) + c*qa]
= [2pa*qa(1 - pa + pa*c - c)]/[pa*qa(1 - c) + c*qa]
= [2pa*qa(1 - pa)(1 - c)]/[pa*qa(1 - c) + c*qa]
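A quick numerical check of the closed form for T2 Diff. - T1 Diff., computing both sides from the definitions over random parameter values:

```python
import numpy as np

rng = np.random.default_rng(1)
pa = rng.uniform(0.05, 0.95, 1000)   # proportion in category a
qa = rng.uniform(0.05, 0.95, 1000)   # response rate for category a
c = rng.uniform(0.05, 0.95, 1000)    # qb = c * qa
qb = c * qa

t1 = 2 * pa - 1                                             # T1 Diff.
t2 = (pa * qa - (1 - pa) * qb) / (pa * qa + (1 - pa) * qb)  # T2 Diff.

# closed form for the change in the difference
closed = 2 * pa * qa * (1 - pa) * (1 - c) / (pa * qa * (1 - c) + c * qa)
```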

Diff. in response rates = qa - qb = qa(1 - c)

When will the diff. in response rates be greater than T2 Diff. - T1 Diff.? Assume c < 1 (i.e., qb < qa), so that both sides are positive.

qa(1 - c) > [2pa*qa(1 - pa)(1 - c)]/[pa*qa(1 - c) + c*qa]

Multiply both sides by the (positive) denominator:

qa(1 - c)[pa*qa(1 - c) + c*qa] > 2pa*qa(1 - pa)(1 - c)

Divide both sides by qa(1 - c) > 0, and note that pa*qa(1 - c) + c*qa = qa[pa(1 - c) + c]:

qa[pa(1 - c) + c] > 2pa(1 - pa)
qa > 2pa(1 - pa)/[pa + c(1 - pa)]

Equivalently, solving for c:

c*qa(1 - pa) > 2pa(1 - pa) - pa*qa
c > pa[2(1 - pa) - qa]/[qa(1 - pa)]

When will the diff. in response rates + the initial diff. be greater than T2 Diff.?

qa(1 - c) + (2pa - 1) > [pa*qa(1 + c) - c*qa]/[pa*qa(1 - c) + c*qa]

Since the initial diff. equals T1 Diff., this is just the previous inequality rearranged (qa(1 - c) > T2 Diff. - T1 Diff.), so the condition is the same:

qa > 2pa(1 - pa)/[pa + c(1 - pa)]

Idealog: Creating A Leaky Internet

6 Apr

The recent Wikileaks episode has highlighted the immense control national governments and private companies have over what content can be hosted. Within days of Wikileaks being identified by the U.S. government as a problem, the private companies in charge of hosting and providing banking services to it withdrew support, largely neutering the organization’s ability to raise funds and host content.

Successful attempts to cut off the Internet in Egypt and Libya also pose questions of a similar nature.

So two questions follow. Should anything be done about it? And if so, what? The answer to the first is not clear-cut, but on balance, such absolute discretionary control over the fate of ‘hostile’ information or technology should perhaps not be allowed. As to the second question: given that many of the hosting and banking companies essential to disseminating content are privately held, and susceptible to both government and market pressures, the dissemination engine ought to be as independent of them as possible (bottlenecks remain: most pipes are owned by governments or corporations). Here are three ideas:

  1. Create an international server farm on which content can be hosted by anyone but only removed after due process, set internationally. (NGO supported farms may work as well.)
  2. We already have ways to disseminate content without centralized hosting—P2P. But these systems lack a browser that collates torrents and builds a webpage in real time. Such a torrent-based browser could vastly improve the ability of P2P networks to host content.
  3. For Libya/Egypt etc. the problem is of a different nature. We need applications like Twitter to continue to function even if the artery to central servers goes down. This can be handled by building applications in a manner that they can be run on edge servers with local data. I believe this kind of redundancy can also be useful for businesses.

Measuring Partisan Affect Coldly

24 Mar

Outside of the variety of ways of explicitly asking people how they feel about another group — feeling thermometers, like/dislike scales, favorability ratings — explicit measures asked using mechanisms designed to overcome or attenuate social desirability concerns — bogus pipeline, ACASI — and a plethora of implicit measures — affect misattribution, IAT — there exist a few other interesting ways of measuring affect:

  • Games as measures – Jeremy Weinstein uses games like the dictator game to measure (inter-ethnic) affect. One can use prisoner’s dilemma, among other games, to do the same.
  • Systematic bias in responding to factual questions when ignorant about the correct answer. For example, in most presidential election years since 1988, the ANES has posed a variety of retrospective evaluative and factual questions, including assessments of the state of the economy and whether inflation/unemployment/crime rose, remained the same, or declined in the past year (or some other time frame). Analyses of these questions have revealed significant ‘partisan bias’, but these questions have yet to be used as a measure of the ‘partisan affect’ that is the likely cause of the observed ‘bias’.

Fairly Random

15 Mar

The lottery is a way to assign disproportionate rewards (or punishments) fairly. Procedural fairness—equal chance of selection—provides legitimacy to this system of disproportionate allocation.

Given the purpose of a lottery is unequal allocation, it is essential that we seek informed consent from the participants, and that we only use a lottery in important areas when necessary.

Fairness over the longer term
One particular use of a lottery is in the fair assignment of scarce indivisible resources. For example, think of a good school with only a hundred open seats that receives a thousand applications from candidates who are indistinguishable (or only weakly distinguishable, given the limitations of the data) from each other in matters of ability. One fair way of assigning seats would be to do it randomly.

One may choose to consider the matter closed at this point. However, this means making peace with disproportional outcomes. Alternatives to this option exist. For example, one may ask the winners of the lottery to give back to those who didn’t win, say by sharing the portion of their income attributable to going to a good school, or by producing public goods, or by some other mutually agreed mechanism.

Fair Selection
Random selection is a fair method of selection over objects when we have little or no reason to prefer one over another. When objects are observably – as far as the data can tell us – the same, or similar within some margin, random selection can be seen as fair.

One may extend it to objects that are different but for no discretionary action of theirs, say people with physical or mental disabilities. However, competing concerns, such as lower efficiency, etc., exist. More generally, selection based on some commonly agreed metric, for instance, maximal increase in the public good, may also be considered fair.

As is clear, those who aren’t selected don’t deserve less, and indeed adequate compensation ought to be the formal basis of selection, unless of course rewards once earned cannot be transferred (say lottery to get a liver transplant, which leaves others dead, and hence unable to receive any compensation, though one can imagine rewards being transferred to relatives, etc.).

Idealog: Surveys in Exchange for Temporary Internet Access

21 Feb

Recruiting diverse samples on the Internet is a tough business. Even if one uses probability sampling, one must still think of creative ways of incentivizing response. For example, Knowledge Networks draws samples using RDD (one can also use ABS) and then entices selected respondents with free Internet access (or money) in exchange for filling out surveys each month.

One can extend that concept in the following manner: there are a variety of places where people must wait or have time to spare, where they increasingly have their laptops and cell phones (for example, the airport, the airplane, the dentist’s office, the DMV, the hotel room, etc.), and where they would like to browse the Internet without paying. There are other places where people just want access to the Internet without paying (say, cafes). Hence, one way to recruit people for surveys would be to give free temporary Internet access or local coupon(s) in return for filling out survey(s).

One can extend this method from ‘buying’ temporary Internet access to buying any sort of product (e.g., content) or service.

Some Potential Negatives of Elite Polarization

18 Feb

Growing ideological distance between the parties has produced clearer choices. This added clarity has resulted in improved propensity among voters to make ideologically consistent choices (Levendusky 2010). This is seen as a positive.

However, there may be some negative normative implications as well. If parties have moved away from the center, and if most people are near the center (as the data show), two things follow:
1) The average distance between each of those people and either party has increased. So people’s choices have become impoverished.
2) The penalty of misclassification – for a leaner to mistakenly vote for the wrong party – has increased substantially. It may well be that while the propensity of misclassification has decreased, the penalty has increased, leaving aggregate utility slightly worse off.

Secondly, if the government is at least partly in the business of providing public goods that require collective action (distributed costs), the split nature of constituencies and constituency-based entrenched positions may very likely lead to an under-provision of public goods.

Thirdly, and partly covered in the first point, clearer choices are not necessarily the best choices, or even what people want. One would hope that the choices on offer are optimal, but monopolies and duopolies in sectors with high start-up costs have a sparse record of providing anything like that.

Fourthly, it also follows that, given firm partisans, parties will stop broadening their constituencies beyond a certain point, owing to rapidly diminishing returns in a sorted electorate. This means that the policy buckets shrink, and parties will have larger incentives to cater to their bases.

Fifthly, the legitimacy of the government is likely to be reduced among the losing camp, which has reason to believe that the ruling coalition doesn’t represent it.