Lack of reproducibility is a symptom of science in crisis. An eye-catching symptom, to be sure, but hardly the only one vying for attention. Recent analyses suggest that nearly two-thirds of the (relevant set of) articles published in prominent political science journals condition on post-treatment variables (see here). Another analysis suggests that half of the relevant set of articles published in prominent neuroscience journals treat the fact that one result is significant and another is not as evidence that the difference between the two is itself significant (see here). What is behind this? My guess: poor understanding of statistics, poor editorial processes, and poor strategic incentives.
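For concreteness, here is a minimal sketch of what the two errors look like. It is not from either linked analysis; the simulated variables (T, M, U, Y) and the stylized estimates in the second half are hypothetical, chosen only to make the mechanics visible.

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Error 1: conditioning on a post-treatment variable.
# T is a randomized treatment; U is an unobserved cause of both the
# post-treatment variable M and the outcome Y. The true effect of T on Y is 1.0.
T = rng.integers(0, 2, n)
U = rng.normal(size=n)
M = T + U + rng.normal(size=n)          # post-treatment variable
Y = 1.0 * T + U + rng.normal(size=n)    # outcome

def ols(y, *covariates):
    """Least-squares coefficients of y on an intercept plus covariates."""
    X = np.column_stack([np.ones(len(y)), *covariates])
    return np.linalg.lstsq(X, y, rcond=None)[0]

print("Y ~ T     ->", round(ols(Y, T)[1], 2))      # ~1.0, the right answer
print("Y ~ T + M ->", round(ols(Y, T, M)[1], 2))   # ~0.5, biased by "controlling" for M

# Error 2: treating "A is significant, B is not" as "A and B differ significantly."
# Stylized estimates: A = 25 (SE 10), B = 10 (SE 10).
def z_and_p(estimate, se):
    z = estimate / se
    p = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return round(z, 2), round(p, 2)

print("A:    ", z_and_p(25, 10))                        # z = 2.5,  p ~ 0.01 (significant)
print("B:    ", z_and_p(10, 10))                        # z = 1.0,  p ~ 0.32 (not significant)
print("A - B:", z_and_p(25 - 10, math.hypot(10, 10)))   # z ~ 1.06, p ~ 0.29 (not significant)
```

The first half shows that, under random assignment, the simple regression recovers the treatment effect while adding the post-treatment variable does not. The second half shows that two estimates can sit on opposite sides of the significance threshold even though the difference between them is nowhere near significant.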
- Poor understanding of statistics among authors. This likely stems from:
- Ignorance: It would be harsh to impute bad faith to those who use post-treatment variables as controls or who treat the difference between significant and non-significant results as itself significant. These mistakes are more plausibly explained by ignorance.
- Misinformation (Bad Advice): Ever wonder who the gardener of the garden of forking paths is? Amy Orben finds that some social science books advise readers to p-hack (page 38 of the presentation; Diana Mutz’s book Population-Based Survey Experiments). A small simulation after this list shows why that advice is corrosive.
- Poor understanding of statistics among editors, reviewers, etc. This creates two problems:
- Cannot catch inevitable mistakes: Whatever the failings of authors, their mistakes are not being caught during the review process. (It would be good to know how often reviewers are themselves the source of bad recommendations.)
- Creates Bad Incentives: If editors are misinformed, say they look for significant results, authors will be motivated to deliver exactly that.
- Strategic Incentives: If you know what the right thing to do is but also know that there is a premium on doing the wrong thing (see the “Creates Bad Incentives” point above), you may use a lack of transparency as a way to cater to those bad incentives.
- Psychological biases:
- Motivated Biases: Scientists are likely biased toward their own theories; they want them to be true. This can lead to motivated skepticism and selective scrutiny. The same principle likely applies to reviewers, who catch on to the storytelling and give a freer pass to stories that jibe with their own views.
- Production Pressures: The pressure to produce likely breeds sloppiness. For instance, it is troubling how often retracted articles are cited even after the retraction notice has been published.
- Weak Penalties for Being Sloppy: Without easy ways for others to find mistakes, the expected cost of being sloppy is low.
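To see why advice to wander the forking paths is corrosive, here is a small, hypothetical simulation of one fork: testing many outcomes and reporting whichever one clears p < .05. The sample size, the number of outcomes, and the variable names are my own choices, not anything drawn from the presentation or the book.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, sims = 100, 20, 2_000   # sample size, number of "forks", number of simulations

false_pos_single, false_pos_forked = 0, 0
for _ in range(sims):
    # Under the null: a random "treatment" with no effect on any of the k outcomes.
    t = rng.integers(0, 2, n)
    outcomes = rng.normal(size=(n, k))
    # Two-sample z-statistics for each outcome (a rough approximation is fine for a sketch).
    diff = outcomes[t == 1].mean(axis=0) - outcomes[t == 0].mean(axis=0)
    se = np.sqrt(outcomes[t == 1].var(axis=0, ddof=1) / (t == 1).sum()
                 + outcomes[t == 0].var(axis=0, ddof=1) / (t == 0).sum())
    z = np.abs(diff / se)
    false_pos_single += z[0] > 1.96          # the single outcome you set out to test
    false_pos_forked += (z > 1.96).any()     # report whichever outcome "works"

print("single outcome:", false_pos_single / sims)   # ~0.05
print("pick the best :", false_pos_forked / sims)   # ~0.6 with 20 forks
```

With twenty forks and nothing going on, a “significant” result turns up well over half the time.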
Given these problems, the big solution I can think of is improving training. Another would be programs that highlight some of the psychological biases and bring clarity about the purpose of science. The troubling part is that the most commonly proposed solution is transparency. As Gelman points out, transparency is neither necessary nor sufficient to prevent the “statistical and scientific problems” that underlie “the scientific crisis” because:
- An emphasis on transparency would merely mean the transparent production of noise (last column on page 38).
- Transparency makes it a tad easier to spot errors but doesn’t provide incentives to learn from them. And a good rule of thumb is to fix problems upstream rather than downstream.
Gelman also points out a negative externality of treating transparency as a be-all fix: when the focus is on transparency, secrecy gets conflated with dishonesty.