Lack of reproducibility is a symptom of science in crisis. An eye-catching symptom to be sure, but hardly the only one vying for attention. Recent analyses suggest that nearly two-thirds of the (relevant set of) articles published in prominent political science journals condition on post-treatment variables (see here.) Another set of analysis suggests that half of the relevant set of articles published in prominent neuroscience journals treat difference in significant and non-significant result as the basis for the claim that difference between the two is significant (see here). What is behind this? My guess: poor understanding of statistics, poor editorial processes, and poor strategic incentives.
Poor understanding of statistics: It is likely the primary reason. For it would be good harsh to impute bad faith on part of those who use post-treatment variables as control or treating difference between significant and non-significant result as significant. There is likely a fair bit of ignorance — be it on the part of authors or reviewers. If it is ignorance, then the challenge doesn’t seem as daunting. Let us devise good course materials, online lectures, and teach. And for more advanced scholars, some outreach. (And it may involve teaching scientists how to write-up their results.)
Poor editorial processes: Whatever the failings of authors, they aren’t being caught during the review process. (It would be good to know how often reviewers are actually the source of bad recommendations.) More helpfully, it may be a good idea to create small questionnaires before submission that alert authors about common statistical issues.
Poor strategic incentives: If authors think that journals are implicitly biased towards significant findings, we need to communicate effectively that it isn’t so.