Rocks and Scissors for Papers

17 Apr

Zach and Jack* write:

What sort of papers best serve their readers? We can enumerate desirable characteristics: these papers should

(i) provide intuition to aid the reader’s understanding, but clearly distinguish it from stronger conclusions supported by evidence;

(ii) describe empirical investigations that consider and rule out alternative hypotheses [62];

(iii) make clear the relationship between theoretical analysis and intuitive or empirical claims [64]; and

(iv) use language to empower the reader, choosing terminology to avoid misleading or unproven connotations, collisions with other definitions, or conflation with other related but distinct concepts [56].

Recent progress in machine learning comes despite frequent departures from these ideals. In this paper, we focus on the following four patterns that appear to us to be trending in ML scholarship:

1. Failure to distinguish between explanation and speculation.

2. Failure to identify the sources of empirical gains, e.g. emphasizing unnecessary modifications to neural architectures when gains actually stem from hyper-parameter tuning.

3. Mathiness: the use of mathematics that obfuscates or impresses rather than clarifies, e.g. by confusing technical and non-technical concepts.

4. Misuse of language, e.g. by choosing terms of art with colloquial connotations or by overloading established technical terms.

Funnily, Zach and Jack fail to take their own advice, forgetting to distinguish anecdote from systematic evidence: they claim a ‘troubling trend’ without presenting systematic evidence for it. But the points they make are compelling. The second and third points are especially applicable to economics, though they apply to much of scientific production.


* It is Zachary and Jacob.

Citing Working Papers

2 Apr

Public versions of working papers are increasingly the norm. So are citations to them. But there are three concerns with citing working papers:

  1. Peer review: Peer review improves the quality of papers, but often enough it fails to catch serious, basic issues. So the lack of peer review is not as serious a problem as is often claimed.
  2. Versioning: Which version did you cite? Often, there is no canonical versioning system. The best we have is tracking which conference the paper was presented at. This is not good enough.
  3. Availability: Can I check the paper, code, and data for a version? Often enough, the answer is no.

The solution to the latter two is to increase transparency through the entire pipeline. For instance, people can check how my paper with Ken has evolved on GitHub, including any coding errors that have been fixed between versions. (Admittedly, the commit messages can be improved. Better commit messages—plus descriptions—can make it easier to track changes across versions.)
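For what it’s worth, here is a minimal sketch of one way to make the cited version unambiguous, assuming the working paper lives in a public git repository (the repository path and citation format below are hypothetical):

```python
import subprocess

def cited_version(repo_path: str) -> str:
    """Return the short hash and date of the commit currently checked out."""
    result = subprocess.run(
        ["git", "-C", repo_path, "log", "-1", "--format=%h %ad", "--date=short"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()

# e.g., cite as "Working paper, version <hash> (<date>), <repository URL>"
print(cited_version("path/to/paper-repo"))
```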

The first point doesn’t quite deserve addressing, in that the current system already takes an overly optimistic view of the quality of published papers. Peer review ought not to end when a paper is published in a journal. If we accept that, then all concerns flagged by peers and non-peers can be addressed in commits or in responses to issues, and appropriately credited.

Stemming Link Rot

23 Mar

The Internet gives us many things. But none that are permanent. That is about to change. Librarians got together and recently launched https://perma.cc/, which provides permanent links to online content.

Why is link rot important?

Here’s an excerpt from a paper by Gertler and Bullock:

“more than one-fourth of links published in the APSR in 2013 were broken by the end of 2014”

If what you are citing evaporates, there is no way to check the veracity of the claim. Journal editors: pay attention!
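For editors (or authors) who want to audit a reference list before publication, here is a minimal sketch of a link checker. It assumes the third-party requests library, and the URLs are placeholders:

```python
import requests

# Hypothetical list of URLs cited in a manuscript.
cited_urls = [
    "https://perma.cc/",
    "https://example.com/some-working-paper.pdf",
]

for url in cited_urls:
    try:
        # A HEAD request keeps the check light; a 4xx/5xx status flags rot.
        status = requests.head(url, allow_redirects=True, timeout=10).status_code
        print(url, "OK" if status < 400 else f"BROKEN ({status})")
    except requests.RequestException as err:
        print(url, f"BROKEN ({type(err).__name__})")
```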

Sometimes Scientists Spread Misinformation

24 Aug

To err is human. Good scientists are aware of that, painfully so. The model scientist obsessively checks everything twice over and still keeps eyes peeled for loose ends. So it is a shock to learn that some of us are culpable for spreading misinformation.

Ken and I find that articles with serious errors, even articles based on fraudulent data, continue to be approvingly cited—cited without any mention of any concern—long after the problems have been publicized. Using a novel database of over 3,000 retracted articles and over 74,000 citations to these articles, we find that at least 31% of the citations to retracted articles happen a year after the publication of the retraction notice. And that over 90% of these citations are approving.
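For readers curious about how such shares can be computed, here is a minimal sketch. It assumes a CSV of citations with citation dates, retraction-notice dates, and a hand-coded flag for whether the citation is approving; the file and column names are illustrative, not the actual data:

```python
import pandas as pd

# Hypothetical columns: cited_doi, citation_date, retraction_date, approving (True/False)
cites = pd.read_csv("citations_to_retracted_articles.csv",
                    parse_dates=["citation_date", "retraction_date"])

# Citations made a year or more after the retraction notice was published.
late = cites[cites["citation_date"] >= cites["retraction_date"] + pd.DateOffset(years=1)]

print("Share of citations a year or more post-retraction:",
      round(len(late) / len(cites), 2))
print("Share of those citations that are approving:",
      round(late["approving"].mean(), 2))
```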

What gives our findings particular teeth is the role citations play in science. Many, if not most, claims in a scientific article rely on work done by others. And scientists use citations to back such claims. Readers rely on scientists to note any concerns that impinge on the underlying evidence for the claim. And when scientists cite problematic articles without noting any concerns, they very plausibly misinform their readers.

Though 74,000 is a large enough number to be deeply concerning, retractions are relatively infrequent, and that may lead some people to discount these results. But citations to retracted articles post-retraction are extremely revealing precisely because retraction is a low, low bar. Retractions are often the result of convincing evidence of serious malpractice, generally fraud or a grave error. Lesser problems, for example, a questionable data analysis, are usually left to self-correct. So if scientists are approvingly citing retracted articles after they have been retracted, they have failed to clear even this low bar. Such failure suggests a broader malaise.

To investigate the broader malaise, Ken and I exploited data from an article published in Nature that notes a statistical error in a series of articles published in prominent journals. Once again, we find that approving citations to the erroneous articles persist after the error has been publicized. If anything, the rate of citation to the erroneous articles is higher after the error has been publicized, and 98% of those citations are approving.

In all, it seems, we are failing.

The New Unit of Scientific Production

11 Aug

One fundamental principle of science is that there is no privileged observer. You get to question what people did. But to question, you first must know what people did. So part of good scientific practice is to make it easy for people to understand how the sausage was made—how the data were collected, transformed, and analyzed—and ideally, why you chose to make the sausage that particular way. Papers are OK places for describing all this, but we now have better tools: version-controlled repositories with notebooks and README files.

The barrier to understanding is not just a lack of information, but also poorly organized information. There are three different arcs of information: cross-sectional (where everything is and how the pieces relate to one another), temporal (how the pieces evolve over time), and inter-personal (who is making the changes). To be organized cross-sectionally, you need to be macro-organized (where the data are, where the scripts are, what each script does, how to find out what the data mean, etc.) and micro-organized (each script has a clear logic and structure, which also means following good coding style). Temporal organization in version control simply requires meaningful commit messages. And inter-personal organization requires no effort at all, beyond the logic of pull requests.
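To make the macro-organization point concrete, here is a minimal, hypothetical project skeleton; the file names and layout are one reasonable choice, not a standard:

```python
from pathlib import Path

# Hypothetical skeleton: data, scripts, and documentation each in a predictable
# place, so a reader can tell at a glance where everything lives.
layout = {
    "README.md": "What the project does, how to reproduce it, and in what order to run things.",
    "data/README.md": "Where each dataset came from and what each column means.",
    "scripts/01_clean.py": "# Reads data/raw/, writes data/clean/.",
    "scripts/02_analyze.py": "# Reads data/clean/, writes tables/ and figures/.",
    "notebooks/README.md": "Exploratory notebooks; scratch work kept separate from the pipeline.",
}

for name, placeholder in layout.items():
    path = Path(name)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(placeholder + "\n")
```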

The obvious benefits of this new way are well known. What is less discussed is that it allows you to critique specific pull requests and decisions made in particular commits. This provides an entirely new way to make progress in science. The new unit of science also means that we don’t just dole out credit in a crude currency like journal articles; we can also provide lower denominations. We can credit each edit, each suggestion. And why not? A third big benefit is that we can build epistemological trees in which the logic of disagreement is clear.

The dead tree edition is dead. It is also time to retire the e-version of the dead tree edition.

Sigh-tations

1 May

In 2010, Google estimated that approximately 130M books had been published.

As a species, we still know very little about the world. But what we know already far exceeds what any of us can learn in a lifetime.

Scientists are acutely aware of this point. They must specialize, as the chances of learning all the key facts about anything but the narrowest of domains are slim. They must also resort to shorthand to communicate what is known and what is new. The shorthand they use is the citation. However, this vital building block of science is often rife with problems. The three key problems with how scientists cite are:

1. Cite in an imprecise manner. This broad claim is supported by X. Or, our results are consistent with XYZ. (‘Our results are consistent with’ reflects directional thinking rather than thinking in terms of effect size. That means all sorts of effects are ‘consistent,’ even those 10x as large.) For an example of how I think work should be cited, see Table 1 of this paper.

2. Do not carefully read what they cite. This includes misstating key claims and citing retracted articles approvingly (see here). The corollary is that scientists do not closely scrutinize the papers they cite, with the extent of scrutiny explained by how much they agree with the results (see the next point). For a provocative example, see here.

3. Cite in a motivated manner. Scientists ‘up’ the thesis of articles they agree with, for instance, misstating correlation as causation. And they blow up minor methodological points in articles whose results are ‘inconsistent’ with their own. (A brief note on motivated citations: here.)

How Do We Know?

17 Aug

How can fallible creatures like us know something? The scientific method is about answering that question well. To answer the question well, we have made at least three big innovations:

1. Empiricism. But no privileged observer. What you observe should be reproducible by all others.

2. Openness to criticism: If you are not convinced by the method of observation or the claims being made, criticize. Offer reason or proof.

3. Mathematical Foundations: Reliance on math or formal logic to deduce what claims can be made if certain conditions are met.

These innovations, along with two more, have allowed us to ‘scale.’ Foremost among the innovations that allow us to scale is our ability to work together. And our ability to preserve information on stone, paper, and electrons allows us to collaborate with, and build on the work done by, people who are now dead. The same principle that allows us to build a structure as gargantuan as the Hoover Dam, and entire cities, allows us to learn about complex phenomena. And that takes us to the final principle of science.

Peer to Peer

20 Mar

Peers are equals, except as reviewers, when they are more like capricious dictators. (Or when they are members of a peerage.)

We review our peers’ work because we know that we are all fallible. And because we know that the single best way we can overcome our own limitations is by relying on well-motivated, informed others. We review to catch what our peers may have missed, to flag important methodological issues, and to provide suggestions for clarifying and improving the presentation of results, among other things. But given a disappointingly long history of capricious reviews, authors need assurance. So consider including in your next review a version of the following note:

Reviewers are fallible too. So this review doesn’t come with an implied contract to follow every suggestion, however ill-advised, or suffer the consequences. If you disagree with something, I would appreciate a small note. But rejecting a bad proposal is as important as accepting a good one.

Fear no capriciousness. And I wish you well.

Motivated Citations

13 Jan

The best kind of insight is the ‘duh’ insight—catching something that is exceedingly common, almost routine, but something that no one talks about. I believe this is one such insight.

The standards for citing congenial research (research that supports the author’s preferred hypothesis) are considerably lower than the standards for citing uncongenial research. It is an important kind of academic corruption. And it means that the prospects of teleological progress toward truth in science, as currently practiced, are bleak. An alternative ecosystem that provides objective ratings for each piece of research is likely to be more successful. (As opposed to the ‘echo-system’—here are the people who find stuff that ‘agrees’ with what I find—in place today.)

An empirical implication of the point is that the average ranking of the journals in which cited congenial research is published is likely to be lower than the average ranking of the journals in which cited uncongenial research is published. Though, for many of the ‘conflicts’ in science, all sides will have top-tier publications—which is to say that the measure is somewhat crude.
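One way to operationalize this test, assuming each citation has been hand-coded as congenial or uncongenial and matched to a journal ranking (the file and column names below are hypothetical):

```python
import pandas as pd

# Hypothetical columns: citing_doi, cited_journal, journal_rank (1 = best), congenial (True/False)
cites = pd.read_csv("coded_citations.csv")

# If the argument above is right, congenial citations should point, on average,
# to lower-ranked (numerically larger rank) journals than uncongenial citations.
print(cites.groupby("congenial")["journal_rank"].mean())
```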

The deeper point is that readers generally do not judge the quality of the work cited in support of specific arguments, taking many of the arguments at face value. This, in turn, means that the role of journal rankings is somewhat limited. Or, more provocatively, to improve science we need to make sure that even research published in low-ranked journals is of sufficient quality.

The Case for Ending Closed Academic Publishing

21 Mar

A few commercial publishers publish a large chunk of top-flight academic research. And they earn a pretty penny doing so. The standard operating model of these publishers is as follows: pay the editorial board no more than $70-$100k, pay for typesetting and publishing, and in turn get the copyrights to the papers. Then charge already locked-in institutional customers—university and government libraries—and ordinary scholars extortionate rates. The model is gratuitously dysfunctional.

Assuming there are no long-term contracts with the publishers, the system ought to be rapidly dismantled. But if dismantling is easy, creating something better may not seem to be. It just happens to be. A majority of the cost of publishing is in printing on paper, and the twenty-first century has made printing large, organized bundles on paper largely obsolete; those who need a paper copy can print one at home. Beyond that, open-source software for administering a journal already exists. And the model of a single editor with veto power seems anachronistic; editing duties can be spread around, much like peer review. Unpaid peer review can survive as it always has, though better mechanisms can be thought about. If some money is still needed for administration, it could easily be raised by charging a nominal submission tax, waived where the author self-identifies as being unable to pay.