countpy: Incentivizing more and better software

22 Mar

Developers of Python packages sometimes envy R developers for the simple perks they enjoy, like a reliable web service that gives a reasonable estimate of the total number of times an R package has been downloaded. To get the same number, Python developers need to run a Google BigQuery query (which costs money) and wait 30 or so seconds.

Then there are sore spots that all developers share. Downloads are a shallow metric. Developers often want to know how often other people writing software use their package. Without such a number, it is hard to defend against objections like “the total number of downloads is unreliable because it can be padded by numerous small releases” or “the total number of downloads doesn’t reflect how often people use the software.” We partly solve this problem for Python developers by providing a website that tallies how often a package is used in repositories on GitHub, the largest open-source software hosting platform. countpy provides the total number of times a package is listed in requirements files and named in import statements in Python-language repositories. (At the time of writing, the crawl is incomplete.)
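To make the tally concrete, here is a minimal sketch of how such counts could be computed. It is not countpy's actual code. The function names, the input layout (one dict per repository with a `requirements` string and a list of `sources`), and the choice to count a package at most once per repository are all assumptions made for illustration.

```python
import ast
import re
from collections import Counter


def imported_packages(source):
    """Return the top-level package names imported in one Python source string."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports such as "from . import x".
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return names


def required_packages(requirements_text):
    """Return lower-cased package names listed in a requirements file."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        # Skip blanks, pip options (-r, -e, ...), and URL requirements.
        if not line or line.startswith("-") or "://" in line:
            continue
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(match.group(0).lower())
    return names


def tally(repos):
    """Count, per package, the repositories that require it and that import it.

    Assumption: each repository counts at most once per package.
    """
    requirement_counts = Counter()
    import_counts = Counter()
    for repo in repos:
        requirement_counts.update(required_packages(repo["requirements"]))
        imported = set()
        for source in repo["sources"]:
            try:
                imported |= imported_packages(source)
            except SyntaxError:
                continue  # unparsable file (e.g. Python 2 syntax): skip it
        import_counts.update(imported)
    return requirement_counts, import_counts
```

One caveat with any approach like this: the name in a requirements file is the distribution name, which can differ from the name used in import statements (scikit-learn is imported as `sklearn`), so the two counts do not always line up without a mapping between them.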

The net benefit (or loss) of a piece of software is, of course, more than a count of how many people use it directly in the software they build. We don’t yet count indirect use: software that uses software that uses the software of interest. Ideally, we would like to tally the total time saved, the number of new projects that wouldn’t have started had the software not been there, the impact on the style in which other code is written, and so on. We may also need to tally the cost of errors in the original software. To the extent that people don’t produce software because they can’t be credited reasonably for it, better metrics of the impact of software can increase both the amount of software produced and its quality.
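As a rough illustration of what counting indirect use would involve, the sketch below walks a dependency map outward from one package. It is hypothetical throughout: the `direct_deps` mapping (project name to the set of packages it uses directly) and the function name are assumptions, and the post does not describe any such feature as built.

```python
from collections import deque


def direct_and_indirect_users(direct_deps, target):
    """Split the projects that depend on `target` into direct and indirect users.

    direct_deps maps each project to the set of packages it uses directly.
    """
    # Invert the map: package -> projects that use it directly.
    used_by = {}
    for project, deps in direct_deps.items():
        for dep in deps:
            used_by.setdefault(dep, set()).add(project)

    direct = set(used_by.get(target, ())) - {target}

    # Breadth-first walk over the inverted map collects every project
    # that reaches `target` through one or more hops.
    seen = set()
    queue = deque([target])
    while queue:
        current = queue.popleft()
        for user in used_by.get(current, ()):
            if user != target and user not in seen:
                seen.add(user)
                queue.append(user)

    return direct, seen - direct


deps = {
    "app": {"lib"},
    "lib": {"core"},
    "tool": {"core"},
    "other": {"numpy"},
}
direct, indirect = direct_and_indirect_users(deps, "core")
```

Here `lib` and `tool` use `core` directly, while `app` uses it only through `lib`, so `app` is the indirect user a count of direct use would miss.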