countpy: Incentivizing more and better software

22 Mar

Developers of Python packages sometimes envy R developers for the simple perks they enjoy. One example is a reliable web service that gives a reasonable indication of the total number of times an R package has been downloaded (albeit only from one of the mirrors). To get the same number, Python developers need to run a query on Google BigQuery (which costs money) and wait for 30 or so seconds.

Then there are sore spots that are shared by all developers. Downloads are a shallow metric. Developers often want to know how often people use their packages. Without such a number, it is hard to defend against accusations like, “the total number of downloads is unreliable because they can be padded by numerous small releases,” “the total number of downloads doesn’t reflect how often people use the software,” etc. We partly solve this problem for Python developers by providing a website that tallies how often a package is used in repositories on GitHub, the largest open-source software hosting platform. http://countpy.com (Defunct. Code.) provides the total number of times a package appears in requirements files and in import statements in files in Python language repositories.
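To make the two counts concrete, here is a minimal sketch of how such a tally could work. It is not the countpy code. The function names, the shape of the `repos` input, and the choice to count a package once per requirements file and once per source file that imports it are all assumptions made for illustration.

```python
import ast
import re
from collections import Counter


def imported_packages(source: str) -> set[str]:
    """Return the top-level package names imported by one Python source file."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level == 0 skips relative imports such as "from . import x"
            names.add(node.module.split(".")[0])
    return names


def required_packages(requirements_text: str) -> set[str]:
    """Return the lowercased package names listed in a requirements file."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-") or "://" in line:
            continue  # skip blanks, pip options, and URL requirements
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.add(match.group(0).lower())
    return names


def tally(repos):
    """Tally package mentions across repositories.

    `repos` is a hypothetical input format: an iterable of dicts with a
    'requirements' string and a 'files' list of Python source strings.
    """
    in_requirements, in_imports = Counter(), Counter()
    for repo in repos:
        in_requirements.update(required_packages(repo.get("requirements", "")))
        for source in repo.get("files", []):
            try:
                in_imports.update(imported_packages(source))
            except SyntaxError:
                continue  # skip files that do not parse
    return in_requirements, in_imports
```

One caveat with any approach like this: the name in a requirements file and the name in an import statement can differ (for example, `scikit-learn` is imported as `sklearn`), so the two counts need to be mapped to each other before they can be combined.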

The net benefit (loss) of a piece of software is, of course, greater than what is tallied by counts of how many people use it directly in the software they build. We don’t yet count indirect use: software that uses software that uses the software of interest. Ideally, we would like to tally the total time saved, the increase in the number of projects started, projects that wouldn’t have started had the software not been there, the impact on the style in which other code is written, and such. We also want to tally the cost of errors in the original software. To the extent that people don’t produce software because they can’t be credited reasonably for it, better metrics about the impact of software can increase the production of software and increase the quality of the software that is being provided.
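Indirect use, at least, is mechanical to count once direct use is known. The sketch below is one possible approach, not something countpy does: it assumes a hypothetical `dependencies` mapping from each project to the set of packages it uses directly, and walks that mapping in reverse to collect every project that depends on a package, directly or through other projects.

```python
from collections import deque


def all_users(dependencies, package):
    """Return every project that uses `package` directly or indirectly.

    `dependencies` is a hypothetical mapping: project -> set of packages
    that the project uses directly.
    """
    # Reverse the mapping: package -> projects that use it directly.
    users = {}
    for project, deps in dependencies.items():
        for dep in deps:
            users.setdefault(dep, set()).add(project)

    # Breadth-first walk up the reversed graph; `seen` guards against cycles.
    seen = set()
    queue = deque([package])
    while queue:
        current = queue.popleft()
        for user in users.get(current, ()):
            if user not in seen and user != package:
                seen.add(user)
                queue.append(user)
    return seen
```

Subtracting the direct users from the result leaves the projects that use the package only indirectly.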