Code You Can Quote

GitHub, FigShare, and Mozilla join forces to make programming code fit for academic citation.

"Software is eating the world," declared entrepreneur Marc Andreessen in 2011, arguing that software-based businesses will disrupt many industries in the years to come.

Nowhere is that disruption more apparent than in academia, where online courses are transforming established traditions of education and even non-technical academic disciplines are becoming increasingly data-driven.

But software poses a problem in a world founded upon literature and published research: It isn't easily cited in a way that's meaningful to academic standards and expectations. At the same time, as more and more research papers and projects incorporate and rely upon computer code, researchers who create code find their career prospects constrained when their contributions are not documented in the accepted manner.

"Every level of academia is becoming more computational," explained Mark Hahnel, founder of FigShare, in a phone interview. "There are a lot of post-docs who create code but don't get credit for their work."

To help remedy the situation, FigShare, an online research sharing and citation service, has partnered with online code repository GitHub and Mozilla Science Lab, a Mozilla Foundation project supporting open science, so that those who write software get credit for their code in published research.

Through the partnership, programmers will be able to archive code created for research projects in a public GitHub repository and receive a citable digital object identifier (DOI) through FigShare. The resulting code will thus represent "research output" and will exist in a publicly accessible space where others can reuse it or, if necessary, use it to reproduce experimental results.
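To make the idea concrete, here is a minimal Python sketch of how a DOI turns archived code into something a paper can cite. It is an illustration only, not FigShare's or GitHub's actual tooling: the SoftwareCitation class, its field names, the citation layout, and the DOI value are all hypothetical. The one thing it relies on from the real world is that a DOI can be resolved by appending it to https://doi.org/.

```python
from dataclasses import dataclass


@dataclass
class SoftwareCitation:
    """Hypothetical record for citing archived research code."""

    authors: str
    year: int
    title: str
    publisher: str
    doi: str  # made-up value in the example below, not a real DOI

    def doi_url(self) -> str:
        # A DOI resolves through the public doi.org resolver.
        return f"https://doi.org/{self.doi}"

    def as_text(self) -> str:
        # One plausible citation layout; journals set their own styles.
        return (
            f"{self.authors} ({self.year}). {self.title} [Software]. "
            f"{self.publisher}. {self.doi_url()}"
        )


# Example with placeholder values (the author, title, and DOI are invented).
citation = SoftwareCitation(
    authors="Doe, J.",
    year=2014,
    title="analysis-scripts",
    publisher="FigShare",
    doi="10.1234/example.5678",
)
print(citation.as_text())
```

Because the identifier rather than the repository address goes into the paper, the citation can stay valid even if the code later moves.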

However, traditional publications don't make it easy to reproduce experiments. "A significant portion of research is technically impossible to reproduce," says Kaitlin Thaney, director of Mozilla Science. A reliable way to cite the code used in experiments could change that.

Hahnel observes that the initiative comes at a time when there's a push, particularly in government and policy circles, to make data more available. Last month, PLOS announced a new data-sharing policy for its various journals: "Authors must make all data publicly available, without restriction, immediately upon publication of the article."

With any luck, DOI-documented code will help ensure that experimental output is scientifically meaningful rather than dubious data.

It turns out there's quite a bit of dubious data: Nature last month reported that publishers Springer and IEEE decided to withdraw more than 120 papers from their subscription services after a French researcher found the works were "computer-generated nonsense." The papers were generated by software called SCIgen, created by MIT researchers in 2005 to demonstrate that academic conferences would accept gibberish.

Software may be eating the world, but blame human nature for the quality of the food.