On the scale of the federal government, it should come as no surprise that open source innovation would be a messy affair. Open source doesn't lend itself to central planning. If the consumers of code have complete freedom, including the freedom to fork -- or to abandon an open source project and try to create something better on their own -- then they will tend to do so.
Consider the case of Pillbox, an initiative of the National Library of Medicine, which is part of the National Institutes of Health and ultimately part of the U.S. Department of Health and Human Services. Pillbox is a search engine for medicines, a tool for identifying loose pills by shape, size, color, and the text imprinted on the outside of the capsule. Working with the Veterans Administration's huge pharmacy system, the Pillbox team captures pill images and matches them with products from the drug label database maintained by the U.S. Food and Drug Administration. The first version of the tool came out in 2010, and since then it has been opened up further with API documentation and code releases on GitHub.
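To make the idea of identifying a pill by its physical attributes concrete, here is a minimal Python sketch of that kind of lookup. It is not the Pillbox code or its API: the `Pill` record, the `find_pills` function, the size tolerance, and the sample data are all hypothetical, invented only to illustrate filtering on shape, color, size, and imprint.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Pill:
    """Hypothetical record; the field names are assumptions, not the Pillbox schema."""
    name: str
    shape: str
    color: str
    size_mm: float
    imprint: str


# Invented sample data for illustration only; these are not real drug products.
SAMPLE_PILLS = [
    Pill("Example Drug A", "round", "white", 8.0, "EX 10"),
    Pill("Example Drug B", "oval", "blue", 12.0, "EX 20"),
    Pill("Example Drug C", "round", "white", 10.0, "ZZ 5"),
]


def normalize(text: str) -> str:
    """Lowercase and drop whitespace so 'EX 10' and 'ex10' compare equal."""
    return "".join(text.lower().split())


def find_pills(
    pills: Iterable[Pill],
    shape: Optional[str] = None,
    color: Optional[str] = None,
    imprint: Optional[str] = None,
    size_mm: Optional[float] = None,
    size_tolerance_mm: float = 1.0,
) -> List[Pill]:
    """Return the pills that match every attribute the caller supplied."""
    matches = []
    for pill in pills:
        if shape is not None and normalize(pill.shape) != normalize(shape):
            continue
        if color is not None and normalize(pill.color) != normalize(color):
            continue
        # A partial imprint is accepted, since imprints are often hard to read.
        if imprint is not None and normalize(imprint) not in normalize(pill.imprint):
            continue
        if size_mm is not None and abs(pill.size_mm - size_mm) > size_tolerance_mm:
            continue
        matches.append(pill)
    return matches


if __name__ == "__main__":
    for pill in find_pills(SAMPLE_PILLS, shape="Round", color="White"):
        print(pill.name)
```

Each attribute narrows the candidate list, which is why a partial description can still be useful: a caller who knows only the shape and color gets a short list to check against the pill images.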
At some point, the Pillbox project was probably doing more to make FDA data organized and publicly accessible than the FDA itself. The FDA announced its OpenFDA initiative in June, after catching up on a paperwork backlog. The FDA's first open data API is for adverse drug event reports, but it's meant to be the first in a series of open data initiatives from the agency.
When I met Pillbox project manager David Hale at the Federal Big Data Summit earlier this summer, he was hopeful that the two projects would prove complementary. He also complimented the FDA on the way it had used the Amazon cloud for data processing, storage, and elastic search. "That is absolutely groundbreaking, not just within government but especially within the FDA -- showing that it's okay to use something like the cloud and that there are immediate benefits to doing so," he said.
At the same time, although he was circumspect discussing the politics of interagency cooperation and competition, I could tell he also had some concerns about whether the FDA would wind up duplicating his efforts unnecessarily. The Pillbox project has been politically fraught all along, sometimes running afoul of concerns within his own agency that it was "not scientific" in the mode of most NLM research and might open the agency up to liability. The Pillbox site was taken offline a couple of times as a result, and the version that's live now is plastered with disclaimer messages that the data shouldn't necessarily be counted on for life-and-death decisions. Yet emergency room doctors have used the tool to identify pills that a patient used in an attempted overdose. Through the API, Pillbox data also has been incorporated into the pill identification feature on the Drugs.com website and into mobile apps for use by doctors and first responders.
Some federal open data initiatives make the mistake of believing that just making the data available or just providing an API is enough. But what those in government can really contribute is an understanding of how the data is structured and what it means. By making the Python code behind Pillbox available as open source, Hale believes he is conveying more of that information -- and allowing outsiders to see the assumptions baked into how the software does what it does.
Along the way, Hale believes he has learned a lot about how to scrub the data he pulls in from the FDA and other sources to make it more useful (more explanation in the video included below). He doesn't want to see that work go to waste.
Yet almost by definition, open source projects inside or outside of government invite developers who think they have a better idea to blaze their own path. You could argue that it's wasteful for the developers of WordPress and Drupal to invest so much effort in solving many of the same problems in Web content management -- but each platform also solves problems that the other does not.
Damon Davis, the director of the Health Data Initiative and an official of the HHS CTO's office, acknowledged that tension while participating in a panel discussion along with Hale at the Federal Big Data Summit. "It does happen sometimes that there are two very similar projects happening at the same time," he said. "A lot of times the leaders of those efforts are very open to merging their projects together." However, forcing the issue doesn't make sense -- sometimes two projects that superficially look redundant are actually pursuing distinct goals, he said.
"We have to understand, too, what the differences are," Davis said. "It's a major, major ball of yarn that can be tough to unwind."
A light touch is probably the right touch, encouraging cooperation where it makes sense while understanding the role of healthy competition.