While the Obama Administration might talk a lot about transparency, government agencies face any number of challenges to turn that buzzword into reality, especially in terms of getting usable data online.
According to Mary McCaffery, senior advisor to the Environmental Protection Agency's CIO and a member of the groups setting up data.gov and recovery.gov -- two sites designed to put general government data and stimulus plan data online, respectively -- federal CIO Vivek Kundra has already defined 76 different data feeds he wants to put online.
It's an admirable pursuit. Much more data should be on the way, but how to handle all this data will prove trickier. Here are a number of challenges the government faces, and my take on them.
How much information should the government share? What information should the government share?
As a reporter, I guess I'm biased toward the release of more information rather than less, but frankly, the more information, the better. There's always going to be someone interested in geographic data about broadband penetration and another person interested in water quality data from Michigan in the 1940s. As long as it isn't sensitive, personally identifiable or classified, there's room for it to be made available. However, an abundance of data leads to another question.
How should the government organize the data?
It's unclear whether the government should point people to agency websites, as recovery.gov does by pointing them to state recovery websites, or whether everything should just be posted centrally. Linking out to state-run websites from recovery.gov could make it harder for citizens to digest information, as Maine's recovery site might be organized differently than Oregon's. It might be prudent, then, to create some standard organizational hierarchies to make the data more navigable.
However, even with well-structured data, citizens will come up against information overload. Additional tools will likely be needed to search, tag, and build on the data that is made available. Many of those tools will likely be created by developers, who will rely on APIs that the government will have to develop.
In what format should the government share information?
Data stored in standard databases or presented in XML is easily parsed by software, while graphical data is easily viewed by people. Discuss.
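To make the "easily parsed" point concrete, here is a minimal sketch in Python of what a developer might do with an XML feed. The feed itself is hypothetical: the element name `reading`, its attributes, and the numbers in it are invented for illustration and do not come from data.gov or any agency. Only the standard library's `xml.etree.ElementTree` parser is assumed.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: element names, attributes, and values are made up
# for illustration and are not real agency data.
SAMPLE_FEED = """\
<readings>
  <reading state="MI" year="1948" contaminant="lead" ppb="12.5"/>
  <reading state="MI" year="1949" contaminant="lead" ppb="9.0"/>
  <reading state="OR" year="1948" contaminant="lead" ppb="4.2"/>
</readings>
"""


def average_ppb(xml_text, state):
    """Return the mean ppb value for one state, or None if it has no readings."""
    root = ET.fromstring(xml_text)
    values = [
        float(reading.get("ppb"))
        for reading in root.iter("reading")
        if reading.get("state") == state
    ]
    return sum(values) / len(values) if values else None


if __name__ == "__main__":
    print(average_ppb(SAMPLE_FEED, "MI"))
```

A few lines are enough to filter and summarize structured records like these. The same numbers published only as a chart image would be easy for a person to read but would give a developer nothing to query.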
Where should the data be hosted?
Right now, government data is everywhere, dispersed throughout dozens of government agencies, stored everywhere from tapes to mainframes. It's unclear whether data should be centralized, hosted in clouds, or...
Who pays for this?
It's unclear where the money is going to come from to put all this data online. It's not a trivial task, and it will cost lots of money and require a significant amount of manpower.
These are just the potential problems with data transparency discussed in a short conversation with McCaffery and others at Government 2.0 Camp, but there are surely many others that could arise. What other issues might data.gov and its ilk face?