Exchanging research data worldwide gets more complex in the big data age. But the Research Data Alliance seeks innovative solutions.
The Croke Park Stadium in Dublin is usually the scene of hurling, Gaelic football, and other sports. But for four days at the end of March, it hosted 497 researchers and data scientists from 31 countries, brought together by the Research Data Alliance and working intensively on a question that affects data users in academia, private enterprise, and the public sector: How do we share and exchange research data worldwide more effectively in an age when data is growing exponentially?
The Research Data Alliance (RDA) is a new international organization that, in one short year, has attracted the attention of an international data community of more than 1,500 members from more than 70 countries. RDA members gather in working groups to create, adopt, and use infrastructure for sharing research data, and meet in interest groups that identify and advance the discussion on data sharing within a variety of domains and communities.
The public sector has an important stake in these discussions. Federal agencies generate a vast amount of agriculture, climate, health, science, and space data that researchers depend upon and contribute to. But many in the data community share a concern about how to host and share that data effectively over the long term in order to maximize return on investment.
That's one reason the RDA Plenary has become such an important forum. Members meet face-to-face twice yearly at the RDA Plenary, which is simultaneously a working meeting, a data community "town hall," and a venue for compelling presentations from community and world leaders.
The Plenary in Dublin was the third plenary of the RDA. Previously, the group met at the National Academies in Washington, D.C., and for RDA's launch in Gothenburg, Sweden.
RDA's momentum comes in part from its focus on data infrastructure and its potential to enhance the value of sponsored research. Examples of data infrastructure include tools for data discovery, interoperability frameworks, and data type registries, as well as community standards, adopted policy, and shared practice. RDA working groups must deliver specific value to the data-sharing community; in practice, this means both adopters and users must be identified and included in a working group when it is proposed. In Dublin, more than three dozen working groups, interest groups, and "birds of a feather" groups jockeyed for breakout rooms and bandwidth.
One new group at the Dublin Plenary, for instance, was the Data Repositories Interest Group, a group of US and international repositories that host publicly accessible research data for diverse research communities (e.g., astronomy, social science, and biology). The interest group shared best practices for repository management and is working to identify and develop models for organizational sustainability. This work is especially important given growing expectations in the US and other countries that publicly funded research data be made publicly available.
In another example, the newly minted Wheat Data Interoperability Working Group is contributing common standards, vocabularies, and an interoperability framework to the Global Wheat Initiative Information System. By helping combine genomic annotations, phenotypes, genetic maps, physical maps, and germplasm data, the working group's infrastructure will help researchers and decision makers improve agricultural productivity and answer questions such as "What genes and traits are relevant for understanding the impact of climate change on wheat plant productivity?"
Consistent with the RDA's focus on infrastructure use and adoption, the infrastructure developed by the RDA Wheat Data Interoperability Working Group will be adopted within the Global Forum on Agricultural Research (GFAR), the Consultative Group on International Agricultural Research (CGIAR), and the Coherence in Information for Agricultural Research for Development (CIARD) movement, which is striving to open up access to agricultural knowledge worldwide.
The quality of the discussions was also reflected in this year's speakers, who included leaders in the data community as well as major stakeholders. Research keynoters included Tony Hey, VP of Microsoft Research Connections, who discussed data science; and Milena Žic Fuchs, chair of the Standing Committee for the Humanities in the European Science Foundation, who discussed digital humanities. Stakeholder keynoters included John Henry, the Irish Minister of State; Mark Ferguson, director general of Science Foundation Ireland and chief scientific advisor to the government of Ireland; and Ian Chubb, Australia's chief scientist. US representatives from the National Science Foundation, the National Institute of Standards and Technology, and the Department of Energy also participated.
Both the RDA and its twice-yearly plenaries are emerging as places where researchers, funding agencies, organizations, and companies meet, exchange ideas, and form common agendas. At Plenary 2 in Washington, for instance, multiple data organizations used the RDA Plenary to form a common agenda around data citation, laying the groundwork for greater access to data described in scholarly publications and greater recognition of data scientists in the research community.
Research data is increasingly recognized as a key driver of innovation, but only when there is sufficient infrastructure to access, share, and preserve the data. RDA's efforts come at a time when coordinated community effort is critical to building the data ecosystem needed for low-barrier data access and sharing across scales, technologies, and cultures.
Dr. Francine Berman is chair of Research Data Alliance (US) and the Edward P. Hamilton distinguished professor in computer science at Rensselaer Polytechnic Institute. In 2009, Berman was the inaugural recipient of the ACM/IEEE-CS Ken Kennedy Award for "influential leadership in the design, development, and deployment of national-scale cyberinfrastructure." Berman is former director of the San Diego Supercomputer Center and currently serves as co-chair of the National Academies Board on Research Data and Information.
Mark Parsons is the secretary general of the Research Data Alliance, a global organization devoted to reducing barriers to data sharing. He has been leading major data stewardship efforts for more than 20 years. Prior to joining RDA, Parsons was a senior associate scientist and the lead project manager at the National Snow and Ice Data Center (NSIDC), where he defined and implemented the overall data management process. He is currently active in several international committees aimed at accelerating innovation through data exchange.