Researchers spotted new ways to treat cancer when the National Institutes of Health enabled semantic searches of its huge Medline database of published medical articles.
The point of big data is to be able to extract usable information -- knowledge -- from large volumes of data that do not have any immediately apparent relationships. Even with advances in computing power, the task of searching to find correlations can be daunting and even impractical if the datasets are large enough.
The National Institutes of Health has enabled semantic searches of the data in its Medline database, allowing researchers to find previously unnoticed correlations in published medical data between therapies and outcomes. In one case, cancer researchers using graph analysis saw that for some types of cancer, immunotherapy produced better results than chemotherapy.
"It's a real discovery," said Brand Niemann, founder of the Federal Big Data Working Group and former senior enterprise architect and data scientist at the Environmental Protection Agency. "It's like finding a needle in the haystack of medical literature."
The haystack is Medline, the bibliographic database of the National Library of Medicine, which contains more than 21 million references to medical journal articles dating back to 1946. It is an embarrassment of riches: in 2013 alone, 2,000 to 4,000 new references were added each day, five days a week. These entries have been enhanced with 65 million semantic predications -- entries using semantic markup standards -- resulting in 2.2 billion Resource Description Framework statements.
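The article does not say how the predications were modeled in RDF. As a purely illustrative assumption, one common scheme "reifies" each predication: the bare subject-predicate-object triple is stored together with a node that describes it using the standard rdf:Statement vocabulary plus a provenance link to the source article, so a single predication expands into several RDF statements. The `Predication` class, the `ex:` prefix, the `ex:extractedFrom` property and the sample values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Predication:
    """Toy model of one subject-predicate-object assertion extracted from an article."""
    pred_id: str
    subject: str
    predicate: str
    obj: str
    pmid: str  # hypothetical identifier of the source article


def to_rdf_statements(p: Predication) -> list[tuple[str, str, str]]:
    """Expand one predication into RDF-style triples (assumed reification scheme)."""
    node = f"ex:pred/{p.pred_id}"  # hypothetical IRI naming the statement itself
    return [
        (p.subject, p.predicate, p.obj),                   # the bare assertion
        (node, "rdf:type", "rdf:Statement"),               # reified description...
        (node, "rdf:subject", p.subject),
        (node, "rdf:predicate", p.predicate),
        (node, "rdf:object", p.obj),
        (node, "ex:extractedFrom", f"ex:pmid/{p.pmid}"),   # hypothetical provenance link
    ]


# Two made-up predications become twelve statements under this scheme.
preds = [
    Predication("1", "TherapyA", "TREATS", "CancerX", "0000001"),
    Predication("2", "TherapyB", "TREATS", "CancerY", "0000002"),
]
all_statements = [s for p in preds for s in to_rdf_statements(p)]
print(len(all_statements))  # 12
```

Under a scheme like this, adding more metadata per predication (sentence, confidence, date) multiplies the statement count further, which is one way statement totals can far exceed predication totals.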
To make the search practical, researchers used the Urika graph analytics appliance from YarcData. Urika works with existing data warehouses to handle graph workloads, in which relationships within the data can be plotted graphically. All resources to be searched are held in the appliance's shared memory, so the data does not first have to be partitioned or fitted into data models. The team was able to identify connections between outcomes of therapies for different types of cancer from 10 million semantic predications.
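The article does not describe the queries the researchers ran, and the sketch below is not Urika's interface. It is only a rough illustration of the idea: with every triple held in memory as one adjacency map, you can enumerate all short paths linking two concepts by breadth-first search without deciding in advance which relationship to look for. The entity names and predicates are made-up toy data, not results from Medline.

```python
from collections import defaultdict, deque


def build_graph(triples):
    """Index (subject, predicate, object) triples as an in-memory adjacency map."""
    graph = defaultdict(list)
    for s, p, o in triples:
        graph[s].append((p, o))
    return graph


def find_paths(graph, start, goal, max_hops=3):
    """Breadth-first enumeration of simple paths from start to goal.

    Each path is a list of (subject, predicate, object) edges with at most
    max_hops edges; shorter paths are returned before longer ones.
    """
    results = []
    queue = deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node == goal and path:
            results.append(path)
            continue  # do not extend a path past the goal
        if len(path) == max_hops:
            continue
        visited = {start} | {o for _, _, o in path}  # keep paths simple (no cycles)
        for pred, nxt in graph.get(node, []):
            if nxt not in visited:
                queue.append((nxt, path + [(node, pred, nxt)]))
    return results


# Hypothetical toy triples, for illustration only.
triples = [
    ("TherapyA", "TREATS", "CancerX"),
    ("TherapyA", "INHIBITS", "GeneG"),
    ("GeneG", "ASSOCIATED_WITH", "CancerX"),
    ("TherapyB", "TREATS", "CancerY"),
]
graph = build_graph(triples)
for path in find_paths(graph, "TherapyA", "CancerX"):
    print(" -> ".join(f"{s} {p} {o}" for s, p, o in path))
```

The indirect TherapyA-GeneG-CancerX path is the kind of connection nobody needs to hypothesize beforehand; on a real graph of millions of predications, keeping everything in one shared memory space is what makes such open-ended traversal feasible.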
By creating a practical way to extract visual relationships from the data, the researchers were able to find the correlations quickly and without first developing a hypothesis about them. Making the data semantically searchable enables analysis that can make better use of existing data to drive future research, Niemann said.
William Jackson is a writer with the Tech Writers Bureau, with more than 35 years' experience reporting for daily, business and technical publications, including two decades covering information ...