The Fourth Paradigm: All About The Data - InformationWeek

Commentary
12/15/2009
02:02 PM
Serdar Yegulalp

Hope you're not tired of buzzwords. After "the network is the computer" and "the cloud", welcome to "data-intensive computing". This time, however, there's far more at work than a clever turn of phrase.

On Monday, the New York Times devoted a page of its Science section to Dr. Jim Gray, a software engineer and former Microsoft researcher. In January 2007, he vanished off the coast of California in his yacht and was presumed dead, but he left behind a major body of work in the field of mass information analysis.

His point of view was that scientists in general, not just computer scientists, are best served by systems designed from the inside out to process massive amounts of data efficiently and to help people visualize it. We live in a world where terabytes, petabytes and now exabytes of data are routinely generated, and not just by scientific research, although that is one of the best fields for applying these insights, and certainly one of the most fruitful.

Gray's work is about to get a major boost in the form of a collection of essays that expand greatly on his insights and premises. The book is named The Fourth Paradigm: Data-Intensive Scientific Discovery, and it's free. Not just in the sense that you can download the full PDF-format text of the book at the link, but free in its licensing. It's been published under a Creative Commons license, which allows people to re-use it in any number of contexts as long as they provide proper citation for the original work.

Because of that, I can see this being used in any number of statistics or computer-science courses. Who's going to say no to an insightful book about a timely subject that doesn't cost a dime to use, and is packed with things that would be more than worth paying for in a print edition?

The book's main focus is how we get and deal with volumes of data that form the axis upon which crucial research revolves:

The essays [in the book] focus on research on the earth and environment, health and well-being, scientific infrastructure and the way in which computers and networks are transforming scholarly communication. The essays also chronicle a new generation of scientific instruments that are increasingly part sensor, part computer, and which are capable of producing and capturing vast floods of data. For example, the Australian Square Kilometre Array of radio telescopes, CERN's Large Hadron Collider and the Pan-Starrs array of telescopes are each capable of generating several petabytes of digital information each day, although their research plans call for the generation of much smaller amounts of data, for financial and technical reasons.

Good scientific work requires that your results be reproducible. At least part of that is the sharing of data: if your number-crunching is suspect, other people can take your raw data and attempt to achieve the same results. Gray's work, and that of his colleagues, points toward a future where scientific work that involves such data sets is not only more automatic, but far more of an open process than it is now.

This isn't a matter of convenience, either. Our future survival as a species may depend on it. I'd wager it already does. The more of this research that's done out in the open, and the more tools we have to enhance that process, the better.
