Big Data. Big Decisions
InformationWeek
Special Coverage Series


Open Source: Key To Big Data Riches?

Big data makes a company smart; open source makes it rich, a Gartner report contends.

Even though big data is approaching the peak of its hype cycle, there is value to be had from it immediately, according to a new Gartner report.

Big data has high potential for delivering short-term value by identifying customers' behavior or attitudes. Its long-term value, however, rests on the ability and willingness of both vendors and users to open heavily processed, manipulated big data stores to tools other than those designed specifically for high-intensity analytics, according to David Newman, research VP at Gartner.

Big data is an imprecise term describing the amalgamation of many types of data from many sources--often into databases that push the upper boundaries of the hardware and software assigned to manage them.

The leading file-management/database software used to handle big data--Hadoop--is already open source, as are many of the tools purpose-built to add new functions.

The data sets themselves are a different story, however.

Building a useful base of big data is an extremely complex process. It requires careful selection of data sources and parsing of those sources to pick only the data that is appropriate in date, content, and context. It then requires an even more rigorous series of efforts to remove duplicate, corrupt, or inappropriate data, convert the remainder to a common format, and store the lot with a database manager able to handle the volume, variety, and occasional conflicts among insufficiently processed bits.
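As a rough illustration of the culling steps described above, the following Python sketch filters records by date, drops corrupt and duplicate entries, and maps two differently shaped sources onto one common format. The sample records, field names, and the function names to_common_format and clean are hypothetical and chosen only for this example; a real pipeline would be far larger and would run on a platform such as Hadoop.

```python
from datetime import date, datetime

# Hypothetical raw records from two sources with different field names
# and date formats. None of these values come from the article.
RAW_RECORDS = [
    {"source": "server_log", "user": "alice", "ts": "2013-06-01", "event": "login"},
    {"source": "server_log", "user": "alice", "ts": "2013-06-01", "event": "login"},  # duplicate
    {"source": "social", "handle": "bob", "posted": "06/02/2013", "text": "great product"},
    {"source": "social", "handle": "carol", "posted": "not a date", "text": "???"},  # corrupt
    {"source": "server_log", "user": "dave", "ts": "2009-01-15", "event": "login"},  # too old
]


def to_common_format(record):
    """Map a source-specific record to a shared schema; return None if corrupt."""
    try:
        if record["source"] == "server_log":
            when = datetime.strptime(record["ts"], "%Y-%m-%d").date()
            return {"who": record["user"], "when": when, "what": record["event"]}
        if record["source"] == "social":
            when = datetime.strptime(record["posted"], "%m/%d/%Y").date()
            return {"who": record["handle"], "when": when, "what": record["text"]}
    except (KeyError, ValueError):
        # Missing fields or unparseable dates mark the record as corrupt.
        return None
    return None  # Unknown source


def clean(records, earliest):
    """Convert, filter by date, and de-duplicate while preserving input order."""
    seen = set()
    cleaned = []
    for raw in records:
        row = to_common_format(raw)
        if row is None or row["when"] < earliest:
            continue
        key = (row["who"], row["when"], row["what"])
        if key in seen:
            continue
        seen.add(key)
        cleaned.append(row)
    return cleaned
```

Calling clean(RAW_RECORDS, earliest=date(2013, 1, 1)) keeps one record for alice and one for bob; the duplicate, the record with the unparseable date, and the 2009 record are dropped.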

"There are obvious integration issues when you're taking data from server logs and social networks and other non-standard sources, especially human-generated content," according to Mike Boyarski, director of product marketing for big data tools vendor Jaspersoft. "You have to be able to cull through the data and not create changes based on your cull, and you've got to prove the data are still correct and relevant. You need more than just the ability to collect data cheaply."

Once all that work is done and the big data set has answered the questions at hand, however, both the culled data and the work that went into it go to waste if the data can be used only for that one purpose.

The cost of big data makes the most sense when its architects are able to use publicly available APIs, data conversion utilities, or common data and query formats to pull in additional data, transfer culled and cleaned data to a data broker that can pay for the privilege, or give employees access to the data through existing analytics or business intelligence applications.

"There is a positive relationship between the openness of information goods (for example, code, data, content, and standards) and information services (for example, services that offer information goods, such as the Internet, Wikipedia, OpenStreetMap and GPS) and the size and diversity of the community sharing them," according to the Gartner report. "From the viewpoint of enterprise information architects, this is known as the information-sharing network effect: the business value of a data asset increases the more widely and easily it is shared."

The primary method Gartner analysts recommend for companies wanting to share big data datasets or answers is the open API--a set of programming interfaces based on either an API set made available to customers of an enterprise application vendor, or a set of interfaces developed specifically to open source corporate data projects.
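To make the idea of an open API concrete, here is a minimal sketch of a read-only HTTP interface over a cleaned data set, written with Python's standard library. It is not drawn from the Gartner report: the /records resource, the region filter, the sample data, and the names query_dataset, OpenDataHandler, and serve are all assumptions made for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlparse

# Hypothetical cleaned data set; in practice this would be read from the
# big data store rather than held in memory.
DATASET = [
    {"customer": "alice", "region": "east", "orders": 3},
    {"customer": "bob", "region": "west", "orders": 1},
    {"customer": "carol", "region": "east", "orders": 7},
]


def query_dataset(path):
    """Return (status_code, json_body) for a path such as /records?region=east."""
    url = urlparse(path)
    if url.path != "/records":
        return 404, json.dumps({"error": "unknown resource"})
    filters = parse_qs(url.query)
    rows = DATASET
    if "region" in filters:
        wanted = filters["region"][0]
        rows = [row for row in rows if row["region"] == wanted]
    return 200, json.dumps(rows)


class OpenDataHandler(BaseHTTPRequestHandler):
    """Serves the data set as JSON over plain HTTP GET requests."""

    def do_GET(self):
        status, body = query_dataset(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(host="127.0.0.1", port=8000):
    """Start the server; the host and port are arbitrary local defaults."""
    HTTPServer((host, port), OpenDataHandler).serve_forever()
```

Because the interface returns plain JSON over HTTP, any analytics or business intelligence tool that can issue a GET request could read the data without depending on the system that produced it. A production API would also need authentication, paging, and access controls, which this sketch omits.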

"The challenge for organizations is to determine how best to use APIs and how an open data strategy should align with business priorities," Newman said.

One additional tip about making money from big data as well as simply "getting smart," as Gartner's report puts it: Big data projects are difficult and expensive, so it makes sense to choose tools based on cost as well as functionality.

"Teams should use low-cost, open-source tools in early pilots to demonstrate the feasibility of big data projects," according to an April big data report, which predicted that the best-practice habits established by enterprise architects could be the best investments for any company leaping into big data.

Open source tools tend to be less expensive upfront, have quicker and more ambitious roadmaps, and reflect (much more closely than other indicators) the real needs of the developers who use them, Boyarski said.

They also tend to preserve the value of both data and applications in the long run specifically because they don't trap either one in enterprise applications that rely on proprietary or hard-to-use APIs and data formats to keep users loyal to a single product set, according to Boyarski.

The community of open source users is much larger than Jaspersoft anticipated--more than a quarter million potential customers have downloaded various big data connectors from Jaspersoft's site.

Open source is also more secure and, in the long run, more useful, he said, because the people using the tools are often the ones building new features, reports, search algorithms, or other additions. Those additions enhance the value of the original software and make it possible to move apps, data, and reports out of kludged-together big data analytics frameworks and into brand new, high-function analytics.

"After putting in all that work, the last thing you want is to have something trapped where you can't move it," Boyarski said.
