Big Data. Big Decisions
InformationWeek
Special Coverage Series


When Big Data Questions Can't Wait For Data Scientists

Alteryx and other vendors are pushing tools that aim to make big data accessible to business-side teams and other non-specialists.

The problem with big data--the reason it hasn't been adopted into production roles at more companies--isn't just the nonexistent budgets, lack of skills to manage big data properly, or the lack of a demonstrable business case, according to business-intelligence vendor Alteryx.

The big problem, the one that's holding up progress on all the other fronts and especially on the development of business cases, is that the tools to analyze and manage big-data projects are as rare, complicated, and specialized as the high-level statistics and data-integration requirements in a data scientist's job description.

Companies such as Alteryx say their products humanize data and make it simple enough for non-specialists to use. Alteryx's Strategic Analytics product includes data-cleansing and management tools designed to let non-specialists extract data and work with it themselves, rather than waiting until a data scientist has time.

There are good reasons to make complex analytics more accessible to analysts working in business units rather than in IT, according to Shalini Das, research director for the Washington, D.C.-based CIO Executive Board. "According to the research we've done on big data during the past couple of years, about 82% of employees in an average company are knowledge workers who need some analytic skills and access to information to do their jobs," Das said. "Saying we need to restrict access to big data or new analytics to a set of specialists creates a bottleneck where most knowledge workers have to put in a request and then wait for specialists with specialized tools to do their jobs."

The whole point of big data is to make it possible for business people to find answers where they previously found only data. That can't happen if the tools available for big-data analysis are so complex that only data scientists can use them, she said.

Data scientists are an absolute necessity for big companies faced with mountains of raw data they don't know what to do with, according to Mike Boyarski, director of product marketing for business-intelligence/big-data software vendor Jaspersoft, which recently published a survey on the topic.

Parsing, cleaning, de-duplicating, and preparing raw text or machine-to-machine data so it can be analyzed as if it consisted of numbers slotted into the cells of a relational-database table is not a job for the faint of heart or shallow of understanding, Boyarski said. Whipping big data into decent shape requires someone who is very good with statistics and who also has a deep understanding of how the business works and what business units actually need to know. Without that understanding, whoever chooses the sources of a developing big-data set and decides how to use it will tend to skew the project toward the needs of data scientists rather than those of the end users who ultimately need the answers, Boyarski said.

Most companies have access to huge amounts of data, but according to a report from Ventana Research, they risk creating a garbage-in, garbage-out system if they don't filter, manage, and process both structured and unstructured data so that it works effectively with existing analytics. In the survey Jaspersoft ran among 600 members of its Hadoop open-source big-data analytics community, analytics that deliver solid information on customers' experiences and attitudes were the No. 1 user requirement for big-data projects.

Customer-experience analytics are simply one more tool to give corporate planners insight into the plans, requirements, and attitudes of their customers, exactly the kind of tool that could enhance and eventually replace spreadsheets as the go-to data tool for corporate planners, Das said. The other four of the top five requirements fall into the same category: customer segmentation and churn analysis, marketing campaign optimization, financial risk analysis, and competitive marketing analysis.

Sixty percent of respondents use relational databases as their primary big-data store, which makes complex analytics more difficult than it would be with specialized tools, Boyarski said. Even among members of the forums at Jaspersoft, which uses Hadoop as its main big-data filing system, only 18% of respondents use either Hadoop or the big-data database MongoDB as their big-data stores, the survey showed.

Only 6% of survey respondents had business-unit titles; the rest were application developers, report developers, or BI system administrators. That mix shows how seldom business units are involved in big-data projects, even though they are the supposed audience for big data's data-driven revelations, Das said.

"About 85% of the data in corporate environments can't be analyzed with the usual tools available to people in those businesses," Das said. "So people are on board with the need to make decisions using more than 15% of the available information; the tools available and their knowledge of what to do with them are still somewhat lacking, however."

Venture-capital firm Ascent Partners has created a metric, based on public discussions about new technologies, for three hype-burdened technologies. Of the three, big data attracted by far the most mentions from April through July. That could mean there is far more interest in big data in general than in BYOD or cloud security, the other two areas Ascent measured, said Ascent blogger Matt Fates. Most of the discussions were about the scalability of big data: how to parse and analyze increasingly large data sets very quickly, Fates wrote. He also wrote that news aggregator Betaworks' July acquisition of Digg, the former social-networking market leader, was at least partly due to Digg's failure to keep control of the MySQL database in which it stored user-entered data.

That failure slowed down the whole service, which put off users who wanted to recommend sites to their friends or find recommended sites. According to the Washington Post, Digg's estimated value dropped from $160 million to $500,000 by the time it was acquired by Betaworks, which said it would combine Digg with its own News.me to produce a news discovery and sharing site.

The Alteryx tools are designed to allow a business analyst to ask a question, then guide him or her through the process of identifying potential sources for relevant data, assembling the data into a single data store, cleaning and enhancing the results with metadata to add context, and then passing the results to analytics and workflow modules.

That kind of functionality is rare, and its availability will be critical to the success or failure of individual big-data projects, or at least to the projects' ability to do what business analysts want them to do, Das said.

"The question right now is the level of maturity in the market for tools. We are still early in the early-adopter segment of the adoption bell curve, so it's not surprising that tools aren't widely available that make it easier and that encourage the later adopters to use something in greater numbers," she said. "There should be a range of tools with either basic, intermediate or advanced levels of functionality… We're looking for a variety of options, but for the most part, even the basics are not yet in place."
