5 Big Data Myths, Busted - InformationWeek

5 Big Data Myths, Busted

Be right, not fast. Think business, not IT. Don't worry about dirty data. A big data guru shares contrarian advice about the worst whoppers.

Myth 4: Big companies are so diverse, it's impossible to agree on big data projects.
Here's where you need to look for common denominators. The most obvious example is usually the customer. In its work with British Airways on the airline's Know Me loyalty program, Opera Solutions helped the airline see the connections across nine system silos that each held data related to the customer experience.

"Just making a connection between the baggage-claim system and the loyalty system helped the airline take the simple step of sending an apology letter when a bag is lost, where previously they couldn't," says Gupta.

Customers aren't the only connection point among otherwise disparate data sets. Products, suppliers, and partners can also be axes of big data integration and insight. Here again, the advice is to find a focal point.
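
The baggage-claim example can be sketched as a simple join on a shared customer key. The snippet below is only an illustration: the record layouts, field names, and sample values are assumptions made for this sketch, not details of British Airways' or Opera Solutions' systems.

```python
# Hypothetical records from two separate silos; all field names and
# values are illustrative assumptions, not real airline data.
loyalty_members = [
    {"customer_id": "C001", "name": "A. Passenger", "tier": "gold"},
    {"customer_id": "C002", "name": "B. Traveller", "tier": "silver"},
]
baggage_claims = [
    {"claim_id": "B100", "customer_id": "C002", "status": "lost"},
    {"claim_id": "B101", "customer_id": "C001", "status": "delivered"},
    {"claim_id": "B102", "customer_id": "C999", "status": "lost"},
]


def members_owed_apology(members, claims):
    """Join the two silos on customer_id and return loyalty members
    who have a lost-bag claim on file."""
    # Index the loyalty silo by the shared key (the focal point).
    by_id = {m["customer_id"]: m for m in members}
    owed = []
    for claim in claims:
        member = by_id.get(claim["customer_id"])
        # Only lost bags belonging to known loyalty members qualify.
        if claim["status"] == "lost" and member is not None:
            owed.append({
                "customer_id": member["customer_id"],
                "name": member["name"],
                "claim_id": claim["claim_id"],
            })
    return owed


apology_list = members_owed_apology(loyalty_members, baggage_claims)
```

The same pattern would apply if the shared key were a product, supplier, or partner identifier instead of a customer ID.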

Myth 5: Big data demands data scientists, who are expensive and hard to come by.
Opera Solutions has studied data scientists' practices and has found that 80 percent of their efforts involve finding the signal in the noise -- that is, the time-consuming work is in capturing data and finding the patterns therein. But this is information management work that can be done by information management professionals, not data scientists. As for the remaining 20 percent of work that does require data science expertise -- the choosing of algorithms and statistical methods -- companies have to focus on making these choices repeatable.

Time-series analyses, for example, show up in many big data projects, including marketing optimization, trading and financial services, route optimization, inventory forecasting, and many more. Constructing a time-series analysis for the first time -- the combining of datasets that will feed the analytic system -- may require a lot of time-consuming upfront data management work. But the actual data science work -- determining the right algorithms and analysis techniques -- is not nearly as laborious or time-consuming.

The main point is that once you've done something the first time, it can be automated and made repeatable, according to Gupta. "The mistake people make is starting all over again with each new project. You have to create a repeatable process or it will never scale."
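
One way to picture that repeatability is to capture both the data-combining step and the analytic choice in small reusable functions, so the next project supplies new inputs rather than new code. This is a minimal sketch under stated assumptions: the function names, the sample sales figures, and the choice of a moving average as the analysis technique are all hypothetical and are not drawn from Opera Solutions' practice.

```python
from collections import defaultdict


def combine_daily(*sources):
    """Merge (iso_date, value) pairs from any number of sources
    into a date-sorted list of daily totals."""
    totals = defaultdict(float)
    for source in sources:
        for day, value in source:
            totals[day] += value
    # ISO-format date strings sort correctly as plain strings.
    return sorted(totals.items())


def moving_average(series, window):
    """Trailing moving average over a list of (date, value) pairs."""
    values = [v for _, v in series]
    return [
        (series[i][0], sum(values[i - window + 1 : i + 1]) / window)
        for i in range(window - 1, len(series))
    ]


def run_pipeline(sources, window=3):
    """The repeatable process: combine the datasets, then analyze."""
    return moving_average(combine_daily(*sources), window)


# Hypothetical inputs for a first project (illustrative values only).
store_sales = [("2013-11-01", 10.0), ("2013-11-02", 12.0), ("2013-11-03", 14.0)]
web_sales = [("2013-11-01", 5.0), ("2013-11-03", 4.0), ("2013-11-04", 9.0)]

trend = run_pipeline([store_sales, web_sales], window=3)
```

A later project, such as inventory forecasting, could call `run_pipeline` with different sources and a different window without redoing the upfront work.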

So don't think of big data initiatives as requiring a colony of hard-to-find, expensive data scientists. Create a repeatable process and embed the intelligence gained along the way into software. That way your data science needs will start small and get smaller as you gain experience.

User Rank: Moderator
11/19/2013 | 11:12:33 PM
Big Data.
Doug, great insight into these myths. We are seeing an increase in businesses seeking specialized skills to help address challenges that arose with the era of big data. The HPCC Systems platform from LexisNexis helps to fill this gap by allowing data analysts themselves to own the complete data lifecycle. Designed by data scientists, ECL is a declarative programming language used to express data algorithms across the entire HPCC platform. Its built-in analytics libraries for Machine Learning and BI integration provide a complete integrated solution from data ingestion and data processing to data delivery. HPCC Systems provides proven solutions to handle what are now called Big Data problems, and has been doing so for more than a decade. More at http://hpccsystems.com
User Rank: Ninja
11/19/2013 | 12:28:19 PM
Bite the Bullet
There is data that companies have "forgotten" they are capturing. With a little work it can be related to other data or analyzed in ways that make it truly valuable. The bad/dirty data issue is just one that people have to bite the bullet on. It's not going to get any better (or smaller) by waiting.
D. Henschen
User Rank: Author
11/19/2013 | 9:45:09 AM
Use the data you've already captured
Arnab Gupta makes the point that many successful big data deployments are taking advantage of data companies have already captured but aren't using. CRM comment fields about customers, for example, are easier to crack than the fire hoses of comments out there on social networks.