Forget about the cautions and clichés, the easy generalizations and the dark warnings. The big data domain is rife with persistent myths that block progress -- or worse, get people headed in the wrong direction.
Taking on some of the big data realm's worst whoppers, Arnab Gupta, CEO of analytics platform provider Opera Solutions, insists that successful big data projects aren't boil-the-ocean, years-long IT infrastructure projects in the mold of data warehouse deployments. Rather, they must be focused business initiatives. Here's his take on five leading myths.
Myth 1: Big data is the next paradigm, and if you don't make a change right away you'll get left behind.
Not so fast, says Gupta. It's this kind of thinking that gets people deploying Hadoop clusters and stockpiling data before they have any idea what they want to do with the information.
"The problem with first vs. last thinking is that you assume that if you're first, you're going to get a competitive advantage, but that won't be the case if you don't focus business results that will give you a business advantage," Gupta explains.
Many big data initiatives seem to be experiments because people just aren't used to working with big volumes and varieties of data. By starting with a specific known problem, you'll reduce the scope of change management and of pioneering required to get to a big data breakthrough.
Myth 2: Big data is an IT problem.
Closely related to Myth 1, this kind of thinking can get you in trouble. The danger in starting with IT experimentation is ending up with boil-the-ocean IT infrastructure projects. Avoid the trap of "build it and they will come" thinking.
"Most of the investments in big data projects have gone into information management infrastructure. If you start with the business use case, you may still be investing in infrastructure, but it will be for precisely the tools you need to solve a specific business need."
Myth 3: Our data is so messed up we can't possibly master big data.
There's no doubt that enterprise data is often flawed, but data quality, master data management, and data governance tools have made it easier to clean up the mess. "The huge investments companies have made in data management are now paying massive dividends."
Where companies used to have to invent tools and come up with data management, data analysis, and data visualization systems on their own, they can now turn to packaged applications on all fronts. These tools have made it far easier to capture, clean, manage, and analyze information. So don't let fear of bad data become a mental stumbling block.
Myth 4: Big companies are so diverse, it's impossible to agree on big data projects.
Here's where you need to look for common denominators. The most obvious example is usually the customer. In its work with British Airways and its Know Me loyalty program, Opera Solutions helped the airline see the connections across nine system silos that each held data related to the customer experience.
"Just making a connection between the baggage-claim system and the loyalty system helped the airline take the simple step of sending an apology letter when a bag is lost, where previously they couldn't," says Gupta.
Customers aren't the only connection point among otherwise disparate data sets. Products, suppliers, and partners can also be axes of big data integration and insight. Here again, the advice is to find a focal point.
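To make the baggage-claim example concrete, here is a minimal Python sketch of linking two silos on a shared customer key. It is an illustration only, not how British Airways or Opera Solutions built anything: the record types, field names, the `apology_candidates` function, and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LostBagReport:
    # Hypothetical record from a baggage-claim system.
    customer_id: str
    flight: str


@dataclass
class LoyaltyMember:
    # Hypothetical record from a loyalty system.
    customer_id: str
    name: str
    tier: str


def apology_candidates(reports, members):
    """Pair each lost-bag report with the loyalty member who shares its customer_id."""
    by_id = {m.customer_id: m for m in members}
    return [(by_id[r.customer_id], r) for r in reports if r.customer_id in by_id]


# Made-up sample data: only customer C1 appears in both systems.
reports = [LostBagReport("C1", "XX100"), LostBagReport("C9", "XX200")]
members = [
    LoyaltyMember("C1", "A. Example", "gold"),
    LoyaltyMember("C2", "B. Example", "silver"),
]

matches = apology_candidates(reports, members)
for member, report in matches:
    print(f"Send apology to {member.name} for flight {report.flight}")
```

The same pattern would apply with a product, supplier, or partner identifier as the key, provided the systems being linked actually share one.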
Myth 5: Big data demands data scientists, who are expensive and hard to come by.
Opera Solutions has studied data scientists' practices and has found that 80 percent of their efforts involve finding the signal in the noise -- that is, the time-consuming work is in capturing data and finding the patterns therein. But this is information management work that can be done by information management professionals, not data scientists. As for the remaining 20 percent of work that does require data science expertise -- the choosing of algorithms and statistical methods -- companies have to focus on making these choices repeatable.
Time-series analyses, for example, show up in many big data projects, including marketing optimization, trading and financial services, route optimization, inventory forecasting, and many more. Constructing a time-series analysis for the first time -- the combining of datasets that will feed the analytic system -- may require a lot of time-consuming upfront data management work. But the actual data science work -- determining the right algorithms and analysis techniques -- is not nearly as laborious or time-consuming.
The main point is that once you've done something, it can be automated and made repeatable, according to Gupta. "The mistake people make is starting all over again with each new project. You have to create a repeatable process or it will never scale."
So don't think of big data initiatives as requiring a colony of hard-to-find, expensive data scientists. Create a repeatable process and embed the intelligence gained along the way into software. That way your data science needs will start small and get smaller as you gain experience.
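As a rough illustration of what "embedding the intelligence into software" might look like for a time-series project, the Python sketch below wraps the data-combining step and the analysis step in one reusable function. The function names, the simple moving-average method, and the sample figures are assumptions chosen for brevity; Gupta does not describe a specific algorithm.

```python
from collections import defaultdict
from datetime import date


def combine_daily(*sources):
    """Merge several (date, value) sources into one sorted series, summing values per date."""
    totals = defaultdict(float)
    for source in sources:
        for day, value in source:
            totals[day] += value
    return sorted(totals.items())


def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if not series:
        raise ValueError("series is empty")
    recent = [value for _, value in series[-window:]]
    return sum(recent) / len(recent)


def run_pipeline(*sources, window=3):
    """The repeatable process: the same two steps, reusable across projects."""
    return moving_average_forecast(combine_daily(*sources), window)


# Made-up sample data for two hypothetical sales channels.
store_sales = [(date(2014, 1, 1), 10.0), (date(2014, 1, 2), 12.0), (date(2014, 1, 3), 14.0)]
web_sales = [(date(2014, 1, 2), 3.0), (date(2014, 1, 3), 6.0)]

forecast = run_pipeline(store_sales, web_sales, window=2)
print(forecast)
```

Once the pipeline exists, a new project supplies different sources and perhaps a different window rather than starting over, which is the kind of reuse the article argues for.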