Top Reasons Big Data in the Cloud Is Raining on On-Premise - InformationWeek

InformationWeek is part of the Informa Tech Division of Informa PLC

This site is operated by a business or businesses owned by Informa PLC and all copyright resides with them.Informa PLC's registered office is 5 Howick Place, London SW1P 1WG. Registered in England and Wales. Number 8860726.

Commentary
5/30/2019 07:00 AM
Alex Gorelik, Founder and CTO, Waterline Data


The marketplace is finally flush with analytics-specific services that deliver on the cloud's promise of reduced cost and complexity, and greater agility.

According to analysts, the cloud revolution is well underway. Synergy Research says cloud services are eating into on-premise technology growth. Forrester says cloud computing is “coming of age as the foundation for enterprise digital transformation” in its Predictions 2019: Cloud Computing report.

However, while companies have spent the last few years shifting a wide variety of IT components to the cloud, they have been much slower to move big data services away from their internal infrastructures. Early adopters of Hadoop and other large-scale data analytics technologies had to keep things in-house because these were essentially still experimental technologies.

Now, companies just starting their analytics forays are finding that Hadoop is simply too damn hard to run themselves, while cloud vendors have come a long way with their data services. Taken together, these trends mean the cloud better suits many companies' big data needs, for the following reasons:

The physical implementation of a cluster is too much effort

Why buy a cluster of servers when AWS or Azure can spin one up for you? As with all cloud services, you don’t have to order the hardware, power it, or even cable it up. Constructing the physical environment alone is often hard enough, never mind getting the actual software up and running.
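To make the contrast concrete, provisioning a managed Hadoop/Spark cluster on AWS reduces to a single API request. The sketch below builds the kind of request that boto3's EMR `run_job_flow` call accepts; the cluster name, instance types, and counts are hypothetical placeholders, and the actual call is shown only as a comment.

```python
# Sketch: the request that replaces weeks of on-premise procurement.
# All names and sizes here are hypothetical, not a recommendation.
cluster_request = {
    "Name": "analytics-poc",                   # hypothetical cluster name
    "ReleaseLabel": "emr-6.15.0",              # an EMR release bundling Hadoop/Spark
    "Applications": [{"Name": "Hadoop"}, {"Name": "Spark"}],
    "Instances": {
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 10,                   # 1 master + 9 workers
        "KeepJobFlowAliveWhenNoSteps": False,  # terminate when the work is done
    },
    "JobFlowRole": "EMR_EC2_DefaultRole",
    "ServiceRole": "EMR_DefaultRole",
}

# With AWS credentials configured, one call provisions the whole cluster:
#   import boto3
#   emr = boto3.client("emr", region_name="us-east-1")
#   response = emr.run_job_flow(**cluster_request)

print(cluster_request["Instances"]["InstanceCount"])  # nodes requested, no cabling
```

No hardware order, no racking, no cabling: the request above is the entire "physical implementation" step.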

Skills shortage

A shortage of skilled practitioners continues to plague big data. Cloud vendors are steadily chipping away at the ease-of-use problem by providing more automation. By spinning massive computing clusters up and down automatically, cloud providers significantly reduce the need for people with deep expertise in running them, which matters because those specialists remain hard to find.

Reduced risk

One huge advantage of the cloud, especially for big data implementations, is that it dramatically mitigates risk. You don’t know up front whether your data will contain great revelations. But with cloud vendors, you can spin up a cluster, do some work, and spin it back down if you can’t unearth insights of any value, all without incurring much overall project risk. Better yet, if you do find something potentially game-changing in your data, you can quickly spin up more systems to scale the project without spending time and money purchasing and implementing hardware and software.

Of course, scaling up and down does not work for all use cases. Sometimes the nature of the project or the data means you must ramp up systems and keep them running. Nonetheless, even standing up an always-on system is far easier in the cloud, which still contributes greatly to risk reduction.

Incremental cost vs. big up-front investments

Directly related to the risk point above is cost. Big data–related cloud deployments let consumers pay only for the services they use. If your experimental project yields little value, your losses are reduced significantly, assuming you fail fast. By contrast, if you buy all the equipment up front only to see the project shut down, the initiative becomes an expensive failure.
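A little arithmetic makes the trade-off vivid. The figures below are hypothetical placeholders, not quotes from any vendor:

```python
# Hypothetical comparison: up-front cluster purchase vs. pay-as-you-go
# cloud usage for a 3-month experiment that fails fast.
on_prem_capex = 500_000.0  # hypothetical cost to buy and rack a 100-node cluster
cloud_rate = 0.20          # hypothetical $/node-hour for comparable nodes
nodes = 100
hours_used = 8 * 22 * 3    # 8 h/day, 22 workdays/month, 3 months

cloud_cost = cloud_rate * nodes * hours_used
print(f"cloud spend on the failed experiment: ${cloud_cost:,.0f}")
print(f"on-premise sunk cost for the same failure: ${on_prem_capex:,.0f}")
```

Under these assumed numbers, the failed cloud experiment costs a small fraction of the hardware you would otherwise have bought and then written off.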

Elasticity

The elasticity of the cloud allows faster time to insight. When you build a physical cluster, you are limited in how much processing you can do. A massive analytics job could take 10 hours on a 100-node cluster. In the cloud, for the same price, you can spin up 1,000 nodes and run the same job in an hour.
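The same-price claim follows directly from billing by node-hour: both runs consume the same 1,000 node-hours. A quick check, using a hypothetical per-node-hour rate:

```python
rate = 0.25  # hypothetical $/node-hour

small = {"nodes": 100, "hours": 10.0}   # fixed physical-cluster scenario
large = {"nodes": 1000, "hours": 1.0}   # elastic cloud scenario

def cost(cluster):
    # Cost scales with node-hours consumed, not wall-clock time.
    return rate * cluster["nodes"] * cluster["hours"]

# Both jobs burn 1,000 node-hours, so they cost the same;
# the elastic run simply returns the answer ten times sooner.
print(cost(small), cost(large))
```

The economics only hold, of course, when the workload parallelizes well enough to use the extra nodes.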

Elasticity is also key to helping organizations share massive data sets. Moving large data sets around is always a challenge. Even sharing them within an organization can be problematic because adding new users introduces load on a system. For example, if business unit A wants access to business unit B’s data, there might not be enough compute power to support more users. When the data is sitting in the cloud, it’s much easier to add capacity without having to duplicate the data. (Even if data needs to be duplicated, that process can happen quickly and easily in the cloud.)

Big data may have been late to the party, but the marketplace is finally flush with analytics-specific services that deliver on the cloud’s promise of reduced cost and complexity, and greater agility. 

Alex Gorelik is author of O’Reilly Media's “The Enterprise Big Data Lake: Delivering the Promise of Big Data and Data Science”, and the founder and CTO of data cataloging company Waterline Data. Prior to Waterline Data, Gorelik served as senior vice president and general manager of Informatica’s Data Quality Business Unit, driving R&D, product marketing and product management for an $80 million business. He joined Informatica from IBM, where he was an IBM Distinguished Engineer for the Infosphere team. IBM acquired Gorelik’s second startup, Exeros (now Infosphere Discovery), where he was founder, CTO and vice president of engineering. Previously, he was cofounder, CTO and vice president of engineering at Acta Technology, a pioneering ETL and EII company, which was subsequently acquired by Business Objects.

 
