Advancing through the digital era with a greater use of self-service analytics doesn't mean that you have to trash proven analytics tools and techniques.

Jen Underwood, Impact Analytix

February 26, 2018


Increasing volumes and varieties of data, combined with self-service data access, are overwhelming existing reporting tools and infrastructure. To scale for digital-era demands, organizations are adopting new cloud-based Hadoop data lake architectures and next-generation OLAP semantic layers. As early adopters make this move, they are learning from expensive “lift and shift” failures. “Lift and optimize” approaches are proving to be far more cost-effective and successful.

OLAP is not dead

Contrary to the rumors of OLAP’s impending death that circulated when in-memory analytics entered the market years ago, OLAP is not dead. The fundamentals never changed. What did change is that exponential growth in data sources, varieties, and types rapidly surpassed what traditional OLAP solutions were designed to handle. Thus, new OLAP on Hadoop technologies such as Apache Kylin, AtScale, Kyvos Insights, and other similar solutions were invented for the big data analytics world.

Today OLAP is in high demand. Last year, the CEO of a large system implementation firm in the United States told me that the real unicorn hire was not a data scientist. The firm received over 300 data scientist resumes for one open role, yet it could not find anyone with classic Kimball-style dimensional modeling skills. Surprisingly, the unicorn was someone with OLAP design skills.

Assess and optimize for the cloud

There are many benefits for both the business and IT in adopting new OLAP on Hadoop solutions. Speed, scale, simplicity, and lower cloud bills are usually cited as the top reasons for migrating from legacy on-premises OLAP offerings. OLAP on Hadoop solutions make big data analytics easy and familiar: the business can keep using mainstream self-service BI tools such as Excel, SAS, Tableau, Qlik, and TIBCO Spotfire that they already know and love. Without a user-friendly, governed semantic layer that can intelligently cache and pre-aggregate, self-service big data analytics would be overwhelming, frustrating, and expensive. Keep in mind that cloud analytics can be billed by compute or by the amount of data each query scans.
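To make that concrete, below is a minimal sketch of the kind of dimensional query a self-service BI tool typically pushes through a governed semantic layer. The DSN, tables, and columns are hypothetical, not any specific product’s schema; the point is that a cube-style layer can answer this aggregate from cached pre-aggregations rather than rescanning raw fact data, which matters when the bill is tied to compute or scan volume.

```python
# Hypothetical sketch: issue a dimensional aggregate query against an OLAP on
# Hadoop semantic layer exposed through a standard ODBC data source.
# The DSN, tables, and columns below are illustrative assumptions.
import pyodbc

conn = pyodbc.connect("DSN=bigdata_semantic_layer", autocommit=True)
cursor = conn.cursor()

# A typical BI-style query: measures grouped by a few dimensions. A governed
# semantic layer can serve this from pre-aggregated, cached results instead of
# scanning the underlying fact data on every request.
cursor.execute("""
    SELECT d.calendar_year,
           p.product_category,
           SUM(f.sales_amount) AS total_sales
    FROM   fact_sales f
    JOIN   dim_date d    ON f.date_key    = d.date_key
    JOIN   dim_product p ON f.product_key = p.product_key
    GROUP BY d.calendar_year, p.product_category
    ORDER BY d.calendar_year
""")

for row in cursor.fetchall():
    print(row)

conn.close()
```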

Implementing the right big data analytics infrastructure is vital for future success. It cuts capital expenditures, maintenance, and operating costs while also improving reporting agility. Since most OLAP on Hadoop solutions support standard analytics tools, SQL, and MDX, inefficient data copying, ETL routines, and siloed semantic model processes can be redesigned around modern ingest pipelines, schema-on-read, and semantic layers shared across numerous BI and analytics tools, as in the sketch below. As you assess existing analytical environments before migrating on-premises analytics architectures to the cloud, review the hidden, recurring pains and costs caused by analytical silos that a new approach could eradicate.
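As one illustration of schema-on-read, the hedged sketch below applies a schema to raw data lake files at query time with Spark, then exposes them as a shared SQL view instead of copying the data into another silo. The lake path, view name, and columns are assumptions made for the example.

```python
# Rough schema-on-read sketch with PySpark: read files already landed in the
# data lake, apply the schema at query time, and expose a reusable SQL view.
# The S3 path and column names are illustrative assumptions.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("schema_on_read_example").getOrCreate()

# The schema is inferred from the Parquet files when they are read, not
# enforced by an upfront ETL load.
events = spark.read.parquet("s3a://example-data-lake/raw/events/")

# Register a view that downstream semantic layers and BI tools can query
# without creating another physical copy of the data.
events.createOrReplaceTempView("events")

spark.sql("""
    SELECT event_type, COUNT(*) AS event_count
    FROM   events
    GROUP BY event_type
    ORDER BY event_count DESC
""").show()
```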

Beware of breaking changes

Although a “lift and shift” strategy might sound quick and easy, it often does not work. After analysts connect reports to big data sources, they usually notice errors and timeouts. Reporting errors are typically caused by subtle big data query syntax differences, missing script functions, or other unavailable elements; the example below shows one such difference. Timeouts are where OLAP on Hadoop saves the day.
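As a concrete, hypothetical illustration of those syntax differences, the same “last 30 days” filter often has to be rewritten when a report is repointed from a relational source to a Hadoop-backed engine. The table and column names here are made up for the example.

```python
# Illustrative only: the same date filter expressed in two SQL dialects.
# The table and columns are hypothetical.

# Written for SQL Server; GETDATE() and DATEADD() are not HiveQL functions,
# so this query fails after the report is repointed to a Hive-based source.
SQL_SERVER_QUERY = """
    SELECT order_id, order_total
    FROM   orders
    WHERE  order_date >= DATEADD(day, -30, GETDATE())
"""

# Equivalent filter rewritten for a Hive/Hadoop-backed engine.
HIVE_QUERY = """
    SELECT order_id, order_total
    FROM   orders
    WHERE  order_date >= date_sub(current_date, 30)
"""
```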

After interviewing several groups that rolled out OLAP on Hadoop, I heard two key lessons. The first lesson learned was to treat your big data analytics project as a real project. Don’t assume that you can just change a data source in your reports. As Benjamin Franklin said, “an ounce of prevention is worth a pound of cure.” Don’t undermine big data analytics success by failing to understand and plan for the move.

The second lesson learned was to assess your analytics environment and look for optimization opportunities. Modern OLAP on Hadoop semantic layers can work with numerous analytical tools. As you invest in improving your analytics architecture, also review the end-to-end analytical processes. You may uncover a plethora of ways to reduce non-value-added work while also removing proprietary analytics silos.

[Jen Underwood returns to Interop ITX 2018 in Las Vegas on Monday, April 30, presenting the workshop Introduction to Machine Learning for Mere Mortals: Solving Common Business Problems with Data Science.]

About the Author(s)

Jen Underwood

Impact Analytix

Jen Underwood, founder of Impact Analytix, LLC, is a recognized analytics industry expert. She has a unique blend of product management, design and over 20 years of "hands-on" development of data warehouses, reporting, visualization and advanced analytics solutions. In addition to keeping a constant pulse on industry trends, she enjoys digging into oceans of data. Jen is honored to be an IBM Analytics Insider, SAS contributor, former Tableau Zen Master, and active analytics community member.

In the past, Jen has held worldwide product management roles at Microsoft and served as a technical lead for system implementation firms. She has launched new analytics products and turned around failed projects. Today she provides industry thought leadership, advisory, strategy, and market research.

Jen has a Bachelor of Business Administration in Marketing, cum laude, from the University of Wisconsin-Milwaukee and a post-graduate certificate in Computer Science - Data Mining from the University of California, San Diego.
