Data Prep Becomes an Enterprise Effort

The days of data prep by Excel spreadsheet may very well be numbered as more organizations look to create or buy platforms to simplify the process for everyone. Charles Schwab Corp.'s users have cut their data prep time in half by using a new data prep capability in their data analytics and visualization platform.

Jessica Davis, Senior Editor

May 3, 2018


It's no secret that data preparation is one of the most time-consuming tasks for data analysts and data scientists. Surveys show that the vast majority of their time is spent on this repetitive task, with some estimates putting it as high as 80% of a data professional's time. Yet it's an essential task. If you want to get value out of your data and analytics investments, you need to prepare the data first.

Now that organizations are adding more data sources to improve their analytics and putting self-service tools in the hands of users, there's a lot more data, and a lot more people are working with it. Meanwhile, data scientists remain in short supply. That gives organizations a vested interest in better tools that automate data preparation, freeing data scientists to work on higher-value tasks.

Calling data preparation the "most time-consuming task in analytics and BI," Gartner said that the effort is "evolving from a self-service activity to an enterprise imperative."

Data preparation platforms used to support only self-service use cases, according to a December 2017 market report by Gartner. Now these platforms have evolved to enable data and analytics teams to build agile, searchable datasets at enterprise scale for distributed content authors, the company said in the report.

Gartner also notes that the market for data preparation tools is in flux right now. Enterprises can choose from a host of stand-alone tools or from new capabilities offered by their existing analytics/business intelligence or data science platform providers. For instance, analytics platforms from Alteryx, Qlik, Microsoft Power BI, Oracle, SAP, SAS, and IBM, among many others, all offer this capability today. Tableau, too, announced the general release of its own version, called Tableau Prep, in April 2018.

Charles Schwab

Bank and brokerage firm Charles Schwab Corp. was among the customers piloting the Tableau data prep technology in the months before the general release. The company relies heavily on its data and uses a variety of tools to work with it, including Alteryx and Teradata. But it's also a big Tableau shop, with 520 desktop license users and thousands of server users. About half of the company uses Tableau daily, according to Charles Schwab Tableau Administrator/Engineer Gessica Briggs-Sullivan, who spoke with InformationWeek in an interview. Charles Schwab uses the platform for everything from HR reports to ITSM, retail, and 401(k)s.

Before the pilot, Tableau users doing data prep would often use Excel, according to Deepak Reddy Mogula, an information systems engineer at Charles Schwab, who also spoke with InformationWeek. These efforts took a lot of time. The new Tableau capability has made data prep a lot easier and cut the time required by more than half, he said.

Part of what made it so much easier is the visual nature of the new capability, Mogula said. That was by design, Tableau's Chief Product Officer Francois Ajenstat told me in an interview. The goal was to use the visual interface to make data prep "accessible to as many people as possible," Ajenstat said. The tool lets people aggregate and filter data through a visual interface that does not require writing macros, and it simplifies fixing data issues such as inconsistent spellings and leading spaces.
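For readers curious what these cleanup steps involve when they are done by hand rather than visually, here is a minimal Python sketch. It is a generic illustration and not how Tableau Prep works internally. The sample records, the `CANONICAL` spelling map, and the `clean_city` and `prep` helpers are all hypothetical. The sketch trims leading and trailing spaces, maps spelling variants to one value, filters out non-positive amounts, and aggregates totals per city.

```python
from collections import defaultdict

# Hypothetical mapping of spelling variants (lowercased) to one canonical value.
CANONICAL = {"new york": "New York", "newyork": "New York", "ny": "New York"}


def clean_city(raw: str) -> str:
    """Trim surrounding spaces and normalize known spelling variants."""
    trimmed = raw.strip()
    return CANONICAL.get(trimmed.lower(), trimmed)


def prep(rows):
    """Filter out non-positive amounts, then total the rest per cleaned city."""
    totals = defaultdict(float)
    for city, amount in rows:
        if amount <= 0:  # filter step: drop refunds/bad entries
            continue
        totals[clean_city(city)] += amount  # aggregate step
    return dict(totals)


# Hypothetical sample data with a leading space, a trailing space, and variant spellings.
rows = [("  New York", 100.0), ("newyork", 50.0), ("Boston ", 20.0), ("NY", -5.0)]
print(prep(rows))  # {'New York': 150.0, 'Boston': 20.0}
```

Even this toy version needs a hand-maintained spelling map and explicit filter rules. That upkeep is the kind of scripting or macro work a visual data prep tool aims to spare users.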

At Charles Schwab, the top benefit of Tableau Prep so far has been cutting down the time required for data preparation. Briggs-Sullivan said it's still too early to tell whether the new capability will also help improve data quality. But Ajenstat said that other customers have used the tool to identify data quality problems that hadn't been discovered previously.

It's all part of the evolution of the data analytics platform, according to the Gartner data prep report. The firm said that by next year "data and analytics organizations that provide agile, curated internal and external datasets for a range of content authors will realize twice the business benefits as those that do not."

What's more, Gartner believes that by 2023 all this will converge into a single modern enterprise information management platform for most new analytics projects.

About the Author

Jessica Davis

Senior Editor

Jessica Davis is a Senior Editor at InformationWeek. She covers enterprise IT leadership, careers, artificial intelligence, data and analytics, and enterprise software. She has spent a career covering the intersection of business and technology. Follow her on Twitter: @jessicadavis.
