Estimating the Total Costs of Your Cloud Analytics Platform

Addressing modern real-world use cases requires the application of multiple functions working together on the data. Here are some things to know to cost out the stack effectively.

William McKnight, President, McKnight Consulting Group

April 22, 2022

Organizations today need a broad set of enterprise cloud data services to modernize projects and put machine learning to work. They need a platform designed for multi-faceted needs, one that offers multi-function data management and analytics to solve the enterprise’s most pressing data and analytic challenges in a streamlined fashion. And they need a selection whose architecture and components provide a worry-free experience.

Addressing real-world use cases requires the application of multiple functions working together on the same data. Building the data ecosystem to support this converged use case can be a daunting task: there are many solutions and alternatives, and too many vendor claims. An entire cloud technology platform that addresses enterprise-wide data challenges and needs can be built in one of three ways: build the stack within a single cloud vendor’s umbrella of products; stitch together offerings from various vendors; or use a single vendor’s multi-purpose stack.

Some architectures look integrated but may be more complex and more expensive. When almost every additional demand for performance, scale, or analytics can be met only by adding new resources, costs climb quickly. The possible stacks are countless, but a few are popular.

Highlights of the Azure stack include Synapse, Synapse SQL Pool, Azure Data Factory, Azure Stream Analytics, Azure Databricks Premium Tier, HDInsight, Power BI Pro, Azure Machine Learning, Azure Active Directory P1, and Azure Purview.

The AWS stack includes Amazon Redshift, Glue, Kinesis, EMR, Redshift Spectrum, QuickSight, SageMaker, IAM, and AWS Glue Data Catalog.

The Google stack includes BigQuery, Dataflow, Dataproc, Cloud IAM, and Google Data Catalog.

Another stack could be called the Snowflake stack, since Snowflake is the featured vendor for dedicated compute, storage, and data exploration, but it is really a multi-vendor, heterogeneous stack. It includes a data integration tool such as Informatica or Talend, Confluent Cloud (managed Kafka), Azure Databricks Premium Tier, Cloudera Data Hub + S3, Tableau, SageMaker, AWS IAM, and a data catalog such as Alation or Collibra.

The cost numbers below focus on the stack costs of projects, including development costs. A full ROI analysis of these projects would also need to account for the cost of money, a probability distribution of outcomes, and n-th order (indirect) benefits, while identifying and counting only the benefits that are tangible.
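The cost of money and a probability distribution can be folded into a single figure by discounting each scenario’s cash flows and weighting the results by probability. This is a minimal sketch, not the author’s method: it assumes year-end cash flows, one flat discount rate, and a hand-picked set of discrete scenarios, and the names `Scenario`, `npv`, and `expected_roi` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """One hypothetical outcome: its probability and yearly tangible benefits."""
    probability: float
    annual_benefits: list[float]  # year 1, year 2, ... (tangible benefits only)


def npv(cash_flows: list[float], rate: float) -> float:
    """Present value of year-end cash flows, where cash_flows[0] falls in year 1."""
    return sum(cf / (1 + rate) ** (year + 1) for year, cf in enumerate(cash_flows))


def expected_roi(annual_costs: list[float], scenarios: list[Scenario], rate: float) -> float:
    """Probability-weighted ROI: (expected PV of benefits - PV of costs) / PV of costs."""
    if abs(sum(s.probability for s in scenarios) - 1.0) > 1e-9:
        raise ValueError("scenario probabilities must sum to 1")
    pv_costs = npv(annual_costs, rate)
    expected_pv_benefits = sum(s.probability * npv(s.annual_benefits, rate) for s in scenarios)
    return (expected_pv_benefits - pv_costs) / pv_costs


if __name__ == "__main__":
    # Placeholder figures in $M, for illustration only (not from the article).
    costs = [4.0, 2.0]
    outlook = [Scenario(0.3, [1.0, 3.0]), Scenario(0.7, [3.0, 6.0])]
    print(f"expected ROI at 8%: {expected_roi(costs, outlook, 0.08):.1%}")
```

Discrete scenarios keep the arithmetic transparent; a fuller analysis might replace them with a continuous distribution sampled by simulation.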

Also, when projects are done in an agile fashion, with functionality delivered incrementally, it can be difficult to say when initial project costs end and maintenance costs begin. I follow the usual enterprise standard and draw the line between initial and maintenance costs around the point where most of the functionality has been delivered. In this context, it is important to consider both the costs accumulated up to that point and the ongoing “maintenance” costs for bug fixes, enhancements, and updates afterward.
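The cutoff between initial and maintenance costs can be made explicit. This is a minimal sketch, assuming costs are tracked per quarter alongside the cumulative fraction of planned functionality delivered, and that “most of the functionality” means an 80% threshold; both the threshold and the function name `split_costs` are hypothetical choices, not part of the author’s standard.

```python
def split_costs(quarterly_costs, delivered_fraction, threshold=0.8):
    """Split quarterly costs into (initial, maintenance) totals.

    The initial phase runs through the first quarter in which the cumulative
    fraction of functionality delivered reaches `threshold`; every later
    quarter counts as maintenance. The 0.8 default is an assumption.
    """
    if len(quarterly_costs) != len(delivered_fraction):
        raise ValueError("need one delivered-fraction value per quarter")
    cutoff = next(
        (i for i, frac in enumerate(delivered_fraction) if frac >= threshold), None
    )
    if cutoff is None:
        # Threshold never reached: everything so far is still initial cost.
        return sum(quarterly_costs), 0
    return sum(quarterly_costs[: cutoff + 1]), sum(quarterly_costs[cutoff + 1 :])


# Hypothetical quarterly spend ($M) and cumulative functionality delivered.
initial, maintenance = split_costs([1.2, 1.5, 1.4, 0.4, 0.3], [0.2, 0.5, 0.85, 0.9, 1.0])
```

Raising or lowering `threshold` only moves costs between the two buckets; the total is unchanged, which is why reporting both figures matters.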

Breaking Down Costs

A single (multi-quarter) project on these stacks, including people costs, will cost between $2.7M and $8M in a medium enterprise and between $7M and $23M in a large enterprise. The first project on the modern stack also paves the way for future uses.

Across all of an enterprise’s uses of the modern platform, including production costs, the 2-year total cost of ownership (TCO) for a medium enterprise ranges from $6M to $15M. For large enterprises (those with over $1B in revenue), it ranges from $17M to $42M.

Perils of TCO measurement aside, enterprise projects should be achieving high returns. However, if an application is not implemented to a modern standard, using a machine learning stack, it will suffer large inefficiencies and competitive gaps in functionality. This is why many enterprises are considering upgrading or migrating these use cases now to reap the benefits.

A full analytics platform in the cloud is more than just a data warehouse, cloud storage, and a business intelligence solution. At least 11 categories are needed both to establish equivalence among analytics stacks’ offerings and to produce a fair cost estimate. All these components are essential to a full, enterprise-ready analytics stack.

The categories, or components in a modern enterprise analytics stack, that I included in the TCO calculations are as follows:

  • Dedicated Compute

  • Storage

  • Data Integration

  • Streaming

  • Spark Analytics

  • Data Exploration

  • Data Lake

  • Business Intelligence

  • Machine Learning

  • Identity Management

  • Data Catalog
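The category list lends itself to a simple roll-up that refuses to compare stacks unless every category is priced. This is a minimal sketch, assuming each category’s platform cost can be estimated as a monthly low/high range and that people costs form a separate monthly range; `tco_range` is a hypothetical helper that ignores discounting, usage growth, and one-time costs, and it is not the author’s model.

```python
CATEGORIES = (
    "Dedicated Compute", "Storage", "Data Integration", "Streaming",
    "Spark Analytics", "Data Exploration", "Data Lake", "Business Intelligence",
    "Machine Learning", "Identity Management", "Data Catalog",
)


def tco_range(monthly_ranges, monthly_people_range, months=24):
    """Return (low, high) total cost of ownership over `months` months.

    monthly_ranges maps every category in CATEGORIES to a (low, high) monthly
    platform cost; monthly_people_range is a (low, high) monthly people cost.
    Requiring all 11 categories keeps stacks comparable like-for-like.
    """
    missing = [c for c in CATEGORIES if c not in monthly_ranges]
    if missing:
        raise ValueError(f"no estimate for: {', '.join(missing)}")
    low = sum(monthly_ranges[c][0] for c in CATEGORIES) + monthly_people_range[0]
    high = sum(monthly_ranges[c][1] for c in CATEGORIES) + monthly_people_range[1]
    return low * months, high * months
```

For example, with placeholder figures only, `tco_range({c: (10_000, 25_000) for c in CATEGORIES}, (150_000, 300_000))` returns `(6240000, 13800000)`, the 2-year low and high totals for that set of inputs.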

These stacks can be used for a variety of machine learning projects including customer analytics, fraud detection, supply chain optimization and IoT analytics. Of course, each project could use a slightly different set of components, or quantity of each component.

About the Author

William McKnight

President, McKnight Consulting Group

William McKnight has advised many of the world's best-known organizations. His strategies form the information management plan for leading companies in various industries. He is a prolific author and a popular keynote speaker and trainer. He has performed dozens of benchmarks on leading database, data lake, streaming and data integration products. William is a leading global influencer in data warehousing and master data management and he leads McKnight Consulting Group, which has twice placed on the Inc. 5000 list. He can be reached at [email protected].