Despite the setbacks of a challenging year, the volume of analytics and corporate data grew, and so did its importance to corporate strategy and execution. Harnessing as much corporate data as possible is not merely important; it is an imperative.
Several new approaches, technologies and platforms are fueling this imperative. If you like progress, it is an exciting time to be working with enterprise data and analytics, and 2021 promises to be an eventful year for enterprise analytics. Here are the trends to watch.
Cloud computing leads technology rebound
Corporate technology spending will rise, and most of it will go toward data and analytics: data management, data privacy, data-intensive projects and so on. Cloud computing makes it possible to try out and deploy these projects faster than ever before. Innovations by the hyperscale cloud vendors are propelling this momentum. For example, AWS recently announced EBS io2 Block Express volumes. This is SAN for the cloud. It also announced gp3 volumes, which let you provision IOPS and throughput independently of storage capacity. Another big announcement is automatic tiering and replication; the tiering automatically moves data that is no longer accessed frequently to colder, cheaper storage tiers.
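Automatic tiering is, at heart, a policy that watches access patterns and moves data between storage classes. The following is a minimal sketch of that idea, assuming simple idle-time thresholds (30 and 90 days) and three hypothetical tier names, all chosen for illustration; managed services such as Amazon S3 Intelligent-Tiering apply their own rules and perform the data movement for you.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Tuple

# Illustrative idle-time thresholds, ordered coldest first.
# These are assumptions for the sketch, not any cloud provider's actual policy.
TIER_THRESHOLDS = [
    (timedelta(days=90), "archive"),
    (timedelta(days=30), "infrequent"),
]

@dataclass
class StoredObject:
    key: str
    tier: str
    last_accessed: datetime

def target_tier(obj: StoredObject, now: datetime) -> str:
    """Return the coldest tier whose idle threshold the object has passed."""
    idle = now - obj.last_accessed
    for threshold, tier in TIER_THRESHOLDS:
        if idle >= threshold:
            return tier
    return "frequent"  # recently accessed objects stay in (or return to) the hot tier

def apply_tiering(objects: List[StoredObject], now: datetime) -> List[Tuple[str, str, str]]:
    """Re-tier every object and report each move as (key, old_tier, new_tier)."""
    moves = []
    for obj in objects:
        new_tier = target_tier(obj, now)
        if new_tier != obj.tier:
            moves.append((obj.key, obj.tier, new_tier))
            obj.tier = new_tier
    return moves

now = datetime(2021, 1, 31)
objects = [
    StoredObject("sales/2020-q4.parquet", "frequent", now - timedelta(days=5)),
    StoredObject("logs/2020-10.json", "frequent", now - timedelta(days=45)),
    StoredObject("logs/2020-06.json", "infrequent", now - timedelta(days=200)),
]
moves = apply_tiering(objects, now)
for move in moves:
    print(move)
# ('logs/2020-10.json', 'frequent', 'infrequent')
# ('logs/2020-06.json', 'infrequent', 'archive')
```

Because the policy is evaluated against the latest access time, an archived object that is read again would be promoted back to the hot tier on the next pass.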
Traditional storage is growing only modestly, forcing traditional storage players to pivot. Repatriation of workloads to on-premises infrastructure makes headlines but remains rare.
COVID-19 has only intensified the need for companies to be focused and efficient, and therefore cloud-based.
Artificial intelligence and machine learning
Organizations are increasingly focused on artificial intelligence and machine learning (AI/ML). Leading organizations are embracing AI/ML as the revolution that follows the widely acknowledged information revolution, and they are well into reengineering the entire company around it. With dozens of models in production, these companies are going beyond initial use cases.
A study of corporate goals and roadmaps turns up few activities that could not benefit from AI/ML. Common areas of initial focus have been automation and customer experience, but leading organizations are expanding into downside protection, predictive analytics and the supply chain. More organizations, and more applications, will follow in 2021.
Collaborative ML will begin its multi-year journey toward becoming a preferred ML approach. Collaborative ML uses ML to augment human thought in data-driven decision-making, combining human expertise with ML. That makes it a good fit for these early days of ML and for the growing corporate comfort with it, as we bridge to greater reliance on ML in future years. In 2021, collaborative ML will be most evident in customer-interaction initiatives.
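One common way to put this human/ML collaboration into practice is confidence-based routing: the model decides routine cases on its own and hands uncertain ones to a person, who sees the model's suggestion as evidence. The sketch below is one interpretation, not a definitive implementation; the `Decision` type, the 0.85 threshold, the labels and the stub reviewer are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

@dataclass
class Decision:
    case_id: str
    model_label: str              # the model's suggestion
    confidence: float             # model's confidence in that suggestion, 0..1
    final_label: Optional[str] = None
    decided_by: str = "pending"

def collaborate(
    decisions: List[Decision],
    ask_human: Callable[[Decision], str],
    threshold: float = 0.85,      # assumed cut-off; tune per use case
) -> List[Decision]:
    """Accept confident model outputs; send the rest to a person, who sees the
    model's suggestion and confidence as evidence rather than as the answer."""
    for d in decisions:
        if d.confidence >= threshold:
            d.final_label, d.decided_by = d.model_label, "model"
        else:
            d.final_label, d.decided_by = ask_human(d), "human"
    return decisions

# Stand-in for a human review queue (hypothetical): this stub escalates any
# churn flag it is shown and otherwise accepts the model's label.
def reviewer(d: Decision) -> str:
    return "escalate_to_account_team" if d.model_label == "churn_risk" else d.model_label

cases = [Decision("c1", "upsell_offer", 0.93), Decision("c2", "churn_risk", 0.61)]
for d in collaborate(cases, reviewer):
    print(d.case_id, d.final_label, d.decided_by)
# c1 upsell_offer model
# c2 escalate_to_account_team human
```

As comfort with the model grows, the threshold can be lowered so that more decisions shift from people to the model.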
ML model deployment will take center stage in 2021. Model deployment will become the top activity of data professionals, with models getting increasingly sophisticated. However, most organizations will struggle with -- or should I say without -- MLOps.
MLOps applies DevOps principles to ML delivery. Model development benefits from an iterative approach, in which each pass improves both the understanding of the domain and the models themselves. MLOps needs a highly automated pipeline of tools, repositories that store and track models, code and data lineage, and a target environment that can be deployed into at speed. ML involves a large amount of trial and error, so this process gets exercised often. MLOps can help organizations save on infrastructure costs and speed up model deployment while reducing operational burdens.
Success with MLOps could account for 50% or more of the value ML delivers this year.
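The "repositories to store and keep track of models, code, data lineage" piece is usually a model registry; tools such as MLflow offer one. Here is a minimal in-memory sketch of the idea, assuming hypothetical names (`ModelRegistry`, `promote_best`), made-up commit SHAs and a single metric used to pick the production model.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Optional

def fingerprint(data: bytes) -> str:
    """Short content hash that pins the exact training data used (data lineage)."""
    return hashlib.sha256(data).hexdigest()[:12]

@dataclass
class ModelVersion:
    name: str
    version: int
    code_commit: str             # e.g. Git SHA of the training code (hypothetical values below)
    data_hash: str               # fingerprint of the training-data snapshot
    metrics: Dict[str, float]
    stage: str = "staging"       # staging -> production -> archived

@dataclass
class ModelRegistry:
    versions: List[ModelVersion] = field(default_factory=list)

    def register(self, name: str, code_commit: str, training_data: bytes,
                 metrics: Dict[str, float]) -> ModelVersion:
        """Record a new version along with the code and data that produced it."""
        version = 1 + sum(1 for v in self.versions if v.name == name)
        mv = ModelVersion(name, version, code_commit, fingerprint(training_data), metrics)
        self.versions.append(mv)
        return mv

    def promote_best(self, name: str, metric: str) -> Optional[ModelVersion]:
        """Put the best-scoring version into production; archive the one it replaces."""
        candidates = [v for v in self.versions if v.name == name]
        if not candidates:
            return None
        best = max(candidates, key=lambda v: v.metrics[metric])
        for v in candidates:
            if v.stage == "production" and v is not best:
                v.stage = "archived"
        best.stage = "production"
        return best

registry = ModelRegistry()
registry.register("churn", "a1b2c3d", b"customer snapshot 2020-09", {"auc": 0.81})
registry.register("churn", "e4f5a6b", b"customer snapshot 2020-12", {"auc": 0.86})
best = registry.promote_best("churn", "auc")
print(best.version, best.stage, best.code_commit)  # 2 production e4f5a6b
```

In a fuller pipeline, each automated training run might call `register`, and the deployment job would ship whichever version sits in production, so every deployed model can be traced back to its code and data.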
Data lakes and cloud storage
Data lake deployment was a major trend in 2020, and it remains strong enough to be a trend again this year.
Data lakes deployed in 2021 will follow the trend toward cloud storage and will be connected to the relational data warehouse in a “lakehouse” architecture. Organizations that deployed lakes earlier are now seeing the need for this connection, so retrofitting those lakes will also be a major activity in 2021.
Advances in cloud storage are also ratcheting up the usefulness of data lakes. For example, Project Nessie provides a Git-like experience for data lakes, and Apache Iceberg is now an option that provides transactional consistency, rollbacks and time travel for a data lake. Nessie also enables transactions to span multiple users and engines such as Spark, Kafka, Hive and Dremio.
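Iceberg and Nessie implement these ideas with metadata files and catalog commits, but the underlying concepts are simple enough to model. The following is a toy, in-memory sketch of snapshot-based versioning, not the Iceberg or Nessie API: every commit writes a new immutable snapshot, time travel reads an older snapshot, rollback moves a pointer back, and branches are named pointers as in Git. The `VersionedTable` class and the row schema are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Row = Tuple[str, str, int]  # (date, region, sales): hypothetical schema

class VersionedTable:
    """Toy model of snapshot-based table versioning (NOT the Iceberg or Nessie API)."""

    def __init__(self) -> None:
        self.snapshots: List[Tuple[Row, ...]] = [()]   # snapshot 0 = empty table
        self.parents: List[Optional[int]] = [None]     # parent snapshot of each snapshot
        self.refs: Dict[str, int] = {"main": 0}        # branch name -> snapshot id

    def commit(self, rows: List[Row], branch: str = "main") -> int:
        # Build the complete new snapshot first, then move the branch pointer
        # in one step, so readers never see a half-applied write.
        head = self.refs[branch]
        self.snapshots.append(self.snapshots[head] + tuple(rows))
        self.parents.append(head)
        self.refs[branch] = len(self.snapshots) - 1
        return self.refs[branch]

    def read(self, branch: str = "main", snapshot_id: Optional[int] = None) -> List[Row]:
        # Passing snapshot_id "time-travels" to an older version of the table.
        sid = self.refs[branch] if snapshot_id is None else snapshot_id
        return list(self.snapshots[sid])

    def rollback(self, snapshot_id: int, branch: str = "main") -> None:
        # History is kept; only the branch pointer moves back.
        self.refs[branch] = snapshot_id

    def create_branch(self, name: str, source: str = "main") -> None:
        self.refs[name] = self.refs[source]

    def merge(self, source: str, target: str = "main") -> None:
        # Fast-forward only: allowed if target's head is an ancestor of source's head.
        ancestors = set()
        sid: Optional[int] = self.refs[source]
        while sid is not None:
            ancestors.add(sid)
            sid = self.parents[sid]
        if self.refs[target] not in ancestors:
            raise ValueError("target has diverged; a real system would need a three-way merge")
        self.refs[target] = self.refs[source]

t = VersionedTable()
v1 = t.commit([("2021-01-04", "EMEA", 1200)])
t.create_branch("etl")                              # isolate a risky load on its own branch
t.commit([("2021-01-05", "APAC", 900)], branch="etl")
print(len(t.read("main")), len(t.read("etl")))      # 1 2
t.merge("etl")                                      # publish the load to main
print(len(t.read("main")))                          # 2
t.rollback(v1)                                      # undo: main points at snapshot v1 again
print(t.read("main"))                               # [('2021-01-04', 'EMEA', 1200)]
print(len(t.read(snapshot_id=2)))                   # 2 (time travel still works)
```

The branch-then-merge flow mirrors how a Git-like catalog lets a data team validate a load in isolation before making it visible to everyone reading `main`.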
Data lakes are part of an expanded modern data stack. While source data, data integration and data access used to form a coherent stack, in 2021 the stack will expand to include data analytics, data science, a data catalog, workload management, deployment and security components.
I don’t see these developments causing any falloff in the use of relational database technologies, but they certainly keep storage-layer selection a hot discussion point for this year.
Trends are important to watch because they become what your customers want. For the ones that stick, it’s better to be at the beginning of the trend than at the end. These trends should give your business ideas and, all things being equal, should heavily influence the activities your corporation undertakes in 2021.
William McKnight has advised many of the world's best-known organizations. His strategies form the information management plan for leading companies in various industries. He is a prolific author and a popular keynote speaker and trainer. He has performed dozens of benchmarks on leading database, data lake, streaming and data integration products. William is a leading global influencer in data warehousing and master data management and he leads McKnight Consulting Group, which has twice placed on the Inc. 5000 list.