Breaking Silos and Curating Data for Impactful AI

AI requires both high-quality data and an infrastructure that ensures data is always available. Without that foundation, the promise of AI will stay out of reach.

Guest Commentary

October 7, 2019

Image: WrightStudio -

There is a symbiotic relationship between data and artificial intelligence. Data forms the foundation of a successful implementation, and AI is then used to further interpret and refine that data. It is a constant feedback loop in which each affects the effectiveness of the other. For machine learning to have an impact, data needs to be curated, high-quality and easily accessible. Training a model on that data without introducing bias is a bigger challenge than one might expect.

Architecting an IT infrastructure that can break down data silos and make information available and actionable, while at the same time ensuring security and compliance, is already a major challenge for enterprises. Add in the desire to run that data through machine learning and AI functions and things become even more challenging, especially in the age of cloud, when data is widely dispersed.

AI offers so much promise as an enterprise technology: taking on decision-making tasks and helping employees perform their jobs better. But to realize that promise, organizations need to understand how to prepare their architecture and data for AI.

Enterprise data challenge

Cloud has offered the enterprise near-limitless compute and storage resources, which has made it possible to retain vast amounts of data. This has been both a blessing and a curse. While it presents a tremendous opportunity to analyze data and derive insights into financial projections, customer demand and more, the sheer volume of data has made it difficult to manage and use for such functions. Factor in additional requirements, from compliance to data quality control, and the situation becomes complex.

The increasing adoption of cloud services in the enterprise and the continued use of legacy on-premises and hybrid solutions have created vast data silos that are often difficult to identify, let alone consolidate and analyze. IT teams may not even know that some of these silos exist, and hidden silos have the potential to severely limit analytics and intelligence tools.

Combined, this has led to a situation where enterprises are capturing enormous amounts of data but know very little about it, including how much is being stored or even where it lives. What many enterprises currently have is a complex web of on-premises and cloud data stores, each with its own management, storage, privacy and regulatory concerns. The reality is that as data becomes more fragmented, enterprises need to take a hard look at centralizing its management. This is the only way to get our arms around such an unwieldy amount of data and turn it into something that can positively impact the larger business.

Why data quality matters

The goal of machine learning is to perform data-driven tasks with a level of skill, precision and speed that is far greater than what a human counterpart could provide. In the same way a person wouldn’t be able to learn a skill from the wrong textbook, a machine learning process trying to understand a poorly managed data set will fail to learn anything valuable. Likewise, an incomplete data set can produce a model that is narrow or skewed. There’s a balance required when building these data sets.

AI, which is currently an incredibly exciting trend in the enterprise space, also can’t be built on incomplete, erroneous data sets. Much of what AI is meant to accomplish involves predictive decision-making, modeling and analysis, none of which are possible if data is incomplete, dirty or siloed. Algorithms trained to analyze a specific trend need access to as much relevant, good data as possible, and that data may be held in separate silos. It’s like a student writing a research paper: they likely need to reference sources from different sections of the library, but having everything accessible under the same roof improves the process immensely.
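To make "incomplete, dirty" data a little more concrete, here is a minimal sketch in plain Python that profiles a handful of records for missing fields, exact duplicates and label skew. The function name `profile_records`, the field names and the sample records are all hypothetical and chosen only for illustration; a real pipeline would likely use dedicated data-quality tooling.

```python
from collections import Counter


def profile_records(records, required_fields, label_field):
    """Return simple quality metrics for a list of dict records (illustrative only)."""
    # A record is incomplete if any required field is absent, None, or an empty string.
    incomplete = sum(
        1
        for r in records
        if any(r.get(f) in (None, "") for f in required_fields)
    )

    # Count exact duplicates: records whose full contents repeat an earlier record.
    seen = set()
    duplicates = 0
    for r in records:
        fingerprint = tuple(sorted(r.items()))
        if fingerprint in seen:
            duplicates += 1
        else:
            seen.add(fingerprint)

    # Label distribution, ignoring records with no label; heavy skew hints at bias risk.
    label_counts = Counter(
        r[label_field] for r in records if r.get(label_field) not in (None, "")
    )

    return {
        "total": len(records),
        "incomplete": incomplete,
        "duplicates": duplicates,
        "label_counts": dict(label_counts),
    }


# Hypothetical sample data: one empty region, one missing label, one exact duplicate.
sample_records = [
    {"customer_id": 1, "region": "EMEA", "churned": "no"},
    {"customer_id": 2, "region": "", "churned": "no"},
    {"customer_id": 3, "region": "APAC", "churned": "yes"},
    {"customer_id": 1, "region": "EMEA", "churned": "no"},
    {"customer_id": 4, "region": "AMER", "churned": None},
]

report = profile_records(
    sample_records,
    required_fields=["customer_id", "region", "churned"],
    label_field="churned",
)
print(report)
```

Even on five records, the report surfaces the three problems named above: gaps, repeats and a lopsided label distribution.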

Building an infrastructure to support AI innovation

Enterprises are collecting more data at a faster pace, and generating insights requires an approach to infrastructure that breaks down data silos and ensures high-quality data is readily available. IT departments need to broaden their focus beyond collection and retention, and begin to emphasize architecture, management and curation. Specifically, that means creating a data lake that serves as a single repository of data, as opposed to a siloed approach that puts critical information out of reach.
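As a toy illustration of the single-repository idea, the sketch below folds records from two hypothetical silos into one view keyed by a shared identifier, and tracks which silo each record came from. The silo names, field names and the `consolidate` helper are assumptions made for the example; an actual data lake would sit on object storage with a catalog and governance layer, none of which is modeled here.

```python
def consolidate(silos, key):
    """Merge per-silo record lists into one dict keyed by `key` (illustrative only)."""
    merged = {}
    for source, records in silos.items():
        for record in records:
            # Create the combined entry the first time this key is seen.
            entry = merged.setdefault(record[key], {key: record[key], "_sources": []})
            for field, value in record.items():
                # The first silo to supply a field wins; later values are ignored.
                entry.setdefault(field, value)
            # Keep provenance so each entry can be traced back to its silos.
            entry["_sources"].append(source)
    return merged


# Hypothetical silos: a CRM system and a billing system that share customer_id.
silos = {
    "crm": [
        {"customer_id": 1, "name": "Acme"},
        {"customer_id": 2, "name": "Globex"},
    ],
    "billing": [
        {"customer_id": 1, "plan": "enterprise"},
        {"customer_id": 3, "plan": "starter"},
    ],
}

lake = consolidate(silos, "customer_id")
for customer_id, entry in lake.items():
    print(customer_id, entry)
```

Customer 1 ends up with fields from both systems, while customers 2 and 3 show up as gaps that were invisible while the silos were separate.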

AI is one of the transformational technologies of the 21st century, and it promises to reshape modern businesses and mold the future of work. In fact, we are already seeing its impact in places like customer experience, where it helps create a customized and curated experience for each buyer. But it’s not a plug-and-play solution: it requires both high-quality data and an infrastructure that ensures data is always available. Without that foundation, the promise of AI will stay out of reach.


Jaspreet Singh is the founder and CEO of Druva. An entrepreneur at heart, he bootstrapped the company, delivering the first and only cloud native data management offering that is disrupting the classic data protection market. Prior to starting Druva, Singh held foundational roles at Veritas and Ensim Corp. Additionally, he holds multiple patents and has a B.S. in Computer Science from the Indian Institute of Technology, Guwahati.

About the Author(s)

Guest Commentary

The InformationWeek community brings together IT practitioners and industry experts with IT advice, education, and opinions. We strive to highlight technology executives and subject matter experts and use their knowledge and experiences to help our audience of IT professionals in a meaningful way. We publish Guest Commentaries from IT practitioners, industry analysts, technology evangelists, and researchers in the field. We are focusing on four main topics: cloud computing; DevOps; data and analytics; and IT leadership and career development. We aim to offer objective, practical advice to our audience on those topics from people who have deep experience in these topics and know the ropes. Guest Commentaries must be vendor neutral. We don't publish articles that promote the writer's company or product.
