Signing on the Dotted Line for Better Data Quality

Here’s what leaders need to know about the rapidly emerging area of data contracts: what they are; what businesses stand to gain; how they help with governance; and more.

Mathisse de Strooper, Director of Product, Soda

June 18, 2023

More data than ever before is being collected and analyzed, and with that has come an explosion in data sharing.

Whether data is shared between two employees at the same company or between businesses, the more it is shared, and the more products are built on top of it, the more two key challenges stand out: first, establishing who owns the data; and second, mitigating the data quality issues caused by unexpected changes to that data. The question is, how can organizations overcome these challenges?

Playing By the Rules

Since the start of the digital era, businesses have been gathering a lot of new data. 

Not all of it is useful. To maximize the value of the data they collect, businesses need to outline the rules within which data can be used. Clear rules help mitigate two of the biggest data challenges facing organizations -- accountability for and ownership of data, and unexpected changes to datasets. Like an instruction manual for building furniture, these rules ensure that, from a governance perspective, the ownership of the data is clear, the data itself is functional, and it is always being used to drive business outcomes.

That said, it’s one thing to acknowledge the need for these rules; enforcing them across an organization without the appropriate governance requirements is another thing entirely.

To borrow a construction metaphor (data is the building block for almost everything we do in the digital era, after all), data is a lot like concrete. Both serve as the foundations for things we experience every day, whether it’s the roads we drive on and the houses we live in, or the search engines we use daily and our favorite social media channels. Both can also be used in many different ways. To get the right concrete for your needs, you sign a contract with your supplier so that you both understand what is required and what is being supplied. With so many different uses for data, the data sector needs something similar -- and that’s where data contracts come in.

At its most basic, a data contract is an agreement between data consumers and data producers that outlines what good data is (and what it isn’t). Think of it as an API for data -- you define what can be expected of your dataset, including information about any missing strings, and you define data quality checks to ensure a basic level of data quality.
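
To make the “API for data” idea concrete, here is a minimal sketch in Python of what such an agreement might look like. Everything in it is an assumption made for illustration: the `ColumnSpec` and `DataContract` classes, the `check_rows` helper, and the `orders` dataset are hypothetical names, not the format of any particular data contract tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnSpec:
    """Expectation for one column (hypothetical structure)."""
    name: str
    dtype: type
    nullable: bool = False


@dataclass(frozen=True)
class DataContract:
    """Agreement between a producer and its consumers about one dataset."""
    dataset: str
    owner: str
    columns: tuple[ColumnSpec, ...]


def check_rows(contract: DataContract, rows: list[dict]) -> list[str]:
    """Return one readable violation per broken expectation."""
    violations = []
    for i, row in enumerate(rows):
        for col in contract.columns:
            value = row.get(col.name)
            if value is None:
                # A missing value is only a problem if the contract forbids it.
                if not col.nullable:
                    violations.append(f"row {i}: '{col.name}' is missing")
            elif not isinstance(value, col.dtype):
                violations.append(
                    f"row {i}: '{col.name}' should be {col.dtype.__name__}"
                )
    return violations


# Hypothetical contract for an illustrative "orders" dataset.
orders_contract = DataContract(
    dataset="orders",
    owner="sales-engineering",
    columns=(
        ColumnSpec("order_id", int),
        ColumnSpec("customer_email", str),
        ColumnSpec("coupon_code", str, nullable=True),
    ),
)

sample_rows = [
    {"order_id": 1, "customer_email": "a@example.com", "coupon_code": None},
    {"order_id": 2, "customer_email": None},
    {"order_id": "3", "customer_email": "c@example.com"},
]

violations = check_rows(orders_contract, sample_rows)
for violation in violations:
    print(violation)
```

In this sketch the contract names an owner, states which columns consumers can rely on, and doubles as a basic quality check. A real contract would likely cover more, such as freshness or value ranges.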

The idea is that contracts are published at the same time as the datasets they describe. As an engineer, you share a contract alongside the product to lay out clearly what the product or dataset is; as a consumer, you take a contract to ensure that the data product you are using isn’t changed by edits to its foundational datasets without you first being alerted (and given time to remedy any issues).

Signing Away Bad Data

Data contracts are an important counter to bad data affecting business-wide data operations, because they solve potential problems at the source. They are, in effect, a way of avoiding the “garbage in, garbage out” scenario that so many data experts fear: data is scanned for issues based on what the contract says it should be. If the contract says one thing and the data says another, an alert pops up and leads on to simple incident management.
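
As a rough illustration of that scan, the sketch below compares the schema a contract promises with the schema actually observed and emits an alert for every mismatch. The `Alert` class, the `scan_schema` function, and the column names are hypothetical, and the alerts are simply collected in a list; a real setup would presumably route them into an incident management tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    """One mismatch between contract and data (hypothetical structure)."""
    dataset: str
    message: str


def scan_schema(dataset: str, contracted: dict[str, str],
                observed: dict[str, str]) -> list[Alert]:
    """Compare column name -> type mappings and report every difference."""
    alerts = []
    for column, expected in contracted.items():
        actual = observed.get(column)
        if actual is None:
            alerts.append(Alert(dataset, f"column '{column}' was removed"))
        elif actual != expected:
            alerts.append(
                Alert(dataset,
                      f"column '{column}' changed from {expected} to {actual}")
            )
    # Columns the contract never mentioned are flagged too (an assumption;
    # some teams may prefer to allow additive changes).
    for column in sorted(observed.keys() - contracted.keys()):
        alerts.append(Alert(dataset, f"unexpected column '{column}'"))
    return alerts


# Illustrative schemas: what the contract says vs. what the data says.
contracted_schema = {"order_id": "integer", "amount": "decimal", "currency": "string"}
observed_schema = {"order_id": "string", "amount": "decimal", "discount": "decimal"}

alerts = scan_schema("orders", contracted_schema, observed_schema)
for alert in alerts:
    print(f"[{alert.dataset}] {alert.message}")
```

Running the scan where the data is produced, rather than downstream, is what lets the problem be caught at the source.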

As well as being a fix for data quality issues, data contracts make collaboration more seamless. They allow far greater cooperation between data producers and consumers by ensuring transparency around foundational datasets. Data contracts allow producers to define what they have done with a dataset, and they help to automatically populate data catalogs to ensure standard definitions between producers and consumers.

Data contracts could also be used to automate permissions, ensuring the right people quickly get access to datasets and products. If a contract says someone should have access, a tool could take those rules and automatically set permissions, freeing up data owners to focus on more pressing tasks. In this way, data contracts aren’t just an important tool in the data governance arsenal; they also actively streamline collaboration and make processes more efficient.
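
One way such a tool might work is sketched below, assuming the contract carries a list of teams that should be able to read the dataset. The `readers` field, the `plan_access_changes` function, and the team names are hypothetical. Revoking access that the contract no longer lists is a design choice of this sketch, not something the contract idea requires.

```python
def plan_access_changes(contract_readers: set[str],
                        current_readers: set[str]) -> dict[str, list[str]]:
    """Work out which grants and revokes would bring access in line
    with the contract. Nothing is applied here; this only builds a plan."""
    return {
        "grant": sorted(contract_readers - current_readers),
        "revoke": sorted(current_readers - contract_readers),
    }


# Hypothetical contract fragment and current state of permissions.
contract = {
    "dataset": "orders",
    "owner": "sales-engineering",
    "readers": {"analytics", "finance"},
}
current_readers = {"finance", "marketing"}

plan = plan_access_changes(contract["readers"], current_readers)
print(plan)
```

Keeping the plan separate from its execution leaves room for the data owner to review changes before anything is applied.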

Data contracts have the potential to lead to a paradigm shift in data by empowering organizations to overcome some of the biggest challenges they currently face in their data operations. They have the power to prevent a lot of data issues by improving the workflows and communication between producers and consumers, and by easing the integration of a wide variety of tools. To get higher quality data and improved workflows, all data professionals need to do is sign on the dotted line.

About the Author

Mathisse de Strooper is the Director of Product at data quality management company, Soda. With over a decade of experience in the data space, Mathisse was previously Product Manager at Collibra.
