AI and Machine Learning Need Quality Assurance

Artificial intelligence and machine learning are not “set and forget” technologies. They need quality assurance to operate, and continue to operate, as intended.

Artificial intelligence and machine learning are the backbone for advanced decision-making, monitoring, medicine, cybersecurity, and other critical applications. To fully rely on AI and ML output, you need to ensure that your data is accurate, unbiased, error-free, and meets requirements. To ensure AI and ML output meets expectations, you also need a quality assurance (QA) plan that thoroughly tests for any conditions that could cause inaccurate results in your applications.

For small businesses or those experimenting with AI, QA can seem like an unnecessary step that slows down deployments and increases cost. However, QA ensures that the output on which you base many of your business decisions remains consistent. Errors in AI and ML output not only affect application reliability; in healthcare and medicine they can also be life threatening.

Challenges with QA Planning

In many cases, the challenges with QA planning involve knowing where to start, who can help, and what needs to be tested. Testing must also continue in production, and many businesses don’t have the tools necessary for automated testing in production.

The first step in the QA process is to plan. In small businesses, engineers might start with a simple internet search and find a generic document. Generic plans rarely fit a specific business use case and should be avoided. Instead, the plan should be created by a professional who knows data science, data modeling, and the steps necessary in testing.

What Should a QA Plan Include?

The goal of a test plan is to set up a repeatable process that helps you find errors in your code. Your QA plan will evolve as your business requirements and goals change.

The most critical part of the test plan covers your data. Businesses often collect data from many sources, but it’s the quality of the data that counts.

Data quality is defined by several characteristics:

  • Completeness: There are no omissions in the records. Every required cell must be filled.
  • Uniqueness: There should be no identical records in the data.
  • Credibility: Each cell contains the kind of value it is supposed to contain, such as an IP address or a telephone number.
  • Accuracy: Values are stored at the required precision. For example, a numeric field might require 12 decimal places.
  • Consistency: The same value must be recorded the same way, regardless of how or where it is measured.
  • Timeliness: The data should be current, especially if it is updated periodically. For example, if new records arrive every month, the amount of data should increase every month. In addition, data should not be obsolete.

This means that QA must test for the following conditions:

  • Duplicates that must be removed from the dataset.
  • Missing values and incomplete records.
  • Formatting errors. For example, dates are represented differently in Europe versus the United States. The format must be consistent across all records.
  • Syntax anomalies. For example, phone numbers for the US must be 10 digits.

If any of these tests are mistakenly skipped, AI or ML results could be inaccurate. Your QA plan should be created to find all instances of these errors early in the process and not after deployment to production.
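
The four conditions listed above can be sketched as a small validation pass. This is a minimal illustration, not a complete QA tool: the field names (`name`, `phone`, `signup_date`), the canonical date format, and the `check_records` helper are all assumptions made for the example, and records are assumed to be dictionaries of strings.

```python
import re
from datetime import datetime

# Hypothetical schema: every record is expected to carry these fields.
REQUIRED_FIELDS = ("name", "phone", "signup_date")
DATE_FORMAT = "%Y-%m-%d"  # assumed canonical date format for all records


def check_records(records):
    """Return a list of (row_index, problem) tuples for the given records."""
    problems = []
    seen = set()
    for i, rec in enumerate(records):
        # Missing values and incomplete records.
        missing = [f for f in REQUIRED_FIELDS if not rec.get(f)]
        if missing:
            problems.append((i, "missing: " + ", ".join(missing)))

        # Duplicates: identical values across all required fields.
        key = tuple(rec.get(f) for f in REQUIRED_FIELDS)
        if key in seen:
            problems.append((i, "duplicate"))
        seen.add(key)

        # Formatting errors: dates must follow one consistent format.
        date = rec.get("signup_date")
        if date:
            try:
                datetime.strptime(date, DATE_FORMAT)
            except ValueError:
                problems.append((i, "bad date format"))

        # Syntax anomalies: a US phone number must contain 10 digits.
        phone = rec.get("phone")
        if phone and len(re.sub(r"\D", "", phone)) != 10:
            problems.append((i, "bad phone"))
    return problems
```

Running a check like this before training, and failing the pipeline when the returned list is not empty, is one way to catch these errors early rather than after deployment.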

Test for Data and Bias, Too

Testing for errors is important in QA. However, testing for bias is one of the most difficult aspects of QA in AI and ML. Bias can harm specific groups of people and reduce the accuracy of your output. It can stem from several sources, including improper data collection or choices made by the data scientist who creates the model.

An example of bias can be seen in financial loan applications. To determine if an individual should be eligible for a loan, you need their age, income, and credit history. Factoring in race would create bias in your application that discriminates against a protected class. QA would identify unnecessary data points like race in loan applications and eliminate them from the model.
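
As a rough sketch of that kind of check, QA could scan a model's input features for attributes that should not be there. The `PROTECTED_ATTRIBUTES` set and both helper functions below are hypothetical; the actual list of attributes depends on your application and the regulations that apply to it.

```python
# Hypothetical list of attributes that must not be used as model inputs.
PROTECTED_ATTRIBUTES = {"race", "ethnicity", "religion"}


def find_protected_features(feature_names):
    """Return the feature names that match a protected attribute."""
    return sorted(
        name for name in feature_names
        if name.strip().lower() in PROTECTED_ATTRIBUTES
    )


def drop_protected_features(rows):
    """Return copies of the rows (dicts) without protected attributes."""
    return [
        {k: v for k, v in row.items()
         if k.strip().lower() not in PROTECTED_ATTRIBUTES}
        for row in rows
    ]
```

For example, `find_protected_features(["age", "income", "Race", "credit_history"])` flags `"Race"`, and `drop_protected_features` removes that column from each row before the data reaches the model.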

Bias doesn’t always come from collected data. Data scientists can even unknowingly introduce this issue into their models by selecting data based on their own bias (“selection bias”), which makes it even more important for QA to be independent from other aspects of development. Data scientists continue to take steps to ensure that they don’t introduce bias, but mistakes can happen.
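
One simple way an independent QA team might look for selection bias is to compare how often each group appears in the selected training sample versus the full dataset. The sketch below assumes each record has already been reduced to a single group label; the `share_gaps` function and its `tolerance` default are illustrative choices, not an established standard.

```python
from collections import Counter


def share_gaps(population, sample, tolerance=0.1):
    """Return groups whose share in the sample differs from their share in
    the population by more than `tolerance` (absolute difference)."""
    pop_counts, sample_counts = Counter(population), Counter(sample)
    gaps = {}
    for group, count in pop_counts.items():
        pop_share = count / len(population)
        sample_share = sample_counts.get(group, 0) / len(sample)
        if abs(pop_share - sample_share) > tolerance:
            # Positive means over-represented in the sample, negative under.
            gaps[group] = round(sample_share - pop_share, 3)
    return gaps
```

An empty result means no group's share moved by more than the tolerance. A non-empty result does not prove bias, but it tells QA where to ask how the sample was chosen.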

Testing Doesn’t Stop after Deployment to Production

Testing must continue in production because models degrade over time after deployment. Changes to customer preferences, business requirements, products, and even economic factors make your models less accurate. When models are no longer accurate, QA determines what must change and helps the organization deploy new models to stay resilient against data drift. These tests can be automated, which makes them a good investment that reduces costs in the future.
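
One common way to automate a drift test is to compare the distribution of a feature in production against its distribution at training time. The sketch below uses the Population Stability Index (PSI) as an example metric; the article does not prescribe a specific method, so the metric, the bin count, and the 0.2 alert threshold mentioned afterward are all assumptions.

```python
import math


def population_stability_index(expected, actual, bins=10):
    """Compare two numeric samples; larger values indicate more drift."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Values outside the training range are clamped to the edge bins.
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # A small floor keeps the logarithm defined for empty bins.
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A scheduled job could compute this value for each important feature and raise an alert when it crosses a chosen threshold. A value above roughly 0.2 is often treated as a sign of significant shift, but the right threshold depends on your data.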


Aleksei Chumagin is the Engineering Manager at Provectus and a data quality enthusiast. He has extensive experience in automation testing and ML quality assurance. Aleksei actively contributes to the local QA community. He understands how difficult it can be to begin a new QA process, and he generously shares his knowledge and experience with the community.