The Basics of CI/CD for Data Science and Machine Learning - InformationWeek


Data Management
08:00 AM
Pierre DeBois

The Basics of CI/CD for Data Science and Machine Learning

Continuous integration and continuous deployment are IT practices that encourage testing code often. Learn how these practices also shape data-driven initiatives.

The basics of how machine learning and data science should work often feel less than basic. Machine learning practitioners, from programmers to scientists, are learning how to apply advanced statistics and mathematics within the context of software programming. The result is complexity in selecting good machine learning models, and that complexity can conflict with the constraints management faces, be they objectives, deadlines, or limited resources to execute a decision based on the model.

Fortunately, two developer practices -- continuous integration and continuous deployment (CI/CD) -- are giving managers ways to guide machine learning and data science initiatives early in the development process, making truly beneficial model-based decisions possible.

Let’s look at the definition of CI/CD to understand how the paired processes impact machine learning.

Continuous integration is a practice that ensures that code and any related resources are checked into a shared repository at regular intervals. These check-ins are then verified by automated builds, which helps to highlight problems early in the development cycle.
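
The check-in verification described above can be pictured as a series of automated stages that stops at the first failure. The following is only a minimal sketch under stated assumptions, not the workflow of any particular CI product: the `Stage` class, the `run_pipeline` function, and the three stage names are hypothetical, and each lambda stands in for a real build, test, or data check.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Stage:
    """One automated step in a hypothetical CI run."""
    name: str
    check: Callable[[], bool]  # returns True when the stage passes


def run_pipeline(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in order and stop at the first failure (fail fast)."""
    log: List[str] = []
    for stage in stages:
        try:
            passed = stage.check()
        except Exception as exc:  # a crashing stage counts as a failure
            log.append(f"{stage.name}: ERROR ({exc})")
            return False, log
        log.append(f"{stage.name}: {'ok' if passed else 'FAILED'}")
        if not passed:
            return False, log
    return True, log


# Hypothetical stages for a model repository (assumptions for illustration).
required_columns = {"age", "income"}
incoming_columns = {"age", "income", "region"}

stages = [
    Stage("build", lambda: True),
    Stage("unit tests", lambda: 1 + 1 == 2),
    Stage("data schema check", lambda: required_columns <= incoming_columns),
]

ok, log = run_pipeline(stages)
for line in log:
    print(line)
print("pipeline passed" if ok else "pipeline failed")
```

Stopping at the first failed stage is what surfaces a problem soon after the check-in that caused it, rather than late in the development cycle.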

Continuous deployment is a practice in which software updates are built, tested, and released automatically. (The closely related practice of continuous delivery stops one step short, making each update ready for release.) With developers and database teams working collaboratively and in parallel, continuous deployment paves the way for stable and consistent versions of software.

Image: Shutterstock

CI/CD is valuable because today's business strategies have become reliant on how the ongoing nature of software management impacts the development of products and services. The agility needed to deliver functional software has pushed the software itself toward microservice architectures. Microservices are a set of development techniques that arrange an application as a set of loosely coupled services. Maintaining microservices permits software releases to be deployed frequently, even multiple times a day, without interrupting other software segments. The advantage to a business model is being able to provide seamless updates.

The seamless updates of microservices can also complement data-related changes, such as software updates that bring an application and its associated data into line with privacy compliance requirements. The update capability allows data science and machine learning processes to be incorporated into CI/CD phases at the right time.

As a consequence, CI/CD-influenced projects have the opportunity to minimize technical debt: the cost that accumulates when a team focuses on getting code working now without considering the long-term consequences for maintenance and for the business model. For example, a team could develop an app but not examine the steps needed to update the environment in which the app operates. Technical debt is the enemy of organizations that have multiple deployment environments (e.g., development, testing, production). Technical debt is also the enemy of data-driven initiatives, since data deployment environments raise concerns similar to those in software development, such as API documentation -- in this case for data resources -- and differing data types. Getting an overview of the needed data mining and transformation steps can become complex very quickly.

So where within a development process can managers contribute to a CI/CD process to help simplify the complexity? One great opportunity is through evaluating test processes like user acceptance testing (UAT), a test phase that evaluates user needs, business requirements, and software functionality. Managers can help the test team set the evaluation parameters for business requirements, leading to a robust methodology for evaluating continuous improvement of those parameters. A project manager is usually assigned to work with developers on this effort. UAT can be effective in reducing development time and expenses, while CI/CD can inform data management on how development of a model output can potentially impact customer experience with a service or product.
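
One way to picture the evaluation parameters mentioned above is as an explicit acceptance gate that a candidate model must pass before release. This is a minimal sketch under stated assumptions: the metric names, the thresholds, and the `evaluate_release` function are all hypothetical, chosen only to illustrate how business requirements might be written down in a form a test run can check.

```python
from typing import Dict, List, Tuple

# Hypothetical acceptance criteria agreed between managers and the test team.
# "min" means the metric must be at least the limit; "max" means at most.
CRITERIA: Dict[str, Tuple[str, float]] = {
    "accuracy": ("min", 0.90),
    "false_positive_rate": ("max", 0.05),
    "p95_latency_ms": ("max", 200.0),
}


def evaluate_release(
    metrics: Dict[str, float],
    criteria: Dict[str, Tuple[str, float]] = CRITERIA,
) -> List[str]:
    """Return the failed criteria; an empty list means the gate passes."""
    failures: List[str] = []
    for name, (kind, limit) in criteria.items():
        if name not in metrics:
            failures.append(f"{name}: missing")
            continue
        value = metrics[name]
        passed = value >= limit if kind == "min" else value <= limit
        if not passed:
            failures.append(f"{name}: {value} violates {kind} {limit}")
    return failures


# Illustrative numbers for a candidate model, not real results.
candidate = {"accuracy": 0.93, "false_positive_rate": 0.08, "p95_latency_ms": 150.0}
failures = evaluate_release(candidate)
print("release approved" if not failures else f"release blocked: {failures}")
```

Keeping the criteria in one named table makes them easy to review and revise, which fits the idea of continuously improving those parameters over time.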

Experts indicate that other opportunities for managers to apply CI/CD practices are emerging. Ben Lorica, chief data scientist at O'Reilly Media, noted in his O'Reilly Strata conference keynote that tools specialized for machine learning will layer onto existing analytics. The trend will allow teams to expand their capabilities incrementally and experiment with other architectures. Recent announcements by Microsoft Azure, Amazon Web Services, and Google, for example, emphasize faster model training, better workflow management, and greater security for project deployment.

Evaluating the programming languages used for those projects can aid in selecting complementary IDEs and in identifying the recurring needs of each team. If a team had used R to develop models, for example, a version control system would be needed to keep packages and dependencies updated and to maintain a documented history of the changes that drive decisions among the responsible teams.
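
The dependency tracking described above can be sketched as a check that compares the package versions recorded in version control with the versions actually present in an environment. This is only an illustration under stated assumptions: the `name==version` pin format, the `parse_pins` and `find_drift` functions, and the version numbers are hypothetical, and the example is written in Python although the same idea applies to R packages.

```python
from typing import Dict, List


def parse_pins(text: str) -> Dict[str, str]:
    """Parse 'name==version' lines, ignoring blank lines and # comments."""
    pins: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, sep, version = line.partition("==")
        if not sep:
            raise ValueError(f"unpinned dependency: {line!r}")
        pins[name.strip().lower()] = version.strip()
    return pins


def find_drift(pinned: Dict[str, str], installed: Dict[str, str]) -> List[str]:
    """List every package whose installed version differs from its pin."""
    drift: List[str] = []
    for name, want in sorted(pinned.items()):
        have = installed.get(name)
        if have is None:
            drift.append(f"{name}: pinned {want}, not installed")
        elif have != want:
            drift.append(f"{name}: pinned {want}, installed {have}")
    return drift


# Hypothetical pin file kept under version control (versions are illustrative).
PIN_FILE = """
# modeling dependencies
pandas==1.2.4
scikit-learn==0.24.2
numpy==1.20.3
"""

# Hypothetical snapshot of what one environment actually has installed.
installed = {"pandas": "1.2.4", "scikit-learn": "0.24.1"}

drift = find_drift(parse_pins(PIN_FILE), installed)
for line in drift:
    print(line)
```

Because the pin file lives in the repository, every change to a dependency appears in the commit history alongside the model code it affects.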

All of these considerations can improve how well a CI/CD workflow accommodates the time machine learning algorithms take to train on the data and return results for inspection.

Turning data into a valuable business decision is not simple. But as data transformations increasingly occur in applications and software-managed devices, managers are experimenting with software management techniques like CI/CD to keep complex machine learning models in step with good data management basics.

Pierre DeBois is the founder of Zimana, a small business analytics consultancy that reviews data from Web analytics and social media dashboard solutions, then provides recommendations and Web development action that improves marketing strategy and business profitability.