5 Reasons Data Scientists Should Adopt DevOps Practices

Lisa Morgan | 2/19/2018 08:00 AM

Enterprise software development teams have historically had trouble ensuring the code that runs well on a developer's machine also runs well in production. DevOps has promoted more collaboration between developers and IT operations. Data scientists and data science teams face similar challenges, which DevOps concepts can help address.

Achieve More Consistent Results and Predictability

Image: Pixabay

Like application software, models may run well in a lab environment, but perform differently when applied in production.

"Models and algorithms are software [so] data scientists face the traditional problems when moving to production – untracked dependencies, incorrect permissions, missing configuration variables," said Clare Gollnick, CTO and chief data scientist at dark web monitoring company Terbium Labs. "The ‘lab to real world’ problem is really a restatement of the problem of model generalization. We build models based on historical, sub-sampled data [and then expect that model] to perform on future examples even if the context changes over time. DevOps can help close this gap by enabling iterative and fast hypothesis testing [because] 'fail fast' has nice parallels to the ‘principle of falsifiability’ in science. If [a hypothesis] is wrong, we should reject [it] quickly and move on."

One reason a model may fail to generalize is overfitting, which occurs when a model is so complex that it starts finding patterns in noise. To guard against it, data scientists use methods such as out-of-sample testing and cross-validation, which are a standard part of the model-building process, according to Jennifer Prendki, head of Search and Smarts Engineering at enterprise software company Atlassian.
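For readers who want to see what out-of-sample testing and cross-validation look like in practice, here is a minimal sketch using scikit-learn on synthetic data; the dataset, model choice, and fold count are assumptions for illustration, not details from Prendki's work. A large gap between training accuracy and cross-validated or held-out accuracy is the classic symptom of overfitting.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic data standing in for historical, sub-sampled training data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Hold out a test set the model never sees during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = RandomForestClassifier(random_state=0)

# 5-fold cross-validation on the training data estimates out-of-sample
# performance before the model ever touches the held-out test set.
cv_scores = cross_val_score(model, X_train, y_train, cv=5)
model.fit(X_train, y_train)

print(f"Train accuracy: {model.score(X_train, y_train):.3f}")
print(f"CV accuracy:    {cv_scores.mean():.3f} (+/- {cv_scores.std():.3f})")
print(f"Test accuracy:  {model.score(X_test, y_test):.3f}")
```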

"The biggest challenge, model-wise, comes from non-stationary data. Due to seasonality or other effects, a model that performed well yesterday can fail miserably tomorrow," she said. "Another challenge comes from the fact that models are trained on historical (static) data and then applied in runtime. This can lead to performance issues as data scientists are not used to thinking about performance."

Lisa Morgan is a freelance writer who covers big data and BI for InformationWeek. She has contributed articles, reports, and other types of content to various publications and sites ranging from SD Times to the Economist Intelligence Unit.
