Important data science jobs in industry aren’t being filled. Skills shortages are present in almost every large US city, as online job boards like LinkedIn and Indeed have been reporting through the past year.
And it isn’t for want of applicants. Entry-level data science jobs draw dozens, even hundreds, of applicants. “The data science field has an experience shortage,” explains Daniel Zhao, Senior Economist at Glassdoor, another job board. “There are plenty of recent grads who can throw a hodgepodge of models at a dataset, but there's a serious shortage of experienced and qualified workers who have the full combination of technical skills, business expertise and domain knowledge.”
Automated machine learning, or AutoML, offers a way out. “It is widely known that there is a shortage of advanced skills, and since AutoML does a lot of the expertise-based tasks in machine learning [...] it can help alleviate the skill shortage,” according to IT guru Tom Davenport in a white paper sponsored by DataRobot.
AutoML represents “a major step toward democratizing the field,” says Davenport, President's Distinguished Professor of Information Technology and Management at Babson College. He is not alone in this view.
“Yes, it can alleviate the data science skills shortage,” says Kirk Borne, Principal Data Scientist and Executive Advisor at Booz Allen Hamilton. Borne’s firm will need thousands of additional data scientists over the next three years to support its consulting business, and the firm can’t meet all the demand externally. It will be “upscaling” its own people -- upgrading business intelligence (BI) professionals, for example -- and AutoML could be a boon, he says.
But can AutoML really live up to the hype -- making data science experts out of neophytes? Two German researchers recently wrote a paper (Survey on Automated Machine Learning) in which they examine, among other things, claims that AutoML can reduce the demand for data scientists by enabling domain experts to automatically build predictive models without much knowledge of statistics and machine learning (ML).
Yes, it will happen -- one day -- says University of Stuttgart Professor Marco F. Huber. But he and co-author Marc-André Zöller were looking for full-fledged AutoML, covering the whole spectrum -- from data selection to model testing -- and in their view full-pipeline AutoML “is not there, not yet.”
“It’s a help, but not a solution in terms of the skills shortage,” says consultant Alexis Perrier. Data cleansing still accounts for as much as 80% of a data scientist’s work, and AutoML doesn’t really address this. “It’s not a magical solution. It’s part of the pipeline, another tool to use.”
The most exciting possibility of AutoML is what it can do for domain experts, says Huber. “It is easier to turn a domain expert into a data scientist than it is to turn a data scientist into a domain expert.” A data scientist may be knowledgeable about algorithm selection, feature engineering and dimensionality reduction, but getting specific subject matter knowledge into that data scientist’s head can be very difficult, Huber adds. “There is no tool for that.”
Specific uses are varied. A small business owner, for instance, could use AutoML to make sense of (i.e., categorize) the unstructured form submissions from the company's website -- then deliver that feedback to the appropriate manager. No programming or natural language processing (NLP) knowledge would be required.
Meanwhile, AutoML startup firms like H2O.ai have stocked their ranks with Kaggle competition ‘Grand Masters’ to write ML “recipes” that they say can turn data science novices into professionals, particularly with regard to feature engineering, one of the trickiest elements in predictive modeling, says H2O.ai’s CEO Sri Ambati. “We call upon AI to do AI.” Their aim is to make the entire data science pipeline easier, faster and more effective, he says.
Data scientists today have such a long laundry list of algorithms to choose from, with new ones being developed all the time, that “keeping up is crazy,” says Jen Underwood, Senior Director, DataRobot, another AutoML startup.
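To make the idea of automated algorithm selection concrete, here is a minimal sketch in plain Python. It is an illustration only, not how DataRobot or any other vendor works: the three toy candidate models, the synthetic data and the function names (`cv_error`, `select_model`) are all assumptions made for the example. The loop scores each candidate with cross-validation and keeps the one with the lowest error, which is roughly the chore an AutoML tool aims to take off a practitioner's hands.

```python
import random
from statistics import mean

def fit_mean(xs, ys):
    """Baseline: always predict the average training target."""
    m = mean(ys)
    return lambda x: m

def fit_line(xs, ys):
    """Ordinary least squares for a single input feature."""
    mx, my = mean(xs), mean(ys)
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = cov / var if var else 0.0
    return lambda x: my + slope * (x - mx)

def fit_nearest(xs, ys):
    """1-nearest-neighbor: copy the target of the closest training point."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]

# Hypothetical search space; real tools consider far more candidates.
CANDIDATES = {"mean": fit_mean, "line": fit_line, "nearest": fit_nearest}

def cv_error(fit, xs, ys, folds=5):
    """Mean squared error over k folds (fold k holds out every index i with i % folds == k)."""
    errors = []
    for k in range(folds):
        train = [i for i in range(len(xs)) if i % folds != k]
        test = [i for i in range(len(xs)) if i % folds == k]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.extend((model(xs[i]) - ys[i]) ** 2 for i in test)
    return mean(errors)

def select_model(xs, ys):
    """Score every candidate and return the name with the lowest error."""
    scores = {name: cv_error(fit, xs, ys) for name, fit in CANDIDATES.items()}
    return min(scores, key=scores.get), scores

# Synthetic data: a noisy straight line, generated with a fixed seed.
rng = random.Random(0)
xs = [i / 10 for i in range(50)]
ys = [3.0 * x + 1.0 + rng.gauss(0, 0.2) for x in xs]

best, scores = select_model(xs, ys)
print(best, scores)
```

Commercial platforms typically search over many more algorithms and settings, but the sketch shows why the task lends itself to automation: once the candidates and the scoring rule are fixed, picking a winner is mechanical.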
As for the criticism that AutoML doesn’t really help with data cleansing, Underwood notes that while DataRobot’s platform does require clean input data, it need not be perfectly clean: It can handle missing data, for instance, as well as format changes (such as converting string data to numeric data). But data must still be submitted in a single data frame -- as opposed to multiple spreadsheets.
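The kind of preparation Underwood describes can be sketched in a few lines of plain Python. This is a hypothetical illustration, not DataRobot's code: the two small tables, the column names and the rule of filling gaps with the column average are all assumptions. It joins two separate sources into a single table, converts numbers stored as strings, and fills in a missing value.

```python
from statistics import mean

# Hypothetical data: two separate "spreadsheets" that share a customer_id key.
orders = [
    {"customer_id": 1, "order_total": "120.50"},
    {"customer_id": 2, "order_total": ""},      # missing value
    {"customer_id": 3, "order_total": "80"},
]
profiles = [
    {"customer_id": 1, "region": "east"},
    {"customer_id": 2, "region": "west"},
    {"customer_id": 3, "region": "east"},
]

def to_number(text):
    """Convert a string to float; return None when it is blank or malformed."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None

def build_single_table(orders, profiles):
    """Join both sources on customer_id, then convert and impute order_total."""
    region_by_id = {row["customer_id"]: row["region"] for row in profiles}
    table = [
        {
            "customer_id": row["customer_id"],
            "region": region_by_id.get(row["customer_id"]),
            "order_total": to_number(row["order_total"]),
        }
        for row in orders
    ]
    # Fill gaps with the average of the known values (one simple choice among many).
    known = [r["order_total"] for r in table if r["order_total"] is not None]
    fill = mean(known) if known else 0.0
    for r in table:
        if r["order_total"] is None:
            r["order_total"] = fill
    return table

table = build_single_table(orders, profiles)
for row in table:
    print(row)
```

Combining the sources is the part that still falls to the user; the other two steps are the sort of cleanup Underwood says the platform can absorb.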
The combination of AutoML and domain expertise can be “very powerful,” says Huber. But can it alleviate the skills shortage? In Germany, there is no lack of talent at the largest companies -- Volkswagen and Porsche, for example -- but there is a shortfall at mid-sized and smaller companies. Many of those smaller firms are using consultants for their ML needs, and they could certainly profit from AutoML when it matures.
There have been some objections to AutoML, including from data scientists, who worry about carelessness, or misuse of data. Perhaps, too, they are worried about being disrupted. Will AutoML replace data scientists eventually?
“No,” says Borne, emphatically. “You will still need experts” for things like coding and feature engineering, e.g., someone who can build deep neural networks for more complex problems. Companies might eventually have two levels of practitioners: AutoML-enabled subject-matter experts for simpler tasks, like predictive pricing, or anomaly detection, and PhD-level data scientists for uncovering more complex patterns.
"The really exciting thing about automated machine learning is that it highlights the power behind increasing democratization of data,” says Glassdoor’s Zhao. “There's a massive untapped opportunity for data science, especially for areas where adoption has been slow, like for small businesses.”
Andrew W. Singer is a freelance writer based in New York who specializes in machine learning and blockchain technology. He is the founder and former editor of Ethikos.