The ongoing shortage of data scientists has been well documented. Even as the business world grows increasingly digitized and reliant on big data modeling and analytics to drive value and profit, those possessing the requisite education and expertise in mathematics/statistics, data prep, programming, and distributed computing to meet data science challenges are rare birds. The ability to make sense of the enormous troves of transaction, customer, and equipment data across digitized industries has become a premium skillset, and the recent explosion in machine learning (ML) and artificial intelligence (AI) capabilities has compounded the problem.
Now that we can access the compute power and data volumes necessary to operationalize tasks such as pattern recognition, anomaly detection/diagnosis, customer analytics, pricing and predictive planning, we want ML systems that can learn to automatically prepare and perform data science functions with minimal programming. Thus, the irony: Machine learning is often deployed as a kind of digital surrogate for the data scientist, but one that requires the skills of a data scientist to be brought into existence.
At the same time, there has been a growing recognition that better utilization of those rare skillsets can ameliorate the shortage of sufficiently skilled data scientists. A large part of what data scientists do today consists of functions that can be anatomized and redistributed. Particular skills in data wrangling, cleaning, and preparation, or in data modeling and data processing systems, have led to classifications such as “data engineer,” “data architect,” and the like, which serve to remove some of the workload burden from data scientists. And appropriately managed ML efforts can automate many additional laborious data science functions. But there also has been an even greater democratization of data.
According to Gartner research director Joao Tapadinhas, “Most organizations don’t have enough data scientists consistently available throughout the business, but they do have plenty of skilled information analysts that could become citizen data scientists.”
These citizen data scientists are more than mere consumers of analytic output.
IDC big data analytics and artificial intelligence research director Chwee Kan Chua notes that in the face of the data scientist shortage, “lowering the barriers to allow even non-technical business users to be ‘data scientists’ is a great approach.” The idea is to utilize intuitive, often ML-enhanced, tools within the enterprise that enable these citizen data scientists to develop and administer focused analytics models for specific kinds of business analyses via wizards, templates, and dashboards. Further, the results of these efforts can be interpreted and applied for the benefit of other line-of-business users. There’s a collaborative aspect underpinning the citizen data science phenomenon that aims not only to amplify data science knowledge but also to scale domain expertise and business acumen.
The rise of the citizen data scientist shows that the lines between business intelligence and data science have started to blur. Digitalization has created an enormous demand for the ability to understand data, one that extends into every area of the enterprise, from marketing to R&D, engineering, the supply chain, and the executive suite.
For example, everybody has customers. If you want to use data to most effectively target your customers with the things that they want to buy from you, that is going to require analytic applications such as segmentation, propensity modeling, ad targeting, lead scoring, and dynamic pricing, along with a range of concomitant data engineering and data science capabilities. Practicality dictates that you’re going to have non-data scientists working on the effort. This gap is closed by citizen data science, data engineering, and analytics operations tools that simplify and/or automate the application of sophisticated underlying functions, techniques, and processes. You’re not going to have your data scientists updating your sales team on your company’s marketing progress, but you are going to have your marketing professionals fulfilling that role by creating their own analyses of market interest and visualizations of customer engagement models.
Citizen data scientists bring their own domain expertise and an understanding of the particular business problem to the table, then use smart software and automation to perform specific data science functions. Tools now feature convenient user interfaces that allow rapid building of analytic pipelines in web and collaborative environments with suggestive components that both guide the process and mitigate inherent complexity. This democratization of data science is being executed across a population of people who don’t have PhDs in statistics, but still need to digitize various business problems and demand tools that allow them to do so independently. Some key components driving adoption of such tools include:
- Web-based visual composition workflow environments (along with smart notebooks)
- Collaboration-friendly components enabling rapid interactions among citizen data scientists, data scientists, data engineers, and business managers in an intuitive context (much like social media environments)
- Templates and accelerators that speed adoption in common scenarios (largely predefined business use cases such as customer analytics, and technical areas such as data prep, data engineering, and model deployment for operations)
- Web-based communities and ecosystems with resources for technical support, training, and best practices
Data science is becoming more of a self-service proposition, and that is a good thing. With digitalization accelerating, the need for greater numbers of qualified data scientists remains. Democratization through a growing set of knowledge-sharing tools is expanding data science capabilities to more and more “citizens” in the broader working community. It isn’t a stretch to say that someday, perhaps soon, everyone in the enterprise will be a citizen data scientist to one degree or another.
Michael O'Connell, PhD, is Chief Analytics Officer for TIBCO Software.