Five Factors Shaping Data Science
As data science evolves, key challenges are driving organizations to seek innovative solutions to compete in the new AI-driven economy.
In late 2018, a survey by Univa found that 96% of respondents expected an "explosion in machine learning projects" in production by 2020. International Data Corp. forecasts that spending on artificial intelligence and machine learning will grow to $57.6B by 2020. Fraud detection, customer analysis, churn prediction, and numerous other applications are driving the rapid growth of AI and ML. The world of AI, however, has a problem. A 2019 survey by Dimensional Research found that 80% of companies reported stalled AI and ML projects.
The following five factors are causing these slowdowns, each with its own set of challenges and opportunities:
1. Making data actionable for data science
The same Dimensional Research survey found that 96% of respondents cited data quality and data labeling as crucial problems slowing their AI and ML adoption. Data silos are especially troublesome for data science.
Businesses store vast amounts of data, but often in different lines of business, across disparate systems, and with varying levels of leadership and governance. Whether through manual processes or by leveraging automation, the first struggle for data science teams is to access and collect relevant data from these different sources. Chief information officers and chief data officers must lead the charge to make data actionable for data science. Mitigating challenges related to data integration, ETL, security, and data privacy will drive faster turnaround of data science projects and make the overall process more efficient.
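As a rough illustration of the consolidation work this implies, the sketch below pulls customer records from a relational database and transaction history from a CSV export, then joins them into a single analytic table. The database, file, and column names are hypothetical and stand in for whatever silos a given organization actually has.

```python
import sqlite3
import pandas as pd

# Hypothetical sources: a CRM database and a CSV export from a billing system.
conn = sqlite3.connect("crm.db")
customers = pd.read_sql("SELECT customer_id, segment, signup_date FROM customers", conn)
transactions = pd.read_csv("billing_export.csv", parse_dates=["txn_date"])

# Consolidate the silos: one row per customer with basic activity measures.
activity = (
    transactions.groupby("customer_id")
    .agg(total_spend=("amount", "sum"), txn_count=("amount", "count"))
    .reset_index()
)
analytic_table = customers.merge(activity, on="customer_id", how="left").fillna(
    {"total_spend": 0, "txn_count": 0}
)
```

Even a toy pipeline like this hides real-world obstacles: access approvals, inconsistent keys across systems, and privacy rules about which fields may be joined at all.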
2. Shortage of data science talent
A 2018 LinkedIn survey found a shortage of more than 150,000 people with data science skills in the U.S. The rapid adoption of ML and AI is likely to exacerbate that shortage. To produce meaningful results, organizations must combine statistical knowledge, data management, engineering, and subject matter expertise to tackle data quality, architecture design, and model production. Finding all of those skills in a single person, the proverbial unicorn, is nearly impossible. Given the complexity of data science, it's no wonder that 88% of data science graduates have a master's degree and 46% a Ph.D. Addressing this problem requires expanded education as well as continued investment in growing the talent pool at both the corporate and governmental levels. New technologies that automate and accelerate the data science process also promise to reduce talent constraints.
3. Time-to-value must accelerate
The plodding pace of development also slows data science. Data science projects are iterative in nature because of the uncertainty of the data, and they require a deep understanding of the underlying business problems. Data scientists create a series of hypotheses to be tested and validated against actual business data, wrangling, cleansing, joining, combining, and aggregating that data to identify relationships and extract the patterns needed to build ML models. This process requires a rigorous trial-and-error approach to find the right answers, and it often involves multiple exchanges between business and data science teams that prolong projects. Accelerating the time-to-value of data science is critical to fulfilling the promise of AI and ML.
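To make one such iteration concrete, the sketch below tests a single hypothesis: that recent activity and spend predict customer churn. It aggregates raw transactions into candidate features and fits a simple model. The file names, column names, and churn label are assumptions for illustration; in practice each round of results would go back to the business team for validation before the next iteration.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

# Hypothetical inputs: one row per transaction, plus a per-customer churn label.
txns = pd.read_csv("transactions.csv", parse_dates=["txn_date"])
labels = pd.read_csv("churn_labels.csv")  # columns: customer_id, churned (0/1)

# Hypothesis: recent activity and spend predict churn. Aggregate to customer level.
features = (
    txns.groupby("customer_id")
    .agg(total_spend=("amount", "sum"),
         txn_count=("amount", "count"),
         last_txn=("txn_date", "max"))
    .reset_index()
)
features["days_since_last_txn"] = (txns["txn_date"].max() - features["last_txn"]).dt.days

data = features.merge(labels, on="customer_id")
X = data[["total_spend", "txn_count", "days_since_last_txn"]]
y = data["churned"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```

A disappointing score here does not end the project; it sends the team back to reframe the hypothesis, engineer different features, or request different data, which is exactly where the time goes.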
4. Business users need transparency
While the benefits of AI and ML can be high, one of the challenges of data science is the frequent disconnect between ML models and the expectations of business users. The difficulty of explaining how ML and AI models work and how they generate results leads to a lack of trust among line-of-business (LOB) users, who don't have enough visibility into the process to rely on its results. Providing greater clarity and transparency will be a critical part of bridging the gap between the "black box" of data science and user needs. Systems that better "verbalize" AI models will help, but a closer relationship between LOB users and data science teams will also be essential.
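One common, if partial, way to provide that transparency is to report which inputs a trained model actually relies on. The sketch below uses scikit-learn's permutation importance; the variables `model`, `X_test`, and `y_test` are assumed to come from an earlier training step like the one shown above.

```python
from sklearn.inspection import permutation_importance

# Shuffle each feature in turn and measure how much the model's score drops.
# Larger drops mean the model leans more heavily on that feature.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

for name, importance in sorted(
    zip(X_test.columns, result.importances_mean), key=lambda pair: -pair[1]
):
    print(f"{name}: {importance:.3f}")
```

A readout like this does not fully open the black box, but it gives LOB users a concrete, plain-language statement of what drives the model's predictions, which is a useful starting point for building trust.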
5. Improving the operationalization process
Lastly, the migration of data science models to production environments is fraught with impediments. Models that worked well in development often don't scale and don't work in production systems. The result is slow, tedious rework and "fine-tuning" of models. Even when models do work in production environments, they degrade as the data changes, requiring ongoing maintenance and rework. Integrating AI and ML models into production environments will require a shift in thinking in order to accelerate that rework and optimize production use.
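A common first step toward managing this degradation is monitoring production data for drift away from what the model was trained on. The sketch below compares the distribution of each numeric feature between a training snapshot and recent production data using a two-sample Kolmogorov-Smirnov test; the file names and alert threshold are assumptions for illustration, not a prescribed setup.

```python
import pandas as pd
from scipy.stats import ks_2samp

# Hypothetical snapshots: features the model was trained on vs. recent production input.
train = pd.read_csv("training_features.csv")
recent = pd.read_csv("production_features_last_30d.csv")

ALERT_P_VALUE = 0.01  # assumed threshold; tune to your tolerance for false alarms

for column in train.select_dtypes("number").columns:
    stat, p_value = ks_2samp(train[column].dropna(), recent[column].dropna())
    if p_value < ALERT_P_VALUE:
        print(f"Possible drift in '{column}' (KS={stat:.3f}, p={p_value:.4f}); consider retraining")
```

Checks like this don't fix a degraded model, but they turn silent decay into an explicit signal that maintenance or retraining is due.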
The world of data science is undergoing radical change. Increasing requirements for transparency, an ever-growing workload from business users, and a continuing shortage of qualified data science experts are all putting greater pressure on data science teams to accelerate their processes, automate as much of their work as possible, and open the data science process to broader adoption by non-data scientists. Organizations that rely on data science will have to put critical changes in place to effectively address each of these challenges and compete in the new AI-driven economy.
About the Author
Ryohei Fujimaki is the Founder & CEO of dotData, a spin-off of NEC Corp. and the first company focused on delivering full-cycle data science automation for the enterprise. Fujimaki is a world-renowned data scientist and was the youngest research fellow appointed in the 119-year history of NEC.