If you're a regular AllAnalytics.com reader, you're probably already up to speed on data science topics like artificial intelligence, machine learning, the Internet of Things (IoT), and predictive analytics.
But what is coming next?
IDC forecasts that worldwide spending on big data and analytics could top $210 billion by 2020 (an increase from a projected $150.8 billion in 2017). What will enterprises be spending their money on then? And what techniques, tools, and other trends will play a major role in data science efforts in the coming years?
To answer that question, AllAnalytics.com pored over press releases, analyst reports, academic journals, and blog posts looking for terms that could become important for data scientists in 2018 and beyond. The following slideshow takes a closer look at ten data science trends that could soon have a significant impact on enterprise analytics.
While supervised, unsupervised, and semi-supervised training methods have received most of the attention to date, reinforcement learning seems to be gaining traction. For example, the Google DeepMind team recently published a paper explaining how it used reinforcement learning to train AlphaGo Zero, the latest iteration of its artificial intelligence engine that plays the board game Go. With no human intervention and no historical data to use as a reference, AlphaGo Zero needed just three days of training to match the capabilities of the first version of AlphaGo to beat a world champion. And within 40 days, reinforcement learning propelled this AI to become -- arguably -- the world's best Go player, human or machine.
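To see the core idea behind reinforcement learning -- an agent acting, observing rewards, and updating its estimates -- consider this minimal tabular Q-learning sketch on a hypothetical toy problem. It is only an illustration of the learning loop, not DeepMind's actual method, which pairs deep neural networks with tree search.

```python
import random

random.seed(0)

# Toy environment: states 0..4 on a line; reaching state 4 yields reward 1.
N_STATES = 5
ACTIONS = [-1, 1]               # move left or right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.3

Q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}

def step(state, action):
    nxt = max(0, min(N_STATES - 1, state + action))
    reward = 1.0 if nxt == N_STATES - 1 else 0.0
    return nxt, reward, nxt == N_STATES - 1

for episode in range(500):
    state, done = 0, False
    while not done:
        # Epsilon-greedy: mostly exploit current estimates, sometimes explore.
        if random.random() < EPSILON:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[(state, a)])
        nxt, reward, done = step(state, action)
        # Q-learning update: nudge the estimate toward reward + discounted future value.
        best_next = max(Q[(nxt, a)] for a in ACTIONS)
        Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
        state = nxt

# After training, the learned policy prefers moving right in every state.
policy = [max(ACTIONS, key=lambda a: Q[(s, a)]) for s in range(N_STATES - 1)]
print(policy)
```

No labeled examples or historical games are supplied; the agent discovers the optimal behavior purely from trial, error, and reward -- the same principle, at vastly smaller scale, that let AlphaGo Zero train without human game records.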
In the past, most data scientists have conducted analytics on data that resides in the cloud or a centralized data center. However, the growth of the Internet of Things has generated more interest in edge analytics. When you have thousands of devices and sensors all collecting data, transferring that data to another location for analysis eats up too much network bandwidth and becomes cost prohibitive. For these use cases, enterprises are beginning to do more data processing and analytics out at the edge of the network, either on or very close to the devices and sensors generating the data.
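The bandwidth argument can be made concrete with a small hypothetical sketch: rather than shipping every raw sensor reading to a central server, an edge device aggregates locally and transmits only a compact summary. The sensor values and field names here are invented for illustration.

```python
import json
import statistics

# Simulated raw temperature readings collected on an edge device.
raw_readings = [21.4 + 0.1 * (i % 7) for i in range(1000)]

# Local aggregation: compute a summary at the edge instead of sending raw data.
summary = {
    "count": len(raw_readings),
    "mean": round(statistics.mean(raw_readings), 3),
    "min": min(raw_readings),
    "max": max(raw_readings),
}

# Compare the payload sizes that would cross the network.
raw_bytes = len(json.dumps(raw_readings))
summary_bytes = len(json.dumps(summary))
print(summary_bytes, "<", raw_bytes)
```

The summary payload is orders of magnitude smaller than the raw stream, which is the trade-off driving analytics toward the network edge when sensor counts climb into the thousands.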
Two independent factors are increasing enterprise interest in data protection. First, the seemingly endless string of high-profile data breaches has made clear that cyberattacks can penetrate the defenses of even well-respected companies and can be extremely costly for their victims. Second, the European Union's General Data Protection Regulation (GDPR) goes into effect in May 2018. To comply, companies all over the world will need to improve their security and privacy procedures.
As part of their jobs, data scientists frequently access sensitive information. With enterprises imposing stronger security measures, data scientists will likely find themselves needing to adapt to changing protocols and procedures, and they may perhaps need to go to greater lengths to anonymize and secure data used for analytics.
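One common technique for this is pseudonymization: replacing direct identifiers with salted hash tokens before data reaches the analytics pipeline, so records can still be joined and counted without exposing who they belong to. The sketch below is a minimal illustration with hypothetical field names, not a complete anonymization scheme (GDPR compliance also involves governance, retention, and re-identification risk analysis).

```python
import hashlib
import secrets

# The salt is kept secret and stored separately from the data; without it,
# the tokens cannot be linked back to the original identifiers by hashing guesses.
SALT = secrets.token_bytes(16)

def pseudonymize(value: str) -> str:
    """Replace a direct identifier with a stable, non-reversible token."""
    return hashlib.sha256(SALT + value.encode("utf-8")).hexdigest()[:16]

record = {"email": "jane@example.com", "purchase_total": 42.50}
safe_record = {
    "user_token": pseudonymize(record["email"]),  # analyzable, not identifying
    "purchase_total": record["purchase_total"],
}
print(safe_record["user_token"])
```

Because the token is deterministic, analysts can still group purchases by customer; because it is a keyed one-way hash, the dataset they work with no longer contains the raw email address.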
Earlier this year, Gartner published a paper titled "Augmented Analytics Is the Future of Data and Analytics," and in its report on the "Top 10 Strategic Technology Trends for 2018," the analyst firm called augmented analytics a "particularly strategic, next-generation data and analytics paradigm." Augmented analytics makes use of machine learning and natural language processing to automate many of the fundamental tasks involved in analytics. The idea is to take self-service to a new level, enabling business users to conduct advanced analytics on their own. In theory, that should free up data scientists to focus on more specialized questions.
Gartner and several other analyst firms are predicting that blockchain will take off in the coming year. Of particular interest to the financial services industry, blockchain is the technology that underlies cryptocurrencies like Bitcoin: a highly secure distributed ledger in which data cannot be edited or deleted once it has been added. Some experts predict that blockchain could soon be used to record a wide variety of enterprise transactions. If that happens, data scientists may need new tools and procedures to help them analyze blockchain data, but they will also gain access to a data source more reliable than any they have had before.
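The tamper-evidence property can be shown with a minimal hash-chain sketch: each block embeds the hash of the previous block, so editing any earlier entry invalidates everything after it. Real blockchains layer consensus protocols, digital signatures, and distribution across many nodes on top of this idea; the transactions below are hypothetical.

```python
import hashlib
import json

def block_hash(block: dict) -> str:
    # Deterministic hash of a block's full contents.
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def append_block(chain: list, data: dict) -> None:
    # Each new block records the hash of the block before it.
    prev = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"prev_hash": prev, "data": data})

def verify(chain: list) -> bool:
    # The chain is valid only if every link still matches.
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))

chain = []
append_block(chain, {"tx": "Alice pays Bob 5"})
append_block(chain, {"tx": "Bob pays Carol 2"})
intact = verify(chain)                           # True: nothing has been altered

chain[0]["data"]["tx"] = "Alice pays Bob 500"    # attempt to edit history
tampered = verify(chain)                         # False: the edit is detected
print(intact, tampered)
```

It is this property -- any retroactive edit breaks the chain -- that makes blockchain data unusually trustworthy as an analytics source.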
Cloud analytics is nothing new, and the trend isn't going away anytime soon. Instead, spending on cloud-based data science tools seems to be skyrocketing. In fact, IDC predicts, "By 2020, new cloud pricing models will service specific analytics workloads, contributing to 5x higher spending growth on cloud vs. on-premises analytics." Machine learning and artificial intelligence applications are particularly well-suited to the cloud, where organizations can access high-end systems that enable them to process and analyze data very quickly for an affordable price.
To date, most uses of artificial intelligence have focused on "narrow AI." That is, they target a specific task, like making product recommendations or predicting corporate margins for the coming quarter. However, researchers are becoming increasingly interested in general AI. These systems would be able to analyze any problem and would continuously learn from their successes and failures. In essence, they would have the equivalent of human-level intelligence and adaptability. Most experts believe true general AI is still a long way off, but it wasn't all that long ago that any form of AI seemed like science fiction. Look for more announcements related to general AI in the years ahead.
Recently, meta-learning has been appearing in scholarly papers and blogs with some frequency. Theoreticians and data scientists haven't yet settled on an industry-standard definition for meta-learning as it applies to AI, even though the term has been used for some time. Essentially, meta-learning involves a system learning how to learn. These systems extract and analyze metadata from previous deep learning processes and learn from those experiences. The goal is to create a system that would be able to generalize learning from one field to another, thus moving one step closer to the creation of a general AI.
Back in 2013, the Defense Advanced Research Projects Agency (DARPA) began a project called Probabilistic Programming for Advancing Machine Learning (PPAML). Its goal was to create new programming languages that would make it faster and easier for developers and data scientists to create new models and machine learning applications. The project is slated to wrap up this year, but work on probabilistic languages is far from complete. Look for probabilistic programming to generate more interest as the popularity of machine learning and advanced analytics increases.
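The core idea of probabilistic programming is to write down a generative model and let an inference engine recover hidden parameters from observed data. The toy sketch below illustrates that idea in plain Python using crude rejection sampling on an invented coin-flip example; dedicated probabilistic languages and libraries (such as Stan or PyMC) express the same pattern far more efficiently.

```python
import random

random.seed(1)
observed = [1, 1, 0, 1, 1, 1, 0, 1, 1, 1]        # 8 heads in 10 flips

def model():
    # Generative model: draw a bias from the prior, then simulate 10 flips.
    p = random.random()                           # prior: p ~ Uniform(0, 1)
    flips = [1 if random.random() < p else 0 for _ in observed]
    return p, flips

# Crude inference: keep only parameter draws whose simulated data
# reproduces the observed head count.
accepted = []
while len(accepted) < 500:
    p, flips = model()
    if sum(flips) == sum(observed):
        accepted.append(p)

estimate = sum(accepted) / len(accepted)
print(round(estimate, 2))                         # posterior mean, near 0.75
```

The programmer specifies only the model; the sampler does the inference. DARPA's PPAML effort aimed to make exactly this separation of concerns fast and accessible enough for everyday machine learning work.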
The last big data science trend in this slideshow is neither a new technology nor an academic theory. Instead, it relates to the data science workforce.
For several years, demand for data scientists has outstripped supply. As a result, a growing number of people have taken courses in data science in hopes of snagging high-paying jobs. Executive recruiting firm Burtch Works has observed that this is resulting in a glut of junior-level data scientists: many job seekers today hold master's degrees (or less) rather than the traditional PhD-level qualifications. This influx, combined with the move toward augmented and self-service analytics, could lead to lower demand for data scientists and possibly reduce average salaries over time, even as analytics becomes more important to enterprises.