Data science has evolved far beyond science. It now represents the heart and soul of many disruptive business applications.
Everywhere you look, enterprise data science practices have become industrialized within 24x7 DevOps workflows. Under that trend, automation has come to practically every process in the machine-learning DevOps pipeline that surrounds AI.
Automating the modeling of AI’s neural-network algorithms
Modeling is the next and perhaps ultimate milestone in the move toward end-to-end automation of the data science pipeline. Most data science toolkits already include tools for automating feature engineering and hyperparameter tuning. One of the next milestones in this trend, sometimes known as AutoML, is the development of standard approaches for “neural architecture search.” This refers to tools and methodologies for automating the creation of optimized convolutional, recurrent, and other neural network architectures at the heart of AI’s machine learning models.
Essentially, neural architecture search involves tailoring the structure, weights, and hyperparameters of a machine learning model’s algorithmic “neurons” in order to make them more accurate, speedy, and efficient in performing data-driven inferences. By automating the creation of novel neural architectures for the most demanding AI challenges, neural architecture search could result in AI apps of unprecedented accuracy, efficiency, and speed. And by automating the time-consuming and error-prone processes of neural-net architecture design that are traditionally performed with manual techniques, it could free data scientists to use their creative imaginations to build more disruptive intelligence into everything from cloud microservices to edge devices.
As neural architecture search emerges from R&D into enterprise data scientists’ workbenches, it could revolutionize the practice of building and optimizing AI applications for their intended purposes. It could greatly reduce the need to train machine learning models for their intended uses, which could thereby lessen the need for AI developers to tap into data lakes’ massive computational, memory, and storage resources. And it could boost the productivity of data scientists by guiding their decisions on whether to build their ML models on established algorithms, such as linear regression and random forest algorithms -- or on any of the newer, more advanced neural-network algorithms.
Exploring diverse algorithmic approaches to neural architecture search
The research literature shows that neural architecture search methods have already outperformed manually designed neural nets in laboratory environments. The range of technical approaches -- which have been applied to such AI tasks as image classification, object detection, and semantic segmentation -- includes the following:
Evolutionary algorithms: These have been the foundation of neural architecture search since the 1990s and are, in fact, being used now at OpenAI, Uber Labs, Sentient Labs, DeepMind, and Google Brain on a variety of AutoML initiatives. Evolutionary algorithms have traditionally been used by researchers to evolve neural architectures and weights. They evolve a population of models: in each step, at least one model is sampled from the population to serve as a parent, and offspring models are generated automatically by applying mutations to it. For neural architecture search, mutations involve adding or removing a layer, altering the hyperparameters of a layer, adding skip connections, and altering training hyperparameters. After the offspring models are trained, their fitness is evaluated on a validation data set and, if adequate, they are added to the population.
Reinforcement learning: This has been popular in new neural architecture search projects over the past several years. In reinforcement learning terms, the generation of a neural architecture is an agent’s action and the agent’s reward is based on an estimate of the performance of the trained architecture on unseen data. Different RL approaches optimize the agent’s neural-architecture-search policy in different ways and encode the architectures, hyperparameters, weights, and states in different ways. Some projects report acceptable results without explicit weight training. These approaches tend to use considerable computational resources over a long time frame to achieve their results, though researchers continue to reduce the computational costs and achieve further improvements in performance.
Gradient-based methods: These perform neural architecture search through alternation of stochastic gradient descent steps on validation data (for network architecture) and training data (for weights). This approach is scalable to millions of weights and highly connected architectures with intricate hyperparameters. Recent neuro-evolutionary approaches also rely on gradient-based methods for optimizing weights.
Bayesian optimization: This has been a popular approach for neural architecture search since early in this decade. It has been used to create optimal neural-net architecture for computer vision and data augmentation.
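Random search: This samples candidate architectures and hyperparameters at random from the search space and is often used as a baseline for the other methods. It is distinct from model-based techniques, such as those built on tree-based models (for example, random forests or tree-structured Parzen estimators), which are used to effectively search high-dimensional conditional spaces to optimize both neural architectures and their hyperparameters.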
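The evolutionary loop described in the list above can be sketched in a few lines of Python. This is a toy illustration, not any lab's implementation: the architecture encoding (a list of layer widths), the `mutate` operators, and especially the `fitness` function, which stands in for training an offspring and scoring it on validation data, are all assumptions made for the example.

```python
import random

# Hypothetical encoding: an architecture is a list of layer widths.
WIDTH_CHOICES = [16, 32, 64, 128]


def mutate(arch, rng):
    """Return a child architecture by adding, removing, or altering one layer."""
    child = list(arch)
    op = rng.choice(["add", "remove", "alter"])
    if op == "add":
        child.insert(rng.randrange(len(child) + 1), rng.choice(WIDTH_CHOICES))
    elif op == "remove" and len(child) > 1:
        child.pop(rng.randrange(len(child)))
    else:
        # "alter", or "remove" on a one-layer network: change one layer's width.
        child[rng.randrange(len(child))] = rng.choice(WIDTH_CHOICES)
    return child


def fitness(arch):
    """Stand-in for 'train, then score on validation data' (assumption).

    This toy score simply prefers three layers of width 64; a real search
    would train the network and return its validation accuracy.
    """
    depth_penalty = abs(len(arch) - 3)
    width_penalty = sum(abs(w - 64) for w in arch) / 64.0
    return -(depth_penalty + width_penalty)


def evolve(population, steps, rng, sample_size=3):
    """Sample candidates, mutate the fittest, and admit adequate offspring."""
    population = [list(a) for a in population]
    for _ in range(steps):
        candidates = rng.sample(population, min(sample_size, len(population)))
        parent = max(candidates, key=fitness)
        child = mutate(parent, rng)
        # Admit the child only if it is at least as fit as the weakest member,
        # which it then replaces, so the population size stays constant.
        worst = min(population, key=fitness)
        if fitness(child) >= fitness(worst):
            population.remove(worst)
            population.append(child)
    return max(population, key=fitness)


rng = random.Random(0)
start = [[16], [128, 128], [32, 16, 16, 16]]
best = evolve(start, steps=200, rng=rng)
print(best, fitness(best))
```

Replacing the weakest member is only one possible admission rule; other variants discard the oldest member instead, and a real search would cache fitness values rather than recompute them.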
Throwing agile and massive computing resources at neural-net architectural search
Still other researchers start with some initial set of neural network architectures, and then evolve that toward a single architecture that is adept at a wide range of inferences on disparate data types. One project that’s attempting this is at the U.S. Department of Energy’s Oak Ridge National Laboratory. The lab’s MENNDL program can automate discovery, generation, and testing of millions of neural net architectures associated with any specific AI modeling challenge.
Currently, the lab’s researchers use MENNDL to automatically build neural networks for analyzing cancer scans. MENNDL implements a Python-based framework to generate new neural nets through recombination of reusable components of such architectures. It creates new neural network architectures 16x faster than previous manual processes. MENNDL is computationally intensive and runs on a 1.3 exaflop supercomputer that contains 4,608 nodes, each of which contains two IBM POWER9 CPUs and six Nvidia Volta GPUs. A typical MENNDL job runs the code for mixed precision floating point operations on a total of 9,216 CPUs and 27,648 GPUs on the supercomputer.
Of the millions of potential architectures that it creates for a particular data set, MENNDL chooses the best one based on a neural network’s size, the degree to which it is computationally intensive to train, and its accuracy at detecting tumors in scans. The models that it selects are then trained intensively through separate tools to optimize them for particular inferencing challenges. These models, in turn, are stored by researchers in order to build a reusable library of neural-net architectures for various application challenges.
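As a rough illustration of choosing one architecture on the three criteria named above (size, training cost, and accuracy), the sketch below scores each candidate with a weighted sum. The candidate records, field names, and weights are hypothetical; nothing here describes how MENNDL actually combines these criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    params_millions: float   # network size
    train_gpu_hours: float   # cost to train
    accuracy: float          # validation accuracy in [0, 1]


def normalise(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


def select_best(candidates, w_acc=0.6, w_size=0.2, w_cost=0.2):
    """Pick the candidate with the best accuracy/size/cost trade-off.

    The weights are illustrative assumptions, not values from any project.
    """
    size = normalise([c.params_millions for c in candidates])
    cost = normalise([c.train_gpu_hours for c in candidates])
    acc = normalise([c.accuracy for c in candidates])
    # Reward accuracy; penalise size and training cost.
    scores = [w_acc * a - w_size * s - w_cost * t
              for a, s, t in zip(acc, size, cost)]
    best_index = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best_index]


pool = [
    Candidate("wide", 40.0, 120.0, 0.93),
    Candidate("compact", 5.0, 20.0, 0.91),
    Candidate("tiny", 1.0, 5.0, 0.80),
]
print(select_best(pool).name)
```

Changing the weights changes the winner: with all the weight on accuracy, the largest network is selected instead.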
Exploiting the potential of neural-net pruning
One of the most promising neural architecture search approaches involves finding the optimal approach for “pruning” down a neural network up front. Alternatively, some researchers start with sparse networks and add connections and other complexities only as necessary.
These approaches, which are increasingly evident in the literature, produce sparse neural networks that keep only a small fraction of the connections, while maintaining similar or even superior performance on inferencing tasks compared to the full unpruned network. They boost a neural net’s accuracy and efficiency from the start and avoid or greatly reduce the need to train it prior to deployment.
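One simple and widely used pruning heuristic is magnitude pruning: drop the connections whose weights are smallest in absolute value. The sketch below applies it to a plain list-of-lists weight matrix. It is a generic illustration rather than the method of any study discussed here, and the `keep_fraction` parameter and sample weights are made up for the example.

```python
def magnitude_prune(weights, keep_fraction):
    """Zero out all but the largest-magnitude fraction of weights.

    weights: list of rows (each a list of floats). Returns (pruned, mask).
    Weights tied with the threshold are all kept, so slightly more than
    keep_fraction may survive.
    """
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    flat = sorted((abs(w) for row in weights for w in row), reverse=True)
    keep = max(1, round(keep_fraction * len(flat)))
    threshold = flat[keep - 1]
    mask = [[1 if abs(w) >= threshold else 0 for w in row] for row in weights]
    pruned = [[w if m else 0.0 for w, m in zip(row, mrow)]
              for row, mrow in zip(weights, mask)]
    return pruned, mask


def sparsity(mask):
    """Fraction of connections removed."""
    total = sum(len(row) for row in mask)
    kept = sum(sum(row) for row in mask)
    return 1.0 - kept / total


layer = [
    [0.80, -0.02, 0.10, 0.00],
    [-0.65, 0.03, -0.01, 0.40],
]
pruned, mask = magnitude_prune(layer, keep_fraction=0.25)
print(pruned)
print(sparsity(mask))
```

Whether a network pruned this way matches the accuracy of the full network depends on the model and task, so the pruned result would still need to be evaluated.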
Recent research at MIT uses pruning to radically reduce the amount of computing horsepower needed to do neural architecture search. Researchers have developed a neural architecture search algorithm that can directly learn specialized convolutional neural networks for target hardware platforms. When run on a massive image dataset, it does so in only 200 GPU hours, which could enable far broader use of these types of algorithms.
To achieve this, they developed ways to prune unnecessary neural network design components, thereby reducing computing times and requiring only a fraction of hardware memory to run a neural architecture search algorithm. Each outputted network runs more efficiently on specific hardware platforms, such as CPUs and GPUs, than neural networks designed by traditional manual approaches.
The neural architecture search algorithms can run on smaller proxy datasets and rapidly transfer their learned networks to larger data sets in production with no loss of accuracy. To achieve this speed and efficiency, researchers developed a technique called “path-level binarization,” which stores only one sampled neural-net path at a time. They combined this with path-level pruning and automatically adjusted the probabilities of paths to optimize both accuracy and efficiency. This approach also uses the latency on each target hardware platform as a feedback signal to optimize the architecture.
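The idea of keeping one sampled path in memory while using latency as a feedback signal can be illustrated with a toy example. The sketch below is not the researchers' code: the candidate operations, their latency figures, the learnable scores `alpha`, and the trade-off weight `lam` are all hypothetical. It turns scores into path probabilities with a softmax, samples a single operation, and computes an expected-latency penalty that an optimizer could add to the task loss.

```python
import math
import random

# Hypothetical candidate operations and latencies (ms) on one target device.
LATENCY_MS = {"conv3x3": 1.2, "conv5x5": 2.9, "skip": 0.1}


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores.values())
    exps = {k: math.exp(v - peak) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def sample_path(probs, rng):
    """Binarized choice: exactly one operation is active at a time."""
    ops = list(probs)
    return rng.choices(ops, weights=[probs[o] for o in ops], k=1)[0]


def expected_latency(probs, latency=LATENCY_MS):
    """Probability-weighted latency across the candidate operations."""
    return sum(probs[o] * latency[o] for o in probs)


def total_loss(task_loss, probs, lam=0.1):
    """Task loss plus a latency penalty; lam is an assumed trade-off weight."""
    return task_loss + lam * expected_latency(probs)


alpha = {"conv3x3": 0.0, "conv5x5": 0.0, "skip": 0.0}  # learnable path scores
probs = softmax(alpha)
rng = random.Random(0)
active = sample_path(probs, rng)
print(active, round(expected_latency(probs), 3))
```

Raising the score of a fast operation lowers the expected latency, which is how a latency term can steer the search toward architectures suited to a given device; paths whose probabilities fall toward zero become candidates for pruning.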
If this approach gains traction, it could have a disruptive impact on the standard practice of data science. As discussed here, it could radically reduce the size of neural networks, boost their accuracy and efficiency, and enable anybody to build highly optimized networks without vast server farms and data lakes at their disposal.
As that trend picks up speed, it might someday eliminate the hyperscale-related advantages that the likes of Google, Amazon, and Microsoft have in the practice of building sophisticated AI. In the process, it could accelerate the ongoing democratization of AI so that even one-person shops can develop mind-blowing innovations that formerly would have been beyond their resources.