AI is seeping into just about everything, from consumer products to industrial equipment. As enterprises use AI to become more competitive, more of them are turning to machine learning to accomplish more in less time, reduce costs and discover something new, whether a drug or a latent market desire.
While there's no need for non-data scientists to understand how machine learning (ML) works, they should understand enough to use basic terminology correctly.
Although the scope of ML extends well beyond what a short article can cover, the following are some of the fundamentals.
Before readers can grasp machine learning concepts, they need to understand what machine learning terms mean. Some of the commonly used terms include:
- A/B testing – testing two machine learning techniques to determine which performs better.
- Clustering – grouping objects based on similarity. For example, the candies in a bag of M&Ms might be grouped by color or by type (e.g., peanut M&Ms versus regular M&Ms).
- Decision tree – a hierarchy of branching questions, often yes/no, used for decision-making. (E.g., Is the customer athletic or not? Does that athletic customer ski or not?)
- False negative – a result that appears to be negative but is in fact positive. (E.g., a cybersecurity breach that evades detection.)
- False positive – a result that appears to be positive but is in fact negative. (E.g., a facial recognition system that misidentifies a congressman as a murder suspect.)
- Features – input variables used for prediction. (E.g., for women (1) under 25 (2) who smoke tobacco (3), the features are sex, age and tobacco use.)
- Feature engineering – determining which features should be used in a model.
- Feature set – the group of features used to train a model.
- Holdout data – data that is withheld from training data that is later used to test the model.
- Inference – making a prediction using a trained model on unlabeled data.
- K-means – a clustering technique that uses Euclidean geometry (and more specifically, Euclidean distance) to determine the similarity of examples.
- Label – a result defined by humans. (E.g., cats, dogs, tall, short)
- Model – the result of running an algorithm on training data.
- Neural network – a collection of artificial neurons (aka nodes) that typically use multiple inputs to generate an output.
- Proxy – data that can be used to infer a sensitive attribute. (E.g., using zip codes to determine race or the likelihood of recidivism.)
- Random forest – a technique that builds several decision trees, each trained on a random subset of the data and features, and combines their predictions (by averaging or majority vote). Random forests are typically more accurate than a single decision tree but are not as interpretable.
- Reinforcement learning – a type of machine learning that uses rewards and penalties.
- Semi-supervised learning – uses labeled data and infers labels for unlabeled data.
- Supervised learning – uses labeled data to learn by example. (E.g., day, night.)
- Training set – the subset of data used for training.
- Unsupervised learning – finds patterns or groupings in unlabeled data and is often used to discover what humans have not discovered yet. (E.g., discovering the main cause of hospital readmissions.)
- Validation – a process used to determine the quality of a model.
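To make the clustering and K-means entries above more concrete, here is a minimal sketch in plain Python. It assumes two-dimensional points and hand-picked starting centroids; the function names and the sample data are hypothetical, and production work would normally rely on an established library rather than code like this.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_means(points, centroids, iterations=10):
    """Cluster points around the given starting centroids."""
    centroids = [tuple(c) for c in centroids]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignments = [
            min(range(len(centroids)), key=lambda i: euclidean(p, centroids[i]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i, old in enumerate(centroids):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                new_centroids.append(
                    tuple(sum(dim) / len(members) for dim in zip(*members))
                )
            else:
                new_centroids.append(old)  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # nothing moved, so the clusters are stable
        centroids = new_centroids
    return centroids, assignments

# Hypothetical examples that form two loose groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]
centroids, assignments = k_means(points, centroids=[points[0], points[3]])
print(assignments)  # [0, 0, 0, 1, 1, 1]
```

No labels are supplied here. The grouping comes purely from distance, which is why K-means is usually treated as an unsupervised technique.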
Machine Learning Versus Deep Learning
Deep learning is a subset of machine learning that utilizes multiple layers of algorithms. The algorithms form neural network nodes that are arranged in three basic layers: input layer, hidden layer, and output layer. If the network has more than one hidden layer, it is considered a deep neural network.
"Deep learning is just a series of matrix multiplications and nonlinear transformations," said Brooke Wenig, machine learning practice lead at cloud data platform provider Databricks. "You do a bunch of matrix multiplications to your input features; each has a corresponding weight and then you add nonlinear transformations."
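Wenig's description can be sketched in a few lines of plain Python. This is an illustration only: the weights below are made up rather than learned, bias terms are left out for brevity, and the choice of ReLU and sigmoid as the nonlinear transformations is an assumption, not something the quote specifies.

```python
import math

def matvec(weights, inputs):
    """Multiply a weight matrix (a list of rows) by an input vector."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

def relu(vector):
    """Nonlinear transformation: negative values become zero."""
    return [max(0.0, v) for v in vector]

def sigmoid(vector):
    """Nonlinear transformation: squash each value into the range (0, 1)."""
    return [1.0 / (1.0 + math.exp(-v)) for v in vector]

def forward(features, hidden_weights, output_weights):
    """Input layer -> one hidden layer -> output layer."""
    hidden = relu(matvec(hidden_weights, features))
    return sigmoid(matvec(output_weights, hidden))

# Hypothetical weights; a real network learns these during training.
hidden_weights = [[0.5, -0.2, 0.1],   # hidden node 1: one weight per input feature
                  [-0.3, 0.8, 0.0]]   # hidden node 2
output_weights = [[1.0, -1.0]]        # one output node fed by two hidden nodes

features = [1.0, 2.0, 3.0]
prediction = forward(features, hidden_weights, output_weights)
print(prediction)
```

Adding more hidden layers means repeating the multiply-then-transform step, which is what makes a network "deep."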
There are many different types of neural network architectures available today, and the list keeps growing.
One thing to keep in mind with deep learning is its expense: it requires a lot of data, and therefore storage, as well as a lot of compute power. That makes it costly not only from a resource point of view but also from an environmental (carbon footprint) point of view. There are also economic considerations.
"People should be minimizing their models, not based on some error criteria, but based upon some kind of economic impact of the model," said Wayne Thompson, chief data scientist at analytics software provider SAS. "The problem is, we don't know what numbers to put in for the economic aspect. When I talk to some customers, they can't tell me the price of acquiring a customer or the revenue associated with keeping them once acquired."
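The difference between judging a model by error and judging it by economic impact can be illustrated with a small sketch. Everything here is hypothetical: the churn scenario, the two models' predictions and, as Thompson notes is often the hard part, the dollar figures assigned to each kind of mistake.

```python
def confusion_counts(actual, predicted):
    """Return (false positives, false negatives) for binary labels."""
    fp = sum(1 for a, p in zip(actual, predicted) if p and not a)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    return fp, fn

def error_rate(actual, predicted):
    """Share of predictions that are wrong, regardless of type."""
    return sum(1 for a, p in zip(actual, predicted) if a != p) / len(actual)

def economic_cost(actual, predicted, cost_fp, cost_fn):
    """Total cost of mistakes, weighting each error type separately."""
    fp, fn = confusion_counts(actual, predicted)
    return fp * cost_fp + fn * cost_fn

# Hypothetical churn data: 1 means the customer leaves.
actual  = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
model_a = [1, 0, 0, 0, 0, 0, 0, 0, 0, 0]  # misses two churners
model_b = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # flags three loyal customers

# Assumed costs, for illustration only.
COST_FALSE_POSITIVE = 10    # a retention offer sent to someone who would have stayed
COST_FALSE_NEGATIVE = 200   # revenue lost when a churner goes unnoticed

for name, predicted in (("A", model_a), ("B", model_b)):
    print(name,
          error_rate(actual, predicted),
          economic_cost(actual, predicted, COST_FALSE_POSITIVE, COST_FALSE_NEGATIVE))
```

Under these assumed costs, model A has the lower error rate (0.2 versus 0.3) but the higher cost (400 versus 30), so the two criteria pick different models.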
Which ML technique(s) data scientists use depends on several factors, including the business problem to be solved, the data available, the level of accuracy required, and time and efficiency constraints. Sometimes, the most elegant solution is the simplest, not the most sophisticated or complex.
Some of the Popular Neural Networks
Whatever the architecture, a neural network has an input layer, an output layer and one or more hidden layers. Generative adversarial networks (GANs), convolutional neural networks (CNNs) and recurrent neural networks (RNNs) are just three examples.
GANs pit two networks against each other: a generator that produces synthetic examples and a discriminator that tries to tell them apart from real data. That adversarial setup is why GANs show up in cybersecurity and games, where an adversary is also involved.
"We've tried just about everything with GANs. They work really well, but they're a little problematic, because they're hard to train," said Cameron Fen, co-founder and head of research at venture capital firm AI Capital Management. "People are trying to replace GANs with another generative model that works just as well or better because they don't like training GANs."
Convolutional Neural Networks (ConvNets or CNNs) are loosely modeled after the visual cortex of animals so, not surprisingly, they're used for image recognition. A CNN reduces an image to a smaller representation for processing without sacrificing the features necessary for a good prediction.
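A rough sketch of how a CNN shrinks an image while keeping a useful feature: one convolution followed by one max-pooling step, in plain Python. The image and the edge-detecting kernel are hypothetical and hand-picked; in a real CNN the kernel values are learned, and there are many kernels and layers. As in most deep learning libraries, the "convolution" here slides the kernel without flipping it.

```python
def convolve2d(image, kernel):
    """Slide a square kernel over the image (no padding, stride 1)."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(k) for j in range(k))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

def max_pool(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    return [
        [
            max(feature_map[r + i][c + j]
                for i in range(size) for j in range(size))
            for c in range(0, len(feature_map[0]) - size + 1, size)
        ]
        for r in range(0, len(feature_map) - size + 1, size)
    ]

# Hypothetical 6x6 grayscale image: dark left half, bright right half.
image = [[0, 0, 0, 1, 1, 1] for _ in range(6)]

# Hand-picked kernel that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]

feature_map = convolve2d(image, kernel)  # 6x6 image becomes a 4x4 feature map
pooled = max_pool(feature_map)           # 4x4 feature map becomes 2x2

print(feature_map)  # [[0, 3, 3, 0], [0, 3, 3, 0], [0, 3, 3, 0], [0, 3, 3, 0]]
print(pooled)       # [[3, 3], [3, 3]]
```

The 36 input pixels end up as four numbers, yet the information that the image contains a vertical edge survives.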
CNNs have a variety of use cases, including advertising, climate change, natural disaster prediction and self-driving cars.
Recurrent Neural Networks (RNNs) use sequential or time series data. They are called "recurrent" because they perform the same task on every step of the sequence, carrying information forward from earlier steps. Practically speaking, RNNs are used for handwriting and speech recognition, time series prediction, time series anomaly detection and even robot control.
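The "same task on every step" idea can be sketched with a single recurrent unit in plain Python. This is a deliberate simplification: the input and hidden state are single numbers rather than vectors, the weights are made up rather than learned, and tanh is assumed as the nonlinearity.

```python
import math

def rnn_step(x, h_prev, w_input, w_hidden):
    """One recurrent step: mix the current input with the previous hidden state."""
    return math.tanh(w_input * x + w_hidden * h_prev)

def run_rnn(sequence, w_input=0.5, w_hidden=0.8):
    """Apply the same step, with the same weights, to every element in order."""
    h = 0.0  # the hidden state starts empty
    states = []
    for x in sequence:
        h = rnn_step(x, h, w_input, w_hidden)
        states.append(h)
    return states

# Hypothetical sequence: a single spike followed by silence.
states = run_rnn([1.0, 0.0, 0.0])
print(states)
```

The hidden state stays above zero after the input drops to zero, because each step receives the state left behind by the one before it. That carried-over state is what lets an RNN use context from earlier in a sequence.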