In today's world of threat detection, firewalls, proxies, and intrusion prevention systems, security professionals typically address known threats proactively by enforcing policies inline. They also act reactively to security threats by analyzing and correlating data with offline technologies like intrusion detection and security event management systems, which can feed policies defined in inline solutions.
I got to thinking about this the other day in the context of big data and emerging security analytics while I was talking with a colleague about the shift from monolithic computing to discrete distributed apps and computing endpoints. (Yes, we really talk about these sorts of things.) We noted that, as this happens, two things occur:
This poses great challenges and opportunities for enterprise IT, chief among them the lack of visibility and control over apps that consume enterprise data, and the incredible transience of data as it moves across the many apps that consume, manipulate, visualize, and make sense of it.
Two dimensions of security controls
To better understand the issues related to security controls, let's consider the nature of the control itself: Is it proactive (inline and policy-based) or is it reactive (correlating data with offline technologies)? The second dimension we need to put into the equation is whether the threat behavior is known or unknown.
Several mature technologies in the market today address known threats. Unknown threats are a different animal altogether. The ability to ride herd over these animals is considered a home run for security practitioners. Security event management (SEM) tools can reactively detect unknown threats, while DDoS protection, fraud detection, and sandbox tools fall more into the proactive dimension.
At the same time, security practitioners have gotten a lot smarter about how we use our data. We've figured out techniques that identify anomalous behaviors that can signal unknown threats -- and the more real-time our analytics are, the more proactive we become. For enterprise apps -- especially those operating in the cloud -- this is an emerging area that will become even more critical in solving complex problems like advanced persistent threats, data loss, and fraud.
It's all about the data
All this brings me to my main point: The foundation of a good anomaly detection framework is the data that is used; the richer the data, the better the inferences we draw. In fact, data used in anomaly detection algorithms can be categorized several ways:
Today, most data is analyzed in isolation in a single category (a practice that, frankly, is not that interesting or useful when it comes to threat detection). But try correlating data in multiple categories. Now you're cooking with grease.
Consider algorithms. To build or find an algorithm that can operate on the data and detect anomalous events, the key is to first hypothesize the expected behavior, which we'll call the baseline. Then you must allow for a well-defined learning period and refine the baseline continuously. The length of the learning period will vary, but it should be long enough to capture the full range of normal uses of the system. The learning period should also be subdivided into intervals of specific usage patterns. For an enterprise application, for example, work-hours usage and off-hours usage must be baselined separately.
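To make the baselining idea concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions, not a production detector: the 9-to-6 work-hours window, the function names, the sample numbers, and the choice of a simple mean-and-standard-deviation (z-score) test are all hypothetical choices made for this example. It learns one baseline per usage interval and then flags values that stray too far from the baseline for their interval.

```python
from collections import defaultdict
from statistics import mean, stdev

# Assumption for illustration: 09:00-17:59 counts as "work hours".
WORK_HOURS = range(9, 18)


def interval_of(hour):
    """Map an hour of day (0-23) to a usage interval name."""
    return "work" if hour in WORK_HOURS else "off"


def build_baselines(observations):
    """Learn a (mean, stdev) baseline per interval.

    observations: iterable of (hour, value) pairs gathered during the
    learning period, e.g. (14, number_of_downloads_in_that_hour).
    Each interval needs at least two observations for stdev().
    """
    buckets = defaultdict(list)
    for hour, value in observations:
        buckets[interval_of(hour)].append(value)
    return {name: (mean(vals), stdev(vals)) for name, vals in buckets.items()}


def is_anomaly(baselines, hour, value, threshold=3.0):
    """Flag a value more than `threshold` standard deviations from the
    baseline mean of the interval that `hour` falls into."""
    mu, sigma = baselines[interval_of(hour)]
    if sigma == 0:
        # No variation seen during learning: any different value is an outlier.
        return value != mu
    return abs(value - mu) / sigma > threshold


# Hypothetical learning-period data: (hour of day, activity count).
learning = [(10, 100), (11, 110), (14, 90), (15, 105),
            (2, 5), (3, 7), (23, 4), (1, 6)]
baselines = build_baselines(learning)

# The same activity level is judged against different baselines.
daytime_flag = is_anomaly(baselines, 10, 100)   # compared with work-hours baseline
overnight_flag = is_anomaly(baselines, 3, 100)  # compared with off-hours baseline
```

With this sample data, an activity count of 100 sits comfortably inside the work-hours baseline but far outside the off-hours one, which is the reason for baselining the two intervals separately. Continuous refinement could be approximated by rerunning `build_baselines` on a sliding window of recent observations.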
Once you have a baseline, any activity that is an outlier is an anomaly. In the data science industry, we put classes of algorithms that can be used to build baselines into two different buckets:
As cloud adoption and data explosion trends collide, anomaly detection will become a critical component of enterprises' security posture and a critical tool for complex problems like advanced persistent threats, data loss, and fraud. Getting it right means proactive control of unknown threats -- which is the holy grail of security.
Krishna Narayanaswamy is a founder and chief scientist of Netskope, a leader in cloud app analytics and policy enforcement based in Los Altos, Calif. He is a highly regarded researcher in deep packet inspection, security, and behavioral anomaly detection and leads Netskope's ...