4 Machine Learning Challenges for Threat Detection - InformationWeek


Commentary
5/4/2020 07:00 AM
Christopher Perry, Lead Product Manager, BMC Software

4 Machine Learning Challenges for Threat Detection

While ML can dramatically enhance an organization's security posture, it is critical to understand some of its challenges when designing security strategies.

Image: NicoElNino - stock.adobe.com

The growth of machine learning and its ability to provide deep insights from big data continues to be a hot topic. Many C-level executives are developing deliberate ML initiatives to see how their companies can benefit, and cybersecurity is no exception. Most information security vendors have adopted some form of ML; however, it's clear that ML isn't the silver bullet some have made it out to be.

While ML solutions for cybersecurity can and will provide a significant return on investment, they do face some challenges today. Organizations should be aware of a few potential setbacks and set realistic goals to realize ML’s full potential.

False positives and alert fatigue

The greatest criticism of ML-detection software is the “impossible” number of alerts it generates -- think millions of alerts per day, effectively delivering a denial-of-service attack against analysts. This is particularly true of “static analysis” approaches that rely heavily on how threats look.

Even an ML-based detection solution that is 97% accurate may not help because, simply put, the math is not favorable.

Let's say an organization has one threat among 10,000 users on its network. Bayes' rule tells us the chance that an alert is a real attack is P(alert given threat) × P(threat) / P(alert). With 97% accuracy -- which also implies a 3% false-positive rate -- the system flags roughly 300 benign users for every real threat. This means that even with 97% accuracy, the actual likelihood of an alert being a real attack is only about 0.3%!

Since improving beyond 97% may not be feasible, the best way to address this is to limit the population under evaluation through whitelisting or prior filtering based on domain expertise. This could mean focusing on highly credentialed, privileged users or on a specific, business-critical unit.
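A short script makes the base-rate math concrete and shows why shrinking the monitored population helps. The 3% false-positive rate and the population sizes here are illustrative assumptions for this sketch, not figures from any particular product:

```python
def posterior_true_alert(sensitivity, false_positive_rate, base_rate):
    """Bayes' rule: P(real threat | alert)."""
    p_alert = sensitivity * base_rate + false_positive_rate * (1 - base_rate)
    return sensitivity * base_rate / p_alert

# One threat among 10,000 monitored users vs. one among 200 privileged users.
for population in (10_000, 200):
    p = posterior_true_alert(0.97, 0.03, 1 / population)
    print(f"population {population:>6}: {p:.2%} of alerts are real")
```

Filtering the population from 10,000 down to 200 raises the share of genuine alerts from well under 1% to roughly 14%, without any change to the detector itself.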

Dynamic environments

ML algorithms work by learning the environment and establishing baseline norms before they monitor for anomalous events that can indicate a compromise. However, if the IT enterprise is constantly reinventing itself to meet business agility needs and the dynamic environment doesn’t have a steady baseline, the algorithm cannot effectively determine what is normal and will issue alerts on completely benign events.

To help minimize this impact, security teams must work within DevOps environments to know what changes are being made and update their tooling accordingly. The DevSecOps (development, security, and operations) acronym is beginning to gain traction since each of these elements should be synchronized and work within a shared consciousness.
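To see why a constantly shifting environment defeats baselining, consider a minimal rolling-baseline anomaly detector. The window size, warm-up length, and z-score threshold are arbitrary choices for this sketch:

```python
from collections import deque
from statistics import mean, stdev

class RollingBaseline:
    """Flag values that deviate sharply from a rolling baseline."""

    def __init__(self, window=50, threshold=3.0, warmup=10):
        self.history = deque(maxlen=window)
        self.threshold = threshold
        self.warmup = warmup

    def is_anomalous(self, value):
        anomalous = False
        if len(self.history) >= self.warmup:
            mu, sigma = mean(self.history), stdev(self.history)
            # Flag values more than `threshold` standard deviations from the mean.
            if sigma > 0 and abs(value - mu) > self.threshold * sigma:
                anomalous = True
        self.history.append(value)
        return anomalous
```

If a benign change -- say, a deployment that doubles normal traffic -- lands in this stream, every reading gets flagged until the window refills with the new normal, which is exactly the false-alert pattern described above.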

Context

ML’s power comes from its ability to conduct massive multi-variable correlation to develop its predictions. However, when a real alert makes its way to a security analyst’s queue, this powerful correlation takes the appearance of a black box and leaves little more than a ticket that says, “Alert.” From there, an analyst must comb through logs and events to figure out why it triggered the action.

The best way to minimize this challenge is to enable a security operations center with tools that can quickly filter through log data on the triggering entity. This is an area where artificial intelligence can help automate and speed data contextualization. Data visualization tools can help as well by providing a fast timeline of events coupled with an understanding of a specific environment. A security analyst can then determine rapidly why the ML software sent the alert and whether it is valid.
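The contextualization step can be as simple as pulling every event for the entity that triggered the alert into one ordered timeline. The event schema and entity names below are hypothetical, purely to illustrate the shape of such a tool:

```python
from datetime import datetime

# Hypothetical normalized log events (field names are illustrative).
events = [
    {"time": "2020-05-04T06:58:01", "entity": "svc-batch01", "action": "login"},
    {"time": "2020-05-04T06:58:05", "entity": "jdoe", "action": "file_read"},
    {"time": "2020-05-04T06:59:12", "entity": "svc-batch01", "action": "priv_escalation"},
]

def timeline_for(entity, events):
    """Collect every event for the triggering entity, in time order."""
    related = [e for e in events if e["entity"] == entity]
    return sorted(related, key=lambda e: datetime.fromisoformat(e["time"]))

for event in timeline_for("svc-batch01", events):
    print(event["time"], event["action"])
```

An analyst scanning this per-entity timeline can see at a glance what the flagged account did before and after the trigger, rather than grepping raw logs.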

Anti-ML attacks

The final challenge for ML is that attackers quickly adapt to bypass detection, and the effects can be catastrophic. Researchers recently demonstrated this by causing a Tesla to accelerate to 85 MPH after subtly altering a 35 MPH speed-limit sign on a road.

ML in security is no different. A perfect example is an ML network-detection algorithm that uses byte analysis to very effectively determine whether traffic is benign or shellcode. Hackers adapted quickly by using polymorphic blending attacks, padding their shellcode with additional bytes to alter the byte frequency and fully bypass detection algorithms. It's further proof that no single tool is bulletproof and that security teams need to constantly assess their security posture and stay educated on the latest attack trends.
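A toy version of byte-frequency detection shows the blending trick. Real polymorphic blending must also keep the payload executable; this sketch only demonstrates the distribution math, and the L1 distance metric, stand-in payload, and sample traffic are all illustrative assumptions:

```python
from collections import Counter

def byte_distribution(data: bytes):
    """Relative frequency of each byte value 0-255."""
    counts = Counter(data)
    return {b: counts.get(b, 0) / len(data) for b in range(256)}

def distance(p, q):
    """L1 distance between two byte-frequency distributions."""
    return sum(abs(p[b] - q[b]) for b in range(256))

benign = b"GET /index.html HTTP/1.1\r\nHost: example.com\r\n" * 20
baseline = byte_distribution(benign)

payload = bytes([0x90] * 40 + [0xCC, 0x31, 0xC0] * 10)  # stand-in shellcode
raw_score = distance(baseline, byte_distribution(payload))

# Blending: pad the payload with bytes drawn from benign-looking traffic
# so its byte frequencies drift back toward the baseline.
blended = payload + benign
blended_score = distance(baseline, byte_distribution(blended))
print(raw_score > blended_score)  # True: padding pulled the payload toward benign
```

A detector thresholding on this distance would score the raw payload as highly anomalous but could pass the blended version, which is precisely the evasion described above.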

ML can be extremely effective in enabling and advancing security teams. The ability to automate detection and correlate data can save a significant amount of time for security practitioners.

However, the key to an improved security posture is human-machine teaming where a symbiotic relationship exists between machine (an evolving library of indicators of compromise) and man (penetration testers and a cadre of mainframe white-hat hackers). ML brings the speed and agility needed to stay ahead of the curve, and humans bring qualities that it can’t (yet) replicate -- logic, emotional reasoning, and decision-making skills based on experiential knowledge.

Christopher Perry is the Lead Product Manager for BMC AMI for Security at BMC Software. Perry got his start in cybersecurity while studying computer science at the United States Military Academy. While assigned to Army Cyber Command, Perry helped define expeditionary cyberspace operations as a company commander and led over 70 soldiers conducting offensive operations. He is currently getting his master’s degree in Computer Science with a focus in Machine Learning at Georgia Institute of Technology.
