Machine Learning: Expertise vs. Coverage

By Haystax, December 26, 2017

Critics of machine learning (ML) often point out that it can’t come close to emulating a subject-matter expert working at full potential. Sure, ML powers our smartphones, determines which advertisements we view and will soon dominate the automobile industry through self-driving cars, but in a straight-ahead analytics competition where judgment and wisdom are required, the critics argue, the experienced human will always produce higher-caliber analysis than the algorithm.

However, such criticisms miss the point, says Daniel Miessler, an information-security professional and writer. Miessler writes in a recent blog post that the true value of ML to analysts and decision-makers in mission-critical security tasks is its ability to scale far beyond the capacity of a single expert, or even a roomful of them. And with most organizations very short on in-house analytical expertise, this is a major advantage.

“Humans being better than machines at a particular task is irrelevant when there aren’t enough humans to do the work,” Miessler notes. He estimates that perhaps 5% of companies analyze anywhere from 50-85% of the data they have available, while 90% “probably have human security analyst ratios that only allow 5-25% coverage of what they wish they were seeing and evaluating,” and the lowest percentile are looking at less than 1%, “likely because they don’t have any security analysts at all.”

We’ve often written in blogs and white papers about the ‘data paradox.’ This is the problem of having oceans of available data, growing exponentially, that are supposed to give organizations a much richer set of actionable intelligence but in fact merely overwhelm them with noise while critical signals remain buried. The more data rolls in, the more the organization is overwhelmed and the less secure it becomes.

Enter ML, with its litany of advantages.

Miessler highlights an important expectations gap in ML that causes critics to miss its true value: that of being able to operate at scale — in other words at a volume and velocity of incoming data that no human analyst, however highly experienced, could sustain for very long. It’s an advantage he calls ‘Superior Coverage,’ and he sums it up this way: “When you hear someone dismissing Machine Learning by saying humans are better, ask yourself what percentage of the data in question a potential human workforce can realistically evaluate.”
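To make that coverage question concrete, here is a back-of-the-envelope sketch. The function name and every number in it are hypothetical, chosen only for illustration; they are not figures from Miessler or from any real organization.

```python
def analyst_coverage(events_per_day, analysts, events_per_analyst_per_day):
    """Fraction of daily events a human team can review, capped at 1.0."""
    if events_per_day <= 0:
        raise ValueError("events_per_day must be positive")
    capacity = analysts * events_per_analyst_per_day
    return min(1.0, capacity / events_per_day)


# Hypothetical organization: 200,000 events a day, 4 analysts,
# each able to review about 500 events a day.
coverage = analyst_coverage(200_000, 4, 500)
print(f"Human coverage: {coverage:.1%}")  # prints "Human coverage: 1.0%"
```

Under these assumed numbers the team sees about one event in a hundred, which is the gap Miessler is pointing at, however skilled each analyst may be.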

But what about that human expertise? Can that, too, be scaled? At Haystax Technology we believe the answer is yes. Before analyzing any data, we first build probabilistic models called Bayesian inference networks, which encode subjective human judgment and experience to help solve many complex problems facing security professionals today. Only then do we identify a diverse array of data sets to run through the models to prioritize risks. We call this technique ‘model first.’
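As a rough illustration of what encoding judgment in a model can look like, the sketch below builds a tiny Bayesian network by hand: one hidden "risk" node and two evidence nodes. It is not our production model. The node names, the probabilities and the assumption that the indicators are conditionally independent given the risk state are all hypothetical, standing in for values an expert would supply.

```python
# Hypothetical expert-elicited probabilities; not taken from any real model.
PRIOR_RISK = 0.01  # P(risk) before seeing any evidence

# P(indicator observed | risk state), one entry per evidence node.
LIKELIHOODS = {
    "unusual_printing": {True: 0.30, False: 0.05},
    "odd_badge_times": {True: 0.40, False: 0.10},
}


def posterior_risk(evidence):
    """Return P(risk | evidence) by enumerating both risk states.

    `evidence` maps an indicator name to True (observed) or False (not
    observed). Indicators are assumed conditionally independent given
    the risk state.
    """
    joint = {}
    for risk, prior in ((True, PRIOR_RISK), (False, 1.0 - PRIOR_RISK)):
        p = prior
        for name, observed in evidence.items():
            p_obs = LIKELIHOODS[name][risk]
            p *= p_obs if observed else 1.0 - p_obs
        joint[risk] = p
    return joint[True] / (joint[True] + joint[False])


both = posterior_risk({"unusual_printing": True, "odd_badge_times": True})
print(round(both, 3))  # prints 0.195
```

The point of the sketch is the ordering of the work: the structure and the probabilities are written down first, from expert judgment, and data only enters later as evidence.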

Since machine-learning and rules-based systems excel at finding anomalies, we use them for that exact purpose: for example, flagging anomalous printer events, odd badge-in/badge-out times or the presence of an employee at an unusual location. Instead of feeding those anomalies to an analyst, however, we apply them to the model, which performs the analysis. And because that model already has encoded within its nodes the knowledge and judgment of those very analysts, it frees them to focus on what’s important, and to operate at a scale and speed that would be unachievable were there no model performing the analysis and prioritization for them.
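The hand-off from anomaly detectors to a model can be sketched as follows. This is a simplified illustration, not a description of how Constellation works: the flag names, likelihood ratios, prior odds and employee IDs are hypothetical, and the scoring uses a basic odds update that only counts flags that were raised and treats them as independent.

```python
import math

# Hypothetical likelihood ratios: P(flag | risky) / P(flag | not risky).
LIKELIHOOD_RATIOS = {
    "anomalous_printer_event": 6.0,
    "odd_badge_time": 4.0,
    "unusual_location": 3.0,
}
PRIOR_ODDS = 0.01 / 0.99  # hypothetical prior odds of risk


def risk_score(flags):
    """Posterior probability of risk given the flags raised for one person.

    Works in log-odds: start from the prior and add the log likelihood
    ratio of each raised flag. Absent flags are ignored for simplicity.
    """
    log_odds = math.log(PRIOR_ODDS)
    log_odds += sum(math.log(LIKELIHOOD_RATIOS[flag]) for flag in flags)
    odds = math.exp(log_odds)
    return odds / (1.0 + odds)


def prioritize(flags_by_person):
    """Return (person, score) pairs sorted from highest to lowest risk."""
    scored = [(person, risk_score(flags)) for person, flags in flags_by_person.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Hypothetical output of upstream ML and rules-based detectors.
detector_output = {
    "emp_001": ["odd_badge_time"],
    "emp_002": ["anomalous_printer_event", "odd_badge_time", "unusual_location"],
    "emp_003": [],
}

for person, score in prioritize(detector_output):
    print(f"{person}: {score:.3f}")
```

In this toy run the employee with three corroborating flags rises to the top, while a single odd badge time barely moves the score. The analyst receives a ranked list rather than a stream of raw anomalies.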

As a result, our Constellation user behavior analytics (UBA) solution eases the alert overload that afflicts security analysts by eliminating many of the false positives generated by ML-based systems and by providing decision-makers with prioritized alerts on their biggest risks.

Miessler goes on to suggest that machine learning algorithms may eventually become better than human experts at analyzing complex problems, not just providing superior coverage. Perhaps, but until that time we will continue to rely on Bayesian modeling to do the heavy analytical lifting in such complex missions as detecting and preventing insider threats, financial fraud and other rare but high-consequence events.