Few things in life can be expressed in black-and-white terms. Sure, a light switch is either on or off; one baseball team wins the World Series each year and the rest don’t; and every bit in a computer is either a one or a zero.
Most of the time, though, our lives are full of gray areas, not absolutes. Brent crude almost never drops below $40 a barrel, but it did happen once, and the chances of it happening again are greater than zero. There may be a 60- or 70-percent chance of rain tomorrow, but it’s rarely 100 percent. And, sometimes, even the Chicago Cubs win the World Series.
So why is it that security practitioners often treat their threat environment as if it were black or white, rather than a spectrum of possible states and probable outcomes, even when this binary view degrades their comprehension and decision-making and thus jeopardizes their actual security?
Consider:
- Individuals either receive or do not receive a security clearance. Once they have it, they’re 100-percent cleared.
- Foreign visitors are vetted before being granted a visa, but once that visa is issued they are considered 100-percent legal.
- When an alert is generated by a network monitoring or user behavior analytics (UBA) system, it is automatically considered a threat, at least until a security analyst determines otherwise.
Let’s delve more deeply into that last example. Say an employee prints an unusually large file containing sensitive data he typically is not privy to. Current data-driven analytics tools will use some combination of network monitoring, machine learning and rules-based techniques to detect that anomaly and generate an alert. An analyst receives the alert and, because the detector is often a ‘black box,’ she then has to chase down additional contextual information to answer the question: what is the probability that this is a true threat and not just another false positive?
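To make the rules-based side of that detection concrete, here is a minimal sketch of how an oversized print job might trip an alert. The baseline data, threshold, and function are hypothetical illustrations, not drawn from any particular product:

```python
from statistics import mean, stdev

def print_job_alert(history_bytes, new_job_bytes, z_threshold=3.0):
    """Flag a print job whose size deviates sharply from the user's baseline.

    history_bytes: this user's past print-job sizes (hypothetical data).
    Returns True when the new job is more than z_threshold standard
    deviations above the historical mean, i.e. an anomaly alert fires.
    """
    mu, sigma = mean(history_bytes), stdev(history_bytes)
    if sigma == 0:
        return new_job_bytes > mu  # flat baseline: any increase is anomalous
    return (new_job_bytes - mu) / sigma > z_threshold

# A user who normally prints small documents suddenly prints a huge file.
baseline = [120_000, 95_000, 150_000, 110_000, 130_000]  # bytes, hypothetical
print(print_job_alert(baseline, 25_000_000))  # True -> alert generated
```

Note that a detector like this only says “anomalous or not.” On its own it carries no notion of how likely the anomaly is to be an actual threat, and that is precisely the gap the analyst must fill.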
If it were only one or two alerts popping up on her monitor, that wouldn’t be so taxing. But in the real world she probably works in an organization that manages dozens, if not hundreds or thousands, of alerts every day.
The good news is that this labor-intensive process of ascertaining probability can be automated in ways that free the analyst to focus on her employer’s most pressing threats, even when the data being analyzed is coming in rapidly and at high volumes.
The key lies in that word ‘probability.’ I am a big fan of core security risk management tenets and Bayesian theory because of their shared emphasis on measuring likelihoods and probabilities, rather than on simplistic statements of binary truth. In my view this analytical focus on the gray areas represents a more realistic way of explaining the world as it is, and a more reliable way of anticipating what it will be in the future.
A Bayesian inference network is a statistical model that estimates the probability of a particular outcome based on its various indicators and the relative strength of the interrelationships among those indicators. The model can be implemented in software, and observed data can be applied to it as evidence. Combined with other artificial intelligence techniques that provide additional context, the model-based analytics can help explain whether an anomaly is indicative of something bigger, even when the signals are weak or key data is missing.
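At its core, that probability update is just Bayes’ rule: the posterior probability of a threat given an indicator depends on the prior and on how much more likely the indicator is under a threat than under benign activity. A minimal sketch, with every number an illustrative assumption rather than a calibrated value:

```python
def posterior_threat(prior, p_ind_given_threat, p_ind_given_benign):
    """Bayes' rule for a single binary indicator.

    prior: P(threat) before seeing the indicator.
    p_ind_given_threat / p_ind_given_benign: the indicator's likelihoods;
    their ratio is the 'relative strength' of that indicator as evidence.
    Returns P(threat | indicator observed).
    """
    num = p_ind_given_threat * prior
    den = num + p_ind_given_benign * (1.0 - prior)
    return num / den

# Illustrative numbers: threats are rare, but an oversized print job is
# ten times more likely during a threat than during normal activity.
print(posterior_threat(prior=0.01, p_ind_given_threat=0.5,
                       p_ind_given_benign=0.05))  # about 0.092: elevated, not certain
```

The output is neither zero nor one: a single anomaly raises the threat probability roughly ninefold, from 1 percent to about 9 percent, without declaring a verdict.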
Take that odd printer activity. Suppose badge data applied to a Bayesian model shows that the employee’s colleagues were present at the same time, indicating a special team project. Given evidence of this sort, plus information showing the employee to be reliable and trustworthy, the model would automatically classify the anomaly as a false positive, and the analyst would never be alerted. Conversely, say the badge data reveals that the employee was working alone at an unusually late hour, while network data shows he accessed restricted drives without authorization. The model would immediately alert the analyst to initiate further investigation.
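Here is how those two evidence sets might play out numerically. The sketch treats indicators as conditionally independent given the threat state, a naive-Bayes simplification of a fuller Bayesian network, and every probability in it is an illustrative assumption:

```python
# Conditional probabilities P(indicator | threat) and P(indicator | benign).
# All numbers are illustrative assumptions; a real model would be built
# from historical data and expert judgment, with richer dependencies.
INDICATORS = {
    #  name                 P(obs|threat)  P(obs|benign)
    "large_print_job":      (0.50,         0.05),
    "colleagues_present":   (0.20,         0.70),  # team activity looks benign
    "alone_late_hours":     (0.60,         0.10),
    "unauthorized_access":  (0.40,         0.01),
    "trusted_history":      (0.30,         0.90),  # clean record favors benign
}

PRIOR_THREAT = 0.01  # hypothetical base rate of insider threats

def posterior_threat(observed):
    """P(threat | observed indicators), treating indicators as conditionally
    independent given the threat state (a naive-Bayes simplification)."""
    p_threat, p_benign = PRIOR_THREAT, 1.0 - PRIOR_THREAT
    for name in observed:
        p_obs_threat, p_obs_benign = INDICATORS[name]
        p_threat *= p_obs_threat
        p_benign *= p_obs_benign
    return p_threat / (p_threat + p_benign)

# Scenario 1: big print job, but colleagues badged in and a trusted record.
benign_case = ["large_print_job", "colleagues_present", "trusted_history"]
print(f"team project: {posterior_threat(benign_case):.4f}")  # ~0.0095

# Scenario 2: big print job, alone late, touching restricted drives.
suspect_case = ["large_print_job", "alone_late_hours", "unauthorized_access"]
print(f"late-night access: {posterior_threat(suspect_case):.4f}")  # ~0.96
```

With the team-project evidence the posterior stays around the 1-percent prior and no alert fires; with the late-night evidence it jumps above 95 percent, crossing any sensible alerting threshold.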
In the latter case, our analyst could show the model results to a decision-maker and demonstrate how she arrived at her conclusion by pointing to the easily understood behavioral nodes in the model and showing how the applied data led to a true positive. The decision-maker would thus have transparent, detailed analytic evidence to assist in making difficult yes/no decisions about how to respond to this or similar high-priority threat alerts, and in clearly explaining and defending those decisions afterwards.
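One simple, transparent way to surface that per-node evidence, reusing the illustrative likelihoods from the sketch above, is to report each observed indicator’s log-likelihood ratio, that is, how strongly it pushed the verdict toward “threat”:

```python
import math

# Illustrative likelihoods from the scenario above; the log of their
# ratio is each indicator's evidence weight. Positive values push the
# verdict toward "threat", negative values toward "benign".
LIKELIHOODS = {
    "large_print_job":     (0.50, 0.05),
    "alone_late_hours":    (0.60, 0.10),
    "unauthorized_access": (0.40, 0.01),
}

for name, (p_threat, p_benign) in LIKELIHOODS.items():
    weight = math.log10(p_threat / p_benign)
    print(f"{name:22s} {weight:+.2f}")
# large_print_job        +1.00
# alone_late_hours       +0.78
# unauthorized_access    +1.60  <- strongest single driver of the alert
```

In this toy example the unauthorized drive access is the strongest single driver of the alert, which is exactly the kind of pointer an analyst can hand to a decision-maker.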
Absolute certainty is very comforting. But in its absence, measurable probability is the next best thing. That’s why whole industries have emerged to anticipate the likely outcome of a political race, the approximate price of a barrel of oil, or the outlook for tomorrow’s weather. And you don’t see many of them forecasting outcomes at zero- or 100-percent likelihood.
A model-driven security analytics system that is thoughtfully designed and built would generate virtually no black-and-white answers. And that’s a good thing, because meeting the security challenges of today and tomorrow will depend on actionable intelligence that is found mostly in the gray areas.
# # #
Note: A version of this article first appeared in NetworkWorld on May 24, 2017.