Using AI to Extract High-Value Threat Intel from Data

By John Boatman, November 30, 2017

Today’s security and risk analysts have access to oceans of raw data, thanks to a proliferation of information sources, drastically improved computing power and dirt-cheap storage. Paradoxically, though, they’re having an increasingly hard time coping with the daily tidal waves of security alerts generated from that data. To alleviate their growing alert fatigue, they have started turning to artificial intelligence-based tools that can guide them toward the most significant ‘signals’ while filtering out the distractions of trivial alerts, false-positive alarms and other noise.

In a recent podcast,* The CyberWire asked Haystax Technology CEO Bryan Ware what AI techniques and approaches might help these analysts stay focused on the threat signals that matter most. He recommended a number of steps, including that they explore the analytical power inherent in probabilistic models known as Bayesian inference networks.

“The approach that we’ve taken at Haystax is what we call our ‘model-first’ approach,” Ware explained. He likened the model-building process to emulating “the physics of the problem space,” because models represent what diverse experts believe about the dynamics of, say, a suspicious event or insider threat, and what analysts would do if they were trying to ascertain the validity or probability of each. With a Bayesian model in place, Ware said, “I know how I would use data, as it becomes available, to determine the degree to which this person looks like an insider threat or the degree to which [that event] looks like a suspicious transaction.”

A Bayesian model can produce analytic results even with no data, because encoded within its nodes are the experts’ existing beliefs as to how various observable behaviors and actions interact with and influence each other. Once connected to data, moreover, the model results become much more detailed and predictive. Ware explained that the model is “not so much learning from the data as watching the data as it changes. And as the data changes, then the model updates as well, so that the beliefs change.”
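To make that idea concrete, here is a minimal Python sketch of a belief that exists before any data arrives and then shifts as observations come in. It is not Haystax's model: the prior, the observable names and every probability are hypothetical numbers chosen for illustration, and the observations are assumed to be independent of one another given the threat status, which is a simplification of a full Bayesian network.

```python
# Hypothetical numbers for illustration only; not taken from any real model.
PRIOR_THREAT = 0.01  # expert belief before any data is connected

# For each observable: (P(observed | threat), P(observed | no threat))
LIKELIHOODS = {
    "after_hours_login": (0.60, 0.20),
    "large_print_job": (0.50, 0.10),
    "badge_in_matches_login": (0.30, 0.90),
}


def update(belief, observation):
    """Apply Bayes' rule for one observation and return the new belief."""
    p_if_threat, p_if_benign = LIKELIHOODS[observation]
    numerator = p_if_threat * belief
    return numerator / (numerator + p_if_benign * (1.0 - belief))


def belief_after(observations, prior=PRIOR_THREAT):
    """Fold a stream of observations into the belief, one at a time."""
    belief = prior
    for observation in observations:
        belief = update(belief, observation)
    return belief


# With no data, the sketch still reports the experts' prior belief.
print(belief_after([]))
# As observations arrive, the belief moves away from the prior.
print(belief_after(["after_hours_login", "large_print_job"]))
```

With an empty list of observations the function simply returns the prior, which mirrors the point that a model can give an answer with no data at all. Each new observation then raises or lowers the belief depending on whether it is more likely under the threat hypothesis or the benign one.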

As a result, Ware recommended that organizations consider tapping more data sources, not fewer. “So often we discover that a breach has taken place months after” it happened, Ware said. “When you think through that, you realize that the data was there at the time the breach was taking place – maybe even before. But that data wasn’t actionable [so] you couldn’t make a decision from it.”

Because Bayesian models can represent a wide array of behaviors, actions and dynamics, they allow many more types of data to be processed before the results are presented to analysts and decision-makers. For example, the Carbon model of ‘whole-person trustworthiness’ at the core of Haystax’s Constellation for Insider Threat solution would certainly draw on logs of network and device activity, as most other user behavior analytics (UBA) solutions do. But its data sources could also extend to employee badge in/out data, printer activity logs, on-boarding/off-boarding files, travel and expense records, and incident and investigative information, and even to open-source feeds and public records.

Critically, these broader data sets provide deeper context, which enables the model to ‘reason’ that even though an individual may work after hours, or print an unusually large file or a document he doesn’t usually have access to, other evidence can ‘overrule’ the possibility that he represents a threat. With more conventional data-driven solutions, Ware cautioned, “if you just built an alert for printing to an unusual printer, or for printing a large file, or for coming into work after 6:00 PM, then you’ll end up with lots and lots of alerts that are almost always easily explained.”
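The way context can ‘overrule’ a suspicious-looking observation can be sketched with a three-node network evaluated by brute-force enumeration. Everything here is an assumption made for illustration: the node names (`threat`, `deadline`, `after_hours`), the structure (two independent causes of after-hours work) and all of the probabilities are hypothetical and are not drawn from the Carbon model.

```python
from itertools import product

# Hypothetical priors for the two independent causes.
P_THREAT = 0.01
P_DEADLINE = 0.30

# Hypothetical P(after_hours = True | threat, deadline).
P_AFTER_HOURS = {
    (False, False): 0.05,
    (False, True): 0.80,
    (True, False): 0.70,
    (True, True): 0.90,
}


def joint(threat, deadline, after_hours):
    """Probability of one complete assignment of the three variables."""
    p = P_THREAT if threat else 1 - P_THREAT
    p *= P_DEADLINE if deadline else 1 - P_DEADLINE
    p_after = P_AFTER_HOURS[(threat, deadline)]
    p *= p_after if after_hours else 1 - p_after
    return p


def posterior_threat(**evidence):
    """P(threat | evidence), summing the joint over all consistent worlds."""
    total = 0.0
    with_threat = 0.0
    for threat, deadline, after_hours in product([False, True], repeat=3):
        world = {"threat": threat, "deadline": deadline,
                 "after_hours": after_hours}
        if any(world[name] != value for name, value in evidence.items()):
            continue  # this world contradicts the evidence
        p = joint(threat, deadline, after_hours)
        total += p
        if threat:
            with_threat += p
    return with_threat / total


print(posterior_threat(after_hours=True))
print(posterior_threat(after_hours=True, deadline=True))
```

Under these made-up numbers, seeing after-hours work alone lifts the threat belief from 0.01 to roughly 0.027, while also learning that the person has a project deadline pulls it back to roughly 0.011. The benign explanation absorbs most of the suspicion, which is the behavior a single-condition alert rule cannot reproduce.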

With a model-driven UBA solution, Ware added, “we can connect it to all those alerts that come from other machine-learning approaches, or to specific pieces of data.” Even if an organization is generating thousands or even hundreds of thousands of events per day, the model can analyze, prioritize and resolve those alerts the same way an analyst would — only at massive scale and virtually in real time. With the model, Ware said, “you can really build a scalable system, and you can just let the analyst see the ones that are of serious concern.”
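The triage step described above might look something like the following sketch, in which each user's events are combined into a single score and only high scorers reach the analyst. The event names, likelihood ratios, prior and threshold are all hypothetical, and the scoring treats events as independent pieces of evidence, which a real Bayesian network would not need to assume.

```python
import math

PRIOR = 0.01  # hypothetical base rate of a genuine threat

# Hypothetical likelihood ratios: P(event | threat) / P(event | no threat).
# Values below 1.0 are reassuring evidence; values above 1.0 are concerning.
LIKELIHOOD_RATIO = {
    "after_hours_login": 3.0,
    "large_print_job": 5.0,
    "unusual_printer": 2.0,
    "approved_overtime": 0.2,
    "resignation_filed": 8.0,
}


def score(events):
    """Combine a user's events into a threat probability via log-odds."""
    log_odds = math.log(PRIOR / (1 - PRIOR))
    for event in events:
        log_odds += math.log(LIKELIHOOD_RATIO[event])
    return 1 / (1 + math.exp(-log_odds))


def triage(events_by_user, threshold=0.25):
    """Return (user, score) pairs at or above the threshold, highest first."""
    scored = {user: score(events) for user, events in events_by_user.items()}
    flagged = [(user, p) for user, p in scored.items() if p >= threshold]
    return sorted(flagged, key=lambda item: item[1], reverse=True)


events_by_user = {
    "alice": ["after_hours_login", "approved_overtime"],
    "bob": ["large_print_job", "unusual_printer"],
    "carol": ["after_hours_login", "large_print_job", "resignation_filed"],
}
for user, probability in triage(events_by_user):
    print(user, round(probability, 3))
```

In this toy data set all three users generate events, but only one crosses the threshold, so the analyst's queue holds a single item rather than seven separate alerts.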

* The podcast interview runs from 4:35 to 9:10.
