Every national security clearance investigation revolves around a simple question: “What is this person not telling me?”
After all, if every applicant for a cleared job were completely honest, there would be no need for investigations in the first place. A clearance could be processed as quickly as it took the applicant to submit paperwork and the government to review it. In such a perfect world, the savings in time and money would be astronomical.
The reality, as every investigator knows, is that some applicants lie about — or at least downplay or otherwise conceal — aspects of their backgrounds. Often this behavior stems from a desire to ‘put their best foot forward.’ But sometimes such omissions reflect more malign intentions, such as an attempt by a foreign intelligence service to infiltrate and spy on the U.S. government.
Background investigations will thus remain a crucial step in the clearance process, despite the hassle and expense.
Haystax eases these burdens by leading investigators to the specific aspects of each case that are most likely to yield actionable information. To achieve this, we:
- Deploy a probabilistic model whose central hypothesis is that a person is clearance-worthy.
- Connect to a broad range of data sets that are augmented by machine learning and other analytic techniques.
- Apply the enhanced data to the model as evidence for or against our hypothesis (a simplified sketch of this loop follows the list).
- Display the analytic results in a simple interface that conforms to commonly accepted investigative workflows.
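To make the evidence step concrete, here is a minimal sketch of that loop, assuming each analytic produces a finding expressed as a likelihood ratio for or against the central hypothesis. Every name and number in it is a hypothetical illustration, not an actual Haystax parameter, and the conditional-independence shortcut is a simplification of what a full network does.

```python
from math import prod

# Minimal sketch of the evidence loop. The central hypothesis is
# "this person is clearance-worthy." All names and numbers below are
# hypothetical illustrations, not Haystax's actual parameters.

PRIOR = 0.95  # assumed prior belief that a typical applicant is clearance-worthy

# Hypothetical findings from the enhanced data sets, expressed as likelihood
# ratios: values > 1 support the hypothesis, values < 1 weigh against it.
findings = {
    "clean_credit_history": 1.3,
    "undisclosed_foreign_contact": 0.4,
}

# Odds-form Bayes: posterior odds = prior odds x product of likelihood ratios
# (valid when findings are conditionally independent given the hypothesis).
odds = PRIOR / (1 - PRIOR) * prod(findings.values())
posterior = odds / (1 + odds)

print(f"Belief in clearance-worthiness after evidence: {posterior:.0%}")  # ~91%
```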
Security analysts call this ‘smart’ user behavior analytics (UBA). What distinguishes our approach from more conventional UBA solutions that rely solely on machine learning or rules-based systems is the model itself, which is known as a Bayesian inference network.
Named after Thomas Bayes, an eighteenth-century English statistician and cleric, Bayesian networks are ideally suited for problems where:
- The event being predicted has never before occurred, or has occurred but only very infrequently.
- The supporting data is missing, incomplete or inconsistent.
- The analytic results must be transparent, consistent, repeatable and analytically defensible.
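All three properties trace back to Bayes’ rule, which prescribes how a prior belief in a hypothesis $H$ should change in light of new evidence $E$:

$$P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}$$

In the clearance context, $P(H)$ is the prior (for example, the prevalence of a behavior in the population at large), $P(E \mid H)$ is how likely the observed evidence would be if the hypothesis were true, and $P(H \mid E)$ is the updated belief. Because every update is just this arithmetic, the results are transparent, consistent and repeatable by construction.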
Our Bayesian network for workforce security, known as Carbon (pictured, top), encodes hundreds of facets of human behavior; many are key risk indicators that represent early warning signs of a potential insider threat.
Carbon’s ‘belief’ that a person embodies clearance-worthiness concepts is expressed as a probability, informed both by knowledge about his or her background and by available information about the prevalence of certain behaviors in the population at large. Moreover, every node in the model is connected, directly or indirectly, to the others, so model results can be updated any time new data is applied as evidence.
This unique approach means Carbon can make inferences about a person’s behavior even with limited background information; the investigator’s original question now becomes: “What is this person not telling me, given all the things I already know about human behavior?” Basically, the Carbon model is taking advantage of the fact that investigators possess a wealth of reliable information about even their most untruthful subjects and can use that knowledge to generate new investigative leads.
For example, imagine that an investigator is looking at the case of a security clearance candidate who’s had some recent concerning behavior — say, an arrest for drunk and disorderly conduct. It’s reasonable to think the person might be concealing further misconduct. But what is it?
The Carbon model has some ideas:
- First, the model’s belief that the candidate abuses alcohol changes from 6% (the prevalence of alcohol abuse in the general population) to 47% when data is entered showing the candidate was arrested for an alcohol-related offense.
- Looking at the model’s other beliefs provides additional insights. For example, its belief that the subject may have an alcohol problem leads it to infer that the person may also have issues with illegal drugs (a 32% probability), be vulnerable to coercion (30%) or misuse an IT system (27%). The image below displays some of these inferences.
Conversely, the model does not infer a connection between the possible alcohol issue and allegiance to the U.S., foreign influence, foreign preferences or other loyalty issues.
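A toy version of this example can be reproduced with a three-node network and Bayes’ rule alone. The structure (an alcohol-abuse node with an arrest node and a drug-issues node as children) and all conditional probabilities below are illustrative stand-ins, chosen only so the outputs land near the figures quoted above; they are not Carbon’s actual parameters.

```python
# Toy three-node network: Abuse -> Arrest, Abuse -> DrugIssues.
# All conditional probabilities are illustrative stand-ins, chosen so the
# results land near the article's figures; they are not Carbon's parameters.

P_ABUSE = 0.06  # prior: prevalence of alcohol abuse in the general population

P_ARREST_GIVEN_ABUSE = {True: 0.25, False: 0.018}  # P(alcohol arrest | abuse?)
P_DRUGS_GIVEN_ABUSE  = {True: 0.55, False: 0.12}   # P(drug issues   | abuse?)

def p_abuse_given_arrest() -> float:
    """Bayes' rule: revise the abuse belief after observing the arrest."""
    prior = {True: P_ABUSE, False: 1 - P_ABUSE}
    joint = {s: P_ARREST_GIVEN_ABUSE[s] * prior[s] for s in (True, False)}
    return joint[True] / (joint[True] + joint[False])

def p_drugs_given_arrest() -> float:
    """Propagate the revised belief to a sibling node via the shared parent."""
    p = p_abuse_given_arrest()
    return P_DRUGS_GIVEN_ABUSE[True] * p + P_DRUGS_GIVEN_ABUSE[False] * (1 - p)

print(f"P(alcohol abuse | arrest) = {p_abuse_given_arrest():.0%}")  # ~47%
print(f"P(drug issues | arrest)   = {p_drugs_given_arrest():.0%}")  # ~32%
```

Notice that in this sketch the arrest says nothing directly about drugs; the belief moves only because the two nodes share a parent. Evidence that has no path to a node leaves that node’s belief at its prior, which is presumably why Carbon stays silent on the loyalty-related concepts here.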
While an investigator should never jump to conclusions regarding an applicant’s conduct based on fragmentary evidence or model results alone, the model has given the investigator a clear, tailored indication of the most likely risks for this particular applicant. (And every other applicant as well.)
A reasonable next step would be to build an investigatory roadmap around the applicant’s substance-abuse issues, with a secondary emphasis on the person’s use of IT or vulnerability to coercion. The roadmap could be adjusted as the investigation reveals new issues that can be applied as evidence to update all beliefs in Carbon (image below).
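Sorting the model’s current beliefs yields exactly such a roadmap; the probabilities below are the ones reported for this applicant above.

```python
# Rank the model's current beliefs to prioritize the investigatory roadmap;
# the values are the ones reported for this example above.
beliefs = {
    "alcohol abuse": 0.47,
    "illegal drug involvement": 0.32,
    "vulnerability to coercion": 0.30,
    "IT system misuse": 0.27,
}

for issue, p in sorted(beliefs.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{p:.0%}  follow up on: {issue}")
```

As new findings arrive, they are applied as evidence, every belief refreshes, and the ordering adjusts with them.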
Our operational experience with Carbon confirms that a smart UBA approach centered around Bayesian networks is better suited to solving those classes of hard problems for which ‘big data-only’ approaches are ineffective, or don’t scale well. These include critical mission areas like insider threat detection and various forms of financial risk mitigation, such as bank- and insurance-fraud prevention and anti-money laundering, plus terrorism and cyber-threat intelligence analysis.
And clearance investigations, of course. The result of using our UBA solution to support a risk-based investigatory approach is a clear, well-contextualized picture of each applicant’s risk profile; increased confidence in the investigation’s accuracy and completeness; and faster, more cost-effective investigations.
# # #
Kevin Kiernan is a Senior Data Scientist at Haystax Technology.