Entity resolution: the secret agent of Continuous Controls Monitoring

We often talk about what Continuous Controls Monitoring is and why organizations need to get visibility into their security posture. But we rarely talk about how it actually works. There is an open data-science secret to automating security measurement: entity resolution. Much like MI6 have James Bond, CCM’s secret agent is entity resolution. Clunky metaphor aside, let’s get into entity resolution and its application in CCM.

Thordis Thorsteins

What is entity resolution?

Entity resolution is the data science process that CCM relies on to take data from all kinds of sources and aggregate it into useful security metrics. At a high level, it works out what information is about the same thing and how many distinct things are represented in all that data.

Here’s a quick rundown of what CCM does:

CCM ingests data from various disparate sources in your organization and uses entity resolution to create a unified, context-rich view. The process cleans, normalizes, aggregates, de-duplicates, correlates, and unifies data from all these sources.

But what value does this bring? Entity resolution provides links and enrichment across the ‘entities’ in an organization – that is to say: people, endpoints, servers, accounts, databases, applications, and more.

It is the ‘secret agent’ that combines all the fragments of data from all your siloed data sources. It gives the ability to understand every asset, the status of all controls relating to that asset, and how those assets relate to each other and the business.

This view is fundamental for any analysis of your assets. Without it, it would be challenging to even determine with confidence how many endpoints you have. With it, CCM can produce automated, real-time security metrics that show the overall security posture of the organization.

Entity resolution is not unique to Panaseer or Continuous Controls Monitoring, though. It is a process that has applications across marketing, finance, investigation, and more. However, applying entity resolution to cybersecurity posture is relatively new. In the context of CCM, it takes data from multiple sources across the environment to create the richest possible view of each entity.

An example of entity resolution

To explain the concept to a layperson, we like to use what we call the ‘James Bond’ example (hence the clunky metaphor). Say we have a number of data sources providing different pieces of information about people. We want to get all the information that we can about one of the people in those records – James Bond. All of the data sources provide records about him, but they refer to him in different ways. These data sources might include:

The MI6 HR database
The registry of international spies
Goldfinger’s database of enemies

And those sources may refer to him as:

James Bond
Bond, James
007
james.bond@mi6.com (his work email)
+447007007 (his work phone)

Entity resolution is the process that works out that all those data sources are referring to the same person. It does the same for every other person in the data sources, not just James. By triangulating across these sources to find relevant data points, entity resolution creates a unique identifier for each person and enriches it with information. This unified view provides more context about James Bond than could be gathered from any single original source.
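To make the idea concrete, here is a minimal sketch of one way to link records: normalize each identifier, then group records that share at least one normalized identifier. The record contents, the source names, and the `normalize` and `resolve` helpers are all hypothetical and chosen for illustration. This is not a description of how any particular CCM platform implements entity resolution.

```python
from collections import defaultdict

# Hypothetical records: (source name, identifiers found in that record).
RECORDS = [
    ("mi6_hr", ["James Bond", "james.bond@mi6.com"]),
    ("spy_registry", ["Bond, James", "007"]),
    ("goldfinger", ["007", "+447007007"]),
    ("mi6_hr", ["Eve Moneypenny"]),
]


def normalize(value):
    """Lowercase, trim, and turn 'Last, First' into 'first last'."""
    value = value.strip().lower()
    if "," in value:
        last, first = [part.strip() for part in value.split(",", 1)]
        value = f"{first} {last}"
    return value


def resolve(records):
    """Group record indexes that share at least one normalized identifier."""
    parent = list(range(len(records)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_seen = {}  # normalized identifier -> first record index using it
    for idx, (_source, identifiers) in enumerate(records):
        for raw in identifiers:
            ident = normalize(raw)
            if ident in first_seen:
                # Same identifier seen before: merge the two groups.
                parent[find(idx)] = find(first_seen[ident])
            else:
                first_seen[ident] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)
    return sorted(groups.values())


entities = resolve(RECORDS)
```

In this sketch the first three records end up in one group and the fourth stands alone. Note that the Goldfinger record never mentions the name "James Bond": it is linked only through the shared "007" identifier, which is the triangulation described above.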

Common pitfalls must be avoided during this process. What if there is more than one James Bond, for example? In our case, we are creating an entity for the British spy. But what if we also have data on James Bond, the twentieth-century American ornithologist of the same name? The entity resolution process must be capable of working out that these are two different people and therefore two different entities.
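One simple way to guard against this pitfall is to treat a name as weak evidence and let other attributes confirm or veto a match. The sketch below is an assumption-laden illustration: the `Person` fields and the `same_entity` rule are hypothetical, and a production system would more likely score candidate matches and send borderline cases for review.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Person:
    name: str
    occupation: Optional[str] = None
    email: Optional[str] = None


def same_entity(a: Person, b: Person) -> bool:
    """Hypothetical match rule.

    1. If both records carry an email, the email decides.
    2. Otherwise the names must match (case-insensitively).
    3. A name match is vetoed when a known attribute conflicts.
    """
    if a.email and b.email:
        return a.email.lower() == b.email.lower()
    if a.name.strip().lower() != b.name.strip().lower():
        return False
    if a.occupation and b.occupation and a.occupation != b.occupation:
        return False
    # Simplification: an unconflicted name match is accepted as-is.
    return True


spy = Person("James Bond", occupation="spy", email="james.bond@mi6.com")
hr_record = Person("james bond", occupation="spy")
ornithologist = Person("James Bond", occupation="ornithologist")

spy_matches_hr = same_entity(spy, hr_record)
spy_matches_ornithologist = same_entity(spy, ornithologist)
```

Here the HR record is matched to the spy, while the ornithologist is kept as a separate entity because the occupations conflict, even though the names are identical.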

How does entity resolution apply to CCM?

The entity resolution process is a key requirement for an effective Continuous Controls Monitoring operation. That’s because for CCM to provide understanding and visibility of security posture across the organization, it needs to know about so many types of entities.

In the context of CCM, the entity resolution process provides enriched information and context not just for people (i.e. the employees in the organization), but also for the other relevant entities mentioned above – endpoints, servers, accounts, databases, applications, and more.

To give a more realistic CCM-based example, imagine doing the same thing to identify endpoint devices within your organization. Your data sources will be security tools like vulnerability scanners, patch management tools, endpoint agents, the CMDB or other directories, awareness tools, identity and access tools, and further relevant tools from HR and IT.

They may all refer to the device using slightly different identifiers, each containing different fragments of data relating to its purpose, ownership, or location. The process identifies that all those sources are talking about the same device and brings all those fragments together to provide a detailed picture of that device.
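As a rough sketch of what that looks like for endpoints, the example below normalizes hostnames reported by three tools and merges their fragments into one record per device. The tool names, field names, and values are hypothetical sample data, and matching on short hostname alone is a deliberate simplification: a real process would typically corroborate with further identifiers such as serial numbers or MAC addresses.

```python
def normalize_hostname(raw):
    """'LAPTOP-042.corp.example.com' -> 'laptop-042' (drops the domain)."""
    return raw.strip().lower().split(".", 1)[0]


# Hypothetical fragments, one list of rows per source tool.
SOURCES = {
    "vuln_scanner": [{"host": "LAPTOP-042.corp.example.com", "critical_vulns": 3}],
    "patch_tool": [{"host": "laptop-042", "last_patched": "2024-01-15"}],
    "cmdb": [{"host": "Laptop-042", "owner": "j.bond", "location": "London"}],
}


def build_inventory(sources):
    """Merge rows from every source into one record per normalized hostname."""
    inventory = {}
    for source_name, rows in sources.items():
        for row in rows:
            key = normalize_hostname(row["host"])
            device = inventory.setdefault(key, {"seen_in": []})
            device["seen_in"].append(source_name)
            for field, value in row.items():
                if field != "host":
                    # Keep the first value seen; later sources only fill gaps.
                    device.setdefault(field, value)
    return inventory


inventory = build_inventory(SOURCES)
```

The result is a single `laptop-042` record carrying the vulnerability count, patch date, owner, and location, plus a `seen_in` list showing which tools know about the device. That list is useful in its own right: a device missing from the patch tool's view is a coverage gap.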

Automating this for every asset in the organization is a recipe for enriched, trustworthy asset inventory, which is the foundation for understanding your cybersecurity posture. It’s easier said than done, though.

The entity resolution process needs to be entirely automated so that the security measurements and metrics that rely on it are continuously up to date and available, rather than refreshed on the outdated weekly, monthly, or quarterly cycle. It needs to continuously compare current status with current policy, so you can see where you stand versus where you expect to be. Automation drives consistency and provides context, so that conversations and decisions focus on reducing risk rather than on cross-functional debate about the validity of the data. Automating entity resolution provides that all-important data trust.
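Comparing current status with policy can be sketched as a small metric calculation over the resolved inventory. Everything here is assumed for illustration: the 30-day patching policy, the inventory contents, and the `patch_compliance` helper are hypothetical.

```python
from datetime import date

POLICY_MAX_PATCH_AGE_DAYS = 30  # hypothetical policy threshold


def patch_compliance(inventory, today):
    """Return (fraction of compliant devices, sorted non-compliant names)."""
    failing = sorted(
        name
        for name, device in inventory.items()
        if device.get("last_patched") is None
        or (today - device["last_patched"]).days > POLICY_MAX_PATCH_AGE_DAYS
    )
    if not inventory:
        return 1.0, failing  # nothing to measure, so nothing is failing
    return (len(inventory) - len(failing)) / len(inventory), failing


# Hypothetical resolved inventory: one record per device.
INVENTORY = {
    "laptop-042": {"last_patched": date(2024, 3, 1)},
    "laptop-043": {"last_patched": date(2024, 1, 2)},
    "server-007": {},  # in the CMDB, but never seen by the patch tool
    "server-008": {"last_patched": date(2024, 3, 10)},
}

ratio, failing = patch_compliance(INVENTORY, today=date(2024, 3, 15))
```

With this sample data, half the devices meet the policy, and the failing list names both the stale laptop and the server that the patch tool has never seen. The second case only shows up because the inventory was built from more than one source.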

Final thought

With all that achieved, entity resolution can be seen as the cornerstone of an effective CCM security measurement program. Using an entity resolution process means you can look at a CCM platform not just as a platform that produces automated security metrics, but as an evidence-collection system. That evidence shows that your controls are operating with integrity and that the underlying data is trustworthy.

The more regularly the platform reports, and the more clearly it shows the audit trail, the chain of custody, and the lineage of the data (where it came from, how it was obtained, and how it has been transformed and presented), the more that data will be trusted.

Once you have trusted asset inventories, the organization has a single source of data truth. The CCM platform uses that to provide a trustworthy, up-to-date view of the organization’s cybersecurity posture and the effectiveness of its security program.