Entity resolution: the secret agent of Continuous Controls Monitoring
April 28, 2021
We often talk about what Continuous Controls Monitoring is and why organisations need to get visibility into their security posture. But we rarely talk about how it actually works. Automating security measurement depends on an open secret in data science: entity resolution.
Much like MI6 have James Bond, CCM’s secret agent is entity resolution. Clunky metaphor aside, let’s get into entity resolution and its application in CCM.
What is entity resolution?
Entity resolution is the data science process that CCM relies on to be able to take data from all kinds of sources and aggregate it into useful security metrics. At a high level, it works out what information is about the same thing and how many distinct things are represented in all that data.
Here’s a quick rundown of what CCM does: CCM ingests data from various disparate sources in your organisation and uses entity resolution to create a unified, context-rich view. The process cleans, normalises, aggregates, de-duplicates, correlates and unifies data from all these sources.
But what value does this bring?
Entity resolution provides links and enrichment across the ‘entities’ in an organisation – that is to say: people, endpoints, servers, accounts, databases, applications, and more.
It is the ‘secret agent’ that combines all the fragments of data from all your siloed data sources. It gives the ability to understand every asset, the status of all controls relating to that asset, and how those assets relate to each other and the business. This view is fundamental for any analysis about your assets. Without it, it would be challenging to even determine with confidence how many endpoints you have.
This allows automated, real-time security metrics that provide a view of the overall security posture of the organisation.
Entity resolution is not something unique to Panaseer or Continuous Controls Monitoring, though. It is a process that has applications across marketing, finance, investigation and more. However, it is relatively new to apply entity resolution to understanding cybersecurity posture.
In the context of CCM, it takes data from multiple sources across the environment to create the richest possible view of each entity.
An example of entity resolution
To explain the concept to a layperson, we like to use what we call the ‘James Bond’ example (hence the clunky metaphor).
Say we have a number of data sources providing different pieces of information about people. We want to get all the information that we can about one of the people in those records – James Bond. All of the data sources provide records about him, but they refer to him in different ways.
These data sources might include:
- The MI6 HR database
- The registry of international spies
- Goldfinger’s database of enemies
And those sources may refer to him as:
- James Bond
- Bond, James
- firstname.lastname@example.org (his work email)
- +447007007 (his work phone)
Entity resolution is the process that works out that all those data sources are referring to the same person. It does the same for every other person in the data sources, not just James. By triangulating across these sources to find relevant data points, entity resolution creates a unique identifier for each person and enriches it with information. This unified view provides more context about James Bond than could be gathered from any single original source.
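To make the idea concrete, here is a minimal sketch in Python of how records that share an identifier could be linked into one entity. It is an illustration under stated assumptions, not a description of how any particular CCM platform does it: the field names (`source`, `name`, `email`, `phone`), the source labels and the second person (Jane Doe) are all hypothetical, and real systems typically add fuzzy matching and confidence scoring on top.

```python
from collections import defaultdict


def normalise_name(name):
    """Map 'Bond, James' and 'James Bond' to the same key: 'james bond'."""
    name = name.strip().lower()
    if "," in name:
        last, first = [part.strip() for part in name.split(",", 1)]
        name = f"{first} {last}"
    return " ".join(name.split())


def resolve(records):
    """Group records that share any normalised identifier (union-find)."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    first_seen = {}  # normalised identifier -> index of first record using it
    for idx, rec in enumerate(records):
        keys = []
        if rec.get("name"):
            keys.append(("name", normalise_name(rec["name"])))
        if rec.get("email"):
            keys.append(("email", rec["email"].strip().lower()))
        if rec.get("phone"):
            keys.append(("phone", rec["phone"].replace(" ", "")))
        for key in keys:
            if key in first_seen:
                union(idx, first_seen[key])
            else:
                first_seen[key] = idx

    groups = defaultdict(list)
    for idx in range(len(records)):
        groups[find(idx)].append(idx)

    # Merge each group into one enriched entity that keeps every value seen.
    entities = []
    for members in groups.values():
        merged = {"sources": []}
        for idx in members:
            for field, value in records[idx].items():
                if field == "source":
                    merged["sources"].append(value)
                elif value:
                    merged.setdefault(field, set()).add(value)
        entities.append(merged)
    return entities


# Hypothetical records, one per data source in the example above.
records = [
    {"source": "mi6_hr", "name": "James Bond",
     "email": "firstname.lastname@example.org"},
    {"source": "spy_registry", "name": "Bond, James", "phone": "+447007007"},
    {"source": "goldfinger_enemies",
     "email": "firstname.lastname@example.org", "phone": "+447007007"},
    {"source": "mi6_hr", "name": "Jane Doe", "email": "jane.doe@example.org"},
]

entities = resolve(records)
```

In this sketch the three Bond records collapse into a single entity that carries both spellings of his name, his email and his phone number, while the unrelated record stays separate.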
Common pitfalls must be avoided during this process. What if there is more than one James Bond for example? In our case, we are creating an entity for the British spy. But what if we have data on James Bond, the twentieth-century American ornithologist of the same name? The entity resolution process must be capable of working out that these are two different people and therefore two different entities.
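One simple way to guard against this pitfall is to refuse a match when two records agree on name but disagree on another attribute. The sketch below assumes hypothetical `nationality` and `occupation` fields and a deliberately naive rule; a production matcher would weigh many more signals.

```python
def normalise(value):
    """Lower-case and collapse whitespace so trivial differences are ignored."""
    return " ".join(value.lower().split())


# Hypothetical attributes that must not contradict each other for a match.
CONFLICT_FIELDS = ("nationality", "occupation")


def same_entity(a, b):
    """Return True only if names match and no shared attribute conflicts.

    A field missing from either record is treated as 'unknown', not as a
    conflict, so partial records can still be linked.
    """
    if normalise(a["name"]) != normalise(b["name"]):
        return False
    for field in CONFLICT_FIELDS:
        if a.get(field) and b.get(field):
            if normalise(a[field]) != normalise(b[field]):
                return False
    return True


spy = {"name": "James Bond", "nationality": "British", "occupation": "spy"}
ornithologist = {"name": "James Bond", "nationality": "American",
                 "occupation": "ornithologist"}
hr_fragment = {"name": "james  bond", "nationality": "British"}

spy_vs_ornithologist = same_entity(spy, ornithologist)
spy_vs_hr = same_entity(spy, hr_fragment)
```

Here the spy and the ornithologist are kept apart despite sharing a name, while the partial HR record is still linked to the spy.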
How does it apply to Continuous Controls Monitoring?
The entity resolution process is a key requirement for an effective Continuous Controls Monitoring operation. That’s because for CCM to provide understanding and visibility of security posture across the organisation, it needs to know about many types of entity. In the context of CCM, the entity resolution process provides enriched information and context not just for people (i.e. the employees in the organisation), but also for the other relevant entities mentioned above – endpoints, servers, accounts, databases, applications, and more.
To give a more realistic CCM-based example, imagine doing the same thing to identify endpoint devices within your organisation. Your data sources will be security tools like vulnerability scanners, patch management tools, endpoint agents, the CMDB or other directories, awareness tools, identity and access tools, and further relevant tools from HR and IT.
They may all refer to the device using slightly different identifiers, each containing different fragments of data relating to its purpose, ownership or location. The process identifies that all those sources are talking about the same device and brings all those fragments together to provide a detailed picture of that device.
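As a rough illustration of that step, the Python sketch below merges fragments about one laptop from three tools into a single device record. Everything specific here is an assumption made for the example: the tool names, the hostname format, the fields, and above all the choice to match on a normalised hostname alone, which a real implementation would combine with other identifiers such as serial numbers or MAC addresses.

```python
def normalise_hostname(hostname):
    """Lower-case the hostname and drop any domain suffix."""
    return hostname.strip().lower().split(".")[0]


def build_inventory(source_records):
    """Merge (source, record) pairs into one entry per normalised hostname."""
    inventory = {}
    for source, rec in source_records:
        key = normalise_hostname(rec["hostname"])
        device = inventory.setdefault(key, {"hostname": key, "seen_in": set()})
        device["seen_in"].add(source)
        for field, value in rec.items():
            if field != "hostname" and value is not None:
                # Keep the first value seen for each field.
                device.setdefault(field, value)
    return inventory


# Hypothetical fragments describing the same laptop in three tools.
source_records = [
    ("vulnerability_scanner",
     {"hostname": "LON-LT-0042.corp.example.com", "critical_vulns": 3}),
    ("cmdb",
     {"hostname": "lon-lt-0042", "owner": "j.bond", "location": "London"}),
    ("endpoint_agent",
     {"hostname": "LON-LT-0042", "agent_version": "7.1"}),
]

inventory = build_inventory(source_records)
device = inventory["lon-lt-0042"]
```

The resulting record shows which tools know about the device as well as its owner, location and vulnerability count, none of which any single tool held on its own. A device that appears in the CMDB but not in `seen_in` for the endpoint agent would be an immediate coverage gap.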
Automating this for every asset in the organisation is a recipe for enriched, trustworthy asset inventory, which is the foundation for understanding your cybersecurity posture.
It’s easier said than done, though.
The entity resolution process needs to be entirely automated so that the security measurements and metrics that rely on it are continuously up to date and available, rather than refreshed on the outdated weekly, monthly or quarterly cycle. It needs to continuously compare current status with current policy, so you can see where you stand versus where you expect to be.
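Comparing current status with policy can be pictured as a check that runs over the resolved inventory each time new data arrives. The following Python sketch assumes a hypothetical policy (an endpoint agent on every device and a vulnerability scan within the last seven days) and hypothetical device fields; it shows the shape of the calculation rather than any specific product's metrics.

```python
from datetime import date

# Hypothetical policy thresholds, for illustration only.
POLICY = {"agent_required": True, "max_days_since_scan": 7}


def control_gaps(device, today, policy=POLICY):
    """List the ways a device falls short of policy (empty if compliant)."""
    gaps = []
    if policy["agent_required"] and not device.get("agent_installed"):
        gaps.append("missing endpoint agent")
    last_scan = device.get("last_scan")
    if last_scan is None or (today - last_scan).days > policy["max_days_since_scan"]:
        gaps.append("vulnerability scan overdue")
    return gaps


def coverage(devices, today):
    """Fraction of devices with no control gaps."""
    if not devices:
        return 0.0
    compliant = sum(1 for d in devices if not control_gaps(d, today))
    return compliant / len(devices)


today = date(2021, 4, 28)
devices = [
    {"hostname": "lon-lt-0042", "agent_installed": True,
     "last_scan": date(2021, 4, 25)},
    {"hostname": "lon-lt-0043", "agent_installed": True,
     "last_scan": date(2021, 4, 1)},
    {"hostname": "lon-srv-0007", "agent_installed": False, "last_scan": None},
]

gaps_by_host = {d["hostname"]: control_gaps(d, today) for d in devices}
compliance = coverage(devices, today)
```

Because the inputs are resolved entities rather than raw tool exports, each device is counted once, so the resulting percentage can be recomputed on every refresh without arguments about the denominator.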
Automation drives consistency and provides context, so that conversations and decisions are based on reducing risk, and there is no cross-functional debate about the validity of the data. Automating entity resolution provides that all-important data trust.
With all that achieved, entity resolution can be seen as the cornerstone of an effective CCM security measurement programme. Using an entity resolution process means you can look at a CCM platform not just as a platform that produces automated security metrics, but as an evidence collection system. It proves that your controls are operating with integrity and the underlying data is trustworthy. The more you provide regularity, show the audit trail and demonstrate the chain of custody of the data (its lineage: where it came from, how you got it, and how it was transformed and presented), the better trusted that data will be.
Once you have trusted asset inventories, the organisation has a single source of data truth. The CCM platform uses that to provide a trustworthy, up-to-date view of the organisation’s cybersecurity posture and the effectiveness of its security programme.