Why Data Science Is Becoming So Important in Cybersecurity

Ultimately, data science is enabling the cybersecurity sector to move from assumption to fact. For the last decade, the sector has been driven by FUD: fear, uncertainty and doubt. Spend on cybersecurity was justified by the rationale that ‘if you don’t have XYZ widget then you only have yourselves to blame when bad stuff happens.’

And the bad stuff is only increasing. The relationship between industry and cybercriminals is asymmetric. Attacks succeed because companies struggle to maintain perfect cyber hygiene across tens of thousands of computers and the tens of thousands of employees using them. And much like in the field of counter-terrorism, the adversary only needs to succeed once, whereas defenders have to get it right every single time.

This is further complicated by the myriad IT systems and security technologies that have been deployed over the years to protect the company. Often they do not talk to each other, and those responsible for security understandably find it hard to see a joined-up picture of what’s going on.

However, this era of operating blind and justifying spend on FUD is getting old. Chief Information Security Officers (CISOs) don’t want to operate on gut instinct. They want and need to develop a value proposition that sets out how they are prioritising what to focus on, justifies it, and then shows how investment is solving it in ways the business can understand. Doing this relies on having access to the right data.

This is where data science comes in. With the correct data, CISOs can translate technical risk into business risk, deliver a business case to solve it and demonstrate success. The current struggle is that the information CISOs have is either meaningful but not timely, or timely but not meaningful because the content is too technical and siloed. What they really need is data that will enable them to market and measure the security programme. This is the pivotal cybersecurity skills gap that must be closed.

To market the security programme effectively, the CISO wants to be able to prove the risk status and priorities, so they can articulate opportunities, show success and outline to the Board where they will get the best bang for their buck on a roadmap. The key areas in cybersecurity are identification (or prevention), detection, response and recovery. There is already a lot of investment in data science approaches in the detection and response space, but no organisation is currently more secure as a result.

This is because the root cause is often a failure to prevent, and fixing that requires better enterprise cyber hygiene. Obviously, knowing you have been breached is important, but ultimately prevention is better than cure. This is where new data science approaches come in.

Many large organisations already have a team of data scientists, but they do not usually work in security. They report to the Chief Data Officer and deal exclusively in business outcomes. For those companies that are starting to embrace data science as part of their security strategy, the expertise is by and large coming from external consultants.

Working with the security team, data science can integrate with controls to give those managing them a better sense of what to focus on. It can also help manage upward by combining technical data to ‘measure something that matters’, and ensure that data is robust and not misleading (accidentally or otherwise).

There are huge opportunities at the intersection of data science, big data technology and cybersecurity to set a foundation for businesses to gain control over ‘cyber’ as a business risk. Global banks are at the forefront, hiring data scientists for their security teams and aggregating data into Hadoop environments.

As organisations look to gain continuous visibility into risk and security performance in order to manage them, there are three critical questions they will have to answer to work out how able they are to take a data-driven approach. What data do we have available, and what is its quality? What does that mean for the insight we can get? What is our game plan to add and improve our data sources so that we can answer the questions that matter most?
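To make the first of those questions concrete, here is a minimal sketch of how a team might start to put numbers on data availability and quality. It is an illustration under stated assumptions, not a prescribed method: the record layout, the field names (`asset_id`, `owner`, `os`), the sample data and both helper functions are hypothetical.

```python
from typing import Dict, Iterable, Mapping


def field_completeness(records: Iterable[Mapping[str, object]],
                       required_fields: Iterable[str]) -> Dict[str, float]:
    """Return, per field, the share of records holding a non-empty value."""
    records = list(records)
    fields = list(required_fields)
    if not records:
        return {f: 0.0 for f in fields}
    result = {}
    for f in fields:
        # Treat missing keys, None and empty strings as "not filled in".
        filled = sum(1 for r in records if r.get(f) not in (None, ""))
        result[f] = filled / len(records)
    return result


def source_coverage(inventory_ids: Iterable[str],
                    source_ids: Iterable[str]) -> float:
    """Return the share of inventoried assets that a data source reports on."""
    inventory = set(inventory_ids)
    if not inventory:
        return 0.0
    return len(inventory & set(source_ids)) / len(inventory)


# Hypothetical sample data, for illustration only.
assets = [
    {"asset_id": "a1", "owner": "finance", "os": "windows"},
    {"asset_id": "a2", "owner": "", "os": "linux"},
    {"asset_id": "a3", "owner": "hr", "os": None},
    {"asset_id": "a4", "owner": None, "os": "linux"},
]
# Asset IDs reported by a hypothetical vulnerability scanner.
scanner_seen = ["a1", "a2", "zz9"]

completeness = field_completeness(assets, ["owner", "os"])
coverage = source_coverage((a["asset_id"] for a in assets), scanner_seen)
print(completeness)
print(coverage)
```

Low completeness on a field such as `owner` suggests that any per-business-unit reporting built on it will be unreliable, while low coverage suggests a control is blind to part of the estate. Both feed directly into the second and third questions.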