Don’t Get Me Wrong, I love Splunk & Tableau…
March 20, 2018
Here’s the dilemma I faced when starting to write about Splunk and Tableau:
Great products solve real problems for real people. But great products don’t solve all problems for all people. Data analytics products particularly suffer from this, as it’s quite easy to assume that if you can load data, transform it, analyse it, query it, visualise it and interact with the results, then you should use the same tool to answer all data-related questions and problems.
Such is the case for Splunk and Tableau, or log monitoring and BI tools more generally. Specifically, I want to look at how well they support security and risk use cases. These products, or their equivalents, are now ubiquitous in organisations, and as a result we at Panaseer are constantly asked to compare ourselves to them. Now don’t get me wrong, I am a fan of both products, but my hesitation comes from the fact that I am not an expert user of either (though I am happy to be constructively educated). At the same time, our customers have chosen to invest in a combination of these tools and ours to solve different problems and answer different questions, so it seems foolish not to draw out some of those differences. Understanding them will help you pick the right tool for the right problem.
It all started when one of our customers shared his view that BI tools are things for making “silo views” – single reports or dashboards. I am familiar enough with them to know this is not true, but I understand what he was getting at. People often turn to BI tools to provide a one-time answer to a stakeholder’s question. Tableau’s visualisations are very appealing (I love that embedding visualisation best practice into the tool is a core part of their mission, and we encourage the same at Panaseer). Intuitive drag-and-drop features make it relatively easy to create interactive visualisations with drill-downs into charts. You can easily drop a few of them into a dashboard and build the narrative you want to tell.
So what’s the issue?
The success you achieve in answering one set of questions can open a Pandora’s box of problems, as scenarios and requirements soon arise where that flexibility becomes a challenge. For example: your audience asks more questions than your dashboard can answer, so you have to go back and rebuild the data model. They don’t trust the analysis because the numbers differ from other analysis they have seen from your peers, and the dashboards can’t help you explain why. They love the analysis but want it refreshed more frequently and with more data, and so on. BI tools demand more expertise than people think.
Tableau and BI tools
Dropping a single data file into Tableau and plotting a chart is very different to producing robust, automated data pipelines with sufficient data modelling to enable the types and depth of interaction your users expect in a dashboard. Who is capturing the requirements of these users? Who has the skills to model the data and interactions to ensure the dashboards are performant and functional as the data scales? Who is supporting the automated pipeline and checking that it is performing as expected? Who is designing the look and feel and ensuring that self-service users have the context and assumptions at hand to correctly interpret the data? These questions almost never arise all at once, but before very long you find yourself supporting an in-house product that you, the security team, never set out to build. On the flip side, if you had knowingly walked into a product-esque build, then following proper product development processes might have meant the project never getting off the ground in the first place!
Here is where I introduce another truth about great products. Great products build great communities, and some of those members become ninjas, wizards or Jedi in the product’s capabilities. It’s these users who will jump to the defence of their chosen product at this point and, by all means, I encourage that. With enough willpower, time or budget you can solve all of these challenges and make the tool do what you want, but should you? You have smart people in your organisation, so use them to secure and defend the organisation rather than building and supporting a reporting product. Let someone else solve that problem!
The strength and the weakness of Tableau and other BI tools is that they are, by their very nature, domain-agnostic. That means it falls to the user or developer to impart the domain knowledge needed to extract value.
Splunk, on the other hand, recognised the value in helping users unlock value quicker by structuring and presenting data so that it aligns with a particular domain – in our case, enterprise security. A suite of security dashboards and applications that are configurable out of the box significantly accelerates any team’s time to value. Behind Splunk is a log search platform that addresses a number of the data pipeline challenges raised previously. An array of data connectors for common security tools and logging, high-velocity ingestion of large volumes of data, and support for low-latency queries, coupled with data pipeline health monitoring, make for a robust solution. In terms of security use cases, it has been used to great effect for threat hunting and has emerged as a good SIEM tool – particularly following the acquisition of Phantom, the orchestration platform, which allows teams to integrate workflow and automation more effectively. It’s easy to see how Splunk has managed to acquire a dominant market share.
And yet, our customers and prospects are talking to us about replacing the platform or going with something more ‘next generation’. So why is that? There is no definitive answer, but in the next section I will share some observations from the feedback we’ve received.
Why do users want to go next-gen?
Firstly, the out-of-the-box dashboards and applications present a challenge, as users often want more and more customised views. Despite the initial ‘wow’ moment, as users start to interact with these dashboards they once again find that they can’t quite answer the questions they are posing. Bespoke development of dashboards and queries once again becomes the norm. At the risk of angering the Splunk druids, this is unsustainable. I’ve witnessed former NYPD cops copying and pasting queries that the druid built for them without really knowing what they were doing. This balance between configuration and customisation is something we are acutely aware of at Panaseer, and we continue to explore innovative ways to overcome these inherent challenges.
Secondly, as an incumbent, Splunk has to react to market demand for functionality. In doing so it has fallen into the franken-stack trap (we have previously coined this term, you’re welcome) of bolting on lots of different products and retrofitting functionality, which clearly affects the user experience (as an example, look at the Caspida acquisition and the integration of machine learning – I will watch this space with interest, though, as Splunk has deep enough pockets to respond!). This just adds to the configuration and maintenance overheads the in-house platform support team is already dealing with.
The pitch I made earlier makes the setup of the pipeline appear easy. Of course, it’s not that straightforward: many data challenges (e.g. query performance, data duplication, field mapping) that are uncovered downstream usually need to be managed and accounted for upstream, where specialist skills are needed. It is also worth noting that the cost model can often be a disincentive to loading more data onto the platform and, as one customer described it, the additional ‘tax’ of purchasing the security-specific applications can push the potential value out of the reach of many security teams.
The data model
I mentioned the data model previously and I want to come back to that now. Splunk embedded a Common Information Model (CIM) for security to enable data and entities (e.g. IP addresses, hostnames) to be correlated across the diverse data streams loaded into the platform. However, by their own admission (see the Splunk CIM documentation), they settled for a model that covers the lowest common denominator across the different security domains. My interpretation is that such a model can only work well if the data is clean, a supporting contextual data source is available, or the pipeline wizards are on hand to massage each data source as exceptions and anomalies arise. None of these can be relied upon in the enterprise, and this can limit the extensibility of the applications.
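To make the lowest-common-denominator problem concrete, here is a small hypothetical sketch (in Python, not Splunk’s actual CIM implementation – the field names and mappings are illustrative assumptions): two security tools describe the same activity with different fields, and normalising them onto a minimal shared schema silently drops anything outside it.

```python
# Hypothetical illustration of lowest-common-denominator normalisation.
# COMMON_FIELDS and the field maps are invented for this example.

COMMON_FIELDS = {"src_ip", "dest_ip", "user"}  # assumed minimal shared schema

def to_common_model(event: dict, field_map: dict) -> dict:
    """Rename source-specific fields to the common schema, dropping the rest."""
    normalised = {}
    for src_field, common_field in field_map.items():
        if common_field in COMMON_FIELDS and src_field in event:
            normalised[common_field] = event[src_field]
    return normalised

# Two tools reporting the same activity with different field names.
edr_event = {"local_addr": "10.0.0.5", "account": "alice", "process": "ssh"}
fw_event = {"source": "10.0.0.5", "destination": "10.0.0.9", "rule": "allow-ssh"}

edr_map = {"local_addr": "src_ip", "account": "user"}
fw_map = {"source": "src_ip", "destination": "dest_ip"}

print(to_common_model(edr_event, edr_map))  # process context is lost
print(to_common_model(fw_event, fw_map))    # no user field left to correlate on
```

The point is not the code but the trade-off it exposes: once sources disagree on anything beyond the shared fields, either context is discarded or someone has to hand-craft per-source massaging – exactly the exceptions-and-anomalies work described above.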
Some of the open data schemas, like Apache Spot, took steps to improve this with contextual lookup models, but here at Panaseer we felt they haven’t gone far enough to really unlock the entity-centric risk analysis that is at the heart of our platform. Being able to model a range of entity (or asset) types, such as devices, users, applications, vulnerabilities or data, requires a richer schema that resolves and links each entity across every data source on the platform. This level of detail, designed into the platform from scratch and informing technology choices, is the next generation that our customers and prospects want from our product. Essentially, this entity resolution is Panaseer’s ‘secret sauce’.
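For readers who want the flavour of what entity resolution means in practice, here is a deliberately minimal sketch (a generic union-find approach, not Panaseer’s actual algorithm; the record contents are invented): records from different tools are grouped into one device entity whenever they transitively share an identifier.

```python
# Minimal entity-resolution sketch: records that share any (field, value)
# identifier, directly or transitively, are merged into one entity.
from collections import defaultdict

def resolve_entities(records):
    """Group records that transitively share an identifier (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    # Link each record node to every identifier it carries.
    for i, rec in enumerate(records):
        for key, value in rec.items():
            union(("rec", i), (key, value))

    groups = defaultdict(list)
    for i, rec in enumerate(records):
        groups[find(("rec", i))].append(rec)
    return list(groups.values())

# Three tools reporting the same laptop under different identifiers.
records = [
    {"hostname": "lpt-042"},                    # vulnerability scanner
    {"hostname": "lpt-042", "ip": "10.0.0.5"},  # EDR agent
    {"ip": "10.0.0.5", "mac": "aa:bb:cc:dd"},   # asset inventory
]
print(len(resolve_entities(records)))  # all three resolve to one entity
```

A real platform has to go much further than this sketch, handling DHCP churn, identifier reuse and conflicting context, which is why the richer schema described above needs to be designed in from the start rather than bolted on.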
Regardless of the technology or platform type, organisations are now looking for data science out of the box, but they need to recognise that it’s not zero effort on their part. Nonetheless, we as product developers must do more to lower the barrier to entry through improved user experience that aligns with what our customers want to do. We also need to help them build trust in insights they derive from these tools by being more transparent about the data pipeline and analysis assumptions.
As security estates expand and more, and newer, technology is incorporated, you can’t continue with a manual model. It’s an outdated strategy. Yet it sometimes feels that in the security domain it is hard to build a data product with repeatable value across many organisations. The key is finding the most direct and aligned route to delivering the objective, one that will save you time, aggravation and money.
And so we at Panaseer put our heads down to create a new category in risk management – Continuous Controls Monitoring.