Don’t Get Me Wrong, I Love Splunk & Tableau…

March 20, 2018

Mike MacIntyre

Great products solve real problems for real people. But great products don’t solve all problems for all people. Data analytics products particularly suffer from this, as it’s quite easy to assume that if you can load data, transform it, analyse it, query it, visualise it and interact with the results, then you should use the product to answer every data-related question and problem.

This was the dilemma I faced when asked to write a blog about Splunk and Tableau, or more generally log monitoring and BI tools, and how well they support Security and Risk use cases. These products, or equivalents, are now ubiquitous in organisations, and as a result we at Panaseer are always being asked to compare ourselves to them. Now don’t get me wrong, I am actually a fan of both products, but my hesitation comes from the fact that I am not an expert user of either (I’m happy to be constructively educated in the comments section). At the same time, our customers have chosen to invest in a combination of these tools and ours to help them solve different problems and answer different questions, so it seems foolish not to draw out some of those differences, since understanding them will help you pick the right tool for the right problem.

It all started when one of our customers shared how he viewed BI tools: as things for making “silo views”, single reports or dashboards. I am familiar enough with them to know that this is not true, but I understand what he was getting at. People often turn to BI tools to provide a one-time answer to a stakeholder’s question. Tableau’s visualisations are very appealing (I love how a core part of their mission is to embed visualisation best practice into the tool, and we encourage that at Panaseer). Intuitive drag-and-drop features make it relatively easy to create interactive visualisations with drill-downs into charts. You can easily drop a few of them into a dashboard and build the narrative you want to tell.

However, the success you achieve in answering one set of questions can open a Pandora’s box of problems, as a number of scenarios and requirements can arise where that flexibility becomes a challenge. For example, your audience asks more questions than your dashboard can answer, so you have to go back and rebuild the data model; they don’t trust the analysis because the numbers differ in some way from other analyses they have seen from your peers, and the dashboards can’t help you explain why; they love the analysis but want it refreshed more frequently and with more data; and so on. BI tools demand more expertise than people think.

Dropping a single data file into Tableau and plotting a chart is very different to producing robust, automated data pipelines with sufficient data modelling to enable the types and depth of interaction your users expect in a dashboard. Who is capturing the requirements of these users? Who has the skills to model the data and interactions to ensure the dashboards stay performant and functional as the data scales? Who is supporting the automated pipeline and checking that it is performing as expected? Who is designing the look and feel and ensuring that self-service users have the context and assumptions at hand to interpret the data correctly? These questions almost never arise all at once, but before very long you find yourself supporting an in-house product that you, the Security team, never set out to build. On the flip side, if you had knowingly walked into a product-esque build, then following proper product development processes might have meant you never got the project off the ground in the first place!

Here is where I introduce another truth about great products. Great products build great communities, and some of those members become Ninjas, Wizards or Jedi in the product’s capabilities. It’s these users who will jump to the defence of their chosen product at this point, and by all means I encourage that. With enough willpower, time or budget you can solve all of these challenges and make the tool do what you want, but should you? You have smart people in your organisation, so use them to secure and defend the organisation rather than to build and support a reporting product. Let someone else solve that problem!

The strength and weakness of Tableau and other BI products is that they are, by their very nature, domain agnostic, which means it falls to the user or developer to impart domain knowledge to extract value. Splunk, on the other hand, recognised the value of helping users unlock value more quickly by structuring and presenting data so that it aligns with a particular domain, in our case enterprise security. A suite of security dashboards and applications that are configurable out of the box significantly accelerates any team’s time to value.

Behind Splunk is a log search platform that addresses a number of the data pipeline challenges raised previously. An array of data connectors for common security tools and logging, high-velocity ingest of large volumes of data and support for low-latency queries, coupled with data pipeline health monitoring, make for a robust solution. In terms of security use cases, it’s been used to great effect for threat hunting and has emerged as a good SIEM tool, particularly following Splunk’s recent acquisition of Phantom, the orchestration platform, which allows teams to integrate workflow and automation more effectively. It’s easy to see how Splunk has managed to acquire a dominant market share. And yet our customers and prospects are talking to us about replacing the platform or going with something more next-generation. So why is that? There is no definitive answer, but here are some of my observations from the feedback we’ve received.

Firstly, the out-of-the-box dashboards and applications present a challenge, as users often want more and more customised views. Despite the initial ‘WOW’ moment, as users start to interact with these dashboards they once again find that they can’t quite answer the questions they are posing. Bespoke development of dashboards and queries once again becomes the norm. At the risk of angering the Splunk druids, this is unsustainable. I’ve witnessed former NYPD cops copying and pasting queries that a druid built for them without really knowing what they were doing. This balance between configuration and customisation is something we are acutely aware of at Panaseer, and we continue to explore innovative ways to overcome these inherent challenges.

Secondly, as an incumbent, Splunk has to react to market demand for functionality. In doing so it has fallen into the franken-stack trap (a term we have previously coined) of bolting on lots of different products and retrofitting functionality, which clearly affects the user experience. As an example, look at the Caspida acquisition and the integration of machine learning (I will watch this space with interest, though, as Splunk has deep enough pockets to respond!). This just adds to the configuration and maintenance overheads the in-house platform support team is already dealing with.

The pitch I made earlier makes setting up the pipeline appear easy. Of course, it’s not that straightforward, and many of the data challenges uncovered downstream (e.g. query performance, data duplication, field mapping) usually need to be managed and accounted for upstream, where specialist skills are needed. It is also worth noting that the cost model can often be a disincentive to loading more data onto the platform and, as one customer described it, the additional ‘tax’ of purchasing the Security-specific applications can often push the potential value out of reach of many security teams.

I mentioned the data model previously and I want to come back to it now. Splunk embedded a Common Information Model (CIM) for security to enable data and entities (e.g. IP addresses, hostnames) to be correlated across the diverse data streams that are loaded into the platform. However, by their own admission (see the Splunk CIM documentation), they settled for a model that covers the lowest common denominator across the different security domains. My interpretation is that such a model can only work well if the data is clean, a supporting contextual data source is available, or the pipeline Wizards are on hand to massage each data source as exceptions and anomalies arise. None of these can be relied upon in the enterprise, and this can limit the extensibility of the applications.
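To make the lowest-common-denominator point more concrete, here is a minimal Python sketch of normalising events from two sources into a shared set of fields and then correlating them on source IP. It is not how Splunk implements CIM: the source names, vendor field names and the three common fields are all hypothetical, chosen only for illustration. Note what happens to data that doesn’t fit: a field with no slot in the common model is silently dropped, and an unclean value (a trailing space here) quietly breaks the correlation unless someone writes an exception for it.

```python
from typing import Dict, List

# Hypothetical per-source mappings from vendor field names to a shared model.
# Source names and field names are invented for illustration only.
FIELD_MAPPINGS: Dict[str, Dict[str, str]] = {
    "firewall_a": {"SrcAddr": "src_ip", "DstAddr": "dest_ip", "UserName": "user"},
    "proxy_b": {"client_ip": "src_ip", "server_ip": "dest_ip", "login": "user"},
}

# The common model keeps only the fields every source can supply.
COMMON_FIELDS = ("src_ip", "dest_ip", "user")


def normalise(source: str, event: Dict[str, str]) -> Dict[str, str]:
    """Map one raw event onto the common fields, dropping everything else."""
    mapping = FIELD_MAPPINGS[source]
    mapped = {common: event[raw] for raw, common in mapping.items() if raw in event}
    # Fields with no slot in the common model (e.g. a firewall rule name) are lost here,
    # and missing common fields are filled with empty strings.
    return {field: mapped.get(field, "") for field in COMMON_FIELDS}


def correlate_by_ip(events: List[Dict[str, str]]) -> Dict[str, List[Dict[str, str]]]:
    """Group normalised events by exact source IP value, the kind of join a common model enables."""
    groups: Dict[str, List[Dict[str, str]]] = {}
    for event in events:
        groups.setdefault(event["src_ip"], []).append(event)
    return groups


if __name__ == "__main__":
    fw = normalise("firewall_a", {"SrcAddr": "10.0.0.5", "DstAddr": "8.8.8.8",
                                  "UserName": "alice", "Rule": "allow-dns"})
    # The proxy reports the same host, but with a trailing space and an upper-case user.
    px = normalise("proxy_b", {"client_ip": "10.0.0.5 ", "server_ip": "1.2.3.4",
                               "login": "ALICE"})
    print(len(correlate_by_ip([fw, px])))
```

Running it prints 2: both events describe the same host, but without a clean-up rule for the proxy source they never join, which is exactly the kind of exception the pipeline Wizards end up handling by hand.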

Some of the open data schemas, like Apache Spot, have taken steps to improve on this with contextual lookup models, but here at Panaseer we felt they haven’t gone far enough to really unlock the entity-centric risk analysis that is at the heart of our platform. Being able to model a range of entity (or asset) types, such as Devices, Users, Applications, Vulnerabilities or Data, requires a richer schema that resolves and links each entity across every data source on the platform. This level of detail, designed into the platform from scratch and informing technology choices, is the next generation that our customers and prospects want from our product.
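As a rough illustration of what resolving and linking each entity across every data source involves, here is a minimal Python sketch that groups device records from different sources whenever they share a normalised identifier, using a union-find (disjoint-set) structure. It is one common way to approach entity resolution, not Panaseer’s schema or algorithm; the source names, identifier fields and normalisation rules are hypothetical.

```python
from typing import Dict, List, Tuple

# A device record from one source; field names are hypothetical.
Record = Dict[str, str]


class UnionFind:
    """Disjoint-set structure used to merge records that refer to one device."""

    def __init__(self, n: int) -> None:
        self.parent = list(range(n))

    def find(self, i: int) -> int:
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def union(self, a: int, b: int) -> None:
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def identifiers(rec: Record) -> List[Tuple[str, str]]:
    """Normalise the (hypothetical) identifier fields a record carries."""
    ids: List[Tuple[str, str]] = []
    if rec.get("hostname"):
        # Compare short hostnames case-insensitively, ignoring any domain suffix.
        ids.append(("host", rec["hostname"].split(".")[0].lower()))
    if rec.get("mac"):
        # Accept both AA-BB-... and aa:bb:... MAC address notations.
        ids.append(("mac", rec["mac"].replace("-", ":").lower()))
    return ids


def resolve_devices(records: List[Record]) -> List[List[Record]]:
    """Group records into devices: any shared identifier links two records."""
    uf = UnionFind(len(records))
    first_seen: Dict[Tuple[str, str], int] = {}
    for i, rec in enumerate(records):
        for key in identifiers(rec):
            if key in first_seen:
                uf.union(first_seen[key], i)
            else:
                first_seen[key] = i
    groups: Dict[int, List[Record]] = {}
    for i, rec in enumerate(records):
        groups.setdefault(uf.find(i), []).append(rec)
    return list(groups.values())


if __name__ == "__main__":
    records = [
        {"source": "cmdb", "hostname": "LAPTOP-042.corp.example", "owner": "alice"},
        {"source": "edr", "hostname": "laptop-042", "mac": "AA-BB-CC-00-11-22"},
        {"source": "vuln_scanner", "mac": "aa:bb:cc:00:11:22", "cve": "CVE-2017-0144"},
        {"source": "edr", "hostname": "server-007"},
    ]
    for device in resolve_devices(records):
        print([r["source"] for r in device])
```

The demo prints `['cmdb', 'edr', 'vuln_scanner']` and then `['edr']`. The CMDB and scanner records share no identifier directly; they are linked only through the EDR record, which carries both the hostname and the MAC address. A real schema would need many more identifier types, plus rules for when a shared identifier should not merge two entities (DHCP-assigned IP addresses, for example).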

Regardless of the technology or platform type, organisations are now looking for data science out of the box, but they need to recognise that it’s not zero effort on their part. Nonetheless, we as product developers must do more to lower the barrier to entry through an improved user experience that aligns with what our customers want to do. We also need to help them build trust in the insights they derive from these tools by being more transparent about the data pipeline and analysis assumptions (but more about this in a later blog!).

As security estates expand and more, and newer, technology gets incorporated, you can’t continue with a manual model; it’s an outdated strategy. Yet it sometimes feels that in the security domain it is hard to build a data product that has repeatable value across many organisations. The key is finding the most direct and aligned route to delivering the objective, one that will save you time, aggravation and money.