How Do You Solve a Problem Like Big Data?
“Big Data!”, “Big Data!”, “Big Data!” – Everybody’s talking about Big Data. The term rolls off the tongue and often gets thrown around all too hastily by hype-peddling vendors as the answer to all data analysis tasks. Its overuse as a marketing buzzword has led to some consumer fatigue. I’ve seen the mere mention of the two words elicit derisive reactions from businesses that see it as little more than a fad.
I think part of the frustration comes from an overemphasis on Big Data as a technical enabler rather than on the outcomes and benefits it can deliver. The business tends not to care much about the size and shape of the data (commonly described by the 3, 4 or 5 Vs of Big Data), or the deep learning algorithms your “unicorn” data scientists implemented, or the high-performance distributed data stores you deployed to crunch the numbers. They’re interested in the answer – not the working out. The promise of Big Data has to be all about the insights that can be drawn from any and all data available to you, the decisions these insights help steer and, ultimately, the actions taken based on those decisions.
The other common complaint is about the reputation Big Data projects have earned for delivering limited business value. Gartner’s 2015 Hadoop Adoption Survey concludes this will be the main reason behind a stagnation in Hadoop growth over the next two years (note that a subsequent article from Gartner contradicts this view, saying Hadoop procurement trends are showing healthy growth). Skills shortages, immature technologies and integration challenges can hinder progress with Big Data, but there are plenty of organisations that have made it work and are reaping the rewards (see the recent AtScale Hadoop Maturity Survey, but take it with a pinch of salt as it was conducted by the major Hadoop vendors). Here are a few thoughts, based on my observations, on attributes commonly shared by fruitful Big Data projects.
Big Data Needs a Mission
First and foremost, think of the business case. Consider the problems the business is trying to solve and the questions it is trying to answer before you assess if and how Big Data can help. Technologists love to get their hands on the latest toys, but buying some hardware and putting Hadoop on it doesn’t immediately lead to a Big Data success story. Also be wary of attempts to force square pegs into round holes – Hadoop isn’t the answer to every problem. A business focus is more likely to lead to data products that actually get used, resulting in enduring data initiatives.
Take the Right People Along with You on the Journey
Buy-in from executive stakeholders is generally the ideal form of sponsorship, ensuring your project has the right level of resourcing, including team, budget and access to data. Better yet, senior leaders who cultivate a data-informed corporate culture can lead by example by promoting the use of data to support decision making. Data products are only valuable when people are willing to take actions based on them. The more the output of your data projects gets embedded into Business As Usual, as part of tactical, operational or strategic decision making, the more difficult it is to turn off.
These business directives can also provide the necessary impetus to encourage collaboration across the business lines, IT operations and data analysis functions. Cooperation is essential to break down the silos that commonly exist between various enabling factors such as infrastructure, data and the business domain.
Start Small with Big Data…
Avoid the temptation to boil the ocean when starting a Big Data project. Embarking on a mission to build a data lake that brings together data from across the whole organisation might sound like the promised land, but it’s an ambitious and most likely overwhelming undertaking. Such endeavours could take months or even years to get off the ground and may run out of budget before delivering any value. Even if you get to the implementation stage, you may find that your solution is woefully out of date because the Big Data technology landscape has moved on.
Select a limited set of prioritised use cases to start with. Build these iteratively to rapidly deliver value and prove the concept of Big Data. By incrementally building up capability you can continue to justify investment for broadening the scope.
Another approach to increase the speed to value is to minimise the infrastructure complexity. For example, deploying in an existing lab environment (e.g. on VMs) with limited integrations (e.g. static data dumps rather than live system integrations) can limit the dependency on other functions and reduce lead times.
…But Don’t Lose Sight of Enterprise Integration
It can be relatively easy to cut corners in the context of a proof of concept (POC). You can start a POC with a laptop or a few machines under your desk or in your lab, where you have full control of the environment to do with it what you wish. Eventually, the enterprise integration challenge is going to catch up with you as you transition your POC into production and integrate with business operations and systems. It’s often in this migration phase that Big Data projects fall short, so it pays to get the right people involved early to consider matters such as corporate and regulatory policies on security, data privacy, audit and compliance, as well as to understand the restrictions imposed by production IT governance. Production infrastructure can also introduce roadblocks when you have to think about Business Continuity, Disaster Recovery, High Availability and backup, all on scaled-up hardware and software that IT is prepared to support.
Don’t Underestimate the Total Cost of Ownership
When considering the potential return on investment of a Big Data initiative, it’s easy to underestimate the total cost of ownership, which can quickly add up. Some Big Data projects are driven by cost-cutting objectives to move data off expensive databases or log management solutions. While this might save on licensing costs, there are other expenses to consider.
Commodity hardware isn’t cheap when you have to deploy a whole cluster. On top of that, you might need multiple clusters to support development, testing, pre-production, production and disaster recovery environments. And while open-source software might be free to procure, it’s not free to operate. The costs to design, deploy and manage these systems on an ongoing basis can be significant. On the bright side, you can start small and incrementally scale up to add more data and applications as demand rises.
You’ll also need to consider the costs of training staff. From systems administrators to developers to data scientists you’ll need teams who can maintain and take advantage of your new Big Data investment.
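As a rough illustration of how these costs add up, here is a minimal back-of-the-envelope sketch in Python. Every name and figure in it (node counts, cost per node, operating and training costs, the three-year horizon) is a made-up assumption for the sake of the example, not a benchmark or a recommendation; swap in your own numbers.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    """One cluster in the estate (all values here are hypothetical)."""
    name: str
    nodes: int


def total_cost_of_ownership(environments, cost_per_node, annual_ops_cost,
                            training_cost, years):
    """Sum one-off hardware and training costs with recurring operating costs."""
    hardware = sum(env.nodes for env in environments) * cost_per_node
    operations = annual_ops_cost * years
    return {
        "hardware": hardware,
        "operations": operations,
        "training": training_cost,
        "total": hardware + operations + training_cost,
    }


# Hypothetical estate: one cluster per environment mentioned above.
environments = [
    Environment("development", 4),
    Environment("testing", 4),
    Environment("pre-production", 8),
    Environment("production", 20),
    Environment("disaster recovery", 20),
]

estimate = total_cost_of_ownership(
    environments,
    cost_per_node=5_000,      # assumed purchase price per commodity node
    annual_ops_cost=200_000,  # assumed yearly cost to deploy and manage
    training_cost=50_000,     # assumed one-off staff training budget
    years=3,                  # assumed planning horizon
)

for item, cost in estimate.items():
    print(f"{item:>10}: {cost:,}")
```

Even with invented figures, the shape of the result is the point: in this sketch the production cluster is well under half of the hardware bill, and the recurring operating cost outweighs the hardware over the planning horizon.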
You might just be put off by the term or fundamentally disagree with the idea; either way, Big Data is here to stay. Even if you’ve missed the Big Data craze, chances are you’re benefiting from it on a daily basis. Whether it’s the recommendations you get from Amazon, your favourite show streaming on Netflix or even a simple Google search, Big Data is core to the business of these Internet giants, and it’s rapidly spreading outside that domain to touch every part of our lives.
So ubiquitous has it become that Gartner has taken Big Data off its Emerging Technologies hype cycle, citing the fact that Big Data has become “normal”. One day, in the not too distant future, the novelty will wear off and there’ll be no differentiation in Big Data as everyone will be doing it. Maybe then we’ll go back to calling “Big Data” just “data”.