Can Data Scientists Sprint?
March 21, 2016
Can data scientists sprint? In this blog, let’s look at how we can figure out an agile approach to organise and schedule our work.
While there are well-defined tasks in data science (the ‘data’ bit – or in our case ‘big data’), it also involves a lot of research activity (the ‘science’ bit). Research tasks can be as long or as short as time allows – a good example of the metaphorical piece of string. There are often many possible solutions to a problem, with varying levels of time investment producing varying levels of sophistication in the results.
Fully understanding the range of solutions, let alone implementing one, can be very time-consuming. We recognise, of course, that we need to apply the 80/20 rule and call time when we have a solution that meets customer requirements. Yet, even with this pragmatic approach, it’s still hard to estimate how long it will take us to achieve this balance. Remember – if we had a solution mapped out we wouldn’t be doing research in the first place! So how do we organise and schedule our work? Or, ‘can data scientists sprint?’
The ‘academic approach’
First, we tried what we’ll call the ‘academic’ approach. We would discuss which projects (typically broad areas of data analysis) were aligned with the current business prioritisation, then data science team members would plan their own schedule for the week. Our data scientists typically have research backgrounds, so we’re well-versed in self-directed work. This was somewhat successful, but crucially we were missing transparency around our decision-making processes. We were also lacking the ability to deliver intermediate results that could be understood by the rest of the business and would allow us to be held accountable for our progress. While this may work in academia it definitely doesn’t cut it in a tech company, where engineers need direction about what to build, customer/industry facing team members need to talk about what we can do and everyone needs to keep an eye on the roadmap.
So if we ask our question now, ‘can data scientists sprint?’, you might get a light jog.
Our engineering pals
Next, we looked for inspiration across our desks to the engineers. They use the agile methodology of software development. In short, this revolves around frequent iterations (‘sprints’) that always result in a viable product and allow teams to be responsive to changes of plan or customer feedback. ‘Just the ticket!’, we thought. We started out with our own agile weekly sprints (organised through Jira) and an end-of-week sprint demo. We eschewed daily stand-ups for a midweek update, as our tasks are often more diffuse and take longer to get to grips with. Although we feared that adding more process would hinder us, there are many useful tools to make the agile workflow easy to navigate, such as integrations between IDEs, Jira, GitHub and Slack.
Unsurprisingly, our main challenge is in breaking work down into chunks that can be represented by Jira tickets. This is hard enough for engineering-type tasks, such as writing code to calculate our security intelligence metrics, but there’s a whole different set of challenges for tasks such as ‘what can we do with this data source?’ After some iteration, we’ve settled on a process we’re happy with and so I wanted to share it. You can turn your research tasks to tickets as follows:
- Ask yourself, ‘Why am I doing this research?’ If there are multiple research goals, find out which are the most urgent (according to the current prioritisation in your business) and limit the research scope accordingly. If you have multiple research tasks you may need to pick a few goals from each, or focus on one, depending on the prioritisation. For each high-priority goal, consider what questions you need to answer in order to achieve it. This approach has parallels with the GQM framework we use in our research.
- Write a to-do list, based on the goal and associated questions, as if you were just planning your own work.
- Review the list and try to group related points together. Think of a higher-level summary for each of these groups – this is your Jira ticket. The finer details are your sub-tasks.
- If you end up with many tasks, consider an iterative solution to reach the goal. Rather than completing half of it in detail, consider making a first pass using simple solutions and then using another sprint to revisit the issue and build in complexity. Having a sprint demo scheduled at the end of the week encourages this way of working as you’ll need something coherent and self-contained to present.
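As a rough illustration of the grouping step above, here is a minimal Python sketch. It is not our actual tooling and it does not talk to Jira: the `Ticket` class, the `todo_to_tickets` helper and the example to-do items are all hypothetical, made up purely to show how a personal to-do list can be rolled up into tickets with sub-tasks.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Ticket:
    """Hypothetical stand-in for a ticket: a higher-level summary plus sub-tasks."""
    summary: str
    subtasks: List[str] = field(default_factory=list)


def todo_to_tickets(todo_items: List[Tuple[str, str]]) -> List[Ticket]:
    """Group (group_summary, task) pairs into one ticket per group.

    Groups appear in the order they are first seen in the to-do list.
    """
    grouped = defaultdict(list)
    for group_summary, task in todo_items:
        grouped[group_summary].append(task)
    return [Ticket(summary, tasks) for summary, tasks in grouped.items()]


# Made-up to-do list for the goal 'what can we do with this data source?'
todo = [
    ("Profile the new data source", "Check field coverage and missing values"),
    ("Profile the new data source", "Plot record volume over time"),
    ("First-pass metric", "Compute a simple count-based metric"),
    ("First-pass metric", "Compare against an existing metric"),
]

tickets = todo_to_tickets(todo)
for ticket in tickets:
    print(ticket.summary)
    for subtask in ticket.subtasks:
        print("  -", subtask)
```

Each group summary becomes one ticket and the individual to-do items become its sub-tasks. A ‘first-pass’ ticket like the second one is the kind of thing you would revisit in a later sprint to build in complexity.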
When following this approach, it is important to remain flexible. Tickets will exceed estimates sometimes, due to the unpredictable nature of research, so don’t consider this a failure. In addition, if you can schedule a mix of ‘data’ tasks alongside ‘science’ tasks in each sprint, this will limit frustration and ensure everyone in your team has something tangible to present come demo day.
Before working in this way I was a little sceptical about how well data science would fit into this kind of workflow, but I’ve been pleasantly surprised. While it doesn’t totally remove the ‘piece of string’ problem, it does minimise it – chopping up work into bitesize chunks makes it clear exactly which parts are hard to estimate. The agile approach has provided other benefits too, particularly in the way we interact with the wider team. Using the same workflow and tools as engineering helps us integrate better, making it easy, for example, for requirements we generate to be linked from our Jira project to theirs. And breaking down research into small tasks demystifies the research process, helping us articulate to the rest of the business when we are working at capacity – particularly when there’s a lot of ‘thinking time’ involved.
So, can data scientists sprint? Panaseer ones can – and we can prove it!
If you’re working in a data science team using the agile approach, why not get in touch telling us what you like about it and how you have customised it to suit your needs? On the other hand, if you’re still reluctant to enter the world of Jira, we encourage you to give it a try following the suggestions above – there’s nothing more satisfying than going to the sprint board and dragging tickets over to ‘Done’!