Can Data Scientists Sprint?
Can data scientists sprint? In this blog, let's look at how we can figure out an agile approach to organise and schedule our work.
While there are well-defined tasks in data science (the 'data' bit - or in our case 'big data'), it also involves a lot of research activity (the 'science' bit). Research tasks can be as long or as short as time allows – a good example of the metaphorical piece of string. There are often many possible solutions to a problem with varying levels of time-investment producing varying levels of sophistication in the results.
Fully understanding the range of solutions, let alone implementing one, can be very time-consuming. We recognise, of course, that we need to apply the 80/20 rule and call time when we have a solution that meets customer requirements. Yet, even with this pragmatic approach, it’s still hard to estimate how long it will take us to achieve this balance. Remember – if we had a solution mapped out we wouldn’t be doing research in the first place! So how do we organise and schedule our work? Or, 'can data scientists sprint?'
The 'academic approach'
First, we tried what we’ll call the 'academic' approach. We would discuss which projects (typically broad areas of data analysis) were aligned with the current business prioritisation, then data science team members would plan their own schedule for the week. Our data scientists typically have research backgrounds, so we’re well-versed in self-directed work. This was somewhat successful, but crucially we were missing transparency around our decision-making processes. We were also lacking the ability to deliver intermediate results that could be understood by the rest of the business and would allow us to be held accountable for our progress. While this may work in academia it definitely doesn’t cut it in a tech company, where engineers need direction about what to build, customer/industry facing team members need to talk about what we can do and everyone needs to keep an eye on the roadmap.
So if we ask our question now, 'can data scientists sprint?', you might get a light jog.
Our engineering pals
Next, we looked for inspiration across our desks to the engineers. They use the agile methodology of software development. In short, this revolves around frequent iterations ('sprints') that always result in a viable product and allow teams to be responsive to changes of plan or customer feedback. 'Just the ticket!', we thought. We started out with our own agile weekly sprints (organised through Jira) and an end-of-week sprint demo. We eschewed daily stand-ups for a midweek update, as our tasks are often more diffuse and take longer to get to grips with. Although we may have feared that adding in more process would hinder us, there are many useful tools to make the agile workflow easy to navigate, such as integrations between IDEs, Jira, GitHub and Slack.
Unsurprisingly, our main challenge is in breaking work down into chunks that can be represented by Jira tickets. This is hard enough for engineering-type tasks, such as writing code to calculate our security intelligence metrics, but there’s a whole different set of challenges for tasks such as 'what can we do with this data source?' After some iteration, we’ve settled on a process we’re happy with and so I wanted to share it. You can turn your research tasks into tickets as follows:
- Ask yourself, 'Why am I doing this research?' If there are multiple research goals, find out which are the most urgent (according to the current prioritisation in your business) and limit the research scope accordingly. If you have multiple research tasks you may need to pick a few goals from each, or focus on one, depending on the prioritisation. For each high-priority goal, consider what questions you need to answer in order to achieve it. This approach has parallels with the GQM framework we use in our research.
- Write a to-do list, based on the goal and associated questions, as if you were just planning your own work.
- Review the list and try to group related points together. Think of a higher-level summary for each of these groups – this is your Jira ticket. The finer details are your sub-tasks.
- If you end up with many tasks, consider an iterative solution to reach the goal. Rather than completing half of it in detail, consider making a first pass using simple solutions and then using another sprint to revisit the issue and build in complexity. Having a sprint demo scheduled at the end of the week encourages this way of working as you’ll need something coherent and self-contained to present.
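As a rough illustration of the grouping step, here is a minimal Python sketch. It is not how Jira works or how we create tickets in practice; the Ticket class, the group_into_tickets function and the example to-do items are all hypothetical names made up for this example. It assumes each to-do item has already been labelled with a higher-level theme, and it turns each theme into a ticket summary with the finer details as sub-tasks.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Ticket:
    # Hypothetical stand-in for a Jira ticket: a summary plus its sub-tasks.
    summary: str
    subtasks: list = field(default_factory=list)


def group_into_tickets(todo):
    """Group (theme, task) pairs into one Ticket per theme.

    Themes keep the order in which they first appear in the to-do list.
    """
    grouped = defaultdict(list)
    for theme, task in todo:
        grouped[theme].append(task)
    return [Ticket(summary=theme, subtasks=tasks) for theme, tasks in grouped.items()]


# Made-up to-do list for the question 'what can we do with this data source?'
todo = [
    ("Assess data quality", "Count missing values per field"),
    ("Explore candidate metrics", "Plot distributions of key fields"),
    ("Assess data quality", "Check timestamp coverage"),
]

tickets = group_into_tickets(todo)
for ticket in tickets:
    print(ticket.summary)
    for subtask in ticket.subtasks:
        print("  -", subtask)
```

In this sketch the theme labels do the real work, and choosing them is still a judgement call that the code cannot make for you.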