Maturing maturity models with Spotify Squad Health Check

How does a manager know what needs to be improved, and what help can he get in deciding?

Academic journals are stuffed with papers about what knowledge and practices it takes to really succeed at doing X, where X is some organisational method of getting things done.

Martin Fowler defines a maturity model like this:

A maturity model is a tool that helps people assess the current effectiveness of a person or group and supports figuring out what capabilities they need to acquire next in order to improve their performance.

The idea is that if a manager does not know what needs to be improved to do X, then, well, now he has a list. With the addition of some little boxes, he has a checklist.

Here is what that might look like (and you should pray your checklist is half as rational as this one):

The problem with checklists is that people take it very seriously when there is a box without a tick. They start to ask questions and get judgy. The people being judged can't then help but try to put a tick in every box. Judgement placed upon them replaces the judgement exercised by them and they become box-tickers.

Henrik Kniberg, the author of the example checklist above, spells out what this might look like in practice:

Lisa: “But we do 2 week sprints and almost always manage to deliver what we commit to, and the customers are happy. Sprint burndown charts wouldn’t add value at this stage.”

Big Boss: “Well it says here that you should do it, so don’t let me catch you cheating again, or I’ll call in the Scrum Police!”

That looks like a disaster to me.

Compared to the academic tradition, the Agile movement is the new kid on the block. It has an image of being more modern and more relaxed, and therefore feels younger. This is true, but another way of looking at it is that we now have access to more experience of software development, itself a very new field. This has given our generation the tools to approach ignorance with confidence and to value individual autonomy, feedback, and lived experience over 1970s-era managerial innovations.

A gap in a checklist is no longer a crime; the checklist is the crime.

What can you base your interventions on now?

But how do you tailor your process improvement methods to match your values? How does a manager know what he needs to improve for a team if it is not on a list? What is needed is a new way to put it on a list, and to place the autonomous team at the center of that new process.

Enter the "Spotify Squad Health Check":

At Spotify we’ve done careful experimentation with this for several years, and found some ways that work fairly OK (as in more gain than pain). At best Helpful, at worst Meh, and so far never a Disaster. We’ve introduced this health check model to several other companies as well and heard similar results

...

the primary audience is the team itself rather than management.

The health check process involves identifying around 10 questions or themes for the team to discuss and capturing their sentiment about each theme. The theme "can we release easily?" might elicit responses of "our releases are awesome/meh/crappy" i.e. green, amber and red, the colours of a traffic light.

The health check session involves a conversation and a planning-poker-style reveal of traffic-light coloured cards. The facilitator should jot down the number of cards of each colour, as demonstrated by facilitator Barry Overeem below:

Table of captured health-check results

As with planning poker, there will usually be a consensus position and a few people at odds with it. Those outliers often have a different perspective to offer. I recall one session in which nearly everyone responded green for "Fun" because we all enjoyed our work. A relative newcomer gave a red and pointed out that we didn't tend to socialise much, which she considered disappointing. A few greens turned amber at that insight, as you might expect, but the takeaway is that this team was defining and improving its own standards about what "good" meant, and was doing so from within the context that mattered most - from within the project.

Running the Numbers

The real genius is that once the sentiment has been quantified with one of these three values, an aggregate can be produced. If a Green/Happy is scored as 1, an Amber as 0, and a Red/Sad as -1, then the numerical mean captures something of every opinion expressed. Suddenly multiple perspectives on a mass of complex interlocking concerns have been reduced to ten handy numbers.
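The scoring described above can be sketched in a few lines of Python. This is a minimal illustration rather than anything Spotify publishes: the theme names, the votes, and the `theme_score` helper are all made up for the example; only the 1 / 0 / -1 scoring comes from the text.

```python
from statistics import mean

# Scoring taken from the text: green = 1, amber = 0, red = -1.
SCORES = {"green": 1, "amber": 0, "red": -1}


def theme_score(votes):
    """Return the mean score for one theme, given a list of card colours."""
    return mean(SCORES[colour] for colour in votes)


# Hypothetical votes from a six-person team on two themes.
votes_by_theme = {
    "Easy to release": ["green", "green", "amber", "red", "green", "amber"],
    "Fun": ["green", "green", "green", "green", "green", "red"],
}

for theme, votes in votes_by_theme.items():
    # The sign is printed explicitly so positive and negative themes stand out.
    print(f"{theme}: {theme_score(votes):+.2f}")
```

Note that a single red among five greens still pulls "Fun" well below 1, which is the sense in which the mean captures something of every opinion.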

Of course, data from a single team is unlikely to produce reliable statistics, as staff changes, absences, and short-term factors can easily distort your analysis. The statistics at team level are more for spot checking and conversation starting. But Spotify Labs do give some examples of good visualizations that can be used to spot patterns, such as overconfident teams that need a reality check.

Aggregating the data across multiple teams will yield more reliable statistical analysis, but will tend to blur out interesting details. Theme-level aggregates composed of multiple teams' data will show up the pain points which affect your whole programme.
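One way to build such a theme-level aggregate is to pool the card counts from every team before taking the mean. The sketch below assumes each team's results were captured as counts per colour; the team names, the numbers, and the `pooled_theme_scores` helper are hypothetical.

```python
from collections import Counter

# Scoring taken from the text: green = 1, amber = 0, red = -1.
SCORES = {"green": 1, "amber": 0, "red": -1}


def pooled_theme_scores(teams):
    """Pool card counts per theme across all teams, then return the mean score per theme."""
    totals = {}
    for team_counts in teams.values():
        for theme, counts in team_counts.items():
            # Counter.update adds the counts to any already recorded for this theme.
            totals.setdefault(theme, Counter()).update(counts)
    return {
        theme: sum(SCORES[colour] * n for colour, n in counts.items()) / sum(counts.values())
        for theme, counts in totals.items()
    }


# Hypothetical card counts from two teams.
teams = {
    "Team A": {
        "Easy to release": {"green": 4, "amber": 1, "red": 1},
        "Fun": {"green": 5, "amber": 0, "red": 1},
    },
    "Team B": {
        "Easy to release": {"green": 1, "amber": 2, "red": 3},
        "Fun": {"green": 4, "amber": 2, "red": 0},
    },
}

scores = pooled_theme_scores(teams)
# The lowest-scoring theme is a candidate programme-wide pain point.
pain_point = min(scores, key=scores.get)
print(pain_point, round(scores[pain_point], 2))
```

Pooling the votes weights every person equally. Averaging the per-team means instead would weight every team equally, which may be preferable when team sizes differ a lot; which one suits your programme is a judgement call.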

Tracking this data in a spreadsheet over the quarters can also reveal trends. You are entitled to expect steady improvement rather than regression. Upon spotting a regression, a good response might be to trigger a deep-dive retrospective on that theme.
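If the quarterly numbers live somewhere scriptable rather than only in a spreadsheet, spotting regressions can be automated. The following is a rough sketch under assumed data: the quarterly scores, the `regressions` helper, and the `tolerance` threshold (how large a drop must be before it counts) are all hypothetical choices, not part of the health check model.

```python
def regressions(history, tolerance=0.0):
    """Return (theme, earlier, later, drop) for each quarter-on-quarter fall larger than tolerance."""
    found = []
    for theme, series in history.items():
        # Compare each quarter with the one that follows it.
        for (q_prev, s_prev), (q_next, s_next) in zip(series, series[1:]):
            drop = s_prev - s_next
            if drop > tolerance:
                found.append((theme, q_prev, q_next, round(drop, 2)))
    return found


# Hypothetical mean scores per theme, in chronological order.
history = {
    "Easy to release": [("Q1", 0.10), ("Q2", 0.30), ("Q3", 0.50)],
    "Fun": [("Q1", 0.70), ("Q2", 0.60), ("Q3", 0.20)],
}

for theme, earlier, later, drop in regressions(history, tolerance=0.2):
    print(f"{theme} fell by {drop} between {earlier} and {later}")
```

A non-zero tolerance keeps small wobbles, which are likely at team level, from triggering a retrospective every quarter.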

Potential Pitfalls

Of course, all of this number crunching can be a little dry. One scrum master we spoke to noticed that developers were still skeptical and confused about the point of the process, even at the end of a session. Yet everyone we spoke to seems to believe that the process has value, and some large London corporations have adopted it as a quarterly mainstay across their technology function.

As with many things, we believe success comes down to excellent facilitation by scrum masters and leaders.

If you mechanically capture the numerical sentiment data and carry it away to be scrutinised outside the team, then you will probably be wasting your time. Worse is the danger that you might draw the attention of senior managers and invite external interference.

Instead, use the themes and the traffic-light poker-card exercise as a way to encourage deep and broad conversations. Capture actions for the team to execute locally. If you are doing this, then you will have had an effective retrospective by anyone's definition.

Now, if you were good enough to manage both of those things in the one meeting, then that might really be something.