Living in a state of accord.

Show Me the Metrics

There has been a lot of innovation in programming languages and development techniques lately, but a key element is missing: metrics.

Seeing posts like Stephan Schmidt’s Go Ahead: Next Generation Java Programming Style and the response from Cedric Otaku really should make any professional software engineer concerned about the future of their craft. Not because either of them is necessarily right or wrong, but because two highly skilled engineers are engaged in a debate based purely on personal preference. There’s barely even any anecdotal evidence provided, let alone actual measurements of what impact the proposed approaches have on productivity, quality or other desirable attributes.

A while back I stumbled across the slides from Greg Wilson’s presentation Bits of Evidence. It’s almost a shame that the slides are so well designed, because they are only a short summary and I’d love to have heard all the extra information Greg gave when he spoke to them. Even so, they really highlight the fact that we’re rushing off down all these new paths basically just because some guy said to. One of the side notes sums it up really well:

Please note: I’m not disagreeing with his claims – I just want to point out that even the best of us aren’t doing what we expect the makers of acne creams to do.

It would be easy to dismiss the importance of this because in your own experience technique A works better than technique B, and with regular Agile retrospectives you’d notice if it had failings and work to overcome them. Unfortunately, you’re very likely to fall victim to the same selection bias that makes psychics believable. People inherently remember the “hits” and forget the “misses”1, so even personal experience isn’t something you should depend on to make decisions unless it’s backed up, at least partly, by hard data.

I like playing with new technology as much as the next person, but at the end of the day I have to wonder if the benefits are always as compelling as we’d all like to think. Probability would suggest that some of the new approaches really are a big improvement, but it would also suggest that some of them are a mistake. Without metrics to judge them by, how are we going to know the difference?

1 – Wikipedia claims this was first raised in Broad, C. D. (1937). The philosophical implications of foreknowledge. Proceedings of the Aristotelian Society (Supplementary), 16, 177-209.