Living in a state of accord.

Show Me the Metrics

There have been a lot of innovations in programming languages and development techniques lately, but one key element is missing: metrics.

Seeing posts like Stephan Schmidt’s Go Ahead: Next Generation Java Programming Style and Cedric Beust’s response on his blog, Otaku, really should make any professional software engineer concerned about the future of their craft. Not because either of them is necessarily right or wrong, but because two highly skilled engineers are engaged in debate based purely on pure personal preference. There’s barely even any anecdotal evidence provided, let alone actual measurements to see what impact on productivity, quality or other desirable qualities the proposed approaches have.

A while back I stumbled across the slides from Greg Wilson’s presentation Bits of Evidence. It’s almost a shame the slides are so well designed, because they really are just a short summary, and I’d love to have heard all the extra information Greg presented alongside them. Even so, they highlight the fact that we’re rushing off down all these new paths basically just because some guy said to. One of the side notes sums it up really well:

Please note: I’m not disagreeing with his claims – I just want to point out that even the best of us aren’t doing what we expect the makers of acne creams to do.

It would be easy to dismiss the importance of this because, in your own experience, technique A works better than technique B, and with regular Agile retrospectives you’d notice any failings and work to overcome them. Unfortunately, you’re very likely to fall victim to the same selection bias that makes psychics believable. People inherently remember the “hits” and forget the “misses”¹, so even personal experience isn’t something you should depend on to make decisions unless it’s backed up at least partly by hard data.

I like playing with new technology as much as the next person, but at the end of the day I have to wonder if the benefits are always as compelling as we’d all like to think. Probability would suggest that some of the new approaches really are a big improvement, but it would also suggest some of them are a mistake. Without metrics to judge them on, how are we going to know the difference?

1 – Wikipedia claims this was first raised in Broad, C. D. (1937). The philosophical implications of foreknowledge. Proceedings of the Aristotelian Society (Supplementary), 16, 177-209.

  • Stephan Schmidt says:

    I forgot I also agree with the “bias” bit. Wrote about it some years ago:

    “The biggest problem with tales about software success is that they only show the survivors.”


    March 3, 2010 at 3:27 pm
  • Stephan Schmidt says:

    “[...] in debate based purely on pure personal preference.”

    I thought about this too: is it just my personal preference? I tend towards no. The main reason is that several senior developers on my teams in the past came up with some of this naturally (like final, Google collections, getters/setters, …). But I agree this is anecdotal.

    “There’s barely even any anecdotal evidence provided, let alone actual measurements to see what impact on productivity, quality or other desirable qualities the proposed approaches have.”

    You’re obviously right about the evidence. I agree that our industry needs more of that, and I’ve blogged several times about this, always in the spirit of hacknot. For example:

    “Comparing Java and Python – is Java 10x more verbose than Python (LOC)? A modest empiric approach”

    But what evidence would you need that final variables are better? I’d assume the experience with immutable variables is already there – see the many treatises on immutability in functional programming. For loops are not reusable? Evidence for that? They are just not. Thinking back to my philosophy minor, “proving” the sky is blue is a rather hard task. It just is. Google collections predicates are objects, so they are reusable while for loops are not. They are also composable, as the examples “prove” if you look at them. Compare the fluent version of Google collections to a constructor with 20 parameters: which one is easier to read? Does that need a proof?
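To make the reuse and composition claim concrete, here is a minimal sketch in Java. It uses the JDK’s `java.util.function.Predicate` (Java 8+, plus Java 9+ for `List.of`) as a stand-in for the Google Collections predicate types the comment refers to; the class name and the filtering conditions are hypothetical. It only shows the mechanics of reusing and combining conditions, not whether doing so lowers development or maintenance cost.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class PredicateExample {
    // Hypothetical, reusable conditions: each is a value that can be named,
    // passed around, tested on its own and combined with others.
    static final Predicate<String> STARTS_WITH_A = s -> s.startsWith("A");
    static final Predicate<String> LONGER_THAN_3 = s -> s.length() > 3;

    // Loop version: the condition lives inside the loop body, so reusing it
    // elsewhere means copying the if statement.
    static List<String> filterWithLoop(List<String> names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (name.startsWith("A") && name.length() > 3) {
                result.add(name);
            }
        }
        return result;
    }

    // Predicate version: the same condition is composed from the reusable parts with and().
    static List<String> filterWithPredicates(List<String> names) {
        return names.stream()
                .filter(STARTS_WITH_A.and(LONGER_THAN_3))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("Alice", "Al", "Bob", "Anna");
        System.out.println(filterWithLoop(names));       // [Alice, Anna]
        System.out.println(filterWithPredicates(names)); // [Alice, Anna]
    }
}
```

Both methods return the same result; the difference is that `STARTS_WITH_A` and `LONGER_THAN_3` can be reused in other filters, whereas the loop’s condition cannot.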


    [1] My take on the bias in software tales:

    March 3, 2010 at 3:29 pm
  • Adrian Sutton says:

    Hey Stephan,
    I think you’re trying to prove the obvious and then wondering why you bother. Instead of trying to prove that loops aren’t reusable, you need to prove that there’s some benefit in having a reusable loop structure, probably in terms of reduced development cost or lower defect rates. Similarly, the sky is blue by definition, but whether blue is the optimal colour for the sky is a potentially useful question and one that can be tested and proved (for a given definition of optimal).

    “Easier to read” is amazingly subjective: reasonable people looking at the same examples can easily disagree (Cedric did in fact disagree that one of your examples was more readable). Again, the question is really not whether it’s easier to read but whether it reduces the cost of maintenance; being easier to read is just an indicator that something will reduce maintenance costs.

    Final variables are a particularly interesting example. A number of comments on your article or Cedric’s stated that final variables were clearly better, yet the same commenters couldn’t remember a time when a non-final variable was the cause of a defect. The anecdotal evidence would therefore suggest that making variables final is a waste of time, since it’s not improving quality, and you’d expect it to reduce productivity because it’s more code and more typing. Obviously there’s no actual evidence either way. Perhaps it does reduce defects after all, and perhaps it does make developers more productive, but those are exactly the kinds of questions we should be gathering evidence on.

    In real-world development, I’d look for statistics on defects caused by incorrect variable assignments; that’s about the only thing I can imagine being realistically traceable and able to support the conclusion. Better, though, would be a controlled test where a control group codes a solution to a problem without using final variables and the test group uses immutability. There are lots of variables to control, and it could become quite expensive to run the test long enough to learn about maintenance costs, but even a simple test would give us more concrete information.
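As a minimal sketch of how the results of such a controlled test might be compared, assume each group’s outcome is recorded simply as how many submitted solutions contained an assignment-related defect. A standard two-proportion z-test (with a pooled proportion) can then indicate whether the difference between the groups is larger than chance alone would explain. The class and method names are hypothetical, the inputs in `main` are illustrative only, and a real study would still have to deal with the confounding factors described here.

```java
class DefectComparison {
    // Two-proportion z statistic with a pooled proportion, comparing the share of
    // solutions containing an assignment-related defect in group A (hypothetically,
    // the control group without final) versus group B (the group using final).
    static double twoProportionZ(int defectsA, int sizeA, int defectsB, int sizeB) {
        if (sizeA <= 0 || sizeB <= 0) {
            throw new IllegalArgumentException("group sizes must be positive");
        }
        double pA = (double) defectsA / sizeA;
        double pB = (double) defectsB / sizeB;
        double pooled = (double) (defectsA + defectsB) / (sizeA + sizeB);
        double standardError = Math.sqrt(pooled * (1 - pooled) * (1.0 / sizeA + 1.0 / sizeB));
        // A zero standard error means the pooled rate is 0 or 1, so both groups
        // had identical rates and there is no difference to measure.
        if (standardError == 0.0) {
            return 0.0;
        }
        return (pA - pB) / standardError;
    }

    public static void main(String[] args) {
        // Illustrative inputs only, not real data.
        double z = twoProportionZ(20, 100, 10, 100);
        System.out.printf("z = %.3f%n", z);
    }
}
```

Under the usual normal approximation, which assumes reasonably large groups, a |z| above roughly 1.96 corresponds to a difference that is significant at the 5% level (two-sided).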

    Doing these tests, and especially eliminating outside factors, is incredibly difficult – software teams usually change a lot of different things before there’s any time to gather data about any one change, so the data is usually messy if not worthless. That doesn’t mean it’s not important to aim higher, gather what data we can and devote some time to proper controlled testing. Perhaps the coding dojo meetings that have started springing up would be a good place to run some of these controlled tests.

    Anyway, thanks for the extra links – lots of good info in there.

    March 3, 2010 at 3:51 pm
  • Stephan Schmidt says:

    I partially agree, but the ROI on proving efficiency might not be high enough. One way or the other you need to believe in things being better or worse; proving every little assumption might paralyze you, e.g. deciding whether a constructor with 20 parameters or a fluent interface is better.

    I agree about the difficulty of tests. I worked for some years as a researcher at the biggest German (software) research institute (around 1 billion in turnover). Proving the efficiency of software methodologies or tools is hard stuff.


    March 3, 2010 at 8:22 pm
  • Adrian Sutton says:

    Yeah, the ROI is difficult if you’re just doing it for yourself, but across the entire industry the ROI is much clearer. So essentially, it’s not feasible for everyone to do it, but that’s ok because we only need *someone* to do it. Internally, then, we just look for the statistics that are most likely to verify our claims rather than completely answer the question. For example, defects caused by variable reassignment would show that the final keyword provides at least some benefit, even though they wouldn’t tell us what other side effects it might have.

    March 4, 2010 at 7:52 am
  • Stephan Schmidt says:

    Should you start such a journey, count me in.

    March 4, 2010 at 8:20 am
  • Adrian Sutton says:

    Hmm, I should have said someone *else*. :)

    More seriously, I think it is worth getting started, but it will require some pondering about how to do so.

    March 4, 2010 at 8:24 am
  • ddoctor says:

    While I love the new styles of programming becoming popular (functionals, immutables, weak typing craziness), more and more I come to think that no one style fits all. Different problems call for different styles, and personal preference is important – if you’re comfortable in a code style – if it feels good – then that’s a damn good reason to use it. I love a 2B pencil… but if you love crayons, I’m up for trying crayons.

    I understand the argument that we need empirical evidence in software… and it’s extremely valid… but I don’t think you’re going to get it. It’s difficult to measure: software is arguably the most complex thing humans create, so how can you possibly run a controlled experiment? That’s not a reason not to try… I just don’t think it’ll get very far in a practical sense.

    Software engineering is stuck for a metaphor. We think it’s science so we measure it, we think it’s engineering so we write processes, we think it’s art so we craft and critique. It has parts of all of these things, but it is so different from any other human pursuit that it needs a whole new class of ways to analyze it.

    Were I to pick an analogy, I really prefer art. How do you measure art? You don’t – you analyze it. You debate it. The qualities of software are very subjective and you will always get different opinions. Every artist has their own style. You’re going to get varying opinions, conflicts, trends and consensus.

    I think immutability is becoming popular as a response to messy code, as a way of controlling complexity, and because of the greater emphasis on testability and its benefits for concurrency. To use an art metaphor, it’s like a renaissance of functional programming. Similarly, people become frustrated with the limitations of certain type systems, such as Java’s, which drives developments towards more solid static typing (Scala) and more expressive power through weak typing (Ruby).

    I’ve worked across the full spectrum from very strong to very weak types and I like every step of the way. They are not religions for me to devote myself to, they’re just different tools at my disposal. I’ve written the same sort of code in JavaScript – with tight, simple, compact code, but type errors and stupid runtime errors – and in Java – with robust, type-safe, testable, refactorable, solid code that’s more bloated. There are always tradeoffs.

    Using final everywhere in Java is great, but there’s a syntactic overhead that impacts readability. Maybe if it were the default, like in Scala… jeez, even if we agree that immutability is awesome, there’s still a class of problems where mutability is great: some things logically have a state that changes, and mutation is a natural way of representing that… maybe even with a performance benefit. I like Scala in that sense… and that doesn’t make it better, it just makes it align better with the preferences of its users. You can even think of this as a usability issue!

    Some age old debates keep resurfacing: Does data change, or is it new data? Should types be weak or strong? Should I group procedurally or OO? Decompose vertically or horizontally? Should I use traditional or functional iteration? What structures best accommodate features a, b, c with criteria x, y, z?

    Rather than continuing these debates, I think it’s more important to step beyond them and find the next big idea that will make these issues seem petty and blow them out of the water.

    What I’m trying to say is that I’m sick of the emphasis being on “is technique X better than technique Y, and if so we must always use technique X”. We have many tools at our disposal and, sure, we can analyse or measure them to find different strengths and weaknesses. But rather than finding one ring to rule them all, a code craftsman must use his skill, intuition, knowledge and style to create elegant solutions. That’s not something that can be reduced to a set of rules, a standard, a process or a provable hypothesis. It’s far more subtle and complex than that.

    I feel that we’ve also forgotten where we are. The state of the art of programming is still primitive – we’re still in the stone age. We squabble about the different options available to us now, but (hopefully) one day we will look back and think: jeez, did we really squabble over weak vs strong typing back in the day, before Kickass Technique X existed? It’s like arguing over whether Fortran or COBOL is better.

    This game of making software is strange and subtle. Maybe we try to measure it, analyze it, apply intuition, creativity, innovation. I dunno. I’m not going to pretend I have all the answers… and I think that’s the key. Too many people think they do have the answers, when really nobody does. I don’t want people to front up with proof. I want them to apply some critical thinking and innovation and think: what was good and bad, and how can I improve it? What’s the next step? We’ve got a long way to go.

    March 4, 2010 at 12:57 pm
