Symphonious

Living in a state of accord.

Making End-to-End Tests Work

The Google Testing Blog has an article “Just Say No to More End-to-End Tests”, which winds up being a rather depressing evaluation of the testing capabilities, culture and infrastructure at Google. For example:

Let’s assume the team already has some fantastic test infrastructure in place. Every night:

  1. The latest version of the service is built. 
  2. This version is then deployed to the team’s testing environment. 
  3. All end-to-end tests then run against this testing environment. 
  4. An email report summarizing the test results is sent to the team.

If your idea of fantastic test infrastructure starts with the words “every night” and ends with an email being sent, you’re doomed. Bryan Pendleton does a good job of analysing and correcting the details, so I won’t cover that ground again. Instead, let me provide a view of what reasonable test infrastructure looks like.

At LMAX we’ve recently reached the milestone of 10,000 end-to-end acceptance tests. We’ve obviously invested a lot of time in building up all those tests, but they’re invaluable in the way they free us to try daring things and make sweeping changes, confident that if anything is broken it will be caught. We’re happy to radically restructure components in ways that require lots of changes to unit tests, because those end-to-end tests will catch any regression in behaviour.

We also have huge numbers of unit tests, integration tests, performance tests, static analysis and various other forms of testing, but the end-to-end tests are far more than a sanity check: they’re a primary form of quality control.

Those end-to-end tests, or acceptance tests as we call them:

  • run constantly through the day
  • complete in around 50 minutes, including deploying and starting the servers, running all the tests and shutting down again at the end
  • are all required to pass before we consider a version releasable
  • are included in our information radiators to ensure the team has constant visibility into the test results
  • are owned by the whole team – testers, developers and business analysts together

That’s pretty much entry-level for doing end-to-end testing (or frankly any testing). We’ve also got a few extra niceties that I’ve written about before.

Plus, the test results are displayed in real time, so we don’t even have to wait for the end of the test run to see any failures. Tests that failed on the previous run are run first, to give us quick feedback on whether they’ve been fixed.
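As a rough sketch of how that failed-first ordering can work (the class name, method and file layout here are hypothetical, not a description of LMAX’s actual tooling), a runner can sort this run’s tests against the failure list recorded by the previous run:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: run the tests that failed last time first,
// so feedback on whether they've been fixed arrives as early as possible.
public class FailedFirstOrdering {
    public static List<String> order(List<String> allTests, Path previousFailures) throws IOException {
        // One failed test name per line, written out at the end of the previous run.
        Set<String> failedLastRun = new HashSet<>(Files.readAllLines(previousFailures));

        // false sorts before true, so previously-failed tests come first.
        return allTests.stream()
                .sorted(Comparator.comparing(test -> !failedLastRun.contains(test)))
                .collect(Collectors.toList());
    }
}

The same idea extends naturally to running brand-new tests early as well; the point is simply that the previous run’s results drive this run’s ordering.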

There’s lots of great stuff in all of that, but we have more work to do. We have an intermittency problem. When we started out we didn’t believe that intermittency could be avoided, and we accepted a certain level of breakage on each build, much like the Google post’s expectation that 90% of tests pass. That attitude is a death knell for test reliability. If you don’t have all the tests passing consistently, more and more tests gradually and inevitably become intermittent over time.

We’ve been fighting back hard against intermittency and making excellent progress. We’ve recently added the requirement that releases have no failures; green builds are now the norm, and when there are intermittent failures it’s usually only one or two per run. Currently we’re seeing an intermittent failure rate of around 0.00006% of tests run, which sounds pretty good, but with 10,000 tests per run that’s still far too many runs with failures that should have been green.

But improvements come in waves, with new intermittency creeping in because it can hide amongst the existing noise. It has taken, and will continue to take, a lot of dedication and commitment to dig ourselves out of the intermittency hole we’re in, but it’s absolutely possible and we will get there.

So next time you hear someone try to tell you that end-to-end tests aren’t worth the effort, point them to LMAX. We do end-to-end testing big time, and it is massively, indisputably worth it. We only expect it to become more worthwhile as we reduce the intermittency and continue improving our tools.

Printing Only Part of a grep Match

It’s not uncommon to want to find a particular line in a file and print just a part of it. For example, if we wanted to find the total memory on a machine we could use:

grep '^MemTotal:' /proc/meminfo |
    sed -e 's/^MemTotal:\s*\([0-9]*\).*/\1/'

That does the job but duplicates a bunch of the matching regex. If you’re using a reasonably recent GNU grep, you can use its support for Perl regex syntax and the \K operator:

grep -Po '^(MemTotal:\s*)\K[0-9]*' /proc/meminfo

The -P enables Perl regex syntax, and the -o option causes grep to output only what is matched. You can see, though, that we’re actually matching the MemTotal: at the start of the line, yet somehow it doesn’t wind up in the output. That’s the magic of \K, which excludes everything matched before it from the reported match (without rewinding the stream and attempting a different match for that text).

Update

Alexandre Niveau emailed me to point out that in this particular case (and probably most others where I’d be tempted to use \K with grep), grep is completely superfluous and we can just use sed with the -n flag:

sed -n -e 's/^MemTotal:\s*\([0-9]*\).*/\1/p' /proc/meminfo

The -n disables automatic printing of lines, and the p flag at the end of the s command prints only the lines where a substitution was made. Definitely good to know. Thanks, Alexandre.

Patterns are for People

Avdi Grimm in Patterns are for People:

Patterns aren’t tools for programming computers; they are tools for programming people. As such, to say that “patterns are a language smell” makes very little sense. Patterns are a tool for augmenting language: our language, the language we use to talk to each other about a problem and its solutions; the language we use to daydream new machines in our minds before committing them to code. Patterns aren’t a language smell; rather, patterns are, by definition, [optional] language features.

There’s so much we can learn if only we stop making everything into a battle between “new” tools or approaches and “old” ones. Patterns are relevant and useful in all forms of programming, and we shouldn’t discard them just because they’re not “cool” anymore. Sadly, we seem to throw out nearly all our experience and learning each time a new approach comes along, rather than learning from and taking advantage of both.

ThreadLocal.withInitial

Pre-Java 8:

ThreadLocal<Foo> foo = new ThreadLocal<Foo>() {
    @Override
    protected Foo initialValue() {
        return new Foo();
    }
};

Post-Java 8:

ThreadLocal<Foo> foo = ThreadLocal.withInitial(Foo::new);

Nice.
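As a quick illustration of what withInitial gives you (the demo class and the instrumented Foo below are my own invention, not from the original snippet), the Supplier runs lazily, once per thread, the first time that thread calls get():

import java.util.concurrent.atomic.AtomicInteger;

public class WithInitialDemo {
    // A stand-in for the Foo above, instrumented so we can see instances being created.
    static class Foo {
        private static final AtomicInteger CREATED = new AtomicInteger();
        final int id = CREATED.incrementAndGet();
    }

    private static final ThreadLocal<Foo> FOO = ThreadLocal.withInitial(Foo::new);

    public static void main(String[] args) throws InterruptedException {
        Runnable printOwnFoo = () ->
                System.out.println(Thread.currentThread().getName() + " got Foo #" + FOO.get().id);

        Thread t1 = new Thread(printOwnFoo, "worker-1");
        Thread t2 = new Thread(printOwnFoo, "worker-2");
        t1.start();
        t2.start();
        t1.join();
        t2.join();
        // Prints two different Foo ids: each thread lazily created its own instance.
    }
}

Compared to the anonymous subclass, the Supplier version states only the part that matters: how to create the initial value.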