Mocks Are A Sometimes Food

September 11th, 2006

There's an interesting pattern when you start doing TDD and trying to make your tests as atomic as possible1. First of all you wonder how anyone could ever get far with completely standalone classes that don't interact with anything – obviously a program needs some level of communication between classes. Then you discover Mock objects. These wonderful little gems allow you to have communication between classes but still test them independently. Pretty soon you're going on a Cookie Monster style binge session with mocks. Everything can, should and will be mocked out and there's no longer any need to worry about making your classes keep to themselves; all those external dependencies can just be mocked out.

At some point though, you get an error about the max permspace being reached. You've created so many mock classes that the JVM no longer has room to load class definitions. What's worse, your atomic tests, which should be running lightning fast because they're so atomic, aren't. This is the point where you learn that Mocks are in fact a sometimes food. They're a nasty hack which allows you to pretend the world is all happy and isolated even though it's not.

Now, there are certainly architectures you could use which would completely avoid the need for mocks while still allowing inter-class communication, but it can become mind-bendingly difficult to work out where the heck those messages are going, and the performance implications of breaking things up at this level are pretty severe. A completely separated design is often beneficial at a component level, but it's usually impractical at the class level.

So mocks are a useful tool to help test things that need to interface with other classes but are largely independent. I've taken to the rule of thumb that if I need to use more than one mock in a test I'm probably doing something wrong – if I'm using more than three mocks I really need to rethink my approach.
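As a rough illustration of the one-mock rule of thumb, here is a minimal sketch of a test with a single hand-written mock. Every name in it (Document, BoldAction, RecordingDocument) is hypothetical and made up for the example. The class under test has exactly one collaborator, and the mock simply records the calls it receives.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collaborator interface, invented for this sketch.
interface Document {
    void applyStyle(String style, int start, int end);
}

// Class under test: turns a bold request into one call on its single collaborator.
class BoldAction {
    private final Document document;

    BoldAction(Document document) {
        this.document = document;
    }

    void perform(int selectionStart, int selectionEnd) {
        // An empty selection has nothing to make bold.
        if (selectionStart < selectionEnd) {
            document.applyStyle("bold", selectionStart, selectionEnd);
        }
    }
}

// Hand-written mock: records each call so the test can inspect it afterwards.
class RecordingDocument implements Document {
    final List<String> calls = new ArrayList<>();

    @Override
    public void applyStyle(String style, int start, int end) {
        calls.add(style + ":" + start + "-" + end);
    }
}

class BoldActionCheck {
    public static void main(String[] args) {
        RecordingDocument document = new RecordingDocument();
        new BoldAction(document).perform(3, 8);
        if (!document.calls.equals(List.of("bold:3-8"))) {
            throw new AssertionError("unexpected calls: " + document.calls);
        }
    }
}
```

If a test like this needed a second or third recording class just to construct BoldAction, that would be the signal to rethink the design rather than add more mocks.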

How do other people approach the use of mocks and other related techniques?

1 – I'm not yet sure, but perhaps it would be better to focus on making them run as fast as possible, rather than as atomic as possible. They would still need to be focussed enough to make it clear where the problem is instead of just showing that there is a problem.

2 – possibly hidden away in the language runtime

End To End Testing And The 10 Minute Build

September 7th, 2006

At least in my mind, there seems to be a clash of aims in XP. You want to make sure that you have complete confidence in your tests so that you can go faster and reduce the cost of change. To achieve this you write lots and lots of tests – until your fear of something breaking turns to boredom from writing tests you know will pass. Most of those tests are atomic and test a particular component, but fear lies in the gaps between components too so you regularly get recommendations like Ola Ellnestam's on my previous post, Testing Your Setup Code:

Note: When you're TDDing and getting the very loose coupling every one is longing for ;-) you must be aware that integration tests and acceptance tests are an absolute necessity. Since this is the only way to really test your configuration.

On the other hand, to be able to get rapid feedback you want a fast build – under 10 minutes. From James Shore's draft of the 10 Minute Build chapter of his upcoming book, The Art of Agile:

For most teams, their tests are the source of a slow build. Usually it's because their tests aren't focused enough. Look for common problems: are you writing end-to-end tests when you should be writing unit tests and integration tests? Do you unit tests talk to a database, network, or file system?

You should be able to run about 100 unit tests per second (test_driven_development). Unit tests should comprise the majority of your tests. A fraction (less than 10%) should be integration tests, checking that two components synchronize properly. Only a handful, if any at all, should be end-to-end tests (testing).

The problem is, if you want to test your software comprehensively and be able to have confidence that your tests will tell you if you've broken something, I can't see how you can avoid writing a lot of integration tests. I also don't see why you would avoid automating end to end tests and running them very regularly. The reality is that you need to have a QA process that tests the application from how users actually use it – not from the point of view of this bit of the system or that bit of the system. You need to verify that when a user clicks on this menu item, the event is sent over to the editor pane which interprets it as a bold action and instructs the document to apply bold and finally that when the document serializes it comes out with a STRONG or B tag (depending on the user's preferences) around the text.

If you don't have a test that verifies that the message from the menu bar actually gets to the editor pane, how can you have confidence that bold works? How can you have confidence that the complex changes you're making to the document result in the right end effect when the document is serialized?
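To make the menu-to-serialization path concrete, here is a small sketch of what an end to end check might look like. All of the classes (SimpleDocument, EditorPane, MenuBar, BoldEndToEnd) are hypothetical stand-ins, not real editor code. The point is only that real objects are wired together and the assertion is made on the serialized output, with no mocks in between.

```java
import java.util.function.Consumer;

// Hypothetical document that serializes bold text with STRONG or B.
class SimpleDocument {
    private final String text;
    private final boolean useStrongTag; // stands in for the user's preference
    private boolean bold;

    SimpleDocument(String text, boolean useStrongTag) {
        this.text = text;
        this.useStrongTag = useStrongTag;
    }

    void applyBold() {
        bold = true;
    }

    String serialize() {
        if (!bold) {
            return text;
        }
        String tag = useStrongTag ? "STRONG" : "B";
        return "<" + tag + ">" + text + "</" + tag + ">";
    }
}

// Hypothetical editor pane: interprets events and instructs the document.
class EditorPane {
    private final SimpleDocument document;

    EditorPane(SimpleDocument document) {
        this.document = document;
    }

    void handle(String event) {
        if ("bold".equals(event)) {
            document.applyBold();
        }
    }
}

// Hypothetical menu bar: forwards the clicked item to whoever is listening.
class MenuBar {
    private final Consumer<String> listener;

    MenuBar(Consumer<String> listener) {
        this.listener = listener;
    }

    void click(String item) {
        listener.accept(item);
    }
}

class BoldEndToEnd {
    // Wires the real pieces together and returns what the user would end up with.
    static String boldViaMenu(String text, boolean useStrongTag) {
        SimpleDocument document = new SimpleDocument(text, useStrongTag);
        EditorPane pane = new EditorPane(document);
        MenuBar menu = new MenuBar(pane::handle);
        menu.click("bold");
        return document.serialize();
    }
}
```

If the menu bar were wired to the wrong listener, the serialized output would come back without the tag, which is exactly the kind of gap between components that an atomic test of any single class here would not catch.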

I suspect there are a couple of contributing factors to my confusion around this issue. Firstly, I work with text and frankly there is nothing less predictable and safe in software than text. It seems simple on the face of it, but there is a huge amount of complexity that goes on behind the scenes and users absolutely demand that the editing experience is completely seamless and intuitive. There are few other environments where the number of possible program states is so incomprehensibly huge in a practical sense, where the differences really matter. On top of that, there's a ridiculous number of possible user actions that are all available at the same time and all of them interact with the program state in subtly different ways to try to best match the user's expectation.

In short, if there were any environment where you should be afraid of making changes, it's code that deals with text. That fear turns into a desire to write lots of tests and make them as close as possible to what the user is actually doing. Capturing all the subtleties of the state and embedding them in an atomic test is difficult to get precisely right – you tend to cover the most important details but miss one or two bits of state that can come back to bite you when you least expect it. Having end to end tests resolves that sense of fear, because you know that the program is operating just like when the user actually uses it – you can't have missed a detail somewhere, it's all the real deal.

The other contributing factor is that I haven't had the opportunity to work on a high quality code base that has very comprehensive atomic tests. Ephox's code base is quite old, it's mostly high quality code but it has some back alleys where ambushes lie in wait and it's well tested but mostly with integration level tests, not atomic tests. It's no surprise then that I don't have complete confidence in the atomic tests – they just don't cover enough of the application. That said, my confidence in the atomic tests is definitely growing as we add more tests and get better at knowing what to test and how to test it.

The bottom line is that now and for the foreseeable future, I'm not going to have enough confidence in the atomic tests to get rid of the slow end to end tests. However I do see it as important to improve our atomic tests and the confidence we have in them. Being able to verify that your changes haven't caused problems in 5-10 seconds by running the appropriate tests is a huge boost to productivity. Being able to have confidence that everything will work in a minute or two by running all the atomic tests is extremely powerful too. Despite that, knowing that Bob the Builder is going to come along behind you and run a comprehensive suite of end to end tests as well is priceless.

Refactoring To Make Improvements Possible

September 7th, 2006

I've had an interesting experience the last couple of days – I've been trying to add some major new functionality into our list code. The code is exceptionally well tested and fairly easy to understand but it wasn't clear how to write a test that described the functionality I wanted to add.

I started off by writing an acceptance test for what I wanted and then started drilling down to what I needed, but it was leading me off into a rewrite of our list code because it was too difficult to see how to reuse the existing code for what I wanted. In the end, I decided to almost reverse refactor the existing code to extract out the logic that I needed. I say reverse refactor because instead of making the code simpler to read and understand, it was making it more complex – it really felt quite wrong to be applying the refactorings.

By the time I left this afternoon though, I'd reached the point where the logic I wanted to reuse was quite clearly separated and the design was starting to be cleaned up again. I've got some duplicate code lying around because I haven't finished cleaning up yet, but I'm quite happy with the way it's all shaping up. What has surprised me the most is how much more code I can reuse than I had expected to. Taking the refactorings one step at a time and depending on the tests to make sure I had things right has led to a much better design and a clearer path forward than I had ever thought possible.

One question that remains is what new tests I should be writing. While all the code has simply been refactored and is still covered by the old tests, the refactoring has exposed a bunch of new opportunities for atomic tests. For example, a couple of new classes have been split out – it's possible to add atomic tests for those to verify that they do the subset of the task that they claim to, no more and no less – currently we only have tests for the task as a whole. So far I don't have any fear that it's wrong so I'm not worrying about writing tests, but I'm trying to stay alert to my fear level to be sure that tests are added as soon as they provide benefit.

Where Should You Deploy From?

August 29th, 2006

Once you have an automated build, the next step is to automate deployment1. A lot of people take this to mean that you should be able to check out the code, compile it and deploy it all from your local work station. I think this is largely a really bad idea.

Firstly, if you have a deployment system that needs to vary, or might in the future need to vary, based on the version of the product, then your deployment scripts have to be in source control with the product and be branched and versioned just like the actual source code. If your product just spits out a zip file that is uploaded to a web server for clients to download, you may want to separate the deployment of that zip file from the code base since it will change based on changes to the web server, not changes to the code. You should however still be able to build the zip file from scripts that are versioned with your source code.

In either case though, there's no reason that you should be deploying builds from your local machine. Having a centralized deployment machine is a good idea for many of the same reasons that having a standard integration machine is – primarily to avoid the "it worked on my computer" issues. At Ephox, our integration machine is also our deployment machine and we've simplified deployment of a build to the web site down to a couple of clicks on a web page that the integration machine runs. Technically, we can kick off the deployment from any machine without having to get up and the actual work is handled by the deployment server.

The deployment script is home grown and very basic – it actually just runs an ant process on the server and passes through the output – but it has a couple of nice features. Firstly, it allows you to select any successful build of any of our products and deploy it all through the same interface. Secondly, once you've picked the build to deploy it shows you the list of check-in comments for the changes specific to that build. You can also get it to retrieve a subversion log of all changes between two builds.
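The "run a process on the server and pass through the output" idea can be sketched in a few lines of Java. This is not the actual Ephox script, just an illustration under assumptions: PassThroughRunner is a made-up name, and the command to run is supplied by the caller.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.List;

class PassThroughRunner {
    // Runs the command, copies everything it prints into sink, and returns its exit code.
    static int run(List<String> command, OutputStream sink) {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout so nothing is lost
        try {
            Process process = builder.start();
            try (InputStream output = process.getInputStream()) {
                output.transferTo(sink); // requires Java 9 or later
            }
            sink.flush();
            return process.waitFor();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the process", e);
        }
    }
}
```

A web page handler could call something like `PassThroughRunner.run(List.of("ant", "-f", "build.xml", "deploy"), responseStream)`, where the build file name, the `deploy` target and `responseStream` are all assumptions for the example, and treat a non-zero exit code as a failed deployment.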

Switching to a deployment server has had a few benefits at Ephox – firstly, it means that your local machine is free to continue work while the build uploads. Even though an upload can be done in the background, the computer is tied up doing the build and running all the tests for a while, plus the developer can't make changes to that source check out until it's done. It all becomes a hassle, so moving that over to a separate machine makes it easier to get on with the next task straight away.

More importantly though, it means that every single build that is sent out to a client is built the same way, from a completely clean check out2 and we know that every single test has been run against it and passed. We've had a lot of success with eliminating regressions in builds we send to clients and the central build server was a good way to make sure we didn't cut corners and skip the tests "just this once". These days running the tests is so ingrained in our culture we could probably lose the server and still not be tempted to cut corners.

That same deployment system can also deploy new builds to all our internal systems with a single command. Again, it's just a matter of copying files around so it doesn't need to be versioned with our source code, but it has given us a huge amount of testing and feedback because the whole company is always using the latest cutting edge build from development. It certainly keeps the engineers focussed on not breaking stuff too.

Looking back on it, for the small amount of time required to set it up and the almost non-existent maintenance required, setting up a central distribution server has been one of the biggest pay-offs we've seen for improving our development process. Every company should consider doing it.

1 – in fact, some would argue that automating deployment is part of automating the build

2 – we configure cruise control to completely delete the checked out source tree and check out a fresh copy to make sure it is in a pristine state. We have a faster build that runs on every commit that just does an update, but that build stops as soon as it has run all the tests, so no distribution is made.

What Is Included In The 10 Minute Build?

August 28th, 2006

Having a fast build is an important part of XP so that the continuous integration cycle doesn't take too long. Do people usually see this as including all the acceptance tests as well or does this just include the developer's tests?

How do people handle acceptance tests failing? Does it cause the build to fail or not? What about acceptance tests that have been added for the current iteration but haven't been implemented yet?

At Ephox, we treat acceptance tests as critical tests that must never fail and we run them on every build as part of our continuous integration. We do however have an in progress directory where we put acceptance tests that are yet to be implemented and currently our build system doesn't run them at all. At some point we'll update the build script to run these tests, probably as part of the nightly build, and report on which are passing and failing. We also need to get more into the habit of monitoring these tests and making sure they are moved into the acceptance tests proper once they are passing.
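One way to picture the split between acceptance tests proper and in-progress ones is a selector that partitions test files by directory. The sketch below is only an assumption about how such a build step could work; the directory name `inprogress` and the class AcceptanceTestSelector are invented for the example.

```java
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

class AcceptanceTestSelector {
    // Hypothetical directory name for tests that are not implemented yet.
    static final String IN_PROGRESS_DIR = "inprogress";

    // True if any directory on the path is the in-progress directory.
    static boolean isInProgress(Path testFile) {
        for (Path part : testFile) {
            if (part.toString().equals(IN_PROGRESS_DIR)) {
                return true;
            }
        }
        return false;
    }

    // Tests that must pass on every build; a failure here fails the build.
    static List<Path> mustPass(List<Path> allTests) {
        return allTests.stream()
                .filter(test -> !isInProgress(test))
                .collect(Collectors.toList());
    }

    // Tests that would only be run and reported on, for example in a nightly build.
    static List<Path> reportOnly(List<Path> allTests) {
        return allTests.stream()
                .filter(AcceptanceTestSelector::isInProgress)
                .collect(Collectors.toList());
    }
}
```

With this kind of split, moving a test out of the in-progress directory is what promotes it to a test that can fail the build.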

I need to think a lot more about how to write acceptance tests as well – our current tests are written from the user's point of view, so they are end to end tests and tend to be slow to run. The advantage is they comprehensively test what the user expects, instead of just looking at part of the system's internals and assuming that the full code path works. There seem to be advantages and disadvantages with both forms, but I do find that having end to end acceptance tests makes me feel safer. Having efficient atomic and integration tests that test the expected internals for those acceptance tests lets me work fast with a high level of confidence because in most cases if the atomic tests pass, the acceptance tests will too. It's nice to have the acceptance tests making sure nothing falls through the cracks of the atomic tests though.