Mocks Are A Sometimes Food

September 11th, 2006

There's an interesting pattern when you start doing TDD and trying to make your tests as atomic as possible[1]. First of all you wonder how anyone could ever get far with completely standalone classes that don't interact with anything - obviously a program needs some level of communication between classes. Then you discover mock objects. These wonderful little gems allow you to have communication between classes but still test them independently. Pretty soon you're going on a Cookie Monster style binge session with mocks. Everything can, should and will be mocked out, and there's no longer any need to worry about making your classes keep to themselves: all those external dependencies can just be mocked out.

At some point though, you get an error about the maximum PermGen space being reached. You've created so many mock classes that the JVM no longer has room to load class definitions. What's worse, your atomic tests, which should be running lightning fast because they're so atomic, aren't. This is the point where you learn that mocks are in fact a sometimes food. They're a nasty hack which allows you to pretend the world is all happy and isolated even though it's not.

Now, there are certainly architectures you could use which would completely avoid the need for mocks while still allowing inter-class communication, but it can become mind-bendingly difficult to work out where the heck those messages are going, and the performance implications of breaking things up at this level are pretty severe. A completely separated design is often beneficial at a component level, but it's usually impractical at the class level.

So mocks are a useful tool to help test things that need to interface with other classes but are largely independent. I've taken to the rule of thumb that if I need to use more than one mock in a test I'm probably doing something wrong - if I'm using more than three mocks I really need to rethink my approach.
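As a rough illustration of that rule of thumb, here is a minimal sketch of a test that needs exactly one mock. It uses Python's standard `unittest.mock` purely for brevity (the code discussed in these posts runs on the JVM), and `Greeter` and its `notifier` collaborator are hypothetical names invented for the example.

```python
from unittest.mock import Mock


class Greeter:
    """Hypothetical class under test: builds a message and hands it to a notifier."""

    def __init__(self, notifier):
        self.notifier = notifier

    def greet(self, name):
        message = f"Hello, {name}!"
        self.notifier.send(message)
        return message


def test_greet_sends_one_message():
    notifier = Mock()  # the single mock this test needs
    greeter = Greeter(notifier)

    result = greeter.greet("Cookie Monster")

    # Verify the one interaction with the one collaborator.
    notifier.send.assert_called_once_with("Hello, Cookie Monster!")
    return result
```

If a test like this started needing a second or third `Mock()`, that would be the cue to ask whether the class under test has too many collaborators.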

How do other people approach the use of mocks and other related techniques?

1 - I'm not yet sure, but perhaps it would be better to focus on making them run as fast as possible, rather than as atomic as possible. They would still need to be focussed enough to make it clear where the problem is instead of just showing that there is a problem.

2 - possibly hidden away in the language runtime

Stripping Styles As Part Of Sanitation

September 10th, 2006

Somewhere along the line I stumbled across Mark Pilgrim's description of how the universal feed parser sanitizes HTML. A key part of this is that the universal feed parser strips styles because IE on Windows can wind up executing part of that style as JavaScript.

While obviously at the moment this has to be done, it seems completely unreasonable to me that any content that wants to be syndicated accurately needs to avoid CSS entirely. It seems to me that rather than stripping style information, we should be pressuring Microsoft (and any other affected browser vendors) to fix the browser so that it doesn't ever treat information in CSS as executable code.

There are two aspects of RSS security to consider. Firstly and most seriously, there is the possibility of security risks posed by things like the embed, object, applet, script and a bunch of other tags. Out of the box these can allow cross-site scripting exploits and with user intervention (accepting a security dialog) can completely hose a user's machine. The security dialog is a lot less protection than it would normally be because the user may not realize they are viewing syndicated content and that the site they're on isn't the source of the security dialog. These things are big dangerous security risks and obviously need to be stripped if any of the aggregated content sources is not as trusted as the end publication point[1]. Another point of entry here is that the syndicated content is effectively moved into a more trusted domain (e.g. it's viewed from your web server at localhost instead of from some random remote server on the web), thus it gets more permissions than it should otherwise. Stop trusting localhost. IE has already moved to treating any file on your hard drive as completely untrusted and preventing it from executing JavaScript etc. We need to treat all content coming into our browsers as if it came from a random site on the internet - the concept of different security zones in browsers has always been a hideously bad idea.
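To make the stripping concrete, here is a minimal sketch of that kind of sanitizing pass, written with Python's standard `html.parser`. It is not the universal feed parser's actual implementation: the tag sets, the `FeedSanitizer` class and the `sanitize` function are assumptions made up for this example, and a production sanitizer would use a whitelist of known-safe tags and also check URL schemes in attributes such as `href`.

```python
from html import escape
from html.parser import HTMLParser

# Assumed blacklists for illustration only, based on the tags named above.
DROP_WITH_CONTENT = {"script", "object", "applet", "style"}
DROP_TAG_ONLY = {"embed"}  # void element, so there is no content to skip


class FeedSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # greater than zero while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag in DROP_TAG_ONLY:
            return
        # Drop inline styles and on* event handlers, keep everything else.
        kept = [(k, v) for k, v in attrs if k != "style" and not k.startswith("on")]
        rendered = "".join(
            ' {}="{}"'.format(k, escape(v or "", quote=True)) for k, v in kept
        )
        self.out.append("<{}{}>".format(tag, rendered))

    def handle_endtag(self, tag):
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
            return
        if self.skip_depth or tag in DROP_TAG_ONLY:
            return
        self.out.append("</{}>".format(tag))

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html):
    parser = FeedSanitizer()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `sanitize('<p style="color:red">Hi<script>alert(1)</script></p>')` keeps the paragraph and its text but drops both the style attribute and the script element.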

Secondly, there is the potential for syndicated content to mess up the display of the final site by including CSS. If we were in a world where executable code couldn't be embedded in CSS, then the worst that could happen is that the content is rendered badly. No cross-site scripting, no security dialogs popping up as if they were from a more trusted site, just a bunch of things laid out on the page wrong. The most serious case of this that I can think of is using CSS to overlay form elements, say for a username and password from syndicated content, over the real login box on the page. Similarly, it would be possible to position HTML elements on the page such that they looked like they came from the trusted site instead of from the syndicated content.

If the final site was a public site like Google News, then such messing up of the content, even if it wasn't devious, would be extremely bad and obviously stripping style information is really the only way to go. If however, the final destination site is my news aggregator, then the chances are that I don't really care. I know everything that is displayed comes from syndicated content and if something messes up the display I'll drop that feed from my subscriptions. Web-based aggregators fall into the first category, but a planet aggregator I run locally falls into the second. Readers like NetNewsWire that display the stories separately instead of combining them are an even safer case - the content is completely delineated from everything else, has no extra abilities to interact with other things and is essentially the same as viewing the page on the web. This is particularly so in NetNewsWire's case since it uses WebKit, so it's even the same rendering engine. As long as the rendering engine is set up to treat the content as a random web site instead of in some ultra-stupid trusted mode, I can't see a way that any content could be malicious.

The best way that I can see to solve this is to improve the final renderer rather than trying to limit what can be aggregated. Firstly, no browser should have a trusted mode - everything should be treated as if it was downloaded from a remote site somewhere on the internet.

Secondly, browsers need to provide a way to specify that part of the page has been syndicated and is actually from a separate source. Any syndicated source would run its JavaScript in a fresh sandbox, and it would have no access to any part of the DOM outside of that specific syndicated content area or to any other resource - it is effectively a brand new web page that just happens to render as part of the existing one. There is no way to jump back out of that syndicated block from inside it. Aggregators would then just have to make sure that the syndicated content is well-formed XHTML[2] and put it inside a DIV with the special syndicated content marker. Everything inside that DIV is now separated from the rest of the page. Further, nothing from inside that DIV can render outside of its boundaries.
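The aggregator's side of that scheme could be sketched roughly as follows, again in Python. The marker itself is entirely hypothetical: no browser recognizes the `syndicated-content` class or the `data-source` attribute used here, and `wrap_syndicated` is a name invented for the example. The point is only the well-formedness check, which rejects a fragment that tries to close the wrapper early.

```python
import xml.etree.ElementTree as ET
from html import escape


def wrap_syndicated(fragment, source_url):
    """Wrap a feed fragment in a marked DIV, refusing anything not well-formed."""
    try:
        # Parse inside a dummy root so fragments with several top-level
        # elements are accepted; a stray closing tag makes this fail.
        # Note: named HTML entities such as &nbsp; are rejected by this parser.
        ET.fromstring("<root>{}</root>".format(fragment))
    except ET.ParseError as err:
        raise ValueError("syndicated content is not well-formed") from err

    # Hypothetical marker: the class and attribute names are assumptions.
    return '<div class="syndicated-content" data-source="{}">{}</div>'.format(
        escape(source_url, quote=True), fragment
    )
```

A fragment such as `</div><p>escaped</p>` raises `ValueError` instead of being wrapped, so it never gets the chance to become part of the main page.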

Using the syndicated content flag would only ever add restrictions, for example if you were browsing a site that you had explicitly disabled JavaScript on, syndicated content in that page could not use JavaScript even if the syndicated content reportedly came from a completely trusted site. Browsers could choose to never allow syndicated content to trigger a security dialog, or use both the original page URL and the reported syndication source in the dialog so the user is effectively asked to trust both sources. Personally, I'd heavily lean towards just not allowing syndicated content to trigger security dialogs because users tend to approve them without thinking.

The essential part of the syndication flag isn't that it is a way of saying where the content comes from, it's a way of saying "this site doesn't vouch for this content, don't trust it even if you trust me, oh and by the way, I think it came from over there". Now, I haven't thought about this for more than about half an hour, so there may be a reason it wouldn't work. Regardless of the feasibility of this scheme though, we need to come up with some way of syndicating things without giving up the latest, and downright old, web technologies and the enhanced user experience they bring. Even if we have to put up with stripping styles from feeds for now, what can we do to make it safe to keep them in the future?

1 - In other words, if your intranet pulls data via RSS from your internal systems you probably don't need to worry - all the systems and thus all the content is under your own control and RSS is just being used as a transport mechanism.

2 - thus avoiding an extraneous closing DIV tag which would allow the content to suddenly become part of the main page again

End To End Testing And The 10 Minute Build

September 7th, 2006

At least in my mind, there seems to be a clash of aims in XP. You want to make sure that you have complete confidence in your tests so that you can go faster and reduce the cost of change. To achieve this you write lots and lots of tests - until your fear of something breaking turns to boredom from writing tests you know will pass. Most of those tests are atomic and test a particular component, but fear lies in the gaps between components too so you regularly get recommendations like Ola Ellnestam's on my previous post, Testing Your Setup Code:

Note: When you're TDDing and getting the very loose coupling every one is longing for ;-) you must be aware that integration tests and acceptance tests are an absolute necessity. Since this is the only way to really test your configuration.

On the other hand, to be able to get rapid feedback you want a fast build - under 10 minutes. From James Shore's draft of the 10 Minute Build chapter of his upcoming book, The Art of Agile:

For most teams, their tests are the source of a slow build. Usually it's because their tests aren't focused enough. Look for common problems: are you writing end-to-end tests when you should be writing unit tests and integration tests? Do you unit tests talk to a database, network, or file system?

You should be able to run about 100 unit tests per second (test_driven_development). Unit tests should comprise the majority of your tests. A fraction (less than 10%) should be integration tests, checking that two components synchronize properly. Only a handful, if any at all, should be end-to-end tests (testing).

The problem is, if you want to test your software comprehensively and be able to have confidence that your tests will tell you if you've broken something, I can't see how you can avoid writing a lot of integration tests. I also don't see why you would avoid automating end to end tests and running them very regularly. The reality is that you need to have a QA process that tests the application from how users actually use it - not from the point of view of this bit of the system or that bit of the system. You need to verify that when a user clicks on this menu item, the event is sent over to the editor pane which interprets it as a bold action and instructs the document to apply bold and finally that when the document serializes it comes out with a STRONG or B tag (depending on the user's preferences) around the text.

If you don't have a test that verifies that the message from the menu bar actually gets to the editor pane, how can you have confidence that bold works? How can you have confidence that the complex changes you're making to the document result in the right end effect when the document is serialized?
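To show the shape of the test I have in mind, here is a toy sketch in Python. None of it is our real editor code: `MenuBar`, `EditorPane`, `Document` and `end_to_end_bold` are hypothetical stand-ins, reduced to just enough behaviour to exercise the whole path from menu click to serialized output.

```python
class Document:
    def __init__(self, text, bold_tag="strong"):
        self.text = text
        self.bold_tag = bold_tag  # user preference: "strong" or "b"
        self.bold = False

    def apply_bold(self):
        self.bold = True

    def serialize(self):
        if self.bold:
            return "<{0}>{1}</{0}>".format(self.bold_tag, self.text)
        return self.text


class EditorPane:
    def __init__(self, document):
        self.document = document

    def handle_event(self, event):
        # Interpret the menu event as an editing action on the document.
        if event == "bold":
            self.document.apply_bold()


class MenuBar:
    def __init__(self):
        self.listeners = []

    def add_listener(self, listener):
        self.listeners.append(listener)

    def click(self, item):
        for listener in self.listeners:
            listener.handle_event(item)


def end_to_end_bold(bold_tag):
    """Drive the whole chain: menu click, editor pane, document, serialization."""
    document = Document("hello", bold_tag=bold_tag)
    menu = MenuBar()
    menu.add_listener(EditorPane(document))  # the wiring the test is really about
    menu.click("bold")
    return document.serialize()
```

Atomic tests of each class could all pass while the `add_listener` call is missing; only a test that goes through the menu would notice.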

I suspect there are a couple of contributing factors to my confusion around this issue. Firstly, I work with text and frankly there is nothing less predictable and safe in software than text. It seems simple on the face of it, but there is a huge amount of complexity that goes on behind the scenes and users absolutely demand that the editing experience is completely seamless and intuitive. There are few other environments where the number of possible program states is so incomprehensibly huge in a practical sense, where the differences really matter. On top of that, there's a ridiculous number of possible user actions that are all available at the same time and all of them interact with the program state in subtly different ways to try to best match the user's expectation.

In short, if there were any environment where you should be afraid of making changes, it's code that deals with text. That fear turns into a desire to write lots of tests and make them as close as possible to what the user is actually doing. Capturing all the subtleties of the state and embedding them in an atomic test is difficult to get precisely right - you tend to cover the most important details but miss one or two bits of state that can come back to bite you when you least expect it. Having end to end tests resolves that sense of fear, because you know that the program is operating just like when the user actually uses it - you can't have missed a detail somewhere, it's all the real deal.

The other contributing factor is that I haven't had the opportunity to work on a high quality code base that has very comprehensive atomic tests. Ephox's code base is quite old, it's mostly high quality code but it has some back alleys where ambushes lie in wait and it's well tested but mostly with integration level tests, not atomic tests. It's no surprise then that I don't have complete confidence in the atomic tests - they just don't cover enough of the application. That said, my confidence in the atomic tests is definitely growing as we add more tests and get better at knowing what to test and how to test it.

The bottom line is that now and for the foreseeable future, I'm not going to have enough confidence in the atomic tests to get rid of the slow end to end tests. However I do see it as important to improve our atomic tests and the confidence we have in them. Being able to verify that your changes haven't caused problems in 5-10 seconds by running the appropriate tests is a huge boost to productivity. Being able to have confidence that everything will work in a minute or two by running all the atomic tests is extremely powerful too. Despite that, knowing that Bob the Builder is going to come along behind you and run a comprehensive suite of end to end tests as well is priceless.

Refactoring To Make Improvements Possible

September 7th, 2006

I've had an interesting experience the last couple of days - I've been trying to add some major new functionality into our list code. The code is exceptionally well tested and fairly easy to understand but it wasn't clear how to write a test that described the functionality I wanted to add.

I started off by writing an acceptance test for what I wanted and then started drilling down to what I needed, but it was leading me off into a rewrite of our list code because it was too difficult to see how to reuse the existing code for what I wanted. In the end, I decided to almost reverse refactor the existing code to extract out the logic that I needed. I say reverse refactor because instead of making the code simpler to read and understand, it was making it more complex - it really felt quite wrong to be applying the refactorings.

By the time I left this afternoon though, I'd reached the point where the logic I wanted to reuse was quite clearly separated and the design was starting to be cleaned up again. I've got some duplicate code lying around because I haven't finished cleaning up yet, but I'm quite happy with the way it's all shaping up. What has surprised me the most is how much more code I can reuse than I had expected to. Taking the refactorings one step at a time and depending on the tests to make sure I had things right has led to a much better design and a clearer path forward than I had ever thought possible.

One question that remains is what new tests I should be writing. While all the code has simply been refactored and is still covered by the old tests, the refactoring has exposed a bunch of new opportunities for atomic tests. For example, a couple of new classes have been split out - it's possible to add atomic tests for those to verify that they do the subset of the task that they claim to, no more and no less - currently we only have tests for the task as a whole. So far I don't have any fear that it's wrong so I'm not worrying about writing tests, but I'm trying to stay alert to my fear level to be sure that tests are added as soon as they provide benefit.

Another Thing To Dislike About Obnoxious Referrer Links

September 2nd, 2006

I complained before about Obnoxious Referrer Links and now Andy has stumbled across problems they cause in the real world.

It turns out that having a meme tracker for the feeds you subscribe to can produce some interesting results. The big issue is that some feeds either rewrite URLs to include a redirect through their server, or strip all HTML and just give you a snippet of the article. This makes it basically impossible to determine if two items link to the same article.
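For the subset of feeds whose redirect links carry the real destination in a query parameter, a tracker could try to unwrap them before comparing. The sketch below assumes a parameter named `url`, which is a guess rather than any particular feed's format, and `unwrap_redirect` and `same_article` are names invented here. Redirects that use opaque IDs would still need to be followed over the network, and stripped snippets have no link to recover at all.

```python
from urllib.parse import parse_qs, urlsplit


def unwrap_redirect(link, param="url"):
    """Return the target embedded in a redirect link, or the link unchanged."""
    query = parse_qs(urlsplit(link).query)
    targets = query.get(param)
    # Only accept values that look like absolute web URLs.
    if targets and targets[0].startswith(("http://", "https://")):
        return targets[0]
    return link


def same_article(first, second):
    return unwrap_redirect(first) == unwrap_redirect(second)
```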