When Should You Rewrite?

May 28th, 2006

Greg picks up on my previous post about XP principles and how they help avoid rewrites. I thought I should explain in more detail why rewrites are a bad thing, along with my thoughts about when and how you should do them anyway.

Programmers for some reason seem to think any code they didn't write, and often any code they wrote some time ago, is poor quality, misguided and generally crap. Often this is quite true, but the degree to which the code is bad is usually significantly less than the initial impression it gives. That is, when you first look at a piece of code and start working your way through it, you feel as if the programmer was completely brain-dead and it's amazing the software worked at all. Except on very rare occasions, the code does work with just minor bugs or even no bugs (at least none that have been discovered), so the inclination to think that it can't possibly work is just a form of panic reaction from your brain while it struggles to comprehend the new code. It's very easy to condemn a piece of code in those first few moments when you don't actually understand it and are just seeing a mess of symbols and a bunch of bad coding practices, but doing so condemns your rewrite.

XP has a notion that the code is the documentation (this particularly applies to test cases in XP) and while you may debate whether or not that's a good idea, it's hard to disagree that the code contains knowledge: information about little gotchas, user requirements and previously fixed bugs. That information must be preserved if you want to successfully fix the code either through refactoring or rewriting. In the first few panicked moments where you just can't believe how unbelievably bad this code is, you can't possibly detect and document all the knowledge that resides in the existing code. That code is currently your only lifeline and your greatest asset - even if you do wind up throwing it out, treat it like your best friend right up until the point that you do.

It's this concept of extracting the knowledge from the code that drives people (including Martin Fowler in the early chapters of "Refactoring") to make writing tests for the code the first step of any refactoring. The same applies for a rewrite: first write tests to capture exactly what the current code does in as much detail as possible. If at all possible, write automated tests, but for things you can't automate, write manual tests and make sure you run through them regularly while you work on the code. For your first pass, write the tests assuming that the old code is absolutely perfect and bug free - whatever output it gives you should be considered the right answer. If you have a test that looks like it shows a bug in the old code, by all means add a note about it right there in the test. That way, when you come to make that test pass in your rewrite, or when that test starts failing after your refactoring, you can work through the impacts of the change you've made and decide on what the best result actually is in that case.
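To make that first pass concrete, here is a minimal sketch in Python of recording what the old code does and checking a replacement against it. Everything in it is made up for illustration: `legacy_format_price` stands in for whatever old code you're facing, `rewritten_format_price` for a first attempt at replacing it, and the helper names are assumptions rather than part of any real test library.

```python
def legacy_format_price(cents):
    """Hypothetical stand-in for old code we don't fully understand yet."""
    dollars, remainder = divmod(abs(cents), 100)
    text = "$%d.%02d" % (dollars, remainder)
    # Quirk: negative amounts come back wrapped in parentheses.
    return "(%s)" % text if cents < 0 else text


def record_expected(func, inputs):
    """First pass: whatever the old code returns is treated as the right answer."""
    return {value: func(value) for value in inputs}


def check_against_expected(func, expected):
    """Return the inputs for which func disagrees with the recorded behaviour."""
    return [value for value, want in expected.items() if func(value) != want]


INPUTS = [0, 5, 99, 100, 123456, -250]
EXPECTED = record_expected(legacy_format_price, INPUTS)
# NOTE: -250 gives "($2.50)". That looks like it might be a bug, but it could
# just as easily be a requirement. Decide when the rewrite reaches this case.


def rewritten_format_price(cents):
    """Hypothetical first attempt at a replacement."""
    sign = "-" if cents < 0 else ""
    dollars, remainder = divmod(abs(cents), 100)
    return "%s$%d.%02d" % (sign, dollars, remainder)


# The only disagreement is the case that was flagged in the note above.
mismatches = check_against_expected(rewritten_format_price, EXPECTED)
```

The point of the note next to the recorded data is that when the mismatch shows up, you already know it's a case that needs a decision rather than a surprise.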

Once you've extracted the knowledge out of the old code, you're finally ready to decide if you should rewrite or refactor. If after all this analysis you still think the old code is unfixable, then you are probably in the right situation for a rewrite. In most cases though you will now have an understanding of why the code was so complex and hard to understand and you will probably have a number of insights into better ways it could have been done - that's the time when you should refactor. Start with the most obvious improvement you could make and do it, then look for the new most obvious improvement. Repeat that process over and over until you're happy with the code.  Just make sure you run all the tests after each change.

If you've decided to rewrite you have some more planning work to do. You want to avoid working off the main branch as much as possible or other team members might end up relying on the old code that you're about to change - any of your bug fixes or improvements might impact on your team members. Besides that, bad code generally isn't nicely modularized so you're probably going to have to make changes to a number of areas of your code base - that makes merging back in difficult if you leave it too long. To avoid these problems, see if you can break the rewrite down into smaller chunks - keep most of the old code (possibly with an abstraction layer that provides the interface you eventually want to have) and replace one part of it, then merge back with the main branch. Repeat this process of replacing small chunks until you're happy with the code. Note here that you don't repeat until the old code is completely replaced, some of the smaller chunks might just need refactoring and can be kept around, some of them might be quite good once you get rid of the bad code from around them and don't need any major changes. Keep your eyes open for ways of reducing your workload but still ending up with the same quality (or better) code.
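As a rough sketch of that abstraction layer idea, the Python below puts a thin front over the old code and swaps in one replaced part at a time. All of the class and method names (`LegacyReports`, `NewSummary`, `Reports`) are hypothetical, and a real code base would need something more involved than a dictionary of replacements.

```python
class LegacyReports:
    """Hypothetical old code: it works, but nobody enjoys reading it."""

    def summary(self, items):
        total = 0
        for i in range(len(items)):
            total = total + items[i]
        return "TOTAL=" + str(total)

    def detail(self, items):
        return ";".join(str(item) for item in items)


class NewSummary:
    """The one chunk that has been rewritten so far."""

    def summary(self, items):
        return "TOTAL=%d" % sum(items)


class Reports:
    """The interface we eventually want; callers only ever talk to this."""

    def __init__(self, legacy, replacements=None):
        self._legacy = legacy
        # Maps an operation name to the object that now implements it.
        self._replacements = dict(replacements or {})

    def _implementation(self, name):
        owner = self._replacements.get(name, self._legacy)
        return getattr(owner, name)

    def summary(self, items):
        return self._implementation("summary")(items)

    def detail(self, items):
        return self._implementation("detail")(items)


# summary() is served by the new code, detail() still by the old code.
reports = Reports(LegacyReports(), {"summary": NewSummary()})
```

Each time a chunk is replaced and merged back, one more entry moves across, and anything that turns out to be fine as it is simply stays with the old implementation.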

As you go about the rewrite, make sure you write lots of tests - you already have your acceptance tests written, they're the ones that document the output of the previous code, but make sure you write integration and unit tests, plus tests for whatever new functionality you wind up adding. If you're not doing XP that's fine, you can still do your big design up front, you can even write your tests after you write your code, but don't consider an acceptance test passing until you have all the lower-level tests for that code written. The last thing you want is to wind up with another piece of crap code after you've spent so much time and effort replacing the old, previously working code. You need to take it slow and do it right - the code was probably bad last time because it was rushed. I suggest something like acceptance test driven development.

Always remember, you're probably no smarter than the programmer that came before you - your genius is not going to make the replacement better than the original, you need to leverage some other advantage. That advantage might come from the new knowledge and experience with the related libraries you're using. It might come from using a new library to do most of the heavy lifting for you. It might come from actually thinking about the design ahead of time, it might come from not thinking about the design ahead of time and using TDD. It might come from writing better documentation or more tests. It might just come from actually focussing on quality and not rushing. Ideally your advantage would come from a number of different things so that the replacement code is the best it can be. Your management will not be happy if this bit of code causes complaints in the future.

Speaking of management, it's actually business needs that are most impacted by doing rewrites. The trouble for the business is that they wind up paying a number of really expensive developers to sit around and recreate things that the business has already paid to have developed. Even worse, while that's happening much less, or potentially no, value adding work is being done. To the business the rewrite might have strategic value, but it's very hard to quantify and measure. Engineering will tell the business that they'll be able to add features faster in the future after this rewrite but they can't say how much faster or which features will benefit. There may not even be a significant number of users complaining about product quality, so there's no easily visible reason for the business to support the cost of a rewrite. As part of deciding to do a rewrite, you need to weigh up these business interests. Will the company go broke before you complete the rewrite? Will the company miss a big, long-term opportunity because it can't add features while the rewrite is happening? Has it been a long time since a new version went out and would it be better to delay the rewrite until you get a new version out so that sales and marketing have something to sell while you work? What does the business need to do to handle the increase in support after your rewrite ships (you should be expecting to lose stability in the short-term but gain it in the long-term in most cases)? How much of a reduction in support will you see after the new code has settled in? How much easier will it be to add new features?

Always remember, rewrites are hard, they take much longer than you thought and introduce more bugs than you thought. As Greg said:

Naturally, if the decision to rewrite is taken, it should only be done if there is a clear commitment to first learn the lessons from the failed implementation and to create a design and a methodology that have a reasonable likelihood of success.

Knowing The Importance Of Code

May 25th, 2006

Sometimes you write code that is really important, sometimes you write code that is not and other times you write code that is somewhere in between. Should you apply the same quality standards to all of that code?

It really comes down to a question of value - code that you write once, run and then throw away obviously doesn't need to be pretty and certainly doesn't need any documentation. What about code that you keep around, make the odd change to and run regularly but is of low importance? What if it just doesn't matter if the code breaks? Where is the value in making the code robust and easy to maintain?

Over the past week or so I've been working on code that fits into that grey area between code that obviously needs to be high quality and code that just doesn't matter. I've been writing an AppleScript to pull various bits of data about our development progress and code quality and generate a bunch of graphs with that data. Now if I were really good at AppleScript, I could probably whip up high quality scripts in a short time frame, but I'm not, so just getting it working can be time consuming. I just can't see the value proposition in spending a lot of time cleaning up this script when it doesn't matter if it fails and the number of changes I'm likely to need to make to it is very small. Besides which, the time required to completely reimplement the script is pretty low - most of the time I've invested in it was learning how to AppleScript Excel and how to use the various command line programs it uses to parse the data. Now that I have that knowledge it's pretty simple to do.

I'm also inclined to not worry about it too much, because if I were concerned with the script I wouldn't have written it in AppleScript, where it depends on OS X, Excel, GUI Scripting and various other things - I'd probably write it in Java as a full blown application. Yes it would take much longer but it would be something that anyone on the team could run and would be expected to have the skills to maintain. I doubt anyone else on our team has ever seen an AppleScript, let alone had to write one. Does it matter? Not really. The extra time to write it in a language that the rest of the team knew would have made the project cost more than it was worth. Let's face it, having a few basic graphs about your project's health isn't that valuable when you already have the reports available.

The catch however, is that if this code were to ever become important I need to make sure that I go back and make its quality match its importance. That is, I'd need to take the time and refactor the code to make it more robust and easier to maintain - I may even need to rewrite it from scratch so it doesn't have so many dependencies. For now though, I'm happy that it is automating a process that I could do by hand and bringing value to the team.

Judging the value of code can be hard at times - it's easy to think that some part of your product is rarely used or unlikely to break, but there's value in avoiding software entropy - it's actually costly to introduce bad code to your code base even if it doesn't ever break because it encourages you to get sloppy in other places in your code. A while back I argued that code should be beautiful and I stand by that, including the follow up: Beauty Is Only Skin Deep. In those entries I was talking about any code that you ship to clients or anything that impacts on your actual product. Don't risk failing your users because you didn't care enough about your code. At the same time, don't let the cost of writing beautiful code, or even good code, put you off automating tasks that you do repetitively, or on little tools to make your development life easier - just don't let that code get anywhere near your actual product.

What's the lesson here? Ask "Where is the value in what is being done?" constantly, including when you decide on the required quality levels.

Return Of The Killer Smart Tags

May 21st, 2006

Well maybe not so much "killer"… Anyway, Scoble mentions the return of SmartTags due to bloggers choosing to add them to their sites. I pretty much never actually go to bloggers' sites unless I want to write a blog post about them, in which case I open a new tab in NetNewsWire to remind me for later, so I don't notice them much. When I do see them though they really annoy me - they look far too much like hyperlinks and distract far too much from the content.

It's odd as well that they never feel relevant. Just because a post mentions the word apple doesn't mean I want to buy one (the fruit or the computer). What tends to disturb me more though is the forums I've been seeing a lot lately that use this kind of advertising. That's putting ads in the middle of your users' words and I'd consider it outright unethical.

Now using this technology to provide inline definitions would be useful. Interestingly one of our engineers implemented a system like this as a plug-in for EditLive! for Java with the idea of being able to define glossary terms and have the definition display on the page if it's wanted but be out of the way if not. You can imagine how useful this would be in industries that use a ridiculous number of acronyms and specialized terms.

In fact, on our internal wiki there are a whole bunch of pages dedicated to defining terms and acronyms that are specific to our business which would be really useful to be able to display like this automatically. I find it really annoying to auto-link every capitalized word like most wikis do as it leads to too many links so I'd restrict this to just showing a pop up definition if the page was specifically marked as a definition (or perhaps if it was below a certain length). I'd also change the rendering to something much more subtle than the double green underline that seems to be typically used. Maybe a dotted pastel colour?

Framing The XP Principles

May 21st, 2006

A while back Ben Hyde wrote his thoughts on the key XP principles in What every you doing, it’s wrong! I'm not sure I fully comprehend exactly what Ben is trying to say but a lot of it seems opposed to the way I see XP in theory and to the experience I've had in implementing XP.

I'll start with Ben's rewording of the summary of Extreme Programming (see also James Mead's original):

  • integrate continiously assures you take only small steps
  • writing tests first assumes the code and platform has nothing to say about the problem
  • pairing programmers assures you leave a large swath of good talent for your competors to hire
  • refactor mercilessly assumes you have large code bases rather large installed bases, poor you.

Firstly, integrating continuously does ensure you take small steps and that's a good thing. Ask Netscape about how successful throwing out an entire code base and rewriting it is. Sure Firefox is becoming popular now, but Netscape has all but gone out of business because of one bad technical decision to rewrite their entire browser instead of taking small steps and fixing the existing one.

There are more benefits to small steps - by taking small steps you get quicker feedback, you know sooner that something is broken and there are fewer changes that could have broken it. You're also less likely to break things in the first place because small steps are easier to fully understand, so it's less likely that you'll miss something.

Most importantly though, taking small steps doesn't mean that you can't make large changes. You just make large changes one step at a time. You can achieve the effect of any large step you might want to take in small steps, with the added benefit that you know sooner if it's not going to work and you may realize you don't need to go the whole way because the benefits are achieved by just doing part of the work.

I'm not sure what the criticism of test first is meant to be - writing tests first means you do a small piece of design up front (by writing a test) and then write the code to execute on that design. When you write your tests you should be considering the strengths and limitations of both the existing code base and the platform you're working with. In addition, you get the ability to verify that your code matches your design and that it does what you think it does by running the tests. That doesn't mean you'll have bug free code, but you will have code that works in all the cases you thought to write a test for. Considering you have a test for every branch and every side-effect (or at least you should if you're actually doing TDD), that is a pretty high level of quality and provides a lot of confidence in your code.

The pairing programmers rewrite is the biggest falsehood in this list. If you have a good team who actually have social abilities then pairing is fun. Our office comes alive with happy chatter and laughter as people pair because we've built a team that gets on well together and focussed on enjoying our work. Frankly, I don't care how brilliant you are at writing code, if you don't have the social skills to interact with your co-workers you are not a talented programmer, you're a code monkey and I wouldn't want you on my team.

It's not XP that makes communication important, software engineering has always required good communication skills and it's a shame that so many people in the field don't realize that. You almost always have to work with a team in software development, you almost always have contact with other parts of the business and if you want to develop good software, your engineering team should be in regular contact with customers so they understand their problems. Besides which, wouldn't it be nice to have confidence that when your biggest customer has problems you can send one of your developers out to help them fix the problem? You can't do that unless you value social skills when you hire engineers, yet even really big companies value the ability to do just that.

Finally, the refactoring rewrite shows a complete lack of understanding about what refactoring is. Refactoring is not about changing public APIs, it's not about constantly shifting the product under your users' feet. It is about changing the implementation of APIs for the better, without affecting the end result. As Martin Fowler put it, refactoring is "a change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior" (Fowler, Martin. Refactoring: Improving the Design of Existing Code. P. 53).

If you are refactoring instead of redesigning, your users will never see the difference except perhaps for a smaller binary size (due to removed duplication in your code) and fewer bugs (due to the code being more maintainable). The functionality of your program, the public APIs and the user interface should all remain unchanged when refactoring.
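A small Python sketch may make the distinction clearer. The functions below are invented for illustration and are not taken from any real product: the "before" versions duplicate their logic, the "after" versions share a private helper, and the public function names and their output stay exactly the same.

```python
# Before: two public functions with duplicated internals.
def render_ul_before(items):
    out = "<ul>"
    for item in items:
        out += "<li>" + item + "</li>"
    out += "</ul>"
    return out


def render_ol_before(items):
    out = "<ol>"
    for item in items:
        out += "<li>" + item + "</li>"
    out += "</ol>"
    return out


# After: the duplication moves into one private helper.
def _render_list(tag, items):
    body = "".join("<li>%s</li>" % item for item in items)
    return "<%s>%s</%s>" % (tag, body, tag)


def render_ul(items):
    """Same public signature and same output as render_ul_before."""
    return _render_list("ul", items)


def render_ol(items):
    """Same public signature and same output as render_ol_before."""
    return _render_list("ol", items)


SAMPLES = [[], ["a"], ["a", "b", "c"]]
# True when callers cannot tell the two versions apart on these samples.
unchanged = all(
    render_ul(sample) == render_ul_before(sample)
    and render_ol(sample) == render_ol_before(sample)
    for sample in SAMPLES
)
```

If a change like this altered what callers get back, it would be a redesign rather than a refactoring, and that's the case where users would have a right to complain.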

As an example, Ephox has a very large installed base - not only do we have thousands of clients around the world, most of those clients deploy to a large number of end-users, add in the OEM deals we have and the number of people affected by changes to our product is pretty immense. Even more importantly, we regularly deal with very large enterprise development projects that don't upgrade regularly and have very long test cycles to deal with any changes. If we were to break our backward compatibility we'd get huge numbers of complaints from our existing clients.

Despite that, we refactor mercilessly to keep our code maintainable. In our last major release we completely refactored our HTML list editing code - users would only have noticed that bugs were fixed. In the major release before that, we completely refactored the way we handle loading the editor interface based on the configuration file we're given - none of our clients or end-users noticed, but we did because the new code was far more maintainable. In the next major release we'll have done some refactoring to nearly every part of our code, but it won't affect our public API at all. The only changes users will see are those that we deliberately decided to make because the product manager wanted to improve features or add new features, but all of it is 100% backwards compatible.

"It’s worth noting that encouraging refactoring runs counter to the advice to integrate continously. Refactoring is fundimentally a choice to buy a bag of disintegration."

This is also false. You simply refactor in small steps and integrate continuously. You should definitely not be forking a project just to refactor it, you shouldn't even create a separate branch in version control. Refactoring should happen continuously: every day you'll find something that could be better, be it because it wasn't perfect when it was first written or because the new code you've added means there are extra demands on its flexibility. Refactoring should just be a routine day to day habit so that you ensure you keep your code base in top condition all the time.

All that said, if you happen to be working on a library instead of a product, refactoring can be more difficult, particularly if you've failed to clearly denote what APIs are intended for public use and which are just available because of limitations of the language you're using. That pain only extends as far as the public APIs though, you should still refactor your internal code mercilessly.

Features vs Stories

May 18th, 2006

I realized today that I hadn't made explicit the difference in my mind between features and stories and it's an important difference. Essentially, a feature is a group of stories that are related and deliver a package of functionality that end users would generally expect to get all at once. For instance, inline table resizing is a feature (note: this is the ability to drag to resize tables, rows and columns - try it in Word). In the first pass, you'd probably have a single story for inline resizing of tables, but it would be too big to estimate. So you break it down into three stories: resize columns, resize rows and resize the table itself.

Now we have three stories - three things to develop which add value to the product and can be done fairly quickly (we're currently aiming for the biggest of our stories to take a day). There is value in being able to resize columns but not rows or the table itself - it allows users to distribute space between columns more easily to make the table look good. In fact, it covers 90% of the use cases for inline table resizing - people rarely resize the overall table or the height of rows, usually they just want to make column widths fit the data better. Despite that, shipping just that story out to users would cause confusion and complaints to support because it sets up the expectation in users' minds that they can resize tables, so why can't they resize the rows or the table?

The feature is all three stories put together, and to be able to ship without confusing our users we really want to make sure all three stories are completed before we ship - however we can still release to internal users as part of our dog-fooding program or to beta testers before we have all three, so there is value in doing just one of the stories.

There are two key advantages to breaking down features like this. Firstly, it makes estimation easier and more accurate - things that take a long time tend to be very difficult to estimate. Secondly, it allows us to track progress on the feature accurately. One of the first rules of tracking project progress is that you only ever track completed tasks, never partial ones, because people are really bad at estimating how close to completion they are. With small stories you can track progress just based on the number of complete stories without losing too much accuracy.
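The tracking rule is simple enough to sketch in a few lines of Python. The `Story` class, its `claimed_fraction` field and the figures below are all made up for illustration; the only point is that the self-reported number is deliberately ignored.

```python
from dataclasses import dataclass


@dataclass
class Story:
    name: str
    done: bool
    # What the developer says when asked "how close are you?" (hypothetical
    # field, kept only to show that it plays no part in the calculation).
    claimed_fraction: float = 0.0


def feature_progress(stories):
    """Fraction of a feature's stories that are complete; partial work counts as zero."""
    if not stories:
        return 0.0
    return sum(1 for story in stories if story.done) / len(stories)


table_resizing = [
    Story("resize columns", done=True, claimed_fraction=1.0),
    Story("resize rows", done=False, claimed_fraction=0.9),
    Story("resize the table itself", done=False),
]

# One of three stories is complete, whatever anyone claims about the second.
progress = feature_progress(table_resizing)
```

With stories that take a day or less, the most this can under-report is a day's work per story in progress, which is why the accuracy loss stays small.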

Breaking down stories is pretty hard though because you have to make sure that they still deliver value. We had one story recently to develop a particular properties panel that would become part of a larger dialog. Unfortunately we were given the panel story before the overall dialog story so we developed a panel that the user couldn't possibly access. Worse still, we wasted time beginning to create a dialog just for that panel because we assumed it was required. We should have picked up on the lack of value in the story and asked the client about it before we started work - we probably would have done the "create the dialog" story first, thus making the panel story suddenly have value.