Recently, the subject of build tools and systems has come up again at Ephox and it appears the topic is rising up again around the internet. As part of this I’ve been reading up on and playing with a bunch of build tools to get a feel for their benefits and limitations, so it seemed worthwhile writing up what I find as I go along.
The Projects
Which build tool suits best clearly depends on the type of project you’re working with. I’m currently playing with three quite different projects:
- EditLive! – a big, old code base with a few dependencies and a quite complex ant-based build process. While the primary code base is Java, the distribution includes a bunch of JavaScript, other ancillary files and documentation, and is packaged up in three or four different distribution packages.
- “EPipes” – a brand new, very simple Java library with no dependencies (other than JUnit). Ant and Maven build scripts at the moment.
- “E2” – a newish internal web app with quite a few dependencies (including a complex transitive dependency tree), a few sub-projects and the need to include EditLive!
One important thing to note here is that I’m not trying to solve deployment as part of the build script – I’m only interested in building the software ready to be deployed where it’s required, not in server management etc. [1]
The Problems
Complexity
Simple projects have nice simple build scripts, but as you add multiple output formats and various other filtering steps into the build process the build scripts grow in complexity and become hard to understand. The build script becomes a software project in its own right and chews up large amounts of engineering time.
Sub-modules and Internal Dependencies
Breaking code up into separate modules is one of the most powerful ways to reduce complexity and increase maintainability, but if those modules are all just thrown into the same project and built together, it’s very common for interdependencies to leak through the code because there’s nothing to enforce the separation. What you want to be able to do is break the project up into separate sub-projects which are built in isolation so that the build fails if you add unwanted dependencies.
However, that means that instead of building one project, you’re now building multiple projects to get the final output and duplication gets introduced into the build system. In most cases you also want to be able to make changes across the modules without doing a full release or even committing. For example, you may want to add a new configuration option to one of your sub-modules that the main code base requires. Ideally you should be able to try that approach out in your own sandbox without having to commit, build and release the sub-module first.
Internal dependencies are a slight variant on this – they are at least conceptually developed by a different team, so you don’t need to make concurrent changes, but you do need to depend on versions that potentially aren’t publicly released yet. The real challenge here is that they may use a different build system and may not be available in or meet the assumptions of any particular dependency management scheme. Including EditLive! usually presents this kind of challenge.
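To make the sub-module idea a bit more concrete, here is a minimal sketch in Java of the two checks described above: working out an order in which sub-projects can be built in isolation, and flagging a dependency that the code uses but the module never declared. The class name ModuleGraph, its methods and the module names are all hypothetical and only for illustration; a real build tool would discover the references by compiling each module against only its declared dependencies.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only: not part of ant, Maven or any other build tool.
public class ModuleGraph {
    // Module name -> modules it is declared to depend on.
    private final Map<String, Set<String>> declared = new LinkedHashMap<>();

    public ModuleGraph module(String name, String... dependsOn) {
        declared.put(name, new LinkedHashSet<>(List.of(dependsOn)));
        return this;
    }

    // Modules referenced by the code of 'module' that are missing from its declared dependencies.
    public Set<String> undeclared(String module, Set<String> referenced) {
        Set<String> result = new TreeSet<>(referenced);
        result.removeAll(declared.getOrDefault(module, Set.of()));
        result.remove(module);
        return result;
    }

    // An order in which every module is built after the modules it depends on.
    public List<String> buildOrder() {
        List<String> order = new ArrayList<>();
        Set<String> done = new LinkedHashSet<>();
        Set<String> inProgress = new LinkedHashSet<>();
        for (String module : declared.keySet()) {
            visit(module, done, inProgress, order);
        }
        return order;
    }

    // Depth-first visit; a module seen again while still in progress means a cycle.
    private void visit(String module, Set<String> done, Set<String> inProgress, List<String> order) {
        if (done.contains(module)) {
            return;
        }
        if (!inProgress.add(module)) {
            throw new IllegalStateException("Dependency cycle involving " + module);
        }
        for (String dependency : declared.getOrDefault(module, Set.of())) {
            visit(dependency, done, inProgress, order);
        }
        inProgress.remove(module);
        done.add(module);
        order.add(module);
    }
}
```

With modules declared as app (depending on core and config), config (depending on core) and core, buildOrder() puts core first and app last, and undeclared("core", Set.of("config")) reports config as a leak that should fail the build.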
Transitive Dependencies
When you depend on a library it usually requires a bunch of other libraries in turn. It’s usually pretty easy to grab all of the required libraries and shove them in at the start, but when you later go to upgrade that library or remove the dependency on it, you need to know which libraries it pulled in, whether or not they are still required, which versions are required etc.
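The bookkeeping involved can be sketched in a few lines of Java. This is only an illustration under simple assumptions: the class TransitiveDeps, its methods and the library names in the usage note are hypothetical, versions are ignored entirely, and the "requires" map stands in for whatever metadata a real dependency manager would read.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical illustration only: real dependency managers also have to resolve versions.
public class TransitiveDeps {
    // Everything reachable from the given direct dependencies, including the roots themselves.
    public static Set<String> closure(Map<String, List<String>> requires, Set<String> roots) {
        Set<String> seen = new TreeSet<>();
        Deque<String> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            String library = pending.pop();
            if (seen.add(library)) {
                // First time we see this library, so queue whatever it requires in turn.
                pending.addAll(requires.getOrDefault(library, List.of()));
            }
        }
        return seen;
    }

    // Libraries that are only present because of 'removed' and can go when it does.
    public static Set<String> orphanedBy(Map<String, List<String>> requires,
                                         Set<String> direct, String removed) {
        Set<String> remaining = new TreeSet<>(direct);
        remaining.remove(removed);
        Set<String> orphans = closure(requires, direct);
        orphans.removeAll(closure(requires, remaining));
        return orphans;
    }
}
```

For example, if a made-up "web-framework" requires "logging" and "xml-parser", and a made-up "orm" requires "logging", then removing "web-framework" orphans "web-framework" and "xml-parser" but not "logging", because "orm" still needs it.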
Build Reliability
We write all kinds of automated tests for our code to ensure it works correctly, but most of the time our build scripts are completely untested. As complexity increases, the chance of errors in the build scripts increases pretty dramatically. The challenge is that the build script is what runs the tests, so who tests the tester?
Spin-Up
Creating a new project or a new sub-project takes too long or it doesn’t get started with the right set of quality controls (e.g. it’s not running checkstyle yet, or not checking code coverage etc).
Repeatability
The single most important aspect of a build system is that it is 100% repeatable. An interesting exception at Ephox is that we never do two builds with the same version number. It was originally a poor man’s repeatability – if you want consistency, only conduct the test once. We’ve kept it around even though we have a very reliable build system because a) it’s a bit of a safety blanket and b) code signing certificates have an expiry date that we can’t avoid, so the jar signature might be valid on one build and invalid on the other even though we do everything exactly the same.
False Problems
There are a few false problems that people bring up a lot when discussing build systems. Usually these are actually contributing factors rather than actual problems, sometimes they’re just personal preference showing through.
XML is not a Programming Language
This has to be the most common false problem. It’s pretty clear that there are XML dialects which are in fact programming languages but that’s not really what the complaint is about. The real point here is that the build script is either too complex or too verbose. It might also be a complaint about productivity when editing the build script. It’s important to dig into these particular complaints because understanding what the real problem is leads to finding the right solution. If productivity is the problem, better editing tools are often the best solution. Often ant’s use of XML is being confused with the declarative intent of ant. In other words, with ant your build script isn’t a programming language and isn’t trying to be – you’re just trying to use it wrong (and perhaps the tool isn’t the right choice for you).
Downloading the Internet
This is commonly levelled at maven since it uses its own dependency management system to reduce the initial size of its download. Complaining that maven is so complex it has to download extra modules just to run clean is a red herring – it could just as easily have included those modules with the initial download and been just as complex without needing to download anything. Ant for instance is “so complex” it requires you to program your own clean.
That said, accessing the internet can be a real problem in various ways. Does the build work the same way on the internal intranet as it does when working from home? What happens if the internet connection is unreliable? Does downloading stuff mean that the build isn’t reproducible? Once you identify the real problems, most build tools provide solutions to them in various ways. For example, all of those problems apply to maven by default, but can actually be solved. [2]
The Build is Too Slow
This can be a real problem if your build tool happens to be the bottleneck but that’s fairly unlikely. More likely is that your unit tests are too slow, or that it’s too hard to make the build run in a distributed fashion, or that the complexity has introduced bugs into the build script and it’s doing work that’s either not needed or has already been done once before.
My Build is a Unique and Special Snowflake
It’s easy to believe that your software project is somehow special and there’s no other software that’s built like it out there. More likely though, it’s a pretty close variant on a theme rather than something completely unique to itself. Down this path lies building your own build tool – from scratch at the extreme. A custom built build tool can be the right choice, but it’s no cakewalk. Just look at the pain many newly open-sourced projects have had because they have an unusual build process. Huge amounts of time are spent teaching people how to set up the build environment and compile the project before they can start contributing. That cost is encountered with every new team member even if you never open-source your code. Plus, instead of one project to build, you now have two, and that’s unlikely to reduce the complexity.
So dig into this false problem further and really understand it:
- Are you mixing deployment stuff into the build and would that be better split out?
- Is there something about your project you could or should change to make it easier to build?
- Can you build custom modules for your build system rather than trying to do everything in the build script? For example, build an ant task or a maven plugin to perform particularly custom tasks.
- Would you be better running an external script as part of the build rather than replacing the whole build process?
The key theme here is to identify the parts of your project that follow common patterns and the parts that are more unusual, then leverage existing tools for the common parts. How much is common and how much is unusual will affect which tool is right, but you can usually avoid having to write a completely custom build system.
The Options I See So Far
There are a lot of build tools out there these days, but here are the ones I’ve found so far that are worth investigating:
- ant: All our systems are built around it so far but we need to find better ways of using it to solve the problems we’re seeing.
- Maven: The other Java build system. Maven has a real love it or hate it thing so it’s hard to know. I see huge potential with Maven and have huge concerns as well. I can’t put much faith in what I’ve read against Maven though because the articles always seem to lack:
  - A sense of rationality. Huge diatribes are easy to find, careful analysis is much less common.
  - Using a private maven repository. This is a must for Maven but is usually just mentioned off-hand as a solution that was put in the too-hard basket rather than really tried.
  - Building custom plugins for custom parts of the build. People who complain maven “just can’t do” something haven’t looked at building a plugin for it, or calling out to an external script etc.
- Rake: Rake doesn’t seem to really know anything about building Java projects, though, so it would have to be combined with another library, possibly Raven. I need to do more looking into how best to use Rake with Java projects and Rake in general.
Anything else that would be worth looking into would be good to know as well. I’m deliberately ignoring make since the build process needs to run cross-platform and while that’s possible in make, it’s a pretty big challenge.
1 – I tend to be of the view that your build and deployment systems should be separate even if they do wind up using the same tools. That way you separate the configuration from your actual code since the deployment step does the configuration rather than it being baked into what gets built. It’s also a pretty good split-point to help reduce complexity and lets you pick a deployment tool that best fits rather than having to use the same tool for both.
2 – Whether the effort required to solve them is worth it or not depends on the particulars of the project and potentially how many projects that effort can be amortised over; it usually needs to be done per company rather than per project.
ddoctor says:
> XML is not a Programming Language
My main gripe along these lines is productivity, modularity and refactoring ability. Normal programming languages have great modular constructs – methods and classes – and modern IDEs have great refactoring tools. A lot of the time I want to just do the equivalent of a method refactor in an Ant script, but the tools aren’t there. And Ant has some terrible modularity concepts: includes are primitive compared to Java imports; antcall is messy; macrodef is ok… but it’s still not quite a method.
My other gripe about coding in XML is related to the classic complaint about XML syntax. In Ant, for instance, a macrodef looks like this:
whereas java would do this:
public void mymacro(String blah) {
// do something
}
I do understand, though, that if you’re coding in XML, you’re doing it wrong – XML should be for data and specification, not code.
More and more in programming I’m finding that elegant syntax is the most important part of any language or API.
ddoctor says:
Hmmmm… looks like my xml example was stripped out.
ddoctor says:
> The Build is Too Slow
I think another cause of this is that a single build is doing too much – i.e. you’re building one big monolithic system, rather than building small individual components, or integrating them. If you only change one part of a system, only that part needs to be rebuilt, then integrated.
Adrian Sutton says:
True, breaking into more modules would help build speed a fair bit – it would also make the build more easily distributed across multiple servers.
ddoctor says:
> Is there something about your project you could or should change to make it easier to build?
Yep, this is an important one people miss. My project may be a unique snowflake at the moment, but could I change it to fit a common paradigm? It’s like design patterns for projects.
Nathan says:
We’re using maven, and it’s so much better now than when we were using ant.
One of the complaints about maven compared to ant is that it’s so much more difficult to do something in maven. This actually ended up helping us, since there’s 50 different ways to do things in ant, and they’re all so easy, we had the same functionality represented differently across all scripts. Makes it very hard to eyeball a file and know what’s going on.
With maven, because it’s “so hard” to figure out how to do something, it tends to fall to tech leads to find out that one way to do it, and then everyone else just follows that to save themselves the dramas. Enforcing consistency can be a problem in certain organisations – but thankfully maven gave this to us. Unexpected, but greatly appreciated.
Andy says:
Came across gradle recently (http://www.gradle.org). Haven’t tried it out myself, but it looks very promising. From my current experience (many years with ant – my current preference; only a few trials with maven – I didn’t get on with it), gradle looks to me like being the best route forward.
Andrae Muys says:
Three comments:
1. Really looking forward to seeing what you come up with here.
2. My problem with Maven is its opacity – I run it, magic happens, and something pops out — except sometimes it doesn’t, and when that happens I often have no idea why it didn’t. Configuring it is also something of a black art, and it appears to have been designed with the assumption that I will be using IntelliJ or Eclipse. Sorry, I have the same opacity issues with those tools, so when you stack them on top of each other, my productivity goes out the window as I spend all my time trying to fight my tools, and none actually designing/cutting/testing code.
3. I would be interested in seeing Ant+Ivy included in your comparison – I’ve seen a fair few comments from maven refugees pointing at the combination as a reasonable option.
Adrian Sutton says:
Andrae,
You don’t have to wait too long – my analysis is mostly at http://www.symphonious.net/2010/01/11/comparing-build-systems/. You can’t discern a lot more about a build system without really digging into it much more. The decision for Ephox is to stick with ant but to set up a set of common build scripts that do most of the work and some project templates to make spinning up a new project faster. Basically, build the same out-of-the-box behavior that Maven, Gradle and buildr give you but using ant because it’s so well known, stable and flexible.
That said, one of the tasks for today is to better investigate Ivy and why it’s different to Maven’s repository and sort out some kind of best practice for repositories and sharing projects/modules. Hopefully I’ll get a post up over the weekend with what I learn from that.