Ant, Subant and Basedir

January 25th, 2010

Here’s an important lesson for people combining ant scripts – the way basedir is calculated is very unlikely to be what you expect. In particular, if you combine the <ant> task with the <subant> task you’re probably in for a surprise.

I learnt this important life lesson when the improved build scripts I’d been working on failed on the build server even though they worked perfectly on my machine. The difference is that the build server runs CruiseControl, which has a wrapper ant script that checks out a fresh copy of the project and then uses the <ant> task to build it. The ant task was:

<ant antfile="build.xml" target="dist" dir="projectDir" />

As far as that main antfile is concerned, everything is perfect – the basedir is the directory its build.xml is in and all is good with the world. However, if that build.xml happens to use subant, the basedir will not be changed for the sub-builds. Because the dir attribute was set explicitly, basedir is passed down as a user-configured property rather than a calculated value, so it doesn’t get changed. However, if you instead use:

<ant antfile="projectDir/build.xml" target="dist" />

It all works out. The main build.xml still gets projectDir as its basedir, but when it uses subant, the basedir is automatically changed to the directory containing whichever build file subant points to.
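For context, a parent build.xml that hands work off to its sub-modules with subant might look something like the sketch below. The modules/ layout and the dist target in each module are assumptions for illustration; the point is that each module’s build file expects basedir to be its own directory, which only holds if the outer <ant> call didn’t pin basedir.

```xml
<project name="parent" default="dist">
  <!-- Hypothetical layout: each sub-module lives in modules/<name>/build.xml -->
  <target name="dist">
    <!-- Runs the "dist" target in every matching sub-module build file.
         Each sub-build should get its own directory as basedir, but user
         properties are always passed down, so a basedir pinned by an outer
         <ant dir="..."> call overrides that. -->
    <subant target="dist">
      <fileset dir="modules" includes="*/build.xml"/>
    </subant>
  </target>
</project>
```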

The behavior is explained in this bug report which is closed as WONTFIX for backwards compatibility. Thankfully ant 1.8 adds a useNativeBasedir attribute which provides much more predictable basedir behavior for the ant task.
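On Ant 1.8 or later, the CruiseControl wrapper could keep the dir-style call and ask Ant to calculate basedir normally instead. A minimal sketch, assuming the same projectDir and dist target as above:

```xml
<!-- Requires Ant 1.8+. dir is still used to resolve antfile, but with
     useNativeBasedir="true" the child build gets the basedir it would have
     had when run from the command line, instead of having basedir pinned
     as a user property that subant then can't change. -->
<ant antfile="build.xml" dir="projectDir" target="dist" useNativeBasedir="true"/>
```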

Three Types of Ant Scripts

January 22nd, 2010

Bryan comments on the three types of ant scripts:

In my experience, there are three types of Ant scripts that you encounter "in the wild":

  • Small Ant scripts, generally Java-only, which can use most of Ant's default behaviors and are clear and simple. A lot of open source build scripts are this way.
  • Serious commercial Ant scripts written before macrodef and import became available. These are generally impossible to understand and evolve, and the reality is that a small cadre of Build Wizards keep them running. Such systems often involve a substantial number of custom Ant tasks.
  • Serious commercial Ant scripts written to use macrodef and import. In my experience, the need for custom Ant tasks drops way off with Ant releases post-1.6.

This really does ring true to me. Ant scripts can fairly quickly become unwieldy and difficult to work with if you aren’t using import and macrodef, but with them you can achieve so much more without the complexity getting out of hand. They won’t absolve you of the need to properly understand ant and the declarative paradigm it wants you to work with, but it’s much more approachable.
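As a rough illustration of why these two features help: a shared file defines a reusable macro, and each project build file imports it instead of repeating the same tasks. The file name common.xml, its location, and the compile-module macro with its attributes are all hypothetical. First the shared file:

```xml
<!-- common.xml: hypothetical shared file imported by each project -->
<project name="common">
  <!-- A reusable compile step; callers only supply what differs. -->
  <macrodef name="compile-module">
    <attribute name="srcdir" default="src/main/java"/>
    <attribute name="destdir" default="target/classes"/>
    <sequential>
      <mkdir dir="@{destdir}"/>
      <javac srcdir="@{srcdir}" destdir="@{destdir}" includeantruntime="false"/>
    </sequential>
  </macrodef>
</project>
```

Then a project build file that uses it:

```xml
<project name="my-module" default="compile">
  <!-- Pull in the shared macros (import and macrodef arrived in Ant 1.6).
       The path is resolved relative to this build file. -->
  <import file="../build-common/common.xml"/>

  <target name="compile">
    <compile-module/>
  </target>
</project>
```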

If I can ever get someone to add the optional dependencies for scripting support on our build servers1 I may well find they help a lot too.

1 – which happen to be Windows boxes, behind a firewall on the other side of the world from me, so not easy to make remote changes on

Comparing Build Systems

January 11th, 2010

After spending some time thinking about and using different build systems, I can’t say I really like any of them all that much. I now have a reasonably complex project, using submodules, that can be built with ant, buildr, maven and gradle – with varying degrees of support for the Ephox specific requirements and reporting.

Maven

Ah, the Java world’s favorite whipping boy. The more I play with Maven, the more I start to understand why it gets such a bad rap: it’s too easy to get started and do it all wrong. Maven is not a build tool that is quick to start using – it requires planning, common conventions, a number of extra systems (such as repository servers) to be set up and templates to be designed. However, Maven does an unfortunately good job of working if you just create a simple pom.xml and run mvn.

If you want Maven to work reliably though, you need to set up a bunch of stuff:

  • Private repository servers, including vetting all the meta-data that goes into it. Don’t just import the central Maven repository.
  • A company specific parent POM. This should specify explicit versions for each Maven plugin, point the project at the private repository servers and anything else that is standard across the company.
  • Build plugins for anything custom you need. Real Maven plugins are the best way to extend things – trying to put it all in your pom.xml or similar build scripts rapidly gets too complex and unmaintainable.
  • Strictly follow the Maven way.

Make no mistake, that’s a lot of work, but it sets up a very robust, enterprise build system and most of it only needs to be done once for the entire company, so starting a project from then on does become pretty simple.

All in all, it’s more work than I really want to take on – especially the building of custom plugins. Ephox doesn’t follow the Maven way strictly enough either – the most problematic area being how to release and version snapshots.

Buildr

Lots of people talk about using rake to build Java projects, but rake by itself simply doesn’t know enough about Java to really shine. Buildr is a layer over rake that adds Java knowledge. I actually quite like Buildr and think it has a fair bit of promise.

It can use the Maven repository for dependencies, so if you go that way you need to set up your own private repository again. Thankfully, Buildr can also use plain jar files, so if you don’t want to go the repository route, you don’t have to. The big downside is how complex it can be to get transitive dependencies working – partly by design and partly due to bugs and current limitations.

Buildr can run on either C-Ruby or JRuby. Start-up is faster on C-Ruby, but JRuby can run things like Javac without spawning a new JVM, so it fairly quickly starts to make up the time difference. Installing either version is quite easy.

It is a little odd to be writing the build script for a Java project in Ruby. The language difference doesn’t really matter, but having a different set of libraries for things like IO and file system stuff is an unfortunate overhead. If you’re using Ruby in some other form, it’s no problem, otherwise it’s extra training and ramp-up cost.

I also had a lot of problems because Buildr was including stuff on the test classpath by default. Some versions of JUnit and JMock are meant to be included, but it varies between versions and the documentation doesn’t always match up with what happens. Given that JMock 2 is very much incompatible with JMock 1.3, it didn’t go so well. The odd thing is that Buildr can be quite flexible – it supports TestNG and a few other libraries, but with JUnit it really wants to also use JMock. There is a way to specify a different version, but the group name is hard coded, so you can’t go back to JMock 1.3 because it has a different group name1. It would be far better for Buildr to just leave the mock framework as a dependency the project developer has to add if they want it.

Overall, Buildr is a pretty good solution, it’s quick and easy to get up and running (beware the central repository though) but it’s still fairly immature which caused me a fair few problems. The good news is that every issue I ran into was already a known problem that the Buildr team is working on resolving, so it’s pretty likely to become a very good option in the future.

Gradle

Gradle is a particularly interesting project. It feels a lot like a Java-oriented version of Rake with elements of Buildr, Maven and Ant mixed in. It has the best ant integration story of any of the non-ant build tools. Like Buildr it can pretty seamlessly utilise ant tasks but within the scripting language rather than with XML. Gradle can also import an existing ant build file and use the targets it defines as if they were native. That’s surprisingly powerful and useful, especially if you already have various build scripts and utilities written in ant.

Gradle also has a really nice approach to multi-project builds, allowing you to inherit configuration from the parent build script and pull in project dependencies easily. However, it’s not all smooth sailing. The main project build file starts to get pretty complex because it winds up configuring both itself and the sub-modules in the one file. The inheritance doesn’t really make sense if you have a mix of sub-module types, say some Java, some Scala and some just web resources. In hindsight, it would probably be better to ignore the built-in inheritance and just use normal file import functionality within each module to select the default module behavior to use.

Sadly, the multi-project stuff really came crashing down because of some pretty unexpected behavior about what the current project actually was. Depending on when a particular bit of script gets accessed, it might take project() to mean the current sub-module or it might wind up referring to the last sub-module that gets processed. It made sense when you think through the way the code works, but it’s far from intuitive when you’re just trying to build your project. I really couldn’t see myself recommending Gradle until this is significantly simplified and made intuitive.

Gradle also handles transitive dependencies much better than Buildr, though I had some confusion about when dependencies were transitive and when they weren’t. Gradle is one of the most flexible tools in terms of how dependencies are handled – using either the Maven repository, Ivy or defining dependencies as groups directly in the build files, allowing you to check jar files directly into svn if you prefer.

Gradle was also one of the slowest of the tools I tried. Once it’s up and running, build times are equivalent to ant, but a do-nothing target took just over 4 seconds whereas ant took well under a second. For a full build, that’s not a big deal – a project that ant builds in 2 minutes would take gradle 2 minutes and 4 seconds. The problem is when you’re running a really simple task as part of some development (e.g. trying to remove duplication that simian had flagged).

Ant

I’m beginning to think Ant should have been called Cockroach instead – at the end of the build tool war, you can bet ant will still be there going strong. It’s really quite scary that it’s been around for about 10 years now and is up to version 1.8, and not for lack of maintainers. Ant doesn’t use convention over configuration – you have to code up your entire build by piecing together the task building blocks it provides. That said, the tasks ant provides are its key strength – powerful, flexible and in almost every case very well designed – and piecing together a build process from ant tasks is much simpler than piecing one together from scratch or with the kind of command line tools make would use.

Since you’re writing the whole build script yourself though, ant files can become very long winded and hard to maintain. If you want to use ant successfully, you have to build a set of base scripts that provide the kind of standard project system that tools like Maven and Gradle give you. The benefit being that you can build it the way you want, not the way the tools want you to, without ever having to fight the tool.

Thankfully, Ant actually has some pretty powerful tools to build up that project system. It’s simple to import other build files so you can break your script up, and it’s simple to use extension points by overriding targets in the actual project build file. Ant 1.8 adds ‘extension-point’ to make this even easier, but it’s quite good even without that. It’s not quick to build up the right structure but like the infrastructure required for Maven, you can probably share your ant templates company wide (or perhaps much further). Easyant seems to be an attempt to provide a ready-made project convention on top of ant, but I think it went a bit too far and buried the ant functionality too deeply.
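A minimal sketch of both extension styles, assuming a hypothetical imported file whose project is named common. An imported file can declare an extension point (Ant 1.8+):

```xml
<!-- common.xml (hypothetical shared file) -->
<project name="common">
  <target name="compile">
    <echo message="standard compile"/>
  </target>

  <!-- Ant 1.8+: a named hook other build files can plug targets into. -->
  <extension-point name="ready-to-package" depends="compile"/>

  <target name="package" depends="ready-to-package">
    <echo message="packaging"/>
  </target>
</project>
```

A project build file can then override an imported target (the original stays reachable with the imported project’s name as a prefix) and attach extra work to the extension point:

```xml
<project name="my-module" default="package">
  <import file="common.xml"/>

  <!-- Replaces common's "compile" everywhere, including in the
       extension-point's depends list; the original is still "common.compile". -->
  <target name="compile" depends="common.compile">
    <echo message="module-specific step after the standard compile"/>
  </target>

  <!-- Added to ready-to-package's dependencies, so it runs before package. -->
  <target name="generate-docs" extensionOf="ready-to-package">
    <echo message="module-specific docs"/>
  </target>
</project>
```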

What I wound up with is a set of build scripts that define various useful macros and targets – version numbering, tagging in subversion, working with dependencies on sub-modules – and a set of scripts that define module types such as jar, war and the parent project. Frankly, that approach has revolutionised the way I work with ant.

The build scripts are now basically a first class project in and of themselves. They can be re-used across multiple projects and improved over time as different projects have various needs. Those improvements can then be easily shared back with the original projects since the scripts aren’t being copied into every project, just treated like any other dependency.

The build files in the project and sub-module are then very short and simple, since they only have to define any non-standard behavior and the required dependencies. This makes the build process for the project much simpler to maintain and understand. It obviously makes it much easier to start a new project or module as well.

I also found it really useful to define macros to make things more readable. For example, many but not all projects need to include a copy of EditLive! so there’s a set of predefined targets to grab the right version and make it available. Targets that need EditLive! just depend on the “editlive” predefined target, but that doesn’t allow a specific version of EditLive! to be requested. So there’s also a macro defined, editlive-version, that lets you set the version to use. All it does is set a few properties that control where to get EditLive! from, but the final syntax is much easier to read than putting them directly in a properties file, which is what we did before. I may be overusing the technique at the moment, but it’s quite useful2.
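The editlive-version idea might look roughly like this. The property names, the editlive.repository location, the URL layout and the version number are all hypothetical; the real macro only needs to set whatever properties the shared editlive target reads.

```xml
<project name="editlive-macro-sketch" default="editlive">
  <!-- Hypothetical base location for EditLive! downloads. -->
  <property name="editlive.repository" value="http://builds.example.com/editlive"/>

  <!-- Sets the properties the shared "editlive" target reads. Ant properties
       are immutable, so whichever value is set first wins. -->
  <macrodef name="editlive-version">
    <attribute name="version"/>
    <sequential>
      <property name="editlive.version" value="@{version}"/>
      <property name="editlive.url" value="${editlive.repository}/@{version}/editlive.zip"/>
    </sequential>
  </macrodef>

  <!-- What a project build file would contain: one readable line. -->
  <editlive-version version="7.0"/>

  <target name="editlive">
    <echo message="Fetching EditLive! ${editlive.version} from ${editlive.url}"/>
  </target>
</project>
```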

However, there are some drawbacks. It got fairly difficult to keep track of which properties were declared at which points in the build – especially as the number of build files being imported increased. There were a few builds tagged as ${version-major}.${version-minor} because of that. Fortunately, that confusion was limited to within the build framework I was building – not the actual build files for the project itself, and I found that it was mostly caused by my habit of defining every variable in a properties file that’s included at the top of the build script. For build frameworks, it’s a lot better to declare properties as late as possible, within targets. That way, the property is set at the same time as the first work related to it is being done and any dependent information would already be set. Essentially, it uses the target dependencies to work out the right order for setting properties.
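A small sketch of the late-declaration approach, with hypothetical file and property names: the version target loads version.properties and derives the full version string itself, and anything needing the version depends on that target, so target ordering guarantees the property is set before it is used.

```xml
<project name="late-properties-sketch" default="tag">
  <!-- version.properties is assumed to define version-major and version-minor. -->
  <target name="version">
    <property file="version.properties"/>
    <fail unless="version-major" message="version-major is not defined"/>
    <fail unless="version-minor" message="version-minor is not defined"/>
    <property name="version" value="${version-major}.${version-minor}"/>
  </target>

  <!-- Depending on "version" means ${version} is always expanded here,
       rather than producing a literal "${version-major}.${version-minor}". -->
  <target name="tag" depends="version">
    <echo message="Would tag release ${version}"/>
  </target>
</project>
```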

The downside of building this framework is that now we have to maintain it ourselves as well – it would be much nicer to use something like Maven or Gradle and have their dev teams deal with the on-going maintenance. However, since we do have some Ephox specific stuff, we’ll always be maintaining some amount of build infrastructure, and since we’re currently maintaining it separately for every project, this approach is a lot better than the status quo.

Conclusion

Ultimately, I think that ant is still the best choice, but it really is vital to set up a build framework that you can reuse, rather than doing everything from scratch for each project. Buildr and Gradle look like they have some huge potential in the future, but they need more time to improve stability, simplicity and consistency. I’d guess that their 2.0 versions will be something pretty serious to reckon with. Maven is an awesome tool, but it just requires too much effort to get it working right. However, the consistency in how a project is built that the Maven project has brought to the Java world is absolutely revolutionary – neither Buildr nor Gradle could be anywhere near as simple as they are if it weren’t for the work and evangelism of the Maven team to embed the Maven way into the Java world’s consciousness. Sure, we’re still fighting over many parts of it, but the Maven project structure is now well established and a very powerful convention.

1 – that pesky Maven 1 to Maven 2 transition again…

2 – I would also note that the greatest improvement ever added to ant is the else attribute on the condition task (in ant 1.6.2). Makes it much easier and more maintainable to set a property value to one of two options.
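For example, choosing between two values with a single task (the property names here are hypothetical):

```xml
<!-- build.type becomes "release" if the release property is set
     (e.g. ant -Drelease=true), otherwise "snapshot". Without else, the
     property would simply stay unset when the condition is false. -->
<condition property="build.type" value="release" else="snapshot">
  <isset property="release"/>
</condition>
<echo message="Build type: ${build.type}"/>
```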

Project Directory Structure

January 6th, 2010

Having spent a bunch of time looking at various build systems and tools, one of the simplest and most effective improvements I’ve discovered is to always use the Maven project structure. It doesn’t matter if you’re not using Maven, there’s no downside to using it and every build tool that uses convention over configuration uses the Maven structure.

Previously I’ve been of the opinion that the directory structure really didn’t matter much – I went with whatever happened to be auto-generated by whatever tool I was using that day. There is a small overhead in remembering to look for the JavaSource directory instead of src or source when you switch projects but it’s incredibly minimal and not worth worrying about on its own. When it comes to build scripts, getting these basics in the same place saves a whole lot of configuration and makes everything simpler.

When you start using sub-modules within the project it really becomes clear just how much time you can waste tweaking bits of build script to work with even slightly different directory structures. Not only do many build systems give you more functionality automatically, you suddenly get the ability to re-use build scripts and templates across different modules and across projects.

Don’t think that using the Maven directory structure means you have to play by all the Maven rules though. You can still generate more than one artifact from a project if that suits you best, you can check jar files into source control rather than using a repository etc, but source files go in src/main/java (or scala or groovy or webapp etc) and test files go in src/test/java (or scala or groovy etc).
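As a sketch of what this buys you in a plain ant build, the standard locations can be captured once (for instance in a shared imported file) and reused unchanged by every module. The property names and the target/ output directories are assumptions chosen to mirror Maven’s defaults.

```xml
<project name="layout-sketch" default="test-compile">
  <!-- Maven's conventional source locations and output directories. -->
  <property name="src.main.java" location="src/main/java"/>
  <property name="src.test.java" location="src/test/java"/>
  <property name="build.classes" location="target/classes"/>
  <property name="build.test-classes" location="target/test-classes"/>

  <target name="compile">
    <mkdir dir="${build.classes}"/>
    <javac srcdir="${src.main.java}" destdir="${build.classes}" includeantruntime="false"/>
  </target>

  <target name="test-compile" depends="compile">
    <mkdir dir="${build.test-classes}"/>
    <!-- Tests compile against the main classes; a real build would also
         add the JUnit jar to this classpath. -->
    <javac srcdir="${src.test.java}" destdir="${build.test-classes}"
           includeantruntime="false" classpath="${build.classes}"/>
  </target>
</project>
```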

Simple, quick and easy to switch to1, and it saves a surprising amount of setup work in your build scripts – especially when you first get the project up and running.

1 – assuming your current build scripts aren’t completely sadistic

On Build Systems

January 4th, 2010

Recently, the subject of build tools and systems has come up again at Ephox and it appears the topic is rising up again around the internet. As part of this I’ve been reading up on and playing with a bunch of build tools to get a feel for their benefits and limitations, so it seemed worthwhile writing up what I find as I go along.

The Projects

Which build tool suits best clearly depends on the type of project you’re working with. I’m currently playing with three quite different projects:

  1. EditLive! – a big, old code base with a few dependencies and a quite complex ant-based build process. While the primary code base is Java, the distribution includes a bunch of JavaScript and other ancillary files, documentation and is packaged up in three or four different distribution packages.
  2. “EPipes” – a brand new, very simple Java library with no dependencies (other than JUnit). Ant and Maven build scripts at the moment.
  3. “E2” – a newish internal web app with quite a few dependencies (including a complex transitive dependency tree), a few sub-projects and the need to include EditLive!

One important thing to note here is that I’m not trying to solve deployment as part of the build script – I’m only interested in building the software ready to be deployed wherever it’s required, not in server management etc.1

The Problems

Complexity

Simple projects have nice simple build scripts, but as you add multiple output formats and various other filtering steps into the build process the build scripts grow in complexity and become hard to understand. The build script becomes a software project in its own right and chews up large amounts of engineering time.

Sub-modules and Internal Dependencies

Breaking code up into separate modules is one of the most powerful ways to reduce complexity and increase maintainability, but if those modules are all just thrown into the same project and built together, it’s very common for interdependencies to leak through the code because there’s nothing to enforce the separation. What you want to be able to do is break the project up into separate sub-projects which are built in isolation so that the build fails if you add unwanted dependencies.

However, that means that instead of building one project, you’re now building multiple projects to get the final output and duplication gets introduced into the build system. In most cases you also want to be able to make changes across the modules without doing a full release or even committing. For example, you may want to add a new configuration option to one of your sub-modules that the main code base requires. Ideally you should be able to try that approach out in your own sandbox without having to commit, build and release the sub-module first.

Internal dependencies are a slight variant on this – they are at least conceptually developed by a different team, so you don’t need to make concurrent changes, but you do need to depend on versions that potentially aren’t publicly released yet. The real challenge here is that they may use a different build system and may not be available in or meet the assumptions of any particular dependency management scheme. Including EditLive! usually presents this kind of challenge.

Transitive Dependencies

When you depend on a library it usually requires a bunch of other libraries in turn. It’s usually pretty easy to grab all of the required libraries and shove them in at the start, but when you later go to upgrade that library or remove the dependency on it, you need to know which libraries it pulled in, whether or not they are still required, which versions are required etc.

Build Reliability

We write all kinds of automated tests for our code to ensure it works correctly, but most of the time our build scripts are completely untested. As complexity increases the chances of errors in the build scripts increases pretty dramatically. The challenge is that the build script is what runs the tests, so who tests the tester?

Spin-Up

Creating a new project or a new sub-project takes too long or it doesn’t get started with the right set of quality controls (e.g. it’s not running checkstyle yet, or not checking code coverage etc).

Repeatability

The single most important aspect of a build system is that it is 100% repeatable. An interesting exception at Ephox is that we never do two builds with the same version number. It was originally a poor man’s repeatability – if you want consistency, only conduct the test once. We’ve kept it around even though we have a very reliable build system because a) it’s a bit of a safety blanket and b) code signing certificates have an expiry date that we can’t avoid, so the jar signature might be valid on one build and invalid on the other even though we do everything exactly the same.
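One way to get that never-repeat guarantee in an ant build is the built-in buildnumber task, which reads a counter from a file, exposes it as the build.number property and writes the incremented value back. This is a sketch, not a description of how the Ephox build actually does it; the version-major and version-minor values are hypothetical.

```xml
<project name="build-number-sketch" default="version">
  <target name="version">
    <!-- Reads build.number from the file (0 if missing), sets ${build.number},
         then stores the incremented value in the file for the next build. -->
    <buildnumber file="build.number"/>
    <property name="version-major" value="1"/>
    <property name="version-minor" value="0"/>
    <property name="version" value="${version-major}.${version-minor}.${build.number}"/>
    <echo message="Building version ${version}"/>
  </target>
</project>
```

The counter file has to live somewhere that survives between builds, such as the build server’s workspace or source control, or the numbers will start repeating.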

False Problems

There are a few false problems that people bring up a lot when discussing build systems. Usually these are actually contributing factors rather than actual problems, sometimes they’re just personal preference showing through.

XML is not a Programming Language

This has to be the most common false problem. It’s pretty clear that there are XML dialects which are in fact programming languages but that’s not really what the complaint is about. The real point here is that the build script is either too complex or too verbose. It might also be a complaint about productivity when editing the build script. It’s important to dig into these particular complaints because understanding what the real problem is leads to finding the right solution. If productivity is the problem, better editing tools are often the best solution. Often ant’s use of XML is being confused with the declarative intent of ant. In other words, with ant your build script isn’t a programming language and isn’t trying to be – if you treat it as one, you’re just using it wrong (and perhaps the tool isn’t the right choice for you).

Downloading the Internet

This is commonly levelled at Maven since it uses its own dependency management system to reduce the initial size of its download. Complaining that Maven is so complex it has to download extra modules just to run clean is a red herring – it could just as easily have included those modules with the initial download and been just as complex without needing to download anything. Ant for instance is “so complex” it requires you to program your own clean.
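For comparison, the ant version is a target you write, and keep in sync with your output directories, yourself. A typical one, assuming all build output goes under target/ and dist/, is just:

```xml
<target name="clean" description="Remove all build output">
  <!-- Assumes every generated file lives under these two directories;
       deleting a directory that doesn't exist is not an error. -->
  <delete dir="target"/>
  <delete dir="dist"/>
</target>
```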

That said, accessing the internet can be a real problem in various ways. Does the build work the same way on the internal intranet as it does when working from home? What happens if the internet connection is unreliable? Does downloading stuff mean that the build isn’t reproducible? Once you identify the real problems, most build tools provide solutions to them in various ways. For example, all of those problems apply to maven by default, but can actually be solved2.

The Build is Too Slow

This can be a real problem if your build tool happens to be the bottle-neck but that’s fairly unlikely. More likely is that your unit tests are too slow, or that it’s too hard to make the build run in a distributed fashion, or that the complexity has introduced bugs into the build script and it’s doing work that’s either not needed or has already been done once before.

My Build is a Unique and Special Snowflake

It’s easy to believe that your software project is somehow special and there’s no other software that’s built like it out there. More likely though, it’s a pretty close variant on a theme rather than something completely unique to itself. Down this path lies building your own build tool – from scratch at the extreme. A custom built build tool can be the right choice, but it’s no cakewalk. Just look at the pain many newly open-sourced projects have had because they have an unusual build process. Huge amounts of time are spent teaching people how to set up the build environment and compile the project before they can start contributing. That cost is encountered with every new team member even if you never open-source your code. Plus, instead of one project to build you now have two, and it’s unlikely to reduce the complexity.

So dig into this false problem further and really understand it:

  • Are you mixing in deployment stuff into the build and would that be better split out?
  • Is there something about your project you could or should change to make it easier to build?
  • Can you build custom modules for your build system rather than trying to do everything in the build script? For example, build an ant task or a maven plugin to perform particularly custom tasks.
  • Would you be better running an external script as part of the build rather than replacing the whole build process?
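In ant, the last two options look roughly like the following sketch. The com.example.build.ReleaseNotesTask class, its jar, its dir attribute and scripts/package-docs.sh are hypothetical placeholders for whatever custom step your project needs.

```xml
<project name="custom-steps-sketch" default="package">
  <!-- Option 1: a custom Java task, compiled separately and registered
       under a name (the class and jar are hypothetical). -->
  <taskdef name="release-notes" classname="com.example.build.ReleaseNotesTask"
           classpath="lib/build/custom-tasks.jar"/>

  <target name="package">
    <!-- Option 2: call out to an external script for the unusual step and let
         ant handle the common parts. "sh" assumes a Unix-like build machine. -->
    <exec executable="sh" failonerror="true">
      <arg value="scripts/package-docs.sh"/>
    </exec>
    <release-notes dir="target"/>
  </target>
</project>
```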

The key theme here is to identify the parts of your project that follow common patterns and the parts that are more unusual, then leverage existing tools for the common parts. How much is common and how much is unusual will affect which tool is right, but you can usually avoid having to write a completely custom build system.

The Options I See So Far

There are a lot of build tools out there these days, but here are the ones I’ve found so far that are worth investigating:

  • ant: all our systems are built around it so far, but we need to find better ways of using it to solve the problems we’re seeing.
  • Maven: the other Java build system. Maven has a real love-it-or-hate-it reputation so it’s hard to know. I see huge potential with Maven and have huge concerns as well. I can’t put much faith in what I’ve read against Maven though, because the articles always seem to lack:
    • A sense of rationality. Huge diatribes are easy to find, careful analysis is much less common.
    • Using a private Maven repository. This is a must for Maven but is usually just mentioned off-hand as a solution that was put in the too hard basket rather than really tried.
    • Building custom plugins for custom parts of the build. People who complain that Maven “just can’t do” something haven’t looked at building a plugin for it, or calling out to an external script etc.
  • Rake: except Rake doesn’t seem to really know anything about building Java projects, so it would have to be combined with another library, possibly Raven. I need to look further into how best to use Rake with Java projects, and into Rake in general.

Anything else that would be worth looking into would be good to know as well. I’m deliberately ignoring make since the build process needs to run cross-platform and while that’s possible in make, it’s a pretty big challenge.

1 – I tend to be of the view that your build and deployment systems should be separate even if they do wind up using the same tools. That way you separate the configuration from your actual code since the deployment step does the configuration rather than it being baked into what gets built. It’s also a pretty good split-point to help reduce complexity and lets you pick a deployment tool that best fits rather than having to use the same tool for both.

2 – whether the effort required to solve them is worth it or not depends on the particulars of the project and potentially how many projects that effort can be amortised over; it usually needs to be done per company rather than per project.