Using Ivy for Dependency Management

January 25th, 2010

At first glance, Ivy looks like a re-implementation of Maven’s dependency management that works nicely with ant, and to some degree it is, but it also adds some pretty significant improvements and some pretty significant complexity.

Maven Compatibility

Firstly, Ivy is compatible with Maven repositories, so if you think the way Maven manages dependencies is perfect, but don’t want to buy into the rest of Maven, Ivy provides a good answer. The configuration is a little bit different and you’ll have to learn a little bit about Ivy’s configurations which are both more powerful and more complex than Maven’s dependency “scope”, but you won’t have to go too far into them.

Ivy will use the Maven 2 repository by default so all the same libraries are available – complete with all the metadata problems.

Ivy Configurations

Configurations are probably the biggest difference between Ivy and Maven’s dependency management. At the simplest level they are roughly equivalent to setting the scope attribute in Maven: they let you choose whether a library is required only for compilation, should be included in the packaged WAR/EAR, or is used only for testing. Since Ivy doesn’t actually build the project, what a configuration means is a lot more flexible than what a scope means in Maven. Each Ivy setup needs to define the configurations it needs from scratch, and it’s up to the Ant build process to ensure the libraries are used as intended.

If you import a project from the Maven repository, Ivy will convert the various scopes into the following configurations:

Name | Description | Example Library
default | runtime dependencies and master artifact can be used with this conf |
master | contains only the artifact published by this module itself, with no transitive dependencies | The project’s jar itself
compile | this is the default scope, used if none is specified. Compile dependencies are available in all classpaths | commons-lang
provided | this is much like compile, but indicates you expect the JDK or a container to provide it. It is only available on the compilation classpath, and is not transitive | Servlet APIs
runtime | this scope indicates that the dependency is not required for compilation, but is for execution. It is in the runtime and test classpaths, but not the compile classpath | An AOP runtime library?
test | this scope indicates that the dependency is not required for normal use of the application, and is only available for the test compilation and execution phases | JUnit
system | this scope is similar to provided except that you have to provide the JAR which contains it explicitly. The artifact is always available and is not looked up in a repository | ??
sources | this configuration contains the source artifact of this module, if any | Source for the project
javadoc | this configuration contains the javadoc artifact of this module, if any | JavaDoc for the project
optional | contains all optional dependencies | Anything optional

The three most commonly used configurations for dependencies are compile, provided and test – most projects will only ever need to add dependencies in those categories. The master, sources and javadoc configurations typically don’t contain any dependencies; they are where the results of the build get published, since the configurations defined here are the same ones seen by other projects that depend on this one. The default configuration is provided to make this easy, by including all the runtime dependencies and the project’s jar itself in one configuration.

It’s also worth noting that configurations can extend each other. In the Maven import, runtime extends compile so any libraries you compile against are also available at run time. Default extends runtime and master. You could of course ignore the extends functionality and just include multiple lib dirs on the classpath within your normal ant script, as you would if the jars were checked directly in to source control, but the functionality is reasonably simple and makes it explicit which jars are on which classpath [1].
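As a rough sketch of how that extension chain might look in an ivy.xml, the fragment below declares the common configurations by hand. The organisation, module name and dependency revisions are placeholders chosen for illustration, not values from a real project.

```xml
<!-- ivy.xml for a hypothetical module; names and revisions are illustrative only -->
<ivy-module version="2.0">
  <info organisation="com.example" module="example-app"/>
  <configurations>
    <conf name="master" description="only the artifact this module publishes"/>
    <conf name="compile" description="needed on the compile classpath"/>
    <conf name="provided" description="supplied by the JDK or container"/>
    <!-- runtime extends compile: anything compiled against is also there at run time -->
    <conf name="runtime" extends="compile"/>
    <conf name="test" extends="runtime" description="needed only for tests"/>
    <!-- default pulls together the runtime dependencies and the module's own jar -->
    <conf name="default" extends="runtime,master"/>
  </configurations>
  <dependencies>
    <!-- "ourConf->theirConf": put the dependency's default conf into our conf -->
    <dependency org="commons-lang" name="commons-lang" rev="2.4" conf="compile->default"/>
    <dependency org="javax.servlet" name="servlet-api" rev="2.5" conf="provided->default"/>
    <dependency org="junit" name="junit" rev="4.7" conf="test->default"/>
  </dependencies>
</ivy-module>
```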

My approach has been to stick with these configurations – they’re simple enough but cover every situation that I can currently imagine. The only real difference is that I’m not currently using the master configuration – the produced jars are just put directly into default. That means you can’t easily require the library without any dependencies, but it means that the defaults can be used for what most libraries produce (a single jar file with the same name as the project and in the default configuration).

Ivy Repositories

Ivy has its own concept of a repository that uses Ivy descriptor files instead of the Maven POM files and gives the full power, flexibility and complexity of Ivy’s configuration concepts. The configurations might be handy, but what I find most useful is the fact that Ivy supports multiple different types of repositories. So instead of having to run specific software to provide the repository, it can be accessed via sftp, ssh, a shared drive or even subversion. This makes setting up your own repository significantly easier – it’s quite likely you already have a server that’s available via ssh somewhere. I also like the fact that the whole Ivy configuration can be included in the project itself, so once you check out the source code, you have all the config settings you need to be able to build it [2].
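For illustration, a minimal ivysettings.xml pointing at a repository on a shared drive could look something like the following. The resolver name and the /mnt/shared/ivy-repo path are assumptions; the ssh and sftp resolvers take the same nested ivy and artifact patterns, plus connection attributes such as host and user.

```xml
<!-- ivysettings.xml: a sketch of a private repository on a shared drive -->
<ivysettings>
  <!-- "private" is a made-up resolver name -->
  <settings defaultResolver="private"/>
  <resolvers>
    <filesystem name="private">
      <!-- where to find each module's descriptor -->
      <ivy pattern="/mnt/shared/ivy-repo/[organisation]/[module]/[revision]/ivy.xml"/>
      <!-- where to find the jars (and any other artifacts) -->
      <artifact pattern="/mnt/shared/ivy-repo/[organisation]/[module]/[revision]/[artifact]-[revision].[ext]"/>
    </filesystem>
  </resolvers>
</ivysettings>
```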

Importing Libraries

Ivy can pull libraries from the public maven repository and import them into your private one which makes it reasonably quick to spin up a private repository and get rid of the dependency on the public one altogether. However, it’s still a fairly slow process and takes up the bulk of the time required to get Ivy up and running. That said, it immediately solves a lot of headaches about incorrect meta-data or missing libraries in the maven repository.

What I found was missing however, was a simple tool to add new libraries that don’t exist in the Maven repository. For a single jar, it’s pretty easy to create the directory structure and an ivy.xml file for it, but when you have a large number like the GData APIs, it can take up a huge amount of time. I whipped up a simple bash script to work out most of the meta-data from a jar file (module name and version number), create the directory structure and a simple ivy.xml file then upload it to the repository. It doesn’t add the dependencies automatically but that’s easy once the directory structure and basic ivy.xml has been created.
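The kind of basic ivy.xml such a script would generate is very small. This is a sketch only: the organisation, module and revision are placeholders, and the dependencies section is left empty to be filled in by hand afterwards.

```xml
<!-- ivy.xml for a hand-imported jar; all names here are placeholders -->
<ivy-module version="2.0">
  <info organisation="com.example.thirdparty" module="some-library" revision="1.0"/>
  <!-- with no configurations declared, Ivy uses a single "default" configuration -->
  <publications>
    <artifact name="some-library" type="jar" ext="jar"/>
  </publications>
  <dependencies>
    <!-- add any dependencies manually once the basic file exists -->
  </dependencies>
</ivy-module>
```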

Namespaces

One of the most common problems with the Maven repository is the number of Apache libraries that still use the Maven 1 naming scheme. For example Commons IO is available under both the org.apache.commons group and the commons-io group. To Maven and Ivy, the different groups mean it’s actually a different library so it can potentially wind up on the classpath twice. When you import libraries from the Maven repository, Ivy lets you configure namespaces to avoid this problem. Essentially, Ivy will rewrite the group name to consistently use one or the other, resulting in a more consistent repository and no duplicate jars on the classpath. The rename is done both for the imported library and for anything else you import that depends on it, so things automatically point to the right place.
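A namespace rule for the Commons IO case might be sketched as below, assuming you want org.apache.commons to be the name used in your own repository and commons-io to be the name looked up in the Maven repository. The namespace and resolver names are made up, and the rule direction is worth double-checking against the Ivy documentation before relying on it.

```xml
<!-- fragment of an ivysettings.xml used for importing; names are illustrative -->
<ivysettings>
  <namespaces>
    <namespace name="maven2">
      <rule>
        <!-- from our naming to the name used in the Maven repository -->
        <fromsystem>
          <src org="org.apache.commons" module="commons-io"/>
          <dest org="commons-io" module="commons-io"/>
        </fromsystem>
        <!-- from the Maven repository's name back to our naming -->
        <tosystem>
          <src org="commons-io" module="commons-io"/>
          <dest org="org.apache.commons" module="commons-io"/>
        </tosystem>
      </rule>
    </namespace>
  </namespaces>
  <resolvers>
    <!-- the resolver opts in to the namespace defined above -->
    <ibiblio name="maven2" m2compatible="true" namespace="maven2"/>
  </resolvers>
</ivysettings>
```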

Actual Usage

Actually using Ivy is really quite straightforward. There are a lot of different ant tasks you can use, largely due to the huge amount of flexibility Ivy provides, but you can also keep it quite simple. I’ve got five basic uses at the moment:

  1. configure – I’m calling this outside of any target at the moment so it always happens before anything else. It sets up Ivy using the configuration file designed for this project. So it has the right configurations set up and points to the private repository instead of the default Maven one.
  2. retrieve – actually calculate the dependencies and link them into the specified directories. Each configuration gets the jar files placed in its own sub-directory. From then on, to use the dependencies in the ant script, just point at the appropriate directory. I also set this to use symlinks when possible so the files actually stay in Ivy’s cache directory and it just creates a symlink to save some time.
  3. Inline retrieves – the retrieve task can also be configured to retrieve a library that you specify directly via attributes instead of in an ivy.xml file. This is really handy for things that the build scripts themselves want to use, like a scala compiler or cobertura. Otherwise each project would have to declare the same dependencies.
  4. Import – I have an ant task set up specifically to make it easy to import libraries from the Maven repository. It uses a slightly different settings file (which includes the Maven repository) and prompts for the group, module and revision to import.
  5. buildlist – this is a handy little task for building sub-modules within a project. It looks at each project’s dependencies and calculates the correct build order which you can then pass in to subant. Very much like the Maven reactor.
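Put together, the five uses above could look roughly like this in a build.xml. This is a sketch rather than my actual scripts: the file names, directory layout, resolver names ("maven2", "private") and the inline tool coordinates are all assumptions, and it expects the Ivy jar to be on Ant's classpath.

```xml
<project name="example" default="resolve" xmlns:ivy="antlib:org.apache.ivy.ant">
  <!-- 1. configure: outside any target, so it runs before everything else -->
  <ivy:settings file="${basedir}/ivysettings.xml"/>
  <!-- separate settings (hypothetical file) that also include the Maven repository -->
  <ivy:settings id="import.settings" file="${basedir}/ivysettings-import.xml"/>

  <!-- 2. retrieve: one sub-directory per configuration, symlinked from the cache -->
  <target name="resolve">
    <ivy:retrieve pattern="${basedir}/lib/[conf]/[artifact]-[revision].[ext]" symlink="true"/>
  </target>

  <!-- 3. inline retrieve: a build-time tool that is not in any project's ivy.xml -->
  <target name="build-tools">
    <ivy:retrieve organisation="com.example.tools" module="some-build-tool" revision="1.0"
                  inline="true" pattern="${basedir}/build-lib/[artifact]-[revision].[ext]"/>
  </target>

  <!-- 4. import: prompt for coordinates and copy the module into the private repository -->
  <target name="import">
    <input message="Group:" addproperty="import.org"/>
    <input message="Module:" addproperty="import.module"/>
    <input message="Revision:" addproperty="import.rev"/>
    <ivy:install settingsRef="import.settings" organisation="${import.org}"
                 module="${import.module}" revision="${import.rev}"
                 from="maven2" to="private" transitive="true"/>
  </target>

  <!-- 5. buildlist: order the sub-modules by their dependencies, then build them -->
  <target name="build-modules">
    <ivy:buildlist reference="module-build-path">
      <fileset dir="${basedir}/modules" includes="*/build.xml"/>
    </ivy:buildlist>
    <subant target="dist" buildpathref="module-build-path"/>
  </target>
</project>
```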

The nice thing is that Ivy just plugs in to your existing ant file without many changes at all.

Tracking Versions

Ivy has some nice ideas for working with versions which I haven’t found in Maven before, though they may exist. Firstly, it can generate a build number by looking at the latest version available in the repository and adding one, which is much better than having a normal ant buildnumber file on a shared drive somewhere. Secondly, when Ivy publishes an artifact to the repository, it rewrites the ivy.xml to include the specific versions of the dependencies that it was built with, so anything that depends on it will get known good libraries, even if it was using a SNAPSHOT style build. In Ivy the SNAPSHOT version is equivalent to latest.integration, but it always generates some form of version number when the build is published, so you never get a latest.integration version in the repository. By default it uses a date stamp, which actually works out really well.
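As a sketch of the build number approach, the buildnumber task can be combined with publish along these lines. The organisation, module, base revision, resolver name and artifact location are assumptions for illustration.

```xml
<project name="example-lib" default="publish" xmlns:ivy="antlib:org.apache.ivy.ant">
  <target name="publish">
    <ivy:resolve/>
    <!-- looks up the latest 1.0.x in the repository and sets ivy.new.revision to the next one -->
    <ivy:buildnumber organisation="com.example" module="example-lib" revision="1.0"/>
    <!-- publishes with that revision; the delivered ivy.xml records the resolved dependency versions -->
    <ivy:publish resolver="private" pubrevision="${ivy.new.revision}" status="integration">
      <artifacts pattern="${basedir}/dist/[artifact].[ext]"/>
    </ivy:publish>
  </target>
</project>
```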

Aside from latest.integration you can also specify a range of restrictions for acceptable versions, which is handy but probably too complex to be worth the effort in most cases. Ivy already resolves most version conflicts automatically by evicting the older versions, which works fine as long as the libraries maintain backwards compatibility [3]. So when commons-httpclient requires commons-logging 1.0 but commons-io requires commons-logging 1.1, you wind up with just commons-logging 1.1 on the classpath and everything works.
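For reference, the different ways of constraining a version look something like this in the dependencies section of an ivy.xml. The module names and revisions are examples only.

```xml
<ivy-module version="2.0">
  <info organisation="com.example" module="example-app"/>
  <dependencies>
    <!-- always take the newest published build, much like a Maven SNAPSHOT -->
    <dependency org="com.example" name="shared-module" rev="latest.integration"/>
    <!-- a range: at least 1.0, but below 1.2 -->
    <dependency org="commons-logging" name="commons-logging" rev="[1.0,1.2["/>
    <!-- a fixed revision; conflicts are still settled by evicting the older version -->
    <dependency org="commons-io" name="commons-io" rev="1.4"/>
  </dependencies>
</ivy-module>
```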

Sharing Projects and Modules

Ultimately, the main reason I was looking to move away from jar files in source control was to make it easy to share modules between different projects. Right now, that would basically have to be done by either forking the code, occasionally building a jar and dropping it in or using svn:externals – none of which are particularly appealing. With Ivy, the buildlist can handle any submodule being dumped into the project directory and you can pull a version of the module from the repository regardless of whether or not you have the source code checked out for it. There are now two modes you can use sub-modules in:

  1. Just declare it as a dependency and Ivy will grab the version from the repository. Just like including a jar in source control but a little easier.
  2. Declare it as a dependency and check out a copy of the source as a submodule of the project you’re currently working on. This then gets picked up by the Ivy buildlist task and automatically inserted in the right place in the build process. Now you can easily make changes to the module and use them in your outer project without doing a full release. You only need to push a new version into the shared repository when you’ve got the bugfix or new API completely sorted out and ready for others to use. In the mean time, everything still builds with a single execution of ant.

Speed

The project I tested this out on has a lot of dependencies – it took ages to import them all into the private repository, and many required manual fixes for metadata problems as well. With the jar files in subversion, the build server could check out the full source (including libraries) and build the project in 3-4 minutes (the build server and the subversion repository have a gigabit connection between them). The first build it did with Ivy took the build time up to about 25 minutes as it downloaded all the dependencies (the Ivy repository was in the US, the build server was in Australia), but because Ivy keeps a local cache of everything it downloads, the second build went back down to the usual 3-4 minutes.

Ivy can be a bit slow at calculating dependencies, though I probably wouldn’t have noticed it if the project didn’t have as many submodules, each of which requires Ivy to calculate its dependencies before it is built. In the grand scheme of things though, the time that Ivy spends working out dependencies is small enough to be insignificant compared to whatever the rest of the build is doing. Since the jar files are just dropped in to a normal directory, it’s also easy to add a flag to completely skip Ivy and use the existing dependencies if you do have small tasks that are run really frequently. With a remote subversion server you can waste far more time updating, moving or deleting jar files.
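One simple way to add such a flag is Ant's unless attribute on the target that calls Ivy. The property name skip.ivy and the directory layout here are assumptions.

```xml
<project name="example" default="compile" xmlns:ivy="antlib:org.apache.ivy.ant">
  <!-- skipped entirely when skip.ivy is set; the jars already in lib/ are reused -->
  <target name="resolve" unless="skip.ivy">
    <ivy:retrieve pattern="${basedir}/lib/[conf]/[artifact]-[revision].[ext]"/>
  </target>

  <target name="compile" depends="resolve">
    <mkdir dir="${basedir}/build/classes"/>
    <javac srcdir="${basedir}/src" destdir="${basedir}/build/classes">
      <classpath>
        <fileset dir="${basedir}/lib/compile" includes="*.jar"/>
      </classpath>
    </javac>
  </target>
</project>
```

Running ant -Dskip.ivy=true compile then goes straight to the compile step.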

Summary

I think Ivy is a pretty clear winner. It’s simple to set up a private repository and avoid all the common problems people hit with the Maven repository. What complexity Ivy adds can be isolated to the template build scripts so individual projects stay quite simple, and with buildlist it’s now easy to share modules between projects, which had been causing me a lot of headaches.

The downside: there’s a reasonable amount of learning that the team will have to take on and at first glance Ivy looks like it adds more complexity than benefit so getting buy-in isn’t a guarantee.

1 – once you “resolve” the dependencies you can simply look in the appropriate directory and see all the jar files for that configuration in one place

2 – I also put the Ivy jar files in subversion so that it’s available and doesn’t need to be installed separately. So the requirements to build are just a JDK, ant and a checked out source tree

3 – and if they don’t you’re in a lot of trouble as the project that wanted the newer version is unlikely to work with an older version anyway.

Ant, Subant and Basedir

January 25th, 2010

Here’s an important lesson for people combining ant scripts – the way basedir is calculated is very unlikely to be what you expect. In particular, if you combine the <ant> task with the <subant> task you’re probably in for a surprise.

I learnt this important life lesson when the improved build scripts I’d been working on failed on the build server even though they worked perfectly on my machine. The difference is that the build server runs CruiseControl, which has a wrapper ant script that checks out a fresh copy of the project and then uses the <ant> task to build it. The ant task was:

<ant antfile="build.xml" target="dist" dir="projectDir" />

As far as that main antfile is concerned, everything is perfect – the basedir is the directory that its build.xml is in and all is good with the world. However, if that build.xml happens to use subant, the basedir will not be changed for the builds subant runs. Basically, because the dir attribute was given, basedir is now a user-configured property rather than a calculated value, so it doesn’t get recalculated. However, if you instead use:

<ant antfile="projectDir/build.xml" target="dist" />

It all works out. The main build.xml still gets the basedir as projectDir but when it uses subant, the basedir will be automatically changed to whatever directory the build file subant points to is in.

The behavior is explained in this bug report which is closed as WONTFIX for backwards compatibility. Thankfully ant 1.8 adds a useNativeBasedir attribute which provides much more predictable basedir behavior for the ant task.
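With Ant 1.8 or later, the original form of the call can be kept and the new attribute added, roughly as below. The wrapper project and target names are made up; projectDir is the same placeholder used above.

```xml
<project name="wrapper" default="build-project">
  <target name="build-project">
    <!-- useNativeBasedir asks the child build to use the basedir it would have
         had if it were run directly from the command line -->
    <ant antfile="build.xml" dir="projectDir" target="dist" useNativeBasedir="true"/>
  </target>
</project>
```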

Three Types of Ant Scripts

January 22nd, 2010

Bryan comments on the three types of ant scripts:

In my experience, there are three types of Ant scripts that you encounter "in the wild":

  • Small Ant scripts, generally Java-only, which can use most of Ant's default behaviors and are clear and simple. A lot of open source build scripts are this way.
  • Serious commercial Ant scripts written before macrodef and import became available. These are generally impossible to understand and evolve, and the reality is that a small cadre of Build Wizards keep them running. Such systems often involve a substantial number of custom Ant tasks.
  • Serious commercial Ant scripts written to use macrodef and import. In my experience, the need for custom Ant tasks drops way off with Ant releases post-1.6.

This really does ring true to me. Ant scripts can fairly quickly become unwieldy and difficult to work with if you aren’t using import and macrodef, but with them you can achieve so much more without the complexity getting out of hand. They won’t absolve you of the need to properly understand ant and the declarative paradigm it wants you to work with, but it’s much more approachable.
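To make that concrete, here is a small sketch of the pattern: a shared file defines a macro, and each project imports it. The file name common.xml, the macro name and the directories are invented for the example.

```xml
<!-- common.xml: shared definitions, kept in one place -->
<project name="common">
  <macrodef name="compile-module">
    <attribute name="srcdir"/>
    <attribute name="destdir"/>
    <sequential>
      <mkdir dir="@{destdir}"/>
      <javac srcdir="@{srcdir}" destdir="@{destdir}"/>
    </sequential>
  </macrodef>
</project>
```

```xml
<!-- build.xml: the project itself stays short -->
<project name="example" default="compile">
  <import file="common.xml"/>
  <target name="compile">
    <compile-module srcdir="src" destdir="build/classes"/>
  </target>
</project>
```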

If I can ever get someone to add the optional dependencies for scripting support on our build servers [1] I may well find they help a lot too.

1 – which happen to be Windows boxes, behind a firewall on the other side of the world from me, so not easy to make remote changes on

Comparing Build Systems

January 11th, 2010

After spending some time thinking about and using different build systems, I can’t say I really like any of them all that much. I now have a reasonably complex project, using submodules, that can be built with ant, buildr, maven and gradle – with varying degrees of support for the Ephox specific requirements and reporting.

Maven

Ah, the Java world’s favorite whipping boy. The more I play with Maven, the more I start to understand why it gets such a bad rap: it’s too easy to get started and do it all wrong. Maven is not a build tool that is quick to start using – it requires planning, common conventions, a number of extra systems such as repository servers to be set up, and templates to be designed. However, Maven does an unfortunately good job of working if you just create a simple pom.xml and run mvn.

If you want Maven to work reliably though, you need to set up a bunch of stuff:

  • Private repository servers, including vetting all the meta-data that goes into it. Don’t just import the central Maven repository.
  • A company specific parent POM. This should specify explicit versions for each Maven plugin, point the project at the private repository servers and anything else that is standard across the company.
  • Build plugins for anything custom you need. Real Maven plugins are the best way to extend things – trying to put it all in your pom.xml or similar build scripts rapidly gets too complex and unmaintainable.
  • Strictly follow the Maven way.
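A company parent POM along those lines might start out something like this. The coordinates, repository URL and plugin version are placeholders, and a real one would pin every plugin in use and usually declare plugin repositories too.

```xml
<!-- pom.xml for a hypothetical company-wide parent; all values are illustrative -->
<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>
  <artifactId>company-parent</artifactId>
  <version>1</version>
  <packaging>pom</packaging>

  <!-- point every project at the private, vetted repository -->
  <repositories>
    <repository>
      <id>company-private</id>
      <url>http://repo.example.com/maven2</url>
    </repository>
  </repositories>

  <build>
    <!-- explicit plugin versions so builds do not change underneath you -->
    <pluginManagement>
      <plugins>
        <plugin>
          <groupId>org.apache.maven.plugins</groupId>
          <artifactId>maven-compiler-plugin</artifactId>
          <version>2.0.2</version>
        </plugin>
      </plugins>
    </pluginManagement>
  </build>
</project>
```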

Make no mistake, that’s a lot of work, but it sets up a very robust, enterprise build system and most of it only needs to be done once for the entire company, so starting a project from then on does become pretty simple.

All in all, it’s more work than I really want to take on – especially the building of custom plugins. Ephox doesn’t follow the Maven way strictly enough either – the most problematic area being how to release and version snapshots.

Buildr

Lots of people talk about using rake to build Java projects, but rake by itself simply doesn’t know enough about Java to really shine. Buildr is a layer over rake that adds Java knowledge. I actually quite like Buildr and think it has a fair bit of promise.

It can use the Maven repository for dependencies, so if you go that way you need to set up your own private repository again. Thankfully, Buildr can also use plain jar files so if you don’t want to go the repository route, you don’t have to. The big downside is how complex it can be to get transitive dependencies working – partly by design and partly due to bugs and current limitations.

Buildr can run on either C-Ruby or JRuby. Start-up is faster on C-Ruby, but JRuby can run things like Javac without spawning a new JVM, so it fairly quickly starts to make up the time difference. Installing either version is quite easy.

It is a little odd to be writing the build script for a Java project in Ruby. The language difference doesn’t really matter, but having a different set of libraries for things like IO and file system stuff is an unfortunate overhead. If you’re using Ruby in some other form, it’s no problem, otherwise it’s extra training and ramp-up cost.

I also had a lot of problems because Buildr was including stuff on the test classpath by default. Some versions of JUnit and JMock are meant to be included, but which ones varies between Buildr versions and the documentation doesn’t always match up with what happens. Given that JMock 2 is very much incompatible with JMock 1.3, it didn’t go so well. The odd thing is that Buildr can be quite flexible – it supports TestNG and a few other libraries, but with JUnit it really wants to also use JMock. There is a way to specify a different version, but the group name is hard coded, so you can’t go back to JMock 1.3 because it has a different group name [1]. It would be far better for Buildr to just leave the mock framework as a dependency the project developer has to add if they want it.

Overall, Buildr is a pretty good solution, it’s quick and easy to get up and running (beware the central repository though) but it’s still fairly immature which caused me a fair few problems. The good news is that every issue I ran into was already a known problem that the Buildr team is working on resolving, so it’s pretty likely to become a very good option in the future.

Gradle

Gradle is a particularly interesting project. It feels a lot like a Java-oriented version of Rake with elements of Buildr, Maven and Ant mixed in. It has the best ant integration story of any of the non-ant build tools. Like Buildr it can pretty seamlessly utilise ant tasks but within the scripting language rather than with XML. Gradle can also import an existing ant build file and use the targets it defines as if they were native. That’s surprisingly powerful and useful, especially if you already have various build scripts and utilities written in ant.

Gradle also has a really nice approach to multi-project builds, allowing you to inherit configuration from the parent build script and pull in project dependencies easily. However, it’s not all smooth sailing. The main project build file starts to get pretty complex because it winds up configuring both itself and the sub-modules in the one file. The inheritance doesn’t really make sense if you have a mix of sub-module types, say some Java, some Scala and some just web resources. In hindsight, it would probably be better to ignore the built-in inheritance and just use normal file import functionality within each module to select the default module behavior to use.

Sadly, the multi-project stuff really came crashing down because of some pretty unexpected behavior around what the current project actually was. Depending on when a particular bit of script gets evaluated, it might take project() to mean the current sub-module or it might wind up referring to the last sub-module that was processed. It makes sense when you think through the way the code works, but it’s far from intuitive when you’re just trying to build your project. I really couldn’t see myself recommending Gradle until this is significantly simplified and made intuitive.

Gradle also handles transitive dependencies much better than Buildr, though I had some confusion about when dependencies were transitive and when they weren’t. Gradle is one of the most flexible tools in terms of how dependencies are handled – using either the maven repository, ivy or defining dependencies as groups directly in the build files, allowing you to check jar files directly into svn if you prefer.

Gradle was also one of the slowest of the tools I tried. Once it’s up and running, build times are equivalent to ant, but a do-nothing target took just over 4 seconds whereas ant took well under a second. For a full build, that’s not a big deal – a project that ant builds in 2 minutes would take gradle 2 minutes and 4 seconds. The problem is when you’re running a really simple task as part of some development (e.g. trying to remove duplication that simian had flagged).

Ant

I’m beginning to think Ant should have been called Cockroach instead – at the end of the build tool war, you can bet ant will still be there going strong. It’s really quite scary that it’s been around for about 10 years now and is up to version 1.8, and not for lack of maintainers. Ant doesn’t use convention over configuration – you have to code up your entire build by piecing together the task building blocks it provides. That said, the tasks ant provides are its key strength – powerful, flexible and in almost every case very well designed – and piecing together a build process from ant tasks is much simpler than building one from scratch or from command line tools the way make would.

Since you’re writing the whole build script yourself though, ant files can become very long winded and hard to maintain. If you want to use ant successfully, you have to build a set of base scripts that provide the kind of standard project system that tools like Maven and Gradle give you. The benefit being that you can build it the way you want, not the way the tools want you to, without ever having to fight the tool.

Thankfully, Ant actually has some pretty powerful tools to build up that project system. It’s simple to import other build files so you can break your script up, and it’s simple to use extension points by overriding targets in the actual project build file. Ant 1.8 adds ‘extension-point’ to make this even easier, but it’s quite good even without that. It’s not quick to build up the right structure but like the infrastructure required for Maven, you can probably share your ant templates company wide (or perhaps much further). Easyant seems to be an attempt to provide a ready-made project convention on top of ant, but I think it went a bit too far and buried the ant functionality too deeply.
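A minimal sketch of the Ant 1.8 extension-point mechanism, assuming a shared common.xml and a made-up pre-compile hook: the template declares the hook, and a project attaches a target to it without overriding anything.

```xml
<!-- common.xml: the shared template declares where projects may hook in -->
<project name="common">
  <extension-point name="pre-compile"/>
  <target name="compile" depends="pre-compile">
    <mkdir dir="build/classes"/>
    <javac srcdir="src" destdir="build/classes"/>
  </target>
</project>
```

```xml
<!-- build.xml: the project adds its own step to the hook -->
<project name="example" default="compile">
  <import file="common.xml"/>
  <target name="generate-sources" extensionOf="pre-compile">
    <echo message="generating sources before compile"/>
  </target>
</project>
```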

What I wound up with is a set of build scripts that define various useful macros and targets – version numbering, tagging in subversion, working with dependencies on sub-modules – and a set of scripts that define module types such as jar, war and the parent project. Frankly, that approach has revolutionised the way I work with ant.

The build scripts are now basically a first class project in and of themselves. They can be re-used across multiple projects and improved over time as different projects have various needs. Those improvements can then be easily shared back with the original projects since the scripts aren’t being copied into every project, just treated like any other dependency.

The build files in the project and sub-module are then very short and simple, since they only have to define any non-standard behavior and the required dependencies. This makes the build process for the project much simpler to maintain and understand. It obviously makes it much easier to start a new project or module as well.

I also found it really useful to define macros to make things more readable. For example, many but not all projects need to include a copy of EditLive! so there’s a set of predefined targets to grab the right version and make it available. Targets that need EditLive! just depend on the “editlive” predefined target, but that doesn’t allow a specific version of EditLive! to be requested. So there’s also a macro defined, editlive-version, that lets you set the version to use. All it does is set a few properties that control where to get EditLive! from, but the final syntax is much easier to read than the previous method of putting them directly in a properties file allowed. I may be overusing the technique at the moment, but it’s quite useful [2].
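The real macro isn't shown here, but a macro of that shape could be as simple as the following sketch. Only the editlive-version name comes from the description above; the property names, directory and version number are guesses made up for illustration.

```xml
<project name="example" default="show-editlive">
  <!-- in practice this would live in the shared build scripts -->
  <macrodef name="editlive-version">
    <attribute name="version"/>
    <sequential>
      <!-- hypothetical properties that the "editlive" target would read -->
      <property name="editlive.version" value="@{version}"/>
      <property name="editlive.dir" value="${basedir}/build/editlive-@{version}"/>
    </sequential>
  </macrodef>

  <!-- the project's build file then reads as a plain statement of intent -->
  <editlive-version version="7.0"/>

  <target name="show-editlive">
    <echo message="Using EditLive! ${editlive.version} from ${editlive.dir}"/>
  </target>
</project>
```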

However, there are some drawbacks. It got fairly difficult to keep track of which properties were declared at which points in the build – especially as the number of build files being imported increased. There were a few builds tagged as ${version-major}.${version-minor} because of that. Fortunately, that confusion was limited to within the build framework I was building – not the actual build files for the project itself, and I found that it was mostly caused by my habit of defining every variable in a properties file that’s included at the top of the build script. For build frameworks, it’s a lot better to declare properties as late as possible, within targets. That way, the property is set at the same time as the first work related to it is being done and any dependent information would already be set. Essentially, it uses the target dependencies to work out the right order for setting properties.
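A sketch of the late-declaration approach, with invented file and target names: the version property is built inside a target, after the values it depends on have been loaded, and anything that needs it depends on that target.

```xml
<project name="example" default="tag">
  <target name="init-version">
    <!-- load version-major and version-minor first -->
    <property file="version.properties"/>
    <!-- fail loudly rather than tagging a build as a literal ${version-major} -->
    <fail unless="version-major" message="version-major is not set in version.properties"/>
    <fail unless="version-minor" message="version-minor is not set in version.properties"/>
    <property name="version" value="${version-major}.${version-minor}"/>
  </target>

  <!-- target dependencies guarantee the property is set before it is used -->
  <target name="tag" depends="init-version">
    <echo message="Tagging release ${version}"/>
  </target>
</project>
```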

The downside of building this framework is that now we have to maintain it ourselves as well – it would be much nicer to use something like Maven or Gradle and have their dev teams deal with the on-going maintenance. However, since we do have some Ephox specific stuff, we’ll always be maintaining some amount of build infrastructure, and since we’re currently maintaining it separately for every project, this approach is a lot better than the status quo.

Conclusion

Ultimately, I think that ant is still the best choice, but it really is vital to set up a build framework that you can reuse, rather than doing everything from scratch for each project. Buildr and Gradle look like they have some huge potential in the future, but they need some more time to improve stability, simplicity and consistency. I’d guess that their 2.0 versions will be something pretty serious to reckon with. Maven is an awesome tool but it just requires too much effort to get it working right. However, the consistency in how a project is built that the Maven project has brought to the Java world is absolutely revolutionary – neither Buildr nor Gradle could be anywhere near as simple as they are if it weren’t for the work and evangelism of the Maven team to embed the Maven way in to the Java world’s consciousness. Sure we’re still fighting over many parts of it, but the Maven project structure is now well established and a very powerful convention.

1 – that pesky Maven 1 to Maven 2 transition again…

2 – I would also note that the greatest improvement ever added to ant is the else attribute on the condition task (in ant 1.6.3). Makes it much easier and more maintainable to set a property value to one of two options.
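For anyone who hasn't used it, the else attribute looks like this. The property names are examples.

```xml
<project name="example" default="show">
  <!-- build.type becomes "release" if the release property is set, "snapshot" otherwise -->
  <condition property="build.type" value="release" else="snapshot">
    <isset property="release"/>
  </condition>

  <target name="show">
    <echo message="Build type: ${build.type}"/>
  </target>
</project>
```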

Project Directory Structure

January 6th, 2010

Having spent a bunch of time looking at various build systems and tools, one of the simplest and most effective improvements I’ve discovered is to always use the Maven project structure. It doesn’t matter if you’re not using Maven, there’s no downside to using it and every build tool that uses convention over configuration uses the Maven structure.

Previously I’ve been of the opinion that the directory structure really didn’t matter much – I went with whatever happened to be auto-generated by whatever tool I was using that day. There is a small overhead in remembering to look for the JavaSource directory instead of src or source when you switch projects but it’s incredibly minimal and not worth worrying about on its own. When it comes to build scripts, getting these basics in the same place saves a whole lot of configuration and makes everything simpler.

When you start using sub-modules within the project it really becomes clear just how much time you can waste tweaking bits of build script to work with even slightly different directory structures. Not only do many build systems give you more functionality automatically, you suddenly get the ability to re-use build scripts and templates across different modules and across projects.

Don’t think that using the Maven directory structure means you have to play by all the Maven rules though. You can still generate more than one artifact from a project if that suits you best, you can check jar files into source control rather than using a repository etc, but source files go in src/main/java (or scala or groovy or webapp etc) and test files go in src/test/java (or scala or groovy etc).
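As a sketch of what the shared layout buys you in plain ant, a template like the one below can be reused unchanged by any module that follows the convention. The output directories and target names are assumptions, and the classpath for test libraries is left out for brevity.

```xml
<project name="module-template" default="test-compile">
  <!-- the Maven layout, expressed once for every module -->
  <property name="main.src" location="src/main/java"/>
  <property name="test.src" location="src/test/java"/>
  <property name="main.classes" location="target/classes"/>
  <property name="test.classes" location="target/test-classes"/>

  <target name="compile">
    <mkdir dir="${main.classes}"/>
    <javac srcdir="${main.src}" destdir="${main.classes}"/>
  </target>

  <target name="test-compile" depends="compile">
    <mkdir dir="${test.classes}"/>
    <!-- tests compile against the main classes -->
    <javac srcdir="${test.src}" destdir="${test.classes}" classpath="${main.classes}"/>
  </target>
</project>
```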

Simple, quick and easy to switch to [1], and it saves a surprising amount of setup work in your build scripts – especially when you first get the project up and running.

1 – assuming your current build scripts aren’t completely sadistic