JavaScript Testing in the Cloud

June 8th, 2010

One of the things Ephox is contributing to TinyMCE is a build farm that runs the automated tests in various browsers and publishes its results for all to see. This has been pretty interesting to set up, and there are a range of different approaches. Matt Raible posted recently about his experiences using Selenium with Sauce Labs. I had initially looked into that as well, but was worried about a few of the issues Matt hit, and the TinyMCE team had already written a lot of tests using QUnit rather than Selenium.

Instead, I’ve wound up with quite a neat little setup based around Hudson. The master Hudson server runs on an EC2 instance, so the configuration and control interface is easily available from anywhere. However, EC2 can only run specific Windows Server versions, so it can’t provide all the browsers we need. Instead, the slaves run as VMware instances behind the firewall in our UK office. They use Hudson’s Java Web Start slave support, so they connect out to the master, avoiding the need to punch a hole in the firewall. At this stage we have Windows XP, Vista and Windows 7 VMs running a suite of browsers (roughly grouped as “old”, “previous version” and “latest”, one group per VM). We’re also using the master server to run the Linux browser tests.

The next challenge was to get QUnit working with continuous integration so that its results are reported back correctly. Unlike JSUnit or Selenium, QUnit doesn’t really have anything like this built in, though it does provide some hooks to make it possible. I simply took the JSUnit server, which is completely agnostic about what actually runs in the browser, and wrote a simple HTML and JavaScript harness to marshal the QUnit test results and submit them back to the server.

Finally, we want to set up a workflow so that tests on different VMs can run in parallel, but builds are only published if they pass all the tests. To achieve that, I’ve split the build into three parts:

  1. The build itself. This minifies the JavaScript and does the other work needed to build the zip package that would be distributed. At this stage, that zip is just left on the Hudson server and not published anywhere. It may be possible to do this using the “touchstone build” option for the configuration matrix type, but I haven’t investigated that yet.
  2. The test project is set up as a configuration matrix type, so Hudson automatically duplicates the build on any slaves that are chosen (in this case, all of them plus the master to cover Linux). Each slave then downloads the zip package from the previous build phase and runs the tests. If any of the VM builds fail, the test project is considered to have failed and the process stops.
  3. Finally, if the tests all pass, the packaging project starts, which simply publishes the zips to various places. It uses the workspace clone plugin to effectively pick up where the build step left off.
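The gating rule in the three steps above can be modelled in a few lines. This is a minimal Java sketch, not Hudson code: the node names, the artifact name and the `BiPredicate` standing in for "run the QUnit suite on this node" are all hypothetical. It only shows the decision logic: one packaged artifact, tested on every matrix node, published only if every node passed.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical model of the build -> matrix test -> package flow; not a Hudson API.
final class ReleasePipeline {

    /** Tests the same packaged artifact on every node and records pass/fail per node. */
    static Map<String, Boolean> runMatrix(String artifact, List<String> nodes,
                                          BiPredicate<String, String> testOnNode) {
        Map<String, Boolean> results = new LinkedHashMap<>();
        for (String node : nodes) {
            results.put(node, testOnNode.test(node, artifact));
        }
        return results;
    }

    /** One failing node fails the whole matrix, so publishing requires every node to pass. */
    static boolean shouldPublish(Map<String, Boolean> results) {
        return !results.isEmpty()
                && results.values().stream().allMatch(Boolean::booleanValue);
    }

    public static void main(String[] args) {
        // Hypothetical node names: the Linux master plus the three Windows VMs.
        List<String> nodes = List.of("master-linux", "winxp", "vista", "win7");
        // Pretend the Vista VM fails its tests, which should block publishing.
        Map<String, Boolean> results =
                runMatrix("build.zip", nodes, (node, zip) -> !node.equals("vista"));
        System.out.println(results + " publish=" + shouldPublish(results));
    }
}
```

Treating an empty result set as "do not publish" is a deliberate choice in this sketch: if no node reported back, there is no evidence the package works.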

With this approach we are essentially building a full release candidate, testing it and then finally releasing it. While the tests don’t yet exercise the package fully, this has the advantage of actually testing that the intended files are making it into the zip. Most unit test setups run the tests against just the compiled classes, with no guarantee that they’ll come out the same way after being packaged up. While I was just grabbing the full zip out of pure convenience, it is nice to know that catastrophic packaging problems would now be picked up by the tests, and I can quite easily build out more tests to pick up smaller errors as well.

Current issues:

  • The slaves are extremely good at reconnecting after the server reboots, but if the slaves themselves reboot, sometimes their authentication has timed out and they can’t re-download the JNLP file without help. It would be nice if I could specify authentication credentials for Java Web Start on the command line.
  • The test project is mostly triggered by the build project completing, but it also polls for changes in the git repository of the test harness. That way, if we change the harness, it verifies that all the projects that use the harness still build and pass, rather than giving us a surprise failure on a later, unrelated change. Unfortunately, you can’t decide which Hudson node does the polling, and it winds up being distributed out to the remote slaves (in itself a pretty stupid thing to do in this particular setup). When you restart the server, though, it takes a few moments for the slaves to reconnect, and in the meantime Hudson moves the polling back to the master server, which has an out-of-date workspace for that project, so it almost always detects a change. The net result is that restarting the server causes all the tests to run for no reason.
  • The matrix project type basically makes a sub-project for each VM, so things like Twitter notifications report on multiple cryptic sub-projects rather than just on the overall project (hence I’ve given up on the Twitter notifications).

Things to improve:

  • Add a Mac into the mix. Safari is covered by the Windows builds anyway, but there can be subtle differences between versions on different operating systems, and since it’s all automated, we may as well be thorough.
  • Make the slaves redundant. I’d like to effectively duplicate each of the slave VMs so that I have a “latest browser” VM pool rather than a specific VM. That way, if one of the VMs fails to reconnect or isn’t available for some reason, there is a reasonable chance that the other one will be available. If they’re both up and running, it gives extra throughput, which is always good too.

Overall, I’m quite pleased with the setup and look forward to growing the test coverage for TinyMCE.

Building in the Cloud

April 26th, 2010

Once upon a time, the state of the art for building software projects was to have a continuous integration server so your builds were repeatable, reliable and performed in a controlled environment. As a bonus, they also ran a whole suite of automated tests so you knew that what came out was of at least reasonable quality.

Those days have gone.

These days, it’s far more common to have a build and test farm churning away to handle the volume of tests, the various modules and projects that are all built separately and, of course, the requirement to run the tests on a huge variety of platforms and environments. VMware has been a big help in this area for quite some time, effectively letting teams build a private cloud before it was cool to call it a private cloud. Now, with the growing availability of public cloud instances, it makes a lot of sense to stop buying and maintaining hardware and simply move the whole system into the public cloud.

I’ve recently had the opportunity to try that out with a new project, but have been so focussed on getting things up and running that I haven’t found the time to write up the details and my thoughts about it all. I’m going to try to go back over things so I have a record of what I’ve learnt, starting with this post and some thoughts on the advantages and disadvantages of moving builds into a public cloud[1].

Advantages

Power on Demand

If you have a lot of projects, you can use the dynamic scaling of the cloud to add more build servers for those times when everyone seems to commit at once and avoid big backlogs, without paying for that hardware all the time. This is less useful when you only have one project, at least with EC2. Since it takes so long to spin up a new instance, you probably don’t want to spin down servers that will be needed for every build when they’re not in use: it might reduce the EC2 bill, but it will make your builds too slow.

Even so, it is a lot easier to justify continuously running a number of EC2 servers so you can run tests in parallel than it is to justify buying a number of physical servers yourself. Not to mention how much easier it is to set up and maintain multiple instances.

No More Down Time

If only it were that simple. Moving to the public cloud won’t eliminate down time, but it does do a pretty good job of making hardware faults someone else’s problem. Even if a hardware fault or something else does take out a critical part of your build system, it’s usually quite simple to run up a replacement and get everything working again. You never have to pay for idle backup hardware or waste time repairing machines. Plus, with the ability to take a snapshot and store it, backups are easier than ever both to take and to restore.

Available Anywhere

Moving to a public cloud means that the current build status is available without any hassle from anywhere in the world. Basically, it gets moved out from behind the corporate firewall. This isn’t an advantage for everyone – many companies already have a well set up and maintained VPN that is routinely used. While a VPN can work pretty seamlessly, it’s surprisingly complex to get one up and running for everyone in the company, resulting in plenty of time wasted on system administration and tech support. For developers who mostly work in the office, the barrier of setting up a VPN may be high enough to stop them occasionally working from home, or make them less productive when they are on the road.

Accessing Builds

With a build server in the sky, everyone in your company can easily grab the builds, and it’s often quicker to deploy them to the website or other places. I’ve taken advantage of the fast and free data transfer between EC2 and S3 to keep a complete backlog of builds available for support purposes. Previously, this was done with a shared drive, and every so often we ran out of space and had to delete some of the builds that were less likely to be needed[2]. S3 doesn’t ever run out of space, which is nice.

If you have dependency management, you will probably want to move the repository into the cloud as well – either on the master build server or a dedicated instance if you have enough demand for that.

Competitive Advantage

One of the biggest misconceptions I’ve come across when dealing with build systems is the idea that “we’re an IT company, this kind of thing is a key competitive advantage”. Continuous integration, automated testing and deployment technologies can all be competitive advantages, but maintaining the hardware they run on almost never is. Maintaining hardware or software for a source control system is almost never a competitive advantage, unless your product happens to be a source control system.

If you can stop spending money buying hardware, and you can stop wasting time maintaining servers internally, you can spend more time and money on the software side of your build systems or on developing your products, and that’s where you get the real competitive advantage. There’s no such thing as an “IT company”; it’s just too broad a category. Find the specific area that you should build competitive advantages in, focus on that, and get someone else to worry about providing everything else.

Disadvantages

Security

Everyone worries about security in the cloud, often needlessly so, but moving the build server outside your corporate firewall does make it less secure. On the other hand, you then wind up paying more attention to properly securing it rather than just depending on the firewall, so it’s not all bad news. Since the build server has to have access to your source code, it is an attack vector that you really want to take seriously and make sure you mitigate.

Accessing Source Control

If the build server is outside the firewall, your source code will need to be too. For small to mid-size companies, I’ve come to think that hosted source control is the right way to go anyway: why would you want to waste your time maintaining source control servers? Subversion isn’t particularly nice to use if the server is on the other side of the world, but distributed version control systems like git have no difficulty with it. The way I see it, if it’s worth hosting either your source control or your build servers internally, it’s worth hosting both internally. If not, move them both to a hosted environment and you’ll have more time to focus on developing the software that actually makes you money.

Accessing Builds

If all or most of your development team is in one office, having the builds stored externally is a bit of a disadvantage because you now have to wait longer while they download, and in a backwards, outdated country like Australia[3], it can also chew into your limited download quota.

When is This a Bad Idea?

I can think of two situations when this may not make sense:

  1. You’re a big company and can take advantage of economies of scale all by yourself. IBM, Apple, Microsoft and especially Google can maintain a private cloud more cheaply than they could run it in a public cloud. I’m not sure where the cut-off would be, but I suspect companies much smaller than those would still be included in this group.
  2. You have a centrally located team, slow internet and/or a slow source control system. Moving stuff externally doesn’t make sense if it becomes too slow to access. However, I’d still be looking to fix the internet connection and source control; a centrally located team would still be better off not maintaining hardware if the connection were fast enough.

Why else wouldn’t you want to do this?

1 – I use Amazon EC2 pretty much exclusively as a cloud provider but I’d definitely be interested in hearing about other options and what benefits they might bring. I suspect pretty much all of this would apply regardless of which cloud provider you went with though.

2 – naturally we could go and rebuild them from the source code in Subversion, but who can be bothered when you could just grab a pre-built version?

3 – at least in terms of internet access

Stop Concatenating CSS Files

April 5th, 2010

One of the common examples of the limits of Maven and other “strict” build tools was how difficult it is to concatenate CSS files together as part of the build. It’s nice to be able to split CSS files up to categorize things, but the extra HTTP requests really slow down page loads, so the build should concatenate them back together. What I’ve come to realise, though, is that building a custom concatenation script is just another instance of people reinventing the wheel in build systems, and getting less power as a result.

Instead of using plain CSS and concatenating, use something like LESS to compile the CSS. You can specify the CSS files to import with the usual @import statement and LESS will do the concatenation for you. This is particularly nice since the import order is specified right in the CSS files rather than in the build script, and you can build full hierarchies of imports. Plus, you get the full power of LESS, with variables, includes and so on, if and when you want to use it. There are a range of CSS compilers beyond LESS and a range of ways LESS can be integrated into a variety of build tools; however, I wound up just doing it dynamically with the LESS Servlet, which also sets headers appropriately and optimises CSS and JavaScript with YUI Compressor while it’s at it.

This of course leaves me wondering whether something like Google’s Closure could do the same kind of thing for JavaScript files. The question is not just whether Closure can concatenate JavaScript files for me, but whether it can do that and give me a bunch of other useful tools for free.

Using Ivy for Dependency Management

January 25th, 2010

At first glance, Ivy looks like a re-implementation of Maven’s dependency management that works nicely with ant, and to some degree it is, but it also adds some pretty significant improvements and some pretty significant complexity.

Maven Compatibility

Firstly, Ivy is compatible with Maven repositories, so if you think the way Maven manages dependencies is perfect, but don’t want to buy into the rest of Maven, Ivy provides a good answer. The configuration is a little bit different and you’ll have to learn a little bit about Ivy’s configurations which are both more powerful and more complex than Maven’s dependency “scope”, but you won’t have to go too far into them.

Ivy will use the Maven 2 repository by default so all the same libraries are available – complete with all the metadata problems.

Ivy Configurations

Configurations are probably the biggest difference between Ivy and Maven’s dependency management. At the simplest level they are roughly equivalent to setting the scope attribute in Maven: they let you choose whether a library is required only for compilation, whether it should be included in the packaged WAR/EAR, or whether it’s used only for testing. Since Ivy doesn’t actually build the project, what a configuration means is a lot more flexible than what scope means in Maven. Each Ivy setup needs to set up the configurations it needs from scratch, and it’s up to the Ant build process to ensure the libraries are used as intended.

If you import a project from the Maven repository, Ivy will convert the various scopes into the following configurations:

| Name | Description | Example library |
|---|---|---|
| default | runtime dependencies and master artifact can be used with this conf | |
| master | contains only the artifact published by this module itself, with no transitive dependencies | The project’s jar itself |
| compile | this is the default scope, used if none is specified. Compile dependencies are available in all classpaths | commons-lang |
| provided | this is much like compile, but indicates you expect the JDK or a container to provide it. It is only available on the compilation classpath, and is not transitive | Servlet APIs |
| runtime | this scope indicates that the dependency is not required for compilation, but is for execution. It is in the runtime and test classpaths, but not the compile classpath | An AOP runtime library? |
| test | this scope indicates that the dependency is not required for normal use of the application, and is only available for the test compilation and execution phases | JUnit |
| system | this scope is similar to provided except that you have to provide the JAR which contains it explicitly. The artifact is always available and is not looked up in a repository | ?? |
| sources | this configuration contains the source artifact of this module, if any | Source for the project |
| javadoc | this configuration contains the javadoc artifact of this module, if any | JavaDoc for the project |
| optional | contains all optional dependencies | Anything optional |

The three most commonly used configurations for dependencies are compile, provided and test; most projects will only ever need to add dependencies in those categories. The master, sources and javadoc configurations typically don’t contain any dependencies. Instead, they are what the results of the build are published into, since the same configurations defined here are used by other projects that depend on this one. The default configuration makes this easy by including all the runtime dependencies and the project’s jar itself in one configuration.

It’s also worth noting that configurations can extend each other. In the Maven import, runtime extends compile, so any libraries you compile against are also available at run time. Default extends runtime and master. You could of course ignore the extends functionality and just include multiple lib dirs on the classpath within your normal ant script, as you would if the jars were checked directly in to source control, but the functionality is reasonably simple and makes it very clear which jars are on which classpath[1].
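To make the extends relationships concrete, here is a small Java sketch that expands a configuration into the set of configurations whose jars end up on its classpath. It only encodes the two relationships described above (runtime extends compile; default extends runtime and master); Ivy's real Maven import defines more, and the jar names are made up for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: models "conf A extends conf B", not Ivy's own resolver.
final class IvyConfs {
    // Just the extends relationships described in the text.
    static final Map<String, List<String>> EXTENDS = Map.of(
            "runtime", List.of("compile"),
            "default", List.of("runtime", "master"));

    /** Returns the configuration plus every configuration it extends, transitively. */
    static Set<String> expand(String conf) {
        Set<String> seen = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(conf);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (seen.add(current)) {
                for (String parent : EXTENDS.getOrDefault(current, List.of())) {
                    todo.push(parent);
                }
            }
        }
        return seen;
    }

    /** Jars visible in a configuration, given the configuration each jar was declared in. */
    static Set<String> classpath(String conf, Map<String, List<String>> jarsByConf) {
        Set<String> jars = new TreeSet<>();
        for (String c : expand(conf)) {
            jars.addAll(jarsByConf.getOrDefault(c, List.of()));
        }
        return jars;
    }

    public static void main(String[] args) {
        // Hypothetical declarations: which jars were added to which configuration.
        Map<String, List<String>> declared = Map.of(
                "compile", List.of("commons-lang.jar"),
                "provided", List.of("servlet-api.jar"),
                "test", List.of("junit.jar"),
                "master", List.of("myproject.jar"));
        System.out.println(classpath("default", declared)); // [commons-lang.jar, myproject.jar]
    }
}
```

Note how servlet-api.jar (provided) and junit.jar (test) stay off the default classpath, which is exactly what a project depending on this one should see.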

My approach has been to stick with these configurations; they’re simple enough, but cover every situation I can currently imagine. The only real difference is that I’m not currently using the master configuration: the produced jars are just put directly into default. That means you can’t easily require the library without any of its dependencies, but it does mean the defaults work for what most libraries produce (a single jar file with the same name as the project, in the default configuration).

Ivy Repositories

Ivy has its own concept of a repository that uses Ivy descriptor files instead of Maven POM files and gives the full power, flexibility and complexity of Ivy’s configuration concepts. The configurations might be handy, but what I find most useful is that Ivy supports multiple types of repository. Instead of having to run specific software to provide the repository, it can be accessed via sftp, ssh, a shared drive or even Subversion. This makes setting up your own repository significantly easier; it’s quite likely you already have a server that’s available via ssh somewhere. I also like the fact that the whole Ivy configuration can be included in the project itself, so once you check out the source code, you have all the config settings you need to be able to build it[2].

Importing Libraries

Ivy can pull libraries from the public Maven repository and import them into your private one, which makes it reasonably quick to spin up a private repository and get rid of the dependency on the public one altogether. However, it’s still a fairly slow process and takes up the bulk of the time required to get Ivy up and running. That said, it immediately solves a lot of headaches caused by incorrect metadata or missing libraries in the Maven repository.

What I found was missing, however, was a simple tool to add new libraries that don’t exist in the Maven repository. For a single jar, it’s pretty easy to create the directory structure and an ivy.xml file, but when you have a large number, like the GData APIs, it can take a huge amount of time. I whipped up a simple bash script to work out most of the metadata from a jar file (module name and version number), create the directory structure and a simple ivy.xml file, then upload it to the repository. It doesn’t add the dependencies automatically, but that’s easy once the directory structure and basic ivy.xml have been created.
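The original script was bash and isn't shown here; the Java sketch below is a hypothetical reconstruction of the same idea, not the author's script. It guesses the module name and revision from a jar file name by splitting at the first "-" that is followed by a digit, and writes a minimal ivy.xml for it. The organisation has to be supplied by hand, and the heuristic will get unusual names wrong, so treat its output as a starting point to check.

```java
// Hypothetical helper mirroring the idea of the bash script described above.
final class JarMetadata {

    /** Splits "gdata-calendar-2.0.jar" into {"gdata-calendar", "2.0"}; revision is "" if none found. */
    static String[] moduleAndRevision(String fileName) {
        if (!fileName.endsWith(".jar")) {
            throw new IllegalArgumentException("not a jar: " + fileName);
        }
        String base = fileName.substring(0, fileName.length() - ".jar".length());
        // Heuristic (an assumption): the revision starts after the first '-' followed by a digit.
        for (int i = 0; i < base.length() - 1; i++) {
            if (base.charAt(i) == '-' && Character.isDigit(base.charAt(i + 1))) {
                return new String[] {base.substring(0, i), base.substring(i + 1)};
            }
        }
        return new String[] {base, ""};
    }

    /** A minimal Ivy descriptor publishing a single jar named after the module. */
    static String ivyXml(String organisation, String module, String revision) {
        return "<ivy-module version=\"2.0\">\n"
                + "  <info organisation=\"" + organisation + "\" module=\"" + module
                + "\" revision=\"" + revision + "\"/>\n"
                + "  <publications>\n"
                + "    <artifact name=\"" + module + "\" type=\"jar\" ext=\"jar\"/>\n"
                + "  </publications>\n"
                + "</ivy-module>\n";
    }

    public static void main(String[] args) {
        String[] parts = moduleAndRevision("gdata-calendar-2.0.jar");
        // "com.google.gdata" is an example organisation chosen by hand, not inferred.
        System.out.print(ivyXml("com.google.gdata", parts[0], parts[1]));
    }
}
```

As in the text, dependencies are left out of the generated descriptor; they are easier to add by hand once the file exists.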

Namespaces

One of the most common problems with the Maven repository is the number of Apache libraries that still use the Maven 1 naming scheme. For example, Commons IO is available under both the org.apache.commons group and the commons-io group. To Maven and Ivy, the different groups mean it’s actually a different library, so it can potentially wind up on the classpath twice. When you import libraries from the Maven repository, Ivy lets you configure namespaces to avoid this problem. Essentially, Ivy will rewrite the group name to consistently use one or the other, resulting in a more consistent repository and no duplicate jars on the classpath. The rename is applied both to the imported library and to anything else you import that depends on it, so everything automatically points to the right place.
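Ivy's namespaces are configured in the settings file with their own rule syntax, which isn't reproduced here. The Java sketch below only models the effect: a hypothetical rule table maps a legacy group for a given module onto one canonical group, so two spellings of the same library collapse into a single classpath entry. Picking org.apache.commons as the canonical group, and the revision used, are arbitrary choices for the example.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Models the effect of a namespace rewrite; not Ivy's namespace configuration format.
final class GroupNamespace {
    // "legacyGroup:module" -> canonical group
    private final Map<String, String> canonicalGroup = new HashMap<>();

    GroupNamespace rule(String legacyGroup, String module, String canonical) {
        canonicalGroup.put(legacyGroup + ":" + module, canonical);
        return this;
    }

    /** Rewrites a "group:module:revision" coordinate onto the canonical group, if a rule matches. */
    String rewrite(String coordinate) {
        String[] parts = coordinate.split(":");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected group:module:revision, got " + coordinate);
        }
        String group = canonicalGroup.getOrDefault(parts[0] + ":" + parts[1], parts[0]);
        return group + ":" + parts[1] + ":" + parts[2];
    }

    /** Rewrites every coordinate; entries that now coincide are kept only once. */
    Set<String> rewriteAll(List<String> coordinates) {
        Set<String> result = new LinkedHashSet<>();
        for (String c : coordinates) {
            result.add(rewrite(c));
        }
        return result;
    }

    public static void main(String[] args) {
        GroupNamespace ns = new GroupNamespace()
                .rule("commons-io", "commons-io", "org.apache.commons");
        // Two spellings of the same library collapse to one entry.
        System.out.println(ns.rewriteAll(List.of(
                "commons-io:commons-io:1.3.2",
                "org.apache.commons:commons-io:1.3.2")));
    }
}
```

Applying the same table to every dependency declaration, not just the imported module, is what keeps references to the renamed library pointing at the right place.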

Actual Usage

Actually using Ivy is really quite straightforward. There are a lot of different ant tasks you can use, largely due to the huge amount of flexibility Ivy provides, but you can also keep it quite simple. I’ve got five basic uses at the moment:

  1. configure – I’m calling this outside of any target at the moment, so it always happens before anything else. It sets up Ivy using the configuration file designed for this project, so it has the right configurations set up and points to the private repository instead of the default Maven one.
  2. retrieve – actually calculate the dependencies and link them into the specified directories. Each configuration gets its jar files placed in its own sub-directory. From then on, to use the dependencies in the ant script, just point at the appropriate directory. I also set this to use symlinks when possible, so the files actually stay in Ivy’s cache directory and it just creates a symlink to save some time.
  3. Inline retrieves – the retrieve task can also be configured to retrieve a library that you specify directly via attributes instead of in an ivy.xml file. This is really handy for things that the build scripts themselves want to use, like a Scala compiler or Cobertura. Otherwise each project would have to declare the same dependencies.
  4. Import – I have an ant task set up specifically to make it easy to import libraries from the Maven repository. It uses a slightly different settings file (which includes the Maven repository) and prompts for the group, module and revision to import.
  5. buildlist – this is a handy little task for building sub-modules within a project. It looks at each project’s dependencies and calculates the correct build order which you can then pass in to subant. Very much like the Maven reactor.

The nice thing is that Ivy just plugs in to your existing ant file without many changes at all.
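What buildlist computes is essentially a topological sort of the sub-modules by their inter-module dependencies. The Java sketch below shows that idea under simple assumptions: module names and dependency lists are given up front, and anything not in the checkout is assumed to come from the repository and is ignored. It is not Ivy's implementation, which works from each module's ivy.xml.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of the ordering buildlist produces: dependencies before dependants.
final class BuildList {

    static List<String> order(Map<String, List<String>> deps) {
        List<String> result = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        // Sorted iteration keeps the output deterministic for independent modules.
        for (String module : new TreeSet<>(deps.keySet())) {
            visit(module, deps, done, inProgress, result);
        }
        return result;
    }

    // Depth-first: order a module's in-checkout dependencies first, then the module itself.
    private static void visit(String module, Map<String, List<String>> deps, Set<String> done,
                              Set<String> inProgress, List<String> result) {
        if (done.contains(module) || !deps.containsKey(module)) {
            return; // already ordered, or an external library resolved from the repository
        }
        if (!inProgress.add(module)) {
            throw new IllegalStateException("dependency cycle involving " + module);
        }
        for (String dep : deps.get(module)) {
            visit(dep, deps, done, inProgress, result);
        }
        inProgress.remove(module);
        done.add(module);
        result.add(module);
    }

    public static void main(String[] args) {
        // Hypothetical checkout: webapp uses core and util; commons-io comes from the repository.
        Map<String, List<String>> deps = Map.of(
                "webapp", List.of("core", "util", "commons-io"),
                "core", List.of("util"),
                "util", List.<String>of());
        System.out.println(order(deps)); // [util, core, webapp]
    }
}
```

The resulting list is the kind of ordered set of build files you would then hand to subant, much like the Maven reactor.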

Tracking Versions

Ivy has some nice ideas for working with versions which I haven’t found in Maven before, though they may exist. Firstly, it can generate a build number by looking at the latest version available in the repository and adding one, which is much better than having a normal ant buildnumber file on a shared drive somewhere. Secondly, when Ivy publishes an artifact to the repository, it rewrites the ivy.xml to include the specific versions of the dependencies it was built with, so anything that depends on it will get known good libraries, even if it was using a SNAPSHOT-style build. In Ivy, the equivalent of a SNAPSHOT version is latest.integration, but Ivy always generates some form of version number when the build is published, so you never get a latest.integration version in the repository. By default it uses a date stamp, which actually works out really well.
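The two publishing behaviours above can be sketched as plain functions. This Java version is an illustration under assumptions, not Ivy's code: the next build number is taken as the highest published build under a given version prefix plus one, dependency revisions are pinned to whatever the resolve actually picked, and the yyyyMMddHHmmss timestamp format is an assumed choice that may not match the format Ivy uses.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch of build numbering and revision pinning at publish time.
final class Revisions {

    /** Highest published build under the prefix, plus one; starts at 0 if nothing matches. */
    static String nextBuild(String prefix, List<String> published) {
        int highest = -1;
        for (String rev : published) {
            if (!rev.startsWith(prefix)) {
                continue;
            }
            String build = rev.substring(prefix.length());
            if (build.matches("\\d+")) { // compare numerically, so 10 beats 9
                highest = Math.max(highest, Integer.parseInt(build));
            }
        }
        return prefix + (highest + 1);
    }

    /** Replaces each declared (possibly dynamic) revision with the revision actually resolved. */
    static Map<String, String> pinDependencies(Map<String, String> declared,
                                               Map<String, String> resolved) {
        Map<String, String> pinned = new TreeMap<>();
        for (Map.Entry<String, String> dep : declared.entrySet()) {
            pinned.put(dep.getKey(), resolved.getOrDefault(dep.getKey(), dep.getValue()));
        }
        return pinned;
    }

    /** A published module never keeps "latest.integration"; use an (assumed) timestamp format. */
    static String publishedRevision(String revision, LocalDateTime now) {
        return revision.equals("latest.integration")
                ? now.format(DateTimeFormatter.ofPattern("yyyyMMddHHmmss"))
                : revision;
    }

    public static void main(String[] args) {
        System.out.println(nextBuild("1.2.", List.of("1.2.0", "1.2.9", "1.2.10", "1.3.0")));
        System.out.println(pinDependencies(Map.of("commons-io", "latest.integration"),
                                           Map.of("commons-io", "1.4")));
        System.out.println(publishedRevision("latest.integration",
                                             LocalDateTime.of(2010, 1, 25, 9, 30)));
    }
}
```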

Aside from latest.integration, you can also specify a range of restrictions for acceptable versions, which is handy but probably too complex to be worth the effort in most cases. Ivy already resolves most version conflicts automatically by evicting the older versions, which works fine as long as the libraries maintain backwards compatibility[3]. So when commons-httpclient requires commons-logging 1.0 but commons-io requires commons-logging 1.1, you wind up with just commons-logging 1.1 on the classpath and everything works.
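The eviction behaviour described above amounts to "the newest requested revision of each module wins". The Java sketch below shows that rule for plain dotted numeric revisions only (anything else would throw a NumberFormatException); Ivy's actual conflict managers are configurable and handle far more revision formats, so this is an illustration rather than Ivy's algorithm.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of a "latest revision wins" conflict rule; not Ivy's conflict manager.
final class LatestRevisionWins {

    /** Compares dotted numeric revisions segment by segment, so 1.10 sorts after 1.9. */
    static int compare(String a, String b) {
        String[] x = a.split("\\.");
        String[] y = b.split("\\.");
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? Integer.parseInt(x[i]) : 0; // missing segments count as 0
            int yi = i < y.length ? Integer.parseInt(y[i]) : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** Keeps one revision per module: the newest requested anywhere; older ones are evicted. */
    static Map<String, String> resolve(List<Map.Entry<String, String>> requests) {
        Map<String, String> chosen = new TreeMap<>();
        for (Map.Entry<String, String> request : requests) {
            chosen.merge(request.getKey(), request.getValue(),
                    (kept, candidate) -> compare(kept, candidate) >= 0 ? kept : candidate);
        }
        return chosen;
    }

    public static void main(String[] args) {
        // The example from the text: two libraries asking for different commons-logging versions.
        System.out.println(resolve(List.of(
                Map.entry("commons-logging", "1.0"),
                Map.entry("commons-logging", "1.1"))));
    }
}
```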

Sharing Projects and Modules

Ultimately, the main reason I was looking to move away from jar files in source control was to make it easy to share modules between different projects. Right now, that would basically have to be done by either forking the code, occasionally building a jar and dropping it in or using svn:externals – none of which are particularly appealing. With Ivy, the buildlist can handle any submodule being dumped into the project directory and you can pull a version of the module from the repository regardless of whether or not you have the source code checked out for it. There are now two modes you can use sub-modules in:

  1. Just declare it as a dependency and Ivy will grab the version from the repository. Just like including a jar in source control but a little easier.
  2. Declare it as a dependency and check out a copy of the source as a submodule of the project you’re currently working on. This then gets picked up by the Ivy buildlist task and automatically inserted in the right place in the build process. Now you can easily make changes to the module and use them in your outer project without doing a full release. You only need to push a new version into the shared repository when you’ve got the bugfix or new API completely sorted out and ready for others to use. In the meantime, everything still builds with a single execution of ant.

Speed

The project I tested this out on has a lot of dependencies – it took ages to import them all into the private repository, many required manually fixing up metadata problems as well. With the jar files in subversion, the build server could check out the full source (including libraries) and build the project in 3-4 minutes (the build server and the subversion repository have a gigabit connection between them). The first build it did with Ivy took the build time up to about 25 minutes as it downloaded all the dependencies (the Ivy repository was in the US, the build server was in Australia), but because Ivy keeps a local cache of everything it downloaded the second build went back down to the usual 3-4 minutes.

Ivy can be a bit slow at calculating dependencies, though I probably wouldn’t have noticed if the project didn’t have so many submodules, each of which requires Ivy to calculate the dependencies before building it. In the grand scheme of things, though, the time Ivy spends working out dependencies is insignificant compared to whatever the rest of the build is doing. Since the jar files are just dropped into a normal directory, it’s also easy to add a flag to skip Ivy completely and use the existing dependencies if you have small tasks that are run really frequently. With a remote Subversion server, you can waste far more time updating, moving or deleting jar files.

Summary

I think Ivy is a pretty clear winner. It’s simple to set up a private repository and avoid all the common problems people hit with the Maven repository, and what complexity Ivy adds can be isolated to the template build scripts, so individual projects stay quite simple. With buildlist, it’s also now easy to share modules between projects, which had been causing me a lot of headaches.

The downside: there’s a reasonable amount of learning that the team will have to take on and at first glance Ivy looks like it adds more complexity than benefit so getting buy-in isn’t a guarantee.

1 – once you “resolve” the dependencies you can simply look in the appropriate directory and see all the jar files for that configuration in one place

2 – I also put the Ivy jar files in Subversion so that they’re available and don’t need to be installed separately. So the requirements to build are just a JDK, ant and a checked-out source tree.

3 – and if they don’t you’re in a lot of trouble as the project that wanted the newer version is unlikely to work with an older version anyway.

Ant, Subant and Basedir

January 25th, 2010

Here’s an important lesson for people combining ant scripts – the way basedir is calculated is very unlikely to be what you expect. In particular, if you combine the <ant> task with the <subant> task you’re probably in for a surprise.

I learnt this important life lesson when the improved build scripts I’d been working on failed on the build server even though they worked perfectly on my machine. The difference is that the build server runs CruiseControl, which has a wrapper ant script that checks out a fresh copy of the project and then uses the <ant> task to build it. The ant task was:

<ant antfile="build.xml" target="dist" dir="projectDir" />

As far as that main antfile is concerned, everything is perfect: the basedir is the directory its build.xml is in and all is good with the world. However, if that build.xml happens to use subant, the basedir will not be changed for the sub-builds. Because the dir attribute set basedir explicitly, it is now a user-configured property rather than a calculated value, so it doesn’t get recalculated. If you instead use:

<ant antfile="projectDir/build.xml" target="dist" />

It all works out. The main build.xml still gets projectDir as its basedir, but when it uses subant, the basedir is automatically changed to the directory containing each build file that subant points to.

The behavior is explained in this bug report which is closed as WONTFIX for backwards compatibility. Thankfully ant 1.8 adds a useNativeBasedir attribute which provides much more predictable basedir behavior for the ant task.