Ant SCP/SSH Task Hangs Or Never Disconnects

October 23rd, 2007

If you're using the scp or ssh tasks with Ant, you may run into a problem where the scp task hangs partway through the upload, or the ssh task never disconnects after its command completes. There are a couple of possible causes:

  1. The scp problem is almost certainly caused by using Ant 1.7.0 or below with jsch 0.1.30 or above. You could upgrade to the latest nightly of Ant [1], but it's probably easier to just drop back to jsch 0.1.29, which is what Ant was developed against and works nicely. Bug 41090 contains the gory details.
  2. If the command you're executing with the ssh task starts a background service or otherwise leaves a process running, that may be the cause of the problem. You can add 'shopt -s huponexit' to your /etc/profile, .bashrc or somewhere like that. I must admit, I'm somewhat vague on the exact details of what that does, but the basic idea seems to be that bash signals any background processes that it is exiting and then doesn't wait for them to complete (which allows your ssh connection to close). If you're starting a server, it will probably ignore the HUP signal bash sends; if it doesn't, start it with the nohup command.
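
The second cause can also be tackled from the launching side. A common reason an ssh session stays open is that the background process still holds the session's standard streams. Below is a minimal sketch of that idea in Python rather than bash; it is a swapped-in technique, not what this post actually did. The helper name launch_detached is hypothetical, and the sketch assumes a POSIX system.

```python
import subprocess


def launch_detached(command):
    """Start command in its own session with no streams tied to the caller."""
    return subprocess.Popen(
        command,
        stdin=subprocess.DEVNULL,   # nothing left reading from the ssh channel
        stdout=subprocess.DEVNULL,  # discard output (a log file would also work)
        stderr=subprocess.DEVNULL,
        start_new_session=True,     # child calls setsid(); POSIX only
    )


# Hypothetical usage, with a made-up script path:
#   launch_detached(["/opt/myservice/bin/start.sh"])
```

The call returns as soon as the child has started, so the command run by the ssh task can exit without waiting for the service.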

Hopefully that will be the last I'll see of that issue.

[1] Which seems to mean compiling from source at the moment, since the nightly build directory the Ant website links to is empty.

Tomcat Startup Issues

August 21st, 2007

I was so close to having everything working… EC2, S3, automatically pulling down the latest build and deploying it, Tomcat 5.5 with the native APR libraries, SSL support and using iptables to forward ports 80 and 443 directly over to Tomcat. Everything ready to go. Except Tomcat isn't so keen on starting.

It usually starts, though it can take over half an hour to do so and on a couple of occasions it's just flat out sat there and done nothing for multiple hours on end. At startup it outputs the log message:

Aug 20, 2007 3:08:56 PM org.apache.coyote.http11.Http11AprProtocol init
INFO: Initializing Coyote HTTP/1.1 on http-8080

and then nothing until, 5-45 minutes later, it suddenly comes back to life, finishes starting up and works perfectly. There's no CPU usage while it's stalled; it's just sitting there waiting for something to happen (a network lookup?).

Sigh. I'm sure the world is out to get me…

Hosting on Amazon EC2

August 3rd, 2007

I've done a fair bit more investigation into using EC2 for web hosting and it seems to be something that people do with a fair bit of success. In addition to Geert, who commented on my last post and whose site rifers.org is hosted directly on EC2, there's also hanzoweb.com and www.gumiyo.com, all of which just have their DNS pointing at an EC2 instance.

I still wish Amazon had a preconfigured solution that acted as the web front end and load balancer with a static IP, but it appears that it's quite feasible to just point your DNS at EC2 and your server seems to stay put.

I've also done a bunch of development against S3 with some pretty fantastic results. With a little bit of simple caching it's actually feasible to run the server here in Australia and store the data on S3 without too much pain. Having very simple APIs is nice because it allows you to build a mock S3 quite easily to use for testing without having to jump through a lot of hoops.
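
To illustrate why a small API makes mocking easy, here is a minimal sketch in Python. It is not the code from this project: the put/get/delete interface and the class names InMemoryStore and CachingStore are assumptions made for the example. The first class stands in for S3 in tests; the second is a simple read-through cache that could sit in front of either the mock or a real client with the same three methods.

```python
class InMemoryStore:
    """Stand-in for S3 in tests: bytes stored under string keys."""

    def __init__(self):
        self._objects = {}
        self.get_calls = 0  # lets a test count round trips to the "remote" store

    def put(self, key, data):
        self._objects[key] = bytes(data)

    def get(self, key):
        self.get_calls += 1
        return self._objects[key]  # raises KeyError for a missing key

    def delete(self, key):
        self._objects.pop(key, None)


class CachingStore:
    """Read-through cache in front of any store offering put/get/delete."""

    def __init__(self, backend):
        self._backend = backend
        self._cache = {}

    def put(self, key, data):
        self._backend.put(key, data)
        self._cache[key] = bytes(data)

    def get(self, key):
        if key not in self._cache:
            self._cache[key] = self._backend.get(key)  # only misses hit the backend
        return self._cache[key]

    def delete(self, key):
        self._backend.delete(key)
        self._cache.pop(key, None)
```

This sketch never expires cache entries, so it only suits the case where a single server does all the writing; anything beyond that would need an invalidation policy.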

Overall, I'm very impressed - building a web app entirely on S3 is not only feasible, it's fairly simple and can actually speed up development. There are even a couple of personal projects of mine that S3 may be a good fit for.

Solr Search Index Backups?

July 25th, 2007

If you have a massive set of documents that you're using Solr to search (let's say a few million HTML pages) how much should you worry about losing the search index?

It is of course always possible to reindex the original documents, but that would take a fair while, so should you keep a backup of the search index? If you restored the backup, how would you identify which documents needed updating?
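
One possible answer to that last question, sketched in Python under a big assumption: that you record when each backup was taken and can get a last-modified time for every source document. Nothing here is a Solr API; documents_to_reindex and its arguments are hypothetical names for the example.

```python
from datetime import datetime, timezone


def documents_to_reindex(modified_times, backup_taken_at):
    """Return ids of documents changed at or after the backup was taken.

    modified_times maps a document id to its last-modified datetime.
    """
    return sorted(
        doc_id
        for doc_id, modified_at in modified_times.items()
        if modified_at >= backup_taken_at
    )


# Example with made-up documents and dates.
backup_taken_at = datetime(2007, 7, 20, tzinfo=timezone.utc)
modified_times = {
    "a.html": datetime(2007, 7, 1, tzinfo=timezone.utc),
    "b.html": datetime(2007, 7, 22, tzinfo=timezone.utc),
}
print(documents_to_reindex(modified_times, backup_taken_at))
```

This only catches additions and edits. Documents deleted since the backup would still be in the restored index, so they would need separate handling.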

Solr seems to support replication - should you just use that as a constant backup that you can swap over to using if something goes wrong?

So many questions…

And it looks like there are answers out there somewhere - Solr has a bunch of tools related to backups and stuff. Looks promising: though the documentation doesn't really say what the best practices are, at least there are tools that look like they help you manage the search indexes.

Amazon EC2 As A Webhost

July 24th, 2007

We need to move our company wiki and JIRA instance to a server with more RAM and CPU to spare as they're pretty slow on the current overloaded virtual server, so we've been looking at a few different options. One that came up was using Amazon's EC2 and S3 services. We knew straight off that we didn't need the scalability they offered but getting some experience using them could be beneficial and we really didn't know anything about what they actually offered so it was worth a quick look.

Those familiar with EC2 won't be surprised to hear that we won't be going with the service for three reasons:

  1. It's at least as expensive as the dedicated server we'd need.
  2. The filesystem gets reset every time the server reboots (S3 provides a REST API to store and retrieve data, not a filesystem).
  3. The server gets a new IP address every time it reboots.

The cost only applies to us because we don't need scalability - our needs are really quite consistent, so EC2 wouldn't save us from purchasing large amounts of redundant hardware. We also have the ability to just pay a hosting company to set up one dedicated server for us instead of setting up our own server farm. If you were offering software as a service, however, Amazon's offering is likely to save you a lot of money.

The filesystem resetting is a challenge for deploying most existing web applications, but not for software designed to run with S3. For instance, it's pretty easy to imagine a wiki implementation that uses S3 as its "database" for storing data directly (probably with some local caching etc). Wikis are somewhat ideal for this because search is about the only query you perform on the data - otherwise you just retrieve pages by name, which S3 is perfectly suited for. The fact that so many wikis use flat files instead of databases is an indication that it'd work pretty well. There would be a few hurdles but nothing insurmountable.
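
A rough sketch of what such a wiki's storage layer might look like, in Python. It assumes one object per page, keyed by page name, and uses a plain dict in place of the S3 bucket; WikiStore and its methods are hypothetical names, not an existing wiki's API.

```python
class WikiStore:
    """Wiki pages kept as one object per page name in a key-value store."""

    PREFIX = "pages/"

    def __init__(self, objects=None):
        # A plain dict stands in for the S3 bucket in this sketch.
        self._objects = {} if objects is None else objects

    def save(self, page_name, text):
        self._objects[self.PREFIX + page_name] = text.encode("utf-8")

    def load(self, page_name):
        # Retrieval by name is a single key lookup.
        return self._objects[self.PREFIX + page_name].decode("utf-8")

    def search(self, term):
        """Brute-force search: scan every page for the term, ignoring case."""
        needle = term.lower()
        return sorted(
            key[len(self.PREFIX):]
            for key, data in self._objects.items()
            if key.startswith(self.PREFIX)
            and needle in data.decode("utf-8").lower()
        )
```

Loading a page maps directly onto fetching one key, while search has to read every page. That scan is the part a real implementation would want to replace with a separate search index.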

The dynamic IP however is a real pain. There are examples of using dynamic DNS to work around it but the lag in DNS updates seems like a problem to me. The better solution would of course be to have a load balancer in front of your EC2 instances - the load balancer would have a static IP address and the EC2 instances would just register with it when they start up. Unfortunately this means you have to have a server outside of EC2 to do the load balancing which means another hosting provider to work with and it just seems odd to have the load balancing server in a different data center to the rest of the servers. If Amazon added an option to build an EC2 machine that could only ever have one instance but had a guaranteed IP address it would be the perfect solution.

It's certainly something interesting to play with - I'll have to chase up a corporate credit card and see if I can get access to do some experimentation some time.