HttpClient -= Me

October 20th, 2010

I was quite pleased to see Odi’s post this morning stating that the NTLM code in HttpClient (now HttpComponents) has finally been replaced with a more robust and compatible version that supports the more modern NTLM variants. Many years ago that NTLM code was my first contribution to someone else’s open source project, and it led to my writing a bunch of documentation and becoming an Apache committer. In turn, that’s put me in touch with a whole heap of incredibly smart people.

Sadly, the HttpClient 2.0 build proved so stable and reliable that I really didn’t have any more itches to scratch around the project, and since the APIs changed quite significantly over the next couple of releases, we still haven’t upgraded. With the new NTLM code now landing and the huge improvements already made everywhere else, I suspect there’s no longer any trace of my ever contributing to the project.

So a big congratulations to Odi, Oleg and the rest of the team that are still pushing HttpComponents forward and doing a great job of it.

Communities and Git

May 5th, 2009

The conversation that has sprung up around how the use of distributed version control, Github in particular, affects community is a refreshing change in the blogosphere1. It’s people collectively thinking things through rather than just reacting in uproar or following the latest meme.

The latest installment is from Ben Hyde, git: Balene for Knowledge2. It’s definitely worth reading in its entirety, but let me pull out a couple of key points:

[Source control] provides a locus around which the work can rendezvous. It a campfire, and the community gathers around it. I accept that. But let’s be clear, it isn’t the only campfire. Other examples: the irc channel, the email, the bug tracking, the release process, the distribution system, the license, the social networks, etc. etc.

and:

The thing that blew me away about git was that it helps to address this problem. It increases the probability that users will reveal their knowledge. It helps to create a cultural norm toward that behavior. To me this is far more important than the risk that their forking would quench the community campfire.

The thing that really strikes me about these two comments is that GitHub doesn’t just give a fork a git branch to work on; it also provides a wiki and an issue tracker. It does keep track of the fork network and provides tools to help feed changes back, and they work well, but it unconditionally forks more than just the source control.

So perhaps one improvement GitHub could make is to let a project be forked without creating the wiki, issue tracker etc. Not only would that make it clearer that this was just an experimental or temporary fork rather than an alternative, long-term project, it would further lower the barrier to people getting involved. Right now, creating a fork means creating a whole open source project, and it feels like you should be managing it as a project, which feels like hard work. If you only forked the source control, it would just be a place to code out in the open, so you can get what you need from the project while also making your changes easily available to everyone else.

1 – wow that really feels like a blast from the past…

2 – in response to J Aaron Farr’s A Community of Rockstars which amusingly is part of the latest uproar.

I Love mod_proxy

November 1st, 2008

After my amazingly successful use of mod_proxy to provide clean URLs in an IWWCM instance, mod_proxy has earned a place in my bag of useful tricks. Once you realize you can proxy differently depending on the current virtual host, it becomes a very powerful solution.

My latest use for it was to add name-based virtual host support to two completely separate virtual machines. One machine runs IBM WCM and the other runs Quickr. Both use the same port, and in the future there will be more VMs running different versions as well, so while it would be possible to assign different port numbers, I’d prefer not to have to remember which VM is using which port. The firewall, however, can only forward connections on a given port to one VM.

The solution, then, is to forward the traffic to Apache and configure mod_proxy within name-based virtual hosts. Effectively, Apache acts as an intelligent HTTP router, and I can add any number of VMs by adding a new CNAME and virtual host entry.
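The actual setup uses Apache name-based virtual hosts with mod_proxy directives, and that configuration isn't reproduced here. As a rough picture of the routing decision Apache makes on each request, here is a toy Java model: normalize the request's Host header, look it up in a table of virtual host names, and forward to the matching VM. The class name, host names and backend URLs are all hypothetical; this is a sketch of the idea, not a substitute for mod_proxy.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Toy model of name-based virtual hosting: pick a backend VM from the Host header. */
public final class HostRouter {
    // Virtual host name (lower case) -> base URL of the VM that serves it.
    private final Map<String, String> backends;

    public HostRouter(Map<String, String> backends) {
        Map<String, String> normalized = new HashMap<>();
        backends.forEach((host, url) -> normalized.put(host.toLowerCase(Locale.ROOT), url));
        this.backends = Map.copyOf(normalized);
    }

    /** Returns the URL to forward to, or empty if no virtual host matches. */
    public Optional<String> route(String hostHeader, String path) {
        if (hostHeader == null) {
            return Optional.empty();
        }
        // Host names are case-insensitive and the header may carry a port
        // ("wcm.example.com:80"), so strip it. IPv6 literals are ignored for brevity.
        String host = hostHeader.toLowerCase(Locale.ROOT);
        int colon = host.indexOf(':');
        if (colon >= 0) {
            host = host.substring(0, colon);
        }
        return Optional.ofNullable(backends.get(host)).map(base -> base + path);
    }

    public static void main(String[] args) {
        // Hypothetical host names and VM addresses; both VMs listen on the same port.
        HostRouter router = new HostRouter(Map.of(
                "wcm.example.com", "http://vm-wcm:9080",
                "quickr.example.com", "http://vm-quickr:9080"));
        System.out.println(router.route("quickr.example.com", "/quickr"));
    }
}
```

Adding another VM is one more map entry, which mirrors adding a CNAME and a virtual host block in Apache.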

Ant SCP/SSH Task Hangs Or Never Disconnects

October 23rd, 2007

If you're using the scp or ssh tasks with Ant, you may run into a problem where the scp task hangs part way through an upload, or the ssh task never disconnects after its command completes. There are a couple of possible causes:

  1. The scp hang is almost certainly caused by combining Ant 1.7.0 or below with JSch 0.1.30 or above. You could upgrade to the latest nightly build of Ant1, but it's probably easier to drop back to JSch 0.1.29, which is what Ant was developed against and which works nicely. Bug 41090 contains the gory details.
  2. If the command you're executing with the ssh task starts a background service or otherwise leaves a process running, that may be why the connection never closes. You can add 'shopt -s huponexit' to your /etc/profile, .bashrc or somewhere similar. I must admit I'm somewhat vague on the exact details of what it does, but the basic idea seems to be that bash signals any background processes that it is exiting and then doesn't wait for them to complete (which allows your ssh connection to close). If you're starting a server, it will probably ignore the HUP signal bash sends; if it doesn't, start it with the nohup command.

Hopefully that will be the last I'll see of that issue.

1 – which seems to mean compiling from source at the moment, since the nightly build directory the Ant website links to is empty

Solr Is Cool

August 17th, 2007

I've struggled with Lucene before and failed to configure it properly, resulting in absolutely horrendous search results, so a recent need to implement search functionality wasn't something I particularly wanted to take on. In fact, I was prepared to do almost anything to avoid delving back into Lucene filters and parsers and tokenizers and "stuff". That tends to be problematic given that Lucene is the dominant search library – so popular in fact that it's been ported to other languages.

So I took a look at Solr – a web services front end to Lucene. Exchanging Lucene APIs for HTTP requests seems like a good tradeoff for me and Solr comes with a pretty decent configuration for Lucene right out of the box.

As it turns out, Solr's default configuration isn't just pretty decent, it's also surprisingly well commented. Combined with some reasonable documentation, it was pretty straightforward to get Solr to do what I want and provide good search results without much effort. With a bit more effort I should be able to get search highlighting working as well, which takes search results to a whole new level.

Two things really made me appreciate the choice to use Solr:

  1. It has a DirectSolrConnection class that removes the need for actual HTTP requests. As a bonus, it still uses the HTTP URLs and returns the same responses, so if you later need to split Solr out onto its own server, you only have to add the HTTP layer; the code that processes the results doesn't change.
  2. It caches searches automatically.
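The post doesn't show how DirectSolrConnection is called, so rather than guess at its API, here is a plain-Java sketch of the design idea in point 1: calling code talks to a hypothetical SearchTransport interface that takes a Solr-style path plus query string and returns the raw response. An in-process implementation (here just a function standing in for DirectSolrConnection) and an HTTP one are interchangeable. /select, q and rows are standard Solr request parameters; every class name and the base URL are assumptions for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.function.Function;

public final class SolrTransports {

    /** Hypothetical abstraction: takes a Solr-style path plus query string, returns the raw response. */
    public interface SearchTransport {
        String request(String pathAndParams);
    }

    /** For when Solr runs on its own server; baseUrl is an assumption, e.g. "http://search-host:8983/solr". */
    public static final class HttpTransport implements SearchTransport {
        private final String baseUrl;

        public HttpTransport(String baseUrl) {
            this.baseUrl = baseUrl;
        }

        @Override
        public String request(String pathAndParams) {
            try {
                HttpURLConnection conn =
                        (HttpURLConnection) URI.create(baseUrl + pathAndParams).toURL().openConnection();
                try (InputStream in = conn.getInputStream()) {
                    return new String(in.readAllBytes(), StandardCharsets.UTF_8);
                } finally {
                    conn.disconnect();
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** In-process stand-in: the handler plays the role DirectSolrConnection plays in the post. */
    public static final class InProcessTransport implements SearchTransport {
        private final Function<String, String> handler;

        public InProcessTransport(Function<String, String> handler) {
            this.handler = handler;
        }

        @Override
        public String request(String pathAndParams) {
            return handler.apply(pathAndParams);
        }
    }

    /** Calling code depends only on SearchTransport, so switching transports doesn't touch it. */
    public static String relatedLinks(SearchTransport transport, String query) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return transport.request("/select?q=" + q + "&rows=5");
    }

    public static void main(String[] args) {
        // Echo handler for demonstration: shows exactly which request the caller issues.
        SearchTransport local = new InProcessTransport(pathAndParams -> pathAndParams);
        System.out.println(relatedLinks(local, "solr cool"));
    }
}
```

Moving Solr to its own server would then mean passing an HttpTransport instead of the in-process one, with relatedLinks and everything that consumes its output left alone.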

Caching is just so cool to see in action. With the built-in search from Jackrabbit (which also uses Lucene), it was too slow to include the output of a search on every page (think related links, etc.). With Solr's caching this isn't a problem anymore.
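Solr's caches are configured in solrconfig.xml and are considerably more involved than this, but the core reason repeated searches become cheap can be sketched as a small least-recently-used cache keyed on the query string. This is a generic illustration, not Solr's implementation; the class name and the search function it wraps are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Minimal LRU cache of query -> result (not thread-safe; illustration only). */
public final class QueryCache<R> {
    private final Map<String, R> entries;
    private final Function<String, R> search; // the expensive search being cached
    private int misses;

    public QueryCache(int maxEntries, Function<String, R> search) {
        this.search = search;
        // accessOrder = true keeps entries ordered from least to most recently used,
        // so the "eldest" entry is the one to evict once the cache is full.
        this.entries = new LinkedHashMap<String, R>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, R> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached result, running the real search only on a miss. */
    public R get(String query) {
        R cached = entries.get(query);
        if (cached != null) {
            return cached;
        }
        misses++;
        R result = search.apply(query);
        entries.put(query, result);
        return result;
    }

    public int misses() {
        return misses;
    }

    public static void main(String[] args) {
        QueryCache<String> cache = new QueryCache<>(100, q -> "results for " + q); // stand-in for a slow search
        cache.get("solr");
        cache.get("solr"); // served from the cache
        System.out.println("misses: " + cache.misses());
    }
}
```

This sketch never invalidates entries. Solr's caches belong to a searcher and are replaced when a commit opens a new one, which is what keeps cached results in step with the index.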

There's still a bunch of learning to do before I get really optimal results. Getting searches to work properly over multiple fields, so that I can weight the results based on which field matched, would be good; I can see Solr can do it, I'm just not quite sure how to make it all happen yet. Still, the current search is way better than anything I've managed to do before. Thanks to the Solr and Lucene teams!
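One way Solr supports weighting by field is the DisMax query parser, which searches several fields at once and takes per-field boosts in its qf parameter (for example title^3.0 body^1.0). The sketch below only builds such a request string. The field names title and body are hypothetical, and defType=dismax assumes a Solr version where the query parser is selected that way; older releases configured DisMax as a request handler instead.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public final class WeightedSearch {

    /**
     * Builds a Solr select request that searches several fields with the DisMax
     * query parser, boosting each field by the given weight (qf=field^boost ...).
     */
    public static String weightedQuery(String query, Map<String, Double> fieldBoosts) {
        String qf = fieldBoosts.entrySet().stream()
                .map(e -> e.getKey() + "^" + e.getValue())
                .collect(Collectors.joining(" "));
        return "/select?defType=dismax"
                + "&q=" + URLEncoder.encode(query, StandardCharsets.UTF_8)
                + "&qf=" + URLEncoder.encode(qf, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Hypothetical schema fields: title matches are boosted 3x relative to body matches.
        Map<String, Double> boosts = new LinkedHashMap<>();
        boosts.put("title", 3.0);
        boosts.put("body", 1.0);
        System.out.println(weightedQuery("related links", boosts));
    }
}
```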