Living in a state of accord.

Solr Is Cool

I've struggled with Lucene before and failed to configure it properly, resulting in absolutely horrendous search results, so a recent need to implement search functionality wasn't something I particularly wanted to take on. In fact, I was prepared to do almost anything to avoid delving back into Lucene filters and parsers and tokenizers and "stuff". This tends to be problematic given that Lucene is the dominant search library, so popular in fact that it's been ported to other languages.

So I took a look at Solr – a web services front end to Lucene. Exchanging Lucene APIs for HTTP requests seems like a good tradeoff for me and Solr comes with a pretty decent configuration for Lucene right out of the box.

As it turns out, Solr's default configuration isn't just pretty decent, it's also surprisingly well commented. Combined with some reasonable documentation, it was pretty straightforward to get Solr to do what I want and provide good search results without much effort. With a bit more effort I should be able to get search highlighting working as well, which takes search results to a whole new level.

Two things really made me appreciate the choice to use Solr:

  1. It has a DirectSolrConnection class that removes the need for actual HTTP requests. As a bonus, it still uses the HTTP URLs and returns the same responses, so if you later need to split Solr out onto its own server you just have to implement the HTTP stuff and not change the rest of your processing.
  2. It caches searches automatically.
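
To make the first point concrete, here is a rough sketch of why keeping the URL-style request matters. It is written in Python purely for illustration and is not Solr's API: `select_path`, `HttpTransport`, `InProcessTransport`, and the `fake_solr` handler are all made-up names, and the base URL in the comment is an assumption. The only idea taken from the post is that the in-process connection and a real HTTP server accept the same request and return the same kind of response.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def select_path(query, rows=10):
    """Build the path-and-parameters string shared by both transports."""
    return "/select?" + urlencode({"q": query, "rows": rows})


class HttpTransport:
    """Sends the request to a standalone search server over HTTP."""

    def __init__(self, base_url):
        # e.g. "http://localhost:8983/solr" (assumed, not from the post)
        self.base_url = base_url.rstrip("/")

    def request(self, path_and_params):
        with urlopen(self.base_url + path_and_params) as response:
            return response.read().decode("utf-8")


class InProcessTransport:
    """Stand-in for an embedded connection: same input, no network."""

    def __init__(self, handler):
        # hypothetical callable that answers requests locally
        self.handler = handler

    def request(self, path_and_params):
        return self.handler(path_and_params)


def search(transport, query):
    # Calling code depends only on request(); it never needs to know
    # which transport is in use.
    return transport.request(select_path(query))


# Hypothetical local handler that just echoes the request it received.
fake_solr = lambda path: "handled " + path
result = search(InProcessTransport(fake_solr), "lucene")
```

Moving the search engine onto its own server then means passing an `HttpTransport` instead; `search` and everything that parses its result stay as they are.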

Caching is just so cool to see in action. Using the built-in search from Jackrabbit (which also uses Lucene), it was too slow to include the output of a search with each page (think related links, etc.). With Solr's caching this isn't a problem anymore.

There's still a bunch of learning to do before I can get really optimal results. Getting searches to work properly over multiple fields, so that I can weight the results based on which field matched, would be good. I can see Solr can do it; I'm just not quite sure how to make it all happen yet. Still, the current search is way better than anything I've managed to do before. Thanks to the Solr and Lucene teams!
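
For what it's worth, the weighting across fields looks like a job for Solr's dismax query parser, where the `qf` parameter lists the fields to search along with a boost for each. The sketch below only builds the query string; it is a guess at the shape of the request rather than something I have running. The field names `title` and `body` and the boost values are invented, and selecting the parser with `defType=dismax` is an assumption (older Solr releases select a dismax request handler with `qt=dismax` instead).

```python
from urllib.parse import urlencode


def build_dismax_params(user_query, field_boosts, rows=10):
    """Return a URL-encoded query string for a dismax-style search.

    field_boosts maps a field name to its boost,
    e.g. {"title": 3.0, "body": 1.0} (hypothetical fields).
    """
    # qf is a space-separated list of field^boost pairs.
    qf = " ".join(f"{field}^{boost}" for field, boost in field_boosts.items())
    params = {
        "q": user_query,
        "defType": "dismax",  # assumption: older releases use qt=dismax
        "qf": qf,
        "rows": rows,
    }
    return urlencode(params)


query_string = build_dismax_params(
    "client certificates", {"title": 3.0, "body": 1.0}
)
print("/select?" + query_string)
```

With these made-up boosts, a match in `title` counts for more than a match in `body`, which is the behaviour I'm after.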

Most Annoying Bug Ever

I've just spent the past three or four hours setting up Apache, Subversion, all my browsers, etc. to use SSL connections and client certificates for authentication with my Debian stable server. I'm sure the mod_ssl devs already know what's coming here and are either chuckling gleefully or ripping their hair out right now. Anyway, the joke, for all those who aren't mod_ssl devs, is that you can't get Subversion to use client certificates with a Debian stable server because Debian stable has Apache 2.0.54, complete with everybody's favorite mod_ssl bug. It's fixed in Apache 2.2, but not in 2.0.

So, back to basic authentication over SSL. Sigh, I had that working about two hours ago.

Visiting “Apache HQ”

(I wrote this Thursday night but didn’t have net access on the train to post it.) I finally managed to catch up with a number of Apache people tonight at the Thirsty Bear (rest assured the bear is not quite as thirsty after our beer drinking efforts). Afterwards we picked up the two new IBM servers that have been sitting at Collab.Net and deposited them into the cage at the colo facility. Since I’m not a server guy at all, this is one of the very few times I’ve been in a server room and the first chance I’ve had to see the Apache server setup (there’s now another colo in Europe somewhere, I believe). It’s pretty small and simple but seems to do the job quite well, which is what really matters. Apparently the Technorati servers are in the same colo, and they look a heck of a lot more impressive with a few racks full of servers and cables going everywhere. Either way, it was great to finally meet a few Apache developers and put some faces to names. My description of myself as “the tall redhead” seemed to be effective, as people walked straight in the door and introduced themselves to me. Sometimes it pays to stand out a bit, I guess.

New ASF Machines

Apparently, the ASF took delivery of a few new machines today. I just can’t shake the image of Sam Ruby sitting around the ASF head office when suddenly there’s a knock on the door and he finds a pile of orphaned servers wrapped in a blanket. Then again, I always was weird…