I've struggled with Lucene before and failed to configure it properly resulting in absolutely horrendous search results so a recent need to implement search functionality wasn't something I particularly wanted to take on. In fact, I was prepared to do almost anything to avoid delving back into Lucene filters and parsers and tokenizers and "stuff". This tends to be problematic given that Lucene is the dominate search library – so popular in fact that it's been ported to other languages.
So I took a look at Solr – a web services front end to Lucene. Exchanging Lucene APIs for HTTP requests seems like a good tradeoff for me and Solr comes with a pretty decent configuration for Lucene right out of the box.
As it turns out, Solr's default configuration isn't just pretty decent, it's also surprisingly well commented. Combined with some reasonable documentation it was pretty straight forward to get Solr to do what I want and provide good search results without much effort. With a bit more effort I should be able to get search highlighting working as well which takes search results to a whole new level.
Two things really made me appreciate the choice to use Solr:
- It has a DirectSolrConnection class that removes the need for actual HTTP requests. As a bonus, it still uses the HTTP URLs and returns the same responses so if you later need to split Solr out onto it's own server you just have to implement the HTTP stuff and not change the result of your processing.
- It caches searches automatically.
Caching is just so cool to see in action. Using the built in search from Jackrabbit (which also uses Lucene) it was too slow to include the output of a search with each page (think, related links etc). With Solr's caching this isn't a problem anymore.
There's still a bunch of learning to do so that I can get really optimal results – getting searches to work over multiple fields properly so that I can weight the results based on which field matched would be good and I can see Solr can do it – just not quite sure how to make it all happen. Still, the current search is way better than anything I've managed to do before. Thanks to the Solr and Lucene teams!