
April 9, 2007 6:53 AM

The Growth of Lucene

All of a sudden I’m seeing Lucene (and Nutch and Solr) being adopted by a bunch of my companies.  While several have been using it for a while, it’s spreading rapidly.  I’m curious about the experience folks have had with it – everyone I’ve talked to is thrilled; I’ve yet to hear from anyone who said “it didn’t work for me and I decided to use ‘X’ instead.”  Any feedback?

Posted in: Technology

COMMENTS (14)

Brad,
We have used it for a couple of projects and it works extremely well. It let us index data that we previously did not have access to. I do not know all the technical details, but from a product guy's side of things, it works well!

Will

Will, April 9, 2007 7:29 AM

I used Lucene for a bioinformatics tool platform back in 2004 and it worked very well. There was a bit of a learning curve from a developer's perspective, which I imagine has flattened after a couple of years of documentation. Overall, however, it saved a lot of time and effort.

Ian Spivey, April 9, 2007 8:17 AM

At IBM we've incorporated Lucene into our no-charge enterprise search offering: http://omnifind.ibm.yahoo.com/productinfo.php

Josh, April 9, 2007 9:59 AM

There's not really anything else you could use instead :-) The Lucene API isn't the most, shall we say, intuitive, but it gets the job done.

Chris, April 9, 2007 10:17 AM

Lucene is the de facto free-text indexing toolkit in the Java space.

On its own it only accepts text. This means you have to use adapters to extract text from files (e.g. Word documents). The Jakarta POI project is typically recommended (for the most common MS document formats), but it doesn't seem very active these days, so I would have concerns about how it would handle Office 2007 (Vista-era) documents, for example. The commercial offerings have good document-format support, but they are not friendly from a developer's point of view (their Java toolkits suck, at least the ones I'm familiar with).

Gavin, April 9, 2007 11:43 AM

Brad:
Lucene is good, if I may say so. It's been around for nearly a decade. One of the companies you invested in (and that I worked at) circa 2000, Neomeo, used Lucene. I make *heavy* use of Lucene in Simpy.

Josh:
Not really true: you could use a number of similar (free and/or open-source) tools. Have a look at http://www.simpy.com/user/otis/search/retrieval for some of them. You will also find a few alternatives mentioned in my book, Lucene in Action (see http://lucenebook.com/).

Regarding the API not being friendly, please provide more feedback. I'm one of the Lucene developers, so I'm curious about what isn't intuitive.

Otis Gospodnetic, April 9, 2007 3:56 PM

Another one for Josh and anyone else interested in Lucene and its alternatives:
http://searchcafe.blogspot.com/2007/03/open-source-search-engines-in-java-and.html

Otis Gospodnetic, April 10, 2007 12:48 AM

You might also be interested in http://www.opensymphony.com/compass/

This sits on top of Lucene and ORM frameworks like Hibernate. Basically, it gives you Lucene search over your database: when you search, it returns objects through Hibernate, etc. And when you update the domain model through Hibernate, it updates the search index.

If Lucene is like JDBC, then Compass is like Hibernate for search.

Peter Delahunty, April 10, 2007 2:00 AM


You might also want to look at some of the 'newer' technologies coming out of the same space:

Hadoop (http://lucene.apache.org/hadoop): a framework for running applications on large clusters of hardware

UIMA (http://incubator.apache.org/uima/): a framework for analyzing large amounts of unstructured data

Ian Holsman, April 10, 2007 6:04 PM

Lucene is a fantastic project; it is the core search technology within our SaaS web content management platform. We have deployed Lucene as stand-alone "search servers", similar to Solr but built long before Solr was available. This provides a service-based approach for both in-application search and published-website search, and it is excellent for both.

This replaced a database-based full-text search, providing much better performance, scalability, customization and flexibility.

Jeff Freund, April 11, 2007 9:27 AM

Even a mid-level Java developer can do some great things with Lucene. In the hands of a more advanced Java developer, it can deliver just about anything you could need from an indexer.

We love its flexibility.

mnorusis, April 12, 2007 7:38 AM

We have used both Lucene.Net and Solr, mixing and matching the technologies as we needed them. We're eventually migrating everything to Solr, as it keeps us very flexible and lets us take advantage of a bigger community on the Java/Linux side of Lucene than the one currently found on the .NET/Windows side.

Jeff Rodenburg, April 13, 2007 10:31 PM

SearchBlox is an enterprise search product that uses Lucene. SearchBlox provides out-of-the-box search capabilities. http://www.searchblox.com/

Robert Selvaraj, April 16, 2007 5:09 AM

We just rebuilt our search backend for the LouderVoice launch using Lucene + PyLucene. So far it's working out very well for review search. The language stemming and multilingual capabilities help us enormously.

Conor O'Neill, May 4, 2007 6:47 AM
