beagle - GNOME, Mono
http://kat.mandriva.com/ - KDE
http://www.recoll.org - Xapian, Qt
http://pinot.berlios.de/ - GTKmm, Xapian, affiliated with BerliOS?
pyindex - Python
Xapian - Implemented in C++, but with many other language bindings
lucene/pylucene - Written in Java, but there's a gcj+swig means of getting to it from Python. Affiliated with OSAF. Recent versions of swig don't work for this - 2006-01-22.

From: http://www.let.rug.nl/~gosse/Imix/clin04_tiedemann.pdf

Amberfish: http://www.etymon.com/tr.html
GPL, C/C++, plain text, semi-structured/XML (with nested fields), wild-card search, phrase search, boolean queries, relevance ranking
DRS: very easy build.
find ~/Maildir/ -type f -name '[0-9][0-9][0-9]*' -print | count -b | af -d ~/amberfish/all-mail -iCF
Started out really fast (70+ files/second), but between 30,000 and 40,000 files, it slowed way down.

Lucene: http://jakarta.apache.org/lucene/docs/index.html
Apache License, Java, plain/semi-structured documents, snowball stemmers, phrase search, boolean queries, relevance ranking

Managing Gigabytes (MG): http://www.cs.mu.oz.au/mg/
GPL, C, csh, plain text, images, boolean or ranked queries

Swish-e: http://swish-e.org/
GPL, C, plain/semi-structured documents, snowball stemmers, wild-card search, phrase search, fuzzy search (soundex, metaphone), flexible configuration (input/output, tokenisation etc), boolean queries, relevance ranking, Perl bindings
DRS: swish-e 2.4.3 made short work of indexing my mail archive, but:
1) It gave a number of seemingly-spurious I/O errors
2) It couldn't find words that I'm certain it should've been able to find

Xapian: http://www.xapian.org/
GPL, C++, plain text, snowball stemmers, phrase search, proximity search, relevance feedback, wide range of boolean operators, relevance ranking, Perl/SWIG bindings

Zebra: http://www.indexdata.dk/zebra/
GPL, C, structured (XML), phrase search, boolean queries, relevance ranking, wild-card search, Z39.50 protocol, client-server implementation

Zettair: http://www.seg.rmit.edu.au/zettair/
BSD-style license, C, plain, semi-structured (TREC), phrase search, boolean queries, relevance ranking, summary function
DRS: Seems to be unable to index plain text as such; the best you can do is feed it plain text and hope it won't conflict with HTML parsing conventions. Indexes rapidly. It hated dangling symlinks, and also errored out on a file that didn't exist - presumably it existed when find listed it, but something renamed it before it was indexed. This was with zettair 0.6.1.
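
The two Zettair failures noted above (dangling symlinks, and a file that vanished after find listed it) could probably be reduced by re-checking each path just before it is handed to the indexer. Below is a minimal sketch in Python, assuming the file list arrives as an iterable of path strings. `indexable_files` is a hypothetical helper name, not part of zettair or any other tool listed here. It only narrows the race window: a file can still disappear between this check and the indexer opening it.

```python
import os
import stat
from typing import Iterable, Iterator


def indexable_files(paths: Iterable[str]) -> Iterator[str]:
    """Yield only the paths that resolve to a regular file right now.

    Hypothetical helper for illustration; not part of any indexer above.
    """
    for path in paths:
        try:
            # os.stat follows symlinks, so a dangling symlink raises
            # FileNotFoundError here, the same as a file that has
            # been renamed or deleted since it was listed.
            st = os.stat(path)
        except OSError:
            continue
        # Skip directories, sockets, FIFOs and so on; a symlink that
        # points at a regular file is kept.
        if stat.S_ISREG(st.st_mode):
            yield path
```

One way to use it would be to read the output of find line by line, pass the stripped lines through `indexable_files`, and write the survivors to the indexer's file list.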