Note: This web page was automatically created from a PalmOS "pedit32" memo.
Open Source desktop search applications
beagle - gnome, mono
http://kat.mandriva.com/ - KDE
http://www.recoll.org - Xapian, Qt
http://pinot.berlios.de/ - GTKmm, Xapian; affiliated with BerliOS?
pyindex - python
Xapian - Implemented in C++, but with many other language bindings
lucene/pylucene - written in Java, but there's a gcj+SWIG means of getting
to it from Python. Affiliated with OSAF. Recent versions of SWIG don't
work for this (as of 2006-01-22).
From:
http://www.let.rug.nl/~gosse/Imix/clin04_tiedemann.pdf
Amberfish: http://www.etymon.com/tr.html
GPL, C/C++, plain text, semi-structured/XML (with nested fields),
wild-card search, phrase search, boolean queries, relevance ranking
DRS: very easy build.
find ~/Maildir/ -type f -name '[0-9][0-9][0-9]*' -print | count -b |
af -d ~/amberfish/all-mail -iCF
Started out really fast (70+ files/second), but between 30,000 and
40,000 files, it slowed way down.
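To pin down where a slowdown like this sets in, a pass-through filter can sit between find and the indexer and log a per-batch rate. A minimal sketch, not the `count` tool used above: the function name, the batch size and the injectable clock are all assumptions for illustration, and because pipes buffer, the rate only approximates how fast the indexer is consuming paths.

```python
import time


def passthrough_with_rate(lines, out, log, every=1000, clock=time.monotonic):
    """Copy lines to out unchanged; log files/second for each batch of `every` lines.

    Hypothetical helper: `out` and `log` are any writable text streams.
    Returns the total number of lines copied.
    """
    batch_start = clock()
    total = 0
    for line in lines:
        out.write(line)
        total += 1
        if total % every == 0:
            now = clock()
            elapsed = now - batch_start
            # Guard against a zero-length interval on very fast batches.
            rate = every / elapsed if elapsed > 0 else float("inf")
            log.write(f"{total} files, {rate:.1f} files/second in last batch\n")
            batch_start = now
    return total
```

Called as `passthrough_with_rate(sys.stdin, sys.stdout, sys.stderr)` from a small script placed in the pipeline, the batch at which the rate collapses shows up on stderr while the path list still reaches the indexer untouched.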
Lucene: http://jakarta.apache.org/lucene/docs/index.html
Apache License, Java, plain/semi-structured documents, snowball
stemmers, phrase search, boolean queries, relevance ranking
Managing Gigabytes (MG): http://www.cs.mu.oz.au/mg/
GPL, C, csh, plain text, images, boolean or ranked queries
Swish-e: http://swish-e.org/
GPL, C, plain/semi-structured documents, snowball stemmers, wild-card
search, phrase search, fuzzy search (soundex, metaphone), flexible
configuration (input/output, tokenisation etc.), boolean queries,
relevance ranking, Perl bindings
DRS: swish-e 2.4.3 made short work of indexing my mail archive, but:
1) It gave a number of seemingly-spurious I/O errors
2) It couldn't find words that I'm certain it should've been able to find
Xapian: http://www.xapian.org/
GPL, C++, plain text, snowball stemmers, phrase search, proximity
search, relevance feedback, wide range of boolean operators, relevance
ranking, Perl/SWIG bindings
Zebra: http://www.indexdata.dk/zebra/
GPL, C, structured (XML), phrase search, boolean queries, relevance
ranking, wild-card search, Z39.50 protocol, client-server implementation
Zettair: http://www.seg.rmit.edu.au/zettair/
BSD-style license, C, plain, semi-structured (TREC), phrase search,
boolean queries, relevance ranking, summary function
DRS: Seems unable to index plain text as such; you can only feed it plain
text and hope it won't conflict with HTML parsing conventions. Indexes
rapidly. It hated dangling symlinks, and also errored out on a file that
didn't exist - presumably it existed when find listed it, but something
renamed it in the meantime. This was with zettair 0.6.1.
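One way to sidestep the dangling-symlink and vanished-file errors is to filter the path list just before it reaches the indexer. A minimal sketch, assuming the list arrives one path per line; the function name is hypothetical, and a file can still disappear between this check and the moment the indexer opens it, so this narrows the window rather than closing it.

```python
import os


def indexable_paths(paths):
    """Yield only the paths that currently resolve to a regular file.

    Hypothetical helper: `paths` is any iterable of path strings,
    each optionally ending in a newline (e.g. lines read from stdin).
    """
    for path in paths:
        path = path.rstrip("\n")
        # os.path.isfile follows symlinks, so a dangling symlink and a
        # path that has since been renamed or deleted both return False.
        if path and os.path.isfile(path):
            yield path
```

Printing each result of `indexable_paths(sys.stdin)` from a small script would let it slot in between find and the indexer.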