Saw a notice the other day that IBM and Yahoo! had joined up to create a free crawler and search engine - IBM OmniFind Yahoo! edition. I couldn't resist trying it out and installed it on a spare server. It was quite painless, but as always Debian isn't officially supported.
To install, grab the files at http://omnifind.ibm.yahoo.net/ [1] or fetch them using wget. I downloaded the language pack as well, as I'm a native Swedish speaker and use these strange umlaut characters.
wget http://files.omnifind.info/setuplinux_i586.bin
wget http://files.omnifind.info/dictionaryPacklinux_i586.bin
When installing, use the -silent argument. The other argument, -console, didn't work for me. So run this:
server:~# ./setuplinux_i586.bin -silent
server:~# ./dictionaryPacklinux_i586.bin -silent
The files are installed in /opt/ibm/OmniFindYahooEdition/. As no other web service was running on the server, the search service claimed port 80. Start OmniFind by running:
server:~# /opt/ibm/OmniFindYahooEdition/bin/startup.sh &
Likewise, to stop the daemon run:
server:~# /opt/ibm/OmniFindYahooEdition/bin/shutdown.sh
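If you end up starting and stopping the service a lot, the two commands above can be wrapped in a small shell function. This is only a rough sketch: the function name omnifind and the OMNIFIND_HOME variable are my own inventions, not something the installer provides, and the default path is simply the install directory mentioned above.

```shell
#!/bin/sh
# Sketch of a start/stop wrapper around the two scripts shown above.
# OMNIFIND_HOME is a hypothetical override; the default is the install
# directory from this post.
OMNIFIND_HOME=${OMNIFIND_HOME:-/opt/ibm/OmniFindYahooEdition}

omnifind() {
    case $1 in
        start)
            # The post runs startup.sh with a trailing &, so do the same here.
            "$OMNIFIND_HOME/bin/startup.sh" &
            ;;
        stop)
            "$OMNIFIND_HOME/bin/shutdown.sh"
            ;;
        restart)
            omnifind stop
            omnifind start
            ;;
        *)
            echo "usage: omnifind {start|stop|restart}" >&2
            return 2
            ;;
    esac
}
```

After sourcing the file, "omnifind start" and "omnifind stop" do the same as the two commands above, and "omnifind restart" runs them back to back.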
Once the service is up and running, verify it by pointing your web browser at the server's URL. Now shut down the service. As no documents have been indexed so far, nothing will show up when you try to search. The administrative interface runs on a different port than the search service, and by default it's port 8888 on localhost. If you need to change it, edit:
/opt/ibm/OmniFindYahooEdition/shortcuts/admin.desktop
Change the last line to the port number that is suitable for your server:
[Desktop Entry]
Encoding=UTF-8
Icon=www
Type=Link
URL=http://localhost:8765/admin
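If you'd rather not open an editor, the same change can be made with sed. The helper below is a sketch of my own (set_admin_port is a made-up name, not part of OmniFind); it assumes the URL line looks exactly like the one above and only rewrites the port number in the file, keeping a .bak copy of the original.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the port in the URL= line of admin.desktop.
# Usage: set_admin_port FILE PORT
set_admin_port() {
    file=$1
    port=$2
    # Only accept a plain decimal port number.
    case $port in
        ''|*[!0-9]*) echo "invalid port: $port" >&2; return 1 ;;
    esac
    # Keep a backup, then replace the port in the URL= line only.
    cp "$file" "$file.bak" || return 1
    sed "s|^URL=http://localhost:[0-9]*/admin|URL=http://localhost:$port/admin|" \
        "$file.bak" > "$file"
}
```

For the file above, that would be: set_admin_port /opt/ibm/OmniFindYahooEdition/shortcuts/admin.desktop 8765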
Start the service, go to the administrative site at http://<mydomain>/admin and add web sites to crawl.
The first time you enter this site you will be asked to create a new account. This is the administrative account, so set a hard-to-crack password.
IBM OmniFind is based on Lucene and has a limit of 500 000 indexed documents in a variety of formats. Seems to be a free killer app.