IBM OmniFind on Debian

Saw a notice the other day that IBM and Yahoo! had joined up to create a free crawler and search engine: IBM OmniFind Yahoo! Edition. I couldn't resist trying it out and installed it on a spare server. It was quite painless, but as always Debian isn't officially supported.

To install, grab the files at http://omnifind.ibm.yahoo.net/ or fetch them using wget. I downloaded the language pack as well, since I'm a native Swedish speaker and use those strange umlaut characters.

wget http://files.omnifind.info/setuplinux_i586.bin
wget http://files.omnifind.info/dictionaryPacklinux_i586.bin
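The two downloads can be wrapped in a small script. This is a minimal sketch: the URLs are the ones above (they may no longer resolve), and the `DRY_RUN` switch and function names are hypothetical conveniences of this script, not part of OmniFind. Files fetched with wget are not executable, so the sketch also runs `chmod +x` before the installers can be started with `./`.

```shell
#!/bin/bash
# Sketch: fetch the OmniFind installer and the dictionary (language) pack.
# The URLs are the ones from the post; they may no longer resolve.
OMNIFIND_BASE_URL="http://files.omnifind.info"

# Run a command, or just print it when DRY_RUN=1 (a switch of this sketch,
# not of OmniFind).
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

fetch_installers() {
  local file
  for file in setuplinux_i586.bin dictionaryPacklinux_i586.bin; do
    run wget -O "$file" "$OMNIFIND_BASE_URL/$file" || return 1
    run chmod +x "$file" || return 1   # wget does not set the execute bit
  done
}

# Usage:
#   DRY_RUN=1 fetch_installers   # preview the commands
#   fetch_installers             # download for real
```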

When installing, use the argument -silent; the other argument, -console, didn't work for me. So run this:

server:~# ./setuplinux_i586.bin -silent
server:~# ./dictionaryPacklinux_i586.bin -silent
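A sketch that runs both installers unattended with -silent and stops at the first failure. It assumes the two .bin files are executable and sit in the directory given as the first argument (default: the current one); the function name is hypothetical. Installing into /opt normally requires root, as in the root prompt above.

```shell
#!/bin/bash
# Sketch: run both installers unattended with -silent, as in the post.
# Stops at the first missing installer or failing run.
install_omnifind() {
  local dir="${1:-.}" bin
  for bin in setuplinux_i586.bin dictionaryPacklinux_i586.bin; do
    if [ ! -x "$dir/$bin" ]; then
      echo "missing or not executable: $dir/$bin" >&2
      return 1
    fi
    "$dir/$bin" -silent || { echo "$bin failed" >&2; return 1; }
  done
}

# Usage (as root): install_omnifind ~/downloads
```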

The files are installed in /opt/ibm/OmniFindYahooEdition/. As no other web service was running on the server, the search service claimed port 80. Start OmniFind by running:

server:~# /opt/ibm/OmniFindYahooEdition/bin/startup.sh &

Likewise, to stop the daemon run:

server:~# /opt/ibm/OmniFindYahooEdition/bin/shutdown.sh
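The start and stop commands can be wrapped in a small helper. A minimal sketch, assuming bash: `OMNIFIND_HOME` defaults to the install path above, and the names `omnifind_ctl` and `wait_for_port` are hypothetical. `wait_for_port` uses bash's `/dev/tcp` redirection to check that a port accepts connections, a shell-side alternative to checking in a browser; it does not prove that the service answering is OmniFind.

```shell
#!/bin/bash
# Sketch: start/stop wrapper around the scripts OmniFind installs, plus a
# helper that waits until a TCP port accepts connections.
OMNIFIND_HOME="${OMNIFIND_HOME:-/opt/ibm/OmniFindYahooEdition}"

omnifind_ctl() {
  case "$1" in
    start)
      # Background it like "startup.sh &"; nohup keeps it alive after logout.
      nohup "$OMNIFIND_HOME/bin/startup.sh" >/dev/null 2>&1 &
      ;;
    stop)
      "$OMNIFIND_HOME/bin/shutdown.sh"
      ;;
    *)
      echo "usage: omnifind_ctl start|stop" >&2
      return 2
      ;;
  esac
}

# Return 0 once HOST:PORT accepts a TCP connection, 1 after about TIMEOUT
# seconds of failed attempts (bash /dev/tcp; a firewalled remote port may
# block longer per attempt).
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-30}" i=0
  while [ "$i" -lt "$timeout" ]; do
    if (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  return 1
}

# Usage:
#   omnifind_ctl start && wait_for_port localhost 80 120 && echo "search is up"
#   omnifind_ctl stop
```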

Once the service is up and running, verify it by pointing your web browser at the server's URL, then shut the service down again. As no documents have been indexed yet, nothing will show up if you try a search. The administrative interface runs on a different port than the search service; by default it's port 8888 on localhost. If you need to change it, edit:

/opt/ibm/OmniFindYahooEdition/shortcuts/admin.desktop

Change the last line to a port number suitable for your server:

[Desktop Entry]
Encoding=UTF-8
Icon=www
Type=Link
URL=http://localhost:8765/admin
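Editing that last line can also be scripted with sed. A sketch: the function name is hypothetical, and it assumes the URL= line has the form shown above (http://localhost:PORT/admin). It keeps a .bak copy of the original file.

```shell
#!/bin/bash
# Sketch: rewrite the port in the URL= line of admin.desktop.
set_admin_port() {
  local file="$1" port="$2"
  case "$port" in
    ''|*[!0-9]*) echo "not a port number: $port" >&2; return 1 ;;
  esac
  # Replace ":<digits>/admin" on the URL= line only; keep a .bak copy.
  sed -i.bak -E "/^URL=/ s#:[0-9]+/admin#:${port}/admin#" "$file"
}

# Usage:
#   set_admin_port /opt/ibm/OmniFindYahooEdition/shortcuts/admin.desktop 8888
```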

Start the service again and go to the administrative site at http://<mydomain>/admin to add web sites to crawl.

The first time you visit this site you will be asked to create a new account. This is the administrative account, so set a hard-to-crack password.

IBM OmniFind is based on Lucene and is limited to 500,000 indexed documents in a variety of formats. It seems to be a free killer app.

Comments

OmniFind Yahoo Edition on Debian 4.0

Hi,

I've been trying to install OmniFind 8.4.2 or 8.4.1 on my Debian 4.0 (2.6.18-5) machine, just as you described above and following all the requirements in IBM's install guide, but the installation always ends with several error lines like the following:

./setuplinux_i586.bin: line 204: bc: command not found
./setuplinux_i586.bin: line 419: [: : integer expression expected
./setuplinux_i586.bin: line 1259: [: : integer expression expected
./setuplinux_i586.bin: line 204: bc: command not found

and so on, about 20 of them ...

Does anyone have any idea what's going wrong?
Thx, greets from Vienna, Martin

'bc' missing

The same thing happened to me: the 'bc' tool was missing. On my Ubuntu machine I typed:
>sudo apt-get install bc
Then the installation succeeded with the -silent argument, which I used because I was installing remotely via ssh.
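Since the installer is a shell script that calls bc, it can be worth checking for missing tools before running it. A sketch: the function name is hypothetical, and only bc comes from the reports above; any other commands you pass are your own guess.

```shell
#!/bin/bash
# Sketch: print every command from the argument list that is not on PATH.
# Returns 1 if anything is missing, so it can gate the installer.
missing_commands() {
  local cmd status=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "$cmd"
      status=1
    fi
  done
  return "$status"
}

# Usage (bc is the Debian/Ubuntu package name as well):
#   missing_commands bc || sudo apt-get install bc
```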

OmniFind Yahoo Edition on Debian 4.0 <update>

hi again,

The problem is my 64-bit system, which OmniFind doesn't support... OK, so I guess I'll have to wait for IBM to come up with a 64-bit version...

32-bit libraries

You could try installing the ia32-libs package first and see if that works.
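To tell whether you are in the 64-bit situation described above, `uname -m` is enough. A sketch: the function name is hypothetical, it only classifies the machine string reported by the kernel, and whether ia32-libs is sufficient for the 32-bit (i586) installer is untested here.

```shell
#!/bin/bash
# Sketch: decide whether 32-bit compatibility libraries are likely needed
# for the i586 OmniFind installer. Pass a machine string to test,
# otherwise the output of `uname -m` is used.
needs_32bit_libs() {
  local machine="${1:-$(uname -m)}"
  case "$machine" in
    x86_64|amd64) return 0 ;;   # 64-bit x86: try ia32-libs first
    i?86) return 1 ;;           # i386..i686: 32-bit already
    *) echo "unsupported architecture: $machine" >&2; return 2 ;;
  esac
}

# Usage: needs_32bit_libs && sudo apt-get install ia32-libs
```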

// John

Need for libstdc++5 on Ubuntu 8.04

When installing the service on Ubuntu 8.04 you must have libstdc++5 installed or you will get an error thrown in your face. This is most likely the case even for more recent Debian releases and flavors thereof. So run:

aptitude install libstdc++5

See this post:

http://omnifind.ibm.yahoo.net/forums/index.php/topic,1211.0.html
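To check whether the library is already present before installing anything, you can ask the dynamic linker cache with `ldconfig -p`. A sketch: the function name is hypothetical, and since ldconfig usually lives in /sbin, which may not be on a normal user's PATH, it falls back to that location.

```shell
#!/bin/bash
# Sketch: check whether a shared library is listed in the linker cache.
# `ldconfig -p` prints lines like "<tab>libname.so.N (flags) => /path".
have_shared_lib() {
  local lib="$1" ldconfig_bin
  ldconfig_bin="$(command -v ldconfig || echo /sbin/ldconfig)"
  "$ldconfig_bin" -p 2>/dev/null | grep -qF "$lib "
}

# Usage: have_shared_lib libstdc++.so.5 || sudo aptitude install libstdc++5
```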

// John

Omnifind on Ubuntu 9.10 server and Windows XP

First I installed OmniFind 8.4.2 Yahoo Edition on Ubuntu 9.10 server (32-bit). Crawling an internal wiki site works perfectly, but parsing doesn't pick up MS Word or PDF documents...

Then I installed OmniFind 8.4.2 Yahoo Edition on a Windows XP machine. Crawling the same internal wiki site works perfectly, and parsing MS Word and PDF documents is fine too.

Unfortunately I haven't been able to find a solution on the internet. I've tried installing Sun Java and making it the default on Ubuntu, to rule out Java as the cause of the error, but no success so far...

The error seen in the error log (translated into English):

IQQP2601E - the parser component couldn't be started. The parser component isn't available right now
Cause of the problem:
IQQG0143E - Error during initialisation of child processes.
IQQG0146E - Child process stopped with return code: 127.

And when you search for *.pdf or *.doc in the document status of the collection, you get the following information (translated into English):
- Crawler status 200 - Success
- Parser and index status 404 - An error occurred while processing the document.
Error information: IQQP0011W While parsing document http://wiki/images/SAP.doc the parser encountered an I/O error. The document is not indexed. Cause of the problem: IQQG0020E com.ibm.es.nuvo.util.transform.TransformIOException IQQG0020E RemoteProcessException: Caused by: IQQG0006E Caused by: IQQG0020E java.util.concurrent.ExecutionException: RemoteProcessException: An error occurred when communicating with a child process. IQQG0147E An error occurred when communicating with a child process.

I was wondering whether anyone has had the same problem on a Linux installation and, if so, found a solution. It would be really great if the Linux installation worked.

I'm sorry if this post doesn't belong here, but there was no public forum for this. All the links from IBM to Yahoo are broken...
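Return code 127 from a child process usually means a program could not be run at all: shells use 127 for "command not found", and the Linux dynamic loader also exits with 127 when a required shared library is missing. A later comment in this thread mentions having to install libstdc++.so.5, so one hedged way to hunt for the culprit is to run ldd over the installed files and list unresolved libraries. A sketch: the install path is the one from the post, and the function name is hypothetical.

```shell
#!/bin/bash
# Sketch: report shared libraries that ldd cannot resolve for ELF files
# under DIR (default: the install path from the post).
# Caution: ldd may run the file's own loader, so use it only on trusted files.
find_unresolved_libs() {
  local dir="${1:-/opt/ibm/OmniFindYahooEdition}"
  find "$dir" -type f \( -perm -u+x -o -name '*.so*' \) 2>/dev/null |
    while IFS= read -r f; do
      # ELF files start with the bytes 0x7f 'E' 'L' 'F'; skip everything else.
      [ "$(head -c 4 "$f" | tail -c 3)" = "ELF" ] || continue
      ldd "$f" 2>/dev/null | grep 'not found' |
        while read -r line; do
          printf '%s: %s\n' "$f" "$line"
        done
    done
}

# Usage: find_unresolved_libs
#   A reported line such as "libstdc++.so.5 => not found" would point at
#   the libstdc++5 package.
```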

OYE on Ubuntu 9.10

I've just installed OYE 8.4.2 on Ubuntu 9.04 Desktop, and it works just fine for indexing (local files) and searching PDF and Word files. So the issue might only affect Ubuntu 9.10 for some reason, but I don't see why at the moment.

The OYE installation file is supposed to be self-contained: all the parts needed are enclosed in the package, so you don't need to install Sun's Java package, and the setup.sh script in the bin folder sets the JAVA_HOME and CLASSPATH variables.

The documentation isn't very clear and only suggests looking in the log file for further information and doing a restart. I would be more radical and remove Sun's Java package, along with any JAVA_HOME and CLASSPATH environment variables you set up yourself, unless other services depend on them. After that I would restart the service. If that doesn't work, I would remove the installation directories and reinstall OYE with the -console switch.
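Clearing your own JAVA_HOME and CLASSPATH can be tried without editing your shell profile, by removing them only from OmniFind's environment at startup. A sketch using `env -u`: the install path is the one from the post, the function name is hypothetical, and whether this cures the parser problem is untested. It relies on the OmniFind scripts setting their own values, as described above for bin/setup.sh.

```shell
#!/bin/bash
# Sketch: start OmniFind with JAVA_HOME and CLASSPATH removed from its
# environment, so only values set by OmniFind's own scripts apply.
OMNIFIND_HOME="${OMNIFIND_HOME:-/opt/ibm/OmniFindYahooEdition}"

start_with_clean_java_env() {
  if [ -n "${JAVA_HOME:-}" ] || [ -n "${CLASSPATH:-}" ]; then
    echo "ignoring JAVA_HOME='${JAVA_HOME:-}' CLASSPATH='${CLASSPATH:-}'" >&2
  fi
  env -u JAVA_HOME -u CLASSPATH "$OMNIFIND_HOME/bin/startup.sh" "$@"
}

# Usage: start_with_clean_java_env &
```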

Good luck!
// John

I need to try your suggestion

Maybe it wasn't working correctly because I used the 9.10 SERVER version (no GUI). I'll report back on this site. Thanks so far!

Finally had time to try it (on Ubuntu 10.04 now)

After installing OmniFind Yahoo Edition on Ubuntu 10.04 32-bit desktop, everything worked as I expected.

Crawling an internal wiki site works perfectly, and parsing picks up MS Word and PDF documents. Being able to search inside attachments was a relief.

Sorry for the late reply, but I was really busy...

The only struggle I had was with libstdc++.so.5, but after following the instructions in the box at the link below, I could add it using the Synaptic Package Manager.
http://packages.ubuntu.com/en/jaunty/i386/libstdc++5/download

John, thanks for your effort.

Bundled JRE is not binary compatible with host OS/Arch.

Hi! I'm on Ubuntu Linux 10.04 too, and I'm unable to install OY!E version 8.4.2.

I downloaded the .bin file, which is now all-in-one, and I get this error:

Initializing InstallShield Wizard........
Extracting Bundled JRE.

Bundled JRE is not binary compatible with host OS/Arch or it is corrupt. Testing bundled JRE failed.

My Ubuntu runs as a virtual machine; details:

$ cat /etc/issue
Ubuntu 10.04.1 LTS \n \l
$ uname -a
Linux ubuntu 2.6.32-25-generic #44-Ubuntu SMP Fri Sep 17 20:26:08 UTC 2010 i686 GNU/Linux

What version of OmniFind did you install? Please contact me as soon as possible at: jakub dot godawa at gmail

Thanks

Debian 5.0

I installed 8.4.2 on Debian Lenny (after installing libstdc++.so.5). I used the -console argument instead of -silent, and it installed just fine. I stopped a crawl I had running on over 200K documents, then restarted it; after a while java hit 100% CPU, so I had to kill -9 it. After another restart it's working okay. I also have java/100% CPU issues with Zimbra, so maybe this is a Debian thing.

Does anyone know how to make file system crawls run on a schedule? Having to start them manually is pretty lame.

Also, does anyone know if the document limit is 500K in total, or per collection (500K x 5 = 2.5 million)?

And is the limit on how many documents it recognizes/indexes (.zip, .pdf, .html, etc.), or simply on every file (including .exe/.bin)?
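Before a large file system crawl, it can help to know how many files are involved relative to the 500,000-document limit mentioned in the post. A sketch that counts all regular files and, separately, files with a few document extensions; the extension list is an assumption drawn from the formats mentioned in this thread, not OmniFind's actual list of supported formats, and it does not answer whether the limit is per collection.

```shell
#!/bin/bash
# Sketch: count files under DIR in total and for a few document types,
# to compare against the 500,000-document limit mentioned in the post.
# The extension list is illustrative, not OmniFind's list of formats.
DOC_LIMIT=500000

count_crawl_candidates() {
  local dir="$1" total docs
  # $(( )) strips any padding wc adds; names containing newlines over-count.
  total=$(( $(find "$dir" -type f | wc -l) ))
  docs=$(( $(find "$dir" -type f \( -iname '*.pdf' -o -iname '*.doc' \
      -o -iname '*.html' -o -iname '*.htm' -o -iname '*.zip' \) | wc -l) ))
  echo "all files: $total"
  echo "document files: $docs"
  if [ "$total" -gt "$DOC_LIMIT" ]; then
    echo "warning: more than $DOC_LIMIT files in total" >&2
  fi
}

# Usage: count_crawl_candidates /srv/share
```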

Official OmniFind Yahoo Edition forum

Dear all,
I found the forum at https://www.ibm.com/developerworks/forums/forum.jspa?forumID=1671 and posted two questions there. Hopefully I'll get answers.

Sincerely,
Paul Pambudi
http://www.linkedin.com/in/paulpambudi
