Archive for the ‘Information Retrieval’ Category

Apache Lucene – PHP Implementation VS Java version

10 May

I was excited to learn that Lucene had been included in the Zend Framework, but a bit of Googling turned up reports of serious performance problems with the PHP implementation. The Java implementation is faster, and I would consider using it instead.

Click here for a comprehensive comparison…

By the way, the Xapian project offers far better performance, as my Recoll personal search system shows. I have indexed more than 7 GB of data, consisting of large PDF files, IMAP mailboxes, text files, DOC files, etc.

The search results are quick, considering the size of the data indexed and the resources on my machine.

Good work, Xapian team!

 

DBSight – The Ultimate Faceted Database Search

02 Aug

“Information Retrieval” as a science has attained a certain level of maturity, and a lot of companies now offer nice products that make use of the most advanced techniques. I recently came across one such product that is worth mentioning here. I was able to set it up on my PC in less than 3 minutes. It comes with a very easy-to-use web interface and uses JDBC to retrieve data from virtually any database. You write an SQL query to specify the searchable records. DBSight then retrieves those records from the database, creates the search index, and can repeat the process on a schedule you specify. The figure below shows the possibilities for search results retrieval once an index has been created.

DBSight Overview
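
To make the "SQL query in, search index out" idea concrete, here is a minimal sketch in Python. It is not DBSight's code or API: it uses Python's built-in sqlite3 module in place of JDBC, and the `books` table, its columns, and the `build_index` and `search` helpers are all made up for illustration. Scheduling and facets are left out.

```python
import re
import sqlite3
from collections import defaultdict

TOKEN = re.compile(r"[a-z0-9]+")


def build_index(conn, sql):
    """Run `sql` and build an inverted index: term -> set of record ids.

    Assumption for this sketch: the first column of every row is the
    record id and the remaining columns are text to be indexed.
    """
    index = defaultdict(set)
    for record_id, *fields in conn.execute(sql):
        for field in fields:
            for term in TOKEN.findall(str(field).lower()):
                index[term].add(record_id)
    return index


def search(index, query):
    """Return the ids of records that contain every term in `query`."""
    terms = TOKEN.findall(query.lower())
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical sample data in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE books (id INTEGER PRIMARY KEY, title TEXT, "
    "category TEXT, visible INTEGER)"
)
conn.executemany(
    "INSERT INTO books VALUES (?, ?, ?, ?)",
    [
        (1, "Lucene in Action", "search", 1),
        (2, "Managing Gigabytes", "search", 1),
        (3, "Draft notes on search", "internal", 0),
    ],
)

# The SQL query decides which records are searchable: row 3 is excluded.
index = build_index(conn, "SELECT id, title, category FROM books WHERE visible = 1")
print(sorted(search(index, "search")))
```

The point of the sketch is that the `WHERE` clause alone controls what ends up in the index, so changing the query and rebuilding is all it takes to change the searchable set.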