An Enhanced Lucene based System for Efficient Document/Information Retrieval


Alaidine Ben Ayed (1), Ismaïl Biskri (1,2) and Jean-Guy Meunier (1); (1) Université du Québec à Montréal (UQAM), Canada; (2) Université du Québec à Trois-Rivières (UQTR), Canada


In this paper, we implement a document retrieval system using Lucene and conduct experiments to compare the efficiency of two weighting schemes: the well-known TF-IDF and BM25. We then expand queries using a comparable corpus (Wikipedia) and word embeddings. The results show that the latter method (word embeddings) achieves higher precision and retrieves documents more accurately.
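
To make the two ideas in the abstract concrete, the following is a minimal Python sketch, not the system described in the paper. It scores a toy corpus with a simplified textbook TF-IDF and with BM25, then expands a query with its nearest neighbours in a small embedding table. The corpus, the two-dimensional vectors, the shared smoothed IDF, the parameters k1=1.2 and b=0.75, the 0.9 similarity threshold and all function names are assumptions made for illustration; Lucene's own similarity classes use somewhat different formulas.

```python
import math
from collections import Counter

# Toy corpus (hypothetical data, for illustration only).
DOCS = [
    "lucene is a search library for document retrieval".split(),
    "word embeddings map words to dense vectors".split(),
    "bm25 is a ranking function used in document retrieval".split(),
]

# Toy 2-d "embeddings" (hypothetical values, not trained vectors).
EMB = {
    "retrieval": (1.0, 0.1),
    "search": (0.9, 0.2),
    "vectors": (0.1, 1.0),
}


def idf(term, docs):
    """Smoothed inverse document frequency; never negative."""
    n = sum(1 for d in docs if term in d)
    return math.log(1 + (len(docs) - n + 0.5) / (n + 0.5))


def tfidf_score(query, doc, docs):
    """Simplified TF-IDF: raw term frequency times IDF, no length normalization."""
    tf = Counter(doc)
    return sum(tf[t] * idf(t, docs) for t in query)


def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    """BM25: saturating term frequency with document-length normalization."""
    tf = Counter(doc)
    avgdl = sum(len(d) for d in docs) / len(docs)
    norm = k1 * (1 - b + b * len(doc) / avgdl)
    return sum(idf(t, docs) * tf[t] * (k1 + 1) / (tf[t] + norm) for t in query)


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def expand(query, emb, threshold=0.9):
    """Append every vocabulary word whose vector is close to a query term."""
    expanded = list(query)
    for t in query:
        if t not in emb:
            continue
        for w, vec in emb.items():
            if w not in expanded and cosine(emb[t], vec) >= threshold:
                expanded.append(w)
    return expanded


if __name__ == "__main__":
    query = ["document", "retrieval"]
    for i, doc in enumerate(DOCS):
        print(i, tfidf_score(query, doc, DOCS), bm25_score(query, doc, DOCS))
    print(expand(["retrieval"], EMB))
```

In this toy setting, the first and third documents contain both query terms once, so the simplified TF-IDF cannot separate them, while BM25 ranks the shorter first document higher because of its length normalization. Expanding the query "retrieval" adds "search", which is the kind of effect that embedding-based expansion relies on.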


Internet and Web Applications, Data and Knowledge Representation, Document Retrieval.

Volume 10, Number 9