Publication Type
Journal Article
Version
acceptedVersion
Publication Date
2001
Abstract
This study investigated whether document retrieval can be improved if documents are divided into smaller sub-documents or passages and the retrieval scores for these passages are incorporated into the final retrieval score for the whole document. The documents were segmented by sliding a window of a certain size across the document and extracting the words displayed each time the window stopped. A retrieval score was calculated for each of the passages extracted, and the highest score obtained by a passage of that size was taken as the document’s passage-level score for that window size. A range of window sizes was tried. The experimental results indicated that using a fixed window size of 50 words gave better results than other window sizes for the TREC-5 and TREC-6 test collections. This window size yielded a significant retrieval improvement of 24% compared to using the whole-document retrieval score (using the traditional tf*idf weighting scheme with cosine normalisation). However, combining this window score and the whole-document retrieval score did not yield a retrieval improvement. Using a variable window size (ranging from 50 to 400 words) yielded a retrieval improvement of about 5% over using a fixed window size of 50. Different window sizes were found to work best for different queries. If the best window size to use for each query could be predicted accurately, a maximum retrieval improvement of 42% could be obtained. Subsequent work suggests that the usefulness of passage-level evidence in document retrieval depends on the weighting scheme and type of normalisation used in the retrieval method.
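The window-based scoring described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the step size by which the window moves, the pre-tokenised input, and the `idf` dictionary are all assumptions made for illustration, and the scoring function is a simplified tf*idf cosine similarity.

```python
import math
from collections import Counter


def sliding_windows(tokens, size, step):
    """Return passages of `size` tokens, moving the window `step` tokens at a time.

    The step size is an assumption; the abstract does not state it.
    """
    if len(tokens) <= size:
        return [tokens]
    last_start = len(tokens) - size
    starts = list(range(0, last_start + 1, step))
    if starts[-1] != last_start:
        starts.append(last_start)  # make sure the tail of the document is covered
    return [tokens[s:s + size] for s in starts]


def cosine_tfidf(query_tokens, text_tokens, idf):
    """Simplified tf*idf cosine similarity between a query and a text."""
    q_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(query_tokens).items()}
    p_vec = {t: tf * idf.get(t, 0.0) for t, tf in Counter(text_tokens).items()}
    dot = sum(w * p_vec.get(t, 0.0) for t, w in q_vec.items())
    q_norm = math.sqrt(sum(w * w for w in q_vec.values()))
    p_norm = math.sqrt(sum(w * w for w in p_vec.values()))
    norm = q_norm * p_norm
    return dot / norm if norm else 0.0


def passage_level_score(query_tokens, doc_tokens, idf, size=50, step=25):
    """Highest score obtained by any window of the given size."""
    windows = sliding_windows(doc_tokens, size, step)
    return max(cosine_tfidf(query_tokens, w, idf) for w in windows)


# Toy data (hypothetical): one relevant term buried among non-relevant ones.
idf = {"x": 1.0, "y": 1.0}
doc = ["y", "y", "x", "y"]
whole_doc = cosine_tfidf(["x"], doc, idf)
best_passage = passage_level_score(["x"], doc, idf, size=1, step=1)
```

In this toy case the best one-word window matches the query exactly, so the passage-level score is higher than the whole-document score, which is diluted by the non-matching terms.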
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing
Publication
Journal of Information Science
Volume
27
Issue
2
First Page
73
Last Page
80
ISSN
0165-5515
Identifier
10.1177/016555150102700202
Publisher
SAGE
Citation
XI, Wensi; XU-RONG, Richard; KHOO, Christopher Soo Guan; and LIM, Ee Peng. Incorporating window-based passage-level evidence in document retrieval. (2001). Journal of Information Science. 27, (2), 73-80.
Available at: https://ink.library.smu.edu.sg/sis_research/137
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
http://doi.org/10.1177/016555150102700202
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons