Publication Type
Journal Article
Version
publishedVersion
Publication Date
7-2006
Abstract
In information retrieval, retrieving relevant passages, as opposed to whole documents, not only directly benefits the end user by filtering out the irrelevant information within a long relevant document, but also improves retrieval accuracy in general. A critical problem in passage retrieval is to extract coherent relevant passages accurately from a document, which we refer to as passage extraction. While much work has been done on passage retrieval, the passage extraction problem has not been seriously studied. Most existing work tends to rely on presegmenting documents into fixed-length passages, which are unlikely to be optimal because the length of a relevant passage is presumably highly sensitive to both the query and the document. In this article, we present a new method for accurately detecting coherent relevant passages of variable lengths using hidden Markov models (HMMs). The HMM-based method naturally captures the topical boundaries between passages relevant and nonrelevant to the query. Pseudo-feedback mechanisms can be naturally incorporated into such an HMM-based framework to improve parameter estimation. We show that with appropriate parameter estimation, the HMM method outperforms a number of strong baseline methods on two datasets. We further show how the HMM method can be applied on top of any basic passage extraction method to improve passage boundaries.
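As a rough illustration of the idea in the abstract, the sketch below decodes a single variable-length relevant span with Viterbi over a small left-to-right HMM. Everything in it is an assumption made for illustration and is not taken from the article: the function name `extract_passage`, the three-state topology (background before, relevant, background after), the uniform background word model, the fixed parameters `p_rel` and `p_stay`, and the absence of any pseudo-feedback or parameter estimation.

```python
import math

def extract_passage(tokens, query_terms, p_rel=0.5, p_stay=0.8):
    """Return (start, end) of the most likely relevant span (end exclusive), or None.

    Hypothetical three-state HMM: 0 = background before, 1 = relevant,
    2 = background after. Parameters are fixed by hand, not estimated.
    """
    if not tokens or not query_terms:
        return None

    vocab = set(tokens) | set(query_terms)
    bg = 1.0 / len(vocab)  # uniform background word model (assumption)

    def emit(state, word):
        # The relevant state mixes a uniform query model with the background model.
        if state == 1:
            q = 1.0 / len(query_terms) if word in query_terms else 0.0
            return math.log(p_rel * q + (1.0 - p_rel) * bg)
        return math.log(bg)

    # Left-to-right transitions, so at most one contiguous relevant span is produced.
    log_trans = {
        (0, 0): math.log(p_stay), (0, 1): math.log(1.0 - p_stay),
        (1, 1): math.log(p_stay), (1, 2): math.log(1.0 - p_stay),
        (2, 2): math.log(p_stay),
    }

    n = len(tokens)
    neg_inf = float("-inf")
    score = [[neg_inf] * 3 for _ in range(n)]
    back = [[0] * 3 for _ in range(n)]

    # A document may start in the background or directly in the relevant state.
    score[0][0] = math.log(0.5) + emit(0, tokens[0])
    score[0][1] = math.log(0.5) + emit(1, tokens[0])

    # Viterbi recursion in log space.
    for t in range(1, n):
        for (i, j), lp in log_trans.items():
            cand = score[t - 1][i] + lp + emit(j, tokens[t])
            if cand > score[t][j]:
                score[t][j] = cand
                back[t][j] = i

    # Backtrace from the best final state.
    state = max(range(3), key=lambda s: score[n - 1][s])
    path = [state]
    for t in range(n - 1, 0, -1):
        state = back[t][state]
        path.append(state)
    path.reverse()

    relevant = [i for i, s in enumerate(path) if s == 1]
    return (relevant[0], relevant[-1] + 1) if relevant else None
```

Called with a tokenized document and a set of query terms, the function returns token offsets whose boundaries are chosen by the decoder rather than by a fixed window length, which is the contrast with presegmented fixed-length passages that the abstract draws. A more faithful version would estimate the language models and transition probabilities from data instead of fixing them by hand.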
Keywords
Algorithms, Hidden Markov models, passage retrieval
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing
Research Areas
Data Science and Engineering
Publication
ACM Transactions on Information Systems
Volume
24
Issue
3
First Page
295
Last Page
319
ISSN
1046-8188
Identifier
10.1145/1165774.1165775
Publisher
ACM
Citation
JIANG, Jing and ZHAI, ChengXiang.
Extraction of Coherent Relevant Passages using Hidden Markov Models. (2006). ACM Transactions on Information Systems. 24, (3), 295-319.
Available at: https://ink.library.smu.edu.sg/sis_research/130
Copyright Owner and License
Publisher
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1145/1165774.1165775
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons