Publication Type
Conference Proceeding Article
Version
submittedVersion
Publication Date
7-2002
Abstract
Large sequence databases, such as protein, DNA and gene sequences in biology, are becoming increasingly common. An important operation on a sequence database is approximate subsequence matching, where all subsequences that are within some distance from a given query string are retrieved. This paper proposes a filter-and-refine algorithm that enables efficient approximate subsequence matching in large DNA sequence databases. It employs a bitmap indexing structure to condense and encode each data sequence into a shorter index sequence. During query processing, the bitmap index is used to filter out most of the irrelevant subsequences, and false positives are removed in the final refinement step. Analytical and experimental studies show that the proposed strategy is capable of reducing response time substantially while incurring only a small space overhead.
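The filter-and-refine idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the paper's bitmap encoding, so everything below is an assumption made for illustration: each fixed-length window is summarized by a 16-bit bitmap of the 2-grams it contains, the distance is taken to be Hamming distance, and the names `signature`, `build_index` and `search` are hypothetical. The filter is safe under these assumptions because a single mismatch can change at most Q of the query's q-grams, so a window within k mismatches can lack at most k*Q of them.

```python
from itertools import product

Q = 2  # q-gram length (illustrative choice, not taken from the paper)
# Assign each of the 16 possible DNA 2-grams its own bit position.
BIT = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=Q))}


def signature(s):
    """Bitmap with one bit set for every distinct q-gram occurring in s."""
    mask = 0
    for i in range(len(s) - Q + 1):
        mask |= 1 << BIT[s[i:i + Q]]
    return mask


def hamming(a, b):
    """Number of mismatching positions between equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def build_index(seq, m):
    """Precompute a signature for every length-m window of seq."""
    return [signature(seq[i:i + m]) for i in range(len(seq) - m + 1)]


def search(seq, index, query, k):
    """Return (candidates, hits) for windows within k mismatches of query.

    The index must have been built with m == len(query).
    """
    m = len(query)
    qsig = signature(query)
    # Filter step: discard windows lacking more than k*Q of the query's q-grams.
    candidates = [
        i for i, wsig in enumerate(index)
        if bin(qsig & ~wsig).count("1") <= k * Q
    ]
    # Refine step: verify each surviving window and drop false positives.
    hits = [i for i in candidates if hamming(seq[i:i + m], query) <= k]
    return candidates, hits
```

For example, `search(seq, build_index(seq, len(query)), query, k)` returns both the filtered candidate positions and the verified matches, so the gap between the two lists shows how many false positives the refinement step removed. Unlike the index described in the abstract, this sketch makes no attempt to keep the index smaller than the data.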
Keywords
DNA sequences, approximate subsequence matching, biology, bitmap indexing structure, data sequence condensing, data sequence encoding, false positive removal, fast filter-and-refine algorithms, gene sequences, index sequence, large DNA sequence databases, large sequence databases, protein sequences, query processing, query string, response time, small space overhead, subsequence selection
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing
Publication
IDEAS 2002: International Database Engineering and Applications Symposium 2002, July 17-19, Edmonton, Canada
First Page
243
Last Page
254
ISBN
9780769516387
Identifier
10.1109/IDEAS.2002.1029677
Publisher
IEEE Computer Society
City or Country
Los Alamitos, CA
Citation
OOI, Beng-Chin; PANG, Hwee Hwa; WANG, Hao; WONG, Limsoon; and YU, Cui.
Fast Filter-and-Refine Algorithms for Subsequence Selection. (2002). IDEAS 2002: International Database Engineering and Applications Symposium 2002, July 17-19, Edmonton, Canada. 243-254.
Available at: https://ink.library.smu.edu.sg/sis_research/1144
Copyright Owner and License
Authors
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
http://doi.ieeecomputersociety.org/10.1109/IDEAS.2002.1029677
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons