Mining Closed Discriminative Dyadic Sequential Patterns
Publication Type
Conference Proceeding Article
Publication Date
3-2011
Abstract
A lot of data comes in sequential form. In this study, we are interested in sequential data that comes in pairs. Many interesting datasets from various domains have this format, including parallel textual corpora, duplicate bug reports, and other pairs of related sequences of events. Our goal is to mine a set of closed discriminative dyadic sequential patterns from a database of sequence pairs, each belonging to one of two classes, +ve and -ve. These dyadic sequential patterns characterize the discriminating facets that contrast the two classes. They are potentially good features for the classification of dyadic sequential data. For example, they can be used to characterize and flag correct and incorrect translations in parallel textual corpora, or to automate the manual and time-consuming process of detecting duplicate bug reports. We provide a solution to this new problem by proposing a new search space traversal strategy, a projected database structure, pruning properties, and novel mining algorithms. To demonstrate the scalability and utility of our solution, we have experimented with both synthetic and real datasets. Experimental results show that our solution is scalable. Mined patterns are also able to improve the accuracy of one possible downstream application, namely the detection of duplicate bug reports using pattern-based classification.
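The abstract does not spell out how a dyadic pattern is matched or scored, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that a dyadic pattern is a pair of event sequences, that a sequence pair supports it when each side of the pattern is a (possibly non-contiguous) subsequence of the corresponding side of the pair, and that discriminative power can be approximated by the difference in relative support between the +ve and -ve classes. The function names, the toy data, and the scoring measure are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# A sequence pair (or a dyadic pattern) is modeled as two event sequences.
Pair = Tuple[Sequence[str], Sequence[str]]


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if `pattern` occurs in `sequence` in order, gaps allowed."""
    it = iter(sequence)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def matches(pattern: Pair, pair: Pair) -> bool:
    """Assumed semantics: both sides of the pattern must match their side."""
    return is_subsequence(pattern[0], pair[0]) and is_subsequence(pattern[1], pair[1])


def support(pattern: Pair, database: List[Pair]) -> float:
    """Fraction of sequence pairs in `database` that match `pattern`."""
    if not database:
        return 0.0
    return sum(matches(pattern, pair) for pair in database) / len(database)


def support_gap(pattern: Pair, positives: List[Pair], negatives: List[Pair]) -> float:
    """Assumed stand-in for a discriminative score: +ve support minus -ve support."""
    return support(pattern, positives) - support(pattern, negatives)


# Toy data (invented): each pair could stand for two related bug reports.
positives = [
    (["open", "crash"], ["launch", "crash", "log"]),
    (["open", "save", "crash"], ["crash"]),
]
negatives = [
    (["open", "crash"], ["install", "fail"]),
    (["login"], ["crash"]),
]

pattern = (["crash"], ["crash"])
gap = support_gap(pattern, positives, negatives)
```

In this toy setting the pattern matches every +ve pair and no -ve pair, which is the kind of contrast a discriminative pattern is meant to capture. A real miner would also need to enumerate candidate patterns and keep only closed ones, which this sketch does not attempt.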
Discipline
Software Engineering
Research Areas
Software Systems
Publication
International Conference on Extending Database Technology (EDBT)
First Page
21
Last Page
32
Identifier
10.1145/1951365.1951371
Publisher
ACM
Citation
LO, David; CHENG, Hong; and Lucia.
Mining Closed Discriminative Dyadic Sequential Patterns. (2011). International Conference on Extending Database Technology (EDBT). 21-32.
Available at: https://ink.library.smu.edu.sg/sis_research/1358
Additional URL
http://dx.doi.org/10.1145/1951365.1951371