Publication Type
Journal Article
Version
acceptedVersion
Publication Date
2-2007
Abstract
Traditional text mining techniques transform free text into a flat bag-of-words representation, which does not preserve sufficient semantics for the purpose of knowledge discovery. In this paper, we present a two-step procedure to mine generalized associations of semantic relations conveyed by the textual content of Web documents. First, RDF (Resource Description Framework) metadata representing semantic relations are extracted from raw text using a variety of natural language processing techniques. The relation extraction process also creates a term taxonomy in the form of a sense hierarchy inferred from WordNet. Then, a novel generalized association pattern mining algorithm (GP-Close) is applied to discover the underlying relation association patterns in the RDF metadata. To prune the large number of redundant overgeneralized patterns in the relation pattern search space, the GP-Close algorithm adopts the notion of generalization closure for systematic overgeneralization reduction. The efficacy of our approach is demonstrated through empirical experiments conducted on an online database of terrorist activities.
Keywords
RDF mining, association rule mining, relation association, text mining
Discipline
Computer Engineering | Databases and Information Systems
Research Areas
Data Science and Engineering
Publication
IEEE Transactions on Knowledge and Data Engineering
Volume
19
Issue
2
First Page
164
Last Page
179
ISSN
1041-4347
Identifier
10.1109/TKDE.2007.36
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Citation
JIANG, Tao; TAN, Ah-hwee; and WANG, Ke.
Mining generalized associations of semantic relations from textual web content. (2007). IEEE Transactions on Knowledge and Data Engineering. 19, (2), 164-179.
Available at: https://ink.library.smu.edu.sg/sis_research/5228
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
https://doi.org/10.1109/TKDE.2007.36