Publication Type
Journal Article
Version
publishedVersion
Publication Date
1-2010
Abstract
Domain ontologies play an important role in supporting knowledge‐based applications in the Semantic Web. To facilitate the building of ontologies, text mining techniques have been used to perform ontology learning from texts. However, traditional systems employ shallow natural language processing techniques and focus only on concept and taxonomic relation extraction. In this paper, we present a system, known as Concept‐Relation‐Concept Tuple‐based Ontology Learning (CRCTOL), for mining ontologies automatically from domain‐specific documents. Specifically, CRCTOL adopts a full text parsing technique and employs a combination of statistical and lexico‐syntactic methods, including a statistical algorithm that extracts key concepts from a document collection, a word sense disambiguation algorithm that disambiguates words in the key concepts, a rule‐based algorithm that extracts relations between the key concepts, and a modified generalized association rule mining algorithm that prunes unimportant relations for ontology learning. As a result, the ontologies learned by CRCTOL are more concise and contain richer semantics in terms of the range and number of semantic relations compared with alternative systems. We present two case studies where CRCTOL is used to build a terrorism domain ontology and a sport event domain ontology. At the component level, quantitative comparison with Text‐To‐Onto and its successor Text2Onto shows that CRCTOL is able to extract concepts and semantic relations with a significantly higher level of accuracy. At the ontology level, the quality of the learned ontologies is evaluated either with a set of quantitative and qualitative methods, including analysis of graph structural properties, comparison to WordNet, and expert rating, or by direct comparison with a human‐edited benchmark ontology; both evaluations indicate that the learned ontologies are of high quality.
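The relation pruning step mentioned in the abstract can be illustrated with a minimal sketch. This is not the CRCTOL implementation: the abstract describes a modified generalized association rule mining algorithm, whereas the code below is a plain support/confidence filter over concept-relation-concept tuples with no taxonomy-based generalization. The function name `prune_tuples`, the thresholds, and the sample tuples are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def prune_tuples(doc_tuples, min_support=0.5, min_confidence=0.5):
    """Keep (subject, relation, object) tuples that are frequent enough.

    Illustrative assumption, not the paper's algorithm:
    - support    = fraction of documents containing the tuple
    - confidence = documents containing the tuple divided by
                   documents containing the tuple's subject concept
    """
    n_docs = len(doc_tuples)
    tuple_df = Counter()    # document frequency of each tuple
    subject_df = Counter()  # document frequency of each subject concept

    for tuples in doc_tuples:
        unique = set(tuples)
        for t in unique:
            tuple_df[t] += 1
        for subject in {t[0] for t in unique}:
            subject_df[subject] += 1

    kept = []
    for t, df in tuple_df.items():
        support = df / n_docs
        confidence = df / subject_df[t[0]]
        if support >= min_support and confidence >= min_confidence:
            kept.append((t, support, confidence))
    return sorted(kept)


# Hypothetical tuples, one set per document.
docs = [
    {("terrorist", "attack", "embassy"), ("group", "claim", "responsibility")},
    {("terrorist", "attack", "embassy")},
    {("terrorist", "use", "bomb")},
    {("group", "claim", "responsibility"), ("terrorist", "attack", "embassy")},
]

for (subject, relation, obj), support, confidence in prune_tuples(docs):
    print(f"{subject} -{relation}-> {obj}  support={support:.2f} confidence={confidence:.2f}")
```

With these sample documents, the rare tuple ("terrorist", "use", "bomb") falls below both thresholds and is dropped, while the two more frequent tuples are kept. A generalized variant would additionally count a tuple toward its ancestor concepts in a taxonomy, which this sketch omits.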
Discipline
Computer and Systems Architecture | Computer Engineering | Databases and Information Systems
Research Areas
Data Science and Engineering
Publication
Journal of the American Society for Information Science and Technology
Volume
61
Issue
1
First Page
150
Last Page
168
ISSN
1532-2882
Identifier
10.1002/asi.21231
Publisher
Association for Information Science and Technology (ASIS&T): JASIS&T
Citation
JIANG, Xing and TAN, Ah-hwee.
CRCTOL: A semantic based domain ontology learning system. (2010). Journal of the American Society for Information Science and Technology. 61, (1), 150-168.
Available at: https://ink.library.smu.edu.sg/sis_research/5223
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1002/asi.21231