Publication Type

Journal Article

Version

publishedVersion

Publication Date

1-2010

Abstract

Domain ontologies play an important role in supporting knowledge‐based applications in the Semantic Web. To facilitate the building of ontologies, text mining techniques have been used to perform ontology learning from texts. However, traditional systems employ shallow natural language processing techniques and focus only on concept and taxonomic relation extraction. In this paper, we present a system, known as Concept‐Relation‐Concept Tuple‐based Ontology Learning (CRCTOL), for mining ontologies automatically from domain‐specific documents. Specifically, CRCTOL adopts a full text parsing technique and employs a combination of statistical and lexico‐syntactic methods, including a statistical algorithm that extracts key concepts from a document collection, a word sense disambiguation algorithm that disambiguates words in the key concepts, a rule‐based algorithm that extracts relations between the key concepts, and a modified generalized association rule mining algorithm that prunes unimportant relations for ontology learning. As a result, the ontologies learned by CRCTOL are more concise and contain richer semantics, in terms of the range and number of semantic relations, than those produced by alternative systems. We present two case studies where CRCTOL is used to build a terrorism domain ontology and a sport event domain ontology. At the component level, a quantitative comparison with Text‐To‐Onto and its successor Text2Onto shows that CRCTOL extracts concepts and semantic relations with a significantly higher level of accuracy. At the ontology level, the quality of the learned ontologies is evaluated either with a set of quantitative and qualitative methods, including analysis of graph structural properties, comparison with WordNet, and expert rating, or by direct comparison with a human‐edited benchmark ontology, demonstrating the high quality of the learned ontologies.
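
The relation-pruning step named in the abstract can be illustrated with a minimal sketch. This is not the CRCTOL algorithm: the abstract describes a modified generalized association rule mining method, whereas the code below applies plain support and confidence thresholds to concept-relation-concept tuples. The function name `prune_relations`, the threshold values, and the toy tuples are all assumptions made for illustration only.

```python
from collections import Counter


def prune_relations(docs, min_support=0.5, min_confidence=0.5):
    """Keep (head, relation, tail) tuples that are frequent enough.

    docs: list of sets of (head, relation, tail) tuples, one set per document.
    Hypothetical helper; plain support/confidence filtering, not the
    modified generalized association rule mining used by CRCTOL.
    """
    n_docs = len(docs)
    tuple_df = Counter()  # number of documents containing each tuple
    head_df = Counter()   # number of documents mentioning each head concept
    for tuples in docs:
        tuple_df.update(set(tuples))
        head_df.update({head for head, _, _ in tuples})

    kept = []
    for t, count in tuple_df.items():
        support = count / n_docs            # share of documents with the tuple
        confidence = count / head_df[t[0]]  # share of head-concept documents with it
        if support >= min_support and confidence >= min_confidence:
            kept.append((t, support, confidence))
    return sorted(kept)


# Made-up toy input: tuples as they might look after relation extraction.
docs = [
    {("terrorist", "attack", "embassy"), ("group", "claim", "responsibility")},
    {("terrorist", "attack", "embassy")},
    {("terrorist", "use", "bomb"), ("group", "claim", "responsibility")},
    {("terrorist", "attack", "embassy"), ("police", "arrest", "suspect")},
]
kept = prune_relations(docs)
for (head, relation, tail), support, confidence in kept:
    print(head, relation, tail, support, confidence)
```

With these toy documents, the two tuples that appear in only one document fall below the support threshold and are dropped, while the two recurring tuples are kept.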

Discipline

Computer and Systems Architecture | Computer Engineering | Databases and Information Systems

Research Areas

Data Science and Engineering

Publication

Journal of the American Society for Information Science and Technology

Volume

61

Issue

1

First Page

150

Last Page

168

ISSN

1532-2882

Identifier

10.1002/asi.21231

Publisher

Association for Information Science and Technology (ASIS&T): JASIS&T

Additional URL

https://doi.org/10.1002/asi.21231
