Publication Type
Conference Proceeding Article
Version
submittedVersion
Publication Date
11-2003
Abstract
In web classification, most researchers assume that the objects to classify are individual web pages from one or more web sites. In practice, this assumption is too restrictive, since a web page by itself may not always correspond to an instance of a semantic concept (or category) given to the classification task. In this paper, we relax this assumption and allow a concept instance to be represented by a subgraph of web pages or a set of web pages. We identify several new issues that must be addressed when the assumption is removed, and formulate the web unit mining problem. We also propose an iterative web unit mining (iWUM) method that first finds subgraphs of web pages using some knowledge about web site structure. From these web subgraphs, web units are constructed and classified into semantic concepts (or categories) in an iterative manner. Our experiments using the WebKB dataset showed that iWUM improves the overall classification performance and works very well on the more structured parts of a web site.
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing
Publication
CIKM '03: Proceedings of the 12th international conference on Information and knowledge management
First Page
108
Last Page
115
ISBN
9781581137231
Identifier
10.1145/956863.956885
Publisher
ACM
City or Country
New Orleans, LA
Citation
SUN, Aixin and LIM, Ee Peng. Web Unit Mining: Finding and classifying subgraphs of web pages. (2003). CIKM '03: Proceedings of the 12th international conference on Information and knowledge management. 108-115.
Available at: https://ink.library.smu.edu.sg/sis_research/991
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
http://doi.org/10.1145/956863.956885
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons