Research Collection School Of Computing and Information Systems

Finding and Classifying Web Units in Web Sites

Publication Type

Journal Article

Publication Date

2005

Abstract

In web classification, most researchers assume that the objects to be classified are individual web pages from one or more websites. In practice, the assumption is too restrictive since a web page itself may not carry sufficient information for it to be treated as an instance of some semantic class or concept. In this paper, we relax this assumption and allow a subgraph of web pages to represent an instance of the semantic concept. Such a subgraph of web pages is known as a web unit. To construct and classify web units, we formulate the web unit mining problem and propose an iterative web unit mining (iWUM) method. The iWUM method first finds subgraphs of web pages using knowledge about website structure and connectivity among the web pages. From these web subgraphs, web units are constructed and classified into categories in an iterative manner. Our experiments using the WebKB dataset showed that iWUM was able to construct web units and classify web units with high accuracy for the more structured parts of a website.

Discipline

Databases and Information Systems | Numerical Analysis and Scientific Computing

Publication

International Journal of Business Intelligence and Data Mining (IJBIDM)

Volume

1

Issue

2

First Page

161

Last Page

193

ISSN

1743-8187

Identifier

10.1504/IJBIDM.2005.008361

Publisher

InderScience

Citation

LIM, Ee Peng and SUN, Aixin. Finding and Classifying Web Units in Web Sites. (2005). International Journal of Business Intelligence and Data Mining (IJBIDM). 1, (2), 161-193.
Available at: https://ink.library.smu.edu.sg/sis_research/139

Additional URL

http://dx.doi.org/10.1504/IJBIDM.2005.008361
