Research Collection School Of Computing and Information Systems

Web structure analysis for information mining

Publication Type

Book Chapter

Publication Date

12-2003

Abstract

Our approach to extracting information from the web analyzes the structural content of web pages by exploiting the latent information given by HTML tags. For each specific extraction task, an object model is created, consisting of the salient fields to be extracted and the corresponding extraction rules, which are based on a library of HTML parsing functions. We derive extraction rules for both single-slot and multiple-slot extraction tasks, which we illustrate through two sample domains.
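The idea of an object model that pairs fields with tag-based extraction rules can be illustrated with a minimal sketch. It is not the chapter's implementation: the names `ObjectModel`, `single_slot`, and `multi_slot`, the sample page, and the reading of "multiple-slot" as "several related fields extracted together per record" are all assumptions made for illustration, and the standard-library `html.parser` module stands in for the authors' library of HTML parsing functions.

```python
from html.parser import HTMLParser


class _TagText(HTMLParser):
    """Collects the text inside each occurrence of one tag."""

    def __init__(self, tag):
        super().__init__()
        self.tag, self.depth = tag, 0
        self.results, self._buf = [], []

    def handle_starttag(self, tag, attrs):
        if tag == self.tag:
            if self.depth == 0:
                self._buf = []
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == self.tag and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self._buf).strip())

    def handle_data(self, data):
        if self.depth > 0:
            self._buf.append(data)


class _Rows(HTMLParser):
    """Collects, for each row tag, the texts of the cell tags inside it."""

    def __init__(self, row_tag, cell_tag):
        super().__init__()
        self.row_tag, self.cell_tag = row_tag, cell_tag
        self.rows, self._row, self._cell = [], None, None

    def handle_starttag(self, tag, attrs):
        if tag == self.row_tag:
            self._row, self._cell = [], None
        elif tag == self.cell_tag and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag == self.cell_tag and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == self.row_tag and self._row is not None:
            self.rows.append(self._row)
            self._row, self._cell = None, None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def single_slot(tag):
    """Hypothetical rule: one field, taken from the first occurrence of a tag."""
    def rule(html):
        parser = _TagText(tag)
        parser.feed(html)
        parser.close()
        return parser.results[0] if parser.results else None
    return rule


def multi_slot(row_tag, cell_tag, names):
    """Hypothetical rule: several related fields extracted together per row."""
    def rule(html):
        parser = _Rows(row_tag, cell_tag)
        parser.feed(html)
        parser.close()
        return [dict(zip(names, row)) for row in parser.rows]
    return rule


class ObjectModel:
    """Maps each salient field to the rule that extracts it (assumed design)."""

    def __init__(self, rules):
        self.rules = rules

    def extract(self, html):
        return {field: rule(html) for field, rule in self.rules.items()}


# Made-up sample page, used only to exercise the sketch.
page = (
    "<html><head><title>Job listings</title></head><body><table>"
    "<tr><td>Engineer</td><td>Singapore</td></tr>"
    "<tr><td>Analyst</td><td>London</td></tr>"
    "</table></body></html>"
)
model = ObjectModel({
    "title": single_slot("title"),
    "jobs": multi_slot("tr", "td", ("position", "city")),
})
record = model.extract(page)
```

In this sketch, `record["title"]` holds the page title and `record["jobs"]` holds one dictionary per table row, so adapting the model to a different page layout would mean changing only the rules passed to `ObjectModel`.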

Discipline

Databases and Information Systems | OS and Networks

Research Areas

Data Science and Engineering

Publication

Web document analysis: Challenges and opportunities

Editor

ANTONACOPOULOS, Apostolos; HU, Jianying

First Page

39

Last Page

57

ISBN

9789812385826

Identifier

10.1142/9789812775375_0003

Publisher

World Scientific

Citation

VIJJAPPU, Lakshmi; TAN, Ah-hwee; and TAN, Chew-Lim. Web structure analysis for information mining. (2003). Web document analysis: Challenges and opportunities. 39-57.
Available at: https://ink.library.smu.edu.sg/sis_research/5255

Additional URL

https://doi.org/10.1142/9789812775375_0003

