Web structure analysis for information mining
Publication Type
Book Chapter
Publication Date
12-2003
Abstract
Our approach to extracting information from the web analyzes the structural content of web pages by exploiting the latent information carried by HTML tags. For each extraction task, we create an object model consisting of the salient fields to be extracted and the corresponding extraction rules, which are built on a library of HTML parsing functions. We derive extraction rules for both single-slot and multiple-slot extraction tasks, and illustrate them with two sample domains.
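As a rough illustration of the idea in the abstract, the Python sketch below pairs a small object model (field names mapped to tag-based rules) with two rule types: a single-slot rule that returns one value, and a multiple-slot rule that groups related values into tuples. It is not the chapter's rule library. Every name in it (`TagTextCollector`, `single_slot`, `multi_slot`, the sample page and its `class` attributes) is hypothetical and chosen only for this example, and it relies solely on the standard-library `html.parser` module.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag; they are skipped so the stack stays balanced.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class TagTextCollector(HTMLParser):
    """Record (tag, attrs, text) for each text node, using its enclosing tag."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open tags as (tag, attrs dict)
        self.records = []  # (tag, attrs dict, text) in document order

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append((tag, dict(attrs)))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, tolerating sloppy HTML.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            tag, attrs = self.stack[-1]
            self.records.append((tag, attrs, text))


def parse(html):
    collector = TagTextCollector()
    collector.feed(html)
    collector.close()
    return collector.records


def single_slot(records, tag, css_class=None):
    """Single-slot rule: first text found under a matching tag, else None."""
    for t, attrs, text in records:
        if t == tag and (css_class is None or attrs.get("class") == css_class):
            return text
    return None


def multi_slot(records, field_rules):
    """Multiple-slot rule: group matched fields into tuples (dicts).

    field_rules maps a field name to (tag, css_class). A new tuple starts
    whenever a field that is already filled is matched again.
    """
    rows, current = [], {}
    for t, attrs, text in records:
        for field, (tag, css_class) in field_rules.items():
            if t == tag and attrs.get("class") == css_class:
                if field in current:
                    rows.append(current)
                    current = {}
                current[field] = text
    if current:
        rows.append(current)
    return rows


# Hypothetical sample page and object model, invented for this sketch.
PAGE = """
<html><body>
<h1>Spring Catalogue</h1>
<table>
<tr><td class="title">Web Mining</td><td class="price">$40</td></tr>
<tr><td class="title">HTML Parsing</td><td class="price">$25</td></tr>
</table>
</body></html>
"""

BOOK_FIELDS = {"title": ("td", "title"), "price": ("td", "price")}

records = parse(PAGE)
heading = single_slot(records, "h1")
books = multi_slot(records, BOOK_FIELDS)
print(heading)
print(books)
```

Here the single-slot rule fills one field (the page heading), while the multiple-slot rule keeps each title paired with its price. Grouping on a repeated field is one simple assumption about how tuples are delimited; real pages may need rules keyed on row or container tags instead.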
Discipline
Databases and Information Systems | OS and Networks
Research Areas
Data Science and Engineering
Publication
Web document analysis: Challenges and opportunities
Editor
ANTONACOPOULOS, Apostolos; HU, Jianying
First Page
39
Last Page
57
ISBN
9789812385826
Identifier
10.1142/9789812775375_0003
Publisher
World Scientific
Citation
VIJJAPPU, Lakshmi; TAN, Ah-hwee; and TAN, Chew-Lim.
Web structure analysis for information mining. (2003). Web document analysis: Challenges and opportunities. 39-57.
Available at: https://ink.library.smu.edu.sg/sis_research/5255
Additional URL
https://doi.org/10.1142/9789812775375_0003