Web structure analysis for information mining

Publication Type

Book Chapter

Publication Date

12-2003

Abstract

Our approach to extracting information from the web analyzes the structural content of web pages by exploiting the latent information carried by HTML tags. For each extraction task, we create an object model consisting of the salient fields to be extracted and the corresponding extraction rules, which are built on a library of HTML parsing functions. We derive extraction rules for both single-slot and multiple-slot extraction tasks, which we illustrate through two sample domains.
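
The idea of an object model that pairs fields with tag-based extraction rules can be illustrated with a minimal sketch. This is not the chapter's own library: the helper names (`collect`, `single_slot`, `multi_slot`, `extract`), the choice of `h1`, `tr`, and `td` as structural cues, and the sample page are all assumptions made for illustration, built only on Python's standard `html.parser` module.

```python
from html.parser import HTMLParser


class Collector(HTMLParser):
    """Gathers the text of chosen leaf tags, optionally grouped by an enclosing tag."""

    def __init__(self, leaf_tags, group_tag=None):
        super().__init__()
        self.leaf_tags = set(leaf_tags)
        self.group_tag = group_tag
        self.groups = [[]]      # each group is a list of (tag, text) pairs
        self._current = None    # (tag, text_parts) while inside a leaf tag

    def handle_starttag(self, tag, attrs):
        if tag == self.group_tag:
            self.groups.append([])          # a new group starts here
        elif tag in self.leaf_tags:
            self._current = (tag, [])

    def handle_data(self, data):
        if self._current is not None:
            self._current[1].append(data)

    def handle_endtag(self, tag):
        if self._current is not None and tag == self._current[0]:
            text = "".join(self._current[1]).strip()
            self.groups[-1].append((tag, text))
            self._current = None


def collect(html, leaf_tags, group_tag=None):
    """Hypothetical parsing helper: returns the non-empty groups of (tag, text)."""
    parser = Collector(leaf_tags, group_tag)
    parser.feed(html)
    parser.close()
    return [group for group in parser.groups if group]


def single_slot(html, tag):
    """Single-slot rule: the text of the first occurrence of `tag`, or None."""
    groups = collect(html, [tag])
    return groups[0][0][1] if groups else None


def multi_slot(html, group_tag, field_names, cell_tag="td"):
    """Multiple-slot rule: one record per group whose cell count matches the fields."""
    records = []
    for group in collect(html, [cell_tag], group_tag):
        values = [text for _, text in group]
        if len(values) == len(field_names):
            records.append(dict(zip(field_names, values)))
    return records


# Hypothetical object model: each salient field is mapped to its extraction rule.
OBJECT_MODEL = {
    "page_title": lambda html: single_slot(html, "h1"),
    "items": lambda html: multi_slot(html, "tr", ["name", "price"]),
}


def extract(html, model):
    """Applies every rule in the object model to the page."""
    return {field: rule(html) for field, rule in model.items()}


# Invented sample page, used only to exercise the sketch.
SAMPLE = """
<html><body>
<h1>Spring Catalogue</h1>
<table>
<tr><th>Item</th><th>Price</th></tr>
<tr><td>Lamp</td><td>12.50</td></tr>
<tr><td>Desk</td><td>89.00</td></tr>
</table>
</body></html>
"""

result = extract(SAMPLE, OBJECT_MODEL)
```

In this sketch the single-slot rule fills one field with one value per page, while the multiple-slot rule fills several related fields at once for each table row; the header row is skipped because it contains no `td` cells.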

Discipline

Databases and Information Systems | OS and Networks

Research Areas

Data Science and Engineering

Publication

Web document analysis: Challenges and opportunities

Editor

ANTONACOPOULOS, Apostolos; HU, Jianying

First Page

39

Last Page

57

ISBN

9789812385826

Identifier

10.1142/9789812775375_0003

Publisher

World Scientific

Additional URL

https://doi.org/10.1142/9789812775375_0003
