Title

Re-engineering Structures from Web Documents

Publication Type

Conference Proceeding Article

Publication Date

6-2000

Abstract

To realize a wide range of applications (including digital libraries) on the Web, a more structured way of accessing the Web is required, and this requirement can be facilitated by the use of the XML standard. In this paper, we propose a general framework for reverse engineering (or re-engineering) the underlying structures, i.e., the DTD, from a collection of similarly structured XML documents when they share some common but unknown DTDs. The essential data structures and algorithms for DTD generation have been developed, and experiments on real Web collections have been conducted to demonstrate their feasibility. In addition, we propose a method of imposing a constraint on the repetitiveness of the elements in a DTD rule to further simplify the generated DTD without compromising its correctness.
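As a rough illustration of the general idea of deriving DTD rules from a collection of similarly structured XML documents, the sketch below records the child-element sequences observed under each element and turns them into simple content models. It is not the paper's algorithm, since the abstract does not spell out its data structures. The function names `collapse` and `infer_dtd`, the choice to collapse repeated siblings into `+`, the `(a | b)*` fallback, and treating childless elements as `(#PCDATA)` are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from itertools import groupby


def collapse(children):
    """Collapse runs of the same tag into (tag, was_repeated) pairs.

    ['title', 'author', 'author'] -> (('title', False), ('author', True))
    """
    return tuple((tag, len(list(run)) > 1) for tag, run in groupby(children))


def infer_dtd(documents):
    """Return {element_name: content_model} inferred from XML strings.

    Hypothetical helper: a naive stand-in for DTD generation, not the
    method proposed in the paper.
    """
    observed = {}  # element name -> set of collapsed child sequences
    for doc in documents:
        for elem in ET.fromstring(doc).iter():
            children = [child.tag for child in elem]
            observed.setdefault(elem.tag, set()).add(collapse(children))

    rules = {}
    for name, shapes in observed.items():
        tag_orders = {tuple(tag for tag, _ in shape) for shape in shapes}
        if tag_orders == {()}:
            # Assumption: elements never seen with children hold text only.
            rules[name] = "(#PCDATA)"
        elif len(tag_orders) == 1:
            # Every occurrence has the same child order; mark a child with
            # '+' if any occurrence repeated it.
            order = next(iter(tag_orders))
            parts = []
            for i, tag in enumerate(order):
                repeated = any(shape[i][1] for shape in shapes)
                parts.append(tag + ("+" if repeated else ""))
            rules[name] = "(" + ", ".join(parts) + ")"
        else:
            # Conflicting orders: fall back to a permissive choice group.
            tags = sorted({tag for order in tag_orders for tag in order})
            rules[name] = "(" + " | ".join(tags) + ")*"
    return rules


if __name__ == "__main__":
    docs = [
        "<lib><book><title>A</title><author>X</author>"
        "<author>Y</author></book></lib>",
        "<lib><book><title>B</title><author>Z</author></book>"
        "<book><title>C</title><author>W</author></book></lib>",
    ]
    for element, model in infer_dtd(docs).items():
        print(f"<!ELEMENT {element} {model}>")
```

Collapsing repeated siblings before comparing sequences keeps the generated rules short, which is loosely in the spirit of the repetition constraint mentioned in the abstract. A real system would also need to handle optional elements, attributes, and mixed content.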

Discipline

Databases and Information Systems | Numerical Analysis and Scientific Computing

Research Areas

Data Management and Analytics

Publication

5th ACM Conference on Digital Libraries (DL00)

Identifier

10.1145/336597.336638

Publisher

ACM

City or Country

San Antonio, Texas

Additional URL

http://portal.acm.org/citation.cfm?id=336638

This document is currently not available here.
