Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
6-2000
Abstract
To realize a wide range of applications (including digital libraries) on the Web, a more structured way of accessing the Web is required, and this requirement can be facilitated by the use of the XML standard. In this paper, we propose a general framework for reverse engineering (or re-engineering) the underlying structures, i.e., the DTD, from a collection of similarly structured XML documents when they share some common but unknown DTDs. The essential data structures and algorithms for DTD generation have been developed, and experiments on real Web collections have been conducted to demonstrate their feasibility. In addition, we propose a method of imposing a constraint on the repetitiveness of elements in a DTD rule to further simplify the generated DTD without compromising its correctness.
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing
Publication
5th ACM Conference on Digital Libraries (DL00)
Identifier
10.1145/336597.336638
Publisher
ACM
City or Country
San Antonio, Texas
Citation
HUE, Moh Chuang; LIM, Ee Peng; and NG, Wee-Keong. Re-engineering structures from web documents. (2000). 5th ACM Conference on Digital Libraries (DL00).
Available at: https://ink.library.smu.edu.sg/sis_research/966
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
http://portal.acm.org/citation.cfm?id=336638
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons