Data model for warehousing historical web information
Publication Type
Journal Article
Publication Date
3-2003
Abstract
In this paper, we present a temporal web data model designed for warehousing historical data from the World Wide Web (WWW). As the Web is now populated with a large volume of information, it has become necessary to capture selected portions of web information in a data warehouse that supports further information processing such as data extraction, data classification, and data mining. Nevertheless, due to the unstructured and dynamic nature of the Web, the traditional relational model and its temporal variants cannot be used to build such a data warehouse. We therefore propose a temporal web data model that represents web documents and their connectivities in the form of temporal web tables. To represent web data that evolve with time, a visible time interval is associated with each web document. To manipulate temporal web tables, we have defined a set of web operators with capabilities ranging from extracting WWW information into web tables to merging information from different web tables. We further illustrate the use of our temporal web data model using some realistic motivating examples.
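The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the paper's formal model or operator set: the names `WebDocument`, `Link`, `TemporalWebTable`, `snapshot`, and `merge` are hypothetical, timestamps are assumed to be plain integers, and the visible time interval is assumed to be half-open (`visible_from` inclusive, `visible_to` exclusive, with `None` meaning the document is still visible).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class WebDocument:
    """One version of a web document with its visible time interval (assumed half-open)."""
    url: str
    content: str
    visible_from: int                  # hypothetical integer timestamp, inclusive
    visible_to: Optional[int] = None   # exclusive; None means still visible

    def visible_at(self, t: int) -> bool:
        return self.visible_from <= t and (self.visible_to is None or t < self.visible_to)


@dataclass(frozen=True)
class Link:
    """Connectivity between two documents, identified by URL (an assumption of this sketch)."""
    source_url: str
    target_url: str


@dataclass
class TemporalWebTable:
    """A collection of document versions and the links between them."""
    documents: List[WebDocument] = field(default_factory=list)
    links: List[Link] = field(default_factory=list)

    def snapshot(self, t: int) -> "TemporalWebTable":
        """Keep only documents visible at time t and links whose endpoints both survive."""
        docs = [d for d in self.documents if d.visible_at(t)]
        urls = {d.url for d in docs}
        links = [l for l in self.links if l.source_url in urls and l.target_url in urls]
        return TemporalWebTable(docs, links)


def merge(a: TemporalWebTable, b: TemporalWebTable) -> TemporalWebTable:
    """Union of two tables, dropping exact duplicates while preserving order."""
    docs = list(dict.fromkeys(a.documents + b.documents))
    links = list(dict.fromkeys(a.links + b.links))
    return TemporalWebTable(docs, links)
```

In this sketch, two versions of the same URL coexist in one table as separate `WebDocument` values with adjacent intervals, and `snapshot(t)` recovers the state of the warehoused pages at a given time.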
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing
Publication
Information and Software Technology
Volume
45
Issue
6
First Page
315
Last Page
334
ISSN
0950-5849
Identifier
10.1016/S0950-5849(03)00019-3
Publisher
Elsevier
Citation
LIM, Ee Peng; CAO, Yinyan; and NG, Wee-Keong.
Data model for warehousing historical web information. (2003). Information and Software Technology. 45, (6), 315-334.
Available at: https://ink.library.smu.edu.sg/sis_research/67
Additional URL
http://doi.org/10.1016/S0950-5849(03)00019-3