Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
12-2005
Abstract
Website archival refers to the task of monitoring and storing snapshots of website(s) for future retrieval and analysis. This task is particularly important for websites whose content changes over time, with older information constantly overwritten by newer information. In this paper, we propose WEBARC, a set of software tools that allows users to construct a logical structure for a website to be archived. Classifiers are trained to determine relevant web pages and their categories, and are subsequently used in website downloading. The archival schedule can be specified and executed by a scheduler. A website viewer is also developed to browse one or more versions of archived web pages.
Keywords
Scheduling, Downloading, World wide web, Internet, Classification, Software tool, Surveillance, Monitoring, Electronic library
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing
Publication
Digital Libraries: Implementing Strategies and Sharing Experiences:
Volume
3815
First Page
406
Last Page
410
ISBN
9783540322917
Identifier
10.1007/11599517_49
Publisher
Springer Verlag
City or Country
Bangkok, Thailand
Citation
LIM, Ee Peng and MARISSA, Maria. WebArc: Website Archival using a structured approach. (2005). Digital Libraries: Implementing Strategies and Sharing Experiences:. 3815, 406-410.
Available at: https://ink.library.smu.edu.sg/sis_research/891
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 International License.
Additional URL
http://doi.org/10.1007/11599517_49
Included in
Databases and Information Systems Commons, Numerical Analysis and Scientific Computing Commons