Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

12-2005

Abstract

Website archival refers to the task of monitoring and storing snapshots of one or more websites for future retrieval and analysis. This task is particularly important for websites whose content changes over time, with older information constantly overwritten by newer information. In this paper, we propose WEBARC, a set of software tools that allow users to construct a logical structure for a website to be archived. Classifiers are trained to determine relevant web pages and their categories, and are subsequently used in website downloading. The archival schedule can be specified and executed by a scheduler. A website viewer is also developed to browse one or more versions of archived web pages.
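The abstract defines website archival as storing snapshots so that older versions of a page stay retrievable after the live page is overwritten. The Python sketch below illustrates only that storage-and-retrieval idea. It is not WEBARC's implementation, and the class SnapshotArchive and its methods store, versions, and as_of are hypothetical names. It assumes snapshots arrive in chronological order (for example, from a scheduler) and skips a snapshot when the page content is unchanged since the previous one.

```python
from bisect import bisect_right
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional, Tuple


@dataclass
class SnapshotArchive:
    """Hypothetical in-memory archive mapping each URL to its time-ordered snapshots."""

    _versions: Dict[str, List[Tuple[datetime, str]]] = field(default_factory=dict)

    def store(self, url: str, fetched_at: datetime, content: str) -> bool:
        """Record a snapshot; return False if the page is unchanged since the last one."""
        versions = self._versions.setdefault(url, [])
        if versions and fetched_at < versions[-1][0]:
            # Assumption: a scheduler delivers snapshots in chronological order.
            raise ValueError("snapshots must be stored in chronological order")
        if versions and versions[-1][1] == content:
            return False  # identical to the latest version: skip a redundant copy
        versions.append((fetched_at, content))
        return True

    def versions(self, url: str) -> List[datetime]:
        """List the timestamps of all stored versions of a URL, oldest first."""
        return [ts for ts, _ in self._versions.get(url, [])]

    def as_of(self, url: str, when: datetime) -> Optional[str]:
        """Return the latest snapshot taken at or before `when`, or None if none exists."""
        versions = self._versions.get(url, [])
        # Binary search over the sorted timestamps for the last one <= when.
        i = bisect_right(self.versions(url), when)
        return versions[i - 1][1] if i else None
```

For example, storing "v1" on 1 December, "v1" again on 2 December, and "v2" on 3 December keeps only two versions, and as_of for 2 December returns "v1". A real archive would persist snapshots to disk and also capture page assets; the in-memory dictionary here only keeps the example self-contained.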

Keywords

Scheduling, Downloading, World wide web, Internet, Classification, Software tool, Surveillance, Monitoring, Electronic library

Discipline

Databases and Information Systems | Numerical Analysis and Scientific Computing

Publication

Digital Libraries: Implementing Strategies and Sharing Experiences:

Volume

3815

First Page

406

Last Page

410

ISBN

9783540322917

Identifier

10.1007/11599517_49

Publisher

Springer Verlag

City or Country

Bangkok, Thailand

Additional URL

http://doi.org/10.1007/11599517_49
