Fault Tolerance for Parallel Applications through Replication

Publication Type

Conference Proceeding Article

Publication Date

September 1997

Abstract

An efficient fault-tolerant model for parallel computing on workstation clusters, based on the technique of replication, is proposed. The model is built on top of a runtime system that supports resource allocation for parallel applications running on heterogeneous workstation clusters. Guided by the results of resource allocation, replicated parallel applications can minimize their resource consumption through runtime reconfiguration. In addition, checkpointed states are transferred only among the replicated applications, so no expensive disk read/write operations are required.

Discipline

Databases and Information Systems | Numerical Analysis and Scientific Computing

Publication

ICICS 1997: Proceedings of the 1st International Conference on Information, Communication and Signal Processing, 9-12 September, Singapore

ISBN

0780336763

Identifier

10.1109/ICICS.1997.652234

Publisher

IEEE

City or Country

Singapore

Additional URL

http://dx.doi.org/10.1109/ICICS.1997.652234
