Fault Tolerance for Parallel Applications through Replication
Publication Type
Conference Proceeding Article
Publication Date
9-1997
Abstract
An efficient fault-tolerant model for parallel computing on workstation clusters is proposed, based on the technique of replication. The model is built on top of a runtime system that supports resource allocation for parallel applications running on heterogeneous workstation clusters. Guided by the results of resource allocation, replicated parallel applications can minimize their resource consumption through runtime reconfiguration. In addition, checkpointed states are transferred only among the replicated applications, so no expensive disk read/write operations are required.
Discipline
Databases and Information Systems | Numerical Analysis and Scientific Computing
Publication
ICICS 1997: Proceedings of the 1st International Conference on Information, Communication and Signal Processing, 9-12 September, Singapore
ISBN
0780336763
Identifier
10.1109/ICICS.1997.652234
Publisher
IEEE
City or Country
Singapore
Citation
SHUM, Kam Hong. Fault Tolerance for Parallel Applications through Replication. (1997). ICICS 1997: Proceedings of the 1st International Conference on Information, Communication and Signal Processing, 9-12 September, Singapore.
Available at: https://ink.library.smu.edu.sg/sis_research/1054
Additional URL
http://dx.doi.org/10.1109/ICICS.1997.652234