Publication Type
Conference Proceeding Article
Version
publishedVersion
Publication Date
10-2023
Abstract
Soft errors have become one of the main concerns for the resilience of HPC applications, as these errors can cause HPC applications to produce serious outcomes such as silent data corruption (SDC). Many approaches have been proposed to analyze the resilience of HPC applications. However, existing studies rarely address the challenge of perceiving analysis results. Specifically, resilience analysis techniques often produce a massive volume of unstructured data, and this non-intuitive raw data makes it difficult for programmers to perform resilience analysis. Furthermore, different analysis models produce diverse results with multiple levels of detail, which creates obstacles to comparing and exploring the resilience of an HPC program execution. To this end, we present Visilience, an interactive VISual resILIENCE analysis framework that helps programmers analyze the resilience of HPC applications. In particular, Visilience leverages an effective visualization approach, the Control Flow Graph (CFG), to present a function execution. Furthermore, three widely used models for resilience analysis (i.e., Y-Branch, IPAS, and TRIDENT) are seamlessly integrated into the framework for resilience analysis and result comparison. Multiple case studies have been conducted to demonstrate the effectiveness of the proposed framework.
Keywords
Error Resilience, Visualization, Visual Analytics, Control Flow Graph
Discipline
Information Security | Software Engineering
Research Areas
Intelligent Systems and Optimization
Publication
2023 IEEE 28th Pacific Rim International Symposium on Dependable Computing (PRDC): Singapore, October 24-27: Proceedings
First Page
250
Last Page
256
ISBN
9798350358766
Identifier
10.1109/PRDC59308.2023.00041
Publisher
IEEE
City or Country
Piscataway, NJ
Citation
JIANG, Hailong; RUAN, Shaolun; FANG, Bo; WANG, Yong; and GUAN, Qiang.
Visilience: An interactive visualization framework for resilience analysis using control-flow graph. (2023). 2023 IEEE 28th Pacific Rim International Symposium on Dependable Computing (PRDC): Singapore, October 24-27: Proceedings. 250-256.
Available at: https://ink.library.smu.edu.sg/sis_research/8602
Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.
Additional URL
https://doi.org/10.1109/PRDC59308.2023.00041