Publication Type

Conference Proceeding Article

Version

publishedVersion

Publication Date

10-2023

Abstract

Soft errors have become one of the main concerns for the resilience of HPC applications, as they can cause serious outcomes such as silent data corruption (SDC). Many approaches have been proposed to analyze the resilience of HPC applications. However, existing studies rarely address the challenges of perceiving analysis results. Specifically, resilience analysis techniques often produce a massive volume of unstructured data, and this non-intuitive raw data makes it difficult for programmers to perform resilience analysis. Furthermore, different analysis models produce diverse results at multiple levels of detail, which makes it hard to compare and explore the resilience of an HPC program's execution. To this end, we present Visilience, an interactive VISual resILIENCE analysis framework that helps programmers analyze the resilience of HPC applications. In particular, Visilience uses a Control Flow Graph (CFG) visualization to present a function's execution. Furthermore, three widely used resilience analysis models (i.e., Y-Branch, IPAS, and TRIDENT) are integrated into the framework for resilience analysis and result comparison. Multiple case studies demonstrate the effectiveness of Visilience.

Keywords

Error Resilience, Visualization, Visual Analytics, Control Flow Graph

Discipline

Information Security | Software Engineering

Research Areas

Intelligent Systems and Optimization

Publication

2023 IEEE 28th Pacific Rim International Symposium on Dependable Computing (PRDC): Singapore, October 24-27: Proceedings

First Page

250

Last Page

256

ISBN

9798350358766

Identifier

10.1109/PRDC59308.2023.00041

Publisher

IEEE

City or Country

Piscataway, NJ

Additional URL

https://doi.org/10.1109/PRDC59308.2023.00041
