SC20 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Log-Based Identification, Classification, and Behavior Prediction of HPC Applications


Workshop: HPCSYSPROS20

Authors: Ryan D. Lewis (Northern Illinois University, Argonne National Laboratory Leadership Computing Facility); Zhengchun Liu, Rajkumar Kettimuthu, and Michael E. Papka (Argonne National Laboratory (ANL))


Abstract: Leadership supercomputers, such as those operated by the Argonne Leadership Computing Facility (ALCF), provide an important avenue for scientific exploration and discovery, enabling simulation, data analysis and visualization, and artificial intelligence at massive scale. As we move into the exascale supercomputing era in 2021 with the advent of Aurora, Frontier, and other exascale machines, we need to understand the interactions between the applications being run and the hardware they run on in order to optimize the use of these expensive, high-demand resources.

In previous work, we analyzed a collection of production machine scheduling and performance logs to better understand application behaviors and characteristics. This work further refines our understanding of how scientific users leverage leadership computing resources. We show that, for certain data analysis tasks, system-level hardware performance counters can serve as a lightweight, low-overhead alternative to more performance-intensive benchmarking and logging instrumentation. We also demonstrate a method for predicting application runtimes on leadership computing resources using data gathered from logging sources at submission.




