A Case Study and Characterization of a Many-Socket, Multi-Tier NUMA HPC Platform

SC20 Proceedings

Workshop: HiPar20: Workshop on Hierarchical Parallelism for Exascale Computing

Authors: Connor Imes (University of Southern California), Steven Hofmeyr (Lawrence Berkeley National Laboratory), and Dong In D. Kang and John Paul Walters (University of Southern California)

Abstract: As the number of processor cores and sockets on HPC compute nodes increases and systems expose more hierarchical non-uniform memory access (NUMA) architectures, efficiently scaling applications within even a single shared memory system is becoming more challenging. It is now common for HPC compute nodes to have two or more sockets and dozens of cores, but future generation systems may contain an order of magnitude more of each. We conduct experiments on a state-of-the-art Intel Xeon Platinum system with 12 processor sockets, totaling 288 cores (576 hardware threads), arranged in a multi-tier NUMA hierarchy. Platforms of this scale and memory hierarchy are uncommon today, providing us with a unique opportunity to empirically evaluate, rather than model or simulate, an architecture potentially representative of future HPC compute nodes. We quantify the platform’s multi-tier NUMA patterns, then evaluate its suitability for HPC workloads using a modern HPC metagenome assembler application as a case study, along with other HPC benchmarks that use a variety of parallelization techniques, to characterize the system’s performance, scalability, I/O patterns, and performance/power behavior. Our results demonstrate near-perfect scaling for embarrassingly parallel and weak scaling workloads, but challenges for random memory access workloads. For the latter, we find poor scaling performance with default scheduling approaches (e.g., those that do not pin threads), suggesting that userspace or kernel schedulers may require changes to better manage the multi-tier NUMA hierarchies of very large shared memory platforms.
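
For readers unfamiliar with thread pinning, the following is a minimal sketch of what restricting a process to one CPU looks like on Linux. It is an illustration only, not the authors' method or tooling: it assumes a Linux host and Python's `os.sched_setaffinity`, and the helper name `pin_to_cpu` is hypothetical.

```python
import os


def pin_to_cpu(cpu_index):
    """Restrict the caller to a single CPU and return the resulting affinity set.

    Assumption: Linux only, since os.sched_getaffinity/os.sched_setaffinity
    are not available on other platforms. pin_to_cpu is a hypothetical helper.
    """
    # CPUs the caller is currently allowed to run on (pid 0 means "the caller").
    allowed = sorted(os.sched_getaffinity(0))
    # Wrap the index so any non-negative value maps to an allowed CPU.
    target = allowed[cpu_index % len(allowed)]
    # After this call the scheduler may no longer migrate the caller elsewhere.
    os.sched_setaffinity(0, {target})
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    original = os.sched_getaffinity(0)
    print("before:", sorted(original))
    print("after: ", sorted(pin_to_cpu(0)))
    # Restore the original affinity so later work is not confined to one CPU.
    os.sched_setaffinity(0, original)
```

Pinning keeps a worker on a known core, which makes it possible to reason about which NUMA node its memory accesses are likely to hit. The mapping from CPU indices to sockets and NUMA tiers is platform specific and is not modeled here.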
