SC20 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

Implementation and Numerical Techniques for One Eflop/s HPL-AI Benchmark on Fugaku


Workshop: 11th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems

Authors: Toshiyuki Imamura, Shuhei Kudo, Keigo Nitadori, and Takuya Ina (RIKEN Center for Computational Science (R-CCS))


Abstract: Our HPL-AI benchmark run on the supercomputer Fugaku received an award at the 55th Top500. The effective performance was 1.42 EFlop/s, the world's first result to break the exascale barrier in a floating-point arithmetic benchmark. Because HPL-AI is brand new and has no reference code for large systems, several challenges emerge when running the benchmark at large scale, particularly from a low-precision numerical viewpoint. Simply replacing FP64 operations with FP32 or FP16 operations is not sufficient. At a minimum, careful numerical analysis of lower-precision arithmetic is needed, together with optimization techniques for computing at the scale of Fugaku. This study presents technical analyses and insights on accuracy issues, implementation, and performance improvement, and reports on the exascale benchmark on Fugaku.
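
The remark that lower precision alone is not sufficient can be illustrated with mixed-precision iterative refinement, the general idea behind HPL-AI style solvers: factorize in low precision, then recover accuracy with residual corrections computed in FP64. The following is a minimal sketch under stated assumptions, not the authors' Fugaku implementation. FP32 is emulated by rounding through `struct`, all function names (`fp32`, `lu_factor_fp32`, `lu_solve_fp32`, `refine`) are hypothetical, and the small test matrix is assumed to be diagonally dominant so that LU without pivoting is safe.

```python
import struct

def fp32(x):
    """Round a Python float (FP64) to the nearest FP32 value (emulation)."""
    return struct.unpack("f", struct.pack("f", x))[0]

def lu_factor_fp32(A):
    """LU without pivoting, every result rounded to FP32.
    Assumption: A is diagonally dominant, so no pivoting is needed."""
    n = len(A)
    LU = [[fp32(v) for v in row] for row in A]
    for k in range(n):
        for i in range(k + 1, n):
            LU[i][k] = fp32(LU[i][k] / LU[k][k])
            for j in range(k + 1, n):
                LU[i][j] = fp32(LU[i][j] - fp32(LU[i][k] * LU[k][j]))
    return LU

def lu_solve_fp32(LU, b):
    """Forward and backward substitution with FP32 rounding."""
    n = len(LU)
    y = [0.0] * n
    for i in range(n):
        s = fp32(b[i])
        for j in range(i):
            s = fp32(s - fp32(LU[i][j] * y[j]))
        y[i] = s
    x = [0.0] * n
    for i in reversed(range(n)):
        s = y[i]
        for j in range(i + 1, n):
            s = fp32(s - fp32(LU[i][j] * x[j]))
        x[i] = fp32(s / LU[i][i])
    return x

def residual(A, x, b):
    """Residual r = b - A x evaluated in full FP64."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x))
            for row, bi in zip(A, b)]

def refine(A, b, iters=5):
    """Low-precision solve followed by FP64 residual corrections.
    Returns the solution and the max-norm residual before each step
    plus the final one."""
    LU = lu_factor_fp32(A)
    x = lu_solve_fp32(LU, b)          # accuracy limited by FP32
    history = []
    for _ in range(iters):
        r = residual(A, x, b)         # FP64 residual
        history.append(max(abs(v) for v in r))
        d = lu_solve_fp32(LU, r)      # cheap low-precision correction
        x = [xi + di for xi, di in zip(x, d)]  # FP64 update
    history.append(max(abs(v) for v in residual(A, x, b)))
    return x, history

# Small diagonally dominant system with a known solution (illustrative data).
A = [[10.0, 1.0, 2.0, 0.0],
     [1.0, 12.0, 0.0, 3.0],
     [2.0, 0.0, 9.0, 1.0],
     [0.0, 3.0, 1.0, 11.0]]
x_true = [0.1, -0.7, 1.3, 2.9]
b = [sum(aij * xj for aij, xj in zip(row, x_true)) for row in A]

x, history = refine(A, b)
print("residual per step:", history)
```

The first entry of `history` is the residual of the purely low-precision solve, and the later entries show how the FP64 corrections reduce it. The expensive factorization runs only once in low precision, which is where the speedup of this approach comes from.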




