BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210402T160544Z
LOCATION:Track 10
DTSTART;TZID=America/New_York:20201113T110000
DTEND;TZID=America/New_York:20201113T112500
UID:submissions.supercomputing.org_SC20_sess230_ws_waccpd102@linklings.com
SUMMARY:ADELUS: A Performance-Portable Dense LU Solver for Distributed-Mem
 ory Hardware-Accelerated Systems
DESCRIPTION:Workshop\n\nADELUS: A Performance-Portable Dense LU Solver for
  Distributed-Memory Hardware-Accelerated Systems\n\nDang, Kotulski, Rajama
 nickam\n\nSolving dense systems of linear equations is essential in applic
 ations encountered in physics, mathematics and engineering. This paper des
 cribes our current efforts toward the development of the ADELUS package fo
 r current and next-generation distributed, accelerator-based high-performa
 nce computing platforms. The package solves dense linear systems using par
 tial pivoting LU factorization on distributed-memory systems with CPUs/GPU
 s. The matrix is block-mapped onto distributed memory on CPUs/GPUs and is 
 solved as if it were torus-wrapped for an optimal balance of computation a
 nd communication. A permutation operation is performed to restore the resu
 lts so the torus-wrap distribution is transparent to the user. This packag
 e targets performance portability by leveraging the abstractions provided 
 in the Kokkos and Kokkos Kernels libraries. Comparison of the performance 
 gains versus the state-of-the-art SLATE and DPLASMA GESV functionalities o
 n the Summit supercomputer is provided. Preliminary performance results f
 rom large-scale electromagnetic simulations using ADELUS are also presente
 d. The solver achieves 7.7 petaflops on 7600 GPUs of the Sierra supercompu
 ter, translating to 16.9% efficiency.\n\nRegistration Category: Workshop R
 eg Pass
END:VEVENT
END:VCALENDAR

