BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210402T160554Z
LOCATION:Track 2
DTSTART;TZID=America/New_York:20201112T101000
DTEND;TZID=America/New_York:20201112T103000
UID:submissions.supercomputing.org_SC20_sess208_ws_pmbss103@linklings.com
SUMMARY:Performance Modeling of Streaming Kernels and Sparse Matrix-Vector
  Multiplication on A64FX
DESCRIPTION:Workshop\n\nPerformance Modeling of Streaming Kernels and Spar
 se Matrix-Vector Multiplication on A64FX\n\nAlappat, Laukemann, Gruber, Ha
 ger, Wellein...\n\nThe A64FX CPU powers the current #1 supercomputer on th
 e Top500 list. Although it is a traditional cache-based multicore processo
 r, its peak performance and memory bandwidth rival accelerator devices. Ge
 nerating efficient code for such a new architecture requires a good unders
 tanding of its performance features. Using these features, we construct th
 e Execution-Cache-Memory (ECM) performance model for the A64FX processor i
 n the FX700 supercomputer and validate it using streaming loops. We also i
 dentify architectural peculiarities and derive optimization hints. Applyin
 g the ECM model to sparse matrix-vector multiplication (SpMV), we motivate
  why the CRS matrix storage format is inappropriate and how the SELL-C-σ
  format with suitable code optimizations can achieve bandwidth saturati
 on for SpMV.\n\nRegistration Category: Workshop Reg Pass
END:VEVENT
END:VCALENDAR