BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210402T160556Z
LOCATION:Track 1
DTSTART;TZID=America/New_York:20201111T123000
DTEND;TZID=America/New_York:20201111T125500
UID:submissions.supercomputing.org_SC20_sess193_ws_hipar106@linklings.com
SUMMARY:Flexible Runtime Reconfigurable Computing Overlay Architecture and
  Optimization for Dataflow Applications
DESCRIPTION:Workshop\n\nFlexible Runtime Reconfigurable Computing Overlay 
 Architecture and Optimization for Dataflow Applications\n\nShah, Carrion S
 chafer\n\nMany computationally intensive applications are accelerated on F
 PGAs following the stream computing, also called dataflow computing, parad
 igm. This entails that data is streamed through different components of a
  given application in wide deep pipelines to maximize throughput. One of t
 he main drawbacks of this computing paradigm is that it consumes a large n
 umber of hardware resources.\n\nThus, in this work, we propose a partial r
 untime reconfigurable overlay on which to map any computationally intensiv
 e application given as a behavioral description for High-Level Synthesis (
 HLS) composed of multiple stages, which would typically fit the stream com
 puting paradigm. This overlay uses the FPGA's internal BlockRAM to store t
 he intermediate results of each stage in order to speed up the computation
  and time-multiplexes the different stages by reconfiguring the computatio
 nal part.\n\nThis work also includes a design methodology to optimize the 
 micro-architectural implementation of each stage in order to balance the d
 ataflow architecture and to generate systems with unique area vs. perform
 ance trade-offs. The proposed architecture and methodology have been proto
 typed on a Xilinx ZedBoard with a Zynq FPGA using a variety of synthetic d
 ataflows. A case study of a JPEG encoder is presented to highlight the ben
 efits of the approach. The overlay will be made public and open source aft
 er the publication of this paper.\n\nTag: Extreme Scale Computing, Hetero
 geneous Systems, Parallel Programming Languages, Libraries, and Models, Po
 rtability, Resource Management and Scheduling, Scalable Computing\n\nRegis
 tration Category: Workshop Reg Pass
END:VEVENT
END:VCALENDAR