BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210402T160544Z
LOCATION:Poster Module
DTSTART;TZID=America/New_York:20201119T083000
DTEND;TZID=America/New_York:20201119T170000
UID:submissions.supercomputing.org_SC20_sess337_rpost125@linklings.com
SUMMARY:A Workflow Hierarchy-Aware Fault Tolerance System
DESCRIPTION:Posters, Research Posters\n\nA Workflow Hierarchy-Aware Fault 
 Tolerance System\n\nBehera, Ahn, Herbein, Mueller, Rountree\n\nComplex sci
 entific workflows present unprecedented challenges to fault tolerance supp
 ort in high-performance computing (HPC). While existing solutions such as 
 checkpoint/restart (C/R) and resource over-provisioning work well at the a
 pplication level, they do not scale to the demands of complex workflows.
  As workflows are composed of a large variety of components, they must
  detect, propagate and recover from a fault in a highly coordinated way,
  lest the handling action itself do more harm than good. We propose
  Workflow Hierarchy-
 aware Exception Specification Language (WHESL), a novel solution that allo
 ws a modern workflow to specify and handle faults and exceptions among its
  disparate components in an easy and coordinated fashion. Our preliminary 
 study using our prototype built on top of Flux, a next-generation hierarch
 ical resource and job management system (RJMS), shows that WHESL can signi
 ficantly extend the traditional HPC fault tolerance support for complex wo
 rkflows.\n\nRegistration Category: Tech Program Reg Pass, Exhibits Reg Pas
 s
END:VEVENT
END:VCALENDAR

