BEGIN:VCALENDAR
VERSION:2.0
PRODID:Linklings LLC
BEGIN:VTIMEZONE
TZID:America/New_York
X-LIC-LOCATION:America/New_York
BEGIN:DAYLIGHT
TZOFFSETFROM:-0500
TZOFFSETTO:-0400
TZNAME:EDT
DTSTART:19700308T020000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=2SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:-0400
TZOFFSETTO:-0500
TZNAME:EST
DTSTART:19701101T020000
RRULE:FREQ=YEARLY;BYMONTH=11;BYDAY=1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTAMP:20210402T160559Z
LOCATION:Track 8
DTSTART;TZID=America/New_York:20201113T114000
DTEND;TZID=America/New_York:20201113T121500
UID:submissions.supercomputing.org_SC20_sess227_ws_pyhpc105@linklings.com
SUMMARY:Data Engineering for HPC with Python
DESCRIPTION:Workshop\n\nData Engineering for HPC with Python\n\nAbeykoon, 
 Perera, Widanage, Kamburugamuve, Kanewala...\n\nData engineering is becomi
 ng an increasingly important part of scientific discovery with the adopt
 ion of deep learning and machine learning. Data engineering deals with a v
 ariety of data formats, storage, data extraction, transformation and data 
 movements. One goal of data engineering is to transform data from its
  original form into the vector/matrix/tensor formats accepted by deep
  learning and machin
 e learning applications. There are many structures such as tables, graphs 
 and trees to represent data in these data engineering phases. Among them, 
 tables are a versatile and commonly used format to load and process data. 
 In this paper, we present a distributed Python API based on table abstract
 ion for representing and processing data. Unlike existing state-of-the-art
  data engineering tools written purely in Python, our solution adopts high
 -performance compute kernels in C++, with an in-memory table representatio
 n with Cython-based Python bindings. In the core system, we use MPI for di
 stributed memory computations with a data-parallel approach for processing
  large datasets in HPC clusters.\n\nRegistration Category: Workshop Reg Pa
 ss
END:VEVENT
END:VCALENDAR

