High-Bypass Learning: Automated Detection of Tumor Cells that Significantly Impact Drug Response

SC20 Proceedings

Workshop: Machine Learning in HPC Environments

Authors: Justin Wozniak (Argonne National Laboratory (ANL), University of Chicago); Hyunseung Yoo (Argonne National Laboratory (ANL)); Jamaludin Mohd-Yusof (Los Alamos National Laboratory); and Bogdan Nicolae, Richard Turgeon, Nick Collier, Jonathan Ozik, Thomas Brettin, and Rick Stevens (Argonne National Laboratory (ANL))

Abstract: Machine learning in biomedicine relies on the availability of large, high-quality data sets. These corpora are used for training statistical or deep-learning-based models that can be validated against other data sets and ultimately used to guide decisions. The quality of these data sets is an essential component of the quality of the models and their decisions. Thus, identifying and inspecting outlier data is critical for evaluating, curating, and using biomedical data sets. Many techniques are available to look for outlier data, but it is not clear how to evaluate the impact of such data on highly complex deep learning methods. In this paper, we use deep learning ensembles and workflows to construct a system for automatically identifying data subsets that have a large impact on the trained models. These effects can be quantified and presented to the user for further inspection, which could improve data quality overall. We then present results from running this method on the near-exascale Summit supercomputer.
