SC20 Proceedings

The International Conference for High Performance Computing, Networking, Storage, and Analysis

EventGraD: Event-Triggered Communication in Parallel Stochastic Gradient Descent


Workshop: Machine Learning in HPC Environments

Authors: Soumyadip Ghosh and Vijay Gupta (University of Notre Dame)


Abstract: Communication in parallel systems consumes a significant amount of time and energy, and it often turns out to be a bottleneck in distributed machine learning. In this paper, we present EventGraD, an algorithm with event-triggered communication in parallel stochastic gradient descent. The main idea of this algorithm is to replace the requirement of communicating at every epoch with communicating only in certain epochs, when necessary. In particular, the parameters are communicated only in the event that the change in their values exceeds a threshold. The threshold for a parameter is chosen adaptively based on the rate of change of the parameter. The adaptive threshold ensures that the scheme can be applied to different models on different datasets without any change. We focus on data-parallel training of a popular convolutional neural network on the MNIST dataset and show that EventGraD can reduce the communication load by up to 70% while retaining the same level of accuracy.
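The event trigger described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the class `EventTrigger`, the `horizon` factor, and the exact threshold rule (horizon times the norm of the change since the last send, divided by the number of steps elapsed) are hypothetical assumptions chosen to match the abstract's description of a threshold "based on the rate of change of the parameter". The paper's precise rule may differ. The toy driver runs plain gradient descent on f(x) = 0.5 * x^2 and counts how many steps would trigger communication; receivers are assumed to keep using the last value they received between events.

```python
import math
from typing import List, Optional


def l2_norm(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))


class EventTrigger:
    """Hypothetical per-parameter event trigger (illustrative sketch only).

    A parameter is sent only when its change since the last send exceeds an
    adaptive threshold: horizon * (estimated rate of change per step).
    """

    def __init__(self, horizon: float = 1.0) -> None:
        self.horizon = horizon  # assumed scaling factor for the threshold
        self.last_sent: Optional[List[float]] = None
        self.last_sent_step = 0
        self.threshold = 0.0  # zero until a rate of change can be estimated

    def maybe_send(self, value: List[float], step: int) -> bool:
        """Return True (and update state) if this step triggers a send."""
        if self.last_sent is None:
            # Always send the initial value so neighbors hold a copy.
            self.last_sent, self.last_sent_step = list(value), step
            return True
        change = l2_norm([a - b for a, b in zip(value, self.last_sent)])
        if change <= self.threshold:
            return False  # no event: neighbors keep using their stale copy
        # Event: re-estimate the rate of change from the last two sends.
        rate = change / (step - self.last_sent_step)
        self.threshold = self.horizon * rate
        self.last_sent, self.last_sent_step = list(value), step
        return True


def simulate(steps: int = 100, lr: float = 0.1, horizon: float = 1.0) -> int:
    """Gradient descent on f(x) = 0.5 * x^2; count steps that trigger a send."""
    x = [1.0]
    trigger = EventTrigger(horizon)
    sends = 0
    for t in range(steps):
        if trigger.maybe_send(x, t):
            sends += 1
        x = [xi - lr * xi for xi in x]  # the gradient of 0.5 * x^2 is x
    return sends


if __name__ == "__main__":
    print(f"sent on {simulate()} of 100 steps")
```

Under these toy settings the trigger fires on roughly every other step instead of every step. Tying the threshold to the observed rate of change, rather than fixing an absolute value, is what lets the same rule adapt as updates shrink during training, which mirrors the abstract's point that no per-model tuning is needed.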

