An Efficient Task-based All-Reduce for Machine Learning Applications

Zhenyu Li, James Davis, Stephen Jarvis

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

All-Reduce is a collective-combine operation frequently utilised for synchronous parameter updates in parallel machine learning algorithms. The performance of this operation - and subsequently of the algorithm itself - is heavily dependent on its implementation, its configuration, and the supporting hardware on which it is run. Given the pivotal role of all-reduce, a failure in any of these regards will significantly impact the resulting scientific output. In this research we explore the performance of alternative all-reduce algorithms in data-flow graphs and compare these to the commonly used reduce-broadcast approach. We present an architecture and interface for all-reduce in task-based frameworks, and a parallelization scheme for object serialization and computation. We present a concrete, novel application of a butterfly all-reduce algorithm on the Apache Spark framework on a high-performance compute cluster, and demonstrate that the new butterfly algorithm achieves a logarithmic speed-up with respect to the vector length compared with the original reduce-broadcast method: a 9x speed-up is observed for vector lengths of the order of 10⁸. This improvement comprises both algorithmic changes (65%) and parallel-processing optimization (35%). The effectiveness of the new butterfly all-reduce is demonstrated using real-world neural network applications with the Spark framework. For the model-update operation we observe significant speed-ups using the new butterfly algorithm compared with the original reduce-broadcast, for both smaller (Cifar and Mnist) and larger (ImageNet) datasets.
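The paper's Spark implementation is not reproduced in this record. As an illustration of the butterfly (recursive-doubling) exchange pattern the abstract refers to, below is a minimal single-process sketch: in each of log2(p) rounds, worker i combines its partial vector with that of partner i XOR 2^r, so every worker holds the full reduction without a final broadcast. The function name, the NumPy arrays, and the power-of-two worker count are illustrative assumptions, not the authors' code.

```python
import numpy as np

def butterfly_allreduce(vectors):
    """Simulate a butterfly all-reduce over p = 2^k workers.

    In round r, worker i exchanges its partial result with partner
    i XOR 2^r and both sum the two vectors; after log2(p) rounds
    every worker holds the reduction over all workers.
    """
    p = len(vectors)
    assert p > 0 and p & (p - 1) == 0, "butterfly exchange assumes a power-of-two worker count"
    partial = [v.copy() for v in vectors]
    r = 1
    while r < p:
        # Each pair (i, i ^ r) swaps partials and combines them.
        partial = [partial[i] + partial[i ^ r] for i in range(p)]
        r <<= 1
    return partial  # every worker now holds the same reduced vector

# Usage: 8 workers, each holding a local gradient vector.
grads = [np.full(4, i, dtype=np.float64) for i in range(8)]
result = butterfly_allreduce(grads)
assert all(np.array_equal(v, result[0]) for v in result)
print(result[0])  # sum over workers 0..7: [28. 28. 28. 28.]
```

Because every worker finishes with the combined result, this pattern avoids the bottleneck of gathering to a single reducer and re-broadcasting, which is the source of the logarithmic speed-up reported in the abstract.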

Original language: English
Title of host publication: Proceedings of MLHPC 2017
Subtitle of host publication: Machine Learning in HPC Environments - Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis
Publisher: Association for Computing Machinery
ISBN (Electronic): 9781450351379
DOIs
Publication status: Published - 12 Nov 2017
Event: 2017 Machine Learning in HPC Environments, MLHPC 2017 - Denver, United States
Duration: 12 Nov 2017 – 17 Nov 2017

Publication series

Name: Proceedings of MLHPC 2017: Machine Learning in HPC Environments - Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis

Conference

Conference: 2017 Machine Learning in HPC Environments, MLHPC 2017
Country/Territory: United States
City: Denver
Period: 12/11/17 – 17/11/17

Bibliographical note

Funding Information:
This research is supported by Atos IT Services UK Ltd and by the EPSRC Centre for Doctoral Training in Urban Science and Progress (grant no. EP/L016400/1).

Publisher Copyright:
© 2017 Association for Computing Machinery.

Keywords

  • Apache Spark
  • Butterfly All-Reduce
  • Data-flow Frameworks
  • Synchronous Model Training

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Artificial Intelligence
