Abstract
All-reduce is a collective-combine operation frequently utilised for synchronous parameter updates in parallel machine learning algorithms. The performance of this operation, and consequently of the algorithm itself, is heavily dependent on its implementation, its configuration, and the supporting hardware on which it is run. Given the pivotal role of all-reduce, a failure in any of these regards will significantly impact the resulting scientific output. In this research we explore the performance of alternative all-reduce algorithms in data-flow graphs and compare these to the commonly used reduce-broadcast approach. We present an architecture and interface for all-reduce in task-based frameworks, and a parallelization scheme for object serialization and computation. We present a concrete, novel application of a butterfly all-reduce algorithm on the Apache Spark framework on a high-performance compute cluster, and demonstrate the effectiveness of the new butterfly algorithm with a logarithmic speed-up with respect to the vector length compared with the original reduce-broadcast method: a 9x speed-up is observed for vector lengths of the order of 10^8. This improvement comprises both algorithmic changes (65%) and parallel-processing optimization (35%). The effectiveness of the new butterfly all-reduce is demonstrated using real-world neural network applications with the Spark framework. For the model-update operation we observe significant speed-ups using the new butterfly algorithm compared with the original reduce-broadcast, for both smaller (CIFAR and MNIST) and larger (ImageNet) datasets.
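The butterfly (recursive-doubling) pattern behind the reported logarithmic speed-up can be sketched in plain Python. This is an illustrative single-process simulation only, not the paper's Spark implementation: the function name and the in-memory "message exchange" between workers are assumptions made for the example.

```python
import math

def butterfly_allreduce(vectors):
    """Simulate a butterfly (recursive-doubling) all-reduce over p = 2^k workers.

    In round r, worker i exchanges its partial sum with partner i XOR 2^r and
    adds the received vector. After log2(p) rounds every worker holds the full
    element-wise sum, without a central reduce-broadcast bottleneck.
    """
    p = len(vectors)
    assert p > 0 and p & (p - 1) == 0, "worker count must be a power of two"
    state = [list(v) for v in vectors]  # each worker's local partial result
    for r in range(int(math.log2(p))):
        stride = 1 << r
        # All pairwise exchanges in a round happen "simultaneously".
        state = [
            [a + b for a, b in zip(state[i], state[i ^ stride])]
            for i in range(p)
        ]
    return state

# Every worker ends with the same fully reduced vector:
result = butterfly_allreduce([[1, 2], [3, 4], [5, 6], [7, 8]])
# each of the four entries is [16, 20]
```

Compared with reduce-broadcast, where one node receives and rebroadcasts the whole vector, each worker here sends and receives only log2(p) messages, which is the source of the logarithmic scaling behaviour the abstract describes.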
Original language | English |
---|---|
Title of host publication | Proceedings of MLHPC 2017 |
Subtitle of host publication | Machine Learning in HPC Environments - Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis |
Publisher | Association for Computing Machinery |
ISBN (Electronic) | 9781450351379 |
DOIs | |
Publication status | Published - 12 Nov 2017 |
Event | 2017 Machine Learning in HPC Environments, MLHPC 2017 - Denver, United States Duration: 12 Nov 2017 → 17 Nov 2017 |
Publication series
Name | Proceedings of MLHPC 2017: Machine Learning in HPC Environments - Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis |
---|---|
Conference
Conference | 2017 Machine Learning in HPC Environments, MLHPC 2017 |
---|---|
Country/Territory | United States |
City | Denver |
Period | 12/11/17 → 17/11/17 |
Bibliographical note
Funding Information:This research is supported by Atos IT Services UK Ltd and by the EPSRC Centre for Doctoral Training in Urban Science and Progress (grant no. EP/L016400/1).
Publisher Copyright:
© 2017 Association for Computing Machinery.
Keywords
- Apache Spark
- Butterfly All-Reduce
- Data-flow Frameworks
- Synchronous Model Training
ASJC Scopus subject areas
- Computer Networks and Communications
- Artificial Intelligence