Abstract
All-Reduce is a collective-combine operation frequently utilised in synchronous parameter updates in parallel machine learning algorithms. The performance of this operation - and subsequently of the algorithm itself - is heavily dependent on its implementation, its configuration and the supporting hardware on which it is run. Given the pivotal role of all-reduce, a failure in any of these regards will significantly impact the resulting scientific output. In this research we explore the performance of alternative all-reduce algorithms in data-flow graphs and compare these to the commonly used reduce-broadcast approach. We present an architecture and interface for all-reduce in task-based frameworks, and a parallelization scheme for object-serialization and computation. We present a concrete, novel application of a butterfly all-reduce algorithm on the Apache Spark framework on a high-performance compute cluster, and demonstrate the effectiveness of the new butterfly algorithm with a logarithmic speed-up with respect to the vector length compared with the original reduce-broadcast method - a 9x speed-up is observed for vector lengths on the order of 10^8. This improvement comprises both algorithmic changes (65%) and parallel-processing optimization (35%). The effectiveness of the new butterfly all-reduce is demonstrated using real-world neural network applications with the Spark framework. For the model-update operation we observe significant speed-ups using the new butterfly algorithm compared with the original reduce-broadcast, for both smaller (CIFAR and MNIST) and larger (ImageNet) datasets.
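To make the contrast between the two approaches concrete, the following is a minimal single-process sketch in Python. It is not the authors' Spark implementation: the function names, the in-memory list of "workers", the element-wise sum as the combine operation, and the power-of-two worker count are all assumptions made for illustration. It shows the pairwise exchange pattern commonly called a butterfly (recursive-doubling) all-reduce next to a plain reduce-broadcast baseline.

```python
from typing import List


def reduce_broadcast(vectors: List[List[float]]) -> List[List[float]]:
    """Baseline: combine every vector in one place, then copy the result out."""
    total = [sum(column) for column in zip(*vectors)]
    return [list(total) for _ in vectors]


def butterfly_all_reduce(vectors: List[List[float]]) -> List[List[float]]:
    """Simulated butterfly all-reduce using an element-wise sum.

    Assumption for this sketch: the number of workers is a power of two.
    """
    p = len(vectors)
    if p == 0 or p & (p - 1):
        raise ValueError("this sketch assumes a power-of-two number of workers")

    state = [list(v) for v in vectors]
    step = 1
    while step < p:
        # In each round, worker i pairs with worker (i XOR step); both end
        # the round holding the sum of their two current vectors.
        state = [
            [a + b for a, b in zip(state[i], state[i ^ step])]
            for i in range(p)
        ]
        step *= 2
    return state


if __name__ == "__main__":
    workers = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
    print(butterfly_all_reduce(workers))
```

With P workers the butterfly pattern finishes in log2(P) rounds and every worker ends up holding the full result, so no single node has to receive and re-send all vectors as in the baseline. A real distributed version would also have to handle serialization and network transfer, which this sketch leaves out.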
| Original language | English |
|---|---|
| Title of host publication | Proceedings of MLHPC 2017 |
| Subtitle of host publication | Machine Learning in HPC Environments - Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis |
| Publisher | Association for Computing Machinery |
| ISBN (Electronic) | 9781450351379 |
| DOIs | |
| Publication status | Published - 12 Nov 2017 |
| Event | 2017 Machine Learning in HPC Environments, MLHPC 2017 - Denver, United States. Duration: 12 Nov 2017 → 17 Nov 2017 |
Publication series
| Name | Proceedings of MLHPC 2017: Machine Learning in HPC Environments - Held in conjunction with SC 2017: The International Conference for High Performance Computing, Networking, Storage and Analysis |
|---|---|
Conference
| Conference | 2017 Machine Learning in HPC Environments, MLHPC 2017 |
|---|---|
| Country/Territory | United States |
| City | Denver |
| Period | 12 Nov 2017 → 17 Nov 2017 |
Bibliographical note
Funding Information: This research is supported by Atos IT Services UK Ltd and by the EPSRC Centre for Doctoral Training in Urban Science and Progress (grant no. EP/L016400/1).
Publisher Copyright: © 2017 Association for Computing Machinery.
Keywords
- Apache Spark
- Butterfly All-Reduce
- Data-flow Frameworks
- Synchronous Model Training
ASJC Scopus subject areas
- Computer Networks and Communications
- Artificial Intelligence
Fingerprint
Dive into the research topics of 'An Efficient Task-based All-Reduce for Machine Learning Applications'. Together they form a unique fingerprint.