Modernizing production-grade, often legacy applications to take advantage of modern multi-core and many-core architectures can be a difficult and costly undertaking. This is especially true at present, as it is unclear which architectures will dominate future systems. The complexity of these codes can mean that parallelisation for a given architecture requires significant re-engineering. One way to assess the benefit of such an exercise would be to use mini-Applications that are representative of the legacy programs.

In this paper, we investigate different implementations of TeaLeaf, a mini-Application from the Mantevo suite that solves the linear heat conduction equation. TeaLeaf has been ported to many parallel programming models, including OpenMP, CUDA and MPI, among others. It has also been re-engineered to use the OPS embedded DSL and the Kokkos and RAJA template libraries. We use these different implementations to assess the performance portability of each technique on modern multi-core systems.

While manually parallelising the application, targeting and optimizing it for each platform, gives the best performance, this approach has the obvious disadvantage of requiring a separate version for each and every platform of interest. Frameworks such as OPS, Kokkos and RAJA can automatically produce executables of the program that achieve comparable portability. Based on a recently developed performance portability metric, our results show that OPS and RAJA achieve application performance portability scores of 71% and 77%, respectively, for this application.
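The abstract does not define the performance portability metric. Assuming it is the harmonic-mean formulation proposed by Pennycook, Sewall and Lee (an assumption, not stated above), the score for an application over a set of platforms is the harmonic mean of its per-platform efficiencies, and it drops to zero if the application does not run on any one of them. For an *application* performance portability score, each efficiency would be the achieved performance divided by the best observed performance on that platform. The sketch below computes this; the function name and the sample efficiencies are hypothetical and are not results from the paper.

```python
from typing import Optional, Sequence


def performance_portability(efficiencies: Sequence[Optional[float]]) -> float:
    """Harmonic mean of per-platform efficiencies (each in (0, 1]).

    A platform marked None (or with non-positive efficiency) is treated as
    unsupported, in which case the score is defined to be 0.
    """
    if not efficiencies:
        raise ValueError("need at least one platform")
    # Any unsupported platform makes the application non-portable over the set.
    if any(e is None or e <= 0 for e in efficiencies):
        return 0.0
    # Harmonic mean: |H| / sum(1 / e_i)
    return len(efficiencies) / sum(1.0 / e for e in efficiencies)


if __name__ == "__main__":
    # Hypothetical application efficiencies on three platforms
    print(round(performance_portability([0.9, 0.6, 0.8]), 4))  # 0.7448
    # The same application failing to run on the second platform
    print(performance_portability([0.9, None, 0.8]))  # 0.0
```

The harmonic mean is used rather than the arithmetic mean so that a single poorly performing platform pulls the overall score down sharply, which matches the intent of rewarding consistent performance across every platform of interest.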
| Field | Value |
|---|---|
| Title of host publication | Proceedings - 2017 IEEE International Conference on Cluster Computing, CLUSTER 2017 |
| Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
| Number of pages | 8 |
| Publication status | Published - 22 Sept 2017 |
| Event | 2017 IEEE International Conference on Cluster Computing, CLUSTER 2017 - Honolulu, United States |
| Duration | 5 Sept 2017 → 8 Sept 2017 |
| Name | Proceedings - IEEE International Conference on Cluster Computing, ICCC |
| Conference | 2017 IEEE International Conference on Cluster Computing, CLUSTER 2017 |
| Period | 5/09/17 → 8/09/17 |
Bibliographical note
Funding Information:
The OPS project is funded by the UK Engineering and Physical Sciences Research Council projects EP/K038494/1, EP/K038486/1, EP/K038451/1 and EP/K038567/1 on the “Future-proof massively-parallel execution of multi-block applications” project. This research was also funded by the Hungarian Human Resources Development Operational Programme (EFOP-3.6.2-16-2017-00013) and by the János Bolyai Research Scholarship of the Hungarian Academy of Sciences.
This work was supported by the UK Atomic Weapons Establishment under grant CDK0724 (AWE Technical Outreach Programme). Prof. Stephen Jarvis is an AWE William Penney Fellow. This work would not have been possible without the assistance of a number of members of the Applied Computer Science group at AWE, to whom we would like to express our gratitude.
© 2017 IEEE.
- Performance Portability
ASJC Scopus subject areas
- Hardware and Architecture
- Signal Processing