TY - GEN
T1 - Predictive simulation of HPC applications
AU - Hammond, S. D.
AU - Smith, J. A.
AU - Mudalige, G. R.
AU - Jarvis, S. A.
PY - 2009
Y1 - 2009
AB - The architectures which support modern supercomputing machinery are as diverse today as at any point during the last twenty years. The variety of processor core arrangements, threading strategies and the arrival of heterogeneous computation nodes are driving modern-day solutions to petaflop speeds. The increasing complexity of such systems, as well as of codes written to take advantage of the new computational abilities, poses significant challenges for existing techniques which aim to model and analyse the performance of such hardware and software. In this paper we demonstrate the use of post-execution analysis on trace-based profiles to support the construction of simulation-based models. This involves combining the runtime capture of call-graph information with computational timings, which in turn allows representative models of code behaviour to be extracted. The main advantage of this technique is that it largely automates performance model development, which is a burden with existing techniques. We demonstrate the capabilities of our approach using both the NAS Parallel Benchmark suite and a real-world supercomputing benchmark developed by the United Kingdom Atomic Weapons Establishment. The resulting models, developed in less than two hours per code, have a good degree of predictive accuracy. We also show how one of these models can be used to explore the performance of the code on over 16,000 cores, demonstrating the scalability of our solution.
UR - http://www.scopus.com/inward/record.url?scp=70349481819&partnerID=8YFLogxK
U2 - 10.1109/AINA.2009.95
DO - 10.1109/AINA.2009.95
M3 - Conference contribution
AN - SCOPUS:70349481819
SN - 9780769536385
T3 - Proceedings - International Conference on Advanced Information Networking and Applications, AINA
SP - 33
EP - 40
BT - Proceedings - 2009 International Conference on Advanced Information Networking and Applications, AINA 2009
T2 - 2009 International Conference on Advanced Information Networking and Applications, AINA 2009
Y2 - 26 May 2009 through 29 May 2009
ER -