From physics model to results: An optimizing framework for cross-architecture code generation

Blazewicz, Marek; Hinder, Ian; Koppelman, David M.; Brandt, Steven R.; Ciznicki, Milosz; Kierzynka, Michal; Löffler, Frank; Schnetter, Erik; Tao, Jian

doi:10.3233/SPR-130360

Article type: Research Article

Authors: Blazewicz, Marek | Hinder, Ian | Koppelman, David M. | Brandt, Steven R. | Ciznicki, Milosz | Kierzynka, Michal | Löffler, Frank | Schnetter, Erik | Tao, Jian

Affiliations: Applications Department, Poznań Supercomputing & Networking Center, Poznań, Poland | Poznań University of Technology, Poznań, Poland | Max-Planck-Institut für Gravitationsphysik, Albert-Einstein-Institut, Potsdam, Germany | Center for Computation & Technology, Louisiana State University, Baton Rouge, LA, USA | Division of Electrical & Computer Engineering, Louisiana State University, Baton Rouge, LA, USA | Division of Computer Science, Louisiana State University, Baton Rouge, LA, USA | Perimeter Institute for Theoretical Physics, Waterloo, ON, Canada | Department of Physics, University of Guelph, Guelph, ON, Canada

Note: Corresponding author: Marek Blazewicz. E-mail: [email protected]

Abstract: Starting from a high-level problem description in terms of partial differential equations using abstract tensor notation, the Chemora framework discretizes, optimizes, and generates complete high-performance codes for a wide range of compute architectures. Chemora extends the capabilities of Cactus, enabling efficient use of large-scale CPU/GPU systems for complex applications without low-level code tuning. Chemora achieves parallelism through MPI and multi-threading, combining OpenMP and CUDA. Optimizations include high-level code transformations, efficient loop traversal strategies, dynamically selected data and instruction cache usage strategies, and JIT compilation of GPU code tailored to the problem characteristics. The discretization is based on higher-order finite differences on multi-block domains. Chemora's capabilities are demonstrated by simulations of black hole collisions. This problem provides an acid test of the framework, as the Einstein equations contain hundreds of variables and thousands of terms.

Keywords: Automatic parallelization, hybrid computing, GPU computing, parallel application frameworks, numerical methods

Journal: Scientific Programming, vol. 21, no. 1-2, pp. 1-16, 2013

Published: 2013
