Comparative evaluation and case studies of shared-memory and data-parallel execution patterns

[1] This work is supported in part by the National Science Foundation under grants CCR-9102854 and CCR-9400719, by the U.S. Air Force under research agreement FD-204092-64157, by the Air Force Office of Scientific Research under grant AFOSR-95-01-0215, and by a grant from Cray Research. Some of the experiments were conducted on the CM-5 machines at Los Alamos National Laboratory and at the National Center for Supercomputing Applications at the University of Illinois, and on the KSR-1 machines at Cornell University and the University of Washington.
Affiliations: Department of Computer Science, College of William and Mary, Williamsburg, VA 23187‐8795, USA | USWest Advanced Technology, Denver, CO 80503, USA
Abstract: Shared-memory and data-parallel programming models are two important paradigms for scientific applications. Both models provide high-level program abstractions and simple, uniform views of network structures, common features that significantly simplify program coding and debugging for scientific applications. However, the underlying execution and overhead patterns of the two models differ significantly, owing to their programming constraints and to the different and complex structures of the interconnection networks and systems that support them. We performed this experimental study to compare execution patterns, and their implications, on two commercial architectures. We implemented a standard electromagnetic simulation program (EM) and a linear system solver using the shared-memory model on the KSR-1 and the data-parallel model on the CM-5. Our objectives are to examine the execution pattern changes required when an implementation is transformed between the two models; to study memory access patterns; to address scalability issues; and to investigate the relative costs and advantages/disadvantages of using the two models for scientific computations. Our results indicate that as the systems and the problems are scaled, the EM program tends to become computation-intensive on the KSR-1 shared-memory system and memory-demanding on the CM-5 data-parallel system. The EM program, which is highly data-parallel, performed extremely well, whereas the linear system solver, which is highly control-structured, suffered significantly in the data-parallel model on the CM-5. Our study provides further evidence that matching the execution patterns of algorithms to parallel architectures achieves better performance.
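To make the contrast between the two paradigms concrete, the following is a minimal illustrative sketch, not taken from the paper: the same elementwise vector update expressed first in the shared-memory style (explicit threads partitioning one global array, as on the KSR-1) and then in the data-parallel style (one logical elementwise operation applied across the whole array, as a CM-5 compiler would schedule onto all processing elements). The pthreads threading, thread count, and array size are our assumptions for illustration only.

    /* Illustrative sketch (not from the paper): one vector update,
     * a = b + c, written in the two styles the abstract contrasts. */
    #include <pthread.h>
    #include <stdio.h>

    #define N 1024
    #define NTHREADS 4   /* assumed thread count, for illustration */

    static double a[N], b[N], c[N];

    /* Shared-memory style: each thread updates a contiguous slice
     * of the single globally shared array. */
    static void *add_slice(void *arg) {
        long t = (long)arg;
        long lo = t * (N / NTHREADS), hi = lo + (N / NTHREADS);
        for (long i = lo; i < hi; i++)
            a[i] = b[i] + c[i];
        return NULL;
    }

    int main(void) {
        for (long i = 0; i < N; i++) { b[i] = i; c[i] = 2 * i; }

        /* Shared-memory execution: fork threads, join, done. */
        pthread_t tid[NTHREADS];
        for (long t = 0; t < NTHREADS; t++)
            pthread_create(&tid[t], NULL, add_slice, (void *)t);
        for (long t = 0; t < NTHREADS; t++)
            pthread_join(tid[t], NULL);

        /* Data-parallel style: conceptually a single elementwise
         * operation, a[:] = b[:] + c[:] (CM Fortran would write
         * A = B + C), mapped by the compiler onto all processing
         * elements at once; shown here as a plain loop. */
        for (long i = 0; i < N; i++)
            a[i] = b[i] + c[i];

        printf("a[N-1] = %g\n", a[N - 1]);
        return 0;
    }

The point of the sketch is that the shared-memory version makes the partitioning and synchronization explicit in the control structure, while the data-parallel version pushes both into the compiler and runtime, which is why, as the abstract notes, control-structured codes such as the linear system solver fit the second model poorly.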