Affiliations: Barcelona Supercomputing Center, UPC, Campus Nord – C6, Jordi Girona 1–3, 08034 Barcelona, Spain | IBM T.J. Watson Research Center, 1101 Kitchawan Road, Route 134, Yorktown Heights, NY 10598, USA
Abstract: This paper evaluates and analyzes multilevel parallelism on a chip
multiprocessor (CMP) architecture. The environment is based on the experimental
IBM BG/Cyclops architecture, on which we ran the multi-zone parallel
benchmarks. Multilevel parallelism is spawned using the Nanos OpenMP execution
environment. We performed the analysis with different execution parameters
to evaluate different distributions of hardware threads, cache utilization,
and thread grouping configurations. Our results show that a large number of
thread groups and good load-balancing algorithms are critical for high
performance. We also show that letting a small number of threads share the
same data cache can increase performance, whereas a large number of threads
should not share the same data cache.
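The abstract stresses that the number of thread groups and the quality of the balancing algorithm matter when zones of unequal size are distributed at the outer level of parallelism. As a hedged illustration only, and not the algorithm used by Nanos or the benchmarks, the sketch below assigns zones to thread groups with the greedy largest-zone-first (LPT) heuristic. The function names `balance_zones` and `group_loads` and the zone sizes are hypothetical.

```python
import heapq
from typing import List, Sequence


def balance_zones(zone_sizes: Sequence[int], num_groups: int) -> List[List[int]]:
    """Hypothetical helper: assign zone indices to thread groups.

    Zones are visited from largest to smallest; each one goes to the group
    with the smallest accumulated work so far (greedy LPT heuristic).
    Returns one list of zone indices per group.
    """
    if num_groups <= 0:
        raise ValueError("num_groups must be positive")
    # Min-heap of (accumulated work, group id); ties break on group id.
    heap = [(0, g) for g in range(num_groups)]
    heapq.heapify(heap)
    groups: List[List[int]] = [[] for _ in range(num_groups)]
    # Visit zone indices in decreasing order of size.
    order = sorted(range(len(zone_sizes)), key=lambda z: zone_sizes[z], reverse=True)
    for z in order:
        load, g = heapq.heappop(heap)  # least-loaded group so far
        groups[g].append(z)
        heapq.heappush(heap, (load + zone_sizes[z], g))
    return groups


def group_loads(zone_sizes: Sequence[int], groups: List[List[int]]) -> List[int]:
    """Total work (sum of zone sizes) assigned to each group."""
    return [sum(zone_sizes[z] for z in grp) for grp in groups]


if __name__ == "__main__":
    sizes = [8, 7, 6, 5, 4, 3, 2, 1]  # hypothetical per-zone work units
    groups = balance_zones(sizes, 3)
    print(groups, group_loads(sizes, groups))
```

With these assumed sizes, the three groups receive 13, 12, and 11 work units. In a real multilevel run, the threads inside each group would then share the work within their own zones at the inner level.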