Affiliations: Department of Electrical and Computer Engineering,
University of Colorado at Boulder, 425 UCB, Boulder, CO 80309, USA | Department of Computer Architecture, Polytechnic
University of Catalonia, C/. Jordi Girona 1–3, Mòdul C6 (Campus Nord),
E-08034 Barcelona, Spain
Abstract: Chip multi-processors (CMPs) are rapidly emerging as an important
design paradigm for both high-performance and embedded processors. These
machines offer a compelling alternative to increasing the clock frequency as a
means of improving performance. Despite this potential, however, several issues
related to resource sharing on the chip can negatively impact the performance
of embedded applications. In particular, the shared on-chip caches make each
job's memory access times dependent on the behavior of the other jobs sharing
the cache. If not adequately managed, this can lead to problems in meeting hard
real-time scheduling constraints. This work explores adaptable caching
strategies that balance the resource demands of each application and, in turn,
improve throughput for the collective workload. Experimental
results demonstrate speedups of up to 1.47X for workloads of two co-scheduled
applications compared against a fully-shared two-level cache hierarchy.
Additionally, the adaptable caching scheme is shown to achieve an average
speedup of 1.10X over the leading cache partitioning model. By dynamically
managing cache storage for multiple application threads at runtime, the scheme
achieves sizable performance gains, giving chip designers the opportunity to
maintain high performance as cache size and power budgets become increasingly
constrained in the CMP design space.
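
As a concrete illustration of the kind of runtime policy the abstract describes, the sketch below shows a simple epoch-based way-partitioning controller for a shared set-associative L2 cache. It is not the paper's implementation; the controller, its parameters (TOTAL_WAYS, MIN_WAYS), and the single-way reallocation heuristic are all hypothetical, assuming only per-thread miss counters sampled once per epoch.

    // Illustrative sketch only: reassign one cache way per epoch from the
    // thread with fewer misses to the thread with more misses.
    #include <array>
    #include <cstdio>

    constexpr int TOTAL_WAYS = 16;   // assumed L2 associativity
    constexpr int MIN_WAYS   = 1;    // never starve a thread completely

    struct Counters {
        long misses[2];              // per-thread miss counts for the last epoch
    };

    // Move one way from the thread with fewer misses to the thread with more,
    // subject to a per-thread minimum allocation.
    void repartition(const Counters& c, std::array<int, 2>& ways) {
        int loser  = (c.misses[0] >= c.misses[1]) ? 1 : 0;  // fewer misses
        int winner = 1 - loser;
        if (ways[loser] > MIN_WAYS) {
            --ways[loser];
            ++ways[winner];
        }
    }

    int main() {
        std::array<int, 2> ways = {TOTAL_WAYS / 2, TOTAL_WAYS / 2};
        Counters epoch = {{12000, 3500}};   // example epoch statistics
        repartition(epoch, ways);
        std::printf("thread0: %d ways, thread1: %d ways\n", ways[0], ways[1]);
        return 0;
    }

In this toy example, thread 0 misses more often during the epoch, so the controller grows its allocation from 8 to 9 ways while shrinking thread 1 to 7; repeating the decision every epoch lets the partition track phase changes in the co-scheduled workload.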