Reading List
0. Three topics related to multiprocessors are covered: deterministic replay, cache management, and main memory;
1. The words in blue starting with '↳' under each paper title are a brief summary of the paper;
2. [Italic red words within brackets] are my classification labels for the paper;
3. Three selected papers are listed for each topic;
4. The main factors in choosing the papers include content, novelty, venue, and citation count.
Deterministic Replay in Multiprocessors
In multiprocessors, multithreaded programs execute non-deterministically. Deterministic replay was proposed for bug reproduction and fault tolerance: it records enough execution events during a run to replay that run later.
A "Flight Data Recorder" for enabling full-system multiprocessor deterministic replay
[hardware, offline, one-run]
↳Processor-based offline full-system deterministic replay of multiprocessor executions. FDR is a practical low-overhead hardware recorder for cache-coherent multiprocessors.
ISCA'2003 [cite'353]
PRES: Probabilistic replay with execution sketching on multiprocessors
[software, offline, several-run]
↳A software-only solution that reproduces concurrency bugs on multiprocessors over multiple replay runs, which greatly lowers recording overhead.
SOSP'2009 [cite'162]
Respec: Efficient online multiprocessor replay via speculation and external determinism
[software, online, one-run]
↳The first system to support low-overhead, online deterministic replay on multiprocessors without hardware support.
ASPLOS'2010 [cite'99]
Cache Management in CMP
CMPs often execute a wide variety of applications with differing requirements. To maximize performance, the cache should be configured according to workload characteristics.
Token coherence: Decoupling performance and correctness
[coherence]
↳A new coherence framework that decouples a protocol's performance policy from the mechanism that guarantees its correctness.
ISCA'2003 [cite'259]
ASR: Adaptive selective replication for CMP caches
[partition wrt average access time]
↳Dynamically monitors workload behavior and adjusts the replication level to minimize average access time.
MICRO'2006 [cite'193]
Ubik: Efficient cache sharing with strict QoS for latency-critical workloads
[partition wrt QoS]
↳Proposed Ubik, a dynamic partitioning technique that predicts and exploits the transient behavior of latency-critical workloads to maintain their tail latency while maximizing the cache space available to batch applications.
ASPLOS'2014 [cite'3]
Main Memory in CMP
CMPs have limited off-chip bandwidth, for which the running applications compete. The resulting interference may harm both overall system performance and individual application performance.
Scaling the bandwidth wall: Challenges in and avenues for CMP scaling
[general study]
↳Developed an analytical model to study the bandwidth wall problem in CMP systems.
ISCA'2009 [cite'142]
Understanding how off-chip memory bandwidth partitioning in chip multiprocessors affects system performance
[partition]
↳Constructed an analytical model to understand how bandwidth partitioning affects performance, and how bandwidth and cache partitioning interact with one another.
HPCA'2010 [cite'62]
Thread cluster memory scheduling: Exploiting differences in memory access behavior
[scheduling]
↳A new memory scheduling algorithm that addresses system throughput and fairness separately, with the goal of achieving the best of both.
MICRO'2010 [cite'175]