Angen Zheng



Communication Heterogeneity in Multicore Systems
1. "LiMIC: Support for High-Performance MPI Intra-Node Communication on Linux Cluster" by Hyun-Wook Jin, Sayantan Sur, Lei Chai, and Dhabaleswar K. Panda. ICPP, 2005.
2. "Cache-Efficient, Intranode, Large-Message MPI Communication with MPICH2-Nemesis" by Darius Buntinas, Brice Goglin, David Goodell, Guillaume Mercier, and Stéphanie Moreaud. ICPP, 2009.
3. "Processor Affinity and MPI Performance on SMP-CMP Clusters" by Chi Zhang, Xin Yuan, and Ashok Srinivasan. IPDPSW, 2010.
4. "Designing topology-aware collective communication algorithms for large scale InfiniBand clusters: Case studies with Scatter and Gather" by Krishna Kandalla, Hari Subramoni, Abhinav Vishnu, and Dhabaleswar K. (DK) Panda. IPDPSW, 2010.
5. "Multi-core and network aware MPI topology functions" by Mohammad Javad Rashti, Jonathan Green, Pavan Balaji, Ahmad Afsahi, and William Gropp. EuroMPI, 2011.
6. "Understanding the Impact of Multi-Core Architecture in Cluster Computing: A Case Study with Intel Dual-Core System" by Lei Chai, Qi Gao, and Dhabaleswar K. Panda. CCGRID, 2007.

Resource Contention in Multicore Systems
1. "Performance impact of resource contention in multicore systems" by R. Hood, H. Jin, P. Mehrotra, J. Chang, J. Djomehri, S. Gavali, D. Jespersen, K. Taylor, and R. Biswas. IPDPS, 2010.
2. "The impact of memory subsystem resource sharing on datacenter applications" by Lingjia Tang, Jason Mars, Neil Vachharajani, Robert Hundt, and Mary Lou Soffa. ISCA, 2011.
3. "Contentiousness vs. sensitivity: improving contention aware runtime systems on multicore architectures" by Lingjia Tang, Jason Mars, and Mary Lou Soffa. EXADAPT, 2011.
4. "Traffic management: a holistic approach to memory placement on NUMA systems" by Mohammad Dashti, Alexandra Fedorova, Justin Funston, Fabien Gaud, Renaud Lachaize, Baptiste Lepers, Vivien Quema, and Mark Roth. ASPLOS, 2013.

Multi-Level Graph (Re)Partitioning (Heavyweight)
1. Zoltan paper: "A repartitioning hypergraph model for dynamic load balancing" by Umit V. Catalyurek, Erik G. Boman, Karen D. Devine, Doruk Bozdağ, Robert T. Heaphy, and Lee Ann Riesen. Journal of Parallel and Distributed Computing, 2009.
2. ParMETIS paper: "A unified algorithm for load balancing adaptive scientific simulations" by Kirk Schloegel, George Karypis, and Vipin Kumar. ACM/IEEE Supercomputing Conference, 2000.
3. "Graph partitioning for high performance scientific simulations" by Kirk Schloegel, George Karypis, and Vipin Kumar. Army High Performance Computing Research Center, 2000.
4. "Graph partitioning models for parallel computing" by Bruce Hendrickson and Tamara G. Kolda. Parallel Computing, 2000.

Lightweight Graph (Re)Partitioning
1. "Mizan: a system for dynamic load balancing in large-scale graph processing" by Zuhair Khayyat, Karim Awara, Amani Alonazi, Hani Jamjoom, Dan Williams, and Panos Kalnis. EuroSys, 2013.
2. "Catch the wind: Graph workload balancing on cloud" by Zechao Shang and Jeffrey Xu Yu. ICDE, 2013.
3. "xDGP: A dynamic graph processing system with adaptive partitioning" by Luis Vaquero, Félix Cuadrado, Dionysios Logothetis, and Claudio Martella. CoRR, 2013.
4. "Managing Social Network Data through Dynamic Distributed Partitioning" by Daniel Nicoara, Shahin Kamali, Khuzaima Daudjee, and Lei Chen. 2014.
5. "LogGP: A Log-based Dynamic Graph Partitioning Method" by Ning Xu, Lei Chen, and Bin Cui. VLDB, 2015.

Streaming Graph (Re)Partitioning
1. "Streaming graph partitioning for large distributed graphs" by Isabelle Stanton and Gabriel Kliot. KDD, 2012.
2. "(Re)partitioning for stream-enabled computation" by Erwan Le Merrer, Yizhong Liang, and Gilles Trédan. arXiv:1310.8211, 2013.

Architecture-Aware Graph (Re)Partitioning
1. "Improving large graph processing on partitioned graphs in the cloud" by Rishan Chen, Mao Yang, Xuetian Weng, Byron Choi, Bingsheng He, and Xiaoming Li. SoCC, 2012.
2. "Architecture Aware Partitioning Algorithms" by Irene Moulitsas and George Karypis. Springer Berlin Heidelberg, 2008.
3. "Heterogeneous Environment Aware Streaming Graph Partitioning" by Ning Xu, Bin Cui, Lei Chen, Zi Huang, and Yingxia Shao. TKDE, 2015.

Differentiated Graph (Re)Partitioning
1. "PowerLyra: Differentiated graph computation and partitioning on skewed graphs" by R. Chen, J. Shi, Y. Chen, H. Guan, B. Zang, and H. Chen. SoCC, 2012.
2. "Bipartite-oriented Distributed Graph Partitioning for Big Learning" by Rong Chen, Jiaxin Shi, Binyu Zang, and Haibing Guan. APSys, 2014.
3. "On Graphs, GPUs, and Blind Dating: A Workload to Processor Matchmaking Quest" by Abdullah Gharaibeh, Lauro Beltrao Costa, Elizeu Santos-Neto, and Matei Ripeanu. IPDPS, 2013.
4. "Scaling Techniques for Massive Scale-Free Graphs in Distributed (External) Memory" by Roger Pearce, Maya Gokhale, and Nancy M. Amato. IPDPS, 2013.

2D Graph (Re)Partitioning
1. "Scalable matrix computations on large scale-free graphs using 2D graph partitioning" by Erik G. Boman, Karen D. Devine, and Sivasankaran Rajamanickam. SC, 2013.

Graph Partition Refinement
1. "An efficient heuristic procedure for partitioning graphs" by B. W. Kernighan and Shen Lin. Bell System Technical Journal, 1970.
2. "A linear-time heuristic for improving network partitions" by Charles M. Fiduccia and Robert M. Mattheyses. 19th Conference on Design Automation, 1982.
3. "Scalable Parallel Refinement of Graph Partitions" by Christian Schulz. PhD Thesis, 2009.

Distributed Graph Computation
1. "Pregel: a system for large-scale graph processing" by Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, James C. Dehnert, Ilan Horn, Naty Leiser, and Grzegorz Czajkowski. SIGMOD, 2010.
2. "PowerGraph: Distributed Graph-Parallel Computation on Natural Graphs" by Joseph E. Gonzalez, Yucheng Low, Haijie Gu, Danny Bickson, and Carlos Guestrin. OSDI, 2012.
3. "Parallel breadth-first search on distributed memory systems" by Aydin Buluç and Kamesh Madduri. SC, 2011.
4. "Parallel Breadth First Search on GPU Clusters" by Zhisong Fu, Harish Kumar Dasari, Martin Berzins, and Bryan Thompson. BigData, 2014.
5. "Delta-stepping: A parallel single source shortest path algorithm" by Ulrich Meyer and Peter Sanders. ESA, 1998.
6. "Many Random Walks Are Faster Than One" by Noga Alon, Chen Avin, Michal Koucky, Gady Kozma, Zvi Lotker, and Mark R. Tuttle. arXiv:0705.0467, 2007.
7. "Fast Parallel PageRank: A Linear System Approach" by David Gleich, Leonid Zhukov, and Pavel Berkhin. Yahoo! Research Technical Report YRL-2004-038, 2004.
8. "Large-Scale Distributed Graph Computing Systems: An Experimental Evaluation" by Yi Lu, James Cheng, Da Yan, and Huanhuan Wu. PVLDB, 2014.

OLTP Database Partitioning
1. "Skew-aware automatic database partitioning in shared-nothing, parallel OLTP systems" by Andrew Pavlo, Carlo Curino, and Stanley Zdonik. SIGMOD, 2012.
2. "E-Store: Fine-Grained Elastic Partitioning for Distributed Transaction Processing Systems" by Rebecca Taft, Essam Mansour, Marco Serafini, Jennie Duggan, Aaron J. Elmore, Ashraf Aboulnaga, Andrew Pavlo, and Michael Stonebraker. VLDB, 2014.

Architecture-Aware Dynamic Load Balancing for Scientific Simulations
1. "Communication and topology-aware load balancing in Charm++ with TreeMatch" by Emmanuel Jeannot, Esteban Meneses, Guillaume Mercier, François Tessier, and Gengbin Zheng. CLUSTER, 2013.
2. "NucoLB: A hierarchical approach for load balancing on parallel multi-core systems" by Laércio Lima Pilla, Christiane Pousa Ribeiro, Daniel Cordeiro, Chao Mei, Abhinav Bhatele, Philippe Olivier Alexandre Navaux, François Broquedis, Jean-François Méhaut, and Laxmikant V. Kale. ICPP, 2012.
3. "HwTopoLB: A topology-aware load balancing algorithm for clustered hierarchical multi-core machines" by Laércio L. Pilla, Christiane P. Ribeiro, Pierre Coucheney, François Broquedis, Bruno Gaujal, Philippe O. A. Navaux, and Jean-François Méhaut. Future Generation Computer Systems, 2013.