Publications

2024
GCSM: GPU-Accelerated Continuous Subgraph Matching for Large Graphs.
Yihua Wei and Peng Jiang.
IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024, San Francisco, CA, USA, May 27-31, 2024.
2024
CereSZ: Enabling and Scaling Error-bounded Lossy Compression on Cerebras CS-2.
Shihui Song, Yafan Huang, Peng Jiang, Xiaodong Yu, Weijian Zheng, Sheng Di, Qinglei Cao, Yunhe Feng, Zhen Xie, and Franck Cappello.
Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2024, Pisa, Italy, June 3-7, 2024.
2024
cuKE: An Efficient Code Generator for Score Function Computation in Knowledge Graph Embedding.
Lihan Hu, Jing Li, and Peng Jiang.
IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024, San Francisco, CA, USA, May 27-31, 2024.
2023
PIMMiner: A High-performance PIM Architecture-aware Graph Mining Framework.
Jiya Su, Peng Jiang, and Rujia Wang.
CoRR.
2023
End-to-End LU Factorization of Large Matrices on GPUs.
Yang Xia, Peng Jiang, Gagan Agrawal, and Rajiv Ramnath.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2023, Montreal, QC, Canada, 25 February 2023 - 1 March 2023.
2022
STMatch: Accelerating Graph Pattern Matching on GPU with Stack-Based Loop Optimizations.
Yihua Wei and Peng Jiang.
Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC).
2022
SampleMine: A Framework for Applying Random Sampling to Subgraph Pattern Mining through Loop Perforation.
Peng Jiang, Yihua Wei, Jiya Su, Rujia Wang, and Bo Wu.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT).
2022
Exposing and Exploiting Fine-Grained Block Structures for Fast and Accurate Sparse Training.
Peng Jiang, Lihan Hu, and Shihui Song.
Advances in Neural Information Processing Systems (NeurIPS).
2022
Scaling and Selecting GPU Methods for All Pairs Shortest Paths Computations.
Yang Xia, Peng Jiang, Gagan Agrawal, and Rajiv Ramnath.
2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
2022
Rethinking Graph Data Placement for Graph Neural Network Training on Multiple GPUs.
Shihui Song and Peng Jiang.
Proceedings of the 36th ACM International Conference on Supercomputing (ICS).
2021
Scaling Sparse Matrix Multiplication on CPU-GPU Nodes.
Yang Xia, Peng Jiang, Gagan Agrawal, and Rajiv Ramnath.
2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
2021
Exploring PIM Architecture for High-Performance Graph Pattern Mining.
Jiya Su, Linfeng He, Peng Jiang, and Rujia Wang.
IEEE Computer Architecture Letters 20(2).
2021
Communication-Efficient Sampling for Distributed Training of Graph Convolutional Networks.
Peng Jiang and Masuma Akter Rumi.
CoRR.
2020
Scaling out speculative execution of finite-state machines with parallel merge.
Yang Xia, Peng Jiang, and Gagan Agrawal.
25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP).
2020
A novel data transformation and execution strategy for accelerating sparse matrix multiplication on GPUs.
Peng Jiang, Changwan Hong, and Gagan Agrawal.
25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP).
2020
Accelerating Sparse CNN Inference on GPUs with Performance-Aware Weight Pruning.
Masuma Akter Rumi, Xiaolong Ma, Yanzhi Wang, and Peng Jiang.
Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques (PACT).
2020
Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning.
Peng Jiang and Gagan Agrawal.
CoRR.
2019
A Methodology for Characterizing Sparse Datasets and Its Application to SIMD Performance Prediction.
Gangyi Zhu, Peng Jiang, and Gagan Agrawal.
28th International Conference on Parallel Architectures and Compilation Techniques (PACT).
2019
Enabling prefix sum parallelism pattern for recurrences with principled function reconstruction.
Yang Xia, Peng Jiang, and Gagan Agrawal.
Proceedings of the 28th International Conference on Compiler Construction (CC).
2018
Revealing parallel scans and reductions in recurrences through function reconstruction.
Peng Jiang, Linchuan Chen, and Gagan Agrawal.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques (PACT).
2018
Conflict-free vectorization of associative irregular applications with recent SIMD architectural advances.
Peng Jiang and Gagan Agrawal.
Proceedings of the 2018 International Symposium on Code Generation and Optimization (CGO).
2018
A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication.
Peng Jiang and Gagan Agrawal.
Advances in Neural Information Processing Systems.
2017
Efficient SIMD and MIMD parallelization of hash-based aggregation by conflict mitigation.
Peng Jiang and Gagan Agrawal.
Proceedings of the International Conference on Supercomputing (ICS).
2017
Combining SIMD and Many/Multi-core Parallelism for Finite State Machines with Enumerative Speculation.
Peng Jiang and Gagan Agrawal.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP).
2016
Exploiting recent SIMD architectural advances for irregular applications.
Linchuan Chen, Peng Jiang, and Gagan Agrawal.
Proceedings of the 2016 International Symposium on Code Generation and Optimization (CGO).
2016
Reusing Data Reorganization for Efficient SIMD Parallelization of Adaptive Irregular Applications.
Peng Jiang, Linchuan Chen, and Gagan Agrawal.
Proceedings of the 2016 International Conference on Supercomputing (ICS).