2024 |
GCSM: GPU-Accelerated Continuous Subgraph Matching for Large Graphs.
IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024, San Francisco, CA, USA, May 27-31, 2024. |
2024 |
CereSZ: Enabling and Scaling Error-bounded Lossy Compression on Cerebras CS-2.
Proceedings of the 33rd International Symposium on High-Performance Parallel and Distributed Computing, HPDC 2024, Pisa, Italy, June 3-7, 2024. |
2024 |
cuKE: An Efficient Code Generator for Score Function Computation in Knowledge Graph Embedding.
IEEE International Parallel and Distributed Processing Symposium, IPDPS 2024, San Francisco, CA, USA, May 27-31, 2024. |
2023 |
PIMMiner: A High-performance PIM Architecture-aware Graph Mining Framework.
CoRR. |
2023 |
End-to-End LU Factorization of Large Matrices on GPUs.
Proceedings of the 28th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2023, Montreal, QC, Canada, February 25 - March 1, 2023. |
2022 |
STMatch: Accelerating Graph Pattern Matching on GPU with Stack-Based Loop Optimizations.
Proceedings of the International Conference on High Performance Computing, Networking, Storage and Analysis (SC). |
2022 |
SampleMine: A Framework for Applying Random Sampling to Subgraph Pattern Mining through Loop Perforation.
Proceedings of the International Conference on Parallel Architectures and Compilation Techniques (PACT). |
2022 |
Exposing and Exploiting Fine-Grained Block Structures for Fast and Accurate Sparse Training.
Advances in Neural Information Processing Systems (NeurIPS). |
2022 |
Scaling and Selecting GPU Methods for All Pairs Shortest Paths Computations.
2022 IEEE International Parallel and Distributed Processing Symposium (IPDPS). |
2022 |
Rethinking Graph Data Placement for Graph Neural Network Training on Multiple GPUs.
Proceedings of the 36th ACM International Conference on Supercomputing (ICS). |
2021 |
Scaling Sparse Matrix Multiplication on CPU-GPU Nodes.
2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS). |
2021 |
Exploring PIM Architecture for High-Performance Graph Pattern Mining.
IEEE Computer Architecture Letters 20(2). |
2021 |
Communication-Efficient Sampling for Distributed Training of Graph Convolutional Networks.
CoRR. |
2020 |
Scaling out speculative execution of finite-state machines with parallel merge.
25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP). |
2020 |
A novel data transformation and execution strategy for accelerating sparse matrix multiplication on GPUs.
25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP). |
2020 |
Accelerating Sparse CNN Inference on GPUs with Performance-Aware Weight Pruning.
Proceedings of the ACM International Conference on Parallel Architectures and Compilation Techniques (PACT). |
2020 |
Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning.
CoRR. |
2019 |
A Methodology for Characterizing Sparse Datasets and Its Application to SIMD Performance Prediction.
28th International Conference on Parallel Architectures and Compilation Techniques (PACT). |
2019 |
Enabling prefix sum parallelism pattern for recurrences with principled function reconstruction.
Proceedings of the 28th International Conference on Compiler Construction (CC). |
2018 |
Revealing parallel scans and reductions in recurrences through function reconstruction.
Proceedings of the 27th International Conference on Parallel Architectures and Compilation Techniques (PACT). |
2018 |
Conflict-free vectorization of associative irregular applications with recent SIMD architectural advances.
Proceedings of the 2018 International Symposium on Code Generation and Optimization (CGO). |
2018 |
A Linear Speedup Analysis of Distributed Deep Learning with Sparse and Quantized Communication.
Advances in Neural Information Processing Systems (NeurIPS). |
2017 |
Efficient SIMD and MIMD parallelization of hash-based aggregation by conflict mitigation.
Proceedings of the International Conference on Supercomputing (ICS). |
2017 |
Combining SIMD and Many/Multi-core Parallelism for Finite State Machines with Enumerative Speculation.
Proceedings of the 22nd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP). |
2016 |
Exploiting recent SIMD architectural advances for irregular applications.
Proceedings of the 2016 International Symposium on Code Generation and Optimization (CGO). |
2016 |
Reusing Data Reorganization for Efficient SIMD Parallelization of Adaptive Irregular Applications.
Proceedings of the 2016 International Conference on Supercomputing (ICS). |