Title. Authors. Venue | Cited by | Year |
dPRO: A Generic Performance Diagnosis and Optimization Toolkit for Expediting Distributed DNN Training. H Hu, C Jiang, Y Zhong, Y Peng, C Wu, Y Zhu, H Lin, C Guo. Proceedings of Machine Learning and Systems 4, 623-637, 2022 | 15 | 2022 |
GNNFlow: A Distributed Framework for Continuous Temporal GNN Learning on Dynamic Graphs. Y Zhong, G Sheng, T Qin, M Wang, Q Gan, C Wu. arXiv preprint arXiv:2311.17410, 2023 | 9 | 2023 |
Compressed communication for distributed training: Adaptive methods and system. Y Zhong, C Xie, S Zheng, H Lin. arXiv preprint arXiv:2105.07829, 2021 | 9 | 2021 |
Heta: Distributed Training of Heterogeneous Graph Neural Networks. Y Zhong, J Su, C Wu, M Wang. arXiv preprint arXiv:2408.09697, 2024 | 1 | 2024 |
SWIFT: Expedited Failure Recovery for Large-scale DNN Training. Y Zhong, G Sheng, J Liu, J Yuan, C Wu. IEEE Transactions on Parallel and Distributed Systems, 2024 | 1 | 2024 |