How to protect copyright data in optimization of large language models? T Chu*, Z Song*, C Yang*. Proceedings of the AAAI Conference on Artificial Intelligence 38 (16), 17871 …, 2024. Cited by 44.
Towards Infinite-Long Prefix in Transformer. Y Liang*, Z Shi*, Z Song*, C Yang*. arXiv preprint arXiv:2406.14036, 2024. Cited by 21.
Unmasking transformers: A theoretical approach to data recovery via attention weights. Y Deng*, Z Song*, S Xie*, C Yang*. arXiv preprint arXiv:2310.12462, 2023. Cited by 13.
How Sparse Attention Approximates Exact Attention? Your Attention is Naturally Sparse. Y Deng*, Z Song*, J Xiong*, C Yang*. arXiv preprint arXiv:2404.02690, 2024. Cited by 12*.
Curse of attention: A kernel-based perspective for why transformers fail to generalize on time series forecasting and beyond. Y Ke*, Y Liang*, Z Shi*, Z Song*, C Yang*. The Second Conference on Parsimony and Learning (Proceedings Track), 2024. Cited by 4.
Video Latent Flow Matching: Optimal Polynomial Projections for Video Interpolation and Extrapolation. Y Cao*, Z Song*, C Yang*. arXiv preprint arXiv:2502.00500, 2025. Cited by 3.
Theoretical Foundation of Flow-Based Time Series Generation: Provable Approximation, Generalization, and Efficiency. J Long, Z Song, C Yang. arXiv preprint arXiv:2503.14076, 2025.
ParallelComp: Parallel Long-Context Compressor for Length Extrapolation. J Xiong, J Shen, C Zheng, Z Wan, C Zhao, C Yang, F Ye, H Yang, L Kong, ... arXiv preprint arXiv:2502.14317, 2025.
Unlock the Theory behind Scaling 1-bit Neural Networks. M Daliri, Z Song, C Yang. The Second Conference on Parsimony and Learning (Proceedings Track), 2025.