Theme Transformer: Symbolic music generation with theme-conditioned transformer | YJ Shih, SL Wu, F Zalkow, M Müller, YH Yang | IEEE Transactions on Multimedia 25, 3495-3508, 2022 | 63 | 2022 |
SpeechCLIP: Integrating speech with pre-trained vision and language model | YJ Shih, HF Wang, HJ Chang, L Berry, H Lee, D Harwath | 2022 IEEE Spoken Language Technology Workshop (SLT), 715-722, 2023 | 20 | 2023 |
M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval | L Berry, YJ Shih, HF Wang, HJ Chang, H Lee, D Harwath | ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 6 | 2023 |
AV-SUPERB: A multi-task evaluation benchmark for audio-visual representation models | Y Tseng, L Berry, YT Chen, IH Chiu, HH Lin, M Liu, P Peng, YJ Shih, ... | ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 3 | 2024 |
SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data | HF Wang, YJ Shih, HJ Chang, L Berry, P Peng, H Lee, HM Wang, ... | arXiv preprint arXiv:2402.06959, 2024 | | 2024 |
Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model | HC Fang, NX Ye, YJ Shih, P Peng, HF Wang, L Berry, H Lee, D Harwath | arXiv preprint arXiv:2402.05819, 2024 | | 2024 |