Theme Transformer: Symbolic Music Generation with Theme-Conditioned Transformer YJ Shih, SL Wu, F Zalkow, M Müller, YH Yang IEEE Transactions on Multimedia, 2022 | 61 | 2022 |
SpeechCLIP: Integrating speech with pre-trained vision and language model YJ Shih, HF Wang, HJ Chang, L Berry, H Lee, D Harwath 2022 IEEE Spoken Language Technology Workshop (SLT), 715-722, 2023 | 19 | 2023 |
M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval L Berry, YJ Shih, HF Wang, HJ Chang, H Lee, D Harwath ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 6 | 2023 |
AV-SUPERB: A multi-task evaluation benchmark for audio-visual representation models Y Tseng, L Berry, YT Chen, IH Chiu, HH Lin, M Liu, P Peng, YJ Shih, ... ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 2 | 2024 |
SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data HF Wang, YJ Shih, HJ Chang, L Berry, P Peng, H Lee, HM Wang, ... arXiv preprint arXiv:2402.06959, 2024 | | 2024 |
Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model HC Fang, NX Ye, YJ Shih, P Peng, HF Wang, L Berry, H Lee, D Harwath arXiv preprint arXiv:2402.05819, 2024 | | 2024 |