Title | Authors | Venue | Cited by | Year |
Theme Transformer: Symbolic music generation with theme-conditioned transformer | YJ Shih, SL Wu, F Zalkow, M Müller, YH Yang | IEEE Transactions on Multimedia 25, 3495-3508, 2022 | 86 | 2022 |
SpeechCLIP: Integrating speech with pre-trained vision and language model | YJ Shih, HF Wang, HJ Chang, L Berry, H Lee, D Harwath | 2022 IEEE Spoken Language Technology Workshop (SLT), 715-722, 2023 | 33 | 2023 |
M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval | L Berry, YJ Shih, HF Wang, HJ Chang, H Lee, D Harwath | ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and …, 2023 | 10 | 2023 |
AV-SUPERB: A multi-task evaluation benchmark for audio-visual representation models | Y Tseng, L Berry, YT Chen, IH Chiu, HH Lin, M Liu, P Peng, YJ Shih, ... | ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and …, 2024 | 8 | 2024 |
SpeechCLIP+: Self-supervised multi-task representation learning for speech via CLIP and speech-image data | HF Wang, YJ Shih, HJ Chang, L Berry, P Peng, H Lee, HM Wang, ... | arXiv preprint arXiv:2402.06959, 2024 | 2 | 2024 |
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks | C Huang, WC Chen, S Yang, AT Liu, CA Li, YX Lin, WC Tseng, A Diwan, ... | arXiv preprint arXiv:2411.05361, 2024 | 1 | 2024 |
Interface Design for Self-Supervised Speech Models | YJ Shih, D Harwath | arXiv preprint arXiv:2406.12209, 2024 | 1 | 2024 |
Measuring Sound Symbolism in Audio-visual Models | WC Tseng, YJ Shih, D Harwath, R Mooney | arXiv preprint arXiv:2409.12306, 2024 | | 2024 |
Self-supervised Speech Models for Word-Level Stuttered Speech Detection | YJ Shih, Z Gkalitsiou, AG Dimakis, D Harwath | arXiv preprint arXiv:2409.10704, 2024 | | 2024 |
Integrating Self-supervised Speech Model with Pseudo Word-level Targets from Visually-grounded Speech Model | HC Fang, NX Ye, YJ Shih, P Peng, HF Wang, L Berry, H Lee, D Harwath | arXiv preprint arXiv:2402.05819, 2024 | | 2024 |