Florian Metze
Verified email at andrew.cmu.edu
Title
Cited by
Year
EESEN: End-to-end speech recognition using deep RNN models and WFST-based decoding
Y Miao, M Gowayyed, F Metze
2015 IEEE workshop on automatic speech recognition and understanding (ASRU …, 2015
Cited by: 972; Year: 2015
VideoCLIP: Contrastive pre-training for zero-shot video-text understanding
H Xu, G Ghosh, PY Huang, D Okhonko, A Aghajanyan, F Metze, ...
arXiv preprint arXiv:2109.14084, 2021
Cited by: 513; Year: 2021
Extracting deep bottleneck features using stacked auto-encoders
J Gehring, Y Miao, F Metze, A Waibel
2013 IEEE international conference on acoustics, speech and signal …, 2013
Cited by: 383; Year: 2013
How2: a large-scale dataset for multimodal language understanding
R Sanabria, O Caglayan, S Palaskar, D Elliott, L Barrault, L Specia, ...
arXiv preprint arXiv:1811.00347, 2018
Cited by: 301; Year: 2018
Learning joint embedding with multimodal cues for cross-modal video-text retrieval
NC Mithun, J Li, F Metze, AK Roy-Chowdhury
Proceedings of the 2018 ACM on international conference on multimedia …, 2018
Cited by: 288; Year: 2018
Support-set bottlenecks for video-text representation learning
M Patrick, PY Huang, Y Asano, F Metze, A Hauptmann, J Henriques, ...
arXiv preprint arXiv:2010.02824, 2020
Cited by: 280; Year: 2020
Keeping your eye on the ball: Trajectory attention in video transformers
M Patrick, D Campbell, Y Asano, I Misra, F Metze, C Feichtenhofer, ...
Advances in neural information processing systems 34, 12493-12506, 2021
Cited by: 265; Year: 2021
A one-pass decoder based on polymorphic linguistic context assignment
H Soltau, F Metze, C Fugen, A Waibel
IEEE Workshop on Automatic Speech Recognition and Understanding, 2001. ASRU …, 2001
Cited by: 251; Year: 2001
Masked autoencoders that listen
PY Huang, H Xu, J Li, A Baevski, M Auli, W Galuba, F Metze, ...
Advances in Neural Information Processing Systems 35, 28708-28720, 2022
Cited by: 213; Year: 2022
A comparison of five multiple instance learning pooling functions for sound event detection with weak labeling
Y Wang, J Li, F Metze
ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019
Cited by: 210; Year: 2019
Comparison of four approaches to age and gender recognition for telephone applications
F Metze, J Ajmera, R Englert, U Bub, F Burkhardt, J Stegmann, C Muller, ...
2007 IEEE International Conference on Acoustics, Speech and Signal …, 2007
Cited by: 199; Year: 2007
A comparison of deep learning methods for environmental sound detection
J Li, W Dai, F Metze, S Qu, S Das
2017 IEEE International conference on acoustics, speech and signal …, 2017
Cited by: 192; Year: 2017
How2Sign: a large-scale multimodal dataset for continuous American Sign Language
A Duarte, S Palaskar, L Ventura, D Ghadiyaram, K DeHaan, F Metze, ...
Proceedings of the IEEE/CVF conference on computer vision and pattern …, 2021
Cited by: 191; Year: 2021
Advances in automatic meeting record creation and access
A Waibel, M Bett, F Metze, K Ries, T Schaaf, T Schultz, H Soltau, H Yu, ...
2001 IEEE International Conference on Acoustics, Speech, and Signal …, 2001
Cited by: 180; Year: 2001
Session independent non-audible speech recognition using surface electromyography
L Maier-Hein, F Metze, T Schultz, A Waibel
IEEE Workshop on Automatic Speech Recognition and Understanding, 2005., 331-336, 2005
Cited by: 167; Year: 2005
Speaker adaptive training of deep neural network acoustic models using i-vectors
Y Miao, H Zhang, F Metze
IEEE/ACM Transactions on Audio, Speech, and Language Processing 23 (11 …, 2015
Cited by: 147; Year: 2015
Effective dimensionality reduction for word embeddings
V Raunak, V Gupta, F Metze
Proceedings of the 4th Workshop on Representation Learning for NLP (RepL4NLP …, 2019
Cited by: 141; Year: 2019
VLM: Task-agnostic video-language model pre-training for video understanding
H Xu, G Ghosh, PY Huang, P Arora, M Aminzadeh, C Feichtenhofer, ...
arXiv preprint arXiv:2105.09996, 2021
Cited by: 134; Year: 2021
Universal phone recognition with a multilingual allophone system
X Li, S Dalmia, J Li, M Lee, P Littell, J Yao, A Anastasopoulos, ...
ICASSP 2020-2020 IEEE International Conference on Acoustics, Speech and …, 2020
Cited by: 134; Year: 2020
Deep maxout networks for low-resource speech recognition
Y Miao, F Metze, S Rawat
2013 IEEE Workshop on Automatic Speech Recognition and Understanding, 398-403, 2013
Cited by: 129; Year: 2013