IICT - Idiap Publications


Name: IICT

Speech DF Arena: A Leaderboard for Speech DeepFake Detection Models, Sandipana Dowerah, Atharva Kulkarni, Ajinkya Kulkarni, Hoan My Tran, Joonas Kalda, Artem Fedorchenko, Benoit Fauve, Damien Lolive, Tanel Alumäe and Mathew Magimai-Doss, in: IEEE Open Journal of Signal Processing, 7:73--81, 2026

[DOI]

Idiap kNN-TTS System for the Blizzard Challenge 2025, Enno Hermann, Karl El Hajal, Ajinkya Kulkarni and Mathew Magimai-Doss, in: Blizzard Challenge Workshop, 2025

attachment

kNN Retrieval for Simple and Effective Zero-Shot Multi-speaker Text-to-Speech, Karl El Hajal, Ajinkya Kulkarni, Enno Hermann and Mathew Magimai-Doss, in: Proceedings of the Annual Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL), Albuquerque, New Mexico, ACL, 2025

attachment

[URL]

Multimodal Prosody Modeling: A Use Case for Multilingual Sentence Mode Prediction, Bogdan Vlasenko and Mathew Magimai-Doss, in: Proceedings of Interspeech, 2025

attachment

Unsupervised Rhythm and Voice Conversion of Dysarthric to Healthy Speech for ASR, Karl El Hajal, Enno Hermann, Ajinkya Kulkarni and Mathew Magimai-Doss, in: Proceedings of Workshop on Speech Pathology Analysis and DEtection (SPADE), Hyderabad, India, IEEE, 2025

attachment

[URL]

Unsupervised Rhythm and Voice Conversion to Improve ASR on Dysarthric Speech, Karl El Hajal, Enno Hermann, Sevada Hovsepyan and Mathew Magimai-Doss, in: Proceedings of Interspeech, Rotterdam, Netherlands, ISCA, 2025

attachment

[URL]

Unveiling Audio Deepfake Origins: A Deep Metric Learning and Conformer Network Approach with Ensemble Fusion, Ajinkya Kulkarni, Sandipana Dowerah, Mathew Magimai-Doss and Tanel Alumäe, in: Proceedings of Interspeech, 2025

attachment

Comparing Data-Driven and Handcrafted Features for Dimensional Emotion Recognition, Bogdan Vlasenko, Sargam Vyas and Mathew Magimai-Doss, in: International Conference on Acoustics, Speech and Signal Processing, 2024

attachment

Cross-transfer Knowledge between Speech and Text Encoders to Evaluate Customer Satisfaction, Luis Felipe Parra-Gallego, Tilak Purohit, Bogdan Vlasenko, Juan Rafael Orozco-Arroyave and Mathew Magimai-Doss, in: Proceedings of Interspeech, Kos Island, Greece, ISCA, 2024

attachment

Exploring generalization to unseen audio data for spoofing: insights from SSL models, Atharva Kulkarni, Hoan My Tran, Ajinkya Kulkarni, Sandipana Dowerah, Damien Lolive and Mathew Magimai-Doss, in: ISCA Proceedings, Greece, 2024

[DOI]
[URL]

SSL-TTS: Leveraging Self-Supervised Embeddings and kNN Retrieval for Zero-Shot Multi-speaker TTS, Karl El Hajal, Ajinkya Kulkarni, Enno Hermann and Mathew Magimai-Doss, Idiap-Internal-RR-38-2024

attachment

[URL]

Towards interfacing large language models with ASR systems using confidence measures and prompting, Maryam Naderi, Enno Hermann, Alexandre Nanchen, Sevada Hovsepyan and Mathew Magimai-Doss, in: Proceedings of Interspeech, pages 2980-2984, 2024

attachment

[DOI]

Unveiling Biases while Embracing Sustainability: Assessing the Dual Challenges of Automatic Speech Recognition Systems, Ajinkya Kulkarni, Atharva Kulkarni, Miguel Couceiro and Isabel Trancoso, in: ISCA proceedings, Greece, pages 4, 2024

[DOI]
[URL]

What Does it Take to Generalize SER Model Across Datasets? A Comprehensive Benchmark, Adham Ibrahim, Shady Shehata, Ajinkya Kulkarni, Mukhtar Mohamed and Muhammad Abdul-Mageed, in: ISCA proceedings, Greece, 2024

[DOI]
[URL]

Few-shot Dysarthric Speech Recognition with Text-to-Speech Data Augmentation, Enno Hermann and Mathew Magimai-Doss, in: Proceedings of Interspeech, pages 156-160, 2023

attachment

[DOI]
[URL]

On matching data and model in LF-MMI-based dysarthric speech recognition, Enno Hermann, École polytechnique fédérale de Lausanne, 2023

attachment

[DOI]
[URL]

Using Commercial ASR Solutions to Assess Reading Skills in Children: A Case Report, Timothy Piton, Enno Hermann, Angela Pasqualotto, Marjolaine Cohen, Mathew Magimai-Doss and Daphné Bavelier, in: Proceedings of Interspeech, pages 4573-4577, 2023

attachment

[DOI]
[URL]
