Keywords:
- ASR
- automatic speech recognition (ASR)
- Contextual Adaptation
- Cross-modal Alignment
- Cross-modal Attention
- Data Selection
- Domain Classification
- embedding
- F1 score
- finite-state transducers
- Foundation Models
- GPU decoding
- Human-Computer Interaction
- Intent Classification
- knowledge distillation
- LLM
- multitask learning
- multitask training
- named entity recognition
- pseudo-labelling
- real-time speech recognition
- self-supervised learning
- shallow fusion
- Speaker change detection
- speaker turn detection
- speech recognition
- speech-to-text alignment
- Spoken Language Understanding
- streaming ASR
- streaming transducer
- transformer transducer
- whisper
- Word Consensus Networks
- Word-Confusion-Networks
- XLSR
- XLSR-Transducer
- Zipformer
Publications of Aravind Ganapathiraju
2025
Performance Evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward, in: SALMA Workshop, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hyderabad, India, IEEE, 2025
Speech Data Selection for Efficient ASR Fine-Tuning using Domain Classifier and Pseudo-Label Filtering, in: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025), 2025
XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models, in: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hyderabad, India, IEEE, 2025
2024
Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper, Idiap-RR-10-2024
Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper, in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, ACL, 2024
Multitask Speech Recognition and Speaker Change Detection for Unknown Number of Speakers, in: Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024, Seoul, Republic of Korea, pages 12592-12596, IEEE, 2024
Probability-Aware Word-Confusion-Network-to-Text Alignment Approach for Intent Classification, in: Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024, Seoul, Republic of Korea, pages 12617-12621, IEEE, 2024
TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR, in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 20988–20995, Association for Computational Linguistics (ACL), 2024
TokenVerse: Unifying Speech and NLP Tasks via Transducer-based ASR, Idiap-RR-07-2024
XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models, Idiap-RR-08-2024
2023
Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks, in: Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
Implementing contextual biasing in GPU decoder for online ASR, Idiap-RR-02-2023
Implementing contextual biasing in GPU decoder for online ASR, in: Proc. Interspeech 2023, 2023
2022
Expanded Lattice Embeddings for Spoken Document Retrieval on Informal Meetings, Idiap-RR-06-2022
Expanded Lattice Embeddings for Spoken Document Retrieval on Informal Meetings, in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2022