Keywords:
- Aho-Corasick algorithm
- ASR
- ASR robustness
- automatic speech recognition (ASR)
- Contextualisation and adaptation of ASR
- Cross-modal Alignment
- Data Selection
- domain adaptation
- Domain Classification
- Dual mode encoder
- embedding
- F1 score
- Foundation Models
- fvae-lora
- Intent Classification
- knowledge distillation
- language identification
- latent space factorization
- LLM
- LLM-based ASR
- LoRA
- low-rank adaptation
- multitask learning
- multitask training
- named entity recognition
- prompt projection
- pseudo-labelling
- real-time ASR
- self-supervised learning
- shallow fusion
- Speaker change detection
- speaker turn detection
- speech recognition
- Speech-to-LLM alignment
- speech-to-text alignment
- Spoken Language Understanding
- spurious correlation robustness
- streaming ASR
- streaming transducer
- text denoising
- Text fine-tuning
- transformer transducer
- whisper
- Word-Confusion-Networks
- XLSR
- XLSR-Transducer
- Zipformer
Publications of Shashi Kumar sorted by recency
| Latent Space Factorization in LoRA, in: 39th Conference on Neural Information Processing Systems, 2025 |
[URL] |
| Fine-Tuning Pretrained Models with NVIB for Improved Generalisation, in: Workshop on Spurious Correlation and Shortcut Learning: Foundations and Solutions, 2025 |
[URL] |
| Better Semi-supervised Learning for Multi-domain ASR Through Incremental Retraining and Data Filtering, in: Interspeech 2025, Rotterdam, The Netherlands, pages 3618–3622, 2025 |
[DOI] [URL] |
| Unifying Global and Near-Context Biasing in a Single Trie Pass, in: Text, Speech, and Dialogue. TSD 2025. Lecture Notes in Computer Science, Springer, 2025 |
[DOI] [URL] |
| TokenVerse++: Towards Flexible Multitask Learning with Dynamic Task Activation, in: 2025 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), IEEE, 2025 |
|
| Efficient Data Selection for Domain Adaptation of ASR Using Pseudo-Labels and Multi-Stage Filtering, in: Proc. Interspeech, 2025 |
|
| Performance Evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward, in: SALMA Workshop, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hyderabad, India, IEEE, 2025 |
[URL] |
| XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models, in: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hyderabad, India, IEEE, 2025 |
[DOI] [URL] |
| Speech Data Selection for Efficient ASR Fine-Tuning using Domain Classifier and Pseudo-Label Filtering, in: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025), 2025 |
[DOI] [URL] |
| Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper, in: Findings of the Association for Computational Linguistics: EMNLP 2024, pages 16747–16762, Association for Computational Linguistics (ACL), 2024 |
[DOI] [URL] |
| TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR, in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 20988–20995, Association for Computational Linguistics (ACL), 2024 |
[DOI] [URL] |
| Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper, Idiap-RR-10-2024 |
|
| XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models, Idiap-RR-08-2024 |
[URL] |
| TokenVerse: Unifying Speech and NLP Tasks via Transducer-based ASR, Idiap-RR-07-2024 |
[URL] |
| Multitask Speech Recognition and Speaker Change Detection for Unknown Number of Speakers, in: Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024, Seoul, Republic of Korea, pages 12592–12596, IEEE, 2024 |
[DOI] [URL] |
| Probability-Aware Word-Confusion-Network-to-Text Alignment Approach for Intent Classification, in: Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024, Seoul, Republic of Korea, pages 12617–12621, IEEE, 2024 |
[DOI] [URL] |