Keywords:
- automatic speech recognition (ASR)
- Cross-modal Alignment
- Data Selection
- Domain Classification
- F1 score
- Intent Classification
- knowledge distillation
- multitask learning
- multitask training
- named entity recognition
- pseudo-labelling
- shallow fusion
- Speaker change detection
- speaker turn detection
- speech recognition
- Spoken Language Understanding
- streaming transducer
- Whisper
- Word-Confusion-Networks
- XLSR-Transducer
- Zipformer
Publications of Shashi Kumar sorted by journal and type
Publications of type Idiap-RR
2024
Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper, Idiap-RR-10-2024
TokenVerse: Unifying Speech and NLP Tasks via Transducer-based ASR, Idiap-RR-07-2024
XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models, Idiap-RR-08-2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (2024)
Fast Streaming Transducer ASR Prototyping via Knowledge Distillation with Whisper, in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, ACL, 2024
TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR, in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 20988–20995, Association for Computational Linguistics (ACL), 2024
Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024 (2024)
Multitask Speech Recognition and Speaker Change Detection for Unknown Number of Speakers, in: Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024, Seoul, Republic of Korea, pages 12592–12596, IEEE, 2024
Probability-Aware Word-Confusion-Network-to-Text Alignment Approach for Intent Classification, in: Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024, Seoul, Republic of Korea, pages 12617–12621, IEEE, 2024