Keywords:
- adaptation
- ADS-B data
- Aho-Corasick algorithm
- Air traffic control
- air traffic control communications
- AM
- Anti-spoofing
- ASR
- ASR robustness
- audio and voice analysis
- Automatic Speech Recognition
- automatic speech recognition (ASR)
- automatic speech recognition and understanding
- batch norm
- batch normalization
- bayesian fusion
- Call-sign Recognition
- Contextual Adaptation
- contextual biasing
- Contextualisation and adaptation of ASR
- conversational modeling
- Convolutional Neural Networks
- Criminal investigations
- Cross-modal Alignment
- Cross-modal Attention
- ctc
- Data Selection
- deep neural networks
- depression detection
- DISPLACE-2
- domain adaptation
- Domain Classification
- Dual mode encoder
- e2e-lfmmi
- ECAPA-TDNN embedding
- entity linking
- F1 score
- fine-tuning
- finite-state transducers
- FM
- Forensics
- GPU decoding
- Graph Neural Networks
- Human-Computer Interaction
- i-vector
- i-vectors
- Intent Classification
- inter-task fusion
- Interpretability
- Interpretable Models
- knowledge distillation
- language identification
- Language Production
- LEA
- limited training data
- Linear prediction
- LLM-based ASR
- local speaker segmentation
- logistic regression
- low-resource
- Mental Lexicon
- multi-lingual automatic speech recognition
- multi-lingual SAD
- Multilingual automatic speech recognition
- multitask learning
- multitask training
- named entity recognition
- network analysis
- node weighted graphs
- online speech recognition
- OOV-word recognition
- OpenSky Network
- OSINT
- out-of-domain
- prompt projection
- rare word recognition
- real-time ASR
- real-time speech recognition
- ROXANNE
- ROXSD
- self-supervised learning
- sentence embeddings
- Speaker change detection
- speaker clustering
- Speaker Diarization
- Speaker identification
- speaker recognition
- speaker role detection
- speaker turn detection
- speaker verification
- Speech activity detection
- speech dataset
- speech recognition
- Speech-to-LLM alignment
- spoken dialogue systems
- Spoken Language Understanding
- streaming ASR
- subspace Gaussian mixture models
- supervised adaptation
- task-oriented dialog
- Text classification
- text denoising
- Text fine-tuning
- transfer learning
- transformer transducer
- transformers
- user identity linkage
- wav2vec 2.0
- wav2vec2
- whisper
- Word Consensus Networks
- Word-Confusion-Networks
- XLSR
- XLSR-Transducer
- Zipformer
Publications of Srikanth Madikeri sorted by recency
| Better Semi-supervised Learning for Multi-domain ASR Through Incremental Retraining and Data Filtering, in: Interspeech 2025, Rotterdam, The Netherlands, pages 3618–3622, 2025 |
[DOI] [URL] |
| Unifying Global and Near-Context Biasing in a Single Trie Pass, in: Text, Speech, and Dialogue. TSD 2025. Lecture Notes in Computer Science, Springer, 2025 |
[DOI] [URL] |
| TokenVerse++: Towards Flexible Multitask Learning with Dynamic Task Activation, in: 2025 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), IEEE, 2025 |
|
| Enhancing Speaker Diarization using Correlation-Based Clustering Initialization, Idiap-RR-09-2025 |
|
| Autocrime - open multimodal platform for combating organized crime, in: Forensic Science International: Digital Investigation, 54, 2025 |
[DOI] [URL] |
| Efficient Data Selection for Domain Adaptation of ASR Using Pseudo-Labels and Multi-Stage Filtering, in: Proc. Interspeech, 2025 |
|
| TRACY Canvas: A Criminal Network Visualization Tool, Idiap-RR-03-2025 |
|
| XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models, in: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), Hyderabad, India, IEEE, 2025 |
[DOI] [URL] |
| Speech Data Selection for Efficient ASR Fine-Tuning using Domain Classifier and Pseudo-Label Filtering, in: 2025 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2025), 2025 |
[DOI] [URL] |
| Minimum effort adaptation of automatic speech recognition system in air traffic management, in: European Journal of Transport and Infrastructure Research, 24(4 (2024)):133–153, 2025 |
[DOI] [URL] |
| TEAM SWITZERLAND SUBMISSION TO NIST SRE24 SPEAKER RECOGNITION EVALUATION, Idiap-RR-10-2025 |
|
| TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR, in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 20988–20995, Association for Computational Linguistics (ACL), 2024 |
[DOI] [URL] |
| Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction, in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, Florida, USA, pages 5421–5440, Association for Computational Linguistics, 2024 |
[URL] |
| XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models, Idiap-RR-08-2024 |
[URL] |
| TokenVerse: Unifying Speech and NLP Tasks via Transducer-based ASR, Idiap-RR-07-2024 |
[URL] |
| Entity Matching Across Small Networks Using Node Attributes, in: ECAI 2024 - 27th European Conference on Artificial Intelligence, October 19-24, 2024, Santiago de Compostela, Spain - Including 13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024), Proceedings, 2024 |
[DOI] |
| Speech and Language Recognition with Low-rank Adaptation of Pretrained Models, in: Interspeech 2024, pages 2825–2829, 2024 |
[DOI] [URL] |
| ROXSD: The ROXANNE Multimodal and Simulated Dataset for Advancing Criminal Investigations, in: Odyssey 2024: The Speaker and Language Recognition Workshop, pages 17–24, 2024 |
[DOI] [URL] |
| Normalizing Flows for Speaker and Language Recognition Backend, in: Odyssey 2024: The Speaker and Language Recognition Workshop, 2024 |
|
| Multitask Speech Recognition and Speaker Change Detection for Unknown Number of Speakers, in: Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024, Seoul, Republic of Korea, pages 12592–12596, IEEE, 2024 |
[DOI] [URL] |
| Fine-tuning Self-Supervised Models For Language Identification Using Orthonormal Constraint, in: Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP), pages 11921–11925, 2024 |
[DOI] |
| Probability-Aware Word-Confusion-Network-to-Text Alignment Approach for Intent Classification, in: Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024, Seoul, Republic of Korea, pages 12617–12621, IEEE, 2024 |
[DOI] [URL] |
| CONTEXTUAL BIASING METHODS FOR IMPROVING RARE WORD DETECTION IN AUTOMATIC SPEECH RECOGNITION, in: Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024, Seoul, Republic of Korea, 2024 |
|
| Parameter-Efficient Tuning With Adaptive Bottlenecks For Automatic Speech Recognition, in: Proc. of the IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU'23, 2023 |
[DOI] |
| Lessons Learned in Transcribing 5000 h of Air Traffic Control Communications for Robust Automatic Speech Understanding, in: Aerospace, 10(10):898, 2023 |
[DOI] [URL] |
| An Automatic Speaker Clustering Pipeline for the Air Traffic Communication Domain, in: Aerospace, 10(10):876, 2023 |
[DOI] [URL] |
| Node-weighted Graph Convolutional Network for Depression Detection in Transcribed Clinical Interviews, in: Proceedings of Interspeech, 2023 |
|
| Implementing contextual biasing in GPU decoder for online ASR, in: Proc. Interspeech 2023, pages 4494–4498, 2023 |
[DOI] [URL] |
| Implementing contextual biasing in GPU decoder for online ASR, Idiap-RR-02-2023 |
|
| Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks, in: Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, 2023 |
|
| Node-weighted Graph Convolutional Network for Depression Detection in Transcribed Clinical Interviews, Idiap-RR-03-2023 |
|
| IDIAP SUBMISSION TO NIST LRE22 LANGUAGE RECOGNITION EVALUATION, Idiap-RR-11-2025 |
|
| Speaker recognition on mono-channel telephony recordings, in: The Speaker and Language Recognition Workshop, 2022 |
|
| Expanded Lattice Embeddings for Spoken Document Retrieval on Informal Meetings, in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2022 |
[DOI] |