Keywords:
- adaptation
- ADS-B data
- Air traffic control
- air traffic control communications
- AM
- Anti-spoofing
- audio and voice analysis
- Automatic Speech Recognition
- automatic speech recognition (ASR)
- automatic speech recognition and understanding
- batch norm
- batch normalization
- bayesian fusion
- Call-sign Recognition
- Contextual Adaptation
- contextual biasing
- conversational modeling
- Convolutional Neural Networks
- Cross-modal Alignment
- Cross-modal Attention
- ctc
- Data Selection
- deep neural networks
- depression detection
- domain adaptation
- Domain Classification
- e2e-lfmmi
- entity linking
- F1 score
- fine-tuning
- finite-state transducers
- FM
- Forensics
- GPU decoding
- Graph Neural Networks
- Human-Computer Interaction
- i-vector
- i-vectors
- Intent Classification
- inter-task fusion
- Interpretability
- Interpretable Models
- knowledge distillation
- language identification
- Language Production
- LEA
- limited training data
- Linear prediction
- logistic regression
- low-resource
- Mental Lexicon
- multi-lingual automatic speech recognition
- multi-lingual SAD
- Multilingual automatic speech recognition
- multitask learning
- multitask training
- named entity recognition
- node weighted graphs
- online speech recognition
- OOV-word recognition
- OpenSky Network
- OSINT
- out-of-domain
- rare word recognition
- real-time speech recognition
- ROXANNE
- ROXSD
- sentence embeddings
- Speaker change detection
- speaker clustering
- Speaker identification
- speaker recognition
- speaker role detection
- speaker turn detection
- speaker verification
- Speech activity detection
- speech dataset
- speech recognition
- spoken dialogue systems
- Spoken Language Understanding
- subspace Gaussian mixture models
- supervised adaptation
- task-oriented dialog
- Text classification
- transfer learning
- transformers
- user identity linkage
- wav2vec 2.0
- wav2vec2
- whisper
- Word Consensus Networks
- Word-Confusion-Networks
- XLSR-Transducer
- Zipformer
Publications of Srikanth Madikeri sorted by title
R
ROXSD: The ROXANNE Multimodal and Simulated Dataset for Advancing Criminal Investigations, in: Odyssey 2024: The Speaker and Language Recognition Workshop, pages 17-24, 2024
S
SARAL: A Low-Resource Cross-Lingual Domain-Focused Information Retrieval System for Effective Rapid Document Triage, in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations, pages 19-24, 2019
SIIP: An Innovative Speaker Identification Approach for Law Enforcement Agencies, in: Big Data and Artificial Intelligence for Military Decision Making, http://www.sto.nato.int/, pages PT-1 - 1: PT-1 - 14, STO, 2018
Speaker Diarization and Linking of Meeting Data, in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(11):1935-1945, 2016
Speaker recognition on mono-channel telephony recordings, in: The Speaker and Language Recognition Workshop, 2022
Speech Activity Detection Based on Multilingual Speech Recognition System, in: Interspeech, 2021
Speech and Language Recognition with Low-rank Adaptation of Pretrained Models, in: Interspeech 2024, pages 2825-2829, 2024
STACKED NEURAL NETWORKS WITH PARAMETER SHARING FOR MULTILINGUAL LANGUAGE MODELING, Idiap-RR-12-2019
Supervised domain adaptation for text-independent speaker verification using limited data, in: Interspeech, pages 3815-3819, 2020
SYSTEM FUSION AND SPEAKER LINKING FOR LONGITUDINAL DIARIZATION OF TV SHOWS, in: Proceedings of 2016 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016), Shanghai, pages 5495-5499, IEEE, 2016
T
Template-matching for Text-dependent Speaker Verification, Idiap-RR-32-2017
Template-matching for Text-dependent Speaker Verification, in: Speech Communication, 2017
TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR, in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 20988-20995, Association for Computational Linguistics (ACL), 2024
TokenVerse: Unifying Speech and NLP Tasks via Transducer-based ASR, Idiap-RR-07-2024
Towards a breakthrough speaker identification approach for law enforcement agencies, Idiap-RR-29-2017
Towards a breakthrough Speaker Identification approach for Law Enforcement Agencies: SIIP, in: European Intelligence and Security Informatics Conference (EISIC) 2017, Athens, Greece, pages 32-39, IEEE Computer Society, 2017
Towards utterance-based neural network adaptation in acoustic modeling, in: IEEE Automatic Speech Recognition and Understanding Workshop, pages 289-295, 2015
Two-Pass IB based Speaker Diarization System using Meeting-Specific ANN based Features, Idiap-RR-09-2018
Two-Pass IB based Speaker Diarization System using Meeting-Specific ANN based Features, in: Proceedings of Interspeech 2016, pages 2199-2203, 2016
V
Voice Presentation Attack Detection Using Convolutional Neural Networks, in: Handbook of Biometric Anti-Spoofing, pages 391-415, Springer, 2019
X
XLSR-Transducer: Streaming ASR for Self-Supervised Pretrained Models, Idiap-RR-08-2024