Keywords:
- adaptation
- ADS-B data
- Air traffic control
- air traffic control communications
- AM
- Anti-spoofing
- audio and voice analysis
- Automatic Speech Recognition
- automatic speech recognition and understanding
- batch norm
- batch normalization
- bayesian fusion
- Call-sign Recognition
- Contextual Adaptation
- contextual biasing
- conversational modeling
- Convolutional Neural Networks
- Cross-modal Alignment
- Cross-modal Attention
- ctc
- deep neural networks
- depression detection
- domain adaptation
- e2e-lfmmi
- entity linking
- F1 score
- fine-tuning
- finite-state transducers
- FM
- Forensics
- GPU decoding
- Graph Neural Networks
- Human-Computer Interaction
- i-vector
- i-vectors
- Intent Classification
- inter-task fusion
- Interpretability
- Interpretable Models
- knowledge distillation
- language identification
- Language Production
- LEA
- limited training data
- Linear prediction
- logistic regression
- low-resource
- Mental Lexicon
- multi-lingual automatic speech recognition
- multi-lingual SAD
- Multilingual automatic speech recognition
- multitask learning
- multitask training
- named entity recognition
- node weighted graphs
- online speech recognition
- OOV-word recognition
- OpenSky Network
- OSINT
- out-of-domain
- rare word recognition
- real-time speech recognition
- ROXANNE
- ROXSD
- sentence embeddings
- Speaker change detection
- speaker clustering
- Speaker identification
- speaker recognition
- speaker role detection
- speaker turn detection
- speaker verification
- Speech activity detection
- speech dataset
- speech recognition
- spoken dialogue systems
- Spoken Language Understanding
- subspace Gaussian mixture models
- supervised adaptation
- task-oriented dialog
- Text classification
- transfer learning
- transformers
- user identity linkage
- wav2vec 2.0
- wav2vec2
- Word Consensus Networks
- Word-Confusion-Networks
- XLSR-Transducer
Publications of Srikanth Madikeri sorted by journal and type
Publications of type Idiap-RR
2020
AM-FM DECOMPOSITION OF SPEECH SIGNAL: APPLICATIONS FOR SPEECH PRIVACY AND DIAGNOSIS, Idiap-RR-01-2020
Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition System, Idiap-RR-28-2020
LATTICE-FREE MMI ADAPTATION OF SELF-SUPERVISED PRETRAINED ACOUSTIC MODELS, Idiap-RR-40-2020
Novel Architectures for Unsupervised Information Bottleneck based Speaker Diarization of Meetings, Idiap-RR-26-2020
2019
Idiap submission to the NIST SRE 2018 Speaker Recognition Evaluation, Idiap-RR-17-2019
Idiap submission to the NIST SRE 2019 Speaker Recognition Evaluation, Idiap-RR-15-2019
INVESTIGATING TIME DELAY NEURAL NETWORK (TDNN) FOR LANGUAGE MODELING IN LOW RESOURCE AUTOMATIC SPEECH RECOGNITION, Idiap-RR-13-2019
STACKED NEURAL NETWORKS WITH PARAMETER SHARING FOR MULTILINGUAL LANGUAGE MODELING, Idiap-RR-12-2019
2018
Analysis of Posterior Estimation Approaches to I-vector Extraction for Speaker Recognition, Idiap-RR-15-2018
DNN based speaker embedding using content information for text-dependent speaker verification, Idiap-RR-06-2018
Two-Pass IB based Speaker Diarization System using Meeting-Specific ANN based Features, Idiap-RR-09-2018
2017
CONTENT NORMALIZATION FOR TEXT-INDEPENDENT SPEAKER VERIFICATION, Idiap-RR-31-2017
EXPLOITING SEQUENCE INFORMATION FOR TEXT-DEPENDENT SPEAKER VERIFICATION, Idiap-RR-04-2017
INTRA-CLASS COVARIANCE ADAPTATION IN PLDA BACK-ENDS FOR SPEAKER VERIFICATION, Idiap-RR-05-2017
Template-matching for Text-dependent Speaker Verification, Idiap-RR-32-2017
Towards a breakthrough speaker identification approach for law enforcement agencies, Idiap-RR-29-2017
2016
DEEP NEURAL NETWORK BASED POSTERIORS FOR TEXT-DEPENDENT SPEAKER VERIFICATION, Idiap-RR-08-2016
IDIAP SUBMISSION TO THE NIST SRE 2016 SPEAKER RECOGNITION EVALUATION, Idiap-RR-32-2016
Implementation of the Standard I-vector System for the Kaldi Speech Recognition Toolkit, Idiap-RR-26-2016
INFORMATION THEORETIC CLUSTERING FOR UNSUPERVISED DOMAIN-ADAPTATION, Idiap-RR-09-2016
2015
COMBINING SGMM SPEAKER VECTORS AND KL-HMM APPROACH FOR SPEAKER DIARIZATION, Idiap-RR-17-2015
EMPLOYMENT OF SUBSPACE GAUSSIAN MIXTURE MODELS IN SPEAKER RECOGNITION, Idiap-RR-16-2015
Improving Real Time Factor of Information Bottleneck-based Speaker Diarization System, Idiap-RR-18-2015
Integrating Online I-vector extractor with Information Bottleneck based Speaker Diarization system, Idiap-RR-20-2015
KL-HMM BASED SPEAKER DIARIZATION SYSTEM FOR MEETINGS, Idiap-RR-19-2015
Aerospace
An Automatic Speaker Clustering Pipeline for the Air Traffic Communication Domain, in: Aerospace, 10(10):876, 2023
Lessons Learned in Transcribing 5000 h of Air Traffic Control Communications for Robust Automatic Speech Understanding, in: Aerospace, 10(10):898, 2023
IEEE Signal Processing Letters
A Large-Scale Open-Source Acoustic Simulator for Speaker Recognition, in: IEEE Signal Processing Letters, 23(4):527-531, 2016
IEEE/ACM Transactions on Audio Speech and Language Processing
Novel Architectures for Unsupervised Information Bottleneck based Speaker Diarization of Meetings, in: IEEE/ACM Transactions on Audio Speech and Language Processing, 2020
IEEE/ACM Transactions on Audio, Speech, and Language Processing
Speaker Diarization and Linking of Meeting Data, in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 24(11):1935-1945, 2016
International Journal of Speech Technology
MODIFIED GROUP DELAY FEATURE BASED TOTAL VARIABILITY SPACE MODELLING FOR SPEAKER RECOGNITION, in: International Journal of Speech Technology, 18(1):17-23, 2014
Speech Communication
Template-matching for Text-dependent Speaker Verification, in: Speech Communication, 2017
Handbook of Biometric Anti-Spoofing (2019)
Voice Presentation Attack Detection Using Convolutional Neural Networks, in: Handbook of Biometric Anti-Spoofing, pages 391-415, Springer, 2019
Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024 (2024)
CONTEXTUAL BIASING METHODS FOR IMPROVING RARE WORD DETECTION IN AUTOMATIC SPEECH RECOGNITION, in: Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024, Seoul, Korea, 2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (2024)
Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction, in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, Florida, USA, pages 5421–5440, Association for Computational Linguistics, 2024
ECAI 2024 - 27th European Conference on Artificial Intelligence, October 19-24, 2024, Santiago de Compostela, Spain - Including 13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024), Proceedings (2024)
Entity Matching Across Small Networks Using Node Attributes, in: ECAI 2024 - 27th European Conference on Artificial Intelligence, October 19-24, 2024, Santiago de Compostela, Spain - Including 13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024), Proceedings, 2024
Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) (2024)
Fine-tuning Self-Supervised Models For Language Identification Using Orthonormal Constraint, in: Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP), pages 11921-11925, 2024
Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024 (2024)
Multitask Speech Recognition and Speaker Change Detection for Unknown Number of Speakers, in: Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024, Seoul, Republic of Korea, pages 12592-12596, IEEE, 2024
Odyssey 2024: The Speaker and Language Recognition Workshop (2024)
Normalizing Flows for Speaker and Language Recognition Backend, in: Odyssey 2024: The Speaker and Language Recognition Workshop, 2024
Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024 (2024)
Probability-Aware Word-Confusion-Network-to-Text Alignment Approach for Intent Classification, in: Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024, Seoul, Republic of Korea, pages 12617-12621, IEEE, 2024
Odyssey 2024: The Speaker and Language Recognition Workshop (2024)
ROXSD: The ROXANNE Multimodal and Simulated Dataset for Advancing Criminal Investigations, in: Odyssey 2024: The Speaker and Language Recognition Workshop, pages 17-24, 2024
Interspeech 2024 (2024)
Speech and Language Recognition with Low-rank Adaptation of Pretrained Models, in: Interspeech 2024, pages 2825-2829, 2024
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (2024)
TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR, in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 20988–20995, Association for Computational Linguistics (ACL), 2024
Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (2023)
Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks, in: Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
Proc. Interspeech 2023 (2023)
Implementing contextual biasing in GPU decoder for online ASR, in: Proc. Interspeech 2023, 2023
Proceedings of Interspeech (2023)
Node-weighted Graph Convolutional Network for Depression Detection in Transcribed Clinical Interviews, in: Proceedings of Interspeech, 2023
Proc. of the IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU'23 (2023)
Parameter-Efficient Tuning With Adaptive Bottlenecks For Automatic Speech Recognition, in: Proc. of the IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU'23, 2023
Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (2022)
Expanded Lattice Embeddings for Spoken Document Retrieval on Informal Meetings, in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2022
The Speaker and Language Recognition Workshop (2022)
Speaker recognition on mono-channel telephony recordings, in: The Speaker and Language Recognition Workshop, 2022
2021 IEEE International Conference on Acoustics, Speech and Signal Processing (2021)
A COMPARISON OF METHODS FOR OOV-WORD RECOGNITION ON A NEW PUBLIC DATASET, in: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE Signal Processing Society, Toronto, Ontario, Canada, 2021