Keywords:
- adaptation
- ADS-B data
- air traffic control communications
- AM
- Anti-spoofing
- audio and voice analysis
- Automatic Speech Recognition
- automatic speech recognition and understanding
- batch norm
- batch normalization
- bayesian fusion
- Call-sign Recognition
- Contextual Adaptation
- Convolutional Neural Networks
- Cross-modal Attention
- ctc
- deep neural networks
- depression detection
- e2e-lfmmi
- fine-tuning
- finite-state transducers
- FM
- Forensics
- GPU decoding
- Graph Neural Networks
- Human-Computer Interaction
- i-vector
- i-vectors
- inter-task fusion
- Interpretability
- Interpretable Models
- language identification
- Language Production
- LEA
- limited training data
- Linear prediction
- logistic regression
- low-resource
- Mental Lexicon
- multi-lingual automatic speech recognition
- multi-lingual SAD
- Multilingual automatic speech recognition
- node weighted graphs
- online speech recognition
- OOV-word recognition
- OpenSky Network
- OSINT
- out-of-domain
- real-time speech recognition
- speaker clustering
- Speaker identification
- speaker recognition
- speaker role detection
- speaker verification
- Speech activity detection
- speech dataset
- speech recognition
- Spoken Language Understanding
- subspace Gaussian mixture models
- supervised adaptation
- Text classification
- transfer learning
- transformers
- wav2vec 2.0
- wav2vec2
- Word Consensus Networks
Publications of Srikanth Madikeri
2023
An Automatic Speaker Clustering Pipeline for the Air Traffic Communication Domain, , , , , , and , in: Aerospace, 10, 2023
Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks, , , , , , , , and , in: Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, 2023
Graph2Speak: Improving Speaker Identification using Network Knowledge in Criminal Conversational Data, , , and , Idiap-RR-01-2023
Implementing contextual biasing in GPU decoder for online ASR, , , , , , and , Idiap-RR-02-2023
Implementing contextual biasing in GPU decoder for online ASR, , , , , , and , in: Proc. Interspeech 2023, 2023
Lessons Learned in Transcribing 5000 h of Air Traffic Control Communications for Robust Automatic Speech Understanding, , , , , , , , , , and , in: Aerospace, 10(10), 2023
Node-weighted Graph Convolutional Network for Depression Detection in Transcribed Clinical Interviews, , , and , Idiap-RR-03-2023
Node-weighted Graph Convolutional Network for Depression Detection in Transcribed Clinical Interviews, , , and , in: Proceedings of Interspeech, 2023
Parameter-Efficient Tuning With Adaptive Bottlenecks For Automatic Speech Recognition, , , , , and , in: Proc. of the IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU'23, 2023
2022
Expanded Lattice Embeddings for Spoken Document Retrieval on Informal Meetings, , , , and , Idiap-RR-06-2022
Expanded Lattice Embeddings for Spoken Document Retrieval on Informal Meetings, , , , and , in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2022
Idiap Submission to the NIST LRE22 Language Recognition Evaluation, , , and , Idiap-Internal-RR-54-2022
Speaker recognition on mono-channel telephony recordings, , , , and , in: The Speaker and Language Recognition Workshop, 2022
2021
A Comparison of Methods for OOV-Word Recognition on a New Public Dataset, , and , in: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE Signal Processing Society, Toronto, Ontario, Canada, 2021
Comparing CTC and LFMMI for out-of-domain adaptation of wav2vec 2.0 acoustic model, , and , Idiap-RR-04-2021
Comparing CTC and LFMMI for out-of-domain adaptation of wav2vec 2.0 acoustic model, , and , in: Proceedings of Interspeech, 2021
Graph2Speak: Improving Speaker Identification using Network Knowledge in Criminal Conversational Data, , , and , in: 1st ISCA Symposium on Security and Privacy in Speech Communication, pages 10-13, 2021
Lattice-Free MMI Adaptation of Self-Supervised Pretrained Acoustic Models, , and , in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2021
Multitask adaptation with Lattice-Free MMI for multi-genre speech recognition of low resource languages, , and , in: Proceedings of Interspeech 2021, 2021
Speech Activity Detection Based on Multilingual Speech Recognition System, , and , in: Interspeech, 2021
2020
A Bayesian Approach to Inter-Task Fusion for Speaker Recognition, , and , Idiap-RR-07-2020
AM-FM Decomposition of Speech Signal: Applications for Speech Privacy and Diagnosis, , , , and , Idiap-RR-01-2020
Incremental Semi-Supervised Learning for Multi-Genre Speech Recognition, , , , , and , in: Proceedings of ICASSP 2020, 2020
Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition System, , , , , and , Idiap-RR-28-2020
Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition System, , , , , and , in: Proceedings of Interspeech 2020, pages 4746-4750, ISCA, 2020
Lattice-Free MMI Adaptation of Self-Supervised Pretrained Acoustic Models, , and , Idiap-RR-40-2020
Novel Architectures for Unsupervised Information Bottleneck based Speaker Diarization of Meetings, , , and , Idiap-RR-26-2020
Novel Architectures for Unsupervised Information Bottleneck based Speaker Diarization of Meetings, , , and , in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 2020
Supervised domain adaptation for text-independent speaker verification using limited data, , , and , in: Interspeech, pages 3815-3819, 2020
2019
A Bayesian Approach to Inter-Task Fusion for Speaker Recognition, , and , in: Proceedings of ICASSP 2019, Brighton, England, pages 5786-5790, 2019
AM-FM Decomposition of Speech Signal: Applications for Speech Privacy and Diagnosis, , , , and , in: 11th International Workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, Università degli Studi di Firenze, Firenze, Italy, 2019
Idiap submission to the NIST SRE 2018 Speaker Recognition Evaluation, , , and , Idiap-RR-17-2019
Idiap submission to the NIST SRE 2019 Speaker Recognition Evaluation, , , , and , Idiap-RR-15-2019
Incremental Transfer Learning in Two-Pass Information Bottleneck Based Speaker Diarization System for Meetings, , , and , in: Proceedings of ICASSP 2019, pages 6291-6295, 2019