Keywords:
- adaptation
- ADS-B data
- Aho-Corasick algorithm
- Air traffic control
- air traffic control communications
- AM
- Anti-spoofing
- ASR
- ASR robustness
- audio and voice analysis
- Automatic Speech Recognition
- automatic speech recognition (ASR)
- automatic speech recognition and understanding
- batch norm
- batch normalization
- bayesian fusion
- Call-sign Recognition
- Contextual Adaptation
- contextual biasing
- Contextualisation and adaptation of ASR
- conversational modeling
- Convolutional Neural Networks
- Criminal investigations
- Cross-modal Alignment
- Cross-modal Attentio
- Cross-modal Attention
- ctc
- Data Selection
- deep neural networks
- depression detection
- DISPLACE-2
- domain adaptation
- Domain Classification
- Dual mode encoder
- e2e-lfmmi
- ECAPA-TDNN embedding
- entity linking
- F1 score
- fine-tuning
- finite-state transducers
- FM
- Forensics
- GPU decoding
- Graph Neural Networks
- Human-Computer Interaction
- i-vector
- i-vectors
- Intent Classification
- inter-task fusion
- Interpretability
- Interpretable Models
- knowledge distillation
- language identification
- Language Production
- LEA
- limited training data
- Linear prediction
- LLM-based ASR
- local speaker segmentation
- logistic regression
- low-resource
- Mental Lexicon
- multi-lingual automatic speech recognition
- multi-lingual SAD
- Multilingual automatic speech recognition
- multitask learning
- multitask training
- named entity recognition
- network analysis
- node weighted graphs
- online speech recognition
- OOV-word recognition
- OpenSky Network
- OSINT
- out-of-domain
- prompt projection
- rare word recognition
- real-time ASR
- real-time speech recognition
- ROXANNE
- ROXSD
- self-supervised learning
- sentence embeddings
- Speaker change detection
- speaker clustering
- Speaker Diarization
- Speaker identification
- speaker recognition
- speaker role detection
- speaker turn detection
- speaker verification
- Speech activity detection
- speech dataset
- speech recognition
- Speech-to-LLM alignment
- spoken dialogue systems
- Spoken Language Understanding
- streaming ASR
- subspace Gaussian mixture models
- supervised adaptation
- task-oriented dialog
- Text classification
- text denoising
- Text fine-tuning
- transfer learning
- transformer transducer
- transformers
- user identity linkage
- wav2vec 2.0
- wav2vec2
- whisper
- Word Consensus Networks
- Word-Confusion-Networks
- XLSR
- XLSR-Transducer
- Zipformer
Publications of Srikanth Madikeri sorted by journal and type
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (2024)
| Dialog2Flow: Pre-training Soft-Contrastive Action-Driven Sentence Embeddings for Automatic Dialog Flow Extraction, , and , in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, Florida, USA, pages 5421–5440, Association for Computational Linguistics, 2024 |
[URL] |
ECAI 2024 - 27th European Conference on Artificial Intelligence, October 19-24, 2024, Santiago de Compostela, Spain - Including 13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024), Proceedings (2024)
| Entity Matching Across Small Networks Using Node Attributes, , , , , , , , , and , in: ECAI 2024 - 27th European Conference on Artificial Intelligence, October 19-24, 2024, Santiago de Compostela, Spain - Including 13th Conference on Prestigious Applications of Intelligent Systems (PAIS 2024), Proceedings, 2024 |
[DOI] |
Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) (2024)
| Fine-tuning Self-Supervised Models For Language Identification Using Orthonormal Constraint, , , , , , and , in: Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP), pages 11921-11925, 2024 |
[DOI] |
Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024 (2024)
| Multitask Speech Recognition and Speaker Change Detection for Unknown Number of Speakers, , , , , , , and , in: Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024, Seoul, Republic of Korea, pages 12592-12596, IEEE, 2024 |
[DOI] [URL] |
Odyssey 2024: The Speaker and Language Recognition Workshop (2024)
| Normalizing Flows for Speaker and Language Recognition Backend, , , , and , in: Odyssey 2024: The Speaker and Language Recognition Workshop, 2024 |
|
Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024 (2024)
| Probability-Aware Word-Confusion-Network-to-Text Alignment Approach for Intent Classification, , , , , , , and , in: Proceedings of the 49th IEEE International Conference on Acoustics, Speech, & Signal Processing (ICASSP) 2024, Seoul, Republic of Korea, pages 12617-12621, IEEE, 2024 |
[DOI] [URL] |
Odyssey 2024: The Speaker and Language Recognition Workshop (2024)
| ROXSD: The ROXANNE Multimodal and Simulated Dataset for Advancing Criminal Investigations, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , and , in: Odyssey 2024: The Speaker and Language Recognition Workshop, pages 17-24, 2024 |
[DOI] [URL] |
Interspeech 2024 (2024)
| Speech and Language Recognition with Low-rank Adaptation of Pretrained Models, , , , and , in: Interspeech 2024, pages 2825--2829, 2024 |
[DOI] [URL] |
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (2024)
| TokenVerse: Towards Unifying Speech and NLP Tasks via Transducer-based ASR, , , , , , , , and , in: Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 20988–20995, Association for Computational Linguistics (ACL), 2024 |
[DOI] [URL] |
Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (2023)
| Effectiveness of Text, Acoustic, and Lattice-based representations in Spoken Language Understanding tasks, , , , , , , , and , in: Proceedings of the 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, 2023 |
|
Proc. Interspeech 2023 (2023)
| Implementing contextual biasing in GPU decoder for online ASR, , , , , , and , in: Proc. Interspeech 2023, pages 4494--4498, 2023 |
[DOI] [URL] |
Proceedings of Interspeech (2023)
| Node-weighted Graph Convolutional Network for Depression Detection in Transcribed Clinical Interviews, , , and , in: Proceedings of Interspeech, 2023 |
|
Proc. of the IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU'23 (2023)
| Parameter-Efficient Tuning With Adaptive Bottlenecks For Automatic Speech Recognition, , , , , and , in: Proc. of the IEEE Workshop on Automatic Speech Recognition and Understanding, ASRU'23, 2023 |
[DOI] |
Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval (2022)
| Expanded Lattice Embeddings for Spoken Document Retrieval on Informal Meetings, , , , and , in: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM, 2022 |
[DOI] |
The Speaker and Language Recognition Workshop (2022)
| Speaker recognition on mono-channel telephony recordings, , , , and , in: The Speaker and Language Recognition Workshop, 2022 |
|
2021 IEEE International Conference on Acoustics, Speech and Signal Processing (2021)
| A COMPARISON OF METHODS FOR OOV-WORD RECOGNITION ON A NEW PUBLIC DATASET, , and , in: 2021 IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE Signal Processing Society, Toronto, Ontario, Canada, 2021 |
|
Proceedings of Interspeech (2021)
| Comparing CTC and LFMMI for out-of-domain adaptation of wav2vec 2.0 acoustic model, , and , in: Proceedings of Interspeech, 2021 |
[URL] |
1st ISCA Symposium on Security and Privacy in Speech Communication (2021)
| Graph2Speak: Improving Speaker Identification using Network Knowledge in Criminal Conversational Data, , , and , in: 1st ISCA Symposium on Security and Privacy in Speech Communication, pages 10--13, 2021 |
[DOI] |
Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (2021)
| LATTICE-FREE MMI ADAPTATION OF SELF-SUPERVISED PRETRAINED ACOUSTIC MODELS, , and , in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, 2021 |
[URL] |
Proceedings of Interspeech 2021 (2021)
| Multitask adaptation with Lattice-Free MMI for multi-genre speech recognition of low resource languages, , and , in: Proceedings of Interspeech 2021, 2021 |
|
Interspeech (2021)
| Speech Activity Detection Based on Multilingual Speech Recognition System, , and , in: Interspeech, 2021 |
|
Proceedings of ICASSP 2020 (2020)
| INCREMENTAL SEMI-SUPERVISED LEARNING FOR MULTI-GENRE SPEECH RECOGNITION, , , , , and , in: Proceedings of ICASSP 2020, 2020 |
|
In Proceedings of Interspeech 2020 (2020)
| Lattice-Free Maximum Mutual Information Training of Multilingual Speech Recognition System, , , , , and , in: In Proceedings of Interspeech 2020, pages 4746--4750, ISCA, 2020 |
|
Interspeech (2020)
| Supervised domain adaptation for text-independent speaker verification using limited data, , , and , in: Interspeech, pages 3815-3819, 2020 |
[URL] |
In Proceedings of ICASSP 2019 (2019)
| A BAYESIAN APPROACH TO INTER-TASK FUSION FOR SPEAKER RECOGNITION, , and , in: In Proceedings of ICASSP 2019, Brighton, ENGLAND, pages 5786-5790, 2019 |
|
11th International workshop on Models and Analysis of Vocal Emissions for Biomedical Applications (2019)
| AM-FM DECOMPOSITION OF SPEECH SIGNAL: APPLICATIONS FOR SPEECH PRIVACY AND DIAGNOSIS, , , , and , in: 11th International workshop on Models and Analysis of Vocal Emissions for Biomedical Applications, Universita Degli Studi Firenze, Firenze, Italy, 2019 |
[URL] |
Proceedings of ICASSP 2019 (2019)
| INCREMENTAL TRANSFER LEARNING IN TWO-PASS INFORMATION BOTTLENECK BASED SPEAKER DIARIZATION SYSTEM FOR MEETINGS, , , and , in: Proceedings of ICASSP 2019, pages 6291-6295, 2019 |
Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. 2019 (2019)
| SARAL: A Low-Resource Cross-Lingual Domain-Focused Information Retrieval System for Effective Rapid Document Triage, , , , , , , , , , , and , in: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: System Demonstrations. 2019, pages 19-24, 2019 |
Proceedings of Interspeech 2018 (2018)
| Analysis of Language Dependent Front-End for Speaker Recognition, , and , in: Proceedings of Interspeech 2018, Hyderabad, INDIA, pages 1101-1105, 2018 |
[DOI] |
Proceedings of 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing (2018)
| DNN based speaker embedding using content information for text-dependent speaker verification, , , and , in: Proceedings of 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2018 |
|
Proceedings of Interspeech 2018 (2018)
| End-to-end text-dependent speaker verification using novel distance measures, , and , in: Proceedings of Interspeech 2018, Hyderabad, INDIA, Aug 02-Sep 06, 2018, pages 3598-3602, 2018 |
[DOI] |
Big Data and Artificial Intelligence for Military Decision Making (2018)
| SIIP: An Innovative Speaker Identification Approach for Law Enforcement Agencies, , , , , , , , , and , in: Big Data and Artificial Intelligence for Military Decision Making, http://www.sto.nato.int/, pages PT-1 - 1: PT-1 - 14, STO, 2018 |
[DOI] [URL] |
Proc. of Interspeech (2017)
| Content Normalization for Text-dependent Speaker Verification, , , and , in: Proc. of Interspeech, 2017 |
|
Proceedings of 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing (2017)
| EXPLOITING SEQUENCE INFORMATION FOR TEXT-DEPENDENT SPEAKER VERIFICATION, , , and , in: Proceedings of 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing, New Orleans, pages 5370-5374, 2017 |
|
Proceedings of International Conference on Acoustics, Speech and Signal Processing (2017)
| INTRA-CLASS COVARIANCE ADAPTATION IN PLDA BACK-ENDS FOR SPEAKER VERIFICATION, , , and , in: Proceedings of International Conference on Acoustics, Speech and Signal Processing, pages 5365-5369, 2017 |
[DOI] |
European Intelligence and Security Informatics Conference (EISIC) 2017 (2017)
| Towards a breakthrough Speaker Identification approach for Law Enforcement Agencies: SIIP, , , , , , , , , and , in: European Intelligence and Security Informatics Conference (EISIC) 2017, Athenes, Greece, pages 32-39, IEEE Computer Society, 2017 |
[DOI] [URL] |
Proceedings of 2016 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016) (2016)
| DEEP NEURAL NETWORK BASED POSTERIORS FOR TEXT-DEPENDENT SPEAKER VERIFICATION, , , and , in: Proceedings of 2016 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016), Shanghai, pages 5050-5054, IEEE, 2016 |
|
| INFORMATION THEORETIC CLUSTERING FOR UNSUPERVISED DOMAIN-ADAPTATION, , and , in: Proceedings of 2016 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016), Shanghai, pages 5580-5584, IEEE, 2016 |
|
Proceeedings of the INTERSPEECH (2016)
| Inter-task System Fusion for Speaker Recognition, , , , and , in: Proceeedings of the INTERSPEECH, 2016 |
|
Proceedings of 2016 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016) (2016)
| SYSTEM FUSION AND SPEAKER LINKING FOR LONGITUDINAL DIARIZATION OF TV SHOWS, , , and , in: Proceedings of 2016 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016), Shanghai, pages 5495-5499, IEEE, 2016 |
|
Proceedings of Interspeech 2016 (2016)
| Two-Pass IB based Speaker Diarization System using Meeting-Specific ANN based Features, , , and , in: Proceedings of Interspeech 2016, pages 2199-2203, 2016 |
Proceedings of ICASSP 2015 (2015)
| COMBINING SGMM SPEAKER VECTORS AND KL-HMM APPROACH FOR SPEAKER DIARIZATION, , and , in: Proceedings of ICASSP 2015, pages 4834-4837, 2015 |
|
2015 IEEE International Conference on Acoustics, Speech, and Signal Processing (2015)
| EMPLOYMENT OF SUBSPACE GAUSSIAN MIXTURE MODELS IN SPEAKER RECOGNITION, , , and , in: 2015 IEEE International Conference on Acoustics, Speech, and Signal Processing, IEEE, Brisbane, Australia, pages 4445-4449, 2015 |
[URL] |
Proceedings of Interspeech 2015 (2015)
| Integrating Online I-vector extractor with Information Bottleneck based Speaker Diarization system, , , and , in: Proceedings of Interspeech 2015, pages 3105-3109, 2015 |
|
Proceedings of ICASSP 2015 (2015)
| KL-HMM BASED SPEAKER DIARIZATION SYSTEM FOR MEETINGS, and , in: Proceedings of ICASSP 2015, pages 4435-4439, 2015 |
|
IEEE Automatic Speech Recognition and Understanding Workshop (2015)
| Towards utterance-based neural network adaptation in acoustic modeling, , , and , in: IEEE Automatic Speech Recognition and Understanding Workshop, pages 289-295, 2015 |
|
Proc. of Interspeech 2014 (2014)
| Feature Switching in the i-vector Framework for Speaker Verification, , , , and , in: Proc. of Interspeech 2014, pages 5, 2014 |