Keywords:
- acoustic modeling
- Ad hoc array calibration
- Ad hoc microphone array calibration
- Ad-hoc microphone array
- Ad-hoc microphone calibration
- adaptive training
- Adequacy of diffuseness
- AFC
- Afrikaans
- Artificial Neural Networks
- assessment method
- association rules
- audio–visual speech synchrony
- Auto-associative multilayer perceptrons
- autoencoders
- automatic disambiguation
- Automatic prosodic event detection
- Automatic Speech Recognition
- automatic speech recognition (ASR)
- Autoregressive modeling
- Bayesian recognition
- binary masking
- Binary pattern matching
- bottleneck
- Broadband beam-pattern
- Cadzow algorithm
- canonical correlation analysis
- Cerebral Palsy
- chain models
- Channel selection
- clinical application
- Compressive Acoustic Measurements
- Compressive sampling
- Compressive Sensing
- computer vision
- connectionist temporal classification (ctc)
- Conversational technologies
- Convex optimization
- Convolutive source separation
- crosslingual adaptation
- ctc
- data analysis
- data utility
- deep MLPs
- Deep neural network
- Deep neural network (DNN)
- Deep neural network posterior features
- Deep neural network posterior probabilities
- deep neural networks
- Delay-and-sum beamformer
- Dictionary learning
- difference features
- Diffuse field coherence model
- Diffuse noise coherence
- Diffuse sound coherence model
- Diffuse sound field
- Digital IIR Filters
- Directivity
- discourse connectives
- Distant speech recognition
- Distributed source localization
- dnn
- dnn-based speech recognition
- duration models
- Dynamic Bayesian Network
- Dysarthria
- e2e-lfmmi
- error correction
- Euclidean distance matrix
- evidence combination
- exemplar-based modeling
- far-field asr
- far-field speech
- Fast $k$NN
- fast adaptation
- fast training
- FC
- floor control
- Fujisaki Model
- full combination
- gaming
- Generalized cross correlation
- Generalized Trust Region Subproblem (GTRS)
- Graphemes
- Grassmannian discriminant analysis
- hidden variable
- high-dimensional sparse representations
- HMM/ANN-Hybrid
- HMMs
- human behaviour analysis
- hybrid system
- Image Model
- information bottleneck
- Information Bottleneck clustering
- Information Retrieval
- intelligibility
- Joint sparse recovery
- k-nearest neighbor (kNN) search
- Keyword Detection
- Keyword spotting detection
- KL-divergence
- KL-HMM
- kNN classifier
- Kullback-Leibler divergence
- Kullback-Leibler divergence based hidden Markov model
- Laplacian speech modeling
- Linguistic parsing
- Low bit rate speech vocoding
- low-rank representation (LRR)
- low-rank sparsity
- Matrix completion
- microphone array
- Microphone array calibration
- missing data
- missing features
- ML-adaptation
- Model-Based Compressive Sensing
- Model-based sparse recovery
- models
- multi-band
- multi-band combination
- Multi-party Speech Recognition
- Multi-speaker Localization
- multi-stream
- multi-stream processing
- multiband
- multilayer perceptron
- multilingual acoustic modeling
- multilingual ASR
- multilingual speech recognition
- Multimodal interaction
- multimodal signal processing
- multimodal speaker diarisation
- Multiparty Conversation
- multiparty meetings
- multiple time scales
- mutual information
- nearest neighbour rule of classification
- neural network
- neural network features
- neural networks
- Noise
- noise adaptation
- noise annoyance
- noise intrusiveness
- noise reduction
- noise robust ASR
- Noisy Text
- Non-negative matrix factorization
- non-verbal features
- Objective intelligibility
- objective measures
- open-architecture distributed system
- Overlap speech
- Overlapping Speech
- overlapping speech recognition
- P-ESTOI
- Pairwise distance estimation
- PCA
- perceptual quality assessment
- Phase transform
- Phone posterior
- Phoneme classification
- phonemes
- Phonological features
- Phonological posterior
- phonological posteriors
- posterior feature
- Posterior feature space
- Posterior features
- Posterior hashing
- posterior probability
- Posterior probability structures
- Posterior representatives
- posterior space properties
- posterior-based metrics
- Principal component analysis
- Pronunciation dewarping
- Prosody Modelling
- Psychoacoustics
- Quantized posterior hashing
- query by example
- real-time processing
- reliability
- representation learning
- Reverberant enclosure
- Reverberation
- robust ASR
- Robust microphone placement
- robust recognition
- Room acoustic characterization
- Room acoustic estimation
- Room Geometry
- Room geometry estimation
- S-stress
- Semidefinite programming
- sensor fusion
- Single-channel source localization
- Singular value decomposition
- Social Behaviour Analysis
- Social Interactions
- Social Signal Processing
- Social signals
- soft targets
- sound source localization
- Source localization
- sparse autoencoder
- sparse coding
- Sparse Component Analysis
- Sparse modeling
- sparse overcomplete autoencoder
- Sparse Recovery
- sparse representation
- Sparse Signal Recovery
- Sparse word posterior probabilities
- sparsity
- Speaker Diarization
- Speaker localization
- speaker turn
- spectral amplitude estimation
- Spectral subspace
- Speech
- Speech Analysis
- Speech dereverberation
- Speech enhancement
- Speech intelligibility
- speech modeling
- speech processing
- speech quality
- speech recognition
- speech separation
- Speech source localization
- speech sparsity
- Speech spectral structures
- speech synthesis
- Spoken Documents Retrieval
- spoken term detection
- spontaneous meeting recordings
- Statistical Machine Translation
- Structural similarity measure
- Structured Sparse Coding
- Structured sparse representation
- Structured sparsity
- structured sparsity models
- subbands
- subjective evaluation
- subjective testing
- Subspace detection
- Superdirective beamformer
- SVD
- synchronisation
- synthetic reference templates
- Tandem
- template-based approach
- temporal modulations
- temporal subspace
- text-to-speech synthesis
- triphone mapping
- TTS
- underdetermined convolutive speech separation
- under-resourced languages
- under-resourced speech recognition
- universal phoneme set
- utterance verification
- verb tense
- wav2vec 2.0
- weighting
- word emphasis
Publications of Hervé Bourlard sorted by recency
Exploiting Low-dimensional Structures to Enhance DNN based Acoustic Modeling in Speech Recognition, in: Proceedings of 2016 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2016), Shanghai, pages 5690-5694, IEEE, 2016
Phonetic and Phonological Posterior Search Space Hashing Exploiting Class-Specific Sparsity Structures, Idiap-RR-10-2016
On Structured Sparsity of Phonological Posteriors for Linguistic Parsing, Idiap-RR-07-2016
Subspace Detection of DNN Posterior Probabilities via Sparse Representation for Query by Example Spoken Term Detection, Idiap-RR-06-2016
Low-Rank Representation of Nearest Neighbor Phone Posterior Probabilities to Enhance DNN Acoustic Modeling, Idiap-RR-04-2016
Sound Pattern Matching for Automatic Prosodic Event Detection, Idiap-RR-03-2016
Sparse Subspace Modeling for Query by Example Spoken Term Detection, Idiap-RR-01-2016
On Compressibility of Neural Network Phonological Features for Low Bit Rate Speech Coding, in: Proceedings of Interspeech, pages 418-422, ISCA, 2015
Predicting the Intrusiveness of Noise through Sparse Coding with Auditory Kernels, in: Speech Communication, 76:186-200, 2016
Binary Sparse Coding of Convolutive Mixtures for Sound Localization and Separation via Spatialization, in: IEEE Transactions on Signal Processing, 64(3):567-579, 2016
Spatial Sound Localization via Multipath Euclidean Distance Matrix Recovery, in: IEEE Journal of Selected Topics in Signal Processing, 9(5):802-814, 2015
Computational Methods for Underdetermined Convolutive Speech Localization and Separation via Model-based Sparse Component Analysis, in: Speech Communication, 76:201-217, 2016
Dictionary Learning for Sparse Representation of Neural Network Exemplars in Speech Recognition, in: Proceedings of SPARS 2015: Workshop on Signal Processing with Adaptive Sparse Structured Representations, pages 1093, 2015
Sparse Modeling of Neural Network Posterior Probabilities for Exemplar-Based Speech Recognition, in: Proceedings of SPARS 2015: Workshop on Signal Processing with Adaptive Sparse Structured Representations, 2015
Objective Intelligibility Assessment of Text-to-Speech Systems Through Utterance Verification, in: Proceedings of Interspeech, Dresden, Germany, pages 3501-3505, 2015
KL-HMM Based Speaker Diarization System for Meetings, in: Proceedings of ICASSP 2015, pages 4435-4439, 2015
Combining SGMM Speaker Vectors and KL-HMM Approach for Speaker Diarization, in: Proceedings of ICASSP 2015, pages 4834-4837, 2015
Sparse Modeling of Posterior Exemplars for Keyword Detection, in: Proceedings of Interspeech, pages 3690-3694, 2015
Automatic Recognition of Emergent Social Roles in Small Group Interactions, in: IEEE Transactions on Multimedia, 17(5):746-760, 2015
Sparse Modeling of Neural Network Posterior Probabilities for Exemplar-based Speech Recognition, in: Speech Communication: Special Issue on Advances in Sparse Modeling and Low-rank Modeling for Speech Processing, 76:230-244, 2016
Objective Intelligibility Assessment of Text-to-Speech Systems Through Utterance Verification, Idiap-RR-06-2015
Objective Speech Intelligibility Assessment through Comparison of Phoneme Class Conditional Probability Sequences, in: 40th IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP), Brisbane, Australia, pages 4924-4928, 2015
On Application of Non-Negative Matrix Factorization for Ad Hoc Microphone Array Calibration from Incomplete Noisy Distances, in: IEEE 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2694-2698, IEEE, 2015
Novel GCC-PHAT Model in Diffuse Sound Field for Microphone Array Pairwise Distance Based Calibration, in: IEEE 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2669-2673, 2015
Robust Microphone Placement for Source Localization from Noisy Distance Measurements, in: IEEE 40th International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 2579-2583, IEEE, 2015
Detecting and Labeling Speakers on Overlapping Speech using Vector Taylor Series, in: INTERSPEECH, 2014
Multi-source Posteriors for Speech Activity Detection on Public Talks, in: INTERSPEECH, 2014
Diarizing Large Corpora using Multi-modal Speaker Linking, in: INTERSPEECH, 2014
Phoneme Background Model for Information Bottleneck based Speaker Diarization, in: INTERSPEECH, Singapore, 2014
Overlapping Speech Detection using Long-term Conversational Features for Speaker Diarization in Meeting Room Conversations, in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 22(12):1688-1700, 2014
ROCKIT: Roadmap for Conversational Interaction Technologies, in: Proceedings of the 2014 Workshop on Roadmapping the Future of Multimodal Interaction Research including Business Opportunities and Challenges (RFMIR '14), pages 39-42, ACM, 2014
Combining SGMM Speaker Vectors and KL-HMM Approach for Speaker Diarization, Idiap-RR-17-2015
KL-HMM Based Speaker Diarization System for Meetings, Idiap-RR-19-2015