Keywords:
- Accent Identification
- Accented speech
- Accentual mismatch
- Acoustic model adaptation
- acoustic modeling
- Ad hoc microphone array calibration
- Ad-hoc microphone calibration
- adaptive budget allocation
- adaptive layer norm
- adaptive training
- adaptive TTS
- Adequacy of diffuseness
- Afrikaans
- All pass warp
- ASR
- audio processing
- aurora
- Automatic prosodic event detection
- Automatic Speech Recognition
- Bayesian transfer learning
- Benchmarking
- benchmarks
- bilingual speakers
- Blizzard Challenge
- Broadband beam-pattern
- broadcast news
- Cadzow algorithm
- catastrophic forgetting
- cepstral normalisation
- cochlear model
- cochlear models
- Code-Switching
- conditional layer normalization
- Confidence Measure (CM)
- connectionist temporal classification (ctc)
- constrained structural maximum a posteriori linear regression
- continuous F0 coding
- Conversational technologies
- crosslingual adaptation
- Deep learning for speech
- deep MLPs
- deep neural networks
- Delay-and-sum beamformer
- dialectal lexicon
- Diffuse field coherence model
- Diffuse noise coherence
- Diffuse sound coherence model
- diffusion model
- diffusion transformer
- Digital IIR Filters
- direction of arrival
- Directivity
- Distant speech recognition
- Distributed source localization
- dnn
- dnn-based speech recognition
- domain adaptation
- duration
- Emotion Recognition
- emotional speech synthesis
- emotional TTS
- emphasis
- end-to-end
- end-to-end architectures
- energy
- Environmental mismatch
- Euclidean distance matrix
- fast adaptation
- fast training
- filterbanks
- French accents
- French Regional Accents
- French TTS
- Fujisaki Model
- gamma-tone filter
- Generalized Trust Region Subproblem (GTRS)
- German
- German language
- GMM Modelling
- hidden Markov models
- HMM-based speech synthesis
- HSMM explicit duration modelling
- hybrid system
- i-vectors
- Image Model
- importance score
- intonation
- KL-HMM
- Kullback-Leibler divergence
- Laplace approximation
- Lexicon
- low bit rate speech coding
- low-rank adaptation
- LVCSR
- Matrix completion
- modelling
- multi-dialect
- multilayer perceptron
- Multilingual
- multilingual acoustic modeling
- multilingual ASR
- multilingual speech recognition
- Multimodal interaction
- nearest neighbour rule of classification
- neural network features
- neural networks
- NLP
- Noise Robustness
- open vocabulary
- open-vocabulary
- Out-Of-Language (OOL) detection
- Overlapping Speech
- parameter-efficient fine-tuning
- parametric speech synthesis
- Parametric vocoding
- pattern matching
- phone duration modelling
- Phonological features
- phonological posteriors
- phonology
- pitch analysis
- pitch model
- Pitch modelling
- pitch target approximation
- pitch target realisation
- Posterior features
- pretrained language model
- probabilistic amplitude demodulation
- prosody
- Prosody Modelling
- punctuation
- real-time audio processing
- recurrent neural network
- reliability estimation
- Reverberant enclosure
- Robust microphone placement
- S-stress
- Saliency Mapping
- self-supervision
- Semi-supervised training
- Semidefinite programming
- sentence boundary prediction
- SGMM adaptation
- SincNet
- Single-channel source localization
- SNR spectrum
- Source localization
- Sparse Component Analysis
- speaker adaptation
- spectral amplitude modulation phase hierarchy
- speech coding
- speech corpus
- speech meta-data
- speech prosody
- speech recognition
- speech synthesis
- Speech Translation
- speech-to-speech translation
- spiking neural networks
- Spoken Language Understanding
- Spoken Term Detection (STD)
- Statistical parametric speech synthesis
- Subspace Gaussian Mixture Models
- subword segmentation
- Subword unit
- Superdirective beamformer
- Support Vector Regression
- SVM
- Swiss German
- Swiss prosody
- Swisscom
- synchronisation
- Tandem
- temporal alignment
- text-to-speech
- time synchronisation
- time synchronization
- time-frequency analysis
- trainable filterbanks
- TTS
- TV Box
- Under-resourced data
- under-resourced languages
- under-resourced speech recognition
- unit selection
- universal phoneme set
- VAE
- variational inference
- Very low bit rate speech coding
- vocal tract length normalization
- voice assistant
- VTLN
- Wav2vec
- word emphasis
- zero-shot speaker adaptation
Publications of Philip N. Garner
2020
COMPARISON OF SUBWORD SEGMENTATION METHODS FOR OPEN-VOCABULARY END-TO-END SPEECH RECOGNITION, , , and , Idiap-RR-34-2020 |
|
2019
AN END-TO-END NETWORK TO SYNTHESIZE INTONATION USING A GENERALIZED COMMAND RESPONSE MODEL, , , , and , Idiap-RR-05-2019 |
|
An End-to-end Network to Synthesize Intonation Using a Generalized Command Response Model, , , , and , in: ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, United Kingdom, pages 7040-7044, IEEE, 2019 |
[DOI] [URL] |
AN INVESTIGATION OF MULTILINGUAL ASR USING END-TO-END LF-MMI, , and , in: International Conference on Acoustics, Speech and Signal Processing, 2019 |
|
EMPIRICAL EVALUATION AND COMBINATION OF PUNCTUATION PREDICTION MODELS APPLIED TO BROADCAST NEWS, and , Idiap-RR-01-2019 |
|
EMPIRICAL EVALUATION AND COMBINATION OF PUNCTUATION PREDICTION MODELS APPLIED TO BROADCAST NEWS, and , in: Proceedings of 2019 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2019 |
|
Neural VTLN for Speaker Adaptation in TTS, and , in: Proc. 10th ISCA Speech Synthesis Workshop, ISCA, Vienna, Austria, pages 6, 2019 |
[DOI] |
Self-attention for Speech Emotion Recognition, , and , in: Proc. Interspeech 2019, 2019 |
[DOI] |
Unbiased semi-supervised LF-MMI training using dropout, , , and , in: Proceedings of Interspeech 2019, 2019 |
[DOI] |
2018
A Neural Model to Predict Parameters for a Generalized Command Response Model of Intonation, and , Idiap-RR-10-2018 |
|
A Neural Model to Predict Parameters for a Generalized Command Response Model of Intonation, and , in: Proc. Interspeech 2018, pages 3147-3151, 2018 |
[DOI] |
A Neural Model to Predict Parameters for a Generalized Command Response Model of Intonation, and , in: MLSLP-18 Proceedings, Hyderabad, 2018 |
[URL] |
Combining the SNR Spectrum with a Cochlear Model, , Idiap-RR-14-2018 |
|
CONTEXT-AWARE ATTENTION MECHANISM FOR SPEECH EMOTION RECOGNITION, , , and , in: IEEE Workshop on Spoken Language Technology, Athens, Greece, pages 126-131, 2018 |
[URL] |
Cross-lingual Adaptation of a CTC-based multilingual Acoustic Model, , and , in: Speech Communication, 104:39-46, 2018 |
[DOI] |
Embedding Context-Dependent Variations of Prosodic Contours using Variational Encoding for Decomposing the Structure of Speech Prosody, , , , and , in: Workshop on Prosody and Meaning: Information Structure and Beyond, Aix-en-Provence, France, 2018 |
[URL] |
Fast Language Adaptation Using Phonological Information, , and , in: Proceedings of Interspeech 2018, Hyderabad, India, pages 2459-2463, 2018 |
[DOI] |
Intonation modelling using a muscle model and perceptually weighted matching pursuit, , , and , in: Speech Communication, 2018 |
[DOI] [URL] |
Multilingual Training and Cross-lingual Adaptation on CTC-based Acoustic Model, , and , Idiap-RR-01-2018 |
|
Phonological mappings for English, French, German and Portuguese, and , Idiap-Com-02-2018 |
|
2017
An Investigation of Deep Neural Networks for Multilingual Speech Recognition Training and Adaptation, , and , in: Proc. of Interspeech, 2017 |
|
Comparative Study on Sentence Boundary Prediction for German and English Broadcast News, , , , and , Idiap-RR-18-2017 |
|
From Research to Reality: Evaluation of a Single-Computer Real-Time LVCSR System for Speech-Based Retrieval, , , and , Idiap-RR-12-2017 |
|
The SIWIS French Speech Synthesis Database – Design and recording of a high quality French database for speech synthesis, , , and , Idiap-RR-03-2017 |
|
2016
An agonist-antagonist pitch production model, and , in: Lecture Notes in Artificial Intelligence: 18th International Conference, SPECOM 2016, Budapest, Hungary, pages 84-91, 2016 |
|
Composition of Deep and Spiking Neural Networks for Very Low Bit Rate Speech Coding, , , and , Idiap-RR-11-2016 |
|
Composition of Deep and Spiking Neural Networks for Very Low Bit Rate Speech Coding, , , and , in: IEEE/ACM Trans. on Audio, Speech and Language Processing, 2016 |
|
Design of a Speech Corpus for Research on Cross-Lingual Prosody Transfer, , , , , , , , , , , and , in: Lecture Notes in Artificial Intelligence: 18th International Conference, SPECOM 2016, Budapest, Hungary, pages 199-206, 2016 |
|
Emphasis Recreation for TTS using Intonation Atoms, and , in: 9th ISCA Speech Synthesis Workshop, pages 14-20, 2016 |
[DOI] |
Intonation atom based emphasis transfer, and , Idiap-RR-14-2016 |
|
Investigating Cross-lingual Multi-level Adaptive Networks: The Importance of the Correlation of Source and Target Languages, , , , and , in: Proceedings of the International Workshop on Spoken Language Translation, Seattle, WA, USA, 2016 |
|