CONF
Lazaridis_SP-7_2014/IDIAP
SVR vs MLP for Phone Duration Modelling in HMM-based Speech Synthesis
Lazaridis, Alexandros
Honnet, Pierre-Edouard
Garner, Philip N.
HMM-based speech synthesis
HSMM explicit duration modelling
multilayer perceptron
phone duration modelling
Support Vector Regression
EXTERNAL
https://publications.idiap.ch/attachments/papers/2014/Lazaridis_SP-7_2014.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/Lazaridis_Idiap-RR-03-2014
Related documents
Speech Prosody
2014
In this paper we investigate external phone duration models (PDMs) for improving the quality of synthetic speech in hidden Markov model (HMM)-based speech synthesis. Support Vector Regression (SVR) and Multilayer Perceptron (MLP) were used for this task. The SVR and MLP PDMs were compared with the explicit duration modelling of hidden semi-Markov models (HSMMs). Experiments on an American English database showed the SVR outperforming both the MLP and the HSMM duration modelling in objective and subjective evaluation. In the objective test, the SVR achieved relative improvements of 15.3% and 25.09% in root mean square error (RMSE) over the MLP and HSMM models, respectively. Moreover, in the subjective evaluation on synthesized speech, the SVR model was preferred over the MLP and HSMM models, achieving preference scores of 35.93% and 56.30%, respectively.
REPORT
Lazaridis_Idiap-RR-03-2014/IDIAP
SVR vs MLP for Phone Duration Modelling in HMM-based Speech Synthesis
Lazaridis, Alexandros
Honnet, Pierre-Edouard
Garner, Philip N.
HMM-based speech synthesis
HSMM explicit duration modelling
multilayer perceptron
phone duration modelling
Support Vector Regression
EXTERNAL
https://publications.idiap.ch/attachments/reports/2013/Lazaridis_Idiap-RR-03-2014.pdf
PUBLIC
Idiap-RR-03-2014
2014
Idiap
March 2014
In this paper we investigate external phone duration models (PDMs) for improving the quality of synthetic speech in hidden Markov model (HMM)-based speech synthesis. Support Vector Regression (SVR) and Multilayer Perceptron (MLP) were used for this task. The SVR and MLP PDMs were compared with the explicit duration modelling of hidden semi-Markov models (HSMMs). Experiments on an American English database showed the SVR outperforming both the MLP and the HSMM duration modelling in objective and subjective evaluation. In the objective test, the SVR achieved relative improvements of 15.3% and 25.09% in root mean square error (RMSE) over the MLP and HSMM models, respectively. Moreover, in the subjective evaluation on synthesized speech, the SVR model was preferred over the MLP and HSMM models, achieving preference scores of 35.93% and 56.30%, respectively.