<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
	<record>
		<datafield tag="980" ind1=" " ind2=" ">
			<subfield code="a">REPORT</subfield>
		</datafield>
		<datafield tag="970" ind1=" " ind2=" ">
			<subfield code="a">Saheer_Idiap-RR-12-2012/IDIAP</subfield>
		</datafield>
		<datafield tag="245" ind1=" " ind2=" ">
			<subfield code="a">VTLN-Based Rapid Cross-Lingual Adaptation for Statistical Parametric Speech Synthesis</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Saheer, Lakshmi</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Liang, Hui</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Dines, John</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Garner, Philip N.</subfield>
		</datafield>
		<datafield tag="856" ind1="4" ind2="0">
			<subfield code="i">EXTERNAL</subfield>
			<subfield code="u">http://publications.idiap.ch/attachments/reports/2011/Saheer_Idiap-RR-12-2012.pdf</subfield>
			<subfield code="x">PUBLIC</subfield>
		</datafield>
		<datafield tag="088" ind1=" " ind2=" ">
			<subfield code="a">Idiap-RR-12-2012</subfield>
		</datafield>
		<datafield tag="260" ind1=" " ind2=" ">
			<subfield code="c">2012</subfield>
			<subfield code="b">Idiap</subfield>
		</datafield>
		<datafield tag="771" ind1="2" ind2=" ">
			<subfield code="d">April 2012</subfield>
		</datafield>
		<datafield tag="520" ind1=" " ind2=" ">
			<subfield code="a">Cross-lingual speaker adaptation (CLSA) has
emerged as a new challenge in statistical parametric speech synthesis,
with specific application to speech-to-speech translation.
Recent research has shown that reasonable speaker similarity
can be achieved in CLSA using maximum likelihood linear
transformation of model parameters, but this method also has
weaknesses due to the inherent mismatch caused by differing
phonetic inventories of languages. In this paper, we propose
that fast and effective CLSA can be achieved using vocal tract
length normalization (VTLN), where strong constraints on the
vocal tract warping function may actually help to avoid the most
severe effects of the aforementioned mismatch. VTLN has a single
parameter that warps the spectrum. Using shifted or adapted pitch,
VTLN can still achieve reasonable speaker similarity. We present
our approach, VTLN-based CLSA, and evaluation results that
support our proposal under the limitation that the voice identity
and speaking style of a target speaker do not diverge too far from
those of the average voice model.</subfield>
		</datafield>
	</record>
</collection>