<?xml version="1.0" encoding="UTF-8"?>
<collection xmlns="http://www.loc.gov/MARC21/slim">
	<record>
		<datafield tag="980" ind1=" " ind2=" ">
			<subfield code="a">CONF</subfield>
		</datafield>
		<datafield tag="970" ind1=" " ind2=" ">
			<subfield code="a">Asaei_INTERSPEECH_2015/IDIAP</subfield>
		</datafield>
		<datafield tag="245" ind1=" " ind2=" ">
			<subfield code="a">On Compressibility of Neural Network Phonological Features for Low Bit Rate Speech Coding</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Asaei, Afsaneh</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Cernak, Milos</subfield>
		</datafield>
		<datafield tag="700" ind1=" " ind2=" ">
			<subfield code="a">Bourlard, Hervé</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">Compressive sampling</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">Low bit rate speech vocoding</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">Phonological features</subfield>
		</datafield>
		<datafield tag="653" ind1="1" ind2=" ">
			<subfield code="a">Structured sparsity</subfield>
		</datafield>
		<datafield tag="856" ind1="4" ind2="0">
			<subfield code="i">EXTERNAL</subfield>
			<subfield code="u">http://publications.idiap.ch/attachments/papers/2016/Asaei_INTERSPEECH_2015.pdf</subfield>
			<subfield code="x">PUBLIC</subfield>
		</datafield>
		<datafield tag="711" ind1="2" ind2=" ">
			<subfield code="a">Proceedings of Interspeech</subfield>
		</datafield>
		<datafield tag="260" ind1=" " ind2=" ">
			<subfield code="c">2015</subfield>
			<subfield code="b">ISCA</subfield>
		</datafield>
		<datafield tag="773" ind1=" " ind2=" ">
			<subfield code="c">418-422</subfield>
		</datafield>
		<datafield tag="520" ind1=" " ind2=" ">
			<subfield code="a">Phonological features extracted by neural networks have shown interesting potential for low bit rate speech vocoding.
The span of phonological features is wider than the span of phonetic features, and thus fewer frames need to be transmitted. 
Moreover, the binary nature of phonological features enables a higher compression ratio at minor quality cost. 

In this paper, we study the compressibility and structured sparsity of the phonological features. 
We propose a compressive sampling framework for speech coding and sparse reconstruction for decoding prior to synthesis. 
Compressive sampling is found to be a principled approach to compression, in contrast to the conventional pruning approach; it leads to a 50% reduction in bit rate for equal or better quality of the decoded speech.
Furthermore, exploiting the structured sparsity and binary nature of these features has been shown to enable very low bit rate coding at 700 bps with negligible quality loss; this coding scheme imposes no latency. If a latency of 256 ms is allowed for supra-segmental structures, a rate of 250-350 bps is achieved.</subfield>
		</datafield>
	</record>
</collection>