CONF motlicek:ICASSP-1:2007/IDIAP
Wide-Band Perceptual Audio Coding based on Frequency-Domain Linear Prediction
Motlicek, Petr; Ullal, Vijay; Hermansky, Hynek
EXTERNAL http://publications.idiap.ch/attachments/papers/2007/motlicek-ICASSP-1-2007.pdf PUBLIC
Related documents: http://publications.idiap.ch/index.php/publications/showcite/motlicek:rr06-58
IEEE Int. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), 2007
IDIAP-RR 06-58
In this paper we propose an extension of the very low bit-rate speech coding technique, exploiting predictability of the temporal evolution of spectral envelopes, for wide-band audio coding applications. Temporal envelopes in critical-band-sized sub-bands are estimated using frequency-domain linear prediction applied to relatively long time segments. The sub-band residual signals, which play an important role in achieving high-quality reconstruction, are processed using a heterodyning-based signal analysis technique. For reconstruction, their optimal parameters are estimated using a closed-loop analysis-by-synthesis technique driven by a perceptual model emulating simultaneous masking properties of the human auditory system. We discuss the advantages of the approach and show some properties on challenging audio recordings. The proposed technique is capable of encoding high-quality, variable-rate audio signals at bit-rates below 1 bit/sample.

REPORT motlicek:rr06-58/IDIAP
Wide-Band Perceptual Audio Coding based on Frequency-Domain Linear Prediction
Motlicek, Petr; Ullal, Vijay; Hermansky, Hynek
EXTERNAL http://publications.idiap.ch/attachments/reports/2006/motlicek-idiap-rr-06-58.pdf PUBLIC
Idiap-RR-58-2006, 2006, IDIAP
In this paper we propose an extension of the very low bit-rate speech coding technique, exploiting predictability of the temporal evolution of spectral envelopes, for wide-band audio coding applications.
Temporal envelopes in critical-band-sized sub-bands are estimated using frequency-domain linear prediction applied to relatively long time segments. The sub-band residual signals, which play an important role in achieving high-quality reconstruction, are processed using a heterodyning-based signal analysis technique. For reconstruction, their optimal parameters are estimated using a closed-loop analysis-by-synthesis technique driven by a perceptual model emulating simultaneous masking properties of the human auditory system. We discuss the advantages of the approach and show some properties on challenging audio recordings. The proposed technique is capable of encoding high-quality, variable-rate audio signals at bit-rates below 1 bit/sample.