
</datafield>

<subfield code="a">Korchagin_ICASSP_2010/IDIAP</subfield>

</datafield>

<subfield code="a">Automatic Temporal Alignment of AV Data with Confidence Estimation</subfield>

</datafield>

<subfield code="a">Korchagin, Danil</subfield>

</datafield>

<subfield code="a">Garner, Philip N.</subfield>

</datafield>

<subfield code="a">Dines, John</subfield>

</datafield>

<subfield code="a">pattern matching</subfield>

</datafield>

<subfield code="a">reliability estimation</subfield>

</datafield>

<subfield code="a">time synchronization</subfield>

</datafield>

<subfield code="a">time-frequency analysis</subfield>

</datafield>

<subfield code="i">EXTERNAL</subfield>

<subfield code="u">http://publications.idiap.ch/attachments/papers/2009/Korchagin_ICASSP_2010.pdf</subfield>

<subfield code="x">PUBLIC</subfield>

</datafield>

<subfield code="u">http://publications.idiap.ch/index.php/publications/showcite/Korchagin_Idiap-RR-40-2009</subfield>

<subfield code="z">Related documents</subfield>

</datafield>

<subfield code="a">Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing</subfield>

<subfield code="c">Dallas, USA</subfield>

</datafield>

<subfield code="a">P.O. Box 592, CH-1920 Martigny, Switzerland</subfield>

</datafield>

<subfield code="d">March 2010</subfield>

</datafield>

<subfield code="a">In this paper, we propose a new approach for the automatic audio-based temporal alignment with confidence estimation of audio-visual data, recorded by different cameras, camcorders or mobile phones during social events. All recorded data is temporally aligned based on ASR-related features with a common master track, recorded by a reference camera, and the corresponding confidence of alignment is estimated. The core of the algorithm is based on perceptual time-frequency analysis with a precision of 10 ms. The results show correct alignment in 99% of cases for a real life dataset and surpass the performance of cross correlation while keeping lower system requirements.</subfield>

</datafield>

</record>

<subfield code="a">REPORT</subfield>

</datafield>

<subfield code="a">Korchagin_Idiap-RR-40-2009/IDIAP</subfield>

</datafield>

<subfield code="a">Automatic Temporal Alignment of AV Data with Confidence Estimation</subfield>

</datafield>

<subfield code="a">Korchagin, Danil</subfield>

</datafield>

<subfield code="a">Garner, Philip N.</subfield>

</datafield>

<subfield code="a">Dines, John</subfield>

</datafield>

<subfield code="a">pattern matching</subfield>

</datafield>

<subfield code="a">reliability estimation</subfield>

</datafield>

<subfield code="a">time synchronisation</subfield>

</datafield>

<subfield code="a">time-frequency analysis</subfield>

</datafield>

<subfield code="i">EXTERNAL</subfield>

<subfield code="u">http://publications.idiap.ch/attachments/reports/2009/Korchagin_Idiap-RR-40-2009.pdf</subfield>

<subfield code="x">PUBLIC</subfield>

</datafield>

<subfield code="a">Idiap-RR-40-2009</subfield>

</datafield>

<subfield code="b">Idiap</subfield>

<subfield code="a">CH-1920 Martigny, Switzerland</subfield>

</datafield>

<subfield code="d">December 2009</subfield>

</datafield>

<subfield code="a">In this paper, we propose a new approach for the automatic audio-based temporal alignment with confidence estimation of audio-visual data, recorded by different cameras, camcorders or mobile phones during social events. All recorded data is temporally aligned based on ASR-related features with a common master track, recorded by a reference camera, and the corresponding confidence of alignment is estimated. The core of the algorithm is based on perceptual time-frequency analysis with a precision of 10 ms. The results show correct alignment in 99% of cases for a real life dataset and surpass the performance of cross correlation while keeping lower system requirements.</subfield>

</datafield>

</record>

</collection>