CONF
Korchagin_ICASSP_2010/IDIAP
Automatic Temporal Alignment of AV Data with Confidence Estimation
Korchagin, Danil
Garner, Philip N.
Dines, John
pattern matching
reliability estimation
time synchronization
time-frequency analysis
EXTERNAL
https://publications.idiap.ch/attachments/papers/2009/Korchagin_ICASSP_2010.pdf
PUBLIC
https://publications.idiap.ch/index.php/publications/showcite/Korchagin_Idiap-RR-40-2009
Related documents
Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing
Dallas, USA
2010
P.O. Box 592, CH-1920 Martigny, Switzerland
March 2010
We propose a new approach to automatic audio-based temporal alignment, with confidence estimation, of audio-visual data recorded by different cameras, camcorders or mobile phones during social events. All recordings are temporally aligned to a common master track, recorded by a reference camera, using ASR-related features, and a confidence is estimated for each alignment. The core of the algorithm is a perceptual time-frequency analysis with a precision of 10 ms. The approach aligns 99% of cases correctly on a real-life dataset and outperforms cross-correlation while having lower system requirements.
REPORT
Korchagin_Idiap-RR-40-2009/IDIAP
Automatic Temporal Alignment of AV Data with Confidence Estimation
Korchagin, Danil
Garner, Philip N.
Dines, John
pattern matching
reliability estimation
time synchronization
time-frequency analysis
EXTERNAL
https://publications.idiap.ch/attachments/reports/2009/Korchagin_Idiap-RR-40-2009.pdf
PUBLIC
Idiap-RR-40-2009
2009
Idiap
CH-1920 Martigny, Switzerland
December 2009
We propose a new approach to automatic audio-based temporal alignment, with confidence estimation, of audio-visual data recorded by different cameras, camcorders or mobile phones during social events. All recordings are temporally aligned to a common master track, recorded by a reference camera, using ASR-related features, and a confidence is estimated for each alignment. The core of the algorithm is a perceptual time-frequency analysis with a precision of 10 ms. The approach aligns 99% of cases correctly on a real-life dataset and outperforms cross-correlation while having lower system requirements.
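Illustrative sketch (not the authors' implementation): the abstract gives no algorithmic detail, so the Python below only shows the general idea of aligning a clip to a master track on a 10 ms frame grid and attaching a confidence to the result. The per-frame log energy feature, the sliding normalized correlation, the peak-margin confidence and all function names are assumptions made for illustration. The paper itself uses ASR-related features and perceptual time-frequency analysis.

```python
import numpy as np


def frame_log_energy(signal, sample_rate, hop_s=0.010):
    """Log energy per non-overlapping 10 ms frame (assumed stand-in feature)."""
    hop = int(round(sample_rate * hop_s))
    n_frames = len(signal) // hop
    frames = np.reshape(signal[: n_frames * hop], (n_frames, hop))
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)


def align(master_feat, clip_feat, guard=5):
    """Return (offset_in_frames, confidence) of a clip within the master track.

    Assumes the clip is shorter than the master and lies entirely inside it.
    """
    n = len(clip_feat)
    clip = (clip_feat - clip_feat.mean()) / (clip_feat.std() + 1e-10)
    scores = np.empty(len(master_feat) - n + 1)
    for k in range(len(scores)):
        win = master_feat[k : k + n]
        win = (win - win.mean()) / (win.std() + 1e-10)
        scores[k] = np.dot(clip, win) / n  # normalized correlation score
    best = int(np.argmax(scores))
    # Hypothetical confidence: margin between the peak and the best score
    # found outside a small guard region around the peak.
    mask = np.ones(len(scores), dtype=bool)
    mask[max(0, best - guard) : best + guard + 1] = False
    runner_up = scores[mask].max() if mask.any() else -1.0
    return best, float(scores[best] - runner_up)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16000
    # Synthetic master: 5 s of noise whose loudness changes every 10 ms.
    envelope = np.repeat(rng.uniform(0.1, 1.0, 500), 160)
    master = rng.standard_normal(sr * 5) * envelope
    # Synthetic clip: 2 s cut from the master at 1.23 s, plus extra noise.
    start = int(1.23 * sr)
    clip = master[start : start + 2 * sr] + 0.05 * rng.standard_normal(2 * sr)
    offset, conf = align(frame_log_energy(master, sr), frame_log_energy(clip, sr))
    print(f"offset = {offset * 0.010:.2f} s, confidence = {conf:.2f}")
```

The offset is resolved in whole frames, so the 10 ms hop sets the precision of this sketch. A low peak margin would flag an alignment that should not be trusted, which is the role confidence estimation plays in the abstract.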