REPORT Korchagin_Idiap-RR-10-2011/IDIAP Just-in-Time Multimodal Association and Fusion from Home Entertainment Korchagin, Danil Motlicek, Petr Duffner, Stefan Bourlard, Hervé association rules data analysis multimodal signal processing sensor fusion EXTERNAL http://publications.idiap.ch/attachments/reports/2011/Korchagin_Idiap-RR-10-2011.pdf PUBLIC Idiap-RR-10-2011 2011 Idiap Martigny, Switzerland May 2011

<subfield code="a">REPORT</subfield>

</datafield>

<subfield code="a">Korchagin_Idiap-RR-10-2011/IDIAP</subfield>

</datafield>

<subfield code="a">Just-in-Time Multimodal Association and Fusion from Home Entertainment</subfield>

</datafield>

<subfield code="a">Korchagin, Danil</subfield>

</datafield>

<subfield code="a">Motlicek, Petr</subfield>

</datafield>

<subfield code="a">Duffner, Stefan</subfield>

</datafield>

<subfield code="a">Bourlard, Hervé</subfield>

</datafield>

<subfield code="a">association rules</subfield>

</datafield>

<subfield code="a">data analysis</subfield>

</datafield>

<subfield code="a">multimodal signal processing</subfield>

</datafield>

<subfield code="a">sensor fusion</subfield>

</datafield>

<subfield code="i">EXTERNAL</subfield>

<subfield code="u">http://publications.idiap.ch/attachments/reports/2011/Korchagin_Idiap-RR-10-2011.pdf</subfield>

<subfield code="x">PUBLIC</subfield>

</datafield>

<subfield code="a">Idiap-RR-10-2011</subfield>

</datafield>

<subfield code="b">Idiap</subfield>

<subfield code="a">Martigny, Switzerland</subfield>

</datafield>

</datafield>

<subfield code="a">In this paper, we describe a real-time multimodal analysis system with just-in-time multimodal association and fusion for a living room environment, where multiple people may enter, interact and leave the observable world with no constraints. It comprises detection and tracking of up to 4 faces, detection and localisation of verbal and paralinguistic events, and their association and fusion. The system is designed for open, unconstrained environments such as next-generation video conferencing systems that automatically “orchestrate” the transmitted video streams to improve the overall experience of interaction between spatially separated families and friends. Performance levels achieved to date on a hand-labelled dataset have shown sufficient reliability while fulfilling real-time processing requirements.</subfield>

</datafield>

</record>

</collection>