Visual Focus of Attention Estimation in 3D Scene with an Arbitrary Number of Targets



Type of publication:	Conference paper
Citation:	Siegfried_CVPRW_2021
Publication status:	Accepted
Booktitle:	Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops
Year:	2021
Month:	June
Pages:	9
Publisher:	IEEE
Abstract:	Visual Focus of Attention (VFOA) estimation in conversation is challenging because it relies on information that is difficult to estimate (gaze), combined with scene features such as target positions and other contextual information (speaking status) that helps disambiguate situations. Previous VFOA models that fuse all these features are usually trained for a specific setup with a fixed number of interacting people, and must be retrained before being applied to another setup, which limits their usability. To address these limitations, we propose a novel deep learning method that encodes all input features as a fixed number of 2D maps, which makes the input more naturally suited to processing by a convolutional neural network, provides scene normalization, and allows an arbitrary number of targets to be considered. Experiments performed on two publicly available datasets demonstrate that the proposed method can be trained in a cross-dataset fashion without loss in VFOA accuracy compared to intra-dataset training.
Keywords:	attention, remote sensor, VFOA
Projects	Idiap MUMMER
Authors	Siegfried, Remy; Odobez, Jean-Marc
Attachments
Siegfried_CVPRW_2021.pdf
Notes
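The abstract's central idea, encoding every input as a fixed number of 2D maps so that the number of targets can vary, can be illustrated with a minimal sketch. This record does not give the paper's actual map definitions, so everything below is an assumption made for illustration: the three channels (gaze, target positions, speaking targets), the Gaussian blobs, the 8x8 grid, the normalized coordinate range, and the names `gaussian_map` and `encode_scene` are all hypothetical.

```python
import math


def gaussian_map(center, size=(8, 8), extent=(-1.0, 1.0), sigma=0.25):
    """Render one 2D point as a heat map (hypothetical encoding).

    `center` is (x, y) in an assumed normalized coordinate frame.
    """
    rows, cols = size
    lo, hi = extent
    step_x = (hi - lo) / (cols - 1)
    step_y = (hi - lo) / (rows - 1)
    cx, cy = center
    grid = []
    for r in range(rows):
        y = lo + r * step_y
        row = []
        for c in range(cols):
            x = lo + c * step_x
            dist_sq = (x - cx) ** 2 + (y - cy) ** 2
            row.append(math.exp(-dist_sq / (2 * sigma ** 2)))
        grid.append(row)
    return grid


def encode_scene(gaze, targets, speaking, size=(8, 8)):
    """Stack three channels, whatever the number of targets.

    Channel 0: gaze point. Channel 1: all target positions.
    Channel 2: positions of targets that are currently speaking.
    The channel layout is an assumption, not the paper's definition.
    """
    rows, cols = size
    gaze_map = gaussian_map(gaze, size)
    target_map = [[0.0] * cols for _ in range(rows)]
    speak_map = [[0.0] * cols for _ in range(rows)]
    for position, is_speaking in zip(targets, speaking):
        blob = gaussian_map(position, size)
        for r in range(rows):
            for c in range(cols):
                # Merge blobs with a pixel-wise maximum so values stay in [0, 1].
                target_map[r][c] = max(target_map[r][c], blob[r][c])
                if is_speaking:
                    speak_map[r][c] = max(speak_map[r][c], blob[r][c])
    return [gaze_map, target_map, speak_map]


# Two scenes with different numbers of targets yield the same input shape.
two_targets = encode_scene((0.0, 0.0), [(-0.5, 0.0), (0.5, 0.0)], [True, False])
five_targets = encode_scene(
    (0.1, 0.2),
    [(-0.8, 0.0), (-0.4, 0.3), (0.0, -0.5), (0.4, 0.3), (0.8, 0.0)],
    [False, False, True, False, False],
)
```

Because every target is drawn into shared channels rather than given its own input slot, the stacked maps keep the same shape for two targets or five, which is the property that would let a convolutional network be reused across setups.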