Keywords:
- 3D face model
- abnormality detection
- acoustic generators
- Acoustic signal processing
- activity
- appearance based methods
- Appearance based model
- appearance model
- Archaeology
- Artificial Neural Networks
- attention
- audio-visual speaker recognition
- Autism
- backchannels
- Bayesian modeling
- behavior analysis
- bias correction
- blink
- bobbing estimation
- camera network
- Children
- clustering
- cognition
- computer vision
- Content-based multimedia indexing
- conversation
- convolutional network
- Convolutional neural network
- Convolutional Neural Networks
- corpus
- covariance matrices
- Crowdsourcing
- Cultural heritage
- dataset
- deep learning
- Deep Metric Learning
- deep neural networks
- Delays
- diarization
- Dimensionality reduction
- direction-of-arrival estimation
- DOA estimation
- domain adaptation
- embedding
- embedding learning
- Encoding
- entrainment to music
- Epigraphy
- Estimation
- eye movements
- eye tracking
- eye-gaze
- Face
- face clustering
- Face diarization
- Face Recognition
- Face tracking
- Facial animations
- Feature extraction
- Feature-based tracking
- first impressions
- focus of attention
- gait
- Gaze
- Gaze Coding
- gaze detection
- Gaze estimation
- generative models
- geometric method
- grapevine pruning
- group dynamics
- HCI
- head nods
- Head pose
- Head pose tracking
- head-pose invariance
- HHI
- hieroglyph
- Histogram of orientation
- HOOSC
- HRI
- human activity recognition
- human behaviour analysis
- human detection
- Human pose estimation
- human-robot interaction
- image rectification
- image retrieval
- image segmentation
- indexing
- information fusion
- information visualization
- internet of things
- involvement
- keyframe extraction
- language
- learning
- likelihood-based encoding
- listener categories
- machine learning
- manipulation
- Maya civilization
- Maya culture
- Maya glyph
- maya glyphs
- Metric learning
- microphone arrays
- Microphones
- mixed activity
- Monte Carlo methods
- motif mining
- multi-camera
- multi-object tracking
- Multimodal
- multimodal identification
- Multimodal interaction
- Multimodal person diarization
- multiple face tracking
- multiple sound sources
- multiple speaker detection
- multivariate time series
- network output
- neural nets
- neural network-based sound source localization methods
- neural networks
- non parametric models
- non-verbal cues
- Nonverbal behavior
- OCR
- online calibration
- particle filter
- person diarization
- person discovery
- person identification
- person invariance
- Person Tracking
- plant skeleton
- pLSA
- Position measurement
- precision viticulture
- real-time
- remote
- remote recording
- remote sensing
- remote sensor
- representation learning
- RGB-D
- RGB-D camera
- RGB-D cameras
- road vehicles
- Robots
- saccade
- Sampling
- scene analysis
- segmentation
- shape classification
- Shape descriptor
- shape recognition
- Shape retrieval
- shot boundary detection
- simultaneous detection
- single sound source
- sketch
- skin colour
- social computing
- sound mixtures
- sound source localization
- sparse autoencoder
- sparse coding
- spatial spectrum-based approaches
- speaker
- Speaker Diarization
- Speaker identification
- speaker recognition
- speaker verification
- spectral shot clustering
- Speech
- surveillance
- topic models
- tracking
- training
- transfer learning
- triplet loss
- unsupervised
- Unsupervised · Latent sequential patterns · Topic models · PLSA · Video surveillance · Activity analysis
- unsupervised activity analysis
- unsupervised calibration
- unsupervised learning
- usability
- user study
- variational inference
- vehicle detection
- VFOA
- video
- video processing
- video structuring
- vineyard
- virtual agents
- visual focus of attention
- Visual similarity
- weakly-supervised learning
Publications of Jean-Marc Odobez
2025
Loose Social-Interaction Recognition in Real-world Therapy Scenarios, in: IEEE/CVF Winter Conference on Applications of Computer Vision, 2025
2024
A Unified Model for Gaze Following and Social Gaze Prediction, in: The 18th IEEE International Conference on Automatic Face and Gesture Recognition, 2024
Automatic detection of the visual gaze components of joint attention in observational, naturalistic child language acquisition data, in: Boston University Conference on Language Development, 2024
CCDb-HG: Novel Annotations and Gaze-Aware Representations for Head Gesture Recognition, in: 18th IEEE Int. Conference on Automatic Face and Gesture Recognition (FG), Istanbul, 2024
ChildPlay-Hand: A Dataset of Hand Manipulations in the Wild, in: Proceedings of the European Conference on Computer Vision (ECCV) Workshops, 2024
Exploring the Zero-Shot Capabilities of Vision-Language Models for Improving Gaze Following, in: Int. Conf. Computer Vision and Pattern Recognition (CVPR), Workshop on Gaze Estimation and Prediction in the Wild, 2024
Investigating Semantic Segmentation Models to Assist Visually Impaired People, Idiap-RR-13-2024
Investigating Semantic Segmentation Models to Assist Visually Impaired People, in: European Conference on Computer Vision - Workshops, 2024
MTGS: A Novel Framework for Multi-Person Temporal Gaze Following and Social Gaze Prediction, in: 38th Conf. on Neural Information Processing Systems, 2024
Sharingan: A Transformer Architecture for Multi-Person Gaze Following, in: Int. Conference Computer Vision and Pattern Recognition (CVPR), Seattle, 2024
Toward Semantic Gaze Target Detection, in: 38th Conf. on Neural Information Processing Systems, 2024
Weakly-supervised Autism Severity Assessment in Long Videos, in: International Conference on Content-based Multimedia Indexing, 2024
2023
A Bayesian approach to machine learning model comparison, Idiap-Com-01-2023
A Multitask and Kernel approach for Learning to Push Objects with a Task-Parameterized Deep Q-Network, in: Proc. IEEE/RSJ Intl Conf. on Intelligent Robots and Systems (IROS), 2023
ChildPlay: A New Benchmark for Understanding Children's Gaze Behaviour, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2023
Efficient Grapevine Structure Estimation in Vineyards Conditions, in: Proceedings of the IEEE/CVF International Conference on Computer Vision, Paris, pages 712–720, 2023
Idiap Scientific Report 2022, Idiap-RR-05-2023
The AI4Autism Project: A Multimodal and Interdisciplinary Approach to Autism Diagnosis and Stratification, in: Companion Publication of the 25th International Conference on Multimodal Interaction, Paris, France, pages 414–425, Association for Computing Machinery, 2023
Towards Smart Pruning: ViNet, a Deep-Learning Approach for Grapevine Structure Estimation, in: Computers and Electronics in Agriculture, 207:107736, 2023
2022
A Modular Multimodal Architecture for Gaze Target Prediction: Application to Privacy-Sensitive Settings, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2022
Robust Unsupervised Gaze Calibration using Conversation and Manipulation Attention Priors, in: ACM Transactions on Multimedia Computing, Communications, and Applications, 18(1):26, 2022
2021
A Differential Approach for Gaze Estimation, in: IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(3):1092–1098, 2021
Active Learning of Bayesian Probabilistic Movement Primitives, in: IEEE Robotics and Automation Letters, 2021
An Attention Mechanism for Deep Q-Networks with Applications in Robotic Pushing, Idiap-RR-03-2021
An Attention Mechanism for Deep Q-Networks with Applications in Robotic Pushing, in: Proc. of Workshop on Emerging paradigms for robotic manipulation: from the lab to the productive world, ICRA, 2021
An Efficient Image-to-Image Translation HourGlass-based Architecture for Object Pushing Policy Learning, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, 2021
Multi-task Neural Network for Robust Multiple Speaker Embedding Extraction, in: Proceedings of Interspeech 2021, 2021
Neural Network Adaptation and Data Augmentation for Multi-Speaker Direction-of-Arrival Estimation, in: IEEE/ACM Transactions on Audio, Speech, and Language Processing, 29:1303–1317, 2021
Pose Transformers (POTR): Human Motion Prediction with Non-Autoregressive Transformers, in: International Conference on Computer Vision - Workshops, 2021
Towards an Engagement-Aware Attentive Artificial Listener for Multi-Party Interactions, in: Frontiers in Robotics and AI, 8:189, 2021
Visual Focus of Attention Estimation in 3D Scene with an Arbitrary Number of Targets, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, pages 9, IEEE, 2021
2020
Efficient Convolutional Neural Networks for Depth-Based Multi-Person Pose Estimation, in: IEEE Transactions on Circuits and Systems for Video Technology, 30(11):4207–4221, 2020
ManiGaze: a Dataset for Evaluating Remote Gaze Estimator in Object Manipulation Situations, in: Symposium on Eye Tracking Research and Applications, Stuttgart, Germany, pages 5, ACM, 2020
Multi-scale sequential network for semantic text segmentation and localization, in: Pattern Recognition Letters, 129:63–69, 2020
Residual Pose: A Decoupled Approach for Depth-based 3D Human Pose Estimation, in: IEEE/RSJ International Conference on Intelligent Robots and Systems, 2020
The MuMMER data set for Robot Perception in multi-party HRI Scenarios, in: Proceedings of the 29th IEEE International Conference on Robot & Human Interactive Communication, 2020