Keywords:
- audio-visual speaker recognition
- deep learning
- Deep Metric Learning.
- deep neural networks
- diarization
- domain adaptation
- embedding
- embedding learning
- Face
- face clustering
- Face dirarization
- Face Recognition
- Metric learning
- multi-object tracking
- Multimodal
- multimodal identification
- Multimodal person diarization
- OCR
- person diarization
- person discovery
- person identification
- speaker
- Speaker Diarization
- Speaker identification
- speaker verification
- tracking
- transfer learning
- triplet loss
Publications of Nam Le sorted by recency
Multimodal Person Recognition in Audio-Visual Streams, , EPFL, 2019 |
[DOI] |
Improving speech embedding using crossmodal transfer learning with audio-visual data, and , in: Multimedia Tools and Applications, 78(11):15681-15704, 2019 |
[DOI] |
Robust and Discriminative Speaker Embedding via Intra-Class Distance Variance Regularization, and , in: Proceedings of Interspeech, Hyderabad, INDIA, pages 2257-2261, 2018 |
[DOI] |
A Domain Adaptation Approach to Improve Speaker Turn Embedding Using Face Representation, and , in: ACM International Conference on Multimodal Interaction, Glasgow, Scotland, ACM, 2017 |
|
Improving speaker turn embedding by crossmodal transfer learning from face embedding, and , in: ICCV Workshop on Computer Vision for Audio-Visual Media, 2017 |
|
Towards large scale multimedia indexing: A case study on person discovery in broadcast news, , and , in: 15th International Workshop on Content-Based Multimedia Indexing, 2017 |
|
EUMSSI team at the MediaEval Person Discovery Challenge 2016, , and , in: MediaEval Benchmarking Initiative for Multimedia Evaluation, Hilversum, Netherlands, 2016 |
|
Long-Term Time-Sensitive Costs for CRF-Based Tracking by Detection, , and , in: 2nd Workshop on Benchmarking Multi-target Tracking: MOTChallenge 2016, Amsterdam, 2016 |
|
Temporally Subsampled Detection for Accurate and Efficient Face Tracking and Diarization, , , and , in: International Conference on Pattern Recognition, Cancun, Mexico, IEEE, 2016 |
|
Learning Multimodal Temporal Representation for Dubbing Detection in Broadcast Media, and , in: ACM Multimedia, Amsterdam, ACM, 2016 |
|
Deep Dynamic Neural Networks for Multimodal Gesture Segmentation and Recognition, , , , , , and , in: IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 2016 |
|
EUMSSI team at the MediaEval Person Discovery Challenge, , , and , in: Working Notes Proceedings of the MediaEval 2015 Workshop, Wurzen, Germany, 2015 |
[URL] |