Publications of project SNSF-MULTI
2014
How Do You Like Your Virtual Agent?: Human-Agent Interaction Experience through Nonverbal Features and Personality Traits, in: Human Behavior Understanding, pages 1-15, Springer, 2014
Saliency-based Representations and Multi-component Classifiers for Visual Scene Recognition, École Polytechnique Fédérale de Lausanne (EPFL), 2014
2013
Applying multi- and cross-lingual stochastic phone space transformations to non-native speech recognition, in: IEEE Transactions on Audio, Speech, and Language Processing, 2013
Comparing different acoustic modeling techniques for multilingual boosting, Idiap-RR-01-2013
Crosslingual Tandem-SGMM: Exploiting Out-Of-Language Data for Acoustic Model and Feature Level Adaptation, Idiap-RR-39-2013
Crosslingual Tandem-SGMM: Exploiting Out-Of-Language Data for Acoustic Model and Feature Level Adaptation, in: Proceedings of the 14th Annual Conference of the International Speech Communication Association (Interspeech 2013), Lyon, France, pages 510-514, ISCA, 2013
MediaParl: Bilingual mixed language accented speech database, Idiap-RR-03-2013
Multilingual speech recognition: A posterior based approach, École Polytechnique Fédérale de Lausanne (EPFL), 2013
Overview of the ImageCLEF 2013 Robot Vision Task, in: Working Notes, CLEF 2013, 2013
Robust triphone mapping for acoustic modeling, Idiap-RR-02-2013
Speaker adaptive Kullback-Leibler divergence based hidden Markov models, in: Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, 2013
Using out-of-language data to improve an under-resourced speech recognizer, in: Speech Communication, 2013
Using out-of-language data to improve an under-resourced speech recognizer, Idiap-RR-09-2013
2012
A Fast Parts-based Approach to Speaker Verification using Boosted Slice Classifiers, in: IEEE Transactions on Information Forensics and Security, 7(1):241-254, 2012
Baseline Multimodal Place Classifier for the 2012 Robot Vision Task, in: Working Notes of the ImageCLEF 2012 Laboratory, 2012
Boosting localized binary features for speech recognition, in: Symposium on Machine Learning in Speech and Language Processing (MLSLP), 2012
Boosting under-resourced speech recognizers by exploiting out of language data - Case study on Afrikaans, in: Proceedings of the 3rd International Workshop on Spoken Languages Technologies for Under-resourced Languages, Cape Town, pages 60-67, 2012
Boosting under-resourced speech recognizers by exploiting out of language data - Case study on Afrikaans, Idiap-RR-15-2012
Bridging the Past, Present and Future: Modeling Scene Activities From Event Relationships and Global Rules, in: IEEE Conference on Computer Vision and Pattern Recognition, Providence, Rhode Island, USA, 2012
Comparing different acoustic modeling techniques for multilingual boosting, in: Proceedings of Interspeech, Portland, Oregon, 2012
Decision tree clustering for KL-HMM, Idiap-Com-01-2012
Indoor Scene Recognition using Task and Saliency-driven Feature Pooling, in: Proceedings of the British Machine Vision Conference, Guildford, UK, 2012
MediaParl: Bilingual mixed language accented speech database, in: Proceedings of the 2012 IEEE Workshop on Spoken Language Technology, pages 263-268, 2012
Overview of the ImageCLEF 2012 Robot Vision Task, in: Working Notes of the ImageCLEF 2012 Laboratory, 2012
Phase AutoCorrelation (PAC) features for noise robust speech recognition, in: Speech Communication, 54(7):867-880, 2012
Robust triphone mapping for acoustic modeling, in: Proceedings of Interspeech, Portland, Oregon, 2012
Sampling techniques for audio-visual tracking and head pose estimation, in: Multimodal Signal Processing: Human Interactions in Meetings, pages 84-102, Cambridge University Press, 2012
Sequential Topic Models for Mining Recurrent Activities and their Relationships: Application to long term video recordings, École Polytechnique Fédérale de Lausanne (EPFL), 2012
Using KL-divergence and multilingual information to improve ASR for under-resourced languages, in: Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, Kyoto, pages 4869-4872, 2012
Wordless Sounds: Robust Speaker Diarization using Privacy-Preserving Audio Representations, Idiap-RR-28-2012
Wordless Sounds: Robust Speaker Diarization using Privacy-Preserving Audio Representations, in: IEEE Transactions on Audio, Speech, and Language Processing, 2012
2011
An Information Theoretic Combination of MFCC and TDOA Features for Speaker Diarization, in: IEEE Transactions on Audio, Speech, and Language Processing, 19(2), 2011
Boosting Localized Features for Speaker and Speech Recognition, École Polytechnique Fédérale de Lausanne (EPFL), 2011
Continuous Speech Recognition using Boosted Binary Features, Idiap-RR-35-2011
Discovering Routines from Large-Scale Human Locations using Probabilistic Topic Models, in: ACM Transactions on Intelligent Systems and Technology, 2(1), 2011
Extracting and Locating Temporal Motifs in Video Scenes Using a Hierarchical Non Parametric Bayesian Model, in: IEEE Conference on Computer Vision and Pattern Recognition, 2011
Fast Speaker Verification on Mobile Phone data using Boosted Slice Classifiers, in: IAPR IEEE International Joint Conference on Biometrics, Washington DC, 2011
Flickr Groups: Multimedia Communities for Multimedia Analysis, in: Internet Multimedia Search and Mining, Bentham Science Publishers, 2011
Language dependent universal phoneme posterior estimation for mixed language speech recognition, Idiap-RR-13-2011
Language dependent universal phoneme posterior estimation for mixed language speech recognition, in: Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, Prague, CZ, pages 5012-5015, 2011
LP Residual Features for Robust, Privacy-Sensitive Speaker Diarization, Idiap-RR-14-2011
LP Residual Features for Robust, Privacy-Sensitive Speaker Diarization, in: Interspeech, 2011
Modeling and understanding communities in online social media using probabilistic methods, École Polytechnique Fédérale de Lausanne (EPFL), 2011
Pervasive Sensing to Model Political Opinions in Face-to-Face Networks, in: Pervasive, San Francisco, 2011
Phoneme Recognition using Boosted Binary Features, in: IEEE International Conference on Acoustics, Speech and Signal Processing, 2011
Privacy-Sensitive Audio Features for Speech/Nonspeech Detection, Idiap-RR-12-2011
Privacy-Sensitive Audio Features for Speech/Nonspeech Detection, in: IEEE Transactions on Audio, Speech, and Language Processing, 19(8), 2011
Sensing the "Health State" of our Society, in: IEEE Pervasive Computing, Special Issue on Large-Scale Opportunistic Sensing, 2011
Towards semi-supervised learning of semantic spatial concepts, Idiap-RR-03-2011
Towards semi-supervised learning of semantic spatial concepts, in: IEEE International Conference on Robotics and Automation, 2011
Towards semi-supervised learning of semantic spatial concepts for mobile robots, in: Journal of Physical Agents, 2011
2010
A Multi Cue Discriminative Approach to Semantic Place Classification, in: CLEF 2010 Notebook Papers/LABs/Workshops, 2010
A Sparsity Constraint for Topic Models - Application to Temporal Activity Mining, in: NIPS-2010 Workshop on Practical Applications of Sparse Modeling: Open Issues and New Directions, 2010
An Information Theoretic Approach to Speaker Diarization of Meeting Recordings, École Polytechnique Fédérale de Lausanne (EPFL), 2010
An Information Theoretic Combination of MFCC and TDOA Features for Speaker Diarization, Idiap-RR-22-2010
Boosted Binary Features for Noise-Robust Speaker Verification, in: 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, Texas, 2010
Crossmodal Matching of Speakers using Lip and Voice Features in Temporally Non-overlapping Audio and Video Streams, Idiap-RR-13-2010
Crossmodal Matching of Speakers using Lip and Voice Features in Temporally Non-overlapping Audio and Video Streams, in: 20th International Conference on Pattern Recognition, International Association for Pattern Recognition (IAPR), Istanbul, Turkey, 2010
Evaluating the Robustness of Privacy-Sensitive Audio Features for Speech Detection in Personal Audio Log Scenarios, Idiap-RR-01-2010
Evaluating the Robustness of Privacy-Sensitive Audio Features for Speech Detection in Personal Audio Log Scenarios, in: ICASSP 2010, 2010
Flickr Groups: Multimedia Communities for Multimedia Analysis, Idiap-RR-18-2010
Hierarchical Multilayer Perceptron based Language Identification, Idiap-RR-14-2010
Hierarchical Multilayer Perceptron based Language Identification, in: Proceedings of Interspeech, Makuhari, Japan, pages 2722-2725, 2010
Introducing Crossmodal Biometrics: Person Identification from Distinct Audio & Visual Streams, Idiap-RR-29-2010
Introducing Crossmodal Biometrics: Person Identification from Distinct Audio & Visual Streams, in: IEEE Fourth International Conference on Biometrics: Theory, Applications and Systems, 2010
Kodak Moments and Flickr Diamonds: How Users Shape Large-scale Media, in: Proc. of the 18th Intl. Conf. on Multimedia, Firenze, Italy, 2010
Mining Human Location-Routines Using a Multi-Level Approach to Topic Modeling, in: 2010 IEEE Second International Conference on Social Computing, SIN Symposium, Minneapolis, Minnesota, USA, 2010
Mining Human Location-Routines using a Multi-Level Topic Model, Idiap-RR-28-2010
Modeling and Understanding Flickr Communities through Topic-based Analysis, Idiap-RR-19-2010
Modeling and Understanding Flickr Communities through Topic-based Analysis, in: IEEE Transactions on Multimedia, 12(5), 2010
Probabilistic Latent Sequential Motifs: Discovering temporal activity patterns in video scenes, Idiap-RR-33-2010
Probabilistic Latent Sequential Motifs: Discovering temporal activity patterns in video scenes, in: BMVC 2010, Aberystwyth University, Aberystwyth, BMVA Press, 2010
Probabilistic Mining of Socio-Geographic Routines from Mobile Phone Data, in: IEEE Journal of Selected Topics in Signal Processing, 4(4), 2010
The Robot Vision Track at ImageCLEF 2010, in: CLEF 2010 Notebook Papers/LABs/Workshops, 2010
Towards mixed language speech recognition systems, Idiap-RR-15-2010
Towards mixed language speech recognition systems, in: Proceedings of Interspeech, Makuhari, Japan, pages 278-281, 2010
Visual processing-inspired Fern-Audio features for Noise-Robust Speaker Verification, in: 25th ACM Symposium on Applied Computing, Sierre, Switzerland, Association for Computing Machinery, 2010
2009
Contextual classification of image patches with latent aspect models, in: EURASIP Journal on Image and Video Processing, Special Issue on Patches in Vision, 2009
Flickr Hypergroups, in: Proceedings of the 17th ACM International Conference on Multimedia, 2009
Haar Local Binary Pattern Feature for Fast Illumination Invariant Face Detection, Idiap-RR-28-2009
Haar Local Binary Pattern Feature for Fast Illumination Invariant Face Detection, in: British Machine Vision Conference 2009, 2009
Investigating Privacy-Sensitive Features for Speech Detection in Multiparty Conversations, Idiap-RR-12-2009
Investigating Privacy-Sensitive Features for Speech Detection in Multiparty Conversations, in: Proceedings of Interspeech 2009, 2009
Learning and Predicting Multimodal Daily Life Patterns from Cell Phones, in: ICMI-MLMI, 2009
On Joint Modelling of Grapheme and Phoneme Information using KL-HMM for ASR, Idiap-RR-24-2009
Posterior features applied to speech recognition tasks with user-defined vocabulary, in: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2009
Social Network Analysis in Multimedia Indexing: Making Sense of People in Multiparty Recordings, in: Proceedings of the Doctoral Consortium of the International Conference on Affective Computing & Intelligent Interaction (ACII), 2009
Speaker Change Detection with Privacy-Preserving Audio Cues, Idiap-RR-23-2009
Speaker Change Detection with Privacy-Preserving Audio Cues, in: Proceedings of ICMI-MLMI 2009, 2009
Topic Models for Scene Analysis and Abnormality Detection, in: 9th International Workshop on Visual Surveillance, Kyoto, Japan, IEEE, 2009
Visual processing-inspired Fern-Audio features for Noise-Robust Speaker Verification, Idiap-RR-29-2009
Volterra Series for Analyzing MLP based Phoneme Posterior Probability Estimator, in: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2009
2008
Analyzing Flickr Groups, in: Proc. of the Intl. Conf. on Image and Video Retrieval, ACM, 2008
Topickr: Flickr Groups and Users Reloaded, in: MM '08: Proc. of the 16th ACM Intl. Conf. on Multimedia, ACM, 2008
Volterra Series for Analyzing MLP based Phoneme Posterior Probability Estimator, Idiap-RR-69-2008