Publications of IM2 sorted by recency
Enabling speech applications using Ad-Hoc Microphone Arrays, , École Polytechnique Fédérale de Lausanne, 2015 |
|
Gender Classification by LUT based boosting of Overlapping Block Patterns, , and , in: Scandinavian Conference on Image Analysis, pages 530-542, Springer International Publishing, 2015 |
[DOI] [URL] |
What Your Face Vlogs About: Expressions of Emotion and Big-Five Traits Impressions in YouTube, , , and , in: IEEE Transactions Affective Computing, 2014 |
|
The Workshop on Computational Personality Recognition 2014, , , , , and , in: Proceedings of the ACM International Conference on Multimedia, 2014 |
|
Mining Crowdsourced First Impressions in Online Social Video, and , in: IEEE Transactions on Multimedia, 16(7), 2014 |
|
Feature Mapping of Multiple Beamformed Sources for Robust Overlapping Speech Recognition Using a Microphone Array, , , , , , and , Idiap-RR-17-2014 |
|
Automatic Speech Recognition and Translation of a Swiss German Dialect: Walliserdeutsch, , and , in: Proceedings of Interspeech, 2014 |
|
Ad Hoc Microphone Array Calibration: Euclidean Distance Matrix Completion Algorithm and Theoretical Guarantees, , , , and , in: Signal Processing, 107:123–140, 2015 |
[DOI] |
Detecting speaker roles and topic changes in multiparty conversations using latent topic models, and , in: Proceedings of Interspeech, 2014 |
|
Enhanced Diffuse Field Model for Ad Hoc Microphone Array Calibration, , and , in: Signal Processing, 101:242-255, 2014 |
|
Enforcing Topic Diversity in a Document Recommender for Conversations, and , in: Proceedings of the Coling 2014 (25th International Conference on Computational Linguistics), Dublin, Ireland, pages 746-759, IEEE, 2014 |
|
Recurrent Convolutional Neural Networks for Scene Labeling, and , in: 31st International Conference on Machine Learning (ICML), Beijing, China, pages 82-90, JMLR, 2014 |
[URL] |
Ad-Hoc Microphone Array Calibration from Partial Distance Measurements, , , and , in: Proceedings of the 4th Joint Workshop on Hands-free speech communication and Microphone Arrays, Villers-les-Nancy, pages 1 - 5, IEEE, 2014 |
[DOI] |
Hi YouTube! Personality Impressions and Verbal Content in Social Video, , , and , in: 15th ACM International Conference on Multimodal Interaction, Sydney, Australia, ACM, 2013, 2013 |
|
Speech Processing, , in: Interactive Multimodal Information Management, pages 221--245, EPFL Press, 2013 |
Gammatone Wavelet Cepstral Coefficients for Robust Speech Recognition, , and , in: Proceedings of IEEE TENCON, 2013 |
|
A Savitzky-Golay Filtering Perspective of Dynamic Feature Computation, , and , in: IEEE Signal Processing Letters, 20(3):281 -- 284, 2013 |
[DOI] |
A Probabilistic Framework for Multiple Speaker Localization, , , and , in: Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2013 |
|
Context Aware Addressee Estimation for Human Robot Interaction, , , and , in: Proceedings of the 6th Workshop on Eye Gaze in Intelligent Human Machine Interaction: Gaze in Multimodal Interaction, 2013 |
Leveraging the robot dialog state for visual focus of attention recognition, , , , and , in: Proceedings of the 15th ACM on International Conference on Multimodal Interaction, 2013 |
Recurrent Convolutional Neural Networks for Scene Labeling, and , Idiap-RR-41-2013 |
|
End-to-end Phoneme Sequence Recognition using Convolutional Neural Networks, , and , Idiap-RR-40-2013 |
|
Manifold Sparse Beamforming, , and , in: IEEE International Workshop on Computational Advances in Multi-Sensor Adaptive Processing, Saint Martin, France, pages 113-116, IEEE, 2013 |
[DOI] |
Convexity in source separation: Models, geometry, and algorithms, , , , and , in: IEEE Signal Processing Magazine, Special Issue on Source Separation and Applications, 2013 |
|
Model-based Sparse Component Analysis for Reverberant Speech Localization, , , and , in: 2014 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1439 - 1443, IEEE, 2014 |
[DOI] |
Broadcasting oneself: Visual Discovery of Vlogging Styles, , and , in: IEEE Transactions on Multimedia, 16(1):201-215, 2014 |
[DOI] |
One of a Kind: Inferring Personality Impressions in Meetings, and , in: 15th ACM International Conference on Multimodal Interaction, 2013 |
|
Cross-Domain Personality Prediction: From Video Blogs to Small Group Meetings, and , in: 15th ACM International Conference on Multimodal Interaction, 2013 |
|
Estimating Phoneme Class Conditional Probabilities from Raw Speech Signal using Convolutional Neural Networks, , and , in: Proceedings of Interspeech, 2013 |
|
Interactive Multimodal Information Management, and , EPFL Press, 2013 |
Interactive Multimodal Information Management: Shaping the Vision, and , in: Interactive Multimodal Information Management, pages 1-17, EPFL Press, 2013 |
|
Euclidean Distance Matrix Completion for Ad-hoc Microphone Array Calibration, , , and , in: Proceedings IEEE International Conference On Digital Signal Processing, 2013 |
|
Learning to Rank on Network Data, , and , in: Mining and Learning with Graphs, 2013 |
|
Computing Text Semantic Relatedness using the Contents and Links of a Hypertext Encyclopedia, and , in: International Joint Conference on artificial intelligence, 2013 |
|
Medical image annotation, , in: Interactive Multimodal Information Management, EPFL Press, 2013 |
|
Learning to learn new models of human activities in indoor settings1, , , and , in: Interactive Multimodal Information Management, EPFL Press, 2013 |
|
Learning to learn new models of human activities in indoor settings1, , , and , in: Interactive Multimodal Information Management, EPFL Press, 2013 |
Investigating the Impact of Language Style and Vocal Expression on Social Roles of Participants in Professional Meetings, and , in: Affective Computing and Intelligent Interaction, Geneva, pages 324-329, IEEE, 2013 |
[DOI] |
Multilingual speech recognition A posterior based approach, , École Polytechnique Fédérale de Lausanne (EPFL), 2013 |
|
Mining Conversational Social Video, , EPFL, 2013 |
|
Automatic Social Role Recognition In Professional Meetings Using Conditional Random Fields, and , in: Proceedings of Interspeech, 2013 |
|
Recurrent Convolutional Neural Networks for Scene Parsing, and , Idiap-RR-22-2013 |
|
Diverse Keyword Extraction from Conversations, and , in: Proceedings of the ACL 2013 (51th Annual Meeting of the Association for Computational Linguistics ), Short Papers, Sofia, Bulgaria, pages 651-657, ACL, 2013 |
|
Applying multi- and cross-lingual stochastic phone space transformations to non-native speech recognition, , , , and , in: IEEE Transactions on Audio, Speech, and Language Processing, 2013 |
[DOI] |
Estimating Phoneme Class Conditional Probabilities from Raw Speech Signal using Convolutional Neural Networks, , and , Idiap-RR-13-2013 |
|
Who Wants To Be A Millionaire? (II), , and , Idiap-Com-02-2013 |
|
Using out-of-language data to improve an under-resourced speech recognizer, , , and , Idiap-RR-09-2013 |
|
Using out-of-language data to improve an under-resourced speech recognizer, , , and , in: Speech Communication, 2013 |
[DOI] [URL] |
FaceTube: predicting personality from facial expressions of emotion in online conversational video, , and , in: Proceedings International Conference on Multimodal Interfaces (ICMI-MLMI), 2012 |
|
IMPROVING ACOUSTIC BASED KEYWORD SPOTTING USING LVCSR LATTICES, , and , in: Proceedings on IEEE International Conference on Acoustics, Speech and Signal Processing, IEEE, Japan, pages 4413-4416, 2012 |
IMPROVING ACOUSTIC BASED KEYWORD SPOTTING USING LVCSR LATTICES, , and , Idiap-RR-36-2012 |
|
Automatic Social Role Recognition In Professional Meetings, and , Idiap-RR-35-2012 |
|
Modeling dominance effects on nonverbal behaviors using granger causality, , , , , and , in: Proceedings of International Conference on Multimodal Interaction, ICMI 2012, Santa Monica, CA, 2012 |
|
Improving Object Classification using Pose Information, , , and , Idiap-RR-30-2012 |
|
Leveraging speaker diarization for meeting recognition from distant microphones, , and , in: Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, pages 4390--4393, 2010 |
Assessing the Impact of Language Style on Emergent Leadership Perception from Ubiquitous Audio, , and , in: Proceedings of the 11th International Conference on Mobile and Ubiquitous Multimedia, Ulm, Germany, 2012 |
|
Joint Similarity Learning for Predicting Links in Networks with Multiple-type Links, and , Idiap-RR-29-2015 |
|
COMBINING CEPSTRAL NORMALIZATION AND COCHLEAR IMPLANT-LIKE SPEECH PROCESSING FOR MICROPHONE ARRAY-BASED SPEECH RECOGNITION, , and , in: Proceedings of the IEEE Workshop on Spoken Language Technology, 2012 |
|
MediaParl: Bilingual mixed language accented speech database, , , , , and , Idiap-RR-03-2013 |
|
MediaParl: Bilingual mixed language accented speech database, , , , , and , in: Proceedings of the 2012 IEEE Workshop on Spoken Language Technology, pages 263--268, 2012 |
|
The YouTube Lens: Crowdsourced Personality Impressions and Audiovisual Analysis of Vlogs, and , in: IEEE Transactions on Multimedia, 2012 |
|
Using Sparse Classification Outputs as Feature Observations for Noise Robust ASR, , , , , and , in: Proceedings of Interspeech, 2012 |
|
Combination of Sparse Classification and Multilayer Perceptron for Noise Robust ASR, , , , , and , in: Proceedings of Interspeech, 2012 |
|
Boosting localized binary features for speech recognition, , and , in: Symposium on Machine Learning in Speech and Language Processing (MLSLP), 2012 |
|
From Research to Reality: Evaluation of a Single-Computer Real-Time LVCSR System for Speech-Based Retrieval, , , and , Idiap-RR-12-2017 |
|
Using Crowdsourcing to Compare Document Recommendation Strategies for Conversations, and , in: RecSys, Recommendation Utility Evaluation (RUE 2012), Dublin, Ireland, pages 15-20, 2012 |
|
The Good, the Bad, and the Angry: Analyzing Crowdsourced Impressions of Vloggers, and , in: Proceedings of AAAI International Conference on Weblogs and Social Media, 2012 |
|
Microphone Array Beampattern Characterization for Hands-free Speech Applications, , and , in: IEEE 7th Sensor Array and Multichannel Signal Processing Workshop(SAM), Hoboken, NJ, USA, pages 473-476, 2012 |
|
Computing Text Semantic Relatedness using the Contents and Links of a Hypertext Encyclopedia, and , in: Artificial Intelligence Journal, 194:176–202, 2013 |
[DOI] |
Automatic detection of conflict escalation in spoken conversations, , and , in: INTERSPEECH, ISCA, Portland, Oregon, USA., 2012 |
|
Using Crowdsourcing to Compare Document Recommendation Strategies for Conversations, and , Idiap-RR-14-2012 |
|
Comparing different acoustic modeling techniques for multilingual boosting, , , , and , Idiap-RR-01-2013 |
|
Comparing different acoustic modeling techniques for multilingual boosting, , , , and , in: Proceedings of Interspeech, Portland, Oregon, 2012 |
|
Phase AutoCorrelation (PAC) features for noise robust speech recognition, , , and , in: Speech Communication, 54(7):867–880, 2012 |
[DOI] |
Boosting under-resourced speech recognizers by exploiting out of language data - Case study on Afrikaans, , and , Idiap-RR-15-2012 |
|
Alternative search techniques for face detection using location estimation and binary features, , ECOLE POLYTECHNIQUE FEDERALE DE LAUSANNE, 2012 |
|
Automatic detection of conflicts in spoken conversations: ratings and analysis of broadcast political debates, , and , in: Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, Kyoto, Japan, 2012 |
|
Boosting under-resourced speech recognizers by exploiting out of language data - Case study on Afrikaans, , and , in: Proceedings of the 3rd International Workshop on Spoken Languages Technologies for Under-resourced Languages, Cape Town, pages 60--67, 2012 |
|
The ICSI RT-09 Speaker Diarization System, , , , , , , and , in: IEEE Transactions on Audio, Speech, and Language Processing, 20(2):371--381, 2012 |
[DOI] |
The Kaldi Speech Recognition Toolkit, , , , , , , , , , , , and , in: IEEE 2011 Workshop on Automatic Speech Recognition and Understanding, Hilton Waikoloa Village, Big Island, Hawaii, US, IEEE Signal Processing Society, 2011 |
|
Using KL-divergence and multilingual information to improve ASR for under-resourced languages, , and , in: Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, Kyoto, pages 4869--4872, 2012 |
|
Hierarchical Tandem Features for ASR in Mandarin, , and , in: Proceedings of Interspeech, 2011 |
Evaluation of Meeting Support Technology, and , in: Multimodal Signal Processing: Human Interactions in Meetings, pages 237-252, Cambridge University Press, 2012 |
User Requirements for Meeting Support Technology, and , in: Multimodal Signal Processing: Human Interactions in Meetings, pages 210-221, Cambridge University Press, 2012 |
Multimodal Signal Processing for Meetings: an Introduction, and , in: Multimodal Signal Processing: Human Interactions in Meetings, pages 1-11, Cambridge University Press, 2012 |
|
BROADBAND BEAMPATTERN FOR MULTI-CHANNEL SPEECH ACQUISITION AND DISTANT SPEECH RECOGNITION, , and , Idiap-RR-39-2011 |
|
Analysis of Group Conversations: Modeling Social Verticality, and , in: Computer Analysis of Human Behavior, pages 293-322, Springer London, 2011 |
Speaker Diarization, and , in: Multimodal Signal Processing: Human Interactions in Meetings, Cambridge University Press, 2012 |
[URL] |
Language-Independent Socio-Emotional Role Recognition in the AMI Meetings Corpus, and , in: Proceedings of Interspeech, 2011 |
|
Analysis and Comparison of Recent MLP Features for LVCSR Systems, , and , in: Proceedings of Interspeech 2011, 2011 |
|
Multistream speaker diarization of meetings recordings beyond MFCC and TDOA features, , and , in: Speech Communication, 54(1), 2012 |
[DOI] |
Speaker Diarization of Meetings based on Speaker Role N-gram Models, , and , in: Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2011 |
|
MULTISTREAM SPEAKER DIARIZATION THROUGH INFORMATION BOTTLENECK SYSTEM OUTPUTS COMBINATION, , and , in: Proceedings of International Conference on Acoustics, Speech and Signal Processing, 2011 |
|
Current trends in multilingual speech processing, , , , , , , , and , in: Sadhana, 36(5):885–915, 2011 |
[DOI] [URL] |
Transcribing meetings with the AMIDA systems, , , , , , , , , and , in: IEEE Transactions on Audio, Speech, and Language Processing, 20(2):486--498, 2012 |
[DOI] [URL] |
Boosting Localized Features for Speaker and Speech Recognition, , Ecole Polytechnique Federale de Lausanne (EPFL), 2011 |
|
Information Bottleneck Features for HMM/GMM Speaker Diarization of Meetings Recordings, and , in: Interspeech, Florence, Italy, pages 953-956, 2011 |
|
Continuous Speech Recognition using Boosted Binary Features, , and , Idiap-RR-35-2011 |
|
IMPROVING MICROPHONE ARRAY SPEECH RECOGNITION WITH COCHLEAR IMPLANT-LIKE SPECTRALLY REDUCED SPEECH, , and , Idiap-RR-40-2011 |
|
Fast and flexible Kullback-Leibler divergence based acoustic modeling for non-native speech recognition, , and , in: Proceedings of the IEEE workshop on Automatic Speech Recognition and Understanding, Hawaii, USA, pages 348-353, 2011 |
|
Fast and flexible Kullback-Leibler divergence based acoustic modeling for non-native speech recognition, , and , Idiap-RR-01-2012 |
|
A Fast Parts-based Approach to Speaker Verification using Boosted Slice Classifiers, , and , in: IEEE Transactions on Information Forensics and Security, 7(1):241-254, 2012 |
|
Fast Speaker Verification on Mobile Phone data using Boosted Slice Classifiers, , and , in: IAPR IEEE International Joint Conference on Biometrics, Washington DC, 2011 |
|
VlogSense: Conversational Behavior and Social Attention in YouTube, and , in: Transactions on Multimedia Computing, Communications and Applications, 2011 |
|
Multimodal Signal Processing: Human Interactions in Meetings, , , and , Cambridge University Press, 2012 |
[URL] |
A Just-in-Time Document Retrieval System for Dialogues or Monologues, , , and , in: SIGDIAL 2011 (12th annual SIGDIAL Meeting on Discourse and Dialogue), Demonstration Session, Portland, OR, pages 350-352, 2011 |
|
Finding Information in Multimedia Records of Meetings, , and , in: IEEE Multimedia, 19(2):48-57, 2012 |
[DOI] [URL] |
A Speech-based Just-in-Time Retrieval System using Semantic Search, , , and , in: Proceedings of the ACL-HLT 2011 System Demonstrations (49th Annual Meeting of the Association for Computational Linguistics), Portland, OR, pages 80-86, 2011 |
[URL] |
An Integrated Framework for Multi-Channel Multi-Source Localization and Voice Activity Detection, , , , and , in: The Third Joint Workshop on Hands-free Speech Communication and Microphone Arrays, 2011 |
|
Privacy-sensitive recognition of group conversational context with sociometers, , , and , in: Springer Multimedia Systems Journal, 2011 |
|
Using a Wikipedia-based Semantic Relatedness Measure for Document Clustering., and , in: Graph-based Methods for Natural Language Processing, 2011 |
|
Improving non-native ASR through stochastic multilingual phoneme space transformations, , , , and , in: Proceedings of Interspeech, Florence, Italy, pages 537-540, 2011 |
|
Improving non-native ASR through stochastic multilingual phoneme space transformations, , , , and , Idiap-RR-19-2011 |
|
AN INTEGRATED FRAMEWORK FOR MULTI-CHANNEL MULTI-SOURCE LOCALIZATION AND VOICE ACTIVITY DETECTION, , , , and , Idiap-RR-16-2011 |
|
Cepstral normalisation and the signal to noise ratio spectrum in automatic speech recognition, , in: Speech Communication, 53(8):991--1001, 2011 |
[DOI] |
You Are Known by How You Vlog: Personality Impressions and Nonverbal Behavior in YouTube, , and , in: Proceedings of AAAI International Conference on Weblogs and Social Media, Barcelona, 2011 |
|
A Speech-based Just-in-Time Retrieval System using Semantic Search, , , and , Idiap-RR-31-2011 |
|
When Users Meet Technology: The Meeting Browser Development Helix, , and , Idiap-RR-05-2011 |
|
Phoneme Recognition using Boosted Binary Features, , and , in: IEEE Intl. Conference on Acoustics, Speech and Signal Processing 2011, 2011 |
|
Language dependent universal phoneme posterior estimation for mixed language speech recognition, , , and , in: Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, Prag, CZ, pages 5012-5015, 2011 |
|
Call me Guru: user categories and large-scale behavior in YouTube, and , in: Social Media Computing, Springer, 2011 |
|
Vlogcast Yourself: Nonverbal Behavior and Attention in Social Media, and , in: Proceedings International Conference on Multimodal Interfaces (ICMI-MLMI), 2010 |
|
Automatic Content Linking: Speech-based Just-in-time Retrieval for Multimedia Archives, , , , , and , in: Proceedings of the 33rd Annual ACM SIGIR Conference, Geneva, Switzerland, pages 703, 2010 |
[DOI] |
Finding Information in Multimedia Records of Meetings, , and , Idiap-RR-32-2011 |
|
Automatic Identification of Discourse Markers in Multiparty Dialogues: An In-Depth Study of Like and Well, and , in: Computer Speech and Language, 25(3):499-518, 2011 |
[DOI] |
View-Based Appearance Model Online Learning for 3D Deformable Face Tracking, and , in: Proc. Int. Conf. on Computer Vision Theory and Applications, Angers, 2010 |
|
Cepstral normalisation and the signal to noise ratio spectrum in automatic speech recognition., , Idiap-RR-15-2011 |
|
An Information Theoretic Approach to Speaker Diarization of Meeting Recordings, , Ecole polytechnique fédérale de Lausanne, 2010 |
|
VARIATIONAL BAYESIAN SPEAKER DIARIZATION OF MEETING RECORDINGS, , and , in: Proceedings of ICASSP, 2010 |
|
A Comparative Study of MLP Front-ends for Mandarin ASR, , , , and , in: Proceedings of Interspeech, Japan, 2010 |
|
Social Signal Processing: Understanding Nonverbal Communication in Social Interactions, and , in: Proceedings of Measuring Behavior 2010, Eindhoven (The Netherlands), 2010 |
|
Hierarchical Tandem Features for ASR in Mandarin, , and , Idiap-RR-39-2010 |
|
Fast Bounding Box Estimation based Face Detection, and , in: ECCV, Workshop on Face Detection: Where we are, and what next?, 2010 |
[URL] |
Recognizing conversational context in group interaction using privacy-sensitive mobile sensors, , , and , in: Proceedings of International Conference on Mobile and Ubiquitous Multimedia, Limassol, Cyprus, 2010 |
|
Language dependent universal phoneme posterior estimation for mixed language speech recognition, , , and , Idiap-RR-13-2011 |
|
Tuning-Robust Initialization Methods for Speaker Diarization, and , in: IEEE Transactions on Audio, Speech, and Language Processing, 18(8):2028-2037, 2010 |
[DOI] |
Voices of Vlogging, and , in: Proceedings of AAAI International Conference on Weblogs and Social Media, Washington DC, 2010 |
|
Mining group nonverbal conversational patterns using probabilistic topic models, and , in: IEEE Transactions on Multimedia, 2010 |
|
Predicting Remote Versus Collocated Group Interactions using Nonverbal Cues, , and , in: Proc. Int. Conf. on Multimodal Interfaces, Workshop on Multimodal Sensor-Based Systems and Mobile Phones for Social Computing,, Cambridge, 2009 |
[DOI] |
Introducing Crossmodal Biometrics:Person Identification from Distinct Audio & Visual Streams, and , in: IEEE Fourth International Conference on Biometrics: Theory, Applications and Systems, 2010 |
|
A Random Walk Framework to Compute Textual Semantic Similarity: a Unified Model for Three Benchmark Tasks, and , in: Proceedings of the 4th IEEE International Conference on Semantic Computing (ICSC 2010 ), Carnegie Mellon University, Pittsburgh, PA, USA, 2010 |
|
Towards a standard for dialogue act annotation, , , , , , , , , , , and , in: 7th International Conference on Language Resources and Evaluation, Malta, 2010 |
[URL] |
The ACLD: Speech-based Just-in-Time Retrieval of Meeting Transcripts, Documents and Websites, , , and , in: ACM Multimedia Workshop on Searching Spontaneous Conversational Speech, Florence, Italy, 2010 |
|
The ACLD: Speech-based Just-in-Time Retrieval of Multimedia Documents and Websites, , , and , Idiap-RR-26-2010 |
|
English Spoken Term Detection in Multilingual Recordings, , and , in: Proceedings of Interspeech, Makuhari, Japan, 2010, ISCA, Makuhari, Japan, 2010 |
|
Sparse Component Analysis for Speech Recognition in Multi-Speaker Environment, , and , in: Proceedings of Interspeech, Makuhari, Japan, 2010 |
|
Floor Holder Detection and End of Speaker Turn Prediction in Meetings, , and , in: International Conference on Speech and Language Processing, Interspeech, Makuhari, Japan, ISCA, 2010 |
|
Towards mixed language speech recognition systems, , and , in: Proceedings of Interspeech, Makuhari, Japan, pages 278-281, 2010 |
|
Hierarchical Multilayer Perceptron based Language Identification, , and , in: Proceedings of Interspeech, Makuhari, Japan, pages 2722-2725, 2010 |
|
Tracter: A Lightweight Dataflow Framework, and , in: Proceedings of Interspeech, Makuhari, Japan, 2010 |
|
Audio–Visual Synchronisation for Speaker Diarisation, , and , in: International Conference on Speech and Language Processing, Interspeech, Makuhari, Japan, 2010 |
|
An Information Theoretic Combination of MFCC and TDOA Features for Speaker Diarization, , and , Idiap-RR-22-2010 |
|
Advances in Fast Multistream Diarization based on the Information Bottleneck Framework, , and , Idiap-RR-23-2010 |
|
Estimating Cohesion in Small Groups using Audio-Visual Nonverbal Behavior, and , Idiap-RR-12-2010 |
|
Introducing Crossmodal Biometrics: Person Identification from Distinct Audio & Visual Streams, and , Idiap-RR-29-2010 |
|
English Spoken Term Detection in Multilingual Recordings, , and , Idiap-RR-21-2010 |
|
Hierarchical Multilayer Perceptron based Language Identification, , and , Idiap-RR-14-2010 |
|
Towards mixed language speech recognition systems, , and , Idiap-RR-15-2010 |
|
Tracter: A Lightweight Dataflow Framework, and , Idiap-RR-10-2010 |
|
Fast Bounding Box Estimation based Face Detection, and , Idiap-RR-38-2010 |
|
A Multimodal Corpus for Studying Dominance in Small Group Conversations, , and , in: LREC workshop on Multimodal Corpora: Advances in Capturing, Coding and Analyzing Multimodality, Malta, May 2010, 2010 |
|
Joint Pose Estimator and Feature Learning for Object Detection, , , and , in: Proceedings of the IEEE International Conference on Computer Vision, 2009 |
Multi-Person Visual Focus of Attention from Head Pose and Meeting Contextual Cues, and , in: IEEE Trans. on Pattern Analysis and Machine Intelligence, 33(1):101-116, 2011 |
|
Learning Large Margin Likelihood for Realtime Head Pose Tracking, and , in: IEEE Int. Conference on Image Processing, Cairo, Egypt, IEEE, 2009 |
|
Structure and appearance features for robust 3D facial actions tracking, and , in: IEEE Proc. Int. Conf. on Multimedia and Expo, IEEE, 2009 |
|
Analysis of MLP Based Hierarchical Phoneme Posterior Probability Estimator, , , , and , in: IEEE Transcations on Audio, Speech, and Language Processing, 19(2):225-241, 2011 |
|
Crossmodal Matching of Speakers using Lip and Voice Features in Temporally Non-overlapping Audio and Video Streams, and , Idiap-RR-13-2010 |
|
Finding without searching, , Idiap-Com-01-2010 |
|
Application of Out-Of-Language Detection To Spoken-Term Detection, and , in: 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, USA, 2010 |
|
BOOSTED BINARY FEATURES FOR NOISE-ROBUST SPEAKER VERIFICATION, , and , in: 2010 IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, Texas, 2010 |
|
Multistream Speaker Diarization beyond Two Acoustic Feature Streams, , and , in: International Conference on Acoustics, Speech, and Signal Processing, 2010 |
|
An Alternative Scanning Strategy to Detect Faces, and , in: Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, USA, 2010 |
|
Using Audio and Visual Cues for Speaker Diarisation Initialisation, and , in: International Conference on Acoustics, Speech and Signal Processing, 2010 |
|
An Adaptive Initialization Method for Speaker Diarization based on Prosodic Features, and , in: Proceedings IEEE International Conference on Acoustics, Speech and Signal Processing, Dallas, USA, pages 4946-4949, 2010 |
|
MLP Based Hierarchical System for Task Adaptation in ASR, , and , in: Proceedings of the IEEE workshop on Automatic Speech Recognition and Understanding, Merano, Italy, 2009 |
|
A Multimedia Retrieval System Using Speech Input, , , , , , , , , , and , in: Proceedings of ICMI-MLMI 2009 (11th International Conference on Multimodal Interfaces and 6th Workshop on Machine Learning for Multimodal Interaction), Cambridge, MA, 2009 |
|
The FEMTI guidelines for contextual MT evaluation: principles and tools, , and , in: Linguistica Antverpiensia New Series, 8, 2009 |
Managing Multimodal Data, Metadata and Annotations: Challenges and Solutions, , in: Multimodal Signal Processing for Human-Computer Interaction, Elsevier / Academic Press, 2009 |
Accessing a Large Multimodal Corpus using an Automatic Content Linking Device, , , and , in: Multimodal Corpora: From Models of Natural Interaction to Systems and Applications, Springer-Verlag, 2009 |
[DOI] |
User Interface Design in a Just-in-time Retrieval System for Meetings, , , , , , and , Idiap-RR-38-2009 |
|
APPLICATIONS OF SIGNAL ANALYSIS USING AUTOREGRESSIVE MODELS FOR AMPLITUDE MODULATION, , , and , in: IEEE Workshop on Applications of Signal Processing to Audio and Acoustics, 2009, WASPAA '09., IEEE, Mohonk Mountain House, New Paltz, New York, USA, 2009 |
[URL] |
APPLICATIONS OF SIGNAL ANALYSIS USING AUTOREGRESSIVE MODELS FOR AMPLITUDE MODULATION, , , and , Idiap-RR-35-2009 |
|
Wearing a YouTube hat: directors, comedians, gurus, and user aggregated behavior, and , in: Proceedings of the 17th ACM International Conference on Multimedia, ACM, 2009 |
|
An Adaptive Initialization Method for Speaker Diarization based on Prosodic Features, and , Idiap-RR-02-2010 |
|
Haar Local Binary Pattern Feature for Fast Illumination Invariant Face Detection, and , Idiap-RR-28-2009 |
|
Evaluating the Robustness of Privacy-Sensitive Audio Features for Speech Detection in Personal Audio Log Scenarios, , , and , Idiap-RR-01-2010 |
|
Robust Speaker Diarization for Short Speech Recordings, and , in: Proceedings of the IEEE workshop on Automatic Speech Recognition and Understanding, Merano, Italy, pages 432-437, 2009 |
|
SNR Features for Automatic Speech Recognition, , in: Proceedings of the IEEE workshop on Automatic Speech Recognition and Understanding, Merano, Italy, 2009 |
|
On Joint Modelling of Grapheme and Phoneme Information using KL-HMM for ASR, , and , Idiap-RR-24-2009 |
|
Robust Speaker Diarization for Short Speech Recordings, and , Idiap-RR-26-2009 |
|
Discovering Group Nonverbal Conversational Patterns with Topics, and , in: Proceedings ICMI-MLMI, 2009 |
|
Speaker Change Detection with Privacy-Preserving Audio Cues, , , and , in: Proceedings of ICMI-MLMI 2009, 2009 |
|
Automatic Role Recognition in Multiparty Recordings: Using Social Affiliation Networks for Feature Extraction, , and , in: IEEE Transactions on Multimedia, 11(7), 2009 |
|
Automatic Role Recognition in Multiparty Recordings Using Social Networks and Probabilistic Sequential Models, , and , in: ACM International Conference on Multimedia, 2009 |
|
Investigating the use of Visual Focus of Attention for Audio-Visual Speaker Diarisation, , , and , in: Proceedings of the ACM International Conference on Multimedia, Beijing, China, 2009 |
|
Implicit Human Centered Tagging, , and , in: Proceedings of IEEE Conference on Multimedia and Expo, 2009 |
|
Capturing Order in Social Interactions, , in: IEEE Signal Processing Magazine, 2009 |
|
Visual Speaker Localization Aided by Acoustic Models, , and , in: ACM Multimedia, 2009 |
SNR Features for Automatic Speech Recognition, , Idiap-RR-25-2009 |
|
KL Realignment for Speaker Diarization with Multiple Feature Streams, , and , in: 10th Annual Conference of the International Speech Communication Association, 2009 |
Mutual Information based Channel Selection for Speaker Diarization of Meetings Data, , and , in: Proceedings of International conference on acoustics speech and signal processing, 2009 |
Real-Time ASR from Meetings, , , , , , , , and , in: Proceedings of Interspeech, Brighton, UK., 2009 |
|
Automatic vs. human question answering over multimedia meeting recordings, and , in: 10th Annual Conference of the International Speech Communication Association, 2009 |
|
Automatic vs. human question answering over multimedia meeting recordings, and , Idiap-RR-13-2009 |
|
Investigating Privacy-Sensitive Features for Speech Detection in Multiparty Conversations, , , and , in: Proceedings of Interspeech 2009, 2009 |
|
Comparing meeting browsers using a task-based evaluation method, , Idiap-RR-11-2009 |
|
Tuning-Robust Initialization Methods for Speaker Diarization, and , Idiap-RR-35-2010 |
|
ClusterRank: A Graph Based Method for Meeting Summarization, , , and , Idiap-RR-09-2009 |
|
KL Realignment for Speaker Diarization with Multiple Feature Streams, , and , Idiap-RR-24-2010 |
|
Speech/Non-Speech Detection in Meetings from Automatically Extracted Low Resolution Visual Features, and , Idiap-RR-20-2009 |
|
Speaker Change Detection with Privacy-Preserving Audio Cues, , , and , Idiap-RR-23-2009 |
|
Novel initialization methods for Speaker Diarization, , Idiap-RR-07-2009 |
|
Steerable Features for Statistical 3D Dendrite Detection, , , , and , in: Proceedings of the International Conference on Medical Image Computing and Computer Assisted Intervention, 2009 |
Investigating Privacy-Sensitive Features for Speech Detection in Multiparty Conversations, , , and , Idiap-RR-12-2009 |
|
Characterising Conversationsal Group Dynamics Using Nonverbal Behaviour, , and , in: Proceedings ICME 2009, 2009 |
|
Real-Time ASR from Meetings, , , , , , , , and , Idiap-RR-15-2009 |
|
Visual Activity Context For Focus of Attention Estimation in Dynamic Meetings, , and , in: International Conference on Multimedia & Expo, 2009 |
|
Learning Rotational Features for Filament Detection, , and , in: Proceedings of the IEEE international conference on Computer Vision and Pattern Recognition, 2009 |
A MAP Approach to Noise Compensation of Speech, , Idiap-RR-08-2009 |
|
Modeling and Understanding Flickr Communities through Topic-based Analysis, and , Idiap-RR-19-2010 |
|
Posterior features applied to speech recognition tasks with user-defined vocabulary, , and , in: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2009 |
|
Non-linear mapping for multi-channel speech separation and robust overlapping speech recognition, , , and , in: Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2009 |
|
Volterra Series for Analyzing MLP based Phoneme Posterior Probability Estimator, , , and , in: Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2009 |
|
Visual activity context for focus of attention estimation in dynamic meetings, , and , Idiap-RR-02-2009 |
|
MULTI-MODAL SPEAKER DIARIZATION OF REAL-WORLD MEETINGS USING COMPRESSED-DOMAIN VIDEO FEATURES, , and , in: International Conference on Audio, Speech and Signal Processing, 2009 |
|
Topickr: Flickr Groups and Users Reloaded, and , in: MM '08: Proc. of the 16th ACM Intl. Conf. on Multimedia, ACM, 2008 |
Analyzing Flickr Groups, and , in: Proc. of the Intl. Conf. on Image and Video Retrieval, ACM, 2008 |
MUTUAL INFORMATION BASED CHANNEL SELECTION FOR SPEAKER DIARIZATION OF MEETINGS DATA, , and , in: Proceedings of International Conference on Acoustics, Speech and Signal Processing, 2009 |
|
An Information Theoretic Approach to Speaker Diarization of Meeting Data, , and , in: IEEE Transactions on Audio Speech and Language Processing, 17(7), 2009 |
[DOI] |
Role Recognition in Multiparty Recordings using Social Affiliation Networks and Discrete Distributions, , , and , in: International Conference on Multimodal Interfaces, Chania, Greece, 2008 |
|
Tracking the visual focus of attention for a varying number of wandering people, , , and , in: IEEE Trans. on Pattern Analysis and Machine Intelligence, 30(7), 2008 |
|
Recognizing Human Visual Focus of Attention from Head Pose in Meetings, and , in: IEEE Transactions on Systems, Man, Cybernetics, Part-B, Vol. 39(No. 1), 2009 |
|
Contextual classification of image patches with latent aspect models, , , and , in: EURASIP Journal on Image and Video Processing, Special Issue on Patches in Vision, 2009 |
|
Face Detection using Ferns, and , Idiap-Com-01-2011 |
|
Low-Delay Error Resilient Speech Coding Using Sub-band Hilbert Envelopes, , and , Idiap-RR-75-2008 |
|
MODIFIED DISCRETE COSINE TRANSFORM FOR ENCODING RESIDUAL SIGNALS IN FREQUENCY DOMAIN LINEAR PREDICTION, , and , Idiap-RR-74-2008 |
|
Automatic Out-of-Language Detection based on Confidence Measures derived from LVCSR Word and Phone Lattices, , Idiap-RR-06-2009 |
|
Modeling Dominance in Group Conversations using NonVerbal Activity Cues, , , and , in: IEEE Transactions on Audio, Speech and Language Processing, 2008 |
|
Methods for Asynchronous and Non-Invasive EEG-Based Brain-Computer Interfaces. Towards Intelligent Brain-Actuated Wheelchairs, , University of Barcelona, 2008 |
|
Principled Detection-by-classification from Multiple Views, , and , in: proceedings of the International Conference on Computer Vision Theory and Applications, 2008 |
Classification-based Probabilistic Modeling of Texture Transition for Fast Line Search Tracking and Delineation, , , and , in: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2008 |
Multi-layer Boosting for Pattern Recognition, , in: Pattern Recognition Letter, 30, 2009 |
Multi-Camera People Tracking with a Probabilistic Occupancy Map, , , and , in: IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(2), 2008 |
Multiple Object Tracking using Flow Linear Programming, , and , Idiap-RR-10-2009 |
|
Social Signal Processing: Survey of an Emerging Domain, , and , in: Image and Vision Computing, 2009 |
|
Entropy coding of Quantized Spectral Components in FDLP audio codec, , and , Idiap-RR-71-2008 |
|
Modulation Frequency Features For Phoneme Recognition In Noisy Speech, , and , in: Journal of Acoustical Society of America - Express Letters, 2008 |
|
Modulation Frequency Features For Phoneme Recognition In Noisy Speech, , and , Idiap-RR-70-2008 |
|
Volterra Series for Analyzing MLP based Phoneme Posterior Probability Estimator, , , and , Idiap-RR-69-2008 |
|
ESTIMATING THE DOMINANT PERSON IN MULTI-PARTY CONVERSATIONS USING SPEAKER DIARIZATION STRATEGIES, , , and , in: IEEE International Conference on Acoustics, Speech, and Signal Processing, 2007 |
Investigating Automatic Dominance Estimation in Groups From Visual Attention and Speaking Activity, , , , and , in: International Conference on Multi-modal Interfaces, 2008 |
|
Towards Audio-Visual On-line Diarization Of Participants In Group Meetings, and , in: European Conference on Computer Vision Workshop on Multi-camera and Multi-modal Sensor Fusion, 2008 |
|
Kernel Based Text-Independnent Speaker Verification, , and , Idiap-RR-68-2008 |
|
Fast Recognition of Anticipation Related Potentials, , and , in: IEEE Transactions on Biomedical Engineering, 2008 |
|
Multi-layer Boosting for Pattern Recognition, , Idiap-RR-76-2008 |
|
Task-based evaluation of meeting browsers: from BET task elicitation to user behavior analysis, , , and , in: 6th International Conference on Language Resources and Evaluation, Marrakech, Morocco, 2008 |
|
Reference-based vs. task-based evaluation of human language technology, , in: LREC 2008 ELRA Workshop on Evaluation, ELRA, Marrakech, Morocco, 2008 |
|
Dimensionality of Dialogue Act Tagsets: An Empirical Analysis of Large Corpora, , in: Language Resources and Evaluation, 42(1), 2008 |
[DOI] |
Towards an Objective Test for Meeting Browsers: the BET4TQB Pilot Experiment, , , and , in: Machine Learning for Multimodal Interaction IV, Springer-Verlag, 2008 |
[DOI] |
Machine Learning for Multimodal Interaction V, and , Springer-Verlag, LNCS, volume 5237, 2008 |
[DOI] |
Machine Learning for Multimodal Interaction IV, , and , Springer-Verlag, LNCS, volume 4892, 2008 |
[DOI] |
Social Signal Processing: State-of-the-Art and Future Perspectives of an Emerging Domain, , , and , in: Proceedings of the ACM International Conference on Multimedia, 2008 |
|
Social Signals, their Function, and Automatic Analysis: A Survey, , , and , in: Proceedings of International Conference on Multimodal Interfaces (to appear), 2008 |
|
Role Recognition for Meeting Participants: an Approach Based on Lexical Information and Social Network Analysis, , , , and , in: ACM International Conference on Multimedia, Vancouver, Canada, 2008 |
|
Identifying Dominant People in Meetings from Audio-Visual Sensors, and , in: International Conference on Automatic Face and Gesture Recognition, Amsterdam, The Netherlands, 2008 |
|
Identifying Dominant People in Meetings from Audio-Visual Sensors, and , Idiap-RR-65-2008 |
|
Stationary Features and Cat Detection, and , in: Journal of Machine Learning Research, 9, 2008 |
Automated Delineation of Dendritic Networks in Noisy Image Stacks, , and , in: proceedings of the European Conference on Computer Vision, 2008 |
Multi-Camera Tracking and Atypical Motion Detection with Behavioral Maps, , and , in: proceedings of the European Conference on Computer Vision, 2008 |