SPEECH EMOTION RECOGNITION: ARE WE THERE YET?
1 SPEECH EMOTION RECOGNITION: ARE WE THERE YET?
Carlos Busso
Multimodal Signal Processing (MSP) lab
Erik Jonsson School of Engineering and Computer Science
The University of Texas at Dallas
2 Why study emotion or attitude?
- Emotions play a crucial role in human interaction
- Emotional (vs. cognitive) reasoning
- Emotion is reflected in our body
- Our emotions change the minds of others
- People rely on emotion for making decisions
- Knowing the user's emotional state should help to adjust system performance
- User can be more engaged and have a more effective interaction with the system
3 Speech: a multimodal signal
4 Emotion Recognition in the Lab
- Databases
  - Acted data
  - Categorical representation of emotions
  - Few speakers
  - Limited data
- Features
  - Many features are selected
  - Feature set is reduced (PCA, Fisher linear discriminant, sequential forward feature selection, etc.)
- Results
  - From 50% to 85% depending on the task [Pantic_2003, Cowie_2001]
5 Emotion Recognition in Real Applications
- Too much variability
  - Speaker dependency
  - Emotional descriptors
  - Differences in acoustic environments
- Emotional models do not generalize!
  - Results are strongly dependent on the recording conditions
  - Models are not easily generalized to other databases or to online recognition tasks
  - Speaker-dependent models give better performance than speaker-independent models [Austermann et al. 2005]
  - Cross-corpus testing resulted in a drop in performance [Shami and Verhelst, 2007]
- How can we build models that generalize across problems?
6 Examples
Sample 1: [fru; ()] [ang; ()] [neu; ()] [fru; ()] [oth; (exasperated)] [neu; ()]
Sample 2:
Sample 3: [ang; ()] [ang; ()] [ang; ()]
7 Robustness and Generalization
We have made important progress, but challenges remain. At MSP:
- Databases
  - Big corpora
  - Reliable labels
  - Natural behaviors
- Features
  - Feature normalization
  - Feature selection
  - Feature representation
- Models
  - Model adaptation
  - Specialized detectors
  - Temporal/contextual modeling
8 OUTLINE
- Introduction
- MSP-PODCAST: The Largest Speech Emotional Database
- Case Study 1: Multi-Task Learning
- Case Study 2: Training with Soft Labels
- Conclusions
References:
- Reza Lotfian and Carlos Busso, "Building naturalistic emotionally balanced speech corpus by retrieving emotional speech from existing podcast recordings," IEEE Transactions on Affective Computing, vol. To appear,
- Alec Burmania, Srinivas Parthasarathy, and Carlos Busso, "Increasing the reliability of crowdsourcing evaluations using online quality assessment," IEEE Transactions on Affective Computing, vol. 7, no. 4, October-December
- Soroosh Mariooryad, Reza Lotfian, and Carlos Busso, "Building a naturalistic emotional speech corpus by retrieving expressive behaviors from existing speech corpora," in Interspeech 2014, Singapore, September 2014, pp
9 Current Emotional Corpora
- Lack of naturalness
- Limited in size
- Limited number of speakers
- Unbalanced emotional content

Corpus         Size            # Spkr.   Type     Lang.
IEMOCAP        12h26m          10        acted    English
MSP-IMPROV     9h35m           12        acted    English
CREMA-D        7,442 samples   91        acted    English
Chen Bimodal   9,900 samples   100       acted    English
Emo-DB         22m             10        acted    German
GEMEP          1,260 samples   10        acted    -
VAM-Audio      48m             47        spont.   German
TUM AVIC       10h23m          21        spont.   English
SEMAINE        6h21m           20        spont.   English
FAU-AIBO       9h12m           51        spont.   German
RECOLA         2h50m           46        spont.   French
10 Unbalanced Emotional Content
- Categorical labels
  - Anger, happiness, sadness, neutral
- Dimensional or attribute-based labels
  - Valence (negative vs. positive)
  - Arousal (calm vs. active)
  - More accurate emotion descriptors (intensity)
[Figure; panel labels: IEMOCAP, MSP-IMPROV, SEMAINE, RECOLA, VAM; annotation: "Emoticons from acted databases"]
11 Contribution: MSP-PODCAST
- Use existing podcast recordings
- Divide into speaker turns
- Emotion retrieval to balance the emotional content
- Annotate using crowdsourcing framework
[Figure: podcast recording]
12 MSP-PODCAST Corpus
[Pipeline diagram; blocks: audio sharing website; podcast audio (16 kHz, 16-bit PCM, mono); diarization; duration filter (2.75 s < duration < 11 s); SNR filter; music detection; remove telephone quality; speech-only audio; high-quality audio; emotion retrieval; manual screening; perceptual evaluation]
- Collection of audio recordings (podcasts)
- Naturalness and diversity of emotions
- Creative Commons copyright licenses
- Interviews, talk shows, news, discussions, education, storytelling, comedy, science, technology, politics, economics, business, arts, culture, sports
13 MSP-PODCAST Corpus
- Automatic speaker diarization
- Single-speaker segments
- Duration:
  - Longer than 2.75 s: long enough for annotators and for extracting reliable features
  - Shorter than 11 s: emotional content does not change significantly
- High SNR, no music, no phone quality
The speaker attribution intelligent service,
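The duration criterion on this slide can be sketched as a simple filter. This is a minimal illustration, not the lab's pipeline: the `(segment_id, start_s, end_s)` tuple layout and the function names are assumptions made for the example; only the 2.75 s and 11 s bounds come from the slide.

```python
# Bounds taken from the slide; everything else here is hypothetical.
MIN_DURATION_S = 2.75
MAX_DURATION_S = 11.0


def keep_segment(start_s, end_s):
    """Return True if the turn's duration lies strictly inside the window."""
    duration = end_s - start_s
    return MIN_DURATION_S < duration < MAX_DURATION_S


def filter_segments(segments):
    """Keep only (segment_id, start_s, end_s) tuples that pass the duration filter."""
    return [seg for seg in segments if keep_segment(seg[1], seg[2])]


# Three made-up speaker turns: too short, acceptable, too long.
turns = [("a", 0.0, 2.0), ("b", 2.0, 7.5), ("c", 7.5, 20.0)]
kept = filter_segments(turns)
```

In a real pipeline the start and end times would come from the diarization output, and the SNR and music checks would run as separate filters.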
14 MSP-PODCAST Corpus
- Retrieve samples that convey the desired emotion
- Develop and optimize different machine learning frameworks using existing databases
- Balance the emotional content
15 MSP-PODCAST Corpus
- Subjective annotation is costly: only the retrieved samples are screened before being uploaded for annotation
- Diarization fails on overlapping speech and interrupting speakers
16 Perceptual Evaluation
- Use Amazon Mechanical Turk
- Crowdsourcing
- Verify if a worker is spamming in real time
- Collect reference set
[Figure: Phase A: collect reference set (gold standard), shown as a sequence of reference items (R). Phase B: interleave reference set with data (online quality assessment), shown as Data, R, End, Data, R, End, Data, R; trace performance in real time]
Alec Burmania, Srinivas Parthasarathy, and Carlos Busso, "Increasing the reliability of crowdsourcing evaluations using online quality assessment," IEEE Transactions on Affective Computing, vol. 7, no. 4, October-December
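The two phases can be illustrated with a small sketch: reference items are interleaved with new data, and a worker's answers on the reference items are compared with the gold labels as the session progresses. The exact-match agreement score, the block size and the threshold are assumptions chosen for illustration; the slide does not state the criterion actually used.

```python
def interleave(data_items, reference_items, block_size):
    """Insert one reference item after every `block_size` data items."""
    stream = []
    ref_iter = iter(reference_items)
    for i, item in enumerate(data_items, start=1):
        stream.append(("data", item))
        if i % block_size == 0:
            ref = next(ref_iter, None)
            if ref is not None:
                stream.append(("ref", ref))
    return stream


def should_stop(worker_answers, gold_answers, min_agreement):
    """Return True when agreement on the reference items seen so far is too low.

    Agreement is the fraction of exact matches with the gold labels
    (a hypothetical criterion, used here only to show the control flow).
    """
    if not worker_answers:
        return False
    matches = sum(1 for w, g in zip(worker_answers, gold_answers) if w == g)
    return matches / len(worker_answers) < min_agreement


stream = interleave(["d1", "d2", "d3", "d4"], ["r1", "r2"], block_size=2)
stop = should_stop(["hap", "neu"], ["hap", "ang"], min_agreement=0.6)
```

Checking after each reference item is what allows a session to be ended early instead of discarding a worker's labels after the fact.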
17 Status of the MSP-PODCAST: Ongoing Work
- With emotion labels: 20,988 sentences (35 h 1 min)
- Segmented turns: 199,836 sentences from 1,019 podcasts
[Figure; axes: arousal, valence]
18 Status of the MSP-PODCAST: Ongoing Work
[Figure; axes: arousal, valence; labels: hot anger, cold anger, happiness, neutral]
- Natural recordings
- The largest database
- Multiple speakers
- Rich emotional content
19 MSP-PODCAST: Power of Data
- Effect of the amount of data on the performance of emotion recognition models
- Network: IS-2013 features (6,372 features, input nodes); hidden layers (N nodes); linear output layer
- Number of layers: [2, 3, 4, 5]
- Number of nodes: [128, 256, 512, 1,024]
- Data: add 1,000 sentences at a time
[Figure: example for arousal; concordance correlation coefficient (CCC); curves for two layers and five layers, 256 nodes per layer]
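For readers unfamiliar with the metric on this slide, the concordance correlation coefficient between predictions and reference scores can be computed as below. This uses the standard definition with population (divide-by-n) moments; the function name and the toy inputs are illustrative only.

```python
def ccc(predictions, targets):
    """Concordance correlation coefficient between two equal-length sequences.

    CCC = 2 * cov(p, t) / (var(p) + var(t) + (mean(p) - mean(t)) ** 2)
    The result is undefined (division by zero) if both sequences are
    constant and have the same mean.
    """
    n = len(predictions)
    mean_p = sum(predictions) / n
    mean_t = sum(targets) / n
    var_p = sum((p - mean_p) ** 2 for p in predictions) / n
    var_t = sum((t - mean_t) ** 2 for t in targets) / n
    cov = sum((p - mean_p) * (t - mean_t)
              for p, t in zip(predictions, targets)) / n
    return 2.0 * cov / (var_p + var_t + (mean_p - mean_t) ** 2)


perfect = ccc([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
shifted = ccc([1.0, 2.0, 3.0], [2.0, 3.0, 4.0])
```

Unlike Pearson correlation, CCC penalizes a constant offset: the shifted example is perfectly correlated with its target but scores below 1.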
20 OUTLINE
- Introduction
- MSP-PODCAST: The Largest Speech Emotional Database
- Case Study 1: Multi-Task Learning
- Case Study 2: Training with Soft Labels
- Conclusions
Srinivas Parthasarathy and Carlos Busso, "Jointly predicting arousal, valence and dominance with multi-task learning," in Interspeech 2017, Stockholm, Sweden, August
Nominated for Best Student Paper at Interspeech 2017!
21 Multi-Task Learning
- Prediction of arousal, valence and dominance
- Previous studies have considered these dimensions as orthogonal descriptors
- Interrelation between these emotional attributes
- Goal: predicting emotional attributes with a unified framework
- Multi-task learning (MTL) implemented with deep neural networks (DNN)
Srinivas Parthasarathy and Carlos Busso, "Jointly predicting arousal, valence and dominance with multi-task learning," in Interspeech 2017, Stockholm, Sweden, August
22 Multi-Task Emotion Recognition
- Leverage the relationship between attributes
  - Arousal (calm versus active)
  - Valence (negative versus positive)
  - Dominance (weak versus strong)
[Figure: network diagrams labeled MTL-1 and MTL-2]
23 Multi-Task Emotion Recognition
- Weights learned using the development set
- MTL: task weights α and β
- STL as special cases:
  - α = 1, β = 0: arousal
  - α = 0, β = 1: valence
  - α = 0, β = 0: dominance
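One way to read the special cases above is as a weighted sum of three per-attribute losses. The sketch below assumes each loss is 1 - CCC and that dominance receives the remaining weight 1 - α - β; both are assumptions that are consistent with the three STL settings on the slide but are not stated on it.

```python
def mtl_loss(ccc_arousal, ccc_valence, ccc_dominance, alpha, beta):
    """Weighted multi-task loss over arousal, valence and dominance.

    Assumptions (not taken from the slide): each per-attribute loss is
    1 - CCC, and dominance is weighted by 1 - alpha - beta.
    """
    gamma = 1.0 - alpha - beta
    return (alpha * (1.0 - ccc_arousal)
            + beta * (1.0 - ccc_valence)
            + gamma * (1.0 - ccc_dominance))


# Made-up CCC values, used only to show the single-task special cases.
arousal_only = mtl_loss(0.8, 0.3, 0.5, alpha=1.0, beta=0.0)
valence_only = mtl_loss(0.8, 0.3, 0.5, alpha=0.0, beta=1.0)
dominance_only = mtl_loss(0.8, 0.3, 0.5, alpha=0.0, beta=0.0)
```

Under this reading, any α and β strictly between the corner cases trains one network on all three attributes at once, with the development set used to pick the weights.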
24 Multi-Task Emotion Recognition
Within-corpus evaluation
- Multi-task learning (MTL) is always better than single-task learning (STL)
- Performance increases as we increase the number of nodes
- Training set: rest of the corpus (6,710 sentences)
- Validation set: 10 speakers (887 sentences)
- Testing set: 50 speakers (5,024 sentences)
[Table: concordance correlation coefficient for arousal, valence and dominance, by nodes / layers (256 / 2, 512 / 2, 1024 / 2) and type of task (STL, MTL, MTL); the values are missing from the transcription]
25 Multi-Task Emotion Recognition
Cross-corpus evaluation
- Performance drops with respect to within-corpus evaluations
- Benefit of multi-task increases - 14%
- Best performance with a lower number of nodes per layer
- Training set: IEMOCAP, MSP-IMPROV
- Validation set: 887 sentences
- Testing set: 50 speakers (5,024 sentences)
[Table: concordance correlation coefficient for arousal, valence and dominance, by nodes / layers (256 / 2, 512 / 2, 1024 / 2) and type of task (STL, MTL, MTL); the values are missing from the transcription]
26 OUTLINE
- Introduction
- MSP-PODCAST: The Largest Speech Emotional Database
- Case Study 1: Multi-Task Learning
- Case Study 2: Training with Soft Labels
- Conclusions
R. Lotfian and C. Busso. Formulating emotion perception as a probabilistic model with application to categorical emotion classification. In Affective Computing and Intelligent Interaction (ACII 2017), San Antonio, TX, USA, October 2017
27 Training with Soft Labels
- Emotional labels often come from perceptual evaluations by multiple evaluators
- Expressive behaviors tend to be ambiguous, with blended emotions
- Evaluators disagree on the perceived emotion: noise or information?
- Assigning a single emotion per sentence oversimplifies the subjectivity in emotion perception
- Goal: leverage the information provided by multiple evaluators
- Training emotion recognition with soft labels
[Figure labels: Happy, Sad, Angry]
28 Training with Soft Labels
Straightforward approach
- Use the distribution of emotions assigned by evaluators [Fayek et al., 2016]
  - Sentence 1: happiness, happiness, neutral, happiness -> neu = 0.25, hap = 0.75
  - Sentence 2: neutral, happiness, neutral, neutral -> neu = 0.75, hap = 0.25
- This approach ignores the relationship between emotional classes
- Prioritize separation of unrelated categories (e.g., anger versus happiness) over related emotions (anger versus disgust)
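The straightforward approach amounts to normalizing the evaluators' votes into a distribution over classes. A minimal sketch, using the two sentences from the slide; the function name and the short class codes are illustrative.

```python
from collections import Counter


def soft_label(votes, classes):
    """Turn a list of evaluator votes into a probability per class."""
    counts = Counter(votes)
    total = len(votes)
    return {c: counts[c] / total for c in classes}


classes = ["neu", "hap"]
sentence_1 = soft_label(["hap", "hap", "neu", "hap"], classes)
sentence_2 = soft_label(["neu", "hap", "neu", "neu"], classes)
```

Each vote counts equally and classes are treated as unrelated, which is exactly the limitation the slide points out.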
29 Emotion Perception as a Probabilistic Model
- Each speech segment has a non-observable multivariate Gaussian distribution
- The task is to derive the distribution for a speech segment
- Use the expected value of the distribution as a soft label
  - Sentence 1: happiness, happiness, neutral, happiness -> neu = 0.3, hap = 0.7
  - Sentence 2: neutral, happiness, neutral, neutral -> neu = 0.55, hap =
[Figure; axes: happiness, neutral; evaluations plotted as H and N]
30 Experimental Evaluations
- MSP-Podcast
  - Test set: data from 50 speakers (4,283 segments)
  - Development set: data from 10 speakers (1,860 segments)
  - Training set: rest of the corpus (7,289 segments)
- Seven-class problem: anger, sadness, happiness, surprise, disgust, contempt, and neutral (chance is 14%)
- Acoustic features: eGeMAPS set (88 dimensions) [Eyben et al., 2016]
- DNN: 2 hidden layers, 512 nodes, softmax layer
[Figure: eGeMAPS (88D) input, 2 hidden layers, softmax layer with outputs such as Hap, Neu, Sad]
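A softmax output layer can be trained against soft labels by using the label distribution, rather than a one-hot vector, as the target of a cross-entropy loss. The slide does not name the loss function, so cross-entropy is an assumption here, and the three-class example is a toy stand-in for the seven-class problem.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(logits, target_probs):
    """Cross-entropy between a soft target distribution and softmax(logits)."""
    probs = softmax(logits)
    return -sum(t * math.log(p)
                for t, p in zip(target_probs, probs) if t > 0.0)


# Toy example: classes (neu, hap, sad), target built from four votes.
uniform_loss = soft_cross_entropy([0.0, 0.0, 0.0], [0.25, 0.75, 0.0])
better_loss = soft_cross_entropy([0.0, 1.0, -2.0], [0.25, 0.75, 0.0])
```

With a one-hot target this reduces to the usual cross-entropy, so majority-vote training is a special case of the same loss.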
31 Results
- Performance metrics: average recall, average precision, and F1-score
- Human performance is only 39.6% (hard problem)
- Soft labels derived from the expected intensity of emotion (SL-EIE) improved performance over majority-vote labels and over the soft labels proposed in previous work
[Table: Rec [%], Pre [%] and F1-score for human performance, majority vote, soft-label [Fayek, 2016] and SL-EIE [proposed]; the values are missing from the transcription]
32 OUTLINE
- Introduction
- MSP-PODCAST: The Largest Speech Emotional Database
- Case Study 1: Multi-Task Learning
- Case Study 2: Training with Soft Labels
- Conclusions
33 Summary
- Important contributions to increase the robustness of emotion recognition systems:
  - Resource level
  - Model level using deep learning
- Are we there yet? Not yet, but soon
  - Temporal dynamic modeling
  - Understanding and modeling the impact of emotion on other tasks
  - Multimodal fusion using deep architectures
- Next step: use these models in real applications
34 Potential Impact
- Instrumental tools for health care
- Distance learning
- Security and defense (credibility assessment)
35 Multimodal Signal Processing (MSP)
CARLOS BUSSO
Tel: (972)
Web:
- NAJMEH SADOUGHI, Ph.D. Student, Virtual characters
- REZA LOTFIAN, Ph.D. Student, Affective computing
- SRINIVAS PARTHASARATHY, Ph.D. Student, Affective computing
- KUSHA MURTHY, Ph.D. Student, Affective computing
- FEI TAO, Ph.D. Student, Audiovisual ASR
- MOHAMMED ABD EL-WAHAB, Ph.D. Student, Affective computing
- SUMIT JHA, Ph.D. Student, In-vehicle safety systems
- MICHELLE BANCROFT, Undergraduate Student, Emotion and speaker verification
- DOROTHY MANTLE, Undergraduate Student, In-vehicle safety system
- ASIM GAZI, Undergraduate Student, Human robot interaction
IEEE TRANSACTIONS ON AFFECTIVE COMPUTING, VOL. 5, NO. 2, APRIL-JUNE 2014 201 Robust Unsupervised Arousal Rating: A Rule-Based Framework with Knowledge-Inspired Vocal Features Daniel Bone, Senior Member,
More informationALTHOUGH there is agreement that facial expressions
JOURNAL OF L A T E X CLASS FILES, VOL. 14, NO. 8, AUGUST 2015 1 Cross-Cultural and Cultural-Specific Production and Perception of Facial Expressions of Emotion in the Wild Ramprakash Srinivasan, Aleix
More informationBFI-Based Speaker Personality Perception Using Acoustic-Prosodic Features
BFI-Based Speaker Personality Perception Using Acoustic-Prosodic Features Chia-Jui Liu, Chung-Hsien Wu, Yu-Hsien Chiu* Department of Computer Science and Information Engineering, National Cheng Kung University,
More informationFudan University, China
Cyber Psychosocial and Physical (CPP) Computation Based on Social Neuromechanism -Joint research work by Fudan University and University of Novi Sad By Professor Weihui Dai Fudan University, China 1 Agenda
More informationAdvances in Intelligent Systems and 533: Book Title: Proceedings of the Inter.
JAIST Reposi https://dspace.j Title Optimizing Fuzzy Inference Systems f Speech Emotion Recognition Author(s)Elbarougy, Reda; Akagi, Masato Citation Advances in Intelligent Systems and 533: 85-95 Issue
More informationSpeech Enhancement Using Deep Neural Network
Speech Enhancement Using Deep Neural Network Pallavi D. Bhamre 1, Hemangi H. Kulkarni 2 1 Post-graduate Student, Department of Electronics and Telecommunication, R. H. Sapat College of Engineering, Management
More informationChapter 12 Conclusions and Outlook
Chapter 12 Conclusions and Outlook In this book research in clinical text mining from the early days in 1970 up to now (2017) has been compiled. This book provided information on paper based patient record
More informationOnline Speaker Adaptation of an Acoustic Model using Face Recognition
Online Speaker Adaptation of an Acoustic Model using Face Recognition Pavel Campr 1, Aleš Pražák 2, Josef V. Psutka 2, and Josef Psutka 2 1 Center for Machine Perception, Department of Cybernetics, Faculty
More informationEmote to Win: Affective Interactions with a Computer Game Agent
Emote to Win: Affective Interactions with a Computer Game Agent Jonghwa Kim, Nikolaus Bee, Johannes Wagner and Elisabeth André Multimedia Concepts and Application, Faculty for Applied Computer Science
More informationUSING AUDITORY SALIENCY TO UNDERSTAND COMPLEX AUDITORY SCENES
USING AUDITORY SALIENCY TO UNDERSTAND COMPLEX AUDITORY SCENES Varinthira Duangudom and David V Anderson School of Electrical and Computer Engineering, Georgia Institute of Technology Atlanta, GA 30332
More informationFace Analysis : Identity vs. Expressions
Hugo Mercier, 1,2 Patrice Dalle 1 Face Analysis : Identity vs. Expressions 1 IRIT - Université Paul Sabatier 118 Route de Narbonne, F-31062 Toulouse Cedex 9, France 2 Websourd Bâtiment A 99, route d'espagne
More informationMotion Control for Social Behaviours
Motion Control for Social Behaviours Aryel Beck a.beck@ntu.edu.sg Supervisor: Nadia Magnenat-Thalmann Collaborators: Zhang Zhijun, Rubha Shri Narayanan, Neetha Das 10-03-2015 INTRODUCTION In order for
More informationTelephone Based Automatic Voice Pathology Assessment.
Telephone Based Automatic Voice Pathology Assessment. Rosalyn Moran 1, R. B. Reilly 1, P.D. Lacy 2 1 Department of Electronic and Electrical Engineering, University College Dublin, Ireland 2 Royal Victoria
More informationR Jagdeesh Kanan* et al. International Journal of Pharmacy & Technology
ISSN: 0975-766X CODEN: IJPTFI Available Online through Research Article www.ijptonline.com FACIAL EMOTION RECOGNITION USING NEURAL NETWORK Kashyap Chiranjiv Devendra, Azad Singh Tomar, Pratigyna.N.Javali,
More informationKALAKA-3: a database for the recognition of spoken European languages on YouTube audios
KALAKA3: a database for the recognition of spoken European languages on YouTube audios Luis Javier RodríguezFuentes, Mikel Penagarikano, Amparo Varona, Mireia Diez, Germán Bordel Grupo de Trabajo en Tecnologías
More informationRecognition & Organization of Speech and Audio
Recognition & Organization of Speech and Audio Dan Ellis Electrical Engineering, Columbia University http://www.ee.columbia.edu/~dpwe/ Outline 1 2 3 4 5 Introducing Tandem modeling
More informationDesigning Human-like Video Game Synthetic Characters through Machine Consciousness
Designing Human-like Video Game Synthetic Characters through Machine Consciousness Raúl Arrabales, Agapito Ledezma and Araceli Sanchis Computer Science Department Carlos III University of Madrid http://conscious-robots.com/raul
More informationEdge Based Grid Super-Imposition for Crowd Emotion Recognition
Edge Based Grid Super-Imposition for Crowd Emotion Recognition Amol S Patwardhan 1 1Senior Researcher, VIT, University of Mumbai, 400037, India ---------------------------------------------------------------------***---------------------------------------------------------------------
More informationPerformance of Gaussian Mixture Models as a Classifier for Pathological Voice
PAGE 65 Performance of Gaussian Mixture Models as a Classifier for Pathological Voice Jianglin Wang, Cheolwoo Jo SASPL, School of Mechatronics Changwon ational University Changwon, Gyeongnam 64-773, Republic
More informationFACIAL EXPRESSION RECOGNITION FROM IMAGE SEQUENCES USING SELF-ORGANIZING MAPS
International Archives of Photogrammetry and Remote Sensing. Vol. XXXII, Part 5. Hakodate 1998 FACIAL EXPRESSION RECOGNITION FROM IMAGE SEQUENCES USING SELF-ORGANIZING MAPS Ayako KATOH*, Yasuhiro FUKUI**
More informationCPSC81 Final Paper: Facial Expression Recognition Using CNNs
CPSC81 Final Paper: Facial Expression Recognition Using CNNs Luis Ceballos Swarthmore College, 500 College Ave., Swarthmore, PA 19081 USA Sarah Wallace Swarthmore College, 500 College Ave., Swarthmore,
More informationAuditory Scene Analysis
1 Auditory Scene Analysis Albert S. Bregman Department of Psychology McGill University 1205 Docteur Penfield Avenue Montreal, QC Canada H3A 1B1 E-mail: bregman@hebb.psych.mcgill.ca To appear in N.J. Smelzer
More informationTemporal Context and the Recognition of Emotion from Facial Expression
Temporal Context and the Recognition of Emotion from Facial Expression Rana El Kaliouby 1, Peter Robinson 1, Simeon Keates 2 1 Computer Laboratory University of Cambridge Cambridge CB3 0FD, U.K. {rana.el-kaliouby,
More informationNoise-Robust Speech Recognition in a Car Environment Based on the Acoustic Features of Car Interior Noise
4 Special Issue Speech-Based Interfaces in Vehicles Research Report Noise-Robust Speech Recognition in a Car Environment Based on the Acoustic Features of Car Interior Noise Hiroyuki Hoshino Abstract This
More informationA HMM-based Pre-training Approach for Sequential Data
A HMM-based Pre-training Approach for Sequential Data Luca Pasa 1, Alberto Testolin 2, Alessandro Sperduti 1 1- Department of Mathematics 2- Department of Developmental Psychology and Socialisation University
More informationPrice list Tools for Research & Development
Price list Tools for Research & Development All prior price lists are no longer valid after this price list was issued. For information about the currently valid price list contact HörTech ggmbh. All prices
More informationModeling and Recognizing Emotions from Audio Signals: A Review
Modeling and Recognizing Emotions from Audio Signals: A Review 1 Ritu Tanwar, 2 Deepti Chaudhary 1 UG Scholar, 2 Assistant Professor, UIET, Kurukshetra University, Kurukshetra, Haryana, India ritu.tanwar2012@gmail.com,
More informationFacial Expression Classification Using Convolutional Neural Network and Support Vector Machine
Facial Expression Classification Using Convolutional Neural Network and Support Vector Machine Valfredo Pilla Jr, André Zanellato, Cristian Bortolini, Humberto R. Gamba and Gustavo Benvenutti Borba Graduate
More informationDialogue Scenario Collection of Persuasive Dialogue with Emotional Expressions via Crowdsourcing
Dialogue Scenario Collection of Persuasive Dialogue with Emotional Expressions via Crowdsourcing Koichiro Yoshino,, Yoko Ishikawa, Masahiro Mizukami, Yu Suzuki, Sakti Sakriani, and Satoshi Nakamura Graduate
More information