
EMOTION CLASSIFICATION: HOW DOES AN AUTOMATED SYSTEM COMPARE TO NAÏVE HUMAN CODERS?
Sefik Emre Eskimez, Kenneth Imade, Na Yang, Melissa Sturge-Apple, Zhiyao Duan, Wendi Heinzelman. University of Rochester, United States

Motivation
Emotions play a vital role in social interactions, and realistic human-computer interaction requires accurately determining the affective state of the user. How does an automated system compare to naïve human coders? Can automated systems replace naïve human coders in speech-based emotion classification applications?

Introduction
In this study, naïve human coders and an automated system are evaluated in terms of speech emotion classification performance. The results show that it is feasible to replace naïve human coders with automatic emotion classification systems. The naïve human coders' confidence level does not affect their classification accuracy, while the automated system achieves higher accuracy on the samples it classifies confidently.

Automatic Speech Emotion Classification System Overview

Feature Extraction
All features and their first-order derivatives, except speaking rate, are calculated over overlapping frames. Five statistics are then computed across all frames: min, max, mean, standard deviation, and range (max - min). Each frame-level feature track therefore contributes 10 values (5 statistics for the track and 5 for its derivative), as in the sketch below.

Feature name                                            #
Fundamental frequency (f0)                             10
Energy                                                 10
Frequency and bandwidth of the first four formants     80
12 Mel-frequency cepstral coefficients (MFCCs)        120
Zero-crossing rate                                     10
Roll-off                                               10
Brightness                                             10
Centroid                                               10
Spread                                                 10
Skewness                                               10
Kurtosis                                               10
Flatness                                               10
Entropy                                                10
Roughness                                              10
Irregularity                                           10
Speaking rate                                           1

Size of feature vector: 331
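As a concrete illustration of the 10-values-per-track scheme, here is a minimal Python sketch using librosa (an assumption; the slides do not name the extraction toolkit, and the file name is hypothetical). It covers only the 12 MFCC tracks, i.e. 120 of the 331 values:

```python
import librosa

def track_stats(track):
    # 5 statistics over all frames: min, max, mean, std, range
    return [track.min(), track.max(), track.mean(), track.std(),
            track.max() - track.min()]

def summarize(track):
    # 10 values per track: 5 statistics for the track itself,
    # 5 for its first-order derivative (delta)
    return track_stats(track) + track_stats(librosa.feature.delta(track))

# "utterance.wav" is a hypothetical file name, not from the slides
y, sr = librosa.load("utterance.wav", sr=None)
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12)   # shape (12, n_frames)
features = []
for coeff in mfcc:
    features.extend(summarize(coeff))                # 12 x 10 = 120 values
```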

Feature Selection
SVM Recursive Feature Elimination (SVM-RFE): train the SVM to obtain per-feature weights, eliminate the feature with the lowest weight, and repeat until no features remain. The features are then ranked in reverse order of elimination and the top N are kept. In our experiments, N = 80 (out of 331).
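scikit-learn's RFE implements the same loop; the sketch below is an assumption about tooling (the slides do not say what was used), with synthetic data standing in for the real feature vectors. SVM-RFE needs per-feature weights, hence a linear SVM for the selection step even though the classifiers themselves use an RBF kernel:

```python
import numpy as np
from sklearn.feature_selection import RFE
from sklearn.svm import LinearSVC

# Synthetic stand-ins for the real 331-dimensional feature vectors
X = np.random.randn(200, 331)
y = np.random.randint(0, 6, size=200)   # 6 emotion classes

# step=1 eliminates the single lowest-weight feature per round,
# matching the SVM-RFE procedure described above
selector = RFE(estimator=LinearSVC(), n_features_to_select=80, step=1)
selector.fit(X, y)
X_top80 = selector.transform(X)         # keep the N = 80 top-ranked features
```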

Automatic Emotion Classification
The system labels each sample with three labels, one from each of the following sub-systems:
6 emotion categories: anger, disgust, panic, happy, neutral, sadness.
3 arousal categories: active, passive, and neutral (APN).
3 valence categories: positive, negative, and neutral (PNN).

Automatic Emotion Classifiers
The system uses a binary Support Vector Machine (SVM) classifier with an RBF kernel for each category: 6 binary SVMs for the first sub-system and 3 each for the second and third sub-systems, for a total of 12 binary SVMs.
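A minimal sketch of this bank of classifiers, assuming scikit-learn (category names are from the slides; the training helper and data are illustrative):

```python
import numpy as np
from sklearn.svm import SVC

EMOTIONS = ["anger", "disgust", "panic", "happy", "neutral", "sadness"]
AROUSAL  = ["active", "passive", "neutral"]
VALENCE  = ["positive", "negative", "neutral"]

def train_binary_svms(X, labels, categories):
    # One binary RBF-kernel SVM per category: category vs. rest
    svms = {}
    for cat in categories:
        y = np.array([1 if lab == cat else 0 for lab in labels])
        svms[cat] = SVC(kernel="rbf").fit(X, y)
    return svms

# 6 + 3 + 3 = 12 binary SVMs in total, as on the slide:
# emotion_svms = train_binary_svms(X, emotion_labels, EMOTIONS)
# arousal_svms = train_binary_svms(X, arousal_labels, AROUSAL)
# valence_svms = train_binary_svms(X, valence_labels, VALENCE)
```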

Automatic Emotion Classification Threshold Fusion
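The fusion diagram itself is not in this transcript; the sketch below is a hypothetical reading consistent with the Conclusion (the system can reject low-confidence samples): each binary SVM emits a decision score, the top-scoring category wins, and the sample is rejected when the top score falls below a threshold.

```python
def fuse(svms, x, threshold=0.0):
    # svms: dict mapping category name -> trained binary SVC (see above)
    # x: one feature vector; threshold value is an assumption
    scores = {cat: clf.decision_function(x.reshape(1, -1))[0]
              for cat, clf in svms.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None   # None = "not sure"
```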

LDC Dataset
15 emotions; 8 speakers (4 actresses and 4 actors); 2433 utterances in total; acted speech.
In our experiments: 6 emotions (anger, disgust, panic, happy, neutral, and sadness); 7 speakers (4 actresses and 3 actors); 727 utterances.

Experimental Setup: Automatic Emotion Classification System
7-fold cross-validation: 6/7 of the data is used for training and 1/7 for testing. In each fold, the training and testing data are chosen at random, and the data are up-sampled to even out all classes. A Leave-One-Subject-Out (LOSO) test is also performed.
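A minimal sketch of this setup, assuming scikit-learn (the slides do not name the tooling); stratified folds with shuffling are one way to realize the random per-fold split described above, and the data here are synthetic placeholders:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.utils import resample

X = np.random.randn(140, 80)            # placeholder 80-dim feature vectors
y = np.random.randint(0, 6, size=140)   # placeholder labels, 6 classes

def upsample(X, y):
    # Up-sample every class to the size of the largest class
    classes, counts = np.unique(y, return_counts=True)
    n_max = counts.max()
    Xs, ys = [], []
    for c in classes:
        Xc = resample(X[y == c], replace=True, n_samples=n_max, random_state=0)
        Xs.append(Xc)
        ys.append(np.full(n_max, c))
    return np.vstack(Xs), np.concatenate(ys)

cv = StratifiedKFold(n_splits=7, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, y):
    X_tr, y_tr = upsample(X[train_idx], y[train_idx])  # train on 6/7, balanced
    # ... fit the 12 SVMs on (X_tr, y_tr), evaluate on X[test_idx]
```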

Experimental Setup: Amazon's Mechanical Turk
138 unique workers participated, each labeling 10-100 random samples. Only one example per emotion category is presented beforehand.

Experimental Setup: Amazon's Mechanical Turk
Turkers are asked to listen to, label, and transcribe each audio sample; they are also asked for demographic information.

Number of labeled instances by Turker age and gender:

Age      Male   Female   Total
18-29    2610   1300     3980
30-39     940    630     1570
40-49     550    620     1270
50-59     250    300      550
Total    4350   2850     7270

Accuracy (%) by Turker age and gender:

Age      Male   Female   Total
18-29    61.4   63.6     61.9
30-39    56.5   59.0     57.5
40-49    64.2   58.7     61.3
50-59    51.6   61.4     56.9
Total    60.1   61.2     60.4

Results: Turkers
Accuracy (%) by Turker confidence level, age, and gender:

         Male        Male        Female      Female      Total       Total
Age      (Confident) (Not Sure)  (Confident) (Not Sure)  (Confident) (Not Sure)
18-29    61.7        61.6        63.2        64.4        62.1        61.6
30-39    56.1        60.4        58.3        60.7        57.0        60.5
40-49    67.0        54.7        55.9        61.1        61.3        58.4
50-59    56.3        37.8        61.4        68.5        56.2        50.8
Total    60.8        57.9        60.4        62.9        60.6        59.6

Results: Computer System

Turkers vs. Computer System: Emotions
Accuracy (%):

                  All      Female   Male     Confident  Not Sure
                  Samples  Samples  Samples  (80%)      (20%)
Computer System   72.9     73.2     72.0     77.7       61.2
All Turkers       60.4     64.9     54.1     60.6       59.6
Female Turkers    61.2     64.4     57.1     60.4       62.9
Male Turkers      60.1     65.4     52.5     60.8       57.9

Turkers vs. Computer System: APN & PNN
Accuracy (%):

                        All      Female   Male     Confident  Not Sure
                        Samples  Samples  Samples  (80%)      (20%)
Computer System (APN)   89.3     86.8     92.4     94.4       73.1
All Turkers (APN)       70.5     71.5     69.0     71.0       67.9
Computer System (PNN)   82.9     82.9     82.4     88.0       62.0
All Turkers (PNN)       71.8     75.5     66.6     72.1       70.7

Conclusion
This study compares naïve human coders with an automatic emotion classification system. The automatic system achieves markedly better accuracy in almost all cases, and it can improve its accuracy further by rejecting samples with low confidence. Naïve human coders were not able to improve their accuracy by specifying their confidence in their classifications. The results show that it is feasible to replace naïve human coders with automatic emotion classification systems.

The End. Thank you!