Enhanced Autocorrelation in Real World Emotion Recognition

Sascha Meudt, Institute of Neural Information Processing, University of Ulm
Friedhelm Schwenker, Institute of Neural Information Processing, University of Ulm

ABSTRACT

Multimodal emotion recognition in real world environments is still a challenging task of affective computing research. Recognizing the affective or physiological state of an individual is difficult for humans as well as for computer systems, and thus finding suitable discriminative features is the most promising approach in multimodal emotion recognition. In the literature numerous features have been developed or adapted from related signal processing tasks. Still, classifying emotional states in real world scenarios is difficult, and the performance of automatic classifiers is rather limited. This is mainly due to the fact that emotional states cannot be distinguished by a well defined set of discriminating features. In this work we present the enhanced autocorrelation, a multi pitch detection feature, and compare its performance to well known, state-of-the-art features from signal and speech processing. The results of the evaluation show that the enhanced autocorrelation outperforms the other state-of-the-art features on the challenge data set. The complexity of this benchmark data set lies in between real world data sets showing naturalistic emotional utterances and the widely applied and well-understood acted emotional data sets.

Keywords: enhanced autocorrelation, audio features, emotion recognition, human computer interaction, affective computing

1. INTRODUCTION

The human affective state has attracted researchers in medical, psychological and cognitive science since the mid 1960s [8, 24]. Because of the dramatically increasing computational power of modern information technology, numerous technical devices such as smart phones, tablets, laptops, personal computers and many more have become more and more part of our daily life, and, even more important, the way we use these computing devices has totally changed: Human-computer interaction (HCI) in the 1980s and 1990s was dominated by keyboard and mouse interaction following a strict input-operation-output work flow. Recently, more and more options for multimodal interaction have been developed, e.g. control via speech or hand gestures, and in the future the analysis of the user's intentions or affective state will come into the focus of HCI research. Examples are the analysis of para-linguistic patterns or facial expressions, or, more generally, of social signals that humans implicitly produce as a by-product when interacting or communicating with a computing device. This change in HCI caused computer scientists to focus on the recognition of social signals, such as emotions, with the intention of using this information to alter subsequent actions in HCI scenarios.
Humans express their emotions in many different ways and modalities, which makes automatic emotion recognition a very challenging task in computer science. In this emerging field, called affective computing, computers are seen as intelligent companions offering context sensitive advice and support to the human user, and techniques based on machine learning and pattern recognition have come more and more into the focus of research. In the past, pattern recognition research in emotion recognition was done mostly on recognizing acted, usually over-expressed, well defined emotional categories, and on segmented data (isolated video clips, audio files) [33, 3]; typical datasets consisted of audio [2], video [16] or audio-video recordings [1]. The classification of such data is simple, and thus the achieved classification results are accurate [9, 26]. However, such acted emotions do not appear in everyday HCI situations, so the community shifted its scope more towards non-acted scenarios and benchmark data [2, 25] containing more realistic emotional appearances. Famous benchmark data sets are the data base introduced in the AVEC challenge [31], the EmoRec data set [36] and the Last-Minute data set [23]. Unfortunately, the ground truth is somewhat unclear and the data are very noisy in these datasets [15]. In case of the EmotiW challenge the ground truth is given by the storyboard of the movies the videos have been taken from. Hence, it is difficult even for humans to categorize the acted emotion without context. Recognition systems thus have to deal with unreliable data containing various levels of expressiveness. In order to overcome those problems, different approaches have been proposed [1, 19, 11]. The focus can, for example, be set on a single modality like video or speech signals, and classification systems can thus be tailored to the special characteristics of those modalities.

Advances in affective computing in recent years have come from better classifiers and fusion systems, but also from the adaptation and development of novel, discriminating features. The EmotiW challenge [6, 4] in 2014 [5] again offers a challenging bimodal benchmark data set. EmotiW 2014 and 2013 are data sets derived from cinema movies, and thus the complexity of the emotional utterances lies in between acted and realistic emotions. For this reason the EmotiW database is an important benchmark for developing new features for emotion recognition.

Speech signals are appealing for emotion recognition because they are simple to process and their analysis presents promising ways for future research [7, 27, 28, 29]. One of the main issues in designing an automatic emotion recognition system is the choice of features that can represent the corresponding emotions. In recent years, several different feature types proved to be useful in the context of emotion recognition from speech: Nicholson et al. [21] used feature extraction based on pitch and linear predictive coding (LPC). In earlier work, two feature types, comprising the modulation spectrum (ModSpec) and the relative spectral transform - perceptual linear prediction (RASTA-PLP), were used to recognize seven different emotions with an accuracy of approximately 79% on the Berlin database of emotional speech (EMO-DB [2]) [28, 22, 3, 32]. Other feature choices include the Mel Frequency Cepstral Coefficients (MFCC) as used in [17]. Developing powerful features which discriminate speech well with respect to the underlying emotional categories is still a hot topic in current research.

In this paper we present the usage of the enhanced autocorrelation (EAC) feature in the field of emotion recognition. The feature was originally developed by Tolonen and Karjalainen [34] as a multi pitch detector for music classification; to the best of our knowledge this feature has so far never been used in emotion estimation. We compare the performance of the EAC feature to a set of commonly applied features in emotion recognition from speech. The results show that EAC yields comparable and slightly better results on the AFEW 4.0 dataset. Like many other features, the EAC computation is based on an analysis of the audio signal in the frequency domain, but it results in a much higher output dimension, which gives the classifier a better chance to discriminate the emotional classes. In the following we give a short overview of the commonly used features, followed by a detailed introduction of the EAC feature. Afterwards we compare the features based on the challenge dataset and present the results of our approach on the test dataset. A final conclusion and discussion is given in the closing section of the paper.

2. FEATURES

In the following four subsections we give a brief introduction to commonly used features in emotion recognition from speech. This is followed by a more detailed description of the EAC method. To the best of our knowledge, EAC is used in the field of emotion recognition for the first time. Figure 1 gives an overview of the dimensions and shape of the utilized features. All features have in common that the audio signal is divided into overlapping windows; subsequently a Hamming window function

w(i) = 0.54 + 0.46 cos(2 pi i / M),  i = -M/2, ..., M/2 - 1,

with window size M and sample index i is applied to prevent edge effects.
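As a concrete illustration of this windowing step, the following sketch (our own illustration, not code from the paper) cuts a mono signal into overlapping frames and applies the centred Hamming window given above; NumPy and an already loaded signal array are assumed, and the 16 kHz sampling rate in the usage comment is only an example.

```python
import numpy as np

def frame_signal(signal, win_size, hop):
    """Split a mono signal into overlapping frames and apply the centred
    Hamming window w(i) = 0.54 + 0.46*cos(2*pi*i/M), i = -M/2 .. M/2-1."""
    i = np.arange(-(win_size // 2), win_size // 2)
    window = 0.54 + 0.46 * np.cos(2.0 * np.pi * i / win_size)
    starts = range(0, len(signal) - win_size + 1, hop)
    return np.stack([signal[s:s + win_size] * window for s in starts])

# Example: 40 msec windows with a 20 msec hop at an assumed 16 kHz sampling rate
# frames = frame_signal(audio, win_size=640, hop=320)
```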
Finally, in subsection 2.6 we propose a feature fusion method to reduce the number of instances to one per film sequence.

Figure 1: Shape comparison of all features on one avi file of class angry. From top to bottom: Mel Frequency Cepstral Coefficients, Modulation Spectrum, Linear Predictive Coding, Relative Spectral Perceptual Linear Predictive, Enhanced Autocorrelation, audio signal in frequency domain and audio signal in time domain.

2.1 Mel Frequency Cepstral Coefficients

The MFCC representation of a signal is motivated by the way humans perceive audio in the ear. A filter bank with linearly spaced filters in the lower frequencies and logarithmically scaled filters in the upper frequencies, called the MEL filter bank, is used during feature extraction. MFCC are commonly used short-term features because of their compact ability to represent the important information in speech processing applications while retaining most of the phonetically significant acoustics [18]. On each window a short-term fast Fourier transformation (FFT) is computed, and on the result the MEL filter bank of triangular filters is applied. The discrete cosine transformation (DCT) on each filter output yields the de-correlated cepstral coefficients of each window. A typical filter bank order ranges from 8 to 24. In this work we extracted the coefficients on 40 msec windows with an overlap of 20 msec and a filter bank order of 20.
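A minimal sketch of the MFCC pipeline just described (FFT, MEL filter bank, log, DCT), assuming the librosa library is available; the parameter values mirror the 40 msec / 20 msec / order 20 setup above, but the function is our own illustration rather than the extraction code used in the paper.

```python
import librosa

def mfcc_features(path, n_mfcc=20, win_sec=0.040, hop_sec=0.020):
    """Per-window MFCCs: FFT -> triangular MEL filter bank -> log -> DCT."""
    y, sr = librosa.load(path, sr=None, mono=True)
    return librosa.feature.mfcc(
        y=y, sr=sr, n_mfcc=n_mfcc,
        n_fft=int(win_sec * sr), hop_length=int(hop_sec * sr),
        window="hamming",
    ).T  # shape: (number of windows, n_mfcc)
```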

Figure 2: A window out of the audio signal (top) and the corresponding EAC analysis (bottom) of a film sequence. In the middle the zero-clipped autocorrelation signal and its time-doubled curve are shown.

2.2 Modulation Spectrum

To reduce the influence of environmental noise, Hermansky [12] introduced the modulation spectrum of speech to capture the temporal dynamics of a speech segment. To obtain the long-term temporal modulations in the range from 2 Hz to 16 Hz, window sizes of 200 msec up to several seconds are applied. Due to the short audio segments of the EmotiW challenge we fixed the window size at 200 msec. As for MFCC, first an FFT is computed on overlapping subwindows and the MEL filter bank is applied as well (we used 8 filters in this work). Instead of applying the DCT on each subwindow, the responses of all subwindows are aggregated and an FFT over the energy response per filter is calculated, yielding the final feature vector. Strong rates of change of the vocal tract, which hold important linguistic information, are represented in the corresponding dimensions of the feature.

2.3 Linear Predictive Coding

Instead of analysing the signal with FFT-based approaches, in LPC the speech sample s(t) is approximated by a linear combination of the past p samples [14]:

s(t) ~ a_1 s(t-1) + ... + a_p s(t-p),   (1)

where the coefficients a_1, ..., a_p are assumed to be constant within a short signal segment. This results in a p-dimensional vector corresponding to a curve fitted around the peaks of the short-term log magnitude spectrum of the signal. As with the previous features the information is compressed, here while avoiding a transformation from the time to the frequency domain. In this work we use p = 12, extracted from 40 msec windows with 20 msec overlap.

Figure 3: Audio signal and the corresponding EAC analysis of a film sequence. A falling multi pitch in the very beginning, a low-frequency single pitch from window 12 to 16 and a very noisy and wide pitch in the end can be observed.

2.4 Relative Spectral Perceptual Linear Predictive Coding

The perceptually and biologically motivated concepts of critical bands and the equal loudness curve build the basis of perceptual linear prediction (PLP) [13]. The sound pressure (dB) that is required to perceive a sound of any frequency as loud as a reference sound of 1 kHz is approximated by

E(w) = ((w^2 + 56.8 * 10^6) w^4) / ((w^2 + 6.3 * 10^6)^2 (w^2 + 0.38 * 10^9)).   (2)

As a result of this equation, frequencies below 1 kHz need higher sound pressure levels than the reference sound, while sounds between 2 and 5 kHz need less pressure. As for MFCC, a critical band filtering is done (usually with about 21 filters), but in contrast to MFCC the filtering is linear on the Bark scale and the filter shape is not triangular. PLP is sensitive to spectral changes caused by transmission channels, e.g. by different microphones or telephone voice compression algorithms. Therefore, after band filtering, Hermansky suggested equal loudness conversion by transforming the spectrum to the logarithmic domain, applying a relative spectral (Rasta) filtering, and using an exponential re-transformation. Finally, LPC coefficients are calculated similarly to subsection 2.3 over the critical band energies. In this work features with a filter order of 21, a window length of 25 msec and an offset of 10 msec have been applied.
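To make the linear prediction step of subsection 2.3 (which also closes the RASTA-PLP chain) concrete, here is a small sketch that estimates the coefficients a_1, ..., a_p of Eq. (1) by solving the autocorrelation (Yule-Walker) normal equations. It assumes NumPy and a non-silent, already windowed frame, and is our own illustration rather than the authors' implementation.

```python
import numpy as np

def lpc_coefficients(frame, p=12):
    """LPC via the autocorrelation method: solve R a = r, where
    R[i, j] = r(|i - j|) and r(k) is the frame autocorrelation at lag k."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    a = np.linalg.solve(R, r[1:p + 1])
    return a  # s(t) is approximated by a[0]*s(t-1) + ... + a[p-1]*s(t-p)
```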
2.5 Enhanced Auto Correlation

The EAC feature was originally introduced by Tolonen and Karjalainen [34] as a multi pitch analysis method for various fields of audio signal processing. In our work we use a slightly modified version of their extraction process in order to extract the main harmonics of the vocal cords and the glottal source of the speech signal. Different emotional states can cause varying muscle tension in the vocal tract, which influences the produced sound signal. Extracting the multi pitch harmonics based on the periodicity of the signal could therefore be a reliable feature containing a lot of emotional information independent of the underlying spoken content. In addition, the EAC feature is a very high-dimensional feature in the field of audio processing, which might help a classifier to distinguish between the distributions of the emotional classes.
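The extraction steps detailed in the remainder of this section (the generalized autocorrelation of Eq. (3), zero clipping, and subtraction of the time-doubled curve) can be sketched per window as follows. This is our own reading of the procedure, not the authors' code; it assumes NumPy, a frame that has already been Hamming-windowed as described at the beginning of Section 2, and the parameter values reported below (k = 1/3, 1024-sample windows yielding 512 EAC dimensions).

```python
import numpy as np

def eac_frame(frame, k=1.0 / 3.0):
    """Enhanced autocorrelation of one Hamming-windowed frame."""
    m = len(frame)
    spec = np.abs(np.fft.fft(frame)) ** k            # magnitude compression (cf. Eq. 3)
    acf = np.real(np.fft.ifft(spec))                 # generalized autocorrelation
    clipped = np.clip(acf[:m // 2], 0.0, None)       # keep positive half, clip at zero
    lags = np.arange(m // 2)
    doubled = np.interp(lags / 2.0, lags, clipped)   # curve stretched by two in time
    return np.clip(clipped - doubled, 0.0, None)     # e.g. 512 values for a 1024-sample frame
```

Using linear interpolation for the time-doubled curve is only one possible discrete realization of the time stretching; the description in the text leaves the exact interpolation open.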

Feature type | instance wise (voting - late fusion) | window average (early fusion) | window variance (early fusion)
MFCC     | 24.7 (2.44) | 25.6 (1.56) | 22.8 (2.6)
ModSpec  | 28.4 (1.71) | 28.0 (3.86) | 25.1 (1.95)
LPC      | 21.9 (0.8)  | 20.6 (2.18) | 20.6 (3.23)
RastaPLP | 22.8 (1.61) | 19.9 (1.36) | 29.1 (1.6)
EAC      | 29.2 (1.13) | 30.7 (3.82) | 30.7 (2.8)

Table 1: Results for emotion recognition with the utilized features, based on a 10-fold cross-validation on the union of the train and validation set (accuracy in percent, standard deviation in brackets).

As in most other feature extraction procedures, we first divide the signal into overlapping Hamming windows. For each window containing the input signal S the autocorrelation function

acf(S) = Re( IFFT( |FFT(Hamming(S))|^k ) )   (3)

is applied in order to detect periodic signal parts. The parameter k allows control of the periodicity detection via a non-linear processing in the frequency domain. Peaks in the autocorrelation curve are an indicator of pitch periods. Normally, a lot of redundant information and noise is part of this curve. To improve the reliability of the detection, the autocorrelation result is clipped at zero, i.e. all negative values are set to zero. In order to remove the peaks at multiples of the fundamental period, which are caused by the autocorrelation function, the time-doubled signal of the autocorrelation function is subtracted from the autocorrelation signal, and again all negative values are set to zero, yielding the final EAC feature vector.

Figure 2 shows an audio signal window after applying the Hamming function (upper part). In the middle plot, the red curve shows the zero-clipped autocorrelation curve of the window and the green curve its time-doubled version. Finally, the bottom plot displays the EAC curve, showing a wide single peak at 31. Keeping in mind that this procedure is repeated for every sliding window of the audio signal (Figure 3, top), the EAC result can be drawn as in Figure 3 (bottom), where the x-axis displays the time, the y-axis denotes the dimensions of the feature, and the color corresponds to the value at a given point (from blue = zero to red = one). One can see a falling multi pitch in the very beginning, a low-frequency single pitch from window 12 to 16 and a very noisy and wide pitch in the end. In the following, the parameters of the EAC were set to a window size of 1024 samples (47 msec) with an overlap of 512 samples, resulting in 512 EAC dimensions per window. The parameter k was set to 1/3, a choice based on empirical experience.

2.6 Per film feature representation

All features described above are derived from short time windows of a signal, resulting in a large number (200-300) of feature vectors per utterance or, in case of the challenge, per film sequence. Keeping in mind that a typical EmotiW challenge subset contains about 400 sequences, the total number of instances presented to the classifier is above 100,000. In order to reduce the number of training instances, we applied two different early fusion techniques which reduce the number of instances to one per film sequence. First we compute the average vector x_mu = (1/T) sum_{t=1..T} x_t of all t = 1, ..., T feature vectors per sequence. Computing the average loses the variability within a sequence; in order to reduce this disadvantage we also compute the variance x_sigma2 = (1/T) sum_{t=1..T} (x_t - x_mu)^2 of a vector sequence. In the following we build separate classifiers on the windowed sequences, the sequence averages and the sequence variances.

3. RESULTS

The evaluation is divided into two parts.
First, the results of the feature comparison, ignoring the underlying challenge protocol, are presented. Second, we chose the most promising features, namely EAC and ModSpec, as well as a fusion of all features, to participate in the challenge. The other feature types were evaluated on the test set without taking them into account in the challenge.

3.1 Feature comparison

To compare the performance of the features we rearranged the train and validation sets by combining them into a single dataset containing 962 film sequences. We extracted all features presented in section 2 in the windowed sequence, sequence average and sequence variance options and applied a Support Vector Machine (SVM) with Gaussian kernel to each feature type. Each SVM was optimized with respect to its specific parameters C and gamma based on the unweighted accuracy measure. A 10-fold cross-validation was applied. Table 1 gives an overview of the achieved results. For the single window based features as well as for the sequence average and variance features, the EAC outperforms all other features. The most commonly used MFCC feature performs only moderately on the EmotiW dataset, which is not very surprising given our results from last year's challenge participation [20]. The ModSpec feature performs second best on window instances and on sequence average vectors, and third best on the variance of the window based vectors.

3.2 Challenge Results

Based on the previous results we decided to use the ModSpec and EAC features in all three (full, average and variance) options for the participation in the challenge. The six SVMs were trained and optimized on the train and validation set according to the challenge guidelines. Table 2 gives an overview of the results and confirms the dominance of the EAC feature family. Finally, a fusion architecture was applied by summing the probabilistic outputs of the SVM classifiers, which form an ensemble of six members. The fusion architecture yields an overall accuracy of 40.1%, which is slightly higher than each single feature. As in the feature comparison results, the EAC features are again better than the ModSpec feature.
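For illustration, the per-feature classification setup of subsection 3.1 and the probabilistic sum fusion of subsection 3.2 could look roughly as follows with scikit-learn; the grid values and function names are our own assumptions, not the configuration actually used in the paper.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.svm import SVC

def train_feature_svm(X, y):
    """One Gaussian-kernel SVM per feature type; C and gamma are tuned by
    grid search with 10-fold cross-validation (grid values are illustrative)."""
    grid = {"C": [1.0, 10.0, 100.0], "gamma": [1e-4, 1e-3, 1e-2]}
    search = GridSearchCV(SVC(kernel="rbf", probability=True), grid,
                          cv=StratifiedKFold(n_splits=10))
    return search.fit(X, y).best_estimator_

def sum_fusion(classifiers, feature_sets):
    """Ensemble decision by summing the class probabilities of all members."""
    probs = sum(clf.predict_proba(X) for clf, X in zip(classifiers, feature_sets))
    return np.argmax(probs, axis=1)  # indices into the shared class list
```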

Table 2: Results comparison for emotion recognition, trained on the train set according to the challenge guidelines (classification accuracy in percent on the test set), listing the MFCC, ModSpec, LPC, RastaPLP and EAC features in the instance wise, window average and window variance options. Fusion of all features: 40.1; challenge baseline: 26.2.

Table 3: Confusion matrix of the average EAC feature results on the test set for the classes angry, disgust, fear, happy, neutral, sad and surprise (accuracy in percent, confusion matrix in absolute numbers).

Table 3 shows the confusion matrix, which reveals a strong imbalance of the classification accuracies across the different classes. Neutral and angry could be recognized very well, while disgust and surprise could not be recognized at all. Compared to last year's results, where we also observed an imbalance of the classification accuracy towards the classes angry, neutral and happy, this imbalance is not very surprising. The poor results for disgust, sadness and surprise may result from the smaller amount of training data for these classes and their lower a priori probability in the test data. In the context of the challenge this result is somewhat unsatisfying, whereas in the scope of affective computing, where neutral and angry are very important classes, it is a promising result.

4. CONCLUSIONS

Following our work on feature selection last year, we focused more on the development and usage of new or alternative features for speech based emotion recognition on datasets which are recorded under mostly realistic and not strictly controlled conditions, such as the AFEW 4.0 dataset. In this work we presented the EAC as a new feature and compared it to several state-of-the-art features. All common features were outperformed in all cases. The best result on the challenge dataset (40% accuracy) was achieved by using a fusion of all features together and was slightly better than the best single feature (EAC average with 37% accuracy). In our opinion it is very important to search for new features which can discriminate speech with respect to emotions better than the existing ones, which have mostly been developed for speech recognition, speaker identification or music classification. Based on better features, the underlying classification problem could become much easier, which would then result in higher classification accuracy based on single frame or short-term signal analysis. It may also be worthwhile to analyse features with respect to their discrimination ability using correlation dimension analyses [35]. Improving these basic input results could then dramatically improve the results obtained after a well designed multimodal fusion architecture.

5. REFERENCES

[1] T. Bänziger, H. Pirker, and K. Scherer. GEMEP - Geneva multimodal emotion portrayals: A corpus for the study of multimodal emotional expressions. In Proceedings of LREC, volume 6, pages 15-19, 2006.
[2] F. Burkhardt, A. Paeschke, M. Rolfes, W. F. Sendlmeier, and B. Weiss. A database of German emotional speech. In Interspeech, volume 5, 2005.
[3] F. Dellaert, T. Polzin, and A. Waibel. Recognizing emotion in speech. In Proceedings of the Fourth International Conference on Spoken Language Processing (ICSLP 96), volume 3. IEEE, 1996.
[4] A. Dhall, R. Goecke, J. Joshi, M. Wagner, and T. Gedeon. Emotion recognition in the wild challenge. ACM ICMI, 2013.
[5] A. Dhall, R. Goecke, J. Joshi, M. Wagner, and T. Gedeon. Emotion recognition in the wild challenge 2014: Baseline, data and protocol. ACM ICMI, 2014.
[6] A. Dhall, R. Goecke, S. Lucey, and T. Gedeon. Collecting large, richly annotated facial-expression databases from movies. IEEE Multimedia, 19:34-41, 2012.
[7] N. Fragopanagos and J. Taylor. Emotion recognition in human-computer interaction. Neural Networks, 18:389-405, 2005.
[8] N. H. Frijda. Recognition of emotion. Advances in Experimental Social Psychology, 4, 1969.
[9] M. Glodek, M. Schels, G. Palm, and F. Schwenker. Multi-modal fusion based on classifiers using reject options and Markov fusion networks. In Proceedings of the International Conference on Pattern Recognition (ICPR). IEEE, 2012.
[10] M. Glodek, S. Scherer, F. Schwenker, and G. Palm. Conditioned hidden Markov model fusion for multimodal classification. In INTERSPEECH, 2011.
[11] M. Glodek, S. Tschechne, G. Layher, M. Schels, T. Brosch, S. Scherer, M. Kächele, M. Schmidt, H. Neumann, G. Palm, et al. Multiple classifier systems for the classification of audio-visual emotional states. In Affective Computing and Intelligent Interaction. Springer, 2011.
[12] H. Hermansky. The modulation spectrum in the automatic recognition of speech. In Proceedings of the 1997 IEEE Workshop on Automatic Speech Recognition and Understanding. IEEE, 1997.
[13] H. Hermansky, N. Morgan, A. Bayya, and P. Kohn. RASTA-PLP speech analysis technique. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, volume 1. IEEE, 1992.
[14] F. Itakura. Line spectrum representation of linear predictor coefficients of speech signals. The Journal of the Acoustical Society of America, 57(S1):S35, 1975.
[15] M. Kächele, M. Glodek, D. Zharkov, S. Meudt, and F. Schwenker. Fusion of audio-visual features using hierarchical classifier systems for the recognition of affective states and the state of depression. In Proceedings of ICPRAM, 2014.
[16] T. Kanade, J. F. Cohn, and Y. Tian. Comprehensive database for facial expression analysis. In Proceedings of the Fourth IEEE International Conference on Automatic Face and Gesture Recognition. IEEE, 2000.
[17] C. M. Lee, S. Yildirim, M. Bulut, A. Kazemzadeh, C. Busso, Z. Deng, S. Lee, and S. S. Narayanan. Emotion recognition based on phoneme classes. In Proceedings of ICSLP 2004, 2004.
[18] B. Logan et al. Mel frequency cepstral coefficients for music modeling. In ISMIR, 2000.
[19] S. Meudt and F. Schwenker. On instance selection in audio based emotion recognition. In Artificial Neural Networks in Pattern Recognition. Springer, 2012.
[20] S. Meudt, D. Zharkov, M. Kächele, and F. Schwenker. Multi classifier systems and forward backward feature selection algorithms to classify emotional coloured speech. In Proceedings of the 15th ACM International Conference on Multimodal Interaction. ACM, 2013.
[21] J. Nicholson, K. Takahashi, and R. Nakatsu. Emotion recognition in speech using neural networks. Neural Computing and Applications, 9:290-296, 2000.
[22] G. Palm and F. Schwenker. Sensor-fusion in neural networks. In E. Shahbazian, G. Rogova, and M. J. DeWeert, editors, Harbour Protection Through Data Fusion Technologies. Springer, 2009.
[23] D. Rösner, J. Frommer, R. Friesen, M. Haase, J. Lange, and M. Otto. LAST MINUTE: a multimodal corpus of speech-based user-companion interactions. In LREC, 2012.
[24] S. Schachter. The interaction of cognitive and physiological determinants of emotional state. Advances in Experimental Social Psychology, 1:49-80, 1964.
[25] M. Schels, M. Glodek, S. Meudt, S. Scherer, M. Schmidt, G. Layher, S. Tschechne, T. Brosch, D. Hrabal, S. Walter, G. Palm, H. Neumann, H. Traue, and F. Schwenker. Multi-modal classifier-fusion for the recognition of emotions. In M. Rojc and N. Campbell, editors, Coverbal Synchrony in Human-Machine Interaction, chapter 4. CRC Press, 2013.
[26] M. Schels, M. Kächele, M. Glodek, D. Hrabal, S. Walter, and F. Schwenker. Using unlabeled data to improve classification of emotional states in human computer interaction. Journal on Multimodal User Interfaces, 8(1):5-16, 2014.
[27] K. R. Scherer, T. Johnstone, and G. Klasmeyer. Vocal expression of emotion. In Handbook of Affective Sciences, chapter 23. Oxford University Press, 2003.
[28] S. Scherer, M. Oubbati, F. Schwenker, and G. Palm. Real-time emotion recognition from speech using echo state networks. In Artificial Neural Networks in Pattern Recognition. Springer Berlin Heidelberg, 2008.
[29] S. Scherer, F. Schwenker, and G. Palm. Emotion recognition from speech using multi-classifier systems and RBF-ensembles. In Speech, Audio, Image and Biomedical Signal Processing using Neural Networks. Springer Berlin Heidelberg, 2008.
[30] S. Scherer, F. Schwenker, and G. Palm. Classifier fusion for emotion recognition from speech. In W. Minker, M. Weber, H. Hagras, V. Callagan, and A. D. Kameas, editors, Advanced Intelligent Environments. Springer, 2009.
[31] B. Schuller, M. Valstar, F. Eyben, G. McKeown, R. Cowie, and M. Pantic. AVEC 2011 - the first international audio/visual emotion challenge. In Affective Computing and Intelligent Interaction. Springer, 2011.
[32] F. Schwenker, S. Scherer, M. Schmidt, M. Schels, and M. Glodek. Multiple classifier systems for the recognition of human emotions. In N. E. Gayar, J. Kittler, and F. Roli, editors, Proceedings of the 9th International Workshop on Multiple Classifier Systems (MCS 2010), LNCS 5997. Springer, 2010.
[33] Y.-L. Tian, T. Kanade, and J. F. Cohn. Facial expression analysis. In Handbook of Face Recognition. Springer, 2005.
[34] T. Tolonen and M. Karjalainen. A computationally efficient multipitch analysis model. IEEE Transactions on Speech and Audio Processing, 8(6):708-716, 2000.
[35] C. Traina, A. Traina, L. Wu, and C. Faloutsos. Fast feature selection using fractal dimension. 2000.
[36] S. Walter, S. Scherer, M. Schels, M. Glodek, D. Hrabal, M. Schmidt, R. Böck, K. Limbrecht, H. C. Traue, and F. Schwenker. Multimodal emotion classification in naturalistic user behavior. In Human-Computer Interaction. Towards Mobile and Intelligent Interaction Environments. Springer, 2011.


More information

Vital Responder: Real-time Health Monitoring of First- Responders

Vital Responder: Real-time Health Monitoring of First- Responders Vital Responder: Real-time Health Monitoring of First- Responders Ye Can 1,2 Advisors: Miguel Tavares Coimbra 2, Vijayakumar Bhagavatula 1 1 Department of Electrical & Computer Engineering, Carnegie Mellon

More information

Modeling and Recognizing Emotions from Audio Signals: A Review

Modeling and Recognizing Emotions from Audio Signals: A Review Modeling and Recognizing Emotions from Audio Signals: A Review 1 Ritu Tanwar, 2 Deepti Chaudhary 1 UG Scholar, 2 Assistant Professor, UIET, Kurukshetra University, Kurukshetra, Haryana, India ritu.tanwar2012@gmail.com,

More information

TESTS OF ROBUSTNESS OF GMM SPEAKER VERIFICATION IN VoIP TELEPHONY

TESTS OF ROBUSTNESS OF GMM SPEAKER VERIFICATION IN VoIP TELEPHONY ARCHIVES OF ACOUSTICS 32, 4 (Supplement), 187 192 (2007) TESTS OF ROBUSTNESS OF GMM SPEAKER VERIFICATION IN VoIP TELEPHONY Piotr STARONIEWICZ Wrocław University of Technology Institute of Telecommunications,

More information

Exploiting visual information for NAM recognition

Exploiting visual information for NAM recognition Exploiting visual information for NAM recognition Panikos Heracleous, Denis Beautemps, Viet-Anh Tran, Hélène Loevenbruck, Gérard Bailly To cite this version: Panikos Heracleous, Denis Beautemps, Viet-Anh

More information

Hearing Lectures. Acoustics of Speech and Hearing. Auditory Lighthouse. Facts about Timbre. Analysis of Complex Sounds

Hearing Lectures. Acoustics of Speech and Hearing. Auditory Lighthouse. Facts about Timbre. Analysis of Complex Sounds Hearing Lectures Acoustics of Speech and Hearing Week 2-10 Hearing 3: Auditory Filtering 1. Loudness of sinusoids mainly (see Web tutorial for more) 2. Pitch of sinusoids mainly (see Web tutorial for more)

More information

ACOUSTIC AND PERCEPTUAL PROPERTIES OF ENGLISH FRICATIVES

ACOUSTIC AND PERCEPTUAL PROPERTIES OF ENGLISH FRICATIVES ISCA Archive ACOUSTIC AND PERCEPTUAL PROPERTIES OF ENGLISH FRICATIVES Allard Jongman 1, Yue Wang 2, and Joan Sereno 1 1 Linguistics Department, University of Kansas, Lawrence, KS 66045 U.S.A. 2 Department

More information

Discrete Signal Processing

Discrete Signal Processing 1 Discrete Signal Processing C.M. Liu Perceptual Lab, College of Computer Science National Chiao-Tung University http://www.cs.nctu.edu.tw/~cmliu/courses/dsp/ ( Office: EC538 (03)5731877 cmliu@cs.nctu.edu.tw

More information

Comparison of selected off-the-shelf solutions for emotion recognition based on facial expressions

Comparison of selected off-the-shelf solutions for emotion recognition based on facial expressions Comparison of selected off-the-shelf solutions for emotion recognition based on facial expressions Grzegorz Brodny, Agata Kołakowska, Agnieszka Landowska, Mariusz Szwoch, Wioleta Szwoch, Michał R. Wróbel

More information

Linguistic Phonetics Fall 2005

Linguistic Phonetics Fall 2005 MIT OpenCourseWare http://ocw.mit.edu 24.963 Linguistic Phonetics Fall 2005 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 24.963 Linguistic Phonetics

More information

! Can hear whistle? ! Where are we on course map? ! What we did in lab last week. ! Psychoacoustics

! Can hear whistle? ! Where are we on course map? ! What we did in lab last week. ! Psychoacoustics 2/14/18 Can hear whistle? Lecture 5 Psychoacoustics Based on slides 2009--2018 DeHon, Koditschek Additional Material 2014 Farmer 1 2 There are sounds we cannot hear Depends on frequency Where are we on

More information

An active unpleasantness control system for indoor noise based on auditory masking

An active unpleasantness control system for indoor noise based on auditory masking An active unpleasantness control system for indoor noise based on auditory masking Daisuke Ikefuji, Masato Nakayama, Takanabu Nishiura and Yoich Yamashita Graduate School of Information Science and Engineering,

More information

Multimodal emotion recognition from expressive faces, body gestures and speech

Multimodal emotion recognition from expressive faces, body gestures and speech Multimodal emotion recognition from expressive faces, body gestures and speech Ginevra Castellano 1, Loic Kessous 2, and George Caridakis 3 1 InfoMus Lab, DIST - University of Genova Viale Causa 13, I-16145,

More information

A Review on Dysarthria speech disorder

A Review on Dysarthria speech disorder A Review on Dysarthria speech disorder Miss. Yogita S. Mahadik, Prof. S. U. Deoghare, 1 Student, Dept. of ENTC, Pimpri Chinchwad College of Engg Pune, Maharashtra, India 2 Professor, Dept. of ENTC, Pimpri

More information

Audio-visual Classification and Fusion of Spontaneous Affective Data in Likelihood Space

Audio-visual Classification and Fusion of Spontaneous Affective Data in Likelihood Space 2010 International Conference on Pattern Recognition Audio-visual Classification and Fusion of Spontaneous Affective Data in Likelihood Space Mihalis A. Nicolaou, Hatice Gunes and Maja Pantic, Department

More information

Lecture 3: Perception

Lecture 3: Perception ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 3: Perception 1. Ear Physiology 2. Auditory Psychophysics 3. Pitch Perception 4. Music Perception Dan Ellis Dept. Electrical Engineering, Columbia University

More information

Smart Multifunctional Digital Content Ecosystem Using Emotion Analysis of Voice

Smart Multifunctional Digital Content Ecosystem Using Emotion Analysis of Voice International Conference on Computer Systems and Technologies - CompSysTech 17 Smart Multifunctional Digital Content Ecosystem Using Emotion Analysis of Voice Alexander I. Iliev, Peter Stanchev Abstract:

More information

MSAS: An M-mental health care System for Automatic Stress detection

MSAS: An M-mental health care System for Automatic Stress detection Quarterly of Clinical Psychology Studies Allameh Tabataba i University Vol. 7, No. 28, Fall 2017, Pp 87-94 MSAS: An M-mental health care System for Automatic Stress detection Saeid Pourroostaei Ardakani*

More information

Hearing the Universal Language: Music and Cochlear Implants

Hearing the Universal Language: Music and Cochlear Implants Hearing the Universal Language: Music and Cochlear Implants Professor Hugh McDermott Deputy Director (Research) The Bionics Institute of Australia, Professorial Fellow The University of Melbourne Overview?

More information

Error Detection based on neural signals

Error Detection based on neural signals Error Detection based on neural signals Nir Even- Chen and Igor Berman, Electrical Engineering, Stanford Introduction Brain computer interface (BCI) is a direct communication pathway between the brain

More information