Enhanced Autocorrelation in Real World Emotion Recognition
Sascha Meudt, Institute of Neural Information Processing, University of Ulm
Friedhelm Schwenker, Institute of Neural Information Processing, University of Ulm

ABSTRACT
Multimodal emotion recognition in real world environments is still a challenging task of affective computing research. Recognizing the affective or physiological state of an individual is difficult for humans as well as for computer systems, and thus finding suitable discriminative features is the most promising approach in multimodal emotion recognition. In the literature numerous features have been developed or adapted from related signal processing tasks. Still, classifying emotional states in real world scenarios is difficult and the performance of automatic classifiers is rather limited, mainly because emotional states cannot be distinguished by a well defined set of discriminating features. In this work we present the enhanced autocorrelation feature, a multi-pitch detection feature, and compare its performance to features that are well known and state of the art in signal and speech processing. Results of the evaluation show that the enhanced autocorrelation outperforms the other state-of-the-art features on the challenge data set. The complexity of this benchmark data set lies in between real world data sets showing naturalistic emotional utterances and the widely applied and well-understood acted emotional data sets.

Keywords: enhanced autocorrelation, audio features, emotion recognition, human computer interaction, affective computing

1. INTRODUCTION
Human affective state has attracted researchers in medical, psychological and cognitive science since the mid 1960s [8, 24].
Because of the dramatically increasing computational power of modern information technology, numerous technical devices such as smart phones, tablets, laptops, personal computers and many more have become part of our daily life, and, even more importantly, the way we are using these computing devices has totally changed: Human-computer interaction (HCI) in the 1980s and 1990s was dominated by keyboard and mouse interaction following a strict input-operation-output work flow. Recently, more and more options for multimodal interaction have been developed, e.g. control via speech or hand gestures, and in the future the analysis of the user's intentions or affective state will come into the focus of HCI research. Examples are the analysis of para-linguistic patterns or facial expressions, or, more generally, social signals of humans implicitly produced as a by-product when interacting or communicating with a computing device. This change in HCI caused computer scientists to focus on the recognition of social signals, such as emotions, with the intention to use this information to alter subsequent actions in HCI scenarios. Humans express their emotions in many different ways and modalities, which makes automatic emotion recognition a very challenging task in computer science.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org. ICMI '14, November 12-16, 2014, Istanbul, Turkey. Copyright 2014 ACM.
In this emerging field, called affective computing, computers are seen as intelligent companions offering context sensitive advice and support to the human user, and techniques based on machine learning and pattern recognition came more and more into the focus of research. In the past, pattern recognition research in emotion recognition has mostly been done on recognizing acted, usually over-expressed, well defined emotional categories, and on segmented data (isolated video clips, audio files) [33, 3]; typical datasets consisted of audio [2], video [16] or audio-video recordings [1]. The classification of such data is simple, and thus the achieved classification results are accurate [9, 26]. However, such acted emotions do not appear in everyday HCI situations, thus the community shifted its scope more towards non-acted scenarios and benchmark data [2, 25] containing more realistic emotional appearances. Famous benchmark data sets are the database introduced in the AVEC challenge [31], the EmoRec data set [36] and the Last-Minute data set [23]. Unfortunately, the ground truth is somewhat unclear and the data are very noisy in these datasets [15]. In case of the EmotiW challenge the ground truth is given by the storyboard of the movies the videos had been taken from. Hence, it is difficult even for humans to categorize the acted emotion without context. Recognition systems thus have to deal with unreliable data containing various levels of expressiveness. In order to overcome those problems, different approaches have been proposed [1, 19, 11]. Focus can, for example, be set on a single modality like video or speech signals, and classification systems can thus be tailored to the special characteristics of those modalities. Advances in affective computing in recent years have come from better classifiers and fusion systems, but also from the adaptation and development of novel, discriminating features.

The EmotiW challenge [6, 4], in 2014 [5], again offers a challenging bimodal benchmark data set. EmotiW 2014 and 2013 are data sets derived from cinema movies and thus the complexity of the emotional utterances lies in between acted and realistic emotions. For this reason the EmotiW database is an important benchmark for developing new features for emotion recognition. Speech signals are appealing for emotion recognition because they are simple to process and their analysis presents promising ways for future research [7, 27, 28, 29]. One of the main issues in designing an automatic emotion recognition system is the choice of the features that can represent the corresponding emotions. In recent years, several different feature types have proved to be useful in the context of emotion recognition from speech: Nicholson et al. [21] used feature extraction based on pitch and linear predictive coding (LPC). In earlier work, two feature types, comprising modulation spectrum (ModSpec) and relative spectral transform - perceptual linear prediction (RASTA-PLP), were used to recognize seven different emotions with an accuracy of approximately 79% on the Berlin database of emotional speech (EMO-DB [2]) [28, 22, 3, 32]. Other feature choices include the Mel Frequency Cepstral Coefficients (MFCC) as used in [17]. Developing powerful features which discriminate speech well with respect to its underlying emotional categories is still a hot topic in current research. In this paper we present the usage of the enhanced autocorrelation (EAC) feature in the field of emotion recognition. The feature was originally developed by Tolonen and Karjalainen [34] as a multi-pitch detector for music classification; to the best of our knowledge this feature has so far never been used in emotion estimation.
In this paper we compare the performance of the EAC feature to a set of commonly applied features in emotion recognition from speech. The results show that EAC yields comparable and slightly better results on the AFEW 4.0 dataset. As with many other features, the EAC computation is based on an analysis of the audio signal in the frequency domain; it results in a much higher output dimension, which gives the classifier a better chance to discriminate the emotional classes. In the following we give a short overview of the used common features, followed by a detailed introduction of the EAC feature. Afterwards we compare the features based on the challenge dataset and present the results of our approach on the test dataset. A final conclusion and discussion is given in the closing section of the paper.

2. FEATURES
In the following four subsections we give a brief introduction to commonly used features in emotion recognition from speech. This is followed by a more detailed description of the EAC method. To the best of our knowledge, EAC is used in the field of emotion recognition for the first time. Figure 1 gives an overview of the dimensions and shape of the utilized features. All features have in common that the audio signal is divided into overlapping windows; subsequently a Hamming window function

    w(i) = 0.54 + 0.46 cos(2πi/M),   i = -M/2, ..., M/2 - 1,

with window size M and sample index i is applied to prevent edge effects. Finally, in subsection 2.6 we propose a feature fusion method to reduce the number of instances to one per film sequence.

Figure 1: Shape comparison of all features on one file of class angry. From top to bottom: Mel Frequency Cepstral Coefficients, Modulation Spectrum, Linear Predictive Coding, Relative Spectral Perceptual Linear Predictive, Enhanced Autocorrelation, audio signal in frequency domain and audio signal in time domain.
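The windowing common to all of these features can be sketched as follows (a minimal NumPy sketch; the sampling rate, window size and hop are illustrative values, not the paper's exact configuration):

```python
import numpy as np

def frame_signal(signal, win_size, hop):
    """Split a 1-D signal into overlapping frames and apply a Hamming window."""
    n_frames = 1 + (len(signal) - win_size) // hop
    window = np.hamming(win_size)  # tapers the frame edges to prevent edge effects
    frames = np.stack([signal[i * hop : i * hop + win_size] for i in range(n_frames)])
    return frames * window

# Example: 1 second of a 440 Hz tone at 16 kHz, 40 ms windows, 20 ms hop.
sr = 16000
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)
frames = frame_signal(x, win_size=640, hop=320)
print(frames.shape)  # (49, 640)
```

All feature extractors below then operate on one such windowed frame at a time.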
2.1 Mel Frequency Cepstral Coefficients
The MFCC representation of a signal is motivated by the way humans perceive audio modulation in the ear. A filter bank with linearly spaced filters in the lower frequencies and logarithmically scaled filters in the upper frequencies, called the MEL filter bank, is used during feature extraction. MFCC are commonly used features in short term feature cases due to their ability to compactly represent the important information in speech processing applications while retaining most of the phonetically significant acoustics [18]. On each window a short-term fast Fourier transformation (FFT) is computed, and on the result the MEL filter bank of triangular filters is applied. The discrete cosine transformation (DCT) of the filter outputs yields the de-correlated cepstral coefficients of each window. A typical filter order is in the range from 8 to 24. In this work we extracted the coefficients on 40 msec windows with an overlap of 20 msec and a filter bank order of 20.
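A step-by-step sketch of this pipeline in plain NumPy (the filter count, cepstrum order and test frame are illustrative, and refinements such as liftering are omitted):

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced linearly on the mel scale."""
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fb[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising slope
        fb[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling slope
    return fb

def mfcc_frame(frame, sr, n_filters=20, n_ceps=13):
    """MFCC of one windowed frame: |FFT| -> mel filterbank -> log -> DCT-II."""
    n_fft = len(frame)
    spec = np.abs(np.fft.rfft(frame))
    energies = np.log(mel_filterbank(n_filters, n_fft, sr) @ spec + 1e-10)
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return dct @ energies  # de-correlated cepstral coefficients

frame = np.hamming(640) * np.sin(2 * np.pi * 440 * np.arange(640) / 16000)
print(mfcc_frame(frame, sr=16000).shape)  # (13,)
```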
Figure 2: A window of the audio signal (top) and the corresponding EAC analysis (bottom) of one film sequence. In the middle, the zero-clipped autocorrelation signal and its time-doubled curve are shown.

2.2 Modulation Spectrum
To reduce the influence of environmental noise, Hermansky [12] introduced the Modulation Spectrum feature of speech to obtain the temporal dynamics of a speech segment. To capture the long term temporal modulations in the range from 2 Hz to 16 Hz, window sizes of 200 msec up to several seconds are applied. Due to the short audio segments of the EmotiW challenge we fixed 200 msec as the used window size. As for MFCC, first an FFT is computed on overlapping subwindows, and the MEL filter bank is applied as well (we used 8 filters in this work). Instead of applying the DCT on each subwindow, the responses of all subwindows are aggregated and an FFT over the energy response per filter is calculated, yielding the final feature vector. Strong rates of change of the vocal tract, holding important linguistic information, are represented in the corresponding dimensions of the feature.

2.3 Linear Predictive Coding
Instead of analysing the signal by FFT based approaches, in LPC the speech sample s(t) is approximated by a linear combination of the past p samples [14]:

    s(t) ≈ a_1 s(t-1) + ... + a_p s(t-p),   (1)

where the coefficients a_1, ..., a_p are assumed to be constant within a short signal segment. This results in a p-dimensional vector corresponding to a curve fit around the peaks of the short term log magnitude spectrum of the signal. As in the previous features the information is compressed, while avoiding a transformation from time to frequency domain. In this work we use p = 12, extracted from 40 msec windows with 20 msec overlap.
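The coefficients of Eq. (1) can be estimated, for example, with the standard autocorrelation method; a minimal sketch (the order, frame and the small diagonal loading term are illustrative choices, not necessarily the paper's exact procedure):

```python
import numpy as np

def lpc_coefficients(frame, p=12):
    """LPC by the autocorrelation method: solve the Toeplitz normal
    equations R a = r for the predictor coefficients a_1, ..., a_p.
    A small diagonal loading term is added for numerical stability."""
    r = np.array([np.dot(frame[: len(frame) - k], frame[k:]) for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    R += 1e-6 * r[0] * np.eye(p)
    return np.linalg.solve(R, r[1:])

# A windowed sinusoid is almost perfectly predictable from its past samples.
t = np.arange(640)
frame = np.hamming(640) * np.sin(2 * np.pi * 440 * t / 16000)
a = lpc_coefficients(frame, p=12)
print(a.shape)  # (12,)
```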
2.4 Relative Spectral Perceptual Linear Predictive Coding
The perceptual and biological motivations of critical bands and the equal loudness curve build the basis of the perceptual linear prediction (PLP) [13]. The sound pressure (dB) that is required to perceive a sound of any frequency as loud as a reference sound of 1 kHz is approximated by:

    E(ω) = ((ω^2 + 56.8·10^6) ω^4) / ((ω^2 + 6.3·10^6)^2 (ω^2 + 0.38·10^9))   (2)

As a result of this equation, frequencies below 1 kHz need higher sound pressure levels than the reference sound, while sounds between 2 and 5 kHz need less pressure. As for MFCC, a critical band filtering is done (usually with about 21 filters), but in contrast to MFCC the filtering is linear in the Bark scale and the filter shape is not triangular. PLP is sensitive towards spectral changes caused by transmission channels, e.g. by different microphones or telephone voice compression algorithms. Therefore, after band filtering, Hermansky suggested equal loudness conversion by transforming the spectrum to the logarithmic domain, applying relative spectral (RASTA) filter processing and using exponential re-transformation. Finally, LPC coefficients are calculated similarly to subsection 2.3 over the critical band energies. In this work, features with a filter order of 21, a window length of 25 msec and an offset of 10 msec have been applied.

Figure 3: Audio signal and the corresponding EAC analysis of one film sequence. A falling multi pitch in the very beginning, a low frequency single pitch from window 120 to 160 and a very noisy and wide pitch in the end can be observed.

2.5 Enhanced Auto Correlation
The EAC feature was originally introduced by Tolonen and Karjalainen [34] for multi-pitch analysis in various different fields of audio signal processing. In our work we use a slightly modified version of their extraction process in order to extract the main harmonics of the vocal cords and the glottal source of the speech signal.
Different emotional states can cause a variety of muscle tensions in the vocal tract which influence the produced sound signal. Extracting the multi-pitch harmonics based on the periodicity of the signal could therefore yield a reliable feature containing a lot of emotional information independent of the underlying spoken content. In addition, the EAC feature is a very high dimensional feature in the field of audio processing, which might help a classifier to distinguish between the distributions of the emotional classes.
    Feature type | instance wise (Voting - Late Fusion) | window average (Early Fusion) | window variance (Early Fusion)
    MFCC         | 24.7 (2.44)                          | 25.6 (1.56)                   | 22.8 (2.6)
    ModSpec      | 28.4 (1.71)                          | 28.0 (3.86)                   | 25.1 (1.95)
    LPC          | 21.9 (0.8)                           | 20.6 (2.18)                   | 20.6 (3.23)
    RastaPLP     | 22.8 (1.61)                          | 19.9 (1.36)                   | 29.1 (1.6)
    EAC          | 29.2 (1.13)                          | 30.7 (3.82)                   | 30.7 (2.8)

Table 1: Results for emotion recognition on the utilized features based on a 10-fold cross-validation on the unification of train and validation set (accuracy in percent, standard deviation in brackets).

As in most other feature extraction procedures, we first divide the signal into overlapping Hamming windows. For each window containing the input signal S the autocorrelation function

    acf(S) = Re(IFFT(|FFT(Hamming(S))|^k))   (3)

is applied in order to detect periodic signal parts. The parameter k allows control of the periodicity detection via a non-linear processing step in the frequency domain. Peaks in the autocorrelation curve are an indicator for pitch periods. Normally, a lot of redundant information and noise is part of this curve. To improve the reliability of the detection, the autocorrelation result is clipped at zero, i.e. all negative values are set to zero. In order to remove the multiples of the fundamental periods which are caused by the autocorrelation function, the time-doubled signal of the autocorrelation function is subtracted from the autocorrelation signal, and again all negative values are set to zero, yielding the final EAC feature vector. Figure 2 shows an audio signal window multiplied with the Hamming function (upper part). In the middle plot the red curve shows the autocorrelation curve of the window clipped at zero, and the time-doubled curve is shown in green. Finally, in the bottom line the EAC curve is displayed, showing a wide single peak at 31.
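Under these definitions, the extraction can be sketched as follows (a minimal sketch; the time doubling is realized here by linear interpolation, which is one possible implementation, and the test tone is illustrative):

```python
import numpy as np

def eac(frame, k=1.0 / 3.0):
    """Sketch of the enhanced autocorrelation of one frame: generalized
    autocorrelation (Eq. 3), zero clipping, and subtraction of the
    time-doubled curve."""
    windowed = np.hamming(len(frame)) * frame
    # Eq. 3: real part of the IFFT of the magnitude spectrum raised to k.
    acf = np.real(np.fft.ifft(np.abs(np.fft.fft(windowed)) ** k))
    acf = np.clip(acf[: len(frame) // 2], 0.0, None)  # keep half, clip at zero
    # Time-doubled curve: the clipped ACF stretched by a factor of two
    # (linear interpolation is one possible realization of the stretching).
    doubled = np.interp(np.arange(len(acf)) / 2.0, np.arange(len(acf)), acf)
    return np.clip(acf - doubled, 0.0, None)

# A 440 Hz tone sampled at 16 kHz has a pitch period of about 36 samples,
# so the EAC curve should peak near lag 36.
frame = np.sin(2 * np.pi * 440 * np.arange(1024) / 16000)
feature = eac(frame)
print(feature.shape)  # (512,)
```

The subtraction removes the spurious peak at twice the fundamental period while the peak at the true pitch lag survives.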
Keeping in mind that this procedure is repeated for every sliding window of the audio signal (Figure 3, top), the EAC result can be drawn as in Figure 3, bottom, where the x-axis displays the time, the y-axis denotes the dimensions of the feature and the color corresponds to the value at a given point (from blue = zero to red = one). One can see a falling multi pitch in the very beginning, a low frequency single pitch from window 120 to 160 and a very noisy and wide pitch in the end. In the following, the parameters of the EAC were set to a window size of 1024 sample values (47 msec) with an overlap of 512 samples, resulting in 512 EAC dimensions per window. The parameter k was set to 1/3, chosen empirically.

2.6 Per film feature representation
All features described above are derived from short time windows of a signal, resulting in a large number (200-300) of feature vectors per utterance or, in case of the challenge, per film sequence. Keeping in mind that a typical EmotiW challenge subset contains about 400 sequences, the total number of instances presented to the classifier is above 100,000. In order to reduce the number of training instances we applied two different early fusion techniques which reduce the number of instances per film sequence to one. First we compute the average vector x_μ = (1/T) Σ_{t=1}^{T} x_t of all t = 1, ..., T feature vectors of a sequence. Computing the average loses the variety within a sequence; to reduce this disadvantage we secondly took the variance x_{σ²} = (1/T) Σ_{t=1}^{T} (x_t - x_μ)² of a vector sequence. In the following we build separate classifiers on the windowed sequences, the sequence averages and the variances.

3. RESULTS
The evaluation is divided in two parts. First, the results of the feature comparison, ignoring the underlying challenge protocol, are presented. Second, we chose the most promising features, namely EAC and ModSpec, and a fusion of all features to participate in the challenge.
The other feature types are evaluated on the test set without taking them into account in the challenge.

3.1 Feature comparison
To compare the performance of the features we rearranged the train and validation set by combining them into a single dataset containing 962 film sequences. We extracted all features presented in section 2 in the windowed sequence, sequence average and variance option and applied a Support Vector Machine (SVM) with Gaussian kernel for each feature type. Each SVM was optimized with respect to its specific parameters C and γ based on the unweighted accuracy measure. A 10-fold cross validation was applied. Table 1 gives an overview of the achieved results. For single frame based as well as sequence average and variance features the EAC outperforms all other features. The most commonly used MFCC feature performs only moderately on the EmotiW dataset, which is not very surprising given our results of last year's challenge participation [20]. On window instance based and sequence average vectors the ModSpec feature performs second best, and third best on the variance of window based vectors.

3.2 Challenge Results
Based on the previous results we decided to use the ModSpec and EAC features in all three (full, average and variance) options for the participation in the challenge. The six SVMs were trained and optimized on the train and validation set according to the challenge guidelines. Table 2 gives an overview of the results and endorses the dominance of the EAC feature family. Finally, a fusion architecture was applied by summing the probabilistic outputs of the SVM classifiers, which build an ensemble of six members. The fusion architecture yields an overall accuracy of 40.1%, which is slightly higher than each single feature. As in the feature comparison results, again the EAC features are better than the ModSpec features.
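The sum rule used to fuse the six probabilistic SVM outputs can be sketched as follows (the probability values below are invented for illustration, not results from the paper):

```python
import numpy as np

def sum_rule_fusion(probabilities):
    """Fuse an ensemble by summing the per-class probabilistic outputs of
    the member classifiers and picking the class with the largest total."""
    total = np.sum(probabilities, axis=0)      # (n_classes,)
    return int(np.argmax(total)), total / total.sum()

# Six hypothetical classifiers, seven emotion classes (angry .. surprise):
probs = np.array([
    [0.4, 0.1, 0.1, 0.1, 0.2, 0.05, 0.05],
    [0.3, 0.1, 0.1, 0.2, 0.2, 0.05, 0.05],
    [0.2, 0.1, 0.1, 0.1, 0.4, 0.05, 0.05],
    [0.5, 0.1, 0.1, 0.1, 0.1, 0.05, 0.05],
    [0.3, 0.1, 0.1, 0.1, 0.3, 0.05, 0.05],
    [0.2, 0.1, 0.1, 0.3, 0.2, 0.05, 0.05],
])
label, fused = sum_rule_fusion(probs)
print(label)  # 0 (angry)
```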
    Feature type  | instance wise (Voting - Late Fusion) | window average (Early Fusion) | window variance (Early Fusion)
    MFCC          |                                      |                               |
    ModSpec       |                                      |                               |
    LPC           |                                      |                               |
    RastaPLP      |                                      |                               |
    EAC           |                                      |                               |
    Fusion of all | 40.1
    Baseline      | 26.2

Table 2: Results comparison for emotion recognition, trained on the train set according to the challenge guidelines. Classification accuracy in percent.

Table 3: Confusion matrix of the average EAC feature results on the test set (accuracy in percent, confusion matrix in absolute numbers). Classes: Angry, Disgust, Fear, Happy, Neutral, Sad, Surprise.

Table 3 shows the confusion matrix, which reveals a strong imbalance of the classification accuracies across the different classes. Neutral and angry could be recognized very well, while disgust and surprise could not be recognized at all. Compared to last year's results, where we also had an imbalance of the classification accuracy towards the classes angry, neutral and happiness, this imbalance is not very surprising. The poor results for disgust, sadness and surprise may result from the smaller amount of training data and the lower a priori probability of these classes in the test data. In terms of the challenge this result is a bit unsatisfying; however, in the scope of affective computing, where neutral and angry are very important classes, it is a very promising result.

4. CONCLUSIONS
After our work in the field of feature selection last year, we focused more on the development and usage of new or alternative features for speech based emotion recognition on datasets which are recorded under mostly realistic and not strictly controlled environments, like the AFEW 4.0 dataset. In this work we presented EAC as a new feature and compared it to several state-of-the-art features. All common features were outperformed in all cases.
The best result on the challenge dataset (40.1% accuracy) was achieved by using a fusion combination of all features together and was slightly better than the best single feature (EAC average with 37% accuracy). In our opinion it is very important to search for new features which can discriminate speech with respect to emotions better than the existing ones, which had mostly been developed for speech recognition, speaker identification or music classification. Based on better features, the underlying classification problem could become much easier, which would then result in higher classification accuracy based on single frame or short term signal analyses. It may also be suitable to analyse features with respect to their discrimination ability by using correlation dimension analyses [35]. Improving the basic input could then finally improve the results of a well designed multi modal fusion architecture dramatically.

5. REFERENCES
[1] T. Bänziger, H. Pirker, and K. Scherer. Gemep - geneva multimodal emotion portrayals: A corpus for the study of multimodal emotional expressions. In Proceedings of LREC, volume 6, pages 15-19, 2006.
[2] F. Burkhardt, A. Paeschke, M. Rolfes, W. F. Sendlmeier, and B. Weiss. A database of german emotional speech. In Interspeech, volume 5, 2005.
[3] F. Dellaert, T. Polzin, and A. Waibel. Recognizing emotion in speech. In Spoken Language, ICSLP 96. Proceedings., Fourth International Conference on, volume 3. IEEE, 1996.
[4] A. Dhall, R. Goecke, J. Joshi, M. Wagner, and T. Gedeon. Emotion recognition in the wild challenge. ACM ICMI, 2013.
[5] A. Dhall, R. Goecke, J. Joshi, M. Wagner, and T. Gedeon. Emotion recognition in the wild challenge 2014: Baseline, data and protocol. ACM ICMI, 2014.
[6] A. Dhall, R. Goecke, S. Lucey, and T. Gedeon. Collecting large, richly annotated facial-expression databases from movies. IEEE Multimedia, 19:34-41, 2012.
[7] N. Fragopanagos and J. Taylor. Emotion recognition in human-computer interaction. Neural Networks, 18:389-405, 2005.
[8] N. H. Frijda. Recognition of emotion. Advances in Experimental Social Psychology, 4, 1969.
[9] M. Glodek, M. Schels, G. Palm, and F. Schwenker. Multi-modal fusion based on classifiers using reject
options and Markov fusion networks. In Proceedings of the International Conference on Pattern Recognition (ICPR). IEEE, 2012.
[10] M. Glodek, S. Scherer, F. Schwenker, and G. Palm. Conditioned hidden markov model fusion for multimodal classification. In INTERSPEECH, 2011.
[11] M. Glodek, S. Tschechne, G. Layher, M. Schels, T. Brosch, S. Scherer, M. Kächele, M. Schmidt, H. Neumann, G. Palm, et al. Multiple classifier systems for the classification of audio-visual emotional states. In Affective Computing and Intelligent Interaction. Springer, 2011.
[12] H. Hermansky. The modulation spectrum in the automatic recognition of speech. In Automatic Speech Recognition and Understanding, Proceedings., 1997 IEEE Workshop on. IEEE, 1997.
[13] H. Hermansky, N. Morgan, A. Bayya, and P. Kohn. Rasta-plp speech analysis technique. In Acoustics, Speech, and Signal Processing, IEEE International Conference on, volume 1. IEEE, 1992.
[14] F. Itakura. Line spectrum representation of linear predictor coefficients of speech signals. The Journal of the Acoustical Society of America, 57(S1):S35, 1975.
[15] M. Kächele, M. Glodek, D. Zharkov, S. Meudt, and F. Schwenker. Fusion of audio-visual features using hierarchical classifier systems for the recognition of affective states and the state of depression. In Proc. of ICPRAM, 2014.
[16] T. Kanade, J. F. Cohn, and Y. Tian. Comprehensive database for facial expression analysis. In Automatic Face and Gesture Recognition, 2000. Proceedings. Fourth IEEE International Conference on. IEEE, 2000.
[17] C. M. Lee, S. Yildirim, M. Bulut, A. Kazemzadeh, C. Busso, Z. Deng, S. Lee, and S. S. Narayanan. Emotion recognition based on phoneme classes. In Proceedings of ICSLP 04, 2004.
[18] B. Logan et al. Mel frequency cepstral coefficients for music modeling. In ISMIR, 2000.
[19] S. Meudt and F. Schwenker. On instance selection in audio based emotion recognition. In Artificial Neural Networks in Pattern Recognition. Springer, 2012.
[20] S.
Meudt, D. Zharkov, M. Kächele, and F. Schwenker. Multi classifier systems and forward backward feature selection algorithms to classify emotional coloured speech. In Proceedings of the 15th ACM International Conference on Multimodal Interaction. ACM, 2013.
[21] J. Nicholson, K. Takahashi, and R. Nakatsu. Emotion recognition in speech using neural networks. Neural Computing and Applications, 9:290-296, 2000.
[22] G. Palm and F. Schwenker. Sensor-fusion in neural networks. In E. Shahbazian, G. Rogova, and M. J. DeWeert, editors, Harbour Protection Through Data Fusion Technologies. Springer, 2009.
[23] D. Rösner, J. Frommer, R. Friesen, M. Haase, J. Lange, and M. Otto. Last minute: a multimodal corpus of speech-based user-companion interactions. In LREC, 2012.
[24] S. Schachter. The interaction of cognitive and physiological determinants of emotional state. Advances in Experimental Social Psychology, 1:49-80, 1964.
[25] M. Schels, M. Glodek, S. Meudt, S. Scherer, M. Schmidt, G. Layher, S. Tschechne, T. Brosch, D. Hrabal, S. Walter, G. Palm, H. Neumann, H. Traue, and F. Schwenker. Multi-modal classifier-fusion for the recognition of emotions. In M. Rojc and N. Campbell, editors, Coverbal Synchrony in Human-Machine Interaction, chapter 4. CRC Press, 2013.
[26] M. Schels, M. Kächele, M. Glodek, D. Hrabal, S. Walter, and F. Schwenker. Using unlabeled data to improve classification of emotional states in human computer interaction. Journal on Multimodal User Interfaces, 8(1):5-16, 2014.
[27] K. R. Scherer, T. Johnstone, and G. Klasmeyer. Handbook of Affective Sciences - Vocal expression of emotion, chapter 23. Affective Science. Oxford University Press, 2003.
[28] S. Scherer, M. Oubbati, F. Schwenker, and G. Palm. Real-time emotion recognition from speech using echo state networks. In Artificial Neural Networks in Pattern Recognition. Springer Berlin Heidelberg, 2008.
[29] S. Scherer, F. Schwenker, and G. Palm.
Emotion recognition from speech using multi-classifier systems and rbf-ensembles. In Speech, Audio, Image and Biomedical Signal Processing using Neural Networks. Springer Berlin Heidelberg, 2008.
[30] S. Scherer, F. Schwenker, and G. Palm. Classifier fusion for emotion recognition from speech. In W. Minker, M. Weber, H. Hagras, V. Callagan, and A. D. Kameas, editors, Advanced Intelligent Environments. Springer, 2009.
[31] B. Schuller, M. Valstar, F. Eyben, G. McKeown, R. Cowie, and M. Pantic. Avec 2011 - the first international audio/visual emotion challenge. In Affective Computing and Intelligent Interaction. Springer, 2011.
[32] F. Schwenker, S. Scherer, M. Schmidt, M. Schels, and M. Glodek. Multiple classifier systems for the recognition of human emotions. In N. E. Gayar, J. Kittler, and F. Roli, editors, Proceedings of the 9th International Workshop on Multiple Classifier Systems (MCS 10), LNCS 5997. Springer, 2010.
[33] Y.-L. Tian, T. Kanade, and J. F. Cohn. Facial expression analysis. In Handbook of Face Recognition. Springer, 2005.
[34] T. Tolonen and M. Karjalainen. A computationally efficient multipitch analysis model. Speech and Audio Processing, IEEE Transactions on, 8(6):708-716, 2000.
[35] C. Traina, A. Traina, L. Wu, and C. Faloutsos. Fast feature selection using fractal dimension. 2000.
[36] S. Walter, S. Scherer, M. Schels, M. Glodek, D. Hrabal, M. Schmidt, R. Böck, K. Limbrecht, H. C. Traue, and F. Schwenker. Multimodal emotion classification in naturalistic user behavior. In Human-Computer Interaction. Towards Mobile and Intelligent Interaction Environments. Springer, 2011.
More informationEMOTIONS are one of the most essential components of
1 Hidden Markov Model for Emotion Detection in Speech Cyprien de Lichy, Pratyush Havelia, Raunaq Rewari Abstract This paper seeks to classify speech inputs into emotion labels. Emotions are key to effective
More informationA Multilevel Fusion Approach for Audiovisual Emotion Recognition
A Multilevel Fusion Approach for Audiovisual Emotion Recognition Girija Chetty & Michael Wagner National Centre for Biometric Studies Faculty of Information Sciences and Engineering University of Canberra,
More informationResearch Article Automatic Speaker Recognition for Mobile Forensic Applications
Hindawi Mobile Information Systems Volume 07, Article ID 698639, 6 pages https://doi.org//07/698639 Research Article Automatic Speaker Recognition for Mobile Forensic Applications Mohammed Algabri, Hassan
More informationComputational Perception /785. Auditory Scene Analysis
Computational Perception 15-485/785 Auditory Scene Analysis A framework for auditory scene analysis Auditory scene analysis involves low and high level cues Low level acoustic cues are often result in
More informationDecision tree SVM model with Fisher feature selection for speech emotion recognition
Sun et al. EURASIP Journal on Audio, Speech, and Music Processing (2019) 2019:2 https://doi.org/10.1186/s13636-018-0145-5 RESEARCH Decision tree SVM model with Fisher feature selection for speech emotion
More informationEmotion Recognition using a Cauchy Naive Bayes Classifier
Emotion Recognition using a Cauchy Naive Bayes Classifier Abstract Recognizing human facial expression and emotion by computer is an interesting and challenging problem. In this paper we propose a method
More informationFacial expression recognition with spatiotemporal local descriptors
Facial expression recognition with spatiotemporal local descriptors Guoying Zhao, Matti Pietikäinen Machine Vision Group, Infotech Oulu and Department of Electrical and Information Engineering, P. O. Box
More informationFrequency Tracking: LMS and RLS Applied to Speech Formant Estimation
Aldebaro Klautau - http://speech.ucsd.edu/aldebaro - 2/3/. Page. Frequency Tracking: LMS and RLS Applied to Speech Formant Estimation ) Introduction Several speech processing algorithms assume the signal
More informationDivide-and-Conquer based Ensemble to Spot Emotions in Speech using MFCC and Random Forest
Published as conference paper in The 2nd International Integrated Conference & Concert on Convergence (2016) Divide-and-Conquer based Ensemble to Spot Emotions in Speech using MFCC and Random Forest Abdul
More informationIEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 19, NO. 5, JULY
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 19, NO. 5, JULY 2011 1057 A Framework for Automatic Human Emotion Classification Using Emotion Profiles Emily Mower, Student Member, IEEE,
More informationCONSTRUCTING TELEPHONE ACOUSTIC MODELS FROM A HIGH-QUALITY SPEECH CORPUS
CONSTRUCTING TELEPHONE ACOUSTIC MODELS FROM A HIGH-QUALITY SPEECH CORPUS Mitchel Weintraub and Leonardo Neumeyer SRI International Speech Research and Technology Program Menlo Park, CA, 94025 USA ABSTRACT
More informationHierarchical Classification of Emotional Speech
1 Hierarchical Classification of Emotional Speech Zhongzhe Xiao 1, Emmanuel Dellandrea 1, Weibei Dou 2, Liming Chen 1 1 LIRIS Laboratory (UMR 5205), Ecole Centrale de Lyon, Department of Mathematic and
More informationCHAPTER 1 INTRODUCTION
CHAPTER 1 INTRODUCTION 1.1 BACKGROUND Speech is the most natural form of human communication. Speech has also become an important means of human-machine interaction and the advancement in technology has
More informationSPEECH EMOTION RECOGNITION: ARE WE THERE YET?
SPEECH EMOTION RECOGNITION: ARE WE THERE YET? CARLOS BUSSO Multimodal Signal Processing (MSP) lab The University of Texas at Dallas Erik Jonsson School of Engineering and Computer Science Why study emotion
More informationNoise-Robust Speech Recognition Technologies in Mobile Environments
Noise-Robust Speech Recognition echnologies in Mobile Environments Mobile environments are highly influenced by ambient noise, which may cause a significant deterioration of speech recognition performance.
More informationFREQUENCY COMPRESSION AND FREQUENCY SHIFTING FOR THE HEARING IMPAIRED
FREQUENCY COMPRESSION AND FREQUENCY SHIFTING FOR THE HEARING IMPAIRED Francisco J. Fraga, Alan M. Marotta National Institute of Telecommunications, Santa Rita do Sapucaí - MG, Brazil Abstract A considerable
More informationCombination of Bone-Conducted Speech with Air-Conducted Speech Changing Cut-Off Frequency
Combination of Bone-Conducted Speech with Air-Conducted Speech Changing Cut-Off Frequency Tetsuya Shimamura and Fumiya Kato Graduate School of Science and Engineering Saitama University 255 Shimo-Okubo,
More informationSound Texture Classification Using Statistics from an Auditory Model
Sound Texture Classification Using Statistics from an Auditory Model Gabriele Carotti-Sha Evan Penn Daniel Villamizar Electrical Engineering Email: gcarotti@stanford.edu Mangement Science & Engineering
More informationLATERAL INHIBITION MECHANISM IN COMPUTATIONAL AUDITORY MODEL AND IT'S APPLICATION IN ROBUST SPEECH RECOGNITION
LATERAL INHIBITION MECHANISM IN COMPUTATIONAL AUDITORY MODEL AND IT'S APPLICATION IN ROBUST SPEECH RECOGNITION Lu Xugang Li Gang Wang Lip0 Nanyang Technological University, School of EEE, Workstation Resource
More informationEEL 6586, Project - Hearing Aids algorithms
EEL 6586, Project - Hearing Aids algorithms 1 Yan Yang, Jiang Lu, and Ming Xue I. PROBLEM STATEMENT We studied hearing loss algorithms in this project. As the conductive hearing loss is due to sound conducting
More informationResearch Proposal on Emotion Recognition
Research Proposal on Emotion Recognition Colin Grubb June 3, 2012 Abstract In this paper I will introduce my thesis question: To what extent can emotion recognition be improved by combining audio and visual
More informationFacial Expression Biometrics Using Tracker Displacement Features
Facial Expression Biometrics Using Tracker Displacement Features Sergey Tulyakov 1, Thomas Slowe 2,ZhiZhang 1, and Venu Govindaraju 1 1 Center for Unified Biometrics and Sensors University at Buffalo,
More informationAUDIO-VISUAL EMOTION RECOGNITION USING AN EMOTION SPACE CONCEPT
16th European Signal Processing Conference (EUSIPCO 28), Lausanne, Switzerland, August 25-29, 28, copyright by EURASIP AUDIO-VISUAL EMOTION RECOGNITION USING AN EMOTION SPACE CONCEPT Ittipan Kanluan, Michael
More informationEMOTION CLASSIFICATION: HOW DOES AN AUTOMATED SYSTEM COMPARE TO NAÏVE HUMAN CODERS?
EMOTION CLASSIFICATION: HOW DOES AN AUTOMATED SYSTEM COMPARE TO NAÏVE HUMAN CODERS? Sefik Emre Eskimez, Kenneth Imade, Na Yang, Melissa Sturge- Apple, Zhiyao Duan, Wendi Heinzelman University of Rochester,
More informationSome Studies on of Raaga Emotions of Singers Using Gaussian Mixture Model
International Journal for Modern Trends in Science and Technology Volume: 03, Special Issue No: 01, February 2017 ISSN: 2455-3778 http://www.ijmtst.com Some Studies on of Raaga Emotions of Singers Using
More informationBlue Eyes Technology
Blue Eyes Technology D.D. Mondal #1, Arti Gupta *2, Tarang Soni *3, Neha Dandekar *4 1 Professor, Dept. of Electronics and Telecommunication, Sinhgad Institute of Technology and Science, Narhe, Maharastra,
More informationHeart Murmur Recognition Based on Hidden Markov Model
Journal of Signal and Information Processing, 2013, 4, 140-144 http://dx.doi.org/10.4236/jsip.2013.42020 Published Online May 2013 (http://www.scirp.org/journal/jsip) Heart Murmur Recognition Based on
More informationThis is the accepted version of this article. To be published as : This is the author version published as:
QUT Digital Repository: http://eprints.qut.edu.au/ This is the author version published as: This is the accepted version of this article. To be published as : This is the author version published as: Chew,
More informationEnhanced Feature Extraction for Speech Detection in Media Audio
INTERSPEECH 2017 August 20 24, 2017, Stockholm, Sweden Enhanced Feature Extraction for Speech Detection in Media Audio Inseon Jang 1, ChungHyun Ahn 1, Jeongil Seo 1, Younseon Jang 2 1 Media Research Division,
More informationBFI-Based Speaker Personality Perception Using Acoustic-Prosodic Features
BFI-Based Speaker Personality Perception Using Acoustic-Prosodic Features Chia-Jui Liu, Chung-Hsien Wu, Yu-Hsien Chiu* Department of Computer Science and Information Engineering, National Cheng Kung University,
More informationUSING EMOTIONAL NOISE TO UNCLOUD AUDIO-VISUAL EMOTION PERCEPTUAL EVALUATION. Emily Mower Provost, Irene Zhu, and Shrikanth Narayanan
USING EMOTIONAL NOISE TO UNCLOUD AUDIO-VISUAL EMOTION PERCEPTUAL EVALUATION Emily Mower Provost, Irene Zhu, and Shrikanth Narayanan Electrical Engineering and Computer Science, University of Michigan,
More informationEEG Signal Description with Spectral-Envelope- Based Speech Recognition Features for Detection of Neonatal Seizures
EEG Signal Description with Spectral-Envelope- Based Speech Recognition Features for Detection of Neonatal Seizures Temko A., Nadeu C., Marnane W., Boylan G., Lightbody G. presented by Ladislav Rampasek
More informationUSING AUDITORY SALIENCY TO UNDERSTAND COMPLEX AUDITORY SCENES
USING AUDITORY SALIENCY TO UNDERSTAND COMPLEX AUDITORY SCENES Varinthira Duangudom and David V Anderson School of Electrical and Computer Engineering, Georgia Institute of Technology Atlanta, GA 30332
More informationPerformance of Gaussian Mixture Models as a Classifier for Pathological Voice
PAGE 65 Performance of Gaussian Mixture Models as a Classifier for Pathological Voice Jianglin Wang, Cheolwoo Jo SASPL, School of Mechatronics Changwon ational University Changwon, Gyeongnam 64-773, Republic
More informationDISCRETE WAVELET PACKET TRANSFORM FOR ELECTROENCEPHALOGRAM- BASED EMOTION RECOGNITION IN THE VALENCE-AROUSAL SPACE
DISCRETE WAVELET PACKET TRANSFORM FOR ELECTROENCEPHALOGRAM- BASED EMOTION RECOGNITION IN THE VALENCE-AROUSAL SPACE Farzana Kabir Ahmad*and Oyenuga Wasiu Olakunle Computational Intelligence Research Cluster,
More informationGeneral Soundtrack Analysis
General Soundtrack Analysis Dan Ellis oratory for Recognition and Organization of Speech and Audio () Electrical Engineering, Columbia University http://labrosa.ee.columbia.edu/
More informationOutline. Teager Energy and Modulation Features for Speech Applications. Dept. of ECE Technical Univ. of Crete
Teager Energy and Modulation Features for Speech Applications Alexandros Summariza(on Potamianos and Emo(on Tracking in Movies Dept. of ECE Technical Univ. of Crete Alexandros Potamianos, NatIONAL Tech.
More informationOn Shape And the Computability of Emotions X. Lu, et al.
On Shape And the Computability of Emotions X. Lu, et al. MICC Reading group 10.07.2013 1 On Shape and the Computability of Emotion X. Lu, P. Suryanarayan, R. B. Adams Jr., J. Li, M. G. Newman, J. Z. Wang
More informationA Vision-based Affective Computing System. Jieyu Zhao Ningbo University, China
A Vision-based Affective Computing System Jieyu Zhao Ningbo University, China Outline Affective Computing A Dynamic 3D Morphable Model Facial Expression Recognition Probabilistic Graphical Models Some
More informationSingle-Channel Sound Source Localization Based on Discrimination of Acoustic Transfer Functions
3 Single-Channel Sound Source Localization Based on Discrimination of Acoustic Transfer Functions Ryoichi Takashima, Tetsuya Takiguchi and Yasuo Ariki Graduate School of System Informatics, Kobe University,
More informationLIE DETECTION SYSTEM USING INPUT VOICE SIGNAL K.Meena 1, K.Veena 2 (Corresponding Author: K.Veena) 1 Associate Professor, 2 Research Scholar,
International Journal of Pure and Applied Mathematics Volume 117 No. 8 2017, 121-125 ISSN: 1311-8080 (printed version); ISSN: 1314-3395 (on-line version) url: http://www.ijpam.eu doi: 10.12732/ijpam.v117i8.25
More informationValence-arousal evaluation using physiological signals in an emotion recall paradigm. CHANEL, Guillaume, ANSARI ASL, Karim, PUN, Thierry.
Proceedings Chapter Valence-arousal evaluation using physiological signals in an emotion recall paradigm CHANEL, Guillaume, ANSARI ASL, Karim, PUN, Thierry Abstract The work presented in this paper aims
More informationCOMBINING CATEGORICAL AND PRIMITIVES-BASED EMOTION RECOGNITION. University of Southern California (USC), Los Angeles, CA, USA
COMBINING CATEGORICAL AND PRIMITIVES-BASED EMOTION RECOGNITION M. Grimm 1, E. Mower 2, K. Kroschel 1, and S. Narayanan 2 1 Institut für Nachrichtentechnik (INT), Universität Karlsruhe (TH), Karlsruhe,
More informationITU-T. FG AVA TR Version 1.0 (10/2013) Part 3: Using audiovisual media A taxonomy of participation
International Telecommunication Union ITU-T TELECOMMUNICATION STANDARDIZATION SECTOR OF ITU FG AVA TR Version 1.0 (10/2013) Focus Group on Audiovisual Media Accessibility Technical Report Part 3: Using
More informationFacial Emotion Recognition with Facial Analysis
Facial Emotion Recognition with Facial Analysis İsmail Öztel, Cemil Öz Sakarya University, Faculty of Computer and Information Sciences, Computer Engineering, Sakarya, Türkiye Abstract Computer vision
More informationSelection of Emotionally Salient Audio-Visual Features for Modeling Human Evaluations of Synthetic Character Emotion Displays
Selection of Emotionally Salient Audio-Visual Features for Modeling Human Evaluations of Synthetic Character Emotion Displays Emily Mower #1, Maja J Matarić 2,Shrikanth Narayanan # 3 # Department of Electrical
More informationAn Affect Prediction Approach through Depression Severity Parameter Incorporation in Neural Networks
INTERSPEECH 2017 August 20 24, 2017, Stockholm, Sweden An Affect Prediction Approach through Depression Severity Parameter Incorporation in Neural Networks Rahul Gupta, Saurabh Sahu +, Carol Espy-Wilson
More informationSpeech recognition in noisy environments: A survey
T-61.182 Robustness in Language and Speech Processing Speech recognition in noisy environments: A survey Yifan Gong presented by Tapani Raiko Feb 20, 2003 About the Paper Article published in Speech Communication
More informationDescribing Human Emotions Through Mathematical Modelling
Describing Human Emotions Through Mathematical Modelling Kim Hartmann Ingo Siegert Stefan Glüge Andreas Wendemuth Michael Kotzyba Barbara Deml Faculty of Electrical Engineering and Information Technology,
More informationHISTOGRAM EQUALIZATION BASED FRONT-END PROCESSING FOR NOISY SPEECH RECOGNITION
2 th May 216. Vol.87. No.2 25-216 JATIT & LLS. All rights reserved. HISTOGRAM EQUALIZATION BASED FRONT-END PROCESSING FOR NOISY SPEECH RECOGNITION 1 IBRAHIM MISSAOUI, 2 ZIED LACHIRI National Engineering
More informationRecognising Emotions from Keyboard Stroke Pattern
Recognising Emotions from Keyboard Stroke Pattern Preeti Khanna Faculty SBM, SVKM s NMIMS Vile Parle, Mumbai M.Sasikumar Associate Director CDAC, Kharghar Navi Mumbai ABSTRACT In day to day life, emotions
More informationSpeech Emotion Recognition with Emotion-Pair based Framework Considering Emotion Distribution Information in Dimensional Emotion Space
INTERSPEECH 2017 August 20 24, 2017, Stockholm, Sweden Speech Recognition with -Pair based Framework Considering Distribution Information in Dimensional Space Xi Ma 1,3, Zhiyong Wu 1,2,3, Jia Jia 1,3,
More informationLinguistic Phonetics. Basic Audition. Diagram of the inner ear removed due to copyright restrictions.
24.963 Linguistic Phonetics Basic Audition Diagram of the inner ear removed due to copyright restrictions. 1 Reading: Keating 1985 24.963 also read Flemming 2001 Assignment 1 - basic acoustics. Due 9/22.
More informationA framework for the Recognition of Human Emotion using Soft Computing models
A framework for the Recognition of Human Emotion using Soft Computing models Md. Iqbal Quraishi Dept. of Information Technology Kalyani Govt Engg. College J Pal Choudhury Dept. of Information Technology
More informationHUMAN EMOTION DETECTION THROUGH FACIAL EXPRESSIONS
th June. Vol.88. No. - JATIT & LLS. All rights reserved. ISSN: -8 E-ISSN: 87- HUMAN EMOTION DETECTION THROUGH FACIAL EXPRESSIONS, KRISHNA MOHAN KUDIRI, ABAS MD SAID AND M YUNUS NAYAN Computer and Information
More informationUsing simulated body language and colours to express emotions with the Nao robot
Using simulated body language and colours to express emotions with the Nao robot Wouter van der Waal S4120922 Bachelor Thesis Artificial Intelligence Radboud University Nijmegen Supervisor: Khiet Truong
More informationSound Analysis Research at LabROSA
Sound Analysis Research at LabROSA Dan Ellis Laboratory for Recognition and Organization of Speech and Audio Dept. Electrical Eng., Columbia Univ., NY USA dpwe@ee.columbia.edu http://labrosa.ee.columbia.edu/
More informationI. INTRODUCTION. OMBARD EFFECT (LE), named after the French otorhino-laryngologist
IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, VOL. 18, NO. 6, AUGUST 2010 1379 Unsupervised Equalization of Lombard Effect for Speech Recognition in Noisy Adverse Environments Hynek Bořil,
More informationarxiv: v1 [cs.lg] 4 Feb 2019
Machine Learning for Seizure Type Classification: Setting the benchmark Subhrajit Roy [000 0002 6072 5500], Umar Asif [0000 0001 5209 7084], Jianbin Tang [0000 0001 5440 0796], and Stefan Harrer [0000
More informationAnalysis of Speech Recognition Techniques for use in a Non-Speech Sound Recognition System
Analysis of Recognition Techniques for use in a Sound Recognition System Michael Cowling, Member, IEEE and Renate Sitte, Member, IEEE Griffith University Faculty of Engineering & Information Technology
More informationLecture 9: Speech Recognition: Front Ends
EE E682: Speech & Audio Processing & Recognition Lecture 9: Speech Recognition: Front Ends 1 2 Recognizing Speech Feature Calculation Dan Ellis http://www.ee.columbia.edu/~dpwe/e682/
More informationGfK Verein. Detecting Emotions from Voice
GfK Verein Detecting Emotions from Voice Respondents willingness to complete questionnaires declines But it doesn t necessarily mean that consumers have nothing to say about products or brands: GfK Verein
More information1. INTRODUCTION. Vision based Multi-feature HGR Algorithms for HCI using ISL Page 1
1. INTRODUCTION Sign language interpretation is one of the HCI applications where hand gesture plays important role for communication. This chapter discusses sign language interpretation system with present
More informationNoise-Robust Speech Recognition in a Car Environment Based on the Acoustic Features of Car Interior Noise
4 Special Issue Speech-Based Interfaces in Vehicles Research Report Noise-Robust Speech Recognition in a Car Environment Based on the Acoustic Features of Car Interior Noise Hiroyuki Hoshino Abstract This
More informationAn assistive application identifying emotional state and executing a methodical healing process for depressive individuals.
An assistive application identifying emotional state and executing a methodical healing process for depressive individuals. Bandara G.M.M.B.O bhanukab@gmail.com Godawita B.M.D.T tharu9363@gmail.com Gunathilaka
More informationEffect of Sensor Fusion for Recognition of Emotional States Using Voice, Face Image and Thermal Image of Face
Effect of Sensor Fusion for Recognition of Emotional States Using Voice, Face Image and Thermal Image of Face Yasunari Yoshitomi 1, Sung-Ill Kim 2, Takako Kawano 3 and Tetsuro Kitazoe 1 1:Department of
More informationAffect Intensity Estimation using Multiple Modalities
Affect Intensity Estimation using Multiple Modalities Amol S. Patwardhan, and Gerald M. Knapp Department of Mechanical and Industrial Engineering Louisiana State University apatwa3@lsu.edu Abstract One
More informationJitter, Shimmer, and Noise in Pathological Voice Quality Perception
ISCA Archive VOQUAL'03, Geneva, August 27-29, 2003 Jitter, Shimmer, and Noise in Pathological Voice Quality Perception Jody Kreiman and Bruce R. Gerratt Division of Head and Neck Surgery, School of Medicine
More informationVital Responder: Real-time Health Monitoring of First- Responders
Vital Responder: Real-time Health Monitoring of First- Responders Ye Can 1,2 Advisors: Miguel Tavares Coimbra 2, Vijayakumar Bhagavatula 1 1 Department of Electrical & Computer Engineering, Carnegie Mellon
More informationModeling and Recognizing Emotions from Audio Signals: A Review
Modeling and Recognizing Emotions from Audio Signals: A Review 1 Ritu Tanwar, 2 Deepti Chaudhary 1 UG Scholar, 2 Assistant Professor, UIET, Kurukshetra University, Kurukshetra, Haryana, India ritu.tanwar2012@gmail.com,
More informationTESTS OF ROBUSTNESS OF GMM SPEAKER VERIFICATION IN VoIP TELEPHONY
ARCHIVES OF ACOUSTICS 32, 4 (Supplement), 187 192 (2007) TESTS OF ROBUSTNESS OF GMM SPEAKER VERIFICATION IN VoIP TELEPHONY Piotr STARONIEWICZ Wrocław University of Technology Institute of Telecommunications,
More informationExploiting visual information for NAM recognition
Exploiting visual information for NAM recognition Panikos Heracleous, Denis Beautemps, Viet-Anh Tran, Hélène Loevenbruck, Gérard Bailly To cite this version: Panikos Heracleous, Denis Beautemps, Viet-Anh
More informationHearing Lectures. Acoustics of Speech and Hearing. Auditory Lighthouse. Facts about Timbre. Analysis of Complex Sounds
Hearing Lectures Acoustics of Speech and Hearing Week 2-10 Hearing 3: Auditory Filtering 1. Loudness of sinusoids mainly (see Web tutorial for more) 2. Pitch of sinusoids mainly (see Web tutorial for more)
More informationACOUSTIC AND PERCEPTUAL PROPERTIES OF ENGLISH FRICATIVES
ISCA Archive ACOUSTIC AND PERCEPTUAL PROPERTIES OF ENGLISH FRICATIVES Allard Jongman 1, Yue Wang 2, and Joan Sereno 1 1 Linguistics Department, University of Kansas, Lawrence, KS 66045 U.S.A. 2 Department
More informationDiscrete Signal Processing
1 Discrete Signal Processing C.M. Liu Perceptual Lab, College of Computer Science National Chiao-Tung University http://www.cs.nctu.edu.tw/~cmliu/courses/dsp/ ( Office: EC538 (03)5731877 cmliu@cs.nctu.edu.tw
More informationComparison of selected off-the-shelf solutions for emotion recognition based on facial expressions
Comparison of selected off-the-shelf solutions for emotion recognition based on facial expressions Grzegorz Brodny, Agata Kołakowska, Agnieszka Landowska, Mariusz Szwoch, Wioleta Szwoch, Michał R. Wróbel
More informationLinguistic Phonetics Fall 2005
MIT OpenCourseWare http://ocw.mit.edu 24.963 Linguistic Phonetics Fall 2005 For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms. 24.963 Linguistic Phonetics
More information! Can hear whistle? ! Where are we on course map? ! What we did in lab last week. ! Psychoacoustics
2/14/18 Can hear whistle? Lecture 5 Psychoacoustics Based on slides 2009--2018 DeHon, Koditschek Additional Material 2014 Farmer 1 2 There are sounds we cannot hear Depends on frequency Where are we on
More informationAn active unpleasantness control system for indoor noise based on auditory masking
An active unpleasantness control system for indoor noise based on auditory masking Daisuke Ikefuji, Masato Nakayama, Takanabu Nishiura and Yoich Yamashita Graduate School of Information Science and Engineering,
More informationMultimodal emotion recognition from expressive faces, body gestures and speech
Multimodal emotion recognition from expressive faces, body gestures and speech Ginevra Castellano 1, Loic Kessous 2, and George Caridakis 3 1 InfoMus Lab, DIST - University of Genova Viale Causa 13, I-16145,
More informationA Review on Dysarthria speech disorder
A Review on Dysarthria speech disorder Miss. Yogita S. Mahadik, Prof. S. U. Deoghare, 1 Student, Dept. of ENTC, Pimpri Chinchwad College of Engg Pune, Maharashtra, India 2 Professor, Dept. of ENTC, Pimpri
More informationAudio-visual Classification and Fusion of Spontaneous Affective Data in Likelihood Space
2010 International Conference on Pattern Recognition Audio-visual Classification and Fusion of Spontaneous Affective Data in Likelihood Space Mihalis A. Nicolaou, Hatice Gunes and Maja Pantic, Department
More informationLecture 3: Perception
ELEN E4896 MUSIC SIGNAL PROCESSING Lecture 3: Perception 1. Ear Physiology 2. Auditory Psychophysics 3. Pitch Perception 4. Music Perception Dan Ellis Dept. Electrical Engineering, Columbia University
More informationSmart Multifunctional Digital Content Ecosystem Using Emotion Analysis of Voice
International Conference on Computer Systems and Technologies - CompSysTech 17 Smart Multifunctional Digital Content Ecosystem Using Emotion Analysis of Voice Alexander I. Iliev, Peter Stanchev Abstract:
More informationMSAS: An M-mental health care System for Automatic Stress detection
Quarterly of Clinical Psychology Studies Allameh Tabataba i University Vol. 7, No. 28, Fall 2017, Pp 87-94 MSAS: An M-mental health care System for Automatic Stress detection Saeid Pourroostaei Ardakani*
More informationHearing the Universal Language: Music and Cochlear Implants
Hearing the Universal Language: Music and Cochlear Implants Professor Hugh McDermott Deputy Director (Research) The Bionics Institute of Australia, Professorial Fellow The University of Melbourne Overview?
More informationError Detection based on neural signals
Error Detection based on neural signals Nir Even- Chen and Igor Berman, Electrical Engineering, Stanford Introduction Brain computer interface (BCI) is a direct communication pathway between the brain
More information