Vocal and Facial Biomarkers of Depression Based on Motor Incoordination and Timing


James R. Williamson, MIT Lincoln Laboratory, Lexington, MA 02421, (781) 981-5374, jrw@ll.mit.edu
Thomas F. Quatieri, MIT Lincoln Laboratory, Lexington, MA 02421, (781) 981-7487, quatieri@ll.mit.edu
Brian S. Helfer, MIT Lincoln Laboratory, Lexington, MA 02421, (781) 981-7962, brian.helfer@ll.mit.edu
Gregory Ciccarelli, MIT Lincoln Laboratory, Lexington, MA 02421, (781) 981-3474, gregory.ciccarelli@ll.mit.edu

ABSTRACT
In individuals with major depressive disorder, neurophysiological changes often alter motor control and thus affect the mechanisms controlling speech production and facial expression. These changes are typically associated with psychomotor retardation, a condition marked by slowed neuromotor output that is behaviorally manifested as altered coordination and timing across multiple motor-based properties. Changes in motor outputs can be inferred from vocal acoustics and facial movements as individuals speak. We derive novel multi-scale correlation structure and timing feature sets from audio-based vocal features and video-based facial action units from recordings provided by the 4th International Audio/Video Emotion Challenge (AVEC). The feature sets enable detection of changes in coordination, movement, and timing of vocal and facial gestures that are potentially symptomatic of depression. Combining complementary features in Gaussian mixture model and extreme learning machine classifiers, our multivariate regression scheme predicts Beck depression inventory ratings on the AVEC test set with a root-mean-square error of 8.12 and a mean absolute error of 6.31. Future work calls for continued study into detection of neurological disorders based on altered coordination and timing across audio and video modalities.

This work is sponsored by the Assistant Secretary of Defense for Research & Engineering under Air Force contract #FA8721-05-C-0002. Opinions, interpretations, conclusions, and recommendations are those of the authors and are not necessarily endorsed by the United States Government.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org. AVEC'14, November 7, 2014, Orlando, FL, USA. Copyright 2014 ACM 978-1-4503-3119-7/14/11 $15.00. http://dx.doi.org/10.1145/2661806.2661809

Daryush D. Mehta, MIT Lincoln Laboratory, Lexington, MA 02421, (781) 981-5818, daryush.mehta@ll.mit.edu

Categories and Subject Descriptors
G.3 [Mathematics of Computing]: Probability and Statistics - correlation and regression analysis, time series analysis; I.5.4 [Computer Methodologies]: Pattern Recognition - signal processing

Keywords
major depressive disorder, motor control, vocal biomarker, facial biomarker, incoordination and timing, correlation structure, Gaussian mixture model, extreme learning machine

1. INTRODUCTION
Major depressive disorder (MDD) is the most prevalent mood disorder, with a lifetime risk of 10-20% for women and 5-12% for men [6]. As the number of people suffering from MDD steadily increases, so too does the burden of accurate diagnosis. Currently, the diagnosis of MDD requires a comprehensive assessment by a professional with significant clinical experience. However, the inter-clinician variability of these assessments makes the tracking of medication efficacy during clinical trials difficult. The growing global burden of MDD suggests that a convenient and automated method to evaluate depression severity would both simplify and standardize the task of diagnosing and monitoring depression, allowing for greater availability and uniformity in assessment.
An automated approach may reduce multiple in-office clinical visits, facilitate accurate measurement and identification, and quicken the evaluation of treatment. Toward these objectives, potential depression biomarkers of growing interest are vocal- and facial expression-based features, two categories of easily acquired measures that have been shown to change with a patient's mental condition and emotional state [3, 7, 8, 21, 25-28, 34]. Figure 1 illustrates the categorization of vocal characteristics into three components: speech excitation (source), vocal tract (system), and pattern of stress and intonation (prosody). Depression-related changes in speech reflect the perception of qualities such as monotony, slur, slowness, hoarseness, and breathiness in the speech of depressed individuals. Hoarseness and breathiness may be associated with speech source characteristics (at the level of the vocal folds). Monotony may be

associated with prosody (e.g., modulation of speech rate, pitch, and energy) and slur with speech system characteristics (e.g., vocal tract articulators). Characterizing the effects of depression on facial movements is an active research area. Early work found measurable differences between facial expressions of people suffering from MDD and facial expressions of non-depressed individuals [8]. EMG monitors can register facial expressions that are imperceptible during clinical assessment [9], and have found acute reductions in involuntary facial expressions in depressed persons [31]. The facial action coding system (FACS) quantifies localized changes in facial expression representing facial action units (FAUs) that correspond to distinct muscle movements of the face [5].

Figure 1. Illustration of speech source (at the vocal folds), system (vocal tract), and prosody (melody).

Although there has been significant effort in studying vocal and facial biomarkers for emotion classification, there has been little or no study into changes in coordination, movement, and timing using speech and facial modalities for depression classification or severity prediction. In individuals suffering from MDD, neurophysiological changes often alter motor control and thus affect mechanisms controlling speech production and facial expression. Clinically, these changes are typically associated with psychomotor retardation, a condition of slowed neuromotor output manifested in altered coordination and timing across multiple observables of acoustics and facial movements during speech. Figure 2 displays a block diagram of the developed system for predicting depression severity from the Beck depression inventory (BDI) rating scale. Incorporating audio and video features reflecting manifestations of altered coordination and timing into a novel machine learning scheme is the focus of this study.

2. AUDIO/VIDEO DATABASE
The 2014 Audio/Video Emotion Challenge (AVEC) uses a depression corpus that includes audio and video recordings of subjects performing a human-computer interaction task [35].
Data were collected from 84 German subjects, with a subset of subjects recorded during multiple sessions: 31 subjects were recorded twice and 18 subjects were recorded three times. The subjects' ages varied between 18 and 63 years, with a mean of 31.5 years and a standard deviation of 12.3 years. Subjects performed two speech tasks in the German language: (1) reading a phonetically balanced passage and (2) replying to a free-response question. The read passage (NW) was an excerpt of the fable "Die Sonne und der Wind" (The North Wind and the Sun). The free speech section (FS) asked the subjects to respond to one of a number of questions (prompted in written German), such as "What is your favorite dish?", "What was your best gift, and why?", and "Discuss a sad childhood memory." The NW and FS passages ranged in duration from 00:31 to 01:29 (mm:ss) and 00:06 to 03:50 (mm:ss), respectively. Video of the subjects' faces was captured using a webcam at 30 frames per second and a spatial resolution of 640 x 480 pixels. Audio was captured with a headset microphone connected to a laptop soundcard at sampling rates of 32 kHz or 48 kHz using the AAC codec. For each session, the self-reported BDI score was available. The recorded sessions were split into three partitions (training, development, and test) with 50 recordings in each set. We combined the training and development sets into a single 100-session data set, which is henceforth termed the Training set.

3. LOW-LEVEL FEATURE EXTRACTION
We exploit dynamic variation and inter-relationships across speech production systems by computing features that reflect complementary aspects of the speech source, system, and prosody. In the video domain, FAUs yield measures reflecting facial movements during speech that can contribute to depression characterization.

3.1 Voice Source Properties
Harmonics-to-noise ratio (HNR): A spectral measure of harmonics-to-noise ratio was performed using a periodic/noise decomposition method that employs a comb filter to extract the harmonic component of a signal [17-19].
This pitch-scaled harmonic filter approach uses an analysis window duration equal to an integer number of local periods (four in the current work) and relies on the property that harmonics of the fundamental frequency exist at specific frequency bins of the short-time discrete Fourier transform (DFT). In each window, after obtaining an estimate of the harmonic component, subtraction from the original spectrum yields the noise component, where interpolation fills in gaps in the residual noise spectrum. The time-domain signals of the harmonic and noise components in each frame are obtained by performing inverse DFTs of the respective spectra. Overlap-add synthesis is then used to merge together all the short-time segments. The short-time harmonics-to-noise ratio is the ratio, in dB, of the power of the decomposed harmonic signal and the power of the decomposed speech noise signal.

Cepstral peak prominence (CPP): Recent research has focused on developing improved acoustic measures that do not rely on an accurate estimate of fundamental frequency, as required for jitter and shimmer measures. Several studies have reported strong correlations between cepstral peak prominence (CPP) and overall dysphonia perception [4, 13, 23], breathiness [14, 15], and vocal fold kinematics. CPP is defined as the difference, in dB, between the magnitude of the highest peak and the noise floor in the power cepstrum for quefrencies greater than 2 ms (corresponding to a range minimally affected by vocal tract-related information) and was computed every 10 ms.

3.2 Speech System Properties
Formant frequencies: We associate vocal tract resonance information with speech dynamics as a means to represent articulatory changes in the depressed voice. We selected an algorithm based on the principle that formants are correlated with one another in both the frequency and time domains [24, 30]. Formant frequencies are computed every 10 ms. Embedded in the algorithm is a voice-activity detector that allows a Kalman smoother to coast smoothly through non-speech regions.
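The CPP definition above (dominant cepstral peak height above a noise floor, restricted to quefrencies beyond 2 ms) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the windowing, the regression range, and the upper quefrency bound are assumptions of this sketch.

```python
import numpy as np

def cepstral_peak_prominence(frame, fs, qmin=0.002, qmax=0.02):
    # Hedged sketch of CPP: the height of the dominant cepstral peak above
    # a linear-regression noise floor, searched only at quefrencies above
    # 2 ms (qmin), as described in Section 3.1. qmax is an assumed bound.
    n = len(frame)
    spec = np.abs(np.fft.fft(frame * np.hanning(n)))
    log_spec = 20.0 * np.log10(spec + 1e-12)        # dB magnitude spectrum
    ceps = np.abs(np.fft.ifft(log_spec))            # cepstrum of the dB spectrum
    quef = np.arange(n) / fs                        # quefrency axis, seconds
    region = (quef >= qmin) & (quef <= qmax)
    # Noise floor: straight-line fit to the cepstrum over the search region.
    slope, intercept = np.polyfit(quef[region], ceps[region], 1)
    peak_idx = int(np.argmax(np.where(region, ceps, -np.inf)))
    floor = slope * quef[peak_idx] + intercept
    return ceps[peak_idx] - floor, quef[peak_idx]
```

For a strongly periodic frame, the peak quefrency lands near the pitch period and the prominence is much larger than for noise, which is what makes CPP usable without an explicit fundamental-frequency estimate.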
Because we are using only the frequencies of formants, these features are approximately immune to slowly varying linear channel effects.

Mel-frequency cepstral coefficients (MFCCs): To introduce vocal tract spectral magnitude information, we use the standard MFCCs provided by AVEC. We also derived 16 corresponding delta MFCCs to reflect dynamic velocities of the MFCCs over time. Delta coefficients were computed using a delay parameter of 2 (regression over two frames before and after a given frame).
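The delta computation just described (regression over two frames before and after each frame) follows the standard delta-coefficient formula; a minimal sketch, with edge frames handled by repeating the boundary frame (an assumption of this sketch):

```python
import numpy as np

def delta(features, delay=2):
    # Standard delta (velocity) coefficients with delay parameter 2:
    # d_t = sum_k k * (c_{t+k} - c_{t-k}) / (2 * sum_k k^2), k = 1..delay.
    # features: (num_frames, num_coeffs) array of static coefficients.
    padded = np.pad(features, ((delay, delay), (0, 0)), mode="edge")
    num = sum(k * (padded[delay + k:len(features) + delay + k]
                   - padded[delay - k:len(features) + delay - k])
              for k in range(1, delay + 1))
    return num / (2 * sum(k * k for k in range(1, delay + 1)))
```

On a linear ramp of coefficients the deltas recover the per-frame slope exactly (away from the padded edges), which is the intended "velocity" interpretation.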

Figure 2. Block diagram of the developed system for predicting the Beck depression inventory (BDI) score.

3.3 Speech Prosody Properties
Phoneme durations: We have found that computing phoneme-specific characteristics, rather than average measures of speaking rate, reveals stronger relationships between speech rate and depression severity [34]. Using an automatic phoneme recognition algorithm [32], we detect phonetic boundaries and phoneme-specific durations that are associated with each instance of the 40 classes of defined phonetic speech units.

Pitch slopes: The fundamental frequency (pitch) was estimated using a time-domain autocorrelation method over 40 ms Hanning windows every 1 ms [2]. Within each phone segment, a linear fit was made to these pitch values, yielding a pitch slope feature (ΔHz/s) associated with each instance of phonetic speech units.

3.4 Facial Action Units (FAUs)
Although the FACS provides a formalized method for identifying changes in facial expression, its implementation for the analysis of large quantities of data has been impeded by the need for trained annotators to mark individual frames of a recorded video session. For this reason, the University of California San Diego has developed a computer expression recognition toolbox (CERT) for the automatic identification of FAUs from individual video frames [20]. Table 1 lists the FAUs output by CERT used for the video-based facial expression analysis. All frames marked as invalid by the program and values considered outliers were removed. In addition, each frame of data is retained only if it is marked valid across all 20 FAUs. If the duration of the remaining FAU time series was less than 30 s or 40% of its original length, the entire set of FAUs for that recording was not used. With this procedure, FAUs from five NW and 17 FS passages in the Training set and from three NW and five FS passages in the Test set were omitted from processing. Each FAU feature was converted from a support vector machine (SVM) hyperplane distance to a posterior probability using a logistic model trained on a separate database of video recordings [22].
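The pitch-slope feature in Section 3.3 (a straight-line fit to the pitch track within each phone segment) can be sketched as below. The `(start_frame, end_frame)` segment interface is an assumption of this sketch; the paper obtains boundaries from an automatic phoneme recognizer.

```python
import numpy as np

def phone_pitch_slopes(pitch_hz, frame_step_s, segments):
    # For each phone segment, fit a line to the pitch values and keep the
    # slope in Hz/s (the paper's pitch-slope feature, computed per phone
    # instance from a 1-ms frame step pitch track).
    slopes = []
    for start, end in segments:
        t = np.arange(start, end) * frame_step_s   # frame times in seconds
        slope, _ = np.polyfit(t, pitch_hz[start:end], 1)
        slopes.append(float(slope))
    return slopes
```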
Henceforth, the term FAU refers to these frame-by-frame estimates of FAU posterior probabilities.

4. HIGH-LEVEL FEATURE EXTRACTION
Our high-level features are designed to characterize properties of coordination and timing from the low-level features. The measures of coordination use assessments of the multi-scale structure of correlations among the low-level features. This approach is motivated by the observation that auto- and cross-correlations of measured signals can reveal hidden parameters in the stochastic-dynamical systems that generate the time series. This multivariate feature construction approach, first introduced for analysis of EEG signals for epileptic seizure prediction [36, 37], has since been successfully applied to speech analysis for estimating depression [33], the estimation of cognitive performance associated with dementia [39], and the detection of changes in cognitive performance associated with mild traumatic brain injury [11]. Channel-delay correlation and covariance matrices are computed from multiple time series channels (of given vocal and facial parameters). Each matrix contains correlation or covariance coefficients between the channels at multiple relative time delays. Changes over time in the coupling strengths among the channel signals cause changes in the eigenvalue spectra of the channel-delay matrices. The matrices are computed at four separate time scales, in which successive time delays correspond to frame spacings of 1, 3, 7, and 15. A detailed description of the cross-correlation approach can be found in [37]. Overall covariance power (logarithm of the trace) and entropy (logarithm of the determinant) are also extracted from the channel-delay covariance matrices at each scale.

For vocal-based timing features, we use cumulative phoneme-dependent durations and pitch slopes, obtained using estimated phoneme boundaries. For facial-based timing features, we use FAU rates obtained from their estimated posterior probabilities.

4.1 Speech Correlation Structure
Figure 3 illustrates the cross-correlation (xcorr) technique applied to a formant time series.
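The channel-delay construction described above can be sketched as follows: channels are stacked at multiple relative time delays with a given frame spacing, the correlation matrix of the stacked signals is formed, and its eigenvalue spectrum plus the covariance power (log trace) and entropy (log determinant) are extracted. This is an illustrative sketch of the idea, not the authors' code.

```python
import numpy as np

def channel_delay_features(X, spacing, num_delays=15):
    # X: (channels, time) array of low-level features.
    # spacing: frame spacing for successive delays (1, 3, 7, or 15 here).
    C, T = X.shape
    span = spacing * (num_delays - 1)
    # Stack each channel at num_delays relative delays -> (C*num_delays, T-span).
    stacked = np.vstack([X[:, d:T - span + d]
                         for d in range(0, span + 1, spacing)])
    corr = np.corrcoef(stacked)                     # channel-delay correlation matrix
    cov = np.cov(stacked)                           # channel-delay covariance matrix
    eigvals = np.sort(np.linalg.eigvalsh(corr))[::-1]
    power = np.log(np.trace(cov))                   # covariance "power"
    entropy = np.linalg.slogdet(cov)[1]             # covariance "entropy"
    return eigvals, power, entropy
```

Weakened coupling among channels shifts power into the small eigenvalues of `corr`, which is the effect the eigenspectrum features are meant to capture.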
For each of the above time scales (10, 30, 70, and 150 ms for a 10-ms frame), correlation coefficients are computed among signals shifted in time relative to each other, with 15 time delays used per scale.

Table 1. The 20 facial action units from CERT.
#  Description            #  Description
1  Inner Brow Raise      11  Lip Stretch
2  Outer Brow Raise      12  Cheek Raise
3  Brow Lower            13  Lids Tight
4  Eye Widen             14  Lip Pucker
5  Nose Wrinkle          15  Lip Tightener
6  Lip Raise             16  Lip Presser
7  Lip Corner Pull       17  Lips Part
8  Dimpler               18  Jaw Drop
9  Lip Corner Depressor  19  Lips Suck
10 Chin Raise            20  Blink/Eye Closure

After investigating multiple combinations of the low-level vocal features as input to the xcorr analysis, we found the best overall

performance using the following three combinations: (1) Formant-CPP, (2) CPP-HNR, and (3) delta MFCC. Figures 4-6 show example results at a single time scale for each of the feature combinations.

Figure 3. Diagram of cross-correlation analysis of articulatory coordination, as performed through formant-based features using channel-delay correlation matrices at multiple delay scales. A channel-delay matrix from one scale is shown.

Two speech recordings of the NW passage illustrate the typical effect of depression on these xcorr channel-delay matrices and eigenspectra feature vectors. These recordings are of a non-depressed individual (BDI = 0) and a depressed individual (BDI = 35) from the Training set. The lower-left plot in each figure gives the eigenvalues for the normal and depressed subject cases, while the lower-right plot in each figure shows the mean normalized eigenvalues for all Training sessions grouped into four different BDI score ranges: 0-8 (blue), 9-19 (cyan), 19-28 (green), and 29-45 (red). For Formant-CPP xcorr features, vectors consist of 248 elements (4 channels, 4 time scales, 15 delays per scale, and 2 covariance features per scale). For CPP-HNR xcorr features, vectors consist of 88 elements (2 channels, 4 scales, 15 delays per scale, top 20 eigenvalues per scale, and 2 covariance features per scale). For delta MFCC xcorr features, the vectors consist of 968 elements (16 channels, 4 scales, 15 delays per scale, and 2 covariance features per scale).

Figure 5. CPP-HNR xcorr features. Top: Channel-delay correlation matrices from NW passage for a normal and a depressed subject. Bottom: Eigenvalues for these subjects (left) and average normalized eigenvalues for four BDI ranges in the Training set (right).

Figure 6. Delta MFCC xcorr features. Top: Channel-delay correlation matrices from NW passage for a normal and a depressed subject. Bottom: Eigenvalues for these subjects (left) and average normalized eigenvalues for four BDI ranges in the Training set (right).

Figure 4. Formant-CPP xcorr features.
Top: Channel-delay correlation matrices from NW passage for a normal and a depressed subject. Red denotes high and blue low (auto-) cross-correlation values. Bottom: Eigenvalues for these subjects (left) and average normalized eigenvalues for four BDI ranges in the Training set (right).

4.2 Facial Correlation Structure
Facial coordination features are obtained by applying the xcorr technique to the FAU time series using the same parameters that were used to analyze the vocal-based features. Because of the 30 Hz FAU frame rate, spacings for the four time scales correspond to time sampling in increments of approximately 33 ms, 100 ms, 234 ms, and 500 ms. Figure 7 (top) shows example FAU channel-delay matrices at a single time scale from the same normal and depressed subjects that were used for illustration in Figures 4-6. These matrices are derived from the FS passage. As with the Formant-CPP and CPP-HNR xcorr features, Figure 7 (bottom-left) shows that the eigenspectra of the depressed subject contain less power in the small eigenvalues. This effect is observed across a spectrum of BDI scores in all 83 free-response

Training set recordings with valid FAU features. The facial-based eigenvalue differences are similar to those found in the Formant-CPP and CPP-HNR xcorr features.

Figure 7. FAU xcorr features. Top: Channel-delay correlation matrices from FS passage for a normal and a depressed subject. Bottom: Eigenvalues for these subjects (left) and average normalized eigenvalues for four BDI ranges in the Training set (right).

4.3 Phoneme Duration
Building on previous work [34], the summed durations of certain phonemes are linearly combined to yield fused phoneme duration measures. A subset of phonemes whose summed durations are highly correlated with BDI scores on the Training set are selected to create these fused measures, with weights based on the strength of their individual correlations. Table 2 lists the selected phonemes for the North Wind passage (left) and the first six of the ten selected phonemes for the Free Speech passage (right), along with their individual BDI correlations. The correlations of the fused measures for each passage are shown at the bottom. The linear combination used to obtain the fused measures is as follows. For each phoneme category i, we denote d_i as the cumulative duration over a recording, R_i as its correlation with BDI, and w_i as its weight. Then, the fused duration measure is

    d̂ = Σ_i w_i d_i,    (1)

where

    w_i = sign(R_i) R_i² / (1 − R_i²).    (2)

Equation (2) is a modification of the combination rule used in [34] that causes highly correlating phoneme durations to be weighted more strongly than weakly correlating phoneme durations. For the NW passage, d_i is the total duration of phoneme i. For the FS passage, d_i is the total phoneme duration divided by the total passage duration (speech only) and thus provides a rate measure (percent time present).

4.4 Pitch Slope
A fused phoneme-dependent pitch slope measure is obtained using essentially the same procedure as described above. For each phoneme, we compute the sum of valid pitch slopes across all instances of that phoneme.
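The fusion rule in Equations (1)-(2) can be sketched directly; note how the weight grows rapidly as |R_i| approaches 1, so strongly correlating phonemes dominate the fused measure:

```python
import numpy as np

def fuse(values, correlations):
    # Equations (1)-(2): fused measure = sum_i w_i * d_i with
    # w_i = sign(R_i) * R_i^2 / (1 - R_i^2), where R_i is each measure's
    # correlation with BDI on the Training set.
    R = np.asarray(correlations, dtype=float)
    w = np.sign(R) * R**2 / (1.0 - R**2)
    return float(np.dot(w, values))
```

For example, two measures with correlations +0.5 and -0.5 receive equal and opposite weights (±1/3), so identical values cancel in the fused measure.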
Invalid slopes are those with absolute value greater than eight, resulting in the exclusion of most slopes that are computed from discontinuous pitch contours.

Table 2. Correlation coefficients (R, p < 0.01) between fused phoneme durations and BDI scores in the Training set. Fusion is done using linear combinations of phoneme durations. Only 6 of the 10 Free Speech phonemes are shown.

North Wind          Free Speech
Phoneme   R         Phoneme   R
l         0.50      ng        0.38
ah        0.45      t         0.34
n         0.41      hh        0.33
h         0.34      ey        0.32
b         0.34      ow        0.28
ow        0.34      er        0.27
Fused     0.54      Fused     0.57

For each passage, the set of phonemes with the highest correlating summed pitch slopes are then selected. The summed pitch slopes are combined using Equations (1) and (2) to obtain fused measures for the NW and FS passages. Using 20 phonemes for NW and 15 phonemes for FS, these fused measures have BDI correlations of … (NW) and … (FS).

4.5 Facial Activation Rate
The facial activation rate feature is obtained by computing mean FAU values (an estimate of percent time present via posterior probabilities) over each passage and combining several of these into a fused FAU rate measure. Weights are based on FAU correlations with BDI scores using the combination rule in Equations (1) and (2). Table 3 shows the FAUs used in this fusion process, their individual correlations, and the correlations of the fused measures.

4.6 Dimensionality Reduction
The xcorr feature vectors typically contain highly correlated elements. To obtain lower-dimensional uncorrelated feature vectors for machine learning techniques, we apply principal component analysis. Table 4 lists the number of principal components we chose for each xcorr feature type, along with the phonetic and FAU rate features. The number of principal components in each case was determined empirically by cross-validation performance.

Table 3. Correlation coefficient (R) between mean FAU posterior probabilities and BDI in the Training set (p < 0.05 for all R ≥ 0.21). Fusion is done using linear combinations of the mean FAU posterior probabilities. See Table 1 for FAU descriptions.
North Wind        Free Speech
FAU #    R        FAU #    R
3        0.24     2        0.16
4        0.26     3        0.22
5        0.21     4        0.27
7        0.30     5        0.19
8        0.28     8        0.18
10       0.23     9        0.17
11       0.24     11       0.15
14       0.27     12       0.14
15       0.30     13       0.16
18       0.37     15       0.22
Fused    0.58     Fused    0.46
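The dimensionality reduction in Section 4.6 projects each high-dimensional xcorr vector onto its leading principal components (the per-feature-type component counts appear in Table 4). A minimal PCA sketch via the SVD of the mean-centered data matrix:

```python
import numpy as np

def pca_reduce(features, num_components):
    # features: (num_sessions, num_dims) matrix of xcorr feature vectors.
    # Returns the projection onto the top num_components principal axes,
    # yielding lower-dimensional, uncorrelated feature vectors.
    centered = features - features.mean(axis=0)
    # Rows of Vt are the principal axes, ordered by explained variance.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:num_components].T
```

The resulting component scores are mutually uncorrelated with monotonically non-increasing variance, which suits the heavily redundant eigenspectrum features.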

5. MULTIVARIATE FUSION AND PREDICTION
Our next step involves mapping the features described in Section 4 into univariate scores that can be easily mapped into BDI predictions. To do this, we use both generative Gaussian mixture models (GMMs), which have been widely used for automatic speaker recognition [29] and have recently been extended to vocal-based depression classification [12, 33, 38], and discriminative extreme learning machines (ELMs), a single-hidden-layer feedforward neural network architecture with randomly assigned hidden nodes [10, 16].

Table 4. Total number of dimensions (# Dim.) and number of dimensions selected after principal component analysis (PCA #) for each of the eight feature sets.
Set  Data  Feature Type          # Dim.  PCA #
1    NW    Formant-CPP xcorr     248     4
     NW    CPP-HNR xcorr         88      2
     NW    Delta MFCC xcorr      968     5
2    NW    Phoneme duration      1       1
3    NW    Pitch slope           1       1
4    NW    FAU rate              1       1
5    FS    FAU xcorr             1208    6
6    FS    Phoneme rate          1       1
7    FS    Pitch slope           1       1
8    FS    FAU rate              1       1

5.1 Gaussian Mixture Model
Gaussian staircase: To train the generative GMMs, we utilize the Gaussian staircase approach, in which each GMM is comprised of an ensemble of Gaussian classifiers [38]. The ensemble is derived from six partitions of the training data into different ranges of depression severity for low (Class 1) and high (Class 2) depression. Given a BDI range of 0 to 45, the Class 1 ranges for the six Gaussian classifiers are: 0-4, 0-10, 0-17, 0-23, 0-30, and 0-36, with the Class 2 ranges being the complement of these. The Gaussian classifiers comprise a single, highly regularized GMM classifier, with feature densities that smoothly increase in the direction of decreasing (Class 1) or of increasing (Class 2) levels of depression. Additional regularization of the densities is obtained by adding 0.1 to the diagonal elements of the normalized covariance matrices.

Subject-based adaptation: Individual variability in the relationships between features and BDI is partially accounted for within the GMMs using Gaussian-mean subject-based adaptation.
If one or more sessions in the Training set have the same subject ID as the Test subject and are in the same BDI-based partition, the mean of the Gaussian for that partition is assigned to the mean of the data from that subject only, rather than the mean of the data from all subjects within the partition [38].

5.2 Extreme Learning Machine
ELMs are used to provide a complementary discriminative approach for predicting depression level. The ELM is a feedforward neural network with a single hidden layer, in which the weights and biases of the nodes are randomly assigned. The number of hidden nodes and the activation function were empirically selected, and a ridge-regression output layer was used to map the transformed feature space into depression scores. This was done by solving an L2-norm regularized least squares problem. Feature Set 1 for the ELM uses a hidden layer with 395 nodes and a hyperbolic tangent activation function, while Feature Set 2 uses a network with 62 nodes and an inverse triangular basis activation function. These values were selected empirically.

The ELM was chosen in place of more traditional multilayer perceptron or deep neural network architectures for the advantages it provides with the AVEC audio-visual data. Due to the limited number of Training sessions, and the noisiness of multidimensional maps between features and BDI scores, gradient descent learning algorithms typically converge to highly suboptimal solutions. The use of a least squares constraint in the ELM avoids such a stepwise iterative procedure, and instead uses a matrix inverse operator to directly solve for the output mapping. An additional benefit of the ELM is that it tends to learn output weights with a small norm, thereby providing better generalization performance according to Bartlett's theory [1].

5.3 Predictors
Table 4 lists the eight Feature Sets used in the prediction system. A separate GMM classifier is used for each Feature Set, outputting a log-likelihood ratio score for Class 1 (Normal) and Class 2 (Depressed). Separate ELM classifiers are used for Feature Sets 1 and 2.
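As a concrete illustration, the Gaussian staircase of Section 5.1 can be sketched in Python. This is a minimal sketch, not the authors' implementation: the function names and feature dimensions are hypothetical, and each ensemble member is simplified to a single full-covariance Gaussian per class rather than a GMM, with the paper's 0.1 diagonal regularization applied to the normalized covariance.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Class 1 upper bounds for the six BDI partitions given in Section 5.1.
PARTITIONS = [4, 10, 17, 23, 30, 36]

def fit_gaussian(X, reg=0.1):
    """Fit one Gaussian, adding `reg` to the diagonal of the normalized
    (correlation) covariance before rescaling back, as in Section 5.1."""
    mu = X.mean(axis=0)
    cov = np.cov(X, rowvar=False)
    d = np.sqrt(np.diag(cov))
    scale = np.outer(d, d)
    cov_reg = (cov / scale + reg * np.eye(len(mu))) * scale
    return mu, cov_reg

def staircase_score(X_train, y_bdi, x_test):
    """Sum of log-likelihood ratios (high- vs. low-depression class)
    over the six-classifier ensemble."""
    score = 0.0
    for thr in PARTITIONS:
        lo, hi = X_train[y_bdi <= thr], X_train[y_bdi > thr]
        mu1, c1 = fit_gaussian(lo)   # Class 1: low depression
        mu2, c2 = fit_gaussian(hi)   # Class 2: high depression
        score += (multivariate_normal.logpdf(x_test, mu2, c2)
                  - multivariate_normal.logpdf(x_test, mu1, c1))
    return score
```

Because the six partitions overlap in their training data, the summed log-likelihood ratio increases smoothly with depression severity rather than at a single decision boundary.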
Initial BDI predictions are obtained from three Predictors, which use different combinations of the eight Feature Sets and two types of classifiers (Table 5). Within each Predictor, the classifier outputs from the Feature Sets are summed together. Following this, a univariate regression model is created from the Training set and applied to the classifier output from the Test data. The resulting univariate regression output is the initial BDI score prediction from each Predictor.

For Predictors 1 and 2, subject-based adaptation is then applied to adjust this initial prediction by correcting for consistent biases seen in the BDI Training set predictions of the same subject. If there are any Training sessions from a given Test subject, then the average Training set error from that subject is used to adjust the prediction, as follows. Let N denote the number of repeat sessions in the Training set, and let z_j and ẑ_j indicate the true and predicted BDI score, respectively, for the j-th repeat Training session. Then, the prediction ŷ is adjusted using the following equation, which contains empirically derived parameters:

    ỹ = ŷ + 0.9 · [Σ_{j=1..N} (z_j − ẑ_j)] / (N + 0.1)    (3)

5.4 Fusing Predictors
The outputs of the three Predictors are fused to create a final BDI prediction, using weights based on each Predictor's accuracy, quantified by its BDI correlation R_j in predicting BDI scores on the Training set:

    w_j = R_j^2 / (1 − R_j^2)    (4)

For Predictor 2 (Feature Sets 3–8), only those Training and Test sessions that contain valid data from all of its Feature Sets are used, resulting in 80 valid Training sessions and 45 valid Test sessions.

In developing our prediction system, we obtain separate estimates of performance for novel Test subjects and for repeat Test subjects, who have sessions in the Training set. This is done using non-repeat subjects and repeat subjects cross-validation evaluation. With non-repeat subjects evaluation, the system is trained only on subjects other than the one being tested.
With repeat subjects evaluation, the system is tested only on subjects who have multiple sessions in the Training set, and sessions from those subjects are included in training.
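The correlation-based fusion of Section 5.4 can be sketched in a few lines of Python. This is an illustrative sketch under our reading of Equation (4), w_j = R_j^2 / (1 − R_j^2); the function names are hypothetical:

```python
import numpy as np

def fusion_weight(r):
    """Predictor weight from its Training-set BDI correlation R (Eq. 4)."""
    return r ** 2 / (1.0 - r ** 2)

def fuse_predictions(predictions, correlations):
    """Weighted sum of Predictor outputs, normalized by the sum of the
    weights, as described for the intermediate and final fusion steps."""
    w = np.array([fusion_weight(r) for r in correlations])
    return float(np.dot(w, predictions) / w.sum())
```

A Predictor with Training-set correlation R = 0.8 thus receives weight 16/9 ≈ 1.78, versus 1/3 for R = 0.5, so the fused score is pulled toward the more reliable Predictor.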

Intermediate prediction: The BDI correlations in Equation (4) are computed separately from the non-repeat subjects and repeat subjects evaluations, yielding separate non-repeat subject and repeat subject weights for Predictors 1 and 2. For each Test session, the appropriate weights are applied to the outputs of Predictors 1 and 2, and the weighted sum is then normalized by the sum of the weights. Nineteen of the 50 Test sessions contain repeat subjects.

Final prediction: The procedure described above for obtaining a fused output from Predictors 1 and 2 is then repeated to fuse with the output from Predictor 3, again with separate non-repeat subject and repeat subject weights.

6. RESULTS
The prediction system described above was used for our best submission in the AVEC 2014 Challenge, with test RMSE = 8.12 and MAE = 6.31. These results are an improvement on the winning submission in the AVEC 2013 competition, which achieved test RMSE = 8.50 and MAE = 6.52. Last year's result was obtained using a read passage (Homo Faber) that was much longer than the NW passage made available this year. Our introduction this year of new vocal and facial features, as well as improved machine learning techniques, has resulted in improved performance despite the relative lack of data in this year's AVEC challenge. In our five submissions for AVEC 2014 we investigated various Feature Set and Predictor combinations, resulting in test RMSE values of (in order): 8.71, 9.08, 8.12, 8.36, and 8.27.

Table 5. Three Predictors consisting of different combinations of Feature Sets and Classifiers.

Predictor   Feature Sets   Classifier   # Train   # Test   Regression Order
1           1–2            GMM          100       50       4
2           3–8            GMM          80        45       4
3           1–2            ELM          100       50       3

7. DISCUSSION
In addition to the introduction of several novel feature combinations, this work is technically significant because it demonstrates the benefits of shifting from basic low-level features to high-level features that characterize and emphasize interactions and timing.
These benefits may be due to the information inherent in a holistic analysis in which neurocognitive changes are manifested not just in subsystems of expression but in the degraded coordination among those subsystems. The benefits may also be due to increased robustness to channel effects, because of little direct dependence on speech spectral magnitude. Clinically, the ability to achieve a low test RMSE has important implications for automatic and frequent at-home monitoring to assess patient state and quickly adapt treatment.

Our future work will emphasize three analysis branches. The first branch is continued analysis of the vocal and facial modalities. For vocal analysis, we are considering more sophisticated versions of prosodic characterization that jointly consider pitch and intensity. For facial analysis, we are considering the emotional state classification outputs provided by CERT. Both feature sets may provide additional insight into the arousal and valence of subjects. Second, we will continue to improve feature fusion methodologies to handle individually noisy or missing data. Third, we seek a unified neurocognitive model which links the observed features to mechanistic changes in the brain that correspond to neural circuit changes associated with depression. This holy grail of neurocognitive research has the potential to relate the feature sets we have identified to neurological substrates.

8. CONCLUSION
In summary, we have presented a multimodal analysis pipeline that exploits complementary information in audio and video signals for estimating depression severity. We investigated how speech source, system, and prosody features, along with facial action unit features, correlate with MDD using the AVEC 2014 database, consisting of a read passage and a free-response speech segment from subjects with varying depression levels according to their self-reported Beck depression inventory assessment.
Specifically, we selected speech features that reflect movement and coordination in articulation from formant frequencies and delta mel-cepstra, aspects of the voice source including degree of source irregularity (CPP and HNR), and changes in phonetic durations and pitch slopes as properties of prosody. We explored how certain facial-expression-based features correlated with depression severity by showing the importance of coordination across FAUs. These coordination measures were obtained from auto- and cross-correlations of the multichannel speech and video signals. We also obtained fused phoneme duration features, and applied similar fusion techniques to obtain novel pitch slope and FAU rate features. Finally, combining GMM classifiers created from a Gaussian staircase training procedure with ELM classifiers, we achieved a test RMSE of 8.12.

9. ACKNOWLEDGMENTS
The authors thank Dr. Nicolas Malyska, Dr. Charles Dagli, and Bea Yu of MIT Lincoln Laboratory for their contributions to the software development required for the phoneme-based rate and facial action unit features.

10. REFERENCES
[1] Bartlett, P.L. 1998. The sample complexity of pattern classification with neural networks: the size of the weights is more important than the size of the network. Information Theory, IEEE Transactions on. 44, 2 (1998), 525–536.
[2] Boersma, P. 1993. Accurate short-term analysis of the fundamental frequency and the harmonics-to-noise ratio of a sampled sound. Proceedings of the Institute of Phonetic Sciences. 17, (1993), 97–110.
[3] Darby, J.K., Simmons, N. and Berger, P.A. 1984. Speech and voice parameters of depression: A pilot study. Journal of Communication Disorders. 17, 2 (1984), 75–85.
[4] Dejonckere, P. and Lebacq, J. 1996. Acoustic, perceptual, aerodynamic and anatomical correlations in voice pathology. ORL. 58, 6 (1996), 326–332.
[5] Ekman, P., Friesen, W.V. and Ancoli, S. 1980. Facial signs of emotional experience. Journal of Personality and Social Psychology. 39, 6 (1980), 1125.
[6] Fava, M. and Kendler, K.S. 2000. Major depressive disorder. Neuron. 28, 2 (2000), 335–341.
[7] France, D.J., Shiavi, R.G., Silverman, S., Silverman, M. and Wilkes, D.M. 2000. Acoustical properties of speech as indicators of depression and suicidal risk. Biomedical Engineering, IEEE Transactions on. 47, 7 (2000), 829–837.
[8] Gaebel, W. and Wölwer, W. 1992. Facial expression and emotional face recognition in schizophrenia and depression. European Archives of Psychiatry and Clinical Neuroscience. 242, 1 (1992), 46–52.

[9] Greden, J.F. and Carroll, B.J. 1981. Psychomotor function in affective disorders: An overview of new monitoring techniques. The American Journal of Psychiatry. (1981).
[10] Huang, G.-B., Zhou, H., Ding, X. and Zhang, R. 2012. Extreme Learning Machine for Regression and Multiclass Classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics). 42, 2 (Apr. 2012), 513–529.
[11] Helfer, B.S., Quatieri, T.F., Williamson, J.R., Keyes, L., Evans, B., Greene, W.N., Van, T., Lacirignola, J., Shenk, T., Talavage, T., Palmer, J. and Heaton, K. 2014. Articulatory Dynamics and Coordination in Classifying Cognitive Change with Preclinical mTBI. Interspeech (2014).
[12] Helfer, B.S., Quatieri, T.F., Williamson, J.R., Mehta, D.D., Horwitz, R. and Yu, B. 2013. Classification of depression state based on articulatory precision. (2013).
[13] Heman-Ackah, Y.D., Heuer, R.J., Michael, D.D., Ostrowski, R., Horman, M., Baroody, M.M., Hillenbrand, J. and Sataloff, R.T. 2003. Cepstral peak prominence: a more reliable measure of dysphonia. Annals of Otology, Rhinology and Laryngology. 112, 4 (2003), 324–333.
[14] Heman-Ackah, Y.D., Michael, D.D. and Goding Jr, G.S. 2002. The relationship between cepstral peak prominence and selected parameters of dysphonia. Journal of Voice. 16, 1 (2002), 20–27.
[15] Hillenbrand, J. and Houde, R.A. 1996. Acoustic Correlates of Breathy Vocal Quality: Dysphonic Voices and Continuous Speech. Journal of Speech, Language, and Hearing Research. 39, 2 (1996), 311–321.
[16] Huang, G.-B., Zhu, Q.-Y. and Siew, C.-K. 2006. Extreme learning machine: Theory and applications. Neurocomputing. 70, 1–3 (Dec. 2006), 489–501.
[17] Jackson, P.J. and Shadle, C.H. 2000. Frication noise modulated by voicing, as revealed by pitch-scaled decomposition. The Journal of the Acoustical Society of America. 108, 4 (2000), 1421–1434.
[18] Jackson, P.J. and Shadle, C.H. 2000. Performance of the pitch-scaled harmonic filter and applications in speech analysis. Acoustics, Speech, and Signal Processing, 2000. ICASSP '00. Proceedings.
2000 IEEE International Conference on (2000), 1311–1314.
[19] Jackson, P.J. and Shadle, C.H. 2001. Pitch-scaled estimation of simultaneous voiced and turbulence-noise components in speech. Speech and Audio Processing, IEEE Transactions on. 9, 7 (2001), 713–726.
[20] Littlewort, G., Whitehill, J., Wu, T., Fasel, I., Frank, M., Movellan, J. and Bartlett, M. 2011. The computer expression recognition toolbox (CERT). Automatic Face & Gesture Recognition and Workshops (FG 2011), 2011 IEEE International Conference on (2011), 298–305.
[21] Low, L.-S., Maddage, M., Lech, M., Sheeber, L. and Allen, N. 2010. Influence of acoustic low-level descriptors in the detection of clinical depression in adolescents. Acoustics, Speech and Signal Processing (ICASSP), 2010 IEEE International Conference on (2010), 5154–5157.
[22] Lucey, P., Cohn, J.F., Kanade, T., Saragih, J., Ambadar, Z. and Matthews, I. 2010. The Extended Cohn-Kanade Dataset (CK+): A complete dataset for action unit and emotion-specified expression. Computer Vision and Pattern Recognition Workshops (CVPRW), 2010 IEEE Computer Society Conference on (2010), 94–101.
[23] Maryn, Y., Corthals, P., Van Cauwenberge, P., Roy, N. and De Bodt, M. 2010. Toward improved ecological validity in the acoustic measurement of overall voice quality: combining continuous speech and sustained vowels. Journal of Voice. 24, 5 (2010), 540–555.
[24] Mehta, D.D., Rudoy, D. and Wolfe, P.J. 2012. Kalman-based autoregressive moving average modeling and inference for formant and antiformant tracking. The Journal of the Acoustical Society of America. 132, 3 (Sep. 2012), 1732–1746.
[25] Moore, E., Clements, M., Peifer, J. and Weisser, L. 2003. Analysis of prosodic variation in speech for clinical depression. Engineering in Medicine and Biology Society, 2003. Proceedings of the 25th Annual International Conference of the IEEE (2003), 2925–2928.
[26] Mundt, J.C., Snyder, P.J., Cannizzaro, M.S., Chappie, K. and Geralts, D.S. 2007.
Voice acoustic measures of depression severity and treatment response collected via interactive voice response (IVR) technology. Journal of Neurolinguistics. 20, 1 (Jan. 2007), 50–64.
[27] Ozdas, A., Shiavi, R.G., Silverman, S.E., Silverman, M.K. and Wilkes, D.M. 2004. Investigation of vocal jitter and glottal flow spectrum as possible cues for depression and near-term suicidal risk. Biomedical Engineering, IEEE Transactions on. 51, 9 (2004), 1530–1540.
[28] Quatieri, T.F. and Malyska, N. 2012. Vocal-Source Biomarkers for Depression: A Link to Psychomotor Activity. Interspeech (2012).
[29] Reynolds, D.A., Quatieri, T.F. and Dunn, R.B. 2000. Speaker verification using adapted Gaussian mixture models. Digital Signal Processing. 10, 1 (2000), 19–41.
[30] Rudoy, D., Spendley, D.N. and Wolfe, P.J. 2007. Conditionally linear Gaussian models for estimating vocal tract resonances. INTERSPEECH (2007), 526–529.
[31] Schwartz, G.E., Fair, P.L., Salt, P., Mandel, M.R. and Klerman, G.L. 1976. Facial expression and imagery in depression: an electromyographic study. Psychosomatic Medicine. 38, 5 (1976), 337–347.
[32] Shen, W., White, C.M. and Hazen, T.J. 2009. A comparison of query-by-example methods for spoken term detection. DTIC Document.
[33] Sturim, D.E., Torres-Carrasquillo, P.A., Quatieri, T.F., Malyska, N. and McCree, A. 2011. Automatic Detection of Depression in Speech Using Gaussian Mixture Modeling with Factor Analysis. Interspeech (2011), 2981–2984.
[34] Trevino, A.C., Quatieri, T.F. and Malyska, N. 2011. Phonologically-based biomarkers for major depressive disorder. EURASIP Journal on Advances in Signal Processing. 2011, 1 (2011), 1–18.
[35] Valstar, M., Schuller, B., Smith, K., Almaev, T., Eyben, F., Krajewski, J., Cowie, R. and Pantic, M. 2013. AVEC 2014 – 3D Dimensional Affect and Depression Recognition Challenge. (2013).
[36] Williamson, J.R., Bliss, D.W. and Browne, D.W. 2011. Epileptic seizure prediction using the spatiotemporal correlation structure of intracranial EEG. Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on (2011), 665–668.
[37] Williamson, J.R., Bliss, D.W., Browne, D.W. and Narayanan, J.T. 2012. Seizure prediction using EEG spatiotemporal correlation structure. Epilepsy & Behavior. 25, 2 (2012), 230–238.
[38] Williamson, J.R., Quatieri, T.F., Helfer, B.S., Horwitz, R., Yu, B. and Mehta, D.D. 2013. Vocal biomarkers of depression based on motor incoordination. Proceedings of the 3rd ACM International Workshop on Audio/Visual Emotion Challenge (2013), 41–48.
[39] Yu, B., Quatieri, T.F., Williamson, J.R. and Mundt, J.C. 2014. Prediction of cognitive performance in an animal fluency task based on rate and articulatory markers. Interspeech (2014).