Nonlinear signal processing and statistical machine learning techniques to remotely quantify Parkinson's disease symptom severity A. Tsanas, M.A. Little, P.E. McSharry, L.O. Ramig 9 March 2011
Project background Neurological disorders claim lives at an epidemic rate worldwide Parkinson s disease (PD) is the second most common neurodegenerative disorder to Alzheimer Incidence rate: 20/100,000 annually 1% affected over the age of 60 years (age most important risk factor) Aging population more PD patients Drugs & surgery can alleviate symptoms but there is no known cure
What we already know Speech impairment is common in PD (90%) Evidence suggests degradation with PD progression Disease symptom tracking: costly, subjective, time-consuming, requires patient s physical presence. Symptoms clinician maps to clinical metric (inter-rater variability) Unified Parkinson s Disease Rating Scale (UPDRS) Range: motor-updrs 0 108, total-updrs 0 176 (44 sections)
Unified Parkinson s Disease Rating Scale (UPDRS) comprises three components and 44 sections in total, each section spans the range 0-4 Component 1 Mentation, behavior and mood 4 sections (1-4) Component 2 Activities of daily living 13 sections (5-17) Component 3 Motor (motor-updrs) 27 sections (18-44) Includes mentation, thought disorder, depression, and motivation/initiative Ability to complete daily tasks unassisted, e.g. dressing, walking, writing Muscle problems e.g. tremor, rigidity, posture, stability, bradykinesia Section 5: Speech the clinician assesses whether the subject s vocal output is understandable during casual discussion. Section 18: Speech the clinician assesses whether the subject s vocal output is expressive during casual discussion.
What this study adds Novel mapping of speech dysphonias to UPDRS Accurate telemonitoring of PD progression Objective machine learning implementations Remote monitoring, less time-consuming Technology enabling large scale clinical trials Facilitates design and testing of novel drug treatments
Telemedicine: the dawn of a new era
Data Untreated idiopathic PD (onset < 5 years at baseline) 42 PD subjects (28 males) Patient follow-up: UPDRS at baseline, 3-month and 6-month Linear interpolation, obtain weekly UPDRS estimates 5875 phonations in total, approximately 140 phonations per subject 6 phonations weekly, 4 phonations comfortable pitch, and 2 loud Data partitioning: 4,010 signals for males and 1,865 signals for females
Methodology of this work Speech signal Extract feature vector Feature selection/transformation Statistical machine learning (mapping features UPDRS) Report results
Features Classical dysphonia measures (Tsanas et al. IEEE TBME 2010a) (capture fundamental frequency changes, amplitude, variability...) Improved classical dysphonia measures (Tsanas et al. ICASSP 2010b) (power transformation of classical schemes leads to improved results) Wavelets, Entropies & Energy (Tsanas et al. NOLTA 2010c) (wavelet decomposition and application of entropy and linear/nonlinear energy concepts) Nonlinear dynamical systems theory (Tsanas et al. JRSI 2010d)
Novel features (background) Pathological voices exhibit high frequency noise Asynchronous excitation due to incomplete vocal fold closure Loss of signal power & increase of noise power signal to noise ratio concepts Unstable control of voice production mechanism, not confined only to vocal folds Tentative belief that voice disorders affect differently frequency bands Nonlinearities in voice production
Novel features (algorithms) Empirical mode decomposition excitation ratio (first components represent noise, the rest components the signal) Vocal fold excitation ratio (detect glottal pulses and scan the frequency range using 500 Hz bands, then 1 Hz 2.5 khz denotes signal, the rest is noise - heuristic finding in this study) Glottis quotient (use DYPSA algorithm to detect glottal pulses and quantify periodicity concept invoked from the classical dysphonia measures)
Empirical mode decomposition excitation ratio (EMD-ER) Hilbert-Huang transform decomposes the signal in its components known as intrinsic mode functions The first intrinsic mode function contains the most highfrequency component in the signal noise in the signal Subsequent components have lower frequencies Experiment to find appropriate ratios of components that denote signal to noise ratio
Vocal fold excitation ratio (VFER) Detect glottal pulses using a robust F0 algorithm Scan the frequency range using 500 Hz bands Optimize the frequency range of signal and noise by experimentation Compute ratios, entropy values of the signal to noise ratio signals
Feature selection Curse of dimensionality Reducing number of features enables a) improved performance, b) more accurate inference of the underlying characteristics of the modelled system LASSO, elastic net, maximum relevance minimum redundancy Recently we have proposed the relevance, redundancy and complementarity trade-off (RRCT)
Statistical mapping y = f (X), X Features and y UPDRS Random forests, powerful nonparametric learner Input into Random Forests the feature subsets selected by the feature selection algorithms One standard error rule to select parsimonious model
Test methods Use 10-fold cross validation with 100 repetitions for confidence Report 1 MAE yi f( xi) L Q, L is number of samples denoted by Q which contains the indices of samples in each cross validation repetition Statistical significance tests (Kolmogorov-Smirnov, Wilcoxon)
Results Table 1: Most strongly associated features with motor-updrs and total-updrs for males Measure VFER- NSR TKEO DFA 7 th MFCC coef 6 th MFCC coef F0,Sun F0,expected Log energy 4 th MFCC coef 0 th MFCC coef 8 th MFCC coef 8 th delta MFCC coef Description Ratio of the sum of the log-transformed mean TKEO of the band-pass signals for frequencies >2.5 khz to the sum of the mean TKEO of the band-pass signals for frequencies <2.5 khz Characterizes the extent of turbulent noise, quantifying its stochastic self-similarity (Little et al. 2007) 7 th Mel Frequency Cepstral Coefficient (Brookes 2006) 6 th Mel Frequency Cepstral Coefficient (Brookes 2006) Mean difference of the cycle-to-cycle estimate (extracted using Sun s algorithm) and the average expected in age- and sex- matched healthy controls Estimate of the logarithmic energy (Brookes 2006) 4 th Mel Frequency Cepstral Coefficient (Brookes 2006) 0 th Mel Frequency Cepstral Coefficient (Brookes 2006) 8 th Mel Frequency Cepstral Coefficient (Brookes 2006) 8 th delta Mel Frequency Cepstral Coefficient (First derivative of 8 th MFCC) (Brookes 2006) Motor-UPDRS relevance and correlation MI Spearman R Total-UPDRS relevance and correlation MI Spearman R 0.105 0.159 0.132 0.187 0.078-0.162 0.115-0.205 0.079-0.066 0.108 0.0070 0.106-0.277 0.102-0.294 0.088 0.097 0.101 0.018 0.090 0.149 0.099 0.169 0.088-0.082 0.098-0.061 0.079 0.171 0.099 0.197 0.106 0.276 0.095 0.259 0.073 0.181 0.093 0.205
Results Table 2: Most strongly associated features with motor-updrs and total-updrs for females Measure Description Motor-UPDRS relevance and correlation MI Spearman R Total-UPDRS relevance and correlation MI Spearman R Std F0,mixture Standard deviation of the extracted 0.205 0.475 0.216 0.470 Std F0, Rapt Standard deviation of the extracted 0.174 0.437 0.195 0.434 GQ closed 0 th MFCC coef F0,Praat F0,expected 1 st MFCC coef Log energy F0,mixture F0, expected Standard deviation of the duration that the vocal folds remain closed 0.211 0.236 0.195 0.250 0 th delta Mel Frequency Cepstral Coefficient (Brookes 2006) 0.200-0.327 0.187-0.344 Mean difference of the cycle-to-cycle estimate (extracted using Praat s algorithm) and the average expected in age- and sex- matched healthy controls 0.198 0.103 0.176 0.034 1 st delta Mel Frequency Cepstral Coefficient (Brookes 2006) 0.135-0.047 0.170-0.031 Estimate of the logarithmic energy (Brookes 2006) 0.179-0.458 0.170-0.487 Mean difference of the cycle-to-cycle estimate (extracted using the mixture algorithm) and the average expected in age- and sex- matched healthy controls 0.181 0.019 0.164-0.055 F0, Rapt F0, expected TKEO (F0), prc5 Mean difference of the cycle-to-cycle estimate (extracted using Rapt s algorithm) and the average expected in age- and sex- matched healthy controls 5 th percentile of the TKEO of the fundamental frequency values, obtained with the mixture algorithm 0.173 0.022 0.158-0.054 0.177-0.411 0.153-0.369
Results Table 3: Selected feature subset using the LASSO (show only 17 due to lack of space) MALES (33 dysphonia measures) FEMALES (33 dysphonia measures) Dysphonia measure Motor UPDRS MI R Total UPDRS MI R Dysphonia measure Motor UPDRS MI R Total UPDRS 6 th MFCC coef 0.106-0.277 0.102-0.294 Log energy 0.179-0.458 0.170-0.487 8 th MFCC coef 0.106 0.276 0.095 0.259 Std F0, Rapt 0.205 0.475 0.216 0.470 VFER SNR,TKEO 0.077-0.076 0.077-0.108 10 th MFCC coef 0.112 0.239 0.107 0.250 VFER mean 0.076 0.154 0.089 0.13 PPE 0.118 0.436 0.105 0.396 8 th delta MFCC 0.073 0.181 0.093 0.205 12 th MFCC coef 0.094 0.204 0.088 0.261 12 th delta MFCC 0.048 0.172 0.054 0.167 IMF SNR,TKEO 0.075-0.127 0.067-0.067 0 th MFCC coef 0.079 0.171 0.097 0.197 8 th MFCC coef 0.114-0.341 0.092-0.255 2 nd MFCC coef 0.082-0.149 0.084-0.182 11 th MFCC coef 0.078 0.127 0.100 0.187 3 rd MFCC coef 0.071 0.091 0.077 0.067 IMF NSR,SEO 0.099-0.117 0.065-0.058 2 nd delta MFCC 0.047 0.130 0.050 0.125 GNE mean 0.090 0.035 0.086-0.062 3 rd delta MFCC 0.046 0.169 0.054 0.161 3 rd delta MFCC 0.070 0.149 0.064 0.119 Std F0, Sun 0.046 0.144 0.050 0.129 HNR std 0.072 0.224 0.066 0.195 9 th MFCC coef 0.075-0.194 0.073-0.153 5 th MFCC coef 0.113 0.173 0.115 0.188 7 th MFCC coef 0.079-0.066 0.108 0.007 2 nd delta MFCC 0.055 0.172 0.056 0.206 4 th delta MFCC 0.041 0.001 0.044 0.007 GNE SNR,TKEO 0.036 0.038 0.042 0.033 GNE SNR,TKEO 0.023 0.074 0.024 0.089 10 th delta MFCC 0.071-0.064 0.066-0.079 Shimmer A0,abs 0.042-0.079 0.058-0.135 GQ open 0.061 0.256 0.057 0.248 MI R
Results Table 4: Out of sample mean absolute error (MAE) for motor UPDRS and total UPDRS Motor UPDRS (males) Total UPDRS (males) Motor UPDRS (females) Total-UPDRS (females) 1.62 0.17 1.96 0.23 1.72 0.16 2.20 0.21 Inter-rater variability: 4-5 UPDRS points Critically important difference ~ 2.5 motor-updrs & ~ 4.5 total-updrs Results are statistically significant (p<0.001) with respect to the mean UPDRS and baseline UPDRS using the KS & Wilcoxon significance tests
Results 6-month UPDRS-tracking for a male and a female subject
Conclusions Speech signals convey clinically useful information Fast, accurate, remote, objective monitoring of PD is shown to be possible using simply speech Results are better than the inter-rater variability (which is about 5 UPDRS points). Potentially these results could be further improved in conjunction with other PD recorded signals (e.g. tremor data).
Current and future work A robust feature selection algorithm which outperforms popular filters (in December we submitted a paper in JMLR) PD metrics (about to submit a paper in Movement Disorders) Apply novel features for PD versus controls (writing for IEEE TBME) Currently extending feature selection algorithms using geometrical concepts A new machine learning approach for speech F0 estimation extending our JRSI suggestion (expecting ground truth signals from collaborators) Exploring a new approach on random forests
References A. Tsanas, M.A. Little, P.E. McSharry, L.O. Ramig: Accurate telemonitoring of Parkinson s disease progression by non-invasive speech tests, IEEE Transactions Biomedical Engineering, Vol. 57, pp. 884-893, 2010a A. Tsanas, M.A. Little, P.E. McSharry, L.O. Ramig: Enhanced classical dysphonia measures and sparse regression for telemonitoring of Parkinson s disease progression, ICASSP, pp. 594-597, Dallas, Texas, US, 14-19 March 2010b A. Tsanas, M.A. Little, P.E. McSharry, L.O. Ramig: New nonlinear markers and insights into speech signal degradation for effective tracking of Parkinson s disease symptom severity", International Symposium on Nonlinear Theory and its Applications (NOLTA), pp. 457-460, Krakow, Poland, 5-8 September 2010c (invited) A. Tsanas, M.A. Little, P.E. McSharry, L.O. Ramig: Remote tracking of Parkinson s disease progression by extracting novel dysphonia patterns from speech signals, Journal ofthe Royal Society Interface, (in press) 2010d