An Investigation of MDVP Parameters for Voice Pathology Detection on Three Different Databases

Size: px

Start display at page:

Download "An Investigation of MDVP Parameters for Voice Pathology Detection on Three Different Databases"

Anna Blake
5 years ago
Views:

INTERSPEECH 2015 An Investigation of MDVP Parameters for Voice Pathology Detection on Three Different Databases Ahmed Al-nasheri 1, Zulfiqar Ali 1,2, Ghulam Muhammad 1, and Mansour Alsulaiman 1 1

1 INTERSPEECH 2015 An Investigation of MDVP Parameters for Voice Pathology Detection on Three Different Databases Ahmed Al-nasheri 1, Zulfiqar Ali 1,2, Ghulam Muhammad 1, and Mansour Alsulaiman 1 1 Digital Speech Processing Group, Department of Computer Engineering, College of Computer and Information Sciences King Saud University, Riyadh 11543, Saudi Arabia 2 Centre for Intelligent Signal and Imaging Research (CISIR), Department of Electrical and Electronic Engineering Universiti Tekhnologi PETRONAS, Tronoh 31750, Perak, Malaysia a.alnasheri@yahoo.com, {zuali, ghulam, msuliman}@ksu.edu.sa Abstract In this paper, an investigation of Multi-Dimensional Voice Program (MDVP) parameters to automatically detect voice pathology in three different databases was conducted. MDVP parameters are very popular acoustic analysis among physician / clinician to detect voice pathology. The main objective of the paper is to find out the most prominent MDVP parameters irrespective to the databases used. In this study, three different databases from three distinct languages were used. The databases are Arabic voice pathology database (AVPD), Massachusetts Eye and Ear Infirmary (MEEI) (English database), and Saarbruecken Voice Database (SVD) (German database). Only the sustained vowel /a/ was used in the study. Fisher discrimination ratio (FDR) was applied to rank the parameters. Support vector machine (SVM) was used to perform the detection process. The experimental results demonstrated that there was clear difference of the performance of the MDVP parameters using these databases. The highly ranked parameters were also different from one database to another. The accuracies that achieved are varied from one database to another with the same number of MVPD parameters. The best accuracies obtained by using the three highest MDVP parameters arranged according to FDR were 99.68%, 88.21% and for SVD, MEEI and AVPD, respectively. Index Terms: MDVP parameters, AVPD, SVD, MEEI, SVM, Fisher discrimination ratio, voice pathology detection. 1. Introduction Voice pathologies affect the vocal folds, and these pathologies (disorder) produce irregular vibrations in the vocal folds due to the malfunctioning of the voice box. Vocal fold pathologies exhibit variations in a vibratory cycle of the vocal folds due to their incomplete closure. The voice disorder also changes the shape of the vocal tract and produces irregularities in spectral properties [1]. In addition, voice disorders affect the vocal fold vibration depending on the type of disorder and the location of the disease on the vocal folds that make them produce different basic tones. The number of dysphonic patients having different types of voice disorders has been increased significantly. In the United States, approximately 7.5 million people have vocal difficulty [2]. It has been found that 15% of the total visitors to the King Abdul Aziz University Hospital, Riyadh complain from a voice disorder [3]. The complications caused by a voice problem in a teaching professional are significantly greater than in a nonteaching professional. Studies revealed that, in the U.S., the prevalence of voice disorders during a lifetime is 57.7% for teachers and 28.8% for non-teachers [4]. Approximately, 33% of male and female teachers in the Riyadh area suffer from voice disorders [5]. At Communication and Swallowing Disorders Unit, King Abdul Aziz University Hospital, a high volume of voice disorder cases is examined (almost 760 cases per annum) in individuals with various professional and etiological backgrounds. The use of computers to detect or identify pathological problems in speech is a non-invasive method, which is advancing with time. In the last decade, much research has been done on the automatic detection of vocal fold disorders, and these tasks continue to require further investigation due to the lack of standard automatic diagnosing approaches/equipment for voice disorders. Pathology detection is the first crucial step to correctly diagnose and manage voice disorders. The use of the objective assessment that includes acoustical analysis is independent of human bias and can assess the voice quality reliably by relating certain parameters to vocal fold behavior. On the other hand, subjective measurement of voice quality is based on individual experience, which may vary. Automatic voice pathology detection can be accomplished by various types of features, which can be obtained by the longterm and short-term signal analysis. The long-term parameters can be derived by acoustic analysis [6], [7] of speech, and the short term parameters can be calculated by linear predictive coefficients (LPC) [8], [9], linear predictive cepstral coefficients (LPCC) [10], Mel-frequency cepstral coefficients (MFCC) [11], [12] etc. Different pattern matching techniques such as Gaussian mixture model (GMM) [13], [14], hidden Markov model (HMM) [15], support vector machine (SVM) [16], artificial neural networks (ANN) [17] etc. have been used to differentiate between disorder and normal samples. Multiple long-term acoustic features, namely, pitch, shimmer, jitter, APQ (amplitude perturbation quotient), PPQ (pitch perturbation quotient), HNR (harmonic to noise ratio), NNE (normalized noise energy), VTI (voice turbulence index), SPI (soft phonation index), FATR (frequency amplitude tremor), and the glottal to noise excitation ratio (GNE) are frequently used to diagnose voice pathology (referred in [14] as [2]-[12]). Furthermore, jitter and shimmer capture the vocal fold vibration characteristics for both pathological and normal people, and both parameters are widely used for clinical and scientific diagnosis [18]. Seven acoustic parameters, including shimmer Copyright 2015 ISCA 2952 September 6-10, 2015, Dresden, Germany

2 and jitter, are extracted by means of an iterative residual signal estimator in Rosa et al. [19], and jitter provided 54.8% pathology detection accuracy between 21 pathologies. Thirtythree different long-term acoustic parameters with their definitions, derived from Multi-Dimensional Voice Program (MDVP), are listed in Arjmandi et al. [20]. Twenty-two acoustic parameters are selected from the list, and they were extracted from the voice samples of the Massachusetts Eye and Ear Infirmary (MEEI) database. In this study, 50 dysphonic patients and 50 normal persons were used for the detection. The 22 parameters are calculated for each sample and fed to six different classifiers to compare their accuracies. Two feature reduction techniques are also used before applying classification methods. Binary classifier SVM has shown the best results compared to other classifiers, and its recognition rate is 94.26%. In [21], MFCC and six acoustic parameters jitter, shimmer, NHR, SPI, APQ and RAP (Relative Average Perturbation) are extracted, and the results are compared with the NN-based voice pathology detection system [22]. Sáenz- Lechón et al compared between their proposed parameters based on wavelet transform and some of the MDVP parameters to discriminate between pathological and normal voices [23]. To ensure the reliability of the acoustic MDVP parameters, some of these parameters were compared to the same parameters which are extracted using Praat and the result showed that there was no significant different between the two computer software [24]. Recently, MPEG-7 audio descriptors and multi-directional regression based features are used in voice pathology detection and found to have good accuracy [26, 27]. Another recent study investigates the most discriminative frequency region for voice pathology detection [28]. In general, MDVP parameters have a good ability to discriminate between normal and pathological voices like other tools that used to extract acoustic parameters such as WPCVox [25]. In this paper, the well-known MDVP parameters are used in three different databases, which are AVPD, MEEI, and SVD, to detect voice pathology. MDVP parameters are commonly used by the physician / clinician to assess the voice pathology; however, MDVP is commercial software, and it includes parameters of voice quality measurement. The rest of the paper is organized as follows: section 2 provides overview of the speech databases. Section 3 presents the methodology and tools using in this study; Section 4 gives experimental setup; Section 5 presents results and discussion, and finally, Section 6 draws some conclusions. 2. Databases 2.1. Arabic voice pathology database (AVPD) The speech samples collected in different sessions at Communication and Swallowing Disorders Unit [4], King Abdul Aziz University Hospital, Riyadh, Saudi Arabia, by experienced phoneticians in a soundproof room using a standardized recording protocol. The database collection is one of the major tasks of the undergoing project funded by National Plans for Science and Technology (NPST), Saudi Arabia, for the duration of two years. The protocol of the database designed such that it should avoid different shortcomings of MEEI database [24]. The project records the sustained vowels as well as the speech of patients having vocal fold disorders and the same of normal persons after clinically assessing the persons. The recording has different types of texts; three vowels with onset and offset information, isolated words containing Arabic digits and some common words, and continuous speech. The selected text covers all Arabic phonemes. All speakers record three utterances of each vowel /a/, /u/ and /i/, while, isolated words and continuous speech are recorded once to avoid making a burden on patients. The sampling frequency of the database is 50 khz, and the speech is recorded by using Kay Pentax computerized speech lab (CSL Model 4300) Massachusetts Eye and Ear Infirmary (MEEI) voice disorder database This database developed by MEEI Voice and Speech Lab. It includes more than 1,400 voiced samples of sustained vowel /a/ and first 12 seconds of Rainbow Passage. It is openly commercialized by Kay Elemetrics [29]. It was recorded in two different environments. The sampling frequency for normal samples is 50 khz while that of the pathological samples is 25 khz or 50 khz. It is used in most of the studies of voice pathology detection and classification Saarbruecken voice database (SVD) The SVD is a freely downloadable database [30]. It has been recorded by the Institute of Phonetics of Saarland University. This database contains sustained vowels /a/, /i/ and /u/ with different intonations: normal, low, high and low-high-low and a spoken sentence in German Guten Morgen, wie geht es Ihnen? that mean in English Good morning, how are you? is included. Very few studies of voice pathology detection have been done on this database. 3. Method Various samples of normal and three distinct voice disorders were taken from three different databases as shown in Table 1. We chose these disorders because only these disorders are common in the three databases. The numbers of male and female speakers are shown, respectively, inside the parenthesis in Table 1. Table 1. Normal and pathological samples from three different databases. Data base AVPD MEEI SVD Normal 118 (93-25) 53 (21, 32) 262 ( ) Pathological Cysts Paraly. Polyp (7 6) (16-16) (14-16) (6-4) (38-32) (8-7) (5-1) (121-73) (19-25) 244 In the experiments, we only used samples for sustained vowel /a/. Each voice sample has a total of 33 MDVP acoustic parameters. The MDVP parameters of these samples were extracted using MDVP [31] kaypentax. This program extracts 33 parameters, but we only selected 22 parameters out of 33 parameters because the rest of the parameters do not reflect voice quality, or they are not reported for some voices. In this study, fisher discriminative ratio (FDR) was used between two classes, one from normal and the other from pathological, for the whole parameters in each database individually. The purpose of using FDR is to find which features contribute higher to detection of pathologies, and it can be calculated as shown in (1). 2 ( N P ) FDR where, i 1,2,3,...22 (1) i 2 2 N P 2953

3 where µn and µp represent the mean for classes of normal and pathological samples, respectively, while σn and σp represent the same but as variances, and where i represents the feature number. In the experiments, the features were fed to support vector machine for the decision making whether the subject is normal or pathological. To ensure from the accuracy we got, every experiment was repeated ten times and then we took the average to report the results. 4. Experiment For each database, four different experiments were performed with various numbers of parameters. First, we selected the 22 parameters that were used and defined in [31]. We performed the experiments with the 22 parameters from these databases individually. After that, we used FDR and sorted these parameters in descending order according to the fisher ratio score. We chose the first top 10 parameters, and performed the experiments with these parameters. Next, we chose the three top parameters from each database and performed the experiments with them. To develop a general system independent of the databases, we chose the four common parameters from the first 10 top parameters and performed the experiment with them from each database individually. 5. Results and Discussion The results of the performed experiments for pathology detection are expressed in terms of accuracy (ACC), sensitivity (SN: the proportion of pathological samples that positively identified), specificity (SP: the proportion of normal samples that negatively identified) and area under the Receiver Operating Characteristic (ROC) curve called AUC, which are shown in Table 2. As we can see from this table, the accuracies are varied from one database to another with the same number of MDVP parameters. The variation in the accuracies from one database to another may refer to different reasons such as (i) the severity levels of voice disorders, which are not the same in the three databases as we can notice from Table 2 how much sensitivity (that belongs to pathological samples) is varied from one database to another, (ii) the recording environment and the regulation of the recording are not the same in the three databases, (iii) in the same database the recoding environments for pathological and normal samples are not the same as in MEEI database, and (iv) the number of the taken samples from these databases in this study are not the same. In case of 22-parameters, the highest accuracy is 76.36% with MEEI database, which is comparable to the result in [32] where Arjmandi uses the same database and the same classifier, but different pathological samples. The accuracies with the two other databases, SVD and AVPD, are comparable (little less) to the accuracy obtained by MEEI. To the best of our knowledge, this is the first instance of evaluating the MDVP parameters in these two databases, and it needs more investigation. In case of 10 top parameters, the accuracy is increased by 11% with the MEEI samples, and the accuracy remained almost constant in case of SVD and AVPD. The reason behind that may be due to the top 10 parameters are not the same in all databases, and their FDR values are different too. In case of 3 top MDVP parameters, the accuracies are dramatically increased especially with the samples that were taken from SVD database. The best accuracies are 99.68%, 88.21%, and 72.53% for SVD, MEEI, and AVPD, respectively. The top three features that gave high accuracy with SVD are vam (peak amplitude variation in period-to-period), APQ (amplitude perturbation quotient), and PFR (phonatory fundamental frequency). vam has FDR value of which is far greater than FDR values of and that belong to PFR and APQ, respectively. Table 2. Result of different MDVP Parameters from three databases Parameter s 22- Parameters 10- Parameters 3-Parameters 4 - Common Databas e ACC % SN% SP% AUC AVPD MEEI SVD AVPD MEEI SVD AVPD MEEI SVD AVPD MEEI SVD Figure 1: Scatter plot for the three to MDVP Parameters from SVD database. From the scatter plot in Figure 1, we can see vam parameter has the best ability to discriminate the pathological samples more than the others two parameters, and it can be illustrated in Figure 2 that the probability density functions (pdf) of normal and pathological samples using the vam parameter have almost no overlapping. Figure 3 illustrates the ROC curve of the 3 top parameters from the three databases. It shows that the best performance is obtained with the features extracted from SVD database. One of the reasons behind that is in SVD database, the pathological samples have high severity, so it has a clear difference between normal and pathological samples. The 95% confidence interval (C.I.) is [ ], and 1-tail p-value is zero (<0.05) describing the significance of the data in the two classes. 2954

4 Figure 2: PDF for vam Parameter. from these databases produce the same ability to detect pathological voices or not and to avoid unfairly over-fitting that results from error estimation. By performing cross database experiment, it will be easy to see how well the trained system on one database could classify samples of another database. As shown in Table 3 the cross database experiments are performed between two databases: one database for training and the other one for testing and the accuracy is depicted in that table with increased accuracy in the presence of SVD database in the training. In addition, three more experiments are performed between three databases, where two are used for training and the other one for testing. The accuracies are comparable in the three experiments. Table 3: Cross database experiments accuracies (%) with 4-common parameters. In general, the result we got is comparable to previous study in case of using MEEI database and it represents a good result if we compare our initial work on AVPD database with the initial use of SVD in the field of voice pathology detection as in [33]. Training Testing MEEI SVD AVPD MEEI SVD AVPD MEEI + SVD SVD + AVPD MEEI AVPD Figure 3: ROC Curve for 3-Top Features for the three Databases. In case of the using the four common MDVP parameters between the three databases, the accuracy is decreased by 7% in case of MEEI and the other two accuracies for AVD and AVPD almost remained unchanged (with respect to the accuracies obtained by 10 parameters). The reason behind that is that every database is independent of one to another, and the 4 common parameters have different values according to the FDR score. The four common parameters between these databases are Jitta, Jitt, Rap, and PPQ. Jitta: is an absolute jitter that gives an evaluation in microseconds of the period-to-period variability of the pitch within the analyzed voice sample, Jitt: it gives an evaluation of the variability of the pitch period within the analyzed voice sample in percent, Rap: is a relative average perturbation that gives an evaluation of the variability of the pitch period within the analyzed voice sample at smoothing factor 3 periods, and PPQ: is a pitch perturbation quotient that gives an evaluation in percent of the long-term variability of the pitch period within the analyzed voice sample at smoothing factor of three periods. In each type of experiment mentioned before, we made a comparison between the performance of MDVP parameters to detect voice pathology for these databases and how much their samples contribute to discriminate between pathological and normal voices. In this study, some additional experiments were performed using cross database between the three databases to make sure if the extracted MDVP parameters From the results of all the experiments, we find the followings. Training with MEEI does not make the system robust, because the system is confused whether it is classifying the environments or normal / pathology. Training with SVD makes the system more robust than the systems trained with other database, because SVD samples are clearly distinguishable between normal and pathology. Training with two databases offsets some shortcomings of training with one database. There is a need to develop more robust features that can successfully differentiate between normal and pathological samples irrespective of the databases. Moreover, the features should be able to classify low severe pathological samples as in the case of AVPD. 6. Conclusion In this work, we evaluate MDVP parameters on three different databases AVPD, MEEI, and SVD with four different experiments individually. The accuracies of the detection are varied from one database to another with the same number of MDVP parameters. The best accuracies we got are %, 88.21%, and 72.53% for the samples that were taken from SVD, MEEI, and AVPD respectively. Some of the MDVP parameters show high ability of discrimination between normal and pathological subjects such as vam, APQ, and PFR. In a future work, we will try to find out some features that can correctly detect voice pathologies independent of the databases. 7. Acknowledgment This project was funded by the National Plan for Science, Technology and Innovation (MAARIFAH), King Abdulaziz City for Science and Technology, Kingdom of Saudi Arabia, Award Number (12-MED ). 2955

5 8. References [1] G. Muhammad, Z. Ali, M. Alsulaiman, and K. Almutib. "Vocal Fold Disorder Detection by applying LBP Operator on Dysphonic Speech Signal, RAICMS, , [2] National Institute on Deafness and Other Communication Disorders: Voice, Speech, and Language: Quick Statistics, 2014.Availableat Accessed on March, [3] Research Chair of Voicing and Swallowing Disorders. Available at Accessed on March, [4] N. Roy, R.M. Merrill, S. Thibeault, R.A. Parsa, S.D. Gray, and E.M. Smith, Prevalence of voice disorders in teachers and the general population, J Speech Lang Hear Res., vol.47, no. 2, pp , Apr [5] K.H. Almalki, Voice Disorders Among Saudi Teachers in Riyadh City, Saudi Journal of Oto-Rhinolaryngology Head and Neck Surgery, [6] B. Boyanov, and S. Hadjitodorov, Acoustic analysis of pathological voices. a voice analysis system for the screening of laryngeal diseases, Proceedings of IEEE International Conference on Engineering in Medicine and Biology Society, vol.16, pp.74-82, [7] C. E. Martinez, and L H. Rufiner, Acoustic analysis of speech for detection of laryngeal pathologies, Proceedings of 22nd Annual IEEE International Conference on Engineering in Medicine and Biology Society, vol. 3, pp , [8] B. S. Atal, "Effectiveness of linear prediction characteristics of the speech wave for automatic speaker identification and recognition" J. Acoustic. Soc. Amer., vol. 54, no. 6, pp , [9] L. Xugang, and D. Jianwu, An investigation of dependencies between frequency components and speaker characteristics for text-independent speaker identification, Speech Communication 07, vol. 50, no. 4, pp , Oct [10] M. A. Anusuya, S. K. Katti, Front end analysis of speech recognition: a review, International Journal of Speech Technology, vol. 14, pp , Dec [11] L. Rabiner and B.H. Juang, Fundamentals of speech recognition. Englewood Cliffs, NJ: Prentice-Hall, [12] Z. Ali, M. Aslam., and M.E. Ana María, A speaker identification system using MFCC features with VQ technique, Proceedings of 3rd IEEE International Symposium on Intelligent Information Technology Application, pp , [13] W.J.J. Roberts, and J.P. Willmore, "Automatic speaker recognition using Gaussian mixture models", Proceedings of Information, Decision and Control, IDC 99, pp , [14] J.I. Godino-Llorente, P. Gomes-Vilda and M. Blanco-Velasco, "Dimensionality reduction of a pathological voice quality assessment system based on Gaussian mixture models and shortterm cepstral parameters", IEEE Transactions on Biomedical Engineering, vol. 53, no. 10, pp Oct [15] L.E. Baum and T. Petrie, Statistical inference for probabilistic functions of finite state Markov Chains, Ann. Math. Stat., vol. 37, pp , [16] S. Abe, Support Vector Machines for Pattern Classification. Springer-Verlag, Berlin Heidelberg New York, 2005 [17] T. Ritchings, M. McGillion, and C. Moore, Pathological voice quality assessment using artificial neural networks, Med. Eng. Phys., vol. 24, no. 8, pp , Sept [18] M. Brockmann, M.J. Drinnan, C. Storck, and P.N. Carding, "Reliable jitter and shimmer measurements in voice clinics: The relevance of vowel, gender, vocal intensity, and fundamental frequency effects in a typical clinical task," Journal of Voice, vol. 25, no. 1, pp , [19] M. Rosa, J.C. Pereira, and M. Grellet, Adaptive estimation of residue signal for voice pathology diagnosis, IEEE Trans. Biomed. Eng., vol. 47, no. 1, pp , Jan [20] M.K. Arjmandi, M. Pooyan, M. Mikaili, M. Vali, and A. Moqarehzadeh, Identification of voice disorders using long-time features and support vector machine with different feature reduction methods, Journal of Voice, vol. 25, no. 6, pp , Nov [21] J. Wang and C. Jo, "Vocal folds disorder detection using pattern recognition method", Proceedings of 29th Annual International Conference of the IEEE EMBS, pp , Lyon, France, [22] T. Li, C. Jo, and S. Wang, Classification of pathological voice including severely noisy cases, Proceedings of 8th International Conference on Spoken Language Processing, I, Jeju, Korea, pp , [23] N. Sáenz-Lechón, J.I. Godino-Llorente, V.. Osma-Ruiz, and P. Gómez-Vilda, Methodological issues in the development of automatic systems for voice pathology detection, Biomedical Signal Processing and Control, vol. 1, no. 2, pp , April [24] H. Oğuz, M. A. Kiliç, and M. A. Şafak, "Comparison of results in two acoustic analysis programs: PRAAT and MDVP," Turkish Journal of Medical Sciences 41.5, pp , [25] J.I. Godino-Llorente, V. Osma-Ruiz, N. Sáenz-Lechón, I. Cobeta- Marco, R. González-Herranz, C. Ramírez-Calvo "Acoustic analysis of voice using WPCVox: a comparative study with Multi- Dimensional Voice Program," European Archives of Oto-Rhino- Laryngology, 265.4, , [26] G. Muhammad and M. Melhem, Pathological Voice Detection and Binary Classification Using MPEG-7 Audio Features," Biomedical Signal Processing and Controls, 11, pp. 1 9, [27] G. Muhammad, T. Mesallam, K. Almalki, M. Farahat, A. Mahmood, and M. Alsulaiman, Multi Directional Regression (MDR) Based Features for Automatic Voice Disorder Detection, Journal of Voice, Vol. 26, No. 6, pp. 817.e e27, [28] A. A-Nasheri, Z. Ali, G. Muhammad, and M. Alsulaiman, Voice Pathology Detection Using Auto-Correlation of Different Filters Bank, 11th ACS/IEEE International Conference on Computer Systems and Applications, Doha, Qatar, November 10-13, [29] Kay Elemetrics Corp., Disordered Voice Database, Version 1.03 (CD-ROM), MEEI, Voice and Speech Lab, Boston, MA (October 1994). [30] W.J. Barry, P utzer, M., Saarbrucken Voice Database, Institute of Phonetics, Univ. of Saarland, [31] Kay Elemetrics, Multi-Dimensional Voice Program (MDVP) [Computer Program], [32] M. K. Arjmandi, M. Pooyan, M. Mikaili, M. Vali, A. Moqarehzadeh, "Identification of voice disorders using long-time features and support vector machine with different feature reduction methods," Journal of Voice, 25 (6), pp. e275-e289, [33] D. Martínez, E. Lleida, A. Ortega, A. Miguel, and J. Villalba "Voice Pathology Detection on the Saarbruecken Voice Database with Calibration and Fusion of Scores Using MultiFocal Toolkit," Advances in Speech and Language Technologies for Iberian Languages. Springer Berlin Heidelberg, ,

Performance of Gaussian Mixture Models as a Classifier for Pathological Voice

PAGE 65 Performance of Gaussian Mixture Models as a Classifier for Pathological Voice Jianglin Wang, Cheolwoo Jo SASPL, School of Mechatronics Changwon ational University Changwon, Gyeongnam 64-773, Republic