STOCHASTIC MODELS OF PITCH JITTER AND AMPLITUDE SHIMMER FOR VOICE MODIFICATION

Dima Ruinskiy (1,2) and Yizhar Lavner (1)
(1) Department of Computer Science, Tel-Hai Academic College, Israel
(2) Israel Development Center, Intel Corporation, Haifa, Israel

ABSTRACT

We present a voice modification algorithm for transforming a modal voice into a hoarse voice. The algorithm is based on modifying the jitter and shimmer of the voice, which have long been known to be connected to hoarseness. A pitch-synchronous linear prediction analysis-resynthesis process is used to increase jitter and shimmer in modal voices, where the data for the jitter and the shimmer are obtained from a stochastic model based on naturally hoarse voices. A formal evaluation of the algorithm, conducted with multiple voices and listeners, showed that the produced voices are perceived as close to natural hoarseness, suggesting the applicability of the algorithm to various voice modification and voice synthesis tasks.

Index Terms: Hoarseness, voice modification, voice effects, jitter, shimmer.

1. INTRODUCTION

Voice modification is the task of altering the acoustic characteristics of the voice while retaining the phonetic characteristics, that is, producing a voice that sounds different but carries the same speech information and is still perceived as natural. Examples include time-scale modification [1], pitch-scale modification [2], voice morphing [3], voice personality transformation [4], and manipulations of various voice qualities, such as hoarseness, creakiness, breathiness, and others [5]. In recent years, along with the advancement of speech synthesizers, there has been increased interest in voice modification systems, which enhance the ability to produce more natural, expressive, and personalized synthetic voices [6]. It is also a subject of interest in the professional music industry [5, 7] and in voice transformation systems [8]. Among the various vocal quality types, a group of perceptually and acoustically similar voices, commonly referred to as rough or hoarse voices, has been the focus of extensive study (see for example [6-14]).
Many of these studies aim to find the acoustic or perceptual correlates of hoarse or rough voices ([9, 11-14]). For example, it was found that hoarseness is related to measures of jitter, shimmer and noise levels (e.g., [12, 13]). Other studies confirmed that perceived roughness is linked with high levels of jitter, but also found that this perception is affected by the relation between the jitter and the fundamental frequency [14]. Despite the multitude of studies on hoarse and rough voices, few of them attempt to synthesize such voices or to convert a modal voice into a hoarse voice [5, 6, 8]. Most such studies found that the naturalness of the effect often depends on the input voice.

In this study, we present an algorithm for transforming a modal voice into a hoarse voice, aimed at the professional music industry. The goal of the algorithm is to introduce hoarseness in a singing or speaking voice while retaining its naturalness. Additionally, the hoarseness model produced by the algorithm may be used to enhance voice synthesis systems, by introducing different vocal qualities to the synthesized voices. The transformation is achieved by modeling two key parameters whose relation to hoarseness has been established in previous studies: pitch jitter (e.g., [15, 16]) and amplitude shimmer (e.g., [17]). Jitter refers to the short-term (cycle-to-cycle) perturbation in the fundamental frequency, while shimmer refers to the short-term perturbation in the amplitude of the voice [18]. Our algorithm constructs a model based on statistical analysis of natural hoarse voices, and uses it to modify the jitter and shimmer properties of a modal voice through a pitch-synchronous linear prediction analysis-resynthesis mechanism. Double-blind listening tests conducted with nine listeners, using several different voices, showed that the algorithm is capable of producing hoarse voices that are perceived as close to natural, suggesting the applicability of the algorithm to various voice modification and voice synthesis tasks.

2. PITCH AND JITTER IN SPEECH

Voiced speech is produced by the quasi-periodic vibrations of the vocal folds, modulated by the vocal tract. The speech waveform therefore has a locally pseudo-periodic nature: for the signal denoted by $x(t)$, at any point $t_0$ there exists a pseudo-period $T$ such that $x(t) \approx x(t+T)$ for each $t \in [t_0, t_0+T]$. The fundamental period $T$ for which the above holds is called the local pitch period of the signal, with the corresponding fundamental frequency denoted as the pitch frequency. In regular speech, typical values of the pitch period are between 2.5 ms and 16 ms.

The pitch period varies over time, depending on the utterance, intonation, emotional state of the speaker, and other factors. The long-term pitch variations are often intentional, and are due to changes in intonation or transitions between phonemes. The short-term changes are caused by the time-variant characteristics of the vocal system, and are usually not controllable: small, random fluctuations exist between consecutive glottal cycles even in sustained vowels. These small variations are commonly referred to as jitter, and exist naturally in every voice. In modal voices, the jitter typically varies between 0.1% and 1% of the pitch period [19], whereas for hoarse voices it is typically higher [11]. In our empirical tests, the strength of the jitter was found to be up to 5% in hoarse voices. An example is shown in Figure 1, which compares the pitch contours of the same male singer, singing once in a modal voice and once in a hoarse voice, which exhibits noticeably stronger jitter.

Figure 1. Pitch contour of a male voice singing the vowel "AA". TOP: Modal voice. BOTTOM: Hoarse voice. The jitter in the hoarse voice is clearly stronger.

3. ANALYSIS OF JITTER PATTERNS IN HOARSE VOICES

In order to introduce hoarseness in modal voices, we analyzed the jitter patterns occurring naturally in hoarse voices. A schematic description of the analysis framework is shown in Figure 2. After segmentation of the signal into voiced/unvoiced sections, we obtain the local pitch period using a two-stage detector, based on the real cepstrum (for initial detection) and the short-time normalized cross-correlation (for refinement). After the pitch contour is obtained, the average local pitch and the long-term variations are cancelled out using a high-pass filter, leaving only the short-term variations, namely the jitter. The jitter derivative (first-order difference function) is also computed, as shown in the example in Figure 3.

Figure 2. A schematic block diagram of the jitter analysis system. The hoarse speech signal x[n] passes through voiced/unvoiced segmentation (voiced segments), pitch detection (pitch contour P), removal of long-term changes (jitter J and jitter derivative) and statistical analysis, which yields the jitter trend probabilities and the relative jitter amplitude.

Next, we extract statistics on the strength of the jitter (relative to the instantaneous pitch), and on the possible durations of the jitter trend, that is, the number of consecutive pitch cycles without sign alternations of the jitter derivative. Our analysis has shown that these trends tend to be very short, typically only a single cycle, and occasionally 2-4 cycles, as demonstrated in Figure 3 (bottom plot). We calculate the probabilities of each possible trend duration (1, 2, 3 and 4+ cycles), and for each such case construct a jitter bank, which stores the possible jitter strength values. The jitter banks are used to model the jitter in the synthesis stage, where an artificial jitter contour is constructed for synthesizing hoarseness in a given input voice (see Section 4).

Figure 3. Jitter pattern of a hoarse voice. TOP: The pitch period contour (measured as number of samples at 44100 Hz sampling frequency). MIDDLE: The jitter contour obtained after removing the pitch trend. BOTTOM: The jitter difference function, with a prevalent sign-alternating pattern.

4. JITTER GENERATION

In our approach, jitter is simulated by stretching or shortening each pitch cycle according to a jitter factor associated with that cycle. A vector of such jitter factors is denoted as the jitter contour. In this vector, $J_i$ is the local jitter factor of the i-th pitch cycle, whose length is $P_i$ samples. Positive values of $J_i$ indicate stretching, whereas negative values indicate shortening. Thus, the synthesized pitch cycle is $P_i + J_i$ samples long (Figure 4).

Figure 4. TOP: Original pitch contour of a modal voice. MIDDLE: Generated jitter contour. BOTTOM: Synthesized pitch contour with added jitter. All values are in samples at 44100 Hz sampling rate.
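The analysis stage of Section 3 (trend removal, difference function, trend durations, jitter banks) can be illustrated with a short sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the pitch-period contour is assumed to be already available (pitch detection is not shown), a centred moving average stands in for the unspecified high-pass filter, and the names `remove_trend`, `jitter_statistics` and `half_width` are hypothetical. Storing the absolute jitter relative to the local period in the banks is likewise an assumption.

```python
def remove_trend(periods, half_width=2):
    """Subtract a centred moving average from the pitch-period contour.

    Assumption: this simple detrending stands in for the high-pass
    filter mentioned in the text, whose design is not specified.
    """
    n = len(periods)
    jitter = []
    for i in range(n):
        lo, hi = max(0, i - half_width), min(n, i + half_width + 1)
        local_mean = sum(periods[lo:hi]) / (hi - lo)
        jitter.append(periods[i] - local_mean)
    return jitter


def jitter_statistics(periods, jitter, max_len=4):
    """Return (trend-duration probabilities, jitter banks).

    A trend is a run of consecutive cycles over which the jitter
    difference function keeps its sign. Durations of max_len or more
    are pooled into one class (the "4+" case in the text).
    """
    diffs = [b - a for a, b in zip(jitter, jitter[1:])]
    signs = [1 if d >= 0 else -1 for d in diffs]
    counts = {k: 0 for k in range(1, max_len + 1)}
    banks = {k: [] for k in range(1, max_len + 1)}
    start = 0
    for k in range(1, len(signs) + 1):
        # A run ends at the end of the contour or when the sign flips.
        if k == len(signs) or signs[k] != signs[start]:
            length = min(k - start, max_len)
            counts[length] += 1
            # Cycles start+1 .. k are the ones reached during this trend.
            for c in range(start + 1, k + 1):
                banks[length].append(abs(jitter[c]) / periods[c])
            start = k
    total = sum(counts.values())
    probs = {k: (v / total if total else 0.0) for k, v in counts.items()}
    return probs, banks


if __name__ == "__main__":
    # Hypothetical pitch periods in samples, for illustration only.
    periods = [100, 101, 99, 101, 102, 103, 101, 100]
    jitter = remove_trend(periods)
    probs, banks = jitter_statistics(periods, jitter)
    print(probs)
```

The returned `probs` dictionary plays the role of the trend-duration probabilities, and `banks[d]` collects the relative jitter strengths observed in trends of duration `d`.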

The jitter is modeled as a stochastic variable with short-term memory, constructed point by point. The value of the jitter at any point may depend on up to 4 past samples, as follows: at each step, the jitter trend (positive or negative) and its duration (between 1 and 4 cycles) are chosen randomly, based on the probabilities calculated in the analysis stage. The strength of the jitter for each of the cycles is selected from the suitable jitter bank (see Section 3). In preliminary experiments this model yielded hoarse voices that sound more natural than the simpler memoryless Gaussian model, where the value of the jitter at each step is simply taken at random from the jitter bank, according to a Gaussian distribution. The naive approach of a fixed sign-alternating jitter was even less successful, leading to a voice that was perceived as "metallic" or "buzzy".

The generated jitter contours are normalized, so as to contain only information about the trend and the relative jitter strength. The actual amount of jitter for each pitch cycle is determined as a combination of several parameters:

- The length of the pitch cycle $P_i$ (instantaneous pitch).
- The local value of the normalized jitter contour $\hat{J}_i$.
- The maximum relative jitter $J_{\max} \in (0, 1)$. $J_{\max}$ is user-defined, to control the strength of the hoarseness effect.
- The short-time energy of the pitch cycle $E_i$ (computed relative to the minimum energy $E_{\min}$ and the maximum energy $E_{\max}$ measured across the signal). The naturalness of the produced voice increases when jitter and energy are correlated, as was demonstrated in preliminary tests.

The local jitter factor $J_i$ is determined by the following formula:

$$J_i = J_{\max} \, \hat{J}_i \, C_i \, P_i, \qquad (1)$$

where

$$C_i = 0.5 \, \frac{E_i - E_{\min}}{E_{\max} - E_{\min}} + 0.5 \qquad (2)$$

is a scaling factor, ranging from 0.5 to 1, depending on the local energy. $J_i$ is rounded to the nearest integer because its value must be expressed in samples.

5. THE HOARSENESS SYNTHESIS ALGORITHM

The synthesis algorithm consists of the following three steps:

- Pitch-synchronous analysis, using Linear Prediction (LPC) [20] inverse filtering.
- Introduction of jitter by modifying the LPC residual of individual pitch cycles.
- Reconstruction of the voice using the LPC synthesis filter.

A schematic block diagram of the process is shown in Figure 5. The LPC residual, denoted as $e[n]$, is computed using LPC inverse filtering,

$$e[n] = x[n] - \sum_{i=1}^{M} a_i \, x[n-i], \qquad (3)$$

where $M = 2$ is the filter order and the coefficients $a_i$ are obtained by the Levinson-Durbin algorithm applied separately to each pitch cycle. The residual signal $e[n]$ contains mostly the information related to the glottal pulse and the fundamental frequency, since the information related to the formants is eliminated by the inverse filtering.

Unvoiced segments are left unchanged by the algorithm: they do not exhibit pitch, and therefore jitter is not applicable. Indeed, since vocal fold vibrations do not contribute to unvoiced speech, hoarseness and other voice modalities are irrelevant.

The residual of each pitch cycle, obtained through the inverse filtering, is either stretched or compressed by a few samples, according to the desired jitter factor (see Section 4), as follows: if the i-th pitch cycle is $P_i$ samples long, and the jitter factor is $J_i$ samples ($J_i$ may be negative), the residual is resampled by a factor of $(P_i + J_i)/P_i$. The resampling is performed using a polyphase FIR filter implementation, with a Kaiser window to minimize distortions. Finally, the signal is reconstructed cycle by cycle from the resampled residual signal $\tilde{e}[n]$, using the LPC synthesis filter:

$$y[n] = \tilde{e}[n] + \sum_{i=1}^{M} a_i \, y[n-i]. \qquad (4)$$

At this step, shimmer may be applied to the signal, as described in the following section.

Figure 5. A block diagram of the process of introducing jitter to a modal voice. The input $x[n]$ is inverse filtered with $A(z)$ to give the residual $e[n]$, which is resampled to $\tilde{e}[n]$ and passed through the synthesis filter $1/A(z)$ to give $y[n]$; both filters use the same coefficients $a_i$.

6. SHIMMER

Shimmer is defined as the fluctuation of amplitudes of consecutive pitch cycles of voiced speech. Following previous studies, such as [17], which established a relation between the amount of shimmer and the hoarseness of the voice, we attempted to use shimmer in combination with jitter, in order to enhance the hoarse effect.
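The jitter factor of Section 4 and the analysis-resynthesis steps of Section 5 can be put together in a short sketch. It rests on several assumptions that are not taken from the paper: each pitch cycle is processed with zero initial filter state, plain linear interpolation stands in for the polyphase FIR resampler with a Kaiser window, the filter order is left as a parameter, and all function names are hypothetical. It is meant to illustrate the data flow, not to reproduce the authors' implementation.

```python
def local_jitter_factor(j_hat, j_max, period, energy, e_min, e_max):
    """Jitter in samples for one cycle: J = J_max * J_hat * C * P, rounded."""
    if e_max > e_min:
        c = 0.5 * (energy - e_min) / (e_max - e_min) + 0.5
    else:
        c = 1.0  # assumption: a flat energy contour gets full scaling
    return round(j_max * j_hat * c * period)


def autocorrelation(x, order):
    """Autocorrelation of one cycle for lags 0..order."""
    return [sum(x[i] * x[i + lag] for i in range(len(x) - lag))
            for lag in range(order + 1)]


def levinson_durbin(r, order):
    """Predictor coefficients a[0..order-1] with x[n] ~ sum(a[i] * x[n-1-i])."""
    a = [0.0] * order
    err = r[0]
    for m in range(order):
        if err <= 0.0:  # silent or degenerate cycle: stop early
            break
        k = (r[m + 1] - sum(a[i] * r[m - i] for i in range(m))) / err
        prev = a[:]
        a[m] = k
        for i in range(m):
            a[i] = prev[i] - k * prev[m - 1 - i]
        err *= 1.0 - k * k
    return a


def inverse_filter(x, a):
    """LPC residual e[n] = x[n] - sum_i a_i x[n-i], zero initial state."""
    return [x[n] - sum(a[i] * x[n - 1 - i] for i in range(min(len(a), n)))
            for n in range(len(x))]


def synthesis_filter(e, a):
    """y[n] = e[n] + sum_i a_i y[n-i], zero initial state."""
    y = []
    for n in range(len(e)):
        y.append(e[n] + sum(a[i] * y[n - 1 - i] for i in range(min(len(a), n))))
    return y


def resample_linear(e, new_len):
    """Stretch or shorten e to new_len samples by linear interpolation.

    Assumption: a simple stand-in for the polyphase FIR resampler.
    """
    old_len = len(e)
    if new_len == old_len or old_len < 2 or new_len < 2:
        return list(e[:new_len])
    out = []
    for n in range(new_len):
        pos = n * (old_len - 1) / (new_len - 1)
        lo = int(pos)
        hi = min(lo + 1, old_len - 1)
        frac = pos - lo
        out.append((1.0 - frac) * e[lo] + frac * e[hi])
    return out


def add_jitter_to_cycle(cycle, jitter, order):
    """Inverse filter one pitch cycle, resample the residual, resynthesize."""
    a = levinson_durbin(autocorrelation(cycle, order), order)
    residual = inverse_filter(cycle, a)
    stretched = resample_linear(residual, len(cycle) + jitter)
    return synthesis_filter(stretched, a)
```

With `jitter = 0` the residual is not resampled and the cycle is reconstructed up to rounding error, which is a convenient sanity check. Shimmer, when used, is applied to each cycle after this step.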
This was done by multiplying each pitch cycle by a special type of window function, with varying peak amplitudes:

$$\tilde{x}[n] = x[n] \left( 1 + S \cdot w[n] \right), \qquad (5)$$

where $x[n]$ is the pitch cycle (after application of the jitter procedure), $S \in (-1, 1)$ is the designated shimmer factor and $w[n]$ is a Hamming window. This window function achieves maximum shimmer near the middle of the pitch cycle, while ensuring continuity at the boundaries of consecutive cycles. The shimmer factor $S$ varies over time, similarly to the jitter factor $J$ (see Section 4), and is bounded by the maximum shimmer $S_{\max}$. It was found experimentally that values of $S_{\max} < 0.2$ can enhance the effect achieved by the jitter, whereas higher values can lead to noticeable distortions and degradation of the voice.

7. RESULTS

The algorithm was evaluated using a subjective listening test, in which listeners judged the level of hoarseness and naturalness in several voices, sampled and digitized at a sampling frequency of 44.1 kHz. Voices of four different singers, two male and two female, with different pitches and timbres were used. The average duration of the recordings was 7 seconds. All the original voices were modal or very close to modal. For each voice, the algorithm was applied to construct three hoarse samples: with light jitter/shimmer (3-4%), with slightly stronger jitter/shimmer (5-6%), and with extremely strong jitter/shimmer (20%). The jitter was modeled according to jitter banks extracted from voices of two hoarse speakers, male and female, each used for the corresponding gender.

Nine listeners participated in the subjective evaluation test, all of them without any hearing loss. Five samples of each of the four speakers' voices were used: the original (clear) voice, the three hoarse samples constructed by the algorithm, and a distracter constructed using the naive sign-alternating approach known to yield a metallic sound. For each voice, the listeners were presented with the five samples in a double-blind setting (neither the listeners nor the testers knew the order of the samples). The listeners were asked to grade each of the voices according to two criteria, hoarseness and naturalness, on a free scale between 0 and 100 (using sliders).
For the evaluation, the listeners could play each sample any number of times, in any order, before assigning the grades. The results are shown in Figure 6 for each of the voices separately. Each column indicates the average of the scores assigned by the listeners. All listeners perceived the samples generated by the algorithm as noticeably hoarse when compared to the original modal voices. However, the naturalness of the voices decreased as the hoarseness increased, as is evident from the grades given to the version with 20% jitter. The metallic sound also received significantly lower marks in naturalness, even though its hoarseness marks were high. It is also evident from the results that the success of the algorithm can depend on the original voice, with M1/F1 receiving significantly lower marks in naturalness compared to M2/F2.

Figure 6. The hoarseness and naturalness grades for each of the four voices. M1/M2 and F1/F2 denote the male and female singers, respectively. CLEAR: original modal voice. HRS004/HRS005/HRS02: synthesized hoarseness using jitter/shimmer values of 0.04, 0.05 and 0.2, respectively. METAL: synthesized hoarseness using the naive approach.

8. CONCLUSION

We presented an algorithm for transforming a modal voice into a hoarse voice by controlling two parameters of the speech waveform whose correlation with hoarseness has been previously established: jitter and shimmer. Our experimental results showed that enhancing these two parameters leads to a voice that is perceived as hoarse. The results demonstrate a few limitations of the algorithm, namely the decrease of the perceived naturalness as the hoarseness increases, and the large variation in naturalness between different voices. However, in all cases tested, it was possible to synthesize a voice that sounds noticeably hoarse while retaining most of its naturalness. Thus, the algorithm seems to be a promising direction in hoarseness emulation for voice synthesis and voice modification applications.

Additional improvement may be achieved by further study of the properties of jitter and shimmer in hoarse voices, in order to obtain methods that can better preserve the naturalness of the modified voice. Another approach is to combine the presented algorithm with other techniques. For example, in order to achieve natural hoarseness in high-pitched voices, it may be desirable to slightly lower the fundamental frequency, using a pitch-scale modification algorithm.

9. ACKNOWLEDGMENT

We would like to thank Mr. Meir Shaashua and Mr. Itai Neoran, both of Waves Audio, for valuable ideas and discussions. We also wish to thank Mr. Yefim Yakir for technical assistance. This study was partially supported by Waves Audio.

REFERENCES

[1] W. Verhelst, "Overlap-add methods for time-scaling of speech," Speech Communication, vol. 30, pp. 207-221, 2000.
[2] E. Moulines and J. Laroche, "Non-parametric techniques for pitch-scale and time-scale modification of speech," Speech Communication, vol. 16, pp. 175-206, 1995.
[3] Y. Lavner and G. Porat, "Voice morphing using 3-D waveform interpolation surfaces and lossless tube area functions," EURASIP Journal on Applied Signal Processing, vol. 8, pp. 1174-1184, 2005.
[4] L. M. Arslan, "Speaker Transformation Algorithm using Segmental Codebooks (STASC)," Speech Communication, vol. 28, pp. 211-226, 1999.
[5] A. Loscos and J. Bonada, "Emulating Rough and Growl Voice in Spectral Domain," in Proceedings of the 7th Conference on Digital Audio Effects (DAFx'04), Naples, Italy, 2004, pp. 49-52.
[6] T. Böhm, N. Audibert, S. Shattuck-Hufnagel, G. Németh, and V. Aubergé, "Transforming modal voice into irregular voice by amplitude scaling of individual glottal cycles," in Acoustics'08, Paris, France, 2008, pp. 6141-6146.
[7] K. I. Sakakibara, L. Fuks, H. Imagawa, and N. Tayama, "Growl voice in Ethnic and Pop Styles," in Int. Symp. on Musical Acoustics (ISMA 2004), Nara, Japan, 2004.
[8] A. Verma and A. Kumar, "Introducing roughness in individuality transformation through jitter modeling and modification," in International Conference on Acoustics, Speech, and Signal Processing, ICASSP-05, 2005, pp. 5-8.
[9] F. Bettens, F. Grenez, and J. Schoentgen, "Estimation of vocal dysperiodicities in disordered connected speech by means of distant-sample bidirectional linear predictive analysis," Journal of the Acoustical Society of America, vol. 117, pp. 328-337, 2005.
[10] C. Gobl and A. Ní Chasaide, "The role of voice quality in communication of emotion, mood and attitude," Speech Communication, vol. 40, pp. 189-212, 2003.
[11] T. M. Jones, M. Trabold, F. Plante, B. M. G. Cheetham, and J. E. Earis, "Objective assessment of hoarseness by measuring jitter," Clinical Otolaryngology & Allied Sciences, vol. 26, pp. 29-32, 2001.
[12] J. Kreiman and B. R. Gerratt, "Perception of aperiodicity in pathological voice," Journal of the Acoustical Society of America, vol. 117, pp. 2201-2211, 2005.
[13] G. de Krom, "Some spectral correlates of pathological breathy and rough voice quality for different types of vowel fragments," Journal of Speech and Hearing Research, vol. 38, pp. 794-811, 1995.
[14] J. Muñoz, E. Mendoza, M. D. Fresneda, G. Carballo, and P. López, "Acoustic and Perceptual Indicators of Normal and Pathological Voice," Folia Phoniatr. Logop., vol. 55, pp. 102-114, 2003.
[15] E. Keller, "The analysis of voice quality in speech processing," Lecture Notes in Computer Science, vol. 3445, pp. 54-73, 2005.
[16] J. Laver, The Phonetic Description of Voice Quality. Cambridge University Press, 1980.
[17] D. Michaelis, M. Fröhlich, and H. W. Strube, "Selection and Combination of Acoustic Features for the Description of Pathologic Voices," Journal of the Acoustical Society of America, vol. 103, pp. 1628-1639, 1998.
[18] I. R. Titze, "Workshop on acoustic voice analysis, summary statement," Nat. Center of Voice and Speech, Denver, Colorado, 1994.
[19] J. Schoentgen, "Stochastic models of jitter," Journal of the Acoustical Society of America, vol. 109, pp. 1631-1650, 2001.
[20] T. F. Quatieri, Discrete-Time Speech Signal Processing, vol. 4. Prentice-Hall, 2001.