A Determination of Drinking According to Types of Sentence using Speech Signals

Similar documents
A Study on the Degree of Pronunciation Improvement by a Denture Attachment Using an Weighted-α Formant

A Judgment of Intoxication using Hybrid Analysis with Pitch Contour Compare (HAPCC) in Speech Signal Processing

Reducing Errors of Judgment of Intoxication in Overloaded Speech Signal

A Study on Recovery in Voice Analysis through Vocal Changes before and After Speech Using Speech Signal Processing

A Study on Reducing Stress through Deep Breathing

A Study on Warning Sound for Drowsiness Driving Prevention System

A Study on a Automatic Audiometry Aid by PSM

Voice Detection using Speech Energy Maximization and Silence Feature Normalization

Combination of Bone-Conducted Speech with Air-Conducted Speech Changing Cut-Off Frequency

Automatic Judgment System for Chinese Retroflex and Dental Affricates Pronounced by Japanese Students

FREQUENCY COMPRESSION AND FREQUENCY SHIFTING FOR THE HEARING IMPAIRED

The effect of wearing conventional and level-dependent hearing protectors on speech production in noise and quiet

Frequency Tracking: LMS and RLS Applied to Speech Formant Estimation

A Review on Dysarthria speech disorder

Speech (Sound) Processing

Advanced Audio Interface for Phonetic Speech. Recognition in a High Noise Environment

Ambiguity in the recognition of phonetic vowels when using a bone conduction microphone

Jitter, Shimmer, and Noise in Pathological Voice Quality Perception

HCS 7367 Speech Perception

Voice. What is voice? Why is voice important?

whether or not the fundamental is actually present.

International Forensic Science & Forensic Medicine Conference Naif Arab University for Security Sciences Riyadh Saudi Arabia

2/25/2013. Context Effect on Suprasegmental Cues. Supresegmental Cues. Pitch Contour Identification (PCI) Context Effect with Cochlear Implants

IS A TWO WAY STREET BETTER COMMUNICATION HABITS A GUIDE FOR FAMILY AND FRIENDS

SoundRecover2 the first adaptive frequency compression algorithm More audibility of high frequency sounds

Juan Carlos Tejero-Calado 1, Janet C. Rutledge 2, and Peggy B. Nelson 3

EEL 6586, Project - Hearing Aids algorithms

11 Music and Speech Perception

Noise-Robust Speech Recognition in a Car Environment Based on the Acoustic Features of Car Interior Noise

ACOUSTIC AND PERCEPTUAL PROPERTIES OF ENGLISH FRICATIVES

Noise-Robust Speech Recognition Technologies in Mobile Environments

CHAPTER 1 INTRODUCTION

Performance of Gaussian Mixture Models as a Classifier for Pathological Voice

Re/Habilitation of the Hearing Impaired. Better Hearing Philippines Inc.

HCS 7367 Speech Perception

Role of F0 differences in source segregation

SPEECH TO TEXT CONVERTER USING GAUSSIAN MIXTURE MODEL(GMM)

ACOUSTIC ANALYSIS AND PERCEPTION OF CANTONESE VOWELS PRODUCED BY PROFOUNDLY HEARING IMPAIRED ADOLESCENTS

An active unpleasantness control system for indoor noise based on auditory masking

Shaheen N. Awan 1, Nancy Pearl Solomon 2, Leah B. Helou 3, & Alexander Stojadinovic 2

Auditory scene analysis in humans: Implications for computational implementations.

Speech conveys not only linguistic content but. Vocal Emotion Recognition by Normal-Hearing Listeners and Cochlear Implant Users

Assistive Listening Technology: in the workplace and on campus

What is sound? Range of Human Hearing. Sound Waveforms. Speech Acoustics 5/14/2016. The Ear. Threshold of Hearing Weighting

TESTS OF ROBUSTNESS OF GMM SPEAKER VERIFICATION IN VoIP TELEPHONY

Language Speech. Speech is the preferred modality for language.

Sound Texture Classification Using Statistics from an Auditory Model

Research Article Measurement of Voice Onset Time in Maxillectomy Patients

Genesis of wearable DSP structures for selective speech enhancement and replacement to compensate severe hearing deficits

WHAT IS SOFT SKILLS:

Implementation of Spectral Maxima Sound processing for cochlear. implants by using Bark scale Frequency band partition

Effects of speaker's and listener's environments on speech intelligibili annoyance. Author(s)Kubo, Rieko; Morikawa, Daisuke; Akag

UvA-DARE (Digital Academic Repository) Perceptual evaluation of noise reduction in hearing aids Brons, I. Link to publication

group by pitch: similar frequencies tend to be grouped together - attributed to a common source.

INTELLIGENT LIP READING SYSTEM FOR HEARING AND VOCAL IMPAIRMENT

Prelude Envelope and temporal fine. What's all the fuss? Modulating a wave. Decomposing waveforms. The psychophysics of cochlear

[5]. Our research showed that two deafblind subjects using this system could control their voice pitch with as much accuracy as hearing children while

LATERAL INHIBITION MECHANISM IN COMPUTATIONAL AUDITORY MODEL AND IT'S APPLICATION IN ROBUST SPEECH RECOGNITION

Providing Effective Communication Access

A Sleeping Monitor for Snoring Detection

Hearing Lectures. Acoustics of Speech and Hearing. Auditory Lighthouse. Facts about Timbre. Analysis of Complex Sounds

Production of Stop Consonants by Children with Cochlear Implants & Children with Normal Hearing. Danielle Revai University of Wisconsin - Madison

Use of Auditory Techniques Checklists As Formative Tools: from Practicum to Student Teaching

Hearing in the Environment

USING CUED SPEECH WITH SPECIAL CHILDREN Pamela H. Beck, 2002

Assessing Hearing and Speech Recognition

A comparison of acoustic features of speech of typically developing children and children with autism spectrum disorders

Application of Motion Correction using 3D Autoregressive Model in Kinect-based Telemedicine

Effect of spectral normalization on different talker speech recognition by cochlear implant users

Consonant Perception test

Noise and Fishing Vessels

Effects of noise and filtering on the intelligibility of speech produced during simultaneous communication

Hearing. and other senses

A PROPOSED MODEL OF SPEECH PERCEPTION SCORES IN CHILDREN WITH IMPAIRED HEARING

Audibility, discrimination and hearing comfort at a new level: SoundRecover2

Psychometric Properties of the Mean Opinion Scale

Overview. Acoustics of Speech and Hearing. Source-Filter Model. Source-Filter Model. Turbulence Take 2. Turbulence

Chapter 3. Sounds, Signals, and Studio Acoustics

Speech Generation and Perception

Auditory gist perception and attention

Essential feature. Who are cochlear implants for? People with little or no hearing. substitute for faulty or missing inner hair

Subjective impression of copy machine noises: an improvement of their sound quality based on physical metrics

Speech Spectra and Spectrograms

SLHS 1301 The Physics and Biology of Spoken Language. Practice Exam 2. b) 2 32

Who are cochlear implants for?

Modulation and Top-Down Processing in Audition

Phoneme Perception Test 3.0

Open The Door To. Of Better Hearing. A Short Guide To Better Hearing. People First.

Representation of sound in the auditory nerve

What you re in for. Who are cochlear implants for? The bottom line. Speech processing schemes for

Hierarchical Classification of Emotional Speech

Speech Intelligibility Measurements in Auditorium

Requirements for Maintaining Web Access for Hearing-Impaired Individuals

REVISED. The effect of reduced dynamic range on speech understanding: Implications for patients with cochlear implants

Computational Perception /785. Auditory Scene Analysis

TWO HANDED SIGN LANGUAGE RECOGNITION SYSTEM USING IMAGE PROCESSING

Sonic Spotlight. Binaural Coordination: Making the Connection

Speech Cue Weighting in Fricative Consonant Perception in Hearing Impaired Children

I. INTRODUCTION. OMBARD EFFECT (LE), named after the French otorhino-laryngologist

Transcription:

A Determination of Drinking According to Types of Sentence using Speech Signals Seong-Geon Bae 1, Won-Hee Lee 2 and Myung-Jin Bae *3 1 School of Software Application, Kangnam University, Gyunggido, Korea. 2,3 Information and Telecommunication of Department, Soongsil University, 369, Sangdo-ro Dongjak-gu, Seoul, Korea. 1 Orcid Id: 0000-0003-3252-0062, 2 Orcid Id: 0000-0002-7232-2207, 3 Orcid Id: 0000-0002-7585-0400 *Correspondence Author Abstract In this paper, we investigated the accuracy of negative utterance sentences in order to judge drinking. Drinking makes it difficult to pronounce specific sounds correctly. A number of speech parameters have been studied to identify inaccurate pronunciation features. Generally, the method of grasping the inaccurate pronunciation is analyzed using the change of pitch and formant. Formants have various information such as the individual's uniqueness, clarity, and physical condition, and pronunciation status can also be analyzed. Analysis of the pronunciation through the pronunciation agency may cause difficulty in the analysis due to the pronunciation difference of the individual. On the other hand, analysis of pronunciation through the vocal organs is easy to analyze because consonants or vowels are applied only to the consonant vowel rather than to the pronunciation. Especially when drinking alcohol, the vocal cord is dehydrated due to alcohol. Therefore, it is difficult to elaborate voices or to utter certain sounds. Therefore, in this paper, we have studied the possibility of extracting the alcohol discrimination parameter with the change of the accuracy of the voiced sound according to the vocal sentence vowels. As a result, the longer the sentence, the more pronounced error was obtained and the meaningful result was obtained when the number sentence was spoken. Keywords: Drinking, spectrum, pronunciation, LPCcoefficients INTRODUCTION Of the 131 vessels that were found to have been drunk in the sea during 2015, 90 vessels accounted for 68.7% of the vessels, and the largest percentage. [1] Drinking on the road is easy to crack down on drinking through short distance. However, since it is necessary to stop the movement of vessels such as fishing vessels or cargo vessels from above the sea, it is difficult to control drinking through near places. On the other hand, if it is possible to measure alcohol disturbance at a distance rather than a short distance, it can be applied not only to roads but also to maritime and air traffic. The information that can be obtained from a distance through the state or change of the body is voice. If alcohol is interrupted through voice that can send information to a long distance, it is possible to measure any distance. In order to enforce alcohol abuse by voice over the sea, a wireless device must be provided to connect the sender and the receiver. At sea, all vessels are managed by VTS (Sea Traffic Control System) and provide information services to help the ship in the direction of the ship. When the voice communication is performed through the VTS, the characteristics vary depending on the radio equipment, but the voice signal is distorted during demodulation through the receiving equipment. In the case of a cargo ship or a large ship, it is possible to obtain a distorted voice signal by using a speciallydesigned radio equipment, while using a relatively lowspecification equipment in the case of a small fishing boat. When a signal is demodulated at the time of reception, a phenomenon occurs in which distortion occurs in the voice signal during detection and becomes a limit. If the input voice signal is similar to when the input voice signal is out of the input level and if the possibility of drinking through the distorted voice signal is measured, an error occurs in the judgment rate and it is difficult to intercept the voice signal. It is necessary to extract a parameter that is robust against distortion in order to determine whether or not to drink by using a distorted voice signal on the VTS communication [1][3][4][5]. Therefore, in this paper, we try to determine whether alcohol is strong in the surrounding environment by extracting parameters for determining whether or not to drink according to the type of utterance. In Section 2, we examine the existing method of determining whether or not to drink alcohol. In Chapter 3, we discuss how to extract the alcohol dependency parameter according to the type of vocalization proposed in this paper. In Chapter 4, we conclude the future research direction. 11157

EXISTING METHODS Generally speaking, the phonetic organ is classified into two characteristics and analyzed. First, it is analyzed as a formant characteristic having pitch characteristic and negative characteristic having individual characteristic information. Figure 1 shows a picture of the voice generation model [5][6][7][8][9][10]. Fig. 2 and 3 show the difference of formant envelope after treatment according to each LPC coefficient by the analysis method by LPC analysis before and after drinking. The horizontal axis represents frequency and the vertical axis represents energy. The first peak is the first formant and the third peak of the second peak represents the second formant and the third formant, respectively. In the speech signal after drinking, the interval between the vowel and the consonant is characterized by a very significant change in the sound. This means that the accuracy of the vowels connecting the sound and the sound is reduced. When the sound is generated in syllable unit through the saints, it is judged to be a phenomenon because the pronunciation changes due to the change of other vocal organs [3][4][5][6][7]. Figure 1: A block of diagram of speech production Figure 2: Voice signal formants when the LPC coefficients before drinking are 6, 14 11158

Figure 3: Voice signal formants when the LPC coefficients after drinking are 6, 14 Fig. As can be seen from 2 and 3, there is a difference in formant changes when comparing the voice signals before and after drinking. It can be seen that the change of the third formant and the fourth formant more prominently than the change of the first formant and the second formant. This is because the difference between the 6th and 14th percentiles is significant when calculating formants using LPC before drinking, but not significantly after drinking [3]. Therefore, the envelope for LPC with different orders is obtained in order to measure the formant change amount through the LPC coefficient. Here, 6th and 14th were used for comparison. 3. Proposed Method Before drinking, voice can be delivered more accurately than voice when drunk. On the other hand, when drinking under the pressure of the vocal cords, the complete opening and closing of the vocal cords may not occur, thereby increasing the minimum pressure required for vocalization and increasing the volume of voice users. The information of the jaw and the tooth can be extracted through the third formant and the fourth formant. The high-frequency domain of the speech signal is useful for analyzing the naturalness and clarity of the sound. In addition, since the third formant and the fourth formant are the formant information in the high frequency region, the parameters can be sufficiently extracted by only the high frequency [1][2][3]. Therefore, in order to investigate the pronunciation accuracy according to sentence types in the speech signal before and after drinking, this paper analyzes through the frequency domain. Fig. 4 is a block diagram of the procedure for the experiment. In general, since the information of the voice signal is mostly below 4 khz, the unnecessary components are separated through the low-frequency filter of 3.75 khz cut-off frequency. The LPF used is defined as having a non-lossy speech signal. Depending on the sentence, fig. 4 method is applied to determine the accuracy of the sound according to the sentence type, and to extract the appropriate sentence when judging drinking. In this paper, the logarithm of the energy of the formants according to each LPC order is obtained and the pronunciation accuracy is obtained because the difference of the formant envelope is pronounced according to the pronunciation sentence type before and after drinking. Equation (1) is shown in Fig. 3 shows the formula of log distance. 11159

Figure 4: A block of diagram of proposed algorithm EXPERIMENTS AND RESULTS In this paper, the voice before drinking and the voice after drinking were recorded and used as voice samples. The (1) recording environment was recorded at various places such as home, office, laboratory, subway, automobile, bar, and used as a sample. The ages participated in the recording were 20 ~ 60 years old and total 44 voices was analyzed. The amount of alcohol consumed was 20% or more in his own diet. We used the test sentences. Figures 5 to 6 show the proposed method with the speech signal of the sentence. (a) 11160

(b) Figure 4: LPC coefficient difference result before drinking (a) 11161

(b) Figure 5: LPC coefficient difference result after drinking As shown in the figure, the whole frame distance of the LPC coefficient in fig 4 before drinking is larger than the coefficient distance in fig. Especially, the difference between "shi" and "sou" is remarkable. Figures show the proposed method with the speech signal of sentence. The phrase "ah-ahoh" also has a greater overall frame distance after drinking than drinking. However, there were fewer differences than the " shi" sentence. When I saw the phonemes, there was a remarkable result similar to "Shi" in "ie" and "ou". Figures show the proposed method with the speech signal of "one to nine" sentence. "1 to 8," like the other two sentences, the distance difference before drinking is noticeable. In addition, we can see that the difference is larger than the other two sentences. When I saw the phonemes, the results showed that there was a difference between "7" and "8". CONCLUSION Drinking accidents occur on the road as much as alcoholic accidents on the road. Drinking accidents occur on the sea, so it is difficult to prevent them beforehand, and if they happen, the cost and damage will increase. If it is possible to find fishing vessels that operate after drunkenness through remote control, it is possible to enforce the crackdown efficiently and reduce the cost and secondary damage. If it is possible to judge the possibility of drinking by using voice data that can transmit a change of a person's body to a remote place, it is possible to sufficiently intercept the communication only in a wireless communication environment. In the case of a cargo ship or a large ship, it is possible to communicate with good sound quality because it uses specially designed equipment and it has less influence on the noise inside the ship. Conversely, in the case of small fishing boats or inexpensive wireless equipment, noise and / or voice signals may be clipped due to the limited input level. Therefore, in this paper, we propose a method to judge the drinking status according to the sentence type in order to be robust to the surrounding environment and to optimize the interception. In the future, this information will be used to gain important information changes in the field of recognition and synthesis REFERENCES [1] Seong-Geon Bae, Won-Hee Lee, and Myung-Jin Bae, A Judgment of Intoxication using Hybrid Analysis with Pitch Coutour Compare in Speech Signal Processing, IJAER, Vol. 12(2017), pp.2342-2346, 2017. [2] Seong-Geon Bae and Myung-Jin Bae, A Study on Recovery in Voice Analysis through Vocal Changes before and After Specch Using Speech Signal Processing, IJAER, Vol. 12(2017), pp.5299-5303, 2017. 11162

[3] Seong-Geon Bae, Myung-Jin Bae, "Reducing Errors of Judgement of Intoxication in Overloaded Speech Signal," IJET Vol.8, No.1, pp.219-224, 2016. [4] Won-Hee Lee, Myung-Sook Kim, and Myung-Jin Bae, Using valid-frame deviation to judgment of intoxication. Information, An International Interdisciplinary Journal, Vol. 18 (2015), pp.4131-4136, 2015. [5] Seong-Geon Bae, Myung-Sook Kim, and Myung-Jin Bae, Using High Frequency Accentuation in Speech Signals as a New Parameter in Intoxication Judgment. Information, An International Interdisciplinary Journal, Vol. 17 (2014), pp.6531-6536, 2014. [6] Seong-Geon Bae, Myung-Jin Bae, A New Speech Coding using Harmonics Emphasis Filter, ISAAC 2013, AACL Vol. 1(2013), pp.43-44, 2013. [7] Geum-Ran Baek, Myung-Jin Bae, Study on the Judgment of Intoxication State using Speech Analysis ". Vol.352 CCIS, 2012, pp. 277-282, Springer-Verlag. 2012. [8] Won-Hee Lee, Seong-Geon Bae, and Myung-Jin Bae, "A Study on Improving the Overloaded Speech Waveform to Distinguish Alcohol Intoxication using Spectral Compensation," IJET Vol.7, No.5, pp.235-237. [9] Chang-Joong Jung, Seong-Geon Bae, Myung-Sook Kim and Myung-Jin Bae, Speech Sobriety Test Based on Formant Energy Distribution, IJMUE Vol.8(2013), No.6, pp.209-216, 2013. [10] S. Marcieli Bellé, Sílvia do Amaral Sartori, Angela Garcia Rossi. Alcoholism: effects on the cochleovestibular apparatus. Rev Bras Otorrinolaringol. 73(1):116-22. 2007. 11163