Subject-Adaptve Real-Tme Sleep Stage Classfcaton Based on Condtonal Random Feld Gang Luo, PhD, Wanl Mn, PhD IBM TJ Watson Research Center, Hawthorne, NY {luog, wanlmn}@usbmcom Abstract Sleep stagng s the pattern recognton task of classfyng sleep recordngs nto sleep stages Ths task s one of the most mportant steps n sleep analyss It s crucal for the dagnoss and treatment of varous sleep dsorders, and also relates closely to bran-machne nterfaces We report an automatc, onlne sleep stager usng electroencephalogram (EEG) sgnal based on a recently-developed statstcal pattern recognton method, condtonal random feld, and novel potental functons that have explct physcal meanngs Usng sleep recordngs from human subjects, we show that the average classfcaton accuracy of our sleep stager almost approaches the theoretcal lmt and s about % hgher than that of exstng systems Moreover, for a subject s wth lmted tranng data D, we perform subject adaptaton to mprove classfcaton accuracy Our dea s to use the knowledge learned from old subjects to obtan from D a regulated estmate of CRF s parameters Usng sleep recordngs from human subjects, we show that even wthout any D, our sleep stager can acheve an average classfcaton accuracy of 70% on s Ths accuracy ncreases wth the sze of D and eventually becomes close to the theoretcal lmt 1 Introducton Sleep s ndspensable to everybody As have been reported n Ancol-Israel and Roth 1 that s consstent wth other natonal studes, about one-thrd of Amercans had some knd of sleep problem Hence, the study of sleep pattern, much of whch s through sleep recordngs, has consstently been an mportant research topc A typcal sleep recordng has one or more channels of electroencephalogram (EEG) waves comng from electrodes Sleep stagng s the pattern recognton task of classfyng sleep recordngs nto sleep stages (eg,, sleep) contnuously Ths task s crucal for the dagnoss and treatment of varous sleep dsorders 19 In addton, t relates closely to both ntensve care unt montorng of bran actvty 0 and bran-machne nterfaces In the latter case, successful classfcaton can facltate dsabled people to control computers Sleep stagng s also of specal nterest to the study of avan brd song system 3 and the evolutonary theory of mammalan sleep Many statstcal pattern recognton methods, such as autoregresson 5, Kullback-Lebler dvergencebased nearest-neghbor classfcaton, and hdden Markov model (HMM) 7, have been used to buld an automatc, onlne sleep stager Despte all these efforts, exstng sleep stagers can only acheve an average classfcaton accuracy below 0%, 19, whch s nsuffcent for physcans to quckly and correctly dagnose sleep dsorders by establshng a clear classfcaton of the problem (In bran-computer nterfaces, ncorrect EEG wave classfcaton can cause computers to receve wrong nstructons) In ths work, we present an automatc, onlne sleep stager based on a recently-developed statstcal pattern recognton method, condtonal random feld (CRF), and novel potental functons that have explct physcal meanngs Accordng to our testng results on sngle-channel sleep recordngs from human bengs, our sleep stager can acheve an average classfcaton accuracy that almost approaches the theoretcal lmt 9 and s about % hgher than that of exstng systems One challenge for sleep stagng s that n practce, we often have enough tranng data D old from several old subjects s old but very lmted tranng data D from a subject s, as t often takes several days or several weeks to manually label suffcent D for s 19 In ths case, t s undesrable to tran the parameter vector Θ of the CRF by only usng D Instead, we perform subject adaptaton to mprove the classfcaton accuracy on s Our hgh-level dea s to use the knowledge on Θ that s learned from D old to obtan a regulated estmate of Θ from D In ths way, the classfcaton accuracy on s ncreases wth the sze of D and eventually becomes close to the theoretcal lmt 9 Especally, even wthout any D, the average accuracy on s can be 70% accordng to our test results on sleep recordngs from human bengs CRF was orgnally proposed by the natural language processng communty n 001 11 In contrary to HMM, CRF drectly models the probabltes of possble label sequences gven an observaton sequence, wthout makng unnecessary ndependence
assumptons on the observaton elements Consequently, CRF overcomes HMM s shortcomng of beng unable to represent multple nteractng features or long-range dependences among the observaton elements To the best of our knowledge, nether the applcaton of CRF nor subject adaptaton has been studed before n EEG wave classfcaton The rest of the paper s organzed as follows Secton provdes a bref revew of CRF Secton 3 presents our automatc, onlne sleep stager based on CRF for a sngle subject Secton descrbes the subject adaptaton technque Secton 5 dscusses feature extracton We evaluate the performance of our technques n Secton and conclude n Secton 7 Revew of CRF We frst revew the concept of CRF Let X be the observaton sequence, and Y be the correspondng label (state) sequence The CRF defnton n Lafferty et al 11 s as follows: Defnton Let G = ( V, E) be a graph such that Y = ( y v ), so that Y s ndexed by the vertces of G v V Then ( X, Y ) s a condtonal random feld n case, when condtoned on X, the random varables y v obey the Markov property wth respect to the graph: P( yv X, yw, w v) = P( yv X, yw, w ~ v), where w~v means that w and v are neghbors n G y 1 y y 3 y n-1 y n X = ( x1, x, xn) Fgure 1 Graphcal structure of a lnear-chan CRF A specal case of CRF s the lnear-chan CRF (LCRF) 11 as shown n Fgure 1, where the graph G s a lnear chan so that each y has exactly two neghbors: y -1 and y +1 As has been shown n Lafferty et al 11, n ths case, the dstrbuton of the label sequence Y gven the observaton sequence X has the followng form: n k p( Y X ) exp{ [ = 1 λ j= j f j ( y y X 1 1 1,,, ) k + μ (,, )]} j =1 jg j y X Here, f j and g j are called potental functons λ j and μ j are parameters The selecton of approprate potental functons s both applcaton-dependent and crtcal to the success of the CRF method 3 CRF-based sleep stager for a sngle subject In our sleep stager, we use lnear-chan CRFs In ths case, v v v X = ( x1, x, xn ) s the observaton sequence, where each element v T x = [ 1,, m] s an m-dmensonal vector that represents the observed EEG wave sgnal (possbly after some transformaton) at tme pont ( 1 n ) Y = ( y1, y, yn ) s the label sequence Each y ( 1 n ) belongs to the sleep stage space S (eg, {,, N}) and represents the sleep stage at tme pont that needs to be labeled Our sleep stager uses the followng two knds of potental functons, the frst one s for f j and the second one s for g j : (1) 1y 1 = s1y = t ( s, t S ), () 1 y t x = (, h t S, 1 h m ) 1 ( f y = Here, the ndcator functon t) 1y = t = 0 ( f y t) For each ( 1 n ), the number of potental functons s k = S + S m Our ntuton s that local features are often the most mportant ones Hence, at any tme pont ( 1 n ), we focus on the local observaton elements and only consder the frst-order term x v Also, these potental functons are easy to compute, whch s mportant for onlne classfcaton In fact, these potental functons can be justfed from the statstcal mechancs perspectve: (1) The term exp{ λ s, t1y 1 } 1= s y = t can be vewed as the spontaneous transton probablty from state s to state t () As dscussed below, our X s the power spectral densty, a quantty assocated wth energy Hence, the term exp{ μ t, h1y t h} = can be vewed as an analogy to the Boltzmann factor P( E) exp( βe), whch s related to the probablty for a canoncal ensemble to be n a state wth energy E 1 Gven the k = k 1 + k potental functons, parameter estmaton (e, learnng λ j s and μ j s from a labeled tranng data set) and nference makng (e, gven X, computng the most lkely Y) n the CRF are performed usng the forward-backward dynamc programmng and Vterb algorthms, as descrbed n Lafferty et al 11 and Sha and Perera 1 Subject adaptaton Next, we dscuss subject adaptaton Ths technque combnes the (usually suffcent) tranng data sequence (X old, Y old ) from several old subjects s old wth the (possbly nsuffcent) tranng data sequence
(X, Y ) from a subject s to mprove the classfcaton accuracy on s Let Θ be the column parameter vector of the CRF that contans λ j s and μ j s L old( = ln p( Yold Xold, and L ( = ln p( Y X, are the log-lkelhood functons for s old and s, respectvely Let Θˆ denote the maxmum-lkelhood estmator (MLE) of Θ on s old A theorem about MLE 13 asserts that Θˆ asymptotcally follows a normal dstrbuton, whose mean vector and covarance matrx are Θ and 1 Σ = ( L old ), respectvely Here, Lold s the Hessan matrx of L old ( Ths can be vewed as a pror of Θ when we ft the same model to s The correspondng probablty densty functon s ˆ T 1 p ( exp{ ( Θ Σ ( Θ Θˆ ) / } ˆ T = exp{( Θ L ( Θ Θˆ old ) / } From Bayes theorem, the posteror dstrbuton of Θ s p( Θ X, Y ) p( Y X, p( T Lold exp{ L ( ) ( ˆ ) ( ˆ Θ + Θ Θ Θ / } The gradent of L old (, Lold, can be effcently computed usng a backward-forward dynamc programmng method 11 Lold can be computed numercally by takng dfference quotents of Lold Then we can obtan the pont estmate Θ for s by maxmzng ˆ T L ( ) ( ) ( ˆ Θ + Θ Θ Lold Θ / (eg, usng the BFGS method) 5 Data collecton and transformaton We appled our sleep stager to four -hour human sleep recordngs n the sleep-edf database 1 whose subject ds are sc00e0, sc01e0, sce0, and sc11e0 Each recordng was from a dfferent, healthy Caucasan male or female (1-35 years old) wthout any medcaton The raw data has samplng rate 0 and a sleep stage s assgned for each 30- second epoch by a human scorer The sleep stage space S = {,,, N, N3, N} Due to ts large sze and often exstng artfacts, each EEG recordng s frst transformed to capture the embedded, useful nformaton Ths process s called feature extracton The most popular sgnal processng technques for feature extracton nclude wavelet transform, fast Fourer transform 15, zerocrossng, parametrc waveform recognton 1, etc We adopted an approach based on power spectral propertes of the EEG sgnal The Thompson multtaper method 17 s appled to 3-second movng wndow 1 1 sc00e0 N N3 N 0 5 15 0 5 30 1 1 to obtan the localzed power spectral densty (PSD) wth between-wndow-shft of 7 seconds Consequently, we have 1,333 data ponts for each hour s sleep recordng Fgure shows the average log PSD for each stage For each frequency f and each tme pont, the logarthm of the PSD s normalzed across tme to obtan the Z score Z f,, where normalzaton s performed by frst subtractng the mean and then dvdng by the standard devaton sce0 N N3 N 0 5 15 0 5 30 1 1 sc01e0 N N3 N 0 5 15 0 5 30 1 1 sc11e0 N N3 N 0 5 15 0 5 30 Fgure Stage-specfc average logarthmc power spectral densty of four human subjects We choose m= dsjont frequency bands: 0-, -, -1, 1-1, 1-3, and 3-9, whch jontly contan 99% of the power of EEG waves The justfcatons for selectng these frequency bands are as follows Frst, as Fgure shows, the PSD curves of varous stages are well separated wthn these bands Second, t s well known that human sleep s characterzed nto dfferent stages based on the frequency content of the delta-wave (0-), theta-wave ( ), alphawave ( 13), beta1-wave (13 ), and beta-wave ( 35), whch are smlar to our frequency bands Hence, the features contaned wthn these bands should provde enough dscrmnaton power for stage classfcaton For the jth ( 1 j ) band, at tme pont, let ~ x, j denote the maxmum Z score wthn ths band That s, ~ x, j = max{ Z f,, for all frequences f n the jth band} Snce occasonally the recordng has very large nose caused by movement, we truncate ~ by x, j x ~ ) mn{ ~, j = sgn( j j, A), where A = 5
Vector v T x = [ 1,, m] s the transformed observaton element at tme pont The classfcaton of the sleep recordng s based on the x s across tme Results Our experments were performed on a computer wth one G Intel Core TM Duo T00 processor and GB of memory Feature extracton code s wrtten n Matlab R00a and classfcaton code s wrtten n R For each human subject, the tranng data contans four segments, each of 30 mnutes Two tests were performed on two dsjont test data segments, each of 0 mnutes Each sleep stage has suffcent occurrences n every test data segment For comparson, we also appled the wdely used benchmark classfer of Gaussan Observaton Hdden Markov Models (GOHMM) 7 to the same features as we used for the CRF classfer The feature extracton tme for each 30-mnute data segment s 0 seconds The tranng tme of the CRF classfer vares from 3 seconds to 30 seconds and labelng on test data takes less than one second Thus, the CRF classfer can be used onlne Table 1 reports the accuracy obtaned by the HMM classfer and the CRF classfer on human data The same experment s repeated usng the feature of mnmum Z-score n each frequency band and the results are smlar In most cases the CRF classfer acheves better accuracy than the HMM classfer wth average mprovement of about % The average accuracy of the CRF classfer (37% for human data) already approaches the lmt of automated sleep stagng method, as there s only 0%-90% nterscorer agreement n manual stagng 9 The HMM classfer, however, has an advantage of shorter tranng tme, normally 30 seconds, whch s expected gven ts strong model assumpton of Gaussan observaton Table 1 Classfcaton accuracy on human data classfcaton accuracy Test 1 Test average accuracy of two tests subject d CRF HMM CRF HMM CRF HMM sc00e0 1% 9% 77% 9% 797% % sc01e0 70% 7% 715% 77% 793% 7% sce0 97% 35% 7% 7% % 31% sc11e0 % 91% 930% 9% 7% 790% average accuracy of four subjects 37% 75% We evaluated our subject adaptaton technque usng the human data In each test, we treated one human subject as the subject s and vared the length L of s s tranng data sequence D from 0 to mnutes The other three human subjects are treated as old subjects and ther entre tranng data sequences are used to obtan Θ s pror dstrbuton The classfcaton accuracy acheved by subject adaptaton on the test data sequence of s s called the adaptaton accuracy When s s entre tranng data sequence s used to tran the CRF wthout subject adaptaton, the accuracy obtaned by the CRF classfer on the test data sequence of s s called the fnal accuracy Two tests were performed for each human subject In sx of these eght tests, even when L=0 (e, no tranng data from s ), we can obtan an adaptaton accuracy between 70% and 90%, whch s close to the fnal accuracy and mproves slghtly when L becomes larger Fgure 3 shows the classfcaton accuracy for the other two tests (test of sc01e0 and test 1 of sc11e0) There, the adaptaton accuracy s below 50% when L=0 When L becomes larger, the adaptaton accuracy mproves and eventually reaches the fnal accuracy classfcaton accuracy 0% 0% 0% 0% 0% 0% sc01e0 test adaptaton accuracy fnal accuracy 0 15 30 5 0 75 90 5 length of s 's tranng data sequence (mnutes) classfcaton accuracy 0% 0% 0% 0% 0% 0% sc11e0 test 1 adaptaton accuracy fnal accuracy 0 15 30 5 0 75 90 5 length of s 's tranng data sequence (mnutes) Fgure 3 Classfcaton accuracy acheved by subject adaptaton 7 Concluson One advantage of CRF s that the user can defne potental functons that approprately ft the specfc applcaton Ths paper proposes usng CRF and novel potental functons that have explct physcal meanngs to perform the pattern recognton task of sleep stagng On human subjects, the average classfcaton accuracy of our sleep stager almost approaches the theoretcal lmt and s about % hgher than that of exstng systems Moreover, for a subject s wth lmted tranng data D, we
propose performng subject adaptaton to mprove classfcaton accuracy Even wthout any D, the average accuracy on s can be 70% Ths accuracy ncreases wth the sze of D and eventually becomes close to the theoretcal lmt In addton to human sleep data, we also appled our sleep stager to brd sleep data and obtaned smlar results whose detals are avalable n Luo and Mn 1 Acknowledgements We thank Xng We, Zhenghua Fu, and Danel Margolash for helpful dscussons References 1 Ancol-Israel S, Roth T Characterstcs of nsomna n the Unted States: results of the 1991 natonal sleep foundaton survey Sleep Suppl, S37-S353, 1999 Is ths the bonc man? Nature, 1-171, 00 3 Rauske PL, Shea SD, and Margolash D State and neuronal class-dependent reconfguraton n the avan song system J NeuroPhysology 9, 1-1701, 003 Crck F, Mtchson G The functon of dream sleep Nature 30, 111-11, 193 5 Sergejew AA, Tso AC Markovan analyss of EEG sgnal dynamcs n obsessve-compulsve dsorder In Advances n Processng and Pattern Analyss of Bologcal Sgnals, I Gath, GF Inbar, Eds (Plenum, New York, 1995), pp 33- Gersch W, Martnell F, Yonemoto J, Low MD, and Ewan JA Mc Automatc classfcaton of electroencephalograms: Kullback-Lebler nearest neghbor rules Scence 05(0), 193-195, 1979 7 Penny WD, Roberts SJ Gaussan observaton hdden Markov models for EEG analyss Techncal report TR-9-1, Imperal College, London, 199 Becq G, Charbonner S, Chapotot F, Buguet A, Bourdon L, and Baconner P Comparson between fve classfers for automatc scorng of human sleep recordngs FSKD 00: 1-0 9 Schaltenbrand N, Lengelle R, and Toussant M et al Sleep stage scorng usng the neural network model: comparson between vsual and automatc analyss n normal subjects and patents Sleep 19(1), -35, 199 Rabner LR, Juang BH Fundamentals of speech recognton Prentce Hall, Englewood Clffs, NJ, 1993 11 Lafferty JD, McCallum A, and Perera FC Condtonal random felds: probablstc models for segmentng and labelng sequence data ICML 001: -9 1 Huang K Statstcal Mechancs, nd Edton John Wley & Sons, New York, 197 13 Bckel PJ, Rtov Y, and Rydén T Asymptotc normalty of the maxmum-lkelhood estmator for general hdden Markov models Ann Statst (), 11-135, 199 1 Kemp B The Sleep-EDF Database http://wwwphysonetorg/physobank/database/sl eep-edf, 00 15 Shaker MM EEG waves classfer usng wavelet transform and Fourer transform IJBS 1(), 5-90, 00 1 Hanaoka M, Kobayash M, and Yamazak H Automatc sleep stage scorng based on waveform recognton method and decson-tree learnng Systems and Computers n Japan 33(11), 1-13, 00 17 Thomson DJ Spectrum estmaton and harmonc analyss Proc IEEE 70(9), 55-9, 19 1 Sha F, Perera FC Shallow parsng wth condtonal random felds HLT-NAACL 003: 13-11 19 Penzel T, Kesper Gross V, Becker HF, and Vogelmeer C Problems n automatc sleep scorng appled to sleep apnea EBMS 003: 35-31 0 Scheuer ML Contnuous EEG montorng n the ntensve care unt Eplepsa 3 Suppl 3, 11-17, 00 1 Luo G, Mn W Subject-adaptve real-tme sleep stage classfcaton based on condtonal random feld Full verson avalable as IBM research report RC30 at http://pagescswscedu/~gangluo/eeg_fullpdf