Actigraphy Scoring Reliability in the Study of Osteoporotic Fractures


INSTRUMENTATION AND METHODOLOGY

Actigraphy Scoring Reliability in the Study of Osteoporotic Fractures

Terri Blackwell, MA 1; Sonia Ancoli-Israel, PhD 2; Philip R. Gehrman, PhD 3; Jennifer L. Schneider, MPH 1; Kathryn L. Pedula, MS 4; Katie L. Stone, PhD 1

1 San Francisco Coordinating Center and California Pacific Medical Center Research Institute, San Francisco, CA; 2 Department of Psychiatry, University of California and Veterans Affairs San Diego Healthcare System, San Diego, CA; 3 Center for Sleep and Respiratory Neurobiology, Hospital of the University of Pennsylvania, PA; 4 Kaiser Permanente Center for Health Research, Portland, OR

Study Objectives: The editing and scoring of actigraphy data are important for calculating variables that describe sleep. Scoring is dependent on marking time points for when a participant got in and out of bed, plus time when the actigraph was removed. This placement of time points is subject to error. We examined interscorer reliability to determine if files scored by 2 different people were comparable.

Design: Observational study.

Setting: Community-based.

Participants: A subset of 36 women taken from the latest biannual visit of the Study of Osteoporotic Fractures. All women had actigraphy data scored by 1 scorer for the Study of Osteoporotic Fractures staff, plus a blinded rescoring by an expert scorer at a different site.

Interventions: N/A.

Measurements and Results: The outcomes of interest from actigraphy are duration of in-bed interval, total sleep time, sleep latency, sleep efficiency, wake after sleep onset, total nap time, and total daytime minutes of watch removal. Clearly documented actigraphy scoring procedures were used. There were no significant differences between the expert scorer and the study scorer in sleep outcomes (all P values >.16 from a paired t test). There was a small but statistically significant difference between scorers for watch removal times (mean absolute difference 3.4 minutes ± 5.4, P = .02). The intraclass correlation coefficients showed a high level of agreement (range, 0.84-0.99).

Conclusions: Even in a large study with 2 scorers, it is possible to use actigraphy as a measure of sleep without introducing interscorer measurement error. Using well-documented scoring and data-gathering procedures is essential for data quality control.

Keywords: Actigraphy, reliability, sleep, scoring

Citation: Blackwell T; Ancoli-Israel S; Gehrman PR et al. Actigraphy scoring reliability in the study of osteoporotic fractures. SLEEP 2005;28(12):1599-1603.

Disclosure Statement

This was not an industry supported study. Ms. Blackwell has received support from Eli Lilly. Dr. Ancoli-Israel is a member of the speakers bureau, scientific advisory board, and/or consultant for Neurocrine Pfizer, Sanofi-Aventis, King Pharmaceuticals, Takeda, and Sepracor. Drs. Gehrman and Stone and Mss. Schneider and Pedula have indicated no financial conflicts of interest.

Submitted for publication February 2005. Accepted for publication August 2005.

Address correspondence to: Terri Blackwell, San Francisco Coordinating Center, 185 Berry Street, Lobby 4, Suite 5700, San Francisco, CA 94107; Tel: (415) 600-7412; Fax: (415) 514-8150; E-mail: tblackwell@psg.ucsf.edu

INTRODUCTION

ACTIGRAPHS ARE MOVEMENT-RECORDING DEVICES THAT HAVE BEEN USED FOR MORE THAN 25 YEARS IN THE STUDY OF SLEEP AND CIRCADIAN RHYTHMS. The advantage of actigraphy over traditional polysomnography is that actigraphs can record data continuously, for weeks or longer. The actigraph is generally placed on the wrist, and movement is recorded via an accelerometer and scored with validated algorithms to infer sleep and wake.[1]

The editing and scoring of actigraphy data is an important step in the process of calculating variables that describe sleep quality and quantity, including sleep latency, sleep efficiency, time spent in bed, time spent out of bed, time spent napping, and total sleep time while in bed. Scoring is dependent on correctly marking time points to denote when the participant got in and out of bed and deleting times from the record when the participant removed the actigraph, such as for bathing. Point placement relies on self-reporting in a sleep diary or log by the participant or an observer.

As reviewed in Ancoli-Israel et al,[1] studies comparing actigraphy with polysomnography, considered the gold standard for measuring sleep, have found high correlations for differentiating sleep from wake.[2,3] Correlations for total sleep time have been shown to be 0.97,[2] with overall agreement ranging from 91% to 93% in adults (aged 20-30 years)[4] and minute-by-minute agreement rates ranging from 91.4% to 96.5% in adolescents (aged 10-16 years) and adults (aged 20-30 years).[5] Actigraphy has also been shown to be valid for assessing sleep durations and sleep/wake activity in healthy adults but is less reliable for more-specific measures, such as sleep offset or sleep efficiency.[6] In nursing-home populations, where it is impossible to use sleep diaries, correlation between actigraphy and polysomnography for total sleep time has ranged from 0.81 to 0.91 and for percentage of sleep, from 0.61 to 0.78.[7]

Scoring of actigraphs often includes a subjective component, particularly when sleep diaries are incomplete, inaccurate, or not available. Yet, to our knowledge, no studies have compared interrater reliability between actigraph scorers. As part of a larger study, the Study of Osteoporotic Fractures (SOF), we had the opportunity to compare actigraphy scoring between raters in a subset of 36 women ranging in age from 77 to 92 years, some of whom were frail or cognitively impaired. This study compared results from scorers at the San Francisco Coordinating Center (SFCC) with those from an experienced scorer at the University of California San Diego (UCSD).
METHODS

Participants

The SOF is a longitudinal study designed to examine the risk factors of osteoporotic fractures in women. Community-dwelling women aged 65 years or older were recruited from population-based listings in 4 United States cities: Baltimore, Maryland; Minneapolis, Minnesota; Portland, Oregon; and the Monongahela Valley, Pennsylvania. At the baseline visit, women were excluded if they were unable to walk without help or had a previous bilateral hip replacement. The SOF study enrolled 9704 Caucasian women from September 1986 to October 1988.[8] Initially, African American women were excluded from the study because of their low incidence of hip fractures, but, from February 1997 to February 1998, 662 African American women were recruited.[9]

The focus of our analysis is the most recent biannual visit, which took place between January 2002 and April 2004. There were a total of 4727 participants at this visit: 3137 (66%) visited a study clinic for performance measures, anthropometry, and clinic interview, 1051 (22%) had self-administered questionnaire data only, and 539 (11%) had a limited visit done in their homes. At this biannual visit, we introduced measures of sleep into the established SOF protocol. Actigraphy data were collected on all consenting participants who completed a clinic or home visit. In-home polysomnography was collected in a convenience sample of 461 women at 2 of the clinics. Questionnaire information regarding sleep habits was also gathered. The institutional review boards at each clinic site approved the study, and written informed consent was obtained from all participants.

Actigraphy Equipment

The Sleepwatch (Ambulatory Monitoring, Inc., Ardsley, NY) was used. This actigraph, which looks like a wristwatch, measures movement using a piezoelectric bimorph-ceramic cantilevered beam, which generates a voltage each time the actigraph is moved. These voltages are gathered continuously and stored in 1-minute epochs. Data were collected in 3 modes: zero crossing, digital integration (also known as proportional integration mode), and time above threshold. In zero crossing mode, the number of times the signal voltage crosses zero voltage is summed over the epoch.
Digital integration mode is a high-resolution measurement of the area under the rectified conditioned transducer signal (area under the curve). In the time above threshold mode, the amount of time in tenths of a second spent above the sensitivity threshold is gathered over the epoch.[10]

ActionW-2 software (Ambulatory Monitoring, Inc.) was used to analyze the data.[11] Sleep-scoring algorithms available in this software were used to determine sleep from wake times. The Cole-Kripke algorithm was used for data collected in the zero crossing mode, and the UCSD scoring algorithm was used for data collected in the digital integration and time above threshold modes.[12,13] These algorithms calculate a moving average, which takes into account the activity levels immediately prior to and after the current minute, to determine if the time point should be coded as sleep or wake.

Actigraphy Data Collection and Scoring

All clinic staff gathering actigraphy data were required to go through formal centralized training by SFCC staff and pass a certification test before being allowed to oversee collection of data. At the clinic or home visit, the actigraph was placed on the woman's nondominant wrist by clinic staff. If needed, a terrycloth wristband was placed between the device and the skin to prevent chafing.[14] A verbal explanation of the actigraph was given to the participants, as well as an information sheet with a contact phone number to call for further information or questions. The participant was told to wear the actigraph at all times for at least 3 consecutive 24-hour periods, except when bathing or during water sports. The participants were asked to complete a sleep diary for the duration of time they wore the actigraph. The data gathered on the diary included time in and out of bed, time they thought they fell asleep and awoke, number of times they thought they woke during the night, information about naps, time and circumstances for removing the actigraph, and any times during which they may have been sitting still for long periods (like watching television or at the movies).

The 2 actigraphy scorers at the SFCC were introduced to scoring methods by an experienced scorer (UCSD sleep disorders laboratory, Dr. Ancoli-Israel director). Manuals of operation were developed based on this guidance and were used at the SFCC. Guidance on how to make judgment calls when diary information was incomplete or inaccurate was included in these manuals.

Points were placed on the computer file to mark the intervals the participants were in bed and the times the device was removed. An example of point placement is shown in Figure 1. Actigraph removals for reasons that required wakefulness, such as bathing or water sports, were coded as awake. Those actigraph removals listed on the diary that did not require wakefulness or did not include information on why the watch was removed were deleted from the analysis. If the data suggested that the actigraph had been removed but no information was collected on the diary, these time points were deleted from the analysis. Data collection began during the clinic or home exam. It was assumed that it might take participants a small amount of time to become accustomed to wearing the device, so data collected prior to the first in-bed interval were removed from analysis in case they did not represent normal activity levels (approximately 8 hours of data).

Figure 1. Actigraph file before and after point placement by scorer. Points were placed to denote an in-bed interval from 9:35 PM to 5:44 AM on the following day. Points were placed to remove data when the actigraph was off from 6:24 AM to 6:56 AM. SFCC refers to San Francisco Coordinating Center; UCSD, University of California San Diego.

SLEEP, Vol. 28, No. 12, 2005. Downloaded from https://academic.oup.com/sleep/article-abstract/28/12/1599/2708055

Each scored actigraph file was graded for quality. Each point placement was graded as excellent, good, fair, or poor based on how well the participant's self-report from the diary coincided with data that appeared in the Action-W data file (see Table 1 for grading definitions). If information was not present on the diary, the point placement was given the lowest possible score. Summary variables were created as the average of these grading scores across all days for in bed, out of bed, and removal points. These summary scores ranged from 1 to 4, with higher values representing better quality.

Table 1. Grading Levels for Actigraph File Quality*

Grading for times in and out of bed
  Excellent (4)   Sleep diary matches data within 10 min.
  Good (3)        Sleep diary matches data within 30 min.
  Fair (2)        Sleep diary matches data within 30 min to 1 h.
  Poor (1)        Sleep diary is inconsistent with data by more than 1 h.
Grading for watch removals
  Excellent (4)   Sleep diary matches data within 5 min.
  Good (3)        Sleep diary matches data within 15 min.
  Fair (2)        Sleep diary matches data within 30 min.
  Poor (1)        Sleep diary is inconsistent with data by more than 30 min or has removals that are not listed on the diary.

*Comparing how well the actigraph data file corresponded with the participant's self-report data from the sleep diary.

Actigraphy Data Rescoring

The scorer at UCSD was sent a copy of the SFCC's manuals of operation for data editing as well as raw actigraph data files and their corresponding sleep diaries. The UCSD scorer was blinded to the results of the SFCC scoring. All scoring was done using the same software and sleep-scoring algorithms.

Outcomes of Interest

The outcomes of greatest interest were those affected by decisions made by the scorer, including sleep latency (the time it took the participant to fall asleep once they were in bed), sleep efficiency (the percentage of time the participant was asleep while in bed), time in bed, time spent sleeping while in bed, time spent napping, and minutes of wake after sleep onset (WASO). The amount of time the actigraph was removed during the day, which may affect the calculation of the napping variable, was also examined. Outcomes were calculated for each day the participant wore the actigraph.
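The outcome definitions above can be sketched in code. This is a minimal illustration, not the ActionW-2 implementation: it assumes each in-bed interval has already been scored into 1-minute epochs (1 = sleep, 0 = wake), takes sleep onset to be the first epoch scored as sleep, and counts every wake minute after onset up to the out-of-bed point as WASO. The function names `sleep_outcomes` and `average_over_days` are hypothetical.

```python
from typing import Dict, List


def sleep_outcomes(epochs: List[int]) -> Dict[str, float]:
    """Summarize one in-bed interval of 1-minute epochs (1 = sleep, 0 = wake).

    Assumption: sleep onset is the first epoch scored as sleep. Scoring
    software may instead require a sustained block of sleep.
    """
    time_in_bed = len(epochs)
    if time_in_bed == 0:
        raise ValueError("in-bed interval is empty")
    total_sleep = sum(epochs)
    if total_sleep == 0:
        # No sleep scored: latency spans the whole interval, WASO is zero.
        latency, waso = time_in_bed, 0
    else:
        latency = epochs.index(1)            # minutes from in-bed point to onset
        waso = epochs[latency:].count(0)     # wake minutes after sleep onset
    return {
        "time_in_bed_min": time_in_bed,
        "total_sleep_min": total_sleep,
        "sleep_latency_min": latency,
        "waso_min": waso,
        "sleep_efficiency_pct": 100.0 * total_sleep / time_in_bed,
    }


def average_over_days(daily: List[Dict[str, float]]) -> Dict[str, float]:
    """Average each outcome across all recorded days for one participant."""
    return {key: sum(day[key] for day in daily) / len(daily) for key in daily[0]}


# Hypothetical night: 10 min awake, 50 asleep, 5 awake, 35 asleep.
night = [0] * 10 + [1] * 50 + [0] * 5 + [1] * 35
print(sleep_outcomes(night))
```

Because every outcome is derived from the in-bed and out-of-bed points, moving either point by a few minutes changes time in bed, latency, and efficiency together, which is why point placement is the focus of the reliability comparison.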
The average of these variables across all days was also calculated. Using these averaged data, the differences between the 2 scorers are presented to show direction, and absolute difference is presented to show the magnitude. Skewed outcomes (total nap time, sleep latency, watch removals, and WASO) were log-transformed to meet normality assumptions.

Other Measurements

All participants completed questionnaires, which included questions about medical history, self-reported health, physical activity, smoking, and alcohol and caffeine use. The Geriatric Depression Scale was used to assess depressive symptoms, with the standard cutoff of 6 or more symptoms used to define depression.[15] The Pittsburgh Sleep Quality Index[16,17] and the Epworth Sleepiness Scale[18] were completed. During the home or clinic visits, current medication use was assessed. The Mini-Mental State Examination was administered to assess cognitive function,[19] and the Functional Outcomes of Sleep Questionnaire[20] was completed. Information on 6 independent activities of daily living was also gathered.[21,22] Body weight and height were measured,[23] and body mass index was calculated as weight in kilograms divided by the square of the height in meters.

Table 2. Actigraphy Data Collection

                                                N (%)
Eligible for actigraphy                         3676
Those with actigraphy data                      3127 (85.0)
Reason no actigraphy data*
  Participant not given actigraph               457 (83.2)
  Data not sent, site decision                  9 (1.6)
  Actigraph malfunction                         26 (4.7)
  Software or initialization problem            14 (2.6)
  Participant removed watch (too little data)   43 (7.8)

*Percentage of the 549 subjects who did not have usable data.

Statistical Analysis

Analyses were performed with data collected in digital integration, zero crossing, and time above threshold modes. Results were similar for all 3 modes, and, therefore, only results from the digital integration mode are presented.
Our sample size was based on a t test performed on the difference between the original SFCC scorer and the gold standard rescorer for all outcomes. We designed our study to have 80% power to detect an effect size of 0.5 or greater with an α level of .05. That is, the detectable difference in means between the original scorer and the gold standard scorer is half the standard deviation of the measure being compared. The subset of data was randomly selected by site using a random number generator.

We compared characteristics between this randomly selected subset of women and the remaining SOF population to determine if the results based on the subset were generalizable to the entire SOF cohort. For continuous variables, t tests were used for those with normal distributions, and Wilcoxon rank sum tests were used for those with skewed distributions. Differences in categorical data were examined using χ² tests. Fisher exact tests were used when categorical variables had low expected cell counts. Only those with actigraphy data are included in these comparisons. We also examined if these characteristics were related to the absolute difference between the 2 scorers for our outcomes of interest by performing linear regression with the absolute difference variables as outcomes and the characteristics as predictors.

The main focus of the analyses examining the differences between scorers used the data averaged over all nights the participants wore the actigraph. This is the primary way the data are used in analysis in the SOF study, in order to limit the night-to-night variability. The differences between the gold standard and the SFCC scorers were examined using paired t tests on the average of the data over all days the woman wore the watch. Agreement between the scorers was assessed with intraclass correlation coefficients (ICC) and 95% confidence intervals (CI), which were computed using a 2-way analysis of variance.[24] Bland and Altman plots were examined to assess systematic bias in the differences.[25]

Figure 2. Bland and Altman Plots. SFCC refers to San Francisco Coordinating Center; UCSD, University of California San Diego; CI, confidence interval.

An additional analysis using random coefficients models was performed on the individual day-by-day data from each participant. The model used was

$$y_{ijk} = \mu + \mathrm{day}_k + \mathrm{participant}_i + \mathrm{scorer}_j + (\mathrm{participant} \times \mathrm{day})_{ik} + e_{ijk}$$

where i = participant, j = scorer, and k = the day of the record. Day is a fixed effect, while the participant and scorer terms are random effects. With this model, the grand mean is represented by µ, the day fixed effect represents the systematic offset from the grand mean that is attributed to time (which day it is), the participant random effect represents the within-participant variance, the scorer random effect represents the between-scorer variance, and the remaining variance estimated by e_ijk represents random error. The summary measures we are interested in from this model are the coefficients of variation (CV) between scorers. These were calculated as 100 × (standard deviation between scorers) / grand mean. The 95% CIs for these CVs were obtained using a bootstrap approach. A similar CV for within participant was also calculated.

RESULTS

Subject Characteristics

Of the 3676 women eligible, 3219 (87.6%) wore the actigraph, and 3127 (85.0%) had usable data (Table 2). Of those without data, 457 (83.2%) were not given the device. Reasons the watch was not distributed were not gathered, but clinic staff reported participant refusal and staff decision based on advanced frailty or cognitive impairment of the participant as the most common reasons. Indeed, the average Mini-Mental State Examination score was worse and the rate of having at least 1 impairment in independent activities of daily living was higher for these 457 without actigraphy data compared with those given the watch (P<.001).
Participant error accounted for 7.8% of the missing data, caused by the participant removing the device before 1 full night of data could be collected.

Table 3. Characteristics of Women with Actigraphy Data

Characteristic                      Reliability subset   Remaining SOF     P value*
                                    (n=36)               (n=3091)
Age, y                              82.3 ± 3.1           83.6 ± 3.8        .04
Body mass index, kg/m²              26.9 ± 3.8           27.0 ± 5.0        .77
African American, no. (%)           1 (2.8)              327 (10.6)        .17
One or more IADL, no. (%)           21 (58.3)            1622 (52.8)       .51
Hand tremor, no. (%)                2 (6.1)              268 (9.3)         .52
Currently taking, no. (%)
  Medication for sleep              6 (16.7)             427 (13.8)        .63
  Antidepressants                   3 (8.3)              424 (13.7)        .47
  Benzodiazepines                   3 (8.3)              224 (7.3)         .74
GDS ≥ 6, no. (%)                    5 (13.9)             362 (11.7)        .61
MMSE, score range 0-30              28.0 ± 1.6           27.8 ± 2.0        .97
Education, y                        12.7 ± 2.6           12.8 ± 2.7        .92
Current smoker, no. (%)             1 (2.8)              84 (2.7)          1.00
Average no. alcoholic drinks        0.7 ± 0.8            0.5 ± 0.7         .15
Daily caffeine intake, mg           158.9 ± 155.0        150.0 ± 153.5     .75
Takes walks for exercise, no. (%)   18 (51.4)            1130 (37.0)       .08
Self-reported health, no. (%)                                              .27
  Poor/very poor                    2 (5.6)              69 (2.2)
  Fair                              10 (27.8)            685 (22.2)
  Good/very good                    24 (66.7)            2433 (75.6)
ESS, score range 0-24               4.8 ± 3.1            5.7 ± 3.8         .16
FOSQ, score range 4-16              15.4 ± 1.0           15.3 ± 0.8        .03
PSQI, score range 0-21              5.9 ± 3.7            6.3 ± 3.7         .40

Data are presented as mean ± SD unless otherwise indicated. SOF refers to Study of Osteoporotic Fractures; BMI, body mass index; IADL, independent activities of daily living; GDS, Geriatric Depression Scale; MMSE, Mini-Mental State Examination; ESS, Epworth Sleepiness Scale; FOSQ, Functional Outcomes of Sleep Questionnaire; PSQI, Pittsburgh Sleep Quality Index.
*P values for continuous data are from a t test for normally distributed data and a Wilcoxon rank sum test for skewed data. P values for categorical data are from a χ² test. Those where cells have low expected counts are from a Fisher exact test.
Average no. alcoholic drinks is the number of drinks per day in the last 30 days.
Participants whose data collection failed were asked to wear the device again. Twenty-four participants included in the successful actigraphy count had a first wearing with unusable data and a second wearing with usable data. Including these 24 participants who initially had failed data, our total data-collection failure rate among those given an actigraph was 3.6%. Our goal was to collect at least 3 nights of data for each participant. The average number of nights was 4.1 ± 0.8, with 92 participants (2.9%) having only 1 or 2 nights (62 with 2 nights, 30 with 1 night). Of the 3127 participants with data, 75 had data gathered in the zero crossing mode only. In our analysis subset, all 36 participants had data collected for at least 3 nights, and the data were gathered in all 3 modes.

A few characteristics were significantly different (P < .05) between the reliability subset of 36 and the remaining SOF cohort with actigraphy data (Tables 3 and 4). The reliability subset was younger, on average, by 1.3 years and had a better quality of sleep diaries. On average, they also had slightly better sleep efficiency (by 3.9%), fewer minutes of WASO (by 17.3), and better scores on the Functional Outcomes of Sleep Questionnaire (by 0.1).

Few characteristics were related to the absolute differences between the scorers for our outcomes of interest. The quality of the in-bed and removal points was negatively related to the absolute difference in total watch-removal time between the scorers (P < .05). For each 1-unit increase in the quality of the in-bed points, there was a 3.8-minute decrease in the absolute difference between the scorers; for each 1-unit increase in the quality of removal points, there was a 2.6-minute decrease in the absolute difference. All other characteristics, including age, ethnicity, and body mass index, were not related to the absolute differences examined (P > .05).

Table 4. Summary of Actigraphy Data

                                               Reliability subset   Remaining SOF       P value*
                                               (n = 36)             (n = 3091)
Actigraphy file information
  Nights of data, no.                          4.2 ± 0.6            4.1 ± 0.8           .35
  In-bed intervals, duration in min            508.3 ± 69.0         528.4 ± 78.1        .13
  Quality of in-bed points
    Mean ± SD, range 1-4                       3.7 ± 0.5            3.2 ± 0.7           <.001
    Totals coded as, discrepancy in min
      Excellent, < 10                          109 (73.2)           4373 (35.2)
      Good, 10-30                              24 (16.1)            5550 (44.6)
      Fair, 30-60                              12 (8.1)             1242 (10.0)
      Poor, > 60                               4 (2.7)              1267 (10.2)
  Quality of out-of-bed points
    Mean ± SD, range 1-4                       3.5 ± 0.5            3.2 ± 0.7           .004
    Totals coded as, discrepancy in min
      Excellent, < 10                          98 (65.8)            4877 (39.2)
      Good, 10-30                              28 (18.8)            5078 (40.9)
      Fair, 30-60                              17 (11.4)            1203 (9.7)
      Poor, > 60                               6 (4.0)              1274 (10.3)
  Quality of watch-removal points
    Mean ± SD, range 1-4                       3.4 ± 0.9            2.8 ± 1.0           <.001
    Totals coded as, discrepancy in min
      Excellent, < 5                           77 (66.4)            3984 (41.5)
      Good, 5-15                               13 (11.2)            1113 (11.6)
      Fair, 15-30                              8 (6.9)              2058 (21.4)
      Poor, > 30                               18 (15.5)            2455 (25.6)
  Actigraph recording represented normal sleep 32 (88.9)            2755 (89.5)         .79
Actigraphy data averaged over all days
  Total sleep time, min                        410.7 ± 72.9         405.1 ± 77.9        .67
  Total nap time, min                          58.6 ± 42.7          76.3 ± 66.0         .20
  Sleep efficiency, %                          81.0 ± 9.4           77.1 ± 12.0         .04
  Sleep latency, min                           35.9 ± 29.5          41.9 ± 41.6         .35
  Wake after sleep onset, min                  60.9 ± 36.8          78.2 ± 48.3         .02
  Watch removal time during the day, min       28.9 ± 30.7          30.8 ± 42.3         .73

*P values for continuous data are from a t test for normally distributed data and a Wilcoxon rank sum test for skewed data. P values for categorical data are from a χ2 test; those in which cells have low expected counts are from a Fisher exact test.
Continuous data are presented as mean ± SD. Categorical data are presented as number (percentage).
The totals represent the total number of nights and days the actigraph was worn during the period in which the point placement falls into these categories; the denominator is the total number of nights and days the actigraph was worn. For in-bed and out-of-bed point placement, the total number = 149 nights for the reliability subset; n = 12432 for the remaining SOF population. For watch-removal points, the total number = 116 days for the reliability subset; n = 9610 for the remaining SOF population.

Table 5. Comparison of SFCC and UCSD Scorers on the Actigraphy Outcomes Averaged Over All Days and Nights

Variable                                   Difference*     Absolute Difference   P value   ICC (95% CI)
Duration of in-bed interval, min           1.9 ± 8.2       2.6 ± 8.0             .17       0.99 (0.99, 1.00)
Total sleep time, min                      -0.6 ± 24.5     6.0 ± 22.7            .89       0.95 (0.90, 0.97)
Total nap time, min                        -0.7 ± 8.6      3.2 ± 8.0             .99       0.98 (0.97, 0.99)
Sleep efficiency, %                        -0.5 ± 5.6      1.7 ± 5.4             .59       0.84 (0.71, 0.92)
Sleep latency, min                         4.9 ± 19.9      5.6 ± 19.7            .21       0.88 (0.77, 0.94)
Wake after sleep onset, min                0.6 ± 2.0       0.6 ± 2.0             .16       0.99 (0.98, 1.00)
Watch-removal time during the day, min     2.4 ± 5.9       3.4 ± 5.4             .02       0.93 (0.87, 0.96)

*Difference refers to data from the scorers at the San Francisco Coordinating Center (SFCC) minus those from an experienced scorer at the University of California San Diego (UCSD).
Data are presented as mean ± SD from 36 pairs. ICC refers to intraclass correlation coefficient; CI, confidence interval.
P value is from a t test on the paired data. Tests were performed on log-transformed data for naps, sleep latency, wake after sleep onset, and watch-removal times because of skewness.
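The difference and absolute-difference summaries in Table 5, together with the limits of agreement that underlie a Bland and Altman plot, can be illustrated with a short sketch. This is not the authors' analysis code: the function name `paired_agreement`, the dictionary keys, and the example values are hypothetical, and the sketch works on untransformed values, whereas the table footnote states that some outcomes were tested after log transformation.

```python
import math
from statistics import mean, stdev


def paired_agreement(scorer_a, scorer_b):
    """Summarize paired per-participant values from two scorers (a minus b).

    Returns the mean and SD of the signed differences, the mean and SD of the
    absolute differences, the paired t statistic, and Bland-Altman 95% limits
    of agreement (mean difference +/- 1.96 SD of the differences).
    """
    if len(scorer_a) != len(scorer_b) or len(scorer_a) < 2:
        raise ValueError("need two equal-length sequences with at least 2 pairs")

    diffs = [a - b for a, b in zip(scorer_a, scorer_b)]
    abs_diffs = [abs(d) for d in diffs]
    n = len(diffs)
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD, n - 1 in the denominator

    # Paired t statistic; undefined when every difference is identical.
    t_stat = bias / (sd / math.sqrt(n)) if sd > 0 else float("nan")

    return {
        "n": n,
        "mean_diff": bias,
        "sd_diff": sd,
        "mean_abs_diff": mean(abs_diffs),
        "sd_abs_diff": stdev(abs_diffs),
        "t_stat": t_stat,
        "loa_lower": bias - 1.96 * sd,
        "loa_upper": bias + 1.96 * sd,
    }


if __name__ == "__main__":
    # Hypothetical total sleep time values (minutes) for four participants.
    sfcc = [400, 410, 395, 420]
    ucsd = [398, 410, 399, 415]
    for key, value in paired_agreement(sfcc, ucsd).items():
        print(key, value)
```

Converting the t statistic to a P value requires the t distribution with n - 1 degrees of freedom, which the Python standard library does not provide; a statistics package would be needed for that step.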
Interrater Reliability

There were no significant differences in sleep outcomes between the SFCC scorers and the gold-standard UCSD scorer for data that were averaged over all days the participant wore the actigraph (Table 5). The absolute differences between the scorers were negligible. The Bland and Altman plot for total sleep time shows no discernible pattern that would imply a systematic bias affecting the differences (Figure 2a). There were 2 noticeable points with differences over 80 minutes. In the first case, a participant had more than 2 hours during an in-bed interval that was not noted on the sleep diary as a removal but appeared questionable. One scorer removed this section of time from the analysis; the other did not and left it scored as sleep. The second case arose from an issue with interpretation of the scoring protocol. Our participants were told to remove the watch at 9:30 AM after the last night of data collection. Initially, our scoring protocol specified ending the file at 9:30 AM on the last morning. This participant removed the watch at 7:30 AM to return to the clinic. One scorer deleted the last night of data in order to end the file at 9:30 AM, while the other ended the file at 7:30 AM after the last night. Our protocols were rewritten to define the end of the data file as 9:30 AM on the last morning or the last usable data point, to avoid this problem in the future, and all scored files were examined for similar problems. Plots for the other sleep outcomes were examined and showed no discernible patterns (data not shown). For the duration of the night-time interval, total sleep time, and sleep latency, there were data from 27 participants for which the 2 scorers matched exactly, 3 for which the SFCC scorer underestimated, and 6 for which the SFCC scorer overestimated. For sleep efficiency, 24 matched exactly, the SFCC scorer underestimated 7, and the SFCC scorer overestimated 5.
For WASO, 32 matched exactly, with 4 overestimated by the SFCC scorer. For total nap time, 24 matched exactly, with the SFCC scorer underestimating 10 and overestimating 2. The intervals scored as times the participant removed the watch showed the largest difference between the 2 scorers (3.4 ± 5.4-minute absolute difference). This difference was statistically significant (P = .02); however, these differences did not affect the napping data. The Bland and Altman plot showed no systematic bias affecting the differences (Figure 2b). Of the 36 pairs of data, 29 matched within 5 minutes. For 1 file, the SFCC scorer underestimated the watch-removal time when compared with the UCSD scorer; for 7 files, the 2 scorers matched exactly; and for 28 files, the SFCC scorer denoted more time as watch-removal time, although 22 of the 28 were within 2 minutes of the UCSD scorer.

The ICC can range from 0 to 1, with a higher number representing better agreement between raters. Our ICCs for all outcomes were near 1 (range 0.84-0.99), showing almost complete agreement (Table 5). In our random coefficients analysis, the variation between scorers was so low for all outcomes that it could not be modeled.

DISCUSSION

Within the SOF study, a large number of actigraphy files were sent to the SFCC for editing. The high volume resulted in the need for more than 1 staff member to be responsible for scoring. It was important, therefore, that there was a scoring manual that allowed scorers to reproduce the same results, given the same sleep diaries and actigraph data files. Currently, we are using the outcomes averaged over the days the participant was wearing the watch. In our reliability study, we found no statistically significant difference between the 2 scorers for any of the sleep outcomes we use for analysis. There was a small but statistically significant difference in watch removals between the scorers. Placement of points for watch removals generally requires more guesswork by the scorer because participants tend to keep less-accurate records of removals than of in-bed intervals. Even so, the outcome that would be most affected by this, total nap time, was not significantly different.
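To make the ICC values in Table 5 concrete, the following is a minimal sketch of one common form of the coefficient, ICC(2,1) in the Shrout and Fleiss notation (two-way random effects, single rater, absolute agreement), computed from ANOVA mean squares. The text here does not state which ICC form was used, so the choice of ICC(2,1) is an assumption, and the function name `icc_2_1` and the example values are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, single rater, absolute agreement.

    `ratings` is a list of rows, one row per participant, each row holding
    one value per scorer (for example, [sfcc_value, ucsd_value]).
    """
    n = len(ratings)                      # number of participants
    k = len(ratings[0]) if n else 0       # number of scorers
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need >= 2 participants and >= 2 scorers per row")

    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # participants
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # scorers
    ss_err = ss_total - ss_rows - ss_cols

    bms = ss_rows / (n - 1)               # between-participants mean square
    jms = ss_cols / (k - 1)               # between-scorers mean square
    ems = ss_err / ((n - 1) * (k - 1))    # residual mean square

    denom = bms + (k - 1) * ems + k * (jms - ems) / n
    if denom == 0:
        raise ValueError("ICC is undefined when all ratings are identical")
    return (bms - ems) / denom


if __name__ == "__main__":
    # Hypothetical total sleep time values (minutes): [scorer 1, scorer 2].
    pairs = [[400, 398], [410, 410], [395, 399], [420, 415], [360, 362]]
    print(round(icc_2_1(pairs), 3))
```

Because this form penalizes a consistent offset between scorers as well as random disagreement, it is a conservative choice when the question is whether two scorers can be used interchangeably.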
The between-scorer variation from the random coefficients models using the day-by-day data shows that the variation attributed to the scorer is so small that it cannot be modeled. In contrast, the within-participant variation is large enough to be modeled. For example, the within-participant coefficient of variation (CV) for total sleep time was 16.7%, and the CV for sleep efficiency was 11.4% (data not shown). This shows that the variance attributed to interscorer measurement error is negligible when compared with the variability introduced into the sleep measurements by daily variation within the participant. The ICCs, which assess unsystematic variation due to the scorer, also show almost complete agreement between the scorers.

These results demonstrate that, even in a large-scale study involving 2 scorers, actigraphy can be used as a measure of sleep and wake without introducing significant interscorer measurement error. The use of clearly operationalized and documented scoring procedures, as well as rigid training criteria for staff, is essential for quality control of data. Of note, the interscorer reliability in this study was better than that typically reported in studies utilizing polysomnography, in which sleep-scoring interrater agreement ranges from 80% to 98% (26). This suggests that polysomnography, although considered the gold standard for measuring sleep, introduces a greater degree of measurement error when multiple scorers are used.

Our study has several limitations. Our analysis subset was not completely comparable with the rest of the SOF cohort with actigraphy data. The subset was slightly younger and had a higher quality of sleep diaries, better sleep efficiency, less WASO, and higher Functional Outcomes of Sleep Questionnaire scores than the remaining population. These were small but statistically significant differences. We conducted our reliability study within the first 6 months of our biannual visit.
It can be speculated that the women who were seen first were the higher-functioning women, in that there were no home visits in the reliability subset, as compared with 9.5% in the remaining SOF population with actigraphy data (P = .04). We had only 2 scorers at SFCC to compare with the expert scorer at UCSD. No comparisons between the 2 scorers at SFCC were made.

We conclude that, with careful training and adherence to protocols, actigraphy data can be reliably edited and scored using more than 1 scorer. This was true even when scoring data from elderly women, some of whom had mild to moderate cognitive impairments.

ACKNOWLEDGEMENTS

Investigators in the Study of Osteoporotic Fractures Research Group: San Francisco (Coordinating Center): SR Cummings (principal investigator), DM Black (co-investigator), KL Stone (co-investigator), DC Bauer (co-investigator), MC Nevitt (co-investigator), W Browner (co-investigator), J Schneider (project director), R Benard, T Blackwell, P Cawthon, M Dockrell, S Ewing, C Fox, R Fullman, D Kimmel, S Litwack, LY Lui, J Maeda, L Nusgarten, L Palermo, M Rahorst, C Schambach, R Scott, D Tanaka, C Yeung, J Ziarno. University of Maryland: MC Hochberg (principal investigator), L Makell (coordinator), MA Walsh, B Whitkop. University of Minnesota: KE Ensrud (principal investigator), M Homan (co-investigator), C Quinton (clinic coordinator), C Bird, D Blanks, C Burckhardt, F Imker-Witte, K Jacobson, D King, K Knauth, N Nelson. University of Pittsburgh: JA Cauley (principal investigator), LH Kuller (co-principal investigator), J Zmuda (co-investigator), L Harper (project director), L Buck (clinic coordinator), C Bashada, W Bush, D Cusick, A Flaugh, A Githens, M Gorecki, D Moore, M Nasim, C Newman, N Watson. The Kaiser Permanente Center for Health Research, Portland, Oregon: T. Hillier (principal investigator), E. Harris (co-investigator), E.
Orwoll (co-investigator), K Vesco (co-investigator), J Van Marter (project administrator), M Rix (clinic coordinator), J Wallace, K Snider, T Suvalcu-constantin, A MacFarlane, K Pedula, J Rizzo.

REFERENCES

1. Ancoli-Israel S, Cole R, Alessi CA, Chambers M, Moorcroft WH, Pollak CP. The role of actigraphy in the study of sleep and circadian rhythms. Sleep 2003;26:342-92.
2. Jean-Louis G, von Gizycki H, Zizi F, et al. Determination of sleep and wakefulness with the actigraph data analysis software (ADAS). Sleep 1996;19:739-43.
3. Blood ML, Sack RL, Percy DC, Pen JC. A comparison of sleep detection by wrist actigraphy, behavioral response, and polysomnography. Sleep 1997;20:388-95.
4. Jean-Louis G. Sleep estimation from wrist movement quantified by different actigraphic modalities. J Neurosci Methods 2001;105:185-91.
5. Sadeh A, Sharkey KM, Carskadon MA. Activity-based sleep-wake identification: an empirical test of methodological issues. Sleep 1994;17:201-7.
6. Reid K, Dawson D. Correlation between wrist activity monitor and electrophysiological measures of sleep in a simulated shiftwork environment for younger and older subjects. Sleep 1999;22:378-85.
7. Ancoli-Israel S, Clopton P, Klauber MR, Fell R, Mason WJ. Use of wrist activity for monitoring sleep/wake in demented nursing home patients. Sleep 1997;20:24-7.
8. Cummings SR, Black DM, Nevitt MC, et al. Appendicular bone density and age predict hip fracture in women: the Study of Osteoporotic Fractures Research Group. JAMA 1990;263:665-8.
9. Vogt MT, Rubin DA, Palermo L, et al. Lumbar spine listhesis in older African American women. Spine J 2003;3:255-61.
10. Motionlogger User's Guide: Act Millennium. Ambulatory Monitoring, Inc. Ardsley, NY.
11. Action-W User's Guide, Version 2.0. Ambulatory Monitoring, Inc. Ardsley, NY.
12. Cole RJ, Kripke DF, Gruen W, Mullaney DJ, Gillin JC. Automatic sleep/wake identification from wrist actigraphy. Sleep 1992;15:461-9.
13. Girardin JL, Kripke DF, Mason WJ, Elliot JA, Youngstedt SD. Sleep estimation from wrist movement quantified by different actigraphic modalities. J Neurosci Methods 2001;105:185-91.
14. Ancoli-Israel S. Actigraphy. In: Kryger MH, Roth T, Dement WC, eds. Principles and Practice of Sleep Medicine. 3rd ed. Philadelphia: WB Saunders; 2000:1295-301.
15. Sheikh JI, Yesavage JA. Geriatric Depression Scale (GDS): recent evidence and development of a shorter version. Clin Gerontol 1986;5:165-73.
16. Buysse DJ, Reynolds CF 3rd, Monk TH, Berman SR, Kupfer DJ. The Pittsburgh Sleep Quality Index: a new instrument for psychiatric practice and research. Psychiatry Res 1989;28:193-213.
17. Buysse DJ, Reynolds CF 3rd, Monk TH, Hoch CC, Yeager AL, Kupfer DJ. Quantification of subjective sleep quality in healthy elderly men and women using the Pittsburgh Sleep Quality Index (PSQI). Sleep 1991;14:331-8.
18. Johns MW. A new method for measuring daytime sleepiness: the Epworth Sleepiness Scale. Sleep 1991;14:540-5.
19. Folstein MF, Robins LN, Helzer JE. The Mini-Mental State Examination. Arch Gen Psychiatry 1983;40:812.
20. Weaver TE, Laizner AM, Evans LK, Maislin G, Chugh DK, Lyon K, Smith PL, Schwartz AR, Redline S, Pack AI, Dinges DF. An instrument to measure functional status outcome for disorders of excessive sleepiness. Sleep 1997;20:835-43.
21. Fitti JE, Kovar MG. The supplement on aging to the 1984 National Health Interview Survey. Vital & Health Statistics-series 1: Programs & collection procedures. 1987;21:1-115.
22. Pincus T, Summey JA, Soraci SA Jr, Wallston KA, Hummon NP. Assessment of patient satisfaction in activities of daily living using a modified Stanford Health Assessment Questionnaire. Arthritis Rheum 1983;26:1346-53.
23. Lohman TG, Roche AF, Martorell R. Anthropometric Standardization Reference Manual. Champaign: Human Kinetics Books; 1988:177.
24. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979;86:420-8.
25. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-10.
26. Ogilvie RD, Wilkinson RT. Behavioral versus EEG-based monitoring of all-night sleep/wake patterns. Sleep 1988;11:139-55.