Practical Use of Rating Scales for Mood and Anxiety Disorders Vytas Velyvis, MSc, PhD Candidate


At the conclusion of this workshop, participants should know which gold-standard mood and anxiety rating scales are used in psychiatry for clinical and research purposes. They will understand the psychometric strengths and weaknesses of these scales and will have had some practice using them. The scales were selected for their psychometric standards, their practicality in clinical settings, or both. Each scale will be described and compared with competing scales in terms of its psychometric properties, and some ideas about how certain scales come to be accepted in the literature as the gold standard will be discussed. For each scale, the items will be reviewed one by one: each item will be described in terms of the symptoms it is meant to assess, with suggestions and heuristics for assigning a numerical rating. Finally, general principles for searching for and selecting tests on the basis of clinical utility, practicality, validity and other psychometric properties will be described.

Practical Use of Rating Scales for Mood & Anxiety Disorders
Vytas P. Velyvis, Ph.D. (c)
Director, Interprofessional Research, Knowledge Translation & Academic Development
Ontario Shores Centre for Mental Health Sciences
e-mail: velyvisv@ontarioshores.ca
March 10, 2012, Toronto Psychopharmacology Update Day

Learning Objectives / Topics
- Why Use Rating Scales?
- Review of Psychometric Issues: Validity & Reliability
- Depression Rating Scales: MADRS & PHQ-9
- Anxiety Rating Scales: HARS & GAD-7
- Tips on choosing scales, more terminology and references

Educational Methods
- Didactic Overview / Discussion (role play)
- Video of training for scale
- Use with unstructured video
- Handouts include key definitions and useful references at end

Why Use Rating Scales?
- Reliability improves by using objective and standardized terms and procedures
- Can compare scores against normative data
- Some can be integrated into clinical practice (~ mental blood pressure check)
- Increasing demand for standardization in private, public and legal contexts
- Good to use as a screen for more intensive investigation (if necessary)

Expressed Myths against Use of Rating Scales
- Not part of routine / inertia trumps good intentions
- Learning new ways of doing things may be intimidating
- Other people only use informal methods, so it should be good enough for me
- It's too onerous / time consuming to conduct standardized interviews
- Using standardized scales interferes with the therapeutic relationship

Disadvantages to Using Clinician-Administered Rating Scales
- Requires training and quality control of interviewers
- Requires clear guidelines for rating accurately
- Quality control is difficult across multiple sites

What happens when we don't use Standardized Scales?
1. We tend to miss things
   - Standardized assessment found 2-3 times higher rates of comorbidity than routine assessment. Zimmerman & Mattia (1999) Psychiatric Diagnosis in Clinical Practice: Is Comorbidity being Missed? Comprehensive Psychiatry, 40, 182-191.
   - Standardized assessment significantly improves diagnostic accuracy compared with routine assessment. Ramirez-Basco, Bostic, Davies et al. (2000) Methods to Improve Diagnostic Accuracy in a Community Mental Health Setting. American Journal of Psychiatry, 157, 1599-1605.
2. Increased illness burden due to insufficient information
3. Can mean longer hospital stays
4. Higher costs of care
   - Savings of $3,591,280: 580 more admissions not having to be referred to private hospitals, which charge $800/day. Miller (2001) Inpatient Diagnostic Assessments: Causes and Effects of Diagnostic Imprecision. Psychiatry Research, 111, 191-197.

Key Terms in Evaluating Scales
Reliability: is the measure consistent?
- Internal Reliability
- Test-Retest Reliability
- Inter-Rater Reliability
Validity: is the measure assessing what it is supposed to assess?
- Content Validity
- Construct Validity
- Predictive Validity

Validity/Reliability Relationship
Tests that are reliable are not necessarily valid; however, tests that are unreliable are certainly NOT valid!
Example of an IQ test with HIGH RELIABILITY but NO VALIDITY:
- What month were you born?
- 1 + 1 = ?
- What is your mother's first name?

Video Clip 1 of Depression

Depression Rating Scales: PHQ-9 & MADRS

Overview of the MADRS
- 2nd most used depression rating scale
- More sensitive to symptom change over time
- 10 items focused on the core symptoms of depression
- Items are rated on a 0-6 scale (0 = no symptoms; 6 = severe)
- Greater emphasis placed on psychological symptoms of depression; less on somatic symptoms

Positive features of the MADRS
- Short but efficient scale
- Items have high sensitivity
- Clear symptom definitions and anchors reduce the need for extensive training
- Excellent inter-rater reliability
- Highly related to the HDRS

Drawbacks to the MADRS
- Original scale does not have standardized questions to elicit item responses
- Does not specify a time frame for rating, which undermines the reliability of severity ratings

Overview of the PHQ-9
- 9-item self-report scale that assesses depressive symptoms
- Symptoms map onto DSM-IV Depressive Disorder criteria (2-week time frame)
- One question concerning functional impairment
- Total score is obtained by summing the 9 items; scores range from 0-27
- A score of 10 or higher is highly indicative of MDD

Positive features of the PHQ-9
- Can be self-administered; brief
- Validated for use in family practice settings
- Can establish/predict a Depressive Disorder diagnosis
- Two-week time frame consistent with MDE
- Excellent internal consistency and test-retest reliability; high validity, sensitivity & specificity
- Less than a minute to score
- Highly correlated with external validators (HAM-D, QLES-Q) = convergent/concurrent validity

Drawbacks to the PHQ-9
- The suggested interpretive scoring guidelines require further psychometric validation
- Cannot use the score alone to determine diagnostic issues or to establish definitions of mild, moderate, severe, etc. (gray area)
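The PHQ-9 scoring rule above (sum the 9 items, range 0-27, a total of 10 or higher as the screening threshold) can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `score_phq9` and the example ratings are hypothetical, and the 0-3 rating per item is assumed from the 0-27 range of a 9-item total. In line with the drawbacks listed above, the flag is a screening signal, not a diagnosis.

```python
def score_phq9(item_ratings, cutoff=10):
    """Sum nine PHQ-9 item ratings (each 0-3) and flag totals at or above the cutoff.

    Hypothetical helper for illustration; the separate functional
    impairment question is not part of the summed total.
    """
    if len(item_ratings) != 9:
        raise ValueError("PHQ-9 total is based on exactly 9 symptom items")
    if any(r not in (0, 1, 2, 3) for r in item_ratings):
        raise ValueError("each item must be rated 0, 1, 2 or 3")
    total = sum(item_ratings)
    # A total at or above the cutoff suggests further assessment is warranted.
    return total, total >= cutoff


# Invented example ratings, one per item.
example = [2, 1, 2, 1, 1, 0, 2, 1, 0]
total, flagged = score_phq9(example)
print(f"PHQ-9 total = {total}, at or above cutoff: {flagged}")
```

Keeping the cutoff as a parameter reflects the point made above that the suggested interpretive guidelines still need further validation.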

PHQ-9 versus MADRS

                      PHQ-9                  MADRS
Items                 9 items                10 items
Time to administer    5-10 minutes           10-15 minutes
Time frame            covers past 2 weeks    time covered flexible
Score guidelines      Minimal 0-4            Remission < 8
                      Mild 5-12              Mild 9-17
                      Mod 13-19              Mod 18-34
                      Severe > 20            Severe > 35

Video Clip 2 of Depression

Anxiety Rating Scales: HARS (HAM-A) & GAD-7

Overview of the HARS (HAM-A)
- Assesses anxiety symptoms in clinically anxious individuals
- 14 items in total
- Each item is rated from 0 to 4 (0 = none; 4 = severe)
- Note: Hamilton's (1959) article suggests that a rating of 4 is very rarely used in outpatient practice and is meant for severely debilitating symptoms
- Emphasizes somatic/autonomic features; less emphasis on cognitive/psychological aspects of anxiety

Positive features of the HARS
- Widely used and easily available
- Distinguishes those with anxiety disorders from those without
- Sensitive to change with treatment
- Standardized interview guidelines can now be obtained (see especially Bruss et al., 1994)

Drawbacks to the HARS
- Items stem from the outdated clinical construct of "anxiety neurosis", which limits interpretation of scores
- Cannot distinguish anxious depression from anxiety
- While component symptoms are provided for each item, specific anchor points are not delineated
- Scale does not distinguish symptoms of a specific anxiety disorder

Overview of the GAD-7
- 7-item self-report screening tool for detecting generalized anxiety disorder
- Total score is obtained by summing the items (each rated 0-3); scores range from 0-21
- A score of 10 or higher is considered a "yellow flag" and is indicative of probable GAD (15 is considered a "red flag")

Positive features of the GAD-7
- Can be self-administered & brief
- Can establish a probable diagnosis of GAD
- Can be used as a screener for detecting panic disorder, social anxiety disorder and PTSD
- Good internal & test-retest reliability; high validity, sensitivity & specificity for GAD
- Correlates with the HAM-A and the World Health Organization Disability Scale (WHO-DAS-II)

Drawbacks to the GAD-7
- DSM-IV diagnostic criteria for GAD specify at least a 6-month duration of symptoms; the GAD-7 only asks about the last two weeks

HARS versus GAD-7

                      HARS                             GAD-7
Items                 14 items                         7 items
Time to administer    15-30 minutes                    5-10 minutes
Time frame            assesses past week               assesses past 2 weeks
Score guidelines      None / Remission < 5 *           Mild 5-9
                      Clinically significant > 14 *    Mod 10-14
                                                       Severe > 15
* Kobak et al., 1993

Video Clip 3 of Anxiety

Key Definitions, Tips & References
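As a worked illustration of the GAD-7 scoring rule summarized earlier (seven items rated 0-3, summed to a 0-21 total, with 10 and 15 as the "yellow flag" and "red flag" thresholds), here is a minimal Python sketch. The function name `score_gad7`, the flag labels and the example ratings are hypothetical and chosen only for illustration.

```python
def score_gad7(item_ratings):
    """Sum seven GAD-7 item ratings (each 0-3) and return (total, flag).

    Hypothetical helper: the flag is "red" at 15 or above, "yellow" at
    10-14, and "none" below 10. It is a screening signal only.
    """
    if len(item_ratings) != 7:
        raise ValueError("GAD-7 has exactly 7 items")
    if any(r not in (0, 1, 2, 3) for r in item_ratings):
        raise ValueError("each item must be rated 0, 1, 2 or 3")
    total = sum(item_ratings)
    if total >= 15:
        flag = "red"
    elif total >= 10:
        flag = "yellow"
    else:
        flag = "none"
    return total, flag


# Invented example ratings, one per item.
print(score_gad7([1, 2, 1, 2, 1, 2, 1]))
```

Because the GAD-7 asks only about the past two weeks, a flagged total would still need follow-up on symptom duration before a GAD diagnosis is considered.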

Definitions: Reliability

Internal Consistency/Reliability: the extent to which all items on a test measure the same variable or construct
- Two factors affect internal consistency: (a) correlations between items, (b) number of items
- Uses the Cronbach's alpha statistic for the overall test (> 0.70 reflects adequate reliability)
- For individual items, use the corrected item-to-total correlation: Pearson's r > 0.20
- Can interpret this statistic like ANOVA, i.e., if alpha = .70, then 70% of the score is believed to be attributable to the true test score, while 30% is attributed to measurement error

Cronbach's alpha: Rules of Thumb (Murphy & Davidshofer, 2004)
.95 = measurement errors have no effect (excellent but rare)
.90 = excellent (standardized intelligence tests)
.85 = high (standardized achievement tests)
.80 = moderate-high
.75 = moderate
.70 = moderate-low (adequate for most rating scales)
.60 = unacceptably low
.50 = effect of error and true score is exactly equal = no value

Test-Retest Reliability: assesses the extent to which multiple administrations of the scale generate the same results, i.e., is the test stable over time?
- Useful only if you expect the construct to be temporally stable
- Preferred statistic is the intraclass r, which allows for adjustment for agreement by chance (> 0.60; Pearson's r > 0.70)
- For test-retest of individual items, use Pearson's r > 0.70

Standard Error of Measurement (SEM): provides an estimate of how accurate test scores really are by estimating the variability in test scores. SEM is the standard deviation of measurement error. (Lower is not necessarily better.)

Inter-Rater Reliability: assesses the extent to which multiple raters generate the same result
- Preferred statistic is kappa or the intraclass r, which allows for adjustment for agreement by chance (> 0.60; Pearson's r > 0.70)

Interpretive guidelines suggested by Cicchetti (2001), Journal of Clinical and Experimental Neuropsychology, vol 23, 695-700:

Kappa, weighted kappa       Observed         Level of clinical or
or intraclass R             agreement (%)    practical significance
< .40                       < 70             Poor
.40-.59                     70-79            Fair
.60-.74                     80-89            Good
.75-1.00                    90-100           Excellent

Definitions: Validity

Content Validity: the extent to which expert opinion agrees that the instrument measures what it was designed to measure
- Based on a good description of the content domain, and agreement that the items tap the domain of behaviour described
- Conceptually similar to internal reliability

Construct Validity: the extent to which an instrument is an accurate measure of a particular construct or psychological measure
- Evidence includes expert judgments, internal consistency of the instrument, studies confirming differences between the construct measure and variables which should be different from it, or correlations with other variables with which the instrument is expected to have certain relationships

Convergent Validity: the extent to which an instrument shows high correlations with other measures or methods of measuring the same construct
- Convergent validity is adequate when a scale shows Pearson's r values of at least 0.50 in correlations with measures of the same construct

Criterion (Empirical) Validity (Concurrent vs. Discriminant): the extent to which an instrument measures what it was designed to measure, as indicated by the correlation of test scores with some criterion / gold standard measure of behavior

Concurrent Validity: the extent to which scores obtained by a group of people on an instrument are related to scores obtained on another (criterion / gold standard) measure of the same construct

Discriminant Validity: the extent to which an instrument has low correlations with other measures or methods of measuring different psychological constructs

Predictive Validity: the extent to which scores on a test are predictive of performance on some criterion measure assessed at a later time
- Determined by a statistically significant (p < 0.05) capacity to predict behavior (or change with treatment)

Sensitivity: the ability of an instrument to correctly identify individuals with the specified attribute

Specificity: the ability of an instrument to correctly exclude individuals without the specified attribute
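Several of the statistics defined above are simple enough to compute by hand, and a short Python sketch may make the definitions more concrete. It is not taken from the workshop: the function names and all data are invented for illustration, and it assumes the standard textbook formulas (Cronbach's alpha from item and total-score variances, SEM as the score standard deviation times the square root of one minus reliability, unweighted Cohen's kappa for two raters, and sensitivity/specificity from a 2x2 comparison against a gold standard).

```python
from math import sqrt
from statistics import variance


def cronbach_alpha(rows):
    """rows: one list of item ratings per respondent (same items in each)."""
    k = len(rows[0])
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def standard_error_of_measurement(score_sd, reliability):
    """SEM = SD of observed scores * sqrt(1 - reliability)."""
    return score_sd * sqrt(1 - reliability)


def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' categorical ratings.

    Undefined (division by zero) when expected agreement equals 1.
    """
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)


def sensitivity_specificity(screen_positive, has_condition):
    """Compare screening results with a gold-standard diagnosis (booleans)."""
    pairs = list(zip(screen_positive, has_condition))
    tp = sum(s and c for s, c in pairs)
    fn = sum((not s) and c for s, c in pairs)
    tn = sum((not s) and (not c) for s, c in pairs)
    fp = sum(s and (not c) for s, c in pairs)
    return tp / (tp + fn), tn / (tn + fp)


# Invented data: 4 respondents x 3 items, two raters, and a small screening sample.
print(cronbach_alpha([[0, 1, 0], [1, 1, 2], [2, 3, 2], [3, 3, 3]]))
print(standard_error_of_measurement(score_sd=10, reliability=0.75))
print(cohen_kappa([1, 1, 0, 0], [1, 0, 0, 0]))
print(sensitivity_specificity([True, True, False, False], [True, False, True, False]))
```

The resulting values can be read against the rules of thumb above, for example an alpha above 0.70 or a kappa of .60-.74 counting as "Good" in the Cicchetti guidelines.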

Sources of Information about Scales
- References in your handout
- Key Library Reference Books: The Mental Measurements Yearbook; Tests in Print; Tests; Test Critiques

Typical Contents of Scale Reviews
- Content, purpose, & underlying theory of scale construct(s)
- Identification of source and means to acquire
- Scoring and administration guidelines
- Normative information about the population it was standardized on
- Reliability and validity data

References

MADRS
Mittmann N, Mitter S, Borden KE, Herrmann N, Naranjo CA & Shear NH: Montgomery-Asberg severity gradations. American Journal of Psychiatry 1997; 154(9): 1320-1321.
Montgomery SA & Asberg M: A new depression scale designed to be sensitive to change. British Journal of Psychiatry 1979; 134: 382-389.
Müller MJ, Szegedi A, Wetzel H & Benkert O: Moderate and severe depression gradations for the Montgomery-Asberg depression rating scale. Journal of Affective Disorders 2000; 60: 137-140.
Snaith RP, Harrop FM, Newby DA & Teale C: Grade scores of the Montgomery-Asberg depression and clinical anxiety scales. British Journal of Psychiatry 1986; 148: 599-601.

PHQ-9
Kroenke, K., Spitzer, R. L., & Williams, J. B. W. (2001). Validity of a brief depression severity measure. Journal of General Internal Medicine, 16(9), 606-613.
Liu, S., Yeh, Z., Huang, H., Sun, F., Tjung, J., Hwang, L., Shih, Y., & Yeh, A. W. (2011). Validation of Patient Health Questionnaire for depression screening among primary care patients in Taiwan. Comprehensive Psychiatry, 52, 96-101.
Löwe, B., Kroenke, K., Herzog, W., & Grafe, K. (2003). Measuring depression outcome with a brief self-report instrument: sensitivity to change of the Patient Health Questionnaire. Journal of Affective Disorders, 81(1), 61-66.

HAM-A
Hamilton M: The assessment of anxiety states by rating. British Journal of Medical Psychology 1959; 32: 50-55.
Kobak KA, Reynolds WM & Greist JH: Development and validation of a computer-administered version of the Hamilton Anxiety Scale. Psychological Assessment 1993; 5: 487-494.

GAD-7
Kroenke, K., Spitzer, R. L., Williams, J. B. W., & Löwe, B. (2010). The Patient Health Questionnaire somatic, anxiety, and depressive symptom scales: a systematic review. General Hospital Psychiatry, 32, 345-359.
Löwe, B., Decker, O., Müller, S., Brähler, E., Schellberg, D., Herzog, W., & Herzberg, P. Y. (2008). Validation and standardization of the Generalized Anxiety Disorder screener (GAD-7) in the general population. Medical Care, 46, 266-274.
Ruiz, M. A., Zamorano, E., García-Campayo, J., Pardo, A., Freire, O., & Rejas, J. (2010). Validity of GAD-7 scale as an outcome measure of disability in patients with generalized anxiety disorders in primary care. Journal of Affective Disorders, 128, 277-286.

Resource References
Aiken, LR: Rating Scales & Checklists: Evaluating Behavior, Personality, and Attitude. New York, John Wiley & Sons, Inc., 1996.
American Psychiatric Association: Handbook of Psychiatric Measures. Washington, DC, American Psychiatric Association, 2000.
Hulley, SB, Cummings, SR, Browner, WS, Grady, D, Hearst, N, Newman, TB: Designing Clinical Research: An Epidemiologic Approach (2nd edition). Philadelphia, Lippincott, Williams, & Wilkins, 2001.
Lam, RW, Michalak, EE, Swinson, RP: Assessment Scales in Depression, Mania and Anxiety. Oxfordshire, UK, Taylor & Francis Group, 2005.

Prien, RF, Robinson, DS: Clinical Evaluation of Psychotropic Drugs: Principles and Guidelines. New York, Raven Press, 1994.
Sajatovic, M, Ramirez, LF: Rating Scales in Mental Health. Hudson, Ohio, Lexi-Comp Inc, 2001.
Streiner, DL, Norman, GR: Health Measurement Scales: A Practical Guide to Their Development and Use (2nd edition). New York, Oxford University Press, Inc., 1995.
Vogt, WP: Dictionary of Statistics & Methodology: A Nontechnical Guide for the Social Sciences (2nd edition). Thousand Oaks, California, Sage Publications, Inc., 1999.