Data harmonization tutorial: teaser for FH2019

Similar documents
Methodological Issues in Measuring the Development of Character

Mostly Harmless Simulations? On the Internal Validity of Empirical Monte Carlo Studies

EC352 Econometric Methods: Week 07

Module 14: Missing Data Concepts

George B. Ploubidis. The role of sensitivity analysis in the estimation of causal pathways from observational data. Improving health worldwide

A Bayesian Nonparametric Model Fit statistic of Item Response Models

Equating UDS Neuropsychological Tests: 3.0>2.0, 3.0=2.0, 3.0<2.0? Dan Mungas, Ph.D. University of California, Davis

Table of Contents. Preface to the third edition xiii. Preface to the second edition xv. Preface to the fi rst edition xvii. List of abbreviations xix

Investigating the Invariance of Person Parameter Estimates Based on Classical Test and Item Response Theories

Computerized Mastery Testing

THE GOOD, THE BAD, & THE UGLY: WHAT WE KNOW TODAY ABOUT LCA WITH DISTAL OUTCOMES. Bethany C. Bray, Ph.D.

On Test Scores (Part 2) How to Properly Use Test Scores in Secondary Analyses. Structural Equation Modeling Lecture #12 April 29, 2015

Appendix Part A: Additional results supporting analysis appearing in the main article and path diagrams

Structural Equation Modeling (SEM)

SLEEP DISTURBANCE ABOUT SLEEP DISTURBANCE INTRODUCTION TO ASSESSMENT OPTIONS. 6/27/2018 PROMIS Sleep Disturbance Page 1

Biostatistics II

11/24/2017. Do not imply a cause-and-effect relationship

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

Chapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS)

Lecture Outline. Biost 590: Statistical Consulting. Stages of Scientific Studies. Scientific Method

Selected Topics in Biostatistics Seminar Series. Missing Data. Sponsored by: Center For Clinical Investigation and Cleveland CTSC

Technical Specifications

Measures. David Black, Ph.D. Pediatric and Developmental. Introduction to the Principles and Practice of Clinical Research

Missing Data and Imputation

THE APPLICATION OF ORDINAL LOGISTIC HEIRARCHICAL LINEAR MODELING IN ITEM RESPONSE THEORY FOR THE PURPOSES OF DIFFERENTIAL ITEM FUNCTIONING DETECTION

Quality of Life. The assessment, analysis and reporting of patient-reported outcomes. Third Edition

Statistical data preparation: management of missing values and outliers

Selection and Combination of Markers for Prediction

Chapter 17 Sensitivity Analysis and Model Validation

Psychometrics for Beginners. Lawrence J. Fabrey, PhD Applied Measurement Professionals

Lecture Outline Biost 517 Applied Biostatistics I. Statistical Goals of Studies Role of Statistical Inference

Introduction to Multilevel Models for Longitudinal and Repeated Measures Data

Business Statistics Probability

Reliability, validity, and all that jazz

Application of Item Response Theory to ADAS-cog Scores Modeling in Alzheimer s Disease

PROMIS ANXIETY AND KESSLER 6 MENTAL HEALTH SCALE PROSETTA STONE ANALYSIS REPORT A ROSETTA STONE FOR PATIENT REPORTED OUTCOMES

Item Analysis Explanation

Still important ideas

ANNEX A5 CHANGES IN THE ADMINISTRATION AND SCALING OF PISA 2015 AND IMPLICATIONS FOR TRENDS ANALYSES

MS&E 226: Small Data

FATIGUE. A brief guide to the PROMIS Fatigue instruments:

Study Guide #2: MULTIPLE REGRESSION in education

PROMIS PAIN INTERFERENCE AND BRIEF PAIN INVENTORY INTERFERENCE

PROMIS DEPRESSION AND CES-D

Contents. What is item analysis in general? Psy 427 Cal State Northridge Andrew Ainsworth, PhD

You must answer question 1.

Political Science 15, Winter 2014 Final Review

The Classification Accuracy of Measurement Decision Theory. Lawrence Rudner University of Maryland

investigate. educate. inform.

PROMIS ANXIETY AND MOOD AND ANXIETY SYMPTOM QUESTIONNAIRE PROSETTA STONE ANALYSIS REPORT A ROSETTA STONE FOR PATIENT REPORTED OUTCOMES

In this module I provide a few illustrations of options within lavaan for handling various situations.

S Imputation of Categorical Missing Data: A comparison of Multivariate Normal and. Multinomial Methods. Holmes Finch.

Lecture Outline. Biost 517 Applied Biostatistics I. Purpose of Descriptive Statistics. Purpose of Descriptive Statistics

UNIT 4 ALGEBRA II TEMPLATE CREATED BY REGION 1 ESA UNIT 4

A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY

Lec 02: Estimation & Hypothesis Testing in Animal Ecology

PROMIS DEPRESSION AND NEURO-QOL DEPRESSION

A Brief Introduction to Bayesian Statistics

Differential Item Functioning

PROSETTA STONE ANALYSIS REPORT A ROSETTA STONE FOR PATIENT REPORTED OUTCOMES PROMIS SLEEP DISTURBANCE AND NEURO-QOL SLEEP DISTURBANCE

CSE 258 Lecture 1.5. Web Mining and Recommender Systems. Supervised learning Regression

USE OF DIFFERENTIAL ITEM FUNCTIONING (DIF) ANALYSIS FOR BIAS ANALYSIS IN TEST CONSTRUCTION

Empowered by Psychometrics The Fundamentals of Psychometrics. Jim Wollack University of Wisconsin Madison

Section on Survey Research Methods JSM 2009

PROSETTA STONE ANALYSIS REPORT A ROSETTA STONE FOR PATIENT REPORTED OUTCOMES PROMIS PEDIATRIC ANXIETY AND NEURO-QOL PEDIATRIC ANXIETY

Validity and Reliability

Validity and reliability of measurements

Item Response Theory: Methods for the Analysis of Discrete Survey Response Data

Does factor indeterminacy matter in multi-dimensional item response theory?

Comparability Study of Online and Paper and Pencil Tests Using Modified Internally and Externally Matched Criteria

Data Analysis Using Regression and Multilevel/Hierarchical Models

Section 5. Field Test Analyses

PAIN INTERFERENCE. ADULT ADULT CANCER PEDIATRIC PARENT PROXY PROMIS-Ca Bank v1.1 Pain Interference PROMIS-Ca Bank v1.0 Pain Interference*

Detection of Unknown Confounders. by Bayesian Confirmatory Factor Analysis

Analysis of Environmental Data Conceptual Foundations: En viro n m e n tal Data

Estimating Reliability in Primary Research. Michael T. Brannick. University of South Florida

Scale Building with Confirmatory Factor Analysis

1 Introduction. st0020. The Stata Journal (2002) 2, Number 3, pp

COLLEGIUM REPRODUCIBILITY PRINCIPLES, PROBLEMS, PRACTICES. Symposium

Describe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo

PLS 506 Mark T. Imperial, Ph.D. Lecture Notes: Reliability & Validity

Appendix 1. Sensitivity analysis for ACQ: missing value analysis by multiple imputation

Design and Analysis Plan Quantitative Synthesis of Federally-Funded Teen Pregnancy Prevention Programs HHS Contract #HHSP I 5/2/2016

Incorporating Measurement Nonequivalence in a Cross-Study Latent Growth Curve Analysis

INTRODUCTION TO ASSESSMENT OPTIONS

Lecture 4: Research Approaches

Lecture Outline Biost 517 Applied Biostatistics I

Retrospective power analysis using external information 1. Andrew Gelman and John Carlin May 2011

PROSETTA STONE ANALYSIS REPORT A ROSETTA STONE FOR PATIENT REPORTED OUTCOMES

Analysis of TB prevalence surveys

ANXIETY. A brief guide to the PROMIS Anxiety instruments:

Jumpstart Mplus 5. Data that are skewed, incomplete or categorical. Arielle Bonneville-Roussy Dr Gabriela Roman

HOW STATISTICS IMPACT PHARMACY PRACTICE?

Accuracy of Range Restriction Correction with Multiple Imputation in Small and Moderate Samples: A Simulation Study

Multiple Calibrations in Integrative Data Analysis: A Simulation Study and Application to Multidimensional Family Therapy

Performance of Median and Least Squares Regression for Slightly Skewed Data

Introduction to Multilevel Models for Longitudinal and Repeated Measures Data

accuracy (see, e.g., Mislevy & Stocking, 1989; Qualls & Ansley, 1985; Yen, 1987). A general finding of this research is that MML and Bayesian

Practical Bayesian Optimization of Machine Learning Algorithms. Jasper Snoek, Ryan Adams, Hugo LaRochelle NIPS 2012

Transcription:

Data harmonization tutorial: teaser for FH2019 Alden Gross, Johns Hopkins Rich Jones, Brown University Friday Harbor Tahoe 22 Aug. 2018 1 / 50

Outline What is harmonization? Approach Prestatistical harmonization - the accounting job Apply the tool Diagnostics/checks Distribution-based IRT-based Missing data 2 / 50

Motivation We want to do analyses Cross-sectional comparisons at a time point Longitudinal change in a variable over time Individual changes Inter-variable differences in intra-individual changes (change regressed on a covariate of interest) Nesselroade and Baltes, 1979, per Buss, 1974 3 / 50

Harmonization Statistical harmonization Harmonization is broad Qualitative assessments of the comparability of measures Statistical approaches to equate and link measurement scales or tests Different studies on the same topic often implement different measures due to: Developmental differences in the target population Investigator proclivities Logistical issues Harmonization across data sources can help synthesize information across different sources 4 / 50

Harmonization Integrative vs coordinated analysis Integrative data analysis (IDA): analysis of multiple datasets, together or in parallel, to address substantive research hypotheses Together (pooled): Individual-participant meta-analysis In parallel: Coordinated analysis, in which models are run separately in each data source Goals of harmonization of pooled data Larger sample size to afford power for questions that cannot be addressed from individual data sources (e.g., genetics; interactions) Address innovative questions that cannot be answered with one data source (e.g., what does cognitive change look like across the life span? Is education or study membership a stronger predictor of cognitive change?) 5 / 50

Approaches Approaches to harmonization (Griffith et al., 2012 AHRQ report) Prestatistical harmonization Accounting work Gather available test data Evaluate test responses Describe the sample Statistical harmonization approach for test equating Diagnostics 6 / 50

Approaches Pre-statistical harmonization Identify the items to be used Rename variables to common rubric Recode missing data codes Trim outliers (e.g., winsorize) Stretch out variable distributions Discretization Check for small cells Check for anchor items 7 / 50
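As an illustration of the outlier-trimming step above, here is a minimal winsorizing sketch in Python; the percentile cutoffs and scores are hypothetical, not from the slides:

```python
def winsorize(xs, lo_pct=0.05, hi_pct=0.95):
    """Clamp extreme values to the chosen percentiles (a simple
    winsorizing rule; the 5%/95% cutoffs are illustrative)."""
    s = sorted(xs)
    lo = s[int(lo_pct * (len(s) - 1))]
    hi = s[int(hi_pct * (len(s) - 1))]
    return [min(max(x, lo), hi) for x in xs]

scores = list(range(101)) + [999]   # made-up scores with one wild outlier
trimmed = winsorize(scores)
# the outlier is pulled in to the 95th-percentile value
```

Winsorizing keeps every record (unlike deleting outliers), which matters when sample size is at a premium in pooled analyses.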

Approaches Visualizing number of anchors 8 / 50

Approaches Challenges and lessons in harmonization: The Anna Karenina principle From Leo Tolstoy's Anna Karenina: "Happy families are all alike; every unhappy family is unhappy in its own way." Our analog Well-designed studies are all alike; every study with design features that present challenges for analysis presents challenges for analysis in its own way 9 / 50

Approaches Apply the method 10 / 50

Approaches for co-calibration Distribution-based Goal: define a transformation of a test that returns the same cumulative probability plot as the other variable being compared Mean equating Linear equating Equipercentile equating Item-based Goal: define a transformation of a test that places it on the same metric as another Item response theory Multiple imputation for missing data 11 / 50

Mean equating Relative position is defined by the absolute difference from the sample mean of a test, and each individual's score is changed by the same amount to equate the sample mean to that of a reference test 12 / 50
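A minimal Python sketch of mean equating as described above, using made-up score lists:

```python
def mean_equate(new_scores, ref_scores):
    """Shift every new-test score by the difference in sample means,
    so the equated mean matches the reference test's mean."""
    shift = sum(ref_scores) / len(ref_scores) - sum(new_scores) / len(new_scores)
    return [x + shift for x in new_scores]

ref = [20, 24, 26, 30]   # hypothetical reference test, mean 25
new = [10, 12, 14, 16]   # hypothetical new test, mean 13
equated = mean_equate(new, ref)  # [22, 24, 26, 28]: mean now 25
```

Note that only the mean is matched; the spread of the new test is left untouched, which is exactly what linear equating improves on.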

Linear equating Relative position is defined in terms of standard deviations from the group mean. Linear equating is accomplished by adjusting scores from the new form to be within the same number of standard deviations of the mean of the original form. 13 / 50
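A minimal Python sketch of linear equating via z-scores (the scores are hypothetical; `statistics.pstdev` supplies the SDs):

```python
from statistics import mean, pstdev

def linear_equate(new_scores, ref_scores):
    """Express each new-test score in SD units from the new-test mean,
    then place it the same number of SDs from the reference mean."""
    m_new, s_new = mean(new_scores), pstdev(new_scores)
    m_ref, s_ref = mean(ref_scores), pstdev(ref_scores)
    return [m_ref + (x - m_new) / s_new * s_ref for x in new_scores]

ref = [20, 25, 30, 25]   # hypothetical reference test
new = [8, 12, 16, 12]    # hypothetical new test
eq = linear_equate(new, ref)
# equated scores now have the reference test's mean and SD
```

Unlike mean equating, both location and scale are matched, but the shape of the distribution is still assumed comparable.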

Equipercentile equating Defines relative position by a score's percentile rank in the group. Accomplished by identifying percentiles for scores on a reference test and transforming each score on a new test to the score on the reference with the same percentile rank 14 / 50

Equipercentile equating Each score on one test (non-30 point MMSE version) is transformed to the score on the reference test (30-point MMSE) with the same percentile rank 15 / 50

How is it done? Mean equating MeanEq_test2 = old_test2 - (mean_test2 - mean_test1) Linear equating LinEq_test2 = mean_test1 + (old_test2 - mean_test2) * (sd_test1/sd_test2) z scores! Equipercentile equating R package: equate https://cran.r-project.org/web/packages/equate/equate.pdf 16 / 50

Dangers of distribution-based methods They equate scales, not metrics Blunt force tools. They not only erase measurement differences, they can obliterate age differences and other differences that we wish to preserve Think carefully about what you equate Same construct? Is there sufficient variability to support the relative position? 17 / 50

Shoe size and MMSE (Mobilize Boston Study, N=807) Among persons with an observed MMSE less than 24, the maximum equated shoe size is 7. So, to screen for dementia using shoe size, flag as possibly demented persons with a shoe size of less than 7 18 / 50

Approaches for co-calibration Distribution-based Goal: define a transformation of a test that returns the same cumulative probability plot as the other variable being compared Mean equating Linear equating Equipercentile equating Item-based Goal: define a transformation of a test that places it on the same metric as another Item response theory Multiple imputation for missing data 19 / 50

IRT and the Item Characteristic Curve 20 / 50

Item response theory: An important assumption Exchangeability of items Each item conveys information about the latent trait Implication: Missingness on an item might be OK as long as other items are not missing, at the cost of some precision 21 / 50

Scaling a latent variable in IRT There is no natural scale in latent variable space By convention, we usually make mean 0, variance 1 PROMIS scales their constructs to a T-scale (mean 50, SD 10) 22 / 50
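The PROMIS T-scale rescaling mentioned above is just a linear transformation of the conventional mean-0, SD-1 metric; a one-line sketch:

```python
def theta_to_t(theta):
    """Convert a latent trait score from the conventional mean-0, SD-1
    metric to the PROMIS T-score metric (mean 50, SD 10)."""
    return 50 + 10 * theta

# an average respondent scores T = 50; one SD above average, T = 60
```

The choice of metric is pure convention; nothing in the latent variable model itself fixes the scale.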

IRT How to link items to an internal scale using Mplus Data are stacked or pooled together Disadvantage: scale has no external meaning 23 / 50

IRT How to link items to an internal scale using Mplus Data are stacked or pooled together Factor is now scaled to a particular dataset or reference group 24 / 50

IRT How to link items to an external scale using Mplus Data are stacked or pooled together 25 / 50

Matters to be aware of using IRT for co-calibration Is the scale interpretable? Check for differential precision / ceiling effects Selecting an estimator Is the metric the same across studies? Is it a problem to estimate an IRT model with repeated measures on people? 26 / 50

IRT Check for differential precision In addition to estimating levels for a person at a point in time in a sample, IRT can quantify the precision of the estimated value More information, leading to improved precision, isn't always better when it is differential by study visit or data source Example: ARIC NCS study 27 / 50

IRT Domain-specific factor scores We see differences in floors based on the precision of information available We did not think these dropped floors would be differential by a predictor In fact they are... Differential precision across visits can bias estimated associations of an exposure with cognitive decline if people with low levels of the exposure have low cognition at baseline 28 / 50

IRT Familiar methods effect No statistical model can address this without more assumptions 29 / 50

Matters to be aware of using IRT for co-calibration Is the scale interpretable? Check for differential precision / ceiling effects Selecting an estimator Is the metric the same across studies? Is it a problem to estimate an IRT model with repeated measures on people? 30 / 50

Selecting an estimator Maximum likelihood (with robust estimation) (MLR) All records are used, except records with 100% missing items Assumes MAR (missing at random) More reasonable in epidemiologic settings WLSMV is accurate, efficient (fast), but inappropriate when MCAR (missing completely at random) assumptions are not viable Harmonization across many datasets 31 / 50

Selecting an estimator MLR and WLSMV provide "regression-based" factor score estimates Bayesian plausible values Based on a mean of k individual plausible values drawn from the posterior distribution As the number of draws from the posterior increases, we approach the MLR regression-based factor score estimates Regression-based factor scores are fair for high-stakes testing because a record (person) with the same response pattern gets the same values If the goal is to estimate population parameters (e.g., epidemiological inference), then plausible values might be desired because they retain imprecision in estimates See Asparouhov and Muthen (2010), Plausible Values for Latent Variables Using Mplus. 32 / 50
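A toy illustration of the plausible-values point, assuming a hypothetical posterior for one person's latent trait (mean 0.4, SD 0.3; numbers invented): a single plausible value retains the posterior's imprecision, while the average of many draws converges toward the posterior mean, mirroring a regression-based factor score estimate.

```python
import random

random.seed(2018)

# Hypothetical posterior for one person's latent trait (illustrative
# values, not from any real study).
post_mean, post_sd = 0.4, 0.3

def plausible_values(k):
    """Average of k plausible values drawn from the posterior."""
    draws = [random.gauss(post_mean, post_sd) for _ in range(k)]
    return sum(draws) / k

one = plausible_values(1)         # retains the posterior's imprecision
many = plausible_values(100_000)  # converges toward the posterior mean
```

This is why a small k is actually preferred for population inference: the between-draw spread carries the measurement uncertainty into downstream analyses.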

Matters to be aware of using IRT for co-calibration Is the scale interpretable? Check for differential precision / ceiling effects Selecting an estimator Is the metric the same across studies? Is it a problem to estimate an IRT model with repeated measures on people? 33 / 50

Is the metric the same across studies? Differential item functioning (DIF) by dataset Simulation Imagine a population in which all indicators were administered to all respondents Derive a "true" theta and a dataset-specific theta based on tests available only in that data Compare 34 / 50
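A schematic version of that simulation (not the authors' code): give everyone a true theta, let each "study" observe only a subset of noisy indicators, and compare how well each theta estimate tracks the truth. Item counts and noise SDs are invented.

```python
import random

random.seed(42)

n = 5000
theta_true = [random.gauss(0, 1) for _ in range(n)]

def study_theta(theta, n_items, noise_sd=1.0):
    """Estimate each person's theta as the mean of n_items noisy indicators
    (a crude stand-in for an IRT-scored test battery)."""
    return [sum(t + random.gauss(0, noise_sd) for _ in range(n_items)) / n_items
            for t in theta]

theta_full = study_theta(theta_true, 20)  # population: all indicators
theta_sub = study_theta(theta_true, 4)    # one study's limited battery

def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

# The study-specific theta tracks the truth less closely than the
# full-battery theta; the size of that gap is what the comparison reveals.
```

In the real DIF check, the comparison is between item parameters or scored thetas across datasets, but the logic is the same: quantify how much the limited, study-specific measurement distorts the common metric.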

Is the metric the same across studies? 35 / 50

Matters to be aware of using IRT for co-calibration Is the scale interpretable? Check for differential precision / ceiling effects Selecting an estimator Is the metric the same across studies? Is it a problem to estimate an IRT model with repeated measures on people? 36 / 50

A perennial reviewer critique... A possible problem involves the non-independence of the measures. Although robust estimators were used, they can go only so far in eliminating the high degree of non-independence Resolution We can run sensitivity analyses by estimating models based on 1 record per person. Do this 10240 times, get bootstrapped estimates This yields the same model parameters (loadings and thresholds) as a model using all records The standard errors of item parameters do become larger when we use just 1 record per person; however, we do not typically use those standard errors 37 / 50
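A schematic Python version of the one-record-per-person sensitivity check (simulated data; a simple mean stands in for the IRT loadings and thresholds):

```python
import random

random.seed(7)

# 300 simulated people, 4 repeated measures each (values invented).
people = {pid: [random.gauss(0.5, 1) for _ in range(4)] for pid in range(300)}

def one_record_estimate():
    """Estimate the parameter (here, a mean) from one randomly chosen
    record per person, so observations are independent."""
    sample = [random.choice(records) for records in people.values()]
    return sum(sample) / len(sample)

estimates = [one_record_estimate() for _ in range(200)]
boot_mean = sum(estimates) / len(estimates)

all_records = [x for records in people.values() for x in records]
full_estimate = sum(all_records) / len(all_records)
# boot_mean sits close to full_estimate; the spread of `estimates`
# plays the role of a bootstrap standard error
```

This mirrors the resolution on the slide: point estimates from independent-records resamples agree with the all-records model, even though per-resample standard errors are larger.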

Approaches for co-calibration Distribution-based Goal: define a transformation of a test that returns the same cumulative probability plot as the other variable being compared Mean equating Linear equating Equipercentile equating Item-based Goal: define a transformation of a test that places it on the same metric as another Item response theory Multiple imputation for missing data 38 / 50

Missing data/multiple imputation Assume you have 2 datasets Dataset 1 has HbA1c, diabetes diagnosis, glucose Dataset 2 has diabetes diagnosis, glucose If I want to use HbA1c in the pooled sample, I could predict it based on diabetes status and fasting glucose using multiple imputation methods in the pooled data Multiple imputations reflect uncertainty in unmeasured data If both datasets measure the same construct, we can measure it in the pooled data 39 / 50
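A schematic multiple-imputation sketch with simulated data. For brevity, glucose alone predicts HbA1c here, and the slope, intercept, and noise SDs are invented; real applications would use proper MI software with all available predictors.

```python
import random

random.seed(1)

# Dataset 1 observes glucose and HbA1c; dataset 2 observes glucose only.
d1_glucose = [random.gauss(100, 15) for _ in range(500)]
d1_hba1c = [0.03 * g + 2.0 + random.gauss(0, 0.3) for g in d1_glucose]
d2_glucose = [random.gauss(100, 15) for _ in range(500)]

# Fit HbA1c ~ glucose by ordinary least squares in dataset 1.
mg = sum(d1_glucose) / len(d1_glucose)
mh = sum(d1_hba1c) / len(d1_hba1c)
slope = (sum((g - mg) * (h - mh) for g, h in zip(d1_glucose, d1_hba1c))
         / sum((g - mg) ** 2 for g in d1_glucose))
intercept = mh - slope * mg
resid_sd = (sum((h - intercept - slope * g) ** 2
                for g, h in zip(d1_glucose, d1_hba1c)) / len(d1_hba1c)) ** 0.5

def impute_once():
    """One imputed HbA1c vector for dataset 2: prediction plus residual
    noise, so the imputations reflect uncertainty in the unmeasured data."""
    return [intercept + slope * g + random.gauss(0, resid_sd) for g in d2_glucose]

imputations = [impute_once() for _ in range(5)]  # m = 5 imputed datasets
# Each analysis is then run on all m datasets and the results pooled
# (Rubin's rules), so standard errors carry the imputation uncertainty.
```

Adding the residual noise (rather than imputing the bare prediction) is what keeps the pooled standard errors honest.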

Stop here 40 / 50

Example Example challenge - AVLT nonequivalence 41 / 50

Example Statistical equating methods: ACTIVE AVLT 42 / 50

Example ADNI 1 data (N=825) 43 / 50

Example An IRT approach in ADNI 44 / 50

Maybe stop here 45 / 50

46 / 50

Harmonization Psychometric engineering as art Psychometrics is a field of study concerned with the theory and technique of psychological measurement https://en.wikipedia.org/wiki/psychometrics What do psychometricians design? Models Algorithms Statistical procedures 47 / 50

Harmonization Psychometric engineering as art Engineering is the application of mathematics, empirical evidence and scientific, economic, social, and practical knowledge in order to invent, innovate, design, build, maintain, and improve structures, machines, tools, systems, components, materials, processes and organizations Art is about finding beauty in good designs George Box: "Models, of course, are never true, but fortunately it is only necessary that they be useful..." (1979, pg 2) Pablo Picasso: "Art is a lie that enables us to realize the truth" Design should integrate art and engineering A psychometrician's models may be wrong, but if they are useful (engineering standard), tell us something about the universe (science), and are beautiful (art), then they are valuable 48 / 50

Motivation We want to do analyses An important aspect of psychometrics involves problem-solving. Engineers solve problems The work of a psychometrician is to create a thing of beauty that is useful 49 / 50

Motivation Psychometric Engineering (Thissen 2001) Goal: make functional, useful, beautiful things Harmonization serves as a bridge between 2+ studies. The tests must be useful and functional Analogously, a great Battle Station was built in Star Wars to bring harmony to the galaxy, to restore order 50 / 50