Repeat Measurement of Case-Control Data: Corrections for Measurement Error in a Study of Ischaemic Stroke and Haemostatic Factors

International Journal of Epidemiology Vol. No. 1
International Epidemiological Association 1
Printed in Great Britain

S A BASHIR,*,** S W DUFFY* AND N QIZILBASH

* MRC Biostatistics Unit, Institute of Public Health, University Forvie Site, Robinson Way, Cambridge CB SR, UK.
** Current address: IARC, 10 cours Albert Thomas, Lyon Cedex 0, France.
Memory Trials Research Group, Department of Clinical Geratology, University of Oxford, Radcliffe Infirmary, Oxford OX HE, UK.

Bashir S A (International Agency for Research on Cancer, 10 cours Albert Thomas, Lyon Cedex 0, France), Duffy S W and Qizilbash N. Repeat measurement of case-control data: Corrections for measurement error in a study of ischaemic stroke and haemostatic factors. International Journal of Epidemiology 1; : 0.

Background. Haemostatic factors are suspected to be involved in the aetiology of cerebrovascular events.

Methods. In a case-control study of 10 cases of transient ischaemic attack and minor ischaemic stroke, and 1 controls, data were available on levels of the haemostatic factors von Willebrand factor (vWF), plasminogen activator inhibitor-1 (PAI), tissue plasminogen activator (TPA) and factor VII (FVII). These are subject to measurement error and within-person fluctuation of true levels, which may bias relative risk estimates. For all subjects, two determinations were performed on the same blood sample, which allowed estimation of pure measurement error. For estimation of within-person fluctuation, levels were measured from a repeat blood sample on 1 of the controls one year later.

Results. The pure measurement error accounted for a very small proportion of the total variation in all cases. Uncorrected for within-person fluctuation, the odds ratio estimates associated with exceeding the median of vWF, PAI, TPA and FVII respectively were 1., 0., 1.0 and 0.. After correction for within-person fluctuation odds ratios were., 0.0, 1.1 and 0.1. Because the PAI determination was not robust to storage conditions, it was estimated that % of the variation in this factor was within-person rather than between-persons. Thus, estimates of relative risk in relation to PAI cannot be regarded as reliable in this study.

Conclusions. It is likely that elevated levels of vWF are associated with increased risk of ischaemic stroke, but interpretation must be tentative, due to relatively large within-person fluctuation of vWF levels.

Keywords: case-control studies, misclassification, repeated measures, ischaemic stroke, risk estimation, odds ratio

Various haemostatic factors are considered to be potentially involved in the aetiology of stroke. In this paper we report on the measurement of certain haemostatic variables, their within-subject fluctuation and an assessment of the associated relative risk in the context of a case-control study of transient ischaemic attack (TIA) and minor ischaemic stroke. The haemostatic variables examined were von Willebrand factor (vWF), plasminogen activator inhibitor-1 (PAI), tissue plasminogen activator (TPA) and factor VII (FVII).

Elevations or reductions of these haemostatic factors may result in an increased or decreased tendency to thrombosis. The von Willebrand factor is the carrier protein for factor VIII in plasma and mediates adhesion between collagen and platelet glycoprotein Ib. Factor VII is a key coagulation component. Thus increases in these factors may be expected to increase the tendency to clot formation. Plasminogen activator inhibitor-1 is the major inhibitor of plasminogen activator and thus elevated levels of PAI should be associated with depression of fibrinolysis. Tissue plasminogen activator is the natural plasminogen activator which causes physiological fibrinolysis.
Thus, depressed TPA levels may be associated with an increased tendency to thrombosis and other circulatory events.

In this field, as in other areas of epidemiology, measurement error is a problem,1 both as a result of pure laboratory error and from random variation of the true value within individuals. In the latter case, a person who usually has high levels of a factor, for example, may happen to have a lower level than usual when the blood sample is taken.

The present study investigates the effect of these four factors on risk of TIA and minor ischaemic stroke in the context of a case-control study, along with the implication of measurement error and temporal fluctuation of levels of these factors. Minor ischaemic stroke and TIA were used as surrogate models for major ischaemic stroke as they are less likely to cause post-attack changes in haemostatic variables and hence invalidate retrospective inference. Major ischaemic stroke is likely to cause such changes, and may also introduce difficulty in obtaining valid pre-morbid histories due to problems with comprehension and speech.

DATA DESCRIPTION

A summary of the design of the case-control study used in this analysis is given below. Further details about the design have been published previously. The study was designed to assess several biochemical variables as potential risk factors for ischaemic stroke. Cases of TIA and minor ischaemic stroke and controls were recruited to the study in 1 and 1 from the Oxfordshire Community Stroke Project (OCSP) and from a neurology clinic. There were 10 cases and (community) controls. In all, 1 (%) controls provided adequate blood samples for analysis. The remaining either were unwilling to provide a sample, or the sample contained insufficient plasma for analysis.

A second measurement of the haemostatic variables was provided by a random sample of 1 controls one year after the original measurement using identical procedures. This was not performed for cases, as many were undergoing therapy in the months after the stroke which would change their blood chemistry. It had originally been planned to have 100 such repeat measurements. The number with repeated measurements was less than 100 due to non-response and furthermore varied between the haemostatic variables due to breakages and inadequate plasma. Figure 1 shows recruitment details for controls and Figure the corresponding details for cases.

Plasma samples stored at C in EDTA containers for a median of years (range years) were thawed once and assayed.
The vWF antigen was assayed with an enzyme-linked immunosorbent assay (ELISA) (Dakopatts A/S, Glostrup, Denmark); FVII antigen by ELISA (Diagnostica Stago, Asnières, France); and TPA antigen by ELISA (Biopool AB, Umeå, Sweden). All assays were performed in duplicate and blind to case-control status. The intra- and inter-assay coefficients of variation for vWF antigen were.% and.% respectively. The values in i.u./ml may be read as 1.00 i.u./ml of vWF antigen being equivalent to 100% of normal plasma.

STATISTICAL METHODS

For all of the haemostatic variables we have two assayed measurements taken from each of the blood samples (i.e. each blood sample is assayed twice). The 1 repeat blood samples for controls were taken one year later and these also have two assayed measurements for each of the haemostatic variables. This enables us to analyse the within-laboratory measurements (or within-years measurements) and the between-years measurements in an analysis of variance (ANOVA).

Random effects ANOVA

Random effects analysis of variance is used to estimate the proportion of the total variation that is attributable to each of the factors that have been specified in the model. In a random effects analysis of variance,10 we fit the model given in equation (1), where $i = 1, 2$ represents the year, $j = 1, 2$ represents the measurement and $k$ represents the subjects. In this model the $A_i$ and $B_{ij}$ are random variables with $A_i \sim N(0, \sigma_A^2)$ and $B_{ij} \sim N(0, \sigma_B^2)$. Due to this assignment of distributions, the $A_i$s and $B_{ij}$s are regarded as random effects. In this model the variance is split into the between-persons variability, $\sigma^2$, the between-years variability, $\sigma_A^2$, and the within-sample (or within-years) variability, $\sigma_B^2$. These can be expressed as percentages of the overall variability of a single measurement.

Coefficient of Variation

The coefficient of variation is used to describe the variation in a population as a proportion of its mean and is defined as $cv = \sigma/\mu$. However, we use the sample estimate, which is $\widehat{cv} = s/\bar{x}$. Below, the coefficient of variation will be expressed as a percentage (i.e. $cv \times 100\%$). Hereafter $cv$, $cv_A$ and $cv_B$ will represent the between-persons, between-years and within-years coefficients of variation, respectively. Thus we describe variation between subjects, temporal fluctuation within subjects and pure measurement error.

Risk Estimates

For estimation of relative risks, the haemostatic variables were dichotomized at the medians of the controls (first year average). Measurements above the median were regarded as risk (confounding) factor present. The risk was assessed as the risk factor present relative to the risk factor absent.
The extreme categorization by splitting at the median was used as a cutoff point because it is still unclear what role these haemostatic factors play in the aetiology of stroke, and we therefore would like to keep model assumptions as simple as possible. Adjusted odds ratios were calculated using maximum likelihood techniques. Confidence intervals were calculated using asymptotic variance approximations (details available from SAB).

Single Binary Risk Factor Adjusting for Mismeasurement

In a case-control study, suppose the proportion of cases with positive risk factor status is $p_1$ and the proportion of controls with positive status is $p_2$. Then the odds ratio is $p_1(1 - p_2)/\{p_2(1 - p_1)\}$. If $p_1^*$ and $p_2^*$ are the proportions observed positive using an imperfect method with an error (either false positive or false negative) probability of $1 - \theta$, it can be shown that the true proportion positive is estimated as

$$\hat{p}_i = \frac{p_i^* + \theta - 1}{2\theta - 1}, \qquad i = 1, 2.$$

We assume that we have (external to the study) repeat determinations using the imperfect measure. The external assumption requires separation of the controls with the repeat measures from the remaining controls. Although this entails a loss of information, it enables us to obtain simple analytical estimates, it has been shown to give good agreement with more formal estimates, and it also makes variance estimation easier. We estimate $\theta$ as

$$\hat{\theta} = \tfrac{1}{2}\left(1 + \sqrt{1 - 2n/N}\right),$$

where $N$ individuals are measured twice and there are $n$ disagreements between the first and second measurements. This follows because, with independent errors, two determinations disagree with probability $2\theta(1 - \theta)$. The variance of $\hat{\theta}$ can be estimated from $n$ and $N$.

The corrected log odds ratio is then calculated from the observed one and $\hat{\theta}$, where $\beta$ is the true log(OR) and $\beta^*$ is the observed log(OR) from the misclassified data. We use an approximation for the variance of $\hat{\beta}$ (using the delta method) expressed in terms of $a$, $b$, $c$ and $d$, the observed misclassified cell numbers.
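The single-factor correction described above can be sketched in Python. This is a minimal illustration rather than the authors' implementation: it assumes symmetric, non-differential misclassification with correct-classification probability θ and independent errors on the two repeat determinations, so that the expected disagreement rate is 2θ(1 − θ). The function names and the figures in the usage note are hypothetical and are not study data.

```python
import math


def theta_from_repeats(n_disagree, n_pairs):
    """Estimate theta = Pr(correct classification) from repeat determinations.

    Assumption: errors on the two determinations are independent, so the
    expected disagreement rate is 2 * theta * (1 - theta).
    """
    d = n_disagree / n_pairs
    if d >= 0.5:
        raise ValueError("disagreement rate >= 0.5: theta cannot be estimated")
    return 0.5 * (1.0 + math.sqrt(1.0 - 2.0 * d))


def corrected_proportion(p_obs, theta):
    """Invert p_obs = theta * p + (1 - theta) * (1 - p) for the true p."""
    return (p_obs + theta - 1.0) / (2.0 * theta - 1.0)


def odds_ratio(p1, p2):
    """Odds ratio for cases (p1) versus controls (p2)."""
    return p1 * (1.0 - p2) / (p2 * (1.0 - p1))


def corrected_odds_ratio(p1_obs, p2_obs, theta):
    """Odds ratio after correcting both observed proportions with the same theta."""
    p1 = corrected_proportion(p1_obs, theta)
    p2 = corrected_proportion(p2_obs, theta)
    if not (0.0 < p1 < 1.0 and 0.0 < p2 < 1.0):
        raise ValueError("corrected proportion outside (0, 1)")
    return odds_ratio(p1, p2)
```

For example, with 18 disagreements among 100 repeat pairs (illustrative figures only), `theta_from_repeats(18, 100)` is close to 0.9, and observed proportions of 0.58 in cases and 0.50 in controls then give a corrected odds ratio of about 1.5, against an uncorrected value of about 1.38. The correction moves the estimate away from the null, which is the usual direction for non-differential misclassification of a binary factor.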

Binary Risk Factor (RF) and Confounding Factor (CF) both Subject to Misclassification

This problem is dealt with by Duffy et al., assuming an internal table (case-control status by RF by CF) and external repeat measurement. The error probabilities are estimated for the risk factor and confounding factor independently using equation (). Then the individual probabilities of being (RF positive, CF positive), (RF positive, CF negative), and so on, are estimated by a matrix correction to the observed probabilities for cases and controls. Formal interval estimation is performed using the profile likelihood, but a similar approximation to that in equation () is available (details available from SAB). Again, if the repeat data are internal to the study, they can be separated from the main data set and the analysis performed as usual; this leads to the same expected odds ratio estimate but to a conservative confidence interval.

RESULTS

Table 1 gives a summary of the haemostatic variables. The figures are based on the means of the two measurements within individual blood samples that were taken for each variable. For each factor the mean (in international units), the standard deviation and the number of subjects are given for cases and controls with a single measurement and for controls with repeat measurements (i.e. an initial first measurement and a repeat measurement a year later).

Table gives the results of the random effects analysis of variance. The mean used to calculate the coefficients of variation is the overall mean of all the measurements for the controls with repeat data. The within-years cv represents the pure laboratory error within samples. This is relatively low overall, with the highest, %, being for TPA and FVII. Temporal variation (between-years) is highest for PAI and vWF (1% and 0% respectively).
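The variance decomposition behind these coefficients of variation can be sketched as follows. This is a hedged illustration, not the authors' code: it assumes a balanced nested design (every subject has the same number of years and of replicate assays per sample) and uses the classical method-of-moments estimators from the ANOVA mean squares, which the paper does not spell out. The function name, the dictionary keys and the toy data in the usage note are all hypothetical.

```python
from math import sqrt


def variance_components(y):
    """Nested random-effects ANOVA for y[k][i][j].

    k indexes subjects, i years within subject, j replicate assays within
    a blood sample. Assumes a balanced design and at least two subjects,
    two years and two replicates.
    """
    K, I, J = len(y), len(y[0]), len(y[0][0])
    sample_mean = [[sum(y[k][i]) / J for i in range(I)] for k in range(K)]
    subject_mean = [sum(sample_mean[k]) / I for k in range(K)]
    grand_mean = sum(subject_mean) / K

    # Mean squares for each level of the nesting.
    ms_within = sum((y[k][i][j] - sample_mean[k][i]) ** 2
                    for k in range(K) for i in range(I) for j in range(J)
                    ) / (K * I * (J - 1))
    ms_year = J * sum((sample_mean[k][i] - subject_mean[k]) ** 2
                      for k in range(K) for i in range(I)) / (K * (I - 1))
    ms_subject = I * J * sum((m - grand_mean) ** 2
                             for m in subject_mean) / (K - 1)

    # Method-of-moments estimates, truncated at zero.
    var_within = ms_within                                   # sigma_B^2
    var_years = max((ms_year - ms_within) / J, 0.0)          # sigma_A^2
    var_persons = max((ms_subject - ms_year) / (I * J), 0.0)  # sigma^2

    comps = {"between_persons": var_persons,
             "between_years": var_years,
             "within_years": var_within}
    total = sum(comps.values())
    return {
        "grand_mean": grand_mean,
        "var_between_persons": var_persons,
        "var_between_years": var_years,
        "var_within": var_within,
        # Share of the variance of a single measurement, in percent.
        "percent": {name: 100.0 * v / total for name, v in comps.items()},
        # Coefficients of variation, in percent of the overall mean.
        "cv": {name: 100.0 * sqrt(v) / grand_mean for name, v in comps.items()},
    }
```

Calling `variance_components([[[10, 12], [14, 16]], [[20, 22], [24, 26]]])` on this toy input (two subjects, two years, two replicates) returns variance components of 46, 7 and 2 for between-persons, between-years and within-years respectively. With unbalanced data, as in the study where the number of repeats varied by factor, a mixed-model fit would be needed instead of these closed-form estimators.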
It is clear that the major contribution to within-person variation is the temporal fluctuation of the true value. Hence, for further analysis the mean of the two within-blood-sample measures was taken as a single determination and formal correction for the misclassification was based on the between-year changes.

Table gives the corrected (crude) and uncorrected odds ratios using each of the haemostatic variables as a risk factor. It also gives the estimated value $\hat{\theta}$ (i.e. Pr[Risk factor is correctly classified]). The median value that is given is the cutoff point for the risk factor being present or absent. Note that there is a considerable correction to the risk estimate for vWF. This is because there was a substantial discrepancy between the first measurement and the second measurement one year later (between-years), indicating a greater degree of misclassification. This increased uncertainty is reflected in a much wider confidence interval (Table ).

Table is a cross tabulation of all the haemostatic variables, as risk factors and as confounding factors. The Table shows the odds ratio for each factor adjusted for each of the other factors in turn. Odds ratios are presented for the situation when both factors are corrected for misclassification (1), neither is corrected (), only the confounding factor is corrected () and only the risk factor is corrected (). The FVII and vWF have much the same effects on risk (approximately 0. and., respectively) when adjusted for any of the other variables. The TPA and PAI have their effects reversed when adjusting for vWF.
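The matrix correction used when both the risk factor and the confounding factor are misclassified can be illustrated with a short Python sketch. It is an assumption-laden simplification, not the maximum likelihood procedure of Duffy et al.: it assumes symmetric errors with correct-classification probabilities `theta_rf` and `theta_cf`, independence of the two error processes, and the same error probabilities in cases and controls, and it reports only stratum-specific odds ratios. All names and numbers are hypothetical.

```python
def _matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def error_matrix(theta):
    """Symmetric misclassification matrix; index 0 = positive, 1 = negative."""
    return [[theta, 1.0 - theta], [1.0 - theta, theta]]


def inverse_error_matrix(theta):
    """Closed-form inverse of error_matrix(theta); requires theta > 0.5."""
    det = 2.0 * theta - 1.0
    if det <= 0.0:
        raise ValueError("theta must exceed 0.5")
    return [[theta / det, -(1.0 - theta) / det],
            [-(1.0 - theta) / det, theta / det]]


def misclassify(p_true, theta_rf, theta_cf):
    """Observed joint probabilities (rows RF, columns CF) given the true ones."""
    return _matmul(_matmul(error_matrix(theta_rf), p_true),
                   error_matrix(theta_cf))


def correct(p_obs, theta_rf, theta_cf):
    """Matrix correction: recover the true joint probabilities from observed."""
    return _matmul(_matmul(inverse_error_matrix(theta_rf), p_obs),
                   inverse_error_matrix(theta_cf))


def stratum_odds_ratios(p_cases, p_controls):
    """Odds ratio for RF within the CF-positive and CF-negative strata."""
    return [(p_cases[0][c] / p_cases[1][c]) /
            (p_controls[0][c] / p_controls[1][c]) for c in range(2)]
```

As a round-trip check on invented probabilities, passing the true tables `[[0.3, 0.2], [0.1, 0.4]]` (cases) and `[[0.2, 0.2], [0.2, 0.4]]` (controls) through `misclassify` and then `correct` recovers stratum-specific odds ratios of 3 and 1. In practice the corrected probabilities can fall outside the unit interval when misclassification is severe, which is one reason a likelihood-based fit is preferred for interval estimation.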

DISCUSSION

Results in Tables, and indicate that in terms of classification above and below the median, more misclassification occurs in von Willebrand factor than in the other three factors. As a result, the correction is greater for von Willebrand factor and the corrected confidence intervals are considerably wider. Table also illustrates the fact that severe misclassification of a confounding factor can induce dramatic changes in the odds ratio for the risk factor of interest. This can be seen from the differences between results correcting for misclassification and not correcting when plasminogen activator inhibitor-1 or tissue plasminogen activator is adjusted for von Willebrand factor.

The results in Table show that risk estimates with respect to plasminogen activator inhibitor-1 are very difficult to interpret due to the variation attributable to between-subject, between-year variance. An alternative interpretation of this is that there is a large systematic increase from year 1 to year (Table 1). In any case, we would not give much credibility to the related risk estimates. It is likely that the problem with this variable arises from the way the blood was processed. For the repeat measurements (i.e. the measurements taken a year later, controls only) the blood was not centrifuged immediately after venepuncture, thus allowing PAI released from platelets to enter the plasma, resulting in much higher assayed levels than normally found in the plasma.

The major correction to the odds ratio for von Willebrand factor indicates the need to consider misclassification bias in design and analysis. The corrected odds ratio is considerably different from the uncorrected (from 1. to.). Another point to note is that the confidence intervals for the odds ratios for von Willebrand factor are very wide. This is because the confidence interval takes into account the variability of our estimated misclassification probability, $1 - \hat{\theta}$. This is also desirable and fits with the intuitive idea that if misclassification is bad enough to produce a major change in the estimated odds ratio, it is bad enough to introduce considerable uncertainty in our corrected estimate.

Implicit in this analysis are certain assumptions, notably independence of errors conditional on true status, non-differential error (i.e. error probabilities are the same for cases and controls) and the independence of measurement error between variables. There are various methods for correction for measurement error and all make some of these assumptions.1 A full multivariate correction would dispense with the last assumption,1 but for this we would need either a validation study or a larger repeat measures study. This paper highlights an explicit maximum likelihood approach for the correction of measurement errors.
Measurement errors introduce many complications into most statistical analyses which are beyond the scope of this paper. For example, to look into the effect of differential measurement error one would require repeat data on cases (which may not be useful if cases are undergoing treatment). Some of the other methods mentioned above for the correction of measurement errors have been discussed elsewhere.

REFERENCES

1 Rosner B, Willett W C, Spiegelman D. Correction of logistic regression relative risk estimates and confidence intervals for systematic within-person measurement error. Stat Med 10; : 101.
2 Greenland S. On correcting for misclassification in twin studies and other matched-pair studies. Stat Med 1; :.
3 Kuha J. Corrections for exposure measurement error in logistic regression models with an application to nutritional data. Stat Med 1; : 1.
4 Armstrong B G, Whittemore A S, Howe G R. Analysis of case-control data with covariate measurement error: Application to diet and colon cancer. Stat Med 1; : 1.
5 Richardson S, Gilks W R. Conditional independence models for epidemiological studies with covariate measurement error. Stat Med ; : 10.
6 Kaldor J, Clayton D. Latent class analysis in chronic disease epidemiology. Stat Med 1; :.
7 Takano K, Yamaguchi T, Okada Y, Uchida K, Kisiel W, Kato H. Hypercoagulability in acute ischaemic stroke: Analysis of the intrinsic coagulation reactions in plasma by a highly sensitive automated method. Thromb Res 10; : 1 1.
8 Elliot F A, Buckell M. Fibrinogen changes in relation to cerebrovascular accidents. Neurology 11; : 0.
9 Qizilbash N, Jones L, Warlow C, Mann J. Fibrinogen and lipid concentration as risk factors for transient ischaemic attacks and minor ischaemic strokes. Br Med J ; 0: 0 0.
10 Snedecor G W, Cochran W G. Statistical Methods. Ames, Iowa: The Iowa State University Press, 10.
11 Duffy S W, Rohan T E, Day N E. Misclassification in more than one factor in a case-control study: A combination of Mantel-Haenszel and maximum likelihood approaches. Stat Med 1; : 1.
12 Duffy S W, Maximovitch D M, Day N E. External validation, repeat determination, and precision of risk estimation in misclassified exposure data in epidemiology. J Epidemiol Community Health ; : 0.
13 Bashir S A, Duffy S W. Correction of risk estimates for measurement error in epidemiology. Methods Inf Med 1; : 0 10.

(Revised version received April 1)