%CEM: A SAS MACRO TO PERFORM COARSENED EXACT MATCHING


Stefano Verzillo, Paolo Berta, Matteo Bossi
Working Paper, December 2015
Dipartimento di Economia, Management e Metodi Quantitativi, Via Conservatorio, Milano. Tel. (21522), fax (21505). E-mail: dipeco@unimi.it

S. Verzillo (Univ. of Milan), P. Berta (CRISP, Univ. of Milan Bicocca), M. Bossi (DLG)
September 16, 2013

%CEM is a SAS macro that allows researchers to perform the recently introduced Coarsened Exact Matching (CEM) technique. CEM is a non-parametric matching method that removes the confounding influence of pre-treatment control variables, directly improving causal inference in quasi-experimental studies. The CEM authors originally provided software for R, Stata and SPSS to perform their matching algorithm. The %CEM macro complements these alternatives by introducing a fully automated Coarsened Exact Matching macro for SAS users. Both the matching strategy (including some standard coarsening options) and the associated L1 multivariate imbalance measure are provided. An empirical application estimating the causal effect of regional health systems on in-hospital mortality, using multiple artificial datasets drawn from a large administrative database, completes the paper.
Keywords: Coarsened Exact Matching, Causal inference, SAS, SAS/IML
1 Introduction
An important branch of the literature on causal inference consists of observational studies, which are often the only available alternative when randomization is lacking, especially outside the U.S. The key goal of these studies is to measure the effect of a binary treatment (T) on an outcome of interest by comparing two subgroups of individuals: treated and controls. The main problem, unfortunately, is the absence of random assignment of units to the treatment and control states. As a result, treatment and control groups may differ substantially in the multidimensional distribution of their observable covariates [9]. To avoid the confounding influence of pre-treatment control variables on the estimated causal effect, the literature has therefore introduced several econometric techniques.
Methods available to control for this bias can be parametric or non-parametric, and essentially consist of propensity score-based techniques, matching algorithms and model stratification [1]. Propensity score matching [8] is the most commonly used of the parametric methods [6]. It consists of estimating, for every selected individual, the conditional probability of assignment to treatment given the observed covariates. The estimated propensity score can then be used in model estimation to accommodate heterogeneity in different ways: as a regression covariate, as a matching parameter or as a stratification rule. However, a propensity score matching solution usually does not guarantee good balance on all of the selected covariates. In fact, improving balance on most of them can leave the remaining ones unbalanced, often introducing more bias than was present in the initial distribution. In addition, propensity score matching violates the congruence principle, which requires the metrics of the data space and of the analysis space to be congruent (here the two metrics differ by construction): parametric methods project the covariates from their original multidimensional space into a new space, usually defined by the univariate propensity score. Mielke and Berry [7] show that violating the congruence principle produces less robust inferences. Matching, by contrast, is a non-parametric method that can be highly effective in removing imbalance in observed covariates between treatment and control groups. Exactly balanced data make further control for the observables (X) unnecessary, allowing researchers to estimate causal effects through a simple difference in means between the selected groups. An additional problem is that most existing matching methods fix the sample size of the two subgroups a priori but only occasionally reduce the imbalance between treated and control units (and hence only occasionally reduce the bias of the estimated effect).
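The point that exactly balanced data reduce estimation to a difference in means can be illustrated with a small Python sketch on invented toy data; the names units, raw_difference and exact_matched_difference are hypothetical and not part of %CEM. A single discrete covariate x raises the outcome and is more frequent among treated units, while the true treatment effect is 1 in each stratum.

```python
from collections import defaultdict

# Invented toy data: (x, treated, outcome). The covariate x raises the outcome
# and is more frequent among treated units; the true effect is 1 in each stratum.
units = [
    (0, 1, 2.0), (0, 0, 1.0), (0, 0, 1.0), (0, 0, 1.0),
    (1, 1, 6.0), (1, 1, 6.0), (1, 1, 6.0), (1, 0, 5.0),
]


def mean(values):
    return sum(values) / len(values)


def raw_difference(data):
    """Naive treated-minus-control difference in means, ignoring x."""
    treated = [y for _, t, y in data if t == 1]
    controls = [y for _, t, y in data if t == 0]
    return mean(treated) - mean(controls)


def exact_matched_difference(data):
    """Within-stratum differences in means, averaged with treated counts as weights."""
    strata = defaultdict(lambda: {0: [], 1: []})
    for x, t, y in data:
        strata[x][t].append(y)
    total, n_treated = 0.0, 0
    for groups in strata.values():
        if groups[0] and groups[1]:  # keep only strata containing both groups
            total += (mean(groups[1]) - mean(groups[0])) * len(groups[1])
            n_treated += len(groups[1])
    return total / n_treated


print(raw_difference(units), exact_matched_difference(units))  # prints: 3.0 1.0
```

The raw comparison mixes the effect of x with that of treatment, while comparing treated and controls only within strata of identical x recovers the effect of 1. Weighting the within-stratum differences by the number of treated units targets the average effect on the treated, which is one common convention.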
To avoid these substantial problems, Iacus, King and Porro [5] introduced a new class of matching algorithms: Monotonic Imbalance Bounding (MIB) methods. Within this class, a method that appears particularly helpful in empirical applications is Coarsened Exact Matching (CEM). The CEM authors originally provided software for standard packages such as R, Stata and SPSS to perform their algorithm. Our %CEM SAS macro complements these alternatives by introducing a fully automated Coarsened Exact Matching macro for SAS users. The paper is organized as follows: section 2 briefly reviews the CEM algorithm, section 3 introduces the list of %CEM macro parameters, and section 4 reports an empirical application together with the results of testing the speed of the macro under different variable-binning options and an increasing number of records. Conclusions complete the paper.
2 Coarsened Exact Matching
CEM belongs to the set of matching approaches based on stratification (also known in the literature as sub-classification). In comparison with other methods, CEM first of all respects the principle of not reducing the original data space, operating in the multidimensional variable space itself. A second innovation of CEM, as a member of the MIB class, is that it does not fix the number of matched observations ex ante but lets it result from the coarsening parameters set on the observables. In this way CEM is expressly designed to overcome the problem of increasing imbalance on some variables while improving it on others, which is a serious problem with parametric methods such as the propensity score. With CEM the researcher chooses the maximum level of imbalance allowed ex ante, and CEM then produces a matched sample whose size is not known a priori. In addition, CEM has attractive computational advantages, because the algorithm represents all the observable information of each observation in a single text string. As a result, Coarsened Exact Matching has the same complexity as a simple frequency tabulation. On the other hand, the most serious drawback of CEM is that setting too fine a level of coarsening means discarding many units, so choosing the level of coarsening appropriately is the crucial step when running CEM. If the bins are too wide, important information that could improve the matching may be lost. Conversely, the finer the coarsening, the larger the number of discarded observations, and a solution may be unavailable or less efficient. With an appropriate coarsening, the main results of CEM are threefold: less covariate imbalance, less model dependence and less statistical bias. The CEM authors documented in their original paper how, in many empirical applications, CEM eliminates much of the heterogeneity between treated and controls before causal estimates are produced.
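To make the single-string idea concrete, here is a minimal Python sketch (not the macro's SAS/IML code), assuming each observation is a dictionary of already-coarsened values; the names signature, tabulate_strata, example_rows and the "|" separator are hypothetical. Building the strata then amounts to one pass of frequency counting over these keys.

```python
from collections import Counter


def signature(row, variables):
    """Join the coarsened values of `variables` into one text key per observation."""
    return "|".join(str(row[v]) for v in variables)


def tabulate_strata(rows, variables, treat="treat"):
    """Single pass over the data: count treated and control units per signature."""
    counts = Counter()
    for row in rows:
        counts[(signature(row, variables), row[treat])] += 1
    return counts


# Invented, already-coarsened observations (age quintile, sex, treatment flag).
example_rows = [
    {"age_q": 2, "sex": 1, "treat": 1},
    {"age_q": 2, "sex": 1, "treat": 0},
    {"age_q": 3, "sex": 0, "treat": 1},
]
example_counts = tabulate_strata(example_rows, ["age_q", "sex"])
```

Here stratum "2|1" holds one treated and one control unit and would be retained, while "3|0" has no controls and would be pruned. The separator keeps keys such as "1|23" and "12|3" distinct, which plain concatenation would not.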
Since coarsening choices have to be based on the researcher's substantive knowledge, to support SAS users in this task we have automated a series of standard coarsening options that choose, case by case, the bin widths for continuous variables. These standard alternatives are produced automatically by the %CEM macro. The CEM procedure is as follows:
1. it coarsens each of the observed variables (X) according to the researcher's choices (differently for categorical and continuous variables);
2. it applies a matching algorithm (1:1 or 1:n) to the strata identified by the values of the coarsened variables;
3. strata that do not contain at least one treated and one control unit are discarded, while strata with at least one treated and one control unit are retained;
4. CEM weights are computed for each stratum $s$: matched treated units receive weight 1, and each matched control unit $i$ in stratum $s$ receives $w_i = \frac{m_C}{m_T} \cdot \frac{m_T^s}{m_C^s}$, where $m_T^s$ and $m_C^s$ are the numbers of treated and control units in stratum $s$, and $m_T$ and $m_C$ are the total numbers of matched treated and control units. Unmatched units receive weight zero;

5. finally, an imbalance measure called L1 is computed.

The L1 measure was introduced by the CEM authors [3] to quantify the distance between the multivariate histograms (H) of treated and control units; computing it on both the original and the matched populations yields a measure of global balance. Within the %CEM macro, L1 is computed by invoking an ad hoc macro called %L1. An alternative multidimensional balance measure, called GI, has recently been introduced in the literature [2] together with its %GI SAS macro code. A SAS user could easily combine that macro with %CEM by substituting %GI for our %L1 inside the original %CEM macro (only adapting it to the names of our macro parameters). The original L1 is defined as follows:

$$L_1(H) = \frac{1}{2} \sum_{\ell_1 \cdots \ell_k \in H(X)} \left| f_{\ell_1 \cdots \ell_k} - g_{\ell_1 \cdots \ell_k} \right| \qquad (1)$$

where $f_{\ell_1 \cdots \ell_k}$ and $g_{\ell_1 \cdots \ell_k}$ are the relative frequencies of treated and control units in the cell with coordinates $\ell_1 \cdots \ell_k$ of the multivariate cross-tabulation H. L1 is easy to interpret: for a given coarsening, if the empirical distributions of treated and controls are completely separated then $L_1 = 1$, while if they overlap perfectly then $L_1 = 0$; in general $L_1 \in [0, 1]$. For example, $L_1 = 0.81$ means that 19% of the two multidimensional histograms overlap. Matching performs well if the L1 of the matched population is less than or equal to the L1 of the original population. Rather than computing the absolute differences between treated and control relative frequencies over the full matrix H, we adopt a reduced approach in the SAS/IML language: the differences are computed separately for each sub-matrix of the complete original matrix H (both on and off the principal diagonal), and the additive property of L1 guarantees the same result.
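Steps 3 to 5 can be sketched in Python as follows, assuming stratum labels have already been computed (for instance as signature strings). The function names and example data are hypothetical, and passing the CEM weights to the L1 computation to obtain a matched-sample value is our assumption about one possible convention, not a description of the %L1 code.

```python
import math
from collections import Counter, defaultdict


def cem_weights(strata, treated):
    """CEM weights for the 1:n option: 1 for matched treated units,
    (m_C / m_T) * (m_T^s / m_C^s) for matched controls, 0 for unmatched units."""
    n_t, n_c = Counter(), Counter()
    for s, t in zip(strata, treated):
        (n_t if t else n_c)[s] += 1
    # Step 3: keep only strata with at least one treated and one control unit.
    matched = {s for s in n_t if n_c[s] > 0}
    m_t = sum(n_t[s] for s in matched)
    m_c = sum(n_c[s] for s in matched)
    weights = []
    for s, t in zip(strata, treated):
        if s not in matched:
            weights.append(0.0)
        elif t:
            weights.append(1.0)
        else:
            weights.append((m_c / m_t) * (n_t[s] / n_c[s]))
    return weights


def l1_imbalance(cells, treated, weights=None):
    """Equation (1): half the sum over cells of |f - g|, where f and g are the
    (optionally weighted) relative frequencies of treated and control units."""
    if weights is None:
        weights = [1.0] * len(cells)
    f, g = defaultdict(float), defaultdict(float)
    for c, t, w in zip(cells, treated, weights):
        (f if t else g)[c] += w
    tot_f, tot_g = sum(f.values()), sum(g.values())
    keys = set(f) | set(g)
    return 0.5 * math.fsum(abs(f[k] / tot_f - g[k] / tot_g) for k in keys)


# Invented example: stratum labels (also used here as L1 cells) and treatment flags.
example_strata = ["a", "a", "a", "b", "b", "c"]
example_treated = [1, 0, 0, 1, 0, 1]
example_weights = cem_weights(example_strata, example_treated)
```

In this example stratum c has no control, so its treated unit gets weight 0, and the matched controls' weights (0.75, 0.75, 1.5) sum to $m_C = 3$. The unweighted L1 is 1/3, while the weighted L1 is 0 because the L1 cells coincide with the matching strata; in practice L1 is usually evaluated on a coarsening different from the one used for matching. Since the two relative-frequency histograms each sum to 1, their overlap equals $1 - L_1$, which is how $L_1 = 0.81$ translates into the 19% overlap mentioned above. Because the sum in (1) runs cell by cell, it can be split over any partition of the cells and added back, which is the additive property the SAS/IML implementation relies on.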
Then, to compare several pre-defined coarsening levels simultaneously, %CEM offers a standard PROC GPLOT chart of their L1 measures, allowing researchers to choose the most efficient binning solution for their purposes.
3 List of Macro Parameters
Building on recent contributions to the matching literature [3], we wrote a SAS (SAS Institute Inc.) macro program to perform Coarsened Exact Matching and evaluate the global imbalance of the populations before and after matching. The complete list of %CEM parameters is the following:

%CEM(lib=, data=, id=, treat=, del_mis=, match_type=, coddataset=)

where:

* lib: name of the library (directory) containing the original dataset;
* data: name of the SAS dataset to be read. It must contain one row for each observation to be matched (individuals or firms), with K observed continuous or categorical covariates, a treatment indicator variable and the ID primary-key variable;
* id: ID primary-key variable;
* treat: dummy treatment indicator, 1 = treated and 0 = untreated;
* del_mis: option for missing values: 0 keeps them as additional categories, 1 deletes them before matching;
* match_type: 1 (1:1 matching) or N (1:N matching with associated stratum weights);
* coddataset: code assigned to identify the dataset being tested.

The macro performs CEM between treated and controls using both the SAS and SAS/IML languages. To this end, after the type of matching (1:1 or 1:n) and the missing-value option have been defined, it runs the matching algorithm on the provided subjects (data). %CEM automatically creates multiple sets of strata, depending on the level of coarsening, each stratum grouping units with equal values of the coarsened covariates (X). For continuous variables the macro uses by default quintiles, quartiles, percentiles and the original values as matching alternatives. For nominal and ordinal variables the macro assumes that the user has already coded the data into the desired number of categories. Categorical variables cannot be coarsened by default without specific choices on how the coarsening should take place: such choices are assumptions that require substantial knowledge of both the variables' interpretation and their measurement scales. Subjects belonging to strata with at least one treated and one control unit are then retained, while the leftovers are pruned from the sample. Depending on the specified matching option, %CEM then either performs exact matching by randomly selecting the desired number of treated and controls, or keeps all of them and computes CEM weights for each stratum.
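The default coarsening options and the missing-value switch can be sketched in Python as below. The option labels, the function names and the tie rule (bisect_right places a value equal to a cut point in the upper bin) are our assumptions, not the macro's internals; quantile cut points come from the standard-library statistics.quantiles with method="inclusive", and None stands for a missing value.

```python
import bisect
import itertools
import statistics


def quantile_bins(values, n):
    """Replace each value by its quantile-bin index 0..n-1 (n=5 quintiles,
    n=4 quartiles, n=100 percentiles). None (missing) stays None."""
    observed = [v for v in values if v is not None]
    cuts = statistics.quantiles(observed, n=n, method="inclusive")
    return [None if v is None else bisect.bisect_right(cuts, v) for v in values]


# Hypothetical labels for the four default options described in the text.
COARSENINGS = {
    "original": lambda values: list(values),
    "percentile": lambda values: quantile_bins(values, 100),
    "quartile": lambda values: quantile_bins(values, 4),
    "quintile": lambda values: quantile_bins(values, 5),
}


def drop_missing(rows, variables):
    """Analogue of del_mis=1: drop rows with a missing value in any matching variable.
    With del_mis=0, None is kept and simply acts as one more category."""
    return [row for row in rows if all(row[v] is not None for v in variables)]


def configurations(numeric_vars):
    """All assignments of one option per numeric variable: 4 ** k configurations."""
    return [dict(zip(numeric_vars, combo))
            for combo in itertools.product(COARSENINGS, repeat=len(numeric_vars))]


example_values = [1.0, 2.0, 3.0, 4.0, 5.0, None]
```

If, as one reading of the description in section 4 suggests, every combination of the four options across the k numeric variables is evaluated, the number of candidate stratifications grows as $4^k$ (65,536 for the 8 numeric variables of the largest test dataset). That would be consistent with the number of numeric variables being the main driver of execution time, although the text does not state this explicitly.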
Finally, the L1 multidimensional imbalance measure is computed for each of the default coarsening options and compared with the others through a simple graphical representation.
4 Empirical Application
This section details the application of the %CEM macro to matching treated and control patients in an artificial regional study of in-hospital mortality in the Lombardy Region (Italy). The purpose of this study is to assess whether the risk of mortality differs between citizens resident in Lombardy and citizens resident elsewhere in Italy but discharged from a Lombard hospital. The Italian National Healthcare System (NHS) provides universal healthcare coverage, but a devolution policy (2001) transferred several important administrative and organizational responsibilities from the central government to the 20 regions. Among these, Lombardy is one of the most important in socio-demographic and economic terms: it has about 10 million residents (16% of the Italian population) and ranks among the most competitive areas in Europe. The Lombardy healthcare system comprises approximately 200 hospitals, 2 million discharges annually and 16 billion euros of healthcare expenditure (73% of the total regional budget). A regional reform in 1997 radically transformed the healthcare system into a quasi-market in which citizens can freely choose their provider, regardless of its ownership (private for-profit, private not-for-profit, or public). The Italian NHS provides every citizen with free hospitalization in any Italian region; as a result, about 150,000 Italian citizens from other regions are admitted to Lombard hospitals each year. The empirical application aims to understand whether in-hospital mortality differs between patients from other regions and Lombard residents. For this reason we apply the %CEM macro to extract two subgroups, treated (Lombard patients) and controls (non-Lombard patients), selected according to specific characteristics, in order to verify the average difference in their mortality risk.
The database was originally abstracted from the regional administrative healthcare information system, which collects data about patients admitted to hospitals in the Lombardy region. Of the 2011 discharges, 79% were ordinary admissions and 21% were day-hospital or day-surgery admissions. Moreover, hospitalizations of residents outside the Lombardy region account for 10% of all admissions. The hospital discharge data contain basic demographic information (age, gender), information on the hospitalization (length of stay, special-care unit use, transfers within the same hospital or to other facilities, in-hospital mortality, ...) and six diagnosis and procedure codes defined according to the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). Only ordinary hospitalizations of patients aged more than 2 years were retained in the sample. The response variable is a binary in-hospital mortality indicator, recording whether or not the patient died during the hospitalization. Patient-level variables were selected as reasonable major determinants of mortality in our example. To apply the %CEM macro we control for the patient's age (AGE, in years), gender (SEX, a dummy equal to 1 if the patient is male and 0 otherwise), coexisting conditions measured by the Elixhauser index (COMORB; Elixhauser, 1998), presence at admission of selected comorbidities such as cardiovascular diseases (CARDIO, a dummy) and cancer (ONCO, a dummy), length of stay (LOS, in days), transit through an Intensive Care Unit (ICU), presence of a principal diagnosis indicating an emergency admission (EMERG, a dummy), Diagnosis Related Group (DRG) and Major Diagnostic Category (MDC). In addition, in order to test the macro artificially on a larger number of variables, we introduced the following characteristics, which usually do not affect mortality: financing category (rehabilitation, long-term care, etc.), length of stay before surgery, type of discharge (voluntary, to a different hospital, at home, etc.), ward of discharge, four variables indicating a readmission according to its type, and the number of days between a previous discharge and a subsequent hospitalization. The following code was submitted:

%CEM(lib=health, data=discharges, id=patient_id, treat=treat, del_mis=1, match_type=1, coddataset=1)

To test the speed of the macro we built 28 datasets with different numbers of records and of numeric or categorical variables. Table 1 describes the characteristics of these datasets, which contain between 100,000 and 1,000,000 records and between 2 and 18 variables. To monitor all the parameters that could influence the speed of the macro, we arbitrarily varied the number of numeric and categorical variables across datasets. The speed of the %CEM macro was tested on a notebook with the following technical characteristics: OS Windows 7 (x64), Quad-Core processor Intel(R) Core(TM) i5-2430M 2.40 GHz, 4.00 GB RAM. Execution times are computed as differences in seconds. At each test session the SAS Log, the Output and the Work library were cleared. Table 2 presents the main results; the dataset codes are those assigned in Table 1.
As expected, execution time increases with the number of records and variables, ranging from 7 minutes for the dataset with 100,000 records and 2 variables (1 numeric and 1 categorical) to nearly 2 days and 2 hours for the dataset with 1,000,000 records and 18 variables (8 numeric and 10 categorical). The executions of the macro naturally produce different selections of patients, depending on how the stratification code is constructed. Each numeric variable is classified according to 4 criteria: its exact value, its percentile, its quartile and its quintile. Each categorical variable enters the code with its full set of original categories. Combining these 4 values of each numeric variable with the original values of the categorical variables produces a new stratum code to be assigned to each patient. At this point %CEM combines records with the same stratum code according to the specified matching rule: it selects an equal number of treated and control records in each stratum (1:1 matching), or keeps different numbers of treated and controls in the stratum and assigns them a specific CEM weight (1:n matching). The different selection criteria produce more or less balanced subpopulations. At the end of the execution the macro computes the L1 measure, as discussed above, and plots all the L1 balance values. From the L1 plot we can choose the best matching configuration, corresponding to the lowest L1 value (an example is given in Figure 1).
To analyze how the number of records and the number and type of variables contribute to the execution time of the %CEM macro, we estimate a model of execution time as a function of the dataset characteristics. Results are presented in Table 3. The analysis shows that the number of numeric variables is the main predictor of the execution time of the macro. The number of variables and their type (categorical or numeric) help explain execution time in all models, as does the number of records. The models with the best goodness of fit are models 5, 6, 7 and 8, which combine the number of records with the different variable counts. We also tested interactions between the number of records and the variable counts, but the estimated parameters were not significant.

Table 1: Speed test: dataset characteristics (dataset code, total records, total variables, total numerical variables, total categorical variables)
Table 2: Speed test: execution time (dataset code, time in HH:MM:SS, time in seconds)
Figure 1: L1 plot
Table 3: Speed test: models of execution time (Models 1 to 8; rows: Intercept, TotRec, TotVar, TotVarClass, TotVarNum, R-square; ***, ** and * mark significance levels)

To test the validity of the matching we set up an artificial model that estimates the probability of being a Lombard patient rather than a patient resident in a different Italian region. If matching removes all the imbalance, we expect non-significant coefficients for all the included covariates:

$$\ln\left(\frac{\pi_i}{1-\pi_i}\right) = \alpha_0 + \sum_{j=1}^{J} \beta_j X_{ji} + \varepsilon_i \qquad (2)$$

where $i = 1, \ldots, I$ indexes patients and $j = 1, \ldots, J$ patient-level covariates; $\pi_i = \Pr(Treat_i = 1)$, where $Treat_i$ is a dummy equal to 1 if the patient is resident in Lombardy and 0 otherwise; and $X_{ji}$ are the J patient-level covariates, corresponding to the variables used to match the patients. Table 4 confirms the hypothesis tested.
Finally, we estimate a standard treatment model to measure the effect of being Lombard on in-hospital mortality in our empirical example, comparing the in-hospital mortality difference in the original data with that in the matched observations after the %CEM application. Results are shown in Tables 5 and 6.

$$\ln\left(\frac{p_i}{1-p_i}\right) = \alpha_0 + \beta\, Treat_i + \varepsilon_i \qquad (3)$$

where $i = 1, \ldots, I$ indexes patients, $p_i$ is the probability that the i-th patient dies in hospital, and $Treat_i$ is a dummy equal to 1 if the patient is resident in Lombardy and 0 otherwise. The model estimated in Table 5 shows that for patients living in Lombardy the odds of dying in hospital are about 3 times higher than for patients from other regions. This may mean that patients from other regions come to Lombardy mostly for highly specialized hospitalizations with a lower risk of death. Patients from other regions with a high risk of dying may also ask to return home, so the rest of their clinical history cannot be followed. After applying the %CEM macro we compare patients with the same characteristics, and the model estimated in Table 6 shows no significant difference in risk between patients living in Lombardy and patients living outside Lombardy.
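Because model (3) has an intercept and a single binary regressor, it is saturated, so the maximum-likelihood estimate of $\exp(\beta)$ equals the odds ratio of the 2x2 table of residence by outcome (the weighted table, if weights enter the fit). The Python sketch below uses this to cross-check estimates such as those in Tables 5 and 6; the data are invented, the function names are hypothetical, and whether the after-matching fit uses CEM weights is our assumption, since the text does not say.

```python
import math


def odds_ratio(treat, died, weights=None):
    """(Weighted) odds ratio of in-hospital death, treated vs controls.
    Assumes every cell of the 2x2 table has a positive (weighted) count."""
    if weights is None:
        weights = [1.0] * len(treat)
    cell = {(t, d): 0.0 for t in (0, 1) for d in (0, 1)}
    for t, d, w in zip(treat, died, weights):
        cell[(t, d)] += w
    odds_treated = cell[(1, 1)] / cell[(1, 0)]
    odds_control = cell[(0, 1)] / cell[(0, 0)]
    return odds_treated / odds_control


def log_odds_ratio(treat, died, weights=None):
    """Estimate of beta in a logit with an intercept and one 0/1 regressor."""
    return math.log(odds_ratio(treat, died, weights))


# Invented data: 1 = resident in Lombardy / died in hospital.
example_treat = [1, 1, 1, 1, 1, 0, 0]
example_died = [1, 0, 0, 0, 0, 1, 0]
```

With these data the treated odds are 1/4 and the control odds 1, giving an odds ratio of 0.25; raising the weight of the surviving control to 3 moves it to 0.75, showing how reweighting the controls changes the comparison without refitting anything.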

Table 4: Test for imbalance (variable, estimate, standard error, p-value). Rows: Intercept; Sex (F vs M); Age; Cardio; COMORB (1 vs 5+), (2 vs 5+), (3 vs 5+), (4 vs 5+); LOS; Mdc (four categories); NA; Rep (four categories); Clafi1 DD, DO, ZU; Urg; TYPE DISCH (1 vs 7) to (6 vs 7); Onco; ICU; Drg (seven categories); LOS Pre Surg; Readm S GG; Readm MDC S GG; Readm MDC S AC; Readm MDC GG.

Table 5: Analysis of in-hospital mortality before %CEM application (variable, estimate, standard error, p-value; rows: Intercept, Lombardo)
Table 6: Analysis of in-hospital mortality after %CEM application (variable, estimate, standard error, p-value; rows: Intercept, Lombardo)

Concluding Remarks
Coarsened Exact Matching is a non-parametric matching method introduced to remove the confounding influence of pre-treatment control variables, directly improving causal inference in quasi-experimental studies. The CEM authors originally provided software for R, Stata and SPSS. %CEM now allows researchers to perform the Coarsened Exact Matching technique with SAS as well, taking advantage of its performance. The %CEM macro thus fills a gap in the available software, making it possible to run the CEM algorithm on a large number of records with considerable time savings. The macro is illustrated with an ad hoc example of matching treated and control patients in a regional Italian study of in-hospital mortality by place of residence. The execution time of the macro depends on the number of records (results provided from 100,000 up to 1 million) and of variables (from 2 to 18, numerical or categorical) included. The analysis of execution times shows that time depends strongly on both the number of records and the number of numerical variables (goodness of fit around 0.93). Our empirical application shows how, in our example, unbalanced observable characteristics of treated and control patients directly affect the treatment estimate.

References
[1] Cochran, W. G., The effectiveness of adjustment by sub-classification in removing bias in observational studies, Biometrics, vol. 24, (1968);
[2] Camillo, F., D'Attoma, I., %GI SAS Macro: A SAS Macro for Measuring and Testing Global Imbalance of Covariates within Subgroups, Journal of Statistical Software, vol. 51, Code Snippet 1, (2012);
[3] Iacus, S. M., King, G., Porro, G., cem: Software for Coarsened Exact Matching, Journal of Statistical Software, vol. 30(9), pp. 1-27, (2009);
[4] Iacus, S. M., Porro, G., Random Recursive Partitioning: A Matching Method for the Estimation of the Average Treatment Effect, Journal of Applied Econometrics, vol. 24, (2009);
[5] Iacus, S. M., King, G., Porro, G., Multivariate matching methods that are Monotonic Imbalance Bounding, Journal of the American Statistical Association, (2011);
[6] Imbens, G., Wooldridge, J. M., Recent developments in the Econometrics of Program Evaluation, IZA Discussion Paper, No. 3640, (2008);
[7] Mielke, P. W., Berry, K. J., Permutation methods: A distance function approach, New York: Springer, (2007);
[8] Rosenbaum, P. R., Rubin, D. B., The central role of the propensity score in observational studies for causal effects, Biometrika, vol. 70, pp. 41-55, (1983);
[9] Rubin, D. B., Practical implications of modes of statistical inference for causal effects and the critical role of the assignment mechanism, Biometrics, vol. 47, (1991).


More information

Propensity Score Methods to Adjust for Bias in Observational Data SAS HEALTH USERS GROUP APRIL 6, 2018

Propensity Score Methods to Adjust for Bias in Observational Data SAS HEALTH USERS GROUP APRIL 6, 2018 Propensity Score Methods to Adjust for Bias in Observational Data SAS HEALTH USERS GROUP APRIL 6, 2018 Institute Institute for Clinical for Clinical Evaluative Evaluative Sciences Sciences Overview 1.

More information

Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy Research

Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy Research 2012 CCPRC Meeting Methodology Presession Workshop October 23, 2012, 2:00-5:00 p.m. Propensity Score Methods for Estimating Causality in the Absence of Random Assignment: Applications for Child Care Policy

More information

Remarks on Bayesian Control Charts

Remarks on Bayesian Control Charts Remarks on Bayesian Control Charts Amir Ahmadi-Javid * and Mohsen Ebadi Department of Industrial Engineering, Amirkabir University of Technology, Tehran, Iran * Corresponding author; email address: ahmadi_javid@aut.ac.ir

More information

Evaluating Social Programs Course: Evaluation Glossary (Sources: 3ie and The World Bank)

Evaluating Social Programs Course: Evaluation Glossary (Sources: 3ie and The World Bank) Evaluating Social Programs Course: Evaluation Glossary (Sources: 3ie and The World Bank) Attribution The extent to which the observed change in outcome is the result of the intervention, having allowed

More information

Reveal Relationships in Categorical Data

Reveal Relationships in Categorical Data SPSS Categories 15.0 Specifications Reveal Relationships in Categorical Data Unleash the full potential of your data through perceptual mapping, optimal scaling, preference scaling, and dimension reduction

More information

Lecture II: Difference in Difference. Causality is difficult to Show from cross

Lecture II: Difference in Difference. Causality is difficult to Show from cross Review Lecture II: Regression Discontinuity and Difference in Difference From Lecture I Causality is difficult to Show from cross sectional observational studies What caused what? X caused Y, Y caused

More information

Numerical Integration of Bivariate Gaussian Distribution

Numerical Integration of Bivariate Gaussian Distribution Numerical Integration of Bivariate Gaussian Distribution S. H. Derakhshan and C. V. Deutsch The bivariate normal distribution arises in many geostatistical applications as most geostatistical techniques

More information

Effects of propensity score overlap on the estimates of treatment effects. Yating Zheng & Laura Stapleton

Effects of propensity score overlap on the estimates of treatment effects. Yating Zheng & Laura Stapleton Effects of propensity score overlap on the estimates of treatment effects Yating Zheng & Laura Stapleton Introduction Recent years have seen remarkable development in estimating average treatment effects

More information

Combining machine learning and matching techniques to improve causal inference in program evaluation

Combining machine learning and matching techniques to improve causal inference in program evaluation bs_bs_banner Journal of Evaluation in Clinical Practice ISSN1365-2753 Combining machine learning and matching techniques to improve causal inference in program evaluation Ariel Linden DrPH 1,2 and Paul

More information

OHDSI Tutorial: Design and implementation of a comparative cohort study in observational healthcare data

OHDSI Tutorial: Design and implementation of a comparative cohort study in observational healthcare data OHDSI Tutorial: Design and implementation of a comparative cohort study in observational healthcare data Faculty: Martijn Schuemie (Janssen Research and Development) Marc Suchard (UCLA) Patrick Ryan (Janssen

More information

Section on Survey Research Methods JSM 2009

Section on Survey Research Methods JSM 2009 Missing Data and Complex Samples: The Impact of Listwise Deletion vs. Subpopulation Analysis on Statistical Bias and Hypothesis Test Results when Data are MCAR and MAR Bethany A. Bell, Jeffrey D. Kromrey

More information

Methods for treating bias in ISTAT mixed mode social surveys

Methods for treating bias in ISTAT mixed mode social surveys Methods for treating bias in ISTAT mixed mode social surveys C. De Vitiis, A. Guandalini, F. Inglese and M.D. Terribili ITACOSM 2017 Bologna, 16th June 2017 Summary 1. The Mixed Mode in ISTAT social surveys

More information

I. Identifying the question Define Research Hypothesis and Questions

I. Identifying the question Define Research Hypothesis and Questions Term Paper I. Identifying the question What is the question? (What are my hypotheses?) Is it possible to answer the question with statistics? Is the data obtainable? (birth weight, socio economic, drugs,

More information

Lecture II: Difference in Difference and Regression Discontinuity

Lecture II: Difference in Difference and Regression Discontinuity Review Lecture II: Difference in Difference and Regression Discontinuity it From Lecture I Causality is difficult to Show from cross sectional observational studies What caused what? X caused Y, Y caused

More information

Methods for Addressing Selection Bias in Observational Studies

Methods for Addressing Selection Bias in Observational Studies Methods for Addressing Selection Bias in Observational Studies Susan L. Ettner, Ph.D. Professor Division of General Internal Medicine and Health Services Research, UCLA What is Selection Bias? In the regression

More information

George B. Ploubidis. The role of sensitivity analysis in the estimation of causal pathways from observational data. Improving health worldwide

George B. Ploubidis. The role of sensitivity analysis in the estimation of causal pathways from observational data. Improving health worldwide George B. Ploubidis The role of sensitivity analysis in the estimation of causal pathways from observational data Improving health worldwide www.lshtm.ac.uk Outline Sensitivity analysis Causal Mediation

More information

Regression Discontinuity Analysis

Regression Discontinuity Analysis Regression Discontinuity Analysis A researcher wants to determine whether tutoring underachieving middle school students improves their math grades. Another wonders whether providing financial aid to low-income

More information

Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection

Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection Author's response to reviews Title: A robustness study of parametric and non-parametric tests in Model-Based Multifactor Dimensionality Reduction for epistasis detection Authors: Jestinah M Mahachie John

More information

Recent advances in non-experimental comparison group designs

Recent advances in non-experimental comparison group designs Recent advances in non-experimental comparison group designs Elizabeth Stuart Johns Hopkins Bloomberg School of Public Health Department of Mental Health Department of Biostatistics Department of Health

More information

Using Propensity Score Matching in Clinical Investigations: A Discussion and Illustration

Using Propensity Score Matching in Clinical Investigations: A Discussion and Illustration 208 International Journal of Statistics in Medical Research, 2015, 4, 208-216 Using Propensity Score Matching in Clinical Investigations: A Discussion and Illustration Carrie Hosman 1,* and Hitinder S.

More information

Chapter 13 Estimating the Modified Odds Ratio

Chapter 13 Estimating the Modified Odds Ratio Chapter 13 Estimating the Modified Odds Ratio Modified odds ratio vis-à-vis modified mean difference To a large extent, this chapter replicates the content of Chapter 10 (Estimating the modified mean difference),

More information

Supplemental Appendix for Beyond Ricardo: The Link Between Intraindustry. Timothy M. Peterson Oklahoma State University

Supplemental Appendix for Beyond Ricardo: The Link Between Intraindustry. Timothy M. Peterson Oklahoma State University Supplemental Appendix for Beyond Ricardo: The Link Between Intraindustry Trade and Peace Timothy M. Peterson Oklahoma State University Cameron G. Thies University of Iowa A-1 This supplemental appendix

More information

Instrumental Variables I (cont.)

Instrumental Variables I (cont.) Review Instrumental Variables Observational Studies Cross Sectional Regressions Omitted Variables, Reverse causation Randomized Control Trials Difference in Difference Time invariant omitted variables

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction 1.1 Motivation and Goals The increasing availability and decreasing cost of high-throughput (HT) technologies coupled with the availability of computational tools and data form a

More information

Chapter 1: Explaining Behavior

Chapter 1: Explaining Behavior Chapter 1: Explaining Behavior GOAL OF SCIENCE is to generate explanations for various puzzling natural phenomenon. - Generate general laws of behavior (psychology) RESEARCH: principle method for acquiring

More information

We define a simple difference-in-differences (DD) estimator for. the treatment effect of Hospital Compare (HC) from the

We define a simple difference-in-differences (DD) estimator for. the treatment effect of Hospital Compare (HC) from the Appendix A: Difference-in-Difference Estimation Estimation Strategy We define a simple difference-in-differences (DD) estimator for the treatment effect of Hospital Compare (HC) from the perspective of

More information

Index. Springer International Publishing Switzerland 2017 T.J. Cleophas, A.H. Zwinderman, Modern Meta-Analysis, DOI /

Index. Springer International Publishing Switzerland 2017 T.J. Cleophas, A.H. Zwinderman, Modern Meta-Analysis, DOI / Index A Adjusted Heterogeneity without Overdispersion, 63 Agenda-driven bias, 40 Agenda-Driven Meta-Analyses, 306 307 Alternative Methods for diagnostic meta-analyses, 133 Antihypertensive effect of potassium,

More information

ESM1 for Glucose, blood pressure and cholesterol levels and their relationships to clinical outcomes in type 2 diabetes: a retrospective cohort study

ESM1 for Glucose, blood pressure and cholesterol levels and their relationships to clinical outcomes in type 2 diabetes: a retrospective cohort study ESM1 for Glucose, blood pressure and cholesterol levels and their relationships to clinical outcomes in type 2 diabetes: a retrospective cohort study Statistical modelling details We used Cox proportional-hazards

More information

9 research designs likely for PSYC 2100

9 research designs likely for PSYC 2100 9 research designs likely for PSYC 2100 1) 1 factor, 2 levels, 1 group (one group gets both treatment levels) related samples t-test (compare means of 2 levels only) 2) 1 factor, 2 levels, 2 groups (one

More information

Is Hospital Admission Useful for Syncope Patients? Preliminary Results of a Multicenter Cohort

Is Hospital Admission Useful for Syncope Patients? Preliminary Results of a Multicenter Cohort Is Hospital Admission Useful for Syncope Patients? Preliminary Results of a Multicenter Cohort F. Dipaola, E. Pivetta, G. Costantino, G. Casazza, M.J. Reed, B. Sun, M. Solbiati, F. Barbic, D. Shiffer,

More information

Lecture Outline. Biost 517 Applied Biostatistics I. Purpose of Descriptive Statistics. Purpose of Descriptive Statistics

Lecture Outline. Biost 517 Applied Biostatistics I. Purpose of Descriptive Statistics. Purpose of Descriptive Statistics Biost 517 Applied Biostatistics I Scott S. Emerson, M.D., Ph.D. Professor of Biostatistics University of Washington Lecture 3: Overview of Descriptive Statistics October 3, 2005 Lecture Outline Purpose

More information

Mediation Analysis With Principal Stratification

Mediation Analysis With Principal Stratification University of Pennsylvania ScholarlyCommons Statistics Papers Wharton Faculty Research 3-30-009 Mediation Analysis With Principal Stratification Robert Gallop Dylan S. Small University of Pennsylvania

More information

Impact and adjustment of selection bias. in the assessment of measurement equivalence

Impact and adjustment of selection bias. in the assessment of measurement equivalence Impact and adjustment of selection bias in the assessment of measurement equivalence Thomas Klausch, Joop Hox,& Barry Schouten Working Paper, Utrecht, December 2012 Corresponding author: Thomas Klausch,

More information

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you?

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you? WDHS Curriculum Map Probability and Statistics Time Interval/ Unit 1: Introduction to Statistics 1.1-1.3 2 weeks S-IC-1: Understand statistics as a process for making inferences about population parameters

More information

BIOL 458 BIOMETRY Lab 7 Multi-Factor ANOVA

BIOL 458 BIOMETRY Lab 7 Multi-Factor ANOVA BIOL 458 BIOMETRY Lab 7 Multi-Factor ANOVA PART 1: Introduction to Factorial ANOVA ingle factor or One - Way Analysis of Variance can be used to test the null hypothesis that k or more treatment or group

More information

Machine Learning to Inform Breast Cancer Post-Recovery Surveillance

Machine Learning to Inform Breast Cancer Post-Recovery Surveillance Machine Learning to Inform Breast Cancer Post-Recovery Surveillance Final Project Report CS 229 Autumn 2017 Category: Life Sciences Maxwell Allman (mallman) Lin Fan (linfan) Jamie Kang (kangjh) 1 Introduction

More information

Title: How efficient are Referral Hospitals in Uganda? A Data Envelopment Analysis and Tobit Regression Approach

Title: How efficient are Referral Hospitals in Uganda? A Data Envelopment Analysis and Tobit Regression Approach Author s response to reviews Title: How efficient are Referral Hospitals in Uganda? A Data Envelopment Analysis and Tobit Regression Approach Authors: Paschal Mujasi (Pmujasi@yahoo.co.uk) Eyob Asbu (zeyob@yahoo.com)

More information

Propensity score methods to adjust for confounding in assessing treatment effects: bias and precision

Propensity score methods to adjust for confounding in assessing treatment effects: bias and precision ISPUB.COM The Internet Journal of Epidemiology Volume 7 Number 2 Propensity score methods to adjust for confounding in assessing treatment effects: bias and precision Z Wang Abstract There is an increasing

More information

Applied Medical. Statistics Using SAS. Geoff Der. Brian S. Everitt. CRC Press. Taylor Si Francis Croup. Taylor & Francis Croup, an informa business

Applied Medical. Statistics Using SAS. Geoff Der. Brian S. Everitt. CRC Press. Taylor Si Francis Croup. Taylor & Francis Croup, an informa business Applied Medical Statistics Using SAS Geoff Der Brian S. Everitt CRC Press Taylor Si Francis Croup Boca Raton London New York CRC Press is an imprint of the Taylor & Francis Croup, an informa business A

More information

Version No. 7 Date: July Please send comments or suggestions on this glossary to

Version No. 7 Date: July Please send comments or suggestions on this glossary to Impact Evaluation Glossary Version No. 7 Date: July 2012 Please send comments or suggestions on this glossary to 3ie@3ieimpact.org. Recommended citation: 3ie (2012) 3ie impact evaluation glossary. International

More information

Tutorial Copyright ACT Consult. All Rights Reserved

Tutorial Copyright ACT Consult. All Rights Reserved ACT-DRG-Optimizer Tutorial Copyright 1998-2006 ACT Consult. All Rights Reserved @ct-drg-optimizer version 1.3 ACT-DRG-Optimizer ACT-DRG-Optimizer is designed for use by Healthcare providers where the healthcare

More information

1 Introduction. st0020. The Stata Journal (2002) 2, Number 3, pp

1 Introduction. st0020. The Stata Journal (2002) 2, Number 3, pp The Stata Journal (22) 2, Number 3, pp. 28 289 Comparative assessment of three common algorithms for estimating the variance of the area under the nonparametric receiver operating characteristic curve

More information

ABSTRACT INTRODUCTION COVARIATE EXAMINATION. Paper

ABSTRACT INTRODUCTION COVARIATE EXAMINATION. Paper Paper 11420-2016 Integrating SAS and R to Perform Optimal Propensity Score Matching Lucy D Agostino McGowan and Robert Alan Greevy, Jr., Vanderbilt University, Department of Biostatistics ABSTRACT In studies

More information

Preliminary Report on Simple Statistical Tests (t-tests and bivariate correlations)

Preliminary Report on Simple Statistical Tests (t-tests and bivariate correlations) Preliminary Report on Simple Statistical Tests (t-tests and bivariate correlations) After receiving my comments on the preliminary reports of your datasets, the next step for the groups is to complete

More information

Marno Verbeek Erasmus University, the Netherlands. Cons. Pros

Marno Verbeek Erasmus University, the Netherlands. Cons. Pros Marno Verbeek Erasmus University, the Netherlands Using linear regression to establish empirical relationships Linear regression is a powerful tool for estimating the relationship between one variable

More information

Abstract Title Page. Authors and Affiliations: Chi Chang, Michigan State University. SREE Spring 2015 Conference Abstract Template

Abstract Title Page. Authors and Affiliations: Chi Chang, Michigan State University. SREE Spring 2015 Conference Abstract Template Abstract Title Page Title: Sensitivity Analysis for Multivalued Treatment Effects: An Example of a Crosscountry Study of Teacher Participation and Job Satisfaction Authors and Affiliations: Chi Chang,

More information

Logistic Regression with Missing Data: A Comparison of Handling Methods, and Effects of Percent Missing Values

Logistic Regression with Missing Data: A Comparison of Handling Methods, and Effects of Percent Missing Values Logistic Regression with Missing Data: A Comparison of Handling Methods, and Effects of Percent Missing Values Sutthipong Meeyai School of Transportation Engineering, Suranaree University of Technology,

More information

Complier Average Causal Effect (CACE)

Complier Average Causal Effect (CACE) Complier Average Causal Effect (CACE) Booil Jo Stanford University Methodological Advancement Meeting Innovative Directions in Estimating Impact Office of Planning, Research & Evaluation Administration

More information

Joseph W Hogan Brown University & AMPATH February 16, 2010

Joseph W Hogan Brown University & AMPATH February 16, 2010 Joseph W Hogan Brown University & AMPATH February 16, 2010 Drinking and lung cancer Gender bias and graduate admissions AMPATH nutrition study Stratification and regression drinking and lung cancer graduate

More information

CASE STUDY 2: VOCATIONAL TRAINING FOR DISADVANTAGED YOUTH

CASE STUDY 2: VOCATIONAL TRAINING FOR DISADVANTAGED YOUTH CASE STUDY 2: VOCATIONAL TRAINING FOR DISADVANTAGED YOUTH Why Randomize? This case study is based on Training Disadvantaged Youth in Latin America: Evidence from a Randomized Trial by Orazio Attanasio,

More information

Supplementary Online Content

Supplementary Online Content Supplementary Online Content Rollman BL, Herbeck Belnap B, Abebe KZ, et al. Effectiveness of online collaborative care for treating mood and anxiety disorders in primary care: a randomized clinical trial.

More information

Modeling Sentiment with Ridge Regression

Modeling Sentiment with Ridge Regression Modeling Sentiment with Ridge Regression Luke Segars 2/20/2012 The goal of this project was to generate a linear sentiment model for classifying Amazon book reviews according to their star rank. More generally,

More information

In this module I provide a few illustrations of options within lavaan for handling various situations.

In this module I provide a few illustrations of options within lavaan for handling various situations. In this module I provide a few illustrations of options within lavaan for handling various situations. An appropriate citation for this material is Yves Rosseel (2012). lavaan: An R Package for Structural

More information

Propensity score method: a non-parametric technique to reduce model dependence

Propensity score method: a non-parametric technique to reduce model dependence Big-data Clinical Trial Column Page of 8 Propensity score method: a non-parametric technique to reduce model dependence Zhongheng Zhang Department of Emergency Medicine, Sir Run-Run Shaw Hospital, Zhejiang

More information

Supplementary Appendix

Supplementary Appendix Supplementary Appendix This appendix has been provided by the authors to give readers additional information about their work. Supplement to: Weintraub WS, Grau-Sepulveda MV, Weiss JM, et al. Comparative

More information

Peter C. Austin Institute for Clinical Evaluative Sciences and University of Toronto

Peter C. Austin Institute for Clinical Evaluative Sciences and University of Toronto Multivariate Behavioral Research, 46:119 151, 2011 Copyright Taylor & Francis Group, LLC ISSN: 0027-3171 print/1532-7906 online DOI: 10.1080/00273171.2011.540480 A Tutorial and Case Study in Propensity

More information

Multi-Stage Stratified Sampling for the Design of Large Scale Biometric Systems

Multi-Stage Stratified Sampling for the Design of Large Scale Biometric Systems Multi-Stage Stratified Sampling for the Design of Large Scale Biometric Systems Jad Ramadan, Mark Culp, Ken Ryan, Bojan Cukic West Virginia University 1 Problem How to create a set of biometric samples

More information

Matt Laidler, MPH, MA Acute and Communicable Disease Program Oregon Health Authority. SOSUG, April 17, 2014

Matt Laidler, MPH, MA Acute and Communicable Disease Program Oregon Health Authority. SOSUG, April 17, 2014 Matt Laidler, MPH, MA Acute and Communicable Disease Program Oregon Health Authority SOSUG, April 17, 2014 The conditional probability of being assigned to a particular treatment given a vector of observed

More information

Day Hospital versus Ordinary Hospitalization: factors in treatment discrimination

Day Hospital versus Ordinary Hospitalization: factors in treatment discrimination Working Paper Series, N. 7, July 2004 Day Hospital versus Ordinary Hospitalization: factors in treatment discrimination Luca Grassetti Department of Statistical Sciences University of Padua Italy Michela

More information

Political Science 15, Winter 2014 Final Review

Political Science 15, Winter 2014 Final Review Political Science 15, Winter 2014 Final Review The major topics covered in class are listed below. You should also take a look at the readings listed on the class website. Studying Politics Scientifically

More information

Book review of Herbert I. Weisberg: Bias and Causation, Models and Judgment for Valid Comparisons Reviewed by Judea Pearl

Book review of Herbert I. Weisberg: Bias and Causation, Models and Judgment for Valid Comparisons Reviewed by Judea Pearl Book review of Herbert I. Weisberg: Bias and Causation, Models and Judgment for Valid Comparisons Reviewed by Judea Pearl Judea Pearl University of California, Los Angeles Computer Science Department Los

More information

Analysis of TB prevalence surveys

Analysis of TB prevalence surveys Workshop and training course on TB prevalence surveys with a focus on field operations Analysis of TB prevalence surveys Day 8 Thursday, 4 August 2011 Phnom Penh Babis Sismanidis with acknowledgements

More information

SUPPLEMENTAL MATERIAL

SUPPLEMENTAL MATERIAL 1 SUPPLEMENTAL MATERIAL Response time and signal detection time distributions SM Fig. 1. Correct response time (thick solid green curve) and error response time densities (dashed red curve), averaged across

More information

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES 24 MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES In the previous chapter, simple linear regression was used when you have one independent variable and one dependent variable. This chapter

More information

Using machine learning to assess covariate balance in matching studies

Using machine learning to assess covariate balance in matching studies bs_bs_banner Journal of Evaluation in Clinical Practice ISSN1365-2753 Using machine learning to assess covariate balance in matching studies Ariel Linden, DrPH 1,2 and Paul R. Yarnold, PhD 3 1 President,

More information

School Autonomy and Regression Discontinuity Imbalance

School Autonomy and Regression Discontinuity Imbalance School Autonomy and Regression Discontinuity Imbalance Todd Kawakita 1 and Colin Sullivan 2 Abstract In this research note, we replicate and assess Damon Clark s (2009) analysis of school autonomy reform

More information

Using Ensemble-Based Methods for Directly Estimating Causal Effects: An Investigation of Tree-Based G-Computation

Using Ensemble-Based Methods for Directly Estimating Causal Effects: An Investigation of Tree-Based G-Computation Institute for Clinical Evaluative Sciences From the SelectedWorks of Peter Austin 2012 Using Ensemble-Based Methods for Directly Estimating Causal Effects: An Investigation of Tree-Based G-Computation

More information

Propensity Score Analysis to compare effects of radiation and surgery on survival time of lung cancer patients from National Cancer Registry (SEER)

Propensity Score Analysis to compare effects of radiation and surgery on survival time of lung cancer patients from National Cancer Registry (SEER) Propensity Score Analysis to compare effects of radiation and surgery on survival time of lung cancer patients from National Cancer Registry (SEER) Yan Wu Advisor: Robert Pruzek Epidemiology and Biostatistics

More information

TITLE: A Data-Driven Approach to Patient Risk Stratification for Acute Respiratory Distress Syndrome (ARDS)

TITLE: A Data-Driven Approach to Patient Risk Stratification for Acute Respiratory Distress Syndrome (ARDS) TITLE: A Data-Driven Approach to Patient Risk Stratification for Acute Respiratory Distress Syndrome (ARDS) AUTHORS: Tejas Prahlad INTRODUCTION Acute Respiratory Distress Syndrome (ARDS) is a condition

More information

Finland and Sweden and UK GP-HOSP datasets

Finland and Sweden and UK GP-HOSP datasets Web appendix: Supplementary material Table 1 Specific diagnosis codes used to identify bladder cancer cases in each dataset Finland and Sweden and UK GP-HOSP datasets Netherlands hospital and cancer registry

More information

OBSERVATIONAL MEDICAL OUTCOMES PARTNERSHIP

OBSERVATIONAL MEDICAL OUTCOMES PARTNERSHIP OBSERVATIONAL Patient-centered observational analytics: New directions toward studying the effects of medical products Patrick Ryan on behalf of OMOP Research Team May 22, 2012 Observational Medical Outcomes

More information

Classification and Statistical Analysis of Auditory FMRI Data Using Linear Discriminative Analysis and Quadratic Discriminative Analysis

Classification and Statistical Analysis of Auditory FMRI Data Using Linear Discriminative Analysis and Quadratic Discriminative Analysis International Journal of Innovative Research in Computer Science & Technology (IJIRCST) ISSN: 2347-5552, Volume-2, Issue-6, November-2014 Classification and Statistical Analysis of Auditory FMRI Data Using

More information

Propensity Score Analysis: Its rationale & potential for applied social/behavioral research. Bob Pruzek University at Albany

Propensity Score Analysis: Its rationale & potential for applied social/behavioral research. Bob Pruzek University at Albany Propensity Score Analysis: Its rationale & potential for applied social/behavioral research Bob Pruzek University at Albany Aims: First, to introduce key ideas that underpin propensity score (PS) methodology

More information

Technical Specifications

Technical Specifications Technical Specifications In order to provide summary information across a set of exercises, all tests must employ some form of scoring models. The most familiar of these scoring models is the one typically

More information

Observational & Quasi-experimental Research Methods

Observational & Quasi-experimental Research Methods Observational & Quasi-experimental Research Methods 10th Annual Kathleen Foley Palliative Care Retreat Old Québec, October 24, 2016 Melissa M. Garrido, PhD 1 and Jay Magaziner, PhD 2 1. Department of Veterans

More information

Identification of Tissue Independent Cancer Driver Genes

Identification of Tissue Independent Cancer Driver Genes Identification of Tissue Independent Cancer Driver Genes Alexandros Manolakos, Idoia Ochoa, Kartik Venkat Supervisor: Olivier Gevaert Abstract Identification of genomic patterns in tumors is an important

More information

Biostatistics II

Biostatistics II Biostatistics II 514-5509 Course Description: Modern multivariable statistical analysis based on the concept of generalized linear models. Includes linear, logistic, and Poisson regression, survival analysis,

More information

The role of self-reporting bias in health, mental health and labor force participation: a descriptive analysis

The role of self-reporting bias in health, mental health and labor force participation: a descriptive analysis Empir Econ DOI 10.1007/s00181-010-0434-z The role of self-reporting bias in health, mental health and labor force participation: a descriptive analysis Justin Leroux John A. Rizzo Robin Sickles Received:

More information

(C) Jamalludin Ab Rahman

(C) Jamalludin Ab Rahman SPSS Note The GLM Multivariate procedure is based on the General Linear Model procedure, in which factors and covariates are assumed to have a linear relationship to the dependent variable. Factors. Categorical

More information

Section 6: Analysing Relationships Between Variables

Section 6: Analysing Relationships Between Variables 6. 1 Analysing Relationships Between Variables Section 6: Analysing Relationships Between Variables Choosing a Technique The Crosstabs Procedure The Chi Square Test The Means Procedure The Correlations

More information

6. Unusual and Influential Data

6. Unusual and Influential Data Sociology 740 John ox Lecture Notes 6. Unusual and Influential Data Copyright 2014 by John ox Unusual and Influential Data 1 1. Introduction I Linear statistical models make strong assumptions about the

More information