Identifying Peer Influence Effects in Observational Social Network Data: An Evaluation of Propensity Score Methods


Dean Eckles
Department of Communication, Stanford University
dean@deaneckles.com

Abstract

There is widespread scientific and practical interest in estimating peer influence effects for a variety of behaviors, but peer influence effects are not identifiable in most observational data. Social scientists have thus had to remain largely silent on these effects or make inconclusive or misleading arguments for their estimates of them. State-of-the-art research has used propensity score methods to either estimate or place an upper bound on peer influence effects. The proposed research will evaluate these methods using simulations. In particular, I demonstrate how a realistic condition, missing edge data, can lead these methods to substantially underestimate peer influence effects. This means that these estimates fail as upper bounds (i.e., fail to partially identify peer influence effects). I compare these methods with regression adjustment, which may be preferable in such conditions.

Scientists and people making decisions about policies, interventions, products, and campaigns are all interested in how people are influenced by the behaviors of others. How healthy and unhealthy behaviors are shaped by those same behaviors among others is of widespread interest. Marketing managers and researchers are likewise interested in how an individual's product purchase decisions are influenced by the purchase decisions of others. More specifically, marketers have aimed to identify individuals whose decisions have especially large effects on these decisions in their social neighborhoods and larger networks. Despite the attention of basic and applied researchers in multiple fields, there is little credible evidence about peer influence effects in real-world networks. Available observational data and established methods are rarely sufficient to identify peer influence effects (Manski, 2000). This has not stopped investigators from developing statistical estimators and reporting their estimates of peer influence effects. But these are generally not consistent estimators of any causal effects of interest under plausible assumptions. Some recent efforts to distinguish endogenous peer influence effects from other causes of apparent influence (i.e., homophily, common external causes, contextual interactions) have employed propensity score methods for estimating the effect of alters' behavior on egos (e.g., Aral et al., 2009; Hill et al., 2006). Propensity score matching (Rosenbaum and Rubin, 1983; Rubin, 1997) is widely used in epidemiology and other fields to reduce bias in estimates of treatment effects from observational data. As in other matching methods, treated units are matched with control units to minimize some distance measure. In propensity score matching, the analyst fits a model for predicting the treatment with suitable covariates; this is a model of the propensity to be treated for different units.
The fitted values from this model (most often a logistic regression) are then matched on, most often excluding some control units that are poor matches. Thus, propensity score matching reduces the available covariates to a single variable: the propensity score. Then a regression of the outcome on the treatment is fit for just the matched units. In this context, a propensity score is the probability that some ego's alters will engage in the behavior of interest; this behavior by alters is the treatment for the ego. Aral et al. (2009) develop a use of propensity score matching they call "dynamic matching", which estimates propensity scores for each biweekly period in their study, thus allowing for temporal differences in which characteristics determine whether one's friends will engage in the behavior. If selection into this treatment is limited to selection on observables, then matching on propensity scores can enable consistent estimation of the effect of this treatment, as long as good enough matches are available. However, in the realistic settings of interest, there generally is selection on unobservables; that is, alters' behavior is not ignorable conditional on observed variables. To see why this is, consider the behavior examined in Aral et al. (2009): the adoption of the Yahoo Go mobile phone application. While their data set includes many demographic and behavioral measures of individuals (e.g., use of the Yahoo Sports web site), it is easy to see that there is going to be selection on unobservables. For example, if Yahoo bought advertising for Yahoo Go, then individuals were likely exposed to this advertising because of unobserved variables: in the case of billboards, their commute route; in the case of TV ads, the shows they watch; etc. If these variables are correlated with network structure, then this selection on unobservables can masquerade as peer influence.
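The pipeline just described, fitting a treatment model, matching on its fitted values, and then comparing outcomes among matched units, can be sketched as follows. This is a minimal illustration on synthetic data with a single observed confounder, not the implementation used by any of the cited studies; for brevity, matching here is done with replacement rather than discarding unmatched controls.

```python
import random
from math import exp

random.seed(0)

# Synthetic observational data: a single confounder x drives both
# treatment assignment and the outcome. The true treatment effect is 2.0.
n = 2000
data = []
for _ in range(n):
    x = random.gauss(0, 1)
    p_treat = 1 / (1 + exp(-(x - 0.5)))        # selection on the observable x
    d = 1 if random.random() < p_treat else 0
    y = 2.0 * d + 1.5 * x + random.gauss(0, 1)
    data.append((x, d, y))

# Step 1: fit a logistic propensity model p(d = 1 | x) by gradient ascent.
a = b = 0.0
for _ in range(400):
    ga = gb = 0.0
    for x, d, _ in data:
        p = 1 / (1 + exp(-(a + b * x)))
        ga += d - p
        gb += (d - p) * x
    a += 2.0 * ga / n
    b += 2.0 * gb / n

score = [1 / (1 + exp(-(a + b * x))) for x, _, _ in data]
treated = [i for i, (_, d, _) in enumerate(data) if d == 1]
control = [i for i, (_, d, _) in enumerate(data) if d == 0]

# Step 2: match each treated unit to the nearest control unit on the
# estimated propensity score (with replacement, for simplicity).
match = {i: min(control, key=lambda c: abs(score[c] - score[i])) for i in treated}

# Step 3: compare outcomes within matched pairs.
matched_est = sum(data[i][2] - data[match[i]][2] for i in treated) / len(treated)
naive_est = (sum(data[i][2] for i in treated) / len(treated)
             - sum(data[i][2] for i in control) / len(control))
print(round(naive_est, 2), round(matched_est, 2))
```

Because treatment here depends only on the observed x, matching on the estimated score recovers an estimate near the true effect of 2.0, while the naive difference in means is biased upward by the confounding.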

Based on such considerations about homophily and common external causation, Aral et al. (2009) sometimes describe their method as producing an upper bound on peer influence effects, rather than estimating them. I find this honesty about the implausible assumptions required to make their estimates consistent praiseworthy. Nonetheless, bounding an effect between zero and this upper bound is distinctly less useful for many purposes than having a credible narrower bound or point estimate. This paper evaluates the estimates produced by these methods using analysis of simulated data. The aim is to demonstrate realistic conditions under which these methods underestimate peer influence effects. Aral et al. (2009) acknowledge some conditions (i.e., when conditional independence or strong ignorability is not satisfied) under which these methods overestimate peer influence, which is why they treat their estimates as upper bounds. But if these methods can also underestimate peer influence effects, then they fail to produce true upper bounds, their remaining purpose. In particular, this conclusion casts doubt on the use of these methods to claim that other methods overestimate peer influence effects by 300 to 700%. The specific condition explored in this paper is missing edges: some edges in the true network are not observed. I begin by describing the model used to create the true network.

Simulation

To analyze propensity score methods for estimating peer influence effects, it is necessary to have a network in which node characteristics are clustered. Furthermore, we need a network in which peer influence effects are of a known size. This argues for using a simulation, rather than real-world data (in which the true peer influence effects are unknown). In order to create such a network, I use a modified preferential attachment model. Nodes are added sequentially to the network, each creating one or more edges to existing nodes.
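One way such a homophilous preferential attachment process can be implemented is sketched below. The parameters (network size, decay rate, a single edge per new node) are illustrative simplifications of my own, not the exact settings of the simulation; the full model described next adds one to five edges per node.

```python
import random
from math import exp

random.seed(1)

def grow_network(n_nodes, decay=1.0):
    """Preferential attachment in which the attachment probability also
    decays with distance on a homophilous trait h ~ N(5, 1)."""
    h = [random.gauss(5, 1) for _ in range(n_nodes)]
    edges = [(0, 1), (1, 2), (0, 2)]            # start from a connected triangle
    degree = [2, 2, 2] + [0] * (n_nodes - 3)
    for i in range(3, n_nodes):
        # Existing nodes weighted by degree and by similarity in h.
        weights = [degree[j] * exp(-decay * abs(h[i] - h[j])) for j in range(i)]
        j = random.choices(range(i), weights=weights)[0]
        edges.append((i, j))
        degree[i] += 1
        degree[j] += 1
    return h, edges

h, edges = grow_network(500)

# Homophily check: neighbors end up closer in h than random pairs are.
neighbor_gap = sum(abs(h[a] - h[b]) for a, b in edges) / len(edges)
random_gap = sum(abs(random.gauss(5, 1) - random.gauss(5, 1)) for _ in range(5000)) / 5000
print(round(neighbor_gap, 2), round(random_gap, 2))
```

The distance-decay factor is what makes node characteristics cluster on the network, which is exactly the property the simulation needs in order for homophily to masquerade as influence.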
In the basic preferential attachment model, the probability of any existing node being selected is proportional to its degree relative to the total degree of the network at that point during construction. In this modified model, this probability is also decreasing in the distance from the new node (absolute difference) on a normal random variable H ~ N(5, 1), where h_i is the value of this characteristic for node i. Each new node creates one to five edges, with the probability of each edge beyond the first being 1/2. With the selected parameters, this model produces a network in which the pairwise correlation between h_i and h_j, over all pairs of nodes i and j that are neighbors, is 0.39. I used this process, starting with a connected network of three nodes, to produce a network with 10,000 nodes. 500 nodes are selected to be infected with probability proportional to h. This is t = 1. At 9 subsequent steps, additional nodes are infected (Figure 2). The probability of any node i becoming infected at t is

p(y_it = 1) = 1 / (1 + exp(-(-4.5 + 3 d_it + 0.5 (h_i - µ_h))))

where d_it = 1 if any neighbors of i were infected at t - 1. This is the structural equation for y_it and is a logistic regression model. Thus, it defines the causal effects of

peer influence and homophilous node characteristics at each step and can be readily compared with output from logistic regressions.

Figure 1: The degree distribution of the resulting network.

This model has p ≈ .01 for untreated nodes with h_i = µ_h, while this same node would have p ≈ .18 if treated. This process produces differences in H between infected and uninfected nodes (Figure 3) and, to a lesser extent, between nodes with infected neighbors (treated nodes) and nodes without infected neighbors (untreated nodes) (Figure 4). The former difference is due both to the direct effect of H on adoption and to imbalance in exposure to infected nodes because of homophily. The latter is the difference in treatment due to homophily.

Bias from missing edges

Propensity score methods can underestimate peer influence effects in some conditions. This can occur when peer influence operates via edges that are not observed. For example, Aral et al. (2009) use a network constructed from Yahoo Messenger conversations, but this network likely does not exhaust the edges by which individuals' adoption of Yahoo Go could influence others, including friends, family, and colleagues. Missing edges are not just problematic for the nodes they would connect: to the extent that missing edges are between similar nodes, they can put weight on variables in the selection model (i.e., the model for estimating propensity scores) that do not in fact affect selection. We can conceptualize missing edges as introducing measurement error into the treatment variables. If the treatment variable is the number of friends infected, edges missing at random will drop some edges to infected neighbors, negatively biasing the treatment variable. If the treatment variable is an indicator for whether any friends are infected, some nodes will be measured as untreated when they were in fact treated.
If the treatment is the proportion of friends infected, edges missing at random will not have a consistent positive or negative effect on the measured treatment. This third parameterization is not used in Aral et al. (2009), while the other two are. Measurement error in the treatment variable can be expected to introduce or increase bias for many estimators, rather than just the dynamic propensity score matching method. Despite the general importance of such measurement error to causal inference, there has been relatively little applied work on how different methods

Figure 2: The number of infected nodes at each t. The rate of infection is decreasing, since an increasing number of nodes are already infected at each t and the probability of infection does not increase with the number of neighbors infected.

Figure 3: Empirical CDF of H conditioning on t and whether the node is infected. There is substantial imbalance in which nodes are infected. This is as planned, since the true Bernoulli model for infection includes a positive coefficient for H.

Figure 4: Empirical CDF of H conditioning on t and whether any neighbors are infected. Consistent with the goal of the simulation, there is some imbalance in which units are treated.

are affected (Millimet). In order to demonstrate the effect of even a small number of missing edges, I randomly censor edges from the simulated network. The probability of any edge being censored is .2. Many observed networks quite plausibly are missing many more edges through which the peer influence of interest occurs.

Following Aral et al. (2009), for each t I fit a logistic regression to predict whether a node has infected neighbors at t. This model includes indicators for each t, the homophilous individual characteristic h, and their interaction:

logit p(d_it = 1) = α_t + β_t h_i

Only nodes that are not yet infected at that t are included in the data set. The fitted values from this model, p̂(d_it), are the estimated propensity scores. Propensity scores are estimated for both the original network and the censored network. Then treated nodes (nodes with infected neighbors) are matched with untreated nodes to minimize differences in propensity scores. Those untreated nodes not selected by this matching are discarded. Subsequent analysis is performed on only treated nodes and matched control nodes. The standard analysis is a logistic regression predicting infection with an indicator of whether any of a node's neighbors are already infected:

logit p(y_it = 1) = α + η d_it

The most common way to use the propensity scores is to construct a subset of the data. It can also be desirable to use inverse propensity score weights, such that unit i at t has weight 1/p̂(d_it) if treated and 1/(1 - p̂(d_it)) if untreated. This can have the advantage of using more of the data and so having lower variance, but can have increased bias if the propensity score model is not correct. We find that both of these methods seem to underestimate the peer influence effect in the present conditions. From the model used to spread the infection, we know that the true coefficient for any neighbors being infected is η = 3. I plot the estimated coefficients from propensity score matching and weighting for each t (Figure 5).
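The mechanism behind this underestimation can be isolated without any propensity score machinery: when censored edges cause truly treated nodes to be measured as untreated, the measured control group absorbs high-outcome nodes and the estimated effect shrinks. A minimal sketch, with all parameters illustrative rather than taken from the simulation above:

```python
import random

random.seed(2)

# Toy setting: the treatment for node i is "any infected neighbor".
n = 3000
neighbors = [random.sample(range(n), random.randint(1, 5)) for _ in range(n)]
infected = [random.random() < 0.15 for _ in range(n)]

def any_infected(nbrs):
    return any(infected[j] for j in nbrs)

# Outcome: adoption is much more likely for truly treated nodes.
y = [1 if random.random() < (0.5 if any_infected(neighbors[i]) else 0.05) else 0
     for i in range(n)]

# Censor each edge independently with probability 0.2, as in the text.
observed = [[j for j in nbrs if random.random() > 0.2] for nbrs in neighbors]

def diff_in_means(nbr_lists):
    t = [y[i] for i in range(n) if any_infected(nbr_lists[i])]
    c = [y[i] for i in range(n) if not any_infected(nbr_lists[i])]
    return sum(t) / len(t) - sum(c) / len(c)

true_est = diff_in_means(neighbors)   # uses the full edge set
obs_est = diff_in_means(observed)     # uses the censored edge set
print(round(true_est, 2), round(obs_est, 2))
```

Nodes whose only infected neighbors are reached by censored edges are misclassified as untreated, so the estimate from the censored network is attenuated relative to the estimate from the full network; any estimator built on the censored treatment variable, including the propensity score methods above, inherits this negative bias.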
I also include regression adjustment for H for comparison. These results illustrate how propensity score methods can underestimate peer influence effects. While the model using inverse propensity score weights may appear to be consistently worse, note that it has lower variance because it uses more of the data than the matching method. Regression adjustment appears to perform much better, exhibiting smaller bias. It is also preferable because it is more efficient.

Conclusions

For whatever reason, propensity score methods have been regarded by some as importantly distinct from methods for estimating causal effects using regression adjustment. In an extreme version of this, Aral et al. (2009) write:

[F]requently used methods such as regression analysis, which can only establish correlation, are insufficient. Causal treatment effects can, on the other hand, be estimated by matched sampling, which controls for confounding factors and overcomes selection bias by comparing observations that have the same likelihood of treatment. (my emphasis)

Figure 5: Estimates of the effect of having at least one infected neighbor on likelihood of infection. This plot compares estimates from propensity score methods (either weighting or matching) and regression adjustment for H.

This is incorrect. The use of propensity score matching versus regression adjustment makes no essential difference for identification (?). Both rest on the same assumption of selection on observables. Furthermore, propensity score matching methods can be seen to be regression adjustment (?). Rather, the appeals of propensity score matching are that (a) it enables dimensionality reduction by using estimated propensity scores and (b) it helps researchers be more honest by avoiding specification search in their model for the treatment and by failing, rather than extrapolating, when there is a lack of overlap between treated and untreated units. I include these comments because I think it is important not to think of propensity score matching as a kind of black box that ensures credible causal inference when other methods fail.

I have carried out an initial investigation into how a common condition, missing edges in a measured network, can introduce negative bias into estimates of peer influence effects from propensity score methods. This illustrates that these methods do not provide reliable upper bounds on peer influence effects, casting some doubt on their use to criticize the higher estimates produced by other methods. Future work should try to replicate these results in a broader set of conditions. This might include exploring the effects of edges that are not missing at random in a model where influence probabilities vary across edges. Additionally, it would be beneficial to more generally advance the study of the effects of measurement error in treatment variables on estimates of causal effects.

References

Aral, S., Muchnik, L., and Sundararajan, A. (2009). Distinguishing influence-based contagion from homophily-driven diffusion in dynamic networks. Proceedings of the National Academy of Sciences, 106(51):21544-21549.

Hill, S., Provost, F., and Volinsky, C. (2006). Network-based marketing: Identifying likely adopters via consumer networks. Statistical Science, 21(2):256-276.
Manski, C. F. (2000). Economic analysis of social interactions. The Journal of Economic Perspectives, 14(3):115-136.

Millimet, D. L. The elephant in the corner: A cautionary tale about measurement error in treatment effects models. SSRN eLibrary.

Rosenbaum, P. and Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1):41-55.

Rubin, D. B. (1997). Estimating causal effects from large data sets using propensity scores. Annals of Internal Medicine, 127(8 Part 2):757-763.