Social Effects in Blau Space:

1 Social Effects in Blau Space: Miller McPherson and Jeffrey A. Smith Duke University

2 Abstract We develop a method of imputing the characteristics of the network alters of respondents in probability samples of individuals, using the homophily principle to estimate the properties of a respondent's core discussion network. These properties include a measure of potential exposure to the attitudes, values, beliefs, and other characteristics of the respondent's network alters. Data from the General Social Survey demonstrate that the imputed network characteristics are strongly related to individual-level measures such as attitudes, beliefs, and other variables typical of survey analysis. In some cases, the imputed network variable drastically alters, and even eliminates, the effects of standard sociodemographic variables such as age and education. We follow with examples of health-related behavior from the Panel Study of Income Dynamics.

3 The Relational Approach: N cases become N(N-1)/2 cases. The metric is now distance. Distance is created by homophily.

4 The Homophily Principle Applies to almost all social distinctions: Age, race, gender, beauty, height, weight, skin color. Has been found in almost all studies of social contact. Governs the large-scale organization of most kinds of interaction. Shapes the most elementary form of social structure: the probability of contact between two individuals. Produces localization of social entities.

5 Activities are localized in social space: Points are a representative sample of individuals. Boxes are a representative sample of associations.

6 What ties large scale systems together? The homophily principle simplifies transactions in Blau space, turning sociodemographic distance into network distance.

7 As the scale of the system grows larger, the Homophily principle becomes more and more powerful.

11 Focus on Industrial Society: Sizes in millions or billions. High-dimensional Blau space. Differentiation of types of relationships. Social entities operate at multiple levels.

12 The number of potential connections in systems of varying size [Figure; axes: Connections (N(N-1)/2) and System Size (N)]
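
The growth in potential connections can be checked with a few lines of arithmetic. This is a minimal sketch; the function name and the system sizes are illustrative choices, not values taken from the slide's figure.

```python
def potential_ties(n: int) -> int:
    """Number of unordered pairs among n individuals: n(n-1)/2."""
    return n * (n - 1) // 2


# Illustrative system sizes (assumed here, not read off the original plot).
for n in (10, 1_000, 1_000_000):
    print(f"N = {n:>9,}  potential ties = {potential_ties(n):,}")
```

Because the count grows with the square of N, the ratio of realized to potential ties must shrink as the system grows if each person maintains only a bounded number of ties.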

13 Estimated number of Confiding relationships on Earth 7,280,000,000,000

14 Ratio of Actual to Potential

15 Assumptions in the theory: People have a finite capacity for processing information and finite time and energy. The homophily principle organizes the flow of information across the system.

16 Some implications of the theory: The localization of information through homophily leads to the formation of niches for social entities. These social niches evolve and interact with each other dynamically. The location of an individual inside a niche conditions the probability that the individual will be affiliated with the social entity.

17 Core question for the present: Is the presence of a socially transmissible characteristic in an individual the result of contagion in Blau space, or is it a result of human capital, material resources, or some atomistic process?

18 Activities are localized in social space: Points are a probability sample of individuals. Boxes are a probability sample of activities.

19 For example: Are highly educated people more tolerant because they have been trained by the educational institution to appreciate diversity [education causes tolerance]? Or, is it that educated people tend to be surrounded by other educated people who express tolerant views, and become more tolerant as a result [birds of a feather not only flock together, but they fly in parallel]?

20 Another example: Do highly educated people desire fewer children because there is something inherent in the educational process that inhibits childbearing [education causes a desire for smaller families]? Or do highly educated people get their views on desired family size from their friends, who are also highly educated [social context affects desired family size]?

21 Today's approach to the problem: 1. Parameterize Blau space in multiple dimensions. 2. Model the dependencies among social entities produced by proximity in Blau space. 3. Use the estimated dependencies to impute local social context. 4. Compare the predictive power of social context and the conventional model.

22 Parameterizing Blau space The General Social Survey asked respondents in 1985 and 2004 to name the persons with whom they discussed important matters, and to report the following social characteristics of those persons: Age, Race, Sex, Religion, Education.

23 Parameterizing, cont. The survey also collected information on those characteristics for the respondents, as well as a rich variety of attitudinal and belief variables.

26 Parameterizing, cont. We create a dataset of all pairs of GSS respondents in a given year and assume that these pairs constitute a representative sample of pairs who do not discuss important matters with each other. Each of these pairs has a vector of Blau distances in Education, Age, Sex, Race, and Religion (e.g., $D_{ij} = \mathrm{Age}_i - \mathrm{Age}_j$). This dataset is the set of controls in our case-control analysis.
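
The construction of the control pairs can be sketched as follows. This is a hedged illustration, not the authors' code: the respondent records are made up, and the distance rules (absolute difference for education and age, a 0/1 mismatch indicator for sex, race, and religion) are assumptions, since the slide only gives the age difference as an example.

```python
from itertools import combinations

import numpy as np

# Hypothetical respondents: education (years), age (years), sex, race, religion codes.
respondents = np.array([
    [12, 34, 0, 0, 1],
    [16, 29, 1, 0, 1],
    [10, 61, 0, 1, 2],
    [18, 45, 1, 1, 0],
])
CONTINUOUS = [0, 1]      # education, age: absolute difference (assumed rule)
CATEGORICAL = [2, 3, 4]  # sex, race, religion: 1 if different, else 0 (assumed rule)


def blau_distance(a, b):
    """Vector of Blau distances between two respondents."""
    d = np.empty(a.shape[0], dtype=float)
    d[CONTINUOUS] = np.abs(a[CONTINUOUS] - b[CONTINUOUS])
    d[CATEGORICAL] = (a[CATEGORICAL] != b[CATEGORICAL]).astype(float)
    return d


def control_pairs(x):
    """All N(N-1)/2 respondent pairs, treated as non-ties (the controls)."""
    idx = list(combinations(range(len(x)), 2))
    dist = np.array([blau_distance(x[i], x[j]) for i, j in idx])
    return idx, dist


pairs, distances = control_pairs(respondents)
print(len(pairs), "control pairs; first distance vector:", distances[0])
```

The case dataset would be built the same way, with each respondent paired to a reported discussion partner instead of to another respondent.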

27 Parameterizing, cont. We create a parallel dataset consisting of the pairs generated by the reports of each respondent on one of their core discussion partners. Again, each pair is characterized by the Blau distances in Education, Age, Sex, Race, and Religion. This dataset is the set of Cases.

28 Parameterizing, cont. We combine the cases and controls into a single dataset and estimate the parameters of the following case-control logistic regression model: $\ln\left[\frac{P(\mathrm{tie}_{ij})}{1 - P(\mathrm{tie}_{ij})}\right] = \alpha + \beta(D_{ij})$, where $\beta(D_{ij})$ represents the set of Blau distances and the associated parameters.
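
A small simulation may help show what the case-control fit recovers. This is a sketch under stated assumptions: a single distance dimension, made-up true parameters, controls drawn from true non-ties (the slides instead treat respondent pairs as non-ties), and a plain Newton-Raphson fit written in NumPy rather than any particular statistics package.

```python
import numpy as np


def fit_logit(X, y, n_iter=25):
    """Newton-Raphson maximum-likelihood logit fit; X must include a constant column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)                       # score vector
        hess = (X * (p * (1 - p))[:, None]).T @ X  # observed information
        beta = beta + np.linalg.solve(hess, grad)
    return beta


rng = np.random.default_rng(0)
true_alpha, true_beta = -2.0, -1.0  # assumed values for the simulation
d = rng.uniform(0, 5, size=200_000)  # one Blau distance per pair in the "population"
tie = rng.random(d.size) < 1 / (1 + np.exp(-(true_alpha + true_beta * d)))

case_d = d[tie]                                             # keep every tie (cases)
control_d = rng.choice(d[~tie], size=5_000, replace=False)  # subsample non-ties

dist = np.concatenate([case_d, control_d])
y = np.concatenate([np.ones(case_d.size), np.zeros(control_d.size)])
X = np.column_stack([np.ones(dist.size), dist])

alpha_hat, beta_hat = fit_logit(X, y)
print(f"slope: true {true_beta:.2f}, estimated {beta_hat:.2f}")
print(f"intercept: true {true_alpha:.2f}, estimated {alpha_hat:.2f}")
```

In runs of this kind the slope should land close to its true value, while the intercept absorbs the unequal sampling of ties and non-ties.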

29 Case Control Analysis It is well known in the biometric literature that logistic regression provides consistent estimates, up to a constant of proportionality, for the parameters of models fitted to data sampled on the dependent variable (cf. Hosmer and Lemeshow 1978, Pregibon 1974, Allison 2007). Our application fits into this model in the following way:
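
A short derivation sketch of why sampling on the dependent variable shifts only the intercept. The sampling fractions $\pi_1$ (for ties) and $\pi_0$ (for non-ties) and the selection indicator $S$ are notation introduced here for illustration, and the sketch assumes that selection depends only on tie status, not on $D_{ij}$.

```latex
\begin{aligned}
P(\mathrm{tie}_{ij} \mid D_{ij}, S = 1)
  &= \frac{\pi_1 \, P(\mathrm{tie}_{ij} \mid D_{ij})}
          {\pi_1 \, P(\mathrm{tie}_{ij} \mid D_{ij})
           + \pi_0 \left[ 1 - P(\mathrm{tie}_{ij} \mid D_{ij}) \right]} \\[4pt]
\ln\left[ \frac{P(\mathrm{tie}_{ij} \mid D_{ij}, S = 1)}
               {1 - P(\mathrm{tie}_{ij} \mid D_{ij}, S = 1)} \right]
  &= \ln\frac{\pi_1}{\pi_0} + \alpha + \beta(D_{ij})
\end{aligned}
```

The first line is Bayes' rule applied to the sampled pairs. Taking the odds cancels the common denominator, so the sampled odds are the population odds multiplied by $\pi_1/\pi_0$, which adds a constant to $\alpha$ and leaves the distance parameters untouched.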

30 Case Control Analysis, cont. The sample of relationships produced by the GSS study of core discussion networks is a representative sample of core confidant ties, since the respondents are a probability sample of the U.S. population.

31 Case Control Analysis, cont. The respondents in the GSS generate a sample of potential ties representative of the universe of core confidant ties that are unrealized, since the individuals are sampled independently (approximately). This result follows because the probability of obtaining a true network alter of any particular respondent in the actually measured core discussion networks is very small (p<.00001, with heroic assumptions about sampling design).

32 Case Control Analysis, cont. Thus, we have a representative sample of observed core discussion network ties and a representative sample of non-ties, which may be combined in the case-control study to allow us to estimate the parameters of our model of homophily in Blau space.

33 Estimated effects of Blau distances (coefficient values were not preserved in this transcription). Terms in the model: Intercept, Race Distance, Religion Distance, Education Distance, Age Distance, Gender Distance. All coefficients are significant beyond .001.

34 Modeling Dependencies From the estimated logistic regression equation, we recover fitted probabilities of contact between persons i and j in our sample, given their distance in Blau space: $P(\mathrm{contact}_{ij} \mid D_{ij}) = \frac{1}{1 + e^{-(\alpha + \beta(D_{ij}))}}$
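
The fitted-probability step can be illustrated as follows. The coefficient values are hypothetical placeholders (the estimates themselves are not available here), and treating $\beta(D_{ij})$ as a linear combination of the distance components is an assumption.

```python
import numpy as np

# Hypothetical coefficients, ordered as: race, religion, education, age, gender distance.
alpha = -4.0
beta = np.array([-1.5, -0.8, -0.2, -0.05, -0.3])


def contact_probability(d_ij):
    """Fitted P(contact_ij | D_ij) = 1 / (1 + exp(-(alpha + beta . D_ij)))."""
    return 1.0 / (1.0 + np.exp(-(alpha + d_ij @ beta)))


same_position = np.zeros(5)              # identical on every dimension
distant = np.array([1, 1, 6, 30, 1])     # differs on every dimension

p_same = contact_probability(same_position)
p_far = contact_probability(distant)
print(f"same position: {p_same:.5f}   distant position: {p_far:.7f}")
```

With negative distance coefficients, the fitted probability of contact falls as a pair moves apart in Blau space, which is the homophily pattern the model is meant to capture.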

40 Modeling Dependencies, cont. These probabilities are assembled into a row-stochastic distance matrix $P$, with which we can form the term $PY$, the $i$th element of which, $(PY)_i$, is proportional to the expectation of $Y$ for the potential network alters of respondent $i$, located in that respondent's position in Blau space.

41 Modeling Dependencies, cont. In concrete terms, the product $(PY)_i$ imputes the social context of ego's locale in Blau space. If $Y$ is binary, then $(PY)_i$ estimates the proportion of potential alters that have attribute $Y$. If $Y$ is continuous, $(PY)_i$ estimates the mean value of $Y$ among ego's potential network alters. Since the intercepts from the case-control analysis are biased, these estimates are correct up to a constant of proportionality, which is all that is required for the next stage of analysis.
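
A minimal sketch of the imputation step, using a made-up 3-by-3 matrix of fitted contact probabilities. Excluding ego from their own context (zero diagonal) is an assumption, and the function name is hypothetical.

```python
import numpy as np


def impute_context(prob, y):
    """Row-normalize contact probabilities (zero diagonal) and return P and P @ y."""
    w = np.array(prob, dtype=float)
    np.fill_diagonal(w, 0.0)              # assumption: ego is not their own alter
    P = w / w.sum(axis=1, keepdims=True)  # each row now sums to 1
    return P, P @ np.asarray(y, dtype=float)


# Made-up fitted contact probabilities among three respondents.
prob = np.array([
    [0.00, 0.30, 0.10],
    [0.30, 0.00, 0.10],
    [0.10, 0.10, 0.00],
])
y = np.array([1, 0, 1])  # a binary attribute

P, context = impute_context(prob, y)
print("imputed share of alters with the attribute:", context)

# Rescaling every probability by a common factor leaves P unchanged.
P_scaled, _ = impute_context(0.01 * prob, y)
print("P unchanged under rescaling:", np.allclose(P, P_scaled))
```

The rescaling check is one way to read the proportionality remark: an intercept shift multiplies the odds by a constant, which for rare ties is approximately a common rescaling of the probabilities, and row normalization removes any such common factor.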

42 Modeling Dependencies, cont. With our imputed network term $PY$ in hand, we then form the spatial/network regression model: $Y_i = \eta x_i + \Delta (PY)_i + \Theta_i$, where $\eta x_i$ represents the conventional survey-analytic effects of the X variables, $\Delta$ parameterizes the homophily effects in Blau space, and $\Theta_i$ is stochastic.
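
The network regression can be sketched with ordinary least squares on synthetic, noise-free data. Everything here is assumed for illustration: the single covariate, the exponential homophily kernel used to build P, and the parameter values. OLS is used only for simplicity; with Y on both sides of the equation, spatial-lag models of this form are often estimated by maximum likelihood or instrumental variables instead.

```python
import numpy as np


def network_regression(y, x, P):
    """OLS of y on a constant, x, and the imputed context P @ y."""
    context = P @ y
    X = np.column_stack([np.ones_like(y), x, context])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return dict(zip(["intercept", "eta", "delta"], coef))


rng = np.random.default_rng(1)
n = 50
x = rng.normal(size=n)  # one standardized sociodemographic variable

# Assumed homophily kernel: contact weight decays with distance in x.
w = np.exp(-np.abs(x[:, None] - x[None, :]))
np.fill_diagonal(w, 0.0)
P = w / w.sum(axis=1, keepdims=True)

# Noise-free data satisfying y = 1 + 2x + 0.5 * (P y) exactly.
y = np.linalg.solve(np.eye(n) - 0.5 * P, 1.0 + 2.0 * x)

fit = network_regression(y, x, P)
print({k: round(float(v), 4) for k, v in fit.items()})
```

Comparing this fit with a regression that omits the context column is the comparison the slides describe: the change in the coefficient on x shows how much of its apparent effect runs through social context.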

44 Return to our Tolerance Example: For the tolerance example cited earlier, the network regression model will be: $\mathrm{Tolerance}_i = a + b_1\,\mathrm{Education}_i + b_2\,(P\,\mathrm{Tolerance})_i + \mathrm{error}_i$

47 Summary The social context variable is not only a substantial predictor of these survey measures of attitudes, but it actually destroys much of the effect of the conventional major predictors of those attitudes.

48 To emphasize, the model: Is applicable to any social survey variable, from any sample, as long as sociodemographic information is measured in that survey. Does not require any information on networks in that survey.

49 A Modest Claim If these results hold in general, then past studies of attributes in surveys like the GSS are almost certainly producing biased and inconsistent estimates due to the omission of social context. All prior studies of these survey variables may be producing illusory results due to the operation of social contagion in Blau space.

50 Some Implications: The effects of sociodemographic variables may occur through social distance in Blau space, rather than through the traditionally conceived causal mechanisms. Contagion in Blau space is a baseline phenomenon that should be accounted for before other effects are posited.

53 A Health Application to a Dataset with No Direct Network Variables: Panel Study of Income Dynamics. N = 1112. Years: 1999, 2001, 2003, 2005. Growth curves: Health, BMI, Ailments.

54 Methods We compute the social context of a position in Blau space by constructing Ego's view of the health terrain. We argue that Ego will tend to regress over time to the expected value of their social context, rather than to the grand mean.
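
One way to set up the comparison between regression to social context and regression to the grand mean is sketched below. The specification is an assumption (change in the outcome regressed on ego's deviation from their imputed alters and on their deviation from the overall mean), as are the synthetic data, the kernel used for P, and the parameter values; the slides do not spell out the exact model.

```python
import numpy as np


def regression_to_context(y_t0, y_t1, P):
    """OLS of change on deviation from imputed alters and deviation from the mean."""
    change = y_t1 - y_t0
    dev_alters = y_t0 - P @ y_t0   # distance from ego's imputed social context
    dev_mean = y_t0 - y_t0.mean()  # distance from the grand mean
    X = np.column_stack([np.ones_like(change), dev_alters, dev_mean])
    coef, *_ = np.linalg.lstsq(X, change, rcond=None)
    return dict(zip(["intercept", "dev_alters", "dev_mean"], coef))


rng = np.random.default_rng(2)
n = 60
position = rng.normal(size=n)  # synthetic one-dimensional Blau position
w = np.exp(-np.abs(position[:, None] - position[None, :]))
np.fill_diagonal(w, 0.0)
P = w / w.sum(axis=1, keepdims=True)

y_t0 = rng.normal(loc=25.0, scale=4.0, size=n)  # synthetic baseline outcome
# Synthetic follow-up: pulled toward alters (-0.4) and toward the mean (-0.1).
y_t1 = y_t0 - 0.4 * (y_t0 - P @ y_t0) - 0.1 * (y_t0 - y_t0.mean())

fit = regression_to_context(y_t0, y_t1, P)
print({k: round(float(v), 4) for k, v in fit.items()})
```

A larger negative coefficient on the deviation from alters than on the deviation from the overall mean would be read, under these assumptions, as movement toward local social context rather than toward the population average.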

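58 Regression to Social Context or Regression to the Grand Mean? (Columns in the original output: Estimate, Std. Error, t value, Pr(>|t|). Most numeric values were lost in transcription; only the surviving fragments are shown.)
Change in Count of Ailments: (Intercept) Pr(>|t|) < 2e-16 ***; Deviation from Alters Pr(>|t|) e-16 (leading digits lost) ***; Deviation from Overall Mean Pr(>|t|) e-12 (leading digits lost) ***.
Change in Reported Health: (Intercept) Pr(>|t|) < 2e-16 ***; Deviation from Alters Pr(>|t|) e-16 (leading digits lost) ***; Deviation from Overall Mean Pr(>|t|) e-10 (leading digits lost) ***.
Change in BMI: (Intercept) Pr(>|t|) e-11 (leading digits lost) ***; Deviation from Alters (no value or significance marker survives); Deviation from Overall Mean *.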

59 Interpretations Network Neighborhood Contagion: The Christakis Effect: Strong and Weak Ties. Local Social Context: Family, Neighborhood, Workplace: Lifestyle. Generalized Social Context: Position in Stratification System, Urban-Rural Location, Geography. Location in the Cultural System: Cultural Niches: Habitat and Habitus.

60 Some limitations Statistical properties not well understood: the P term in $Y_i = \eta x_i + \Delta (PY)_i + \Theta_i$ is estimated, not fixed. Assumed unity of network content: all relations are assumed to be from the same distributional family. P estimates are based on very strong ties: strong ties are embedded in dense neighborhoods. Assumes transmissible Y: genetically coded Y operates on a different time scale. Limited to entities with niches: will not explain uniformly distributed Y.

61 Miller's Final Rant for the Day Homophily is not choice. Social change is not rewiring, but is microevolution. The action in Blau space is relational, not attributional. All surveys of individuals are samples of the residue of networks: each observation is a highly fallible information dump from a node in a high-dimensional space organized by the homophily principle. The above implies that 1) survey observations are serially correlated in Blau space and 2) survey datasets actually have some number in the factorials of n and k dependent observations, rather than n independent observations, where n is the number of individuals and k is the number of variables. Blau space is a lens that enables us to view the connections implicit in survey data with the high-dimensional web of human networks. Micro-level networks, the bipartite (multipartite) networks of connections between individuals and higher-level social entities, and the connections among the higher-level entities coevolve in Blau space. Human networks are instantiations of high-dimensional objects that are mostly unobservable, not the zeros and ones in our models.

IMPACTS OF SOCIAL NETWORKS AND SPACE ON OBESITY. The Rights and Wrongs of Social Network Analysis

IMPACTS OF SOCIAL NETWORKS AND SPACE ON OBESITY. The Rights and Wrongs of Social Network Analysis IMPACTS OF SOCIAL NETWORKS AND SPACE ON OBESITY The Rights and Wrongs of Social Network Analysis THE SPREAD OF OBESITY IN A LARGE SOCIAL NETWORK OVER 32 YEARS Nicholas A. Christakis James D. Fowler Published:

More information

Size Matters: the Structural Effect of Social Context

Size Matters: the Structural Effect of Social Context Size Matters: the Structural Effect of Social Context Siwei Cheng Yu Xie University of Michigan Abstract For more than five decades since the work of Simmel (1955), many social science researchers have

More information

Social Network Analysis: When Social Relationship is the Dependent Variable. Anabel Quan Haase Faculty of Information and Media Studies Sociology

Social Network Analysis: When Social Relationship is the Dependent Variable. Anabel Quan Haase Faculty of Information and Media Studies Sociology Social Network Analysis: When Social Relationship is the Dependent Variable Anabel Quan Haase Faculty of Information and Media Studies Sociology Overview of Presentation General overview of the social

More information

Identifying Peer Influence Effects in Observational Social Network Data: An Evaluation of Propensity Score Methods

Identifying Peer Influence Effects in Observational Social Network Data: An Evaluation of Propensity Score Methods Identifying Peer Influence Effects in Observational Social Network Data: An Evaluation of Propensity Score Methods Dean Eckles Department of Communication Stanford University dean@deaneckles.com Abstract

More information

L4, Modeling using networks and other heterogeneities

L4, Modeling using networks and other heterogeneities L4, Modeling using networks and other heterogeneities July, 2017 Different heterogeneities In reality individuals behave differently both in terms of susceptibility and infectivity given that a contact

More information

Logistic regression. Department of Statistics, University of South Carolina. Stat 205: Elementary Statistics for the Biological and Life Sciences

Logistic regression. Department of Statistics, University of South Carolina. Stat 205: Elementary Statistics for the Biological and Life Sciences Logistic regression Department of Statistics, University of South Carolina Stat 205: Elementary Statistics for the Biological and Life Sciences 1 / 1 Logistic regression: pp. 538 542 Consider Y to be binary

More information

Addendum: Multiple Regression Analysis (DRAFT 8/2/07)

Addendum: Multiple Regression Analysis (DRAFT 8/2/07) Addendum: Multiple Regression Analysis (DRAFT 8/2/07) When conducting a rapid ethnographic assessment, program staff may: Want to assess the relative degree to which a number of possible predictive variables

More information

Identifying Endogenous Peer Effects in the Spread of Obesity. Abstract

Identifying Endogenous Peer Effects in the Spread of Obesity. Abstract Identifying Endogenous Peer Effects in the Spread of Obesity Timothy J. Halliday 1 Sally Kwak 2 University of Hawaii- Manoa October 2007 Abstract Recent research in the New England Journal of Medicine

More information

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug?

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug? MMI 409 Spring 2009 Final Examination Gordon Bleil Table of Contents Research Scenario and General Assumptions Questions for Dataset (Questions are hyperlinked to detailed answers) 1. Is there a difference

More information

Do Your Online Friends Make You Pay? A Randomized Field Experiment on Peer Influence in Online Social Networks Online Appendix

Do Your Online Friends Make You Pay? A Randomized Field Experiment on Peer Influence in Online Social Networks Online Appendix Forthcoming in Management Science 2014 Do Your Online Friends Make You Pay? A Randomized Field Experiment on Peer Influence in Online Social Networks Online Appendix Ravi Bapna University of Minnesota,

More information

Assessing Studies Based on Multiple Regression. Chapter 7. Michael Ash CPPA

Assessing Studies Based on Multiple Regression. Chapter 7. Michael Ash CPPA Assessing Studies Based on Multiple Regression Chapter 7 Michael Ash CPPA Assessing Regression Studies p.1/20 Course notes Last time External Validity Internal Validity Omitted Variable Bias Misspecified

More information

Bayesian graphical models for combining multiple data sources, with applications in environmental epidemiology

Bayesian graphical models for combining multiple data sources, with applications in environmental epidemiology Bayesian graphical models for combining multiple data sources, with applications in environmental epidemiology Sylvia Richardson 1 sylvia.richardson@imperial.co.uk Joint work with: Alexina Mason 1, Lawrence

More information

NORTH SOUTH UNIVERSITY TUTORIAL 2

NORTH SOUTH UNIVERSITY TUTORIAL 2 NORTH SOUTH UNIVERSITY TUTORIAL 2 AHMED HOSSAIN,PhD Data Management and Analysis AHMED HOSSAIN,PhD - Data Management and Analysis 1 Correlation Analysis INTRODUCTION In correlation analysis, we estimate

More information

Generalized Estimating Equations for Depression Dose Regimes

Generalized Estimating Equations for Depression Dose Regimes Generalized Estimating Equations for Depression Dose Regimes Karen Walker, Walker Consulting LLC, Menifee CA Generalized Estimating Equations on the average produce consistent estimates of the regression

More information

Statistical Analysis of Complete Social Networks

Statistical Analysis of Complete Social Networks Statistical Analysis of Complete Social Networks Co-evolution of Networks & Behaviour Christian Steglich c.e.g.steglich@rug.nl median geodesic distance between groups 1.8 1.2 0.6 transitivity 0.0 0.0 0.5

More information

8/10/2015. Introduction: HIV. Introduction: Medical geography

8/10/2015. Introduction: HIV. Introduction: Medical geography Introduction: HIV Incorporating spatial variability to generate sub-national estimates of HIV prevalence in SSA Diego Cuadros PhD Laith Abu-Raddad PhD Sub-Saharan Africa (SSA) has by far the largest HIV

More information

"Lack of activity destroys the good condition of every human being, while movement and methodical physical exercise save it and preserve it.

Lack of activity destroys the good condition of every human being, while movement and methodical physical exercise save it and preserve it. Leave all the afternoon for exercise and recreation, which are as necessary as reading. I will rather say more necessary because health is worth more than learning. - Thomas Jefferson "Lack of activity

More information

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES 24 MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES In the previous chapter, simple linear regression was used when you have one independent variable and one dependent variable. This chapter

More information

Logistic Regression and Bayesian Approaches in Modeling Acceptance of Male Circumcision in Pune, India

Logistic Regression and Bayesian Approaches in Modeling Acceptance of Male Circumcision in Pune, India 20th International Congress on Modelling and Simulation, Adelaide, Australia, 1 6 December 2013 www.mssanz.org.au/modsim2013 Logistic Regression and Bayesian Approaches in Modeling Acceptance of Male Circumcision

More information

Objective: To describe a new approach to neighborhood effects studies based on residential mobility and demonstrate this approach in the context of

Objective: To describe a new approach to neighborhood effects studies based on residential mobility and demonstrate this approach in the context of Objective: To describe a new approach to neighborhood effects studies based on residential mobility and demonstrate this approach in the context of neighborhood deprivation and preterm birth. Key Points:

More information

Social Network Sensors for Early Detection of Contagious Outbreaks

Social Network Sensors for Early Detection of Contagious Outbreaks Supporting Information Text S1 for Social Network Sensors for Early Detection of Contagious Outbreaks Nicholas A. Christakis 1,2*, James H. Fowler 3,4 1 Faculty of Arts & Sciences, Harvard University,

More information

MEA DISCUSSION PAPERS

MEA DISCUSSION PAPERS Inference Problems under a Special Form of Heteroskedasticity Helmut Farbmacher, Heinrich Kögel 03-2015 MEA DISCUSSION PAPERS mea Amalienstr. 33_D-80799 Munich_Phone+49 89 38602-355_Fax +49 89 38602-390_www.mea.mpisoc.mpg.de

More information

Chapter 2 Interactions Between Socioeconomic Status and Components of Variation in Cognitive Ability

Chapter 2 Interactions Between Socioeconomic Status and Components of Variation in Cognitive Ability Chapter 2 Interactions Between Socioeconomic Status and Components of Variation in Cognitive Ability Eric Turkheimer and Erin E. Horn In 3, our lab published a paper demonstrating that the heritability

More information

Cancer survivorship and labor market attachments: Evidence from MEPS data

Cancer survivorship and labor market attachments: Evidence from MEPS data Cancer survivorship and labor market attachments: Evidence from 2008-2014 MEPS data University of Memphis, Department of Economics January 7, 2018 Presentation outline Motivation and previous literature

More information

Supplementary Appendix

Supplementary Appendix Supplementary Appendix This appendix has been provided by the authors to give readers additional information about their work. Supplement to: Weintraub WS, Grau-Sepulveda MV, Weiss JM, et al. Comparative

More information

Leveraging Social Networks to Promote Cancer Prevention Health Behaviors

Leveraging Social Networks to Promote Cancer Prevention Health Behaviors Leveraging Social Networks to Promote Cancer Prevention Health Behaviors Dr. Jaya Aysola MD, MPH Jazmine Smith Masters in Criminology Candidate, Sarah Griggs MPH, Sitara Soundar MD candidate, Gabrielle

More information

Write your identification number on each paper and cover sheet (the number stated in the upper right hand corner on your exam cover).

Write your identification number on each paper and cover sheet (the number stated in the upper right hand corner on your exam cover). STOCKHOLM UNIVERSITY Department of Economics Course name: Empirical methods 2 Course code: EC2402 Examiner: Per Pettersson-Lidbom Number of credits: 7,5 credits Date of exam: Sunday 21 February 2010 Examination

More information

Analysis of TB prevalence surveys

Analysis of TB prevalence surveys Workshop and training course on TB prevalence surveys with a focus on field operations Analysis of TB prevalence surveys Day 8 Thursday, 4 August 2011 Phnom Penh Babis Sismanidis with acknowledgements

More information

SUMMATED RATING SCALES AND LEVELS OF MEASUREMENT

SUMMATED RATING SCALES AND LEVELS OF MEASUREMENT Measurement, Scaling, and Dimensional Analysis Summer 07 Bill Jacoby SUMMATED RATING SCALES AND LEVELS OF MEASUREMENT Assume that we are interested in measuring public attitudes toward government spending.

More information

Chapter 3 CORRELATION AND REGRESSION

Chapter 3 CORRELATION AND REGRESSION CORRELATION AND REGRESSION TOPIC SLIDE Linear Regression Defined 2 Regression Equation 3 The Slope or b 4 The Y-Intercept or a 5 What Value of the Y-Variable Should be Predicted When r = 0? 7 The Regression

More information

Beer Purchasing Behavior, Dietary Quality, and Health Outcomes among U.S. Adults

Beer Purchasing Behavior, Dietary Quality, and Health Outcomes among U.S. Adults Beer Purchasing Behavior, Dietary Quality, and Health Outcomes among U.S. Adults Richard Volpe (California Polytechnical University, San Luis Obispo, USA) Research in health, epidemiology, and nutrition

More information

Estimating Heterogeneous Choice Models with Stata

Estimating Heterogeneous Choice Models with Stata Estimating Heterogeneous Choice Models with Stata Richard Williams Notre Dame Sociology rwilliam@nd.edu West Coast Stata Users Group Meetings October 25, 2007 Overview When a binary or ordinal regression

More information

DAZED AND CONFUSED: THE CHARACTERISTICS AND BEHAVIOROF TITLE CONFUSED READERS

DAZED AND CONFUSED: THE CHARACTERISTICS AND BEHAVIOROF TITLE CONFUSED READERS Worldwide Readership Research Symposium 2005 Session 5.6 DAZED AND CONFUSED: THE CHARACTERISTICS AND BEHAVIOROF TITLE CONFUSED READERS Martin Frankel, Risa Becker, Julian Baim and Michal Galin, Mediamark

More information

bivariate analysis: The statistical analysis of the relationship between two variables.

bivariate analysis: The statistical analysis of the relationship between two variables. bivariate analysis: The statistical analysis of the relationship between two variables. cell frequency: The number of cases in a cell of a cross-tabulation (contingency table). chi-square (χ 2 ) test for

More information

DANIEL KARELL. Soc Stats Reading Group. Princeton University

DANIEL KARELL. Soc Stats Reading Group. Princeton University Stochastic Actor-Oriented Models and Change we can believe in: Comparing longitudinal network models on consistency, interpretability and predictive power DANIEL KARELL Division of Social Science New York

More information

Modelling Research Productivity Using a Generalization of the Ordered Logistic Regression Model

Modelling Research Productivity Using a Generalization of the Ordered Logistic Regression Model Modelling Research Productivity Using a Generalization of the Ordered Logistic Regression Model Delia North Temesgen Zewotir Michael Murray Abstract In South Africa, the Department of Education allocates

More information

Multivariate Multilevel Models

Multivariate Multilevel Models Multivariate Multilevel Models Getachew A. Dagne George W. Howe C. Hendricks Brown Funded by NIMH/NIDA 11/20/2014 (ISSG Seminar) 1 Outline What is Behavioral Social Interaction? Importance of studying

More information

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES Correlational Research Correlational Designs Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are

More information

Rapid decline of female genital circumcision in Egypt: An exploration of pathways. Jenny X. Liu 1 RAND Corporation. Sepideh Modrek Stanford University

Rapid decline of female genital circumcision in Egypt: An exploration of pathways. Jenny X. Liu 1 RAND Corporation. Sepideh Modrek Stanford University Rapid decline of female genital circumcision in Egypt: An exploration of pathways Jenny X. Liu 1 RAND Corporation Sepideh Modrek Stanford University This version: February 3, 2010 Abstract Egypt is currently

More information

Motherhood and Female Labor Force Participation: Evidence from Infertility Shocks

Motherhood and Female Labor Force Participation: Evidence from Infertility Shocks Motherhood and Female Labor Force Participation: Evidence from Infertility Shocks Jorge M. Agüero Univ. of California, Riverside jorge.aguero@ucr.edu Mindy S. Marks Univ. of California, Riverside mindy.marks@ucr.edu

More information

A Study of the Spatial Distribution of Suicide Rates

A Study of the Spatial Distribution of Suicide Rates A Study of the Spatial Distribution of Suicide Rates Ferdinand DiFurio, Tennessee Tech University Willis Lewis, Winthrop University With acknowledgements to Kendall Knight, GA, Tennessee Tech University

More information

Statistics 202: Data Mining. c Jonathan Taylor. Final review Based in part on slides from textbook, slides of Susan Holmes.

Statistics 202: Data Mining. c Jonathan Taylor. Final review Based in part on slides from textbook, slides of Susan Holmes. Final review Based in part on slides from textbook, slides of Susan Holmes December 5, 2012 1 / 1 Final review Overview Before Midterm General goals of data mining. Datatypes. Preprocessing & dimension

More information

In this module I provide a few illustrations of options within lavaan for handling various situations.

In this module I provide a few illustrations of options within lavaan for handling various situations. In this module I provide a few illustrations of options within lavaan for handling various situations. An appropriate citation for this material is Yves Rosseel (2012). lavaan: An R Package for Structural

More information

Effects of School-Level Norms on Student Substance Use

Effects of School-Level Norms on Student Substance Use Prevention Science, Vol. 3, No. 2, June 2002 ( C 2002) Effects of School-Level Norms on Student Substance Use Revathy Kumar, 1,2,4 Patrick M. O Malley, 1 Lloyd D. Johnston, 1 John E. Schulenberg, 1,3 and

More information

TRIPLL Webinar: Propensity score methods in chronic pain research

TRIPLL Webinar: Propensity score methods in chronic pain research TRIPLL Webinar: Propensity score methods in chronic pain research Felix Thoemmes, PhD Support provided by IES grant Matching Strategies for Observational Studies with Multilevel Data in Educational Research

More information
