Application of Local Control Strategy in analyses of the effects of Radon on Lung Cancer Mortality for 2,881 US Counties

Page 1

Bob Obenchain, Risk Benefit Statistics, August 2015

Our motivation for using a cut-point of 2.6 pCi/L for the Radon level that defines a High-Low "Treatment" dichotomy is given by the initial binary split in the following Partition Regression (Tree) model:

[Partition of (unadjusted) Lung Cancer Mortality on Radon. Summary columns: RSquare, RMSE, N, Number of Splits, AICc. Nodes: All Rows (Count, Mean, Std Dev, LogWorth, Difference); Radon >= 2.6 (Count 1220, Mean, Std Dev); Radon < 2.6 (Count 1661, Mean, Std Dev). Only the two node counts were recovered.]

"Low Radon": level strictly less than 2.6 pCi/L (picocuries per liter).
"High Radon": level of 2.6 pCi/L or greater.

We will see below that higher Radon levels are associated with lower Lung Cancer Mortality rates. Neither this analysis nor the ones depicted on page 2 has been "covariate adjusted" for the possible X-confounding factors included within the datasets being analyzed here.
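The role of the first tree split can be illustrated with a minimal Python sketch. It scans candidate cut-points, keeps the one that minimizes the pooled within-group sum of squares, and then flags counties at or above that cut as "High Radon". The six (radon, mortality) pairs are invented for illustration and the function names are hypothetical. JMP's Partition platform ranks candidate splits by LogWorth, so this raw least-squares criterion is a stand-in, not the platform's exact rule.

```python
def sse(values):
    """Sum of squared deviations from the group mean (0.0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return the cut c minimizing SSE(y | x < c) + SSE(y | x >= c)."""
    candidates = sorted(set(x))[1:]  # skip the minimum so both groups are non-empty
    best_c, best_total = None, float("inf")
    for c in candidates:
        low = [yi for xi, yi in zip(x, y) if xi < c]
        high = [yi for xi, yi in zip(x, y) if xi >= c]
        total = sse(low) + sse(high)
        if total < best_total:
            best_c, best_total = c, total
    return best_c


# Invented toy data: radon level and lung cancer mortality for six counties.
radon = [0.5, 1.0, 2.0, 2.6, 3.5, 5.0]
mortality = [70.0, 68.0, 72.0, 55.0, 53.0, 57.0]

cut = best_split(radon, mortality)
high_flags = [r >= cut for r in radon]  # True means "High Radon"
print(cut, high_flags)
```

On real data the chosen cut depends on every county, which is why the report treats 2.6 as a data-driven choice rather than a fixed threshold.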

Page 2

Prediction of Lung Cancer Mortality from Ln[Rn], unadjusted for all other X-confounders.

Ln[Rn] = natural logarithm of Radon level. Here, 10 US counties with Radon level coded as "0.0" have been Winsorized in the dataset to Ln[0.05] ≈ -2.996. The cut-point at Radon = 2.6 pCi/L (Ln[Rn] ≈ 0.956) is used in the fits displayed on this page only to color counties either red or blue.

[Linear Fit of Lung Cancer Mortality on Ln[Rn]; fitted coefficients not recovered. Summary of Fit: RSquare, RSquare Adj, Root Mean Square Error, Mean of Response (values not recovered); Observations (or Sum Wgts) = 2881. Analysis of Variance: Model, Error, C. Total; Prob > F < .0001*. Parameter Estimates: Intercept, Prob > |t| < .0001*; Ln[Rn], Prob > |t| < .0001*.]

[Smoothing Spline Fit, lambda = 5: R-Square and Sum of Squares Error (values not recovered).]
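The log transform and the floor applied to zero-coded counties can be sketched in a few lines of Python. The names `FLOOR`, `CUT_POINT` and `log_radon` are hypothetical. The sketch also assumes that every level below 0.05 is raised to 0.05, whereas the text above only states that counties coded as "0.0" were adjusted.

```python
import math

FLOOR = 0.05      # pCi/L substituted for radon levels coded as 0.0
CUT_POINT = 2.6   # pCi/L threshold separating Low from High radon


def log_radon(level, floor=FLOOR):
    """Natural log of the radon level after raising values below `floor`."""
    return math.log(max(level, floor))


print(round(log_radon(0.0), 4))        # -2.9957
print(round(log_radon(CUT_POINT), 4))  # 0.9555
```

Without the floor, `math.log(0.0)` raises a `ValueError`, which is why the zero-coded counties need an adjustment before the log-scale fit.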

Page 3

Output from the "Local Control" JMP Add-In: pages 3, 4 and 5.

Outcome Variable: Lung Cancer Mortality
Treatment Variable: Radon Level Flag
Cluster Effect Type: Fixed
Variability Assumption: Homoskedastic
Random Number Seed: (value not recovered)
Specify Number of Clusters = 50
Specify Number of Permutations = (value not recovered)

[Figure: LTD distribution for 50 clusters, with Mean_LTD marked.]

Page 4

Hierarchical Clustering Method = Fast Ward

Clustering variables: Obesity (%), Currently Smoke, Age Over 65 (%)

[Figure: dendrogram of hierarchically clustered differences.]

Page 5

Response Lung Cancer Mortality: Nested ANOVA (Treatment within Cluster)

[Summary of Fit: RSquare, RSquare Adj, Root Mean Square Error, Mean of Response (values not recovered); Observations (or Sum Wgts) = 2881. Analysis of Variance: Model, Error, C. Total; Prob > F < .0001*. Effect Tests: Cluster, Prob > F < .0001*; High Radon[Cluster], Prob > F < .0001*.]

NOTE: Cluster #10 is uninformative about Lung Cancer Mortality LTDs (High minus Low Radon) because all 11 US counties it contains have Radon levels less than 2.6 pCi/L (picocuries per liter). This explains why there are only 49 (rather than 50) degrees of freedom for Treatment-within-Cluster (LTD) effects. By convention, only 49 degrees of freedom are attributed to main effects within 50 clusters; the overall mean effect for mortality is simply removed and not shown in the ANOVA table.

Although it may not be obvious from the entries in the above Nested ANOVA table, results depend upon the choice of the High-Low Radon cut-point (2.6 pCi/L here) as well as the numbers of both requested and informative clusters (50 and 49, respectively).

Specifically, the y-outcome column vector here has 2,881 rows for US counties and consists of Lung Cancer Mortality rates viewed as realizations of a continuous random variable. Furthermore, the "design" matrix has only two non-constant columns, viewed as fixed (given) categorical variables:

1. The vector of treatment indicators has 2 levels: say, zeros (Low Radon) and ones (High Radon).
2. The vector of cluster membership indicators has 50 levels: say, the integers 1 through 50.

The analysis of this cross-classification of mortality rates is essentially nonparametric because no information is used on either how clusters were formed or defined from county X-characteristics or what the numerical values of those X-characteristics are.
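The cross-classification described above needs only three columns: outcome, treatment flag and cluster label. The following Python sketch computes a Local Treatment Difference (High minus Low mean outcome) for each cluster and marks a cluster as uninformative when it lacks one of the two arms, as happens with Cluster #10. The data and the function name are hypothetical, and the sketch illustrates the idea rather than the JMP add-in's implementation.

```python
from collections import defaultdict


def local_treatment_differences(outcome, high, cluster):
    """Map each cluster to mean(High) - mean(Low), or None if an arm is empty."""
    arms = defaultdict(lambda: {True: [], False: []})
    for y, h, c in zip(outcome, high, cluster):
        arms[c][bool(h)].append(y)
    ltd = {}
    for c, groups in arms.items():
        hi, lo = groups[True], groups[False]
        if hi and lo:
            ltd[c] = sum(hi) / len(hi) - sum(lo) / len(lo)
        else:
            ltd[c] = None  # uninformative: only one treatment level present
    return ltd


# Invented toy data: six counties in three clusters; cluster 3 is all Low.
mortality = [60.0, 50.0, 70.0, 62.0, 55.0, 58.0]
high = [False, True, False, True, False, False]
cluster = [1, 1, 2, 2, 3, 3]

ltd = local_treatment_differences(mortality, high, cluster)
informative = sum(v is not None for v in ltd.values())
print(ltd, informative)
```

The count of informative clusters plays the role of the 49 degrees of freedom for Treatment-within-Cluster effects: each informative cluster contributes one High-minus-Low contrast.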

Page 6

Aggregate Phase: Observed LTD Distribution (49 informative clusters containing 2,870 US counties)

[Figure: histogram of the observed Local Treatment Difference (LTD) distribution for 50 Ward clusters.]

Lung Cancer Mortality is measured in deaths per 100,000 person-years. LTDs are differences in mortality rates: Radon High minus Low.

The above histogram depicts the "Most Typical" LTD distribution derived from micro-aggregation of 2,881 US counties on 3 primary X-confounders:
o Age Over 65 %
o Currently Smoke %
o Obesity %

Y-outcome = Lung Cancer Mortality
Binary Treatment Indicator: Radon High (at least 2.6 pCi/L) vs. Low

The best-fitting Normal approximation has mean µ = (value not recovered) deaths and std. dev. σ = (value not recovered).

Page 7

Confirm Phase: Comparison of empirical Cumulative Distribution Functions (CDFs)

[Figure: empirical CDFs of the Random Permutation LTD-like Distribution and the Observed LTD Distribution.]

These two distributions are clearly different; they differ most on statistical measures of location and shape (skewness, kurtosis, range). See also the histograms and statistics listed on the next page. The permutation distribution shows what LTDs look like when counties are assigned to clusters at random, without regard to their X-characteristics. The observed difference therefore means that clustering (local conditioning, matching) on 3 primary X-confounders [% over 65, % currently smoke and % obese] has indeed yielded appropriately adjusted treatment effect-size estimates.

Local treatment effect-size estimates are LTDs expressed as a difference in mortality rates (deaths per 100,000 person-years) of the form: average for one or more High Radon counties minus average for one or more Low Radon counties.
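A minimal Python sketch of the comparison follows. It shuffles cluster labels across counties (keeping cluster sizes fixed), recomputes High-minus-Low differences for each shuffle, and compares the pooled "LTD-like" values with the observed LTDs through their empirical CDFs. The data, function names and seed are hypothetical. The maximum vertical gap between the two CDFs is one possible summary that this report does not itself use, and the sketch keeps one LTD per cluster, whereas the summary statistics on the next page count one LTD per county.

```python
import random
from collections import defaultdict


def cluster_ltds(outcome, high, cluster):
    """High-minus-Low mean outcome for every cluster containing both arms."""
    arms = defaultdict(lambda: ([], []))  # cluster -> (low values, high values)
    for y, h, c in zip(outcome, high, cluster):
        arms[c][1 if h else 0].append(y)
    return [sum(hi) / len(hi) - sum(lo) / len(lo)
            for lo, hi in arms.values() if lo and hi]


def permuted_ltds(outcome, high, cluster, n_perm, seed=0):
    """Pool LTD-like values over random reassignments of cluster labels."""
    rng = random.Random(seed)
    labels = list(cluster)
    pooled = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # breaks any link between X-based clusters and counties
        pooled.extend(cluster_ltds(outcome, high, labels))
    return pooled


def ecdf(sample, t):
    """Fraction of the sample at or below t."""
    return sum(v <= t for v in sample) / len(sample)


# Invented toy data: eight counties in two clusters with different baselines.
mortality = [60.0, 50.0, 62.0, 51.0, 75.0, 66.0, 77.0, 64.0]
high = [False, True, False, True, False, True, False, True]
cluster = [1, 1, 1, 1, 2, 2, 2, 2]

observed = cluster_ltds(mortality, high, cluster)
null = permuted_ltds(mortality, high, cluster, n_perm=200, seed=1)
gap = max(abs(ecdf(observed, t) - ecdf(null, t)) for t in observed + null)
print(observed, round(gap, 3))
```

A large gap suggests that the X-based clusters carry information that random groupings of the same sizes do not.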

Page 8

[Figures and summary statistics for two distributions. Random Permutation LTD-like Distribution: Mean, Std Dev, Std Err Mean, Upper 95% Mean, Lower 95% Mean, N = 50 * 2870, Skewness, Kurtosis. Observed LTD Distribution: Mean, Std Dev, Std Err Mean, Upper 95% Mean, Lower 95% Mean, N = 2870, Skewness, Kurtosis. Only the N values were recovered.]

Page 9

Explore Phases:

Tried using Complete Linkage as well as Fast Ward clustering in JMP.

Tried using combinations of 3 out of 5 potential X-confounders for clustering:
o Age Over 65 %
o Obesity %
o Currently Smoke %
o Ever Smoke %
o Median Household Income ($1,000s)

Tried varying the total number of clusters used from 50 to 400.

Reveal Phase:

NOTE: Cluster #10 is uninformative about LTDs and contains 11 counties. Thus the following predictions use the data from only 2,870 US counties; i.e., LTD missingness is not considered informative of potential treatment effect-sizes.

Fitted Supervised Learning Models for predicting observed LTDs:
o JMP 11 Analyze > Modeling Platform > Partition option: single Tree (7 terminal nodes); Bootstrap Forest Model Average of 100 Trees
o JMP Analyze > Fit Model Platform: Multi-Variable Regression (degree at most 2)

Tried using as many as 6 potential X-confounders for predicting observed LTDs:
o Age Over 65 %
o Obesity %
o Currently Smoke %
o Ever Smoke %
o Median Household Income ($1,000s)
o Numeric Radon (or Ln[Rn]) level, as either an ordinal or continuous measure

Page 10

Predicting LTDs using Supervised Learning: Method One (Single "Small" Tree), R² = 0.51

Partition: best such tree for predicting LTDobserved (6 splits, 7 terminal nodes)

[Partition report for LTD. Summary columns: RSquare, RMSE, N, Number of Splits, AICc (values not recovered).]

Page 11

Best "Small" Tree: Mean = average LTD within leaf

Note that all 3 splits on "Age Over 65 %" are such that the counties with the higher % elderly population are predicted to have LARGER (more negative) ADVANTAGES of High Radon in keeping Lung Cancer Mortality low.

Note also that both splits on "Currently Smoke %" are such that the counties with the lower % smoking are predicted to have LARGER (more negative) ADVANTAGES of High Radon in keeping Lung Cancer Mortality low.

Finally, the single split on "Obesity %" is such that the counties with the lower % obese are predicted to have LARGER (more negative) ADVANTAGES of High Radon in keeping Lung Cancer Mortality low.

X-Confounder Contributions:

[Table columns: Term, Number of Splits, SS, Portion (values not recovered). Terms in listed order: Age Over 65 (%), Currently Smoke (%), Obesity (%), Radon level in pCi/L, Ever Smoke (%), Median HH Income.]

Although membership in the High or Low Treatment cohorts is perfectly predicted by Radon level within a county, it is somewhat interesting that Radon level is not used in the above predictions of the corresponding LTDs in Lung Cancer Mortality rate.

Page 12

Predicting LTDs using Supervised Learning: Method Two (Bootstrap Forest), R² = 0.78

Bootstrap Forest for LTDobserved

Number of trees in the forest: 250
Number of terms sampled per split: 4
Training rows: 2870
Validation rows: 0
Test rows: 0
Number of terms: 6
Bootstrap samples: 2870
Minimum Splits Per Tree: 6
Minimum Size Split: 20

[Overall Statistics: Individual Trees RMSE (In Bag, Out of Bag); RSquare, RMSE, N (values not recovered).]

Page 13

[Figure: Observed LTD estimates vs. their Forest predictions.]

X-Confounder Contributions:

[Table columns: Term, Number of Splits, SS, Portion (values not recovered). Terms in listed order: Age Over 65 (%), Currently Smoke (%), Obesity (%), Ever Smoke (%), Median HH Income, Radon level in pCi/L.]

NOTE: Partitioning methods (Trees and Forests) use only the ordinal information about clusters formed using X-confounders. This could help explain why they do not find Radon level particularly predictive of LTDs; i.e., Radon level in pCi/L ranks 6th out of six in the above table of X-confounder predictability!

On the other hand, traditional fully-parametric model fitting methods assume all continuous variables are measured on an interval scale; i.e., individual terms can represent linear or quadratic effects or hyperbolic interactions. We will see that numerical values of Radon level are much more predictive of LTDs under these much stronger assumptions.

Page 14

Predicting LTDs using Supervised Learning: Method Three (Multi-Variable Regression), R² = 0.49

[Summary of Fit: RSquare, RSquare Adj, Root Mean Square Error, Mean of Response (values not recovered); Observations (or Sum Wgts) = 2870. Analysis of Variance: Model, Error, C. Total; Prob > F < .0001*.]

Parameter Estimates (columns: Term, Estimate, Std Error, t Ratio, Prob > |t|; only the Prob > |t| entries shown below were recovered):
o Intercept: < .0001*
o Radon: * (value not recovered)
o Obesity (%): < .0001*
o Age Over 65 (%): < .0001*
o Currently Smoke: < .0001*
o Ever Smoke: (value not recovered)
o (Radon)*(Age Over 65 (%)): < .0001*
o (Age Over 65 (%))*(Currently Smoke): * (value not recovered)
o (Currently Smoke)*(Ever Smoke): < .0001*
o (Obesity (%))*(Obesity (%)): < .0001*
o (Age Over 65 (%))*(Age Over 65 (%)): < .0001*

The 2 quadratic terms (in % obese and % over 65) used here seem particularly curious. Furthermore, including such terms in multi-variable regression models can cause any predictions made strictly outside of the observed ranges of the given X-variables to represent potentially severe and unwarranted extrapolations. Finally, of the three methods considered for predicting LTD estimates from six available X-confounding factors, traditional Multi-Variable Regression is the least accurate.

Page 15

Correlations between Observed LTDs and their Predictions

[Correlation matrix with rows and columns LTD observed, LTDtreePred, LTDforestPred, LTDmvregPred, followed by an R-squared row marked with asterisks (values not recovered).]

* The R² value for Bootstrap Forest "model averaging" listed on page 12 (R² = 0.78 in that page's heading) apparently incorporates some sort of further "adjustment" or penalty for being thorough, complicated or versatile.

[Figure: scatterplot matrix of LTD, LTDtreePred, LTDforestPred and LTDmvregPred.]
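Two common definitions of R² can disagree, which a short Python sketch makes concrete. One definition squares the Pearson correlation between observed and predicted values; the other is 1 - SSE/SST computed from prediction errors. The data below are invented and the function names are hypothetical. Which definition JMP applies to each number in this report is not stated here, so this is only one candidate explanation for the footnoted discrepancy.

```python
import math


def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    s_aa = sum((x - mean_a) ** 2 for x in a)
    s_bb = sum((y - mean_b) ** 2 for y in b)
    return s_ab / math.sqrt(s_aa * s_bb)


def r2_from_correlation(observed, predicted):
    """Squared correlation: unaffected by shrinkage or shift of predictions."""
    return pearson_r(observed, predicted) ** 2


def r2_from_errors(observed, predicted):
    """1 - SSE/SST: penalizes predictions that are shrunk toward the mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Invented example: predictions perfectly ordered but shrunk toward the mean.
observed = [1.0, 2.0, 3.0, 4.0, 5.0]
predicted = [2.0, 2.5, 3.0, 3.5, 4.0]

r2_corr = r2_from_correlation(observed, predicted)
r2_err = r2_from_errors(observed, predicted)
print(r2_corr, r2_err)
```

Averaged predictions such as those from a forest tend to be pulled toward the overall mean, so their squared correlation with the observations can exceed their error-based R².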


Overview of Non-Parametric Statistics Overview of Non-Parametric Statistics LISA Short Course Series Mark Seiss, Dept. of Statistics April 7, 2009 Presentation Outline 1. Homework 2. Review of Parametric Statistics 3. Overview Non-Parametric

More information

Regression Including the Interaction Between Quantitative Variables

Regression Including the Interaction Between Quantitative Variables Regression Including the Interaction Between Quantitative Variables The purpose of the study was to examine the inter-relationships among social skills, the complexity of the social situation, and performance

More information

An Introduction to Bayesian Statistics

An Introduction to Bayesian Statistics An Introduction to Bayesian Statistics Robert Weiss Department of Biostatistics UCLA Fielding School of Public Health robweiss@ucla.edu Sept 2015 Robert Weiss (UCLA) An Introduction to Bayesian Statistics

More information

HS Exam 1 -- March 9, 2006

HS Exam 1 -- March 9, 2006 Please write your name on the back. Don t forget! Part A: Short answer, multiple choice, and true or false questions. No use of calculators, notes, lab workbooks, cell phones, neighbors, brain implants,

More information

Repeated Measures ANOVA and Mixed Model ANOVA. Comparing more than two measurements of the same or matched participants

Repeated Measures ANOVA and Mixed Model ANOVA. Comparing more than two measurements of the same or matched participants Repeated Measures ANOVA and Mixed Model ANOVA Comparing more than two measurements of the same or matched participants Data files Fatigue.sav MentalRotation.sav AttachAndSleep.sav Attitude.sav Homework:

More information

PSY 216: Elementary Statistics Exam 4

PSY 216: Elementary Statistics Exam 4 Name: PSY 16: Elementary Statistics Exam 4 This exam consists of multiple-choice questions and essay / problem questions. For each multiple-choice question, circle the one letter that corresponds to the

More information

Abstract. Introduction

Abstract. Introduction Local Control Analysis of Radon and Ozone S. Stanley Young, CGStat LLC Robert L. Obenchain, Risk Benefit Statistics LLC Goran Krstic, Fraser Health Authority Abstract Large (observational) data sets typically

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

Review: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections

Review: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections Review: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections New: Bias-variance decomposition, biasvariance tradeoff, overfitting, regularization, and feature selection Yi

More information

One-Way ANOVAs t-test two statistically significant Type I error alpha null hypothesis dependant variable Independent variable three levels;

One-Way ANOVAs t-test two statistically significant Type I error alpha null hypothesis dependant variable Independent variable three levels; 1 One-Way ANOVAs We have already discussed the t-test. The t-test is used for comparing the means of two groups to determine if there is a statistically significant difference between them. The t-test

More information

investigate. educate. inform.

investigate. educate. inform. investigate. educate. inform. Research Design What drives your research design? The battle between Qualitative and Quantitative is over Think before you leap What SHOULD drive your research design. Advanced

More information

Correlation and Regression

Correlation and Regression Dublin Institute of Technology ARROW@DIT Books/Book Chapters School of Management 2012-10 Correlation and Regression Donal O'Brien Dublin Institute of Technology, donal.obrien@dit.ie Pamela Sharkey Scott

More information

Multivariate dose-response meta-analysis: an update on glst

Multivariate dose-response meta-analysis: an update on glst Multivariate dose-response meta-analysis: an update on glst Nicola Orsini Unit of Biostatistics Unit of Nutritional Epidemiology Institute of Environmental Medicine Karolinska Institutet http://www.imm.ki.se/biostatistics/

More information

10. LINEAR REGRESSION AND CORRELATION

10. LINEAR REGRESSION AND CORRELATION 1 10. LINEAR REGRESSION AND CORRELATION The contingency table describes an association between two nominal (categorical) variables (e.g., use of supplemental oxygen and mountaineer survival ). We have

More information

White Paper Estimating Complex Phenotype Prevalence Using Predictive Models

White Paper Estimating Complex Phenotype Prevalence Using Predictive Models White Paper 23-12 Estimating Complex Phenotype Prevalence Using Predictive Models Authors: Nicholas A. Furlotte Aaron Kleinman Robin Smith David Hinds Created: September 25 th, 2015 September 25th, 2015

More information

In each hospital-year, we calculated a 30-day unplanned. readmission rate among patients who survived at least 30 days

In each hospital-year, we calculated a 30-day unplanned. readmission rate among patients who survived at least 30 days Romley JA, Goldman DP, Sood N. US hospitals experienced substantial productivity growth during 2002 11. Health Aff (Millwood). 2015;34(3). Published online February 11, 2015. Appendix Adjusting hospital

More information

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions

Statistics is the science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making effective decisions Readings: OpenStax Textbook - Chapters 1 5 (online) Appendix D & E (online) Plous - Chapters 1, 5, 6, 13 (online) Introductory comments Describe how familiarity with statistical methods can - be associated

More information

Data Analysis in Practice-Based Research. Stephen Zyzanski, PhD Department of Family Medicine Case Western Reserve University School of Medicine

Data Analysis in Practice-Based Research. Stephen Zyzanski, PhD Department of Family Medicine Case Western Reserve University School of Medicine Data Analysis in Practice-Based Research Stephen Zyzanski, PhD Department of Family Medicine Case Western Reserve University School of Medicine Multilevel Data Statistical analyses that fail to recognize

More information

Section 6: Analysing Relationships Between Variables

Section 6: Analysing Relationships Between Variables 6. 1 Analysing Relationships Between Variables Section 6: Analysing Relationships Between Variables Choosing a Technique The Crosstabs Procedure The Chi Square Test The Means Procedure The Correlations

More information

Two-Way Independent Samples ANOVA with SPSS

Two-Way Independent Samples ANOVA with SPSS Two-Way Independent Samples ANOVA with SPSS Obtain the file ANOVA.SAV from my SPSS Data page. The data are those that appear in Table 17-3 of Howell s Fundamental statistics for the behavioral sciences

More information

Prepared by: Assoc. Prof. Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies

Prepared by: Assoc. Prof. Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies Prepared by: Assoc. Prof. Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies Universiti Putra Malaysia Serdang At the end of this session,

More information

Here are the various choices. All of them are found in the Analyze menu in SPSS, under the sub-menu for Descriptive Statistics :

Here are the various choices. All of them are found in the Analyze menu in SPSS, under the sub-menu for Descriptive Statistics : Descriptive Statistics in SPSS When first looking at a dataset, it is wise to use descriptive statistics to get some idea of what your data look like. Here is a simple dataset, showing three different

More information

Survey research (Lecture 1) Summary & Conclusion. Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4.

Survey research (Lecture 1) Summary & Conclusion. Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4. Summary & Conclusion Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4.0 Overview 1. Survey research 2. Survey design 3. Descriptives & graphing 4. Correlation

More information

Survey research (Lecture 1)

Survey research (Lecture 1) Summary & Conclusion Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4.0 Overview 1. Survey research 2. Survey design 3. Descriptives & graphing 4. Correlation

More information

Content. Basic Statistics and Data Analysis for Health Researchers from Foreign Countries. Research question. Example Newly diagnosed Type 2 Diabetes

Content. Basic Statistics and Data Analysis for Health Researchers from Foreign Countries. Research question. Example Newly diagnosed Type 2 Diabetes Content Quantifying association between continuous variables. Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General

More information

SUMMER 2011 RE-EXAM PSYF11STAT - STATISTIK

SUMMER 2011 RE-EXAM PSYF11STAT - STATISTIK SUMMER 011 RE-EXAM PSYF11STAT - STATISTIK Full Name: Årskortnummer: Date: This exam is made up of three parts: Part 1 includes 30 multiple choice questions; Part includes 10 matching questions; and Part

More information

Comparison of discrimination methods for the classification of tumors using gene expression data

Comparison of discrimination methods for the classification of tumors using gene expression data Comparison of discrimination methods for the classification of tumors using gene expression data Sandrine Dudoit, Jane Fridlyand 2 and Terry Speed 2,. Mathematical Sciences Research Institute, Berkeley

More information

Addendum: Multiple Regression Analysis (DRAFT 8/2/07)

Addendum: Multiple Regression Analysis (DRAFT 8/2/07) Addendum: Multiple Regression Analysis (DRAFT 8/2/07) When conducting a rapid ethnographic assessment, program staff may: Want to assess the relative degree to which a number of possible predictive variables

More information

Modeling unobserved heterogeneity in Stata

Modeling unobserved heterogeneity in Stata Modeling unobserved heterogeneity in Stata Rafal Raciborski StataCorp LLC November 27, 2017 Rafal Raciborski (StataCorp) Modeling unobserved heterogeneity November 27, 2017 1 / 59 Plan of the talk Concepts

More information

Notes for laboratory session 2

Notes for laboratory session 2 Notes for laboratory session 2 Preliminaries Consider the ordinary least-squares (OLS) regression of alcohol (alcohol) and plasma retinol (retplasm). We do this with STATA as follows:. reg retplasm alcohol

More information

STAT 201 Chapter 3. Association and Regression

STAT 201 Chapter 3. Association and Regression STAT 201 Chapter 3 Association and Regression 1 Association of Variables Two Categorical Variables Response Variable (dependent variable): the outcome variable whose variation is being studied Explanatory

More information

Chapter 1: Explaining Behavior

Chapter 1: Explaining Behavior Chapter 1: Explaining Behavior GOAL OF SCIENCE is to generate explanations for various puzzling natural phenomenon. - Generate general laws of behavior (psychology) RESEARCH: principle method for acquiring

More information

Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14

Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14 Readings: Textbook readings: OpenStax - Chapters 1 11 Online readings: Appendix D, E & F Plous Chapters 10, 11, 12 and 14 Still important ideas Contrast the measurement of observable actions (and/or characteristics)

More information

Chapter 9. Factorial ANOVA with Two Between-Group Factors 10/22/ Factorial ANOVA with Two Between-Group Factors

Chapter 9. Factorial ANOVA with Two Between-Group Factors 10/22/ Factorial ANOVA with Two Between-Group Factors Chapter 9 Factorial ANOVA with Two Between-Group Factors 10/22/2001 1 Factorial ANOVA with Two Between-Group Factors Recall that in one-way ANOVA we study the relation between one criterion variable and

More information

A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY

A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY A COMPARISON OF IMPUTATION METHODS FOR MISSING DATA IN A MULTI-CENTER RANDOMIZED CLINICAL TRIAL: THE IMPACT STUDY Lingqi Tang 1, Thomas R. Belin 2, and Juwon Song 2 1 Center for Health Services Research,

More information

Linear Regression in SAS

Linear Regression in SAS 1 Suppose we wish to examine factors that predict patient s hemoglobin levels. Simulated data for six patients is used throughout this tutorial. data hgb_data; input id age race $ bmi hgb; cards; 21 25

More information

Understandable Statistics

Understandable Statistics Understandable Statistics correlated to the Advanced Placement Program Course Description for Statistics Prepared for Alabama CC2 6/2003 2003 Understandable Statistics 2003 correlated to the Advanced Placement

More information

Causal Mediation Analysis with the CAUSALMED Procedure

Causal Mediation Analysis with the CAUSALMED Procedure Paper SAS1991-2018 Causal Mediation Analysis with the CAUSALMED Procedure Yiu-Fai Yung, Michael Lamm, and Wei Zhang, SAS Institute Inc. Abstract Important policy and health care decisions often depend

More information

One way Analysis of Variance (ANOVA)

One way Analysis of Variance (ANOVA) One way Analysis of Variance (ANOVA) Esra Akdeniz March 22nd, 2016 Introduction Test hypothesis concerning one population mean. Test hypothesis concerning two population means What if we want to compare

More information