Application of Local Control Strategy in analyses of the effects of Radon on Lung Cancer Mortality for 2,881 US Counties
1 Bob Obenchain, Risk Benefit Statistics, August 2015

Our motivation for using a cut-point of 2.6 pCi/L for the Radon level that defines a High-Low "Treatment" dichotomy is given by the initial binary split in the following Partition Regression (Tree) model.

Partition of (unadjusted) Lung Cancer Mortality on Radon
[JMP Partition report: RSquare, RMSE, N, Number of Splits, AICc. All Rows: Count, Mean, Std Dev, LogWorth, Difference. Radon>=2.6: Count 1220, Mean, Std Dev. Radon<2.6: Count 1661, Mean, Std Dev.]

"Low Radon": level strictly less than 2.6 pCi/L (picocuries per liter).
"High Radon": level of 2.6 pCi/L (picocuries per liter) or greater.

We will see below that higher Radon levels are associated with lower Lung Cancer Mortality rates. Neither this analysis nor the ones depicted on page 2 have been "covariate adjusted" for possible X-confounding factors included within the datasets being analyzed here.
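How a tree's initial binary split can suggest a cut-point may be easier to see in a small sketch. It assumes the split is chosen to give the largest reduction in the sum of squared errors of the outcome; the LogWorth statistic that JMP reports is not computed here. The helper names `sse` and `best_split` and the six toy (radon, mortality) pairs are hypothetical, not values from the county dataset.

```python
from statistics import fmean


def sse(values):
    """Sum of squared deviations of values from their mean."""
    m = fmean(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Find the cut c such that splitting into x < c and x >= c
    gives the largest reduction in SSE of y. Returns (c, reduction)."""
    pairs = sorted(zip(x, y))
    ys = [v for _, v in pairs]
    total = sse(ys)
    best_cut, best_gain = None, 0.0
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # tied x values cannot be separated by a cut
        gain = total - sse(ys[:i]) - sse(ys[i:])
        if gain > best_gain:
            best_cut, best_gain = pairs[i][0], gain
    return best_cut, best_gain


# Hypothetical toy data (not from the county dataset)
radon = [0.5, 1.0, 2.0, 2.6, 3.0, 4.0]
mortality = [70.0, 68.0, 72.0, 55.0, 53.0, 57.0]
cut, gain = best_split(radon, mortality)
print(cut, gain)
```

The returned cut is the smallest x value in the upper group, which matches the "Low: strictly less than, High: at least" convention used above.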
2 Prediction of Lung Cancer Mortality from Ln[Rn], unadjusted for all other X-confounders

Ln[Rn] = natural logarithm of Radon level. Here, 10 US counties with Radon level coded as "0.0" have been Winsorized in the dataset to Ln[0.05] ≈ -2.996. The cut-point at Radon = 2.6 pCi/L (Ln[Rn] ≈ 0.956) is used in the fits displayed on this page only to color counties either Red or Blue.

[JMP Linear Fit report for Lung Cancer Mortality on Ln[Rn]: Summary of Fit (RSquare, RSquare Adj, Root Mean Square Error, Mean of Response, Observations (or Sum Wgts) = 2881); Analysis of Variance (Source, DF, Sum of Squares, Mean Square, F Ratio; Prob > F <.0001*); Parameter Estimates (Term, Estimate, Std Error, t Ratio, Prob > |t|) for Intercept (<.0001*) and Ln[Rn] (<.0001*).]

[Smoothing Spline Fit, lambda = 5: R-Square, Sum of Squares Error.]
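The transformation and unadjusted straight-line fit described on this page can be sketched as follows. This is a minimal illustration, assuming that the Winsorizing step simply raises readings below 0.05 to 0.05 before taking the natural log; the function names and the four toy counties are hypothetical, and the fit shown is ordinary least squares rather than JMP's own output.

```python
import math

FLOOR = 0.05  # value substituted for radon readings coded as 0.0


def log_radon(rn, floor=FLOOR):
    """Natural log of the radon level after raising low values to the floor."""
    return math.log(max(rn, floor))


def simple_ols(x, y):
    """Least-squares intercept and slope for y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


print(round(log_radon(0.0), 3))  # floor applied before taking the log
print(round(log_radon(2.6), 3))  # the High-Low cut-point on the log scale

# Hypothetical toy data (not from the county dataset)
radon = [0.0, 1.0, 2.6, 5.0]
mortality = [70.0, 62.0, 58.0, 55.0]
intercept, slope = simple_ols([log_radon(r) for r in radon], mortality)
print(intercept, slope)
```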
3 Output from the "Local Control" JMP Add-In: Pages 3, 4 and 5

Outcome Variable: Lung Cancer Mortality
Treatment Variable: Radon Level Flag
Cluster Effect Type: Fixed
Variability Assumption: Homoskedastic
Random Number Seed:
Specify Number of Clusters = 50
Specify Number of Permutations =

[Figure: Mean_LTD; LTD distribution for 50 clusters]
4 Hierarchical Clustering Method = Fast Ward

X-variables: Obesity (%), Currently Smoke, Age Over 65 (%)

[Figure: Dendrogram; Hierarchically Clustered Differences]
5 Response Lung Cancer Mortality -- Nested ANOVA (Treatment within Cluster)

[JMP Fit Model report: Summary of Fit (RSquare, RSquare Adj, Root Mean Square Error, Mean of Response, Observations (or Sum Wgts) = 2881); Analysis of Variance (Source, DF, Sum of Squares, Mean Square, F Ratio; Prob > F <.0001*); Effect Tests (Source, Nparm, DF, Sum of Squares, F Ratio, Prob > F) for Cluster (<.0001*) and High Radon[Cluster] (<.0001*).]

NOTE: Cluster #10 is uninformative about Lung Cancer Mortality LTDs (High minus Low Radon) because all 11 US counties it contains have Radon levels less than 2.6 pCi/L (picocuries per liter). This explains why there are only 49 (rather than 50) degrees of freedom for Treatment-within-Cluster (LTD) effects. By convention, only 49 degrees of freedom are attributed to main effects of the 50 Clusters; the overall mean effect for mortality is simply removed and not shown in the ANOVA table.

Although it may not be obvious from the entries in the Nested ANOVA table above, results depend upon the choice of the High-Low Radon cut-point (2.6 pCi/L here) as well as the numbers of both requested and informative clusters (50 and 49, respectively).

Specifically, the y-outcome column vector here has 2,881 rows for US counties and consists of Lung Cancer Mortality rates viewed as realizations of a continuous random variable. Furthermore, the "design" matrix has only two non-constant columns, both viewed as fixed (given) categorical variables:
1. The vector of treatment indicators has 2 levels - say, zeros (Low Radon) and ones (High Radon).
2. The vector of cluster membership indicators has 50 levels - say, the integers 1 through 50.

The analysis of this cross-classification of mortality rates is essentially nonparametric because no information is used on either how clusters were formed / defined from county X-characteristics or what the numerical values of those X-characteristics are.
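The cross-classification described above needs only three columns per county: a cluster label, a High/Low flag and the outcome. The following sketch computes one Local Treatment Difference (High mean minus Low mean) per cluster and flags clusters that contain only one treatment level as uninformative. The function name and the nine toy counties are hypothetical; this is an illustration of the bookkeeping, not the JMP add-in's code.

```python
from collections import defaultdict
from statistics import fmean


def local_treatment_differences(cluster, high, y):
    """Map cluster id -> (LTD, number of counties).
    LTD is None when the cluster lacks either High or Low counties."""
    groups = defaultdict(lambda: ([], []))  # cluster -> (low outcomes, high outcomes)
    for c, t, v in zip(cluster, high, y):
        groups[c][1 if t else 0].append(v)
    result = {}
    for c, (low, hi) in groups.items():
        ltd = fmean(hi) - fmean(low) if low and hi else None
        result[c] = (ltd, len(low) + len(hi))
    return result


# Hypothetical toy data: cluster 3 has no High Radon counties
cluster = [1, 1, 1, 1, 2, 2, 2, 3, 3]
high = [0, 0, 1, 1, 0, 1, 1, 0, 0]
y = [60.0, 64.0, 50.0, 54.0, 70.0, 66.0, 62.0, 58.0, 59.0]

ltds = local_treatment_differences(cluster, high, y)
informative = [c for c, (ltd, _) in ltds.items() if ltd is not None]
print(ltds)
print("informative clusters:", len(informative))
```

In this toy example two of three clusters are informative, which parallels the 49 of 50 informative clusters (and 49 degrees of freedom for Treatment-within-Cluster effects) noted above.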
6 Aggregate Phase: Observed LTD Distribution (49 Informative Clusters containing 2,870 US Counties)

[Figure: Observed Local Treatment Difference (LTD) Distribution for 50 Ward Clusters]

Lung Cancer Mortality is measured in deaths per 100,000 person-years. LTDs are differences in mortality rates: Radon High minus Low.

The histogram above depicts the Most Typical LTD Distribution derived from micro-aggregation of 2,881 US counties on 3 primary X-confounders:
o Age Over 65 %
o Currently Smoke %
o Obesity %

Y-outcome = Lung Cancer Mortality
Binary Treatment Indicator: Radon High (at least 2.6 pCi/L) vs. Low

Best fitting Normal approximation has mean µ = deaths and std. dev. σ =
7 Confirm Phase: Comparison of empirical Cumulative Distribution Functions (CDFs)

[Figure: empirical CDFs of the Random Permutation LTD-like Distribution and the Observed LTD Distribution]

These two distributions are rather clearly different; they differ most on statistical measures of location and shape (skewness, kurtosis, range). Also see the histograms and statistics listed on the next page.

This means that clustering (local conditioning, matching) on 3 primary X-confounders [% over 65, % currently smoke and % obese] has indeed yielded appropriately adjusted treatment effect-size estimates.

Local treatment effect-size estimates are LTDs expressed as a difference in mortality rates (deaths per 100,000 person-years) of the form: average for one-or-more High Radon counties minus average for one-or-more Low Radon counties.
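One way to picture the comparison on this page is the sketch below. It assumes the "LTD-like" reference distribution is produced by shuffling cluster labels across counties (keeping cluster sizes, treatment flags and outcomes fixed) and recomputing the High minus Low differences; that reading of the permutation step, the function names and the toy data are all assumptions made for illustration. The gap between the two empirical CDFs is summarized by their largest vertical distance.

```python
import random
from collections import defaultdict
from statistics import fmean


def ltds(cluster, high, y):
    """High-minus-Low mean outcome for every cluster holding both levels."""
    low, hi = defaultdict(list), defaultdict(list)
    for c, t, v in zip(cluster, high, y):
        (hi if t else low)[c].append(v)
    return [fmean(hi[c]) - fmean(low[c]) for c in sorted(set(low) & set(hi))]


def permuted_ltds(cluster, high, y, n_perm=200, seed=12345):
    """Pool LTD-like values over n_perm random relabelings of the clusters."""
    rng = random.Random(seed)
    labels = list(cluster)
    pooled = []
    for _ in range(n_perm):
        rng.shuffle(labels)  # breaks any link between clusters and counties
        pooled.extend(ltds(labels, high, y))
    return pooled


def max_cdf_gap(a, b):
    """Largest vertical distance between the empirical CDFs of a and b."""
    gap = 0.0
    for v in sorted(set(a) | set(b)):
        fa = sum(x <= v for x in a) / len(a)
        fb = sum(x <= v for x in b) / len(b)
        gap = max(gap, abs(fa - fb))
    return gap


# Hypothetical toy data (not from the county dataset)
cluster = [1, 1, 1, 1, 2, 2, 2, 3, 3]
high = [0, 0, 1, 1, 0, 1, 1, 0, 0]
y = [60.0, 64.0, 50.0, 54.0, 70.0, 66.0, 62.0, 58.0, 59.0]

observed = ltds(cluster, high, y)
null = permuted_ltds(cluster, high, y)
if null:
    print(observed, max_cdf_gap(observed, null))
```

If the clusters carried no information about the outcome, the observed and permuted distributions would be expected to look alike; a clear gap is what the text above interprets as evidence that clustering mattered.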
8 [Summary statistics (Mean, Std Dev, Std Err Mean, Upper 95% Mean, Lower 95% Mean, N, Skewness, Kurtosis) for two distributions: Random Permutation LTD-like Distribution, N = 50 * 2870; Observed LTD Distribution, N = 2870.]
9 Explore Phases:

Tried using Complete Linkage as well as Fast Ward clustering in JMP.

Tried using combinations of 3 out of 5 potential X-confounders for clustering:
o Age Over 65 %
o Obesity %
o Currently Smoke %
o Ever Smoke %
o Median Household Income ($1,000s)

Tried varying the total number of clusters used from 50 to 400.

Reveal Phase:

NOTE: Cluster #10 is uninformative about LTDs and contains 11 counties. Thus the following predictions use the data from only 2,870 US counties, i.e., LTD missingness is not considered informative of potential treatment effect-sizes.

Fitted Supervised Learning Models for predicting observed LTDs:
o JMP 11 Analyze > Modeling Platform > Partition option
   - Single Tree (7 terminal nodes)
   - Bootstrap Forest: Model Average of 100 Trees
o JMP Analyze > Fit Model Platform
   - Multi Variable Regression (Degree at most 2)

Tried using as many as 6 potential X-confounders for predicting observed LTDs:
o Age Over 65 %
o Obesity %
o Currently Smoke %
o Ever Smoke %
o Median Household Income ($1,000s)
o Numeric Radon (or Ln[Rn]) Level, as either an ordinal or continuous measure
10 Predicting LTDs using Supervised Learning: Method One (Single "Small" Tree), R² = 0.51

Partition - Best such Tree for predicting LTDobserved (6 splits, 7 terminal nodes)

[JMP Partition report for LTD: RSquare, RMSE, N, Number of Splits, AICc.]
11 Best "Small" Tree: Mean = average LTD within Leaf

Note that all 3 splits on "Age Over 65 %" are such that the counties with the higher % elderly population are predicted to have LARGER (more negative) ADVANTAGES of High Radon in keeping Lung Cancer Mortality low.

Note also that both splits on "Currently Smoke %" are such that the counties with the lower % smoking are predicted to have LARGER (more negative) ADVANTAGES of High Radon in keeping Lung Cancer Mortality low.

Finally, the single split on "Obesity %" is such that the counties with the lower % obese are predicted to have LARGER (more negative) ADVANTAGES of High Radon in keeping Lung Cancer Mortality low.

X-Confounder Contributions (columns: Term, Number of Splits, SS, Portion), with terms listed in this order:
o Age Over 65 (%)
o Currently Smoke (%)
o Obesity (%)
o Radon level in pCi/L
o Ever Smoke (%)
o Median HH Income

Although membership in the High or Low Treatment cohorts is perfectly predicted by Radon level within a county, it is somewhat interesting that Radon level is not used in the above predictions of the corresponding LTDs in Lung Cancer Mortality rate.
12 Predicting LTDs using Supervised Learning: Method Two (Bootstrap Forest), R² = 0.78

Bootstrap Forest for LTDobserved
Number of trees in the forest: 250
Number of terms sampled per split: 4
Training rows: 2870
Validation rows: 0
Test rows: 0
Number of terms: 6
Bootstrap samples: 2870
Minimum Splits Per Tree: 6
Minimum Size Split: 20

[Overall Statistics: Individual Trees RMSE (In Bag, Out of Bag); RSquare, RMSE, N.]
13 Observed LTD Estimates vs their Forest Predictions

[Figure: Observed LTD estimates plotted against their Forest predictions]

X-Confounder Contributions (columns: Term, Number of Splits, SS, Portion), with terms listed in this order:
o Age Over 65 (%)
o Currently Smoke (%)
o Obesity (%)
o Ever Smoke (%)
o Median HH Income
o Radon level in pCi/L

NOTE: Because Partitioning methods (Trees and Forests) use only the ordinal information about Clusters formed using X-confounders, this could help explain why they do not find Radon Level particularly predictive of LTDs, i.e., Radon level in pCi/L ranks 6th out of six in the above table of X-confounder predictability!

On the other hand, traditional fully-parametric model fitting methods assume all continuous variables are measured on an interval scale, i.e., individual terms can represent linear or quadratic effects or hyperbolic interactions. We will see that numerical values of Radon level are much more predictive of LTDs under these much stronger assumptions.

14 Predicting LTDs using Supervised Learning: Method Three (MultiVariable Regression), R² = 0.49

[JMP Fit Model report: Summary of Fit (RSquare, RSquare Adj, Root Mean Square Error, Mean of Response, Observations (or Sum Wgts) = 2870); Analysis of Variance (Source, DF, Sum of Squares, Mean Square, F Ratio; Prob > F <.0001*).]

Parameter Estimates (columns: Term, Estimate, Std Error, t Ratio, Prob > |t|); only the significance markers survive:
o Intercept: <.0001*
o Radon: *
o Obesity (%): <.0001*
o Age Over 65 (%): <.0001*
o Currently Smoke: <.0001*
o Ever Smoke
o (Radon )*(Age Over 65 (%) ): <.0001*
o (Age Over 65 (%) )*(Currently Smoke ): *
o (Currently Smoke )*(Ever Smoke ): <.0001*
o (Obesity (%) )*(Obesity (%) ): <.0001*
o (Age Over 65 (%) )*(Age Over 65 (%) ): <.0001*

The 2 quadratic terms (in % obese and % over 65) used here seem particularly curious. Furthermore, including such terms in multi-variable regression model(s) can cause any predictions made strictly outside of the observed ranges of the given X-variables to represent potentially severe and unwarranted extrapolations.

Finally, of the three methods considered for predicting LTD estimates from six available X-confounding factors, traditional MultiVariable Regression is the least accurate.
15 Correlations between Observed LTDs and their Predictions

[Correlation table among LTD observed, LTDtreePred, LTDforestPred and LTDmvregPred, followed by a row of R-squared values, one of which is marked *.]

* The R² value for Bootstrap Forest "model averaging" of (only) 0.78 listed on page 12 apparently incorporates some sort of further "adjustment" or penalty for being thorough, complicated or versatile.

[Figure: Scatterplot Matrix of LTD, LTDtreePred, LTDforestPred and LTDmvregPred]
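As background for comparing a squared correlation with a reported R², the sketch below computes both quantities for the same pair of observed and predicted vectors. The two agree for least-squares fits with an intercept, but they can differ for other kinds of predictions; whether this accounts for the discrepancy noted above is not established here. The function names and the four toy values are hypothetical.

```python
import math


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (z - mb) for x, z in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((z - mb) ** 2 for z in b)
    return sab / math.sqrt(saa * sbb)


def r_squared(observed, predicted):
    """1 - SSE/SST, the share of outcome variation captured by predictions."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


# Hypothetical toy values: predictions are perfectly correlated but biased by +1
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
print(pearson_r(observed, predicted) ** 2, r_squared(observed, predicted))
```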
More informationData Analysis in Practice-Based Research. Stephen Zyzanski, PhD Department of Family Medicine Case Western Reserve University School of Medicine
Data Analysis in Practice-Based Research Stephen Zyzanski, PhD Department of Family Medicine Case Western Reserve University School of Medicine Multilevel Data Statistical analyses that fail to recognize
More informationSection 6: Analysing Relationships Between Variables
6. 1 Analysing Relationships Between Variables Section 6: Analysing Relationships Between Variables Choosing a Technique The Crosstabs Procedure The Chi Square Test The Means Procedure The Correlations
More informationTwo-Way Independent Samples ANOVA with SPSS
Two-Way Independent Samples ANOVA with SPSS Obtain the file ANOVA.SAV from my SPSS Data page. The data are those that appear in Table 17-3 of Howell s Fundamental statistics for the behavioral sciences
More informationPrepared by: Assoc. Prof. Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies
Prepared by: Assoc. Prof. Dr Bahaman Abu Samah Department of Professional Development and Continuing Education Faculty of Educational Studies Universiti Putra Malaysia Serdang At the end of this session,
More informationHere are the various choices. All of them are found in the Analyze menu in SPSS, under the sub-menu for Descriptive Statistics :
Descriptive Statistics in SPSS When first looking at a dataset, it is wise to use descriptive statistics to get some idea of what your data look like. Here is a simple dataset, showing three different
More informationSurvey research (Lecture 1) Summary & Conclusion. Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4.
Summary & Conclusion Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4.0 Overview 1. Survey research 2. Survey design 3. Descriptives & graphing 4. Correlation