Basic Statistics and Data Analysis for Health Researchers from Foreign Countries
- Brendan Cobb
- 6 years ago
Quantifying association between continuous variables
Volkert Siersma, The Research Unit for General Practice in Copenhagen

Content
Quantifying association between continuous variables. In particular:
- Correlation
- (Simple) regression

Example: Newly diagnosed Type 2 Diabetes
A data set with 729 newly diagnosed Type 2 diabetes patients, with the columns pt, glucose, bmi, sex and age:
- pt: patient ID
- glucose: diagnostic plasma glucose (mmol/l)
- bmi: Body Mass Index (kg/m2)
- sex: sex (1 = male, 0 = female)
- age: age (years)

Research question
Do fat people have more severe diabetes when the diabetes is discovered? Or, in more statistical language: is diagnostic plasma glucose (positively) associated with the body mass index at the time of diagnosis?
Scatter-plot
When investigating a potential association between only two variables (like diagnostic plasma glucose and BMI), a scatter-plot is an important part of the analysis.
- It gives insight into the nature of the association.
- It shows problems in the data, e.g. outliers, strange or impossible values.

Scatter-plot
In the scatter-plot of diagnostic plasma glucose against BMI there is no apparent tendency, specifically not one that would support our research question. If we have to point out a tendency, it would be that high BMI is associated with lower diagnostic glucose (why is this not so strange if we think about the diagnosis of diabetes?).

Scatter-plot: R code
    plot(diabetes$bmi, diabetes$glucose, frame=TRUE, main=NULL, xlab="BMI (kg/m2)", ylab="Glucose (mmol/l)", col="green", pch=19)
There seem to be some very large values, especially for diagnostic plasma glucose. These are valid measurements. Maybe a log transformation of glucose would make associations more apparent?
Scatter-plot: log transformation
[Figure: scatter-plot after log transformation of glucose]

Measures of association
We want to capture the association between two variables in a single number: a correlation coefficient, a measure of association. Suppose that Y_i is the diagnostic plasma glucose of patient i and X_i the BMI of the same person. Then we want our measure of association to have the following characteristics:
- A positive association indicates that if X_i is large (relative to the rest of the sample) then Y_i is likely to be large as well.
- A negative association indicates that if X_i is large then Y_i is likely to be small.

Measures of association: between -1 and 1
- 1: perfect positive association
- 0: no association
- -1: perfect negative association

Measures of association for the diabetes data
r =
ρ =
τ =
Measures of association for the diabetes data, log transformed
r =
ρ =
τ =
Only the first one changes!

Pearson's correlation coefficient
Pearson's correlation coefficient is computed from the data set (X_i, Y_i), i = 1, ..., N as:

$$ r = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{(N-1)\, SD_x\, SD_y} $$

where $\bar{X}$ and $\bar{Y}$ are the respective means and $SD_x$ and $SD_y$ the respective standard deviations.

Characteristics of Pearson's correlation coefficient
Pearson's correlation coefficient has the following properties:
- It measures the degree of linear association.
- It is invariant to linear change of scale for the variables.
- It is not robust to outliers.
- Coefficient values that are comparable between different data sets, and moreover a valid confidence interval and p-value, require that both X_i and Y_i are normally distributed.

Pearson's correlation coefficient: R code
    > cor(diabetes$bmi, diabetes$glucose, use="complete.obs")
    [1]
Gives only the correlation coefficient.

    > cor.test(diabetes$bmi, diabetes$glucose)
    Pearson's product-moment correlation
    data: diabetes$bmi and diabetes$glucose
    t = , df = 723, p-value =
    alternative hypothesis: true correlation is not equal to 0
    95 percent confidence interval:
    sample estimates:
    cor
Also performs a statistical test to see whether the coefficient is different from zero.
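The formula for Pearson's correlation coefficient can be checked numerically. The following is a minimal sketch on simulated data, since the course's `diabetes` data set is not reproduced here; the variable names and the simulation parameters are assumptions made for illustration only. It computes r from the definition, compares it with `cor()`, and illustrates the invariance to a linear change of scale.

```r
# Simulated stand-in for the course data (assumption: not the real 'diabetes' data)
set.seed(1)
n <- 200
bmi <- rnorm(n, mean = 29, sd = 5)
glucose <- exp(rnorm(n, mean = 2.6 - 0.005 * (bmi - 29), sd = 0.35))

# Pearson's r straight from the definition:
# sum of cross-products of deviations, divided by (N - 1) * SD_x * SD_y
r_manual <- sum((bmi - mean(bmi)) * (glucose - mean(glucose))) /
  ((n - 1) * sd(bmi) * sd(glucose))

# The built-in function gives the same number
r_builtin <- cor(bmi, glucose)

# A linear change of scale (positive factors, arbitrary shifts) leaves r unchanged
r_rescaled <- cor(bmi / 10 + 3, 18 * glucose)

print(c(manual = r_manual, builtin = r_builtin, rescaled = r_rescaled))
```

Note that `sd()` in R uses the N - 1 denominator, which is why it matches the (N - 1) factor in the formula.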
Normally distributed?
[Figures: panels for BMI, Glucose, BMI and Log(Glucose), each with a Normal distribution for comparison.]
R code
A histogram of BMI:
    hist(diabetes$bmi, main="BMI", xlab="BMI (kg/m2)", col="green")
A Normal Q-Q plot of BMI:
    qqnorm(diabetes$bmi, main="BMI", col="green")
    qqline(diabetes$bmi, col="red")
And how do we get all these works of art in some decent format?
    jpeg(file="D:/mydirectory/mypicture.jpg", width=500, height=500)
    #
    # put here the code that generates the picture
    #
    dev.off()
In an R string a Windows path is written with forward slashes (or doubled backslashes); a single backslash is read as the start of an escape sequence.

Rank correlation: Spearman's ρ
If the data do not appear to be Normally distributed, or when there are outliers, one may instead compute the correlation between the ranks of the X_i values and the ranks of the Y_i values. This gives a nonparametric correlation coefficient called Spearman's ρ.
- It measures monotone association.
- It is invariant to monotone transformations (like a log transformation).
- It is robust to outliers.
- It has an odd interpretation.

Spearman's rank correlation coefficient: R code
    > cor.test(diabetes$bmi, diabetes$glucose, method="spearman")
    Spearman's rank correlation rho
    data: diabetes$bmi and diabetes$glucose
    S = , p-value =
    alternative hypothesis: true rho is not equal to 0
    sample estimates:
    rho
    Warning message:
    In cor.test.default(diabetes$bmi, diabetes$glucose, method = "spearman") :
      Cannot compute exact p-values with ties

Rank correlation: Kendall's τ
A measure of monotone association with a more intuitive interpretation than Spearman's ρ is Kendall's τ. The observations from a pair of subjects i, j are
- concordant if X_i < X_j and Y_i < Y_j, or X_i > X_j and Y_i > Y_j
- discordant if X_i < X_j and Y_i > Y_j, or X_i > X_j and Y_i < Y_j
Kendall's τ is the difference between the probability of a concordant pair and the probability of a discordant pair. There are various versions of Kendall's τ depending on how ties are treated.
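The description of Spearman's ρ as a correlation between ranks can be verified directly. The sketch below uses simulated values (an assumption, not the course data) and shows that ρ equals Pearson's coefficient applied to the ranks, and that a log transformation leaves ρ unchanged while Pearson's r generally changes.

```r
# Simulated stand-in data (assumption: illustrative values only)
set.seed(2)
x <- rnorm(100, mean = 29, sd = 5)          # plays the role of BMI
y <- exp(rnorm(100, mean = 2.6, sd = 0.4))  # plays the role of glucose (skewed)

# Spearman's rho is Pearson's r computed on the ranks
rho_builtin <- cor(x, y, method = "spearman")
rho_ranks   <- cor(rank(x), rank(y))

# A monotone transformation (here: log) does not change the ranks, so rho is unchanged
rho_log <- cor(x, log(y), method = "spearman")

# Pearson's r, by contrast, is computed on the values themselves
r_raw <- cor(x, y)
r_log <- cor(x, log(y))

print(c(rho = rho_builtin, rho_from_ranks = rho_ranks, rho_log = rho_log))
print(c(pearson_raw = r_raw, pearson_log = r_log))
```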
Characteristics of Kendall's tau
- It measures monotone association.
- It is invariant to monotone transformations (like a log transformation).
- It is robust to outliers.
- It has a more straightforward interpretation than Spearman's rho.

Kendall's rank correlation coefficient: R code
    > cor.test(diabetes$bmi, diabetes$glucose, method="kendall")
    Kendall's rank correlation tau
    data: diabetes$bmi and diabetes$glucose
    z = , p-value =
    alternative hypothesis: true tau is not equal to 0
    sample estimates:
    tau

Correlation in the diabetes data
r = (p = 0.110)
ρ = (p = 0.180)
τ = (p = 0.169)

Correlation in the diabetes data, log transformed
r = (p = 0.154)
ρ = (p = 0.180)
τ = (p = 0.169)
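The definition of Kendall's τ in terms of concordant and discordant pairs can be turned into a few lines of R. This is a sketch on simulated, tie-free data (an assumption; with ties the different versions of τ no longer coincide): it counts the pairs explicitly and compares the result with `cor(..., method = "kendall")`.

```r
# Simulated continuous data, so there are no ties (assumption for illustration)
set.seed(3)
x <- rnorm(50)
y <- 0.3 * x + rnorm(50)
n <- length(x)

# All pairs of subjects (i, j) with i < j, one pair per column
pairs <- combn(n, 2)
i <- pairs[1, ]
j <- pairs[2, ]

# +1 for a concordant pair, -1 for a discordant pair
s <- sign(x[i] - x[j]) * sign(y[i] - y[j])
n_concordant <- sum(s > 0)
n_discordant <- sum(s < 0)

# tau = proportion of concordant pairs minus proportion of discordant pairs
tau_manual  <- (n_concordant - n_discordant) / choose(n, 2)
tau_builtin <- cor(x, y, method = "kendall")

print(c(manual = tau_manual, builtin = tau_builtin))
```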
Limitations of correlation coefficients
- While it is (relatively) clear what a correlation coefficient of 0 means, and also 1 or -1, it is often unclear what a highly significant correlation of, say, 0.5 means.
- Correlation rarely answers the research question to a sufficient extent, because it is not easily interpretable.
- Coefficients of correlation depend on the sample selection and therefore we cannot compare values of the coefficients found in different data.

Regression analysis
An (intuitively interpretable) way to describe a (linear) association between two continuous variables.
We say: to regress Y on X, or: to regress glucose on BMI.

Regression model formulation
It models a response Y (the dependent variable, the endogenous variable, the output) as a function of a predictor X (the independent variable, the exogenous variable, the explanatory variable, the covariate) and a term representing random other influences (error, noise).
Mathematically:
    Y_i = α + βX_i + ε_i
where the ε_i are independent, Normally distributed noise terms with mean 0 and standard deviation σ.
Regression model
The mean of Y is modelled with a linear function of X; a line in the X-Y plane. For each X, Y is a random variable Normally distributed around the modelled mean of Y, with standard deviation σ.

[Figure: Scatter-plot with regression line]

Interpretation of the parameters
We have variation due to a systematic part, the explanatory variable, and a random part, the noise. The systematic part of the model is defined by the regression line.
- α = the intercept: mean level for Y_i when X_i = 0
- β = the slope: mean increase for Y_i when X_i is increased by 1 unit

Research question
Do fat people have more severe diabetes when the diabetes is discovered? Or, in more statistical language: is diagnostic plasma glucose (positively) associated with the body mass index at the time of diagnosis?
In a (simple) linear regression analysis: is the slope β different from 0 (or more pertinently, larger than 0)?
How does the model answer the research question?
Interest may focus on making a simple hypothesis about the two parameters:
- Null hypothesis: β = 0
- Null hypothesis: α = 0
The second hypothesis often has no (clinical) meaning.

Linear regression: R code
    > mymodel <- lm(diabetes$glucose~diabetes$bmi)
    > summary(mymodel)
    Call:
    lm(formula = diabetes$glucose ~ diabetes$bmi)
    Residuals:
    Min 1Q Median 3Q Max
    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept)                               <2e-16 ***
    diabetes$bmi
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    Residual standard error: on 723 degrees of freedom
    (4 observations deleted due to missingness)
    Multiple R-squared: , Adjusted R-squared:
    F-statistic: on 1 and 723 DF, p-value:
The Coefficients block is the table with parameter estimates. In the diabetes$bmi row, the Estimate column holds the estimate of the slope and the Pr(>|t|) column holds the p-value of the test for the null hypothesis β = 0.

Plot of regression line: R code
The model fitted with lm() can be passed to abline() to draw the regression line in the scatter-plot:
    > plot(diabetes$bmi, diabetes$glucose)
    > mymodel <- lm(diabetes$glucose~diabetes$bmi)
    > abline(mymodel)

Scatter-plot with regression line: log transformed glucose
    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept)                               <2e-16 ***
    diabetes$bmi
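The test of the null hypothesis β = 0 reported by `summary()` can be reproduced by hand from the slope estimate and its standard error. The following is a sketch on simulated data; the data frame `sim`, its column names and the true parameter values are assumptions made for illustration, not the course data.

```r
# Simulated stand-in data (assumption: not the real 'diabetes' data)
set.seed(6)
sim <- data.frame(bmi = rnorm(300, mean = 29, sd = 5))
sim$glucose <- 14 - 0.05 * sim$bmi + rnorm(300, sd = 4)

fit <- lm(glucose ~ bmi, data = sim)
coefs <- summary(fit)$coefficients   # the table with parameter estimates

# Slope estimate and its standard error
slope <- coefs["bmi", "Estimate"]
se    <- coefs["bmi", "Std. Error"]

# Two-sided test of H0: beta = 0, using the t distribution with N - 2 degrees of freedom
t_value <- slope / se
p_two_sided <- 2 * pt(-abs(t_value), df = fit$df.residual)

# One-sided p-value for the alternative beta > 0 ("larger than 0")
p_greater <- pt(t_value, df = fit$df.residual, lower.tail = FALSE)

# 95% confidence interval for the slope
ci <- confint(fit)["bmi", ]

print(c(slope = slope, t = t_value, p = p_two_sided, p_greater = p_greater))
print(ci)
```

Passing the variables through a `data` argument, as here, gives shorter coefficient names (`bmi` instead of `diabetes$bmi`) but fits the same kind of model as the slide code.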
How are the parameters estimated?
The estimated parameters of the linear model define the line (found among all possible lines) which minimizes the sum of the squared vertical distances between the data points and the line in the scatter-plot.
The estimation method is called ordinary least squares (with Normally distributed errors, maximum likelihood gives the same answer).

[Figure: Least squares fit]

Does the model fit the data?

[Figure: Diagnostic plots]
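For simple linear regression the least-squares line has a closed form: the slope is the sum of cross-products of deviations divided by the sum of squared deviations of X, and the intercept makes the line pass through the point of means. The sketch below, on simulated data (an assumption for illustration), computes both by hand, compares them with `lm()`, and checks that moving away from the estimates increases the residual sum of squares.

```r
# Simulated stand-in data (assumption: illustrative values only)
set.seed(4)
x <- rnorm(150, mean = 29, sd = 5)
y <- 14 - 0.05 * x + rnorm(150, sd = 4)

# Closed-form ordinary least-squares estimates
beta_hat  <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
alpha_hat <- mean(y) - beta_hat * mean(x)

# The same estimates from lm()
fit <- lm(y ~ x)

# Residual sum of squares for an arbitrary line y = a + b * x
rss <- function(a, b) sum((y - a - b * x)^2)

print(c(alpha = alpha_hat, beta = beta_hat))
print(coef(fit))
print(c(rss_at_estimate = rss(alpha_hat, beta_hat),
        rss_shifted     = rss(alpha_hat + 0.1, beta_hat),
        rss_tilted      = rss(alpha_hat, beta_hat + 0.01)))
```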
Diagnostic plots
R produces some diagnostic plots (of varying usefulness).
    > mymodel <- lm(diabetes$glucose~diabetes$bmi)
    > opar <- par(mfrow = c(2,2), oma = c(0,0,1.1,0))
    > plot(mymodel)
    > par(opar)
- The residuals (the error or noise) were supposed to be Normally distributed; this can be studied in the Q-Q plot (top right).
- More importantly, the residuals should have a single standard deviation, i.e. the variance should not increase with, for example, BMI. This can be studied in the residuals vs. fitted plot (top left).

Data transformations
If the residuals are not Normal, or (and this is more serious, because the central limit theorem deals with much of the non-Normality issue) if the variance seems to increase with level, it may be a good idea to transform one or both variables.
This is the real reason to investigate log(glucose) instead of glucose.

[Figure: Data transformations, log transform]

[Figure: The influence of one outlier]
Simpson's paradox
Florida death penalty verdicts for homicide relative to defendant's race. Each cell gives the percentage sentenced to death, followed by the counts (death sentences / other verdicts).

                  Defendant white    Defendant black
    Victim white  11% (53/414)       23% (11/37)
    Victim black  0% (0/16)          3% (4/139)
    All victims   11% (53/430)       8% (15/176)

- Blacks tend to murder blacks and whites tend to murder whites, and the murder of a white person has a higher probability of a death penalty.
- Whatever the victim's race, a black defendant is more likely to get the death penalty (about 2 times more likely when the victim is white), even though overall the percentage is lower for black defendants.

Confounding
[Diagram: Victim's race (confounder), with arrows to Defendant's race (exposure) and Death penalty (outcome); the association between exposure and outcome is highlighted in green.]
We are interested in the green highlighted association, but the victim's race is correlated with both the defendant's race and the outcome of the trial.
- A confounder influences both exposure and outcome.
- When confounding is present we cannot interpret the green highlighted association as causal.
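The arithmetic behind the paradox can be reproduced from the counts in the table. The sketch below reads each pair of counts as (death sentences / other verdicts); this reading is an assumption, chosen because it is the one under which the stated percentages come out. The object names are introduced here for illustration.

```r
# Counts from the table, read as death sentences and other verdicts (assumption)
death <- matrix(c(53, 11, 0, 4), nrow = 2,
                dimnames = list(defendant = c("white", "black"),
                                victim    = c("white", "black")))
no_death <- matrix(c(414, 37, 16, 139), nrow = 2,
                   dimnames = dimnames(death))

# Death penalty rate within each victim group (stratified analysis)
rate_by_victim <- death / (death + no_death)

# Death penalty rate ignoring the victim's race (crude analysis)
rate_overall <- rowSums(death) / rowSums(death + no_death)

print(round(100 * rate_by_victim))  # percentages per defendant and victim race
print(round(100 * rate_overall))    # percentages per defendant race, all victims
```

Within both victim groups the rate is higher for black defendants, yet the crude rate is higher for white defendants, because white defendants mostly have white victims and those cases carry the higher death penalty rate.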
Randomization
[Diagram: Confounder, Exposure (randomised), Outcome; the association between exposure and outcome is highlighted in green.]
- Often there are many factors that may influence both exposure and outcome; some of them may not be observed or are unknown.
- If exposure is randomised, then there is no confounding. The green highlighted association can then be interpreted causally.

Two regressions
- The blue points denote patients with SBP > 140 mmHg; the blue line is the corresponding regression line.
- The red points denote patients with SBP < 140 mmHg; the red line is the corresponding regression line.
- The black line is the general regression line.
The slopes from the stratified analyses are less steep than the slope of the general line.

Multiple regression
    > mymodel <- lm(log(diabetes$glucose)~diabetes$bmi+diabetes$sbp)
    > summary(mymodel)
    Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
    (Intercept)                               <2e-16 ***
    diabetes$bmi
    diabetes$sbp                                     *
    ---
    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The adjusted slope (association) of bmi is less pronounced than before. SBP is related to both glucose and bmi and is a confounder.

Multiple regression
- Adjusting a statistical analysis means including other predictor variables in the model formula.
- Intuitively, a slope for BMI is determined for each level of the SBP variable separately and these are then averaged.
- Including SBP in the analysis removes the confounding effect of SBP from the relationship between log(glucose) and BMI.
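How adjustment changes a slope can be illustrated with a small simulation. In the sketch below all data are simulated and the effect sizes are arbitrary assumptions: SBP influences both BMI and log(glucose), so it acts as a confounder, and the true direct effect of BMI is set close to zero. The crude slope for BMI then picks up part of the SBP effect, while the adjusted slope does not.

```r
# Simulated data with SBP as a confounder (assumption: arbitrary effect sizes)
set.seed(5)
n <- 2000
sbp <- rnorm(n, mean = 140, sd = 20)
bmi <- 20 + 0.06 * sbp + rnorm(n, sd = 3)            # SBP influences BMI
log_glucose <- 4 - 0.010 * sbp - 0.002 * bmi +       # SBP influences the outcome
  rnorm(n, sd = 0.3)
sim <- data.frame(sbp, bmi, log_glucose)

# Crude analysis: BMI only
crude <- lm(log_glucose ~ bmi, data = sim)

# Adjusted analysis: SBP included in the model formula
adjusted <- lm(log_glucose ~ bmi + sbp, data = sim)

print(c(crude_slope    = unname(coef(crude)["bmi"]),
        adjusted_slope = unname(coef(adjusted)["bmi"]),
        true_slope     = -0.002))
```

In this setup the adjusted slope is expected to lie close to the simulated direct effect, while the crude slope is clearly steeper; with real data the true value is of course unknown.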
Take home message
Association between two continuous variables may be measured by correlation coefficients or in a (simple) linear regression analysis.
The latter arguably provides the most interpretable results. Moreover, it is straightforwardly extended to deal with confounding, and more.
Summary & Conclusion Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4.0 Overview 1. Survey research 2. Survey design 3. Descriptives & graphing 4. Correlation
More informationSurvey research (Lecture 1)
Summary & Conclusion Lecture 10 Survey Research & Design in Psychology James Neill, 2015 Creative Commons Attribution 4.0 Overview 1. Survey research 2. Survey design 3. Descriptives & graphing 4. Correlation
More informationImmunological Data Processing & Analysis
Immunological Data Processing & Analysis Hongmei Yang Center for Biodefence Immune Modeling Department of Biostatistics and Computational Biology University of Rochester June 12, 2012 Hongmei Yang (CBIM
More information1.4 - Linear Regression and MS Excel
1.4 - Linear Regression and MS Excel Regression is an analytic technique for determining the relationship between a dependent variable and an independent variable. When the two variables have a linear
More informationQuantitative Methods in Computing Education Research (A brief overview tips and techniques)
Quantitative Methods in Computing Education Research (A brief overview tips and techniques) Dr Judy Sheard Senior Lecturer Co-Director, Computing Education Research Group Monash University judy.sheard@monash.edu
More informationbivariate analysis: The statistical analysis of the relationship between two variables.
bivariate analysis: The statistical analysis of the relationship between two variables. cell frequency: The number of cases in a cell of a cross-tabulation (contingency table). chi-square (χ 2 ) test for
More informationWDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you?
WDHS Curriculum Map Probability and Statistics Time Interval/ Unit 1: Introduction to Statistics 1.1-1.3 2 weeks S-IC-1: Understand statistics as a process for making inferences about population parameters
More informationDescribe what is meant by a placebo Contrast the double-blind procedure with the single-blind procedure Review the structure for organizing a memo
Please note the page numbers listed for the Lind book may vary by a page or two depending on which version of the textbook you have. Readings: Lind 1 11 (with emphasis on chapters 10, 11) Please note chapter
More informationResearch Analysis MICHAEL BERNSTEIN CS 376
Research Analysis MICHAEL BERNSTEIN CS 376 Last time What is a statistical test? Chi-square t-test Paired t-test 2 Today ANOVA Posthoc tests Two-way ANOVA Repeated measures ANOVA 3 Recall: hypothesis testing
More informationThe Pretest! Pretest! Pretest! Assignment (Example 2)
The Pretest! Pretest! Pretest! Assignment (Example 2) May 19, 2003 1 Statement of Purpose and Description of Pretest Procedure When one designs a Math 10 exam one hopes to measure whether a student s ability
More informationStandard Scores. Richard S. Balkin, Ph.D., LPC-S, NCC
Standard Scores Richard S. Balkin, Ph.D., LPC-S, NCC 1 Normal Distributions While Best and Kahn (2003) indicated that the normal curve does not actually exist, measures of populations tend to demonstrate
More informationMidterm Exam ANSWERS Categorical Data Analysis, CHL5407H
Midterm Exam ANSWERS Categorical Data Analysis, CHL5407H 1. Data from a survey of women s attitudes towards mammography are provided in Table 1. Women were classified by their experience with mammography
More informationSTAT 201 Chapter 3. Association and Regression
STAT 201 Chapter 3 Association and Regression 1 Association of Variables Two Categorical Variables Response Variable (dependent variable): the outcome variable whose variation is being studied Explanatory
More information1 Simple and Multiple Linear Regression Assumptions
1 Simple and Multiple Linear Regression Assumptions The assumptions for simple are in fact special cases of the assumptions for multiple: Check: 1. What is external validity? Which assumption is critical
More informationStill important ideas
Readings: OpenStax - Chapters 1 11 + 13 & Appendix D & E (online) Plous - Chapters 2, 3, and 4 Chapter 2: Cognitive Dissonance, Chapter 3: Memory and Hindsight Bias, Chapter 4: Context Dependence Still
More informationMultiple Linear Regression Analysis
Revised July 2018 Multiple Linear Regression Analysis This set of notes shows how to use Stata in multiple regression analysis. It assumes that you have set Stata up on your computer (see the Getting Started
More informationCHILD HEALTH AND DEVELOPMENT STUDY
CHILD HEALTH AND DEVELOPMENT STUDY 9. Diagnostics In this section various diagnostic tools will be used to evaluate the adequacy of the regression model with the five independent variables developed in
More informationName: emergency please discuss this with the exam proctor. 6. Vanderbilt s academic honor code applies.
Name: Biostatistics 1 st year Comprehensive Examination: Applied in-class exam May 28 th, 2015: 9am to 1pm Instructions: 1. There are seven questions and 12 pages. 2. Read each question carefully. Answer
More informationQuestion 1(25= )
MSG500 Final 20-0-2 Examiner: Rebecka Jörnsten, 060-49949 Remember: To pass this course you also have to hand in a final project to the examiner. Open book, open notes but no calculators or computers allowed.
More informationLecture 12: more Chapter 5, Section 3 Relationships between Two Quantitative Variables; Regression
Lecture 12: more Chapter 5, Section 3 Relationships between Two Quantitative Variables; Regression Equation of Regression Line; Residuals Effect of Explanatory/Response Roles Unusual Observations Sample
More informationBivariate Correlations
Bivariate Correlations Brawijaya Professional Statistical Analysis BPSA MALANG Jl. Kertoasri 66 Malang (0341) 580342 081 753 3962 Bivariate Correlations The Bivariate Correlations procedure computes the
More informationApplication of Local Control Strategy in analyses of the effects of Radon on Lung Cancer Mortality for 2,881 US Counties
Application of Local Control Strategy in analyses of the effects of Radon on Lung Cancer Mortality for 2,881 US Counties Bob Obenchain, Risk Benefit Statistics, August 2015 Our motivation for using a Cut-Point
More informationConditional Distributions and the Bivariate Normal Distribution. James H. Steiger
Conditional Distributions and the Bivariate Normal Distribution James H. Steiger Overview In this module, we have several goals: Introduce several technical terms Bivariate frequency distribution Marginal
More informationSTATISTICS AND RESEARCH DESIGN
Statistics 1 STATISTICS AND RESEARCH DESIGN These are subjects that are frequently confused. Both subjects often evoke student anxiety and avoidance. To further complicate matters, both areas appear have
More informationBiostatistics II
Biostatistics II 514-5509 Course Description: Modern multivariable statistical analysis based on the concept of generalized linear models. Includes linear, logistic, and Poisson regression, survival analysis,
More informationPopper If data follows a trend that is not linear, we cannot make a prediction about it. a. True b. False
Popper 12 1. If data follows a trend that is not linear, we cannot make a prediction about it. a. True b. False 5.5 Non-Linear Methods Many times a scatter-plot reveals a curved pattern instead of a linear
More informationBiostatistics for Med Students. Lecture 1
Biostatistics for Med Students Lecture 1 John J. Chen, Ph.D. Professor & Director of Biostatistics Core UH JABSOM JABSOM MD7 February 14, 2018 Lecture note: http://biostat.jabsom.hawaii.edu/education/training.html
More informationChapter 3: Describing Relationships
Chapter 3: Describing Relationships Objectives: Students will: Construct and interpret a scatterplot for a set of bivariate data. Compute and interpret the correlation, r, between two variables. Demonstrate
More informationMTH 225: Introductory Statistics
Marshall University College of Science Mathematics Department MTH 225: Introductory Statistics Course catalog description Basic probability, descriptive statistics, fundamental statistical inference procedures
More information