Kidane Tesfu Habtemariam, MASTAT, Principle of Stat Data Analysis Project work


1. INTRODUCTION

A food label states the number of calories contained in the food package, that is, the amount of energy in the food. People pay attention to calories because eating more calories than the body uses can lead to weight gain.

This project paper presents a statistical analysis of a research problem concerning the accuracy of caloric labeling of diet and health foods, taken from the work published by David B. Allison in the Journal of the American Medical Association (JAMA), September 1993.

Research setting: Foods were sampled from retail merchants throughout the borough of Manhattan, New York, NY. The researchers sampled 40 different food items across three categories: regionally distributed, nationally advertised and locally prepared. They measured the caloric content of each food item by bomb calorimetry and converted the readings into an estimate of total metabolizable energy. In addition, they calculated the percentage difference between the measured calories and the labeled calories, both per item and per gram.

1.1 Objectives

- Determine the accuracy of caloric labeling of diet and health foods.
- Assess whether the accuracy differs for certain categories of food suppliers.
- Evaluate whether there is evidence of overall underreporting/over reporting of calories per gram on food labels.
- Assess whether the degree of underreporting/over reporting of calories per gram differs between regional and national foods.
- Evaluate whether the variability of underreporting/over reporting of calories per gram differs between regional and national foods.
- Analyze whether the degree of underreporting/over reporting of calories per item differs across food suppliers.
- Examine whether there is any relationship between the relative frequency of underreporting of calories per gram and the type of food supplier.

2. Data and Methodology

Data

The sample consists of 40 food items: regionally distributed (n = 12), nationally advertised (n = 20) and locally prepared (n = 8). The per-gram label data contain 8 missing values, all of them for the locally prepared foods. The classification variable denotes the three types of food supplier (R, N and L). Calories were measured per item and per gram. For each food, the percentage difference between measured and labeled calories (measured minus labeled) was obtained: a positive value indicates underreporting and a negative value indicates over reporting.

Methodology

In the first analysis, descriptive plots (box plots, QQ plots, histograms and bar plots) were used to show the nature and pattern of the dataset. The overall histogram is extremely right skewed for the per-item data and moderately right skewed for the per-gram data. The QQ plot for the per-item data is far from normal, and the remedy was a transformation to log scale. The per-gram data behave roughly normally, with some extreme values in the upper tail of the QQ plot. Because the QQ plot (per gram) is very sensitive to outliers, the solution was to remove them.

The 8 missing per-gram values for the locally prepared foods make up the whole set of data for that variable in that group. They were omitted, since an entirely missing set of values cannot be replaced.

After the Shapiro test had confirmed that the per-gram measurements are approximately normally distributed, a two-sample T test was applied to evaluate whether there is overall over reporting or underreporting.
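The sign convention described above can be illustrated with a short Python sketch. It assumes the percentage difference is taken relative to the labeled calories ("% over label"); the text only states "measured minus labeled", so the denominator and both function names are assumptions made for illustration.

```python
def percent_difference(measured, labeled):
    """Percentage by which measured calories exceed labeled calories.

    Assumption: the difference is expressed relative to the labeled value.
    """
    if labeled == 0:
        raise ValueError("labeled calories must be non-zero")
    return 100.0 * (measured - labeled) / labeled


def reporting_direction(pct_diff):
    """Map the sign of the percentage difference to the report's wording."""
    if pct_diff > 0:
        return "underreporting"   # the label states fewer calories than measured
    if pct_diff < 0:
        return "over reporting"   # the label states more calories than measured
    return "accurate"


# Hypothetical item: 110 kcal measured against 100 kcal on the label.
diff = percent_difference(110, 100)
print(diff, reporting_direction(diff))
```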
Furthermore, T tests were performed on the per-gram food labeling for the regionally distributed and the nationally advertised foods to check whether there is over reporting or underreporting. To examine whether the variability of per-gram labeling differs between regional and national foods, an F test (a test of variances) was performed.

For the per-item caloric labeling, the 3 food suppliers were compared by one-way analysis of variance. This was done after transforming the data to log scale and removing influential outliers, and after checking that the transformed measurements are approximately normally distributed with the same variance in each of the 3 groups (Bartlett test). Post-hoc analysis was based on Tukey's Honest Significant Difference method. Food labeling effect estimates were obtained as explained in Section., and are reported together with Bonferroni-adjusted p-values and 95% confidence intervals.

The relationship between the relative frequency of underreporting or over reporting per gram and the classification was examined with Fisher's exact test and Pearson's Chi square test.

In all analyses, normality was examined with QQ plots and the Shapiro test, and homoscedasticity (i.e. constancy of variance) with the F test and the Bartlett test. P-values below 0.05 (or 5%) are termed statistically significant. All analyses were conducted in R Version.6, using the packages faraway, stats and car.

Results

Section Part I: Els Goetghebeur

.1 Descriptive Statistics

The overall mean percentage over label is 4% per item and 5% per gram. For regionally distributed foods, the mean percentage over label per item was 5% (SD = 16%). For nationally advertised foods, the mean percentage over label per item was 0.13% (SD = 11%), whereas for locally prepared foods the mean percentage over label per item (mean difference) was 82% (SD = 84%). Regionally distributed foods had a mean percentage over label per gram of 15% (SD = 19%), while nationally advertised foods had a mean percentage over label per gram of -0.95% (SD = 8%). For locally prepared foods the per-gram data are missing (values not reported).
More is explained in Table .1.

Table .1 Statistical indicators of the % difference over food labeling, by classification
(L = locally prepared, N = nationally advertised, R = regionally distributed; L = 8, N = 20, R = 12; total N = 40; NA = missing)

                       Per gram                      Per item
                       L      N        R             L        N        R
Mean % over label      NA     -0.95%   14.67%        81.75%   0.15%    5.1%
Std. deviation         NA     8.10%    18.7%         83.97%   10.5%    16.07%
Median                 NA     -1.0%    1.50%         70%      .5%      6.50%

Fig

Check for Normality of per item via plots

The left-panel plots (fig.1) show the untransformed per-item data. The box plot shows many extreme outliers, and the mean and median differ (table.1). The histogram shows the skewness of the data, which are right skewed. Finally, the normal QQ plot, used to assess normality through the linearity of the graph, shows that the per-item data are not normally distributed. The remedy is to transform the per-item data to log scale (a common solution for right-skewed data).
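The idea behind the normal QQ plot can be sketched in Python as follows. This is an illustration only: the plotting positions (i + 0.5)/n are one common convention and may differ from what the R plotting function used in the report applies, and the function name and sample values are hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(values):
    """Pair each sorted observation with a standard normal quantile.

    Assumption: plotting positions (i + 0.5) / n for i = 0, ..., n - 1.
    Points that fall close to a straight line suggest approximate normality.
    """
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


# Hypothetical right-skewed sample: the largest value sits far above the rest,
# so the last point would bend away from a straight line.
for theo, obs in normal_qq_points([2, 3, 5, 8, 40]):
    print(round(theo, 2), obs)
```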

.1. Check for Normality of per gram via plots

The right-panel plots (fig.1) show the untransformed per-gram data. In the box plot the mean and median are almost the same, but the data contain a few outliers. Skewness was checked further with a histogram, and normality was finally inspected with a QQ plot. The data behave roughly normally, although the QQ plot deviates in the upper tail. This signals the presence of outliers, and a possible remedy is to remove them.

Normality of the data

Fig.

As the per-item data are right skewed, transforming them to log scale was the solution for meeting normality. For the per-gram data, the alternative measure was simply to cut the outliers; they are very few, and the absence of these values can be tolerated. The removed outliers are the values (greater than 5), of which there are only 3. This brings the dataset to normality.

Section Part II (Stefan Van Aelst)

3.1 Evaluation of overall underreporting/over reporting over labeling per gram

To evaluate whether there is evidence of overall underreporting/over reporting of calories per gram, a statistical test, the Independent T test, was performed. To apply the T test its assumptions have to be fulfilled, namely normality and independent observations. As briefly explained in the methodology section, the per-gram dataset contains extreme values (outliers > 5), and one way to handle this problem is to cut the outliers and then test for normality. The Shapiro test confirmed that the per-gram dataset is approximately normal (P = 0.78), so the T test can be applied. Furthermore, the missing values were omitted during the analysis.

Fig 3.1 (Plot of per gram)

Fig 3.1 depicts the per-gram caloric labeling after removal of outliers, which ensures the normality of the data.

The null hypothesis for the T test is

$H_0: \mu_{gram} = 0$ versus $H_1: \mu_{gram} \neq 0$

where $\mu_{gram}$ represents the average percentage difference of caloric measurement over labeling:

$\mu_{gram} = \text{measured calories} - \text{labeled calories} > 0 \implies \text{underreporting}$

The output from R is $t_{8,\,0.05} = 0.68$ with P = 0.5, and the 95% CI is (-.56, 5.11).

The p-value is larger than 0.05 and the 95% confidence interval includes zero. Thus there is insufficient evidence to reject the null hypothesis ($H_0$: the mean percentage difference between measured and labeled calories is zero, i.e. measured and labeled calories are on average the same). With 95% confidence the true mean % difference of food labeling per gram lies somewhere between -.56 and 5.11.

Section Part III B.1 & B. (Stijn Vansteelandt)

3. Evaluation of the degree of underreporting/over reporting of calorie per gram

To determine the degree of underreporting/over reporting of calories per gram for regionally distributed foods and for nationally advertised foods, a statistical test was performed. The test is the Independent T test. To justify its use, the assumption of normality has to be fulfilled. The Shapiro test confirmed the normality of the per-gram data for regionally distributed foods (P = 0.36) and for nationally advertised foods (P = 0.99). In both cases the assumption is met and the Independent T test can be applied.

Fig 3.
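The t statistic and confidence interval used in this part can be sketched in Python. The hypotheses as written compare a single mean against zero, so the sketch uses the one-sample form of the statistic. The critical value is passed in by the caller (taken from a t table or from software), and the function name and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(values, t_critical, mu0=0.0):
    """Return the t statistic for H0: mean == mu0 and a confidence interval.

    t_critical is the two-sided critical value for n - 1 degrees of freedom,
    supplied by the caller; it is not computed here.
    """
    n = len(values)
    m = mean(values)
    se = stdev(values) / sqrt(n)      # standard error of the mean
    t_stat = (m - mu0) / se
    ci = (m - t_critical * se, m + t_critical * se)
    return t_stat, ci


# Hypothetical percentage differences; 2.776 is the two-sided 5% critical
# value of the t distribution with 4 degrees of freedom.
t_stat, ci = one_sample_t([1, 2, 3, 4, 5], t_critical=2.776)
print(round(t_stat, 3), tuple(round(b, 3) for b in ci))
```

If the interval contains zero, the null hypothesis is not rejected at the corresponding level, which is the reasoning applied in the text.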

The null hypothesis for the T test for nationally advertised foods is

$H_0: \mu_{NA} = 0$ versus $H_1: \mu_{NA} \neq 0$

where $\mu_{NA}$ represents the average percentage difference of caloric measurement over labeling for nationally advertised foods:

$\mu_{NA} = \text{measured calories} - \text{labeled calories} > 0 \implies \text{underreporting}$

The output from R is $t_{19,\,0.05} = 0.5$ with P = 0.61, and the 95% CI is (-4.74, .84).

The p-value is larger than 0.05 and the 95% confidence interval includes zero. Thus there is insufficient evidence to reject the null hypothesis ($H_0$: the mean percentage difference between measured and labeled calories is zero for nationally advertised food labels). With 95% confidence the true mean % difference of food labeling per gram of nationally advertised food lies somewhere between -4.74 and .84.

The null hypothesis for the T test for regionally distributed foods is

$H_0: \mu_{RG} = 0$ versus $H_1: \mu_{RG} \neq 0$

where $\mu_{RG}$ represents the average percentage difference of caloric measurement over labeling for regionally distributed foods:

$\mu_{RG} = \text{measured calories} - \text{labeled calories} > 0 \implies \text{underreporting}$

The output from R is $t_{11,\,0.05} = .71$ with P = 0.0, and the 95% CI is (.77, 6.56).

The p-value is less than 0.05 and the 95% confidence interval excludes zero. Thus the null hypothesis is rejected at the 5% level of significance, and we conclude that the mean percentage difference of calorie measure per gram for regionally distributed food is not zero. With 95% confidence the true mean % difference of food labeling per gram of regionally distributed food lies somewhere between .77 and 6.56.

Evaluation of the variability in overall of underreporting/over reporting of calorie per gram on food labels

Fig 3. (box plot) indicates that the median varies between the food suppliers. To examine the variability, an F test is performed. The F test compares the variances of the two independent samples and tests whether the variance is constant across them.
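The statistic behind this F test is simply the ratio of the two sample variances, which can be sketched in Python. The function names are hypothetical. The example feeds in the per-gram standard deviations as they are printed in the descriptive table (8.10% for national, 18.7% for regional), which yields a ratio close to the F value of 0.19 reported for this test. The p-value itself requires the F distribution and is not computed here.

```python
from statistics import variance


def variance_ratio(sd_first, sd_second):
    """Ratio of variances, F = s1^2 / s2^2, from two standard deviations."""
    return sd_first ** 2 / sd_second ** 2


def variance_ratio_from_samples(first, second):
    """Same ratio, computed from raw samples using sample variances."""
    return variance(first) / variance(second)


# Standard deviations as printed in the descriptive table (per gram):
# nationally advertised 8.10, regionally distributed 18.7.
print(round(variance_ratio(8.10, 18.7), 2))
```

A ratio far from 1 indicates unequal variances; here the national variance is roughly one fifth of the regional one.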

Let us formulate the null hypothesis as follows. Let $\sigma_1^2$ be the variance of nationally advertised foods per gram and $\sigma_2^2$ the variance of regionally distributed foods per gram.

$H_0: \sigma_1^2 = \sigma_2^2$ versus $H_1: \sigma_1^2 \neq \sigma_2^2$

F ratio test: $F = S_1^2 / S_2^2$ (ratio of variances)

The output from R is $F_{19,\,11} = 0.19$ with P = 0.001, and the 95% CI is (0.06, 0.5).

The p-value is highly significant. Hence we reject the null hypothesis and conclude that the variance is not constant across nationally advertised foods per gram and regionally distributed foods per gram.

Section Part III C (Stijn Vansteelandt)

Figure 3.3

Figure 3.3 suggests that the average percentage difference of caloric labeling of food per item is higher in the locally prepared food than in the regional and national food suppliers.

This is confirmed by the one-way analysis of variance test, which reveals a significant difference in average percentage difference of caloric labeling between the food suppliers. The box plot in fig 3.3 (green colors) shows that the untransformed per-item caloric labeling deviates extremely from normality. One-way analysis of variance requires homogeneity of variance, and Bartlett's test was highly significant before transformation (P = 3.e-07). To meet the requirement for normality, the per-item variable was transformed to log scale. Even then, Bartlett's test again showed a significant difference in variance (P = 0.034). There seem to be some influential outliers in the per-item data for locally prepared foods, and one way to handle this problem is to cut off these extreme observations, which does little harm. After that, Bartlett's test confirmed constant variance and the appropriateness of the test (P = 0.1). The F value, in log scale, is (F, 5 = 5.91) with P = 0.0078, which is highly significant.

The null hypothesis for the one-way analysis of variance is

$H_0: \mu_L = \mu_R = \mu_N$ versus $H_A$: at least 1 of the population means differs.

$F = \dfrac{MS_{between}}{MS_{within}} \sim F_{k-1,\,n-k}$ under $H_0$

Post hoc analysis shows that the difference in average % caloric labeling difference per item, in log scale, equals -1.8 (95% confidence interval (-.49, -0.07), P = 0.036) between national and local food suppliers, -0.01 (95% confidence interval (-1., 1.19), P = 0.99) between regional and local food suppliers, and 1.7 (95% confidence interval (0.5, .8), P = 0.01) between regional and national food suppliers. All of these values are reported in log scale.
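The F ratio in the formula above can be sketched in Python. The function takes one list of values per supplier group; in the report these would be the per-item values after the log transformation and outlier removal. The function name and the data are hypothetical, and the p-value (which needs the F distribution) is left out.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, (df_between, df_within)) for a one-way analysis of variance."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)
    group_means = [mean(g) for g in groups]

    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (gm - grand_mean) ** 2
                     for g, gm in zip(groups, group_means))
    # Within-group sum of squares: spread of values around their own group mean.
    ss_within = sum((x - gm) ** 2
                    for g, gm in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n - k)
    return ms_between / ms_within, (k - 1, n - k)


# Hypothetical values for three groups (for example L, R and N).
f_value, dfs = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
print(round(f_value, 2), dfs)
```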

Section Part V (Yves Rossel)

To evaluate a possible relationship between the (relative) frequency of underreporting of calories per gram and the type of food supplier, Fisher's exact test for count data was used. It shows no significant difference (P = 0.06) between nationally advertised and regionally distributed food suppliers with respect to the frequency of underreporting of calories per gram. The test is a small-sample method. The 95% CI for the odds ratio is (0.015, 1.13). A similar non-significant result was obtained with Pearson's Chi square test (P = 0.056). In sum, there is no significant relationship between the relative frequency of underreporting of calories per gram and the type of food supplier. The degree of association was also examined using Sakoda's method, giving 0.46, which can be regarded as a weak relationship. Another technique applied was fitting 3x tables (see appendix and R code), in which the missing values were fitted as structural zeros for locally prepared foods. The log linear model fitted under the Poisson distribution gave a deviance (G = 4.89) with 1 degree of freedom.

Conclusion

These findings suggest that food labels may be inadequate sources for caloric monitoring. Health care professionals should consider the accuracy of caloric labeling when advising patients to use food labels to help monitor their caloric intake.

- Locally prepared food labels per item showed a significant difference (underreporting/over reporting) in caloric measurement.
- Regionally distributed food labels per item showed a significant difference (underreporting/over reporting) in caloric measurement.
- Nationally advertised food labels per item showed no significant difference (underreporting/over reporting) in caloric measurement.
- The overall underreporting/over reporting of calories per item differs significantly between regional and national and between national and local, whereas the comparison between regional and local is not significant.
- The overall underreporting/over reporting of calories per gram differs significantly between the regional and national food labels.
- There is no significant association between the relative frequencies of underreporting/over reporting of calories per gram and the type of food supplier.
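For reference, the Fisher's exact test used in Section Part V can be sketched in Python for a 2x2 table. The sketch uses the common two-sided rule of summing the probabilities of all tables that are no more likely than the observed one, which is assumed (not confirmed by the report) to match the R output. The function name and the counts are hypothetical and are not the study data.

```python
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact p-value for a 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left count follows a hypergeometric
    distribution; the p-value sums the probabilities of all tables that are
    no more likely than the observed one.
    """
    (a, b), (c, d) = table
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    denom = comb(n, row1)

    def prob(x):
        # Probability that the top-left cell equals x given the margins.
        return comb(col1, x) * comb(n - col1, row1 - x) / denom

    lowest = max(0, row1 - (n - col1))
    highest = min(row1, col1)
    p_observed = prob(a)
    # A small relative tolerance guards against floating-point ties.
    return sum(p for p in (prob(x) for x in range(lowest, highest + 1))
               if p <= p_observed * (1 + 1e-7))


# Hypothetical counts: rows are two supplier types, columns are
# underreporting versus over reporting.
print(round(fisher_exact_two_sided([[8, 2], [1, 5]]), 4))
```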

Appendices

Appendix I (The data set)

Appendix II (Barplot of data by classification)


More information

Health Consciousness of Siena Students

Health Consciousness of Siena Students Health Consciousness of Siena Students Corey Austin, Siena College Kevin Flood, Siena College Allison O Keefe, Siena College Kim Reuter, Siena College EXECUTIVE SUMMARY We decided to research the health

More information
