


RESPONSE SURFACE MODELING AND OPTIMIZATION TO ELUCIDATE THE DIFFERENTIAL EFFECTS OF DEMOGRAPHIC CHARACTERISTICS ON HIV PREVALENCE IN SOUTH AFRICA

W. Sibanda 1* and P. Pretorius 2
1 DST/NWU Pre-clinical Platform, North West University, South Africa
2 School of Information Technology, North West University, South Africa
* Corresponding Author

ABSTRACT

In this study, a Central Composite Face Centered (CCF) design was employed to study the individual and interaction effects of demographic characteristics on the spread of HIV in South Africa. The demographic characteristics studied for each pregnant mother attending an antenatal clinic in South Africa were mother's age, partner's age, mother's level of education and parity. Using the 2007 South African annual antenatal HIV and syphilis seroprevalence data, the HIV status of an antenatal clinic attendee was found to be highly sensitive to changes in the pregnant woman's age and her partner's age. Individually, the pregnant woman's level of education and parity had no significant effect on HIV status. However, these two demographic characteristics exhibited significant effects on the HIV status of antenatal clinic attendees in two-way interactions with other demographic characteristics. A 3D response surface plot indicated that the highest rate of HIV-positive individuals occurred at the highest age of the pregnant women and the lowest age of their partners.

CIE42 Proceedings, July 2012, Cape Town, South Africa 2012 CIE & SAIIE

1 INTRODUCTION

In South Africa, the annual antenatal HIV survey is the only existing national surveillance activity for determining HIV prevalence, and it is therefore a vitally important tool for tracking the geographic and temporal trends of the epidemic (Department of Health) [1]. Antenatal clinic data contain the following demographic characteristics for each pregnant woman: age (herein called mothage), population group (race), level of education (herein called education), gravidity (number of pregnancies), parity (number of children born), partner's age (herein called fathage), name of clinic, and HIV and syphilis results (Department of Health) [2].

This research paper explores the application of response surface methodology (RSM) to study the relationships between the demographic characteristics in the antenatal data and one response variable (HIV prevalence). RSM is a collection of mathematical and statistical techniques used for modelling and analysing problems in which a response of interest is influenced by several variables and the objective is to optimize this response (Montgomery) [4]. The specific RSM design used in this research is the Central Composite Face Centred (CCF) design, first proposed by G. E. P. Box and K. B. Wilson.

This study follows up on our previous work (Sibanda) [3], in which we used a two-level fractional factorial design to rank the demographic characteristics affecting the HIV status of pregnant mothers attending antenatal clinics for the first time in South Africa, from most to least important. The two-level fractional factorial design demonstrated that, among the demographic characteristics, mother's age had the greatest influence on the HIV status of antenatal clinic attendees.
The effects of the remaining demographic characteristics were ranked using Lenth's plot (Figure 1) as follows: mother's age > level of education > parity > father's age > gravidity > syphilis.

Figure 1: Lenth Plot

The results of the two-level fractional factorial design are summarized in Tables 1 and 2 below.

Table 1: Summary of Results for Two-Level Fractional Factorial Design
Summary R R² adjusted 0.76 Standard Error 0.18 PRESS 0.52 R² for Prediction 0.34 First Order Autocorrelation Collinearity 0.83 Coefficient of Variation Precision Index 9.96

Table 2: ANOVA for Two-Level Fractional Factorial Design
Columns: Source, SS, SS%, MS, F, F signif, df
Rows: Regression, Residual, LOF Error, Pure Error, Total

Table 1 reports the adjusted R² (coefficient of determination) value for the fitted model. The R² value gives the proportion of variability in a data set that is accounted for by the statistical model, and it provides a measure of how well future outcomes are likely to be predicted by the model. In other words, R² provides information about the goodness of fit of the model. The size of the adjusted R² value of the fractional factorial model suggested that a response surface model (RSM) might help to elucidate the possible effect of interactions between demographic characteristics on the regression model. This belief was further substantiated by the low value of the F-statistic (F = 10.64). The F value indicates the overall significance of the regression model and is thus used to decide whether the model as a whole has statistically significant predictive capability.
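The relationship between R², adjusted R² and the model F value described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `fit_statistics` is hypothetical, the model is assumed to contain an intercept, and the sums of squares used in the example are made up rather than taken from Table 2.

```python
def fit_statistics(ss_regression, ss_residual, n_runs, n_terms):
    """Return (R2, adjusted R2, F) for a model with n_terms predictors plus an intercept."""
    ss_total = ss_regression + ss_residual
    df_regression = n_terms
    df_residual = n_runs - n_terms - 1
    # R2: share of the total variability explained by the model.
    r2 = ss_regression / ss_total
    # Adjusted R2 penalises R2 for the number of fitted terms.
    adj_r2 = 1.0 - (1.0 - r2) * (n_runs - 1) / df_residual
    # F: ratio of the regression mean square to the residual mean square.
    f_value = (ss_regression / df_regression) / (ss_residual / df_residual)
    return r2, adj_r2, f_value


# Illustrative sums of squares only; these are NOT the values from Table 2.
r2, adj_r2, f_value = fit_statistics(
    ss_regression=8.0, ss_residual=2.0, n_runs=16, n_terms=5
)
print(round(r2, 3), round(adj_r2, 3), round(f_value, 2))
```

Because adjusted R² shrinks as terms are added without a matching gain in explained variability, it is the more cautious of the two statistics when comparing models with different numbers of terms.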

2 LITERATURE REVIEW

2.1 Response Surface Methodology (RSM)

RSM is a collection of statistical and mathematical methods that are useful for modelling and analysing designed experiments. RSM experiments are designed to allow the estimation of interaction and even quadratic effects, and therefore give an idea of the local shape of the response surface being investigated. Linear terms alone produce models whose response surfaces are hyperplanes. The addition of interaction terms allows for warping of the hyperplane. Squared terms produce the simplest models in which the response surface has a maximum or minimum, and so an optimal response. RSM fundamentally comprises three techniques (Myers) [5], namely:
1. Statistical experimental design
2. Regression modelling
3. Optimization

The steps involved in the design of experiments using RSM are outlined in Figure 2.

Figure 2: Design procedure of an RSM (1. Design of experiments for measurement of response; 2. Mathematical model development; 3. Finding optimal set of experimental parameters; 4. Two- or three-dimensional plots of interactive effects)

2.2 Central Composite Face Centred (CCF) Design

The Central Composite Face Centered (CCF) design is an example of an RSM design that is widely used for fitting a second-order response surface (Mutnury) [6]. A CCF design combines two-level factorial points with axial points and center runs. The factorial points represent a variance-optimal design for a first-order model, and the center runs provide information about the existence of curvature in the system (Zhang) [7]. If curvature is found in the system, the addition of axial points allows for efficient estimation of the pure quadratic terms.
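The construction described above (factorial points, axial points on the faces, and center runs) can be sketched as follows. This is a minimal illustration, not the procedure used by Design Expert: the function name `ccf_design` is hypothetical, the factors are in coded units, and a full two-level factorial core is assumed, so the run count it produces will not match the 21-run design of Table 3.

```python
from itertools import product


def ccf_design(n_factors, n_center=1):
    """Build a face-centred central composite design in coded units (-1, 0, +1)."""
    # Factorial (corner) points: every combination of -1 and +1.
    factorial = [list(p) for p in product((-1, 1), repeat=n_factors)]
    # Axial points lie on the centres of the faces (alpha = 1): one pair per factor.
    axial = []
    for i in range(n_factors):
        for level in (-1, 1):
            point = [0] * n_factors
            point[i] = level
            axial.append(point)
    # Center runs: all factors at their mid level, replicated n_center times.
    center = [[0] * n_factors for _ in range(n_center)]
    return factorial + axial + center


# Four factors (e.g. mothage, fathage, education, parity) with four center runs.
design = ccf_design(n_factors=4, n_center=4)
print(len(design))
```

Placing the axial points on the faces keeps every run inside the -1 to +1 range, which suits factors such as age bands or parity whose levels cannot be pushed beyond the observed limits.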
The CCF design is therefore useful for experiments in which a second-order response surface needs to be fitted.

3 EXPERIMENTAL METHODOLOGY

3.1 Sources of Data

The seroprevalence data studied were obtained from the 2007 South African antenatal data, supplied by the National Department of Health of South Africa (Department of Health) [1]. The data consisted of subjects who attended antenatal clinics for the first time across the nine provinces of South Africa.

3.2 Research Tools

This research utilized the following research tools:
1. Design Expert V8 Software (Design Expert) [8]
2. SAS 9.3, an integrated system of software products provided by SAS Institute Inc.
3. Essential Regression and Experimental Design, version 2.2 (Gibsonia, PA)

3.3 Sampling Procedure

To facilitate the experimental design, the data were completely randomized as a preprocessing step to reduce bias in the design of experiment.

3.4 Missing Data

Of the cases in the 2007 South African antenatal seroprevalence database, 68% were complete; the remaining 32% were incomplete and were therefore discarded.

3.5 Variables

The variables used in the study were parity, education, mothage, fathage and HIV status. The integer value representing level of education stands for the highest grade successfully completed, with 13 representing tertiary education. Parity represents the number of times the individual has given birth. Parity is important because it reflects the reproductive activity as well as the reproductive health of the women. The HIV status is binary coded: 1 represents a positive status and 0 represents a negative status.

3.6 Experimental Design

In this study, the aim was to use a Central Composite Face Centered (CCF) design to study the individual and interaction effects of demographic characteristics on the HIV status of a pregnant mother using seroprevalence data. The CCF design with four factors and one response variable was developed as shown in Table 3. A two-factor interaction (2FI) design model was used, with 21 runs and no blocks. The coded values -1 and +1 denote the minimum and maximum levels of the factors, respectively.

Table 3: The CCF Design Matrix with 4 Factors, 1 Response Variable and 4 Center Points
Columns: Run; Factors (Mothage, Fathage, Education, Parity); Response (HIV)

Design Matrix Evaluation

Degrees of Freedom

Design matrix evaluation showed that there were no aliases for the 2FI model; the degrees of freedom for the matrix are shown in Table 4. As a rule of thumb, a minimum of 3 lack-of-fit df and 4 pure error df ensures a valid lack-of-fit test. Fewer df tend to lead to a test that may not detect lack of fit (Design Expert) [8].

Table 4: Degrees of Freedom for matrix evaluation
Model          10
Residuals      10
  Lack of Fit   6
  Pure Error    4
Corr total     20

Standard Errors

The standard errors of the design are shown in Figure 3; these errors are larger at the edges of the design. It is therefore advisable to work well within the design margins to achieve a greater degree of accuracy.
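The entries in Table 4 follow from simple counting, which the sketch below reproduces. The function name `two_fi_degrees_of_freedom` is hypothetical, and the number of distinct design points (17) is an assumption inferred from Table 4 itself (6 lack-of-fit df plus 11 model parameters); the paper does not state it directly.

```python
from math import comb


def two_fi_degrees_of_freedom(n_runs, n_distinct_points, n_factors):
    """Split the degrees of freedom for a 2FI model fitted to a designed experiment."""
    # Main effects plus all two-way interactions.
    model_df = n_factors + comb(n_factors, 2)
    # The intercept is one additional estimated parameter.
    n_params = model_df + 1
    residual_df = n_runs - n_params
    # Replicated runs contribute pure error; distinct points beyond the
    # number of parameters contribute lack of fit.
    pure_error_df = n_runs - n_distinct_points
    lack_of_fit_df = n_distinct_points - n_params
    return {
        "model": model_df,
        "residual": residual_df,
        "lack_of_fit": lack_of_fit_df,
        "pure_error": pure_error_df,
        "corrected_total": n_runs - 1,
    }


# 21 runs and 4 factors are from the paper; 17 distinct points is assumed.
df_table = two_fi_degrees_of_freedom(n_runs=21, n_distinct_points=17, n_factors=4)
print(df_table)
```

Under these assumptions the split meets the rule of thumb quoted above, with 6 lack-of-fit df and 4 pure error df.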

Figure 3: 3D Plot of standard error of design (axes: A: mothage, B: fathage; vertical axis: Std Error of Design)

Variance Inflation Factor (VIF)

The Variance Inflation Factor (VIF) quantifies the severity of multicollinearity in an ordinary least squares regression analysis. It provides an index that measures how much the variance of an estimated regression coefficient is increased because of collinearity. VIF values should ideally be 1; values greater than 10 indicate that coefficients are poorly estimated due to multicollinearity (Design Expert) [8]. The VIF values in Table 5 indicate that the coefficients of the individual demographic characteristics and their interactions are estimated adequately, without multicollinearity. However, the quadratic terms displayed a higher degree of multicollinearity.

Table 5: Signal to noise ratio with the design matrix
Columns: Term, VIF, Ri-squared
Terms: A, B, C, D, E, AB, AC, AD, AE, BC, BD, BE, CD, CE, DE, A², B², C², D², E²

Ri-squared

In general, high Ri-squared values mean that the terms are correlated with each other, leading to a poor model. For this experiment, low Ri-squared values were obtained for the individual factors and their interactions, but higher Ri-squared values were obtained for the quadratic terms, as shown in Table 5.

Fraction of Design Space (FDS)

The FDS curve (Figure 4) shows the percentage of the design space volume that has a given standard error of prediction or less. A flatter FDS curve means that the prediction error is more nearly constant over the design space. In general, the larger the standard error of prediction, the less likely it is that the results can be repeated and that a significant effect will be detected.

Figure 4: FDS Plot of the Standard Error over the Design Space (x-axis: Fraction of Design Space; y-axis: Std Error Mean)
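The VIF and Ri-squared columns of Table 5 carry the same information, since the VIF of a term is 1 / (1 - Ri²), where Ri² is the R² obtained when that term is regressed on all the other model terms. A minimal sketch of the conversion follows; the function names are hypothetical and the example values are illustrative, not taken from Table 5.

```python
def vif_from_r_squared(r_i_squared):
    """Variance inflation factor for a term whose regression on the other terms gives Ri^2."""
    if not 0.0 <= r_i_squared < 1.0:
        raise ValueError("Ri-squared must lie in [0, 1)")
    return 1.0 / (1.0 - r_i_squared)


def r_squared_from_vif(vif):
    """Inverse relation: recover Ri^2 from a reported VIF (VIF >= 1)."""
    if vif < 1.0:
        raise ValueError("VIF cannot be smaller than 1")
    return 1.0 - 1.0 / vif


# Illustrative values only, not entries from Table 5.
print(vif_from_r_squared(0.0))   # an orthogonal term
print(vif_from_r_squared(0.9))   # the usual warning threshold of about 10
```

A term that is orthogonal to all others has Ri² = 0 and therefore the ideal VIF of 1, while the threshold of 10 quoted above corresponds to Ri² = 0.9.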

3.6.2 Choice of Levels for the Factors

Table 6: Factor Levels
Factor Parity (No. of children) Education (Grades) Levels > 2 < Mothage (years) Fathage (years) < < > 30 > 34

4 RESULTS

4.1 Response Transformations

A ratio of maximum to minimum response greater than 10 implies that a transformation is required. However, as shown in Table 7, a ratio of less than 10 indicates that a power transformation will have no effect; hence the response parameter (HIV) and response terms were not transformed for this study.

Table 7: Response Ratio
Columns: Minimum, Maximum, Ratio
Response (HIV): Ratio 0.33/0.00 =

Fit Summary

Model Summary Statistics

Table 8: Model Summary Statistics
Columns: Source, Sequential p-value, Lack-of-fit p-value, R², Adjusted R², Adeq. Precision
Rows: Linear, 2FI

The R² and adjusted R² statistics of the 2FI model are high, at 0.99 and 0.98 respectively, as shown in Table 8. High R² values imply that a large proportion of the variation in the observed values is explained by the model. In addition, the lack-of-fit p-value of the 2FI model indicates that the model lack of fit is not significant.

ANOVA for 2FI Response Surface

From the ANOVA results (Table 9), it is evident that the mother's age and the father's age are significant terms in the 2FI model, while educational level and parity individually are not. However, the individually non-significant terms tend to be significant in two-way interactions with other demographic characteristics. The model F-value (Table 9) implies that the model is significant; there is only a 0.01% chance that a model F-value this large could occur due to noise.

Table 9: ANOVA Results
Columns: Source, Sum of Squares, df, Mean square, F value, P value
Rows: Model, A-Mothage, B-Fathage, C-Education, D-Parity, AB, AC, AD, BC, BD, CD

Adeq. precision measures the signal-to-noise ratio. A ratio greater than 4 is desirable, and for this experiment a ratio of 25 indicates an adequate signal. This model can therefore be used to navigate the design space.

5 RESIDUAL ANALYSIS

There are many statistical tools for model validation, but the primary tool for most process modeling applications is graphical residual analysis. Residual plots assist in examining the underlying statistical assumptions about residuals (see Table 10). Residual analysis is therefore a useful class of techniques for evaluating the goodness of a fitted model. One method of residual analysis is the normal plot of residuals.

Table 10: Statistical assumptions about residuals
Independence: Whether response variables are independent
Normality: Whether response variables are normally distributed
Homoscedasticity: Whether all response variables have the same variance
Linearity: Whether the true relationship between response and explanatory variables is a straight line

5.1 Normal Plot of Residuals

The normal plot of residuals (Figure 5) evaluates whether there are outliers in the dataset. A curved pattern indicates non-modelled quadratic relations or incorrect transformations. Here all the points lie on the diagonal, implying that the residuals constitute normally distributed noise.

Figure 5: Normal plot of residuals (x-axis: Internally Studentized Residuals; y-axis: Normal % Probability)

6 FINAL EQUATION OF THE RESPONSE MODEL

The final equation of the HIV response model has the following form, where b0 to b8 stand for the fitted numerical coefficients:

HIV = b0 + b1*Mothage + b2*Fathage + b3*Education + b4*Parity + b5*Mothage*Fathage + b6*Mothage*Education + b7*Mothage*Parity + b8*Fathage*Parity
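A model with these terms can be fitted by ordinary least squares. The sketch below is a self-contained illustration under stated assumptions and does not reproduce the Design Expert fit: the helper names (`model_row`, `solve`, `fit_least_squares`) are hypothetical, the factors are in coded units, and the coefficients in `true_b` are made up so that the recovery can be checked on synthetic, noise-free data.

```python
from itertools import product

# Terms of the reduced 2FI model: A = mothage, B = fathage, C = education, D = parity.
TERMS = ("intercept", "A", "B", "C", "D", "AB", "AC", "AD", "BD")


def model_row(a, b, c, d):
    """Model-matrix row for the reduced 2FI model (coded factors)."""
    return [1.0, a, b, c, d, a * b, a * c, a * d, b * d]


def solve(matrix, rhs):
    """Solve a square linear system by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_least_squares(points, responses):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    x = [model_row(*p) for p in points]
    k = len(TERMS)
    xtx = [[sum(row[i] * row[j] for row in x) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(x, responses)) for i in range(k)]
    return solve(xtx, xty)


# Synthetic check: made-up coefficients, NOT the paper's fitted values.
true_b = [0.20, 0.08, -0.05, 0.01, 0.00, -0.03, 0.02, 0.01, -0.01]
points = list(product((-1.0, 1.0), repeat=4))
responses = [sum(b * v for b, v in zip(true_b, model_row(*p))) for p in points]
estimated = fit_least_squares(points, responses)
for name, value in zip(TERMS, estimated):
    print(name, round(value, 4))
```

The normal equations are used here only to keep the example free of external libraries; for ill-conditioned model matrices a QR-based solver from a numerical library would be the safer choice.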

A coefficient plot (Figure 6) was drawn to represent the information provided by the 2FI response model equation. Coefficient plots show clearly the relative importance of each variable in the model equation.

Figure 6: Coefficient Plot of the Different Demographic Characteristics (y-axis: Coefficient; terms: mothage, fathage, education, parity, mot*fat, mot*edu, mot*parity, fat*parity)

Inspection of the regression coefficients (Figure 6) indicates that two model terms, level of education and parity, are not significant and can be removed from the model.

7 PERTURBATION PLOT

The perturbation plot (Figure 7) compares the effect of all factors at a particular point in the design space. A steep slope or curvature in a factor shows that the response is sensitive to that factor. A relatively flat line shows insensitivity to change in that particular factor. However, the perturbation plot does not show interactions. The perturbation plot in Figure 7 indicates that the effects of the demographic characteristics on the response are in the order: Mothage (A) > Fathage (B) > Education (C) > Parity (D).

Figure 7: Perturbation Plot (x-axis: Deviation from Reference Point (Coded Units); y-axis: HIV; traces: A, B, C, D)
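The idea behind a perturbation plot can be sketched as follows: each factor is moved away from a reference point in turn while the others are held fixed, and the spread of the resulting trace indicates how sensitive the response is to that factor. The function names and the coefficient values below are hypothetical placeholders chosen for illustration; they are not the fitted values from this study.

```python
# Made-up coefficients in coded units, NOT the fitted values from the paper.
COEFFS = {
    "intercept": 0.20, "A": 0.08, "B": -0.05, "C": 0.01, "D": 0.005,
    "AB": -0.03, "AC": 0.02, "AD": 0.01, "BD": -0.01,
}


def predict(coeffs, point):
    """Evaluate the reduced 2FI model at a point given in coded units."""
    a, b, c, d = point
    return (coeffs["intercept"]
            + coeffs["A"] * a + coeffs["B"] * b
            + coeffs["C"] * c + coeffs["D"] * d
            + coeffs["AB"] * a * b + coeffs["AC"] * a * c
            + coeffs["AD"] * a * d + coeffs["BD"] * b * d)


def perturbation_traces(coeffs, reference=(0.0, 0.0, 0.0, 0.0),
                        deviations=(-1.0, -0.5, 0.0, 0.5, 1.0)):
    """Vary one factor at a time around the reference point; hold the others fixed."""
    traces = {}
    for index, name in enumerate("ABCD"):
        trace = []
        for dev in deviations:
            point = list(reference)
            point[index] = reference[index] + dev
            trace.append(predict(coeffs, point))
        traces[name] = trace
    return traces


traces = perturbation_traces(COEFFS)
# Rank the factors by the spread of their traces (steeper trace = more sensitive).
spread = {name: max(t) - min(t) for name, t in traces.items()}
ranking = sorted(spread, key=spread.get, reverse=True)
print(ranking)
```

With the reference point at the center of the design (all coded factors at 0), the interaction terms vanish along each trace, which is why a perturbation plot of a 2FI model shows straight lines and cannot reveal interactions.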

8 3D RESPONSE SURFACE PLOT

Figure 8 shows the 3D plot of the influence of mothage and fathage on the HIV response. The highest rate of HIV is observed at the highest age of the mother and the lowest age of the father.

Figure 8: 3D Response Surface plot (axes: A: mothage, B: fathage; vertical axis: HIV)

9 DISCUSSION

A central composite face centered (CCF) design was found to be suitable for studying the involvement of demographic characteristics in the determination of the HIV status of pregnant women attending antenatal clinics in South Africa. The 2FI polynomial function for mothage, fathage, education and parity obtained using Stat-Ease Design Expert was found to be statistically significant. The measured HIV prevalence response was in close agreement with the predicted values, as shown in Figure 9.

Figure 9: Plot of Predicted vs. Actual Response (axes: Actual, Predicted)

10 CONCLUSION

The CCF design confirmed the results obtained by the fractional factorial design (Sibanda) [3], namely that mother's age had the greatest effect on the HIV status of an antenatal clinic attendee. However, the CCF design further demonstrated that the interaction of factors had a significant effect on an individual's HIV status. The R² value of the predictive model improved from 33.5% (fractional factorial design in the previous study) to 98% (CCF). The latter result demonstrated that the relationship between the demographic characteristics and the HIV response was better modelled by a 2FI function.

11 ACKNOWLEDGEMENTS

Wilbert Sibanda acknowledges doctoral funding from the South African Centre for Epidemiological Modelling (SACEMA), the Medical Research Council (MRC) and North-West University. Special thanks to Cathrine Tlaleng Sibanda and the National Department of Health (South Africa) for the antenatal seroprevalence data.

12 REFERENCES

[1] Department of Health. National Antenatal Sentinel HIV and Syphilis Prevalence in South Africa.
[2] Department of Health. Protocol for implementing the National Antenatal Sentinel HIV and Syphilis Prevalence Survey in South Africa.
[3] Sibanda, W. Application of Two-level Fractional Factorial Design to Determine and Optimize the Effect of Demographic Characteristics on HIV Prevalence using the 2006 South African Annual Antenatal HIV and Syphilis Seroprevalence data, International Journal of Computer Applications, 35 (12).
[4] Montgomery, D.C. Design and Analysis of Experiments, John Wiley and Sons.
[5] Myers, R.H. Response Surface Methodology: Process and Product Optimization Using Designed Experiments, 2nd Edition, John Wiley and Sons.
[6] Mutnury, B. Modeling and Characterization of High Speed Interfaces in Blade and Rack Servers Using Response Surface Model, Electronic Components and Technology Conference (ECTC).
[7] Zhang, Z. Comparison about the Three Central Composite Designs with Simulation, International Conference on Advanced Computer Control (ICACC).
[8] Design Expert. Stat-Ease software.

Received: 19 November 2012 / Revised: 16 April 2013 / Accepted: 24 April 2013 / Published online: 9 May 2013 Ó Springer-Verlag Wien 2013

Received: 19 November 2012 / Revised: 16 April 2013 / Accepted: 24 April 2013 / Published online: 9 May 2013 Ó Springer-Verlag Wien 2013 Netw Model Anal Health Inform Bioinforma (23) 2:37 46 DOI.7/s372-3-32-z ORIGINAL ARTICLE Comparative study of the application of central composite face-centred (CCF) and Box Behnken designs (BBD) to study

More information

Trend Analysis of HIV Prevalence Rates amongst Gen X and Y Pregnant Women Attending Antenatal Clinics in South Africa between 2001 and 2010

Trend Analysis of HIV Prevalence Rates amongst Gen X and Y Pregnant Women Attending Antenatal Clinics in South Africa between 2001 and 2010 Trend Analysis of HIV Prevalence Rates amongst Gen X and Y Pregnant Women Attending Antenatal Clinics in South Africa between 2001 and 2010 Wilbert Sibanda Philip D. Pretorius School of Information Technology,

More information


CHILD HEALTH AND DEVELOPMENT STUDY CHILD HEALTH AND DEVELOPMENT STUDY 9. Diagnostics In this section various diagnostic tools will be used to evaluate the adequacy of the regression model with the five independent variables developed in

More information

Mark J. Anderson, Patrick J. Whitcomb Stat-Ease, Inc., Minneapolis, MN USA

Mark J. Anderson, Patrick J. Whitcomb Stat-Ease, Inc., Minneapolis, MN USA Journal of Statistical Science and Application (014) 85-9 D DAV I D PUBLISHING Practical Aspects for Designing Statistically Optimal Experiments Mark J. Anderson, Patrick J. Whitcomb Stat-Ease, Inc., Minneapolis,

More information

Multiple Regression Analysis

Multiple Regression Analysis Multiple Regression Analysis Basic Concept: Extend the simple regression model to include additional explanatory variables: Y = β 0 + β1x1 + β2x2 +... + βp-1xp + ε p = (number of independent variables

More information

International Journal on Future Revolution in Computer Science & Communication Engineering ISSN: Volume: 4 Issue:

International Journal on Future Revolution in Computer Science & Communication Engineering ISSN: Volume: 4 Issue: Application of the Variance Function of the Difference Between two estimated responses in regulating Blood Sugar Level in a Diabetic patient using Herbal Formula Karanjah Anthony N. School of Science Maasai

More information

WELCOME! Lecture 11 Thommy Perlinger

WELCOME! Lecture 11 Thommy Perlinger Quantitative Methods II WELCOME! Lecture 11 Thommy Perlinger Regression based on violated assumptions If any of the assumptions are violated, potential inaccuracies may be present in the estimated regression

More information

Chapter 1: Introduction

Chapter 1: Introduction Chapter 1: Introduction Index 1.1. Background 1.2. Problem statement 1.3. Aim and objectives 1.1. Background HIV/AIDS is a leading health problem in the sub-saharan African region. The need to formulate

More information


CHAPTER TWO REGRESSION CHAPTER TWO REGRESSION 2.0 Introduction The second chapter, Regression analysis is an extension of correlation. The aim of the discussion of exercises is to enhance students capability to assess the effect

More information

Pitfalls in Linear Regression Analysis

Pitfalls in Linear Regression Analysis Pitfalls in Linear Regression Analysis Due to the widespread availability of spreadsheet and statistical software for disposal, many of us do not really have a good understanding of how to use regression

More information

Simple Linear Regression

Simple Linear Regression Simple Linear Regression Assoc. Prof Dr Sarimah Abdullah Unit of Biostatistics & Research Methodology School of Medical Sciences, Health Campus Universiti Sains Malaysia Regression Regression analysis

More information


CLASSICAL AND. MODERN REGRESSION WITH APPLICATIONS - CLASSICAL AND. MODERN REGRESSION WITH APPLICATIONS SECOND EDITION Raymond H. Myers Virginia Polytechnic Institute and State university 1 ~l~~l~l~~~~~~~l!~ ~~~~~l~/ll~~ Donated by Duxbury o Thomson Learning,,

More information

Examining Relationships Least-squares regression. Sections 2.3

Examining Relationships Least-squares regression. Sections 2.3 Examining Relationships Least-squares regression Sections 2.3 The regression line A regression line describes a one-way linear relationship between variables. An explanatory variable, x, explains variability

More information


SUPPLEMENTAL MATERIAL 1 SUPPLEMENTAL MATERIAL Response time and signal detection time distributions SM Fig. 1. Correct response time (thick solid green curve) and error response time densities (dashed red curve), averaged across

More information

Unit 1 Exploring and Understanding Data

Unit 1 Exploring and Understanding Data Unit 1 Exploring and Understanding Data Area Principle Bar Chart Boxplot Conditional Distribution Dotplot Empirical Rule Five Number Summary Frequency Distribution Frequency Polygon Histogram Interquartile

More information

Study Guide #2: MULTIPLE REGRESSION in education

Study Guide #2: MULTIPLE REGRESSION in education Study Guide #2: MULTIPLE REGRESSION in education What is Multiple Regression? When using Multiple Regression in education, researchers use the term independent variables to identify those variables that

More information

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug?

MMI 409 Spring 2009 Final Examination Gordon Bleil. 1. Is there a difference in depression as a function of group and drug? MMI 409 Spring 2009 Final Examination Gordon Bleil Table of Contents Research Scenario and General Assumptions Questions for Dataset (Questions are hyperlinked to detailed answers) 1. Is there a difference

More information

Week 8 Hour 1: More on polynomial fits. The AIC. Hour 2: Dummy Variables what are they? An NHL Example. Hour 3: Interactions. The stepwise method.

Week 8 Hour 1: More on polynomial fits. The AIC. Hour 2: Dummy Variables what are they? An NHL Example. Hour 3: Interactions. The stepwise method. Week 8 Hour 1: More on polynomial fits. The AIC Hour 2: Dummy Variables what are they? An NHL Example Hour 3: Interactions. The stepwise method. Stat 302 Notes. Week 8, Hour 1, Page 1 / 34 Human growth

More information

Daniel Boduszek University of Huddersfield

Daniel Boduszek University of Huddersfield Daniel Boduszek University of Huddersfield Introduction to Multiple Regression (MR) Types of MR Assumptions of MR SPSS procedure of MR Example based on prison data Interpretation of

More information

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES

11/18/2013. Correlational Research. Correlational Designs. Why Use a Correlational Design? CORRELATIONAL RESEARCH STUDIES Correlational Research Correlational Designs Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are

More information

Research Methods in Forest Sciences: Learning Diary. Yoko Lu December Research process

Research Methods in Forest Sciences: Learning Diary. Yoko Lu December Research process Research Methods in Forest Sciences: Learning Diary Yoko Lu 285122 9 December 2016 1. Research process It is important to pursue and apply knowledge and understand the world under both natural and social

More information

Linear Regression Analysis

Linear Regression Analysis Linear Regression Analysis WILEY SERIES IN PROBABILITY AND STATISTICS Established by WALTER A. SHEWHART and SAMUEL S. WILKS Editors: David J. Balding, Peter Bloomfield, Noel A. C. Cressie, Nicholas I.

More information

Introduction to regression

Introduction to regression Introduction to regression Regression describes how one variable (response) depends on another variable (explanatory variable). Response variable: variable of interest, measures the outcome of a study

More information

This tutorial presentation is prepared by. Mohammad Ehsanul Karim

This tutorial presentation is prepared by. Mohammad Ehsanul Karim STATA: The Red tutorial STATA: The Red tutorial This tutorial presentation is prepared by Mohammad Ehsanul Karim STATA: The Red tutorial This tutorial presentation is prepared by

More information


CRITERIA FOR USE. A GRAPHICAL EXPLANATION OF BI-VARIATE (2 VARIABLE) REGRESSION ANALYSISSys Multiple Regression Analysis 1 CRITERIA FOR USE Multiple regression analysis is used to test the effects of n independent (predictor) variables on a single dependent (criterion) variable. Regression tests

More information

In many cardiovascular experiments and observational studies,

In many cardiovascular experiments and observational studies, Statistical Primer for Cardiovascular Research Multiple Linear Regression Accounting for Multiple Simultaneous Determinants of a Continuous Dependent Variable Bryan K. Slinker, DVM, PhD; Stanton A. Glantz,

More information

Chapter 1: Exploring Data

Chapter 1: Exploring Data Chapter 1: Exploring Data Key Vocabulary:! individual! variable! frequency table! relative frequency table! distribution! pie chart! bar graph! two-way table! marginal distributions! conditional distributions!

More information

1.4 - Linear Regression and MS Excel

1.4 - Linear Regression and MS Excel 1.4 - Linear Regression and MS Excel Regression is an analytic technique for determining the relationship between a dependent variable and an independent variable. When the two variables have a linear

More information

11/24/2017. Do not imply a cause-and-effect relationship

11/24/2017. Do not imply a cause-and-effect relationship Correlational research is used to describe the relationship between two or more naturally occurring variables. Is age related to political conservativism? Are highly extraverted people less afraid of rejection

More information

Linear Regression in SAS

Linear Regression in SAS 1 Suppose we wish to examine factors that predict patient s hemoglobin levels. Simulated data for six patients is used throughout this tutorial. data hgb_data; input id age race $ bmi hgb; cards; 21 25

More information

Regression Discontinuity Analysis

Regression Discontinuity Analysis Regression Discontinuity Analysis A researcher wants to determine whether tutoring underachieving middle school students improves their math grades. Another wonders whether providing financial aid to low-income

More information

Statistics 2. RCBD Review. Agriculture Innovation Program
