RESPONSE SURFACE MODELING AND OPTIMIZATION TO ELUCIDATE THE DIFFERENTIAL EFFECTS OF DEMOGRAPHIC CHARACTERISTICS ON HIV PREVALENCE IN SOUTH AFRICA

W. Sibanda 1* and P. Pretorius 2
1 DST/NWU Pre-clinical platform, North West University, South Africa, wilbert.sibanda@nwu.ac.za
2 School of Information Technology, North West University, South Africa, philip.pretorius@nwu.ac.za
* Corresponding Author

ABSTRACT

In this study, a Central Composite Face Centered (CCF) design was employed to study the individual and interaction effects of demographic characteristics on the spread of HIV in South Africa. The demographic characteristics studied for each pregnant mother attending an antenatal clinic in South Africa were mother's age, partner's age, mother's level of education and parity. Using the 2007 South African annual antenatal HIV and syphilis seroprevalence data, the HIV status of an antenatal clinic attendee was found to be highly sensitive to changes in the pregnant woman's age and her partner's age. Individually, the pregnant woman's level of education and parity had no significant effect on HIV status. However, these two demographic characteristics exhibited significant effects on the HIV status of antenatal clinic attendees in two-way interactions with other demographic characteristics. A 3D response surface plot indicated that the highest rate of HIV positive individuals occurred at the highest age of the pregnant women and the lowest age of their partners.

CIE42 Proceedings, 16-18 July 2012, Cape Town, South Africa, 2012 CIE & SAIIE

1 INTRODUCTION

In South Africa, the annual antenatal HIV survey is the only existing national surveillance activity for determining HIV prevalence, and it is therefore a vitally important tool for tracking the geographic and temporal trends of the epidemic (Department of Health) [1]. Antenatal clinic data contain the following demographic characteristics for each pregnant woman: age (herein called mothage), population group (race), level of education (herein called education), gravidity (number of pregnancies), parity (number of children born), partner's age (herein called fathage), name of clinic, and HIV and syphilis results (Department of Health) [2].

This research paper explores the application of response surface methodology (RSM) to study the intricate relationships between antenatal data demographic characteristics and one response variable (HIV prevalence). RSM is a collection of mathematical and statistical techniques used for modelling and analysing problems in which a response of interest is influenced by several variables and the objective is to optimize this response (Montgomery) [4]. The specific RSM design used in this research is the Central Composite Face Centred (CCF) design, a variant of the central composite design first proposed by G. E. P. Box and K. B. Wilson in 1951.

This study follows up on our previous work (Sibanda) [3], where we used a two-level fractional factorial design to rank the demographic characteristics affecting the HIV status of pregnant mothers attending antenatal clinics for the first time in South Africa, from most to least important. The two-level fractional factorial design demonstrated that, among the demographic characteristics, mother's age had the greatest influence on the HIV status of antenatal clinic attendees.
The effects of the demographic characteristics were ranked using Lenth's plot (Figure 1) as follows: mother's age > level of education > parity > father's age > gravidity > syphilis.

Figure 1: Lenth Plot

The results of the two-level fractional factorial design are summarized in Tables 1 and 2 below.

Table 1: Summary of Results for Two-Level Fractional Factorial Design

Summary
R²                            0.84
R² adjusted                   0.76
Standard Error                0.18
PRESS                         0.52
R² for Prediction             0.34
First Order Autocorrelation   -0.74
Collinearity                  0.83
Coefficient of Variation      52.17
Precision Index               9.96

Table 2: ANOVA for Two-Level Fractional Factorial Design

Source       SS     SS%   MS     F       F signif   df
Regression   0.66   84    0.33   10.64   0.03       2
Residual     0.12   16    0.03                      4
LOF Error    0.05   43    0.05   2026    0.23       1
Pure Error   0.07   57    0.02                      3
Total        0.78   100                             6

As shown in Table 1, the adjusted R² (coefficient of determination) value for the fitted model was 0.76. The R² value gives the proportion of variability in a data set that is accounted for by the statistical model, and so indicates how well future outcomes are likely to be predicted by the model. In other words, R² provides information about the goodness of fit of the model. The modest adjusted R² value of the fractional factorial model suggested that a response surface model (RSM) might help to elucidate the effect of interactions between demographic characteristics on the regression model. This belief was further supported by the low value of the F-statistic (F = 10.64). The F value indicates the overall significance of the regression model and is thus used to decide whether the model as a whole has statistically significant predictive capability.
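The relationship between the ANOVA entries in Table 2 and the summary statistics in Table 1 can be sketched in a few lines of Python. This is only an illustrative check, not the authors' computation: the function names are hypothetical, and because the sums of squares are taken from the rounded values printed in Table 2, the results agree with Table 1 only approximately.

```python
# Rounded sums of squares and degrees of freedom as printed in Table 2.
ss_regression, df_regression = 0.66, 2
ss_residual, df_residual = 0.12, 4
ss_total = ss_regression + ss_residual      # 0.78, matches the Total row
df_total = df_regression + df_residual      # 6, matches the Total row


def r_squared(ss_reg, ss_tot):
    """Proportion of total variation explained by the regression."""
    return ss_reg / ss_tot


def adjusted_r_squared(r2, df_tot, df_res):
    """R-squared penalized for the number of model terms."""
    return 1.0 - (1.0 - r2) * df_tot / df_res


def f_statistic(ss_reg, df_reg, ss_res, df_res):
    """Ratio of the regression mean square to the residual mean square."""
    return (ss_reg / df_reg) / (ss_res / df_res)


r2 = r_squared(ss_regression, ss_total)
adj_r2 = adjusted_r_squared(r2, df_total, df_residual)
f_value = f_statistic(ss_regression, df_regression, ss_residual, df_residual)

print(f"R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}, F = {f_value:.2f}")
```

With the rounded inputs the sketch gives values close to the 0.84, 0.76 and 10.64 reported in Tables 1 and 2; the small differences are consistent with rounding in the printed sums of squares.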

2 LITERATURE REVIEW

2.1 Response Surface Methodology (RSM)

RSM is a collection of statistical and mathematical methods that are useful for modelling and analysing designed experiments. RSM experiments are designed to allow the estimation of interaction and even quadratic effects, and therefore give an idea of the local shape of the response surface being investigated. Linear terms alone produce models whose response surfaces are hyperplanes. The addition of interaction terms allows for warping of the hyperplane. Squared terms produce the simplest models in which the response surface has a maximum or minimum, and so an optimal response. RSM comprises three fundamental techniques (Myers) [5], namely:
Statistical experimental design
Regression modelling
Optimization

The steps involved in the design of experiments using RSM are outlined in Figure 2.

1. Design of experiments for measurement of response
2. Mathematical model development
3. Finding the optimal set of experimental parameters
4. Two- or three-dimensional plots of interactive effects

Figure 2: Design procedure of an RSM

2.2 Central Composite Face Centred (CCF) Design

The Central Composite Face Centered (CCF) design is an RSM design that is widely used for fitting a second-order response surface (Mutnury) [6]. A CCF design combines the factorial points of a two-level factorial design with axial points and center runs. The factorial points represent a variance-optimal design for a first-order model, and the center runs provide information about the existence of curvature in the system (Zhang) [7]. If curvature is found in the system, the addition of axial points allows for efficient estimation of the pure quadratic terms.
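The structure of a face-centered design described above can be sketched in Python. This is a minimal illustration in coded units, assuming a full two-level factorial portion; the function name `ccf_design` and its parameters are hypothetical and are not taken from Design Expert. The study itself uses a fractional factorial portion, so its run count is smaller than the full version generated here.

```python
from itertools import product


def ccf_design(n_factors, n_center=1):
    """Return a face-centered central composite design in coded units.

    Factorial points sit at the corners (-1/+1 on every factor), axial
    points sit at the centers of the faces (-1/+1 on one factor, 0 on
    the others), and center runs are replicated n_center times.
    """
    factorial = [tuple(point) for point in product((-1, 1), repeat=n_factors)]

    axial = []
    for i in range(n_factors):
        for level in (-1, 1):
            point = [0] * n_factors
            point[i] = level
            axial.append(tuple(point))

    center = [tuple([0] * n_factors) for _ in range(n_center)]
    return factorial + axial + center


# Small two-factor example: 4 factorial + 4 axial + 3 center runs.
for run in ccf_design(2, n_center=3):
    print(run)
```

Because the axial points lie on the faces of the cube, every factor takes only the three levels -1, 0 and +1, which is what makes the face-centered variant convenient when the factor levels are fixed categories or ranges.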
Therefore the CCF design is useful for experiments in which a second-order response surface needs to be fitted.

3 EXPERIMENTAL METHODOLOGY

3.1 Sources of Data

The seroprevalence data studied were obtained from the 2007 South African antenatal data, supplied by the National Department of Health of South Africa (Department of Health) [1]. The data consisted of about 32 000 subjects who attended antenatal clinics for the first time across the nine provinces of South Africa in 2007.

3.2 Research Tools

This research utilized the following research tools:
1. Design Expert V8 Software (Design Expert) [8]
2. SAS 9.3, an integrated system of software products provided by SAS Institute Inc.
3. Essential Regression and Experimental Design, version 2.2 (Gibsonia, PA)

3.3 Sampling Procedure

To facilitate the experimental design, the data were completely randomized. This was done as a preprocessing step to reduce bias in the design of the experiment.

3.4 Missing Data

Out of the total of 31 808 cases from the 2007 South African antenatal seroprevalence database, 21 646 (68%) cases were found to be complete. The remaining 10 162 (32%) cases were incomplete and were thus discarded.

3.5 Variables

The variables used in the study were parity, education, mothage, fathage and HIV status. The integer value representing level of education stands for the highest grade successfully completed, with 13 representing tertiary education. Parity represents the number of times the individual has given birth. Parity is important as it reflects both the reproductive activity and the reproductive health of the women. HIV status is binary coded: a 1 represents a positive status, while a 0 represents a negative status.

3.6 Experimental Design

In this study, the aim was to use a Central Composite Face Centered (CCF) design to study the individual and interaction effects of demographic characteristics on the HIV status of a pregnant mother using seroprevalence data. The CCF design with four factors and one response variable was developed as shown in Table 3. A two-factor interaction (2FI) design model was used, with 21 runs and no blocks. -1 and +1 denote the minimum and maximum levels of the factors respectively.
Table 3: The CCF Design Matrix with 4 Factors, 1 Response Variable and 4 Center Points

       Factors                                  Response
Run    Mothage   Fathage   Education   Parity   HIV
1       1        -1        -1           1       -
2       0         0         0           0       0.34
3      -1         1        -1           1       0.13
4      -1         1         1           1       -
5       0         0         0           0       0.34
6       0         1         0           0       0.30
7       1         0         0           0       -
8       0         0         0           0       0.34
9       0         0         0           1       0.31
10     -1        -1        -1          -1       0.14
11      1        -1         1           1       0.21
12     -1        -1         1          -1       0.00
13      0         0         0           0       0.34
14      0        -1         0           0       0.37
15      0         0         1           0       0.33
16      0         0         0          -1       0.30
17      0         0        -1           0       -
18     -1         0         0           0       0.10
19      1         1         1          -1       0.36
20      1         1        -1          -1       -
21      0         0         0           0       0.34

3.6.1 Design Matrix Evaluation

Degrees of Freedom

Design matrix evaluation showed that there were no aliases for the 2FI model. The degrees of freedom for the matrix are shown in Table 4. As a rule of thumb, a minimum of 3 lack-of-fit df and 4 pure error df ensures a valid lack-of-fit test. Fewer df tend to lead to a test that may not detect lack of fit (Design Expert) [8].

Table 4: Degrees of Freedom for matrix evaluation

Model         10
Residuals     10
Lack of Fit   6
Pure Error    4
Corr total    20

Standard Errors

The standard errors of the design are shown in Figure 3. These errors are larger at the edges of the design space, so it is advisable to work well within the design margins to achieve a greater degree of accuracy.
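The degrees of freedom in Table 4 can be reproduced by simple counting from the coded runs of Table 3. The sketch below is an illustrative bookkeeping exercise rather than the Design Expert evaluation itself: the variable names are hypothetical, and it assumes that the 2FI model matrix has full column rank, so that every model term consumes one degree of freedom.

```python
from math import comb

# Coded factor settings (mothage, fathage, education, parity) for the
# 21 runs listed in Table 3, in run order.
design = [
    (1, -1, -1, 1), (0, 0, 0, 0), (-1, 1, -1, 1), (-1, 1, 1, 1),
    (0, 0, 0, 0), (0, 1, 0, 0), (1, 0, 0, 0), (0, 0, 0, 0),
    (0, 0, 0, 1), (-1, -1, -1, -1), (1, -1, 1, 1), (-1, -1, 1, -1),
    (0, 0, 0, 0), (0, -1, 0, 0), (0, 0, 1, 0), (0, 0, 0, -1),
    (0, 0, -1, 0), (-1, 0, 0, 0), (1, 1, 1, -1), (1, 1, -1, -1),
    (0, 0, 0, 0),
]

n_runs = len(design)
n_factors = 4

# A 2FI model has an intercept, one main effect per factor and one
# term per pair of factors.
n_terms = 1 + n_factors + comb(n_factors, 2)

n_distinct = len(set(design))              # replicated runs collapse here

df_total = n_runs - 1
df_model = n_terms - 1                     # intercept is not counted
df_residual = df_total - df_model
df_pure_error = n_runs - n_distinct        # comes from replicated runs only
df_lack_of_fit = df_residual - df_pure_error

print(df_model, df_residual, df_lack_of_fit, df_pure_error, df_total)
```

The pure error degrees of freedom come entirely from the replicated center runs, which is why the number of center replicates determines whether the rule of thumb quoted above is met.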

Figure 3: 3D Plot of standard error of design (horizontal axes: A: mothage and B: fathage; vertical axis: Std Error of Design)

Variance Inflation Factor (VIF)

The Variance Inflation Factor (VIF) quantifies the severity of multicollinearity in an ordinary least squares regression analysis. It provides an index that measures how much the variance of an estimated regression coefficient is increased because of collinearity. Ideally, VIF values should be 1; values greater than 10 indicate that coefficients are poorly estimated due to multicollinearity (Design Expert) [8]. The VIF values in Table 5 indicate that the coefficients of the individual demographic characteristics and their interactions are estimated adequately, without multicollinearity. However, the quadratic terms displayed a higher degree of multicollinearity.

Table 5: Signal to noise ratio with the design matrix

Term   VIF   R_i-squared
A      1.0   0.0
B      1.0   0.0
C      1.0   0.0
D      1.0   0.0
E      1.0   0.0
AB     1.0   0.0
AC     1.0   0.0
AD     1.0   0.0
AE     1.0   0.0
BC     1.0   0.0
BD     1.0   0.0
BE     1.0   0.0
CD     1.0   0.0
CE     1.0   0.0
DE     1.0   0.0
A^2    4     0.77
B^2    4     0.77
C^2    4     0.77
D^2    4     0.77
E^2    4     0.77

In general, high R_i-squared values mean that the terms are correlated with each other, leading to a poor model. For this experiment, low R_i-squared values were obtained for the individual factors and their interactions, but higher R_i-squared values were obtained for the quadratic terms, as shown in Table 5.

Fraction of Design Space (FDS)

The FDS curve (Figure 4) shows the fraction of the design space volume that has a standard error of prediction at or below a given value. A flatter FDS curve means that the prediction error is more nearly constant over the design space. In general, the larger the standard error of prediction, the less likely it is that the results can be repeated, and the less likely that a significant effect will be detected.

Figure 4: FDS Plot of the Standard Error over the Design Space (horizontal axis: Fraction of Design Space; vertical axis: Std Error Mean)
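The link between the VIF and R_i-squared columns of Table 5 follows from the standard definition VIF = 1 / (1 - R_i²), where R_i² is obtained by regressing term i on all the other model terms. The short Python sketch below illustrates that formula only; the function names are hypothetical and it does not reproduce how Design Expert evaluates the design.

```python
def vif_from_r_squared(r_i_squared):
    """Variance inflation factor for a term with the given R_i-squared."""
    if not 0.0 <= r_i_squared < 1.0:
        raise ValueError("R_i-squared must lie in [0, 1)")
    return 1.0 / (1.0 - r_i_squared)


def r_squared_from_vif(vif):
    """Inverse relation: the R_i-squared implied by a given VIF."""
    if vif < 1.0:
        raise ValueError("VIF cannot be smaller than 1")
    return 1.0 - 1.0 / vif


# Orthogonal terms (R_i-squared of 0) have no variance inflation.
print(vif_from_r_squared(0.0))
# The rule-of-thumb limit of VIF = 10 corresponds to this R_i-squared.
print(r_squared_from_vif(10.0))
# R_i-squared implied by the VIF of 4 listed for the quadratic terms.
print(r_squared_from_vif(4.0))
```

A VIF of exactly 4 corresponds to an R_i-squared of 0.75, while the printed 0.77 corresponds to a VIF of about 4.35, so one of the two columns in Table 5 is presumably rounded.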

3.6.2 Choice of Levels for the Factors

Table 6: Factor Levels

                            Levels
Factor                      -1      0       1
Parity (No. of children)    0       1       > 2
Education (Grades)          < 8     9-11    12-13
Mothage (years)             < 20    21-29   > 30
Fathage (years)             < 24    25-33   > 34

4 RESULTS

4.1 Response Transformations

A ratio of maximum to minimum response greater than 10 usually indicates that a transformation is required, whereas for ratios below 10 a power transformation has little effect. As shown in Table 7, the ratio was below 10, hence the response (HIV) and the model terms were not transformed for this study.

Table 7: Response Ratio

                  Minimum   Maximum   Ratio
Response (HIV)    0.09      0.33      0.33/0.09 ≈ 3.7

4.2 Fit Summary

4.2.1 Model Summary Statistics

Table 8: Model Summary Statistics

Source   Sequential p-value   Lack-of-fit p-value   R²     Adjusted R²   Adeq. Precision
Linear   0.0252               0.0002                0.64   0.50
2FI      0.0005               0.0103                0.99   0.98          25

The R² and adjusted R² statistics of the 2FI model are high, at 0.99 and 0.98 respectively, as shown in Table 8. High R² values imply that a large proportion of the variation in the observed values is explained by the model. In addition, the lack-of-fit p-value of the 2FI model (0.0103) indicates that the model lack of fit is not significant at the 1% level.

4.2.2 ANOVA for 2FI Response Surface

From the ANOVA results (Table 9), it is evident that the mother's age and the father's age are significant terms in the 2FI model, while educational level and parity individually are not. However, the individually non-significant terms tend to be significant in two-way interactions with other demographic characteristics. The model F-value (Table 9) of 63.77 implies that the model is significant: there is only a 0.01% chance that a model F-value this large could be due to noise.

Table 9: ANOVA Results

Source         Sum of Squares   df   Mean Square   F value   P value
Model          0.18             9    0.02          63.77     0.0001
A-Mothage      0.047            1    0.047         146.5     <0.0001
B-Fathage      0.002            1    0.002         7.72      0.0390
C-Education    0.000001         1    0.000001      0.004     0.9498
D-Parity       0.00005          1    0.00005       0.16      0.7079
AB             0.11             1    0.11          33.11     0.0022
AC             0.038            1    0.038         118.44    0.0001
AD             0.007            1    0.007         20.58     0.0062
BC             0.024            1    0.024         75.8      0.0003
BD             0.011            1    0.011         33.63     0.0021
CD             0.000            0    0.000

Adequate precision (Adeq. Precision) measures the signal-to-noise ratio. A ratio greater than 4 is desirable, and for this experiment a ratio of 25 indicates an adequate signal. Therefore this model can be used to navigate the design space.

5 RESIDUAL ANALYSIS

There are many statistical tools for model validation, but the primary tool for most process modelling applications is graphical residual analysis. Residual plots assist in examining the underlying statistical assumptions about residuals (see Table 10). Residual analysis is therefore a useful class of techniques for evaluating the goodness of a fitted model. One method of residual analysis is the normal plot of residuals.

Table 10: Statistical assumptions about residuals

Independence        Whether response variables are independent
Normality           Whether response variables are normally distributed
Homoscedasticity    Whether all response variables have the same variance
Linearity           Whether the true relationship between response and explanatory variables is a straight line

5.1 Normal Plot of Residuals

The normal plot of residuals (Figure 5) evaluates whether there are outliers in the dataset. All the points lie on the diagonal, implying that the residuals constitute normally distributed noise. A curved pattern would indicate non-modelled quadratic relations or incorrect transformations.

Figure 5: Normal plot of residuals (horizontal axis: Internally Studentized Residuals; vertical axis: Normal % Probability)

6 FINAL EQUATION OF THE RESPONSE MODEL

The final equation of the HIV response model, in coded units, was as shown below:

HIV = + 0.33
      + 0.23  * Mothage
      - 0.035 * Fathage
      - 0.013 * Education
      + 0.005 * Parity
      - 0.140 * Mothage * Fathage
      - 0.120 * Mothage * Education
      - 0.070 * Mothage * Parity
      - 0.020 * Fathage * Parity
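To make the final equation concrete, it can be written as a small Python function that evaluates the predicted HIV response at any point in coded units (-1 to +1). This is a sketch for illustration only: the names `COEFFICIENTS` and `predict_hiv` are hypothetical, and the coefficients are simply copied from the equation above.

```python
# Coefficients of the 2FI response model in coded units, keyed by the
# factors that appear in each term (the empty tuple is the intercept).
COEFFICIENTS = {
    (): 0.33,
    ("mothage",): 0.23,
    ("fathage",): -0.035,
    ("education",): -0.013,
    ("parity",): 0.005,
    ("mothage", "fathage"): -0.140,
    ("mothage", "education"): -0.120,
    ("mothage", "parity"): -0.070,
    ("fathage", "parity"): -0.020,
}


def predict_hiv(mothage, fathage, education, parity):
    """Evaluate the fitted model at coded factor levels."""
    levels = {
        "mothage": mothage,
        "fathage": fathage,
        "education": education,
        "parity": parity,
    }
    total = 0.0
    for factors, coefficient in COEFFICIENTS.items():
        term = coefficient
        for name in factors:
            term *= levels[name]
        total += term
    return total


# At the center of the design every term except the intercept vanishes.
print(round(predict_hiv(0, 0, 0, 0), 3))
# Oldest mothers with youngest partners, other factors at mid level.
print(round(predict_hiv(1, -1, 0, 0), 3))
```

At the design center the prediction reduces to the intercept, which is close to the 0.34 observed for the center runs in Table 3.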

A coefficient plot (Figure 6) was drawn to represent the information provided by the 2FI response model equation. Coefficient plots clearly show the relative importance of each variable in the model equation.

Figure 6: Coefficient Plot of the Different Demographic Characteristics (terms plotted: mothage, fathage, education, parity, mot*fat, mot*edu, mot*parity, fat*parity)

Inspection of the regression coefficients (Figure 6) indicates that the two model terms level of education and parity are not significant and can be removed from the model.

7 PERTURBATION PLOT

The perturbation plot (Figure 7) compares the effects of all factors at a particular point in the design space. A steep slope or curvature in a factor shows that the response is sensitive to that factor. A relatively flat line shows insensitivity to change in that particular factor. However, the perturbation plot does not show interactions. The perturbation plot in Figure 7 indicates that the effects of the demographic characteristics on the response are in the order: Mothage (A) > Fathage (B) > Education (C) > Parity (D).

Figure 7: Perturbation Plot (horizontal axis: Deviation from Reference Point (Coded Units); vertical axis: HIV; one line per factor A, B, C, D)
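The idea behind the perturbation plot can be sketched by moving one factor at a time away from a reference point while holding the others fixed. The Python example below is an illustrative sketch, assuming the reference point is the design center (all factors at coded level 0) and using the coefficients of the final model equation; the function names are hypothetical.

```python
FACTORS = ("mothage", "fathage", "education", "parity")


def hiv_model(a, b, c, d):
    """2FI model in coded units: a=mothage, b=fathage, c=education, d=parity."""
    return (0.33 + 0.23 * a - 0.035 * b - 0.013 * c + 0.005 * d
            - 0.140 * a * b - 0.120 * a * c - 0.070 * a * d - 0.020 * b * d)


def perturbation_range(index, reference=(0, 0, 0, 0)):
    """Change in response when one factor moves from -1 to +1."""
    low = list(reference)
    high = list(reference)
    low[index], high[index] = -1, 1
    return hiv_model(*high) - hiv_model(*low)


ranges = {name: perturbation_range(i) for i, name in enumerate(FACTORS)}
ranking = sorted(FACTORS, key=lambda name: abs(ranges[name]), reverse=True)

for name in ranking:
    print(f"{name:10s} {ranges[name]:+.3f}")
```

Because the interaction terms vanish when the other factors are held at 0, each range is just twice the corresponding main-effect coefficient, which is why a perturbation plot at the center cannot reveal interactions.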

8 3D RESPONSE SURFACE PLOT

Figure 8 shows the 3D plot of the influences of mothage and fathage on the HIV response. The highest rate of HIV is observed at the highest age of the mother and the lowest age of the father.

Figure 8: 3D Response Surface plot (horizontal axes: A: mothage and B: fathage; vertical axis: HIV)

9 DISCUSSION

A central composite face centered (CCF) design was found to be suitable for studying the involvement of demographic characteristics in the determination of the HIV status of pregnant women attending antenatal clinics in South Africa. The 2FI polynomial function for mothage, fathage, education and parity obtained using StatEase Design Expert was found to be statistically significant. The measured HIV prevalence response was in close agreement with the predicted values, as shown in Figure 9.

Figure 9: Plot of Predicted vs. Actual Response (horizontal axis: Actual; vertical axis: Predicted)
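The location of the maximum read off the response surface in Section 8 can be checked numerically with a coarse grid search over mothage and fathage. The sketch below is only an illustration under stated assumptions: education and parity are held at their coded mid level (0), the grid step of 0.1 is arbitrary, and the function names are hypothetical.

```python
def hiv_model(a, b, c=0.0, d=0.0):
    """2FI model in coded units: a=mothage, b=fathage, c=education, d=parity."""
    return (0.33 + 0.23 * a - 0.035 * b - 0.013 * c + 0.005 * d
            - 0.140 * a * b - 0.120 * a * c - 0.070 * a * d - 0.020 * b * d)


# Coded levels from -1.0 to 1.0 in steps of 0.1.
levels = [i / 10 for i in range(-10, 11)]

# Every (mothage, fathage) combination on the grid.
grid = [(a, b) for a in levels for b in levels]

best = max(grid, key=lambda point: hiv_model(*point))
worst = min(grid, key=lambda point: hiv_model(*point))

print("maximum at", best, "predicted HIV", round(hiv_model(*best), 3))
print("minimum at", worst, "predicted HIV", round(hiv_model(*worst), 3))
```

With only linear and interaction terms in these two factors the surface is a twisted plane, so its extremes necessarily lie at corners of the design region, and the grid search recovers the corner with the highest mothage and lowest fathage as the maximum.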

10 CONCLUSION

The CCF design therefore confirmed the result obtained by the fractional factorial design (Sibanda) [3], that mother's age had the greatest effect on the HIV status of an antenatal clinic attendee. However, the CCF design further demonstrated that interactions between factors had a significant effect on an individual's HIV status. The R² value of the predictive model improved from 33.5% (fractional factorial design in the previous study) to 98% (CCF). The latter result demonstrated that the relationship between the demographic characteristics and the HIV response was better modelled by a 2FI function.

11 ACKNOWLEDGEMENTS

Wilbert Sibanda acknowledges doctoral funding from the South African Centre for Epidemiological Modelling (SACEMA), the Medical Research Council (MRC) and North-West University. Special thanks to Cathrine Tlaleng Sibanda and the National Department of Health (South Africa) for the antenatal seroprevalence data (2006-2007).

12 REFERENCES

[1] Department of Health. 2010. National Antenatal Sentinel HIV and Syphilis Prevalence in South Africa.
[2] Department of Health. 2010. Protocol for implementing the National Antenatal Sentinel HIV and Syphilis Prevalence Survey in South Africa.
[3] Sibanda, W. 2011. Application of Two-level Fractional Factorial Design to Determine and Optimize the Effect of Demographic Characteristics on HIV Prevalence using the 2006 South African Annual Antenatal HIV and Syphilis Seroprevalence data, International Journal of Computer Applications, 35 (12).
[4] Montgomery, D.C. 2008. Design and Analysis of Experiments, John Wiley and Sons.
[5] Myers, R.H. 2002. Response Surface Methodology: Process and Product Optimization Using Designed Experiments, 2nd Edition, John Wiley and Sons.
[6] Mutnury, B. 2011. Modeling and Characterization of High Speed Interfaces in Blade and Rack Servers Using Response Surface Model, Electronic Components and Technology Conference (ECTC).
[7] Zhang, Z. 2008. Comparison about the Three Central Composite Designs with Simulation, International Conference on Advanced Computer Control (ICACC).
[8] Design Expert 8.0.71. StatEase software.