UNIVERSITY OF KWAZULU-NATAL, PIETERMARITZBURG SCHOOL OF MATHEMATICS, STATISTICS AND COMPUTER SCIENCE

A COMPLEX SURVEY DATA ANALYSIS OF TB AND HIV MORTALITY IN SOUTH AFRICA
By
JOIE LEA MURORUNKWERE
STUDENT NUMBER: 205980
A thesis submitted in fulfilment of the academic requirements for the degree of
MASTER OF SCIENCE
in
APPLIED STATISTICS
2012

DECLARATION
I, the undersigned, hereby declare that the work contained in this thesis is my original work, and that any work done by others or by myself previously has been acknowledged and referenced.
November 2012
Ms JOIE LEA MURORUNKWERE (205980)   Date
Supervisor: Prof. HENRY MWAMBI   Date
Co-Supervisor: Dr. ACHIA THOMAS   Date

DEDICATION
Almighty God
To my family
To all my friends

ACKNOWLEDGEMENTS
The greatest of all thanks goes to Almighty God, who has sustained me throughout all these years; knowledge and wisdom all come from Him. This work could not have been done without the help of many people, whom I hereby acknowledge. My special thanks go to all who have contributed to my education to date. In a special way, I would like to thank my supervisor, Professor Henry Mwambi, for his tremendous hard work with me to complete this thesis successfully. Thank you for your invaluable guidance, unfathomable support, encouragement and patience at all times. I am grateful to my co-supervisor, Dr. Achia Thomas; without you this would not be what it is today. Thank you for your fatherly role and excellent assistance towards the successful submission of my thesis. My sincere gratitude is also expressed to the Statistics staff of the University of KwaZulu-Natal. I would like to thank Statistics South Africa for allowing me to use their rich dataset in this thesis. I am highly indebted to the ACTs PMB members for the assistance and cooperation I received from them. Thanks to my fellow Statistics postgraduate students, who willingly came forward with elaborate suggestions during this undertaking. To my family, my late parents, cousins, brothers and sisters: thank you for supporting me even in difficult times; you were my pillar of strength. I would like to extend my deepest and most heartfelt gratitude to Father Incimatata Oreste for the fatherly role, love and support he has given me since what seems like forever. My sincere gratitude goes to the Carmel Sisters, especially Sister Liberatha. I would also like to thank my wonderful friends, Mukandoli Caritas, Nadine Ineza, Egida Uwizeyemariya and Muna William, for their support, wise advice, prayers and love. Finally, I would like to acknowledge all those who contributed in one way or another to the completion of this thesis.

ABSTRACT
Many countries in the world record annual summary statistics, such as economic indicators like Gross Domestic Product (GDP), and vital statistics, for example the number of births and deaths. In this thesis we focus on mortality data from various causes, including tuberculosis (TB) and HIV. TB is an infectious disease caused by the bacterium Mycobacterium tuberculosis, and it is the main cause of death in the world among all infectious diseases. An additional complexity is that HIV/AIDS acts as a catalyst for the occurrence of TB. Vaidyanathan and Singh showed that people infected with Mycobacterium tuberculosis alone have an approximately 10% lifetime risk of developing active TB, compared with 60% or more in persons co-infected with HIV and Mycobacterium tuberculosis. South Africa was ranked seventh highest by the World Health Organization among the 22 TB high-burden countries in the world and fourth highest in Africa. The research in this thesis uses the 2007 Statistics South Africa (STATSSA) data on TB and HIV as the primary causes of death to build statistical models that can be used to investigate factors associated with death due to TB. Logistic regression, survey logistic regression and generalized linear models (GLMs) are used to assess the effect of risk factors or predictors on the probability of death associated with TB and HIV. The study is guided by a theoretical framework for understanding factors associated with TB and HIV deaths. Bayesian modelling in WinBUGS is used to fit spatial models of relative risk with spatial prior distributions for disease mapping. Of the 6532 deceased, 54697 (89%) died from natural causes, 479 (2%) were stillborn and 5426 (9%) died from non-natural causes, possibly accidents, murder or suicide. Among those who died from natural causes and disease, 65052 (2%) died of TB and 378 (2%) died of HIV. The results of the analysis revealed risk factors associated with TB and HIV mortality.

LIST OF SYMBOLS AND NOTATIONS
$\chi^2_{\alpha,n}$ : Chi-square distribution with n degrees of freedom
$\hat{\sigma}^2$ : Estimator of population variance
$E(x)$ : Expectation of x
$E(y)$ : Expectation of y
$F_{\alpha,m,n}$ : F distribution with m and n degrees of freedom
$\mu$ : Population mean
$\sigma^2$ : Population variance
$\pi$ : Proportional parameter
$\hat{\beta}$ : Estimator of regression coefficient
$\bar{X}$ : Sample mean
$\alpha$ : Significance level of a test or probability of Type I error
$t_{\alpha,n}$ : t-distribution with n degrees of freedom
$\hat{\theta}$ : Unbiased estimator of the parameter $\theta$
$E(x \mid y)$ : Expectation of x given y
$\mathrm{Var}(x)$ : Variance of x
$f(x_1, x_2, \ldots, x_n; \theta)$ : Joint probability distribution
$n$ : Sample size

LIST OF ABBREVIATIONS
AIDS: Acquired Immune Deficiency Syndrome
DOTS: Directly Observed Treatment, Short-course
GDP: Gross Domestic Product
GLM: Generalized Linear Models
GLMM: Generalized Linear Mixed Models
HIV: Human Immunodeficiency Virus
OR: Odds ratio
SLR: Survey Logistic Regression
STATA: Statistical analysis software (Stata)
STATSSA: Statistics South Africa
TB: Tuberculosis
WHO: World Health Organization

Table of Contents
DECLARATION
DEDICATION
ACKNOWLEDGEMENTS
ABSTRACT
LIST OF SYMBOLS AND NOTATIONS
LIST OF ABBREVIATIONS
LIST OF TABLES
INTRODUCTION AND LITERATURE REVIEW
1.1 Background
1.2 Literature Review
1.2.1 The Problem
1.2.2 TB and HIV Interaction
1.2.3 Theoretical Framework
1.2.3.1 Biological factors
1.2.3.2 Socio-economic factors
1.2.3.3 Environmental factors
1.2.3.4 Socio-demographic factors
1.3 Methodology used in the study
1.3.1 Collection of data
1.3.2 Statistical Analysis and Statistical Software
1.4 Objectives of this thesis
1.4.1 General objective
1.5 Overview of the thesis
EXPLORATORY DATA ANALYSIS
2.1 Introduction
2.2 Data Description
2.3 Exploratory Analysis of TB Deaths Data
2.5 Interaction of TB and HIV
2.6 Summary
2.6.1 Tuberculosis
2.6.2 Human Immunodeficiency Virus
GENERALIZED LINEAR MODELS
3.1 The Exponential Family
3.2 Estimation of Parameters
3.3 Model Checking
3.3.1 Goodness-of-fit Test
3.4 Model Selection and Diagnostics
3.4.1 Model Selection
3.5 Logistic Regression Model (LRM)
3.5.1 Fitting a logistic regression model
3.5.2 Odds ratios
3.6 Model Selection and Diagnostics for LRM
3.7 Model checking
3.7.1 Goodness-of-fit Test
3.8 Cluster Survey Logistic Regression Model (CSLRM)
3.8.1 Introduction
3.8.2 The Model (CSLRM)
3.8.3 Estimation of parameters for CSLRM
APPLICATION OF THE LOGISTIC REGRESSION MODELS
4.1 Interpretation of Model results for TB Death Data
4.2 Interpretation of Model results for HIV Death Data
4.3 Interpretation of Multiple Logistic Regression Model for TB Death
4.4 Interpretation of Multiple Logistic Regression Model for HIV Death
4.5 TB death and HIV cause of death Co-mortality
BAYESIAN MODELLING AND MAPPING USING WINBUGS
5.1 Introduction
5.2 Spatial models for smoothing area data
5.3 Model fitting and interpretation of results
5.3.1 Poisson-Gamma model
5.3.2 Poisson-Gamma with hyper-parameters for α and β
5.3.3 Poisson-Gamma spatial moving average (convolution) model
5.3.4 Poisson-Gamma with spatial conditional autoregressive prior
5.4 MCMC methods
5.4.1 Sampling the Hyper-parameter
5.5 Application and Interpretation of Model results for TB Death Data
5.6 Discussion of TB Model Results
5.7 Application and Interpretation of Model results for HIV Death Data
DISCUSSION AND CONCLUSION
6.1 Tuberculosis
6.2 Human Immunodeficiency Virus (HIV)
6.3 Conclusion
BIBLIOGRAPHY
APPENDIXES
Appendix A
A.1 STATA Procedures
A.1.1 Model Selection Using STATA Code
A.1.2 Model Fitting Using STATA Statements
Appendix B
Appendix C
C.1 Poisson-gamma/Test Code
C.2 Poisson GLMMs/Test Code
C.3 Spatial CAR Model/Test Code
C.4 Spatial Model with convolution priors/Test Code
Appendix D: HIV DEATH MODELING CODES AND HISTORY PLOTS
D.1 Poisson-gamma/Test Code
D.2 Poisson-GLMMs/Test Code
D.3 Spatial CAR Model/Test Code
D.4 Spatial Model with convolution priors/Test Code

LIST OF FIGURES
FIGURE 1: FACTORS ASSOCIATED WITH TB MORTALITY
FIGURE 2.1: PERCENTAGE DISTRIBUTION OF TB DEATHS BY PROVINCE OF DEATH OCCURRENCE, 2007
FIGURE 2.2: THE PERCENTAGE DISTRIBUTION OF HIV DEATHS BY PROVINCE OF DEATH OCCURRENCE, 2007
FIGURE 2.3: THE JOINT DISTRIBUTION OF TB DEATHS BY HIV DEATHS
FIGURE 5: RR OF TB DEATH MAPPED IN THE 9 PROVINCES OF SOUTH AFRICA: (TOP LEFT) RR; (TOP RIGHT) 2.5% LOWER LIMIT FOR THE RR; (BOTTOM LEFT) 97.5% UPPER LIMIT FOR THE RR
FIGURE 5.1: SMR OF HIV DEATH MAPPED IN THE 9 PROVINCES OF SOUTH AFRICA
FIGURE 5.2: TB HISTORY PLOT BY FITTING POISSON-GAMMA MODEL
FIGURE 5.3: TB HISTORY PLOT BY FITTING POISSON-GLMM MODEL
FIGURE 5.4: TB HISTORY PLOT BY FITTING SPATIAL CAR MODEL
FIGURE 5.5: TB HISTORY PLOT BY FITTING SPATIAL CONVOLUTION MODEL
FIGURE 5.6: HIV HISTORY PLOT BY FITTING POISSON-GAMMA MODEL
FIGURE 5.7: HIV HISTORY PLOT BY FITTING POISSON-GLMMS
FIGURE 5.8: HIV HISTORY PLOT BY FITTING SPATIAL CAR MODEL
FIGURE 5.9: HIV HISTORY PLOT BY FITTING SPATIAL CONVOLUTION MODEL

LIST OF TABLES
TABLE 2.0: DATA DESCRIPTION
TABLE 2.1: PERCENTAGE OF TB AND NON-TB DEATHS, WITH P-VALUES FOR CHI-SQUARE TEST, ACCORDING TO SELECTED DEMOGRAPHIC, SOCIAL, HEALTH STATUS AND LIFESTYLE FACTORS
TABLE 2.2: PERCENTAGE OF NON-HIV DEATHS AND HIV CAUSE OF DEATH, WITH P-VALUES FOR CHI-SQUARE TEST, ACCORDING TO SELECTED DEMOGRAPHIC, SOCIAL, HEALTH STATUS AND LIFESTYLE FACTORS
TABLE 2.3: TWO-WAY TABLE SHOWING THE JOINT DISTRIBUTION OF TB DEATHS BY HIV DEATHS
TABLE 3: COMMON DISTRIBUTIONS WITH CORRESPONDING LINK FUNCTIONS FOR CONSTRUCTING GENERALIZED LINEAR MODELS
TABLE 4.1: LOGISTIC REGRESSION
TABLE 4.2: SIMPLE AND SURVEY LOGISTIC REGRESSION
TABLE 4.3: MULTIPLE LOGISTIC REGRESSION FOR TB
TABLE 4.4: MULTIPLE AND SURVEY LOGISTIC REGRESSION FOR HIV
TABLE 4.5: TB DEATH AND HIV CAUSE OF DEATH CO-MORTALITY
TABLE 4.6: SURVEY LOGISTIC REGRESSION FOR TB DEATH AND HIV DEATH
TABLE 5: TB AND HIV DEATHS DATA
TABLE 5.1: PARAMETER ESTIMATES OF TB DEATH FROM FOUR MODELS
TABLE 5.2: DIC VALUES
TABLE 5.3: DIC VALUES FOR HIV DEATH
TABLE 5.4: PARAMETER ESTIMATES OF HIV DEATH

CHAPTER ONE
INTRODUCTION AND LITERATURE REVIEW

1.1 Background
Many countries in the world record annual summary statistics for economic indicators (such as Gross Domestic Product (GDP) and the unemployment rate under the Millennium Development Goals (MDGs)) and vital statistics (such as the number of births and deaths). In particular, Statistics South Africa (STATSSA) collects annual data on the nationwide number of deaths and their associated causes.

Tuberculosis (tubercle bacillus, TB) is an infectious disease caused by the bacterium Mycobacterium tuberculosis. The bacteria mainly attack the lungs (pulmonary TB) but, to a lesser extent, also other parts of the body such as the central nervous system, the circulatory system and the skeletal system (Khaled, 2008). TB is the main cause of death in the world among all infectious diseases (Herchline and Amorosa, 200). TB is classified as latent when it is not yet causing illness, and as active when illness has already developed; details can be found in Mzolo (2009). Despite advances in TB treatment, which dramatically reduced TB cases up to the 1980s, the appearance of HIV/AIDS during the 1980s led to a rapid increase in TB, especially in the poorest parts of the world, mainly in Africa (Williams and Dye, 2003).

HIV/AIDS acts as a catalyst for the occurrence of TB; hence it can dramatically increase the proportion of active TB cases. A study done in India by Vaidyanathan and Singh (2003) revealed that people infected with Mycobacterium tuberculosis alone have an approximately 10% lifetime risk of developing active TB, compared with 60% or more in persons co-infected with HIV and Mycobacterium tuberculosis. In other words, regions with high rates of HIV/AIDS also have high rates of active TB. A short report by the WHO (2009) provides the following alarming figures: in 2008, there were an estimated 8.9–9.9 million incident cases of TB, 9.6–13.3 million prevalent cases of TB, 1.1–1.7 million deaths from TB among HIV-negative people and an additional 0.45–0.62 million TB deaths among HIV-positive people (classified as HIV deaths in the International Statistical Classification of Diseases), with best estimates of 9.4 million, 11.1 million, 1.3 million and 0.52 million, respectively. Lawn (200) states that, because of HIV and TB co-infection, the WHO DOTS (Directly Observed Treatment, Short-course) programme has failed to control TB in sub-Saharan Africa, even in countries with good models of TB control such as Tanzania and Malawi. The research in this thesis uses the 2007 Statistics South Africa (STATSSA) data on TB and HIV as the primary causes of death to build statistical models that can be used to investigate factors associated with death due to TB and HIV.

1.2 Literature Review
1.2.1 The Problem
According to Singer (1997), the battle with TB in South Africa poses an immense challenge to the government. The annual number of new TB cases in South Africa averages 377 per 100,000 members of the population, whereas in other hard-hit parts of the world the average is only about 200 per 100,000. Currently, approximately 10,000 people die of TB in South Africa every year. Singer (1997) argues further that in South Africa TB tends to affect the poorer populations, who have historically suffered a low standard of health care. But poverty is not the only contributing factor: nearly two-thirds of the population of the country is infected with the TB germ, and approximately 60,000 South Africans from all walks of life become ill with TB every year. In 2006, South Africa was ranked seventh highest by the WHO among the 22 TB high-burden countries in the world and fourth highest in Africa.

In general, ensuring that patients adhere to and complete their TB treatment has presented major challenges; treatment takes six to eight months, and patients often discontinue treatment before they are cured. The primary goal of the new national TB control programme is to ensure a high cure rate of infectious TB patients the first time around by ensuring that they complete their treatment. In the strategic priorities for the National Health System set by the Department of Health for 2004–2009, the TB control programme is cited as achieving limited success, given its synergistic relationship with the Human Immunodeficiency Virus (HIV) (SA, DoH, 2003). In South Africa, responsibility for public health care is devolved to the provinces, among which the quality of TB control varies greatly. TB treatment success remains low compared with other African countries that have a higher prevalence of HIV and considerably fewer resources (SA, DoH, 2003).

1.2.2 TB and HIV Interaction
According to the report published by Williams and Dye in 2003, HIV/AIDS has dramatically increased the incidence of TB in sub-Saharan Africa, where up to 60% of TB patients are co-infected with HIV and each year 200,000 TB deaths are attributed to HIV co-infection. In their report, they also indicate that antiretroviral (ARV) drugs can prevent TB by preserving immunity, and that early therapy, together with high levels of coverage and compliance, will be needed to avert a significant fraction of TB cases. They further assert that ARVs could enhance the treatment of TB, while TB programmes provide an important entry point for the treatment of HIV/AIDS. Corbett et al. (2003) consider the decades leading up to 1980, when TB was in decline throughout the world. However, according to the World Health Organization (WHO) report Global Tuberculosis Control: Surveillance, Planning, Financing (2003), 30% of people in sub-Saharan Africa are latently infected with Mycobacterium tuberculosis, and the rapid spread of HIV during the 1980s and 1990s led to a similarly rapid increase in the incidence of TB, with notification rates in some countries increasing more than fivefold in ten years.

The UNAIDS (2003) report on the global HIV/AIDS epidemic shows that HIV/AIDS control strategies have not substantially reduced HIV mortality in sub-Saharan Africa. Raviglione and Pio (2002) suggest that the decline in immunity in people co-infected with HIV and TB has meant that even good TB control programmes based on short-course chemotherapy have not been sufficient to contain the rising incidence of TB (De Cock and Chaisson, 1999). Cohen (2002) argued that the development of new classes of ARV drugs, the availability of cheap generic equivalents, and the increasing commitment of international donors to making ARV drugs widely available in poor countries should all help to reduce HIV-related illness and deaths in subsequent years (Tan, Upshur, and Ford, 2003). Whether ARVs have a significant impact on TB depends on their efficacy in preventing disease progression and prolonging life, on population coverage, and on patient compliance. The impact also depends on the synergy between the treatment of TB patients and the provision of ARV therapy to those patients who are HIV-infected.

As the TB and HIV pandemics continue to collide in sub-Saharan Africa, resulting in increased incidence and mortality, the relative contribution to disease-specific mortality of AIDS-related smear-negative pulmonary TB (SNPTB), as a result of increased incidence, under-recognition and under-diagnosis, and poor management practices, is unknown (Getahun, Harrington, and Nunn, 2007). In 2007 the WHO published revised recommendations for the diagnosis of SNPTB to address the diagnostic and treatment challenges of HIV-associated TB in resource-constrained settings.

In South Africa, more than 6% of the population is infected with HIV, 1,000 people die from AIDS-related diseases each day, and two-thirds of those with HIV also suffer from TB because of their weakened immune systems (AMREF, 2008). In 2004, estimates exceeded the 50% mark for TB patients living with HIV in South Africa (Dye, 2006). According to Bekker and Wood (200), South Africa is believed to have the most people (approximately 1 million persons) living with both TB disease and HIV infection. When the HIV epidemic set in, existing rates of latent TB infection (LTBI) were extremely high in many communities; in poor South African communities, for example, over two-thirds of adults were infected. In those with HIV co-infection, the subsequent risk of developing TB through reactivation of latent TB was extremely high, with overall rates reaching as high as 20–30% per year in those with the most advanced immunodeficiency. Lawn (200) states that DOTS does not reduce the very high susceptibility of HIV-infected individuals to rapidly progressive disease following exposure; thus, although DOTS reduces transmission risk in the community, this benefit may be outweighed many-fold by the greatly increased risk of rapidly progressive disease in HIV-infected people. Major increases in TB incidence rates may further contribute to transmission, although this is offset to some extent by the fact that HIV-associated TB cases are generally less infectious than cases in HIV-uninfected people. Furthermore, the finding that co-infection with HIV significantly increases the risk of developing TB was established by Raviglione et al. (1997). According to the WHO report (2000), the TB and HIV co-epidemic is increasing and will continue to fuel the TB epidemic.
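The incidence comparison attributed to Singer (1997) above is a ratio of crude rates per 100,000 people: 377 / 200 ≈ 1.9, so the South African rate was nearly twice that of other hard-hit regions. A small Python sketch of that arithmetic follows. The function names are illustrative, and the case and population counts in the usage lines are hypothetical; they serve only to show the scaling step.

```python
def rate_per_100k(cases, population):
    """Crude rate: cases per 100,000 members of the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return cases / population * 100_000


def rate_ratio(rate_a, rate_b):
    """How many times larger rate_a is than rate_b."""
    return rate_a / rate_b


if __name__ == "__main__":
    # Rates quoted from Singer (1997): about 377 vs about 200 new cases per 100,000.
    print(f"rate ratio: {rate_ratio(377, 200):.2f}")
    # Hypothetical counts, used only to illustrate the scaling to 100,000.
    print(f"{rate_per_100k(1_885, 500_000):.1f} new cases per 100,000")
```

The same two steps apply to death counts: dividing registered TB deaths by a mid-year population estimate gives a crude mortality rate, and a ratio of two such rates compares provinces or years. Crude rates are not age-standardised.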

1.2.3 Theoretical Framework
This study is guided by a theoretical framework for understanding factors associated with TB death. Such factors can be grouped into specific categories, as shown in Figure 1.
Figure 1: Factors associated with TB mortality.

1.2.3.1 Biological factors
Tuberculosis (TB) is one of the leading causes of death among individuals living with AIDS, not only because they are more susceptible to TB, but also because TB can increase the rate at which the AIDS virus (HIV) replicates. One of the first indications of HIV infection may be the sudden onset of TB, often at a site outside the lungs (extra-pulmonary TB). Individuals who have TB and are also HIV-infected are more likely to die from TB than from any other cause. TB can occur at any point in the course of HIV disease progression, and the risk of developing TB rises sharply as immune status declines. HIV promotes the rapid progression of latent TB infection (LTBI) to active disease and is the most powerful known risk factor for the activation of latent TB (Uriz, Reparaz, and Sola, 2007).

1.2.3.2 Socio-economic factors
Many studies have shown that the factors that drive the TB epidemic are mostly socio-economic, for example education, occupation and health status. TB is also associated with poverty: many of the world's poor are likely to contract TB as a result of contributing factors such as a lack of basic health services, poor nutrition and inadequate living conditions. People exposed to conditions such as unemployment and living in crowded areas are more likely to be infected with the disease. The higher rates in poorer sectors of society are due not only to the poor housing and overcrowding brought about by urbanization and population increase, but also to poor diets, which lower resistance to the disease (Collins, 98).

1.2.3.3 Environmental factors
Poor working environments may increase the risk of tuberculosis. For example, working in mines where shafts are poorly ventilated may facilitate the spread of TB bacteria. Incidence rates in prisons and homeless shelters are higher than in the general population. TB incidence is also generally higher in urban than in rural areas, which may be due to high population density, crowded living and working conditions, and lifestyle changes associated with urban living. TB bacteria can also become established in nursing homes, because older adults often have immune systems weakened by illness.

1.2.3.4 Socio-demographic factors
Demographic factors, including age (expressed as a grouped variable), gender and marital status, have been linked to TB infection. The TB epidemic in rural and urban areas is severe for a variety of reasons, including population dynamics whereby migrant mine and factory workers carry the bacteria back home during holidays and spread them to their households and surrounding areas (Zuma et al., 2005). Crowded living environments are also an effect of urbanization: people move to cities in search of work, and many of them end up living in crowded informal settlements. In addition, Zwang et al. (2007) stated that TB and HIV co-infection affects more males than females, and at an earlier age, because males are more likely to be exposed to poor working environments that put them at risk of TB.

1.3 Methodology used in the study
1.3.1 Collection of data
The data used in this study are registration and records survey data on deaths from various causes, gathered by Statistics South Africa in 2007. Our main interest is in deaths due to TB and HIV.

1.3.2 Statistical Analysis and Statistical Software
Exploratory analysis is performed using graphical displays and basic summary statistics, such as the mean and the three quartiles, together with associated dispersion statistics presented in tables. Logistic regression, a special case of the Generalized Linear Models (GLMs), was used to assess the effect of risk factors or predictors on the probability of death associated with TB and HIV. Statistical modelling and analysis were done using the STATA software.

1.4 Objectives of this thesis
1.4.1 General objective
The study aims to identify factors that can be used to explain TB and HIV mortality in South Africa. The work is also concerned with the statistical methods that can best be used to model these associations and to identify factors associated with TB and HIV mortality in South Africa during the year 2007.

1.4.2 Specific objectives
1. To investigate and identify factors associated with TB and HIV death in South Africa using the mortality data gathered by STATS SA in 2007.
2. To apply logistic regression, a special case of generalized linear modelling, to relate a binary outcome, namely death due to TB (HIV), to a number of predictor variables, including the effect of HIV (TB) co-mortality.
3. To extend the regression model in Objective 2 to account for correlated data using survey logistic regression.
4. To extend the univariate modelling approach to a joint model of the two binary outcomes, as a possible future study.
5. To suggest a spatial modelling approach for studying the distribution of risk due to TB and HIV in South Africa.

1.5 Overview of the thesis
In addition to Chapter One, which contains the introduction and literature review, Chapter Two presents the exploratory data analysis. Chapter Three gives a brief review of generalized linear models and discusses important statistical issues in binary logistic regression modelling and the estimation of the parameters involved; these models are then used to model TB and HIV mortality and associated causes in order to achieve the research objectives. Chapter Four focuses on data analysis and the interpretation of results. In Chapter Five, we apply Bayesian modelling and mapping using WinBUGS version 1.4. Finally, Chapter Six provides conclusions, implications and avenues for future research.

CHAPTER TWO
EXPLORATORY DATA ANALYSIS

2.1 Introduction
The data, sourced from Statistics South Africa, consist of 615312 deaths from various causes in the year 2007. As a preliminary exploratory analysis, tools such as cross-tabulations and graphical displays will guide our understanding of important relationships. Results from such an exploratory analysis will assist in building a more formal statistical model of the relationship between key predictor variables and the response variable. Our interest in the current work is death due to tuberculosis (TB) and HIV. The synergy between TB and HIV has attracted considerable interest in recent times.

In this study, however, only four of the fourteen available variables were considered. These are the variables with a potentially significant effect on TB death and HIV death, each defined as the presence or absence of the disease as a cause of death. The four variables used are age group, sex, death province and death institution. The other ten variables could not be included in the analysis because of high rates of missing data: death type, marital status, province of birth, province of residence, smoking status, pregnancy, HIV cause-related (self-reported), education level, occupation, and type of industry or business of work.

2.2 Data Description
Table 2.0 shows a description of the factor variables to be used and the codes assigned to the levels of each variable.

Table 2.0: Data description

| Variable name | Description |
|---|---|
| TB cause-related | Yes = 1, No = 0 |
| HIV cause-related | Yes = 1, No = 0 |
| Age group | 0-15 = 1, 16-30 = 2, 31-45 = 3, 46-60 = 4, 61-75 = 5, 76-90 = 6, >90 = 7 |
| Sex | Male = 1, Female = 2 |
| Death institution | Hospital (in-patient) = 1, Emergency room (out-patient) = 2, Death on arrival = 3, Nursing home = 4, Home = 5, Other = 6 |
| Death province | Western Cape = 1, Eastern Cape = 2, Northern Cape = 3, Free State = 4, KwaZulu-Natal = 5, North West = 6, Gauteng = 7, Mpumalanga = 8, Limpopo = 9, Outside South Africa = 10 |

Note: Other = unknown, not applicable and unspecified. Yes = TB cause of death and HIV cause of death; No = non-TB and HIV negative. A similar description of variables is used for the HIV data.

2.3 Exploratory Analysis of TB Deaths Data
In this section, an exploratory analysis of the TB data is presented. The interpretation and analysis are based on the cross-tabulation in Table 2.1. Of the 615312 deceased people, 541697 (88%) died from natural causes and disease. Among these, 65052 (12%) died of TB. The percentage of TB deaths among males is 11.18%, compared with 9.95% among females (P < 0.001; see Table 2.1).

By age group, the percentage of TB deaths is:
- 2.29% among those aged 0-15 years;
- 16.49% for 16-30 years;
- 19.05% for 31-45 years;
- 11.9% for 46-60 years;
- 4.28% for 61-75 years;
- 1.47% for 76-90 years;
- 2.41% for those above 90 years.

This shows that TB deaths are concentrated in the younger adult age groups (16-30, 31-45 and 46-60 years) rather than among older people (P < 0.001; see Table 2.1). The Western Cape and, to some extent, Gauteng have, in general, a lower risk of TB death than the other regions of South Africa. KwaZulu-Natal, Mpumalanga and the Eastern Cape have a higher risk of TB death. The risk of TB death also differs by death institution. It was highest in hospital (in-patient) deaths (14.53%), followed by emergency room (out-patient) deaths (9.1%) and deaths at home (8.55%).

Figure 2.1 shows that overall 32% of TB deaths occurred in KwaZulu-Natal, followed by the Eastern Cape (16%) and Gauteng (13%). The lowest percentage of deaths occurred in the Northern Cape (2%), and less than 1% of registered deaths occurred outside South Africa. It is important to note that the distribution of deaths by province of occurrence is largely similar to the distribution of the South African population by province.

Figure 2.1: Percentage distribution of TB deaths by province of death occurrence, 2007

Table 2.1: Percentage of TB and non-TB deaths, with P-values for chi-square tests, according to selected demographic, social, health status and lifestyle characteristics

| Characteristic | Non-TB (%) | TB (%) | N |
|---|---|---|---|
| Demographic/provincial characteristics | | | |
| Death province (P < 0.001) | | | |
| Western Cape | 92.07 | 7.93 | 4809 |
| Eastern Cape | 88.5 | 11.5 | 88200 |
| Northern Cape | 91.2 | 8.8 | 5466 |
| Free State | 89.73 | 10.27 | 5234 |
| KwaZulu-Natal | 85.42 | 14.58 | 4286 |
| North West | 89.91 | 10.09 | 4633 |
| Gauteng | 92.59 | 7.41 | 8449 |
| Mpumalanga | 88.1 | 11.9 | 4968 |
| Limpopo | 92.24 | 7.76 | 53826 |
| Outside South Africa | 90.5 | 9.5 | 579 |
| Age group (P < 0.001) | | | |
| 0-15 | 97.71 | 2.29 | 86 |
| 16-30 | 83.51 | 16.49 | 85939 |
| 31-45 | 80.95 | 19.05 | 57694 |
| 46-60 | 88.1 | 11.9 | 4469 |
| 61-75 | 95.72 | 4.28 | 9358 |
| 76-90 | 98.53 | 1.47 | 6609 |
| >90 | 97.59 | 2.41 | 832 |
| Sex (P < 0.001) | | | |
| Male | 88.82 | 11.18 | 3438 |
| Female | 90.05 | 9.95 | 299933 |
| Health status and lifestyle | | | |
| Death institution (P < 0.001) | | | |
| Hospital (in-patient) | 85.47 | 14.53 | 263962 |
| Emergency room (out-patient) | 90.9 | 9.1 | 0672 |
| Death on arrival | 93.63 | 6.37 | 5254 |
| Nursing home | 94.81 | 5.19 | 2630 |
| Home | 91.45 | 8.55 | 93850 |
| Other | 93.66 | 6.34 | 8944 |

2.4 Exploratory Analysis of HIV Death Data
The data provided by Statistics South Africa consist of 615312 deaths from various causes in the year 2007. Of these deaths, 97.77% were not HIV cause-related and 2.23% had HIV as a cause of death. HIV cause-related deaths varied according to the factors described in this section. Table 2.2 presents the distribution of HIV cause-related deaths by each variable. The mortality rate due to HIV was 2.23%, and the risk of HIV death also varied by age. The risk of HIV cause-related death was highest in the 31-45 age group, with a rate of 4.32%, followed by the 16-30 age group, with a rate of 3.88%. Those aged 76 years and above were less likely to be infected with HIV, so the risk of HIV cause-related death in these groups is much lower.

Results in Table 2.2 show that females are more likely to die of HIV than males: the HIV cause-related death rates for males and females were 1.95% and 2.53% respectively. In 2007, death due to HIV was highest in the Western Cape and KwaZulu-Natal, with rates of 3.25% and 3.17% respectively, followed by the Northern Cape with 2.32%. Limpopo province had the lowest risk of death due to HIV (0.52%). Table 2.2 also indicates that the HIV death rate is highest among hospitalized people (3.49%).

Figure 2.2: The percentage distribution of HIV deaths by province of death occurrence, 2007. The overall N is 615312, with N = 601594 HIV negative and N = 13718 for HIV cause of death.

Table 2.2: Percentage of non-HIV deaths and HIV cause of death, with P-values for chi-square tests, according to selected demographic, social, health status and lifestyle characteristics

| Characteristic | Non-HIV death (%) | HIV death (%) | N |
|---|---|---|---|
| Demographic/provincial characteristics | | | |
| Age group (P < 0.001) | | | |
| 0-15 | 98.84 | 1.16 | 86 |
| 16-30 | 96.12 | 3.88 | 85939 |
| 31-45 | 95.68 | 4.32 | 57694 |
| 46-60 | 98.06 | 1.94 | 4469 |
| 61-75 | 99.72 | 0.28 | 9358 |
| 76-90 | 99.95 | 0.05 | 6609 |
| >90 | 99.52 | 0.48 | 832 |
| Sex (P < 0.001) | | | |
| Male | 98.05 | 1.95 | 3438 |
| Female | 97.47 | 2.53 | 299933 |
| Death province (P < 0.001) | | | |
| Western Cape | 96.75 | 3.25 | 4809 |
| Eastern Cape | 98.16 | 1.84 | 88200 |
| Northern Cape | 97.68 | 2.32 | 5466 |
| Free State | 98.03 | 1.97 | 5234 |
| KwaZulu-Natal | 96.83 | 3.17 | 4286 |
| North West | 98.36 | 1.64 | 4633 |
| Gauteng | 97.77 | 2.23 | 8449 |
| Mpumalanga | 98.05 | 1.95 | 4968 |
| Limpopo | 99.48 | 0.52 | 53826 |
| Death institution (P < 0.001) | | | |
| Hospital (in-patient) | 96.51 | 3.49 | 263962 |
| Emergency room (out-patient) | 97.03 | 2.97 | 0672 |
| Death on arrival | 98.55 | 1.45 | 5254 |
| Nursing home | 98.36 | 1.64 | 2630 |
| Home | 99.03 | 0.97 | 93850 |
| Other | 98.42 | 1.58 | 8944 |

2.5 Interaction of TB and HIV
Table 2.3 shows that the risk of TB death is higher among individuals with HIV as a cause of death than among those who are HIV negative. People who died of HIV-related causes were at higher risk of having TB recorded as the primary cause of death. The observed probability of dying of TB given an HIV cause of death is 24%, compared with 10% for deaths not related to HIV.

Table 2.3: Two-way table showing the joint distribution of TB deaths by HIV deaths

| Variable | Category | TB | No TB | Total |
|---|---|---|---|---|
| HIV cause-related | HIV negative | 61734 (10%) | 539860 (90%) | 601594 |
| | HIV cause of death | 3318 (24%) | 10400 (76%) | 13718 |
| Total | | 65052 | 550260 | 615312 |

Thus 24% of individuals with an HIV cause of death also died of TB (co-mortality), whereas 76% of them died of causes other than TB. Among HIV-negative deaths, 10% were due to TB alone.
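The proportions in Table 2.3 can be reproduced from its four cell counts, together with an odds ratio and a chi-square test of independence. The sketch below is illustrative Python, not the STATA analysis used in the thesis. It assumes the counts 3318 and 10400 (HIV cause of death: TB and no TB) and 61734 and 539860 (HIV negative: TB and no TB); these reproduce the column totals 65052 and 550260 and the overall 615312 deaths. The function name `two_by_two_summary` is hypothetical. The p-value uses the identity P(χ²₁ > x) = erfc(√(x/2)), which holds for one degree of freedom.

```python
import math


def two_by_two_summary(a, b, c, d):
    """Summarise a 2x2 table laid out as

                       TB     no TB
        HIV cause      a      b
        HIV negative   c      d

    Returns P(TB | HIV cause), P(TB | HIV negative), the odds ratio,
    the Pearson chi-square statistic and its 1-df p-value.
    """
    p_hiv = a / (a + b)
    p_no_hiv = c / (c + d)
    odds_ratio = (a * d) / (b * c)

    # Pearson chi-square: sum over cells of (observed - expected)^2 / expected,
    # where expected = row total * column total / grand total.
    n = a + b + c + d
    observed = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # A 2x2 table has 1 degree of freedom: P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p_hiv, p_no_hiv, odds_ratio, chi2, p_value


p_hiv, p_no_hiv, odds_ratio, chi2, p_value = two_by_two_summary(3318, 10400, 61734, 539860)
print(f"P(TB | HIV cause)    = {p_hiv:.3f}")
print(f"P(TB | HIV negative) = {p_no_hiv:.3f}")
print(f"odds ratio = {odds_ratio:.2f}, chi-square = {chi2:.1f}, p = {p_value:.3g}")
```

The two printed proportions (0.242 and 0.103) round to the 24% and 10% quoted above. The odds ratio of roughly 2.8 expresses the same association on the odds scale used in Chapter Three.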

Figure 2.3: The joint distribution of TB deaths by HIV deaths

Figure 2.3 shows the effect of the joint dynamics of HIV and TB. From a disease-modelling standpoint, modelling co-mortality can present formidable mathematical challenges because the transmission models are closely intertwined. Furthermore, because HIV can activate latent TB, an individual who died of TB could have been co-infected with HIV, and vice versa. Here the risk of joint TB and HIV mortality is 24%, corresponding to 3318 cases among the TB cause-related deaths.

2.6 Summary
2.6.1 Tuberculosis
The exploratory analysis carried out in this chapter indicates that the risk of TB death is higher among males than females. A possible reason is that males tend to work in environments that increase the risk of TB infection. For example, more males than females work in mines, where poorly ventilated shafts facilitate the spread of TB bacteria. Returning migrant mine workers may carry the bacteria back home during holidays and spread it to their surrounding areas. The preliminary results also indicate that TB cause-related deaths are more common among younger individuals. This is possibly because younger individuals are more vulnerable to co-infection with HIV, and the fact that TB is an opportunistic infection among HIV-infected individuals may explain this correlation. Some individuals live in informal settlements or crowded households, or work in crowded environments such as polluted factories. These individuals tend to be at higher risk of contracting and dying of TB than those in other living and working conditions.

2.6.2 Human Immunodeficiency Virus
The exploratory analysis carried out in this chapter indicates that HIV cause-related death is higher among females than males. Possible reasons include women's exposure to sexual abuse, rape and commercial sex for survival, which exposes them to HIV. A possible biological reason for a higher HIV transmission rate in females is that females have a larger exposed mucosal (cervical) surface area, which makes it easier for HIV to establish itself in females than in males. The higher mortality among young individuals could be because they are more sexually active and inexperienced, which puts them at higher risk of HIV infection and hence higher HIV mortality. Low levels of education, poverty, overcrowding and unemployment are strongly associated with less knowledge about HIV/AIDS.

CHAPTER THREE
GENERALIZED LINEAR MODELS

3.1 The Exponential Family
Generalized linear models (GLMs) are an extension of the classical linear models. They are used to model observations on random variables whose distribution belongs to the exponential family of distributions. Suppose the probability density function (p.d.f.) of the i-th observation from a random sample of size n from a random variable Y is given by

$$f(y_i;\theta_i,\phi)=\exp\left\{\frac{y_i\theta_i-b(\theta_i)}{a(\phi)}+c(y_i,\phi)\right\}, \qquad (3.1)$$

where a, b and c are known functions. Then f(·) is said to belong to the exponential family. The parameter θ_i is called the natural location parameter, whilst φ is the dispersion parameter. Many known distributions belong to the exponential family (e.g. the normal, binomial and Poisson distributions). The mean and variance of Y_i are respectively given by

$$E(Y_i)=\mu_i=b'(\theta_i), \qquad (3.2)$$

$$\operatorname{Var}(Y_i)=a(\phi)\,b''(\theta_i). \qquad (3.3)$$

McCullagh and Nelder (1989) and Myers, Montgomery and Vining (2002, pp.57-60) provide a detailed theoretical background to these models. In particular, McCullagh and Nelder (1989) is the most referenced book on generalized linear models. The idea was first developed by Nelder and Wedderburn (1972). It was later extended by Dobson (1990), with discussion of the theory and application of such models in numerous application areas.

To apply GLMs to regression problems, consider n independent observations y_1, y_2, …, y_n of a random variable Y. Let μ_i = E(Y_i) and suppose each y_i depends on a set of predictor or explanatory variables x_1, x_2, …, x_p. In application areas such as medical research these are also called covariates. We aim to fit a model of the form

$$g(\mu_i)=\beta_0+\beta_1x_{i1}+\cdots+\beta_px_{ip}.$$

Formally, the random variable Y is said to conform to a generalized linear model (GLM) if the following three conditions hold:
(1) Each realization y_i of Y belongs to the exponential family of distributions with p.d.f. of the form (3.1), with natural parameter θ_i, i = 1, 2, …, n.
(2) A linear predictor η_i = β_0 + β_1x_{i1} + ⋯ + β_px_{ip} is a linear combination of the values of the explanatory variables. It is a function of the parameter vector β = (β_0, β_1, …, β_p)^T, with p < n.
(3) A monotonic function g(·), called the link function, relates the mean response μ_i = E(Y_i) to the linear predictor: g(μ_i) = η_i for i = 1, 2, …, n.

If g(μ_i) = θ_i = β_0 + β_1x_{i1} + ⋯ + β_px_{ip}, then g is called a canonical link (McCullagh and Nelder, 1989; Myers, Montgomery and Vining, 2002, p.6).

To see how the idea of a canonical link arises, let Y_1, Y_2, …, Y_n be independently distributed observations such that Y_i is distributed as Bernoulli(p_i). Then

$$f(y_i;p_i)=p_i^{y_i}(1-p_i)^{1-y_i},\qquad y_i=0,1;\quad i=1,2,\dots,n.$$

Clearly the p.d.f. above can be rewritten in exponential family form, because

$$f(y_i;p_i)=\exp\left\{y_i\log\left(\frac{p_i}{1-p_i}\right)+\log(1-p_i)\right\}.$$

It follows that θ_i = log{p_i/(1 − p_i)}. Therefore

$$p_i=\frac{e^{\theta_i}}{1+e^{\theta_i}}\qquad\text{and}\qquad b(\theta_i)=-\log(1-p_i)=\log\left(1+e^{\theta_i}\right),$$

with a(φ) = 1 and c(y_i, φ) = 0. Since μ_i = p_i in this case, θ_i = log{μ_i/(1 − μ_i)}. The function g(μ_i) = log{μ_i/(1 − μ_i)} is called the canonical link function. Therefore, if

$$\log\left(\frac{p_i}{1-p_i}\right)=\beta_0+\beta_1x_{i1}+\cdots+\beta_px_{ip},$$

it follows that

$$p_i=\frac{e^{\beta_0+\beta_1x_{i1}+\cdots+\beta_px_{ip}}}{1+e^{\beta_0+\beta_1x_{i1}+\cdots+\beta_px_{ip}}}.$$

Since also E(Y_i) = p_i and Var(Y_i) = p_i(1 − p_i), it can easily be shown that b'(θ_i) = p_i and b''(θ_i) = p_i(1 − p_i), in agreement with (3.2) and (3.3). Some well-known distributions and their associated link functions are tabulated below.

Table 3: Common distributions with corresponding link functions for constructing generalized linear models

| Distribution | Link function |
|---|---|
| Normal | Identity link: η = μ |
| Binomial | Probit link: η = Φ^{-1}(μ), where Φ is the cumulative distribution function of the standard normal distribution |
| Binomial | Logit link: η = ln{p/(1 − p)} |
| Binomial | Complementary log-log link: η = ln{−ln(1 − μ)} |
| | Power link: η = μ^λ for λ ≠ 0; η = ln(μ) for λ = 0 |
| Poisson | Log link: η = ln(μ) |
| Gamma | Reciprocal link: η = μ^{-1} |

Source: Myers, Montgomery and Vining (2002, p.62).
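The link functions in Table 3 and their inverses can be written as plain functions. This makes it easy to check numerically that each link undoes its inverse. It also confirms that the logit is the canonical link for Bernoulli data, with b'(θ) = e^θ/(1 + e^θ) = p and b''(θ) = p(1 − p). This is a minimal Python sketch:
- The dictionary `LINKS` and the helper names are illustrative.
- The probit link uses the standard library's `statistics.NormalDist`.
- The power link is omitted because it needs the extra parameter λ.

```python
import math
from statistics import NormalDist

_STANDARD_NORMAL = NormalDist(0.0, 1.0)  # Phi and Phi^{-1} for the probit link

# Each entry maps a link name from Table 3 to a pair (g, g_inverse):
# g maps the mean mu to the linear predictor eta, g_inverse maps eta back to mu.
LINKS = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "logit": (lambda mu: math.log(mu / (1 - mu)),
              lambda eta: 1 / (1 + math.exp(-eta))),
    "probit": (_STANDARD_NORMAL.inv_cdf, _STANDARD_NORMAL.cdf),
    "cloglog": (lambda mu: math.log(-math.log(1 - mu)),
                lambda eta: 1 - math.exp(-math.exp(eta))),
    "log": (math.log, math.exp),
    "reciprocal": (lambda mu: 1 / mu, lambda eta: 1 / eta),
}


def bernoulli_b(theta):
    """Cumulant function b(theta) = log(1 + e^theta) of the Bernoulli family."""
    return math.log1p(math.exp(theta))


def derivative(f, x, h=1e-5):
    """Central finite-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)


mu = 0.3  # a mean in (0, 1), so every link above is defined
for name, (g, g_inv) in LINKS.items():
    print(f"{name:>10}: g(mu) = {g(mu):+.4f}, g_inv(g(mu)) = {g_inv(g(mu)):.4f}")

theta = 0.7
p = LINKS["logit"][1](theta)  # inverse logit: p = e^theta / (1 + e^theta)
print("b'(theta) =", round(derivative(bernoulli_b, theta), 6), " p =", round(p, 6))
# b''(theta) is the derivative of p with respect to theta; it should equal p(1 - p).
print("dp/dtheta =", round(derivative(LINKS["logit"][1], theta), 6),
      " p(1 - p) =", round(p * (1 - p), 6))
```

Finite differences are used only to keep the check independent of the algebra. In practice the closed forms b'(θ) = p and b''(θ) = p(1 − p) are what a fitting routine uses.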

As shown in Table 3, the link function g is assumed to be a monotonic and differentiable function which links the mean response μ_i = E(Y_i) and the linear predictor η_i = x_i^Tβ. If g(μ_i) equals θ_i, the link function is called a canonical link function, and each member of the exponential family of distributions has a unique canonical link function. For example, the canonical link function for binomial (or binary) data is the logit link g(μ) = log{μ/(1 − μ)}, where μ = p. The generalized linear model for independent Bernoulli observations with the logit link is referred to as the logistic regression model. With GLMs, the identification of the mean-variance relationship and the choice of the scale on which the effects are measured can be done separately. This overcomes the shortcomings of the data transformation approach: rather than transforming the data, GLMs transform the mean response through the link function to achieve linear additivity.

3.2 Estimation of Parameters
Parameter estimation for generalized linear models is done using the method of maximum likelihood. It follows from equation (3.1) that the log-likelihood of a generalized linear model can be written as

$$\ell(\boldsymbol\beta)=\frac{1}{a(\phi)}\sum_{i=1}^{n}\left[y_i\theta_i-b(\theta_i)\right]+\sum_{i=1}^{n}c(y_i,\phi) \qquad (3.4)$$

(Myers, Montgomery and Vining, 2002, p.63). Consider the case of a GLM with canonical link function of the form

$$g(\mu_i)=\theta_i=\beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\cdots+\beta_px_{ip}.$$

Estimates of the parameters β = (β_0, β_1, …, β_p)^T are computed by differentiating the log-likelihood (3.4) with respect to β and solving the system of equations ∂ℓ/∂β = 0. This leads to the score equations

$$\sum_{i=1}^{n}(y_i-\mu_i)\,x_{ij}=0,\qquad j=0,1,\dots,p,$$

with x_{i0} = 1 denoting the first column of X. This system of p + 1 equations can be written in matrix form as

$$\begin{pmatrix}1&1&\cdots&1\\x_{11}&x_{21}&\cdots&x_{n1}\\\vdots&\vdots&&\vdots\\x_{1p}&x_{2p}&\cdots&x_{np}\end{pmatrix}\begin{pmatrix}y_1-\mu_1\\y_2-\mu_2\\\vdots\\y_n-\mu_n\end{pmatrix}=\mathbf 0,\qquad\text{i.e.}\qquad X^T(\mathbf y-\boldsymbol\mu)=\mathbf 0, \qquad (3.5)$$

where X is the n × (p + 1) design matrix, y = (y_1, y_2, …, y_n)^T is the n × 1 vector of observations and μ = (μ_1, μ_2, …, μ_n)^T is the n × 1 vector of expected mean responses. The system of equations (3.5) is solved iteratively, for example by Newton-Raphson (Fisher scoring) steps based on a first-order Taylor approximation. After convergence, the asymptotic variance-covariance matrix of β̂ is given by

$$\operatorname{Var}(\hat{\boldsymbol\beta})=(X^TWX)^{-1}, \qquad (3.6)$$

where W is the n × n diagonal matrix with (i, i)th element

$$w_i=\frac{1}{\operatorname{Var}(Y_i)}\left(\frac{\partial\mu_i}{\partial\eta_i}\right)^2. \qquad (3.7)$$
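The logit is the canonical link, so Newton-Raphson and Fisher scoring coincide for logistic regression. Each iteration solves (X^TWX)δ = X^T(y − μ), built from (3.5) and (3.7). The following pure-Python sketch is illustrative only and is not the STATA routine used in the thesis. It avoids numerical libraries, so the linear algebra is a small Gauss-Jordan solver. The function names and the toy two-group data set are hypothetical. The sketch handles grouped binomial data with y_i successes out of m_i trials; ungrouped binary data is the special case m_i = 1.

```python
import math


def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(row) + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                for c in range(col, n + 1):
                    M[r][c] -= factor * M[col][c]
    return [M[i][n] / M[i][i] for i in range(n)]


def invert(A):
    """Inverse of A, built column by column from unit right-hand sides."""
    n = len(A)
    columns = [solve(A, [float(i == j) for i in range(n)]) for j in range(n)]
    return [[columns[j][i] for j in range(n)] for i in range(n)]


def information(X, m, beta):
    """Fitted means mu_i = m_i pi_i and X^T W X with w_i = m_i pi_i (1 - pi_i)."""
    pis = [1 / (1 + math.exp(-sum(x * b for x, b in zip(row, beta)))) for row in X]
    mu = [mi * pi for mi, pi in zip(m, pis)]
    w = [mi * pi * (1 - pi) for mi, pi in zip(m, pis)]
    k = len(beta)
    info = [[sum(wi * row[r] * row[c] for row, wi in zip(X, w)) for c in range(k)]
            for r in range(k)]
    return mu, info


def fit_logistic(X, y, m, tol=1e-10, max_iter=50):
    """Fisher scoring for the logit link; each row of X starts with 1 (intercept).

    Returns (beta_hat, covariance) with covariance = (X^T W X)^{-1} as in (3.6).
    """
    beta = [0.0] * len(X[0])
    for _ in range(max_iter):
        mu, info = information(X, m, beta)
        # Score vector X^T (y - mu) from equation (3.5).
        score = [sum(row[j] * (yi - mui) for row, yi, mui in zip(X, y, mu))
                 for j in range(len(beta))]
        step = solve(info, score)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    _, info = information(X, m, beta)  # re-evaluate W at the final estimate
    return beta, invert(info)


# Hypothetical grouped data: 10 of 100 unexposed (x = 0) and 30 of 100 exposed (x = 1).
X = [[1.0, 0.0], [1.0, 1.0]]
beta_hat, cov = fit_logistic(X, y=[10, 30], m=[100, 100])
se = [math.sqrt(cov[j][j]) for j in range(len(beta_hat))]
print("beta_hat =", [round(b, 4) for b in beta_hat])
print("standard errors =", [round(s, 4) for s in se])
print("odds ratio exp(beta_1) =", round(math.exp(beta_hat[1]), 4))
```

These toy counts have closed-form answers, which makes them a convenient check:
- β̂_0 = ln(10/90).
- β̂_1 = ln{(30/70)/(10/90)}, an odds ratio of 27/7 ≈ 3.86.
- Var(β̂_1) = 1/10 + 1/90 + 1/30 + 1/70.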

3.3 Model Checking
3.3.1 Goodness-of-fit Test
The log-likelihood-ratio (deviance) and Pearson's chi-square statistics are the main tools used for assessing the goodness-of-fit of a fitted generalized linear model (Agresti, 2002). They measure the discrepancy between the maximum log-likelihood achievable and the log-likelihood achieved by the fitted model. The most commonly used measure in GLMs, called the deviance, is defined as

$$D(\mathbf y;\hat{\boldsymbol\mu})=2\left\{\ell(\mathbf y;\mathbf y)-\ell(\hat{\boldsymbol\mu};\mathbf y)\right\}, \qquad (3.8)$$

where ℓ(μ̂; y) is the log-likelihood under the model of interest and ℓ(y; y) is the log-likelihood under the maximum achievable (saturated) model (Agresti, 2002, p.8). Under the hypothesis that the model is correct, the deviance (3.8) has approximately a chi-square distribution with n − p degrees of freedom. Here n is the number of observations and p is the number of parameters in the model (Myers, Montgomery and Vining, 2002, p.34). Consider a binomial model such as the one we are dealing with, defined by

$$P(Y=y)=\frac{n!}{y!\,(n-y)!}\,p^{y}(1-p)^{n-y},\qquad y=0,1,2,\dots,n. \qquad (3.9)$$

For binomial data with fitted values ŷ_i = n_i p̂_i, the deviance (3.8) is given by

$$D=2\sum_{i}\left[y_i\ln\frac{y_i}{\hat y_i}+(n_i-y_i)\ln\frac{n_i-y_i}{n_i-\hat y_i}\right]=2\sum\text{observed}\times\ln\frac{\text{observed}}{\text{fitted}}$$

(Agresti, 2002, pp.40-4).
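The binomial deviance and its chi-square reference distribution can be computed directly. The Python sketch below makes the following assumptions:
- Grouped data are given as lists of successes y_i, trials n_i and fitted probabilities p̂_i.
- The helper names are hypothetical.
- Cells with a zero observed count contribute zero, the limit of t ln t as t → 0.
- The upper-tail probability uses the exact closed forms of the chi-square survival function for integer degrees of freedom, so no statistics library is needed.

The same `chi2_sf` can also be applied to a change in deviance between nested models, as in the model selection of Section 3.4.1.

```python
import math


def binomial_deviance(y, n, p_hat):
    """D = 2 * sum[y ln(y / yhat) + (n - y) ln((n - y) / (n - yhat))], yhat = n * p_hat."""
    total = 0.0
    for yi, ni, pi in zip(y, n, p_hat):
        yhat = ni * pi
        if yi > 0:          # y ln(y / yhat) -> 0 when y = 0
            total += yi * math.log(yi / yhat)
        if ni - yi > 0:     # the same limit applies to the failures term
            total += (ni - yi) * math.log((ni - yi) / (ni - yhat))
    return 2 * total


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for X ~ chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + sqrt(2/pi) e^{-x/2} sum_{j=1}^{(df-1)/2} x^{j-1/2} / (2j-1)!!
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)  # the j = 1 term
    for j in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * j + 1)
    return total


# Hypothetical grouped data: 3 of 10 and 7 of 10 successes,
# fitted by an intercept-only model (pooled p_hat = 0.5).
D = binomial_deviance(y=[3, 7], n=[10, 10], p_hat=[0.5, 0.5])
df = 2 - 1  # n - p: two groups, one estimated parameter
print(f"deviance = {D:.4f}, df = {df}, p-value = {chi2_sf(D, df):.4f}")
```

With only two groups of ten, the chi-square approximation is rough; the numbers illustrate the arithmetic rather than a recommended test.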

3.4 Model Selection and Diagnostics
3.4.1 Model Selection
A number of models in the family of generalized linear models may describe a given data set. It is therefore necessary to select the simplest reasonable model that sufficiently describes the particular data (Agresti, 1990). In most applications, including the current one, many variables may be under consideration. In this case a stepwise selection procedure is often preferred because it reduces the chances of keeping redundant variables and of leaving important variables out of the model. In all the procedures, a variable is retained if adding it to, or dropping it from, the model leads to a significant change in the deviance (equation 3.8); otherwise it is dropped. This method of model selection is referred to as analysis of deviance, and the deviance is also used to test the model's goodness-of-fit.

3.5 Logistic Regression Model (LRM)
The logistic regression model (LRM) is a member of the generalized linear models used to model binary data. Its main properties are discussed here because it will be the main tool in the analysis of the mortality data in this thesis. Consider n independent observations y_i of a binary random variable Y taking the value 1 for a success and 0 for a failure. Each realization y_i of Y is said to follow a Bernoulli distribution with probability density function

$$f(y_i)=p^{y_i}(1-p)^{1-y_i},\qquad y_i=0,1,$$

where p is the probability of success, i.e. p = P(Y_i = 1). For n independent Bernoulli trials, the number of successes Y = Σ_{i=1}^{n} Y_i follows a binomial distribution with probability density function

$$P(Y=y)=\binom{n}{y}p^{y}(1-p)^{n-y},\qquad y=0,1,2,\dots,n.$$

Thus consider a binomial random variable Y with parameters n and p. Suppose a set of explanatory variables x_1, x_2, …, x_p is assumed to have an effect on the response y. The probability of response p(x) = P(Y = 1 | x_1, x_2, …, x_p) is then said to follow a logistic model if

$$p(\mathbf x)=\frac{\exp(\beta_0+\beta_1x_1+\beta_2x_2+\cdots+\beta_px_p)}{1+\exp(\beta_0+\beta_1x_1+\beta_2x_2+\cdots+\beta_px_p)}, \qquad (3.10)$$

or, in terms of the logit function,

$$\operatorname{logit}\{p(\mathbf x)\}=\ln\left\{\frac{p(\mathbf x)}{1-p(\mathbf x)}\right\}=\beta_0+\beta_1x_1+\beta_2x_2+\cdots+\beta_px_p, \qquad (3.11)$$

where β_0, β_1, β_2, …, β_p are unknown model parameters to be estimated (Agresti, 2002, p.82). The predictor variables x_1, x_2, …, x_p can be continuous (for example, age) or categorical (for example, sex or marital status). For a categorical predictor, the parameters are interpreted as log odds ratios with respect to the reference level of the factor variable under consideration. The parameter estimates and the associated variance-covariance matrix are calculated using equations (3.5), (3.6) and (3.7) stated earlier.

3.5.1 Fitting a logistic regression model
Fitting a logistic regression model is exactly the same as fitting any generalized linear model for binomial data, but with n_i fixed at 1. The details of the fitting process are therefore not repeated here; interested readers are referred to Agresti (2002). As stated in Section 3.2, the estimating equations for a GLM, and in particular for the logistic regression model, are readily solved using iterative methods implemented in statistical packages such as SAS, Genstat and SPSS. As in (3.6), the variance-covariance matrix of the vector β̂ = (β̂_0, β̂_1, …, β̂_p)^T is given by

$$\hat V(\hat{\boldsymbol\beta})=(X^T\hat WX)^{-1},$$

where Ŵ is the k × k diagonal matrix with diagonal elements n_i p̂_i(1 − p̂_i) for i = 1, 2, …, k, the k covariate patterns (groups) in the data (Agresti, 2002). Hence the standard error se(β̂_j) is the square root of the j-th diagonal element of (X^TŴX)^{-1}. As a consequence, a (1 − α)100% confidence interval for β_j is

$$\hat\beta_j\pm t_{\nu,\alpha/2}\,\mathrm{se}(\hat\beta_j),$$

where t_{ν,α/2} is the value of the t-distribution on ν = k − 1 degrees of freedom to the left of which the area under the curve is 1 − α/2.

3.5.2 Odds ratios
To interpret the regression parameters in the logistic regression model (3.11), many researchers prefer reporting odds ratios rather than the model parameters β̂_j, j = 1, 2, …, p, themselves. In general, for a binomial distribution with probability of success p, the odds of a success is defined as

$$O=\frac{\text{prob. of success}}{\text{prob. of failure}}=\frac{p}{1-p}.$$

For two probabilities of success p_1 and p_2, the ratio of the associated odds O_1 and O_2 is called the odds ratio and is given by

$$\psi=\frac{O_1}{O_2}=\frac{p_1/(1-p_1)}{p_2/(1-p_2)}.$$

Clearly, the logistic regression defined in terms of the logit (3.11) is a model for the log odds. To explain the dependence of the odds ratio on covariates, consider the special case of one categorical explanatory variable x, for example exposure status, with x = 0 for unexposed and x = 1 for exposed. Then, from equation (3.10), assuming p_x is the probability of infection by a disease, we have

$$p_x=\frac{\exp(\beta_0+\beta_1x)}{1+\exp(\beta_0+\beta_1x)}, \qquad (3.12)$$

or equivalently

$$\operatorname{logit}(p_x)=\ln\left(\frac{p_x}{1-p_x}\right)=\beta_0+\beta_1x. \qquad (3.13)$$

The odds of disease for those exposed (x = 1) is O_1 = p_1/(1 − p_1) = exp(β_0 + β_1), where p_1 = p_{x=1}. Likewise, the odds of disease for those unexposed (x = 0) is O_2 = p_2/(1 − p_2) = exp(β_0), where p_2 = p_{x=0}. Finally, the odds ratio of exposed relative to unexposed is

$$\psi=\frac{O_1}{O_2}=\frac{\exp(\beta_0+\beta_1)}{\exp(\beta_0)}=\exp(\beta_1).$$

Hence the odds ratio comparing the two odds of disease is the exponential of the slope parameter β_1 in model (3.13). Equivalently, β_1 is the log odds ratio: β_1 = log ψ.

The calculation of the odds ratio for a binary explanatory variable such as exposure status (exposed versus unexposed) can be generalized to a categorical variable with l levels, where l ≥ 3. In such a situation one level is taken as the reference. Model (3.13) is then extended to a multiple logistic regression model with l − 1 dummy variables x_1, x_2, …, x_{l−1}, where x_i = 1 if level i is considered and x_i = 0 otherwise, for i = 1, 2, …, l − 1. Model (3.13) becomes

$$\operatorname{logit}(p_{\mathbf x})=\ln\left(\frac{p_{\mathbf x}}{1-p_{\mathbf x}}\right)=\beta_0+\beta_1x_1+\beta_2x_2+\cdots+\beta_{l-1}x_{l-1}$$

(see Agresti, 2002, p.78). Note that the term "model" is used for p_x because the equations describe the probability of success p_x in terms of the covariates. The odds ratio for level i relative to the reference is calculated as in the two-level case. Here, however, it must be interpreted conditional on the other variables being held constant.

Now consider the case where x is a continuous variable. We can compare the odds of an event when x increases by one unit with the odds when x remains unchanged. The two odds at x + 1 and x are respectively

$$O_1=\frac{p_1}{1-p_1}=\exp[\beta_0+\beta_1(x+1)]\qquad\text{and}\qquad O_2=\frac{p_2}{1-p_2}=\exp(\beta_0+\beta_1x).$$

It follows that the odds ratio is

$$\psi=\frac{O_1}{O_2}=\frac{\exp[\beta_0+\beta_1(x+1)]}{\exp(\beta_0+\beta_1x)}=\exp(\beta_1).$$
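Two constructions above can be checked in a few lines: reference (dummy) coding of an l-level factor, and the constant odds ratio for a one-unit increase in a continuous covariate. In this Python sketch the following are all hypothetical: the function names, the age-group labels (borrowed from Table 2.0 purely as an example) and the coefficient values β_0 = −2 and β_1 = 0.5. The point is that the ratio of the odds at x + 1 and x equals exp(β_1) whatever the starting value of x.

```python
import math


def reference_dummies(values, levels, reference):
    """Code a factor with l levels as l - 1 indicator columns.

    The reference level is coded as all zeros; every other level gets a
    single 1 in its own column. Returns (column_levels, rows).
    """
    column_levels = [lev for lev in levels if lev != reference]
    rows = [[1 if v == lev else 0 for lev in column_levels] for v in values]
    return column_levels, rows


def odds(eta):
    """Odds p / (1 - p) implied by a linear predictor eta under the logit link."""
    p = 1 / (1 + math.exp(-eta))
    return p / (1 - p)


age_levels = ["0-15", "16-30", "31-45", "46-60", "61-75", "76-90", ">90"]
columns, rows = reference_dummies(["31-45", "0-15", ">90"], age_levels, reference="0-15")
print(columns)  # the l - 1 = 6 non-reference levels
print(rows)     # one row of indicators per observation

beta0, beta1 = -2.0, 0.5  # hypothetical coefficients for a continuous covariate
for x in (-3.0, 0.0, 2.5):
    ratio = odds(beta0 + beta1 * (x + 1)) / odds(beta0 + beta1 * x)
    print(f"x = {x:+.1f}: odds ratio = {ratio:.6f}, exp(beta1) = {math.exp(beta1):.6f}")
```

An observation in the reference group gets a row of zeros, so its log odds is β_0 alone. Each other level's coefficient is then a log odds ratio relative to that reference, as described above.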

Here the odds ratio is again the exponential of the slope parameter, and it can be interpreted as the ratio of the odds when x increases by one unit. It immediately follows from ψ = exp(β_1) that β_1 = ln(ψ); that is, the slope parameter can be interpreted as the natural logarithm of the odds ratio. Hence, if a (1 − α)100% confidence interval for β_1 is β̂_1 ± t_{α/2,ν} se(β̂_1), then a (1 − α)100% confidence interval for the odds ratio is

$$\left\{\exp\left[\hat\beta_1\pm t_{\alpha/2,\nu}\,\mathrm{se}(\hat\beta_1)\right]\right\},$$

where β̂_1 is the estimate of β_1, α is the significance level (often taken as 0.05), se(β̂_1) is the standard error of β̂_1, and t_{α/2,ν} is read or derived from the t-distribution quantiles.

Now consider the multiple logistic regression model (3.10), or equivalently model (3.11). The interpretation of the slope parameter in the one-variable binary logistic regression model (3.12) extends to the multiple logistic regression model (3.10). If an explanatory variable X_j is continuous, the parameter β_j in model (3.10) is the increase in the natural logarithm of the odds at X_j = x_j + 1 relative to X_j = x_j, that is, the log odds ratio, when the other p − 1 variables are held constant. If an explanatory variable X_j is categorical with l levels, the parameter β_{jk} can be interpreted as the log odds ratio at level k relative to the reference level of X_j, with k = 1, 2, …, l − 1 and j = 1, 2, …, p. Confidence intervals for the parameters and odds ratios are calculated in the same way as for the case of one explanatory variable.

3.6 Model Selection and Diagnostics for LRM
The same procedures for model selection discussed in Section 3.4 apply here. For ungrouped binary data, the deviance statistic D (or D*) is used only to select variables and not as a measure of goodness-of-fit. The next section explains the goodness-of-fit test proposed and discussed by Hosmer and Lemeshow (1989). It also discusses why the deviance statistic is an inappropriate goodness-of-fit measure for ungrouped data.

3.7 Model Checking
3.7.1 Goodness-of-fit Test
Recall that the deviance is given by (3.8) as

$$D(\mathbf y;\hat{\boldsymbol\mu})=2\left\{\ell(\mathbf y;\mathbf y)-\ell(\hat{\boldsymbol\mu};\mathbf y)\right\}, \qquad (3.14)$$

where ℓ(μ̂; y) is the log-likelihood under the current model and ℓ(y; y) is the log-likelihood under the maximum achievable (saturated) model. We consider the typical case of grouped data, where the i-th group has m_i observations, instead of the case of ungrouped binary data. Suppose generally that Y_i ~ Bin(m_i, π_i); then E(Y_i) = μ_i = m_iπ_i. The likelihood function is

$$f(\mathbf y;\boldsymbol\pi)=\prod_{i=1}^{n}\frac{m_i!}{y_i!\,(m_i-y_i)!}\,\pi_i^{y_i}(1-\pi_i)^{m_i-y_i}.$$

The log-likelihood is

$$\begin{aligned}
\ell(\boldsymbol\pi;\mathbf y)&=\sum_{i=1}^{n}\left[\ln\frac{m_i!}{y_i!\,(m_i-y_i)!}+y_i\ln\pi_i+(m_i-y_i)\ln(1-\pi_i)\right]\\
&=\sum_{i=1}^{n}\left[\ln\frac{m_i!}{y_i!\,(m_i-y_i)!}+y_i\ln\frac{\mu_i}{m_i}+(m_i-y_i)\ln\left(1-\frac{\mu_i}{m_i}\right)\right]\\
&=\sum_{i=1}^{n}\left[\ln\frac{m_i!}{y_i!\,(m_i-y_i)!}+y_i\ln\frac{\mu_i}{m_i}+(m_i-y_i)\ln\frac{m_i-\mu_i}{m_i}\right].
\end{aligned}$$

Therefore, the log-likelihood for the fitted model is