Impact of Imputation of Missing Data on Estimation of Survival Rates: An Example in Breast Cancer


Original Article

Baneshi MR 1, Talei AR 2

Abstract

Background: Multifactorial regression models are frequently used in medicine to estimate the survival rate of patients across risk groups. However, their results are not generalisable if the assumptions required are not satisfied during model development. Missing data is a common problem in pathology. The aim of this paper is to address the danger of excluding cases with missing data, and to highlight the importance of imputing missing data before developing multifactorial models.

Methods: This study was performed on 310 breast cancer patients diagnosed in Shiraz (Southern Iran). Using a complete-case Cox regression model, a prognostic index was calculated so as to categorise the patients into 3 risk groups. Then, applying the Multivariate Imputation by Chained Equations (MICE) method, missing data were imputed 10 times. Using the imputed data sets, modelling was performed to assign patients to risk groups. Estimated actuarial Overall Survival (OS) rates from the analyses of complete-case and imputed data sets were compared.

Results: Cases with at least one missing datum experienced a significantly better survival curve. Estimates derived from complete-case data, relative to imputed data sets, underestimated the OS rate in all risk groups. In addition, confidence intervals were wider, indicating a loss in precision due to attrition in sample size and power.

Conclusion: The results highlight the danger of excluding missing data. Imputation of missing data avoids biased estimates, increases the precision of estimates, and improves the generalisability of results to other similar populations.

Key words: Missing data; Multiple imputation; Breast neoplasm; Overall survival; Iran

Please cite this article as: Baneshi MR, Talei AR. Impact of Imputation of Missing Data on Estimation of Survival Rates: An Example in Breast Cancer. Iran J Cancer Prev. 2010; Vol3, No3, p.127-31.

1. Health School, Kerman University of Medical Sciences, Department of Biostatistics and Epidemiology, Kerman, Iran
2. Shahid Faghihi Hospital, Shiraz University of Medical Sciences, Shiraz, Iran

Corresponding author: Mohammad Reza Baneshi, PhD in Biostatistics
Tel: (+98) 913 442 39 48
Email: m_baneshi@kmu.ac.ir
Received: 12 Jan, 2010
Accepted: 21 Jun, 2010

Iran J Cancer Prev 2010; 3: 127-31

Introduction

Cancer is one of the major health problems worldwide. In 2002, a quarter of the 11 million new cases of cancer reported worldwide occurred in Europe. Among new cancer patients diagnosed in the UK, who number more than a quarter of a million per year, the most prevalent carcinomas (incidence rate) were breast (16%), lung (13%), bowel or colorectal (13%), and prostate (12%) [1]. Breast carcinoma, with one million newly diagnosed cases annually, is the most prevalent malignancy, comprising 18% of all female cancers [2]. In Iran, cancer is the third cause of death after cardiovascular diseases and accidents [3]. Breast cancer is the most lethal cancer among women. The prevalence of breast cancer was reported as 25.4, and deaths due to breast cancer as 12.3, per 100,000 [3].

Clinical trials typically involve collection of patient data at entry and, in so far as is possible, these data will include variables of potential relevance to the likely cause of the disease under study. These data sets have been used in the development of prognostic models, which provide a valuable resource for identifying important risk factors for disease course and hence also for risk stratification of patients. However, if model assumptions and limitations are ignored during the development of prognostic models, the models obtained might not be generalisable [4,5].

The presence of missing data is one issue that creates difficulties in model building. When missing data are present, researchers frequently drop patients with missing data on any of the variables under study from consideration. This ad hoc method is known as Complete-Case (C-C) analysis [6]. It has been emphasized that exclusion of missing data will diminish the precision of estimates and can lead to biased estimates [7].

Survival rates are frequently reported in the literature to compare treatment options, and to inform patients about their likely outcome [8]. Exclusion of missing data results in biased estimates of cohort survival rates, in particular when the survival curve of cases with available data differs from that of the remainder (who had at least one missing datum) [9]. For example, when cases with missing data exhibit a lower survival curve than those with available data, omission of missing data results in overestimation of survival rates [9]. Therefore, appropriate methods should be applied to impute missing data so as to avoid attrition in sample size.

The aim of this paper is to compare the estimation of survival rates under two scenarios: in complete-case analysis, and after imputation of missing data. The methods were applied to a breast cancer data set.

Materials and Methods

Patients and outcome

From 1994 to 2003, the information of 310 breast cancer patients in Shiraz, southern Iran, was collected from the Hospital-based Cancer Registry of Nemazee Hospital, affiliated to Shiraz University of Medical Sciences. Median follow-up time was 2.5 years. The main outcome of the study was Overall Survival (OS). Survival was considered as the time period between diagnosis and death for patients who died, and from diagnosis to the last visit for censored patients. At the end of the study, there had been 56 deaths. At the first step a multifactorial model was developed (see the rest of the text). The OS rates were estimated from the risk groups derived (explained later).
Variables offered to the multifactorial models were those shown to have univariate predictive ability [10]: tumour stage with 3 levels (early, locally advanced, and advanced), tumour grade with 3 levels (1, 2, and 3), history of benign breast disease (positive versus negative), and age at diagnosis. Prior to analysis, the age variable was dichotomised at 48 to be a surrogate for approximate menopausal status [11].

Multifactorial Models

At first a dummy variable was created which took a value of 0 if the patient had available data on all variables under consideration and 1 otherwise. Survival curves of patients with and without missing data were compared by plotting Kaplan-Meier curves and performing the Log-Rank test. A linear Cox model was then applied to develop the multifactorial regression models [12].

Complete-Case (C-C) Model

In the C-C model, patients with missing data on any of the 4 candidate variables were excluded. A Cox regression model, in conjunction with the ENTER variable selection method, was then fitted. A final risk score was calculated by multiplying each variable by its estimated regression coefficient and summing the products. Tertiles of the estimated risk score were applied as cut-offs to categorise patients into low, intermediate, and high risk groups.

MICE Model

The Multivariate Imputation by Chained Equations (MICE) method was then applied to impute missing data. The MICE method is a powerful tool for tackling missing values. It replaces each missing value by multiple imputed values, typically 10, resulting in multiply imputed data sets [13,14]. Patients' outcome and the set of 4 risk factors were used in the MICE algorithm [15]. Polytomous and logistic regression were used to impute missing data for categorical and binary variables respectively. The creation of 10 data sets means that 10 modelling analyses are required, one for each data set, and there will therefore be 10 different estimates for each parameter. A Cox regression model was fitted to each of the 10 imputed data sets. In each of the 10 data sets, a risk score was calculated by multiplying the data-set-specific estimates by the variables (10 in total).
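The risk-score and tertile steps described above can be sketched in a few lines. This is a minimal illustration in Python, not the authors' R code: the coefficient values, the 0/1 indicator coding of the four variables, and the names `risk_score` and `tertile_groups` are all assumptions made for the example. In the MICE model the same computation would be repeated once per imputed data set, each time with that data set's own coefficients.

```python
from statistics import quantiles

# Hypothetical Cox coefficients (log hazard ratios) for 0/1 indicator
# variables. These are NOT the values estimated in the paper.
COEFFICIENTS = {
    "stage_locally_advanced": 0.8,
    "stage_advanced": 1.6,
    "grade_2": 0.3,
    "grade_3": 0.7,
    "benign_history": 0.2,
    "age_over_48": 0.4,
}


def risk_score(covariates, coefficients):
    """Sum of each covariate value multiplied by its regression coefficient.

    Indicators missing from `covariates` are treated as 0.
    """
    return sum(coefficients[name] * value for name, value in covariates.items())


def tertile_groups(scores):
    """Label each score low / intermediate / high, using tertiles as cut-offs."""
    lower, upper = quantiles(scores, n=3)  # the two tertile cut points
    labels = []
    for score in scores:
        if score <= lower:
            labels.append("low")
        elif score <= upper:
            labels.append("intermediate")
        else:
            labels.append("high")
    return labels


# Hypothetical patients, each described only by the indicators equal to 1.
patients = [
    {},
    {"grade_2": 1},
    {"stage_locally_advanced": 1, "age_over_48": 1},
    {"stage_locally_advanced": 1, "grade_3": 1},
    {"stage_advanced": 1, "grade_2": 1},
    {"stage_advanced": 1, "grade_3": 1, "age_over_48": 1},
]

scores = [risk_score(p, COEFFICIENTS) for p in patients]
for score, group in zip(scores, tertile_groups(scores)):
    print(f"risk score {score:.2f} -> {group} risk")
```

`statistics.quantiles` interpolates between observations, so with small samples the cut points may differ slightly from those produced by other software; the grouping logic is unchanged.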
Finally, for each patient a single averaged risk score was calculated by averaging her estimated risk scores from each of the 10 imputed data sets. Tertiles of the final risk score were applied as cut-offs to divide patients into low, intermediate, and high risk groups.

Estimation of Overall Survival (OS) rates

To compare the OS rates in risk groups, actuarial 2, 4, and 5-year OS rates in the lowest, intermediate, and highest risk groups are reported. This was done for both the complete-case and the imputed data sets. By definition, the survival function, say S(4), is the probability of being alive at least until the 4th year of follow-up. Therefore, survival at the 4th year depends on survival in the first, second, third, and 4th years, which implies that

$$S(4) = P(T \geq 4).$$

In the actuarial life-table procedure, the whole follow-up duration is split into intervals (for example, the 1-year intervals (0, 1], (1, 2], (2, 3], (3, 4]). If $n_i$ and $d_i$ denote the number of patients at risk just before the $i$-th interval and the number of events in the $i$-th interval, then the probability of surviving to the 4th year is given by

$$\hat{S}(4) = \prod_{i=1}^{4} \left(1 - \frac{d_i}{n_i}\right).$$

Based on Greenwood's formula, the variance of this estimator (here with $t = 4$) can be estimated by

$$\hat{V}\big(\hat{S}(t)\big) = \hat{S}(t)^2 \sum_{i=1}^{4} \frac{d_i}{n_i (n_i - d_i)}.$$

To address the loss in precision of estimates, confidence intervals of OS rates from the analyses of C-C and imputed data sets were estimated and compared.

Software

A series of packages which work under R software (version 2.5.1) were used [16]. Missing data were imputed using the MICE package [17]. Performance of models (discrimination and predictive ability) was assessed using the Design library [18]. K-M curves were plotted using SPSS software.

Results

The numbers (percentages) of patients with missing values on node status, grade, and history of benign disease were 63 (20.3%), 64 (20.6%), and 47 (15.2%) respectively. In total, out of 310 patients, 203 cases (65%) had data available on all 4 variables, of whom 54 had died.

Table 1 reports the number of deaths for patients with complete data and for the remainder with at least one missing datum. The corresponding K-M curves are plotted in Figure 1. Cases with complete data had a much lower survival curve (Log-Rank P-value <0.0001). This indicates that exclusion of cases with missing data leads to underestimation of the true OS rates in the cohort analysed.

Figure 1. K-M curves for cases with available data and cases with at least one missing datum. [Plot of proportion alive against time (years), with one curve for patients with complete data and one for patients with at least one missing datum.]

Table 1. Comparison of survival of patients with available data and with at least one missing datum

Group                                          # of patients   # of events   Log-Rank P-value
Cases with available data on all 4 variables   203             54            <0.0001
Cases with at least 1 missing datum            107             2

As explained in the methods section, a risk score was estimated for the complete-case and imputed data sets. Using tertiles as cut-offs, patients were categorised into 3 risk groups (low, intermediate, and high). Estimated OS rates in the risk groups derived are summarised in Table 2. Estimates derived from patients with available data underestimated OS rates in all 3 risk groups and at all time points. For example, the estimated 2-year OS rates in the lowest risk group for the complete-case and imputed data sets were 92% and 95% respectively. The corresponding rates at 4 years were 84% and 90% respectively. Furthermore, the confidence intervals corresponding to the imputed data sets were tighter than those for the complete-case data, since attrition in sample size is avoided.

Table 2. Comparison of estimated OS rates in the risk groups derived analysing complete-case and imputed data sets. Values are OS rate (confidence interval).

Model              Risk group     2-year OS (%)   4-year OS (%)   5-year OS (%)
Complete case      Low            92 (84, 100)    84 (70, 96)     84 (70, 96)
                   Intermediate   79 (67, 91)     67 (51, 83)     67 (51, 83)
                   High           52 (38, 66)     28 (12, 44)     16 (0, 32)
Imputed data set   Low            95 (91, 99)     90 (82, 98)     90 (82, 98)
                   Intermediate   88 (80, 96)     82 (70, 94)     82 (70, 94)
                   High           64 (52, 76)     42 (28, 56)     32 (16, 48)

Discussion

We have seen that confidence intervals of OS rates corresponding to the imputed data sets were narrower, indicating an improvement in the precision of estimates. Furthermore, comparing the K-M curves of patients with available data with those of patients with at least one missing datum suggested that exclusion of missing data leads to underestimation of OS rates. This was consistent with the estimates we obtained, which are summarised in Table 2. To provide more accurate estimates, we imputed missing data 10 times.
This was to protect against chance effects due to imputation. This protection was felt to be worth the inconvenience of having to average risk scores across 10 final models. Easier imputation methods such as the Expectation-Maximization (E-M) algorithm are likelihood-based and suitable approaches. However, the E-M method replaces each missing datum by a single value, so it does not take imputation uncertainty into account.

It has been noted that under the Missing Completely At Random (MCAR) assumption, subjects with complete data are a random sample of the data [19]. It has been argued that, under the MCAR mechanism, case deletion is a reasonable approach if the missing rate is less than 5% [20]. However, it should be emphasized that even when C-C analysis gives results comparable to MICE, a gold standard (MICE) is required to compare results from other simpler methods [21]. On the other hand, when the missing rate is high, exclusion of missing data will diminish the precision of estimates.

Another issue is that even a low rate of missing data on each variable might cause serious problems in multivariate modelling when patients with missing data are scattered across the data. That is because this might substantially reduce the number of complete cases available for analysis, and increase the chance of bias due to excluded cases.

There are many ad hoc methods (such as C-C, replacement by mean, and missing indicator approaches) and maximum likelihood methods (such as the E-M algorithm and the multiple imputation technique) for dealing with missing data [22]. Application and comparison of alternative imputation methods was beyond the scope of this paper and will be published elsewhere.

The ultimate consequence of complete-case analysis is power reduction. In addition, case deletion might result in biased regression coefficients if the remaining cases are not representative of the whole sample [7,23]. The results presented showed that exclusion of cases with missing data leads to biased and imprecise estimates. Therefore, imputation of missing data should be a priority before any modelling.
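The precision comparison discussed above rests on the actuarial survival estimate and Greenwood's variance given in the Methods. The following Python sketch shows how those two formulas translate into an OS rate with a confidence interval. It is an illustration under stated assumptions: the interval counts are invented, the function names are hypothetical, and the symmetric normal-approximation interval with z = 1.96, truncated to [0, 1], is an assumption, since the paper does not state how its intervals were computed.

```python
import math


def actuarial_survival(at_risk, events):
    """Return (survival, variance) from per-interval counts.

    survival = product over intervals of (1 - d_i / n_i)
    variance = survival^2 * sum of d_i / (n_i * (n_i - d_i))   (Greenwood)
    """
    survival = 1.0
    greenwood_sum = 0.0
    for n, d in zip(at_risk, events):
        if n <= 0 or d < 0 or d >= n:
            # Greenwood's term is undefined when everyone at risk has an event.
            raise ValueError("each interval needs 0 <= d_i < n_i")
        survival *= 1.0 - d / n
        greenwood_sum += d / (n * (n - d))
    return survival, survival ** 2 * greenwood_sum


def confidence_interval(survival, variance, z=1.96):
    """Symmetric normal-approximation interval, truncated to [0, 1] (assumed)."""
    half_width = z * math.sqrt(variance)
    return max(0.0, survival - half_width), min(1.0, survival + half_width)


# Invented counts for four 1-year intervals: (0,1], (1,2], (2,3], (3,4].
at_risk = [100, 80, 60, 40]   # n_i: patients at risk just before interval i
events = [10, 8, 6, 4]        # d_i: deaths during interval i

s4, var4 = actuarial_survival(at_risk, events)
low, high = confidence_interval(s4, var4)
print(f"4-year OS: {100 * s4:.0f}% ({100 * low:.0f}, {100 * high:.0f})")
```

Because each Greenwood term has $n_i$ in the denominator, dropping cases (smaller $n_i$) inflates the variance, which is the mechanism behind the wider complete-case intervals.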
Acknowledgment

We thank the staff of Motahhari Para clinic and Shahid Faghihi hospital, who facilitated our access to patients' folders and information.

Conflict of Interest

There is no conflict of interest in this article.

Authors' Contribution

The data set analyzed in this project was collected under the direction of Professor TAR at Shiraz University of Medical Sciences. All analyses and the writing of the manuscript were done by BMR. Both authors have read and approved the final version of the manuscript.

References

1. Cancer Research UK. UK cancer incidence statistics. 2007 January [cited 2007 Feb 26]; Available from: URL: http://info.cancerresearchuk.org/cancerstats/incidence/?a=5441
2. McPherson K, Steel CM, Dixon JM. ABC of breast diseases. Breast cancer-epidemiology, risk factors, and genetics. BMJ 2000 Sep 9; 321(7261):624-8.
3. Naghavi M. Iranian annual of national death registration report. Iran ministry of health and medical education; 2005.
4. Concato J, Feinstein AR, Holford TR. The risk of determining risk with multivariable models. Ann Intern Med 1993 Feb 1; 118(3):201-10.
5. Wyatt JC, Altman DG. Prognostic models: clinically useful or simply forgotten. British Medical Journal 1995; 311:1539-41.
6. Burton A, Altman DG. Missing covariate data within cancer prognostic studies: a review of current reporting and proposed guidelines. Br J Cancer 2004 Jul 5; 91(1):4-8.
7. Altman DG, Bland JM. Missing data. BMJ 2007 Feb 24; 334(7590):424.
8. Altman DG, Lyman GH. Methodological challenges in the evaluation of prognostic factors in breast cancer. Breast Cancer Res Treat 1998; 52(1-3):289-303.
9. Van Buuren S, Boshuizen HC, Knook DL. Multiple imputation of missing blood pressure covariates in survival analysis. Stat Med 1999 Mar 30; 18(6):681-94.
10. Rajaeefard AR, Baneshi MR, Talei AR, Mehrabani D. Survival Models in Breast Cancer. Iranian Red Crescent Medical Journal 2009; 11(3):295-300.
11. Ayatollahi SM GHASA. Menstrual-reproductive factors and age at natural menopause in Iran. International journal of gynaecology and obstetrics 2003; 80(3):311-3.
12. Cox DR. Regression models and life tables. Journal of royal statistical society 1972; 34:187-220.
13. Schafer JL. Analysis of Incomplete Multivariate Data. Florida: Chapman and Hall; 1997.
14. Schafer JL. Multiple imputations: a primer. Stat Methods Med Res 1999 Mar; 8(1):3-15.
15. Moons KG, Donders RA, Stijnen T, Harrell FE, Jr. Using the outcome for imputation of missing predictor values was preferred. J Clin Epidemiol 2006 Oct; 59(10):1092-101.
16. R: A language and environment for statistical computing [computer program]. 2007.
17. Mice: Multivariate Imputation by Chained Equations [computer program]. 2007.
18. Design: Design Package [computer program]. 2008.
19. Donders AR, van der Heijden GJ, Stijnen T, Moons KG. Review: a gentle introduction to imputation of missing values. J Clin Epidemiol 2006 Oct; 59(10):1087-91.
20. Fairclough DL. Patient reported outcomes as endpoints in medical research. Stat Methods Med Res 2004 Apr; 13(2):115-38.
21. Greenland S, Finkle WD. A critical look at methods for handling missing covariates in epidemiologic regression analyses. Am J Epidemiol 1995 Dec 15; 142(12):1255-64.
22. Baneshi MR. Statistical Models in Prognostic Modelling of Many Skewed Variables and Missing Data: A Case Study in Breast Cancer (PhD thesis submitted at Edinburgh University) 2009.
23. Harrell FE. Regression modelling strategies with application to linear models, logistic regression, and survival analysis. New York: Springer-Verlag; 2001.