Author's response to reviews

Title: Epidemiology of breast cancer in Cyprus: a population based case control study

Authors: Andreas Hadjisavvas (ahsavvas@cing.ac.cy), Maria A Loizidou (loizidou@cing.ac.cy), Nicos Middleton (nicos.middleton@cut.ac.cy), Thalia Michael (thalia@cing.ac.cy), Rena Papachristoforou (rena@cing.ac.cy), Eleni Kakouri (eleni.kakouri@bococ.org.cy), Maria Daniel (maria.daniel@bococ.org.cy), Panayiotis Papadopoulos (panicos.papadopoulos@bococ.org.cy), Simon Malas (malas@logos.net.cy), Yiola Marcou (yiola.marcou@bococ.org.cy), Kyriacos Kyriacou (kyriacos@cing.ac.cy)

Version: 2
Date: 30 March 2010
29 March 2010

Jason Kerr
Assistant Editor
BMC Cancer

THE CYPRUS INSTITUTE OF NEUROLOGY & GENETICS
Dr Kyriacos Kyriacou, PhD
Head, Electron Microscopy/Molecular Pathology Department
P.O. Box 23462
1683 Nicosia, Cyprus
Tel.: (+357) 22 392 631, (+357) 22 358 600
Fax: (+357) 22 392 641
E-mail: kyriacos@cing.ac.cy
Website: www.cing.ac.cy

Dear Editor,

Re: Epidemiology of breast cancer in Cyprus: a population based case control study
Revised title: A study of breast cancer risk factors in Cyprus: a population-based case control study

I am pleased to accept your offer to resubmit a revised version of the above article. Please note that, following one of the reviewers' suggestions, we have changed the title of the paper to "A study of breast cancer risk factors in Cyprus: a population-based case control study" to better reflect the scope of the submitted work. We have also sought statistical advice from a biostatistician, Dr Nicos Middleton, who has been added to the paper as a contributing author. The data have been re-analysed using more appropriate methods, including multivariable logistic regression.

Although breast cancer is the leading cancer among Cypriot women, there has to date been no epidemiological investigation of the strength of the associations between recognised person-based risk factors and breast cancer in our population. While this study clearly has important public health implications locally, we also believe that describing the underlying distribution of some recognised risk (or protective) factors for breast cancer in a less well-studied population will be of interest to the international community. For instance, 1 in 4 Cypriot women have never breastfed, while as many as 80% have at least two children and a comparable proportion have never used hormone replacement therapy.
Furthermore, previously published work by our group has shown that the Cypriot population exhibits some unique genetic features, as revealed by the identification of novel mutations in the BRCA1 and BRCA2 genes. The current study will therefore serve as a baseline for our group's future efforts to combine person-based data with molecular/genetic data in order to study gene-environment interactions. We have explained the background and scope of this work in the Background section of the manuscript.

We have now addressed all of the important issues and recommendations raised by the reviewers. All authors have seen and approved the revised draft. Finally, I would like to thank the reviewers for their constructive comments and suggestions. Below please find our point-by-point responses, in the order in which the points were raised by the two referees.
Referee 1: Bachok Norsa Adah

Overall: this article does not add more knowledge on breast cancer. It has value for local interest only unless researchers put in new variables pertaining to local habits, e.g. frequency of olive oil consumption in the diet, Mediterranean lifestyle etc.

Response: We agree with the reviewer that studying the contribution of dietary factors on a Mediterranean island would be a novel contribution; unfortunately, detailed data on the dietary habits of these women were not available. Nevertheless, the main focus of this population-based case-control study of breast cancer risk factors in Cyprus was to assess the strength of the association with some internationally established risk factors for breast cancer, such as family history and reproductive status. Indeed, this is the first such study in our population.

Major Compulsory Revisions:

Title is neither complete nor reflecting content. When the word epidemiology is put up, I am expecting the article to be about the distribution of breast cancer according to person, time and place. It would be better to put up "a study of risk factors of breast cancer in Cyprus".

Response: As mentioned above, the word "epidemiology" has been removed from the title of the article, which now reads "A study of breast cancer risk factors in Cyprus: a population-based case-control study".

Method is not well described nor detailed. Inclusion & exclusion criteria of cases & controls are not clear. What is meant by healthy control? No breast cancer only, or no diabetes, hypertension etc.?

Response: We have now expanded the Methods, Study Participants and Data collection section of the article to describe the study design and the inclusion/exclusion criteria for both cases and controls. Specifically, cases were all women with histologically confirmed breast cancer diagnosed between 1999 and 2005, recruited from the main cancer treatment hospitals in Cyprus.
Controls were selected from the pool of women who participated in the National programme for breast cancer mammography screening and were negative for breast cancer.

How is age matched? Looking at Table 1, age is not properly matched; for example, among those <45, cases are almost twice as many as controls. It is repeatedly mentioned that this is a population-based study, but how the researchers got the participants from the population is not clearly mentioned.

Response: Regarding age matching, we have now explained in the Methods, Study Participants and Data collection sections of the article that the controls were NOT matched to the age distribution of the cases, as was wrongly implied in the previous version of the paper. This explains why the age distributions of cases and controls differed in Table 1 of the previous manuscript. That table has now been removed, and the revised text makes no reference to age matching. Because the women invited for screening span a narrow age range (50-69 years), younger and older women were under-represented; all analyses were therefore adjusted for the confounding effect of age.

Definition of operative terms such as cigarette smokers: does it mean current smokers only, or does it include previous smokers? The same applies to exercise, OCP, HRT etc. The HRT category is not comprehensive; HRT use <6 months is not mentioned at all.

Response: In the same section (Methods, Study Participants and Data collection), we now explain the risk factors investigated in more detail. Past smokers are now included as a separate smoking category, and for HRT we now show associations by duration of use, categorised as never, <6 months, 6-60 months and >60 months. Only information on current exercise status was available.

Socio-economic status is not explored.

Response: Unlike Britain, where socio-economic classification is commonly operationalised on the basis of occupation, there is no comparable socio-economic classification system in Cyprus. In the absence of such indicators, educational status is commonly used as a proxy. We have explained this in the Discussion.

Statistical analysis: not adequate using only univariate logistic regression. Should continue to multivariate analysis so that confounders can be controlled. Also interaction & multicollinearity can be checked. The proper statistical analysis is conditional multiple logistic regression.

Response: The suggestion of conditional logistic regression stems from the misunderstanding that this was a matched case-control study (see above). We have now clarified that, since no matching was employed between cases and controls, unconditional logistic regression is the appropriate method of analysis for these data. We agree with the reviewer about the importance of controlling for confounders in a multivariable approach (as well as testing for effect modification), and the data have therefore been re-analysed. As a result, three new tables are presented in the revised manuscript. Table 1 presents the socio-demographic characteristics of the study participants, along with a χ2 test for differences between cases and controls.
Table 2 investigates the interrelationships between variables with a high degree of collinearity (i.e. pregnancy, number of children and breastfeeding), and Table 3 presents odds ratios by participant characteristics before and after adjusting for all other risk factors in multivariable logistic models. This is detailed both in the Methods, Data analysis section and in the Results section, where each of these tables is presented and discussed.

Conclusion is not supported by data. The present conclusion can be put in the discussion.

Response: Following the reviewer's suggestion, the former conclusion has now been rewritten.

Minor Essential Revisions:

Results: the socio-demographic background of the participants is not included.

Response: Table 1 (previously Table 2) presents the distribution of socio-demographic characteristics and potential risk factors among the participants, in absolute numbers and relative frequencies, separately for cases and controls.

Tables 1 & 2 give the mean & median age of both groups, but there is no p-value. I am interested to know whether there is a significant difference or not. We would expect no significant difference in mean age since matching is used.
Response: There was no difference in mean age between cases and controls (t-test p-value=0.22). Nevertheless, since this was not an age-matched case-control study (see above), there were some differences in the age distributions of cases and controls, mainly driven by the under-representation of controls in the younger and older age groups. After restricting to ages 45-65, the p-value for differences in the age structure of cases and controls becomes 0.46. P-values are now included in Table 1, as explained in a footnote.

Pg 7, paragraph 2, 2nd line: ~50% should be the exact value. Table 2: the sums for the variables are not equal to the total number of participants stated in the table headings. All descriptive statistics should have frequency with %. Decimal places should be consistent throughout the article; e.g. in Table 2 the p-values range between 1 and 4 decimal places.

Response: All three issues have been dealt with: (a) we present exact estimates (as well as confidence intervals) in the text, (b) we now present both frequencies and relative frequencies for all participant characteristics in Table 1 (previously Table 2), and (c) all estimates are now consistently displayed to 2 decimal places.

Categorization of breastfeeding may need revision since the ORs are not consistent. The reference group may need reversing. I do not agree with the categorization of reproductive status. Why include breastfeeding? There will be multicollinearity with the breastfeeding variable. If the authors want to show interaction between variables, they can check using multiple logistic regression.

Response: We agree with the reviewer that our previous categorisation of reproductive status was not adequate, since it does not allow the synergy between breastfeeding and pregnancy to be examined.
We have now estimated the protective effect of pregnancy, and the added effect of breastfeeding over that of pregnancy, in a logistic regression model that included the main effect of pregnancy and an interaction term between pregnancy and breastfeeding (to capture the combined effect of the two variables), but no separate main effect for breastfeeding (since only women who have had a child can have breastfed). As breastfeeding was found to be more protective than pregnancy alone (even when the number of pregnancies was considered), only breastfeeding was retained in the multivariable models. These findings are presented in the newly added Table 2 and discussed in the Results section.

Discussion is a repetition of the results. There is too little discussion, comparison & contrast, and similarity with other studies. The 1st paragraph should be combined with the Background. Many statements do not have supporting references, e.g. pg 10, 1st & 2nd sentences. No mention of the limitations of the study.

Response: Following the reviewer's advice, we have restructured the Discussion section of the manuscript, discussed our findings in the context of the international literature, moved the first paragraph to the Background section, and described the limitations of the study in the Discussion.
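The pregnancy/breastfeeding parameterisation described in the response above can be illustrated with a minimal sketch. It assumes a crude (not age-adjusted) model containing only the three exposure groups that can actually occur, so the model is saturated and each coefficient is simply a difference of observed log-odds. The counts and function names below are hypothetical illustrations, not the study's data or the authors' STATA code.

```python
import math

# Hypothetical (cases, controls) counts per exposure group; NOT the study's data.
# Groups are keyed by (pregnant, breastfed). The combination (0, 1) cannot occur,
# because only women who have had a child can have breastfed.
COUNTS = {
    (0, 0): (60, 40),    # never pregnant
    (1, 0): (200, 180),  # pregnant, never breastfed
    (1, 1): (500, 600),  # pregnant and breastfed
}


def design_row(pregnant, breastfed):
    """Model columns: intercept, pregnancy main effect, pregnancy x breastfeeding.
    There is deliberately no separate breastfeeding main effect."""
    return (1, pregnant, pregnant * breastfed)


def saturated_coefficients(counts):
    """Maximum-likelihood coefficients of
        logit P(case) = b0 + b1 * pregnant + b2 * (pregnant * breastfed).
    With three groups and three parameters the model is saturated, so the fit
    reproduces each group's observed log-odds exactly."""
    log_odds = {g: math.log(cases / controls) for g, (cases, controls) in counts.items()}
    b0 = log_odds[(0, 0)]
    b1 = log_odds[(1, 0)] - b0
    b2 = log_odds[(1, 1)] - log_odds[(1, 0)]
    return b0, b1, b2


b0, b1, b2 = saturated_coefficients(COUNTS)
print(f"OR, pregnant without breastfeeding vs never pregnant: {math.exp(b1):.2f}")
print(f"OR, pregnant and breastfed vs never pregnant:         {math.exp(b1 + b2):.2f}")
print(f"Added effect of breastfeeding over pregnancy alone:   {math.exp(b2):.2f}")
```

Here exp(b2) is the odds ratio for breastfeeding among women who have been pregnant, which is the "added effect" the response refers to. Once age or other covariates enter the model it is no longer saturated and the coefficients must be fitted iteratively, but their interpretation stays the same.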
Referee 2: Sunita Saxena

The manuscript entitled "Epidemiology of breast cancer in Cyprus: a population based case control study" is based upon an adequate sample size. However, there are areas of major concern, especially the analytical methods adopted in the analysis of the data and the presentation of results. The other area of concern is the imprecise formulation of the research question addressed. The authors say that the main aim of the present study was to compare established and recognized breast cancer risk factors in Cypriot women with and without breast cancer. Lastly, it is concluded that the study provides the first scientific evidence for more targeted campaigns of prevention and early diagnosis in the studied population. But even with a large n, the results are not well presented, which is the primary requirement for any epidemiological investigation.

Response: We were pleased to note that the reviewer recognised the large sample size as an obvious strength of our study. As we mention in the article, because of the small population of the island, the cases represented 50% of all breast cancer cases diagnosed during the 7-year period 1999-2005. Power calculations indicate that the study sample would have 90% power to detect an odds ratio of 1.5 at the 5% significance level for an exposure present in 10% of the controls. We have now addressed the reviewer's main concern, which relates to the analysis and presentation of the data. As mentioned above, the data have been re-analysed using multivariable logistic regression (see Methods, Data analysis). We have also framed our research question more clearly.

Major Compulsory Revisions:

Abstract
# The first paragraph states that there are no data as yet available about the risk factors and breast cancer in the Cypriot female population.
But then they say that the aim of the present study was to compare the established and recognized breast cancer risk factors in Cypriot women with and without breast cancer. There is a lack of a well-defined research question in the study. If the aim is to compare, then the methodology adopted cannot specifically address this issue.

Background
# The same issue of an imprecise research question in the abstract also applies to the background section of the manuscript.

Response: The main aim of the study was (a) to describe the underlying distribution of some recognised breast cancer risk factors in the Cypriot population (hence the choice of a population-based control group) and (b) to assess the strength of their associations with breast cancer risk in this population.

Methods
# Please state more specifically how the matching was performed: individual or frequency matched?
# What was the criterion for considering the age group 40-70? It needs to be mentioned for completeness.
# What is meant by "controls participating in the study were stratified"? It needs to be written clearly for the readers' understanding. The controls have been stated to have been stratified to be representative of the island population. In such a scenario, they are not expected to be appropriate to serve as controls in a case control study.
# The criteria laid down for the selection of participants have not been explicitly stated.
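The power figure quoted in the response to Referee 2's general comments (90% power to detect an odds ratio of 1.5 at the 5% level, for an exposure present in 10% of controls) can be approximated with the usual two-proportion normal approximation for an unmatched design. This is a sketch under stated assumptions: the letter does not say which method or software was used for the calculation, nor the exact numbers of cases and controls it assumed, so the group sizes below are placeholders and the result need not reproduce the quoted 90%.

```python
from statistics import NormalDist


def case_exposure_prevalence(p_control, odds_ratio):
    """Exposure prevalence among cases implied by the control prevalence and an odds ratio."""
    odds_case = odds_ratio * p_control / (1 - p_control)
    return odds_case / (1 + odds_case)


def power_unmatched(n_cases, n_controls, p_control, odds_ratio, alpha=0.05):
    """Approximate power of a two-sided test comparing the exposure proportion in
    cases and controls (unmatched design, normal approximation, no continuity
    correction; the negligible opposite tail is ignored)."""
    std_normal = NormalDist()
    p_case = case_exposure_prevalence(p_control, odds_ratio)
    # Pooled proportion and standard error under the null hypothesis of no association.
    p_bar = (n_cases * p_case + n_controls * p_control) / (n_cases + n_controls)
    se_null = (p_bar * (1 - p_bar) * (1 / n_cases + 1 / n_controls)) ** 0.5
    # Standard error under the alternative (the odds ratio being tested).
    se_alt = (p_case * (1 - p_case) / n_cases + p_control * (1 - p_control) / n_controls) ** 0.5
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    return std_normal.cdf((abs(p_case - p_control) - z_crit * se_null) / se_alt)


# Placeholder group sizes, not the study's exact numbers.
print(f"{power_unmatched(1100, 1100, p_control=0.10, odds_ratio=1.5):.2f}")
```

Dedicated sample-size software may use slightly different formulas (for example with a continuity correction), which is one reason a figure computed this way can differ by a few percentage points from a published one.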
Response: We have now explained the design of the study and the selection criteria for the participants in more detail in the Methods, Study Participants and Data collection sections of the article. Specifically, we have clarified that this was NOT an age-matched case-control study (as may have been wrongly implied in the earlier draft). That is why unconditional logistic regression adjusting for the possible confounding effect of age (rather than conditional logistic regression) was employed for the analysis of the data. The controls were women who participated in the National screening programme, i.e. to a large extent the same population that would also give rise to the cases. All women aged 50-69 (identified from population lists) are invited to participate in the National screening programme. Because of this narrow age span, younger and older women were under-represented. Nevertheless, between the ages of 50 and 65, the observed age distribution of controls roughly reflected the age structure of the female population.

Data analysis and Methods
# The study design is case control with age matching. Then why was McNemar's test not performed? The case control study design should be accompanied by a matched analysis. Presently, the degrees of freedom and P value in relation to the χ2 are not provided; these should be provided irrespective of whether an unmatched or matched analysis is performed.
# The fundamental issues of multicollinearity, confounding and interaction assessment were possibly not addressed, as the results are based only on univariate logistic regression analysis. These are fundamental issues in such epidemiological studies that help in obtaining the true estimate of the real effect.
# What was the value of the non-parametric Kendall's tau-b statistic for the covariates under consideration? If the coefficient matrix illustrated significant non-parametric correlation, how was this taken care of during the analysis?
# The depicted results are based only upon univariate logistic regression analysis. Why are multivariable analysis results not presented? It is strongly recommended to do the multivariable analysis and present its results.
# Why was conditional logistic regression analysis not performed, given the case control study design?
# The results of univariate analysis alone cannot be considered for campaigns of prevention and early diagnosis in the studied population. Univariate analysis only gives the direction of association and is an exploratory step prior to multivariable model development. Multivariable analysis is definitive and is the standard regression approach for epidemiological model development, as it addresses the issues of multicollinearity and confounding and provides scope for interaction assessment.
# There is no mention of the statistical package used for the data/statistical analysis.

Response: As mentioned above, the χ2 test and unconditional logistic regression were used (rather than McNemar's test and conditional logistic regression) because no individual matching was employed. Note that all model estimates presented in the revised version of the article (including associations with single risk factors) were adjusted for age. We agree with the reviewer about the importance of issues such as multicollinearity, confounding and interaction, and acknowledge that the previous version of the manuscript did not address these issues effectively, as was also noted by the first reviewer. As mentioned above, the data have now been re-analysed using multivariable logistic regression in STATA SE 9.0 and presented accordingly in a series of three tables (see details in the Methods, Data Analysis section of the revised manuscript). As expected, pregnancy, number of children and breastfeeding (both status and duration) displayed a high degree of dependence (as indicated by Kendall's tau-b statistic).
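For readers less familiar with the statistic, the dependence check mentioned above can be sketched as a direct pair-counting implementation of Kendall's tau-b, which corrects for the many ties that binary and count variables such as pregnancy, number of children and breastfeeding produce. The letter does not report the tau-b values or the routine used, so the toy vectors below are hypothetical and only show how such dependence would be measured.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b with tie correction, computed by counting all pairs (O(n^2))."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    concordant = discordant = tied_x = tied_y = 0
    for (xi, yi), (xj, yj) in combinations(zip(x, y), 2):
        dx = (xi > xj) - (xi < xj)  # sign of the difference in x
        dy = (yi > yj) - (yi < yj)  # sign of the difference in y
        tied_x += int(dx == 0)
        tied_y += int(dy == 0)
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    denominator = math.sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    if denominator == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denominator


# Hypothetical toy data for eight women (not study data).
pregnant = [0, 0, 1, 1, 1, 1, 1, 1]
children = [0, 0, 0, 1, 2, 2, 3, 1]
breastfed = [0, 0, 0, 0, 1, 1, 1, 1]
for name, other in [("number of children", children), ("breastfeeding", breastfed)]:
    print(f"tau-b(pregnancy, {name}) = {kendall_tau_b(pregnant, other):.2f}")
```

The quadratic pair loop is fine for a few thousand participants; statistical packages typically use an O(n log n) algorithm for larger samples.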
To avoid multicollinearity, we first explored the associations with these variables, examining how each behaved when adjusted for the others, and selected the variable with the strongest association for inclusion in the final model. Table 2 presents the results of this part of the analysis. In more detail, the possible protective effect of pregnancy, and the added effect of breastfeeding over that of pregnancy, was investigated in a logistic regression model that included the main effect of pregnancy and an interaction term between pregnancy and breastfeeding (to capture the combined effect of the two variables), but no separate main effect for breastfeeding (since only women who have had a child can have breastfed). Breastfeeding appeared to have a more protective effect than pregnancy alone. Similarly, the strong univariable association observed with number of children diminished once breastfeeding was controlled for in the analysis. In fact, the odds ratio for a one-unit increase in the number of children was 0.92 (95% CI 0.80, 1.05) among women who did not breastfeed and 0.97 (95% CI 0.82, 1.15) among women who breastfed. The Mantel-Haenszel estimate controlling for breastfeeding was 0.94 (95% CI 0.85, 1.04) (test of homogeneity of ORs, p=0.61), indicating no evidence of an association between breast cancer and number of children, irrespective of breastfeeding status. Thus, among these variables, only breastfeeding was considered further in multivariable models. These issues are covered in the Methods, Data Analysis section and presented accordingly in the Results section of the revised manuscript.

Results
# Table 2 presents data according to various risk factors as sub-tables. However, the totals of cases, and even of controls, differ between sub-tables. In fact, for the sub-tables on marital status, pregnancy, breastfeeding, reproductive status, smoking and exercise, the total number of cases is 1111, 1112, 1179, 1143 and 1110 respectively, which is more than the number of cases enrolled.
In fact, all the sub-tables seem to have different totals, and the reason for this is not given. Similar discrepancies exist for controls. There were 88 women among cases stated to have had no pregnancy, but 103 women were given as not having any children. Were 15 women pregnant at the time of interview? Under this sub-table, 1006 cases were given as pregnant (at any time), but details on age at first pregnancy are given for 996, while breastfeeding details are given for 1112 cases. Reproductive status is given for 1179, with only 66 stated as never pregnant (against 88 given elsewhere).
# The percentages for each level of the considered covariates are not provided.
# The median is not accompanied by the interquartile range (IQR).
# The P value for age in Table 2 is not specified.
# Associated statistics related to χ2 are not given in the results (value of the test statistic and significance), so the statement that there were statistically significant differences between cases and controls in terms of level of education and marital status lacks evidence.
# There should be uniformity in stating and placing the reference category for the covariates under consideration (for age at menarche it is written last, whereas for the others it is first).
# It is OK that for age at menarche the odds ratio of 1.57 (for the category <11) is more than 1.27 (for the category 12-14), but if the trend is significant, the P value for trend must be mentioned in the text or in a table footnote with an appropriate legend.
# The statement "some associations of reproductive factors with breast cancer------, pregnant------- 95%CI (0.50, 0.96)" needs to be interpreted specifically in relation to the dependent variable considered in the logistic regression analysis (breast cancer risk).
# What was the rationale behind the threshold of 5 years for HRT? This needs to be addressed for proper understanding; it is only partly addressed.
# The variable smoking needs to be correctly interpreted in relation to the dependent variable in the logistic regression analysis (breast cancer risk). How smoking was quantified in the analysis has not been mentioned (e.g. duration and intensity/number of pack-years).
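As a rough consistency check on the pooled estimate reported in the response on collinearity above, the two stratum-specific odds ratios per additional child (0.92, 95% CI 0.80, 1.05, among women who did not breastfeed; 0.97, 95% CI 0.82, 1.15, among women who breastfed) can be combined using only their published confidence intervals. Note that this sketch swaps in a different technique: fixed-effect inverse-variance (Woolf) pooling with Cochran's Q homogeneity test, not the Mantel-Haenszel method the authors used, which requires the underlying counts. With these rounded inputs it gives roughly 0.94 (0.84, 1.04) and a homogeneity p-value of about 0.63, close to but not identical with the reported 0.94 (0.85, 1.04) and 0.61.

```python
import math
from statistics import NormalDist

Z_975 = NormalDist().inv_cdf(0.975)  # about 1.96


def log_or_and_se(odds_ratio, ci_low, ci_high):
    """Recover the log odds ratio and its standard error from a reported 95% CI."""
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * Z_975)
    return math.log(odds_ratio), se


def pool_two_strata(stratum_a, stratum_b):
    """Fixed-effect inverse-variance pooling of two stratum-specific odds ratios,
    with Cochran's Q test of homogeneity (chi-square with 1 df for two strata)."""
    estimates = [log_or_and_se(*stratum_a), log_or_and_se(*stratum_b)]
    weights = [1 / se ** 2 for _, se in estimates]
    pooled = sum(w * b for w, (b, _) in zip(weights, estimates)) / sum(weights)
    se_pooled = 1 / math.sqrt(sum(weights))
    q = sum(w * (b - pooled) ** 2 for w, (b, _) in zip(weights, estimates))
    p_homogeneity = math.erfc(math.sqrt(q / 2))  # upper tail of chi-square(1) at q
    ci = (math.exp(pooled - Z_975 * se_pooled), math.exp(pooled + Z_975 * se_pooled))
    return math.exp(pooled), ci, p_homogeneity


# Stratum-specific ORs per additional child, as reported in the letter.
no_breastfeeding = (0.92, 0.80, 1.05)
breastfeeding = (0.97, 0.82, 1.15)
or_pooled, (low, high), p_het = pool_two_strata(no_breastfeeding, breastfeeding)
print(f"pooled OR {or_pooled:.2f} (95% CI {low:.2f}, {high:.2f}), homogeneity p = {p_het:.2f}")
```

Because the published intervals are rounded to two decimals, small discrepancies in the second decimal place are expected even if the same pooling method were used.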
Response: We have now restructured the presentation of the results into three separate tables and rewritten the Results accordingly, following the reviewer's suggestions closely in order to correct the previous inconsistencies. This includes providing percentages, reporting p-values for differences and p-values for trend across ordered categorical variables, and ensuring consistency in reference categories and in reported numbers. For instance, the difference between the number of women reporting never having had a pregnancy (N=155) and never having had a child (N=191) arises because 15 cases and 21 controls reported a pregnancy that did not result in a birth. With regard to the reviewer's comments on smoking and HRT use (also noted by the first reviewer), we have now explained that no information on the duration or intensity of smoking was available with which to calculate pack-years. Regarding HRT, only a small percentage of women in our sample reported use for longer than 5 years. Nevertheless, we now display associations by duration of use, categorised as never, <6 months, 6-60 months and >60 months.

Discussion
# The discussion is based only upon the findings of univariate logistic regression analysis. The authors need to do multivariable analysis, present the results and discuss them subsequently.
# No result has been provided for the statement "breast cancer risk was significantly greater in postmenopausal women vs premenopausal women".

Response: We have now performed multivariable logistic regression analysis and presented/discussed our results accordingly. While we previously used age at diagnosis as a proxy for menopausal status, we decided to remove this from the revised version of the manuscript.

I would like to thank you and the reviewers for the constructive comments and for giving us sufficient time to respond to the suggestions. I look forward to receiving your response.
Yours sincerely,

K. Kyriacou, PhD
Senior Scientist