Title: Prediction of poor outcomes six months following total knee arthroplasty in patients awaiting surgery

Author's response to reviews

Authors:
Eugen Lungu (eugen.lungu@umontreal.ca)
François Desmeules (f.desmeules@umontreal.ca)
Clermont E Dionne (clermont.dionne@uresp.ulaval.ca)
Étienne L Belzile (etbelzile@hotmail.com)
Pascal-André Vendittoli (pa.vendittoli@videotron.ca)

Version: 3
Date: 19 July 2014


July 19, 2014

The BioMed Central Editorial Team

Object: BMC Musculoskeletal Disorders MS: 4422488461277119 - Prediction of poor outcomes six months following total knee arthroplasty in patients awaiting surgery. Lungu et al.

Thank you for considering our manuscript for publication in your journal. We have revised the above manuscript according to the reviewers' comments. Please note that upon review of Figure 2 and of Table 3, we identified an error in the presentation of the classification criteria of the prediction rule. The possible answer patterns to the items (none, mild, moderate, severe or extreme) have been corrected accordingly in both instances. The title page as well as the tables, figures and additional files have also been revised according to the journal's guidelines.

Reviewer #1: Erik Lenguerrand

Reviewer's report:

- Major Revisions

1. "The objective was to develop a preliminary prediction rule (PR) to identify patients enrolled on surgical wait lists who are at the greatest risk of poor outcomes 6 months after TKA". The derived PR is based on pre-operative factors. This provides a tool of major clinical interest, as it can be used as soon as the patient is registered on the waiting list. This allows very early planning of the appropriate course of action, such as prehabilitation, conservative management, wait list priority or intensive post-operative rehabilitation. However, it seems that surgical and post-surgical (complications) patient characteristics have been considered as potential predictive factors. None of them were selected in the final model, but the fact that they were investigated does introduce some conceptual and clinical limitations to this work: 1. these characteristics could be consequences of pre-operative factors; 2. clinical staff would need to wait for the surgery to be able to derive the final prognostic score if any of these variables had been selected, and this would not have allowed the implementation of prehabilitation or conservative management strategies; 3. surgical factors related to the surgeon or hospital (such as staff seniority, rates of success/complications, etc.) could also be of some interest or be associated with the 6-month outcomes. For these reasons, I would recommend focusing this work only on the pre-operative factors and writing the manuscript accordingly. Table 1 seems to suggest that the surgical factors were not considered but the length of inpatient stay was. Please clarify this point in the whole manuscript.

The reviewer is correct in stating that the focus of this work should be on preoperative factors, and this is precisely what was done. Table 1 was reformatted as per the reviewer's request; it now distinguishes the variables that were considered as potential predictors for developing the PR from the other variables that were collected. Moreover, the paragraph entitled "Potential predictors at enrolment on surgical wait list" in the Methods section of the manuscript indicates that, in addition to the variables presented in Table 1, we also considered individual questions from validated questionnaires (i.e., social support tool, PSI and

WOMAC) to build the rule. We have removed the confusion that might arise from the sentence "The pre-surgery wait times were calculated from the data extracted from the wait list database of each hospital" being present in the "Potential predictors at enrolment on surgical wait list" paragraph, as pre-surgery wait time was not considered as a potential predictor. We deleted it from the abovementioned paragraph and inserted it in the "Other variables" paragraph.

2. The authors specified that a set of eligible candidate predictors was created by manual adjustment based on statistical, clinical and ease of use considerations. For each resulting PR, sensitivity (Ss), specificity (Sp) and area under the ROC curve (AUC) were calculated with their 95% confidence intervals. The simplest rule demonstrating the highest sensitivity with an acceptable level of specificity was selected as the final tool. The readers are only shown the final model. As the concepts of "the simplest" and "acceptable level of" are to some extent subjective, I would recommend presenting in an appendix all the combinations of variables considered with their respective statistics: Ss, Sp, AUC. This would make the manuscript more transparent and provide useful information for readers interested in conducting such research.

We agree with the reviewer that there is some subjectivity associated with the selection of the final rule. The appendix now contains 8 prediction rules that were considered, with their respective statistics. Although one rule (PR4) presented slightly better metrologic qualities, the final rule was chosen because it included one less variable and was based only on questions of the baseline WOMAC.

3. As the final model is based on some of the factors used to derive the WOMAC function subscale, it would be interesting to compare the predictive value of the presented PR with the predictive value of the whole WOMAC function subscore, for example by comparing the current findings with the Ss, Sp and AUC of a dichotomised version of the pre-operative individual WOMAC function scores (dichotomised using the threshold that maximises the AUC, Ss and Sp). This would allow the reader to determine how much better the current PR is compared with an existing score that has established psychometric properties and is well accepted in the clinical and academic communities.

We acknowledge the interest of such a comparative analysis; however, such an approach could itself be the subject of a separate study. Moreover, we believe that, once calculated, this threshold would not necessarily hold the established psychometric properties of the WOMAC function scale because, among other reasons, it would have to be validated in a different sample. Presenting this threshold would diminish the focus we want to place on our developed prediction rule.

4. No external validation of the selected PR has been conducted; the internal validation is based on a bootstrap approach to test the accuracy of the estimation. However, as the sample used is quite small, I am wondering if a V-fold cross-validation would also

have been a relevant internal validation strategy to determine if the presented tree is the most appropriate one.

A 10-fold cross-validation could not be performed, as the SPSS AnswerTree 3.1 software can only conduct it on a decision tree that was developed automatically. The chosen final PR was developed using "a manual adjustment based on statistical, clinical and ease of use considerations"; we thus modified the automatic tree that the software yielded, and hence could not apply the 10-fold cross-validation.

- Minor essential revisions

1. The format of Table 1 could be improved. The authors have highlighted the variables tested for the PR; however, this looks rambling and I would recommend splitting the table in two, with the first half dedicated to the tested variables. I would also recommend presenting the variables in the text in the same order as in the table.

Changes made as indicated by the reviewer.

2. In Table 2, the authors mentioned that the scores are presented as %. Technically, they are presenting means and SDs of standardised scores rather than means and SDs of %.

Changes made as indicated by the reviewer.

3. The following concepts presented in the Methods and Results sections and in Table 5 are not defined. It would be beneficial for the reader to understand their use in the context of prediction, especially as only the first two have been used to select the final model: sensitivity, specificity, positive predictive value, negative predictive value, positive likelihood ratio, negative likelihood ratio, area under the ROC curve.

Changes made at the bottom of Table 5 as indicated by the reviewer.
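As a reading aid for the accuracy measures listed in comment 3 above, the following minimal Python sketch shows how they are commonly derived from a 2x2 classification table (rule positive/negative versus poor/satisfactory outcome). The function name and the counts are hypothetical and are not study data; this is not the analysis code used for the manuscript. The area under the ROC curve is not covered here.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Accuracy measures of a binary prediction rule from 2x2 counts.

    tp: rule positive, poor outcome     fp: rule positive, satisfactory outcome
    fn: rule negative, poor outcome     tn: rule negative, satisfactory outcome
    Assumes no denominator is zero (e.g. specificity is strictly between 0 and 1).
    """
    sensitivity = tp / (tp + fn)   # proportion of poor outcomes flagged by the rule
    specificity = tn / (tn + fp)   # proportion of satisfactory outcomes not flagged
    ppv = tp / (tp + fp)           # probability of poor outcome given a positive rule
    npv = tn / (tn + fn)           # probability of satisfactory outcome given a negative rule
    lr_pos = sensitivity / (1 - specificity)   # positive likelihood ratio
    lr_neg = (1 - sensitivity) / specificity   # negative likelihood ratio
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
    }


# Hypothetical counts for illustration only (not taken from the study).
metrics = diagnostic_metrics(tp=24, fp=20, fn=6, tn=100)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Positive and negative predictive values depend on how common poor outcomes are in the sample, whereas sensitivity, specificity and the likelihood ratios do not.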

Reviewer #2: Kristina Harris

Reviewer's report:

Major Compulsory Revisions

1. The authors ground their predictive model on the premise that the patients who score in the first quintile of the postoperative WOMAC score are defined as having a poor outcome and patients in the other four quintiles as having a satisfactory outcome. I have major concerns with this reasoning: the postoperative WOMAC score depends on the preoperative score. For instance, a patient could have been in the 4th quintile preoperatively and fall to the 2nd quintile postoperatively (which is up to 50 points on the 0-100 WOMAC score) and would still be considered by the authors as having a satisfactory outcome. The authors should focus on identifying patients who have achieved a satisfactory change in the score, this being the minimal clinically important change. This information has been published, see (Chesworth et al., 2008): "Clinically important difference (CID) was explored with n=1578 patients one year following TKR". Furthermore (unless I am missing something here), the "Dependent variables" paragraph does not make much sense to me. The authors mention that the WOMAC score was transformed from 0 (best, no pain) to 100 (worst), yet they say that they will define the patients in the first quintile of the postoperative WOMAC score as the ones having a bad outcome? Furthermore, they introduce a cutoff of 40.4 on the WOMAC score as a first quintile cutoff.

Conceptually, the minimal clinically important change is different from what we sought to achieve by employing the worst quintile of post-operative WOMAC scores as a definition of poor outcome. Our purpose was to develop a prediction model that would help identify the patients with poor functioning status and severe pain, regardless of how effective the intervention was for them. We do not target the prediction of a treatment effect, as would be the case if we used the MCID.

In our manuscript, we mention that numerous references state that 10-30% of patients experience poor pain and functional outcomes following TKA. The references in question employ a large number of methods of assessing what represents poor pain and poor function after the surgery. We believe that defining the worst quintile (i.e., 20% of the sample) as having poor outcomes is an appropriate value, compatible with the interval (10-30%) suggested in the literature. Moreover, in the process of developing the prediction rule, we categorized the sample according to tertiles, quartiles, the median, etc. The quintile categorization allowed the development of the best model when taking into account statistical, clinical and ease of use considerations.

2. In the Methods section, more key information could be presented on the sample characteristics. For instance, what is considered a severe degenerative disease, which was used as an exclusion criterion for this study? Also, did all patients have just OA, or were other pathologies included (such as RA, tumors)?

Table 1 indicates that 136 patients (out of 141; 97%, corrected to 96%) had OA and that 5 patients (out of 141; 3%, corrected to 4%) had RA. Severe degenerative disease was defined in the manuscript as indicated by the reviewer.

3. There is a duplication of information in the Results section and tables. The authors could consider focusing only on presenting key information in the text.

Changes made as specified by the reviewer.

4. The authors should elaborate more (and present more information) on the decision-making processes that guided the creation of the final set of eligible candidate predictors. They should clearly present information (perhaps in the Supplementary material) on all of the variables considered and used and their completion rates.

The appendix now contains 8 models that were considered. The completion rate of the variables was added in the reformatted version of Table 1. See also response to comment 1.

5. In the Discussion, the authors recognize that their final rule model may not make sense. Given the fact that the entire analysis is based on the sample size of 144 and on a single sample (and some of my points above on what constitutes a good outcome and how the predictors are chosen), I would recommend that the analysis and recommendations are revisited accordingly.

See response to comment 1.

6. The authors could discuss the relevance of the use of the 48-hour recall window in their prediction algorithm.

The 48-hour recall window in the items of the prediction algorithm is used because we employed the standard WOMAC version (Bellamy & Buchanan, 1986, "A preliminary evaluation of the dimensionality and clinical importance of pain and disability in osteoarthritis of the hip and knee"). We do not believe that a different recall period would change our results. Versions of the WOMAC with other recall periods have been used by others (24 hours, 7 days, 4 days, since last visit), but the authors of the WOMAC have stated that "the index appears sufficiently robust to tolerate these variations in timeframe between 24 hours and 1 month" (Bellamy 2005, "The WOMAC Knee and Hip Osteoarthritis Indices: Development, validation, globalization and influence on the development of the AUSCAN Hand Osteoarthritis Indices").

Minor Essential Revisions

1. Consider adding horizontal lines and/or narrowing the space between the Variables and % columns in Table 1.

We have reformatted Table 1 completely, as requested by Reviewer 1.

2. The number of patients in the top second box in Figure 2 does not add up.

The error has been corrected.

3. Have the authors considered copyright issues for the WOMAC questionnaire (both for this publication and the potential use of the predictive tool)?

Having developed a preliminary tool, we have not considered any copyright issues. We will be in touch with Dr. Bellamy's team during the validation process of the prediction rule.

4. The reporting of means, SDs and 95% CIs should be consistent throughout the manuscript.

Changes made as indicated by the reviewer.
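For readers who wish to see how a worst-quintile definition of poor outcome, as discussed in the response to Reviewer 2, comment 1, can be operationalised, here is a minimal Python sketch. It assumes a score scaled from 0 (best) to 100 (worst), so that the worst quintile corresponds to scores above the 80th percentile; the function name and the scores are hypothetical. The cutoff reported in the manuscript (40.4) was derived from the study sample and is not reproduced by this example.

```python
from statistics import quantiles


def flag_worst_quintile(scores):
    """Return (cutoff, flags); flags[i] is True when scores[i] is in the worst quintile.

    Assumption: higher scores mean worse pain and function (0 = best, 100 = worst).
    """
    # quantiles(..., n=5) returns the four cut points between quintiles;
    # the last one is the 80th percentile of the sample.
    cutoff = quantiles(scores, n=5, method="inclusive")[-1]
    flags = [score > cutoff for score in scores]
    return cutoff, flags


# Hypothetical six-month total scores for illustration only (not study data).
scores = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
cutoff, flags = flag_worst_quintile(scores)
print(f"cutoff = {cutoff}, poor outcomes = {sum(flags)} of {len(scores)}")
```

With tied scores at the cutoff, the flagged proportion can differ from exactly 20%, so the handling of ties (strictly above versus at or above the cutoff) is a choice that would need to be stated.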