Author's response to reviews

Title: The validity of a professional competence tool for physiotherapy students in simulation-based clinical education: a Rasch analysis

Authors: Belinda Judd (belinda.judd@sydney.edu.au), Justin Scanlan (justin.scanlan@sydney.edu.au), Jennifer Alison (jennifer.alison@sydney.edu.au), Donna Waters (donna.waters@sydney.edu.au), Christopher Gordon (christopher.gordon@sydney.edu.au)

Version: 1
Date: 07 Jun 2016

Author's response to reviews:

Ms Belinda Judd
Faculty of Health Sciences and Sydney Nursing School, University of Sydney

Dear Professor Edwards,

Re: MEED-D-16-00275 The validity of a professional competence tool for physiotherapy students in simulation-based clinical education: a Rasch analysis

Thank you to you and the reviewers for the positive and constructive feedback on our manuscript. We are confident that the changes we have made in response to this feedback have strengthened the manuscript, and we hope that you will now find it suitable for publication. Outlined below are the reviewer comments (in red text), each followed in turn by our response. We have also made the corresponding changes in the manuscript text. The revised manuscript is resubmitted with changes highlighted in yellow.

With kind regards,

Ms Belinda K Judd

Reviewer reports:

Reviewer #1: This well-written manuscript describes an important and relevant topic, because of the increased use of simulation in health professions education.

Thank you for this positive summary.

Method:

- Please give a little more information about the participants: number, age, and what is meant by "the latter stages".

Thank you for this comment. We have added further detail as requested. The manuscript now reads (from line 147):

Participants in this multi-site cross-sectional study were pre-registration physiotherapy students from two Australian universities. There were 444 students in simulation and 190 students in clinical placements. The students were from years three and four of a four-year undergraduate program and from year two of a two-year graduate-entry master program.

We were unable to provide demographic data such as participant age, as this was not within the remit of the de-identified data approved by ethics. Therefore, we are unable to add these data; however, age is unlikely to be a confounder, as previous research (Dalton et al., 2011) has not shown any effect of age or other demographic variables on the Rasch analysis of the APP tool. This is discussed in the limitations (see from line 449).

Dalton, M., Davidson, M., & Keating, J. (2011). The Assessment of Physiotherapy Practice (APP) is a valid measure of professional competence of physiotherapy students: a cross-sectional study with Rasch analysis. Journal of Physiotherapy, 57(4), 239-246.

Discussion:

- Can you say something about the generalizability to other fields of health professions education?

We have revised the manuscript to address this. The manuscript now reads (from line 456):

The approach of using standardized competency tools for assessment, validated using Rasch analysis, has previously been undertaken in occupational therapy and physiotherapy [18, 32]. This study has now demonstrated the applicability of this approach to assessments in simulation settings. This approach may therefore also be useful for other allied health professions who wish to develop their own standardized competency tools for simulation or clinical placement evaluation.

- Discussion: What could be the impact on the results if you had the possibility to brief the educators about scoring objectives?

Thank you for this comment; the results show that the APP in simulation was quite robust, except when used as a short-form assessment of a single student performance. This was an observational study, and it was beyond the scope of the research design to include any interventional aspects. It is an interesting concept, however, and could be considered in further research; we have noted this in the limitations section (see from line 452).

Reviewer #2: This is a well-written and very interesting paper on the validity of the APP using Rasch analysis. I do think the paper would be significantly improved with some minor revisions:

Thank you for this positive summary.

1 - The biggest overall concern is the continued ambiguous use of the term 'validity'. Validity can mean many things and comes in many forms (construct, content, face, predictive, etc.). It seems that here the authors are using validity to mean 'quality' or, more functionally, 'fitness-for-purpose'. The argument appears to be that the APP is fit for purpose because it is psychometrically robust, reliable, feasible, of good quality, targets the relevant content, and so on. In one instance, the authors actually use the word 'suitable' (line 131). The Rasch analysis is being used to justify these claims, but I think the term validity requires more explication, definition and consideration by the authors throughout the entire paper. Otherwise, every time it is used the question remains: 'what does validity mean in this specific sentence?' The 'Introduction' paragraphs 4, 5 and 6 should be reconsidered in light of these comments. See also lines 444 and 450 in the conclusion.

Thank you for this comment. We agree that our use of the term validity needs careful clarification. We have reviewed the manuscript and clarified the aims, which were to explore both the construct validity of the tool using Rasch analysis and the overall functionality of the APP's use in simulation. These changes are reflected throughout the paper.

2 - The Introduction is a bit repetitive and could use some editing. The 6 paragraphs could probably be condensed to 5.

We have modified the introduction to minimise repetition. We felt that this topic required the reader to be stepped through several concepts; our intention was not to be repetitive but to interlace these concepts. The changes are reflected in the revised manuscript, particularly from line 94.

3 - Line 209 requires more elucidation. Rasch analysis allows for the validation of the unidimensionality of a latent trait purportedly being measured by an instrument by assessing the goodness of fit of the items to the Rasch model. The words 'internal structure' sound much more like reliability (Cronbach's alpha) in Classical Test Theory.

Thank you for this comment. We have re-worded this paragraph. The manuscript now reads (from line 207):

The Rasch measurement model provides a mathematical framework to explore the construct validity of an instrument. The central theory of Georg Rasch's model is that a person having a greater ability than another person should have the greater probability of solving any item of the type in question, and similarly, one test item being more difficult than another means that for any person the probability of solving the second item is the greater one [22]. The unidimensionality of the instrument is evaluated by examining the goodness of fit of the items to the Rasch model [23].

4 - It would be good to reference the foundation of the Rasch model and Rasch Measurement in general (such as Georg Rasch's 1960 paper) rather than just applications of the Rasch model in health contexts.

The reviewer makes a valid point. We have addressed this in the paragraph change for reviewer point 3 (above) and also in the manuscript from line 208.

5 - Line 238. This fit range is not unanimously accepted. Some think it is highly contentious to regard anything beyond 0.8 and 1.2 as acceptable fit, for instance. Some discussion around these values should be given and reflected on in the discussion section in light of the large infit and outfit values in Table 1. One single reference to one author who recommends an acceptable fit range is insufficient.

Thank you for this comment. We accept that there is some debate about what constitutes an acceptable fit range. We have incorporated this into the manuscript to acknowledge that views on acceptable fit differ (see from line 238):

There are varying views in the research regarding what constitutes an acceptable fit [23, 26]. For an ideal fit, the mean square value is 1.0. We considered an acceptable fit to be 0.5-1.5, and values >2.0 suggest that the item is either being used inconsistently enough to potentially corrupt the measurement model or that it is not part of the construct under examination [26].
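For reference, the dichotomous Rasch model and the mean-square fit statistics referred to in points 3 and 5 take the following standard form (standard psychometric notation; this is not a quotation from our manuscript):

$$P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{\exp(\theta_n - b_i)}{1 + \exp(\theta_n - b_i)}$$

$$\text{outfit MSQ}_i = \frac{1}{N} \sum_{n=1}^{N} z_{ni}^2, \qquad \text{infit MSQ}_i = \frac{\sum_{n} (x_{ni} - P_{ni})^2}{\sum_{n} P_{ni}(1 - P_{ni})}$$

where $\theta_n$ is the ability of person $n$, $b_i$ the difficulty of item $i$, $P_{ni}$ the model probability above, and $z_{ni} = (x_{ni} - P_{ni})/\sqrt{P_{ni}(1 - P_{ni})}$ the standardized residual. Both mean squares have an expectation of 1.0 under the model, which is why 1.0 is the ideal value cited in our revision.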

6 - Line 276. Perhaps different assessment 'contexts' as well as 'formats'.

We agree with the suggestion. The manuscript now reads (from line 276):

A scale that fits the Rasch model performs consistently irrespective of the different assessment formats and contexts in which the tool is applied. Examining differential item functioning allows the investigation of item biases that may exist in one of these different assessment formats or contexts. In this study, an analysis was established to explore differential item functioning of the APP between the three simulation formats (longitudinal one week, longitudinal two weeks, and short-form) and to compare these formats with the clinical placement setting.

7 - Lines 280-282. Did these other studies use Rasch? It would be good to clarify in a sentence how the item bias was investigated in the other studies.

We agree this needs further clarification. The item bias was investigated using Rasch analysis, and we have now reflected this in the revised manuscript. The manuscript now reads (from line 284):

Previous research has demonstrated, using Rasch analysis, no item bias for the APP across nine demographic variables (student and educator age, gender and experience levels, type of facility, university type, and clinical area) [20]; therefore, these variables were not re-examined in this study.

8 - Line 313. Again, fit statistics of less than 2.0 still seem very high. In Table 1, it would be good to include "n" for all items in the different contexts, so that the variability of the figures based on small sample sizes can be viewed by the reader.

As suggested, for clarity, the percentage of non-marked items in Table 1 has been deleted and replaced with an individual count of the number of times ("n") an educator marked each item in each of the different assessment formats. Please see the revised Table 1 in the resubmission.

9 - Line 379. 'difficult' should be 'difficulty'.

Thank you for highlighting this error. The manuscript now reads (from line 388):

The hierarchy of item difficulty was consistent between simulation assessment formats (short and longitudinal) with the exception of two items.
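Relating to the fit ranges discussed in points 5 and 8 above, the following is a minimal NumPy sketch of how infit and outfit mean squares are computed from a scored response matrix in the dichotomous case. It is illustrative only: our analysis was conducted in Winsteps (which also handles the polytomous rating scale of the APP), and the function and variable names here are hypothetical.

```python
import numpy as np

def mean_square_fit(X, theta, b):
    """Infit/outfit mean squares per item for dichotomous Rasch data.

    X     : (persons x items) 0/1 response matrix
    theta : person ability estimates in logits, shape (persons,)
    b     : item difficulty estimates in logits, shape (items,)
    """
    # Model-expected probability of success for every person-item pair
    P = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
    W = P * (1.0 - P)                  # binomial variance (information)
    Z2 = (X - P) ** 2 / W              # squared standardized residuals
    outfit = Z2.mean(axis=0)           # unweighted mean per item
    infit = ((X - P) ** 2).sum(axis=0) / W.sum(axis=0)  # information-weighted
    return infit, outfit

# Items outside the 0.5-1.5 window used in the manuscript could be flagged as:
# infit, outfit = mean_square_fit(X, theta, b)
# flagged = np.flatnonzero((infit < 0.5) | (infit > 1.5)
#                          | (outfit < 0.5) | (outfit > 1.5))
```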

10 - Line 385. Rather than just 'appropriate fit', please give the fit values as well.

We have now added the fit statistics. These are presented in Table 1, but we have also altered the manuscript to read (from line 393):

Item 6 was a misfit in the Dalton et al. [18] data, but we found that this item demonstrated appropriate fit for both longitudinal assessment formats (longitudinal one week infit/outfit 0.99/0.90 mnsq; longitudinal two weeks infit/outfit 1.07/1.18 mnsq; Table 1).

11 - Another benefit of the Rasch model is the generation of detailed item statistics. Presumably these were calculated in the production of the data in Table 1? It would be good to see what analysis was done on the items themselves. Item thresholds are discussed, so maybe these should all be reported in another table? It would be interesting to see the item-total correlations, for instance. This would add another analysis dimension to the paper, and would allow discussion of the quality of the individual items in the APP, rather than just focusing on the performance of the instrument as a whole.

Thank you for this comment. We believe that individual item quality has been adequately presented in Table 1 through the individual item measures and fit statistics, and in other results in the text. It is generally not recommended to consider correlation statistics of items in a Winsteps-derived analysis; please see the following link for further explanation: http://www.winsteps.com/winman/correlations.htm. We do not feel these further statistics would add anything substantial to the analysis.

12 - It would be good to present a Wright (Item-Person) Map as well, if possible, to demonstrate the targeting of the items. The item locations are reported under the 'measure' column in Table 1; this is presumably the Rasch difficulty. The person locations must have been calculated as well, so an Item Map would offer a great visual report of this for the reader.

This is a good suggestion, and we have now included an Item-Person Map to graphically present the targeting of the items in the all-simulation data. See Figure 2 in the revised manuscript and the explanation from line 350:

Overall, the APP data from simulation demonstrated mean location scores for students closely matching the value of zero set for items (Figure 2). There were no major floor or ceiling effects, suggesting an overall good match of item difficulty to student ability (Figure 2).
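For readers who wish to reproduce this kind of targeting figure outside Winsteps, a basic item-person ("Wright") map can be sketched with matplotlib as below. This is a generic illustration under our own naming, not the code used to produce Figure 2.

```python
import matplotlib.pyplot as plt

def wright_map(theta, b, item_labels=None):
    """Item-person ('Wright') map on a shared logit scale.

    theta : person ability estimates (logits)
    b     : item difficulty estimates (logits)
    """
    fig, (ax_p, ax_i) = plt.subplots(1, 2, sharey=True, figsize=(6, 8))
    # Left panel: distribution of person abilities
    ax_p.hist(theta, bins=30, orientation="horizontal", color="grey")
    ax_p.invert_xaxis()          # counts grow leftwards, as in Winsteps maps
    ax_p.set_ylabel("Logits")
    ax_p.set_title("Persons")
    # Right panel: item difficulties as labelled points on the same scale
    labels = item_labels or [f"Item {i + 1}" for i in range(len(b))]
    for difficulty, label in zip(b, labels):
        ax_i.plot(0, difficulty, "ko")
        ax_i.annotate(label, (0, difficulty), xytext=(6, 0),
                      textcoords="offset points", va="center")
    ax_i.set_xticks([])
    ax_i.set_title("Items")
    fig.tight_layout()
    plt.show()
```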

13 - Figure 1. It would be good to discuss the DIF in the 'Two week' category as well. Although not major, there does appear to be some DIF for items 1-4, 8-13 and 17-20. If these are not significant, then a justification of why would be good, along with some discussion of the threshold of 'significance' being applied.

We have amended the methods and results sections to reflect further details of our DIF analysis. The manuscript methods section now reads (from line 282):

Among statistically significant items, a mean difference of >0.5 logits was considered an appropriate threshold for determining that an item was making a noticeable impact on the functioning of the scale [27].

The manuscript results section now reads (from line 342):

Item 6, "demonstrates clear and accurate documentation", was the only statistically significant item to display item bias and DIF over the threshold of >0.5 logits.
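To illustrate how the combined statistical-and-practical DIF criterion described above can be operationalised, here is a small sketch assuming separate item calibrations (with standard errors) for two assessment formats, using a Wald-type test for significance. Our own DIF analysis was produced within Winsteps; the function name and inputs below are hypothetical.

```python
import numpy as np
from scipy import stats

def flag_dif(b_a, se_a, b_b, se_b, contrast=0.5, alpha=0.05):
    """Flag items for DIF between two assessment formats.

    b_a, b_b   : per-item difficulty estimates (logits) in formats A and B
    se_a, se_b : their standard errors
    An item is flagged only when the difficulty difference is both
    statistically significant and larger than the practical threshold
    (0.5 logits, mirroring the criterion in our revised methods).
    """
    diff = np.asarray(b_a) - np.asarray(b_b)
    se = np.sqrt(np.asarray(se_a) ** 2 + np.asarray(se_b) ** 2)
    z = diff / se                        # Wald-type z statistic
    p = 2 * stats.norm.sf(np.abs(z))     # two-sided p-value
    return (np.abs(diff) > contrast) & (p < alpha)
```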