PEER REVIEW HISTORY

BMJ Open publishes all reviews undertaken for accepted manuscripts. Reviewers are asked to complete a checklist review form (http://bmjopen.bmj.com/site/about/resources/checklist.pdf) and are provided with free text boxes to elaborate on their assessment. These free text comments are reproduced below.

ARTICLE DETAILS

TITLE (PROVISIONAL): How do GPs in Switzerland perceive their patients' satisfaction and expectations? An observational study
AUTHORS: Sebo, Paul; Herrmann, Francois; Haller, Dagmar

VERSION 1 - REVIEW

REVIEWER: Carl de Wet, Logan Hyperdome Doctors, Australia
REVIEW RETURNED: 12-Dec-2014
GENERAL COMMENTS: Thank you for the opportunity to review this clear and well-written manuscript. In my professional opinion it is suitable for publication in its current form.

REVIEWER: Nienke Bleijenberg, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, the Netherlands
REVIEW RETURNED: 25-Feb-2015
GENERAL COMMENTS: The authors conducted a cross-sectional study and explored whether 23 GPs are able to accurately predict their patients' satisfaction with the care they provide, as well as their expectations in general practice. A large sample of patients was included. Although the topic is highly relevant to improving GPs' care, there are several methodological concerns that seriously affect the interpretation of the results (research question, methods and outcome). Furthermore, the references in this paper are all very outdated; please update them.

The title is misleading, and throughout the manuscript the authors state that their objective is to assess whether GPs are able to predict their patients' satisfaction, but in order to predict an outcome one should have follow-up data. Since this study has a cross-sectional design, the aim should be adjusted to assessing the association/correlation. «Prediction» should be removed throughout the manuscript.

Methods: Selection of patients is unclear. Was this and despite the authors state in the discussion section that the risk of selection was reduced due to consecutive enrollment, selection bias cannot be ruled out and is a limitation of this study. The timing of measurement differed among patients: questionnaires were filled in before or after the visit. It would be valuable to evaluate whether, and to what extent, this influences the results.

The «expectation» questionnaire was not validated. The authors state that «the questionnaire was pretested», but it is unclear what exactly was tested. The outcome variable for the logistic regression is ambiguous and not clearly described. Why did the authors dichotomize the outcome? A major drawback is that they lose important information. Please describe which variables were included in the multivariate model: all variables listed in the table, or another approach?

Results: Please include how many patients were approached. Correlations are missing.

Discussion: Relevant literature and a theoretical explanation for the findings are lacking. Recommendations for clinical practice are not sufficiently described.

REVIEWER: Laurent Letrilliart, Université Claude Bernard Lyon 1, France
REVIEW RETURNED: 05-Mar-2015
GENERAL COMMENTS: General comments

This article describes and compares patients' expectations and satisfaction and the corresponding perceptions of their GP. It shows an overestimation of patients' expectations and an underestimation of patients' satisfaction by GPs. Certified GPs have more accurate assessments than others. The findings regarding patients' expectations are original, but the stakes are not so clear and the methods used raise some questions.

Specific comments

Abstract: Mean patient age is likely to be 54 years rather than 64, according to the figure reported in the results section. How the main outcomes were measured should be mentioned.

Introduction: The introduction should be better constructed, by separating the presentation of the two issues addressed in the study (patients' expectations and satisfaction, respectively), before devising their articulation. Patients' «involvement in decision about their care» refers to another issue, and looks off topic.

Methods: What is the «local research ethics committee»?
The data collected were focused on expectations and satisfaction about organisational aspects, and did not take into account other important aspects of care such as patient-physician communication or technical procedures. The authors should justify this choice. It seems that the patient clustering effect has not been considered in the analyses, in particular by using a multilevel model, which should be argued. It is hard to understand why the factors associated with an accurate assessment of patients' expectations have not also been evaluated in a multivariate model.

Results: In Tables 2 and 3, the duplication of the results in terms of numbers of patients whose evaluation is more, less or equally favourable than the GP's is not really needed. In addition, the mean difference between patients' and GPs' evaluations should be presented in these tables, in order to describe its direction and size, for all variables but ordinal ones.

In Table 3, the five items exploring accessibility and availability could presumably be analysed as ordinal variables, using a rank test.

Discussion: Reasons why physicians should assess patients' general expectations/satisfaction rather than expectations/satisfaction relating to the particular consultation should be addressed. The influence of the physician's certification, which is an original finding, should be better emphasized and interpreted. In particular, are there some certification criteria that cover the organisational factors considered in the study? The sample size of GPs should actually have nothing to do with this finding, as it is taken into account by the statistical test. Which proportions of eligible patients were finally excluded from the study? To assess a possible selection bias of the participating GPs, they should be compared with the GP population in Geneva for variables such as age, gender and certification status. The social desirability bias that may apply to patients' assessments is likely to be unintentional rather than intentional.

Conclusion: Patient satisfaction is possibly, but not certainly, related to better outcomes.

VERSION 1 - AUTHOR RESPONSE

> Reviewer Name: Dr. Carl de Wet
> Institution and Country: Logan Hyperdome Doctors, Australia
> Please state any competing interests or state 'None declared': None declared
> Please leave your comments for the authors below
> Thank you for the opportunity to review this clear and well-written manuscript. In my professional opinion it is suitable for publication in its current form.

We thank the reviewer for his comment.

> Reviewer Name: N. Bleijenberg
> Institution and Country: Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, the Netherlands
> Please state any competing interests or state 'None declared': none declared
> Please leave your comments for the authors below
> The authors conducted a cross-sectional study and explored whether 23 GPs are able to accurately predict their patients' satisfaction with the care they provide, as well as their expectations in general practice. A large sample of patients was included. Although the topic is highly relevant to improving GPs' care, there are several methodological concerns that seriously affect the interpretation of the results (research question, methods and outcome). Furthermore, the references in this paper are all very outdated; please update them.

We included some recent references.

> The title is misleading, and throughout the manuscript the authors state that their objective is to assess whether GPs are able to predict their patients' satisfaction, but in order to predict an outcome one should have follow-up data. Since this study has a cross-sectional design, the aim should be adjusted to assessing the association/correlation. «Prediction» should be removed throughout the manuscript.

We understand the reviewer's remark. Indeed, it is true that, in general, cross-sectional studies are of limited value for investigating etiological relationships, because exposures and outcomes are measured at a particular point in time. However, as shown in Table 2 (for satisfaction items) and Table 3 (for expectation items), our study only compared outcomes, i.e. patients' views with GPs' perceptions of their patients' views. Only Table 4 presented, in a multivariate analysis, a list of exposures (GPs' characteristics) independently associated with the ability of the GPs to correctly estimate patients' scores. We think that we were rather prudent in the description of these results because, as stated in the titles of the tables and in the Methods, Results and Discussion sections, we presented the characteristics «associated with» and not «causing» the outcome. However, as asked by the reviewer, we removed «prediction» and «predict» throughout our manuscript.

> Methods: Selection of patients is unclear. Was this and despite the authors state in the discussion section that the risk of selection was reduced due to consecutive enrollment, selection bias cannot be ruled out and is a limitation of this study.

The selection of the patients is described in the Methods section. The participating GPs were asked to recruit between 50 and 100 consecutive patients (i.e. min 50, max 100 patients). The inclusion criteria were relatively simple: patients coming to the practice for a planned consultation (thus, patients visited at home and those seen in emergency situations were not included), age > 15, and patients able to understand and write French. New patients and those suffering from disorders affecting their ability to consent were not to be included. We understand the reviewer's concern, but we think that the risk of selection bias is really small in our study. First, the GPs were selected at random (and the study sample actually seems to be representative of the study population, as mean age (50 vs. 53 years) and sex (men: 61% in the two groups) are similar). Second, the patients were consecutively recruited by their GPs and only 45 patients out of 1637 refused to participate (i.e. participation rate > 97%). However, as new patients, those consulting in an emergency situation and those who did not speak French were excluded from the study, we stated in the Limitations section that «these patients might have lower levels of satisfaction and different expectations, as they are likely to have lower health or socio-economic status than patients with a planned appointment and/or those who speak French».

> Timing of measurement differed among patients: questionnaires were filled in before or after the visit. It would be valuable to evaluate whether, and to what extent, this influences the results.

Patients had to complete the questionnaire in the waiting room, but could do so before or after the consultation for practical reasons. We agree with the reviewer that it would indeed have been interesting to address this question. Unfortunately, we did not record this variable. We think that patients' expectations are unlikely to be much affected by the timing of completion of the questionnaire (i.e. before or after the consultation), because patients' views concerning expectations should not be too influenced by the current consultation. By contrast, patients' satisfaction could be influenced by the recent experience with the doctor and/or the office. We therefore added a sentence about this in the Limitations section.

> The «expectation» questionnaire was not validated. The authors state that «the questionnaire was pretested», but it is unclear what exactly was tested.
As stated in the Methods section, these items, identified through a review of the literature and discussion between the members of the research team, were selected because they were considered the most important expectations to be studied. The formulation of the questions was based on the formulation used in previous studies, when available in their methods sections (see bibliography), and the questions were then translated into French by a native French speaker. In addition, all the questions concerning the importance given to equipment, appearance/cleanliness and accessibility/availability were built on the same model, i.e. the model of the satisfaction questions (5-point Likert scale). The questionnaire was pretested on a population similar to the study population, and feedback was obtained from the responders (n=20 for patients and 8 for GPs) in order to identify any comprehension problems and difficulties patients and GPs might meet in responding to the questions. Then, we modified the questionnaire through discussion between the members of the research team. Finally, we pre-tested the new version of the questionnaire. We also re-administered the same questionnaire to a small number of participants at a 2-week interval to make sure the questionnaire was reliable over time. We slightly modified our Methods section to better reflect this.

> The outcome variable for the logistic regression is ambiguous and not clearly described. Why did the authors dichotomize the outcome? A major drawback is that they lose important information.

We tried to improve the description as follows:

«Multiple logistic regression models were carried out to identify which doctor factors were independently associated with the ability to correctly evaluate patients' satisfaction, assessed either as overall satisfaction or as the mean of all satisfaction items. These models allow for intra-group correlation, relaxing the usual requirement that the observations be independent. The dependent variable was a binary variable coded as 1 when the difference between patient and doctor satisfaction was considered small (< 0.5) or 0 when the difference was considered large (≥ 0.5, i.e. GP failed to identify patients' views).»

We agree with the reviewer that we lose some information by dichotomizing the outcome, but we think that odds ratios are simpler to interpret and more explicit to the reader than betas from a multiple linear regression model.
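The coding rule for the binary outcome described in this response can be illustrated with a short sketch. It is not the authors' analysis code: the function name and the example scores are hypothetical, and it assumes that the «difference» is taken in absolute value and that a difference of exactly 0.5 falls in the «large» category.

```python
def correctly_evaluated(patient_score, gp_score, threshold=0.5):
    """Return 1 if the GP's estimate is close to the patient's score, else 0.

    Assumption: closeness is judged on the absolute difference, and a
    difference equal to the threshold counts as large (coded 0).
    """
    return 1 if abs(patient_score - gp_score) < threshold else 0


# Hypothetical (patient score, GP estimate) pairs on a 5-point scale.
pairs = [(4.8, 4.5), (4.63, 3.98), (3.0, 3.5)]
outcomes = [correctly_evaluated(p, g) for p, g in pairs]
print(outcomes)
```

The resulting 0/1 variable is what a logistic regression would then model; the information lost by dichotomizing is the size of the difference itself, which is the point the reviewer raises.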
> Please describe which variables were included in the multivariate model: all variables listed in the table, or another approach?

We included all variables listed in the table. We added this information in Table 4.

> Results: Please include how many patients were approached. Correlations are missing.

We stated in the Results section that 1637 patients agreed to participate and 45 declined. Therefore 1682 (1637+45) patients were asked to participate. Unfortunately, we do not understand what the reviewer means here by «correlations are missing».

> Discussion: Relevant literature and a theoretical explanation for the findings are lacking. Recommendations for clinical practice are not sufficiently described.

We added more recent references. We have provided several theoretical explanations for our findings, as detailed from line 6 of our Discussion section. We have added the following subheading and text before the conclusions:

«Implications for clinical practice and research: An explicit exploration of patients' expectations about the practice and their satisfaction could guide GPs in their daily clinical work and help them identify areas for improvement more specifically. Future research should provide guidance about how this can best be done.»

> Reviewer Name: Laurent Letrilliart
> Institution and Country: Université Claude Bernard Lyon 1, France
> Please state any competing interests or state 'None declared': None declared

> Please leave your comments for the authors below
> General comments
> This article describes and compares patients' expectations and satisfaction and the corresponding perceptions of their GP. It shows an overestimation of patients' expectations and an underestimation of patients' satisfaction by GPs. Certified GPs have more accurate assessments than others. The findings regarding patients' expectations are original, but the stakes are not so clear and the methods used raise some questions.
> Specific comments
> Abstract: Mean patient age is likely to be 54 years rather than 64, according to the figure reported in the results section.

Sorry for the typo. We replaced 64 by 54 in the abstract.

> How the main outcomes were measured should be mentioned.

We stated in the Methods section that patients completed a questionnaire containing questions about their satisfaction level with, and expectations from, their GPs, and that GPs were also asked to complete a questionnaire containing questions about their perceptions of their patients' views. In other words, as stated in the Methods section, «GPs were asked to imagine how their patients would have responded and to score the items accordingly». We also explained that the same scales were used for patients and their GPs, and that paired t-tests and Wilcoxon signed-rank tests were used to compare patients' views with GPs' perceptions. All data are shown in Tables 2 and 3. We tried to improve the description of the outcome for the logistic regression as follows: Multiple logistic regression models were carried out to identify which doctor factors were independently associated with the ability to correctly evaluate patients' satisfaction, assessed either as overall satisfaction or as the mean of all satisfaction items. These models allow for intra-group correlation, relaxing the usual requirement that the observations be independent.
The dependent variable was a binary variable coded as 1 when the difference between patient and doctor satisfaction was considered small (< 0.5) or 0 when the difference was considered large (≥ 0.5, i.e. GP failed to identify patients' views).

> Introduction: The introduction should be better constructed, by separating the presentation of the two issues addressed in the study (patients' expectations and satisfaction, respectively), before devising their articulation.

We tried to improve the construction of our introduction.

> Patients' «involvement in decision about their care» refers to another issue, and looks off topic.

We agree with the reviewer and removed this from our introduction.

> Methods: What is the «local research ethics committee»?

We corrected this into «the research protocol was approved by the ethics committee for research in ambulatory care in Geneva».

> The data collected were focused on expectations and satisfaction about organisational aspects, and did not take into account other important aspects of care such as patient-physician communication or technical procedures. The authors should justify this choice.

We understand the reviewer's remark. We chose to focus on organizational aspects of care because in Switzerland, as in many other countries, there is currently a debate about the best models of care (and in particular group versus small or solo practices). We thought it would therefore be interesting to compare patients' expectations and GPs' assessment of their expectations on this particular theme. In addition, the questionnaire was already relatively long. Other aspects of care are, of course, also important to consider, but the quality of the data could have been poor if we had included more items in our questionnaire.

> It seems that the patient clustering effect has not been considered in the analyses, in particular by using a multilevel model, which should be argued.

We took the cluster effect into account for the sample size calculation (see Methods section); in addition, we used multiple logistic regression models that allow for intra-group correlation, relaxing the usual requirement that the observations be independent. We added this information in the Methods section. The data presented in Table 4 now take into account the clustering of the data.

> It is hard to understand why the factors associated with an accurate assessment of patients' expectations have not also been evaluated in a multivariate model.

The paper already presents many data and 4 tables. Table 1 is useful, as it presents patients' sociodemographic characteristics. Our most important findings are shown in Table 2 (patients' satisfaction and GPs' perceptions of their patients' satisfaction) and Table 3 (patients' expectations and GPs' perceptions of their patients' expectations).
We think that the data shown in Table 4 (GPs' characteristics associated with the ability of the GPs to correctly evaluate patients' scores in multivariable analysis) are secondary results. We only used overall satisfaction (and the mean of all satisfaction items) to carry out these analyses, because overall satisfaction is a well-known variable often used in previous studies, whereas the mean of all satisfaction items could be considered a surrogate marker for satisfaction items in general (the table would have been too complicated with seven satisfaction variables). Therefore, we decided not to perform multivariable analyses with 15 (!) expectation items in addition.

> Results: In Tables 2 and 3, the duplication of the results in terms of numbers of patients whose evaluation is more, less or equally favourable than the GP's is not really needed.

We think that these data could be useful to the reader. First, they are another, simple way of representing GPs' perception of their patients' views (comparison between «number of patients with evaluations more favourable than GPs think» and «number of patients with evaluations less favourable than GPs think» instead of comparison between «patients' evaluations (mean)» and «GPs' perceptions of patients' evaluations (mean)»). Second, these data help explain to the reader which approach is used to carry out the Wilcoxon signed-rank tests, as the p-values of these tests are shown in the table. We are happy for the Editor to decide whether he/she prefers to keep or to remove these data.

> In addition, the mean difference between patients' and GPs' evaluations should be presented in these tables, in order to describe its direction and size, for all variables but ordinal ones.

We think that the mean difference between patients' evaluations and GPs' perceptions of patients' evaluations, as well as its direction and size, would unnecessarily complicate our table, because it can easily be calculated by the reader, as both patients' evaluations and GPs' perceptions of patients' evaluations are shown. For example, for overall satisfaction level, as the patients' evaluation mean is 4.63 and the GPs' perception of patients' evaluation mean is 3.98, the difference between patients' evaluations and GPs' perceptions of patients' evaluations is +0.65. If we can choose, we would prefer to keep the data concerning the number of patients with evaluations more/less/equally favourable than GPs think, rather than adding the mean difference between patients' and GPs' evaluations.

> In Table 3, the five items exploring accessibility and availability could presumably be analysed as ordinal variables, using a rank test.

Indeed, these five items have been explored using Wilcoxon signed-rank tests (see Table 3 for these data). Paired t-tests were not applicable, as the individual differences were not normally distributed (see Methods section and footnote 1 in Table 3).

> Discussion: Reasons why physicians should assess patients' general expectations/satisfaction rather than expectations/satisfaction relating to the particular consultation should be addressed.

Our objective was to assess whether GPs were able to predict their patients' satisfaction with the care they provide in general, as well as their expectations in general, because for practical reasons the questionnaires were often completed before the consultation. However, we think that the reviewer's remark is interesting and could be the aim of another study.

> The influence of the physician's certification, which is an original finding, should be better emphasized and interpreted. In particular, are there some certification criteria that cover the organisational factors considered in the study? The sample size of GPs should actually have nothing to do with this finding, as it is taken into account by the statistical test.
The finding that uncertified GPs had less ability to correctly evaluate patients' views could be explained in several ways. First, uncertified GPs might have (voluntarily or not) altered their perception, because they thought that their patients were less satisfied, as they were uncertified. Second, certified doctors usually acquired certification after training in university clinics where organizational aspects of care are often discussed. Third, as certification usually means additional training and qualifications, certified doctors could also be more skilled at accurately estimating their patients' satisfaction level and their expectations. We slightly modified the paragraph accordingly. Note that this finding was only briefly discussed, for two reasons. First, as it was based on a small sample size (23 GPs agreed to participate in the study, and only 3 (!) were uncertified), it is really difficult to interpret. Second, certification status is a contextual factor (specific to Switzerland), which is why its influence on the outcome is probably not generalizable to other countries and is probably not an interesting finding for non-Swiss readers.

> Which proportions of eligible patients were finally excluded from the study?

We stated in the Results section that 1637 patients agreed to participate and 45 declined. Therefore 45 / (1637+45) patients, i.e. 2.7%, refused to participate. This information is stated in the Results section: «the resulting participation rate being above 97%».

> To assess a possible selection bias of the participating GPs, they should be compared with the GP population in Geneva for variables such as age, gender and certification status.

650 GPs practise in the canton of Geneva. The participating GPs (i.e. 23/650) seem to be representative of the study population. We therefore added the following sentence in the Results section: «It is worth noting that the sample of 23 GPs who agreed to participate seems to be representative of the study population (n=650), as mean age (50 vs. 53 years) and sex (men: 61% in the two groups) are similar». We are sorry, but certification status is not available for the study population.

> The social desirability bias that may apply to patients' assessments is likely to be unintentional rather than intentional.

We thank the reviewer for this remark. We therefore removed «intentionally» from the Limitations section, changing «therefore the patients could have intentionally overestimated their satisfaction levels» into «therefore the patients could have overestimated their satisfaction levels».

> Conclusion: Patient satisfaction is possibly, but not certainly, related to better outcomes.

We modified our conclusion according to the reviewer's remark, changing «As patient satisfaction and meeting patients expectations is related to better outcomes» into «As patient satisfaction and meeting patients expectations seems to be related to better outcomes».

VERSION 2 - REVIEW

REVIEWER: Nienke Bleijenberg, Julius Center for Health Sciences and Primary Care, University Medical Center Utrecht, the Netherlands
REVIEW RETURNED: 27-Apr-2015
GENERAL COMMENTS: The previous comments are adequately addressed. The objective is clearly stated and adapted correctly.

Methods: On page 7, the authors describe that the questionnaire was pretested. Can the authors elaborate more on what exactly was tested, how, what the feedback was and which items were improved? Data collection is clear. The sample size calculation is somewhat unclear. Why was the prevalence of the patient satisfaction items used to estimate the sample size? There is no reference for why 400 patients would have been sufficient.
Research ethics: Although the study has been approved by the ethics committee for research in ambulatory care in Geneva, patients do not have time to think about whether or not they would like to participate. Asking patients before or after their GP visit may have resulted in a high participation rate. This has been addressed in the discussion. A limitation that should be described is that the expectation questionnaire is not validated (yet).

VERSION 2 - AUTHOR RESPONSE

> Reviewer Name: N. Bleijenberg
> Institution and Country: University Medical Center Utrecht, the Netherlands.
> Please state any competing interests or state 'None declared': none declared.

> Please leave your comments for the authors below
> The previous comments are adequately addressed.
> Objective is clearly stated and adapted correctly.
> Methods:
> On page 7, the authors describe that the questionnaire was pretested. Can the authors elaborate more on what exactly was tested, how, what the feedback was and which items were improved?

We added some information about the pretest of the questionnaire: «The questionnaire was pretested in a GP's practice (PS). The respondents (20 patients & 8 GPs) provided general feedback on the questionnaire and the cover letter, as well as comments on the content and wording of individual questions, as a result of which some questions were modified accordingly.»

> Data collection is clear.
> Sample size calculation is somewhat unclear. Why was the prevalence of the patient satisfaction items used to estimate the sample size? There is no reference for why 400 patients would have been sufficient.

As previous data in similar settings were lacking, we followed the common practice of estimating sample size from the desired precision of a confidence interval around an expected prevalence. We assumed a prevalence of 50% for the patients' expectations items. The number 400 arises directly from computation as follows: a sample size of 384 produces a two-sided 95% confidence interval with a width equal to 0.10 (a precision of 5% for the half confidence interval) when the sample proportion is 0.50, 50% corresponding to an average score of 2.5 on a 5-point Likert scale (ref: Newcombe, R. G. 1998. 'Two-Sided Confidence Intervals for the Single Proportion: Comparison of Seven Methods'. Statistics in Medicine, 17, pp. 857-872). So we changed the text as follows: «From computation a sample size of 400 patients would have been sufficient....» In addition, we added the expected prevalence (=50%) of the patients' expectations items.
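The figure of 384 quoted in this response can be checked with a short sketch. It assumes the simple normal-approximation (Wald) interval for a proportion, n = z² p(1 − p) / d², where d is the half-width of the interval; the response does not say which interval method was used in the actual computation, and the function name below is illustrative.

```python
from statistics import NormalDist


def sample_size_for_proportion(p, half_width, confidence=0.95):
    """Sample size giving a two-sided CI of the requested half-width.

    Assumption: normal-approximation (Wald) interval for a proportion,
    n = z^2 * p * (1 - p) / half_width^2.
    """
    # Two-sided critical value, about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z ** 2 * p * (1 - p) / half_width ** 2


# Expected prevalence of 50% and a half-width of 5 percentage points.
n_required = sample_size_for_proportion(p=0.50, half_width=0.05)
print(f"{n_required:.1f}")
```

Under these assumptions the result lies between 384 and 385, consistent with the 384 stated above, which the authors then round up to 400. A prevalence of 50% is the most conservative choice, because p(1 − p) is largest at p = 0.5.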
> Research ethics: Although the study has been approved by the ethics committee for research in ambulatory care in Geneva, patients do not have time to think about whether or not they would like to participate. Asking patients before or after their GP visit may have resulted in a high participation rate. This has been addressed in the discussion.

We indeed addressed this limitation in our discussion.

> Limitation that should be described is that the expectation questionnaire is not validated (yet).

We agree with the reviewer and included the following sentence in the Limitations section: «Finally, the assessment of patients' expectations used a non-validated questionnaire. However, the selection of items for this questionnaire was based on a literature review and we chose a similar response format to that of the well-validated EUROPEP questionnaire assessing patient satisfaction. In addition, the questionnaire was pre-tested among patients and GPs.»