Author's response to reviews. Title: Attitudes towards assisted dying are influenced by question wording and order: a survey experiment.

Authors: Morten Magelssen (magelssen@gmail.com), Magne Supphellen (Magne.Supphellen@nhh.no), Per Nortvedt (per.nortvedt@medisin.uio.no), Lars Johan Materstvedt (lars.johan.materstvedt@ntnu.no)
Version: 1
Date: 18 Mar 2016

Author's response to reviews:

We are grateful for the many comprehensive and constructive comments made by the reviewers. These have enabled us to improve the manuscript in several ways. We here respond to the comments:

Reviewer #1: -It appears that the aim of the study was "to demonstrate and measure the framing effects" (see abstract, p). However, another aim (to examine the views of the Norwegian populace when it comes to AD, page 8) appears to be added in without adequate considerations (not clearly addressed in the abstract, inadequate relevant background information, nothing about instrument validation and response rate, inadequate description of the results' relation to previous studies). Further, since four different instruments that gave statistically significant differences were used, combining their results is questionable. It is suggested that this aim is removed and reported separately (it is of note that the authors stated (p 19) that the data used in the article are not available for sharing as they will be used for subsequent scientific analyses). Alternatively, the manuscript requires major revision.

We agree that examining the views of the Norwegian population on AD should not be a primary aim of the article, and so we have removed this from the introduction. However, as we think that this topic is relevant to comment upon, we would like to retain one paragraph of the discussion devoted to it. We have now rewritten this paragraph, stating very clearly at the outset that combining answers to the four questionnaire versions involves some important assumptions that should be kept in mind when interpreting results. We have also added information about questionnaire validation and response rate.

R1: -The difference between the two versions that examined the wording effect appears to be much more than just framing (Table 2). In one version there is suffering (great pain), in the other there is only terminal illness. This is a substantive, not a framing, difference. As referenced by the authors, the Netherlands, Belgium, and Luxembourg legalized E and PAS for suffering without the requirement of terminal illness.

We do not entirely agree. Q1-3 portray PAS for terminal illness, E for terminal illness, and AD for chronic disease respectively. The context-focused version's portrayal of PAS/E (Q1-2) involves a description of a terminally ill patient in pain, yet this is precisely the kind of terminally ill patient for whom a majority of the public would want PAS/E to be available; in that sense, it is a paradigmatic example. We therefore find the propositions to be sufficiently equivalent across the two questionnaire versions: they portray the legalization of PAS or E for terminal illness, or AD for chronic disease, respectively.

R1: Further, it is not clear how much of the observed difference is due to wording in the instruments vs wording in the introduction.

This is correct. This was acknowledged in our original version; the sentence has now been moved to a new section in the discussion entitled "Potential weaknesses", and reads: "The study design does not enable separate assessment of the effects of the different questionnaire introductions on the one hand, and of the different question wording on the other."

R1: -The investigators stated that they used MANOVA. However, statistical analysis methodology and reporting are far from clear. Which statistical test was used for MANOVA? What was the p value? Which test was used for comparing the 8 dependent variables individually? Is the p value one or two tailed? Was the p value adjusted for multiple comparisons? For group comparisons (age, education, etc) what was included in the model? Please include a section under Methodology to fully describe all statistical analysis.

We have added the required description of the analyses, both in the Methods section and in table footnotes. Three tests were used for the MANOVA: Wilks' Lambda, Pillai's Trace, and Hotelling's Trace. F-tests were used for specific group comparisons. All tests were two-tailed.

R1: -Also, please add a section for study limitations.

This has now been added.

R1: -It is stated that 3050 responses were received. However, it is not clear how many were invited? What was the response rate? How many were randomized to each of the 4 versions?

The number of invites and the response rate have now been added, and the implications of the low response rate are discussed in the section on limitations. Respondents were randomized equally to the four groups corresponding to the four versions of the questionnaire; this is now stated in the Methods section.

R1: -Page 6, first paragraph is not clear, especially the last two sentences. Please clarify the relation between question order effect, assimilation effect, and contrast effect.

The paragraph has now been rewritten.

R1: -Page 7, lines 8-10, and 19-21, the meaning of the sentences are not clear.

The sentences have been rewritten.

R1: -Page 8, line 18, any reference for NOBAS? Line 22, what was the response rate?

A reference has been added as an endnote. The response rate has now been added.

R1: Line 24, how were the data weighted (same for Table 1).

This is now described in the Methods section: "Specifically, females, younger respondents (<35 years) and respondents from some geographical areas were underrepresented in the sample compared to the known population profile of the country. Data from these groups were given extra weight when estimating mean scores on study variables (Table 1)."

R1: -Page 9, How were the 4 questionnaires validated?

This is now described in the Methods section: "Several steps were taken to validate the questionnaire. Key questions were adopted from previous studies. Several experts on medical ethics and survey methodology gave input on the choice of measures and question wording. Two random individuals and two survey experts, all blind to the purpose of the study, responded to an earlier version of the questionnaire. This test led to the removal of three questions (due to respondent fatigue) and to minor changes in the wording of two questions."

R1: -Page 10, please clarify how responses were scored (Did you assign strongly agree a score of 1 or 5?) -Page 12, last sentence needs editing. -Page 13, lines 18-25 and page 14 lines 1-4, the message/logic is not clear. Sceptical?

The relevant sentences or paragraphs have been rewritten.

R1: -Page 17, lines 17-24 and page 18, lines 1-2, more discussion and better referencing of the previous Norwegian study are needed. What were the results of these studies?

We have explained why we take previous Norwegian studies to be deficient, and we thus believe that it is best not to go into detail about the findings in these studies. Another reason for this choice is that we have now chosen to downplay the discussion of the attitudes of the Norwegian public, as per the reviewer's first suggestion. However, our reference #4 does list the relevant previous studies on the topic, as we state on p. 4.

R1: -Page 18, lines 3-4, the sentence is not clear. The conclusions on lines 4-6 (effect of education), lines 7-9 (support by the young), and lines 1-12 (support by Christians) are not really supported by the data (Table 4). The scores peaked at age group 35-44 and were lower for older and younger groups, the scores were highest for upper secondary school education and lower with less and more education, Christians had mean scores of 3.62, 3.38, and 2.85 for the three first questions, while it appears that 3 means neither agree nor disagree (page 10). The same applies to the results section (page 12 line 23-25 and page 13 lines 1-2).

This section has now been rewritten with a focus on precision.

R1: -Table 3, please clarify in the footnote what data are presented. Mean score? What scale? What is 1? Please provide SDs. Please remove the footnote from the Table title. What do you mean by MANOVA tests? What was the result of the MANOVA? Which statistical test was used? Clarify the tests that generated the reported p values (see above).

We have added the required description of the analyses, both in the text and in Table footnotes. SDs are now included in the Tables. Three tests were used for the MANOVA: Wilks' Lambda, Pillai's Trace, and Hotelling's Trace. F-tests were used for specific group comparisons. All tests were two-tailed.

R1: [Table 3] What do you mean by "experienced pressure"? "palliative care"? What is NTD?

The first phrase has been changed to "pressure on weak groups" (as per Q6). We believe "palliative care" and "NTDs" to be adequate short-form reminders of the substance of Q7 and Q8 respectively; NTD is defined both in the text and in the final list of abbreviations.

R1: -Table 4, please clarify in the footnote what data are presented. Mean score? What scale? What is 1? Please provide SDs. Please remove the footnote from the Table title. What do you mean by MANOVA/ what was included in the model? What was the result of the MANOVA? Please clarify the tests that generated the reported p values (see above).

Again, we have added more description of the analyses, both in the text and in Table footnotes. SDs are now included. We have removed the footnote from the Table title, as requested.

R1: Tables 5 and 6, please remove the footnote from the Table title. What do you mean by "experienced pressure"? "palliative care"? What is NTD? Each of the Tables apparently gives the combined results for the two sequences of the questions and thus may be misleading. This needs at least to be indicated in the footnote. These Tables add little to the manuscript and can be removed.

We agree, and have now removed these tables.

Reviewer #2: This is a clear and well written paper which adds evidence for, and gives a level of significance to, the framing effect of questions in relation to surveys of public opinion on euthanasia and assisted suicide. The conclusions are in line with other work in this area but nevertheless represent a contribution to the discourse. The design is elegant in independently measuring differences of wording and of question order. The paper could be improved by having some reference to the relevance of qualitative research as a way of exploring the complexity of opinions on euthanasia, see for example M Berghs, B Dierckx de Casterlé, C Gastmans 'The complexity of nurses' attitudes toward euthanasia: a review of the literature' J Med Ethics 2005;31:441-446, or A 'Dying cancer patients talk about euthanasia' Social Science & Medicine 67 (2008) 647-656. Such research provides rich data to complement the results of quantitative surveys, notwithstanding that quantitative surveys can be more or less adequate. This acknowledgement of the scope of different methods is important to note given that the authors are presenting their design as 'the benchmark for future national surveys' (page 18 line 1 to 2).

We agree, and the relevance of qualitative accounts is now acknowledged in the Discussion section: "In a similar vein, quantitative research on opinions on AD can and should be complemented by qualitative research, which is able to enrich the account of such attitudes with depth and complexity."

R2: The paper could also usefully have referred to surveys of opinion among specific groups, especially healthcare professionals, which also play a role in the debate, see for example C Seale, 'Legalisation of euthanasia or physician-assisted suicide: survey of doctors' attitudes.' Palliative Medicine 23.3 (2009): 205-212 or R McCormack, M Clifford, and M Conroy. 'Attitudes of UK doctors towards euthanasia and physician-assisted suicide: a systematic literature review.' Palliative Medicine 26.1 (2012): 23-33. Such surveys may suffer from flaws similar to those that vitiate public opinion polls (simple yes/no questions and insufficient attention to framing effects). There may well be framing effects in the surveys of healthcare professionals but the size of these effects cannot be inferred from a survey of public opinion, as experience of working with dying people may mitigate the framing effects (but may not - this is an empirical question).

We are grateful for the suggestion but have decided not to add further complexity and length to the Introduction section, which is already long. We do not deem it essential to discuss examples of surveys of healthcare professionals, seeing as the paper reports the results of a survey of the general population.

R2: A very pedantic point - the superlative ("best", "most" etc.) should only be used when comparing three or more items, hence page 8 line 13 should be "Second, and more important".

The paragraph has now been rewritten.