Copyright by Sindhu Rose Johnson, 2013


Making More Valid Estimates of Treatment Effect Using Observational Data in Uncommon Disease. The Warfarin in Scleroderma-associated Pulmonary Arterial Hypertension Model

by

Sindhu Rose Johnson

A thesis submitted in conformity with the requirements for the degree of Doctor of Philosophy in Clinical Epidemiology
Institute of Health Policy, Management and Evaluation
University of Toronto

Copyright by Sindhu Rose Johnson, 2013

Making More Valid Estimates of Treatment Effect Using Observational Data in Uncommon Disease. The Warfarin in Scleroderma-associated Pulmonary Arterial Hypertension Model

Abstract

Sindhu Rose Johnson
Doctor of Philosophy in Clinical Epidemiology
Institute of Health Policy, Management and Evaluation
University of Toronto
2013

Objectives. The aim of this dissertation is to develop and apply methods to make more valid estimates of treatment effect using observational data in uncommon disease. The objectives were: 1) to develop a revised belief elicitation method for Bayesian priors, using strategies that reduce the threat of potential biases on the elicited response, and to evaluate its feasibility, validity and reliability; 2) to determine expert clinicians' beliefs regarding the effect of warfarin on survival in systemic sclerosis-associated pulmonary arterial hypertension (SSc-PAH) and idiopathic pulmonary arterial hypertension (IPAH), expressed as prior probability distributions, and to determine the importance of factors that influence experts' use of warfarin for SSc-PAH and IPAH; and 3) a) to use data from experts to derive a propensity score in a Bayesian context, and b) to combine observed survival data from 2 large inception cohorts with the prior probability distributions derived from experts, to evaluate the effect of warfarin on survival in SSc-PAH and IPAH.

Methods. The methods include a systematic review of the literature, application of principles of psychometric science, cross-sectional elicitation of experts' beliefs, and analysis of retrospective observational cohort data with a survival outcome using propensity score modeling and Bayesian inference.

Results. I have developed a scientific method to quantifiably elicit experts' beliefs in the form of Bayesian priors that can be included in models of treatment effect. This method incorporates strategies that reduce the effect of potential biases, and has demonstrable feasibility, validity and reliability. I have developed methods in a Bayesian framework to make propensity score-adjusted estimates of treatment effect using observational data with a survival end point. I have determined experts' beliefs regarding the effect of warfarin on survival in SSc-PAH and IPAH, and have determined factors that influence experts' use of warfarin in SSc-PAH and IPAH. Using these methods, I have found that warfarin has a low probability of improving survival in SSc-PAH and IPAH.

Conclusion. I have demonstrated that more valid estimates of treatment effect can be successfully made using observational data in an uncommon disease. Given the low probability that warfarin improves survival in SSc-PAH and IPAH, and the availability of other PAH therapies with demonstrable benefits, there is little role for warfarin in these patients.

Acknowledgments

First, I would like to acknowledge my parents, Sunny and Thresiamma Johnson, and my sister, Sonia Johnson, for their love, support and encouragement. Through their example, I have learned the value of hard work, perseverance, and life-long learning.

I am thankful to my thesis supervisor, Dr. Brian Feldman, and my thesis committee, Dr. Gillian Hawker, Dr. John Granton and Dr. George Tomlinson. Each of you has spent countless hours teaching, supporting and guiding me. I am thankful that my progress was never rushed, that I was given so much room to grow, and for all the extra opportunities you each gave me. I am thankful to the Division of Rheumatology, Drs. Claire Bombardier and Jorge Sanchez-Guerrero, for their patience and their support of my dissertation work and career. Thank you to Drs. Dafna Gladman and Simon Carette, who guided me to embark on this journey and who supported my progress every step of the way. Thank you to my clin epi buddies and research friends: Samantha Stephens, Audrey Abad, Lusine Abrahamyan, Dan Ignas, Haddas Grosbein, Shahin Jamal, Kelly O'Brien, Esther Waugh and Sergio Rueda. It is so great to have friends with whom to share my research ideas and dream together.

Thank you to my beautiful daughters, Isabella and Helena Wijeysundera, who have brought such joy to my life. You two have sacrificed more than anyone so that I may complete this dissertation. Finally, thank you to my wonderful husband, Duminda Wijeysundera. You have been my pace-setter through this marathon. I am truly blessed to be able to share my research visions, troubleshoot challenges, and have insightful pep talks with the same person with whom I am building a life and family. This dissertation is dedicated to you.

Table of Contents

Acknowledgments
Table of Contents
List of Tables
List of Figures
List of Appendices
CHAPTER 1 Uncommon Diseases, Methodological Challenges of Observational Data and Possible Solutions
    Uncommon Diseases and Methodological Challenges of Observational Data
    Potential Solution #1. Bayesian Methods
        Science and Statistical Inference
        Bayesian versus Frequentist Inference
        Bayesian Inference in Rheumatology
    Potential Solution #2. Propensity Score Methods
    New Territory
CHAPTER 2 Systemic Sclerosis-associated and Idiopathic Pulmonary Arterial Hypertension As Good Models to Test the Methods
    Summary of Observational Studies Evaluating Warfarin in IPAH
    A Good Model to Develop and Apply Methods
    Dissertation Aims, Objectives and Hypotheses
    Objectives
CHAPTER 3 Methods to Elicit Beliefs for Bayesian Priors. A Systematic Review
    Background
    Methods
    Results
    Discussion
    Conclusion
CHAPTER 4 A Valid and Reliable Belief Elicitation Method for Bayesian Priors
    Background
    Methods
    Results
    Discussion
    Conclusion
CHAPTER 5 Effect of Warfarin on Survival in SSc-PAH and IPAH. Belief Elicitation for Bayesian Priors
    Background
    Materials and Methods
    Results
    Discussion
CHAPTER 6 Warfarin in Scleroderma-associated and Idiopathic Pulmonary Arterial Hypertension. A Bayesian Approach to Evaluating Treatment in Uncommon Disease
    Introduction
    Materials and Methods
    Results
    Discussion
CHAPTER 7 Synthesis
References
Appendices

List of Tables

Table 1.1 Comparison of propensity score methods
Table 3.1 Summary of study characteristics
Table 3.2 Summary of elicitation methods
Table 3.3 Summary of studies that considered validity, reliability, responsiveness and feasibility
Table 3.4 Biases in belief elicitation and methodologic strategies to reduce their effect
Table 4.1 Study participants
Table 4.2 Reliability, validity and sensibility of the elicitation procedure
Table 4.3 Feasibility of the elicitation procedure
Table 5.1 Characteristics of study participants
Table 5.2 Factors that influence experts' use of warfarin
Table 6.1 SSc-PAH patient characteristics
Table 6.2 IPAH patient characteristics
Table 6.3 Cox proportional hazards model survival analysis

List of Figures

Figure 3.1 Flow diagram of systematic review results
Figure 3.2 Example of a bins and chips belief elicitation method
Figure 3.3 Biases affecting the validity of belief elicitation
Figure 4.1 Belief Elicitation Procedure Example
Figure 4.2 Construct validity: relationship between median effect on survival and overall effect on survival
Figure 4.3 Group probability distribution for 3-year survival in SSc-PAH and IPAH patients treated with warfarin
Figure 5.1 Flow diagram of participant recruitment
Figure 5.2 Group prior probability distributions for probability of 3-year survival in SSc-PAH and IPAH treated with warfarin
Figure 5.3 Prior probability distributions for the effect of warfarin on the absolute risk difference for 3-year mortality in SSc-PAH patients from experts
Figure 5.4 Probability distributions for the effect of warfarin on the absolute risk difference for 3-year mortality in IPAH patients from experts and the published literature
Figure 6.1 Bayesian triplot for effect of warfarin on survival in SSc-PAH patients
Figure 6.2 Bayesian triplot for effect of warfarin on survival in IPAH patients
Figure 6.3 Density plot for difference in median survival times in SSc-PAH patients untreated and treated with warfarin using an informative group prior
Figure 6.4 Density plot for difference in median survival times in IPAH patients untreated and treated with warfarin using an informative group prior

List of Appendices

Appendix 1. Standardized script for vitamin C example
Appendix 2. Standardized script for belief elicitation of effect of warfarin in SSc-PAH and IPAH
Appendix 3. Elicited responses

Thesis Overview

Chapter 1. Uncommon diseases, methodological challenges of observational data and possible solutions. I discuss the challenges of evaluating therapy in uncommon diseases and of using observational data. In particular, I discuss issues related to small sample sizes and confounding by indication. I propose that the innovative application of Bayesian statistical inference and propensity score models may offer solutions, and I review the use of these methods.

Chapter 2. Systemic sclerosis-associated and idiopathic pulmonary arterial hypertension as good models to test the methods. I discuss SSc-PAH and IPAH as ideal clinical models to test our methods. I summarize the epidemiology of SSc-PAH and IPAH, summarize the evidence evaluating the use of warfarin to improve survival in SSc-PAH and IPAH, and highlight areas of methodological concern. I outline my research plan, aims, objectives and hypotheses.

Chapter 3. Methods to elicit beliefs for Bayesian priors. Using Bayesian inference, expert knowledge can be used in models estimating treatment effects when it is expressed as a prior probability distribution. In this chapter, we report a systematic review of the literature to identify methods for eliciting beliefs from experts. We review factors that threaten the validity and reliability of the elicited response, and strategies that can reduce the effect of these potential biases. We present a conceptual framework outlining the process by which experts formulate beliefs about treatment effects and the process by which investigators may elicit those beliefs.

Chapter 4. A valid and reliable belief elicitation method for Bayesian priors. Using strategies to reduce the effect of potential biases, we developed a revised belief elicitation method for Bayesian priors. We test the psychometric properties (feasibility, validity and reliability) of the revised method.

Chapter 5. Belief elicitation study. Using the valid and reliable belief elicitation method developed in Chapter 4, we evaluate experts' beliefs regarding the effect of warfarin on survival in SSc-PAH and IPAH, and evaluate the importance of factors that influence experts' use of warfarin in SSc-PAH and IPAH. We present prior probability distributions reflecting experts' beliefs about the effect of warfarin that can be used in models evaluating treatment effects. We present evidence of a divergence of opinion in the community regarding the effect of warfarin on survival in SSc-PAH and IPAH. We present the importance and ranking of factors that influence experts' use of warfarin. This information can be used to construct models that adjust for confounding when estimating treatment effects on survival.

Chapter 6. Retrospective cohort study. We combine the prior probability distributions elicited from experts with observed survival data to derive an estimate of the treatment effect of warfarin on survival in SSc-PAH and IPAH. We use data from the belief elicitation study to develop a Bayesian propensity score model. Thus, we apply a bias-correcting analytic method to observational data with a survival endpoint, all in a fully Bayesian context.

Chapter 7. Synthesis. In this chapter, I summarize and synthesize our findings. I discuss methodological strengths and limitations, the importance of this work, and its implications for clinical practice and future research.

CHAPTER 1 Uncommon Diseases, Methodological Challenges of Observational Data and Possible Solutions

1.1 Uncommon Diseases and Methodological Challenges of Observational Data

The merits of observational data for the study of uncommon diseases have long been recognized.(1) The use of narrow inclusion criteria to select patients for clinical trials can result in a more precise estimation of a treatment effect in a defined group of patients. Observational studies, though, can evaluate the effect of a treatment in a wider population that might respond differently to the treatment.(2) Thus, observational data can provide a better representation of the spectrum of real-world practice than conventional randomized trials.(3) Observational data also allow a longer duration of follow-up, which can yield important understanding of long-term treatment effects as well as long-term adverse effects. With increasing recognition of the limitations of clinical trials, there is great value in observational studies, particularly observational cohorts and registries.(1)

The ability to make precise estimates of treatment effects using observational data in uncommon disease has historically faced a number of challenges. The first challenge relates to patient numbers. In the setting of uncommon diseases, only small numbers of patients are available for study recruitment. The number of accrued patients (sample size) influences the amount of sampling error in a study's estimates.
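The link between sample size, sampling error and power (the probability that a test detects a true effect) can be made concrete with a short calculation. The sketch below is a minimal illustration, assuming a two-arm comparison of a continuous outcome with a known common standard deviation, a two-sided test at the 5% level, and the standard normal approximation; the effect of half a standard deviation and the sample sizes are hypothetical values chosen for illustration, not data from this dissertation, and `approx_power` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution, mean 0 and SD 1


def approx_power(effect, sd, n_per_arm, alpha=0.05):
    """Normal-approximation power of a two-sided, two-sample z-test.

    effect: true difference in means between arms (hypothetical)
    sd: common standard deviation of the outcome (assumed known)
    n_per_arm: number of patients in each arm
    """
    se = sd * sqrt(2.0 / n_per_arm)  # standard error of the difference in means
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    shift = effect / se
    # Probability that the test statistic falls in either rejection region.
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


# Hypothetical effect of half a standard deviation at several sample sizes.
for n in (10, 20, 64, 100):
    se = sqrt(2.0 / n)
    print(f"n = {n:3d} per arm: SE = {se:.3f}, power = {approx_power(0.5, 1.0, n):.2f}")
```

Under these assumptions, power is only about 0.20 with 10 patients per arm and reaches about 0.81 with 64 per arm, so a study limited to the smaller number would usually miss an effect of this size.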
A low sample size decreases the probability of concluding that a treatment is effective when there truly is a treatment effect; this probability is referred to as the power of a statistical test.(4) Because they often recruit relatively small numbers of patients, studies of uncommon diseases often have inadequate power to detect important treatment effects.(5, 6)

Another challenge is the issue of confounding.(7) Confounding of a treatment effect occurs when the estimated effect of a treatment on an outcome is distorted by the presence of another factor.(8) This factor (i.e., the confounder) is a) causally related to the outcome independently of the exposure, and b) associated with the exposure but not a consequence of the exposure. A confounder can have a positive influence, increasing the measured treatment effect above what it would otherwise be, or a negative influence, falsely lowering the measured treatment effect.(8)

Random allocation of treatment exposure (randomization) on average removes the potential effect of confounding.(9) It is important to note, however, that randomization does not always lead to balance in baseline characteristics across exposure groups.(10) Indeed, imbalance in baseline characteristics can occur by chance (random error), and some factors causally related to the outcome may be imbalanced after randomization. However, this residual imbalance is random rather than systematic. Because the imbalance has no systematic direction, we can use statistical inference to determine how likely our data are under the null hypothesis, that is, the chance that imbalance and random variation in outcomes alone have produced the results that we see. Similarly, any imbalance in unmeasured influences on the outcome is random rather than systematic. Increasing the trial size reduces the probability of imbalance occurring. In contrast, confounding in observational studies is not minimized as the study size increases.(8) In either observational studies or clinical trials, study design and analytic strategies are needed to address the potential effects of confounding.

Confounding by indication (also known as treatment selection bias or susceptibility bias) is a special and important form of confounding that threatens the use of observational data to make unbiased estimates of treatment effect.(8, 11) In a randomized trial, the act of randomization ensures that treatment assignment is random. In an observational study, treatment assignment is not random and may be influenced by a variety of factors.
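The contrast drawn above between random and non-random treatment assignment can be illustrated with a small simulation. This is a sketch under stated assumptions: a single standardized severity score stands in for the confounder, and the hypothetical observational prescribing rule makes sicker patients more likely to be treated through an arbitrary logistic function. Under randomization, the gap in mean severity between treated and untreated patients is chance imbalance that shrinks as the study grows; under the severity-driven rule, the gap is systematic and does not shrink. The function name `severity_imbalance` is hypothetical.

```python
import math
import random
from statistics import mean


def severity_imbalance(n, randomized, rng):
    """Mean severity in treated minus untreated patients for one simulated study."""
    treated, untreated = [], []
    for _ in range(n):
        severity = rng.gauss(0.0, 1.0)  # hypothetical standardized disease severity
        if randomized:
            p_treat = 0.5  # coin-flip allocation ignores severity
        else:
            # Hypothetical prescribing pattern: sicker patients are more likely treated.
            p_treat = 1.0 / (1.0 + math.exp(-2.0 * severity))
        (treated if rng.random() < p_treat else untreated).append(severity)
    return mean(treated) - mean(untreated)


rng = random.Random(2013)  # fixed seed so the run is reproducible
for n in (50, 500, 5000):
    rand_gap = severity_imbalance(n, True, rng)
    obs_gap = severity_imbalance(n, False, rng)
    print(f"n = {n:5d}  randomized gap = {rand_gap:+.3f}  observational gap = {obs_gap:+.3f}")
```

The randomized gap should hover near zero and narrow as n grows, while the observational gap stays at roughly 1.2 standard deviations whatever the sample size, which is the sense in which a larger observational study does not remove confounding.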
Confounding by indication occurs when the study groups are non-comparable as a result of the way they were constructed.(8) Exposed and unexposed patients may differ systematically in important characteristics, including disease severity, co-morbidity, prognosis, local practice patterns, health care access and patient preferences.(10, 12) Small differences between treatment groups in many covariates can accumulate into substantial overall differences,(3) and these differences may have a greater effect on the outcome than the intervention itself. This bias may distort the measured treatment effect as a consequence of the way in which the study groups were constructed.(8) Properly conducted randomized trials are not affected by confounding by indication. Confounding by indication needs to be considered whenever an analysis is concerned with the effect of a treatment given in the course of clinical care.

Alvan Feinstein, a founder of clinical epidemiology, recognized these issues.(11) He challenged investigators to develop methods that would allow unbiased estimation of treatment effects from observational data, approximating those that would be observed in a similarly sized clinical trial.(13) Indeed, the integrity of observational studies is said to depend on the ability of investigators to actively develop methods that minimize bias.(14) Two potential methodological solutions to the challenges of small sample sizes and confounding by indication are the use of Bayesian statistical inference and propensity score models.

1.2 Potential Solution #1. Bayesian Methods

Science and Statistical Inference

The philosophy of science, according to Stephen Hawking and Leonard Mlodinow, is based on the assumption that we form models of the world from sensory input. We use the models that are most successful at explaining events, and assume that the models match reality.(15) The discipline of statistics, in part, describes the way people learn as they make observations.(9) Investigators try to understand the world by making mathematical models.
Each model represents our understanding of the process or phenomenon that we are studying.(4) Statistical inferences are based on mathematical models.(4) In the long run, we retain models based on their validity, reliability, predictability and perceived match to reality.(15) Statistics facilitates description of the average person, ascertains how well the description fits other people, and allows us to generalize from our sample to a larger group of people or to the population.(16) Furthermore, statistics is a science of making inferences about unknown quantities, which can include important outcomes such as measures of effectiveness, adverse events and diagnostic test results.(17)

Schools of statistical inference differ in their approach to truth and uncertainty. Models are used to describe a relationship as it occurs in the population, i.e., the truth. The true quantities in the model are referred to as parameters, inherent properties of nature. Since the complete population is usually not fully observable, a parameter is not known with certainty; observations are most often restricted to a sample from the population.(4) Statistical inferences are based on observations and involve a description of uncertainty. The various schools of statistical inference are characterized by philosophical differences in how uncertainty is conceptualized and handled.

The frequentist statistical method (also referred to as classical statistics) is one method of making inferences from observations. Frequentist inference is the culmination of methods proposed by Ronald A. Fisher, Egon Pearson and Jerzy Neyman. In this paradigm, hypothesis testing is based on how frequently a result (data) as extreme as, or more extreme than, the one observed would be obtained if the experiment were repeated many times under certain fixed conditions.(18) Unknown parameter values are treated as fixed constants that do not have a probability distribution. Observations are treated as only one possible instance of the data that could have come from a given probability distribution.(19)

The Bayesian statistical method is an alternative inferential paradigm. In Bayesian inference, uncertainty about the values of parameters is represented by probability distributions, whereas data, once observed, are treated as fixed.(19) Probability is used to measure uncertainty; anything that is unknown has a probability distribution.
Everything that is known is taken as given, and probabilities are calculated conditionally on known values.(4) These fundamental differences in the conceptualization of truth and uncertainty affect not only how investigators think about the universe (including health care research), but also how we learn about our universe from our observations over time.

Bayesian versus Frequentist Inference

Bayesian statistical inference began with the work of the Reverend Thomas Bayes.(20, 21) The Bayesian statistical paradigm uses probability as the measure of one's uncertainty about an unknown quantity. In its simplest terms, one may begin with a probability of the truth of a hypothesis, for example that a parameter is equal to a specific value. Using Bayes' theorem, observations can be used to update the probability that this hypothesis is true. Pre-existing data or knowledge about the hypothesis is quantifiably expressed as a prior probability distribution, or prior. New observations are made, and their information content is expressed through the likelihood (i.e., the probability of the data under a given hypothesis). By incorporating the new observations, the probability of the truth of the original hypothesis is recalculated. Bayes' theorem (also known as Bayes' rule, or the rule of inverse probabilities)(4) indicates how probabilities change in light of new data:

$$P(H \mid D) = \frac{P(D \mid H)\,P(H)}{P(D)}$$

Formula 1. Bayes' Theorem. The probability of the hypothesis given the data is equal to the probability of the data given that hypothesis, multiplied by the probability of the hypothesis before the data were obtained, divided by the averaged probability of the data.(18)

$P(H \mid D)$ = probability of the hypothesis given the data, also called the posterior probability distribution or posterior
$P(D \mid H)$ = probability of the data given the hypothesis, also called the likelihood function for the data
$P(H)$ = probability of the hypothesis, also called the prior probability distribution or prior
$P(D)$ = probability of the data over all $k$ possible competing hypotheses $H_1, \ldots, H_k$:

$$P(D) = P(D \mid H_1)\,P(H_1) + P(D \mid H_2)\,P(H_2) + \cdots + P(D \mid H_k)\,P(H_k) = \sum_{i=1}^{k} P(D \mid H_i)\,P(H_i)$$

The application of the Bayesian paradigm confers potential advantages. First, the use of priors allows external knowledge, beliefs and data to be incorporated into models estimating treatment effects. Often, some form of preliminary knowledge precedes the conduct of a study. This knowledge may take the form of published clinical observations, such as case reports, case series, observational studies or randomized trials. It may also take the form of expert opinion. In the absence of published data, clinicians frequently look to experts to inform therapeutic decision-making. An expert's knowledge is usually the result of years of training and of observation in treating patients. In clinical reality, when new data are published, they are considered in the context of pre-existing knowledge. Instead of considering pre-existing knowledge only in the discussion section of a manuscript, Bayesian methods can incorporate all sources of information in the estimation of treatment effect.(22) Thus, the Bayesian process of making inferences mirrors clinical practice. In contrast, frequentist inference requires investigators to blind themselves to existing information because of concerns that it might bias their conclusions.(18) Indeed, it has been argued that, when interpreting new data, frequentist statistical inference ignores the past.(18) Pre-existing information is considered only after the conclusions of a given study are presented.

Bayesian inference also allows direct probability statements to be made (e.g., there is a 95% probability that ibuprofen use will reduce headache pain), and these statements can be revised as more data are gathered. This contrasts with the frequentist approach. Before a study begins, an alternative hypothesis that a treatment effect of some magnitude exists and a null hypothesis of no effect are specified. A permissible false-positive rate (level of significance) is typically set, somewhat arbitrarily, at 5%. A therapy is said to be beneficial if the p-value is < 0.05 or if the 95% confidence interval for the treatment effect does not include the null value. If the p-value is greater than 0.05, the investigators must conclude that there is insufficient evidence to reject the null hypothesis of no treatment effect (even if the data suggest a beneficial treatment effect).(5, 6)
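Formula 1 (Bayes' theorem) can be applied directly when the competing hypotheses form a finite set, and the resulting posterior supports the kind of direct probability statement described above. The sketch below assumes, purely for illustration, that the true 3-year survival probability of a group of patients is one of nine candidate values (the hypotheses H_1 to H_9), places an equal prior probability on each, and uses hypothetical data of 14 survivors among 20 patients with a binomial likelihood. The helper name `bayes_update` is hypothetical, and the flat prior is an assumption standing in for an elicited one.

```python
from math import comb


def bayes_update(prior, likelihood):
    """Formula 1 over k competing hypotheses: posterior_i = P(D|H_i) P(H_i) / P(D)."""
    joint = [lik * pr for lik, pr in zip(likelihood, prior)]
    evidence = sum(joint)  # P(D) = sum over i of P(D | H_i) P(H_i)
    return [j / evidence for j in joint]


# Hypothetical hypotheses H_1..H_9: the true 3-year survival probability is one of these.
thetas = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
# Assumed flat prior; an elicited prior would simply replace this list.
prior = [1.0 / len(thetas)] * len(thetas)

# Hypothetical data: 14 of 20 patients alive at 3 years (binomial likelihood).
survivors, n = 14, 20
likelihood = [comb(n, survivors) * t**survivors * (1 - t) ** (n - survivors) for t in thetas]

posterior = bayes_update(prior, likelihood)
for t, p in zip(thetas, posterior):
    print(f"P(theta = {t:.1f} | data) = {p:.3f}")

# A direct probability statement about the unknown quantity.
p_above_half = sum(p for t, p in zip(thetas, posterior) if t > 0.5)
print(f"P(theta > 0.5 | data) = {p_above_half:.2f}")
```

With these assumptions, about 0.91 of the posterior probability lies on survival values above 0.5. Changing the prior changes only the `prior` list; the update itself is the same weighting and renormalization by P(D).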
Use of the frequentist paradigm in the setting of uncommon diseases can result in studies that have low power against important effects and whose conclusions are meaningless for practicing clinicians. A serious and pragmatic limitation of the frequentist method is that clinicians and investigators, even those with some statistical experience, misinterpret the p-value and the 95% confidence interval.(18) A p-value of less than 0.05 is frequently interpreted as meaning that there is a treatment effect, whereas a p-value greater than 0.05 is frequently interpreted as evidence of no treatment effect. In reality, the p-value is the probability of observing data as extreme as, or more extreme than, the observed data, assuming the null hypothesis is true (i.e., the treatment is ineffective), if the study were repeated a countless number of times.(6) The 95% confidence interval is also frequently misunderstood.(6) The 95% confidence interval indicates that if the same study were repeated an infinite number of times, 95% of the confidence intervals formed would include the true treatment effect.(23) The confidence interval does not report what clinicians are interested in: a fixed range of values that has a 95% probability of including the true treatment effect.(24) In fact, the probabilities in frequentist inference do not refer to uncertainty in the treatment effect, but to uncertainty in the behavior of intervals based on hypothetical samples of data from the same population.

Clinicians regularly think in a Bayesian framework when considering the utility of a diagnostic test. Using information from the patient history and physical examination, clinicians construct a pre-test probability of disease (equivalent to a prior). Information gained from a diagnostic test (equivalent to a likelihood) is then used to construct the post-test probability of disease (equivalent to the posterior). The use of Bayesian inference to interpret new data in the context of pre-existing knowledge is an extension of this way of thinking. The widespread use of Bayesian inference in daily experience can be seen in weather forecasting (e.g., a 40% probability of precipitation) and in information technology (e.g., an 80% probability that an email is spam and should be removed).(25)

Bayesian Inference in Rheumatology

Bayesian methods have only infrequently been applied in rheumatology research.(25)

Amitriptyline use in juvenile inflammatory arthritis.
Bayesian meta-analysis was used to synthesize multiple N-of-1 trials evaluating amitriptyline for pain reduction in children with juvenile inflammatory arthritis.(26) The investigators demonstrated a mean treatment effect for pain of 0.67 (standard deviation (SD) 0.89, 95% credible interval (CrI) -0.99, 2.55). The probability that the treatment effect was less than 0 was 16%. Application of this method allowed data from a small number of patients (n = 6) to be used to estimate the probability of treatment benefit and the population effect. This approach is of considerable value before embarking on a large, potentially costly, multicenter trial. In this study, the researchers reported a small probability of a beneficial treatment effect, thereby preventing the initiation of a trial that would likely have been futile.(26)

Certolizumab in rheumatoid arthritis.

Bayesian methods of meta-analysis were applied to evaluate the non-inferiority of certolizumab compared with other biologic agents (infliximab, etanercept, adalimumab, golimumab, anakinra, tocilizumab) in the treatment of rheumatoid arthritis.(27) The investigators applied Bayesian indirect comparison analyses to treatments that had each been compared against placebo in other studies, and made inferences about specific treatment contrasts.(28) Bayesian indirect comparisons allow estimation of treatment effects that have never been directly observed.(28) The primary outcome was the American College of Rheumatology (ACR) 20 response, defined as a 20% improvement in tender and swollen joint counts and a 20% improvement in at least 3 of the following 5 ACR core set measures: pain, patient and physician global assessments, self-assessed physical disability, and acute phase reactant.(29) The analysis demonstrated that the ACR20 response for certolizumab (odds ratio (OR) 11.82; 95% CrI 5.98, 21.71) was superior to that of infliximab (OR 3.31; 95% CrI 2.05, 5.03), adalimumab (OR 3.72; 95% CrI 2.35, 5.93), and anakinra (OR 2.40; 95% CrI 0.96, 5.03), and equivalent or superior to that of etanercept (OR 8.07; 95% CrI 3.34, 16.75), golimumab (OR 3.62; 95% CrI 1.62, 6.97), and tocilizumab (OR 4.13; 95% CrI 2.64, 6.19). This analysis provided evidence regarding the relative efficacy of different treatments in a setting where no head-to-head trial evidence exists.

Abatacept in rheumatoid arthritis.
Bayesian methods of meta-analysis were also applied to evaluate the non-inferiority of abatacept compared with other biologic agents (infliximab, etanercept, adalimumab, golimumab, certolizumab) in the treatment of rheumatoid arthritis patients who had not responded to methotrexate. The primary outcome was the change in the Health Assessment Questionnaire (HAQ) Disability Index at 6 months after initiation of therapy. Abatacept was superior to placebo (mean difference in HAQ, 95% CrI -0.42, -0.16) and comparable to all the other biologic agents. This analytic approach demonstrated that abatacept is an effective alternative option in the treatment of methotrexate non-responsive patients, in the absence of a placebo-controlled trial.(30)

Prevalence of systemic autoimmune rheumatic diseases.

Bayesian methods have also been applied to estimate the prevalence of inflammatory myositis (polymyositis and dermatomyositis) using administrative data.(31) Ascertainment of cases depended on accurate International Classification of Diseases, Ninth Revision coding and on the choice among different diagnostic algorithms. The administrative data sources were potentially susceptible to measurement error resulting in misclassification. The investigators used a Bayesian latent class regression model to account for uncertainty in prevalence estimates due to patient demographics and to the sensitivity and specificity of the diagnostic algorithms. The investigators demonstrated the prevalence of inflammatory myositis to be 21.5/100,000 (95% CrI 19.4, 23.9). The prevalence was highest in older, urban women (70/100,000, 95% CrI 61.3, 79.3) and lowest in young, rural men (2.7/100,000, 95% CrI 1.6, 4.1). The sensitivity of case ascertainment was lower for older than for younger individuals. Hospitalization data were more sensitive in ascertaining cases in rural regions; in contrast, rheumatology billing data were more useful in urban areas. These applied Bayesian methods allowed the researchers to make population-level estimates of inflammatory myositis prevalence despite imperfect data and variable case ascertainment algorithms.

Building on this work, Bayesian latent class regression models have been used to compare the prevalence of systemic autoimmune rheumatic diseases (SARDs) (systemic lupus erythematosus, systemic sclerosis (SSc), Sjogren's syndrome, polymyositis, dermatomyositis) across three Canadian provinces, accounting for regional and demographic variations.(32) The investigators demonstrated the prevalence of SARDs to be 2 to 3 cases per 1,000.
The prevalence in older females was approximately 1 in 100 (possibly related to the presence of Sjögren's syndrome in this subset of patients), and prevalence was greater in urban regions. Again, Bayesian methods allowed the researchers to make population-level estimates of SARDs prevalence despite imperfect data and variable case ascertainment algorithms. This research provided important information about regional and demographic
variations and suggests that surveillance of SARDs using administrative data is feasible.

Methotrexate use in systemic sclerosis. In the example of methotrexate in systemic sclerosis (SSc), Bayesian inference was used to draw conclusions about treatment effect in an uncommon disease in which the trial was small and had insufficient power to detect a treatment effect using frequentist inference.(24) Data from the trial indicated that treatment with methotrexate improved skin score.(5) However, because of the small sample size and limited power, the researchers concluded that there was insufficient evidence to reject the null hypothesis of no treatment effect. This supported the (potentially false) belief that methotrexate is ineffective in SSc. A survey of general rheumatologists and scleroderma experts, conducted after publication of the methotrexate trial, found that only 22% frequently used methotrexate for SSc skin involvement.(33) Re-analysis of the same trial data with Bayesian methods demonstrated that methotrexate has a high probability of a beneficial treatment effect on skin score.(24) The probability that methotrexate resulted in improvement was 94% for the modified Rodnan skin score, 96% for the University of California, Los Angeles skin score, and 88% for the physician global assessment. There was a 96% probability that at least 2 of the 3 primary outcomes improved with methotrexate. These findings have contributed to a tempered perception of methotrexate, which is now considered a treatment option in SSc patients.(34) Bayesian methods allowed clinically useful inferences to be drawn from the data of a small clinical trial. Given the issues that challenge research in uncommon diseases (small sample sizes resulting in limited power, imperfect data, competing trials for subject recruitment), Bayesian inference has a number of potential advantages.
Bayesian methods permit simple, intuitive and meaningful probability statements about treatment effects.(35) They provide a transparent framework for combining new information with pre-existing knowledge.(35) Most importantly for the study of uncommon diseases, Bayesian inference allows inferences to be made from a limited number of patients.(36)
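
The following minimal Python sketch shows how a result can be "non-significant" while still carrying a high probability of benefit. It assumes a normal approximation to the likelihood and a flat (non-informative) prior, so the posterior for the treatment effect is Normal(estimate, SE^2). The function names and input numbers are hypothetical and are not taken from the methotrexate trial or its re-analysis.

```python
from statistics import NormalDist


def probability_of_benefit(estimate: float, std_error: float, threshold: float = 0.0) -> float:
    """Posterior P(true effect > threshold), assuming a normal likelihood
    and a flat prior, so the posterior is Normal(estimate, std_error**2)."""
    posterior = NormalDist(mu=estimate, sigma=std_error)
    return 1.0 - posterior.cdf(threshold)


def two_sided_p_value(estimate: float, std_error: float) -> float:
    """Conventional two-sided p-value for H0: effect = 0 (normal approximation)."""
    z = abs(estimate / std_error)
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Hypothetical inputs: an observed improvement of 5 points with a standard error of 3.
p_benefit = probability_of_benefit(5.0, 3.0)
p_value = two_sided_p_value(5.0, 3.0)
print(f"P(benefit | data) = {p_benefit:.3f}; two-sided p = {p_value:.3f}")
```

With these inputs the two-sided p-value is about 0.10, so a frequentist analysis would not reject the null hypothesis at the 0.05 level. The posterior probability of a beneficial effect, however, is about 0.95. An informative prior would shift this probability toward or away from benefit.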

1.3 Potential Solution #2. Propensity Score Methods
Inferences about the effect of a treatment require comparison with a counterfactual: the outcome a patient would have experienced under the alternative (no treatment, or a different treatment).(37) In observational studies, inferences are made by comparing exposed and non-exposed patients. Confounding by indication occurs when the study groups are non-comparable because of the way they were constructed.(8) This non-comparability may mean that the treatment group differs systematically from the comparator group in a way that distorts the estimated treatment effect. Direct comparisons of the 2 groups may then be misleading and yield a biased estimate of treatment effect.(37) Confounding by indication is a critical threat to the validity of treatment effect estimates derived from observational data. One solution is the use of propensity score methods. The propensity score is a balancing score that can be used to form groups or pairs that are not systematically different, and thus enables comparisons between groups.(37) It is defined as the conditional probability of exposure to treatment, given the observed covariates.(38) The propensity score can be estimated by regressing treatment assignment on observed baseline characteristics using, for example, a logistic regression model.

$$e(X) = P(Z = 1 \mid X)$$
Formula 2. Propensity score
$e(X)$ = propensity score
$Z$ = exposure, where 1 = exposed, 0 = unexposed
$X$ = a set of baseline characteristics, where $X = (X_1, \ldots, X_p)$
$P(Z = 1 \mid X)$ = probability of exposure given observed baseline characteristics
Note. Each patient has a probability of exposure where $0 < e(X) < 1$.
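
To illustrate Formula 2, the following dependency-free Python sketch estimates e(X) with a logistic regression fitted by gradient ascent on the log-likelihood. The function names, learning rate, iteration count and toy data are assumptions for illustration. In practice, standard statistical software (for example, R's glm with a binomial family) would be used.

```python
import math


def fit_logistic(X, z, lr=0.1, n_iter=5000):
    """Fit logit P(Z=1|X) = b0 + b1*x1 + ... + bp*xp by batch gradient ascent.

    X: list of covariate rows (lists of floats); z: list of 0/1 exposures.
    Returns [b0, b1, ..., bp]."""
    n, p = len(X), len(X[0])
    beta = [0.0] * (p + 1)  # beta[0] is the intercept
    for _ in range(n_iter):
        grad = [0.0] * (p + 1)
        for xi, zi in zip(X, z):
            eta = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
            resid = zi - 1.0 / (1.0 + math.exp(-eta))  # observed minus fitted
            grad[0] += resid
            for j, x in enumerate(xi):
                grad[j + 1] += resid * x
        # Step along the average gradient of the log-likelihood.
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    """e(X) = P(Z = 1 | X) for each patient under the fitted model."""
    scores = []
    for xi in X:
        eta = beta[0] + sum(b * x for b, x in zip(beta[1:], xi))
        scores.append(1.0 / (1.0 + math.exp(-eta)))
    return scores


# Hypothetical toy data: one standardized covariate (e.g., a severity score).
X = [[-1.5], [-1.0], [-0.5], [0.0], [0.5], [1.0], [1.5], [2.0]]
z = [0, 0, 1, 0, 1, 0, 1, 1]
beta = fit_logistic(X, z)
e = propensity_scores(X, beta)
print([round(s, 2) for s in e])
```

At the maximum-likelihood solution of a logistic model with an intercept, the fitted scores sum to the number of treated patients. This gives a quick check that the fit has converged.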

At an individual level, the propensity score measures the probability that a person would have been treated, given their baseline characteristics.(38) It summarizes the measured baseline characteristics in a single composite score.(3, 39) The propensity score can be used to ascertain whether there is sufficient overlap in baseline characteristics between the treatment and comparator groups to allow appropriate estimation of a treatment effect.(3) It can also be used to create a design with balance. Most importantly, use of the propensity score allows unbiased estimation of the causal effect of an exposure in the presence of confounding by measured covariates, provided there is no unmeasured confounding.(37, 40) Subjects with a given propensity score will tend to have the same distribution of measured covariates, whether or not they were treated. This creates a pseudo-randomized scenario allowing unbiased estimation of the treatment effect at each value of the propensity score.(40) Four methodological approaches have been described for applying the propensity score. The first involves matching treated and comparator patients on the propensity score to create an individually matched sample. This method is typically used when there are a limited number of treated patients and a larger number of comparator patients.(38) A number of matching methods have been proposed.(38) The simplest and most commonly used is matching on the nearest available propensity score within a caliper (a maximum allowable distance between the propensity scores of matched patients).(38) The treated subjects are randomly ordered. The first treated patient is matched to the closest available comparator patient, and the pair is removed from the pool.
This is repeated until no further matches can be achieved within the defined caliper width.(38) It has been recommended that the caliper width be specified as a proportion of the standard deviation of the logit of the propensity score.(41, 42) A second method is subclassification (also called stratification), in which treated and comparator subjects are divided into subclasses on the basis of the propensity score. Treated and comparator patients in the same subclass are compared directly,(38) and a pooled estimate of treatment effect is calculated across the subclasses. Subclassification into five subclasses has been shown to remove up to 90% of the bias.(43)
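
The greedy nearest-neighbour procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the matching algorithm used in this dissertation:

- distances are computed on the logit of the propensity score;
- matching is 1:1 without replacement;
- the caliper is 0.2 standard deviations of the logit in the combined sample, an assumed and commonly cited width.

The function name, patient identifiers and scores are hypothetical.

```python
import math
import random
import statistics


def logit(p):
    return math.log(p / (1.0 - p))


def greedy_caliper_match(treated, controls, caliper_sd=0.2, seed=2013):
    """Greedy 1:1 nearest-neighbour matching without replacement on the
    logit of the propensity score.

    treated, controls: dicts mapping patient id -> propensity score.
    caliper_sd: caliper width as a multiple of the SD of the logit
    (0.2 is an assumed value). Returns a list of (treated_id, control_id)."""
    logits = [logit(e) for e in [*treated.values(), *controls.values()]]
    caliper = caliper_sd * statistics.stdev(logits)
    order = list(treated)
    random.Random(seed).shuffle(order)  # treated subjects in random order
    available = dict(controls)  # pool of comparators not yet matched
    pairs = []
    for t in order:
        lt = logit(treated[t])
        best = min(available, key=lambda c: abs(lt - logit(available[c])), default=None)
        if best is not None and abs(lt - logit(available[best])) <= caliper:
            pairs.append((t, best))
            del available[best]  # remove the matched comparator from the pool
    return pairs


# Hypothetical propensity scores.
treated = {"t1": 0.30, "t2": 0.60, "t3": 0.95}
controls = {"c1": 0.31, "c2": 0.58, "c3": 0.10, "c4": 0.05}
pairs = greedy_caliper_match(treated, controls)
print(sorted(pairs))
```

In this toy example, treated patient t3 (e = 0.95) has no comparator within the caliper and is left unmatched. This illustrates how treated cases can be lost when few suitable comparators are available.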

A third method is the inverse-probability-of-treatment weighted (IPTW) estimator (also called the marginal structural model approach). Patients are weighted by the inverse of the probability of the treatment they actually received. Treated patients receive a weight of 1/e(X) and comparator patients a weight of 1/(1 - e(X)). This weighting creates two pseudo-populations, approximating what would have been observed had all patients been exposed and had no patients been exposed.(44) Outcomes can then be compared between the treated and comparator patients in the weighted sample. A criticism of this approach is that treated patients with very low propensity scores (or comparator patients with very high propensity scores) receive very large weights. This can decrease the precision of the estimated treatment effect (i.e., very high variance and wide confidence intervals).(40)

A fourth method is the use of the propensity score in regression adjustment. With traditional multivariable modeling and a large dataset, one could regress the outcome on baseline characteristics. However, in uncommon diseases there are frequently too few patients (and outcomes) to support a complex model with numerous variables. In this fourth approach, important baseline variables are used to model the propensity score, and the outcome is then regressed on the propensity score and an indicator of treatment assignment. The ability to include many variables in the propensity score model, and then regress the outcome on the single propensity score, confers an important methodological advantage in the study of uncommon diseases.(37) A criticism of this method is that it loses the ability to mimic the design of a randomized trial. Matching, subclassification and weighting remove confounding by design.(45) Treatment assignment is (approximately) independent of measured baseline characteristics in the matched, stratified or weighted sample, and the outcome is assessed only after confounding has been addressed.
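
A minimal Python sketch of the inverse-probability-of-treatment weights described above, applied to a hypothetical binary outcome. The function names and data are illustrative assumptions. The weighted means are normalized within each group. A survival outcome, as in the cohort analyses of this dissertation, would instead require a weighted survival model, which is not shown.

```python
def iptw_weights(z, e):
    """Inverse-probability-of-treatment weights: 1/e(X) for treated (z = 1)
    and 1/(1 - e(X)) for comparators (z = 0)."""
    return [1.0 / ei if zi == 1 else 1.0 / (1.0 - ei) for zi, ei in zip(z, e)]


def iptw_mean_difference(y, z, e):
    """Treated-minus-comparator difference in weighted mean outcome, with
    weights normalized within each group."""
    w = iptw_weights(z, e)

    def weighted_mean(group):
        num = sum(wi * yi for wi, yi, zi in zip(w, y, z) if zi == group)
        den = sum(wi for wi, zi in zip(w, z) if zi == group)
        return num / den

    return weighted_mean(1) - weighted_mean(0)


# Hypothetical data: binary outcome y, exposure z, estimated propensity score e.
y = [1, 0, 1, 1, 0, 0]
z = [1, 1, 1, 0, 0, 0]
e = [0.8, 0.5, 0.5, 0.5, 0.5, 0.2]
print(iptw_weights(z, e))
print(round(iptw_mean_difference(y, z, e), 3))
```

A treated patient with e(X) = 0.02 would receive a weight of 1/0.02 = 50. A few such patients can therefore dominate the weighted sample, which is the source of the loss of precision noted above.
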
Using the propensity score for regression adjustment requires it to be included in the same model that is used to evaluate the outcome. This lack of separation between design and analysis concerns some critics.(41) With the other 3 methods, the propensity score is used in the study design, and the analysis is separate.(45) A comparison of the propensity score methods, highlighting their relative advantages and disadvantages,(37, 38, 41) is summarized in Table 1.1. After construction of a propensity score, the quality of the balance it achieves should be evaluated. The test of a good propensity score is the degree
to which it balances the measured baseline characteristics between the treated and comparator patients. If matching is used, it has been recommended that investigators describe the distribution of baseline characteristics in the matched sample.(46) One method of assessing balance after propensity score matching is to evaluate the standardized difference in each baseline characteristic. The standardized difference is the absolute difference in the sample means divided by an estimate of the pooled standard deviation of the variable. It expresses the difference in means between the two groups in units of standard deviation. A similar formula is used to compute standardized differences for dichotomous variables.(47)

$$d = \frac{\bar{x}_{\text{treatment}} - \bar{x}_{\text{control}}}{\sqrt{\dfrac{s^2_{\text{treatment}} + s^2_{\text{control}}}{2}}}$$
Formula 3. Standardized difference for comparing means
$d$ = standardized difference
$\bar{x}_{\text{group}}$ = mean of baseline characteristic in the specified group
$s^2_{\text{group}}$ = variance of baseline characteristic in the specified group

$$d = \frac{\hat{p}_{\text{treatment}} - \hat{p}_{\text{control}}}{\sqrt{\dfrac{\hat{p}_{\text{treatment}}(1 - \hat{p}_{\text{treatment}}) + \hat{p}_{\text{control}}(1 - \hat{p}_{\text{control}})}{2}}}$$
Formula 4. Standardized difference for comparing prevalences
$d$ = standardized difference
$\hat{p}_{\text{group}}$ = prevalence of baseline characteristic in the specified group
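
Formulas 3 and 4, together with the 10% balance threshold, can be expressed as a short Python sketch. The function names and covariate values are hypothetical. Sample variances use the n - 1 denominator, and balance is judged on the absolute value of d.

```python
import math
import statistics


def std_diff_continuous(x_treated, x_control):
    """Formula 3: difference in means divided by the pooled standard deviation."""
    pooled_sd = math.sqrt((statistics.variance(x_treated) + statistics.variance(x_control)) / 2)
    return (statistics.mean(x_treated) - statistics.mean(x_control)) / pooled_sd


def std_diff_binary(p_treated, p_control):
    """Formula 4: difference in prevalences divided by the pooled binomial SD."""
    pooled_sd = math.sqrt((p_treated * (1 - p_treated) + p_control * (1 - p_control)) / 2)
    return (p_treated - p_control) / pooled_sd


def is_balanced(d, threshold=0.10):
    """Balance check against the 10% (0.1) threshold, using |d|."""
    return abs(d) < threshold


# Hypothetical baseline characteristics in a matched sample.
age_treated = [54.0, 61.0, 58.0, 49.0, 66.0]
age_control = [52.0, 63.0, 57.0, 50.0, 64.0]
d_age = std_diff_continuous(age_treated, age_control)
d_female = std_diff_binary(0.80, 0.85)
for name, d in [("age", d_age), ("female sex", d_female)]:
    print(f"{name}: d = {d:+.3f}, balanced: {is_balanced(d)}")
```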

The standardized difference is preferable to conventional significance testing for assessing balance because it is not influenced by sample size and is a property of the sample.(48) Diagnostics have also been proposed for when the propensity score is used for covariate adjustment (the weighted conditional standardized difference). When imbalance occurs, it has been suggested that the propensity score model be modified iteratively to achieve better balance.(48) As in a randomized trial, there may be some residual imbalance in measured baseline characteristics in the matched sample.(46) There is no gold-standard criterion for adequate balance. A standardized difference of 10% (0.1) has been recommended as a threshold.(47) Once the matched sample has been created, it is recommended that statistical methods appropriate for matched data be used when estimating the treatment effect and its statistical significance (e.g., paired t-test, Wilcoxon signed-rank test).(48) Statistical testing should take into account the lack of independence of patients within a propensity score matched pair.(46) A major advantage of the propensity score is that it can facilitate causal inferences from observational data.(37) If treatment assignment can be treated as ignorable after balancing on the propensity score, then the difference between the treatment and comparator groups at each value of the propensity score is an unbiased estimate of the treatment effect at that value.
Consequently, propensity score methods can produce unbiased estimates of the average treatment effect.(37)

1.4 New Territory
Despite the increasing awareness and use of Bayesian and propensity score methods, methodological innovations and issues of implementation remain to be resolved.(19) None of the Bayesian studies in rheumatology have been fully Bayesian in their approach, and none have taken full advantage of Bayesian methodology. That is, none have used the ability to incorporate all knowledge of treatment effect available before the study (expert belief or published evidence) to inform the estimated treatment effect. It has been argued that if there is important
pre-existing information that needs to be taken into consideration, it must be formally incorporated in the analysis.(6) Where published data are scarce and clinical guidance relies heavily on expert opinion, the appropriate method of incorporating this type of knowledge is less clear. To undertake a fully Bayesian approach, a number of methodological questions need to be addressed. What is the best method for eliciting beliefs from experts? How are beliefs quantified and expressed as a prior probability distribution for inclusion in models estimating treatment effect? Propensity score modeling is an innovative bias-correcting method that may be useful for observational data. However, a number of issues need to be addressed. Propensity scores have traditionally been used in large administrative data sets.(3) Can propensity score models be used in uncommon diseases, where sample sizes are small? In a small data set, a suitable comparator patient may not be available for propensity score matching. A substantial number of cases may then be lost when no suitable match can be found. The performance of propensity score methods in small samples is uncertain. Furthermore, it has been recommended that models be based on data and clinical sensibility. Ideally, the propensity score model should be based on factors that influence exposure. Is it possible to develop a propensity score that incorporates prior knowledge, i.e., a Bayesian propensity score? A Bayesian approach would allow the incorporation of prior opinion about which variables to include in the propensity score model.
However, methods for using propensity scores in a Bayesian setting have only recently begun to be developed.(49) In Chapter 3 of this dissertation, I conducted a systematic review of the literature to identify methods of belief elicitation for Bayesian priors.(50) Applying a psychometric approach, I reviewed the validity, reliability, responsiveness and feasibility of these methods to determine whether any method had incremental value over another. I developed a conceptual framework for the belief elicitation process: the processes by which experts formulate a belief and by which investigators can elicit that belief quantitatively.(50) Using this conceptual framework, I identified biases that may threaten
the elicited response, and identified methodological strategies that may reduce the effect of bias on the elicitation process. Identification of the best method based on psychometric properties was limited by the availability of data.(50) I found that the validity, reliability and responsiveness of existing belief elicitation methods for Bayesian priors had not been adequately evaluated. In Chapter 4, using my conceptual framework,(50) I revised existing methods of belief elicitation for Bayesian priors and added pragmatic methodological strategies to reduce the effect of potential biases.(51) I tested the validity and reliability of this belief elicitation method.(51) Having demonstrated its validity and reliability, in Chapter 5 I conducted a belief elicitation study of international experts to quantify their beliefs about treatment effect for expression as Bayesian priors.(52) In Chapter 6, I developed a Bayesian propensity score model, based on the experts' beliefs about variables related to treatment assignment,(52) as a bias-correcting analytic method. I then used these prior probability distributions for treatment effect, along with the propensity score, to model an observational dataset with a survival endpoint, all within a fully Bayesian analysis.(53) To my knowledge, this dissertation is the first to develop and test a belief elicitation method and to apply a fully Bayesian approach to make more valid estimates of treatment effect using observational data in uncommon disease. In doing so, I have attempted to make use of observational data to make inferences about treatment effect.
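
The idea of combining an elicited prior with observed data can be illustrated, in much-simplified form, with a conjugate normal update on the log hazard ratio scale. This Python sketch is not the model used in this dissertation, which combined the priors with a propensity score in a Bayesian survival analysis. It assumes a normal prior for the log hazard ratio and a normal approximation to the likelihood from the observed cohort. All function names and numbers are hypothetical.

```python
import math
from statistics import NormalDist


def normal_posterior(prior_mean, prior_sd, data_mean, data_se):
    """Conjugate normal-normal update: the posterior mean is the
    precision-weighted average of the prior mean and the data estimate."""
    prior_precision = 1.0 / prior_sd ** 2
    data_precision = 1.0 / data_se ** 2
    post_precision = prior_precision + data_precision
    post_mean = (prior_mean * prior_precision + data_mean * data_precision) / post_precision
    return post_mean, math.sqrt(1.0 / post_precision)


# Hypothetical elicited prior: log HR centred on log(0.8), prior SD 0.3.
# Hypothetical cohort estimate: log HR = log(0.7), standard error 0.35.
mean, sd = normal_posterior(math.log(0.8), 0.3, math.log(0.7), 0.35)
prob_benefit = NormalDist(mean, sd).cdf(0.0)  # P(log HR < 0), i.e. P(HR < 1)
print(f"posterior HR {math.exp(mean):.2f}, P(HR < 1) = {prob_benefit:.2f}")
```

The posterior is more precise than either source alone. The weight each source receives is proportional to its precision, so a vague expert prior contributes little and a confident one contributes more.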

Table 1.1 Comparisons of propensity score methods
Method | Advantages | Disadvantages
Matching | Eliminates more bias from confounding by indication than stratification; transparency in comparing treated and comparator patients; well-developed balance diagnostics; outcome is not specified in the model; allows estimation of risk differences and relative risk | Reduced sample size from discarding unmatched subjects
Subclassification | Transparency in comparing treated and comparator patients; permits use of the whole dataset; can assess balance within each stratum | May not have adequate overlap; may have very small cell sizes in some strata
Covariate adjustment | Can summarize numerous confounders in a relatively small data set; permits use of the whole dataset | Less well-developed balance diagnostics; loses design comparability to a randomized trial (adjusting instead of comparing similar patients); may include treated patients for whom there are no appropriate comparator patients; loses ability to estimate risk difference and relative risk
Inverse probability weighting | Removes approximately the same amount of imbalance as matching; permits use of the whole dataset | Less well-developed balance diagnostics; potential for decreased precision in estimated treatment effect; less intuitive

CHAPTER 2 Systemic Sclerosis-associated and Idiopathic Pulmonary Arterial Hypertension As Good Models to Test the Methods
One application that would benefit from Bayesian and propensity score methods is evaluation of the effect of warfarin on survival in systemic sclerosis (SSc)-associated pulmonary arterial hypertension (PAH) and idiopathic PAH (IPAH). SSc is an uncommon chronic disease characterized by fibrosis (of the skin, lungs and kidneys), vasculopathy (resulting in digital ulceration, gangrene and PAH) and immune activation. It has an annual incidence of 2 to 10 per million and a prevalence of up to 290 per million.(54) PAH is a leading cause of death in SSc patients.(55) Patients suffer from shortness of breath, decreased exercise tolerance, progressive heart failure and poor quality of life, and die prematurely. Prevalence estimates of SSc-PAH based on right heart catheterization range from 7% to 29%.(56-59) The estimates vary widely because of differences in patient selection and in the pulmonary pressure threshold used for diagnosis. It has been argued that studies may underestimate the true prevalence since severe or symptomatic patients are referred to academic centers.(60) Untreated, SSc-PAH has a median survival of 12 months, and 2-year, 3-year and 5-year survival of 44%, 40% and 4%.(61) With modern treatments, survival has moderately improved, with a median survival of 4 years and 2-year, 3-year and 5-year survival of 72%, 67% and 36%.(62) Idiopathic pulmonary arterial hypertension (IPAH) has a reported incidence ranging from cases/million/year, and a prevalence ranging from cases per million.(63) Historically, IPAH had a median survival of 2.8 years.(64) In the modern treatment era, 3-year survival has improved to 76% to 85%.(65, 66) Newer therapies, such as prostacyclin, endothelin receptor antagonists and phosphodiesterase-5 inhibitors, have been shown to have beneficial effects on exercise capacity and subjective dyspnea.
However, they are very expensive (e.g., Flolan
$80,000 per year, Bosentan $48,000 per year, Sildenafil $13,000 per year), logistically difficult to administer (Flolan patients cannot live more than 1 hour from their physician and require a permanent central catheter) and not equally accessible to patients in all provinces. One inexpensive and readily available potential treatment is warfarin. Anticoagulation of patients with PAH has been recommended on the rationale that PAH is characterized histologically by regions of in situ thrombosis and by abnormalities in the coagulation cascade. Recommendations for the use of warfarin in SSc-PAH have been generalized from IPAH studies, and there are no studies evaluating the effect of warfarin in SSc-PAH. Nonetheless, it has been recommended that SSc-PAH patients be considered for warfarin treatment and that IPAH patients be treated with warfarin.(67-69) Our updated systematic review of the literature found that the evidence supporting both recommendations is limited by methodological constraints and conflicting studies.(70)

2.1 Summary of Observational Studies Evaluating Warfarin in IPAH
Five studies support the effect of anticoagulation in IPAH, while 3 studies do not.

Studies suggesting treatment benefit. Follow-up was reported on a retrospective cohort of 115 IPAH patients seen at the Mayo Clinic between , and followed until 1983.(71) The mean age at baseline was 34 years, 73% were female, and the baseline mean pulmonary artery pressure (PAP) was 64 mmHg (range 36 to 120 mmHg). Improved 3-year survival was observed in 78 anticoagulated patients compared with 37 non-anticoagulated patients. Unadjusted analysis of the 56 patients who underwent autopsy demonstrated a beneficial effect of warfarin. In a multiple regression analysis, treatment with warfarin was also associated with improved survival. Ogata et al.
reported on 20 IPAH patients (6 males and 14 females) who underwent cardiac catheterization between 1964 and 1988.(72) The mean follow-up was 6 years. The mean age at diagnosis was 31.2 years (range 14 to 56 years). All were symptomatic, with New York Heart Association (NYHA) functional class II (n = 10) or III (n = 10). Seven patients were treated with warfarin, of whom 3 received concurrent
isoproterenol and 4 received concurrent nifedipine. Thirteen patients received neither warfarin nor vasodilators. The warfarin group was older at diagnosis than the control group (39 ± 15.5 years versus 27.2 ± 12.0 years, p < 0.05). There were no differences in functional class or symptom duration between groups. The baseline mean PAP (mPAP) was 57.5 ± 8.9 mmHg in the treatment group and 49.3 ± 5.0 mmHg in the control group. The warfarin group had a reduction in mPAP of 19% (p < 0.05) and a reduction in pulmonary vascular resistance (PVR) of 13% (p < 0.05). The 5-year survival was higher in the warfarin group (57%) than in controls (15%), p <

Rich et al. reported on a prospective cohort study of 64 IPAH patients seen at the University of Illinois between 1985 and 1991.(73) All patients were given calcium channel blockers (nifedipine or diltiazem). Patients were categorized as responders if treatment with calcium channel blockers resulted in a >20% reduction in mean PAP, and as non-responders if the reduction was <20%. Patients were treated with warfarin if there was evidence of non-uniform blood flow on lung scanning. Seventeen patients were categorized as responders, 8 of whom were treated with warfarin. Forty-seven patients were categorized as non-responders, 27 of whom were treated with warfarin. After accounting for baseline hemodynamics and response to calcium channel blockers, improved overall survival was observed in the warfarin-treated patients. Warfarin-treated patients who did not respond to calcium channel blockers had 1-, 3- and 5-year survival rates of 91%, 62% and 47%, respectively. Patients who did not respond to calcium channel blockers and were not treated with warfarin had 1-, 3- and 5-year survival rates of 52%, 31% and 31%, respectively. There are potential limitations to consider in interpreting these results.
The primary objective of this study was to evaluate the effect of vasodilator therapy on survival. Evaluation of the effect of warfarin on survival was a post hoc subgroup analysis. As such, these findings are provocative and hypothesis generating, but should not be used to draw firm conclusions. The results may also be susceptible to confounding by indication, since exposure to warfarin was based on the presence of an abnormal lung scan. The authors highlight that this may represent a subgroup of patients with a greater likelihood of survival.

Roman et al. retrospectively reported on 44 IPAH patients followed in Barcelona, Spain, between 1992 and 2000.(74) There were 33 females and 11 males, with a median age of 38.5 years (range 15 to 71 years). The baseline mean PAP was 58 mmHg (standard deviation (SD) 18 mmHg). Six patients were NYHA functional class I, 11 were class II, 25 were class III and 2 were class IV. The authors reported that 5 patients improved with warfarin and calcium channel blockers (diltiazem or nifedipine). No additional data were given.

Kawut et al. retrospectively evaluated 84 consecutive, newly diagnosed adult PAH patients between 1994 and 2002.(75) Sixty-six patients had a diagnosis of IPAH. Co-interventions included digoxin, calcium channel blockers, spironolactone, epoprostenol, treprostinil and bosentan. Seventy-nine of 84 (86%) patients were treated with warfarin. Warfarin use was associated with improved survival (HR 0.35, 95% CI , p = 0.05). The authors note that the results may be susceptible to bias from confounding by indication, as warfarin is not given to patients with a history of previous bleeding or severe thrombocytopenia. Less severely ill patients may therefore have been preferentially exposed to warfarin, producing the observed improvement in outcomes.

Studies suggesting no treatment benefit. Three observational studies do not support the use of anticoagulation in IPAH. Goodwin et al. reported a retrospective cohort study of 19 patients with catheterization-confirmed PAH in London, England.(76, 77) Eleven patients had signs and symptoms of pulmonary embolism or thrombosis, and 8 patients were diagnosed with IPAH. The investigators reported that anticoagulation failed in all but one treated patient, and attributed the lack of treatment effect to late diagnosis and delay in treatment. No other data characterizing the patients, the treatment or the outcome(s) were given.
Another retrospective cohort study reported on 17 IPAH patients identified from 1,550 right heart catheterizations conducted at a single center over 6 years.(78) There were 7 men and 10 women, with ages ranging from 7 to 70 years. Disease duration ranged from 1 year to 22 years. Ten patients were treated with anticoagulants and 7 were not. All patients were symptomatic with dyspnea, 13/17 had
cyanosis, 8/17 experienced syncope and 7/17 had edema. Four patients reported a history of thrombosis. The mean PAP on cardiac catheterization at baseline ranged from 50 to 125 mmHg. Treatment with anticoagulants (agent not specified) was titrated to maintain a prothrombin-proconvertin time in the range of 10% to 30%. Treatment duration ranged from 2 to 5 years. No other PAH-specific medical therapies were described. Of the 9 patients who underwent pathologic evaluation, 4 had pulmonary artery thrombosis in conjunction with arteriopathic changes. Follow-up cardiac catheterization was performed in 4 of the anticoagulated patients, and none had a reduction in PAP during treatment. Six anticoagulated patients died. The researchers concluded that anticoagulation provided neither a reduction in PAP nor a difference in survival between groups. There are issues to consider in interpreting this study. A treatment effect may have been difficult to detect in such a heterogeneous sample. It contained both pediatric and adult patients, as well as patients early (1 year) and late (22 years) in their disease. All patients were symptomatic, and more than half had symptoms of severe disease (syncope and edema). These differences were not accounted for in the analysis, and could not have been, given the small sample size.(15)

Frank et al. retrospectively evaluated the effect of warfarin in 69 IPAH patients from Vienna, Austria, and Berne, Switzerland.(79) The warfarin group comprised 24 patients (17 females). The disease onset occurred between 1948 and The age at diagnosis ranged between years. The baseline mean PAP was 49.8 ± 12.5 mmHg. The non-anticoagulated group consisted of 45 patients (36 females). Symptom onset occurred between 1953 and The baseline mean PAP was 64.5 ± 21.5 mmHg.
Co-interventions in both the warfarin-exposed and non-exposed IPAH groups included digoxin, diuretics, steroids and alpha-receptor antagonists. There was no difference in 5-year survival between groups. A non-significant survival benefit in the warfarin-treated group was observed at 10 years. Only 3 and 4 patients, respectively, from each group underwent repeat catheterization, and changes in pulmonary hemodynamics were not reported. Of the 19 warfarin-treated patients with follow-up functional class data, 5 improved, 10 deteriorated and 4 remained
stable. Of the 15 warfarin non-exposed patients with follow-up functional class data, 2 improved, 10 deteriorated and 3 showed no change.

Methodological concerns and potential biases. All of these studies raise considerable methodological concerns and are susceptible to potential biases. Aside from the methodological concerns unique to each study, a few significant issues apply to most, if not all, of the studies. All but one of the studies predate the modern treatment era and do not reflect the impact of currently available therapies.(71-73, 76-79) None of the studies were blinded. Physicians or patients may systematically change their reporting of outcomes when they are aware of exposure or non-exposure to warfarin.(10) Blinding physicians or patients to treatment assignment would prevent them from acting differently.(10) However, because warfarin use requires regular international normalized ratio (INR) monitoring, both physicians and patients are aware of treatment assignment. This form of bias affects the internal validity of the study results. All of the studies had small numbers of patients, which limited their ability to account for confounding factors. The negative results may reflect insufficient power. Most importantly, none of the studies were randomized, so all were susceptible to bias from confounding by indication. Most studies did not report the basis for anticoagulating patients or, conversely, why anticoagulation was withheld from other patients. In all cases, treatment was based on physician judgment. Patients who were sicker or had more severe disease may have been systematically treated (or not treated).
It has been argued that co-morbidity may have precluded use of warfarin in the untreated patients.(80) Warfarin may be withheld from patients with severe PAH manifesting as hemoptysis (from a ruptured, dilated pulmonary vessel) or thrombocytopenia (resulting from pulmonary vascular sequestration or epoprostenol use).(81) This would bias the results toward a favorable treatment effect for warfarin through confounding by severity, a form of confounding by indication.(75)

2.2 A Good Model to Develop and Apply Methods
The evidence evaluating the use of warfarin to improve survival in SSc-PAH and IPAH is modest, conflicting and constrained by methodological issues. Finally, the use of
36 warfarin is not without disadvantages. Patients on warfarin must have their blood tests monitored regularly, which is not easy among SSc patients who have thickened skin and vasculopathy. Warfarin is also associated with a potential risk of major bleeding (necessitating treatment with blood transfusion). The risks of major bleeding with warfarin in SSc-PAH and IPAH are not known. In the setting of chronic anticoagulation with warfarin for atrial fibrillation, major bleeding rates range from 2% to 3% annually.(82, 83) Furthermore, SSc patients with luminal telangiectasia or gastric antral vascular ectasia (GAVE, watermelon stomach ) may be at higher risk of gastrointestinal bleeding.(84, 85) Despite the limited evidence, and significant adverse event profile of warfarin, this treatment continues to be recommended in patients with SSc-PAH and IPAH.(67-69) The clinical dilemma regarding the effect of warfarin on survival in SSc-PAH and IPAH provides an excellent model to develop and apply methods that allow more valid inferences to be made from observational data. The ideal situation, as laid out in Feinstein s challenge, is to develop methods that can make unbiased estimates of treatment effect using observational data that would have been achieved using a comparable randomized trial. The goal, then, is to use bias reduction methods and all available sources of knowledge to make more valid estimates of treatment effect using observational data in uncommon disease. In this dissertation, I test innovative Bayesian and propensity score methods, to evaluate the effect of warfarin on survival in SSc-PAH and IPAH using observational data. In chapter 5, I elicit belief data about the effect of warfarin on survival in SSc-PAH and IPAH patients from expert clinicians. I evaluate factors that influence clinicians use of warfarin. 
These data are used to develop prior probability distributions representing clinicians' beliefs about the effect of warfarin and the determinants of warfarin use. In chapter 6, I conduct a retrospective cohort study using Bayesian propensity score modeling to evaluate the effect of warfarin on survival in SSc-PAH and IPAH patients who were and were not exposed to warfarin. This study is the first to estimate the effect of warfarin on survival in SSc-PAH patients.
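To make the idea of a propensity score concrete, the following is a minimal sketch, not the Bayesian propensity model of chapter 6. It assumes a logistic model for the probability of warfarin exposure with fixed coefficients; the covariate names and coefficient values are hypothetical placeholders standing in for log odds ratios that, in a Bayesian setting, could be informed by expert-elicited beliefs. Inverse-probability-of-treatment weighting is shown only as one common use of such a score.

```python
import math

# Hypothetical log odds ratios for receiving warfarin (NOT values from this dissertation).
INTERCEPT = -0.5
COEFS = {
    "functional_class_iv": 0.8,   # hypothetical covariate
    "thrombocytopenia": -1.2,     # hypothetical: may lead physicians to withhold warfarin
    "gi_bleeding_history": -1.5,  # hypothetical covariate
}


def propensity_score(patient):
    """Probability of warfarin exposure: 1 / (1 + exp(-linear predictor))."""
    lp = INTERCEPT + sum(beta * patient.get(name, 0) for name, beta in COEFS.items())
    return 1.0 / (1.0 + math.exp(-lp))


def iptw(patient, treated):
    """Inverse-probability-of-treatment weight: 1/e if treated, 1/(1 - e) if untreated."""
    e = propensity_score(patient)
    return 1.0 / e if treated else 1.0 / (1.0 - e)


if __name__ == "__main__":
    p = {"functional_class_iv": 1, "thrombocytopenia": 0, "gi_bleeding_history": 0}
    print(round(propensity_score(p), 3))  # 0.574
    print(round(iptw(p, treated=True), 3))
```

Because confounding by severity (e.g., thrombocytopenia) is expected to lower the chance of receiving warfarin, the corresponding hypothetical coefficient is negative; weighting by the inverse of the score is intended to balance such covariates between exposed and non-exposed patients.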

2.3 Dissertation Aims, Objectives and Hypotheses

Overall Aims

1. To develop methods in a Bayesian framework to make propensity score-adjusted estimates of treatment effects using observational data with a survival end point, in the setting of uncommon disease.
2. To quantify experts' beliefs about the effect of warfarin on survival in SSc-PAH and IPAH as prior probability distributions, using a method that meets the rigors of measurement science.
3. Using these methods, to determine whether warfarin improves survival in SSc-PAH and IPAH.

Objectives

1. Using strategies that reduce the threat of potential biases on the elicited response, to develop and evaluate the feasibility, validity and reliability of a revised belief elicitation method for Bayesian priors.
2. Using a cross-sectional study, to determine expert clinicians' beliefs regarding the effect of warfarin on survival in SSc-PAH and IPAH (expressed as prior probability distributions), and to determine the importance of factors that influence experts' use of warfarin in SSc-PAH and IPAH.
3. Using a retrospective cohort study, a) to use data from expert clinicians to derive a propensity score in a Bayesian context, and b) to combine observed survival data from 2 large Canadian inception cohorts with the prior probability distributions derived from experts, to evaluate the effect of warfarin on survival in SSc-PAH and IPAH.

Hypotheses

1. The revised belief elicitation method will be feasible, valid and reliable.
2. There will be a spectrum of beliefs among expert clinicians regarding the effect of warfarin for improving survival in SSc-PAH and IPAH.
3. Warfarin will have a moderate probability ( 0.70) of improving survival in SSc-PAH and IPAH.
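Hypothesis 3 is stated as a probability that warfarin improves survival, which in a Bayesian analysis is a posterior probability that the hazard ratio (HR) is below 1. The sketch below is a minimal illustration under an assumption made here for simplicity, not necessarily the dissertation's model: both the expert prior and the likelihood from the observed cohort data are approximately normal on the log HR scale, so they combine by precision weighting (the conjugate normal-normal update). All inputs are hypothetical.

```python
import math


def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def combine_normal(prior_mean, prior_sd, data_mean, data_sd):
    """Conjugate normal-normal update: precisions add; the mean is precision-weighted."""
    w_prior = 1.0 / prior_sd ** 2
    w_data = 1.0 / data_sd ** 2
    post_var = 1.0 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * data_mean)
    return post_mean, math.sqrt(post_var)


def prob_benefit(mean_log_hr, sd_log_hr):
    """P(HR < 1) = P(log HR < 0) when log HR is normally distributed."""
    return normal_cdf((0.0 - mean_log_hr) / sd_log_hr)


if __name__ == "__main__":
    # Hypothetical inputs: expert prior centred on HR 0.8, cohort estimate HR 0.9.
    prior = (math.log(0.8), 0.3)
    data = (math.log(0.9), 0.25)
    m, s = combine_normal(*prior, *data)
    print(f"posterior HR = {math.exp(m):.2f}, P(HR < 1) = {prob_benefit(m, s):.2f}")
```

A posterior probability of benefit computed this way can be compared directly with the threshold named in the hypothesis; a sceptical or enthusiastic expert prior simply shifts `prior` before the update.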

CHAPTER 3

Methods to Elicit Beliefs for Bayesian Priors. A Systematic Review

Publication 1: Johnson SR, Tomlinson GA, Hawker GA, Granton JT, Feldman BM. Methods to elicit beliefs for Bayesian priors. J Clin Epidemiol 2010;63(4): Printed with permission, Elsevier Limited.

Methods to elicit beliefs for Bayesian priors. A systematic review

Sindhu R. Johnson MD FRCPC 1, 5, George A. Tomlinson PhD 5, 6, 7, Gillian A. Hawker MD MSc FRCPC 1, 3, 5, John T. Granton MD FRCPC 2, Brian M. Feldman MD MSc FRCPC 1, 4, 5, 6.

1 Division of Rheumatology, 2 Divisions of Respirology and Critical Care Medicine, University Health Network, 3 Women's College Hospital, 4 The Hospital for Sick Children, Toronto, Ontario, Canada; 4 Departments of Paediatrics, 5 Health Policy Management & Evaluation, and 6 Dalla Lana School of Public Health, University of Toronto, Toronto, Canada; 7 Division of Clinical Decision Making and Health Care, Toronto General Research Institute, Toronto, Ontario, Canada.

Corresponding Author: Sindhu Johnson MD, Division of Rheumatology, Ground Floor, East Wing, Toronto Western Hospital, 399 Bathurst Street, Toronto, Ontario, Canada, M5T 2S8. FAX Sindhu.Johnson@uhn.on.ca

Key words. Belief elicitation, Bayesian, Validity, Reliability, Bias

Running Title. Belief elicitation methods

Word Count. 4,140

ABSTRACT

Objective. Bayesian analysis can incorporate clinicians' beliefs about treatment effectiveness into models that estimate treatment effects. Many elicitation methods are available, but it is unclear whether any confer advantages based on principles of measurement science. We reviewed belief elicitation methods for Bayesian analysis and determined whether any had incremental value over the others based on validity, reliability, and responsiveness.

Study Design and Setting. A systematic review was performed. MEDLINE, EMBASE, CINAHL, Health and Psychosocial Instruments, Current Index to Statistics, MathSciNet and Zentralblatt Math were searched using the terms (prior OR prior probability distribution) AND (beliefs OR elicitation) AND (Bayes OR Bayesian). Studies were evaluated on design, question stem, response options, analysis, and consideration of validity, reliability, and responsiveness.

Results. We identified 33 studies describing methods for elicitation in a Bayesian context. Elicitation occurred mostly in cross-sectional studies (n=30, 91%), to derive point estimates with individual level variation (n=19, 58%). Although 64% (n=21) of studies considered validity, 24% (n=8) reliability, and 12% (n=4) responsiveness of the elicitation methods, only 12% (n=4) formally tested validity, 6% (n=2) tested reliability, and none tested responsiveness.

Conclusions. We have summarized methods of belief elicitation for use as Bayesian priors. The validity, reliability, and responsiveness of elicitation methods have been infrequently evaluated. Until comparative studies are performed, strategies to reduce the effects of bias on elicitation should be used.

What is new?

What this adds to what was known? This paper summarizes methods that have been applied for belief elicitation, reviews what is known about the measurement properties of each method, presents a conceptual framework for the belief elicitation process, and identifies pragmatic methodologic strategies to reduce the effect of bias in belief elicitation studies.

What should change now? Strategies to reduce the effect of bias include sampling from groups of experts, use of clear instructions and a standardized script, provision of examples and training exercises, avoidance of scenarios or anchoring

3.1 Background

Bayesian analysis is an increasingly common method of statistical inference used in clinical research.(4) Within this inferential paradigm there are different schools of thought among statisticians who use a Bayesian approach.(28) In the empirical Bayesian approach, parameters of the prior distribution are estimated from the same data used in the main analysis. The fully Bayesian approach considers all sources of pre-existing knowledge admissible for the analysis. One advantage of the fully Bayesian approach over the traditional frequentist approach or the empirical Bayesian approach is the ability to incorporate beliefs into models that estimate treatment effects. Once beliefs are elicited from a sample (e.g., experts in a field), the elicited beliefs (e.g., regarding the probability of a treatment effect) can be expressed as a prior probability distribution and displayed graphically. This distribution can be used to document clinical equipoise (a prerequisite for clinical trials)(86), for sample size calculation(86) and interim study monitoring(86, 87), and can be combined with treatment effect estimates obtained from trials.(88) In a fully Bayesian analysis, when no prior information is available, investigators will use a vague prior so that the new data dominate. Prior belief is often a combination of fact-based knowledge and subjective impressions based on clinical experience.(89) Critics of the fully Bayesian paradigm in clinical trials are concerned that the inclusion of prior belief is too subjective(90) and lacks methodologic rigor.(89) Bayesian methodologists have been challenged to take a stand for disciplined research methodology.
(89) Therefore, in order to apply prior probability distributions of existing belief about a treatment effect in clinical trials, clinical researchers would benefit from knowledge of existing belief elicitation methods and identification of those that have demonstrable methodologic rigor. In particular, belief elicitation methods should be valid, reliable, responsive to change and feasible. Thus, the primary objectives of this study were: 1) to review methods of eliciting prior beliefs for a Bayesian analysis; and 2) to review the measurement properties (validity, reliability, responsiveness and feasibility) of these methods to determine whether one method had incremental value over another. To better understand the processes by which experts formulate a belief, the processes by which investigators can elicit this belief, and the potential biases that may affect the validity, reliability and responsiveness of these methods, the secondary objectives of this study were: 1) to develop, through review of the literature, a conceptual framework for the belief elicitation process and the biases that may affect the elicited response, and 2) to identify methodologic strategies that may reduce the effect of bias on the elicitation process.

3.2 Methods

Search strategy. Eligible studies were identified using MEDLINE (1950 to week 2 June 2008), EMBASE (1980 to 2008 week 25), CINAHL (1982 to week 2 June 2008), Health and Psychosocial Instruments (1985 to March 2008), Current Index to Statistics (1974 to June 2008), MathSciNet (1940 to June 2008) and Zentralblatt Math (1868 to June 2008) using the search terms (prior OR prior probability distribution) AND (beliefs OR elicitation) AND (Bayes OR Bayesian). Mapping of terms to subject headings was used where appropriate. Titles and abstracts were screened to exclude ineligible studies. Included studies were entered into the Science Citation Index and PubMed (using the related articles tool) to search for other potentially eligible studies. In addition, the bibliographies of included studies and published reviews were searched.

Inclusion and exclusion criteria. Eligible papers included published observational studies, randomized controlled trials, book chapters and technical reports that describe elicitation of beliefs in a Bayesian context. Studies using human and nonhuman subjects were included. Non-English language studies were excluded.

Data abstraction and methodologic assessment.
Using a standardized form, the following data were abstracted: sample size, study design (cross-sectional, longitudinal, unspecified), level of elicitation (individual, group), questionnaire administration format (in person, telephone interview, mail, Delphi consensus, other), questionnaire format (paper, computer assisted, other), question format (scenario with/without data provided in the stem, predictive question, both, other), response options (visual analogue scale, distribution of probabilities or proportions into bins, other), response rate (percentage, not specified, not applicable [methodologic or simulation papers]), analysis (point estimate with group level variation, point estimate with individual level variation), and graphical display (none, probability density function, cumulative distribution function, other). Respondents are often asked to estimate the probability of an event that is not definitively known (e.g., probability of survival at 3 years), so there may be some uncertainty around the reported point estimate. Group level variation was used to characterize analyses that reported the variability around the group's point estimate. Individual level variation was used to characterize analyses that reported the variability around the point estimate for each individual study participant.

Measurement properties. Articles describing elicitation methods were evaluated for consideration of the following properties:
1) Validity. Face validity evaluates whether the elicitation method appears to measure what it purports to measure. Content validity evaluates whether the elicitation method captures all the relevant aspects of the belief.(91, 92) Criterion validity evaluates the correlation of an elicitation method with a gold standard. When there is no gold standard for the truth or belief, construct validity evaluates the relationship between 2 different methods of measuring the same belief. Convergent construct validity evaluates the correlation between 2 related aspects of the elicited belief, whereas divergent construct validity evaluates the ability of an elicitation method to correctly distinguish between dissimilar beliefs.(16)
2) Reliability. Reliability refers to the reproducibility of the measure. Intra-rater reliability is evaluated when the elicitation method is applied to the same participant(s) on two different occasions, whereas inter-rater reliability is evaluated when the elicitation method is applied to different participants on the same occasion.
In the context of belief measurement, inter-rater reliability is of lesser importance. Measures of reliability include the method of Bland and Altman, the intra-class correlation coefficient, and Cohen's kappa.(16)
3) Responsiveness. Responsiveness refers to the ability of an elicitation method to accurately detect a meaningful change in belief over time when it has occurred.(16, 93) Measures of responsiveness may include Cohen's effect size or the standardized response mean.(16)
4) Feasibility. Feasibility refers to the ease of use of the elicitation method.(94) Determinants of feasibility include time, cost, and the need for equipment or personnel.
Consideration of validity, reliability, responsiveness and feasibility by investigators was categorized as commented on, evaluated (measure of association or change recorded), or not specified. The measures of validity, reliability and responsiveness cited above (e.g., correlations, kappa) are appropriate when elicitation yields a single value per respondent. When each respondent provides an entire probability distribution, it is not clear how validity, reliability and responsiveness should be measured.

Statistical analysis. Summary statistics were calculated using R 2.4 (R Foundation for Statistical Computing, Vienna, Austria).

3.3 Results

Search strategy. Systematic review of the literature identified 33 articles that describe unique methods for belief elicitation in a Bayesian context (Figure 3.1).

Study characteristics. Table 3.1 summarizes the study characteristics. Belief elicitation mostly occurred in cross-sectional studies (91%), at the level of the individual (97%), using small sample sizes (median of 11 participants). Questionnaires were largely administered in person (52%) and in paper format (52%), and analyses mostly derived a point estimate with individual level variation (58%).

Elicitation methods. Question stems (the question asked of the participant) and response options are summarized in Table 3.2. Investigators have asked participants about the mean(95-97), median(98-101) and mode( ) of a parameter. Participants have been asked to estimate the probability of an outcome/event( ), the proportion of individuals who will have an outcome(86, 113, 114), the relative risk of an outcome(115, 116), the value of a dependent variable given specified values of independent variables(117, 118), and their weight of belief( ).
Commonly used response options include direct probability estimates(87, 106), visual analogue scales(110, 111, 123, 124), sketching of a graph(101, 102), and bins and chips, in which participants place the weight of their belief, expressed as percentages, into discrete intervals (Figure 3.2).(120, 122, 125, 126) Methods used to illustrate the elicited beliefs include line graphs(86, 124), histograms(115, 127), probability density functions(86, 95, 113, 120, 122, 125), and cumulative distribution functions(97, 123, 128).

Measurement properties. Of the identified studies, 64% (21/33) considered the validity, 24% (8/33) the reliability, 12% (4/33) the responsiveness and 55% (18/33) the feasibility of the elicitation methods (Table 3.1). However, only 4 (12%) studies formally evaluated validity, 2 (6%) tested reliability, none tested responsiveness and 1 (3%) formally evaluated feasibility (Table 3.3).

Conceptual framework for belief formulation and elicitation. The formulation of a clinical belief, and its subsequent elicitation, is a complex process. Based on the literature(86, 87, 95, 98, 105, 107, ), we have developed a conceptual framework for this process (Figure 3.3). An individual's belief about the effectiveness of an intervention is influenced by their knowledge of the research evidence and their clinical experience, both of which are presumably approximations of the truth. Some schools of thought suggest that an individual does not have a pre-existing quantification of their belief "ready for the picking".(112) Rather, when asked about an intervention, an individual synthesizes their knowledge and experience into a quantified prior.(112) Using an elicitation procedure (question and response option), the investigator tries to elicit this belief. The investigator may quantify the elicited belief, express it graphically, and then combine multiple individual priors to form a group clinical prior(135) that reflects the spectrum of beliefs on the subject.
Under the personalistic theory of probability, all self-consistent (coherent) beliefs are admissible in a study as long as the individual feels that they correspond with his or her judgment.(112, 129) The elicitation procedure, that is, the manner in which the belief is elicited, can influence the creation of both the individual's quantified prior and the group's clinical prior.(112) A person may modify the reporting of their quantified belief depending on the method by which the belief is elicited. Biases that may threaten the validity of the elicited belief are summarized in Table 3.4.(95) The reliability, responsiveness and feasibility of an elicitation procedure are also important determinants of its utility. Threats to the reliability of an elicitation procedure include lack of understanding of the procedure, carelessness, lack of interest and fatigue.(131) In a longitudinal study, an elicitation procedure should also be able to detect important changes in belief that occur over time as new information is gained. Finally, the implementation of an elicitation method in clinical research is constrained by factors that affect its feasibility, such as the costs of implementing the method, the need for specialized personnel or hardware, and the time required of the study participant.

Methodologic strategies to reduce bias. Methodologic strategies to reduce the influence of potential biases on the validity and reliability of elicitation methods are summarized in Table 3.4. These strategies can be implemented at each stage of the elicitation procedure: identification of the sample, framing of the question, choice of the response option and summarizing of the data.

The sample. The inclusion of clinical experts(97) rather than generalists in an elicitation procedure improves the validity and reliability of the elicited beliefs for a number of reasons.(98, 117, 136, 137) The training of a clinical expert generally extends over years rather than weeks.
During that time, the expert gains extensive experience with the specific events in question and with the factors that affect them.(138) An expert encounters the condition repeatedly and receives relatively immediate feedback on the consequences of their therapeutic decisions.(138) Thus, an expert is one who has thought more deeply, and over a longer period of time, about the subject than others have.(95) As a result, experts are able to predict events in their area of special training, and tend to be more consistent in their beliefs than non-experts.(138) Over-confidence, which underestimates realistic doubt(139), occurs among inexperienced individuals, whereas experienced individuals are more willing to admit to uncertainty.(132, 139) Inexperienced individuals tend to overuse round numbers and to label events as impossible rather than assign them small probabilities.(112) As a result, the elicited probability distributions are truncated at hard and perhaps unrealistic boundaries rather than extending into tail areas with very small probabilities.(112, 129) Clinical experience reduces these tendencies.(112)

The question. Investigators have asked participants about measures of central tendency(95, 96, ), probability( ), proportion(86, 113, 114), relative risk(115, 116), the value of a dependent variable given specified values of independent variables(117, 118), and their weight of belief( ). Insufficient normative goodness (statistical understanding) and insufficient understanding of the elicitation question threaten the validity of the elicited belief.(112) Strategies that have been shown to decrease the influence of these biases include the provision of an example(109, 119, 122) or training exercises.(100, 112) Study participants have reported that examples are helpful.(109) A training exercise improves both normative goodness(131) and reliability(112, 129, 132), and has therefore been recommended.(103, 106, 134) Other strategies to improve reliability include the use of clear instructions(134) and a standardized script.(113) Investigators have provided a summary of research data(97, 114, 123) or a scenario(104, 105, 115) with the elicitation question. Although this may prevent a radical opinion(87), it may also result in anchoring bias, in which the reported belief is influenced by the data presented.(104) Study participants give explicit attention to data to which they have been cued.(107) Strategies to reduce anchoring bias include avoiding data presentation or scrambling the sequence of data presentation between participants.(103)

The response option. Use of a dichotomous response option (e.g., I believe this intervention is effective.
Yes/No) has insufficient content validity, as clinicians often have beliefs about the magnitude of the effect and varying degrees of certainty in the strength of their belief.(109, 140) Software for belief elicitation has been developed(96, 141), and some studies have been computer assisted.(96, 103, 115, 117) Strategies can be used to reduce the threat that limited normative goodness, or respondents' insufficient understanding of the elicitation procedure, poses to the validity and reliability of the elicited belief. Providing feedback to the participant about the elicited belief allows for self-correction(16), and has been shown to improve probability assessment(105)(23) and reliability.(134) An opportunity for verification and revision of the elicited response allows the participant to detect and revise inconsistencies in their response.(103, 114, 129) Use of a response option that requires betting or imposes penalties also improves validity and reliability. Participants reflect more deeply when given a disincentive, because there is a sense of potential loss associated with their response (e.g., an approach in which a study participant wagers his own money based on his assessed probability of an outcome).(129) Bias introduced by base-rate neglect (which occurs when participants fail to take into account the prevalence of the outcome among untreated patients) may be reduced by asking the participant to state the baseline rate, or to describe the outcome in both untreated and treated patients.(107)

Aggregation of data. Individual priors can be aggregated to form a group clinical prior in a variety of ways. Although some studies have used consensus methods to derive a group clinical prior(100, 108, 114, 127, 128, 142), most have combined individually elicited priors. Biases introduced by over-optimism or overconfidence may be reduced by using averaging methods for the group clinical prior.(139) Methods for pooling priors have been proposed.(98, 131, 143) It has also been suggested that the elicited beliefs could be weighted by occupation, level of experience, self-confidence or other personal characteristics.(87) However, the value of these pooling and weighting methods remains uncertain and requires evaluation.
Graphical presentation of the combined clinical prior has been used to express the degree of variability of the elicited belief, illustrate the existence of clinical uncertainty, and demonstrate the amount of evidence that would be required from data to convince optimistic and skeptical clinicians. In general, people comprehend normal distributions more easily than fractiles, relative densities or cumulative distribution functions.(112) A probability density function is more intuitive than a cumulative distribution function, and its use is associated with improved feasibility and validity.(112) A concomitant histogram is useful for individuals who are less familiar with probability distributions. Simple graphical representations are preferred, because the cost of displaying more information is busier figures in which patterns are harder to see.(133)
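The bins-and-chips response option and the averaging of individual priors into a group clinical prior can be illustrated together. The following is a minimal sketch, assuming each expert distributes 20 chips across the same set of hazard-ratio intervals and that experts are combined with an equally weighted linear opinion pool (a weighted average of the experts' bin probabilities). This is one of several proposed pooling rules, not a method endorsed by the review; the bin edges, chip counts and number of chips are hypothetical.

```python
from typing import List, Optional

# Hypothetical hazard-ratio intervals shared by all experts:
# [0.25, 0.5), [0.5, 0.75), [0.75, 1.0), [1.0, 1.25), [1.25, 1.5)
BIN_EDGES = [0.25, 0.5, 0.75, 1.0, 1.25, 1.5]


def chips_to_probs(chips: List[int]) -> List[float]:
    """Convert one expert's chip allocation into bin probabilities."""
    total = sum(chips)
    if total == 0:
        raise ValueError("expert placed no chips")
    return [c / total for c in chips]


def linear_pool(experts: List[List[float]], weights: Optional[List[float]] = None) -> List[float]:
    """Linear opinion pool: weighted average of experts' bin probabilities (equal weights by default)."""
    if weights is None:
        weights = [1.0 / len(experts)] * len(experts)
    n_bins = len(experts[0])
    return [sum(w * p[i] for w, p in zip(weights, experts)) for i in range(n_bins)]


def prob_hr_below_one(probs: List[float]) -> float:
    """Total probability in bins whose upper edge is <= 1, i.e. HR < 1."""
    return sum(p for p, upper in zip(probs, BIN_EDGES[1:]) if upper <= 1.0)


if __name__ == "__main__":
    expert_chips = [
        [2, 6, 8, 3, 1],  # hypothetical optimistic expert
        [0, 2, 6, 8, 4],  # hypothetical sceptical expert
    ]
    pooled = linear_pool([chips_to_probs(c) for c in expert_chips])
    print([round(p, 3) for p in pooled])
    print(round(prob_hr_below_one(pooled), 3))
```

The pooled bin probabilities can be drawn directly as the kind of simple histogram recommended above; passing unequal `weights` would implement the weighting schemes whose value, as noted, remains unevaluated.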

3.4 Discussion

This systematic review summarizes methods of belief elicitation for use in a Bayesian analysis. The validity, reliability and responsiveness of these methods have not been adequately evaluated, and the paucity of data limits identification of the best method based on the principles of measurement science. With the increasing use of Bayesian analysis in clinical research(4), evaluation of the measurement properties of elicitation methods is required for researchers to be confident that the methods meet methodologic standards. In particular, evaluation of the validity and reliability of the methods is needed. If belief elicitation is to be used in a longitudinal setting where new information is gained over time, research on the responsiveness of the methods is warranted. Through review of the literature, we have developed a conceptual framework outlining the process by which experts formulate beliefs about treatment effects and the process by which investigators may elicit those beliefs. We have also identified potential biases that may threaten the validity, reliability and responsiveness of the elicited belief, and incorporated these findings into the conceptual framework. Conceptual frameworks are increasingly being used to guide our thinking.(144) This framework is meant to lay a foundation on which to synthesize existing knowledge about the belief elicitation process. It is not meant to be static, but rather to be modified as additional insights are gained. We summarize pragmatic methodologic strategies to reduce the effect of potential biases until comparative validity, reliability and responsiveness studies are conducted. Strategies to minimize bias can be implemented at each stage of the elicitation procedure. In an attempt to be comprehensive, we included all studies that elicited beliefs in a Bayesian context.
While some studies elicited prior beliefs and then combined them with new data in a fully Bayesian analysis, other studies did not. For example, Bergus et al. evaluated the diagnostic clinical reasoning of family physicians by comparing their elicited probabilities of different diagnoses with Bayesian-derived probabilities.(104) This study was conducted in a Bayesian context, but did not use the elicited beliefs in a Bayesian analysis.
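For readers less familiar with what a "Bayesian-derived" diagnostic probability is, the following minimal sketch applies Bayes' theorem in its odds form with a likelihood ratio: post-test odds equal pre-test odds multiplied by the likelihood ratio. The pre-test probability and likelihood ratio are hypothetical and are not taken from Bergus et al., whose exact procedure may differ.

```python
def post_test_probability(pre_test_prob: float, likelihood_ratio: float) -> float:
    """Bayes' theorem in odds form: post-test odds = pre-test odds x likelihood ratio."""
    if not 0.0 < pre_test_prob < 1.0:
        raise ValueError("pre-test probability must be strictly between 0 and 1")
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    # Hypothetical: 20% pre-test probability, positive finding with a likelihood ratio of 4.
    print(round(post_test_probability(0.20, 4.0), 3))  # 0.5
```

A physician's elicited post-test probability can then be compared with this calculated value, which is the sense in which elicited beliefs were assessed against a Bayesian benchmark rather than used as priors.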

Future investigators are reminded that the term "probability elicitation" has been used in the literature with two different meanings.(28, 144) In Bayesian inference, subjective probabilities are not uncertain and are not estimated; a probability is stated and used to describe one's uncertainty. However, probability elicitation is also used to estimate proportions or frequencies.(144) For example, investigators may ask participants for their probability of being struck by lightning when they are actually asking for an estimate of the proportion of individuals who are struck by lightning. Eliciting only a single probability for such an event does not allow the respondent to express uncertainty. Using a Bayesian paradigm, investigators could elicit both an estimate of this proportion and the individual's uncertainty about this proportion.

One area of uncertainty is the number of participants required for a belief elicitation study.(86, 87) We found that the median sample size of belief elicitation studies was 11 participants. Some investigators have advocated the inclusion of more than one expert(86, 87), as groups of experts are thought to perform better than the average solitary expert.(103, 136) A group of participants is less likely to be dominated by a radical opinion.(87) The number of experts to include in a study is also constrained by the cost of information (time(103, 139), administration(97), personnel(139)). Indeed, adding an expert whose beliefs are identical to those of an expert already elicited does not widen the range of beliefs collected in the study.(98) The correct method of sampling experts is also uncertain. The selection of a group of experts to participate in a belief elicitation study is intended to yield some knowledge about the population of experts, since it may not be possible to study the whole population. One option is simple random sampling. However, experts are not likely to be statistically independent.
It may be preferable to include experts chosen non-randomly (e.g., purposive expert sampling) so as to capture the range of opinions in the target population.(145) Software for belief elicitation has been developed(96, 141), and some studies have been computer assisted.(96, 103, 116, 117) This has the advantage of instant graphical presentation of the elicited belief. However, these technologies have been criticized for their lack of usability and intuitiveness(141), which is likely to be related to the specific software in question. Computer-assisted elicitation studies have been performed one-on-one; Internet-based, computer-assisted belief elicitation surveys may be an option for future studies.

Evaluation of the validity of a belief elicitation method for Bayesian priors is challenged by the lack of a true objective probability that represents subjective uncertainty about a fixed, unknown quantity. In the psychology literature, studies have measured the calibration of elicited distributions against true values verified by the investigator (e.g., the population of a country, dates of historical events, meanings of words).(146) The participants in these studies are usually non-experts (e.g., university students, League of Women Voters).(146) The use of these calibration methods in studies evaluating the probability of an intervention's treatment effect is limited, as the true treatment effect is not known. Pre-existing clinical trials or observational studies may provide estimates of the treatment effect, but the truth remains unknown. Where the gold standard is not known, an alternative is to evaluate construct validity. For example, one study compared intensive care unit physicians' judgments of patients' probability of survival with probabilities generated by a logistic model derived from the APACHE II illness severity index.(147) The physicians showed greater discrimination than the model in identifying patients who were likely to die.(144, 147) Whether it is better to include experts or non-experts remains controversial, although the results of this review suggest that including clinical experts rather than generalists in an elicitation procedure improves the validity and reliability of the elicited beliefs. Whether prior beliefs should be included in a Bayesian analysis is also controversial. Proponents of the empirical Bayesian approach do not use information external to the data at hand.
We argue that the fully Bayesian approach, whether priors are informative or vague, more closely approximates true medical practice. Often, there is no published evidence available to guide physicians in making a diagnosis, a prognosis or a decision to institute a therapy. In these settings, clinicians use other sources of knowledge (education, experience, expert opinion) to guide their beliefs. The fully Bayesian approach allows these beliefs to be quantified and incorporated into statistical models. The onus remains on clinical investigators to use belief elicitation methods that have demonstrable methodologic rigor. In addition, Hiance et al. have demonstrated that elicitation of prior beliefs is not only feasible, but also provides insight into the variability of experts' beliefs.(148) Considering a variety of prior distributions allows the posterior distributions held by all types of readers to be approximated.(148) They suggest that elicitation from a set of experts should be considered as part of the design of future trials.(148)

By summarizing the methods that have been applied for belief elicitation, reviewing what is known about the measurement properties of each method, developing a conceptual framework for the belief elicitation process and identifying pragmatic methodologic strategies to reduce the effect of bias, we have synthesized the current state of knowledge for clinical researchers. This study lays the necessary groundwork for future research by highlighting areas that need investigation. By using measurement properties as criteria to assess the utility of belief elicitation methods, we are rising to the challenge of using disciplined research methodology(89) when applying the Bayesian paradigm to clinical trials. Our ability to compare the identified elicitation methods is limited by the paucity of data evaluating their measurement properties. It should be noted that, in the majority of the studies, evaluating the methodologic properties of the elicitation method was not the investigators' intent. Furthermore, the investigators may not have considered such an evaluation necessary. In an era of evolving and more rigorous methodologic standards(91), evaluation of the measurement properties of the methods is needed, and will provide objective criteria by which to judge the comparative utility of the various methods.
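The idea that a variety of priors approximates the posteriors of different readers can be sketched with a normal approximation on the log hazard ratio scale. This is an illustrative assumption, not the analysis of Hiance et al. or of this thesis: each hypothetical prior (skeptical, enthusiastic, vague) is combined with the same hypothetical estimate by a precision-weighted conjugate update, so readers starting from different priors end with different posteriors from the same data.

```python
import math

def normal_posterior(prior_mean, prior_sd, data_mean, data_se):
    """Conjugate normal update: precision-weighted average of prior and data."""
    w_prior = 1.0 / prior_sd ** 2
    w_data = 1.0 / data_se ** 2
    post_var = 1.0 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * data_mean)
    return post_mean, math.sqrt(post_var)

# Hypothetical observed estimate: hazard ratio 0.8, SE 0.25 on the log scale.
data_log_hr, data_se = math.log(0.8), 0.25

# A hypothetical "community" of priors on the log hazard ratio: (mean, SD).
priors = {
    "skeptical": (0.0, 0.15),               # centered on no effect
    "enthusiastic": (math.log(0.7), 0.15),  # centered on a 30% hazard reduction
    "vague": (0.0, 10.0),                   # nearly flat
}

for name, (mean, sd) in priors.items():
    post_mean, post_sd = normal_posterior(mean, sd, data_log_hr, data_se)
    lo, hi = post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd
    print(f"{name:>12}: posterior HR {math.exp(post_mean):.2f} "
          f"(95% CrI {math.exp(lo):.2f} to {math.exp(hi):.2f})")
```

With the vague prior the posterior essentially reproduces the data; the skeptical prior pulls the estimate toward no effect.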
3.5 Conclusion
This systematic review of the literature summarizes methods of belief elicitation for a Bayesian analysis. The measurement properties of the methods have not been adequately evaluated, and further evaluation of the validity, reliability and responsiveness of elicitation methods is needed. Until comparative studies are performed, methodologic strategies to reduce the effect of bias on the validity and reliability of the elicited belief should be used. Based on the results of this systematic review, we recommend the following strategies: sample from groups of experts; use clear instructions and a standardized script; provide examples and/or training exercises; avoid scenarios or anchoring data; ask participants to state the baseline rate in untreated patients; provide feedback and an opportunity to revise the response; and use simple graphical methods.

Acknowledgements
Dr. Sindhu Johnson has been awarded a Canadian Institutes of Health Research Phase 1 Clinician Scientist Award. Dr. Brian Feldman is supported by a Canada Research Chair in Childhood Arthritis.

Table 3.1 Summary of study characteristics (N = 33)

Study characteristic                                  Number (%)
ARTICLE
  Methodological                                      4 (12%)
  Applied                                             26 (79%)
  Both methodological and applied                     3 (9%)
STUDY DESIGN
  Study design
    Cross-sectional study                             30 (91%)
    Longitudinal study                                2 (6%)
    Not applicable                                    1 (3%)
  Level of elicitation
    Individual                                        32 (97%)
    Small group                                       0 (0%)
    Not applicable                                    1 (3%)
  Use of consensus methods                            4 (12%)
SAMPLE
  Sample size, median (range)                         11 (1-298)*
QUESTIONNAIRE
  Format
    Paper                                             17 (52%)
    Computer                                          7 (21%)
    Combined                                          1 (3%)
    Other                                             3 (9%)
    Not specified                                     5 (15%)
  Administration
    In person                                         19 (58%)
    Telephone                                         2 (6%)
    Mail                                              7 (21%)
    Combined                                          1 (3%)
    Not specified                                     3 (9%)
    Not applicable**                                  1 (3%)
  Response rate
    Rate, median (range)                              100% (50%-100%)***
    Not specified                                     10 (30%)
ANALYSIS
  Level of analysis
    Point estimate with group level variation         8 (24%)
    Point estimate with individual level variation    19 (58%)
    Other                                             6 (18%)
MEASUREMENT PROPERTIES****
  Consideration of validity                           21/33 (64%)
  Consideration of reliability                        8/33 (24%)
  Consideration of responsiveness                     4/33 (12%)
  Consideration of feasibility                        18/33 (55%)

* Excluding studies where n = 0 or not specified
** Belief elicitation was conducted in hypothetical participants
*** Excluding studies where n = 0 or 1
**** Each measurement property may occur more than once

Table 3.2 Summary of elicitation methods

Errington 1991, Abrams 1994
  Question: a) Express your belief about neutron therapy compared to an expected 12-month failure rate of 50% in the photon arm of the trial.
  Response option: Given 20 counters, place 2 of them at the upper and lower limits of belief. Place the remaining 18 counters so as to express the remaining prior beliefs about the neutron failure rates.

Bergus 1995
  Question: a) Estimate the probability of 3 diagnostic alternatives. b) Given additional information, give the post-test probability of 3 diagnostic alternatives. c) Estimate the false negative rate and true negative rate of a normal CT scan. d) Give final probability estimates for the 3 diagnoses.
  Response option: Specify values

Chaloner 1993, Carlin 1993 (modified from Freedman 1983)
  Question: a) Estimate the probability of experiencing toxoplasmosis within 2 years of treatment on placebo, clindamycin and pyrimethamine, respectively. b) Guess the upper and lower quartiles of the probability's distribution.
  Response option: Probability on placebo = X%; probability on clindamycin = Y%; probability on pyrimethamine = Z%

Chaloner 1996 (modified from Chaloner 1993)
  Question: a) What is your best guess of the percentage of people assigned to the daily TMS group who will experience PCP 2 years after enrollment? b) Think about the people on the thrice weekly arm and think about an interval estimate for what you would expect for the percentage of people on the thrice weekly TMS arm who will experience PCP in two years, given that the proportion experiencing PCP on the daily TMS arm is what you guessed. Please specify the interval by an upper and lower number within which you think that the percentage of people experiencing PCP on the 3 times a week arm will lie.
  Response option: a) X% b) Y% and interval

Chaloner 2001 (modified from Chaloner 1996)
  Question: a) What is your estimate of the percent of subjects randomized to daily TMS who will experience PCP during the two years after entry? b) What is your estimate of the percent of subjects randomized to thrice weekly TMS who will experience PCP during the two years after entry? c) Write down the difference between the two estimated percents. d) What is your estimate of the 95% probability interval of this difference?
  Response option: a) X% b) Y% c) X% - Y% d) 95% probability interval from ___ to ___

De Vet 1993
  Question: State belief about the hypothesis "A high intake of betacarotene protects against cervical cancer."
  Response option: 10 cm VAS, 0-100%

Dumouchel 1998
  Question: a) Specify parameters to be assessed and the range for each parameter. b) Specify the log relative risk and uncertainty.
  Response option: Specify values

Evans 2002
  Question: a) 40% of students are in the Engineering faculty. What is the probability that a member of the Drama society is also in the Engineering faculty?
  Response option: a) X%

Freedman 1983, 1986, Spiegelhalter 1993**
  Question: a) What is the most likely level of improvement to be gained from thiotepa? b) Choose upper and lower bounds that are very unlikely to be exceeded. c) Define "very unlikely". d) Estimate the chance of exceeding intermediate points.
  Response option: a) Point estimate; b) to d) sketch graph

Flournoy 1994
  Question: a) Sketch a 95% probability interval for the dose-response curve.
  Response option: Graph with probability of death (0-100%) on the vertical axis and medication dose (mg/kg) on the horizontal axis

Garthwaite 1991
  Question: a) Specify name and range for independent variables. b) Estimate experimental error. c) Estimate parameters.
  Response option: Specify values

Garthwaite 1992
  Question: a) Specify name and range for independent variables. b) Estimate experimental error. c) Estimate parameters.
  Response option: Specify values

Gustafson 2003
  Question: a) Suppose you were asked to predict whether a project would be successfully implemented. You can ask me any question you want about the project and I will find the answer for you. What questions would you ask of me? b) Please give me examples of answers that would make you optimistic and pessimistic about the chances of success. c) Estimate the prior probability of the implementation's success using an estimate-talk-estimate approach.
  Response option: Specify parameters and estimates

Hughes 1991 (based on Spiegelhalter 1986)
  Question: a) Define the lower and upper extremes of belief in relative reduction/increase in mortality. b) Place an adhesive dot above the most likely value and then add 19 stickers to indicate beliefs for the outcome of the trial.
  Response option: Graph with adhesive dots simulating a histogram

Hutton 1993
  Question: a) Estimate the minimum, lower quartile, median, upper quartile and maximum prevalence of child abuse in children under the age of 10.
  Response option: Specify values

Johnson 2006
  Question: a) Please give your best estimate of the relative probability of pregnancy in the 6 months following a lipiodol hysterosalpingogram, compared to no intervention (probability of pregnancy with no intervention = 1.0). b) Please give 95% confidence limits to this estimate. c) What is the minimum relative probability of pregnancy following a lipiodol hysterosalpingogram that would justify, in your opinion, this being used as a standard for some women with unexplained infertility?
  Response option: a) Relative probability = X b) Lower limit = Y, upper limit = Z c) Relative probability

Jones 1998 (based on De Vet 1993)
  Question: Estimate degree of belief that magnesium sulfate is effective in eclampsia before and after publication of trial results.
  Response option: Linear analogue 10 cm scale

Kadane 1980
  Question: a) Identify factors associated with fatigue cracking. b) Estimate the predictive distribution of the dependent variable given fixed values of the independent variables.
  Response option: Specify values

Kadane 1986, 1994
  Question: a) In a patient with this set of characteristics, which therapy would you choose? b) In a patient with this set of characteristics, estimate the median, 75th and 90th percentile of the dependent variable on each therapy.
  Response option: a) X or Y

Kadane 1992
  Question: a) How did you vote in the first ballot? b) What was the distribution of the votes on the first ballot?
  Response option: Specify values

Kadane 1998
  Question: a) Estimate the prior mean. b) Estimate the degrees of freedom parameter. c) Specify the range of each of the covariates. d) Specify the 50th, 75th and 90th percentiles of y for each vector x.
  Response option: Specify values

Lehmann 2000
  Question: a) Specify the mean difference between 2 therapies and a 95% Bayesian confidence interval.
  Response option: Specify values

Li 2005
  Question: a) What is your guess of the percentage of the 758 first words in this particular edition of Of Human Bondage that have six or more letters? b) Imagine you were allowed to draw a sample of 10 randomly selected first words out of 758 pages. What weight (in decimal numbers) do you assign to a random sample of 10? c) What weight do you assign to the data if you were allowed to randomly select a larger sample of 50 pages from a total of 758?
  Response option: a) The percentage is X%. b) My weight placed on a sample of 10 is ___. c) My weight placed on a sample of 50 is ___.

Lilford 1994
  Question: a) What is the relative risk of permanent morbidity likely to be in a hypothetical and infinitely large randomized trial of similar patients? b) What would you consider a surprisingly good or bad result in a hypothetical trial?
  Response option: Analogue dial (1 = no difference between immediate delivery, 0.5 = chance of morbidity is halved by immediate delivery, 2 = chance of morbidity is doubled)

Lilford 1996
  Question: a) Estimate relative risk. b) Estimate a 95% credible interval for the relative risk.
  Response option: Specify values

O'Hagan 1998
  Question: a) Specify upper (U) and lower (L) bounds for a quantity. b) Specify the mode (M) (the most likely value). Give probabilities for the following intervals: c) L to M; d) L to (L+M)/2; e) (M+U)/2 to U; f) L to (L+3M)/4; g) (3M+U)/4 to U.
  Response option: Specify values

Parmar 1994, 2001
  Question: We are interested in your expectations of the difference in 2-year survival rates, which might result from employing CHART rather than the standard radical radiotherapy for eligible patients. Enter your weight of belief in each of the possible intervals. The stronger you believe that the difference will truly lie in a given interval, the greater should be your weight for that interval. If you believe that it is impossible that the difference lie in a given interval, your weight should be zero. Your weights should add up to 100.
  Response option: X% entered in boxes

Ramachandran 2001
  Question: Specify the distribution, mean and relative standard deviation, or lower and upper bound of the distribution, for each parameter.
  Response option: Specify values

Tan 2003 (modified from Parmar)
  Question: We are interested in your expectations of the difference in 2-year survival rates, which might result from employing treatment X rather than the standard Y for eligible patients. Enter your weight of belief in each of the possible intervals. The stronger you believe that the difference will truly lie in a given interval, the greater should be your weight for that interval. If you believe that it is impossible that the difference lie in a given interval, your weight should be zero. Your weights should add up to 100.
  Response option: X% entered in boxes

Ten Centre Study Group 1987
  Question: a) Estimate the percentage reduction in mortality of artificial surfactant in babies of 25 to 29 weeks gestation.
  Response option: Specify values

Van der Wilt 2004, Rovers 2005
  Question: Estimate the probability of complete hearing recovery and normal language recovery within a year, in a situation without treatment and in a situation with ventilation tube insertion.
  Response option: VAS (10 cm), 0-100%

White 2005 (modified from Parmar)
  Question: We are interested in your expectations of the difference in rates of death or hospitalization, which might result from employing treatment X rather than the standard Y for eligible patients. Enter your weight of belief in each of the possible intervals. The stronger you believe that the difference will truly lie in a given interval, the greater should be your weight for that interval. If you believe that it is impossible that the difference lie in a given interval, your weight should be zero. Your weights should add up to 100. Suppose the annual event rate on placebo is 18%; what is your expectation for the annual event rate on X?
  Response option: X% entered in boxes

Winkler 1967
  Cumulative distribution function
    Question: a) What is the probability that a random student at the university is male? b) Can you determine a point such that it is equally likely that p is less than or greater than this point? c) Now suppose that you were told that p is less than I2. Determine a new point such that it is equally likely that p is less than or greater than this point. d) Now suppose that you were told that p is less than I3. Determine a new point such that it is equally likely that p is less than or greater than this point.
    Response option: (a) p = A% (b) I2 = B% (c) I3 = C% (d) I4 = D%
  Probability density function
    Question: a) What do you consider the most likely value of p? b) Can you determine 2 values of p (one on each side of p) which are about half as likely as the value in a)? c) Can you determine a point such that ½ the area under the graph of the density function is to the left of the point and ½ is to the right? d) Such that ¼ of the area is to the left of the point and ¾ is to the right? e) Such that ¾ of the area is to the left of the point and ¼ is to the right? f) Such that 1/100 of the area is to the left of the point and 99/100 is to the right? g) Such that 99/100 of the area is to the left of the point and 1/100 is to the right?

Questions have been paraphrased for space.
** Used same method.
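Several methods in Table 3.2 (e.g., Chaloner 1993, Hutton 1993) elicit a median together with quartiles. A minimal sketch of turning such answers into a prior, assuming a normal distribution is acceptable, uses the fact that the quartiles of a normal distribution lie about 0.6745 standard deviations either side of the mean. The helper `normal_from_quartiles` and the numbers are hypothetical. A normal prior on a percentage can put probability below 0% or above 100%, so a bounded family such as a beta distribution may be preferable in practice.

```python
from statistics import NormalDist

def normal_from_quartiles(q1, median, q3):
    """Fit a normal distribution to an elicited median and quartiles.

    The SD is the interquartile range divided by 2 * z(0.75). Any asymmetry in
    the answers is averaged away, which is a limitation of this approach.
    """
    if not q1 < median < q3:
        raise ValueError("expected q1 < median < q3")
    z75 = NormalDist().inv_cdf(0.75)  # about 0.6745
    sd = (q3 - q1) / (2 * z75)
    return NormalDist(mu=median, sigma=sd)

# Hypothetical answers: probability (%) of the outcome on placebo.
prior = normal_from_quartiles(q1=20, median=30, q3=40)
print(round(prior.stdev, 2))            # 14.83
print(round(prior.inv_cdf(0.05), 1))    # implied 5th percentile of the prior
```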

Table 3.3 Summary of studies that considered validity, reliability, responsiveness and feasibility

Authors | Validity | Reliability | Responsiveness | Feasibility
Errington 1991, Abrams 1994 | NS* | NS | NS | NS
Bergus 1995 | Commented | NS | Commented | NS
Chaloner 1993, Carlin 1993 | Commented | NS | NS | Commented
Chaloner 1996 | Commented | Commented | NS | Commented
Chaloner 2001 | Commented | NS | NS | Commented
De Vet 1993 | Commented | NS | Commented | NS
Dumouchel 1998 | Commented | NS | NS | Commented
Evans 2002 | NS | NS | NS | NS
Freedman 1983, Spiegelhalter 1986, 1993 | Commented | NS | NS | Commented
Flournoy 1994 | NS | NS | NS | Commented
Garthwaite 1991 | Commented | Commented | NS | Commented
Garthwaite 1992 | NS | NS | NS | Commented
Gustafson 2003 | Literature review to ensure content validity; concurrent validity: correlation coefficient 0.77 | Commented | NS | Commented
Hughes 1991 | NS | NS | NS | NS
Hutton 1993 | NS | NS | NS | NS
Johnson 2006 | Commented | NS | NS | Evaluated
Jones 1998 | NS | NS | Commented | Commented
Kadane 1980 | NS | NS | NS | NS
Kadane 1986, 1994 | NS | Commented | NS | Commented
Kadane 1992 | NS | NS | NS | NS
Kadane 1998 | Commented | Commented | NS | NS
Lehmann 2000 | Commented | NS | NS | Commented
Li 2005 | Poor accuracy, calibration <30% for 80% confidence | Intra-rater reliability: correlation coefficient 0.63 | NS | NS
Lilford 1994 | NS | NS | NS | NS
Lilford 1996 | NS | NS | NS | NS
O'Hagan 1998 | Commented | NS | NS | Commented
Parmar 1994, 2001 | Commented | NS | NS | Commented
Ramachandran 2001 | Criterion validity (R²) | Inter-rater reliability: R² = 0.9 | NS | NS
Tan 2003 | NS | NS | NS | Commented
Ten Centre Study Group 1987 | NS | NS | NS | NS
Van der Wilt 2004, Rovers 2005 | Commented | Commented | Commented | NS
White 2005 | Commented | NS | NS | Commented
Winkler 1967 | Concurrent validity: 2 methods were consistent 65/75 of the time | NS | NS | Commented

*NS: Not specified

Table 3.4 Biases in belief elicitation and methodologic strategies to reduce their effect

Identification of the sample
  Substantive goodness: knowledge of the clinical context.(131) Participants with more contextual experience provide more valid and reliable quantitative descriptions of their belief.(131, 136, 137)
    Strategy: Include experts
  Overconfidence may bias the validity of the elicited belief when some clinicians provide very little uncertainty around their estimate, corresponding to strong beliefs(28, 109), and do not reflect realistic doubt.(138)
    Strategy: Include experts; sample size greater than 1
  Representativeness bias may occur when clinicians give more credence to study findings that conform to what they believe the results should look like.(128)
    Strategy: Include representation of the spectrum of belief
  Conservatism may occur when clinicians confer less certainty on their belief than is justified by the data.(111)
    Strategy: Include representation of the spectrum of belief
  Believability: clinicians are more likely to be influenced by study findings that are concordant with their preconceived beliefs about the disease process or treatment effect.(128)
    Strategy: Include representation of the spectrum of belief

Framing the question stem
  Normative goodness: knowledge of probability and statistics.(131) Participants with more mathematical experience provide more valid and reliable quantitative descriptions of their belief.(131, 136, 137)
    Strategy: Provide an example(4, 26, 38) or training exercise(99, 111)
  Ease of use, clarity
    Strategy: Use clear instructions(133) and/or a standardized script(112)
  Anchoring bias: the reported belief is influenced by the presentation of data or a scenario.(103)
    Strategy: Avoid scenarios or summaries of data
  Ordering: some participants' probability estimates are influenced by data presented at the beginning of the question stem (primacy effect), while others are influenced by data presented at the end of the question stem (recency effect).(103)
    Strategy: Avoid scenarios or summaries of data, or scramble the sequence of data presentation between participants(102)

Choice of response option
  Normative goodness
    Strategy: Provide feedback, verification and an opportunity for revision(102, 113, 135)
  Base-rate neglect: occurs when participants fail to take into account the prevalence of the outcome among untreated patients.(106, 128)
    Strategy: State the baseline rate or outcome in untreated patients(106)

Summarizing the data
  Over-optimism, overconfidence
    Strategy: Use averaging methods for the group clinical prior(138)
  Normative goodness
    Strategy: Use simple figures
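The strategy "use averaging methods for the group clinical prior" can be sketched as a linear opinion pool, which is one common averaging method; whether it matches the method of the cited study is an assumption. Each expert's bins-and-chips response is expressed as probabilities over a shared set of bins, and the group prior is their (optionally weighted) average. The function name `pool_linear` and the expert data are hypothetical.

```python
def pool_linear(distributions, weights=None):
    """Linear opinion pool: weighted average of experts' bin probabilities."""
    n_bins = len(distributions[0])
    if any(len(d) != n_bins for d in distributions):
        raise ValueError("all experts must use the same bins")
    if weights is None:
        # Equal weight for every expert unless told otherwise.
        weights = [1.0 / len(distributions)] * len(distributions)
    return [sum(w * d[i] for w, d in zip(weights, distributions))
            for i in range(n_bins)]

# Three hypothetical experts over 4 coarse bins; each row sums to 1.
experts = [
    [0.10, 0.60, 0.30, 0.00],
    [0.00, 0.20, 0.50, 0.30],
    [0.25, 0.25, 0.25, 0.25],
]
print([round(p, 3) for p in pool_linear(experts)])  # [0.117, 0.35, 0.35, 0.183]
```

Because the pool is an average, it keeps the spread of opinion across experts rather than collapsing it to a single confident expert.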

Figure 3.1. Flow diagram of systematic review results
473 citations identified: Medline n = 45, Embase n = 41, CINAHL n = 2, Health and Psychosocial Instruments n = 0, Current Index to Statistics n = 17, MathSciNet n = 181, Zentralblatt Math n = 125, plus additional articles identified from screening of reference lists, identification in Science Citation Index and book chapters
Citations excluded on screening of titles and abstracts for relevance to the study, or as duplicate citations
118 papers for full review
74 articles excluded after full review: did not elicit beliefs n = 34, methodologic papers not relevant to study aim n = 38, elicitation method not specified n = 2, elicited belief but not in a Bayesian context n = 3
41 articles included in the study
8 articles report previously published data
33 unique methods for elicitation in a Bayesian framework

Figure 3.2 Example of a bins and chips belief elicitation method. [Instruction to participants: "You have been given 20 stickers. Each sticker represents 5% probability. Placing the stickers in the intervals, indicate the weight of belief for your survival estimates." Below the instruction is a probability axis from 0% to 100% in 5% intervals, with example stickers marked X.]
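A minimal sketch of how a response like Figure 3.2 can be read as a discrete probability distribution: with 20 stickers, each sticker adds 5% probability to the bin where it is placed. The chip placement is hypothetical, and treating each bin as a single survival value is a simplifying assumption; the helper names are also hypothetical.

```python
def chips_to_distribution(chip_counts, bin_values, total_chips=20):
    """Convert chip counts per bin into bin probabilities (each chip = 1/total)."""
    if sum(chip_counts.values()) != total_chips:
        raise ValueError(f"expected {total_chips} chips")
    return {b: chip_counts.get(b, 0) / total_chips for b in bin_values}

def mean_and_sd(dist):
    """Mean and SD of a discrete distribution given as {value: probability}."""
    mean = sum(v * p for v, p in dist.items())
    var = sum(p * (v - mean) ** 2 for v, p in dist.items())
    return mean, var ** 0.5

bins = list(range(0, 101, 5))                # 0%, 5%, ..., 100% survival
chips = {30: 2, 35: 4, 40: 8, 45: 4, 50: 2}  # hypothetical participant, 20 chips
dist = chips_to_distribution(chips, bins)
m, sd = mean_and_sd(dist)
print(f"mean = {m:.1f}%, SD = {sd:.1f}%")    # mean = 40.0%, SD = 5.5%
```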

Figure 3.3 Biases affecting the validity of belief elicitation. [Diagram; labelled elements: truth, clinical experience, research, belief, group clinical prior, elicitation procedure (individual, method), graphical display, quantified prior; biases and factors: conservatism, base-rate neglect, believability factor, representativeness, sample size (?), anchoring bias (primacy effect, recency effect), substantive goodness, normative goodness.]

CHAPTER 4
A Valid and Reliable Belief Elicitation Method for Bayesian Priors

PUBLICATION 2
Johnson SR, Tomlinson GA, Hawker GA, Granton JT, Grosbein HA, Feldman BM. A valid and reliable belief elicitation method for Bayesian priors. J Clin Epidemiol 2010;63(4):
Printed with permission, Elsevier Limited.

A valid and reliable belief elicitation method for Bayesian priors.
Sindhu R. Johnson MD FRCPC 1,5, George A. Tomlinson PhD 5,6,7, Gillian A. Hawker MD MSc FRCPC 1,3,5, John T. Granton MD FRCPC 2, Haddas A. Grosbein B.Sc. 4, Brian M. Feldman MD MSc FRCPC 1,4,5,6.
Divisions of 1 Rheumatology, 2 Respirology and Critical Care Medicine, University Health Network, Toronto, Ontario, Canada; 3 Women's College Hospital, Toronto, Ontario, Canada; 4 The Hospital for Sick Children, Toronto, Ontario, Canada; 4 Departments of Paediatrics, 5 Health Policy Management & Evaluation, and 6 Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada; 7 Division of Clinical Decision Making and Health Care, Toronto General Research Institute, Toronto, Ontario, Canada.
Corresponding Author: Sindhu Johnson MD, Division of Rheumatology, Ground Floor, East Wing, Toronto Western Hospital, 399 Bathurst Street, Toronto, Ontario, Canada, M5T 2S8. Email: Sindhu.Johnson@uhn.on.ca
Key words. Belief Elicitation, Bayesian, Validity, Reliability, Priors
Running Title. Valid and reliable belief elicitation method
Word Count. 4,250

ABSTRACT
Objective. Bayesian inference has the advantage of formally incorporating prior beliefs about the effect of an intervention into analyses of treatment effect through the use of prior probability distributions, or priors. Multiple methods to elicit beliefs from experts for inclusion in a Bayesian study have been utilized; however, the measurement properties of these methods have been infrequently evaluated. The objectives of this study are to evaluate the feasibility, validity, and reliability of a belief elicitation method for Bayesian priors.
Study design and setting. A single center, cross-sectional study using a sample of academic specialists who treat pulmonary hypertension patients was conducted to test the feasibility, face and construct validity, and reliability of a belief elicitation method. Using this method, participants expressed the probability of 3-year survival with and without warfarin. Applying adhesive dots or chips, each representing 5% probability, in bins on a line, participants expressed their uncertainty and weight of belief about the effect of warfarin on 3-year survival.
Results. Of the 12 participants, 11 (92%) reported that the belief elicitation method had face validity, 10 (83%) found the questions clear and 11 (92%) found the response option easy to use. The median time to completion was 10 minutes (5-15 minutes). Internal validity testing found moderate agreement (weighted kappa = ). The intra-class correlation coefficient for test-retest reliability was
Conclusion. This method of belief elicitation for Bayesian priors is feasible, valid and reliable. It can be considered for application in Bayesian clinical studies.

What is new?
Key finding: This belief elicitation method is feasible, valid and reliable.
What this adds to what was known: We present a pragmatic strategy for quantifying clinicians' beliefs for a Bayesian study, which fulfills the rigors of measurement science.
What should change now: This method of belief elicitation can be considered for application in Bayesian clinical studies.

4.1 Background
The use of Bayesian inference is increasingly common in the medical literature.(4) It has the advantage of formally incorporating prior beliefs about the effect of an intervention into analyses of treatment effect through the use of prior probability distributions, or priors.(86) Although studies have used a variety of methods to elicit beliefs from experts for inclusion in a Bayesian study, the measurement properties (validity and reliability) of these methods have been infrequently evaluated.(50) Critics of the Bayesian paradigm have called for the development of realistic priors balanced with the use of disciplined research methodology.(89)

One method of eliciting a quantified belief about a treatment effect from clinicians is the method of bins and chips.(125) In this method, participants are asked to define the lower and upper extremes of their belief in the effect of an intervention (for example, the relative reduction or increase in mortality). They place an adhesive dot above the most likely value and then add 19 dots to indicate other values they believe possible for the size of the effect, creating a graph of adhesive dots that resembles a histogram.(125) Other investigators have used a similar method, in which experts indicate their weight of belief about the effect of a treatment by distributing 100 percent probability across one or more pre-defined discrete intervals.(120, 122, 126) Investigators can then construct a prior probability distribution reflecting the participants' beliefs.
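One way to turn a binned belief into a continuous prior for a survival probability, among several and not necessarily the one used in this thesis, is to match the mean and variance of a beta distribution. For a beta distribution with mean m and variance v, α + β = m(1 − m)/v − 1, so α = m(α + β) and β = (1 − m)(α + β). The function name `beta_from_bins` and the bin values are hypothetical.

```python
def beta_from_bins(bin_midpoints, probs):
    """Moment-match a Beta(alpha, beta) prior to a binned belief on (0, 1)."""
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("bin probabilities must sum to 1")
    mean = sum(x * p for x, p in zip(bin_midpoints, probs))
    var = sum(p * (x - mean) ** 2 for x, p in zip(bin_midpoints, probs))
    # A beta distribution requires 0 < var < mean * (1 - mean).
    if not 0 < var < mean * (1 - mean):
        raise ValueError("spread is incompatible with a beta distribution")
    common = mean * (1 - mean) / var - 1  # equals alpha + beta
    return mean * common, (1 - mean) * common

# Hypothetical participant: 3-year survival belief spread over five bins.
alpha, beta = beta_from_bins([0.30, 0.35, 0.40, 0.45, 0.50],
                             [0.10, 0.20, 0.40, 0.20, 0.10])
print(round(alpha, 1), round(beta, 1))  # 31.6 47.4
```

Moment matching preserves the participant's central estimate and spread, but not finer features of the histogram such as skew or multiple peaks.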
However, our review found that the validity and reliability of the elicited prior distribution may be threatened by a number of biases.(50) Insufficient knowledge of probability and statistics may threaten the validity and reliability of the elicited response, as participants with more mathematical experience provide more valid and reliable quantitative descriptions of their belief.(132, 137, 138) Ease-of-use bias and clarity bias (related to the instructions and questions) may distort the elicited response if the participant does not understand what is being asked or how to express the required response. Anchoring bias may occur if the reported prior is influenced by the presentation of data or a scenario.(104) The ordering of the questions within the elicitation process may also bias the response, as participants' probability estimates can be influenced by how data are presented. Ordering bias can occur in two forms: the primacy effect occurs when participants are influenced by data presented at the beginning of the question, and the recency effect occurs when they are influenced by data presented at the end of the question.(104) Base-rate neglect occurs when participants fail to take into account the prevalence of the outcome among untreated patients.(107, 130)

Strategies that may reduce the effect of these biases on the validity and reliability of the elicited response include: provision of an example(119, 122) or training exercise(100, 112); use of clear instructions(134) or a standardized script(113); avoidance of scenarios or summaries of data; provision of feedback, verification and an opportunity for revision(103, 114, 129); and a statement of the baseline rate or outcome in untreated patients.(107) Based on previously used belief elicitation methods(120, 122, 125, 126) and the recommended strategies to reduce the effect of potential biases(50), we report a revised method for eliciting beliefs from experts in the form of prior distributions to be used in Bayesian analysis of clinical research. Before this method is implemented, however, its measurement properties need to be evaluated.

As a clinical example, we have chosen to study experts' beliefs about the effect of warfarin on survival in scleroderma-associated pulmonary arterial hypertension (SSc-PAH) and idiopathic pulmonary arterial hypertension (IPAH). Pulmonary hypertension is an uncommon condition characterized by high pressure in the blood vessels of the lungs. It is believed to be related, in part, to blood clots in these vessels.(149) Over time, the high pressures lead to right heart failure and death. The median survival times from diagnosis in SSc-PAH and IPAH are 12 months(61) and 2.8 years(64), respectively. There is no cure.
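The median survival times quoted above can be related to survival at a fixed time point with a back-of-the-envelope calculation. Assuming, purely for illustration, that survival is exponential (a constant hazard, which real PAH survival curves need not follow), S(t) = 0.5^(t/median). Under that assumption, 3-year survival would be 12.5% for a 1-year median and about 48% for a 2.8-year median. The function name `survival_at` is hypothetical.

```python
def survival_at(t_years, median_years):
    """S(t) under an exponential (constant-hazard) model with the given median.

    With hazard lambda = ln(2) / median, S(t) = exp(-lambda * t) = 0.5 ** (t / median).
    """
    return 0.5 ** (t_years / median_years)

for label, median in [("SSc-PAH", 1.0), ("IPAH", 2.8)]:
    print(f"{label}: S(3 years) = {survival_at(3, median):.3f}")
```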
Some experts have advocated the use of the anticoagulant warfarin to improve survival, in the belief that it will reduce the burden of blood clots. However, there are no randomized trials evaluating the effect of warfarin in these patients. Observational studies in this area provide conflicting evidence and are constrained by methodologic issues.(70) Thus, treatment recommendations are influenced by experts' beliefs about the effect of this intervention. A randomized trial is needed to definitively evaluate the efficacy of warfarin for improving survival in SSc-PAH and IPAH patients. Before such a trial is implemented, clinicians' beliefs need to be evaluated to i) capture existing knowledge in this area(86), ii) document the existence of clinical equipoise(150), and iii) estimate the magnitude of treatment effect.(86) Using the example of clinicians' beliefs about the effect of warfarin on survival in SSc-PAH and IPAH, the objectives of this study were to evaluate the feasibility, face validity, construct validity and reliability of our method of belief elicitation for a Bayesian prior.

4.2 Methods
Participants. A purposive sample(145) of academic specialists at the University of Toronto who treat SSc-PAH or IPAH patients was recruited. The following characteristics of study participants were collected: sex, specialty (rheumatology, respirology, cardiology, other), years in practice treating PAH patients, type of practice (non-teaching hospital, teaching hospital, both), number of new SSc-PAH and/or IPAH patients seen per year, country of primary PAH practice, history of formal statistical training (yes, no), and use of warfarin for the treatment of PAH patients in their practice (yes, no). Whether English was the participant's first language (yes, no) was also ascertained, as it may influence understanding of the questions.

Belief elicitation procedure. A face-to-face interview was conducted with each participant using a standardized script.

Sample questionnaire. Participants were given a sample questionnaire to illustrate the belief elicitation procedure. The sample questionnaire was identical to the study questionnaire but used a fictitious example of vitamin C as the therapeutic intervention. The investigator read each question aloud and demonstrated a sample answer. Participants were given the opportunity to ask questions about the sample questionnaire and response options. Once all questions were answered, participants proceeded to the study questionnaire. (Appendix 1)

Study questionnaire. The investigator read each question aloud.
Participants were asked, for an average group of newly diagnosed SSc-PAH patients (and, separately, for an average group of IPAH patients), to specify the probability of being alive at 3 years among patients 1) not treated with warfarin and 2) treated with warfarin. They indicated each response by placing an X on a line marked in 5% probability intervals from 0% to 100% (Figure 4.1). Participants then expressed the uncertainty around their estimate of survival among warfarin-treated patients by placing an X at the lower and upper limits of their estimate. Specifically, participants were told: "There may be some uncertainty around your estimate of survival. You may believe that the probability of survival could be a little lower or a little higher. Please indicate the lower boundary of your estimate for which you believe there is very little probability that the true estimate could be less than. Please indicate the higher boundary of your estimate for which you believe there is very little probability that the true estimate could be greater than." (Appendix 2) Participants were given a washable black marker to make their X, and could erase or cross out a response if they wished to revise it. Participants were asked to indicate the weight of their belief for the probability of 3-year survival among warfarin-treated patients by placing 0.64-cm-diameter circular adhesive dots, each representing 5% probability, in discrete interval bins. Because observational studies have suggested that the median survival without warfarin is 1 year for SSc-PAH(61) and 2.8 years for IPAH(64), we chose a 3-year reference point; 3-year survival is also a commonly used end-point in PAH survival studies.(64) Participants were given 20 dots adding up to 100% probability (Figure 4.1). To reduce the risk of inconsistency between questions 3 and 4, the investigator placed one dot in each of the two bins corresponding to the lower and upper boundaries the participant had indicated in question 3.
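The dot procedure turns each participant's answer into a discrete probability distribution. The sketch below is our illustration, not the study's software. It assumes the response line is divided into 20 bins of 5 percentage points ([0, 5), [5, 10), ..., [95, 100]) and that each of the 20 dots carries 1/20 of the probability; the requirement that the participant's 18 dots fall within the stated limits is also our assumption. The helper names (`build_histogram`, `to_prior`, `summarize`) and the example placement are hypothetical.

```python
"""Hypothetical sketch: turn an adhesive-dot placement into a discrete prior.

Assumes 20 bins of 5 percentage points each; not the study's own software.
"""
BIN_WIDTH = 5   # percentage points per bin (assumed layout)
N_BINS = 20     # bins [0,5), [5,10), ..., [95,100]
N_DOTS = 20     # each dot carries 1/20 = 5% of the probability


def build_histogram(lower_bin, upper_bin, participant_dots):
    """Return dot counts per bin.

    lower_bin, upper_bin: indices of the bins holding the participant's
    stated limits; the investigator places one dot in each.
    participant_dots: {bin_index: count} for the remaining 18 dots.
    """
    counts = [0] * N_BINS
    counts[lower_bin] += 1
    counts[upper_bin] += 1
    for b, c in participant_dots.items():
        # Assumption: the participant's dots fall within the stated limits.
        if not lower_bin <= b <= upper_bin:
            raise ValueError(f"bin {b} lies outside the stated limits")
        counts[b] += c
    if sum(counts) != N_DOTS:
        raise ValueError(f"expected {N_DOTS} dots, got {sum(counts)}")
    return counts


def to_prior(counts):
    """Each dot is worth 1/N_DOTS of the probability mass."""
    return [c / N_DOTS for c in counts]


def summarize(prior):
    """Mean, plus midpoints of the median and modal bins (all in %)."""
    mids = [BIN_WIDTH * i + BIN_WIDTH / 2 for i in range(N_BINS)]
    mean = sum(p * m for p, m in zip(prior, mids))
    cumulative, median = 0.0, None
    for p, m in zip(prior, mids):
        cumulative += p
        if cumulative >= 0.5 - 1e-9:  # tolerance for float rounding
            median = m
            break
    mode = mids[max(range(N_BINS), key=prior.__getitem__)]
    return {"mean": mean, "median": median, "mode": mode}


if __name__ == "__main__":
    # Hypothetical answer: limits in the 30-35% and 60-65% bins.
    counts = build_histogram(6, 12, {7: 2, 8: 4, 9: 6, 10: 4, 11: 2})
    print(summarize(to_prior(counts)))
```

Working in bin midpoints keeps the summaries simple; a finer or coarser grid would only change `BIN_WIDTH` and `N_BINS`.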
The investigator verified with the participant that these dots were placed correctly. The participant was then asked to place the remaining 18 dots to indicate their weight of belief. Participants were asked to review the shape and distribution of the dots and to verify whether this distribution truly reflected their belief about the effect of warfarin on survival. If not, they were given the opportunity to revise the placement of the dots until they felt it truly reflected their belief. Finally, participants were asked to categorize their belief about the overall effect of warfarin using the response options "improves survival," "worsens survival," or "no effect on survival" (Appendix 2). A 3-option response was used because it is simple and balanced, and its options are conceptually equidistant (reducing measurement error related to misunderstanding).(151) Throughout the questionnaire, participants had the opportunity to request clarification. The questionnaire was laminated so that the adhesive dots could be easily removed if a participant wished to revise their response. Upon completion, the questionnaire was re-laminated to prevent movement of the dots. The same belief elicitation method was used to elicit the beliefs of experts about the effect of warfarin in SSc-PAH and IPAH separately.

Assessment of measurement properties. Face validity evaluates whether the elicitation method appears to measure what it purports to measure.(152) The face validity of the elicitation procedure was assessed by asking the question "Do you feel that this questionnaire evaluated your belief about the effect of warfarin on survival in idiopathic and/or scleroderma-associated pulmonary arterial hypertension?" (yes or no). Feasibility refers to the ease of use of the instrument in terms of time to completion and ease of scoring. Respondents were asked to comment on the clarity of each question and the ease of use of each response option, and their time to complete the elicitation process was recorded. Construct validity evaluates the relationships between different methods of measuring the same construct, in this case the same belief. Convergent construct validity evaluates the correlation between 2 related aspects of the elicited belief, whereas divergent construct validity evaluates the ability of an elicitation method to correctly distinguish between dissimilar beliefs.(152) Convergent construct validity of the assessment of warfarin effect was evaluated by comparing responses to the question "What overall effect do you believe warfarin has on 3-year survival?"
(improves survival, worsens survival, or no effect on survival) with the effect derived by subtracting the point estimate for 3-year survival in patients not treated with warfarin from the point estimate for 3-year survival in warfarin-treated patients. Improved survival was indicated if the probability of survival with warfarin was greater than the probability of survival without warfarin; worsened survival was indicated if it was less; and no effect was indicated if the difference between the probability estimates was zero. Agreement between the two questions categorizing the effect of warfarin on survival was evaluated with a weighted kappa using Cicchetti-Allison weights.(153) Others have demonstrated that responses can differ based on how a question is phrased.(151) Thus, we expected moderate agreement (weighted kappa ) between the two questions categorizing the effect of warfarin on survival. The sample size does not allow for great precision in the estimation of validity. Sensitivity analyses were performed to evaluate the effect of varying the threshold used to categorize a difference as no effect (zero difference, or differences of less than 5% and less than 10%).

Reliability. Reliability refers to the reproducibility of a measure. Intra-rater (test-retest) reliability is evaluated when the elicitation method is applied to the same participant(s) on two different occasions, whereas inter-rater reliability is evaluated when the elicitation method is applied to different participants on the same occasion. In the context of belief measurement, inter-rater reliability is probably not important. Test-retest reliability of the elicitation procedure was tested by administering the questionnaire to participants on 2 occasions, 1 to 2 weeks apart. Of the measurement properties, reliability was considered the most important. With 2 observations for each participant and a hypothesized reliability of 0.9, a sample size of 12 was calculated to give 80% power to reject a minimally acceptable reliability of 0.6 (moderate) with alpha = 0.05.(154) The intraclass correlation coefficient (ICC) for intra-rater reliability was calculated from the point estimate for 3-year survival in warfarin-treated patients. The ICC(2,k) of Shrout and Fleiss was used to assess the reliability of k = 2 ratings, assuming that all subjects were rated by the same raters, who are a random subset of all possible raters.(155)

Data administration. All data were double entered into a computerized database. Logic and range checks were used to ensure data accuracy.

Analysis.
Descriptive statistics were used to summarize the data. Analyses were performed using SAS (version 9.2, SAS Institute, Cary, NC) and R (version 2.2.1, The R Foundation for Statistical Computing). Institutional research ethics board approval was obtained for the conduct of this study.
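The convergent-validity comparison can be sketched as follows. Each participant's two point estimates are converted to a category using a "no effect" threshold (0, 5 or 10 percentage points, mirroring the sensitivity analyses; counting differences strictly below the threshold as no effect is our reading of the text). Agreement with the directly stated category is then summarized with a weighted kappa using Cicchetti-Allison (linear) weights, w_ij = 1 - |i - j|/(k - 1). This is a minimal pure-Python illustration, not the SAS or R code used in the study. It does not compute confidence intervals, and the function names and example data are hypothetical.

```python
CATEGORIES = ("worsens", "no effect", "improves")  # ordinal order


def categorize(p_without, p_with, threshold=0.0):
    """Classify the implied effect (probabilities in %) given a no-effect threshold."""
    d = p_with - p_without
    if d == 0 or abs(d) < threshold:
        return "no effect"
    return "improves" if d > 0 else "worsens"


def cicchetti_allison_weights(k):
    """Linear agreement weights: 1 on the diagonal, falling to 0 at the extremes."""
    return [[1 - abs(i - j) / (k - 1) for j in range(k)] for i in range(k)]


def weighted_kappa(ratings_a, ratings_b, categories=CATEGORIES):
    """Weighted kappa for two categorical ratings of the same participants."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty rating lists")
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)
    # Joint proportions of (method A category, method B category)
    p = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        p[index[a]][index[b]] += 1 / n
    row = [sum(p[i]) for i in range(k)]
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]
    w = cicchetti_allison_weights(k)
    observed = sum(w[i][j] * p[i][j] for i in range(k) for j in range(k))
    expected = sum(w[i][j] * row[i] * col[j] for i in range(k) for j in range(k))
    if expected == 1:
        raise ValueError("kappa undefined: chance agreement is 1")
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Hypothetical (without, with) estimates and stated effects for 4 participants
    estimates = [(40, 50), (45, 50), (50, 50), (55, 45)]
    stated = ["improves", "no effect", "no effect", "worsens"]
    for t in (0, 5, 10):
        derived = [categorize(a, b, t) for a, b in estimates]
        print(t, round(weighted_kappa(derived, stated), 3))
```

Linear weights give partial credit when the two methods disagree by one category (for example, "no effect" versus "improves") and none when they point in opposite directions.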

4.3 Results

Participants. Twelve academic specialists were recruited for this study. They were an international group with practices in Canada, France, Spain, Thailand, Mexico and Singapore. Four participants were pursuing additional research training at the University of Toronto, and one had recently immigrated to Canada. All physicians had completed their clinical training in their respective specialties. All 12 participants completed the questionnaire related to SSc-PAH patients; the 8 non-rheumatologists also completed the questionnaire related to IPAH patients (Table 4.1).

Measurement properties. The belief elicitation procedure demonstrated excellent feasibility, moderate construct validity and excellent test-retest reliability (Table 4.2). Eighty-three percent (n = 10) of participants found the questions clear and 92% (n = 11) found the response options easy to use. The median time to completion was 10 minutes (interquartile range 5 to 15 minutes). Construct validity testing found moderate agreement (weighted kappa = 0.57 for SSc-PAH and 0.54 for IPAH; Table 4.2). Sensitivity analyses evaluating different thresholds for the definition of no effect did not substantially change this finding. When a difference of less than 5% between the probabilities of 3-year survival with and without warfarin was categorized as no effect, the weighted kappa was 0.57 (95% CI ) for SSc-PAH patients and 0.54 (95% CI ) for IPAH patients. When a difference of less than 10% was categorized as no effect, the weighted kappa was 0.49 (95% CI ) for SSc-PAH patients and 0.37 (95% CI ) for IPAH patients. In the evaluation of construct validity, box plots demonstrate a strong trend in agreement between the different methods of measuring the belief about the effect of warfarin on survival (Figure 4.2). The intraclass correlation coefficient for test-retest reliability was 0.93 for both SSc-PAH and IPAH (Table 4.2).

Elicited distributions.
For SSc-PAH patients, the median (range) probability of 3-year survival was 43% (20% to 60%) without warfarin and 45% (20% to 75%) with warfarin. For IPAH patients, the median (range) probability of 3-year survival was 53% (15% to 75%) both with and without warfarin. The 12 elicited distributions pertaining to the effect of warfarin in SSc-PAH are presented in Appendix 3. They were all unimodal. There was variability in the width of the distributions, the amount of skewness and the degree of kurtosis. None was incongruous. The group probability distributions for SSc-PAH and IPAH patients treated with warfarin are presented in Figure 4.3. Both distributions are wide. The group probability distribution for SSc-PAH is unimodal (with small secondary modes produced by averaging several narrow individual priors centered at different values). The group probability distribution for IPAH is bimodal.

4.4 Discussion

Expert opinions obtained under rigorous methodologic rules are increasingly recognized as a valuable asset in diverse scientific fields, including medicine, chemistry, and veterinary and nuclear sciences.(100, 144) Testing belief elicitation methods against the standards of measurement science is important to increase acceptance of Bayesian inference in clinical research. Our belief elicitation method presents an example of how expert opinion can be used to inform research. This method of eliciting beliefs from clinicians to create a Bayesian prior probability distribution for clinical studies is feasible and has acceptable measurement properties.

The method is feasible and has demonstrable face validity: participants reported that the questions were clear and the response options easy to use. For the purpose of this study, a self-reported history of a post-secondary statistics course was used as a proxy measure for knowledge of statistics. Similarly, English as a first language was used as a proxy measure for fluency in English. Whether or not a participant had taken a post-secondary statistics course or had English as a first language had no effect on the reported clarity of the questions, ease of use of the response options, or difficulty expressing beliefs as probability distributions.
They also reported excellent face validity and an acceptable time to completion. These findings are comparable to those of other commonly used questionnaires, such as the Health Assessment Questionnaire.(92) The recommended standard of feasibility suggests that self-reported instruments should be completed in less than 15 minutes.(156)

In evaluating construct validity, we found moderate agreement(157) between the two methods of eliciting clinicians' beliefs about the effect of warfarin on survival. Because of the small sample size, the confidence intervals for the kappa coefficients are wide, indicating imprecision. Further studies primarily evaluating different types of validity will require larger samples to improve the precision of their results. The observed level of agreement is acceptable, as some disagreement was expected for several reasons. First, others have shown that a participant's answer may vary with the phrasing of the question and the response option.(151) When formulating and expressing their beliefs, respondents may look to the question and response option for assistance.(151) Second, disagreement may reflect variability in participants' thresholds for a meaningful change in survival. For example, one participant may regard a 10% improvement in the probability of 3-year survival as a true improvement, whereas another may not consider it meaningful. The strong trend between participants' categorical judgment of treatment effect and their stated effect size (Figure 4.2) supports the construct validity of the elicitation method. Third, English was a second language for a third of the participants. We used this as a proxy measure of facility in English; if the proxy is valid, language may have affected their understanding of the questions and response options. However, this is less likely, as these participants did not report difficulty using the elicitation method. We found no change in construct validity between iterations.

Most importantly, this method has excellent reliability. The degree of reliability was sustained across questions pertaining to both patient groups. The high reliability suggests that beliefs did not change between iterations, and in our study the reliability of the method was not related to the clinical context of the questions.

Our study conclusions must be considered in light of possible limitations of our methods.
A few participants commented that the small size of the adhesive dots increased the time to completion and reduced the ease of use of the method. Future investigators could consider using larger adhesive dots to improve feasibility and time to completion. All study participants were academics; thus, the results may apply only to this setting. Further evaluation should be considered to ascertain whether this method is equally feasible, valid and reliable among participants who work in a community setting. In this study, the participants were specialists who treat this population of patients but were not necessarily considered leading experts in the field. There is some evidence that the inclusion of experts in a belief elicitation study improves the validity and reliability of the elicited response.(136, 137) Thus, the measurement characteristics of this method may be improved by the inclusion of clinical experts.(50)

There remains some ambiguity regarding how participants interpreted the requested range of probability. We asked participants to indicate the boundaries beyond which there is very little probability that the true estimate could lie. We do not know whether participants interpreted these as plausible estimates, extreme values or 95% credible intervals. This could be explored in the future using qualitative methods.

In this questionnaire, we used a 3-year reference point. This was based on survival estimates from the published literature, and 3-year survival is a commonly used end-point in PAH prognostic studies.(64) Use of a different reference point, such as 5-year survival, would likely result in different probability estimates. It is uncertain whether a change in reference point would change the reliability or validity of this method.

Investigators are reminded that a number of factors may affect the validity and reliability of a belief elicitation method. The reliability of a measure is related to the population to which it is applied.(152) It may vary with the homogeneity of the study sample, so investigators may need to evaluate the reliability of this method in their own study sample. The recommended threshold of reliability is 0.70 if the result is intended for research purposes, and 0.90 if it will be used for clinical purposes.(158)

This belief elicitation method differs from earlier methods by including strategies that may reduce the effect of bias on the validity and reliability of the elicited belief.
First, this method is easy to use, with simple instructions and clear question stems and response options (reducing the potential for invalid responses based on insufficient understanding of the task). Second, we provided a practice exercise prior to the elicitation exercise (training or provision of examples has been shown to improve reliability).(16) Third, we avoided presenting data or a scenario in the question stem (avoiding anchoring bias). Fourth, we asked participants to state the probability of survival in untreated patients (avoiding base-rate neglect). Fifth, we explicitly asked participants to review their responses and verify whether they accurately reflected their beliefs. If need be, participants were given the opportunity to revise their responses; the questionnaire was laminated to allow easy removal and replacement of the adhesive dots. Provision of feedback, verification and the opportunity for revision have been shown to improve validity and reliability. Sixth, the elicitation exercise was conducted face-to-face, allowing participants to request clarification throughout the exercise if required (avoiding invalid responses resulting from misunderstanding). Finally, using adhesive dots to express a point estimate, the uncertainty around the estimate and the weight of belief creates a histogram that approximates a probability distribution. This is a simple graphical depiction of the participant's belief (simple graphical methods have been shown to convey information better than complex figures). The histogram can be displayed to show an individual's belief, or aggregated with others by adding the bins to depict the beliefs of a group. We recommend the use of these strategies to reduce the effect of potential biases on the elicited belief. They are easy to implement and can be generalized to other clinical contexts.

Responsiveness, the ability of this elicitation method to measure meaningful or clinically important change in a clinical state(93), was not evaluated in this study. No new information regarding the effect of warfarin for improving survival in either SSc-PAH or IPAH was published during the 2-week interval in which reliability testing was conducted. Future investigators who wish to evaluate the reliability of this method in their study sample must monitor for new information, published during the period of reliability testing, that may change beliefs. Similarly, investigators who wish to use this method to monitor changes in beliefs over time should first evaluate its responsiveness.
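Aggregating individual histograms by adding their bins amounts to an equal-weight linear opinion pool: because every participant places the same 20 dots, summing counts bin by bin and renormalizing gives the average of the individual distributions. The sketch below, with hypothetical helper names and data and an assumed 20-bin layout, also flags local modes. This is one simple way to see the kind of bimodality described for the IPAH group curve. Its strict-inequality peak test ignores flat-topped peaks.

```python
N_BINS = 20  # assumed: 5-percentage-point bins from 0% to 100%


def histogram(dots_by_bin, n_bins=N_BINS):
    """Expand {bin_index: dot_count} into a full list of counts."""
    counts = [0] * n_bins
    for b, c in dots_by_bin.items():
        counts[b] += c
    return counts


def pool(histograms):
    """Equal-weight linear pool: add bins across participants, then normalize."""
    n_bins = len(histograms[0])
    if any(len(h) != n_bins for h in histograms):
        raise ValueError("all histograms must use the same bins")
    totals = [sum(h[i] for h in histograms) for i in range(n_bins)]
    grand = sum(totals)
    return [t / grand for t in totals]


def local_modes(dist):
    """Indices of bins strictly higher than their neighbours (edges use one neighbour)."""
    modes = []
    for i, p in enumerate(dist):
        left = dist[i - 1] if i > 0 else float("-inf")
        right = dist[i + 1] if i < len(dist) - 1 else float("-inf")
        if p > left and p > right:
            modes.append(i)
    return modes


if __name__ == "__main__":
    # Two hypothetical participants, 20 dots each
    pessimist = histogram({4: 2, 5: 6, 6: 8, 7: 4})
    optimist = histogram({13: 4, 14: 8, 15: 6, 16: 2})
    group = pool([pessimist, optimist])
    print(local_modes(group))
```

Unequal weights (for example, by expertise) would require weighting each histogram before summing. The equal-weight version matches simple addition of bins.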
The individual and group probability distributions illustrate several interesting findings. At an individual level, there is variability in the belief about 3-year survival, in the amount of uncertainty around this probability, and in the weight of belief: some individuals are more certain, while others are less certain. The multiple small peaks in the group distributions are likely related to our small sample size; these curves may become smoother with a larger number of participants. The 2 large peaks in the IPAH curve suggest that some participants are more pessimistic, while others are more optimistic, about the probability of survival. These findings should be viewed as hypothesis generating and could be explored further in a larger study.

The median probability of 3-year survival with warfarin was 45% for SSc-PAH and 53% for IPAH. There are no published studies evaluating the effect of warfarin on survival in SSc-PAH with which to compare this finding. Published observational studies evaluating the effect of warfarin in IPAH report probabilities of 3-year survival of 19% to 40% in warfarin-naïve patients and 42% to 62% in warfarin-treated patients.(71-73, 79) The results of our belief elicitation study are consistent with the published literature, which supports the external validity of this method.

4.5 Conclusion

We present a pragmatic method for eliciting quantified beliefs from clinicians for inclusion in a Bayesian clinical study, which incorporates strategies that may reduce the effect of potential biases on the elicited belief. The method is easy to implement, has an acceptable time to completion, and is valid and reliable. It can be considered for application in a Bayesian study.

Acknowledgements
Dr. Sindhu Johnson has been awarded a Canadian Institutes of Health Research Phase 1 Clinician Scientist Award. Dr. Gillian Hawker is supported as the F.M. Hill Chair in Academic Women's Medicine, and a Distinguished Senior Rheumatologist Researcher of The Arthritis Society, University of Toronto. Dr. Brian Feldman holds a Canada Research Chair in Childhood Arthritis. The authors would like to thank the following individuals for their assistance with this study: Dr. Joan Badia, Dr. Vladimir Contreras, Dr. Shahin Jamal, Dr. Wanruchada Katchamart, Dr. Neil Lazar, Dr. Andrea H.L. Low, Dr. Jeffrey Mann, Dr. Carine Salliot, Dr. John Thenganatt, Dr. Duminda Wijeysundera, and Dr. Harindra Wijeysundera.

Table 4.1 Study participants
Characteristic | Participants (N = 12)
Male sex | 8 (67%)
Specialty: Cardiology | 1 (8%)
Specialty: Respirology | 6 (50%)
Specialty: Rheumatology | 4 (34%)
Specialty: Other | 1 (8%)
Practice setting: Teaching hospital | 100%
Practice setting: Non-teaching hospital | 0%
Years in practice, mean (SD) | 4 (3)
English as first language | 8 (67%)
New SSc-PAH and IPAH patients per year, median (range) | 10 (1-75)
Previous statistical training | 8 (67%)
Uses warfarin in SSc-PAH or IPAH patients | 7 (58%)
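Table 4.1 reports 12 participants, the number the methods give for the test-retest study (2 occasions, hypothesized ICC 0.9, minimally acceptable ICC 0.6, 80% power, alpha = 0.05). The sketch below uses the approximation of Walter, Eliasziw and Donner (Statistics in Medicine, 1998). That this is the method behind reference (154) is our assumption, not something the text states. With these inputs it returns 12 only when alpha is treated as one-sided; a two-sided alpha of 0.05 gives 14. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def reliability_sample_size(rho0, rho1, k=2, alpha=0.05, power=0.80, one_sided=True):
    """Approximate number of subjects to test H0: ICC = rho0 against H1: ICC = rho1.

    k is the number of ratings per subject (here, test and retest).
    Uses the Walter-Eliasziw-Donner (1998) approximation (our assumption).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha) if one_sided else z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    theta0 = rho0 / (1 - rho0)
    theta1 = rho1 / (1 - rho1)
    c0 = (1 + k * theta0) / (1 + k * theta1)
    n = 1 + 2 * (z_alpha + z_beta) ** 2 * k / (math.log(c0) ** 2 * (k - 1))
    return math.ceil(n)  # round up to whole participants


if __name__ == "__main__":
    print(reliability_sample_size(0.6, 0.9))                   # one-sided alpha
    print(reliability_sample_size(0.6, 0.9, one_sided=False))  # two-sided alpha
```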

Table 4.2 Reliability, validity and sensibility of the elicitation procedure
Measure | SSc-PAH | IPAH
Test-retest reliability, ICC (95% CI) | 0.93 ( ) | 0.93 ( )
Construct validity, weighted kappa (95% CI) | 0.57 ( ) | 0.54 ( )
Face validity, "yes" | 92%
Time to completion, median (IQR) | 10 minutes (5 to 15 minutes)
SSc-PAH, systemic sclerosis (scleroderma)-associated pulmonary arterial hypertension; IPAH, idiopathic pulmonary arterial hypertension; ICC, intraclass correlation coefficient; CI, confidence interval; IQR, interquartile range.
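The test-retest ICC in Table 4.2 is an ICC(2,k). As a sketch of the Shrout and Fleiss two-way random-effects formula, ICC(2,k) = (BMS - EMS) / (BMS + (JMS - EMS)/n), where n subjects are each rated k times, BMS is the between-subjects mean square, JMS the between-raters mean square and EMS the residual mean square. Treating test and retest as the two "raters" is our framing. The code is a plain-Python illustration, not the study's SAS or R code, and omits confidence intervals. As a check, it uses the six-target, four-judge example from Shrout and Fleiss (1979), for which ICC(2,k) is about 0.62. The second data set is hypothetical.

```python
def icc_2k(ratings):
    """ICC(2,k): two-way random effects, reliability of the mean of k ratings.

    ratings: one row per subject, one column per rater/occasion; no missing data.
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(r) != k for r in ratings):
        raise ValueError("need a complete n x k table with n, k >= 2")
    grand = sum(map(sum, ratings)) / (n * k)
    row_means = [sum(r) / k for r in ratings]
    col_means = [sum(r[j] for r in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for r in ratings for x in r)
    ss_error = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)               # between-subjects mean square
    jms = ss_cols / (k - 1)               # between-raters mean square
    ems = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (bms - ems) / (bms + (jms - ems) / n)


if __name__ == "__main__":
    # Shrout & Fleiss (1979) example: 6 targets rated by 4 judges
    sf = [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
          [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]]
    print(round(icc_2k(sf), 2))
    # Hypothetical test-retest estimates (% 3-year survival) for 4 participants
    print(round(icc_2k([[45, 45], [30, 35], [60, 60], [50, 45]]), 2))
```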

Table 4.3 Feasibility of the elicitation procedure (percentage of respondents endorsing)
Question | Clarity of question | Ease of response option
1. For an average group of newly diagnosed SSc-PAH patients not treated with warfarin, what is the probability of being alive at 3 years? | 100% | 100%
2. For an average group of newly diagnosed SSc-PAH patients treated with warfarin, what is the probability of being alive at 3 years? | 100% | 100%
3. There may be some uncertainty around your estimate of survival. Using an X in the interval, indicate the upper and lower limits of your estimate. | 83% | 92%
4. You have been given 20 stickers. Each sticker represents 5% probability. Placing the stickers in the intervals, indicate the weight of belief for your survival estimates. | 100% | 92%

1. For an average group of newly diagnosed SSc-PAH patients not treated with warfarin, what is the probability of being alive at 3 years? Place an X in the interval to indicate the probability of 3-year survival.
[Response line: 0% to 100% in 5% intervals]
2. For an average group of newly diagnosed SSc-PAH patients treated with warfarin, what is the probability of being alive at 3 years?
[Response line: 0% to 100% in 5% intervals]
3. There may be some uncertainty around your estimate of survival. Using an X in the interval, indicate the upper and lower limits of your estimate.
[Response line: 0% to 100% in 5% intervals]
4. You have been given 20 stickers. Each sticker represents 5% probability. Placing the stickers in the intervals, indicate the weight of belief for your survival estimates.
[Response grid: probability axis from 0% to 100% in 5% intervals]
Please review the shape and distribution of your answer. Does this reflect what you truly believe? If not, please feel free to revise the placement of stickers.
5. What overall effect do you believe warfarin has on 3-year survival? Improves survival / Worsens survival / No effect on survival
Figure 4.1 Belief Elicitation Procedure Example

Figure 4.2 Construct validity: relationship between median effect on survival and overall effect on survival
Note. 1. Iteration refers to test 1 and test 2, 1 to 2 weeks apart. 2. The graph depicts participants' median effect of warfarin on survival (the difference between the probabilities of survival with and without warfarin) relative to the group's categorization of the overall effect of warfarin on survival.

Figure 4.3 Group probability distribution for 3-year survival in SSc-PAH and IPAH patients treated with warfarin

CHAPTER 5 Effect of Warfarin on Survival in SSc-PAH and IPAH. Belief Elicitation for Bayesian Priors.

PUBLICATION 3
Johnson SR, Granton JT, Tomlinson GA, Grosbein HA, Hawker GA, Feldman BM. Effect of warfarin on survival in SSc-PAH and IPAH. Belief Elicitation for Bayesian Priors. J Rheumatol 2011;38:462-9. Printed with permission, The Journal of Rheumatology.

Sindhu R. Johnson MD [1,2]; John T. Granton MD [3]; George A. Tomlinson PhD [2,4,5]; Haddas A. Grosbein BSc [6]; Gillian A. Hawker MD MSc [2,7]; Brian M. Feldman MD MSc [2,4,6]
1 Division of Rheumatology, Department of Medicine, University Health Network, Toronto, Ontario, Canada
2 Department of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
3 Divisions of Respirology and Critical Care Medicine, Department of Medicine, University Health Network, Toronto, Ontario, Canada
4 Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
5 Division of Clinical Decision Making and Health Care, Toronto General Research Institute, Toronto, Ontario, Canada
6 Division of Rheumatology, Department of Paediatrics, The Hospital for Sick Children, Toronto, Ontario, Canada
7 Division of Rheumatology, Department of Medicine, Women's College Hospital, Toronto, Ontario, Canada

Corresponding Author: Sindhu Johnson MD, Division of Rheumatology, Ground Floor, East Wing, Toronto Western Hospital, 399 Bathurst Street, Toronto, Ontario, Canada, M5T 2S8. Email: Sindhu.Johnson@uhn.on.ca

This study was supported by an operating grant from the Canadian Institutes of Health Research. Dr. Sindhu Johnson has been awarded a Canadian Institutes of Health Research Clinician Scientist Award and Abbott Scholar Award in Rheumatology Research. Dr. Gillian Hawker is supported as the F.M. Hill Chair in Academic Women's Medicine, and a Distinguished Senior Rheumatologist Researcher of The Arthritis Society, University of Toronto. Dr. Brian Feldman holds a Canada Research Chair in Childhood Arthritis.

ABSTRACT

Objective. Warfarin use in scleroderma (SSc)-associated pulmonary arterial hypertension (PAH) and idiopathic PAH (IPAH) is controversial. A prerequisite for a trial is the demonstration of community uncertainty. We evaluated experts' beliefs about the effect of warfarin on 3-year survival in SSc-PAH and IPAH, and the factors that influence warfarin use.

Methods. PAH experts attending the 2008 American College of Rheumatology or American Thoracic Society meetings expressed the probability of 3-year survival without and with warfarin, and their degree of uncertainty, by applying adhesive dots, each representing a 5% weight of probability, in bins on a line, creating a prior probability distribution (prior). Using a numeric rating scale, participants rated the factors that influence their use of warfarin.

Results. Forty-five experts (44% pulmonologists, 38% rheumatologists, 16% cardiologists, 2% internists) underwent the belief elicitation interview. In SSc-PAH, the mean probabilities of 3-year survival without and with warfarin were 54% and 56%, respectively. Pessimistic experts believe that warfarin worsens survival by 7%; optimistic experts believe that it improves survival by 13%. In IPAH, the mean probabilities of 3-year survival without and with warfarin were 68% and 76%. Factors (mean rating out of 10) that influence experts' use of warfarin were functional class (5.4), age (5.4), pulmonary artery pressure (5.2), peripheral vascular disease (3.6), disease duration (2.8) and sex (1.7).

Conclusions. Bayesian priors effectively quantify and illustrate experts' beliefs about the effect of warfarin on survival in SSc-PAH and IPAH. This study demonstrates the presence of uncertainty about the effect of warfarin and provides justification for a clinical trial.

5.1 Background

Pulmonary arterial hypertension (PAH) is a lethal disease characterized by elevated pulmonary artery pressures and progressive right heart failure. It is a leading cause of death in patients with systemic sclerosis (scleroderma, SSc).(159) Historically, SSc-PAH patients have had a 3-year survival of 22% to 30%.(61, 160) Patients with idiopathic PAH (IPAH) have a similarly poor prognosis, with a median survival time of 2 years.(64) One inexpensive and readily available potential treatment is warfarin. Anticoagulation of patients with PAH has been recommended on the rationale that PAH results from in situ thrombosis and abnormalities in the coagulation cascade.(149, 161) However, our systematic review of the literature found that the evidence supporting this recommendation is only modest; prior studies are limited by methodological constraints, which have led to conflicting results.(70) Five studies support the use of warfarin in IPAH(71-75), while 3 studies do not.(77-79) Because none of these studies was a randomized trial, the major threat to the validity of their results is selection bias. In addition, these studies had small sample sizes, so the negative results may reflect insufficient power.(70) The role of warfarin in the treatment of SSc-PAH has not been established. There are no studies evaluating the effect of warfarin in SSc-PAH; recommendations for its use in SSc-PAH have been generalized from IPAH studies.

Attempts to conduct a randomized trial to definitively evaluate whether warfarin confers a survival benefit in these patients have met several challenges. First, the successful study of uncommon diseases such as SSc-PAH and IPAH is limited by the rarity of the condition.
As a result, well-designed studies are often labeled negative because too few patients can be recruited.(24) Second, a necessary prerequisite for the conduct of a trial is the demonstration of community uncertainty (a situation in which not all members of the community of experts agree on the efficacy of an intervention).(162) During our group's attempt to conduct a warfarin trial, many centers were reluctant to recruit patients: some investigators believed that it is inappropriate to expose these patients to warfarin, while others believed it is inappropriate to withhold warfarin from them.(163) A scientifically valid, quantitative demonstration of community uncertainty is needed to convince participating centers. Third, studies are often too small to allow adjustment for important confounding variables. Particularly in an observational study, factors that may influence a clinician's use of warfarin should be accounted for at the analytic stage. It has been suggested that the selection of variables for a regression model (i.e., models predicting the probability of exposure or estimating a treatment effect on the outcome) should be based on a priori knowledge.(48, 164) Thus, explicit identification of the factors that influence clinicians' use of warfarin in SSc-PAH and IPAH would inform study design and analysis.

One way to address some of these issues is the use of innovative analytic methods, including Bayesian statistical models.(165) Bayesian inference has great utility in the study of uncommon diseases. One of its advantages is the ability to incorporate prior evidence, including experts' beliefs about the effect of an intervention, in models estimating treatment effect.(113) Clinicians often rely on experts to guide clinical practice, particularly in the absence of definitive trial data. The Bayesian paradigm explicitly allows for the inclusion of experts' beliefs about the effect of an intervention by expressing these beliefs as prior probability distributions, or priors.(86) Expressing experts' beliefs as Bayesian priors has a number of pragmatic applications. First, Bayesian priors can demonstrate the presence of uncertainty (if it exists) in a quantifiable and illustrative manner. Second, quantification of beliefs allows determination of the magnitude of treatment effect expected by experts. This information can be used to inform study design (e.g., sample size calculation(139) and interim analysis(90)).
Third, Bayesian priors obtained through the elicitation of experts' beliefs can be used to augment scarce therapeutic data.(50) The aim of this study is to evaluate experts' beliefs about the effect of warfarin on survival in SSc-PAH and IPAH. The primary objective is to evaluate experts' beliefs about the effect of warfarin on 3-year survival in SSc-PAH and IPAH, respectively, expressed as probability distributions (i.e., in a form that may be used as Bayesian priors). The secondary objectives are to determine the degree of uncertainty regarding the effect of warfarin on 3-year survival in SSc-PAH and IPAH, respectively, and to evaluate factors that may influence experts' use of warfarin in PAH patients.

5.2 Materials and Methods

Sample. Our review of belief elicitation methods for Bayesian priors found that the most valid and reliable beliefs are obtained from individuals with greater depth of knowledge and experience in the area.(50) Thus, we elicited beliefs from experts in SSc-PAH and/or IPAH. These experts were defined as members of a) the Pulmonary Hypertension Association Scientific Leadership Council (PHA-SLC) (n = 26), b) the Pulmonary Vascular Research Institute (PVRI) Council of Senior Fellows (n = 24), c) the Scleroderma Clinical Trials Consortium (SCTC) PH investigators (n = 23), and d) the Canadian Pulmonary Hypertension Trials Network (CPHTN) (n = 23). These groups comprise individuals with clinical and/or research interests in PAH. Elicitation is best conducted as a face-to-face interaction between the expert and the investigator(25). We therefore considered experts attending the 2008 American College of Rheumatology (ACR) (San Francisco, USA) or American Thoracic Society (ATS) (Toronto, Canada) scientific meetings to be eligible for this study. Inclusion criteria were 1) membership in the PHA-SLC, PVRI Council of Senior Fellows, SCTC PH investigators, or CPHTN; 2) attendance at either the 2008 ACR or ATS meeting; 3) agreement to participate; and 4) a practice in which the expert cares for SSc-PAH or IPAH patients. There is no consensus on the sample size required for an elicitation study.(50) On the basis of the central limit theorem, we chose a sample size of 30 so that the mean of the group's beliefs could be assumed to be approximately normally distributed.(166) Recruitment. A letter was e-mailed/faxed to members of all organizations. It invited them to share their beliefs about warfarin use in PAH patients and asked whether they would be attending the ATS or ACR meeting.
Two weeks later, a second e-mail, letter and fax were sent inviting them to participate. An interview time was arranged with each expert who agreed to participate. Participant characteristics collected at the time of the interview included sex, specialty, years in practice treating PAH patients, type of practice, number of new SSc-PAH and/or IPAH patients seen per year, history of formal statistical training, and use of warfarin in their practice.
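The sample-size rationale given under Sample relies on the central limit theorem. Even when individual experts' beliefs are skewed, the mean of 30 of them should be much closer to normally distributed. The short simulation below illustrates this idea. It assumes a purely hypothetical Beta(2, 5) distribution of individual beliefs about 3-year survival and is not part of the study's analysis. In theory, Beta(2, 5) has a skewness of about 0.60, and the mean of 30 independent draws has a skewness of about 0.60 divided by the square root of 30, or roughly 0.11.

```python
import numpy as np


def skewness(x):
    """Sample skewness (third standardized moment)."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return float((centered**3).mean() / (centered**2).mean() ** 1.5)


rng = np.random.default_rng(seed=2008)
N_EXPERTS = 30          # group size chosen in the study
N_REPLICATES = 20_000   # number of simulated groups (illustration only)

# Hypothetical, right-skewed individual beliefs about 3-year survival: Beta(2, 5).
individual = rng.beta(2.0, 5.0, size=N_REPLICATES)
# Mean belief within each simulated group of 30 experts.
group_means = rng.beta(2.0, 5.0, size=(N_REPLICATES, N_EXPERTS)).mean(axis=1)

print(f"skewness, individual beliefs: {skewness(individual):.2f}")
print(f"skewness, means of {N_EXPERTS} experts: {skewness(group_means):.2f}")
```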

Elicitation interview. A 10-minute belief elicitation interview was conducted with each expert. Participants were given a sample questionnaire to illustrate the belief elicitation method.(51) The sample questionnaire was identical to the study questionnaire but used vitamin C as the therapeutic intervention. Participants could ask questions about the sample questionnaire's items and response options. Study questionnaire. The investigator read each question aloud. Participants considered an average group of newly diagnosed SSc-PAH patients (and, separately, an average group of IPAH patients). For each group, they were asked to specify the probability of being alive at 3 years if 1) not treated with warfarin and 2) treated with warfarin. They indicated each response by placing an X on a line marked in 5% probability intervals from 0% to 100%. Participants then expressed the uncertainty around their estimate of survival among warfarin-treated patients by placing an X at the upper and lower limits of that estimate. Next, they indicated the weight of their belief about the probability of 3-year survival among warfarin-treated patients. To do this, they placed circular adhesive dots (0.64 cm in diameter), each representing 5% probability, in discrete interval bins. Participants were given 20 dots, adding up to 100% probability. To reduce the risk of error, the investigator placed one dot in each of the two bins corresponding to the upper and lower limits indicated by the participant. The investigator then verified with the participant that these dots were placed correctly. The participant was then asked to place the remaining 18 dots to indicate the weight of their belief. Participants were asked to review the shape and distribution of the dots and to verify that it reflected their belief about the effect of warfarin on survival. They could revise the placement of the dots until they felt it did.
Finally, participants were asked to categorize their belief about the overall effect of warfarin using the response options "improves survival", "worsens survival" or "no effect on survival". The questionnaire was laminated so that the dots could be easily removed if a participant wished to revise a response. Upon completion, it was re-laminated to prevent movement of the dots. This elicitation procedure has a median completion time of 10 minutes and has demonstrated face validity, construct validity and reliability.(51)
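The sketch below shows one way the dot placements described above might be recorded as per-bin counts. It also shows how those counts could be averaged across experts into a group prior, as the Analysis subsection describes. It assumes 20 bins of 5% width spanning 0% to 100%. The function names and the two example experts are hypothetical and are not study data.

```python
import numpy as np

N_BINS = 20               # 5%-wide bins covering 0% to 100% survival
N_DOTS = 20               # adhesive dots given to each expert
DOT_WEIGHT = 1 / N_DOTS   # each dot carries 5% probability


def bin_index(prob_pct):
    """Map a survival probability (percent, 0-100) to its 5%-wide bin."""
    if not 0 <= prob_pct <= 100:
        raise ValueError("probability must be between 0 and 100 percent")
    return min(int(prob_pct // 5), N_BINS - 1)  # 100% goes in the top bin


def expert_histogram(lower_pct, upper_pct, expert_dots_pct):
    """Dot counts per bin for one expert (hypothetical helper).

    The investigator places one dot at the expert's lower limit and one at the
    upper limit; the expert places the remaining 18 dots, given here as the
    survival probability (percent) at which each dot was placed.
    """
    if lower_pct > upper_pct:
        raise ValueError("lower limit exceeds upper limit")
    if len(expert_dots_pct) != N_DOTS - 2:
        raise ValueError("expected 18 expert-placed dots")
    counts = np.zeros(N_BINS, dtype=int)
    for p in [lower_pct, upper_pct, *expert_dots_pct]:
        counts[bin_index(p)] += 1
    return counts


def group_prior(histograms):
    """Average dot counts bin by bin across experts, then convert to probabilities."""
    mean_counts = np.vstack(histograms).mean(axis=0)
    return mean_counts * DOT_WEIGHT


# Two hypothetical experts (not study data).
expert_a = expert_histogram(40, 70, [50] * 6 + [55] * 8 + [60] * 4)
expert_b = expert_histogram(30, 80, [45] * 5 + [55] * 5 + [65] * 8)
prior = group_prior([expert_a, expert_b])
print(prior.round(3))
```

Averaging raw counts, rather than fitting a parametric curve, keeps the group prior as a histogram on the same 5% grid that the experts used.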

Factors influencing use of warfarin. Experts were asked to list and evaluate factors that influence their decision to use warfarin in SSc-PAH and IPAH patients. The importance of functional class, age, pulmonary artery pressure, peripheral vascular disease, disease duration, interstitial lung disease and sex was specifically elicited, because these have been reported as prognostic factors in the published literature.(160, 167) Importance was rated on a numeric rating scale where 0 indicated "not at all important" and 10 indicated "extremely important". Analysis. Sample. Descriptive statistics were used to summarize participant characteristics and the relative importance of factors that influence participants' use of warfarin. Priors. Individual histograms representing each respondent's prior beliefs about the effect of warfarin were constructed by counting the number of 5% dots in each bin. A group prior probability distribution representing all respondents was constructed by averaging the adhesive dot counts in each bin across respondents. A pessimistic prior was constructed using information from the lowest tenth of participants, i.e., those specifying the smallest treatment effect. An optimistic prior was constructed using information from the highest tenth of participants, i.e., those specifying the largest average treatment effect. For each participant, the risk difference (treatment effect) was calculated by subtracting their reported probability of survival without warfarin from their reported probability of survival with warfarin. A prior probability distribution for the risk difference was then constructed for each expert. To do this, the weights the expert assigned to the probability of survival with warfarin were centered at that expert's reported risk difference. Comparison with literature.
A systematic review of the literature was previously performed to identify studies evaluating the use of warfarin in SSc-PAH or IPAH.(70) Eligible studies were observational studies and randomized trials that reported death as an outcome. Studies were identified using the MEDLINE and EMBASE databases. Two reviewers independently abstracted study design, sample size, treatment and 3-year mortality data onto standardized forms. Details of the systematic review are available elsewhere.(70) Four observational studies reporting the effect of warfarin on 3-year survival in IPAH were identified(71, 73, 79, 168) and were aggregated in a Bayesian meta-analysis. The absolute risk difference (the difference in the proportion of events observed in patients who did and did not receive warfarin) was calculated.(169) A random effects model was constructed. A uniform prior on the range -1 to 1 was placed on the absolute risk difference to give equal weight to all possible values of the parameter.(169) All studies were weighted equally. The mean absolute risk difference and 95% credible interval (CrI) were calculated from Markov chain Monte Carlo (MCMC) sampling of the posterior distribution. Factors influencing use of warfarin. The mean importance and standard deviation of each factor were calculated. A Bayesian multivariate normal model was constructed for the logits of the importance scores (rescaled to lie between 0 and 1) for the 7 factors. MCMC samples from the posterior distribution of the mean logits were used to rank the factors by relative importance, with a 95% CrI for each ranking, and to estimate the probability that each factor ranked highest. Bayesian analyses. All analyses used randomly generated initial values and a burn-in of 5,000 iterations, followed by 10,000 MCMC updates. Posterior means, medians, odds ratios (OR), 95% CrIs and probabilities were computed from MCMC sampling of the posterior distribution. Where appropriate, convergence was evaluated using the Brooks-Gelman-Rubin convergence statistic. The reporting of the analysis and results is in accordance with the ROBUST criteria.(170) The code for all analyses is available from the authors upon request. Analyses were performed using SAS (version 9.2, SAS Institute, Cary, NC), R (version 2.2.1, The R Foundation for Statistical Computing)(34) and WinBUGS (version , Imperial College and Medical Research Council, United Kingdom).(35) Research ethics board approval was obtained before the study was conducted. Consent was implied when participants agreed to proceed with the elicitation interview.

5.3 Results

Sample.
The combined membership list comprised 95 potential study participants. Of these, 42 did not fulfill the eligibility criteria. Four did not care for adult SSc-PAH or IPAH patients, 21 declined or did not respond to our invitations to participate (reason not given), and 17 indicated that they would not attend either scientific meeting. Fifty-three experts fulfilled all eligibility criteria. We were unable to arrange a meeting time with 8 experts and conducted belief elicitation interviews with the remaining 45. This gave a participation rate of 85% (45/53 experts who fulfilled the inclusion criteria; Figure 5.1). Participant characteristics are summarized in Table 5.1. Warfarin use. Thirty-eight (84%) participants use warfarin in their practice to treat patients with SSc-PAH or IPAH. The frequency of warfarin use ranged from "only rarely" to "always, unless contra-indicated". A greater proportion of males (34/38 (89%)) than females (4/7 (57%)) reported use of warfarin. There was no difference in the frequency of warfarin use between practices based in Europe (7/8 (88%)) and North America (31/37 (84%)). Rheumatologists (10/17 (59%)) reported use of warfarin to treat PAH patients less frequently than cardiologists/pulmonologists (27/27 (100%)). Experts' beliefs about warfarin in SSc-PAH. The mean (standard deviation (sd)) probability of 3-year survival was 54% (16%) without warfarin and 56% (16%) with warfarin. There were no differences in the probabilities of survival in SSc-PAH patients across specialties, practice locations or sexes. The group prior probability distribution for 3-year survival in patients treated with warfarin is presented in Figure 5.2. The curve is slightly left skewed and bimodal. Pessimistic experts (n = 5) believe that warfarin worsens absolute 3-year survival by a mean of 7%. Optimistic experts (n = 5) believe that warfarin improves absolute survival by a mean of 13% (Figure 5.3). Pessimists and optimists did not differ in sex (males: 3/5 versus 3/5) or practice location (North America: 5/5 versus 4/5). SSc-PAH optimists see more new consults per year than pessimists (54 versus 33).
SSc-PAH pessimists have been in practice a mean of 14 years, and optimists a mean of 17 years. All (5/5) SSc-PAH pessimists report use of warfarin in their PAH practice, whereas 3/5 SSc-PAH optimists do.
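The pessimistic and optimistic groups reported above follow from the per-expert risk differences defined in the Methods. The sketch below makes two assumptions. First, it reads "centered at their reported risk difference" as shifting each expert's with-warfarin dot histogram by that expert's no-warfarin survival estimate. Second, it rounds the "tenth" of experts up. With 45 experts this rounding gives 5 per group, which matches the n = 5 reported, but the rule itself is our assumption. All function names and survival values are hypothetical.

```python
import math

import numpy as np

# Midpoints of the twenty 5%-wide survival bins, as proportions (0.025 ... 0.975).
BIN_MIDPOINTS = np.arange(2.5, 100, 5.0) / 100


def risk_difference(surv_with, surv_without):
    """Treatment effect: survival with warfarin minus survival without warfarin."""
    return np.asarray(surv_with, dtype=float) - np.asarray(surv_without, dtype=float)


def shifted_distribution(dot_counts, surv_without):
    """Re-express a with-warfarin dot histogram on the risk-difference scale.

    Assumption: shifting every bin by the expert's no-warfarin estimate
    centers the distribution at the expert's reported risk difference.
    """
    counts = np.asarray(dot_counts, dtype=float)
    return BIN_MIDPOINTS - surv_without, counts / counts.sum()


def extreme_groups(risk_diffs, fraction=0.10):
    """Indices of the most pessimistic and most optimistic experts.

    Rounding the group size up is an assumption (45 experts -> 5 per group).
    """
    k = math.ceil(fraction * len(risk_diffs))
    order = np.argsort(risk_diffs, kind="stable")
    return order[:k], order[-k:]


# Ten hypothetical experts (not study data): 3-year survival without / with warfarin.
surv_without = np.array([0.50, 0.55, 0.60, 0.45, 0.50, 0.55, 0.60, 0.50, 0.40, 0.65])
surv_with = np.array([0.40, 0.55, 0.70, 0.50, 0.52, 0.60, 0.75, 0.51, 0.45, 0.62])
rd = risk_difference(surv_with, surv_without)
pessimists, optimists = extreme_groups(rd)
print("pessimists:", pessimists, "mean RD:", rd[pessimists].mean())
print("optimists:", optimists, "mean RD:", rd[optimists].mean())
```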

Experts' beliefs about warfarin in IPAH. In patients with IPAH, the mean (sd) probability of 3-year survival was 68% (12%) without warfarin and 76% (11%) with warfarin. There were no significant differences in the probability of survival with and without warfarin in IPAH patients across specialties, practice locations or sexes. The group prior probability distribution for 3-year survival in patients treated with warfarin is presented in Figure 5.2. The curve is left skewed. Pessimistic experts believe that warfarin has no effect on survival (mean improvement of 0%). Optimistic experts believe that warfarin improves survival by a mean of 17% (Figure 5.4). Pessimists and optimists did not differ in sex (males: 5/5 versus 4/5) or practice location (North America: 3/5 versus 5/5). The number of years in practice was 19 for IPAH pessimists and 18 for optimists. The mean number of new consults per year was 185 for IPAH pessimists and 66 for optimists. All IPAH pessimists (5/5) and optimists (5/5) report use of warfarin in their PAH practice. Factors that influence use of warfarin. The importance and ranking of factors that influence use of warfarin are reported in Table 5.2. Participants also identified additional factors that influence their use of warfarin. The following were considered important:
- the right heart: right heart failure (n = 8), right heart function (n = 3);
- a thrombophilic state: history of thromboembolic disease (n = 7), antiphospholipid antibody (n = 4), hypercoagulable state (n = 2);
- gastrointestinal disease: esophageal or gastric disease (n = 11), gastric antral vascular ectasia (n = 1);
- bleeding: history of gastrointestinal bleeding (n = 10), risk of bleeding (n = 6), thrombocytopenia (n = 2), menorrhagia (n = 1), hemoptysis (n = 1), epistaxis (n = 1).
Additional factors included falls (n = 7), compliance (n = 5), atrial fibrillation (n = 4), anemia (n = 2), IPAH versus SSc-PAH (n = 2), cardiac output (n = 2), being sedentary (n = 2), liver function (n = 1), central nervous system disease (n = 1), severity of scleroderma (n = 1), and patient risk tolerance (n = 1). Comparison of experts' beliefs with the published literature. A meta-analysis of 4 observational studies reporting the effect of warfarin on 3-year survival in IPAH found a 20% absolute reduction in mortality in patients who received warfarin. The 95% CrI for the absolute risk difference was -0.49 to 0.10. The probability distributions for the meta-analysis-derived and expert-derived absolute risk differences are presented in Figure 5.4.

5.4 Discussion

A formal elicitation procedure allowed us to quantify, illustrate and gain important insights into the beliefs about the effect of warfarin held by leaders in the pulmonary hypertension community. Experts are guarded about the probability of survival with and without warfarin. Their grim SSc-PAH prognosis is consistent with the published literature, which reports 2-year mortality estimates of 22% to 47% among warfarin-naïve patients.(58, 61, 160, 171) There are no published estimates of SSc-PAH survival in warfarin-treated patients with which to compare our findings. Among the experts in this study, estimates of the probability of survival ranged widely, from as low as 20% to as high as 80%. This may reflect that SSc is a heterogeneous condition with different subsets of clinical presentation and prognosis.(172) Among SSc-PAH patients, experts believe that warfarin confers a small improvement in the probability of 3-year survival. Depending on how a minimum clinically important difference is defined, some may interpret an effect of this size as no effect. However, some experts (pessimists) believe that warfarin worsens survival (i.e., confers harm), while others (optimists) believe that it improves survival. These widely disparate views demonstrate a divergence of opinion within the expert community. Further insights come from experts' probability distributions for IPAH patients. As a group, experts believe that warfarin improves survival. The wide range in the probability of survival and the presence of two peaks in the distribution suggest community uncertainty.
Some (relatively pessimistic) experts believe that warfarin has no effect on survival, while others believe it improves survival considerably. The experts' belief that warfarin improves survival in IPAH is consistent with the direction of the treatment effect in our meta-analysis. This supports the external validity of our findings. However, the magnitude of the effect differs: experts believe that warfarin confers a smaller improvement in survival than was demonstrated in the meta-analysis. Several issues may explain this discrepancy. First, the meta-analysis includes only studies that reported 3-year survival, because one common outcome was needed to compare its results with experts' beliefs. The experts are likely aware that other studies (reporting other survival endpoints) had conflicting results, some showing benefit and others showing none. Second, all of the studies are susceptible to confounding by indication, which may bias them toward overestimating the treatment effect. Experts may adjust their own estimates of treatment effect to account for this perceived bias. Furthermore, the more conservative belief in a survival benefit held by experts may reflect their real-world experience. This pessimism may stem from experiences of unsuccessful warfarin use. In the evaluation of other medical interventions, the magnitude of a treatment effect in the real world is often smaller than that observed in a study.(173) Identifying the factors that should be controlled for in an observational study is another important insight. Confounding by indication often threatens the validity of observational study findings and, in particular, may have affected the results of previous IPAH warfarin studies. Thus, it is important to recognize the factors that influence experts' use of warfarin. Interestingly, none of the factors had a high mean rating; however, we were able to rank them by relative importance. In studies with a small sample size and limited power to adjust for confounding, it is useful to know which factors should be included in models estimating the treatment effect. Limitations of this study may be related to the characteristics of the study participants.
Since there is no formal definition of a PAH expert, we defined an expert as a physician who is a member of one of four pulmonary hypertension-related organizations. Several features support that the participants were indeed experts: the large number of years in practice seeing PAH patients, the large number of new consults per year (large for uncommon diseases), and the predominance of teaching hospital-based practice. However, this sample may not be representative of physicians in community-based, non-teaching hospital practices, because patient mix and clinical experience could differ systematically at teaching hospitals. A second potential limitation was the requirement for experts to attend one of 2 scientific meetings in order to be eligible for this study. As a result, some experts were excluded from participation. This may have introduced bias if experts who attended these meetings were systematically different from those who did not. Our previous work on belief elicitation for Bayesian priors has found that responses are more valid and reliable when elicited face-to-face. The most pragmatic solution was to conduct the interviews at the two largest scientific meetings that the experts were most likely to attend, which represent different disciplines and were held in different countries. This decision likely improved the internal validity of the study results, with likely only a small impact on external validity. In conclusion, we have demonstrated that this belief elicitation method can be used effectively to quantify and illustrate experts' beliefs about the effect of an intervention. The method is not limited to PAH and can be generalized to the study of many uncommon diseases. Within the Bayesian inferential paradigm, elicited beliefs in the form of priors can be used to augment scarce therapeutic data.(144) Furthermore, this study is the first to evaluate experts' beliefs about whether, and by how much, warfarin affects survival in SSc-PAH and IPAH. The divergence of opinion demonstrated here regarding the effect of warfarin on survival in SSc-PAH and IPAH indicates the presence of community uncertainty. If warfarin is effective in improving survival, it is an inexpensive therapy that could be accessible globally. If it is ineffective, the risk of major hemorrhage would preclude its use in these patient groups. A randomized trial is needed to address this important clinical question, and this study provides data to justify such a trial and inform its design.

Acknowledgements

The authors would like to thank the following individuals for their assistance with this study: Dr. David Badesch, Dr. Robyn Barst, Dr. Raymond Benza, Dr. Todd Bull, Dr. Ghazwan Butrous, Dr. Richard Channick, Dr. Vladimir Contreas, Dr. Clive Davis, Dr. Peter Docherty, Dr. C. Gregory Elliott, Dr. Barri Fessler, Dr. Daniel Furst, Dr. Eric Hachulla, Dr. Paul Hassoun, Dr. Paul Hernandez, Dr. Nicholas Hill, Dr. Andrew Hirsch, Dr. Vivien Hsu, Dr. Marc Humbert, Dr. Shahin Jamal, Dr. Bashar Kahaleh, Dr. Wanruchada Katchamat, Dr. Dinesh Khanna, Dr. Neil Lazar, Dr. Peter Lee, Dr. Robert Levy, Dr. Dale Lien, Dr. James Loyd, Dr. Andrea H.L. Low, Dr. Jeffrey Mann, Dr. Maureen Mayes, Dr. Michael McGoon, Dr. Thomas Medsger, Dr. Peter Merkel, Dr. Evangelos Mikalakis, Dr. Nicholas Morrell, Dr. Sanjay Mehta, Dr. David Ostrow, Dr. Ronald Oudiz, Dr. Janet Pope, Dr. Steve Provencher, Dr. Stuart Rich, Dr. Gabriela Riemekasten, Dr. Carine Salliot, Dr. James Seibold, Dr. Olivier Sitbon, Dr. Virginia Steen, Dr. John Swiston, Dr. John Thenganatt, Dr. Gabriele Valentini, Dr. Doug Veale, Dr. Fred Wigley, Dr. Duminda Wijeysundera, and Dr. Harindra Wijeysundera. The authors thank the Scleroderma Clinical Trials Consortium, the Pulmonary Hypertension Association, the Pulmonary Vascular Research Institute, and the Canadian Pulmonary Hypertension Trials Network for assistance with the conduct of this study. The authors thank the administrative staff of the American College of Rheumatology and the American Thoracic Society for assistance with logistical issues pertaining to the conduct of this study at their respective scientific meetings.

Table 5.1. Characteristics of study participants

Characteristic                                                Number (%), n = 45
Male sex                                                      38 (84%)
Specialty
  Cardiology                                                  7 (16%)
  Pulmonology                                                 20 (44%)
  Rheumatology                                                17 (38%)
  Internal medicine                                           1 (2%)
Number of years seeing PAH patients, median (25%-75%)         15 (10-25)
Type of practice
  Non-teaching hospital                                       0 (0%)
  Teaching hospital                                           44 (98%)
  Both                                                        1 (2%)
New SSc-PAH and/or IPAH patients per year, median (25%-75%)   30 (15-50)
Location of practice
  Europe                                                      8 (18%)
  North America                                               37 (82%)
History of post-secondary statistical training                27 (60%)
Use of warfarin in their pulmonary hypertension patients      38 (84%)

Table 5.2. Factors that influence experts' use of warfarin (n = 38)

Factor                                  Mean importance (sd)   Ranking (95% CrI)   Probability of having the highest ranking
NYHA/WHO functional class               5.4 (2.5)              2 (1, 3)            46%
Age                                     5.4 (2.3)              2 (1, 3)            36%
Severity of pulmonary artery pressure   5.2 (2.5)              2 (1, 3)            18%
Peripheral vascular disease             3.6 (2.7)              4 (4, 5)            0%
Disease duration                        2.8 (2.5)              5 (5, 5)            0%
Interstitial lung disease               2.3 (2.1)              6 (5, 7)            0%
Sex                                     1.7 (2.0)              7 (6, 7)            0%

Importance was reported on a numeric analogue scale where 0 indicated "not at all important" and 10 indicated "extremely important".
sd, standard deviation.
Ranking, ranking of the importance of factors affecting use of warfarin in PAH patients (1 = most important, 7 = least important).
95% CrI, 95% credible interval.
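The "Ranking (95% CrI)" and "Probability of having the highest ranking" columns of Table 5.2 are summaries of MCMC draws of the factors' mean logit importance. The sketch below shows one way to compute such summaries. It ranks the factors within each posterior draw, then takes the median rank, the 2.5th and 97.5th percentile ranks, and the share of draws in which each factor ranks first. The draws here are hypothetical: normal noise around logits of the table's mean scores rescaled to 0 to 1. They are not the study's MCMC output, and the function name is ours.

```python
import numpy as np


def rank_summaries(draws, names):
    """Summarize factor rankings from posterior draws of mean logit importance.

    draws: array of shape (n_draws, n_factors); larger values mean more important.
    Returns {name: (median rank, (2.5th, 97.5th percentile rank), P(rank == 1))},
    where rank 1 is the most important factor.
    """
    draws = np.asarray(draws, dtype=float)
    # Double argsort turns each draw's values into ranks; +1 makes them 1..K.
    ranks = np.argsort(np.argsort(-draws, axis=1), axis=1) + 1
    summary = {}
    for j, name in enumerate(names):
        r = ranks[:, j]
        lo, hi = np.percentile(r, [2.5, 97.5])
        summary[name] = (float(np.median(r)), (float(lo), float(hi)), float(np.mean(r == 1)))
    return summary


factors = ["Functional class", "Age", "PA pressure", "Peripheral vascular disease",
           "Disease duration", "Interstitial lung disease", "Sex"]
# Hypothetical draws (NOT the study's posterior): centered on logits of mean scores / 10.
p = np.array([0.54, 0.54, 0.52, 0.36, 0.28, 0.23, 0.17])
centres = np.log(p / (1 - p))
rng = np.random.default_rng(1)
draws = rng.normal(centres, 0.1, size=(10_000, len(factors)))

summary = rank_summaries(draws, factors)
for name, (med, (lo, hi), p_first) in summary.items():
    print(f"{name:28s} rank {med:.0f} ({lo:.0f}, {hi:.0f})  P(first) = {p_first:.2f}")
```

Ranking within each draw, rather than ranking the posterior means once, is what lets the interval and the "probability of having the highest ranking" reflect posterior uncertainty.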

Figure 5.1. Flow diagram of participant recruitment

Figure 5.2. Group prior probability distributions for the probability of 3-year survival in SSc-PAH and IPAH patients treated with warfarin

Figure 5.3. Prior probability distributions from experts for the effect of warfarin on the absolute risk difference for 3-year mortality in SSc-PAH patients
Notes: Values > 0 indicate that warfarin increases the risk of death. Values < 0 indicate that warfarin reduces the risk of death.

Figure 5.4. Probability distributions from experts and the published literature for the effect of warfarin on the absolute risk difference for 3-year mortality in IPAH patients
Notes: Values > 0 indicate that warfarin increases the risk of death. Values < 0 indicate that warfarin reduces the risk of death.
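The literature curve in Figure 5.4 comes from the Bayesian random-effects meta-analysis of the absolute risk difference described under Comparison with literature. The sketch below illustrates the structure of such a model. It computes each study's mortality risk difference (warfarin minus no warfarin, so values < 0 favor warfarin) and places the Uniform(-1, 1) prior from the Methods on the pooled difference. The sketch also makes several substitutions, so it should not be read as a reproduction of the study's analysis:
- It approximates the posterior on a grid instead of using MCMC.
- It weights studies by their normal-approximation within-study variances, unlike the equal weighting reported in the Methods.
- It assumes a Uniform(0, 0.5) prior on the between-study standard deviation tau.
- Its four studies are hypothetical counts, not the published data.

```python
import numpy as np


def study_risk_difference(deaths_w, n_w, deaths_nw, n_nw):
    """3-year mortality risk difference (warfarin minus no warfarin) and its
    normal-approximation variance; values < 0 favor warfarin."""
    p_w, p_nw = deaths_w / n_w, deaths_nw / n_nw
    rd = p_w - p_nw
    var = p_w * (1 - p_w) / n_w + p_nw * (1 - p_nw) / n_nw
    return rd, var


def random_effects_posterior(rd, var, tau_max=0.5, n_mu=801, n_tau=201):
    """Grid approximation to a normal random-effects posterior for the pooled
    risk difference mu, with mu ~ Uniform(-1, 1) and tau ~ Uniform(0, tau_max)."""
    rd = np.asarray(rd, dtype=float)
    var = np.asarray(var, dtype=float)
    mu = np.linspace(-1.0, 1.0, n_mu)
    tau = np.linspace(0.0, tau_max, n_tau)
    total_var = var[None, None, :] + tau[None, :, None] ** 2   # shape (1, n_tau, K)
    resid = rd[None, None, :] - mu[:, None, None]               # shape (n_mu, 1, K)
    loglik = -0.5 * (np.log(2 * np.pi * total_var) + resid**2 / total_var).sum(axis=2)
    post = np.exp(loglik - loglik.max())   # flat priors: posterior proportional to likelihood
    post_mu = post.sum(axis=1)             # marginalize over tau
    post_mu /= post_mu.sum()
    cdf = np.cumsum(post_mu)
    lo = mu[np.searchsorted(cdf, 0.025)]
    hi = mu[np.searchsorted(cdf, 0.975)]
    return float((mu * post_mu).sum()), (float(lo), float(hi))


# Four hypothetical studies (NOT the published data):
# (deaths on warfarin, n on warfarin, deaths without warfarin, n without warfarin)
studies = [(10, 40, 20, 45), (8, 30, 12, 30), (15, 60, 25, 62), (5, 25, 9, 28)]
rds, variances = zip(*(study_risk_difference(*s) for s in studies))
mean_rd, (cri_lo, cri_hi) = random_effects_posterior(rds, variances)
print(f"pooled RD {mean_rd:.3f}, 95% CrI ({cri_lo:.3f}, {cri_hi:.3f})")
```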

CHAPTER 6

Warfarin in Scleroderma-associated and Idiopathic Pulmonary Arterial Hypertension. A Bayesian Approach to Evaluating Treatment in Uncommon Disease.

PUBLICATION 4

Johnson SR, Granton JT, Tomlinson GA, Grosbein HA, Le T, Lee P, Seary ME, Hawker GA, Feldman BM. Warfarin in scleroderma-associated and idiopathic pulmonary arterial hypertension. A Bayesian approach to evaluating treatment in uncommon disease. J Rheumatol 2012;39:

Printed with permission, The Journal of Rheumatology

Warfarin in scleroderma-associated and idiopathic pulmonary arterial hypertension. A Bayesian approach to evaluating treatment in uncommon disease.

Sindhu R. Johnson MD 1,2,7, John T. Granton MD 3, George A. Tomlinson PhD 2,4,5, Haddas A. Grosbein BSc 6, Thaolan Le BSc 3, Peter Lee MD 7, M. Elizabeth Seary BSc 6, Gillian A. Hawker MD MSc 2,8, Brian M. Feldman MD MSc 2,4,6

1 Division of Rheumatology, Department of Medicine, Toronto Western Hospital, Toronto, Ontario, Canada
2 Department of Health Policy, Management and Evaluation, University of Toronto, Toronto, Ontario, Canada
3 Divisions of Respirology and Critical Care Medicine, Department of Medicine, University Health Network, Toronto, Ontario, Canada
4 Dalla Lana School of Public Health, University of Toronto, Toronto, Ontario, Canada
5 Division of Clinical Decision Making and Health Care, Toronto General Research Institute, Toronto, Ontario, Canada
6 Division of Rheumatology, Department of Paediatrics, The Hospital for Sick Children, Toronto, Ontario, Canada

7 Division of Rheumatology, Department of Medicine, Mount Sinai Hospital, Toronto, Ontario, Canada
8 Division of Rheumatology, Department of Medicine, Women's College Hospital, Toronto, Ontario, Canada

Corresponding Author: Sindhu Johnson MD, Division of Rheumatology, Ground Floor, East Wing, Toronto Western Hospital, 399 Bathurst Street, Toronto, Ontario, Canada, M5T 2S8. Email: Sindhu.Johnson@uhn.on.ca

Key words: Bayesian, propensity score, pulmonary hypertension, confounding, scleroderma, warfarin

Acknowledgements: Dr. Sindhu Johnson has been awarded a Canadian Institutes of Health Research Clinician Scientist Award and an Abbott Scholar Award in Rheumatology Research. Dr. Gillian Hawker is supported as the F.M. Hill Chair in Academic Women's Medicine, and as a Distinguished Senior Rheumatologist Researcher of The Arthritis Society, University of Toronto. Dr. Brian Feldman holds a Canada Research Chair in Childhood Arthritis. Operating grants from the Canadian Institutes of Health Research and the Scleroderma Society of Ontario funded this research.

ABSTRACT

Objectives. Warfarin is recommended in scleroderma-associated pulmonary arterial hypertension (SSc-PAH) and idiopathic PAH (IPAH) to improve survival. There is no evidence to support this in SSc-PAH, and the evidence in IPAH is conflicting. We evaluated the ability of warfarin to improve survival using 2 large SSc-PAH and IPAH cohorts.

Methods. The effect of warfarin on all-cause mortality was evaluated. Bayesian propensity scores (PS) were used to adjust for baseline differences between patients exposed and not exposed to warfarin, and to assemble a matched cohort. Bayesian Cox proportional hazards models were constructed using informative priors based on elicitation of international PAH experts.

Results. Review of 1,138 charts identified 275 SSc-PAH patients (n = 78 (28%) treated with warfarin) and 155 IPAH patients (n = 91 (59%) treated with warfarin). Baseline differences in PAH severity and medications were resolved using PS matching. In the matched cohort of 98 SSc-PAH patients (n = 49 treated with warfarin), the posterior median hazard ratio (HR) was 1.06 (95% credible interval (CrI) 0.70, 1.63). In the matched cohort of 66 IPAH patients (n = 33 treated with warfarin), the posterior median HR was 1.07 (95% CrI 0.57, 1.98). The probability that warfarin improves median survival by 6 months or more is 23.5% in SSc-PAH and 27.7% in IPAH. Conversely, there is a greater than 70% probability that warfarin provides no significant benefit or is harmful.

Conclusion. There is a low probability that warfarin improves survival in SSc-PAH and IPAH. Given the availability of other PAH therapies with demonstrable benefits, there is little role for warfarin in improving survival for these patients.

121 6.1 Introduction The evaluation of therapy in uncommon disease is threatened by many challenges. Problems stemming from the small numbers of patients available for participation in randomized controlled trials (RCT), and the cost of multi-center trials (which may be the only way to accrue enough subjects) can be insurmountable.(174) Furthermore, there is limited availability of funding for evaluation of older therapies. In such settings, some necessary RCTs will never be undertaken. Researchers have been challenged to develop methods using observational data to obtain unbiased estimates of treatment effect comparable to those that would be obtained in RCTs.(13) Recently, innovative methodologies and improved computational ability have resulted in the development of strategies that can give such estimates of treatment effect. An example of providing evidence upon which to base therapy for an uncommon disease, and an older therapy for which funding for research is scarce, is the use of warfarin for improving survival in pulmonary arterial hypertension (PAH). PAH is a lethal disease characterized by elevated pulmonary artery pressure that leads to dyspnea, heart failure and death. In the setting of systemic sclerosis (SSc), the prevalence of PAH ranges from 5%-12%(58, 175, 176), and is a leading cause of death.(55, 61) Historically, SSc-PAH had a median survival of 12 months.(61) In the modern treatment era, the median survival has improved to 3 4 years.(58, 62) Idiopathic pulmonary arterial hypertension (IPAH) has an incidence ranging from cases per million, and a prevalence ranging from cases per million population.(63) Untreated, IPAH as a median survival of 2.8 years.(64) In the modern treatment era, 3-year survival has improved to 76% 85%.(65, 66) One inexpensive and readily available potential treatment is warfarin. 
Anticoagulation of PAH patients has been recommended on the rationale that PAH may result from thrombotic arteriopathy and abnormalities in the coagulation cascade.(80, 149, 161) Our systematic review of the literature found that the evidence supporting this recommendation is limited by methodological constraints and conflicting results.(149) Five studies support the effect of anticoagulation in IPAH,(71-75) while 3 studies do not.(76-79) None of the studies were placebo controlled or blinded. Only 1 of the 8 studies was prospective. Most significantly, none of the studies were randomized. As such, the major threat to the validity of their results is confounding by indication. In addition, these studies were limited by small sample sizes, and thus the negative results may reflect insufficient power.

The role of warfarin in the treatment of SSc-PAH has not been established. There are no studies evaluating the effect of warfarin in this population. The recommendation to consider warfarin in SSc-PAH is based on expert opinion and on potential benefits generalized from IPAH studies.(68, 69) However, the theoretical benefits of warfarin may be offset by an increased risk of gastrointestinal (GI) bleeding and stroke, which can lead to hospitalization, morbidity and mortality. Warfarin is associated with major bleeding rates of 2%-3% annually.(82, 83) Furthermore, SSc patients with luminal telangiectasia or gastric antral vascular ectasia (GAVE) may be at higher risk of GI bleeding.(84, 85) An evaluation of the benefits and risks of warfarin in SSc-PAH and IPAH is therefore needed.

Newer therapies for PAH, such as prostacyclin analogues, endothelin receptor antagonists (ERA), and phosphodiesterase (PDE) inhibitors, have been shown to have beneficial effects on exercise capacity and dyspnea.(177, 178) However, they are very expensive, logistically difficult to administer and not equally accessible to patients. In the era of modern PAH management, it is unlikely that a trial evaluating the effect of warfarin will be conducted. Yet it remains uncertain whether warfarin is effective, and whether it has a place in modern PAH therapy. This uncertainty is reflected in current guidelines. McLaughlin et al. recommend anticoagulation in SSc-PAH patients with advanced disease and without contraindications, and recommend anticoagulation in IPAH based on committee consensus.(68) Galie et al.
recommend that anticoagulation should be considered in IPAH, and may be considered in SSc-PAH (Class IIb recommendation: usefulness/efficacy is less well established by evidence/opinion).(69) Barst et al. give no recommendation for the use of warfarin in SSc-PAH, and give only a moderate recommendation, on the basis of expert opinion, for warfarin use in IPAH.(80) The objectives of this study were to evaluate the effect of warfarin on survival in patients with SSc-PAH and IPAH, respectively. We used state-of-the-art methods to obtain estimates of treatment effect comparable to those that would have been obtained from a randomized trial.

6.2 Materials and Methods

Patients. The Toronto Scleroderma Clinic and the University Health Network Pulmonary Hypertension Programme are the largest published longitudinal cohorts of their kind in Canada. Patients were included if they were older than 17 years and had a diagnosis of SSc with PAH, or of IPAH, with PAH defined as a mean pulmonary artery pressure (mPAP) > 25 mmHg and a pulmonary capillary wedge pressure (PCWP) < 15 mmHg on cardiac catheterization.(179) Patients were excluded if they had another etiology for pulmonary hypertension (HIV, anorexigen use, portal hypertension, cardiac abnormalities such as left heart disease with systolic, diastolic or valvular dysfunction, interstitial lung disease (forced vital capacity < 70% predicted and bibasilar reticular abnormalities with minimal ground glass on high-resolution computed tomography of the thorax), or thromboembolic disease); had a diagnosis of another connective tissue disease; had other indications for warfarin use (atrial fibrillation, artificial heart valve, pulmonary thromboembolic disease); or had any contraindication to warfarin use.

Exposure. The exposure was treatment with warfarin at any time after diagnosis of PAH. No minimum duration of exposure or minimum dose was pre-specified.

Outcome. The primary outcome was time from diagnosis of PAH to death from any cause. Patients who were alive at the end of follow-up were censored. Dates of death were obtained from the clinic chart, the hospital electronic record or an obituary. Online obituary websites were searched to identify deceased patients. The date of death from an obituary was used if there was a correct match on first and last name, sex, city/town, and use of the terms "scleroderma" or "pulmonary hypertension" in the obituary text.
If a patient was alive at the last clinic visit, or survival status on January 1, 2008 was unknown, the family/referring physicians were contacted using a standardized letter that was faxed and mailed twice, followed by up to 2 telephone calls. Information about survival status, cause of death or date last seen was collected.

Data collection. Data were abstracted by a single abstractor from charts, standardized research protocols and hospital electronic records using a standardized abstraction form. Data included: date of birth, sex, postal code, dates of SSc and PAH diagnosis, etiology of PAH, co-morbidities, treatments, adverse events with warfarin, baseline functional class, cardiac hemodynamics (mPAP, PCWP, cardiac output (CO), mean right atrial pressure (RAP), pulmonary vascular resistance (PVR)), right ventricular parameters (size, function, systolic pressure) and pulmonary function tests. GI bleeding was defined as upper or lower gastrointestinal bleeding requiring transfusion of one or more units of blood.

Data administration. Data collected from research protocols were verified against the chart or hospital record. Data were double-entered into a FileMaker Pro version 8.5 database. Data entry errors were minimized through the use of automatic value ranges and internal logic checks.

Analytic overview. Propensity score methods were used, as a method of bias reduction, to create a cohort of matched treated and untreated patients. A fully Bayesian analysis was undertaken using a survival outcome. This approach takes into account pre-existing knowledge regarding the effect of warfarin on survival, expressed as prior probability distributions ("priors"). The result is a posterior probability distribution that combines this pre-existing knowledge with the new data, and that allows inferences about the treatment effect to be made using direct probability statements.

Propensity score (PS). A Bayesian logistic regression model was fitted with exposure to warfarin as the outcome variable and baseline variables as predictors. Variables were chosen based on elicitation from international PAH experts and clinical sensibility.
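The propensity score step can be illustrated with a short sketch. The analysis itself was run in WinBUGS; the Python below is a stand-in that fits the Bayesian logistic regression with a basic random-walk Metropolis sampler instead, and takes each patient's posterior median probability of exposure as the score. The prior precision of 0.001, the sampler settings and all function names are assumptions for illustration, not values from the study.

```python
import math
import random
import statistics


def expit(eta):
    """Inverse logit, written to avoid overflow for large |eta|."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def log1pexp(eta):
    """Numerically stable log(1 + exp(eta))."""
    return max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))


def log_posterior(beta, X, y, prior_precision):
    """Unnormalized log posterior of a logistic regression with independent
    Normal(0, 1/prior_precision) priors on every coefficient."""
    lp = -0.5 * prior_precision * sum(b * b for b in beta)
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))
        lp += yi * eta - log1pexp(eta)  # Bernoulli log-likelihood, logit link
    return lp


def bayesian_propensity_scores(X, y, n_burn=2000, n_keep=3000, step=0.3,
                               prior_precision=0.001, seed=1):
    """Posterior median probability of warfarin exposure for each patient.

    X: covariate rows whose first entry is 1.0 (intercept).
    y: 1 if exposed to warfarin, otherwise 0.
    prior_precision: assumed placeholder for a diffuse normal prior.
    """
    rng = random.Random(seed)
    beta = [0.0] * len(X[0])
    current = log_posterior(beta, X, y, prior_precision)
    draws = []
    for it in range(n_burn + n_keep):
        # Random-walk proposal on all coefficients at once.
        proposal = [b + rng.gauss(0.0, step) for b in beta]
        cand = log_posterior(proposal, X, y, prior_precision)
        # Accept with probability min(1, exp(cand - current));
        # 1 - random() lies in (0, 1], so the log is always defined.
        if math.log(1.0 - rng.random()) < cand - current:
            beta, current = proposal, cand
        if it >= n_burn:
            draws.append(beta)  # beta is replaced, never mutated, so this is safe
    return [statistics.median([expit(sum(b * x for b, x in zip(d, xi)))
                               for d in draws])
            for xi in X]
```

On synthetic data in which exposure becomes more likely as a covariate increases, the returned scores should rise with that covariate. A production analysis would also check convergence (for example with several chains), which this sketch omits.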
The variables that PAH experts reported as important included sex, disease subtype, pulmonary artery pressure, right heart size and function, functional class, disease duration, comorbidities, and concomitant medications.(52) Uninformative priors were specified with a diffuse normal distribution, mean of 0 and precision of For each patient, the median posterior probability of exposure was used as the propensity score. Treated and untreated patients were matched without replacement on functional class and propensity score, using 1:1 nearest-neighbour matching with a caliper width of 0.2 standard deviations of the propensity score.

The goodness-of-fit of the propensity score was evaluated by the degree to which it balanced baseline variables between the warfarin exposed and non-exposed patients. One method of assessing balance is the standardized difference. Equations (1) and (2) give the standardized differences for binary and continuous variables, respectively, where the subscripts T and C denote the warfarin-exposed (treated) and non-exposed (control) groups.

Equation 1. Standardized difference for comparing dichotomous variables

$$d = \frac{\hat{p}_T - \hat{p}_C}{\sqrt{\dfrac{\hat{p}_T(1 - \hat{p}_T) + \hat{p}_C(1 - \hat{p}_C)}{2}}}$$

where $d$ is the standardized difference and $\hat{p}$ is the prevalence of the baseline characteristic.

Equation 2. Standardized difference for comparing means

$$d = \frac{\bar{x}_T - \bar{x}_C}{\sqrt{\dfrac{s_T^2 + s_C^2}{2}}}$$

where $d$ is the standardized difference, $\bar{x}$ is the mean of the baseline characteristic and $s^2$ is the variance of the baseline characteristic.

As in a randomized trial, there may be some residual imbalance in baseline characteristics in the matched sample. It has been recommended that the threshold for acceptable residual imbalance be the imbalance that would be observed in a similarly sized clinical trial in the same clinical area. Alternatively, it has been recommended that an absolute standardized difference of 0.10 (10%) or less be considered a good match.(180)
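Equations 1 and 2, together with the matching rule, can be sketched in Python as follows. This is an illustrative sketch rather than the SAS or R code used in the analysis; the greedy processing order (treated patients in the order given), the use of the sample standard deviation of all patients' scores for the caliper, and the function names are assumptions.

```python
import math
import statistics


def std_diff_binary(p_t, p_c):
    """Equation 1: standardized difference between two prevalences."""
    denom = math.sqrt((p_t * (1 - p_t) + p_c * (1 - p_c)) / 2)
    if denom == 0:
        # Both prevalences are 0 or 1, so there is no variance to scale by.
        return 0.0 if p_t == p_c else math.copysign(math.inf, p_t - p_c)
    return (p_t - p_c) / denom


def std_diff_continuous(x_t, x_c):
    """Equation 2: standardized difference between two means."""
    mean_diff = statistics.mean(x_t) - statistics.mean(x_c)
    pooled_sd = math.sqrt((statistics.variance(x_t) + statistics.variance(x_c)) / 2)
    if pooled_sd == 0:
        return 0.0 if mean_diff == 0 else math.copysign(math.inf, mean_diff)
    return mean_diff / pooled_sd


def caliper_width(all_ps, multiplier=0.2):
    """Caliper = 0.2 standard deviations of the propensity score."""
    return multiplier * statistics.stdev(all_ps)


def caliper_match(treated, controls, caliper):
    """Greedy 1:1 nearest-neighbour matching without replacement.

    treated, controls: lists of (patient_id, functional_class, ps).
    A pair must share functional class and differ in PS by <= caliper.
    """
    used = set()
    pairs = []
    for t_id, t_class, t_ps in treated:
        best_id, best_dist = None, None
        for c_id, c_class, c_ps in controls:
            if c_id in used or c_class != t_class:
                continue  # already matched, or different functional class
            dist = abs(t_ps - c_ps)
            if dist <= caliper and (best_dist is None or dist < best_dist):
                best_id, best_dist = c_id, dist
        if best_id is not None:
            used.add(best_id)
            pairs.append((t_id, best_id))
    return pairs
```

Greedy matching is the simplest way to enforce matching without replacement; an optimal-matching algorithm could pair patients differently, and the greedy result can depend on the order in which treated patients are processed. Treated patients with no eligible control inside the caliper are left unmatched, which is why the matched cohorts are smaller than the treated groups.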

Priors. Priors are quantified expressions of knowledge and/or beliefs regarding a treatment effect that precede the conduct of the study. A variety of priors were used to evaluate the sensitivity of the analysis to the range of viewpoints that should be considered when interpreting the evidence.(181) An uninformative prior was used to evaluate the data independently of pre-existing knowledge. For SSc-PAH and IPAH separately, the uninformative prior was specified for the log hazard ratio comparing the two groups: a diffuse normal distribution with a mean of 0, and a precision of An informative prior based on our study of international PAH experts' beliefs about the effect of warfarin on survival, and an optimistic prior using information from the 10% of experts specifying the smallest hazard ratios (HR), were constructed as follows.(182) Each expert's distribution for the probability of 3-year survival with warfarin treatment was combined with the expert's probability of 3-year survival without warfarin treatment to generate the expert's prior distribution for the log(HR). The distributions were averaged across experts to generate the group mean prior. The SSc-PAH group prior for the log(HR) was best represented by a Student t distribution with 4 degrees of freedom, a mean of 0.03 and a standard deviation of The SSc-PAH optimistic prior was best represented by a Student t distribution with 3 degrees of freedom, a mean of and a standard deviation of The IPAH group prior was best represented by a Student t distribution with 5 degrees of freedom, a mean of 0.09 and a standard deviation of The IPAH optimistic prior was best represented by a Student t distribution with 4 degrees of freedom, a mean of and a standard deviation of Box's measure of conflict was used to compare the experts' informative prior with the data.
It is analogous to a traditional p-value in that it measures the predictive probability of obtaining a result at least as extreme as that observed.(181) It has been recommended that Bayesian analyses include a tri-plot. A tri-plot illustrates the prior and the new data, expressed as a likelihood function. It also illustrates the posterior probability distribution, that is, how the prior probability distribution has changed with the incorporation of the new data.

Survival model. Patients who were alive on January 1, 2008 were right censored. Median survival and survival probabilities in the treated and untreated patients in the matched cohort were determined using Kaplan-Meier survival curves. Within the matched pairs, we used a matched Cox proportional hazards model to estimate the effect of warfarin on all-cause mortality. For SSc-PAH and IPAH separately, we evaluated the probability that warfarin improves median survival by 6 months or more. A threshold was needed to make a direct probability statement; we chose 6 months because we believe that an improvement of this size would be considered clinically important by both patients and clinicians. We assumed a constant hazard rate (exponential survival) in the warfarin non-exposed patients and multiplied it by samples from the posterior distribution of the hazard ratio for warfarin to obtain samples of the hazard rate in the exposed patients. The exponential model was chosen for computational simplicity and because it is widely used in survival modeling. Because the median of an exponential distribution with hazard rate λ is ln(2)/λ, each pair of hazard rates converts directly to a difference in median survival of m0 × (1/HR - 1), where m0 is the median survival of the non-exposed patients. We computed the proportion of samples in which the difference in median survival time exceeded 6 months.

Simulation. For SSc-PAH and IPAH separately, we assessed whether our propensity score matched data gave differences in baseline characteristics comparable to those that would be observed in a randomized trial of the same size. In this simulation, samples of size 98 (the size of the matched SSc-PAH cohort) were drawn from the observed SSc-PAH cohort of 275 patients and randomized in a 1:1 ratio to treatment or control. For each baseline characteristic, the absolute standardized difference was calculated. This was repeated 10,000 times, and we counted the number of times a standardized difference between the two groups exceeded the observed value in the propensity-matched sample.
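The two calculations in this subsection can be sketched in Python. `prob_median_gain` applies the exponential-survival conversion to posterior hazard-ratio samples (in the analysis these came from the WinBUGS Cox model; here they are simply a list passed in), and `simulate_rct_imbalance` mimics the balance simulation by repeatedly drawing and randomly splitting a sample of the cohort. Both are illustrative sketches, not the authors' code: the function names, the use of "or greater" for both thresholds, and the rule that 0/1 variables use Equation 1 while all others use Equation 2 are assumptions.

```python
import math
import random
import statistics


def prob_median_gain(hr_samples, median_untreated, gain=0.5):
    """Share of posterior HR samples implying a median survival gain >= gain.

    Exponential survival: hazard = ln(2) / median, so the treated median is
    median_untreated / HR and the gain is median_untreated * (1/HR - 1).
    """
    rate_untreated = math.log(2) / median_untreated
    hits = 0
    for hr in hr_samples:
        rate_treated = rate_untreated * hr
        diff = math.log(2) / rate_treated - median_untreated
        if diff >= gain:
            hits += 1
    return hits / len(hr_samples)


def abs_std_diff(a, b):
    """Absolute standardized difference: Equation 1 for 0/1 data, else Equation 2."""
    m_a, m_b = statistics.mean(a), statistics.mean(b)
    if set(a) | set(b) <= {0, 1}:
        var_a, var_b = m_a * (1 - m_a), m_b * (1 - m_b)
    else:
        var_a, var_b = statistics.variance(a), statistics.variance(b)
    pooled = math.sqrt((var_a + var_b) / 2)
    if pooled == 0:
        return 0.0 if m_a == m_b else math.inf
    return abs(m_a - m_b) / pooled


def simulate_rct_imbalance(cohort, n, observed_max, n_reps=10000, seed=1):
    """Simulate 1:1 randomized trials of size n drawn from cohort.

    cohort: list of dicts mapping characteristic name -> numeric value.
    Returns (P(largest |d| >= observed_max),
             P(at least 2 characteristics with |d| >= 0.10)).
    """
    rng = random.Random(seed)
    names = list(cohort[0])
    exceed_max = 0
    two_over_010 = 0
    for _ in range(n_reps):
        # random.sample returns the draw in random order, so splitting it
        # into halves is a random 1:1 allocation.
        sample = rng.sample(cohort, n)
        arm_a, arm_b = sample[: n // 2], sample[n // 2:]
        diffs = [abs_std_diff([p[k] for p in arm_a], [p[k] for p in arm_b])
                 for k in names]
        if max(diffs) >= observed_max:
            exceed_max += 1
        if sum(d >= 0.10 for d in diffs) >= 2:
            two_over_010 += 1
    return exceed_max / n_reps, two_over_010 / n_reps
```

For instance, with a median of 4.9 years in the non-exposed group, a hazard ratio of 0.9 corresponds to a gain of about 0.54 years, whereas any hazard ratio above 4.9/5.4 (about 0.907) yields a gain of less than 6 months. The posterior probability of a 6-month gain is therefore simply the posterior probability that the HR falls below that cut-off.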
For the IPAH patients, we carried out a similar procedure, drawing samples of size 66 from the observed IPAH cohort of 155 patients.

All Bayesian analyses used a burn-in of 5,000 iterations followed by collection of 10,000 Markov chain Monte Carlo (MCMC) updates. The HR, 95% credible interval (CrI) and probabilities were computed from these MCMC samples from the posterior distribution. Where appropriate, convergence was evaluated using the Brooks-Gelman-Rubin convergence statistic. The reporting of the analysis and results is in accordance with the ROBUST criteria.(170) The code for all analyses is available from the authors. Analyses were performed using SAS (v9.2, SAS Institute, Cary, NC), R (v2.8.1, The R Foundation for Statistical Computing) and WinBUGS (v Imperial College and Medical Research Council, United Kingdom). Research ethics board approval was obtained prior to the conduct of this study.

6.3 Results

Patients. Review of 1,138 charts identified 275 SSc-PAH patients (78 (28%) of whom received warfarin) and 155 IPAH patients (91 (59%) of whom received warfarin). Warfarin-treated SSc-PAH and IPAH patients had worse baseline measures of PAH severity, and more use of PAH medications (Tables 6.1 and 6.2). Calcium channel blocker (CCB) use was more frequent among SSc-PAH patients than IPAH patients, likely because of the frequent use of CCBs to treat Raynaud's phenomenon. There were no significant differences in CCB use between the warfarin exposed and non-exposed patients in either the SSc-PAH or the IPAH matched cohort. GI bleeding occurred in 7% (13/197) of the warfarin non-exposed and 8% (8/78) of the warfarin-exposed SSc-PAH patients. Hemorrhagic stroke occurred in 1 warfarin non-exposed and 1 warfarin-exposed SSc-PAH patient. GI bleeding occurred in 2% (1/64) of the warfarin non-exposed and 7% (6/91) of the warfarin-exposed IPAH patients. Hemorrhagic stroke occurred in 1 warfarin non-exposed IPAH patient.

Propensity score matching. The differences in baseline characteristics between the treated and untreated groups were substantially reduced in the propensity score matched SSc-PAH and IPAH cohorts.
(Tables 6.1 and 6.2) The largest standardized difference in the matched cohort was In each of the SSc-PAH and IPAH matched cohorts, a standardized difference greater than the recommended 0.10 was observed twice.(180) Our simulation found that in an RCT with 49 SSc-PAH patients per group and 16 baseline characteristics, there is a 91% probability of observing an absolute standardized difference of 0.27 or greater, and a 100% probability of observing at least 2 baseline characteristics with an absolute standardized difference of 0.10 or more. In an RCT with 33 IPAH patients per group and 16 baseline characteristics, there is a 99.9% probability of observing an absolute standardized difference of 0.20 or greater, and a 100% probability of observing at least 2 baseline characteristics with an absolute standardized difference of 0.10 or more. Therefore, both PS-matched cohorts had differences in baseline covariates smaller than those that would typically be observed in an RCT of the same size.

We also evaluated differences between warfarin non-exposed and exposed patients in baseline hemodynamics not included in the propensity score model. In the SSc-PAH matched cohort, the warfarin non-exposed patients compared with the warfarin-exposed patients had a mean (standard deviation (SD)) RAP of 12.8 (4.6) mmHg versus 10.2 (7.6) mmHg, CO of 3.5 (0.8) L/min versus 3.4 (1.7) L/min, PCWP of 14.2 (8.13) mmHg versus 9.8 (4.5) mmHg, and PVR of 492 (564) dyn·s·cm⁻⁵ versus 528 (346) dyn·s·cm⁻⁵. In the matched cohort of IPAH patients, the warfarin non-exposed patients compared with the warfarin-exposed patients had a mean RAP of 14.9 (9.0) mmHg versus 14.6 (8.7) mmHg, CO of 27.9 (13.5) L/min versus 27.0 (8.7) L/min, PCWP of 10.5 (4.2) mmHg versus 9.4 (4.5) mmHg, and PVR of 1278 (502) dyn·s·cm⁻⁵ versus 923 (538) dyn·s·cm⁻⁵.

Survival. The 3-year survival in the matched SSc-PAH cohort was 61% for the warfarin non-exposed patients and 58% for the warfarin-exposed patients. The 3-year survival in the matched IPAH cohort was 83% for both the warfarin non-exposed and exposed patients. The results of the Cox model survival analysis are presented in Table 6.3. The Bayesian tri-plots illustrating the prior, likelihood and posterior distributions for SSc-PAH and IPAH are presented in Figures 6.1 and 6.2, respectively. The tri-plots illustrate that the addition of data from this study has improved the precision (decreased the uncertainty) of the estimated hazard ratio. Box's measure of conflict between the group prior and the data was 0.62 for SSc-PAH and 0.91 for IPAH.
This indicates no evidence of a significant discrepancy between the prior and the data. Given exponential survival with a median of 4.9 years in the untreated SSc-PAH patients, the probability of improving survival by 6 months or more with warfarin is 23.5% (i.e., there is a 76.5% probability of worsened survival or of a survival improvement of less than 6 months) (Figure 6.3). Given a median survival of 3.9 years in the untreated IPAH patients, the probability of improving survival by 6 months or more with warfarin is 27.7% (i.e., there is a 72.3% probability of worsened survival or of a survival improvement of less than 6 months) (Figure 6.4). Sensitivity analyses indicate that even if the baseline median survival is as high as 7 years, the probability of improving survival by 6 months or more does not exceed 32% in either SSc-PAH or IPAH. Using an optimistic prior representing experts with a favourable view of the effect of warfarin in SSc-PAH, the probability of improving survival by 6 months or more with warfarin is 43.0%, and does not exceed 50% even if the baseline median survival is as high as 7 years. Using an optimistic prior in IPAH, the probability of improving survival by 6 months or more with warfarin is 41.6%, and does not exceed 50% even if the baseline median survival is as high as 7 years.

6.4 Discussion

Using innovative methods, we have used observational data to make estimates of treatment effect comparable to those that would be obtained in an RCT. This is a unique methodologic contribution to the literature, as the methods we used protect against biases that usually threaten the validity of retrospective, observational studies. This is of great value in settings where a definitive RCT is not feasible because of the large sample size required, cost and/or lack of political will.(174) Furthermore, we have demonstrated that in both SSc-PAH and IPAH, the probability of a survival benefit with warfarin is low.

We found that both SSc-PAH and IPAH patients exposed to warfarin had worse functional class, more right ventricular dysfunction and used more PAH medications than unexposed patients. This suggests that the crude association between warfarin and survival is likely to be confounded (confounding by indication). Failure to account for these systematic differences leads to biased estimates of treatment effect; in this case, it would lead to the conclusion that warfarin worsens survival.
This is likely not the case, since major hemorrhage was an infrequent cause of death. Our use of propensity score matching reduced the effect of confounding, allowing us to make a less biased estimate of the treatment effect.

The use of the Bayesian paradigm in the setting of an uncommon disease conferred a number of advantages. First, Bayesian methods allowed us to make direct probability statements about the treatment effect given the data at hand; this contrasts with traditional (frequentist) methods, which report the extremeness of the data (the "p-value") given an assumption about the true treatment effect.(6) Our inferences were not bound by a decision based on what is usually an arbitrary 0.05 level of significance. This has great utility in the study of uncommon diseases, where the numbers of patients available for study (and hence the power to detect a treatment effect at a given p-value) are limited. Second, in a Bayesian analysis, we are able to compute the probability that a treatment effect is larger than any specified threshold.(24) This allows direct, evidence-based probability statements that are useful to clinicians: in SSc-PAH and IPAH, there is a low probability of improving survival by 6 months or more with warfarin. Third, Bayesian models allow the analysis to incorporate pre-existing knowledge and beliefs in the estimation of a treatment effect, so that the estimate includes all knowledge in the area to date.(86) In our study, evaluation of the matched data with an uninformative prior indicates hazard ratios close to one. Taking into account international PAH experts' knowledge and experience gives a similar hazard ratio, with greater precision. Similarly, we analyzed the data using optimistic priors. Despite using the beliefs of the experts who have the most optimistic view of a beneficial treatment effect of warfarin, the probability of improving median survival by 6 months or more is less than 50% in either SSc-PAH or IPAH. The use of informative priors shows the community how rational individuals can interpret the study findings, given experts' knowledge and experience.
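Why an informative prior narrows the credible interval can be seen in the simplest conjugate case. The sketch below assumes a normal prior and a normal approximation to the likelihood for the log hazard ratio. This is a simplification: the analysis used Student t priors and MCMC, so the sketch will not reproduce Table 6.3. It also computes the normal-case version of Box's measure of conflict (the prior-predictive probability of an estimate at least as far from the prior mean as the one observed). Function names and inputs are hypothetical.

```python
import math
from statistics import NormalDist


def normal_update(prior_mean, prior_sd, est, se):
    """Combine a normal prior with a normal likelihood for the log HR.

    Posterior precision = prior precision + data precision; the posterior
    mean is the precision-weighted average of the prior mean and estimate.
    """
    w_prior, w_data = 1 / prior_sd ** 2, 1 / se ** 2
    post_var = 1 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * est)
    return post_mean, math.sqrt(post_var)


def box_conflict(prior_mean, prior_sd, est, se):
    """Box's prior-predictive p-value under normality.

    The prior predictive for the estimate is N(prior_mean, prior_sd^2 + se^2);
    small values signal conflict between the prior and the data.
    """
    z = abs(est - prior_mean) / math.sqrt(prior_sd ** 2 + se ** 2)
    return 2 * (1 - NormalDist().cdf(z))
```

For example, a prior with standard deviation 1 combined with an estimate whose standard error is 1 gives a posterior standard deviation of about 0.71, narrower than either source alone. Because precisions add, any proper prior narrows the posterior relative to the likelihood, which is the pattern the tri-plots display.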
Our finding provides a scientific, quantifiable answer to the question "How should this new piece of evidence change what we currently believe?"(28) We found that treatment with warfarin has a low probability of improving survival in either SSc-PAH or IPAH. This is the first study to evaluate the effect of warfarin in SSc-PAH. The study findings are discordant with some observational studies of warfarin in IPAH. Our findings could be explained within the context of the evolving understanding of the pathogenesis of PAH. The recommendation for warfarin use originates from a time when PAH was believed to be the result of thrombotic arteriopathy and abnormalities in the coagulation cascade.(149) Previous histopathologic studies report a prevalence of thrombotic arteriopathy of 40%-57%.(71, 183) Also, in the past there were limited treatment options, so the potential benefit of warfarin outweighed the known risk of hemorrhage. Over the last decade, however, the pathogenesis has been recognized to be much more complex.(184) One potential explanation for the low probability of a beneficial effect of warfarin is that the role of thrombotic arteriopathy in the pathogenesis of PAH may be smaller than previously believed.(149) Therefore, in the current era, the risks and benefits should be carefully weighed when deciding whether to anticoagulate SSc-PAH or IPAH patients.(185) Where there is a low probability of survival benefit, and alternative treatment options with demonstrable survival benefit exist, there is little role for warfarin in these patients.

Our findings also provide interesting insights into adverse events with warfarin in SSc-PAH and IPAH. In our SSc-PAH patients, there were no differences in the occurrence of hemorrhagic stroke or GI bleeding between warfarin non-exposed and exposed patients. Given the presence of GI vascular lesions (luminal telangiectasia or GAVE), SSc-PAH patients are potentially at higher risk of GI bleeding. In our study, warfarin use did not increase the risk of major GI bleeding. However, major GI bleeding was defined as bleeding requiring transfusion, and our study did not capture rates of minor bleeding. In the IPAH patients, major GI bleeding occurred more frequently in the warfarin-exposed patients. Hemorrhagic stroke occurred in 1 warfarin non-exposed IPAH patient.

There are potential limitations that should be considered in the interpretation of this study. The first potential limitation is that our matching did not account for all prognostic factors reported in the literature.
We included all important confounders specified by experts, and our standardized differences showed that our groups were as well balanced as would be expected in a randomized trial of the same size. However, we could not adjust for unknown confounders. A second limitation is our small study size. In a frequentist analysis we might have had low power to detect an important difference. However, in a Bayesian framework power is not a consideration, and we showed a very low probability of a clinically important difference. A third potential limitation is our categorization of warfarin exposure. Timing of warfarin treatment could affect the outcome. If a patient died before they were given warfarin, they would have been classified in the non-exposed group. This would bias the results toward a benefit for warfarin; if this occurred, the probability of a survival benefit with warfarin in SSc-PAH and IPAH is even lower than we report. A fourth potential limitation is our exclusion of patients who had other causes of pulmonary hypertension (e.g., interstitial lung disease). The prognosis of these patients has been shown to differ from that of patients with PAH, and as such, they were not included in this analysis. This affects the generalizability of our results, which apply only to patients with SSc-PAH and IPAH.

In conclusion, the use of innovative methods to make unbiased estimates of treatment effects using observational data is a significant contribution to the study of uncommon diseases. These methods will be valuable to researchers faced with the methodologic challenge of making inferences about treatment effects from observational data in uncommon diseases. In this study, the probability that warfarin improves survival in SSc-PAH and IPAH is low. Given the availability of other PAH therapies with demonstrable beneficial effects, there is little role for warfarin in the treatment of SSc-PAH and IPAH.

Table 6.1 SSc-PAH patient characteristics

| Characteristic, n (%) | Unmatched: no warfarin (n = 197) | Unmatched: warfarin (n = 78) | Matched: no warfarin (n = 49) | Matched: warfarin (n = 49) | Absolute standardized difference, unmatched (n = 275) | Absolute standardized difference, matched (n = 98) |
|---|---|---|---|---|---|---|
| Female sex | 165 (84%) | 66 (85%) | 45 (92%) | 44 (90%) | | |
| PAH characteristics at diagnosis | | | | | | |
| mPAP, mmHg, mean (SD) | 39.0 (14.3) | 46.8 (14.3) | 38.8 (15.3) | 42.5 (11.8) | | |
| WHO functional class III/IV | 68 (35%) | 36 (46%) | 23 (47%) | 23 (47%) | | |
| Moderate-severe RV enlargement | (13%) | 25 (32%) | 11 (22%) | 13 (27%) | | |
| Moderate-severe RV hypokinesis | (13%) | 25 (32%) | 11 (22%) | 13 (27%) | | |
| Comorbidities | | | | | | |
| Cancer | 21 (11%) | 8 (10%) | 7 (14%) | 6 (12%) | | |
| Coronary artery disease | 27 (14%) | 7 (9%) | 6 (12%) | 5 (10%) | | |
| Diabetes mellitus | 10 (5%) | 7 (9%) | 4 (8%) | 2 (4%) | | |
| Hyperlipidemia | 11 (6%) | 9 (12%) | 6 (12%) | 3 (6%) | | |
| Hypertension | 55 (28%) | 16 (21%) | 16 (33%) | 9 (18%) | | |
| Peripheral vascular disease | 9 (5%) | 4 (5%) | 2 (4%) | 3 (6%) | | |
| Ischemic stroke | 7 (4%) | 2 (3%) | 2 (4%) | 2 (4%) | | |
| Concomitant medications | | | | | | |
| Calcium channel blocker | 104 (53%) | 33 (42%) | 23 (47%) | 23 (47%) | | |
| ER antagonist | 35 (18%) | 33 (42%) | 9 (18%) | 15 (31%) | | |
| PDE inhibitor | 16 (8%) | 5 (6%) | 3 (6%) | 5 (10%) | | |
| Prostaglandin analog | 10 (5%) | 13 (17%) | 4 (8%) | 4 (8%) | | |

PAH, pulmonary arterial hypertension; mPAP, mean pulmonary artery pressure; WHO, World Health Organization; RV, right ventricular; ER, endothelin receptor; PDE, phosphodiesterase.

Table 6.2 IPAH patient characteristics

| Characteristic, n (%) | Unmatched: no warfarin (n = 64) | Unmatched: warfarin (n = 91) | Matched: no warfarin (n = 33) | Matched: warfarin (n = 33) | Absolute standardized difference, unmatched (n = 155) | Absolute standardized difference, matched (n = 66) |
|---|---|---|---|---|---|---|
| Female sex | 42 (66%) | 66 (73%) | 20 (61%) | 21 (64%) | | |
| PAH characteristics at diagnosis | | | | | | |
| mPAP, mmHg, mean (SD) | 52.9 (13.5) | 42.6 (13.3) | 51.6 (15.6) | 47.5 (15.9) | | |
| WHO functional class III/IV | 29 (45%) | 54 (59%) | 18 (55%) | 18 (55%) | | |
| Moderate-severe RV enlargement | (2%) | 16 (18%) | 1 (3%) | 4 (12%) | | |
| Moderate-severe RV hypokinesis | (30%) | 50 (55%) | 13 (39%) | 17 (52%) | | |
| Comorbidities | | | | | | |
| Cancer | 3 (5%) | 10 (11%) | 3 (9%) | 1 (3%) | | |
| Coronary artery disease | 8 (13%) | 16 (18%) | 4 (12%) | 1 (3%) | | |
| Diabetes mellitus | 14 (22%) | 15 (16%) | 9 (27%) | 7 (21%) | | |
| Hyperlipidemia | 8 (13%) | 7 (8%) | 3 (9%) | 1 (3%) | | |
| Hypertension | 24 (38%) | 24 (26%) | 10 (30%) | 11 (33%) | | |
| Peripheral vascular disease | 1 (2%) | 1 (1%) | | | | |
| Ischemic stroke | 2 (3%) | 8 (9%) | 1 (3%) | 2 (6%) | | |
| Concomitant medications | | | | | | |
| Calcium channel blocker | 12 (19%) | 33 (36%) | 11 (33%) | 12 (36%) | | |
| ER antagonist | 11 (17%) | 43 (47%) | 9 (27%) | 8 (24%) | | |
| PDE inhibitor | 12 (19%) | 24 (26%) | 9 (27%) | 8 (24%) | | |
| Prostaglandin analog | 3 (5%) | 24 (26%) | 1 (3%) | 4 (12%) | | |

PAH, pulmonary arterial hypertension; RVSP, right ventricular systolic pressure; WHO, World Health Organization; RV, right ventricular; ER, endothelin receptor; PDE, phosphodiesterase.

Table 6.3 Cox proportional hazards model survival analysis

| Statistical method | Sample size | Bayesian hazard ratio*, median (95% CrI) |
|---|---|---|
| SSc-PAH | | |
| Unmatched data | 275 | (1.02, 2.19) |
| Matched data, noninformative prior | 98 | (0.59, 2.02) |
| Matched data, informative group prior | 98 | 1.06 (0.70, 1.63) |
| Matched data, informative optimistic prior | 98 | (0.65, 1.46) |
| IPAH | | |
| Unmatched data | 155 | (0.63, 2.92) |
| Matched data, noninformative prior | 66 | (0.35, 3.01) |
| Matched data, informative group prior | 66 | 1.07 (0.57, 1.98) |
| Matched data, informative optimistic prior | 66 | (0.54, 1.64) |

CrI, credible interval; CI, confidence interval; SSc-PAH, scleroderma-associated pulmonary arterial hypertension; IPAH, idiopathic pulmonary arterial hypertension; NA, not applicable.

Figure 6.1 Bayesian tri-plot for the effect of warfarin on survival in SSc-PAH patients. Notes: A hazard ratio greater than 1 indicates increased mortality associated with warfarin exposure. A hazard ratio less than 1 indicates decreased mortality associated with warfarin exposure.

Figure 6.2 Bayesian tri-plot for the effect of warfarin on survival in IPAH patients. Notes: A hazard ratio greater than 1 indicates increased mortality associated with warfarin exposure. A hazard ratio less than 1 indicates decreased mortality associated with warfarin exposure.

Figure 6.3 Density plot for the difference in median survival times between SSc-PAH patients untreated and treated with warfarin, using an informative group prior (y-axis: relative probability). Notes: Differences in median survival greater than 0 indicate improved survival associated with warfarin exposure. Differences in median survival less than 0 indicate worsened survival associated with warfarin exposure.

Figure 6.4 Density plot for the difference in median survival times between IPAH patients untreated and treated with warfarin, using an informative group prior (y-axis: relative probability). Notes: Differences in median survival greater than 0 indicate improved survival associated with warfarin exposure. Differences in median survival less than 0 indicate worsened survival associated with warfarin exposure.

CHAPTER 7 Synthesis

"The methodologic lessons taught by randomized trials can lead to new paradigms, concepts and approaches that will achieve fundamental standards when randomized trials are not possible. The current methods need substantial improvement to produce trustworthy scientific evidence."
Alvan Feinstein, Science

Making unbiased estimates of treatment effects using observational data of uncommon disease poses two significant sets of challenges. The first set relates to the small numbers of patients with the uncommon disease available for study. The second relates to the observational (non-randomized or non-experimental) study design. Together these challenges may lead to biased, imprecise estimates of the effect of a treatment. Feinstein called upon investigators to develop and adopt rigorous methods in observational studies to make estimates of treatment effect comparable to estimates obtained from a randomized trial.(13) Through the combined use of the Bayesian inferential statistical paradigm, principles of measurement science, belief elicitation, and propensity score modeling, I have demonstrated an innovative strategy to make estimates of a treatment effect using observational data in an uncommon disease.

What has been achieved?

In this dissertation I have achieved my methodological aims. I have developed a conceptual framework for the way clinicians form beliefs about treatment effect, and identified biases that may potentially threaten the validity and reliability of the elicited beliefs. I have developed a scientific method to quantifiably elicit experts' beliefs in the form of Bayesian priors that can be included in models of treatment effect. This method incorporates strategies that may reduce the effect of potential biases, and has demonstrable feasibility, validity and reliability. I have used information from experts to directly inform propensity score model building. I have developed methods in a Bayesian framework to make propensity score adjusted estimates of treatment effect using observational data with a survival end point.

Furthermore, in this dissertation, I have achieved my clinical aims. I have determined experts' beliefs regarding the effect of warfarin on survival in SSc-PAH and IPAH. I have determined factors that influence experts' use of warfarin in SSc-PAH and IPAH, respectively. I have evaluated the effect of warfarin on survival in SSc-PAH and IPAH, respectively. Most importantly, through this dissertation, I have demonstrated that more valid estimates of treatment effect (through the use of bias reduction strategies and of all sources of knowledge) can be successfully made using observational data of an uncommon disease.

Why this work is important

Rising to Feinstein's challenge. Feinstein's challenge incorporates 2 components: 1) the application of principles used in randomized trials to observational data, and 2) the achievement of rigorous standards. A core principle of a randomized trial is random allocation of exposure. The absence of randomization in an observational study makes any inferences about treatment effect susceptible to confounding, in particular confounding by indication. In its simple form, confounding can be considered a confusion or mixing of effects, in which the effect of an exposure is distorted because the effect of another variable is mixed with the actual exposure effect.(186, 187) A more nuanced way of thinking about confounding by indication uses the counterfactual definition.(8, 188)

Savitz sets up the counterfactual concept as follows: "The ideal comparison group for the exposed group is the exposed group itself but under the condition of not having been exposed, an experience that did not, in fact, occur (thus it is counterfactual). If we could observe this experience (which we cannot), we would be able to compare the disease occurrence under the situation in which exposure has occurred to the counterfactual one in which everything else is the same except exposure was not present. Instead, we choose some other group to provide an estimate of what the experience of the exposed group would have been absent the exposure."(188) Thus, investigators are trying to estimate what they would have observed had the same individual been both exposed and nonexposed at the same time: an unobservable difference.(189) When different groups of subjects are compared, the unexposed group may have other factors that influence the disease, making their disease experience an inaccurate reflection of what the exposed group would have experienced had they not been exposed. This is referred to as non-exchangeability: the exposed and nonexposed groups are not exchangeable, even setting aside the effect of the exposure itself.(8) Confounding by indication is, therefore, a bias that distorts the measured treatment effect as a result of the way the groups were constructed.

In order to rise to Feinstein's challenge of applying the principles of randomized trials to observational studies, strategies are needed to try to achieve exchangeability. That is, methods are needed to produce groups that are functionally randomized.(8) Propensity score matching is one method that attempts to meet this challenge. Conditional on the propensity score (the probability of receiving treatment given observed characteristics), observed characteristics are balanced between exposed and unexposed patients. Exposed and unexposed patients are matched on the propensity score, creating a pseudo-randomized scenario that facilitates exchangeability. Provided that there are no unmeasured confounders, this allows unbiased estimation of the effect of a treatment at each value of the propensity score.(40)

Added value of Bayesian inference. In facing the challenges of studying an uncommon disease, our use of the Bayesian statistical paradigm has provided added value. First, we used more of the available knowledge to make inferences about treatment effect. In clinical situations that lack randomized trial data, the ability to make inferences from the available knowledge is valuable; in such situations, clinicians rely on experts to guide therapeutic decision-making. We elicited knowledge from experts in a scientific and quantifiable manner for use in models estimating treatment effect. This approach has demonstrable validity and reliability, thereby meeting Feinstein's challenge of rigorous scientific standards.

We elicited beliefs about the factors that should be included in the propensity score model. It has been recommended that variables for model building be selected on the basis of what experts consider important.(48, 49) Few studies actually ask experts or clinicians; most choose variables based on the judgment of the investigator(s). We used information from those who guide therapeutic decision-making to inform model building, thereby attempting to mirror clinical reality. Most importantly, our use of the Bayesian paradigm allowed us to make inferences about the effect of warfarin on survival using relatively small numbers of patients. Using these strategies, we made use of different forms of knowledge in the estimation of treatment effect.

The Bayesian approach also allowed us to make a clinically relevant interpretation of the data. Under a frequentist interpretation, a 95% confidence interval for a hazard ratio that crosses one would not have been clinically helpful: we would have had to conclude that there is insufficient evidence to reject the null hypothesis of no effect of warfarin on survival in SSc-PAH and IPAH, respectively. This is not the same thing as proving that warfarin does not work. In contrast, the Bayesian paradigm allowed us to make direct probability statements: there is a low probability that warfarin improves survival in SSc-PAH and IPAH.
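To make concrete what a direct probability statement involves, the sketch below works on the log hazard ratio scale with a normal approximation. A normal prior (for example, one summarizing elicited expert beliefs) is combined with a normal approximation to the likelihood from the observed data. The posterior probability that the hazard ratio is below 1, that is, that warfarin improves survival, is then read off the posterior. This conjugate normal-normal shortcut is an assumption made for illustration. It is not the survival model used in this dissertation, and the function names, prior labels and numeric inputs are hypothetical.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def posterior_log_hr(prior_mean, prior_sd, data_log_hr, data_se):
    """Combine a normal prior with a normal likelihood on the log-HR scale.

    Precision-weighted (conjugate normal-normal) update; returns the
    posterior mean and standard deviation of the log hazard ratio.
    """
    prior_prec = 1.0 / prior_sd ** 2
    data_prec = 1.0 / data_se ** 2
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * data_log_hr) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


def prob_benefit(post_mean, post_sd):
    """Posterior probability that HR < 1, i.e. log HR < 0 (treatment improves survival)."""
    return normal_cdf((0.0 - post_mean) / post_sd)


if __name__ == "__main__":
    # Hypothetical inputs for illustration only (not results from this dissertation).
    data_log_hr, data_se = math.log(1.2), 0.35
    priors = {
        "vague": (0.0, 10.0),
        "enthusiastic": (math.log(0.7), 0.2),
        "skeptical": (0.0, 0.2),
    }
    for label, (m, s) in priors.items():
        mean, sd = posterior_log_hr(m, s, data_log_hr, data_se)
        print(f"{label:12s} posterior median HR={math.exp(mean):.2f} "
              f"P(HR<1)={prob_benefit(mean, sd):.2f}")
```

Running the script prints, for each hypothetical prior, the posterior median hazard ratio (exp of the posterior mean log HR) and the posterior probability of benefit. Comparing the rows shows how much the conclusion depends on the prior.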
Furthermore, through the use of enthusiastic priors, we were able to demonstrate that even when the beliefs of experts with an optimistic view of warfarin are incorporated, there is still a low probability of a beneficial effect on survival. Our use of direct probability statements made the study findings easy to interpret for consumers of the medical literature.

Broad implications of this dissertation

Clinical implications. In this dissertation I found that warfarin has a low probability of a beneficial effect on survival in either SSc-PAH or IPAH. Given the availability of other PAH-specific therapies with demonstrable survival benefit, I believe there is little role for the use of warfarin in SSc-PAH and IPAH, respectively. However, this conclusion must be tempered. My study excluded patients with other indications for warfarin, such as a history of deep vein thrombosis, pulmonary embolism or atrial fibrillation. The use of warfarin in SSc-PAH and IPAH patients may still be warranted if other indications are present.

Methodological implications. The methods developed and applied in this dissertation are pragmatic, reproducible and not unique to this clinical area; all of them may be generalized to other disciplines. Our belief elicitation method can be used by other investigators who wish to elicit and quantify beliefs about treatment effects for inclusion in Bayesian models. Indeed, my belief elicitation method for Bayesian priors has already been used in other fields, including pediatric surgery,(190) chiropractic medicine,(191) the pharmaceutical industry(192) and geriatrics.(193) For example, Diamond et al. used this belief elicitation method to elicit the beliefs of intestinal failure experts (pediatric surgeons, transplant surgeons, neonatologists, pediatric gastroenterologists, dietitians and nurse practitioners) regarding the probability of intestinal failure-associated liver disease in infants treated with intravenous lipid emulsions.(190) The resulting prior probability distribution was then used to inform a Bayesian clinical trial evaluating these therapies. Hincapie et al. used my elicitation method to elicit the beliefs of chiropractors, general practitioners, orthopedic surgeons and neurosurgeons regarding the effect of chiropractic care on the risk of developing acute lumbar disc herniation, and acute severe lumbar disc herniation that is surgically managed.(191) Deandrea et al. used my belief elicitation method to elicit beliefs about risk factors for falls in community-dwelling older people among a sample of geriatricians and general practitioners.
The elicited prior probability distributions were then incorporated into a Bayesian meta-analysis of observational studies.(193)

Similarly, Bayesian analysis of propensity score-matched observational data with a survival outcome can be applied to other uncommon (and common) diseases. Before this dissertation, there was some uncertainty regarding the ability to apply propensity score methods with small sample sizes.(194) I have demonstrated that propensity score matching can be successfully applied in my small datasets, and that its use is not limited to administrative data. Whether I can generalize from my experience to all small datasets remains uncertain. It may be more challenging in a dataset with more confounding by indication, that is, with several strong and related predictors of treatment assignment. Another issue to consider when using propensity score matching in uncommon diseases is that inferences are based only on the patients who were successfully matched; unmatched patients are not included in the analysis. In both the SSc-PAH and IPAH datasets, unmatched patients tended to have extreme propensity scores, that is, scores less than 0.2 or greater than 0.8. Others using propensity score matching with observational data in uncommon disease have reported this finding.(195) The loss of patients may reduce the precision of the estimated effect of treatment. However, patients with extreme propensity scores may have milder or more severe disease, and direct comparisons involving them may be biased. Limiting the analysis to matched patients may improve the validity of the inference, since it is the matched patients with intermediate propensity scores for whom there is true uncertainty regarding the appropriate treatment, i.e., whether or not to treat with warfarin. Patients with extreme propensity scores do not reflect true uncertainty: those with high scores are likely to be treated, and those with low scores are likely to be untreated.

Policy implications. The use of Bayesian methods for drug regulation is controversial. The United States Food and Drug Administration has issued guidance on the use of Bayesian statistics in medical device clinical trials.(17) For medical device clinical trials, it is currently held that Bayesian methods may be considered when the reasons for their use are clear and the resulting conclusions are sufficiently robust. The meaning of clear reasons and robust conclusions is not specified. Regulatory agencies' views on the use of Bayesian methods for the evaluation of drugs, and on the role of Bayesian statistics applied to observational data, are unclear. The use of priors in drug and device regulation is also controversial: some argue that the use of priors is inappropriate,(196, 197) while others argue that priors should be mandatory.(198)
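The matching step discussed under Methodological implications above, including how patients with extreme propensity scores can end up unmatched, can be sketched as follows. This is a minimal illustration. It assumes that propensity scores have already been estimated (for example, by logistic regression on expert-selected covariates). It then uses greedy 1:1 nearest-neighbour matching without replacement within a caliper on the logit of the propensity score. The caliper of 0.2 standard deviations of the logit is a commonly cited choice, not necessarily the one used in this dissertation. The function names and scores are hypothetical.

```python
import math
import statistics


def logit(p: float) -> float:
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))


def greedy_caliper_match(treated, control, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbour matching on logit(propensity score), without replacement.

    treated, control: dicts mapping a patient id to an estimated propensity score.
    A treated patient is matched to the closest still-available control only if
    the distance on the logit scale is within caliper_sd * SD(logit score),
    where the SD is pooled over all patients (one of several conventions).
    Returns (pairs, unmatched_treated_ids).
    """
    all_logits = [logit(p) for p in list(treated.values()) + list(control.values())]
    caliper = caliper_sd * statistics.stdev(all_logits)
    available = {cid: logit(p) for cid, p in control.items()}
    pairs, unmatched = [], []
    # Visit treated patients in a fixed (sorted id) order so results are reproducible;
    # greedy matching results can depend on this order.
    for tid in sorted(treated):
        t_logit = logit(treated[tid])
        best_id, best_dist = None, caliper
        for cid, c_logit in available.items():
            dist = abs(t_logit - c_logit)
            if dist <= best_dist:
                best_id, best_dist = cid, dist
        if best_id is None:
            unmatched.append(tid)  # no control close enough: excluded from the analysis
        else:
            pairs.append((tid, best_id))
            del available[best_id]  # match without replacement
    return pairs, unmatched


if __name__ == "__main__":
    # Hypothetical propensity scores for illustration only.
    treated = {"t1": 0.55, "t2": 0.90, "t3": 0.40}
    control = {"c1": 0.52, "c2": 0.15, "c3": 0.42}
    pairs, unmatched = greedy_caliper_match(treated, control)
    print("matched pairs:", pairs)
    print("unmatched treated:", unmatched)
```

With these hypothetical scores, t1 and t3 find controls within the caliper. t2 (score 0.90) has no sufficiently close control and is dropped. This mirrors how patients with extreme propensity scores can be lost from a small matched sample.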

In the setting of uncommon disease and in the absence of randomized trial data, I believe that Bayesian methods may be particularly advantageous. In some situations, it is unlikely that a clinical trial will ever be conducted (because of cost, small numbers of patients, or lack of political will), yet data are needed to inform decision-making. Bayesian methods, as used in this dissertation, can be applied to existing observational cohorts or registries. This approach can provide regulatory agencies, as well as public and private health insurance carriers, with evidence of treatment effect to inform policy.

Limitations and future research directions

Threat of unmeasured confounding. A limitation of using observational data to make unbiased estimates of treatment effect, and a limitation of propensity score modeling, is the threat of unmeasured confounders. Classically, only factors that have been measured can be accounted for in the design and analytic phases. Recently, innovative strategies have been proposed to account for unmeasured confounding. These include Bayesian bias analysis,(199) instrumental variable analysis(200) and tracer analysis.(201) These strategies have conventionally been used in large datasets. Although they hold great promise for the study of uncommon diseases, how they perform with small samples is uncertain. In my next phase of research, I propose to use all three strategies and evaluate whether any of them confers incremental value over the others in the study of uncommon diseases.

Belief-based priors may be wrong. Critics of the inclusion of belief-based priors in models estimating treatment effect are concerned that beliefs may be wrong. A clinician may be over-confident, with a prior distribution that is too narrow and does not reflect realistic doubt.(139) One solution is to include experts in the area.
Spiegelhalter argues that over-confidence is usually observed in participants who are less knowledgeable about the topic.(139) Experts usually demonstrate a willingness to have wide (less confident) prior distributions.(132) A second concern is that experts may be biased in their opinion, being excessively optimistic(202) or excessively skeptical,(203) with prior distributions shifted too far towards or away from a beneficial treatment effect. To address this concern, it has been recommended that a range of priors from different perspectives be presented.(181) This approach is crucial. It is of paramount importance to present, as distinct entities, the evidence from the data alone and the evidence from the data combined with a variety of prior distributions. One can then ascertain whether the priors and the data are concordant or discordant. If they are discordant (i.e., there is evidence of conflict),(204) further evaluation may identify the reason for the discordance: the prior may be biased, or there may be a problem with the data (e.g., beta-blocker use in perioperative medicine).(205) In this dissertation, I did not use a single prior, but examined the evidence from the perspectives of the whole group of experts, of optimistic clinicians, and of pessimistic clinicians. In this case, there was no evidence of conflict between the expert-based prior and the data.(53)

Uncertainty in the propensity score. In this dissertation (and in virtually all published literature) the propensity score is treated as a fixed entity. In the Bayesian setting, there is uncertainty about each subject's propensity score, reflecting uncertainty in the choice of variables for the propensity score model and uncertainty in the values of its estimated parameters. It is unknown whether a treatment effect can be estimated while accounting for uncertainty in the propensity score and, if so, how this will affect the ability to apply propensity score methods in small datasets. In my next phase of research, I plan to explore the ability to propagate uncertainty in the propensity score through the estimation of a treatment effect, and its application in uncommon disease datasets.

Lack of generalizability to other types of PAH. We have demonstrated a low probability of a beneficial effect of warfarin on survival in SSc-PAH and IPAH. Whether these findings are generalizable to other forms of PAH is uncertain. The question remains: does warfarin improve survival in other types of PAH?
Given the availability of longitudinal observational data for PAH in other settings, the evaluation of this clinical question is the next step in this clinical program of research.

Conclusion

In this dissertation, I have risen to Feinstein's challenge. Using the principles of Bayesian statistical inference, measurement science, and propensity score modeling, I have successfully developed and applied methods that use observational data in an uncommon disease to make more valid estimates of treatment effect. Using a broad range of clinical epidemiologic skills (systematic review of the literature, belief elicitation in a cross-sectional study design, and analysis of observational cohort data with a survival outcome), I have demonstrated that warfarin has a low probability of improving survival in SSc-PAH and IPAH, respectively. The successful application of these skills and methods is important for the study of uncommon diseases. These methods can be generalized to other disciplines, and this body of work lays a solid foundation for future research.

References
1. Bloom S. Registries in chronic disease: coming your way soon? Registries--problems, solutions and the future. Rheumatology (Oxford) Jan;50(1):
2. Bosco JL, Silliman RA, Thwin SS, Geiger AM, Buist DS, Prout MN, et al. A most stubborn bias: no adjustment method fully resolves confounding by indication in observational studies. J Clin Epidemiol Jan;63(1):
3. Rubin DB. Estimating causal effects from large data sets using propensity scores. Ann Intern Med Oct 15;127(8 Pt 2):
4. Berry DA. Bayesian clinical trials. Nat Rev Drug Discov. 2006;5(1):
5. Pope JE, Bellamy N, Seibold JR, Baron M, Ellman M, Carette S, et al. A randomized, controlled trial of methotrexate versus placebo in early diffuse scleroderma. Arthritis Rheum Jun;44(6):
6. Burton PR, Gurrin LC, Campbell MJ. Clinical significance not statistical significance: a simple Bayesian alternative to p values. J Epidemiol Community Health. 1998;52(5):
7. Hughes MD, Williams PL. Challenges in using observational studies to evaluate adverse effects of treatment. N Engl J Med Apr 26;356(17):
8. Savitz DA. Interpreting Epidemiologic Evidence. Strategies for study design and analysis. Oxford: Oxford University Press, Inc.;
9. Berry DA. Statistics: A Bayesian Perspective: Wadsworth Publishing Company;
10. Fletcher RH, Fletcher SW. Clinical Epidemiology. The Essentials. Fourth ed. Baltimore: Lippincott Williams and Wilkins;
11. Feinstein AR. Clinical Epidemiology. The Architecture of clinical research. Philadelphia: W. B. Saunders Company;
12. Salas M, Hofman A, Stricker BH. Confounding by indication: an example of variation in the use of epidemiologic terminology. Am J Epidemiol Jun 1;149(11):
13. Feinstein AR. Scientific standards in epidemiologic studies of the menace of daily life. Science Dec 2;242(4883):
14. Hudson M, Suissa S. Avoiding common pitfalls in the analysis of observational studies of new treatments for rheumatoid arthritis. Arthritis Care Res (Hoboken) Jun;62(6):

15. Hawking S, Mlodinow L. The Grand Design: Bantam Books;
16. Streiner DL, Norman GR. Health Measurement Scales. A Practical Guide to their Development and Use. Fourth ed. Oxford: Oxford University Press;
17. Campbell G. Guidance for the Use of Bayesian Statistics in Medical Device Clinical Trials. Rockville, MD
18. Malakoff D. Bayes offers a 'new' way to make sense of numbers. Science Nov 19;286(5444):
19. Kadane JB. Prime time for Bayes. Control Clin Trials Oct;16(5):
20. Bayes T. An essay towards solving a problem in the doctrine of chances. Philos T Roy Soc. 1763;53:
21. Barnard G, Bayes T. Studies in the history of probability and statistics: IX. Thomas Bayes's essay towards solving a problem in the doctrine of chances. Biometrika. 1958;45(3/4):
22. Wijeysundera DN, Austin PC, Hux JE, Beattie WS, Laupacis A. Bayesian statistical inference enhances the interpretation of contemporary randomized controlled trials. J Clin Epidemiol Jan;62(1):13-21 e
23. Burton PR. Helping doctors to draw appropriate inferences from the analysis of medical studies. Stat Med Sep 15;13(17):
24. Johnson SR, Feldman BM, Pope JE, Tomlinson GA. Shifting our thinking about uncommon disease trials: the case of methotrexate in scleroderma. J Rheumatol Feb;36(2):
25. Johnson SR. Bayesian inference: statistical gimmick or added value? J Rheumatol May;38(5):
26. Huber AM, Tomlinson GA, Koren G, Feldman BM. Amitriptyline to relieve pain in juvenile idiopathic arthritis: a pilot study using Bayesian metaanalysis of multiple N-of-1 clinical trials. J Rheumatol May;34(5):
27. Launois R, Avouac B, Berenbaum F, Blin O, Bru I, Fautrel B, et al. Comparison of certolizumab pegol with other anticytokine agents for treatment of rheumatoid arthritis: a multiple-treatment Bayesian metaanalysis. J Rheumatol May;38(5):
28. Spiegelhalter DJ, Abrams KR, Myles JP. Bayesian approaches to clinical trials and health care evaluation. Chichester: John Wiley and Sons Ltd
29. Felson DT, Anderson JJ, Boers M, Bombardier C, Furst D, Goldsmith C, et al. American College of Rheumatology. Preliminary definition of improvement in rheumatoid arthritis. Arthritis Rheum Jun;38(6):

30. Guyot P, Taylor PC, Christensen R, Pericleous L, Drost P, Eijgelshoven I, et al. Indirect treatment comparison of abatacept with methotrexate versus other biologic agents for active rheumatoid arthritis despite methotrexate therapy in the United Kingdom. J Rheumatol Jun;39(6):
31. Bernatsky S, Joseph L, Pineau CA, Belisle P, Boivin JF, Banerjee D, et al. Estimating the prevalence of polymyositis and dermatomyositis from administrative data: age, sex and regional differences. Ann Rheum Dis Jul;68(7):
32. Bernatsky S, Lix L, Hanly JG, Hudson M, Badley E, Peschken C, et al. Surveillance of systemic autoimmune rheumatic diseases using administrative data. Rheumatol Int Apr;31(4):
33. Pope JE, Ouimet JM, Krizova A. Scleroderma treatment differs between experts and general rheumatologists. Arthritis Rheum Feb 15;55(1):
34. Kowal-Bielecka O, Distler O. Use of methotrexate in patients with scleroderma and mixed connective tissue disease. Clin Exp Rheumatol Sep-Oct;28(5 Suppl 61):S
35. Willan A. Why a Bayesian Be? In: Canada SSo, editor. Wolfville, NS
36. Abrahamyan L, Diamond IR, Feldman BM, Johnson SR. Rare Disease Trials: Developing the Evidence Base. Clin Pharmacol Ther. 2012;Submitted.
37. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70(1):
38. D'Agostino RB. Tutorial in biostatistics. Propensity score methods for bias reduction in the comparison of a treatment to a non-randomized control group. Statist Med. 1998;17:
39. Rosenbaum PR, Rubin DB. Reducing bias in observational studies using subclassification on the propensity score. Journal of the American Statistical Association. 1984;79(387):
40. Williamson E, Morley R, Lucas A, Carpenter J. Propensity scores: From naïve enthusiasm to intuitive understanding. Stat Methods Med Res Jan
41. Austin PC. Some methods of propensity-score matching had superior performance to others: results of an empirical investigation and Monte Carlo simulations. Biom J. 2009;51(1):
42. Rosenbaum PR, Rubin DB. Constructing a control group using multivariate matched sampling methods that incorporate the propensity score. Am Stat. 1985;39(1):
43. Cochran WG. The effectiveness of adjustment by subclassification in removing bias in observational studies. Biometrics Jun;24(2):

44. Cole SR, Hernan MA. Constructing inverse probability weights for marginal structural models. Am J Epidemiol Sep 15;168(6):
45. Rubin DB. The design versus the analysis of observational studies for causal effects: parallels with the design of randomized trials. Stat Med Jan 15;26(1):
46. Austin PC, Grootendorst P, Anderson GM. A comparison of the ability of different propensity score models to balance measured variables between treated and untreated subjects: a Monte Carlo study. Stat Med Feb 20;26(4):
47. Austin PC. Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Statist Med. 2009;28:
48. Austin PC. Propensity-score matching in the cardiovascular surgery literature from 2004 to 2006: a systematic review and suggestions for improvement. J Thorac Cardiovasc Surg. 2007;134(5):
49. McCandless LC, Gustafson P, Austin PC. Bayesian propensity score analysis for observational data. Stat Med Jan 15;28(1):
50. Johnson SR, Tomlinson GA, Hawker GA, Granton JT, Feldman BM. Methods to elicit beliefs for Bayesian priors: a systematic review. J Clin Epidemiol. 2010;63(4):
51. Johnson SR, Tomlinson GA, Hawker GA, Granton JT, Grosbein HA, Feldman BM. A valid and reliable belief elicitation method for Bayesian priors. J Clin Epidemiol. 2010;63(4):
52. Johnson SR, Granton JT, Tomlinson GA, Grosbein HA, Hawker GA, Feldman BM. Effect of warfarin on survival in scleroderma-associated pulmonary arterial hypertension (SSc-PAH) and idiopathic PAH. Belief elicitation for Bayesian priors. J Rheumatol Mar;38(3):
53. Johnson SR, Granton JT, Tomlinson GA, Grosbein HA, Le T, Lee P, et al. Warfarin in systemic sclerosis-associated and idiopathic pulmonary arterial hypertension. A Bayesian approach to evaluating treatment for uncommon disease. J Rheumatol Feb;39(2):
54. Silman AJ. Scleroderma and survival. Ann Rheum Dis. 1991;50(4):
55. Steen VD, Medsger TA. Changes in causes of death in systemic sclerosis, Ann Rheum Dis. 2007;66(7):
56. Salerni R, Rodnan GP, Leon DF, Shaver JA. Pulmonary hypertension in the CREST syndrome variant of progressive systemic sclerosis (scleroderma). Ann Intern Med. 1977;86(4):

57. Ungerer RG, Tashkin DP, Furst D, Clements PJ, Gong H, Jr., Bein M, et al. Prevalence and clinical correlates of pulmonary arterial hypertension in progressive systemic sclerosis. Am J Med. 1983;75(1):
58. Mukerjee D, St George D, Coleiro B, Knight C, Denton CP, Davar J, et al. Prevalence and outcome in systemic sclerosis associated pulmonary arterial hypertension: application of a registry approach. Ann Rheum Dis. 2003;62(11):
59. Murata I, Takenaka K, Yoshinoya S, Kikuchi K, Kiuchi T, Tanigawa T, et al. Clinical evaluation of pulmonary hypertension in systemic sclerosis and related disorders. A Doppler echocardiographic study of 135 Japanese patients. Chest Jan;111(1):
60. Fisher MR, Mathai SC, Champion HC, Girgis RE, Housten-Harris T, Hummers L, et al. Clinical differences between idiopathic and scleroderma-related pulmonary hypertension. Arthritis Rheum Sep;54(9):
61. Koh ET, Lee P, Gladman DD, Abu-Shakra M. Pulmonary hypertension in systemic sclerosis: an analysis of 17 patients. Br J Rheumatol Oct;35(10):
62. Campo A, Mathai SC, Le Pavec J, Zaiman AL, Hummers LK, Boyce D, et al. Hemodynamic predictors of survival in scleroderma-related pulmonary arterial hypertension. Am J Respir Crit Care Med. 2010;182(2):
63. Peacock AJ, Murphy NF, McMurray JJ, Caballero L, Stewart S. An epidemiological study of pulmonary arterial hypertension. Eur Respir J. 2007;30(1):
64. D'Alonzo GE, Barst RJ, Ayres SM, Bergofsky EH, Brundage BH, Detre KM, et al. Survival in patients with primary pulmonary hypertension. Results from a national prospective registry. Ann Intern Med. 1991;115(5):
65. Barst RJ, Galie N, Naeije R, Simonneau G, Jeffs R, Arneson C, et al. Long-term outcome in pulmonary arterial hypertension patients treated with subcutaneous treprostinil. Eur Respir J. 2006;28(6):
66. McLaughlin VV, Sitbon O, Badesch DB, Barst RJ, Black C, Galie N, et al. Survival with first-line bosentan in patients with primary pulmonary hypertension. Eur Respir J. 2005;25(2):
67. Badesch DB, Abman SH, Simonneau G, Rubin LJ, McLaughlin VV. Medical therapy for pulmonary arterial hypertension: updated ACCP evidence-based clinical practice guidelines. Chest Jun;131(6):
68. McLaughlin VV, Archer SL, Badesch DB, Barst RJ, Farber HW, Lindner JR, et al. ACCF/AHA 2009 expert consensus document on pulmonary hypertension: a report of the American College of Cardiology Foundation Task Force on Expert Consensus Documents and the American Heart Association developed in collaboration with the American College of Chest Physicians; American Thoracic Society, Inc.; and the Pulmonary Hypertension Association. J Am Coll Cardiol. 2009;53(17):
69. Galie N, Hoeper MM, Humbert M, Torbicki A, Vachiery JL, Barbera JA, et al. Guidelines for the diagnosis and treatment of pulmonary hypertension: the Task Force for the Diagnosis and Treatment of Pulmonary Hypertension of the European Society of Cardiology (ESC) and the European Respiratory Society (ERS), endorsed by the International Society of Heart and Lung Transplantation (ISHLT). Eur Heart J. 2009;30(20):
70. Johnson SR, Mehta S, Granton JT. Anticoagulation in pulmonary arterial hypertension: a qualitative systematic review. Eur Respir J. 2006;28(5):
71. Fuster V, Steele PM, Edwards WD, Gersh BJ, McGoon MD, Frye RL. Primary pulmonary hypertension: natural history and the importance of thrombosis. Circulation Oct;70(4):
72. Ogata M, Ohe M, Shirato K, Takishima T. Effects of a combination therapy of anticoagulant and vasodilator on the long-term prognosis of primary pulmonary hypertension. Jpn Circ J Jan;57(1):
73. Rich S, Kaufmann E, Levy PS. The effect of high doses of calcium-channel blockers on survival in primary pulmonary hypertension. N Engl J Med Jul 9;327(2):
74. Roman A, Rodes-Cabau J, Lara B, Bravo C, Monforte V, Pallissa E, et al. [Clinico-hemodynamic study and treatment of 44 patients with primary pulmonary hypertension]. Med Clin (Barc) Jun 1;118(20):
75. Kawut SM, Horn EM, Berekashvili KK, Garofano RP, Goldsmith RL, Widlitz AC, et al. New predictors of outcome in idiopathic pulmonary arterial hypertension. Am J Cardiol Jan 15;95(2):
76. Goodwin JF, Harrison CV, Wilcken DE. Obliterative pulmonary hypertension and thrombo-embolism. Br Med J Mar 16;1(5332): contd.
77. Goodwin JF, Harrison CV, Wilcken DE. Obliterative pulmonary hypertension and thromboembolism. Br Med J Mar 23;1(5333): concl.
78. Storstein O, Efskind L, Muller C, Rokseth R, Sander S. Primary pulmonary hypertension with emphasis on its etiology and treatment. Acta Med Scand Feb;179(2):
79. Frank H, Mlczoch J, Huber K, Schuster E, Gurtner HP, Kneussl M. The effect of anticoagulant therapy in primary and anorectic drug-induced pulmonary hypertension. Chest Sep;112(3):

80. Barst RJ, Gibbs JS, Ghofrani HA, Hoeper MM, McLaughlin VV, Rubin LJ, et al. Updated evidence-based treatment algorithm in pulmonary arterial hypertension. J Am Coll Cardiol. 2009;54(1 Suppl):S78-S
81. Horn EM, Barst RJ, Poon M. Epoprostenol for treatment of pulmonary hypertension in patients with systemic lupus erythematosus. Chest Oct;118(4):
82. Beyth RJ, Quinn LM, Landefeld CS. Prospective evaluation of an index for predicting the risk of major bleeding in outpatients treated with warfarin. Am J Med. 1998;105(2):
83. McMahan DA, Smith DM, Carey MA, Zhou XH. Risk of major hemorrhage for outpatients treated with warfarin. J Gen Intern Med. 1998;13(5):
84. Omair MA, Johnson SR. Gastric antral vascular ectasia unmasked by alprostadil for digital ulceration in scleroderma. J Rheumatol. 2011;38(4):
85. Duchini A, Sessoms SL. Gastrointestinal hemorrhage in patients with systemic sclerosis and CREST syndrome. Am J Gastroenterol Sep;93(9):
86. Chaloner K, Rhame FS. Quantifying and documenting prior beliefs in clinical trials. Stat Med. 2001(4):
87. Carlin BP, Chaloner K, Church T, Louis TA, Matts JP. Bayesian Approaches for Monitoring Clinical Trials with an Application to Toxoplasmic Encephalitis Prophylaxis. Statistician. 1993;
88. White IR, Pocock SJ, Wang D. Eliciting and using expert opinions about influence of patient characteristics on treatment effects: A Bayesian analysis of CHARM trials. Stat Med;24:
89. Moye LA. Bayesians in clinical trials: asleep at the switch. Stat Med Feb 20;27(4):469-82; discussion
90. Spiegelhalter DJ. Incorporating Bayesian ideas into health-care evaluation. Statistical Science;19(1):
91. Singh JA, Solomon DH, Dougados M, Felson D, Hawker G, Katz P, et al. Development of classification and response criteria for rheumatic diseases. Arthritis Rheum Jun 15;55(3):
92. Johnson SR, Hawker GA, Davis AM. The health assessment questionnaire disability index and scleroderma health assessment questionnaire in scleroderma trials: an evaluation of their measurement properties. Arthritis Rheum. 2005;53(2):
93. Liang MH. Longitudinal construct validity: establishment of clinical meaning in patient evaluative instruments. Med Care Sep;38(9 Suppl):II

94. Feinstein AR. The Theory and Evaluation of Sensibility. In: Feinstein AR, editor. Clinimetrics. New Haven: Yale University Press; p
95. Kadane JB, Wolfson LJ. Experiences in Elicitation. Statistician. 1998;47:
96. Lehmann HP, Goodman SN. Bayesian Communication: A clinically significant paradigm for electronic publication. J Am Med Technol. 2000;3:
97. Ramachandran G. Retrospective exposure assessment using Bayesian methods. Ann Occup Hyg Nov;45(8):
98. Kadane JB. Progress toward a more ethical method for clinical trials. J Med Philos. 1986;11(4):
99. Hutton JL, Owens RG. Bayesian Sample Size Calculation and Prior Beliefs About Child Sexual Abuse. Statistician. 1993;
100. Van der Fels-Klerx IH, Goossens LH, Saatkamp HW, Horst SH. Elicitation of quantitative data from a heterogeneous expert panel: formal process and application in animal health. Risk Anal. 2002;22(1):
101. Freedman LS, Spiegelhalter DJ. The Assessment of Subjective Opinion and its Use in Relation to Stopping Rules for Clinical Trials. Statistician. 1983;
102. Spiegelhalter DJ, Freedman LS, Parmar MK. Applying Bayesian ideas in drug development and clinical trials. Stat Med Aug;12(15-16):; discussion
103. O'Hagan A. Eliciting Expert Beliefs in Substantial Practical Application. Statistician. 1998;
104. Bergus GR, Chapman GB, Gjerde C, Elstein AS. Clinical reasoning about new symptoms despite preexisting disease: sources of error and order effects. Fam Med. 1995;27(5):
105. Carter BL, Butler CD, Rogers JC, Holloway RL. Evaluation of physician decision making with the use of prior probabilities and a decision-analysis model. Arch Fam Med. 1993;2(5):
106. Chaloner K, Church T, Louis TA, Matts JP. Graphical Elicitation of a Prior Distribution for a Clinical Trial. Statistician. 1993;
107. Evans JS, Handley SJ, Over DE, Perham N. Background beliefs in Bayesian inference. Mem Cognit. 2002;30(2):
108. Gustafson DH, Sainfort F, Eichler M, Adams L, Bisognano M, Steudel H. Developing and testing a model to predict outcomes of organizational change. Health Serv Res. 2003;38(2):

109. Johnson NP, Fisher RA, Braunholtz DA, Gillett WR, Lilford RJ. Survey of Australasian clinicians' prior beliefs concerning lipiodol flushing as a treatment for infertility: A Bayesian study. Aust NZ J Obstet Gyn. 2006(4):
110. Van Der Wilt GJ, Rovers M, Straatman H, Van Der Bij S, Van Den Broek P, Zielhuis G. Policy relevance of Bayesian statistics overestimated? Int J Technol Assess. 2004(4):
111. Rovers MM, van der Wilt GJ, van der Bij S, Straatman H, Ingels K, Zielhuis GA. Bayes' theorem: a negative example of a RCT on grommets in children with glue ear. Eur J Epidemiol. 2005;20(1):
112. Winkler RL. The Assessment of Prior Distributions in Bayesian Analysis. J Am Stat Assoc. 1967;62:
113. Chaloner K. Elicitation of Prior Distributions. In: Berry DA, Stangl DK, editors. Bayesian Biostatistics. New York: Marcel Dekker Inc; p
114. Normand SL, Frank RG, McGuire TG. Using elicitation techniques to estimate the value of ambulatory treatments for major depression. Med Decis Making. 2002;22(3):
115. Lilford R. Formal measurement of clinical uncertainty: prelude to a trial in perinatal medicine. The Fetal Compromise Group. BMJ. 1994;308(6921):
116. Lilford RJ, Braunholtz D. The statistical basis of public policy: a paradigm shift is overdue. BMJ. 1996;313(7057):
117. Garthwaite PH, Dickey JM. An Elicitation Method for Multiple Linear Regression Models. J Behav Decis Making. 1991;
118. Garthwaite PH, Dickey JM. Elicitation of Prior Distributions for Variable Selection Problems in Regression. Ann Stat. 1992;20(4):
119. White IR, Pocock SJ, Wang D. Eliciting and using expert opinions about influence of patient characteristics on treatment effects: A Bayesian analysis of the CHARM trials. Stat Med. 2005(24):
120. Parmar MK, Spiegelhalter DJ, Freedman LS. The CHART trials: Bayesian design and monitoring in practice. CHART Steering Committee. Stat Med. 1994;13(13-14):
121. Parmar MK, Griffiths GO, Spiegelhalter DJ, Souhami RL, Altman DG, van der SE. Monitoring of large randomised clinical trials: a new approach with Bayesian methods. Lancet. 2001;358(9279):
122. Tan SB, Chung YF, Tai BC, Cheung YB, Machin D. Elicitation of prior distributions for a phase III randomized controlled trial of adjuvant therapy with surgery for hepatocellular carcinoma. Control Clin Trials. 2003;24(2):

123. de Vet HC, Kessels AG, Leffers P, Knipschild PG. A randomized trial about the perceived informativeness of new empirical evidence. Does beta-carotene prevent (cervical) cancer? J Clin Epidemiol. 1993;46(6):
124. Jones P, Johanson R, Baldwin KJ, Lilford R. Changing belief in obstetrics: impact of two multicentre randomised controlled trials. Lancet. 1998;352(9145):
125. Hughes MD. Practical reporting of Bayesian analyses of clinical trials. Drug Inf J. 1991(3):
126. Parmar MKB, Ungerleider RS, Simon R. Assessing whether to perform a confirmatory randomized clinical trial. J Natl Cancer I. 1996(22):
127. Errington RD, Ashby D, Gore SM, Abrams KR, Myint S, Bonnett DE, et al. High energy neutron treatment for pelvic cancers: study stopped because of increased mortality. BMJ. 1991;302(6784):
128. Flournoy N. A Clinical Experiment in Bone Marrow Transplantation: Estimating a Percentage Point of a Quantal Response Curve. In: Gatsonis C, Hodges JS, Kass RE, Singpurwalla ND, editors. Lecture Notes in Statistics. New York: Springer-Verlag; p
129. Winkler RL. The Quantification of Judgement: Some Methodological Suggestions. J Am Stat Assoc. 1967;62(320):
130. Evans JS, Brooks PG, Pollard P. Prior beliefs and statistical inference. Br J Psychol. 1985;76:
131. Winkler RL. Probabilistic Prediction: Some Experimental Results. J Am Stat Assoc. 1971;66(336):
132. Hogarth RM. Cognitive Processes and the Assessment of Subjective Probability Distributions. J Am Stat Assoc. 1975;70(350):
133. Kadane JB. An Application of Robust Bayesian Analysis to a Medical Experiment. J Stat Plan Infer. 1994;
134. Li Y, Krantz DH. Experimental Tests of Subjective Bayesian Methods. The Psychological Record. 2005;
135. Spiegelhalter DJ, Freedman LS, Parmar MK. Bayesian Approaches to Randomized Trials. J R Statist Soc A. 1994;
136. Clemen RT, Winkler RL. Combining probability distributions from experts in risk analysis. Risk Anal. 1999;

137. Murphy AH, Winkler RL. Reliability of subjective probability forecasts of precipitation and temperature. Appl Statist. 1977;
138. Wallsten TS, Budescu DV. Encoding Subjective Probabilities: A Psychological and Psychometric Review. Manage Sci. 1983;29(2):
139. Spiegelhalter DJ, Freedman LS. A predictive approach to selecting the size of a clinical trial, based on subjective clinical opinion. Stat Med Jan-Feb;5(1):
140. Savage LJ. Elicitation of Personal Probabilities and Expectations. J Am Stat Assoc. 1971;66(336):
141. Dumouchel W. A Bayesian Model and a Graphical Elicitation Procedure for Multiple Comparisons. Bayesian Statistics. 1988;
142. Fink CW. Proposal for the development of classification criteria for idiopathic arthritides of childhood. J Rheumatol. 1995;22(8):
143. Genest C, Zidek JV. Combining Probability Distributions: A Critique and an Annotated Bibliography. Stat Sci. 1986;1(1):
144. O'Hagan A, Buck CE, Daneshkhah A, Eiser JR, Garthwaite PH, Jenkinson DJ. Uncertain Judgements. Eliciting experts' probabilities. Chichester: John Wiley & Sons;
145. Trochim WM. The Research Methods Knowledge Base. Cincinnati, Ohio: Atomic Dog Publishing;
146. Morgan MG, Henrion M. Human Judgement about and with Uncertainty. Uncertainty: A Guide to Dealing with Uncertainty in Quantitative Risk and Policy Analysis. Cambridge: Cambridge University Press; p
147. McClish DK, Powell SH. How well can physicians estimate mortality in a medical intensive care unit? Med Decis Making. 1989;9(2):
148. Hiance A, Chevret S, Levy V. A practical approach for eliciting expert prior beliefs about cancer survival in phase III randomized trial. J Clin Epidemiol. 2009;62(4):
149. Johnson SR, Granton JT, Mehta S. Thrombotic arteriopathy and anticoagulation in pulmonary hypertension. Chest. 2006;130(2):
150. Weijer C, Shapiro SH, Cranley GK. For and against: clinical equipoise and not the uncertainty principle is the moral underpinning of the randomised controlled trial. BMJ. 2000;321(7263):
151. Dillman DA, Smyth JD, Christian LM. Internet, mail, and mixed-mode surveys. The tailored design method. Hoboken NJ: John Wiley & Sons, Inc;

152. Streiner DL, Norman GR. Health Measurement Scales. A Practical Guide to their Development and Use. New York: Oxford University Press;
153. Cicchetti DV, Allison T. A new procedure for assessing reliability of scoring EEG sleep recordings. Am J EEG Technol. 1971;
154. Walter SD, Eliasziw M, Donner A. Sample size and optimal designs for reliability studies. Stat Med. 1998;17(1):
155. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychological Bulletin. 1979;
156. McHorney CA, Tarlov AR. Individual-patient monitoring in clinical practice: are available health status surveys adequate? Qual Life Res. 1995;4(4):
157. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics. 1977;33(1):
158. Nunnally JC. Psychometric theory. New York: McGraw-Hill;
159. Scussel-Lonzetti L, Joyal F, Raynauld JP, Roussin A, Rich E, Goulet JR, et al. Predicting mortality in systemic sclerosis: analysis of a cohort of 309 French Canadian patients with emphasis on features at diagnosis as predictive factors for survival. Medicine (Baltimore). 2002;81(2):
160. Williams MH, Das C, Handler CE, Akram MR, Davar J, Denton CP, et al. Systemic sclerosis associated pulmonary hypertension: improved survival in the current era. Heart. 2006;92(7):
161. Welsh CH, Hassell KL, Badesch DB, Kressin DC, Marlar RA. Coagulation and fibrinolytic profiles in patients with severe pulmonary hypertension. Chest. 1996;110(3):
162. Gifford F. Uncertainty about clinical equipoise. Clinical equipoise and the uncertainty principles both require further scrutiny. BMJ Mar 31;322(7289):
163. Granton JT. Canadian Pulmonary Hypertension Trials Network Meeting. Lake Louise
164. Senn S. Testing for baseline balance in clinical trials. Stat Med Sep 15;13(17):
165. Austin PC, Brunner LJ, Hux JE. Bayeswatch: an overview of Bayesian statistics. J Eval Clin Pract. 2002;8(2):
166. Norman GR, Streiner DL. Biostatistics. The Bare Essentials. Hamilton: BC Decker;

167. Johnson SR, Swiston JR, Granton JT. Prognostic factors for survival in scleroderma associated pulmonary arterial hypertension. J Rheumatol. 2008;35(8):
168. Ogawa A, Matsubara H, Fujio H, Miyaji K, Nakamura K, Morita H, et al. Risk of alveolar hemorrhage in patients with primary pulmonary hypertension--anticoagulation and epoprostenol therapy. Circ J Feb;69(2):
169. Warn DE, Thompson SG, Spiegelhalter DJ. Bayesian random effects meta-analysis of trials with binary outcomes: methods for the absolute risk difference and relative risk scales. Stat Med. 2002;21(11):
170. Sung L, Hayden J, Greenberg ML, Koren G, Feldman BM, Tomlinson GA. Seven items were identified for inclusion when reporting a Bayesian analysis of a clinical study. J Clin Epidemiol Mar;58(3):
171. Steen V, Medsger TA Jr. Predictors of isolated pulmonary hypertension in patients with systemic sclerosis and limited cutaneous involvement. Arthritis Rheum Feb;48(2):
172. Johnson SR, Feldman BM, Hawker GA. Classification criteria for systemic sclerosis subsets. J Rheumatol Sep;34(9):
173. Leidy NK. Evolving concepts in the measurement of treatment effects. Proc Am Thorac Soc May;3(3):
174. Medsger TA Jr. Progressive systemic sclerosis. Clin Rheum Dis Dec;9(3):
175. Avouac J, Airo P, Meune C, Beretta L, Dieude P, Caramaschi P, et al. Prevalence of pulmonary hypertension in systemic sclerosis in European Caucasians and meta-analysis of 5 studies. J Rheumatol. 2010;37(11):
176. Hachulla E, Gressin V, Guillevin L, Carpentier P, Diot E, Sibilia J, et al. Early detection of pulmonary arterial hypertension in systemic sclerosis: a French nationwide prospective multicenter study. Arthritis Rheum. 2005;52(12):
177. Channick RN, Simonneau G, Sitbon O, Robbins IM, Frost A, Tapson VF, et al. Effects of the dual endothelin-receptor antagonist bosentan in patients with pulmonary hypertension: a randomised placebo-controlled study. Lancet. 2001;358(9288):
178. Rubin LJ, Badesch DB, Barst RJ, Galie N, Black CM, Keogh A, et al. Bosentan therapy for pulmonary arterial hypertension. N Engl J Med. 2002;346(12):
179. Simonneau G, Robbins IM, Beghetti M, Channick RN, Delcroix M, Denton CP, et al. Updated clinical classification of pulmonary hypertension. J Am Coll Cardiol. 2009;54(1 Suppl):S43-S

180. Austin PC. Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Stat Med Nov 10;28(25):
181. Spiegelhalter D, Abrams K, Myles J. Bayesian approaches to clinical trials and health-care evaluation. Chichester: John Wiley & Sons, Ltd;
182. Johnson SR, Granton JT, Tomlinson GA, Grosbein HA, Le T, Lee P, et al. Warfarin in Systemic Sclerosis-associated and Idiopathic Pulmonary Arterial Hypertension. A Bayesian Approach to Evaluating Treatment for Uncommon Disease. J Rheumatol. 2012;In press
183. Bjornsson J, Edwards WD. Primary pulmonary hypertension: a histopathologic study of 80 cases. Mayo Clin Proc. 1985;60(1):
184. Farber HW, Loscalzo J. Pulmonary arterial hypertension. N Engl J Med. 2004;351(16):
185. Badesch DB, Abman SH, Ahearn GS, Barst RJ, McCrory DC, Simonneau G, et al. Medical therapy for pulmonary arterial hypertension: ACCP evidence-based clinical practice guidelines. Chest. 2004;126(1 Suppl):35S-62S
186. Rothman KJ, Greenland S. Modern Epidemiology. Second ed. Philadelphia: Lippincott-Raven;
187. Last JM. A Dictionary of Epidemiology. Fourth ed. New York: Oxford University Press;
188. Greenland S, Robins JM. Identifiability, exchangeability, and epidemiologic confounding. Int J Epidemiol. 1986;15:
189. Curtis LH, Hammill BG, Eisenstein EL, Kramer JM, Anstrom KJ. Using inverse probability-weighted estimators in comparative effectiveness analyses with observational databases. Med Care Oct;45(10 Suppl 2):S
190. Diamond IR, Grant RC, Feldman BM, Tomlinson GA, Pencharz PB, Ling SC, et al. Expert beliefs regarding novel lipid based approaches to pediatric intestinal failure associated liver diseases. Submitted
191. Hincapié CA, Tomlinson G, Rampersaud YR, Cassidy JD. Effect of chiropractic treatment on the risk of developing acute lumbar spine disc herniation: a belief elicitation study for Bayesian priors. Toronto
192. Kinnersley N
193. Deandrea S, Negri E, Ruggeri F. Integrating clinicians' opinions in the Bayesian meta-analysis of observational studies: the case of risk factors for falls in community-dwelling older people

194. Caliendo M, Kopeinig S. Some practical guidance for the implementation of propensity score matching. Bonn; 2005.
195. Seshadri R, Feldman BM, Ilowite N, Cawkwell G, Pachman LM. The role of aggressive corticosteroid therapy in patients with juvenile dermatomyositis: a propensity score analysis. Arthritis Rheum Jul 15;59(7):
196. Whitehead J. Bayesian decision procedures with application to dose-finding studies. Int J Pharmaceut Med. 1997;11(4):
197. Koch GG. Summary and discussion for 'Statistical issues in the pharmaceutical industry: analysis and reporting of phase III clinical trials including kinetic/dynamic analysis and Bayesian analysis'. Drug Inf J. 1991;25:
198. Matthews R. Fact versus factions: the use and abuse of subjectivity in scientific research. Cambridge: European Science and Environment Forum
199. Steenland K, Greenland S. Monte Carlo sensitivity analysis and Bayesian analysis of smoking as an unmeasured confounder in a study of silica and lung cancer. Am J Epidemiol. 2004;160:
200. Stukel TA, Fisher ES, Wennberg DE, Alter DA, Gottlieb DJ, Vermeulen MJ. Analysis of observational studies in the presence of treatment selection bias: effects of invasive cardiac management on AMI survival using propensity score and instrumental variable methods. JAMA Jan 17;297(3):
201. Hackam DG, Mamdani M, Li P, Redelmeier DA. Statins and sepsis in patients with cardiovascular disease: a population-based cohort analysis. Lancet. 2006;367(9508):
202. Fayers PM, Cuschieri A, Fielding J, Craven J, Uscinska B, Freedman LS. Sample size calculation for clinical trials: the impact of clinician beliefs. Br J Cancer Jan;82(1):
203. Pocock S. Discussion of 'Bayesian approaches to randomized trials'. J R Statist Soc A. 1994;157:
204. Box GEP. Sampling and Bayes' inference in scientific modelling and robustness. J R Stat Soc Ser A-G. 1980;143:
205. Wijeysundera DN, Mamdani M, Laupacis A, Fleisher LA, Beattie WS, Johnson SR, et al. Clinical Evidence, Practice Guidelines, and beta-blocker Utilization Before Major Noncardiac Surgery. Circ Cardiovasc Qual Outcomes Jul 1;5(4):

Appendices

Appendix 1. Standardized script for vitamin C example

Thank you for agreeing to meet with me. The purpose of this study is to document experts' beliefs about the effect of warfarin in improving survival in patients with scleroderma-associated and idiopathic pulmonary hypertension. I am a rheumatologist and a PhD Clinical Epidemiology candidate. This study will form part of my PhD thesis work. We are trying to quantify your belief, so you will be asked to express your belief as a probability and express the uncertainty around your belief. If you have any questions, I am here to assist you. This questionnaire will take approximately 10 minutes to complete. If you do not treat this patient population, or do not wish to answer the question, please feel free to decline. Your responses will be kept anonymous, and will only be used for research purposes. You have been assigned a unique identifier code which will be written at the top of your questionnaire. The code will be kept confidential. Are you agreeable to participation in this study?

If no - thank the participant for their time.
If yes - proceed to the example.

Before we begin the questionnaire, I have an example of what will be asked of you. In the case of the example, the intervention is vitamin C and its effect on survival in SSc-PAH. This example is identical to the questionnaire you will be given, except that in this case the intervention is vitamin C instead of warfarin. Do you have any questions?

If yes - investigator to answer any questions. Once all questions have been addressed, then investigator will proceed.
If no - investigator will give the participant the laminated example and read aloud each question.

Let's start with the example.

Question 1. For an average group of newly diagnosed scleroderma patients with pulmonary arterial hypertension treated with the standard of care at your institution but not treated with vitamin C, what do you believe is the probability of being alive at 3 years? Please indicate your answer by putting an X on the line. In this example, the participant thought that the probability of being alive at 3 years was 30%, so he put an X on the line at 30%.

Question 2. For the same average group of newly diagnosed scleroderma patients with pulmonary arterial hypertension treated with the standard of care at your institution and treated with vitamin C, what do you believe is the probability of being alive at 3 years? In this example, the participant thought that the probability of survival with vitamin C treatment was 50%, so he put an X on the line at 50%.

Question 3. There may be some uncertainty around your estimate of survival. You may believe that the probability of survival could be a little lower or a little higher. Please indicate the lower boundary of your estimate for which you believe there is very little probability that the true estimate could be less than. Please indicate the higher boundary of your estimate for which you believe there is very little probability that the true estimate could be greater than. In this example, the participant put an X at 35% because he thought the probability could be as low as 35%. He put an X at 65% because he thought the probability could be as high as 65%.

Question 4. In Question 3, the participant indicated a range for the probability of survival. However, he may believe in some probabilities more than others. Using stickers that each represent 5% probability, he placed the stickers in the bins to indicate his weight of belief. In this example, he had more belief in 50% and less belief in 35% and 65%. Do you have any questions?

If yes - investigator to answer any questions. Once all questions have been addressed, then ask the participant to proceed.
If no - investigator to continue.

Once the stickers are placed, you can see that they create a shape and distribution. After you place your stickers, you will be asked to take a moment and check whether the shape and distribution of your sticker placement reflect what you truly believe. This questionnaire has been laminated to make it easier for you to remove and replace stickers. If you feel that the placement of stickers does not reflect what you believe, you will be able to rearrange your stickers until you are satisfied that the sticker placement reflects your true belief. Do you have any questions?

If yes - investigator to answer any questions. Once all questions have been addressed, investigator to give the participant the study questionnaire.
If no - investigator to give the participant the study questionnaire.
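Questions 1 and 2 each yield a 3-year survival probability for the same patient group, without and with the intervention. One common way to turn such a pair into a single treatment-effect measure is the implied hazard ratio, HR = ln(S_treated) / ln(S_control), which follows from S_treated(t) = S_control(t)^HR. This rests on a proportional-hazards assumption that is made here purely for illustration; the script does not say how answers were converted. The minimal Python sketch below applies it to the vitamin C example figures, and the function name `hazard_ratio_from_survival` is hypothetical.

```python
import math


def hazard_ratio_from_survival(s_control: float, s_treated: float) -> float:
    """Hazard ratio implied by two survival probabilities at the same time point.

    Assumption (illustrative only): proportional hazards, so
    S_treated(t) = S_control(t) ** HR and HR = ln(S_treated) / ln(S_control).
    """
    for s in (s_control, s_treated):
        if not 0.0 < s < 1.0:
            raise ValueError("survival probabilities must lie strictly between 0 and 1")
    return math.log(s_treated) / math.log(s_control)


if __name__ == "__main__":
    # Vitamin C example: 30% alive at 3 years without treatment,
    # 50% with treatment, plausible range 35% to 65%.
    print(round(hazard_ratio_from_survival(0.30, 0.50), 2))  # 0.58
    # Higher treated survival means a lower hazard ratio, so the 65% bound
    # gives the lower HR and the 35% bound gives the upper HR.
    print(round(hazard_ratio_from_survival(0.30, 0.65), 2),
          round(hazard_ratio_from_survival(0.30, 0.35), 2))  # 0.36 0.87
```

Note that the interval flips direction on conversion: the participant's upper survival bound maps to the lower end of the hazard-ratio range.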

Appendix 2. Standardized script for belief elicitation of effect of warfarin in SSc-PAH and IPAH

You have been shown an example of this belief elicitation exercise using the example of vitamin C as the intervention. The questionnaire I am giving you is exactly the same. In this case, however, the intervention is warfarin and its effect on survival in SSc-PAH. I will be here to assist you if you have any questions. Do you have any questions?

If yes - investigator to answer any questions. Once all questions have been addressed, then investigator will give the participant the laminated questionnaire and read aloud each question.
If no - investigator will give the participant the laminated questionnaire and read aloud each question.

Question 1. For an average group of newly diagnosed scleroderma patients with pulmonary arterial hypertension treated with the standard of care at your institution but not treated with warfarin, what do you believe is the probability of being alive at 3 years? Please indicate your answer by putting an X on the line.

Question 2. For the same average group of newly diagnosed scleroderma patients with pulmonary arterial hypertension treated with the standard of care at your institution and treated with warfarin, what do you believe is the probability of being alive at 3 years?

Question 3. There may be some uncertainty around your estimate of survival. You may believe that the probability of survival could be a little lower or a little higher. Please indicate the lower boundary of your estimate for which you believe there is very little probability that the true estimate could be less than. Please indicate the higher boundary of your estimate for which you believe there is very little probability that the true estimate could be greater than.

You have indicated that the lower boundary is X% and the upper boundary is Y%. I will place one 5% sticker at each of these bins in Question 4. I will give you 18 more stickers, each of which represents 5% probability; together with the two stickers already placed, they total 100%. Using these stickers, please indicate the weight of belief for your survival estimates. Do you have any questions?

If yes - investigator to answer any questions. Once all questions have been addressed, then investigator to continue.
If no - investigator to continue.

Once all the stickers have been placed: Please take a moment to review the shape and distribution of your answer. Does this reflect what you truly believe? If not, please feel free to revise the placement of stickers. Do you have any questions?

If yes - investigator to answer any questions. Once all questions have been addressed, then investigator to ask the participant to proceed to question 5.
If no - investigator to ask the participant to proceed to question 5.

After completion of the questionnaire: Thank you very much for sharing your beliefs with us and participating in this study. Your beliefs will be synthesized with those of your international colleagues. When the study is completed, we will send you the results of the study for your interest. All of your responses will be kept anonymous. Thank you for your time.
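The sticker procedure in Question 4 is a histogram-style elicitation: 20 stickers of 5% each, spread over survival bins, define a discrete probability distribution. The minimal Python sketch below shows that conversion, assuming each bin is labelled by its survival probability and each sticker is treated as a point mass at its bin label. The sticker counts in the example are hypothetical (shaped like the vitamin C example, not elicited responses), and the Beta moment-matching step is one common way to smooth such a histogram into a parametric prior, not necessarily the method used in this thesis. All function names are illustrative.

```python
from __future__ import annotations

STICKER_WEIGHT = 0.05  # each sticker represents 5% probability
TOTAL_STICKERS = 20    # 2 boundary stickers + 18 handed out = 100%


def sticker_prior(counts: dict[float, int]) -> dict[float, float]:
    """Convert sticker counts per survival bin into a discrete probability distribution."""
    if any(c < 0 for c in counts.values()):
        raise ValueError("sticker counts cannot be negative")
    n = sum(counts.values())
    if n != TOTAL_STICKERS:
        raise ValueError(f"expected {TOTAL_STICKERS} stickers, got {n}")
    return {x: c * STICKER_WEIGHT for x, c in sorted(counts.items())}


def summarize(prior: dict[float, float]) -> tuple[float, float]:
    """Mean and variance, treating each bin as a point mass at its label."""
    mean = sum(p * x for x, p in prior.items())
    var = sum(p * (x - mean) ** 2 for x, p in prior.items())
    return mean, var


def beta_by_moments(mean: float, var: float) -> tuple[float, float]:
    """Beta(alpha, beta) parameters with the given mean and variance (method of moments)."""
    k = mean * (1.0 - mean) / var - 1.0
    if k <= 0:
        raise ValueError("variance too large to be matched by a Beta distribution")
    return mean * k, (1.0 - mean) * k


if __name__ == "__main__":
    # Hypothetical placement: boundaries at 35% and 65%, most belief at 50%.
    counts = {0.35: 1, 0.40: 2, 0.45: 4, 0.50: 6, 0.55: 4, 0.60: 2, 0.65: 1}
    prior = sticker_prior(counts)
    mean, var = summarize(prior)
    alpha, beta = beta_by_moments(mean, var)
    print(f"mean={mean:.3f} sd={var ** 0.5:.3f} Beta({alpha:.1f}, {beta:.1f})")
```

With these hypothetical counts the distribution is symmetric about 50%, and the moment-matched prior is roughly Beta(23.3, 23.3). Rejecting any placement that does not use exactly 20 stickers mirrors the script's requirement that the stickers total 100%.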

Appendix 3. Elicited responses


More information

ISC- GRADE XI HUMANITIES ( ) PSYCHOLOGY. Chapter 2- Methods of Psychology

ISC- GRADE XI HUMANITIES ( ) PSYCHOLOGY. Chapter 2- Methods of Psychology ISC- GRADE XI HUMANITIES (2018-19) PSYCHOLOGY Chapter 2- Methods of Psychology OUTLINE OF THE CHAPTER (i) Scientific Methods in Psychology -observation, case study, surveys, psychological tests, experimentation

More information

Glossary. Ó 2010 John Wiley & Sons, Ltd

Glossary. Ó 2010 John Wiley & Sons, Ltd Glossary The majority of the definitions within this glossary are based on, but are only a selection from, the comprehensive list provided by Day (2007) in the Dictionary of Clinical Trials. We have added

More information

MCQ Course in Pediatrics Al Yamamah Hospital June Dr M A Maleque Molla, FRCP, FRCPCH

MCQ Course in Pediatrics Al Yamamah Hospital June Dr M A Maleque Molla, FRCP, FRCPCH MCQ Course in Pediatrics Al Yamamah Hospital 10-11 June Dr M A Maleque Molla, FRCP, FRCPCH Q1. Following statements are true in the steps of evidence based medicine except ; a) Convert the need for information

More information

In this second module in the clinical trials series, we will focus on design considerations for Phase III clinical trials. Phase III clinical trials

In this second module in the clinical trials series, we will focus on design considerations for Phase III clinical trials. Phase III clinical trials In this second module in the clinical trials series, we will focus on design considerations for Phase III clinical trials. Phase III clinical trials are comparative, large scale studies that typically

More information

Critical Appraisal Series

Critical Appraisal Series Definition for therapeutic study Terms Definitions Study design section Observational descriptive studies Observational analytical studies Experimental studies Pragmatic trial Cluster trial Researcher

More information

Annual Rheumatology & Therapeutics Review for Organizations & Societies

Annual Rheumatology & Therapeutics Review for Organizations & Societies Annual Rheumatology & Therapeutics Review for Organizations & Societies Comparative Effectiveness Studies of Biologics Learning Objectives Understand the motivation for comparative effectiveness research

More information

For Rheumatoid Arthritis

For Rheumatoid Arthritis For Rheumatoid Arthritis APRIL 2017 NOTICE: On April 14, 2017 the FDA issued a complete response letter for baricitinib indicating that the FDA is unable to approve the application in its current form

More information

Evidence-Based Medicine Journal Club. A Primer in Statistics, Study Design, and Epidemiology. August, 2013

Evidence-Based Medicine Journal Club. A Primer in Statistics, Study Design, and Epidemiology. August, 2013 Evidence-Based Medicine Journal Club A Primer in Statistics, Study Design, and Epidemiology August, 2013 Rationale for EBM Conscientious, explicit, and judicious use Beyond clinical experience and physiologic

More information

Common Statistical Issues in Biomedical Research

Common Statistical Issues in Biomedical Research Common Statistical Issues in Biomedical Research Howard Cabral, Ph.D., M.P.H. Boston University CTSI Boston University School of Public Health Department of Biostatistics May 15, 2013 1 Overview of Basic

More information

Biases in clinical research. Seungho Ryu, MD, PhD Kanguk Samsung Hospital, Sungkyunkwan University

Biases in clinical research. Seungho Ryu, MD, PhD Kanguk Samsung Hospital, Sungkyunkwan University Biases in clinical research Seungho Ryu, MD, PhD Kanguk Samsung Hospital, Sungkyunkwan University Learning objectives Describe the threats to causal inferences in clinical studies Understand the role of

More information

Purpose. Study Designs. Objectives. Observational Studies. Analytic Studies

Purpose. Study Designs. Objectives. Observational Studies. Analytic Studies Purpose Study Designs H.S. Teitelbaum, DO, PhD, MPH, FAOCOPM AOCOPM Annual Meeting Introduce notions of study design Clarify common terminology used with description and interpretation of information collected

More information

A Case Study: Two-sample categorical data

A Case Study: Two-sample categorical data A Case Study: Two-sample categorical data Patrick Breheny January 31 Patrick Breheny BST 701: Bayesian Modeling in Biostatistics 1/43 Introduction Model specification Continuous vs. mixture priors Choice

More information

Propensity Score Methods for Causal Inference with the PSMATCH Procedure

Propensity Score Methods for Causal Inference with the PSMATCH Procedure Paper SAS332-2017 Propensity Score Methods for Causal Inference with the PSMATCH Procedure Yang Yuan, Yiu-Fai Yung, and Maura Stokes, SAS Institute Inc. Abstract In a randomized study, subjects are randomly

More information

Cochrane Pregnancy and Childbirth Group Methodological Guidelines

Cochrane Pregnancy and Childbirth Group Methodological Guidelines Cochrane Pregnancy and Childbirth Group Methodological Guidelines [Prepared by Simon Gates: July 2009, updated July 2012] These guidelines are intended to aid quality and consistency across the reviews

More information

Cost-effectiveness of apremilast (Otezla )

Cost-effectiveness of apremilast (Otezla ) Cost-effectiveness of apremilast (Otezla ) alone or in combination with Disease Modifying Antirheumatic Drugs (DMARDs) for the treatment of active psoriatic arthritis in adult patients who have had an

More information

Introduction to Observational Studies. Jane Pinelis

Introduction to Observational Studies. Jane Pinelis Introduction to Observational Studies Jane Pinelis 22 March 2018 Outline Motivating example Observational studies vs. randomized experiments Observational studies: basics Some adjustment strategies Matching

More information

ISPOR Task Force Report: ITC & NMA Study Questionnaire

ISPOR Task Force Report: ITC & NMA Study Questionnaire INDIRECT TREATMENT COMPARISON / NETWORK META-ANALYSIS STUDY QUESTIONNAIRE TO ASSESS RELEVANCE AND CREDIBILITY TO INFORM HEALTHCARE DECISION-MAKING: AN ISPOR-AMCP-NPC GOOD PRACTICE TASK FORCE REPORT DRAFT

More information
