
Clinical Outcome Assessments (COAs): A Conceptual Foundation

Authors*: Walton MK(1), Powers JH(3), Hobart J, Patrick DL, Marquis P, Vamvakas S(2), Isaac M(2), Papadopoulos E(1), Slagle AF(1), Piault E, Burke L

Table of Contents
Abstract
Introduction
Effectiveness is a conclusion that the intervention provides a treatment benefit
Clinical Study Endpoints and Outcome Assessments
Identification of the Intended Treatment Benefit: The Meaningful Health Aspect
The Measured Concept of Interest as a Practical Approach to Evaluating the MHA
Clinical Study Context Affects the Properties of the OA
Attributes of an OA
Measurement Properties are Different than OA Attributes
Identifying a COA
Conclusion
References [list to be added; not present in this document]

*Disclaimers:
(1) The views expressed in this article are those of the authors and do not necessarily represent an official FDA position.
(2) The views expressed in this article are the personal views of the author(s) and may not be understood or quoted as being made on behalf of or reflecting the position of EMA or one of its committees or working parties or any of the national agencies.
(3) The views expressed in this article are those of the authors and do not necessarily represent the positions or policies of the NIH.

Abstract

Developing therapies for diseases where no therapy has yet been developed, or for disease manifestations not previously evaluated by existing therapies, will likely call for developing new and improved efficacy endpoints. Efficacy endpoints are based on specified assessments of patients and must be well defined and have adequate measurement properties to demonstrate the benefits of a treatment. How to develop new assessments for use in endpoints, or how to evaluate the utility of existing assessments, is not always clear. An initial step is to clearly identify and describe the intended benefit: the meaningful aspect of how patients feel or function in their typical lives, or of survival. This aspect is called the meaningful health aspect (MHA). When it is not practical or optimal to evaluate the selected health aspect directly, a bodily function or feeling that is thought to be a sub-component of it, and is more practical to measure, may be identified and called the concept of interest (measured COI). The measured COI may be identical to the MHA when that is feasible or necessary. Procedures are then developed to measure the COI. It is also necessary to fully define the circumstances and manner of use of the assessment, called the context of use. Assessments have identifiable attributes that affect the measurement properties of endpoints. These attributes include whether patient motivation influences the measurement; whether, and whose, judgment can also influence the measurement; and whether the assessment is directly or indirectly related to the meaningful aspects of feeling and functioning. Recognizing the specific attributes of an assessment helps direct efforts at defining, standardizing, and refining the assessment to improve its measurement properties. This paper discusses these important concepts, which apply to all types of assessments used in efficacy endpoints.

Introduction

The availability and adoption of disease therapies for patients are based in part on information that establishes the value of medical interventions from the perspective of multiple parties, including patients, healthcare providers, regulators, and payors. An important element of that information, particularly for regulatory agencies, is the evidence provided by rigorous clinical trials evaluating the benefit of the therapy in patients. Efficacy of medical interventions is generally demonstrated by high-quality controlled clinical trials that measure the beneficial effect of the intervention on a patient outcome that evaluates a specified aspect of the disease. This paper focuses on characteristics of the tools used to form clinical trial efficacy endpoints in these studies.

For many diseases there is no well-defined and reliable measure for evaluating an important (or perhaps any important) disease manifestation. In many other diseases, the current methods have known weaknesses for use as clinical trial endpoints, or their validity is uncertain. In these circumstances, therapy development can be aided by a new or improved method of assessing the patient. The process for developing and evaluating new assessments of patients for use in clinical trials is not always clear. This paper discusses important concepts for describing the goal of using the assessment, the circumstances of using the assessment for that goal, and certain distinguishing features of patient assessments. A separate paper (Powers et al., in preparation) discusses how these essential concepts are used in applying principles of good measurement to one type of assessment (clinician-reported outcome assessments) to ensure understanding of the efficacy of a therapy in a clinical trial; that application is outside the present discussion.
The concepts discussed in this paper should be considered during the development and evaluation of all types of tools for the assessment of patients in clinical trials. The term "therapy" is used throughout this paper to mean the medical intervention administered to the person, irrespective of whether the intervention is intended to improve adverse effects of a disease present in patients, to prevent additional adverse effects of a disease already established, or to prevent onset of a disorder not yet affecting a person.

Effectiveness is a Conclusion that the Intervention Provides a Treatment Benefit

A conclusion that a therapy is effective means that there is a treatment benefit caused by use of the drug. A treatment benefit is a favorable effect on a meaningful aspect of how a patient feels or functions in their life, or on survival. Two phrases in this definition deserve emphasis to ensure clarity. The first is "meaningful aspect": the effect on how the patient feels or functions should be meaningful to the patient for it to be regarded as a benefit to the patient. The second is "in their life": the aspect of feeling or functioning affected by the therapy should be what occurs in the patient's usual (typical) life, or should have a well-understood relationship to their usual life. A treatment effect is not a treatment benefit if, for example, it is solely an alteration in performing a specific task that occurs only in the medical clinic and has no well-defined relationship to any usual activity the patient does (or would want to do) in their life outside of the clinical trial setting.

Controlled clinical trials (e.g., Phase 3 trials) designed to show a difference in the study endpoint results between patients who received the investigational treatment and those who received a comparator treatment (often a placebo) are the usual source of evidence to support a conclusion of treatment benefit. The study endpoint difference between treatment and control patients therefore needs to show, or be confidently interpreted as indicating, a meaningful effect on how patients feel, function, or survive.

The benefit category of survival is distinctly different from feeling or functioning. Unlike many types of feeling or functioning, mortality (when not cause-attributed) has well-defined means of determination and a readily understood meaning. The meaning of death remains clear even in diseases where some people might view certain non-fatal outcomes of the disease as also highly undesirable. In contrast, there are many measurements related to feeling or functioning for which the meaning (value) to a patient is not self-evident and has not been adequately evaluated. In diseases where improving one or more important specific aspects of feeling or functioning is the intended benefit, an inability to interpret a measurement as meaningful leads to an inability to demonstrate the treatment benefit of a therapy. Methods of evaluating patients that describe meaningful aspects of how patients feel or function are therefore necessary for evaluating treatments for these diseases. This discussion focuses on assessments intended to relate to how patients feel or function.
Although the discussion often uses functional abilities as examples or to describe features of assessments, the important points and attributes are equally applicable to assessments of feelings.

The Relationship between Patient Assessments and Study Endpoints

The objective of Phase 3 studies is to demonstrate that there is a treatment-related difference in the outcome of patients. This is shown by an analysis of patient assessment data from the study, as specified by the endpoint description. Specifying the endpoint comprises identifying a particular patient assessment, obtained at one or more specified times during the study, and analyzed according to a specified statistical method to provide a comparison between groups. The patient assessment used in an endpoint, called an outcome assessment (OA), is the measuring instrument that provides a rating or score (numeric or categorical) intended to represent some aspect of the patient's medical status. The characteristics of the OA discussed in this manuscript will greatly influence the potential for success of the study and the interpretation of the meaning of the study endpoint results. The OA itself, in isolation from the other specified endpoint elements, is not the endpoint. This is important to recognize because the other elements of an endpoint will also affect interpretation of the study results.
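The distinction between an OA and an endpoint can be made concrete with a small data structure. The sketch below is illustrative only: it assumes Python 3.8+, the names `Endpoint` and `responder_rate` are hypothetical, and the scores and the responder threshold of 2 points are invented toy numbers, not data from any study. It applies one OA to the same two groups of patients through two different endpoint definitions (difference in mean change, and difference in the proportion of responders).

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Endpoint:
    """Hypothetical model of an endpoint: an OA, when it is read, and how it is analyzed."""
    outcome_assessment: str                   # the OA (measuring instrument)
    timepoints: Sequence[str]                 # when the OA is obtained
    summary: Callable[[List[float]], float]   # per-group summary statistic

    def compare(self, treated: List[float], control: List[float]) -> float:
        """Between-group difference in the chosen summary statistic."""
        return self.summary(treated) - self.summary(control)


def responder_rate(changes: List[float], threshold: float = 2.0) -> float:
    """Proportion of patients whose change meets a hypothetical responder threshold."""
    return sum(change >= threshold for change in changes) / len(changes)


# Toy change-from-baseline scores on one hypothetical OA (higher = better).
treated = [3.0, 0.0, 4.0, 1.0]
control = [1.0, 1.0, 1.0, 1.0]

# Same OA and timepoint, two different endpoints.
mean_change = Endpoint("hypothetical walking OA", ("week 12",), mean)
responders = Endpoint("hypothetical walking OA", ("week 12",), responder_rate)

print(mean_change.compare(treated, control))  # 1.0 (difference in mean change)
print(responders.compare(treated, control))   # 0.5 (difference in responder proportion)
```

Both endpoints use identical OA data, yet one reports a 1-point difference in mean change and the other a 50-percentage-point difference in responders. Which of these, if either, reflects a meaningful effect on the intended benefit is a separate question that the OA alone cannot answer.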

Selecting an assessment for a study endpoint warrants attention to whether the assessment is appropriate as an outcome that directly describes, or is indirectly related to, a treatment benefit for the disease. Evaluation of a patient's medical status is done for many purposes in clinical care (e.g., diagnosis, estimating prognosis, and monitoring response), and many OAs are drawn from clinical care evaluations. Evaluations intended for a clinical care purpose, however, may not always be suitable for the purpose of an OA. For example, some disease diagnostic assessments may not be suitable as OAs. Substantial medical experience has often led to an understanding of how the presence or absence of a specific assessment criterion before treatment aids in diagnosing a particular disease. In contrast, there will often not be sufficient medical experience demonstrating that elimination of a particular diagnostic sign after administration of an intervention reliably indicates avoidance of the particular adverse consequence of the disease that is intended as the treatment benefit.

Identification of the Intended Treatment Benefit: The Meaningful Health Aspect

In many diseases, a single manifestation may impair or prevent multiple activities that are normally part of a person's usual life. Activities that are related because they are impaired by the same manifestation can usually be thought of as a group. In other diseases, the disease may cause a group of distinct but related symptoms (feelings). A word or phrase can be selected to express the abstract commonality of the identified effects on the patient's life. When a treatment effect on this abstracted grouping is intended to be the treatment benefit, the grouping becomes the meaningful health aspect (MHA) that is studied in a clinical trial.
The MHA should be identified with phrasing that promotes clear communication regarding this essential element of efficacy endpoints. The identifier used for the MHA during the therapy development period, however, does not need to be the optimal phrasing for other uses. For example, the naming of the treatment benefit in the labeling accompanying a regulatory agency's marketing approval might be reworded for communication in that setting. Naming of the MHA should thus focus on clear communication about the intended treatment benefit during assessment development and clinical trials, without concern that the selected term will necessarily be the phrasing used in labeling.

As an example of identifying the MHA, a disease affecting arm motion (e.g., multiple sclerosis, Parkinson's disease) may directly diminish a patient's ability to perform related activities, such as dressing, eating, and toileting. This group of affected functional abilities might be identified by the term "upper limb dependent personal activities." This unified, abstract conceptualization of the adverse disease impact is an aspect of a person's usual life that would be meaningful to improve, and thus would be a treatment benefit if favorably affected by a treatment.

As another example, patients with lower limb weakness may have an impaired ability to walk from a bus stop to their office, around a shopping mall or a grocery store, or to a neighbor's house. Because these and similar activities are a common part of a typical healthy person's life, the inability to perform them is meaningful to the person who cannot do so. A treatment that improved a patient's ability to walk from the bus stop to their office because of increased lower limb strength would also be expected to have a beneficial effect on similar activities that rely on walking. Thus, the benefit of an effective treatment would be the abstracted commonality of these activities, and the MHA might be termed "ambulatory capacity."

In some diseases, the impact of an important disease manifestation as an MHA may be simpler to describe. For example, in some patients with breast or prostate cancer, pain related to bone metastases is an important concept of interest, and palliation of pain would be an important treatment benefit.

Because a single disease may have multiple manifestations, there may be multiple MHAs that could be appropriate as separate intended treatment benefits, each to be demonstrated by a distinct clinical study endpoint. In some cases, combining several separate MHAs into an overarching single concept will be appropriate. For example, some degenerative neurologic diseases damage the ability to use both upper and lower limbs for typical activities of life, and also impair cognitive function. An overall MHA might be called "disability of disease X," but the specific types of significantly decreased functional ability still need to be identified, such as disability in personal activities using the upper limbs, disability in ambulation, and disability in cognitive activities.
The Measured Concept of Interest is a Practical Approach to Learning about the MHA

To determine whether the intended treatment benefit(s) occur, there must be a specific means of obtaining a measurement that will be used in a study endpoint and analyzed to reach a conclusion about whether the intended benefit occurs. Adequately measuring the MHA activities may be difficult in many disorders. Patient-reported outcomes (PROs) are often intended for this purpose, and the difficulty of developing good PRO assessments has been described [add PRO references]. In some cases, investigators may instead deconstruct the MHA into simpler (more narrowly defined) bodily actions that are thought to be relied upon when doing MHA activities. One of these bodily actions may be hypothesized to be both particularly well related to many activities within the MHA and more readily measured. When a bodily action is selected as an opportunity to create a practical OA for studying a treatment, the bodily action is called the concept of interest (COI) for measurement. If the MHA is reasonably feasible to measure directly, or can only be measured directly (e.g., patients' feelings), the measured COI may be identical to the MHA. The measurement method is an operationalized expression of an assessment that produces a rating or score intended to represent the measured COI.

Most COIs do not have a single, unique assessment method. Usually there are multiple potential assessments that might form the basis of a relevant endpoint. For the example of pain as the MHA, pain intensity (a component of the full experience of pain) might be selected as a measurable COI. Pain intensity, however, may be measured in multiple ways. A patient might select a score from 0 to 10 depicting their current pain intensity, or might categorize their average pain intensity over the past month. Alternatively, a count of the number of analgesic pills used within one day might be proposed. All three methods could be proposed as measures of the COI, but they might provide different results in a clinical trial because of different measurement properties (e.g., reliability may differ for pain scores using a one-month recall period compared with ratings of current pain). Additionally, the precise meaning of the pill count is not inherently clear: it might relate to the duration of pain relief from each pill, or to whether pain intensity rises above some personal threshold.

Evaluation of the measurement properties of potential OAs is essential when developing and choosing an OA for a clinical trial. The principles for evaluating the measurement properties of clinician-reported OAs to determine their suitability are discussed by Powers et al. (in preparation). Evaluation of the validity and suitability of PRO OAs has also been described previously (Patrick et al., 2011; Matza et al., 2013). One of the cardinal elements of showing the suitability of a potential OA for demonstrating a treatment benefit is to examine the hypothesis that the OA is well related to the MHA (i.e., differences in the OA are reflective of, and reflected in, changes in the MHA). Even when the measured COI is identical to the MHA, the relationship between the specific OA and the MHA should be examined.
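The chain from MHA to measured COI to candidate OAs in the pain example can be recorded as a simple nested structure. This is a sketch under stated assumptions: the class names and the `relationship_examined` flag are hypothetical bookkeeping devices, not part of any published framework, and the flag only records whether the OA-MHA hypothesis described above has been examined; it does not evaluate that hypothesis.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OutcomeAssessment:
    name: str
    time_frame: str                       # e.g. "current", "past month", "one day"
    relationship_examined: bool = False   # has the OA-MHA relationship been studied?


@dataclass
class ConceptOfInterest:
    name: str
    candidate_oas: List[OutcomeAssessment] = field(default_factory=list)


@dataclass
class MeaningfulHealthAspect:
    name: str
    measured_coi: ConceptOfInterest

    def oas_needing_evaluation(self) -> List[str]:
        """Candidate OAs whose relationship to the MHA has not yet been examined."""
        return [oa.name for oa in self.measured_coi.candidate_oas
                if not oa.relationship_examined]


# The three pain-intensity OAs described in the text.
pain = MeaningfulHealthAspect(
    name="pain",
    measured_coi=ConceptOfInterest(
        name="pain intensity",
        candidate_oas=[
            OutcomeAssessment("0-10 rating of current pain intensity", "current"),
            OutcomeAssessment("category of average pain intensity", "past month"),
            OutcomeAssessment("count of analgesic pills used", "one day"),
        ],
    ),
)

print(pain.oas_needing_evaluation())
```

Keeping the time frame explicit makes one source of differing measurement properties, such as a one-month recall period versus a rating of current pain, visible whenever candidate OAs for the same COI are compared.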
Context of Use: Clinical Trial Factors that Affect the Properties of the OA

How the OA is used in a clinical trial, and the setting of the clinical trial, may substantially affect the OA's performance characteristics and may even alter the interpretation of the results of the clinical study. Taken together, these aspects of the intended use of the OA are called the context of use (COU). These factors should be specified at the outset of developing a new OA or evaluating an existing OA. Different OAs, or different uses of the same OA, may have different factors that are important to consider and specify in the COU. Factors that are often important include:

a) Disease of study. The disease of interest should be clearly defined (e.g., by diagnostic criteria), and the important effects of the disease on the patient should be described, to ensure that the MHA is relevant for the disease.

b) Subpopulation of patients with the disease of study. Many diseases are neither homogeneous between patients nor static over time within a patient. The specific disease subpopulation intended as the target population for study of the treatment should be specified. A wide range of factors may be important to identify, such as the phenotype (or subtype or stage) of the disease; patient-specific disease characteristics such as severity, duration, and involvement of specific portions of the body; demographics of the intended patients; history of prior treatments; co-morbidities; or other factors that might be used as trial entry or exclusion criteria.

c) Cultural, language, or other geography-related factors. Many OAs rely on questioning the patient to obtain the primary input for the OA result. Thus, when multiple languages are used in the study, it becomes important whether patients understand each question to have the same meaning in each of the languages. In addition, some questions can invoke cultural differences in what patients take into consideration when deciding on a response.

d) Standard concomitant care. Other therapies that are part of standard clinical practice are commonly administered in clinical trials along with the investigational intervention under study. If these concomitant therapies are effective, the course of the disease may differ from the expected natural history. If the OA's measurement properties were evaluated in a time period before the use of current concomitant therapies, the OA's performance characteristics may differ from those expected. In addition, if standard concomitant care differs between clinical sites, the measurement properties of the OA may differ between sites.

e) Endpoint positioning. Endpoint positioning means the description of where the endpoint using the OA falls within the study objectives (as shown by the analysis plans for the study) and the regulatory role the endpoint is intended to support.
Some OAs may be appropriate to support marketing approval decisions in one context (i.e., provide appropriate evidence for the key efficacy claim in a particular disease) but appropriate only for supplementary claims of efficacy in a different context, and some may be inappropriate to support any efficacy claim. Specifying the intended efficacy-claim purpose is thus an important element of the context of use.

f) Manner of use within the endpoint. OA data of the study groups can be analyzed in a variety of ways, for both quantitative (interval numeric) and categorical OAs. These include the mean OA value at a specific timepoint of the study, the percentage of patients at a specific timepoint who meet specified OA criteria (a responder or failure analysis), repeated measures analysis of several timepoints during the study, time-to-event analysis, etc. These different ways of using the OA in an endpoint summarize patients' experience of the disease during the study differently and may have different relationships to the MHA. The number and timing of OA measurements, along with the analysis method, should be specified.

g) Measurement setting. Some OAs are feasible to obtain in a variety of settings, such as at home, in an outpatient clinic, or in a hospital inpatient setting. These different settings may alter the actual measurement obtained, and thus the measurement properties of the OA.

h) Method of OA administration. Some OAs can be designed for administration in more than one way. These can include variations in who administers the OA (e.g., self-administration, an otherwise untrained person, trained professionals) or in other aspects of administration (e.g., visual versus auditory, electronic versus non-electronic). These differences can also affect the data obtained, and thus alter the measurement properties of the OA. This can be particularly problematic if multiple options for one of these factors are used within a single trial.

These factors are among those important to bear in mind when considering prior experience with an OA or when developing a new OA. If the OA was developed, or its performance characteristics were assessed, in a different context of use or for a different COI, re-evaluating the OA's characteristics in the new intended context of use is valuable.

Attributes of an Outcome Assessment

The COU describes the setting and manner of use of an OA but does not define the OA. A careful and complete definition of the OA is needed to use it in clinical trials. This definition determines certain distinguishing attributes that influence the measurement properties and how those properties are evaluated. Important attributes include the following.

OA Attribute 1: Whether the OA is dependent on patient active involvement or on rater judgment

Some assessments of patients require the patient's active involvement to create a record (physical or electronic) and/or to perform an activity that is the basis of the rating or score. The process of creating the record or performing the activity can be influenced by patient volition and by motivation to participate in the assessment.
The level of motivation and attention to the assessment process may vary over time within a patient and differ systematically between patients, yielding differences in ratings or scores unrelated to differences in the underlying medical status of interest (symptom or functional ability). This can give rise to measurements whose clinical meaning is not equivalent, a difference that may not be inherently discernible. In addition, some evaluations depend on the patient or another person (e.g., a trained medical professional, a spouse, or a caregiver) integrating observations (self-observations in the case of the patient) and transforming them into a rating through a non-deterministic process. Rater judgment about the patient's response to an inquiry, observations made while watching a patient perform a task, or observations of a biological sample may differ between raters due to the raters' prior experiences and biases. This can give rise to differing scores from the raters that are, again, unrelated to differences in the underlying medical status.
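The paragraph above notes that different raters may assign different scores to the same patient. The text does not prescribe a statistic for this, but one common way to quantify agreement between two raters assigning categorical ratings is Cohen's kappa, which discounts the agreement expected by chance alone. The sketch below is a minimal pure-Python implementation offered only as an illustration; the severity categories and ratings are hypothetical.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters assigning categorical ratings to the same patients."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must rate the same, non-empty set of patients.")
    n = len(rater_a)
    # Observed agreement: fraction of patients given the same rating by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: products of the raters' marginal category proportions, summed.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if p_chance == 1.0:
        raise ValueError("Kappa is undefined when both raters use one identical category.")
    return (p_observed - p_chance) / (1.0 - p_chance)


# Hypothetical severity ratings of the same 10 patients by two raters.
rater_1 = ["mild", "mild", "moderate", "severe", "mild",
           "moderate", "moderate", "severe", "mild", "moderate"]
rater_2 = ["mild", "moderate", "moderate", "severe", "mild",
           "moderate", "mild", "severe", "mild", "severe"]

if __name__ == "__main__":
    print(f"Cohen's kappa: {cohens_kappa(rater_1, rater_2):.2f}")
```

Here the two hypothetical raters agree on 7 of 10 patients, but once chance agreement is accounted for, kappa is only about 0.55, the kind of rater-dependent variation that a well-defined COA aims to reduce.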

Terminology: Clinical Assessments and Biomarkers

Assessments that are subject to variation within a patient between successive times, or between patients, due to patient volition effects (i.e., variation unrelated to clinical status), or that depend upon the judgment of a rater, are grouped together as clinical assessments. When clinical assessments are used as clinical trial outcomes, they are called clinical outcome assessments (COAs). In contrast, some assessments are subject to little or no motivational or judgmental influence. These assessments are categorized as biomarkers.¹ Protein levels in blood or urine measured by standardized methods are biomarkers, as is an automated quantitative measurement of the size of a pathologic lesion visualized with MRI.

Because of the potential for patient or rater influence, developing, evaluating, and refining COAs requires care to ensure that they are well defined and reliable within the target COU. Appropriately developed COAs can be reliable and highly informative regarding patient clinical status, and can reveal treatment effects that are meaningful to patients across a very broad range of types of effects.

OA Attribute 2: Which category of rater applies judgment to form the measurement

As discussed, the involvement of a rater who may need to apply personal judgment to arrive at the rating is a distinguishing feature of many OAs categorized as COAs. For purposes of categorizing COAs, the rater is the person who applies the final judgment to the information obtained during the assessment procedure and forms the rating that is recorded as the measurement. Raters with different prior experiences, types of expertise, or perspectives may be influenced differently in forming the rating.
Wide variations in these influences can damage the validity or reliability of a COA, and efforts to decrease the variation in the rating due to the rater's judgment are important to developing a well-defined COA. Identifying the rater category is a first step toward addressing these potential biases. Although many different people may be involved in a clinical trial, for purposes of this discussion of COAs there are three categories of people who might be a rater and apply judgment: patients, clinicians, and non-clinician observers. There are, in addition, COAs in which no rater judgment is applied; these OAs remain categorized as COAs (and not as biomarkers) when variation due to patient volitional involvement may be present.

¹ A Biomarkers Definitions Working Group (ref) has worked on issues involved with biomarkers used in surrogate endpoints, and defined biomarker for that discussion. The definition of biomarker in the current discussion (focused on clinical assessments) differs from the working group's by expanding the criteria that distinguish between biomarkers and clinical assessments, in order to address the range of types and complexities that arise when developing clinical assessments for use in endpoints.

When the patient is the rater, the COA is called a patient-reported outcome (PRO) assessment. The patient provides responses to a questionnaire that are directly captured (e.g., paper or computerized questionnaire forms), or in interviews where the patient's observations or reports are recorded exactly as spoken, without any interpretation (i.e., judgment) on the part of the interviewer. Because a patient's direct reports can capture a wider range of feelings and functions than have been evaluated in many previously developed non-PRO COAs, and because only a PRO can directly measure how a patient feels (e.g., pain, low mood), there has been increasing interest in developing PRO COAs in recent years, and FDA has issued guidance on the topic.[ref to FDA guidance here]

For many of the COAs that have previously been used in clinical trials, a member of the investigator team with appropriate professional training has been the rater, applying professional expertise and judgment to observations of, or conversations with, the patient to arrive at a rating according to the OA's definition. These COAs are called clinician-reported outcome (ClinRO) assessments. ClinRO rating scales, for example, may call for the clinician to interpret the patient's responses to questions in an interview, judge the quality of some patient actions, or judge findings on physical examination. For purposes of this discussion, a ClinRO is any COA for which some specific professional training and judgment are necessary to form the rating.

In contrast, some observations can be made and interpreted by a person other than the patient and do not require specialized professional training. These include COAs that are best made by a companion (e.g., parent, spouse) or caregiver of the patient. These COAs depend upon the observer to formulate the rating, and thus are influenced by the observer's perspective. Often the planned observer is taught what to observe and how to apply judgment in forming the rating, but healthcare professional training is not needed. COAs in this category are called observer-reported outcome (ObsRO) assessments.
The fourth category of COAs is where the patient is instructed to perform a defined task and some defined quantification of that performance is the measurement (e.g., distance walked in 6 minutes, number of pictorial symbols correctly matched to a key within a fixed amount of time). These task performance COAs are called quantified performance outcome (PerfO) assessments. Some PerfOs are conducted by a clinician on the investigator team, who administers the task and monitors performance to judge whether the patient has performed the task adequately, but the investigator does not apply judgment to quantifying the performance. Many COAs involve the patient performing a specified task, but only when the score is a well-defined quantitation of the task, without rater judgment influencing the score, is the COA a PerfO.

When appropriately defined, developed, and evaluated, any of these four categories of COAs can become accepted as a well-defined and reliable COA suitable for use in a study endpoint. Which type is more advantageous for a particular concept of interest will be strongly influenced by the specific intended context of use, and should be carefully considered at the outset of developing a new COA.

OA Attribute 3: Whether the OA is directly assessing a meaningful aspect of how the patient feels or functions

As discussed earlier, treatment benefit is an effect that is meaningful to the patient within their typical life, and is demonstrated when clinical trial endpoint results show an effect that can be interpreted as a meaningful effect. In some studies the OA of the endpoint directly assesses the MHA, and treatment effects on these OAs are inherently interpretable as showing meaningful effects (e.g., many, but not all, PROs are intended to have this interpretability). In other studies the OA does not directly assess the MHA, but instead assesses a measurable COI. Figure 1 illustrates this relationship for an example of measures of function. A well-defined relationship between the OA and the MHA, however, enables a favorable effect on the endpoint to justify the conclusion that there is a favorable effect on the MHA (i.e., a treatment benefit).

Whether an OA is directly or indirectly assessing the MHA is a binary distinction. The distinction is important because the interpretability of direct measures of the MHA is inherent, while the interpretability of indirect measures of the MHA depends upon additional evidence. This additional evidence should define the relationship between the COA measurement and the MHA (see the discussion of measurement principles in Powers et al.). Indirectness, however, is a graded quality. Some OAs evaluate abilities or actions that are close, but not identical, to how the patient functions in typical life (e.g., some in-clinic performance measures that simulate instrumental activities of daily living, or visual acuity testing), and thus the measurement is close to evaluating the COI itself. Other OAs are substantially unlike the patient's functioning in daily life (e.g., supine quadriceps isometric strength).
Terminology: Direct OAs and Indirect OAs

Assessments that provide direct evidence about the MHA are termed Direct Measure OAs, and those that do not precisely evaluate the MHA are termed Indirect Measure OAs. When the OA is a COA, the terms become direct measure and indirect measure COAs. Recognizing whether the OA is a direct or an indirect measure of the MHA is important to determining whether evidence to establish the relationship between the measurement and the MHA is necessary. For indirect measure COAs, the degree of indirectness will guide the amount and type of evidence obtained, during the course of developing and evaluating the COA, to ensure conformity with the principles of good measurement. Establishing the relationship between the measurement and the MHA will generally be more straightforward for COAs closely similar to patient functioning in typical life than for those substantially dissimilar to typical life activities. Biomarkers are all very indirect measures of the MHA, and thus substantial evidence is needed to establish a biomarker as an acceptable OA for use in an efficacy endpoint (generally called a surrogate endpoint).
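Taken together, the three attributes discussed above determine how an assessment is categorized and whether additional evidence linking it to the MHA is needed. The Python sketch below encodes that reasoning as a small decision procedure. The class, field, and function names are hypothetical, and the flags are a deliberate simplification: in particular, it assumes that a volition-influenced assessment with no rater judgment is a quantified task performance (PerfO), and real categorization will involve judgment that boolean flags cannot capture.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Rater(Enum):
    """Categories of people who may apply judgment to form a rating."""
    PATIENT = "patient"
    CLINICIAN = "clinician"              # rater with specific professional training
    NON_CLINICIAN_OBSERVER = "observer"  # e.g., parent, spouse, caregiver


@dataclass(frozen=True)
class OAAttributes:
    """Hypothetical encoding of the three OA attributes."""
    rater: Optional[Rater]       # Attribute 2: None if no rater judgment is applied
    volition_influenced: bool    # Attribute 1: patient volition may affect the score
    direct_measure_of_mha: bool  # Attribute 3: directly assesses the MHA


def oa_category(attrs: OAAttributes) -> str:
    """Return the OA category implied by Attributes 1 and 2."""
    if attrs.rater is Rater.PATIENT:
        return "PRO"
    if attrs.rater is Rater.CLINICIAN:
        return "ClinRO"
    if attrs.rater is Rater.NON_CLINICIAN_OBSERVER:
        return "ObsRO"
    # No rater judgment. Simplifying assumption: a volition-influenced
    # assessment without rater judgment is a PerfO; otherwise a biomarker.
    return "PerfO" if attrs.volition_influenced else "biomarker"


def needs_evidence_linking_to_mha(attrs: OAAttributes) -> bool:
    """Attribute 3: indirect measures need added evidence relating them to the MHA."""
    return not attrs.direct_measure_of_mha


# Hypothetical examples, for illustration only.
pain_diary = OAAttributes(Rater.PATIENT, volition_influenced=True, direct_measure_of_mha=True)
walk_test = OAAttributes(None, volition_influenced=True, direct_measure_of_mha=False)
lab_protein = OAAttributes(None, volition_influenced=False, direct_measure_of_mha=False)

if __name__ == "__main__":
    for label, oa in [("pain diary", pain_diary), ("walk test", walk_test), ("lab protein", lab_protein)]:
        print(label, oa_category(oa), needs_evidence_linking_to_mha(oa))
```

In this sketch the walk test is treated, as an assumption for illustration, as an indirect measure of ambulation-dependent activities, so it is categorized as a PerfO that still needs evidence relating it to the MHA.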

Measurement Properties Are Different from OA Attributes

The OA attributes identified in this discussion are intrinsic to the individual OA, based on its definition. When the OA is used in a study endpoint, however, many important measurement properties of the OA are affected by the specific COU. The COU can affect whether the OA has validity for the intended MHA, as well as the endpoint's performance characteristics, such as reliability and ability to detect change. For example, the patient population of a study may alter the ability of an endpoint based on a COA to detect change. A COA that shows a good dynamic range in one patient population may have unacceptable ceiling or floor limitations in a different population, or may be much less reliable in one patient population than in another.

Other design features of a clinical study can also alter an endpoint's measurement properties. In a study intended to show a treatment effect as a slowing of the natural worsening of disease severity, a COA's reliability may be sufficient to observe that treatment effect in a study of 1 year's duration, yet the same reliability would make observing the treatment effect in a study of 3 months' duration unlikely without an infeasibly large sample size.

In some cases a COA suitable for use in one COU might not be valid when the disease under study changes, even if the MHA does not change. For example, a COI (measured by an OA) that is closely linked to the MHA in one disease can be poorly linked in another disease because manifestations of the second disease, not present in the first, become the dominant influence on the MHA. Even within the spectrum of a single disease, elements of the COU can alter the interpretability of an observed treatment effect.
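The study-duration point above can be made concrete with the standard normal-approximation sample-size formula for comparing two group means, n per group = 2(z₁₋α/₂ + z₁₋β)²σ²/δ². The Python sketch below assumes, purely for illustration, that untreated patients worsen linearly at a fixed rate, that treatment slows that worsening by a fixed fraction (so the between-group difference δ grows with study duration), and that the standard deviation σ of change scores is the same at both durations; in practice σ often changes with duration. All numbers are hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta: float, sigma: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Per-group sample size for a two-sided, two-sample comparison of means
    (normal approximation): n = 2 * (z_{1-alpha/2} + z_{power})^2 * sigma^2 / delta^2."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return math.ceil(2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2)


# Hypothetical disease course: untreated worsening of 12 COA points per year,
# and a treatment that slows that worsening by 25%.
WORSENING_PER_YEAR = 12.0
SLOWING_FRACTION = 0.25
# Hypothetical SD of change scores, assumed equal at both durations.
SIGMA = 15.0

if __name__ == "__main__":
    for months in (12, 3):
        delta = WORSENING_PER_YEAR * (months / 12) * SLOWING_FRACTION
        print(f"{months:>2} months: delta = {delta:.2f} points, "
              f"n per group = {n_per_group(delta, SIGMA)}")
```

Under these assumptions a 1-year study needs 393 patients per group, while a 3-month study, in which the expected difference is one quarter as large, needs 6280 per group: a sixteenfold increase from the same COA reliability.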
For example, a treatment effect that is a modest fraction of the full dynamic range of a COA might be of clinical value to patients at a mildly affected stage of a disease, but not important to patients at a severely affected stage of the same disease.

Identifying the Clinical Outcome Assessment

Determining what may be concluded from the results of a clinical trial is facilitated by effective communication regarding the clinical study objectives, setting, and endpoints. Clear identification of the COA is important to this discussion. COAs are sometimes identified only by the name selected by the COA's creator, but this may be insufficient for several reasons. Procedures for performing many long-standing COAs have varied over time as different investigators have used the COA in different studies, and these studies often do not identify any modification (e.g., do not identify a version). Thus, studies that seem to use the same COA as the study measure may in fact have made distinctly different measurements. To avoid this confusion when developing and evaluating an assessment, it is important to ensure clarity of name, including precisely identifying the exact version of an assessment.

Clarity also depends on identifying the aspect(s) of how patients meaningfully feel and function that the assessment is intended to represent, i.e., identifying the intended MHA. For assessments that directly measure the MHA, this may be clear from the description of the measurement procedure. This is often not the case for indirect assessments. The intended MHA should be explicitly identified for all assessments, irrespective of whether it seems implicitly clear.

Naming a COA should be done cautiously (and the names of pre-existing COAs regarded cautiously). COAs are sometimes given a name indicating the intended MHA, but such a name does not support concluding that the COA is a good measure of that MHA. For example, naming an assessment "disease X disability scale" does not assure that it is a good measure of disability in that disease. The recommended approach is to name the COA to reflect what is actually measured, and to name the intended MHA separately. In addition, as discussed above, identifying the intended COU is essential to understanding the endpoint's measurement properties and whether an endpoint based on a particular COA is suitable for use in a clinical trial.

Conclusion

Patients may benefit substantially from therapies that treat diseases for which there are currently no therapies, or from therapies that improve upon the benefits provided by current therapies. These benefits are established by showing favorable effects on the efficacy endpoints in clinical trials testing the treatment. Successful clinical trials depend upon many factors, among which is the availability of good assessments to use in study endpoints. Demonstrating benefit in diseases without current established therapy may require developing and using measures that are not already known and well understood to be useful for studies of the disease.
Diseases with existing therapies may require development of improved endpoints to demonstrate the advantages of a new therapy over existing ones, or new endpoints to evaluate treatment effects on aspects of the disease that have not previously been evaluated. Effects on endpoints need to be interpretable as meaningful effects for patients, i.e., a treatment benefit: an effect on how patients feel, function, or survive. Study endpoints to demonstrate efficacy are built on outcome assessments (particularly clinical outcome assessments), and are intended to represent the MHA when used in a well-specified COU. COAs measure a COI that may be the MHA itself, or may be a more readily measured COI that is thought to have a strong relationship to the MHA within the disease. The first step toward understanding the performance properties of an outcome assessment in a COU is to identify the attributes of the outcome assessment, particularly those discussed in this manuscript. Categorizing an outcome assessment by these attributes will help in carefully specifying the detailed procedure for conducting the assessment, in determining the type of evidence needed to deem the outcome assessment well defined and reliable, and in refining the outcome assessment during the course of its development.

Clearly and fully identifying the COA by name, version (if appropriate), intended MHA, and COU will aid in understanding whether a) the COA has been shown to be well defined and reliable, b) some limited additional development is needed (e.g., for use in a COU related to, but not identical to, the established COU), or c) a substantial COA development effort is needed.
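One way to keep the identifying elements listed above together is a simple record type. The Python sketch below is hypothetical (the class, field, and function names are not from any standard or registry), and the example COAs are invented; it shows how two studies citing the same COA name can still be recognized as having used different versions.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass(frozen=True)
class COAIdentification:
    """Hypothetical record of the elements needed to identify a COA clearly."""
    name: str               # name reflecting what is actually measured
    version: Optional[str]  # exact version, if the COA has versions
    intended_mha: str       # meaningful health aspect the COA is intended to represent
    context_of_use: str     # population, setting, administration, use in endpoint, etc.


def differing_elements(a: COAIdentification, b: COAIdentification) -> List[str]:
    """List the identifying elements on which two COA records differ."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


# Invented example: two studies cite the same COA name but different versions.
study_1 = COAIdentification("Timed walk test", "2.0", "Ambulation-dependent activities",
                            "Adult outpatients; mean change at week 24")
study_2 = COAIdentification("Timed walk test", "1.0", "Ambulation-dependent activities",
                            "Adult outpatients; mean change at week 24")

if __name__ == "__main__":
    print(differing_elements(study_1, study_2))
```

Comparing names alone would suggest the two studies made the same measurement; comparing the full records shows that they differ in version.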

REFERENCES (list to be added to final draft)

FIGURE

FIGURE 1: The Interrelationships of Meaningful Functional Abilities, the Measurable Concept of Interest, Directly Meaningful Functional Activities, and Indirectly Meaningful Assessments of Function

A group of related meaningful activities (lower left rectangular box, "Concrete meaningful activities") that would be beneficial to affect in the intended disease is identified. An abstracted conceptualization is formulated from these and given a name (left-middle oval, "Ambulation Dependent Activities"); this is called a meaningful health aspect (MHA). Assessment of the identified specific activities might be formulated into a defined OA that is hypothesized to directly assess the MHA. Other types of activities may also be of interest in the same disease, shown conceptualized into other illustrated MHAs. The specific MHA might be deconstructed into more easily measurable body actions (concepts of interest, right-side ovals) that are thought to be important elements of performing the meaningful activities. Procedures could be devised to evaluate the simplified actions (rectangular boxes, lower right). These are indirect OAs. Each procedure provides a score for the quality or quantity observed in this operationalized method of measuring a body action performed in a way that is not part of a person's normal life. The clinical meaning of the score is not precisely known, but is hypothesized to reflect the meaningful functional activities. One or more of the procedural tests might be used as a COA in a study endpoint.