Evaluating and Interpreting Clinical Trials

Article #2 CE
Dorothy Cimino Brown, DVM, DACVS
University of Pennsylvania

ABSTRACT: For the practicing veterinarian, selecting the best treatment for patients can be challenging. The gold standard for determining the efficacy of a treatment is the randomized controlled clinical trial (RCT) reported in the veterinary medical or pharmaceutical literature. With the time constraints placed on busy practitioners, the temptation is to review the trial objectives and then the conclusions. However, it is important for practitioners to be able to critically review the methods and results of RCTs to determine whether the conclusions are justified by the data and whether a change in their practice based on those conclusions may benefit their patients.

Reports of randomized controlled clinical trials (RCTs) are the gold standard by which practitioners and others make decisions about treatment efficacy. Therefore, the RCT, more than any other methodology, can have a powerful and immediate impact on patient care. This article reviews the basic components of the methods and results for conducting RCTs and describes the influence of those components on the interpretation of trial results. With this understanding, practitioners can critically review information presented to them in the veterinary medical and pharmaceutical company literature and make informed decisions for their patients.

DEFINING A CLINICAL TRIAL

A clinical trial is a rigorously controlled test of a new intervention on subjects. The term intervention is used in the broadest sense to include prophylactic, diagnostic, and therapeutic agents and procedures. This article focuses on therapeutic agents (i.e., new drugs). Clinical trials progress in phases.
Phase I and II trials enroll a limited number of subjects to determine the safety, dose, or toxicity of the treatment for the condition for which it is intended. Phase III trials are large clinical trials of an intervention that in phases I and II has been shown to be efficacious with tolerable side effects. Phase III trials compare the effect and value of the treatment versus a control group of subjects; these trials are designed to provide definitive evidence for the efficacy of the treatment. Phase III trials can provide unbiased estimates of an intervention's specific effects and change the way practitioners manage their patients. Whether they are truly unbiased estimates that have applicability to patients in any given practice depends on the many features of the trials discussed in this article.

EXAMPLE TRIAL FOR DISCUSSION

To make the discussion clearer and more applicable, we will refer to a specific (fictitious) RCT report as an example. This is a report of the beneficial effects of drug A in dogs with osteoarthritis (OA). The authors conclude that, based on a double-blind, randomized, placebo-controlled clinical trial, dogs with OA receiving drug A have significant pain relief. In addition, the authors conclude that drug A has no significant effect on platelet function, unlike other drugs in its class. Figure 1 is a schematic appearance of the trial. We will refer to this example as we highlight the basic components of the reporting of the methods and results of RCTs and describe the influence of those components on the interpretation of trial results.

COMPENDIUM, October 2005

[Figure 1. Schematic appearance of the trial. Population: dogs with OA. Intervention: drug A. Control: placebo. Outcomes: pain score and platelet function.]

[Figure 2. Population hierarchy for a randomized controlled clinical trial. Reference population: general group to which the results of a trial are expected to be applicable. Experimental population: actual group in which the trial is conducted, divided into nonparticipants and study participants; study participants are divided into the treatment group and the comparison group.]

EVALUATION OF METHODS

Study Population

For the conclusions of a trial to be useful to practitioners, the study subjects in the trial (e.g., dogs with OA) should be representative of the patients seen routinely in their practice. This can be determined by evaluating inclusion and exclusion criteria in the methods section of the report. Narrow inclusion and exclusion criteria confine enrollment in the study to a small subset of patients with the disease, which may limit how useful the results are to a practitioner. In our example, the study concluded that drug A has a beneficial effect in dogs with OA (i.e., the reference population); however, the inclusion criteria for the trial were middle-aged, large-breed dogs with coxofemoral OA and no other underlying conditions (i.e., the experimental population; Figure 2). Although drug A appeared to be very beneficial in this subset of dogs with OA, practitioners are likely to see more variable results in their practices when administering the drug to dogs of varying ages and sizes, with varying joints affected by OA, and with underlying diseases.
This is not to say that the reported results of the trial are not valid; assuming all other methods are appropriate, the trial results are valid for the experimental population of the report. What practitioners must decide is whether the results of the trial can be generalized to the population of dogs that they treat on a regular basis. If the answer is yes, a generally similar result to that reported in the trial may be expected. If the answer is no, more variable results may be expected.

Assignment to Treatment Versus Control Groups

The RCT is the gold standard method of evaluating new and existing treatments because of its ability to minimize bias. One way to minimize bias in the assignment to a treatment or control group is through randomization. Randomization implies that each study subject has the same chance of being placed in either the treatment or control group. This means that with an adequate sample size, the study groups (i.e., treatment and control) tend to be comparable with respect to all variables except for the treatment being studied. Selection bias occurs when study subjects with one or more influencing factors appear more frequently in one study group than in another. For example, if younger age is associated with a better outcome in dogs with OA and the proportion of younger study subjects is greater in the treatment group than in the control group, then even if the treatment and control are equally effective, there would be an observed benefit of the treatment that did not really exist.

Using our trial example of middle-aged (4- to 8-year-old) dogs reported in the OA trial, if 50% of the dogs in the drug A group were 4 to 5 years of age and only 10% were 7 to 8 years of age, and the opposite were true in the placebo group (i.e., 50% were 7 to 8 years of age, and 10% were 4 to 5 years of age), the drug A group would appear to do much better than the placebo group, not necessarily because the drug was that much more effective than the placebo but because the dogs in the drug A group tended to be much younger, and younger dogs do better. Randomization addresses this problem by ensuring that a 4-year-old dog is just as likely to be placed in the drug A group as in the placebo group.

The beauty of randomization is that it tends to distribute evenly between groups not only known factors (e.g., age) but also factors that are unsuspected by the investigators because of limitations of biologic knowledge at the time the trial is initiated. For example, if it were discovered 2 years after our example trial was reported that thin dogs do better than obese dogs, the results of the trial would still be valid because randomization would have roughly equally allocated the obese and thin dogs between the two study groups, thus negating the potential for the body condition of the dogs to bias the results. For practitioners to be sure that the treatment and control groups in the study are truly comparable, they must look for the presence and method of randomization in the methods section of the report.
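As a rough illustration of how randomization tends to balance a prognostic factor such as age, the following sketch simulates simple (coin-flip) allocation of 70 dogs aged 4 to 8 years. The uniform age distribution, the function names, and the number of simulated trials are assumptions made for illustration only; they are not taken from the example trial.

```python
import random
from statistics import mean


def randomize(ages, rng):
    """Simple randomization: each dog has the same chance of either group."""
    drug, placebo = [], []
    for age in ages:
        (drug if rng.random() < 0.5 else placebo).append(age)
    return drug, placebo


def mean_age_gap(n_dogs=70, n_trials=2000, seed=1):
    """Average absolute difference in mean age between groups over many simulated trials."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        # Hypothetical population: ages spread evenly from 4 to 8 years.
        ages = [rng.randint(4, 8) for _ in range(n_dogs)]
        drug, placebo = randomize(ages, rng)
        if drug and placebo:  # skip the (extremely unlikely) case of an empty group
            gaps.append(abs(mean(drug) - mean(placebo)))
    return mean(gaps)


if __name__ == "__main__":
    print(f"typical gap in mean age between groups: {mean_age_gap():.2f} years")
```

Under these assumptions the gap in mean age between groups is typically a fraction of a year, far smaller than the imbalance in the biased example above, although any single small trial can still end up unbalanced by chance.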
If there is no mention of randomization, the presence of unequal study groups and a biased result must be considered. If randomization is properly done, nobody either involved in deciding whether a subject is eligible to enter a trial or responsible for administering the treatment will know the assigned group. This is known as blinding. When a system of assignment is known, there is potential for bias. For example, if the first two eligible dogs with OA (i.e., a 4-year-old dog and an 8-year-old dog) present at the same time with different prognoses and, according to the randomization list, dog 1 is to receive drug A and dog 2 is to receive the placebo, an investigator may, consciously or not, enter them into the study in the order that would allow the dog with the better potential outcome (i.e., the 4-year-old dog) to receive the treatment. If a large proportion of study subjects are entered in this way, a serious imbalance in the treatment groups with respect to factors affecting the outcome under study would result [1,2]. Ideally, the randomization list should be kept by a third party (usually a pharmacy) that dispenses the drug or placebo to the investigators without them knowing into which group the dog is being placed. In short, blinding mitigates the influences of expectation or other human predilections [3,4]. If blinding of the group allocation is not reported or is reported not to have been done, the practitioner must again interpret the results of the trial knowing that significant selection bias could be present.

Because of its ability to reduce bias (i.e., systematic errors), the randomized controlled clinical trial is the gold standard for evaluating drug efficacy.
When allocation to a treatment or control group is done by a method other than blind randomization, the burden of proof is on the investigator/author of the trial report to show that all possible biases in allocating study subjects to a group or influencing effects of known or unknown factors that may differ between the study groups did not account for the observed result. If blind randomization was used to allocate study subjects to treatment and control groups, the practitioner can be confident that observed differences between the groups reported in the trial are not due to selection of particular subjects to receive a given therapy.

Ascertaining the Outcome

The primary concern regarding determining the outcomes in the trial (i.e., pain score and platelet function in our example) is that results are not biased by collection of more complete or accurate information from one study group compared with another (i.e., drug A versus placebo). Observation bias (by the investigators or caregivers of the dogs) in ascertaining the outcome can exist in an RCT in that knowledge of a study subject's treatment status might, consciously or not, influence identification or reporting of relevant events (i.e., pain score). Bias occurs when investigators consciously or subconsciously favor one group over another. For example, if investigators know which dogs received the treatment, they may monitor that group more closely and thus deal with them differently than they would the control group in a way that could seriously affect the outcome of the trial. For example, if investigators know that a dog is receiving drug A, they may be more likely to persistently question the dog's caregiver to give some indication of a positive response on a pain score than if they know the dog is receiving the placebo.

Bias can also occur if the dog's caregiver knows to which group their dog belongs. If the dog's caregiver knows that their dog is receiving drug A versus the placebo, they are likely to overreport an improved pain score, whereas those who know their dog is in the control group are likely to overreport no improvement (or perhaps deterioration). This leads to exaggerated estimates of treatment benefits. Blinding of the investigators and caregivers of the study subjects prevents these biases [1,2,5]. In the title or abstract of an RCT report, authors often describe the trial as blind. However, the reader often has to go to the methods section of the report to determine exactly which parties were blind. If either the investigators or caregivers of the dogs were not blinded, the practitioner must again interpret the results of the trial knowing that significant biases could be present. The likelihood of such bias is directly related to the subjectivity of the outcomes under study.
For example, the secondary outcome for the example OA trial is platelet function. In this case, observation bias is unlikely because platelet function is an objective outcome and cannot be affected by knowledge of the dog's study group. In contrast, when the outcome of interest is subjective, such as pain score, the use of blinding and a control group minimizes bias in the ascertainment. Placebo controls also take care of the problem that caregivers administering a drug to their dog may be sensitized to their dog's physical condition and may tend to ascribe every sign or unusual occurrence to the treatment. The key to obtaining an unbiased estimate of effect for a treatment with a subjective outcome or a side effect is to subtract the placebo effect. That is, the true effect of the treatment with drug A on pain score is the measured effect of drug A on the pain score minus the measured effect of the placebo on the pain score. If the example OA trial did not use a placebo group control, it would be impossible for the practitioner to tell whether the decreased pain score following the treatment with drug A was due to the actual drug or merely to the caregivers' or investigators' belief that the treatment would help.

Selection bias is the systematic difference in prognosis or responsiveness to treatment between the treatment and control groups selected for the trial. This can be eliminated with blind randomization of the study subjects.

Sample Size Considerations

A trial must have a sufficient sample size (i.e., number of study subjects) to have adequate statistical power or ability to reliably detect the small to moderate but clinically important differences between study groups (e.g., drug A versus the placebo) that are most likely to occur [1,2]. A trial undertaken with an insufficient number of study subjects is of little scientific value.
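To give a sense of how sample size relates to the size of the difference a trial can detect, here is a minimal sketch using the common normal-approximation formula for comparing two group means, n per group = 2(z_(1-alpha/2) + z_power)^2 (sd/delta)^2. The formula choice, the function name, and the example values are assumptions for illustration; the article does not say how the example trial's sample size was determined.

```python
from math import ceil
from statistics import NormalDist


def dogs_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate subjects needed per group to detect a mean difference `delta`.

    Assumes a two-sided test, equal group sizes, and a common standard
    deviation `sd` (normal approximation; an illustrative simplification).
    """
    z = NormalDist().inv_cdf
    z_alpha = z(1 - alpha / 2)  # critical value for the two-sided significance level
    z_power = z(power)          # quantile corresponding to the desired power
    return ceil(2 * ((z_alpha + z_power) * sd / delta) ** 2)


if __name__ == "__main__":
    # Hypothetical pain-score differences, expressed in units of one standard deviation.
    for delta in (1.0, 0.5, 0.25):
        print(f"difference of {delta} SD: {dogs_per_group(delta, 1.0)} dogs per group")
```

Halving the difference to be detected roughly quadruples the required number of subjects, which is why small trials can reliably detect only large effects.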
In fact, trials with an inadequate sample size could be scientifically harmful if their results are misinterpreted as demonstrating that a treatment has no effect when, in fact, the sample size was not sufficient to draw that conclusion. For the example OA trial, the null result would have been that the pain score following drug A administration was not different from the pain score following placebo administration in dogs with OA. In our example, the null hypothesis was rejected, so an adequate number of study subjects must have completed the trial. Had it not been rejected (i.e., the conclusion of the trial was that drug A was not better than the placebo in decreasing pain scores), the practitioner would need to look at the methods section of the report to determine whether this is a true null result or the study was merely underpowered. The methods section should describe how the sample size was determined. If it does not, the practitioner cannot be certain that a null result is not due merely to an inadequate number of animals enrolled in the trial.

EVALUATING RESULTS

Tracking Study Subjects Through the Trial

For practitioners to assess the validity of results and whether they are applicable to their patients, they must be able to follow the flow of study subjects through the trial. This is relatively easy to do if a flow diagram of subject progress through the phases of the trial is included in the results section of the report (Figure 3). If a diagram is not included, the practitioner must glean this information from the text of the results section.

[Figure 3. Flow of study subjects through a randomized controlled clinical trial. Numbers in each section should be reported. Assessed for eligibility (n = ...), then excluded (n = ...) or randomized (n = ...). Randomized subjects are allocated to treatment (n = ...) or allocated to control group (n = ...). Within each group: compliant (n = ...), noncompliant (n = ...), lost to follow-up (n = ...), analyzed (n = ...), excluded from analysis (n = ...).]

Number of Animals Assessed for Eligibility Versus Number of Animals Randomized

People who choose to participate in a clinical trial are very likely to differ from nonparticipants in ways that can impact the outcome of the trial. This is particularly true if a large percentage of those offered enrollment in a trial decline and the participants in the trial become a select subgroup for reasons not apparent in the inclusion criteria. Whether the subgroup of participants is representative of the entire potential participant pool does not affect the validity of results for that subgroup but may affect how applicable the results are to the population of patients a practitioner sees. For example, if 200 dogs were assessed for eligibility but only 70 were randomized in the trial, 130 dogs would be considered nonparticipants. If much of the nonparticipation is for reasons other than not fitting the inclusion and exclusion criteria, the participating experimental population (Figure 2) becomes an even more select subset of the reference population for reasons that may not be identified. A trial should report the number of study subjects assessed for eligibility that did not participate and ideally offer some details about the nonparticipants so practitioners can assess the presence and extent of differences between participants and nonparticipants. This aids in judging whether the results of the participants are representative of the larger population and, ultimately, of the patients in a clinician's practice.

Number of Animals Randomized Versus Number of Animals Allocated to the Study Groups

If a valid randomization technique is used, it is very likely that the number of study subjects allocated to each of the two study groups will not be exactly 50% of the total number randomized. When computer randomization or a randomization table is used to allocate study subjects to groups, it is common for one study group to have slightly more subjects enrolled than the other. For example, if 70 dogs are randomized in the OA trial, it would be typical for 36 to be assigned to the drug A treatment group and 34 to be assigned to the placebo group (or vice versa). It would be more unusual for exactly 35 dogs to be assigned to each group. If practitioners notice in the results section that an equal number of animals were assigned to each study group, they should look closely at the methods section to determine the method of randomization. If a clear method is not described, the reader should wonder whether alternate assignment to the treatment and control groups was used rather than true randomization. Alternate assignment always results in exactly 50% of the subjects in each of the two study groups (assuming an even number of total animals). As discussed earlier, this method is subject to great selection bias when two eligible animals present for enrollment at the same time.

Number of Animals That Do Not Comply in Each Study Group

By definition, an RCT requires the active participation and cooperation of the study subjects and their caregivers. After the agreement to participate, there may be deviation from the protocol for a variety of reasons, including the development of side effects, forgetting to administer the medication, or caregivers choosing to obtain alternative treatment for their dogs on their own. To the extent that study subjects in the control group receive therapy or those in the treatment group do not actually receive their assigned regimen, the two study groups become very similar in terms of exposure to the treatment. Thus the effect of noncompliance in any study subject makes the treatment and comparison groups more alike, thereby decreasing the ability of the trial to detect any true differences between groups. For example, if seven of the 36 dogs in the drug A group do not receive drug A according to the protocol and five of the 34 dogs in the placebo group are administered drug A (or some other over-the-counter analgesic) by their caregivers at their own initiative, the two groups (i.e., drug A and placebo) become very similar in their exposure to drug A (or pain medication in general); consequently, any true magnitude of effect of drug A may be obscured. Thus the interpretation of any trial result must take into account the extent to which there was adherence to the treatment regimen.

Observation bias is the systematic difference in obtaining information about the study subjects once they have been entered in the trial. This can be eliminated by blinding the investigators, data collectors, and caregivers of animals enrolled in the trial.

If a null result of a trial (i.e., no difference between drug A and the placebo groups) is found in light of a significant number of those enrolled not complying with the protocol, the study may be underpowered (i.e., not enough compliant subjects in each group) to detect a true difference between the study groups, even if an appropriate number were originally randomized. If a positive result of a trial (i.e., drug A decreases the pain score significantly more than the placebo) is found in light of a significant number of those enrolled not complying with the protocol, the treatment is likely very effective but may not be practical because there is a lot of noncompliance for some reason.
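The way noncompliance pulls the two groups together can be illustrated with the numbers from the example (7 of 36 dogs in the drug A group not receiving drug A, and 5 of 34 dogs in the placebo group receiving an analgesic). The sketch below rests on strong simplifying assumptions that are not in the article: the drug effect is additive, every exposed dog gets the full effect, unexposed dogs get none, and an over-the-counter analgesic counts the same as drug A. The function name is hypothetical.

```python
def diluted_effect(true_effect, n_drug, n_drug_noncompliant,
                   n_placebo, n_placebo_crossover):
    """Expected between-group difference when groups are analyzed as randomized.

    Simplifying assumptions (illustrative only): the effect is additive, every
    exposed dog gets the full `true_effect`, and unexposed dogs get none.
    """
    exposed_in_drug_group = (n_drug - n_drug_noncompliant) / n_drug
    exposed_in_placebo_group = n_placebo_crossover / n_placebo
    return true_effect * (exposed_in_drug_group - exposed_in_placebo_group)


if __name__ == "__main__":
    # A hypothetical true effect of 1.0 pain-score unit, with the example's counts.
    observed = diluted_effect(1.0, 36, 7, 34, 5)
    print(f"expected observed difference: {observed:.2f} of the true effect")
```

Under these assumptions, only about two thirds of the true effect would be expected to show up as a difference between the groups, which makes a real effect harder to detect with the same number of dogs.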
For example, drug A may be very effective in decreasing the pain score in dogs with OA, but if it must be administered five times per day, noncompliance is likely to be high. It may not be practical for the practitioner to recommend use of drug A, no matter how effective it may be, if the treatment regimen is so difficult that it is likely to be accepted and used in only a small proportion of dogs with OA. For these reasons, compliance data should be reported with reasons for deviation from the protocol for individual subjects. Knowing that 100% compliance in a trial would be very unusual, practitioners need to consider whether only compliant study subjects were analyzed for the report if no compliance information is reported. This can lead to very biased results, which are discussed later.

Number of Animals Lost to Follow-Up

In addition to uniform ascertainment of outcome between groups, a trial requires complete follow-up of study subjects over its duration. As the period of time over which subjects must be followed increases, maintaining complete ascertainment of outcomes becomes more difficult. A number of subjects in both groups may be lost to follow-up by the end of the study period. If the proportion of outcomes that are not ascertained is large or differs among the study groups, the result could be an under- or overestimate of the effect of the drug. If the proportion lost to follow-up is large (e.g., 30% to 40%), it should raise serious doubts about the validity of the study results [1]. However, the more difficult issue for interpretation is that even if the rate of loss is not that extreme, the probability of loss may be related to the treatment, outcome, or both. For example, if caregivers are aware of which study group their dog is enrolled in, they may be more likely not to return for follow-up evaluations if their dog is receiving the placebo than if it is receiving drug A. It would also be likely for caregivers of dogs that are not doing well while enrolled in the trial (i.e., pain scores remain high) not to return for follow-up evaluation because they want to seek alternative therapy. Not determining the outcome of these cases can introduce a significant amount of bias in the study. Exactly how much is impossible to determine directly. An indirect approach used to describe the extent of bias introduced by losses to follow-up is to calculate estimates of treatment effect, assuming the most extreme situations.
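An extreme-case calculation of this kind can be sketched as follows. The outcome is simplified to "pain score improved: yes or no", the effect is expressed as the difference in the proportion of dogs improved, and all counts in the usage example are invented for illustration; the function name is hypothetical.

```python
def extreme_case_bounds(improved_drug, followed_drug, lost_drug,
                        improved_placebo, followed_placebo, lost_placebo):
    """Range for the difference in improvement rates (drug A minus placebo).

    Upper bound: every dog lost from the drug A group improved, none lost from placebo did.
    Lower bound: no dog lost from the drug A group improved, all lost from placebo did.
    """
    n_drug = followed_drug + lost_drug
    n_placebo = followed_placebo + lost_placebo
    upper = (improved_drug + lost_drug) / n_drug - improved_placebo / n_placebo
    lower = improved_drug / n_drug - (improved_placebo + lost_placebo) / n_placebo
    return lower, upper


if __name__ == "__main__":
    # Hypothetical counts: 36 and 34 dogs randomized, 4 lost to follow-up in each group.
    lower, upper = extreme_case_bounds(20, 32, 4, 10, 30, 4)
    print(f"difference in improvement rate lies between {lower:.2f} and {upper:.2f}")
```

The wider this range, the less the reported result can be trusted; if the range includes zero, losses to follow-up alone could account for the apparent benefit.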
For example, one estimate would be based on the assumption that all those lost to follow-up from the drug A group had improved pain scores, while none lost from the placebo group did. The second extreme would be that none lost from the drug A group had improved pain scores, while all lost from the placebo group did. The results of these calculations provide a range within which the true effect of the drug lies. Ideally, the author(s) of the study reports these calculations (often called a sensitivity analysis) or at least provides the information necessary for readers to make this assessment themselves [6]. It is unusual not to lose a single study subject during a trial. If the trial report does not address loss to follow-up at all, it is impossible to gauge how much bias may be associated with the results.

Number of Animals Analyzed

It is important for practitioners to know how many of the original animals randomized to each study group had outcomes analyzed. This again has to do with the bias that is introduced into results because subjects lost to follow-up cannot be included in the analysis. It also has to do with the bias that is introduced when noncompliant subjects are not analyzed in the group to which they were randomized, which is discussed later in this article.

Baseline Data

One important early step in evaluating results of a trial is to compare the relevant characteristics of the treatment and control groups to ensure that balance was achieved. Although randomization tends to distribute both known and unknown factors evenly among study groups, if the sample size is small, randomization may not always result in groups that are alike with respect to every factor except the treatment under study.
For example, if the age, sex, and bodyweight of the dog have an impact on how well it will respond to treatment with drug A, the breakdown of age, sex, and bodyweight of the dogs in the drug A group versus the placebo group should be presented so that practitioners can feel comfortable that randomization did indeed work and the two groups are comparable. If by chance the groups do not appear comparable, the imbalances can be controlled in the analysis using statistical techniques that would be reported in the methods section. Most RCTs reported in veterinary medicine are relatively small, making it very important for the baseline characteristics of the groups to be reported. If this information is not reported, it is impossible for the practitioner to know whether randomization was effective in delivering comparable groups for analysis, and the possibility of selection bias (as discussed earlier) exists.

Intention-to-Treat Analysis

When the results of a clinical trial are evaluated, it is important for practitioners to assess which study subjects were included in the analysis. Some investigators remove subjects that did not comply with the study protocol from the analysis; however, the exclusion of any randomized study subject from the analysis can lead to biased results. Once study subjects are randomized to a study group (i.e., drug A versus the placebo), their subsequent health experience must be assessed and analyzed along with that of all others in that group, regardless of whether there is compliance with their assigned regimen. In all circumstances, the optimal comparison for estimating the true benefit from the treatment protocol is to analyze by the intention to treat (not by whether subjects were actually treated). In other words, once randomized, always analyzed¹,²,⁷,⁸ (Figure 4). In most trials, perfect compliers represent only a fraction of the total study population. Although the goal is to study the actual effect of the treatment (drug A), randomization is done only on the basis of offering the treatment. To preserve the power of randomization, the data must be analyzed on this basis. Only the entire groups allocated by randomization are truly comparable. Subsequent analyses can certainly be reported based on the subgroup of study subjects that actually received their assigned treatment; however, if this is reported, it is important for practitioners to realize that the balance in the distribution of unknown factors originally achieved through randomization cannot be assumed to hold within that subgroup and that the results of the subgroup of compliers may be biased.

P Values Versus Confidence Intervals

The size of the P value is a function of two factors: the magnitude of the difference between the groups and the size of the sample.¹ Consequently, even a very small difference may be statistically significant if the sample size is sufficiently large; conversely, a large effect may not achieve statistical significance if there is a small sample size. To overcome this problem, a related but far more informative measure to evaluate the role of chance in the reported results may be the confidence interval (CI).
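As a rough illustration of how sample size drives both the P value and the width of the CI, the sketch below computes a relative risk, a 95% CI, and a two-sided P value for the same proportions at two sample sizes. The counts are hypothetical, the function name `relative_risk_summary` is invented for this example, and the large-sample normal approximation on the log relative risk is an assumption made here for simplicity, not necessarily the method any particular trial report would use.

```python
import math


def relative_risk_summary(events_placebo, n_placebo, events_drug, n_drug):
    """Relative risk (placebo vs. drug) with a 95% CI and a two-sided P value.

    Uses the large-sample normal approximation on the log relative risk.
    """
    risk_placebo = events_placebo / n_placebo
    risk_drug = events_drug / n_drug
    rr = risk_placebo / risk_drug
    # Standard error of log(RR) for two independent proportions.
    se = math.sqrt(1 / events_placebo - 1 / n_placebo
                   + 1 / events_drug - 1 / n_drug)
    z = math.log(rr) / se
    # Two-sided P value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    lower = math.exp(math.log(rr) - 1.96 * se)
    upper = math.exp(math.log(rr) + 1.96 * se)
    return rr, lower, upper, p_value


# Hypothetical counts of dogs with an elevated pain score: the same
# proportions (60% on placebo, 30% on drug A) at two sample sizes.
small = relative_risk_summary(12, 20, 6, 20)
large = relative_risk_summary(120, 200, 60, 200)

for label, (rr, lower, upper, p_value) in (("20 per group", small),
                                           ("200 per group", large)):
    print(f"{label}: RR = {rr:.2f}, 95% CI {lower:.2f} to {upper:.2f}, "
          f"P = {p_value:.3g}")
```

With these made-up counts the estimated relative risk is identical in both trials, but only the larger trial yields an interval that excludes 1 and a P value below .05; the smaller trial's wide interval shows that it cannot rule out either no effect or a very large one.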
For the OA example, in evaluating the relationship between drug A and pain in dogs, the report may say that dogs in the placebo group have a 1.9 times greater risk (i.e., relative risk = 1.9) of an elevated pain score compared with dogs receiving drug A and that this difference between groups is statistically significant (i.e., P < .05). If the 95% CI is presented (e.g., 1.3 to 2.8), the reader can see that the best estimate of increased risk of elevated pain score is 1.9; however, the reader can be 95% confident that the true relative risk is no less than 1.3 and no greater than 2.8. It is particularly important to look for the CI when interpreting the results of studies that are not statistically significant. A narrow CI (e.g., 0.8 to 1.2) would add support to the belief that there is actually no true increased risk, whereas a wide interval (e.g., 0.8 to 8.2) suggests that the sample size was not sufficient to have adequate statistical power.

Figure 4. Flow of randomized study subjects to final analysis. (Flowchart labels: Randomized; Allocated to intervention; Allocated to control group; Compliers; Noncompliers; Loss to follow-up; Analyzed.)

EFFICACY VERSUS EFFECTIVENESS

In veterinary medicine, RCTs are designed to test a biologic question on a relatively homogeneous population of animals (e.g., does drug A significantly decrease the pain score of middle-aged, large-breed dogs with coxofemoral OA and no other underlying diseases?). These types of trials are generally called efficacy trials. They test the true biologic effect of a treatment under optimal circumstances. They do not test the true effectiveness of a treatment in usual care, which is the effect a drug has when widely used in practice. The true effectiveness of a treatment cannot be determined until it is used in a heterogeneous population of thousands of animals.

CONCLUSION

The RCT is the most reliable methodology for assessing the efficacy of treatments in veterinary medicine.
However, a number of issues in the design and conduct of the trial as well as the analysis must be carefully considered to ensure that valid conclusions are made. In addition, practitioners must determine whether the reported results are applicable to the patients they routinely treat in their practice. Much of this information is not presented in the abstract of a report, and practitioners must delve deeper into the methods and results of the trial to make a truly informed decision for their patients.

REFERENCES

1. Hennekens CH, Buring JE: Epidemiology in Medicine, ed 1. Philadelphia, Lippincott Williams & Wilkins, 1987, p 383.
2. Piantadosi S: Clinical Trials: A Methodologic Perspective. New York, John Wiley & Sons, 1997, p 590.
3. Kramer MS, Shapiro SH: Scientific challenges in the application of randomized trials. JAMA 252:2739-2745, 1984.
4. Jüni P, Altman DG, Egger M: Assessing the quality of controlled clinical trials. Br Med J 323:42-46, 2001.
5. Halpern SD: Evaluating preference effects in partially unblinded, randomized clinical trials. J Clin Epidemiol 56:109-115, 2003.
6. Hollis S: A graphical sensitivity analysis for clinical trials with non-ignorable missing binary outcome. Stat Med 21:3823-3834, 2002.
7. Ruiz-Canela M: Intention to treat analysis is related to methodological quality. Br Med J 320:1007, 2000.
8. Moher D, Schulz KF, Altman D: The CONSORT statement: Revised recommendations for improving the quality of reports of parallel-group randomized trials. JAMA 285:1987-1991, 2001.

ARTICLE #2 CE TEST

This article qualifies for 2 contact hours of continuing education (CE) credit from the Auburn University College of Veterinary Medicine. Subscribers may purchase individual CE tests or sign up for our annual CE program. Those who wish to apply this credit to fulfill state relicensure requirements should consult their respective state authorities regarding the applicability of this program. To participate, fill out the test form inserted at the end of this issue or take CE tests online and get real-time scores at CompendiumVet.com.

1. The RCT is the gold standard for evaluating drug efficacy because of its ability to
   a. reduce bias.
   b. compare two groups of study subjects.
   c. restrict inclusion criteria to a small subset of patients with the disease of interest.
   d. control the compliance of study participants.

2. Selection bias can be eliminated by
   a. blinding the caregivers of animals enrolled in the trial.
   b. blind randomization of the study subjects.
   c. having an adequate sample size.
   d. following the flow of study subjects through the trial.

3. Observation bias can be eliminated by
   a. using subjective endpoints such as pain score.
   b. blinding the investigators, data collectors, and caregivers of the animals enrolled in the trial.
   c. doing a sensitivity analysis.
   d. reporting the number of study subjects assessed for eligibility that did not participate.

4. An RCT can provide an unbiased estimate of a drug's efficacy by
   a. using an intention-to-treat analysis.
   b. blinding the randomization.
   c. blinding the investigators, data collectors, and caregivers of animals enrolled in the trial.
   d. all of the above

5. Which statement regarding randomization is true?
   a. Regardless of sample size, randomization ensures that the study groups are comparable with respect to all variables except for the treatment being studied.
   b. Randomization protects against observation bias.
   c. Randomization is the only way to evenly distribute unknown factors between study groups.
   d. Randomization ensures that study groups have the same number of participants enrolled.

6. For practitioners to determine whether study subjects in a reported trial are representative of their patients, they need to evaluate the
   a. inclusion criteria.
   b. exclusion criteria.
   c. number of study subjects assessed for eligibility that did not participate.
   d. all of the above

7. Which term describes the general group to which the results of a clinical trial are expected to be applicable?
   a. reference population
   b. experimental population
   c. participants
   d. nonparticipants

8. Which statement is true?
   a. Randomization implies that each study subject has the same chance of being placed in either the treatment or control group.
   b. The primary concern in determining the outcome of a trial is that only the data from study subjects compliant with the protocol are collected.
   c. When very narrow inclusion and exclusion criteria are used, the results of the trial will likely not be valid.
   d. Observation bias is of greatest concern when data on objective endpoints are collected.

Test answers now available at CompendiumVet.com

9. Which statement is true?
   a. If a large percentage of those offered enrollment in a trial decline, the results of the study are likely to be invalid.
   b. A trial must have a sufficient sample size to have adequate statistical power to reliably detect the important differences between study groups.
   c. If the number of study subjects lost to follow-up is large or differs among the study groups, the result is always an overestimate of the effect of the drug.
   d. A sensitivity analysis is used to determine whether enough study subjects were enrolled in the trial.

10. Which statement is true?
   a. The abstract of a clinical trial report provides all the information practitioners need to make a truly informed decision for their patients.
   b. RCTs test the true effectiveness of a treatment in usual practice (i.e., the effect of a drug when it is widely used in practice).
   c. The size of the P value is a function solely of the magnitude of the difference between the groups.
   d. Once study subjects have been randomized to a study group, their subsequent health experience must be assessed and analyzed along with that of all others in that group, regardless of whether there is compliance with their assigned regimen.

COMPENDIUM October 2005