Analysis of Observational Studies: A Guide to Understanding Statistical Methods

Copyright © 2009 by The Journal of Bone and Joint Surgery, Incorporated

By Saam Morshed, MD, MPH, Paul Tornetta III, MD, and Mohit Bhandari, MD, MSc, FRCSC

Observational studies provide an important source of information when randomized controlled trials cannot or should not be undertaken, provided that the data are analyzed and interpreted with special attention to bias. This article highlights the special analytic considerations required for proper reporting and interpretation of observational studies. We review statistical principles that are fundamental to understanding what observational data can offer. The concepts include the relationship between a study sample and the target population, and the two primary forms of statistical analysis: estimation and hypothesis testing. The concept of bias, and confounding in particular, is introduced as an obstacle to drawing valid conclusions from an observational study. The discussion will then focus on the techniques that are most useful in the analysis of the three most common types of observational studies (the case series, the therapeutic study, and the prognostic study). The goal of this review is to empower the reader to take a practical approach to, and validly interpret, the statistical analysis of these study types.

Introduction

The analysis of high-quality randomized controlled trials offers the top level of evidence from clinical outcomes research investigation of therapeutic interventions. Yet there are many scenarios in which randomized controlled trials are inappropriate or impossible, such as for the study of rare conditions; moreover, the generalizability of randomized controlled trials is often limited by strict inclusion and exclusion criteria. Nonrandomized studies (or observational studies) can provide an important complementary source of information, provided that the data are analyzed and interpreted in the context of the confounding bias to which they are prone.

This article will explain the special analytic considerations that are required for the proper reporting and interpretation of observational studies. We begin by reviewing some intuitive, rather than technical, statistical principles that are fundamental to the understanding of what observational data can tell us. The relationship between a study sample and the target population is discussed, as this is the key to statistical inference. The notion of probability distributions is presented as a means of understanding the two primary forms of statistical analysis: estimation and hypothesis testing. The concept of bias, and confounding in particular, is then introduced as a major obstacle to drawing valid conclusions from an observational study. The discussion then focuses on techniques that are most useful in the analysis of the three most common types of observational studies: the case series, the therapeutic study, and the prognostic study. Each type will be introduced with a case example for consideration as basic principles are reviewed, followed by further explanation. The goal of this review is to empower the reader to understand a research question in the context of these categories so as to initiate a practical and valid approach to analysis.

Populations and Distributions

The analysis of any clinical study is based on the principle of taking a random or representative sample of subjects in order to draw some inference about a larger population of similar individuals called the target population (Fig. 1). However, going from a population to a sample leads to some degree of uncertainty or margin of error because of the need to rely on the use of estimation without knowledge of the entire population. To quantify this uncertainty, we rely on mathematically defined probability distributions, such as the normal distribution for continuous data and the binomial distribution for categorical data.
Understanding these distributions is fundamental to statistical inference, and the reader is referred to a basic statistical text for more background [1,2]. These distributions are based on parameters such as mean and standard deviation. If the assumption is made that the observed data are a sample from a population with a distribution that has a known theoretical form, then it is reasonable to use parameters of that distribution (those observed) to calculate probabilities of different values occurring. This parametric approach to statistics is wide-ranging and ubiquitous in medical research. However, if these distributional assumptions are not realistic, then the parametric approach may not generate valid results. When data deviate from a so-called normal pattern, nonparametric or distribution-free methods should be used [3]. Making the decision to use parametric or nonparametric methods is an important early step in the analysis of data and requires the analyst to understand the observed distribution of the data.

Fig. 1 Target population, study sample, and statistical inference. The analysis of any clinical study is based on the principle of taking a sample of subjects for the purpose of drawing some inference about a larger population of similar individuals.

Disclosure: In support of their research for or preparation of this work, one or more of the authors received, in any one year, outside funding or grants in excess of $10,000 from the Orthopaedic Research and Education Foundation. Neither they nor a member of their immediate families received payments or other benefits or a commitment or agreement to provide such benefits from a commercial entity.

J Bone Joint Surg Am. 2009;91 Suppl 3:50-60. doi:10.2106/jbjs.h.01577

Estimation and Hypothesis Testing

Statistical analyses are of two general types: estimation and hypothesis testing. A primary objective of any of the three types of observational studies introduced above is to provide some numerical value that expresses the probability or average of a measured outcome (often expressed as a proportion or mean) or the relative effect associated with a specific treatment or prognostic factor (often expressed as a relative risk or odds ratio). Estimation covers a broad range of statistical procedures that yield the magnitude of risk or effect as well as the precision of that estimate. Hypothesis testing is a method for understanding the likelihood of observing a difference or association from data if no such relationship actually exists in the population. While these two concepts may appear similar and are often simultaneously generated by computerized statistical packages, they actually convey distinct information that ought to be well understood by researchers and those who strive to understand reported results.

Estimation typically involves two components. The first is the calculation of a point estimate of disease or outcome prevalence (typically expressed as a probability, rate, or mean) or effect (typically expressed as an odds ratio, relative risk, or risk difference).
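
As a rough illustration of the point estimates of effect named above, the sketch below computes the risk in each group, the risk difference, the relative risk, and the odds ratio from a two-by-two table. The function name `effect_measures` and the counts are hypothetical, chosen only for illustration; they do not come from any study discussed in this article.

```python
def effect_measures(events_treated, n_treated, events_control, n_control):
    """Point estimates of effect from a two-by-two table.

    Hypothetical helper for illustration; assumes no zero cells.
    """
    risk_treated = events_treated / n_treated
    risk_control = events_control / n_control
    # Odds = events / non-events within each group.
    odds_treated = events_treated / (n_treated - events_treated)
    odds_control = events_control / (n_control - events_control)
    return {
        "risk_treated": risk_treated,
        "risk_control": risk_control,
        "risk_difference": risk_treated - risk_control,
        "relative_risk": risk_treated / risk_control,
        "odds_ratio": odds_treated / odds_control,
    }


# Made-up counts: 20 of 100 treated and 10 of 100 control patients
# have the outcome.
estimates = effect_measures(20, 100, 10, 100)
for name, value in estimates.items():
    print(f"{name}: {value:.3f}")
```

On these made-up counts the relative risk is 2.0 while the odds ratio is 2.25; the two measures approximate each other only when the outcome is rare, which is one reason the choice of measure should be stated explicitly.
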
What is equally important to statistical inference is the variability or precision of this measure. Estimation gives us this quantity, typically in the form of a confidence interval [4], which informs us of how large an error might be made with an estimate of effect. A confidence interval is a range of values that can be confidently relied on to include the true population value. A 95% confidence interval is built by a procedure that, over repeated sampling, yields a range containing the true population mean with a probability of 0.95. If, for example, one hundred random samples were drawn from a population in which the true mean is sixty-five, the estimated 95% confidence interval of the mean of approximately ninety-five of those samples should include sixty-five (Fig. 2).

Fig. 2 Plot of estimated 95% confidence intervals (horizontal lines) from 100 random samples taken from a computer-generated normally distributed population whose mean age is sixty-five years. The intervals are stacked from bottom to top, in order of ascending sample mean. Note that ninety-three of these 100 random samples have an estimated 95% confidence interval that includes the true population mean.

Hypothesis testing, while somewhat less intuitive than estimation, is used in the majority of reported results of statistical analysis in which comparisons are made. For hypothesis testing, we state a null hypothesis that the effect of interest is zero. This statistical null hypothesis is often the negation of the research hypothesis that generated the data (that is, there is no difference in the effect of treatment A compared with treatment B). We also have an alternative hypothesis, which is usually simply that the effect of interest is not zero. Having set up our null hypothesis, we can then, with use of a test statistic from a t test, chi-square test, or similar type of analysis, evaluate the probability that we could have obtained the observed data (or data more extreme) if the null hypothesis were true. This probability is usually called the p value; the smaller the p value, the less compatible the observed data are with the null hypothesis. When a p value is below some arbitrary cutoff point (e.g., 0.05), the result is often called significant. The use of the word significant can lead to much confusion over what is statistically significant and what is clinically significant. Because medical journals report many of their results with use of hypothesis testing, many restrict the use of the word significant to those results that meet the statistical definition. The use of cutoff points for p values leads to treating the analysis as a process of decision-making within which it is customary to consider a significant effect as real and a nonsignificant result as indicating no effect. Notice that the p value gives no information about the magnitude of the effect or association that is being investigated. The decision about significance is also problematic because the uncertainty of the result is obscured (whereas it is explicit when a confidence interval is estimated). It is not reasonable to conclude that a nonsignificant result indicates no clinical effect just because the null hypothesis cannot be ruled out. The difference between a p value of 0.048 and 0.052 should not alter the conclusion about an association in the data. For these reasons, the approach based on estimation is often considered superior [2,5]; moreover, when hypothesis testing is performed, specific p values should be reported in their entirety, allowing readers to make up their own minds about whether the difference is clinically significant.

The threshold level at which a p value may be considered to be significant also depends on how many times one sample group is compared with another. The more often that a difference between two groups is searched for, the more likely it is that a difference will be found that has occurred purely by chance (that is, the type-I error rate increases) [6].
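
To see how quickly chance findings accumulate, consider a minimal sketch that assumes k independent tests, each run at the same cutoff alpha, with every null hypothesis true. Under that assumption (ours, for illustration; outcomes measured in the same patients are usually correlated), the probability of at least one false-positive result is 1 - (1 - alpha)^k. The function name is hypothetical.

```python
def familywise_error_rate(alpha, n_tests):
    """Chance of at least one false positive across independent tests.

    Assumes every null hypothesis is true and the tests are independent.
    """
    return 1.0 - (1.0 - alpha) ** n_tests


for k in (1, 5, 10, 20):
    rate = familywise_error_rate(0.05, k)
    print(f"{k:>2} tests at alpha = 0.05: {rate:.3f}")
```

With a single test the rate is simply the cutoff of 0.05, but with ten such tests the chance of at least one spurious "significant" result is already about 40%.
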
This inflation of the type-I error rate is an important consideration when two groups that are defined by the treatment that is received are compared with respect to multiple outcomes of interest or when multiple subgroup analyses are performed. Multiple testing is ideally performed to generate new hypotheses for future study, rather than trying to use the same data to try to definitively answer multiple questions. It is important that investigators be explicit about the primary outcome that their study was designed to investigate because this is typically what is used to calculate sample size and power and this is the comparison for which the standard hypothesis test is valid. When multiple comparisons are made and multiple hypothesis tests are undertaken with the same data, the cutoff p value for significance should be lowered accordingly with use of a Bonferroni or similar adjustment method to guard against type-I error [7].

Bias and Confounding

Properly conducted observational studies require a clear understanding of the role of bias in the data and how it ought to be handled. While there are multiple described forms of bias that threaten the validity of a clinical research study [8], most fall into one of three categories: selection bias, information bias, and confounding [5]. Selection bias is defined as a distortion of estimates that results from the way in which subjects are selected into the study sample. Selection bias may arise due to flaws in the choice of groups to be compared or loss to follow-up during data collection (censoring). For example, if the likelihood of censoring is influenced by the choice of treatment or implant a patient receives, any relative estimate of effect will be biased by analyzing only those who received follow-up. Information bias results in distortion of estimates due to measurement error or misclassification of subjects as to treatment, outcome, or other variables. This type of bias may result when the outcome of a particular treatment is assessed by a nonblinded surgeon, and, similar to selection bias, it is a threat to validity no matter what study design is used.

Confounding is a particular problem in observational studies. Confounding represents a mixing of effects between the treatment of interest and associated extraneous factors that also impact outcome, potentially obscuring or distorting the relationship of interest (Fig. 3). Confounding arises when patients selected for one treatment group are fundamentally different from the other group with regard to the pretreatment likelihood of having the outcome of interest. In surgery, treatment decisions are commonly made on the grounds of certain overt and subtle factors related to prognosis or severity of disease. For example, if patients with less severe systemic injuries are able to undergo fixation of long-bone fractures at an earlier time than those with severe multisystem trauma, there may be factors other than timing of treatment that influence the lower rates of morbid complications among those treated early. This type of bias in studies of therapies has been termed confounding by indication or confounding by severity and is a major threat to the validity of conclusions drawn from observational studies.

Fig. 3 Diagram of confounding of relationship between exposure or treatment (E) and disease (D) by confounder (C). Confounders are, by definition, associated with treatment or exposure and have an association with disease or outcome. A confounder cannot be an effect of the exposure.

In randomized controlled trials, the randomization process will, on the average, evenly balance both the known and the unknown confounders, and this guarantees the validity of the statistical test used. The randomization process makes it possible to ascribe to the difference in outcome a probability distribution that is not influenced by differences in any prognostic factor other than the intervention under investigation [9]. The chi-square test for two-by-two tables and the Student t test for comparison of two means can be justified on the basis of randomization alone without making further assumptions concerning the distribution of baseline variables. In the absence of randomization, additional design and analysis methods are needed to account for sources of bias that arise from a lack of comparability between groups. Unfortunately, most analytical methods, several of which will be discussed in greater detail below, only account for known sources of bias and come with their own set of assumptions that can only be tested partially.

Analytical Techniques

Analytical methods for observational studies vary widely and are chosen according to the type of study that is being performed. Most case series require very basic descriptive statistics, such as probabilities or simple averages. Therapeutic and prognostic studies strive to give unconfounded estimates of association and therefore incorporate more elaborate techniques, each with relative strengths and weaknesses (Table I). Because therapeutic studies can be thought of as a special case of the prognostic study in which we are only interested in the effect associated with one risk factor (such as a specific treatment), the analytical methods used to control confounding are similar and will be presented together. It is important to remember that this distinction among the three observational study types is made to highlight key features and that elements of all three may be present in the same study.

Analysis of a Case Series

Case series, while occupying the lower rung in the hierarchy of evidence (Level IV), can provide extremely useful information to care providers and patients if they satisfy certain criteria. First, the target population must be definable and the study sample must be representative.
Second, the intervention must be reproducible so that a surgeon, with adequate training, can expect similar results if the procedure is faithfully replicated. Third, the outcomes that are measured should be clinically important. And finally, follow-up should be as complete as possible to limit loss of precision and to avoid selection bias. If these criteria are met, a simple descriptive statistic, such as risk (number of new cases per number at risk), rate (number of new events per unit of time), or mean (numerical average), along with confidence intervals (generated from a statistical model of a probability distribution such as the binomial for risk data or Poisson for rates) [10], can set an important benchmark for providers and be very helpful in providing information with regard to patient expectations.

A classic example of such a case series is the one reported by Letournel and Judet [11]. That well-defined series, which consisted of 940 operatively treated displaced acetabular fractures and a follow-up period of more than thirty-three years, has come to represent the so-called gold standard in the treatment of fractures of the acetabulum [11]. Letournel and Judet reported that, of the 567 hips that were operated on within twenty-one days, 73.7% were assessed as perfect reductions. Between three weeks and four months after injury, the probability of a perfect reduction among 150 hips decreased to 64.7%. Had 95% confidence intervals been reported, they would have looked something like 70.0% to 77.3% prior to three weeks and 56.5% to 72.3% after (both intervals calculated with use of the exact binomial method), and even more information about the potential uncertainty in making inferences about the larger theoretical population of all displaced acetabular fractures could have been conveyed. Although as many as 18% of patients were either lost to follow-up or had incomplete data, the results have been reproduced in case series reported by others [12-14], adding to a relatively coherent body of data regarding a relatively uncommon condition. As multivariable analysis becomes more familiar to orthopaedic investigators and readers, it is likely that case series (at least large ones) will increasingly both tell us about natural history and identify risk factors (such as age, fracture type, and length of follow-up) associated with outcome.

TABLE I Relative Strengths and Weaknesses of Methods of Analysis of Therapeutic and Prognostic Nonrandomized Studies

Matching
  Strengths: Simple. Efficient sampling method, especially in case-control studies.
  Weaknesses: Limits sample size. Unable to fully explore associations with matched factors. Potential for overmatching.

Stratification
  Strengths: Simple. Easy to see effect modification.
  Weaknesses: Difficult to interpret with multiple subgroups.

Multivariable adjustment
  Strengths: Efficient simultaneous adjustment for multiple confounders. Ability to easily assess effects of individual factors.
  Weaknesses: Quality of estimates subject to fit and assumptions of model.

Propensity scores
  Strengths: Ability to directly see confounding through distribution of the propensity score. Intuitive and simplified means of matching on single number. Rare outcomes. Confounding adjustment more robust to modeling assumptions.
  Weaknesses: Potential remains for bias from unknown confounding. Possible to miss effect modification.

Instrumental variables
  Strengths: Ability to get unconfounded estimates despite not having observed all possible confounders.
  Weaknesses: Cannot test all instrumental variable assumptions. Inference restricted only to subjects whose treatment is impacted by the instrumental variable.

Analysis of Therapeutic and Prognostic Studies

Confounding threatens to bias estimates of associations of risk factors or treatment with outcome.
The following set of methods includes matching, stratification, and multivariable regression and can be used to control for confounding for either type of study. Two more advanced multivariable methods that have been used increasingly in the analysis of therapeutic studies (propensity score analysis and instrumental variable analysis) are also discussed. Most of these methods rely on making the level of one or more factors constant in order to study the variability in outcome that is specifically associated with a change in the treatment or risk factor of interest. The interpretation of such analyses is therefore appropriately described as conditional, as an accurate interpretation is dependent on whether other known variables are held constant (with the exception of instrumental variable analysis). This is typically what is meant by statistical adjustment or controlling for confounding.

Matching

Matching is a conceptually straightforward strategy whereby confounders are identified and subjects in the treatment groups are matched on the basis of these factors so that, in the end, the treatment groups are the same with regard to these factors. Matching can either be done on a one-to-one basis (matched pairs) or on the basis of frequencies (that is, the confounder is present in an equal percentage of subjects in each group), and subjects can be matched with respect to a single confounder or multiple confounders. Matching can be used in both prospective and retrospective observational study designs (including case-control studies). For example, Ciminiello et al. [15] examined the impact of small incisions (<5 cm) on a variety of outcomes, including blood loss, operative time, and postoperative complications, in patients undergoing primary total hip arthroplasty. To ensure that the group of patients who received a small incision was as homogeneous as possible with the comparator group of patients who received a standard-size incision, the authors used a matched-pair cohort design.
They matched sixty patients in each group on a variety of potentially confounding factors, including age, sex, body mass index, American Society of Anesthesiologists score, diagnosis (osteoarthritis), prosthesis, type of fixation, anesthesia, surgical approach, and positioning, and were unable to identify any significant differences in outcome between the two techniques.
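
The matched-pair idea can be sketched in a few lines. The example below is a minimal illustration, not the procedure used by Ciminiello et al.: it assumes a greedy one-to-one match on just two factors (same sex, age within a tolerance), and the `Patient` record, the `match_pairs` function, the five-year tolerance, and the cohort are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    pid: str
    age: int
    sex: str
    treated: bool  # True = treatment of interest, False = comparator


def match_pairs(patients, age_tolerance=5):
    """Greedy 1:1 matching: same sex, age within +/- age_tolerance years.

    Treated patients with no eligible comparator are dropped, which is
    how matching reduces the analyzable sample.
    """
    treated = [p for p in patients if p.treated]
    controls = [p for p in patients if not p.treated]
    used = set()
    pairs = []
    for t in treated:
        candidates = [
            c for c in controls
            if c.pid not in used
            and c.sex == t.sex
            and abs(c.age - t.age) <= age_tolerance
        ]
        if not candidates:
            continue
        # Take the closest-aged comparator that is still unused.
        best = min(candidates, key=lambda c: abs(c.age - t.age))
        used.add(best.pid)
        pairs.append((t, best))
    return pairs


# Made-up cohort for illustration only.
cohort = [
    Patient("T1", 62, "F", True),
    Patient("T2", 45, "M", True),
    Patient("T3", 80, "F", True),
    Patient("C1", 60, "F", False),
    Patient("C2", 47, "M", False),
    Patient("C3", 66, "F", False),
    Patient("C4", 30, "M", False),
]
pairs = match_pairs(cohort)
for t, c in pairs:
    print(f"{t.pid} (age {t.age}, {t.sex}) matched to {c.pid} (age {c.age}, {c.sex})")
```

In this made-up cohort the eighty-year-old treated patient finds no comparator within the tolerance and is excluded, so only two of the three treated patients contribute to the matched analysis.
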

TABLE II Stratified Analysis of Nail Fixation Compared with Plate Fixation and the Effect on Development of Adult Respiratory Distress Syndrome or Multiple Organ Failure [17]

                                            Chest Injury              No Chest Injury           Total Cohort
                                            Nail        Plate         Nail        Plate         Nail        Plate
Developed adult respiratory distress
  syndrome or multiple organ failure        5           2             4           1             9           3
Number at risk                              117         104           118         114           235         218
Risk                                        0.043       0.019         0.033       0.009         0.038       0.014
Risk difference (95% confidence interval)   0.024 (-0.022, 0.069)*    0.025 (-0.012, 0.062)*    0.024 (-0.004, 0.054)
Summary risk difference (95% confidence interval): 0.024 (-0.005, 0.053)
P value = 0.11

*Risk differences between strata are not significantly different; that is, no interaction (test for heterogeneity; p value = 0.96). Given the absence of interaction, a pooled summary validly estimates the risk difference, adjusting for chest injury.
The p value tests the null hypothesis of no association between treatment method and outcome, adjusting for chest injury and assuming no interaction.

While matching is an effective way of balancing multiple confounders, it is also associated with several important limitations. One is that it may be difficult or impossible to find exact matches between the two groups of patients, and this difficulty increases rapidly as the number of factors to be matched increases. Matching may eliminate substantial numbers of subjects due to an inability to match all subjects, which results in a decreased sample size and power. One solution is to match patients within a reasonable range (for example, age ± five years), meaning that the range is such that differences of prognostic importance are not believed to exist. Another problem is that matching generally precludes the evaluation of the underlying relations between matching variable and exposure in a prospective cohort study and matching factor and outcomes in a case-control study.
This is because of the sampling schemes (based on exposure for the prospective cohort study and outcome for the case-control study) and the way that balance is forced with respect to the matching factor in each of these designs. Finally, if matching is undertaken on variables that are not true confounders, a loss of statistical power can result; moreover, in a case-control study, such overmatching can create a new bias 16. Therefore, matching should be used cautiously and only on factors that are strongly associated with the outcome of interest and believed to be differentially distributed between treated and untreated subjects (that is, only on true confounders).

Stratification

Stratification is related to matching and provides another means by which to control confounding. Potentially confounding variables are identified, and the cohort is grouped by levels of this factor. The analysis is then performed on each subgroup within which the factor remains constant, thereby removing the confounding potential of that factor. Bosse et al. 17 undertook a study to assess the impact of reamed intramedullary nailing compared with plate fixation of femoral shaft fractures on several adverse outcomes, including adult respiratory distress syndrome (ARDS) and multiple organ failure. Because the presence of chest injury could cause differences in surgical approach and impact the outcome of interest, it was considered a potential confounder and a stratified analysis was undertaken. Table II shows the crude or unadjusted analysis of the total cohort as well as the subgroup analysis stratified by the presence or absence of chest injury. The stratified analysis shows subgroup risk differences that, in this case, are not statistically different from one another (that is, no interaction); therefore, the adjusted summary risk difference 18 can be reported as unconfounded by chest injury.
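The arithmetic behind Table II can be sketched in a few lines of Python. The counts are those reported in the table; the Wald confidence intervals, the two-stratum z-test for heterogeneity, and the Mantel-Haenszel weighting of the risk difference are assumptions about how such a table is typically computed, not a statement of the exact procedure used by Bosse et al.

```python
import math

# Counts from Table II: (events, number at risk) for each fixation method.
strata = {
    "chest injury": {"nail": (5, 117), "plate": (2, 104)},
    "no chest injury": {"nail": (4, 118), "plate": (1, 114)},
}

def risk_difference(nail, plate, z=1.96):
    """Risk difference (nail minus plate), its variance, and a Wald 95% CI."""
    (a, n1), (b, n0) = nail, plate
    p1, p0 = a / n1, b / n0
    rd = p1 - p0
    var = p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0
    half_width = z * math.sqrt(var)
    return rd, var, (rd - half_width, rd + half_width)

results = {name: risk_difference(s["nail"], s["plate"]) for name, s in strata.items()}

# Heterogeneity (interaction) check for two strata: z-test on the
# difference between the stratum-specific risk differences.
(rd1, var1, _), (rd2, var2, _) = results["chest injury"], results["no chest injury"]
z_het = (rd1 - rd2) / math.sqrt(var1 + var2)
p_het = math.erfc(abs(z_het) / math.sqrt(2))  # two-sided normal p value

# Mantel-Haenszel summary risk difference, pooling across strata.
numerator = denominator = 0.0
for s in strata.values():
    (a, n1), (b, n0) = s["nail"], s["plate"]
    total = n1 + n0
    numerator += (a * n0 - b * n1) / total
    denominator += n1 * n0 / total
rd_mh = numerator / denominator

for name, (rd, _, (low, high)) in results.items():
    print(f"{name}: RD = {rd:.3f} (95% CI {low:.3f}, {high:.3f})")
print(f"heterogeneity p = {p_het:.2f}; Mantel-Haenszel RD = {rd_mh:.3f}")
```

Under these assumptions the stratum-specific estimates, the heterogeneity p value, and the pooled estimate agree with the values printed in Table II to rounding, which illustrates why the summary estimate may be reported only after the stratum estimates have been judged homogeneous.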
Just as stratification allows for control over a confounding factor, it also facilitates investigation into whether the effect of interest is constant across levels of the factor by which stratification is undertaken. If the estimates among stratified groups are homogeneous, they can be averaged into a summary estimate that is unconfounded by the stratification variable, as mentioned above. Conversely, significant differences in effect (interaction) preclude averaging of treatment effects. In the study by Bosse et al. 17, if there were a difference in the effect of surgical treatment on complications depending on whether or not patients had a chest injury, the stratum risk differences would be reported separately for those with and without chest injury. Stratification is a useful strategy when there are only one or two risk factors or confounders, but it quickly becomes unmanageable and difficult to interpret when there are multiple confounders with multiple levels each. Although testing for interaction and estimating summary measures of effect based on stratification are often avoided in favor of reporting multivariable statistics, these details still provide important information that is often hidden in the reporting of the more sophisticated analyses that will be discussed below.

Multivariable Regression

The use of regression for the adjustment of multiple confounding factors is one of the most commonly used analytical techniques in therapeutic and prognostic studies. Regression analysis is based on modeling the mathematical relationships between two or more variables that give an approximate description of the observed data. Regression models should not be thought of as explanations of underlying mechanisms (that is, statistical models are not reality), but rather as simplifications that are compatible with the data and that provide us with some inference as to associations found in the data. These models are usually additive in that an observed dependent variable (such as the outcome of interest) can be explained by a model in which the effects of different influences or independent variables (including treatment of interest and other predictors of outcome or confounding factors) are added. Most analyses are based on the general linear model:

$$E[Y] = A + B_1 X_1 + B_2 X_2 + \cdots + B_p X_p$$

where the expectation (or mean value) of Y is an additive combination of an intercept (A) and (p) explanatory independent variables multiplied by their respective coefficients ($B_1$ through $B_p$). Each coefficient represents an estimate of effect or risk depending on the type of general linear model (e.g., mean difference for linear regression and log odds ratio for logistic regression). Multivariable analysis allows the association between dependent and independent variables to be estimated, while controlling for the influence of other independent variables. The appropriate model depends on the type of data available, especially the type of outcome that is being assessed. Table III lists some commonly used models for typical outcome types.

TABLE III Appropriate Multivariable Adjustment Models for Common Types of Outcomes

Type of Outcome   Example                                                      Model                      Estimate of Effect
Binary            Prevalence of postoperative infection                        Logistic regression        Odds ratio
Continuous        Range of motion or functional outcome score (i.e., SF-36)    Linear regression          Mean difference
Time-to-event     Time to reoperation following total hip arthroplasty         Cox proportional hazards   Hazard ratio
Rate              National rates of total joint replacement                    Poisson regression         Rate ratio
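To make the additive form of the model concrete, the following minimal sketch evaluates the linear predictor for two hypothetical patients who differ only in treatment and converts the treatment coefficient into an odds ratio. The coefficient values, the variable names, and the choice of age as the second predictor are illustrative assumptions, not estimates from any study cited here. For logistic regression the additive combination is taken to apply on the log-odds scale, which is why exponentiating a coefficient yields an odds ratio.

```python
import math

def linear_predictor(intercept, coefficients, values):
    """Additive combination A + B_1*X_1 + ... + B_p*X_p."""
    return intercept + sum(b * x for b, x in zip(coefficients, values))

def inverse_logit(eta):
    """Convert a log-odds value into a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

# Hypothetical logistic model: X_1 = treatment (0/1), X_2 = age in years.
intercept = -3.0
coefficients = [math.log(2.0), 0.03]

# Two patients of the same age who differ only in treatment.
eta_treated = linear_predictor(intercept, coefficients, [1, 60])
eta_untreated = linear_predictor(intercept, coefficients, [0, 60])

# Holding age constant, the difference in log odds equals B_1,
# so exp(B_1) is the adjusted odds ratio for treatment.
odds_ratio = math.exp(eta_treated - eta_untreated)

p_treated = inverse_logit(eta_treated)
p_untreated = inverse_logit(eta_untreated)
print(f"odds ratio = {odds_ratio:.2f}")
print(f"predicted risk: treated {p_treated:.3f}, untreated {p_untreated:.3f}")
```

The same reading applies to the other rows of Table III: in a Cox or Poisson model the exponentiated coefficient is interpreted as a hazard ratio or rate ratio, whereas in linear regression the coefficient itself is the adjusted mean difference.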
For example, Saleh et al. 19 performed a case-control study to evaluate the predictors of surgical site infections (a binary outcome) complicating total knee and total hip replacement. These authors used multivariable logistic regression to control for several demographic, perioperative, and postoperative factors and found postoperative hematoma formation (p = 0.001) and persistent wound drainage (p = 0.01) to be the only significant associated risk factors. Ring et al. 20 used multivariable linear regression to study the influence of various predictors on functional and quality-of-life outcomes, such as the Disabilities of the Arm, Shoulder and Hand (DASH) score (a continuous outcome) after capsulectomy for posttraumatic elbow stiffness. After adjusting for other factors, such as range of motion, they found significant associations of pain score (p < 0.001) and persistent ulnar nerve dysfunction (p < 0.01) with the DASH.

Linear regression and logistic regression are two of the most commonly encountered strategies for multivariable adjustment. While fracture-healing or failure of implant fixation necessitating reoperation can be considered as binary outcomes (that is, yes or no) or continuous outcomes (that is, time until event happens), choosing to analyze those outcomes with one of the two aforementioned techniques can sacrifice information (element of time with logistic regression) or threaten basic assumptions of the model (time-to-event outcomes are notoriously skewed and thereby violate multivariable normality requirements of linear regression). Therefore, time-to-event outcomes ought to be analyzed with survival analysis techniques, and the most commonly used multivariable expansion of these methods is the Cox proportional hazards model 21. Bhandari et al. 22 used this approach in a prognostic study of multiple risk factors for reoperation following initial operative management of fractures of the tibial shaft.
After adjusting for over twenty possible variables, the researchers found that open fracture (p = 0.001), cortical continuity less than 50% (p < 0.001), and transverse fracture pattern (p < 0.001) predicted a relative increase in reoperation. The Cox model gives an estimate of effect that is analogous to the odds ratio from logistic regression (the hazard ratio) and can similarly be interpreted as a relative measure of risk of event associated with a unit change in a given predictor, holding other factors constant.

Although the results of such multivariable analyses are commonly presented, the details of how the models were selected are not. Readers may be led to assume that the results are accurate when they may have been derived with use of inappropriate models. Regression models assume, for example, that there is no effect modification or difference of effect between different levels of a confounder, as discussed earlier in the stratification example. Unless an interaction term (usually a product of two predictor variables) is inserted to represent effect modification, the model will not account for such a relationship, thus causing the researchers to arrive at the false conclusion that the effect of treatment on outcome is constant across all levels of another predictor, such as age or sex. Assumptions are made (such as the assumption of multivariable normality for linear regression or the assumption that the relative contribution of each factor is constant over time for the Cox model) when a model is fit, and it is important that these assumptions are verified and that the overall fit of the model to the data is assessed. Model fit is determined in terms of the amount of variability in the data as explained by the model and according to how well the model predicts individual outcomes for a given observation. There are many diagnostic procedures for assessing the most valid and best-fitting regression model. These procedures should be conducted by a statistician or experienced data analyst and are beyond the scope of this review but discussed elsewhere 23,24. A description of model fitting methods helps validate reported results.

Fig. 4 Assumptions of an appropriate instrumental variable: (1) instrumental variable must be associated with treatment; (2) instrumental variable must have no association with outcome, other than through its influence on treatment. *Confounders here represent both those observed and unobserved.

Propensity Score Analysis

Propensity score analysis 25 is an approach to controlling for confounding through the generation of a score that summarizes the confounding by multiple variables. This form of analysis is a two-stage approach in which, first, rather than modeling the outcome as a function of multiple risk factors, the probability of being treated is modeled, taking into account any possible confounding variables. This probability, usually generated by a logistic regression model, is the propensity score and ranges from 0 to 1. Once the propensity score is generated for each subject, it can be used to match them (usually within some narrow range), or perform stratified analysis on levels (such as deciles) of the propensity score, or it can be inserted into multivariable regression along with the treatment variable for use in estimating the outcome. While orthopaedic investigators have been slower to apply propensity score analysis than medical specialty (cardiology and cardiothoracic surgery, in particular) researchers have been 26, there are some examples. McHenry et al.
27 used propensity score analysis to control for confounding by indication for timing of treatment of surgically treated thoracic and lumbar spinal fractures in order to assess risk factors for respiratory failure following operative stabilization of these injuries (that is, this was a prognostic study in which propensity score methods were used to adjust for the strong likelihood of differential treatment-time assignment due to injury severity). Subjects were matched based on the propensity score for treatment within forty-eight hours after injury. Logistic regression was then performed on the matched set, identifying age, Injury Severity Score, Glasgow Coma Scale score, the presence of blunt injury to the chest, and time until surgery of longer than two days as independent risk factors for respiratory failure. By matching on the propensity score, these investigators were able to limit bias due to an important surgeon-controlled risk factor in assessing the relative importance of multiple prognostic factors.

By estimating the treatment mechanism, propensity score analysis offers several insights into the data and also offers theoretical advantages over conventional techniques of multivariable adjustment. First, propensity scores indicate the degree to which the likelihood of treatment differs between two groups and allow the investigator or reader to assess the ways in which the treatment groups are actually comparable (that is, the two groups should have fairly similar distributions of propensity scores to make the comparison tenable). Second, by matching or stratifying subjects on the basis of their likelihood of treatment, an understanding of how selection bias is countered becomes intuitive because comparisons are made only among those equally likely to have received treatment, as in a randomized controlled trial. Unfortunately, propensity score analysis is no more immune to threats caused by unknown and unmeasured confounders than the other methods already discussed 25,28.
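The second stage of a propensity score analysis, matching subjects within a narrow range of the score, can be sketched as follows. The propensity scores are hypothetical values assumed to have been estimated already (for example, by logistic regression), and the greedy nearest-neighbor rule and the caliper of 0.05 are illustrative choices rather than the procedure of any study cited here.

```python
def greedy_caliper_match(treated, controls, caliper=0.05):
    """Pair each treated subject with the nearest unused control.

    `treated` and `controls` map subject identifiers to propensity scores.
    A pair is formed only if the scores differ by no more than `caliper`;
    treated subjects without an acceptable control remain unmatched.
    """
    available = dict(controls)
    pairs = []
    for t_id, t_score in sorted(treated.items(), key=lambda item: item[1]):
        if not available:
            break
        c_id = min(available, key=lambda c: abs(available[c] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]  # match without replacement
    return pairs

# Hypothetical propensity scores (estimated probability of treatment).
treated = {"T1": 0.30, "T2": 0.55, "T3": 0.90}
controls = {"C1": 0.28, "C2": 0.52, "C3": 0.60, "C4": 0.10}

pairs = greedy_caliper_match(treated, controls)
unmatched = sorted(set(treated) - {t for t, _ in pairs})
print("matched pairs:", pairs)
print("unmatched treated subjects:", unmatched)
```

In this toy example the treated subject with a score of 0.90 has no control within the caliper and is dropped, which mirrors the point made above: groups whose propensity score distributions do not overlap cannot be compared tenably, and matching reduces the analyzable sample.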
Also, two recent systematic reviews have not shown significant differences in estimates from studies in which side-by-side conventional multivariable and propensity score analyses were performed 29,30. While the use of propensity score analysis is growing quickly among many fields of research in medicine, guidelines for proper use of these methods have lagged 31 and orthopaedic investigators should remain cognizant of emerging methodological work as they adopt this analytic approach.

Instrumental Variables

The instrumental variable approach to bias and confounding has been used frequently by economists for decades but has only recently been implemented in health research 32-34. Health economists, who typically work with administrative data in which many confounders are potentially missing, commonly use the instrumental variable approach to examine questions about the quality and distribution of care. The theoretical advantage of using instrumental variable methodology in the analysis of observational therapeutic studies is that it offers the possibility of controlling for both known and unknown confounders and is therefore appealing when the threat of unobserved or unobservable confounders looms large. The idea is as follows: if a variable (the instrumental variable) can be identified that has the ability to cause variation in the treatment of interest but that has no impact whatsoever on outcome (other than through its direct influence on treatment), then it will be possible to estimate the magnitude of that induced variation and its effect on outcome. Figure 4 provides a schematic diagram of this prerequisite relationship for the identification of a useful and valid instrument. Instrumental variables can be thought of as achieving pseudorandomization, and a randomized controlled trial is a special case in which the random number assignment (e.g., a fair coin toss) is the instrumental variable inducing variation in the treatment variable.

Examples of the use of instrumental variable analysis are similarly rare in the orthopaedic literature. McGuire et al. 35 used a large Medicare data set to examine the controversial topic of the impact of timing of fixation of hip fractures on mortality. Acknowledging the fact that there are likely to be factors beyond those which are measured in such a large administrative data set, these investigators chose day-of-the-week grouping (Saturday through Monday compared with Tuesday through Friday) as an instrumental variable by which to pseudorandomize the cohort. The authors cited evidence showing that day of the week is a strong predictor of delay to operative treatment of hip fracture and assumed that the day of the week that the hip is broken should have no independent influence on mortality or have an association with other confounders such as the presence of comorbidities.
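The logic of an instrumental variable estimate can be illustrated with the simplest case of a binary instrument, for which the effect of treatment is the difference in outcome between instrument groups divided by the difference in treatment between instrument groups (often called the Wald estimator). The proportions below are hypothetical and are not taken from McGuire et al.; the function name and the choice of this estimator are assumptions made for illustration, and the result is meaningful only if the two assumptions in Figure 4 hold.

```python
def wald_iv_estimate(outcome_z1, outcome_z0, treated_z1, treated_z0):
    """Ratio of the instrument's effect on outcome to its effect on treatment.

    outcome_z1, outcome_z0: mean outcome in each instrument group.
    treated_z1, treated_z0: proportion receiving treatment in each group.
    """
    treatment_shift = treated_z1 - treated_z0
    if abs(treatment_shift) < 1e-12:
        # Assumption 1 fails: the instrument does not move treatment.
        raise ValueError("instrument is not associated with treatment")
    return (outcome_z1 - outcome_z0) / treatment_shift

# Hypothetical proportions for an admission-day instrument.
delayed_group_a = 0.60    # delayed surgery, admission-day group A
delayed_group_b = 0.40    # delayed surgery, admission-day group B
mortality_group_a = 0.12  # mortality, admission-day group A
mortality_group_b = 0.10  # mortality, admission-day group B

effect = wald_iv_estimate(
    mortality_group_a, mortality_group_b, delayed_group_a, delayed_group_b
)
print(f"estimated risk difference attributable to delay: {effect:.2f}")
```

Because the denominator is the share of patients whose treatment was changed by the instrument, a weak association between instrument and treatment inflates the estimate and its uncertainty, and the estimate describes only those patients whose treatment the instrument actually influenced.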
The instrumental variable analysis showed an increased risk of mortality (risk difference, 15%; p = 0.047) among patients undergoing surgery more than two days after admission. It is likely that this methodology will grow in popularity among researchers who are trying to draw unconfounded estimates of effects of similar health-care decisions from large data sets for clinical practice and policy-making reasons.

While the thought that one can avoid the issue of unobserved (and therefore uncontrolled for) confounding in observational studies is very appealing, there are certain important limitations in the use of these methods to establish causality. First, identifying an instrumental variable that meets the assumptions of no association with outcome, independent of treatment, is difficult. Because this assumption is not directly testable, there must be general consensus that the instrumental variable is tenable. In comparing instrumental variables with standard multivariable adjustment or propensity score techniques, one is trading the assumption that was just mentioned for the assumption that there is no unmeasured confounding, which is also not directly testable. Another important consideration is that the effect that is measured only applies to those whose treatment was affected by the instrumental variable. In the study by McGuire et al. 35, a 15% increase in the risk of mortality associated with delay of surgery applies only to the patient whose treatment timing was influenced by the day of the week on which he or she was admitted. This so-called marginal patient 36 in a cohort study is important to distinguish from patients in the entire study sample, to whom any average treatment effect can be inferred in a randomized controlled trial.

Interpretation and Reporting of Results

The reporting or interpretation of results from observational studies must be tempered with the limitations implicit both in the data and in the methods applied to the analysis of those data.
Matching and stratification provide a means to limit confounding by another factor by holding its level constant in the analysis. Conventional multivariable adjustment offers the power to adjust for multiple confounders at the same time, advancing the pursuit of potential causal relationships. Still, multiple other criteria are required to establish causation 37. Multivariable adjustment cannot give causation unless factors such as appropriate temporal ordering of predictors and outcome are ensured and there are no unaccounted-for confounders missing from the analysis. While propensity score analysis offers a more plausible accounting for the multivariable nature of confounding and for the balancing of confounding by indication, causal interpretation is still limited by the same requirements. And, as just discussed, causal interpretation from an instrumental variable analysis is contingent on universal acceptance of the instrumental variable that is chosen. These limitations ought to be acknowledged in the reporting of results.

Other important limitations to the validity of observational (and randomized) studies include missing data and loss to follow-up or censoring. Missing data and censoring are a form of selection bias in that those with complete data or follow-up may differ systematically in their association with outcome from those without complete data or follow-up. In the most benign sense, data missing at random should only lessen the precision or power of a study. However, when this is not the case, substantial biasing of estimates may result and there is no completely valid solution to this dilemma. There are numerous methods that have been described to account for missing data, the most robust of which is multiple imputation 38. In dealing with the problem of patients lost to follow-up, sensitivity analysis (assigning all of those with incomplete follow-up to one or the other outcome) can at least put boundaries around the range of effect that may have been witnessed had complete follow-up been achieved.
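The sensitivity analysis described above amounts to recomputing the risk in each group under the two extreme assumptions about the patients who were lost. The following sketch uses hypothetical counts and function names chosen for illustration; it bounds the risk difference by pairing the most favorable scenario in one group with the least favorable in the other.

```python
def risk_bounds(events, followed, lost):
    """Best- and worst-case risk when `lost` patients have unknown outcomes."""
    total = followed + lost
    best = events / total            # no lost patient had the event
    worst = (events + lost) / total  # every lost patient had the event
    return best, worst

# Hypothetical counts: events observed among patients with complete follow-up.
treated_best, treated_worst = risk_bounds(events=10, followed=90, lost=10)
control_best, control_worst = risk_bounds(events=15, followed=95, lost=5)

# Complete-case estimate, ignoring those lost to follow-up.
rd_observed = 10 / 90 - 15 / 95

# Extreme scenarios for the risk difference (treated minus control).
rd_low = treated_best - control_worst
rd_high = treated_worst - control_best

print(f"complete-case risk difference: {rd_observed:.3f}")
print(f"bounds under extreme assumptions: {rd_low:.3f} to {rd_high:.3f}")
```

If the conclusion of a study holds at both ends of this range, loss to follow-up is unlikely to explain the finding; if the bounds span both benefit and harm, the result should be reported with that caveat.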
Finally, given the number of techniques described in this review, each with details that could not be thoroughly discussed here, it is vital for investigators to completely report on how the analyses were undertaken. From the choice of confounders to control for to the details of the statistical procedure used, enough information must be given that an independent analyst can reliably reproduce the reported results. While considerations such as model fit and model-assumption checking may not make it into a final paper, many journals will use this information initially in judging the quality of the manuscript and may then make the information available online or in an appendix.

Summary

Observational studies will continue to provide an important method for clinical investigation in orthopaedic surgery in settings in which randomized controlled trials are not feasible and when increased generalizability of findings is desired. Case series will continue to provide important information on natural history and the prevalence of certain diseases or outcomes and will serve to generate hypotheses for future research. Furthermore, their analysis with descriptive statistics is relatively simple. Prognostic studies will help investigators further understand the risk factors that are associated with certain outcomes, and therapeutic studies, especially when prospectively conducted, will provide the next-best level of evidence to randomized controlled trials in informing treatment decisions. Confounding bias represents a major obstacle to drawing valid conclusions from such studies, and current analytic approaches have been reviewed here for orthopaedic investigators and readers of the literature. An understanding of the respective strengths and weaknesses of the various analytic approaches is necessary for proper application and interpretation. While each approach has its limitations and assumptions, any one of these approaches can be used as a powerful tool in understanding observational data in clinical research.

Saam Morshed, MD, MPH
Department of Orthopaedic Surgery, University of California San Francisco, Orthopaedic Trauma Institute at San Francisco General Hospital, 1001 Potrero Avenue, Room 3A-36, San Francisco, CA 94110. E-mail address: morsheds@orthosurg.ucsf.edu

Paul Tornetta III, MD
Department of Orthopaedic Surgery, Boston University Medical Center, 850 Harrison Avenue, D2N, Boston, MA 02118

Mohit Bhandari, MD, MSc, FRCSC
Division of Orthopaedic Surgery, Department of Surgery, McMaster University, 293 Wellington Street North, Suite 110, Hamilton, ON L8L 2X2, Canada

References

1. Moore DS, McCabe GP. Introduction to the practice of statistics. 4th ed. New York: W.H. Freeman; 2003.
2. Altman DG. Practical statistics for medical research. London: Chapman and Hall/CRC; 1991.
3. Lehmann EL. Nonparametrics: statistical methods based on ranks. New York: McGraw-Hill; 1975.
4. Altman DG, Gore SM, Gardner MJ, Pocock SJ. Statistical guidelines for contributors to medical journals. In: Gardner MJ, Altman DG, editors. Statistics with confidence. London: British Medical Journal; 1989. p 83-100.
5. Rothman KJ, Greenland S, editors. Modern epidemiology. 2nd ed. Philadelphia: Lippincott Williams and Wilkins; 1998.
6. Hurwitz SR, Tornetta P 3rd, Wright JG. An AOA critical issue; how to read the literature to change your practice: an evidence-based medicine approach. J Bone Joint Surg Am. 2006;88:1873-9.
7. Bender R, Lange S. Adjusting for multiple testing - when and how? J Clin Epidemiol. 2001;54:343-9.
8. Sackett DL. Bias in analytic research. J Chronic Dis. 1979;32:51-63.
9. Byar DP, Simon RM, Friedewald WT, Schlesselman JJ, DeMets DL, Ellenberg JH, Gail MH, Ware JH. Randomized clinical trials. Perspectives on some recent ideas. N Engl J Med. 1976;295:74-80.
10. Selvin S. Practical biostatistical methods. Belmont: Duxbury Press; 1995.
11. Letournel E, Judet R. Fractures of the acetabulum. 2nd ed. New York: Springer; 1993.
12. Moed BR, Carr SE, Watson JT. Open reduction and internal fixation of posterior wall fractures of the acetabulum. Clin Orthop Relat Res. 2000;377:57-67.
13. Matta JM. Fractures of the acetabulum: accuracy of reduction and clinical results in patients managed operatively within three weeks after the injury. J Bone Joint Surg Am. 1996;78:1632-45.
14. Liebergall M, Mosheiff R, Low J, Goldvirt M, Matan Y, Segal D. Acetabular fractures. Clinical outcome of surgical treatment. Clin Orthop Relat Res. 1999;366:205-16.
15. Ciminiello M, Parvizi J, Sharkey PF, Eslampour A, Rothman RH. Total hip arthroplasty: is small incision better? J Arthroplasty. 2006;21:484-8.
16. Day NE, Byar DP, Green SB. Overadjustment in case-control studies. Am J Epidemiol. 1980;112:696-706.
17. Bosse MJ, MacKenzie EJ, Riemer BL, Brumback RJ, McCarthy ML, Burgess AR, Gens DR, Yasui Y. Adult respiratory distress syndrome, pneumonia, and mortality following thoracic injury and a femoral fracture treated either with intramedullary nailing with reaming or with a plate. A comparative study. J Bone Joint Surg Am. 1997;79:799-809.
18. Mantel N, Haenszel W. Statistical aspects of the analysis of data from retrospective studies of disease. J Natl Cancer Inst. 1959;22:719-48.
19. Saleh K, Olson M, Resig S, Bershadsky B, Kuskowski M, Gioe T, Robinson H, Schmidt R, McElfresh E. Predictors of wound infection in hip and knee joint replacement: results from a 20 year surveillance program. J Orthop Res. 2002;20:506-15.
20. Ring D, Adey L, Zurakowski D, Jupiter JB. Elbow capsulectomy for posttraumatic elbow stiffness. J Hand Surg [Am]. 2006;31:1264-71.
21. Kleinbaum DG. Survival analysis: a self-learning text. Berlin: Springer; 1996.
22. Bhandari M, Tornetta P 3rd, Sprague S, Najibi S, Petrisor B, Griffith L, Guyatt GH. Predictors of reoperation following operative management of fractures of the tibial shaft. J Orthop Trauma. 2003;17:353-61.
23. Hosmer DW, Lemeshow S. Applied logistic regression. New York: John Wiley and Sons; 1989.
24. Kleinbaum DG, Kupper LL, Muller KE, Nizam A. Applied regression analysis and multivariable methods. 3rd ed. Belmont: Duxbury Press; 1998.
25. Rosenbaum PR, Rubin DB. The central role of the propensity score in observational studies for causal effects. Biometrika. 1983;70:41-55.
26. Stürmer T, Schneeweiss S, Rothman KJ, Avorn J, Glynn RJ. Performance of propensity score calibration - a simulation study. Am J Epidemiol. 2007;165:1110-8.
27. McHenry TP, Mirza SK, Wang J, Wade CE, O'Keefe GE, Dailey AT, Schreiber MA, Chapman JR. Risk factors for respiratory failure following operative stabilization of thoracic and lumbar spine fractures. J Bone Joint Surg Am. 2006;88:997-1005.
28. Rosenbaum PR, Rubin DB. Reducing bias in observational studies using subclassification on the propensity score. J Am Stat Assoc. 1984;79:516-24.
29. Stürmer T, Joshi M, Glynn RJ, Avorn J, Rothman KJ, Schneeweiss S. A review of the application of propensity score methods yielded increasing use, advantages in specific settings, but not substantially different estimates compared with conventional multivariable methods. J Clin Epidemiol. 2006;59:437-47.
30. Shah BR, Laupacis A, Hux JE, Austin PC. Propensity score methods gave similar results to traditional regression modeling in observational studies: a systematic review. J Clin Epidemiol. 2005;58:550-9.