COST EFFECTIVENESS STATISTIC: A PROPOSAL TO TAKE INTO ACCOUNT THE PATIENT STRATIFICATION FACTORS. C. D'Urso, IEEE senior member

Similar documents
Evaluation on liver fibrosis stages using the k-means clustering algorithm

Knowledge Discovery and Data Mining I

Outlier Analysis. Lijun Zhang

PO Box 19015, Arlington, TX {ramirez, 5323 Harry Hines Boulevard, Dallas, TX

Using Association Rule Mining to Discover Temporal Relations of Daily Activities

Comparative Study of K-means, Gaussian Mixture Model, Fuzzy C-means algorithms for Brain Tumor Segmentation

Visualizing Temporal Patterns by Clustering Patients

Practical propensity score matching: a reply to Smith and Todd

CHAPTER 3 METHOD AND PROCEDURE

Identification of Tissue Independent Cancer Driver Genes

Appendix. Lifetime extrapolation of data from the randomised controlled DiGEM trial

Vocabulary. Bias. Blinding. Block. Cluster sample

Biostatistics for Med Students. Lecture 1

Chapter 1. Introduction

Type II Fuzzy Possibilistic C-Mean Clustering

Active Sites model for the B-Matrix Approach

Sum of Neurally Distinct Stimulus- and Task-Related Components.

Unit 1 Exploring and Understanding Data

2/20/2012. New Technology #1. The Horizon of New Health Technologies. Introduction to Economic Evaluation

Introduction & Basics

extraction can take place. Another problem is that the treatment for chronic diseases is sequential based upon the progression of the disease.

Chapter 11: Advanced Remedial Measures. Weighted Least Squares (WLS)

Developing and Testing Hypotheses Kuba Glazek, Ph.D. Methodology Expert National Center for Academic and Dissertation Excellence Los Angeles

Predicting Heart Attack using Fuzzy C Means Clustering Algorithm

Data Mining. Outlier detection. Hamid Beigy. Sharif University of Technology. Fall 1395

Study Designs. Lecture 3. Kevin Frick

COMPUTER AIDED DIAGNOSTIC SYSTEM FOR BRAIN TUMOR DETECTION USING K-MEANS CLUSTERING

T. R. Golub, D. K. Slonim & Others 1999

A NOVEL VARIABLE SELECTION METHOD BASED ON FREQUENT PATTERN TREE FOR REAL-TIME TRAFFIC ACCIDENT RISK PREDICTION

Analyzing Human Negotiation using Automated Cognitive Behavior Analysis: The Effect of Personality. Pedro Sequeira & Stacy Marsella

MODEL-BASED CLUSTERING IN GENE EXPRESSION MICROARRAYS: AN APPLICATION TO BREAST CANCER DATA

Jeffrey S. Hoch studies claims of cost-effectiveness were invalidated by methodological problems; however, it is unlikely that any treatment will be c

A Scoring Policy for Simulated Soccer Agents Using Reinforcement Learning

IDENTIFICATION OF OUTLIERS: A SIMULATION STUDY

WDHS Curriculum Map Probability and Statistics. What is Statistics and how does it relate to you?

Data Mining in Bioinformatics Day 7: Clustering in Bioinformatics

Using Bayesian Networks to Analyze Expression Data. Xu Siwei, s Muhammad Ali Faisal, s Tejal Joshi, s

Monte Carlo Analysis of Univariate Statistical Outlier Techniques Mark W. Lukens

Clopidogrel and modified-release dipyridamole for the prevention of occlusive vascular events (review of Technology Appraisal No.

Statistical Techniques. Masoud Mansoury and Anas Abulfaraj

Hypothesis-Driven Research

Natural Scene Statistics and Perception. W.S. Geisler

Clinical Epidemiology for the uninitiated

Motivation: Fraud Detection

Measuring Focused Attention Using Fixation Inner-Density

6. Unusual and Influential Data

Sampling Uncertainty / Sample Size for Cost-Effectiveness Analysis

Automated Assessment of Diabetic Retinal Image Quality Based on Blood Vessel Detection

List of Figures. List of Tables. Preface to the Second Edition. Preface to the First Edition

Empirical function attribute construction in classification learning

Colon cancer subtypes from gene expression data

Containment Policies for Transmissible Diseases

IJRE Vol. 03 No. 04 April 2016

Applications. DSC 410/510 Multivariate Statistical Methods. Discriminating Two Groups. What is Discriminant Analysis

An Empirical Study on Causal Relationships between Perceived Enjoyment and Perceived Ease of Use

Resonating memory traces account for the perceptual magnet effect

IN SPITE of a very quick development of medicine within

The development of a modified spectral ripple test

[1] provides a philosophical introduction to the subject. Simon [21] discusses numerous topics in economics; see [2] for a broad economic survey.

Understandable Statistics

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

Combining machine learning and matching techniques to improve causal inference in program evaluation

Confidence Intervals. Chapter 10

UN Handbook Ch. 7 'Managing sources of non-sampling error': recommendations on response rates

Exploring uncertainty in cost effectiveness analysis. Francis Ruiz NICE International (acknowledgements to: Benjarin Santatiwongchai of HITAP)

Evaluating health management programmes over time: application of propensity score-based weighting to longitudinal datajep_

Predicting Breast Cancer Survival Using Treatment and Patient Factors

Recognizing Scenes by Simulating Implied Social Interaction Networks

How to describe bivariate data

Lecture Outline. Biost 517 Applied Biostatistics I. Purpose of Descriptive Statistics. Purpose of Descriptive Statistics

Positive and Unlabeled Relational Classification through Label Frequency Estimation

Supplement 2. Use of Directed Acyclic Graphs (DAGs)

Study population The study population comprised patients receiving ibutilide for acute chemical conversion of AF or flutter.

Modern Regression Methods

International Journal of Computer Science Trends and Technology (IJCST) Volume 5 Issue 1, Jan Feb 2017

Dr. Allen Back. Sep. 30, 2016

Biostatistics II

Complier Average Causal Effect (CACE)

Comparative study of Naïve Bayes Classifier and KNN for Tuberculosis

AQCHANALYTICAL TUTORIAL ARTICLE. Classification in Karyometry HISTOPATHOLOGY. Performance Testing and Prediction Error

UNSUPERVISED PROPENSITY SCORING: NN & IV PLOTS

environment because of dynamically changing topology and lack of accurate and timely information,

DEVELOPING AN EMERGENCY ROOM DIAGNOSTIC CHECK LIST USING ROUGH SETS: A CASE STUDY OF APPENDICITIS

Introspection-based Periodicity Awareness Model for Intermittently Connected Mobile Networks

SRI LANKA AUDITING STANDARD 530 AUDIT SAMPLING CONTENTS

Data Mining in Bioinformatics Day 4: Text Mining

Preparing More Effective Liquid State Machines Using Hebbian Learning

Review Statistics review 2: Samples and populations Elise Whitley* and Jonathan Ball

Appendix III Individual-level analysis

Case study examining the impact of German reunification on life expectancy

Adverse Outcomes After Hospitalization and Delirium in Persons With Alzheimer Disease

CHAPTER 3 DATA ANALYSIS: DESCRIBING DATA

Bayesian Multiparameter Evidence Synthesis to Inform Decision Making: A Case Study in Metastatic Hormone-Refractory Prostate Cancer

Technology appraisal guidance Published: 28 November 2012 nice.org.uk/guidance/ta267

Information-Theoretic Outlier Detection For Large_Scale Categorical Data

INTERNATIONAL STANDARD ON AUDITING (NEW ZEALAND) 530

What you should know before you collect data. BAE 815 (Fall 2017) Dr. Zifei Liu

Grounding Ontologies in the External World

Research Methods 1 Handouts, Graham Hole,COGS - version 1.0, September 2000: Page 1:

Programs for facies modeling with DMPE

Transcription:

COST EFFECTIVENESS STATISTIC: A PROPOSAL TO TAKE INTO ACCOUNT THE PATIENT STRATIFICATION FACTORS ABSTRACT C. D'Urso, IEEE senior member The solution here proposed can be used to conduct economic analysis in randomized clinical trials. It is based on a statistical approach and aims at calculating a revised version of the incremental costeffective ratio (ICER) in order to take into account the key factors that can influence the choice of therapy causing confounding by indication. Let us take as an example a new therapy to treat cancer being compared to an existing therapy with effectiveness taken as time to death. A challenging problem is that the ICER is defined in terms of means over the entire treatment groups. It makes no provision for stratification by groups of patients with differing risk of death. For example, for a fair and unbiased analysis, one would desire to compare time to death in groups with similar life expectancy which would be impacted by factors such as age, gender, disease severity, etc. The method we decided to apply is borrowed by cluster analysis and aims at (i) discard any outliers in the set under analysis that may arise, (ii) identify groups (i.e. clusters) of patients with "similar" key factors. INTRODUCTION The value of a new drug or therapy is often evaluated in terms of Incremental Cost Effectiveness Ratio (ICER) [1]. It is an equation used commonly in health economics to provide a practical approach to decision making regarding health interventions, in fact within a trial of two interventions it is the measure primarily used to compare the cost-effectiveness of the experimental treatment relative to the control treatment. The equation for ICER is: =, where and are the mean costs, and and are the mean effects for the experimental and control treatments, respectively. In order to take into account characteristic key factors of the therapy (e.g. age, LVEF and NYHA in a ICD therapy [2]), we consider the generic set P of n patients each described by m variables (key factors) as a data table P(n,m) with n rows and m columns. Our goal can be stated as follows: Goal. Identify k subsets (called ) of the set P containing elements with, such that each of them contains patients with similar key factors, and such that: + =, with the numbers of outliers discarded by the method. The calculation of the index on the elements of each group of patients must lead us to construct the overall incremental cost effectiveness ratio (ICER). Although we could assume some statistical properties of the distributions involved, in the following we will work under the following hypothesis: Hypotesis. No assumption about prior distribution or other parametric choices will be considered.

RELATED WORK Methods for describing the distributional properties of ICER statistics have been reported in the past, however, they are challenging even before the notion of stratification is added. In general, the distribution of the ratio of two statistics (in this case, the ratio in the difference of two means) is complicated. Interesting approaches such as Fieller s theorem have been suggested. More successful have been approaches base on the bootstrap [3]. However the work of Abadie and Imbens [4] suggests more caution should be exercised before approaches based on patient matching and stratification are embarked upon. RATIONALE OF THE METHOD An outlier is an observation that, as atypical or erroneous, deviates significantly from the behavior of other data, with reference to the type of analysis considered [5, 6, 7]. In our problem there could be among the patients a subset with some characteristics (i.e. key factors) very dissimilar to the others that could negatively affect the economic analysis in randomized clinical trials. The method proposed exploits the theories of cluster analysis, and as a first step it identifies those points of the set under analysis that have "distance" from the centroid (in terms of standard deviations) more than a specific threshold away. As an example to make clear the idea, in figure 1 is depicted the Mahalanobis distance from the centroid of each point in a certain set of data. The potential outliers are those more distant from the centroid. The Mahalanobis distance is one of the possible distance that can be considered, it has the important characteristic that it takes into account the context of the data that is the relation between data. The rational is the same for the more common euclidean distance as well. Figure 1 - Distance from each point in a set to the centroid (the spikes are potential errors in the set)

The cluster analysis applied to the set of patients data identifies a certain number (say k) ) of groups. In each of these group it will be calculated the incremental cost effective ratio. At this point the analysis applied may provide a situation similar to that depicted in the Figure 2. The clusters obtained in this way are composed intrinsically of a homogeneous set of individuals and containn both patients belonging to the experimental treatmentt group and patients belonging to the control treatment group. Thus for each of these clusters it is possible to calculate the index ICER, and then calculate the overall ICER in accordance with the formula provided in the following (see Step 3) ). Figure 2 - Representation of a typical scenario after the application of cluster analysis to the data of the patients involving in clinical trials. SOLUTION Here it is proposed a method based on cluster analysis. In particular we try to automate the calculation by means of classification of patients in a set of clusterss in order to achieve the goal. The method is composed of three sequential steps described in the following (the provided in Figure 3): diagram is

STEP ONE (cluster analysis) First of all the data related to the patients are stored in an array P of n rows (total number of the patients involved in the clinical trial) and m columns (number of key factors). We assume that from a particular attribute of each patient (e.g. the identification code) we can distinguish if he belongs to the experimental treatment group or to the control treatment group. The algorithm called DBSCAN is applied to the objects in P. It is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander and Xiaowei Xu in 1996 [8]. It is a density-based clustering algorithm because it finds a number of clusters starting from the estimated density distribution of corresponding nodes. DBSCAN is one of the most common clustering algorithms and also most cited in scientific literature (see appendix for the code provided in the Octave software environment [9]). After the application of the algorithm, the objects (i.e. the rows of the array) are classified in k clusters plus a set of outlier data (cluster labeled '-1') that represents the patients with characteristics too different from the others as to suggest that they have been wrongly chosen for the trial (see Figure 2). STEP TWO (calculation of each ICER j ) Once we got the k groups of patients we can calculate k indexes ICERj distinguishing between the individuals belonging to the experimental treatment group and those belonging to the control treatment group, as follows: where: k are the clusters identified in step 1 =,=1.. C and C are the mean costs, and E and E are the mean effects for the experimental and control treatments, respectively. STEP THREE (calculation of the overall ICER) The final step consists in the calculation of the overall incremental cost effectiveness ratio based on a weighted average of contributions from the k ratios obtained in step two. where: = = h + h = + +, number of patients in the cluster j

number of patients discarded as potential outliers = +, number of patients involved in the clinical trial In the following diagram is represented the logical flow of the method. Data about the n patients (key factors) Cluster Analysis Potential outliers discarded Patients divided in k groups Calculation of each ICER j Calculationofthe overall ICER Figure 3 - Logical flow of the proposed solution. CONCLUSION The method proposed takes into account the effect of strata in a population of patients. It uses the key factors associated to each patients as inner properties in order to group them and apply the computation separately for each homogenous group. No assumption about prior distribution or other parametric choices has been considered. The descriptive parameters of the new overall ICER can be written as a function of the corresponding descriptive measures of each index ICER : = = Therefore the variability of overall ratio can be expressed in terms of variability of the k indexes ICER (see reference [10] for a comparison of four methods of confidence intervals computation for the not stratified version of ICER). The action of disregarding anomalous data (see step 1 of the process and the notion of outliers) introduces a particular robustness in the proposed method. First of all we observe that clustering the data as the first step of the method allows analysis of homogeneous group of patients minimizing the overall effect of the observational nature of the data (i.e. non-random). We can state that such an algorithm is able to create subsets within a patient population that can provide more detailed

information about how the patient will respond to a given drug. Furthermore missing data or anomalous data can be distinguished and labeled as outliers in the first step of the process, and besides if there are some patients with censored data it is very likely that they are grouped in the same cluster. REFERENCES [1] Drummond MF. Principles of economics appraisal in health care. Oxford University Press, 1980. [2] Sheldon R, et alt.. Effect of clinic risk stratification on costeffectiveness of the implantable cardioverter-defibrillator: the canadian implantable defibrillator study. Circulation, 2001, 104:1622-1626. [3] Campbell MK and Torgenson DJ. Bootstrapping: estimating confidence intervals for cost effectiveness ratios. Q J Med,1999;92:177 182. [4] Abadie A and Imbens GW. On the Failure of the Bootstrap for Matching Estimators. Econometrica, 2008, Vol. 76, No. 6, pp. 1537-1557. [5] P. Davies and U. Gather, "The identification of multiple outliers," In Journal of the American Statistical Association, 88, 1993, p. 782-801. [6] John Dunagan, Santosh Vempala: "Optimal outlier removal in highdimensional spaces". J. Comput. Syst. Sci. 68(2): 335-373 (2004). [7] M. Riani, S. Zani, "An iterative method for the detection of multivariate outliers", Metron, Vol.55, pp. 101-117 (1997). [8] Ester M, Kriegel HP, Sander J, Xiaowei Xu. A density-based algorithm for discovering clusters in large spatial databases with noise. In Evangelos Simoudis, Jiawei Han, Usama M. Fayyad. Proceedings of the Second International Conference on Knowledge Discovery and Data Mining 1996. AAAI Press. pp. 226 231. ISBN 1-57735-004-9. [9] http://www.gnu.org/software/octave/. [10] D. Polsky, H.A. Glick, et al., "Confidence intervals for costeffectiveness ratios: a comparison of four methods", Health Economics, Vol.6, pp 243-252 (1997).