Visualizing Data for Hypothesis Generation Using Large-Volume Health care Claims Data

Eberechukwu Onukwugha PhD, School of Pharmacy, UMB
Margret Bjarnadottir PhD, Smith School of Business, UMCP
Shujia Zhou PhD, Computer Science, UMBC

Acknowledgement
Catherine Plaisant PhD, Human Computer Interaction Lab (HCIL), UMCP
Sana Malik, Computer Science, UMCP
Ran Qi, UMBC
Jinani Jayasekera, UMB

A picture is worth
What hypotheses would you want to test?

Motivation
- Few tools for hypothesis generation and data-driven insight
- Limited guidance on how to generate insight
- Identify available tools
- Two case studies

Available tools
- Proportional symbols
- Treemap
- Choropleth mapping

Proportional symbols
Source: http://canceratlas.cancer.org/risk-factors/ [Accessed 5/6/2016]

Treemap
Source: http://vizhub.healthdata.org/gbd-compare/ [Accessed 5/16/2016]

Choropleth Map
Source: http://vizhub.healthdata.org/gbd-compare/ [Accessed 5/6/2016]

Ideal tool
- Suited to large-volume health utilization data
- High-level abstractions and individual detail
- Integrated into population health analyses

Cost Prediction Using a Survival Grouping Algorithm: An Application to Incident Prostate Cancer Cases
E Onukwugha, R Qi, J Jayasekera, S Zhou. PharmacoEconomics 2016 Feb;34(2):207-16.

Outline
- Motivation
- Grouping Algorithm
- Interpretation
- Hypothesis generation

Objective
To illustrate how a grouping algorithm can be used to generate hypotheses regarding cost accumulation

Prognostic systems
- Prognostic systems are utilized in clinical practice to predict survival and outcomes
- Example: TNM classification system for classifying primary tumors
- Patients within a TNM class would have similar disease progression and survival
[Diagram labels: Patient demographics; Clinical predictors; Cancer histology; Age at diagnosis; Comorbid conditions; Survival curves; Average survival pattern]

Cost Curves
- We can apply this approach to build cost curves across patient groups
- Identify cost predictors over time

Prognostic Systems
- Identify groups of patients who have similar clinical prognostic factors
- TNM classification scheme is a bin model
  - Mutually exclusive bins
  - Exhaustive partitioning of patients
  - Bins are grouped into stages
- Use the mean survival of patients in a bin to predict the survival of a new patient placed in that bin

Prognostic Systems
- TNM puts constraints on the number of prognostic factors
- The Ensemble Algorithm for Clustering Cancer Data (EACCD) [1] admits more prognostic factors
  - Increased prediction accuracy
  - Long computational time
- Increasing the number of prognostic factors increases the number of bins dramatically
[1] Chen D, Xing K, Henson D, Sheng L, Schwartz AM, Cheng X. Developing prognostic systems of cancer patients by ensemble clustering. BioMed Research International. 2009.

Prognostic Systems and Cost Curves
- Grouping Algorithm for Cancer Data (GACD) [1] uses a clustering algorithm
  - Reduces computational time
  - Increases clustering accuracy
[1] Qi R, Zhou S, editors. Simulated Annealing Partitioning: An Algorithm for Optimizing Grouping in Cancer Data. Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on; 2013: IEEE.

Overview of SEER-Medicare
SEER
- Cancer registry data
- Cause of death
- Area characteristics
- Cases from 2000 to 2007
Medicare
- Health care claims
- Treatment dates
- Cost data
- Claims from 1999 to 2009
SEER-Medicare

Methods: Datasets
Application of the GACD required two sets of data
1. Survival data
- Survival time
- Clinical variables
- Demographics
- Indicator for censoring

Methods: Datasets (cont.)
2. Cost data
- Healthcare costs reimbursed by Medicare
  - Physician
  - Skilled nursing facility
  - Outpatient care
  - Home health care
  - Hospice care
  - Durable medical equipment
- Costs in 2009 US dollars using the Consumer Price Index (CPI)

Methods: Prognostic Factors
Prognostic factors and cost drivers
- Identified from literature reviews
  - Cancer stage
  - Urban residence
  - Age
  - Poor performance status (proxy indicator)
  - Race

Methods: Data Processing
Patients with similar profiles were grouped into natural clusters

Data Grouping Example (unformatted data = formatted data)
Variable: Stage
- Stage 0 = 0
- Stage I = 1
- Stage II = 2
- Stage III = 3
- Stage IV = 4
- Unknown stage = 5
Variable: Urban/rural location at the time of diagnosis
- Rural = 0
- Urban = 1
Variable: Age at the time of diagnosis
- 65-69 = 1
- 70-74 = 2
- 75-79 = 3
- 80-84 = 4
- 85+ = 5

Data Grouping Example (cont.) (unformatted data = formatted data)
Variable: Performance status proxy measured in the 12 months prior to prostate cancer diagnosis
- No claims for use of walking aids, wheelchairs, oxygen and related supplies, skilled nursing facility or hospitalizations 12 months prior to prostate cancer diagnosis = 0
- At least one claim for the above = 1
Variable: Race
- Non-Hispanic White = 1
- Non-Hispanic African American = 2
- Hispanic = 3
- Other = 4
Variable: Grade
- Well differentiated, moderately differentiated, unknown = 0
- Poorly differentiated or undifferentiated = 1

Data Grouping Example (cont.) (factor: meaning, level)
- Stage: Prostate cancer stage II, level 2
- Age: 75-79, level 3
- Race: White, level 1
- Performance status proxy measured in the 12 months prior to prostate cancer diagnosis: No relevant indicator, level 0
- Urban: Urban, level 1
Combination: 23101 (natural cluster)
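The coding scheme in the data grouping example can be sketched in a few lines of Python. This is only an illustration, assuming the digit order Stage, Age, Race, Performance status proxy, Urban that the worked example (23101) suggests; the function and variable names are hypothetical and not taken from the authors' software.

```python
# Level codes transcribed from the data grouping example on the slides.
STAGE = {"0": 0, "I": 1, "II": 2, "III": 3, "IV": 4, "Unknown": 5}
AGE_BANDS = [(65, 69, 1), (70, 74, 2), (75, 79, 3), (80, 84, 4), (85, 200, 5)]
RACE = {
    "Non-Hispanic White": 1,
    "Non-Hispanic African American": 2,
    "Hispanic": 3,
    "Other": 4,
}
URBAN = {"Rural": 0, "Urban": 1}


def age_code(age):
    """Map age at diagnosis to its band code (65-69 -> 1, ..., 85+ -> 5)."""
    for low, high, code in AGE_BANDS:
        if low <= age <= high:
            return code
    raise ValueError("age at diagnosis must be 65 or older")


def combination(stage, age, race, has_proxy_claim, location):
    """Concatenate the level codes into one combination string (assumed order)."""
    digits = [
        STAGE[stage],
        age_code(age),
        RACE[race],
        int(has_proxy_claim),
        URBAN[location],
    ]
    return "".join(str(d) for d in digits)


example = combination("II", 77, "Non-Hispanic White", False, "Urban")
print(example)  # 23101

# Number of distinct five-digit combinations these level sets allow.
n_possible = len(STAGE) * len(AGE_BANDS) * len(RACE) * 2 * len(URBAN)
print(n_possible)  # 480
```

Even with only five factors the scheme allows 480 distinct combinations, which illustrates how quickly the number of bins grows as factors are added.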

Methods: Grouping by Survival Similarity
- GACD applied to the formatted data (e.g. 23101)
- Patients grouped according to their survival similarities
- Generated cost curves for the resulting groups

Methods: Grouping Algorithm
Step 1: Extract the combinations from all possible combinations, removing the combinations 100

Methods: Grouping Algorithm (cont.)
Step 2: Use the log-rank test to initialize the dissimilarity between combinations
- Apply a sequence of non-randomized clustering procedures (e.g., the PAM algorithm)
- Redefine the dissimilarity between combinations by assigning weights to clustering results

Methods: Grouping Algorithm (cont.)
Step 3: Perform agglomerative hierarchical clustering to obtain the affinity to the survival curves as well as groups of patients
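Step 2 initializes the dissimilarity between two combinations with the log-rank test. The sketch below computes the standard two-sample log-rank chi-square statistic from survival times and event indicators. It is a plain textbook version written for illustration, not the GACD implementation, and the function name and toy data are hypothetical.

```python
def logrank_statistic(times_a, events_a, times_b, events_b):
    """Two-sample log-rank chi-square statistic (1 degree of freedom).

    times_*  : follow-up times for each patient
    events_* : 1 if the patient died at that time, 0 if censored
    """
    data = [(t, e, 0) for t, e in zip(times_a, events_a)]
    data += [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})

    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = sum(1 for s, _, _ in data if s >= t)
        at_risk_a = sum(1 for s, _, g in data if s >= t and g == 0)
        deaths = sum(1 for s, e, _ in data if s == t and e)
        deaths_a = sum(1 for s, e, g in data if s == t and e and g == 0)

        # Expected deaths in group A under the null of equal hazards.
        observed_minus_expected += deaths_a - deaths * at_risk_a / at_risk
        if at_risk > 1:
            share_a = at_risk_a / at_risk
            variance += (
                deaths * share_a * (1 - share_a)
                * (at_risk - deaths) / (at_risk - 1)
            )
    if variance == 0:
        return 0.0
    return observed_minus_expected ** 2 / variance


# Toy data: two combinations with clearly different survival.
early = logrank_statistic([1, 2, 3], [1, 1, 1], [10, 11, 12], [1, 1, 1])
same = logrank_statistic([1, 2, 3], [1, 1, 1], [1, 2, 3], [1, 1, 1])
print(early, same)
```

A larger statistic means the two survival curves differ more, so it can serve as a starting dissimilarity between the two combinations.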

Methods: Grouping Algorithm (cont.)
The hierarchical clustering result is represented by a dendrogram
- Average linkage method
- Closest survival curves are merged first
- Connected components form clusters
[Figure: Dendrogram of nine combinations]

Methods: Grouping Algorithm (cont.)
Step 4: Identify groups of patients for plotting cost curves
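The average-linkage step can be illustrated with a small pure-Python sketch. It assumes a precomputed dissimilarity for every pair of combinations; the values used here are made up for the example, and the function is a generic agglomerative procedure rather than the authors' code.

```python
def average_linkage(labels, dist):
    """Agglomerative clustering with average linkage.

    labels : list of combination strings
    dist   : dict mapping frozenset({a, b}) to the dissimilarity of a and b
    Returns the merges in order as (cluster_1, cluster_2, height).
    """
    clusters = [(label,) for label in labels]
    merges = []

    def between(c1, c2):
        # Average dissimilarity over all cross-cluster pairs.
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > 1:
        # The closest pair of clusters is merged first.
        height, i, j = min(
            (between(c1, c2), i, j)
            for i, c1 in enumerate(clusters)
            for j, c2 in enumerate(clusters)
            if i < j
        )
        merges.append((clusters[i], clusters[j], height))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical dissimilarities between three combinations.
labels = ["23101", "23100", "45211"]
dist = {
    frozenset(("23101", "23100")): 0.1,
    frozenset(("23101", "45211")): 0.9,
    frozenset(("23100", "45211")): 0.8,
}
merges = average_linkage(labels, dist)
for left, right, height in merges:
    print(left, right, round(height, 3))
```

The list of merges with their heights is the information a dendrogram draws; cutting it at a chosen height leaves the groups used for plotting cost curves.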

[Figure: Original bushy curves]
[Figure: Intermediate trimmed curves]

[Figure: Final trimmed curves]
[Figure: Cost curves for final survival curves; axis label: $US]

Training Sample
50,091 men with incident prostate cancer
Result from grouping algorithm
- Review
- Interpret
- Generate hypotheses

Results: Interpret
Columns: Full sample (N^a = 50,091), shown as N (% or mean); Group 0 (N^a = 8,897); Group 1 (N^a = 7,572); Group 2 (N^a = 29,006); Group 3 (N^a = 2,727); Group 4 (N^a = 1,889). Group values are Col. %^b.

Stage at prostate cancer diagnosis (P-value^c <0.01)
Stage      Full sample       Group 0   Group 1   Group 2   Group 3   Group 4
0          -                 -         -         -         -         -
1          -                 -         -         -         -         -
2          27,556 (55.01)    45.80     42.45     68.62     13.35     0.00
3          2,269 (4.53)      3.62      1.85      6.23      0.00      0.00
4          3,393 (6.77)      0.00      15.21     0.00      28.31     77.77
Unknown    16,873 (33.68)    50.58     40.49     25.15     58.34     22.23

Results: Interpret (cont.)
Columns: Full sample (N^a = 50,091), shown as N (% or mean); Group 0 (N^a = 8,897); Group 1 (N^a = 7,572); Group 2 (N^a = 29,006); Group 3 (N^a = 2,727); Group 4 (N^a = 1,889). Group values are Col. %^b.

Demographic Characteristics
Variable               Full sample             Group 0      Group 1      Group 2      Group 3      Group 4      P-value^c
Age at diagnosis                                                                                                <0.01
  65-69                12,543 (25.04)          3.61         9.90         39.55        0.00         0.00
  70-74                15,070 (30.09)          22.06        9.67         41.80        9.17         0.00
  75-79                11,994 (23.94)          44.76        27.26        18.17        19.14        8.26
  80-84                7,018 (14.01)           29.57        40.89        0.48         17.64        35.52
  85+                  3,466 (6.92)            0.00         12.28        0.00         54.05        56.22
Age (mean, SD)^d       50,091; 74.5 (6.1)      76.9 (4.1)   78.8 (5.8)   71.1 (3.7)   83.3 (6.0)   85.5 (4.6)   <0.01
Race                                                                                                            <0.01
  White non-Hispanic   42,988 (85.82)          86.50        83.49        84.40        95.34        100.00
  African American     3,774 (7.53)            8.42         15.10        6.05         4.66         0.00
  Other^e              1,508 (3.01)            1.65         1.41         4.32         0.00         0.00
Location                                                                                                        <0.01
  Urban                46,525 (92.88)          92.24        95.83        91.66        94.79        100.00
  Rural                3,566 (7.12)            7.76         4.17         8.34         5.21         0.00
Callout (Group 4): 92% are of age ≥ 80

Results: Interpret (cont.)
Clinical Characteristics (Pre-Period) (P-value^c <0.01)
Variable                       Full sample       Group 0   Group 1   Group 2   Group 3   Group 4   P-value^c
Charlson comorbidity index                                                                         <0.01
  Zero                         32,486 (64.85)    61.28     54.52     70.80     54.75     46.37
  One                          10,202 (20.37)    22.94     22.97     18.59     21.38     23.61
  Two or higher                5,399 (10.78)     12.99     18.89     6.21      19.44     25.46
  Missing                      2,004 (4.00)      2.79      3.63      4.39      4.44      4.55
Performance status proxies     7,366 (14.71)     21.67     36.56     2.46      35.50     52.30     <0.01

Results: Interpret (cont.)
Columns: Full sample (N^a = 50,091), shown as N (% or mean); Group 0 (N^a = 8,897); Group 1 (N^a = 7,572); Group 2 (N^a = 29,006); Group 3 (N^a = 2,727); Group 4 (N^a = 1,889). Group values are Col. %^b.

Characteristics (Post-Period)
Variable                                      Full sample            Group 0       Group 1       Group 2       Group 3       Group 4     P-value^c
Charlson comorbidity index                                                                                                               <0.01
  Zero                                        27,653 (55.21)         53.95         45.77         61.55         38.10         26.26
  One                                         10,898 (21.76)         22.88         22.39         21.88         19.66         15.03
  Two or higher                               7,129 (14.23)          16.74         21.17         10.83         20.50         17.79
  Missing                                     4,411 (8.81)           6.43          10.67         5.74          21.75         40.92
Performance status proxies                    22,354 (44.63)         38.27         49.51         42.30         57.79         71.68       <0.01
All cause death                               12,434 (24.82)         27.00         43.90         10.91         70.30         86.02       <0.01
Prostate cancer related death                 3,100 (28.47)          15.90         27.48         16.77         36.12         57.30       <0.01
Time-to-death (in days) (mean, SD)            12,434; 1,092 (799)    1,274 (827)   1,126 (786)   1,263 (789)   940 (755)     601 (593)   <0.01
Length of follow-up (in days) (mean, SD)      50,091; 1,562 (847)    1,716 (886)   1,495 (884)   1,618 (786)   1,216 (901)   742 (725)   <0.01
Callout: Second longest

Hypothesis generation
- Generate hypotheses from grouping results on full sample
- Consider subsamples
  - Inpatient costs
  - Outpatient costs
  - Prescription costs

Hypotheses
Predictors of cost accumulation
Group 4, highest
- Mortality rate
- Proportion with CCI ≥ 2 in the pre-period
- Proportion with at least one performance status proxy indicator
- No health services use prior to dx

Hypotheses (contd.)
Group 1:
- Highest proportion of African Americans (15%)
Group 3:
- Largest proportion of men with an unknown cancer stage (58%)
Groups 1 and 3:
- Highest proportion of men with CCI score ≥ 2 in the post-period only

[Figure: Cost: White, non-Hispanic (WNH) sample; axis label: $US]
[Figure: Cost: African-American (AA) sample; axis label: $US]

[Figure: Inpatient Cost: WNH; axis label: $US]
[Figure: Inpatient Cost: AA; axis label: $US]

Limitations
- The incorporation of other available claims-based measures may lead to different hypotheses
- Utilizing electronic medical records and other linked datasets could impact the grouping results and hypotheses

Discussion
- Prognostic tools
  - Provide information on health status over time
  - Describe cost accumulation over time
- Grouping algorithm can characterize groups associated with higher future costs
- Grouping algorithm can be used to generate hypotheses for future research

Human Computer Interaction Lab
Visualization with EventFlow
Margrét Vilborg Bjarnadóttir, Robert H. Smith School of Business, University of Maryland
With Eberechukwu Onukwugha, Sana Malik, Catherine Plaisant and Tanisha Gooden

Morbidity, Mortality, Costs, Adherence
- $13.35 billion in hospitalization costs annually due to medication non-adherence (Sullivan et al 1990)
- The medication possession ratio (MPR)
Michael A. Kane, Margrét V. Bjarnadóttir, Sanjay Ghimire. 2012. Study of compliance in hypertension treatment. American Society of Hypertension, Annual Scientific Meeting. Poster Presentation, New York, NY, May 2012.
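The slide names the medication possession ratio without defining it. A commonly used definition is total days' supply dispensed in an observation period divided by the number of days in that period. The sketch below follows that definition under stated assumptions: the cap at 1.0 and the rule that a fill counts if its fill date falls inside the period are modeling choices made here for illustration, not taken from the presentation.

```python
from datetime import date


def medication_possession_ratio(fills, period_start, period_end, cap=True):
    """MPR = days' supply dispensed in the period / days in the period.

    fills : list of (fill_date, days_supply) tuples for one patient and drug
    cap   : if True, limit the ratio to 1.0 (an assumption of this sketch)
    """
    period_days = (period_end - period_start).days + 1
    supplied = sum(
        days_supply
        for fill_date, days_supply in fills
        if period_start <= fill_date <= period_end
    )
    ratio = supplied / period_days
    return min(ratio, 1.0) if cap else ratio


# Hypothetical patient: two 30-day fills in a 90-day window.
fills = [(date(2012, 1, 1), 30), (date(2012, 2, 15), 30)]
mpr = medication_possession_ratio(fills, date(2012, 1, 1), date(2012, 3, 30))
print(round(mpr, 3))
```

Because the ratio collapses the whole period into one number, two patients with very different gap patterns can share the same MPR, which is part of the motivation for looking at the event sequences directly.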

The Data
- 900,000 individuals
- 16 million prescription claims
- 5 drug classes:
  - Angiotensin-Converting Enzyme Inhibitors (ACE)
  - Angiotensin II Receptor Blockers (ARB)
  - Calcium Channel Blockers (CCB)
  - Beta blockers (Beta)
  - Diuretics

The Research Questions
- Can we use visualization to understand adherence patterns?
- What are the effects of modeling decisions on our outcome measures?

[Figures: Hypertension Treatment; axis label: time]

[Figures: Event Flow]

EventFlow: Behind the GUI
- RECORD, RECORD, RECORD
- AGGREGATE: Merge multiple records into tree
- VISUALIZE: Display the aggregation

Constructing the EventFlow Overview
[Figure labels: A, B, C, D, E]
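The aggregation step, merging multiple records into a tree, can be pictured as building a prefix tree over event sequences. The sketch below is a simplified illustration that counts how many records share each sequence prefix; it ignores the timing between events, and it is not EventFlow's actual code. The function name and the sample records are hypothetical.

```python
def aggregate(records):
    """Merge event sequences into a prefix tree.

    Each node stores how many records pass through it, so records that
    share the same leading events are drawn as one branch.
    """
    root = {"count": 0, "children": {}}
    for sequence in records:
        node = root
        node["count"] += 1
        for event in sequence:
            node = node["children"].setdefault(
                event, {"count": 0, "children": {}}
            )
            node["count"] += 1
    return root


def show(node, depth=0):
    """Print the tree, one branch per line, with record counts."""
    for event, child in node["children"].items():
        print("  " * depth + f"{event} ({child['count']})")
        show(child, depth + 1)


# Hypothetical records using the event labels from the slide.
records = [["A", "B", "C"], ["A", "B", "D"], ["A", "E"]]
tree = aggregate(records)
show(tree)
```

In a display built on such a tree, the count at each node would set how much space the branch receives.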


[Figure: axes labeled Number of Records and Time]

[Figure: Event Flow]

USING EVENTFLOW WITH CLAIMS DATA
Confetti to visualizations that answer questions

[Figure: Hypertension Treatment; axis label: time]
[Figure: Gaps & Overlaps; axis label: time]
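Gaps and overlaps between consecutive prescription fills can be derived from fill dates and days' supply. The sketch below is a minimal illustration that compares each fill only with the one before it and does not carry forward stockpiled supply; that simplification, the function name, and the sample fills are assumptions made for the example.

```python
from datetime import date, timedelta


def gaps_and_overlaps(fills):
    """Classify the interval between consecutive fills of one drug.

    fills : list of (fill_date, days_supply) tuples
    Returns a list of ("gap" | "overlap", days) in chronological order;
    back-to-back fills with no gap or overlap produce no entry.
    """
    ordered = sorted(fills)
    result = []
    for (prev_date, prev_supply), (next_date, _) in zip(ordered, ordered[1:]):
        # First day on which the previous fill no longer covers the patient.
        supply_ends = prev_date + timedelta(days=prev_supply)
        delta = (next_date - supply_ends).days
        if delta > 0:
            result.append(("gap", delta))
        elif delta < 0:
            result.append(("overlap", -delta))
    return result


# Hypothetical patient: a late refill followed by an early refill.
fills = [
    (date(2012, 1, 1), 30),
    (date(2012, 2, 10), 30),
    (date(2012, 3, 1), 30),
]
intervals = gaps_and_overlaps(fills)
print(intervals)  # [('gap', 10), ('overlap', 10)]
```

Choices such as how long a gap must be before it counts, or whether overlapping supply is carried forward, are examples of modeling decisions that can change the adherence picture.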

[Figure: Gaps & Overlaps]

Case study
UNDERSTANDING TEMPORAL PATTERNS IN HYPERTENSIVE THERAPY

The Patients
Event Flow: Visualizing Patterns

Diuretics: General Patterns

The Research Questions
- What are the effects of modeling decisions on our outcome measures?
- Can we use visualization to understand adherence patterns?

[Figure: Diuretics Only - Gaps]
[Figure: Diuretics Only - Gaps; Modeling Decisions*]

[Figures: Diuretics Only - Gaps]

[Figure: Diuretics Only & Gaps]

The Research Questions
- What are the effects of modeling decisions on our outcome measures?
- Can we use visualization to understand adherence patterns?
  - Can we understand patient behavior?
  - Can we identify good vs bad patterns?
  - Can we identify early non-adherence?
  - Patterns vs. medical outcomes
  - Pattern stabilization

Event Flow: Drilling Down
Non-compliance with guidelines for members with a history of Heart Failure
- Non-dihydropyridines (Acceptable)
- Dihydropyridines (Not acceptable)

EventFlow Summary
On our way to understanding adherence behavior in hypertension therapy:
- Patterns are far from ideal
- How should adherence be described?
EventFlow is a great tool to:
- Understand the big picture
- Drill down
- Generate hypotheses
More information: www.hcil.umd.edu/eventflow

Cohort Comparison: http://hcil.umd.edu/coco

Thank You!

Q & A

Appendix

Dataset: Linked Surveillance, Epidemiology and End Results (SEER)-Medicare database.

Methods: Cost Curves and Prediction
- The curve of each group reflects cumulative inverse-probability weighted (IPW) costs
- To evaluate prediction accuracy
  - Split survival data into two data sets (D0 and D1) of equal size
  - D0: training data set
  - D1: testing data set

Methods: Cost Curves and Prediction (cont.)
- D0: develop patient groups using GACD
- D1: group the combinations in the dataset on the basis of the results from D0
- Difference between the predicted cost (based on D0) and the actual cost from D1
  - With and without the application of the GACD
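The train and test procedure can be sketched as follows. This is a deliberately simplified illustration: it predicts a patient's cost as the plain mean cost of their group in D0, whereas the slides describe cumulative inverse-probability weighted costs. The `group_of` mapping stands in for the output of GACD and, like all names and numbers here, is hypothetical.

```python
import random
from statistics import mean


def split_in_half(records, seed=0):
    """Randomly split records into two data sets of equal size (D0, D1)."""
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def fit_group_means(train, group_of):
    """Mean cost per group in the training data (D0)."""
    costs = {}
    for combo, cost in train:
        costs.setdefault(group_of[combo], []).append(cost)
    return {group: mean(values) for group, values in costs.items()}


def predict(test, group_of, group_means):
    """Predicted cost for each test patient (D1) from the D0 group means."""
    return [group_means[group_of[combo]] for combo, _ in test]


# Hypothetical grouping result: combination string -> group number.
group_of = {"23101": 0, "23100": 0, "45211": 1}
d0 = [("23101", 100.0), ("23100", 300.0), ("45211", 1000.0)]
d1 = [("23100", 250.0), ("45211", 900.0)]

group_means = fit_group_means(d0, group_of)
predicted = predict(d1, group_of, group_means)
print(group_means, predicted)
```

The "without grouping" comparison would use the same steps with each combination treated as its own group.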

[Figure: Predicted Costs with Grouping]

[Figure: Predicted Cost without Grouping]

Methods: Cost Curves and Prediction (cont.)
Between the actual cost and predicted cost, we calculated:
- Average difference
- Root mean squared error (RMSE)
- Mean absolute error (MAE) and 95% confidence interval
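The three error measures listed above can be computed as in the sketch below. The sign convention (predicted minus actual, so a positive average difference means overestimation) is an assumption of this example, and the cost values are made up.

```python
from math import sqrt


def prediction_errors(actual, predicted):
    """Average difference, RMSE, and MAE between predicted and actual costs."""
    diffs = [p - a for a, p in zip(actual, predicted)]
    n = len(diffs)
    return {
        "average_difference": sum(diffs) / n,
        "rmse": sqrt(sum(d * d for d in diffs) / n),
        "mae": sum(abs(d) for d in diffs) / n,
    }


# Hypothetical costs for three patients.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 330.0]
errors = prediction_errors(actual, predicted)
print(errors)
```

RMSE weights large misses more heavily than MAE, so reporting both shows whether a few patients dominate the error.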

Results: Difference in Predictions
Measure                              Grouped data (US$)   Non-grouped data (US$)
Average difference^a                 41,524.9             43,113.2
Root mean squared error (RMSE)       45,917.0             48,381.2
Mean absolute error (MAE)            41,789.5             43,639.3
95% confidence interval, lower       41,420.8             43,061.7
95% confidence interval, upper       42,158.2             44,216.9

The 5-year cost prediction without grouping yields a sample overestimate of US$79,544,508

References
Onukwugha E, Qi R, Jayasekera J, Zhou S. Cost Prediction Using a Survival Grouping Algorithm: An Application to Incident Prostate Cancer Cases. PharmacoEconomics. 2016;34(2):207-16.
Qi R, Zhou S, editors. A Comparative Study of Algorithms for Grouping Cancer Data. Proceedings of the International MultiConference of Engineers and Computer Scientists; 2014.
Chen D, Xing K, Henson D, Sheng L, Schwartz AM, Cheng X. Developing prognostic systems of cancer patients by ensemble clustering. BioMed Research International. 2009.

Qi R, Zhou S, editors. Simulated Annealing Partitioning: An Algorithm for Optimizing Grouping in Cancer Data. Data Mining Workshops (ICDMW), 2013 IEEE 13th International Conference on; 2013: IEEE.
National Cancer Institute - Surveillance, Epidemiology and End Results. Bethesda: NCI 2013. http://seer.cancer.gov/registries. Accessed March 29 2013.