Evaluation on liver fibrosis stages using the k-means clustering algorithm

Similar documents
DETECTION AND STAGING OF LIVER FIBROSIS USING ADDITIVE LOGISTIC MODELS

Monitoring Hepatitis C

The role of non-invasivemethods in evaluating liver fibrosis of patients with non-alcoholic steatohepatitis

The role of ARFI and APRI in diagnosis of liver fibrosis on patients with common chronic liver diseases

Noninvasive Diagnosis and Staging of Liver Disease. Naveen Gara, MD

Non-Invasive Assessment of Liver Fibrosis. Patricia Slev, PhD University of Utah Department of Pathology

COST EFFECTIVENESS STATISTIC: A PROPOSAL TO TAKE INTO ACCOUNT THE PATIENT STRATIFICATION FACTORS. C. D'Urso, IEEE senior member

DISCLOSURES. This activity is jointly provided by Northwest Portland Area Indian Health Board and Cardea

Module 1 Introduction of hepatitis

Assessment of Liver Stiffness by Transient Elastography in Diabetics with Fatty Liver A Single Center Cross Sectional observational Study

Non-Invasive Testing for Liver Fibrosis

A hybrid Model to Estimate Cirrhosis Using Laboratory Testsand Multilayer Perceptron (MLP) Neural Networks

PROPOSTA DI UN NUOVO ALGORIMO PER LA DIAGNOSI ECOGRAFICA DELLE MALATTIE CRONICHE DEL FEGATO

Transient elastography in chronic viral liver diseases

Applied Health Economics and Health Policy

Pretreatment Evaluation

Predicting Heart Attack using Fuzzy C Means Clustering Algorithm

Surveillance for Hepatocellular Carcinoma

NONINVASIVE TECHNIQUES FOR EVALUATION AND MONITORING OF CHRONIC LIVER DISEASE

National Horizon Scanning Centre. Enhanced Liver Fibrosis Test (ELF) for evaluating liver fibrosis. June 2008

ABSTRACT INTRODUCTION. *Diana Feier 1, *Monica Lupsor Platon 1,2, Horia Stefanescu 1,3, Radu Badea 1,2

Cirrhosis and Portal Hypertension Gastroenterology Teaching Project American Gastroenterological Association

Automatic Hemorrhage Classification System Based On Svm Classifier

Invasive. Sampling error. Interobserver variability. Nondynamic evaluation of

Hepatology for the Nonhepatologist

Cristian Vicas, 1 Monica Lupsor, 2 Mihai Socaciu, 2 Sergiu Nedevschi, 1 and Radu Badea Introduction

ABNORMAL LIVER FUNCTION TESTS. Dr Uthayanan Chelvaratnam Hepatology Consultant North Bristol NHS Trust

Keywords. Abstract. Introduction

Pretreatment Evaluation

*APHP: Hôpital Beaujon Professor P Bedossa, Dr F Degos, Professor P Marcellin, Professor

Hepatocytes produce. Proteins Clotting factors Hormones. Bile Flow

The assessment of liver fibrosis using the computerized analysis of ultrasonographic images. Is the virtual biopsy appearing as an option?

Laboratory and Clinical Diagnosis of HCV Infection

Improved Hepatic Fibrosis Grading Using Point Shear Wave Elastography and Machine Learning

In Search of New Biomarkers for Nonalcoholic Fatty Liver Disease

Predicting Diabetes and Heart Disease Using Features Resulting from KMeans and GMM Clustering

Learning Objectives. After attending this presentation, participants will be able to:

Unsupervised MRI Brain Tumor Detection Techniques with Morphological Operations

ORIGINAL ARTICLE. Jun Zheng 1, Rong-chun Xing 1, Wei-hong Zheng 2, Wei Liu 1, Ru-cheng Yao 1, Xiao-song Li 1, Jian-ping Du 1, Lin Li 1.

NHSScotland is required to consider the Scottish Health Technologies Group (SHTG) advice.

COMPUTER AIDED DIAGNOSTIC SYSTEM FOR BRAIN TUMOR DETECTION USING K-MEANS CLUSTERING

Comparative Study of K-means, Gaussian Mixture Model, Fuzzy C-means algorithms for Brain Tumor Segmentation

Approach to the Patient with Liver Disease

Initial Evaluation for HCV Therapy. Hope McGratty PA-C, MPH

HCSP TRAINING MANUAL

Fibrosis detection from ultrasound imaging. The influence of necroinflammatory activity and steatosis over the detection rates.

Laboratory Tests and Diagnostic Procedures in Liver Disease: Adventures in Liverland

Identification of Tissue Independent Cancer Driver Genes

Pretreatment Evaluation

Hepatitis C Virus (HCV)

Hepatitis C Virus Infection in Diabetes Mellitus Patients

Transient elastography in chronic liver diseases of other etiologies

Aspartate aminotransferase-to-platelet ratio index in children with cholestatic liver diseases to assess liver fibrosis

The management of patients positive to hepatitis C virus antibody in Malta

2. Liver blood tests and what they mean p2 Acute and chronic liver screen

The Future is Here Now!

Who to Treat? Consider biopsy Treat. > 2 ULN Treat Treat Treat Treat CIRRHOTIC PATIENTS Compensated Treat HBV DNA detectable treat

Supplementary Note Details of the patient populations studied Strengths and weakness of the study

Hepatitis C Update on New Treatments

Compute-aided Differentiation of Focal Liver Disease in MR Imaging

T. R. Golub, D. K. Slonim & Others 1999

Rainforest Alliance MEDICAL EXAMS GUIDE

Chapter 1. Introduction

When to Treat: Staging Liver Disease David L. Thomas, MD, MPH

Understanding your FibroScan Results

Break the Liver Biopsy Habit

Reveal Relationships in Categorical Data

HEP DART 2017, Kona, Hawaii

LOGIQ S8 XDclear 2.0 Liver Procedures

Evaluation of liver and spleen stiffness using a ultrasound guided method: Accuracy of ARFI(R) measurements in liver disease patients

Brain Tumor Segmentation: A Review Dharna*, Priyanshu Tripathi** *M.tech Scholar, HCE, Sonipat ** Assistant Professor, HCE, Sonipat

DIAGNOSIS OF CIRRHOSIS BY TRANSIENT ELASTOGRAPHY (FIBROSCAN ): A PROSPECTIVE STUDY.

Data Mining. Outlier detection. Hamid Beigy. Sharif University of Technology. Fall 1395

MRI Image Processing Operations for Brain Tumor Detection

Predicting Breast Cancer Survivability Rates

Hepatitis C: Management of Previous Non-responders with First Line Protease Inhibitors

Supplementary Table 1. The distribution of IFNL rs and rs and Hardy-Weinberg equilibrium Genotype Observed Expected X 2 P-value* CHC

Management of Patients with Past or Present Hepatic Abnormalities

Analysis of Mammograms Using Texture Segmentation

POWERED BY. Innovation in liver disease management

Evaluation of liver and spleen stiffness using a ultrasound guided method: Accuracy of ARFI(R) measurements in liver disease patients

ABSTRACT I. INTRODUCTION. Mohd Thousif Ahemad TSKC Faculty Nagarjuna Govt. College(A) Nalgonda, Telangana, India

Monitoring Patients Who Are Starting HCV Treatment, Are On Treatment, Or Have Completed Therapy

Title:Justified Granulation Aided Noninvasive Liver Fibrosis Classification System

Management of Hepatitis C in Primary Care BABAFEMI ONABANJO, MD & BEN ALFRED, FNP UMASS FAMILY HEALTH CENTER WORCESTER

Diagnosis and Management of PBC

MP Noninvasive Techniques for the Evaluation and Monitoring of Patients With Chronic Liver Disease

Liver Disease Assessment Among PWID: The Role of Transient Elastography

EXTRACT THE BREAST CANCER IN MAMMOGRAM IMAGES

Non-Alcoholic Fatty Liver Disease (NAFLD)

Faculty Disclosure. Objectives. Cirrhosis Management for the Family Physician 18/11/2014

Clinical Policy Title: Liver elastography

IN THE NAME OF GOD. D r. MANIJE DEZFULI AZAD UNIVERCITY OF TEHRAN BOOALI HOSPITAL INFECTIOUS DISEASES SPECIALIST

Hepatitis Alert: Management of Patients With HCV Who Have Achieved SVR

Hepatitis C Virus (HCV) & Infectious Disease 101 for Hubs & Spokes April 24, :00 pm 1:00 pm

Relationship between Controlled Attenuation Parameter and Hepatic Steatosis as Assessed by Ultrasound in Alcoholic or Nonalcoholic Fatty Liver Disease

In response to an enquiry from the Scottish Clinical Biochemistry Managed Diagnostic Network

Noninvasive Techniques for the Evaluation and Monitoring of Patients With Chronic Liver Disease

DIABETES AND LABORATORY TESTS. Author: Josephine Davis

Section 3: Testing and Diagnosis of Hepatitis C

Transcription:

Annals of University of Craiova, Math. Comp. Sci. Ser. Volume 36(2), 2009, Pages 19 24 ISSN: 1223-6934 Evaluation on liver fibrosis stages using the k-means clustering algorithm Marina Gorunescu, Florin Gorunescu, Radu Badea, and Monica Lupsor Abstract. Hepatic fibrosis, seen as an indicator in the progression of chronic liver disease, is measured on a five level scale, using the direct biopsy. The goal of this paper is to develop a k-means clustering methodology used for patient segmentation in accordance with the fibrosis levels. The study aims to assess the effectiveness of such a patient clustering, based on the main biochemical parameters and stiffness values, compared with the standard rule given by biopsy. 1. Introduction Hepatic fibrosis is an indicator in the progression of chronic liver disease. Fibrosis itself causes no symptoms but can lead to portal hypertension -the scarring distorts blood flow through the liver- or cirrhosis -the failure to properly replace destroyed liver cells results in liver dysfunction. Thus, the last stage of liver stiffness leads to cirrhosis and hepatocellular carcinoma. Diagnosis is based on liver biopsy and treatment involves correcting the underlying condition when possible. Liver biopsy is currently the only means of detecting hepatic fibrosis, being indicated to clarify the diagnosis and to stage its progress (e.g. in chronic hepatitis C, whether fibrosis has progressed to cirrhosis). Noninvasive tests (e.g. serologic markers) are under study but are not yet ready for routine clinical use. Imaging tests such as ultrasonography, CT, and MRI may detect findings associated with fibrosis [5]. One of the last technological discovery worldwide in the evaluation of hepatic fibrosis is the Fibroscan (Echosens, Paris, France), a specially adapted ultrasound device using the principle of the one-dimension transient elastography (TE) for the assessment of liver stiffness. Existing patient grouping systems have been developed using either clinical opinion and/or statistical analysis. Although it may be highly desirable to automate the process of deriving statistically valid and clinically meaningful patient grouping systems, it must be taken into account that groups based solely on statistical analysis often result in groups which do not necessarily make sense clinically [1]. As such, it has been recognized that in order to develop a practical grouping methodology, a combination of clinical input and robust statistical methods is required [4]. As patient grouping techniques, clustering algorithms have been used in the context of health care to better understand the relationships between data when the groups are neither known nor cannot be predefined. These algorithms essentially derive data of similar type based on some measure of alikeness or closeness. Examples of clustering algorithms include hierarchical methods such as BIRCH [7], density-based methods such as DBSCAN [2], model-based methods such as mixture density modeling [6], and partition methods such as the k-means algorithm [3]. In this paper we use the k-means algorithm to cluster patients in five groups, based on 19

20 M. GORUNESCU, F. GORUNESCU, R. BADEA, AND M. LUPSOR the main biological parameters and stiffness values obtained by Fibroscan, in accordance with the five classes of liver fibrosis given by direct biopsy. Thus, we compare the patient segmentation obtained by biopsy with that obtained by the k-means algorithm using the standard medical parameters. Valuable conclusions were drawn from this comparison of significantly different methodologies. The rest of the paper is organized as follows. The next section summarizes the main characteristics of the k-means clustering algorithm. In Section three we present a real-life database, used to assess our findings. Section four discusses the results and finally we report our conclusions in section five. 2. k-means clustering algorithm The k-means algorithm is one of the simplest unsupervised learning algorithms that solve the well known clustering problem. Basically, k-means is an algorithm to cluster n objects based on certain attributes into k partitions, k < n. The main idea is to define k centroids, one for each cluster, and to populate the corresponding clusters with the nearest items to them. The algorithm aims to minimise an objective function, given by the squared error function: where the centroid c j is defined by: E rr = k n j=1 i=1 x(j) i c j 2 c j = 1 n j x (j) i i S j where is a distance measure (usually the Euclidian distance) between a data point x (j) i and the centroid c j of the j-th cluster S j. The algorithm stops when the centroids remain unchanged between two consecutive iterations or the squared error does not improve significantly. 3. Dataset The dataset used in this study consists of 743 consecutive patients with chronic HCV infection, examined at the 3rd Medical Clinic, University of Medicine and Pharmacy Cluj-Napoca, Romania, between May 2007 and August 2008. All of them have positive HCV-RNA in their serum and underwent percutaneous liver biopsy for grading and staging the diseases. Moreover, all patients were referred to liver stiffness measurement. Besides the epidemiological, anthropometric clinical parameters and other important predictive parameters, the following biological parameters were determined for all patients on the same day as liver stiffness (i.e. the Fibroscan output): stiffness, sex, age, body mass index (BMI), glycemia, triglycerides, cholesterol, aspartate aminotransferase (AST), alanin aminotransferase (ALT), gamma glutamyl transpeptidase (GCT), total bilirubin (TB), alkaline phosphatase (AP), prothrombin index (PI), tocopheryl quinines (TQS), prothrombin time ratio/international normalized (INR), prolonged activated partial thromboplastin time, haematids (erytthrocytes), hemoglobin, hematocrit, medium erytthrocity volume, average eritrocitary hemoglobin, average concentration of hemoglobin in a red blood cell, thrombocytes, sideraemia, high density lipoprotein cholesterol. These parameters represent the predictive factors used as attributes in the clustering process. The liver biopsy output,

EVALUATION ON LIVER FIBROSIS STAGES... 21 represented by the five Metavir F values (MF0 to MF4), was considered as comparison factor to the k-means clustering segmentation, with k equaling 5, that is the number of Metavir F values. The aim of this study was to assess the effectiveness of an automatic segmentation (k-means clustering), based on the above parameters, in comparison to the standard gold methodology, based on the direct biopsy. The study was approved by the local Ethical Committee of the University of Medicine and Pharmacy Iuliu Hatieganu Cluj-Napoca. The nature of this study was explained to the patients, each of whom provided written informed consent before the beginning of the study in accordance with the principles of the Declaration Helsinki (revision of Edinburgh, 2000). 4. Results We have performed the k-means clustering algorithm using the above 25 attributes. The clustering result is displayed in Table 1. The rows here represent the actual classes obtained by direct biopsy (MF0-MF4), and the columns represent the predicted clusters (C1-C5), obtained based on the k-means algorithm. Each cell contains both the number of patients from the actual class belonging to the predicted cluster, together with the corresponding percentage. In this way, more detailed information on misclassification is provided. Table 1. Confusion (misclassification) table C1 C2 C3 C4 C5 MF0(29) 1 (3.45%) 4 (13.80%) 24 (82.76%) 0 (0%) 0 (0%) MF1(234) 45 (19.23%) 14 (5.98%) 175 (74.79%) 0 (0%) 0 (0%) MF2(175) 40 (22.86%) 16 (9.14%) 117 (66.86%) 1 (0.57%) 1 (0.57%) MF3(88) 11 (12.50%) 16 (18.18%) 59 (67.05%) 2 (2.27%) 0 (0%) MF4(217) 25 (11.52%) 109 (50.23%) 71 (32.72%) 12 (5.53%) 0 (0%) From Table 1 above we can see that the large majority of patients are distributed in the cluster C3 (83 % with MF0, 75 % with MF1, 67 % with MF2 and MF3, and just 32 % with MF4), only the cluster C2 containing half of MF4 patients. Let us also note the closeness of cluster C1 to the main cluster C3. This means that, although they have different MF scores, the diagnosis (patients segmentation) based on the closeness of their biological measures is not enough to provide a good accuracy. 5. Analysis of variance. The goal of the k-means clustering procedure is to classify objects into a userspecified number of clusters. To evaluate the appropriateness of the classification, we can compare the within-cluster variability -small if the classification is good - to the between-cluster variability - large if the classification is good. Using the analysis of variances (F-ratios), we obtained that all the 24 explanatory factors excepting total bilirubin (P-level = 0.17) are significant (P-level < 0.02) for a good segmentation of patients, under these circumstances.

22 M. GORUNESCU, F. GORUNESCU, R. BADEA, AND M. LUPSOR 6. Graph of means. The graph of means displays the line graph of the means across clusters. This graphical technique is very useful for visually summarizing the differences in means between clusters. Figure 1. Graph of means for the five clusters From the graph of means we can visually conclude that clusters C1, C2 and C3 are very close, emphasizing the observation from Table 1 regarding the distribution of patients among the five clusters. The average stiffness (Fibroscan output) of patients in each cluster is given in Table 2 below. Table 2. Clustering vs. average stiffness Clustering vs. average stiffness C1 C2 C3 C4 C5 Stiffness 12.07 28.25 9.7 30.43 4.2 Table 2 shows that the large majority of patients, irrespective their MF stage, belonging to cluster C3, have a mean stiffness equaling 9.7. In this case, the closest cluster is C1, in accordance with Table 1 and Figure 1. Cluster C2 is represented by an average stiffness equaling 28.25, far from those of clusters C1 and C3. Observe that, different from Figure 1, which displays the graph of means depending on each of 25 predicting attributes (including the stiffness), Table 2 shows the average stiffness in each cluster only. Thus, in this table, the relation between the clustering process and the Fibroscan output (averaged) is shown, emphasizing the connection between the Fibroscan technique and the patient segmentation Finally, in Table 3 we summarized the Euclidean distances between cluster centers.

EVALUATION ON LIVER FIBROSIS STAGES... 23 Clustering vs. average stiffness C1 C2 C3 C4 C5 C1 0 87.19 69.51 176.46 4.2 C2 87.19 0 39.97 127.73 863.31 C3 69.51 39.97 0 144.81 862.86 C4 176.46 127.73 144.81 0 875.90 C5 867.66 863.31 862.86 875.90 0 It is easy to see from the above table that clusters C1, C2 and C3 are close enough, based on the distance between centroids. The closest centroids are those of C2 and C3, followed by those of C1 and C3. These results are concordant with the observations from Tables 1 and 2, and Figure 1. Thus, these three clusters are similar enough, although they contain patients from all the MF stages. 7. Conclusions This paper introduced the problem of grouping patient according to their main biological parameters and the Fibroscan output, using the k-means clustering algorithm, and analyzed the relation between such segmentation and the classification given by liver biopsy. Based on the results in this paper, the k-means clustering algorithm appears to be a viable approach for grouping patients according to standard medical parameters, providing valuable knowledge about the relation between the liver fibrosis and medical features. On the other hand, clustering patients based only on the above medical parameters is not enough to obtain good accuracy for the fibrosis level. Thus, other parameters have to be considered to obtain concordance between the biopsy result and the automatic patient grouping. 8. Acknowledge. This paper was supported through the research grant no. 41071/2007, entitled Predictive diagnosis algorithm for the hepatic fibrosis stages evolution using noninvasive ultrasonographic techniques, optimized by stochastic analysis and imaging techniques - SONOFIBROCAST, financed by the National Center of Programs Management (CNMP), Ministry of Education, Research and Innovation, Romania. References [1] Averill, R.F., DRGs: Their Design and Development, Health Administration Press, 1991. [2] Ester, M., Kriegel, H., Sander, J., Xu, X., A density-based algorithm for discovering clusters in large spatial databases with noise. Proc. 2nd International Conference on Knowledge Discovery and Data Mining, 226-231, 1996. [3] Han, J., Kamber, M., Data Mining: Concepts and techniques. Morgan Kaufmann, San Francisco, USA, 2006. [4] Kulinskaya, E., International Casemix Research: Why and How. Proc. 19th International Case Mix Conference, Washington, DC, 2003. [5] Merck Manual of Diagnosis and Therapy, Online version: http://www.merck.com/mmpe/sec03/ ch026/ch026b.html, 2007. [6] McLachlan, G.J, Peel, D., Finite Mixture Models, John Wiley & Sons, New York, 2000. [7] Zhang, T., Ramakrishnan, R., Livny, M., BIRCH: An efficient data clustering method for very large databases. SIGMOD 96, Proc. 1996 ACM SIGMOD International Conference on Management of Data, New York, USA. 103-114, 1996.

24 M. GORUNESCU, F. GORUNESCU, R. BADEA, AND M. LUPSOR (Marina Gorunescu) Faculty of Mathematics and Computer Science, University of Craiova E-mail address: mgorun@inf.ucv.ro (Florin Gorunescur) Department of Biostatistics and Computer Science, University of Medicine and Pharmacy of Craiova E-mail address: gorun@umfcv.ro (Radu Badea) Department of Hepatology, 3rd Medical Clinic, University of Medicine and Pharmacy Iuliu Hatieganu Cluj-Napoca E-mail address: rbadea@umfcluj.ro (Monica Lupsor) Department of Hepatology, 3rd Medical Clinic, University of Medicine and Pharmacy Iuliu Hatieganu Cluj-Napoca E-mail address: mmlupsor@yahoo.com