Detection and Classification of Lung Cancer Using Artificial Neural Network

Almas Pathan 1, Bairu K. Saptalkar 2
1,2 Department of Electronics and Communication Engineering, SDMCET, Dharwad, India
1 almaseng@yahoo.co.in, 2 bairusaptalakar@gmail.com

Abstract: The early detection of lung cancer is a challenging problem due to the structure of cancer cells. This paper presents image segmentation using an artificial neural network, applied to detecting lung cancer in its early stages. The segmentation results will serve as the basis for a computer-aided diagnosis (CAD) system for the early detection of lung cancer, which will improve a patient's chances of survival.

Keywords: Lung cancer detection, Image segmentation, Artificial neural network

1. INTRODUCTION

Lung cancer is considered the main cause of cancer death worldwide. It is difficult to detect in its early stages because symptoms typically appear only in the advanced stages, which makes its mortality rate the highest among all types of cancer: more people die of lung cancer than of other cancers such as breast, colon, and prostate cancer. There is significant evidence that early detection of lung cancer decreases the mortality rate. The most recent estimates, according to the latest statistics provided by the World Health Organization, indicate around 7.6 million deaths worldwide each year from cancer; furthermore, cancer mortality is expected to continue rising, to around 17 million worldwide in 2030. There are many techniques for diagnosing lung cancer, such as chest radiography (X-ray), computed tomography (CT), magnetic resonance imaging (MRI), and sputum cytology. However, most of these techniques are expensive and time-consuming, and in practice most of them detect lung cancer in its advanced stages, when the patient's chance of survival is very low.
Therefore, there is a great need for a new technology to diagnose lung cancer in its early stages. Image processing techniques provide a good tool for improving on manual analysis.

2. ARTIFICIAL NEURAL NETWORK

An artificial neural network is a mathematical model that tries to simulate the structure and functionality of biological neural networks. The basic building block of every artificial neural network is the artificial neuron, a simple mathematical model (function) governed by three simple operations: multiplication, summation, and activation. At the entrance of the artificial neuron the inputs are weighted: every input value is multiplied by an individual weight. The middle section of the artificial neuron is a sum function that adds all the weighted inputs and a bias. At the exit of the artificial neuron, the sum of the weighted inputs and bias passes through an activation function, also called a transfer function.

Figure 1: Working principle of an artificial neuron

Although the working principle and simple rule set of the artificial neuron look like nothing special, the full potential and computational power of these models come to life when we start to interconnect them into artificial neural networks. Artificial neural networks exploit the simple fact that complexity can grow out of merely a few basic and simple rules.

3. PROCEDURE

A. Seed fill operation

Figure 2: Artificial neuron

The seed fill operation is an algorithm that determines the area connected to a given node in a multidimensional array. It operates on the background pixels of a binary image, starting from the points specified as seed locations, and fills the holes in the binary image.

B. Region of interest

A region of interest (ROI) is a selected subset of samples within a dataset, identified for a particular purpose. The concept of an ROI is commonly used in many application areas. In medical imaging, the boundaries of a tumor may be defined on an image or in a volume for the purpose of measuring its size.

C. Image segmentation

Segmentation is the process of partitioning an image into disjoint and homogeneous regions. The same task can equivalently be achieved by finding the boundaries between the regions; these two strategies have indeed been proven equivalent. The regions of a segmentation should be uniform and homogeneous with respect to some characteristic such as gray tone or texture; region interiors should be simple and without many small holes; adjacent regions should have significantly different values with respect to the characteristic on which they are uniform; and the boundary of each segment should be simple, not ragged, and spatially accurate. A more formal definition of segmentation can be given in the following way. Let I denote an image and let H define a certain homogeneity predicate; then a segmentation of I is a partition P of I into a set of N regions Rn, n = 1, ..., N, such that:
1) R1 ∪ R2 ∪ ... ∪ RN = I, with Rn ∩ Rm = ∅ for n ≠ m;
2) H(Rn) = true for all n;
3) H(Rn ∪ Rm) = false for adjacent Rn and Rm.
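The multiply-sum-activate rule of the artificial neuron described in Section 2 can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name, example weights, and the choice of a sigmoid transfer function are our assumptions.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weight each input, sum the weighted
    inputs together with the bias, then pass the result through a
    sigmoid activation (transfer) function."""
    s = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-s))  # sigmoid squashes the sum into (0, 1)

# Example: two inputs with illustrative weights and bias.
out = neuron([0.5, -1.0], [0.8, 0.2], bias=0.1)
```

Interconnecting many such neurons, with the outputs of one layer feeding the inputs of the next, is what yields the networks used in this paper.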
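The seed fill operation of Section 3.A is essentially a flood fill, which can be sketched as follows. This is an illustrative 4-connected version on a small binary mask; the function name and the queue-based traversal are our assumptions, not the authors' code.

```python
from collections import deque

def seed_fill(image, seed, fill_value):
    """Flood-fill the 4-connected region of `image` (a list of lists)
    containing `seed`, replacing its value with `fill_value`."""
    rows, cols = len(image), len(image[0])
    r0, c0 = seed
    target = image[r0][c0]
    if target == fill_value:
        return image
    queue = deque([seed])
    while queue:
        r, c = queue.popleft()
        if 0 <= r < rows and 0 <= c < cols and image[r][c] == target:
            image[r][c] = fill_value
            # Visit the four edge-connected neighbours.
            queue.extend([(r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)])
    return image

# Filling from a seed inside a hole closes that hole in the binary mask.
mask = [[1, 1, 1],
        [1, 0, 1],
        [1, 1, 1]]
seed_fill(mask, (1, 1), 1)
```

Hole filling on a whole binary image is typically built from this primitive by flood-filling the background from the border and inverting the result.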
Condition 1) states that the partition has to cover the whole image; condition 2) states that each region has to be homogeneous with respect to the predicate H; and condition 3) states that two adjacent regions cannot be merged into a single region that satisfies the predicate H. Segmentation is an extremely important operation in several applications of image processing and computer vision, since it represents the very first step of low-level processing of imagery. As mentioned above, the essential goal of segmentation is to decompose an image into parts that are meaningful for a given application. Color image segmentation in particular is becoming increasingly important in many applications. For instance, in digital libraries large collections of images and videos need to be catalogued, ordered, and stored in order to browse and retrieve visual information efficiently. Color and texture are the two most important low-level attributes used for content-based retrieval of information in images and videos, and because of the complexity of the problem, segmentation with respect to both color and texture is often used for indexing and managing the data.

Texture feature extraction begins with the mean, which is computed by reshaping the image into a column vector, adding up its elements, and dividing the sum by the product of the numbers of rows and columns of the image. The standard deviation is

σ = √( (1/(M − 1)) Σᵢ (uᵢ − ū)² ),

where the sum runs over all M pixel values uᵢ and ū is their mean. The entropy is calculated from the gray-level probabilities P using the formula E = Σ P · log(1/P).
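The texture features used in the paper (mean, standard deviation, entropy, and kurtosis, the values of which appear in Table I) can be sketched as follows. This is a minimal illustration, not the authors' code; it follows the paper's M − 1 divisor for the standard deviation and the usual fourth-moment definition of kurtosis.

```python
import math
from collections import Counter

def texture_features(image):
    """Mean, standard deviation, entropy, and kurtosis of a grey-level
    image given as a list of rows of pixel values."""
    u = [p for row in image for p in row]      # image reshaped to one column
    m = len(u)
    mean = sum(u) / m
    # Standard deviation with the M - 1 divisor, as in the paper's formula.
    std = math.sqrt(sum((x - mean) ** 2 for x in u) / (m - 1))
    # Entropy E = sum(P * log2(1/P)) over the grey-level histogram P.
    counts = Counter(u)
    entropy = sum((c / m) * math.log2(m / c) for c in counts.values())
    # Kurtosis: fourth central moment over the squared (population) variance.
    var = sum((x - mean) ** 2 for x in u) / m
    kurt = (sum((x - mean) ** 4 for x in u) / m) / var ** 2 if var else 0.0
    return mean, std, entropy, kurt

mean, std, entropy, kurt = texture_features([[10, 10], [20, 20]])
```

On a real grey-level image these four numbers form the feature vector that is later compared against the stored sample images.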
Kurtosis is defined as a measure of how outlier-prone a distribution is: it measures whether the distribution is tall and skinny, or short and squat, compared to a normal distribution of the same variance.

D. Color representation

Several color representations are currently in use in color image processing. The most common is the RGB space, where colors are represented by their red, green, and blue components, but color is often better represented in terms of hue, saturation, and intensity. An example of such a representation is the HSI space, which can be obtained from RGB coordinates in various ways, e.g., by defining the hue H = arctan( √3 (G − B), 2R − G − B ) (the two-argument arctangent), the saturation S = 1 − min(R, G, B)/I, and the intensity I = (R + G + B)/3, and by arranging them in a cylindrical coordinate system. The HSV space provides a description of color analogous to that of the HSI space: the hue H and the saturation S are defined similarly, while the value V is defined as V = max(R, G, B).

4. K-MEANS ALGORITHM

K-means is a rather simple but well-known algorithm for grouping objects, i.e., clustering. The K-means method is numerical, unsupervised, non-deterministic, and iterative. Again, all objects need to be represented as sets of numerical features, and the user has to specify the number of groups (referred to as k) to be identified. Each object can be thought of as a feature vector in an n-dimensional space, n being the number of features used to describe the objects to be clustered. The algorithm then randomly chooses k points in that vector space, which serve as the initial centers of the clusters. Afterwards, each object is assigned to the center it is closest to; the distance measure is usually chosen by the user and determined by the learning task. After that, a new center is computed for each cluster by averaging the feature vectors of all objects assigned to it. The process of assigning objects and recomputing centers is repeated until the process converges.
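The RGB-to-HSI conversion given in Section D above can be sketched as follows. This is a minimal illustration using the paper's formulas; the function name and the normalised 0..1 input range are our assumptions, and the hue is returned in radians.

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert normalised RGB (0..1) to HSI using the paper's formulas:
    H = arctan(sqrt(3)*(G - B), 2R - G - B)  (two-argument arctangent),
    S = 1 - min(R, G, B) / I,  I = (R + G + B) / 3.
    The HSV value channel would simply be max(R, G, B)."""
    i = (r + g + b) / 3.0
    h = math.atan2(math.sqrt(3.0) * (g - b), 2.0 * r - g - b)
    s = 1.0 - min(r, g, b) / i if i > 0 else 0.0
    return h, s, i

# Pure red: hue 0, full saturation, intensity one third.
h, s, i = rgb_to_hsi(1.0, 0.0, 0.0)
```

Arranged in a cylindrical coordinate system, H is the angle, S the radius, and I the height, which is the geometric picture behind the HSI description of color.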
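The assign-and-update loop of K-means just described can be sketched, for one-dimensional feature values, as follows. This is a minimal illustration, not the authors' implementation: the paper chooses the k initial centers at random, whereas for reproducibility this sketch simply takes the first k points, and the distance measure is the absolute difference.

```python
def kmeans(points, k, iters=100):
    """Plain k-means on 1-D feature values: assign each point to its
    nearest centre, recompute each centre as the mean of its members,
    and repeat until no assignment changes."""
    centres = list(points[:k])        # deterministic initial centres (sketch)
    assignment = None
    for _ in range(iters):
        # Assign every point to the index of its nearest centre.
        new_assignment = [min(range(k), key=lambda j: abs(p - centres[j]))
                          for p in points]
        if new_assignment == assignment:          # no point moved: converged
            break
        assignment = new_assignment
        # Recompute each centre as the mean of the points assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:                           # keep an empty cluster's centre
                centres[j] = sum(members) / len(members)
    return centres, assignment

centres, labels = kmeans([1.0, 1.1, 0.9, 8.0, 8.2, 7.8], k=2)
```

On this toy data the loop settles after a few passes, with the two centres near the means of the two obvious groups.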
The algorithm can be proven to converge after a finite number of iterations. Several tweaks concerning the distance measure, the choice of initial centers, and the computation of new average centers have been explored, as has the estimation of the number of clusters k; yet the main principle always remains the same.

A. K-means algorithm properties

There are always K clusters. There is always at least one item in each cluster. The clusters are non-hierarchical and they do not overlap. Every member of a cluster is closer to its own cluster center than to the center of any other cluster.

B. K-means algorithm process

The dataset is partitioned into K clusters and the data points are randomly assigned to the clusters, resulting in clusters that have roughly the same number of data points. Then, for each data point:
1) calculate the distance from the data point to each cluster center;
2) if the data point is closest to its own cluster, leave it where it is;
3) if the data point is not closest to its own cluster, move it into the closest cluster.
This pass is repeated until a complete pass through all the data points results in no data point moving from one cluster to another. At this point the clusters are stable and the clustering process ends. The choice of the initial partition can greatly affect the final clusters, in terms of inter-cluster and intra-cluster distances and cohesion.

5. CLASSIFICATION

Classification is the process of classifying the cancerous images: the features of the given image are extracted and compared with the features of stored sample images. In this paper, 35 sample images are used for classification; the features of these images are compared with those of the given image, and lung cancer is thereby detected.

6. RESULTS AND DISCUSSIONS

The proposed technique was applied to many images of lungs affected by cancer. The seed fill operation is performed on the given image: Figure 3 shows the input image, and Figure 4 shows the result of the seed fill operation. The region of interest is then taken from the given image; Figure 5 shows the region of interest. Texture feature extraction and color image segmentation are performed on the given image; the texture features of the given image are shown in Table I below. Figure 6 shows the segmented image, Figure 7 shows the RGB components of the image, and Figure 8 shows the classification of the cancerous images.

Figure 3: Query image
Figure 4: Seed fill operation
Figure 5: Region of interest
Figure 6: Segmented image

TABLE I: FEATURES OF A CANCEROUS IMAGE
Texture feature        Value
Mean                   139.269
Standard deviation     68.2518
Entropy                7.84002
Kurtosis               194.02
Area                   11495

Figure 7: RGB components
Figure 8: Classification of cancerous images
7. CONCLUSION

In this paper, color features and texture features are extracted, and the features of the given image are compared with those of 35 sample images for classification using an artificial neural network. Three of the images show lung cancer, at levels of 60%, 70%, and 80% respectively.