Detection of Lung Cancer Using Backpropagation Neural Networks and Genetic Algorithm Ms. Jennifer D Cruz 1, Mr. Akshay Jadhav 2, Ms. Ashvini Dighe 3, Mr. Virendra Chavan 4, Prof. J.L.Chaudhari 5 1, 2,3,4,5 Dr. D. Y. Patil, School of Engineering, Charholi (Bk), Lohegaon, Pune - 412105 Abstract Lung Carcinoma is a disease of uncontrolled growth of cancerous cells in the tissues of the lungs. The early detection of lung cancer is the key of its cure. Early diagnosis of the disease saves enormous lives, failing in which may lead to other severe problems causing sudden fatal death. In general, a measure for early stage diagnosis mainly includes X-rays, CT-images, MRI s, etc. In this system first we would use some techniques that are essential for the task of medical image mining such as Data Preprocessing, Training and testing of samples, Classification using Backpropagation Neural Network which would classify the digital X-ray, CT-images, MRI s, etc. as normal or abnormal. The normal state is the one that characterizes a healthy patient. The abnormal image will be further considered for the feature analysis. Further for optimized analysis of features Genetic Algorithm will be used that would extract as well as select features on the basis of the fitness of the features extracted. The selected features would be further classified as cancerous or noncancerous for the images classified as abnormal before. Hence this system will help to draw an appropriate decision about a particular patient s state. Keywords BackpopagationNeuralNetworks,Classification, Genetic Algorithm, Lung Cancer, Medical Image Mining. I. INTRODUCTION Lung Cancer is one of the leading causes of deaths in both men and women. In most cases the manifestation of lung cancer in a patient s body is revealed through early symptoms. Treatment and prognosis depend on the histological type of cancer, the stage and the patient s performance. Possible treatments include surgery, chemotherapy and radiotherapy. Survival depends on the stage, overall health and other factors. Currently there are no technical methods to prevent cancer, which is why early detection represents a vital factor in the treatment and increasing the survival rate. As medical images are an essential part of medical diagnosis and treatment we are concentrating on these images for optimized results. These images include prosperity of unseen information that are exploited by the doctors in making reasoned decisions about a patient. Extracting this relevant hidden information is a critical first step. This reason motivates us to use data mining techniques for efficient knowledge extraction. So we are using the data mining technique of data preprocessing for performing tasks like normalization, transformation, cleaning and formatting to get a good and enhanced quality of image for the process of feature extraction. The neural network will be trained and tested using an available database and the backpropagation algorithm. The process of feature selection will be carried out to select the essential features from the image and classify the image as cancerous or non-cancerous using the backpropagation neural network. Thus this system will help the doctors to conclude or make decision about a patient s condition. II. LITERATURE SURVEY Many of the existing system show various methods of detecting the lung carcinoma. One such system proposed by Sandeep Kumar Saini[1] suggests the use of fuzzy logic for the testing of the features extracted for its accuracy and the ACO technique for optimized classification of the cancerous images. In this system the lung CT image is applied to the process of preprocessing. After this feature extraction is done using the Binarization Approach. Then classification is done using fuzzy logic and ACO techniques to get the results as cancerous or non-cancerous image. Thus the system detects lung carcinoma in its early stages using fuzzy logic and ACO technique. Analogously Zakaria Suliman Zubi [2] proposes the technique of generalized Neural Networks and Association Rule Mining in order to identify the characteristics of the images and classify them into cancerous and non-cancerous. The author has made use of x-ray chest films which are stored in huge multimedia databases for a medical purpose. The multimedia database is applied to some image recognition methods to extract useful knowledge and rules from the database. These rules are given to the neural network to classify the database and help the doctors to decide on a patient s condition. The comparative study of the various techniques that can be used for the detection of lung cancer such as If-Then Rules, Decision-Tree, Naïve Bayes Classifier, and Artificial Neural Network is studied by V.Krishnaiah [3]. The author has proposed a system which extracts hidden knowledge from historical lung cancer database. The Decision tree is used to drill through the database and interpret it. The IF-THEN rule, Naïve Bayes and artificial neural network are used to classify 823
the database. Thus the results are obtained after classification and the lung cancer prediction system is used for early detection which can save a patient s life. Also Ankit Agarwal [4] proposed a system that performs Association Rule mining analysis on lung cancer data to identify hotspots in the cancer data, where the patient survival time is significantly higher than and lower than the average survival time across the entire dataset. In this system the use of SEER database is made from the SEER website on o license agreement and the data mining method of association rule mining is performed to classify the images as cancerous or non-cancerous. Here a two stage association rule mining is used where the redundant rules from stage 1 are removed in stage 2 and classification is done to identify hotspots in lung cancer data. Sr. No. Paper Author Classification Methods used 1 Detection Of Lung Carcinoma using Fuzzy Logic and ACO Techniques 2 Using Some Data Mining Techniques For Early Diagnosis Of Lung Cancer: Sandeep Kumar Saini, Gaurav, Amita Choudhary Zakaria Suliman Zubi and Rema Asheibani Saad Fuzzy Logic and ACO Technique. Neural Networks, Association Rule Mining. Dr.M.Thangamani [5] has proposed a system for automatic detection of lung cancer using adoptive neuro fuzzy system. The proposed system extracts the lung region from the lung CT image using morphological reconstruction. After this nodule segmentation is done using global threshold and morphological operations. The feature extraction is done from these segmented nodules and based on these features classification of the cancerous or non-cancerous image is done using an ANFIS based classifier. 3 Diagnosis Of Lung Cancer Prediction System: Using Data Mining Classification Techniques V.Krishnaiah, Dr. G. Narsimha, Dr. N. Subhash Chandra If-Then Rules, Decision-Tree, Naive Bayes Classifier, Artificial Neural Network. In the research article of Early Detection of Lung Cancer Risk Using Data Mining Kawsar Ahmed [6] has used the methods of Apriori Algorithm. The article proposes the use of a lung cancer database, applying the database to preprocessing methods and then removing various patterns from this data using the apriori algorithm. After the patterns are removed a clustering algorithm is used to create clusters having patterns with different ranges of risk. Hence by using these data mining techniques the early detection of lung cancer risks in various ranges in achieved. 4 Identifying Hotspots In Lung Cancer Data Using Association Rule Mining 5 Implementation of automatic detection of lung cancer using Adoptive Neuro Fuzzy system Ankit Agarwal and Alok Choudhary S.Navin Kumar, R.Rishikesh,Dr. M. Thangamani, V.Santhoshkumar, A.SenthilKarthick Association Rule Mining. Adoptive Neuro Fuzzy System 6 Early Detection of Lung Cancer Risk Using Data Mining Kawsar Ahmed, Abdullah-Al Emran, Tasnuba Jesmin, Roushney Fatima Mukti, Md Zamilur Rahman, Farzana Ahmed Apriori Algorithm 824
III. METHODOLOGY The first step is to collect the Lung CT images to initially classify them as normal or abnormal. Images are collected from the different available databases. These lung cancer images are in the format of.dat file. The next step will be to apply Data Preprocessing on this images.this phase is necessary to improve the quality of the image so as to make feature extraction phase more reliable. For this purpose the input i.e. RGB images are transformed to Gray-scale images. Post enhancing the image, the training and testing of the Artificial Neural Network is done using available database. Feature extraction of the input images is done using Genetic Algorithm. The artificial neural network has to be initially trained with a training dataset for learning and performing classification. Images have a large number of features and it is important to extract and use the required features for reducing the complexity of processing. In image processing feature extraction is performed on the images for deriving the values (features) that prove to be informative, non-redundant and help in the learning and knowledge discovery about the given images for carrying out better human interpretations. According to the features extracted, the classification of the images is done as normal or abnormal images using Backpropagation Neural Network. Now, the process of feature selection is performed on the abnormal images using Genetic Algorithm.This process makes the task of classification easy, as the classification has to be performed on the basis of the relevant features selected from the input image. The features are selected according to the fitness value of the features extracted, only those features are selected whose fitness value is equal to the threshold The features thus selected are used to classify the abnormal image as cancerous or non-cancerous using Backpropagation Neural Network Neural Networks: Neural network is an interconnected web of neurons that transmits elaborate patterns of electrical signals. The human brain can be called as a biological neural network. An artificial neural network is a family of learning models based on biological neural networks. Advantages of Neural Network: Neural networks are flexible in changing the environment. A neural network learns to recognize the patterns which exist in the dataset. Neural networks can easily build informative models and handle very complex situations. IV. ALGORITHMS USED 1. Backpropogation Neural Network: This is a learning algorithm used for training the artificial neural network. Primarily, the backpropagation process consists of two passes- forward pass and backward pass through which the different layers of the network are trained. The algorithm for Backpropagation can be given as follows: 1. Firstly, initialize the weights randomly. 2. An input vector pattern is presented to the network. 3. Evaluate the outputs of the network by propagating input signals forward. 4. For all output phase neurons calculate the desired output. 5. For all other neurons (from last hidden layer to first), compute output of the succeeding layer. 6. Update the weights according to the learning rate. 7. Go to step 2 for a certain number of iterations, or until the error is less than a pre-specified value. 2. GENETIC ALGORITHM: This is a heuristic search algorithm based on the method of natural selection and genetics. It generates solutions for optimization problems using techniques such as inheritance, mutation, selection and crossover. Genetic Algorithm is given as: 1. Create a random initial population (t). 2. Evaluate a fitness for every individual in the current population. 3. Select the fittest features i.e. parents from the population. 4. Perform crossover on the parents creating a population (t+1). 5. Now, perform mutation on the population (t+1), which in short means make some random changes in every individual of the population. 6. Determine the fitness of the finally generated population. 7. Repeat steps 2 to 6 until a condition where the best features that meets a predefined minimum criteria is found. 825
System Flow: System START Login Upload images Training and Testing Feature Extraction (GA) BPNN (Abnormal/ Normal) Feature Selection (GA) BPNN (Cancerous/ Non-Cancerous) Result Logout Flow Algorithm: 1. START 2. Login 3. Upload the X-ray images or CT-images of the lungs to be tested for lung cancer. 4. The uploaded images then go through the process of Preprocessing, in which it is converted from RGB to Grayscale. 5. The artificial neural network is trained and tested using the learning database to perform pattern classification totest the raw database provided and the pattern of classification learnt is considered as the features of the images using Genetic Algorithm. 6. The artificial neural network classifies the uploaded images as normal or abnormal using Backpropagation Algorithm. If the image is classified as normal go to step 9. Else continue. 7. Required features of the images classified as abnormal are selected using the Genetic Algorithm. 8. The features thus selected are further used to classify the image as cancerous or non-cancerous using the Backpropagation Algorithm. 9. The results are displayed. 10. STOP V. CONCLUSION We have studied an approach towards the system for the early detection of lung cancer which will benefit the patient, as the disease can be cured in its early stages and will also help to increase the survival rate. This system provides a framework for detecting the lung cancer using the Lung CT image/x-ray chest image and providing the results concerninga particular patient s health state by making the use of Backpropagation Algorithm and the Genetic Algorithm. ACKNOWLEDGMENT We would like to thank Dr.D.Y.Patil School Of Engineering for providing us with all the required amenities. We are also grateful to Prof. S. S. Das, Head of Computer Engineering Department, DYPSOE, Lohegaon, Punefor their indispensable support, suggestions and motivation during the entire course of the project. STOP 826
REFERENCES [1] Sandeep Kumar Saini, Gaurav, Amita Choudhary, Detection Of Lung Carcinoma using fuzzy Logic and ACO Techniques, IJERT, Vol.3 Issue 8, pp.903-906,2014 [2] Zakaria Suliman Zubi and Rema Asheibani Saad, Using Some Data Mining Techniques for Early Diagnosis of Lung Cancer, Rescent researches in Artificial Intelligence, Knowledge Engineering and data Bases, Libya,2007 [3] V. Krishnaiah, Dr. G. Narsimha, Dr. N. Subhash Chandra, 2013, Diagnosis of Lung Cancer Prediction System Using Data Mining Classification Techniques, International Journal Of Computer Scienceand Informayion Technologies,Vol.4(1),2013,39-45. [4] Ankit Agarwal and Alok Choudhary, Identifying Hotspots in Lung cancer Data Using Association Rule Mining,, IEEE, pp.995-1002, 2011. [5] S.NavinKumar,R.Rishikesh,Dr.M.Thangamani,V.Santhoshkumar, A.SenthilKarthick, Implementation of automatic detection of lung cancer using Adoptive Neuro Fuzzy system,international Journal of Advances in Science and Technology,2012 [6] Kawsar Ahmed, Abdullah-Al Emran, Tasnuba Jesmin, Roushney Fatima Mukti, Md Zamilur Rahman, Farzana Ahmed, Early Detection of Lung cancer Risk Using Data Mining Techniques, DOI:http://dx.doi.org/10.7314/APJCP.2013.14.1.595 [7] J.David Schffer, Angel Janevski and Mark R.Simpson,AGenetic Algorithm Approach for Discovering Diagnostic Pattern in Molecular Measurement Data,,IEEE,2005. [8] P. Nithya, B.Umamaheshwari, R. Deepa, Detection Of Lung Cancer Using Data Mining Classification Techniques, IJARCSSE, Vol. 5, Issue 7, July 2011 827