Automated Approach for Qualitative Assessment of Breast Density and Lesion Feature Extraction for Early Detection of Breast Cancer

1 Spandana Paramkusham, 2 K. M. M. Rao, 3 B. V. V. S. N. Prabhakar Rao
Electronics and Electrical Engineering, BITS Pilani Hyderabad, India
1 spandanamadhav@gmail.com, 2 kundammrao@gmail.com, 3 budhiraju@bits-hyderabad.ac.in

Abstract
Breast cancer is one of the leading causes of death in women. Mammography is an effective modality for early detection of breast cancer. Increased mammographic breast density is a moderate independent risk factor for breast cancer. Radiologists estimate breast density using four broad categories (BI-RADS), relying on visual assessment of mammograms. Measuring breast density quantitatively can provide more accurate and reliable density measures. Breast density and lesion feature extraction play an important role in determining cancer risk. The breast contour helps to find the position of the nipple, which is important for registering the left and right breasts to detect bilateral asymmetry. The shape of the mass border helps radiologists judge whether a mass is malignant or benign. Novel algorithms are designed for 1) breast density estimation, 2) breast border extraction, 3) segmentation of the mass and derivation of the mass border, and 4) extraction of Haralick features [15] from the mass. These features support further clinical evaluation and classification to detect cancer in its early stages. We processed fourteen mammograms for breast border extraction, and we segmented the masses and calculated features for the six patients who have masses.

Keywords
Breast density, Mass, Malignant, Benign, Feature Extraction.

I. INTRODUCTION
Breast cancer is a leading cause of death in women, more so in urban areas in India. It accounts for about 25% to 33% of all cancers in women. Mammography is an effective technique for detecting breast cancer in its early stages.
About 50% of breast cancer patients in India present in stages 3 and 4 [1], so there is an urgent need to diagnose breast cancer in its early stages. Inadequate image quality makes it difficult for radiologists to detect subtle signs of breast cancer such as masses and microcalcifications. Several image processing techniques have been developed to improve the detection of abnormal features in mammograms, in order to increase the survival rate and the chances of complete recovery.

Breast density is a significant measure that indicates the presence of abnormality. Detecting malignant lesions in a dense breast is very difficult. Wolfe [2] inferred that there is a relation between parenchymal pattern and breast cancer. An automated approach to assess breast parenchymal density qualitatively helps radiologists. Masses are the most common asymmetric signs of cancer and appear brighter than the surrounding tissue [3]. Most benign masses have well-defined, sharp borders, while malignant tumors often have ill-defined, microlobulated, or spiculated borders; extracting these features helps classification. Bilateral asymmetry, an asymmetry of the breast parenchyma between the left and right breasts, may indicate breast cancer in its early stage.

Many techniques have been developed for detecting bilateral asymmetry, assessing breast density, and extracting the breast contour, all of which assist radiologists in the early detection of breast cancer. In [4], minimum cross entropy was applied to obtain threshold values for segmenting the main core of the glandular region. In [5], breast density values were estimated by segmenting the breast region with a statistical approach, and the authors concluded that breast cancer patients have higher breast density. Extracting the breast contour is also important for finding the position of the nipple, which is needed for mass detection in later stages and for aligning the left and right breasts. In [6], the breast border is extracted using polynomial modeling.
In [7], the mass is segmented using a region growing technique in which the Harris corner technique provides the seed value. Feature extraction from suspicious regions helps doctors detect cancer in its early stages [8]. In [9], the mass was segmented using an isocontour map, and texture and shape features were extracted for further classification. Wavelet features are extracted along circular scan lines of the extracted mass in [10]. In [11], bit planes 6 and 7 were considered for the extraction of statistical features, and logical mapping was done for the mean and standard deviation. Morphological features were calculated for microcalcifications in [12].

In earlier work, we designed algorithms for image enhancement and segmentation and calculated bilateral asymmetry [13]. We also segmented the mass region and superimposed the mass boundary on the mass [14], so that radiologists can observe the mass lesion exactly, and extracted geometric, wavelet, and texture features from the mass.

In this paper we developed 1) automated breast density estimation by segmenting the glandular region, 2) lesion mask extraction and feature extraction for classification into malignant and benign, and 3) breast boundary detection to find the position of the nipple, which is important for aligning the left and right breasts.
II. IMPLEMENTATION USING LABVIEW AND MATLAB

A. Estimation of breast density
In this paper, we use the K-means clustering algorithm, which groups objects into K clusters based on their features, where K is a positive integer. To estimate breast density, we segment the glandular region using K-means. The inputs are the image pixels, and their features are their grey-level values. The algorithm minimizes the sum of the distances from each pixel to its cluster centroid; we use the Euclidean distance as the distance measure. We processed seven cases, of which five are cancer patients and two are benign cases.

Algorithm:
1) Read the image using MATLAB
2) Choose the number of clusters (K = 2)
3) Apply the K-means algorithm
4) Extract the glandular region
5) Calculate the area of the glandular region
6) Segment the breast area from the background by applying Otsu's thresholding
7) Calculate the breast area
8) Breast density = (Number of dense tissue pixels / Number of pixels in the breast) x 100

Figure 2: a) Cancer mammogram b) Dense region of (a)
Figure 3: a) Benign mammogram b) Dense region of (a)

Table 1: Breast density (BD) for the CC left (CCL) and CC right (CCR) views of patients PN1 to PN7 (PN: Patient Number). Values in the order listed in the source: 47.3, 40, 61, 38.5, 28, 47, 36, 42, 31, 31.1, 38.5, 77.37, 70, 45.9

Figure 2(a) and Figure 2(b) show a cancer mammogram and its segmented glandular region; Figure 3(a) and Figure 3(b) show a benign mammogram and its segmented dense region. In Table 1, PN1, PN2, PN3 and PN6 are the patients who have malignant masses, PN7 has cancer calcifications, and PN4 and PN5 are the patients who have benign masses, as diagnosed by a radiologist.

Observations: We observed that 1) breasts with a mass have high density in both malignant and benign cases, and 2) cancer patients have high breast density (>60) wherever a mass is present.
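The density estimation steps listed in the algorithm of this section can be sketched in plain Python. This is only an illustration, not the authors' MATLAB implementation: the function names (`kmeans_1d`, `otsu_threshold`, `breast_density`) are hypothetical, the centroid initialization is an assumption because the paper does not state one, the brighter of the two clusters is assumed to be the glandular (dense) tissue, and pixels are assumed to be integer grey levels in 0..255.

```python
def kmeans_1d(values, k=2, iters=100):
    """Lloyd's K-means on scalar grey levels (Euclidean distance in 1-D)."""
    lo, hi = min(values), max(values)
    # Assumption: spread the initial centroids evenly over the grey-level range.
    centroids = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    for _ in range(iters):
        sums, counts = [0.0] * k, [0] * k
        for v in values:
            c = min(range(k), key=lambda j: abs(v - centroids[j]))
            sums[c] += v
            counts[c] += 1
        updated = [sums[j] / counts[j] if counts[j] else centroids[j]
                   for j in range(k)]
        if updated == centroids:  # assignments no longer change
            break
        centroids = updated
    return centroids


def otsu_threshold(values, levels=256):
    """Otsu's method: threshold that maximizes between-class variance."""
    hist = [0] * levels
    for v in values:
        hist[v] += 1
    total = len(values)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        sum0 += t * hist[t]
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var = w0 * w1 * (m0 - m1) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t  # pixels strictly above best_t count as breast tissue


def breast_density(image):
    """Percentage of breast pixels that fall in the brighter K-means cluster."""
    pixels = [p for row in image for p in row]
    centroids = kmeans_1d(pixels, k=2)
    bright = max(range(2), key=lambda j: centroids[j])
    dense = sum(1 for p in pixels
                if min(range(2), key=lambda j: abs(p - centroids[j])) == bright)
    breast = sum(1 for p in pixels if p > otsu_threshold(pixels))
    return 100.0 * dense / breast
```

On a real mammogram the same ratio would be computed per view (CC left, CC right), which is how the values in Table 1 are organized.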
PN7, who has cancer calcifications, has a high breast density of 77.37.

Figure 1: Flow chart to estimate breast density (read the original image; apply K-means with K = 2; extract the dense region from the breast; calculate the number of pixels in the glandular region; segment the breast area from the background; calculate the number of pixels in the breast area; estimate the breast density)

B. Extraction of breast border
A multiresolution hierarchy (Gaussian pyramid) is generated for the mammogram image. The hierarchy consists of two levels (0 and 1). The standard resolution of the level 0 image is 1914 x 2294. The level 1 image is obtained by reducing the size of the level 0 image to 957 x 1147 with a cubic spline resampling technique. The level 1 image is then upscaled back to the original size, and image arithmetic division is applied to the level 0 image and the upscaled level 1 image. Morphological dilation with a 5x5 structuring element is applied to the output to obtain the correct border.
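The pyramid-and-division idea described above can be sketched in plain Python. This is a rough illustration under several assumptions, not the LabVIEW implementation: a 3x3 binomial kernel stands in for the Gaussian filter, nearest-neighbour resampling stands in for cubic spline resampling, the `+ 1.0` offset and the tolerance `tol` that turn the ratio image into a binary edge map are hypothetical choices the paper does not specify, and the thinning step is omitted. All function names are illustrative.

```python
def smooth(img):
    """3x3 binomial (Gaussian-like) filter with edge replication."""
    h, w = len(img), len(img[0])
    k = (1, 2, 1)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += k[dy + 1] * k[dx + 1] * img[yy][xx]
            out[y][x] = acc / 16.0
    return out


def downscale(img):
    """Halve the resolution by keeping every second row and column."""
    return [row[::2] for row in img[::2]]


def upscale(img, h, w):
    """Nearest-neighbour upscaling back to h x w (stand-in for cubic spline)."""
    return [[img[min(y // 2, len(img) - 1)][min(x // 2, len(img[0]) - 1)]
             for x in range(w)] for y in range(h)]


def dilate(mask, size=5):
    """Binary dilation with a size x size square structuring element."""
    h, w, r = len(mask), len(mask[0]), size // 2
    return [[any(mask[yy][xx]
                 for yy in range(max(0, y - r), min(h, y + r + 1))
                 for xx in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)] for y in range(h)]


def border_mask(img, tol=0.2):
    """Level 0 divided by upscaled level 1; large deviations from 1 mark edges."""
    h, w = len(img), len(img[0])
    level1 = smooth(downscale(smooth(img)))   # filter, downscale, filter
    up = upscale(level1, h, w)                # back to the original size
    # Assumption: +1.0 avoids division by zero in the dark background.
    ratio = [[(img[y][x] + 1.0) / (up[y][x] + 1.0) for x in range(w)]
             for y in range(h)]
    edges = [[abs(r - 1.0) > tol for r in row] for row in ratio]
    return dilate(edges, 5)                   # 5x5 dilation as in the text
```

The ratio stays close to 1 in flat regions (inside the breast and in the background) and departs from 1 only where the blurred level 1 image differs from level 0, which is why the division highlights the breast border.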
Procedure:
Step 1: A Gaussian filter is applied to the level 0 image, which is then downscaled with the cubic spline resampling technique in LabVIEW. The resolution of the level 1 image is now 957 x 1147.
Step 2: Apply a Gaussian filter to the level 1 image.
Step 3: Upscale the level 1 image to the original size using cubic spline interpolation.
Step 4: Apply image arithmetic division to the level 0 image and the upscaled level 1 image.
Step 5: Apply morphological operations, thinning and dilation, with a 5x5 structuring element.

Figure 5 and Figure 7 show CC and MLO views of the original mammograms and their extracted borders; Figure 6 shows the LabVIEW implementation.

Figure 4: Flow chart to extract border (original image; Gaussian filter; downscaling by half; Gaussian filter; upscale to original size; arithmetic division; morphological thinning; morphological dilation)
Figure 5: a) Original mammogram b) Extracted border
Figure 6: Extraction of border implementation in LabVIEW
Figure 7: a) MLO left b) Border

C. Segmentation of mass and border extraction
The mask of the lesion is obtained by applying a manual threshold chosen from the histogram, followed by morphological operations. The method does not require any seed point. Thresholding, morphological dilation and opening are carried out using MATLAB. The process is shown in the flow chart in Figure 8.

Figure 8: Flow chart of segmentation (grey-level image; manual threshold; morphological operations; logical AND with original image; border extraction)
Figure 9: a) Original b) Malignant mass c) Ill-defined border
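The segmentation flow of Figure 8 can be sketched in plain Python as follows. This is a minimal sketch, not the authors' MATLAB code: the function names, the 3x3 structuring element, and the use of a single opening after thresholding are assumptions (the paper mentions dilation and opening but does not give their order or sizes), and the border is taken here as the mask minus its erosion, which is one common choice.

```python
def erode(mask, size=3):
    """Binary erosion; pixels outside the image are treated as background."""
    h, w, r = len(mask), len(mask[0]), size // 2
    return [[all(0 <= yy < h and 0 <= xx < w and mask[yy][xx]
                 for yy in range(y - r, y + r + 1)
                 for xx in range(x - r, x + r + 1))
             for x in range(w)] for y in range(h)]


def dilate(mask, size=3):
    """Binary dilation with a size x size square structuring element."""
    h, w, r = len(mask), len(mask[0]), size // 2
    return [[any(mask[yy][xx]
                 for yy in range(max(0, y - r), min(h, y + r + 1))
                 for xx in range(max(0, x - r), min(w, x + r + 1)))
             for x in range(w)] for y in range(h)]


def opening(mask, size=3):
    """Erosion followed by dilation; removes specks smaller than the element."""
    return dilate(erode(mask, size), size)


def segment_mass(img, threshold):
    """Return (masked image, border mask) for a manually chosen threshold."""
    binary = [[p >= threshold for p in row] for row in img]
    mask = opening(binary)
    # Logical AND with the original image: keep grey levels inside the mask.
    masked = [[p if m else 0 for p, m in zip(prow, mrow)]
              for prow, mrow in zip(img, mask)]
    # Border: mask pixels that disappear under erosion.
    inner = erode(mask)
    border = [[m and not e for m, e in zip(mrow, erow)]
              for mrow, erow in zip(mask, inner)]
    return masked, border
```

The threshold is passed in explicitly because the paper selects it manually from the histogram; no seed point is needed, in line with the description above.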
Figure 10: a) Original b) Benign mass c) Lobulated border

Figure 9(a) and Figure 10(a) are the original mammograms, Figure 9(b) and Figure 10(b) are the extracted masses, and Figure 9(c) and Figure 10(c) show the borders.

D. Feature Extraction from the segmented mass
We extracted the masses of the six patients who have masses, of which PN1, PN2, PN3 and PN6 have malignant masses and PN4 and PN5 have benign masses. Haralick features are calculated for the ROI of the segmented masses shown in Figure 9(b) and Figure 10(b). Let p(i,j) be the (i,j)th entry in a normalized GLCM. The mean values for the rows and columns of the matrix are:

\[ \mu_x = \sum_i \sum_j i \, p(i,j), \qquad \mu_y = \sum_i \sum_j j \, p(i,j) \]

The standard deviations for the rows and columns of the matrix are:

\[ \sigma_x = \sqrt{\sum_i \sum_j (i - \mu_x)^2 \, p(i,j)}, \qquad \sigma_y = \sqrt{\sum_i \sum_j (j - \mu_y)^2 \, p(i,j)} \]

Here p(i,j) is the (i,j)th entry in a normalized GLCM, and p_x(i) is the ith entry in the marginal probability matrix obtained by summing the rows of p(i,j). We calculated thirteen Haralick features for the six patients having masses, using the GLCM of each patient's mass image.

Table 2: Haralick features and their feature numbers
Feature                  Feature number
Energy                   1
Contrast                 2
Correlation              3
Sum of variances         4
Inverse difference moment 5
Sum average              6
Sum variance             7
Sum entropy              8
Entropy                  9
Difference variance      10
Difference entropy       11
Information measure 1    12
Information measure 2    13

Table 3: Haralick feature values for the extracted masses of six patients
FN   PN1     PN2     PN3     PN4     PN5     PN6
1    0.246   0.418   0.442   0.340   0.360   0.339
2    0.205   0.172   0.224   0.117   0.103   0.140
3    973.6   560.0   604.2   420.6   367.06  479.0
4    21.56   25.11   29.7    14.18   13.48   15.94
5    0.947   0.975   0.972   0.968   0.965   0.970
6    8.009   8.132   9.273   6.183   5.963   6.48
7    62.68   84.87   101.8   42.24   40.40   48.21
8    1.66    1.048   0.986   1.337   1.3     1.36
9    1.738   1.083   1.026   1.382   1.350   1.404
10   0.1     0.113   0.114   0.109   0.1072  0.110
11   0.364   0.205   0.229   0.251   0.2667  0.235
12   -0.76   -0.82   -0.80   -0.82   -0.8    -0.82
13   0.94    0.883   0.864   0.921   0.9128  0.93
FN: Feature Number, PN: Patient Number

... high breast density. We computed thirteen Haralick parameters from the extracted mass region. These parameters help to classify benign and malignant masses. In future, we would like to develop a model to classify malignant and benign breast images based on parameters such as bilateral asymmetry, breast density, the border of the mass, and the Haralick parameters.

IV. ACKNOWLEDGEMENT
We thank the Director, BITS, Hyderabad for supporting our research work and providing facilities. The authors gratefully acknowledge KIMS, Hyderabad for supporting the research work by providing mammographic images, analyzing the outputs, and providing useful comments.

REFERENCES
[1] http://www.breastcancerindia.net/bc/statistics/stati.htm
[2] Wolfe, Mammographic Parenchymal Patterns and Breast Cancer Risk.
[3] http://www.macmillan.org.uk/cancerinformation/causesriskfactors/pre-ancerous/breastcalcifications.aspx
[4] S. Tzikopoulos, H. Georgiou, M. Marvoforakis and S. Theodoridis, Full Automated scheme for breast density estimation and asymmetry detection of mammograms.

Figure 11: Plots of features versus number of patients, panels (a), (b), (c)

Among the above features, contrast, correlation, sum of variances, and sum average could delineate malignant and benign masses. Table 2 gives the names of the Haralick features and Table 3 gives the values of the Haralick features for the extracted masses of the six patients. Figure 11(a), 11(b) and 11(c) plot the Haralick features of the patients.

Observations: The border extraction is a preprocessing step used to find the nipple point and the end points of the border. These points are used to align the left and right breasts in order to detect bilateral asymmetry.
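The Haralick values reported for the mass ROIs are derived from a normalized GLCM p(i,j). As a rough illustration of that computation, the sketch below builds a GLCM and evaluates five of the thirteen features. The horizontal neighbour offset, the symmetric counting, the number of grey levels, and the function names `glcm` and `haralick_subset` are all assumptions; the paper does not state which offset or quantization was used.

```python
import math


def glcm(img, levels, dx=1, dy=0):
    """Normalized, symmetric grey-level co-occurrence matrix for one offset."""
    counts = [[0] * levels for _ in range(levels)]
    h, w = len(img), len(img[0])
    for y in range(h):
        for x in range(w):
            yy, xx = y + dy, x + dx
            if 0 <= yy < h and 0 <= xx < w:
                i, j = img[y][x], img[yy][xx]
                counts[i][j] += 1
                counts[j][i] += 1  # count each pair in both directions
    total = sum(sum(row) for row in counts)
    return [[c / total for c in row] for row in counts]


def haralick_subset(p):
    """Energy, contrast, correlation, inverse difference moment, entropy."""
    idx = range(len(p))
    cells = [(i, j, p[i][j]) for i in idx for j in idx]
    # Row and column means and standard deviations of the GLCM.
    mu_x = sum(i * v for i, j, v in cells)
    mu_y = sum(j * v for i, j, v in cells)
    sd_x = math.sqrt(sum((i - mu_x) ** 2 * v for i, j, v in cells))
    sd_y = math.sqrt(sum((j - mu_y) ** 2 * v for i, j, v in cells))
    return {
        "energy": sum(v * v for _, _, v in cells),
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        "correlation": sum((i - mu_x) * (j - mu_y) * v
                           for i, j, v in cells) / (sd_x * sd_y),
        "idm": sum(v / (1 + (i - j) ** 2) for i, j, v in cells),
        "entropy": -sum(v * math.log(v) for _, _, v in cells if v > 0),
    }
```

In practice the mass ROI would first be quantized to a fixed number of grey levels, and features are often averaged over several offsets; both choices change the absolute feature values, so values from this sketch are not directly comparable with those in the table.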
The border of the mass helps radiologists make a preliminary judgment of whether the mass is ill-defined (malignant) or well-defined (circular or lobulated). Contrast, correlation, sum average, and sum variance could delineate benign and malignant masses, as shown in Figure 11.

III. CONCLUSION
In this study, the hospital provided images of 14 patients, of which 6 patients have lesions and 1 patient has cancer calcifications; PN1, PN2, PN3, PN6 and PN7 are malignant and PN4 and PN5 are benign. We extracted the border of the mammogram in both the CC view and the MLO view to detect the nipple location for registration. The border of the mass is extracted from the segmented mass to define the shape of the border. We calculated the breast density of seven mammograms, of which the cancer patients having a mass have high breast density.

[5] L. Li, Z. Wu, L. Chen, F. George, Z. Chen, A. Salem and M. Kallegiri, (2005), Breast Tissue Density and CAD Cancer Detection in Digital Mammography, IEEE EMBS Int conf., Sep 1-4, 2005.
[6] H. Mirzaalian, M. R. Ahmadzadeh and F. Kolahdoozan, (2006), Breast Contour on Digital Mammogram, ICTTA in Information and Communication Technologies, Vol 1, pp 1804-1801, 2006.
[7] B. Senthilkumar, G. Umamaheswari and J. Karthik, (2010), A novel region growing segmentation algorithm for the detection of breast cancer, IEEE Int conf. in Computational Intelligence and Computing Research, pp 1-4, Dec 2010.
[8] H. Al-Shamlan and A. El-Zaart, (2010), Feature extraction values for breast cancer mammography images, IEEE Int conf. on Bioinformatics and Biomedical Technology, pp 335-340, April 2010.
[9] W. Han, J. Dong, Y. Guo, M. Zhang and J. Wang, (2011), Identification of masses in digital mammogram using an optimal set of features, IEEE conf. on Trust, Security and Privacy in Computing and Communications, pp 1763-1768, Nov 2011.
[10] J. K. Dash and L. Sahoo, (2012), Wavelet Based Features of Circular Scan Lines for Mammographic Mass Classification, IEEE conf. Recent Advances in Information Technology (RAIT), pp 58-61, March 2012.
[11] M. Tayel and A. Mohsen, (2011), Statistical Measures and Criteria for ROI Identification in Breast Mammograms, IEEE Colloquium on Humanities, Science and Engineering, pp 922-927, Dec 2011.
[12] F. G. G. Elpídio, L. M. Brasil, J. M. Lamas, C. J. Miosso and L. A. Lemos, (2012), Morphological analysis for
feature extraction and classification of breast Calcifications, IEEE conf. on Health Care Exchanges (PAHCE), Pan American, pp 46-49, March 2012.
[13] P. Spandana, K. M. M. Rao, B. V. V. S. N. Prabhakar Rao, and Jwalasrikala, (2013), Novel Image Processing Techniques for Early Detection of Breast Cancer, Matlab and Labview implementation, in IEEE Point-of-Care Healthcare Technologies (PHT), pp 105-108, January 2013.
[14] P. Spandana, K. M. M. Rao and B. V. V. S. N. Prabhakar Rao, (2013), Early Stage Detection of Breast Cancer Using Novel Image Segmentation and Feature Extraction Techniques, Matlab and Labview Implementation, ICACT in Advanced Communication Technology 2013 (in Communication).
[15] M. Mustra, M. Grgic and K. Delac, (2010), Feature Selection for Automatic Breast Density Classification, IEEE ELMAR, pp 9-16, Sep 2010.