Code Clone Analysis Environment for Software Maintenance

Similar documents

Record of Revisions to Patient Tracking Spreadsheet Template

Extended G/L Segment Codes

CSE 331, Spring 2000

How to Get Set Up and Running with NDepend

NATIONAL SENIOR CERTIFICATE GRADE 12

P02-03 CALA Program Description Proficiency Testing Policy for Accreditation Revision 1.9 July 26, 2017

ICT4LIFE. Final Conference. ICT4Life field work - tailored solutions in diverse regional context Ariane Girault, E-Seniors Association

Data Fusion for Predicting Breast Cancer Survival

EXPLORING THE PROCESS OF ASSESSMENT AND OTHER RELATED CONCEPTS

Breast Cancer Awareness Month 2018 Key Messages (as of June 6, 2018)

Module 6: Goal Setting

Success Criteria: Extend your thinking:

AP Biology Lab 12: Introduction to the Scientific Method and Animal Behavior

DATA RELEASE: UPDATED PRELIMINARY ANALYSIS ON 2016 HEALTH & LIFESTYLE SURVEY ELECTRONIC CIGARETTE QUESTIONS

Properties detailed info There are a few properties in Make Barcode to set for the output of your choice.

Commun. Theor. Phys. (Beijing, China) 38 (2002) pp. 555{560 c International Academic Publishers Vol. 38, No. 5, November 15, 2002 Capability Analysis

Podcast Transcript Title: Common Miscoding of LARC Services Impacting Revenue Speaker Name: Ann Finn Duration: 00:16:10

2017 CMS Web Interface

How to become an AME Online

PET FORM Planning and Evaluation Tracking ( Assessment Period)

Model-driven Reengineering for a Blue Planet - Refactoring for Energy Efficiency -

Swindon Joint Strategic Needs Assessment Bulletin

WU-Minn HCP 900 Subjects Data Release: Reference Manual

EMC believes the information in this publication is accurate as of its publication date. The information is subject to change without notice.

BIOLOGY 101. CHAPTER 13: Meiosis and Sexual Life Cycles: Variations on a Theme

Campus Climate Survey

The estimator, X, is unbiased and, if one assumes that the variance of X7 is constant from week to week, then the variance of X7 is given by

Assessment Field Activity Collaborative Assessment, Planning, and Support: Safety and Risk in Teams

CONSENT FOR KYBELLA INJECTABLE FAT REDUCTION

PROCEDURAL SAFEGUARDS NOTICE PARENTAL RIGHTS FOR PRIVATE SCHOOL SPECIAL EDUCATION STUDENTS

A Phase I Study of CEP-701 in Patients with Refractory Neuroblastoma NANT (01-03) A New Approaches to Neuroblastoma Therapy (NANT) treatment protocol.

Strategies for Avoiding Plagiarism Part Two: Paraphrasing

FDA Dietary Supplement cgmp

EMEA DICOMBurner solution EMEA DICOMBurner solution

Name: Anchana Ganesh Age: 21 years Home Town: Chennai, Tamil Nadu Degree: B.Com. Profilometer Score. Profilometer Graph

Getting Started. Learning Guide. with Continuous Glucose Monitoring for the MiniMed 530G with Enlite. CGM Foundations

2017 CMS Web Interface

Rate Lock Policy. Contents

Meaningful Use Roadmap Stage Edition Eligible Hospitals

The Interface Between Theory of Mind and Language Impairment

detailed in Ward and Lockhead (1970), is only summarized here.

Year 10 Food Technology. Assessment Task 1: Foods for Special Needs. Name: Teacher:

University College Hospital. Pump school Starting on an insulin pump. Children and Young People s Diabetes Service

BROCKTON AREA MULTI-SERVICES, INC. MEDICAL PROCEDURE GUIDE. Date(s) Reviewed/Revised:

WHAT IS HEAD AND NECK CANCER FACT SHEET

Coding. Training Guide

ANXIETY SYMPTOMS INTERVENTION SESSION HANDOUTS. Introduction to Fighting Fear by Facing Fear. Making a Fears and Worries List

HIS Registry of Ministry Resources

Dental Benefits. Under the TeamstersCare Plan, you and your eligible dependents have three basic options when you need dental care.

A pre-conference should include the following: an introduction, a discussion based on the review of lesson materials, and a summary of next steps.

Candida March, Ines Smyth, and Maitrayee Mukhopadhyay, 1999, A Guide to Gender-Analysis Frameworks, London: Oxfam Publishing.

InformationNOW Attendance

NHAIS SIS Communication

Building Code 101 OWMC November 20, Ministry of Municipal Affairs and Housing

Session 5: Is FOOD fair?

Reliability and Validity Plan 2017

Field Epidemiology Training Program

Frequently Asked Questions: IS RT-Q-PCR Testing

Benefits for Anesthesia Services for the CSHCN Services Program to Change Effective for dates of service on or after July 1, 2008, benefit criteria

NYS Common Core ELA & Literacy Curriculum Grade 12 Module 4 Unit 1 Lesson 14

2017 CMS Web Interface

Statement of Work for Linked Data Consulting Services

Chapter 6: Impact Indicators

Completing the NPA online Patient Safety Incident Report form: 2016

NEW YORK STATE BOARD OF ELECTIONS

Independent Charitable Patient Assistance Program (IPAP) Code of Ethics

Improving Surveillance and Monitoring of Self-harm in Irish Prisons

Interpretation. Historical enquiry religious diversity

The ECG app is not intended for use by people under 22 years old.

Action plan: serialisation of Nordic packages focus on Product Codes

DIRECTED FORGETIING: SHORT-TERM MEMORY OR CONDITIONED RESPONSE? WENDY S. MILLER and HARVARD L. ARMUS The University of Toledo

FOUNDATIONS OF DECISION-MAKING...

Code Generation from UML Model: State of the Art and Practical Implications

The data refer to persons aged between 15 and 54.

2019 Canada Winter Games Team NT Female Hockey Selection Camp August 16-19, 2018

Structured Assessment using Multiple Patient. Scenarios (StAMPS) Exam Information

TABLE OF CONTENTS Glossary of terms Code Pad Diagram 3. Understanding the Code Pad lights.4.

Appendix C. Master of Public Health. Practicum Guidelines

Session78-P.doc College Adjustment And Sense Of Belonging Of First-Year Students: A Comparison Of Learning Community And Traditional Students

2018 CMS Web Interface

Specifically, on page 12 of the current evicore draft, we find the statement:

Bariatric Surgery FAQs for Employees in the GRMC Group Health Plan

(Please text me on once you have submitted your request online and the cell number you used)

State Health Improvement Plan Choosing Priorities, Creating a Plan. DHHS DPH - SHIP Priorities (Sept2016) 1

OPS Measurement Period Report

Novel methods and approaches for sensing, evaluating, modulating and regulating mood and emotional states.

Natural Computation and Self Organization in Monkey Behavior KELLY FINN ANIMAL BEHAVIOR GRADUATE GROUP

Creating and Linking Charge Objects

UNIT 2: mapping bananas

A Unified Approach to Conflict Mineral Compliance for the Tungsten Industry. The Westin, Sydney, 23 September 2013

Related Policies None

Code of employment practice on infant feeding

MGPR Training Courses Guide

OEE Studio Release Note

Chapter 3 Perceiving Ourselves and Others in Organizations

Medical Device Software Development Management: Following FDA Guidelines for Software Validation

Individual Assessments for Couples Treatment with HFCA

Analysis of Pesticide Residues in Citrus Oils by GCxGC-TOFMS with Minimal Sample Preparation

HEALTH SURVEILLANCE INDICATORS: CERVICAL CANCER SCREENING. Public Health Relevance. Highlights.

Transcription:

Cde Clne Analysis Envirnment fr Sftware Maintenance Yshiki Hig 1 Tshihir Kamiya 2 Shinji Kusumt 1 Katsur Inue 1 1 Graduate Schl f Infrmatin Science and Technrgy, Osaka University 2 PRESTO, Japan Science and Technlgy Agency {y-hig,kamiya,kusumt,inue@ist.saka-u.ac.jp ABSTRACT Recently, cde clne has been regarded as ne f factrs that make sftware maintenance mre difficult. A cde clne is a cde fragment in a surce cde that is identical r similar t anther. Fr example, if we mdify a cde fragment which has cde clnes, it is necessary t cnsider whether we have t mdify each f its cde clnes. There are tw ways f maintenance supprt fr cde clnes. One is t cmprehend and manage cde clnes, and the ther is t remve them. Fr the frmer supprt, we have develped cde clne analysis envirnment Gemini. Fr the latter supprt, we have prpsed a methd that detects refactring-riented cde clne. Thrugh the applicatin f them t sftware maintenance in several sftware cmpanies, we gt sme feedback abut them. In this paper, in rder t imprve the usefulness and applicability f the methds in the actual sftware maintenance, we extend bth f ur maintenance supprt methds. Cncretely, we have develped a new cde clne analysis tl SuperGemini that is imprved the scalability by restructuring the architecture f Gemini. This imprvement makes it pssible t apply SuperGemini t industrial sftware practically. Als, as the extensin f the refactring methd, we have develped a characterizatin f cde clnes by sme metrics, which suggest hw t remve them. Then, we have develped refactring supprt tl Cancer. We expect SuperGemini and Cancer can supprt sftware maintenance mre effectively. Categries and Subject Descriptrs D.1.m [Prgramming techniques]: Miscellaneus; D.2.6 [Sftware Engineering]: Prgramming Envirnments; D.2.8 [Sftware Engineering]: Metrics General Terms Design, Languages, Management Keywrds Cde clne, Refactring, Sftware maintenance 1. INTRODUCTION Recently, maintaining sftware systems has been becming mre difficult as the size and cmplexity f sftware is increasing. Maintenance f sftware system is defined as mdificatin f a sftware prduct after delivery t crrect faults, t imprve perfrmance r ther attributes, r t adapt the prducts t a mdified envirnment[14]. Actually, it is reprted that many sftware cmpanies expend a lt f time and human cst fr sftware maintenance. It is generally said that cde clne is ne f factrs that make sftware maintenance mre difficult[6]. Cde clne is a cde fragment that is identical r similar t anther. Cde clnes are intrduced because f varius reasns such as reusing cde by cpy-and-paste. If we mdify a cde fragment and it has many cde clnes, it is necessary t cnsider prs and cns f mdificatin in its crrespnding all cde clnes. Especially, fr large scale sftware, such prcesses are very cmplicated and need much cst. S, efficient cde clne detectin is necessary and imprtant in sftware develpment and maintenance. There are tw ways f maintenance supprt fr cde clnes. One is t cmprehend and manage cde clnes, and the ther is t remve them. Fr the frmer supprt, there exist many researches t autmatically detect cde clnes[4][13]. We have als develped cde clne detectin tl CCFinder[10] and cde clne analysis envirnment Gemini[15]. We have been delivering Gemini (including CCFinder) t mre than 50 sftware rganizatins and evaluated the usefulness f them in the actual sftware maintenance. Then, we have gtten valuable feedback frm them. One f the prblems t be slved in applying Gemini t industrial sftware is scalability. That means Gemini can t deal with large scale sftware (mre than 50000 LOC), effectively. S, the imprvement f scalability f Gemini is essential prblem. Fr the latter supprt, several cde clne remval methds have been prpsed[2][3][12]. We have als suggested a refactring methd that can apply practical sftware develpment and maintenance[7]. But, we have nt implemented the methd as an actual sftware tl. In this paper, in rder t deal with the abve prblems, we develp tw maintenance supprt systems based n cde clne analysis. The first system is SuperGemini that have high scalability t cpe with a huge size sftware in the actual sftware rganizatin. The secnd system is Cancer t supprt the refactring fr cde clne. Cancer can de-

tect refactring-riented cde clnes in practical time frm large scale sftware. Mrever, Cancer characterizes detected cde clne using sme metrics. In ther wrd, Cancer tells the user which cde clnes can be remved and hw t remve them. S, the user can cncentrate n mdifying surce cde, which leads sftware develpment and maintenance t mre effective nes. Thrugh case studies fr several pen surce sftware, we cnfirm the applicability f SuperGemini and Cancer. Surce files Cde clne analysis envirnment, Gemini Clne pair manager Clne selectin infrmatin Cde clne detectr Surce cde manager Cde clne database Clne selectin infrmatin Metrics manager Interfaces Clne scatter plt view Surce cde view Metric graph views User 2. PRELIMINARIES Here, we define sme terminlgy regarding cde clnes. Next, we briefly explain ur previus research results, a cde clne detectin tl CCFinder[10], a cde clne analysis system Gemini and a refactring methd fr cde clnes detected by CCFinder. Finally, we shw several prblems t be slved that were knwn thrugh applicatins f the tls in the actual sftware maintenance prcess. 2.1 Cde Clne A clne relatin is defined as an equivalence relatin (i.e., reflexive, transitive, and symmetric relatin) n cde fragments[10]. A clne relatin hlds between tw cde fragments if (and nly if) they are the same sequences. (Sequences are smetimes riginal character strings, strings withut white spaces, sequences f tken type, and transfrmed tken sequences. ) Fr a given clne relatin, a pair f cde fragments is called a clne pair if the clne relatin hlds between the fragments. An equivalence class f clne relatin is called a clne set. That is, a clne set is a maximal set f cde fragments in which a clne relatin hlds between any pair f cde fragments. A cde fragment in a clne set f a prgram is called a cde clne r simply a clne. 2.2 CCFinder CCFinder[10] detects cde clnes frm prgrams and utputs the lcatins f the clne pairs n the prgrams. The length f minimum cde clne is set by the user in advance. Clne detectin f CCFinder is a prcess in which the input is surce files and the utput is clne pairs. The prcess cnsists f fllwing fur steps: Step1: Lexical analysis: Each line f surce files is divided int tkens crrespnding t a lexical rule f the prgramming language. The tkens f all surce files are cncatenated int a single tken sequence, s that finding clnes in multiple files is perfrmed in the same way as single file analysis. Step2: Transfrmatin: The tken sequence is transfrmed, i.e., tkens are added, remved, r changed based n the transfrmatin rules that aims at regularizatin f identifiers and identificatin f structures. Then, each identifier related t types, variables, and cnstants is replaced with a special tken. This replacement makes cde fragments with different variable names clne pairs. Step3: Match Detectin: Frm all the sub-strings n the transfrmed tken sequence, equivalent pairs are detected as clne pairs. Step4: Frmatting: Each lcatin f clne pair is cnverted int line numbers n the riginal surce files. Figure 1: Architecture f Gemini 2.3 Gemini Since CCFinder aims t detect cde clnes efficiently, the utput frm CCFinder is text frmat, which is difficult t analyze in practical maintenance prcess. S, we have implemented Gemini t supprt effective cde clne analysis. Gemini[15] is a GUI-based cde clne analysis envirnment which uses CCFinder as a cde clne detectr. As shwn in Figure 1, Gemini prvides t the user the fllwing view windws that enable an interactive cde clne analysis: Scatter Plt View, Metric Graph View, Surce Cde View. Scatter Plt View shws visually where clne pairs exist in surce files. It is very effective mechanism in early phase f cde clne analysis since the state f distributin f cde clne can be grasped at a glance. In the view, the user can select clne pairs by muse dragging. The detail f the Scatter Plt View will be described later. Metric Graph View is used fr the user t select clnes by the quantitative characteristics f them. In the Metric Graph View, the user can easily select distinctive clne sets by setting the range f each metrics value. Surce Cde View wrks cperating the Scatter Plt View n the Metric Graph View. The user can btain actual surce cde crrespnding t the clnes selected in the ther views. Here, we briefly explain the Scatter Plt. Figure 2 shws an example f the Scatter Plt. Bth the vertical and hrizntal axes represent cde fragments f surce files. The fllwing tw sequences are used as sample cde fragments in the scatter plt. cde fragment X: ABCDCDEFBCDG, cde fragment Y: ABCEFBCDEBCD Here, symbls A, B, C,... are cde fragments in an unit such as character, tken, line, statement, functin, etc (In Gemini, it is tken). In Figure 2, each small black square means that crrespnding tw elements n the tw axes are the same. S, a clne pair is shwn as a diagnal line segment. If the same cde fragments are arranged n the tw axes, naturally, a diagnal line frm the upper left t the lwer right is drawn since such dt means cmparisn f tken with itself, and the dts are symmetrical with a diagnal line. 2.4 Refactring f cde clnes We have als studied the remval f cde clnes frm surce cde. The remval f cde clnes is generally referred as refactring[6] r restructuring. The key idea f ur methd is t find a kind f chesive cde fragment (like cmpund blck r methd bdies) frm the cde clne fragments. Figure 3 shws an example. In this figure, there are tw cde

/ H / H m i i B! #" 2 %M') 2 *9NO') 2 --. 01 2 3 4 56780 %B" : 0 #;, 0=< 2 : 0 >.? 1 3 3," : 2 A7 " < 2 : 0..) 01 2 3 %B" < 2 : 0>. 0@12 3 4 56780 ) 01 2 3 4 52 % 2 ) 01 2 3 4 56780 %EKF << ) M! #" $&%(') $+*(,) $- -. 012 3 4 56780 %9" : 0 #;, 0=<2 : 0 >.@? 1 3 3," : 2 A7 " <2 : 0..) 012 3 %B" < 2 : 0>. 0@1 2 3 4 56780 ) 012 3 4 5 $+%C$D) 012 3 4 56780 %EGF << ) BIKJGLL PRQ+ST(UVXWY Z[T\!]&^ bc ekfhgi egjkei e lolom np qr uvowxx fkd zo{ qos b{od w ba mi qr fhd qor uvowxx qr egfkei qr uvowxx f!ƒ ˆD Š Œ ˆ Ž & PRQ+S!T(U@V_WYZ[TO\!]=` Figure 3: Example f merging tw cde fragments Figure 2: Scatter plt f cde clnes fragments A and B frm a prgram, and the cde fragments with hatching are maximal clnes between them. In cde fragment A, sme data are substituted t list data structure frm the head successively. In cde fragment B, they are dne s frm the tail successively. The fr blcks in A and B have a cmmn lgic that handles a list data structure. There are, hwever, sentences befre and after fr blck, that are nt necessarily related with the fr blck frm semantic pint f view. Such semantically unrelated sentences ften bstruct refactring. In ther wrd, extracting nly fr blck as a cde clne is mre preferable frm refactring viewpint in this example. The prpsed methd is implemented as a filter fr the utput f CCFinder. We named the filter CCShaper[7]. The extracting prcess using CCShaper cnsists f the fllwing three steps: Step1: Detect clne pairs using CCFinder. Step2: Prvide syntax infrmatin (bdy f methd, lp and s n) t each blck by parsing the surce files where clne pair are detected in Step1 and investigating the psitins f blcks. Step3: Extract structural blcks in the cde clne using the infrmatin f lcatin f clne pairs and structural blcks. Intuitively, structural blck indicates the part f cde clne that is easy t mve and merge. CCShaper perfrms Steps 2 and 3. Fr example, CCShaper extracts the fllwing kinds f cde clne as a refactringriented cde clne fr Java language. Declaratin : class {, interface { Methd : methd bdy, cnstructr, static initializer Statement : if, fr, while, d, switch, try, synchrnized In Step 1, time cmplexity O(nt) is required. Here, n is the tken number included in target sftware, and t is the length f the lngest cde clne in it (the details are shwn in [10]). In Step 2, time cmplexity O(n) is required. Next, in Step 3, time cmplexity O(cs lg c) is required. Here, s is the number f target surce files, and c is the average number f cde clnes in each file. Actually, the values f c and s are extremely smaller than the value f n. S, time cmplexity O(nt) is apprximately required fr detecting refactring riented cde clnes, which enables us t detect nes in practical time frm large scale sftware. There are several related studies abut refactring f cde clnes. Kmndr et al.[12] has prpsed a refactring methd using prgram slicing. In this methd, a prgram dependence graph is cnstructed by analyzing target surce cdes. Identical r similar parts are detected as cde clne. This detectin is greatly precise because f cnsidering cntrl and data flw f prgram. Mrever, it can detect rerdered and intertwined clnes[12] which cannt detected by CCFinder. But, time cmplexity f cnstructing prgram dependence graph is O(m 2 )(m is the number f statement and expressin included in target surce cdes), which makes it difficult t apply this methd t large scale sftware. Als, Balazinska et al.[2] has prpsed an apprach t extract cde clnes using metrics. Since they cnsider the cntext f cde clne, it is practical. But, the unit f the cde clne is restricted t functin and methd, which makes it difficult t perfrm refactring t smaller unit f cde clne. 2.5 Prblems t be slved We have s far delivered Gemini t mre than fifty sftware rganizatins, and gtten many feedbacks frm them. One f the prblems which have been repeatedly pinted ut is the lw scalability f Gemini. The scatter plt f Gemini needs space cmplexity O(n 2 )(n is the number f target surce files) by its implementatin. Mrever, if very many clnes exist, the cst f pltting clnes becmes huge and cnsequently the perfrmance extremely deterirates. Frm ur experience and the related research[11], Gemini can perfrm smthly if the LOC f target sftware is abut 50,000 200,000 r less. Als, there is anther prblem caused by the user-interface f Gemini. The Scatter Plt View is based n clne pair analysis. On the ther hand, the Metric Graph View is based

S, we decided t cntrive the implementatin f the Scatter Plt View. At first, we recnstitute the data structure. Existing Gemini uses single 2-dimensinal matrix as the data structure fr the Scatter Plt View(Figure 4(a)). In Figure 4, c means clnes existing in the cell crrespnding t tw axes. T cpe with the prblem, we cntrive t use hierarchy 2-dimensinal matrices. As shwn in Figure 4(b), each cell f the tp level matrix has a sub-matrix. If n cde clnes exist in files crrespnding t this sub-matrix, it is nt necessary t create the instance f this ne, which leads t cutting dwn memry usage. Als, we revised the drawing cmpnent that if very many clnes exist in sme prtins f surce files, the prtins are marked ut cllectively. This cntrivance leads t cutting dwn the cst f drawing clnes. (a) Previus (b) Latter Figure 4: Data Structure f Scatter Plt n clne set analysis. We cnsidered that it is useful t use each view, separately. But, sme users wanted t use the views cperatively and then claimed that the peratins were smetimes cnfused. S, the usability f them were nt gd. With respect t the methd f refactring f cde clnes described in Sectin 2.4, we have just prpsed the apprach t extract the refactring-riented clnes and did nt cnsider hw t remve them. S, the user have t decide hw t remve the cde clnes by him/herself. This paper describes the imprvements fr these prblems. Fr the imprvement f Gemini, we have added sme new views, and changed architecture f it. As the results, the scalability f Gemini have been imprved greatly. Fr the imprvement f the refactring methd, we have intrduced sme metrics t determine hw t remve them. Detected clnes are quantitatively characterized by using the metrics which supprt the user hw t remve them. 3. NEW CODE CLONE ANALYSIS ENVI- RONMENT: SUPERGEMINI 3.1 Key Idea fr Imprving Gemini Thrugh several experiments, we realized that the Scatter Plt View makes Gemini s scalability lwer extremely. The Scatter Plt View, by its nature, needs space cmplexity O(n 2 )(n is the number f target surce files) t draw clne pairs. Mrever, Scatter Plt View draws each clne pair respectively even if very many clnes exist. Thus, it makes the cst f drawing very high. With respect t the prblem f user-interface, we changed the architecture f Gemini. Existing architecture is shwn in Figure 1. In the revisin f the architecture, we examined that there are fllwing tw categries f cde clne analysis: File-Based Analysis: It aims t evaluate hw many cde clnes are included in each file. In this analysis, the user selects sme files with interest. And, they check hw many clnes exist in thse files r hw thse files are cvered with clnes, and s n. Clne-Based Analysis: It aims t evaluate which cde clnes are distinguishing. This analysis, the user selects sme clnes with interest. And, they check where thse clnes exist in sftware, and s n. S, we changed the Gemini s architecture accrding t the categries. Als, we intrduce a new metric called LDL(C) t Gemini. LDL(C) means the rate hw cde clne C cntains the same statement repeatedly. Fr example, the fllwing cde fragment (a cde clne) C 1 cnsists f three System.ut.println statements. System.ut.println("The value f a is " + a); System.ut.println("The value f b is " + b); System.ut.println("The value f c is " + c); In this case, a System.ut.println statement is appeared three times in this fragment. S, the value f LDL(C 1) becmes 0.33. This metric enables the user t discriminate the clnes, which include repeatedly the same statements, frm ther clnes. Say, they are repeated imprt statements clnes in Java language, r repeated printf and scanf statements clnes in C language. The reasn f intrducing this metric als dues t the feedback frm sftware cmpanies. 3.2 Implementatin f SuperGemini We have implemented abve extensins as a new cde clne analysis envirnment, SuperGemini. SuperGemini fundamentally desn t prvide clne pair infrmatin t the user. Scatter Plt View isn t used t display clne pair infrmatin, but used t shw hw many cde clnes exist between each file, which is used in file-based analysis. Figure 5 shws the architecture f SuperGemini. As shwn in this figure, several new views, Directry Tree View, Clne Set List, and File List are included in SuperGemini. The Directry Tree View is mainly used with the Scatter Plt View. The Scatter Plt View shws clnes which are under the directry r file indicated n the Directry Tree View. The Clne Set List is used in clne-based analysis. The Clne Set List shws clne sets selected n the Metric Graph View. Als, the Clne Set List has a srt functin, which srts clne sets accrding t ascending r descending rder f each metric. We cnsider that the user perates these views as fllws. 1. The user rughly specifies clne sets in the Metric Graph View with his/her interest metrics. 2. The user srts the selected clne sets n the Clne Set List based n the metrics and specify the distinctive clne set. 3. The user brwses actual cde fragment f the specified clne set ne by ne.

AB C D >E FGH<> IKJ D G B L <M <NM DPO >?Q G R > < 34( *. + / ) 5. 6 7 81 + ( 2 ;@<>? $ % & ' ( ) ( *% + ) *, & -. / ( / & 0 (1 + ( 2! " # Extract Methd means that a fragment f surce cde are extracted and redefined as a new methd[6]. Originally, this pattern is applied t t lng methd r t cmplex part. Here, in rder t remve cde clnes, we use Extract Methd t extract cde clne fragments as a cmmn new methd. Pull Up Methd means that the same methds defined in child classes are pulled up t its parent class[6]. This pattern is perfrmed because f varius reasns such as design pattern. If plural child classes which have cmmn parent class include clne methd, pulling up such methds means clne remval. STM B >E FUG <> I"J D G B L <M <NM DO >?Q G R > < W XY #,/ 6 * * (.7 % & * 1 + ( 2 9 +. ( / * &. : *. ( (1 + ( 2 V+ % (H% + ) *, & -. / ( / & 0 (1 + ( 2 ;=<>? Figure 5: Architecture f SuperGemini The File List is used in file-based analysis. The File List has the fllwing infrmatin fr each file. the number f tkens included in the file, the number f lines included in the file, the number f cde clnes included in the file, the rate hw the file is cvered with cde clnes, the list f cde clnes included in the file, and the list f files which include the same cde clnes with the file. The File List prvides the user quantitative clne infrmatin f each file. The user als can srt files based n each infrmatin. Say, it is very easy fr the user t brwse the surce cde f the file which includes the largest number f cde clnes. Here, we classify clne sets as the fllwing three types based n its degree f dispersin in the file system. Dense: In case that all the element f a clne set are included in the same file, Middle: In case that all the element f a clne set are nt included in the same file but in the same directry, Scattered: In case that all the element f a clne set are included in neither the same file nr the same directry. The reasn why we use this classificatin is that the way hw t deal with cde clnes is different frm their degree f dispersin[11]. The Surce Cde View has als been adjusted t this classificatin. That is, the cde clnes shwn in the Surce Cde View are highlighted by the different clrs based n each dispersin. S, the user can figure ut the degree f dispersin f them at a glance. 4. REFACTORING METHOD FOR CODE CLONE 4.1 Key Idea We use existing refactring pattern[6], especially Extract Methd and Pull Up Methd, t remve cde clnes. 4.2 Cde Clne Metrics fr Determining Refactring Pattern We attempt t refine detected cde clnes by measuring their characteristics t remve sme f them. Extract Methd is the extractin f a cde fragment, s it is desirable that the target fragment has lw cupling with the ther surrunding fragments in the methd, in ther wrds, the variables defined utside the fragment aren t used (referred and substituted) in the fragment. If such variables are used, it is necessary t prvide them as parameters fr the new methd. Therefre, we measure the amunt f such variables. On the ther hand, Pull Up Methd means mving identical existing methds in child classes t the parent class, s it is necessary that the child classes have cmmn parent class. Therefre, we measure the dispersin f clnes in the class hierarchy. The abve characterizing makes it pssible t determine hw each clne can be remved. In rder t make the decisin, we intrduce several metrics. Fr the variables which are defined utside the cde clne fragment, we define tw metrics RV K(S), and RV N(S). Here, we assume that clne set S includes cde fragments {f 1, f 2,, f n. Cde fragment f i uses externally defined variables {v i1, v i2,, v imi. Als, RS(v ij ) dentes the ttal number f referred and substituted cunt f v ij. RV K(f i) = m i, m X i RV N(f i) = RS(v i) i=1 and, nx RV K(S) = ( RV K(f i))/n i=1 nx RV N(S) = ( RV N(f i))/n i=1 Intuitively, RV K(S) represents the number f externally defined variables used in the fragments f the clne set S. Additinally, RV N(S) cunts the number f usage f the variables used in the fragments f S. Fr the dispersin in class hierarchy, we defined a metrics DCH(S). As described abve, the clne set S includes cde fragments {f 1, f 2,, f n. C i dentes the class which includes cde fragment f i. Then, if the classes {C 1, C 2,, C n have several cmmn parent classes, C p is defined as the class which lays the lwest psitin in class hierarchy amng the parent classes f {C 1, C 2,, C n. Als, D(C k, C h ) represents the dis-

JJJ XXX ``` ddd fff hhh rž Žb šlœb bž JLKMN M O,N PQ RQSBOTQU MOVQ RM W "$#%&'( #)&%*,+ *,-*. & u$v w x y z { x ~ B y v flyz N [ \ O,N P Q RgQSW N [ ]O,N ] [ \V OV Q RM W hjik\ V O] V \N P Q RQSWQ lmmlmn [ P O W XLYZTN [ \ O,N P Q RQSW N [ ]O,N ] [ \V ^ V Q O_,W 354 6 78 4 796 :<; 8 =9>?@8; =<AB?! `LYZTN [ \ O,N P Q RQSBOV \WTWba P M[ \[ OTac >9:B4 :<C:D?FEFG 4 H@IF?B4 6 G 8D dlyztn [ \ O,N P Q RQS e\ [ P \ ^ V MU MS P R PN P Q R n2 & prqf* )*TsB& t ƒ v{ v w,ƒ y { w Š, Œ /0 )+ * 12%* + * -*. & x z vbz ˆ v y v Figure 6: The analysis flw f Cancer tance between class C k and class C h in the class hierarchy. DCH(S) = max {D(C 1, C p), D(C 2, C p),, D(C n, C p) If the classes dn t have cmmn parent class, DCH(S) = -1 The value f DCH(S) als becmes larger as the degree f the dispersin f its clne set becmes large. If all fragments f a clne set S are in the same class, the value f its DCH(S) is set as 0. If all fragment f a clne set are in a class and its direct children classes, the value f its DCH(S) is set as 1. Exceptinally, if classes which have sme fragments f a clne set dn t have cmmn parent class, the value f its DCH(S) is set as -1. In detail, this metric is measured fr nly the class hierarchy where the target sftware exists because it is unrealistic that the user pulls up sme methds which are defined in the target sftware classes t library classes like JDK. 4.3 Refactring Supprt Tl: Cancer Based n the prpsed methd, we have implemented a refactring supprt tl Cancer with Java language. Figure 6 shws the analysis flw f Cancer. Cancer cnsists f tw units, Analysis unit and GUI unit. Analysis unit perfrms the fllwing analyses. 1. Detectin f cde clnes, 2. Extractin f structural blcks, 3. Extractin f class hierarchy, 4. Extractin f variable definitin, 5. Extractin f structural clnes, 6. Calculatin f sme metrics. Fr detectin f cde clnes, Cancer internally calls CCFinder[10]. Fr the analyses which need syntax r semantic analyses, we used JavaCC[8], which is an pen surce cde generatr written in Java language. The analysis result, that is structural clne infrmatin with metrics, is passed t GUI unit as XML frmat. Figures 7(a) and 7(b) shw snapshts f Cancer with the name f the windws. Intuitively, the user specifies the distinctive clne set n the Main Windw. Then, he/she analyzes the details f it n the Clne Set Viewer. 4.4 Functin f Each Cmpnent Here, we explain each cmpnent n Cancer. (a) Befre selectin Figure 8: Metric Graph (b) After selectin 4.4.1 Metric Graph View The Metric Graph View uses existing metrics, LEN(S), P OP (S), and DF L(S) [15] in additin t three metrics defined in Sectin 4.2. The fllwings are brief explanatins f each metric. LEN(S) fr clne set S is the maximum length f tken sequence fr each ne in S. POP(S) is the number f elements (cde fragments) f a given clne set S. A clne set with a high value f P OP (S) means that similar cde fragment appear in many places. DFL(S) indicates an estimatin f hw many tkens wuld be remved frm surce files when the cde fragments in a clne set S are recnstructed. This recnstructin is cnsidered as the simplest case that all cde fragments f S are replaced with caller statements f a new identical rutine (functin, methd, template functin, r s). After the recnstructin, LEN(S) P OP (S) tkens are ccupied in the surce files. In the newly recnstructed surce files, they ccupy k P OP (S) tkens (let k be the number f tkens fr ne caller statement) fr caller statements and LEN(S) tkens fr callee rutine. Here, we explain the Metric Gragh View using an example shwn in Figure 8. In the Metric Graph View, each metric has a parallel crdinate axe. Upper and lwer limits are set per each metric. The hatching part is between upper and lwer limits f each metric. A plygnal line is drawn per each clne set. In this example, values fr the clne sets S 1 and S 2 are drawn. In the left graph(8(a)), all metric values f S 1 and S 2 are between upper and lwer limits. S, these tw clne sets are selected state. In the right graph(8(b)), the value f DCH(S 2) is bigger than the upper limit f DCH, which means S 2 is unselected state. The Metric Graph View enables the user t select arbitrary clne

"!# $ % &$ ' ( $ %)%*$ %)! #" $ * $ ',+ ' %& $ ' (! ' ) *+,$ $-$ '. (a) Main Windw set by changing upper and lwer limits f each metric. And, the result f selectin is reflected n the Clne Set List. 4.4.2 Checkbx f RV Variables In the Checkbx f RV Variables in Figure 7(a), the user can decide which variables are cunted as metrics RV K(S) and RV N(S). Currently, the variables are selected frm the fllwing five types. field members f its class, field members f parent class, this variable, super variable, lcal variables. Fr example, if the user is ging t perfrm Extract Methd within a class, it is nt necessary t cunt all kind f variables except lcal nes because these variables can be accessed anywhere in the same class. On the ther hand, if the user is ging t perfrm refactring that crsses ver plural classes like Pull Up Methd, these nes shuld be cunted. 4.4.3 Checkbx f Clne Unit In the Checkbx f Clne Unit, the user can decide which kind f clne unit are shwn in the Metric Graph View. Currently, the number f unit types are twelve as described in Sectin 2.4. Fr example, if the user is ging t perfrm Pull Up Methd, he/she shuld check nly methd unit because the target f this pattern is the existing methds. 4.4.4 Clne Set List The Clne Set List shws all clne sets which are selected in the Metric Graph View. And the list can srt clne sets in ascending and descending sequence f each metric value. Duble-clicking a clne set n this view is a trigger t run the Clne Set Viewer as shwn in Figure 7(b). It shws mre detail infrmatin f the selected clne set. Figure 7: Snapshts f Cancer (b) Clne Set Viewer 4.4.5 Metrics value panel The Metrics Value Panel shws the values f all metrics f clne set selected in the Main Windw. 4.4.6 Cde fragment list The Cde Fragment List shws the list f all cde fragments included in the selected clne set. Each element f the list has three kinds f infrmatin, a path t each file which includes the cde clne fragment, the lcatin f the cde clne in the file(the number f beginning line, beginning clumn, end line and end clumn), and the number f tken included in the cde clne fragment. 4.4.7 Surce Cde View The Surce Cde View wrks cperatively with the Cde Fragment List. The user can btain the actual surce cde crrespnding t the cde clne fragment selected in the Cde Fragment List. The fragment including the clnes is emphatically displayed. 4.4.8 RV variables list The RV Variables List shws the list f all variables which are used and defined externally in the cde fragment which is selected in the Cde Fragment List. Each element f this list has three kinds f infrmatin, the name f its variable, the kind f its variable and the cunt f used. 4.5 Refactring Prcedure Nw, we explain refactring prcess using Cancer. If the user wants t perfrm Pull Up Methd, the fllwing cnditins shuld be cnsidered fr example. (PC1) The target is methd unit cde clne. (PC2) The value f DCH(S) is mre than 1. Usually, Pull Up Methd is perfrmed n existing methds, s (PC1) shuld be cnsidered. And, the classes whse methd includes target cde clnes have t inherit cmmn parent class, s (PC2) shuld be cnsidered. Next,

Table 1: Target sftware Student s Ant 1.6 Eclipse 2.1.3 Num. f Files 70 627 7920 LOC 7200 180,000 1,690,000 Detectin time 2 secnds 20 secnds 5 minutes the refinement prcess is as fllws. At first, the user checks nly methd unit checkbx n the Checkbx f Clne Unit, which is reflected t the Metric Graph View. Next, the user sets the lwer limit f DCH(S) as mre then 0. This peratin is reflected t the Clne Set List. As the result, the list shws the clne sets which meet the cnditins (PC1) and (PC2). On the ther hand, if the user wants t perfrm Extract Methd, the fllwing cnditins shuld be cnsidered fr example. (EC1)The target is statement unit cde clnes. (EC2)Tthe value f DCH(S) is 0. (EC3)The value f RV K(S) is less than 1. Since Extract Methd is usually perfrmed n a cde fragment in a methd, (EC1) is cnsidered. Next, if all fragments f clne set S exist in the same class, it is easy t merge them. S, (EC2) is cnsidered. The reasn t cnsider (EC3) is that if sme variables which are externally defined are used in a fragment, it is necessary t make them parameters f the new extracted methd. Mrever, if sme values are substituted t sme f them, they have t be returned t methd caller place t reflect the values f them. It is necessary t cntrive like making new data class if plural value are substituted. The refinement prcess is as fllws. At first, the user checks nly statement unit (d, if, fr, switch, synchrnized, try, while) checkbx n the Checkbx f Clne Unit, which is reflected t the Metric Graph View. Next, the user checks nly lcal variable n the Checkbx f RV Variables because ther kind variables can be accessed as far as in the same class. Next, the user set the range f DCH(S) as sme value between 0 and 1(0 DCH(S) 1), and the upper limit f RV K(S) as less then 2. As the result f these peratins, the Clne Set List shws nly the clne sets which meet abve three cnditins (EC1), (EC2) and (EC3). 5. CASE STUDY Here, we describe several case studies t evaluate the usefulness f SuperGemini and Cancer. 5.1 SuperGemini The bjective f the case study fr SuperGemini is cnfirm hw the scalability f SuperGemini has imprved as cmpared with Gemini. In this case study, we applied SuperGemini t small, middle and large scale sftware. Fr each applicatin, we analyzed the detectin time f cde clne (by CCFinder), the initializatin time (reading cde clne infrmatin file and initializing each view) f SuperGemini and Gemini and the used memry size. Of curse, several cnditins fr the cde clnes are the same bth in SuperGemini and Gemini except the memry size assigned t JavaVM. Table 1 shws the scale and clne detectin time fr each sftware, and Table 2 shws the result f each applicatin. The small scale sftware is a student s prgram f Osaka University. In this applicatin, the used memry size f Gemini is a little bit lwer than ne f SuperGemini. We cnsider that the increased memry size f SuperGemini is caused by intrducing new metrics. Fr the applicatin f middle scale sftware, Ant 1.6.0[1], there is a remarkable difference between Gemini and SuperGemini. The initializatin time f Gemini is 15 secnd, which is s impractical cmparing with ne f SuperGemini. Als, the used memry size f SuperGemini is 35% smaller than ne f Gemini. This imprvement is caused by intrducing a new data structure f the Scatter Plt View. These difference becmes mre remarkable in applicatin t the large scale sftware, Eclipse 2.1.3[5]. In this case, in the executin f Gemini, we assigned 1GB memry t JavaVM (SuperGemini and Gemini are implemented in Java language). But, it was insufficient fr Gemini and s Gemini culdn t wrk. On the ther hand, in the executin f SuperGemini, thugh we nly assigned 100MB memry t JavaVM, SuperGemini wrked smthly. S, we can say that restructuring f the architecture makes marvelus imprvement f the scalability. 5.2 Cancer 5.2.1 Overview In rder t evaluate the usefulness f Cancer, we have applied it t Ant 1.6.0[1]. It includes 627 files and the size is 180,000 LOC. In this case study, we set thirty tkens as the length f minimum cde clne f CCFinder(intuitively, thirty tkens crrespnd t abut five LOC). The value thirty is the empirical value which was derived frm ur previus applicatins f CCFinder. We als set thirty tkens as the length f minimum clne f Cancer. Then, we tried t perfrm Extract Methd and Pull Up Methd t cde clnes detected by Cancer. We gt 154 clne sets frm Ant. The fllwings are the number f clnes. All detected clnes 154 Extract Methd 32 Pull Up Methd 20 The cnditins f Extract Methd and Pull Up Methd are the same as nes described in Sectin 4.5. In Sectin 5.2.2 and 5.2.3, we describe the details f refactring using Cancer. Als, after remving several clne sets, we perfrmed regressin tests t cnfirm the behavir f Ant. In the regressin test, we used ttally 220 test cases included in Ant package. These test cases used JUnit[9], which is ne f regressin testing framewrks. S, we culd easily perfrm all test cases and tk abut 4 minutes t perfrm all test cases. 5.2.2 Extract Methd As described abve, we extracted 32 clne sets using the Extract Methd cnditins described in Sectin 4.5. Then, we brwsed and examined all surce cdes f each clne set, and classified them t the fllwing fur grups: Grup 1 clne sets that can be remved nly by extracting them and making a new methd in the same class. Grup 2 clne sets that can be remved by extracting them and making a new methd with setting the externally defined variables as parameters f it because such variables are used in the clne.

Table 2: Result f case study Gemini SuperGemini Student s Ant 1.6.1 Eclipse 2.1.3 Student s Ant 1.6.1 Eclipse 2.1.3 Initializatin time 2 secnds 15 secnds - 1 secnds 2 secnds 20 secnds Memry usage 26MB 58MB - 35MB 38MB 100MB Grup 3 clne sets that can be remved by extracting them and making a new methd with setting the externally defined variables as parameters f it and with adding parameters f return statement t deliver the results t the variables used in the caller. Grup 4 clne sets that can be remved but need a lt f effrt. if (!ischecked()) { // make sure we dn t have a circular reference here Stack stk = new Stack(); stk.push(this); dieoncircularreference(stk, getprject()); Figure 9: Example f Extract Methd in Grup 1 Three clne sets were classified t Grup 1. Figure 9 shws a surce cde f ne f them. In this if-statement clne, n externally defined variable was used. S, it was very easy t extract it as a new methd in the same class. if (javacpts!= null &&!javacpts.equals("")) { genictask.createarg().setvalue("-javacpts"); genictask.createarg().setline(javacpts); Figure 10: Example f Extract Methd in Grup 2 Eighteen clne sets were classified t Grup 2. Figure 10 shws a surce cde f ne f them. In this if-statement clne, the variable javacpts was a field member f its class, and the variable genictask was a lcal variable. S, it is necessary t set genictask as a parameter f a new methd t extract this cde clne in the same class. if (isavemenuitem == null) { try { isavemenuitem = new MenuItem(); isavemenuitem.setlabel("save BuildInf T Repsitry"); catch (Thrwable iexc) { handleexceptin(iexc); Figure 11: Example f Extract Methd in Grup 3 Seven clne sets were classified t Grup 3. Figure 11 shws a surce cde f ne f them. In this if-statement clne, the variable isavemenuitem was externally defined. Mrever, the value was substituted t it. S, it is necessary t set isavemenuitem as a parameter f a new methd and add return statement t reflect the result f substitutin t the caller. Fur clne sets were classified t Grup 4. Figure 12 shws a surce cde f ne f them. In this if-statement clne, sme return-statements were used. S, a lt f effrt wuld be necessary t extract it. In this case study, we didn t remve these fur clne sets because we think that remval f them is strngly dependent n the skill f each prgrammer. 5.2.3 Pull Up Methd Next, we describe the results f applying Pull Up Methd. As described abve, we extracted 20 clne sets using the if (name == null) { if (ther.name!= null) { return false; else if (!name.equals(ther.name)) { return false; Figure 12: Example f Extract Methd in grup 4 Pull Up Methd cnditins described in Sectin 4.5. Then, we brwsed and examined all surce cdes f each cde clne, and classified them t the fllwing fur grups: Grup 1 clne sets that can be remved nly by mving them t the cmmn parent class. Grup 2 clne sets that can be remved by mving them t cmmn parent class after adding variables which are defined utside. Grup 3 clne sets that can be remved by mving them t cmmn parent class and adding a new methd which needs parameters f utside variables and return statement. Existing methds which includes the pull-uped clnes can be deleted r changed s that they call the new methd frm the inside. If they are deleted, it is necessary t change all its caller places. Grup 4 clne sets that need much cntrivance t remve. Here, n clne set was classified t Grup 1. private vid getcmmentfilecmmand(cmmandline cmd) { if (getcmmentfile()!= null) { /* Had t make tw separate cmmands here because if a space is inserted between the flag and the value, it is treated as a Windws filename with a space and it is enclsed in duble qutes ("). This breaks clearcase. */ cmd.createargument().setvalue(flag_commentfile); cmd.createargument().setvalue(getcmmentfile()); Figure 13: Example f Pull Up Methd in grup 2 Ten clne sets were classified t Grup 2. Figure 13 shws a surce cde f ne f them. In this methd clne, the variable this was mitted at calling methd getcmment- File which was defined in the same class. The variables this and FLAG_COMMENTFILE, which are field members f the same class, are externally defined. T adapt Pull Up Methd pattern, with adding tw parameters, we pulled up them t the cmmn parent class. Tw clne sets were classified t Grup 3. Figure 14 shws a surce cde f ne f them. In this methd clne, the variable map was externally defined, and sme values were substituted t it(methd seterrr was defined in the cmmn parent class). S, t pull up this clne set t the cmmn parent class, it was necessary t add a parameter and return statement fr the variable map. Eight clne sets were classified t Grup 4. Figure 15 shws a surce cde f ne f them. In this methd, the methd checkoptins was called. This methd was defined in

public vid verifysettings() { if (targetdir == null) { seterrr("the targetdir attribute is required."); if (mapperelement == null) { map = new IdentityMapper(); else { map = mapperelement.getimplementatin(); if (map == null) { seterrr("culd nt set <mapper> element."); Figure 14: Example f Pull Up Methd in grup 3 public vid execute() thrws BuildExceptin { Cmmandline cmmandline = new Cmmandline(); Prject aprj = getprject(); int result = 0; // Default the viewpath t basedir if it is nt specified if (getviewpath() == null) { setviewpath(aprj.getbasedir().getpath()); // build the cmmand line frm what we gt. the frmat is // cleartl checkin [ptins...] [viewpath...] // as specified in the CLEARTOOL.EXE help cmmandline.setexecutable(getcleartlcmmand()); cmmandline.createargument().setvalue(command_checkin); checkoptins(cmmandline); result = run(cmmandline); if (Execute.isFailure(result)) { String msg = "Failed executing: " + cmmandline.tstring(); thrw new BuildExceptin(msg, getlcatin()); Figure 15: Example f Pull Up Methd in grup 4 the same class(methds, getprject, getviewpath and getlcatin were defined by using cmmn parent class). And, the variable cmmandline, which was a parameter f this methd, was defined and used in the clne. S, this methd caller made it difficult t apply Pull Up Methd t this clne set. But, the methd checkoptins was defined in each child class. In this case, Template Methd pattern[6] culd be applied. Prcedure f this pattern appliance is the fllwings. At first, we mved the clne t the cmmn parent class. Next, we defined an abstract methd checkoptins in the cmmn parent class. 6. CONCLUSION In this paper, we have prpsed a new cde clne analysis envirnments. We have implemented SuperGemini that is capable t analyze huge scale sftware. Als, we have prvided sme new viewers t SuperGemini and attained high usability cmpared with Gemini. Mrever, we have prpsed a new refactring methd fr cde clnes, and implemented a refactring supprt tl, Cancer. The cde clne analysis algrithm used in Cancer is s fast that it can apply industrial huge scale sftware. Als, we have applied Cancer t Ant, and remved almst f refined clnes. As future wrks, we are ging t perfrm mre detail analysis fr cde clnes. Fr example, distinctin f reference and substitutin f externally defined variables shuld be cnsidered. Als, we are ging t cnsider the effectiveness f refactring. Currently, we refine cde clnes based n the judgment whether they can be remved r nt. If we can judge whether the cde clnes shuld be remved r nt, the supprting f the refactring will becme mre effective. In the present, SuperGemini and Cancer have been develped independently. S, sme cmpnents are implemented n bth f them (ex. the Metrics Graph View). It is necessary t integrate them and develp a ttal maintenance envirnment based n cde clne analysis. Als, we are ging t deliver SuperGemini t sftware cmpanies t cnfirm the impact f imprving architecture, because this imprvement is based n their requirement. Als, we are ging t apply Cancer t several industrial sftware. 7. REFERENCES [1] Ant, http://ant.apache.rg, 2003. [2] M. Balazinska et al., Advanced Clne-Analysis t Supprt Object-Oriented System Refactring, Prceedings the 7th Wrking Cnference n Reverse Engineering, 2000, 98-107. [3] I. D. Baxter et al., Clne Detectin Using Abstract Syntax Trees, Prc. f ICSM98, pages 368-377, Nv. 1998. [4] S. Ducasse et al., A Language Independent Apprach fr Detecting Duplicated Cde, Prc. f ICSM99, pages 109-118, Aug. 1999. [5] Eclipse, http://www.eclipse.rg, 2004. [6] M. Fwler, Refactring: imprving the design f existing cde, Addisn Wesley, 1999. [7] Y. Hig, Y. Ueda, T. Kamiya, S. Kusumt and K. Inue, On sftware maintenance prcess imprvement based n cde clne analysis, Prc. f Prfes 2002, pp. 185-197 (2002). [8] JavaCC, http://javacc.dev.java.net, 2003. [9] JUnit, http://www.junit.rg, 2003. [10] T. Kamiya, S. Kusumt, and K. Inue, CCFinder: A multi-linguistic tken-based cde clne detectin system fr large scale surce cde IEEE Transactins n Sftware Engineering, vl. 28, n. 7, pp. 654-670, (2002-7). [11] Cry Kapser and Michael W. Gdfrey, Tward a Taxnmy f Clnes in Surce Cde: A Case Study, Evlutin f Large-scale Industrial Sftware Applicatins (ELISA), Amsterdam, The Netherlands, September 23, 2003. [12] R. Kmndr and S. Hrwitz, Using slicing t identify duplicatin in surce cde, In Prc. f the 8th Internatinal Sympsium n Static Analysis, Paris, France, July 16-18, 2001. [13] J. Mayland, C. Leblanc, and E. M. Merl Experiment n the Autmatic Detectin f Functin Clnes in a Sftware System Using Metrics, Prc. f IEEE Int l Cnf. n Sftware Maintenance (ICSM) 96, pages 244-253, Mnterey, Califrnia, Nv. 1996. [14] Pigski T. M, Maintenance, Encyclpedia f Sftware Engineering, 1, Jhn Wiley & Sns, 1994. [15] Y. Ueda, T. Kamiya, S. Kusumt, K. Inue, Gemini: Maintenance Supprt Envirnment Based n Cde Clne Analysis, 8th Internatinal Sympsium n Sftware Metrics, June 4-7, 2002.