Journal of Engineering Science and Technology Review 11 (2) (2018) Research Article


Prognosis Evaluation of Ovarian Granulosa Cell Tumor Based on Co-forest Intelligence Model

Xin Liao, Xin Zheng, Juan Zou, Min Feng, Liang Sun, Yan Li and Kaixuan Yang*

Department of Pathology, West China Second University Hospital, Chengdu 600, China
Key Laboratory of Birth Defects and Related Disease of Women and Children, Ministry of Education, Sichuan University, Chengdu 600, China
College of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu 67, China
Department of Immunology, Cleveland Clinic, Cleveland, Ohio 95, United States

www.jestr.org

Received November 2017; Accepted April 2018

Abstract

Ovarian granulosa cell tumor (GCT) has variable recurrence periods, and the five-year survival rate decreases markedly after recurrence. Prognosis evaluation has important clinical value and is a research hotspot. Existing prognosis evaluation methods include logistic regression, Chi-square analysis, and other traditional statistical methods; however, these techniques cannot cope with limited samples and ambiguous prognosis-related pathologic features, and their assessment results have poor reliability and validity. In this study, artificial intelligence theory was introduced and a prognosis evaluation method for ovarian GCT based on the co-forest intelligence model was proposed, with the aim of finding a method applicable to pathological data of ovarian GCT with limited samples and ambiguous prognostic features. First, the ovarian GCT samples were preprocessed, which included deleting unqualified data and standardizing and normalizing the data. Second, prognosis evaluation of ovarian GCT was accomplished with the co-forest intelligence algorithm. Finally, the validity of the proposed method was verified on 75 patients with ovarian GCT in the West China Second Hospital of Sichuan University.
Results indicate that: (1) the accuracy of prognosis evaluation based on the feature set selected by the Log-Rank test increases by .% compared with that (.%) based on the direct use of the standardized and normalized feature set, and (2) the co-forest algorithm can be used for the model analysis of small pathological datasets of ovarian GCT. Moreover, this method can explore effective characteristics from the candidate feature dataset through automatic learning, with prediction accuracy of up to 95.7%. This study reveals the reliability and effectiveness of the proposed prognosis evaluation method of ovarian GCT based on the co-forest intelligence model. The conclusions help clinicians to accurately understand the development laws of ovarian GCT, take the initiative in diagnosis and treatment, and increase the long-term survival rate of patients.

Keywords: Ovarian granulosa cell tumor, Prognosis evaluation, Log-Rank test, Co-forest intelligent algorithm

*E-mail address: huaxpath@alyun.com
ISSN: 79-77 © 2018 Eastern Macedonia and Thrace Institute of Technology. All rights reserved.
doi:0.50/jestr..9

1. Introduction

Ovarian granulosa cell tumor (GCT) is a sex cord-stromal tumor with low-grade malignancy (including adult GCT and juvenile GCT). Menstrual disorder in the reproductive age or irregular vaginal bleeding in the menopause period, abdominal pain, pelvic mass, and ascites are common clinical symptoms of ovarian GCT, which is also characterized by long-term recurrence and a significant reduction of the five-year survival rate after recurrence[]. Therefore, prognosis evaluation of ovarian GCT is important for clinicians in diagnosis and treatment planning. A stable and reliable evaluation method helps clinicians take the initiative in diagnosis and treatment and increase the long-term survival rate of patients. However, the existing prognosis evaluation methods of ovarian GCT mainly use traditional statistical approaches, such as logistic regression and Chi-square analysis. These methods determine the correlation between a single factor and tumor recurrence[].
Differences among the relevant prognostic pathologic features remain unknown. Different studies report significantly different outcomes, which makes it difficult for clinicians to find reliable references for the diagnosis and treatment of this disease[]. In addition, the long recurrence period of ovarian GCT causes a high rate of loss to follow-up, resulting in limited samples and further increasing the difficulty of prognosis evaluation.

Recently, semi-supervised learning technology has broken the application bottleneck of modeling analysis based on small-sized datasets[][5]. Semi-supervised learning can train the initial model on a few labeled ovarian GCT samples, predict the unlabeled ovarian GCT samples with an automatic labeling strategy based on probability learning theory, and improve the generalization ability of the model learned from few labeled samples by using the effective information hidden in unlabeled data. These characteristics make artificial intelligence (AI) technology applicable to prognosis evaluation of ovarian GCT with limited samples and ambiguous prognostic features. A series of associated features, including clinical and pathological features, is enlisted based on existing studies[ 6-6].

Furthermore, the prognosis evaluation method of ovarian GCT based on the co-forest algorithm and multiple factors was constructed to solve the problems of limited ovarian GCT samples and ambiguous prognostic features.

2. State of the art

Abundant academic studies on prognosis evaluation of ovarian GCT have been reported at home and abroad, trying to find stable and reliable prognostic features of ovarian GCT and to provide references for postoperative treatment and therapeutic effect evaluation. According to the literature, the recurrence period of ovarian GCT is in 5 years after the first visit[6]. In the study of Fox et al., more than 50% patients suffered recurrence in years[][6]. Schwartz et al. also reported that 76.% patients suffered recurrence in years[7][8]. Recurrence after more than 0 years is also common[][]. Sommers once reported that six patients with ovarian GCT suffered recurrence after 0 years of operation[]. The longest period of recurrence of ovarian GCT reaches 7 years[6].

Clinical features such as recurrence of ovarian GCT, pelvic spread, and tumor involvement of extra-ovarian organs are believed to be effective indicators of poor prognosis of ovarian GCT[6][]. Various pathologic features of ovarian GCT are significantly correlated with clinical prognosis. However, the research conclusions still contain many contradictions and disputes. Haba et al.[5] pointed out pathologic features associated with well-differentiated tumors. For example, a follicular pattern of tumor cells and the occurrence of Call-Exner bodies both indicated good tumor prognosis, whereas an insular or diffuse pattern of tumor cells indicated poor differentiation and poor prognosis[5]. Pectasides et al.[6] believed that the nuclear mitotic activity of tumor cells, the associated marker Ki-67 index, and the expression levels of oncogene and anti-oncogene markers (e.g., P53, P16, and PTEN) are pathological features related to the prognosis of ovarian GCT.
However, no agreement on these research conclusions has been reached yet. Moreover, ovarian GCT is not a common ovarian tumor; clinical samples are very limited and data are difficult to obtain (acquisition of one sample involves multiple procedures, including collection of clinical data, pathologic images, and immunohistochemical staining).

Prognosis analysis modeling of ovarian GCT based on clinical and pathologic data must prevent unqualified samples in the pathologic dataset from being involved in iterative tuning of the model. Before modeling analysis, the feature dataset has to be standardized and normalized. The correlation between pathologic features and clinical prognosis of ovarian GCT has not been determined completely. Therefore, the applied intelligence algorithm must be able to explore effective features from the candidate feature dataset through automatic learning. In addition, with limited pathologic data of ovarian GCT, the intelligence algorithm must be capable of establishing the initial model from few samples. Furthermore, the model should screen qualified samples for iterative tuning to improve its prediction performance.

Based on the above analysis, a prognosis evaluation method of ovarian GCT was proposed based on the co-forest intelligence model. First, a series of features, including clinical and pathologic features, was enlisted in the proposed model with reference to existing literature and research results. Unqualified samples in the feature dataset were eliminated, and data standardization and normalization were performed. Subsequently, the co-forest intelligence algorithm, which can explore effective features automatically, was applied to prognosis evaluation of pathologic data of ovarian GCT.

The remainder of this study is organized as follows. Section 3 describes the research methodologies, including data preprocessing and the co-forest intelligence algorithm for ovarian GCT prognosis evaluation.
It also constructs the GCT pathologic dataset based on 75 patients with ovarian GCT from April 00 to February 0 in West China Second Hospital of Sichuan University. Section 4 carries out the corresponding experiments and analyses based on this GCT pathologic dataset. Section 5 presents the conclusions.

3. Methodology

The flowchart of the proposed prognosis evaluation method of ovarian GCT based on the co-forest intelligence model is shown in Fig. 1. First, pathologic data of ovarian GCT samples were preprocessed, including elimination of unqualified data and data standardization and normalization. Second, prognosis evaluation of ovarian GCT was accomplished by the co-forest intelligence algorithm. The structure of the GCT pathologic data was determined with reference to previous literature and research findings. A series of relevant features, including clinical and pathological characteristics, was enlisted in the proposed method.

Fig. 1. Flowchart of the proposed prognosis evaluation method (pathologic dataset of ovarian GCTs; data preprocessing with data filtering; co-forest intelligence model for prognostic evaluation; prognostic status: good or poor).

3.1 Preprocessing of pathologic dataset of ovarian GCT

Original data are generally incomplete, redundant, and fuzzy. The interference information in the original data may cause analysis bias[7]. Therefore, data preprocessing is needed before prognosis evaluation with the co-forest intelligence model, including elimination of unqualified data according to expert rules and data standardization and normalization[8].

First, uniqueness, integrity (whether key attribute values in data records are clear and complete), validity (whether the value range of each attribute in a data record is reasonable and meets the constraints), and consistency (whether the unit of each attribute in data records is uniform) should be checked. Inconsistent standards and data structures shall be avoided, and pathologic data of ovarian GCT shall be verified according to their medical significance. Data that fail to meet the above conditions shall be deleted.

Given that the feature dataset of ovarian GCT covers attributes of different types and dimensions, which may influence the modeling results, data standardization and normalization are necessary to ensure that all features are of the same order of magnitude and suitable for comparative analysis. The data standardization method depends on the actual meaning and valuation mode of the data and shall be judged according to expert rules. The data processing model is interpreted in detail in Section 4.1. The zero-mean normalization method was adopted as follows:

$$N_k[i] = \frac{S_k[i] - M_k[i]}{V_k[i]} \quad (1)$$

where $N_k[i]$ is attribute $i$ in the normalized sample $k$, and $S_k[i]$ is attribute $i$ in the sample $k$. $M_k[i]$ and $V_k[i]$ are the mean and variance of attribute $i$, respectively. The calculation formulas are shown as Eqs. (2) and (3).

$$M_k[i] = \frac{1}{m}\sum_{k=1}^{m} S_k[i] \quad (2)$$

$$V_k[i] = \frac{1}{m}\sum_{k=1}^{m}\left(S_k[i] - M_k[i]\right)^2 \quad (3)$$

where $m$ is the sample size in the feature dataset of ovarian GCT.
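The zero-mean normalization of Eqs. (1)-(3) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `zero_mean_normalize` and the `divide_by_std` switch are hypothetical. By default the sketch divides by the variance, as the text states; the conventional z-score divides by the standard deviation instead, which the switch enables. Mapping a constant attribute to 0.0 is also an assumption made here to avoid division by zero.

```python
from math import sqrt


def zero_mean_normalize(samples, divide_by_std=False):
    """Normalize every attribute (column) of `samples` following Eqs. (1)-(3).

    samples: list of m rows, each a list of numeric attribute values.
    divide_by_std: hypothetical switch. False divides by the variance V, as
    the text states; True divides by sqrt(V), the conventional z-score.
    """
    m = len(samples)
    n_attrs = len(samples[0])
    # Eq. (2): mean of each attribute over the m samples.
    means = [sum(row[i] for row in samples) / m for i in range(n_attrs)]
    # Eq. (3): population variance of each attribute over the m samples.
    variances = [
        sum((row[i] - means[i]) ** 2 for row in samples) / m
        for i in range(n_attrs)
    ]
    normalized = []
    for row in samples:
        new_row = []
        for i in range(n_attrs):
            scale = sqrt(variances[i]) if divide_by_std else variances[i]
            # Assumption: a constant attribute (zero spread) is mapped to 0.0.
            new_row.append((row[i] - means[i]) / scale if scale > 0 else 0.0)
        normalized.append(new_row)
    return normalized
```

For two samples with attribute values 0 and 4, the mean is 2 and the variance is 4, so the default call yields -0.5 and 0.5, whereas `divide_by_std=True` yields -1 and 1.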
3.2 Prognosis evaluation of ovarian GCT based on co-forest intelligence algorithm

A semi-supervised learning algorithm can train the initial model from few labeled samples. During prediction of unlabeled samples, the model screens unlabeled samples with high confidence for iterative tuning according to a screening strategy, further improving its generalization ability[9][0]. Co-training is an important branch of semi-supervised learning. Zhou et al. proposed the co-forest algorithm[] based on the intelligent collaborative algorithm[][5], which further exploits the collaborative performance of multiple base models and can perform modeling analysis on small-sized datasets. Moreover, the co-forest algorithm is able to explore effective features from the candidate feature dataset through automatic learning. In this study, the co-forest algorithm was applied for prognosis evaluation of ovarian GCT.

The co-forest model accomplishes co-training by using six base classifiers. First, six independent sample subsets are acquired through Bootstrap resampling of the labeled sample set and used to train the six base classifiers. Next, unlabeled samples that meet the requirements are selected by the combining classifier (the remaining five base classifiers) as the supplementary sample set for iterative tuning of the model. The iterative training of the co-forest intelligence model is shown in Fig. 2. The specific steps are as follows.

Step 1) Six independent training sample subsets ($L_1$, $L_2$, $L_3$, $L_4$, $L_5$, and $L_6$) are constructed through Bootstrap resampling[] from the labeled sample set. They are used to train the base classifiers (random trees[]) ($bc_1$, ..., $bc_5$, and $bc_6$), which can explore effective features automatically from the original dataset.

Step 2) Implementing co-training of the six base classifiers. The unlabeled samples that shall be added to base classifier $bc_i$ for the next iterative training are determined by voting of the combining classifier $HC_i$. Next, the newly constructed sample set is used to re-train the base classifiers.
First, the classification error $e_{i,t}$ (suppose it is the $t$-th iteration at present) of the combining classifier $HC_i$ (the combination of the five base classifiers other than $bc_i$) on the labeled sample set is recorded. If $e_{i,t}$ meets Eq. (4), samples with high confidence that meet the conditions are selected as the extended training set.

$$\frac{e_{i,t}}{e_{i,t-1}} < \frac{W_{i,t-1}}{W_{i,t}} < 1 \quad (4)$$

where the initial value of the classification error ($e_{i,0}$) can be set as 0.5. During optimization of the base classifiers, an extended sample set is selected only when the performance of the combining classifier improves. Specifically, data in the unlabeled sample set are added into the candidate extended sample set. When the weight sum of all added unlabeled samples exceeds the threshold, adding is stopped. Next, extended samples are screened from the candidate extended sample set according to confidence: any candidate sample whose confidence is lower than the threshold is deleted. The candidate sample set formed by this screening is then judged by Eq. (5). If the set meets the condition, it is used as the extended sample set. Otherwise, the samples are deleted.

$$W_{i,t} < \frac{e_{i,t-1}\, W_{i,t-1}}{e_{i,t}} \quad (5)$$

where $W_{i,t}$ is the weight of the sample set at the $t$-th iteration. $W_{i,t}$ is the sum of the weights $w_{i,t,j}$ of the samples in the set, where $w_{i,t,j}$ is the predicted confidence of sample $x_j$ given by the classifiers other than the $i$-th classifier at the $t$-th iteration. According to the above method, the training sample sets of the base classifiers ($bc_1$, ..., $bc_5$, and $bc_6$) are extended by using the combining classifiers ($HC_1$, $HC_2$, $HC_3$, $HC_4$, $HC_5$, and $HC_6$). In this way, the co-training of the six base classifiers is accomplished.

Step 3) Whether supplementary samples have been added to each of the six base classifiers is checked one by one. If yes, the supplementary samples are integrated with the current sample set of the base classifier to re-train it, and the state of its flag bit is updated.

Step 4) The update flag bits of the six base classifiers are checked one by one. If none is updated, training of the co-forest intelligence model stops. Otherwise, Step 2 is performed and the co-training continues.

Fig. 2. Training of the co-forest intelligence model.

3.3 Construction of pathologic dataset of ovarian GCT

Patients with ovarian GCT who were diagnosed and hospitalized in the West China Second University Hospital of Sichuan University from April 00 to February 0 were selected for this research according to the following rules. (1) The diagnosis of ovarian GCT was reviewed and confirmed by senior pathologists. (2) Complete clinical data were available from the first visit through the treatment period. (3) Follow-up visit years. Finally, 75 patients with ovarian GCT were investigated in this experiment, including 7 patients suffering recurrence of tumor during follow-up. The recurrence period ranges between 6 and 5 months, months on average. Among them, three patients died of recurrence.

Clinical data of all patients were reviewed and the clinical features were summarized, including age, mode of operation, clinical stage of tumor, and postoperative chemotherapy. In the pathologic dataset of ovarian GCT, patients were aged from to 80, and the median age of onset was 7 years. All patients were treated by operations.
Among them, patients (56%) had a primary operation with hysterectomy + bilateral adnexectomy +/- lymph node excision, patients (%) had adnexectomy of the affected side or tumorectomy, and 9 patients (%) had tumor reductive surgery. After the operation, 9 patients (5%) were determined as stage, 0 patients (0%) at stage, and 6 patients (8%) at stage. In addition, 8 patients (6%) received radiotherapy/chemotherapy and another 7 patients (6%) did not.

Pathologic data and sections of all patients were reviewed by senior attending doctors. The tumor diameter ranges between .5 and cm, 5.8 cm on average. Specifically, six patients (8%) had spontaneous tumor rupture. Tumor patterns under a microscope are mainly the follicular pattern (Fig. 3), insular pattern (Fig. 4), trabecular pattern (Fig. 5), and diffuse/sarcoma pattern (Fig. 6), accompanied by a few combined patterns (two or more of the above four patterns). Twenty-six patients (.7%) presented tumor hemorrhage and necrosis. Call-Exner bodies (Fig. 7) were observed in 8 patients (6%), and tumor luteinization was detected in patients (0.7%) (Fig. 8). The nuclear mitosis count of the tumors was -/0HPF, 7/0HPF on average. In the immunohistochemical test, 5 patients (69.%) were PTEN positive, 6 patients (8.7%) were p16 positive, and patients (57.%) were p53 positive (positive cells>50%[6]).

Fig. 3. Tumor cells in follicular pattern (amplification factor=00)
Fig. 4. Tumor cells in insular pattern (amplification factor=00)
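Returning to the co-training test of Section 3.2, the screening of Eqs. (4) and (5) can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the authors' code: the function names `satisfies_eq4` and `select_extension`, the tuple layout of a candidate, and the default confidence threshold of 0.75 are all hypothetical choices made for illustration.

```python
def satisfies_eq4(e_t, e_prev, w_t, w_prev):
    """Eq. (4): e_t / e_prev < W_prev / W_t < 1."""
    if e_prev <= 0 or w_t <= 0:
        return False
    return (e_t / e_prev) < (w_prev / w_t) < 1.0


def select_extension(candidates, e_t, e_prev, w_prev, threshold=0.75):
    """Screen newly labeled samples for one base classifier bc_i.

    candidates: list of (sample_id, predicted_label, confidence) tuples
        produced by the combining classifier HC_i (hypothetical layout).
    e_t, e_prev: errors of HC_i at the current and previous iteration.
    w_prev: total weight W of the set accepted at the previous iteration.
    threshold: assumed minimum confidence for a single sample.
    Returns (selected, w_t); `selected` is empty when the set is rejected.
    """
    if e_t >= e_prev:
        # The combining classifier did not improve, so nothing is added.
        return [], 0.0
    # Drop every candidate whose confidence is below the threshold.
    kept = [c for c in candidates if c[2] >= threshold]
    # W_t is the sum of the predicted confidences of the kept samples.
    w_t = sum(c[2] for c in kept)
    # Eq. (5): accept the set only if W_t < e_prev * W_prev / e_t.
    if kept and (e_t == 0 or w_t < e_prev * w_prev / e_t):
        return kept, w_t
    return [], 0.0
```

With `e_prev = 0.5`, `e_t = 0.2`, and `w_prev = 1.0`, the bound in Eq. (5) is 2.5, so two candidates with confidences 0.9 and 0.8 (total weight 1.7) are accepted while a candidate with confidence 0.6 is filtered out by the threshold.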

Fig. 5. Tumor cells in trabecular pattern (amplification factor=00)
Fig. 6. Tumor cells in diffuse/sarcoma pattern (amplification factor=00)
Fig. 7. Tumor cells in Call-Exner body pattern (amplification factor=00)
Fig. 8. Luteinization and nuclear mitosis of tumor cells (amplification factor=00)

Patients were divided into the recurrence group and the non-recurrence group. Pathological and clinical factors related to tumor recurrence were analyzed preliminarily by the Log-Rank test[]. Clinical factors related to recurrence included clinical stage of tumor and postoperative chemotherapy (p<0.05). Pathological factors included spontaneous tumor rupture, tumor cell pattern (insular or diffuse patterns), nuclear mitosis number of tumor cells, and positive rates of p53 and Ki-67 index (p<0.05).

4. Simulation result analysis

4.1 Data preprocessing experiment and analysis

The pathologic sample set of ovarian GCT is preprocessed, including standardization and normalization. Preprocessing rules for specific data in samples are shown in Table 1. The preprocessing rules are described as follows.
(1) Rule 1: a binary data term takes the value 1 if it is present; otherwise, it takes the value 0.
(2) Rule 2: a multiple-valued data term with determined values is discretized according to a regulated proportion.
(3) Rule 3: a multiple-valued data term without determined values is first truncated at the upper limit set by expert rules and then discretized.

Table 1. Data preprocessing rules of ovarian GCT

Index | Rules
clinical stage of tumor |
postoperative chemotherapy |
Call-Exner body |
number of nuclear mitosis |
cell atypism |
haemorrhage and necrosis |
follicular pattern |
insular pattern |
trabecular pattern |
ribbon pattern |
diffuse pattern |
luteinization of tumor cells |
Ki-67 expression |
PTEN |
EGFR |
P53 |
prognostic status |

Pathological samples of ovarian GCT include 7 clinical/pathological features and prognosis status. Some preprocessed pathological data samples of ovarian GCT are listed in Table 2, including the original data and the preprocessed (standardized and normalized) data. In Table 2, attribute values of all preprocessed data meet the requirements of standardization and normalization.

Table 2. Preprocessing results of some ovarian GCT data

Index | Sample 1 (before) | Sample 1 (after) | Sample 2 (before) | Sample 2 (after)
clinical stage of tumor | stage | 0.5 | stage | 0
postoperative chemotherapy | none | 0 | exist |
Call-Exner body | exist | | exist |
number of nuclear mitosis | | 0. | | 0.6
cell atypism | none | 0 | none | 0
haemorrhage and necrosis | exist | | none | 0
follicular pattern | none | 0 | exist |
insular pattern | exist | | exist | 0
trabecular pattern | exist | | none | 0
ribbon pattern | none | 0 | none | 0
diffuse pattern | exist | | none | 0
luteinization of tumor cells | none | 0 | none | 0
Ki-67 expression | 50% | 0.5 | 0% | 0.
PTEN | focal positive | 0. | focal positive | 0.
EGFR | negative | 0 | negative | 0
P53 | negative | 0 | focal positive | 0.
prognostic status | favorable | | unfavorable | 0

4.2 Experimental analysis on prognosis evaluation of ovarian GCT

For ovarian GCT samples, a series of relevant features, including clinical and pathological features, was enlisted with reference to previous literature and research results. On this basis, two feature sets were constructed.
(1) Feature set M1 consists of all features in Table 1 except the prognosis status, after standardization and normalization.
(2) Based on M1, feature set M2 consists of the factors that are significantly correlated (p<0.05) with recurrence according to the preliminary Log-Rank test. It covers clinical stage of tumor, postoperative chemotherapy, nuclear mitosis, spontaneous tumor rupture, positive rates of immunohistochemical markers (p53 and Ki-67), and pattern of tumor cells (follicular and diffuse patterns).

For M1 and M2, the proposed co-forest intelligence model was applied for the experiment of prognosis prediction. The results were compared with the decision tree C4.5[5] and the support vector machine (SVM) model [6]. Three-fold cross validation was applied in the experiment. The receiver operating characteristic (ROC) curves of prognosis evaluation by the co-forest intelligence model, decision tree C4.5, and SVM model based on M1 are shown in Fig. 9. The ROC curves of the three models based on M2 are shown in Fig. 10. The ROC curves of the co-forest intelligence model, decision tree C4.5, and SVM model based on M1 and M2 are shown in Figs. 11-13. Prediction performance statistics of the three models based on the different feature sets are presented in Table 3.

Figs. 9 and 10 show that the proposed prognosis evaluation method of ovarian GCT based on the co-forest intelligence model is superior to decision tree C4.5 and the SVM model in prognosis prediction based on either M1 or M2. Figs. 11-13 show that the prognosis prediction accuracies of the co-forest intelligence model, decision tree C4.5, and SVM model based on M2 are significantly higher than those based on M1. These experimental results prove the validity of preliminary feature set screening by the Log-Rank test. Table 3 shows that the areas under the ROC curves (AUCs) of the co-forest intelligence model based on M1 and M2 (0.96 and 0.958, respectively) are far larger than those of the decision tree C4.5 (0.7 and 0.86) and the SVM model (0.7 and 0.798). According to the above results, the proposed prognosis evaluation of ovarian GCT based on the co-forest intelligence model has significantly higher validity than decision tree C4.5 and the SVM model.
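The evaluation protocol above (three-fold cross validation scored by AUC) can be sketched as follows. The paper does not state how the folds were drawn or how the AUC was computed, so both choices here are assumptions: `three_fold_indices` uses a simple interleaved split, and `roc_auc` uses the rank interpretation of the AUC (the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half). Both function names are hypothetical.

```python
def three_fold_indices(n_samples, n_folds=3):
    """Yield (train_idx, test_idx); fold f holds out indices f, f+n_folds, ..."""
    for f in range(n_folds):
        test = [i for i in range(n_samples) if i % n_folds == f]
        train = [i for i in range(n_samples) if i % n_folds != f]
        yield train, test


def roc_auc(labels, scores):
    """Area under the ROC curve from binary labels (1/0) and real scores."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(pos) * len(neg))
```

For 75 samples, each of the three folds holds out 25 samples and trains on the remaining 50; the per-fold scores of a classifier can then be pooled or averaged before calling `roc_auc`.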
Furthermore, the prognosis prediction accuracies of the co-forest intelligence model, decision tree C4.5 and SVM model based on M2 are higher than those based on M1, which supports the validity of the Log-Rank test for selecting from the original feature set. The proposed method overcomes the problems of limited pathological samples and the difficulty of determining prognosis-relevant factors in prognosis evaluation of ovarian GCT. It also achieves satisfactory prediction performance and has high practical value in prognosis evaluation. This study helps clinicians optimize the treatment scheme and realize individualized precision treatment based on a comprehensive evaluation of patients' conditions, thereby helping to safeguard the long-term survival rate and quality of life of patients.

Fig. 9. ROC curves of evaluation models based on M1
Fig. 10. ROC curves of evaluation models based on M2
Fig. 11. ROC curves of the co-forest intelligence model based on M1 and M2
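The Log-Rank screening step that produces the reduced feature set can be sketched as follows. This is a minimal two-group log-rank test written with the standard library only, under several assumptions that are not in the paper: each feature is binary (present or absent), `times` holds follow-up durations, `events` marks observed recurrence (1) versus censoring (0), and the names `logrank_p` and `screen_features` are hypothetical. The p-value uses the chi-square distribution with one degree of freedom, whose survival function equals `erfc(sqrt(x / 2))`.

```python
import math

# Illustrative sketch only: function names, the binary-feature assumption
# and the toy data are not taken from the paper.

def logrank_p(times, events, groups):
    """Two-group log-rank test. events: 1 = recurrence observed, 0 = censored.
    groups: 0/1 membership. Returns the p-value (chi-square, 1 d.o.f.)."""
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        n = sum(1 for ti in times if ti >= t)  # at risk, both groups
        n1 = sum(1 for ti, g in zip(times, groups) if ti >= t and g == 1)
        d = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if ti == t and e == 1 and g == 1)
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0.0:
        return 1.0  # no information to separate the groups
    chi2 = observed_minus_expected ** 2 / variance
    return math.erfc(math.sqrt(chi2 / 2.0))


def screen_features(times, events, binary_features, alpha=0.05):
    """Keep the features whose two groups differ significantly in recurrence."""
    return [name for name, groups in binary_features.items()
            if logrank_p(times, events, groups) < alpha]


if __name__ == "__main__":
    times = list(range(1, 9)) + list(range(11, 19))
    events = [1] * 8 + [0] * 8
    features = {
        "early_marker": [1] * 8 + [0] * 8,  # all recurrences fall in group 1
        "noise_marker": [1, 0] * 8,         # unrelated to recurrence
    }
    print(screen_features(times, events, features))
```

A library routine for survival analysis would normally replace the hand-written test; the sketch is only meant to make the selection criterion (p<0.05) concrete.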

Fig. 12. ROC curves of the decision tree C4.5 based on M1 and M2
Fig. 13. ROC curves of the SVM model based on M1 and M2

Table. Performances of different prognosis evaluation methods of ovarian GCT

| Algorithm model | AUC (M1) | AUC (M2) |
|---|---|---|
| co-forest model | 0.96 | 0.957 |
| decision tree C4.5 model | 0.7 | 0.86 |
| support vector machine | 0.7 | 0.798 |

5. Conclusions

Ovarian GCT has limited samples and significantly different periods of recurrence, resulting in many difficulties in prognosis evaluation. In this study, AI theory and machine learning technology are introduced into prognosis evaluation of tumors, and a prognosis evaluation method of ovarian GCT based on the co-forest intelligence model is proposed. The following conclusions can be drawn from the experimental results.

(1) The prognosis evaluation of ovarian GCT based on M1, which is standardized and normalized, is poorer than that based on M2, which is selected by the Log-Rank test. Currently, the pathological and clinical features related to the prognosis of ovarian GCT have not been completely determined. Therefore, M1 must contain some invalid and even interfering features. The Log-Rank test can eliminate some interfering features, thus improving the prediction accuracy of the model.

(2) The co-forest intelligence model can perform modeling analysis on a small dataset and discover effective features from the candidate feature set through automatic learning. It overcomes some shortcomings of ovarian GCT prognosis (i.e., incomplete determination of relevant pathological and clinical features) and achieves satisfactory prognosis prediction results. The AUCs of the co-forest intelligence model based on M1 and M2 (0.96 and 0.958, respectively) are far larger than those of the decision tree C4.5 (0.7 and 0.86) and the SVM model (0.7 and 0.798).

The proposed prognosis evaluation method of ovarian GCT based on the co-forest intelligence model not only overcomes limited sample data and ambiguous prognostic features but also achieves good prediction results. It has high practical value in prognosis evaluation. The research conclusions help break bottlenecks in prognosis evaluation of ovarian GCT and can help clinicians understand the development patterns of ovarian GCT, take the initiative in diagnosis and treatment, and increase the long-term survival rates of patients. However, further improvements are still needed. Future studies should collect more ovarian GCT samples to improve the generalization of the prognosis prediction model.

Acknowledgements

The authors are grateful for the support provided by the Program of Key Laboratory Open Fund in Sichuan Province (Grant No. 07LF008) and the Important Special Fund for Applied R & D in Guangdong Province (Grant No. 05BD000).

This is an Open Access article distributed under the terms of the Creative Commons Attribution License

References

1. Färkkilä A., Haltia U. M., Tapper J., et al. Pathogenesis and treatment of adult-type granulosa cell tumor of the ovary. Annals of Medicine, 9(5), 2017, pp.5-7.
2. Nosov V., Silva I., Tavassoli F., et al. Predictors of recurrence of ovarian granulosa cell tumors. International Journal of Gynecological Cancer, 9(), 2009, pp.68-6.
3. Klemi P. J., Joensuu H., Salmi T. Prognostic value of flow cytometric DNA content analysis in granulosa cell tumor of the ovary. Cancer, 65(5), 1990, pp.89-9.
4. Zhou Z. H., Li M. Semisupervised regression with cotraining-style algorithms. IEEE Transactions on Knowledge & Data Engineering, 9(), 2007, pp.79-9.
5. Raahemi B., Zhong W., Liu J. Exploiting unlabeled data to improve peer-to-peer traffic classification using incremental tri-training method. Peer-to-Peer Networking and Applications, (), 2009, pp.87-97.
6. Khosla D., Dimri K., Pandey A. K., et al. Ovarian granulosa cell tumor: clinical features, treatment outcomes and prognostic factors. North American Journal of Medical Sciences, 6(), 0, pp. 8.
7. Sehouli J., Drescher F. S., Mustea A., et al. Granulosa cell tumor of the ovary: 0 years follow-up data of 65 patients. Anticancer Research, (C), 00, pp.-9.

8. Liao X., Feng M., Wang H. Pathologic features and prognostic factors of ovarian granulosa cell tumor. Journal of Sichuan University (Medical Science Edition), (), 0, pp.9-.
9. Fox H., Agrawal K., Langley F. A. A clinicopathologic study of 9 cases of granulosa cell tumor of the ovary with special reference to the factors influencing prognosis. Cancer, 5(), 1975, pp.-.
10. Schwartz P. E., Smith J. P. Treatment of ovarian stromal tumors. American Journal of Obstetrics & Gynecology, 5(), 1976, pp.0-.
11. Cheong M. L., Shen J., Huang S. H. Long-term survival in a patient with an advanced ovarian juvenile granulosa cell tumor with para-aortic lymph node metastasis. Taiwanese Journal of Obstetrics & Gynecology, 55(6), 2016, pp.907-909.
12. Majdoul S., Tawfiq N., Bourhaleb Z. Recurrence occurring ten years after the initial diagnosis of granulosa cell tumour of the ovary: about two cases and review of the literature. Pan African Medical Journal, 5(), 2016, pp.5-0.
13. Sommers S. C., Gates O., Goodof I. I. Late recurrence of granulosa cell tumors: report of two cases. Obstetrics & Gynecology, 6(), 1955, pp.95-98.
14. Seagle B. L., Ann P., Butler S., Shahabi S. Ovarian granulosa cell tumor: a National Cancer Database study. Gynecologic Oncology, 6(), 2017, pp.85-9.
15. Haba R., Miki H., Kobayashi S., et al. Combined analysis of flow cytometry and morphometry of ovarian granulosa cell tumor. Cancer, 7(), 2015, pp.58-6.
16. Pectasides D., Pectasides E. A. Granulosa cell tumor of the ovary. Cancer Treatment Reviews, (), 2008, pp.-.
17. Huang M. W., Lin W. C., Chen C. W., et al. Data preprocessing issues for incomplete medical datasets. Expert Systems, (5), 2016, pp.-8.
18. Haustein S. Grand challenges in altmetrics: heterogeneity, data quality and dependencies. Scientometrics, 08(), 2016, pp.-.
19. Zhang K., Lan L., Kwok J. T., et al. Scaling up graph-based semisupervised learning via prototype vector machines. IEEE Transactions on Neural Networks & Learning Systems, 6(), 2017, pp.-57.
20. Yeung D. Y., Chang H., Dai G. A scalable kernel-based semisupervised metric learning algorithm with out-of-sample generalization ability. Neural Computation, 0(), 2008, pp.89-86.
21. Li M., Zhou Z. H. Improve computer-aided diagnosis with machine learning techniques using undiagnosed samples. IEEE Transactions on Systems, Man and Cybernetics, 9(), 2007, pp.088-098.
22. Robinson A. Randomization, bootstrap and Monte Carlo methods in biology. Journal of the Royal Statistical Society, 70(), 00, pp.856-859.
23. Yang R. M., Zhang G. L., Liu F., et al. Comparison of boosted regression tree and random forest models for mapping topsoil organic carbon concentration in an alpine ecosystem. Ecological Indicators, 60(), 2016, pp.870-878.
24. Koletsi D., Pandis N. Survival analysis, part : Kaplan-Meier method and the log-rank test. American Journal of Orthodontics & Dentofacial Orthopedics, 5(), 2017, pp.569-57.
25. Zouggar S. T., Adla A. Proposal for measuring quality of decision trees partition. International Journal of Decision Support System Technology, 9(), 2017, pp.6-6.
26. Wu J., Yang H. Linear regression-based efficient SVM learning for large-scale classification. IEEE Transactions on Neural Networks & Learning Systems, 6(0), 2017, pp.57-69.