IMPROVING THE EFFICIENCY OF BIOMARKER IDENTIFICATION USING BIOLOGICAL KNOWLEDGE

JOHN H. PHAN
The Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, 313 Ferst Drive, Atlanta, GA 30332, USA

QIQIN YIN-GOEN
ANDREW N. YOUNG
Department of Pathology and Laboratory Medicine, Emory University, Atlanta, GA 30322, USA

MAY D. WANG
The Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology, 313 Ferst Drive, Atlanta, GA 30332, USA

Identifying and validating biomarkers from high-throughput gene expression data is important for understanding and treating cancer. Typically, we identify candidate biomarkers as features that are differentially expressed between two or more classes of samples. Many feature selection metrics rely on ranking by some measure of differential expression. However, interpreting these results is difficult due to the large variety of existing algorithms and metrics, each of which may produce different results. Consequently, a feature ranking metric may work well on some datasets but perform considerably worse on others. We propose a method to choose an optimal feature ranking metric on an individual dataset basis. A metric is optimal if, for a particular dataset, it favorably ranks features that are known to be relevant biomarkers. Extensive knowledge of biomarker candidates is available in public databases and literature. Using this knowledge, we can choose a ranking metric that produces the most biologically meaningful results. In this paper, we first describe a framework for assessing the ability of a ranking metric to detect known relevant biomarkers. We then apply this method to clinical renal cancer microarray data to choose an optimal metric and identify several candidate biomarkers.

1. Introduction

The subjective nature of traditional medical techniques limits the accuracy of cancer subtype classification and, subsequently, the effectiveness of therapy. Clinicians visually examine cancer specimens to determine their subtypes before proposing treatment regimens.
However, cancers with similar characteristics may behave very differently despite similar treatment conditions [1]. Because cancer is the result of genetic anomalies, emerging diagnostic research has primarily focused on genetic and proteomic expression. This research generally involves the use of high-throughput technology (e.g., microarrays and mass spectrometry) to generate large amounts of genetic and proteomic expression data. We typically reduce these data using one of many analysis algorithms with the goal of identifying a subset of features (corresponding to genes or proteins) with high predictive accuracy [2-4]. We hope that these feature subsets will both enhance our understanding of the biological mechanisms and provide us with an accurate diagnostic system. When validated, we call these differentially expressed features biomarkers.

Unfortunately, even the selection of a ranking metric is subjective, as different metrics may identify different subsets of features [5]. Feature ranking affects both the efficiency of identifying relevant genes and the accuracy of subsequent predictive models. We address this issue by presenting a method that uses existing biological knowledge to identify the best feature ranking metric for a particular gene expression dataset. The optimal metric maximizes the probability of correctly ranking differentially expressed and previously validated genes.

Despite numerous feature selection studies, there is still a lack of clinically validated and proven biomarkers for most cancers. Thus, the use of correct genes as knowledge for algorithm selection is subjective and we should choose these genes carefully. Sources of biological knowledge are abundant, but vary in terms of reliability. We consider a knowledge source to be reliable if genes (or the corresponding expressed proteins) from that source have been clinically validated as differentially expressed. The majority of knowledge is contained in the literature and roughly falls into four levels of reliability, adapted from a review of post-analysis validation methods by Chuaqui et al. [6]:

1. No biological validation. As the lowest level of reliability, this includes studies that develop feature selection algorithms and present the selected list of genes without a stringent interpretation of the biological results.
2. In silico validation. Also known as computational validation, these studies compare their feature selection results to the results of other studies. They may also identify Gene Ontology (GO) categories that are statistically overrepresented as a result of feature selection.
3. Same-sample validation. These studies validate their microarray experiments by performing additional assays on the same samples from which their microarrays were derived. These assays typically include quantitative real-time PCR (qRT-PCR) or northern analysis and serve to validate the technical reliability of the microarrays.
4. Independent or clinical validation. As the highest level of reliability, these studies validate the results of their microarray experiments using independent biological samples, usually from a clinical source. Independent validation ensures that the selected features are not a result of over-fitting. These validations often take the form of qRT-PCR and in situ hybridization (ISH) for RNA products, or immunohistochemistry (IHC) and western analysis for protein products.

Despite frequent disagreement between qRT-PCR and microarray results, qRT-PCR is the most common method for validation of differentially expressed genes. Genes with large fold change in microarray data are consistently correlated with qRT-PCR while those with smaller fold change are more susceptible to technical variability [7]. The detection of differentially expressed genes is generally reproducible across several microarray platforms [8]. However, in light of a recent study illustrating the pervasiveness of technical artifacts in microarray data [9], we only consider a knowledge source reliable if it falls into category three or four.

Investigators have attempted to improve feature selection by using biological knowledge. Their knowledge sources often fall into category two of reliability, in silico validation, and include Gene Ontology and pathway databases, published literature, microarray repositories, and sequence information. Generally, these studies identify genes that cluster or correlate with genes from the knowledge sources [10-12]. Another study developed a theoretical framework to compare feature ranking metrics in the presence of control features [13]. However, this study also neglected to focus on the reliability of the control features. Indeed, the wealth of available information in the form of gene and protein interactions, functional annotation, and genetic pathways can improve the results of data analysis [14]. Furthermore, microarray data analysis has shifted from purely data-driven methods to methods that use additional knowledge, even in the feature selection process [14].

We develop a method to quantify the efficiency of detecting biomarkers by feature ranking. This method maximizes the biological relevance of feature ranking by choosing the best metric from a population of metrics.
The chosen ranking metric is optimal with respect to knowledge obtained from reliable sources. We test the effectiveness of our method using clinical gene expression data. Results indicate that the choice of ranking metric significantly affects feature ranking, which, in turn, affects the efficiency of discovering and validating novel biomarkers.

2. Methods

2.1. Modeling Knowledge in Feature Selection

Throughout this paper, the term feature set denotes a group of one or more features or genes that act in concert. A sample refers to measurements of a feature set from a single microarray or molecular profile. The entire microarray sample contains $l$ features while a feature set may contain $p$ features (where $p \ll l$). We represent samples for feature set $i$ as jointly distributed random vectors, $\vec{X} \in \mathbb{R}^p$, and labels, $Y \in \{0,1\}$. The class label, $Y$, indicates the clinical source of the microarray sample. In most cancer problems, $Y = 1$ indicates, for example, samples measured from patients with cancer and $Y = 0$ indicates samples from patients with no cancer. For a microarray dataset with $N$ samples, feature set $i$ for a particular dataset is the vector

$$\vec{d}_i = ((y_1, \vec{x}_1), (y_2, \vec{x}_2), \ldots, (y_N, \vec{x}_N))$$

from the random variable $\vec{D}$, which represents all feature sets in a dataset. Each feature set is associated with a relevance variable, $r_i$, from the random variable $R \in \{0,1\}$. $r_i$ represents the biological relevance of the feature set and the reliability of the knowledge source. $\vec{D}$ and $R$ are jointly distributed. For each feature set, we assign a score that represents the predictive ability of that feature set:

$$A = h(\vec{D}, \theta) \qquad (1)$$

where $A \in \mathbb{R}$ is a random variable and $\theta$ is a meta-parameter that characterizes the scoring function, or ranking metric.

Although $\theta$ may represent the space of all ranking methods, we use a reduced set of wrapper-based methods in our simulations. Specifically, we use a support vector machine (SVM) classifier with the linear and radial basis kernels and estimate the classification accuracy of biomarkers using the 0.632 bootstrap [5, 15]. The SVM classifier depends on a cost parameter, $C$, which determines the penalty of misclassification. The radial basis kernel depends on $\gamma$, which is proportional to the complexity of the classifier. For the radial basis kernel, the pair of parameters, $(C, \gamma)$, represents $\theta$. We discretely vary $C$ and $\gamma$ over the log scale range of 0.1 to $10^3$ and 0.01 to $10^5$, respectively. For the linear kernel, only the single parameter, $C$, represents $\theta$. We vary this parameter over the log scale range of 0.01 to $10^2$.
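The candidate space of $\theta$ described above can be enumerated as a small grid. The following is a minimal sketch, assuming the ranges read as 0.1 to $10^3$ for $C$ and 0.01 to $10^5$ for $\gamma$ (radial basis kernel) and 0.01 to $10^2$ for $C$ (linear kernel). The step of one decade per grid point and the names `log_grid`, `rbf_thetas`, and `linear_thetas` are assumptions made for illustration; the text only states that the parameters vary discretely on a log scale.

```python
def log_grid(low_exp, high_exp):
    """Powers of ten from 10**low_exp to 10**high_exp, inclusive.

    One point per decade is an assumption; the paper does not give the step.
    """
    return [10.0 ** e for e in range(low_exp, high_exp + 1)]

# Radial basis kernel: theta is a (C, gamma) pair.
rbf_thetas = [("rbf", c, g) for c in log_grid(-1, 3) for g in log_grid(-2, 5)]

# Linear kernel: theta is the cost parameter C alone.
linear_thetas = [("linear", c, None) for c in log_grid(-2, 2)]

# The full population of candidate ranking metrics.
thetas = rbf_thetas + linear_thetas
print(len(rbf_thetas), len(linear_thetas), len(thetas))
```

Each tuple would then parameterize one scoring function $h(\cdot, \theta)$, for example an SVM whose error is estimated by the 0.632 bootstrap.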

In practice, a gene expression dataset will have $N$ samples, each with $l$ features. We separately examine $m$ feature sets ($m$ can be different from $l$ and include, for example, all pairs, triplets, or a subset of feature combinations), corresponding to $\{\vec{d}_1, \vec{d}_2, \ldots, \vec{d}_m\}$ and $\{r_1, r_2, \ldots, r_m\}$. From the mapping defined in eq. 1, we compute the set of values $\{\alpha_1, \alpha_2, \ldots, \alpha_m\}$ where each $\alpha_i$ is an observation from $A$. Using a simple selection method, we can then conclude that the best feature sets and potential biomarkers are in the set

$$G = \{ i : \alpha_i < \tau \} \qquad (2)$$

where $\tau$ is a threshold.

We want to choose a $\theta$ that produces the most biologically relevant ranking of the $m$ feature sets, $\{\vec{d}_1, \vec{d}_2, \ldots, \vec{d}_m\}$, with respect to a given set of knowledge. Assuming that lower scores are better, the best $\theta$ assigns scores such that $\alpha_i < \alpha_j$ for $r_i = 1$ and $r_j = 0$, i.e., feature set $i$ is known to be more relevant than feature set $j$ for this particular dataset. Although we may never know the relevance of all features in a dataset, we may infer from literature that the $k$ feature sets, $G_k = \{g_1, g_2, \ldots, g_k\}$, are relevant, where $k \ll m$. This implies that the elements of the set $\{\alpha_i : i \in G_k\}$ should generally be smaller than those of $\{\alpha_j : j \notin G_k\}$. If the knowledge is reliable, we want to choose a $\theta$ that maximizes the probability that the score of a feature set from $G_k$ is less than that of a feature set that is not from $G_k$. Explicitly, this probability is

$$P(\alpha_i < \alpha_j \mid \theta) \qquad (3)$$

for $i \in G_k$ and $j \notin G_k$. The estimated optimal ranking method is

$$\hat{\theta} = \arg\max_{\theta} P(\alpha_i < \alpha_j \mid \theta), \qquad (4)$$

keeping in mind that $\hat{\theta}$ is only optimal, or maximizes the probability, with respect to the given knowledge set. For $m$ feature sets, $k$ of which are in our knowledge set, $G_k$, we can empirically approximate the probability of eq. 3 with

$$\hat{P}(\alpha_i < \alpha_j \mid \theta) = \frac{1}{k(m-k)} \sum_{i \in G_k} \sum_{j \notin G_k} I(\alpha_i < \alpha_j) \qquad (5)$$

where $I(x)$ evaluates to one when $x$ is true and zero when $x$ is false. Eq. 5 is equivalent to computing the area under an ROC curve (AUC) for classifying feature sets as either relevant or irrelevant [13].

2.2. Iteratively Updating Knowledge

It may be difficult to compile a comprehensive list of knowledge from literature and independent validation. Consequently, we can expect that some feature sets that are not in our knowledge set, $j \notin G_k$, are, in fact, relevant biomarkers. If $V$ is the set of all relevant biomarkers, regardless of whether their relevance is known, we define the knowledge update function, $S$, as

$$G_{k+1} = S_{\hat{\theta}}(G_k) = \{\{G_k, \arg\min_i \alpha_i^{\hat{\theta}}\} : i \in V, i \notin G_k\}. \qquad (6)$$

This function adds to $G_k$ a relevant biomarker with the best rank according to the estimated optimal metric, $\hat{\theta}$. Of course, a feature set is known to be in the set $V$ only after performing a validation procedure such as qRT-PCR.

If we know all feature sets in $V$, we can quantify any improvement in efficiency due to optimization of the ranking metric. Using bootstrap resampling, we randomly and repeatedly partition the feature sets in $V$ into a group of known relevant feature sets (training) and a group of unknown relevant feature sets (testing). If there are $K$ elements in $V$, we randomly select $K$ elements with replacement, resulting in $K^*$ ($K^* < K$) unique elements for the testing set. We use the group of $K - K^*$ known relevant feature sets to optimize the ranking metric, then iteratively detect feature sets from the unknown set of $K^*$ features and update our knowledge using eq. 6.

Every validation test requires a finite amount of time and resources. Plotting the fraction of correctly validated biomarkers (y-axis) vs. total validation time (x-axis) reveals that higher detection efficiency corresponds to a larger area under this curve. This curve is similar to a ROC curve, so we also call the area under this curve the AUC.
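The bootstrap partition of $V$ and the detection curve described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names `bootstrap_split` and `detection_curve` are hypothetical, each validation test is assumed to cost one unit of time, and the scores come from a single fixed metric, whereas the method in the text re-optimizes $\theta$ after each knowledge update.

```python
import random

def bootstrap_split(relevant, rng):
    """Draw K items from V with replacement.

    Following the text, the K* unique drawn items form the unknown (testing)
    group and the K - K* items never drawn form the known (training) group.
    The known group can come out empty; a caller should redraw in that case.
    """
    drawn = {rng.choice(relevant) for _ in relevant}
    known = [g for g in relevant if g not in drawn]
    hidden = [g for g in relevant if g in drawn]
    return known, hidden

def detection_curve(scores, known, hidden):
    """Fraction of hidden biomarkers found after each validation test.

    Candidates are validated in rank order (lowest score first), one unit of
    validation time each. Scores are fixed here (a simplifying assumption).
    """
    known = set(known)
    candidates = sorted((i for i in range(len(scores)) if i not in known),
                        key=lambda i: scores[i])
    curve, found = [], 0
    for i in candidates:
        if i in hidden:
            known.add(i)   # eq. 6: the validated biomarker joins the knowledge set
            found += 1
        curve.append(found / len(hidden))
    return curve

# Toy example: six feature sets, indices 0, 2 and 5 are truly relevant.
scores = [0.05, 0.30, 0.10, 0.40, 0.25, 0.15]
curve = detection_curve(scores, known=[0], hidden=[2, 5])
print(curve)                     # [0.5, 1.0, 1.0, 1.0, 1.0]
print(sum(curve) / len(curve))   # area with the time axis scaled to [0, 1]

known, hidden = bootstrap_split([0, 2, 5], random.Random(0))
```

Scaling the time axis to [0, 1] before taking the area is a choice made for this sketch; a curve that rises early gives an area close to one.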
We repeat this bootstrap sampling of feature sets 100 times in order to compute the significance of the differences among three conditions: optimal metric selection, sub-optimal metric selection, and sub-optimal initial knowledge. For the sub-optimal metric selection condition, we use correct initial knowledge selected from $V$ via bootstrap, but use a modified equation to choose $\hat{\theta}$ with median AUC:

$$\hat{\theta} = \arg\operatorname{median}_{\theta} P(\alpha_i < \alpha_j \mid \theta). \qquad (7)$$
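The empirical probability of eq. 5 and the two selection rules (eq. 4 for the optimal metric, eq. 7 for the median metric) can be sketched in a few lines. The metric names `theta_a`, `theta_b`, `theta_c` and their scores are toy values invented for illustration, and taking the lower median when the number of candidate metrics is even is an assumption, since the text does not specify a tie rule.

```python
def empirical_auc(scores, known):
    """Eq. 5: fraction of (known, other) pairs in which the known feature
    set receives the strictly lower (better) score."""
    inside = [s for i, s in enumerate(scores) if i in known]
    outside = [s for i, s in enumerate(scores) if i not in known]
    if not inside or not outside:
        raise ValueError("need at least one feature set on each side")
    wins = sum(1 for a in inside for b in outside if a < b)
    return wins / (len(inside) * len(outside))

def select_optimal(scores_by_theta, known):
    """Eq. 4: the metric whose ranking maximizes the empirical AUC."""
    return max(scores_by_theta,
               key=lambda t: empirical_auc(scores_by_theta[t], known))

def select_median(scores_by_theta, known):
    """Eq. 7: the metric whose empirical AUC is the median over candidates
    (lower median for an even count, an assumption of this sketch)."""
    ranked = sorted(scores_by_theta,
                    key=lambda t: empirical_auc(scores_by_theta[t], known))
    return ranked[(len(ranked) - 1) // 2]

# Toy scores (estimated errors) for five feature sets under three metrics.
scores_by_theta = {
    "theta_a": [0.05, 0.30, 0.10, 0.40, 0.25],
    "theta_b": [0.35, 0.10, 0.30, 0.05, 0.20],
    "theta_c": [0.05, 0.10, 0.30, 0.40, 0.25],
}
known = {0, 2}  # indices of the feature sets in G_k

print(select_optimal(scores_by_theta, known))  # theta_a
print(select_median(scores_by_theta, known))   # theta_c
```

Here `theta_a` ranks both known feature sets ahead of all others (AUC of 1), `theta_b` ranks them last (AUC of 0), and `theta_c` falls in between.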

Selection of a ranking metric with median AUC represents the common practice of arbitrarily selecting a metric with no regard for biological relevance and efficiency. This median AUC algorithm also serves as a reference point for assessing the potential improvement of efficiency when using the optimal algorithm. For the sub-optimal initial knowledge condition, we begin the simulation with incorrect knowledge selected via bootstrap and use eq. 4 to optimize the ranking algorithm before updating the current knowledge set. We expect the average AUC of the optimal selection condition to be higher than that of both of the sub-optimal conditions. Figure 1 illustrates this process.

To determine whether the optimization procedure is over-fitting to the knowledge set, we conduct additional tests using randomly selected knowledge sets. If over-fitting is occurring, results of the optimal, suboptimal, and suboptimal knowledge tests for randomly selected knowledge should be similar to those of the true knowledge set.

Figure 1. Quantifying the efficiency of detecting relevant feature sets. For clinical data, we define $V$ as the set of $K$ known differentially expressed feature sets. Using bootstrap cross validation, we partition $V$ into $K^*$ and $K - K^*$ samples. $K^*$ is the number of unique samples after sampling from $V$ $K$ times with replacement. We optimize the ranking algorithm using $K - K^*$ feature sets and assess the algorithm's efficiency in detecting the remaining $K^*$ feature sets. For each of the three conditions (optimal metric selection, sub-optimal metric selection, and sub-optimal initial knowledge), we perform this bootstrap sampling 100 times in order to compute the significance of any differences between mean AUC values.
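The reasoning behind the control tests can be illustrated with a small simulation: when the knowledge set is unrelated to the scores, the empirical AUC of eq. 5 averages about 0.5. This is a simplified sketch with one fixed metric and synthetic scores; the values of `m`, `k`, `trials`, and the seed are arbitrary assumptions, and the control tests in the text additionally run the full selection procedure over $\theta$.

```python
import random

def empirical_auc(scores, known):
    """Fraction of (known, other) pairs where the known set scores lower."""
    inside = [s for i, s in enumerate(scores) if i in known]
    outside = [s for i, s in enumerate(scores) if i not in known]
    wins = sum(a < b for a in inside for b in outside)
    return wins / (len(inside) * len(outside))

rng = random.Random(0)
m, k, trials = 50, 5, 2000                  # arbitrary sizes for the sketch
scores = [rng.random() for _ in range(m)]   # stand-in scores from one metric

aucs = []
for _ in range(trials):
    # Knowledge chosen at random, so it carries no information about the scores.
    random_knowledge = set(rng.sample(range(m), k))
    aucs.append(empirical_auc(scores, random_knowledge))

mean_auc = sum(aucs) / trials
print(round(mean_auc, 2))
```

A mean near 0.5 corresponds to random classification of feature sets as relevant or irrelevant.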

2.3. Microarray Data Analysis and qRT-PCR Validation

We examine two clinical case studies using renal tumor microarray datasets. The first dataset, from a study by Schuetz et al., uses Affymetrix microarrays (HG-Focus, 8793 probesets) to profile samples from three subtypes of renal tumors: 13 clear cell (CC) renal cell carcinoma (RCC), 4 chromophobe (CHR) RCC, and 3 oncocytoma (ONC, benign) [2]. The second dataset, from a study by Jones et al., uses a different model of Affymetrix microarrays (HG-U133A, 22283 probesets reduced to 8793 that are common to HG-Focus) to examine similar renal tumor subtypes with 32 CC, 6 CHR, and 12 ONC samples [16]. We are interested in biomarkers that differentiate the CC class from the combined group of ONC and CHR.

Using literature, we identify genes that have been validated (via qRT-PCR or IHC) as differentially expressed between the CC and ONC/CHR subtypes. We then validate an additional 94 genes using qRT-PCR (using RNA from 34 CC and 18 CHR tissue samples). These 94 genes were selected by a renal cancer pathologist based on his knowledge and previous research. Only some of the 94 genes assayed with qRT-PCR are differentially expressed as assessed by a linear SVM with classification error estimated using 0.632 bootstrap. Genes measured with qRT-PCR are categorized as differentially expressed if the estimated classification error is less than 10%. Using the set of knowledge from both literature and qRT-PCR validation, we examine the efficiency of detecting these biomarkers by optimizing the ranking metric under various conditions, as illustrated in figure 1.

3. Results and Discussion

As described in the methods, we identify five genes from literature that are differentially expressed between the CC and ONC/CHR renal tumor subtypes (table 1). Each of these genes had been validated using either qRT-PCR or IHC. Additionally, we validate several other potential biomarkers using qRT-PCR and select genes with estimated classification errors of less than 10% (table 2).
Combining all knowledge from both literature and qRT-PCR validation, we examine the effect of optimizing the feature ranking metric using the method illustrated in figure 1. Box plots of the 100 iterations for each of the three tests indicate that optimal selection outperforms sub-optimal selection (figure 2, left column). The comparison of optimal to suboptimal metrics may seem to always favor the optimal metric. However, the optimal metric is not always a simple linear classifier. In fact, during the iterative gene detection process, $\theta$ changes frequently as $V$ is updated. Moreover, suboptimal selection may represent the common practice of arbitrarily selecting ranking metrics with no regard to their potential disadvantages for particular datasets. The box plots represent the median and quartiles of the AUC values for each of the 100 iterations. Correspondingly, the ROC curves also indicate that the optimal selection method improves the efficiency of biomarker detection (figure 2, right column).

For the Schuetz data (figure 2, top row), the performance difference between the optimal and suboptimal ranking metrics seems small according to the box plots. However, the ROC curve of the optimal metric initially rises much more quickly than that of the suboptimal metric. The region of low specificity boosts the performance of the suboptimal metric. However, this region should be neglected when assessing performance since the number of false positives at this point is very high. Validation procedures would likely consider only the biomarkers detected in the high specificity region. Results are similar for the Jones data (figure 2, bottom row).

The high variance of the suboptimal initial knowledge condition indicates that optimization of the ranking metric is sensitive to the initial conditions. Some of the randomly selected initial knowledge may, in fact, be differentially expressed, resulting in good performance. However, these random initial knowledge sets are more likely to be irrelevant. Thus, box plots for this condition illustrate this mixture of knowledge quality. These results stress the importance of the quality of biomarker knowledge.

The control tests using random knowledge sets for $V$ show that our method does not over-fit to the knowledge (figure 2, box plots CO, CSO, and CSK). None of the algorithms considered in our space of $\theta$ are able to favorably rank these randomly selected genes. AUCs of these control tests are close to 0.5, as expected for random classification.

Using all knowledge from literature and the first round of qRT-PCR, we optimize the ranking metric and select the top genes that have not been previously validated and that have estimated classification errors of less than 5% (table 3). We can link a few of these genes directly to previous literature pertaining to renal cancer. For example, CXCR4 has been linked to kidney cancer. Using qRT-PCR, Schrader et al. show that this gene is over-expressed in kidney cancer tissue compared to normal kidney tissue [17]. IGFBP3 and KLF10 have also been linked to renal cell carcinoma [18, 19]. Validation of these genes using qRT-PCR may yield additional knowledge to iteratively refine the biomarker selection process. However, since we want to focus primarily on the methodology here, we reserve the actual validation of these results for a future study.

Table 1. Genes validated as differentially expressed between CC and ONC/CHR renal tumor subtypes from various knowledge sources.

Gene Symbol   Knowledge Source                      Validation Method
CA9           Chen et al., Clin Cancer Res, 2005    qRT-PCR
CLCNKB        Chen et al., Clin Cancer Res, 2005    qRT-PCR
DEFB1         Schuetz et al., J Mol Diagn, 2004     qRT-PCR, IHC
LRP2          Schuetz et al., J Mol Diagn, 2004     qRT-PCR, IHC
PVALB         Chen et al., Clin Cancer Res, 2005    qRT-PCR

Table 2. Genes that we validated with qRT-PCR. These genes have estimated classification errors of less than 10% as assessed by a linear SVM classifier using 0.632 bootstrap estimation.

Gene Symbol   Error         Gene Symbol   Error
STC1          2.43E-05      COX5A         0.0394058
SLC25A4       0.00186696    BAG1          0.0548365
CFTR          0.00279081    LY6E          0.0596081
PDHA1         0.0133316     CD99          0.0600892
PFKM          0.0279739     AKAP12        0.0624445
NNMT          0.0289622     ACAT1         0.0687972
CP            0.0300157     SPTBN2        0.077287
CFB           0.0387219     GOT1          0.0784855

Figure 2. Box plots of AUCs over 100 iterations for each test (left). AUCs for the optimal test (O) are higher than both the sub-optimal (SO) and sub-optimal knowledge (SK) tests (differences are statistically significant with p-values very close to 0). The control tests, using randomly selected knowledge, indicate that optimizing the ranking metric does not over-fit (CO = control optimal, CSO = control suboptimal, CSK = control suboptimal knowledge). Average ROC curves for each test illustrate the differences in biomarker detection efficiency (right). The ROC for the optimal metric test (solid line) indicates more accurate biomarker detection for both the Schuetz (top row) and Jones (bottom row) renal cancer datasets.

Table 3. Proposed list of genes for further qRT-PCR validation.

Gene Symbol   Error       Gene Symbol   Error
ACLY          0           PCCB          0.03274
CXCR4         0.013907    TMSB10        0.034201
C4A /// C4B   0.0187      HCLS1         0.034415
FLNA          0.019903    ACTA2         0.039398
PMP22         0.023798    IGFBP3        0.040989
PFKFB3        0.026506    NFKBIA        0.042332
KLF10         0.027801    CD44          0.049095
PRG1          0.03003     IER3          0.049571
LGALS1        0.030617

4. Conclusion

We have shown that biomarker identification by feature ranking benefits from knowledge integration at key points. Using this knowledge, whether from clinical observations, laboratory experiments, or existing literature, we can intelligently choose an optimal ranking metric for a specific gene expression dataset. The use of an optimal metric for ranking and identifying novel biomarkers reduces the number of false discoveries, increases the number of true discoveries, reduces the required time for validation, and increases the overall efficiency of the process. The results of our simulations indicate that knowledge integration improves biomarker selection for clinical microarray data.

Although this study assumes independent gene expression, the method is general and we can use it to rank combinatorial gene expression data as well. Furthermore, we test this method using only a limited set of wrapper-based feature ranking metrics. However, it is easily expandable to encompass a variety of metrics, including the commonly used filter methods such as t-tests and fold change. We hope that the proposed method will impact biomarker identification practices and improve the effectiveness of resulting clinical applications.

Acknowledgments

This research has been supported by grants from National Institutes of Health (R01CA108468, P20GM072069, U54CA119338), Microsoft Research Funding, and Georgia Cancer Coalition (Distinguished Cancer Scholar Award to MDW).

References

1. Golub, T., et al., Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science, 1999. 286: p. 531-537.
2. Schuetz, A., et al., Molecular classification of renal tumors by gene expression profiling. J Mol Diagn, 2004.

3. Singh, D., et al., Gene expression correlates of clinical prostate cancer behavior. Cancer Cell, 2002. 1: p. 203-209.
4. van't Veer, L., et al., Gene expression profiling predicts clinical outcome of breast cancer. Nature, 2002. 415: p. 530-536.
5. Braga-Neto, U. and E. Dougherty, Is cross-validation valid for small-sample microarray classification? Bioinformatics, 2004. 20: p. 374-380.
6. Chuaqui, R., et al., Post-analysis follow-up and validation of microarray experiments. Nature Genetics, 2002. 32: p. 509-514.
7. Morey, J., J. Ryan, and F. Van Dolah, Microarray validation: factors influencing correlation between oligonucleotide microarrays and real-time PCR. Biol. Proced. Online, 2006. 8(1): p. 175-193.
8. Shi, L., et al., The MicroArray Quality Control (MAQC) project shows inter- and intraplatform reproducibility of gene expression measurements. Nat Biotechnol, 2006. 24(9): p. 1151-61.
9. Stokes, T., et al., chip artifact CORRECTion (caCORRECT): A Bioinformatics System for Quality Assurance of Genomics and Proteomics Array Data. Annals of Biomedical Engineering, 2007. 35: p. 1068-1080.
10. Aerts, S., et al., Gene prioritization through genomic data fusion. Nature Biotechnology, 2006. 24(5): p. 537-544.
11. Kuffner, R., K. Fundel, and R. Zimmer, Expert knowledge without the expert: integrated analysis of gene expression and literature to derive active functional contexts. Bioinformatics, 2005. 21: p. 259-267.
12. Kong, S., W. Pu, and P. Park, A multivariate approach for integrating genome-wide expression data and biological knowledge. Bioinformatics, 2006. 22(19): p. 2373-2380.
13. Mukherjee, S. and S. Roberts, A theoretical analysis of the selection of differentially expressed genes. J Bioinformatics Comput Biol, 2005. 3: p. 627-643.
14. Bellazzi, R. and B. Zupan, Towards knowledge-based gene expression data mining. Journal of Biomedical Informatics, 2007. 40: p. 787-802.
15. Efron, B. and R. Tibshirani, Improvements on Cross-Validation: The .632+ Bootstrap Method. Journal of the American Statistical Association, 1997. 92(438): p. 548-560.
16. Jones, J., et al., Gene signatures of progression and metastasis in renal cell cancer. Clin Cancer Res, 2005. 11(16): p. 5730-9.
17. Schrader, A., et al., CXCR4/CXCL12 expression and signalling in kidney cancer. British Journal of Cancer, 2002. 86: p. 1250-1256.
18. Rosendahl, A. and G. Forseberg, IGF-I and IGFBP-3 augment transforming growth factor-beta actions in human renal carcinoma cells. Kidney International, 2006. 70: p. 1584-1590.
19. Ivanov, S., et al., Two novel VHL targets, TGFBI (BIGH3) and its transactivator KLF10, are up-regulated in renal clear cell carcinoma and other tumors. Biochem Biophys Res Commun, 2008.