Incorporating prior biological knowledge for network-based differential gene expression analysis using differentially weighted graphical LASSO

Zuo et al. BMC Bioinformatics (2017) 18:99 DOI /s

METHODOLOGY ARTICLE Open Access

Incorporating prior biological knowledge for network-based differential gene expression analysis using differentially weighted graphical LASSO

Yiming Zuo 1,2,3, Yi Cui 2, Guoqiang Yu 1, Ruijiang Li 2 and Habtom W. Ressom 3*

Abstract

Background: Conventional differential gene expression analysis by methods such as Student's t-test, SAM, and Empirical Bayes often searches for statistically significant genes without considering the interactions among them. Network-based approaches provide a natural way to study these interactions and to investigate the rewiring of interactions in disease versus control groups. In this paper, we apply the weighted graphical LASSO (wglasso) algorithm to integrate a data-driven network model with prior biological knowledge (i.e., protein-protein interactions) for biological network inference. We propose a novel differentially weighted graphical LASSO (dwglasso) algorithm that builds group-specific networks and performs network-based differential gene expression analysis to select biomarker candidates by considering their topological differences between the groups.

Results: Through simulation, we showed that wglasso can achieve better performance in building biologically relevant networks than purely data-driven models (e.g., neighbor selection, graphical LASSO), even when only a moderate level of information is available as prior biological knowledge. We evaluated the performance of dwglasso for survival time prediction using two microarray breast cancer datasets previously reported by Bild et al. and van de Vijver et al. Compared with the top 10 significant genes selected by a conventional differential gene expression analysis method, the top 10 significant genes selected by dwglasso in the dataset from Bild et al. led to significantly improved survival time prediction in the independent dataset from van de Vijver et al. Among the 10 genes selected by dwglasso, UBE2S, SALL2, XBP1 and KIAA0922 have been confirmed by literature survey to be highly relevant in breast cancer biomarker discovery studies.
Additionally, we tested dwglasso on TCGA RNA-seq data acquired from patients with hepatocellular carcinoma (HCC), comprising tumor samples and their corresponding non-tumorous liver tissues. Improved sensitivity, specificity and area under the curve (AUC) were observed when comparing dwglasso with a conventional differential gene expression analysis method.

Conclusions: The proposed network-based differential gene expression analysis algorithm dwglasso can achieve better performance than conventional differential gene expression analysis methods by integrating information at both the gene expression and network topology levels. The incorporation of prior biological knowledge can lead to the identification of biologically meaningful genes in cancer biomarker studies.

Keywords: Prior biological knowledge, Gaussian graphical model, Weighted graphical LASSO, Network-based differential gene expression analysis

*Correspondence: hwr@georgetown.edu
3 Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC, USA
Full list of author information is available at the end of the article

The Author(s). 2017. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.

Background

Typically, a differential gene expression analysis (e.g., Student's t-test, SAM, Empirical Bayes) is performed to identify genes with significant changes between biologically disparate groups [1–3]. However, independent studies of the same clinical types of patients often lead to different sets of significant genes with only a few in common [4]. This may be attributed to the fact that genes are members of strongly intertwined biological pathways and are highly interactive with each other. Without considering these interactions, differential gene expression analysis can easily yield biased results and a fragmented picture. Network-based methods provide a natural framework to study the interactions among genes [5].

Data-driven network models reconstruct biological networks solely from statistical evidence. The relevance network is one common data-driven network model [6, 7]. It uses correlation or mutual information to measure the relevance between genes and sets a hard threshold to connect highly relevant pairs. Relevance networks are widely applied because of their simplicity and ease of implementation. However, their drawback becomes significant as the number of variables increases: they confound direct and indirect associations [8]. For example, strong correlations for gene pairs X-Y and X-Z will induce a weaker but probably still statistically significant correlation for gene pair Y-Z. As a result, when the number of genes is large, relevance networks tend to be overly complicated and to contain an overwhelming number of false positives. The Bayesian network is another classic data-driven network model [9]. Unlike undirected graphs such as relevance networks, Bayesian networks are directed acyclic graphs, in which each edge indicates a conditional dependence relationship between two genes given their parents.
The benefits of using Bayesian networks are: 1) by modeling conditional dependence relationships, Bayesian networks identify only direct associations; 2) with directions in the graph, Bayesian networks allow one to infer causal relationships. However, it is challenging to apply Bayesian networks to high-throughput omic data, since learning the structure of a Bayesian network for high-dimensional data is time-consuming and can be statistically unreliable. Additionally, Bayesian networks cannot model cyclic structures, such as feedback loops, which are common in biological networks.

Recently, Gaussian graphical models (GGMs) have been increasingly applied to biological network inference [10–12]. Similar to Bayesian networks, GGMs can remove the effect of indirect associations by estimating conditional dependence relationships. At the same time, they produce undirected graphs and are not limited to acyclic structures. In GGMs, a connection between two nodes corresponds to a non-zero entry in the inverse covariance matrix (i.e., the precision matrix), which indicates a conditional dependence between these two nodes given the others. GGMs date back to the early 1970s, when Dempster introduced the covariance selection problem [13]. The conventional approach to this problem relies on statistical tests (e.g., deviation tests) and forward/backward selection procedures [14]. This is not feasible for high-throughput omic data, where the number of genes ranges from several hundred to thousands while the number of samples is only tens to hundreds. In addition, in the "small n, large p" scenario typical of omic data (i.e., the sample size is far less than the number of variables), the maximum likelihood estimate (MLE) of the precision matrix does not exist because the sample covariance matrix is rank deficient. To deal with these issues, Schäfer et al. proposed combining the Moore-Penrose pseudoinverse with bootstrapping to approximate the precision matrix [15]. Others applied ℓ1 regularization to obtain a sparse network [16–18].
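The X-Y, X-Z example and the rank-deficiency argument can both be checked numerically. In the NumPy sketch below (simulated data; the seed, sample sizes and the 0.3 correlation threshold are arbitrary illustrative choices, not values from this study), X drives both Y and Z. Thresholding correlations, as a relevance network does, connects the indirect Y-Z pair, whereas the partial correlation derived from the precision matrix, rho_ij = -theta_ij / sqrt(theta_ii * theta_jj), is close to zero for Y-Z. The last lines show why the MLE fails when n < p: the sample covariance of 10 samples of 50 variables has rank 9.

```python
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed
n = 20000
x = rng.standard_normal(n)
y = x + rng.standard_normal(n)  # X -> Y
z = x + rng.standard_normal(n)  # X -> Z; no direct Y-Z link
data = np.column_stack([x, y, z])

# Relevance network: threshold the absolute Pearson correlations.
corr = np.corrcoef(data, rowvar=False)
relevance = (np.abs(corr) > 0.3) & ~np.eye(3, dtype=bool)  # 0.3 is an arbitrary threshold
# corr(Y, Z) is about 0.5, so the indirect Y-Z pair is connected.

# GGM view: partial correlations from the precision matrix (n >> p, so S is invertible).
theta = np.linalg.inv(np.cov(data, rowvar=False))
d = np.sqrt(np.diag(theta))
partial = -theta / np.outer(d, d)  # rho_ij = -theta_ij / sqrt(theta_ii * theta_jj)
np.fill_diagonal(partial, 1.0)
# partial(X, Y) and partial(X, Z) are about 0.58; partial(Y, Z) is about 0.

# Small n, large p: 10 samples of 50 variables give a rank-deficient covariance.
S_small = np.cov(rng.standard_normal((10, 50)), rowvar=False)
rank = np.linalg.matrix_rank(S_small)  # 9 (n - 1 after centering), far below p = 50
```

The true precision matrix of this model has a zero Y-Z entry, which is exactly the structure a GGM is meant to recover; with n < p, however, the plain inverse used above is unavailable, which motivates the regularized estimators discussed next.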
Taking into account the sparsity of biological networks and the computational burden of bootstrapping, ℓ1 regularization methods are preferred. Among various ℓ1 regularization methods, Meinshausen et al. performed ℓ1-regularized linear regression (i.e., LASSO) for each node to select its neighbors [16]. Given all its neighbors, a node is conditionally independent of the remaining ones. Since LASSO is performed separately for each node, this neighbor selection approach may face a consistency problem: gene X may be selected as a neighbor of gene Y while gene Y is not selected as a neighbor of gene X. Compared with neighbor selection, a more principled approach is graphical LASSO, which directly estimates the precision matrix by applying ℓ1 regularization to its elements to obtain a sparse estimate [17, 18]. We pursue an extension of graphical LASSO in this paper.

In addition to data-driven network models, there are many publicly available databases, such as STRING, KEGG, BioGRID and ConsensusPathDB, from which one can extract various types of interactions, including protein-protein, signaling, and gene regulatory interactions [19–22]. Biological networks reconstructed from these databases have been reported to be useful. For example, Chuang et al. reconstructed a protein-protein interaction (PPI) network from multiple databases to help identify markers of metastasis in breast cancer using gene expression data [23]. They overlaid the expression value of each gene on its corresponding protein in the network and searched for sub-networks whose activities across all patients were highly discriminative of metastasis. In this way, they found several hub genes related to known breast cancer mutations, although these genes were not found significant by conventional differential gene expression analysis. They also reported that the identified sub-networks are more reproducible between different breast cancer cohorts than individual gene markers.

However, databases are far from complete, so networks constructed purely from databases have a large number of false negatives. In addition, databases are seldom specific to a certain disease, so the interactions they contain may not reflect the patient population under study. In contrast, data-driven models are likely to have a large number of false positives due to background noise. An appropriate approach to integrating prior biological knowledge from databases with a data-driven network model is therefore desirable for more robust and biologically relevant network reconstruction [24]. Prior biological knowledge has previously been incorporated into the neighbor selection method [25]. That approach relies on the Bayesian interpretation of LASSO and assigns two different prior distributions to connections that are present in the database and those that are not. Recently, weighted graphical LASSO (wglasso) has been proposed to incorporate prior biological knowledge into graphical LASSO by assigning different weights to the entries of the precision matrix [26]. In this work, we extend the original wglasso algorithm, explain this idea from a Bayesian perspective, and perform comprehensive comparisons between wglasso and competing data-driven network models (e.g., neighbor selection, graphical LASSO).

Additionally, exploring the topological changes between biologically disparate groups may lead to new discoveries that cannot be made by conventional differential gene expression analysis [27–29]. For example, high-degree nodes (i.e., hubs) that exist in only one of the biologically disparate groups may indicate a regulatory role of the hub genes in that group only.
Knowledge-fused differential dependency network (KDDN) is a recently proposed method to construct a knowledge-incorporated network that shows the rewiring connections between two groups [29]. An open-source Cytoscape app is available for easy implementation [30]. In this paper, we propose a novel algorithm called differentially weighted graphical LASSO (dwglasso) for network-based differential gene expression analysis. This is achieved by building separate networks for biologically disparate groups using wglasso, exploring the topological changes between the groups, and prioritizing the significant gene list from conventional differential gene expression analysis, as shown in Fig. 1. Other previously reported methods integrate prior biological knowledge into a data-driven network model to identify sub-networks related to the disease under study [31, 32]. Our work differs from these methods since we compute a differential network score for each gene and prioritize the genes for subsequent analysis, rather than outputting a list of sub-networks for biological interpretation. Methods that directly incorporate gene networks or prior biological knowledge into statistical models for classification and regression tasks have also been reported [33, 34]. Their rationale is that functionally linked genes tend to be co-regulated and co-expressed, and therefore should be treated similarly in the statistical model. Our work leaves the statistical model untouched. Instead, it focuses on providing the best set of gene biomarkers as input to the statistical model. This is considered to have advantages over providing multiple linked genes from the network whose expression values have similar patterns.

Fig. 1 An overview of dwglasso. The input is gene expression data (e.g., microarray or RNA-seq data) and the output is a prioritized list based on the differential network (DN) score defined within dwglasso

We show the application of dwglasso on two independent microarray datasets from breast cancer patients for survival time prediction, and on TCGA RNA-seq data acquired from patients with hepatocellular carcinoma (HCC) for classification between tumor samples and their corresponding non-tumorous liver tissues. The rest of the paper is organized as follows. The Methods section introduces the extended wglasso algorithm and the proposed dwglasso for network-based differential gene expression analysis. The Results and discussion section presents the results of wglasso and dwglasso on simulation, microarray and RNA-seq data. Finally, the Conclusion section summarizes our work and discusses possible future extensions.

Methods

Network inference using wglasso

Consider a centered and scaled data matrix $X_{n\times p}$ (i.e., $\sum_{i=1}^{n} x_{ij} = 0$ and $\sum_{i=1}^{n} x_{ij}^2 = 1$) that measures the intensities of $p$ genes on $n$ samples drawn from a $p$-dimensional Gaussian distribution with zero mean in each dimension and positive definite covariance matrix $\Sigma_{p\times p}$ (i.e., $X \sim N(0, \Sigma)$). If the sample size $n$ is far less than the number of variables $p$ (i.e., $n \ll p$), the MLE of the precision matrix $\Theta = \Sigma^{-1}$ does not exist, since the sample covariance matrix $S$ is rank deficient. If we further assume that $\Theta$ is sparse, an ℓ1 regularization term can be added to the negative log-likelihood function $f(X|\Theta) = -\log\det\Theta + \mathrm{tr}(S\Theta)$ for sparse precision matrix estimation, as shown in Eq. (1). Graphical LASSO is an algorithm that efficiently solves Eq. (1) by block coordinate descent [8, 9].
$$\hat{\Theta} = \arg\min_{\Theta \succ 0} \left\{ -\log\det\Theta + \mathrm{tr}(S\Theta) + \lambda\|\Theta\|_1 \right\} \tag{1}$$

where $\Theta$ is the precision matrix, $\Theta \succ 0$ is the constraint that $\Theta$ be positive definite, $S$ is the sample covariance matrix, $\mathrm{tr}$ denotes the trace (the sum of the diagonal elements of a matrix), $\|\Theta\|_1$ is the ℓ1 norm of $\Theta$ (the sum of the absolute values of all its elements), and $\lambda$ is the tuning parameter controlling the sparsity of $\hat{\Theta}$. Once the sparse precision matrix $\hat{\Theta}$ is obtained, a non-zero element of $\hat{\Theta}$ (i.e., $\hat{\theta}_{ij} \neq 0$) indicates a conditional dependence between $x_i$ and $x_j$ given the others. The estimated network is therefore $\hat{G} = \{(i, j) : 1 \le i < j \le p,\ \hat{\theta}_{ij} \neq 0\}$.

LASSO-based estimates have a Bayesian interpretation [35]. $\hat{\Theta}$ is the maximum a posteriori (MAP) estimate of the posterior distribution $p(\Theta|X)$ under a Laplacian prior distribution $p(\Theta)$, as shown in Eq. (2). The LASSO term $\lambda\|\Theta\|_1$ in Eq. (1) now becomes part of $p(\Theta) \propto \exp(-\lambda\|\Theta\|_1)$, a Laplacian prior with zero mean and scaling parameter $\lambda$. From the Bayesian perspective, $p(\Theta)$ encodes the prior knowledge of the network topology. For a database that contains only binary information (connected or not) for a given gene pair, a natural choice is to assign two different scaling parameters, $\lambda_1$ for pairs that are not connected in the database and $\lambda_2$ for connected pairs, as shown in Eq. (3). For connected pairs the Laplacian prior is diffuse, while for non-connected pairs it is concentrated (i.e., $\lambda_1 \gg \lambda_2$). In other words, a larger penalty is assigned to non-connected pairs to increase the chance that their entries in $\Theta$ shrink to zero. In practice, tuning $\lambda_1$ and $\lambda_2$ simultaneously requires a two-dimensional grid search, which is quite time-consuming for high-dimensional data. The extreme solution of setting $\lambda_2 = 0$ links all connected gene pairs from the database in the graph, neglecting the fact that the database might contain spurious connections for the disease under study.
$$p(\Theta|X) = \frac{p(X|\Theta)\,p(\Theta)}{p(X)} \propto \exp\left(\log\det\Theta - \mathrm{tr}(S\Theta)\right)\exp\left(-\lambda\|\Theta\|_1\right) \tag{2}$$

$$p(\Theta) \propto \exp\left(-\lambda_1\|\Theta_{\text{non-con}}\|_1 - \lambda_2\|\Theta_{\text{con}}\|_1\right) \tag{3}$$

where $\Theta_{\text{con}}$ and $\Theta_{\text{non-con}}$ denote the entries of $\Theta$ for gene pairs that are and are not connected in the database, respectively.

Instead of binary information, a continuous confidence score is more suitable for incorporating prior biological knowledge into graphical LASSO. The confidence score can be obtained from multiple resources; for example, the STRING database provides an estimated functional association score for PPIs. We scale this confidence score into the range [0, 1] and create a weight matrix $W_{p\times p}$. In $W$, 1 indicates complete trust that a gene pair is connected, and 0 indicates that no evidence supports a connection. In this way, we can assign different penalties to different gene pairs, as shown in Eq. (4). Like Eq. (3), Eq. (4) gives a larger penalty to gene pairs that are less likely to be connected, but now there is only one tuning parameter $\lambda$. For a fixed $\lambda$, the R package glasso can solve Eq. (4) efficiently given $W$ [17].

$$\hat{\Theta} = \arg\min_{\Theta \succ 0} \left\{ -\log\det\Theta + \mathrm{tr}(S\Theta) + \lambda\left\|(\mathbf{1} - W)\circ\Theta\right\|_1 \right\} \tag{4}$$

where $\mathbf{1}$ is the all-ones matrix, $W$ is the weight matrix containing the confidence score of each gene pair, and $\circ$ denotes element-wise multiplication of two matrices.

For a LASSO-based optimization problem such as Eq. (4), tuning the parameter $\lambda$ is crucial, since it controls the sparsity of the output $\hat{\Theta}$. Typically, $\lambda$ is tuned by cross-validation, the Akaike information criterion (AIC), the Bayesian information criterion (BIC), or stability selection [36]. Because AIC and BIC often lead to under-fitting (i.e., an overly sparse network) and stability selection requires extensive computation time, we prefer cross-validation with the one-standard-error rule to select the optimal tuning parameter $\lambda_{opt}$. With the one-standard-error rule, we choose the simplest (most regularized) model whose error is within one standard error of the minimal error. Our wglasso algorithm is shown below.

Algorithm 1 wglasso
Input: A centered and scaled data matrix $X_{n\times p}$; a weight matrix $W_{p\times p}$; a regularization parameter set $\Lambda$; a cross-validation fold number $k$.
Output: Estimated precision matrix $\hat{\Theta}$.
1: Randomly and equally divide $X$ into $k$ folds, $X_1, X_2, \ldots, X_k$.
2: for each $\lambda \in \Lambda$ do
3:   for each $m \in \{1, 2, \ldots, k\}$ do
4:     Run the graphical LASSO algorithm with input $X_{-m} = [\ldots, X_{m-1}, X_{m+1}, \ldots]$ and regularization parameter $\lambda(\mathbf{1} - W)$ to obtain the estimated precision matrix $\hat{\Theta}_m^{\lambda}$.
5:     Calculate the negative log-likelihood as the model fitting error $f(X_m|\hat{\Theta}_m^{\lambda}) = -\log\det\hat{\Theta}_m^{\lambda} + \mathrm{tr}(S_m\hat{\Theta}_m^{\lambda})$, where $S_m$ is the sample covariance matrix of $X_m$.
6:   end for
7:   Calculate the standard error of $f(X_1|\hat{\Theta}_1^{\lambda}), \ldots, f(X_k|\hat{\Theta}_k^{\lambda})$ as $SE(\hat{\Theta}^{\lambda}) = \sqrt{\mathrm{var}\left(f(X_1|\hat{\Theta}_1^{\lambda}), \ldots, f(X_k|\hat{\Theta}_k^{\lambda})\right)/k}$.
8:   Compute the average model fitting error $f(X|\hat{\Theta}^{\lambda}) = \frac{1}{k}\sum_{l=1}^{k} f(X_l|\hat{\Theta}_l^{\lambda})$.
9: end for
10: Obtain the $\lambda_{min}$ that achieves the minimal model fitting error: $\lambda_{min} = \arg\min_{\lambda} f(X|\hat{\Theta}^{\lambda})$.
11: Move $\lambda$ in the direction of increasing regularization until reaching the one-standard-error limit: $\lambda_{opt} = \{\lambda : f(X|\hat{\Theta}^{\lambda}) = f(X|\hat{\Theta}^{\lambda_{min}}) + SE(\hat{\Theta}^{\lambda_{min}})\}$.
12: Run the graphical LASSO algorithm with input $X$ and regularization parameter $\lambda_{opt}(\mathbf{1} - W)$ to obtain the final estimated precision matrix $\hat{\Theta}$.

Network-based differential gene expression analysis using dwglasso

Figure 2 shows the framework of the proposed dwglasso algorithm for network-based differential gene expression analysis.
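The framework comes down to two computations: solving Eq. (4) for each group, and comparing the scaled node degrees of the resulting networks (the differential network score described in this section). The NumPy sketch below is illustrative only. It solves Eq. (4) with the alternating direction method of multipliers (ADMM) rather than the block coordinate descent used by the R package glasso, leaves the diagonal of Θ unpenalized (an assumption), and treats |θ̂_ij| above a small tolerance as an edge. The function names are hypothetical.

```python
import numpy as np


def weighted_glasso(S, W, lam, rho=1.0, max_iter=1000, tol=1e-7):
    """ADMM sketch for Eq. (4): -log det(T) + tr(S T) + lam * ||(1 - W) o T||_1."""
    p = S.shape[0]
    P = lam * (1.0 - W)          # entry-wise penalties, smaller for trusted pairs
    np.fill_diagonal(P, 0.0)     # assumption: diagonal left unpenalized
    Z, U = np.eye(p), np.zeros((p, p))
    for _ in range(max_iter):
        # Theta-update: solve rho*T - inv(T) = rho*(Z - U) - S via eigendecomposition.
        evals, Q = np.linalg.eigh(rho * (Z - U) - S)
        t = (evals + np.sqrt(evals ** 2 + 4.0 * rho)) / (2.0 * rho)
        Theta = (Q * t) @ Q.T
        # Z-update: entry-wise soft-thresholding with pair-specific thresholds P / rho.
        Z_old = Z
        A = Theta + U
        Z = np.sign(A) * np.maximum(np.abs(A) - P / rho, 0.0)
        U = U + Theta - Z        # scaled dual update
        if max(np.abs(Theta - Z).max(), np.abs(Z - Z_old).max()) < tol:
            break
    return (Z + Z.T) / 2.0       # sparse, symmetric estimate


def scaled_degree(Theta, tol=1e-8):
    """Node degrees of the graph implied by Theta, min-max scaled into [0, 1]."""
    adj = np.abs(Theta) > tol
    np.fill_diagonal(adj, False)
    deg = adj.sum(axis=1).astype(float)
    span = deg.max() - deg.min()
    # Assumption: if all degrees are equal, every scaled degree is set to 0.
    return (deg - deg.min()) / span if span > 0 else np.zeros_like(deg)


def differential_network_scores(Theta1, Theta2):
    """dns_i = |sd_i^(1) - sd_i^(2)| for each gene i."""
    return np.abs(scaled_degree(Theta1) - scaled_degree(Theta2))


# Usage on two hand-made 4-gene precision matrices (hypothetical values).
star = np.eye(4)
star[0, 1:] = star[1:, 0] = 0.2   # group 1: gene 0 is a hub
pair = np.eye(4)
pair[2, 3] = pair[3, 2] = 0.2     # group 2: only genes 2 and 3 are linked
dns = differential_network_scores(star, pair)   # [1., 0., 1., 1.]
ranking = np.argsort(-dns, kind="stable")       # genes in decreasing dns: [0, 2, 3, 1]
```

In a dwglasso-style pipeline, the two precision matrices would come from running the tuned solver on each group's significant genes with the same W, and `ranking` would reorder the significant gene list. Setting W to all ones removes the off-diagonal penalty entirely, while W = 0 reduces Eq. (4) to the ordinary graphical LASSO objective.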
dwglasso prioritizes the significant list obtained from conventional differential gene expression analysis according to the topological changes between the group-specific networks built by wglasso. Specifically, dwglasso first performs differential gene expression analysis to obtain a list of significant genes whose expression values differ between the two biologically disparate groups. Then, using these significant genes, dwglasso builds group-specific networks with wglasso. After the networks are constructed, dwglasso calculates a differential network score for each gene in the significant list based on the topological changes between the two group-specific networks. To calculate this score, dwglasso first computes the node degree of each gene in both networks, i.e., the number of neighbors the gene is connected to. Because the two networks differ in size, the node degrees are then scaled into the range [0, 1]. The differential network score of a gene is the absolute value of the difference between its two scaled node degrees. Finally, dwglasso ranks the significant list from the conventional differential gene expression analysis in decreasing order of differential network score. The prioritized gene list is used in subsequent analyses, such as building classification or regression models. We believe dwglasso can help classification or regression models achieve better prediction performance, since the prioritized list integrates information at the gene expression and network structure levels. Moreover, incorporating prior biological knowledge makes it more likely that biologically meaningful genes are identified. The detailed dwglasso procedure is given in Algorithm 2.

Results and discussion

Simulation data

Biological networks are reported to be scale-free, which means that the degree distribution of the network follows a power law [37]. We took this scale-free property into account when generating simulation data using the R package huge [38].
Using huge, a scale-free network was built by specifying the node number $p$. The sparsity $s$ of the network is fixed for a given $p$. For example, when the node number is 100, the sparsity of the network is 0.02, indicating that only 2% of all possible connections (i.e., $p(p-1)/2$) exist in the scale-free network. Once the scale-free network is built, huge creates the true precision matrix $\Theta_{true}$ based on the network topology and the positive definite constraint $\Theta_{true} \succ 0$, so that $\Sigma_{true} = (\Theta_{true})^{-1}$ exists. Finally, simulation data $X_{n\times p} \sim N(0, \Sigma_{true})$ were generated. We created simulation datasets with various $p$ and $n$, as listed in Table 1.

The weight matrix $W$, which contains prior biological knowledge, was constructed based on $\Theta_{true}$. In reality, databases may also contain spurious connections for the disease under study. To evaluate how incorrect connections in $W$ impact wglasso, we introduced an additional metric, acc. When acc = 60%, we randomly reassigned 40% of the connections in $W$ to incorrect gene pairs. Specifically, $W$ was created as follows. Initially, for zero entries in $\Theta_{true}$, the corresponding entries in $W$ were also zero; for non-zero entries in $\Theta_{true}$, the corresponding entries in $W$ were drawn from the uniform distribution U(0, 1). Then, we randomly assigned incorrect connections in $W$ according to the acc value, while keeping the total number of connections in $W$ the same as in $\Theta_{true}$. Under the assumption that incorrect entries in $W$ should have lower confidence scores than correct entries, we drew incorrect entries from the uniform distribution U(0, 0.5).

Algorithm 2 dwglasso
Input: The raw data matrix $X^{raw}_{n\times p}$; a weight matrix $W_{p\times p}$.
Output: Prioritized significant list $L_{dwglasso}$.
1: Perform conventional differential gene expression analysis on $X^{raw}$ to obtain a significant list $L$.
2: Get two centered and scaled group-specific data matrices $X^{(1)}_{n_1\times p_{sig}}$ and $X^{(2)}_{n_2\times p_{sig}}$ from $X^{raw}$ and $L$, keeping only the significant genes.
3: Build group-specific networks $G^{(1)}$ and $G^{(2)}$ by running the wglasso algorithm with $\{X^{(1)}, W\}$ and $\{X^{(2)}, W\}$ as inputs.
4: for each $i \in L$ do
5:   Compute the node degrees $d_i^{(1)}$ and $d_i^{(2)}$ from $G^{(1)}$ and $G^{(2)}$, respectively.
6: end for
7: for each $i \in L$ do
8:   Compute the scaled node degrees $sd_i^{(1)} = \dfrac{d_i^{(1)} - \min_{j\in L} d_j^{(1)}}{\max_{j\in L} d_j^{(1)} - \min_{j\in L} d_j^{(1)}}$ and $sd_i^{(2)} = \dfrac{d_i^{(2)} - \min_{j\in L} d_j^{(2)}}{\max_{j\in L} d_j^{(2)} - \min_{j\in L} d_j^{(2)}}$.
9:   Compute the differential network score $dns_i = |sd_i^{(1)} - sd_i^{(2)}|$.
10: end for
11: Prioritize $L$ by the differential network score in decreasing order to obtain $L_{dwglasso}$.

We estimated the true network topology using neighbor selection, graphical LASSO, and the proposed wglasso method. For neighbor selection, two strategies were applied to deal with the inconsistency problem: neighbor selection with the "or" operator accepted inconsistent connections, while neighbor selection with the "and" operator rejected them. To make a fair comparison, we tuned the regularization parameter of each method so that the output network had the same sparsity as the true network (i.e., s = 0.02 for p = 100, s = … for p = 500). For each (n, p) scenario, we regenerated $X_{n\times p}$ 100 times, calculated the false positive and false negative connections for each method, and listed their means and standard deviations in Table 1. To evaluate how incorrect connections in $W$ affect the performance of wglasso, we randomly reassigned 40% (acc = 60%) and 60% (acc = 40%) of the prior biological knowledge in $W$ to incorrect connections. Table 1 shows that the network estimated by wglasso has far fewer false positives and false negatives than those from neighbor selection and graphical LASSO. A decrease of acc in $W$ leads to more false positives and false negatives from wglasso, but wglasso still outperforms neighbor selection and graphical LASSO when the acc of $W$ is as low as 40%. For a more comprehensive comparison, we plotted precision-recall curves for neighbor selection, graphical LASSO and wglasso. We ran these methods with p = 100, n = 50 and acc = 40% in $W$, computed precision and recall, and generated the plot shown in Fig. 3. In Fig. 3, wglasso displays a clear improvement over neighbor selection and graphical LASSO. This agrees with our expectation, since wglasso considers both whether a connection has supporting evidence in the database and how well it fits the data.
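The construction of W and the error counts behind Table 1 can be sketched as follows. This is a hypothetical NumPy illustration, not the authors' code. It counts each undirected gene pair once (upper triangle), which is an assumption, since the text does not state whether symmetric entries are counted separately, and it moves a (1 - acc) fraction of the true connections to randomly chosen unconnected pairs so that the total number of connections is unchanged.

```python
import numpy as np


def make_prior_weights(theta_true, acc, rng):
    """Hypothetical W: true pairs get U(0, 1) scores, misplaced pairs get U(0, 0.5)."""
    p = theta_true.shape[0]
    iu = np.triu_indices(p, k=1)          # each undirected pair once
    is_edge = theta_true[iu] != 0
    edges, non_edges = np.flatnonzero(is_edge), np.flatnonzero(~is_edge)
    w = np.zeros(is_edge.size)
    w[edges] = rng.uniform(0.0, 1.0, size=edges.size)
    # Reassign a (1 - acc) fraction of connections to incorrect pairs,
    # keeping the total number of connections equal to that of theta_true.
    n_wrong = int(round((1.0 - acc) * edges.size))
    dropped = rng.choice(edges, size=n_wrong, replace=False)
    added = rng.choice(non_edges, size=n_wrong, replace=False)
    w[dropped] = 0.0
    w[added] = rng.uniform(0.0, 0.5, size=n_wrong)  # lower confidence for wrong pairs
    W = np.zeros((p, p))
    W[iu] = w
    return W + W.T


def edge_errors(theta_hat, theta_true, tol=1e-8):
    """False positives, false negatives, precision and recall over undirected pairs."""
    iu = np.triu_indices(theta_true.shape[0], k=1)
    est = np.abs(theta_hat[iu]) > tol
    actual = theta_true[iu] != 0
    tp = int(np.sum(est & actual))
    fp = int(np.sum(est & ~actual))
    fn = int(np.sum(~est & actual))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return fp, fn, precision, recall


# Usage on a 5-gene chain (hypothetical example network).
theta_true = np.eye(5)
i = np.arange(4)
theta_true[i, i + 1] = theta_true[i + 1, i] = 0.3
W = make_prior_weights(theta_true, acc=0.6, rng=np.random.default_rng(1))
```

With acc = 0.6 and four true connections, two connections stay in place and two are moved to unconnected pairs with scores below 0.5. Sweeping a method's regularization parameter and recording `edge_errors` at each value gives the points of a precision-recall curve like the one in Fig. 3.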
Microarray data

We applied the proposed dwglasso algorithm to two breast cancer microarray datasets: the Bild et al. and van de Vijver et al. datasets [39, 40]. The former includes 158 patients with their survival records and was used for training. We excluded patients with less than 5 years of follow-up. Among the remaining patients, 42 who survived less than 5 years during the follow-up time formed the high risk group, while the other 60 formed the low risk group. The van de Vijver et al. dataset contains 295 breast cancer patients, together with their survival records, and was used for independent testing. Both datasets are available at the PRECOG website (precog.stanford.edu), an online repository for querying cancer gene expression and clinical data, and have been preprocessed for subsequent statistical analysis [41]. The raw Bild et al. and van de Vijver et al. datasets are also available from Gene Expression Omnibus (GSE3143) and the R package seventygenedata, respectively [42].

Fig. 2 Framework for dwglasso

Our aim is to obtain a prioritized significant gene list based on dwglasso for more accurate survival time prediction. The workflow is shown in Fig. 4. We first performed univariate analysis on the Bild et al. dataset to select statistically significant genes based on the concordance index between expression value and survival time [43]. This led to a total of 58 genes whose adjusted p-values were less than …. The inflation of Type I error caused by multiple testing was controlled by the false discovery rate (FDR) using the Benjamini-Hochberg procedure. The 58 significant genes are listed in Additional file 1: Table S1 along with their adjusted p-values. We then applied the wglasso algorithm to build two separate networks from the 58 significant genes, one for the high risk group and one for the low risk group. The weight matrix $W$ was constructed from the confidence scores obtained by querying the STRING database with the 58 significant genes for the PPIs among them. For gene pairs without a STRING confidence score, we set the corresponding entries in $W$ to zero. In wglasso, we performed 10-fold cross-validation and chose the optimal tuning parameter $\lambda_{opt}$ by the one-standard-error rule. Fig. 5 shows our choice of $\lambda_{opt}$: $\lambda_{opt}$ = … for the high risk group and $\lambda_{opt}$ = … for the low risk group.
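The selection in Fig. 5 corresponds to steps 7 to 11 of Algorithm 1. The sketch below assumes the k held-out errors f(X_m|Θ̂_m^λ) have already been computed for every candidate λ. It uses the sample variance (ddof = 1), an assumption since the text does not specify the variance estimator, and walks from λ_min toward larger λ while the mean error stays within one standard error. The function name and the toy error values are hypothetical.

```python
import numpy as np


def one_se_lambda(lambdas, fold_errors):
    """Return (lambda_min, lambda_opt) under the one-standard-error rule.

    lambdas: candidate regularization values, shape (L,).
    fold_errors: held-out errors f(X_m | Theta_m^lambda), shape (L, k).
    """
    lambdas = np.asarray(lambdas, dtype=float)
    fold_errors = np.asarray(fold_errors, dtype=float)
    k = fold_errors.shape[1]
    order = np.argsort(lambdas)                   # increasing regularization
    lam = lambdas[order]
    mean_err = fold_errors[order].mean(axis=1)    # step 8: average error
    se = np.sqrt(fold_errors[order].var(axis=1, ddof=1) / k)  # step 7
    i_min = int(np.argmin(mean_err))              # step 10
    limit = mean_err[i_min] + se[i_min]
    i_opt = i_min
    # Step 11: move toward stronger regularization while within the limit.
    while i_opt + 1 < lam.size and mean_err[i_opt + 1] <= limit:
        i_opt += 1
    return lam[i_min], lam[i_opt]


# Toy fold errors for four candidate values and k = 2 folds (made-up numbers).
lam_min, lam_opt = one_se_lambda(
    [0.1, 0.2, 0.3, 0.4],
    [[1.0, 1.2], [0.9, 1.1], [1.0, 1.1], [1.3, 1.5]],
)
# Mean errors are about 1.1, 1.0, 1.05 and 1.4; the limit is about 1.1,
# so lam_min = 0.2 and lam_opt = 0.3.
```

Walking step by step, rather than taking the largest λ anywhere under the limit, mirrors the "move λ until reaching the limit" wording of step 11 and avoids jumping past a region where the error curve rises.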
From the Table 1 The mean and standard devaton (n parenthess) of false postves (FP) and false negatves (FN) for connectons from neghbor selecton (NS), graphcal LASSO (glasso) and weghted graphcal LASSO (wglasso) methods under dfferent node number (p) and sample sze (n) scenaros p n NS (or) NS (and) glasso wglasso (acc = 60%) wglasso (acc = 40%) FP FN FP FN FP FN FP FN FP FN (17) 151 (10) 166 (15) 157 (10) 154 (23) 148 (11) 112 (17) 104 (11) 129 (18) 122 (11) (16) 111 (15) 132 (17) 122 (16) 114 (20) 112 (15) 82 (15) 74 (13) 93 (16) 87 (12) (13) 59 (18) 78 (15) 72 (21) 79 (17) 63 (19) 51 (11) 39 (14) 58 (13) 50 (15) (42) 679 (77) 758 (43) 738 (82) 710 (48) 681 (77) 480 (36) 451 (66) 549 (39) 526 (60) (30) 453 (129) 473 (42) 493 (134) 431 (40) 468 (129) 277 (26) 290 (87) 330 (31) 313 (106) (22) 164 (117) 189 (27) 177 (118) 199 (28) 186 (126) 109 (18) 110 (76) 130 (21) 135 (88) The best performance s marked n bold

8 Zuo et al. BMC Bonformatcs (2017) 18:99 Page 8 of 14 Fg. 3 Precson recall curves for neghbor selecton, graphcal LASSO and weghted graphcal LASSO methods under p = 100, n = 50 and acc = 40% networks, we( calculated ) the node degree for each gene n two groups d h, dl, scaled them based on the network ( ) sze sd h, sdl, and computed the dfferental network ) score (dns = sd h sd l. At last, we prortzed the 58 sgnfcant genes based on the network dfferental scores n a decreasng order. To evaluate whether dwglasso could lead to more accurate survval tme predcton, we tested the prortzed gene lst usng dfferent methods on the ndependent van de Vjver et al. dataset. The 295 patents were dvded nto hgh rsk and low rsk groups accordng to the rsk scores calculated usng multvarate Cox regresson from the top 10 sgnfcant genes based on dwglasso, a competng pror knowledge ncorporated network analyss method (.e., KDDN), and conventonal dfferental gene expresson analyss (.e., concordance ndex). Unlke dwglasso that bulds group-specfc networks, KDDN generates only one network wth all rewrng connectons. From the network constructed by KDDN, we computed the node degree for each gene to help prortze the sgnfcant gene lst. Kaplan-Meer survval analyss was then performed to evaluate the performance of the above three scenaros. The resultng survval curves are shown n Fgs. 6a, b, and d. To evaluate how much the ncorporaton of pror bologcal knowledge contrbutes to the mproved performance of dwglasso, we tested the top 10 sgnfcant genes selected based on dwglasso wth no pror bologcal knowledge ncorporated (.e., W = 0). The resultng survval curve s shown n Fg. 6c. As expected, dwglasso wth no pror bologcal knowledge ncorporated s equvalent to usng graphcal LASSO n buldng group specfc networks (Fg. 4). As llustrated n Fg. 
6, the top 10 significant genes from dwglasso with prior biological knowledge incorporated yielded the best performance (p value = …, hazard ratio = 3.325), compared with the top 10 significant genes from KDDN (p value = …, hazard ratio = 3.304), the top 10 significant genes based on dwglasso with no prior biological knowledge incorporated (p value = …, hazard ratio = 2.316), and the top 10 significant genes based on concordance index (p value = 0.002, hazard ratio = 2.037). We believe the improved performance achieved by dwglasso and KDDN is due to the extra information provided by the topological changes between the high risk and low risk groups. In addition, dwglasso and KDDN benefit from incorporating prior biological knowledge. This yields more reliable and biologically relevant genes that are shared across independent datasets, and therefore better prediction performance than methods that do not use prior biological knowledge (Fig. 6).

Fig. 4 Workflow of dwglasso for more accurate survival time prediction on microarray data

Fig. 5 Error curves used to choose the optimal tuning parameter $\lambda_{opt}$ by 10-fold cross validation with the one standard error rule. The blue line indicates the one standard error for $\lambda_{min}$ in the direction of increasing regularization

Table 2 presents the top 10 significant genes selected based on concordance index and by dwglasso with prior biological knowledge incorporated, together with their adjusted p-values. The top 10 genes from the other methods are presented in Additional file 2: Table S2. Among the top 10 significant genes based on dwglasso in Table 2, UBE2S has been reported to be over-expressed in breast cancer [44]. The authors showed that UBE2S knockdown suppressed malignant characteristics of breast cancer cells, such as migration, invasion, and anchorage-independent growth. SALL2 has also been reported as a predictor of lymph node metastasis in breast cancer [45]. Unlike UBE2S, SALL2 was identified as a tumor suppressor gene that can suppress cell growth when over-expressed [46]. Additionally, XBP1 has been reported to be activated in triple-negative breast cancer and to play a pivotal role in the tumorigenicity and progression of this breast cancer subtype [47]. KIAA0922 has also been reported as a novel inhibitor of the Wnt signaling pathway, which is closely related to breast cancer [48]. According to Table 2, none of UBE2S, SALL2, XBP1 and KIAA0922 is among the top 10 significant genes based on concordance index.

Fig. 6 Survival curves. a Top 10 significant genes based on dwglasso with prior biological knowledge incorporated, b top 10 significant genes based on KDDN, c top 10 significant genes based on dwglasso with no prior knowledge incorporated, d top 10 significant genes based on concordance index

Table 2 The top 10 significant genes based on conventional differential gene expression analysis (i.e., concordance index) and dwglasso with prior biological knowledge incorporated, along with their adjusted p-values

Top 10 significant genes based on concordance index    Top 10 significant genes based on dwglasso
Gene symbol    Adjusted p-value                        Gene symbol    Adjusted p-value
BTD                                                    SALL2
FKTN                                                   UBE2S
LRRC                                                   RAB11FIP
RAB11FIP                                               KIAA
EMX                                                    XBP1
HNRNPAB                                                KIAA
TKT                                                    EMX
LANCL                                                  OAZ
TFF                                                    NDC
USF                                                    CCT

Common genes are marked in bold

In Fig. 7, we show the neighbors of UBE2S and SALL2 in the high risk and low risk groups, based on the networks built by wglasso from the Bild et al. dataset. UBE2S is over-expressed in the high risk group, while SALL2 is under-expressed. This agrees with UBE2S being a breast cancer promoting gene and SALL2 a breast cancer suppressor gene [44, 46]. Additionally, UBE2S has a higher scaled node degree in the high risk group, while SALL2 has a higher scaled node degree in the low risk group ($sd^{h}_{UBE2S} = 0.286$, $sd^{l}_{UBE2S} = 0.778$, $sd^{h}_{SALL2} = 1.0$, $sd^{l}_{SALL2} =$ …). This shows that UBE2S, a breast cancer promoting gene, is more actively connected with its neighbors in the high risk group. SALL2, a breast cancer suppressor gene, is more actively connected with its neighbors in the low risk group. In Fig. 7, yellow edges represent connections supported by the STRING database. These connections, which come from prior biological knowledge, do not always appear in the output of wglasso. This is a desirable property, since prior biological knowledge only provides evidence.
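A deliberately simplified toy shows why a prior-supported edge can still be missing from the inferred network. This sketch is not wglasso. It assumes a per-edge penalty `lam * w`, where a hypothetical weight `w < 1` is applied to pairs listed in a prior database such as STRING. An edge is kept only if its soft-thresholded sample correlation is nonzero. The gene names, correlations, `lam` and `prior_weight` are all made up for illustration.

```python
import math


def soft_threshold(x, t):
    """Shrink x toward zero by t; return 0.0 when |x| <= t."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def select_edges(corr, prior_edges, lam=0.4, prior_weight=0.5):
    """Keep gene pairs whose soft-thresholded correlation is nonzero.

    Toy stand-in (an assumption, not the paper's optimisation): pairs in
    prior_edges get the reduced penalty lam * prior_weight, all other
    pairs get the full penalty lam.
    """
    kept = {}
    for edge, r in corr.items():
        penalty = lam * (prior_weight if edge in prior_edges else 1.0)
        shrunk = soft_threshold(r, penalty)
        if shrunk != 0.0:
            kept[edge] = shrunk
    return kept


# Hypothetical gene pairs and sample correlations, for illustration only.
corr = {
    ("A", "B"): 0.35,  # prior-supported, moderate data support
    ("A", "C"): 0.10,  # prior-supported, weak data support
    ("B", "C"): 0.35,  # no prior, moderate data support
    ("C", "D"): 0.60,  # no prior, strong data support
}
prior = {("A", "B"), ("A", "C")}

print(sorted(select_edges(corr, prior)))  # [('A', 'B'), ('C', 'D')]
```

Edge A-C is prior-supported but still dropped. Its correlation (0.10) falls below even the reduced threshold (0.2), which mirrors STRING edges that do not show up in the wglasso output. Edge B-C has the same correlation as A-B but no prior support, so it is also dropped. In this toy, the prior only lowers the bar that the data must clear.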
We still need support from the data to make a connection. Therefore, by integrating prior biological knowledge into data-driven models, we expect to build more robust and biologically relevant networks.

Table 3 shows the survival time prediction performance when the top 5, top 10 and top 15 significant genes selected by each of the four methods are used as inputs to the multivariate Cox regression model (Fig. 6). In all three cases, the proposed dwglasso algorithm with prior biological knowledge incorporated achieved the best performance. It was followed by KDDN, then by dwglasso without prior biological knowledge incorporated. The method that relies purely on concordance index performed worst.

RNA-seq data
Using the UCSC Cancer Genomics Browser, we obtained TCGA RNA-seq data (level 3) acquired from patients with HCC [49]. The data come from 423 liver tissues: 371 primary tumor, 50 solid normal and 2 recurrent tumor samples. They were generated on the Illumina HiSeq 2000 RNA Sequencing platform and mapped onto human genome coordinates using the UCSC cgdata HUGO probemap. Of the 371 primary tumor samples, 50 have a corresponding solid normal sample.

To evaluate dwglasso on RNA-seq data, we applied the workflow shown in Fig. 8. We first selected the 100 samples for which both the tumor tissue and the corresponding non-tumorous tissue were available. From these, we randomly selected 60 samples (30 tumor samples and their corresponding normal samples) as the training dataset. The remaining 40 samples (20 tumor samples and their corresponding normal samples) were used as testing dataset 1. Because testing dataset 1 contains only 40 samples, we created testing dataset 2 by combining these 40 samples with the remaining 321 tumor samples that have no corresponding normal sample. Testing datasets 1 and 2 let us evaluate dwglasso on a balanced dataset and on a large dataset, respectively. Specifically, we preprocessed the RNA-seq data using the R package DESeq2 on the training dataset [50].
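The patient-level split described above can be sketched as follows. This is a minimal illustration under stated assumptions. The function name `split_matched_pairs` and the sample identifiers are hypothetical. The split assigns whole patients, so a tumor sample and its matched normal always land in the same set. Calling it with a different `seed` would give one of the 100 random repetitions.

```python
import random


def split_matched_pairs(pair_ids, unmatched_tumor_ids, n_train_pairs=30, seed=None):
    """Split patients with matched tumor/normal samples into train and test sets.

    Returns (train, test1, test2), each a list of (patient_id, tissue) tuples.
    test1 is balanced (matched pairs only); test2 adds the unmatched tumors.
    """
    rng = random.Random(seed)
    patients = list(pair_ids)
    rng.shuffle(patients)  # randomly assign whole patients, not single samples

    def both_tissues(ids):
        # Each matched patient contributes one tumor and one normal sample.
        return [(pid, tissue) for pid in ids for tissue in ("tumor", "normal")]

    train = both_tissues(patients[:n_train_pairs])
    test1 = both_tissues(patients[n_train_pairs:])
    test2 = test1 + [(pid, "tumor") for pid in unmatched_tumor_ids]
    return train, test1, test2


# Hypothetical IDs with the counts given in the text: 50 matched pairs and
# 321 primary tumors without a matched normal sample.
pairs = [f"P{i:03d}" for i in range(50)]
unmatched = [f"T{i:03d}" for i in range(321)]

train, test1, test2 = split_matched_pairs(pairs, unmatched, seed=0)
print(len(train), len(test1), len(test2))  # 60 40 361
```

Splitting at the patient level keeps any patient's tissues from appearing in both training and testing data. Otherwise a classifier could partly learn patient-specific signal rather than tumor-versus-normal differences.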

Fig. 7 Neighbors of UBE2S and SALL2 in two groups. a Neighbors of UBE2S in the high risk group, b neighbors of UBE2S in the low risk group, c neighbors of SALL2 in the high risk group, d neighbors of SALL2 in the low risk group. Label colors represent over- (red) or under- (green) expression in the high risk group. Node shapes indicate unique (circle) or shared (rectangle) genes between the two groups. Node colors show the significance of the gene expression value between the two groups. Yellow edges represent interactions recorded in the STRING database. The thickness of an edge indicates the strength of the interaction

Table 3 The survival time prediction performance (p-value and hazard ratio) for the top 5, top 10 and top 15 significant genes based on concordance index: DEA, dwglasso with no prior biological knowledge incorporated: dwglasso (no prior), KDDN, and dwglasso with prior biological knowledge incorporated: dwglasso (prior)

                       Top 5 significant genes     Top 10 significant genes    Top 15 significant genes
Methods                p-value    Hazard ratio     p-value    Hazard ratio     p-value    Hazard ratio
DEA
dwglasso (no prior)
KDDN
dwglasso (prior)

The best performance is marked in bold when the gene number is fixed

From DESeq2, we selected the statistically significant genes whose adjusted p-values were less than 0.01 for subsequent analysis. At this step, the number of significant genes is typically between 1000 and …. We prioritized the significant gene list based on dwglasso. The top 5 genes on the prioritized list were used to train a logistic regression classifier that distinguishes tumor from normal samples. The trained classifier was then evaluated on testing datasets 1 and 2. To compare dwglasso with other methods, we also prioritized the significant gene list by each of three alternatives: the adjusted p-value from DESeq2, dwglasso without prior biological knowledge incorporated, and KDDN. For each prioritized list, we built a logistic regression classifier from its top 5 genes and evaluated it on testing datasets 1 and 2.
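The evaluation step can be sketched by computing sensitivity, specificity and AUC from a classifier's predicted tumor probabilities. This is a generic sketch, not the authors' code, and it rests on several assumptions. Tumor is the positive class and the decision threshold is 0.5. AUC is the Mann-Whitney probability that a tumor sample outscores a normal one, with ties counting one half. Repeated runs are summarized as mean (SD) using the sample standard deviation. The labels and scores are made up.

```python
import statistics


def sensitivity_specificity(y_true, y_prob, threshold=0.5):
    """Return (sensitivity, specificity); label 1 = tumor (positive), 0 = normal."""
    tp = fn = tn = fp = 0
    for label, prob in zip(y_true, y_prob):
        predicted = 1 if prob >= threshold else 0
        if label == 1 and predicted == 1:
            tp += 1
        elif label == 1:
            fn += 1
        elif predicted == 0:
            tn += 1
        else:
            fp += 1
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true, y_prob):
    """ROC AUC via the Mann-Whitney statistic: P(score_tumor > score_normal)."""
    pos = [p for label, p in zip(y_true, y_prob) if label == 1]
    neg = [p for label, p in zip(y_true, y_prob) if label == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


def mean_sd(values):
    """Summarize repeated runs as 'mean (SD)' using the sample standard deviation."""
    return f"{statistics.mean(values):.2f} ({statistics.stdev(values):.2f})"


# Made-up labels and predicted tumor probabilities for six test samples.
y_true = [1, 1, 1, 0, 0, 0]
y_prob = [0.9, 0.7, 0.4, 0.6, 0.2, 0.1]

sens, spec = sensitivity_specificity(y_true, y_prob)
print(round(sens, 3), round(spec, 3), round(auc(y_true, y_prob), 3))  # 0.667 0.667 0.889
print(mean_sd([0.90, 0.92, 0.94]))  # 0.92 (0.02)
```

Sensitivity and specificity depend on the chosen threshold, while AUC summarizes ranking quality over all thresholds. This is one reason the two kinds of metric are usually reported side by side.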

Fig. 8 Workflow of dwglasso for more accurate classification prediction on RNA-seq data

The above procedure was repeated 100 times. The means and standard deviations of sensitivity, specificity and area under the curve (AUC) were calculated on testing datasets 1 and 2, as shown in Table 4. In agreement with the microarray results, the network-based methods with prior biological knowledge incorporated yielded the best performance. They were followed by the network-based method without prior biological knowledge incorporated, and the conventional differential gene expression analysis method performed worst. This is expected, since both dwglasso and KDDN take into account changes at both the gene expression and network topology levels, and incorporate prior biological knowledge into their network models.

Conclusion
In this paper, we apply a novel network inference method, wglasso, to integrate prior biological knowledge into a data-driven model. We also propose a new network-based differential gene expression analysis method, dwglasso, for better identification of genes associated with biologically disparate groups. Simulation results show that wglasso can achieve better performance in building biologically relevant networks than purely data-driven models (e.g., neighbor selection and graphical LASSO), even when only a moderate level of information is available as prior biological knowledge. We demonstrate the performance of dwglasso in survival time prediction using two independent microarray breast cancer datasets previously published by Bild et al. and van de Vijver et al. The top 10 genes selected by dwglasso from the Bild et al. dataset led to significantly improved survival time prediction on the van de Vijver et al. dataset, compared with the top 10 significant genes obtained by conventional differential gene expression analysis.
Among the top 10 genes selected by dwglasso, UBE2S, SALL2, XBP1 and KIAA0922 have previously been reported to be relevant in breast cancer biomarker discovery studies. We also tested dwglasso using TCGA RNA-seq data acquired from patients with HCC, comprising tumor samples and their corresponding non-tumorous liver tissues. Improved sensitivity, specificity and AUC were observed when comparing dwglasso with the conventional differential gene expression analysis method. Future work will focus on applying dwglasso to other omic studies such as proteomics and metabolomics.

Table 4 The mean and standard deviation (in parentheses) of sensitivity, specificity and area under curve (AUC) calculated for conventional differential gene expression analysis: DEA, dwglasso with no prior biological knowledge incorporated: dwglasso (no prior), KDDN, and dwglasso with prior biological knowledge incorporated: dwglasso (prior)

                       Testing dataset 1                      Testing dataset 2
Methods                Specificity  Sensitivity  AUC          Specificity  Sensitivity  AUC
DEA                    (0.07)       (0.06)       (0.04)       (0.07)       (0.04)       (0.01)
dwglasso (no prior)    (0.03)       (0.11)       (0.02)       (0.03)       (0.05)       (0.01)
KDDN                   (0.08)       (0.04)       (0.02)       (0.08)       (0.03)       (0.01)
dwglasso (prior)       (0.03)       (0.07)       (0.03)       (0.03)       (0.03)       (0.01)

The best performance is marked in bold

Additional files
Additional file 1: Table S1: All 58 significant genes along with their associated adjusted p-values. (CSV 1.09 kb)
Additional file 2: Table S2: The top 10 significant genes based on KDDN and dwglasso without prior biological knowledge incorporated. (CSV 4.00 kb)

Abbreviations
AIC: Akaike information criterion; AUC: area under curve; BIC: Bayesian information criterion; DEA: differential gene expression analysis; DN: differential network; dwglasso: differentially weighted graphical LASSO; FDR: false discovery rate; FP: false positives; FN: false negatives; GGMs: Gaussian graphical models; HCC: hepatocellular carcinoma; KDDN: Knowledge-fused differential dependency network; LASSO: least absolute shrinkage and selection operator; MAP: maximum a posteriori; MLE: maximum likelihood estimation; PPI: protein-protein interaction; wglasso: weighted graphical LASSO

Acknowledgements
None.

Funding
This work is supported in part by the National Institutes of Health Grants U01CA185188, R01CA and R01GM awarded to HWR.

Availability of supporting data
The datasets supporting the results of this article are included within the article and its additional files, or are available from the referenced sources.

Authors' contributions
YZ designed and implemented the algorithms, conducted the synthetic simulation and real data application, and drafted the paper. YC collected the two microarray datasets and participated in generating the results for the synthetic simulation and real data application.
GY and RL provided expertise in differential expression analysis. HWR directed the project and completed the paper. All authors reviewed and approved the final manuscript.

Competing interests
The authors declare that they have no competing interests.

Consent for publication
Not applicable.

Ethics approval and consent to participate
Not applicable.

Author details
1 Department of Electrical and Computer Engineering, Virginia Polytechnic Institute and State University, Arlington, VA, USA. 2 Department of Radiation Oncology, Stanford University, Palo Alto, CA, USA. 3 Lombardi Comprehensive Cancer Center, Georgetown University, Washington, DC, USA.

Received: 18 February 2016
Accepted: 31 January 2017

References
1. Tusher VG, Tibshirani R, Chu G. Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci. 2001;98(9).
2. Newton MA, Kendziorski CM, Richmond CS, Blattner FR, Tsui KW. On differential variability of expression ratios: improving statistical inference about gene expression changes from microarray data. J Comput Biol. 2001;8(1).
3. Efron B, Tibshirani R, Storey JD, Tusher V. Empirical bayes analysis of a microarray experiment. J Am Stat Assoc. 2001;96(456).
4. Ein-Dor L, Zuk O, Domany E. Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. Proc Natl Acad Sci. 2006;103(15).
5. Zuo Y, Yu G, Zhang C, Ressom HW. A new approach for multi-omic data integration. In: Bioinformatics and Biomedicine (BIBM), 2014 IEEE International Conference On. IEEE.
6. Butte AJ, Kohane IS. Unsupervised knowledge discovery in medical databases using relevance networks. In: Proceedings of the AMIA Symposium. American Medical Informatics Association.
7. Butte AJ, Kohane IS. Mutual information relevance networks: functional genomic clustering using pairwise entropy measurements. Pac Symp Biocomput. 2000;5. Citeseer.
8. Zuo Y, Yu G, Tadesse MG, Ressom HW. Biological network inference using low order partial correlation. Methods. 2014;69(3).
9. Friedman N, Linial M, Nachman I, Pe'er D. Using Bayesian networks to analyze expression data. J Comput Biol. 2000;7(3-4).
10. Toh H, Horimoto K. Inference of a genetic network by a combined approach of cluster analysis and graphical gaussian modeling. Bioinformatics. 2002;18(2).
11. Dobra A, Hans C, Jones B, Nevins JR, Yao G, West M. Sparse graphical models for exploring gene expression data. J Multivar Anal. 2004;90(1).
12. Kishino H, Waddell PJ. Correspondence analysis of genes and tissue types and finding genetic links from microarray data. Genome Inform. 2000;11.
13. Dempster AP. Covariance selection. Biometrics. 1972.
14. Edwards D. Introduction to Graphical Modelling. Springer Science & Business Media.
15. Schäfer J, Strimmer K. An empirical bayes approach to inferring large-scale gene association networks. Bioinformatics. 2005;21(6).
16. Meinshausen N, Bühlmann P. High-dimensional graphs and variable selection with the lasso. Ann Stat. 2006.
17. Friedman J, Hastie T, Tibshirani R. Sparse inverse covariance estimation with the graphical lasso. Biostatistics. 2008;9(3).
18. Mazumder R, Hastie T. The graphical lasso: New insights and alternatives. Electron J Stat. 2012;6.
19. Snel B, Lehmann G, Bork P, Huynen MA. String: a web-server to retrieve and display the repeatedly occurring neighbourhood of a gene. Nucleic Acids Res. 2000;28(18).
20. Kanehisa M, Goto S. Kegg: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000;28(1).
21. Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M. Biogrid: a general repository for interaction datasets. Nucleic Acids Res. 2006;34(suppl 1).
22. Kamburov A, Wierling C, Lehrach H, Herwig R. Consensuspathdb: a database for integrating human functional interaction networks. Nucleic Acids Res. 2009;37(suppl 1).
23. Chuang HY, Lee E, Liu Y-T, Lee D, Ideker T. Network-based classification of breast cancer metastasis. Mol Syst Biol. 2007;3(1).
24. Zuo Y, Yu G, Ressom HW. Integrating prior biological knowledge and graphical lasso for network inference. In: Bioinformatics and Biomedicine (BIBM), 2015 IEEE International Conference On. IEEE.
25. Wang Z, Xu W, San Lucas FA, Liu Y. Incorporating prior knowledge into gene network study. Bioinformatics. 2013;29(20).
26. Li Y, Jackson SA. Gene network reconstruction by integration of prior biological knowledge. G3: Genes Genomes Genetics. 2015;5(6).
27. Ha MJ, Baladandayuthapani V, Do K-A. Dingo: differential network analysis in genomics. Bioinformatics. 2015;31(21).
28. Zhang B, Li H, Riggins RB, Zhan M, Xuan J, Zhang Z, Hoffman EP, Clarke R, Wang Y. Differential dependency network analysis to identify condition-specific topological changes in biological networks. Bioinformatics. 2009;25(4).
29. Tian Y, Zhang B, Hoffman EP, Clarke R, Zhang Z, Shih IM, Xuan J, Herrington DM, Wang Y. Knowledge-fused differential dependency network models for detecting significant rewiring in biological networks. BMC Syst Biol. 2014;8(1).
30. Tian Y, Zhang B, Hoffman EP, Clarke R, Zhang Z, Shih IM, Xuan J, Herrington DM, Wang Y. Kddn: an open-source cytoscape app for constructing differential dependency networks with significant rewiring. Bioinformatics. 2015;31(2).
31. Wei Z, Li H. A markov random field model for network-based analysis of genomic data. Bioinformatics. 2007;23(12).
32. Chouvardas P, Kollias G, Nikolaou C. Inferring active regulatory networks from gene expression data using a combination of prior knowledge and enrichment analysis. BMC Bioinforma. 2016;17(5).
33. Wei P, Pan W. Incorporating gene networks into statistical tests for genomic data via a spatially correlated mixture model. Bioinformatics. 2008;24(3).
34. Binder H, Schumacher M. Incorporating pathway information into boosting estimation of high-dimensional risk prediction models. BMC Bioinforma. 2009;10(1).
35. Tibshirani R. Regression shrinkage and selection via the lasso. J R Stat Soc Series B (Methodol). 1996.
36. Meinshausen N, Bühlmann P. Stability selection. J R Stat Soc Series B (Stat Methodol). 2010;72(4).
37. Barabasi AL, Oltvai ZN. Network biology: understanding the cell's functional organization. Nat Rev Genet. 2004;5(2).
38. Zhao T, Liu H, Roeder K, Lafferty J, Wasserman L. The huge package for high-dimensional undirected graph estimation in r. J Mach Learn Res. 2012;13(1).
39. Bild AH, Yao G, Chang JT, Wang Q, Potti A, Chasse D, Joshi MB, Harpole D, Lancaster JM, Berchuck A, et al. Oncogenic pathway signatures in human cancers as a guide to targeted therapies. Nature. 2006;439(7074).
40. Van De Vijver MJ, He YD, van't Veer LJ, Dai H, Hart AA, Voskuil DW, Schreiber GJ, Peterse JL, Roberts C, Marton MJ, et al. A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002;347(25).
41. Gentles AJ, Newman AM, Liu CL, Bratman SV, Feng W, Kim D, Nair VS, Xu Y, Khuong A, Hoang CD, et al. The prognostic landscape of genes and infiltrating immune cells across human cancers. Nat Med. 2015;21(8).
42. Marchionni L, Afsari B, Geman D, Leek JT. A simple and reproducible breast cancer prognostic test. BMC Genomics. 2013;14(1).
43. Pencina MJ, D'Agostino RB. Overall C as a measure of discrimination in survival analysis: model specific population value and confidence interval estimation. Stat Med. 2004;23(13).
44. Ayesha AK, Hyodo T, Asano E, Sato N, Mansour MA, Ito S, Hamaguchi M, Senga T. UBE2S is associated with malignant characteristics of breast cancer cells. Tumor Biol. 2016;37(1).
45. Huang E, Cheng SH, Dressman H, Pittman J, Tsou MH, Horng CF, Bild A, Iversen ES, Liao M, Chen CM, et al. Gene expression predictors of breast cancer outcomes. Lancet. 2003;361(9369).
46. Liu H, Adler AS, Segal E, Chang HY. A transcriptional program mediating entry into cellular quiescence. PLoS Genet. 2007;3(6).
47. Chen X, Iliopoulos D, Zhang Q, Tang Q, Greenblatt MB, Hatziapostolou M, Lim E, Tam WL, Ni M, Chen Y, et al. Xbp1 promotes triple-negative breast cancer by controlling the hif1α pathway. Nature. 2014;508(7494).
48. Maharzi N, Parietti V, Nelson E, Denti S, Robledo-Sarmiento M, Setterblad N, Parcelier A, Pla M, Sigaux F, Gluckman JC, et al. Identification of tmem131l as a novel regulator of thymocyte proliferation in humans. J Immunol. 2013;190(12).
49. Zhu J, Sanborn JZ, Benz S, Szeto C, Hsu F, Kuhn RM, Karolchik D, Archie J, Lenburg ME, Esserman LJ, et al. The ucsc cancer genomics browser. Nat Methods. 2009;6(4).
50. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for rna-seq data with deseq2. Genome Biol. 2014;15(12):1.

More information

The Limits of Individual Identification from Sample Allele Frequencies: Theory and Statistical Analysis

The Limits of Individual Identification from Sample Allele Frequencies: Theory and Statistical Analysis The Lmts of Indvdual Identfcaton from Sample Allele Frequences: Theory and Statstcal Analyss Peter M. Vsscher 1 *, Wllam G. Hll 2 1 Queensland Insttute of Medcal Research, Brsbane, Australa, 2 Insttute

More information

THE NATURAL HISTORY AND THE EFFECT OF PIVMECILLINAM IN LOWER URINARY TRACT INFECTION.

THE NATURAL HISTORY AND THE EFFECT OF PIVMECILLINAM IN LOWER URINARY TRACT INFECTION. MET9401 SE 10May 2000 Page 13 of 154 2 SYNOPSS MET9401 SE THE NATURAL HSTORY AND THE EFFECT OF PVMECLLNAM N LOWER URNARY TRACT NFECTON. L A study of the natural hstory and the treatment effect wth pvmecllnam

More information

Impact of Imputation of Missing Data on Estimation of Survival Rates: An Example in Breast Cancer

Impact of Imputation of Missing Data on Estimation of Survival Rates: An Example in Breast Cancer Orgnal Artcle Impact of Imputaton of Mssng Data on Estmaton of Survval Rates: An Example n Breast Cancer Banesh MR 1, Tale AR 2 Abstract Background: Multfactoral regresson models are frequently used n

More information

NUMERICAL COMPARISONS OF BIOASSAY METHODS IN ESTIMATING LC50 TIANHONG ZHOU

NUMERICAL COMPARISONS OF BIOASSAY METHODS IN ESTIMATING LC50 TIANHONG ZHOU NUMERICAL COMPARISONS OF BIOASSAY METHODS IN ESTIMATING LC50 by TIANHONG ZHOU B.S., Chna Agrcultural Unversty, 2003 M.S., Chna Agrcultural Unversty, 2006 A THESIS submtted n partal fulfllment of the requrements

More information

The effect of salvage therapy on survival in a longitudinal study with treatment by indication

The effect of salvage therapy on survival in a longitudinal study with treatment by indication Research Artcle Receved 28 October 2009, Accepted 8 June 2010 Publshed onlne 30 August 2010 n Wley Onlne Lbrary (wleyonlnelbrary.com) DOI: 10.1002/sm.4017 The effect of salvage therapy on survval n a longtudnal

More information

Project title: Mathematical Models of Fish Populations in Marine Reserves

Project title: Mathematical Models of Fish Populations in Marine Reserves Applcaton for Fundng (Malaspna Research Fund) Date: November 0, 2005 Project ttle: Mathematcal Models of Fsh Populatons n Marne Reserves Dr. Lev V. Idels Unversty College Professor Mathematcs Department

More information

Estimating the distribution of the window period for recent HIV infections: A comparison of statistical methods

Estimating the distribution of the window period for recent HIV infections: A comparison of statistical methods Research Artcle Receved 30 September 2009, Accepted 15 March 2010 Publshed onlne n Wley Onlne Lbrary (wleyonlnelbrary.com) DOI: 10.1002/sm.3941 Estmatng the dstrbuton of the wndow perod for recent HIV

More information

Unobserved Heterogeneity and the Statistical Analysis of Highway Accident Data

Unobserved Heterogeneity and the Statistical Analysis of Highway Accident Data Unobserved Heterogenety and the Statstcal Analyss of Hghway Accdent Data Fred L. Mannerng Professor of Cvl and Envronmental Engneerng Courtesy Department of Economcs Unversty of South Florda 4202 E. Fowler

More information

(From the Gastroenterology Division, Cornell University Medical College, New York 10021)

(From the Gastroenterology Division, Cornell University Medical College, New York 10021) ROLE OF HEPATIC ANION-BINDING PROTEIN IN BROMSULPHTHALEIN CONJUGATION* BY N. KAPLOWITZ, I. W. PERC -ROBB,~ ANn N. B. JAVITT (From the Gastroenterology Dvson, Cornell Unversty Medcal College, New York 10021)

More information

Resampling Methods for the Area Under the ROC Curve

Resampling Methods for the Area Under the ROC Curve Resamplng ethods for the Area Under the ROC Curve Andry I. Bandos AB6@PITT.EDU Howard E. Rockette HERBST@PITT.EDU Department of Bostatstcs, Graduate School of Publc Health, Unversty of Pttsburgh, Pttsburgh,

More information

What Determines Attitude Improvements? Does Religiosity Help?

What Determines Attitude Improvements? Does Religiosity Help? Internatonal Journal of Busness and Socal Scence Vol. 4 No. 9; August 2013 What Determnes Atttude Improvements? Does Relgosty Help? Madhu S. Mohanty Calforna State Unversty-Los Angeles Los Angeles, 5151

More information

A Geometric Approach To Fully Automatic Chromosome Segmentation

A Geometric Approach To Fully Automatic Chromosome Segmentation A Geometrc Approach To Fully Automatc Chromosome Segmentaton Shervn Mnaee ECE Department New York Unversty Brooklyn, New York, USA shervn.mnaee@nyu.edu Mehran Fotouh Computer Engneerng Department Sharf

More information

Journal of Engineering Science and Technology Review 11 (2) (2018) Research Article

Journal of Engineering Science and Technology Review 11 (2) (2018) Research Article Jestr Journal of Engneerng Scence and Technology Revew () (08) 5 - Research Artcle Prognoss Evaluaton of Ovaran Granulosa Cell Tumor Based on Co-forest ntellgence Model Xn Lao Xn Zheng Juan Zou Mn Feng

More information

Tumor Phylogenetic Lineage Separation by Medoidshift Clustering with Non-Positive Kernel

Tumor Phylogenetic Lineage Separation by Medoidshift Clustering with Non-Positive Kernel Tumor Phylogenetc Lneage Separaton by Medodshft Clusterng wth Non-Postve Kernel Lu Xe School of Computer Scence Carnege Mellon Unversty Pttsburgh, PA 15213 lxe1@andrew.cmu.edu Commttee Dr. Russell Schwartz,

More information

WHO S ASSESSMENT OF HEALTH CARE INDUSTRY PERFORMANCE: RATING THE RANKINGS

WHO S ASSESSMENT OF HEALTH CARE INDUSTRY PERFORMANCE: RATING THE RANKINGS WHO S ASSESSMENT OF HEALTH CARE INDUSTRY PERFORMANCE: RATING THE RANKINGS ELLIOTT PARKER and JEANNE WENDEL * Department of Economcs, Unversty of Nevada, Reno, NV, USA SUMMARY Ths paper examnes the econometrc

More information

Estimation for Pavement Performance Curve based on Kyoto Model : A Case Study for Highway in the State of Sao Paulo

Estimation for Pavement Performance Curve based on Kyoto Model : A Case Study for Highway in the State of Sao Paulo Estmaton for Pavement Performance Curve based on Kyoto Model : A Case Study for Kazuya AOKI, PASCO CORPORATION, Yokohama, JAPAN, Emal : kakzo603@pasco.co.jp Octávo de Souza Campos, Publc Servces Regulatory

More information

Towards Prediction of Radiation Pneumonitis Arising from Lung Cancer Patients Using Machine Learning Approaches

Towards Prediction of Radiation Pneumonitis Arising from Lung Cancer Patients Using Machine Learning Approaches Towards Predcton of Radaton Pneumonts Arsng from Lung Cancer Patents Usng Machne Learnng Approaches Jung Hun Oh, Adtya Apte, Rawan Al-Loz, Jeffrey Bradley, Issam El Naqa * Dvson of Bonformatcs and Outcomes

More information

Incorrect Beliefs. Overconfidence. Types of Overconfidence. Outline. Overprecision 4/22/2015. Econ 1820: Behavioral Economics Mark Dean Spring 2015

Incorrect Beliefs. Overconfidence. Types of Overconfidence. Outline. Overprecision 4/22/2015. Econ 1820: Behavioral Economics Mark Dean Spring 2015 Incorrect Belefs Overconfdence Econ 1820: Behavoral Economcs Mark Dean Sprng 2015 In objectve EU we assumed that everyone agreed on what the probabltes of dfferent events were In subjectve expected utlty

More information

Research Article Computational Analysis of Specific MicroRNA Biomarkers for Noninvasive Early Cancer Detection

Research Article Computational Analysis of Specific MicroRNA Biomarkers for Noninvasive Early Cancer Detection Hndaw BoMed Research Internatonal Volume 0, Artcle ID 00, pages https://do.org/0./0/00 Research Artcle Computatonal Analyss of Specfc McroRNA Bomarkers for Nonnvasve Early Detecton Tanc Song, Yanchun Lang,,

More information

USING DIFFERENTIAL GEOMETRIC LARS ALGORITHM TO STUDY THE EXPRESSION PROFILE OF A SAMPLE OF PATIENTS WITH LATEX-FRUIT SYNDROME

USING DIFFERENTIAL GEOMETRIC LARS ALGORITHM TO STUDY THE EXPRESSION PROFILE OF A SAMPLE OF PATIENTS WITH LATEX-FRUIT SYNDROME Electronc Journal of Appled Statstcal Analyss EJASA (211), Electron. J. App. Stat. Anal., Vol. 4, Issue 2, 227 234 e-issn 27-5948, DOI 1.1285/275948v4n2p227 211 Unverstà del Salento http://sba-ese.unle.t/ndex.php/ejasa/ndex

More information

Boosting for tumor classification with gene expression data. Seminar für Statistik, ETH Zürich, CH-8092, Switzerland

Boosting for tumor classification with gene expression data. Seminar für Statistik, ETH Zürich, CH-8092, Switzerland BIOINFORMATICS Vol. 19 no. 9 2003, pages 1061 1069 DOI: 10.1093/bonformatcs/btf867 Boostng for tumor classfcaton wth gene expresson data Marcel Dettlng and Peter Bühlmann Semnar für Statstk, ETH Zürch,

More information

Alma Mater Studiorum Università di Bologna DOTTORATO DI RICERCA IN METODOLOGIA STATISTICA PER LA RICERCA SCIENTIFICA

Alma Mater Studiorum Università di Bologna DOTTORATO DI RICERCA IN METODOLOGIA STATISTICA PER LA RICERCA SCIENTIFICA Alma Mater Studorum Unverstà d Bologna DOTTORATO DI RICERCA IN METODOLOGIA STATISTICA PER LA RICERCA SCIENTIFICA Cclo XXVII Settore Concorsuale d afferenza: 13/D1 Settore Scentfco dscplnare: SECS-S/02

More information

An Introduction to Modern Measurement Theory

An Introduction to Modern Measurement Theory An Introducton to Modern Measurement Theory Ths tutoral was wrtten as an ntroducton to the bascs of tem response theory (IRT) modelng and ts applcatons to health outcomes measurement for the Natonal Cancer

More information

THIS IS AN OFFICIAL NH DHHS HEALTH ALERT

THIS IS AN OFFICIAL NH DHHS HEALTH ALERT THIS IS AN OFFICIAL NH DHHS HEALTH ALERT Dstrbuted by the NH Health Alert Network Health.Alert@dhhs.nh.gov August 26, 2016 1430 EDT (2:30 PM EDT) NH-HAN 20160826 Recommendatons for Accurate Dagnoss of

More information

AUTOMATED DETECTION OF HARD EXUDATES IN FUNDUS IMAGES USING IMPROVED OTSU THRESHOLDING AND SVM

AUTOMATED DETECTION OF HARD EXUDATES IN FUNDUS IMAGES USING IMPROVED OTSU THRESHOLDING AND SVM AUTOMATED DETECTION OF HARD EXUDATES IN FUNDUS IMAGES USING IMPROVED OTSU THRESHOLDING AND SVM Wewe Gao 1 and Jng Zuo 2 1 College of Mechancal Engneerng, Shangha Unversty of Engneerng Scence, Shangha,

More information

TOPICS IN HEALTH ECONOMETRICS

TOPICS IN HEALTH ECONOMETRICS TOPICS IN HEALTH ECONOMETRICS By VIDHURA SENANI BANDARA WIJAYAWARDHANA TENNEKOON A dssertaton submtted n partal fulfllment of the requrements for the degree of DOCTOR OF PHILOSOPHY WASHINGTON STATE UNIVERSITY

More information

Economic crisis and follow-up of the conditions that define metabolic syndrome in a cohort of Catalonia,

Economic crisis and follow-up of the conditions that define metabolic syndrome in a cohort of Catalonia, Economc crss and follow-up of the condtons that defne metabolc syndrome n a cohort of Catalona, 2005-2012 Laa Maynou 1,2,3, Joan Gl 4, Gabrel Coll-de-Tuero 5,2, Ton Mora 6, Carme Saurna 1,2, Anton Scras

More information

Integrative Computational Identifications of the Signaling Pathway Network Related to TNF-alpha Stimulus in Vascular Endothelial Cells

Integrative Computational Identifications of the Signaling Pathway Network Related to TNF-alpha Stimulus in Vascular Endothelial Cells Integratve Computatonal Identfcatons of the Sgnalng Pathway Network Related to -alpha Stmulus n Vascular Endothelal Cells Jn Gu, Shao L, Yang Chen, Yanda L MOE Key Laboratory of Bonformatcs and Bonformatcs

More information

Optimal probability weights for estimating causal effects of time-varying treatments with marginal structural Cox models

Optimal probability weights for estimating causal effects of time-varying treatments with marginal structural Cox models Optmal probablty weghts for estmatng causal effects of tme-varyng treatments wth margnal structural Cox models Mchele Santacatterna, Cela García-Pareja Rno Bellocco, Anders Sönnerborg, Anna Ma Ekström

More information

*VALLIAPPAN Raman 1, PUTRA Sumari 2 and MANDAVA Rajeswari 3. George town, Penang 11800, Malaysia. George town, Penang 11800, Malaysia

*VALLIAPPAN Raman 1, PUTRA Sumari 2 and MANDAVA Rajeswari 3. George town, Penang 11800, Malaysia. George town, Penang 11800, Malaysia 38 A Theoretcal Methodology and Prototype Implementaton for Detecton Segmentaton Classfcaton of Dgtal Mammogram Tumor by Machne Learnng and Problem Solvng *VALLIAPPA Raman, PUTRA Sumar 2 and MADAVA Rajeswar

More information

Richard Williams Notre Dame Sociology Meetings of the European Survey Research Association Ljubljana,

Richard Williams Notre Dame Sociology   Meetings of the European Survey Research Association Ljubljana, Rchard Wllams Notre Dame Socology rwllam@nd.edu http://www.nd.edu/~rwllam Meetngs of the European Survey Research Assocaton Ljubljana, Slovena July 19, 2013 Comparng Logt and Probt Coeffcents across groups

More information

ALMALAUREA WORKING PAPERS no. 9

ALMALAUREA WORKING PAPERS no. 9 Snce 1994 Inter-Unversty Consortum Connectng Unverstes, the Labour Market and Professonals AlmaLaurea Workng Papers ISSN 2239-9453 ALMALAUREA WORKING PAPERS no. 9 September 211 Propensty Score Methods

More information

Price linkages in value chains: methodology

Price linkages in value chains: methodology Prce lnkages n value chans: methodology Prof. Trond Bjorndal, CEMARE. Unversty of Portsmouth, UK. and Prof. José Fernández-Polanco Unversty of Cantabra, Span. FAO INFOSAMAK Tangers, Morocco 14 March 2012

More information

Fast Algorithm for Vectorcardiogram and Interbeat Intervals Analysis: Application for Premature Ventricular Contractions Classification

Fast Algorithm for Vectorcardiogram and Interbeat Intervals Analysis: Application for Premature Ventricular Contractions Classification Fast Algorthm for Vectorcardogram and Interbeat Intervals Analyss: Applcaton for Premature Ventrcular Contractons Classfcaton Irena Jekova, Vessela Krasteva Centre of Bomedcal Engneerng Prof. Ivan Daskalov

More information

Integration of sensory information within touch and across modalities

Integration of sensory information within touch and across modalities Integraton of sensory nformaton wthn touch and across modaltes Marc O. Ernst, Jean-Perre Brescan, Knut Drewng & Henrch H. Bülthoff Max Planck Insttute for Bologcal Cybernetcs 72076 Tübngen, Germany marc.ernst@tuebngen.mpg.de

More information

Effects of Estrogen Contamination on Human Cells: Modeling and Prediction Based on Michaelis-Menten Kinetics 1

Effects of Estrogen Contamination on Human Cells: Modeling and Prediction Based on Michaelis-Menten Kinetics 1 J. Water Resource and Protecton, 009,, 6- do:0.6/warp.009.500 Publshed Onlne ovember 009 (http://www.scrp.org/ournal/warp) Effects of Estrogen Contamnaton on Human Cells: Modelng and Predcton Based on

More information

Appendix for. Institutions and Behavior: Experimental Evidence on the Effects of Democracy

Appendix for. Institutions and Behavior: Experimental Evidence on the Effects of Democracy Appendx for Insttutons and Behavor: Expermental Evdence on the Effects of Democrac 1. Instructons 1.1 Orgnal sessons Welcome You are about to partcpate n a stud on decson-makng, and ou wll be pad for our

More information

Comparison of support vector machine based on genetic algorithm with logistic regression to diagnose obstructive sleep apnea

Comparison of support vector machine based on genetic algorithm with logistic regression to diagnose obstructive sleep apnea Orgnal Artcle Comparson of support vector machne based on genetc algorthm wth logstc regresson to dagnose obstructve sleep apnea Zohreh Manoochehr, Nader Salar 1, Mansour Rezae 1, Habbolah Khazae 2, Sara

More information

The Influence of the Isomerization Reactions on the Soybean Oil Hydrogenation Process

The Influence of the Isomerization Reactions on the Soybean Oil Hydrogenation Process Unversty of Belgrade From the SelectedWorks of Zeljko D Cupc 2000 The Influence of the Isomerzaton Reactons on the Soybean Ol Hydrogenaton Process Zeljko D Cupc, Insttute of Chemstry, Technology and Metallurgy

More information

A Wild Bootstrap approach for the selection of biomarkers in early diagnostic trials

A Wild Bootstrap approach for the selection of biomarkers in early diagnostic trials Zapf et al. BMC Medcal Research Methodology 25 5:43 DOI.86/s2874-5-25-y RESEARCH ARTICLE Open Access A Wld Bootstrap approach for the selecton of bomarkers n early dagnostc trals Antona Zapf *, Edgar Brunner

More information

Non-linear Multiple-Cue Judgment Tasks

Non-linear Multiple-Cue Judgment Tasks Non-lnear Multple-Cue Tasks Anna-Carn Olsson (anna-carn.olsson@psy.umu.se) Department of Psychology, Umeå Unversty SE-09 87, Umeå, Sweden Tommy Enqvst (tommy.enqvst@psyk.uu.se) Department of Psychology,

More information

Evaluation of two release operations at Bonneville Dam on the smolt-to-adult survival of Spring Creek National Fish Hatchery fall Chinook salmon

Evaluation of two release operations at Bonneville Dam on the smolt-to-adult survival of Spring Creek National Fish Hatchery fall Chinook salmon Evaluaton of two release operatons at Bonnevlle Dam on the smolt-to-adult survval of Sprng Creek Natonal Fsh Hatchery fall Chnook salmon By Steven L. Haeseker and Davd Wlls Columba Rver Fshery Program

More information

Saeed Ghanbari, Seyyed Mohammad Taghi Ayatollahi*, Najaf Zare

Saeed Ghanbari, Seyyed Mohammad Taghi Ayatollahi*, Najaf Zare DOI:http://dx.do.org/10.7314/APJCP.2015.16.14.5655 and Anthracyclne- Breast Cancer Treatment and Survval n the Eastern Medterranean and Asa: a Meta-analyss RESEARCH ARTICLE Comparng Role of Two Chemotherapy

More information

Comparison among Feature Encoding Techniques for HIV-1 Protease Cleavage Specificity

Comparison among Feature Encoding Techniques for HIV-1 Protease Cleavage Specificity Internatonal Journal of Intellgent Systems and Applcatons n Engneerng Advanced Technology and Scence ISSN:2147-67992147-6799 http://jsae.atscence.org/ Orgnal Research Paper Comparson among Feature Encodng

More information

FAST DETECTION OF MASSES IN MAMMOGRAMS WITH DIFFICULT CASE EXCLUSION

FAST DETECTION OF MASSES IN MAMMOGRAMS WITH DIFFICULT CASE EXCLUSION computng@tanet.edu.te.ua www.tanet.edu.te.ua/computng ISSN 727-6209 Internatonal Scentfc Journal of Computng FAST DETECTION OF MASSES IN MAMMOGRAMS WITH DIFFICULT CASE EXCLUSION Gábor Takács ), Béla Patak

More information

A Computer-aided System for Discriminating Normal from Cancerous Regions in IHC Liver Cancer Tissue Images Using K-means Clustering*

A Computer-aided System for Discriminating Normal from Cancerous Regions in IHC Liver Cancer Tissue Images Using K-means Clustering* A Computer-aded System for Dscrmnatng Normal from Cancerous Regons n IHC Lver Cancer Tssue Images Usng K-means Clusterng* R. M. CHEN 1, Y. J. WU, S. R. JHUANG, M. H. HSIEH, C. L. KUO, Y. L. MA Department

More information

A-UNIFAC Modeling of Binary and Multicomponent Phase Equilibria of Fatty Esters+Water+Methanol+Glycerol

A-UNIFAC Modeling of Binary and Multicomponent Phase Equilibria of Fatty Esters+Water+Methanol+Glycerol -UNIFC Modelng of Bnary and Multcomponent Phase Equlbra of Fatty Esters+Water+Methanol+Glycerol N. Garrdo a, O. Ferrera b, R. Lugo c, J.-C. de Hemptnne c, M. E. Macedo a, S.B. Bottn d,* a Department of

More information

Drug Prescription Behavior and Decision Support Systems

Drug Prescription Behavior and Decision Support Systems Drug Prescrpton Behavor and Decson Support Systems ABSTRACT Adverse drug events plague the outcomes of health care servces. In ths research, we propose a clncal learnng model that ncorporates the use of

More information

DeSigN: connecting gene expression with therapeutics for drug repurposing and development

DeSigN: connecting gene expression with therapeutics for drug repurposing and development The Author(s) BMC Genomcs 2017, 18(Suppl 1):934 DOI 10.1186/s12864-016-3260-7 RESEARCH Open Access DeSgN: connectng gene expresson wth therapeutcs for drug repurposng and development Bernard Kok Bang Lee

More information

UNIVERISTY OF KWAZULU-NATAL, PIETERMARITZBURG SCHOOL OF MATHEMATICS, STATISTICS AND COMPUTER SCIENCE

UNIVERISTY OF KWAZULU-NATAL, PIETERMARITZBURG SCHOOL OF MATHEMATICS, STATISTICS AND COMPUTER SCIENCE UNIVERISTY OF KWAZULU-NATAL, PIETERMARITZBURG SCHOOL OF MATHEMATICS, STATISTICS AND COMPUTER SCIENCE A COMPLEX SURVEY DATA ANALYSIS OF TB AND HIV MORTALITY IN SOUTH AFRICA By JOIE LEA MURORUNKWERE STUDENT

More information

Available online at ScienceDirect. Procedia Computer Science 46 (2015 )

Available online at   ScienceDirect. Procedia Computer Science 46 (2015 ) Avalable onlne at www.scencedrect.com ScenceDrect Proceda Computer Scence 46 (215 ) 1762 1769 Internatonal Conference on Informaton and Communcaton Technologes (ICICT 214) Automatc Characterzaton of Bengn

More information

CONSTRUCTION OF STOCHASTIC MODEL FOR TIME TO DENGUE VIRUS TRANSMISSION WITH EXPONENTIAL DISTRIBUTION

CONSTRUCTION OF STOCHASTIC MODEL FOR TIME TO DENGUE VIRUS TRANSMISSION WITH EXPONENTIAL DISTRIBUTION Internatonal Journal of Pure and Appled Mathematcal Scences. ISSN 97-988 Volume, Number (7), pp. 3- Research Inda Publcatons http://www.rpublcaton.com ONSTRUTION OF STOHASTI MODEL FOR TIME TO DENGUE VIRUS

More information

Statistical Analysis on Infectious Diseases in Dubai, UAE

Statistical Analysis on Infectious Diseases in Dubai, UAE Internatonal Journal of Preventve Medcne Research Vol. 1, No. 4, 015, pp. 60-66 http://www.ascence.org/journal/jpmr Statstcal Analyss on Infectous Dseases 1995-013 n Duba, UAE Khams F. G. 1, Hussan H.

More information

A Classification Model for Imbalanced Medical Data based on PCA and Farther Distance based Synthetic Minority Oversampling Technique

A Classification Model for Imbalanced Medical Data based on PCA and Farther Distance based Synthetic Minority Oversampling Technique A Classfcaton Model for Imbalanced Medcal Data based on PCA and Farther Dstance based Synthetc Mnorty Oversamplng Technque NADIR MUSTAFA School of Computer Scence and Engneerng Unversty of Electronc Scence

More information

A Meta-Analysis of the Effect of Education on Social Capital

A Meta-Analysis of the Effect of Education on Social Capital A Meta-Analyss of the Effect of Educaton on Socal Captal Huang Jan ** "Scholar" Research Center for Educaton and Labor Market Department of Economcs, Unversty of Amsterdam and Tnbergen Insttute by Henrëtte

More information

Estimation of Relative Survival Based on Cancer Registry Data

Estimation of Relative Survival Based on Cancer Registry Data Revew of Bonformatcs and Bometrcs (RBB) Volume 2 Issue 4, December 203 www.sepub.org/rbb Estmaton of Relatve Based on Cancer Regstry Data Olaf Schoffer *, Ante Nedostate 2, Stefane J. Klug,2 Cancer Epdemology,

More information

Investigation of zinc oxide thin film by spectroscopic ellipsometry

Investigation of zinc oxide thin film by spectroscopic ellipsometry VNU Journal of Scence, Mathematcs - Physcs 24 (2008) 16-23 Investgaton of znc oxde thn flm by spectroscopc ellpsometry Nguyen Nang Dnh 1, Tran Quang Trung 2, Le Khac Bnh 2, Nguyen Dang Khoa 2, Vo Th Ma

More information

A Linear Regression Model to Detect User Emotion for Touch Input Interactive Systems

A Linear Regression Model to Detect User Emotion for Touch Input Interactive Systems 2015 Internatonal Conference on Affectve Computng and Intellgent Interacton (ACII) A Lnear Regresson Model to Detect User Emoton for Touch Input Interactve Systems Samt Bhattacharya Dept of Computer Scence

More information

Subject-Adaptive Real-Time Sleep Stage Classification Based on Conditional Random Field

Subject-Adaptive Real-Time Sleep Stage Classification Based on Conditional Random Field Subject-Adaptve Real-Tme Sleep Stage Classfcaton Based on Condtonal Random Feld Gang Luo, PhD, Wanl Mn, PhD IBM TJ Watson Research Center, Hawthorne, NY {luog, wanlmn}@usbmcom Abstract Sleep stagng s the

More information

TTCA: an R package for the identification of differentially expressed genes in time course microarray data

TTCA: an R package for the identification of differentially expressed genes in time course microarray data Albrecht et al. BMC Bonformatcs (2017) 18:33 DOI 10.1186/s12859-016-1440-8 METHODOLOGY ARTICLE Open Access TTCA: an R package for the dentfcaton of dfferentally expressed genes n tme course mcroarray data

More information