Tumor Phylogenetic Lineage Separation by Medoidshift Clustering with Non-Positive Kernel

Size: px
Start display at page:

Download "Tumor Phylogenetic Lineage Separation by Medoidshift Clustering with Non-Positive Kernel"

Transcription

1 Tumor Phylogenetc Lneage Separaton by Medodshft Clusterng wth Non-Postve Kernel Lu Xe School of Computer Scence Carnege Mellon Unversty Pttsburgh, PA Commttee Dr. Russell Schwartz, Dr. Kathryn Roeder, Dr. Roy Maxon Abstract Background. Computatonal approaches have been wdely used for dssectng tumor subtypes through mxture models. One successful approach nvolves formulatng ths as a problem equvalent to dentfyng smplces n a hgh-dmensonal space of genomc pont clouds, where the vertces of a smplex are assumed to represent unque cell clones such as healthy stem cells, tumor progentors and tumor subtypes. The hghly heterogeneous nature and the complex evolutonary trajectores of tumors pose sgnfcant challenges to conventonal methods for unmxng tumor composton, whch can be smplfed by usng clusterng methods to separate tumor evoluton lneages. Am. The goal of ths project s to desgn a clusterng method that addresses the specfc challenges posed by the propertes of tumor genomc data: 1) the number of cluster s unknown; 2) data ponts are clustered accordng to ther belongng smplces; 3) the data clouds from two clusters may overlap; 4) there s no exstng probablstc model yet developed to descrbe the dstrbuton of the data ponts nsde a cluster; 5) the edges of two smplces may form sharp angles. Methods. The new clusterng method s based on standard medodshft but coupled wth specalzed non-postve kernel functons. It converges on vertces of the smplces nstead of densty maxma, and classfy data ponts wth respect to the dstance to the convergng vertces. The convergence s guaranteed when the shadow kernel functon s non-ncreasng and concave. Data. Synthetc datasets are produced from 5 hypothetcal tumor progresson scenaros. The method was also appled on two real breast cancer datasets and one real lung cancer dataset. Results. The new method shows consstently better clusterng than stand medodshft n all test scenaros, and mprovements over k-medods n certan applcatons. It also captures some nterestng structural characterstcs of the real cancer dataset that could not be detected by pror automated approaches to tumor mxture separaton. Conclusons. The medodshft-based method descrbed n ths work better handles the structural constrants of subdvdng smplcal complex by fndng cluster centers at vertces of the smplces, whereas standard medodshft s heavly nfluenced by the densty on shared margns and may fal to dstngush adjacent smplces. Comparng to the prevously used k-medods method, the new method removed the often dffcult model selecton queston of choosng k, whch requres a pror knowledge of the number of oncogenetc lneages. 1

2 1 Background Genomcs research has dramatcally mproved our understandng of the nature of tumor progresson and the means of possble treatment. Tumor progresson, or tumorgeness, s an evolutonary process by whch tumor cells accumulate successve mutatons that lead to decreased growth control, ncreased nvasveness and eventually metastass. There has been much nterest n reconstructng ths process of evoluton because of ts relevance to dentfyng the drvng factors of mutaton and predctng drug response and future prognoss. Our understandng of tumor progresson has been radcally reshaped by the applcaton of new technologes for probng the genome, gene and proten expresson profles of tumors, whch have made t possble to dentfy key sub-types of tumors that may be clncally ndstngushable yet have very dstnct prognoses and responses to treatments [1, 2, 3, 4]. Uncoverng these tumor sub-types has helped drve the development of novel therapeutcs, known as targeted therapeutcs, that are more specfcally targeted to the partcular genetc defects that cause each cancer [5, 6, 7]. Despte the profound mpact molecular genetcs had on cancer research, however, we are only startng to embrace the full complexty of tumor evoluton. As a result of that, some recognzed sub-types reman poorly defned and many patents do not fall nto any currently recognzed sub-type [8]. Nevertheless, clncal treatment of cancer could receve consderable beneft from better technques to dentfy sub-types mssed by the prevalng expresson clusterng approaches, ther dagnostc sgnatures, and the genes essental to the pathogencty of partcular sub-types [8]. More sophstcated computatonal models have stemmed from the feld of phylogenetcs, where tumors are not consdered as merely random collectons of mutated cells, but rather evolvng populatons. The poneerng work of Desper et al. nferred tumor pylogenes, or oncogenetc trees, by treatng observed tumors as leaf nodes n a speces tree, estmatng evolutonary dstances from genomc and gene expresson profles, and applyng a varety of methods to obtan reasonable models of the major progresson pathways by whch tumors evolve across a patent populaton. [9, 10, 11]. Maxmum lkelhood estmaton has also been used to work wth common measurements of tumor state [12]. An alternatve to the above tumor-by-tumor phylogenetcs, the cell-by-cell approach reles nstead on heterogenety between ndvdual cells wthn sngle tumors to dentfy lkely pathways of progresson [13, 14, 15, 16, 17]. The Pennngton model forms tumor evoluton as a Stener tree problem wthn ndvdual patents, and uses pooled data from many patents to buld a global consensus network descrbng common evolutonary pathways across a patent populaton [13, 14]. It s based on the assumpton that tumors preserve remnants of ther earler cell populatons as they develop, and any gven tumor sample wll therefore consst of a heterogeneous combnaton of cells at dfferent stages of progresson along a common pathway, as well as possbly contamnaton by healthy cells of varous knds. Ths assumpton s supported by numerous studes usng recent technques such as fluourescence n stu hybrdzaton (FISH) and sngle cell sequencng. [15, 16, 17, 18]. Each of the above two methods has ts shortcomngs. The tumor-by-tumor approach overlooks the ntratumor heterogenety that can provde valuable clues to tumor progresson, and the cell-by-cell approach consders the ntratumor heterogenety nformaton but at the cost of lmtng the number of probes per cell and therefore the resoluton of measure. The more recent sngle-cell sequencng technque ncreased the resoluton of a sngle cell measurement to the genomc level, but so far the data s affected by nose and the sample sze s lmted by the hgh cost [19, 20, 21, 22, 23]. 2 Related works Schwartz et al. proposed a gap-brdgng method that computatonally nfers cell populaton from tssue-wde measurements by usng unmxng, a mathematcal formalsm of the problem of separatng fundamental components of mxed samples n whch each observaton s presumed to be an unknown convex combnaton of several hdden components. [8]. In the context of tumor phylogeny, each component corresponds to a cell clone ( node ) on the underlyng oncogenetc tree. Ther specfc approach vews components as vertces of a mult-dmensonal smplex that encloses the observed ponts, whch makes unmxng essentally the problem of nferrng the vertces and boundares of the smplex from a survey of the ponts nsde t [24]. Tollver et al. revsed the 2

3 unmxng model and derved a soft geometrc unmxng algorthm that robustly handles nosy observatons [25]. Instead of focusng on sub-structural nformaton of the data clouds, other popular mxture nference methods are centered around the probablty densty estmaton that best fts the observed data [26]. Both optmzed for data acqured from sngle-nucleotde polymorphsm (SNP) array, ABSOLUTE fts optmal CNV model and uses a karyotype lkelhood model as a pror [27], and ASCAT uses addtonal nformaton such as varant allele frequency (VAF) based upon the avalablty of the raw sequencng data [28]. Some methods also reles on sequencng and VAF data, such as PyClone that uses herarchcal Bayesan clusterng [29] and ScClone that uses Bayersan mxture modelng [26]. CNAnorm s a another method desgned for sequencng data and nfers heterogenety and copy numbers separately, but t reles on the assumpton that tumor s largely monoclonal [30]. Assumng that the tumor sub-populatons can be dstngushed by CNV nformaton, the THetA algorthm poses heterogenety nference as a maxmum lkelhood mxture decomposton problem [31]. The above numerous strateges towards deconvoluton of tumor genomc data can typcally resolve up to approxmately 10 dstnct cell types, whch lmts ther ablty to resolve fner detals of cellular heterogenety wthn tumors [32]. To address ths ssue, Roman et al. proposed a methodologcal mprovement on genomc mxture modelng by explotng the fact that mxed genomc data from cells evolvng accordng to an evolutonary tree model would be expected to have fner mathematcal substructures than a sngle unform smplex assumed by pror work. In partcular, pont clouds produced by representng tumors as ponts n a genomc space (e.g., by gene expresson or gene copy numbers) would be expected to yeld smplcal complexes: conjunctons of low-dmensonal sub-smplces, correspondng to dstnct tumor subtypes, joned to one another va lower-dmensonal surfaces correspondng to shared ancestral cell populatons ncludng healthy cells [32]. Reconstructng these smplcal complexes can be consdered a specal case of the technque of manfold learnng [33]. In the work of Roman et al., k-medods clusterng s employed to separate pont clouds from dfferent sub-smplces, but t requres the pre-selecton of cluster number k that corresponds to the number of lneages n the unknown oncogenetc tree. In ths project, the structural-based clusterng approach s extended wth a medodshft-based method that s desgned to better dentfy clusters of ponts on dstnct low-dmensonal subspaces of the full manfold and remove the often dffcult task of model selecton regardng the choce of k. The standard medodshft method automatcally seeks the densty centers of the pont clouds known as modes, but t may break down n the context of separatng ndvdual smplces from smplcal complex due to the possblty that densty centers may locate on the conjuncton surface. The general methodology of medodshft provdes a good platform for desgnng problem-specfc classfer for tumor genomc data, as t s free from any pre-assumpton on data dstrbuton or cluster number, and guaranteed to converge wth only one-tme computaton of parwse dstances [34], 3 Problem descrpton The project s amed to desgn a problem-specfc algorthm for clusterng tumor samples wth respect to ther belongng smplces n a smplcal complex embedded n hgh-dmensonal featurespace. The vertces of each smplex are assumed to represent the cell clones that map to the nodes on the oncogenetc tree of the partcular type of tumor, and the number of shared nodes by two lneages determnes the number of vertces shared by two smplces. The nput dataset s organzed as a matrx whose rows and columns are assocated wth samples and measurements of certan features, respectvely. 4 Method 4.1 Algorthms and mplementaton Besdes the development of medodshft wth non-postve kernel (NPK), ths project also nvolves standard medodshft and k-medods as competng algorthms, and adjusted rand ndex as a measure of clusterng results. NPK-medodshft, standard medodshft and ARI are scrpted n MATLAB R2014b, and k-medods s mplemented as a bult-n functon of MATLAB R2014b wth k-means++ 3

4 seedng. The bult-n functon graphallshortestpaths of MATLAB R2014b s used whenever the shortest L2-squared path s needed as dstance metrc. 4.2 Standard medodshft revsted The dervaton of the medodshft algorthm manly follows the work of Shekh et al. [34] and Comancu et al. [35]. Smlar to meanshft, the medodshft algorthm s desgned to fnd the modes of estmated kernel densty, but only at data ponts rather than n the whole contnuous parameter space: f(x ) = 1 c n Φ(z j ) (1) j Here Φ( ) s the kernel functon usng profle notaton, and x, x j are data ponts such that 1 n, 1 j n. The factors c and h are the normalzer and kernel bandwdth, respectvely. The dstance measure z j can be any non-negatve dstance metrc between data ponts x and x j, for example, the scaled L2-squared dstance: z j = x x j 2 0 (2) h To formulate a medodshft task, frst the orgnal kernel, or shadow kernel Φ( ) must be translated nto a medodshft kernel by: where the teratve update rule s: ϕ( ) = Φ ( ) (3) y p+1 = arg mn y {x 1...x n} n ( x y 2 ϕ ( x y p h 2)) (4) Here y p s the medod at p th step. In fact, because the sequence of y belongs to the set of data ponts, one teraton of the above mnmzaton wll provde enough nformaton to trace out the convergence trajectory for all data ponts. The trace termnates when y k+1 = y k. Shekh et al. proved that medodshft s guaranteed to converge when the shadow kernel functon Φ( ) s convex. The resultng sequence y has the property that f(y p+q ) > f(y p ) for all q > 0 (Theorem 2.1 n [34]). Equvalently, a convex shadow kernel means ϕ ( ) 0, or ϕ(x) s a monotoncally non-ncreasng functon on x 0 because of the symmetry of Φ( ) and ϕ( ). 4.3 Medodshft wth non-postve kernel (NPK) As dscussed n the background secton, the standard medodshft clusterng may not work well for separatng smplces from a smplcal complex due to the possblty that the smplces, despte occupyng dstnct subspaces, may yeld pont clouds so overlappng that wll result n a super cluster that ncludes all adjacent smplces. By the structural characterstcs of a smplcal complex, a better strategy s to fnd the class centers located at densty mnma, often one vertex that s farthest from the densty center for each smplex, and group the ponts n the contanng smplex nto one cluster. Intutvely, ths can be acheved by reversng the sgn of the kernel functon n Equaton 3: φ( ) = Φ ( ) (5) 4

5 The convergence of medodshfht wth φ( ) s guaranteed wth a concave shadow kernel Φ( ) and the same update rule n Equaton 4 wth φ( ) nstead of ϕ( ). To prove that, t s suffcent to show that the resultng sequence y has the property that f(y p+q ) > f(y p ) for all q > 0. From Equaton 1, f(y p+1 ) f(y p ) = 1 c n ( Φ ( x y p+1 h 2) Φ ( x y p 2)) (6) h and the concavty of Φ( ), f(y p+1 ) f(y p ) 1 h 2 c n ( φ ( x y p h 2)( x y p+1 2 x y p 2)) (7) By the update rule n Equaton 4 and the termnaton at y p+1 = y p, n ( x y p+1 2 φ ( x y p 2)) < h n ( x y p 2 φ ( x y p h 2)) (8) Combnng the two above nequaltes, we conclude that f(y p+1 ) < f(p) for all p, thus the sequence f(y) s strctly decreasng. Ths guarantee of convergence wll however break down f a hard cut-off s appled. Hard cut-offs, such as Φ(x) = 0 f x > 1, are common for other kernels but they wll break the concavty of Φ(x). Although the convergence only requres concavty, non-decreasng Φ( ) such as log(z j + 1) wll result n an estmated kernel densty that s nversely proportonal to data densty, makng a decreasng sequence of f( ) converge at the densty maxma. A nonncreasng Φ( ) mples a non-postve frst dervatve,.e. φ( ) 0, and the concavty of Φ( ) mples φ ( ) 0,.e. φ( ) s non-ncreasng stage medodshft To suppress the nose n data clouds, a standard medodshft s used as a generc denoser before usng NPK medodshft. Wthout knowng the underlyng nose dstrbuton, we collapse the data clouds by steppng one teraton forward wth standard medodshft, and then use NPK medodshft to cluster the remanng ponts. Labelng s preserved when trackng the trajectory of medods across the two stages. 4.5 Assessment of clusterng results It s a non-trval work to fnd a good measure that tells the qualty of clusterng, as the values of true postve (TP), true negatve (TN), false postve (FP) and false negatve (FN) are hard to defne when the number of clusters gven by medodshft s not always the same as ground truth. One promsng measure s the adjusted Rand ndex (ARI), whch defnes TP, TN, FP and FN accordng to the parwse relatons among data ponts [36]: TP: a par of ponts that belong to the same true cluster are also n the same nferred cluster. TN: a par of ponts that belong to dfferent true clusters are not n the same nferred cluster. FP: a par of ponts that belong to dfferent true clusters are n the same nferred cluster. FN: a par of ponts that belong to the same true cluster are not n the same nferred cluster. Assume there are v true clusters denoted by V, 1 v, w nferred clusters denoted by W j, 1 j w, and m j = V W j, then the value of ARI s computed as [37]: ARI = ( mj ) 2 t3 (9) t 1 + t 2 t 3 v w j 5

6 where t 1 = 1 2 v ( ) V ; t 2 = w ( ) Wj ; t 3 = 8t 1t 2 2 n(n 1) j (10) One advantage of usng ARI s ts strngent penalty for random clusterng and trval clusterng. As mentoned above, standard medodshft often creates a trval cluster that ncludes all data ponts, whch wll be rated as low as 0 by ARI but stll as much as 0.5 by Jaccard ndex, another popular measure of clusterng results [38, 39]. 4.6 Kernel functons and bandwdth selecton In ths project, the followng shadow kernel and kernel are used wth the standard medodshft method and the frst stage of 2-stage medodshft: { 1 Φ 0 (z j ) = 2 (1 z j) 2 z j 1 0 z j > 1 { 1 zj z ϕ 0 (z j ) = j 1 0 z j > 1 (11) and the followng shadow kernel and kernel are used wth NPK medodshft and the second stage of 2-stage medodshft: Φ 1 (z j ) = z j exp(z j ) φ 1 (z j ) = 1 exp(z j ) (12) To provde a relatvely farer comparson across datasets wth varyng sparsty and dstance metrc, the kernel bandwdth was not set to a fxed value, but rather decded by the mean of parwse dstance and the choce of a bandwdth factor h c. The followng s an example for L2-squared dstance: h 2 = h c n 2 and the dstance metrc s therefore scaled as: n n x x j 2 (13) j n 2 x x j 2 z j = n n h c j x (14) x j 2 5 Dataset 5.1 Synthetc scenaros Evaluaton of the effectveness of the methods s complcated by the lack of ground truth mxture compostons from real heterogeneous tumor data. Synthetc datasets generated from smulated scenaros wth known mxture components, mxture fractons, and smplcal structures therefore provde a relatvely far testbed to quantfy effectveness of the methods. Fve tumor evoluton scenaros are consdered here, each nvolvng a model of tumors evolvng nto two or three subtypes: Scenaro T2E conssts of a model n whch a healthy state evolves nto an early precancerous state that subsequently branches nto two subtypes of late state (Fgure 1 A). Ths model results n a structure of two trangles sharng an edge (Fgure 1 F). One trangle has 50 data ponts and the other has

7 Fgure 1: The smulated tumor progresson scenaros for producng synthetc datasets. Subfgures A, B, C, D, E show fve oncogenetc trees of the synthetc model (T2E, T2V, Q2V, L3V, TLV, respectvely), each wth healthy cells at root, tumor progentor cells as nternal nodes, and tumor subtypes at leaves. Ther correspondng smplcal complex structures are shown n subfgures F, G, H, I, J, respectvely, and synthetc data ponts are randomly drawn from the enclosed smplces. Note that the structures are embedded n 10-Dmensonal space whle llustrated n 3-D for dagrammatc vew. Scenaro T2V conssts of two subtypes each wth ts own early and a late progresson stages (Fgure 1 B), yeldng a structure of two trangles joned at a vertex (Fgure 1 G). One trangle has 50 data ponts and the other has 100. Scenaro Q2V conssts of a more complex two-subtype model, n whch each subtype s defned by early, ntermedate, and late progresson stages (Fgure 1 C). The resultng structure s a par of tetrathedra joned at a vertex (Fgure 1 H). One tetrahedron has 50 data ponts and the other has 100. Scenaro L3V assumes an ancestral healthy cell subpopulaton that dverges nto three dscrete subtypes each wth a sngle progresson state (Fgure 1 D). The result s a smplcal complex structure consstng of three lnes joned at a vertex (Fgure 1 I). Each of the lnes have 25, 50, 75 data ponts, respectvely. Scenaro TLV conssts of two subtypes wth unequal number of ntermedate stages: one s the drect descendant of healthy cell, the other has one precancerous stage (Fgure 1 E). The resultng structure s a lne and a trangle sharng a vertex (Fgure 1 J). The lne has 25 data ponts and the trangle has 100 data ponts. The structures are embedded n 10-Dmensonal space by placng the vertex that represents healthy cells at the orgn, and each of the other vertces on a unque base vector wth unt dstance away from the orgn. Under noseless condton, data ponts are unformly randomly placed nsde the enclosed smplex; and under nosy condtons, Gaussan noses wth standard devaton σ = 0.05, 0.1, 0.2 are added to each dmenson of each data pont. 5.2 Real tumor data Usng array comparatve genomc hybrdzaton (acgh), a molecular cytogenetc technque for analysng copy number varatons (CNVs) relatve to plody level n the DNA of a test sample compared to a reference sample, Navn et al. assayed CNVs of 83,055 genes from 87 breast tumor samples, formng an 87-by-83,055 matrx [40]. Prncple component analyss (PCA) reduced the 7

8 dmensonalty down to 86, wth the projected space consstng of the frst 10 prncple components (PC) capturng approxmately 90% of the data covarance. Jones et al. measured the expresson levels of genes n 91 lung cancer cells by complementary DNA (cdna) mcroarray [41]. The pre-process of the dataset follows prevous work n Schwartz et al. [8], where the data ponts were translated from log space to lnear space and the upper and lower lmts are set to 2 5 and 2 5, respectvely. The values outsde the lmts were set to the closer lmt, and two outlers were dscard from the dataset. The dmensonalty was reduced to 88 by PCA. Retreved from the cancer genome atlas (TCGA), the thrd dataset conssts of 1100 breast tumor samples profled at 20,243 expresson levels wth RNASeq [42], a technology that uses the capabltes of next-generaton sequencng to reveal a snapshot of RNA presence and quantty from a genome at a gven moment n tme. Three outlers were dscarded, and the remanng data ponts were normalzed by dmenson-wse standard devaton. The dmensonalty was reduced to 1096 by PCA. For all three datasets, the data ponts projected n the PC space were lnearly scaled to ft n a range of [0, 1] n each dmenson. 6 Results The standard medodshft, NPK medodshft and 2-stage medodshft methods were appled to all synthetc scenaros (Fgure 2), where data ponts are randomly generated for 100 replcates per scenaro per choce of σ. For the frst 2 methods, h c takes the values from 0.2 to 2 wth a step sze of 0.2, and for 2-stage medodshft, h c s fxed at 1 for the frst stage and ranges from 0.2 to 2 wth a step sze of 0.2 for the second stage. The results of the three methods obtaned at h c = 1 are then compared aganst k-medods wth the data generated under the same scenaros and levels of nose (Fgure 3). By applyng NPK medodshft (wth h c = 1) on the frst 3-PC space of the real acgh data, one can fnd 3 lne-shaped clusters joned at one vertex, and the cluster centers located on the dsjoned endponts of the lnes (Fgure 4 B). The spatal dstrbuton of the data ponts projected n the frst 3- PC space may nspre people usng 3-medods clusterng, whch yelds ndeed qute smlar clusters for acgh data, but wth cluster centers located at the densty centers (Fgure 4 A). Appled to the frst 5-PC space of the acgh cancer data, NPK medodshft (wth h c = 1) fnds one more cluster that resdes near the orgn n the frst 3-PC space (Fgure 5 A). Although ndstngushable n the frst 3-PC space, the new cluster shows consderable dvergence n the addtonal PC spaces (Fgure 5 B, C) from the other clusters. The dstrbuton of cdna data ponts n the frst 3-PC space shows a smlar 3-arm shape to the acgh case, wth the dfference beng arched arms and sharper angles between each par of arms (Fgure 4 C, D). Shortest L2 squared path s used as dstance measure between ponts n the acgh data, and NPK medodshft (wth h c = 1) classfed each arm as ts own cluster wth centers located agan on the dsjoned endponts of the arms (Fgure 4 D). The applcaton of 3-medods faled to gve desrable clusters that could separate the arms (Fgure 4 C). The data clouds n TCGA dataset have a much noser appearance whch makes the separaton of potental clusters a much harder task than n the prevous two real datasets (Fgure 6). Coupled wth shortest L2-path dstance metrc and kernel bandwdth factor h c = 1, the applcaton of 2-stage medodshft on TCGA dataset dscovered 3 clusters n the frst 4-PC space (Fgure 6). Despte the nosness, the red cluster shapes lke a tetrahedron n the frst 3-PC space, and a flat dsk n the PC space; and the blue and black clusters appear to be a trangle and a rod respectvely n both PC and PC spaces (Fgure 6). 7 Dscusson Comparng across the fve scenaros, fndng the rght clusters for Q2V and T2V s much easer than for the others, because under these scenaros the two smplces share only one vertex whle havng most of the data clouds away from the jonng vertex. Scenaro T2E s a much harder case due to ts densty center beng placed on the jonng edge. In theory, TLV s an easer case than L3V, whch s 8

9 Fgure 2: Comparng clusterng results of synthetc Scenaros Q2V (B, C, D, E), T2V (G, H, I, J), TLV (L, M, N, O), L3V (Q, R, S, T) and T2E (V, W, X, Y) by standard medodshft (red curves), NPK medodshft (blue curves) and 2-stage medodshft (black curves). Gaussan nose s added to the datasets wth σ = 0 (B, G, L, Q, V), 0.05 (C, H, M, R, W), 0.1 (D, I, N, S, X), and 0.2 (E, J, O, T, Y). For each subfgure of errorbars, the vertcal axs ndcates the value of ARI for every subfgure, the horzontal axs ndcates the kernel bandwdth factor, and the errorbars show one standard devaton above and below the mean ARI value. The structural llustratons of the scenaros are shown n Subfgures A, F, K, P, U, respectvely. In general, 2-stage medodshft (black curves) gves the hghest ARI values, whle standard medodshft gves the lowest and most bandwdth-senstve ARI. proved by k-medods, but NPK medodshft has a slghtly lower success rate n clusterng TLV than L3V, whch s caused by occasonally mstakng the two vertces n the trangle as two endponts n dfferent smplces. All methods perform poorly at nose level 0.2 under all scenaros. Comparng across the frst three methods, the standard medodshft yelds low and h c -senstve ARIs for all scenaros, whle 2-stage medodshft gves overall the hghest and most stable ARIs. Scenaro TLV s relatvely senstve to kernel bandwdth wth NPK medodshft (Fgure 2 L, M, N, P), whch prefers the choce of smaller bandwdth value under all nose condtons. The performance of NPK medodshft s close to 2-stage medodshft n term of ARI value under Scenaro L3V (Fgure 2 Q, R, S, T). Addng k-medods to the comparson, t s obvous that the k-medods method has slght advantages over 2-stage medodshft under Scenaros Q2V (Fgure 3 B, C, D, E), T2V (Fgure 3 G, H, I, J) and TLV (Fgure 3 L, M, N), but slght dsadvantages under the other two harder scenaros (Fgure 3 Q, R, S, V, W, X). 9

10 Fgure 3: Comparng clusterng results of synthetc Scenaros Q2V (B, C, D, E), T2V (G, H, I, J), TLV (L, M, N, O), L3V (Q, R, S, T) and T2E (V, W, X, Y) by standard medodshft, NPK medodshft, 2-stage medodshft and k-medods wth the correct k. Gaussan nose s added to the datasets wth σ = 0 (B, G, L, Q, V), 0.05 (C, H, M, R, W), 0.1 (D, I, N, S, X), and 0.2 (E, J, O, T, Y). For each subfgure of boxplots, the vertcal axs ndcates the value of ARI, the horzontal axs corresponds to the lst of methods, the red bars ndcate the medan, the boxes nclude the frst to the thrd quarter quantle, the whskers reach to extrema, and the red pluses mark the outlers. The structural llustratons of the scenaros are shown n Subfgures A, F, K, P, U, respectvely. K- medods gves the hghest ARI under scenaros Q2V, T2V and TLV, whle 2-stage medodshft and NPK medodshft gve hgher ARI under scenaros L3V and T2E. Applcatons of NPK medodshft and 2-stage medodshft on real cancer datasets show reasonable clusterng results whch one would expect from ther sub-structural appearances. In the cases of acgh and cdna datasets, where data clouds spread n 3 dstnct arms n the frst 3-PC space, the NPK medodshft performs superor than standard medodshft even k-medods, whch s n accordance wth the smlar synthetc scenaro L3V. In the cases of cdna and TCGA datasets where nter-cluster dstances may be shorter than ntra-cluster dstances for some data ponts measured n Eucldean space, usng shortest path as dstance metrc mght be able to create the correct clusterng as n other cases of manfold learnng [33]. In comparson, standard medodshft faled to produce any desrable clusterng wth a range of bandwdth when beng appled to the frst 3-PC space of real acgh and cdna data(fgure 9). Affected 10

11 Fgure 4: Comparng clusterng results of the acgh (A, B) and cdna (C, D) data projected n the space consstng of the frst 3 PC by 3-medods (A, C) and NPK medodshft (B, D). Data ponts are colored wth respect to ther belongng clusters, and the crcles ndcate the correspondng cluster centers. The 3-medods method found the cluster center located at local densty maxma, whle NPK medodshft found them located at vertces farthest away from the conjont pont. by the fluctuaton n the spatal densty of the data cloud, a small bandwdth may lead to too many smaller clusters, and a large bandwdth may nclude two or more clusters n a super cluster. K-medods works reasonably well wth the correct number of clusters. In the applcatons on acgh and cdna datasets, t seems obvous that k should be set to 3 whle clusterng the frst 3-PC space wth k-medods method n both cases, but addng more PCs to the space hnders the determnaton of cluster numbers by vsually checkng the dstrbuton of data ponts, and the use of k-medods method therefore requres the nference of k by model selecton under these condtons. Medodshft-based methods do not rely on predefned cluster numbers; nstead the number of nferred clusters s nfluenced by kernel bandwdth, and such nfluence s less promnent wth NPK medodshft. 8 Lmtatons When developng the synthetc scenaros, the vertces of the smplces are assumed to occupy dstnct subspaces n the synthetc scenaros, resultng n perpendcular edges joned at the orgn. Ths assumpton follows as a consequence of embeddng a low-dmensonal manfold n a much hgher dmenson, where t s less lkely to have more than one vertex n one dmenson. Another smplfcaton n the scenaros s that the vertces have equal dstance from the orgnal, whch follows the sole crteron for classfyng data ponts wth respect to the found cluster centers beng mnmzng the dstance from each data pont to the center of ts belongng cluster. As a consequence of that, when a vertex s chosen to be the center of a smplex by NPK medodshft, the correctness of clusterng reles on the dstance from that vertex to the decson boundary. Takng scenaro L3V for an example, the resultng clusters mght be skewed f the lnes have dfferent length measured n 11

12 certan dstance metrc. A possble remedy for more complcated stuaton n manfold learnng s to develop more sophstcated metrcs. Lots of factors n the real world may affect the accuracy of clusterng, wth an mportant one beng the nosy observatons. In the synthetc scenaros, Gaussan nose s added wth standard devaton up to σ = 0.2, and the k-medods and 2-stage medodshft can produce meanngful clusters up to σ = 0.1 for some scenaros. It s non-trval to estmate the nose nduced by expermental measures wth dfferent genomc proflng technques; Su et al. estmated σ to range from 0.05 to 0.38 for RNA-seq [43]. When these datasets usually have much hgher dmensonalty than the number of data ponts, PCA may serve as a denoser but complcates the quantfcaton of the remanng nose after PCA. Two other factors - the surface shared by two or more smplces and the number of data ponts n each smplex - are nherent to the structure of the tumor phylogenetc tree and the source of the samples. The results from clusterng the synthetc datasets show the tendency of a less accurate clusterng f a hgher proporton of the precancerous states s shared among the lneages. The synthetc scenaros are created delberately wth unequal cluster sze, whch accounts for the uncertanty n the possble real applcatons to some extent. The accuracy of every method ncreases substantally when data ponts are evenly dstrbuted across the smplces (results not shown). The clusterng results are nfluenced by more PCs addng to the space (Fgure 7, 8), but whether the fluctuaton n clusterng s a reflecton of actual oncogenetc meanng or merely drven by nose requres further oncogenetc analyss. 9 Concluson The work n ths project fts n the framework of tumor phylogenetc study by facltatng unmxng tumor heterogenety wth the separaton of possble phylogenetc lneages. The new-ntroduced NPK medodshft method works as a desrable classfer that handles the specal challenges n clusterng generc tumor genomc data, and s less dependent on the a pror knowledge and expermental source of the data. Non-conventonal kernels such as Φ 1 ( ) and φ 1 ( ) may have lttle meanng n most statstcal applcatons that nvolve kernel densty estmaton, but they serve the purpose n fndng data ponts located at geometrc extrema of the smplces. Applcaton on the synthetc scenaros proves that 2-stage medodshft can produce clusters of smlar qualty to k-medods wth known cluster numbers, and outperforms standard medodshft method n general. NPK medodshft and 2-stage medodshft captured prevously undscovered sub-spacal characterstcs of the pont clouds from real breast cancer data, but the bologcal meanng of the dscovered clusters n real dataset requres verfcaton from unmxng and further oncogenetc studes. 10 Future works The results suggest further work may be needed to buld more realstc synthetc scenaros that can better reflect the nature of tumor genomc data. Some drectons seem promsng n makng mprovements n synthetc data generaton, for example, dervng a better data generatng method that approxmates the nose n real measurements, and desgnng new scenaros wthout the assumptons of vertces occupyng ndvdual dmenson and equal dstance from vertces to the orgn. In case of clusterng data wth more complcated structures, one mght consder fndng a problemspecfc dstance metrc, but mght also beneft from a problem-specfc kernel functon. Unlke standard kernel densty estmaton where the choce of kernel functons plays a much less mportant role than that of the kernel bandwdth, couplng medodshft wth dfferent NPKs may lead to unexpected clusterng results. For example, by usng a flat kernel φ 1 ( ) = 1 medodshft wll smply create a sngle trval cluster centered at the pont that has the largest total dstance to the other ponts: y = arg mn y {x 1...x n} n x y 2 (15) 12

13 The use of φ 1 (z j ) = zj r wll result n undesrable nsenstvty to the choce of bandwdth: y p+1 = arg mn y {x 1...x n} = arg mn n y {x 1...x n} h 2r = arg mn y {x 1...x n} 1 x y 2( x y p 2r h 2r ) n x y 2 x y p 2r n x y 2 x y p 2r (16) The NPK φ 1 (z j ) = exp( z j ) 1 has some nterestng features. When h s too small, exp( z j ) 1 approxmates -1 and creates trval clusterng. When h s too large, exp( z j ) 1 approxmates z j and becomes nsenstve to h. Nevertheless, the results from clusterng tumor genomc data, even the canddates from unmxng each cluster, are nconclusve of any bologcal meanng untl further oncogenetc valdatons have been performed. Acknowledgment The author apprecates the advsory of Dr. Russell Schwartz and the effort of the commttee members. Specal thanks to Theodore Roman for answerng countless questons, and to Dr. Roy Maxon for formulatng the layout of ths report. Ths work s funded by NIH Award R01CA and the Department of Computatonal Bology, School of Computer Scence, Carnege Mellon Unversty. References [1] Golub T, Slonm D, Tamayo P, Huard C, Gaasenbeek M, et al. (1999) Molecular classfcaton of cancer: class dscovery and class predcton by gene expresson montorng. Scence, 286: [2] Perou C, Sorle T, Esen M, Rjn van der M, Rees S, et al. (2000) Molecular portrats of human breast tumors. Nature, 406: [3] Sorle T, Perou C, Tbshran R, Aas T, Gesler S, et al. (2001) Gene expresson profles of breast carcnomas dstngush tumor subclasses wth clncal mplcatons. Proceedngs of Natonal Academy of Scence, 98: [4] Sorle T, Tbshran R, Parker J, Haste T, Marron J, et al. (2003) Repeated observaton of breast tumor subtypes n ndepednent gene expresson data sets. Proceedngs of Natonal Academy of Scence, 100: [5] Pegram M, Konecny G, Slamon D. (2000) The molecular and cellular bology of HER2/neu gene amplfcaton/overexpresson and the clncal development of herceptn (trastuzumab) therapy for breast cancer. Cancer Treatment Research, 103: [6] Atkns J, Gershell L. (2002) From the analyst s couch: Selectve antcancer drugs. Nataure Revews Cancer, 2: [7] Bld A, Pott A, Nevns J. (2006) Opnon: Lnkng oncogenc pathways wth therapeutc opportuntes. Nature Revews Cancer, 6: [8] Schwartz R, Shackney S. (2010) Applyng unmxng to gene expresson data for tumor phylogeny nference. BMC Bonformatcs, 11(42): [9] Desper R, Jang F, Kallonem O, Moch H, Papadmtrou C, et al. (1999) Inferrng tree models for oncogeness from comparatve genome hybrdzaton data. Journal of Computatonal Bology, 6: [10] Desper R, Jang F, Kallonem O, Moch H, Papadmtrou C, et al. (2000) Dstance-based reconstructon of tree models for oncogeness. Journal of Computatonal Bology, 7:

14 [11] Desper R, Khan J, Schaffer A. (2004) Tumor classfcaton usng phylogenetc methods on expresson data. Journal of Theoretcal Bology, 228: [12] von Heydebreck A, Gunawan B, Fzes L. (2004) Maxmum lkelhood estmaton of oncogenetc tree models. Bostatstcs, 5: [13] Pennngton G, Smth C, Shackney S, Schwartz R. (2006) Expectaton-maxmzaton method for the reconstructon of tumor phylogenes from sngle-cell data. Computatonal Systems Bonformatcs Conference, [14] Pennngton G, Smth C, Shackney S, Schwartz R. (2007) Reconstructng tumor phylogenes from sngle-cell data. Journal of Bonformatcs and Computatonal Bology, 5: [15] Smth C, Pollce A, Gu L, Brown K, Sngh S, et al. (2000) Correlatons among p53, Her- 2/neu and ras overexpresson and aneuoplody by multparameter flow cytometry n human breast cancer: evdence fro a common phenotypc evolutonary pattern n nfltratng ductal carcnomas. Clncal Cancer Research, 6: [16] Janocko L, Brown K, Smth C, Gu L, Pollce A, et al. (2001) Dstnctve patterns of Her-2/neu, c-myc and cycln D1 gene amplfcaton by fluorescence n stu hybrdzaton n prmary human breast cancers. Cytometry, 46: [17] Shackney S, Smth C, Pollce A, Brown K, Day R, et al. (2004) Intracellular patterns of Her- 2/neu, ras, and plody abnormaltes n prmary human breast cancers predct postoperatve clncal dsease-free survval. Clncal Cancer Research, 10: [18] Shah S, Morn R, Khattra J, Prentce L, Pugh T, et al. (2009) Mutatonal evoluton n a lobular breast tumour profled at sngle nucleotde resoluton. Nature, 461: [19] Navn N, Kendall J, Troge J, Andrews P, Rodgers L, et al. (2011) Tumour evoluton nferred by sngle-cell sequencng. Nature, 472(7341): [20] Wang D, Bodovtz S. (2010) Sngle cell analyss: the new fronter n omcs. Trends n botechnology, 28(6): [21] Tao Y, Ruan J, Yeh S, Lu X, Wang Y, et al. (2011) Rapd growth of a hepatocellular carcnoma and the drvng mutatons revealed by cell-populaton genetc analyss of whole-genome data. Proceedngs of the Natonal Academy of Scences, 108(29): [22] Hou Y, Song L, Zhu P, Zhang B, Tao Y, et al. (2012) Sngle-cell exome sequencng and monoclonal evoluton of a jak2-negatve myeloprolferatve neoplasm. Cell, 148(5): [23] Xu X, Hou Y, Yn X, Bao L, Tang A, et al. (2012) Sngle-cell exome sequencng reveals snglenucleotde mutaton characterstcs of a kdney tumor. Cell, 148(5): [24] Ehrlch R, Full W. (1987) Sortng out geology - unmxng mxtures. Use and Abuse of Statstcal Methods n the Earth Scences, Oxford Unversty Press, [25] Tollver D, Tsourakaks C, Subramanan A, Shackney S, Schwartz R. (2010) Robust unmxng of tumor states n array comparatve genomc hybrdzaton data. Bonformatcs, 26(ISMB): [26] Dng L, Wendl M, McMchael J, Raphael B. (2014) Expandng the computatonal toolbox for mnng cancer genomes. Nature Revews, 15: [27] Carter S, Cbulsks K, Helman E, McKenna A, Shen H, et al. (2012) Absolute quantfcaton of somatc DNA alteratons n human cancer. Nature Botechnology, 30: [28] van Loo P, Nordgard S, Lngjærde O, Russnes H, Rye I, et al. (2010) Allele-specfc copy number analyss of tumors. Proceedngs of the Natonal Academy of Scence, 107(39): [29] Shah S, Roth A, Goya R, Oloum A, Ha G, et al. (2012) The clonal and mutatonal evoluton spectrum of prmary trple-negatve breast cancers. Nature, 486: [30] Gusnanto A, Wood H, Pawtan Y, Rabbtts P, Berr S. (2012) Correctng for cancer genome sze and tumour cell content enables better estmaton of copy number alteratons from nextgeneraton sequence data. Bonformatcs, 28(1): [31] Oesper L, Mahmoody A, Raphael B. (2013) THetA: nferrng ntra-tumor heterogenety from hgh-throughput DNA sequencng data. Genome Bology, 14: R80. 14

15 [32] Roman T, Nayyer A, Fasy B, Schwartz R. (2014) A smplcal complex-based approach to unmxng tumor progresson data. Submtted [33] Rowes S, Saul L. (2000) Nonlnear dmensonalty reducton by locally lnear embeddng. Scence, 290(5500): [34] Shekh Y, Khan E, Kanade T. (2007) Mode-seekng by medodshft. IEEE 11th Internatonal Conference on Computer Vson (ICCV). [35] Comancu D, Meer P. (2002) Mean Shft: A Robust Approach Toward Feature Space Analyss. IEEE Transactons on Pattern Analyss and Machne Intellgence, 24(5): [36] Hubert L, Arabe P. (1985) Comparng parttons. Journal of Classfcaton, 2: [37] Kuncheva L, Hadjtodorov S. (2004) Usng Dversty n Cluster Ensembles. IEEE SMC Internatonal Conference on Systems, Man and Cybernetcs. [38] Wagner S, Wagner D. (2007) Comparng Clusterngs - An Overvew. [39] Denœud L, Garreta H, Guénoche A. (2006) Comparson of dstance ndces between parttons. t Studes n Classfcaton, Data Analyss, and Knowledge Organzaton, [40] Navn N, Krasntz A, Rodgers L, Cook K, Meth J, et al. (2010) Inferrng tumor progresson from genomc heterogenety. Genome research, 20(1): [41] Jones M, Vrtanen C, Honjoh D, Myosh C, Satoh Y, et al. (2004) Two prognostcally sgnfcant subtypes of hgh-grade lung neuroendocrne tumours ndependent of small-cell and large-cell neuroendocrne carcnomas dentfed by gene expresson profles. Lancet, 363: [42] Wensten J, Collsson E, Mlls G, Shaw K, Ozenberger B, et al. (2013) The cancer genome atlas pan-cancer analyss project. Nature genetcs, 45(10): [43] Su Z, Łabaj P, L S, Therry-Meg J, Therry-Meg D, et al. (2014) A comprehensve assessment of RNA-seq accuracy, reproducblty and nformaton content by the sequencng qualty control consortum. Nature Botechnology, 32(9):

16 Fgure 5: Resultng clusters of the acgh data projected n the space consstng of the frst 5 PC by NPK medodshft. Data ponts are colored wth respect to ther belongng clusters. The data ponts are shown n 3D spaces consstng of the 1st-2nd-3rd (A), 2nd-3rd-4th (B), and 3rd-4th-5th (C) PC. Despte ndstngushable n Subfgure A, the magenta cluster dverse from the other clusters n Subfgures B and C. 16

17 Fgure 6: Resultng clusters of TCGA data projected n the space consstng of the frst 4 PC by 2-stage medodshft. Data ponts are colored wth respect to ther belongng clusters. The data ponts are shown n dfferent vew angles n 3D spaces consstng of the 1st-2nd-3rd (A, C, E, G) and 2nd-3rd-4th (B, D, F, H) PC. The clusters appear to have dstnct shapes and spacal occupaton. 17

18 Fgure 7: Resultng clusters of acgh data projected n the space consstng of the frst 8 PC by medodshft wth non-postve kernel. Data ponts are colored wth respect to ther belongng clusters. The data ponts are shown n 3D spaces consstng of the 1st-2nd-3rd (A), 2nd-3rd-4th (B), 3rd-4th-5th (C), 4th-5th-6th (D), 5th-6th-7th (E), and 6th-7th-8th (F) PC. The ncreased dmensonalty nfluences the clusterng result wth the rsk of fttng nose. 18

19 Fgure 8: Resultng clusters of cdna data projected n the space consstng of the frst 6 PC by medodshft wth non-postve kernel. Data ponts are colored wth respect to ther belongng clusters. The data ponts are shown n 3D spaces consstng of the 1st-2nd-3rd (A), 2nd-3rd-4th (B), 3rd-4th-5th (C), and 4th-5th-6th (D) PC. The ncreased dmensonalty nfluences the clusterng result wth the rsk of fttng nose. 19

20 Fgure 9: Comparng clusterng results of acgh data projected n the space consstng of the frst 3 PC by standard medodshft wth kernel bandwdth factors of 0.5 (A), 1.0 (B), 1.5 (C), 2.0 (D), 2.5 (E), and 3.0 (F). Cluster centers are marked wth red crcles. The clusterng s senstve to the choce of bandwdth factor, and t fals to gve desrable clusterng n all cases. 20

21 Fgure 10: Comparng clusterng results of cdna data projected n the space consstng of the frst 3 PC by standard medodshft wth kernel bandwdth factors of 0.5 (A), 1.0 (B), 1.5 (C), 2.0 (D), 2.5 (E), and 3.0 (F). Cluster centers are marked wth red crcles. The clusterng s senstve to the choce of bandwdth factor, and t fals to gve desrable clusterng n all cases. 21

Physical Model for the Evolution of the Genetic Code

Physical Model for the Evolution of the Genetic Code Physcal Model for the Evoluton of the Genetc Code Tatsuro Yamashta Osamu Narkyo Department of Physcs, Kyushu Unversty, Fukuoka 8-856, Japan Abstract We propose a physcal model to descrbe the mechansms

More information

Copy Number Variation Methods and Data

Copy Number Variation Methods and Data Copy Number Varaton Methods and Data Copy number varaton (CNV) Reference Sequence ACCTGCAATGAT TAAGCCCGGG TTGCAACGTTAGGCA Populaton ACCTGCAATGAT TAAGCCCGGG TTGCAACGTTAGGCA ACCTGCAATGAT TTGCAACGTTAGGCA

More information

Parameter Estimates of a Random Regression Test Day Model for First Three Lactation Somatic Cell Scores

Parameter Estimates of a Random Regression Test Day Model for First Three Lactation Somatic Cell Scores Parameter Estmates of a Random Regresson Test Day Model for Frst Three actaton Somatc Cell Scores Z. u, F. Renhardt and R. Reents Unted Datasystems for Anmal Producton (VIT), Hedeweg 1, D-27280 Verden,

More information

Study and Comparison of Various Techniques of Image Edge Detection

Study and Comparison of Various Techniques of Image Edge Detection Gureet Sngh et al Int. Journal of Engneerng Research Applcatons RESEARCH ARTICLE OPEN ACCESS Study Comparson of Varous Technques of Image Edge Detecton Gureet Sngh*, Er. Harnder sngh** *(Department of

More information

Using the Perpendicular Distance to the Nearest Fracture as a Proxy for Conventional Fracture Spacing Measures

Using the Perpendicular Distance to the Nearest Fracture as a Proxy for Conventional Fracture Spacing Measures Usng the Perpendcular Dstance to the Nearest Fracture as a Proxy for Conventonal Fracture Spacng Measures Erc B. Nven and Clayton V. Deutsch Dscrete fracture network smulaton ams to reproduce dstrbutons

More information

310 Int'l Conf. Par. and Dist. Proc. Tech. and Appl. PDPTA'16

310 Int'l Conf. Par. and Dist. Proc. Tech. and Appl. PDPTA'16 310 Int'l Conf. Par. and Dst. Proc. Tech. and Appl. PDPTA'16 Akra Sasatan and Hrosh Ish Graduate School of Informaton and Telecommuncaton Engneerng, Toka Unversty, Mnato, Tokyo, Japan Abstract The end-to-end

More information

Joint Modelling Approaches in diabetes research. Francisco Gude Clinical Epidemiology Unit, Hospital Clínico Universitario de Santiago

Joint Modelling Approaches in diabetes research. Francisco Gude Clinical Epidemiology Unit, Hospital Clínico Universitario de Santiago Jont Modellng Approaches n dabetes research Clncal Epdemology Unt, Hosptal Clínco Unverstaro de Santago Outlne 1 Dabetes 2 Our research 3 Some applcatons Dabetes melltus Is a serous lfe-long health condton

More information

International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS)

International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS) Internatonal Assocaton of Scentfc Innovaton and Research (IASIR (An Assocaton Unfyng the Scences, Engneerng, and Appled Research Internatonal Journal of Emergng Technologes n Computatonal and Appled Scences

More information

AN ENHANCED GAGS BASED MTSVSL LEARNING TECHNIQUE FOR CANCER MOLECULAR PATTERN PREDICTION OF CANCER CLASSIFICATION

AN ENHANCED GAGS BASED MTSVSL LEARNING TECHNIQUE FOR CANCER MOLECULAR PATTERN PREDICTION OF CANCER CLASSIFICATION www.arpapress.com/volumes/vol8issue2/ijrras_8_2_02.pdf AN ENHANCED GAGS BASED MTSVSL LEARNING TECHNIQUE FOR CANCER MOLECULAR PATTERN PREDICTION OF CANCER CLASSIFICATION I. Jule 1 & E. Krubakaran 2 1 Department

More information

Using Past Queries for Resource Selection in Distributed Information Retrieval

Using Past Queries for Resource Selection in Distributed Information Retrieval Purdue Unversty Purdue e-pubs Department of Computer Scence Techncal Reports Department of Computer Scence 2011 Usng Past Queres for Resource Selecton n Dstrbuted Informaton Retreval Sulleyman Cetntas

More information

Gene Selection Based on Mutual Information for the Classification of Multi-class Cancer

Gene Selection Based on Mutual Information for the Classification of Multi-class Cancer Gene Selecton Based on Mutual Informaton for the Classfcaton of Mult-class Cancer Sheng-Bo Guo,, Mchael R. Lyu 3, and Tat-Mng Lok 4 Department of Automaton, Unversty of Scence and Technology of Chna, Hefe,

More information

IMPROVING THE EFFICIENCY OF BIOMARKER IDENTIFICATION USING BIOLOGICAL KNOWLEDGE

IMPROVING THE EFFICIENCY OF BIOMARKER IDENTIFICATION USING BIOLOGICAL KNOWLEDGE IMPROVING THE EFFICIENCY OF BIOMARKER IDENTIFICATION USING BIOLOGICAL KNOWLEDGE JOHN H. PHAN The Wallace H. Coulter Department of Bomedcal Engneerng, Georga Insttute of Technology, 313 Ferst Drve Atlanta,

More information

Survival Rate of Patients of Ovarian Cancer: Rough Set Approach

Survival Rate of Patients of Ovarian Cancer: Rough Set Approach Internatonal OEN ACCESS Journal Of Modern Engneerng esearch (IJME) Survval ate of atents of Ovaran Cancer: ough Set Approach Kamn Agrawal 1, ragat Jan 1 Department of Appled Mathematcs, IET, Indore, Inda

More information

Optimal Planning of Charging Station for Phased Electric Vehicle *

Optimal Planning of Charging Station for Phased Electric Vehicle * Energy and Power Engneerng, 2013, 5, 1393-1397 do:10.4236/epe.2013.54b264 Publshed Onlne July 2013 (http://www.scrp.org/ournal/epe) Optmal Plannng of Chargng Staton for Phased Electrc Vehcle * Yang Gao,

More information

Reconstruction of gene regulatory network of colon cancer using information theoretic approach

Reconstruction of gene regulatory network of colon cancer using information theoretic approach Reconstructon of gene regulatory network of colon cancer usng nformaton theoretc approach Khald Raza #1, Rafat Parveen * # Department of Computer Scence Jama Mlla Islama (Central Unverst, New Delh-11005,

More information

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and Ths artcle appeared n a journal publshed by Elsever. The attached copy s furnshed to the author for nternal non-commercal research and educaton use, ncludng for nstructon at the authors nsttuton and sharng

More information

Investigation of zinc oxide thin film by spectroscopic ellipsometry

Investigation of zinc oxide thin film by spectroscopic ellipsometry VNU Journal of Scence, Mathematcs - Physcs 24 (2008) 16-23 Investgaton of znc oxde thn flm by spectroscopc ellpsometry Nguyen Nang Dnh 1, Tran Quang Trung 2, Le Khac Bnh 2, Nguyen Dang Khoa 2, Vo Th Ma

More information

Modeling the Survival of Retrospective Clinical Data from Prostate Cancer Patients in Komfo Anokye Teaching Hospital, Ghana

Modeling the Survival of Retrospective Clinical Data from Prostate Cancer Patients in Komfo Anokye Teaching Hospital, Ghana Internatonal Journal of Appled Scence and Technology Vol. 5, No. 6; December 2015 Modelng the Survval of Retrospectve Clncal Data from Prostate Cancer Patents n Komfo Anokye Teachng Hosptal, Ghana Asedu-Addo,

More information

CLUSTERING is always popular in modern technology

CLUSTERING is always popular in modern technology Max-Entropy Feed-Forward Clusterng Neural Network Han Xao, Xaoyan Zhu arxv:1506.03623v1 [cs.lg] 11 Jun 2015 Abstract The outputs of non-lnear feed-forward neural network are postve, whch could be treated

More information

Prediction of Total Pressure Drop in Stenotic Coronary Arteries with Their Geometric Parameters

Prediction of Total Pressure Drop in Stenotic Coronary Arteries with Their Geometric Parameters Tenth Internatonal Conference on Computatonal Flud Dynamcs (ICCFD10), Barcelona, Span, July 9-13, 2018 ICCFD10-227 Predcton of Total Pressure Drop n Stenotc Coronary Arteres wth Ther Geometrc Parameters

More information

A MIXTURE OF EXPERTS FOR CATARACT DIAGNOSIS IN HOSPITAL SCREENING DATA

A MIXTURE OF EXPERTS FOR CATARACT DIAGNOSIS IN HOSPITAL SCREENING DATA Journal of Theoretcal and Appled Informaton Technology 2005 ongong JATIT & LLS ISSN: 1992-8645 www.jatt.org E-ISSN: 1817-3195 A MIXTURE OF EXPERTS FOR CATARACT DIAGNOSIS IN HOSPITAL SCREENING DATA 1 SUNGMIN

More information

Statistically Weighted Voting Analysis of Microarrays for Molecular Pattern Selection and Discovery Cancer Genotypes

Statistically Weighted Voting Analysis of Microarrays for Molecular Pattern Selection and Discovery Cancer Genotypes IJCSNS Internatonal Journal of Computer Scence and Network Securty, VOL.6 No.2, December 26 73 Statstcally Weghted Votng Analyss of Mcroarrays for Molecular Pattern Selecton and Dscovery Cancer Genotypes

More information

ARTICLE IN PRESS Neuropsychologia xxx (2010) xxx xxx

ARTICLE IN PRESS Neuropsychologia xxx (2010) xxx xxx Neuropsychologa xxx (200) xxx xxx Contents lsts avalable at ScenceDrect Neuropsychologa journal homepage: www.elsever.com/locate/neuropsychologa Storage and bndng of object features n vsual workng memory

More information

Computing and Using Reputations for Internet Ratings

Computing and Using Reputations for Internet Ratings Computng and Usng Reputatons for Internet Ratngs Mao Chen Department of Computer Scence Prnceton Unversty Prnceton, J 8 (69)-8-797 maoch@cs.prnceton.edu Jaswnder Pal Sngh Department of Computer Scence

More information

A Novel artifact for evaluating accuracies of gear profile and pitch measurements of gear measuring instruments

A Novel artifact for evaluating accuracies of gear profile and pitch measurements of gear measuring instruments A Novel artfact for evaluatng accuraces of gear profle and ptch measurements of gear measurng nstruments Sonko Osawa, Osamu Sato, Yohan Kondo, Toshyuk Takatsuj (NMIJ/AIST) Masaharu Komor (Kyoto Unversty)

More information

Subject-Adaptive Real-Time Sleep Stage Classification Based on Conditional Random Field

Subject-Adaptive Real-Time Sleep Stage Classification Based on Conditional Random Field Subject-Adaptve Real-Tme Sleep Stage Classfcaton Based on Condtonal Random Feld Gang Luo, PhD, Wanl Mn, PhD IBM TJ Watson Research Center, Hawthorne, NY {luog, wanlmn}@usbmcom Abstract Sleep stagng s the

More information

The Limits of Individual Identification from Sample Allele Frequencies: Theory and Statistical Analysis

The Limits of Individual Identification from Sample Allele Frequencies: Theory and Statistical Analysis The Lmts of Indvdual Identfcaton from Sample Allele Frequences: Theory and Statstcal Analyss Peter M. Vsscher 1 *, Wllam G. Hll 2 1 Queensland Insttute of Medcal Research, Brsbane, Australa, 2 Insttute

More information

A New Machine Learning Algorithm for Breast and Pectoral Muscle Segmentation

A New Machine Learning Algorithm for Breast and Pectoral Muscle Segmentation Avalable onlne www.ejaet.com European Journal of Advances n Engneerng and Technology, 2015, 2(1): 21-29 Research Artcle ISSN: 2394-658X A New Machne Learnng Algorthm for Breast and Pectoral Muscle Segmentaton

More information

Lymphoma Cancer Classification Using Genetic Programming with SNR Features

Lymphoma Cancer Classification Using Genetic Programming with SNR Features Lymphoma Cancer Classfcaton Usng Genetc Programmng wth SNR Features Jn-Hyuk Hong and Sung-Bae Cho Dept. of Computer Scence, Yonse Unversty, 134 Shnchon-dong, Sudaemoon-ku, Seoul 120-749, Korea hjnh@candy.yonse.ac.kr,

More information

Balanced Query Methods for Improving OCR-Based Retrieval

Balanced Query Methods for Improving OCR-Based Retrieval Balanced Query Methods for Improvng OCR-Based Retreval Kareem Darwsh Electrcal and Computer Engneerng Dept. Unversty of Maryland, College Park College Park, MD 20742 kareem@glue.umd.edu Douglas W. Oard

More information

Introduction ORIGINAL RESEARCH

Introduction ORIGINAL RESEARCH ORIGINAL RESEARCH Assessng the Statstcal Sgnfcance of the Acheved Classfcaton Error of Classfers Constructed usng Serum Peptde Profles, and a Prescrpton for Random Samplng Repeated Studes for Massve Hgh-Throughput

More information

Project title: Mathematical Models of Fish Populations in Marine Reserves

Project title: Mathematical Models of Fish Populations in Marine Reserves Applcaton for Fundng (Malaspna Research Fund) Date: November 0, 2005 Project ttle: Mathematcal Models of Fsh Populatons n Marne Reserves Dr. Lev V. Idels Unversty College Professor Mathematcs Department

More information

Evaluation of the generalized gamma as a tool for treatment planning optimization

Evaluation of the generalized gamma as a tool for treatment planning optimization Internatonal Journal of Cancer Therapy and Oncology www.jcto.org Evaluaton of the generalzed gamma as a tool for treatment plannng optmzaton Emmanoul I Petrou 1,, Ganesh Narayanasamy 3, Eleftheros Lavdas

More information

Appendix F: The Grant Impact for SBIR Mills

Appendix F: The Grant Impact for SBIR Mills Appendx F: The Grant Impact for SBIR Mlls Asmallsubsetofthefrmsnmydataapplymorethanonce.Ofthe7,436applcant frms, 71% appled only once, and a further 14% appled twce. Wthn my data, seven companes each submtted

More information

A comparison of statistical methods in interrupted time series analysis to estimate an intervention effect

A comparison of statistical methods in interrupted time series analysis to estimate an intervention effect Peer revew stream A comparson of statstcal methods n nterrupted tme seres analyss to estmate an nterventon effect a,b, J.J.J., Walter c, S., Grzebeta a, R. & Olver b, J. a Transport and Road Safety, Unversty

More information

Appendix for. Institutions and Behavior: Experimental Evidence on the Effects of Democracy

Appendix for. Institutions and Behavior: Experimental Evidence on the Effects of Democracy Appendx for Insttutons and Behavor: Expermental Evdence on the Effects of Democrac 1. Instructons 1.1 Orgnal sessons Welcome You are about to partcpate n a stud on decson-makng, and ou wll be pad for our

More information

Incorrect Beliefs. Overconfidence. Types of Overconfidence. Outline. Overprecision 4/22/2015. Econ 1820: Behavioral Economics Mark Dean Spring 2015

Incorrect Beliefs. Overconfidence. Types of Overconfidence. Outline. Overprecision 4/22/2015. Econ 1820: Behavioral Economics Mark Dean Spring 2015 Incorrect Belefs Overconfdence Econ 1820: Behavoral Economcs Mark Dean Sprng 2015 In objectve EU we assumed that everyone agreed on what the probabltes of dfferent events were In subjectve expected utlty

More information

Estimation for Pavement Performance Curve based on Kyoto Model : A Case Study for Highway in the State of Sao Paulo

Estimation for Pavement Performance Curve based on Kyoto Model : A Case Study for Highway in the State of Sao Paulo Estmaton for Pavement Performance Curve based on Kyoto Model : A Case Study for Kazuya AOKI, PASCO CORPORATION, Yokohama, JAPAN, Emal : kakzo603@pasco.co.jp Octávo de Souza Campos, Publc Servces Regulatory

More information

A Geometric Approach To Fully Automatic Chromosome Segmentation

A Geometric Approach To Fully Automatic Chromosome Segmentation A Geometrc Approach To Fully Automatc Chromosome Segmentaton Shervn Mnaee ECE Department New York Unversty Brooklyn, New York, USA shervn.mnaee@nyu.edu Mehran Fotouh Computer Engneerng Department Sharf

More information

Unobserved Heterogeneity and the Statistical Analysis of Highway Accident Data

Unobserved Heterogeneity and the Statistical Analysis of Highway Accident Data Unobserved Heterogenety and the Statstcal Analyss of Hghway Accdent Data Fred L. Mannerng Professor of Cvl and Envronmental Engneerng Courtesy Department of Economcs Unversty of South Florda 4202 E. Fowler

More information

An Introduction to Modern Measurement Theory

An Introduction to Modern Measurement Theory An Introducton to Modern Measurement Theory Ths tutoral was wrtten as an ntroducton to the bascs of tem response theory (IRT) modelng and ts applcatons to health outcomes measurement for the Natonal Cancer

More information

INTEGRATIVE NETWORK ANALYSIS TO IDENTIFY ABERRANT PATHWAY NETWORKS IN OVARIAN CANCER

INTEGRATIVE NETWORK ANALYSIS TO IDENTIFY ABERRANT PATHWAY NETWORKS IN OVARIAN CANCER INTEGRATIVE NETWORK ANALYSIS TO IDENTIFY ABERRANT PATHWAY NETWORKS IN OVARIAN CANCER LI CHEN 1,2, JIANHUA XUAN 1,*, JINGHUA GU 1, YUE WANG 1, ZHEN ZHANG 2, TIAN LI WANG 2, IE MING SHIH 2 1The Bradley Department

More information

FAST DETECTION OF MASSES IN MAMMOGRAMS WITH DIFFICULT CASE EXCLUSION

FAST DETECTION OF MASSES IN MAMMOGRAMS WITH DIFFICULT CASE EXCLUSION computng@tanet.edu.te.ua www.tanet.edu.te.ua/computng ISSN 727-6209 Internatonal Scentfc Journal of Computng FAST DETECTION OF MASSES IN MAMMOGRAMS WITH DIFFICULT CASE EXCLUSION Gábor Takács ), Béla Patak

More information

Incorporating prior biological knowledge for network-based differential gene expression analysis using differentially weighted graphical LASSO

Incorporating prior biological knowledge for network-based differential gene expression analysis using differentially weighted graphical LASSO Zuo et al. BMC Bonformatcs (2017) 18:99 DOI 10.1186/s12859-017-1515-1 METHODOLOGY ARTICLE Open Access Incorporatng pror bologcal knowledge for network-based dfferental gene expresson analyss usng dfferentally

More information

Biomarker Selection from Gene Expression Data for Tumour Categorization Using Bat Algorithm

Biomarker Selection from Gene Expression Data for Tumour Categorization Using Bat Algorithm Receved: March 20, 2017 401 Bomarker Selecton from Gene Expresson Data for Tumour Categorzaton Usng Bat Algorthm Gunavath Chellamuthu 1 *, Premalatha Kandasamy 2, Svasubramanan Kanagaraj 3 1 School of

More information

Sparse Representation of HCP Grayordinate Data Reveals. Novel Functional Architecture of Cerebral Cortex

Sparse Representation of HCP Grayordinate Data Reveals. Novel Functional Architecture of Cerebral Cortex 1 Sparse Representaton of HCP Grayordnate Data Reveals Novel Functonal Archtecture of Cerebral Cortex X Jang 1, Xang L 1, Jngle Lv 2,1, Tuo Zhang 2,1, Shu Zhang 1, Le Guo 2, Tanmng Lu 1* 1 Cortcal Archtecture

More information

ALMALAUREA WORKING PAPERS no. 9

ALMALAUREA WORKING PAPERS no. 9 Snce 1994 Inter-Unversty Consortum Connectng Unverstes, the Labour Market and Professonals AlmaLaurea Workng Papers ISSN 2239-9453 ALMALAUREA WORKING PAPERS no. 9 September 211 Propensty Score Methods

More information

EVALUATION OF BULK MODULUS AND RING DIAMETER OF SOME TELLURITE GLASS SYSTEMS

EVALUATION OF BULK MODULUS AND RING DIAMETER OF SOME TELLURITE GLASS SYSTEMS Chalcogende Letters Vol. 12, No. 2, February 2015, p. 67-74 EVALUATION OF BULK MODULUS AND RING DIAMETER OF SOME TELLURITE GLASS SYSTEMS R. EL-MALLAWANY a*, M.S. GAAFAR b, N. VEERAIAH c a Physcs Dept.,

More information

A Support Vector Machine Classifier based on Recursive Feature Elimination for Microarray Data in Breast Cancer Characterization. Abstract.

A Support Vector Machine Classifier based on Recursive Feature Elimination for Microarray Data in Breast Cancer Characterization. Abstract. A Support Vector Machne Classfer based on Recursve Feature Elmnaton for Mcroarray Data n Breast Cancer Characterzaton. R.Campann, D. Dongovann, E. Iamper, N. Lanconell, G. Palermo, M. Roffll, A. Rccard

More information

Statistical models for predicting number of involved nodes in breast cancer patients

Statistical models for predicting number of involved nodes in breast cancer patients Vol.2, No.7, 641-651 (2010) do:10.4236/health.2010.27098 Health Statstcal models for predctng number of nvolved nodes n breast cancer patents Alok Kumar Dwved 1 *, Sada Nand Dwved 2, Suryanarayana Deo

More information

The Influence of the Isomerization Reactions on the Soybean Oil Hydrogenation Process

The Influence of the Isomerization Reactions on the Soybean Oil Hydrogenation Process Unversty of Belgrade From the SelectedWorks of Zeljko D Cupc 2000 The Influence of the Isomerzaton Reactons on the Soybean Ol Hydrogenaton Process Zeljko D Cupc, Insttute of Chemstry, Technology and Metallurgy

More information

Modeling Multi Layer Feed-forward Neural. Network Model on the Influence of Hypertension. and Diabetes Mellitus on Family History of

Modeling Multi Layer Feed-forward Neural. Network Model on the Influence of Hypertension. and Diabetes Mellitus on Family History of Appled Mathematcal Scences, Vol. 7, 2013, no. 41, 2047-2053 HIKARI Ltd, www.m-hkar.com Modelng Mult Layer Feed-forward Neural Network Model on the Influence of Hypertenson and Dabetes Melltus on Famly

More information

Natural Image Denoising: Optimality and Inherent Bounds

Natural Image Denoising: Optimality and Inherent Bounds atural Image Denosng: Optmalty and Inherent Bounds Anat Levn and Boaz adler Department of Computer Scence and Appled Math The Wezmann Insttute of Scence Abstract The goal of natural mage denosng s to estmate

More information

N-back Training Task Performance: Analysis and Model

N-back Training Task Performance: Analysis and Model N-back Tranng Task Performance: Analyss and Model J. Isaah Harbson (jharb@umd.edu) Center for Advanced Study of Language and Department of Psychology, Unversty of Maryland 7005 52 nd Avenue, College Park,

More information

NUMERICAL COMPARISONS OF BIOASSAY METHODS IN ESTIMATING LC50 TIANHONG ZHOU

NUMERICAL COMPARISONS OF BIOASSAY METHODS IN ESTIMATING LC50 TIANHONG ZHOU NUMERICAL COMPARISONS OF BIOASSAY METHODS IN ESTIMATING LC50 by TIANHONG ZHOU B.S., Chna Agrcultural Unversty, 2003 M.S., Chna Agrcultural Unversty, 2006 A THESIS submtted n partal fulfllment of the requrements

More information

WHO S ASSESSMENT OF HEALTH CARE INDUSTRY PERFORMANCE: RATING THE RANKINGS

WHO S ASSESSMENT OF HEALTH CARE INDUSTRY PERFORMANCE: RATING THE RANKINGS WHO S ASSESSMENT OF HEALTH CARE INDUSTRY PERFORMANCE: RATING THE RANKINGS ELLIOTT PARKER and JEANNE WENDEL * Department of Economcs, Unversty of Nevada, Reno, NV, USA SUMMARY Ths paper examnes the econometrc

More information

*VALLIAPPAN Raman 1, PUTRA Sumari 2 and MANDAVA Rajeswari 3. George town, Penang 11800, Malaysia. George town, Penang 11800, Malaysia

*VALLIAPPAN Raman 1, PUTRA Sumari 2 and MANDAVA Rajeswari 3. George town, Penang 11800, Malaysia. George town, Penang 11800, Malaysia 38 A Theoretcal Methodology and Prototype Implementaton for Detecton Segmentaton Classfcaton of Dgtal Mammogram Tumor by Machne Learnng and Problem Solvng *VALLIAPPA Raman, PUTRA Sumar 2 and MADAVA Rajeswar

More information

A Linear Regression Model to Detect User Emotion for Touch Input Interactive Systems

A Linear Regression Model to Detect User Emotion for Touch Input Interactive Systems 2015 Internatonal Conference on Affectve Computng and Intellgent Interacton (ACII) A Lnear Regresson Model to Detect User Emoton for Touch Input Interactve Systems Samt Bhattacharya Dept of Computer Scence

More information

A Support Vector Machine Classifier based on Recursive Feature Elimination for Microarray Data in Breast Cancer Characterization. Abstract.

A Support Vector Machine Classifier based on Recursive Feature Elimination for Microarray Data in Breast Cancer Characterization. Abstract. A Support Vector Machne Classfer based on Recursve Feature Elmnaton for Mcroarray Data n Breast Cancer Characterzaton. R.Campann, D. Dongovann, N. Lanconell, G. Palermo, A. Rccard, M. Roffll Dpartmento

More information

The effect of salvage therapy on survival in a longitudinal study with treatment by indication

The effect of salvage therapy on survival in a longitudinal study with treatment by indication Research Artcle Receved 28 October 2009, Accepted 8 June 2010 Publshed onlne 30 August 2010 n Wley Onlne Lbrary (wleyonlnelbrary.com) DOI: 10.1002/sm.4017 The effect of salvage therapy on survval n a longtudnal

More information

Boosting for tumor classification with gene expression data. Seminar für Statistik, ETH Zürich, CH-8092, Switzerland

Boosting for tumor classification with gene expression data. Seminar für Statistik, ETH Zürich, CH-8092, Switzerland BIOINFORMATICS Vol. 19 no. 9 2003, pages 1061 1069 DOI: 10.1093/bonformatcs/btf867 Boostng for tumor classfcaton wth gene expresson data Marcel Dettlng and Peter Bühlmann Semnar für Statstk, ETH Zürch,

More information

Desperation or Desire? The Role of Risk Aversion in Marriage. Christy Spivey, Ph.D. * forthcoming, Economic Inquiry. Abstract

Desperation or Desire? The Role of Risk Aversion in Marriage. Christy Spivey, Ph.D. * forthcoming, Economic Inquiry. Abstract Desperaton or Desre? The Role of Rsk Averson n Marrage Chrsty Spvey, Ph.D. * forthcomng, Economc Inury Abstract Because of the uncertanty nherent n searchng for a spouse and the uncertanty of the future

More information

Evaluation of Literature-based Discovery Systems

Evaluation of Literature-based Discovery Systems Evaluaton of Lterature-based Dscovery Systems Melha Yetsgen-Yldz 1 and Wanda Pratt 1,2 1 The Informaton School, Unversty of Washngton, Seattle, USA. 2 Bomedcal and Health Informatcs, School of Medcne,

More information

Effects of Estrogen Contamination on Human Cells: Modeling and Prediction Based on Michaelis-Menten Kinetics 1

Effects of Estrogen Contamination on Human Cells: Modeling and Prediction Based on Michaelis-Menten Kinetics 1 J. Water Resource and Protecton, 009,, 6- do:0.6/warp.009.500 Publshed Onlne ovember 009 (http://www.scrp.org/ournal/warp) Effects of Estrogen Contamnaton on Human Cells: Modelng and Predcton Based on

More information

An Approach to Discover Dependencies between Service Operations*

An Approach to Discover Dependencies between Service Operations* 36 JOURNAL OF SOFTWARE VOL. 3 NO. 9 DECEMBER 2008 An Approach to Dscover Dependences between Servce Operatons* Shuyng Yan Research Center for Grd and Servce Computng Insttute of Computng Technology Chnese

More information

Price linkages in value chains: methodology

Price linkages in value chains: methodology Prce lnkages n value chans: methodology Prof. Trond Bjorndal, CEMARE. Unversty of Portsmouth, UK. and Prof. José Fernández-Polanco Unversty of Cantabra, Span. FAO INFOSAMAK Tangers, Morocco 14 March 2012

More information

Non-linear Multiple-Cue Judgment Tasks

Non-linear Multiple-Cue Judgment Tasks Non-lnear Multple-Cue Tasks Anna-Carn Olsson (anna-carn.olsson@psy.umu.se) Department of Psychology, Umeå Unversty SE-09 87, Umeå, Sweden Tommy Enqvst (tommy.enqvst@psyk.uu.se) Department of Psychology,

More information

Using a Wavelet Representation for Classification of Movement in Bed

Using a Wavelet Representation for Classification of Movement in Bed Usng a Wavelet Representaton for Classfcaton of Movement n Bed Adrana Morell Adam Depto. de Matemátca e Estatístca Unversdade de Caxas do Sul Caxas do Sul RS E-mal: amorell@ucs.br André Gustavo Adam Depto.

More information

Encoding processes, in memory scanning tasks

Encoding processes, in memory scanning tasks vlemory & Cognton 1976,4 (5), 501 506 Encodng processes, n memory scannng tasks JEFFREY O. MILLER and ROBERT G. PACHELLA Unversty of Mchgan, Ann Arbor, Mchgan 48101, Three experments are presented that

More information

Alma Mater Studiorum Università di Bologna DOTTORATO DI RICERCA IN METODOLOGIA STATISTICA PER LA RICERCA SCIENTIFICA

Alma Mater Studiorum Università di Bologna DOTTORATO DI RICERCA IN METODOLOGIA STATISTICA PER LA RICERCA SCIENTIFICA Alma Mater Studorum Unverstà d Bologna DOTTORATO DI RICERCA IN METODOLOGIA STATISTICA PER LA RICERCA SCIENTIFICA Cclo XXVII Settore Concorsuale d afferenza: 13/D1 Settore Scentfco dscplnare: SECS-S/02

More information

Economic crisis and follow-up of the conditions that define metabolic syndrome in a cohort of Catalonia,

Economic crisis and follow-up of the conditions that define metabolic syndrome in a cohort of Catalonia, Economc crss and follow-up of the condtons that defne metabolc syndrome n a cohort of Catalona, 2005-2012 Laa Maynou 1,2,3, Joan Gl 4, Gabrel Coll-de-Tuero 5,2, Ton Mora 6, Carme Saurna 1,2, Anton Scras

More information

Active Affective State Detection and User Assistance with Dynamic Bayesian Networks. Xiangyang Li, Qiang Ji

Active Affective State Detection and User Assistance with Dynamic Bayesian Networks. Xiangyang Li, Qiang Ji Actve Affectve State Detecton and User Assstance wth Dynamc Bayesan Networks Xangyang L, Qang J Electrcal, Computer, and Systems Engneerng Department Rensselaer Polytechnc Insttute, 110 8th Street, Troy,

More information

Fast Algorithm for Vectorcardiogram and Interbeat Intervals Analysis: Application for Premature Ventricular Contractions Classification

Fast Algorithm for Vectorcardiogram and Interbeat Intervals Analysis: Application for Premature Ventricular Contractions Classification Fast Algorthm for Vectorcardogram and Interbeat Intervals Analyss: Applcaton for Premature Ventrcular Contractons Classfcaton Irena Jekova, Vessela Krasteva Centre of Bomedcal Engneerng Prof. Ivan Daskalov

More information

BINNING SOMATIC MUTATIONS BASED ON BIOLOGICAL KNOWLEDGE FOR PREDICTING SURVIVAL: AN APPLICATION IN RENAL CELL CARCINOMA

BINNING SOMATIC MUTATIONS BASED ON BIOLOGICAL KNOWLEDGE FOR PREDICTING SURVIVAL: AN APPLICATION IN RENAL CELL CARCINOMA BINNING SOMATIC MUTATIONS BASED ON BIOLOGICAL KNOWLEDGE FOR PREDICTING SURVIVAL: AN APPLICATION IN RENAL CELL CARCINOMA DOKYOON KIM, RUOWANG LI, SCOTT M. DUDEK, JOHN R. WALLACE, MARYLYN D. RITCHIE Center

More information

AUTOMATED CHARACTERIZATION OF ESOPHAGEAL AND SEVERELY INJURED VOICES BY MEANS OF ACOUSTIC PARAMETERS

AUTOMATED CHARACTERIZATION OF ESOPHAGEAL AND SEVERELY INJURED VOICES BY MEANS OF ACOUSTIC PARAMETERS AUTOMATED CHARACTERIZATIO OF ESOPHAGEAL AD SEVERELY IJURED VOICES BY MEAS OF ACOUSTIC PARAMETERS B. García, I. Ruz, A. Méndez, J. Vcente, and M. Mendezona Department of Telecommuncaton, Unversty of Deusto

More information

THE NATURAL HISTORY AND THE EFFECT OF PIVMECILLINAM IN LOWER URINARY TRACT INFECTION.

THE NATURAL HISTORY AND THE EFFECT OF PIVMECILLINAM IN LOWER URINARY TRACT INFECTION. MET9401 SE 10May 2000 Page 13 of 154 2 SYNOPSS MET9401 SE THE NATURAL HSTORY AND THE EFFECT OF PVMECLLNAM N LOWER URNARY TRACT NFECTON. L A study of the natural hstory and the treatment effect wth pvmecllnam

More information

SMALL AREA CLUSTERING OF CASES OF PNEUMOCOCCAL BACTEREMIA.

SMALL AREA CLUSTERING OF CASES OF PNEUMOCOCCAL BACTEREMIA. SMALL AREA CLUSTERING OF CASES OF PNEUMOCOCCAL BACTEREMIA. JP Metlay, MD, PhD T Smth, PhD N Kozum, PhD C Branas, PhD E Lautenbach, MD NO Fshman, MD PH Edelsten, MD Center for Health Equty Research and

More information

Estimating the distribution of the window period for recent HIV infections: A comparison of statistical methods

Estimating the distribution of the window period for recent HIV infections: A comparison of statistical methods Research Artcle Receved 30 September 2009, Accepted 15 March 2010 Publshed onlne n Wley Onlne Lbrary (wleyonlnelbrary.com) DOI: 10.1002/sm.3941 Estmatng the dstrbuton of the wndow perod for recent HIV

More information

We analyze the effect of tumor repopulation on optimal dose delivery in radiation therapy. We are primarily

We analyze the effect of tumor repopulation on optimal dose delivery in radiation therapy. We are primarily INFORMS Journal on Computng Vol. 27, No. 4, Fall 215, pp. 788 83 ISSN 191-9856 (prnt) ó ISSN 1526-5528 (onlne) http://dx.do.org/1.1287/joc.215.659 215 INFORMS Optmzaton of Radaton Therapy Fractonaton Schedules

More information

Journal of Engineering Science and Technology Review 11 (2) (2018) Research Article

Journal of Engineering Science and Technology Review 11 (2) (2018) Research Article Jestr Journal of Engneerng Scence and Technology Revew () (08) 5 - Research Artcle Prognoss Evaluaton of Ovaran Granulosa Cell Tumor Based on Co-forest ntellgence Model Xn Lao Xn Zheng Juan Zou Mn Feng

More information

A Classification Model for Imbalanced Medical Data based on PCA and Farther Distance based Synthetic Minority Oversampling Technique

A Classification Model for Imbalanced Medical Data based on PCA and Farther Distance based Synthetic Minority Oversampling Technique A Classfcaton Model for Imbalanced Medcal Data based on PCA and Farther Dstance based Synthetc Mnorty Oversamplng Technque NADIR MUSTAFA School of Computer Scence and Engneerng Unversty of Electronc Scence

More information

What Determines Attitude Improvements? Does Religiosity Help?

What Determines Attitude Improvements? Does Religiosity Help? Internatonal Journal of Busness and Socal Scence Vol. 4 No. 9; August 2013 What Determnes Atttude Improvements? Does Relgosty Help? Madhu S. Mohanty Calforna State Unversty-Los Angeles Los Angeles, 5151

More information

THE NORMAL DISTRIBUTION AND Z-SCORES COMMON CORE ALGEBRA II

THE NORMAL DISTRIBUTION AND Z-SCORES COMMON CORE ALGEBRA II Name: Date: THE NORMAL DISTRIBUTION AND Z-SCORES COMMON CORE ALGEBRA II The normal dstrbuton can be used n ncrements other than half-standard devatons. In fact, we can use ether our calculators or tables

More information

Resampling Methods for the Area Under the ROC Curve

Resampling Methods for the Area Under the ROC Curve Resamplng ethods for the Area Under the ROC Curve Andry I. Bandos AB6@PITT.EDU Howard E. Rockette HERBST@PITT.EDU Department of Bostatstcs, Graduate School of Publc Health, Unversty of Pttsburgh, Pttsburgh,

More information

(From the Gastroenterology Division, Cornell University Medical College, New York 10021)

(From the Gastroenterology Division, Cornell University Medical College, New York 10021) ROLE OF HEPATIC ANION-BINDING PROTEIN IN BROMSULPHTHALEIN CONJUGATION* BY N. KAPLOWITZ, I. W. PERC -ROBB,~ ANn N. B. JAVITT (From the Gastroenterology Dvson, Cornell Unversty Medcal College, New York 10021)

More information

Insights in Genetics and Genomics

Insights in Genetics and Genomics Insghts n Genetcs and Genomcs Research Artcle Open Access New Score Tests for Equalty of Varances n the Applcaton of DNA Methylaton Data Analyss [Verson ] Welang Qu Xuan L Jarrett Morrow Dawn L DeMeo Scott

More information

INITIAL ANALYSIS OF AWS-OBSERVED TEMPERATURE

INITIAL ANALYSIS OF AWS-OBSERVED TEMPERATURE INITIAL ANALYSIS OF AWS-OBSERVED TEMPERATURE Wang Yng, Lu Xaonng, Ren Zhhua, Natonal Meteorologcal Informaton Center, Bejng, Chna Tel.:+86 684755, E-mal:cdcsjk@cma.gov.cn Abstract From, n Chna meteorologcal

More information

Submitted for Presentation 94th Annual Meeting of the Transportation Research Board January 11-15, 2015, Washington, D.C.

Submitted for Presentation 94th Annual Meeting of the Transportation Research Board January 11-15, 2015, Washington, D.C. Wegh-In-Moton Staton Montorng and Calbraton usng Inductve Loop Sgnature Technology Shn-Tng (Cndy) Jeng (Correspondng Author) CLR Analytcs Inc 8885 Research Drve Sute 15 Irvne, CA 92618 Tel: 949-864-6696,

More information

Nonstandard Machine Learning Algorithms for Microarray Data Mining. Byoung-Tak Zhang

Nonstandard Machine Learning Algorithms for Microarray Data Mining. Byoung-Tak Zhang Nonstandard Machne Learnng Algorthms for Mcroarray Data Mnng Byoung-Tak Zhang Center for Bonformaton Technology (CBIT) & Bontellgence Laboratory School of Computer Scence and Engneerng Seoul Natonal Unversty

More information

HIV/AIDS-related Expectations and Risky Sexual Behavior in Malawi

HIV/AIDS-related Expectations and Risky Sexual Behavior in Malawi Unversty of Pennsylvana ScholarlyCommons PSC Workng Paper Seres 7-29-20 HIV/AIDS-related Expectatons and Rsky Sexual Behavor n Malaw Adelne Delavande RAND Corporaton, Nova School of Busness and Economcs

More information

AUTOMATED DETECTION OF HARD EXUDATES IN FUNDUS IMAGES USING IMPROVED OTSU THRESHOLDING AND SVM

AUTOMATED DETECTION OF HARD EXUDATES IN FUNDUS IMAGES USING IMPROVED OTSU THRESHOLDING AND SVM AUTOMATED DETECTION OF HARD EXUDATES IN FUNDUS IMAGES USING IMPROVED OTSU THRESHOLDING AND SVM Wewe Gao 1 and Jng Zuo 2 1 College of Mechancal Engneerng, Shangha Unversty of Engneerng Scence, Shangha,

More information

Estimation of Relative Survival Based on Cancer Registry Data

Estimation of Relative Survival Based on Cancer Registry Data Revew of Bonformatcs and Bometrcs (RBB) Volume 2 Issue 4, December 203 www.sepub.org/rbb Estmaton of Relatve Based on Cancer Regstry Data Olaf Schoffer *, Ante Nedostate 2, Stefane J. Klug,2 Cancer Epdemology,

More information

Evasion of tumours from the control of the immune system: consequences of brief encounters

Evasion of tumours from the control of the immune system: consequences of brief encounters Al-Tameem et al. Bology Drect 22, 7:3 RESEARCH Open Access Evason of tumours from the control of the mmune system: consequences of bref encounters Mohannad Al-Tameem, Mark Chaplan * and Alberto d Onofro

More information

A GEOGRAPHICAL AND STATISTICAL ANALYSIS OF LEUKEMIA DEATHS RELATING TO NUCLEAR POWER PLANTS. Whitney Thompson, Sarah McGinnis, Darius McDaniel,

A GEOGRAPHICAL AND STATISTICAL ANALYSIS OF LEUKEMIA DEATHS RELATING TO NUCLEAR POWER PLANTS. Whitney Thompson, Sarah McGinnis, Darius McDaniel, A GEOGRAPHICAL AD STATISTICAL AALYSIS OF LEUKEMIA DEATHS RELATIG TO UCLEAR POWER PLATS Whtney Thompson, Sarah McGnns, Darus McDanel, Jean Sexton, Rebecca Pettt, Sarah Anderson, Monca Jackson ABSTRACT:

More information

Towards Automated Pose Invariant 3D Dental Biometrics

Towards Automated Pose Invariant 3D Dental Biometrics Towards Automated Pose Invarant 3D Dental Bometrcs Xn ZHONG 1, Depng YU 1, Kelvn W C FOONG, Terence SIM 3, Yoke San WONG 1 and Ho-lun CHENG 3 1. Mechancal Engneerng, Natonal Unversty of Sngapore, 117576,

More information

National Polyp Study data: evidence for regression of adenomas

National Polyp Study data: evidence for regression of adenomas 5 Natonal Polyp Study data: evdence for regresson of adenomas 78 Chapter 5 Abstract Objectves The data of the Natonal Polyp Study, a large longtudnal study on survellance of adenoma patents, s used for

More information

Feature Selection for Predicting Tumor Metastases in Microarray Experiments using Paired Design

Feature Selection for Predicting Tumor Metastases in Microarray Experiments using Paired Design Feature Selecton for Predctng Tumor Metastases n Mcroarray Experments usng Pared Desgn Qhua Tan 1,2, Mads Thomassen 1 and Torben A. Kruse 1 ORIGINAL RESEARCH 1 Department of Bochemstry, Pharmacology and

More information

RENAL FUNCTION AND ACE INHIBITORS IN RENAL ARTERY STENOSISA/adbon et al. 651

RENAL FUNCTION AND ACE INHIBITORS IN RENAL ARTERY STENOSISA/adbon et al. 651 Downloaded from http://ahajournals.org by on January, 209 RENAL FUNCTION AND INHIBITORS IN RENAL ARTERY STENOSISA/adbon et al. 65 Downloaded from http://ahajournals.org by on January, 209 Patents and Methods

More information

THIS IS AN OFFICIAL NH DHHS HEALTH ALERT

THIS IS AN OFFICIAL NH DHHS HEALTH ALERT THIS IS AN OFFICIAL NH DHHS HEALTH ALERT Dstrbuted by the NH Health Alert Network Health.Alert@dhhs.nh.gov August 26, 2016 1430 EDT (2:30 PM EDT) NH-HAN 20160826 Recommendatons for Accurate Dagnoss of

More information

STAGE-STRUCTURED POPULATION DYNAMICS OF AEDES AEGYPTI

STAGE-STRUCTURED POPULATION DYNAMICS OF AEDES AEGYPTI Internatonal Conference Mathematcal and Computatonal Bology 211 Internatonal Journal of Modern Physcs: Conference Seres Vol. 9 (212) 364 372 World Scentfc Publshng Company DOI: 1.1142/S21194512543 STAGE-STRUCTURED

More information