This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and education use, including for instruction at the authors institution and sharing with colleagues. Other uses, including reproduction and distribution, or selling or licensing copies, or posting to personal, institutional or third party websites are prohibited. In most cases authors are permitted to post their version of the article (e.g. in Word or Tex form) to their personal website or institutional repository. Authors requiring further information regarding Elsevier's archiving and manuscript policies are encouraged to visit:

Computational Statistics and Data Analysis 53 (2009)

Contents lists available at ScienceDirect

Computational Statistics and Data Analysis

journal homepage:

Partition clustering of high dimensional low sample size data based on p-values

George von Borries a, Haiyan Wang b

a Departamento de Estatística, IE, Universidade de Brasília, , DF, Brazil
b Department of Statistics, Kansas State University, , KS, USA

Corresponding author. E-mail addresses: gborries@unb.br (G. von Borries), hwang@ksu.edu (H. Wang).
/$ see front matter. Published by Elsevier B.V. doi: /j.csda

Article info

Article history: Received 7 February 2009. Received in revised form 26 April 2009. Accepted 19 June 2009. Available online 26 June 2009.

Abstract

Clustering techniques play an important role in analyzing high dimensional data that is common in high-throughput screening such as microarray and mass spectrometry data. Effective use of the high dimensionality and some replications can help to increase clustering accuracy and stability. In this article a new partitioning algorithm with a robust distance measure is introduced to cluster variables in high dimensional low sample size (HDLSS) data that contain a large number of independent variables with a small number of replications per variable. The proposed clustering algorithm, PPCLUST, considers data from a mixture distribution and uses p-values from nonparametric rank tests of homogeneous distribution as a measure of similarity to separate the mixture components. PPCLUST is able to efficiently cluster a large number of variables in the presence of very few replications. Inheriting the robustness of the rank procedure, the new algorithm is robust to outliers and invariant to monotone transformations of data. Numerical studies and an application to microarray gene expression data for a colorectal cancer study are discussed.

Published by Elsevier B.V.

1. Introduction

Mining in high dimensional low sample size (HDLSS) data is an active research topic due to the advance in data collection technologies that allow information to be obtained from a large number of variables (for example, genes, proteins) at the same time. In contrast to the plentiful replications demanded by traditional methods, the number of replications for such data is often limited due to time or cost constraints. For example, a medium-sized microarray study often contains information from thousands of genes with no more than a hundred samples for each gene. An important task is to investigate and identify disease response genes using the post-genome data. This can provide targets for drug development in public health and give the focus for genetic alteration to yield disease resistant crops.

Statistical methods for such purposes fall mainly into three categories. One category analyzes individual genes and then applies false discovery rate (FDR) control (Benjamini and Hochberg, 1995; Efron, 2007) to adjust for multiple comparison issues. A large volume of work in the literature falls in this category. Even though FDR is meant to improve the identification of true positives, it still leads to conservative results in genomic applications (Storey and Tibshirani, 2003). This is especially true in the case of small sample sizes, since test statistics calculated from few replications lack power for nonparametric methods and are sensitive to deviations from assumptions for parametric methods. As a result, when only a small amount of useful information exists among a large amount of noise, the limitations of these methods prevail. A second category of methods is referred to as gene set enrichment, which considers a set of genes selected based on biological knowledge from pathway information or literature mining to increase power (Subramanian et al., 2005; Efron and Tibshirani, 2007). Unfortunately, pathway or gene ontology information is not known for all genomes and so gene set enrichment methods may not be applicable. A third category is through clustering to identify groups of differentially expressed genes (Fraley, 1998; Alon et al., 1999; Notterman et al., 2001; Yeung and Ruzzo, 2001; Jiang et al., 2004; Huttenhower et al., 2007; Fu and Medico, 2007). Clustering based methods are more flexible. However, non-probabilistic distance measures and the corresponding clusters obtained can lead to difficulty in interpretation. Further, most algorithms are sensitive to monotone transformations and produce different results when applied to different transformations of data. In addition to the above mentioned problems, most available methods require a user to pre-specify the number of clusters. This is difficult and could produce misleading results when an incorrect number of clusters is specified. Mixture model based clustering (MCLUST) developed by Fraley and Raftery (2006) can automatically estimate the number of clusters using the Bayesian Information Criterion. However, this algorithm relies heavily on the normality assumption and may produce poor clustering accuracy when the data are heavily skewed. Further, as pointed out by the authors, MCLUST is not recommended for direct application to HDLSS data due to its dependence on covariance matrix estimation.

Table 1
High dimensional data layout, where $a \to \infty$ and $n_i \ge 2$.

Factor level   Distribution   Observations                          Sample size
1              $F_1(x)$       $X_{11}, X_{12}, \ldots, X_{1n_1}$    $n_1$
2              $F_2(x)$       $X_{21}, X_{22}, \ldots, X_{2n_2}$    $n_2$
...            ...            ...                                   ...
a              $F_a(x)$       $X_{a1}, X_{a2}, \ldots, X_{an_a}$    $n_a$

We propose to approach the problem by combining the clustering and gene set enrichment ideas without having to rely on known biological information. Specifically, we assume that at least two replications are available for each variable (gene) to start with. All the variables and their observations together can be viewed as originating from high dimensional mixtures of distributions, where each unique distribution defines a cluster. We then introduce a new partitional algorithm using a robust measure of similarity to cluster the large number of variables.
The robust similarity measure evolves from p-values obtained from the rank test of no nonparametric effect of groups (Wang and Akritas, 2004), specially developed for the HDLSS structure. The new algorithm can automatically determine the number of clusters and is invariant to monotone transformations of data. Numerical studies show that the proposed algorithm has high clustering accuracy and stability. Additionally, the algorithm is fast and does not show the memory allocation problems observed in some algorithms when the number of variables in the study is very high ([…] or more variables).

2. Review of the nonparametric test for homogeneous distribution

Suppose we have observations from a mixture of unknown distributions. Let a cluster be all the observations generated from the same distribution. Differences among clusters can be reflected in many ways, such as different mean values or different variances. In this article, the problem of clustering observations is treated as a problem of detecting a significant difference among the distributions that generated the observations. Let $X_{ij}$ denote the $j$th observation from the $i$th variable (or factor), where $\{X_{ij}, 1 \le j \le n_i\}$ are independent observations from some unknown distribution $F_i(x)$, $i = 1, 2, \ldots, a$. The observed data can be viewed as a matrix with elements $X_{ij}$. Each row represents the level of a factor, and each column represents an observation (replication), as shown in Table 1. We first test whether these observations are from the same distribution, i.e., we test the hypothesis

$$H_0 : F_1(x) = \cdots = F_a(x). \qquad (1)$$

The Kruskal-Wallis test can be used when the number of distributions is small. However, that test is not valid in a high dimensional setting, since its inference is based on a large sample size and a small number of distributions. We also do not recommend the traditional ANOVA F-test, as the error terms in the ANOVA model need to be i.i.d. Gaussian with a constant variance.
Akritas and Arnold (2000) showed that the ANOVA F-test is robust to departure from homoscedasticity when there are a large number of factors, but it is not asymptotically valid for unbalanced data with small sample sizes even under homoscedasticity. Later, Akritas and Papadatos (2004) considered test procedures for unbalanced and/or heteroscedastic situations when the number of factors tends to infinity. However, their tests are based on original observations that are not invariant to monotone transformation of data. To overcome all these limitations, Wang and Akritas (2004) considered a nonparametric rank test of the null hypothesis of equality of distribution functions for each factor level when the number of factors is large and the number of replications is either small (referred to as HDLSS data here) or large. We use the p-value from testing the hypothesis in (1) using the test statistic in Wang and Akritas (2004) as the measure of similarity among groups.

Let $R_{ij}$ represent the (mid-)rank of observation $X_{ij}$ in the set of all $n_1 + n_2 + \cdots + n_a$ observations. Then under $H_0$, all observations are i.i.d. realizations of a common distribution. So these (mid-)ranks are discrete uniformly distributed random numbers between 1 and $\sum_{i=1}^{a} n_i$ for continuous data. Let $\bar{R}_{i\cdot} = n_i^{-1} \sum_{j=1}^{n_i} R_{ij}$ be the mean rank of observations for the $i$th factor level and $\bar{R}_{\cdot\cdot} = a^{-1} \sum_{i=1}^{a} \bar{R}_{i\cdot}$ be the overall unweighted mean of ranks from all factor levels. Define the test statistic

$$F_R = \frac{MST_R}{MSE_R}, \qquad (2)$$

where $MST_R$ is the unweighted mean square error due to factor levels calculated over (mid-)ranks:

$$MST_R = \frac{1}{a-1} \sum_{i=1}^{a} \left(\bar{R}_{i\cdot} - \bar{R}_{\cdot\cdot}\right)^2, \qquad (3)$$

and $MSE_R$ is the pooled estimate of the sample variance, also obtained over (mid-)ranks:

$$MSE_R = \frac{1}{a} \sum_{i=1}^{a} \frac{1}{n_i} S_{R,i}^2, \qquad (4)$$

where $S_{R,i}^2$ is the sample variance calculated using (mid-)ranks of observations from the $i$th factor level. The asymptotic distribution of $\sqrt{a}(F_R - 1)$ under $H_0$, as $a \to \infty$, is given in Wang and Akritas (2004). For convenience of further discussion, we restate the theorem below.

Theorem 1. Let $F_i(x)$ be arbitrary cumulative distribution functions and $H(x) = \left(\sum_{i=1}^{a} n_i\right)^{-1} \sum_{i=1}^{a} n_i F_i(x)$ be the average cumulative distribution function. Assume that the observations are independent. Define $\sigma_i^2 = \mathrm{Var}(H(X_{ij}))$ and

$$\nu^2 = \frac{1}{a} \sum_{i=1}^{a} \frac{\sigma_i^2}{n_i} > 0, \qquad \tau^2 = \frac{1}{a} \sum_{i=1}^{a} \frac{2\sigma_i^4}{n_i(n_i - 1)}. \qquad (5)$$

Then under $H_0 : F_1(x) = \cdots = F_a(x)$, the limit of $\tau^2/\nu^4$ exists as $a \to \infty$. Further,

$$\sqrt{a}(F_R - 1) \xrightarrow{d} N\left(0, \lim_{a \to \infty} \tau^2/\nu^4\right), \qquad (6)$$

regardless of whether the $n_i$ stay bounded or go to $\infty$, provided that $\max_i\{n_i\}/\min_i\{n_i\} = O(1)$ for $n_i \ge 2$.

The statistic $\sqrt{a}(F_R - 1)$ compared to the normal critical values can be used to obtain an approximate p-value that gives sample evidence on the homogeneity of the distributions. A large p-value indicates that the given sample does not provide evidence to conclude that the factor levels being tested have different distributions. In such a case, we cluster these factor levels into the same group. In contrast, a small p-value gives evidence against $H_0$, indicating that at least two distributions are different.

The use of the hypothesis testing results from (6) to obtain a similarity measure allows flexible modeling and robust clustering at the same time. With this general setup, the data collected can be balanced or unbalanced and the user does not have to worry about normality or skewness of the data. Heteroscedastic variances are naturally incorporated. This is important as gene regulations are very complicated and the variations of the expression data from different genes can be dramatically different.
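As a quick check on the limiting variance in Theorem 1, consider the balanced case. This is a worked special case added for illustration, not part of the restated theorem. It assumes that (5) reads $\nu^2 = a^{-1}\sum_i \sigma_i^2/n_i$ and $\tau^2 = a^{-1}\sum_i 2\sigma_i^4/\{n_i(n_i-1)\}$, that every factor level has $n_i = n$ replications, and that $\sigma_i = \sigma$ for all $i$, as is the case under $H_0$:

```latex
\nu^2 = \frac{\sigma^2}{n}, \qquad
\tau^2 = \frac{2\sigma^4}{n(n-1)}, \qquad
\frac{\tau^2}{\nu^4}
  = \frac{2\sigma^4}{n(n-1)} \cdot \frac{n^2}{\sigma^4}
  = \frac{2n}{n-1}.
```

Under these assumptions the unknown $\sigma$ cancels, and the limiting variance of $\sqrt{a}(F_R - 1)$ is 4 for $n = 2$ and $8/3$ for $n = 4$.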
In addition, the results hold for small or large sample sizes. In particular, allowing reliable inference with sample sizes as small as two can lead to a significant reduction in cost, considering the number of arrays required. Before we apply the results of (6) in clustering, we first evaluate its performance. The estimated type I error and power were not studied in Wang and Akritas (2004). We report our simulation results in the next section.

2.1. Type I error and power estimate when the number of variables is large

Table 2 reports the Type I error estimate using the asymptotic distribution of the test statistic in (6) at significance levels 0.10, 0.05 and […]. For the performance of other nonparametric tests in such a setting, see Akritas and Papadatos (2004). In the simulations the number of random variables, $a$, takes values 1000, 2000 and 4000, and the number of observations per variable is set to 4. The simulations are based on 2000 runs, and observations were generated from normal, lognormal, exponential, and Cauchy distributions. The jackknife bias corrected estimator (Pawitan, 2001) of $\sigma_i^4$ was used in the estimation of the asymptotic variances. The Type I error rates reported in Table 2 are close to the true α levels, indicating that the test statistic $\sqrt{a}(F_R - 1)$ performs well in testing the hypothesis in (1) regardless of whether the distribution is symmetric (normal), skewed (lognormal, exponential), or heavy tailed (Cauchy).

To study the power of the test described in Section 2, we generated data for 2000 random variables from mixture distributions with four observations per variable. Normal, lognormal, exponential, and Cauchy distributions are considered to evaluate the robustness of the test. For all cases except the exponential distribution, observations for 95% of the variables are generated from the distribution with location parameter 0 and scale parameter 1, and the remaining 5% of the random variables have location parameter $d$ ranging from 0 to […]. The achieved power at significance level α = 0.05 is given in Fig. 1.
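The test and the Type I error check described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the p-value is taken from the upper tail of the normal limit, and the asymptotic variance uses a plug-in that assumes a common $\sigma_i$ under $H_0$ (so that $\sigma$ cancels) in place of the jackknife estimator used in the paper. The run count and number of variables are kept small for speed.

```python
import math
import random
from bisect import bisect_left, bisect_right


def midranks(values):
    """Mid-ranks of all values (tied values share the average rank)."""
    s = sorted(values)
    return [(bisect_left(s, v) + bisect_right(s, v) + 1) / 2.0 for v in values]


def rank_test_pvalue(rows):
    """Approximate p-value for H0 based on sqrt(a) * (F_R - 1).

    rows: one list of observations per variable, each of length >= 2.
    Assumptions: upper-tail p-value; common sigma under H0 (plug-in variance).
    """
    a = len(rows)
    sizes = [len(r) for r in rows]
    ranks = midranks([x for r in rows for x in r])
    means, s2_over_n, start = [], [], 0
    for n in sizes:
        block = ranks[start:start + n]
        start += n
        m = sum(block) / n
        means.append(m)
        s2_over_n.append(sum((v - m) ** 2 for v in block) / (n - 1) / n)
    grand = sum(means) / a                                # unweighted mean rank
    mst = sum((m - grand) ** 2 for m in means) / (a - 1)  # MST_R
    mse = sum(s2_over_n) / a                              # MSE_R
    f_r = mst / mse
    tau2 = sum(2.0 / (n * (n - 1)) for n in sizes) / a    # tau^2 / sigma^4
    nu2 = sum(1.0 / n for n in sizes) / a                 # nu^2 / sigma^2
    z = math.sqrt(a) * (f_r - 1.0) * nu2 / math.sqrt(tau2)
    return 0.5 * math.erfc(z / math.sqrt(2.0))            # P(Z > z)


def type1_error(a=200, n=4, runs=200, alpha=0.05, seed=1):
    """Fraction of rejections when all variables are N(0, 1)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(runs):
        rows = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(a)]
        if rank_test_pvalue(rows) < alpha:
            rejections += 1
    return rejections / runs


if __name__ == "__main__":
    print(type1_error())
```

With so few variables and runs the estimate is noisy, so it should only be read as a rough check that the rejection rate under $H_0$ is in the neighborhood of the nominal level.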
The test appears to be very powerful in detecting a small proportion of differences in all cases considered.

Table 2
Type I error estimate. The test has accurate size regardless of the distribution being symmetric, skewed or heavy-tailed. (Columns: distribution, number of factor levels, nominal level, Type I error. Distributions: Normal(0,1), Lognormal(0,1), Exponential(1), Cauchy(0,1).)

Fig. 1. Achieved power for HDLSS data with α = 0.05, considering shifted differences in mean ($d$) in a group of 100 factor levels in a total of 2000 factor levels and data generated from four distributions: Normal(0, 1) (continuous line in blue), Lognormal(0, 1) (dashed line in black), Exponential(1) (dotted line in red) and Cauchy(0, 1) (dotted-dashed line in green). (Axes: difference in location parameter $d$; estimated power at level 0.05.)

3. Partition clustering algorithm based on p-values

The p-values obtained from the test in Section 2 can serve as a similarity measure in a clustering algorithm with high dimensional data. In this section, we introduce a partition algorithm, PPCLUST (p-values based partitional clustering), to iteratively conduct nonparametric hypothesis testing and partition the random variables into subgroups whenever the similarity is below a certain threshold. That is, a group of variables is partitioned into two smaller groups when the test of identical distribution in (1) is rejected, and the group remains intact if the test is not rejected. When $H_0$ is rejected, smaller groups are created for further testing. The algorithm continues until there are no groups with similarity measures below the threshold.

3.1. The algorithm

For $g \ge 1$, let $g - 1$ be the number of groups identified such that all the variables within each group have identical distribution. PPCLUST is described below in 9 steps. Throughout the algorithm, the subset of data to be tested is always stored as in Table 1, with each row representing a random sample from the same variable.

1. Let $D_1$ denote the matrix of observations from all variables as in Table 1. Each row contains observations from the same variable. The number of rows in $D_1$ is denoted as $n_f(D_1)$. Set $g = 1$.
2. Calculate the (mid-)rank of all the observations in $D_1$ and store them in $D_{1R}$ in the same format as in Table 1.
3. Calculate the median (mid-)rank for each variable (i.e. each row) in $D_{1R}$.
4. Sort the variables (i.e. rows) in $D_1$ according to the median ranks from Step 3.
5. Conduct the test to evaluate if the variables (rows) in $D_1$ have identical distribution.
5.1 If $H_0$ is not rejected, report all the variables in $D_1$ as a single group. Go to Step 9.
5.2 If $H_0$ is rejected: continue to Step 6.
6. Take the first half of the number (rounded to integer) of variables from consecutive rows of $D_1$ and denote the data in this subset, including corresponding observations, as $D_2$. Let $n_f(D_2)$ be the number of variables in $D_2$.

7. Conduct the test to evaluate if the variables (rows) in $D_2$ have identical distribution.
7.1 If $H_0$ is not rejected:
7.1.1 Assign the variables of $D_2$ and corresponding observations to group $g$.
7.1.2 Assign $g + 1$ to $g$.
7.1.3 Remove the variables in $D_2$ and corresponding observations from $D_1$.
7.1.4 If $n_f(D_1) = 0$, then go to Step 9.
7.1.5 If $n_f(D_1) \ge 1$, then do steps A and B below:
A. Test to see if each variable in $D_1$ belongs to the newly assigned group by testing the corresponding hypothesis that all involved random variables have the same distribution. Remove the variable and its observations from $D_1$ when $H_0$ is not rejected and put them into the newly assigned group.
B. Let $D_2$ be the set that contains the remaining variables and their observations in $D_1$ and go to Step 8.
7.2 If $H_0$ is rejected:
7.2.1 Take the first half of the number (rounded to integer) of variables from $D_2$ and denote the data from this subset with corresponding observations as $D_3$.
7.2.2 Assign all the variables and corresponding observations that are not in $D_3$ to $D_1$.
7.2.3 Let $D_2 = D_3$ and delete $D_3$.
7.2.4 Go to Step 8.
8. If $n_f(D_2) = 1$, then perform Steps 8.1 to 8.5; otherwise, return to Step 7.
8.1 Allocate the variable and corresponding observations in $D_2$ to group 0.
8.2 Remove the variable in $D_2$ and corresponding observations from $D_1$.
8.3 If $n_f(D_1) = 0$, then go to Step 9.
8.4 If $n_f(D_1) = 1$, then let $D_2 = D_1$ and return to Step 8.
8.5 If $n_f(D_1) > 1$, then let $D_2 = D_1$ and go to Step 7.
9. Stop the clustering and report the groups identified.

Remark. For Step 3, please note that each variable has multiple i.i.d. observations. The sorting is only done to the variables, not to the observations. The observations from each variable remain unordered so that they are still independent and identically distributed. For the same set of variables to be tested with given i.i.d. observations from each variable, the test statistic defined in (2) and the asymptotic variance calculated in (5) using the jackknife bias corrected estimator of $\sigma_i^4$ remain unchanged whether or not we sort the variables. Therefore, the sorting has no effect on the test. However, it provides a computational advantage for the clustering by putting similar variables in nearby groups without altering the basic requirement of Theorem 1. For Steps 6 and 7.2.1, an alternative way to partition the variables is to split between the two rows that have the largest gap in their median ranks. This can potentially increase the speed of clustering if the distributions underlying different clusters are well separated. However, the advantage is not significant if the underlying distributions have substantial overlap, as in the numerical study in Section 4.

Step 7 repeatedly partitions and groups the variables until no further partition is possible. Step 8 puts the random variables that cannot be clustered into any of the identified groups into a group labeled 0. Therefore, the random variables with group label 0 are not necessarily similar (or dissimilar). Instead, they are judged to belong to none of the identified groups. In other words, the random variables in group 0 resulted in a rejection of $H_0$ when tested with random variables of any other identified group. By the end of the algorithm, $g - 1$ is the total number of different groups. A group labeled with a lower number in the output contains random variables with lower median observation values than groups labeled with higher numbers. For example, if the data are the ratios of gene expressions under a treatment and a control, a group labeled with a lower number may contain down-regulated genes and a group labeled with a higher number may contain up-regulated genes. Intermediate groups contain genes not differentially expressed. In addition to the up or down regulations, the genes from different groups are significantly different as a result of the hypothesis testing.

3.2. About the significance level to use

Note that to determine if all the variables in a group have identical distribution, Theorem 1 only applies when the number of variables (rows) is large. As the partition proceeds, the number of variables in the group to be tested will reduce.
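The way groups shrink as the partition proceeds is easiest to see in code. The following is a minimal, simplified reading of Steps 1 to 9 in Python, not the authors' SAS implementation. The homogeneity test is passed in as a caller-supplied function `same_distribution` (a hypothetical name; it should return True when $H_0$ is not rejected). Two further assumptions are made: "first half (rounded to integer)" is taken as integer division, and Step A compares each remaining variable with the newly formed group as it stood when it was created.

```python
from bisect import bisect_left, bisect_right
from statistics import median


def midranks(values):
    """Mid-ranks of all values (tied values share the average rank)."""
    s = sorted(values)
    return [(bisect_left(s, v) + bisect_right(s, v) + 1) / 2.0 for v in values]


def ppclust_sketch(rows, same_distribution):
    """Return one label per row; 0 marks variables left unassigned (Step 8)."""
    # Steps 1-4: rank all observations, then sort rows by median rank.
    ranks = midranks([x for r in rows for x in r])
    med, start = [], 0
    for r in rows:
        med.append(median(ranks[start:start + len(r)]))
        start += len(r)
    d1 = sorted(range(len(rows)), key=lambda i: med[i])  # indices still unassigned
    labels = [0] * len(rows)
    g = 1

    def pick(idx):
        return [rows[i] for i in idx]

    # Step 5: if everything looks homogeneous, report a single group.
    if len(d1) < 2 or same_distribution(pick(d1)):
        return [1] * len(rows)
    d2 = d1[:max(1, len(d1) // 2)]                       # Step 6
    while d1:
        if len(d2) == 1:                                 # Step 8: group 0
            d1.remove(d2[0])
            d2 = list(d1)
        elif same_distribution(pick(d2)):                # Step 7.1: new group g
            group = list(d2)
            d1 = [i for i in d1 if i not in group]
            for i in list(d1):                           # Step A
                if same_distribution(pick(d2 + [i])):
                    group.append(i)
                    d1.remove(i)
            for i in group:
                labels[i] = g
            g += 1
            d2 = list(d1)                                # Step B
        else:                                            # Step 7.2: halve again
            d2 = d2[:max(1, len(d2) // 2)]
    return labels


def toy_test(rows):
    """Illustrative stand-in only: accept when row medians differ by < 1."""
    meds = [median(r) for r in rows]
    return max(meds) - min(meds) < 1.0


demo = [[0.1, 0.3], [10.1, 10.3], [0.0, 0.2], [10.0, 10.2],
        [0.4, 0.5], [10.4, 10.5], [5.0, 5.1]]
print(ppclust_sketch(demo, toy_test))  # [1, 2, 1, 2, 1, 2, 0]
```

In the toy run the low rows form group 1, the high rows form group 2, and the isolated row is left with label 0. In practice `same_distribution` would compare the p-value of the rank test in Theorem 1 with the chosen significance level.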
The left panel of Table 3 gives the type I error estimate when the number of variables is no more than 500 and each variable contains two replications (under four different distributions). This and Table 2 together indicate that the test in Theorem 1 is liberal when the number of variables is no more than 50. To remedy this, we suggest using a small significance level in determining whether to reject a test. We recommend taking the upper bound of all significance levels, α, such that smaller levels yield similar clustering results. If a significance level leads to too many small clusters, it indicates that the level is not small enough and the clustering results obtained are not reliable. This is because the test does not have acceptable type I error for a small number of variables with small sample sizes. In such a case, even smaller significance levels need to be considered.

We choose not to use the Kruskal-Wallis test because this test requires large sample sizes and a small number of variables. Our numerical results show that this test is very conservative when the number of variables is large and the sample sizes are small (see the right panel of Table 3 for the type I error estimate). For example, in a simulation we generated 15 random variables with scale parameter 1 from normal, lognormal, and Cauchy distributions. Ten of them have location parameter 0.5 and the remaining 5 variables have location parameter 0. Two replications were generated for each variable. The estimated power at level 0.05 from the Kruskal-Wallis test for these distributions is 0.006, 0.006, and […], respectively. So the Kruskal-Wallis test is not sensitive enough to detect heterogeneous distributions to partition the data.

Table 3
Type I error estimate at level α = 0.05 for the test in Theorem 1 (left panel) and the Kruskal-Wallis test (right panel) under four distributions when the number of variables is below 500. Each variable has 2 replications. The test in Theorem 1 is liberal when the number of variables is no more than 50 and the Kruskal-Wallis test is conservative for all the cases considered. All distributions have location parameter 0 and scale parameter 1 (Unif is the uniform distribution on (0, 1)). (Columns in each panel: $n_f$, Unif, Normal, Lognormal, Cauchy.)

3.3. Advantage of PPCLUST compared to traditional clustering algorithms

The robust similarity measure and the clustering mechanism entail the following advantages of PPCLUST.

1. Invariance to monotone transformations: The use of overall ranks of the observations in the test statistic leads to a similarity measure that is invariant to monotone transformation of data, and this in turn gives PPCLUST the same property. Many clustering algorithms produce different results before and after monotone transformations of the data because such transformations change the similarity matrices used in clustering. PPCLUST does not have this drawback, so a user does not need to explore appropriate transformations of data to satisfy some model assumptions. This is particularly useful since choosing appropriate transformations for HDLSS data is a difficult question in itself.
2. Automatic specification of the number of groups: PPCLUST does not require the number of clusters to be specified in advance.
It determines the number of clusters automatically once a significance level is specified as the threshold to be compared with the p-values for testing the hypothesis of identical distribution. Estimating the number of mixture components is itself a popular research topic that is often computationally intensive. In the low dimensional case, choosing the number of clusters is a nuisance and difficult for a user even though the clustering results may be visualized. In the high dimensional case, effective visualization tools are not available to aid a user, so it is even harder to specify the number of clusters for a real dataset. PPCLUST produces this information directly. The specification of a significance level is not as intrusive as the specification of the number of groups, which is one of the objectives of clustering analysis. In fact, the significance level can be used as guidance in finding the number of groups in a real data set. For example, decreasing the significance level in PPCLUST will decrease the number of groups found because it decreases the Type I error committed by the test. The use of different significance levels can serve as a fine tuning parameter in revealing the total number of different groups G where the algorithm tends to stabilize, i.e., the G that is found most often across different α levels. This can be used as an indication of the true number of groups in the data. We remark that lowering the significance level too much will also decrease the power of the test in finding new and small groups. The delicate balance can be achieved in the same way as we handle the type I and type II errors in regular hypothesis testing.
3. Less concern for multiple comparison problems in HDLSS data: Reducing false discoveries while striving to maintain the power to identify true discoveries is one of the challenges for HDLSS data analysis (Storey, 2002; Sabatti, 2006; Qiu and Yakovlev, 2006; Strimmer, 2008). This is less of a concern in PPCLUST since the test is applied to groups of variables instead of on a one-by-one basis.
4. PPCLUST favors HDLSS for the asymptotic distribution of the test statistic, while other algorithms often need prior dimension reduction before being applied to high dimensional data. In high dimensional studies it is common to apply some dimension reduction technique such as principal components analysis (PCA) before clustering data (Johnson, 1998). Some studies do not recommend the use of PCA before clustering except in very special situations (Yeung and Ruzzo, 2001). Simulations in Yeung and Ruzzo show that clustering principal components instead of the original data produces different results for many algorithms using different similarity metrics. PPCLUST does not require prior dimension reduction. Instead, PPCLUST takes advantage of the high dimensionality to provide power to produce a reliable similarity measure. This is especially appealing when only a very small number of replications is available.

5. Flexible to work with unbalanced data with small sample sizes: the algorithm works with both balanced and unbalanced data. The only requirement is that the number of replications per variable is at least 2. There is no need for all variables to have the same number of replications. Unbalanced data is common in studies of microarray gene expression data, and some algorithms require balanced data. Solutions like eliminating factor levels with incomplete information from the study or imputing data can hide or seriously compromise the result of the study.
6. PPCLUST produces fast solutions for computationally costly problems, as the computational complexity is $O(\log_2(N))$. Note that traditional clustering algorithms need to do optimization at each stage to find the optimal partition of the data based on a criterion. As the number of variables increases, the optimization cost becomes a major concern for exhaustive search. Genetic algorithms are often used to speed up the search. Instead of searching for the optimal solution at each stage, PPCLUST relies on statistical evidence obtained from hypothesis testing to judge whether a group of variables is from the same distribution or not. As long as the null hypothesis is not rejected, the members are not significantly different and therefore a group is formed. In other words, PPCLUST only needs the similarity measure from hypothesis testing and eliminates the optimization process. With the similarity measure being obtained through a single test of hypothesis, the computational burden is dramatically reduced to $O(\log_2(N))$ as opposed to $O(N \log_2(N))$, the best time complexity case for hierarchical clustering. This is confirmed by our simulations (see Section 4), where it takes PPCLUST less than a minute to complete the clustering of a data set containing up to 7000 random variables with sample sizes ranging from 5 to 20 per variable, using a PC running Windows XP with an Intel Pentium M processor, 1.6 GHz, and 1 Gb of RAM.

4. Numerical comparison for clustering of HDLSS data

In this section, we compare PPCLUST with some benchmark algorithms on simulated data. To evaluate the similarity between two clustering partitions, Rand (1971) proposed the Rand index, which gives the fraction of all pairs that are correctly put in the same cluster or correctly put in separate clusters. However, the expected value of the Rand index of two random partitions does not take a constant value. Hubert and Arabie (1985) considered the adjusted Rand index (ARI), which is centered at zero and has a maximum value of 1, achieved when the two partitions are identical up to renumbering of the subsets. Milligan and Cooper (1986) compared multiple indices for measuring agreement between two partitions in clustering analysis with different numbers of clusters, and they recommended the ARI as the index of choice. We adopt the ARI to compare the clustering produced by each algorithm with the truth, which is known from data generation.

Study I: Clustering for symmetric data

In the following simulations we generated high dimensional data from mixture distributions having mixture components similar to the gene expression data from a colorectal cancer study (Notterman et al., 2001), which contain several large groups having the overall distribution of a t-distribution with 15 degrees of freedom shifted by some location parameter µ and stretched by a scale parameter σ. Specifically, observations for 4000 random variables were generated according to the following scheme:

Group 1: 300 random variables from $0.25\,t_{15}$ with location shift […].
Group 2: 200 random variables from $0.25\,t_{15}$ with location shift […].
Group 3: 2500 random variables from $0.25\,t_{15}$.
Group 4: 800 random variables from $0.25\,t_{15}$ with location shift […].
Group 5: 200 random variables from $0.25\,t_{15}$ with location shift […].

The densities of these five distributions have substantial overlap. Five observations were generated for each random variable.
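A data set with the structure of Study I can be generated with the Python standard library alone. The sketch below is illustrative only: the group sizes, the scale 0.25, the 15 degrees of freedom and the unshifted Group 3 follow the scheme above, but the location shifts of Groups 1, 2, 4 and 5 are hypothetical placeholders, since the actual values are not legible in this copy. A $t_{15}$ variate is drawn as a standard normal divided by the square root of an independent chi-square over its degrees of freedom.

```python
import math
import random

# (number of variables, location shift); shifts other than 0.0 are
# hypothetical placeholders, NOT the values used in the paper.
GROUPS = [(300, -1.0), (200, -0.5), (2500, 0.0), (800, 0.5), (200, 1.0)]


def t_variate(rng, df):
    """One draw from Student's t with df degrees of freedom."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)  # chi-square(df) = Gamma(df/2, scale 2)
    return z / math.sqrt(chi2 / df)


def simulate_study(groups=GROUPS, n_obs=5, scale=0.25, df=15, seed=0):
    """Return (rows, labels): one row of n_obs observations per variable."""
    rng = random.Random(seed)
    rows, labels = [], []
    for label, (count, shift) in enumerate(groups, start=1):
        for _ in range(count):
            rows.append([shift + scale * t_variate(rng, df) for _ in range(n_obs)])
            labels.append(label)
    return rows, labels


rows, labels = simulate_study()
print(len(rows), len(rows[0]))  # 4000 5
```

The returned `labels` play the role of the known truth against which a clustering result can later be scored.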
PPCLUST using significance level $\alpha = 10^{-8}$ and the following 10 benchmark clustering algorithms are applied to the generated data:

Partitional algorithms: K-means, partitioning around medoids (PAM), clustering large applications (CLARA) with Euclidean metric, Self-Organizing Maps (SOM) with dimension 5 × 1.
Hierarchical algorithms: hierarchical clustering (HCLUST) with Ward's agglomeration method, agglomerative nesting (AGNES), divisive analysis clustering (DIANA) with Euclidean metric, hierarchical clustering by minimum energy distance (with Euclidean norm $\|x - y\|$).
Fuzzy algorithm: fuzzy clustering (FANNY).
Model based algorithm: mixture model based clustering (MCLUST) with automatic choice of the best model through the Bayesian Information Criterion.

For details of each algorithm, see McQueen (1967), Kaufman and Rousseeuw (1990), Kohonen (1989), Székely and Rizzo (2005) and Fraley and Raftery (2006). In all algorithms that need pre-specification of the number of clusters, we set the number to 5, the true number of groups. It should be noted that this information is often not known in practice, which contributes additional uncertainty to their clustering performance. R software (version 2.4.1) with packages energy, mclust, cluster, and SOM was used in the simulation. PPCLUST was written in the SAS macro language (version 9.3.1), and the ARI was calculated using both R and SAS. For each algorithm in R, we use the default setting except that we supply the number of clusters with the true number of groups. For example, by default, the algorithm of Hartigan and Wong (1979) is used for K-means. In addition, with the specified number of clusters in the K-means algorithm, a random set of (distinct) rows of the data is automatically chosen as the initial centers. This random selection of rows as centers is the standard initialization method used in R. It has been confirmed empirically to have better performance than other initialization methods (Bradley and Fayyad, 1998; Pena et al., 1999).

To evaluate the stability of the clustering performance, we repeat the data generation 200 times and apply the above algorithms to these 200 data sets. In order to verify the performance of PPCLUST under different sample sizes, the complete simulation study was repeated considering also samples of sizes 10, 15, and 20. The average and standard deviation of the ARI reflect the clustering accuracy and stability respectively. They are reported in Table 4 for all the algorithms applied to the 200 data sets with different sample sizes. The best two mean ARIs and standard deviations are highlighted.

Table 4
Mean and standard deviations (std) of adjusted Rand index for all algorithms over 200 simulated datasets. Different sample sizes are considered. The groups are generated from symmetric distributions (Study I). (Columns: algorithm; mean and std of the adjusted Rand index for each sample size. Algorithms: PPCLUST, PAM, K-means, Energy, Mclust, Clara, Diana, HCLUST, Agnes, Fanny, SOM.)

From Table 4, it can be seen that as the number of replications increases, the clustering accuracy increases for all algorithms. PPCLUST has the best clustering accuracy for all sample sizes considered. In addition, PPCLUST is also the most stable algorithm for small sample size (5) among all 11 algorithms, since the ARI of PPCLUST has the smallest standard deviation for sample size 5. The standard deviation of the ARI for PPCLUST stays almost the same for sample sizes 5, 10 and 15.
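For reference, the ARI used to score each clustering against the known truth can be computed from two label vectors with the standard Hubert and Arabie formula. The sketch below is a plain Python illustration (the function name is hypothetical, and it is not the R or SAS code used for the study); it needs Python 3.8 or later for `math.comb`.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index of two partitions given as label sequences."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two items")
    # Pairs of items placed together in both partitions, in A, and in B.
    together = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = pairs_a * pairs_b / comb(n, 2)
    maximum = 0.5 * (pairs_a + pairs_b)
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both partitions are one cluster
    return (together - expected) / (maximum - expected)


print(adjusted_rand_index([1, 1, 2, 2], [5, 5, 7, 7]))  # 1.0
print(adjusted_rand_index([1, 1, 2, 2], [1, 2, 1, 2]))  # -0.5
```

The first call shows that the index ignores how the clusters are numbered; the second shows that agreement worse than chance gives a negative value.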
MCLUST has comparable average ARI to PPCLUST for sample sizes 15 and 20, but has significantly worse performance than PPCLUST for small sample sizes in both clustering accuracy and stability. SOM showed consistent stability but with very low clustering accuracy as the average ARI for SOM is less than 0.5 for all sample sizes. Algorithms Energy, HCLUST, and Agnes are competitive with PPCLUST for samples of size 15 or higher, but those algorithms are not as stable as PPCLUST and MCLUST. Diana and Fanny showed the lowest stability among all algorithms and should not be used with HDLSS data. Fig. 2 gives a graphical summary of the performance of these algorithms through boxplots. Overall, PPCLUST has the best clustering performance in terms of both accuracy and stability. For larger samples, MCLUST is a good alternative to PPCLUST.

Study II: Clustering for skewed data

In a second study, the data generated for study I are transformed using the function $e^{4(x+1)}$, where x is an observation generated in study I. The resulting distribution of the data is close to a lognormal distribution but with more extreme points since x was generated from a t-distribution instead of a normal distribution. The resulting distributions are heavily skewed. There is still a significant amount of overlap among the densities. Table 5 and Fig. 3 summarize the clustering performance of these algorithms on the transformed data. PPCLUST is considerably better than all other algorithms in all sample size situations. Clara has the worst results and PAM is the best algorithm among the other algorithms, but never had average ARI greater than [value missing]. PPCLUST applied to the transformed data yields identical results to those before the transformation because it is invariant to monotone transformations.

The simulations and all calculations were performed using Windows XP with Intel Pentium M processor, 1.6 GHz, and 1 GB of RAM. The processing time for PPCLUST is consistently less than 1 min for each run of data sets with 4000 random variables and PAM is the only faster algorithm.
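The invariance to monotone transformations noted for Study II follows from the fact that rank-based statistics depend on the data only through their (mid-)ranks. The sketch below illustrates this with the Study II transformation; it is not part of PPCLUST itself. The helper names are hypothetical, and standard normal draws are used as a stand-in for the t-distributed values of Study I.

```python
import math
import random


def midranks(values):
    """Return 1-based mid-ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value is tied with this one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        average = (start + end) / 2 + 1  # mean of positions start+1 .. end+1
        for k in range(start, end + 1):
            ranks[order[k]] = average
        start = end + 1
    return ranks


def skew_transform(x):
    """The strictly increasing transformation used in Study II."""
    return math.exp(4 * (x + 1))


rng = random.Random(0)  # fixed seed so the illustration is reproducible
x = [rng.gauss(0, 1) for _ in range(20)] + [0.5, 0.5]  # includes one tie
y = [skew_transform(v) for v in x]

# The transformed data are heavily skewed, yet the mid-ranks are unchanged,
# so any statistic computed from mid-ranks gives the same value.
print(midranks(x) == midranks(y))
```

Distance-based methods such as K-means or PAM work on the raw values, which is consistent with their results changing between Study I and Study II.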
MCLUST, the closest competitor to PPCLUST, showed processing times at least 3 times higher than PPCLUST.

Fig. 2. Boxplots of adjusted Rand index for PPCLUST and 10 other algorithms on symmetric data based on 200 simulated datasets with different sample sizes (Study I).

Table 5. Mean and standard deviations (std) of adjusted Rand index for all algorithms over 200 simulated datasets with skewed distribution. Different sample sizes are considered. The data are generated from heavily skewed distributions as described in Study II. Rows: PPCLUST, PAM, K-means, Energy, Mclust, Clara, Diana, HCLUST, Agnes, Fanny, SOM; columns: mean and std. of the adjusted Rand index for each sample size.

5. Application

Clustering of genes using expression data can identify genes that are differentially expressed under different conditions. Such genes may be responsible for disease progression or responsive to treatment. Identification of such genes can aid in biomarker identification for drug development. Additionally, the differentially expressed genes can be used to classify patient disease status. For example, using all genes from the whole genome can lead to inefficiency in classifying tumor patients as no inference can deal with high dimensional prediction without imposing strong assumptions. Instead, using only the genes found to be differentially expressed from the clustering algorithm can significantly reduce the complexity of the classification problem. That is, results from the clustering can serve as a dimension reduction tool for classification. These studies could help improve treatments by identifying targets for therapy in many diseases. In this section, we apply PPCLUST to data from the Notterman et al. (2001) study of transcriptional gene expression profiles of colorectal cancer. Heatmaps are used to visualize the results of PPCLUST.

Clustering genes in colorectal cancer

Colon and rectal cancer have many features in common and for this reason both are often referred to as colorectal cancer. This cancer begins in most cases as a growth of tissue, called a polyp, inside the wall of the colon or rectum. If the cells of a tumor (adenomas) acquire the ability to invade and spread into the intestine and other areas, a malignant tumor develops (carcinoma or adenocarcinoma).
Understanding how change in DNA causes cells of the colon and rectum to become cancerous could guide scientists in the development of new drugs, treatments and actions during early stages of the disease. In the Notterman et al. (2001) study, normal tissues were paired with the two types of tumors, adenoma and adenocarcinoma. The data (available at microarray.princeton.edu/oncology) consist of mRNA expression patterns probed in 4 colon adenoma tissues, 18 adenocarcinoma and 22 paired normal colon samples. In their study, a two-way hierarchical clustering algorithm was used to show that genome-wide expression profiling may permit a molecular classification of the three different types of tissues. Here instead of clustering on the tissues, we apply PPCLUST to cluster genes.

Fig. 3. Boxplots of adjusted Rand index for PPCLUST and 10 other algorithms on heavily skewed data (Study II) based on 200 simulated datasets under different sample sizes.

Since some of the genes in the original data were observed more than once, the median of expression levels of duplicated genes in each database (adenoma and paired normal tissues database, and adenocarcinoma and paired normal tissues database) was calculated. Then similar transformations as described in Notterman et al.'s (2001) study were performed prior to the application of PPCLUST in the composite database, i.e., the following steps were applied to each dataset:

Deletion of expression levels ≤ 0;
Calculation of the logarithm of the expression level;
Deletion of genes having more than 25% of their values missing. In Notterman the percentage cutoff was 15%, resulting in a smaller sample.

Two data sets are obtained, one with 4 adenoma and paired normal tissues for 4175 genes and the other one with 18 adenocarcinomas and paired normal tissues for 4234 genes. Only 1038 genes are common to both data sets.

The existence of paired data allows the application of PPCLUST to the difference in gene expression levels of cancer (adenoma or adenocarcinoma) and normal tissues. The idea is that genes not related to the disease should not have significant changes in expression levels for cancer and normal tissues. However, genes that have significant changes in expression level can be identified through a clustering algorithm. PPCLUST with a few significance levels is applied to the data.
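The preprocessing steps listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' SAS code: missing values are represented by None, non-positive levels are treated as missing before taking logarithms, the natural logarithm is assumed (the text does not state the base), and the function name, parameter name and gene identifiers are hypothetical.

```python
import math
from statistics import median


def preprocess(records, max_missing_fraction=0.25):
    """records: list of (gene_id, levels); levels use None for missing."""
    # Group repeated observations of the same gene.
    by_gene = {}
    for gene, levels in records:
        by_gene.setdefault(gene, []).append(levels)

    cleaned = {}
    for gene, rows in by_gene.items():
        # Collapse duplicated genes to the per-sample median of observed values.
        merged = []
        for column in zip(*rows):
            observed = [v for v in column if v is not None]
            merged.append(median(observed) if observed else None)
        # Drop non-positive levels (assumed reading), then take logarithms.
        logged = [math.log(v) if v is not None and v > 0 else None
                  for v in merged]
        if not logged:
            continue
        # Keep the gene unless more than the allowed fraction is missing.
        missing = sum(v is None for v in logged) / len(logged)
        if missing <= max_missing_fraction:
            cleaned[gene] = logged
    return cleaned


# Hypothetical toy input: gene "g3" is observed twice.
records = [
    ("g1", [1.0, math.e, -2.0, 4.0]),
    ("g2", [None, 0.0, -1.0, 5.0]),
    ("g3", [2.0, 2.0, 2.0, 2.0]),
    ("g3", [4.0, 4.0, 4.0, None]),
]
result = preprocess(records)
print(sorted(result))
```

With the 25% cutoff, a gene with exactly one missing value out of four samples is retained, while a stricter cutoff such as 15% would remove it, which matches the remark that the lower cutoff yields a smaller sample.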
For significance levels greater than [value missing], the clustering resulted in too many small clusters of genes; for significance levels much smaller than [value missing], the main structure of groups obtained stays stable. So we use [value missing] as our significance level. Fig. 4 presents the heatmap of original differences in expression levels of adenoma and normal tissues in paired samples and the heatmap with genes ordered by the groups to which they were allocated. There is a concentration of zero to positive expression levels for this data with no clear existence of any gene groups. After applying PPCLUST, genes were clustered into 6 groups with 38 (0.91%), 316 (7.57%), 9 (0.22%), 3573 (85.58%), 221 (5.29%), and 15 (0.36%) genes, respectively. The first three groups contain genes that are significantly down regulated and the last two groups consist of genes that are significantly up regulated for adenomas compared to normal tissues. There is also a set of 3 (0.07%) genes that cannot be clustered with any other gene. The largest group is formed mostly by genes that had no significant difference in their expression levels between adenomas and normal tissues.

We also applied PPCLUST to the difference in expression levels of adenocarcinoma and normal tissues. In this case, 7 groups are obtained with only 4 (0.09%) genes not assigned to any group. The numbers of genes in each group are 91 (2.15%), 774 (18.28%), 9 (0.21%), 2673 (63.13%), 5 (0.12%), 655 (15.47%), and 23 (0.54%). The heatmaps before and after clustering are given in Fig. 5. Among the 1038 genes that are present in both data sets, the membership assignments for comparing adenoma versus normal and adenocarcinoma versus normal tissues are tabulated in Table 6. Among these genes, 558 had no significant change in expression for both adenoma and adenocarcinoma tissues. For the other genes that are not significantly differentially expressed in adenoma tissues, usually, there is no significant change of expression levels in carcinoma tissues.
Similarly, genes that are significantly down regulated in adenoma tissues tend to be also down regulated in carcinoma tissues. The same pattern is also observed for significantly up-regulated genes. Only 10 genes had opposite expression levels in the two types of tissues.

Fig. 4. Heatmaps for Adenoma-Normal Tissues before and after grouping.

Fig. 5. Heatmaps for Adenocarcinoma-Normal Tissues before and after clustering.

Table 6. Distribution of 1038 genes present in both adenoma and adenocarcinoma tissue types. Genes in group 0 are not grouped by PPCLUST, and genes in group 4 are those genes that are differentially expressed in neither tissue type. Rows: adenoma groups; columns: adenocarcinoma groups.
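Table 6 is a cross-tabulation of group memberships for the genes shared by the two data sets. A sketch of how such a table could be built is given below; the gene identifiers and group labels are made up for illustration and are not values from the paper, and the function names are hypothetical.

```python
from collections import Counter


def cross_tabulate(adenoma_groups, carcinoma_groups):
    """Count shared genes by (adenoma group, adenocarcinoma group)."""
    # Only genes present in both clusterings enter the table.
    shared = adenoma_groups.keys() & carcinoma_groups.keys()
    return Counter((adenoma_groups[g], carcinoma_groups[g]) for g in shared)


def format_table(table):
    """Render the counts with adenoma groups as rows."""
    rows = sorted({r for r, _ in table})
    cols = sorted({c for _, c in table})
    lines = ["group " + " ".join(f"{c:>4}" for c in cols)]
    for r in rows:
        counts = " ".join(f"{table[(r, c)]:>4}" for c in cols)
        lines.append(f"{r:>5} " + counts)
    return "\n".join(lines)


# Hypothetical memberships: gene id -> group label (0 means not grouped).
adenoma = {"geneA": 4, "geneB": 4, "geneC": 2, "geneD": 5, "geneE": 1}
carcinoma = {"geneA": 4, "geneB": 4, "geneC": 2, "geneD": 6, "geneF": 0}

table = cross_tabulate(adenoma, carcinoma)
print(format_table(table))
```

Cells away from the row and column of the "no change" group show genes that are differentially expressed in both tissue types, which is the comparison the text draws from Table 6.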

Comparing the heatmaps obtained before and after clustering in both tissue types reveals that in carcinoma tissues the clustering of genes is more evident than in adenoma tissues. This is due to the larger differences in expression levels of carcinoma related genes. Results from the clustering of the gene expression data in the colorectal cancer study suggest target genes to molecular biologists for further lab experiments.

6. Conclusion

In this article, we proposed a novel computational algorithm, PPCLUST, for effectively clustering a large number of random variables with a small number of replications per variable. The availability of replications allows us to use p-values from a (mid-)rank test of homogeneous distribution developed by Wang and Akritas (2004) as similarity measures to determine if a group needs to be partitioned. Since no optimization is necessary, the computational cost is dramatically reduced compared to commonly used algorithms applied to a large number of variables. In addition, PPCLUST has the advantage that it is invariant to monotone transformations of data and can automatically determine the number of clusters with a specified significance level. In our simulation studies, PPCLUST outperformed 10 other benchmark algorithms commonly used in the microarray literature when considering clustering accuracy, stability and speed. The superior performance of PPCLUST on high dimensional data with small sample sizes makes it a useful tool for such data, which arise in many disciplines.

Acknowledgements

We are grateful to the two referees and the Editor for their helpful comments that improved the presentation of this manuscript. We would also like to acknowledge SAS Institute Brazil for the use of SAS through academic agreement with University of Brasilia.

References

Akritas, M.G., Arnold, S., Asymptotics for analysis of variance when the number of levels is large. Journal of The American Statistical Association 95.
Akritas, M.G., Papadatos, N., Heteroscedastic one-way ANOVA and lack-of-fit tests. Journal of The American Statistical Association 99.
Alon, U., Barkai, N., Notterman, D.A., Gish, K., Ybarra, S., Mack, D., Levine, A.J., Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proceedings of the National Academy of Sciences USA 96.
Benjamini, Y., Hochberg, Y., Controlling the false discovery rate: A practical and powerful approach to multiple testing. JRSSB 57.
Bradley, P.S., Fayyad, U.M., Refining initial points for K-means clustering. In: Proceedings of the Fifteenth International Conference on Machine Learning. Morgan Kaufmann Publishers, Inc., San Francisco, CA.
Efron, B., Correlation and large-scale simultaneous significance testing. Journal of the American Statistical Association 102.
Efron, B., Tibshirani, R., On testing the significance of sets of genes. Annals of Applied Statistics 1.
Fraley, C., Algorithms for model-based Gaussian hierarchical clustering. SIAM 20.
Fraley, C., Raftery, A.E., MCLUST version 3.0: An R package for normal mixture modeling and model-based clustering, Technical Report, University of Washington.
Fu, L., Medico, E., Flame, a novel fuzzy clustering method for the analysis of DNA microarray data. BMC Bioinformatics 8.
Hartigan, J.A., Wong, M.A., A K-means clustering algorithm. Applied Statistics 28.
Hubert, L., Arabie, P., Comparing partitions. Journal of Classification 2.
Huttenhower, C., Flamholz, A.I., Landis, J.N., Sahi, S., Myers, C.L., Olszewski, K.L., Hibbs, M.A., Siemens, N.O., Troyanskaya, O.G., Coller, H.A., Nearest neighbor networks: Clustering expression data based on gene neighborhoods. BMC Bioinformatics 8.
Jiang, D., Tang, C., Zhang, A., Cluster analysis for gene expression data: A survey. IEEE Transactions on Knowledge and Data Engineering 16.
Johnson, D.E., Applied Multivariate Methods for Data Analysis. Duxbury.
Kaufman, L., Rousseeuw, P.J., Finding Groups in Data: An Introduction to Cluster Analysis. Wiley Interscience.
Kohonen, T., Self-organization and Associative Memory. Springer.
McQueen, J.B., Some methods for classification and analysis of multivariate observations. In: Proceedings of Fifth Berkeley Symposium on Mathematical Statistics and Probability.
Milligan, G.W., Cooper, M.C., A study of the comparability of external criteria for hierarchical cluster analysis. Multivariate Behavioral Research 21.
Notterman, D.A., Alon, U., Sierk, A.J., Levine, A.J., Transcriptional gene expression profiles of colorectal adenoma, adenocarcinoma, and normal tissue examined by oligonucleotide arrays. Cancer Research 61.
Pawitan, Y., In all likelihood: Statistical modeling and inference using likelihood, Oxford.
Pena, J.M., Lozano, J.A., Larranaga, P., An empirical comparison of four initialization methods for the K-Means algorithm. Pattern Recognition Letters 20.
Qiu, X., Yakovlev, A., Some comments on instability of false discovery rate estimation. Journal of Bioinformatics and Computational Biology 4.
Rand, W.M., Objective criteria for the evaluation of clustering methods. JASA 36.
Sabatti, C., False discovery rate and multiple comparison procedures. In: DNA Microarrays and Related Genomics Techniques: Design, Analysis, and Interpretation of Experiments. Chapman & Hall/CRC.
Storey, J., A direct approach to false discovery rates. Journal of the Royal Statistical Society B 64 (3).
Storey, J.D., Tibshirani, R., Statistical significance for genomewide studies. Proceedings of the National Academy of Sciences USA 16.
Strimmer, K., A unified approach to false discovery rate estimation. BMC Bioinformatics 9, 303.
Subramanian, A., Tamayo, P., Mootha, V.K., Mukherjee, S., Ebert, B.L., Gillette, M.A., Paulovich, A., Pomeroy, S.L., Golub, T.R., Lander, E.S., Mesirov, J.P., Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences USA 43.
Székely, G.J., Rizzo, M.L., Hierarchical clustering via joint between-within distances: Extending Ward's minimum variance method. Journal of Classification 22.
Wang, H., Akritas, M.G., Rank tests for ANOVA with large number of factor levels. Journal of Nonparametric Statistics 16.
Yeung, K.Y., Ruzzo, W.L., Principal component analysis for clustering gene expression data. Bioinformatics 9.

Copy Number Variation Methods and Data

Copy Number Variation Methods and Data Copy Number Varaton Methods and Data Copy number varaton (CNV) Reference Sequence ACCTGCAATGAT TAAGCCCGGG TTGCAACGTTAGGCA Populaton ACCTGCAATGAT TAAGCCCGGG TTGCAACGTTAGGCA ACCTGCAATGAT TTGCAACGTTAGGCA

More information

Parameter Estimates of a Random Regression Test Day Model for First Three Lactation Somatic Cell Scores

Parameter Estimates of a Random Regression Test Day Model for First Three Lactation Somatic Cell Scores Parameter Estmates of a Random Regresson Test Day Model for Frst Three actaton Somatc Cell Scores Z. u, F. Renhardt and R. Reents Unted Datasystems for Anmal Producton (VIT), Hedeweg 1, D-27280 Verden,

More information

Physical Model for the Evolution of the Genetic Code

Physical Model for the Evolution of the Genetic Code Physcal Model for the Evoluton of the Genetc Code Tatsuro Yamashta Osamu Narkyo Department of Physcs, Kyushu Unversty, Fukuoka 8-856, Japan Abstract We propose a physcal model to descrbe the mechansms

More information

Using the Perpendicular Distance to the Nearest Fracture as a Proxy for Conventional Fracture Spacing Measures

Using the Perpendicular Distance to the Nearest Fracture as a Proxy for Conventional Fracture Spacing Measures Usng the Perpendcular Dstance to the Nearest Fracture as a Proxy for Conventonal Fracture Spacng Measures Erc B. Nven and Clayton V. Deutsch Dscrete fracture network smulaton ams to reproduce dstrbutons

More information

310 Int'l Conf. Par. and Dist. Proc. Tech. and Appl. PDPTA'16

310 Int'l Conf. Par. and Dist. Proc. Tech. and Appl. PDPTA'16 310 Int'l Conf. Par. and Dst. Proc. Tech. and Appl. PDPTA'16 Akra Sasatan and Hrosh Ish Graduate School of Informaton and Telecommuncaton Engneerng, Toka Unversty, Mnato, Tokyo, Japan Abstract The end-to-end

More information

International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS)

International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS) Internatonal Assocaton of Scentfc Innovaton and Research (IASIR (An Assocaton Unfyng the Scences, Engneerng, and Appled Research Internatonal Journal of Emergng Technologes n Computatonal and Appled Scences

More information

Gene Selection Based on Mutual Information for the Classification of Multi-class Cancer

Gene Selection Based on Mutual Information for the Classification of Multi-class Cancer Gene Selecton Based on Mutual Informaton for the Classfcaton of Mult-class Cancer Sheng-Bo Guo,, Mchael R. Lyu 3, and Tat-Mng Lok 4 Department of Automaton, Unversty of Scence and Technology of Chna, Hefe,

More information

Study and Comparison of Various Techniques of Image Edge Detection

Study and Comparison of Various Techniques of Image Edge Detection Gureet Sngh et al Int. Journal of Engneerng Research Applcatons RESEARCH ARTICLE OPEN ACCESS Study Comparson of Varous Technques of Image Edge Detecton Gureet Sngh*, Er. Harnder sngh** *(Department of

More information

Using Past Queries for Resource Selection in Distributed Information Retrieval

Using Past Queries for Resource Selection in Distributed Information Retrieval Purdue Unversty Purdue e-pubs Department of Computer Scence Techncal Reports Department of Computer Scence 2011 Usng Past Queres for Resource Selecton n Dstrbuted Informaton Retreval Sulleyman Cetntas

More information

Insights in Genetics and Genomics

Insights in Genetics and Genomics Insghts n Genetcs and Genomcs Research Artcle Open Access New Score Tests for Equalty of Varances n the Applcaton of DNA Methylaton Data Analyss [Verson ] Welang Qu Xuan L Jarrett Morrow Dawn L DeMeo Scott

More information

Optimal Planning of Charging Station for Phased Electric Vehicle *

Optimal Planning of Charging Station for Phased Electric Vehicle * Energy and Power Engneerng, 2013, 5, 1393-1397 do:10.4236/epe.2013.54b264 Publshed Onlne July 2013 (http://www.scrp.org/ournal/epe) Optmal Plannng of Chargng Staton for Phased Electrc Vehcle * Yang Gao,

More information

FAST DETECTION OF MASSES IN MAMMOGRAMS WITH DIFFICULT CASE EXCLUSION

FAST DETECTION OF MASSES IN MAMMOGRAMS WITH DIFFICULT CASE EXCLUSION computng@tanet.edu.te.ua www.tanet.edu.te.ua/computng ISSN 727-6209 Internatonal Scentfc Journal of Computng FAST DETECTION OF MASSES IN MAMMOGRAMS WITH DIFFICULT CASE EXCLUSION Gábor Takács ), Béla Patak

More information

Modeling the Survival of Retrospective Clinical Data from Prostate Cancer Patients in Komfo Anokye Teaching Hospital, Ghana

Modeling the Survival of Retrospective Clinical Data from Prostate Cancer Patients in Komfo Anokye Teaching Hospital, Ghana Internatonal Journal of Appled Scence and Technology Vol. 5, No. 6; December 2015 Modelng the Survval of Retrospectve Clncal Data from Prostate Cancer Patents n Komfo Anokye Teachng Hosptal, Ghana Asedu-Addo,

More information

Biomarker Selection from Gene Expression Data for Tumour Categorization Using Bat Algorithm

Biomarker Selection from Gene Expression Data for Tumour Categorization Using Bat Algorithm Receved: March 20, 2017 401 Bomarker Selecton from Gene Expresson Data for Tumour Categorzaton Usng Bat Algorthm Gunavath Chellamuthu 1 *, Premalatha Kandasamy 2, Svasubramanan Kanagaraj 3 1 School of

More information

Joint Modelling Approaches in diabetes research. Francisco Gude Clinical Epidemiology Unit, Hospital Clínico Universitario de Santiago

Joint Modelling Approaches in diabetes research. Francisco Gude Clinical Epidemiology Unit, Hospital Clínico Universitario de Santiago Jont Modellng Approaches n dabetes research Clncal Epdemology Unt, Hosptal Clínco Unverstaro de Santago Outlne 1 Dabetes 2 Our research 3 Some applcatons Dabetes melltus Is a serous lfe-long health condton

More information

INITIAL ANALYSIS OF AWS-OBSERVED TEMPERATURE

INITIAL ANALYSIS OF AWS-OBSERVED TEMPERATURE INITIAL ANALYSIS OF AWS-OBSERVED TEMPERATURE Wang Yng, Lu Xaonng, Ren Zhhua, Natonal Meteorologcal Informaton Center, Bejng, Chna Tel.:+86 684755, E-mal:cdcsjk@cma.gov.cn Abstract From, n Chna meteorologcal

More information

ARTICLE IN PRESS Neuropsychologia xxx (2010) xxx xxx

ARTICLE IN PRESS Neuropsychologia xxx (2010) xxx xxx Neuropsychologa xxx (200) xxx xxx Contents lsts avalable at ScenceDrect Neuropsychologa journal homepage: www.elsever.com/locate/neuropsychologa Storage and bndng of object features n vsual workng memory

More information

ALMALAUREA WORKING PAPERS no. 9

ALMALAUREA WORKING PAPERS no. 9 Snce 1994 Inter-Unversty Consortum Connectng Unverstes, the Labour Market and Professonals AlmaLaurea Workng Papers ISSN 2239-9453 ALMALAUREA WORKING PAPERS no. 9 September 211 Propensty Score Methods

More information

IMPROVING THE EFFICIENCY OF BIOMARKER IDENTIFICATION USING BIOLOGICAL KNOWLEDGE

IMPROVING THE EFFICIENCY OF BIOMARKER IDENTIFICATION USING BIOLOGICAL KNOWLEDGE IMPROVING THE EFFICIENCY OF BIOMARKER IDENTIFICATION USING BIOLOGICAL KNOWLEDGE JOHN H. PHAN The Wallace H. Coulter Department of Bomedcal Engneerng, Georga Insttute of Technology, 313 Ferst Drve Atlanta,

More information

Modeling Multi Layer Feed-forward Neural. Network Model on the Influence of Hypertension. and Diabetes Mellitus on Family History of

Modeling Multi Layer Feed-forward Neural. Network Model on the Influence of Hypertension. and Diabetes Mellitus on Family History of Appled Mathematcal Scences, Vol. 7, 2013, no. 41, 2047-2053 HIKARI Ltd, www.m-hkar.com Modelng Mult Layer Feed-forward Neural Network Model on the Influence of Hypertenson and Dabetes Melltus on Famly

More information

Appendix for. Institutions and Behavior: Experimental Evidence on the Effects of Democracy

Appendix for. Institutions and Behavior: Experimental Evidence on the Effects of Democracy Appendx for Insttutons and Behavor: Expermental Evdence on the Effects of Democrac 1. Instructons 1.1 Orgnal sessons Welcome You are about to partcpate n a stud on decson-makng, and ou wll be pad for our

More information

WHO S ASSESSMENT OF HEALTH CARE INDUSTRY PERFORMANCE: RATING THE RANKINGS

WHO S ASSESSMENT OF HEALTH CARE INDUSTRY PERFORMANCE: RATING THE RANKINGS WHO S ASSESSMENT OF HEALTH CARE INDUSTRY PERFORMANCE: RATING THE RANKINGS ELLIOTT PARKER and JEANNE WENDEL * Department of Economcs, Unversty of Nevada, Reno, NV, USA SUMMARY Ths paper examnes the econometrc

More information

Incorrect Beliefs. Overconfidence. Types of Overconfidence. Outline. Overprecision 4/22/2015. Econ 1820: Behavioral Economics Mark Dean Spring 2015

Incorrect Beliefs. Overconfidence. Types of Overconfidence. Outline. Overprecision 4/22/2015. Econ 1820: Behavioral Economics Mark Dean Spring 2015 Incorrect Belefs Overconfdence Econ 1820: Behavoral Economcs Mark Dean Sprng 2015 In objectve EU we assumed that everyone agreed on what the probabltes of dfferent events were In subjectve expected utlty

More information

Project title: Mathematical Models of Fish Populations in Marine Reserves

Project title: Mathematical Models of Fish Populations in Marine Reserves Applcaton for Fundng (Malaspna Research Fund) Date: November 0, 2005 Project ttle: Mathematcal Models of Fsh Populatons n Marne Reserves Dr. Lev V. Idels Unversty College Professor Mathematcs Department

More information

A GEOGRAPHICAL AND STATISTICAL ANALYSIS OF LEUKEMIA DEATHS RELATING TO NUCLEAR POWER PLANTS. Whitney Thompson, Sarah McGinnis, Darius McDaniel,

A GEOGRAPHICAL AND STATISTICAL ANALYSIS OF LEUKEMIA DEATHS RELATING TO NUCLEAR POWER PLANTS. Whitney Thompson, Sarah McGinnis, Darius McDaniel, A GEOGRAPHICAL AD STATISTICAL AALYSIS OF LEUKEMIA DEATHS RELATIG TO UCLEAR POWER PLATS Whtney Thompson, Sarah McGnns, Darus McDanel, Jean Sexton, Rebecca Pettt, Sarah Anderson, Monca Jackson ABSTRACT:

More information

AN ENHANCED GAGS BASED MTSVSL LEARNING TECHNIQUE FOR CANCER MOLECULAR PATTERN PREDICTION OF CANCER CLASSIFICATION

AN ENHANCED GAGS BASED MTSVSL LEARNING TECHNIQUE FOR CANCER MOLECULAR PATTERN PREDICTION OF CANCER CLASSIFICATION www.arpapress.com/volumes/vol8issue2/ijrras_8_2_02.pdf AN ENHANCED GAGS BASED MTSVSL LEARNING TECHNIQUE FOR CANCER MOLECULAR PATTERN PREDICTION OF CANCER CLASSIFICATION I. Jule 1 & E. Krubakaran 2 1 Department

More information

NUMERICAL COMPARISONS OF BIOASSAY METHODS IN ESTIMATING LC50 TIANHONG ZHOU

NUMERICAL COMPARISONS OF BIOASSAY METHODS IN ESTIMATING LC50 TIANHONG ZHOU NUMERICAL COMPARISONS OF BIOASSAY METHODS IN ESTIMATING LC50 by TIANHONG ZHOU B.S., Chna Agrcultural Unversty, 2003 M.S., Chna Agrcultural Unversty, 2006 A THESIS submtted n partal fulfllment of the requrements

More information

Reconstruction of gene regulatory network of colon cancer using information theoretic approach

Reconstruction of gene regulatory network of colon cancer using information theoretic approach Reconstructon of gene regulatory network of colon cancer usng nformaton theoretc approach Khald Raza #1, Rafat Parveen * # Department of Computer Scence Jama Mlla Islama (Central Unverst, New Delh-11005,

More information

Nonstandard Machine Learning Algorithms for Microarray Data Mining. Byoung-Tak Zhang

Nonstandard Machine Learning Algorithms for Microarray Data Mining. Byoung-Tak Zhang Nonstandard Machne Learnng Algorthms for Mcroarray Data Mnng Byoung-Tak Zhang Center for Bonformaton Technology (CBIT) & Bontellgence Laboratory School of Computer Scence and Engneerng Seoul Natonal Unversty

More information

The Limits of Individual Identification from Sample Allele Frequencies: Theory and Statistical Analysis

The Limits of Individual Identification from Sample Allele Frequencies: Theory and Statistical Analysis The Lmts of Indvdual Identfcaton from Sample Allele Frequences: Theory and Statstcal Analyss Peter M. Vsscher 1 *, Wllam G. Hll 2 1 Queensland Insttute of Medcal Research, Brsbane, Australa, 2 Insttute

More information

Statistically Weighted Voting Analysis of Microarrays for Molecular Pattern Selection and Discovery Cancer Genotypes

Statistically Weighted Voting Analysis of Microarrays for Molecular Pattern Selection and Discovery Cancer Genotypes IJCSNS Internatonal Journal of Computer Scence and Network Securty, VOL.6 No.2, December 26 73 Statstcally Weghted Votng Analyss of Mcroarrays for Molecular Pattern Selecton and Dscovery Cancer Genotypes

More information

The Influence of the Isomerization Reactions on the Soybean Oil Hydrogenation Process

The Influence of the Isomerization Reactions on the Soybean Oil Hydrogenation Process Unversty of Belgrade From the SelectedWorks of Zeljko D Cupc 2000 The Influence of the Isomerzaton Reactons on the Soybean Ol Hydrogenaton Process Zeljko D Cupc, Insttute of Chemstry, Technology and Metallurgy

More information

Incorporating prior biological knowledge for network-based differential gene expression analysis using differentially weighted graphical LASSO

Incorporating prior biological knowledge for network-based differential gene expression analysis using differentially weighted graphical LASSO Zuo et al. BMC Bonformatcs (2017) 18:99 DOI 10.1186/s12859-017-1515-1 METHODOLOGY ARTICLE Open Access Incorporatng pror bologcal knowledge for network-based dfferental gene expresson analyss usng dfferentally

More information

Estimation for Pavement Performance Curve based on Kyoto Model : A Case Study for Highway in the State of Sao Paulo

Estimation for Pavement Performance Curve based on Kyoto Model : A Case Study for Highway in the State of Sao Paulo Estmaton for Pavement Performance Curve based on Kyoto Model : A Case Study for Kazuya AOKI, PASCO CORPORATION, Yokohama, JAPAN, Emal : kakzo603@pasco.co.jp Octávo de Souza Campos, Publc Servces Regulatory

More information

A Geometric Approach To Fully Automatic Chromosome Segmentation

A Geometric Approach To Fully Automatic Chromosome Segmentation A Geometrc Approach To Fully Automatc Chromosome Segmentaton Shervn Mnaee ECE Department New York Unversty Brooklyn, New York, USA shervn.mnaee@nyu.edu Mehran Fotouh Computer Engneerng Department Sharf

More information

Balanced Query Methods for Improving OCR-Based Retrieval

Balanced Query Methods for Improving OCR-Based Retrieval Balanced Query Methods for Improvng OCR-Based Retreval Kareem Darwsh Electrcal and Computer Engneerng Dept. Unversty of Maryland, College Park College Park, MD 20742 kareem@glue.umd.edu Douglas W. Oard

More information

Survival Rate of Patients of Ovarian Cancer: Rough Set Approach

Survival Rate of Patients of Ovarian Cancer: Rough Set Approach Internatonal OEN ACCESS Journal Of Modern Engneerng esearch (IJME) Survval ate of atents of Ovaran Cancer: ough Set Approach Kamn Agrawal 1, ragat Jan 1 Department of Appled Mathematcs, IET, Indore, Inda

More information
