ORIGINAL RESEARCH

Assessing the Statistical Significance of the Achieved Classification Error of Classifiers Constructed using Serum Peptide Profiles, and a Prescription for Random Sampling Repeated Studies for Massive High-Throughput Genomic and Proteomic Studies

James Lyons-Weiler a,f, Richard Pelikan b, Herbert J Zeh III c,f, David C Whitcomb d,f, David E Malehorn e,f, William L Bigbee e,f, Milos Hauskrecht b,f

a Department of Pathology, Cancer Biomarkers Laboratory, Center for Pathology Informatics, Benedum Oncology Informatics Center
b Department of Computer Science
c Department of Surgery
d Departments of Medicine, Cell Biology & Physiology, and Human Genetics
e Clinical Proteomics Facility
f University of Pittsburgh Cancer Institute
University of Pittsburgh

Abstract: Peptide profiles generated using SELDI/MALDI time of flight mass spectrometry provide a promising source of patient-specific information with high potential impact on the early detection and classification of cancer and other diseases. The new profiling technology comes, however, with numerous challenges and concerns. Particularly important are concerns of reproducibility of classification results and their significance. In this work we describe a computational validation framework, called PACE (Permutation-Achieved Classification Error), that lets us assess, for a given classification model, the significance of the Achieved Classification Error (ACE) on the profile data. The framework compares the performance statistic of the classifier on true data samples and checks if these are consistent with the behavior of the classifier on the same data with randomly reassigned class labels. A statistically significant ACE increases our belief that a discriminative signal was found in the data. The advantage of PACE analysis is that it can be easily combined with any classification model and is relatively easy to interpret. PACE analysis does not protect researchers against confounding in the experimental design, or other sources of systematic or random error. We use PACE analysis to assess significance of classification results we have achieved on a number of published data sets. The results show that many of these datasets indeed possess a signal that leads to a statistically significant ACE.

Keywords: ovarian cancer, pancreatic cancer, prostate cancer, biomarkers, bioinformatics, proteomics, disease prediction models, early detection

Introduction

High-throughput, low resolution time-of-flight mass spectrometry systems such as surface-enhanced laser desorption ionization - time of flight (SELDI-TOF) mass spectrometry (SELDI; Merchant and Weinberger, 2000; Issaq et al., 2002) and matrix-assisted laser desorption/ionization-time of flight mass spectrometry (MALDI) are just beginning to emerge as widely recognized high-throughput data sources for potential markers for the early detection of cancer (Wright et al., 1999; Adam et al., 2001; Petricoin et al., 2002). Spectra, or peptide profiles, are readily generated from easily collected samples such as serum, urine, lymph, and cell lysates. Comparisons have been made for a large number of cancers (Table 1) in search of diagnostic markers, with astonishingly good initial results for the classification of cancer and control profiles collected within respective studies.

With these very promising results, questions about the significance and reproducibility of such classification results become pressing. Reproducibility and significance are essential with these types of data since the identity of the peptides located at clinically significant m/z positions that translate to the classification accuracy is unknown and their correctness cannot be verified through independent experimental studies. The process of peptide profile generation is subject to many sources of systematic errors. If these are not properly understood they can potentially jeopardize the validity of the results. Such concerns have led to the analysis of possible biases present in published data sets and questions on the reproducibility of some of the obtained classification results under the proper experimental setup (Baggerly et al., 2004).
Such studies highlight the need for randomization of sample order acquisition and processing, maintaining constant protocols over the course of a study (including sample handling and storage conditions), identification of potential confounding factors and the use of a balanced study design whenever possible to allow proper characterization of variation in the non-diseased population. Certainly, a design matrix should be created for each study and inspected for patterns that reflect complete or severe partial incidental confounding. In addition, multi-site validation studies, which are currently ongoing in the EDRN (Early Detection Research Network), can help to identify possible problems.

Correspondence: James Lyons-Weiler, lyonsweilerj@upmc.edu
Cancer Informatics 2005:1(1)

Table 1: Published sensitivities (SN) and specificities (SP) of SELDI-TOF-MS profiling for various types of cancers

Cancer Type            SN, SP           Reference
Ovarian Cancer         100%, 95%        Petricoin et al., 2002
Prostate Cancer        100%, 100%       Qu et al., 2002
Breast Cancer          90%, 93%         Vlahou et al., 2003
Breast Cancer          91%, 93%         Li et al., 2002
Head & Neck Cancer     83%, 90%         Wadsworth et al., 2004
Lung Cancer            93.3%, 96.7%     Xiao et al., 2003
Pancreatic Cancer      78%, 97%         Koopmann et al., 2004

The peptide profile data are not perfect and include many random components. The presence of large amounts of randomness is a threat for interpretive data analysis; the randomness increases the possibility of identifying a structure and patterns in a completely uninformative signal. In such a case we want to have an additional assurance that the data and results of interpretive (classification) analysis obtained for these data are not due to chance. Permutation tests (Kendall, 1945; Good, 1994), used commonly in statistics, offer one solution to this problem and allow us to determine the significance of the result under random permutation of target labels. In this work, building upon the permutation test theory, we propose a permutation-based framework called PACE (Permutation-Achieved Classification Error) that can assess the significance of the classification error for a given classification model with respect to the null hypothesis under which the error result is generated in response to random permutation of the class labels.
The main advantage of the PACE analysis is that it is independent of the model design. This allows the problems of choosing the best disease prediction model and achieving a significant result to become decoupled. Many of the methods of high-throughput data analysis are very advanced, and thus may be poorly understood by the majority of researchers who would like to adopt a reliable analysis strategy. Understanding PACE analysis involves only visual examination of an intuitive graph (e.g., Figure 1), which makes it easy to apply and explain to the novice.

In the following we first describe the classification problem and evaluation of the classification performance. Next we introduce the PACE framework that offers additional assessment of the significance of the results. We compare PACE conceptually to existing confidence assessment methods; it is found to be potentially complementary to confidence interval-based bootstrap methods, which seek to determine whether a confidence interval around a statistic of interest includes a single point (or a series of single points; i.e., the ROC curve). Finally, we apply PACE analysis to a number of published and new SELDI-TOF-MS data sets. We demonstrate with positive and negative results the utility of reporting not only the ACE but also whether a given ACE is statistically significant. PACE thus provides a beginning reference point for researchers to determine objectively whether they have constructed a significant classifier in the discovery phase.

Evaluation of classifiers

Classification

Classification is the task of assigning class labels to sample data which come from more than one category. In our case, the classification task is to determine whether a particular proteomic profile comes from a case (cancerous) or control (non-cancerous) population. A classification model which assigns labels (either case or control) to profiles can be learned from training examples, that is, profiles with known case and control labels. The goal is to achieve a classifier that performs as well as possible on future data. Practical concerns related to classifier learning include the possibility of model overfit. Overfit occurs when the classification model is biased strongly towards the training examples and generalizes poorly to new (unseen) examples. Typically, model overfit occurs due to the inclusion of too many parameters in the model in conjunction with a small number of examples. To assess the ability of the classification model to generalize to future data we can split the data from the study into training and test sets; the training set is used in the learning stage to build the classifier, while the test set is withheld from the learning stage and is used for evaluation purposes only.

Evaluation

Training set: a collection of samples used to identify features and classification rules based on discriminatory information derived from the comparisons of features between or among groups.

Test set: a collection of samples to which the classification rules learned from the training set are applied to produce an estimate of the external generalizability of the estimated classification error. The classification error rate observed when the classifier is applied to them is called the test error rate. (Similarly, the sensitivity is called test set sensitivity, etc.) The classifier rules learned include parameters optimized using the training set that are then included in the prediction phase (for predictions on the test set). Test errors are usually higher than the training errors; Feng et al. refer to the difference as "optimism" (Z. Feng, personal communication).
Test errors are less biased than training errors, and therefore are more (but not completely) reflective of the expected classification error should the classifier be applied to new cases from the same population. The use of the test data set errors as the estimate is appropriate because it has low bias compared to the classification errors achieved using only the training data set. The test set may be a held-out set of samples, or, more commonly, a number of held-out sets to avoid inaccuracy of ACE.

Validation set: a set of samples collected and/or processed and/or analyzed in a laboratory or at a site different from the laboratory or site where the original training sets were produced. Validation sets are never included in the learning step. All validation sets are test sets but not all test sets are validation sets. The more independence there is among sample sets, laboratory protocols, and implementation of a particular method of predicting class membership, the more robust the biomarkers.

Cross-validation

Methods for estimating the test error include leave-one-out cross-validation, k-fold validation, and random subsampling validation. The selection of each of these depends in part on the number of samples available; these methods and their suitability for application to the analysis of high-throughput genomic and proteomic data sets have recently been explored (Braga-Neto & Dougherty, 2004). Use of the test error rates and performance measures derived from those rates allows one to assess the expected sensitivity (SN) and specificity (SP) of a given test or classifier; these performance measures are usually summarized in a confusion matrix. Even with these estimated performance measures, however, a more general question remains: for a broad range of potential outcomes and focus, from biomarker evaluation, discovery, validation and translation, what level of sensitivity is to be deemed significant, or sufficient, at a specified level of specificity?
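As a small illustration of the performance measures just mentioned, the sketch below tabulates a confusion matrix for a two-class test set and derives SN, SP and the test error from it. The function names, the coding 1 = case and 0 = control, and the example labels are assumptions made for this illustration; they are not taken from the study.

```python
def confusion_counts(y_true, y_pred):
    """Count (TP, FN, TN, FP); labels are assumed coded 1 = case, 0 = control."""
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 1:
            fn += 1
        elif pred == 0:
            tn += 1
        else:
            fp += 1
    return tp, fn, tn, fp


def sn_sp_error(y_true, y_pred):
    """Return sensitivity, specificity and classification error on a test set."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("the test set must contain both cases and controls")
    sensitivity = tp / (tp + fn)   # fraction of cases called cases
    specificity = tn / (tn + fp)   # fraction of controls called controls
    error = (fn + fp) / (tp + fn + tn + fp)
    return sensitivity, specificity, error


# Hypothetical test set: 4 cases and 6 controls.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
sn, sp, err = sn_sp_error(y_true, y_pred)
print(f"SN={sn:.2f} SP={sp:.2f} error={err:.2f}")
```

Under random subsampling validation, the same function would be called once per held-out set and the resulting values averaged.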
The clear overall objective of maximizing both SN and SP is built into the receiver-operator-characteristic (ROC) evaluation of a test, and the search for the most informative test usually seeks to maximize the area under the curve (AUC). Estimates of SN, SP, the ROC curve, and its area can all be determined using random subsampling validation. These approaches are well-studied, and their estimates of expected classification error are generally understood to be less biased than those estimated using training data sets.

Permutation based validation

The individual performance statistics, by themselves, do not always allow us to judge the importance of the result. In particular, one should always be concerned by the possibility that the observed statistic is the result of chance. Careful elimination of this possibility gives more credibility to the result and establishes its potential importance. Permutation test methods offer a class of techniques that make this assessment possible under a wide variety of assumptions. Expected performance under the null model varies with the specifics of a design, and the distribution of the performance statistics varies with the distribution of information among markers and the type of disease prediction model used.

Permutation test methods work by comparing the statistic of interest with the distribution of the statistic obtained under the null (random) condition. Our priority in predictive models is to critically evaluate the observed discriminatory performance. In terms of hypothesis testing, the null hypothesis we want to reject is: the performance statistic of the disease prediction model on the true data is consistent with the performance of the model on the data with randomly assigned class labels.

The objective of optimizing a classification score itself is largely uncontrolled in most genomic and proteomic high-throughput analysis studies. Researchers do not, for example, typically attempt to determine, and therefore do not report, the statistical significance of the sensitivity of a test, in spite of the existence of a number of approaches for performing such assessments. Here we introduce a permutation method for assessing significance on the achieved classification error (ACE) of a constructed prediction model.

Theory

A permutation test is a non-parametric approach to hypothesis testing, which is useful when the distribution for the statistic of interest T is unknown. By evaluating a classifier's statistic of interest when presented with data having randomly permuted labels, an empirical distribution over T can be estimated. By calculating the p-value of the statistic's value when the classifier is presented with the true data, we can determine if the classifier's behavior is statistically significant with respect to the level of confidence α.
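The idea in the preceding paragraph can be sketched in a few lines. The example below is a minimal illustration, assuming a toy one-feature nearest-class-mean classifier, a single fixed training/test split, and made-up data; none of these choices come from the study, where T is the ACE of a full analysis pipeline. The p-value is the empirical fraction of permuted-label errors that are no larger than the error on the true labels.

```python
import random


def nearest_mean_error(train, test):
    """Test error of a toy one-feature nearest-class-mean classifier.

    train and test are lists of (x, y) pairs with y coded 1 = case, 0 = control.
    """
    means = {}
    for label in (0, 1):
        values = [x for x, y in train if y == label]
        # An empty class gets an infinitely distant mean, so it is never predicted.
        means[label] = sum(values) / len(values) if values else float("inf")
    wrong = 0
    for x, y in test:
        predicted = 1 if abs(x - means[1]) < abs(x - means[0]) else 0
        wrong += predicted != y
    return wrong / len(test)


def permutation_test(xs, ys, n_train, n_perm=200, seed=0):
    """Return (t0, p_value, null) for the test error under label permutation."""
    rng = random.Random(seed)

    def statistic(labels):
        data = list(zip(xs, labels))
        return nearest_mean_error(data[:n_train], data[n_train:])

    t0 = statistic(ys)                  # statistic on the true labels
    null = []
    for _ in range(n_perm):
        permuted = list(ys)
        rng.shuffle(permuted)           # random reassignment of class labels
        null.append(statistic(permuted))
    # Empirical P(T <= t0): share of permuted errors no larger than t0.
    p_value = sum(t <= t0 for t in null) / n_perm
    return t0, p_value, null


# Hypothetical data: controls near 0, cases near 5, interleaved.
xs, ys = [], []
for i in range(10):
    xs += [0.1 * i, 5.0 + 0.1 * i]
    ys += [0, 1]
t0, p_value, null = permutation_test(xs, ys, n_train=12)
print(t0, p_value)
```

Because the two hypothetical groups are well separated, the error on the true labels is zero while permuted labels rarely achieve it, so the null hypothesis would be rejected at conventional levels of α.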
Let $\Pi^d$ be the set of all permutations of labels of the dataset with $d$ examples. The permutation test (Mukherjee et al., 2003) is then defined as:

1. Repeat $N$ times (where $n$ is an index from $1, \ldots, N$):
   a. Choose a permutation $\pi^n$ from a uniform distribution over $\Pi^d$.
   b. Compute the statistic of interest for this permutation of labels,
      \[ t_n = T(x_1, y_{\pi^n_1}, \ldots, x_d, y_{\pi^n_d}), \]
      where $(x_i, y_{\pi^n_i})$ denotes a profile-label pair in which the profile $x_i$ is assigned the label according to the permutation $\pi^n$.
2. Construct an empirical cumulative distribution over the statistic of interest:
   \[ \hat{P}(T \le t) = \frac{1}{N} \sum_{n=1}^{N} H(t - t_n), \]
   where $H$ denotes the Heaviside function.
3. Compute the statistic of interest for the actual labels, $t_0 = T(x_1, y_1, \ldots, x_d, y_d)$, and its corresponding p-value $p_0 = \hat{P}(T \le t_0)$ under the empirical distribution $\hat{P}$.
4. If $p_0 \le \alpha$, reject the null hypothesis.

For our purposes, the statistic of interest T is the achieved classification error (ACE).

Table 2: Steps in the Analysis of High-Throughput Peptide Profiling Spectra. These steps were elucidated in part in discussion with the EDRN Bioinformatics Working Group. We gratefully acknowledge their input.

Experimental Design
    Selection of type and numbers of samples to compare
Measurement
    Determination of sample rate
    Mass calibration
Preprocessing
    Profile QA/QC filtering
    Variance correction/regularization
    Smoothing
    Baseline correction
    Normalization (internal or external)
    Profile alignment
Data Representation
    Determination of profile attributes: peak selection, whole-profile, partial-profile, binning
    May also include peak-finding algorithms and peak-matching routines
Feature Selection
    Identification of profile features which are likely to be clinically significant: univariate statistical analysis, multivariate feature selection
Classification
    Rendering sample class inferences
Computational Validation / Study Design
    Calculation of an estimated classification error rate which is hopefully unbiased and accurate
    May involve: random subsampling, bootstrapping, k-fold validation, leave-one-out validation
Significance Testing of ACE
    PACE (this paper)
    Bootstrap confidence interval estimation (Efron and Tibshirani, 1997)

Application of permutation-based validation to peptide profiling (PACE)

We define a classification method f as all steps applied by a researcher to the data prior to some biological interpretation. These include the steps summarized in Table 2. In the case of SELDI/MALDI-TOF-MS, this may include mass calibration, baseline correction, filtering, normalization, peak selection, and a variety of feature selection and classification approaches.
We take the position that every researcher who has decided to approach the problem of analysis of a high-throughput proteomic data set has embarked on a journey of method development; i.e., the series of decisions made by the researcher is itself method f. We assume that the researcher has adopted a study design that employs one or more training/test set splits. For our purposes, we use 40 random training/test splits to achieve a reasonably accurate estimate of ACE. A third validation sample can be set aside to verify the statistic on the pristine data. The validation set can be produced at the same time and under the same conditions as the training/test data set. A more general estimate of the external validity of the estimate of the generalization error and its robustness to different laboratory conditions (and thus an assessment of the potential for practical (clinical) application) is obtained when the validation set is obtained at a different time or, better yet, in a different laboratory (as in multi-site validation studies).

Figure 1: Example of PACE analysis. The permutation-achieved classification error (PACE) distribution is estimated by computing a statistic (in this case, testing error) over repeated relabeling of the sample data. The top solid line indicates the mean achieved classification error (MACE) of this distribution. The lower 95th and 99th percentiles of the PACE distribution are given by the dashed and dotted lines, respectively. If the achieved classification error (ACE, bottom marked line) falls below a percentile band, it is a statistically significant result at that confidence level. In this example, ACE for a Naïve Bayes classifier using a weighted separability criterion without peak selection or de-correlation (see below for details) falls consistently below the 99th percentile band of the PACE distribution. It can be said that this classifier produced a statistically significant result at the 99% level.

Permutation-Achieved Classification Error (PACE) Analysis

Given the achieved classification error (ACE) estimated via method f, generate an arbitrarily large number of new data sets with random sample relabeling. Method f is applied to each of the permuted data sets, resulting in a null distribution of ACE (called PACE). The lower 95th and 99th percentiles are located in PACE; ACE is then compared to these percentiles to assess the statistical significance of the classifier method f.

Alternatives to PACE

The permutation based approach compares the error achieved on the true data to errors on randomly labeled data. It tries to show that the result for the true data is different from results on the random data, and thus that it is unlikely to be the consequence of a random process.
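The comparison of ACE with the percentile bands of the PACE distribution can be sketched as follows. This is a minimal illustration that assumes the permuted-label errors have already been produced by applying method f to each relabeled data set; the function names, the nearest-rank percentile rule, and the example numbers are assumptions made here, not details given in the paper.

```python
from statistics import mean


def lower_percentile(values, percent):
    """Nearest-rank percentile: smallest value with at least `percent`% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, (percent * len(ordered) + 99) // 100)   # ceil(percent * n / 100)
    return ordered[rank - 1]


def pace_summary(ace, permuted_errors):
    """Compare an achieved classification error with its permutation (PACE) distribution."""
    band95 = lower_percentile(permuted_errors, 5)   # lower 95% band
    band99 = lower_percentile(permuted_errors, 1)   # lower 99% band
    if ace < band99:
        verdict = "significant at the 99% level"
    elif ace < band95:
        verdict = "significant at the 95% level"
    else:
        verdict = "not significant"
    return {"MACE": mean(permuted_errors), "band95": band95,
            "band99": band99, "verdict": verdict}


# Hypothetical null distribution: 100 permutation errors spread from 0.300 to 0.696.
permuted_errors = [(300 + 4 * i) / 1000 for i in range(100)]
for ace in (0.12, 0.31, 0.40):
    print(ace, pace_summary(ace, permuted_errors)["verdict"])
```

In a PACE graph such as Figure 1, this comparison would be repeated for each number of features considered.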
We note that the permutation-based method is different from, and thus complementary to, standard hypothesis testing methods that try to determine confidence intervals on estimates of the target statistics. We also note that one may apply standard hypothesis testing methods to check if the target statistic for our classification model is statistically significantly different from either the fully random, trivial or any other classification model. However, the permutation framework always looks at the combination of the data label generation and classification processes and thus establishes the difference between the performance on the true and random data.

Classification error is a composite evaluation metric. Other types of performance measures for which confidence intervals have been studied so far include significance of SN at a fixed SP (Linnet, 1987), AUC (as implemented, for example, in AccuROC; Vida, 1993), and the ROC curve itself (Macskassy et al., 2003). Here we briefly explain these options.

Which performance measure to assess may vary according to strategy. Bootstrap-estimated or analytically determined confidence intervals around SN at a specified SP (Linnet, 1987) require that a desired SP be known, and this depends on the intent of the test; for example, a screening test should have very high SP to avoid resulting in too many false positives when applied to a population. Even here, however, "very high" and "too many" are rather context-dependent and should not be considered in a silo, ignoring existing or other proposed diagnostic tests. Acceptable FP values depend to a degree on the SP of existing practices, and to an extent on the prevalence of the disease. Any screen can be considered to change the prevalence of disease in the potential patient population, and therefore follow-up with panels of minimally invasive markers, or multivariate studies of numerous risk factors (demographic, familial, vaccination, smoking history), and long-term monitoring, might make such screening worthwhile. High-throughput proteomics highlights the need for dynamic clinical diagnostics. The various approaches suggested by Linnet were extended and revised with a suggestion by Platt et al. (2000) to adopt the bootstrap confidence interval method (Efron and Tibshirani, 1993). A working paper by Zhou and Qin (2003) explores related approaches.

One strategy is to perform bootstrapping (Efron and Tibshirani, 1993) and calculate a 1-α confidence interval around a measure of interest. Bootstrapping is a subsampling scheme in which N data sets are created by subsampling the features of the original data set, with replacement. Each of the N data sets is analyzed. Confidence intervals around some measure of interest (T) can be calculated or consensus information can be gathered; in either case, variability in an estimate T is used as a measure of robustness of T. Various implementations of the bootstrap are available; the least biased appears to be the bias-corrected accelerated version (Efron and Tibshirani, 1993).

A second strategy is to calculate confidence intervals around the AUC measure. Bootstrapping (Efron and Tibshirani, 1997) is sometimes used to estimate AUC confidence intervals.
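A bootstrap confidence interval for the AUC can be sketched as below. This is a minimal illustration under stated assumptions: it resamples samples (profiles) with replacement, which is the usual form of the bootstrap, and it uses the simple percentile interval rather than the bias-corrected accelerated version mentioned above. The function names, scores, and labels are hypothetical.

```python
import random


def auc(scores, labels):
    """Area under the ROC curve as the share of case/control pairs ranked correctly."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((c > k) + 0.5 * (c == k) for c in cases for k in controls)
    return wins / (len(cases) * len(controls))


def bootstrap_auc_interval(scores, labels, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap (1 - alpha) confidence interval for the AUC."""
    if not 0 < sum(labels) < len(labels):
        raise ValueError("both cases (1) and controls (0) are required")
    rng = random.Random(seed)
    n = len(scores)
    replicates = []
    while len(replicates) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]   # resample with replacement
        boot_labels = [labels[i] for i in idx]
        if 0 < sum(boot_labels) < n:                 # skip one-class resamples
            replicates.append(auc([scores[i] for i in idx], boot_labels))
    replicates.sort()
    lo_index = int(alpha / 2 * n_boot)
    hi_index = min(n_boot - 1, int((1 - alpha / 2) * n_boot))
    return replicates[lo_index], replicates[hi_index]


# Hypothetical classifier scores for 4 cases (1) and 4 controls (0).
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.7, 0.3]
labels = [0, 0, 1, 1, 0, 1, 1, 0]
point = auc(scores, labels)
low, high = bootstrap_auc_interval(scores, labels)
print(point, low, high)
```

The interval answers a different question from PACE: it describes the variability of the estimate, whereas the permutation test asks whether the estimate could have arisen from randomly labeled data.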
Relying on confidence in the AUC can be problematic because it reports on the entire ROC, and, in practice, only part of the ROC is considered relevant for a particular application (e.g., the high SP required by screening tests). A literature on assessing the significance of partial ROC curves has been developed (Dodd and Pepe, 2003; Gefen et al., 2003); a recent study (Stephan, Wesseling et al., 2003) compared the features and performance of eight programs for ROC analysis.

A third strategy is to calculate bootstrap confidence bands around the ROC curve itself (Macskassy et al., 2003). Under this approach, bootstrapping is explored and bands are created using any of a variety of sweeping methods that explore the ROC curve in one (SN) or two (SN and 1-SP) dimensions.

Experimental results of PACE analysis on clinical data

We applied PACE analysis to the following published data sets, and one new data set from the UPCI, using a number of methods of analysis:

- UPCI Pancreatic Cancer Data
- Ovarian Cancer Data (D1; Petricoin et al., 2002)
- Ovarian Cancer Data (D2; Petricoin et al., 2002)
- Prostate Cancer Data (Qu et al., 2002)

The UPCI's pancreatic cancer data are only in the preliminary stages of analysis and we report only initial results. A study with an independent validation set is ongoing. Preoperative serum samples were taken from 32 pancreatic cancer cases (17 female, 15 male). Twenty-three non-cancer controls matched for age, gender, and smoking history were analyzed; ages ranged from 34 to 87, pancreatic cancer cases had a mean age of 64, and controls had a mean age of 67 (p = 0.19). Of the cancer samples, 16 were resected; 6 patients had locally advanced unresectable disease, and 10 had metastatic disease.

The ovarian cancer datasets D1 and D2 (Petricoin et al., 2002) were obtained through the clinical proteomics program databank. Both datasets were created from the same samples, but D2 was processed using a different chip surface (WCX2) as opposed to the hydrophobic H4 chip used to generate the data in D1.
The samples consist of 100 controls: 61 samples without ovarian cysts, 30 samples with benign cysts smaller than 2.5 cm, 8 samples with benign cysts larger than 2.5 cm, and 1 sample with benign gynecological disease. The samples include 100 cases: 24 samples with stage I ovarian cancer, and 76 samples with stage II, III and IV ovarian cancer.

The prostate cancer dataset (Qu et al., 2002) was also acquired from the clinical proteomics program databank. It consists of 253 controls: 75 samples with a prostate-specific antigen (PSA) level less than 4 ng/ml, 137 samples with a PSA level between 4 and 10 ng/ml, 16 samples with a PSA level greater than 10 ng/ml, and 25 samples with no evidence of disease and PSA level less than 1 ng/ml. 69 cases exist in this dataset: 7 samples with stage I prostate cancer, 31 samples with stage II and III prostate cancer, and 31 samples with biopsy-proven prostate cancer and PSA level greater than 4 ng/ml.

Table 3: List of methods applied to datasets. Each dataset was evaluated using PACE analysis with every possible combination of these methods. MAC = maximum allowed correlation.

Method                        Options (choice of one)
Peak Detection                On (select only peaks)
                              Off (use the whole profile)
Feature Selection             Area under ROC curve (AUC)
                              Fisher score
                              J5 test
                              Simple separability criterion
                              t-test score
                              Weighted separability criterion
De-correlation Enhancement    On (MAC < 1)
                              Off (MAC = 1)
Classification Model          Naïve Bayesian Classifier
                              Support Vector Machine (SVM)

Methods Applied and Evaluated

Table 3 gives a summary of methods applied in the analysis. A brief description of some of these methods is provided below. A thorough description of these methods can be found in Hauskrecht et al. (2005, in press).

Peak detection

In some circles it is a strong belief that only peaks in a profile represent informative features of a profile. Peak detection can take place before performing further feature selection in order to limit the initial amount of the profile to be considered. There are various ways in which peak detection can be performed; for the purposes of our experiments, we utilize a peak detection method that examines the mean profile generated for all training samples, and then determines its local maxima.
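A peak detector of this kind can be sketched in a few lines. The example is only an illustration under simple assumptions: all training profiles are taken to share one m/z grid, and a local maximum is defined as a point higher than its left neighbour and not lower than its right neighbour. The study does not spell out these details, and the function names and intensities are hypothetical.

```python
def mean_profile(profiles):
    """Average intensity at each m/z position across equal-length training profiles."""
    return [sum(column) / len(profiles) for column in zip(*profiles)]


def local_maxima(profile):
    """Indices whose intensity exceeds the left neighbour and is not below the right one."""
    return [i for i in range(1, len(profile) - 1)
            if profile[i - 1] < profile[i] >= profile[i + 1]]


# Hypothetical training profiles (intensities on a shared m/z grid).
training_profiles = [
    [0.0, 2.0, 1.0, 0.0, 3.0, 1.0],
    [0.0, 4.0, 1.0, 0.0, 5.0, 1.0],
]
average = mean_profile(training_profiles)
peak_positions = local_maxima(average)
print(peak_positions)
```

Computing the peaks from the training samples only keeps the test set out of the feature definition, in line with the evaluation scheme described earlier.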
The local maxima positions become the only features considered for feature selection later in the pipeline displayed in Table 3. Alternatively, we can ignore the peak detection phase completely and consider the entire profile for feature selection.

Feature selection methods

Fisher Score: The Fisher score is intended to be a measure of the difference between distributions of a single variable. A particular feature's Fisher score is computed by the following formula:

\[ F(i) = \frac{(\mu_i^+ - \mu_i^-)^2}{(\sigma_i^+)^2 + (\sigma_i^-)^2} \]

where $\mu_i^{\pm}$ is the mean value for the $i$th feature in the positive or negative profiles, and $\sigma_i^{\pm}$ is the standard deviation. We utilize a variant of this criterion (Furey, 2000), computed with the following formula:

\[ F(i) = \frac{\left| \mu_i^+ - \mu_i^- \right|}{\sigma_i^+ + \sigma_i^-} \]

To avoid confusion, we refer to the second formula above as our Fisher-like score. Features with high Fisher scores possess the desirable quality of having a large difference between means of case versus control groups, while maintaining low overall variability. These features are more likely to be consistently expressed differently between case and control samples, and therefore indicate good candidates for feature selection.

AUC Score (for feature selection): Receiver operating characteristic curves are commonly used to measure the performance of diagnostic systems in terms of their hit-or-miss behavior. By computing the ROC curve for each feature individually, one can determine the ability of that feature to separate samples into the correct groups. Measuring the area under the ROC curve (Hanley et al., 1982) then gives an indication of the feature's probability of being a successful biomarker. The AUC score for a given feature is then obtained by integrating over the ROC curve for that feature. As with the Fisher score, higher AUC scores signify better feature candidates.

Univariate t-test: The t-test (Baldi et al., 2001) can be used to determine if the case versus control distributions of a feature differ substantially within the training set population. The t statistic, representing a normalized distance measurement between populations, is given as

\[ t = \frac{\mu_i^- - \mu_i^+}{\sqrt{\dfrac{(\sigma_i^-)^2}{n_-} + \dfrac{(\sigma_i^+)^2}{n_+}}} \]

where $\mu_i^-$, $\sigma_i^-$ are the empirical mean and standard deviation for the $i$th feature in the $n_-$ control samples, and $\mu_i^+$, $\sigma_i^+$ are likewise the empirical mean and standard deviation for the $i$th feature in the $n_+$ case samples. The t statistic follows a Student distribution with

\[ f = \frac{\left[ (\sigma_i^-)^2 / n_- + (\sigma_i^+)^2 / n_+ \right]^2}{\dfrac{\left( (\sigma_i^-)^2 / n_- \right)^2}{n_- - 1} + \dfrac{\left( (\sigma_i^+)^2 / n_+ \right)^2}{n_+ - 1}} \]

degrees of freedom. For each feature, one can then calculate the t statistic and associated $f$, and determine the associated p-value with a predetermined confidence level from a standard table of significance. Smaller p-values indicate it is unlikely that the observed case and control populations of the $i$th feature are similar by chance. Thus, it is likely that the $i$th feature is represented in a way that is statistically significant between case and control examples, making it a good candidate for feature selection.

We also evaluated feature selection using simple separability, weighted separability, and the J5 test (Patel and Lyons-Weiler, 2004).

De-correlation enhancement: After differential feature selection, we can perform further feature evaluation to avoid highly correlated features.
These may be of interest for interpreting the biological sources of variation among peptides (such as carrier proteins; Mehta et al., 2003). For the purpose of constructing independent classifiers, however, it may be better to avoid using non-independent features, not only to increase the number of distinct features included after feature selection, but also to avoid overtraining on a large number of highly correlated features. One way to avoid these correlated features is de-correlation (removal of features which are inter-correlated beyond some pre-determined threshold). All of the methods described so far can be evaluated with and without de-correlation.

Principal component analysis: Principal component analysis, a type of feature construction, incorporates aspects of de-correlation by grouping correlated features into aggregate features (components), which are presumed to be orthogonal (i.e., uncorrelated).

Classification models

Naïve Bayes: The Naïve Bayes classifier makes the assumption that the state of a feature (indicating membership in the case or control group) is independent of the states of other features when the sample's class (case or control) is known. Let $X = \{x_1, x_2, \ldots, x_n\}$ be a sample consisting of $n$ features, and $C = \{c_1, c_2, \ldots, c_m\}$ be a set of $m$ target classes to which $X$ might belong. One can compute the probability of a sample belonging to a particular class using Bayes rule:

\[ P(c_i \mid X) = \frac{P(X \mid c_i)\, P(c_i)}{\sum_{j=1}^{m} P(X \mid c_j)\, P(c_j)} \]

The likelihood of sample $X$ belonging to a particular target class $c_j$ is given as the product of each probability density function for each feature in the population of $c_j$:

\[ P(X \mid c_j) = \prod_{k=1}^{n} P(x_k \mid c_j) \]

For our purposes, we assume each feature $x_k$ follows a Gaussian distribution, although other distributions are possible. Thus, the probability density function for feature $x_k$ is

\[ P(x_k \mid c_j) = \frac{1}{\sqrt{2\pi}\,\sigma_{kj}} \exp\left[ -\frac{1}{2} \left( \frac{x_k - \mu_{kj}}{\sigma_{kj}} \right)^2 \right] \]

where $\mu_{kj}$, $\sigma_{kj}$ are the mean and standard deviation of the $k$th feature within the population of samples belonging to class $c_j$. These two values, and their corresponding pair for the control population, must be estimated using the empirical information seen in the training set for each feature. The estimates are then used in the computation above during the predictive process on the testing set.

Support Vector Machine (SVM): One might imagine a sample with $n$ features as a point in an $n$-dimensional space. Ideally, we would like to separate the $n$-dimensional space into partitions that contain all samples from either case or control populations exclusively. The linear support vector machine or SVM (Vapnik 1995, Burges 1995) accomplishes this goal by separating the $n$-dimensional space into 2 partitions with a hyperplane with the equation

\[ \mathbf{w}^T \mathbf{x} + w_0 = 0 \]

where $\mathbf{w}$ is the normal to the hyperplane and $w_0$ is its offset; the distance between the support vectors of the two classes (the margin) is $2 / \lVert \mathbf{w} \rVert$. These support vectors are the representative samples from each class which are most helpful for defining the decision boundary. The parameters of the model, $\mathbf{w}$ and $w_0$, can be learned from data in the training set through quadratic optimization using a set of Lagrange parameters $\hat{\alpha}$ (Scholkopf 2002). These parameters allow us to redefine the decision boundary as

\[ \hat{\mathbf{w}}^T \mathbf{x} + w_0 = \sum_{i \in SV} \hat{\alpha}_i\, y_i\, (\mathbf{x}_i^T \mathbf{x}) + w_0 \]

where only samples in the support vector set contribute to the computation of the decision boundary.
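The support-vector form of the decision boundary can be illustrated with a very small example. The sketch below evaluates the sum over support vectors for a linear kernel and takes its sign. The function names and the two-point data set are hypothetical; the multipliers were worked out by hand for this toy problem rather than learned by quadratic optimization.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def svm_decision_value(x, support_vectors, sv_labels, alphas, w0):
    """Evaluate sum_i alpha_i * y_i * (x_i . x) + w0 over the support vectors only."""
    return sum(a * y * dot(sv, x)
               for a, y, sv in zip(alphas, sv_labels, support_vectors)) + w0


def svm_classify(x, support_vectors, sv_labels, alphas, w0):
    """Return +1 or -1 according to the side of the hyperplane on which x falls."""
    value = svm_decision_value(x, support_vectors, sv_labels, alphas, w0)
    return 1 if value >= 0 else -1


# Hypothetical two-point problem: the maximum-margin solution has
# alpha = 0.25 for both support vectors and w0 = 0, i.e. w = (0.5, 0.5).
support_vectors = [(1.0, 1.0), (-1.0, -1.0)]
sv_labels = [1, -1]
alphas = [0.25, 0.25]
w0 = 0.0

model = (support_vectors, sv_labels, alphas, w0)
print(svm_decision_value((1.0, 1.0), *model))   # a support vector lies on the margin
print(svm_classify((2.0, 0.0), *model), svm_classify((-3.0, 1.0), *model))
```

Only the support vectors enter the sum, so samples far from the boundary have no influence on the prediction.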
Finally, the support vector machine determines a classification $\hat{y}$ for a sample $x$ as seen here:

$$\hat{y} = \operatorname{sgn}\left(\sum_{i \in SV} \hat{\alpha}_i\, y_i\, (x_i^{T} x) + w_0\right)$$

where negative $\hat{y}$'s will occur below the hyperplane, and positive $\hat{y}$'s will occur above it. Ideally, all samples from one group will have negative $\hat{y}$ while all others will have positive $\hat{y}$.

PACE Results

All four cancer datasets were analyzed using classifiers defined by differing configurations of feature selection criteria, peak selection, de-correlation, and classification models. De-correlation MAC thresholds range from 1 (no de-correlation) to 0.4 (strict de-correlation) in increments of 0.2. To assess the statistical significance of the classifiers generated through these configurations, PACE analysis was performed using 1 random permutations of the data over 4 splits into training and testing sets. Classifiers were evaluated over the range of 5 to 25 features, in increments of 5 features. For illustrative purposes, examples of PACE graphs are presented in the appendices of this work. These graphs represent only a portion of the classifiers evaluated for this work. In particular, the appendices present PACE graphs for SVM classifiers enforcing a 0.6 MAC threshold, both before and after peak selection, for each of the univariate feature selection methods.

UPCI Pancreatic Cancer Data

Each possible configuration of classification models produced a statistically significant classifier at the 99% level. This trend was observed for all feature sizes in each classifier. See figures A.1 through A.6 for examples of PACE analysis on this dataset using different feature selection criteria.

Ovarian Cancer Data (D1; Petricoin et al., 2002)

Each possible configuration of classification models produced a statistically significant classifier at the 99% level. This trend was observed for all feature sizes in each classifier. See figures B.1 through B.6 for examples of PACE analysis on this dataset using different feature selection criteria.

Ovarian Cancer Data (D2; Petricoin et al., 2002)

Each possible configuration of classification models produced a statistically significant classifier at the 99% level. This trend was observed for all feature sizes in each classifier. See figures C.1 through C.6 for examples of PACE analysis on this dataset using different feature selection criteria.

Prostate Cancer Data (Qu et al., 2002)

Under random feature selection, several classifiers were produced which were not statistically significant at the 99% or 95% level. Using the Naïve Bayes classification model, the generated classifiers were not significant at the 95% level for small numbers of features (5-15). As de-correlation became stricter, the classifiers lost statistical significance at high numbers of features where they had been significant with a more lenient MAC.
When coupling this technique with peak selection, no statistically significant classifiers were produced. With an SVM-based classifier using random feature selection, the produced classifiers were significant at the 99% level except when using the initial 5 features. Changes in MAC and peak selection did not change this behavior.

In general, Naïve Bayes classifiers using univariate feature selection criteria are significant at the 99% level as long as peak selection is not performed beforehand. The one exception was the J5 test, which was unable to produce a significant classifier at the 95% level without the aid of de-correlation. Applying de-correlation allowed these classifiers to achieve significance at the 99% level. When performing peak selection, only the classifiers produced using the strictest MAC thresholds (0.6, 0.4) were able to achieve some form of significance, and even then, only at high numbers of features (15-25). The weighted separability score was unable to produce a significant Naïve Bayes classifier using peak selection.

SVM classifiers using univariate feature selection criteria were nearly always significant at the 99% level, either with or without peak selection. The few instances where there was no significance at the 95% level occurred using the J5 and simple separability scores without de-correlation. In the case of the J5 score, lowering the MAC to 0.8 remedied the situation, while the simple separability score improved simply through incorporating additional features. See figures D.1 through D.6 for examples of PACE analysis on this dataset using different feature selection criteria.

Discussion

We have before us a daunting challenge of creating conduits of clear and meaningful communication and understanding between consumers (statisticians, computational machine learning experts, bioinformaticians) and the producers of high throughput data sets. The objective is to maximize the rate at which clinically significant patterns can be discovered and validated. Disciplines can be bridged in part by a straightforward reference point on performance provided by decision-theoretic performance measures. Nevertheless, performance characteristics that are typically reported (SN, SP, PPV, NPV) only provide partial information on performance (the method's performance in the alternative case). Researchers may be reluctant to publish results that have relatively low SN and SP (e.g., 0.75, 0.8), and yet this level of performance may in fact be highly surprising given the sample numbers and degree of variability (due to noise variance). Stellar results such as high 90's sensitivity and specificity predominate in the published cancer literature (Table 1), posing the question of whether the early reports of high performance may have set the standard too high. Some biological signal and powers of prognosis can be expected to be lower. Our work focuses on the question: what represents a remarkable SN? SP? AUC? ACE? We study this from the perspective that proteomic profiling represents only one of many different sources of potential clinically significant information, and that combined use of panels of biomarkers and other molecular and classical diagnostic information is likely to be required if proteomic profiling becomes widely adopted.

Minimize ACE: Conjecture or Tautology?

In microarray analysis, most papers describe a new algorithm or test for finding differentially expressed genes. This makes it difficult to assess the validity of a given analytical strategy (method of analysis). We recommend that a standard be considered for the assessment of the impact of particular decisions in the construction of an analytical strategy, including decisions made during pre-processing (Figs. 2 and 3). Specifically: any method that results in a significant ACE is to be preferred over methods that do not achieve significance. All significant methods (at a specified degree of significance) are equally justified for the time being.
It is possible that different methods that achieve significant ACE will identify distinct feature sets, in which case each feature set is potentially interesting. Note that we are not suggesting that reproducibility is not important; i.e., ideally, the same methods on similarly-sized different data sets should achieve similar levels of significance. Indeed, reproducibility is key; therefore, the methods that yield similar levels of significance in repeated experiments are also validated. Note also that we are not recommending that one should adopt the somewhat opposing position that "the method that minimizes ACE will tend to be most significant, and therefore will likely be best justified." In contrast, we consider it likely that clinically significant information may exist at a variety of scales within these large data sets. The search for a method, any method, with the most significant ACE from a single data set seems likely to lead to overestimates of the expected clinical utility of a set of biomarkers. Comparisons of ACE across cancer types and with independent data sets would be informative.

Nonsignificant Results

Reasons for negative results might include no biological signal, poor study design or laboratory SOPs, poor technology, or low biological signal (requiring larger numbers of samples). It is our position that researchers are better informed whether the result is significant or not. For example, a non-significant ACE may inform the researcher that they should refine or redirect their research question; an example might include early detection of a given disease providing a negative result in the pre-disease state, suggesting that one might move the focus to early stage disease instead of pre-disease. While the clinical prediction of a potential outcome during the course of disease may not be possible from the preconditioned state, the research might shift focus toward "how early can this condition be predicted?" While we report few non-significant results, we have seen non-significant results from unpublished, proprietary studies of which we cannot report the details.
The results are unpublished in part due to the negative results, and in part due to the changes in the experimental design that resulted from achieving a negative result.

Relation of PACE to Similar Methods

PACE creates a distribution of the expected ACE under the null condition. The fixed measure ACE is the average classification error over all random subsamplings. This generates a distribution around ACE, and the determination of significance could involve a comparison of the degree of overlap between the ACE and PACE distributions. As we have seen, PACE is similar in focus to a number of alternative methods with slightly distinct implementations and foci. These include the ROC bootstrap confidence interval on AUC, confidence interval estimation around SN at a fixed SP, and bootstrap bands around the ROC curve itself. The bootstrap ROC is used to determine a confidence interval around an estimated area under the ROC curve (AUC); we are most interested in the specific part of feature space where a classifier works best, not in the overall performance of a classifier over a range of stringency, and thus PACE focuses on comparing a point estimate of statistic theta to its null distribution.

A traditional limitation of permutation tests is an assumption of symmetry; in our case, we are only interested in the lower tail of the PACE distribution. In the case of individual performance measures (SN, SP) or the composite AUC, one would be interested only in the upper tail of the corresponding distribution. Symmetry is also known to be an especially important assumption when estimating the confidence interval around the AUC (Efron and Tibshirani, 1993). The question of relative suitability of these alternatives should be determined empirically to determine if any practical differences exist in this particular application. So the question is posed: which statistical assessment of confidence is of most practical (applied) interest: the specific measurement of classification error achieved by x in the learning stage of the actual study, or the distribution of the classification error in imagined alternative cases?
We prefer to make our inferences on the data set at hand, for the time being, using imagined alternatives that involve a (hopefully) well-posed null condition. The bias-corrected accelerated bootstrap confidence interval (Efron and Tibshirani, 1993), which is range-respecting and range-preserving (and unbiased, as the name suggests), corrects for differences between the median AUC of some of the pseudosamples and that of the original sample, making the imagined alternative samples more like the actual sample. This method should also be explored in this context. Some of these disparate methods could also potentially be combined (e.g., PACE as the null distribution and ROC bootstrapping to assess confidence intervals around ACE). This would use the degree of overlap of distributions instead of a specific instance outside of a generated distribution. A more formal exploration of these possibilities seems warranted.

Robustness of PACE and Permutation Approaches to the Stark Realities of High-Throughput Science

PACE provides a reference point that is robust to many of the vagaries in study design common to peptide profiling studies, such as different numbers of technical replicates per sample that result from the application of QA/QC. Compared to distribution-dependent criteria that would otherwise require adjustments to degrees of freedom, both PACE and the bootstrap are relevant for the data set at hand.

Caveats

PACE and the other methods cited here do not protect against incidental partial or complete confounding. True validation of the results of any high-throughput analysis should involve more than one site, ideally with the application of a specific classifier rule learned at site A to data generated at site B. Further, to protect against amplification of local biases by data preprocessing steps, the preprocessing must be wrapped within the permutation loop.
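The idea of comparing the achieved error with its permutation null, with preprocessing repeated inside the loop, can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: a nearest-centroid rule stands in for the Naïve Bayes and SVM classifiers, centering on the training mean stands in for preprocessing, significance is summarized as a lower-tail empirical p-value, and the permutation and split counts and the synthetic data are hypothetical.

```python
import random

def nearest_centroid_error(train_x, train_y, test_x, test_y):
    """Fit a nearest-centroid rule on the training split and return the test error."""
    n_feat = len(train_x[0])
    # Preprocessing (centering on the training mean) is done here, so it is
    # recomputed for every split and every permutation.
    mean = [sum(r[k] for r in train_x) / len(train_x) for k in range(n_feat)]

    def center(row):
        return [v - m for v, m in zip(row, mean)]

    centroids = {}
    for c in sorted(set(train_y)):
        rows = [center(r) for r, lab in zip(train_x, train_y) if lab == c]
        centroids[c] = [sum(r[k] for r in rows) / len(rows) for k in range(n_feat)]

    def predict(row):
        z = center(row)
        return min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(z, centroids[c])))

    wrong = sum(predict(r) != lab for r, lab in zip(test_x, test_y))
    return wrong / len(test_y)

def mean_error(x, y, n_splits, test_fraction, rng):
    """Average test error over random train/test splits."""
    n_test = max(1, int(len(y) * test_fraction))
    total = 0.0
    for _ in range(n_splits):
        idx = list(range(len(y)))
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        total += nearest_centroid_error([x[i] for i in train], [y[i] for i in train],
                                        [x[i] for i in test], [y[i] for i in test])
    return total / n_splits

def pace(x, y, n_permutations=50, n_splits=10, test_fraction=0.3, seed=0):
    """Return the achieved error, its permutation null, and a lower-tail p-value."""
    rng = random.Random(seed)
    ace = mean_error(x, y, n_splits, test_fraction, rng)  # true labels
    null = []
    for _ in range(n_permutations):
        permuted = list(y)
        rng.shuffle(permuted)  # randomly reassign class labels
        null.append(mean_error(x, permuted, n_splits, test_fraction, rng))
    p_value = (1 + sum(e <= ace for e in null)) / (1 + n_permutations)
    return ace, null, p_value

# Synthetic two-class data with a real signal (made-up numbers).
data_rng = random.Random(1)
x = [[data_rng.gauss(0.0, 1.0), data_rng.gauss(0.0, 1.0)] for _ in range(20)] + \
    [[data_rng.gauss(3.0, 1.0), data_rng.gauss(3.0, 1.0)] for _ in range(20)]
y = [0] * 20 + [1] * 20
ace, null, p_value = pace(x, y)
```

Only the lower tail is counted because a smaller error is the outcome of interest; any feature selection step would likewise belong inside `nearest_centroid_error` so that it is redone on every permuted data set.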
A Word on Coverage

It is important to consider in the development and evaluation of a biomarker-based classification rule whether a sample is classifiable; i.e., do the rules developed and data at hand provide sufficiently precise information on a given sample? The proportion of samples that are predictable in a data set is defined as coverage. If a strategy is adopted whereby a number of samples are not classified, the evaluation scheme (whether it be a bootstrap, random subsampling-derived confidence boundaries, or permutation significance test) should also be forced to not classify the same number of samples. These enforced passes on a sample must be checked and enforced after the prediction stage to conserve the numerical and statistical aspects of the study design and data set (e.g., number of samples; variability within m/z class). Research is needed to determine the importance of asymptotic properties, dependencies of the bootstrap ROC on the monotonicity or jaggedness of the ROC curve, and the use of combined distributions (i.e., a measure of the degree of overlap between the PACE distribution, as the null distribution, and the bootstrap ROC curve as variability in the estimated classifier performance measure of interest in separate instances of the study).

Towards a More Complete Characterization of the Problem

In the consideration of further development and improvements in analytic methods for the analysis of peptide profiles, we assume that detailed descriptions of fundamental characteristics of low-resolution peptide profiles can be used to help set priorities in the construction of particular strategies. These descriptions/observations include

an acknowledgement of somewhat high mass accuracy (0.2-0.4%);

a comprehension that individual m/z values are not specific (i.e., they are not unique to individual peptides), and therefore intensity measures at a given m/z value reflect the sum intensity of peptide m/z classes, which may or may not be functionally associated;

an understanding that peptides do not map to single individual peaks; i.e., they exist two or more times in the profile at different m/z values as variously protonated forms.
Each peptide may have a roughly unique signature, and pattern matching forms the basis of peptide fingerprint data mining, but a peptide need not occur as a single peak;

an understanding that m/z variance will contain biological sources (mass shifts due to amino acid sequence variation and varying degrees of ubiquitination and cleavage, binding of peptides with others), chemical, and physical components (mass drift), and thus models that allow the statistical accounting of each of these variance components are needed;

an understanding that high intensity measurements in SELDI-TOF-MS profiles tend to exhibit higher variance, which suggests that reliance on peaks for any inference (analyzing peaks only, aligning peaks, or normalizing profiles to peaks) may add large, unwanted components of variance or restrict findings to peptides with intensities that are most inaccurately measured;

the acknowledgment that the m/z vector is an arbitrary vector along which intensity values of similarly massed and charged peptides are arranged, and, as an arbitrary index in and of itself, may require (or deserve) no profound biological explanation and may or may not offer a profound biological insight related to the clinical questions at hand beyond a guide to identity of peptide by pattern matching;

observations that features determined to be significant tend to be locally correlated and that long-range correlations also exist, and that both artifactual and biologically important correlations and anti-correlations may exist at both distances;

an expectation that correlations may exist that reflect protonated forms of peptides and that some correlation/anti-correlation pairs may reflect real peptide biology, such as enzymatic cleavage cascades;

similarly, the observation that at least part of the local autocorrelation observed in the profile is likely due to poor resolution (mass drift), and reflects a physical property of the profiles (instrument measurement error and resolving power).
It may also reflect smoothing due to natural biological variation in the population from which the samples were drawn, or the effects of summing intensities of distinct peptides that share similar but not identical m/z values. One might consider whether the local correlations all reflect real biological properties of single peptides at particular m/z positions, and, if not, they may offer no biological insight and may require no biological explanation (i.e., local autocorrelation may be a simple artifact of the degree of resolution of the instrument and the lack of specificity of m/z values).

These descriptions may help motivate research on variance corrections, de-correlation, the use of PCA, profile alignment strategies, and attempts at transformation.

Other Open Questions

As high-throughput genomic and proteomic data become less expensive, and the laboratory equipment spreads into an increasing number of facilities, it seems likely that different laboratories will study the same problem with completely independent effort. Published data sets, therefore, represent a profoundly useful potential source of corroboration, or validation, of biomarker sets that might be expected to exhibit reproducible differences in large portions of the patient population. A careful characterization and validation of those differences, as a step that is independent of the question of potential clinical utility, is essential in these studies. True validation by planned repeated experiments may seem daunting, or unwarranted at this early stage, and the tendency will be to attempt to validate markers deemed to be significant in a small study using other technology (immunohistochemistry, for example). In this case, absence of validation of specific proteins with other technology is not complete refutation due to the potential for idiosyncrasies in this new application of mass spec technology. Computational validation applied at the step of feature selection alone could prove invaluable (i.e., which features are reproducibly different between cases and controls, responders and nonresponders, in independently analyzed subsets or splits of the data samples?).
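One way to ask which features are reproducibly selected across independently analyzed subsets is to count how often each feature lands in the top ranks over repeated random subsamples. The Python sketch below is only an illustration: the absolute difference in class means stands in for the univariate scores used in this work, subsampling is stratified by class, and the function names, split counts, and synthetic data (one informative feature among six) are hypothetical.

```python
import random
from collections import Counter

def mean_difference_scores(x, y):
    """Score each feature by the absolute difference in class means (stand-in criterion)."""
    scores = []
    for k in range(len(x[0])):
        cases = [r[k] for r, c in zip(x, y) if c == 1]
        controls = [r[k] for r, c in zip(x, y) if c == 0]
        scores.append(abs(sum(cases) / len(cases) - sum(controls) / len(controls)))
    return scores

def selection_frequency(x, y, top_k, n_splits, subset_fraction, seed=0):
    """Fraction of random subsamples in which each feature ranks among the top_k."""
    rng = random.Random(seed)
    case_idx = [i for i, c in enumerate(y) if c == 1]
    control_idx = [i for i, c in enumerate(y) if c == 0]
    counts = Counter()
    for _ in range(n_splits):
        # Stratified subsample so both classes are always represented.
        idx = rng.sample(case_idx, max(1, int(len(case_idx) * subset_fraction))) + \
              rng.sample(control_idx, max(1, int(len(control_idx) * subset_fraction)))
        scores = mean_difference_scores([x[i] for i in idx], [y[i] for i in idx])
        ranked = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
        counts.update(ranked[:top_k])
    return {k: counts[k] / n_splits for k in range(len(x[0]))}

# Synthetic data: feature 0 differs between classes, features 1-5 are noise.
data_rng = random.Random(2)
y = [0] * 15 + [1] * 15
x = [[data_rng.gauss(3.0 * c if k == 0 else 0.0, 1.0) for k in range(6)] for c in y]
freq = selection_frequency(x, y, top_k=2, n_splits=30, subset_fraction=0.5)
```

Features with a selection frequency near 1 are stable under resampling, while noise features tend to share the remaining slots roughly at random.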
Large multi-year and multi-site studies

As unlikely as large-scale repeated studies may seem, it seems imminent that studies of peptide profiles from thousands of patients and normal donors will be forthcoming. What are the practical problems in such a setting? We would advocate avoiding the temptation to view one large data set (say, 5,000 patients, 5,000 normal) as a single study, and would recommend analysis of multiple, random independent (non-overlapping) subsets, which would provide true validation of feature selection methods and classification inferences. Such large studies will occur over long time periods. Laboratory conditions change, and manufacturers change kits and protocols; thus, to maximize the generalizability of the performance characteristics of a trained classifier, training and test sets should be randomly selected and blinded. We must remember that learning is asymptotic. Therefore, researchers should avoid evaluating a classifier built on training data set 1 produced at time 1 with a testing set produced at time 2; instead, they should randomize the data over the entire time period, even if this means re-learning a classifier after publishing an initially internally valid classifier using data set 1. This approach still involves training, but protects against a biased (overly pessimistic) result due to shifts in laboratory conditions.

Future Directions in Peptide Profiling

Given that the distribution of pure noise variance over the m/z range is not uniform under the null condition, univariate feature selection methods such as t-tests, Fisher's score, area under the curve (AUC) and their nonparametric alternatives are perhaps best applied as permutation tests to attempt to equalize the Type 1 error rate over the m/z range included in an analysis. When combined with PACE, this greatly increases the computational burden of analyzing even a small set of profiles, but the pay-off should be immense. Features that are not significant under the parametric, distribution-dependent tests can become significant under the permutation test for significance, and the reverse shifts are also possible.
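A permutation test for a single feature can be sketched in a few lines of Python. The example below assumes the absolute difference in class means as the test statistic (a stand-in for the t-test, Fisher's score, or AUC named above), uses a hypothetical number of permutations, and runs on made-up intensity values at one m/z position.

```python
import random

def mean_diff(values, labels):
    """Absolute difference between the mean intensity of cases (1) and controls (0)."""
    cases = [v for v, c in zip(values, labels) if c == 1]
    controls = [v for v, c in zip(values, labels) if c == 0]
    return abs(sum(cases) / len(cases) - sum(controls) / len(controls))

def permutation_p_value(values, labels, n_permutations=999, seed=0):
    """Empirical p-value: how often shuffled labels give a statistic at least as large."""
    rng = random.Random(seed)
    observed = mean_diff(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(shuffled)  # class sizes are preserved by shuffling
        if mean_diff(values, shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (hits + 1) / (n_permutations + 1)

# Made-up intensities for one feature: five controls followed by five cases.
intensities = [5.1, 4.8, 5.3, 5.0, 4.9, 6.9, 7.2, 7.0, 6.8, 7.1]
labels = [0] * 5 + [1] * 5
p = permutation_p_value(intensities, labels)
```

Because the null distribution is built from the feature's own values, no assumption about the shape or uniformity of the noise variance across the m/z range is needed; the cost is one full set of permutations per feature.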
The difference between parametric and permutation-based significance becomes especially important when using significance levels to select n-ranked features. When permutation feature selection methods are then combined with classification algorithms such as PCA, SVM, or nearest neighbor algorithms, and then are evaluated by PACE or bootstrap methods, this clearly will require a large network dedicated to


More information

A New Machine Learning Algorithm for Breast and Pectoral Muscle Segmentation

A New Machine Learning Algorithm for Breast and Pectoral Muscle Segmentation Avalable onlne www.ejaet.com European Journal of Advances n Engneerng and Technology, 2015, 2(1): 21-29 Research Artcle ISSN: 2394-658X A New Machne Learnng Algorthm for Breast and Pectoral Muscle Segmentaton

More information

THE NATURAL HISTORY AND THE EFFECT OF PIVMECILLINAM IN LOWER URINARY TRACT INFECTION.

THE NATURAL HISTORY AND THE EFFECT OF PIVMECILLINAM IN LOWER URINARY TRACT INFECTION. MET9401 SE 10May 2000 Page 13 of 154 2 SYNOPSS MET9401 SE THE NATURAL HSTORY AND THE EFFECT OF PVMECLLNAM N LOWER URNARY TRACT NFECTON. L A study of the natural hstory and the treatment effect wth pvmecllnam

More information

Appendix for. Institutions and Behavior: Experimental Evidence on the Effects of Democracy

Appendix for. Institutions and Behavior: Experimental Evidence on the Effects of Democracy Appendx for Insttutons and Behavor: Expermental Evdence on the Effects of Democrac 1. Instructons 1.1 Orgnal sessons Welcome You are about to partcpate n a stud on decson-makng, and ou wll be pad for our

More information

Prediction of Total Pressure Drop in Stenotic Coronary Arteries with Their Geometric Parameters

Prediction of Total Pressure Drop in Stenotic Coronary Arteries with Their Geometric Parameters Tenth Internatonal Conference on Computatonal Flud Dynamcs (ICCFD10), Barcelona, Span, July 9-13, 2018 ICCFD10-227 Predcton of Total Pressure Drop n Stenotc Coronary Arteres wth Ther Geometrc Parameters

More information

Fast Algorithm for Vectorcardiogram and Interbeat Intervals Analysis: Application for Premature Ventricular Contractions Classification

Fast Algorithm for Vectorcardiogram and Interbeat Intervals Analysis: Application for Premature Ventricular Contractions Classification Fast Algorthm for Vectorcardogram and Interbeat Intervals Analyss: Applcaton for Premature Ventrcular Contractons Classfcaton Irena Jekova, Vessela Krasteva Centre of Bomedcal Engneerng Prof. Ivan Daskalov

More information

Journal of Engineering Science and Technology Review 11 (2) (2018) Research Article

Journal of Engineering Science and Technology Review 11 (2) (2018) Research Article Jestr Journal of Engneerng Scence and Technology Revew () (08) 5 - Research Artcle Prognoss Evaluaton of Ovaran Granulosa Cell Tumor Based on Co-forest ntellgence Model Xn Lao Xn Zheng Juan Zou Mn Feng

More information

Encoding processes, in memory scanning tasks

Encoding processes, in memory scanning tasks vlemory & Cognton 1976,4 (5), 501 506 Encodng processes, n memory scannng tasks JEFFREY O. MILLER and ROBERT G. PACHELLA Unversty of Mchgan, Ann Arbor, Mchgan 48101, Three experments are presented that

More information

A Linear Regression Model to Detect User Emotion for Touch Input Interactive Systems

A Linear Regression Model to Detect User Emotion for Touch Input Interactive Systems 2015 Internatonal Conference on Affectve Computng and Intellgent Interacton (ACII) A Lnear Regresson Model to Detect User Emoton for Touch Input Interactve Systems Samt Bhattacharya Dept of Computer Scence

More information

A Novel artifact for evaluating accuracies of gear profile and pitch measurements of gear measuring instruments

A Novel artifact for evaluating accuracies of gear profile and pitch measurements of gear measuring instruments A Novel artfact for evaluatng accuraces of gear profle and ptch measurements of gear measurng nstruments Sonko Osawa, Osamu Sato, Yohan Kondo, Toshyuk Takatsuj (NMIJ/AIST) Masaharu Komor (Kyoto Unversty)

More information

Sparse Representation of HCP Grayordinate Data Reveals. Novel Functional Architecture of Cerebral Cortex

Sparse Representation of HCP Grayordinate Data Reveals. Novel Functional Architecture of Cerebral Cortex 1 Sparse Representaton of HCP Grayordnate Data Reveals Novel Functonal Archtecture of Cerebral Cortex X Jang 1, Xang L 1, Jngle Lv 2,1, Tuo Zhang 2,1, Shu Zhang 1, Le Guo 2, Tanmng Lu 1* 1 Cortcal Archtecture

More information

Richard Williams Notre Dame Sociology Meetings of the European Survey Research Association Ljubljana,

Richard Williams Notre Dame Sociology   Meetings of the European Survey Research Association Ljubljana, Rchard Wllams Notre Dame Socology rwllam@nd.edu http://www.nd.edu/~rwllam Meetngs of the European Survey Research Assocaton Ljubljana, Slovena July 19, 2013 Comparng Logt and Probt Coeffcents across groups

More information

EXAMINATION OF THE DENSITY OF SEMEN AND ANALYSIS OF SPERM CELL MOVEMENT. 1. INTRODUCTION

EXAMINATION OF THE DENSITY OF SEMEN AND ANALYSIS OF SPERM CELL MOVEMENT. 1. INTRODUCTION JOURNAL OF MEDICAL INFORMATICS & TECHNOLOGIES Vol.3/00, ISSN 64-6037 Łukasz WITKOWSKI * mage enhancement, mage analyss, semen, sperm cell, cell moblty EXAMINATION OF THE DENSITY OF SEMEN AND ANALYSIS OF

More information

Unobserved Heterogeneity and the Statistical Analysis of Highway Accident Data

Unobserved Heterogeneity and the Statistical Analysis of Highway Accident Data Unobserved Heterogenety and the Statstcal Analyss of Hghway Accdent Data Fred L. Mannerng Professor of Cvl and Envronmental Engneerng Courtesy Department of Economcs Unversty of South Florda 4202 E. Fowler

More information

Estimating the distribution of the window period for recent HIV infections: A comparison of statistical methods

Estimating the distribution of the window period for recent HIV infections: A comparison of statistical methods Research Artcle Receved 30 September 2009, Accepted 15 March 2010 Publshed onlne n Wley Onlne Lbrary (wleyonlnelbrary.com) DOI: 10.1002/sm.3941 Estmatng the dstrbuton of the wndow perod for recent HIV

More information

A Support Vector Machine Classifier based on Recursive Feature Elimination for Microarray Data in Breast Cancer Characterization. Abstract.

A Support Vector Machine Classifier based on Recursive Feature Elimination for Microarray Data in Breast Cancer Characterization. Abstract. A Support Vector Machne Classfer based on Recursve Feature Elmnaton for Mcroarray Data n Breast Cancer Characterzaton. R.Campann, D. Dongovann, E. Iamper, N. Lanconell, G. Palermo, M. Roffll, A. Rccard

More information

Price linkages in value chains: methodology

Price linkages in value chains: methodology Prce lnkages n value chans: methodology Prof. Trond Bjorndal, CEMARE. Unversty of Portsmouth, UK. and Prof. José Fernández-Polanco Unversty of Cantabra, Span. FAO INFOSAMAK Tangers, Morocco 14 March 2012

More information

Normal variation in the length of the luteal phase of the menstrual cycle: identification of the short luteal phase

Normal variation in the length of the luteal phase of the menstrual cycle: identification of the short luteal phase Brtsh Journal of Obstetrcs and Gvnaecologjl July 1984, Vol. 9 1, pp. 685-689 Normal varaton n the length of the luteal phase of the menstrual cycle: dentfcaton of the short luteal phase ELIZABETH A. LENTON,

More information

Evaluation of the generalized gamma as a tool for treatment planning optimization

Evaluation of the generalized gamma as a tool for treatment planning optimization Internatonal Journal of Cancer Therapy and Oncology www.jcto.org Evaluaton of the generalzed gamma as a tool for treatment plannng optmzaton Emmanoul I Petrou 1,, Ganesh Narayanasamy 3, Eleftheros Lavdas

More information

THIS IS AN OFFICIAL NH DHHS HEALTH ALERT

THIS IS AN OFFICIAL NH DHHS HEALTH ALERT THIS IS AN OFFICIAL NH DHHS HEALTH ALERT Dstrbuted by the NH Health Alert Network Health.Alert@dhhs.nh.gov August 26, 2016 1430 EDT (2:30 PM EDT) NH-HAN 20160826 Recommendatons for Accurate Dagnoss of

More information

What Determines Attitude Improvements? Does Religiosity Help?

What Determines Attitude Improvements? Does Religiosity Help? Internatonal Journal of Busness and Socal Scence Vol. 4 No. 9; August 2013 What Determnes Atttude Improvements? Does Relgosty Help? Madhu S. Mohanty Calforna State Unversty-Los Angeles Los Angeles, 5151

More information

The Influence of the Isomerization Reactions on the Soybean Oil Hydrogenation Process

The Influence of the Isomerization Reactions on the Soybean Oil Hydrogenation Process Unversty of Belgrade From the SelectedWorks of Zeljko D Cupc 2000 The Influence of the Isomerzaton Reactons on the Soybean Ol Hydrogenaton Process Zeljko D Cupc, Insttute of Chemstry, Technology and Metallurgy

More information

AUTOMATED DETECTION OF HARD EXUDATES IN FUNDUS IMAGES USING IMPROVED OTSU THRESHOLDING AND SVM

AUTOMATED DETECTION OF HARD EXUDATES IN FUNDUS IMAGES USING IMPROVED OTSU THRESHOLDING AND SVM AUTOMATED DETECTION OF HARD EXUDATES IN FUNDUS IMAGES USING IMPROVED OTSU THRESHOLDING AND SVM Wewe Gao 1 and Jng Zuo 2 1 College of Mechancal Engneerng, Shangha Unversty of Engneerng Scence, Shangha,

More information

WHO S ASSESSMENT OF HEALTH CARE INDUSTRY PERFORMANCE: RATING THE RANKINGS

WHO S ASSESSMENT OF HEALTH CARE INDUSTRY PERFORMANCE: RATING THE RANKINGS WHO S ASSESSMENT OF HEALTH CARE INDUSTRY PERFORMANCE: RATING THE RANKINGS ELLIOTT PARKER and JEANNE WENDEL * Department of Economcs, Unversty of Nevada, Reno, NV, USA SUMMARY Ths paper examnes the econometrc

More information

National Polyp Study data: evidence for regression of adenomas

National Polyp Study data: evidence for regression of adenomas 5 Natonal Polyp Study data: evdence for regresson of adenomas 78 Chapter 5 Abstract Objectves The data of the Natonal Polyp Study, a large longtudnal study on survellance of adenoma patents, s used for

More information

Evaluation of Literature-based Discovery Systems

Evaluation of Literature-based Discovery Systems Evaluaton of Lterature-based Dscovery Systems Melha Yetsgen-Yldz 1 and Wanda Pratt 1,2 1 The Informaton School, Unversty of Washngton, Seattle, USA. 2 Bomedcal and Health Informatcs, School of Medcne,

More information

J. H. Rohrer, S. H. Baron, E. L. Hoffman, D. V. Swander

J. H. Rohrer, S. H. Baron, E. L. Hoffman, D. V. Swander 2?Hr a! A Report of Research on o ^^ -^~" r" THE STABILITY OF AUTOKINETIC JUDGMENTS J. H. Rohrer, S. H. Baron, E. L. Hoffman, D. V. Swander A techncal report made under ONR Contract Nonr-475(01) between

More information

Estimation for Pavement Performance Curve based on Kyoto Model : A Case Study for Highway in the State of Sao Paulo

Estimation for Pavement Performance Curve based on Kyoto Model : A Case Study for Highway in the State of Sao Paulo Estmaton for Pavement Performance Curve based on Kyoto Model : A Case Study for Kazuya AOKI, PASCO CORPORATION, Yokohama, JAPAN, Emal : kakzo603@pasco.co.jp Octávo de Souza Campos, Publc Servces Regulatory

More information

FAST DETECTION OF MASSES IN MAMMOGRAMS WITH DIFFICULT CASE EXCLUSION

FAST DETECTION OF MASSES IN MAMMOGRAMS WITH DIFFICULT CASE EXCLUSION computng@tanet.edu.te.ua www.tanet.edu.te.ua/computng ISSN 727-6209 Internatonal Scentfc Journal of Computng FAST DETECTION OF MASSES IN MAMMOGRAMS WITH DIFFICULT CASE EXCLUSION Gábor Takács ), Béla Patak

More information

Computing and Using Reputations for Internet Ratings

Computing and Using Reputations for Internet Ratings Computng and Usng Reputatons for Internet Ratngs Mao Chen Department of Computer Scence Prnceton Unversty Prnceton, J 8 (69)-8-797 maoch@cs.prnceton.edu Jaswnder Pal Sngh Department of Computer Scence

More information

A Support Vector Machine Classifier based on Recursive Feature Elimination for Microarray Data in Breast Cancer Characterization. Abstract.

A Support Vector Machine Classifier based on Recursive Feature Elimination for Microarray Data in Breast Cancer Characterization. Abstract. A Support Vector Machne Classfer based on Recursve Feature Elmnaton for Mcroarray Data n Breast Cancer Characterzaton. R.Campann, D. Dongovann, N. Lanconell, G. Palermo, A. Rccard, M. Roffll Dpartmento

More information

NUMERICAL COMPARISONS OF BIOASSAY METHODS IN ESTIMATING LC50 TIANHONG ZHOU

NUMERICAL COMPARISONS OF BIOASSAY METHODS IN ESTIMATING LC50 TIANHONG ZHOU NUMERICAL COMPARISONS OF BIOASSAY METHODS IN ESTIMATING LC50 by TIANHONG ZHOU B.S., Chna Agrcultural Unversty, 2003 M.S., Chna Agrcultural Unversty, 2006 A THESIS submtted n partal fulfllment of the requrements

More information

N-back Training Task Performance: Analysis and Model

N-back Training Task Performance: Analysis and Model N-back Tranng Task Performance: Analyss and Model J. Isaah Harbson (jharb@umd.edu) Center for Advanced Study of Language and Department of Psychology, Unversty of Maryland 7005 52 nd Avenue, College Park,

More information

Prototypes in the Mist: The Early Epochs of Category Learning

Prototypes in the Mist: The Early Epochs of Category Learning Journal of Expermental Psychology: Learnng, Memory, and Cognton 1998, Vol. 24, No. 6, 1411-1436 Copyrght 1998 by the Amercan Psychologcal Assocaton, Inc. 0278-7393/98/S3.00 Prototypes n the Mst: The Early

More information

Alma Mater Studiorum Università di Bologna DOTTORATO DI RICERCA IN METODOLOGIA STATISTICA PER LA RICERCA SCIENTIFICA

Alma Mater Studiorum Università di Bologna DOTTORATO DI RICERCA IN METODOLOGIA STATISTICA PER LA RICERCA SCIENTIFICA Alma Mater Studorum Unverstà d Bologna DOTTORATO DI RICERCA IN METODOLOGIA STATISTICA PER LA RICERCA SCIENTIFICA Cclo XXVII Settore Concorsuale d afferenza: 13/D1 Settore Scentfco dscplnare: SECS-S/02

More information

Estimation of Relative Survival Based on Cancer Registry Data

Estimation of Relative Survival Based on Cancer Registry Data Revew of Bonformatcs and Bometrcs (RBB) Volume 2 Issue 4, December 203 www.sepub.org/rbb Estmaton of Relatve Based on Cancer Regstry Data Olaf Schoffer *, Ante Nedostate 2, Stefane J. Klug,2 Cancer Epdemology,

More information

Incorporating prior biological knowledge for network-based differential gene expression analysis using differentially weighted graphical LASSO

Incorporating prior biological knowledge for network-based differential gene expression analysis using differentially weighted graphical LASSO Zuo et al. BMC Bonformatcs (2017) 18:99 DOI 10.1186/s12859-017-1515-1 METHODOLOGY ARTICLE Open Access Incorporatng pror bologcal knowledge for network-based dfferental gene expresson analyss usng dfferentally

More information

Non-linear Multiple-Cue Judgment Tasks

Non-linear Multiple-Cue Judgment Tasks Non-lnear Multple-Cue Tasks Anna-Carn Olsson (anna-carn.olsson@psy.umu.se) Department of Psychology, Umeå Unversty SE-09 87, Umeå, Sweden Tommy Enqvst (tommy.enqvst@psyk.uu.se) Department of Psychology,

More information

INITIAL ANALYSIS OF AWS-OBSERVED TEMPERATURE

INITIAL ANALYSIS OF AWS-OBSERVED TEMPERATURE INITIAL ANALYSIS OF AWS-OBSERVED TEMPERATURE Wang Yng, Lu Xaonng, Ren Zhhua, Natonal Meteorologcal Informaton Center, Bejng, Chna Tel.:+86 684755, E-mal:cdcsjk@cma.gov.cn Abstract From, n Chna meteorologcal

More information

The effect of salvage therapy on survival in a longitudinal study with treatment by indication

The effect of salvage therapy on survival in a longitudinal study with treatment by indication Research Artcle Receved 28 October 2009, Accepted 8 June 2010 Publshed onlne 30 August 2010 n Wley Onlne Lbrary (wleyonlnelbrary.com) DOI: 10.1002/sm.4017 The effect of salvage therapy on survval n a longtudnal

More information

Evaluation of two release operations at Bonneville Dam on the smolt-to-adult survival of Spring Creek National Fish Hatchery fall Chinook salmon

Evaluation of two release operations at Bonneville Dam on the smolt-to-adult survival of Spring Creek National Fish Hatchery fall Chinook salmon Evaluaton of two release operatons at Bonnevlle Dam on the smolt-to-adult survval of Sprng Creek Natonal Fsh Hatchery fall Chnook salmon By Steven L. Haeseker and Davd Wlls Columba Rver Fshery Program

More information

Using a Wavelet Representation for Classification of Movement in Bed

Using a Wavelet Representation for Classification of Movement in Bed Usng a Wavelet Representaton for Classfcaton of Movement n Bed Adrana Morell Adam Depto. de Matemátca e Estatístca Unversdade de Caxas do Sul Caxas do Sul RS E-mal: amorell@ucs.br André Gustavo Adam Depto.

More information

HIV/AIDS-related Expectations and Risky Sexual Behavior in Malawi

HIV/AIDS-related Expectations and Risky Sexual Behavior in Malawi Unversty of Pennsylvana ScholarlyCommons PSC Workng Paper Seres 7-29-20 HIV/AIDS-related Expectatons and Rsky Sexual Behavor n Malaw Adelne Delavande RAND Corporaton, Nova School of Busness and Economcs

More information

ENRICHING PROCESS OF ICE-CREAM RECOMMENDATION USING COMBINATORIAL RANKING OF AHP AND MONTE CARLO AHP

ENRICHING PROCESS OF ICE-CREAM RECOMMENDATION USING COMBINATORIAL RANKING OF AHP AND MONTE CARLO AHP ENRICHING PROCESS OF ICE-CREAM RECOMMENDATION USING COMBINATORIAL RANKING OF AHP AND MONTE CARLO AHP 1 AKASH RAMESHWAR LADDHA, 2 RAHUL RAGHVENDRA JOSHI, 3 Dr.PEETI MULAY 1 M.Tech, Department of Computer

More information

Journal of Economic Behavior & Organization

Journal of Economic Behavior & Organization Journal of Economc Behavor & Organzaton 133 (2017) 52 73 Contents lsts avalable at ScenceDrect Journal of Economc Behavor & Organzaton j ourna l ho me pa g e: www.elsever.com/locate/jebo Perceptons, ntentons,

More information

A GEOGRAPHICAL AND STATISTICAL ANALYSIS OF LEUKEMIA DEATHS RELATING TO NUCLEAR POWER PLANTS. Whitney Thompson, Sarah McGinnis, Darius McDaniel,

A GEOGRAPHICAL AND STATISTICAL ANALYSIS OF LEUKEMIA DEATHS RELATING TO NUCLEAR POWER PLANTS. Whitney Thompson, Sarah McGinnis, Darius McDaniel, A GEOGRAPHICAL AD STATISTICAL AALYSIS OF LEUKEMIA DEATHS RELATIG TO UCLEAR POWER PLATS Whtney Thompson, Sarah McGnns, Darus McDanel, Jean Sexton, Rebecca Pettt, Sarah Anderson, Monca Jackson ABSTRACT:

More information

Subject-Adaptive Real-Time Sleep Stage Classification Based on Conditional Random Field

Subject-Adaptive Real-Time Sleep Stage Classification Based on Conditional Random Field Subject-Adaptve Real-Tme Sleep Stage Classfcaton Based on Condtonal Random Feld Gang Luo, PhD, Wanl Mn, PhD IBM TJ Watson Research Center, Hawthorne, NY {luog, wanlmn}@usbmcom Abstract Sleep stagng s the

More information

Integration of sensory information within touch and across modalities

Integration of sensory information within touch and across modalities Integraton of sensory nformaton wthn touch and across modaltes Marc O. Ernst, Jean-Perre Brescan, Knut Drewng & Henrch H. Bülthoff Max Planck Insttute for Bologcal Cybernetcs 72076 Tübngen, Germany marc.ernst@tuebngen.mpg.de

More information

Insights in Genetics and Genomics

Insights in Genetics and Genomics Insghts n Genetcs and Genomcs Research Artcle Open Access New Score Tests for Equalty of Varances n the Applcaton of DNA Methylaton Data Analyss [Verson ] Welang Qu Xuan L Jarrett Morrow Dawn L DeMeo Scott

More information

HIV/AIDS-related Expectations and Risky Sexual Behavior in Malawi

HIV/AIDS-related Expectations and Risky Sexual Behavior in Malawi HIV/AIDS-related Expectatons and Rsky Sexual Behavor n Malaw Adelne Delavande Unversty of Essex and RAND Corporaton Hans-Peter Kohler Unversty of Pennsylvanna January 202 Abstract We use probablstc expectatons

More information

Active Affective State Detection and User Assistance with Dynamic Bayesian Networks. Xiangyang Li, Qiang Ji

Active Affective State Detection and User Assistance with Dynamic Bayesian Networks. Xiangyang Li, Qiang Ji Actve Affectve State Detecton and User Assstance wth Dynamc Bayesan Networks Xangyang L, Qang J Electrcal, Computer, and Systems Engneerng Department Rensselaer Polytechnc Insttute, 110 8th Street, Troy,

More information

Concentration of teicoplanin in the serum of adults with end stage chronic renal failure undergoing treatment for infection

Concentration of teicoplanin in the serum of adults with end stage chronic renal failure undergoing treatment for infection Journal of Antmcrobal Chemotherapy (1996) 37, 117-121 Concentraton of tecoplann n the serum of adults wth end stage chronc renal falure undergong treatment for nfecton A. MercateUo'*, K. Jaber*, D. Hfflare-Buys*,

More information

NHS Outcomes Framework

NHS Outcomes Framework NHS Outcomes Framework Doman 1 Preventng people from dyng prematurely Indcator Specfcatons Verson: 1.21 Date: May 2018 Author: Clncal Indcators Team NHS Outcomes Framework: Doman 1 Preventng people from

More information

Comparison among Feature Encoding Techniques for HIV-1 Protease Cleavage Specificity

Comparison among Feature Encoding Techniques for HIV-1 Protease Cleavage Specificity Internatonal Journal of Intellgent Systems and Applcatons n Engneerng Advanced Technology and Scence ISSN:2147-67992147-6799 http://jsae.atscence.org/ Orgnal Research Paper Comparson among Feature Encodng

More information

Nonstandard Machine Learning Algorithms for Microarray Data Mining. Byoung-Tak Zhang

Nonstandard Machine Learning Algorithms for Microarray Data Mining. Byoung-Tak Zhang Nonstandard Machne Learnng Algorthms for Mcroarray Data Mnng Byoung-Tak Zhang Center for Bonformaton Technology (CBIT) & Bontellgence Laboratory School of Computer Scence and Engneerng Seoul Natonal Unversty

More information

An Approach to Discover Dependencies between Service Operations*

An Approach to Discover Dependencies between Service Operations* 36 JOURNAL OF SOFTWARE VOL. 3 NO. 9 DECEMBER 2008 An Approach to Dscover Dependences between Servce Operatons* Shuyng Yan Research Center for Grd and Servce Computng Insttute of Computng Technology Chnese

More information

*VALLIAPPAN Raman 1, PUTRA Sumari 2 and MANDAVA Rajeswari 3. George town, Penang 11800, Malaysia. George town, Penang 11800, Malaysia

*VALLIAPPAN Raman 1, PUTRA Sumari 2 and MANDAVA Rajeswari 3. George town, Penang 11800, Malaysia. George town, Penang 11800, Malaysia 38 A Theoretcal Methodology and Prototype Implementaton for Detecton Segmentaton Classfcaton of Dgtal Mammogram Tumor by Machne Learnng and Problem Solvng *VALLIAPPA Raman, PUTRA Sumar 2 and MADAVA Rajeswar

More information

Survival Rate of Patients of Ovarian Cancer: Rough Set Approach

Survival Rate of Patients of Ovarian Cancer: Rough Set Approach Internatonal OEN ACCESS Journal Of Modern Engneerng esearch (IJME) Survval ate of atents of Ovaran Cancer: ough Set Approach Kamn Agrawal 1, ragat Jan 1 Department of Appled Mathematcs, IET, Indore, Inda

More information

Dr.S.Sumathi 1, Mrs.V.Agalya 2 Mahendra Engineering College, Mahendhirapuri, Mallasamudram

Dr.S.Sumathi 1, Mrs.V.Agalya 2 Mahendra Engineering College, Mahendhirapuri, Mallasamudram Detecton Of Myocardal Ischema In ECG Sgnals Usng Support Vector Machne Dr.S.Sumath 1, Mrs.V.Agalya Mahendra Engneerng College, Mahendhrapur, Mallasamudram Abstract--Ths paper presents an ntellectual dagnoss

More information

Economic crisis and follow-up of the conditions that define metabolic syndrome in a cohort of Catalonia,

Economic crisis and follow-up of the conditions that define metabolic syndrome in a cohort of Catalonia, Economc crss and follow-up of the condtons that defne metabolc syndrome n a cohort of Catalona, 2005-2012 Laa Maynou 1,2,3, Joan Gl 4, Gabrel Coll-de-Tuero 5,2, Ton Mora 6, Carme Saurna 1,2, Anton Scras

More information

Single-Case Designs and Clinical Biofeedback Experimentation

Single-Case Designs and Clinical Biofeedback Experimentation Bofeedback and Self-Regulaton, VoL 2, No. 3, 1977 Sngle-Case Desgns and Clncal Bofeedback Expermentaton Davd H. Barow: Brown Unversty and Butler Hosptal Edward B. Blanchard Unversty of Tennessee Medcal

More information

The Effect of Fish Farmers Association on Technical Efficiency: An Application of Propensity Score Matching Analysis

The Effect of Fish Farmers Association on Technical Efficiency: An Application of Propensity Score Matching Analysis The Effect of Fsh Farmers Assocaton on Techncal Effcency: An Applcaton of Propensty Score Matchng Analyss Onumah E. E, Esslfe F. L, and Asumng-Brempong, S 15 th July, 2016 Background and Motvaton Outlne

More information

Desperation or Desire? The Role of Risk Aversion in Marriage. Christy Spivey, Ph.D. * forthcoming, Economic Inquiry. Abstract

Desperation or Desire? The Role of Risk Aversion in Marriage. Christy Spivey, Ph.D. * forthcoming, Economic Inquiry. Abstract Desperaton or Desre? The Role of Rsk Averson n Marrage Chrsty Spvey, Ph.D. * forthcomng, Economc Inury Abstract Because of the uncertanty nherent n searchng for a spouse and the uncertanty of the future

More information

Natural Image Denoising: Optimality and Inherent Bounds

Natural Image Denoising: Optimality and Inherent Bounds atural Image Denosng: Optmalty and Inherent Bounds Anat Levn and Boaz adler Department of Computer Scence and Appled Math The Wezmann Insttute of Scence Abstract The goal of natural mage denosng s to estmate

More information

(From the Gastroenterology Division, Cornell University Medical College, New York 10021)

(From the Gastroenterology Division, Cornell University Medical College, New York 10021) ROLE OF HEPATIC ANION-BINDING PROTEIN IN BROMSULPHTHALEIN CONJUGATION* BY N. KAPLOWITZ, I. W. PERC -ROBB,~ ANn N. B. JAVITT (From the Gastroenterology Dvson, Cornell Unversty Medcal College, New York 10021)

More information

Optimal probability weights for estimating causal effects of time-varying treatments with marginal structural Cox models

Optimal probability weights for estimating causal effects of time-varying treatments with marginal structural Cox models Optmal probablty weghts for estmatng causal effects of tme-varyng treatments wth margnal structural Cox models Mchele Santacatterna, Cela García-Pareja Rno Bellocco, Anders Sönnerborg, Anna Ma Ekström

More information

Does reporting heterogeneity bias the measurement of health disparities?

Does reporting heterogeneity bias the measurement of health disparities? HEDG Workng Paper 06/03 Does reportng heterogenety bas the measurement of health dspartes? Teresa Bago d Uva Eddy Van Doorslaer Maarten Lndeboom Owen O Donnell Somnath Chatterj March 2006 ISSN 1751-1976

More information