Using Past Queries for Resource Selection in Distributed Information Retrieval

Size: px
Start display at page:

Download "Using Past Queries for Resource Selection in Distributed Information Retrieval"

Transcription

1 Purdue Unversty Purdue e-pubs Department of Computer Scence Techncal Reports Department of Computer Scence 2011 Usng Past Queres for Resource Selecton n Dstrbuted Informaton Retreval Sulleyman Cetntas Purdue Unversty, scetnta@cs.purdue.edu Luo S Purdue Unversty, ls@cs.purdue.edu Hao Yuan Purdue Unversty, yuan3@cs.purdue.edu Report Number: Cetntas, Sulleyman; S, Luo; and Yuan, Hao, "Usng Past Queres for Resource Selecton n Dstrbuted Informaton Retreval" (2011). Department of Computer Scence Techncal Reports. Paper Ths document has been made avalable through Purdue e-pubs, a servce of the Purdue Unversty Lbrares. Please contact epubs@purdue.edu for addtonal nformaton.

2 Usng Past Queres for Resource Selecton n Dstrbuted Informaton Retreval Suleyman Cetntas 1 Department of Computer Scences Purdue Unversty West Lafayette, I, 47907, USA Luo S Departments of Computer Scences and Statstcs Purdue Unversty West Lafayette, I, 47907, USA Hao Yuan Department of Computer Scences Purdue Unversty West Lafayette, I, 47907, USA SCETINTA@CS.PURDUE.EDU LSI@CS.PURDUE.EDU YUAN3@ CS.PURDUE.EDU Abstract Federated text search provdes a unfed search nterface for multple search engnes of dstrbuted text nformaton sources. Resource selecton s an mportant component for federated text search, whch selects a small number of nformaton sources that contan the largest number of relevant documents for a user query. Most pror research of resource selecton focused on selectng nformaton sources by analyzng statc nformaton of avalable nformaton sources that s sampled n the offlne manner. On the other hand, most pror research gnored a large amount of valuable nformaton lke the results from past queres. Ths paper proposes a new resource selecton technque (whch s called qsm) that utlzes the search results of past queres for estmatng the utltes of avalable nformaton sources for a specfc user query. The new algorthm calculates the query smlartes between a specfc query and all past queres, and then estmates the utltes of avalable nformaton sources by the weghted combnaton of results of past queres wth respect to the query smlartes. The new resource selecton algorthm s practcal as t does not requre relevance judgment of past queres and t only utlzes regresson based results mergng method to rank the results of past queres. Furthermore, a combned resource selecton approach s proposed to ntegrate the two approaches of learnng from past queres and usng statc sampled nformaton. An extensve set of experments demonstrate the effectveness of the new resource selecton algorthm of learnng from past queres as well as the combned resource selecton algorthm n several confguratons. Keywords: Past Queres, Resource Selecton, Federated Search 1 Correspondng Author: Phone: (765) & Fax: (765)

3 1 Introducton The prolferaton of searchable text nformaton sources on local area networks and the Internet creates a problem of fndng nformaton that s dstrbuted among many text nformaton sources (federated text search) (Callan 2000; Craswell 2000; Meng et al. 2002). Federated text search, also known as dstrbuted nformaton retreval, ncludes three sub-problems. Frst, nformaton about the contents of each ndvdual nformaton source must be acqured. Ths task s called resource representaton (Balle et al. 2009; Callan 2000; Gravano et al. 1997; Nottelmann and Fuhr 2003; S and Callan 2003a). Second, a small number of text nformaton sources should be selected for search for a specfc user query (Cetntas et al. 2009; Craswell 2000; French et al. 1999; Fuhr 1999; Gravano et al. 1999; Hawkng and Thstlewate 1999; Iperots and Gravano 2002; Nottelmann and Fuhr 2003; Powell et al. 2000; Shokouh 2007; Shokouh and Zobel 2007; S and Callan 2003a; S and Callan 2004; Zha and Lafferty 2001). Ths task s called resource selecton or collecton selecton. Thrd, after results are returned from selected nformaton sources, the ndvdual ranked lsts should be merged nto a sngle fnal ranked lst. Ths task s called results mergng (Callan 2000; Cetntas and S 2007; Krsch 1997; Larson 2002; Le Calv and Savoy 2000; Lu et al. 2005; S and Callan 2003b; Xu and Callan 1998). Most pror research work of resource selecton focused on selectng nformaton sources by analyzng the statc nformaton whch s sampled from avalable nformaton sources n an offlne manner. For example, the ReDDE (Relevant Document Dstrbuton Estmaton (S and Callan 2003a)) algorthm frst tres to buld a centralzed sampled database based on query-basedsamplng, and then uses the nformaton of the centralzed sampled database to estmate the dstrbuton of relevant documents for resource selecton. However, most prevous research gnored a large amount of valuable nformaton lke the resource selecton results of past queres. For example, n a real world search engne, there are many smlar or even duplcated queres every day. The results from those smlar past queres are very valuable to gude the resource selecton decson of a current user query. One can magne that f two user queres are very smlar, ther correspondng resource selecton results should also be close. In ths paper, we propose a new resource selecton technque (.e., qsm) that utlzes the search results of past queres for estmatng the utltes of avalable nformaton sources for a specfc user query. The new algorthm calculates the query smlartes between a specfc query and all past queres, and then estmates the utltes of avalable nformaton sources by the weghted combnaton of search results of past queres wth respect to the query smlartes. The new resource selecton algorthm s practcal as t does not requre relevance judgment of past queres, and t only utlzes regresson based results mergng method to rank the results of past queres. Furthermore, a combned resource selecton approach s proposed to ntegrate the two types of approaches: learnng from past queres and usng statc sampled nformaton. It has ts advantages to ntegrate two types of evdence. In partcular, the combned approach generates the resource selecton results by ntegratng the results of the qsm selecton algorthm and the ReDDE selecton algorthm. 2

4 Emprcal studes are conducted on two testbeds wth several confguratons to show the advantage of the new resource selecton algorthms. In partcular, we compare the performance of the new algorthms wth a state-of-the-art resource selecton algorthm as ReDDE (S and Callan 2003a) (relevant document dstrbuton estmaton). In our experments n research envronments, we followed two dfferent approaches to generate the set of past queres: ) we sampled queres for each test/onlne query by extractng the ttles of some top rankng documents n the search results of that partcular test query; ) we generated smulated queres by randomly removng some terms from test queres, and used these smulated queres as past queres. The set of all sampled queres and smulated queres for each test query forms the set of smlar past queres n ether approach. Under the usual scenaro that there exst smlar queres among past queres and onlne queres, our new qsm algorthm performs better than the pror state-of-the-art ReDDE algorthm. The larger amount of past queres we have or the more smlar the past queres are to onlne queres, the better results we can get. Furthermore, a combned resource selecton approach can provde even better resource selecton results than both the qsm and the ReDDE algorthms. Therefore when the system receves a new rare test/onlne query, t wll utlze the combned approach and perform at least as good as the ReDDE algorthm or even better than that. The rest of the paper s arranged as follows: Secton 2 dscusses the related work. Secton 3 proposes the new resource selecton approaches,.e. ) the resource selecton approach by learnng from past queres and ) the combned resource selecton approach. Secton 4 dscusses the expermental methodology. Secton 5 presents the expermental results; and fnally, Secton 6 concludes ths work. 2 Related Work There has been consderable research on all of the three sub-tasks of federated search:.e. ) acqurng resource descrpton, ) resource selecton and ) results mergng. Snce ths paper focuses on the resource selecton task, we manly survey related work n resource selecton and brefly dscuss the works n resource descrpton and results mergng. The STARTS (Gravano et al. 1997) protocol s one of the solutons to acqure the resource descrptons. It requres explct cooperaton from each nformaton source. Although STARTS s a good soluton n envronments where cooperaton can be guaranteed, t doesn t work n mult-party envronments where some nformaton sources may not cooperate. Query Based Samplng (QBS) (Callan 2000; Callan and Connell 2001) s an alternatve approach for acqurng resource descrptons snce t doesn t requre explct cooperaton from avalable sources. QBS only uses the normal process of runnng queres and retrevng a lst of downloadable documents to construct resource descrptons. Ths soluton has been shown to acqure rather accurate resource descrptons usng a relatvely small number of randomly generated queres to retreve relatvely small number of documents. There s a large body of pror research on resource selecton (e.g., Cetntas et al. 2009; Craswell 2000; French et al. 1999; Fuhr 1999; Gravano et al. 1999; Hawkng and Thstlewate 1999; Iperots and Gravano 2002; Nottelmann and Fuhr 2003; Powell et al. 2000; Shokouh 2007; Shokouh and Zobel 2007; S and Callan 2003a; S and Callan 2004; Zha and Lafferty 2001). 3

5 Space lmtatons preclude dscussng t all, so we restrct our attenton to a few that have been studed often or recently n pror research. A large body of resource selecton algorthms follows the Bg Document approach by representng each nformaton source as a sngle bg document. The smlartes between the Bg Documents and a user query are calculated to rank avalable nformaton sources. Bg Document methods nclude CORI (Callan 2000; Callan et al. 1995), ggloss (Gravano et al. 1997; Gravano et al. 1999) and CVV (Yuwono and Lee 1997). Partcularly, the CORI resource selecton algorthm has been shown to be one of the most stable and effectve Bg Document resource selecton algorthms n pror studes (e.g, Craswell 2000; French et al. 1999). The CORI resource selecton algorthm (Callan 2000; Callan et al. 1995) represents each nformaton source by usng the words that t contans, ther frequences, and a small number of corpus statstcs. Resource rankng s generated by usng a Bayesan nference network and an adaptaton of the Okap term frequency normalzaton. Detals can be found n (Callan 2000; Callan et al. 1995). However, the Bg Document approach does not consder nformaton source szes, whch s a bg problem for searchng n the envronment of a mxture of small and very large nformaton sources (S and Callan 2003a). ReDDE (S and Callan 2003a) resource selecton algorthm was proposed to address the above ssue. It explctly estmates the dstrbuton of relevant documents among avalable nformaton sources for resource selecton. ReDDE utlzes database sze estmaton and a centralzed sample database (CSDb) that conssts of the documents obtaned by query based samplng as resource descrptons. The CSDb s a representatve subset of the centralzed complete database (CCDb) whch s the unon of all the documents n avalable nformaton sources. Snce the CCDb s not avalable n the federated search envronment, ReDDE uses the CSDb (whch has already been generated by QBS) to smulate the property of CCDb. In summary, ReDDE ssues a query to an ndex created over CSDb and scores each resource ) proportonal to the number of top-ranked documents orgnatng from that resource and ) consderng the estmated sze of that resource. More detaled nformaton about CCDb and CSDb databases can be seen on Fgure 1. In partcular, ReDDE utlzes the ranked lst of a user query on CSDb to smulate the ranked lst of ths user query on CCDb. Wth the ranked lst of a user query on CSDb, ReDDE estmates the number of relevant documents to query q n an nformaton source where C j ^ 1 Re l _ q( j) = P( rel d)* * d C j _ samp C j _ samp ^ C j C as follows: j N s the estmated database sze for the nformaton source C j (e.g., by Sample- Resample method [24]); C j_samp s the set of documents that are sampled from database C j ; C (.e. the sze of C j_samp ). The only unknown n and N s the number of documents n j_samp _samp C j Equaton 1 s P(rel d ),.e. the probablty of relevance gven a specfc document. P(rel d ), the probablty of relevance of a specfc document n Equaton 1, s defned as the probablty of relevance gven document rank when a CCDb s searched by an effectve retreval method (.e., Rank_central(d ) ) as follows: (1) 4

6 ( ) P rel d where C q s a query dependent constant, C f Rank _ central( d ) < rato ˆ = 0 otherwse q all N all (2) s the estmated total number of documents n the CCDb and rato s a parameter whch s set to be as n pror experments (S and Callan 2003a). Although ths approxmaton of relevance probablty s rather rough, smlar methods have been used by many automatc relevance feedback methods. Rank_central(d ), the rank of a document d n the CCDb, s calculated as follows: c( d j ) Rank _ central( d ) = (3) d j Rank _ Samp( d j ) < Rank _ Samp( d ) where Rank_Samp(d ) s the correspondng rank n CSDb. ^ c( d j ) _ samp ^ Rel_q(j) can be calculated by pluggng Equatons 3 and 2 nto Equaton 1, so that the dstrbuton of relevant documents n dfferent nformaton sources s acqured wthout a need for relevance judgment. The dstrbuton of relevant documents among nformaton sources s suffcent to rank the avalable nformaton sources for the resource selecton task. However, all the above pror research of resource selecton focused on selectng nformaton sources by analyzng statc nformaton of avalable nformaton sources that s sampled n the offlne manner. The pror research gnored a large amount of valuable nformaton lke the results from past queres or query logs. In most real world applcatons, there s often substantal smlarty between users queres; and even many of them are duplcates of each other. Ths fact not only motvates the research to make use of the results of past queres, but also makes t possble to use the combned approach of utlzng both the valuable results from past queres and the statc nformaton of avalable nformaton sources for the resource selecton decson of a current user query. Voorhees et al. s work consdered usng results of past queres (1995). However, the task they are dealng wth s not exactly resource selecton snce ther method retreves documents from all nformaton sources. For the decson of how many documents to retreve from each source, the work n (Voorhess et al. 1995) proposed two approaches. One method s to calculate the relevant document dstrbuton among nformaton sources for a query q by averagng the relevant document dstrbutons (among all sources) of k most smlar past queres wth q. The other method s to compute the weghts of each nformaton source s most smlar query cluster wth the query q and use these weghts. Both of these methods requre relevance judgment. Some recent research has been conducted for selectng vertcal search engnes (e.g., Arguello et al. 2009; Daz et al. 2009). In partcular, research work has been conducted to utlze query log and user feedback nformaton to mprove vertcal search engne selecton. However, the research of selectng vertcal engnes was dfferent from our proposed work n several perspectves. Frst, they only studed and evaluated selectng one vertcal engne for each user query whle there are often qute a few relevant resources for each user query n tradtonal dstrbuted nformaton retreval applcatons (e.g., dgtal lbrares). Second, they treated vertcal engne selecton as a 5

7 classfcaton approach, whle our work ranks resources by the number of relevant documents contaned n avalable resources. Thrd, they utlzed human relevance judgment, whle the proposed work only utlzes results from past queres that are automatcally generated. Fnally, ther approach s a pure resource selecton method, whle the proposed method s a more complete soluton that ncludes results mergng. Recently, Cetntas et al. showed that usng the search results of past queres mproves the effectveness of resource selecton; however, ther work was lmted. Partcularly, they dd not ntegrate the two approaches of learnng from past queres and usng statc sampled nformaton. Furthermore, they dd not explore the effect of ) usng larger amount of past queres, ) more smlar past queres, ) selectng more nformaton sources, v) usng dfferent query smlarty estmaton technques on resource selecton effectveness (2009). The last step of dstrbuted nformaton retreval s mergng ranked lsts produced by dfferent search engnes. It s usually treated as a problem of transformng database-specfc document scores nto database-ndependent document scores. The CORI results mergng formula (Callan 2000) s a heurstc lnear combnaton of the database score and the document score. The Sem- Supervsed Learnng (SSL) algorthm (S and Callan 2003b) uses the documents acqured by query-based samplng as tranng data and utlzes lnear regresson to learn mergng models. When t s necessary, the Sem-Supervsed Learnng method downloads a small number of documents on the fly to create tranng data for buldng the regresson models. Please note that SSL wll be used n ths word for estmatng the relevance of nformaton sources to past queres. The documents downloaded by SSL wll be an mportant data source that wll be explaned later. More detaled nformaton about the set of SSL-downloaded documents along wth CCDb, CSDb databases and how they are used n ths work can be seen on Fgure 1. 3 ew Resource Selecton Algorthms Snce there tends to be many smlar queres n a real world federated search system, the valuable nformaton of past queres can help us provde better resource selecton results. In ths secton, we propose a novel algorthm, whch s called qsm, to utlze the valuable nformaton to gude the decson of resource selecton. Furthermore, we propose a combned resource selecton approach that utlzes both the nformaton from past queres and the statc nformaton from query-based samplng for more accurate resource selecton. The frst subsecton presents the qsm algorthm for how to use search results from past queres to gude resource selecton for onlne user queres. The second subsecton descrbes the combned resource selecton approach. 3.1 Resource Selecton by Learnng the Results from Past Queres Assume that there exsts a set of past queres, whch s denoted by P = {p 1, p 2,, p M }, where p represents the th past query. For an onlne user query q, assume that we have a specfc method to measure the smlarty between q and p (detals of the query smlarty estmaton approaches used n ths work can be found n Secton 3.1.2), let us denote the smlarty by Sm(p q). Denote the 6

8 set of nformaton sources by S = {s 1, s 2,, s N }. For a query q, we use Rel(s j q) to represent how lkely t s for the nformaton source s j to be relevant to the query q, or what (estmated) percentage of the relevant documents for the query q s n s j. We are nterested n rankng the avalable nformaton sources by comparng ther values of the Rel functon. Please note that the absolute values of Sm(p q) and Rel(s j q) are not mportant. Only the relatve dfference between Sm(p 1 q) and Sm(p 2 q) s mportant, for any par of p 1 and p 2 whle q s fxed; whch s also the case for Rel(s j q). In qsm, we can estmate the value of Rel(s j p ) for all past queres. For an onlne query q, the task s to estmate Rel(s j q) based on the nformaton of Rel(s j p ) and Sm(p q). The algorthmc framework of qsm s as follows: 1. Estmate the relevance of an nformaton source s j to a past query p,.e. estmate Rel(s j p ). Ths can be done n an offlne manner for past queres. After the federated search system has processed an onlne query, ths query can also be added to the set of past queres for future reference. 2. Estmate the smlarty between a past query p and an onlne/test query q,.e. estmate Sm(p q). 3. Estmate the relevance of an nformaton source s j to an onlne/test query q,.e. estmate Rel(s j q), accordng to the estmated relevance nformaton of smlar past queres (to ths onlne query q). 4. Rank the nformaton sources accordng to Rel(s j q), whch wll gve the most relevant nformaton sources to the onlne query q Estmatng the Relevance of Informaton to Past Queres (.e. Rel(s j p ) ) Any resource selecton algorthm can be appled to estmate the value of Rel(s j p ). If a resource selecton algorthm can explctly gve scores for all nformaton sources, we can drectly map those scores to Rel(s j p ). For example, n the ReDDE algorthm, we can set Rel(s j p ) ~ Rel_p ^ (j), where ^ (j) Rel_p s the estmated number of relevant documents n s j wth respect to p by ReDDE. In ths work, a regresson-based results mergng approach (S and Callan 2003b) s used to get a much more precse Rel(s j p ) estmaton by utlzng the search results from past queres, no matter whether a pror resource selecton has been made for p or not. The key ntuton underlyng ths approach s that the more relevant documents from an nformaton source are n the merged ranked lst of past query p, the more relevant ths nformaton source to p s. The process of ths approach s as follows: Frst, assume that a set of sources (denoted by SEL_p ) are selected for p. For the case that no pror resource selecton has been mode, we can have SEL_p =S,.e. all nformaton sources. 7

9 QBS Queres Generated by Query Based Samplng SSL Past Queres p Resource Descrpton for Source #1, S 1 Source #1, S 1 Docs downloaded from Source #1 by SSL Resource Descrpton for Source #2, S 2 Source #2, S 2 Docs downloaded from Source #2 by SSL Resource Descrpton for Source #N, S N Centralzed Sampled Database (CSDb) The collecton of documents that are sampled by Query Based Samplng Source #N, S N Centralzed Complete Database (CCDb) The collecton of all documents n all nformaton sources Docs downloaded from Source #N by SSL Set of Documents downloaded by SSL The set of documents downloaded by the SSL results mergng algorthm ReDDE Rel ReDDE(s j q) Past Query p Test Query q Sm(p q) Rel(s p ) Rel qsm(s j q) Rel Combned(s j q) Fg. 1 of evdence for ReDDE and qsm resource selecton approaches. A federated search envronment conssts of several nformaton sources. The set of all documents n all nformaton sources consttute the Centralzed Complete Database (CCDd) (shown on left). Note that CCDb s a hypothetcal database, whch s not avalable n a federated search envronment. Therefore Query Based Samplng (QBS) samples documents from every nformaton source n order to acqure the resource descrptons. The set of all documents sampled by QBS form the centralzed sampled database (CSDb) whch s used by ReDDE resource selecton algorthm (Callan 2000; Callan and Connell 2001). In ths work, the combned resource selecton approach that uses the resource selecton decsons of ReDDE, ndrectly uses the CSDb (snce ReDDE uses t). For estmatng of the relevance of nformaton sources to past queres, SSL results mergng algorthm (S and Callan 2003b) s used. In order to buld the regresson models, SSL downloads a partcular number of documents from each source; and the set of all downloaded document consttute the SSL-downloaded documents database (shown on rght). Note that SSL-downloaded documents are used for the estmaton of relevance of nformaton sources to past queres (.e. Rel(s j p )) and for retreval-based query smlarty estmaton (.e. Sm(p q)). Some statstcs about the number of documents n each database can be found n the dscusson n Secton 4.3 Second, we use the regresson based Sem-Supervsed Learnng (SSL) method (S and Callan 2003b) to merge the search results from those sources n SEL_p nto a sngle ranked lst. SSL wll gve us the ranked lst of the most relevant documents from all nformaton sources n SEL_p to the past query p. 8

10 Thrd, the nformaton sources n SEL_p are assgned relevance scores proportonal to how many of ther documents are n the sngle ranked lst of relevant documents to p (as; f a source s more relevant to p, more documents from ths source wll appear n the ranked lst). Partcularly, f a source s j s not n SEL_p, then Rel(s j p ) = 0. If the source s j s n SEL_p, then Rel(s j p ) s estmated by: where T s a pre-defned number, and 1, f doc s j; Re l( s j doc ) = (5) 0, otherwse. The value T,.e. how many documents are consdered n the sngle ranked lst, can be very small lke 20. In our experments, t s set to 20. Ths helps to mnmze the number of nteractons wth nformaton sources. A typcal search engne (such as Google, Yahoo, etc.) only returns 10 or 20 search results or document ds n one page by default. Please note that the SSL algorthm needs to download some documents for each past query p as tranng data to buld the regresson models for mergng the search results from nformaton sources n SEL_p n order to generate the sngle ranked lst for that past query p. Ths s due to the fact although SSL can use the documents n the CSDb as the tranng documents to buld the regresson models, t s not allowed to do so n order to evaluate the resource selecton approach of learnng from past queres n a strct manner. The set of all SSL downloaded tranng documents for all past queres create the database that s used for ) for the estmaton of the relevance of nformaton sources to past queres, and ) for the estmaton of the smlartes of past queres and onlne queres (wll be explaned n the next secton). In ths work, 3 top documents are downloaded as the tranng data for each past query p to buld the regresson models for results mergng, n a smlar manner as the mnmum downloadng method of the SSL results mergng algorthm (S and Callan 2003b). Re l( s j doc) Re l( s j p ) T (4) top T docs n the merged ranked lst Estmatng the Smlarty Between Past Queres and Onlne Queres (.e. Sm(p q) ) There are many ways to measure the smlartes between queres (.e. between past queres and an onlne query), such as term-based approach (cosne measure, edt dstance, latent semantc analyss, etc.), selecton-based approach and retreval-based approach (by comparng the retreval lsts). In our experments, we use two approaches: ) a retreval-based approach and ) term-based approach (cosne smlarty) to calculate the smlartes. Although the retreval-based approach s used as the default approach to calculate the query smlartes, the results acqured wth termbased approach s also used to test the robustness of the proposed algorthms (.e. to make sure that the proposed technques work well regardless of the dfferent characterstcs of dfferent query smlarty estmaton approaches). 9

11 ) Retreval-Based Query Smlarty Estmaton Retreval-based query smlarty estmaton approach compares a past query p and an onlne query q by comparng the smlarty of the retreval results of those queres on a collecton. In ths work, the set of SSL-downloaded documents (explaned n Secton ) are used as the collecton that queres to be compared are searched on. Partcularly, let R_p (or R_q) denote the search results (.e. the ranked lsts) of p (or q) on the SSL-downloaded documents database. The smlarty of p wth respect to q s measured by 1 Sm p q Score doc R R (6a) ( ) (, _ p, _ q) R _ p doc R _ p R _ q where the score functon for a matchng document s gven by: Score( doc, R _ p, R _ q) doc rank n R _ p doc rank n R _ q) = 1 R _ p R _ q (6b) The value of Sm(p q) wll be hgher f the set of documents that are common to R_p and R_q rank smlarly n R_p and R_q. In practce, we cut each search results (R_p and R_q) by only focusng on the top 20% returned documents n ther rank lsts; and normalze the values of Sm(p q) n a smple approach as follows: MAXSIM _ q = max Sm( p q) SIMTHRES _ q= 0.8 MAXSIM _ q (7) 0, f Sm( p q) < SIMTHRES _ q; normalzed ( Sm( p q) - SIMTHRES _ q) (8) Sm( p ),. q otherwse ( MAXSIM _ q - SIMTHRES _ q) Retreval-based smlarty estmaton approach s used as the default query smlarty estmaton approach n ths work due to the fact that term-based smlarty estmaton approaches are not as successful n fndng the smlarty between past and onlne queres when queres have lmted number of terms. More detaled dscusson on the comparson of retreval-based smlarty approach and term-based smlarty approach can be seen on Fgure 2. ) Term-Based Query Smlarty Estmaton (Cosne Measure) As a second method of calculatng the smlartes between past queres and onlne queres, we use the common cosne smlarty (Baeza-Yates and Rbero-Neto 1999). If we denote the bag-ofwords vector representatons of past query p and onlne query q as and q, the cosne smlarty s gven by: p q Sm( p q) cos( p, q) = (9) p q where s the dot product of these vectors. We use the common Okap weghtng scheme to calculate smlartes between past queres and onlne queres; and do normalzaton to scale the smlarty scores between 0 and 1. p 10

12 We use term-based smlarty as the secondary method of query smlarty estmaton approach and used t only wth the retreval-based query smlarty approach to test the robustness of the proposed algorthms (.e. to make sure that the proposed technques work well regardless of the dfferent characterstcs of dfferent query smlarty estmaton approaches). More detaled dscusson on the comparson of retreval-based smlarty approach and term-based smlarty approach can be seen on Fgure Estmatng the Relevance of Informaton to Onlne Queres (.e. Rel(s j q)) For an onlne query q, we predct the relevance of nformaton sources (.e. Rel(s j q)) based on the estmated relevance nformaton from smlar past queres. It s calculated by: Rel( s q) Rel( s p ) Sm( p q) j j In practce, we only consder the K most smlar past queres when we calculate the above summaton. In our experments, K s set to Resource Selecton Accordng to Rel(s j q) (10) The last step s to smply rank the nformaton sources accordng to the values of Rel(s j q). A larger value of Rel(s j q) means that t s more lkely for the source s j to contan more relevant nformaton wth respect to q. As we mentoned before, we can add a query q nto the set of past queres after ths query has been processed Dfferent Levels of Past Search Results For a past query p, the amount of search results wll affect the qualty of the estmated Rel(s j p ). The amount of results s measured by the number of nformaton sources selected and searched for generatng search results. If more sources have been searched, the amount of search results for the past query s more comprehensve. We call the qsm algorthm by qsm-cut-x where the number of sources selected and searched for p s X, or n other words SEL_p, =X. In a real world federated search applcaton wth N=100 nformaton sources, the typcal number of sources that are chosen for the search task s about X = 5, 10 or at most Combned Approach for Resource Selecton In ths subsecton, we propose a smple way to combne qsm wth other resource selecton algorthms lke ReDDE to acheve better performance. In partcular, we focus on the combnaton of qsm and ReDDE. Assume that ReDDE generates the score of a partcular nformaton source s j as Rel_ ReDDE (s j q) for resource selecton. We denote the correspondng resource selecton score from qsm as Rel_ qsm (s j q), and then we can smply combne the values of Rel_ ReDDE (s j q) and Rel_ qsm (s j q) n a lnear way: 11

13 Rel _combned ( s j q) + λ Rel _qsm ( s j q) max jrel _qsm ( s j q) Rel _ReDDE ( s j q) (1- λ) max j Rel _ReDDE ( s j q ) (11) where λ s a real number wthn [0,1]. In our experments, λ s set to 1/3. Fnally, we generate the resource selecton results accordng to the values of Rel_ combned (s j q). 4 Expermental Methodology It s most desrable to evaluate federated search algorthms wth real world applcatons. However, real world applcatons are usually short of relevance judgment data and t s dffcult to obtan full control of dfferent components of the systems (e.g., varyng retreval algorthms of each nformaton source). These dffcultes prevent us from dong thorough study wth real world applcatons. Instead, an extensve set of experments was desgned n research envronments to smulate real world applcatons such as (Avraham et al. 2006). 4.1 Testbeds Experments were carred out on Trec123 and Trec4 testbeds. Detals about Trec123 and Trec4 datasets are gven n Table 1. Trec col-bysource (Trec123): 100 nformaton sources were created from TREC CDs 1, 2 and 3. They are organzed by source and publcaton date. Trec4-bysource (Trec4): 100 nformaton sources were created from TREC4 accordng to the sources of the documents n TREC4. For Trec123 testbed, 50 queres were created from the ttle felds of TREC topcs For Trec4, another 50 queres were created from the descrpton felds of TREC topcs These queres wll be used as the set of test/onlne queres and wll be referred to test, onlne or real queres throughout the paper. The detaled statstcs about the test queres can be found n Table 2. All the 100 nformaton sources were assgned one of three types of retreval algorthms as INQUERY (Callan et al. 1995), a ungram language model wth lnear smoothng (Lemur Toolkt; Zha and Lafferty 2001) and a TFIDF retreval algorthm wth the ltc weghng schema (S and Callan 2003b). All the algorthms were mplemented wth the lemur toolkt (Lemur Toolkt, Oglve and Callan 2001). 4.2 Sampled and Smulated Past Query Sets In a real world federated search system, there often exst many smlar queres. However n Trec123 and Trec4 testbeds, we don t have many smlar queres. In order to better smulate the characterstcs of a real world federated search system, we use two dfferent approaches to generate two dfferent sets of past queres usng the test queres that are avalable n Trec123 and Trec4 datasets. Then the set of real queres that are avalable from Trec123 and Trec4 datasets are used as the set of test/onlne queres and ether one of the two sets of generated past queres are 12

14 Table 1. Summary statstcs for Trec123 and Trec4 dstrbuted IR testbeds. Sze umber of Documents Sze (MB) Testbed (GB) (x1000) Mn Avg Max Mn Avg Max Trec Trec Table 2. TREC query set statstcs. Collectons TREC Topc Set TREC Topc Feld Average Length (Words) Trec Ttle 2.92 Trec Descrpton 8.82 used as the set of past queres. Specfcally the experments are conducted separately on the two sets of generated past queres to test the robustness of the proposed algorthms (.e. to make sure that the proposed technques work well regardless of the dfferent characterstcs of the sets of generated past queres). The frst approach of generatng the set of past queres samples queres for each test/onlne query by extractng the ttles of some top rankng documents n the search results of that partcular test query. The past queres generated by ths approach wll be referred as sampled past queres. The second approach of generatng the set of past queres creates the past queres from test queres by randomly removng some terms from the test queres. The past queres generated by ths approach wll be referred as smulated past queres. To reterate; the experments are usng ether the sampled past queres or the smulated past queres as the set of past queres. More detaled explanaton of these two approaches s gven n the followng two subsectons Sampled Past Queres Ths subsecton descrbes the frst approach to generate the sampled past queres. Two sets of sampled past queres are generated for Trec123, whch are called Trec123_T20 and Trec123_T10. The past queres n Trec123_T20 (or Trec123_T10) were generated n the followng way: A new Trec123 database called Sub_Trec123 s constructed, whch conssts of the documents that exst n the orgnal Trec123 database and that do not exst n the CSDB of the ReDDE algorthm. The documents n CSDB of ReDDE are partcularly excluded n order not to explot the advantage of the documents of CSDB of ReDDE that are used as resource descrptors. For each test query, TOPX most smlar documents that have a ttle (not all the documents have a ttle) are retreved from Sub_Trec123 database. The ttles of these TOPX documents construct the set of X sampled queres for ths test query (.e. each ttle becomes a new sampled past query). Please note that X s 10 for Trec123_T10 and 20 for Trec123_T20 query set. For 50 test queres, the above samplng process s repeated and a total of 1000 past queres are acqured for Trec123_T20 and a total of 500 past queres are acqured for Trec123_T10. In the 13

15 Table 3a. Statstcs of sampled past queres. Collectons Sampled Past Query Set # of TOP Docs (ttles extracted) Average Length (Words) Trec123 Trec123_T Trec123 Trec123_T Trec4 Trec4_T Trec4 Trec4_T Table 3b. Statstcs of smulated past queres. Collectons Smulated Past Query Set # of Words Removed Average Length (Words) Trec123 Trec123_R Trec123 Trec123_R Trec4 Trec4_R Trec4 Trec4_R same way descrbed above, two sets of sampled past queres are also generated for Trec4 testbed, whch are called Trec4_T20 and Trec4_T10. Trec4_T20 & Trec4_T10 have 1000 & 500 sampled queres respectvely. In the experments that used the sampled past queres as the set of past queres; the 50 real queres are treated as test/onlne user queres, and the 1000 sampled past queres (.e. the past queres n Trec123_T20 or Trec4_T20 sampled past query sets) or 500 sampled queres (.e. n Trec123_T10 or Trec4_T10 sampled past query sets) are treated as the past queres. Detals about sampled past queres can be seen n Table 3a Smulated Past Queres Ths subsecton descrbes the second approach to generate the smulated past queres. Two sets of smulated past queres are generated for Trec123, whch are called Trec123_R1 and Trec123_R2. Each set contans 50 smulated queres. The past queres n Trec123_R1 (or Trec123_R2) were generated n the followng way: For each test query, 1 (or 2) term(s) s (are) removed to generate a smulated past query; For each smulated past query at least 2 terms are kept to make sure the smulated past query s not too short to be meanngful. (e.g., n Trec123_R2, f a real query only contans 3 terms, only 1 term s removed nstead of 2 to make sure the smulated query contans at least 2 terms.) For Trec4 testbed, two levels of smulated past queres are also generated, whch are called Trec4_R2 and Trec4_R3. Each set contans 50 smulated queres. The past queres n Trec4_R2 (or Trec4_R3) were generated by the followng way: For each test query, 2 (or 3) terms are randomly removed to generate a smulated past query; For each smulated past query at least 3 terms are kept. 14

16 Normalzed Smlarty Score Normalzed Smlarty Score Past Query Rank Normalzed Smlarty Score Normalzed Smlarty Score Past Query Rank Past Query Rank a) retreval-based smlarty on Trec123_T20 past query set Past Query Rank b) term-based smlarty on Trec123_T20 past query set Normalzed Smlarty Score Normalzed Smlarty Score Past Query Rank Normalzed Smlarty Score Normalzed Smlarty Score Past Query Rank Past Query Rank c) retreval-based smlarty on Trec123_R1 past query set Past Query Rank d) term-based smlarty on Trec123_R1 past query set Fg. 2 Statstcs of Retreval-Based and Term-Based Smlarty measures over Trec123_T20 and Trec123_R1 past query sets. Each onlne/test query has a set of smlar past queres sorted wth respect to the normalzed smlarty scores whle determnng the set of most smlar past queres. The dstrbuton of the normalzed smlarty scores between an onlne query and the set of past queres gve the nformaton of the normalzed smlarty score of the most smlar past query at a partcular rank wth respect to an onlne query (.e. the normalzed smlarty score of the k th most smlar past query wth the onlne query q). The average of the dstrbuton of normalzed smlarty scores across all onlne queres (y-axes) for ranks 1-to-50 (x-axes) s calculated for Trec123_T20 and Trec123_R1 past query datasets wth retreval-based and termbased smlarty measures and s reported n each graph. Detaled vew of each graph for the top 10 most smlar past queres (note that n ths work we only choose the top 5 most smlar past queres for each test query) can be seen on the smaller graphs on the rght top of each graph. An mportant observaton s that for queres that have very lmted number of terms (such as Trec123_R1 and Trec123_R2) term-based smlarty measure cannot calculate the smlarty between the past and onlne queres as precse as the retreval based smlarty approach as shown on Graphs c & d. Ths s due to the fact that although there s some smlarty between a past query and an onlne query, term-based approach cannot detect t f there are no common terms. But when the queres have enough number of terms, term-based approach also performs as good as the retreval-based approach n fndng the smlarty between past and onlne queres as shown on Graphs a & b. In the experments that used the smulated past queres as the set of past queres; the 50 real queres are treated as test/onlne user queres, and the 50 smulated queres (.e. n Trec123_R1, Trec123_R2, Trec4_R2 or Trec4_R3 smulated past query sets) are treated as past queres. Generally speakng, smulated past queres n Trec123_R1 are more smlar than Trec123_R2 wth 15

17 respect to the Trec123 onlne queres and smulated past queres n Trec4_R2 are more smlar than Trec4_R3 wth respect to the Trec4 onlne queres. Detals about smulated past queres can be seen n Table 3b. Snce the processes of generatng the smulatng queres nclude randomness, each set of smulated queres (e.g., Trec123_R1) has been generated for 15 tmes. Accordngly, the resource selecton experments that use smulated past queres are run 15 tmes for each set of smulated queres and the average results of all of these runs are reported for evaluaton. 4.3 Baselne Method In the experments, we use ReDDE to do resource selecton for past queres (.e. to determne SEL_p ). Also, ReDDE s a baselne resource selecton algorthm to compare wth qsm. For qsm, there are ether 1000 or 500 past queres at most (.e. for sampled queres), and for each past query 3 top documents per source are downloaded by SSL n order to buld a regresson model to merge search results nto a sngle ranked lst. So n total, we need to download at most 3000 or 1500 documents per source. The number s actually much smaller than 3000 or 1500 as only a very small porton of sources are selected for searchng each past query: for Trec123(or Trec4)_T10 we download about 150 and for Trec123(or Trec4)_T20 we download about 300 documents per source on average. For ReDDE, we buld the CSDb, whch contans 150 documents per source by QBS. All other parameters n ReDDE are the same as that n [23]. Although the types of evdence that qsm and ReDDE use are dfferent and t wll not be perfectly far to make a comparson, they are stll comparable by the amount of documents they use. 4.4 Evaluaton Metrc The recall metrc R k has been commonly used to evaluate resource selecton algorthms [2,7,23]. Let B denote a desrable rankng of avalable nformaton sources by Relevance-Based Rankng (.e., rankng of sources by the actual number of relevant documents), and E a rankng provded by a resource selecton algorthm. Let B and E denote the number of relevant documents n the th ranked database of B or E. The metrc R k s defned as follows: Rk = k k E = 1 (11) B = 1 The metrc above measures what percentage the dfference s between the estmated rankng and the most desrable rankng. Therefore, at a fxed k, a larger R k value ndcates a better rankng. 5 Experment Results An extensve set of experments are conducted to address the followng sx questons: 16

18 Table 4a. Comparson between ReDDE wth qsm-cut-10 and qsm-cut-20 on Trec123 dataset wth Trec123_T20 sampled past query set. Please note that the results for qsm- Cut-10 on Trec123 dataset wth Trec123_T20 sampled past query set has been reported n our prevous work (Cetntas et al. 2009). Trec123 Trec123_T20 qsm-cut-20 on Trec123_T (39.66%) (62.02%) (17.73%) (38.30%) (26.15%) (28.86%) (17.31%) (29.17%) (17.26%) (23.67%) (09.30%) (16.70%) (07.84%) (15.08%) (08.66%) (16.44%) (07.83%) (18.45%) (05.83%) (17.94%) Overall Improvement 15.76% 26.67% Table 4b. Comparson between ReDDE wth qsm-cut-10 and qsm-cut-20 on Trec4 dataset wth Trec4_T20 sampled past query set. Please note that the results for qsm- Cut-10 on Trec4 dataset wth Trec4_T20 sampled past query set has been reported n our prevous work (Cetntas et al. 2009). Trec4 Trec4_T20 qsm-cut-20 on Trec4_T (17.65%) (-08.16%) (-02.56%) (08.89%) (08.73%) (20.17%) (12.85%) (14.51%) (16.01%) (16.51%) (11.60%) (15.39%) (09.44%) (15.08%) (02.29%) (10.42%) (-01.96%) (06.87%) (-03.52%) (04.41%) Overall Improvement 7.05% 10.41% 1. How good s the new resource selecton algorthm by learnng results from past queres? Experments are conducted to compare the new resource selecton algorthm wth the state-of-the-art ReDDE resource selecton that uses statc sampled nformaton for resource selecton. 2. How does the new resource selecton algorthm behave when dfferent amount of nformaton sources have been searched for past queres? Experments are conducted to show the performance of the new resource selecton algorthm when dfferent amount of nformaton sources have been searched for past queres. 3. How does the new resource selecton algorthm behave wth dfferent amount of past queres? Experments are conducted to show the performance of the new resource selecton algorthm wth more or less amount of past queres. 4. How does the new resource selecton algorthm behave wth dfferent characterstcs of past queres? Experments are conducted to show the performance of the new resource selecton algorthm wth more or less smlar past queres. 17

19 Table 5a. Comparson between ReDDE wth qsm-cut-10 and qsm-cut-20 on Trec123 dataset wth Trec123_R1 smulated past query set. Please note that the results for qsm- Cut-10 on Trec123 dataset wth Trec123_R1 sampled past query set has been reported n our prevous work (Cetntas et al. 2009). Trec123 Trec123_R1 qsm-cut-20 on Trec123_R (63.71%) (68.78%) (35.47%) (44.91%) (33.06%) (45.94%) (30.53%) (42.81%) (25.90%) (37.35%) (15.79%) (27.91%) (12.02%) (24.21%) (09.15%) (23.47%) (08.43%) (21.62%) (03.14%) (17.06%) Overall Improvement 23.72% 35.41% Table 5b. Comparson between ReDDE wth qsm-cut-10 and qsm-cut-20 on Trec4 dataset wth Trec4_R2 smulated past query set. Please note that the results for qsm- Cut-10 on Trec4 dataset wth Trec4_R2 sampled past query set has been reported n our prevous work (Cetntas et al. 2009). Trec4 Trec4_R2 qsm-cut-20 on Trec4_R (05.23%) (13.07%) (-09.20%) (-02.41%) (04.80%) (13.28%) (06.62%) (11.98%) (11.37%) (17.19%) (09.45%) (14.17%) (09.76%) (13.62%) (05.24%) (09.00%) (03.83%) (06.82%) (01.74%) (06.51%) Overall Improvement 4.88% 10.32% 5. How does the new resource selecton algorthm behave when dfferent query smlarty estmaton approaches are used to calculate the smlarty between a test query and the set of past queres? Experments are conducted to show the performance of the new resource selecton algorthm wth retreval-based query smlarty estmaton approach and term-based query smlarty estmaton approach. 6. Can the combned resource selecton approach further mprove the accuracy of resource selecton? Experments are conducted to compare the combned approach wth the two approaches of ether learnng from past queres or usng statc sampled nformaton. 5.1 Comparson between qsm and ReDDE Table 4a and Table 4b show the results of qsm and the Trec123 and Trec4 testbeds wth Trec123_T20 and Trec4_T20 sampled query sets. In the same way Table 5a and Table 5b show the correspondng results wth Trec123_R1 and Trec4_R2 smulated query sets. 18

20 Table 6a. Comparson between dfferent amounts of sampled past query sets on Trec123 dataset wth Trec123_T10 and Trec123_T20 sampled past query sets. Please note that the results for Trec123 dataset wth Trec123_T20 sampled past query set has been reported n our prevous work (Cetntas et al. 2009). Trec123 Trec123_T10 Trec123_T (10.54%) (39.66%) (10.37%) (17.73%) (10.70%) (26.15%) (10.49%) (17.31%) (11.36%) (17.26%) (04.58%) (09.30%) (04.47%) (07.84%) (03.97%) (08.66%) (02.08%) (07.83%) (00.22%) (05.83%) Overall Improvement 6.88% 15.76% Table 6b. Comparson between dfferent amounts of sampled past query sets on Trec4 dataset wth Trec4_T10 and Trec4_T20 sampled past query sets. Please note that the results for Trec4 dataset wth Trec4_T20 sampled past query set has been reported n our prevous work (Cetntas et al. 2009). Trec4 Trec4_T10 Trec4_T (21.56%) (17.65%) (04.52%) (-02.56%) (11.07%) (08.73%) (07.89%) (12.85%) (03.96%) (16.01%) (03.57%) (11.60%) (03.49%) (09.44%) (-04.20%) (20.29%) (-08.29%) (-01.96%) (-07.98%) (-03.52%) Overall Improvement 2.65% 7.05% The performance s evaluated by the Recall metrc R k defned n secton 4.4, where n the table s the k n R k. The percentages wthn the parentheses are the relatve mprovements of qsm over ReDDE. The overall mprovement s calculated by takng the average of the mprovement percentages when 1,2,,10 sources are selected. As shown from Table 4a, Table 4b, Table 5a and Table 5b, qsm-cut-10 always generates much better results than ReDDE. Thus, qsm can be consdered as a very effectve algorthm for resource selecton. 5.2 The Performance of qsm wth Dfferent Amounts of Search Results from Past Queres Table 4a & Table 4b (wth sampled past queres) and Table 5a & Table 5b (wth smulated past queres) also show that searchng more nformaton sources (.e. X = 20 n ths case) for past queres can help us do a better resource selecton for onlne queres (than searchng fewer nformaton sources (.e. X=10)). Ths s a reasonable result as the search results of past queres n qsm-cut-20 provde more nformaton than the search results of past queres n qsm-cut

Balanced Query Methods for Improving OCR-Based Retrieval

Balanced Query Methods for Improving OCR-Based Retrieval Balanced Query Methods for Improvng OCR-Based Retreval Kareem Darwsh Electrcal and Computer Engneerng Dept. Unversty of Maryland, College Park College Park, MD 20742 kareem@glue.umd.edu Douglas W. Oard

More information

310 Int'l Conf. Par. and Dist. Proc. Tech. and Appl. PDPTA'16

310 Int'l Conf. Par. and Dist. Proc. Tech. and Appl. PDPTA'16 310 Int'l Conf. Par. and Dst. Proc. Tech. and Appl. PDPTA'16 Akra Sasatan and Hrosh Ish Graduate School of Informaton and Telecommuncaton Engneerng, Toka Unversty, Mnato, Tokyo, Japan Abstract The end-to-end

More information

Using the Perpendicular Distance to the Nearest Fracture as a Proxy for Conventional Fracture Spacing Measures

Using the Perpendicular Distance to the Nearest Fracture as a Proxy for Conventional Fracture Spacing Measures Usng the Perpendcular Dstance to the Nearest Fracture as a Proxy for Conventonal Fracture Spacng Measures Erc B. Nven and Clayton V. Deutsch Dscrete fracture network smulaton ams to reproduce dstrbutons

More information

Study and Comparison of Various Techniques of Image Edge Detection

Study and Comparison of Various Techniques of Image Edge Detection Gureet Sngh et al Int. Journal of Engneerng Research Applcatons RESEARCH ARTICLE OPEN ACCESS Study Comparson of Varous Technques of Image Edge Detecton Gureet Sngh*, Er. Harnder sngh** *(Department of

More information

Appendix for. Institutions and Behavior: Experimental Evidence on the Effects of Democracy

Appendix for. Institutions and Behavior: Experimental Evidence on the Effects of Democracy Appendx for Insttutons and Behavor: Expermental Evdence on the Effects of Democrac 1. Instructons 1.1 Orgnal sessons Welcome You are about to partcpate n a stud on decson-makng, and ou wll be pad for our

More information

ALMALAUREA WORKING PAPERS no. 9

ALMALAUREA WORKING PAPERS no. 9 Snce 1994 Inter-Unversty Consortum Connectng Unverstes, the Labour Market and Professonals AlmaLaurea Workng Papers ISSN 2239-9453 ALMALAUREA WORKING PAPERS no. 9 September 211 Propensty Score Methods

More information

Physical Model for the Evolution of the Genetic Code

Physical Model for the Evolution of the Genetic Code Physcal Model for the Evoluton of the Genetc Code Tatsuro Yamashta Osamu Narkyo Department of Physcs, Kyushu Unversty, Fukuoka 8-856, Japan Abstract We propose a physcal model to descrbe the mechansms

More information

CLUSTERING is always popular in modern technology

CLUSTERING is always popular in modern technology Max-Entropy Feed-Forward Clusterng Neural Network Han Xao, Xaoyan Zhu arxv:1506.03623v1 [cs.lg] 11 Jun 2015 Abstract The outputs of non-lnear feed-forward neural network are postve, whch could be treated

More information

Parameter Estimates of a Random Regression Test Day Model for First Three Lactation Somatic Cell Scores

Parameter Estimates of a Random Regression Test Day Model for First Three Lactation Somatic Cell Scores Parameter Estmates of a Random Regresson Test Day Model for Frst Three actaton Somatc Cell Scores Z. u, F. Renhardt and R. Reents Unted Datasystems for Anmal Producton (VIT), Hedeweg 1, D-27280 Verden,

More information

An Approach to Discover Dependencies between Service Operations*

An Approach to Discover Dependencies between Service Operations* 36 JOURNAL OF SOFTWARE VOL. 3 NO. 9 DECEMBER 2008 An Approach to Dscover Dependences between Servce Operatons* Shuyng Yan Research Center for Grd and Servce Computng Insttute of Computng Technology Chnese

More information

Copy Number Variation Methods and Data

Copy Number Variation Methods and Data Copy Number Varaton Methods and Data Copy number varaton (CNV) Reference Sequence ACCTGCAATGAT TAAGCCCGGG TTGCAACGTTAGGCA Populaton ACCTGCAATGAT TAAGCCCGGG TTGCAACGTTAGGCA ACCTGCAATGAT TTGCAACGTTAGGCA

More information

Optimal Planning of Charging Station for Phased Electric Vehicle *

Optimal Planning of Charging Station for Phased Electric Vehicle * Energy and Power Engneerng, 2013, 5, 1393-1397 do:10.4236/epe.2013.54b264 Publshed Onlne July 2013 (http://www.scrp.org/ournal/epe) Optmal Plannng of Chargng Staton for Phased Electrc Vehcle * Yang Gao,

More information

Encoding processes, in memory scanning tasks

Encoding processes, in memory scanning tasks vlemory & Cognton 1976,4 (5), 501 506 Encodng processes, n memory scannng tasks JEFFREY O. MILLER and ROBERT G. PACHELLA Unversty of Mchgan, Ann Arbor, Mchgan 48101, Three experments are presented that

More information

International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS)

International Journal of Emerging Technologies in Computational and Applied Sciences (IJETCAS) Internatonal Assocaton of Scentfc Innovaton and Research (IASIR (An Assocaton Unfyng the Scences, Engneerng, and Appled Research Internatonal Journal of Emergng Technologes n Computatonal and Appled Scences

More information

NUMERICAL COMPARISONS OF BIOASSAY METHODS IN ESTIMATING LC50 TIANHONG ZHOU

NUMERICAL COMPARISONS OF BIOASSAY METHODS IN ESTIMATING LC50 TIANHONG ZHOU NUMERICAL COMPARISONS OF BIOASSAY METHODS IN ESTIMATING LC50 by TIANHONG ZHOU B.S., Chna Agrcultural Unversty, 2003 M.S., Chna Agrcultural Unversty, 2006 A THESIS submtted n partal fulfllment of the requrements

More information

Non-linear Multiple-Cue Judgment Tasks

Non-linear Multiple-Cue Judgment Tasks Non-lnear Multple-Cue Tasks Anna-Carn Olsson (anna-carn.olsson@psy.umu.se) Department of Psychology, Umeå Unversty SE-09 87, Umeå, Sweden Tommy Enqvst (tommy.enqvst@psyk.uu.se) Department of Psychology,

More information

Subject-Adaptive Real-Time Sleep Stage Classification Based on Conditional Random Field

Subject-Adaptive Real-Time Sleep Stage Classification Based on Conditional Random Field Subject-Adaptve Real-Tme Sleep Stage Classfcaton Based on Condtonal Random Feld Gang Luo, PhD, Wanl Mn, PhD IBM TJ Watson Research Center, Hawthorne, NY {luog, wanlmn}@usbmcom Abstract Sleep stagng s the

More information

N-back Training Task Performance: Analysis and Model

N-back Training Task Performance: Analysis and Model N-back Tranng Task Performance: Analyss and Model J. Isaah Harbson (jharb@umd.edu) Center for Advanced Study of Language and Department of Psychology, Unversty of Maryland 7005 52 nd Avenue, College Park,

More information

What Determines Attitude Improvements? Does Religiosity Help?

What Determines Attitude Improvements? Does Religiosity Help? Internatonal Journal of Busness and Socal Scence Vol. 4 No. 9; August 2013 What Determnes Atttude Improvements? Does Relgosty Help? Madhu S. Mohanty Calforna State Unversty-Los Angeles Los Angeles, 5151

More information

NHS Outcomes Framework

NHS Outcomes Framework NHS Outcomes Framework Doman 1 Preventng people from dyng prematurely Indcator Specfcatons Verson: 1.21 Date: May 2018 Author: Clncal Indcators Team NHS Outcomes Framework: Doman 1 Preventng people from

More information

Computing and Using Reputations for Internet Ratings

Computing and Using Reputations for Internet Ratings Computng and Usng Reputatons for Internet Ratngs Mao Chen Department of Computer Scence Prnceton Unversty Prnceton, J 8 (69)-8-797 maoch@cs.prnceton.edu Jaswnder Pal Sngh Department of Computer Scence

More information

Modeling the Survival of Retrospective Clinical Data from Prostate Cancer Patients in Komfo Anokye Teaching Hospital, Ghana

Modeling the Survival of Retrospective Clinical Data from Prostate Cancer Patients in Komfo Anokye Teaching Hospital, Ghana Internatonal Journal of Appled Scence and Technology Vol. 5, No. 6; December 2015 Modelng the Survval of Retrospectve Clncal Data from Prostate Cancer Patents n Komfo Anokye Teachng Hosptal, Ghana Asedu-Addo,

More information

Modeling Multi Layer Feed-forward Neural. Network Model on the Influence of Hypertension. and Diabetes Mellitus on Family History of

Modeling Multi Layer Feed-forward Neural. Network Model on the Influence of Hypertension. and Diabetes Mellitus on Family History of Appled Mathematcal Scences, Vol. 7, 2013, no. 41, 2047-2053 HIKARI Ltd, www.m-hkar.com Modelng Mult Layer Feed-forward Neural Network Model on the Influence of Hypertenson and Dabetes Melltus on Famly

More information

Sparse Representation of HCP Grayordinate Data Reveals. Novel Functional Architecture of Cerebral Cortex

Sparse Representation of HCP Grayordinate Data Reveals. Novel Functional Architecture of Cerebral Cortex 1 Sparse Representaton of HCP Grayordnate Data Reveals Novel Functonal Archtecture of Cerebral Cortex X Jang 1, Xang L 1, Jngle Lv 2,1, Tuo Zhang 2,1, Shu Zhang 1, Le Guo 2, Tanmng Lu 1* 1 Cortcal Archtecture

More information

A comparison of statistical methods in interrupted time series analysis to estimate an intervention effect

A comparison of statistical methods in interrupted time series analysis to estimate an intervention effect Peer revew stream A comparson of statstcal methods n nterrupted tme seres analyss to estmate an nterventon effect a,b, J.J.J., Walter c, S., Grzebeta a, R. & Olver b, J. a Transport and Road Safety, Unversty

More information

Evaluation of Literature-based Discovery Systems

Evaluation of Literature-based Discovery Systems Evaluaton of Lterature-based Dscovery Systems Melha Yetsgen-Yldz 1 and Wanda Pratt 1,2 1 The Informaton School, Unversty of Washngton, Seattle, USA. 2 Bomedcal and Health Informatcs, School of Medcne,

More information

Drug Prescription Behavior and Decision Support Systems

Drug Prescription Behavior and Decision Support Systems Drug Prescrpton Behavor and Decson Support Systems ABSTRACT Adverse drug events plague the outcomes of health care servces. In ths research, we propose a clncal learnng model that ncorporates the use of

More information

A MIXTURE OF EXPERTS FOR CATARACT DIAGNOSIS IN HOSPITAL SCREENING DATA

A MIXTURE OF EXPERTS FOR CATARACT DIAGNOSIS IN HOSPITAL SCREENING DATA Journal of Theoretcal and Appled Informaton Technology 2005 ongong JATIT & LLS ISSN: 1992-8645 www.jatt.org E-ISSN: 1817-3195 A MIXTURE OF EXPERTS FOR CATARACT DIAGNOSIS IN HOSPITAL SCREENING DATA 1 SUNGMIN

More information

The effect of salvage therapy on survival in a longitudinal study with treatment by indication

The effect of salvage therapy on survival in a longitudinal study with treatment by indication Research Artcle Receved 28 October 2009, Accepted 8 June 2010 Publshed onlne 30 August 2010 n Wley Onlne Lbrary (wleyonlnelbrary.com) DOI: 10.1002/sm.4017 The effect of salvage therapy on survval n a longtudnal

More information

The Limits of Individual Identification from Sample Allele Frequencies: Theory and Statistical Analysis

The Limits of Individual Identification from Sample Allele Frequencies: Theory and Statistical Analysis The Lmts of Indvdual Identfcaton from Sample Allele Frequences: Theory and Statstcal Analyss Peter M. Vsscher 1 *, Wllam G. Hll 2 1 Queensland Insttute of Medcal Research, Brsbane, Australa, 2 Insttute

More information

INITIAL ANALYSIS OF AWS-OBSERVED TEMPERATURE

INITIAL ANALYSIS OF AWS-OBSERVED TEMPERATURE INITIAL ANALYSIS OF AWS-OBSERVED TEMPERATURE Wang Yng, Lu Xaonng, Ren Zhhua, Natonal Meteorologcal Informaton Center, Bejng, Chna Tel.:+86 684755, E-mal:cdcsjk@cma.gov.cn Abstract From, n Chna meteorologcal

More information

Shape-based Retrieval of Heart Sounds for Disease Similarity Detection Tanveer Syeda-Mahmood, Fei Wang

Shape-based Retrieval of Heart Sounds for Disease Similarity Detection Tanveer Syeda-Mahmood, Fei Wang Shape-based Retreval of Heart Sounds for Dsease Smlarty Detecton Tanveer Syeda-Mahmood, Fe Wang 1 IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120. {stf,wangfe}@almaden.bm.com Abstract.

More information

Estimation for Pavement Performance Curve based on Kyoto Model : A Case Study for Highway in the State of Sao Paulo

Estimation for Pavement Performance Curve based on Kyoto Model : A Case Study for Highway in the State of Sao Paulo Estmaton for Pavement Performance Curve based on Kyoto Model : A Case Study for Kazuya AOKI, PASCO CORPORATION, Yokohama, JAPAN, Emal : kakzo603@pasco.co.jp Octávo de Souza Campos, Publc Servces Regulatory

More information

Active Affective State Detection and User Assistance with Dynamic Bayesian Networks. Xiangyang Li, Qiang Ji

Active Affective State Detection and User Assistance with Dynamic Bayesian Networks. Xiangyang Li, Qiang Ji Actve Affectve State Detecton and User Assstance wth Dynamc Bayesan Networks Xangyang L, Qang J Electrcal, Computer, and Systems Engneerng Department Rensselaer Polytechnc Insttute, 110 8th Street, Troy,

More information

Appendix F: The Grant Impact for SBIR Mills

Appendix F: The Grant Impact for SBIR Mills Appendx F: The Grant Impact for SBIR Mlls Asmallsubsetofthefrmsnmydataapplymorethanonce.Ofthe7,436applcant frms, 71% appled only once, and a further 14% appled twce. Wthn my data, seven companes each submtted

More information

*VALLIAPPAN Raman 1, PUTRA Sumari 2 and MANDAVA Rajeswari 3. George town, Penang 11800, Malaysia. George town, Penang 11800, Malaysia

*VALLIAPPAN Raman 1, PUTRA Sumari 2 and MANDAVA Rajeswari 3. George town, Penang 11800, Malaysia. George town, Penang 11800, Malaysia 38 A Theoretcal Methodology and Prototype Implementaton for Detecton Segmentaton Classfcaton of Dgtal Mammogram Tumor by Machne Learnng and Problem Solvng *VALLIAPPA Raman, PUTRA Sumar 2 and MADAVA Rajeswar

More information

ENRICHING PROCESS OF ICE-CREAM RECOMMENDATION USING COMBINATORIAL RANKING OF AHP AND MONTE CARLO AHP

ENRICHING PROCESS OF ICE-CREAM RECOMMENDATION USING COMBINATORIAL RANKING OF AHP AND MONTE CARLO AHP ENRICHING PROCESS OF ICE-CREAM RECOMMENDATION USING COMBINATORIAL RANKING OF AHP AND MONTE CARLO AHP 1 AKASH RAMESHWAR LADDHA, 2 RAHUL RAGHVENDRA JOSHI, 3 Dr.PEETI MULAY 1 M.Tech, Department of Computer

More information

A Geometric Approach To Fully Automatic Chromosome Segmentation

A Geometric Approach To Fully Automatic Chromosome Segmentation A Geometrc Approach To Fully Automatc Chromosome Segmentaton Shervn Mnaee ECE Department New York Unversty Brooklyn, New York, USA shervn.mnaee@nyu.edu Mehran Fotouh Computer Engneerng Department Sharf

More information

A New Machine Learning Algorithm for Breast and Pectoral Muscle Segmentation

A New Machine Learning Algorithm for Breast and Pectoral Muscle Segmentation Avalable onlne www.ejaet.com European Journal of Advances n Engneerng and Technology, 2015, 2(1): 21-29 Research Artcle ISSN: 2394-658X A New Machne Learnng Algorthm for Breast and Pectoral Muscle Segmentaton

More information

WHO S ASSESSMENT OF HEALTH CARE INDUSTRY PERFORMANCE: RATING THE RANKINGS

WHO S ASSESSMENT OF HEALTH CARE INDUSTRY PERFORMANCE: RATING THE RANKINGS WHO S ASSESSMENT OF HEALTH CARE INDUSTRY PERFORMANCE: RATING THE RANKINGS ELLIOTT PARKER and JEANNE WENDEL * Department of Economcs, Unversty of Nevada, Reno, NV, USA SUMMARY Ths paper examnes the econometrc

More information

Using a Wavelet Representation for Classification of Movement in Bed

Using a Wavelet Representation for Classification of Movement in Bed Usng a Wavelet Representaton for Classfcaton of Movement n Bed Adrana Morell Adam Depto. de Matemátca e Estatístca Unversdade de Caxas do Sul Caxas do Sul RS E-mal: amorell@ucs.br André Gustavo Adam Depto.

More information

Lymphoma Cancer Classification Using Genetic Programming with SNR Features

Lymphoma Cancer Classification Using Genetic Programming with SNR Features Lymphoma Cancer Classfcaton Usng Genetc Programmng wth SNR Features Jn-Hyuk Hong and Sung-Bae Cho Dept. of Computer Scence, Yonse Unversty, 134 Shnchon-dong, Sudaemoon-ku, Seoul 120-749, Korea hjnh@candy.yonse.ac.kr,

More information

Estimation of Relative Survival Based on Cancer Registry Data

Estimation of Relative Survival Based on Cancer Registry Data Revew of Bonformatcs and Bometrcs (RBB) Volume 2 Issue 4, December 203 www.sepub.org/rbb Estmaton of Relatve Based on Cancer Regstry Data Olaf Schoffer *, Ante Nedostate 2, Stefane J. Klug,2 Cancer Epdemology,

More information

Investigation of zinc oxide thin film by spectroscopic ellipsometry

Investigation of zinc oxide thin film by spectroscopic ellipsometry VNU Journal of Scence, Mathematcs - Physcs 24 (2008) 16-23 Investgaton of znc oxde thn flm by spectroscopc ellpsometry Nguyen Nang Dnh 1, Tran Quang Trung 2, Le Khac Bnh 2, Nguyen Dang Khoa 2, Vo Th Ma

More information

THE NORMAL DISTRIBUTION AND Z-SCORES COMMON CORE ALGEBRA II

THE NORMAL DISTRIBUTION AND Z-SCORES COMMON CORE ALGEBRA II Name: Date: THE NORMAL DISTRIBUTION AND Z-SCORES COMMON CORE ALGEBRA II The normal dstrbuton can be used n ncrements other than half-standard devatons. In fact, we can use ether our calculators or tables

More information

A Linear Regression Model to Detect User Emotion for Touch Input Interactive Systems

A Linear Regression Model to Detect User Emotion for Touch Input Interactive Systems 2015 Internatonal Conference on Affectve Computng and Intellgent Interacton (ACII) A Lnear Regresson Model to Detect User Emoton for Touch Input Interactve Systems Samt Bhattacharya Dept of Computer Scence

More information

AUTOMATED CHARACTERIZATION OF ESOPHAGEAL AND SEVERELY INJURED VOICES BY MEANS OF ACOUSTIC PARAMETERS

AUTOMATED CHARACTERIZATION OF ESOPHAGEAL AND SEVERELY INJURED VOICES BY MEANS OF ACOUSTIC PARAMETERS AUTOMATED CHARACTERIZATIO OF ESOPHAGEAL AD SEVERELY IJURED VOICES BY MEAS OF ACOUSTIC PARAMETERS B. García, I. Ruz, A. Méndez, J. Vcente, and M. Mendezona Department of Telecommuncaton, Unversty of Deusto

More information

ARTICLE IN PRESS Neuropsychologia xxx (2010) xxx xxx

ARTICLE IN PRESS Neuropsychologia xxx (2010) xxx xxx Neuropsychologa xxx (200) xxx xxx Contents lsts avalable at ScenceDrect Neuropsychologa journal homepage: www.elsever.com/locate/neuropsychologa Storage and bndng of object features n vsual workng memory

More information

Unobserved Heterogeneity and the Statistical Analysis of Highway Accident Data

Unobserved Heterogeneity and the Statistical Analysis of Highway Accident Data Unobserved Heterogenety and the Statstcal Analyss of Hghway Accdent Data Fred L. Mannerng Professor of Cvl and Envronmental Engneerng Courtesy Department of Economcs Unversty of South Florda 4202 E. Fowler

More information

Maize Varieties Combination Model of Multi-factor. and Implement

Maize Varieties Combination Model of Multi-factor. and Implement Maze Varetes Combnaton Model of Mult-factor and Implement LIN YANG,XIAODONG ZHANG,SHAOMING LI Department of Geographc Informaton Scence Chna Agrcultural Unversty No. 17 Tsnghua East Road, Bejng 100083

More information

Biomarker Selection from Gene Expression Data for Tumour Categorization Using Bat Algorithm

Biomarker Selection from Gene Expression Data for Tumour Categorization Using Bat Algorithm Receved: March 20, 2017 401 Bomarker Selecton from Gene Expresson Data for Tumour Categorzaton Usng Bat Algorthm Gunavath Chellamuthu 1 *, Premalatha Kandasamy 2, Svasubramanan Kanagaraj 3 1 School of

More information

Fast Algorithm for Vectorcardiogram and Interbeat Intervals Analysis: Application for Premature Ventricular Contractions Classification

Fast Algorithm for Vectorcardiogram and Interbeat Intervals Analysis: Application for Premature Ventricular Contractions Classification Fast Algorthm for Vectorcardogram and Interbeat Intervals Analyss: Applcaton for Premature Ventrcular Contractons Classfcaton Irena Jekova, Vessela Krasteva Centre of Bomedcal Engneerng Prof. Ivan Daskalov

More information

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and

This article appeared in a journal published by Elsevier. The attached copy is furnished to the author for internal non-commercial research and Ths artcle appeared n a journal publshed by Elsever. The attached copy s furnshed to the author for nternal non-commercal research and educaton use, ncludng for nstructon at the authors nsttuton and sharng

More information

A Novel artifact for evaluating accuracies of gear profile and pitch measurements of gear measuring instruments

A Novel artifact for evaluating accuracies of gear profile and pitch measurements of gear measuring instruments A Novel artfact for evaluatng accuraces of gear profle and ptch measurements of gear measurng nstruments Sonko Osawa, Osamu Sato, Yohan Kondo, Toshyuk Takatsuj (NMIJ/AIST) Masaharu Komor (Kyoto Unversty)

More information

IMPROVING THE EFFICIENCY OF BIOMARKER IDENTIFICATION USING BIOLOGICAL KNOWLEDGE

IMPROVING THE EFFICIENCY OF BIOMARKER IDENTIFICATION USING BIOLOGICAL KNOWLEDGE IMPROVING THE EFFICIENCY OF BIOMARKER IDENTIFICATION USING BIOLOGICAL KNOWLEDGE JOHN H. PHAN The Wallace H. Coulter Department of Bomedcal Engneerng, Georga Insttute of Technology, 313 Ferst Drve Atlanta,

More information

Submitted for Presentation 94th Annual Meeting of the Transportation Research Board January 11-15, 2015, Washington, D.C.

Submitted for Presentation 94th Annual Meeting of the Transportation Research Board January 11-15, 2015, Washington, D.C. Wegh-In-Moton Staton Montorng and Calbraton usng Inductve Loop Sgnature Technology Shn-Tng (Cndy) Jeng (Correspondng Author) CLR Analytcs Inc 8885 Research Drve Sute 15 Irvne, CA 92618 Tel: 949-864-6696,

More information

Chapter 20. Aggregation and calibration. Betina Dimaranan, Thomas Hertel, Robert McDougall

Chapter 20. Aggregation and calibration. Betina Dimaranan, Thomas Hertel, Robert McDougall Chapter 20 Aggregaton and calbraton Betna Dmaranan, Thomas Hertel, Robert McDougall In the prevous chapter we dscussed how the fnal verson 3 GTAP data base was assembled. Ths data base s extremely large.

More information

Lateral Transfer Data Report. Principal Investigator: Andrea Baptiste, MA, OT, CIE Co-Investigator: Kay Steadman, MA, OTR, CHSP. Executive Summary:

Lateral Transfer Data Report. Principal Investigator: Andrea Baptiste, MA, OT, CIE Co-Investigator: Kay Steadman, MA, OTR, CHSP. Executive Summary: Samar tmed c ali ndus t r esi nc 55Fl em ngdr ve, Un t#9 Cambr dge, ON. N1T2A9 T el. 18886582206 Ema l. nf o@s amar t r ol l boar d. c om www. s amar t r ol l boar d. c om Lateral Transfer Data Report

More information

Survival Rate of Patients of Ovarian Cancer: Rough Set Approach

Survival Rate of Patients of Ovarian Cancer: Rough Set Approach Internatonal OEN ACCESS Journal Of Modern Engneerng esearch (IJME) Survval ate of atents of Ovaran Cancer: ough Set Approach Kamn Agrawal 1, ragat Jan 1 Department of Appled Mathematcs, IET, Indore, Inda

More information

Introduction ORIGINAL RESEARCH

Introduction ORIGINAL RESEARCH ORIGINAL RESEARCH Assessng the Statstcal Sgnfcance of the Acheved Classfcaton Error of Classfers Constructed usng Serum Peptde Profles, and a Prescrpton for Random Samplng Repeated Studes for Massve Hgh-Throughput

More information

Richard Williams Notre Dame Sociology Meetings of the European Survey Research Association Ljubljana,

Richard Williams Notre Dame Sociology   Meetings of the European Survey Research Association Ljubljana, Rchard Wllams Notre Dame Socology rwllam@nd.edu http://www.nd.edu/~rwllam Meetngs of the European Survey Research Assocaton Ljubljana, Slovena July 19, 2013 Comparng Logt and Probt Coeffcents across groups

More information

Gene Selection Based on Mutual Information for the Classification of Multi-class Cancer

Gene Selection Based on Mutual Information for the Classification of Multi-class Cancer Gene Selecton Based on Mutual Informaton for the Classfcaton of Mult-class Cancer Sheng-Bo Guo,, Mchael R. Lyu 3, and Tat-Mng Lok 4 Department of Automaton, Unversty of Scence and Technology of Chna, Hefe,

More information

Price linkages in value chains: methodology

Price linkages in value chains: methodology Prce lnkages n value chans: methodology Prof. Trond Bjorndal, CEMARE. Unversty of Portsmouth, UK. and Prof. José Fernández-Polanco Unversty of Cantabra, Span. FAO INFOSAMAK Tangers, Morocco 14 March 2012

More information

Journal of Engineering Science and Technology Review 11 (2) (2018) Research Article

Journal of Engineering Science and Technology Review 11 (2) (2018) Research Article Jestr Journal of Engneerng Scence and Technology Revew () (08) 5 - Research Artcle Prognoss Evaluaton of Ovaran Granulosa Cell Tumor Based on Co-forest ntellgence Model Xn Lao Xn Zheng Juan Zou Mn Feng

More information

Biased Perceptions of Income Distribution and Preferences for Redistribution: Evidence from a Survey Experiment

Biased Perceptions of Income Distribution and Preferences for Redistribution: Evidence from a Survey Experiment DISCUSSION PAPER SERIES IZA DP No. 5699 Based Perceptons of Income Dstrbuton and Preferences for Redstrbuton: Evdence from a Survey Experment Gullermo Cruces Rcardo Pérez Trugla Martn Tetaz May 2011 Forschungsnsttut

More information

Prototypes in the Mist: The Early Epochs of Category Learning

Prototypes in the Mist: The Early Epochs of Category Learning Journal of Expermental Psychology: Learnng, Memory, and Cognton 1998, Vol. 24, No. 6, 1411-1436 Copyrght 1998 by the Amercan Psychologcal Assocaton, Inc. 0278-7393/98/S3.00 Prototypes n the Mst: The Early

More information

Incorrect Beliefs. Overconfidence. Types of Overconfidence. Outline. Overprecision 4/22/2015. Econ 1820: Behavioral Economics Mark Dean Spring 2015

Incorrect Beliefs. Overconfidence. Types of Overconfidence. Outline. Overprecision 4/22/2015. Econ 1820: Behavioral Economics Mark Dean Spring 2015 Incorrect Belefs Overconfdence Econ 1820: Behavoral Economcs Mark Dean Sprng 2015 In objectve EU we assumed that everyone agreed on what the probabltes of dfferent events were In subjectve expected utlty

More information

Prediction of Total Pressure Drop in Stenotic Coronary Arteries with Their Geometric Parameters

Prediction of Total Pressure Drop in Stenotic Coronary Arteries with Their Geometric Parameters Tenth Internatonal Conference on Computatonal Flud Dynamcs (ICCFD10), Barcelona, Span, July 9-13, 2018 ICCFD10-227 Predcton of Total Pressure Drop n Stenotc Coronary Arteres wth Ther Geometrc Parameters

More information

Alma Mater Studiorum Università di Bologna DOTTORATO DI RICERCA IN METODOLOGIA STATISTICA PER LA RICERCA SCIENTIFICA

Alma Mater Studiorum Università di Bologna DOTTORATO DI RICERCA IN METODOLOGIA STATISTICA PER LA RICERCA SCIENTIFICA Alma Mater Studorum Unverstà d Bologna DOTTORATO DI RICERCA IN METODOLOGIA STATISTICA PER LA RICERCA SCIENTIFICA Cclo XXVII Settore Concorsuale d afferenza: 13/D1 Settore Scentfco dscplnare: SECS-S/02

More information

HIV/AIDS-related Expectations and Risky Sexual Behavior in Malawi

HIV/AIDS-related Expectations and Risky Sexual Behavior in Malawi HIV/AIDS-related Expectatons and Rsky Sexual Behavor n Malaw Adelne Delavande Unversty of Essex and RAND Corporaton Hans-Peter Kohler Unversty of Pennsylvanna January 202 Abstract We use probablstc expectatons

More information

Towards Automated Pose Invariant 3D Dental Biometrics

Towards Automated Pose Invariant 3D Dental Biometrics Towards Automated Pose Invarant 3D Dental Bometrcs Xn ZHONG 1, Depng YU 1, Kelvn W C FOONG, Terence SIM 3, Yoke San WONG 1 and Ho-lun CHENG 3 1. Mechancal Engneerng, Natonal Unversty of Sngapore, 117576,

More information

HIV/AIDS-related Expectations and Risky Sexual Behavior in Malawi

HIV/AIDS-related Expectations and Risky Sexual Behavior in Malawi Unversty of Pennsylvana ScholarlyCommons PSC Workng Paper Seres 7-29-20 HIV/AIDS-related Expectatons and Rsky Sexual Behavor n Malaw Adelne Delavande RAND Corporaton, Nova School of Busness and Economcs

More information

An Introduction to Modern Measurement Theory

An Introduction to Modern Measurement Theory An Introducton to Modern Measurement Theory Ths tutoral was wrtten as an ntroducton to the bascs of tem response theory (IRT) modelng and ts applcatons to health outcomes measurement for the Natonal Cancer

More information

A-UNIFAC Modeling of Binary and Multicomponent Phase Equilibria of Fatty Esters+Water+Methanol+Glycerol

A-UNIFAC Modeling of Binary and Multicomponent Phase Equilibria of Fatty Esters+Water+Methanol+Glycerol -UNIFC Modelng of Bnary and Multcomponent Phase Equlbra of Fatty Esters+Water+Methanol+Glycerol N. Garrdo a, O. Ferrera b, R. Lugo c, J.-C. de Hemptnne c, M. E. Macedo a, S.B. Bottn d,* a Department of

More information

INTEGRATIVE NETWORK ANALYSIS TO IDENTIFY ABERRANT PATHWAY NETWORKS IN OVARIAN CANCER

INTEGRATIVE NETWORK ANALYSIS TO IDENTIFY ABERRANT PATHWAY NETWORKS IN OVARIAN CANCER INTEGRATIVE NETWORK ANALYSIS TO IDENTIFY ABERRANT PATHWAY NETWORKS IN OVARIAN CANCER LI CHEN 1,2, JIANHUA XUAN 1,*, JINGHUA GU 1, YUE WANG 1, ZHEN ZHANG 2, TIAN LI WANG 2, IE MING SHIH 2 1The Bradley Department

More information

TOPICS IN HEALTH ECONOMETRICS

TOPICS IN HEALTH ECONOMETRICS TOPICS IN HEALTH ECONOMETRICS By VIDHURA SENANI BANDARA WIJAYAWARDHANA TENNEKOON A dssertaton submtted n partal fulfllment of the requrements for the degree of DOCTOR OF PHILOSOPHY WASHINGTON STATE UNIVERSITY

More information

Project title: Mathematical Models of Fish Populations in Marine Reserves

Project title: Mathematical Models of Fish Populations in Marine Reserves Applcaton for Fundng (Malaspna Research Fund) Date: November 0, 2005 Project ttle: Mathematcal Models of Fsh Populatons n Marne Reserves Dr. Lev V. Idels Unversty College Professor Mathematcs Department

More information

EVALUATION OF BULK MODULUS AND RING DIAMETER OF SOME TELLURITE GLASS SYSTEMS

EVALUATION OF BULK MODULUS AND RING DIAMETER OF SOME TELLURITE GLASS SYSTEMS Chalcogende Letters Vol. 12, No. 2, February 2015, p. 67-74 EVALUATION OF BULK MODULUS AND RING DIAMETER OF SOME TELLURITE GLASS SYSTEMS R. EL-MALLAWANY a*, M.S. GAAFAR b, N. VEERAIAH c a Physcs Dept.,

More information

Integration of sensory information within touch and across modalities

Integration of sensory information within touch and across modalities Integraton of sensory nformaton wthn touch and across modaltes Marc O. Ernst, Jean-Perre Brescan, Knut Drewng & Henrch H. Bülthoff Max Planck Insttute for Bologcal Cybernetcs 72076 Tübngen, Germany marc.ernst@tuebngen.mpg.de

More information

Single-Case Designs and Clinical Biofeedback Experimentation

Single-Case Designs and Clinical Biofeedback Experimentation Bofeedback and Self-Regulaton, VoL 2, No. 3, 1977 Sngle-Case Desgns and Clncal Bofeedback Expermentaton Davd H. Barow: Brown Unversty and Butler Hosptal Edward B. Blanchard Unversty of Tennessee Medcal

More information

Inverted-U and Inverted-J Effects in Self-Referenced Decisions

Inverted-U and Inverted-J Effects in Self-Referenced Decisions Inverted-U and Inverted-J Effects n Self-Referenced Decsons Kenpe SHIINA (shnaatwaseda.jp) Department of Educatonal Psychology, Waseda Unversty, Tokyo, Japan Abstract Ratng one s own personalty trats s

More information

Event Summarization using Tweets

Event Summarization using Tweets Event Summarzaton usng Tweets ABSTRACT Deepaan Chakrabart Yahoo! Research 701 1st Avenue Sunnvale, CA 94089 deepa@ahoo-nc.com Twtter has become exceedngl popular, wth mllons of tweets beng posted ever

More information

A New Diagnosis Loseless Compression Method for Digital Mammography Based on Multiple Arbitrary Shape ROIs Coding Framework

A New Diagnosis Loseless Compression Method for Digital Mammography Based on Multiple Arbitrary Shape ROIs Coding Framework I.J.Modern Educaton and Computer Scence, 2011, 5, 33-39 Publshed Onlne August 2011 n MECS (http://www.mecs-press.org/) A New Dagnoss Loseless Compresson Method for Dgtal Mammography Based on Multple Arbtrary

More information

Natural Image Denoising: Optimality and Inherent Bounds

Natural Image Denoising: Optimality and Inherent Bounds atural Image Denosng: Optmalty and Inherent Bounds Anat Levn and Boaz adler Department of Computer Scence and Appled Math The Wezmann Insttute of Scence Abstract The goal of natural mage denosng s to estmate

More information

(From the Gastroenterology Division, Cornell University Medical College, New York 10021)

(From the Gastroenterology Division, Cornell University Medical College, New York 10021) ROLE OF HEPATIC ANION-BINDING PROTEIN IN BROMSULPHTHALEIN CONJUGATION* BY N. KAPLOWITZ, I. W. PERC -ROBB,~ ANn N. B. JAVITT (From the Gastroenterology Dvson, Cornell Unversty Medcal College, New York 10021)

More information

Resampling Methods for the Area Under the ROC Curve

Resampling Methods for the Area Under the ROC Curve Resamplng ethods for the Area Under the ROC Curve Andry I. Bandos AB6@PITT.EDU Howard E. Rockette HERBST@PITT.EDU Department of Bostatstcs, Graduate School of Publc Health, Unversty of Pttsburgh, Pttsburgh,

More information

Optimal probability weights for estimating causal effects of time-varying treatments with marginal structural Cox models

Optimal probability weights for estimating causal effects of time-varying treatments with marginal structural Cox models Optmal probablty weghts for estmatng causal effects of tme-varyng treatments wth margnal structural Cox models Mchele Santacatterna, Cela García-Pareja Rno Bellocco, Anders Sönnerborg, Anna Ma Ekström

More information

Journal of Economic Behavior & Organization

Journal of Economic Behavior & Organization Journal of Economc Behavor & Organzaton 133 (2017) 52 73 Contents lsts avalable at ScenceDrect Journal of Economc Behavor & Organzaton j ourna l ho me pa g e: www.elsever.com/locate/jebo Perceptons, ntentons,

More information

Statistical Analysis on Infectious Diseases in Dubai, UAE

Statistical Analysis on Infectious Diseases in Dubai, UAE Internatonal Journal of Preventve Medcne Research Vol. 1, No. 4, 015, pp. 60-66 http://www.ascence.org/journal/jpmr Statstcal Analyss on Infectous Dseases 1995-013 n Duba, UAE Khams F. G. 1, Hussan H.

More information

4.2 Scheduling to Minimize Maximum Lateness

4.2 Scheduling to Minimize Maximum Lateness 4. Schedulng to Mnmze Maxmum Lateness Schedulng to Mnmzng Maxmum Lateness Mnmzng lateness problem. Sngle resource processes one ob at a tme. Job requres t unts of processng tme and s due at tme d. If starts

More information

The Importance of Being Marginal: Gender Differences in Generosity 1

The Importance of Being Marginal: Gender Differences in Generosity 1 The Importance of Beng Margnal: Gender Dfferences n Generosty 1 Stefano DellaVgna, John A. Lst, Ulrke Malmender, and Gautam Rao Forthcomng, Amercan Economc Revew Papers and Proceedngs, May 2013 Abstract

More information

A GEOGRAPHICAL AND STATISTICAL ANALYSIS OF LEUKEMIA DEATHS RELATING TO NUCLEAR POWER PLANTS. Whitney Thompson, Sarah McGinnis, Darius McDaniel,

A GEOGRAPHICAL AND STATISTICAL ANALYSIS OF LEUKEMIA DEATHS RELATING TO NUCLEAR POWER PLANTS. Whitney Thompson, Sarah McGinnis, Darius McDaniel, A GEOGRAPHICAL AD STATISTICAL AALYSIS OF LEUKEMIA DEATHS RELATIG TO UCLEAR POWER PLATS Whtney Thompson, Sarah McGnns, Darus McDanel, Jean Sexton, Rebecca Pettt, Sarah Anderson, Monca Jackson ABSTRACT:

More information

Evaluation of the generalized gamma as a tool for treatment planning optimization

Evaluation of the generalized gamma as a tool for treatment planning optimization Internatonal Journal of Cancer Therapy and Oncology www.jcto.org Evaluaton of the generalzed gamma as a tool for treatment plannng optmzaton Emmanoul I Petrou 1,, Ganesh Narayanasamy 3, Eleftheros Lavdas

More information

The Effect of Fish Farmers Association on Technical Efficiency: An Application of Propensity Score Matching Analysis

The Effect of Fish Farmers Association on Technical Efficiency: An Application of Propensity Score Matching Analysis The Effect of Fsh Farmers Assocaton on Techncal Effcency: An Applcaton of Propensty Score Matchng Analyss Onumah E. E, Esslfe F. L, and Asumng-Brempong, S 15 th July, 2016 Background and Motvaton Outlne

More information

AN ENHANCED GAGS BASED MTSVSL LEARNING TECHNIQUE FOR CANCER MOLECULAR PATTERN PREDICTION OF CANCER CLASSIFICATION

AN ENHANCED GAGS BASED MTSVSL LEARNING TECHNIQUE FOR CANCER MOLECULAR PATTERN PREDICTION OF CANCER CLASSIFICATION www.arpapress.com/volumes/vol8issue2/ijrras_8_2_02.pdf AN ENHANCED GAGS BASED MTSVSL LEARNING TECHNIQUE FOR CANCER MOLECULAR PATTERN PREDICTION OF CANCER CLASSIFICATION I. Jule 1 & E. Krubakaran 2 1 Department

More information

Economic crisis and follow-up of the conditions that define metabolic syndrome in a cohort of Catalonia,

Economic crisis and follow-up of the conditions that define metabolic syndrome in a cohort of Catalonia, Economc crss and follow-up of the condtons that defne metabolc syndrome n a cohort of Catalona, 2005-2012 Laa Maynou 1,2,3, Joan Gl 4, Gabrel Coll-de-Tuero 5,2, Ton Mora 6, Carme Saurna 1,2, Anton Scras

More information

FAST DETECTION OF MASSES IN MAMMOGRAMS WITH DIFFICULT CASE EXCLUSION

FAST DETECTION OF MASSES IN MAMMOGRAMS WITH DIFFICULT CASE EXCLUSION computng@tanet.edu.te.ua www.tanet.edu.te.ua/computng ISSN 727-6209 Internatonal Scentfc Journal of Computng FAST DETECTION OF MASSES IN MAMMOGRAMS WITH DIFFICULT CASE EXCLUSION Gábor Takács ), Béla Patak

More information

Can Subjective Questions on Economic Welfare Be Trusted?

Can Subjective Questions on Economic Welfare Be Trusted? Publc Dsclosure Authorzed Polcy Research Workng Paper 6726 WPS6726 Publc Dsclosure Authorzed Publc Dsclosure Authorzed Can Subjectve Questons on Economc Welfare Be Trusted? Evdence for Three Developng

More information

A Computer-aided System for Discriminating Normal from Cancerous Regions in IHC Liver Cancer Tissue Images Using K-means Clustering*

A Computer-aided System for Discriminating Normal from Cancerous Regions in IHC Liver Cancer Tissue Images Using K-means Clustering* A Computer-aded System for Dscrmnatng Normal from Cancerous Regons n IHC Lver Cancer Tssue Images Usng K-means Clusterng* R. M. CHEN 1, Y. J. WU, S. R. JHUANG, M. H. HSIEH, C. L. KUO, Y. L. MA Department

More information

NeuroImage. Decoded fmri neurofeedback can induce bidirectional confidence changes within single participants

NeuroImage. Decoded fmri neurofeedback can induce bidirectional confidence changes within single participants NeuroImage 149 (2017) 323 337 Contents lsts avalable at ScenceDrect NeuroImage journal homepage: www.elsever.com/locate/neuromage Decoded fmri neurofeedback can nduce bdrectonal confdence changes wthn

More information