Incrementally Clustering Legislative Interpellation Documents


Hawaii International Conference on System Sciences

Fu-Ren Lin, Institute of Service Science, National Tsing Hua University, Taiwan
Yu-tze Huang, Institute of Technology Management, National Tsing Hua University, Taiwan
Dachi Liao, Institute of Political Science, National Sun Yat-sen University, Taiwan

Abstract

The Parliamentary Library of Legislative Yuan website provides a fair and objective channel for the public to trace the daily activities of the Legislative Yuan and legislators' inquiries in Taiwan. However, the growing volume of content causes an information overload problem. To mitigate it, this study proposes an incremental clustering mechanism that renews the information regularly and presents it as a categorical structure, easing the effort of tracing issue development. The study first builds a basic categorical structure with a two-stage clustering approach. The incremental clustering method then clusters each collection of incoming documents by topic, and either assigns the resulting clusters to existing categories or creates new categories. Experimental results show the effectiveness of the proposed incremental clustering method, which enables the management of a hierarchical structure of categories on legislative interpellation. This study contributes to e-government initiatives by helping the public trace legislative activities periodically.

1. Introduction

Taiwan is a country with great enthusiasm for politics, and political news floods both TV channels and newspapers. However, the news is mainly reported through third parties (i.e., reporters and anchors), so it inevitably carries certain political positions or personal opinions.
Nevertheless, the website of the Parliamentary Library of Legislative Yuan, the library of the parliament of Taiwan, provides a fair and objective channel to the public. Its content covers the daily activities of the Legislative Yuan (i.e., the congress of Taiwan), including written records and videos of interpellations, conference speeches, and legislative proposals. Thus, this website is the most direct and realistic channel for learning which issues concern the Legislative Yuan and how the legislators perform. However, the flood of content often causes an information overload problem. To address it, scholars in related fields have applied information technology to provide political information effectively. Many technologies have been proposed to solve the information overload problem, including search engines (information retrieval), information agents, and information customization [2]. Other methods, such as text mining techniques, have been used to discover interesting patterns in unstructured text. For example, classification techniques can distinguish relevant documents within document sets, and clustering mechanisms can group related documents on the same topic.

Those who are interested in politics usually collect related information via mass media. However, some research indicates that one reason voters lose interest in politics or do not vote is probably that they lack political information or the ability to process it [7]. Thus, we think that effective categorizing and clustering technology can help people understand the issues discussed in the Legislative Yuan and supervise the legislators' behavior. Apart from that, the Parliamentary Library of Legislative Yuan website provides massive information.
In interpellation sessions, the legislators interpellate the ministers or the members of the cabinet about policies, which are important questions involving public interests in general [9]. Interpellation is no less important than the negotiation of budgetary bills, legislation, or other significant national matters, such as consent to important personnel appointments of the Judicial Yuan and the Control Yuan. Especially in the current fierce political atmosphere, legislators exercise the right of interpellation to its extreme for the sake of their performance, which extends the scope of persons and matters subject to interpellation (Legislative Bureau of the Legislative Yuan, 2004).

This research aims to develop a hierarchical structure of categories to cluster the interpellation documents. This can transform the text provided on the Parliamentary Library of Legislative Yuan website into categorical information, in order to effectively mitigate the information overload problem faced by members of the general public who follow the development of each issue in the parliament and oversee the performance of legislators. Based on the periodic updates of the interpellation documents, this study aims to develop a clustering mechanism that can renew the categories of information regularly. However, people get confused easily if the existing categorical structure changes substantially. To limit both cognitive confusion and computational load, this study adopts an incremental clustering mechanism to efficiently and effectively maintain clusters after adding new documents.

In summary, this paper contributes to e-government initiatives mainly by mitigating the cognitive and computational loads of tracing issues raised in parliamentary interpellation sessions, using an incremental clustering approach.

2. Incremental clustering

In this research, we propose an incremental clustering architecture to develop a hierarchical structure of categories for the evolving issues of legislative interpellation.

2.1 Definition

The system is designed to update data periodically. First, a suitable time slot is determined to collect data and specify the period of incoming data. For example, for legislative interpellation, a period is defined as the duration of a legislative meeting session. We conduct categorical structure initialization for interpellation documents during period 1, and perform incremental clustering iteratively on periods 2, 3, 4, and so on. Through incremental clustering, we expect to generate a hierarchical structure of categories. In this study, we define three types of categories.

1. Sub-category: contains a group of related documents. Every branch ends with a sub-category, and each sub-category can be mounted under only one super-category as its parent.
2. Super-category: a virtual category containing multiple sub- or super-categories.
3. Root: a virtual category positioned at the root of the categorical tree. A new category not related to any existing category is mounted under the root. It is worth noting that child nodes under the root may not be correlated.

Each category may have three types of relations: parent, child, and peer. Figure 1 illustrates an example of a hierarchical structure of categories. There is only one root, which is the parent of super-categories 1 and 2 and sub-categories 5 and 10. Each category except the root has exactly one parent. Super-category 1 has two child sub-categories, sub-categories 3 and 4, and three peers: super-category 2 and sub-categories 5 and 10. Sub-category 3 has one parent (super-category 1) and one peer (sub-category 4).

Figure 1. An example of hierarchical structure of categories

2.2 System Framework

The system framework consists of two major parts: categorical structure initialization and incremental clustering (Figure 2). Categorical structure initialization is performed only once, to create the categorical tree at the beginning. Incremental clustering repeats as interpellation progresses.

2.3 Pre-process

The pre-process stage collects legislative interpellation documents and performs the data transformation task. First, interpellation documents are collected from the Parliamentary Library of Legislative Yuan. Then, we adopt CKIP (Chinese Knowledge and Information Processing) for word segmentation to annotate terms with part of speech (POS). We then use an n-gram method to assemble adjacent noun terms into noun phrases. According to the characteristics of news [6], the named-entity terms identified as noun phrases, e.g., people, places, and organizations, appear consistently within the same issue. We assume that the issues of legislative interpellation behave like news in this respect.
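The noun-phrase assembly step described above can be illustrated with a minimal sketch. It assumes that the segmenter returns (term, POS) pairs and that noun tags start with "N"; the function name `assemble_noun_phrases`, the `max_n` limit, and the English sample tokens are hypothetical and do not come from the paper or from the CKIP tool.

```python
from typing import List, Tuple


def assemble_noun_phrases(tagged: List[Tuple[str, str]],
                          max_n: int = 3, sep: str = " ") -> List[str]:
    """Join runs of adjacent noun terms into phrases of at most max_n terms.

    Assumption: a POS tag starting with "N" marks a noun.
    """
    phrases: List[str] = []
    run: List[str] = []
    # A sentinel non-noun token flushes the final run.
    for term, pos in list(tagged) + [("", "END")]:
        is_noun = pos.startswith("N")
        if is_noun and len(run) < max_n:
            run.append(term)
            continue
        if run:
            phrases.append(sep.join(run))
        # Start a new run when the n-gram limit was hit on a noun.
        run = [term] if is_noun else []
    return phrases


sample = [("health", "Na"), ("insurance", "Na"), ("reform", "VC"), ("budget", "Na")]
phrases = assemble_noun_phrases(sample)
```

Non-noun terms are dropped as a side effect, which matches the filtering that the pre-process stage applies to everything except noun phrases.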
Hence, terms other than noun phrases are filtered out for the sake of efficiency and effectiveness. However, the number of noun phrases in a corpus is too large to take all of them into computation. Thus, we applied tfidf (term frequency inverse document frequency) to select the terms with the top α percent of weights, and then converted these terms or term phrases into a vector space model. tfidf, often used in term weighting and information retrieval, has also been adopted for feature selection. It rests on a simple idea: a term with high frequency (tf) is important, but a term that appears in many documents has low discriminating power. Therefore, a term with high tfidf can be regarded as highly representative of the original stories [11]. The final step of the pre-process is to form a vector space model.

2.4 Categorical structure initialization

At the beginning, we need to generate a preliminary categorical structure so that incoming legislative interpellation documents of the following periods can be assigned to corresponding categories. It also provides a basic issue structure to help the public monitor legislative performance. Among clustering methods, this study chose a two-stage clustering approach to combine the advantages of partitioning and hierarchical methods. In the first stage, hierarchical clustering computes the inconsistency coefficient at each fusion to obtain the optimal number of clusters. In the second stage, k-means clustering physically partitions the documents into the number of clusters determined in the first stage.

Figure 2. System framework

Stage 1 (hierarchical clustering). The website of the Legislative Yuan of Taiwan provides not only the purpose of each interpellation but also themes, keywords, and categories encoded by humans. In this process, we extract the themes, keywords, and categories to form each interpellation's vector. The cosine coefficient [11], also known as cosine similarity, measures the similarity of two term vectors as the cosine of the angle between them, as shown below.
$$\cos(n_i, m_i) = \frac{\sum_j w_{n_i j}\, w_{m_i j}}{\sqrt{\sum_j w_{n_i j}^2}\;\sqrt{\sum_j w_{m_i j}^2}}$$

where $w_{n_i j}$ is the weight of term j in document $n_i$, and $w_{m_i j}$ is the weight of term j in document $m_i$. We use the cosine coefficient to calculate the similarity between documents. Each legislator's interpellation is filed as a document and transformed into a vector $d_i$. We define the similarity of two clusters as the average similarity between documents in the corresponding clusters, as shown below, where $d_i$ and $d_j$ are documents in cluster1 with m documents and cluster2 with n documents, respectively.

$$\mathrm{similarity}(cluster_1, cluster_2) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} \mathrm{cosine}(d_i, d_j)}{m \times n}$$

The hierarchical algorithm operates in a greedy and local manner [5]. It iteratively merges the pair of clusters with the highest similarity at each stage, and stops when all clusters are merged into one. The complexity of the binary structure generated by hierarchical clustering can be reduced by choosing a cutting threshold that determines the number of clusters, so that the physical partitioning can be performed with k-means clustering in the second stage [8]. Selecting the optimal number of clusters is one of the central problems in both nonhierarchical and hierarchical cluster analysis [3]. The inconsistency coefficient yields shallower categorical trees than the silhouette coefficient [5]. We also conducted a pilot experiment using the maximum similarity difference between each merge and the previous merge, and the result obtained with the inconsistency coefficient outperformed that of the silhouette coefficient. Therefore, we use the inconsistency coefficient to determine the number of clusters.

The inconsistent function generates a list of inconsistency coefficients, one for each link in the cluster tree. By default, it compares each link in the cluster hierarchy with adjacent links that are less than z levels below it. This is called the depth of the comparison. The objects at the bottom of the cluster tree, called leaf nodes, have no further objects below them and have an inconsistency coefficient of zero. Clusters that join two leaves also have a zero inconsistency coefficient. The inconsistency coefficient for the i-th (i = 1, 2, ..., N-1) fusion level $\alpha_i$ is

$$c_i = \frac{\alpha_i - \bar{\alpha}_z}{\sigma_z}$$

where $\bar{\alpha}_z$ and $\sigma_z$ are, respectively, the mean and standard deviation of the heights of level $\alpha_i$ and the z highest fusion levels before it. Notice that the z heights are taken from the sub-tree rooted at the node of the i-th level. Let the sub-tree contain l fusion levels. If 0 < l < z, the l levels are considered, and when l = 0 the inconsistency coefficient is zero [5].

The silhouette coefficient combines the ideas of both cohesion and separation for individual points as well as clusters. It is defined as follows.

$$s(i) = \frac{b(i) - a(i)}{\max(a(i), b(i))}$$

where a(i) denotes the average dissimilarity of object i to all other objects of its own cluster A. For any cluster C different from A, let d(i, C) be the average dissimilarity of i to all objects of C. After computing d(i, C) for all clusters C ≠ A, the smallest value among them is selected and denoted as b(i). The value of the silhouette coefficient can vary between -1 and 1. A negative value is undesirable. We want the silhouette coefficient to be positive (a(i) < b(i)), and a(i) to be as close to 0 as possible, since the coefficient assumes its maximum value of 1 when a(i) = 0 [4].

Stage 2 (k-means clustering).
The k-means clustering method generally outperforms hierarchical clustering, but its performance depends on the number of clusters and the initial seeds. After obtaining the initial seeds and the number of clusters from the hierarchical clustering results in Stage 1, we use the k-means clustering method to physically partition the document set starting with these initial seeds. We take the clusters obtained from k-means as the initial categories, which are mounted under the root. The system outputs the hierarchical structure of categories from the top down, layer by layer.

2.5 Incremental clustering

People tend to get used to a categorical structure, and a large structural change may disrupt their existing understanding of it. Therefore, we explore an incremental clustering approach that modifies categories within a small part of the hierarchical structure instead of re-clustering the whole document set. The incremental structure maintenance approach greatly reduces people's cognitive load [8]. Documents in each period are treated as incoming documents and processed period by period. The proposed incremental clustering approach first pre-processes the incoming documents to transform unstructured text into vector space. Then the two-stage clustering technique clusters these incoming documents into emerging issues. Next, these clusters are incrementally added into the existing categorical structure. Finally, each category is named with distinguishing terms.

Issue identification. In the political domain, the main objects or concepts of interest in documents are generally actors (such as states, parties, and politicians) and issues (such as employment, peace, and healthcare) [1]. A legislative interpellation arises when legislators question specific issues in the Legislative Yuan.
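The two-stage clustering that groups documents into issues can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the authors' Java implementation: documents are sparse term-weight dictionaries, the number of clusters `k` is passed in directly (the paper derives it from the inconsistency coefficient, which is omitted here), and all function names are hypothetical.

```python
import math
from typing import Dict, List

Vector = Dict[str, float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine coefficient between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def cluster_similarity(c1: List[Vector], c2: List[Vector]) -> float:
    """Average pairwise cosine similarity between two clusters."""
    return sum(cosine(x, y) for x in c1 for y in c2) / (len(c1) * len(c2))


def centroid(cluster: List[Vector]) -> Vector:
    """Average of the document vectors in a cluster."""
    out: Vector = {}
    for doc in cluster:
        for t, w in doc.items():
            out[t] = out.get(t, 0.0) + w / len(cluster)
    return out


def agglomerate(docs: List[Vector], k: int) -> List[List[Vector]]:
    """Stage 1: greedily merge the most similar pair until k clusters remain."""
    clusters = [[d] for d in docs]
    while len(clusters) > k:
        pairs = ((a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters)))
        i, j = max(pairs, key=lambda p: cluster_similarity(clusters[p[0]],
                                                           clusters[p[1]]))
        clusters[i] += clusters.pop(j)  # j > i, so index i stays valid
    return clusters


def kmeans(docs: List[Vector], seeds: List[Vector],
           iterations: int = 10) -> List[List[Vector]]:
    """Stage 2: k-means with cosine similarity, started from the given seeds."""
    cents = list(seeds)
    groups: List[List[Vector]] = [[] for _ in cents]
    for _ in range(iterations):
        groups = [[] for _ in cents]
        for d in docs:
            best = max(range(len(cents)), key=lambda c: cosine(d, cents[c]))
            groups[best].append(d)
        # Keep the previous centroid when a group ends up empty.
        cents = [centroid(g) if g else cents[n] for n, g in enumerate(groups)]
    return groups


d1 = {"tax": 1.0, "budget": 1.0}
d2 = {"tax": 1.0, "budget": 0.8}
d3 = {"school": 1.0, "teacher": 1.0}
d4 = {"school": 1.0, "teacher": 0.5}
docs = [d1, d2, d3, d4]
stage1 = agglomerate(docs, k=2)
issues = kmeans(docs, [centroid(c) for c in stage1])
```

Seeding k-means with the Stage 1 centroids removes the random initialization that otherwise makes k-means results vary between runs.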
New categories are created when new interpellations are added but no similar category exists. This implies that the new category is clearly distinguishable from the existing categories. Hence, a group of related interpellations is defined as an issue, which may be incrementally added into the categorical structure. We use the two-stage clustering technique to group interpellations that occurred in the same period into issues.

Categorical structure maintenance. The major steps of the categorical structure maintenance approach are depicted in Figure 3. When a new issue is added into the categorical structure, the first step is to represent the issue with the vector space model [11]. Three tests, the classification test, the category inter-similarity test, and the silhouette coefficient (SC) test, are used to examine the need for five operations: creating a new category, integrating an issue into a category, decomposing a category, adding a peer category, and re-clustering a super-category. Categories generated in the categorical structure initialization stage are represented by the vector space model, where the centroid vector of a category is the average of the vectors of the documents in that category. Note that only sub-categories are represented by the vector space model; a super-category is a virtual category, to which we do not add any incoming issues. An incoming issue is likewise represented by a vector whose values denote the average occurrence of terms appearing in the documents of the same cluster.

In the classification test, we classify a new issue by comparing it with the existing category centroids. We calculate the similarity between an incoming issue and a category with the cosine coefficient. If the similarity value is greater than γ, we assume that the issue is suitable for an existing category, and it becomes a candidate for the corresponding category in the next tests. Otherwise, if the similarity value is smaller than γ, no existing category is similar to the issue, so the issue is transformed into a new category and mounted under the root of the categorical tree.

For a new issue i, once the previous step has determined that category C is the most similar existing category, we need to decide whether the issue is suitable to be integrated into the category. First, we define the inter-similarity function. Each category has its own centroid vector. We use the average cosine similarity [11] between the centroid and each object in the same category to represent inter-similarity as follows:

$$\text{inter-similarity}(C) = \frac{\sum_{i=1}^{size(C)} \mathrm{cosine}(o_i, C_{centroid})}{size(C)}$$

With the inter-similarity function, we obtain the original inter-similarity of category C and the new inter-similarity of category C with the issue added. If new inter-similarity(i, C) is greater than original inter-similarity(C), adding issue i into category C increases the cohesion of category C, which is the most favorable case. Thus, we integrate issue i into category C. If new inter-similarity(i, C) is smaller than original inter-similarity(C), we test the decreasing inter-similarity rate to evaluate the integration of issue i and category C. The decreasing inter-similarity rate is defined as follows:

$$\text{decreasing inter-similarity rate}(i, C) = \frac{\text{original inter-similarity}(C) - \text{new inter-similarity}(i, C)}{\text{original inter-similarity}(C)}$$

If the decreasing inter-similarity rate is smaller than the threshold α, the decrease is within a reasonable range, and we integrate issue i into category C. If the rate is greater than the threshold α, we first examine original inter-similarity(C) to check the cohesion of category C. We set the inter-similarity threshold β to the average inter-similarity of all categories in the categorical tree. If original inter-similarity(C) is smaller than β, the cohesion of category C is relatively low in the categorical tree, and we conduct category decomposition. If the rate is greater than α but original inter-similarity(C) is greater than β, we add the issue as a peer of category C.

If inter-similarity(C) is smaller than threshold β, the two-stage clustering is applied to decompose the category after issue i is added into category C. Meanwhile, category C is transformed from a sub-category into a super-category. Note that the number of clusters is determined by the inconsistency coefficient. We need to decide where these new clusters (C_new) should be placed, so we treat each C_new as an incoming issue and apply a similar method iteratively, described as follows.
First, we use the classification test to obtain the category C′ most similar to the new cluster C_new, and examine the criteria mentioned above: original inter-similarity(C′), new inter-similarity(C_new, C′), and the decreasing inter-similarity rate. If new inter-similarity(C_new, C′) is greater than original inter-similarity(C′), or the decreasing inter-similarity rate is below threshold α, C_new is added into category C′. Otherwise, the new cluster C_new is mounted under the original super-category C.

If inter-similarity(C) is greater than threshold β, but adding the new issue i into category C would lower inter-similarity(C) dramatically, we transform issue i into a new category and place it as a peer of category C. This means that the new category is related to category C, and they have the same parent. Categories in the same family share a certain degree of correlation. It is worth noting that children under the root do not have this kind of correlation. Therefore, if the parent of category C is the root, we create a new super-category and place it at the original position of category C. Category C then becomes a child of this new super-category, and issue i is transformed into a new category that is also placed as a child of this new super-category, so that category C and issue i remain in the same family and share a certain degree of correlation.

After inserting a new issue, we test the silhouette coefficients (SCs) of the super-categories in which the new issue resides. We compare each super-category's original SC with its new SC. If the SC's decreasing rate is greater than a given threshold θ, the number of clusters of the super-category is no longer accurate after the update, and we perform the re-clustering function to restructure the categories under the super-category. If the SC's decreasing rate is equal to or less than θ, the cluster structure of the super-category is still acceptable, and no further re-clustering is needed.

Naming categories. The main idea of labeling categories is to help users differentiate categories.
Maximum Term Weight Labeling (MTWL) [10] is based on the idea of tfidf and incorporates hierarchical information through two specialized weighting functions, $idf_{global}$ and $idf_{local}$. MTWL can be written as

$$MTWL_k = idf_{global} \times idf_{local} \times tf_k$$

where $tf_k$ is the term frequency of term k. The global inverse document frequency for term k is calculated as

$$idf_{global} = \log\left(\frac{|D|}{\#(t_k, D)} + 1\right)$$

where $\#(t_k, D)$ denotes the number of documents in the collection containing term k, and $|D|$ is the number of all documents. Global weighting penalizes terms that are over-represented in the whole collection. However, terms that are over-represented only in a particular sub-category are likely to be selected. Hence, the term distribution among peers has to be taken into account to avoid siblings getting similar labels. We adopt the local inverse document frequency, which depends on the term distribution over documents in the sub-category. $idf_{local}$ for term k in cluster $c_j$ is calculated as

$$idf_{local, j} = \log\left(\frac{|D^{*}_{c_p}|}{\#(t_k, D^{*}_{c_p})} + 1\right)$$

where $c_p$ is the parent cluster of cluster $c_j$, $|D^{*}_{c_p}|$ is the number of documents in $c_p$, and $\#(t_k, D^{*}_{c_p})$ is the number of documents in $c_p$ that contain term k.

Muhr, Kern and Granitzer [10] suggested that MTWL can be extended by a hierarchical labeling method in order to take the parent-child relationship into account. The results in [10] also show that MTWL extended by hierarchical labeling reaches stable accuracy at different levels. Therefore, we apply MTWL to extract representative terms from sub-categories. For super-categories, we apply MTWL extended by hierarchical labeling to take path length into account. For a category C, every term appearing in the category becomes a candidate term, and the system assigns a score to each candidate. Finally, we take the three candidate terms with the highest scores to represent the category. We conduct the same process for every sub- and super-category in the system except the root.

3. System implementation and results

3.1 Data Source

We collected 12,743 legislative interpellation archives generated by the 6th term of legislators from February 1, 2005 to January 31, 2008, distributed by the Parliamentary Library of Legislative Yuan. According to the session plan of the Legislative Yuan, two sessions are held each year; thus, we treat the six sessions as six periods, as shown in Figure 4. We extract the legislator, category, theme, keyword, purpose, and date from the documents. The results of incremental clustering are evaluated by human evaluators. More than ten thousand documents are too many for human evaluators to examine, so we take ten percent of the documents in each period as the sample data to evaluate the proposed system.
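Returning to the category naming step, the MTWL score for a sub-category can be sketched as below. The sketch assumes that each document is a list of terms and that the documents of a category are included in both its parent's documents and the full collection, so every document frequency is at least one. The function names and the sample data are hypothetical, and the hierarchical extension used for super-categories is not covered.

```python
import math
from collections import Counter
from typing import Dict, List

Doc = List[str]


def mtwl_scores(category_docs: List[Doc], parent_docs: List[Doc],
                all_docs: List[Doc]) -> Dict[str, float]:
    """Score each term of a category as idf_global * idf_local * tf."""
    tf = Counter(term for doc in category_docs for term in doc)

    def idf(term: str, docs: List[Doc]) -> float:
        # log(|D| / #(t, D) + 1), with #(t, D) the number of docs containing t
        df = sum(1 for doc in docs if term in doc)
        return math.log(len(docs) / df + 1)

    return {term: idf(term, all_docs) * idf(term, parent_docs) * freq
            for term, freq in tf.items()}


def top_labels(scores: Dict[str, float], n: int = 3) -> List[str]:
    """Return the n best-scoring terms, breaking ties alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:n]]


category = [["pension", "reform", "budget"], ["pension", "reform"]]
sibling = [["budget", "tax"], ["budget", "tax"]]
other = [["school", "budget"], ["school", "teacher"]]
parent = category + sibling
collection = parent + other

scores = mtwl_scores(category, parent, collection)
labels = top_labels(scores, n=2)
```

In this toy collection "budget" appears in most documents, so both idf factors shrink its score and the labels fall on the terms that set the category apart from its sibling.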

Figure 3. The procedure of categorical structure maintenance

3.2 System implementation

The proposed mechanism was implemented in Java and followed the process described in Section 2. In the pre-process stage, we first extracted three types of terms (theme, category, and keyword) and then conducted feature selection for the remaining terms. A remaining term is removed from the term list if its part of speech (POS) is not a noun or if its tfidf rank falls in the bottom 80%. We tried different tfidf thresholds in the experiments and obtained the best results by keeping the top 20% of terms. Thus, in the implementation, we set the tfidf threshold α to 0.2. With this threshold, the number of terms varies from period to period.

In the two-stage clustering process, the optimal number of clusters must be determined before k-means clustering is applied. Take hierarchical clustering in periods 1 and 2 as an example: the maximum inconsistency coefficient puts the optimal number of clusters at 65 and 136 in periods 1 and 2, respectively.

The categorical structure maintenance procedure has three parameters: the decreasing inter-similarity rate threshold α, the inter-similarity threshold β, and the SC decreasing rate threshold θ. We set α to 0.2 to ensure that whenever a new issue is added into category C, the new inter-similarity does not drop by more than 20% of the original inter-similarity. The inter-similarity threshold β determines whether a category will be decomposed; we set it to the average inter-similarity of all categories in the proposed system. The SC decreasing rate threshold θ is set to 0.3 because we do not want to re-cluster unless it is necessary.
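The way these parameters drive the maintenance decisions can be summarized in a short sketch. It is a simplified reading of the procedure, not the authors' code: the action names are hypothetical labels, the value γ = 0.5 is an assumption because the paper does not report it, boundary cases (values exactly equal to a threshold) are resolved arbitrarily, and the similarity and SC values are assumed to be computed elsewhere and to be positive.

```python
def decreasing_rate(original: float, new: float) -> float:
    """Relative drop from the original value (assumes original > 0)."""
    return (original - new) / original


def maintenance_action(similarity: float, original: float, new: float,
                       beta: float, parent_is_root: bool,
                       gamma: float = 0.5, alpha: float = 0.2) -> str:
    """Pick a maintenance operation for an incoming issue.

    similarity: cosine between the issue and its most similar category C
    original, new: inter-similarity of C before and after adding the issue
    beta: average inter-similarity of all categories
    gamma: classification threshold (assumed value, not from the paper)
    alpha: decreasing inter-similarity rate threshold (0.2 in the paper)
    """
    if similarity <= gamma:
        return "new category under root"      # no similar category exists
    if new >= original:
        return "integrate"                    # cohesion of C increases
    if decreasing_rate(original, new) <= alpha:
        return "integrate"                    # drop is within tolerance
    if original < beta:
        return "decompose category"           # C was already loosely knit
    if parent_is_root:
        return "add peer under new super-category"
    return "add peer"


def needs_reclustering(original_sc: float, new_sc: float,
                       theta: float = 0.3) -> bool:
    """SC test: re-cluster a super-category when its SC drops by more than theta."""
    return decreasing_rate(original_sc, new_sc) > theta


action = maintenance_action(similarity=0.9, original=0.8, new=0.7,
                            beta=0.6, parent_is_root=False)
```

Ordering the checks from cheapest outcome (integration) to most disruptive (decomposition or a new super-category) mirrors the goal of changing the visible structure as little as possible.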

Figure 4. The number of interpellation documents in each session

3.3 Results

We applied two-stage clustering to obtain the initial basic categorical structure of interpellation documents in period 1, and named each category. The basic categorical structure contains 65 sub-categories, as shown in Figure 5. We then conducted incremental clustering with interpellations from periods 2 to 6. As time goes by, the number of newly created categories decreases, which means that the categorical structure covers most of the incoming issues.

Figure 5. The number of categories in each period

4. Evaluation Design and Results

4.1 Evaluation criteria

The performance of the incremental clustering method is evaluated by the degree of modification that experts make to the results generated by the proposed system. We borrow the idea of precision, which is widely used as a relevancy measure in information retrieval [11]. It measures the percentage of relevant documents among the documents retrieved. In this study, we view a query as a category designation, and adopt the accuracy measure to evaluate how closely the incremental clustering method agrees with domain experts in assigning documents to corresponding categories. M denotes the set of documents allocated to a category by a domain expert, and A denotes the set of documents assigned to the category by the incremental clustering method. $N_A$ denotes the number of documents in A, and $N_{A \cap M}$ denotes the number of documents in both A and M. For each category, the accuracy is defined as below.

$$Accuracy = \frac{N_{A \cap M}}{N_A}$$

The accuracy for a category denotes the percentage of documents assigned by the incremental clustering method that match what the domain experts assign. The accuracy for a hierarchical structure of categories is calculated by averaging the accuracy values of all categories of the modified results.
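As a small worked example of the accuracy measure, the sketch below computes the per-category accuracy and its average over a structure. The category names, document ids, and function names are made up for illustration.

```python
from typing import Dict, Set


def category_accuracy(assigned: Set[int], expert: Set[int]) -> float:
    """N_{A∩M} / N_A for one category (assumes the category is not empty)."""
    return len(assigned & expert) / len(assigned)


def structure_accuracy(system: Dict[str, Set[int]],
                       expert: Dict[str, Set[int]]) -> float:
    """Average the per-category accuracy over all system categories."""
    values = [category_accuracy(docs, expert.get(name, set()))
              for name, docs in system.items()]
    return sum(values) / len(values)


system = {"tax": {1, 2, 3, 4}, "school": {5, 6}}
expert = {"tax": {1, 2, 3}, "school": {5, 6, 7}}
overall = structure_accuracy(system, expert)
```

Here the expert removes one of four documents from "tax" (accuracy 0.75) and keeps both documents in "school" (accuracy 1.0), so the structure scores 0.875. Documents that the expert adds to a category do not lower the score, because the denominator is the system's own assignment.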

4.2 Experimental Design

We designed a Web interface for subjects to modify the categorical structure on desktop computers, and invited three evaluators with a political domain background to evaluate the generated categorical structure. The evaluators click a category on the left side of the screen and examine the interpellation documents on the right side, which are the documents assigned to this category by the proposed incremental clustering system. After reading the keywords and interpellations on the right side, the evaluators decide, for each document, whether it is suitable for this category. This procedure was followed for each of periods 1 to 6. The difference in the categorical structure before and after the domain experts' modification is analyzed to evaluate the performance of the proposed incremental clustering method.

4.3 Evaluation results and discussions

We found that the accuracy in these experiments is very high (85%~93%); however, the experts' judgments vary greatly. Modifying the accuracy measure to reflect the consensus of the experts brings more insight. The revised accuracy for each category is defined as below.

$$Revised\ Accuracy = \frac{N_{U_1 \cap U_2 \cap \dots \cap U_n}}{N_A}$$

where A denotes the set of documents in a category designated by the incremental clustering method; $U_i$ denotes the set of documents in a category designated by both the incremental clustering method and domain expert i; and $N_A$ denotes the number of documents in A. The revised accuracy denotes the percentage of the interpellation documents assigned to a category by the incremental clustering method that are also jointly designated by all n domain experts. The revised accuracy of the hierarchical structure of categories is calculated by averaging the revised accuracy values of all categories of the modified results.
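The effect of requiring consensus can be seen in a small sketch of the revised accuracy. The document ids and the function name are hypothetical; each expert set plays the role of M for one expert, and $U_i$ is formed as its intersection with A.

```python
from typing import List, Set


def revised_accuracy(assigned: Set[int], expert_sets: List[Set[int]]) -> float:
    """Share of system-assigned documents that every expert also accepts."""
    agreed = set(assigned)
    for expert in expert_sets:
        agreed &= assigned & expert   # U_i = A ∩ M_i, intersected over experts
    return len(agreed) / len(assigned)


assigned = {1, 2, 3, 4}
experts = [{1, 2, 3}, {1, 2, 4}, {1, 2, 3, 4}]
individual = [len(assigned & m) / len(assigned) for m in experts]
consensus = revised_accuracy(assigned, experts)
```

The three experts individually score 0.75, 0.75, and 1.0, yet they agree on only two of the four documents, so the revised accuracy is 0.5. The revised measure can never exceed the lowest individual accuracy.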
From the evaluation results, the revised accuracy is lower than the values obtained with the original accuracy measure. This implies that individual domain experts find it difficult to reach consensus on categorization. We then conducted a second evaluation, presenting the interpellation documents on which the experts gave inconsistent answers and asking the experts to discuss them in order to agree on consistent categories for these interpellations. To compute the resulting accuracy, denoted as the second revised accuracy, we used the accuracy measure with M as the set of interpellations allocated to a category through the negotiation of the three domain experts. The accuracy of the categorical structure in each period is listed in Table 1. The result shows that the proposed method can cluster these interpellations with acceptable accuracy. The domain experts mentioned that some categories may contain multiple issues, and even though we named the categories to help people identify these issues, they still found it difficult to decide the main issue in a category.

Table 1. Accuracy of categorical structure in each period
Period | Accuracy: Expert 1 | Accuracy: Expert 2 | Accuracy: Expert 3 | Accuracy: Average | Revised accuracy | Second revised accuracy

5. Conclusions and future works

This research proposes an incremental clustering method to construct a hierarchical structure of categories, which helps the public identify the latest issues in the Legislative Yuan and monitor legislators' performance. In the categorical structure initialization stage, we constructed a basic categorical structure. Then, in the incremental clustering stage, the system assigns each incoming interpellation document to a corresponding existing category or creates a new category. In this way, the initial categorical structure is transformed into a hierarchical structure, and people can keep track of legislators' performance through the interpellations allocated to the corresponding categories.

We summarize the contributions of this study as follows:
(1) This study has adopted the two-stage clustering approach iteratively to generate a hierarchical structure of categories.
(2) The incremental clustering has increased the efficiency of clustering, in particular for an ever-increasing volume of data.
(3) This study has linked information retrieval and text mining techniques to streamline the transformation from interpellation documents to a hierarchical structure of categories.
(4) Transforming text into statistical form will effectively mitigate the problems caused by information overload.

The adoption of the incremental clustering method for incoming documents can be further tested in the following circumstances:
(1) Multiple cases can be studied and implemented with different parameter settings to find the best system parameter setting.
(2) In this paper, we used only ten percent of the interpellation documents generated by the 6th term of legislators for testing. This may be insufficient to assess the performance of the system in the real world. Thus, we should use the complete data set to demonstrate the effectiveness of the method in real-world applications.
(3) The results of the proposed system may help people view the issues that arise in the Legislative Yuan objectively if we provide a visualization of the results, which may help people understand them at a glance.
(4) With domain experts' feedback, the system should have the ability to learn from human judgment.

References

[1] W. V. Atteveldt, J. Kleinnijenhuis, N. Ruigrok, and S. Schlobach, "Good News or Bad News? Conducting sentiment analysis on Dutch text to distinguish between positive and negative relations," Journal of Information Technology & Politics, 5(1), 2008.
[2] H. Berghel, "Cyberspace 2000: Dealing with information overload," Communications of the ACM, 40(2), 1997.
[3] B. S. Everitt, S. Landau, and M. Leese, Cluster Analysis (4th ed.), Arnold, London.
[4] L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis, Wiley Series in Probability and Statistics, John Wiley and Sons, New York.
[5] T. Korenius, J. Laurikkala, M. Juhola, and K. Jarvelin, "Hierarchical clustering of a Finnish newspaper article collection with graded relevance assessments," Information Retrieval, 9(1), 2006.
[6] L. Ku, A study on the multilingual topic detection of news articles, Master Dissertation, Department of Computer Science and Information Engineering, National Taiwan University.
[7] Y. Liao, "The Research of Voter Turnout: Case Study in Taiwan," The Journal of Chinese Public Administration, (3), 2006.
[8] F.-r. Lin and C.-m. Hsueh, "Knowledge map creation and maintenance for virtual communities of practice," Information Processing & Management, 42(2), 2006.
[9] J. J. Lin, "The Study of Interpellation System of Legislative Yuan in R.O.C.," Journal of TOKO, 1(1).
[10] M. Muhr, R. Kern, and M. Granitzer, "Analysis of structural relationships for hierarchical cluster labeling," in Proceedings of the 33rd International ACM SIGIR Conference on Research and Development in Information Retrieval, ACM.
[11] G. Salton and C. Buckley, "Term-weighting approaches in automatic text retrieval," Information Processing & Management, 24(5), 1988.

A Comparison of Collaborative Filtering Methods for Medication Reconciliation

A Comparison of Collaborative Filtering Methods for Medication Reconciliation A Comparison of Collaborative Filtering Methods for Medication Reconciliation Huanian Zheng, Rema Padman, Daniel B. Neill The H. John Heinz III College, Carnegie Mellon University, Pittsburgh, PA, 15213,

More information

Stepwise Knowledge Acquisition in a Fuzzy Knowledge Representation Framework

Stepwise Knowledge Acquisition in a Fuzzy Knowledge Representation Framework Stepwise Knowledge Acquisition in a Fuzzy Knowledge Representation Framework Thomas E. Rothenfluh 1, Karl Bögl 2, and Klaus-Peter Adlassnig 2 1 Department of Psychology University of Zurich, Zürichbergstraße

More information

Annotation and Retrieval System Using Confabulation Model for ImageCLEF2011 Photo Annotation

Annotation and Retrieval System Using Confabulation Model for ImageCLEF2011 Photo Annotation Annotation and Retrieval System Using Confabulation Model for ImageCLEF2011 Photo Annotation Ryo Izawa, Naoki Motohashi, and Tomohiro Takagi Department of Computer Science Meiji University 1-1-1 Higashimita,

More information

More Examples and Applications on AVL Tree

More Examples and Applications on AVL Tree CSCI2100 Tutorial 11 Jianwen Zhao Department of Computer Science and Engineering The Chinese University of Hong Kong Adapted from the slides of the previous offerings of the course Recall in lectures we

More information

Chapter 1. Introduction

Chapter 1. Introduction Chapter 1 Introduction 1.1 Motivation and Goals The increasing availability and decreasing cost of high-throughput (HT) technologies coupled with the availability of computational tools and data form a

More information

INTERNATIONAL STANDARD ON ASSURANCE ENGAGEMENTS 3000 ASSURANCE ENGAGEMENTS OTHER THAN AUDITS OR REVIEWS OF HISTORICAL FINANCIAL INFORMATION CONTENTS

INTERNATIONAL STANDARD ON ASSURANCE ENGAGEMENTS 3000 ASSURANCE ENGAGEMENTS OTHER THAN AUDITS OR REVIEWS OF HISTORICAL FINANCIAL INFORMATION CONTENTS INTERNATIONAL STANDARD ON ASSURANCE ENGAGEMENTS 3000 ASSURANCE ENGAGEMENTS OTHER THAN AUDITS OR REVIEWS OF HISTORICAL FINANCIAL INFORMATION (Effective for assurance reports dated on or after January 1,

More information

Chapter IR:VIII. VIII. Evaluation. Laboratory Experiments Logging Effectiveness Measures Efficiency Measures Training and Testing

Chapter IR:VIII. VIII. Evaluation. Laboratory Experiments Logging Effectiveness Measures Efficiency Measures Training and Testing Chapter IR:VIII VIII. Evaluation Laboratory Experiments Logging Effectiveness Measures Efficiency Measures Training and Testing IR:VIII-1 Evaluation HAGEN/POTTHAST/STEIN 2018 Retrieval Tasks Ad hoc retrieval:

More information

Analysis of Emotion Recognition using Facial Expressions, Speech and Multimodal Information

Analysis of Emotion Recognition using Facial Expressions, Speech and Multimodal Information Analysis of Emotion Recognition using Facial Expressions, Speech and Multimodal Information C. Busso, Z. Deng, S. Yildirim, M. Bulut, C. M. Lee, A. Kazemzadeh, S. Lee, U. Neumann, S. Narayanan Emotion

More information

Lionbridge Connector for Hybris. User Guide

Lionbridge Connector for Hybris. User Guide Lionbridge Connector for Hybris User Guide Version 2.1.0 November 24, 2017 Copyright Copyright 2017 Lionbridge Technologies, Inc. All rights reserved. Published in the USA. March, 2016. Lionbridge and

More information

Real-time Summarization Track

Real-time Summarization Track Track Jaime Arguello jarguell@email.unc.edu February 6, 2017 Goal: developing systems that can monitor a data stream (e.g., tweets) and push content that is relevant, novel (with respect to previous pushes),

More information

PO Box 19015, Arlington, TX {ramirez, 5323 Harry Hines Boulevard, Dallas, TX

PO Box 19015, Arlington, TX {ramirez, 5323 Harry Hines Boulevard, Dallas, TX From: Proceedings of the Eleventh International FLAIRS Conference. Copyright 1998, AAAI (www.aaai.org). All rights reserved. A Sequence Building Approach to Pattern Discovery in Medical Data Jorge C. G.

More information

Expert System Profile

Expert System Profile Expert System Profile GENERAL Domain: Medical Main General Function: Diagnosis System Name: INTERNIST-I/ CADUCEUS (or INTERNIST-II) Dates: 1970 s 1980 s Researchers: Ph.D. Harry Pople, M.D. Jack D. Myers

More information

Reader s Emotion Prediction Based on Partitioned Latent Dirichlet Allocation Model

Reader s Emotion Prediction Based on Partitioned Latent Dirichlet Allocation Model Reader s Emotion Prediction Based on Partitioned Latent Dirichlet Allocation Model Ruifeng Xu, Chengtian Zou, Jun Xu Key Laboratory of Network Oriented Intelligent Computation, Shenzhen Graduate School,

More information

Artificial Intelligence For Homeopathic Remedy Selection

Artificial Intelligence For Homeopathic Remedy Selection Artificial Intelligence For Homeopathic Remedy Selection A. R. Pawar, amrut.pawar@yahoo.co.in, S. N. Kini, snkini@gmail.com, M. R. More mangeshmore88@gmail.com Department of Computer Science and Engineering,

More information

Colon cancer subtypes from gene expression data

Colon cancer subtypes from gene expression data Colon cancer subtypes from gene expression data Nathan Cunningham Giuseppe Di Benedetto Sherman Ip Leon Law Module 6: Applied Statistics 26th February 2016 Aim Replicate findings of Felipe De Sousa et

More information

Network Analysis of Toxic Chemicals and Symptoms: Implications for Designing First-Responder Systems

Network Analysis of Toxic Chemicals and Symptoms: Implications for Designing First-Responder Systems Network Analysis of Toxic Chemicals and Symptoms: Implications for Designing First-Responder Systems Suresh K. Bhavnani 1 PhD, Annie Abraham 1, Christopher Demeniuk 1, Messeret Gebrekristos 1 Abe Gong

More information

Improved Intelligent Classification Technique Based On Support Vector Machines

Improved Intelligent Classification Technique Based On Support Vector Machines Improved Intelligent Classification Technique Based On Support Vector Machines V.Vani Asst.Professor,Department of Computer Science,JJ College of Arts and Science,Pudukkottai. Abstract:An abnormal growth

More information

T. R. Golub, D. K. Slonim & Others 1999

T. R. Golub, D. K. Slonim & Others 1999 T. R. Golub, D. K. Slonim & Others 1999 Big Picture in 1999 The Need for Cancer Classification Cancer classification very important for advances in cancer treatment. Cancers of Identical grade can have

More information

Exploration and Exploitation in Reinforcement Learning

Exploration and Exploitation in Reinforcement Learning Exploration and Exploitation in Reinforcement Learning Melanie Coggan Research supervised by Prof. Doina Precup CRA-W DMP Project at McGill University (2004) 1/18 Introduction A common problem in reinforcement

More information

Predicting Breast Cancer Survivability Rates

Predicting Breast Cancer Survivability Rates Predicting Breast Cancer Survivability Rates For data collected from Saudi Arabia Registries Ghofran Othoum 1 and Wadee Al-Halabi 2 1 Computer Science, Effat University, Jeddah, Saudi Arabia 2 Computer

More information

CHAPTER 6 HUMAN BEHAVIOR UNDERSTANDING MODEL

CHAPTER 6 HUMAN BEHAVIOR UNDERSTANDING MODEL 127 CHAPTER 6 HUMAN BEHAVIOR UNDERSTANDING MODEL 6.1 INTRODUCTION Analyzing the human behavior in video sequences is an active field of research for the past few years. The vital applications of this field

More information

How to Create Better Performing Bayesian Networks: A Heuristic Approach for Variable Selection

How to Create Better Performing Bayesian Networks: A Heuristic Approach for Variable Selection How to Create Better Performing Bayesian Networks: A Heuristic Approach for Variable Selection Esma Nur Cinicioglu * and Gülseren Büyükuğur Istanbul University, School of Business, Quantitative Methods

More information

Chapter 5: Field experimental designs in agriculture

Chapter 5: Field experimental designs in agriculture Chapter 5: Field experimental designs in agriculture Jose Crossa Biometrics and Statistics Unit Crop Research Informatics Lab (CRIL) CIMMYT. Int. Apdo. Postal 6-641, 06600 Mexico, DF, Mexico Introduction

More information

Analyzing Human Negotiation using Automated Cognitive Behavior Analysis: The Effect of Personality. Pedro Sequeira & Stacy Marsella

Analyzing Human Negotiation using Automated Cognitive Behavior Analysis: The Effect of Personality. Pedro Sequeira & Stacy Marsella Analyzing Human Negotiation using Automated Cognitive Behavior Analysis: The Effect of Personality Pedro Sequeira & Stacy Marsella Outline Introduction Methodology Results Summary & Conclusions Outline

More information

Essential Skills for Evidence-based Practice Understanding and Using Systematic Reviews

Essential Skills for Evidence-based Practice Understanding and Using Systematic Reviews J Nurs Sci Vol.28 No.4 Oct - Dec 2010 Essential Skills for Evidence-based Practice Understanding and Using Systematic Reviews Jeanne Grace Corresponding author: J Grace E-mail: Jeanne_Grace@urmc.rochester.edu

More information

Cocktail Preference Prediction

Cocktail Preference Prediction Cocktail Preference Prediction Linus Meyer-Teruel, 1 Michael Parrott 1 1 Department of Computer Science, Stanford University, In this paper we approach the problem of rating prediction on data from a number

More information

Perceived similarity and visual descriptions in content-based image retrieval

Perceived similarity and visual descriptions in content-based image retrieval University of Wollongong Research Online Faculty of Informatics - Papers (Archive) Faculty of Engineering and Information Sciences 2007 Perceived similarity and visual descriptions in content-based image

More information

Audit Firm Administrator steps to follow

Audit Firm Administrator steps to follow Contents Audit Firm Administrator steps to follow... 3 What to know before you start... 3 Understanding CaseWare Cloud in a nutshell... 3 How to do the once off set up for the Audit Firm or Organisation...

More information

International Journal of Computer Engineering and Applications, Volume XI, Issue IX, September 17, ISSN

International Journal of Computer Engineering and Applications, Volume XI, Issue IX, September 17,  ISSN CRIME ASSOCIATION ALGORITHM USING AGGLOMERATIVE CLUSTERING Saritha Quinn 1, Vishnu Prasad 2 1 Student, 2 Student Department of MCA, Kristu Jayanti College, Bengaluru, India ABSTRACT The commission of a

More information

Introspection-based Periodicity Awareness Model for Intermittently Connected Mobile Networks

Introspection-based Periodicity Awareness Model for Intermittently Connected Mobile Networks Introspection-based Periodicity Awareness Model for Intermittently Connected Mobile Networks Okan Turkes, Hans Scholten, and Paul Havinga Dept. of Computer Engineering, Pervasive Systems University of

More information

Pilot Study: Clinical Trial Task Ontology Development. A prototype ontology of common participant-oriented clinical research tasks and

Pilot Study: Clinical Trial Task Ontology Development. A prototype ontology of common participant-oriented clinical research tasks and Pilot Study: Clinical Trial Task Ontology Development Introduction A prototype ontology of common participant-oriented clinical research tasks and events was developed using a multi-step process as summarized

More information

Gene expression analysis. Roadmap. Microarray technology: how it work Applications: what can we do with it Preprocessing: Classification Clustering

Gene expression analysis. Roadmap. Microarray technology: how it work Applications: what can we do with it Preprocessing: Classification Clustering Gene expression analysis Roadmap Microarray technology: how it work Applications: what can we do with it Preprocessing: Image processing Data normalization Classification Clustering Biclustering 1 Gene

More information

Assurance Engagements Other than Audits or Review of Historical Financial Statements

Assurance Engagements Other than Audits or Review of Historical Financial Statements Issued December 2007 International Standard on Assurance Engagements Assurance Engagements Other than Audits or Review of Historical Financial Statements The Malaysian Institute Of Certified Public Accountants

More information

Genetic Algorithm for Scheduling Courses

Genetic Algorithm for Scheduling Courses Genetic Algorithm for Scheduling Courses Gregorius Satia Budhi, Kartika Gunadi, Denny Alexander Wibowo Petra Christian University, Informatics Department Siwalankerto 121-131, Surabaya, East Java, Indonesia

More information

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014

UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 UNIVERSITY of PENNSYLVANIA CIS 520: Machine Learning Final, Fall 2014 Exam policy: This exam allows two one-page, two-sided cheat sheets (i.e. 4 sides); No other materials. Time: 2 hours. Be sure to write

More information

Detection and Classification of Lung Cancer Using Artificial Neural Network

Detection and Classification of Lung Cancer Using Artificial Neural Network Detection and Classification of Lung Cancer Using Artificial Neural Network Almas Pathan 1, Bairu.K.saptalkar 2 1,2 Department of Electronics and Communication Engineering, SDMCET, Dharwad, India 1 almaseng@yahoo.co.in,

More information

This is a repository copy of Practical guide to sample size calculations: superiority trials.

This is a repository copy of Practical guide to sample size calculations: superiority trials. This is a repository copy of Practical guide to sample size calculations: superiority trials. White Rose Research Online URL for this paper: http://eprints.whiterose.ac.uk/97114/ Version: Accepted Version

More information

ISC- GRADE XI HUMANITIES ( ) PSYCHOLOGY. Chapter 2- Methods of Psychology

ISC- GRADE XI HUMANITIES ( ) PSYCHOLOGY. Chapter 2- Methods of Psychology ISC- GRADE XI HUMANITIES (2018-19) PSYCHOLOGY Chapter 2- Methods of Psychology OUTLINE OF THE CHAPTER (i) Scientific Methods in Psychology -observation, case study, surveys, psychological tests, experimentation

More information

An Intelligent Writing Assistant Module for Narrative Clinical Records based on Named Entity Recognition and Similarity Computation

An Intelligent Writing Assistant Module for Narrative Clinical Records based on Named Entity Recognition and Similarity Computation An Intelligent Writing Assistant Module for Narrative Clinical Records based on Named Entity Recognition and Similarity Computation 1,2,3 EMR and Intelligent Expert System Engineering Research Center of

More information

extraction can take place. Another problem is that the treatment for chronic diseases is sequential based upon the progression of the disease.

extraction can take place. Another problem is that the treatment for chronic diseases is sequential based upon the progression of the disease. ix Preface The purpose of this text is to show how the investigation of healthcare databases can be used to examine physician decisions to develop evidence-based treatment guidelines that optimize patient

More information

Consultation Draft of the NHS Grampian British Sign Language (BSL) Plan

Consultation Draft of the NHS Grampian British Sign Language (BSL) Plan Consultation Draft of the NHS Grampian British Sign Language (BSL) Plan What NHS Grampian wishes to achieve to promote BSL over the next 2 years Consultation period 21 st May 2018 1 st July 2018 May 2018

More information

Elsevier ClinicalKey TM FAQs

Elsevier ClinicalKey TM FAQs Elsevier ClinicalKey FAQs Table of Contents What is ClinicalKey? Where can I access ClinicalKey? What medical specialties are covered in ClinicalKey? What information is available through ClinicalKey?

More information

Statistics 202: Data Mining. c Jonathan Taylor. Final review Based in part on slides from textbook, slides of Susan Holmes.

Statistics 202: Data Mining. c Jonathan Taylor. Final review Based in part on slides from textbook, slides of Susan Holmes. Final review Based in part on slides from textbook, slides of Susan Holmes December 5, 2012 1 / 1 Final review Overview Before Midterm General goals of data mining. Datatypes. Preprocessing & dimension

More information

Position Paper: How Certain is Recommended Trust-Information?

Position Paper: How Certain is Recommended Trust-Information? Position Paper: How Certain is Recommended Trust-Information? Uwe Roth University of Luxembourg FSTC Campus Kirchberg 6, rue Richard Coudenhove-Kalergi L-1359 Luxembourg uwe.roth@uni.lu ABSTRACT Nowadays

More information

AUTOMATIC MEASUREMENT ON CT IMAGES FOR PATELLA DISLOCATION DIAGNOSIS

AUTOMATIC MEASUREMENT ON CT IMAGES FOR PATELLA DISLOCATION DIAGNOSIS AUTOMATIC MEASUREMENT ON CT IMAGES FOR PATELLA DISLOCATION DIAGNOSIS Qi Kong 1, Shaoshan Wang 2, Jiushan Yang 2,Ruiqi Zou 3, Yan Huang 1, Yilong Yin 1, Jingliang Peng 1 1 School of Computer Science and

More information

AC : USABILITY EVALUATION OF A PROBLEM SOLVING ENVIRONMENT FOR AUTOMATED SYSTEM INTEGRATION EDUCA- TION USING EYE-TRACKING

AC : USABILITY EVALUATION OF A PROBLEM SOLVING ENVIRONMENT FOR AUTOMATED SYSTEM INTEGRATION EDUCA- TION USING EYE-TRACKING AC 2012-4422: USABILITY EVALUATION OF A PROBLEM SOLVING ENVIRONMENT FOR AUTOMATED SYSTEM INTEGRATION EDUCA- TION USING EYE-TRACKING Punit Deotale, Texas A&M University Dr. Sheng-Jen Tony Hsieh, Texas A&M

More information

The use of Topic Modeling to Analyze Open-Ended Survey Items

The use of Topic Modeling to Analyze Open-Ended Survey Items The use of Topic Modeling to Analyze Open-Ended Survey Items W. Holmes Finch Maria E. Hernández Finch Constance E. McIntosh Claire Braun Ball State University Open ended survey items Researchers making

More information

Automated Medical Diagnosis using K-Nearest Neighbor Classification

Automated Medical Diagnosis using K-Nearest Neighbor Classification (IMPACT FACTOR 5.96) Automated Medical Diagnosis using K-Nearest Neighbor Classification Zaheerabbas Punjani 1, B.E Student, TCET Mumbai, Maharashtra, India Ankush Deora 2, B.E Student, TCET Mumbai, Maharashtra,

More information

Sentiment Analysis of Reviews: Should we analyze writer intentions or reader perceptions?

Sentiment Analysis of Reviews: Should we analyze writer intentions or reader perceptions? Sentiment Analysis of Reviews: Should we analyze writer intentions or reader perceptions? Isa Maks and Piek Vossen Vu University, Faculty of Arts De Boelelaan 1105, 1081 HV Amsterdam e.maks@vu.nl, p.vossen@vu.nl

More information

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES

MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES OBJECTIVES 24 MULTIPLE LINEAR REGRESSION 24.1 INTRODUCTION AND OBJECTIVES In the previous chapter, simple linear regression was used when you have one independent variable and one dependent variable. This chapter

More information

Reveal Relationships in Categorical Data

Reveal Relationships in Categorical Data SPSS Categories 15.0 Specifications Reveal Relationships in Categorical Data Unleash the full potential of your data through perceptual mapping, optimal scaling, preference scaling, and dimension reduction

More information

Review of PIE Figure 1.2

Review of PIE Figure 1.2 Chapter 1 The Social Work Profession Competency Practice Behavior Content Examples in Chapter 1 1-Demonstrate ethical and professional behavior Use reflection and self-regulation to manage personal values

More information

MyDispense OTC exercise Guide

MyDispense OTC exercise Guide MyDispense OTC exercise Guide Version 5.0 Page 1 of 23 Page 2 of 23 Table of Contents What is MyDispense?... 4 Who is this guide for?... 4 How should I use this guide?... 4 OTC exercises explained... 4

More information

Incorporation of Imaging-Based Functional Assessment Procedures into the DICOM Standard Draft version 0.1 7/27/2011

Incorporation of Imaging-Based Functional Assessment Procedures into the DICOM Standard Draft version 0.1 7/27/2011 Incorporation of Imaging-Based Functional Assessment Procedures into the DICOM Standard Draft version 0.1 7/27/2011 I. Purpose Drawing from the profile development of the QIBA-fMRI Technical Committee,

More information

A Matrix of Material Representation

A Matrix of Material Representation A Matrix of Material Representation Hengfeng Zuo a, Mark Jones b, Tony Hope a, a Design and Advanced Technology Research Centre, Southampton Institute, UK b Product Design Group, Faculty of Technology,

More information

Pushing the Right Buttons: Design Characteristics of Touch Screen Buttons

Pushing the Right Buttons: Design Characteristics of Touch Screen Buttons 1 of 6 10/3/2009 9:40 PM October 2009, Vol. 11 Issue 2 Volume 11 Issue 2 Past Issues A-Z List Usability News is a free web newsletter that is produced by the Software Usability Research Laboratory (SURL)

More information

Handling Partial Preferences in the Belief AHP Method: Application to Life Cycle Assessment

Handling Partial Preferences in the Belief AHP Method: Application to Life Cycle Assessment Handling Partial Preferences in the Belief AHP Method: Application to Life Cycle Assessment Amel Ennaceur 1, Zied Elouedi 1, and Eric Lefevre 2 1 University of Tunis, Institut Supérieur de Gestion de Tunis,

More information

Data Mining in Bioinformatics Day 4: Text Mining

Data Mining in Bioinformatics Day 4: Text Mining Data Mining in Bioinformatics Day 4: Text Mining Karsten Borgwardt February 25 to March 10 Bioinformatics Group MPIs Tübingen Karsten Borgwardt: Data Mining in Bioinformatics, Page 1 What is text mining?

More information

A Cooperative Multiagent Architecture for Turkish Sign Tutors

A Cooperative Multiagent Architecture for Turkish Sign Tutors A Cooperative Multiagent Architecture for Turkish Sign Tutors İlker Yıldırım Department of Computer Engineering Boğaziçi University Bebek, 34342, Istanbul, Turkey ilker.yildirim@boun.edu.tr 1 Introduction

More information

ISO 5495 INTERNATIONAL STANDARD. Sensory analysis Methodology Paired comparison test. Analyse sensorielle Méthodologie Essai de comparaison par paires

ISO 5495 INTERNATIONAL STANDARD. Sensory analysis Methodology Paired comparison test. Analyse sensorielle Méthodologie Essai de comparaison par paires INTERNATIONAL STANDARD ISO 5495 Third edition 2005-11-15 Sensory analysis Methodology Paired comparison test Analyse sensorielle Méthodologie Essai de comparaison par paires Reference number ISO 2005 Provläsningsexemplar

More information

Hypertension encoded in GLIF

Hypertension encoded in GLIF Hypertension encoded in GLIF Guideline 2 (Based on the hypertension guideline. Simplified (not all contraindications, relative contra-indications, and relative indications are specified). Drug interactions

More information

Mining Medline for New Possible Relations of Concepts

Mining Medline for New Possible Relations of Concepts Mining Medline for New ossible elations of Concepts Wei Huang,, Yoshiteru Nakamori, Shouyang Wang, and Tieju Ma School of Knowledge Science, Japan Advanced Institute of Science and Technology, Asahidai

More information

Statistical analysis DIANA SAPLACAN 2017 * SLIDES ADAPTED BASED ON LECTURE NOTES BY ALMA LEORA CULEN

Statistical analysis DIANA SAPLACAN 2017 * SLIDES ADAPTED BASED ON LECTURE NOTES BY ALMA LEORA CULEN Statistical analysis DIANA SAPLACAN 2017 * SLIDES ADAPTED BASED ON LECTURE NOTES BY ALMA LEORA CULEN Vs. 2 Background 3 There are different types of research methods to study behaviour: Descriptive: observations,

More information

Predicting Heart Attack using Fuzzy C Means Clustering Algorithm

Predicting Heart Attack using Fuzzy C Means Clustering Algorithm Predicting Heart Attack using Fuzzy C Means Clustering Algorithm Dr. G. Rasitha Banu MCA., M.Phil., Ph.D., Assistant Professor,Dept of HIM&HIT,Jazan University, Jazan, Saudi Arabia. J.H.BOUSAL JAMALA MCA.,M.Phil.,

More information

Modeling Asymmetric Slot Allocation for Mobile Multimedia Services in Microcell TDD Employing FDD Uplink as Macrocell

Modeling Asymmetric Slot Allocation for Mobile Multimedia Services in Microcell TDD Employing FDD Uplink as Macrocell Modeling Asymmetric Slot Allocation for Mobile Multimedia Services in Microcell TDD Employing FDD Uplink as Macrocell Dong-Hoi Kim Department of Electronic and Communication Engineering, College of IT,

More information

Logistic Regression and Bayesian Approaches in Modeling Acceptance of Male Circumcision in Pune, India
