Word Sense Disambiguation Using Sense Signatures Automatically Acquired from a Second Language


Word Sense Disambiguation Using Sense Signatures Automatically Acquired from a Second Language

Document Number: Working Paper 6.12h(b) (3rd Round)
Project ref.: IST-2001-34460
Project Acronym: MEANING
Project full title: Developing Multilingual Web-scale Language Technologies
Project URL: http://www.lsi.upc.es/~nlp/meaning/meaning.html
Availability: Public
Authors: Xinglong Wang (UoS) and John Carroll (UoS)

INFORMATION SOCIETY TECHNOLOGIES

WSD Using Sense Signatures Page: 1

Project ref.: IST-2001-34460
Project Acronym: MEANING
Project full title: Developing Multilingual Web-scale Language Technologies
Security (Distribution level): Public
Contractual date of delivery: February 2004
Actual date of delivery: February 2004
Document Number: Working Paper 6.12h(b) (3rd Round)
Type: Report
Status & version: Final
Number of pages: 14
WP contributing to the deliverable: WP6.12
WP/Task responsible: Eneko Agirre
Authors: Xinglong Wang (UoS) and John Carroll (UoS)
Other contributors:
Reviewer:
EC Project Officer: Evangelia Markidou
Keywords: Word Sense Disambiguation, Sense Signature Acquisition

Abstract: We present a novel near-unsupervised approach to the task of Word Sense Disambiguation (WSD). For each sense of a word with multiple meanings, we produce a "sense signature": a set of words or snippets of text that tend to co-occur with that sense. We build sense signatures automatically, using large quantities of Chinese text and English-Chinese and Chinese-English bilingual dictionaries, taking advantage of the observation that mappings between words and meanings are often different in typologically distant languages. We train a classifier on the sense signatures and test it on two gold-standard English WSD datasets, one for binary and the other for multi-way sense identification. Both evaluations give results that exceed previous state-of-the-art results for comparable systems. We also demonstrate that a little manual effort can improve sense signature quality, as measured by WSD accuracy.

Contents

1 Introduction  3
2 Acquisition of Sense Signatures  4
3 Experiments and Results  5
3.1 Building Sense Signatures  5
3.2 WSD Experiments on TWA dataset  7
3.3 WSD Experiments on Senseval-2 Lexical Sample dataset  10
4 Conclusions and Future Work  12

1 Introduction

We call a set of words or text snippets that tend to co-occur with a certain sense of a word a sense signature. More formally, a sense signature SS of sense s with length n is defined as: SS_s = {w_1, w_2, ..., w_n}, where w_i denotes a word form. Note that the word forms in a sense signature can be ordered or unordered: when ordered, they form a text snippet; when unordered, they are just a bag of words. In this paper, word forms in sense signatures are unordered. For example, a sense signature for the serving sense of waiter may be {restaurant, menu, waitress, dinner}, while the sense signature {hospital, station, airport, boyfriend} may be one for the waiting sense of the word.

The term sense signature is derived from the term topic signature 1, which refers to a set of words that tend to be associated with a certain topic. Topic signatures were originally proposed by Lin and Hovy [2000] in the context of Text Summarisation. They developed a text summarisation system containing a topic identification module that was trained on topic signatures. Their topic signatures were extracted from manually topic-tagged text. Agirre et al. [2001] proposed an unsupervised method for building topic signatures. Their system queries an Internet search engine with monosemous synonyms of words that have multiple senses in WordNet [Miller et al., 1990], and then extracts topic signatures by processing the text snippets returned by the search engine. In this way, they can attach a set of topic signatures to each synset in WordNet. They evaluated their topic signatures on a WSD task using 7 words drawn from SemCor [Miller et al., 1993], but found their topic signatures were not very useful for WSD.

Despite Agirre et al.'s disappointing result, a means of automatically and accurately producing sense (topic) signatures would be one way of addressing the important problem of the lack of training data for machine learning based approaches to WSD. Another approach to this problem is to automatically produce sense-tagged data using parallel bilingual corpora [Gale et al., 1992; Diab and Resnik, 2002], exploiting the assumption that mappings between words and meanings are different in different languages. One problem with relying on bilingual corpora for data collection is that bilingual corpora are rare, and aligned bilingual corpora are even rarer. Mining the Web for bilingual text [Resnik, 1999] is a solution, but it is still difficult to obtain sufficient quantities of high-quality data. Another problem is that if two languages are closely related, useful data for some words cannot be collected, because different senses of polysemous words in one language often translate to the same word in the other.

In the second round of the MEANING project, we [Wang, 2004] used this cross-language paradigm to acquire sense signatures for English senses from large amounts of Chinese text and English-Chinese and Chinese-English bilingual dictionaries. We showed the signatures to be useful as training data for a binary disambiguation task. The WSD system described in this paper is based on our previous work. We automatically build sense signatures in a similar way and then train a Naïve Bayesian classifier on them. We have evaluated the

1 We use the term sense signature because it more precisely reflects our motivation of creating this resource for the purpose of Word Sense Disambiguation (WSD), though topic signature has been more often used in the literature.

system on the TWA Sense Tagged Dataset [Mihalcea, 2003], as well as the English Lexical Sample Dataset from Senseval-2. Our results show conclusively that such sense signatures can be used successfully in a full-scale WSD task.

The remainder of the paper is organised as follows. Section 2 briefly reviews the acquisition algorithm for sense signatures. Section 3 describes details of building this resource and demonstrates our applications of sense signatures to WSD. We also present results and analysis in this section. Finally, we conclude in Section 4 and discuss future work.

2 Acquisition of Sense Signatures

Following the assumption that word senses are lexicalised differently between languages, we [Wang, 2004] proposed an unsupervised method to automatically acquire English sense signatures using large quantities of Chinese text and English-Chinese and Chinese-English dictionaries. The Chinese language was chosen since it is a distant language from English, and the more distant two languages are, the more likely it is that senses are lexicalised differently [Resnik and Yarowsky, 1999]. We also showed that the topic signatures are helpful for a small-scale WSD task. The underlying assumption of this approach is that each sense of an ambiguous English word corresponds to a distinct translation in Chinese.

Figure 1: Process of automatic acquisition of sense signatures proposed by us. For simplicity, assume w has two senses. (The figure shows, for each sense of an ambiguous English word w: the Chinese translation of that sense, obtained from an English-Chinese lexicon; Chinese text snippets retrieved by a Chinese search engine and passed through Chinese segmentation; and the English sense signatures produced via a Chinese-English lexicon.)

As shown in Figure 1, firstly, our system translates senses of an English word into Chinese words, using an English-Chinese dictionary, and then retrieves text snippets from a large amount of Chinese text, with the Chinese translations as queries. Secondly, the Chinese text snippets are segmented and then translated back to English word by word, using a Chinese-English dictionary. For each sense, a set of sense signatures is produced. As an example, suppose one wants to retrieve sense signatures for the financial sense of interest. One would first look up the Chinese translations of this sense in an English-Chinese dictionary, and find that is the right Chinese translation corresponding to this particular

sense. Then, the next stage is to automatically build a collection of Chinese text snippets by either searching in a large Chinese corpus or on the Web, using as query. As Chinese is a language written without spaces between words, one needs to use a segmentor to mark word boundaries before translating the snippets word by word back to English. The result is a collection of sense signatures for the financial sense of interest. For example, {interest rate, bank, annual, economy, ...} might be one of them. We trained a second-order co-occurrence vector model on the sense signatures and tested it on the TWA binary sense-tagged dataset, with reasonable results.

Since this method acquires training data for WSD systems from raw monolingual Chinese text, it avoids the problem of the shortage of English sense-tagged corpora, and also of the shortage of aligned bilingual corpora. Also, if existing corpora are not big enough, one can always harvest more text from the Web. However, like all methods based on the cross-language translation assumption mentioned above, there are potential problems. For example, it is possible that a Chinese translation of an English sense is itself ambiguous, and thus the contents of the text snippets retrieved may concern a concept other than the one we want. But our previous experiment, on just 6 words each with only a binary sense distinction, was too small-scale to determine whether this is a problem in practice. In the following section, we present our experiments on WSD and describe our solutions to such problems.

3 Experiments and Results

We first describe in detail how we prepared sense signatures, and then report our experiments on the TWA binary sense-tagged dataset, using various machine learning algorithms.
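The acquisition pipeline reviewed above (translate a sense into Chinese, retrieve snippets containing the translation, segment them, back-translate word by word, and filter) can be sketched in miniature as follows. The lexicons, "corpus" and stop-word list below are tiny invented stand-ins, not the actual dictionaries, search engine or Gigaword data used in this work.

```python
# Minimal sketch of the sense-signature acquisition pipeline (Section 2).
# E2C, C2E, CORPUS and STOPWORDS are hypothetical toy stand-ins for the
# English-Chinese dictionary, the Chinese-English lexicon, the Chinese
# corpus, and the stop-word list.

E2C = {("interest", "financial"): "利息"}          # English sense -> Chinese query
C2E = {"利率": "rate", "银行": "bank",              # Chinese word -> English word
       "经济": "economy", "增长": "growth"}
CORPUS = [                                         # pre-segmented "snippets"
    ["利息", "利率", "银行", "经济"],
    ["利息", "银行", "增长"],
    ["球场", "比赛"],                               # unrelated snippet, never retrieved
]
STOPWORDS = set()

def acquire_signatures(word, sense):
    """Produce one sense signature (bag of English words) per retrieved snippet."""
    query = E2C[(word, sense)]
    signatures = []
    for snippet in CORPUS:
        if query not in snippet:                   # the "retrieval" step
            continue
        sig = [C2E[w] for w in snippet
               if w != query and w in C2E          # drop out-of-lexicon words
               and C2E[w] not in STOPWORDS]        # keep content words only
        if sig:
            signatures.append(sig)
    return signatures
```

A real implementation would replace the corpus lookup with a query to a search engine or the Gigaword corpus, and the segmentation would be done by a tool such as ICTCLAS rather than assumed.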
Using the algorithm that performed best, we carried out a larger-scale evaluation on the English Senseval-2 Lexical Sample dataset [Kilgarriff, 2001] and obtained results that significantly exceed those published for comparable systems.

3.1 Building Sense Signatures

Following the approach described in Section 2, we built sense signatures for the 6 words to be binarily disambiguated in the TWA dataset and the 44 words in the Senseval-2 dataset 2. These 50 words have 229 senses in total to disambiguate. The first step was translating English senses into Chinese. We used the Yahoo! Student English-Chinese Online Dictionary 3, as well as a more comprehensive electronic dictionary. This is because the Yahoo! dictionary is designed for English learners, and its sense granularity is rather coarse-grained. For the binary disambiguation experiments, it is good enough. However, the Senseval-2 Lexical Sample task uses WordNet 1.7 as its gold standard, which has very fine sense distinctions, and translations in the Yahoo! dictionary don't match this standard.

2 These 44 words cover all nouns and adjectives in the Senseval-2 dataset, but exclude all verbs. We discuss this point in section 3.3.
3 See: http://cn.yahoo.com/dictionary.

PowerWord 2002 4 was chosen as a supplementary dictionary because it integrates several comprehensive English-Chinese dictionaries in a single application. For each sense of an English word entry, both the Yahoo! and PowerWord 2002 dictionaries list not only Chinese translations but also English glosses, which provides a bridge between WordNet synsets and the Chinese translations in the dictionaries. Note that we only used PowerWord 2002 to translate words in the Senseval-2 dataset.

In detail, to automatically find a Chinese translation for sense s of an English word w, our system looks up w in both dictionaries and determines whether w has the same number of senses as in WordNet. If it does, in one of the bilingual dictionaries, we locate the English gloss g which has the maximum number of overlapping words with the gloss for s in the WordNet synset. The Chinese translation associated with g is then selected. Although this method successfully identified Chinese translations for 29 out of the 50 words (58%), translations for the remaining word senses remain unknown, because the sense distinctions differ between our bilingual dictionaries and WordNet. In fact, unless an English-Chinese bilingual WordNet becomes available, this problem is inevitable. For our experiments, we solved the problem by manually looking up dictionaries and identifying the translations. This task took an hour for an annotator who speaks both Chinese and English fluently.

It is possible that the Chinese translations are also ambiguous, which can make the topic of a collection of text snippets deviate from what is expected. For example, the oral sense of mouth can be translated as or in Chinese. However, the first translation ( ) is a single-character word and is highly ambiguous: by combining with other characters, its meaning varies. For example, means an exit or to export. On the other hand, the second translation ( ) is monosemous and should be used.
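The gloss-overlap selection step described above can be sketched as follows. The glosses and Chinese translations here are invented examples; the real system compares WordNet 1.7 glosses against Yahoo!/PowerWord dictionary entries, after checking that the word has the same number of senses in both resources.

```python
# Sketch of matching a WordNet sense to a bilingual-dictionary sense by
# gloss-word overlap (Section 3.1). The glosses and translations below are
# hypothetical, not taken from the actual dictionaries.

def overlap(gloss_a, gloss_b):
    """Number of word types shared by two glosses."""
    return len(set(gloss_a.lower().split()) & set(gloss_b.lower().split()))

def pick_translation(wordnet_gloss, dictionary_senses):
    """dictionary_senses: list of (english_gloss, chinese_translation) pairs.
    Return the translation whose gloss overlaps most with the WordNet gloss."""
    best = max(dictionary_senses, key=lambda s: overlap(wordnet_gloss, s[0]))
    return best[1]

# Invented dictionary entries for 'interest':
senses = [
    ("a fixed charge for borrowing money", "利息"),
    ("a sense of concern or curiosity", "兴趣"),
]
```

For a WordNet-style gloss such as "a charge for borrowing money", the first entry shares the most gloss words, so its translation would be selected.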
To assess the influence of such ambiguous translations, we carried out experiments involving more human labour to verify the translations. The same annotator manually eliminated those highly ambiguous Chinese translations and replaced them with less ambiguous or, ideally, monosemous Chinese translations. This process changed roughly half of the translations and took about five hours. We compared the basic system with this manually improved one. The results are shown in section 3.3.

Using the translations as queries, the sense signatures were automatically extracted from the Chinese Gigaword Corpus (CGC), distributed by the LDC 5, which contains 2.7GB of newswire text, of which 900MB are sourced from the Xinhua News Agency of Beijing, and 1.8GB are drawn from Central News from Taiwan. A small percentage of words have different meanings in these two Chinese dialects, and since the Chinese-English dictionary (LDC Mandarin-English Translation Lexicon Version 3.0) we use later is compiled with Mandarin usages in mind, we mainly retrieve data from Xinhua News. We set a threshold of 100, and only when the number of snippets retrieved from Xinhua News is smaller than 100 do we turn to Central News to collect more data. Specifically, for 51 out of the 229 (22%) Chinese queries, the system retrieved fewer than 100 instances from Xinhua News; it then automatically retrieved more data from Central News. In theory, if the training data is

4 A commercial electronic dictionary application. See: http://cb.kingsoft.com.
5 Available at: http://www.ldc.upenn.edu/catalog/

still not enough, one can always turn to other text resources, such as the Web. To decide the optimal length of text snippets to retrieve, we carried out pilot experiments with two length settings: 250 Chinese characters (about 110 English words) and 400 Chinese characters (about 175 English words), and found that more context words helped improve WSD performance (results not shown). Therefore, we retrieve text snippets with a length of 400 characters.

We then segmented all text snippets using the application ICTCLAS 6. After the segmentor marked all word boundaries, the system automatically translated the text snippets word by word using the electronic LDC Mandarin-English Translation Lexicon 3.0. As expected, the lexicon doesn't cover all Chinese words. We simply discarded those Chinese words that don't have an entry in this lexicon. We also discarded those Chinese words with multiword English translations. Since the discarded words can be informative, one direction of our future research is to find an up-to-date, wide-coverage dictionary and to see how much difference it will make. Finally, we filtered the sense signatures with a stop-word list, to ensure only content words were included.

We ended up with 229 sets of sense signatures for all senses of the 50 words in both test datasets. Each sense signature contains a set of words that were translated from a Chinese text snippet, whose content should closely relate to the English word sense in question. Words in a sense signature are unordered, because in this work we only used bag-of-words information. Except for the inevitable, very small amount of manual work with respect to mapping WordNet glosses to those in English-Chinese dictionaries, the whole process is automatic.

3.2 WSD Experiments on TWA dataset

The TWA dataset is a manually sense-tagged corpus [Mihalcea, 2003]. It contains binary sense-tagged text instances, drawn from the British National Corpus, for 6 nouns.
We trained various machine learning algorithms 7 on the sense signatures we built automatically, as described above, and then evaluated them on this dataset. Table 1 summarises the sizes of the sense signatures we computed for each sense of the 6 words, and also the number of test items in the TWA dataset. The average length of a sense signature is 35 words, which is much shorter than the length of the text snippets, which was set to 400 Chinese characters (about 175 English words). This is because function words and words that are not listed in the LDC Mandarin-English lexicon were eliminated. We didn't apply any weighting to the features, because performance went down in our pilot experiments when we applied a TF.IDF weighting scheme (results not shown). We also limited the maximum number of training sense signatures per sense to 6000, for efficiency purposes.

Table 2 summarises the evaluation results. It shows that 5 out of 7 algorithms beat the most-frequent-sense baseline. McCarthy et al. [2004] argue that this is a very tough baseline

6 See: http://mtgroup.ict.ac.cn/~zhp/ictclas
7 We used the implementations in the Weka machine learning package, apart from the Second-Order Co-occurrence Vector Model (OCV), which we implemented ourselves. Weka is available at: http://www.cs.waikato.ac.nz/~ml/weka.

for an unsupervised WSD system to beat. Considering that the results were obtained completely automatically 8, sense signatures look very promising as an adjunct or replacement for expensive hand-tagged training examples for WSD. Note that the Naïve Bayes algorithms outperformed the other competitors: the Naïve Bayes algorithm with kernel estimation [John and Langley, 1995] was the best algorithm overall.

Word     Sense 1     Sense 2     Training (sense 1 / sense 2 / total)   Test
bass     fish        music        400 /  746 /  1146                    107
crane    bird        machine      769 / 1240 /  2009                     95
motion   physical    legal       6000 / 3265 /  9265                    201
palm     hand        tree         575 /  361 /   936                    201
plant    living      factory     6000 / 6000 / 12000                    188
tank     container   vehicle     3573 / 1303 /  4876                    201

Table 1: The 6 ambiguous words and their senses in the TWA dataset, the sizes of the training sense signatures used for each sense and word, and the numbers of test items.

Word     DT     NB     NBk    kNN    AB     OCV    SVM    MFB    M03
bass     94.4   93.5   93.5   93.5   94.4   95.3*  94.4   90.7   92.5
crane    62.8   77.7   81.9*  74.5   47.9   77.7   66.3   74.7   71.6
motion   76.1   73.1   81.6*  77.1   74.1   72.6   74.1   70.1   75.6
palm     78.1*  76.6   75.6   72.1   78.1*  74.1   78.1*  71.1   80.6
plant    53.7   69.7   75.0*  59.0   48.4   68.1   65.4   54.3   69.1
tank     58.7   70.2   73.6*  37.3   57.2   68.2   57.2   62.7   63.7
avg.     69.5   75.2   78.8*  66.1   66.3   74.1   71.3   68.5   76.6

Table 2: Accuracy scores (in percentages) for 7 machine learning algorithms: Decision Trees (DT), Naïve Bayes with normal estimation (NB), Naïve Bayes with kernel estimation (NBk), k-Nearest Neighbours (kNN), AdaBoost (AB), Second-Order Co-occurrence Vector Model (OCV) and Support Vector Machines (SVM). The best score among the 7 algorithms in each row is marked with *. For comparison, the table also shows the most-frequent-sense baseline (MFB) and the performance of a state-of-the-art unsupervised system [Mihalcea, 2003] (M03), evaluated on the same dataset.
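The winning configuration, Naïve Bayes over bag-of-words signatures, can be illustrated with a minimal multinomial-style sketch. The experiments themselves used the Weka implementations, including the kernel density estimation variant, which this toy version with Laplace smoothing does not reproduce; the signatures below are the invented waiter examples from the introduction.

```python
import math
from collections import Counter

def train(signatures_by_sense):
    """signatures_by_sense: {sense: [list of bag-of-words signatures]}."""
    total = sum(len(sigs) for sigs in signatures_by_sense.values())
    priors = {s: len(sigs) / total for s, sigs in signatures_by_sense.items()}
    counts = {s: Counter(w for sig in sigs for w in sig)
              for s, sigs in signatures_by_sense.items()}
    vocab = set(w for c in counts.values() for w in c)
    return priors, counts, vocab

def classify(context, priors, counts, vocab):
    """Pick the sense maximising log P(sense) + sum_w log P(w | sense)."""
    def logp(sense):
        n = sum(counts[sense].values())
        return math.log(priors[sense]) + sum(
            math.log((counts[sense][w] + 1) / (n + len(vocab)))  # Laplace smoothing
            for w in context)
    return max(priors, key=logp)

model = train({
    "serving": [["restaurant", "menu", "waitress", "dinner"]],
    "waiting": [["hospital", "station", "airport", "boyfriend"]],
})
```

Given the context words {menu, restaurant}, the serving sense receives the higher posterior, since both words occur in its signature.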
The NBk algorithm, trained on our sense signatures, also outperformed the unsupervised WSD system of Mihalcea [2003] 9, which was trained on a collection of feature sets consisting of a number of context words in a window surrounding the monosemous equivalents of the senses of the 6 words. One of the strengths of our approach is that training data come cheaply and relatively easily. However, the sense signatures are acquired automatically, and they inevitably contain a certain amount of noise, which may cause problems for the classifier.

8 Sense signatures for all senses of these 6 words are acquired completely automatically, without human intervention.
9 According to Mihalcea's paper, the average accuracy of her system is 76.6%, as shown in Table 2. But our re-calculation, based on her reported results, is 74.4%. It might be an error on either side.

Figure 2: Accuracy scores as the size of the training sense signatures increases, one panel per word (bass, crane, motion, palm, plant and tank). Each bar is 1 standard deviation.

To assess the relationship between accuracy and the size of the training data, we carried out a series of experiments, feeding the classifier with different numbers of sense signatures. For each of the 6 words, we did the following: given a word w_i, we randomly selected n sense signatures for each of its senses s_i, from the total number of sense signatures built for s_i (shown in Table 1). The NBk algorithm was then trained on the 2n signatures and tested on w_i's test instances. We recorded the accuracy, repeated this process 200 times, and calculated the mean and variance of the 200 accuracies. We then assigned another value to n and iterated the above process until n had taken all the predefined values. In our experiments, n was taken from {50, 100, 150, 200, 400, 600, 800, 1000, 1200} for the words motion, plant and tank, and from {50, 100, 150, 200, 250, 300, 350} for bass, crane and palm, because fewer sense signature data were available for the latter three words. Finally, we used the t-test (p = 0.05) on pairwise sets of means and variances to see whether the improvements are statistically significant. The results are shown in Figure 2. 31 out of 39 t-scores are greater than the t-test critical values, so we are fairly confident that the more training sense signatures used, the more accurate the NBk classifier becomes on this disambiguation task.
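The sampling procedure behind this learning-curve experiment can be sketched as follows. Here train_and_score is a hypothetical stand-in for training NBk on the sampled signatures and scoring it on a word's test items, and the t statistic uses the Welch form, which is an assumption: the exact t-test variant used in the experiments is not specified.

```python
import math
import random
import statistics

def curve_point(signatures_by_sense, n, train_and_score, repeats=200, seed=0):
    """Mean and standard deviation of accuracy over `repeats` random samples
    of n sense signatures per sense. `train_and_score` stands in for training
    the classifier on a sample and evaluating it on the test items."""
    rng = random.Random(seed)
    accs = [train_and_score({s: rng.sample(sigs, n)
                             for s, sigs in signatures_by_sense.items()})
            for _ in range(repeats)]
    return statistics.mean(accs), statistics.stdev(accs)

def welch_t(accs_small, accs_large):
    """Welch's t statistic comparing accuracies at two training-set sizes."""
    va = statistics.variance(accs_small)
    vb = statistics.variance(accs_large)
    return ((statistics.mean(accs_large) - statistics.mean(accs_small))
            / math.sqrt(va / len(accs_small) + vb / len(accs_large)))
```

In the real experiment the resulting t-scores are compared against the critical value at p = 0.05 for each adjacent pair of training-set sizes.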
Another factor that can influence WSD performance is the distribution of training data for each sense. In our first series of experiments (Table 1), we used the natural distribution

of the senses appearing in Chinese news text 10. In our second series of experiments, i.e., the t-test experiments (Figure 2), we used equal numbers of sense signatures for each sense of the words. We predict that the system should yield its best performance when the distribution of the amounts of training data across senses approximates the natural distribution of sense occurrence in English text. However, this is probably difficult to verify, since an unsupervised system has no a priori information about the natural distributions of senses.

3.3 WSD Experiments on Senseval-2 Lexical Sample dataset

Although our WSD system performed very well on binary sense disambiguation, it is not obvious that it should perform reasonably when attempting to make finer-grained sense distinctions. Therefore, we carried out a larger-scale evaluation on the standard English Senseval-2 Lexical Sample Dataset. This dataset consists of manually sense-tagged training and test instances for nouns, adjectives and verbs. We only tested our system on nouns and adjectives, because verbs often have finer sense distinctions, which would mean more manual work when mapping WordNet synsets to English-Chinese dictionary glosses. This would involve us in a rather different kind of enterprise, since we would have moved from a near-unsupervised to a more supervised setup.

To assess the influence of ambiguous Chinese translations, we prepared two sets of training data. As described in section 3.1, sense signatures in the first set were prepared without taking ambiguity in the Chinese text into consideration, while those in the second set were prepared with a little more human effort, trying to reduce ambiguity by using less ambiguous translations. We call the system trained on the first set Sys A and the one trained on the second Sys B. To deal with multiwords, we implemented a very simple detection module, based on WordNet entries.
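A greedy longest-match version of such a multiword detection module might look like the sketch below. The two-entry lexicon is a hypothetical stand-in; a real implementation would consult the full list of WordNet multiword lemmas, which join their components with underscores.

```python
# Sketch of a simple multiword detector (Section 3.3): greedily merge the
# longest run of consecutive tokens that forms a known multiword entry.
# MULTIWORDS is a tiny hypothetical stand-in for WordNet's multiword lemmas.

MULTIWORDS = {"fine_art", "interest_rate"}

def detect_multiwords(tokens, lexicon=MULTIWORDS, max_len=4):
    out, i = [], 0
    while i < len(tokens):
        for k in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = "_".join(tokens[i:i + k])
            if candidate in lexicon:               # longest match wins
                out.append(candidate)
                i += k
                break
        else:                                      # no multiword starts here
            out.append(tokens[i])
            i += 1
    return out
```

For example, the token sequence "a work of fine art" would be rewritten so that "fine art" becomes the single unit fine_art before classification.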
The results are shown in Table 3. Our Sys B system, both with and without the multiword detection module, outperformed Sys A, which shows that sense signatures acquired with less ambiguous Chinese translations contain less noise and therefore boost WSD performance. For comparison, the table also shows various baseline results and a system that participated in Senseval-2.[11] Considering that the manual work involved in our approach is negligible compared with manual sense-tagging, we classify our systems as unsupervised, so we should aim to beat the random baseline. All four of our systems do so easily. We also easily beat another unsupervised baseline, the Lesk [1986] baseline, which disambiguates words using WordNet definitions. The MFB baseline is in fact a supervised baseline, since an unsupervised system does not have such prior knowledge; Sys B with multiword detection nevertheless exceeded it. Sys B also exceeds the performance of UNED [Fernández-Amorós et al., 2001], which was one of the top-ranked[12]

[10] This is not strictly true, because we limited the training sense signatures to 6000.
[11] Accuracies for each word, and their averages, were calculated by us, based on the information on the Senseval-2 website. See: http://www.sle.sharp.co.uk/senseval2/.
[12] One system performed better, but its answers were not on the official Senseval-2 website, so we could not do the comparison. Also, this system did not attempt to disambiguate as many words as UNED and us.

unsupervised systems in the Senseval-2 competition.

Word               Sys A          Sys B          Baselines & a Senseval-2 entry
                   Basic   MW     Basic   MW     RB     MFB    Lesk(U)  UNED
art-n(5)           29.9    51.0   39.8    59.6   16.3   41.8   16.3     50.0
authority-n(7)     20.7    22.8   21.5    23.7   10.0   39.1   30.4     34.8
bar-n(13)          41.1    48.3   44.7    52.0    3.3   38.4    2.0     27.8
blind-a(3)         74.5    74.5   74.5    75.0   40.0   78.2   32.7     74.5
bum-n(4)           60.0    60.0   64.4    62.2   15.6   68.9   53.3     11.1
chair-n(4)         81.2    82.6   80.0    82.9   23.2   76.8   56.5     81.2
channel-n(7)       31.5    35.6   32.4    36.5   12.3   13.7   21.9     17.8
child-n(4)         56.3    56.3   56.3    56.3   18.8   54.7   56.2     43.8
church-n(3)        53.1    56.3   59.4    59.4   29.7   56.2   45.3     62.5
circuit-n(6)       48.2    68.2   48.8    69.8   10.6   27.1    5.9     55.3
colourless-a(2)    45.7    45.7   66.7    69.4   42.9   65.7   54.3     31.4
cool-a(6)          26.9    26.9   50.9    50.9   13.5   46.2    9.6     46.2
day-n(9)           32.2    32.9   32.4    33.1    7.6   60.0    0       20.0
detention-n(2)     62.5    84.4   60.6    84.8   43.8   62.5   43.8     78.1
dyke-n(2)          85.7    85.7   82.8    86.2   28.6   53.6   57.1     35.7
facility-n(5)      20.7    22.4   27.1    28.8   13.8   48.3   46.6     25.9
faithful-a(3)      69.6    69.6   66.7    66.7   21.7   78.3   26.1     78.3
fatigue-n(4)       76.7    74.4   77.3    77.3   25.6   76.7   44.2     86.0
feeling-n(6)       11.7    11.7   50.0    50.0    9.8   56.9    2.0     60.8
fine-a(9)           8.6    11.4   34.3    32.9    7.1   42.9    5.7     44.3
fit-a(3)           44.8    44.8   44.8    44.8   31.0   58.6    3.4     48.3
free-a(8)          29.3    37.8   37.3    48.2   15.9   35.4    7.3     35.4
graceful-a(2)      58.6    58.6   70.0    73.3   62.1   79.3   72.4     79.3
green-a(7)         53.2    58.5   53.2    58.5   21.3   75.5   10.6     78.7
grip-n(7)          35.3    37.3   35.3    37.3   19.6   35.3   17.6     21.6
hearth-n(3)        46.9    50.0   48.5    51.5   31.2   71.9   81.2     65.6
holiday-n(2)       64.5    74.2   64.5    75.0   38.7   77.4   29.0     54.8
lady-n(3)          69.2    73.6   71.7    77.8   28.3   64.2   50.9     58.5
local-a(3)         36.8    42.1   38.5    43.6   26.3   55.3   31.6     34.2
material-n(5)      39.1    44.9   47.8    49.3   10.1   20.3   44.9     53.6
mouth-n(8)         38.3    41.7   38.3    41.7   11.7   36.7   31.7     48.3
nation-n(3)        35.1    35.1   39.5    39.5   21.6   78.4   18.9     70.3
natural-a(10)      14.6    32.0   14.6    34.0    6.8   27.2    6.8     44.7
nature-n(5)        23.9    26.1   27.7    31.9   15.2   45.7   41.3     23.9
oblique-a(2)       72.4    72.4   72.4    73.3   44.8   69.0   72.4     27.6
post-n(8)          34.7    45.6   34.7    45.6   10.1   31.6    6.3     41.8
restraint-n(6)      6.7     6.7   17.4    19.6   11.1   28.9   28.9     17.8
sense-n(5)         20.7    43.4   25.9    46.3   24.5   24.5   24.5     30.2
simple-a(7)        45.5    45.5   49.3    50.7   13.6   51.5   12.1     51.5
solemn-a(2)        64.0    76.0   73.1    76.9   32.0   96.0   24.0     96.0
spade-n(3)         66.7    63.6   67.6    70.6   18.2   63.6   60.6     54.5
stress-n(5)        37.5    37.5   45.0    45.0   12.8   48.7    2.6     20.5
vital-a(4)         42.1    42.1   41.0    46.2   21.1   92.1    0       94.7
yew-n(2)           21.4    25.0   82.8    89.7   57.1   78.6   17.9     71.4
Avg.               40.7    46.0   46.1    52.0   18.1   50.5   24.6     46.4

Table 3: WSD accuracy on words in the English Senseval-2 Lexical Sample dataset. The leftmost column shows the words, their POS tags and how many senses they have. Sys A and Sys B are our systems; MW denotes that a multiword detection module was used in conjunction with the Basic system. For comparison, the table also shows two baselines: RB is the random baseline and MFB is the most-frequent-sense baseline. UNED is one of the best unsupervised participants in the Senseval-2 competition, and Lesk(U) is the highest unsupervised baseline set in the workshop. All accuracies are percentages.

There are a number of factors that can influence WSD performance. As mentioned above, the distribution of training data across senses is one. In our experiments, we used all sense signatures that we built for a sense (with an upper bound of 6000). However, the distribution of senses in English text often doesn't match the distribution of their corresponding Chinese translations in Chinese text. For example, suppose an English word w has two senses, s1 and s2, where s1 rarely occurs in English text whereas s2 is used frequently. Also suppose that s1's Chinese translation is much more frequently used than s2's translation in Chinese text. Then the distribution of the two senses in English is different from that of the translations in Chinese. As a result, the numbers of sense signatures we acquire for the two senses are distributed as they are in Chinese text, and a classifier trained on this data will tend to label unseen test instances in favour of the wrong distribution. The word nation, for example, has three senses, of which the country sense is used more frequently in English. In Chinese, however, the country sense and the people sense are almost equally distributed, which might be the reason why its WSD accuracy is lower with our systems than for most of the other words. A possible way to alleviate this problem is to select training sense signatures according to an estimated distribution of senses in natural English text, which can be done by analysing available sense-tagged corpora with the help of smoothing techniques, or with the unsupervised approach of [McCarthy et al., 2004].

Cultural differences can also make it difficult to retrieve sufficient training data. For example, translations of senses of church and hearth appear only infrequently in Chinese text, so it is hard to build sense signatures for these words. Another problem, as mentioned above, is that translations of English senses can themselves be ambiguous in Chinese. For example, the Chinese translations of the words vital, natural, local, etc. are ambiguous to some extent, which might be a reason for their low performance. One way to address this, as we described, is to manually check the translations. An automatic alternative is to segment or even parse the Chinese corpora before retrieving text snippets, which should reduce the level of ambiguity and lead to better sense signatures.

4 Conclusions and Future Work

We have presented WSD systems that use sense signatures as training data. Sense signatures are acquired automatically from large quantities of Chinese text, with the help of Chinese-English and English-Chinese dictionaries.
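The acquisition process just summarised can be sketched schematically. Function names, the retrieval interface and the toy dictionaries below are placeholders rather than our actual implementation, and Chinese snippets are assumed pre-tokenized for simplicity:

```python
def build_sense_signatures(sense_to_chinese, retrieve_snippets,
                           chinese_to_english, limit=6000):
    """For each sense of an English word: look up its Chinese translation,
    retrieve Chinese text snippets containing it, and translate the snippet
    words back into English via a Chinese-English dictionary. Each
    back-translated snippet becomes one bag-of-words training instance
    (a sense signature)."""
    signatures = {}
    for sense, zh_translation in sense_to_chinese.items():
        instances = []
        for snippet in retrieve_snippets(zh_translation)[:limit]:
            # Keep every English translation of every snippet word;
            # words missing from the dictionary are dropped.
            bag = [en for zh_word in snippet
                   for en in chinese_to_english.get(zh_word, [])]
            if bag:
                instances.append(bag)
        signatures[sense] = instances
    return signatures
```

A classifier can then be trained on these labelled bags of words as if they were sense-tagged English contexts.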
We have tested our WSD systems on the TWA binary sense-tagged dataset and the English Senseval-2 Lexical Sample dataset, and they outperformed comparable state-of-the-art unsupervised systems. We also found that increasing the amount of sense signature training data significantly improved WSD performance. Since sense signatures can be obtained very cheaply from any large Chinese text collection, including the Web, our approach is one way to tackle the knowledge acquisition bottleneck.

There are a number of future directions we could investigate. Firstly, instead of using a bilingual dictionary to translate Chinese text snippets back into English, we could use machine translation software. Secondly, we could try this approach on other language pairs, for example Japanese-English. This also suggests a solution to the problem that ambiguity may be preserved between Chinese and English: when a Chinese translation of an English sense is still ambiguous, we could collect sense signatures using a translation in a third language, for instance Japanese. Thirdly, it would be interesting to tackle the reverse problem: Chinese WSD using sense signatures built from English text.
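The third-language idea can be made concrete as a selection rule: among the available pivot-language translations of a sense, prefer the one whose back-translation set is smallest, i.e. the least ambiguous pivot. Everything in this sketch (names, data, interface) is invented for illustration:

```python
def pick_pivot_translation(translations, back_translate):
    """translations: {language: translation of the English sense}.
    back_translate(word) returns the English words the translation maps
    back to; fewer back-translations means a less ambiguous pivot for
    collecting sense signatures."""
    return min(translations.items(),
               key=lambda item: len(back_translate(item[1])))

# Toy example: the Chinese translation is three-ways ambiguous,
# the Japanese one is not, so Japanese is chosen as the pivot.
back = {"T_zh": ["interest", "benefit", "profit"], "T_ja": ["interest"]}
lang, word = pick_pivot_translation({"zh": "T_zh", "ja": "T_ja"},
                                    lambda w: back[w])
print(lang, word)  # → ja T_ja
```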

References

[Agirre et al., 2001] Eneko Agirre, Olatz Ansa, David Martinez, and Eduard Hovy. Enriching WordNet concepts with topic signatures. In Proceedings of the NAACL Workshop on WordNet and Other Lexical Resources: Applications, Extensions and Customizations, 2001. Pittsburgh, USA.

[Diab and Resnik, 2002] Mona Diab and Philip Resnik. An unsupervised method for word sense tagging using parallel corpora. In Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL-02), 2002. Philadelphia, USA.

[Fernández-Amorós et al., 2001] David Fernández-Amorós, Julio Gonzalo, and Felisa Verdejo. The UNED systems at Senseval-2. In Proceedings of the Second International Workshop on Evaluating Word Sense Disambiguation Systems (SENSEVAL-2), 2001. Toulouse, France.

[Gale et al., 1992] William A. Gale, Kenneth W. Church, and David Yarowsky. Using bilingual materials to develop word sense disambiguation methods. In Proceedings of the International Conference on Theoretical and Methodological Issues in Machine Translation, pages 101–112, 1992.

[John and Langley, 1995] George H. John and Pat Langley. Estimating continuous distributions in Bayesian classifiers. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, pages 338–345, 1995.

[Kilgarriff, 2001] Adam Kilgarriff. English lexical sample task description. In Proceedings of the Second International Workshop on Evaluating Word Sense Disambiguation Systems (SENSEVAL-2), 2001. Toulouse, France.

[Lesk, 1986] Michael E. Lesk. Automated sense disambiguation using machine-readable dictionaries: how to tell a pine cone from an ice cream cone. In Proceedings of the SIGDOC Conference, 1986.

[Lin and Hovy, 2000] Chin-Yew Lin and Eduard Hovy. The automated acquisition of topic signatures for text summarization. In Proceedings of the COLING Conference, 2000. Saarbrücken, Germany.
[McCarthy et al., 2004] Diana McCarthy, Rob Koeling, Julie Weeds, and John Carroll. Finding predominant word senses in untagged text. In Proceedings of the 42nd Annual Meeting of the Association for Computational Linguistics, 2004. Barcelona, Spain.

[Mihalcea, 2003] Rada Mihalcea. The role of non-ambiguous words in natural language disambiguation. In Proceedings of the Conference on Recent Advances in Natural Language Processing (RANLP 2003), 2003. Borovetz, Bulgaria.

[Miller et al., 1990] George A. Miller, Richard Beckwith, Christiane Fellbaum, Derek Gross, and Katherine J. Miller. Introduction to WordNet: An on-line lexical database. International Journal of Lexicography, 3(4):235–244, 1990.

[Miller et al., 1993] G. Miller, C. Leacock, R. Tengi, and T. Bunker. A semantic concordance. In Proceedings of the ARPA Workshop on Human Language Technology, 1993.

[Resnik and Yarowsky, 1999] Philip Resnik and David Yarowsky. Distinguishing systems and distinguishing senses: New evaluation methods for word sense disambiguation. Natural Language Engineering, 5(2):113–133, 1999.

[Resnik, 1999] Philip Resnik. Mining the Web for bilingual text. In Proceedings of the 37th Annual Meeting of the Association for Computational Linguistics, 1999.

[Wang, 2004] Xinglong Wang. Automatic acquisition of English topic signatures based on a second language. In Proceedings of the Student Research Workshop at ACL 2004, 2004. Barcelona, Spain.