May All Your Wishes Come True: A Study of Wishes and How to Recognize Them

Similar documents
Generalizing Dependency Features for Opinion Mining

A Predictive Chronological Model of Multiple Clinical Observations T R A V I S G O O D W I N A N D S A N D A M. H A R A B A G I U

Introduction to Sentiment Analysis

Asthma Surveillance Using Social Media Data

An Intelligent Writing Assistant Module for Narrative Clinical Records based on Named Entity Recognition and Similarity Computation

Brendan O Connor,

Experiments with CST-based Multi-document Summarization

Rating prediction on Amazon Fine Foods Reviews

Sentiment Analysis of Reviews: Should we analyze writer intentions or reader perceptions?

Semi-Automatic Construction of Thyroid Cancer Intervention Corpus from Biomedical Abstracts

Confluence: Conformity Influence in Large Social Networks

Experimenting with Drugs (and Topic Models) Michael Paul and Mark Dredze Johns Hopkins University

Knowledge networks of biological and medical data An exhaustive and flexible solution to model life sciences domains

GIANT: Geo-Informative Attributes for Location Recognition and Exploration

Dr. Patrick Walsh's Guide To Surviving Prostate Cancer Books

DOWNLOAD OR READ : WHAT MAKES ME HAPPY PDF EBOOK EPUB MOBI

Reader s Emotion Prediction Based on Partitioned Latent Dirichlet Allocation Model

Exploiting Ordinality in Predicting Star Reviews

Wangari Maathai. Stop and think: What words would you use to describe Wangari? What words would you use to describe the women?

GfK Verein. Detecting Emotions from Voice

A Probability Puzzler. Statistics, Data and Statistical Thinking. A Probability Puzzler. A Probability Puzzler. Statistics.

American Cancer Society National Cancer Information Center (NCIC)

INTRODUCTION TO MACHINE LEARNING. Decision tree learning

This is the author s version of a work that was submitted/accepted for publication in the following source:

Discovering Symptom-herb Relationship by Exploiting SHT Topic Model

UKParl: A Data Set for Topic Detection with Semantically Annotated Text

Prediction of Malignant and Benign Tumor using Machine Learning

Life Groups Research

REWIRE WEEK 4 MODULE 7: FINDING YOUR LOCUS OF CONTROL

You can use this app to build a causal Bayesian network and experiment with inferences. We hope you ll find it interesting and helpful.

Risk Aversion in Games of Chance

How to Stop the Pattern of Self-Sabotage. By Ana Barreto

Case Studies of Signed Networks

rated sexy smart safe Women Sexy, Smart & Safe

THE ANALYTICS EDGE. Intelligence, Happiness, and Health x The Analytics Edge

Modeling Annotator Rationales with Application to Pneumonia Classification

A Comparison of Collaborative Filtering Methods for Medication Reconciliation

APR, MAY, JUNE 2013 COMMANDER S CORNER

Student Minds Turl Street, Oxford, OX1 3DH

Dividing Lines Within the Deaf Community. When some people think about dividing lines one might think about the lines

Media pack for secondary breast cancer campaigners

CHAPTER 2 MAMMOGRAMS AND COMPUTER AIDED DETECTION

Setting up a Local Group

Learning the Fine-Grained Information Status of Discourse Entities

The use of Topic Modeling to Analyze Open-Ended Survey Items

Susan G. Komen Race for the Cure. Saturday, June 21. Thank you for joining the St. Louis Blues & FSN Midwest Team!

My Creativity 1. Do I tend to do things in the accepted way or am I more creative? HDIFAT? 2. HDIF when my creative juices are flowing?

Causality and Statistical Learning

From Sentiment to Emotion Analysis in Social Networks

2 Franklin Street, Belfast, BT2 8DQ Equality Scheme

An INSIDE OUT Family Discussion Guide. Introduction.

Section 4 - Dealing with Anxious Thinking

CHAPTER to. Resting and Sleep

Key Conversation Trends and Patterns about Electronic Cigarettes on Social Media

arxiv: v1 [cs.lg] 4 Feb 2019

Examiners Report June GCSE Health and Social Care 1 5HS01 01

Curso de Capacitación Docente en Neurociencias Alumna: Maria Elena Zingoni

HRS 2008 SECTION D: COGNITION PAGE 1 ******************************************************************

Consumer Review Analysis with Linear Regression

Causality and Statistical Learning

Sergio Hernandez, MD. An Introduction to the Mind

Minnesota Cancer Alliance SUMMARY OF MEMBER INTERVIEWS REGARDING EVALUATION

Emotions and Stress. 6. Why is guilt a learned emotion?

Lab 4 (M13) Objective: This lab will give you more practice exploring the shape of data, and in particular in breaking the data into two groups.

Be Opened Mark 7, 31-37

Overcoming Depression By Paul A. Hauck READ ONLINE

Welcome to ASTAIR s Extended Donor Profile

BBC LEARNING ENGLISH Shakespeare Speaks

Lin-Manuel Miranda By Jessica McBirney 2017

YOUR EMOTIONAL FINGERPRINT

Data Mining in Bioinformatics Day 4: Text Mining

Placement test. 1. Hello. My name... Peter. Nice to meet you. a) are b) is c) be d) has

Tips and Tricks. Look inside for valuable information to help you engage in your most successful local fundraiser.

TAPPING THE K. E. G. S.

Regression CHAPTER SIXTEEN NOTE TO INSTRUCTORS OUTLINE OF RESOURCES

sdrescue.org (619)

Summarizing Drug Experiences with Multi- Dimensional Topic Models. Michael Paul and Mark Dredze Johns Hopkins University

Using Information From the Target Language to Improve Crosslingual Text Classification

Season 1. No Smoking. Study Guide

Season 1. No Smoking. Study Guide

Lesson 1: Gaining Influence and Respect

SOCIAL DETERMINANTS OF HEALTH: A COMPARATIVE APPROACH BY ALAN DAVIDSON

Living a Healthy Balanced Life Emotional Balance By Ellen Missah

MEASURES OF ASSOCIATION AND REGRESSION

Midterm project due next Wednesday at 2 PM

Pap Testing. Remember. Funding for development and printing of this pamphlet: NCI Drawings and artwork by: Budong Zhao

Why Is Mommy Like She Is?

You re number one! How to take care of yourself while taking care of others. Maternal Mental Health Forum January 26, 2016

An empirical evaluation of text classification and feature selection methods

Captioning Your Video Using YouTube Online Accessibility Series

Discovering Identity Problems: A Case Study

Visualizing the Affective Structure of a Text Document

Language Volunteer Guide

VIBRATIONAL HEALING FOR AUTISM: AKA AWESOME KIDS! BY TAMI DUNCAN

Difficult times, Brighter Futures. President's Message:

15.301/310, Managerial Psychology Prof. Dan Ariely Recitation 8: T test and ANOVA

Facial expression recognition with spatiotemporal local descriptors

Tapping World Summit 2009

Distillation of Knowledge from the Research Literatures on Alzheimer s Dementia

Transcription:

May All Your Wishes Come True: A Study of Wishes and How to Recognize Them Andrew B. Goldberg, Nathanael Fillmore, David Andrzejewski, Zhiting Xu, Bryan Gibson & Xiaojin Zhu Computer Sciences Department University of Wisconsin-Madison 1

Times Square Virtual Wishing Well In December 2007, Web users sent in their wishes for the new year Wishes were printed on confetti Released from the sky at midnight in sync with the famous ball drop Over 100,000 wishes collected to form the WISH corpus 2

Sample New Year s Wishes Freq. Wish Freq. Wish 514 peace on earth 8 i wish for a puppy 351 peace 7 for the war in iraq to end 331 world peace 6 peace on earth please 244 happy new year 5 a free democratic venezuela 112 love 5 may the best of 2007 be the worst of 2008 76 health and happiness 5 to be financially stable 75 to be happy 1 a little goodness for everyone would be nice 51 i wish for world peace 1 i hope i get accepted into a college that i like 21 i wish for health and happiness 1 i wish to get more sex in 2008 21 let there be peace on earth 1 please let name be healthy and live all year 16 to find my true love 1 to be emotionally stable and happy 3

What is a wish? 4

What is a wish? Formally: wish (n.) a desire or hope for something to happen 4

What is a wish? Formally: wish (n.) a desire or hope for something to happen Open questions in NLP: How are wishes expressed? How can wishful expressions be automatically recognized? 4

What is a wish? Formally: wish (n.) a desire or hope for something to happen Open questions in NLP: How are wishes expressed? How can wishful expressions be automatically recognized? Our work: Analyze this unique new collection of wishes Leverage the WISH corpus to build general wish detectors Demonstrate effectiveness on consumer product reviews and informal political discussion online 4

Outline 5

Outline Why study wishes? (relation to prior work) Sentiment analysis Psychology / cognitive science 5

Outline Why study wishes? (relation to prior work) Sentiment analysis Psychology / cognitive science Analysis of the WISH corpus Topic and scope of wishes Geographical differences Latent topic modeling 5

Outline Why study wishes? (relation to prior work) Sentiment analysis Psychology / cognitive science Analysis of the WISH corpus Topic and scope of wishes Geographical differences Latent topic modeling Building wish detectors Key contribution: Automatically discovering wish templates 5

Outline Why study wishes? (relation to prior work) Sentiment analysis Psychology / cognitive science Analysis of the WISH corpus Topic and scope of wishes Geographical differences Latent topic modeling Building wish detectors Key contribution: Automatically discovering wish templates Experimental results 5

Why study wishes? 6

Why study wishes? Wishes add a novel dimension to sentiment analysis, opinion mining What people explicitly want, not just what they like or dislike Great camera. Indoor shots with a flash are not quite as good as 35mm. I wish the camera had a higher optical zoom so that I could take even better wildlife photos. Automatic wish detector can provide political value & business intelligence 6

Why study wishes? Wishes add a novel dimension to sentiment analysis, opinion mining What people explicitly want, not just what they like or dislike Great camera. Indoor shots with a flash are not quite as good as 35mm. I wish the camera had a higher optical zoom so that I could take even better wildlife photos. Automatic wish detector can provide political value & business intelligence Wishes can reveal a lot about people Psychologists have studied wish content vs. location, gender, age, etc (Speer 1939, Milgram and Riedel 1969, Ehrlichman and Eichenstein 1992, King and Broyles 1997) WISH corpus: much larger scale, from the entire globe 6

The WISH corpus

Analysis of the WISH corpus 8

Analysis of the WISH corpus Almost 100,000 wishes collected over 10 days in December 2007 We focus on the 89,574 wishes written in English Remaining 10,000+ in Portuguese, Spanish, Chinese, French, etc 8

Analysis of the WISH corpus Almost 100,000 wishes collected over 10 days in December 2007 We focus on the 89,574 wishes written in English Remaining 10,000+ in Portuguese, Spanish, Chinese, French, etc Many contain optional state/country location entered by the wisher 8

Analysis of the WISH corpus Almost 100,000 wishes collected over 10 days in December 2007 We focus on the 89,574 wishes written in English Remaining 10,000+ in Portuguese, Spanish, Chinese, French, etc Many contain optional state/country location entered by the wisher Minimal preprocessing TreeBank tokenization, downcasing, punctuation removal 8

Analysis of the WISH corpus Almost 100,000 wishes collected over 10 days in December 2007 We focus on the 89,574 wishes written in English Remaining 10,000+ in Portuguese, Spanish, Chinese, French, etc Many contain optional state/country location entered by the wisher Minimal preprocessing TreeBank tokenization, downcasing, punctuation removal Each wish is treated as a single entity (even if multiple sentences) 8

Analysis of the WISH corpus Almost 100,000 wishes collected over 10 days in December 2007 We focus on the 89,574 wishes written in English Remaining 10,000+ in Portuguese, Spanish, Chinese, French, etc Many contain optional state/country location entered by the wisher Minimal preprocessing TreeBank tokenization, downcasing, punctuation removal Each wish is treated as a single entity (even if multiple sentences) Average length of wishes is 8 tokens 8

WISH corpus: Scope and topic of wishes Manually annotated random subsample of 5,000 wishes 9

WISH corpus: Scope and topic of wishes Manually annotated random subsample of 5,000 wishes Topic of wishes: what the wish is about career (4.6%) money (7.3%) peace (12%) sex (4.5%) religion (3.4%) political (2.8%) travel (1.4%) love (18%) happiness (15%) health (15%) other (16%) 9

WISH corpus: Scope and topic of wishes Manually annotated random subsample of 5,000 wishes Topic of wishes: what the wish is about career (4.6%) money (7.3%) peace (12%) sex (4.5%) religion (3.4%) political (2.8%) travel (1.4%) love (18%) happiness (15%) health (15%) other (16%) individual requests: i wish for a new puppy solicitations: call me 555-1234, visit website.com sinister: to take over the world 9

WISH corpus: Scope and topic of wishes Manually annotated random subsample of 5,000 wishes Topic of wishes: what the wish is about Scope of wishes: who the wish is aimed at career (4.6%) money (7.3%) peace (12%) sex (4.5%) religion (3.4%) political (2.8%) travel (1.4%) love (18%) world (23%) family (14%) name (14%) other (7.8%) country (5.8%) happiness (15%) health (15%) other (16%) self (36%) individual requests: i wish for a new puppy solicitations: call me 555-1234, visit website.com sinister: to take over the world 9

WISH corpus: Geographical differences 10

WISH corpus: Geographical differences About 4,000 of the manually annotated wishes included valid location information Covered all 50 U.S. states and all continents except Antarctica 10

WISH corpus: Geographical differences About 4,000 of the manually annotated wishes included valid location information Covered all 50 U.S. states and all continents except Antarctica We compared topic and scope distributions between U.S. and non-u.s. wishes 10

WISH corpus: Geographical differences About 4,000 of the manually annotated wishes included valid location information Covered all 50 U.S. states and all continents except Antarctica We compared topic and scope distributions between U.S. and non-u.s. wishes 0.2 0.1 0 love Topic other health happiness peace money career sex religion political travel US Non US 10

WISH corpus: Geographical differences About 4,000 of the manually annotated wishes included valid location information Covered all 50 U.S. states and all continents except Antarctica We compared topic and scope distributions between U.S. and non-u.s. wishes 0.2 0.1 0 Topic Scope US Non US love other health happiness peace money career sex religion political travel 0.4 US Non US 0.3 0.2 0.1 0 self world family name other country 10

WISH corpus: Geographical differences About 4,000 of the manually annotated wishes included valid location information Covered all 50 U.S. states and all continents except Antarctica We compared topic and scope distributions between U.S. and non-u.s. wishes Statistically significant differences in both cases (Pearson X 2 -test, p < 0.01) 0.2 0.1 0 0.3 0.2 Topic Scope US Non US love other health happiness peace money career sex religion political travel 0.4 US Non US 0.1 0 self world family name other country 10

WISH corpus: Geographical differences About 4,000 of the manually annotated wishes included valid location information Covered all 50 U.S. states and all continents except Antarctica We compared topic and scope distributions between U.S. and non-u.s. wishes Statistically significant differences in both cases (Pearson X 2 -test, p < 0.01) 0.2 0.1 0 0.3 0.2 Topic Scope US Non US love other health happiness peace money career sex religion political travel 0.4 US Non US But no significant difference between red vs. blue states 0.1 0 self world family name other country 10

WISH corpus: Latent topic modeling 11

WISH corpus: Latent topic modeling So far analysis was of 5,000 manually labeled wishes 11

WISH corpus: Latent topic modeling So far analysis was of 5,000 manually labeled wishes We automatically analyzed all ~90,000 using Latent Dirichlet Allocation Each wish is treated as a short document 12 topics Inference performed by collapsed Gibbs sampling Hyperparameters set to α=0.5, β=0.1 11

WISH corpus: Latent topic modeling Topic Top words, sorted by p(word topic) Subjective Label 1 year, new, happy, 2008, best, everyone, great, wishing, hope New Year 2 all, god, home, come, safe, us, bless, troops, bring, iraq, return Troops 3 end, no, more, 2008, war, president, paul, ron, less, bush, vote Election 4 more, better, life, one, live, time, make, people, than, day, every Life 5 health, happiness, good, family, friends, prosperity, wealth, success Prosperity 6 love, find, true, life, meet, want, man, marry, someone, boyfriend Love 7 get, job, out, hope, school, better, house, well, back, college Career 8 win, 2008, money, want, make, become, lottery, more, great, lots Money 9 peace, world, love, earth, happiness, everyone, joy, 2008, around Peace 10 love, forever, jesus, know, together, u, always, best, mom, christ Religion 11 healthy, family, baby, life, children, safe, husband, stay, marriage Family 12 me, lose, please, let, cancer, weight, cure, mom, mother, visit, dad Health 12

Building wish detectors

Building wish detectors Novel NLP task: Wish Detection Given sentence S, classify S as wish or non-wish 14

Building wish detectors Novel NLP task: Wish Detection Given sentence S, classify S as wish or non-wish Want an approach that will extend beyond New Year s wishes Target domains: product reviews, political discussions 14

Building wish detectors Novel NLP task: Wish Detection Given sentence S, classify S as wish or non-wish Want an approach that will extend beyond New Year s wishes Target domains: product reviews, political discussions Wishes are highly domain dependent New Year s eve: I wish for world peace Product review: I want to have instant access to the volume 14

Building wish detectors Novel NLP task: Wish Detection Given sentence S, classify S as wish or non-wish Want an approach that will extend beyond New Year s wishes Target domains: product reviews, political discussions Wishes are highly domain dependent New Year s eve: I wish for world peace Product review: I want to have instant access to the volume Initial study Assume some labeled data in target domains Try to beat some standard baselines by exploiting the WISH corpus to learn patterns of wish expressions (wish templates) 14

Two simple baseline wish detectors Do not use WISH corpus 15

Two simple baseline wish detectors Do not use WISH corpus Manual Rule-based classifier If part of a sentence matches a template, classify it as a wish Some of the 13 templates created by two native English speakers: i wish i hope i want hopefully if only would be better if would like if should Expect high precision, low recall 15

Two simple baseline wish detectors Do not use WISH corpus Manual Rule-based classifier If part of a sentence matches a template, classify it as a wish Some of the 13 templates created by two native English speakers: i wish i hope i want if only would be better if would like if Words Linear Support Vector Machine Train on labeled training set from the target domain Representation: binary word-indicator vector normalized to sum to 1 Natural first baseline for a new text classification task hopefully should Expect high precision, low recall Expect high recall, low precision 15

Learning wish templates Key idea: Exploit redundancy in how wishes are expressed 16

Learning wish templates Key idea: Exploit redundancy in how wishes are expressed Many entries in the WISH corpus contain only a short wish content world peace health and happiness 16

Learning wish templates Key idea: Exploit redundancy in how wishes are expressed Many entries in the WISH corpus contain only a short wish content world peace health and happiness These wish contents appear within longer wishes with a common prefix/suffix: i wish for world peace i wish for health and happiness 16

Learning wish templates Key idea: Exploit redundancy in how wishes are expressed Many entries in the WISH corpus contain only a short wish content world peace health and happiness These wish contents appear within longer wishes with a common prefix/suffix: i wish for world peace i wish for health and happiness Intuitively, popular content appears within popular templates. 16

Learning wish templates Key idea: Exploit redundancy in how wishes are expressed Many entries in the WISH corpus contain only a short wish content world peace health and happiness These wish contents appear within longer wishes with a common prefix/suffix: i wish for world peace i wish for health and happiness Intuitively, popular content appears within popular templates. Can discover non-obvious templates, too: world peace, peace on earth let there be become rich, win the lottery to finally get a job, save the environment please 16

Learning wish templates Formally, we build a bipartite graph Two kinds of nodes: Content nodes c C on left, Template nodes t T on right Two kinds of edges: c t (weighted by # times content appears in the template) t c (weighted by # times template matches a content node) Content Templates 17

Learning wish templates Formally, we build a bipartite graph Two kinds of nodes: Content nodes c C on left, Template nodes t T on right Two kinds of edges: c t (weighted by # times content appears in the template) t c (weighted by # times template matches a content node) Content world peace c1 Templates 17

Learning wish templates Formally, we build a bipartite graph Two kinds of nodes: Content nodes c C on left, Template nodes t T on right Two kinds of edges: c t (weighted by # times content appears in the template) t c (weighted by # times template matches a content node) Content world peace c1 t1 Templates i wish for 17

Learning wish templates Formally, we build a bipartite graph Two kinds of nodes: Content nodes c C on left, Template nodes t T on right Two kinds of edges: c t (weighted by # times content appears in the template) t c (weighted by # times template matches a content node) Content world peace c1 #( i wish for world peace ) t1 Templates i wish for 17

Learning wish templates Formally, we build a bipartite graph Two kinds of nodes: Content nodes c C on left, Template nodes t T on right Two kinds of edges: Content world peace health and happiness c t (weighted by # times content appears in the template) t c (weighted by # times template matches a content node) c1 c2 #( i wish for world peace ) t1 Templates i wish for 17

Learning wish templates Formally, we build a bipartite graph Two kinds of nodes: Content nodes c C on left, Template nodes t T on right Two kinds of edges: Content world peace health and happiness c t (weighted by # times content appears in the template) t c (weighted by # times template matches a content node) c1 c2 #( i wish for world peace ) #( i wish for health and... ) t1 Templates i wish for 17

Learning wish templates Formally, we build a bipartite graph Two kinds of nodes: Content nodes c C on left, Template nodes t T on right Two kinds of edges: Content world peace health and happiness c t (weighted by # times content appears in the template) t c (weighted by # times template matches a content node) c1 c2 #( i wish for world peace ) #( i wish for health and... ) t1 Templates i wish for health c 3 17

Learning wish templates Formally, we build a bipartite graph Two kinds of nodes: Content nodes c C on left, Template nodes t T on right Two kinds of edges: Content world peace health and happiness c1 c2 health c 3 c t (weighted by # times content appears in the template) t c (weighted by # times template matches a content node) #( i wish for world peace ) #( i wish for health and... ) #( i wish for health ) t1 Templates i wish for 17

Learning wish templates Formally, we build a bipartite graph Two kinds of nodes: Content nodes c C on left, Template nodes t T on right Two kinds of edges: Content world peace health and happiness c1 c2 health c 3 c t (weighted by # times content appears in the template) t c (weighted by # times template matches a content node) #( i wish for world peace ) #( i wish for health and... ) #( i wish for health ) t1 t2 Templates i wish for and happiness 17

Learning wish templates Formally, we build a bipartite graph Two kinds of nodes: Content nodes c C on left, Template nodes t T on right Two kinds of edges: Content world peace health and happiness c1 c2 health c 3 c t (weighted by # times content appears in the template) t c (weighted by # times template matches a content node) #( i wish for world peace ) #( i wish for health and... ) #( i wish for health ) #( health and happiness ) t1 t2 Templates i wish for and happiness 17

Learning wish templates Formally, we build a bipartite graph Two kinds of nodes: Content nodes c C on left, Template nodes t T on right Two kinds of edges: c t (weighted by # times content appears in the template) t c (weighted by # times template matches a content node) Content world peace c1 #( i wish for world peace ) Templates health and happiness c2 health c 3 #( i wish for health and... ) #( i wish for health ) #( health and happiness ) #( health and happiness ) t1 t2 i wish for and happiness 17

Ranking template nodes world peace c1 #( i wish for world peace ) health and happiness c2 health c 3 #( i wish for health and... ) #( i wish for health ) #( health and happiness ) #( health and happiness ) t1 t2 i wish for and happiness 18

Ranking template nodes world peace c1 #( i wish for world peace ) health and happiness c2 health c 3 #( i wish for health and... ) #( i wish for health ) #( health and happiness ) #( health and happiness ) t1 t2 i wish for and happiness Useful templates match many complete wishes but few content-only wishes 18

Ranking template nodes world peace c1 #( i wish for world peace ) health and happiness c2 health c 3 #( i wish for health and... ) #( i wish for health ) #( health and happiness ) #( health and happiness ) t1 t2 i wish for and happiness Useful templates match many complete wishes but few content-only wishes We rank all template nodes t by score(t) = in(t) - out(t) 18

Ranking template nodes world peace c1 #( i wish for world peace ) health and happiness c2 health c 3 #( i wish for health and... ) #( i wish for health ) #( health and happiness ) #( health and happiness ) t1 t2 i wish for and happiness Useful templates match many complete wishes but few content-only wishes We rank all template nodes t by score(t) = in(t) - out(t) Subtracting the out-degree eliminates bad templates that contain specific topical content (e.g., and happiness ) 18

Ranking template nodes world peace c1 #( i wish for world peace ) health and happiness c2 health c 3 #( i wish for health and... ) #( i wish for health ) #( health and happiness ) #( health and happiness ) t1 t2 i wish for and happiness Useful templates match many complete wishes but few content-only wishes We rank all template nodes t by score(t) = in(t) - out(t) Subtracting the out-degree eliminates bad templates that contain specific topical content (e.g., and happiness ) Apply threshold score(t) 5 to obtain 811 top templates for use as features 18

Wish template features Some of the top 811 template features selected by our algorithm Top 10 Others in Top 200 in 2008 i want to i wish for for everyone i wish i hope i want my wish is i want my please this year wishing for i wish in 2008 may you i wish to i wish i had i wish this year to finally in the new year for my family to have 19

Learning with wish template features 20

Learning with wish template features We use the templates as features for classification in target domains 20

Learning with wish template features We use the templates as features for classification in target domains Each template leads to 2 features depending on level of matching in sentence: Whole-sentence match: i wish this mp3 player had more storage Partial-sentence match: most of all i wish this camera was smaller 20

Learning with wish template features We use the templates as features for classification in target domains Each template leads to 2 features depending on level of matching in sentence: Whole-sentence match: i wish this mp3 player had more storage Partial-sentence match: most of all i wish this camera was smaller Models using templates: [Templates] uses only these features in a linear SVM [Words+Templates] combines unigram and template features in a linear SVM 20

Test corpora 21

Test corpora Recall goal of discovering wishes in interesting text domains 21

Test corpora Recall goal of discovering wishes in interesting text domains Two test corpora, manually labeled sentences as wish vs. non-wish 21

Test corpora Recall goal of discovering wishes in interesting text domains Two test corpora, manually labeled sentences as wish vs. non-wish Consumer product reviews 1,235 sentences from amazon.com and cnet.com reviews (selected from data used in Hu and Liu, 2004; Ding et al., 2008) 12% wishes 21

Test corpora Recall goal of discovering wishes in interesting text domains Two test corpora, manually labeled sentences as wish vs. non-wish Consumer product reviews 1,235 sentences from amazon.com and cnet.com reviews (selected from data used in Hu and Liu, 2004; Ding et al., 2008) 12% wishes Political discussion board postings 6,379 sentences selected from politics.com (Mullen and Malouf, 2008). 34% wishes 21

Test corpora Recall goal of discovering wishes in interesting text domains Two test corpora, manually labeled sentences as wish vs. non-wish Consumer product reviews 1,235 sentences from amazon.com and cnet.com reviews (selected from data used in Hu and Liu, 2004; Ding et al., 2008) 12% wishes Political discussion board postings 6,379 sentences selected from politics.com (Mullen and Malouf, 2008). 34% wishes Download from http://pages.cs.wisc.edu/~goldberg/wish_data 21

Experimental results 10-fold cross validation, linear classifier (SVM light using default parameters) 22

Experimental results 10-fold cross validation, linear classifier (SVM light using default parameters) Precision 1 0.9 0.8 0.7 0.6 0.5 0.4 0.3 Politics 0.2 Manual Words 0.1 Templates Words + Templates 0 0 0.2 0.4 0.6 0.8 1 Recall 22

Experimental results 10-fold cross validation, linear classifier (SVM light using default parameters) 1 0.9 0.8 Politics 1 0.9 0.8 Products Precision 0.7 0.6 0.5 0.4 0.3 0.2 Manual Words 0.1 Templates Words + Templates 0 0 0.2 0.4 0.6 0.8 1 Recall Precision 0.7 0.6 0.5 0.4 0.3 0.2 Manual Words 0.1 Templates Words + Templates 0 0 0.2 0.4 0.6 0.8 1 Recall 22

Experimental results 10-fold cross validation, linear classifier (SVM light using default parameters) 1 0.9 0.8 Politics 1 0.9 0.8 Products Precision 0.7 0.6 0.5 0.4 0.3 Precision 0.7 0.6 0.5 0.4 0.3 AUC 0.2 Manual Words 0.1 Templates Words + Templates 0 0 0.2 0.4 0.6 0.8 1 Recall Corpus Manual Words Templates 0.2 Manual Words 0.1 Templates Words + Templates 0 0 0.2 0.4 0.6 0.8 1 Recall Words + Templates Politics 0.67 ± 0.03 0.77 ± 0.03 0.73 ± 0.03 0.80 ± 0.03 Products 0.49 ± 0.13 0.52 ± 0.16 0.47 ± 0.16 0.56 ± 0.16 22

What features are important? Features with largest magnitude weights for one fold of the Products corpus Sign Words Templates Words + Templates + wish i hope hoping + hope i wish i hope + hopefully hoping i just want + hoping i just want i wish + want i would like i would like - money family micro - find forever about - digital let me fix - again d digital - you for my dad you 23

Conclusions & Future Work 24

Conclusions & Future Work Studied wishes from an NLP perspective for the first time 24

Conclusions & Future Work Studied wishes from an NLP perspective for the first time Introduced and analyzed the WISH corpus of ~90,000 wishes 24

Conclusions & Future Work Studied wishes from an NLP perspective for the first time Introduced and analyzed the WISH corpus of ~90,000 wishes Proposed new wish detection task and simple baselines 24

Conclusions & Future Work Studied wishes from an NLP perspective for the first time Introduced and analyzed the WISH corpus of ~90,000 wishes Proposed new wish detection task and simple baselines Generated wish templates that transfer to new domains 24

Conclusions & Future Work Studied wishes from an NLP perspective for the first time Introduced and analyzed the WISH corpus of ~90,000 wishes Proposed new wish detection task and simple baselines Generated wish templates that transfer to new domains Built wish-annotated test corpora in product review and political domains 24

Conclusions & Future Work Studied wishes from an NLP perspective for the first time Introduced and analyzed the WISH corpus of ~90,000 wishes Proposed new wish detection task and simple baselines Generated wish templates that transfer to new domains Built wish-annotated test corpora in product review and political domains Much future work in wish detection remains: Additional wish-sensitive features Annotated training data is expensive semi-supervised learning 24

Acknowledgements We d like to thank: Times Square Alliance for providing the WISH corpus Wisconsin Alumni Research Foundation Yahoo! Key Technical Challenges Program & you! Download test corpora at http://pages.cs.wisc.edu/~goldberg/wish_data 25