Real-time Summarization Track

Similar documents
CSCI 5417 Information Retrieval Systems. Jim Martin!

Multi-modal Patient Cohort Identification from EEG Report and Signal Data

Evaluation & Systems. Ling573 Systems & Applications April 9, 2015

Query Refinement: Negation Detection and Proximity Learning Georgetown at TREC 2014 Clinical Decision Support Track

Monitoring of Epidemic Using Spatial Technology

RELEVANCE JUDGMENTS EXCLUSIVE OF HUMAN ASSESSORS IN LARGE SCALE INFORMATION RETRIEVAL EVALUATION EXPERIMENTATION

This is the author s version of a work that was submitted/accepted for publication in the following source:

Chapter IR:VIII. VIII. Evaluation. Laboratory Experiments Logging Effectiveness Measures Efficiency Measures Training and Testing

Detecting and monitoring foodborne illness outbreaks: Twitter communications and the 2015 U.S. Salmonella outbreak linked to imported cucumbers

Filippo Chiarello, Andrea Bonaccorsi, Gualtiero Fantoni, Giacomo Ossola, Andrea Cimino and Felice Dell Orletta

Microblog Retrieval for Disaster Relief: How To Create Ground Truths? Ribhav Soni and Sukomal Pal

Asthma Surveillance Using Social Media Data

Sub-Topic Classification of HIV related Opportunistic Infections. Miguel Anderson and Joseph Fonseca

Improving Passage Retrieval Using Interactive Elicition and Statistical Modeling

Open Range Software, LLC

Data Mining in Bioinformatics Day 4: Text Mining

When Twitter meets Foursquare: Tweet Location Prediction using Foursquare

Construction of the EEG Emotion Judgment System Using Concept Base of EEG Features

mpaceline for Peloton Riders User Guide

A Unified Approach to Ranking in Probabilistic Databases. Jian Li, Barna Saha, Amol Deshpande University of Maryland, College Park, USA

Alcohol interventions in secondary and further education

Comparing and Combining Methods for Automatic Query Expansion

My Fitness Pal Health & Fitness Tracker A User s Guide

Mathematical Modeling of Trending Topics on Twitter

An Analysis of Assessor Behavior in Crowdsourced Preference Judgments

Review: Logistic regression, Gaussian naïve Bayes, linear regression, and their connections

EEG Signal Description with Spectral-Envelope- Based Speech Recognition Features for Detection of Neonatal Seizures

Paul Bennett, Microsoft Research (CLUES) Joint work with Ben Carterette, Max Chickering, Susan Dumais, Eric Horvitz, Edith Law, and Anton Mityagin.

NTT DOCOMO Technical Journal

MedStar Digital Task Management System User Guide for Stakeholders

Loose Tweets: An Analysis of Privacy Leaks on Twitter

Approximating hierarchy-based similarity for WordNet nominal synsets using Topic Signatures

Building a Large-scale Corpus for Evaluating Event Detection on Twitter

Studying the Dark Triad of Personality through Twitter Behavior

Deep learning and non-negative matrix factorization in recognition of mammograms

Analyzing Spammers Social Networks for Fun and Profit

Rumor Detection on Twitter with Tree-structured Recursive Neural Networks

Nature Neuroscience: doi: /nn Supplementary Figure 1. Behavioral training.

Implications of Inter-Rater Agreement on a Student Information Retrieval Evaluation

Applications. DSC 410/510 Multivariate Statistical Methods. Discriminating Two Groups. What is Discriminant Analysis

Towards a Data-driven Approach to Identify Crisis-Related Topics in Social Media Streams

CHAPTER IV PREPROCESSING & FEATURE EXTRACTION IN ECG SIGNALS

Tom Hilbert. Influenza Detection from Twitter

Annotation and Retrieval System Using Confabulation Model for ImageCLEF2011 Photo Annotation

UCLA at TREC 2014 Clinical Decision Support Track: Exploring Language Models, Query Expansion, and Boosting

Information Retrieval from Electronic Health Records for Patient Cohort Discovery

A step towards real-time analysis of major disaster events based on tweets

Gene-microRNA network module analysis for ovarian cancer

Outline General overview of the project Last 6 months Next 6 months. Tracking trends on the web using novel Machine Learning methods.

The Impact of Relevance Judgments and Data Fusion on Results of Image Retrieval Test Collections

Zainab M. AlQenaei. Dissertation Defense University of Colorado at Boulder Leeds School of Business Operations and Information Management Division

Towards Real Time Epidemic Vigilance through Online Social Networks

EpiDash A GI Case Study

An unsupervised machine learning model for discovering latent infectious diseases using social media data

Automatic coding of death certificates to ICD-10 terminology

NLM at TREC 2012 Medical Records track

USING SOCIAL MEDIA TO STUDY PUBLIC HEALTH MICHAEL PAUL JOHNS HOPKINS UNIVERSITY MAY 29, 2014

A Meme is not a Virus:

On the Combination of Collaborative and Item-based Filtering

Recipe Recommendation Method by Considering the User s Preference and Ingredient Quantity of Target Recipe

User Instruction Guide

Part IV: Interim Assessment Hand Scoring System

Discovering and Understanding Self-harm Images in Social Media. Neil O Hare, MFSec Bucharest, Romania, June 6 th, 2017

Computing Interdisciplinarity of Scholarly Objects using an Author-Citation-Text Model

TWO HANDED SIGN LANGUAGE RECOGNITION SYSTEM USING IMAGE PROCESSING

Improving Foodborne Complaint and Outbreak Detection Using Social Media, New York City

Smoking Cessation Causes: Contrasting Evidence from Social Media and Online Forums

Comparative Study of K-means, Gaussian Mixture Model, Fuzzy C-means algorithms for Brain Tumor Segmentation

The Effect of Inter-Assessor Disagreement on IR System Evaluation: A Case Study with Lancers and Students

Semi-Automatic Construction of Thyroid Cancer Intervention Corpus from Biomedical Abstracts

The NIH Biosketch. February 2016

SUPPLEMENTAL MATERIAL

Brendan O Connor,

Best Practice Title: Reporting Programmatic And Repetitive Noncompliances in NTS and SSIMS

Overview of the FIRE 2018 track: Information Retrieval from Microblogs during Disasters (IRMiDis)

Classification of Local Language Disaster Related Tweets in Micro Blogs

ADVANCED VBA FOR PROJECT FINANCE Near Future Ltd. Registration no

What Happened to Bob? Semantic Data Mining of Context Histories

Lab 8: Multiple Linear Regression

Identification of Emergency Blood Donation Request on Twitter

Query Length Impact on Misuse Detection in Information Retrieval Systems

Convolutional Neural Networks for Text Classification

The Effect of Threshold Priming and Need for Cognition on Relevance Calibration and Assessment

Introduction to Machine Learning. Katherine Heller Deep Learning Summer School 2018

Run Time Tester Requirements Document

Counteracting Novelty Decay in First Story Detection

A MODIFIED FREQUENCY BASED TERM WEIGHTING APPROACH FOR INFORMATION RETRIEVAL

Stacked Gender Prediction from Tweet Texts and Images

Similarity Analysis of Legal Judgments and applying Paragraph-link to Find Similar Legal Judgments.

Mobile Health Surveillance: The Development of Software Tools for. Monitoring the Spread of Disease

Memory-Augmented Active Deep Learning for Identifying Relations Between Distant Medical Concepts in Electroencephalography Reports

Query Modification through External Sources to Support Clinical Decisions

Contents New Features 2 JustGiving Integration 2

Mining first-order frequent patterns in the STULONG database

Sheila Barron Statistics Outreach Center 2/8/2011

IR Meets EHR: Retrieving Patient Cohorts for Clinical Research Studies

Last week: Introduction, types of empirical studies. Also: scribe assignments, project details

Getting the Design Right Daniel Luna, Mackenzie Miller, Saloni Parikh, Ben Tebbs

Searching for flu. Detecting influenza epidemics using search engine query data. Monday, February 18, 13

Access to Excellent Research: Scopus in Lithuania

Transcription:

Track Jaime Arguello jarguell@email.unc.edu February 6, 2017

Goal: developing systems that can monitor a data stream (e.g., tweets) and push content that is relevant, novel (with respect to previous pushes), and timely Input: a query describing the topic or event of interest Evaluation period: 8/2/2016-8/11/2016 (10-day period) Output: Scenario A: up 10 tweets per day (ASAP) Scenario B: up 100 tweets per day (Midnight) 2

Evaluation About 200 interest profiles ( predicting the future ) Pooling: Scenario A and B results were pooled Judging: not relevant, relevant, highly relevant Clustering: relevant and highly relevant tweets were clustered manually per query Only the first tweet per cluster received credit 3

Evaluation Topic/Profile ID: RTS2 Title: Zika in Ecuador Description: Find updates on the current Zika crisis in the country of Ecuador. Narrative: The user has family in Ecuador and wants to see how her family might be affected by the Zika crisis. She s interested in reports of new cases as well as measures being taken to control the outbreak. 4

Runs Automatic: no human in the loop Manual preparation: tailored to 200 interest profiles Manual intervention: anything goes 5

Evaluation as a Service (EAAS) Stream of Tweets Participating Systems TREC RTS evaluation broker Twitter API Assessors 6

Evaluation as a Service (EAAS) 7

Evaluation as a Service (EAAS) Assessor Judgments Profiles Assessor 1 53 4 Assessor 2 3314 10 Assessor 3 136 10 Assessor 4 325 8 Assessor 5 949 10 Assessor 6 28 19 Assessor 7 277 19 Assessor 8 1923 15 Assessor 9 3775 45 Assessor 10 680 16 Assessor 11 107 55 Assessor 12 324 2 Assessor 13 201 12 8

Scenario A: Metrics Expected Gain (EG) G(NR) = 0.0, G(R) = 0.5, G(HR) = 1.0 N = number of tweets pushed on a given day Only first tweet from each cluster got credit EG-1: systems were rewarded to not pushing any tweets on silent days 1 N X G(t) 9

Scenario B: Metrics NDCG@10 NDCG-1@10: systems were rewarded to not pushing any tweets on silent days 10

Challenges Extremely short documents Tweets with little useful content Topic drift, particularly with event queries Deciding when to push (no knowledge of the future) Predicting novelty with respect to previous tweets Tradeoff between timeliness and predicting relevance Indexing and scoring interest profile documents in response to each tweet query Generating IDF values 11

Philips Research Extract terms and phrases from interest profiles (title, description, narrative) Expand phrases using paraphrase database Each interest profile is associated with terms and phrases belonging to different categories Relevance: based on overlap between interest profile representation and tweet Novelty: based on similarity with most similar previously pushed tweet 12

Philips Research relevance(c i )= X c2c i l 2 c n c max n (C i ) profile relevance(t) = CX i=1 w i relevance(c i ) 13

Philips Research Run 1: static relevance threshold Run 2: start with high threshold, and after 12 hours, decrease threshold based on number of pushed tweets Run 3: start with low threshold, and after 12 hours, increase threshold based on number of pushed tweets 14

University of Delaware (Fang) Interest profile expansion using average PMI between query terms and terms in 300 search results once before evaluation period at the start of each day Learn optimal threshold using linear regression clarity score top tweet score difference between top and tenth tweet scores Filter tweets based on Jaccard similarity with most similar previous tweet 15

University of Delaware (Fang) Clarity score: dissimilarity between language model of top results and background language model P(w q Q )= 10 Â i=1 P(w D i ) score(d i, Q) Â 10 i=1 score(d i, Q)! KLD(q Q, q G )=Â w P(w q Q ) log P(w qq ) P(w q G ) 16

University of Delaware (Carterette) Interest profile expansion using top tweets (terms, hashtags, usernames) and top web results Novel tweets identified using dynamic clustering, using a threshold to decide if a tweet belongs to a new cluster Interest profile IDF values generated using external twitter corpus. 17

University of Maryland Learning to rank using a wide range of features scoring functions (e.g., Jaccard, Cosine, BM25) tweet features (e.g., length, hashtags, urls) Novel tweets identified based on similarity with most similar previously pushed tweet Relevance threshold set to MIN and MAX score given to top tweets returned by twitter from past two weeks 18

Qatar University Cosine similarity between TF.IDF term vectors IDF values derived from emerging index Score tweets based on relevance and timeliness: S(t) =S r (t) 100 (CurTime time(t)) 100 Push one tweet after T minutes of silence Interest profile expansion based on emerging corpus and Twitter API (TF-scoring) Novel tweets identified using Jaccard coefficient with previously pushed tweets 19

University of Toulouse Score tweets using number of terms in common with interest profile Novel tweets identified using Jaccard coefficient with previously pushed tweets Push all novel tweets with score greater than a threshold 20