NTT DOCOMO Technical Journal

Similar documents
Michigan State University Alumni Communities

Asthma Surveillance Using Social Media Data

Table of Contents. Contour Diabetes App User Guide

Pink Is Not Enough Breast Cancer Awareness Month National Breast Cancer Coalition Advocate Toolkit

CHAPTER 4 THE QUESTIONNAIRE DESIGN /SOLUTION DESIGN. This chapter contains explanations that become a basic knowledge to create a good

USER GUIDE: NEW CIR APP. Technician User Guide

This is a repository copy of Measuring the effect of public health campaigns on Twitter: the case of World Autism Awareness Day.

Agile Product Lifecycle Management for Process

Psychology Perception

mehealth for ADHD Parent Manual

Contour Diabetes app User Guide

Experimental Research in HCI. Alma Leora Culén University of Oslo, Department of Informatics, Design

Registration Instructions Thank you for joining the Million Mile Movement!

Let s get started with the OneTouch Reveal web app

MCC Human Machine lnterface

OneTouch Reveal Web Application. User Manual for Healthcare Professionals Instructions for Use

Annotation and Retrieval System Using Confabulation Model for ImageCLEF2011 Photo Annotation

Sleep Apnea Therapy Software User Manual

Agenda. MyFitnessPal.com Introduction Finding the Site & Signing Up Navigating the Site Tracking Foods & Exercise Printing Information

Diabetes Management App. Instruction Manual

TruLink Hearing Control App User Guide

TruLink for Apple Frequently Asked Questions

Elsevier ClinicalKey TM FAQs

Mood Disorders Websites. Reviewed & Critiqued by. Kyle Sprouse. Spring

11. NATIONAL DAFNE CLINICAL AND RESEARCH DATABASE

The Credibility Process

Hello e-communications Toolkit

Using APR-DRG DataVision Reports for Hospital Ranking and Physician Review

Local Offer Annual Report September Background. 2. From September 2014: The Newcastle Solution

The Hill s Healthy Weight Protocol User Guide. Effective

Quick guide to connectivity and the Interton Sound app

Blood Glucose Monitoring System. Copyright 2016 Ascensia Diabetes Care Holdings AG diabetes.ascensia.com

AMERICAN SIGN LANGUAGE GREEN BOOKS, A TEACHER'S RESOURCE TEXT ON GRAMMAR AND CULTURE (GREEN BOOK SERIES) BY CHARLOTTE BAKER-SHENK

TruLink Hearing Control App User Guide

Scientific Method Stations

THE OECD SOCIAL CAPITAL QUESTION DATABANK USER GUIDE

BIOMECHANICS OF DENTAL IMPLANTS: HANDBOOK FOR RESEARCHERS (DENTAL SCIENCE, MATERIALS AND TECHNOLOGY) FROM BRAND: NOVA SCIENCE PUB INC

Hearing Control App User Guide

Clay Tablet Connector for hybris. User Guide. Version 1.5.0

IMPaLA tutorial.

My Fitness Pal Health & Fitness Tracker A User s Guide

Blood Glucose Monitoring System. Copyright 2016 Ascensia Diabetes Care Holdings AG diabetes.ascensia.com

Mobile App User Guide

RAPID INTERPRETATION OF ECGS IN EMERGENCY MEDICINE: A VISUAL GUIDE BY JENNIFER L MARTINDALE MD, DAVID F.M. BROWN MD

THE GALLAUDET DICTIONARY OF AMERICAN SIGN LANGUAGE FROM HARRIS COMMUNICATIONS

LibreHealth EHR Student Exercises

Chapter IR:VIII. VIII. Evaluation. Laboratory Experiments Logging Effectiveness Measures Efficiency Measures Training and Testing

Diabetes Management Software V1.3 USER S MANUAL

Dementia Direct Enhanced Service

SleepImage Website Instructions for Use

easy read Your rights under THE accessible InformatioN STandard

Houghton Mifflin Harcourt Avancemos Spanish correlated to the. NCSSFL ACTFL Can-Do Statements (2015), Novice Low, Novice Mid, and Novice High

The Study of Life. Before You Read. Science Journal

Facilitator/Educator Guide: What are the Odds? Modeling the Chances of Getting an Autoimmune Disease

UWA ERA Publications Collection 2011

Bowel Cancer Screening for Scotland

A Suggestive Recommendation Method to Make Tourists Feel like going

Houghton Mifflin Harcourt Avancemos Spanish 1b correlated to the. NCSSFL ACTFL Can-Do Statements (2015), Novice Low, Novice Mid and Novice High

OneTouch Reveal Web Application. User Manual for Patients Instructions for Use

New User FAQs. Myzone, Heart Rate Zones & The MZ-3

Houghton Mifflin Harcourt Avancemos Spanish 1a correlated to the. NCSSFL ACTFL Can-Do Statements (2015), Novice Low, Novice Mid and Novice High

Sleep Apnea Therapy Software Clinician Manual

ORMOM 2015 PATIENT FLOW-2015

professor makes the individual believe it to be true.

Like Me! Angela Bonsu and Selene Wartell Fall 2017 INFO 360D group As Is

CASAAMedia Samantha Henning, Missy Harris & Caroline McCarthy. Social media Engagement & Evaluation

Running Head: PERSONALITY AND SOCIAL MEDIA 1

MODULE 4: SPLICING. Removal of introns from messenger RNA by splicing

Avaya one-x Communicator for Mac OS X R2.0 Voluntary Product Accessibility Template (VPAT)

Improving drug safety decisions with more comprehensive searches

Multi-Therapy Smart Infusion for Home Care

AVELEY MEDICAL CENTRE & THE BLUEBELL SURGERY

The Impact of Relevance Judgments and Data Fusion on Results of Image Retrieval Test Collections

INFORMATION SYSTEMS: A MANAGER'S GUIDE TO HARNESSING TECHNOLOGY, V. 4.0 BY JOHN GALLAUGHER

Concepts Weight Market Size Profit Potential Technology Readiness Market Entry Difficulty

ProScript User Guide. Pharmacy Access Medicines Manager

PRENTICE HALL HEALTH: SKILLS FOR WELLNESS DOWNLOAD EBOOK : PRENTICE HALL HEALTH: SKILLS FOR WELLNESS PDF

Chapter 2 Mapping Geotagged Tweets to Tourist Spots Considering Activity Region of Spot

Quick guide to connectivity and the ReSound Smart 3D app

ios Accessibility Towards Universal Design

ICD-10 Readiness (*9/14/15) By PracticeHwy.com, Inc.

SAMPLE. Certificate in Understanding Autism. Workbook 1 DIAGNOSIS PERSON-CENTRED. CACHE Level 2 ASPERGER SYNDROME SOCIAL INTERACTION UNDERSTANDING

Identifying Signs of Depression on Twitter Eugene Tang, Class of 2016 Dobin Prize Submission

HARRISON'S GASTROENTEROLOGY AND HEPATOLOGY, 3 E (HARRISON'S SPECIALTY) BY DAN LONGO, ANTHONY S. FAUCI

Answers to end of chapter questions

Speak Out! Sam Trychin, Ph.D. Copyright 1990, Revised Edition, Another Book in the Living With Hearing Loss series

These Terms Synonym Term Manual Used Relationship

Tweet Location Detection

A step towards real-time analysis of major disaster events based on tweets

Action Planning. Training Guidelines and Procedures. Training Etiquette and Philosophy

Scientific Investigation

Blue Distinction Centers for Fertility Care 2018 Provider Survey

INFORMATION SOURCES REVIEW! 1. The Consumer Health Information Community. Information Sources Review. Nathan Kavanaugh. San Jose State University

App user guide. resound.com

User Interface Design Analysis precor efx 425 elliptical i300: HCI/D - Tom Mitchell Fall 2015

genes and addiction. They hope to learn how to better prevent and treat this illness.

#GETLOUD 66TH ANNUAL CMHA MENTAL HEALTH WEEK PUBLIC TOOLKIT

Introduction. Click here to access the following documents: 1. Application Supplement 2. Application Preview 3. Experiential Component

easy read Your rights under THE accessible InformatioN STandard

Transcription:

Tweet Analysis Place-name Identification Technology Location Information Twitter* 1 is an SNS for sharing real-time information, and this information can be analyzed to reveal trends around the world; detecting real-world events as they occur and identifying locations that are attracting attention. This article describes technology that analyzes tweets and associates a location with them. It also describes applications for such location-associated tweets, including visualizing real-time tweets on a map, detecting areas of activity for specific keywords such as autumn colors, and analyzing the quality of communications coverage by area. 1. Introduction Twitter is widely used as a social media for sharing user posts in real time, and by analyzing this content, it is possible to discover trends currently occurring around the world, including events and locations that are attracting attention and the latest news. For example, NTT DOCOMO s dmenu has real-time search [1] entries called Location-based Tweet Search and the Trending Spot Ranking in the area guide [2], which gather real-time tweets* 2 related to sight-seeing spots and provide users with instant information on popular near-by locations [3]. The ability to gather information related to placenames and other locations accurately and in real-time is important for providing such localized information services. To fulfill this requirement, NTT DOCOMO has developed technology to analyze tweets and associate locations with them. This article describes the technology for associating locations with tweets. It also describes examples of applications for location-associated tweets, including visualizing location-related tweets in real time on a map, detecting burst areas for specific location-related keywords such as autumn colors, and analyzing mobile phone connectivity and other aspects of communication quality within coverage areas. Service & Solution Development Department Research Laboratories Keiichi Ochiai Daisuke Torii Haruka Kikuchi Wataru Yamada 2. Linking Tweets with Location Information There are three methods for gathering location-associated tweets. (1) Geo-tagged Tweets Using tweets that have been posted with geo-tags, which hold latitude and longitude information, attached. (2) Place-name Identification Through Textual Analysis Extracting place-names from tweets using textual analysis and using a place-name dictionary to associate a location with them. (3) Use of Accounts Having an Associated Location 2014 NTT DOCOMO, INC. Copies of articles may be reproduced only for personal, noncommercial use, provided that the name, the name(s) of the author(s), the title and date of the article appear in the copies. Currently Service Innovation Department *1 Twitter: The name Twitter, the logo and the Twitter bird are registered trademarks of Twitter, Inc. in the U.S.A. and other countries. *2 Tweet: A term used to refer to entries on the micro-blogging service provided by Twitter, Inc. Vol. 16 No. 2 29

Analysis of Location-related Tweets and its Applications Locations can also be associated When associating locations with and this can indicate whether the name with tweets based on the profile of tweets based on textual analysis, names is being used as the place name or for the user posting the tweet. In par- within the tweet text could refer to some other purpose. ticular, with sight-seeing spots and multiple different locations, or to some- For example, Maruyama Park can facilities that have their own public thing other than a place such as a refer to a park in Kyoto or a different accounts, a location can be associated person, and it is necessary to eliminate park in Hokkaido, but the name Kyoto with the account. the ambiguity in such cases [4]. We use is more likely to co-occur when We discuss details of these methods below. Note that geo-tagged tweets already have a location associated with them, so we discuss only place-name identification using textual analysis and the user-profile based method in this article. 2.1 Place-name Identification through Textual Analysis Of the three techniques mentioned above, textual analysis to associate locations has the potential to extract more tweets with associated locations than either the geo-tagged tweet or the location-associated account methods. As such, method (2) is most effective for extracting information. Co-occurrence Place-name Tweets a place-name identification technology based on co-occurrences* 3 of words to eliminate ambiguity regarding different places with the same name, and a place-name identification technology based on Conditional Random Fields (CRF)* 4, which identifies place names by examining words before and after the place-name candidate, to eliminate ambiguity due to names that may refer to something (or someone) other than a place. 1) Place-name Identification Using Cooccurrences Identifying place names using cooccurrences works on the assumption that a place name is more likely to cooccur with names of near-by places or words related to local characteristics, containing place names Is there ambiguity? No Yes referring to the one in Kyoto. Similarly, Matsushima can be the name of a person, or of a place related to the poet Matsuo Bashou, so it is more likely to co-occur with Matsuo Bashou when referring to the place. Such words that are likely to co-occur are called cooccurrences in this article. A flowchart of place-name identification using co-occurrences is shown in Figure 1. Associations between ambiguous place names and co-occurrences are stored in a before-hand, tweets are selected by checking if they include names in a place-name, the names are checked for ambiguity, and if they are ambiguous, they are correlated with the co-occurrence to extract only the tweets containing co- containing cooccurrences Place-nameassociated tweet Figure 1 Co-occurrence place-name identification flowchart *3 Co-occurrence: When two particular words appear in the same sentence. *4 CRF: Conditional Random Field. A type of method for assigning pre-defined labels to a sequence of input entities based on feature values of the entities. In this document, labels for place-names and non-place-names are assigned to eliminate ambiguity regarding place names within the text, based the sentence syntax and the surrounding words. 30 Vol. 16 No. 2

occurrence for that place name. This enables ambiguous place names to be associated correctly with tweets. 2) Place-name Identification Using CRF The co-occurrence place-name identification technology described above only selects tweets that contain both a place-name and corresponding co-occurrences. This is able to associate tweets with place-names very accurately, but it cannot find associations for tweets that do not contain co-occurrences. However, even tweets that do not contain cooccurrences can be identified by looking at how words other than the place names are used within the text. For example, neither of the following contains co-occurrences: Place-name Tweets Place-name Tweets Figure 2 I had dinner at Matsushima, and Mr. Matsushima had dinner. However, seeing that they feature at or Mr. near Matsushima can be used to determine that they refer to a place and a person, respectively. These types of sentence characteristics are called features in the field of natural language processing. Place names in the text can be discriminated automatically from nonplace names by training a classifier* 5 with features for place names and nonplace names. For our method, we used a classifier called CRF. A flowchart for feature extraction and CRF training is shown in Figure 2. For classification using CRF, tweets containing place names containing place names Manual application of truth value Is there ambiguity? No Feature extraction Feature extraction and CRF training flowchart Yes that include place names are selected from the set of tweets, and truth values indicating whether they are place names or person names are applied manually. Then, features are extracted from the tweets marked with truth value, and the CRF is trained to increase its classification performance. A flowchart for place-name identification using CRF is shown in Figure 3. Tweets containing place names extracted from the set of tweets are first judged whether there is ambiguity as a placename or some other word. If there is ambiguity, the CRF trained using the steps in Fig. 2 is used to discriminate between place names and other words, and then place names are associated CRF training Identify using CRF Place-nameassociated tweet Figure 3 CRF place-name identification flowchart *5 Classifier: A device that sorts inputs into predetermined groupings based on their feature values. Vol. 16 No. 2 31

Analysis of Location-related Tweets and its Applications with the tweets containing them. This enables ambiguity between place names and non-place names to be eliminated, even if there are no co-occurrences, so accurate associations between place names and tweets can be implemented. 2.2 Use of Accounts with Associated Locations Sightseeing spots and facilities sometimes have their own public accounts, and these public accounts often post tweets with helpful information regarding the location or facility. For example, the public account for a shopping mall might post sale or fair information, or a town hall account might post event information. As such, we investigated public accounts of locations and facilities, and used them to create associations. This enables us to extract more useful tweets using tweets from these public accounts. 3. Displaying Tweets on a Map and Example Applications To display tweets on a map, latitude and longitude values indicating the display position must be assigned to each tweet. Display positions are determined either from the tweets themselves, or from the account that posted the tweet (Table 1). There are two issues when using tweets with associated locations, as follows: Whether understanding of the realtime situation is needed Whether tweets related to a particular topic need to be selected Table 1 Tweet type Geo-tagged tweet Tweets located by textual analysis Tweets from public accounts (e.g.: accounts of cities or regions) Figure 4 3.1 Real-time Display of Tweets on a Map Displaying tweets in real time on a map enables viewing of what is actually being said at the time regarding a particular location on the map. When not limiting to a particular topic, one can check the sorts of topics that are being talked about in each location. Tweets are displayed with a background color according to how the location information was determined, with blue indicating textual analysis, orange indicating a public account, and green indicating a geo-tagged tweet (Figure 4). The number of tweets that can be associated with a location varies greatly with the region, and depending on the scale of the map and the screen size, it may not be possible to display all of the tweets, or there may not be a single tweet to display. For this reason, tweets that are more likely to be useful for the user are displayed with higher priority when there are many location-associated tweets in the display area. Specifically, tweets Tweet display position on a map Display position on the map Coordinates attached to the tweet itself Coordinates of place name extracted from text Location of the public account (store, city hall, etc.) Tweet located using textual analysis Tweet posted by a public account with an associated location Geo-tagged tweet Tweet display position on a map 32 Vol. 16 No. 2

with a high number of retweets* 6 are displayed first, and high-priority tweets are eventually replaced with lower priority tweets after being displayed continuously for a period of time. On the other hand, if there are no location-associated tweets within the display area, tweets outside of the screen area are displayed with arrows indicating the direction in which they are located (shown with red frames in Figure 5). The arrows also work as navigation, allowing users to move to the location of the off-screen tweet by clicking them. This reduces the amount of map scrolling and zooming required to find tweets, and enables users to check tweets efficiently. 3.2 Detection of Areas with Increased Activity Area analysis is a special case of location-related analysis. Characteristics of an entire area can be understood by analyzing with respect to multiple locations within the area instead of only one particular location. For example, it is possible to estimate the best time to see autumn colors in a given area by indexing the number of tweets related to the topic at spots in that area. Or as shown in Figure 6, the advancement of changing colors can be reproduced by computing indices from past tweets, and arranging the best times in sequence. When determining characteristics of an overall area, we index according to multiple locations within the area, Figure 5 Navigation to tweets off the map Figure 6 Reproducing the advance of autumn colors by detecting areas of increased activity and avoid indexing using only a single regarding autumn colors). If the index location within the area. of activity is taken as simply the total To illustrate, consider the pattern of number of tweets in the area, the two tweets produced by three spots within a areas would appear the same, with given area. Figure 7(a) shows three three tweets. However, since the former tweets from a single spot, and Fig. 7(b) could be due to some event or other shows one tweet from each of three phenomenon characteristic of the single spots (assuming the example of tweets spot, the latter, with tweets occurring at *6 Retweet: A tweet from another user that is reposted, without modifying the content. The number of times an individual tweet is retweeted is called the number of retweets. Vol. 16 No. 2 33

Analysis of Location-related Tweets and its Applications all spots, may be a better indication of a phenomenon occurring over the whole area (such as the best time to see autumn colors). There are other possible indices that incorporate a perspective on this sort of diversity, such as the number of unique spots producing tweets (counting each spot as one, even if it produced multiple tweets). The level of activity of an area can be defined by considering a perspective on diversity as well as quantity, such as the total number of tweets in the whole area. In Fig. 7(b), for the example, if five tweets at each of the spots were produced instead of one each, this could be considered as a higher level of activity. By computing an index that combines both the quantity and the diversity of tweets in an area for successive periods of time, the time periods with the highest levels of activity in an area (e.g., the best time to view autumn colors) can be estimated. This sort of analysis holds promise for applications beyond autumn colors, such as cherry blossom viewing, or more broadly, for disaster situations such as typhoons, guerrilla rainstorms, tornados and earthquakes. 3.3 Analysis of the Communications Quality In this section, we describe an application of location-associated tweets for analyzing the quality of communication in an area. A data processing flowchart is shown in Figure 8. Tweets referring to communication quality in various locations can be extracted by selecting for words related to communication service, such as connect or disconnect, and the names of communication providers. A screen-shot of the system that displays the selected results on a map is shown in Figure 9. Fig. 9(a) and (b) show a list of place names extracted from the tweets, and the numbers of tweets selected by geo-tagging and textual analysis respectively. Figure 8 Figure 7 Location-related tweets Tweet patterns in a given area Filtering for provider name and dictionary terms Location tweets related to communications quality Fig. 9(c) shows a tweet regarding communication quality displayed on the map at the corresponding location. Fig. 9(d) shows a time-line of tweets related to communication quality for a selected place name. Tweets with positive content, such as the word connected, are displayed with a pink background, while negative content, such as disconnected, are shown with a blue background. This allows the user to get a visual impression of the user comments Dictionary of communication-servicerelated terms Communication area quality tweet processing flowchart 34 Vol. 16 No. 2

(b) Location-associated tweets : 32 Geo-tagged tweets : one Number of tweets related to communication quality (a) List of place-names with tweets about communication quality Figure 9 for a selected location. Work to improve the quality of communication in a given area can be supported by providing information in the form of tweets related to communication quality displayed on a map in this way. 4. Conclusion In this article we describe a method for associating locations with tweets and examples of applying such locationassociated tweets. Our examples included mapping location-related tweets onto a (c) @username [Provider] s LTE connects between Yokohama and Higashihakuraku! 2013-08-20 06:48:21 @username Can t connect to [Provider] near Higashihakuraku 2013-08-27 12:53:10 @username Earlier, I couldn t connect in the tunnel from Higashihakuraku to Yokohama, but now I can! [Provider] This is great! 2013-08-31 20:51:03 Screen-shot of checking communication area quality map in real-time, searching maps for burst areas with respect to specific location-related keywords such as autumn colors, and analyzing communication quality, such as ease of mobile phone connectivity, in a given area. In the future, we intend to expand this into a commercial service, and to perform analysis that combines this with other location-related data. REFERENCES [1] NTT DOCOMO: dmenu Real-time Search. http://realtime.search.smt.docomo.ne.jp/ (d) [2] NTT DOCOMO: Trending Spot Ranking docomo Map Navi. http://s.dmapnavi.jp/kanko/ranking/ra nking_top.php?geo=&val=&linkfrom= T009 [3] D. Torii et al.: Development of Realtime Search Services Offering Daily-life Information, NTT DOCOMO Technical Journal, Vol. 14, No. 4, pp. 10 16, Apr. 2013. [4] E. Amitay, N. Har El, R. Sivan, A. Soffer: Web-a-Where: Geotagging Web Content, SIGIR 04 Proc. of the 27th annual international ACM SIGIR conference on Research and development in information retrieval, pp. 273 280, 2004. Vol. 16 No. 2 35