GIANT: Geo-Informative Attributes for Location Recognition and Exploration

Quan Fang, Jitao Sang, Changsheng Xu
Institute of Automation, Chinese Academy of Sciences
October 23, 2013

Where is this? La Sagrada Família, Barcelona

And how about this one? Gothic Quarter, Barcelona? Can a computer localize this street-view photo and tell what it is about?

UGC photos are flourishing: 8 billion images; 250 billion images; 16 billion images; 200 million images a day; 150 billion images; 350 million images a day.

Geo-location is available (8 billion images; 16 billion images; 200 million images a day).
Cities shown: San Francisco, London, Chicago, Paris, Berlin, New York, Barcelona, Istanbul, Beijing, Tokyo, Cairo, Hong Kong, Taipei, Singapore, Sao Paulo, Sydney.

when photos meet geo-location

Barcelona

Discriminative Regions

Geo-informative Attributes
Discriminative region, then semantic interpretation. Example labels: roofs, rooftop, eaves, windows, colorful, balcony.

Geo-informative Attributes
Geographically discriminative: differentiate this location from others.
Representative: occur frequently in this location.
Semantically interpretable.

Our Goal: Given a large geo-tagged image dataset, we automatically discover geo-informative attributes towards location recognition and exploration

Outline
Motivation
Related Work
Our Solution: Geo-informative Attribute Discovery; Geo-informative Attribute Interpretation
Evaluation
Conclusion

Geographical Location Estimation
Landmark recognition: matching and classification models [Li et al., ICCV 2009]; [Kalogerakis et al., ICCV 2009]; [Zheng et al., CVPR 2009]; [Liu et al., MM 2012]. Not applicable to general location recognition.
Location recognition: high visual diversity and intra-class variance. Scene matching and classification models [IM2GPS, CVPR 2008]; [Crandall et al., WWW 2009]; [G. Friedland et al., MM 2011]. Limited scalability; black-box mode.

Visual Attributes
Visual attributes learning: machine detectable; semantically meaningful; mid-level representation. [Singh et al., ECCV 2012]; [Doersch et al., SIGGRAPH 2012]; [Farhadi et al., CVPR 2009]; [Parikh, ICCV 2011]
Geographical visual attribute mining: "What Makes Paris Look Like Paris?" [Doersch et al., SIGGRAPH 2012]. Discovers visual elements that characterize a city via discriminative clustering, but mines the visual patches independently and provides no text interpretation.

Solution
Geo-informative Attribute Discovery: discriminative region detection & clustering (discriminative, representative).
Geo-informative Attribute Interpretation: tag-region learning & interpretation (semantically understandable).
Result: geo-informative attributes.

Geo-informative Attribute Discovery
Input: geo-tagged photos.
Neighbor voting: geo-discriminative photo selection.
Region-based latent SVM: geo-discriminative region detection.
Mean-shift clustering: geo-informative attributes.
Output: geo-informative attributes.

Step 1: Non-discriminative Photo Filtering
Nearest-neighbor voting: the nearest neighbors of each photo vote on whether it belongs to the city (e.g., Barcelona vs. not Barcelona).
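The voting step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `knn_vote_filter`, the Euclidean distance on toy 2-D features, and the parameters `k` and `min_ratio` are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_vote_filter(features, labels, target, k=3, min_ratio=0.5):
    """Return indices of photos labeled `target` whose k nearest neighbors
    (excluding the photo itself) mostly carry the same label.
    The distance, k, and min_ratio are illustrative assumptions."""
    keep = []
    for i, (fi, li) in enumerate(zip(features, labels)):
        if li != target:
            continue
        # Distances from photo i to every other photo, nearest first.
        dists = sorted(
            (math.dist(fi, fj), j) for j, fj in enumerate(features) if j != i
        )
        votes = Counter(labels[j] for _, j in dists[:k])
        if votes[target] / k >= min_ratio:
            keep.append(i)
    return keep

# Toy 2-D "global features": three Barcelona photos cluster together,
# a fourth Barcelona photo sits among photos from other cities.
features = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (5.0, 5.0),
            (5.1, 5.0), (5.0, 5.1), (5.1, 5.1)]
labels = ["Barcelona", "Barcelona", "Barcelona", "Barcelona",
          "Paris", "London", "Paris"]
kept = knn_vote_filter(features, labels, "Barcelona", k=2)
print(kept)  # [0, 1, 2]: photo 3 is filtered out as non-discriminative
```

Photo 3 is dropped because none of its two nearest neighbors is labeled Barcelona, which is the sense in which it looks like "not Barcelona".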

Step 2: Discriminative Region Selection
Region generation: hierarchical grid-based segmentation.
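One way to read "hierarchical grid-based segmentation" is a pyramid of grids at increasing resolution. The sketch below assumes that level k splits the image into 2^k by 2^k cells; the slide does not state the actual grid sizes, so the function name `hierarchical_grid_regions` and the `levels` parameter are hypothetical.

```python
def hierarchical_grid_regions(width, height, levels=3):
    """Return (level, x0, y0, x1, y1) boxes covering the image.
    Assumption: level k uses a 2**k by 2**k grid."""
    regions = []
    for level in range(levels):
        cells = 2 ** level  # cells per side at this level
        for row in range(cells):
            for col in range(cells):
                x0 = col * width // cells
                y0 = row * height // cells
                x1 = (col + 1) * width // cells
                y1 = (row + 1) * height // cells
                regions.append((level, x0, y0, x1, y1))
    return regions

regions = hierarchical_grid_regions(640, 480, levels=3)
print(len(regions))  # 1 + 4 + 16 = 21 candidate regions
print(regions[1])    # (1, 0, 0, 320, 240): top-left cell of the 2x2 grid
```

Each box would then be described by a feature vector and passed to the region-level model as one candidate region.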

Region-based Latent SVM
Combine region feature, region discriminative capability, and location label in a principled model.
Core idea: each region is associated with a binary latent variable z indicating whether it is geo-informative or not.
Learn a function $f_w : \mathcal{X} \times \mathcal{L} \to \mathbb{R}$, where x is the region feature vector, z the latent variable, and l the location label:

$$
w^\top \Phi(x, z, l) = \sum_{i=1}^{N} \alpha^\top \phi(x_i, z_i) + \sum_{i=1}^{N} \beta^\top \varphi(z_i, l) + \sum_{(i,j) \in E} \gamma^\top \psi(z_i, z_j, x_i, x_j)
$$

The three terms are, in order, the feature vs. latent variable potential, the latent variable vs. location label potential, and the latent variable vs. latent variable potential.

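Region-based Latent SVM
Figure: graphical model over the location label l, the latent variables z_k, z_i, z_j, z_t, and the region features x_k, x_i, x_j, x_t.

Feature vs. latent variable potential:

$$
\alpha^\top \phi(x_i, z_i) = \sum_{b \in \mathcal{Z}} \alpha_b^\top \, \mathbb{1}(z_i = b) \, x_i
$$

Latent variable vs. location label potential:

$$
\beta^\top \varphi(z_i, l) = \sum_{b \in \mathcal{K}} \sum_{c \in \mathcal{Z}} \beta_{b,c} \, \mathbb{1}(l = b) \, \mathbb{1}(z_i = c)
$$

Latent variable vs. latent variable potential:

$$
\gamma^\top \psi(z_i, z_j, x_i, x_j) = \sum_{b \in \mathcal{Z}} \sum_{c \in \mathcal{Z}} \gamma_{b,c} \, p(x_i, x_j) \, \mathbb{1}(z_i = b) \, \mathbb{1}(z_j = c), \qquad p(x_i, x_j) = e^{-\lVert x_i - x_j \rVert}
$$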
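To make the three potentials concrete, here is a small sketch that evaluates the model score for a fixed latent configuration. The function name `rlsvm_score`, the list-based parameter layout (`alpha[b]`, `beta[l][c]`, `gamma[b][c]`), and the toy numbers are assumptions for illustration only; the similarity term uses the exponential of the negative Euclidean distance.

```python
import math

def rlsvm_score(x, z, l, alpha, beta, gamma, edges):
    """Sum of the three potentials for region features x, binary latent
    states z, and location index l (hypothetical parameter layout)."""
    # Feature vs. latent variable: alpha_b . x_i for the active state b = z_i.
    unary = sum(
        sum(a * v for a, v in zip(alpha[zi], xi)) for xi, zi in zip(x, z)
    )
    # Latent variable vs. location label: beta[l][z_i].
    label = sum(beta[l][zi] for zi in z)
    # Latent vs. latent: gamma[z_i][z_j] weighted by region similarity.
    pair = sum(
        gamma[z[i]][z[j]] * math.exp(-math.dist(x[i], x[j]))
        for i, j in edges
    )
    return unary + label + pair

x = [(1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]   # three region feature vectors
z = [1, 1, 0]                              # regions 0 and 1 are geo-informative
alpha = [(0.0, 0.0), (1.0, 0.0)]           # one weight vector per latent state
beta = [[0.0, 0.5], [0.0, -0.5]]           # rows: location, cols: latent state
gamma = [[0.0, 0.0], [0.0, 1.0]]           # reward neighbors that are both active
edges = [(0, 1), (1, 2)]
score = rlsvm_score(x, z, 0, alpha, beta, gamma, edges)
print(score)  # 2.0 (unary) + 1.0 (label) + 1.0 (pairwise) = 4.0
```

Only the edge (0, 1) contributes to the pairwise term here, because both of its regions are active and their features coincide, so the similarity is exactly 1.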

Region-based Latent SVM
Model learning: holding the latent variables z fixed, learn the parameters w to minimize the objective (non-convex bundle optimization):

$$
\min_{w,\ \xi \ge 0} \ \frac{1}{2} \lVert w \rVert^2 + C_1 \sum_{i=1}^{M} \xi_i
\quad \text{s.t.} \quad
\max_{z} w^\top \Phi(x_i, z, l_i) - \max_{z} w^\top \Phi(x_i, z, l) \ge \Delta(l_i, l) - \xi_i, \quad \forall l \in \mathcal{L}
$$

Model inference: holding the parameters w fixed, search for the best latent variable configuration such that $w^\top \Phi(x, z, l)$ is maximized (loopy belief propagation):

$$
z^* = \arg\max_{z} w^\top \Phi(x_i, z, l_i)
$$
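The margin constraint and the inference step can be illustrated on a toy problem. This sketch swaps in exhaustive search over binary z for loopy belief propagation, which is only feasible for a handful of regions, and assumes a 0/1 loss for Δ; `best_latent`, `slack`, `toy_score`, and `region_evidence` are hypothetical names and data, not part of the original model.

```python
from itertools import product

def best_latent(score_fn, n_regions, l):
    """Exhaustive argmax over binary z (a stand-in for loopy BP)."""
    return max(product((0, 1), repeat=n_regions),
               key=lambda z: score_fn(z, l))

def slack(score_fn, n_regions, true_l, labels):
    """Smallest xi >= 0 satisfying the margin constraints for one photo,
    assuming a 0/1 loss: delta(l_i, l) = 1 if l != l_i else 0."""
    def f(l):
        return score_fn(best_latent(score_fn, n_regions, l), l)
    worst = max((0.0 if l == true_l else 1.0) + f(l) for l in labels)
    return max(0.0, worst - f(true_l))

# Hypothetical scorer: each active region adds its evidence for location l.
region_evidence = {0: (2.0, -1.0, 0.5), 1: (0.2, 0.3, -0.4)}

def toy_score(z, l):
    return sum(zi * e for zi, e in zip(z, region_evidence[l]))

z_star = best_latent(toy_score, 3, 0)
print(z_star)                            # (1, 0, 1): keep regions with positive evidence
print(slack(toy_score, 3, 0, [0, 1]))    # 0.0: the margin is already satisfied
print(slack(toy_score, 3, 1, [0, 1]))    # 3.0: location 0 outscores the true label
```

A positive slack means the current parameters would have to change during learning for that photo to be ranked correctly with the required margin.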

Step 3: Geo-informative Attribute Generation
Mean-shift clustering groups the discriminative regions into geo-informative attributes.
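A minimal flat-kernel mean shift shows how region descriptors could be grouped into attributes. The bandwidth, iteration count, merge tolerance, and the 2-D toy descriptors are assumptions; the slide does not specify the kernel or the feature space used.

```python
import math

def mean_shift(points, bandwidth, iters=50, merge_tol=1e-3):
    """Flat-kernel mean shift: repeatedly move each mode to the mean of the
    data points within `bandwidth`, then merge modes that coincide."""
    modes = [tuple(p) for p in points]
    for _ in range(iters):
        shifted = []
        for m in modes:
            near = [p for p in points if math.dist(m, p) <= bandwidth]
            if not near:  # guard against an empty window
                shifted.append(m)
                continue
            shifted.append(tuple(sum(c) / len(near) for c in zip(*near)))
        modes = shifted
    centers, assignment = [], []
    for m in modes:
        for idx, c in enumerate(centers):
            if math.dist(m, c) <= merge_tol:
                assignment.append(idx)
                break
        else:
            centers.append(m)
            assignment.append(len(centers) - 1)
    return centers, assignment

# Toy region descriptors forming two visual groups.
points = [(0.0, 0.0), (0.2, 0.0), (0.0, 0.2),
          (5.0, 5.0), (5.2, 5.0), (5.0, 5.2)]
centers, assignment = mean_shift(points, bandwidth=1.0)
print(len(centers))  # 2 clusters, i.e. two candidate attributes
print(assignment)    # [0, 0, 0, 1, 1, 1]
```

Unlike k-means, mean shift does not need the number of clusters in advance, which suits a setting where the number of attributes per city is unknown.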

City-scale Location Recognition
Region-based Latent SVM: directly apply the RLSVM to a new photo $x_t$:

$$
l^* = \arg\max_{l} \Big\{ \max_{z} w^\top \Phi(x_t, z, l) \Big\}
$$

GIANT (geo-informative attribute-based recognition): locality-constrained coding over the attribute vocabulary D, followed by a discriminative classifier:

$$
\min_{S} \sum_{n=1}^{N} \lVert y_n - D_n s_n \rVert^2 \quad \text{s.t.} \quad \mathbf{1}^\top s_n = 1, \ \forall n
$$
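The coding step can be sketched for a single descriptor y. The code below follows the commonly used approximated locality-constrained solution (restrict to the k nearest atoms, solve a small regularized linear system, normalize the weights to sum to one); whether the authors use exactly this variant is an assumption, and `llc_code`, `solve_linear`, `k`, and `reg` are hypothetical names and settings.

```python
import math

def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def llc_code(y, dictionary, k=2, reg=1e-4):
    """Code y over its k nearest atoms with weights that sum to one."""
    nearest = sorted(range(len(dictionary)),
                     key=lambda j: math.dist(y, dictionary[j]))[:k]
    # Shift the local atoms so that y is the origin.
    diffs = [[d - v for d, v in zip(dictionary[j], y)] for j in nearest]
    cov = [[sum(p * q for p, q in zip(di, dj)) for dj in diffs] for di in diffs]
    trace = sum(cov[i][i] for i in range(k))
    for i in range(k):
        cov[i][i] += reg * trace + 1e-12  # keep the system non-singular
    w = solve_linear(cov, [1.0] * k)
    total = sum(w)
    code = [0.0] * len(dictionary)
    for j, wj in zip(nearest, w):
        code[j] = wj / total  # enforce the sum-to-one constraint
    return code

dictionary = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]  # toy attribute vocabulary
code = llc_code((1.0, 0.0), dictionary, k=2)
print(code)  # roughly [0.5, 0.5, 0.0]: y lies midway between the first two atoms
```

The resulting sparse codes, pooled over the regions of a photo, would serve as the input to the discriminative classifier mentioned on the slide.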

Geo-informative Attribute Interpretation
Tag-region relatedness learning: discriminative SVM classifiers.
Discriminative classifiers scoring: semantic tags for attributes.

Tag-Region Relatedness Learning
Measure the relatedness between a tag and regions.
Exploit the co-occurrence relationships between tags and photos.
Learn a bundle of discriminative SVM classifiers for each tag.
Example for the tag 'architecture': clustering, then SVM learning.

Attribute Interpretation
Discriminative classifiers scoring: score each region with all SVMs; compute and sort tag scores; select the top n tags.
Example with the tags 'architecture', 'espanya', and 'jugendstil':

            architecture   espanya   jugendstil
Region 1    0.75           0.45      0.40
Region 2    0.80           0.55      0.36
Region 3    0.70           0.50      0.42
Region 4    0.60           0.50      0.42
Sum         2.85           2.00      1.60
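The scoring step amounts to summing per-region classifier scores for each tag and keeping the highest totals. The sketch below reproduces the example numbers from the slide; the function name `top_tags` and the dictionary layout are assumptions.

```python
from collections import defaultdict

def top_tags(region_scores, n):
    """Sum each tag's SVM scores over the regions of one attribute and
    return the n highest-scoring (tag, total) pairs."""
    totals = defaultdict(float)
    for scores in region_scores:
        for tag, s in scores.items():
            totals[tag] += s
    ranked = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]

# Per-region scores taken from the slide's example.
region_scores = [
    {"architecture": 0.75, "espanya": 0.45, "jugendstil": 0.40},
    {"architecture": 0.80, "espanya": 0.55, "jugendstil": 0.36},
    {"architecture": 0.70, "espanya": 0.50, "jugendstil": 0.42},
    {"architecture": 0.60, "espanya": 0.50, "jugendstil": 0.42},
]
ranked = top_tags(region_scores, n=3)
for tag, total in ranked:
    print(f"{tag}: {total:.2f}")
# architecture: 2.85
# espanya: 2.00
# jugendstil: 1.60
```

Summing rather than averaging keeps the ranking identical here, since every tag is scored on the same four regions.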

GoogleStreetView: San Francisco, Chicago, New York, London, Paris, Barcelona, Hong Kong, Tokyo, Taipei, Singapore, Sao Paulo, Sydney

Flickr City: London, Paris, Barcelona, Berlin, Istanbul, Beijing, Cairo

Dataset

                   GoogleStreetView   Flickr
#City              12                 7
#Photo per city    Nearly 10,000      2,000 ~ 3,000
#Total photo       139,840            13,503

Experiment Setup
Dataset: randomly sample nearly 500 photos for each city (GoogleStreetView: 6,111; Flickr City: 3,501); 100 photos per city for training, the rest for testing.
Compared methods:
kNN: scene-matching method [J. Hays et al., CVPR 2008]
LF+SVM: low-level features combined with SVM [J. Zhu et al., MM 2008]
BOVW+SVM: bag-of-words features combined with SVM [J. Wang et al., CVPR 2010]
DRLR: discriminative region selection at the patch level [C. Doersch et al., SIGGRAPH 2012]

Performance Comparison
mAP results of recognition algorithms on GoogleStreetView

Performance Comparison
mAP results of recognition algorithms on Flickr City
Geo-informative attributes are of great value for location recognition.

Discriminative Region Detection
Examples: Barcelona, Hong Kong, Paris, Chicago, London, Istanbul

Cairo, Berlin, Istanbul, Beijing, Barcelona, London, Paris, Singapore, Chicago, Hong Kong, NYC, San Francisco, Sao Paulo, Sydney, Taipei, Tokyo

Barcelona
Tags: antonigaudi, towers, sebastianniedlich, sagradafamilia, nativity, catalonia, arquitectura, espanya, jugendstil, artnouveau, katalonien, catalonia, catalunya, antoniogaudi, antonigaudi, sagradafamilia, cathedral, catholic

Attribute Interpretation
Examples: London, Paris, Berlin, Beijing, Istanbul, Cairo

Application: City Exploration
Barcelona: sagradafamilia, antonigaudi, sebastianniedlich, catalonia, arquitectura, espanya

Conclusion
Automatically discover geo-informative attributes.
Property: discriminative, representative, and semantically interpretable.
Solution: a region-based latent SVM model for attribute detection; user-contributed tags exploited for attribute interpretation.
Geo-informative attributes are effective in general location recognition and visually characterize a city, enabling practical applications.

Future Work
The world is perceived through visual + semantic information.
Geospatial visual + semantic knowledge mining.
Geo computing for location-based services.
Travel assistant system based on graph search.
Diagram labels: Sensing, Serving, Geo Computing, Mining, Understanding.

Thank you!