GIANT: Geo-Informative Attributes for Location Recognition and Exploration Quan Fang, Jitao Sang, Changsheng Xu Institute of Automation, Chinese Academy of Sciences October 23, 2013
Where is this? La Sagrada Família, Barcelona
And how about this one? Gothic Quarter, Barcelona? Can a computer localize this street-view photo and tell what it is about?
UGC photos are flourishing: 8 billion images; 250 billion images; 16 billion images; 200 million images a day; 150 billion images; 350 million images a day
Geo-location is available: San Francisco, London, Chicago, Paris, Berlin, New York, Barcelona, Istanbul, Beijing, Tokyo, Cairo, Hong Kong, Taipei, Singapore, Sao Paulo, Sydney (8 billion images; 16 billion images; 200 million images a day)
When photos meet geo-location
Barcelona
Discriminative Regions
Geo-informative Attributes Discriminative Region Semantic Interpretation Roofs rooftop Roofs Eaves Windows colorful Balcony
Geo-informative Attributes: geographically discriminative (differentiate this location from others); representative (occur frequently in this location); semantically interpretable
Our Goal: Given a large geo-tagged image dataset, we automatically discover geo-informative attributes towards location recognition and exploration
Outline: Motivation; Related Work; Our Solution (Geo-informative Attribute Discovery; Geo-informative Attribute Interpretation); Evaluation; Conclusion
Geographical Location Estimation
Landmark recognition: matching; classification models [Li et al., ICCV 2009]; [Kalogerakis et al., ICCV 2009]; [Zheng et al., CVPR 2009]; [Liu et al., MM 2012]. Not applicable to general location recognition.
Location recognition: high visual diversity; intra-class variance. Scene matching; classification models [IM2GPS, CVPR 2008]; [Crandall et al., WWW 2009]; [G. Friedland et al., MM 2011]. Limited scalability; black-box model.
Visual Attributes
Visual attributes learning: machine detectable; semantically meaningful; mid-level representation [Singh et al., ECCV 2012]; [Doersch et al., SIGGRAPH 2012]; [Farhadi et al., CVPR 2009]; [Parikh, ICCV 2011].
Geographical visual attribute mining: "What Makes Paris Look Like Paris?" [Doersch et al., SIGGRAPH 2012]. Discover visual elements that characterize a city; discriminative clustering; independently mine the visual patches; no text interpretation.
Outline Motivation Problem Related Work Our Solution Geo-informative Attribute Discovery Geo-informative Attribute Interpretation Evaluation Conclusion
Solution Geo-informative Attribute Discovery Discriminative Region Detection & Clustering Tag-Region Learning & Interpretation Discriminative Representative Semantically Understandable Geo-informative Attributes Geo-informative Attribute Interpretation
Geo-informative Attribute Discovery. Input: geo-tagged photos -> Neighbor Voting (geo-discriminative photo selection) -> Region-based Latent SVM (geo-discriminative region detection) -> Meanshift Clustering -> Output: geo-informative attributes
Step 1: Non-discriminative Photo Filtering Nearest neighbor voting Photo Nearest neighbors Barcelona Not Barcelona
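A minimal sketch of nearest-neighbor voting for filtering, assuming each photo is already summarized by a feature vector and a city label. The function name `knn_vote_filter`, the parameters `k` and `min_ratio`, and the toy data are illustrative assumptions, not values from the talk.

```python
import math
from collections import Counter

def knn_vote_filter(features, labels, target, k=3, min_ratio=0.5):
    """Keep photos of `target` whose k nearest neighbors mostly share that label."""
    keep = []
    for i, (fi, li) in enumerate(zip(features, labels)):
        if li != target:
            continue
        # Euclidean distance from photo i to every other photo.
        dists = sorted((math.dist(fi, fj), j)
                       for j, fj in enumerate(features) if j != i)
        neighbors = dists[:k]
        if not neighbors:
            continue
        votes = Counter(labels[j] for _, j in neighbors)
        # A photo is geo-discriminative if enough neighbors vote for its city.
        if votes[target] / len(neighbors) >= min_ratio:
            keep.append(i)
    return keep

# Toy data (assumed): three Barcelona photos cluster together; a fourth
# Barcelona photo sits among photos from other cities and is filtered out.
features = [(0, 0), (0.1, 0), (0, 0.1), (5, 5), (5.1, 5), (5, 5.1), (5.05, 5.05)]
labels = ["Barcelona", "Barcelona", "Barcelona",
          "Other", "Other", "Other", "Barcelona"]
kept = knn_vote_filter(features, labels, "Barcelona", k=3)
```

Photos whose neighbors come mostly from other cities are treated as non-discriminative and dropped before region detection.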
Step 2: Discriminative Region Selection. Region generation: hierarchical grid-based segmentation
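One way to picture hierarchical grid-based region generation is to cut the image into n x n cells at several levels. The sketch below assumes levels of 1, 2 and 4 cells per side; the slide does not state the actual grid sizes, and `grid_regions` is a hypothetical helper.

```python
def grid_regions(width, height, levels=(1, 2, 4)):
    """Return (x0, y0, x1, y1) boxes for an n x n grid at each level."""
    boxes = []
    for n in levels:
        for row in range(n):
            for col in range(n):
                # Integer cell boundaries; the last cell ends at the image edge.
                x0 = col * width // n
                y0 = row * height // n
                x1 = (col + 1) * width // n
                y1 = (row + 1) * height // n
                boxes.append((x0, y0, x1, y1))
    return boxes

regions = grid_regions(640, 480)
```

Each box is a candidate region; the latent SVM then decides which candidates are geo-informative.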
Region-based Latent SVM
Combine region feature, region discriminative capability, and location label in a principled model.
Core idea: each region is associated with a binary latent variable z indicating whether it is geo-informative or not.
Learn a function $f_w : \mathcal{X} \times \mathcal{L} \to \mathbb{R}$ (x: region feature vector; z: latent variable; l: location label):
$$w^T \Phi(x, z, l) = \sum_{i=1}^{N} \alpha^T \phi(x_i, z_i) + \sum_{i=1}^{N} \beta^T \phi(z_i, l) + \sum_{(i,j) \in E} \gamma^T \psi(z_i, z_j, x_i, x_j)$$
The three terms are, in order, the feature vs. latent variable potential, the latent variable vs. location label potential, and the latent variable vs. latent variable potential.
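Region-based Latent SVM
Feature vs. latent variable potential:
$$\alpha^T \phi(x_i, z_i) = \sum_{b \in Z} \alpha_b^T \, \mathbb{1}(z_i = b) \, x_i$$
Latent variable vs. location label potential:
$$\beta^T \phi(z_i, l) = \sum_{b \in K} \sum_{c \in Z} \beta_{b,c} \, \mathbb{1}(l = b) \, \mathbb{1}(z_i = c)$$
Latent variable vs. latent variable potential:
$$\gamma^T \psi(z_i, z_j, x_i, x_j) = \sum_{b \in Z} \sum_{c \in Z} \gamma_{b,c} \, p(x_i, x_j) \, \mathbb{1}(z_i = b) \, \mathbb{1}(z_j = c), \qquad p(x_i, x_j) = e^{-\|x_i - x_j\|}$$
[Figure: graphical model with location label l, latent variables z_k, z_i, z_j, z_t, and region features x_k, x_i, x_j, x_t]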
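The three potentials can be evaluated directly once the indicator functions are read as table lookups. The sketch below is an illustration only: the dictionary layout of `alpha`, `beta`, `gamma`, the edge list, and the toy numbers are assumptions, and the similarity is taken to be the exponential of the negative Euclidean distance between region features.

```python
import math

def pairwise_similarity(xi, xj):
    # Assumed form: p(x_i, x_j) = exp(-||x_i - x_j||).
    return math.exp(-math.dist(xi, xj))

def score(x, z, l, edges, alpha, beta, gamma):
    """Evaluate w^T Phi(x, z, l) as the sum of the three potentials."""
    # Feature vs. latent variable: alpha_{z_i}^T x_i for each region.
    unary = sum(sum(a * v for a, v in zip(alpha[zi], xi))
                for xi, zi in zip(x, z))
    # Latent variable vs. location label: beta_{l, z_i} for each region.
    label = sum(beta[l][zi] for zi in z)
    # Latent vs. latent: gamma_{z_i, z_j} * p(x_i, x_j) over neighboring regions.
    pair = sum(gamma[z[i]][z[j]] * pairwise_similarity(x[i], x[j])
               for i, j in edges)
    return unary + label + pair

# Toy setup (assumed): two identical regions joined by one edge.
x = [(1.0, 0.0), (1.0, 0.0)]
edges = [(0, 1)]
alpha = {0: (0.0, 0.0), 1: (2.0, 0.0)}
beta = {"Barcelona": {0: 0.0, 1: 0.5}}
gamma = {0: {0: 0.0, 1: 0.0}, 1: {0: 0.0, 1: 1.0}}

s_one = score(x, (1, 0), "Barcelona", edges, alpha, beta, gamma)
s_both = score(x, (1, 1), "Barcelona", edges, alpha, beta, gamma)
```

With these toy parameters, marking both regions as geo-informative scores higher than marking only one, because the pairwise term rewards similar neighboring regions that share the same latent state.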
Region-based Latent SVM
Model learning: holding the latent variables z fixed, learn the parameters w to minimize the objective (non-convex bundle optimization):
$$\min_{w,\, \xi \ge 0} \; \frac{1}{2} \|w\|^2 + C \sum_{i=1}^{M} \xi_i \quad \text{s.t.} \quad \max_z w^T \Phi(x_i, z, l_i) - \max_z w^T \Phi(x_i, z, l) \ge \Delta(l_i, l) - \xi_i, \;\; \forall l \in \mathcal{L}$$
Model inference: holding the parameters w fixed, search for the best latent variable configuration such that $w^T \Phi(x, z, l)$ is maximized (loopy belief propagation):
$$z^* = \arg\max_z w^T \Phi(x_i, z, l_i)$$
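To make the inference step and the margin constraint concrete, the sketch below replaces loopy belief propagation with exhaustive search over all binary configurations, which is only feasible for a handful of regions. The scoring function is passed in as `score_fn(z, label)`; the toy scorer, the 0/1 loss for the label loss term, and all names are assumptions for illustration.

```python
from itertools import product

def best_latent(score_fn, n_regions, label):
    """Exhaustive argmax over z in {0,1}^n (a stand-in for loopy BP)."""
    return max(product((0, 1), repeat=n_regions),
               key=lambda z: score_fn(z, label))

def predict(score_fn, n_regions, labels):
    """Pick the label whose best latent configuration scores highest."""
    return max(labels,
               key=lambda l: score_fn(best_latent(score_fn, n_regions, l), l))

def slack(score_fn, n_regions, labels, true_label):
    """Smallest xi_i satisfying the margin constraint for every label."""
    def best(l):
        return score_fn(best_latent(score_fn, n_regions, l), l)
    # Assumed 0/1 loss: 0 for the true label, 1 otherwise.
    violation = max((0.0 if l == true_label else 1.0) + best(l) for l in labels)
    return max(0.0, violation - best(true_label))

# Toy scorer (assumed): each region contributes fixed evidence when selected.
evidence = {"Barcelona": [2.0, -1.0, 0.5], "Paris": [0.3, 0.4, -2.0]}

def toy_score(z, label):
    return sum(e * zi for e, zi in zip(evidence[label], z))

cities = ["Barcelona", "Paris"]
z_star = best_latent(toy_score, 3, "Barcelona")
city = predict(toy_score, 3, cities)
xi_correct = slack(toy_score, 3, cities, "Barcelona")
xi_wrong = slack(toy_score, 3, cities, "Paris")
```

A training photo that is already classified with enough margin needs no slack, while a photo whose true label loses to another city incurs a positive slack that the objective penalizes through C.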
Step 3: Geo-informative Attribute Generation Meanshift clustering Geo-informative Attributes Discriminative Region
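A compact flat-kernel mean-shift sketch shows how detected regions could be grouped into attributes without fixing the number of clusters in advance. It assumes each region is a low-dimensional feature vector; the bandwidth, the merge threshold of half the bandwidth, and the toy points are illustrative choices, not settings from the talk.

```python
import math

def mean_shift(points, bandwidth, n_iter=50, tol=1e-6):
    """Flat-kernel mean shift; returns (cluster centers, label per point)."""
    modes = [tuple(p) for p in points]
    for _ in range(n_iter):
        moved = 0.0
        new_modes = []
        for m in modes:
            # Average all points inside the window around the current mode.
            near = [p for p in points if math.dist(m, p) <= bandwidth] or [m]
            centroid = tuple(sum(c) / len(near) for c in zip(*near))
            moved = max(moved, math.dist(m, centroid))
            new_modes.append(centroid)
        modes = new_modes
        if moved < tol:
            break
    # Merge modes that converged to (almost) the same place.
    centers, labels = [], []
    for m in modes:
        for k, c in enumerate(centers):
            if math.dist(m, c) < bandwidth / 2:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels

# Toy region features (assumed): two well-separated groups.
points = [(0, 0), (0.2, 0), (0, 0.2), (5, 5), (5.2, 5), (5, 5.2)]
centers, labels = mean_shift(points, bandwidth=1.0)
```

Each resulting cluster of discriminative regions plays the role of one geo-informative attribute.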
City-scale Location Recognition
Region-based Latent SVM: directly apply the RLSVM to a new given photo $x_t$:
$$l^* = \arg\max_{l} \Big\{ \max_z w^T \Phi(x_t, z, l) \Big\}$$
GIANT: geo-informative attribute-based recognition. Locality-constrained coding over the attribute vocabulary D, followed by a discriminative classifier:
$$\min_{S} \sum_{n=1}^{N} \| y_n - D_n s_n \|^2 \quad \text{s.t.} \quad \mathbf{1}^T s_n = 1$$
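The coding step can be sketched as follows, assuming the common approximation in which each descriptor is coded over only its k nearest vocabulary entries with coefficients constrained to sum to one. The helper names, the small diagonal regularizer `reg`, and the toy vocabulary are assumptions; the linear system is solved with a plain Gaussian elimination so the example needs no external library.

```python
import math

def solve(A, b):
    """Solve A s = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    s = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][c] * s[c] for c in range(r + 1, n))
        s[r] = (M[r][n] - tail) / M[r][r]
    return s

def llc_code(y, D, k=2, reg=1e-6):
    """Code y over its k nearest atoms of D; coefficients sum to 1."""
    nearest = sorted(range(len(D)), key=lambda j: math.dist(y, D[j]))[:k]
    # Shift the local atoms so that y is the origin.
    shifted = [[d - v for d, v in zip(D[j], y)] for j in nearest]
    # Local covariance, regularized on the diagonal for stability.
    C = [[sum(a * b for a, b in zip(u, v)) for v in shifted] for u in shifted]
    for i in range(len(nearest)):
        C[i][i] += reg
    s = solve(C, [1.0] * len(nearest))
    total = sum(s)
    code = [0.0] * len(D)
    for j, v in zip(nearest, s):
        code[j] = v / total  # enforce the sum-to-one constraint
    return code

# Toy vocabulary (assumed): y lies between the first two atoms.
D = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0)]
code = llc_code((0.5, 0.0), D, k=2)
```

The resulting sparse code, pooled over a photo's regions, is the kind of attribute-based representation that a discriminative classifier can then consume.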
Geo-informative Attribute Interpretation: Tag-region Relatedness Learning -> Discriminative SVM Classifiers -> Discriminative Classifiers Scoring -> Semantic Tags for Attributes
Tag-Region Relatedness Learning: measure the relatedness between a tag and regions; exploit the co-occurrence relationships between tags and photos; learn a bundle of discriminative SVM classifiers for each tag (example tag 'architecture': Clustering -> SVM Learning)
Attribute Interpretation: Discriminative Classifiers Scoring
Score each region with all SVMs; compute and sort tag scores; select the top n tags.
Example with the tags 'architecture', 'espanya', 'jugendstil':
Region 1: architecture 0.75; espanya 0.45; jugendstil 0.4
Region 2: architecture 0.8; espanya 0.55; jugendstil 0.36
Region 3: architecture 0.7; espanya 0.5; jugendstil 0.42
Region 4: architecture 0.6; espanya 0.5; jugendstil 0.42
Total: architecture 2.85; espanya 2; jugendstil 1.6
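The scoring step amounts to summing each tag's classifier scores over the regions of an attribute and keeping the highest totals. The sketch below reuses the scores from the slide example; the function name `rank_tags` and the dictionary layout are assumptions.

```python
def rank_tags(region_scores, top_n=3):
    """Sum per-tag SVM scores over regions and return the top_n tags."""
    totals = {}
    for scores in region_scores:
        for tag, s in scores.items():
            totals[tag] = totals.get(tag, 0.0) + s
    # Sort by accumulated score, highest first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Per-region scores taken from the slide example.
region_scores = [
    {"architecture": 0.75, "espanya": 0.45, "jugendstil": 0.40},
    {"architecture": 0.80, "espanya": 0.55, "jugendstil": 0.36},
    {"architecture": 0.70, "espanya": 0.50, "jugendstil": 0.42},
    {"architecture": 0.60, "espanya": 0.50, "jugendstil": 0.42},
]
ranked = rank_tags(region_scores, top_n=3)
```

Up to floating-point rounding, the totals match the slide: architecture 2.85, espanya 2, jugendstil 1.6.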
GoogleStreetView: San Francisco, Chicago, New York, London, Paris, Barcelona, Hong Kong, Tokyo, Taipei, Singapore, Sao Paulo, Sydney
Flickr City: London, Paris, Barcelona, Berlin, Istanbul, Beijing, Cairo
Dataset
                   GoogleStreetView   Flickr
#City              12                 7
#Photo per city    Nearly 10,000      2,000 ~ 3,000
#Total photo       139,840            13,503
Experiment Setup
Dataset: randomly sample nearly 500 photos for each city (GoogleStreetView: 6,111; Flickr city: 3,501); 100 photos per city for training, the rest for testing.
Compared methods:
kNN: scene-matching method [J. Hays et al., CVPR 2008]
LF+SVM: low-level features combined with SVM [J. Zhu et al., MM 2008]
BOVW+SVM: bag-of-words features combined with SVM [J. Wang et al., CVPR 2010]
DRLR: discriminative region selection at the patch level [C. Doersch et al., SIGGRAPH 2012]
Performance Comparison: mAP results of recognition algorithms on GoogleStreetView
Performance Comparison: mAP results of recognition algorithms on Flickr City. Geo-informative attributes are of great value for location recognition.
Discriminative Region Detection Barcelona Hong Kong Paris Chicago London Istanbul
Cairo, Berlin, Istanbul, Beijing, Barcelona, London, Paris, Singapore, Chicago, Hong Kong, NYC, San Francisco, Sao Paulo, Sydney, Taipei, Tokyo
Barcelona antonigaudi towers sebastianniedlich sagradafamilia nativity catalonia arquitectura espanya jugendstil artnouveau katalonien catalonia catalunya antoniogaudi antonigaudi sagradafamilia cathedral catholic
Attribute Interpretation: London, Paris, Berlin, Beijing, Istanbul, Cairo
Application: City Exploration Barcelona Sagradafamilia Antonigaudi sebastianniedlich catalonia arquitectura espanya Barcelona
Conclusion
Automatically discover geo-informative attributes. Property: discriminative, representative, and semantically interpretable.
Solution: region-based latent SVM model for attribute detection; exploiting user-contributed tags for attribute interpretation.
Geo-informative attributes are effective in general location recognition and visually characterize a city, enabling practical applications.
Future Work The world is perceived by Visual + Semantic Geospatial visual + semantic knowledge mining Geo computing for location-based services Travel assistant system based on graph search Sensing Serving Geo Computing Mining Understanding
Thank you!