
G^3: Geolocation via Guidebook Grounding

For inquiries and requests, please contact [email protected].

This repository contains code and scripts to reproduce the results of our paper, which proposes a new method for image geolocation, the task of predicting where an image was taken, using explicit knowledge from human-written guides. These guidebooks, created to help others improve their skills at playing GeoGuessr, describe the salient and class-discriminative visual features people have curated from experience. We evaluate on a novel dataset of StreetView images from a diverse set of locations. We would also like to acknowledge the work Geolocation Estimation of Photos using a Hierarchical Model and Scene Classification, which this repository builds on.

Geolocation via Guidebook Grounding Dataset

The StreetView panorama ids and guidebook are available for download here. You can download all necessary files by running ./scripts/get_data_files.sh.

In our dataset, the train/val/test folders contain the StreetView panorama ids, guidebook.json contains the guidebook text, and s2_cells/countries.csv contains the mapping from our predicted class labels to human-readable country names. From the panorama ids you can download the panoramas and cut them into images, as described further in the section "Getting StreetView Images".

  • train:
    • train.csv: A csv file where each row corresponds to an image in the train set with metadata IMG_ID. For an image with IMG_ID YYcmZ_mdbshez6STxSxmRQ_0.png, the 22 characters before the final underscore (YYcmZ_mdbshez6STxSxmRQ) correspond to a pano_id in the StreetView API, and the final digit (_0.png) identifies a piece of the original panorama (each panorama was split into four disjoint pieces).
  • val: Folder containing the val data in the same structure as train.
  • test: Folder containing the test data in the same structure as train.
  • guidebook.json: A json file containing a list of entries, one per guidebook clue mined from a human-written GeoGuessr guide, each with CLUE_ID, text, and geoparsed (the countries geoparsed from the text).
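As a minimal illustration of the IMG_ID naming convention above (this helper is not part of the released code; note the split must be on the final underscore, since a pano_id can itself contain underscores):

```python
def split_img_id(img_id):
    """Split an IMG_ID like 'YYcmZ_mdbshez6STxSxmRQ_0.png' into its
    StreetView pano_id and panorama-piece index."""
    stem = img_id.rsplit(".", 1)[0]       # drop the .png extension
    pano_id, piece = stem.rsplit("_", 1)  # split on the FINAL underscore
    return pano_id, int(piece)

pano_id, piece = split_img_id("YYcmZ_mdbshez6STxSxmRQ_0.png")
# pano_id == "YYcmZ_mdbshez6STxSxmRQ", piece == 0
```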

We also include the following files which are used during training.

  • s2_cells/countries.csv: A csv file derived from the GeoNames database where each row corresponds to a ground-truth country with metadata class_label (equivalent to hex_id), country, geonameid, latitude_mean, longitude_mean (the mean lat/lon of the country's landmass), and geoJSON (a polygon representing the country's borders).
  • pseudo_labels/countries.json: A json file mapping each IMG_ID to the list of CLUE_IDs of the guidebook clues that correspond to that StreetView image.
  • loss_weight/countries.json: A json file containing a list of weights for each country class used in the training country classification loss to account for data imbalance, where each index corresponds to class_label in s2_cells/countries.csv.
  • features:
    • streetview_clip_rn50x16.pkl: A pickle file indexed by IMG_ID containing features for each StreetView image as generated by CLIP RN50x16 off-the-shelf.
    • guidebook_roberta_base.pkl: A pickle file indexed by CLUE_ID containing features for each guidebook clue as generated by RoBERTa base off-the-shelf.
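The weights in loss_weight/countries.json are precomputed. One common scheme for such class weights is inverse frequency, sketched below as an illustration (the exact formula used to produce the released file may differ):

```python
from collections import Counter

def inverse_frequency_weights(labels, num_classes):
    """Weight each class inversely to its frequency so rare countries
    contribute as much to the loss as common ones; index i corresponds
    to class_label i in s2_cells/countries.csv."""
    counts = Counter(labels)
    total = len(labels)
    return [total / (num_classes * counts[c]) if counts[c] else 0.0
            for c in range(num_classes)]

# Toy example: class 0 is four times as frequent as class 2, so it
# receives a proportionally smaller weight.
weights = inverse_frequency_weights([0, 0, 0, 0, 1, 1, 2], num_classes=3)
```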

Getting StreetView Images

We provide the panorama ids for all images in our dataset at dataset/${split}/${split}.csv, where ${split} denotes either train, val, or test.

  1. Download the panoramas using the API, and save in the folder dataset/${split}/panos with the file name <pano_id>.jpg. If you encounter issues downloading the panoramas for your research, please contact us.

  2. Run all data preparation with the bash script ./scripts/process_streetview_images.sh. This script cuts the panoramas into images using scripts/panocutter.py, saves the images in the msgpack format used during training using scripts/image_to_msgpack.py, and infers the image-to-label mapping using scripts/image_to_country.py. The images are stored in compressed msgpack shards, indexed by IMG_ID.
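The actual cutting is done by scripts/panocutter.py; as a rough sketch of the geometry only (the released script may differ in sizing and overlap), splitting a W×H panorama into four disjoint pieces amounts to computing four vertical crop boxes:

```python
def panorama_crop_boxes(width, height, pieces=4):
    """Return (left, top, right, bottom) crop boxes that cut a panorama
    into `pieces` disjoint vertical strips, e.g. for PIL's Image.crop."""
    strip = width // pieces
    return [(i * strip, 0, (i + 1) * strip, height) for i in range(pieces)]

boxes = panorama_crop_boxes(2048, 1024)
# four 512-pixel-wide strips covering the panorama's full height
```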

Training G^3

To reproduce each row of our main table, run the corresponding shell script.

| Experiment                     | Attn Supervision | File                                          |
| ------------------------------ | ---------------- | --------------------------------------------- |
| ISN                            | N/A              | ./quickstart/isn.sh                           |
| ISN + Random Text              | N/A              | ./quickstart/isn_random_text.sh               |
| ISN + Guidebook                | No               | ./quickstart/isn_guidebook_no-attn-sup.sh     |
| ISN + Guidebook                | Yes              | ./quickstart/isn_guidebook.sh                 |
| ISN + CLIP                     | N/A              | ./quickstart/isn_clip.sh                      |
| ISN + CLIP + Random Text       | N/A              | ./quickstart/isn_clip_random_text.sh          |
| ISN + CLIP + Guidebook         | No               | ./quickstart/isn_clip_guidebook_no-attn-sup.sh |
| G^3 = ISN + CLIP + Guidebook   | Yes              | ./quickstart/isn_clip_guidebook.sh            |

You can also customize the arguments using our OmegaConf inheritance scheme. Add the field includes: [<parent_config.yml>] to a config to inherit fields from a parent config. You can also override fields via flags when running python -m classification.train.train_classification, for example adding model_params.name=<name> to override the default name in the provided config.
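For example, a child config might look like the following (the file and field names other than includes are illustrative; see the configs referenced by the quickstart scripts for the real schema):

```yaml
# my_experiment.yml — hypothetical child config
includes: [<parent_config.yml>]  # inherit all fields from the parent config
model_params:
  name: my_custom_run            # override just this field
```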

Evaluating G^3

To evaluate trained weights on the StreetView images, run the cells of notebooks/inference.ipynb.

Citing

If you find our dataset useful for your research, please cite the following paper:

@article{luo2022geolocation,
  title={G^3: Geolocation via Guidebook Grounding},
  author={Luo, Grace and Biamby, Giscard and Darrell, Trevor and Fried, Daniel and Rohrbach, Anna},
  journal={Findings of EMNLP},
  year={2022}
}

About

G^3: Geolocation via Guidebook Grounding, Findings of EMNLP 2022
