For inquiries and requests, please contact [email protected].
This repository contains the code accompanying our paper, which proposes a new method that uses human-written guides to improve image geolocation, the task of predicting the location of an image. Our method exploits explicit knowledge from human-written guidebooks created to help others improve their skills at playing GeoGuessr. The guidebooks describe the salient, class-discriminative visual features that people have curated from experience. We evaluate on a novel dataset of StreetView images from a diverse set of locations. This repository provides the code and scripts to reproduce the results from our paper. We would also like to acknowledge the work Geolocation Estimation of Photos using a Hierarchical Model and Scene Classification, which this repository builds on.
The StreetView panorama ids and guidebook are available for download here. You can download all necessary files by running `./scripts/get_data_files.sh`.
In our dataset, `train`/`val`/`test` contain the StreetView panorama ids, `guidebook.json` contains the guidebook text, and `s2_cells/countries.csv` contains the mapping from our predicted class labels to human-readable country names. From the panorama ids you can download the panoramas and then cut them into images, as described further in the section "Getting StreetView Images".
- train:
  - train.csv: A csv file where each row corresponds to an image in the train set with metadata `IMG_ID`. For an image with `IMG_ID` `YYcmZ_mdbshez6STxSxmRQ_0.png`, the first 22 characters before the final underscore (`YYcmZ_mdbshez6STxSxmRQ`) correspond to a `pano_id` in the StreetView API, and the last digit (`_0.png`) corresponds to a piece of the original panorama (each panorama was split into four disjoint pieces).
- val: Folder containing the val data in the same structure as train.
- test: Folder containing the test data in the same structure as train.
- guidebook.json: A json file containing a list of `CLUE_ID`, `text`, and `geoparsed` (the countries geoparsed from the text) entries corresponding to each guidebook clue mined from a human-written GeoGuessr guide.
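As a minimal sketch of the `IMG_ID` naming convention above (the helper function name is ours, not part of the repo), an image id can be split into its `pano_id` and panorama piece like this:

```python
def parse_img_id(img_id: str) -> tuple[str, int]:
    """Split an IMG_ID such as 'YYcmZ_mdbshez6STxSxmRQ_0.png' into
    (pano_id, piece_index). A pano_id may itself contain underscores,
    so we split on the final underscore only."""
    stem = img_id.removesuffix(".png")
    pano_id, piece = stem.rsplit("_", 1)
    return pano_id, int(piece)

pano_id, piece = parse_img_id("YYcmZ_mdbshez6STxSxmRQ_0.png")
# pano_id -> "YYcmZ_mdbshez6STxSxmRQ", piece -> 0
```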
We also include the following files which are used during training.
- s2_cells/countries.csv: A csv file derived from the GeoNames database where each row corresponds to a ground truth country with metadata `class_label` (equivalent to `hex_id`), `country`, `geonameid`, `latitude_mean`, `longitude_mean` (the mean lat/lon of the country's landmass), and `geoJSON` (a polygon representing the country's borders).
- pseudo_labels/countries.json: A json file indexed by `IMG_ID`, mapping to the list of `CLUE_ID`s that indicates which guidebook clues correspond to each StreetView image.
- loss_weight/countries.json: A json file containing a list of weights for each country class, used in the country classification training loss to account for data imbalance, where each index corresponds to a `class_label` in s2_cells/countries.csv.
- features:
  - streetview_clip_rn50x16.pkl: A pickle file indexed by `IMG_ID` containing features for each StreetView image, generated by off-the-shelf CLIP RN50x16.
  - guidebook_roberta_base.pkl: A pickle file indexed by `CLUE_ID` containing features for each guidebook clue, generated by off-the-shelf RoBERTa base.
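The exact weighting formula behind loss_weight/countries.json is defined by the training code; as a hedged sketch (function name and normalization are our assumptions, not the repo's implementation), a standard inverse-frequency weighting scheme looks like this:

```python
from collections import Counter

def inverse_frequency_weights(labels: list[int], num_classes: int) -> list[float]:
    """Per-class weights inversely proportional to class frequency,
    normalized so the mean weight is 1.0. Index i would correspond to
    class_label i in s2_cells/countries.csv."""
    counts = Counter(labels)
    raw = [len(labels) / (num_classes * counts.get(c, 1)) for c in range(num_classes)]
    mean = sum(raw) / len(raw)
    return [w / mean for w in raw]

# Toy example: class 0 appears three times, class 1 once,
# so the rarer class 1 receives the larger weight.
weights = inverse_frequency_weights([0, 0, 0, 1], num_classes=2)
# weights -> [0.5, 1.5]
```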
We provide the panorama ids for all images in our dataset at `dataset/${split}/${split}.csv`, where `${split}` denotes either train, val, or test.
1. Download the panoramas using the API, and save them in the folder `dataset/${split}/panos` with the file name `<pano_id>.jpg`. If you encounter issues downloading the panoramas for your research, please contact us.
2. Run all data preparation using the bash script `./scripts/process_streetview_images.sh`. This script cuts the panoramas into images using `scripts/panocutter.py`, saves the images in the msgpack format used during training using `scripts/image_to_msgpack.py`, and infers the image-to-label mapping using `scripts/image_to_country.py`. The images are stored in compressed msgpack shards, indexed by `IMG_ID`.
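The actual cutting logic lives in `scripts/panocutter.py`; as a simplified sketch (not the repo's exact crop geometry), splitting a panorama into four disjoint pieces along its width could look like:

```python
import numpy as np

def cut_panorama(pano: np.ndarray, n_pieces: int = 4) -> list[np.ndarray]:
    """Split an (H, W, C) panorama into n_pieces disjoint vertical strips.
    Any leftover columns from uneven division go to the last piece."""
    h, w, _ = pano.shape
    step = w // n_pieces
    pieces = []
    for i in range(n_pieces):
        stop = w if i == n_pieces - 1 else (i + 1) * step
        pieces.append(pano[:, i * step : stop])
    return pieces

# Toy panorama: a 4 x 16 RGB array becomes four 4 x 4 pieces.
pieces = cut_panorama(np.zeros((4, 16, 3), dtype=np.uint8))
```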
To run each row from our main table, run the following shell scripts.
| Experiment | Attn Supervision | File |
|---|---|---|
| ISN | N/A | `./quickstart/isn.sh` |
| ISN + Random Text | N/A | `./quickstart/isn_random_text.sh` |
| ISN + Guidebook | No | `./quickstart/isn_guidebook_no-attn-sup.sh` |
| ISN + Guidebook | Yes | `./quickstart/isn_guidebook.sh` |
| ISN + CLIP | N/A | `./quickstart/isn_clip.sh` |
| ISN + CLIP + Random Text | N/A | `./quickstart/isn_clip_random_text.sh` |
| ISN + CLIP + Guidebook | No | `./quickstart/isn_clip_guidebook_no-attn-sup.sh` |
| G^3 = ISN + CLIP + Guidebook | Yes | `./quickstart/isn_clip_guidebook.sh` |
You can also customize the arguments using our OmegaConf inheritance scheme. You can edit a config to have the field `includes: [<parent_config.yml>]` to inherit fields from a parent config. You can also add fields via flags when running `python -m classification.train.train_classification`, for example adding `model_params.name=<name>` to override the default name in the provided config.
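To illustrate the inheritance scheme above (the file names and field values here are hypothetical, not configs shipped with the repo), a child config might look like:

```yaml
# my_experiment.yml -- hypothetical child config
includes: [isn_clip_guidebook.yml]  # inherit all fields from this parent config

model_params:
  name: my_custom_run  # overrides the parent's model_params.name
```

The same override can instead be passed on the command line, e.g. `python -m classification.train.train_classification model_params.name=my_custom_run`.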
To evaluate trained weights on the StreetView images, run the cells of `notebooks/inference.ipynb`.
If you find our dataset useful for your research, please cite the following paper:
```bibtex
@article{luo2022geolocation,
  title={G^3: Geolocation via Guidebook Grounding},
  author={Luo, Grace and Biamby, Giscard and Darrell, Trevor and Fried, Daniel and Rohrbach, Anna},
  journal={Findings of EMNLP},
  year={2022}
}
```