The poster will be presented at ECCV on Friday, October 4th, 10:30-12:30 at poster board #139.
- Installation
- Dataset
- Training
- Evaluation (includes pretrained weights)
- Examples: A photo album that contains >2000 randomly chosen street-view images and corresponding predictions from our model.
- Install JAX with GPU support: https://jax.readthedocs.io/en/latest/installation.html
- Clone this repository:
git clone https://github.com/fferflo/statewide-visual-geolocalization
cd statewide-visual-geolocalization
- Install the remaining dependencies:
pip install -r requirements.txt
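Before starting a multi-day training run, it can be worth confirming that the GPU build of JAX is actually active. The following is a minimal sketch using jax.default_backend() and jax.devices(); the helper name jax_gpu_status is hypothetical and not part of this repository.

```python
import importlib.util


def jax_gpu_status():
    """Hypothetical helper: return (backend, device names) reported by JAX, or None if JAX is missing."""
    if importlib.util.find_spec("jax") is None:
        return None
    import jax

    # default_backend() reports "gpu" when a CUDA-enabled jaxlib found a GPU, "cpu" otherwise
    return jax.default_backend(), [str(device) for device in jax.devices()]


if __name__ == "__main__":
    status = jax_gpu_status()
    if status is None:
        print("JAX is not installed")
    else:
        backend, devices = status
        print(f"backend={backend}, devices={devices}")
        if backend != "gpu":
            print("Warning: JAX did not find a GPU, so computations would run on the CPU")
```

If the reported backend is "cpu" on a GPU machine, the CPU-only jaxlib was most likely installed instead of the GPU variant.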
We train and evaluate our method on street-view images from the Mapillary platform and aerial imagery from Massachusetts, Washington DC and North Carolina (US) as well as Berlin-Brandenburg, NRW and Saxony (Germany).
Please follow these instructions to download the data.
- Fill in the dataset paths marked with TODO in the configuration file config/main.yaml. The entries should look something like this:
train:
  list:
    - tiles-path: .../data/opennrw
      path: .../data/mapillary-opennrw
test:
  path: .../data/mapillary-boston100km2
  tiles:
    - path: .../data/massgis/utm19
  geojson: .../data/boston100km2.geojson
- Run the training script:
python3 train.py --output .../train --config config/main.yaml
The results will be stored in .../train-YYYY-MM-DDTHH-mm-ss. Training uses all available GPUs by default; a run on two H100 GPUs takes about 2.5 days.
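Since training takes days, it helps to make sure that no placeholder is left in the configuration before launching it. The sketch below simply scans config/main.yaml for the literal marker TODO, assuming that every entry to be filled in carries that marker as described in the configuration step; find_todo_placeholders is a hypothetical helper, not part of this repository.

```python
def find_todo_placeholders(config_path):
    """Hypothetical helper: return (line number, line) pairs that still contain the TODO marker."""
    hits = []
    with open(config_path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if "TODO" in line:
                hits.append((lineno, line.rstrip("\n")))
    return hits
```

Calling find_todo_placeholders("config/main.yaml") returns an empty list once all dataset paths have been filled in.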
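The training script appends a timestamp to the --output path, and the evaluation scripts later need that exact directory via --train. A minimal sketch for locating the newest run, assuming the naming pattern <output>-YYYY-MM-DDTHH-mm-ss shown above (strptime format %Y-%m-%dT%H-%M-%S); latest_training_run is a hypothetical helper.

```python
from datetime import datetime
from pathlib import Path

# Assumed directory naming, following the documented pattern <output>-YYYY-MM-DDTHH-mm-ss
TIMESTAMP_FORMAT = "%Y-%m-%dT%H-%M-%S"


def latest_training_run(output):
    """Hypothetical helper: newest run directory created for the given --output path, or None."""
    output = Path(output)
    prefix = output.name + "-"
    if not output.parent.is_dir():
        return None
    runs = []
    for path in output.parent.iterdir():
        if not (path.is_dir() and path.name.startswith(prefix)):
            continue
        try:
            stamp = datetime.strptime(path.name[len(prefix):], TIMESTAMP_FORMAT)
        except ValueError:
            continue  # ignore directories that do not follow the naming pattern
        runs.append((stamp, path))
    if not runs:
        return None
    # Pick the run with the latest timestamp encoded in its name
    return max(runs, key=lambda run: run[0])[1]
```

The timestamp is parsed from the directory name rather than taken from the file system's modification time, because the latter changes whenever files inside the run directory are rewritten.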
- Create a reference database for a search region by running the following script:
python create_reference_database.py --train .../train-YYYY-MM-DDTHH-mm-ss --output .../refdb-massgis --tiles .../data/massgis/utm19 .../data/massgis/utm18
This will create a division of the region into cells, predict embeddings for all cells, create a FAISS index for efficient retrieval and store everything in the output directory. This might take several days depending on your hardware setup and search region size.
By default, the search region covers all tiles specified via --tiles. The argument accepts multiple tile datasets, such as the overlapping UTM18 and UTM19 regions of Massachusetts. Optionally, a GeoJSON file can be passed via --geojson to restrict the search region to a subset of the area covered by the tiles.
Pretrained weights can be used by cloning the repository from Hugging Face
git clone https://huggingface.co/fferflo/statewide-geoloc-nomassgis
and passing its path to the --train argument. These are not the original weights used in the paper; they were retrained using this repository, and the results are slightly better than those reported in the paper (see below).
The output folder of create_reference_database.py will contain the following files:
aerial_features.bin        # Embeddings for all cells
cellregion.npz             # Division of the region into cells
faiss.index                # FAISS index that can be loaded via faiss.read_index("faiss.index")
config.yaml                # Configuration parameters of the search region, model, etc.
model_weights.safetensors  # Model weights used to create the embeddings
- Localize query images against the reference database by running the following script:
python localize.py --query .../data/mapillary-boston100km2 --reference .../refdb-massgis --stride 1
This predicts embeddings for all street-view photos in the given dataset and localizes them against the reference database. The --stride parameter can be used to localize only a subset of the images (e.g. every 10th image with --stride 10).
The script prints the Recall@k<r of the localization for different radii r and top-k cells. For example, the pretrained weights from above yield the following results:
> python localize.py --query .../data/mapillary-massgis --reference .../refdb-massgis --stride 100
... takes some time ...
Recall@1<0m: 0.2880
Recall@5<0m: 0.5007
Recall@10<0m: 0.5516
Recall@50<0m: 0.6432
Recall@100<0m: 0.6771
Recall@1<25m: 0.4683
Recall@5<25m: 0.6495
Recall@10<25m: 0.6914
Recall@50<25m: 0.7659
Recall@100<25m: 0.7915
Recall@1<50m: 0.6105
Recall@5<50m: 0.7229
Recall@10<50m: 0.7578
Recall@50<50m: 0.8216
Recall@100<50m: 0.8430
Recall@1<100m: 0.6297
Recall@5<100m: 0.7385
Recall@10<100m: 0.7732
Recall@50<100m: 0.8369
Recall@100<100m: 0.8583
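The optional --geojson argument of create_reference_database.py restricts the search region to part of the tiled area. The exact file layout the script expects is not described here, so the following sketch assumes a standard RFC 7946 GeoJSON FeatureCollection holding a single Polygon in WGS84 longitude/latitude; bbox_search_region and write_search_region are hypothetical helpers. If the repository expects coordinates in another reference system (e.g. UTM), the ring must be built in that system instead.

```python
import json


def bbox_search_region(min_lon, min_lat, max_lon, max_lat):
    """Hypothetical helper: GeoJSON FeatureCollection with one rectangular Polygon.

    Positions are [longitude, latitude] (RFC 7946 order). The outer ring runs
    counterclockwise and is closed by repeating its first position.
    """
    ring = [
        [min_lon, min_lat],
        [max_lon, min_lat],
        [max_lon, max_lat],
        [min_lon, max_lat],
        [min_lon, min_lat],
    ]
    return {
        "type": "FeatureCollection",
        "features": [
            {
                "type": "Feature",
                "properties": {},
                "geometry": {"type": "Polygon", "coordinates": [ring]},
            }
        ],
    }


def write_search_region(path, region):
    """Hypothetical helper: write the region to a .geojson file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(region, f, indent=2)
```

The resulting file would then be passed to the script via --geojson; the coordinates are up to you and must lie inside the area covered by the tiles.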
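Localization retrieves, for each query embedding, the most similar cell embeddings from the FAISS index. The index type is not stated here, so as an illustration the sketch below performs exhaustive top-k retrieval by cosine similarity in NumPy, which is what a flat inner-product index (faiss.IndexFlatIP) computes on L2-normalized vectors. Whether the repository normalizes its embeddings or ranks by cosine similarity is an assumption; normalize and top_k_cells are hypothetical names. With FAISS installed, the stored index would instead be loaded via faiss.read_index and queried with index.search(queries, k), which returns a (distances, indices) pair.

```python
import numpy as np


def normalize(x):
    """L2-normalize rows so that inner products equal cosine similarities."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def top_k_cells(query_emb, cell_emb, k):
    """Return an (n_queries, k) array of cell indices, most similar first.

    Brute-force stand-in for a flat inner-product index (assumption, see lead-in).
    """
    sims = query_emb @ cell_emb.T  # (n_queries, n_cells)
    k = min(k, cell_emb.shape[0])
    # argpartition selects the k best cells per query without a full sort ...
    part = np.argpartition(-sims, k - 1, axis=1)[:, :k]
    # ... and argsort then orders only those k candidates by similarity
    part_sims = np.take_along_axis(sims, part, axis=1)
    order = np.argsort(-part_sims, axis=1)
    return np.take_along_axis(part, order, axis=1)
```

The exhaustive search is exact but scales linearly with the number of cells, which is why an index library such as FAISS is used for statewide search regions.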
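One plausible reading of Recall@k<r is the fraction of queries for which at least one of the top-k retrieved cells lies within r meters of the true position. The sketch below computes this under two labeled assumptions: cells are axis-aligned squares in a metric coordinate system such as UTM, and distance is measured from the query to the cell's footprint (zero if the query lies inside it), which is how a radius of 0m can still produce hits. The function names are hypothetical, and the repository's actual cell geometry and distance definition may differ.

```python
import numpy as np


def point_to_square_distance(points, centers, half_size):
    """Euclidean distance (meters) from points to axis-aligned squares, 0 if inside.

    points and centers broadcast against each other with a trailing dimension of 2.
    """
    offset = np.maximum(np.abs(points - centers) - half_size, 0.0)
    return np.linalg.norm(offset, axis=-1)


def recall_at_k_within_r(true_xy, ranked_cells, cell_centers, half_size, ks, radii):
    """Return {(k, r): fraction of queries with a top-k cell within r meters}.

    true_xy: (Q, 2) ground-truth positions, ranked_cells: (Q, K) cell indices
    ordered best first, cell_centers: (C, 2). Every k must satisfy k <= K.
    """
    centers = cell_centers[ranked_cells]  # (Q, K, 2)
    dist = point_to_square_distance(true_xy[:, None, :], centers, half_size)  # (Q, K)
    results = {}
    for r in radii:
        # hit_by_rank[q, i] is True if any of the first i + 1 cells is within r
        hit_by_rank = np.logical_or.accumulate(dist <= r, axis=1)
        for k in ks:
            results[(k, r)] = float(hit_by_rank[:, k - 1].mean())
    return results
```

Computing the cumulative "any hit so far" along the ranking once per radius yields all values of k from a single pass, which matches how the output above reports several k for each radius.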
Please consider citing our work if you use the code or data, or build upon the ideas presented in the paper:
@inproceedings{fervers2024statewide,
title = {Statewide Visual Geolocalization in the Wild},
author = {Florian Fervers and Sebastian Bullinger and Christoph Bodensteiner and Michael Arens and Rainer Stiefelhagen},
booktitle = {ECCV},
year = {2024}
}
Feel free to open an issue in this GitHub repository if you have any problems with the code or data.