Update build process for Nominatim 4.5
leonardehrenfried committed Nov 11, 2024
1 parent 2212081 commit 86f1d34
Showing 14 changed files with 816 additions and 0 deletions.
111 changes: 111 additions & 0 deletions 4.5/Dockerfile
@@ -0,0 +1,111 @@
ARG NOMINATIM_VERSION=4.5.0
ARG USER_AGENT=mediagis/nominatim-docker:${NOMINATIM_VERSION}

FROM ubuntu:24.04 AS build

ENV DEBIAN_FRONTEND=noninteractive
ENV LANG=C.UTF-8

WORKDIR /app

# Inspired by https://github.com/reproducible-containers/buildkit-cache-dance?tab=readme-ov-file#apt-get-github-actions
RUN \
--mount=type=cache,target=/var/cache/apt,sharing=locked \
--mount=type=cache,target=/var/lib/apt,sharing=locked \
# Keep downloaded APT packages in the docker build cache
rm -f /etc/apt/apt.conf.d/docker-clean && \
echo 'Binary::apt::APT::Keep-Downloaded-Packages "true";' >/etc/apt/apt.conf.d/keep-cache && \
# Do not start daemons after installation.
echo '#!/bin/sh\nexit 101' > /usr/sbin/policy-rc.d \
&& chmod +x /usr/sbin/policy-rc.d \
# Install all required packages.
&& apt-get -y update -qq \
&& apt-get -y install \
locales \
&& locale-gen en_US.UTF-8 \
&& update-locale LANG=en_US.UTF-8 \
&& apt-get -y install \
-o APT::Install-Recommends="false" \
-o APT::Install-Suggests="false" \
# Build tools from sources. \
build-essential \
python3-dev \
osm2pgsql \
pkg-config \
libicu-dev python3-pip \
# PostgreSQL.
postgresql-contrib \
postgresql-server-dev-16 \
postgresql-16-postgis-3 \
postgresql-16-postgis-3-scripts \
# Misc.
curl \
sudo \
sshpass \
openssh-client


# Configure postgres.
RUN true \
&& echo "host all all 0.0.0.0/0 md5" >> /etc/postgresql/16/main/pg_hba.conf \
&& echo "listen_addresses='*'" >> /etc/postgresql/16/main/postgresql.conf

ARG NOMINATIM_VERSION
ARG USER_AGENT

# Install Nominatim (database tools and API), pyosmium (the `osmium` package)
# for replication updates, and the Python web server stack.
#RUN --mount=type=cache,target=/root/.cache/pip,sharing=locked pip install --break-system-packages \
RUN pip install --break-system-packages \
nominatim-db==$NOMINATIM_VERSION \
osmium \
"psycopg[binary]" \
falcon \
uvicorn \
gunicorn \
nominatim-api==$NOMINATIM_VERSION

# Clean up build-only packages and temporary files.
RUN true \
# Remove development and unused packages.
&& apt-get -y remove --purge --auto-remove \
postgresql-server-dev-16 \
# Clear temporary files and directories.
&& rm -rf \
/tmp/* \
/var/tmp/*

# Apache configuration
COPY conf.d/apache.conf /etc/apache2/sites-enabled/000-default.conf

# Postgres config overrides to improve import performance (but reduce crash recovery safety)
COPY conf.d/postgres-import.conf /etc/postgresql/16/main/conf.d/postgres-import.conf.disabled
COPY conf.d/postgres-tuning.conf /etc/postgresql/16/main/conf.d/

COPY config.sh /app/config.sh
COPY init.sh /app/init.sh
COPY start.sh /app/start.sh
COPY startapache.sh /app/startapache.sh
COPY startpostgres.sh /app/startpostgres.sh

# Collapse image to single layer.
FROM scratch

COPY --from=build / /

# Default database password; override it at runtime (docker run -e NOMINATIM_PASSWORD=...).
ENV NOMINATIM_PASSWORD=qaIACxO6wMR3

ENV PROJECT_DIR=/nominatim

ARG USER_AGENT
ENV USER_AGENT=${USER_AGENT}

WORKDIR /app

EXPOSE 5432
EXPOSE 8080

COPY conf.d/env $PROJECT_DIR/.env

CMD ["/app/start.sh"]
246 changes: 246 additions & 0 deletions 4.5/README.md
@@ -0,0 +1,246 @@
# Nominatim Docker (Nominatim version 4.5)

## Table of contents

- [Automatic import](#automatic-import)
- [Configuration](#configuration)
- [General Parameters](#general-parameters)
- [PostgreSQL Tuning](#postgresql-tuning)
- [Import Style](#import-style)
- [Flatnode files](#flatnode-files)
- [Configuration Example](#configuration-example)
- [Persistent container data](#persistent-container-data)
- [OpenStreetMap Data Extracts](#openstreetmap-data-extracts)
- [Updating the database](#updating-the-database)
- [Custom PBF Files](#custom-pbf-files)
- [Importance Dumps, Postcode Data, and Tiger Addresses](#importance-dumps-postcode-data-and-tiger-addresses)
- [Development](#development)
- [Docker Compose](#docker-compose)
- [Assorted use cases documented in issues](#assorted-use-cases-documented-in-issues)

---

## Automatic import

Download the required data, initialize the database and start Nominatim in one go:

```sh
docker run -it \
-e PBF_URL=https://download.geofabrik.de/europe/monaco-latest.osm.pbf \
-e REPLICATION_URL=https://download.geofabrik.de/europe/monaco-updates/ \
-p 8080:8080 \
--name nominatim \
mediagis/nominatim:4.5
```

Port 8080 is the Nominatim HTTP API port and 5432 is the Postgres port. The example above publishes only port 8080; add `-p 5432:5432` if you also want to expose Postgres.

If you want to check that your data import was successful, you can use the API with the following URL: http://localhost:8080/search.php?q=avenue%20pasteur
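Imports of large extracts can take hours, so instead of retrying that URL by hand, a script could poll the API until it answers. The following is a minimal POSIX `sh` sketch; the function name `wait_for_nominatim`, the default base URL, the 5-second interval and the attempt count are illustrative assumptions. It relies on Nominatim's `/status` endpoint, which returns HTTP 200 when the service is healthy.

```sh
# Hypothetical helper: poll Nominatim's /status endpoint until it returns
# HTTP 200, or give up after a fixed number of attempts (5 s apart).
wait_for_nominatim() {
  base_url=${1:-http://localhost:8080}  # assumed host/port mapping (-p 8080:8080)
  attempts=${2:-720}                    # illustrative: roughly one hour
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # -f: treat HTTP errors (e.g. 500 if the database is unreachable) as failure
    if curl -fsS "$base_url/status" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    # Do not sleep after the final attempt
    if [ "$i" -lt "$attempts" ]; then
      sleep 5
    fi
  done
  return 1
}
```

For example, `wait_for_nominatim http://localhost:8080 && curl 'http://localhost:8080/search.php?q=avenue%20pasteur'` only runs the test query once the server responds.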

## Configuration

### General Parameters

The following environment variables are available for configuration:

- `PBF_URL`: Which [OSM extract](#openstreetmap-data-extracts) to download and import. It cannot be used together with `PBF_PATH`.
See [https://download.geofabrik.de](https://download.geofabrik.de) for available extracts.
Since download speeds at Geofabrik are restricted, the [OSM Wiki](https://wiki.openstreetmap.org/wiki/Planet.osm#Planet.osm_mirrors) lists recommended mirrors for importing the full planet.
On the mirror sites, the `/planet` folder contains `planet-latest.osm.pbf`, and there is often a `/replication` folder to use as `REPLICATION_URL`.
- `PBF_PATH`: Which [OSM extract](#openstreetmap-data-extracts) to import from the .pbf file inside the container. It cannot be used together with `PBF_URL`.
- `REPLICATION_URL`: Where to get updates from. For example, Geofabrik's updates for the Europe extract are available at `https://download.geofabrik.de/europe-updates/`.
Other regions at Geofabrik follow the pattern `https://download.geofabrik.de/$CONTINENT/$COUNTRY-updates/`.
- `REPLICATION_UPDATE_INTERVAL`: How often upstream publishes diffs (in seconds, default: `86400`). _Requires `REPLICATION_URL` to be set._
- `REPLICATION_RECHECK_INTERVAL`: How long to sleep if no update found yet (in seconds, default: `900`). _Requires `REPLICATION_URL` to be set._
- `UPDATE_MODE`: How to run replication to [update Nominatim data](https://nominatim.org/release-docs/4.4.1/admin/Update/#updating-nominatim). Options: `continuous`/`once`/`catch-up`/`none` (default: `none`)
- `FREEZE`: Freeze database and disable dynamic updates to save space. (default: `false`)
- `REVERSE_ONLY`: If you only want to use the Nominatim database for reverse lookups. (default: `false`)
- `IMPORT_WIKIPEDIA`: Whether to download and import the Wikipedia importance dumps (`true`) or path to an importance dump in the container. Importance dumps improve the scoring of results. On a powerful 10-core server, this takes around 5 minutes. (default: `false`)
- `IMPORT_US_POSTCODES`: Whether to download and import the US postcode dump (`true`) or path to US postcode dump in the container. (default: `false`)
- `IMPORT_GB_POSTCODES`: Whether to download and import the GB postcode dump (`true`) or path to GB postcode dump in the container. (default: `false`)
- `IMPORT_TIGER_ADDRESSES`: Whether to download and import the Tiger address data (`true`) or path to a preprocessed Tiger address set in the container. (default: `false`)
- `THREADS`: How many threads should be used to import (default: number of processing units available to the current process via `nproc`)
- `NOMINATIM_PASSWORD`: The password to connect to the database with (default: `qaIACxO6wMR3`)
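Because of the Geofabrik URL pattern described for `REPLICATION_URL`, the replication URL can usually be derived from the extract URL. A minimal POSIX `sh` sketch, assuming a Geofabrik URL ending in `-latest.osm.pbf`; the helper name `geofabrik_replication_url` is hypothetical:

```sh
# Hypothetical helper: derive a Geofabrik REPLICATION_URL from a Geofabrik
# PBF_URL ("<region>-latest.osm.pbf" -> "<region>-updates/").
geofabrik_replication_url() {
  case "$1" in
    https://download.geofabrik.de/*-latest.osm.pbf)
      # Strip the "-latest.osm.pbf" suffix and append "-updates/"
      printf '%s\n' "${1%-latest.osm.pbf}-updates/"
      ;;
    *)
      echo "not a Geofabrik -latest.osm.pbf URL: $1" >&2
      return 1
      ;;
  esac
}
```

With `PBF_URL` set as a shell variable, `-e REPLICATION_URL="$(geofabrik_replication_url "$PBF_URL")"` could then be passed to `docker run`. Mirrors and other providers use different layouts, so the helper deliberately rejects anything else.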

The following `docker run` parameter is available for configuration:

- `--shm-size`: Size of the shared memory tmpfs (`/dev/shm`) in Docker. For bigger imports (e.g. Europe) it must be set to at least 1GB; half of your available RAM is recommended. (default: `64M`)
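Following the half-of-RAM recommendation, the value can be computed on the Docker host. A Linux-only sketch; the name `suggest_shm_size` is hypothetical, and it assumes `/proc/meminfo` is available:

```sh
# Hypothetical helper: print half of the host's total RAM in a format that
# docker's --shm-size accepts (e.g. "7968m").
suggest_shm_size() {
  # MemTotal is reported in kB; halve it and convert to MiB
  awk '/^MemTotal:/ { printf "%dm\n", $2 / 2 / 1024 }' /proc/meminfo
}
```

It could be used as `docker run -it --shm-size="$(suggest_shm_size)" ...` in place of a fixed value such as `--shm-size=1g`.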

### PostgreSQL Tuning

The following environment variables are available to tune PostgreSQL:

- `POSTGRES_SHARED_BUFFERS` (default: `2GB`)
- `POSTGRES_MAINTENANCE_WORK_MEM` (default: `10GB`)
- `POSTGRES_AUTOVACUUM_WORK_MEM` (default: `2GB`)
- `POSTGRES_WORK_MEM` (default: `50MB`)
- `POSTGRES_EFFECTIVE_CACHE_SIZE` (default: `24GB`)
- `POSTGRES_SYNCHRONOUS_COMMIT` (default: `off`)
- `POSTGRES_MAX_WAL_SIZE` (default: `1GB`)
- `POSTGRES_CHECKPOINT_TIMEOUT` (default: `10min`)
- `POSTGRES_CHECKPOINT_COMPLETION_TARGET` (default: `0.9`)

See https://nominatim.org/release-docs/4.4.1/admin/Installation/#tuning-the-postgresql-database for more details on those settings.

### Import Style

The import style can be modified through an environment variable:

- `IMPORT_STYLE` (default: `full`)

Available options are:

- `admin`: Only import administrative boundaries and places.
- `street`: Like the admin style but also adds streets.
- `address`: Import all data necessary to compute addresses down to house number level.
- `full`: Default style that also includes points of interest.
- `extratags`: Like the full style but also adds most of the OSM tags into the extratags column.

See https://nominatim.org/release-docs/4.4.1/admin/Import/#filtering-imported-data for more details on those styles.
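The `conf.d/env` template in this commit contains the placeholders `__IMPORT_STYLE__` and `__REPLICATION_URL__`, which are presumably filled from `IMPORT_STYLE` and `REPLICATION_URL` by the startup scripts (not shown here). The following is only a sketch of how such a substitution could work, assuming a plain `sed` replacement; `render_env` is a hypothetical name, not part of the image:

```sh
# Hypothetical sketch: fill the placeholders of a conf.d/env-style template
# read from stdin. Assumes the values contain no "|", "&" or "\" characters,
# which sed would otherwise interpret.
render_env() {
  # "|" is used as the sed delimiter because URLs contain "/"
  sed -e "s|__REPLICATION_URL__|${REPLICATION_URL:-}|" \
      -e "s|__IMPORT_STYLE__|${IMPORT_STYLE:-full}|"
}
```

For example, `IMPORT_STYLE=address render_env < conf.d/env` would print the template with `NOMINATIM_IMPORT_STYLE=address`; when `IMPORT_STYLE` is unset, the documented default `full` is used.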

### Flatnode files

You can also mount a volume or bind mount on `/nominatim/flatnode` (see [Persistent container data](#persistent-container-data)) to use flatnode storage. This is recommended for bigger imports (Europe, North America, etc.); see https://nominatim.org/release-docs/4.4.1/admin/Import/#flatnode-files. If the mount is available to the container, the flatnode configuration is set and used automatically.

```sh
docker run -it \
-v nominatim-flatnode:/nominatim/flatnode \
-e PBF_URL=https://download.geofabrik.de/europe/monaco-latest.osm.pbf \
-e REPLICATION_URL=https://download.geofabrik.de/europe/monaco-updates/ \
-p 8080:8080 \
--name nominatim \
mediagis/nominatim:4.5
```

### Configuration Example

Here you can find a [configuration example](example.md) for all flags you can use for the container creation.


## Persistent container data

If you want to keep your imported data across deletion and recreation of your container, make the following folders volumes:

- `/var/lib/postgresql/16/main` is the storage location of the Postgres database and holds the state about whether the import was successful.
- `/nominatim/flatnode` is the storage location of the flatnode file.

To be able to remove your container and start it again with all the data still present, use the following command:

```sh
docker run -it --shm-size=1g \
-e PBF_URL=https://download.geofabrik.de/europe/monaco-latest.osm.pbf \
-e REPLICATION_URL=https://download.geofabrik.de/europe/monaco-updates/ \
-e IMPORT_WIKIPEDIA=false \
-e NOMINATIM_PASSWORD=very_secure_password \
-v nominatim-data:/var/lib/postgresql/16/main \
-p 8080:8080 \
--name nominatim \
mediagis/nominatim:4.5
```

## OpenStreetMap Data Extracts

Nominatim imports OpenStreetMap (OSM) data extracts. The source of the data can be specified with one of the following environment variables:

- `PBF_URL` variable specifies the URL. The data is downloaded during initialization, imported and removed from disk afterwards. The data extracts can be freely downloaded, e.g., from [Geofabrik's server](https://download.geofabrik.de).
- `PBF_PATH` variable specifies the path to the mounted OSM extract inside the container. The .pbf file is not removed after initialization.

It is not possible to define both `PBF_URL` and `PBF_PATH` sources.
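This rule can be checked before starting a long-running `docker run`. A minimal sketch of such a pre-flight check; `check_pbf_source` is a hypothetical helper, not part of the image, and it additionally assumes that a fresh import needs one of the two variables to be set:

```sh
# Hypothetical pre-flight check: exactly one of PBF_URL and PBF_PATH
# should be set for a fresh import.
check_pbf_source() {
  if [ -n "${PBF_URL:-}" ] && [ -n "${PBF_PATH:-}" ]; then
    echo "set either PBF_URL or PBF_PATH, not both" >&2
    return 1
  fi
  if [ -z "${PBF_URL:-}" ] && [ -z "${PBF_PATH:-}" ]; then
    echo "one of PBF_URL or PBF_PATH is required" >&2
    return 1
  fi
  return 0
}
```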

The replication update can be performed only via HTTP.

An example of using the `PBF_PATH` variable:

```sh
docker run -it \
-e PBF_PATH=/nominatim/data/monaco-latest.osm.pbf \
-e REPLICATION_URL=https://download.geofabrik.de/europe/monaco-updates/ \
-p 8080:8080 \
-v /osm-maps/data:/nominatim/data \
--name nominatim \
mediagis/nominatim:4.5
```

where the _/osm-maps/data/_ directory contains the _monaco-latest.osm.pbf_ file, which is mounted and available in the container at _/nominatim/data/monaco-latest.osm.pbf_.

## Updating the database

Full documentation for updating Nominatim is available [here](https://nominatim.org/release-docs/4.4.1/admin/Update/). For a list of other update methods, see the output of:

```sh
docker exec -it nominatim sudo -u nominatim nominatim replication --help
```

The following command will keep updating the database forever:

```sh
docker exec -it nominatim sudo -u nominatim nominatim replication --project-dir /nominatim
```

If no updates are available, this process sleeps for 15 minutes (900 seconds) and then tries again.

## Custom PBF Files

If you want your Nominatim container to host multiple areas from Geofabrik, you can use a tool, such as [Osmium](https://osmcode.org/osmium-tool/manual.html), to merge multiple PBF files into one.
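A sketch of producing the merged file with osmium-tool's `merge` command. It assumes the `osmium` CLI (osmium-tool) is installed on the host; the `osmium` package installed via pip in this image is pyosmium, which does not provide that CLI. The wrapper name `merge_extracts` is hypothetical:

```sh
# Hypothetical wrapper around "osmium merge": the first argument is the output
# file, the remaining arguments are the extracts to combine.
# osmium merge expects sorted input; Geofabrik extracts are distributed sorted.
merge_extracts() {
  output=$1
  shift
  # --overwrite replaces an output file left over from a previous run
  osmium merge "$@" -o "$output" --overwrite
}
```

For example, `merge_extracts /osm-maps/data/merged.osm.pbf monaco-latest.osm.pbf andorra-latest.osm.pbf` would write the file that the command below mounts and imports.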

```sh
docker run -it \
-e PBF_PATH=/nominatim/data/merged.osm.pbf \
-p 8080:8080 \
-v /osm-maps/data:/nominatim/data \
--name nominatim \
mediagis/nominatim:4.5
```

where the _/osm-maps/data/_ directory contains the _merged.osm.pbf_ file, which is mounted and available in the container at _/nominatim/data/merged.osm.pbf_.

## Importance Dumps, Postcode Data, and Tiger Addresses

Including the Wikipedia importance dumps, postcode files, and Tiger address data can improve results. These can be downloaded automatically by setting the corresponding options (see above) to `true`. Alternatively, they can be imported from local files by specifying a path inside the container, similar to how `PBF_PATH` is used. For example:

```sh
docker run -it \
-e PBF_URL=https://download.geofabrik.de/europe/monaco-latest.osm.pbf \
-e IMPORT_WIKIPEDIA=/nominatim/extras/wikimedia-importance.sql.gz \
-p 8080:8080 \
-v /osm-maps/extras:/nominatim/extras \
--name nominatim \
mediagis/nominatim:4.5
```

Here the importance dump is referenced by its path inside the container (the file does not need to be named `wikimedia-importance.sql.gz`). The same works for `IMPORT_US_POSTCODES` and `IMPORT_GB_POSTCODES`.

For more information about the Tiger address file, see [Installing TIGER housenumber data for the US](https://nominatim.org/release-docs/4.4.1/customize/Tiger/).

## Development

If you want to work on the Docker image, you can use the following command to build a local image and run the container:

```sh
docker build -t nominatim . && \
docker run -it \
-e PBF_URL=https://download.geofabrik.de/europe/monaco-latest.osm.pbf \
-e REPLICATION_URL=https://download.geofabrik.de/europe/monaco-updates/ \
-p 8080:8080 \
--name nominatim \
nominatim
```

## Docker Compose

In addition, we provide a basic `contrib/docker-compose.yml` template, which you can use as a starting point and adapt to your needs, for example to set environment variables and mounts.

Besides the basic docker-compose.yml, there are also some advanced YAML configurations available in the `contrib` folder.
These files follow the naming convention of `docker-compose-*.yml` and contain comments about the specific use case.

## Assorted use cases documented in issues

- [Using an external Postgres database](https://github.com/mediagis/nominatim-docker/issues/245#issuecomment-1072205751)
- [Using Amazon's RDS](https://github.com/mediagis/nominatim-docker/issues/378#issuecomment-1278653770)
- [Hardware sizing for importing the entire planet](https://github.com/mediagis/nominatim-docker/discussions/265)
- [Upgrading Nominatim](https://github.com/mediagis/nominatim-docker/discussions/317)
- [Using Nominatim UI](https://github.com/mediagis/nominatim-docker/discussions/486#discussioncomment-7239861)

13 changes: 13 additions & 0 deletions 4.5/conf.d/apache.conf
@@ -0,0 +1,13 @@
Listen 8080
<VirtualHost *:8080>
DocumentRoot /nominatim/website
CustomLog "|$/usr/bin/rotatelogs -n 7 /var/log/apache2/access.log 86400" combined
ErrorLog "|$/usr/bin/rotatelogs -n 7 /var/log/apache2/error.log 86400"
LogLevel info
<Directory /nominatim/website>
Options FollowSymLinks MultiViews
DirectoryIndex search.php
Require all granted
</Directory>
AddType text/html .php
</VirtualHost>
6 changes: 6 additions & 0 deletions 4.5/conf.d/env
@@ -0,0 +1,6 @@
NOMINATIM_TOKENIZER=icu
NOMINATIM_REPLICATION_URL=__REPLICATION_URL__
NOMINATIM_REPLICATION_UPDATE_INTERVAL=86400
NOMINATIM_REPLICATION_RECHECK_INTERVAL=900
NOMINATIM_IMPORT_STYLE=__IMPORT_STYLE__
NOMINATIM_FLATNODE_FILE=
2 changes: 2 additions & 0 deletions 4.5/conf.d/postgres-import.conf
@@ -0,0 +1,2 @@
fsync = off
full_page_writes = off
10 changes: 10 additions & 0 deletions 4.5/conf.d/postgres-tuning.conf
@@ -0,0 +1,10 @@
# See https://nominatim.org/release-docs/4.4.1/admin/Installation/#tuning-the-postgresql-database
shared_buffers = 2GB
maintenance_work_mem = 10GB
autovacuum_work_mem = 2GB
work_mem = 50MB
effective_cache_size = 24GB
synchronous_commit = off
max_wal_size = 1GB
checkpoint_timeout = 10min
checkpoint_completion_target = 0.9