Issue with pelias download wof: Corrupted SQLite during whosonfirst full planet Data Download #545

Closed
taminoelgert opened this issue Mar 5, 2024 · 2 comments

taminoelgert commented Mar 5, 2024

Describe the bug
When attempting a full planet build using Kubernetes, the pelias download wof command consistently throws the following error after downloading the whosonfirst data:

error: [whosonfirst] error downloading whosonfirst-data-admin-latest.db.bz2
Error: Command failed: curl -sA 'pelias-whosonfirst/0.0.0-development' https://data.geocode.earth/wof/dist/sqlite/whosonfirst-data-admin-latest.db.bz2 | lbunzip2 > /data/whosonfirst/sqlite/whosonfirst-data-admin-latest.db
lbunzip2: stdin: compressed data error: bad block header magic
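
For manual diagnosis, the failing pipeline can be split into separate steps, so that an HTTP error is reported by curl itself instead of surfacing later as a decompression error. The sketch below is not how pelias download wof works internally; looks_like_bzip2 and fetch_and_verify are hypothetical helper names. The BZh check only confirms that the file starts like a bzip2 stream, so lbunzip2 -t is still needed to catch corruption further into the file.

```shell
#!/bin/sh
# Hypothetical helpers for manual diagnosis; not part of Pelias.

# A bzip2 stream begins with the three bytes "BZh". An HTML error page
# or an empty response saved under a .bz2 name fails this cheap check.
looks_like_bzip2() {
  [ "$(head -c 3 "$1" 2>/dev/null)" = "BZh" ]
}

# Download to a file first, then verify, instead of piping curl
# straight into the decompressor.
fetch_and_verify() {
  fv_url=$1
  fv_dest=$2
  # --fail makes curl exit non-zero on HTTP errors (4xx/5xx).
  curl --fail --silent --show-error --location \
    --output "$fv_dest" "$fv_url" || return 1
  if ! looks_like_bzip2 "$fv_dest"; then
    echo "not a bzip2 stream: $fv_dest" >&2
    return 1
  fi
  # Full integrity test of the compressed data.
  lbunzip2 -t "$fv_dest"
}

# Example usage (commented out because it downloads several GiB):
# fetch_and_verify \
#   https://data.geocode.earth/wof/dist/sqlite/whosonfirst-data-admin-latest.db.bz2 \
#   /data/whosonfirst/sqlite/whosonfirst-data-admin-latest.db.bz2 &&
# lbunzip2 < /data/whosonfirst/sqlite/whosonfirst-data-admin-latest.db.bz2 \
#   > /data/whosonfirst/sqlite/whosonfirst-data-admin-latest.db
```

Keeping the compressed file on disk also means a failed run can be inspected afterwards, which the streaming pipeline does not allow.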

Steps to Reproduce

  • Use the full planet config.
  • Start the whosonfirst container with the ./bin/download command (pelias download wof).

Expected behavior

The pelias download wof command should download the whosonfirst data without encountering any errors.

Environment (please complete the following information):

  • Kubernetes environment with 32 cores and 64 GB RAM (on Kubernetes nodes).
  • Local environment with 24 cores and 32 GB RAM.
  • OS: [e.g. Linux]
  • Docker version 24.0.7, build afdd53b

Pastebin/Screenshots

pelias config:

{
  "logger": {
    "level": "info",
    "timestamp": true
  },
  "esclient": {
    "apiVersion": "7.x",
    "hosts": [
      {
        "protocol": "https",
        "host": "geocoder-es-http"
      }
    ]
  },
  "acceptance-tests": {
    "endpoints": {
      "docker": "http://pelias-api:4000/v1/"
    }
  },
  "api": {
    "services": {
      "placeholder": {
        "url": "http://pelias-placeholder:4100"
      },
      "interpolation": {
        "url": "http://pelias-interpolation:4300"
      },
      "libpostal": {
        "url": "http://pelias-libpostal:4400"
      }
    }
  },
  "imports": {
    "adminLookup": {
      "enabled": true
    },
    "geonames": {
      "datapath": "/data/geonames",
      "countryCode": "ALL"
    },
    "openstreetmap": {
      "download": [
        {
          "sourceURL": "https://planet.openstreetmap.org/pbf/planet-latest.osm.pbf"
        }
      ],
      "leveldbpath": "/tmp",
      "datapath": "/data/openstreetmap",
      "import": [
        {
          "filename": "planet-latest.osm.pbf"
        }
      ]
    },
    "openaddresses": {
      "datapath": "/data/openaddresses",
      "files": []
    },
    "polyline": {
      "datapath": "/data/polylines",
      "files": [
        "extract.0sv"
      ]
    },
    "whosonfirst": {
      "datapath": "/data/whosonfirst",
      "importPostalcodes": true
    },
    "interpolation": {
      "download": {
        "tiger": {
          "datapath": "/data/tiger"
        }
      }
    }
  }
}

Additional context

The issue can also be reproduced locally in a Docker environment by following the same steps up to the pelias download all command. Subsequent steps, such as placeholder prepare, fail because "the SQLite is corrupted."
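
To confirm whether the decompressed database is actually damaged before running later steps, a check along the following lines may help. It is a sketch under assumptions: looks_like_sqlite and check_wof_db are hypothetical helper names, the sqlite3 command-line tool is assumed to be installed, and PRAGMA quick_check can still take a long time on a database of this size.

```shell
#!/bin/sh
# Hypothetical helpers for manual diagnosis; not part of Pelias.

# Every SQLite 3 database file starts with the string "SQLite format 3"
# followed by a NUL byte. A truncated or garbage file usually fails this.
looks_like_sqlite() {
  [ "$(head -c 15 "$1" 2>/dev/null)" = "SQLite format 3" ]
}

# Header check first (cheap), then SQLite's own consistency check.
check_wof_db() {
  cw_db=$1
  if ! looks_like_sqlite "$cw_db"; then
    echo "not a SQLite database: $cw_db" >&2
    return 1
  fi
  # quick_check prints the single line "ok" when no problems are found.
  [ "$(sqlite3 "$cw_db" 'PRAGMA quick_check;')" = "ok" ]
}

# Example usage (commented out; the path is taken from the error above):
# check_wof_db /data/whosonfirst/sqlite/whosonfirst-data-admin-latest.db \
#   && echo "database looks intact"
```

If the header check already fails, the file on disk is most likely the leftover of an interrupted decompression and should be deleted before downloading again.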

References

Thank you for your assessment

@missinglink
Member

Hi @taminoelgert, I wasn't able to reproduce this issue.

It might have been an intermittent connection issue with our CDN provider https://bunny.net/
Could you please confirm if the issue has resolved itself?

aria2c https://data.geocode.earth/wof/dist/sqlite/whosonfirst-data-admin-latest.db.bz2

03/11 15:29:45 [NOTICE] Downloading 1 item(s)
 *** Download Progress Summary as of Mon Mar 11 15:30:47 2024 ***
=============================================================================
[#b8d2ec 6.2GiB/8.0GiB(78%) CN:1 DL:92MiB ETA:19s]
FILE: /tmp/whosonfirst-data-admin-latest.db.bz2
-----------------------------------------------------------------------------

[#b8d2ec 7.9GiB/8.0GiB(98%) CN:1 DL:108MiB]
03/11 15:31:06 [NOTICE] Download complete: /tmp/whosonfirst-data-admin-latest.db.bz2

Download Results:
gid   |stat|avg speed  |path/URI
======+====+===========+=======================================================
b8d2ec|OK  |   105MiB/s|/tmp/whosonfirst-data-admin-latest.db.bz2

Status Legend:
(OK):download completed.

lbunzip2 -t whosonfirst-data-admin-latest.db.bz2

echo $?
0
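
A cheaper first check than a full lbunzip2 -t pass is to compare the size of the local file with the size the server reports, which catches truncated transfers. The following is a sketch only: content_length_from_headers, local_size and sizes_match are hypothetical helper names, and it assumes the server sends a Content-Length header and that the "latest" file was not republished between the download and the check.

```shell
#!/bin/sh
# Hypothetical helpers for manual diagnosis; not part of Pelias.

# Read HTTP response headers on stdin and print the last Content-Length
# value seen (the last one matters when redirects were followed).
content_length_from_headers() {
  tr -d '\r' | awk 'tolower($1) == "content-length:" { len = $2 }
                    END { if (len != "") print len }'
}

# Size of a local file in bytes.
local_size() {
  wc -c < "$1" | tr -d ' '
}

# Succeeds only if the remote and local sizes are both known and equal.
sizes_match() {
  sm_expected=$(curl --fail --silent --head --location "$1" \
    | content_length_from_headers)
  sm_actual=$(local_size "$2")
  [ -n "$sm_expected" ] && [ "$sm_expected" = "$sm_actual" ]
}

# Example usage (commented out because it needs network access):
# sizes_match \
#   https://data.geocode.earth/wof/dist/sqlite/whosonfirst-data-admin-latest.db.bz2 \
#   /tmp/whosonfirst-data-admin-latest.db.bz2 || echo "size mismatch" >&2
```

A matching size does not prove the content is intact, so this complements the lbunzip2 -t test rather than replacing it.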

@taminoelgert
Author

Thanks for the reply. I just tried again, and it now seems to be working without any problems. Thanks for the help; I'll close the ticket.
