avoid direct pipe of curl|tar to prevent download problems #433

Closed
wants to merge 3 commits into from


@jeremy-rutman

jeremy-rutman commented Apr 7, 2019

This version works for me where the curl|tar version doesn't; the curl|tar version hits the error below (see 189).


error downloading whosonfirst-data-locality-latest.tar.bz2 bundle: Error: Command failed: curl -s https://dist.whosonfirst.org/bundles/whosonfirst-data-locality-latest.tar.bz2 | tar -xj --strip-components=1 --exclude=README.txt -C /mnt/storage-proc2/users/jeremy/pelias_metal/data/whosonfirst && mv /mnt/storage-proc2/users/jeremy/pelias_metal/data/whosonfirst/whosonfirst-data-locality-latest.csv /mnt/storage-proc2/users/jeremy/pelias_metal/data/whosonfirst/meta

bzip2: Compressed file ends unexpectedly;
        perhaps it is corrupted?  *Possible* reason follows.
bzip2: Inappropriate ioctl for device
        Input file = (stdin), output file = (stdout)
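
For readers comparing the two approaches, here is a minimal sketch of the "download first, extract second" idea. It is not the PR's actual code: the command visible later in this thread uses wget followed by tar -f, while this sketch swaps in curl with --output and adds a bzip2 -t integrity check. The function name fetch_and_extract is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the PR's actual code): download a bundle to a
# temporary file, test it, and only then extract it, instead of piping
# curl straight into tar.
fetch_and_extract() {
  local url="$1" dest="$2"
  local archive status
  archive="$(mktemp)" || return 1

  # --fail makes curl exit non-zero on HTTP errors instead of saving an error page.
  if ! curl --silent --fail --location --output "$archive" "$url"; then
    echo "download failed: $url" >&2
    rm -f "$archive"
    return 1
  fi

  # bzip2 -t tests the compressed stream without writing any output,
  # so a truncated download is caught before tar touches the data directory.
  if ! bzip2 -t "$archive" 2>/dev/null; then
    echo "archive is truncated or corrupt: $url" >&2
    rm -f "$archive"
    return 2
  fi

  mkdir -p "$dest"
  tar -xj --strip-components=1 --exclude=README.txt -C "$dest" -f "$archive"
  status=$?
  rm -f "$archive"
  return "$status"
}
```

The trade-off is the one raised in the review: the archive needs temporary disk space and extraction cannot start until the download has finished, but a partial download can no longer leave a half-extracted bundle behind.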

@orangejulius
Member

Hey @jeremy-rutman,
Thanks for this PR. I'd personally prefer to understand a bit more about what's causing the original download script to fail. Piping directly to tar can save quite a bit of time and disk space, so it's not something we'd like to abandon without exploring other options.

Over in #417 we discovered that Node.js has issues with buffers filling on long downloads if the -s flag is not passed to curl, but that doesn't seem to be the issue here, since the -s flag is present. Do we know what else can cause this sort of thing?

@jeremy-rutman
Author

Asked for some help on SO

@NickStallman

I have seen the occasional glitch with the WOF download which killed the whole process, requiring a redownload of everything from scratch, which can be a bit annoying.

However, is this an issue with the SQLite download? The download process with SQLite is simpler, and can also be done manually if needed.

@jeremy-rutman
Author

jeremy-rutman commented Apr 11, 2019

As I mentioned on Stack Overflow, I tried the same command from the shell instead of as a JavaScript exec, and curl|tar fails the same way at the command line. wget -q -O - netfile | tar seems to have worked for files that failed with curl|tar:

wget -q -O - https://dist.whosonfirst.org/bundles/whosonfirst-data-postalcode-jp-latest.tar.bz2 | tar -xj --strip-components=1 --exclude=README.txt -C /mnt/data_science/pelias_w6/data/whosonfirst

If there is some alternate way to download using SQLite, that'd be great.
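
When a pipeline like this fails, tar's "Compressed file ends unexpectedly" message only says that its input stopped early, not why. One way to narrow it down is to look at the exit status of each side of the pipe. The sketch below uses bash's PIPESTATUS array for that; the helper name pipe_status is hypothetical, and it assumes bash rather than plain sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run "producer | consumer" and report the exit status
# of each side separately, so a download failure can be told apart from an
# extraction failure.
pipe_status() {
  local producer="$1" consumer="$2"
  local statuses
  bash -c "$producer" | bash -c "$consumer"
  # PIPESTATUS must be read immediately after the pipeline finishes.
  statuses=("${PIPESTATUS[@]}")
  echo "producer exit: ${statuses[0]}, consumer exit: ${statuses[1]}"
  [ "${statuses[0]}" -eq 0 ] && [ "${statuses[1]}" -eq 0 ]
}

# Example with the commands from this thread (not executed here):
#   pipe_status \
#     "curl -s https://dist.whosonfirst.org/bundles/whosonfirst-data-locality-latest.tar.bz2" \
#     "tar -xj --strip-components=1 --exclude=README.txt -C /path/to/data/whosonfirst"
```

A non-zero producer status points at the download itself; for curl, exit code 18 means a partial transfer and 56 means a failure receiving network data. A zero producer status with a non-zero consumer status points at extraction instead.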

@orangejulius
Copy link
Member

There is now a way to download the SQLite files: set "sqlite": true in pelias.json (mentioned in the WOF readme).
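
For reference, a later comment in this thread gives the full key path as imports.whosonfirst.sqlite, so the relevant part of pelias.json would look roughly like the fragment below. Only this key comes from the thread; every other setting a real pelias.json needs is omitted here.

```json
{
  "imports": {
    "whosonfirst": {
      "sqlite": true
    }
  }
}
```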

It sounds like some possible causes of this are:

  • multiple simultaneous connections overwhelm the network (this can happen, but looks like it's not the case here since a single download command run manually also fails)
  • intermittent network issues cause the download to fail (the downloader does not recover from such errors, but this issue would also probably not repeat every time)
  • the disk and CPU usage of extracting the bundle archive overwhelms the system (on my laptop, downloading WOF data pretty much maxes out both CPU and disk)
  • slow network connections eventually lead to some type of timeout

@jeremy-rutman can you try the sqlite download and also check if possible causes 3 or 4 might be the culprit here? Some info on your network speed, time of download, and CPU and disk speed could be helpful here.

If it is that last one, the SQLite download would probably be significantly more reliable, since it extracts to only a single file and therefore doesn't seem to have the same impact on disk usage that extracting millions of individual geojson files does with the bundles.
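
For the second and fourth causes (intermittent failures and slow connections), one possible mitigation is to retry the download and resume from the partial file. The sketch below is an illustration only, not how the Pelias downloader works: retry and download_with_resume are hypothetical names, the attempt count, delay, and speed thresholds are arbitrary, and resuming assumes the server supports range requests.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command up to N times, sleeping between attempts.
# Usage: retry ATTEMPTS DELAY_SECONDS command [args...]
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed, retrying in ${delay}s" >&2
    sleep "$delay"
    n=$((n + 1))
  done
}

# Hypothetical wrapper: download a URL to a file, resuming after failures.
download_with_resume() {
  local url="$1" out="$2"
  # --continue-at - resumes from the size of the existing partial file.
  # --speed-limit/--speed-time abort a transfer that stays below
  # 1000 bytes/s for 60 seconds, so a stalled connection fails fast
  # and gets retried instead of hanging.
  retry 5 10 curl --silent --fail --location --continue-at - \
    --speed-limit 1000 --speed-time 60 --output "$out" "$url"
}
```

Resuming only makes sense when the download goes to a file, so this pairs with extracting from disk rather than with a direct curl|tar pipe.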

@jeremy-rutman
Author

jeremy-rutman commented Apr 13, 2019

With imports.whosonfirst.sqlite set to true, the downloads all completed, and rather faster to boot.

Machine stats for both the failed curl|tar run and the successful SQLite run:

  • network speed was ~20MB/s on Fri Apr 12 22:16:24 CDT 2019,
  • overall cpu load was ~10% on a 16 cpu machine with a few other pelias imports happening and a spark cluster running
  • disk write speed (from dd) is 460 MB/s
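
The exact dd invocation behind the disk figure is not given. For anyone who wants to collect a comparable number, a minimal sketch assuming GNU dd (conv=fdatasync is a GNU coreutils option) could look like this; the function name and the default size are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: estimate sequential write speed of the disk that
# holds DIR by writing SIZE_MB megabytes of zeros with GNU dd.
measure_write_speed() {
  local dir="${1:-.}" size_mb="${2:-1024}"
  local tmp="$dir/dd-speed-test.$$"
  # conv=fdatasync makes dd flush the data to disk before it reports,
  # so the page cache does not inflate the result.
  # dd prints its summary on stderr; the last line contains the throughput.
  dd if=/dev/zero of="$tmp" bs=1M count="$size_mb" conv=fdatasync 2>&1 | tail -n 1
  rm -f "$tmp"
}

# Example (not executed here): measure_write_speed /mnt/data 2048
```

This measures one large sequential write, which is closer to the SQLite download than to the bundle extraction, where many small files are created.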

@jeremy-rutman
Author

I did hit a curl failure with imports.whosonfirst.sqlite set to true on a different machine:

deploy@dap-jupyter01:/mnt/open_street_map/pelias_docker/docker/projects/planet$ pelias download wof
child_process.js:637
    throw err;
    ^

Error: Command failed: curl --silent -L https://dist.whosonfirst.org/sqlite/inventory.json
    at checkExecSyncError (child_process.js:616:11)
    at Object.execFileSync (child_process.js:634:13)
    at module.exports (/code/pelias/whosonfirst/node_modules/download-file-sync/index.js:3:6)
    at generateSQLites (/code/pelias/whosonfirst/utils/download_sqlite_all.js:26:32)
    at Object.download (/code/pelias/whosonfirst/utils/download_sqlite_all.js:63:29)
    at Object.<anonymous> (/code/pelias/whosonfirst/utils/download_data.js:66:38)
    at Module._compile (internal/modules/cjs/loader.js:689:30)
    at Object.Module._extensions..js (internal/modules/cjs/loader.js:700:10)
    at Module.load (internal/modules/cjs/loader.js:599:32)
    at tryModuleLoad (internal/modules/cjs/loader.js:538:12)

although the curl command runs at the command line without a problem.
Possibly it's the relatively slow disk here? (80 MB/s disk and ~15 MB/s network)

@jdomag

jdomag commented Jun 6, 2019

@jeremy-rutman
I built a Docker image from your repo's master branch and hit errors like the ones below. It's also weird that I see "download done" and, right after that, "download error".

error downloading whosonfirst-data-postalcode-us-latest.tar.bz2 bundle: Error: Command failed: wget -q https://dist.whosonfirst.org/bundles/whosonfirst-data-postalcode-us-latest.tar.bz2 && tar -xj --strip-components=1 --exclude=README.txt -C /data/whosonfirst -f whosonfirst-data-postalcode-us-latest.tar.bz2 && rm whosonfirst-data-postalcode-us-latest.tar.bz2 && mv /data/whosonfirst/whosonfirst-data-postalcode-us-latest.csv /data/whosonfirst/meta

done downloading whosonfirst-data-postalcode-ve-latest.tar.bz2 bundle

error downloading whosonfirst-data-postalcode-ve-latest.tar.bz2 bundle: Error: Command failed: wget -q https://dist.whosonfirst.org/bundles/whosonfirst-data-postalcode-ve-latest.tar.bz2 && tar -xj --strip-components=1 --exclude=README.txt -C /data/whosonfirst -f whosonfirst-data-postalcode-ve-latest.tar.bz2 && rm whosonfirst-data-postalcode-ve-latest.tar.bz2 && mv /data/whosonfirst/whosonfirst-data-postalcode-ve-latest.csv /data/whosonfirst/meta

done downloading whosonfirst-data-postalcode-za-latest.tar.bz2 bundle

error downloading whosonfirst-data-postalcode-za-latest.tar.bz2 bundle: Error: Command failed: wget -q https://dist.whosonfirst.org/bundles/whosonfirst-data-postalcode-za-latest.tar.bz2 && tar -xj --strip-components=1 --exclude=README.txt -C /data/whosonfirst -f whosonfirst-data-postalcode-za-latest.tar.bz2 && rm whosonfirst-data-postalcode-za-latest.tar.bz2 && mv /data/whosonfirst/whosonfirst-data-postalcode-za-latest.csv /data/whosonfirst/meta

All done!
{"level":"info","message":"Loading whosonfirst-data-ocean-latest.csv records from /data/whosonfirst/meta","label":"whosonfirst","timestamp":"2019-06-06T09:28:41.905Z"}
{"level":"info","message":"Loading whosonfirst-data-marinearea-latest.csv records from /data/whosonfirst/meta","label":"whosonfirst","timestamp":"2019-06-06T09:28:41.924Z"}
{"level":"info","message":"Loading whosonfirst-data-continent-latest.csv records from /data/whosonfirst/meta","label":"whosonfirst","timestamp":"2019-06-06T09:28:42.449Z"}
{"level":"info","message":"Loading whosonfirst-data-empire-latest.csv records from /data/whosonfirst/meta","label":"whosonfirst","timestamp":"2019-06-06T09:28:42.463Z"}
{"level":"info","message":"Loading whosonfirst-data-country-latest.csv records from /data/whosonfirst/meta","label":"whosonfirst","timestamp":"2019-06-06T09:28:42.483Z"}
{"level":"info","message":"Loading whosonfirst-data-dependency-latest.csv records from /data/whosonfirst/meta","label":"whosonfirst","timestamp":"2019-06-06T09:28:44.695Z"}
{"paused":false,"transient":1,"current_length":12,"level":"info","message":"","label":"dbclient-whosonfirst","timestamp":"2019-06-06T09:28:44.929Z"}
{"paused":false,"transient":1,"current_length":12,"level":"info","message":"","label":"dbclient-whosonfirst","timestamp":"2019-06-06T09:28:44.929Z"}
{"level":"info","message":"Loading whosonfirst-data-disputed-latest.csv records from /data/whosonfirst/meta","label":"whosonfirst","timestamp":"2019-06-06T09:28:45.210Z"}
{"level":"info","message":"Loading whosonfirst-data-macroregion-latest.csv records from /data/whosonfirst/meta","label":"whosonfirst","timestamp":"2019-06-06T09:28:45.278Z"}
{"level":"info","message":"Loading whosonfirst-data-region-latest.csv records from /data/whosonfirst/meta","label":"whosonfirst","timestamp":"2019-06-06T09:28:45.335Z"}
events.js:167
      throw er; // Unhandled 'error' event
      ^

Error: ENOENT: no such file or directory, open '/data/whosonfirst/meta/whosonfirst-data-region-latest.csv'
Emitted 'error' event at:
    at lazyFs.open (internal/fs/streams.js:115:12)
    at FSReqWrap.oncomplete (fs.js:141:20)

@jeremy-rutman
Author

jeremy-rutman commented Jun 6, 2019

> @jeremy-rutman
> I built a Docker image from your repo's master branch and hit errors like the ones below. It's also weird that I see "download done" and, right after that, "download error".
> ...
> Error: ENOENT: no such file or directory, open '/data/whosonfirst/meta/whosonfirst-data-region-latest.csv'
> Emitted 'error' event at:
>     at lazyFs.open (internal/fs/streams.js:115:12)
>     at FSReqWrap.oncomplete (fs.js:141:20)

@jdomag I'd try setting imports.whosonfirst.sqlite to true in the config file and see if that takes care of it.
Alternatively, try the commands you see failing, like wget -q https://dist.whosonfirst.org/bundles/whosonfirst-data-postalcode-ve-latest.tar.bz2 && tar -xj --strip-components=1 --exclude=README.txt, at the command line and see if they work from there.
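
One way to do that while also finding out which part of the chain breaks is to run each step on its own and print its exit status. The sketch below assumes bash; run_steps is a hypothetical helper name, and the example commands in the comments are copied from the error output earlier in this thread.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run each step of a failing "a && b && c" chain
# separately and report which step fails and with what exit status.
run_steps() {
  local step status
  for step in "$@"; do
    echo "+ $step" >&2
    bash -c "$step"
    status=$?
    if [ "$status" -ne 0 ]; then
      echo "step failed with exit status $status: $step" >&2
      return "$status"
    fi
  done
  echo "all steps succeeded" >&2
}

# Example with the commands from the error output (not executed here):
#   run_steps \
#     "wget -q https://dist.whosonfirst.org/bundles/whosonfirst-data-postalcode-ve-latest.tar.bz2" \
#     "tar -xj --strip-components=1 --exclude=README.txt -C /data/whosonfirst -f whosonfirst-data-postalcode-ve-latest.tar.bz2" \
#     "rm whosonfirst-data-postalcode-ve-latest.tar.bz2" \
#     "mv /data/whosonfirst/whosonfirst-data-postalcode-ve-latest.csv /data/whosonfirst/meta"
```

Dropping -q from the wget step while debugging makes wget print its own error message, which the quiet flag otherwise hides.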

@jdomag

jdomag commented Jun 6, 2019

@jeremy-rutman
Unfortunately, I hit other errors with the SQLite downloads:
#452

@jeremy-rutman
Author

jeremy-rutman commented Jun 6, 2019

> @jeremy-rutman
> Unfortunately, I hit other errors with the SQLite downloads:
> #452

Did you try the wget->tar commands at the command line? That might work if all else fails, and if it doesn't work, it may point to why it's failing.
