Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bzip2 to lbzip2 migration to use all CPU cores #173

Merged
merged 6 commits into from
Mar 7, 2020

Conversation

loadit1
Copy link
Contributor

@loadit1 loadit1 commented Feb 25, 2020

Summary of changes:

  1. If lbzip2 installed on system we use it. If not, using legacy bzip2.
  2. Updated Dockerfile to install lbzip2 and parallel

Related to discussion about issue 480 in pelias/whosonfirst

Related also to #142 #143

Summary of changes:

If lbzip2 installed on system we use it. If not, using legacy bzip2.
Updated Dockerfile to install lbzip2
@missinglink
Copy link
Member

On the warpath ;)

See pelias/whosonfirst#480 (comment), if you re-open this on a new branch within our org then docker images will be automatically generated for testing 🎉

@missinglink
Copy link
Member

The test failures may be unrelated to your work, we recently moved the CDN assets:

BUCKET=https://storage.googleapis.com/pelias-data.geocode.earth/placeholder

I'll check it out and let you know

@missinglink
Copy link
Member

missinglink commented Feb 25, 2020

I just merged #174, please rebase origin/master to pull in a fix for the Travis test failure.

Summary of changes:

If lbzip2 installed on system we use it. If not, using legacy bzip2.
Updated Dockerfile to install lbzip2
@loadit1
Copy link
Contributor Author

loadit1 commented Feb 25, 2020

@missinglink sure, done.

@loadit1
Copy link
Contributor Author

loadit1 commented Feb 25, 2020

@missinglink done, please let me know if you still see some issues with indentation and I'll try to fix them if so. Thank you!

@loadit1
Copy link
Contributor Author

loadit1 commented Feb 26, 2020

Found that everything ready to use parallel (#142 #143), but Dockerfile was not updated to use it. As a result, we also use one core on pelias prepare all stage. Fixed Dockerfile

@loadit1 loadit1 requested a review from missinglink February 27, 2020 16:12
@loadit1
Copy link
Contributor Author

loadit1 commented Mar 3, 2020

Hello! Please let me know if this PR can be merged or you need any action from my side? Thank you!

@loadit1 loadit1 requested review from orangejulius and missinglink and removed request for missinglink March 7, 2020 11:49
@missinglink
Copy link
Member

Sorry I didn't get back to you.

I got mentally stuck on adding the parallel dependency, what relevance does it have to this issue?

I'm also not quite clear on how the code used to work without this?

The reason I'm worried is that I've had both successes and deflated failures using the parallel program, so I'm always worried about the latter.

The rest of the PR looks fine to me.

@missinglink
Copy link
Member

missinglink commented Mar 7, 2020

Agh ok so I remember now, the scripts which reference parallel are only used to generate new data extracts which are in turn used to generate the database.

Most people just download the data from Geocode Earth so they don't use these scripts.

Yea, I think it's fine to add these, I checked our build server and we do this before running the scripts:

apt-get install -y jq parallel pigz

Is pigz also required? It's the gzip equivalent of lbzip2.

Copy link
Member

@missinglink missinglink left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, took me a while but this should be fine

@missinglink missinglink merged commit 09ebaea into pelias:master Mar 7, 2020
@loadit1 loadit1 deleted the lbzip2 branch March 7, 2020 17:45
@loadit1
Copy link
Contributor Author

loadit1 commented Mar 7, 2020

@missinglink Thanks a lot!

As far I can see, pigz is using only in s3_upload.sh. I do not use this, so I'm not sure if I can recommend to add this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants