bzip2 to lbzip2 migration to use all CPU cores #480
Comments
Sounds reasonable, I'm not familiar with `lbzip2` myself. The only negative I could see is that it uses more RAM, but this is probably a decent trade-off considering the reduction in wall-clock decompression time. https://vbtechsupport.com/1614/

Could you please open a PR and do some testing to ensure compatibility?
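To illustrate why the swap is low-risk: `lbzip2` advertises the same basic command-line interface as `bzip2`, so the migration can be a one-word change. This is a hedged sketch, not the project's actual download code; the file paths are made up, and it falls back to plain `bzip2` when `lbzip2` is not installed.

```shell
# Prefer the multi-threaded lbzip2 when installed, otherwise fall
# back to plain bzip2 -- both accept the same -c/-d flags, so only
# the binary name changes. Paths below are purely illustrative.
BZ2=$(command -v lbzip2 || command -v bzip2)

printf 'whosonfirst test payload\n' > /tmp/wof_sample.txt
"$BZ2" -c /tmp/wof_sample.txt > /tmp/wof_sample.txt.bz2
"$BZ2" -dc /tmp/wof_sample.txt.bz2 > /tmp/wof_sample_out.txt
```

Because the flags match, whichever binary is selected produces an identical round-trip.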
Yeah, agreed, speeding up the process of decompressing the massive WOF downloads makes sense.
Hmmm, so according to that benchmark I linked, the extra memory usage could be significant. Definitely worth investigating a little bit more and possibly detecting/logging OOM errors on modest systems.
Hi! The PR has just been created: #481

Summary of changes:

Tested:
- using `lbzip2`
- using `bzip2`
Frankly speaking, the speed results are not so different, simply because we already use parallel download and run several instances of `bzip2` in parallel. But the reason I started to investigate this is that in PiP there is a single hardcoded sqlite file.
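The pattern described above, many single-threaded `bzip2` processes running side by side, one per downloaded file, can be approximated with `xargs -P`. This is a sketch of that pattern only; the three sample files are synthetic stand-ins, not real WOF archives.

```shell
# Decompress several .bz2 files concurrently, one single-threaded
# bzip2 process per file. Parallelism here comes from running many
# instances at once, not from a multi-threaded compressor.
mkdir -p /tmp/par_demo
cd /tmp/par_demo
for i in 1 2 3; do printf 'payload %s\n' "$i" | bzip2 -c > "file$i.bz2"; done

# -P 3 runs up to three bzip2 processes at the same time;
# -k keeps the originals, -f overwrites any previous output.
printf '%s\n' file1.bz2 file2.bz2 file3.bz2 | xargs -P 3 -I{} bzip2 -dkf {}
cat file1 file2 file3
```

This approach scales with the number of files, whereas `lbzip2` also parallelises within a single large file, which matters when one hardcoded sqlite archive dominates the run.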
Seems like a reasonable trade-off to me. It's unlikely anyone is running this on an extremely low-memory machine.

@loadit1 how many CPU cores did you have for your test? Even with the parallel download, on the Threadripper the CPU was never pushed very hard with `bunzip2`, so it would make a significantly bigger difference on larger systems.
I don't remember the exact specs of the VM for my tests above, so I decided to test once again with two options:
Both options were tested on Medium and Tiny VM setups.

HW specs of my host machine:

Test 1: Medium VM setup
Test 2: Tiny VM setup
My test bash script:
You may use my bash test script to test other HW configs, even with less RAM, if smaller setups are widely used.
Thanks for the benchmarks, they really help to put my mind at ease! Our docs say the absolute minimum RAM for Pelias is 8GB and we really recommend 16GB.

We just recently suffered a bug where running another program on a 64-core machine caused OOM errors due to each core potentially using >2GB RAM at peak, so I wanted to avoid that here. It's a fairly recent issue, since virtualization/containerization allows for uncommon CPU/RAM ratios these days 🤷♂️

Looking at your tests it seems that a ratio of 1GB per CPU core is adequate to avoid OOM errors. I'm happy to merge this, thanks for taking the time to investigate 😃
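For machines with an unusual CPU/RAM ratio, `lbzip2` can be told to use fewer worker threads via its `-n` flag, which would let the ~1GB-per-core rule of thumb above be enforced explicitly. A hedged sketch: the `MemTotal` probe is Linux-specific and the cap formula is illustrative, not anything Pelias actually ships.

```shell
# Derive a worker cap of roughly one lbzip2 thread per GB of RAM,
# per the ratio observed in the benchmarks. Reads MemTotal (in kB)
# from /proc/meminfo, so this only works on Linux.
mem_gb=$(awk '/MemTotal/ {printf "%d", $2 / 1048576}' /proc/meminfo)
workers=$(( mem_gb > 0 ? mem_gb : 1 ))

# some_archive.bz2 is a placeholder; -n caps lbzip2's threads.
echo "suggested invocation: lbzip2 -n $workers -dc some_archive.bz2"
```

On a 64-core, 16GB box this would cap `lbzip2` at 16 threads instead of letting it spawn one per core.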
Thanks @loadit1, I have added you to our @pelias/contributors team, which means you can create your own branches within the pelias org repos. When you do that, please prefix the branch with your username.

Thanks 🎉
@missinglink good to know, thank you!
Thanks @loadit1 for all the PRs and the extensive benchmarks :) The faster extraction should really help!
Hi!

Line 30 in sqlite_download.sh:

Line 53 in download_sqlite_all.js:

Would you be so kind to advise me why not to migrate from `bzip2` to `lbzip2` in order to increase performance by using all CPU cores?

Thanks!