Out of memory error #4

Open
SWittouck opened this issue Apr 10, 2020 · 6 comments

Comments

@SWittouck

Dear Zhemin,

Thank you for making PEPPA publicly available and for putting the publication on bioRxiv, it's a very nice read!

I managed to install PEPPA successfully and tried to do a test run on 73 genomes of the order Lactobacillales. After a few minutes I got an out of memory error (memory was indeed full) and the job aborted. Is there anything I can do to solve this? I have 16GB of memory and was using all 16 threads I have available.

Best wishes,
Stijn

@zheminzhou
Owner

Because of the limitations of multi-threading in Python, part of the parallel computation is handled by multiple processes, and all data held in memory is replicated in each process. Please try running PEPPA with fewer processes (e.g., 4). I will close this issue for now, but please re-open it if you still get an out-of-memory problem.
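
To make the replication point concrete, here is a rough back-of-the-envelope sketch. It is not PEPPA code: it simply assumes that the parent process and every worker process each hold one full copy of the in-memory data, and the 1.5 GB figure is a made-up example value. The function names are hypothetical.

```python
def estimated_memory_gb(data_gb: float, n_workers: int) -> float:
    """Estimate total memory if the parent and each worker hold a full copy.

    Assumption (not measured from PEPPA): one copy per process, no sharing.
    """
    return data_gb * (n_workers + 1)


def max_workers(total_gb: float, data_gb: float) -> int:
    """Largest worker count whose estimated footprint fits in total_gb."""
    if data_gb <= 0:
        raise ValueError("data_gb must be positive")
    # One copy is reserved for the parent process.
    return max(0, int(total_gb // data_gb) - 1)


if __name__ == "__main__":
    data_gb = 1.5  # hypothetical size of the in-memory data
    for n in (16, 4, 1):
        print(f"{n} workers: ~{estimated_memory_gb(data_gb, n)} GB")
    print("workers that fit in 16 GB:", max_workers(16, data_gb))
```

Under these assumptions, the estimate grows linearly with the number of processes, which is why lowering the process count is the first thing to try on a 16 GB machine.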

@SWittouck
Author

Dear Zhemin,

Thank you for your suggestion, I will try this.

Best wishes,
Stijn

@SWittouck
Author

Dear Zhemin,

I tried running with fewer threads, as you suggested, even down to a single thread. Unfortunately, the issue remained. I have attached the log file with the error; it seems to occur in the BLASTn step.

Best wishes,
Stijn
peppa.log

@zheminzhou
Owner

I have published PEPPA on PyPI with a formal version number, 1.0.
The code in this version has been revised to reduce memory usage.
You can install it with Python 3 (>= 3.5) via
pip install bio-peppa
The executable is named 'PEPPA' by default.
I hope this solves the memory leak problem.

@zheminzhou zheminzhou reopened this Apr 22, 2020
@SWittouck
Author

Hi Zhemin,

I installed PEPPA version 1.0 using pip, as you suggested. It didn't fix the problem: I still got out-of-memory errors, no matter how many threads I used. However, I took a closer look at how PEPPA works, and it seems to me that it may not be suited for datasets above the genus level, whereas my genome dataset is at the order level; I think the blastn searches are not sensitive enough for those. When I set --clust_identity to 0.5, --clust_match_prop to 0.6 and --match_identity to 0.5, the error no longer occurred! So I'm still not sure what caused the error, and I think my dataset is outside the scope of PEPPA anyway, but at least the error is resolved. Thank you for your help!

I have one additional remark: I found a bug in PEPPA_parser.py. In line 64, there is one ] too many.

Best regards,
Stijn

@zheminzhou
Owner

Thank you for the bug report (again) and for the solution you found. PEPPA allows "--match_identity" values down to a lower limit of 0.4, so your value of 0.5 is fine. However, the "--clust_identity" and "--clust_match_prop" values are certainly outside the range I have tested. I think the phylogeny-based paralog splitting will still be able to handle this, but I am not sure.

I will push the fix for the bug in PEPPA_parser.py later this week.
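
For anyone wanting to reproduce the relaxed settings from this thread, the following is a minimal sketch of a helper that assembles the command line. Only the three flags and the 0.4 lower limit for --match_identity come from the discussion above; the helper name, the idea of passing input files as trailing positional arguments, and the example file names are assumptions, so check them against PEPPA's own help output before use. The command is only printed, not executed.

```python
import shlex

# Lower limit for --match_identity as stated by the maintainer in this thread.
MIN_MATCH_IDENTITY = 0.4


def build_peppa_args(inputs, clust_identity, clust_match_prop, match_identity):
    """Return an argument list for PEPPA with the given thresholds.

    Hypothetical helper: passing input files as trailing positional
    arguments is an assumption, not confirmed in this thread.
    """
    thresholds = {
        "--clust_identity": clust_identity,
        "--clust_match_prop": clust_match_prop,
        "--match_identity": match_identity,
    }
    for flag, value in thresholds.items():
        if not 0 < value <= 1:
            raise ValueError(f"{flag} must be in (0, 1], got {value}")
    if match_identity < MIN_MATCH_IDENTITY:
        raise ValueError(
            f"--match_identity must be >= {MIN_MATCH_IDENTITY}, got {match_identity}"
        )
    args = ["PEPPA"]
    for flag, value in thresholds.items():
        args += [flag, str(value)]
    return args + list(inputs)


if __name__ == "__main__":
    # Example file names are placeholders.
    cmd = build_peppa_args(["genome1.gff", "genome2.gff"], 0.5, 0.6, 0.5)
    print(shlex.join(cmd))
```

Note that the clustering values used here are outside the range the maintainer has tested, so results at the order level should be checked carefully.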
