Out of memory error #4

SWittouck · 2020-04-10T12:50:06Z

Dear Zhemin,

Thank you for making PEPPA publicly available and for putting the publication on bioRxiv, it's a very nice read!

I managed to install PEPPA successfully and tried to do a test run on 73 genomes of the order Lactobacillales. After a few minutes I got an out of memory error (memory was indeed full) and the job aborted. Is there anything I can do to solve this? I have 16GB of memory and was using all 16 threads I have available.

Best wishes,
Stijn

zheminzhou · 2020-04-10T21:50:08Z

Because of the limitations of multi-threading in Python, part of the parallel computation is handled by multiple processes, and all data in memory is replicated in each process. Please try running PEPPA with fewer processes (e.g., 4). I will close this issue for now, but please re-open it if you still get an out-of-memory problem.
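To make the reasoning above concrete, here is a rough back-of-the-envelope sketch. It takes the description at face value (every process holds its own full copy of the data) and estimates how many processes fit in a given amount of RAM. The function name, the per-process footprint and the reserve for the operating system are all hypothetical illustration values, not figures measured from PEPPA.

```python
import math


def max_worker_processes(total_mem_gb, per_process_gb, reserve_gb=1.0):
    """Largest process count whose replicated data fits in memory.

    Assumes each process holds a full, separate copy of the data,
    so total use is roughly n_processes * per_process_gb.
    """
    if per_process_gb <= 0:
        raise ValueError("per_process_gb must be positive")
    usable_gb = total_mem_gb - reserve_gb  # leave some room for the OS
    return max(0, math.floor(usable_gb / per_process_gb))


# Hypothetical figures: a 16 GB machine where each process holds 3 GB.
print(max_worker_processes(16, 3.0))  # 5
```

A result of 0 means that even a single process does not fit, in which case lowering the process count alone cannot resolve the error.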

SWittouck · 2020-04-11T14:19:13Z

Dear Zhemin,

Thank you for your suggestion, I will try this.

Best wishes,
Stijn

SWittouck · 2020-04-12T05:13:43Z

Dear Zhemin,

I tried to run with fewer threads, as you suggested, even down to a single thread. Unfortunately, the issue remained. I have attached the log file with the error; it seems to occur in the BLASTn step.

Best wishes,
Stijn
peppa.log

zheminzhou · 2020-04-22T14:20:35Z

I have pushed PEPPA to PyPI with a formal version number, 1.0.
The code in this version has been revised to reduce memory usage.
You can install it with Python 3 (>= 3.5) via
pip install bio-peppa
The executable is 'PEPPA' by default.
I hope this solves the memory leak problem.

SWittouck · 2020-04-23T09:25:28Z

Hi Zhemin,

I installed PEPPA version 1.0 using pip, as you suggested. It didn't fix the problem: I still got out-of-memory errors, no matter how many threads I used. However, I took a closer look at how PEPPA works, and it seems to me that it is not suited for datasets above the genus level? My genome dataset is at the order level, and I think the blastn searches are not sensitive enough for that. When I set --clust_identity to 0.5, --clust_match_prop to 0.6 and --match_identity to 0.5, the error no longer occurred! So I'm still not sure what caused the error, and I think my dataset is outside the scope of PEPPA anyway, but at least the error is gone. Thank you for your help!

I have one additional remark: I found a bug in PEPPA_parser.py. On line 64, there is one ] too many.

Best regards,
Stijn

zheminzhou · 2020-04-24T08:48:34Z

Thank you for the bug report (again) and the solution you found. PEPPA allows "--match_identity" to go as low as 0.4, so your value of 0.5 is fine. However, the "clust_identity" and "clust_match_prop" values are certainly outside my testing scope. I think the phylogeny-based paralog splitting will still be able to handle this, but I am not sure.
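As a small illustration of the settings discussed here, the sketch below builds the three threshold options as an argument list and rejects a "--match_identity" below the 0.4 floor mentioned above. The helper name is hypothetical, and the check that every value lies in (0, 1] is an assumption, since the thread does not state limits for the two clustering options. The rest of the PEPPA command line (inputs, output prefix, thread count) is not shown in this thread and is deliberately left out.

```python
from typing import List

MATCH_IDENTITY_FLOOR = 0.4  # lower limit stated in the reply above


def threshold_args(clust_identity: float,
                   clust_match_prop: float,
                   match_identity: float) -> List[str]:
    """Return the three threshold options as command-line arguments."""
    values = {
        "--clust_identity": clust_identity,
        "--clust_match_prop": clust_match_prop,
        "--match_identity": match_identity,
    }
    for flag, value in values.items():
        # Assumption: all three thresholds are fractions in (0, 1].
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{flag} must be in (0, 1], got {value}")
    if match_identity < MATCH_IDENTITY_FLOOR:
        raise ValueError("--match_identity must be at least 0.4")

    args: List[str] = []
    for flag, value in values.items():
        args.extend([flag, str(value)])
    return args


# The values reported to avoid the error on the order-level dataset.
print(threshold_args(0.5, 0.6, 0.5))
```

The returned list can be appended to whatever PEPPA invocation is already in use, for example when launching it through subprocess.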

I will push the fix for the bug in PEPPA_parser.py later this week.

zheminzhou closed this as completed Apr 10, 2020

zheminzhou reopened this Apr 22, 2020
