loading package taking too much memory #44

Open
spocks opened this issue Aug 27, 2020 · 2 comments

spocks commented Aug 27, 2020

SummarizedExperiment uses too much memory and takes too long to load. The code below and its results indicate that loading it takes 328 MB of memory. For a single use this is not a big issue. However, calling library(SummarizedExperiment) in a high-performance computing environment causes the code to crash due to this high memory use.

library(profvis)
profvis({
  library(SummarizedExperiment)
})

[Screenshot: profvis output, Screenshot_2020-08-27_13-37-48]
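For readers without profvis, the measurement above can be roughly approximated with base R alone. This is a minimal sketch, not the method used in the report: `namespace_load_cost` is a hypothetical helper name, and the figure it returns (memory still in use after loading, as reported by `gc()`) is not the same quantity that profvis displays, so the numbers are unlikely to match exactly.

```r
# Rough estimate of the memory retained and the time spent when a namespace
# is loaded. `namespace_load_cost` is a hypothetical helper, not part of any package.
namespace_load_cost <- function(pkg) {
  if (isNamespaceLoaded(pkg)) {
    stop("'", pkg, "' is already loaded; run this in a fresh R session")
  }
  # Column 2 of gc() holds the memory currently in use, in Mb,
  # for Ncells and Vcells; summing gives the total.
  before  <- sum(gc(reset = TRUE)[, 2])
  elapsed <- system.time(loadNamespace(pkg))[["elapsed"]]
  after   <- sum(gc()[, 2])
  c(mb = after - before, seconds = elapsed)
}

# Example call, in a fresh session with the package installed:
# namespace_load_cost("SummarizedExperiment")
```

The result also includes the cost of every dependency that gets loaded along the way, which is usually what matters when sizing a job.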

vjcitn commented Aug 28, 2020

This observation needs to be spelled out more. What kinds of high-performance applications are suffering from the size/load time you are commenting on? Can you restructure the data access tasks so that the R processes operating on the data don't need the package to be attached? Selective importing of symbols can also reduce memory footprint.

Pascallio commented Jan 4, 2024

It has been 3.5 years since this issue was opened and there is still no solution, so I wrote up some guidance for package developers who have noticed this and want to minimize their memory footprint. In short, do not import SummarizedExperiment (SE) in your NAMESPACE file, but do list it under Imports in the DESCRIPTION file. This ensures that users install SE as a dependency when installing your package, while its namespace is only loaded when it is actually needed. Also, do not use the roxygen2 tags @import SummarizedExperiment or @importFrom SummarizedExperiment … in your function documentation, as they generate NAMESPACE import directives and therefore still load the SE namespace whenever your package is loaded.
Instead, write functions that call SE through the :: operator, as follows:

exp <- SummarizedExperiment::SummarizedExperiment(…)
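To make the pattern concrete, here is a minimal sketch of how the pieces might fit together in a package. The function name `as_se` and the assay name `counts` are hypothetical and only serve as illustration; the DESCRIPTION and NAMESPACE excerpts in the comments show the assumed layout, not the contents of any real package.

```r
# DESCRIPTION (excerpt, assumed layout):
#   Imports:
#       SummarizedExperiment
#
# NAMESPACE: contains no import(SummarizedExperiment) and no
# importFrom(SummarizedExperiment, ...) directive.

# `as_se` is a hypothetical function in your package.
as_se <- function(counts) {
  # The SE namespace is loaded here, on the first call,
  # rather than when your own package is loaded.
  SummarizedExperiment::SummarizedExperiment(assays = list(counts = counts))
}

# Defining the function does not evaluate `::`, so this can be used to
# check whether anything else has already loaded the namespace.
isNamespaceLoaded("SummarizedExperiment")
```

Users who never reach the optional SE step never pay for loading it, while those who do get it installed automatically through the Imports field.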

The SE container depends on other (heavy) packages like IRanges, GenomeInfoDb, SparseArray, etc. These packages all depend on each other, so solving this at a fundamental level would require each of them to release an update that changes its namespace.

In my case, it is convenient to use an SE container in a late, optional stage of my pipeline. However, if SE is in the package namespace, it adds about 500 MB per core on a multicore cluster (on Windows). This is because R loads the entire namespace when creating a (SNOW) cluster, even if it is not being used. While this may be a niche use case, avoiding it saves me from importing 4 GB of unused dependencies when using 8 cores.
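One way to check this on your own setup is to ask each worker directly whether the namespace is loaded and how much memory it is using. The following is a minimal sketch using the parallel package that ships with R; `probe` and `worker_footprint` are hypothetical helper names, and two local socket workers are assumed purely for illustration.

```r
library(parallel)

# Runs on a worker: is the namespace loaded, and how many Mb are in use?
# Defined at top level so that only the function itself is sent to the workers.
probe <- function(pkg) {
  list(loaded = isNamespaceLoaded(pkg), mb = sum(gc()[, 2]))
}

# Collect the probe results from every worker into one data frame.
# `worker_footprint` is a hypothetical helper, not part of any package.
worker_footprint <- function(cl, pkg) {
  res <- clusterCall(cl, probe, pkg)
  data.frame(
    worker = seq_along(res),
    loaded = vapply(res, function(x) x$loaded, logical(1)),
    mb     = vapply(res, function(x) x$mb, numeric(1))
  )
}

cl <- makePSOCKcluster(2)  # socket (SNOW-style) workers
fp <- worker_footprint(cl, "SummarizedExperiment")
print(fp)
stopCluster(cl)
```

Running the same check after sending a function from your own package to the workers (for example with `clusterCall()`) shows whether your package pulls SE in on each of them.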

Finally, if you do list SummarizedExperiment in your NAMESPACE, any other package that depends on yours will suffer from the same issue, so removing it from the NAMESPACE file could prevent similar issues in the future.
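To see whether an installed package (yours, or one you depend on) declares such an import, its NAMESPACE file can be inspected without loading anything. This is a minimal sketch built on base R's `parseNamespaceFile()`; `imports_from` is a hypothetical helper name, and it only looks at the package's own NAMESPACE, not at the NAMESPACE files of its dependencies.

```r
# Does an installed package's NAMESPACE import anything from `target`?
# `imports_from` is a hypothetical helper; it reads the NAMESPACE file
# and does not load either package.
imports_from <- function(pkg, target = "SummarizedExperiment") {
  lib <- dirname(find.package(pkg))
  ns  <- parseNamespaceFile(pkg, lib)
  # import()/importFrom() entries plus the S4 class and method imports.
  directives <- c(ns$imports, ns$importClasses, ns$importMethods)
  # The first element of every entry is the name of the source package.
  sources <- vapply(directives, function(d) as.character(d[[1]]), character(1))
  target %in% sources
}

# Example call with a base package; replace "stats" with your package name.
imports_from("stats", "SummarizedExperiment")
```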

Hope this helps!

