loading package taking too much memory #44

spocks · 2020-08-27T18:00:25Z

SummarizedExperiment uses too much memory and time to load. The code below and its results indicate that loading takes 328 MB of memory. For a single session this is not a big issue. However, calling library(SummarizedExperiment) on a high-performance computing system causes the code to crash due to this high memory use.

library(profvis)
profvis({
  library(SummarizedExperiment)
})
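
As a rough cross-check of the profvis figure, the heap growth from loading a namespace can be estimated with gc(). This is only a sketch: `ns_mem_delta` is a hypothetical helper name, it counts only memory that gc() reports, and the result is meaningful only in a fresh session where the package is not yet loaded.

```r
# Hypothetical helper (not part of any package): growth of R-managed memory,
# in MB, caused by loading a namespace.
ns_mem_delta <- function(pkg) {
  before <- sum(gc()[, 2])  # column 2 of gc() is "used" memory in Mb
  loadNamespace(pkg)        # loads without attaching; no-op if already loaded
  after <- sum(gc()[, 2])
  after - before
}

# Example call, best run in a fresh session with the package installed:
# ns_mem_delta("SummarizedExperiment")
```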

vjcitn · 2020-08-28T15:25:21Z

This observation needs to be spelled out more. What kinds of high-performance applications are suffering from the size/load time you are commenting on? Can you restructure the data access tasks so that the R processes operating on the data don't need the package to be attached? Selective importing of symbols can also reduce memory footprint.

Pascallio · 2024-01-04T20:11:31Z

Since this issue was raised about 3.5 years ago and remains unsolved, I wrote some help for package developers who have noticed it and want to minimize their memory footprint. In short, do not add SummarizedExperiment (SE) to your NAMESPACE file, but do list it under Imports in the DESCRIPTION file. This ensures that users install SE as a dependency when installing your package, while its namespace is only loaded when needed. Also, do not use @import SummarizedExperiment or @importFrom SummarizedExperiment … in your function documentation, because those tags add import directives to NAMESPACE, so the SE namespace is still loaded together with your package.
Instead, create functions that use a SummarizedExperiment as follows:

exp <- SummarizedExperiment::SummarizedExperiment(…)
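
A minimal sketch of such a function, assuming SE is listed under Imports in DESCRIPTION and has no import directive in NAMESPACE. The names `as_se` and `counts` are hypothetical and only for illustration; the requireNamespace() check is optional when SE is a hard dependency.

```r
# Hypothetical helper: wraps a count matrix in a SummarizedExperiment.
as_se <- function(counts) {
  # Fail with a clear message if the dependency is not installed.
  if (!requireNamespace("SummarizedExperiment", quietly = TRUE)) {
    stop("Package 'SummarizedExperiment' is required for as_se().",
         call. = FALSE)
  }
  # The namespace is loaded when this function is called,
  # not when the function is defined or the package is loaded.
  SummarizedExperiment::SummarizedExperiment(
    assays = list(counts = counts)
  )
}
```

Checking "SummarizedExperiment" %in% loadedNamespaces() before and after the first call to as_se() should show that the namespace only appears once the function has actually been used.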

The SE container depends on other (heavy) packages like IRanges, GenomeInfoDb, SparseArray, etc. These packages all depend on each other, so solving this at a fundamental level would require each of them to release an update that changes its namespace.

In my case, it is convenient to use an SE container in a late, optional stage of my pipeline. However, if SE is in the package namespace, it adds about 500 MB per core in a multicore cluster (on Windows). This is because R loads the entire namespace when creating a (SNOW) cluster, even if it is not being used. While it may be a niche use case, keeping SE out of the NAMESPACE saves me from loading about 4 GB of unused dependencies when using 8 cores.
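
One way to check what the workers actually carry is to ask them directly. The sketch below uses the parallel package; the cluster size of 2 is arbitrary, and "myPackage" in the comment is a placeholder for your own package, not a real one.

```r
library(parallel)

# Start a small PSOCK (SNOW-style) cluster; 2 workers is an arbitrary choice.
cl <- makeCluster(2)

# To test a real package, first load it on the workers, for example:
# clusterEvalQ(cl, library(myPackage))  # "myPackage" is a placeholder

# Ask each worker whether the SummarizedExperiment namespace is loaded.
se_loaded <- unlist(
  clusterEvalQ(cl, "SummarizedExperiment" %in% loadedNamespaces())
)

stopCluster(cl)
print(se_loaded)
```

If a worker reports TRUE after loading only your package, some import directive in your NAMESPACE (or in that of one of your dependencies) is still pulling SE in.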

Finally, if you do list SummarizedExperiment in your NAMESPACE, any other package that depends on yours will suffer from the same issue, so removing it from the NAMESPACE file could prevent similar issues in the future.

Hope this helps!