Memory Leak when calling predict inside of RStudio #16

Open
abelsonlive opened this issue Sep 17, 2015 · 6 comments
Comments

@abelsonlive

I've had a persistent bug when calling predict on a bigrf model inside RStudio. There appears to be a memory leak: RStudio consumes all of my machine's RAM and forces me to shut down my computer. Curiously, this does not happen when I run my script from the command line using Rscript.

@abelsonlive
Author

[Screenshot: 2015-09-17 14:26:02]

@aloysius-lim
Owner

Are you running predict in parallel? Can you share a code snippet?

@abelsonlive
Author

No, I'm not running it in parallel. It's a pretty straightforward implementation. It's hard to share the exact code because it has been abstracted out into separate functions, but it's basically this:

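library(bigrf)

# Hold out 40% of the rows for testing
samp <- sample(seq_len(nrow(iris)), nrow(iris) * 0.6)
train <- iris[samp, ]
test <- iris[-samp, ]

# Fit a small classification forest on the four predictor columns
m <- bigrfc(train,
            train$Species,
            ntree = 10,
            varselect = 1:4,
            trace = 1)
# This is the call that triggers the memory growth in RStudio
p <- predict(m, test)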

The real test set is ~2 GB, and I'm running it on a machine with 16 GB of RAM.
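
One way to narrow down a report like this is to measure how much the R heap grows across a single call. The following is a minimal sketch using only base R's `gc()`. The helper name `heap_delta_mb` is hypothetical and not part of bigrf, and the workload is a stand-in so the example runs without bigrf installed. Note the assumption that the growth is visible to `gc()`: memory allocated outside the R heap (for example by native code or memory-mapped matrices) would not show up here and would need to be checked with an OS-level tool instead.

```r
# Hypothetical helper (not part of bigrf): report the change in R heap usage,
# in MB as reported by gc(), caused by evaluating an expression.
heap_delta_mb <- function(expr) {
  before <- sum(gc()[, 2])   # column 2 of gc() is memory currently in use, in Mb
  result <- force(expr)      # the argument is lazy, so it is evaluated here
  after <- sum(gc()[, 2])
  list(result = result, delta_mb = after - before)
}

# Stand-in workload so the sketch runs on its own;
# in the scenario above this would be predict(m, test).
out <- heap_delta_mb(numeric(5e6))  # 5e6 doubles, roughly 40 MB
cat(sprintf("R heap grew by %.1f MB\n", out$delta_mb))
```

Running the same measurement once under RStudio and once under Rscript would show whether the difference lies in the R heap at all. The reported delta includes the size of the returned value, since it is still referenced when the second `gc()` runs.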

@abelsonlive
Author

You can see the repository here: https://github.com/enigma-io/smoke-alarm-risk.
The functions in question are here: https://github.com/enigma-io/smoke-alarm-risk/blob/master/rscripts/model.R

@abiyug

abiyug commented Dec 3, 2015

How many processor cores does your computer have? How long does it take to train on the 2 GB data?

@ajnisbet

I have the same issue, with basically the same code as @abelsonlive.

The dataset is 300 MB, on a 4-core machine with 6 GB of RAM, using 50 trees. The problem occurs with and without parallel.

Using R 3.2.3 on Fedora 23.
