
Large grain crashes Davros #136

Open
griff opened this issue Sep 21, 2021 · 19 comments

Comments

@griff
Contributor

griff commented Sep 21, 2021

I have a grain that I used to sync all my photos to, so it is about 11 GB, and it can't start anymore.

As best I can tell from the Sandstorm logs, the problem is the preview file cache storing its data in the tmp dir, which is memory-backed in the grain, so it fills that up and crashes.
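For what it's worth, a quick way to verify that assumption from Node (not anything in Davros itself, just a sketch; it assumes Node 18.15+ for fs.promises.statfs) would be something like:

```typescript
// Sketch: check whether the tmp directory is on a memory-backed (tmpfs) filesystem.
import { statfs } from "fs/promises";

const TMPFS_MAGIC = 0x01021994; // Linux filesystem type magic number for tmpfs

async function isTmpfs(path: string): Promise<boolean> {
  const stats = await statfs(path);
  return stats.type === TMPFS_MAGIC;
}

// Check whatever tmp directory the preview cache is using.
isTmpfs(process.env.TMPDIR ?? "/tmp").then((memoryBacked) =>
  console.log(memoryBacked ? "tmpfs (memory-backed)" : "regular file storage")
);
```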

@mnutt
Owner

mnutt commented Sep 22, 2021

Hmm, that's something I had not considered! I could have the thumbnailer expire images via LRU or something, probably as a sort of background job. Is this something that happens over a long period of time, or do you have one single large directory where merely viewing it fills up the memory and crashes?
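Something along these lines, maybe; this is only a sketch of the idea, and the cache directory and size cap are placeholders rather than anything Davros does today:

```typescript
// Sketch: LRU-style cleanup job for a thumbnail cache directory. Deletes the
// least recently accessed files once the cache grows past a size cap.
import { readdir, stat, unlink } from "fs/promises";
import { join } from "path";

const CACHE_DIR = "/var/davros/tmp/thumbnails"; // placeholder path
const MAX_CACHE_BYTES = 200 * 1024 * 1024;      // placeholder cap: 200 MB

async function expireThumbnails(): Promise<void> {
  const names = await readdir(CACHE_DIR);
  const entries = await Promise.all(
    names.map(async (name) => {
      const path = join(CACHE_DIR, name);
      const s = await stat(path);
      return { path, size: s.size, atimeMs: s.atimeMs };
    })
  );

  let total = entries.reduce((sum, e) => sum + e.size, 0);

  // Evict oldest-accessed files first until we are back under the cap.
  entries.sort((a, b) => a.atimeMs - b.atimeMs);
  for (const e of entries) {
    if (total <= MAX_CACHE_BYTES) break;
    await unlink(e.path);
    total -= e.size;
  }
}

// Run periodically as a background job, e.g. once an hour.
setInterval(() => expireThumbnails().catch(console.error), 60 * 60 * 1000);
```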

@mnutt
Owner

mnutt commented Sep 22, 2021

Hmm, as I look at it a bit more, I don't recall previews/thumbnails being stored on a memory-backed filesystem. If you start Davros outside of Sandstorm on a Unix system it will likely put thumbnails in /tmp, but within Sandstorm these end up in /var/davros/tmp, which should be file storage:

https://github.com/mnutt/davros/blob/master/.sandstorm/sandstorm-pkgdef.capnp#L160

Maybe it's some sort of leak in the thumbnailing itself, or Davros trying to generate too many thumbnails at the same time?
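(For context, the behaviour I am describing is roughly the following; this is an illustration only, not Davros's actual code, and checking the SANDSTORM environment variable is my assumption about how to detect running inside a grain:)

```typescript
// Illustration: pick a temp directory for thumbnails depending on where we run.
import { tmpdir } from "os";

function thumbnailTmpDir(): string {
  if (process.env.SANDSTORM === "1") {
    // Inside a Sandstorm grain, /var is the grain's persistent storage.
    return "/var/davros/tmp";
  }
  // Outside Sandstorm, fall back to the system temp dir (often /tmp).
  return tmpdir();
}

console.log(`Thumbnails will be cached under ${thumbnailTmpDir()}`);
```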

@ocdtrekkie
Contributor

This should probably be treated as a blocking issue for approval. I'm not sure how many people have exceptionally large Davros grains, but I am concerned that if we don't suss out the issue here, we will find out how many people have very large Davros grains. ;)

I know there was some further discussion on IRC; did we get anywhere in identifying exactly what the issue was? @griff, you mention some logs; can you share them here, sanitized if necessary?

@griff
Contributor Author

griff commented Sep 23, 2021

I have just put my new grain through its paces and I can't reproduce my own problem, so I am closing this issue.

I first uploaded all the pictures stored on my computer to the grain (11 GB), but in multiple folders.

And I have just now finished uploading all the pictures from my phone (10 GB) using the same method that was used to populate the failing grain. It creates a single folder with all 2600 images and videos in it, and while loading the Davros view of just that folder is a bit slow, I haven't noticed any breakage.

@griff griff closed this as completed Sep 23, 2021
@griff
Contributor Author

griff commented Sep 23, 2021

Sorry for the inconvenience!

@Michael-S

I am able to reproduce this issue with a Davros grain that has over 1000 images. A backup of the grain is available here if anyone else wants to try. It contains a lot of NSFW language - it's a collection of memes I share with family and friends. There is no nudity. https://2oibt9mht7i0o2w4is69.ducky.sandcats.io/Davros_funnies_in_line.zip

@ocdtrekkie
Contributor

@griff or @mnutt , can we reopen this?

@griff griff reopened this Oct 31, 2021
@griff
Contributor Author

griff commented Nov 28, 2021

I have found the underlying problem that was causing my issue. It is this: sandstorm-io/sandstorm#3512

@ocdtrekkie
Contributor

Okay, that would arguably make this no longer a Davros issue, unless @mnutt intends to find some way to create fewer files when making thumbnails... which seems unrealistic?

Are you able to raise your fs.inotify.max_user_watches value on the box in question? It sounds like in kernel 5.11 and up, Linux will more intelligently set this default value based on the memory of your machine.

@Michael-S

Thank you to all who looked into this! I changed my fs.inotify.max_user_watches to 32768 and restarted; no dice. I do not see the error from sandstorm-io/sandstorm#3512 in my logs. I do see this in sandstorm.log now:

sandstorm/gateway.c++:1072: error: exception = kj/compat/http.c++:1851: failed: expected !inBody; previous HTTP message body incomplete; can't write more messages
stack: 4c8412 4ff5e4 4a99bf 4f60da 4f70d1 544fa1 4fe500

I would swear that error in the log is new. I haven't touched C++ in 16 years, but I'll take a look at that file and see if anything useful pops out at me.

@ocdtrekkie
Contributor

@griff Did you see the error in your Sandstorm log by chance?

@Michael-S Do you know if that setting is machine-specific or user-specific where you changed it? I want to say it might be the latter, and Sandstorm runs as its own user account. (I don't even know how to set that setting; I'm just making ballpark guesses based on what I read.)

@Michael-S

Michael-S commented Nov 29, 2021

I changed it in /etc/sysctl.conf and rebooted the VM, so I don't think that's it.
Edit: to inspect the value, run cat /proc/sys/fs/inotify/max_user_watches.
To change the value, you can set it dynamically, but the easy way is to add a line to /etc/sysctl.conf:
fs.inotify.max_user_watches=32768
and then restart.
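If it helps with debugging, here is a rough Node sketch (my own, untested; it needs root to see every process) that compares the configured limit against the number of watches actually in use:

```typescript
// Sketch: count inotify watches in use by scanning /proc/<pid>/fdinfo for
// "inotify wd:" entries, and compare against the system-wide limit.
import { readdirSync, readFileSync } from "fs";

const limit = parseInt(
  readFileSync("/proc/sys/fs/inotify/max_user_watches", "utf8"),
  10
);

let inUse = 0;
for (const pid of readdirSync("/proc").filter((d) => /^\d+$/.test(d))) {
  let fds: string[] = [];
  try {
    fds = readdirSync(`/proc/${pid}/fdinfo`);
  } catch {
    continue; // process exited or permission denied
  }
  for (const fd of fds) {
    try {
      const info = readFileSync(`/proc/${pid}/fdinfo/${fd}`, "utf8");
      inUse += (info.match(/^inotify wd:/gm) ?? []).length;
    } catch {
      // fd closed while scanning; skip it
    }
  }
}

console.log(`inotify watches: ${inUse} in use, limit ${limit}`);
```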

@zenhack

zenhack commented Nov 29, 2021

@ocdtrekkie, it is not user specific.

@ocdtrekkie
Contributor

🤔 So do @griff and @Michael-S have different issues then? I am really curious whether @griff found the sandstorm/supervisor.c++:232: overloaded: inotify_add_watch: No space left on device error in his system log, since @Michael-S did not.

@griff
Contributor Author

griff commented Nov 29, 2021

@ocdtrekkie I got inotify_add_watch in the log, and increasing fs.inotify.max_user_watches fixed my issue. So it does look like two different issues.

@ocdtrekkie
Contributor

As a stupid check (on myself): I asked whether @Michael-S saw the inotify_add_watch error in the Sandstorm/system log, but the other issue specifies that it appears in the grain log. @Michael-S Nothing in the grain log for the grain that won't start for you, right?

@Michael-S

Right, nothing in the grain log and nothing related to inotify in the Sandstorm log.

@ocdtrekkie
Contributor

Okay, thanks. I figured, but I just wanted to confirm.

@mnutt
Owner

mnutt commented Nov 30, 2021

Hmm, at some point maybe I can explore storing thumbnails in a SQLite database or something.
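Roughly along these lines, if I ever go that route; this is only a sketch using better-sqlite3, with a made-up schema and helper names:

```typescript
// Sketch: store generated thumbnails as blobs in one SQLite file instead of
// many small files on disk. Table and function names are placeholders.
import Database from "better-sqlite3";

const db = new Database("/var/davros/thumbnails.db"); // placeholder location
db.exec(`
  CREATE TABLE IF NOT EXISTS thumbnails (
    path       TEXT PRIMARY KEY,   -- original file the thumbnail was made from
    image      BLOB NOT NULL,      -- encoded thumbnail bytes
    created_at INTEGER NOT NULL    -- unix timestamp, useful for LRU-style expiry
  )
`);

export function putThumbnail(path: string, image: Buffer): void {
  db.prepare(
    "INSERT OR REPLACE INTO thumbnails (path, image, created_at) VALUES (?, ?, ?)"
  ).run(path, image, Date.now());
}

export function getThumbnail(path: string): Buffer | undefined {
  const row = db
    .prepare("SELECT image FROM thumbnails WHERE path = ?")
    .get(path) as { image: Buffer } | undefined;
  return row?.image;
}
```

A single database file would also mean far fewer files on disk, which is relevant to the inotify watch pressure discussed above.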
