Uploaded files should be written directly to MongoDB #5

Open
keesvanbochove opened this issue Jul 19, 2012 · 6 comments

Comments

@keesvanbochove
Member

Currently the metabolomicsModule writes uploaded files to disk via uploadr, and then saves the files in MongoDB. Writing directly to MongoDB would lead to cleaner code and possibly better performance. It would also make it possible to work in cloud environments where no local disk access is available.

@keesvanbochove
Member Author

The trackR plugin also writes to disk, so to become independent of disk access that plugin should also be changed to write directly to MongoDB.

@gooi

gooi commented Jul 20, 2012

What would be the consequences if the network connection breaks during the upload of a (large) file? What does MongoDB do with the partial upload?

@4np
Contributor

4np commented Jul 25, 2012

The file will only be put into MongoDB when upload is complete, so that should not result in any issues...

@gooi

gooi commented Jul 25, 2012

I think performance is mainly determined by network bandwidth in this case. Sounds like a 'nice to have'.

@4np
Contributor

4np commented Jul 25, 2012

It is mainly an architectural flaw that should be resolved. Because the current approach relies on intermediate local storage and a poorly thought-out cleanup strategy, file uploads may leak data or consume large amounts of storage over time. A multiple-server setup may run into problems as well.

Basically the steps are now:

  1. initiate upload
  2. store file on disk on server
  3. load file and put it in MongoDB
  4. whether and when the file gets deleted remains unclear at this point

Looking ahead to cloud hosting options, load-balanced environments, and so on, we should try to steer clear of local storage on a server. This may work now on one machine, but it will probably fail in a clustered setting. That means step 2 is most likely unnecessary, and consequently step 4 is unnecessary as well.
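To make the idea concrete, here is a rough sketch of what skipping the intermediate disk step could look like. It is written in Python purely for illustration and is not the project's code. The names `store_upload`, `InMemoryBucket` and `InMemoryUpload` are hypothetical; the in-memory classes only stand in for a GridFS-style bucket, with method names modelled on PyMongo's `GridFSBucket.open_upload_stream` and the `write`, `close` and `abort` methods of the stream it returns. The sketch also touches on the earlier question about broken connections: if reading the request fails halfway, the partial upload is aborted instead of being left behind.

```python
import io

CHUNK_SIZE = 64 * 1024  # arbitrary read size chosen for this sketch


class InMemoryUpload:
    """Hypothetical stand-in for a GridFS upload stream."""

    def __init__(self, bucket, filename):
        self._bucket = bucket
        self._filename = filename
        self._chunks = []

    def write(self, data):
        self._chunks.append(bytes(data))

    def close(self):
        # Only a completed upload becomes visible in the bucket.
        self._bucket.files[self._filename] = b"".join(self._chunks)

    def abort(self):
        # Discard everything received so far.
        self._chunks.clear()


class InMemoryBucket:
    """Hypothetical stand-in for a GridFS bucket (assumption, not a real driver)."""

    def __init__(self):
        self.files = {}

    def open_upload_stream(self, filename):
        return InMemoryUpload(self, filename)


def store_upload(bucket, filename, source, chunk_size=CHUNK_SIZE):
    """Copy `source` into `bucket` chunk by chunk, without touching local disk.

    Returns the number of bytes stored. If reading from `source` fails
    (for example because the connection dropped), the partial upload is
    aborted and the exception is re-raised.
    """
    upload = bucket.open_upload_stream(filename)
    written = 0
    try:
        while True:
            chunk = source.read(chunk_size)
            if not chunk:
                break
            upload.write(chunk)
            written += len(chunk)
    except Exception:
        upload.abort()
        raise
    upload.close()
    return written


if __name__ == "__main__":
    bucket = InMemoryBucket()
    size = store_upload(bucket, "sample.dat", io.BytesIO(b"example payload"))
    print(size, "bytes stored")
```

With a real driver the bucket would be backed by MongoDB rather than a dictionary, and the exact cleanup behaviour on abort would depend on that driver, so treat the failure handling here as an assumption to verify rather than a guarantee.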

Note that this is an open source project and others will use this software, so this is something we need to keep in mind. Even though we currently use single-server setups, we might very well use multiple-server (or cloud) setups ourselves some day.

@keesvanbochove
Member Author

For details on how to implement this, see Jeroen's comment in this ticket: https://github.com/NetherlandsMetabolomicsCentre/metabolomicsModule/issues/10

3 participants