Set up to support contribution of selection alerts #81
Hi @sanjaynagi, I've done some maintenance here to get the build working with newer versions of malariagen_data, which are needed for authenticated access to GCS. I've also added some examples to the README on how to save the outputs from a successful build to GCS, and then restore them to a local filesystem (maybe on a different computer). Basically, this command saves a build to GCS:
...and then this command restores a build from GCS to a local filesystem (maybe on a different computer from the one used to create the build):
However, if I then run Initially I thought this would be because of file modification times, because the input code files might have a newer modification time than the build files restored from GCS (e.g., if the build was run and saved to GCS, and then the code files were cloned or checked out to a different computer later; git does not preserve file modification times). That's why I added the But even with that, I'm still finding the workflow is rerunning too much. I suspect it's something to do with checkpointing, but I'm not sure. Perhaps the workflow needs to be broken up? Separate out the book build into a different workflow? That way someone who just wants to author or edit an alert page could do so and be sure to run only the book rebuild.
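To make the modification-time concern concrete, here is a minimal Python sketch of the idea: after restoring a build, bump every restored file to a timestamp newer than the newest code file, so that a purely time-based check would not consider the outputs stale. The function names and the `build_dir` / `code_dir` arguments are hypothetical and not part of this repository, and this only addresses the mtime hypothesis, which as noted above does not seem to be the whole story.

```python
import os
import time
from pathlib import Path


def newest_mtime(root):
    """Return the latest modification time of any file under root (0.0 if none)."""
    return max(
        (p.stat().st_mtime for p in Path(root).rglob("*") if p.is_file()),
        default=0.0,
    )


def touch_outputs(build_dir, code_dir):
    """Set every file under build_dir to a time later than all files in code_dir.

    Both directory arguments are hypothetical placeholders.
    Returns the number of files touched.
    """
    # Pick a timestamp strictly later than the newest code file.
    stamp = max(time.time(), newest_mtime(code_dir) + 1)
    count = 0
    for path in Path(build_dir).rglob("*"):
        if path.is_file():
            os.utime(path, (stamp, stamp))  # sets (atime, mtime)
            count += 1
    return count
```

Snakemake's own `--touch` flag serves a similar purpose from inside the workflow, marking existing outputs as up to date without re-running their rules.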
I've now split the workflow into two separate workflows, one for doing the analysis and one for building the site. This is actually really simple, we just have two separate snakefiles ( This should mean that we can download a build from the GCS bucket and run the site build workflow, and it won't want to re-run the workflow. I can't test this though, as the gsutil command says I don't have permissions. I've tested both split-up workflows on Datalab and they work perfectly; I just haven't tested with data explicitly downloaded from GCS. The argument to run snakemake workflows with a specific snakefile is simply:
Silly that it took me 4 months to attempt this, because it only took about 20 minutes.
Resolves #80.