Adding single-read functionality to RAW and CLEAN #80
… for single read analyses.
…-multiqc-paired.R
…on. Renamed Multiqc to not be confusing regardings its naming as "Single"
… or paired-end read version of fastp
…read or paired-end read version of fastp
… paired-end read version of SUMMARIZE_MULTIQC
…for paired-end vs single read runs.
…d paired end read runs
…book/notebooks/2024-10-17_crits-christoph-2-4-0.html to create analyses of single and paired-end read data.
…samples being paired-end or not.
… pair information for single read data. Also dropped some code which combines values across read pairs, for single read data. I dropped the renaming of tab_tsv to tab_tsv_2 for paired end data, so I didn't have to create two different versions of the combine step at the end of the subscript. ``` tab <- tab_json %>% inner_join(tab_tsv, by="sample") ```
…ead data, as I instead amended the existing script to be able to handle both single read paired end data.
…ad version. Renamed Multiqc to not be confusing regardings its naming as "Single"" This reverts commit 01ea0c5.
…rizeMultiqc" This reverts commit ad8faf9.
Just a few minor changes here, otherwise looks good!
@simonleandergrimm @harmonbhasin Since we're getting close to merging, this would be a good time to update the CHANGELOG.
Aside from Will getting back to my comment and me editing the changelog, this is good to go in.
workflows/run_dev_se.nf
Outdated
@@ -51,7 +37,7 @@ workflow RUN_DEV_SE {
     // Publish results
     params_str = JsonOutput.prettyPrint(JsonOutput.toJson(params))
     params_ch = Channel.of(params_str).collectFile(name: "run-params.json")
-    time_ch = Channel.of(start_time_str + "\n").collectFile(name: "time.txt")
+    time_ch = Channel.of(params.start_time_str + "\n").collectFile(name: "time.txt")
How is this getting into params? Seems naively like it would be simpler to just emit it from LOAD_SAMPLESHEET.
@simonleandergrimm what Will is referring to here is this line: "params.start_time_str". Previously this was set in the main workflow like this:
`start_time = new Date()`
`start_time_str = start_time.format("YYYY-MM-dd HH:mm:ss z (Z)")`
Either move this back to the main workflow (in which case you can use `start_time_str` directly), or keep it in LOAD_SAMPLESHEET (and make sure to emit it).
I know what Will meant. I changed the code; let me know if the new time_ch creation logic looks fine.
@simonleandergrimm outside of the change Will requested, this looks good to me. Make that change and you should be good to go!
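For reference, the second option from this thread (keeping the timestamp in LOAD_SAMPLESHEET and emitting it) could look roughly like the following. This is a minimal Nextflow DSL2 sketch, not the code that was merged: it assumes LOAD_SAMPLESHEET is a subworkflow, omits its real inputs and other outputs, and the emit name `start_time_str` and the intermediate `start_time_ch` are illustrative choices.

```groovy
// Sketch only: the real LOAD_SAMPLESHEET takes inputs and emits more than this.
workflow LOAD_SAMPLESHEET {
    main:
    // Capture the launch time once, as the main workflow did previously.
    // The format string is copied from the thread; note that "YYYY" in
    // SimpleDateFormat is the week-based year, so "yyyy" may be what is intended.
    start_time = new Date()
    start_time_str = start_time.format("YYYY-MM-dd HH:mm:ss z (Z)")
    start_time_ch = Channel.of(start_time_str)

    emit:
    start_time_str = start_time_ch
}

workflow {
    LOAD_SAMPLESHEET()
    // Consume the emitted channel instead of reading the value from params.
    time_ch = LOAD_SAMPLESHEET.out.start_time_str
        .map { it + "\n" }
        .collectFile(name: "time.txt")
}
```

Emitting the value keeps the timestamp an explicit output of the subworkflow, so the calling workflow does not depend on `params` being mutated at runtime.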
…ory/mgs-workflow into single-read-raw-clean
@willbradshaw Let me know if this CHANGELOG edit looks good to you. If so I will create the same for the other PR.
I'm not a huge fan of those CHANGELOG changes because they imply that single-end read processing is more complete than it actually is. I would also keep all the single-end updates under one section:
Edited CHANGELOG to incorporate your changes.
Looks good to me!
This PR adds support for single-read (single-end) sequencing data to the RAW and CLEAN stages of the pipeline while maintaining existing paired-end functionality. This allows the pipeline to process both single-end and paired-end sequencing data using the same workflow infrastructure.
Key Changes
- `read_type` parameter ("single_end" or "paired_end") in `run_dev_se.config` to control pipeline behavior
- `bin/summarize-multiqc-single.R` now takes in a `read-type` variable, which triggers if/else branches throughout the script to change data processing accordingly

Testing
I added test directories with example data for both the single-end and paired-end cases:
- `test-single-read/` - Contains single-end test data and configuration
- `test-paired-end/` - Contains paired-end test data and configuration

I validated the pipeline changes in this notebook: https://data.securebio.org/simons-notebook/posts/2024-10-24-mgs-single-read-eval/
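
As a rough illustration of how a `read_type` parameter can steer a workflow, here is a minimal Nextflow DSL2 sketch. It is not taken from this PR: the sample name, file names, and the shape of the channel items are hypothetical, and the actual pipeline selects between single-read and paired-end versions of its processes rather than just building a channel.

```groovy
// Default value; in the pipeline this would come from the config file.
params.read_type = "single_end"

workflow {
    // Fail early on anything other than the two supported values.
    if (!(params.read_type in ["single_end", "paired_end"])) {
        error "Unsupported read_type: ${params.read_type}"
    }
    single_end = params.read_type == "single_end"

    // Hypothetical inputs: one file per sample for single-end data,
    // a pair of files per sample for paired-end data.
    reads_ch = single_end
        ? Channel.of(["sampleA", "sampleA.fastq.gz"])
        : Channel.of(["sampleA", ["sampleA_1.fastq.gz", "sampleA_2.fastq.gz"]])

    reads_ch.view()
}
```

Validating the parameter once at the top means downstream branches only need to check a single boolean.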