-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add proteus module for maxquant data analysis #147
Conversation
… modules/nf-core/proteus/ pxnotebook_env.yml Dockerfile!
…into add_proteus
|
…arly the workflow parts necessary for proteus from those not necessary
into add_proteus; checking pipeline functionality
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple of initial comments while I look at the rest.
features_log2_assays <- "" | ||
} | ||
|
||
if (is.null(features_log2_assays)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is repeated code from shinyngs, we should factor that out. I'll make a PR for it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WackerO the function for this is now available in the latest shinyngs, you just have to update the container for the notebook.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think we should use the factored-out conditional logging function here, so I've unresolved this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah of course, just had to update shinyngs for the rmarkdown report for that function to be found. I have for now added the [] removal in the report; maybe in a future release that part can be integrated into shiny as well
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm having a bit of trouble in this, in that you're duplicating a lot of code from the wider pipeline. We should be able to process conditionally in certain places, e.g. in how the matrices are generated which differential method is called, but basically follow the same structure.
e.g. CUSTOM_MATRIXFILTER should only really need to be called in one place. It should be possible to validate everything the same way, since we need to check the same things- i.e. that samples, matrices and metadata are compatible.
We should be able to handle this similarly to arrays. I'll take a look and see if I can make some more detailed suggestions, but maybe that is enough to start with.
features_log2_assays <- "" | ||
} | ||
|
||
if (is.null(features_log2_assays)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@WackerO the function for this is now available in the latest shinyngs, you just have to update the container for the notebook.
workflows/differentialabundance.nf
Outdated
ch_contrasts_split = ch_contrasts_file | ||
.splitCsv ( header:true, sep:(params.contrasts.endsWith('tsv') ? '\t' : ',')) | ||
.map{ it.tail().first() } | ||
|
||
// For proteus, extract only meta and contrast variable | ||
ch_contrasts_proteus = ch_contrasts_split | ||
.map{ | ||
tuple( | ||
exp_meta, // meta map | ||
it.variable // contrast variable | ||
) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ch_contrasts_split = ch_contrasts_file | |
.splitCsv ( header:true, sep:(params.contrasts.endsWith('tsv') ? '\t' : ',')) | |
.map{ it.tail().first() } | |
// For proteus, extract only meta and contrast variable | |
ch_contrasts_proteus = ch_contrasts_split | |
.map{ | |
tuple( | |
exp_meta, // meta map | |
it.variable // contrast variable | |
) | |
} | |
ch_contrasts_split = ch_contrasts_file | |
.splitCsv ( header:true, sep:(params.contrasts.endsWith('tsv') ? '\t' : ',')) | |
.map{ it.tail().first() } | |
// For proteus, extract only meta and contrast variable | |
ch_contrasts_proteus = ch_contrasts_split | |
.map{ | |
tuple( | |
exp_meta, // meta map | |
it.variable // contrast variable | |
) | |
} |
So, proteus will run once for every contrast variable? How does the contrast variable impact on the quantifications?
If it does, why does the proteus module return results keyed by the meta
, rather than the meta2
corresponding to the contrast variable?
I think we can simplify this chunk a lot, but I need to understand what's going on better first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ooop, good point, I indeed had to change that code part. Proteus runs once for each contrast var to generate the plots. To the best of my knowledge, the quant tables are not different between contrasts (but just to make sure, I opened an issue and asked the developers). For that reason, I added a reduce
to run the rest of the pipeline only for one of the tables.
I also changed the input channels to proteus (they were unnecessarily complicated as I realized) and the meta
accordingly after adding a second contrast to the test dataset. I had a look at the output and looks good. The report is attached just in case you are curious, but it's nothing spectacular.
Btw, I found that apparently, RMD parameters that contain capital letters will cause a weird bug during knitting; I had to rename some params. Just as a hint for future coding.
…into add_proteus
…into add_proteus
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Getting there- just a few more simplifications to facilitate future maintenance of these changes.
features_log2_assays <- "" | ||
} | ||
|
||
if (is.null(features_log2_assays)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still think we should use the factored-out conditional logging function here, so I've unresolved this.
workflows/differentialabundance.nf
Outdated
} else if (params.study_type == 'maxquant'){ | ||
|
||
// For maxquant, we will use the processed matrices from PROTEUS | ||
ch_features = ch_in_norm |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very similar to what happens in the final part of this conditional, so let's combine the two rather than duplicating what happens there. Right now the last bit is
// Otherwise we can just use the matrix input
matrix_as_anno_filename = "matrix_as_anno.${matrix_file.getExtension()}"
matrix_file.copyTo(matrix_as_anno_filename)
ch_features = Channel.of([ exp_meta, file(matrix_as_anno_filename)])
But you could do something like:
// Otherwise we can just use the matrix input
matrix_as_anno_filename = "matrix_as_anno.${matrix_file.getExtension()}"
if (params.study_type == 'maxquant'){
ch_features_matrix = ch_in_norm
} else{
ch_features_matrix = ch_in_raw
}
ch_features = ch_features_matrix
.map{ exp_meta, matrix_file ->
matrix_file.copyTo(matrix_as_anno_filename)
return [exp_meta, file(matrix_as_anno_filename)]
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TIL that the inside of the map is not separate from the rest of the code. I had to rename exp_meta, matrix_file
and return [exp_meta
to not receive this error:
- cause: The current scope already contains a variable of the name exp_meta
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yep, my bad
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah and I also added the workdir to matrix_as_anno_filename = "${workflow.workDir}/matrix_as_anno.${matrix_file.getExtension()}"
, otherwise the matrix_as_anno is just created wherever the user happens to be while running the pipelines
workflows/differentialabundance.nf
Outdated
ch_in_raw = PROTEUS.out.raw_tab | ||
.reduce{a, b -> a} | ||
.map{tuple('id': exp_meta.id, it[1])} | ||
ch_in_norm = PROTEUS.out.norm_tab | ||
.reduce{a, b -> a} | ||
.map{tuple('id': exp_meta.id, it[1])} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ch_in_raw = PROTEUS.out.raw_tab | |
.reduce{a, b -> a} | |
.map{tuple('id': exp_meta.id, it[1])} | |
ch_in_norm = PROTEUS.out.norm_tab | |
.reduce{a, b -> a} | |
.map{tuple('id': exp_meta.id, it[1])} | |
ch_in_raw = PROTEUS.out.raw_tab | |
.first() | |
.map{ meta, matrix -> tuple(exp_meta, matrix) } | |
ch_in_norm = PROTEUS.out.norm_tab | |
.first() | |
.map{ meta, matrix -> tuple(exp_meta, matrix) } |
I think this is much simpler to understand.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oop, I did not realize first()
has the same effect as my reduce()
snippet. Thanks!
Co-authored-by: Jonathan Manning <[email protected]>
Co-authored-by: Jonathan Manning <[email protected]>
Co-authored-by: Jonathan Manning <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, think we're good, much better integrated now, thanks for the work.
Thanks for the review! |
This PR adds the proteus module to differentialabundance; it allows importing proteomics measurements from MaxQuant which can then be analyzed with the limma module
PR checklist
nf-core lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).