-
Notifications
You must be signed in to change notification settings - Fork 83
Conversation
@jaclyn-taroni I have not added this to CI, because I wasn't sure if you first want to review. The script uses either MM2S and medulloPackage depending on what the user inputs and performs batch correction on RNA_library if asked for. There are four bash scripts corresponding to the above combination (2 method x with and w/o batch correction). Let me know if I can just add one of these to the CI? |
please put all in CI! :) |
Hi @komalsrathi, I filed komalsrathi#1 to this branch and I am providing the context here. CI failed because of the version of To verify that this was the issue, I built the Docker image from this branch and identified some issues with the logic around the batch correction that were remedied in komalsrathi@9b4d7f9. Now the shell scripts without batch correction run successfully locally. However, when I run the shell scripts that do include batch correction, I get the following errors (arising from
I'm not sure what format the In service of debugging that and not repeating filtering and batch correction multiple times, I'd recommend that we move the filtering step out of the script that does the classification. Some ideas around that were added in komalsrathi@adf3287. That needs polishing/more testing, but running the following from the
The analysis itself is somewhat specific–I don't expect we will need to do the batch correction for medulloblastoma only outside of this project and module, so I think hardcoding file paths for the expression files is fine and in the interest of making things simpler. I also think it could be useful to move the comparison of all the results to a notebook or script that takes as input the expected classes and the predicted classes from each method/processing strategy combination. The shell script to run the entire module and gets added to CI could then consist of:
I'll say that it is unclear to me at this point if the batch correction is a necessary step - the ComBat step itself is pretty quick and the classification steps don't seem too bad, so may be fine to leave it in. Thoughts? Happy to make some of these edits to the |
Hi @jaclyn-taroni thanks for the detailed review. I saw your PR, before merging let me double check on my end (locally) why changing |
CI related edits to MB subtype steps
|
Weird, CI failed with a data download issue. I'm going to hit rerun now! |
Checking the failed CI |
Co-authored-by: Jaclyn Taroni <[email protected]>
I moved up the module in CI. Also removed the expression matrices from gitignore. The filter+batch correction is only run once and I think the error that I was seeing was in this step. Hoping this commit will fix it. |
Without explicitly skipping the |
@jaclyn-taroni I am confused why would there be an issue with filtering step? It works fine locally and it takes minimal time.
|
Because in CI it's filtering the testing files, which are a subset of the full files to save on time, we can try the strategy in komalsrathi#2 |
Skip filtering and batch correction. in CI
Going to resolve the conflicts - I get an error locally that does not show up in CI and I believe it happens whether or not I use the committed files or regenerate them with the |
Okay, I was wrong in a way that made this easier to get to the bottom of this 😅 which is ideal. When I ran this in the Docker container using the version of This is resolved when installing remotes::install_github("jtleek/sva-devel", ref = "123be9b2b9fd7c7cd495fab7d7d901767964ce9e") I suspect, then, that the ComBat-corrected file committed to the repository was generated with a more recent version of |
@jaclyn-taroni just for reference, the files were generated using |
Suggestions here: komalsrathi#3 - quoting from that
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code for the first two steps and the shell script look good! Let's plan to get komalsrathi#3 merged so the dependencies are all set such that if someone needed to regenerate the expression files later, they'd be able to do that in the Docker container.
analyses/molecular-subtyping-MB/02-compare-classes.Rmd
should be removed from this pull request and included in a new pull request so we can have a discussion about the best way to display that information and save it in a format that can be used elsewhere in the project. That notebook is missing important context about what samples are included when it reports the accuracy above the table without the code chunks in my opinion.
The README also needs to be updated to reflected the recent changes before we can get this approved and merged.
|
||
The goal of this analysis is to utilize the R packages [medulloPackage](https://github.com/d3b-center/medullo-classifier-package) and [MM2S](https://github.com/cran/MM2S) that leverage expression data from RNA-seq or array to classify the medulloblastoma samples into four subtypes i.e Group3, Group4, SHH, WNT. | ||
|
||
### Analysis scripts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section of the README should be updated to reflect the recent changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!! I will make these changes today. I have updated the above. Please let me know whenever you can.
I will also work on submitted another PR for the markdown with more details.
Update sva on Docker image
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 looks good to me! Thanks for the updates!
Purpose/implementation Section
What scientific question is your analysis addressing?
Molecular classification of Medulloblastoma samples into Group3, Group4, SHH and WNT subtypes.
What was your approach?
I have used two different methods: MM2S and medulloPackage. The input to both methods is log-transformed FPKM matrix for poly-A and stranded datasets.
What GitHub issue does your pull request address?
#731
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
Feasibility of the approach, code structure and if I should remove batch correction out of the module.
Is there anything that you want to discuss further?
Should we keep batch correction or remove it? The results are same with or without batch correcting using
RNA_library
as batch.Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Yes
Results
What types of results are included (e.g., table, figure)?
Table in
comparison*.rds
format containing expected subtype classification (pathology reports) and observed subtypes along with p-values (in case of medulloPackage) and scores (in case of MM2S).What is your summary of the results?
After comparing with the expected subtypes from pathology reports, we get an accuracy of 81.25% (26/32 correctly classified) with medulloPackage and an accuracy of 78.125% (25/32 correctly classified) with MM2S.
Reproducibility Checklist
Documentation Checklist
README
and it is up to date.analyses/README.md
and the entry is up to date.