Mb subtypes #738

komalsrathi · 2020-08-19T18:36:15Z

Purpose/implementation Section

What scientific question is your analysis addressing?

Molecular classification of Medulloblastoma samples into Group3, Group4, SHH and WNT subtypes.

What was your approach?

I have used two different methods: MM2S and medulloPackage. The input to both methods is a log-transformed FPKM matrix for the poly-A and stranded datasets.

What GitHub issue does your pull request address?

#731

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Feasibility of the approach, code structure, and whether batch correction should be removed from the module.

Is there anything that you want to discuss further?

Should we keep batch correction or remove it? The results are the same with or without batch correction using RNA_library as the batch variable.

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

Table in comparison*.rds format containing expected subtype classification (pathology reports) and observed subtypes along with p-values (in case of medulloPackage) and scores (in case of MM2S).

What is your summary of the results?

After comparing with the expected subtypes from pathology reports, we get an accuracy of 81.25% (26/32 correctly classified) with medulloPackage and an accuracy of 78.125% (25/32 correctly classified) with MM2S.
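For reference, the two percentages follow directly from the counts reported above. The snippet below is only a worked check of that arithmetic; the variable names are illustrative and are not taken from the module's code.

```shell
set -eu

# Counts as reported in the summary above: samples with an expected
# subtype from pathology reports, and how many each method got right.
total=32
medullo_correct=26
mm2s_correct=25

# awk does the floating-point division that POSIX shell arithmetic lacks.
medullo_accuracy=$(awk -v c="$medullo_correct" -v n="$total" 'BEGIN { printf "%.3f", 100 * c / n }')
mm2s_accuracy=$(awk -v c="$mm2s_correct" -v n="$total" 'BEGIN { printf "%.3f", 100 * c / n }')

echo "medulloPackage: ${medullo_accuracy}%"   # 81.250%
echo "MM2S: ${mm2s_accuracy}%"                # 78.125%
```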

Reproducibility Checklist

The dependencies required to run the code in this pull request have been added to the project Dockerfile.
This analysis has been added to continuous integration.

Documentation Checklist

This analysis module has a README and it is up to date.
This analysis is recorded in the table in analyses/README.md and the entry is up to date.
The analytical code is documented and contains comments.

komalsrathi · 2020-08-19T18:41:01Z

@jaclyn-taroni I have not added this to CI because I wasn't sure if you wanted to review it first. The script uses either MM2S or medulloPackage, depending on what the user inputs, and performs batch correction on RNA_library if asked for. There are four bash scripts corresponding to these combinations (2 methods × with and without batch correction). Let me know if I can just add one of these to CI.

jharenza · 2020-08-19T18:52:15Z

> @jaclyn-taroni I have not added this to CI because I wasn't sure if you wanted to review it first. The script uses either MM2S or medulloPackage, depending on what the user inputs, and performs batch correction on RNA_library if asked for. There are four bash scripts corresponding to these combinations (2 methods × with and without batch correction). Let me know if I can just add one of these to CI.

please put all in CI! :)

jaclyn-taroni · 2020-08-20T02:58:47Z

Hi @komalsrathi, I filed komalsrathi#1 to this branch and I am providing the context here.

CI failed because of the version of tidyselect on the project Docker image (0.2.5), which doesn't include all_of().

To verify that this was the issue, I built the Docker image from this branch and identified some issues with the logic around the batch correction that were remedied in komalsrathi@9b4d7f9.

Now the shell scripts without batch correction run successfully locally.

However, when I run the shell scripts that do include batch correction, I get the following errors (arising from classify.mb()):

run-molecular-subtyping-MB-batch-correct-medullo-classifier.sh fails with:

[1] "Classify medulloblastoma subtypes..."
Error in exprs[g1, ] : subscript out of bounds
Calls: classify.mb -> <Anonymous> -> signatureGenes -> apply -> FUN
Execution halted

run-molecular-subtyping-MB-batch-correct-MM2S.sh fails with:

[1] "Classify medulloblastoma subtypes..."
'select()' returned 1:many mapping between keys and columns
Error in `.rowNamesDF<-`(x, value = value) :
  duplicate 'row.names' are not allowed
Calls: classify.mb ... row.names<- -> row.names<-.data.frame -> .rowNamesDF<-
In addition: Warning message:
non-unique value when setting 'row.names': ‘NaN’
Execution halted

I'm not sure what format the classify.mb() function wants based on the documentation.

In service of debugging that and not repeating filtering and batch correction multiple times, I'd recommend that we move the filtering step out of the script that does the classification. Some ideas around that were added in komalsrathi@adf3287.

That needs polishing/more testing, but running the following from the analyses/molecular-subtyping-MB directory worked:

Rscript --vanilla 00-filter-and-batch-correction.R \
  --batch_col RNA_library \
  --output_prefix medulloblastoma-exprs \
  --output_dir input

The analysis itself is somewhat specific. I don't expect we will need to do the batch correction for medulloblastoma only outside of this project and module, so I think hardcoding file paths for the expression files is fine and in the interest of making things simpler.

I also think it could be useful to move the comparison of all the results to a notebook or script that takes as input the expected classes and the predicted classes from each method/processing strategy combination.

The shell script that runs the entire module and gets added to CI could then consist of:

1. Filter and batch correction
2. Classification for all combinations of batch correction status and classifier
3. Collect all the results together
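The three steps above could be wired together roughly as follows. This is a minimal sketch, not the module's actual script: only the `00-filter-and-batch-correction.R` invocation comes from this thread, while `01-classify-mb.R`, `02-collect-results.R`, their flags, and the `DRY_RUN` toggle are hypothetical placeholders. By default it only prints the commands, so it can be run without R or the project data.

```shell
set -eu

# DRY_RUN=1 (the default in this sketch) prints each command instead of
# running it. DRY_RUN is a hypothetical toggle, not a project convention.
DRY_RUN="${DRY_RUN:-1}"
planned_steps=0

run_step() {
  planned_steps=$((planned_steps + 1))
  if [ "$DRY_RUN" = "1" ]; then
    echo "[dry-run] $*"
  else
    "$@"
  fi
}

# 1. Filter and batch correction (invocation as given earlier in this comment)
run_step Rscript --vanilla 00-filter-and-batch-correction.R \
  --batch_col RNA_library \
  --output_prefix medulloblastoma-exprs \
  --output_dir input

# 2. Classification for every batch-correction status x classifier combination.
#    The script name and both flags are hypothetical.
for corrected in yes no; do
  for method in medulloPackage MM2S; do
    run_step Rscript --vanilla 01-classify-mb.R \
      --batch_corrected "$corrected" \
      --method "$method"
  done
done

# 3. Collect all the results together (hypothetical script name)
run_step Rscript --vanilla 02-collect-results.R

echo "Planned ${planned_steps} steps"
```

Looping over the combinations keeps a single entry point for CI instead of four near-identical shell scripts.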

I'll say that it is unclear to me at this point whether the batch correction is a necessary step. The ComBat step itself is pretty quick and the classification steps don't seem too bad, so it may be fine to leave it in.

Thoughts? Happy to make some of these edits to the jaclyn-taroni:mb-subtypes branch (komalsrathi#1) if that's helpful.

komalsrathi · 2020-08-20T03:40:19Z

Hi @jaclyn-taroni, thanks for the detailed review. I saw your PR. Before merging, let me double check on my end (locally) why changing all_of to select causes so many issues.

komalsrathi · 2020-08-20T03:55:56Z

~~Also, forgot to mention that batch correction has no effect on the output so maybe better left out of the analysis (to make things simpler)?~~ This was the case before (non-log input) but not anymore. Will add all results to a notebook and update.

jaclyn-taroni · 2020-08-20T11:40:23Z

Weird, CI failed with a data download issue. I'm going to hit rerun now!

komalsrathi · 2020-08-20T12:50:23Z

Checking the failed CI

komalsrathi · 2020-08-20T13:15:03Z

> I suspect it is related to the test files because the batch correction step runs locally - something like the poly-A MB sample missing from the poly-A testing file (haven't checked to be sure). Some ideas:
>
> - Don't ignore the expression files generated by the 00 script, so when they are committed to the repo they will be available for CI and the shell script could have an option to skip the filtering and batch correction that you'd use in CI (here's an example from the ATRT module: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/4644b95d8caf3ea408249fa71b8318c163eb46ab/analyses/molecular-subtyping-ATRT/run-molecular-subtyping-ATRT.sh)
> - We could specifically accommodate this module when we generate the test files (see: https://github.com/AlexsLemonade/OpenPBTA-analysis/tree/4644b95d8caf3ea408249fa71b8318c163eb46ab/analyses/create-subset-files#special-considerations)
>
> First one is probably more robust, but will increase the run time in CI. I'll also mention that the other molecular subtyping steps are grouped together in CI near the top:
>
> OpenPBTA-analysis/.circleci/config.yml, line 37 in 4644b95: `### MOLECULAR SUBTYPING ###`
>
> Moving this up will make the time between commit and getting the test result for this module shorter, which is helpful for debugging!

I moved up the module in CI. Also removed the expression matrices from gitignore. The filter+batch correction is only run once and I think the error that I was seeing was in this step. Hoping this commit will fix it.

jaclyn-taroni · 2020-08-20T13:19:24Z

Without explicitly skipping the 00 step in CI, I expect that the same problem will present itself but we'll see shortly!

komalsrathi · 2020-08-20T14:04:51Z

@jaclyn-taroni I am confused about why there would be an issue with the filtering step. It works fine locally and takes minimal time.


[1] "Batch correct input matrices..."
Error in `[.data.frame`(expr.input.mb, , clin.mb$Kids_First_Biospecimen_ID) : 
  undefined columns selected
Calls: as.matrix -> [ -> [.data.frame
Execution halted

jaclyn-taroni · 2020-08-20T14:08:22Z

Because in CI it's filtering the testing files, which are a subset of the full files to save on time, so not every sample ID in the clinical file has a column in the expression matrix. We can try the strategy in komalsrathi#2.
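One way to express that strategy in the module's shell script is an environment toggle that CI sets to skip step 00 and fall back on the committed expression files. The sketch below is an assumption about the shape of such a switch: `RUN_FILTER` and `decide_filter_step` are hypothetical names, and the real change lives in komalsrathi#2.

```shell
set -eu

# Decide whether to run the filtering/batch-correction step.
# $1 is the toggle value; "0" means skip, anything else means run.
decide_filter_step() {
  if [ "$1" = "0" ]; then
    echo "skip"
  else
    echo "run"
  fi
}

# RUN_FILTER is a hypothetical variable; it defaults to 1 so a local run
# regenerates the expression files from the full data.
RUN_FILTER="${RUN_FILTER:-1}"
filter_action=$(decide_filter_step "$RUN_FILTER")

if [ "$filter_action" = "run" ]; then
  echo "Step 00: filtering and batch correcting the full expression files"
  # Rscript --vanilla 00-filter-and-batch-correction.R --batch_col RNA_library \
  #   --output_prefix medulloblastoma-exprs --output_dir input
else
  echo "Step 00 skipped: using the expression files committed to the repository"
fi
```

In CI the script would then be invoked with `RUN_FILTER=0` set in the environment, so the subset testing files never reach the filtering code.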

jaclyn-taroni · 2020-08-20T22:02:30Z

Going to resolve the conflicts - I get an error locally that does not show up in CI and I believe it happens whether or not I use the committed files or regenerate them with the 00 step. Will post more on that, but first I'm going to fix the conflicts. Then I'll rebuild the image locally and see if that changes things.

jaclyn-taroni · 2020-08-20T23:04:36Z

Okay, I was wrong in a way that made it easier to get to the bottom of this 😅 which is ideal.

When I ran this in the Docker container using the version of sva installed (3.34.0), the ComBat output was all NaN (related: jtleek/sva-devel#14).

This is resolved when installing sva the following way:

remotes::install_github("jtleek/sva-devel", ref = "123be9b2b9fd7c7cd495fab7d7d901767964ce9e")

I suspect, then, that the ComBat-corrected file committed to the repository was generated with a more recent version of sva than is installed on the project Docker image and that's why CI doesn't fail. So I will think about/look into what is the best way to install a more recent version of sva in the context of this project's image and report back with suggestions.
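A cheap guard against this failure mode is to check whether a batch-corrected matrix consists entirely of NaN before passing it to a classifier. The sketch below assumes a tab-delimited export with a header row of sample IDs and gene symbols in the first column; that layout, the toy data, and the variable names are all hypothetical, and the module's actual files may be stored differently.

```shell
set -eu

# Toy stand-in for a batch-corrected matrix in which every value is NaN,
# mimicking the ComBat output described above.
matrix_file=$(mktemp)
printf 'gene\tsample_1\tsample_2\n'  > "$matrix_file"
printf 'MYC\tNaN\tNaN\n'            >> "$matrix_file"
printf 'GLI2\tNaN\tNaN\n'           >> "$matrix_file"

# Count value cells: every column after the first, skipping the header row.
total_cells=$(awk -F '\t' 'NR > 1 { total += NF - 1 } END { print total + 0 }' "$matrix_file")

# Count how many of those cells are the literal string NaN.
nan_cells=$(awk -F '\t' 'NR > 1 { for (i = 2; i <= NF; i++) if ($i == "NaN") n++ } END { print n + 0 }' "$matrix_file")

if [ "$total_cells" -gt 0 ] && [ "$nan_cells" -eq "$total_cells" ]; then
  combat_status="all-NaN"
else
  combat_status="ok"
fi

echo "ComBat output check: ${combat_status} (${nan_cells}/${total_cells} cells are NaN)"
rm -f "$matrix_file"
```

Failing fast on `all-NaN` would surface the sva version problem at the batch-correction step rather than as the duplicate row name error seen in classify.mb().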

komalsrathi · 2020-08-21T00:47:15Z

@jaclyn-taroni just for reference, the files were generated using sva_3.36.0

jaclyn-taroni · 2020-08-21T00:54:55Z

Suggestions here: komalsrathi#3 - quoting from that

I confirmed that updating sva in this manner only changes the version of the sva package; all other installed package versions remain the same.

jaclyn-taroni

The code for the first two steps and the shell script look good! Let's plan to get komalsrathi#3 merged so the dependencies are all set such that if someone needed to regenerate the expression files later, they'd be able to do that in the Docker container.

analyses/molecular-subtyping-MB/02-compare-classes.Rmd should be removed from this pull request and included in a new pull request so we can have a discussion about the best way to display that information and save it in a format that can be used elsewhere in the project. In my opinion, the notebook is missing important context about which samples are included when it reports the accuracy above the table without the code chunks.

The README also needs to be updated to reflect the recent changes before we can get this approved and merged.

jaclyn-taroni · 2020-08-21T01:16:29Z

analyses/molecular-subtyping-MB/README.md

+
+The goal of this analysis is to utilize the R packages [medulloPackage](https://github.com/d3b-center/medullo-classifier-package) and [MM2S](https://github.com/cran/MM2S) that leverage expression data from RNA-seq or array to classify the medulloblastoma samples into four subtypes i.e Group3, Group4, SHH, WNT.
+
+### Analysis scripts


This section of the README should be updated to reflect the recent changes.

Thanks!! ~~I will make these changes today.~~ I have updated the above. Please take a look whenever you can.
I will also work on submitting another PR for the markdown with more details.

jaclyn-taroni

👍 looks good to me! Thanks for the updates!

komalsrathi added 11 commits August 18, 2020 09:07

add molecular subtype classification for MB

c85fcdc

Merge branch 'master' into mb-subtypes

31de9e5

remove old output

ca3fcef

add analyses README

77a1cee

add MM2S as alt method

fc91cc8

update path file

041dd95

log transform input

5f95637

move classify function to util

b0b44c6

add packages to dockerfile

194a20e

update analyses/README

8b22f79

update analyses/README

a8e4a32

komalsrathi and others added 4 commits August 19, 2020 15:26

add scripts to ci

098fccc

Merge remote-tracking branch 'komalsrathi/mb-subtypes' into mb-subtypes

1ecf7f7

Changes to get the ComBat step to run in Docker

9b4d7f9

Add idea for filter and ComBat script

adf3287

jaclyn-taroni mentioned this pull request Aug 20, 2020

CI related edits to MB subtype steps komalsrathi/OpenPBTA-analysis#1

Merged

Merge pull request #1 from jaclyn-taroni/mb-subtypes

880736a

CI related edits to MB subtype steps

komalsrathi added 4 commits August 20, 2020 02:21

create single script

9a85941

create single script

e2e8378

add output files to gitignore

f30ec54

remove exprs data

ad5ce8b

jaclyn-taroni reviewed Aug 20, 2020

Update Dockerfile

f888160

Co-authored-by: Jaclyn Taroni

Merge remote-tracking branch 'komalsrathi/mb-subtypes' into skip-mb-filter

05c4544

Skip filtering step in CI

2be30c6

Merge pull request #2 from jaclyn-taroni/skip-mb-filter

9dd13c8

Skip filtering and batch correction. in CI

Merge branch 'master' into mb-subtypes

ae26569

jaclyn-taroni added 4 commits August 20, 2020 19:49

Merge remote-tracking branch 'komalsrathi/mb-subtypes' into mb-sva-devel

51891e7

Install sva-devel on Docker image

0ce2bdc

Rerun module in Docker container

48c2bab

Set upgrade = FALSE for medullo classifier install

245529a

jaclyn-taroni mentioned this pull request Aug 21, 2020

Update sva on Docker image komalsrathi/OpenPBTA-analysis#3

Merged

jaclyn-taroni reviewed Aug 21, 2020

komalsrathi and others added 3 commits August 21, 2020 08:07

Merge pull request #3 from jaclyn-taroni/mb-sva-devel

6fc4333

Update sva on Docker image

update README and remove rmd

2aedf68

update analyses README

a6a30a3

jaclyn-taroni mentioned this pull request Aug 21, 2020

Updated analysis: Take consensus of two classifiers for medulloblastoma subtype labels #742

Closed

jaclyn-taroni approved these changes Aug 21, 2020

jaclyn-taroni merged commit b145cb4 into AlexsLemonade:master Aug 21, 2020

This was referenced Aug 21, 2020

Medulloblastoma subtype labels have some disagreement with unsupervised analysis #730

Closed

Proposed Analysis: medulloblastoma subtyping #731

Closed

This was referenced Aug 27, 2020

Medulloblastoma pathology subtypes: file for data release #746

Closed

Updated analysis: medulloblastoma consensus subtypes #747

Closed

komalsrathi deleted the mb-subtypes branch September 8, 2020 14:47

cansavvy mentioned this pull request Sep 21, 2020

MB subtyping update to pathology_diagnosis #787

Merged
