Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

325 mixed model in limma #339

Closed
wants to merge 12 commits into from
Closed

Conversation

KamilMaliszArdigen
Copy link

@KamilMaliszArdigen KamilMaliszArdigen commented Nov 5, 2024

This is a limma module update to provide new logic related to mixed models

PR checklist

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the pipeline conventions in the contribution docs
  • If necessary, also make a PR on the nf-core/differentialabundance branch on the nf-core/test-datasets repository.
  • Make sure your code lints (nf-core lint).
  • Ensure the test suite passes (nf-test test main.nf.test -profile test,docker).
  • Check for unexpected warnings in debug mode (nextflow run . -profile debug,test,docker --outdir <OUTDIR>).
  • Usage Documentation in docs/usage.md is updated.
  • Output Documentation in docs/output.md is updated.
  • CHANGELOG.md is updated.
  • README.md is updated (including new tool citations and authors/contributors).

@KamilMaliszArdigen KamilMaliszArdigen linked an issue Nov 5, 2024 that may be closed by this pull request
@KamilMaliszArdigen KamilMaliszArdigen changed the base branch from master to dev November 5, 2024 15:04
@nf-core nf-core deleted a comment from github-actions bot Nov 5, 2024
Copy link

github-actions bot commented Nov 5, 2024

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit 7065efa

+| ✅ 300 tests passed       |+
#| ❔   6 tests were ignored |#
!| ❗   4 tests had warnings |!

❗ Test warnings:

  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 3.0.2
  • Run at 2024-12-03 13:55:22

@KamilMaliszArdigen
Copy link
Author

Hi @pinin4fjords, This PR is updating limma module with it's latest version. I will be more than happy to provide any additional information if needed.

@KamilMaliszArdigen
Copy link
Author

@pinin4fjords I would really appreciate your input in this topic. Thank you in advance.

README.md Outdated Show resolved Hide resolved
conf/modules.config Outdated Show resolved Hide resolved
@DSchreyer
Copy link

DSchreyer commented Nov 15, 2024

I am getting this error when running a simple comparison. It seems that in FILTER_DIFFTABLE it expects a log2FoldChange column which does not exist.

This is the command:

        nextflow run output/differentialabundance/main.nf \
          --input ${ diffabundance_samplesheet } \
          --matrix ${ diffabundance_counts } \
          --contrasts ${ diffabundance_contrasts } \
          --outdir output \
          --observations_id_col sample \
          --features_id_col geneID \
          -profile rnaseq \
          --differential_use_limma 
Workflow execution completed unsuccessfully
The exit status of the task that caused the workflow execution to fail was: 1

Error executing process > 'NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:FILTER_DIFFTABLE (1)'

Caused by:
  Process `NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:FILTER_DIFFTABLE (1)` terminated with an error exit status (1)


Command executed:

  #!/usr/bin/env python
  
  from math import log2
  from os import path
  import pandas as pd
  import platform
  from sys import exit
  
  # 1. Check that the current logFC/padj is not NA
  # 2. Check that the current logFC is >= threshold (abs does not work, so use a workaround)
  # 3. Check that the current padj is <= threshold
  # If this is true, the row is written to the new file, otherwise not
  if not any("VISIT_V05_vs_V02.limma.results.tsv".endswith(ext) for ext in [".csv", ".tsv", ".txt"]):
      exit("Please provide a .csv, .tsv or .txt file!")
  
  table = pd.read_csv("VISIT_V05_vs_V02.limma.results.tsv", sep=("," if "VISIT_V05_vs_V02.limma.results.tsv".endswith(".csv") else "	"), header=0)
  logFC_threshold = log2(float("2.0"))
  table = table[~table["log2FoldChange"].isna() &
              ~table["padj"].isna() &
              (pd.to_numeric(table["log2FoldChange"], errors='coerce').abs() >= float(logFC_threshold)) &
              (pd.to_numeric(table["padj"], errors='coerce') <= float("0.05"))]
  
  table.to_csv(path.splitext(path.basename("VISIT_V05_vs_V02.limma.results.tsv"))[0]+"_filtered.tsv", sep="	", index=False)
  
  with open('versions.yml', 'a') as version_file:
      version_file.write('"NFCORE_DIFFERENTIALABUNDANCE:DIFFERENTIALABUNDANCE:FILTER_DIFFTABLE":' + "\n")
      version_file.write("    pandas: " + str(pd.__version__) + "\n")

Command exit status:
  1

Command output:
  (empty)

Command error:
  Traceback (most recent call last):
    File "/usr/local/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3803, in get_loc
      return self._engine.get_loc(casted_key)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "pandas/_libs/index.pyx", line 138, in pandas._libs.index.IndexEngine.get_loc
    File "pandas/_libs/index.pyx", line 165, in pandas._libs.index.IndexEngine.get_loc
    File "pandas/_libs/hashtable_class_helper.pxi", line 5745, in pandas._libs.hashtable.PyObjectHashTable.get_item
    File "pandas/_libs/hashtable_class_helper.pxi", line 5753, in pandas._libs.hashtable.PyObjectHashTable.get_item
  KeyError: 'log2FoldChange'
  
  The above exception was the direct cause of the following exception:
  
  Traceback (most recent call last):
    File ".command.sh", line 18, in <module>
      table = table[~table["log2FoldChange"].isna() &
                     ~~~~~^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 3805, in __getitem__
      indexer = self.columns.get_loc(key)
                ^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/site-packages/pandas/core/indexes/base.py", line 3805, in get_loc
      raise KeyError(key) from err
  KeyError: 'log2FoldChange'

@KamilMaliszArdigen
Copy link
Author

KamilMaliszArdigen commented Nov 15, 2024

@DSchreyer There are differences in column names in outputs of deseq2 and limma so we created dedicated profile rnaseq_limma where this is adjusted for next steps. You can see in details what options are set here:

// Differential options

Please try to launch pipeline with following command:

        nextflow run output/differentialabundance/main.nf \
          --input ${ diffabundance_samplesheet } \
          --matrix ${ diffabundance_counts } \
          --contrasts ${ diffabundance_contrasts } \
          --outdir output \
          --observations_id_col sample \
          --features_id_col geneID \
          -profile rnaseq_limma

@DSchreyer
Copy link

@DSchreyer There are differences in column names in outputs of deseq2 and limma so we created dedicated profile rnaseq_limma where this is adjusted for next steps. You can see in details what options are set here:

// Differential options

Please try to launch pipeline with following command:

        nextflow run output/differentialabundance/main.nf \
          --input ${ diffabundance_samplesheet } \
          --matrix ${ diffabundance_counts } \
          --contrasts ${ diffabundance_contrasts } \
          --outdir output \
          --observations_id_col sample \
          --features_id_col geneID \
          -profile rnaseq_limma

@KamilMaliszArdigen Thanks, yes this helped to solve some of the issues. However, I am still getting the following error in the PLOT_EXPLORATORY process with my own dataset.

  In cond_log2_transform_matrix(matrix_data = assay_data[[index]],  :
    NaNs produced
  Fontconfig error: No writable cache directories
  Error in density.default(cond_log2_transform_matrix(plotmatrices[[pm]][,  : 
    'x' contains missing values
  Calls: ggplot_densityplot ... lapply -> FUN -> density -> density -> density.default
  Execution halted

could this come from the issue that we can have negative values in the normalised count matrix ? I also tested it with the test_rnaseq_limma profile which works as expected. However, i checked the normalised count tables from the test data and there are no negative values. Could this be an issue ?

@KamilMaliszArdigen
Copy link
Author

@DSchreyer So main difference is that in rnaseq profile we have exploratory_log2_assays = 'raw,normalised' and in rnaseq_limma this is left empty. I'm not sure what is happening in your data please take a look at the work directory and the LIMMA count table - I expect that there might be negative values present. If this is the case it will be needed to keep exploratory_log2_assays empty at least this is my understanding of the plotting logic.

@KamilMaliszArdigen
Copy link
Author

@nf-core-bot fix linting

@DSchreyer
Copy link

@DSchreyer So main difference is that in rnaseq profile we have exploratory_log2_assays = 'raw,normalised' and in rnaseq_limma this is left empty. I'm not sure what is happening in your data please take a look at the work directory and the LIMMA count table - I expect that there might be negative values present. If this is the case it will be needed to keep exploratory_log2_assays empty at least this is my understanding of the plotting logic.

Hi @KamilMaliszArdigen yes, i checked and there are negative values. When using --limma_use_voom false it skips the plotting which resolves the issue. Do you think that is a good idea or could interfere with the previous steps ?

@KamilMaliszArdigen
Copy link
Author

@DSchreyer So main difference is that in rnaseq profile we have exploratory_log2_assays = 'raw,normalised' and in rnaseq_limma this is left empty. I'm not sure what is happening in your data please take a look at the work directory and the LIMMA count table - I expect that there might be negative values present. If this is the case it will be needed to keep exploratory_log2_assays empty at least this is my understanding of the plotting logic.

Hi @KamilMaliszArdigen yes, i checked and there are negative values. When using --limma_use_voom false it skips the plotting which resolves the issue. Do you think that is a good idea or could interfere with the previous steps ?

Well this will not work as expected voom is responsible for normalisation of rna_seq data. So this will result in not normalised data processing. I recommend to set the --exploratory_log2_assays = '' and than simply during plot generation log2 will not be generated.

Copy link
Member

@pinin4fjords pinin4fjords left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this go into the usage docs please?

README.md Outdated Show resolved Hide resolved
conf/modules.config Outdated Show resolved Hide resolved
@DSchreyer
Copy link

@DSchreyer So main difference is that in rnaseq profile we have exploratory_log2_assays = 'raw,normalised' and in rnaseq_limma this is left empty. I'm not sure what is happening in your data please take a look at the work directory and the LIMMA count table - I expect that there might be negative values present. If this is the case it will be needed to keep exploratory_log2_assays empty at least this is my understanding of the plotting logic.

Hi @KamilMaliszArdigen yes, i checked and there are negative values. When using --limma_use_voom false it skips the plotting which resolves the issue. Do you think that is a good idea or could interfere with the previous steps ?

Well this will not work as expected voom is responsible for normalisation of rna_seq data. So this will result in not normalised data processing. I recommend to set the --exploratory_log2_assays = '' and than simply during plot generation log2 will not be generated.

Using --exploratory_log2_assays "" \ results in the following error:

The following invalid input values have been detected:

* --exploratory_log2_assays (true): Value is [boolean] but should be [string]

@DSchreyer
Copy link

I experience another issue when trying to run limma with the blocking column:

contrasts.csv

id,variable,reference,target,blocking
TRT_VISIT_TRT_Dose1_V05_vs_V02,TRT_VISIT,TRT_Dose1_V02,TRT_Dose1_V05,SUBJID

error:

Coefficients not estimable: TRT_VISIT.TRT_Dose1_V05 TRT_VISIT.TRT_Dose2_V05
Warning message:
Partial NA coefficients for 1150 probe(s)
Coefficients not estimable: TRT_VISIT.TRT_Dose1_V05 TRT_VISIT.TRT_Dose2_V05
Warning message:
Partial NA coefficients for 1150 probe(s)
Error in contrasts.fit(fit, contrast.matrix) :
  trying to take contrast of non-estimable coefficient
Execution = design
)
halted

without the blocking column it works normally. Any issues on my side here or something that needs to be fixed on pipeline level? This should be with regards to the mixedmodel analysis

@pinin4fjords
Copy link
Member

@DSchreyer could you maybe make future support queries in Slack (differentialabundance channel) please? This PR is not really related.

But it sounds like you're specifying a blocking variable that's non-independent from your variable of interest.

@KamilMaliszArdigen
Copy link
Author

@nf-core-bot fix linting

@KamilMaliszArdigen
Copy link
Author

@DSchreyer So main difference is that in rnaseq profile we have exploratory_log2_assays = 'raw,normalised' and in rnaseq_limma this is left empty. I'm not sure what is happening in your data please take a look at the work directory and the LIMMA count table - I expect that there might be negative values present. If this is the case it will be needed to keep exploratory_log2_assays empty at least this is my understanding of the plotting logic.

Hi @KamilMaliszArdigen yes, i checked and there are negative values. When using --limma_use_voom false it skips the plotting which resolves the issue. Do you think that is a good idea or could interfere with the previous steps ?

Well this will not work as expected voom is responsible for normalisation of rna_seq data. So this will result in not normalised data processing. I recommend to set the --exploratory_log2_assays = '' and than simply during plot generation log2 will not be generated.

Using --exploratory_log2_assays "" \ results in the following error:

The following invalid input values have been detected:

* --exploratory_log2_assays (true): Value is [boolean] but should be [string]

Have you provided --exploratory_log2_assays="" or --exploratory_log2_assays "" in CLI while launching nextflow? This is evaluated differently by nextflow.

@grst
Copy link
Member

grst commented Dec 20, 2024

Hi @KamilMaliszArdigen,

thanks for your work on this! We aligned within our team at BI and with @pinin4fjords, and we decided to close this PR and move forward with implementing DREAM (#363) instead. This is a generalization of limma that works with an arbitrary number of random effects, doesn't require manual addition of columns to the metadata table and allows to conveniently specify random effects using a wilkinson formula, e.g. ~ timepoint + (1 | subject_id).

I know it sucks throwing this away, but sometimes one needs to get started on something to find out that it's not the best way forward.

@grst grst closed this Dec 20, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Mixed-model in Limma
5 participants