Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a warning in the RMD report when some rows are lost after merging Diff and Fatures tables #354

Conversation

alanmmobbs93
Copy link

@alanmmobbs93 alanmmobbs93 commented Nov 14, 2024

PR checklist

  • This comment contains a description of changes (with reason).
  • CHANGELOG.md is updated.

Issue reported:

On #340 it was reported that many genes were lost in the volcano plot generated by the R Markdown report, but that the PNG file was correct. These genes get lost if the annotation table doesn't contain them, resulting in reduced plots and results in this last part. As mentioned by the user, big discrepancies can be easily catched, but the small ones can go unnoticed, ending up in undesired results.

So, a warning message was added if the annotated differential expressed table contains fewer rows. Other solutions can be:

  • Retain all rows even if there's no data about these genes.
  • Report the error and stop the pipeline (I believe this should be done as an early validation when starting the pipeline).

Copy link

github-actions bot commented Nov 14, 2024

nf-core pipelines lint overall result: Passed ✅ ⚠️

Posted for pipeline commit e7fd0b4

+| ✅ 300 tests passed       |+
#| ❔   6 tests were ignored |#
!| ❗   4 tests had warnings |!

❗ Test warnings:

  • pipeline_todos - TODO string in main.nf: Optionally add in-text citation tools to this list.
  • pipeline_todos - TODO string in main.nf: Optionally add bibliographic entries to this list.
  • pipeline_todos - TODO string in main.nf: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled!
  • pipeline_todos - TODO string in base.config: Check the defaults for all processes

❔ Tests ignored:

✅ Tests passed:

Run details

  • nf-core/tools version 3.0.2
  • Run at 2024-11-20 12:20:40

@pinin4fjords
Copy link
Member

A lot of other changes here, do you need to sort out via Git?

@alanmmobbs93 alanmmobbs93 force-pushed the update/warning-for-merging-counts-and-features branch from b468572 to 993d74f Compare November 14, 2024 19:11
@alanmmobbs93
Copy link
Author

alanmmobbs93 commented Nov 14, 2024

@pinin4fjords Sorry about that, it's corrected now

Copy link
Member

@pinin4fjords pinin4fjords left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments, but we should probably handle this upstream actually. We should be able to guarantee that this step receives a feature table with all the features. I feel like if that's not happening there's a bigger problem elsewhere.

assets/differentialabundance_report.Rmd Outdated Show resolved Hide resolved
assets/differentialabundance_report.Rmd Outdated Show resolved Hide resolved
@atrigila
Copy link

Could you merge the latest changes from dev? I've incorporated some modifications to the ci.yml, that should make your tests run :).

Copy link
Member

@pinin4fjords pinin4fjords left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new logic is quite verbose, I asked ChatGPT to suggest a more concise version:

# Read differential results and merge with features table
results <- lapply(differential_files, function(diff_file) {
    if (!file.exists(diff_file)) stop(paste("Differential file", diff_file, "does not exist"))

    diff <- read_differential(
        diff_file,
        feature_id_column = params$differential_feature_id_column,
        fc_column = params$differential_fc_column,
        pval_column = params$differential_pval_column,
        qval_column = params$differential_qval_column
    )

    # Log transform fold changes if not already logged
    if (!params$differential_foldchanges_logged) {
        diff[[params$differential_fc_column]] <- log2(diff[[params$differential_fc_column]])
    }

    # Annotate differential table if features table is provided
    if (!is.null(params$features)) {
        merged <- merge(features, diff, by.x = params$features_id_col, by.y = params$differential_feature_id_column)
        n_missing <- length(setdiff(diff[[params$differential_feature_id_column]], merged[[params$features_id_col]]))
        warnings <- c(
            if (n_missing > 0) sprintf(
                '<p style="color:#DAA520;"><strong>WARNING:</strong> %d IDs from the differential table (%s) were lost on merge with features table (%s).</p>',
                n_missing, basename(diff_file), basename(params$features)
            ),
            if (nrow(merged) < nrow(diff)) sprintf(
                '<p style="color:#DAA520;"><strong>WARNING:</strong> Rows were lost on merge (%s -> %s). Original: %d, Merged: %d.</p>',
                basename(diff_file), basename(params$features), nrow(diff), nrow(merged)
            ),
            if (nrow(merged) > nrow(diff)) sprintf(
                '<p style="color:#DAA520;"><strong>WARNING:</strong> Rows were duplicated on merge (%s -> %s). Original: %d, Merged: %d.</p>',
                basename(diff_file), basename(params$features), nrow(diff), nrow(merged)
            )
        )
    } else {
        merged <- diff
        warnings <- character(0)
    }

    list(diff_features = merged, warnings = warnings)
})

# Separate differential_results and warnings_list from results
differential_results <- lapply(results, `[[`, "diff_features")
warnings_list <- unlist(lapply(results, `[[`, "warnings"))

What do you think?

It would also be good to have a 'before and after' of the report from test_full, to make sure nothing is changed unexpectedly. I can do it, but not quickly, so if you have time to do that and post them here that would be awesome.

@alanmmobbs93
Copy link
Author

@pinin4fjords Done! I just added some comments to your solution to make it not only shorter but also readable. I cannot attach the output HTML files here, If you know how to do it just let me know. To my surprise, some genes were lost in the process when running the test_full profile

@pinin4fjords
Copy link
Member

To my surprise, some genes were lost in the process when running the test_full profile

OK, if this is having an impact on results we'll need to spend some time figuring that out before any merge

@alanmmobbs93
Copy link
Author

Hello @pinin4fjords! I'd kindly suggest going on with this PR. This is a preexisting problem that got visualized now with these warnings (reinforcing the concept of always checking these steps). It's not about the new code as you can see with the following outputs:

Printing dim(diff) on dev branch

## Dimensions of DIFF before merging
## 31317 4 
## Dimensions of DIFF after merging
## 27743 6

Printing dim(diff) on feature branch:

## Dimensions of DIFF before merging
## 31317 4 
## Dimensions of DIFF after merging
## 27743 6 

We can work on the issue after this

Copy link
Member

@pinin4fjords pinin4fjords left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, sorry, misunderstood your comment, go ahead :-)

@alanmmobbs93 alanmmobbs93 merged commit 6307b48 into nf-core:dev Nov 21, 2024
16 checks passed
@alanmmobbs93 alanmmobbs93 deleted the update/warning-for-merging-counts-and-features branch December 12, 2024 22:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants