-
Notifications
You must be signed in to change notification settings - Fork 41
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add a warning in the RMD report when some rows are lost after merging Diff and Fatures tables #354
Add a warning in the RMD report when some rows are lost after merging Diff and Fatures tables #354
Conversation
|
A lot of other changes here, do you need to sort out via Git? |
b468572
to
993d74f
Compare
@pinin4fjords Sorry about that, it's corrected now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments, but we should probably handle this upstream actually. We should be able to guarantee that this step receives a feature table with all the features. I feel like if that's not happening there's a bigger problem elsewhere.
Co-authored-by: Jonathan Manning <[email protected]>
Could you merge the latest changes from dev? I've incorporated some modifications to the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The new logic is quite verbose, I asked ChatGPT to suggest a more concise version:
# Read differential results and merge with features table
results <- lapply(differential_files, function(diff_file) {
if (!file.exists(diff_file)) stop(paste("Differential file", diff_file, "does not exist"))
diff <- read_differential(
diff_file,
feature_id_column = params$differential_feature_id_column,
fc_column = params$differential_fc_column,
pval_column = params$differential_pval_column,
qval_column = params$differential_qval_column
)
# Log transform fold changes if not already logged
if (!params$differential_foldchanges_logged) {
diff[[params$differential_fc_column]] <- log2(diff[[params$differential_fc_column]])
}
# Annotate differential table if features table is provided
if (!is.null(params$features)) {
merged <- merge(features, diff, by.x = params$features_id_col, by.y = params$differential_feature_id_column)
n_missing <- length(setdiff(diff[[params$differential_feature_id_column]], merged[[params$features_id_col]]))
warnings <- c(
if (n_missing > 0) sprintf(
'<p style="color:#DAA520;"><strong>WARNING:</strong> %d IDs from the differential table (%s) were lost on merge with features table (%s).</p>',
n_missing, basename(diff_file), basename(params$features)
),
if (nrow(merged) < nrow(diff)) sprintf(
'<p style="color:#DAA520;"><strong>WARNING:</strong> Rows were lost on merge (%s -> %s). Original: %d, Merged: %d.</p>',
basename(diff_file), basename(params$features), nrow(diff), nrow(merged)
),
if (nrow(merged) > nrow(diff)) sprintf(
'<p style="color:#DAA520;"><strong>WARNING:</strong> Rows were duplicated on merge (%s -> %s). Original: %d, Merged: %d.</p>',
basename(diff_file), basename(params$features), nrow(diff), nrow(merged)
)
)
} else {
merged <- diff
warnings <- character(0)
}
list(diff_features = merged, warnings = warnings)
})
# Separate differential_results and warnings_list from results
differential_results <- lapply(results, `[[`, "diff_features")
warnings_list <- unlist(lapply(results, `[[`, "warnings"))
What do you think?
It would also be good to have a 'before and after' of the report from test_full, to make sure nothing is changed unexpectedly. I can do it, but not quickly, so if you have time to do that and post them here that would be awesome.
@pinin4fjords Done! I just added some comments to your solution to make it not only shorter but also readable. I cannot attach the output HTML files here, If you know how to do it just let me know. To my surprise, some genes were lost in the process when running the test_full profile |
OK, if this is having an impact on results we'll need to spend some time figuring that out before any merge |
Hello @pinin4fjords! I'd kindly suggest going on with this PR. This is a preexisting problem that got visualized now with these warnings (reinforcing the concept of always checking these steps). It's not about the new code as you can see with the following outputs: Printing ## Dimensions of DIFF before merging
## 31317 4
## Dimensions of DIFF after merging
## 27743 6 Printing ## Dimensions of DIFF before merging
## 31317 4
## Dimensions of DIFF after merging
## 27743 6 We can work on the issue after this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, sorry, misunderstood your comment, go ahead :-)
PR checklist
CHANGELOG.md
is updated.Issue reported:
On #340 it was reported that many genes were lost in the volcano plot generated by the R Markdown report, but that the PNG file was correct. These genes get lost if the annotation table doesn't contain them, resulting in reduced plots and results in this last part. As mentioned by the user, big discrepancies can be easily catched, but the small ones can go unnoticed, ending up in undesired results.
So, a warning message was added if the annotated differential expressed table contains fewer rows. Other solutions can be: