[Feature request] Keep all windows in the scoreMatrix #197

Open
balwierz opened this issue Jan 27, 2021 · 2 comments

@balwierz

Currently, ScoreMatrix does not keep all windows in its output: it drops every window that overlaps no features in target.
This causes two problems for research use:

  • On a plotted heatmap, it gives the impression that there is more signal across windows than there really is.
  • It breaks the correspondence between heatmap rows and windows. In any subsequent analysis, the user has to build a mapping between the two. Grouping becomes cumbersome and may lead to severe bugs.

Personally, I cannot imagine a situation where removing windows would be beneficial. No overlap is a valid research result. Zero signal lies on the continuum next to very low signal (which is included).

@frenkiboy
Contributor

Dear Piotr,

Rownames of the scorematrix should correspond to the windows which were kept during the overlap.
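
A minimal sketch of how those rownames could be used to restore one row per window. It uses base R only; `padScoreMatrix` is a hypothetical helper, not part of any package, and it assumes that the rownames hold the 1-based indices of the kept windows.

```r
# Hypothetical helper: pad a score matrix back to one row per window.
# Assumption: rownames(sm) are the 1-based indices of the windows that were kept.
padScoreMatrix <- function(sm, n_windows, fill = 0) {
    stopifnot(!is.null(rownames(sm)))
    kept <- as.integer(rownames(sm))
    stopifnot(!anyNA(kept), all(kept >= 1L), all(kept <= n_windows))
    # Start from a matrix filled with `fill`, then copy the kept rows into place.
    full <- matrix(fill, nrow = n_windows, ncol = ncol(sm),
                   dimnames = list(seq_len(n_windows), colnames(sm)))
    full[kept, ] <- sm
    full
}

# Toy data: 4 windows, of which only windows 1 and 3 were kept.
sm <- matrix(c(1, 2, 3,
               4, 5, 6), nrow = 2, byrow = TRUE,
             dimnames = list(c("1", "3"), NULL))
padded <- padScoreMatrix(sm, n_windows = 4)
```

After padding, row i of the matrix refers to window i again, so grouping by window annotation needs no extra lookup.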

You can also get around this by setting the seqlengths (from one of your previous examples):
seqlengths(gr3) = seqlengths(gr4) = 10
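
For readers without the earlier example at hand, here is a small sketch of the same assignment on made-up objects. The ranges and the chromosome length of 10 are assumptions for illustration. The sketch only shows one observable effect of setting seqlengths, namely the length of the coverage vector, and makes no claim about ScoreMatrix internals.

```r
library(GenomicRanges)

# Made-up toy data: two windows and one target feature on chr1.
windows <- GRanges("chr1", IRanges(start = c(1, 6), width = 5))
target  <- GRanges("chr1", IRanges(start = 2, width = 2))

# Without seqlengths, coverage only extends to the last covered position.
len_before <- length(coverage(target)[["chr1"]])

# Declare the chromosome length on both objects.
seqlengths(windows) <- c(chr1 = 10)
seqlengths(target)  <- c(chr1 = 10)

# With seqlengths set, coverage spans the declared chromosome length.
len_after <- length(coverage(target)[["chr1"]])
```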

Cheers,

v

@balwierz
Author

Hi,

Thanks, I know about the rownames and I am using them. I also always set the seqinfo of any GRanges object I create in R.
I raised this not for myself but for other users, as I know some people never assign seqinfo.

I am currently using a slightly slower, high-level variation on ScoreMatrix that keeps all the windows and keeps window:target orientation pairs only (#108):

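library(GenomicRanges)  # GRanges, findOverlaps, coverage, Rle
library(dplyr)          # %>%, as_tibble, group_by, summarise, pull

ScoreMatrixPiotr <- function(target, windows, ignore.strand=FALSE)
{
    # All windows must share one width so that every row has the same length.
    stopifnot(max(width(windows)) == min(width(windows)))
    # Per-base coverage of `target` across one window. The coverage vector is
    # padded with zeros so that the window never indexes past its end.
    summariseOneRow <- function(target, window)
    {
        as.numeric(append(coverage(ranges(target)), Rle(0L, 1000000000))[ranges(window)])
    }
    # One column per window, pre-filled with 0 so that windows without any
    # overlap are kept. This will be transposed on return.
    ret <- matrix(0, ncol = length(windows), nrow = width(windows[1]))
    # For every window with at least one hit, compute coverage from its hits.
    o <- findOverlaps(target, windows, ignore.strand=ignore.strand) %>%
        as_tibble() %>%
        group_by(subjectHits) %>%
        summarise(data=list(summariseOneRow(target[queryHits], windows[subjectHits[1]])))
    rowI <- pull(o, subjectHits)
    data <- pull(o, data)
    strands <- as.vector(strand(windows))
    for(i in seq_along(rowI))
    {
        # Reverse minus-strand windows so values follow the window's orientation.
        if(strands[rowI[i]] == "-")
            ret[ , rowI[i]] <- rev(data[[i]])
        else
            ret[ , rowI[i]] <- data[[i]]
    }
    t(ret)
}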
