hicConvertFormat : Missing negative values in ginteractions (.tsv) output #584

Open
u-n-i-v-e-r-z opened this issue Jul 13, 2020 · 2 comments


@u-n-i-v-e-r-z

Hi everyone,

I hope you're all fine.
I noticed that hicConvertFormat does not write out the negative values that are present in a given .h5 matrix (KR-normalized).
This matrix represents differential interaction measures between two given conditions.

I'm using hicexplorer_3.4.3
Here is the output when I import the .h5 matrix in Python:

>>> from hicmatrix import HiCMatrix
>>> ma = HiCMatrix.hiCMatrix("myMatrix_25kb.norm.KR.h5")
>>> ma.getMatrix()

matrix([[ 1.8099686 , -1.33049537, -3.69973388, ...,  0.        ,
          0.        , -1.08915401],
        [-1.33049537,  2.01023492, -3.59139214, ..., -1.1260478 ,
          0.16709778,  0.20518745],
        [-3.69973388, -3.59139214,  1.7171135 , ..., -0.7801746 ,
          0.        , -0.19838392],
        ...,
        [ 0.        , -1.1260478 , -0.7801746 , ...,  1.69022627,
          1.68831546, -0.23708201],
        [ 0.        ,  0.16709778,  0.        , ...,  1.68831546,
          0.96983819, -0.15338982],
        [-1.08915401,  0.20518745, -0.19838392, ..., -0.23708201,
         -0.15338982,  3.50339214]])

Negative values are present, which is confirmed when plotting the matrix.
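As a baseline for the comparison, one can count how many entries of each sign the matrix holds on and above the diagonal, since the ginteractions excerpts in this thread list each symmetric pair once. A minimal sketch with a hypothetical helper, assuming the matrix is available as `(row, col, value)` bin-index triples (for example from SciPy's `triu(...).tocoo()`); the toy data are the top-left 3x3 corner of the matrix printed above.

```python
from collections import Counter
from typing import Iterable, Tuple

Entry = Tuple[int, int, float]  # (row bin index, column bin index, value)


def count_signs_upper(entries: Iterable[Entry]) -> Counter:
    """Count positive, negative and zero values on and above the diagonal."""
    counts: Counter = Counter()
    for row, col, value in entries:
        if row > col:
            continue  # lower triangle mirrors the upper one in a symmetric matrix
        if value > 0:
            counts["positive"] += 1
        elif value < 0:
            counts["negative"] += 1
        else:
            counts["zero"] += 1
    return counts


# Top-left 3x3 corner of the matrix printed above (both triangles).
corner = [
    (0, 0, 1.8099686), (0, 1, -1.33049537), (0, 2, -3.69973388),
    (1, 0, -1.33049537), (1, 1, 2.01023492), (1, 2, -3.59139214),
    (2, 0, -3.69973388), (2, 1, -3.59139214), (2, 2, 1.7171135),
]
print(count_signs_upper(corner))  # -> Counter({'positive': 3, 'negative': 3})
```

Even in this small corner, half of the upper-triangle entries are negative, so a correct export should contain a comparable share of negative lines.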

Then I run hicConvertFormat like this:

hicConvertFormat --matrices myMatrix_25kb.norm.KR.h5 --outFileName myMatrix_25kb.norm.KR.gi --inputFormat h5 --outputFormat ginteractions

And this is the resulting .tsv file, loaded in R into the object cm:

> cm <- data.table::fread("myMatrix_25kb.norm.KR.gi.tsv")
> cm[order(cm$V4,cm$V5),]

         V1       V2       V3 V4       V5       V6         V7
      1:  I        0    25000  I        0    25000 1.80996860
      2:  I    25000    50000  I    25000    50000 2.01023492
      3:  I    50000    75000  I    50000    75000 1.71711350
      4:  I    50000    75000  I    75000   100000 0.04064520
      5:  I    75000   100000  I    75000   100000 1.71038424

What is weird is that when I ask for negative values, the only one I get is located at the very end of chromosome I:

> cm[cm$V7 < 0,]

   V1       V2       V3 V4       V5       V6         V7
1:  I 15050000 15072434  I 15050000 15072434 -0.6984005

I would rather use the .h5 file directly, but I have several problems with blosc filter compatibility and currently cannot load the matrices in R.
Any workaround would be greatly appreciated!
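One possible workaround, sketched under assumptions: write the file from Python directly so that every stored entry is kept regardless of its sign. The helper below is hypothetical and not part of HiCExplorer or HiCMatrix; it assumes `(row, col, value)` bin-index triples and bins given as `(chrom, start, end, ...)` tuples, and it writes the same seven tab-separated columns as the ginteractions output shown in this thread.

```python
import csv
import io
from typing import Iterable, Sequence, TextIO, Tuple

Entry = Tuple[int, int, float]  # (row bin index, column bin index, value)


def write_ginteractions(entries: Iterable[Entry],
                        intervals: Sequence[tuple],
                        handle: TextIO) -> int:
    """Write one tab-separated line per non-zero upper-triangle entry.

    Negative values are written like any other value.
    Returns the number of lines written.
    """
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    written = 0
    for row, col, value in entries:
        if row > col or value == 0:
            continue  # skip the mirrored lower triangle and explicit zeros
        chrom1, start1, end1 = intervals[row][:3]
        chrom2, start2, end2 = intervals[col][:3]
        writer.writerow([chrom1, start1, end1,
                         chrom2, start2, end2, repr(float(value))])
        written += 1
    return written


# Toy example: three 25 kb bins on chromosome I, values from the matrix above.
bins = [("I", 0, 25000), ("I", 25000, 50000), ("I", 50000, 75000)]
entries = [(0, 0, 1.8099686), (0, 1, -1.33049537), (1, 2, -3.59139214)]
out = io.StringIO()
n = write_ginteractions(entries, bins, out)
print(out.getvalue(), end="")
```

For the real matrix, the triples could come from `zip(coo.row, coo.col, coo.data)` with `coo = scipy.sparse.triu(ma.matrix).tocoo()`, and the bins from `ma.cut_intervals`; both attribute names are assumptions to check against the installed HiCMatrix version. Writing to a new file name keeps the original export for comparison.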

BR

Alex

@joachimwolff
Collaborator

The posted values match between the matrix and your ginteractions file, at least the first three: these are exactly the diagonal values (0, 0), (1, 1) and (2, 2).

  • What is the content of the ginteractions file if you do not open it with R but with something like less?
  • What are the values at I 0 25000 I 25000 50000, I 0 25000 I 50000 75000, I 0 25000 I 75000 100000?
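These lookups can also be done without R. A minimal standard-library sketch (the function name is hypothetical) that streams the .tsv once and reports the value of each requested bin pair, or None when the pair is absent; it assumes the seven-column layout shown in this thread with no header line, and matches each pair in both orientations. The toy lines are modelled on the excerpts posted in this thread.

```python
import csv
import io
from typing import Dict, Iterable, List, Optional, Tuple

Bin = Tuple[str, int, int]  # (chrom, start, end)
Pair = Tuple[Bin, Bin]


def find_pairs(lines: Iterable[str], wanted: List[Pair]) -> Dict[Pair, Optional[float]]:
    """Stream a ginteractions .tsv once and return the value of each wanted pair.

    Pairs not found in the file map to None.
    """
    # Map both orientations of every requested pair back to the requested key.
    targets: Dict[Pair, Pair] = {}
    for a, b in wanted:
        targets[(a, b)] = (a, b)
        targets[(b, a)] = (a, b)
    found: Dict[Pair, Optional[float]] = {pair: None for pair in wanted}
    for fields in csv.reader(lines, delimiter="\t"):
        if len(fields) < 7:
            continue  # skip blank or malformed lines
        key = ((fields[0], int(fields[1]), int(fields[2])),
               (fields[3], int(fields[4]), int(fields[5])))
        if key in targets:
            found[targets[key]] = float(fields[6])
    return found


sample = io.StringIO(
    "I\t0\t25000\tI\t0\t25000\t1.8099686004600926\n"
    "I\t0\t25000\tI\t3675000\t3700000\t0.2824356663718901\n"
    "I\t50000\t75000\tI\t75000\t100000\t0.0406452\n"
)
first = ("I", 0, 25000)
wanted = [(first, ("I", 25000, 50000)),
          (first, ("I", 50000, 75000)),
          (("I", 75000, 100000), ("I", 50000, 75000))]  # reversed on purpose
for pair, value in find_pairs(sample, wanted).items():
    print(pair, value)
```

Streaming avoids holding all three million lines in memory, which a full dictionary of the file would require.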

We just write out the content of the matrix: https://github.com/deeptools/HiCMatrix/blob/master/hicmatrix/lib/ginteractions.py#L25

Maybe the problem is located in R?
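Whether R is involved can be checked without it. A minimal standard-library sketch (hypothetical function name; it assumes the seven tab-separated columns shown in this thread and no header line) that counts data lines and collects the rows whose last column is negative, much like `wc -l` combined with `cm[cm$V7 < 0,]`. The toy lines are modelled on the excerpts above.

```python
import csv
import io
from typing import Iterable, List, Tuple


def scan_values(lines: Iterable[str]) -> Tuple[int, List[List[str]]]:
    """Return (number of data lines, rows whose 7th column is negative)."""
    n_rows = 0
    negatives: List[List[str]] = []
    for fields in csv.reader(lines, delimiter="\t"):
        if not fields:
            continue  # ignore blank lines
        n_rows += 1
        if float(fields[6]) < 0:
            negatives.append(fields)
    return n_rows, negatives


sample = io.StringIO(
    "I\t0\t25000\tI\t0\t25000\t1.8099686004600926\n"
    "I\t50000\t75000\tI\t75000\t100000\t0.0406452\n"
    "I\t15050000\t15072434\tI\t15050000\t15072434\t-0.6984005\n"
)
n_rows, negatives = scan_values(sample)
print(n_rows, len(negatives))
```

On the real file this would be `with open("myMatrix_25kb.norm.KR.gi.tsv", newline="") as fh: n_rows, negatives = scan_values(fh)`; if the counts agree with R, the missing values are absent from the file itself.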

@u-n-i-v-e-r-z
Author

@joachimwolff,

- What is the content of the ginteractions file if you do not open it with R but with something like less?

These are the first lines, as shown by less:

I       0       25000   I       0       25000   1.8099686004600926
I       0       25000   I       3675000 3700000 0.2824356663718901
I       0       25000   I       3975000 4000000 0.46480353773800975
I       0       25000   I       4000000 4025000 0.136099560651966
I       0       25000   I       4575000 4600000 0.35421538999192403
I       0       25000   I       5100000 5125000 0.606264897271899

- What are the values at I 0 25000 I 25000 50000, I 0 25000 I 50000 75000, I 0 25000 I 75000 100000?

> cm[V1=="I" & V2==0 & V3==25000 & V4=="I" & V5==25000 & V6==50000,]
Empty data.table (0 rows and 7 cols): V1,V2,V3,V4,V5,V6...

> cm[V1=="I" & V2==0 & V3==25000 & V4=="I" & V5==50000 & V6==75000,]
Empty data.table (0 rows and 7 cols): V1,V2,V3,V4,V5,V6...

> cm[V1=="I" & V2==0 & V3==25000 & V4=="I" & V5==75000 & V6==100000,]
Empty data.table (0 rows and 7 cols): V1,V2,V3,V4,V5,V6...

> # Test a random range on the diagonal
> cm[V1=="I" & V2==0 & V3==25000 & V4=="I" & V5==0 & V6==25000,]
   V1 V2    V3 V4 V5    V6       V7
1:  I  0 25000  I  0 25000 1.809969

> # Test a random range off the diagonal
> cm[V1=="I" & V2==50000 & V3==75000 & V4=="I" & V5==75000 & V6==100000,]
   V1    V2    V3 V4    V5     V6        V7
1:  I 50000 75000  I 75000 100000 0.0406452

To me, nothing is wrong on the R side here: the negative values are simply discarded.
To be sure, I checked that the raw .tsv file and the table loaded into R have the same number of lines:

Bash

wc -l myMatrix_25kb.norm.KR.gi.tsv
3091491

R

> nrow(cm)
[1] 3091491
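Matching line counts show that R reads the whole file, so the missing values are absent from the .tsv itself. A possible next check, sketched below with hypothetical names: map each written row back to bin indices and tally the non-zero upper-triangle matrix entries that never made it into the file, split by sign. It assumes `(row, col, value)` triples for the matrix and bins given as `(chrom, start, end, ...)` tuples; the toy data reuse the 3x3 corner above with only the diagonal written, as in the sorted excerpt.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Set, Tuple

Bin = Tuple[str, int, int]  # (chrom, start, end)


def missing_by_sign(entries: Iterable[Tuple[int, int, float]],
                    intervals: Sequence[tuple],
                    written: Iterable[Tuple[Bin, Bin]]) -> Counter:
    """Tally non-zero upper-triangle matrix entries that are absent from the file."""
    # Look up a bin's index from its (chrom, start, end) coordinates.
    index: Dict[Bin, int] = {tuple(iv[:3]): i for i, iv in enumerate(intervals)}
    present: Set[Tuple[int, int]] = set()
    for bin1, bin2 in written:
        i, j = index[bin1], index[bin2]
        present.add((min(i, j), max(i, j)))  # orientation-independent key
    missing: Counter = Counter()
    for row, col, value in entries:
        if row > col or value == 0:
            continue  # only non-zero upper-triangle entries are expected in the file
        if (row, col) not in present:
            missing["negative" if value < 0 else "positive"] += 1
    return missing


bins = [("I", 0, 25000), ("I", 25000, 50000), ("I", 50000, 75000)]
entries = [(0, 0, 1.8099686), (0, 1, -1.33049537), (0, 2, -3.69973388),
           (1, 1, 2.01023492), (1, 2, -3.59139214), (2, 2, 1.7171135)]
written = [(b, b) for b in bins]  # only the diagonal appears in the file
print(missing_by_sign(entries, bins, written))  # -> Counter({'negative': 3})
```

If, on the real data, nearly all missing entries fall in the negative bucket, that would pin the loss to the export step rather than to R.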
