Loss of *.cool 'weight' column when working with hicAdjustMatrix; possible to retain 'weight' column? #740

Open
7 tasks done
kalavattam opened this issue Jul 25, 2021 · 2 comments

Comments

@kalavattam

kalavattam commented Jul 25, 2021

Welcome to the HiCExplorer GitHub repository! Before opening the issue, please check
that the following requirements are met:

  • Search whether this issue (or a similar issue) has been solved before using the search tab above. Link the previous issue if appropriate below.

Maybe on TODO already? The TODO bullet references this issue.

  • Paste your HiCExplorer version (hicInfo --version) and your python version (python --version) below.
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
DEBUG:h5py._conv:Creating converter from 7 to 5
DEBUG:h5py._conv:Creating converter from 5 to 7
hicInfo 3.7-dev
Python 3.8.3
cooler, version 0.8.11
  • Have you checked our documentation on hicexplorer.readthedocs.io?
  • Do you use conda to install HiCExplorer?
  • Do you use the latest HiCExplorer release? If not, please install it via a conda environment:
    conda create --name hicexplorer hicexplorer=3.6 python=3.8 -c bioconda -c conda-forge
    and activate the environment: conda activate hicexplorer. Retry your command. You can exit a conda environment via conda deactivate. To learn more about conda and environments, please consider the following documentation.

Retry your command, is it solved now? If not please continue with the following:

  • Paste the full HiCExplorer command that produces the issue below
    (ignore if you simply spotted the issue in the code/documentation).
parallel --header : --colsep " " -k -j "${parallel}" \
"hicAdjustMatrix \
--matrix {infile} \
--outFileName {outfile} \
--chromosomes chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX \
--interIntraHandling {intra_inter}" \
:::: "${list}"

#  have tested with {intra_inter} as either "inter" or "intra" 
#+ {infile} is, for example, cool2cool_balance_cardioPooled_D0_500000_downsampled_pooled_cis.0.78.cool
#+ {outfile} is, for example, cool2cool_balance_cardioPooled_D0_500000_downsampled_pooled_cis.0.78.sans-inter.cool
  • Paste the output printed on screen from the command that produces the issue
    below (ignore if you simply spotted the issue in the code/documentation).

Example hicInfo readout for infile, cool2cool_balance_cardioPooled_D0_500000_downsampled_pooled_cis.0.78.cool

# Matrix information file. Created with HiCExplorer's hicInfo version 3.6
File:	/net/noble/vol7/kga0/2020_endothelial-diff/data/HiC-Pro_hic2cool_cool2cool_downsampled_balance/500000/pooled_cis/cool2cool_balance_cardioPooled_D0_500000_downsampled_pooled_cis.0.78.cool
Date:	2021-07-20T17:28:02.057103
Genome assembly:	unknown
Size:	6,189
Bin_length:	500000
Chromosomes:length: chrM: 16569 bp; chr1: 248956422 bp; chr2: 242193529 bp; chr3: 198295559 bp; chr4: 190214555 bp; chr5: 181538259 bp; chr6: 170805979 bp; chr7: 159345973 bp; chr8: 145138636 bp; chr9: 138394717 bp; chr10: 133797422 bp; chr11: 135086622 bp; chr12: 133275309 bp; chr13: 114364328 bp; chr14: 107043718 bp; chr15: 101991189 bp; chr16: 90338345 bp; chr17: 83257441 bp; chr18: 80373285 bp; chr19: 58617616 bp; chr20: 64444167 bp; chr21: 46709983 bp; chr22: 50818468 bp; chrX: 156040895 bp; chrY: 57227415 bp; 
Number of chromosomes:	25
Non-zero elements:	13,690,671
The following columns are available: ['chrom' 'start' 'end' 'weight']


Generated by:	cooler-0.8.6.post0

Example hicInfo readout for outfile, cool2cool_balance_cardioPooled_D0_500000_downsampled_pooled_cis.0.78.sans-inter.cool (i.e., retain cis contacts only)

# Matrix information file. Created with HiCExplorer's hicInfo version 3.6
File:	/net/noble/vol7/kga0/2020_endothelial-diff/data/HiC-Pro_hic2cool_cool2cool_downsampled_balance/500000/pooled_cis/cool2cool_balance_cardioPooled_D0_500000_downsampled_pooled_cis.0.78.sans-inter.cool
Date:	2021-07-25T14:31:02.147295
Genome assembly:	unknown
Size:	6,073
Bin_length:	500000
Chromosomes:length: chr1: 248956422 bp; chr2: 242193529 bp; chr3: 198295559 bp; chr4: 190214555 bp; chr5: 181538259 bp; chr6: 170805979 bp; chr7: 159345973 bp; chr8: 145138636 bp; chr9: 138394717 bp; chr10: 133797422 bp; chr11: 135086622 bp; chr12: 133275309 bp; chr13: 114364328 bp; chr14: 107043718 bp; chr15: 101991189 bp; chr16: 90338345 bp; chr17: 83257441 bp; chr18: 80373285 bp; chr19: 58617616 bp; chr20: 64444167 bp; chr21: 46709983 bp; chr22: 50818468 bp; chrX: 156040895 bp; 
Number of chromosomes:	23
Non-zero elements:	776,949
The following columns are available: ['chrom' 'start' 'end']


Generated by:	HiCMatrix-15
Cooler library version:	cooler-0.8.11
HiCMatrix url:	https://github.com/deeptools/HiCMatrix
#  fwiw, I have excluded chrM and chrY in the conversion from infile to outfile...

Wanted behavior: Retention of column 'weight' in outfile (and/or an option to retain or discard the column)
Actual behavior: Loss of column 'weight' in outfile

Note: Thank you for implementing an option to exclude cis or trans contacts via hicAdjustMatrix. This makes different kinds of Hi-C analyses easier to do. If possible, I think it would be useful to be able to retain the weights from balanced matrices (i.e., matrices previously composed of both cis and trans contacts). For now, I would like to avoid re-balancing with cis or trans contacts only. I can imagine that this option would also be useful for adjusted matrices composed of subsets of chromosomes, including both cis and trans contacts.
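To make the wanted behavior concrete, here is a minimal toy sketch in plain Python. It does not use HiCExplorer or cooler; the bin table, pixel table, weights, and the helper names `keep_cis` and `balanced` are all made up for illustration. It assumes cooler's multiplicative convention, where a balanced value is the raw count times the weights of its two bins.

```python
# Toy bin table: (chrom, start, end, weight). The weights are made-up values.
bins = [
    ("chr1", 0, 500000, 0.5),
    ("chr1", 500000, 1000000, 2.0),
    ("chr2", 0, 500000, 1.0),
]

# Toy pixel table: (bin1_id, bin2_id, raw_count). Counts are made-up values.
pixels = [(0, 0, 8), (0, 1, 4), (0, 2, 6), (1, 2, 3), (2, 2, 10)]


def keep_cis(bins, pixels):
    """Drop trans pixels; the bin table (and its weights) is left untouched."""
    return [(i, j, c) for i, j, c in pixels if bins[i][0] == bins[j][0]]


def balanced(bins, pixels):
    """Apply weights on the fly: count * weight[bin1] * weight[bin2]."""
    return [(i, j, c * bins[i][3] * bins[j][3]) for i, j, c in pixels]


cis_pixels = keep_cis(bins, pixels)         # raw counts, cis only
cis_balanced = balanced(bins, cis_pixels)   # balanced view, original weights
```

In this sketch the filtered matrix still carries raw counts plus the original weights, so no re-balancing on cis-only contacts is needed to get a balanced view.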

@kalavattam
Author

Hi, I've looked into this a bit more and it appears that, in the process of using hicAdjustMatrix, unbalanced counts are replaced by the balanced counts, hence the loss of the column 'weight'. This is clear in the attached images.
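A toy sketch of what this looks like numerically, in plain Python rather than actual HiCExplorer code. The weights, counts, and the helper names `bake_in` and `undo` are made up, and the multiplicative weighting is an assumption borrowed from cooler's convention.

```python
# Made-up per-bin balancing weights and raw counts keyed by (bin1_id, bin2_id).
weights = [0.5, 2.0, 1.0]
raw = {(0, 0): 8, (0, 1): 4, (2, 2): 10}


def bake_in(raw, weights):
    """Store balanced values in place of raw counts (weights are applied once)."""
    return {(i, j): c * weights[i] * weights[j] for (i, j), c in raw.items()}


def undo(baked, weights):
    """Recover raw counts; only possible if the weights were kept somewhere."""
    return {(i, j): v / (weights[i] * weights[j]) for (i, j), v in baked.items()}


baked = bake_in(raw, weights)      # what the output file would hold
recovered = undo(baked, weights)   # needs the weights that the output lacks
```

Once the balanced values are written as the counts, a 'weight' column would be redundant (applying it again would balance twice), and the raw counts can only be recovered from the original file.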

I think this behavior is fine for my use cases (although I'm not sure about other researchers).

I tried to find this in the code, but didn't really see it after a quick check of hicAdjustMatrix.py in the develop branch.

Anyways, thanks!

[Attached image: cool2cool_balance_cardioPooled_D0_500000_downsampled_pooled_cis 0 78 chrAll min0 0001_max1]
[Attached image: cool2cool_balance_cardioPooled_D0_500000_downsampled_pooled_cis 0 78 sans-inter chrAll min0 0001_max1]
[Attached image: cool2cool_balance_cardioPooled_D0_500000_downsampled_pooled_cis 0 78 sans-intra chrAll min0 0001_max1]

@joachimwolff
Collaborator

Hi,

this is more a feature than a bug :)

The issue is a historical one by now: the h5 files never stored the correction factors and raw values separately, so in many parts of the source code, including hicAdjustMatrix, we have kept this behavior for the cool files. As long as users stay within HiCExplorer, it should not matter. However, I think it is not too difficult to change the behavior; we just need to implement it. Maybe in the next 3.8 release, but given the time I have to implement new features, I cannot promise that a 3.8 release will be published this year.

Thanks for the report and all the best,

Joachim
