Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allele specific analysis - assignment failure #572

Open
PhrenoVermouth opened this issue Mar 10, 2023 · 5 comments
Open

Allele specific analysis - assignment failure #572

PhrenoVermouth opened this issue Mar 10, 2023 · 5 comments

Comments

@PhrenoVermouth
Copy link

Hi,
I'm doing an allele-specific analysis, the pipeline went well until the final ice-normalization step. Tracing backwards I found the raw matrics were empty, and markAllelicStatus aberrantly assigned the reads. See .allelstat file as follows:

## /usr/local/bin//HiC-Pro_3.1.0/scripts/markAllelicStatus.py
## ibam=bowtie_results/bwt2/E9-5-PLA-rep1/E9-5-PLA-rep1-H3K27ac_S15_L002_cut_mm10.bwt2pairs.bam
## snpFile=/home3/gyang/2.ProjectET_HiCUT_Preprocessing/build/snps_C57b6_PWK_PhJ.vcf
## tag=XA
## output=bowtie_results/bwt2/E9-5-PLA-rep1/E9-5-PLA-rep1-H3K27ac_S15_L002_cut_mm10.bwt2pairs_allspe.bam
## verbose=True
## =========================
Total number of snps loaded	17046154.0
## =========================
Total number of reads	91295716	100
Number of reads with at least one 'N'	85300346	93.433
Number of reads assigned to ref genome	24111100	26.41
Number of reads assigned to alt genome	0	0.0
Number of conflicting reads	0	0.0
Number of unassigned reads	67184616	73.59

The weird thing is that when I fed the bam (E9-5-PLA-rep1-H3K27ac_S15_L002_cut_mm10.bwt2pairs.bam) to SNPsplit to verify the data, it went pretty well:

Allele-tagging report
=====================
Processed 91295716 read alignments in total
Reads were unaligned and hence skipped: 0 (0.00%)
46949809 reads were unassignable (51.43%)
26001038 reads were specific for genome 1 (28.48%)
18286300 reads were specific for genome 2 (20.03%)
46452 reads did not contain one of the expected bases at known SNP positions (0.05%)
58569 contained conflicting allele-specific SNPs (0.06%)

Thus this problem was not caused by the data. For VCF file generation, I stringently followed the utils introduction:

$HICPRO_PATH/bin/utils/extract_snps.py -i mgp.v5.merged.snps_all.dbSNP142.vcf -a PWK_PhJ > snps_C57b6_PWK_PhJ.vcf

and I've checked the vcf file by hand, everything seems normal:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT REF-PWK_PhJ-F1
chr1    3000185 rs585444580     G       T                               999     PASS    DP=930;DP4=283,108,361,178;CSQ=T||||intergenic_variant||||||||  GT
      0|1

One clue is that I got a HUGE markAllelicStatus.log file (>2G), full of warnings (more lines omitted below):

/home/gyang/anaconda3/bin/python /usr/local/bin//HiC-Pro_3.1.0/scripts/markAllelicStatus.py -s /home3/gyang/2.ProjectET_HiCUT_Preprocessing/build/snps_C57b6_PW
K_PhJ.vcf -v -r -i bowtie_results/bwt2/E9-5-PLA-rep1/E9-5-PLA-rep1-H3K27ac_S15_L002_cut_mm10.bwt2pairs.bam -o bowtie_results/bwt2/E9-5-PLA-rep1/E9-5-PLA-rep1-H
3K27ac_S15_L002_cut_mm10.bwt2pairs_allspe.bam
[E::idx_find_and_load] Could not retrieve index file for 'bowtie_results/bwt2/E9-5-PLA-rep1/E9-5-PLA-rep1-H3K27ac_S15_L002_cut_mm10.bwt2pairs.bam'
Warning : no SNPs found at position chr8 : 11389358. N ignored
Warning : no SNPs found at position chr8 : 11389358. N ignored
Warning : no SNPs found at position chr14 : 49493363. N ignored
Warning : no SNPs found at position chr14 : 49493363. N ignored
Warning : no SNPs found at position chr5 : 73611472. N ignored
Warning : no SNPs found at position chr5 : 73611511. N ignored

Your assistance is highly appreciated! And currently, I have to use SNPsplit to tag the reads and sed command for tag substitutions (XX --> XA). Hope that works. LOL

@harshadajadhav95
Copy link

I'm facing the same problem even after changing extract_snps.py as suggested above. Kindly help

@PhrenoVermouth
Copy link
Author

I'm facing the same problem even after changing extract_snps.py as suggested above. Kindly help

Hey @harshadajadhav95, it has been quite a while, maybe you can refer to #606?

Best,
Guang

@harshadajadhav95
Copy link

Thanks for your response. I referred #606. However, I am still encountering the same issue.

Regards,
Harshada

@WardDeb
Copy link

WardDeb commented Nov 27, 2024

You can check out either devel branch, or clone from my fork (https://github.com/wardDeb/hic-pro).
Been a while since I've been looking into this, though, so I'm not so sure what happened in the meantime.

@harshadajadhav95
Copy link

Thank you. I'll do the needful

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants