Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JACUSA_to_TRIBE.pl obsolete? #45

Open
ckuenne opened this issue Aug 16, 2021 · 3 comments
Open

JACUSA_to_TRIBE.pl obsolete? #45

ckuenne opened this issue Aug 16, 2021 · 3 comments
Assignees

Comments

@ckuenne
Copy link

ckuenne commented Aug 16, 2021

Is the JACUSA_to_TRIBE.pl script found here (https://github.com/dieterich-lab/tribe-workflow/tree/master/RRD_workflow) still working for the JACUSA2 output format? Or is that functionality supposed to be superseded by JACUSA2helper? If so, can you suggest a workflow? Since TRIBE was one of the original selling points for JACUSA, an automation or tutorial for this specific pipeline might be of wider interest.

OR should we rather use JACUSA2 directly with -B A2G? I do have stranded paired-end reads, so that should theoretically work, right? But there I run into an error (it works without -B):

JACUSA2 Version: 2.0.1 call-2 -a D,M,Y -filterNM 5 -s -c 2 -T 1 -P RF-FIRSTSTRAND -B A2G -p 16 -r x/jacusa_ht_vs_m3d-ht.A2G.new.org ./star_ht_1/igv/ht_1.bam,./star_ht_2/igv/ht_2.bam,./star_ht_3/igv/ht_3.bam,./star_ht_4/igv/ht_4.bam ./star_m3d-ht_1/igv/m3d-ht_1.bam,./star_m3d-ht_2/igv/m3d-ht_2.bam,./star_m3d-ht_3/igv/m3d-ht_3.bam,./star_m3d-ht_4/igv/m3d-ht_4.bam
--------------------------------------------------------------------------------
INFO    00:00:02  Thread 15: Working on contig chr1:3008197-3108196
INFO    00:00:02  Thread 14: Working on contig chr1:3108523-3208522
INFO    00:00:02  Thread 13: Working on contig chr1:3208523-3308522
INFO    00:00:02  Thread 12: Working on contig chr1:3308523-3408522
INFO    00:00:02  Thread 11: Working on contig chr1:3408523-3508522
INFO    00:00:02  Thread 10: Working on contig chr1:3508523-3608522
ERROR   00:00:02  Problem with read: NS500475:556:HHNVMBGXC:2:23111:24722:17846 in ./star_ht_2/igv/ht_2.bam
java.lang.NullPointerException: Cannot invoke "java.lang.Integer.intValue()" because the return value of "htsjdk.samtools.SAMRecord.getIntegerAttribute(String)" is null
        at lib.data.storage.readsubstitution.BaseSubRecordProcessor.collectBaseSubs(BaseSubRecordProcessor.java:258)
        at lib.data.storage.readsubstitution.BaseSubRecordProcessor.processPE(BaseSubRecordProcessor.java:218)
        at lib.data.storage.readsubstitution.BaseSubRecordProcessor.processInternalPE(BaseSubRecordProcessor.java:209)
        at lib.data.storage.readsubstitution.BaseSubRecordProcessor.processPE(BaseSubRecordProcessor.java:184)
        at lib.data.storage.readsubstitution.BaseSubRecordProcessor.process(BaseSubRecordProcessor.java:238)
        at lib.data.storage.container.UnstrandedCacheContainter.process(UnstrandedCacheContainter.java:50)
        at lib.data.storage.container.AbstractStrandedCacheContainer.process(AbstractStrandedCacheContainer.java:83)
        at lib.data.storage.container.RFPairedEnd1CacheContainer.process(RFPairedEnd1CacheContainer.java:1)
        at lib.data.assembler.SiteDataAssembler.buildCache(SiteDataAssembler.java:56)
        at lib.util.ReplicateContainer.createIterators(ReplicateContainer.java:49)
        at lib.util.ConditionContainer.lambda$0(ConditionContainer.java:29)
        at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
        at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658)
        at lib.util.ConditionContainer.updateWindowCoordinates(ConditionContainer.java:29)
        at lib.worker.AbstractWorker.updateReservedWindowCoordinate(AbstractWorker.java:170)
        at lib.worker.AbstractWorker.processInit(AbstractWorker.java:186)
        at lib.worker.AbstractWorker.run(AbstractWorker.java:218)

It was mapped with STAR 2.7.9a (including the MD field) and the offending read pair is this one:

NS500475:556:HHNVMBGXC:2:23111:24722:17846	163	chr1	3016524	255	75M	=	3016637	187	CTGGGTCTTCAATTCTGTTACATTGGTCTACTTGTCTGTCACTATACCAGTACCATGCAGTTTTGATAACAATTG	AA//AEE/EEEE//EEEEE//EEE/EE6/EEE/A//</EE6/EA///EEE/EE/EEEEE/<AEE<EE//<<E/E<	NH:i:1	HI:i:1	AS:i:145	nM:i:1	MD:Z:67C7
NS500475:556:HHNVMBGXC:2:23111:24722:17846	83	chr1	3016637	255	74M	=	3016524	-187	CCAGAGGTTCTTTTATCCTTGAGAAGAGTTTTTGCTATCCTCGTTTTTTTGTTATTCCAGATGAATCTGCCGAT	</A<<A<//66EEA///A<E/<A/EEEEEEEEEEEEEEEEE///EEEEEE/EEEAE/E/EEEAAAEAEE6AAAA	NH:i:1	HI:i:1	AS:i:145	nM:i:1	MD:Z:74

I tried different Java versions: oracle_jre_1.8.0_211, open_jdk_15.0.2, open_jdk_16.0.1.

@piechottam piechottam self-assigned this Aug 18, 2021
@piechottam
Copy link
Collaborator

Dear ckuenne,

Thank you for your feedback and detailed problem description!

  1. NullPointerException
    The SAM Output that you provided, suggests that STAR does not adhere to recommended practices for SAM format
    https://samtools.github.io/hts-specs/SAMtags.pdf
    The required "NM" tag is not present in the SAM file.
    According to STAR manual (https://github.com/alexdobin/STAR/blob/master/doc/STARmanual.pdf):
    [...]
    The SAM attributes can be specified by the user using --outSAMattributes A1 A2 A3 ... option
    which accept a list of 2-character SAM attributes. The implemented attributes are: NH HI NM MD
    AS nM jM jI XS. By default, STAR outputs NH HI AS nM attributes.
    [...]
    I am afraid, you will need to rerun STAR with "--outSAMattributes NM NH [...and what ever you need...]"

    STAR output has the nonstandard field "nM".
    From SAMtags.pdf: [...] Note that tags starting with ‘X’, ‘Y’, or ‘Z’ and tags containing lowercase letters
    in either position are reserved for local use and will not be formally defined in any future version of this [...]

  2. JACUSA_to_TRIBE.pl
    The person responsible for this PERL script is on vacation... as soon as he is back, I'll ask him to provide feedback.
    There are some workflows on https://dieterich-lab.github.io/JACUSA2helper/ (Articles).
    In general, we aim to add "useful" methods to JACUSA2helper.

@ckuenne
Copy link
Author

ckuenne commented Aug 20, 2021

Dear piechottam,

Thanks for the quick response! I have rerun STAR with "--outSAMattributes NH HI AS nM NM MD" and that seems to have solved the issue. JACUSA2 is currently running with -B A2G. I might be back with more feedback once that is done.

And maybe it would make sense to implement a small check for the BAM format in JACUSA. STAR is a pretty standard tool to use after all. And the default exception thrown is not really helpful to diagnose this.

@ckuenne
Copy link
Author

ckuenne commented Aug 24, 2021

another question to JACUSA_to_TRIBE.pl: does it only work for stranded_forward or also for stranded_reverse libraries? if i understood correctly, jacusa already takes care of that using -P FR-SECONDSTRAND / RF-FIRSTSTRAND = the jacusa output is relative to the original RNA/DNA, not to the sequencing. so the perl script does not have deal with strandedness again?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants