filtering artifacts using -a E:filename #50

Open
BreenMS opened this issue Nov 1, 2021 · 2 comments

BreenMS commented Nov 1, 2021

Hi,

We are using the -a E option to filter known SNPs based on a VCF file. In this example, the input VCF file was obtained from the NCBI FTP server, here: ftp://ftp.ncbi.nlm.nih.gov/snp/organisms/human_9606_b151_GRCh38p7/VCF/

00-common_all.vcf.gz: Common human SNPs (minor allele frequency >= 0.01)

We are running the following command:

java -jar $JACUSA2_JAR call-1 -r results.out Sample1.bam -p 10 -a D,M,Y,E:file=00-common_all.vcf:type=VCF -s -m 20 -R GRCh38.primary_assembly.genome.fa -P RF-FIRSTSTRAND

and receiving the following error:

java.lang.NullPointerException
	at jacusa.filter.factory.exclude.FileBasedContainedCoordinate.init(FileBasedContainedCoordinate.java:114)
	at jacusa.filter.factory.exclude.FileBasedContainedCoordinate.<init>(FileBasedContainedCoordinate.java:40)
	at jacusa.filter.ExcludeSiteFilter.<init>(ExcludeSiteFilter.java:28)
	at jacusa.filter.factory.ExcludeSiteFilterFactory.createFilter(ExcludeSiteFilterFactory.java:82)
	at jacusa.filter.factory.FilterFactory.registerFilter(FilterFactory.java:132)
	at jacusa.filter.FilterConfig.registerFilters(FilterConfig.java:64)
	at lib.util.ConditionContainer.initReplicateContainer(ConditionContainer.java:63)
	at lib.worker.AbstractWorker.<init>(AbstractWorker.java:64)
	at jacusa.worker.CallWorker.<init>(CallWorker.java:31)
	at jacusa.method.call.CallMethod.createWorker(CallMethod.java:253)
	at jacusa.method.call.CallMethod.createWorker(CallMethod.java:1)
	at lib.worker.WorkerDispatcher.createWorker(WorkerDispatcher.java:52)
	at lib.worker.WorkerDispatcher.run(WorkerDispatcher.java:78)
	at lib.util.AbstractTool.run(AbstractTool.java:60)
	at jacusa.JACUSA.main(JACUSA.java:96)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.eclipse.jdt.internal.jarinjarloader.JarRsrcLoader.main(JarRsrcLoader.java:61)

We have two questions:

  1. We are unsure what is causing this error. Could you advise?
  2. Could you provide a link to pre-built BED or VCF test files for things people commonly want to filter (common SNPs, blacklisted regions of the genome, homopolymeric regions, etc.)? A couple of test examples or files would be useful.

Thank you,

M

@piechottam piechottam self-assigned this Nov 1, 2021
piechottam (Collaborator) commented

The exclude filter uses htsjdk to read BED or VCF files.

The relevant codecs are:

  1. Judging from your error message, the provided VCF file may not be valid. Did you unpack 00-common_all.vcf.gz?
    Could you run the VCF file through a VCF checker?

  2. That depends on your analysis and on the data at hand. (I assume you are searching for RNA editing sites.)
    The best scenario is a DNA vs. RNA comparison using JACUSA2 call-2.
    If you happen to have SNP information specific to your samples, you could use it to filter your candidate RNA editing sites.
    The last resort is to use an existing SNP database to filter your candidate sites (I guess this is your current approach).

  3. If you cannot solve the problem with the VCF file and JACUSA2, I recommend running JACUSA2 without the exclude filter and filtering SNP sites with bedtools afterwards.
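
For the file check suggested in point 1, a minimal Python sketch of a quick sanity check is given here. It is not a full VCF validator and it does not diagnose the NullPointerException above. It only reports whether the file is still gzip-compressed, whether the header lines are present, and which contig names appear in the first records, so they can be compared by eye with the names in the reference FASTA. The function name inspect_vcf and the max_records parameter are made up for this illustration.

```python
import gzip


def inspect_vcf(path, max_records=1000):
    """Collect basic facts about a VCF file (hypothetical helper, not part of JACUSA2)."""
    # A gzip file starts with the two magic bytes 0x1f 0x8b.
    with open(path, "rb") as handle:
        is_gzip = handle.read(2) == b"\x1f\x8b"

    report = {
        "gzip_compressed": is_gzip,
        "has_fileformat_line": False,
        "has_column_header": False,
        "contigs": [],
    }
    seen = set()
    records = 0
    opener = gzip.open if is_gzip else open
    with opener(path, "rt") as handle:
        for line_number, line in enumerate(handle):
            if line_number == 0:
                # The VCF specification requires this to be the first line.
                report["has_fileformat_line"] = line.startswith("##fileformat=VCF")
            if line.startswith("##") or not line.strip():
                continue  # meta-information or blank line
            if line.startswith("#CHROM"):
                report["has_column_header"] = True
                continue
            # Data record: the first tab-separated column is the contig name.
            contig = line.split("\t", 1)[0]
            if contig not in seen:
                seen.add(contig)
                report["contigs"].append(contig)
            records += 1
            if records >= max_records:
                break
    return report
```

Calling inspect_vcf("00-common_all.vcf") returns a dictionary. A value of True for "gzip_compressed" would mean the file passed to the filter is still compressed, and the "contigs" list shows the naming style used in the VCF.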

BreenMS commented Nov 1, 2021

Thank you for the reply and pointing out the codecs.

  1. Yes, the VCF is unpacked and was checked with vcftools.
  2. Yes, RNA editing sites.
  3. Our workaround has been to convert the VCF to BED and filter after calling, which is working fine.

If we find the source of the original error, we will circle back to this thread.
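
The workaround described above (convert the VCF to BED, then filter after calling) could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the exact procedure used in this thread: the function names are made up, and the result file is assumed to be BED-like, with tab-separated contig, 0-based start, and end in the first three columns and header lines starting with "#". The coordinate conversion itself is standard: VCF POS is 1-based, BED start is 0-based and the end is exclusive.

```python
def vcf_to_bed_intervals(vcf_lines):
    """Yield (contig, start, end) in 0-based, half-open BED coordinates."""
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        contig, pos, ref = fields[0], int(fields[1]), fields[3]
        start = pos - 1  # VCF POS is 1-based; BED start is 0-based
        yield contig, start, start + len(ref)


def build_excluded_positions(intervals):
    """Expand intervals into a set of (contig, 0-based position) pairs."""
    excluded = set()
    for contig, start, end in intervals:
        for position in range(start, end):
            excluded.add((contig, position))
    return excluded


def filter_sites(result_lines, excluded):
    """Yield result lines whose interval does not overlap an excluded position.

    Assumes BED-like columns: contig, start, end (an assumption about the
    result format, stated in the text above). Header lines pass through.
    """
    for line in result_lines:
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        contig, start, end = fields[0], int(fields[1]), int(fields[2])
        if any((contig, p) in excluded for p in range(start, end)):
            continue  # site overlaps a known SNP; drop it
        yield line
```

Holding every SNP position in a Python set is only practical for small files. For a genome-wide SNP set, writing the intervals to a BED file and using bedtools intersect with the -v option on the result file is likely the better route. Contig names must also match between the two files (for example "1" versus "chr1").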
