Long reads support for Sniffles files #294

Closed
5 tasks done
zhemingfan opened this issue Jan 20, 2022 · 4 comments · Fixed by #300
Labels: enhancement; long read support (Support for long read sequence data, e.g. from Oxford Nanopore or PacBio); SV caller (Tickets related to support for inputs from different SV callers)

Comments

@zhemingfan
Collaborator

zhemingfan commented Jan 20, 2022

Overview

Currently, Sniffles (a long-read SV caller) outputs INVDUP SVTYPEs, which MAVIS does not handle. As a temporary workaround, INVDUP calls have been treated as a combination of inversion, duplication, and insertion.

A series of changes must be made to accommodate Sniffles:

  • Add support for uncertain types (e.g. DEL/INS)
  • Add error handling for cases where breakpoint start > breakpoint end
  • Add unit tests that include the aforementioned cases (exceptions include cases where PRECISE and IMPRECISE calls have pos=0), as well as cases where bp1 = 1 and bp2 = 0
  • Add support for handling INVDUP from Sniffles input
  • Ensure that the errors in Sniffles calls described in this post are fixed
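The type-handling tasks in the list above could look roughly like the sketch below. This is an illustration under assumptions, not the MAVIS converter: BASE_TYPES, expand_svtype, and check_breakpoint are hypothetical names, and the lowercase event-type strings are placeholders. An uncertain type such as DEL/INS is split on "/" into its candidate types, INVDUP follows the temporary workaround (inversion + duplication + insertion), and a breakpoint with start > end is rejected early with a clear message.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from caller SVTYPE tokens to candidate event types.
# INVDUP follows the temporary workaround: inversion + duplication + insertion.
BASE_TYPES: Dict[str, List[str]] = {
    "DEL": ["deletion"],
    "INS": ["insertion"],
    "DUP": ["duplication"],
    "INV": ["inversion"],
    "INVDUP": ["inversion", "duplication", "insertion"],
}


def expand_svtype(svtype: str) -> List[str]:
    """Return candidate event types; 'A/B' means the call is either A or B."""
    candidates: List[str] = []
    for token in svtype.split("/"):
        if token not in BASE_TYPES:
            raise ValueError(f"unsupported SVTYPE token {token!r} in {svtype!r}")
        for event_type in BASE_TYPES[token]:
            if event_type not in candidates:  # keep order, drop duplicates
                candidates.append(event_type)
    return candidates


def check_breakpoint(start: int, end: int) -> Tuple[int, int]:
    """Fail early, with a clear message, when a breakpoint has start > end."""
    if start > end:
        raise ValueError(f"breakpoint start ({start}) > end ({end})")
    return start, end


print(expand_svtype("DEL/INS"))  # ['deletion', 'insertion']
print(expand_svtype("INVDUP"))   # ['inversion', 'duplication', 'insertion']
```

Splitting on "/" keeps uncertain types generic, so a new combination only needs its individual tokens in the table.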
@zhemingfan zhemingfan added the enhancement, long read support, and SV caller labels Jan 20, 2022
@zhemingfan zhemingfan self-assigned this Jan 20, 2022
@creisle creisle added this to the v3.0.0 milestone Jan 20, 2022
@zhemingfan zhemingfan changed the title from "Add support for handling INVDUP from Sniffles input" to "Long reads support for Sniffles files" Jan 21, 2022
@creisle
Member

creisle commented Jan 26, 2022

So it looks like the cause of the start > end error is the 0 position in the BND ALT syntax. See the example test below to reproduce it:

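# import paths assumed from the MAVIS v3 package layout
from mavis.convert import SUPPORTED_TOOL, _convert_tool_row
from mavis.convert.vcf import VcfInfoType, VcfRecordType, convert_record


def test_convert_record():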
    variant = VcfRecordType(
        9000,
        12000,
        'chr14_KI270722v1_random',
        alts=['N[chr17_GL000205v2_random:0['],  # mate position 0: connection to a telomere
        ref='N',
        info=VcfInfoType(
            IMPRECISE=True,
            SVMETHOD="Snifflesv1.0.11",
            SVTYPE="BND",
            SUPTYPE="SR",
            SVLEN="0",
            STRANDS="+-",
            RE="5",
            REF_strand="0,0",
            AF="1",
        ),
    )
    # the two conversion steps below reproduce the start > end error
    records = convert_record(variant)
    records = [_convert_tool_row(r, SUPPORTED_TOOL.VCF, False) for r in records]

Based on the VCF 4.2 spec (https://samtools.github.io/hts-specs/VCFv4.2.pdf), these indicate connections to telomeres. I'm not entirely sure how to deal with these, but one solution might be to change the 0 to a 1, since a position cannot precede the start of a sequence and coordinates are 1-based.
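The "make the 0 a 1" idea above could be applied while parsing the breakend ALT, as in this sketch. It is an illustration, not MAVIS code: parse_bnd_mate and Mate are hypothetical names. It accepts the four VCF breakend ALT forms (t[p[, t]p], ]p]t, [p[t), clamps a mate position of 0 to 1, and keeps a flag recording that the call was telomeric. The other telomere convention (position N+1 at the end of a contig) is not handled here, because it would need the contig length.

```python
import re
from typing import NamedTuple

# The four VCF breakend ALT forms: t[p[  t]p]  ]p]t  [p[t
# The mate chromosome may itself contain ':', so match up to the last one.
BND_ALT = re.compile(
    r"^(?P<before>[^\[\]]*)(?P<bracket>[\[\]])"
    r"(?P<chrom>[^\[\]]+):(?P<pos>\d+)(?P=bracket)"
    r"(?P<after>[^\[\]]*)$"
)


class Mate(NamedTuple):
    chrom: str
    pos: int         # 1-based position, clamped to >= 1
    telomeric: bool  # True when the ALT used position 0


def parse_bnd_mate(alt: str) -> Mate:
    match = BND_ALT.match(alt)
    if match is None:
        raise ValueError(f"not a breakend ALT: {alt!r}")
    # Exactly one side of the bracketed mate carries the reference base(s).
    if bool(match["before"]) == bool(match["after"]):
        raise ValueError(f"malformed breakend ALT: {alt!r}")
    raw_pos = int(match["pos"])
    # Position 0 marks a telomere; clamp it to 1 because coordinates are 1-based.
    return Mate(match["chrom"], max(raw_pos, 1), raw_pos == 0)


print(parse_bnd_mate("N[chr17_GL000205v2_random:0["))
# Mate(chrom='chr17_GL000205v2_random', pos=1, telomeric=True)
```

Keeping the telomeric flag means the clamping stays visible downstream instead of silently turning a telomere connection into an ordinary position-1 breakpoint.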

@zhemingfan zhemingfan linked a pull request Jan 26, 2022 that will close this issue
@zhemingfan
Collaborator Author

Ensure that future versions use Sniffles2.0

@jessicadlang

Hi, I'm trying to use a VCF produced by Sniffles2 with the convert feature, but I'm getting the following error:

Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 1 (use --cores to define parallelism)
Rules claiming more threads will be scaled down.
Provided resources: mem_mb=16000, mem_mib=15259, disk_mb=1000, disk_mib=954, time_limit=57600, cpus=1
Select jobs to execute...

[Mon Dec 16 11:20:53 2024]
rule init_config:
input: /project/LangWorkspace/longread/PEO1/PEO1/output/config.raw.json
output: /project/LangWorkspace/longread/PEO1/PEO1/output/config.json
log: /project/LangWorkspace/longread/PEO1/PEO1/output/logs/init_config.snakemake.log.txt
jobid: 0
reason: Forced execution
resources: mem_mb=16000, mem_mib=15259, disk_mb=1000, disk_mib=954, tmpdir=/tmp, time_limit=57600, cpus=1, log_dir=/project/LangWorkspace/longread/PEO1/PEO1/output/logs

mavis setup --config /project/LangWorkspace/longread/PEO1/PEO1/output/config.raw.json --outputfile /project/LangWorkspace/longread/PEO1/PEO1/output/config.json
Activating singularity image /project8/LangWorkspace/mavis/.snakemake/singularity/692ac5546ddf42e31ddd9862317dc26b.simg
2024-12-16 11:20:58,207 [INFO] MAVIS: 3.1.2
2024-12-16 11:20:58,207 [INFO] hostname: slurm138.ssc.wisc.edu
2024-12-16 11:20:58,207 [INFO] arguments
2024-12-16 11:20:58,207 [INFO] command= 'setup'
2024-12-16 11:20:58,207 [INFO] config= '/project/LangWorkspace/longread/PEO1/PEO1/output/config.raw.json'
2024-12-16 11:20:58,207 [INFO] log= None
2024-12-16 11:20:58,207 [INFO] log_level= 'INFO'
2024-12-16 11:20:58,207 [INFO] outputfile= '/project/LangWorkspace/longread/PEO1/PEO1/output/config.json'
Traceback (most recent call last):
  File "/usr/local/bin/mavis", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.7/site-packages/mavis/main.py", line 295, in main
    raise err
  File "/usr/local/lib/python3.7/site-packages/mavis/main.py", line 272, in main
    _config.add_bamstats_to_config(config)
  File "/usr/local/lib/python3.7/site-packages/mavis/config.py", line 122, in add_bamstats_to_config
    library.update(calculate_bam_stats(config, libname))
  File "/usr/local/lib/python3.7/site-packages/mavis/config.py", line 46, in calculate_bam_stats
    distribution_fraction=config['bam_stats.distribution_fraction'],
  File "/usr/local/lib/python3.7/site-packages/mavis/bam/stats.py", line 275, in compute_genome_bam_stats
    median = hist.median()
  File "/usr/local/lib/python3.7/site-packages/mavis/bam/stats.py", line 77, in median
    return (values[low_center - 1] + values[high_center - 1]) / 2
IndexError: list index out of range
[Mon Dec 16 11:21:30 2024]
Error in rule init_config:
jobid: 0
input: /project/LangWorkspace/longread/PEO1/PEO1/output/config.raw.json
output: /project/LangWorkspace/longread/PEO1/PEO1/output/config.json
log: /project/LangWorkspace/longread/PEO1/PEO1/output/logs/init_config.snakemake.log.txt (check log file(s) for error details)
shell:
mavis setup --config /project/LangWorkspace/longread/PEO1/PEO1/output/config.raw.json --outputfile /project/LangWorkspace/longread/PEO1/PEO1/output/config.json
(one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message

Is this error caused by something in the Sniffles2 output that doesn't work with stats.py around line 77?

@jessicadlang

I've been looking into the code more. It seems the problem is actually in generating the BAM stats. Has there been any more thought given to issue #210, which is about long-read BAMs causing problems?
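One possible reading of the traceback, not confirmed in this thread, is that the median is taken over an empty histogram: indexing into an empty Python list raises exactly "IndexError: list index out of range". Why the histogram would be empty for a long-read BAM is an assumption here (for example, no read pairs to collect fragment sizes from). The sketch below shows a median over a {value: count} histogram that fails with an explicit message instead; histogram_median is a hypothetical helper, not MAVIS's own median method.

```python
from typing import Mapping


def histogram_median(hist: Mapping[int, int]) -> float:
    """Median of values stored as {value: count}, e.g. fragment sizes."""
    total = sum(hist.values())
    if total == 0:
        # Guard the case that would otherwise surface as an IndexError.
        raise ValueError("cannot compute median: histogram is empty")
    # 1-based ranks of the two middle observations (equal when total is odd).
    low_rank = (total + 1) // 2
    high_rank = total // 2 + 1
    low = None
    seen = 0
    for value in sorted(hist):
        seen += hist[value]
        if low is None and seen >= low_rank:
            low = value
        if seen >= high_rank:
            return (low + value) / 2
    raise AssertionError("unreachable: middle ranks never exceed the total count")


print(histogram_median({10: 1, 20: 1, 30: 1, 40: 1}))  # 25.0
```

With a guard like this, an empty histogram produces a message pointing at the missing data rather than an index error deep inside the median calculation.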


3 participants