Error from Manta input #213

Closed
moldach opened this issue Jun 23, 2020 · 8 comments

@moldach

moldach commented Jun 23, 2020

The mavis config step creates a mavis.cfg for the MAVIS standard input file formats (.bam and .vcf); however, according to the documentation, the Reference Input Files should be set up using environment variables or entered in the mavis.cfg file manually.

Set up mavis.cfg

mavis config \
    --library maddog genome normal False 470.sorted.dedupped.bam \
    --convert manta diploidSV.vcf manta \
    --assign maddog manta  \
    -w mavis.cfg

Manually include references

This is what my directory looks like:

(mavis) [moldach - MAVIS-MADDOG]$ ll
total 5461660
-rw-r----- 1 moldach moldach 5352369367 Jun 23 12:32 470.sorted.dedupped.bam
-rw-r----- 1 moldach moldach     307672 Jun 23 12:32 470.sorted.dedupped.bam.bai
-rw-r----- 1 moldach moldach   25639999 Jun 23 07:57 ce11.2bit
-rw-r----- 1 moldach moldach   34147927 Jun 23 07:47 celegan2.json
-rwxr-x--- 1 moldach moldach  101957874 Jun 23 07:48 c_elegans.PRJNA13758.WS265.genomic.fa
-rw-r----- 1 moldach moldach      35325 Jun 23 14:23 diploidSV.vcf.gz
-rw-r----- 1 moldach moldach       4014 Jun 23 14:24 diploidSV.vcf.gz.tbi
-rw-r----- 1 moldach moldach          0 Jun 23 14:36 log.txt
-rw-r----- 1 moldach moldach   78248661 Jun 23 14:22 mavis_CeDNR_annotations.tab
-rw-r----- 1 moldach moldach        666 Jun 23 14:35 mavis.cfg

So I'll add the required references to mavis.cfg:

[reference]
template_metadata =
masking =
annotations = /scratch/moldach/MAVIS-MADDOG/celegan2.json
aligner_reference = /scratch/moldach/MAVIS-MADDOG/ce11.2bit
dgv_annotation = /scratch/moldach/MAVIS-MADDOG/mavis_CeDNR_annotations.tab
reference_genome = /scratch/moldach/MAVIS-MADDOG/c_elegans.PRJNA13758.WS265.genomic.fa

[maddog]
library = maddog
protocol = genome
bam_file = 470.sorted.dedupped.bam
read_length = 101
median_fragment_size = 342
stdev_fragment_size = 74
strand_specific = False
strand_determining_read = 2
disease_status = normal
inputs = manta

[convert]
assume_no_untemplated = True
manta = convert_tool_output
        diploidSV.vcf.gz
        manta
        False
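
Since the reference paths are entered by hand, a quick existence check can rule out typos before running setup. The following is a minimal sketch, not part of MAVIS: the function name `missing_reference_paths` is hypothetical, and it assumes each value in the [reference] section is either empty or a whitespace-separated list of paths without spaces in them.

```python
import configparser
import os


def missing_reference_paths(cfg_text):
    """Return (option, path) pairs from [reference] whose path does not exist."""
    # interpolation=None so a literal '%' in a path is not treated specially
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    if not parser.has_section("reference"):
        return []
    missing = []
    for option, value in parser.items("reference"):
        # An empty value (e.g. "masking =") splits to an empty list and is skipped
        for path in value.split():
            if not os.path.exists(path):
                missing.append((option, path))
    return missing


# Hypothetical usage against the config written above:
# with open("mavis.cfg") as handle:
#     for option, path in missing_reference_paths(handle.read()):
#         print(f"{option}: {path} not found")
```

This only confirms that the files are present; it says nothing about whether their contents are in the format MAVIS expects.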

Running MAVIS

I'm getting the following error:

(mavis) [moldach@cedar1 MAVIS-MADDOG]$ mavis setup mavis.cfg -o output_dir/ >> log.txt
                      MAVIS: 2.2.6
                      hostname: cedar1.cedar.computecanada.ca
[2020-06-23 14:36:41] arguments
                        command = 'setup'
                        config = '/scratch/moldach/MAVIS-MADDOG/mavis.cfg'
                        log = None
                        log_level = 'INFO'
                        output = 'output_dir/'
                        skip_stage = []
                      creating output directory: 'output_dir/converted_inputs'
                      setting up the directory structure for maddog as /scratch/moldach/MAVIS-MADDOG/output_dir/maddog_normal_genome
                      converting input command: ['convert_tool_output', '/scratch/moldach/MAVIS-MADDOG/diploidSV.vcf.gz', 'manta', False]
                      reading: /scratch/moldach/MAVIS-MADDOG/diploidSV.vcf.gz
                      found 425 rows
                      Error in converting row {'id': 'MantaBND:47:0:2:0:0:0:1', 'break2_orientation': 'L', 'untemplated_seq': '', 'break1_chromosome': 'I', 'break2_chromosome': 'I', 'break1_position_start': 1059667, 'break1_position_end': 1060026, 'break2_position_start': 1101776, 'break2_position_end': 1101776, 'event_type': 'BND', 'MATEID': 'MantaBND:47:0:2:0:0:0:0', 'IMPRECISE': True, 'BND_DEPTH': 90, 'MATE_BND_DEPTH': 41}
Traceback (most recent call last):
  File "/home/moldach/bin/mavis/bin/mavis", line 10, in <module>
    sys.exit(main())
  File "/home/moldach/bin/mavis/lib/python3.8/site-packages/mavis/main.py", line 414, in main
    raise err
  File "/home/moldach/bin/mavis/lib/python3.8/site-packages/mavis/main.py", line 397, in main
    pipeline = _pipeline.Pipeline.build(config)
  File "/home/moldach/bin/mavis/lib/python3.8/site-packages/mavis/schedule/pipeline.py", line 311, in build
    libconf.inputs = run_conversion(config, libconf, conversion_dir)
  File "/home/moldach/bin/mavis/lib/python3.8/site-packages/mavis/schedule/pipeline.py", line 75, in run_conversion
    output_tabbed_file(convert_tool_output(
  File "/home/moldach/bin/mavis/lib/python3.8/site-packages/mavis/tools.py", line 73, in convert_tool_output
    result.extend(_convert_tool_output(fname, file_type, stranded, log, assume_no_untemplated=assume_no_untemplated))
  File "/home/moldach/bin/mavis/lib/python3.8/site-packages/mavis/tools.py", line 535, in _convert_tool_output
    raise err
  File "/home/moldach/bin/mavis/lib/python3.8/site-packages/mavis/tools.py", line 532, in _convert_tool_output
    std_rows = _convert_tool_row(row, file_type, stranded, assume_no_untemplated=assume_no_untemplated)
  File "/home/moldach/bin/mavis/lib/python3.8/site-packages/mavis/tools.py", line 466, in _convert_tool_row
    raise UserWarning(
UserWarning: ('row failed to create any breakpoint pairs. This generally indicates an input formatting error', {'id': 'MantaBND:47:0:2:0:0:0:1', 'break2_orientation': 'L', 'untemplated_seq': '', 'break1_chromosome': 'I', 'break2_chromosome': 'I', 'break1_position_start': 1059667, 'break1_position_end': 1060026, 'break2_position_start': 1101776, 'break2_position_end': 1101776, 'event_type': 'BND', 'MATEID': 'MantaBND:47:0:2:0:0:0:0', 'IMPRECISE': True, 'BND_DEPTH': 90, 'MATE_BND_DEPTH': 41}, {'tracking_id': 'manta-MantaBND:47:0:2:0:0:0:1', 'break1_orientation': '?', 'break2_orientation': 'L', 'break1_strand': ['?'], 'break2_strand': ['?'], 'id': 'MantaBND:47:0:2:0:0:0:1', 'untemplated_seq': '', 'break1_chromosome': 'I', 'break2_chromosome': 'I', 'break1_position_start': 1059667, 'break1_position_end': 1060026, 'break2_position_start': 1101776, 'break2_position_end': 1101776, 'event_type': 'BND', 'MATEID': 'MantaBND:47:0:2:0:0:0:0', 'IMPRECISE': True, 'BND_DEPTH': 90, 'MATE_BND_DEPTH': 41}, [('L', 'L', '?', '?', 'translocation', True), ('L', 'L', '?', '?', 'translocation', False), ('L', 'L', '?', '?', 'inverted translocation', True), ('L', 'L', '?', '?', 'inverted translocation', False), ('R', 'L', '?', '?', 'translocation', True), ('R', 'L', '?', '?', 'translocation', False), ('R', 'L', '?', '?', 'inverted translocation', True), ('R', 'L', '?', '?', 'inverted translocation', False)])
creisle self-assigned this on Jun 27, 2020
@creisle
Member

creisle commented Jun 27, 2020

@moldach would you be able to pull out the row from the VCF that errored, along with its mate, and paste it here? From this error message alone it looks like break1_orientation is missing, but I can't tell more without the data itself.
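
One way to pull out the failing row and its mate is sketched below. This is only an illustration using the Python standard library; the helper names `open_vcf` and `record_and_mate` are hypothetical, and it assumes a tab-separated VCF in which each breakend names a single mate in the MATEID entry of its INFO column.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF for reading as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "r")


def record_and_mate(lines, record_id):
    """Return the data lines for record_id and for the record its MATEID names."""
    by_id = {}
    for line in lines:
        if line.startswith("#"):
            continue  # skip meta-information and header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 8:
            by_id[fields[2]] = fields  # column 3 is the record ID
    record = by_id.get(record_id)
    if record is None:
        return []
    # INFO is a ';'-separated list of KEY=VALUE pairs and bare flags
    info = dict(item.split("=", 1) for item in record[7].split(";") if "=" in item)
    mate = by_id.get(info.get("MATEID"))
    found = [record] if mate is None else [record, mate]
    return ["\t".join(fields) for fields in found]


# Hypothetical usage with the ID from the error message:
# with open_vcf("diploidSV.vcf.gz") as handle:
#     print("\n".join(record_and_mate(handle, "MantaBND:47:0:2:0:0:0:1")))
```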

creisle added the "awaiting reply" (waiting for reply from the reporter) label on Jun 27, 2020
@ramsainanduri

ramsainanduri commented Jun 30, 2020

I have been getting the same error.

My config:

[reference]
template_metadata = /reference_inputs/cytoBand.txt
masking = /reference_inputs/hg19_masking.tab
annotations = /reference_inputs/ensembl69_hg19_annotations.json
aligner_reference =/reference_inputs/Genomes/Human_genome/hg19.fa
dgv_annotation = /reference_inputs/reference_inputs/dgv_hg19_variants.tab
reference_genome = /reference_inputs/Genomes/Human_genome/hg19.fa

[S14N]
library = S14N
protocol = genome
bam_file = /home/ram.nanduri/SV_VCFS/Bam/S14N.recaled.bam
read_length = None
median_fragment_size = None
stdev_fragment_size = None
strand_specific = False
strand_determining_read = 2
disease_status = normal
inputs = manta

[S14T]
library = S14T
protocol = genome
bam_file = /home/ram.nanduri/SV_VCFS/Bam/S14T.recaled.bam
read_length = None
median_fragment_size = None
stdev_fragment_size = None
strand_specific = False
strand_determining_read = 2
disease_status = diseased
inputs = manta

[convert]
assume_no_untemplated = True
manta = convert_tool_output
        /home/ram.nanduri/SV_VCFS/S14N_vs_S14T/results/variants/diploidSV.vcf.gz
        /home/ram.nanduri/SV_VCFS/S14N_vs_S14T/results/variants/somaticSV.vcf.gz
        manta
        False

My log:

MAVIS: 2.2.6
hostname: 10.1.11.37
[2020-06-30 01:19:19] arguments
    command = 'setup'
    config = '/home/ram.nanduri/SV_VCFS/MAVIS_ORI_TEST/S14N_vs_S14T.mavis.cfg'
    log = 'S14N_vs_S14T.Run.log'
    log_level = 'INFO'
    output = 'S14N_vs_S14T.mavis.output/'
    skip_stage = [
        'cluster'
        'validate'
    ]
creating output directory: 'S14N_vs_S14T.mavis.output/converted_inputs'
setting up the directory structure for S14N as /home/ram.nanduri/SV_VCFS/MAVIS_ORI_TEST/S14N_vs_S14T.mavis.output/S14N_normal_genome
converting input command: ['convert_tool_output', '/home/ram.nanduri/SV_VCFS/S14N_vs_S14T/results/variants/diploidSV.vcf.gz', '/home/ram.nanduri/SV_VCFS/S14N_vs_S14T/results/variants/somaticSV.vcf.gz', 'manta', False]
reading: /home/ram.nanduri/SV_VCFS/S14N_vs_S14T/results/variants/diploidSV.vcf.gz
found 290 rows
Error in converting row {'id': 'MantaBND:207:0:1:0:0:0:0', 'break2_orientation': 'R', 'untemplated_seq': 'GCCCCAT', 'break1_chromosome': 'chr1', 'break2_chromosome': 'chr1', 'break1_position_start': 17051724, 'break1_position_end': 17051724, 'break2_position_start': 234912188, 'break2_position_end': 234912188, 'event_type': 'BND', 'MATEID': 'MantaBND:207:0:1:0:0:0:1', 'SVINSLEN': 7, 'SVINSSEQ': 'GCCCCAT', 'BND_DEPTH': 5, 'MATE_BND_DEPTH': 4}

Mate pair from diploidSV.vcf.gz:

chr1 17051724 MantaBND:207:0:1:0:0:0:0 C [chr1:234912188[GCCCCATC 36 PASS SVTYPE=BND;MATEID=MantaBND:207:0:1:0:0:0:1;SVINSLEN=7;SVINSSEQ=GCCCCAT;BND_DEPTH=5;MATE_BND_DEPTH=4 GT:FT:GQ:PL:PR:SR 0/1:PASS:30:86,0,28:1,2:3,1

chr1 234912188 MantaBND:207:0:1:0:0:0:1 A [chr1:17051724[ATGGGGCA 36 PASS SVTYPE=BND;MATEID=MantaBND:207:0:1:0:0:0:0;SVINSLEN=7;SVINSSEQ=ATGGGGC;BND_DEPTH=4;MATE_BND_DEPTH=5 GT:FT:GQ:PL:PR:SR 0/1:PASS:30:86,0,28:1,2:3,1
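
To read the ALT column of these two records, the breakend notation from the VCF specification can be decoded as in the sketch below. This is an illustration of the notation only, not MAVIS's conversion code. The function name and the 'L'/'R' labels are assumptions chosen to mirror the orientation values in the log above: bases written before the brackets mean the local sequence lies to the left of the break, and a `[` bracket means the mate piece extends to the right of the mate position.

```python
import re

# Matches the four breakend forms: t[p[  t]p]  ]p]t  [p[t
BND_ALT = re.compile(
    r"^(?P<before>[A-Za-z.]*)"
    r"(?P<bracket>[\[\]])(?P<chrom>.+):(?P<pos>\d+)(?P=bracket)"
    r"(?P<after>[A-Za-z.]*)$"
)


def parse_bnd_alt(alt):
    """Decode a VCF breakend ALT string into mate location and side labels."""
    match = BND_ALT.match(alt)
    if match is None:
        raise ValueError(f"not a breakend ALT: {alt!r}")
    # Exactly one side of the brackets must carry the replacement bases
    if bool(match["before"]) == bool(match["after"]):
        raise ValueError(f"ambiguous breakend ALT: {alt!r}")
    return {
        "mate_chrom": match["chrom"],
        "mate_pos": int(match["pos"]),
        # bases before the brackets: local sequence is on the left of the break
        "local_side": "L" if match["before"] else "R",
        # ']' means the mate piece extends left of p, '[' means right of p
        "mate_side": "L" if match["bracket"] == "]" else "R",
    }


print(parse_bnd_alt("[chr1:234912188[GCCCCATC"))
```

For the first record above this gives the right side at both ends on the same chromosome, and the mate side agrees with the break2_orientation of 'R' reported in the log. It is the local side that the log shows as unresolved.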

@creisle
Member

creisle commented Jun 30, 2020

thanks @ramsainanduri! That's really helpful :) I will look into this now

creisle removed the "awaiting reply" (waiting for reply from the reporter) label on Jun 30, 2020
creisle added the "bug" label on Jun 30, 2020
creisle added this to the v2.2.7 milestone on Jun 30, 2020
@ramsainanduri

ramsainanduri commented Jul 1, 2020

Hi @creisle,
Is this bug fixed and when can we expect the new version?

@creisle
Member

creisle commented Jul 2, 2020

It is fixed and will be released in 2.2.7, which should be out today or in the next couple of days. In the meantime, you can build from the fix branch here if you like: https://github.com/bcgsc/mavis/tree/bugfix/issue-213-bnd-non-trans

@ramsainanduri

okay thank you.

creisle closed this as completed in f9c3a71 on Jul 4, 2020
creisle mentioned this issue on Jul 4, 2020
@creisle
Member

creisle commented Jul 6, 2020

This has now been released: https://pypi.org/project/mavis/2.2.7/. Please let me know if you have further issues.
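
To confirm that an environment has picked up the release containing the fix, the installed version can be compared against 2.2.7. This is a minimal sketch: the helper names are hypothetical, the distribution name "mavis" is taken from the PyPI link above, and pre-release suffixes are simply ignored.

```python
import re
from importlib import metadata

FIXED_IN = (2, 2, 7)  # release named in this thread as containing the fix


def version_tuple(version):
    """Turn a version string such as '2.2.6' into a three-part integer tuple."""
    numbers = []
    for piece in version.split(".")[:3]:
        leading = re.match(r"\d+", piece)
        numbers.append(int(leading.group()) if leading else 0)
    while len(numbers) < 3:
        numbers.append(0)  # pad short versions such as '2.2'
    return tuple(numbers)


def has_fix(version):
    """True when the given version is 2.2.7 or later."""
    return version_tuple(version) >= FIXED_IN


def installed_mavis_has_fix():
    """Check the installed 'mavis' distribution; None when it is not installed."""
    try:
        return has_fix(metadata.version("mavis"))
    except metadata.PackageNotFoundError:
        return None
```

Calling `installed_mavis_has_fix()` inside the same environment that runs `mavis setup` avoids checking a different installation by mistake.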

@moldach
Author

moldach commented Jul 17, 2020

This works great, thank you!
