Commit

bump to version 4. get chromosome coordinates for each block
saulobejo committed Jun 25, 2020
1 parent 6c21c69 commit c3df5d3
Showing 5 changed files with 1,092 additions and 51 deletions.
130 changes: 84 additions & 46 deletions README.md
@@ -116,18 +116,21 @@ http://samtools.github.io/hts-specs/SAMv1.pdf


```
-The random access method to be described next limits the uncompressed contents of each BGZF block
-to a maximum of 2^16 bytes of data. Thus while ISIZE is stored as a uint32_t as per the gzip format, in
-BGZF it is limited to the range [0, 65536]. BSIZE can represent BGZF block sizes in the range [1, 65536],
+The random access method to be described next limits the uncompressed contents
+of each BGZF block to a maximum of 2^16 bytes of data. Thus while ISIZE is
+stored as a uint32_t as per the gzip format, in BGZF it is limited to the range
+[0, 65536]. BSIZE can represent BGZF block sizes in the range [1, 65536],
though typically BSIZE will be rather less than ISIZE due to compression.
4.1.1 Random access
-BGZF files support random access through the BAM file index. To achieve this, the BAM file index uses
-virtual file offsets into the BGZF file. Each virtual file offset is an unsigned 64-bit integer, defined as:
-coffset<<16|uoffset, where coffset is an unsigned byte offset into the BGZF file to the beginning of a
-BGZF block, and uoffset is an unsigned byte offset into the uncompressed data stream represented by that
-BGZF block. Virtual file offsets can be compared, but subtraction between virtual file offsets and addition
-between a virtual offset and an integer are both disallowed.
+BGZF files support random access through the BAM file index. To achieve this,
+the BAM file index uses virtual file offsets into the BGZF file. Each virtual
+file offset is an unsigned 64-bit integer, defined as: coffset<<16|uoffset,
+where coffset is an unsigned byte offset into the BGZF file to the beginning of
+a BGZF block, and uoffset is an unsigned byte offset into the uncompressed data
+stream represented by that BGZF block. Virtual file offsets can be compared,
+but subtraction between virtual file offsets and addition between a virtual
+offset and an integer are both disallowed.
```
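To make the block layout quoted above concrete, here is a minimal, stdlib-only Python sketch. It relies on the BGZF details from the SAM specification: every block is a gzip member whose extra field carries a `BC` subfield holding BSIZE (the total block size minus 1), and whose last four bytes are ISIZE. Walking the blocks this way yields the `coffset` of each block start. The helper names `make_bgzf_block` and `iter_bgzf_blocks` are hypothetical and are not part of this repository.

```python
import struct
import zlib
from typing import Iterator, Tuple

MAX_BLOCK_SIZE = 65536  # BSIZE + 1 may not exceed 2^16 bytes


def make_bgzf_block(data: bytes) -> bytes:
    """Build one BGZF block: a gzip member with a 'BC' extra subfield."""
    if len(data) > 65536:
        raise ValueError("ISIZE is limited to [0, 65536]")
    comp = zlib.compressobj(6, zlib.DEFLATED, -15)  # raw deflate stream
    cdata = comp.compress(data) + comp.flush()
    total = 18 + len(cdata) + 8  # header + extra, deflate data, CRC32 + ISIZE
    if total > MAX_BLOCK_SIZE:
        raise ValueError("compressed data does not fit in one BGZF block")
    header = struct.pack(
        "<BBBBIBBHBBHH",
        31, 139, 8, 4,  # ID1, ID2, CM = deflate, FLG = FEXTRA
        0, 0, 255,      # MTIME, XFL, OS = unknown
        6,              # XLEN: one 6-byte extra subfield follows
        66, 67, 2,      # SI1 = 'B', SI2 = 'C', SLEN = 2
        total - 1,      # BSIZE = total block size minus 1
    )
    trailer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF, len(data))
    return header + cdata + trailer


def iter_bgzf_blocks(buf: bytes) -> Iterator[Tuple[int, int, int]]:
    """Yield (coffset, block_size, isize) for each BGZF block in buf."""
    coffset = 0
    while coffset < len(buf):
        id1, id2, cm, flg = struct.unpack_from("<BBBB", buf, coffset)
        if (id1, id2, cm) != (31, 139, 8) or not flg & 4:
            raise ValueError(f"no BGZF header at byte {coffset}")
        (xlen,) = struct.unpack_from("<H", buf, coffset + 10)
        pos, end, bsize = coffset + 12, coffset + 12 + xlen, None
        while pos < end:  # scan the extra subfields for 'BC'
            si1, si2, slen = struct.unpack_from("<BBH", buf, pos)
            if (si1, si2, slen) == (66, 67, 2):
                (bsize,) = struct.unpack_from("<H", buf, pos + 4)
            pos += 4 + slen
        if bsize is None:
            raise ValueError(f"no BC subfield at byte {coffset}")
        block_size = bsize + 1
        # ISIZE (uncompressed length) is the last 4 bytes of the block
        (isize,) = struct.unpack_from("<I", buf, coffset + block_size - 4)
        yield coffset, block_size, isize
        coffset += block_size
```

Each `coffset` yielded here is a block start, i.e. a value that may appear in the upper 48 bits of a virtual file offset.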

TABIX
@@ -204,6 +207,7 @@ Schema
------

https://jsonschema.net/home
https://www.jsonschemavalidator.net/


Example output
@@ -213,44 +217,78 @@ JSON

```JSON
{
-"__format_name__": "TBJ",
-"__format_ver__": 2,
-"n_ref": 1,
-"format": 2,
-"col_seq": 1,
-"col_beg": 2,
-"col_end": 0,
-"meta": "#",
-"skip": 0,
-"l_nm": 11,
-"names": [
-"SL2.50ch00"
+"n_ref": 1,
+"format": 2,
+"col_seq": 1,
+"col_beg": 2,
+"col_end": 0,
+"meta": "#",
+"skip": 0,
+"l_nm": 11,
+"names": [
+"SL2.50ch00"
+],
+"refs": [
+{
+"ref_n": 0,
+"ref_name": "SL2.50ch00",
+"n_bin": 86,
+"bins": [
+{
+"bin_n": 0,
+"bin": 4681,
+"n_chunk": 1,
+"chunks": {
+"chunk_begin": [
+{
+"real": 0,
+"bytes": 29542,
+"bin_pos": -1,
+"first_pos": -1,
+"last_pos": -1
+}
+],
+"chunk_end": [
+{
+"real": 124525,
+"bytes": 19630,
+"bin_pos": 16388,
+"first_pos": 16141,
+"last_pos": 17808
+}
+]
+}
+}
],
-"refs": [{
-"ref_n": 0,
-"ref_name": "SL2.50ch00",
-"n_bin": 86,
-"bins": [{
-"bin_n": 0,
-"bin": 4681,
-"n_chunk": 1,
-"chunks": [
-[29542, 8160890030]
-]
-},
-{
-"bin_n": 85,
-"bin": 4766,
-"n_chunk": 1,
-"chunks": [
-[460168303127, 461352730624]
-]
-}
-],
-"n_intv": 86,
-"intvs": [29542, 460168303127]
-}],
-"n_no_coor": null
+"bins_begin": {
+"real": 7021611,
+"bytes": 4631,
+"bin_pos": 1392700,
+"first_pos": 1392519,
+"last_pos": 1393971
+},
+"bins_end": {
+"real": 7021611,
+"bytes": 4631,
+"bin_pos": 1392700,
+"first_pos": 1392519,
+"last_pos": 1393971
+},
+"n_intv": 86,
+"intvs": [
+{
+"real": 7021611,
+"bytes": 4631,
+"bin_pos": 1392700,
+"first_pos": 1392519,
+"last_pos": 1393971
+}
+]
+}
+],
+"n_no_coor": null,
+"__format_name__": "TBJ",
+"__format_ver__": 4
}
```
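The removed v2 example stores each chunk as a pair of raw virtual file offsets, while the added v4 example stores `real`/`bytes` pairs. The numbers are consistent with `real` being `coffset` and `bytes` being `uoffset`: for example 124525 << 16 | 19630 = 8160890030, the end of the first v2 chunk. That mapping is inferred from the values shown in this commit rather than stated in it. A minimal Python sketch of the conversion, with hypothetical helper names:

```python
from typing import Tuple

UOFFSET_BITS = 16
UOFFSET_MASK = (1 << UOFFSET_BITS) - 1  # 0xFFFF


def split_virtual_offset(voffset: int) -> Tuple[int, int]:
    """Return (coffset, uoffset) for a BGZF virtual file offset."""
    if not 0 <= voffset < 1 << 64:
        raise ValueError("virtual offsets are unsigned 64-bit integers")
    return voffset >> UOFFSET_BITS, voffset & UOFFSET_MASK


def make_virtual_offset(coffset: int, uoffset: int) -> int:
    """Combine a block start and an in-block offset: coffset<<16|uoffset."""
    if not 0 <= coffset < 1 << 48:
        raise ValueError("coffset must fit in 48 bits")
    if not 0 <= uoffset <= UOFFSET_MASK:
        raise ValueError("uoffset must fit in 16 bits")
    return coffset << UOFFSET_BITS | uoffset


if __name__ == "__main__":
    # Raw offsets taken from the v2 "chunks" arrays above.
    for voff in (29542, 8160890030, 460168303127, 461352730624):
        print(voff, split_virtual_offset(voff))
    # 29542 (0, 29542)
    # 8160890030 (124525, 19630)
    # 460168303127 (7021611, 4631)
    # 461352730624 (7039684, 0)
```

Because `uoffset` always fits in the low 16 bits, comparing two virtual offsets as integers gives the same order as comparing `(coffset, uoffset)` tuples, so comparison is meaningful, whereas adding an integer would not in general land on a valid position.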

6 changes: 3 additions & 3 deletions examples/example.v2.schema.json
@@ -111,7 +111,7 @@
"type": "integer",
"title": "The format schema",
"description": "Format (0: generic; 1: SAM; 2: VCF).",
-"default": 0,
+"default": 2,
"examples": [
2
]
@@ -131,7 +131,7 @@
"type": "integer",
"title": "The col_beg schema",
"description": "Column for the start of a region.",
-"default": 0,
+"default": 2,
"examples": [
2
]
@@ -151,7 +151,7 @@
"type": "string",
"title": "The meta schema",
"description": "Leading character for comment lines.",
-"default": "",
+"default": "#",
"examples": [
"#"
]
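The schema hunks above set the defaults for `format`, `col_beg` and `meta` to the VCF-style values used in the examples. As an illustration of how these header fields drive parsing, here is a small Python sketch. It assumes, as in tabix, that `col_seq`, `col_beg` and `col_end` are 1-based column numbers, that `col_end` 0 means there is no end column, that `skip` counts leading lines to ignore, and that lines starting with `meta` are comments. Format-specific rules (such as how tabix derives a VCF record's end) are not modelled, and `parse_records` is a hypothetical name.

```python
from typing import Any, Dict, Iterable, Iterator, Optional, Tuple


def parse_records(
    lines: Iterable[str], hdr: Dict[str, Any]
) -> Iterator[Tuple[str, int, Optional[int]]]:
    """Yield (sequence name, begin, end or None) for each data line."""
    for i, line in enumerate(lines):
        if i < hdr["skip"]:
            continue  # leading lines the index was told to ignore
        if hdr["meta"] and line.startswith(hdr["meta"]):
            continue  # comment / meta lines
        fields = line.rstrip("\n").split("\t")
        seq = fields[hdr["col_seq"] - 1]  # column numbers are 1-based
        beg = int(fields[hdr["col_beg"] - 1])
        # col_end == 0 is taken to mean "no end column" (assumption)
        end = int(fields[hdr["col_end"] - 1]) if hdr["col_end"] else None
        yield seq, beg, end


if __name__ == "__main__":
    # Header values copied from the example JSON; the VCF lines are made up.
    hdr = {"format": 2, "col_seq": 1, "col_beg": 2, "col_end": 0,
           "meta": "#", "skip": 0}
    lines = ["##fileformat=VCFv4.2\n",
             "#CHROM\tPOS\tID\tREF\tALT\n",
             "SL2.50ch00\t16141\t.\tA\tG\n"]
    print(list(parse_records(lines, hdr)))  # [('SL2.50ch00', 16141, None)]
```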
106 changes: 106 additions & 0 deletions examples/example.v4.json
@@ -0,0 +1,106 @@
{
"n_ref": 1,
"format": 2,
"col_seq": 1,
"col_beg": 2,
"col_end": 0,
"meta": "#",
"skip": 0,
"l_nm": 11,
"names": [
"SL2.50ch00"
],
"refs": [
{
"ref_n": 0,
"ref_name": "SL2.50ch00",
"n_bin": 86,
"bins": [
{
"bin_n": 0,
"bin": 4681,
"n_chunk": 1,
"chunks": {
"chunk_begin": [
{
"real": 0,
"bytes": 29542,
"bin_pos": -1,
"first_pos": -1,
"last_pos": -1
}
],
"chunk_end": [
{
"real": 124525,
"bytes": 19630,
"bin_pos": 16388,
"first_pos": 16141,
"last_pos": 17808
}
]
}
},
{
"bin_n": 85,
"bin": 4766,
"n_chunk": 1,
"chunks": {
"chunk_begin": [
{
"real": 7021611,
"bytes": 4631,
"bin_pos": 1392700,
"first_pos": 1392519,
"last_pos": 1393971
}
],
"chunk_end": [
{
"real": 7039684,
"bytes": 0,
"bin_pos": -1,
"first_pos": -1,
"last_pos": -1
}
]
}
}
],
"bins_begin": {
"real": 7021611,
"bytes": 4631,
"bin_pos": 1392700,
"first_pos": 1392519,
"last_pos": 1393971
},
"bins_end": {
"real": 7021611,
"bytes": 4631,
"bin_pos": 1392700,
"first_pos": 1392519,
"last_pos": 1393971
},
"n_intv": 86,
"intvs": [
{
"real": 0,
"bytes": 29542,
"bin_pos": -1,
"first_pos": -1,
"last_pos": -1
},
{
"real": 7021611,
"bytes": 4631,
"bin_pos": 1392700,
"first_pos": 1392519,
"last_pos": 1393971
}
]
}
],
"n_no_coor": null,
"__format_name__": "TBJ",
"__format_ver__": 4
}
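The `bin` values in the v4 example (4681 and 4766) fit the hierarchical binning scheme that tabix indexes share with BAM indexes. Below is a Python port of the `reg2bin` C function from the SAM specification, plus a hypothetical `bin2range` inverse written for this note; both assume the default 16 kb minimum bin size (min_shift 14) and 5 levels below the root. Under that assumption bin 4681 is the first 16 kb bin, and bin 4766 = 4681 + 85 covers [1392640, 1409024), which contains the `bin_pos` of 1392700 recorded in its `chunk_begin`.

```python
from typing import Tuple

# (first bin number, bin width in bp) for levels 0..5 of the binning scheme
# used by BAI and tabix indexes with min_shift 14 and depth 5.
LEVELS = (
    (0, 1 << 29), (1, 1 << 26), (9, 1 << 23),
    (73, 1 << 20), (585, 1 << 17), (4681, 1 << 14),
)
MAX_BIN = 37449  # one past the last level-5 bin: ((1 << 18) - 1) // 7


def reg2bin(beg: int, end: int) -> int:
    """Smallest bin fully containing the 0-based half-open region [beg, end)."""
    end -= 1
    for shift, first in ((14, 4681), (17, 585), (20, 73), (23, 9), (26, 1)):
        if beg >> shift == end >> shift:
            return first + (beg >> shift)
    return 0


def bin2range(bin_no: int) -> Tuple[int, int]:
    """0-based half-open [start, end) span covered by a bin."""
    if not 0 <= bin_no < MAX_BIN:
        raise ValueError(f"bin {bin_no} is outside the 6-level scheme")
    for first, width in reversed(LEVELS):
        if bin_no >= first:
            start = (bin_no - first) * width
            return start, start + width
    raise AssertionError("unreachable")


if __name__ == "__main__":
    print(bin2range(4681))              # (0, 16384)
    print(bin2range(4766))              # (1392640, 1409024)
    print(reg2bin(1392700, 1392701))    # 4766
```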