Docs Update: BAM Tags, changelog. (#20)
Showing 7 changed files with 215 additions and 15 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,31 @@ | ||
--- | ||
layout: default | ||
title: BAM Tags | ||
nav_order: 8 | ||
--- | ||
|
||
#### Iso-Seq Tags | ||
|
||
| Tag | Type | Short Name | Relevant Executable | Value | | ||
| --- | ---- | ---------- | ----- | ----- | | ||
|CR| string | Cell Raw | `correct` | Raw (uncorrected) barcode. | | ||
|CB| string | Cell Barcode | `correct` | Corrected cell/group barcode. | | ||
|UR| string | UMI Raw | None currently | Raw (uncorrected) molecular/UMI barcode. | | ||
|UB| string | UMI Barcode | None currently | Corrected molecular/UMI barcode. | | ||
|XM| string | UMI Barcode | `tag` | Corrected molecular/UMI barcode. | | ||
|XC| string | Cell Barcode | `tag`, `correct` | Original Cell barcode. | | ||
|XA| string | Tag Name Order | `tag`, `correct` | Order of tag names. | | ||
|nc| int | Number of Candidates | `correct` | Number of candidate barcodes. | | ||
|oc| string | Other Choices | `correct` | String representation of other potential barcodes. | | ||
|gp| int | Group Passes | `correct` | Flag specifying whether or not the barcode for the given read passes filters. 1 for passing, 0 for failing. | | ||
|nb| int | Barcode Distance | `correct` | Edit distance from the barcode for the read to the barcode to which it was reassigned. This is 0 if the barcode matches exactly, -1 if the barcode could not be rescued, and the edit distance otherwise. | | ||
|ic| int | input-consensus | `dedup`, `groupdedup` | Number of reads used to generate consensus. If less than `is`, this means that reads were down-sampled when consensus-calling. | | ||
|is| int | input-sequences | `dedup`, `groupdedup` | Number of reads associated with isoform. | | ||
|XO| string | X Overhang | `tag` | Overhang sequence tag. | | ||
|XG| string | X GGG | `tag` | PacBio's GGG UMI suffix tag. | | ||
|rq| float | Read Quality | | Predicted accuracy of the polished isoform. | | ||
|iz| int | Maximum Subreads Used | | Maximum number of subreads used for polishing. | | ||
|it| string | Trimmed | `tag` | List of barcodes/UMIs clipped during `tag`. | | ||
|im| string | Names | `dedup`, `groupdedup` | List of names of the input reads used to generate the consensus. | | ||
|
||
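The value conventions in the table above (`nb` of 0, -1, or a positive edit distance; `gp` of 1 or 0; `ic` compared with `is`) can be sketched in code. This is a minimal illustration, assuming the tags of one read have already been loaded into a plain Python dict; the function names and the example values are hypothetical and are not part of `isoseq3`.

```python
def describe_barcode_tags(tags):
    """Summarize the barcode-correction tags of one read.

    `tags` is a plain dict mapping tag name to value (an assumption made
    for this sketch; how the dict is built from a BAM record is left open).
    """
    nb = tags["nb"]
    if nb == 0:
        status = "exact match"
    elif nb == -1:
        status = "not rescued"
    else:
        status = f"corrected at edit distance {nb}"
    return {
        "raw": tags["CR"],                   # raw (uncorrected) barcode
        "corrected": tags.get("CB"),         # corrected barcode, if present
        "status": status,
        "passes_filters": tags["gp"] == 1,   # 1 = passing, 0 = failing
    }


def was_downsampled(tags):
    """True when fewer reads fed the consensus (ic) than were associated (is)."""
    return tags["ic"] < tags["is"]


# Hypothetical example values, for illustration only.
read_tags = {"CR": "ACGTACGA", "CB": "ACGTACGT", "nb": 1, "gp": 1}
print(describe_barcode_tags(read_tags)["status"])
print(was_downsampled({"ic": 50, "is": 120}))
```

Keeping the interpretation in one place like this makes it harder to confuse the `-1` sentinel with a real edit distance in downstream filtering.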
<img src="../doc/img/isoseq.png"/> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
--- | ||
layout: default | ||
parent: Single cell | ||
title: Barcode Statistics | ||
nav_order: 7 | ||
--- | ||
|
||
*** | ||
|
||
`isoseq3 bcstats` emits statistics for each barcode: | ||
|
||
1. Barcode sequence | ||
2. Number of reads matching the barcode | ||
3. Frequency Rank (within barcodes) | ||
4. Number of unique molecular barcodes matching this barcode | ||
5. Whether the barcode is a Group/Cell barcode or a Molecular Barcode/UMI | ||
|
||
If `--json` is unset, the JSON summary is written to stderr (`/dev/stderr`). | ||
Similarly, if `-o` is unset, the TSV output is written to stdout (`/dev/stdout`). | ||
|
||
```bash | ||
# Example: | ||
isoseq3 bcstats --json sample.bcstats.json -o sample.bcstats.tsv sample.bam | ||
``` | ||
|
||
By default, the program emits statistics only for group barcodes. | ||
Adding `--umi` also emits statistics for the full molecular barcodes. | ||
|
||
<img src="../../doc/img/isoseq.png"/> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
--- | ||
layout: default | ||
parent: Single cell | ||
title: Barcode Correction Documentation via correct | ||
nav_order: 6 | ||
--- | ||
|
||
## Barcode Correction Documentation | ||
|
||
### Why Barcode Correction? | ||
|
||
Single-cell, spatially-resolved, and other barcoded sequencing applications | ||
rely on the accuracy of the cell or group barcode, which is typically chosen from a set of | ||
known candidates, often referred to as a "whitelist". | ||
|
||
This contrasts with molecular barcodes (UMIs, "unique molecular identifiers"), which are generated uniformly at random. | ||
|
||
This tool uses the set of known candidates to correct sequencing errors in cell barcode identification. There are two primary benefits: | ||
|
||
1. Increased yield | ||
2. Improved accuracy in downstream deduplication. | ||
|
||
By correcting errors in cell barcodes, the total number of usable reads is increased (typically ~5%). | ||
|
||
Once cell barcodes are corrected, the downstream `groupdedup` tool can deduplicate much more efficiently | ||
than standard deduplication, because only reads sharing a cell barcode are compared. This dramatically reduces the search space relative to exhaustive pairwise comparison. | ||
|
||
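To make the search-space argument concrete, the following sketch counts worst-case read pairs with and without grouping by cell barcode. It is a simplified counting model, not a description of how `groupdedup` is implemented; the function names and the toy barcodes are hypothetical.

```python
from collections import Counter


def exhaustive_pairs(n_reads):
    """Number of unordered read pairs when every read is compared with every other."""
    return n_reads * (n_reads - 1) // 2


def grouped_pairs(cell_barcodes):
    """Number of unordered read pairs when only reads sharing a cell barcode are compared."""
    counts = Counter(cell_barcodes)
    return sum(exhaustive_pairs(n) for n in counts.values())


# Toy input: 10 reads spread over 3 corrected cell barcodes.
barcodes = ["AAAC"] * 4 + ["GGTT"] * 3 + ["CCAG"] * 3
print(exhaustive_pairs(len(barcodes)))  # 45
print(grouped_pairs(barcodes))          # 12
```

The gap widens as the number of cells grows, since the exhaustive count is quadratic in the total number of reads while the grouped count is quadratic only within each cell.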
### What does Barcode Correction do? | ||
|
||
The tool takes a list of true barcodes and builds a locality-sensitive hashing (LSH) index over that set to facilitate fast nearest-neighbor queries. | ||
|
||
It then remaps each read's cell barcode to its nearest neighbor within that truth set. | ||
|
||
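The remapping step can be illustrated with a brute-force nearest-neighbor search. Note that this sketch deliberately swaps the LSH index for a linear scan over the known barcodes, so it shows only the intended result, not the tool's performance characteristics. The `max_distance` threshold and the rule that ties are left uncorrected are assumptions made for this example; the `-1` return value mirrors the `nb` tag convention.

```python
def edit_distance(a, b):
    """Levenshtein distance via the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,           # deletion
                            curr[j - 1] + 1,       # insertion
                            prev[j - 1] + cost))   # substitution or match
        prev = curr
    return prev[-1]


def correct_barcode(raw, known_barcodes, max_distance=2):
    """Return (corrected, distance), or (None, -1) if no unique close match exists.

    Linear scan over the known set; `max_distance` and the tie rule are
    hypothetical choices for this sketch, not documented isoseq3 behavior.
    """
    if raw in known_barcodes:
        return raw, 0
    scored = sorted((edit_distance(raw, k), k) for k in known_barcodes)
    if not scored:
        return None, -1
    best_dist, best = scored[0]
    ambiguous = len(scored) > 1 and scored[1][0] == best_dist
    if best_dist > max_distance or ambiguous:
        return None, -1
    return best, best_dist


known = {"ACGTACGT", "TTTTCCCC", "GGGGAAAA"}
print(correct_barcode("ACGTACGA", known))  # ('ACGTACGT', 1)
```

A linear scan costs one distance computation per known barcode for every read, which is the cost an index over the known set is meant to avoid.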
### When would a user call this tool? | ||
|
||
Run this tool on barcode-tagged BAM files before deduplicating with `isoseq3 groupdedup`. | ||
Correcting barcodes first lets `groupdedup` run substantially faster than `isoseq3 dedup`. | ||
|
||
## Usage | ||
|
||
### With the barcode set in barcodes.txt
```bash
isoseq3 correct --barcodes barcodes.txt input.bam output.bam | ||
``` | ||
|
||
#### Tags | ||
This requires the existence of XC and XU barcode tags.
The program will fail if either is missing.
|
||
The tool also adds or updates the following tags: | ||
|
||
| Tag | Type | Short Name | Value | | ||
| --- | ---- | ---------- | ----- | | ||
|CR| string | Cell Raw | Raw (uncorrected) barcode. | | ||
|CB| string | Cell Barcode | Corrected cell/group barcode. | | ||
|XC| string | Cell Barcode | Original Cell barcode. | | ||
|nc| int | Number of Candidates | Number of candidate barcodes. | | ||
|oc| string | Other Choices | String representation of other potential barcodes. | | ||
|gp| int | Group Passes | Flag specifying whether or not the barcode for the given read passes filters. 1 for passing, 0 for failing. | | ||
|nb| int | Number of Barcode Mismatches | Edit distance from the barcode for the read to the barcode to which it was reassigned. This is -1 if the barcode could not be corrected, and the edit distance otherwise. (This means 0 for an exact match.) | | ||
|
||
<img src="../../doc/img/isoseq.png"/> |