Add a description of the toolkit to the README #34

Merged Aug 13, 2024 (34 commits)
798a5ac
Add a description of the toolkit to the README
clintval Nov 22, 2023
e3e3dc1
Generate docs files
Mar 13, 2024
0ade11f
Merge remote-tracking branch 'origin/main' into cv_README
clintval May 10, 2024
531a122
Generate docs files
May 10, 2024
17e55e4
Fix up README after a review
clintval May 10, 2024
983dbb7
Fix up README after a review
clintval May 10, 2024
5a903a0
Remove outdated intro in Overview
clintval May 10, 2024
a62b275
Fixup a sentence
clintval May 10, 2024
a8cc755
Whitespace
clintval May 10, 2024
9304a02
Generate docs files
May 10, 2024
475c3e9
Generate docs files
May 10, 2024
c2ca29a
Remove duplicate .gitignore line
clintval May 10, 2024
0090746
Generate docs files
May 10, 2024
7067311
Small review fixups
clintval May 10, 2024
9a45075
Generate docs files
May 10, 2024
af562e6
docs: revise docs based on @msto review
clintval Jul 23, 2024
818d158
Generate docs files
Jul 23, 2024
a29d0ca
docs: small docs fixups for clarity and formatting
clintval Jul 23, 2024
c6a7e11
Generate docs files
Jul 23, 2024
8d80f2a
docs: one more pass at docs clarity!
clintval Jul 23, 2024
88fff29
chore: query group and template definition
clintval Jul 23, 2024
697b07d
docs: move reference down
clintval Jul 23, 2024
34e9faf
docs: do not repeat thyself
clintval Jul 23, 2024
ecf1df4
Generate docs files
Jul 23, 2024
09ead37
docs: little fixup
clintval Jul 23, 2024
45a17c1
Generate docs files
Jul 23, 2024
ef7f8f2
docs: formatting to be the same
clintval Jul 23, 2024
ac5b334
Generate docs files
Jul 23, 2024
f56f0e7
chore: header fixup
clintval Jul 23, 2024
2297324
chore: header fixup
clintval Jul 23, 2024
51557f2
Generate docs files
Jul 23, 2024
df8ead1
Generate docs files
Jul 23, 2024
3f52590
docs: suit review from @nh13
clintval Aug 13, 2024
ddd45aa
Generate docs files
Aug 13, 2024
1 change: 1 addition & 0 deletions .gitignore
@@ -1,3 +1,4 @@
.java-version
.idea
.idea_modules
out
28 changes: 25 additions & 3 deletions README.md
@@ -7,15 +7,37 @@

[bioconda-badge-link]: https://img.shields.io/conda/dn/bioconda/fgsv.svg?label=Bioconda
[bioconda-link]: http://bioconda.github.io/recipes/fgsv/README.html
-[github-badge]: https://github.com/fulcrumgenomics/fgsv/actions/workflows/unittests.yaml/badge.svg
+[github-badge]: https://github.com/fulcrumgenomics/fgsv/actions/workflows/unittests.yaml/badge.svg?branch=main
[github-link]: https://github.com/fulcrumgenomics/fgsv/actions/workflows/unittests.yaml
[scala-badge]: https://img.shields.io/badge/language-scala-c22d40.svg
[scala-link]: https://www.scala-lang.org/
[license-badge]: https://img.shields.io/badge/license-MIT-blue.svg
[license-link]: https://github.com/fulcrumgenomics/fgsv/blob/main/LICENSE

-Tools to find evidence for structural variation.
+Tools for calling breakpoints and exploring structural variation.

## Documentation

-Documentation can be found in the [docs folder](docs/01_Introduction.md)
+Documentation can be found in the [docs folder](docs/01_Introduction.md).

## Introduction to the `fgsv` Toolkit

The tool [`fgsv SvPileup`](https://github.com/fulcrumgenomics/fgsv/blob/main/docs/tools/SvPileup.md) takes a query-grouped BAM file as input and scans through each template one at a time, where a template is the full collection of reads and alignments from a single source molecule.
Primary and supplemental alignments for a template are used to construct a “chain” of aligned sub-segments in a way that is order- and strand-aware.
These aligned sub-segments relate to each other through typical alignment mechanisms like insertions and deletions, but they also carry information about the orientation of each sub-segment relative to the reference genome and, importantly, about jumps between reference sequences (chromosomes).
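
As a rough illustration of chain construction, the Python sketch below orders the alignments of one template by read and then by offset within the read. The `Segment` fields and the `build_chain` helper are hypothetical names chosen for this example; they are not taken from the `fgsv` codebase, and the actual ordering logic in `SvPileup` may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One aligned sub-segment of a read (hypothetical model, not fgsv's own type)."""
    read_num: int          # 1 or 2: which read of the pair the segment comes from
    query_start: int       # offset of the segment within the read, in sequencing order
    contig: str            # reference sequence the segment aligns to
    ref_start: int         # leftmost reference position of the alignment
    ref_end: int           # rightmost reference position of the alignment
    positive_strand: bool  # True if aligned to the forward strand


def build_chain(segments):
    """Order one template's segments by read, then by position within the read."""
    return sorted(segments, key=lambda s: (s.read_num, s.query_start))


# A template with a split read 1 (chr1 then chr7) and an unsplit read 2.
template = [
    Segment(2, 0, "chr1", 5000, 5099, False),
    Segment(1, 60, "chr7", 900, 939, False),
    Segment(1, 0, "chr1", 100, 159, True),
]
chain = build_chain(template)
for seg in chain:
    strand = "+" if seg.positive_strand else "-"
    print(f"R{seg.read_num} {seg.contig}:{seg.ref_start}-{seg.ref_end} ({strand})")
```

Keeping the strand on each segment lets later steps reason about orientation changes as well as changes in position or reference sequence.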

For each template's chain of aligned sub-segments, outlier jumps are collected. By default, a jump between segments within a read must span at least 100bp, and a jump between the two reads of a pair must span at least 1000bp.
When there is evidence for both a split-read alignment and an inter-read jump, the split-read alignment evidence is favored.
At locations where these jumps occur, breakpoints are marked, and the breakpoints are given a unique ID based on the position of the breakpoint, the directionality of the left and right strands, and the other location the aligned sub-segment jumps to.
The output of this process is simply a pileup of candidate breakpoint locations.
The tool outputs a metrics file tabulating the breakpoints and a BAM file in which each alignment carries custom tags indicating which breakpoint (by breakpoint ID) it supports, if any.
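
The distance thresholds described above can be sketched in Python as follows. This is a simplified illustration, not the `SvPileup` implementation: it assumes a jump is either a change of reference sequence or a same-contig gap at or above the threshold, it ignores strand and orientation checks entirely, and it numbers breakpoints sequentially. The names `Segment`, `ref_gap`, `is_jump`, `find_jumps`, and `assign_ids` are hypothetical; only the 100bp and 1000bp defaults come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical aligned sub-segment; not fgsv's own type."""
    read_num: int
    contig: str
    ref_start: int
    ref_end: int
    positive_strand: bool


def ref_gap(a, b):
    """Reference distance between two same-contig segments (0 if they overlap)."""
    return max(0, max(a.ref_start, b.ref_start) - min(a.ref_end, b.ref_end))


def is_jump(a, b, min_intra_read=100, min_inter_read=1000):
    """Assumed rule: a contig change, or a gap at or above the relevant threshold."""
    if a.contig != b.contig:
        return True
    threshold = min_intra_read if a.read_num == b.read_num else min_inter_read
    return ref_gap(a, b) >= threshold


def find_jumps(chain):
    """Return the adjacent segment pairs in a chain that qualify as jumps."""
    return [(a, b) for a, b in zip(chain, chain[1:]) if is_jump(a, b)]


def assign_ids(jumps):
    """Give each distinct (left side, right side) combination a sequential ID."""
    ids = {}
    for a, b in jumps:
        # Using ref_end and ref_start as the breakpoint positions is an assumption.
        key = (a.contig, a.ref_end, a.positive_strand,
               b.contig, b.ref_start, b.positive_strand)
        ids.setdefault(key, len(ids))
    return ids


chain = [
    Segment(1, "chr1", 100, 159, True),   # read 1, first part
    Segment(1, "chr1", 400, 439, True),   # read 1, 241bp further along: a jump
    Segment(2, "chr1", 900, 999, False),  # read 2, 461bp away: below 1000bp
]
jumps = find_jumps(chain)
breakpoint_ids = assign_ids(jumps)
```

Keying the ID on both sides of the jump and both strands means that two templates supporting the same breakpoint map to the same ID, which is what makes a pileup possible.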

Because of variability in short-read sequence data and their alignments, evidence for a single breakpoint may span a few loci near the true breakpoint.
The tool [`fgsv AggregateSvPileup`](https://github.com/fulcrumgenomics/fgsv/blob/main/docs/tools/AggregateSvPileup.md) is used to coalesce nearby breakpoints into one call if they appear to belong to one breakpoint.
This polishing step preserves true positive breakpoint calls and should reduce the number of false positive breakpoint calls.
Adjacent breakpoints are only merged if their left sides map to the same reference sequence, their right sides map to the same reference sequence, the strandedness of the left and right aligned sub-segments is the same, and their left and right positions are both within a given length threshold.
One shortcoming of the existing behavior, which should be corrected at some point, is that split-read (intra-read) breakpoint evidence is treated the same as inter-read breakpoint evidence, even though split-read evidence often has nucleotide-level alignment resolution and inter-read evidence does not.
The output of this tool is a metrics file tabulating the coalesced breakpoints, listing all previous breakpoint IDs for each new breakpoint call along with an estimate of the allele frequency of the call based on the alignments that support the breakpoint.
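
The merge criteria above can be written as a small predicate. The Python sketch below is illustrative only: `Breakpoint`, `can_merge`, `group_breakpoints`, and the `max_dist` parameter are hypothetical names, the inclusive distance comparison and the reading of "same strandedness" as matching left strands and matching right strands are assumptions, and the greedy grouping stands in for whatever clustering `AggregateSvPileup` actually performs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Breakpoint:
    """Hypothetical breakpoint record; not fgsv's own type."""
    left_contig: str
    left_pos: int
    left_positive: bool
    right_contig: str
    right_pos: int
    right_positive: bool


def can_merge(a, b, max_dist):
    """True if two breakpoints satisfy all four merge criteria from the text."""
    return (
        a.left_contig == b.left_contig
        and a.right_contig == b.right_contig
        and a.left_positive == b.left_positive
        and a.right_positive == b.right_positive
        and abs(a.left_pos - b.left_pos) <= max_dist
        and abs(a.right_pos - b.right_pos) <= max_dist
    )


def group_breakpoints(breakpoints, max_dist):
    """Greedy grouping: add each breakpoint to the first group with a mergeable member."""
    groups = []
    for bp in breakpoints:
        for group in groups:
            if any(can_merge(bp, other, max_dist) for other in group):
                group.append(bp)
                break
        else:
            # No existing group accepted this breakpoint, so start a new one.
            groups.append([bp])
    return groups


breakpoints = [
    Breakpoint("chr1", 1000, True, "chr7", 5000, False),
    Breakpoint("chr1", 1004, True, "chr7", 5003, False),  # close to the first
    Breakpoint("chr1", 1004, True, "chr7", 5003, True),   # right strand differs
    Breakpoint("chr1", 2000, True, "chr7", 5000, False),  # left side too far away
]
groups = group_breakpoints(breakpoints, max_dist=10)  # 10 is an arbitrary example value
```

In this example only the first two breakpoints end up in the same group; the other two each fail one criterion and stay separate.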

Together, the `fgsv` tools form an effective structural variant debugging toolkit, but they are not meant to be a structural variant calling toolchain in and of themselves.
Instead, it’s better to think of the `fgsv` toolkit as an effective “breakpoint caller”.
2 changes: 1 addition & 1 deletion docs/tools/index.md
@@ -4,7 +4,7 @@ title: fgsv tools

# fgsv tools

-The following tools are available in fgsv version 0.1.0-dcaa891.
+The following tools are available in fgsv version 0.1.1-798a5ac.
## All tools

All tools.