From 89a2ee535e588a70a02b130f2182ac6e148a108a Mon Sep 17 00:00:00 2001
From: Harun Mustafa <hmusta@users.noreply.github.com>
Date: Fri, 10 Jan 2025 19:09:46 +0100
Subject: [PATCH] Update sequence alignment documentation

---
 metagraph/docs/source/sequence_search.rst | 23 ++++++++++++++---------
 1 file changed, 14 insertions(+), 9 deletions(-)

diff --git a/metagraph/docs/source/sequence_search.rst b/metagraph/docs/source/sequence_search.rst
index dfaf1fda41..6e9f685b01 100644
--- a/metagraph/docs/source/sequence_search.rst
+++ b/metagraph/docs/source/sequence_search.rst
@@ -38,17 +38,22 @@ flags are available.

 Sequence-to-graph alignment
 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
+If the :code:`--map` flag is not used, :code:`metagraph align` performs sequence-to-graph alignment, approximately finding the path in the graph that best matches the query sequence. Alongside this path, this mode also returns an alignment score and a CIGAR string describing the edits needed to transform the spelling of the graph path into the query sequence. An example command may be::

-Additional parameters
-^^^^^^^^^^^^^^^^^^^^^
+    metagraph align -i MYGRAPH.dbg MYREADS.fa

-Query sequences against the index
----------------------------------
-(Experiment discovery)
+The output of the query is in TSV format, with one line per query sequence, where the columns are as follows:

-Parameters for exact k-mer matching
-^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+1. Query name
+2. Query sequence
+3. Strand
+4. Reference sequence (the spelling of the matched path)
+5. Alignment score
+6. Number of exact matches
+7. CIGAR string-like alignment summary
+8. Number of nucleotides trimmed from the prefix of the reference sequence
+9. Ref name matches (if the :code:`-a` flag is passed)

-Parameters for alignment
-^^^^^^^^^^^^^^^^^^^^^^^^
+An important parameter is the seed length, which can be set with :code:`--align-min-seed-length` and can be shorter than the value of k used to construct the graph.
+If an annotator is provided with the :code:`-a` flag, the returned alignments will be label-consistent, meaning that there is at least one label that is shared by all nodes on the path.
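
Review note: the column list added by this patch could be illustrated with a small reader for the TSV output. The following is only a sketch under stated assumptions: the column order is taken from the numbered list above, the field names in ``COLUMNS`` and the function ``parse_alignments`` are hypothetical labels chosen here (they are not defined by metagraph), and every value is kept as a string because the patch does not state the numeric types.

```python
import csv
from typing import Dict, Iterable, Iterator, Optional

# Hypothetical field names for the nine columns documented in the patch.
# These are illustrative labels, not identifiers from metagraph itself.
COLUMNS = [
    "query_name",
    "query_sequence",
    "strand",
    "reference_sequence",
    "score",
    "num_exact_matches",
    "cigar",
    "ref_prefix_trimmed",
    "label_matches",  # assumed to be present only when -a is passed
]


def parse_alignments(lines: Iterable[str]) -> Iterator[Dict[str, Optional[str]]]:
    """Yield one dict per non-empty TSV line, keyed by COLUMNS.

    Columns missing from a line (for example the ninth one when no
    annotator was given) are reported as None. Values stay as strings.
    """
    reader = csv.reader(lines, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if not row:
            continue  # skip blank lines
        record: Dict[str, Optional[str]] = dict.fromkeys(COLUMNS)
        record.update(zip(COLUMNS, row))
        yield record
```

Any iterable of lines can be passed in, such as an open file handle holding the saved output of the example command. Converting the score and count columns to numbers is left to the caller.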