ProPhex update #25

Open
wants to merge 11 commits into
base: master
Update output format
karel-brinda committed Apr 19, 2020
commit e8e93c80c8b1d3d955f51be44d3db2cf2585ce13
27 changes: 14 additions & 13 deletions README.md
@@ -115,19 +115,20 @@ Usage: prophex bwt2fa <idxbase> <output.fa>

## Output format

-Matches are reported in an extended
-[Kraken format](http://ccb.jhu.edu/software/kraken/MANUAL.html#output-format).
-ProPhex produces a tab-delimited file with the following columns:
-
-1. Category (unused, `U` as a legacy value)
-2. Sequence name
-3. Final decision (unused, `0` as a legacy value)
-4. Sequence length
-5. Assigned k-mers. Space-delimited list of k-mer blocks with the same assignments. The list is of
-the following format: comma-delimited list of sets (or `A` for ambiguous, or
-  `0` for no matches), colon, length. Example: `2157,393595:1 393595:1 0:16` (the first k-mer assigned to the nodes `2157` and `393595`, the second k-mer assigned to `393595`, the subsequent 16 k-mers unassigned)
-6. Bases (optional)
-7. Base qualities (optional)
+Matches are reported in the form of a tab-delimited file with the following
+columns:
+
+1. Sequence name
+2. Sequence length
+3. Assigned k-mers. Space-delimited list of k-mer blocks matching the same
+k-mer sets. The list is of the following format: comma-delimited list of
+k-mer sets (`~` for an ambiguous nucleotide name `*` for no k-mer matches),
+colon, the number of k-mers in the block. Example: `2157,393595:1 393595:1
+*:16` (the first k-mer assigned to the k-mer sets `2157` and `393595`, the
+second k-mer assigned to `393595`, and the subsequent 16 k-mers do not match
+anything)
+4. Bases (optional)
+5. Base qualities (optional)


## FAQs
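For readers consuming the updated output, the following is a minimal sketch of how the "Assigned k-mers" column could be parsed. It assumes the format described in the new README text (space-delimited blocks, each ending in a colon followed by the number of k-mers, with `*` marking a block without matches). The function `count_kmers` is a hypothetical helper written for illustration and is not part of ProPhex.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not part of ProPhex): sums the block lengths in the
 * "Assigned k-mers" column, e.g. "2157,393595:1 393595:1 *:16".
 * Returns the total number of k-mers and stores the number of k-mers that
 * sit in "*" (no match) blocks in *unmatched. Returns -1 on malformed input. */
long count_kmers(const char *field, long *unmatched)
{
    long total = 0;
    const char *p = field;

    *unmatched = 0;
    while (*p != '\0') {
        /* A block ends at the next space or at the end of the field. */
        size_t block_len = strcspn(p, " ");
        const char *colon = NULL;
        char *end;
        long n;
        size_t i;

        /* The block length follows the last colon inside the block. */
        for (i = 0; i < block_len; ++i) {
            if (p[i] == ':') colon = p + i;
        }
        if (colon == NULL || colon == p) return -1;

        n = strtol(colon + 1, &end, 10);
        /* The number must be non-empty, positive, and fill the rest of the block. */
        if (end == colon + 1 || end != p + block_len || n <= 0) return -1;

        total += n;
        if (colon - p == 1 && p[0] == '*') *unmatched += n;

        /* Move to the start of the next block. */
        p += block_len;
        while (*p == ' ') ++p;
    }
    return total;
}
```

Applied to the README example `2157,393595:1 393595:1 *:16`, this sketch yields 18 k-mers in total, 16 of them unmatched. Taking the last colon in each block is a defensive choice in case a k-mer set name ever contains a colon itself.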
6 changes: 3 additions & 3 deletions src/prophex_query.c
@@ -154,7 +154,7 @@ void construct_streaks(char** all_streaks, char** current_streak, int* seen_kmer
*current_streak[0] = '\0';
int current_streak_approximate_length = 0;
if (is_ambiguous_streak) {
-strcat(*current_streak, "A:");
+strcat(*current_streak, CONTAINS_AMBIG_NUCL ":");
current_streak_approximate_length += 2;
} else if (kmersets_cnt > 0) {
int r;
@@ -167,7 +167,7 @@ void construct_streaks(char** all_streaks, char** current_streak, int* seen_kmer
get_kmerset_name_length(seen_kmersets[kmersets_cnt - 1]), MAX_SOFT_STREAK_LENGTH);
strncat_with_check(*current_streak, ":", &current_streak_approximate_length, 1, MAX_SOFT_STREAK_LENGTH);
} else {
-strncat_with_check(*current_streak, "0:", &current_streak_approximate_length, 2, MAX_SOFT_STREAK_LENGTH);
+strncat_with_check(*current_streak, NO_MATCH ":", &current_streak_approximate_length, 2, MAX_SOFT_STREAK_LENGTH);
}
sprintf(*current_streak + strlen(*current_streak), "%d", streak_size);
current_streak_approximate_length += 3;
@@ -458,7 +458,7 @@ void process_sequences(const bwaidx_t* idx, int n_seqs, bseq1_t* seqs, const pro
for (i = 0; i < n_seqs; ++i) {
bseq1_t* seq = seqs + i;
if (opt->output) {
-fprintf(stdout, "U\t%s\t0\t%d\t", seq->name, seq->l_seq);
+fprintf(stdout, "%s\t%d\t", seq->name, seq->l_seq);
print_streaks(prophex_worker->output[i]);
if (opt->output_read_qual) {
fprintf(stdout, "\t");
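The hunks above change only the symbols written into each streak and the leading columns of the output line. As an illustration of the underlying idea, here is a minimal sketch of run-length encoding per-k-mer labels into `label:count` blocks. It does not reproduce `construct_streaks`, which works on ProPhex's own buffers and `MAX_SOFT_STREAK_LENGTH`. The function `format_blocks` and its parameters are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not ProPhex code): collapses runs of identical
 * per-k-mer labels into space-delimited "label:count" blocks.
 * Returns 0 on success, -1 if the output buffer is too small. */
int format_blocks(const char **labels, int n, char *out, size_t cap)
{
    size_t used = 0;
    int i = 0;

    if (cap == 0) return -1;
    out[0] = '\0';

    while (i < n) {
        int j = i + 1;
        int w;

        /* Extend the run while the next label equals the current one. */
        while (j < n && strcmp(labels[i], labels[j]) == 0) ++j;

        /* Append one block, preceded by a space unless it is the first. */
        w = snprintf(out + used, cap - used, "%s%s:%d",
                     used > 0 ? " " : "", labels[i], j - i);
        if (w < 0 || (size_t)w >= cap - used) return -1;

        used += (size_t)w;
        i = j;
    }
    return 0;
}
```

A caller would then print the sequence name, a tab, the sequence length, a tab, and the encoded string, which matches the column order in the updated `fprintf` call.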
3 changes: 3 additions & 0 deletions src/prophex_query.h
@@ -14,6 +14,9 @@
#include "klcp.h"
#include "prophex_utils.h"

+#define CONTAINS_AMBIG_NUCL "~"
+#define NO_MATCH "*"
+
typedef struct {
uint64_t position;
int strand;
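The expressions `CONTAINS_AMBIG_NUCL ":"` and `NO_MATCH ":"` in `prophex_query.c` rely on the C rule that adjacent string literals are concatenated at compile time. The sketch below shows this with the two macros from this commit. The three functions are hypothetical and exist only to make the result observable.

```c
#include <stddef.h>

#define CONTAINS_AMBIG_NUCL "~"
#define NO_MATCH "*"

/* Adjacent string literals are merged by the compiler,
 * so these return "~:" and "*:" respectively. */
const char *ambig_prefix(void) { return CONTAINS_AMBIG_NUCL ":"; }
const char *no_match_prefix(void) { return NO_MATCH ":"; }

/* Hypothetical helper: derives the prefix length from the literal itself
 * rather than hard-coding it. sizeof counts the terminating '\0'. */
size_t no_match_prefix_length(void)
{
    return sizeof(NO_MATCH ":") - 1;
}
```

With single-character macro values, both prefixes are two characters long, so the hard-coded lengths of `2` in `construct_streaks` remain accurate. Deriving the length with `sizeof` would keep it correct if a macro were ever changed to a longer string.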