Allow BYTE_ARRAY_STOP to work on non-zero STOP code with TOK3.
Our htscodecs name tokeniser decoder always adds NUL bytes between
names.  This happens to match the default STOP byte used by htslib's
CRAM implementation, but nothing requires the STOP byte to be 0, and
indeed the Java implementation uses 9 (tab).

This is an oversight.  Ideally we'd change the name tokeniser decode
function to take an additional parameter specifying the stop byte, but
that would change the API.  The easiest fix is to recognise this case
on the fly: when the block was compressed with TOK3, look for a stop
byte of 0 instead of the one the codec requested.
jkbonfield committed Jan 7, 2025
1 parent c705bec commit be16c53
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion cram/cram_codecs.c
@@ -3613,7 +3613,10 @@ int cram_byte_array_stop_decode_block(cram_slice *slice, cram_codec *c,
cp = b->data + b->idx;
cp_end = b->data + b->uncomp_size;

- stop = c->u.byte_array_stop.stop;
+ // STOP byte is hard-coded as zero by our name tokeniser decoder
+ // implementation, so we may ignore what was requested.
+ stop = b->orig_method == TOK3 ? 0 : c->u.byte_array_stop.stop;

if (cp_end - cp < out->alloc - out->byte) {
unsigned char *out_cp = BLOCK_END(out);
while (cp != cp_end && *cp != stop)
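
For readers who want to see the effect of the change in isolation, here is a minimal standalone sketch. It is not htslib code: `sketch_method`, `effective_stop` and `read_item` are hypothetical names invented for illustration, and the enum only stands in for the block's `orig_method`. It assumes, as the commit message describes, that a TOK3-decoded block separates names with NUL regardless of the STOP byte the codec asked for.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a block's original compression method. */
enum sketch_method { SKETCH_RAW, SKETCH_TOK3 };

/* Choose the byte that really terminates items in the decoded block.
 * TOK3 output is always NUL-separated, so the requested stop is ignored. */
static int effective_stop(enum sketch_method orig_method, int requested_stop)
{
    return orig_method == SKETCH_TOK3 ? 0 : requested_stop;
}

/* Copy one item from in[*idx .. len) into out, ending at `stop`.
 * Returns the item length and advances *idx past the stop byte, or
 * returns -1 if no stop byte is found or out is too small. */
static long read_item(const unsigned char *in, size_t len, size_t *idx,
                      int stop, unsigned char *out, size_t out_size)
{
    size_t i, n = 0;

    assert(in && idx && out);
    i = *idx;

    while (i < len && in[i] != stop) {
        if (n >= out_size)
            return -1;          /* output buffer too small */
        out[n++] = in[i++];
    }
    if (i == len)
        return -1;              /* ran off the end without a stop byte */

    *idx = i + 1;               /* skip the stop byte itself */
    return (long)n;
}
```

With a NUL-separated buffer and a requested stop of tab, calling `read_item` with the requested byte never finds a terminator, whereas calling it with `effective_stop(SKETCH_TOK3, '\t')` splits the names as intended.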