Allow BYTE_ARRAY_STOP to work on non-zero STOP code with TOK3.

Our htscodecs name tokeniser decoder always writes NUL bytes between names. This happens to match the default STOP byte used by htslib's CRAM implementation, but nothing requires the STOP byte to be 0, and indeed the Java implementation uses 9 (tab). This is an oversight. Ideally we'd change the name tokeniser decode function to take an additional parameter specifying the stop byte, but that would change the API. The easiest fix is to recognise this case on the fly: when the block was originally compressed with TOK3, search for the NUL byte the decoder actually emitted rather than the stop byte the codec requested.
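
A minimal sketch of the idea, outside htslib, may help show why the requested stop byte fails on tokeniser output. The enum, `effective_stop` and `read_item` below are hypothetical names made up for illustration and are not part of the htslib or htscodecs API; only the selection rule (use 0 for TOK3 blocks, otherwise the requested byte) comes from this change.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the block's original compression method. */
enum sketch_method { SKETCH_RAW, SKETCH_TOK3 };

/* Choose the byte to scan for. The name tokeniser decoder always
 * separates names with NUL, so for TOK3 blocks the requested stop
 * byte is ignored. */
static int effective_stop(enum sketch_method orig_method, int requested_stop)
{
    return orig_method == SKETCH_TOK3 ? 0 : requested_stop;
}

/* Copy one stop-terminated item starting at buf[*idx] into out.
 * Returns the item length and advances *idx past the stop byte, or
 * returns -1 (leaving *idx unchanged) if no stop byte is found or
 * the item does not fit in out. */
static long read_item(const unsigned char *buf, size_t len, size_t *idx,
                      int stop, unsigned char *out, size_t out_size)
{
    assert(buf && idx && out);

    size_t start = *idx, i = start;
    while (i < len && buf[i] != stop)
        i++;
    if (i >= len)
        return -1;              /* ran off the end: stop byte never seen */

    size_t n = i - start;
    if (n > out_size)
        return -1;              /* caller's buffer is too small */

    memcpy(out, buf + start, n);
    *idx = i + 1;               /* skip over the stop byte itself */
    return (long)n;
}
```

With a buffer holding NUL-separated names and a codec that requested tab as its stop byte, scanning for tab finds no terminator at all, whereas scanning for the byte returned by `effective_stop` recovers each name in turn.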
jkbonfield · Jan 7, 2025 · be16c53
1 parent c705bec
Showing 1 changed file with 4 additions and 1 deletion.
diff --git a/cram/cram_codecs.c b/cram/cram_codecs.c
@@ -3613,7 +3613,10 @@ int cram_byte_array_stop_decode_block(cram_slice *slice, cram_codec *c,
     cp = b->data + b->idx;
     cp_end = b->data + b->uncomp_size;
 
-    stop = c->u.byte_array_stop.stop;
+    // STOP byte is hard-coded as zero by our name tokeniser decoder
+    // implementation, so we may ignore what was requested.
+    stop = b->orig_method == TOK3 ? 0 : c->u.byte_array_stop.stop;
+
     if (cp_end - cp < out->alloc - out->byte) {
         unsigned char *out_cp = BLOCK_END(out);
         while (cp != cp_end && *cp != stop)