Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Question: debugging update to latest versions #118

Open
holtjma opened this issue Jan 9, 2025 · 14 comments
Open

Question: debugging update to latest versions #118

holtjma opened this issue Jan 9, 2025 · 14 comments

Comments

@holtjma
Copy link

holtjma commented Jan 9, 2025

Hi @jguhlin,

I'm in the process of updating this dependency for StarPhase, and I've run into some issues. Currently, I'm on version 0.1.16, and I tried updating to the latest (0.1.23). Things I updated code-wise:

  • Added None for qname (unneeded)
  • Added <Built> tag for types where appropriate
  • Modified flags since these are in a slice now instead of Vec

All of this was just to get the tool to compile, and it seemingly compiles fine. However, I'm finding that the alignments I'm getting out are different than before. For example, one that I'm debugging used to generate 7 mappings on v0.1.16 and now only generates 3 mappings on v0.1.23. Critically, the one I'm looking for is missing in that set. In terms of what I'm aligning, it's basically HiFi reads against the IMGT/HLA database. Aligner construction and mapping looks like this:

let db_aligner: Aligner<Built> = Aligner::builder()
            .map_hifi()
            .with_cigar()
            .with_index(tmp_db_fasta.path(), None)
            .unwrap();
...
db_aligner.map(
            &read_bytes,
            true, true, None, None, None
)?;

I stepped through the versions and v0.1.20 seems to be identical in output to v0.1.16, whereas v0.1.21 behaves like v0.1.23. It seems like quite a bit changed in v0.1.21, but I didn't see anything in the CHANGELOG that would cause mappings that were previously reported to get removed. Is there something else that changed or do you have any advice on debugging this further?

Matt

@jguhlin
Copy link
Owner

jguhlin commented Jan 9, 2025

That's very weird. Thanks for starting the debug process.

I'm taking a look at the diff between the current and 0.1.20 now.

I notice that with_cigar has changed.
Could you try?

let mut db_aligner = Aligner::builder().map_hifi();
db_aligner.mapopt.flag |= MM_F_CIGAR as i64;
let db_aligner = db_aligner.with_index(tmp_db_fasta.path(), None).unwrap();

(I tried my best, not sure if that code is quite correct).

The current version does this:

        self.mapopt.flag |= MM_F_CIGAR as i64 | MM_F_OUT_CS as i64;

It enables us to compute the CS tag, so it shouldn't change anything at all here.....

Can you share the before and after mappings? Do they just overlap? Maybe the CS tag collapses them? And what does the command line equivalent get you? And, just as a test, if you do specify query name, do you get the a good number back? (You could just enumerate them, or even make them all the same if that's not possible).

I wonder if it's a weird case of line 1027, which is a regrettable if statement...

But yeah, I'm not seeing anything that should have caused that
I've made a diff between lib.rs versions here if you want to take a look: https://www.diffchecker.com/8E5qVOm4/

@holtjma
Copy link
Author

holtjma commented Jan 9, 2025

Oh awesome, thanks for sending the diff! I looked through it too (up to the tests anyways), but didn't really see anything that was obvious to me...

I notice that with_cigar has changed.
Could you try?

Tried that, and it didn't seem to change the results any.

And, just as a test, if you do specify query name, do you get the a good number back?

Tried that before posting this comment, but seemed to have no effect (other than population the field of course). Double checked it just now to be sure.

Can you share the before and after mappings? Do they just overlap? Maybe the CS tag collapses them?

Yea, see attached log files (sorry for formatting). If you look at the target names, the 3 in the "after" set are also in the "before" set. But the other four in "before" are just not present, so I don't think it's a collapsing issue.

I just noticed there's a supplementary field now, is it possibly getting filtered somewhere from supplementary?

And what does the command line equivalent get you?

I'm not sure what you mean here, like running minimap2 directly?

Side note: one thing I just tried was increasing the mapopt.n parameter to 10 and that recovers a bunch of mappings including the one we're hunting for, although it gave back more than needed. I could probably test this further, but seems like a hack. Odd that it works for this N of 1 though.

before_mappings.txt
after_mappings.txt

@jguhlin
Copy link
Owner

jguhlin commented Jan 9, 2025

Yeah, with minimap2 what is the output? I want this to give the same output as that (plus it tells us if the difference might be something related to default minimap2 or not).

Thanks for those checks.

Going to dig into it some more. I think supplementary could be it...

@jguhlin
Copy link
Owner

jguhlin commented Jan 9, 2025

What is the mapopt.n parameter?

@holtjma
Copy link
Author

holtjma commented Jan 9, 2025

Yeah, with minimap2 what is the output?

I'll try to rig this up and report back with the mappings, probably won't be until tomorrow. Looks like I should use v2.28 (side note: I think this is shared between v0.1.20 and v0.1.21, right?)

What is the mapopt.n parameter?

Sorry, mental confusion, I meant mapopt.best_n

@jguhlin
Copy link
Owner

jguhlin commented Jan 9, 2025

Yep, v2.28 should be the normal. And cool, I should have found that but didn't see it, haha.

Another question, were you creating the index the same way before or using with_seqs or something? That changes max_occ to be more to Mappy (python Minimap2) than the CLI

@holtjma
Copy link
Author

holtjma commented Jan 9, 2025

Yea, I haven't changed how the index is created. It throws all the haplotypes into a FASTA file and then loads that using with_index.

When I add that best_n=10 parameter, the desired mapping comes back as the second result. I've included it here in case there's something about this that's useful, but nothing stood out to me:

Mapping { query_name: None, query_len: Some(21500), query_start: 8247, query_end: 21138, strand: Forward, target_name: Some("HLA:HLA00887"), target_len: 12905, target_start: 0, target_end: 12905, match_len: 12887, block_len: 12907, mapq: 0, is_primary: false, is_supplementary: false, alignment: Some(Alignment { nm: 20, cigar: Some([(230, 0), (1, 2), (1826, 0), (1, 2), (2176, 0), (1, 1), (216, 0), (1, 2), (1681, 0), (1, 2), (520, 0), (1, 2), (2353, 0), (9, 2), (617, 0), (1, 2), (228, 0), (1, 2), (1227, 0), (1, 1), (1815, 0)]), cigar_str: Some("8247S230M1D1826M1D2176M1I216M1D1681M1D520M1D2353M9D617M1D228M1D1227M1I1815M362S"), md: Some("230^C1826^A2392^T1681^A520^T2353^TTCCCGAAG225C391^C228^G596T2445"), cs: Some(":230-c:1826-a:2176+t:216-t:1681-a:520-t:2353-ttcccgaag:225*ct:391-c:228-g:596*tc:630+a:1815"), alignment_score: Some(12783) }) }

@jguhlin
Copy link
Owner

jguhlin commented Jan 9, 2025

That does help. It's neither primary nor secondary alignment. Does setting best_n to 5 fix it? As in completely the same as before?

And can you share the code of how the aligner was being built before? I wonder if that's it. With the Aligner and other modules I'm forcing config to be applied in a certain way, that should mimic the command line better and prevent weird errors. This may be one of those edge cases

@holtjma
Copy link
Author

holtjma commented Jan 9, 2025

So setting best_n=5 does get seemingly identical results. Did the default there change? I didn't see it in the diff.

And yea, here was the previous build code, no major difference other than the type:

let db_aligner: Aligner = Aligner::builder()
            .map_hifi()
            .with_cigar()
            .with_index(tmp_db_fasta.path(), None)
            .unwrap();

@jguhlin
Copy link
Owner

jguhlin commented Jan 9, 2025

Ok, yeah. That's it. Somehow best_n should be 5, and that was working before despite me setting it to 1 (I don't even remember why?). And I think changing how the defaults are applied (line 551 in the current version was added) must have changed it and allowed the 1 to sneak through.

I'll comment that line out, because I don't even know why it is there. For now, can you set it to 5 (minimap2 default) manually, and I'll push out a new version in a few weeks?

Moving soon and going up north tomorrow to look at rentals, so this whole month is going to be kinda funky.

@jguhlin
Copy link
Owner

jguhlin commented Jan 9, 2025

You can also set your dependency to

minimap2 = { git = "https://github.com/jguhlin/minimap2-rs/", branch = "Fix-best_n-config" }

for now. That should work, until a new release comes out.

@holtjma
Copy link
Author

holtjma commented Jan 9, 2025

Yep, I can set it manually! Glad it’s a relatively easy fix too. I’ll give it a deeper test end-to-end test tomorrow also. I appreciate all the help, especially with the busy life events!

@jguhlin
Copy link
Owner

jguhlin commented Jan 9, 2025 via email

@holtjma
Copy link
Author

holtjma commented Jan 10, 2025

Just following up, ran a full test this morning and things seem to match up again with that change. Going to inspect a few more closely, but I think it's safe to say that was the issue.

For anyone else who encounters this issue, it was introduced in v0.1.21 and caused by a change to the best_n parameter. Workaround until patch is to set that parameter manually:

If this was your code before (<=v0.1.20):

let db_aligner = Aligner::builder()
    .map_hifi()
    .with_cigar()
    .with_index(tmp_db_fasta.path(), None)?;

Then this workaround should match (>=v0.1.21 until patched):

let mut db_aligner = Aligner::builder()
     .map_hifi();
db_aligner.mapopt.best_n = 5;
let db_aligner = db_aligner
    .with_cigar()
    .with_index(tmp_db_fasta.path(), None)?;

Thanks again for the help Joseph!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants