Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bug(s?): Soft clipping in cigar outputs #51

Open
holtjma opened this issue Feb 28, 2024 · 6 comments
Open

Bug(s?): Soft clipping in cigar outputs #51

holtjma opened this issue Feb 28, 2024 · 6 comments

Comments

@holtjma
Copy link

holtjma commented Feb 28, 2024

Hello,

I came across an issue with soft clipping results in the output. I think at least one of these is probably a bug, the other I'm not sure if I'm just missing an option to enable it. For context, I'm using minimap2 = "0.1.16" and mapping extracted reference sequences into a read (I can provide more details around how I'm calling .map() if needed).

Here are some example output Mappings where I saw the issue:

Mapping { query_name: None, query_len: Some(4000), query_start: 0, query_end: 2613, strand: Forward, target_name: Some("N/A"), target_len: 12338, target_start: 9728, target_end: 12338, match_len: 2604, block_len: 2613, mapq: 60, is_primary: true, alignment: Some(Alignment { nm: 9, cigar: Some([(1270, 0), (1, 1), (91, 0), (1, 1), (871, 0), (1, 1), (378, 0)]), cigar_str: Some("1270M1I91M1I871M1I378MS1387"), md: Some("186G817G13A2C17G6T1563"), cs: Some(":186*ga:817*ga:13*ag:2*ct:17*ga:6*tc:223+t:91+g:871+c:378") }) }
Mapping { query_name: None, query_len: Some(4290), query_start: 2407, query_end: 4288, strand: Forward, target_name: Some("N/A"), target_len: 11711, target_start: 0, target_end: 1903, match_len: 1762, block_len: 1925, mapq: 60, is_primary: true, alignment: Some(Alignment { nm: 163, cigar: Some([(31, 0), (1, 2), (42, 0), (1, 1), (279, 0), (1, 1), (163, 0), (5, 2), (36, 0), (2, 1), (13, 0), (21, 2), (55, 0), (9, 2), (6, 0), (6, 2), (48, 0), (5, 1), (13, 0), (8, 1), (77, 0), (1, 1), (11, 0), (1, 1), (275, 0), (1, 1), (39, 0), (1, 2), (528, 0), (1, 1), (89, 0), (1, 1), (67, 0), (1, 2), (87, 0)]), cigar_str: Some("2407S31M1D42M1I279M1I163M5D36M2I13M21D55M9D6M6D48M5I13M8I77M1I11M1I275M1I39M1D528M1I89M1I67M1D87MS2"), md: Some("31^G192G47A14T14C4G11A2A8A3A17A0T5C1C0T17G2T1T10T18T0G2T4C8T2G7A3C2T29G5G6G6C6G1C4^CGCCA12T0T0A5C3T13C10^ACCGGATTCCAGCTGGGAAAT2G0C7A4T0T5C3C1A5C6G9A2^CGTCACAAG3C2^CCTCGT2C2T0G3A8T0A14T6G26T5C6G6C2T2C1A3C2A9T0C0A0C9T10T1T17T25A7T4G0T7A16G12T9G130C85^C10C421C5T6A39G15C144G5T24C6^G39C30A0C15"), cs: Some(":31-g:42+c:150*gc:47*ag:14*tc:14*cg:4*gt:11*ac:2*ag:8*ac:3*ag:17+g*ag*tc:5*ca:1*ca*ta:17*ga:2*tc:1*tg:10*tg:18*tc*gt:2*tc:4*ca:8*tc:2*gc:7*ag:3*ct:2*tg:29*ga:5*gc:6*ga:6*ct:6*gc:1*ct:4-cgcca:12*tc*tc*ag:5*ct:3*tc:11+ca:2*cg:10-accggattccagctgggaaat:2*ga*ct:7*ag:4*tc*tc:5*ct:3*cg:1*ag:5*cg:6*ga:9*ac:2-cgtcacaag:3*ca:2-cctcgt:2*ca:2*tg*ga:3*ag:8*tc*ac:14*tc:6*ga:5+ctcag:13+ctttatct:8*tc:5*cg:6*ga:6*cg:2*tg:2*cg:1*at:3*ct:2*ag:9*tc*ct*ag*ct:9*tg:10+c*tg:1*tg:8+c:9*tc:25*ag:7*tc:4*gt*ta:7*ag:16*gc:12*tc:9*gt:130*ca:46+g:39-c:10*cg:421*ct:5*tc:6*ac:39*ga:15*ca:26+g:89+a:29*ga:5*tc:24*ct:6-g:39*ct:30*ag*cg:15") }) }

The key values of the cigar string and cigar Vec, I have extracted below:

cigar: Some([(1270, 0), (1, 1), (91, 0), (1, 1), (871, 0), (1, 1), (378, 0)]), 
cigar_str: Some("1270M1I91M1I871M1I378MS1387")
...
cigar: Some([(31, 0), (1, 2), (42, 0), (1, 1), (279, 0), (1, 1), (163, 0), (5, 2), (36, 0), (2, 1), (13, 0), (21, 2), (55, 0), (9, 2), (6, 0), (6, 2), (48, 0), (5, 1), (13, 0), (8, 1), (77, 0), (1, 1), (11, 0), (1, 1), (275, 0), (1, 1), (39, 0), (1, 2), (528, 0), (1, 1), (89, 0), (1, 1), (67, 0), (1, 2), (87, 0)]), 
cigar_str: Some("2407S31M1D42M1I279M1I163M5D36M2I13M21D55M9D6M6D48M5I13M8I77M1I11M1I275M1I39M1D528M1I89M1I67M1D87MS2")

There are two issues I noticed in these outputs:

  1. In the first example, the final cigar string sequence is "S1387", but I think it should be "1387S" to match the rest of the cigar. This is the one I think is likely a bug.
  2. In both examples, I don't see the soft-clipping values reflects in the cigar Vec. I'm not sure if this is a bug, or if I'm just missing an option to make them appear.

I found a workaround using the query_start and query_end fields to get these soft-clipping values. Are you able to confirm if these are bugs or if there's something I'm missing in the usage?

Thanks for maintaining this crate!
Matt

@jguhlin
Copy link
Owner

jguhlin commented Mar 4, 2024

Definitely a bug. I'm just getting over being sick so it'll probably be next week before I can tackle these though!

@holtjma
Copy link
Author

holtjma commented Mar 4, 2024

Hope you feel better!

@jguhlin
Copy link
Owner

jguhlin commented Mar 12, 2024

Almost better. :/ Working on this today, hoping to get it pushed out soon (coincides with the minimap2 version update, so that's actually great!)

@jguhlin
Copy link
Owner

jguhlin commented Mar 12, 2024

I believe 1 is fixed. 2 may need an additional option. Minimap doesn't output soft clipping options unless it is in SAM format, and does not output it in PAF. So it's not as automated in the upstream library. Will keep working on it though, but it may need to be an additional option.

@jguhlin
Copy link
Owner

jguhlin commented Mar 16, 2024

@holtjma I've added the option with_cigar_clipping for aligner (aligner.with_cigar_clipping()).

I'm keeping it this way as minimap2 does not return the CIGAR with soft clipping except in the string (never seen in PAF, but seen in SAM, for example). Trying to keep the library as close to minimap2 as possible when in default mode. Let me know if that works.

I don't have a good example for soft clipping on the front, but I see minimap2 has the workings for it, so if you come up with something you can share, let me know!

jguhlin added a commit that referenced this issue Mar 16, 2024
@holtjma
Copy link
Author

holtjma commented Mar 18, 2024

That sounds good to me!

I'm not sure why minimap2 does it that way, but I definitely understand the desire to keep it as close as possible. Happy to try everything out once it's ready. Thanks again for taking the time to work on this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants