SAM output format @SQ headers #947

Open
schorlton-bugseq opened this issue Jan 31, 2025 · 2 comments

Comments

@schorlton-bugseq

Thanks for your hard work on this tool!

I noticed that when running easy-search with v17-b804f and --format-mode 1 (SAM output), the @SQ header lines of the SAM file are only populated for reference sequences that have alignments. While this is perfectly valid SAM, dropping the reference sequences without alignments complicates, and in some cases precludes, downstream analyses. For example, samtools depth -aa doesn't know about the reference sequences without alignments, and samtools cat mixes up the reference sequence names when given two SAMs with different headers, even though both may have been generated from the same reference file.

Would it be possible to output @SQ lines for all sequences in the target (reference) input when writing SAM output? Thanks for your consideration!

@milot-mirdita
Member

Could you give me an example where this happens? I am not sure I understand what's wrong.

@schorlton-bugseq
Author

Sure! Here is a mock example. I understand it is somewhat contrived.

>query
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>target
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
mmseqs easy-search --format-mode 1 --search-type 2 query.fna target.fna result.sam tmp
cat result.sam

@HD	VN:1.4	SO:queryname
samtools sort -o sorted.bam result.sam
samtools index sorted.bam
samtools depth -aa sorted.bam # yields empty result

If result.sam looked like:

@HD	VN:1.4	SO:queryname
@SQ	SN:target	LN:77

(and possibly even had the unmapped read!), the samtools depth from above would have produced output.
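In the meantime, the header can be patched after the fact. The following is only a rough workaround sketch, not anything MMseqs2 provides: it reads the target FASTA, computes each sequence length, and rebuilds the @SQ block of the SAM header in FASTA order so that every reference is listed. The function names `fasta_lengths` and `add_missing_sq` are made up for illustration, and the sketch assumes that the SN value written to the SAM file equals the first whitespace-delimited word of the FASTA header, which may not hold for every header style.

```python
from typing import Dict, Iterable, List


def fasta_lengths(fasta_lines: Iterable[str]) -> Dict[str, int]:
    """Map each FASTA record name to its sequence length.

    Assumption: the record name is the first word after '>'.
    """
    lengths: Dict[str, int] = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            name = fields[0] if fields else ""
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def add_missing_sq(sam_lines: Iterable[str], lengths: Dict[str, int]) -> List[str]:
    """Return SAM lines with one @SQ header line per reference sequence.

    Existing @SQ lines are kept verbatim; missing ones are generated from
    `lengths`. The @SQ block is emitted in FASTA order so that files built
    from the same reference end up with identical reference lists.
    """
    header: List[str] = []
    records: List[str] = []
    for line in sam_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Header lines start with '@'; a SAM QNAME cannot contain '@'.
        (header if line.startswith("@") else records).append(line)

    # Index the @SQ lines that are already present by their SN value.
    existing: Dict[str, str] = {}
    for h in header:
        if h.startswith("@SQ\t"):
            for field in h.split("\t")[1:]:
                if field.startswith("SN:"):
                    existing[field[3:]] = h

    sq = [existing.pop(n, f"@SQ\tSN:{n}\tLN:{ln}") for n, ln in lengths.items()]
    sq += list(existing.values())  # keep @SQ lines not found in the FASTA

    hd = [h for h in header if h.startswith("@HD\t")]
    other = [h for h in header if not h.startswith(("@HD\t", "@SQ\t"))]
    return hd + sq + other + records


if __name__ == "__main__":
    # Mirrors the mock example: a header-only SAM and a 77 bp target.
    fasta = [">target", "G" * 77]
    sam = ["@HD\tVN:1.4\tSO:queryname"]
    print("\n".join(add_missing_sq(sam, fasta_lengths(fasta))))
```

To use it on real files, pass open file handles for target.fna and result.sam and write the returned lines to a new SAM file before running samtools sort. This only works on text SAM, where records refer to references by name. It should not be applied to BAM, where records store reference indices and reordering @SQ lines would change their meaning.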
