Thoughts about compressing unitigs? #3

Open
rchikhi opened this issue Nov 4, 2022 · 2 comments

@rchikhi
rchikhi commented Nov 4, 2022

Hi Sebastian, Agnieszka, Heng,

AGC looks great. I wanted to see if it'd work also on badly-assembled sequences, e.g. unitigs, and didn't get good compression ratios. Would you say the approach fundamentally wouldn't work for unitigs, or did I miss some parameter tweaks?

I tried to compress unitigs from two human samples (NA06986 and NA06991) using CHM13v2 as the reference. The resulting AGC file is 3.6 GB, which is more than the two raw gzipped unitig files combined (2 x 1.7 GB). Command line: \time ~/tools/agc/agc create -t 10 chm13v2.0.oneline.fa NA06986.unitigs.fa.gz NA06991.unitigs.fa.gz > NA06986_NA06991.agc. Testing with the parameter -s 200 didn't substantially change the results.

thanks in advance for any feedback,
Rayan

@sebastiandeorowicz
Member

Hi Rayan,
AGC was designed for high-quality assemblies. Nevertheless, I'm a bit surprised that you report such poor ratios, so we will have to take a look at this case. We should definitely do better than gzip. :-) I'll let you know when we have any news.
Best,
Sebastian

@arcadeo
arcadeo commented Mar 7, 2023

AGC does look great! And perhaps I misunderstood, but I think the size difference is due to the AGC file including three genomes (i.e., ref + 2 unitig assemblies), not just two. So AGC would still effectively be smaller at 3.6GB than concatenating the three assemblies.
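
To make the comparison above concrete, here is a rough back-of-the-envelope sketch. The 3.6 GB archive size and the 2 x 1.7 GB gzipped unitig sizes come from the original post. The gzipped size of the CHM13v2 reference is not given anywhere in this thread, so the 0.9 GB value below is purely an illustrative assumption, and `compare_sizes` is a hypothetical helper, not part of AGC.

```python
def compare_sizes(archive_gb, gzipped_inputs_gb):
    """Return (total gzipped size, archive size / total gzipped size)."""
    total = sum(gzipped_inputs_gb)
    return total, archive_gb / total


archive_gb = 3.6            # AGC archive size reported in the original post
unitigs_gz_gb = [1.7, 1.7]  # gzipped unitig sizes reported in the original post
ref_gz_gb = 0.9             # ASSUMPTION: gzipped reference size, not stated in the thread

# Baseline used in the original post: the two unitig files only.
total2, ratio2 = compare_sizes(archive_gb, unitigs_gz_gb)

# Baseline suggested above: reference plus the two unitig files.
total3, ratio3 = compare_sizes(archive_gb, unitigs_gz_gb + [ref_gz_gb])

print(f"unitigs only:      {total2:.1f} GB gzipped, archive/gzip = {ratio2:.2f}")
print(f"unitigs + ref (*): {total3:.1f} GB gzipped, archive/gzip = {ratio3:.2f}")
print("(*) depends on the assumed reference size")
```

A ratio above 1 means the archive is larger than the gzipped inputs it is compared against. Whether the archive beats gzip once the reference is counted depends entirely on the real gzipped size of the reference, which would need to be measured.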
