Is it theoretically sound to have the same sequence in multiple gene trees? #12

Open
000generic opened this issue Dec 13, 2024 · 1 comment

Comments

@000generic

Hi! I'm new to Asteroid and excited to see what it can do with gappy species representation across gene trees.

I have a pipeline that produces single-copy gene trees by running DISCO as the step before Asteroid. So in each single-copy gene tree, a given species is represented by a single sequence, and the sequences in the tree are all orthologs based on genome clustering methods.

However, I am running the genomes through ortholog detection at multiple phylogenetic levels or with multiple clustering tools, so a single sequence from a single species can end up in several of the single-copy gene trees that become input for Asteroid.

As I understand it, a given gene tree should contain only a single sequence per species for the underlying methods and statistics in Asteroid / supertree methods to be sound. That is why tools like DISCO are used to produce single-copy gene trees before downstream species tree building.

BUT

I am unsure whether having the same sequence in multiple single-copy gene trees violates the statistical or other assumptions of Asteroid / supertree methods.
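To make the overlap concrete, here is a minimal sketch of how the shared sequences could be counted across the input trees. It assumes the gene trees are plain, unquoted Newick strings whose leaf labels are still unique sequence identifiers (if the leaves have already been relabeled to species names, it would need to run on the trees before relabeling). The function names and the example labels such as `spA_g1` are hypothetical and not part of Asteroid or DISCO.

```python
import re
from collections import defaultdict

# A leaf label is the text right after "(" or "," up to the next Newick delimiter.
# Internal node labels and support values follow ")" and are therefore not matched.
LEAF_RE = re.compile(r"[(,]\s*([^(),:;\s]+)")


def leaf_labels(newick):
    """Return the set of leaf labels in a simple (unquoted) Newick string."""
    return set(LEAF_RE.findall(newick))


def shared_sequences(trees):
    """Map each leaf label to the indices of the trees that contain it,
    keeping only labels that occur in more than one tree."""
    seen = defaultdict(list)
    for index, newick in enumerate(trees):
        for label in sorted(leaf_labels(newick)):
            seen[label].append(index)
    return {label: idxs for label, idxs in seen.items() if len(idxs) > 1}


if __name__ == "__main__":
    # Hypothetical example: sequence spA_g1 occurs in both gene trees.
    trees = [
        "((spA_g1:0.1,spB_g7:0.2):0.05,spC_g3:0.3);",
        "((spA_g1:0.1,spD_g2:0.2):0.05,spC_g9:0.3);",
    ]
    print(shared_sequences(trees))
```

This only reports how often sequences are reused across trees; it does not say whether that reuse is acceptable for the method.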

Any guidance on this would be greatly appreciated :)

Thank you! Eric

@BenoitMorel
Owner

BenoitMorel commented Dec 15, 2024 via email


2 participants