Hi! I'm new to Asteroid and excited to see what it can do with gappy species representation across gene trees.
I have a pipeline that produces single-copy gene trees, using DISCO as the step before Asteroid. So, in each single-copy gene tree, a given species is represented by a single sequence, and the sequences in the gene tree are all orthologs based on genome clustering methods.
However, I am running the genomes through multiple phylogenetic levels or clustering tools for ortholog detection, so a single sequence from a single species can end up in multiple single-copy gene trees that then become input for Asteroid.
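To gauge how often this overlap actually happens, here is a minimal sketch (not part of Asteroid or DISCO) that lists the sequence IDs appearing in more than one gene tree. It assumes each gene tree is an unquoted Newick string whose leaf labels are sequence IDs; the tree names (`og1`, `og2`) and labels below are hypothetical.

```python
import re
from collections import defaultdict

# Leaf labels follow "(" or "," and end at ":", ",", ")" or ";".
# Assumes unquoted labels without spaces; internal node labels (which
# follow ")") are deliberately not captured.
LEAF_RE = re.compile(r"[(,]\s*([^(),:;\s]+)")


def leaf_labels(newick: str) -> set[str]:
    """Return the set of leaf labels in one Newick string."""
    return set(LEAF_RE.findall(newick))


def shared_sequences(gene_trees: dict[str, str]) -> dict[str, list[str]]:
    """Map each sequence ID to the gene trees containing it,
    keeping only IDs that occur in more than one tree."""
    occurrences = defaultdict(list)
    for tree_id, newick in gene_trees.items():
        for label in leaf_labels(newick):
            occurrences[label].append(tree_id)
    return {label: sorted(trees) for label, trees in occurrences.items()
            if len(trees) > 1}


# Hypothetical gene trees from two clustering runs.
trees = {
    "og1": "((spA_g1:0.1,spB_g7:0.2):0.05,spC_g3:0.3);",
    "og2": "((spA_g1:0.1,spC_g9:0.2),spD_g2:0.4);",
}
print(shared_sequences(trees))  # {'spA_g1': ['og1', 'og2']}
```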
As I understand it, a given species should be represented by only a single sequence in a given gene tree for the underlying methods and statistics in Asteroid (and in supertree methods generally) to be sound. Hence tools like DISCO ensure that single-copy gene trees are produced before downstream species tree building by supertree methods.
BUT
I am unsure whether having the same sequence in multiple single-copy gene trees used as input to Asteroid / supertree methods violates any statistical or other methodological assumptions.
Any guidance on this would be greatly appreciated :)
Thank you! Eric
Hi Eric
Intuitively, I would say that this does not violate any statistical assumption, although I don't think this is a common way of inferring a species tree.
An interesting test would be to also rerun the process for each batch of gene trees (for each phylogenetic level or clustering tool) and see whether you get consistent species trees.
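One way to quantify "consistent" here is the Robinson-Foulds (RF) distance between the species trees inferred from each batch: the number of bipartitions present in one tree but not the other. The sketch below is an illustration, not part of Asteroid. It assumes each species tree is a single unrooted Newick string over the same species set, with unquoted labels; branch lengths and support values are ignored, and the example trees are hypothetical.

```python
def parse_clades(newick: str) -> tuple[frozenset[str], list[frozenset[str]]]:
    """Return (all leaves, leaf set of every internal node) for one Newick tree.

    Assumes unquoted labels without spaces; branch lengths, internal labels
    and support values are skipped.
    """
    s = "".join(newick.split()).rstrip(";")
    clades: list[frozenset[str]] = []
    pos = 0

    def skip_annotation() -> None:
        # Skip an internal label and/or ":branch_length" up to the next delimiter.
        nonlocal pos
        while pos < len(s) and s[pos] not in ",()":
            pos += 1

    def node() -> frozenset[str]:
        nonlocal pos
        if s[pos] != "(":
            # Leaf: read the label up to ":" or a delimiter.
            start = pos
            while pos < len(s) and s[pos] not in ",():":
                pos += 1
            label = s[start:pos]
            skip_annotation()
            return frozenset([label])
        pos += 1  # consume "("
        leaves: set[str] = set()
        while True:
            leaves |= node()
            if s[pos] == ",":
                pos += 1
            else:  # ")" closes this internal node
                pos += 1
                break
        skip_annotation()
        clade = frozenset(leaves)
        clades.append(clade)
        return clade

    return node(), clades


def splits(newick: str) -> tuple[frozenset[str], set[frozenset[str]]]:
    """Non-trivial bipartitions, each stored as the side without a fixed reference leaf."""
    leaves, clades = parse_clades(newick)
    ref = min(leaves)
    result = set()
    for clade in clades:
        side = leaves - clade if ref in clade else clade
        if 2 <= len(side) <= len(leaves) - 2:
            result.add(side)
    return leaves, result


def rf_distance(tree1: str, tree2: str) -> int:
    """Unrooted RF distance: size of the symmetric difference of the split sets."""
    leaves1, splits1 = splits(tree1)
    leaves2, splits2 = splits(tree2)
    if leaves1 != leaves2:
        raise ValueError("trees must contain the same species")
    return len(splits1 ^ splits2)


# Hypothetical species trees from two batches (e.g. two clustering tools).
batch_a = "((A,B),(C,D),E);"
batch_b = "((A,C),(B,D),E);"
print(rf_distance(batch_a, batch_b))  # 4
print(rf_distance(batch_a, "(A,B,((C,D),E));"))  # 0: same unrooted topology
```

An RF distance of 0 means the unrooted topologies are identical; for fully resolved trees on n species the maximum is 2(n - 3). Established libraries such as DendroPy and ETE also compute RF distances and handle quoted labels and other Newick edge cases.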
I hope it helps :-)
Benoit
(In reply to Eric Edsinger's post of Fri, 13 Dec 2024, 22:01.)