Relationship between Example .gff and .matrix files? #3

Open
cizydorczyk opened this issue Feb 29, 2020 · 0 comments

cizydorczyk commented Feb 29, 2020

Hello,

Let me start off by saying the tool looks great! I've had a major problem dealing with incomplete/mis-annotated/truncated genes in pangenome analyses, and no other tools have been designed to deal with such issues -- I think this is a great advantage of PEPPA!

Before trying the tool on my own dataset, I ran the provided example dataset. It runs just fine, but perhaps I am misunderstanding something about the output. The .gff file produced by PEPPA.py contains thousands of entries, but the .matrix file produced by PEPPA_parser.py contains only a little over 200 genes, and many ortholog groups noted in the .gff file are absent from the .matrix file.

Is this because the example is a reduced sample dataset designed to run quickly? The pangenome is reported as 223 genes, with a core genome of 31 genes and an average of 88 genes per genome. In a full analysis, all genes identified in the .gff would be included in the .matrix file, would they not, provided they pass pseudogene filtering, etc.?
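
For concreteness, the comparison between the two files could be sketched roughly as below. This is only an illustration under assumptions, not a description of PEPPA's actual output format: it assumes the .gff follows the standard GFF3 layout (nine tab-separated columns, `key=value;` attributes in column 9), that the ortholog group is stored under an attribute key you pass in yourself (the `group` key in the toy data is hypothetical), and that the .matrix file is tab-delimited with the gene label in the first column and header lines starting with `#`.

```python
import csv


def gff_attribute_values(gff_lines, key):
    """Collect the distinct values of one column-9 attribute from GFF3 lines."""
    values = set()
    for line in gff_lines:
        # Skip blank lines and GFF directives/comments.
        if not line.strip() or line.startswith("#"):
            continue
        columns = line.rstrip("\n").split("\t")
        if len(columns) < 9:
            continue
        # Column 9 holds semicolon-separated key=value pairs in GFF3.
        for pair in columns[8].split(";"):
            name, _, value = pair.partition("=")
            if name.strip() == key and value:
                values.add(value.strip())
    return values


def matrix_row_labels(matrix_lines):
    """Collect first-column labels from a tab-delimited matrix.

    Assumption: header/comment lines start with '#'.
    """
    labels = set()
    for row in csv.reader(matrix_lines, delimiter="\t"):
        if row and not row[0].startswith("#"):
            labels.add(row[0])
    return labels


if __name__ == "__main__":
    # Made-up toy data; "group" is a hypothetical attribute key.
    toy_gff = [
        "##gff-version 3\n",
        "c1\ttoy\tCDS\t1\t90\t.\t+\t.\tID=a;group=g1\n",
        "c1\ttoy\tCDS\t100\t190\t.\t+\t.\tID=b;group=g2\n",
    ]
    toy_matrix = ["#gene\ts1\ts2\n", "g1\t1\t1\n"]
    in_gff = gff_attribute_values(toy_gff, "group")
    in_matrix = matrix_row_labels(toy_matrix)
    print("groups in GFF but not in matrix:", sorted(in_gff - in_matrix))
```

On real files, the two functions can be given open file handles instead of lists, with the attribute key replaced by whatever the .gff actually uses.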

Thank you,
Conrad
