Consolidate issues into one dataframe #112

Open
apriha opened this issue Nov 9, 2020 · 0 comments
apriha commented Nov 9, 2020

Building on #107, consolidate several issues (e.g., duplicate_rsid, discrepant_XY) into one dataframe with the following columns / dtypes:

| Column | pandas dtype |
| --- | --- |
| rsid | pd.StringDtype() |
| chrom | pd.CategoricalDtype() |
| pos | pd.UInt32Dtype() |
| genotype | pd.CategoricalDtype() |
| duplicate_rsid | pd.BooleanDtype() |
| discrepant_loci | pd.BooleanDtype() |
| discrepant_XY | pd.BooleanDtype() |
| heterozygous_MT | pd.BooleanDtype() |
| discrepant_vcf_position | pd.BooleanDtype() |
| discrepant_merge_position | pd.BooleanDtype() |
| discrepant_merge_genotype | pd.BooleanDtype() |
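
As a rough sketch of the proposed schema, an empty issues dataframe with these columns and dtypes could be built as follows. The names `ISSUE_COLUMNS` and `empty_issues_frame` are hypothetical and only for illustration; they are not part of the existing API.

```python
import pandas as pd

# Hypothetical constant: the issue flag columns listed in the table above.
ISSUE_COLUMNS = [
    "duplicate_rsid",
    "discrepant_loci",
    "discrepant_XY",
    "heterozygous_MT",
    "discrepant_vcf_position",
    "discrepant_merge_position",
    "discrepant_merge_genotype",
]


def empty_issues_frame() -> pd.DataFrame:
    """Return an empty dataframe with the proposed columns and dtypes.

    Hypothetical helper, shown only to make the schema concrete.
    """
    dtypes = {
        "rsid": pd.StringDtype(),
        "chrom": pd.CategoricalDtype(),
        "pos": pd.UInt32Dtype(),
        "genotype": pd.CategoricalDtype(),
        # Every issue flag uses the nullable boolean dtype.
        **{col: pd.BooleanDtype() for col in ISSUE_COLUMNS},
    }
    return pd.DataFrame(
        {name: pd.Series([], dtype=dtype) for name, dtype in dtypes.items()}
    )


issues_schema = empty_issues_frame()
```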

More than one issue column could be True for the same row, and getting the SNPs with a given issue (e.g., discrepant_XY) could be handled by filtering the issues dataframe on that column.
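
For illustration, filtering might look like the sketch below. The rsids and flag values are made up, and only two of the issue columns are included to keep it short.

```python
import pandas as pd

# Made-up example rows; only two issue columns are shown.
issues = pd.DataFrame(
    {
        "rsid": pd.array(["rs1", "rs2", "rs3"], dtype="string"),
        "discrepant_XY": pd.array([True, False, True], dtype="boolean"),
        "heterozygous_MT": pd.array([False, pd.NA, True], dtype="boolean"),
    }
)

# Nullable booleans can hold pd.NA, so treat a missing flag as "no issue"
# before using the column as a mask.
xy_mask = issues["discrepant_XY"].fillna(False)
mt_mask = issues["heterozygous_MT"].fillna(False)

# SNPs flagged with a single issue.
discrepant_xy = issues[xy_mask]

# SNPs flagged with both issues at once.
both_issues = issues[xy_mask & mt_mask]
```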

rsids could appear more than once in this dataframe. However, if an rsid has two or more rows that are equivalent (same values for chrom, pos, and genotype), their issues should be consolidated into one row, with the issue columns flagging the issue(s).
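
One possible way to do the consolidation is a groupby on the identifying columns, with `any()` applied to the issue flags. This is only a sketch under the assumptions above: `KEY_COLUMNS` and `consolidate_issues` are hypothetical names, the data is made up, and rows with a missing value in a key column would be dropped by the default groupby behavior.

```python
import pandas as pd

# Hypothetical constant: rows are "equivalent" when these columns all match.
KEY_COLUMNS = ["rsid", "chrom", "pos", "genotype"]


def consolidate_issues(issues: pd.DataFrame) -> pd.DataFrame:
    """Collapse equivalent rows, OR-ing their issue flags together.

    Hypothetical helper; every column that is not a key column is treated
    as an issue flag.
    """
    issue_columns = [c for c in issues.columns if c not in KEY_COLUMNS]
    return (
        # observed=True avoids emitting unused category combinations.
        issues.groupby(KEY_COLUMNS, observed=True, sort=False)[issue_columns]
        .any()  # a flag is True if any of the equivalent rows set it
        .reset_index()
    )


# Made-up example: the first two rows are equivalent; the third shares the
# rsid but has a different genotype, so it stays on its own row.
example = pd.DataFrame(
    {
        "rsid": pd.array(["rs1", "rs1", "rs1", "rs2"], dtype="string"),
        "chrom": pd.Categorical(["1", "1", "1", "X"]),
        "pos": pd.array([100, 100, 100, 200], dtype="UInt32"),
        "genotype": pd.Categorical(["AA", "AA", "AG", "CC"]),
        "duplicate_rsid": pd.array([True, False, True, False], dtype="boolean"),
        "discrepant_loci": pd.array([False, True, False, True], dtype="boolean"),
    }
)

consolidated = consolidate_issues(example)
```

With this approach an rsid can still appear on more than one row when its chrom, pos, or genotype differ, which matches the behavior described above.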

apriha added this to the 3.1.0 milestone Nov 9, 2020