Is "Sex" correctly labelled in the Pedigree example and Sex documentation #391

Open
slaurie opened this issue Sep 17, 2024 · 3 comments

slaurie commented Sep 17, 2024

Hi @julesjacobsen

Is Sex correctly labelled in the Pedigree/Sex schemas?

In genetics/genomics, we usually use "1" for male and "2" for female (easy to remember, as it matches the usual number of X chromosomes per sex). However, the documentation and the examples for the Pedigree (https://phenopacket-schema.readthedocs.io/en/latest/pedigree.html) and Sex fields (https://phenopacket-schema.readthedocs.io/en/latest/sex.html#rstsex) indicate that females should be encoded as 1 and males as 2.

Unfortunately the NCIt links on the Sex page didn't work for me, so I can't check whether they do use 1 for female and 2 for male. If they do, then NCIt is inconsistent with what is defined in PLINK, and a choice needs to be made between the two. Personally, I would suggest we stick with PLINK.

This may seem trivial, but it could obviously lead to a lot of miscoding that will be a pain for users to rectify later.

slaurie commented Sep 17, 2024

P.S. If we were to stick with PLINK values, this would also affect the Karyotypic Sex values:
https://phenopacket-schema.readthedocs.io/en/latest/karyotypicsex.html#data-model

@julesjacobsen
Collaborator

Yes, they are correct. Note that the ORDINAL value isn't what should be used in the phenopacket; the value in the NAME field in these docs is what should be used, as shown in the example: https://phenopacket-schema.readthedocs.io/en/latest/sex.html#example

Example code for converting from PED/PLINK to Phenopackets would be:

int pedSex = ...
var phenopacketSex = switch(pedSex) {
    case 1 -> FEMALE;
    case 2 -> MALE;
    case 3 -> OTHER_SEX;
    default -> UNKNOWN_SEX;
};

It's unfortunate that the ordinals don't match the PLINK values but they are different things.

slaurie commented Sep 17, 2024

Thanks for confirming that your encoding is correct. I noticed that the actual terms MALE and FEMALE are used explicitly, which is great.

However, there is still an error in the "Pedigree" documentation (https://phenopacket-schema.readthedocs.io/en/latest/pedigree.html), where it states:

"In a PED file, the sex of individuals is encoded as a “1” for females, “2” for males, and “0” for unknown. Phenopackets uses Sex instead.",

and this is also the encoding you have used in the examples, whereas the PLINK PED documentation (https://zzz.bwh.harvard.edu/plink/data.shtml) has:

"Sex (1=male; 2=female; other=unknown)"

So it might be worth correcting this somehow. I do much prefer the explicit MALE and FEMALE.

Thanks, Steve
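
For comparison, here is a minimal Java sketch of a conversion that follows the PLINK PED wording quoted in this thread (1=male; 2=female; other=unknown). The `PedSexConverter` class, the `fromPlinkPedSex` method, and the local `Sex` enum are hypothetical names used only for illustration. The enum is a stand-in for the Phenopacket Sex type, declared in an order that reproduces the ordinals discussed above (FEMALE = 1, MALE = 2); it is not the project's own conversion code.

```java
public class PedSexConverter {

    // Hypothetical stand-in for the Phenopacket Sex enum. Declaration order
    // is chosen so that ordinal() gives UNKNOWN_SEX=0, FEMALE=1, MALE=2, OTHER_SEX=3.
    enum Sex { UNKNOWN_SEX, FEMALE, MALE, OTHER_SEX }

    // Maps a PLINK PED sex code to a Sex name, following the PLINK wording
    // "Sex (1=male; 2=female; other=unknown)".
    static Sex fromPlinkPedSex(int pedSex) {
        return switch (pedSex) {
            case 1 -> Sex.MALE;
            case 2 -> Sex.FEMALE;
            default -> Sex.UNKNOWN_SEX;
        };
    }

    public static void main(String[] args) {
        // Print each PED code next to the enum name and the enum ordinal,
        // to show that the PED code and the ordinal are separate numbers.
        for (int code : new int[] {0, 1, 2}) {
            Sex sex = fromPlinkPedSex(code);
            System.out.println(code + " -> " + sex + " (ordinal " + sex.ordinal() + ")");
        }
    }
}
```

Converting to the name rather than copying the number keeps the two numbering schemes apart: under this sketch, PED code 1 becomes MALE, whose ordinal in the stand-in enum is 2.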
