Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding new package for Early Contact paper by Penske et al. 2023. #146

Merged
merged 5 commits into from
Nov 24, 2023

Conversation

AyGhal
Copy link
Contributor

@AyGhal AyGhal commented Nov 15, 2023

PR Checklist for a new package submission

  • The package does not exist already in the community archive, also not with a different name.
  • The package title in the POSEIDON.yml conforms to the general title structure suggested here: <Year>_<Last name of first author>_<Region, time period or special feature of the paper>, e.g. 2021_Zegarac_SoutheasternEurope, 2021_SeguinOrlando_BellBeaker or 2021_Kivisild_MedievalEstonia.
  • The package is stored in a directory that is named like the package title.

  • The package is complete and features the following elements:
    • Genotype data in binary PLINK format (not EIGENSTRAT format).
    • A POSEIDON.yml file with not just the file-referencing fields, but also the following meta-information fields present and filled: poseidonVersion, title, description, contributor, packageVersion, lastModified (see here for their definition)
    • A reasonably filled .janno file (for a list of available fields look here and here for more detailed documentation about them).
    • A .bib file with the necessary literature references for each sample in the .janno file.
  • Every file in the submission is correctly referenced in the POSEIDON.yml file and there are no additional, supplementary files in the submission that are not documented there.
  • Genotype data, .janno and .bib file are all named after the package title and only differ in the file extension.
  • The package version in the POSEIDON.yml file is 1.0.0.
  • The poseidonVersion of the package in the POSEIDON.yml file is set to the latest version of the Poseidon schema.
  • The POSEIDON.yml file contains the corresponding checksums for the fields genoFile, snpFile, indFile, jannoFile and bibFile.
  • There is either no CHANGELOG file or one with a single entry for version 1.0.0.

  • The Publication column in the .janno file is filled and the respective .bib file has complete entries for the listed mentioned keys.
  • The .janno file does not include any empty columns or columns only filled with n/a.
  • The order of columns in the .janno file adheres to the standard order as defined in the Poseidon schema here.

  • The package passes a validation with trident validate --fullGeno.

  • Large genotype data files are properly tracked with Git LFS and not directly pushed to the repository. For an instruction on how to set up Git LFS please look here. If you accidentally pushed the files the wrong way you can fix it with git lfs migrate import --no-rewrite path/to/file.bed (see here).

@AyGhal AyGhal requested review from stschiff and nevrome November 15, 2023 13:19
Copy link
Member

@stschiff stschiff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! Seems all good to me.

@nevrome
Copy link
Member

nevrome commented Nov 21, 2023

Indeed cool! I love how well this package follows the standard. Some minor observations:

  • There is a typo in the Relation_Type column for of the .janno file for YUN012: "Dauther" -> "Daughter"
  • Another two typos in the Source_Tissue column for YUN007 and PIE034: "tooh" -> "tooth"

Nothing to fix beyond that. Just some things that slightly confused me:

  • The failed radiocarbon dates might be an issue. Don't get me wrong: The way you marked them here makes a lot of sense to me. But that means, that having an entry in Date_Type is generally not sufficient to conclude that we actually have age information for a sample! That's an oversight in the schema and the code, I fear. We may have to fix this eventually and thus render this package invalid. And this, in turn, is a big issue for the infrastructure (see Breaking changes and maintaining reproducibility poseidon-hs#276). We should discuss this, @stschiff.
  • What does this monstrosity refer to in the Contamination_Meas column: ANGSD_Xchrom_contamination_males_MoM_Method2_new_llh? Is that clear to ANGSD users?
  • A number of relationships point to individuals that are not in this package. Are these somewhere else in the community-archive? E.g. PIE009. And then there are identical individuals that are already merged in the package. Are they published somewhere else? If not, and if they only exist in Sandra's data heap, then maybe we don't have to list them at all?

@AyGhal
Copy link
Contributor Author

AyGhal commented Nov 21, 2023

Thanks for checking!

  • Typos can be easily fixed!

  • for Contamination_Meas: MoM_Method2_new_llh that is one of the outputs of ANGSD. Maybe we can reduce it to ANGSD_MoM_Method2_new ? because there are results for method 1 & 2 and each has new and old version.

  • PIE009 is low coverage, so that one was not included in genotype files Sandra shared with me. But I don't know if they uploaded the bam file to ENA or not.

@nevrome
Copy link
Member

nevrome commented Nov 22, 2023

I see - I think it's good to keep PIE099 in the table then.

And what do the MoM and the llh in MoM_Method2_new_llh mean? Maybe I should check myself instead of bugging you. Just curious, at this point. It's actually neat that ANGSD has a way to describe the exact method used. A slightly convoluted one, though.

@AyGhal
Copy link
Contributor Author

AyGhal commented Nov 22, 2023

Honestly, I also don't know exactly. But there is this page about it. http://www.popgen.dk/angsd/index.php/Contamination
But might have to check the supplement of Rasmussen 2011.

@AyGhal
Copy link
Contributor Author

AyGhal commented Nov 24, 2023

fixed the issues discussed in the meeting.

@nevrome nevrome merged commit 45f9198 into poseidon-framework:master Nov 24, 2023
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants