Add package for Allentoft et al. 2024 #202
[Automatic PR] Update chronicle file
Thanks a lot, @smpeltola, for preparing this. Where did you get the genotype data from? Did you process the raw data yourself?
@stschiff, the genotypes come directly from the VCF files that Allentoft et al. provided here: https://erda.ku.dk/archives/917f1ac64148c3800ab7baa29402d088/published-archive.html. I pulled down the 1240k positions, set low-quality imputed genotypes (maxGP < 0.99) to missing in the imputed data, and converted everything to PLINK.
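The masking step described above (imputed genotypes with maxGP < 0.99 set to missing) can be illustrated with a minimal sketch. This is not the command sequence actually used to build the package. The function name `mask_low_confidence`, the use of `None` as the missing marker, and the toy values are all assumptions made for illustration.

```python
# Hypothetical illustration: mask imputed genotype calls whose highest
# genotype posterior probability (max GP) falls below a threshold.
MISSING = None  # assumed missing-value marker for this sketch


def mask_low_confidence(calls, posteriors, threshold=0.99):
    """Return a copy of `calls` with low-confidence entries set to MISSING.

    calls:      genotype calls for one sample (0, 1 or 2 alternate alleles)
    posteriors: one (P(0), P(1), P(2)) tuple per call
    """
    masked = []
    for call, gp in zip(calls, posteriors):
        # Keep the call only if its best posterior reaches the threshold.
        masked.append(call if max(gp) >= threshold else MISSING)
    return masked


# Toy data: the second call has max GP 0.65 and is therefore masked.
calls = [0, 1, 2]
posteriors = [(0.995, 0.004, 0.001), (0.30, 0.65, 0.05), (0.0, 0.005, 0.995)]
print(mask_low_confidence(calls, posteriors))  # [0, None, 2]
```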
OK, brilliant. I'll take a look soon.
Sorry for taking so long, @smpeltola. I think there is an amazing amount of work in this package, and I don't want to stand in the way of having it merged in quickly. A quick question to the team: This package does something we haven't seen before, I think, in that it provides both the imputed and the non-imputed genotypes within one package. @smpeltola has painstakingly marked all duplicate pairs as "identical", so in principle I think the information to properly analyze this dataset is there. The Poseidon_IDs reflect the processing, for example "NEO100" vs. "NEO100_imputed". I think this isn't exactly how we envisioned this, but it technically doesn't break our schema. Any thoughts from the team? If not, I suggest @AyGhal can merge this in; I couldn't see any obvious problem with the package. Thanks again, @smpeltola, for this work!
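For anyone analysing the package, the naming convention mentioned above ("NEO100" vs. "NEO100_imputed") makes it straightforward to pair each sample with its imputed counterpart. The following is a small sketch under that assumption only. The helper `pair_imputed` is hypothetical, and "NEO101" is a made-up ID used to show an unpaired sample.

```python
# Hypothetical helper: pair each Poseidon_ID with its "_imputed" twin,
# assuming the suffix convention described in the comment above.
SUFFIX = "_imputed"


def pair_imputed(ids, suffix=SUFFIX):
    """Map each non-imputed ID to its imputed counterpart, if present."""
    id_set = set(ids)
    pairs = {}
    for pid in ids:
        if pid.endswith(suffix):
            continue  # imputed entries are looked up, not used as keys
        twin = pid + suffix
        if twin in id_set:
            pairs[pid] = twin
    return pairs


# "NEO101" is an invented ID without an imputed twin.
print(pair_imputed(["NEO100", "NEO100_imputed", "NEO101"]))
# {'NEO100': 'NEO100_imputed'}
```

Such a mapping could be used to keep only one member of each pair before running an analysis that assumes independent samples.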
The package looks good to me, thanks @smpeltola!
Thanks, @AyGhal. Is filling the
Okay. We'll add that later.
I was just investigating why our mastodon action did not trigger when this PR was merged. It took me a while to realize that @smpeltola manually updated the chronicle file in this PR and therefore preempted the automated processes. You're impressively thorough 😮! It's not necessary to do this manually, because we have a GitHub action, which also triggers other downstream automation. But props to you for the attention to detail! @stschiff & @AyGhal: Let's keep an eye out for this in future reviews. I never thought somebody would notice the
Sorry about that, @nevrome! I saw a notification from the chronicle file and just clicked buttons until it went away 😅 Maybe you can add a note in the submission instructions that those can be ignored. @stschiff, I didn't fill the Nr_SNPs because, for some reason, I thought that trident rectify already had the functionality to retrieve them from the genotype data. I can fill them in if that's needed!
All good, @smpeltola, I think we were just impressed that you went to the trouble of updating the chronicle file manually. And indeed, we should definitely have a feature to fill in the number of SNPs. I will put this on the to-do list. In the meantime, whenever you find a minute to do it, a Pull Request to update the number of SNPs would be welcome!
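Until such a feature exists, the number of covered SNPs per individual has to be derived from the genotype data by hand. The sketch below only shows the counting logic. It assumes the genotypes have already been decoded into per-individual lists with `None` for missing calls, and it does not read PLINK files. The function name `count_snps` and the toy values are assumptions.

```python
# Hypothetical sketch: count non-missing genotype calls per individual,
# i.e. the kind of value one would enter as Nr_SNPs.
def count_snps(genotypes_by_individual):
    """Return {individual: number of non-missing genotype calls}."""
    return {
        individual: sum(call is not None for call in calls)
        for individual, calls in genotypes_by_individual.items()
    }


# Toy data: None marks a missing call; a call of 0 still counts as covered.
toy = {
    "NEO100": [0, None, 2, None],
    "NEO100_imputed": [0, 1, 2, None],
}
print(count_snps(toy))  # {'NEO100': 2, 'NEO100_imputed': 3}
```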
PR Checklist for a new package submission
POSEIDON.yml conforms to the general title structure suggested here: <Year>_<Last name of first author>_<Region, time period or special feature of the paper>, e.g. 2021_Zegarac_SoutheasternEurope, 2021_SeguinOrlando_BellBeaker or 2021_Kivisild_MedievalEstonia.
POSEIDON.yml file with not just the file-referencing fields, but also the following meta-information fields present and filled: poseidonVersion, title, description, contributor, packageVersion, lastModified (see here for their definition).
.janno file (for a list of available fields look here and here for more detailed documentation about them).
.bib file with the necessary literature references for each sample in the .janno file.
POSEIDON.yml file and there are no additional, supplementary files in the submission that are not documented there.
.janno and .bib file are all named after the package title and only differ in the file extension.
POSEIDON.yml file is 1.0.0.
poseidonVersion of the package in the POSEIDON.yml file is set to the latest version of the Poseidon schema.
POSEIDON.yml file contains the corresponding checksums for the fields genoFile, snpFile, indFile, jannoFile and bibFile.
CHANGELOG file or one with a single entry for version 1.0.0.
Publication column in the .janno file is filled and the respective .bib file has complete entries for the listed mentioned keys.
.janno file does not include any empty columns or columns only filled with n/a.
.janno file adheres to the standard order as defined in the Poseidon schema here.
.janno and the .ssf files are not fully quoted, so they only use single- or double quotes ("...", '...') to enclose text fields where it is strictly necessary (i.e. their entry includes a TAB).
trident validate --fullGeno.
git lfs migrate import --no-rewrite path/to/file.bed (see here).
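
Two of the checklist points (the title structure and the rule against empty or n/a-only .janno columns) lend themselves to a quick local pre-check. The sketch below is only an approximation and no substitute for trident validate. The regular expression is a guess derived from the three example titles rather than an official rule, and the function names and the sample .janno content (including the "AllentoftNature2024" key) are invented for illustration.

```python
import csv
import io
import re

# Assumed pattern inferred from the example titles:
# <Year>_<Last name of first author>_<Region, time period or feature>
TITLE_PATTERN = re.compile(r"^\d{4}_[A-Za-z]+_[A-Za-z0-9]+$")


def title_looks_valid(title):
    """Check a package title against the assumed three-part pattern."""
    return TITLE_PATTERN.fullmatch(title) is not None


def uninformative_columns(janno_text):
    """List columns of a tab-separated table that are empty or n/a in every row.

    Note: for a table without any data rows, every column is reported.
    """
    reader = csv.DictReader(io.StringIO(janno_text), delimiter="\t")
    rows = list(reader)
    return [
        column
        for column in (reader.fieldnames or [])
        if all((row[column] or "").strip() in ("", "n/a") for row in rows)
    ]


# Invented example content; the publication key is a placeholder.
janno_text = (
    "Poseidon_ID\tPublication\tNote\n"
    "NEO100\tAllentoftNature2024\tn/a\n"
    "NEO100_imputed\tAllentoftNature2024\t\n"
)
print(title_looks_valid("2021_Zegarac_SoutheasternEurope"))  # True
print(uninformative_columns(janno_text))  # ['Note']
```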