Add package for Allentoft at al. 2024 #202

smpeltola · 2024-08-15T13:03:18Z

PR Checklist for a new package submission

The package does not exist already in the community archive, also not with a different name.
The package title in the POSEIDON.yml conforms to the general title structure suggested here: <Year>_<Last name of first author>_<Region, time period or special feature of the paper>, e.g. 2021_Zegarac_SoutheasternEurope, 2021_SeguinOrlando_BellBeaker or 2021_Kivisild_MedievalEstonia.
The package is stored in a directory that is named like the package title.

The Publication column in the .janno file is filled and the respective .bib file has complete entries for all listed keys.
The .janno file does not include any empty columns or columns only filled with n/a.
The order of columns in the .janno file adheres to the standard order as defined in the Poseidon schema here.
The .janno and the .ssf files are not fully quoted, so they only use single- or double quotes ("...", '...') to enclose text fields where it is strictly necessary (i.e. their entry includes a TAB).

The package passes a validation with trident validate --fullGeno.

Large genotype data files are properly tracked with Git LFS and not directly pushed to the repository. For instructions on how to set up Git LFS please look here. If you accidentally pushed the files the wrong way you can fix it with git lfs migrate import --no-rewrite path/to/file.bed (see here).

[Automatic PR] Update chronicle file

stschiff · 2024-08-19T08:48:47Z

Thanks a lot, @smpeltola, for preparing this. Where did you get the genotype data from? Did you process the raw data yourself?

smpeltola · 2024-08-19T10:33:48Z

@stschiff, the genotypes are taken directly from the VCF files Allentoft et al. provided here: https://erda.ku.dk/archives/917f1ac64148c3800ab7baa29402d088/published-archive.html. I pulled down the 1240k positions, set low-quality imputed genotypes (maxGP < 0.99) to missing in the imputed data, and converted everything to PLINK format.
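The maxGP filter described above can be illustrated with a minimal Python sketch. It is not the pipeline actually used for this package; it only assumes that each imputed genotype comes with three genotype probabilities (hom-ref, het, hom-alt), as in a VCF GP field. The function name `call_from_gp` and the use of `None` for a missing call are hypothetical choices made for this example.

```python
from typing import Optional, Sequence

MISSING = None  # hypothetical placeholder for a missing genotype call


def call_from_gp(gp: Sequence[float], min_max_gp: float = 0.99) -> Optional[int]:
    """Return the most likely genotype (0, 1 or 2 alternative alleles).

    The call is set to MISSING when the largest genotype probability
    is below `min_max_gp`, mirroring the "maxGP < 0.99" rule.
    """
    if len(gp) != 3:
        raise ValueError("expected three genotype probabilities")
    if max(gp) < min_max_gp:
        return MISSING
    # Index of the largest probability is the number of alternative alleles.
    return max(range(3), key=lambda i: gp[i])
```

A confident genotype such as `[0.995, 0.004, 0.001]` is kept as `0`, while an uncertain one such as `[0.5, 0.4, 0.1]` becomes missing.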

stschiff · 2024-09-02T08:31:52Z

OK, brilliant. I'll take a look soon.

stschiff · 2024-10-09T07:54:57Z

Sorry for taking so long, @smpeltola. I think there is an amazing amount of work in this package, and I don't want to stand in the way of having it merged quickly. A quick question to the team:

This package does something we haven't seen before, I think, in that it provides both the imputed and the non-imputed genotypes within one package. @smpeltola has painstakingly marked all duplicate pairs as "identical", so in principle I think the information to properly analyze this dataset is there. The Poseidon_IDs reflect the processing, for example "NEO100" vs. "NEO100_imputed". I think this isn't exactly how we envisioned this, but it technically doesn't break our schema.
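To make the naming convention concrete, here is a small Python sketch that pairs each non-imputed Poseidon_ID with its imputed counterpart. It assumes the `_imputed` suffix seen in the example IDs is used consistently throughout the package; the function name `pair_imputed_ids` is hypothetical and not part of any Poseidon tool.

```python
from typing import Iterable, List, Tuple

IMPUTED_SUFFIX = "_imputed"  # assumed to be used consistently in the package


def pair_imputed_ids(poseidon_ids: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (non_imputed_id, imputed_id) pairs found in `poseidon_ids`."""
    id_set = set(poseidon_ids)
    pairs = []
    for pid in sorted(id_set):
        if pid.endswith(IMPUTED_SUFFIX):
            base = pid[: -len(IMPUTED_SUFFIX)]
            # Only report a pair when the non-imputed sample is present too.
            if base in id_set:
                pairs.append((base, pid))
    return pairs
```

An analysis that should use only one version of each individual could drop either the first or the second member of every pair returned here.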

Any thoughts from the team? If not, I suggest @AyGhal merges this in; I couldn't see any obvious problem with the package. Thanks again, @smpeltola, for this work!

AyGhal · 2024-10-09T15:53:36Z

The package looks good to me, thanks @smpeltola!
As far as I understood, both imputed and non-imputed data are 1240k pull-downs. I agree that this does not break our schema. The only thing I can request/recommend is to fill the Nr_SNPs column.
What do you all think?

stschiff · 2024-10-11T10:13:03Z

Thanks, @AyGhal. Is filling the Nr_SNPs column easy, @smpeltola, in the sense that it can be done within an hour or so? If not, I suggest we merge this in as is and try to fill it later. I was planning to make that a feature of trident rectify at some point anyway, since it is directly retrievable from the genotype data.
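As a rough illustration of why Nr_SNPs is retrievable from the genotype data, the following Python sketch counts the non-missing calls per individual. It assumes the genotypes have already been loaded into memory as one row per SNP and one column per individual, with `None` marking a missing call; it does not read PLINK files and is not how trident works internally. The function name is hypothetical.

```python
from typing import List, Optional, Sequence


def count_snps_per_individual(
    genotypes: Sequence[Sequence[Optional[int]]],
) -> List[int]:
    """Count non-missing genotype calls for each individual (column)."""
    if not genotypes:
        return []
    n_individuals = len(genotypes[0])
    counts = [0] * n_individuals
    for row in genotypes:
        if len(row) != n_individuals:
            raise ValueError("all SNP rows must have the same length")
        for i, call in enumerate(row):
            if call is not None:
                counts[i] += 1
    return counts
```

Each resulting count would be the Nr_SNPs value for the individual in the corresponding column.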

AyGhal · 2024-10-11T13:08:46Z

Okay. We'll add that later.

nevrome · 2024-10-23T19:31:27Z

I was just investigating why our Mastodon action did not trigger when this PR was merged. It took me a while to realize that @smpeltola manually updated the chronicle file in this PR and therefore preempted the automated processes. You're impressively thorough 😮! It's not necessary to do this manually, because we have a GitHub action, which also triggers other downstream automation. But props to you for the attention to detail!

@stschiff & @AyGhal: Let's keep an eye out for this in future reviews. I never thought somebody would notice the archive.chron file and manually update it.

smpeltola · 2024-10-25T18:23:36Z

Sorry about that, @nevrome! I saw a notification from the chronicle file and just clicked buttons until it went away 😅 Maybe you can add a note in the submission instructions that those can be ignored.

@stschiff I didn't fill in Nr_SNPs because, for some reason, I thought that trident rectify already had the functionality to retrieve them from the genotype data. I can fill them in if that's needed!

stschiff · 2024-10-28T08:59:26Z

All good, @smpeltola, I think we were just impressed that you went to the trouble of updating the chronicle file manually. And indeed, we should definitely have a feature to fill in the number of SNPs. I will put this on the to-do list. In the meantime, whenever you find a minute to do it, a pull request to update the number of SNPs would be welcome!

Sanni Peltola and others added 3 commits August 15, 2024 14:29

Genotypes from Allentoft et al. 2024

6aa530c

Update of chronicle file

65c1fef

Merge pull request #1 from smpeltola/chronicleUpdates

7c6fc68

[Automatic PR] Update chronicle file

nevrome changed the title ~~Add Allentoft at al. 2024~~ Add package for Allentoft at al. 2024 Sep 6, 2024

AyGhal merged commit a8d321b into poseidon-framework:master Oct 11, 2024
1 check passed
