-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Arranging individuals in a poseidon package #300
Comments
If I'm not mistaken this is not the case. The order in forgeScript does not translate to the output order. It's a selection language only. The sample order in the We discussed the idea to change this behaviour in the past in #230, but it would be a big change. @stschiff wrote back then:
Back then we decided to postpone this. Now regarding a pure |
That would cover my use case perfectly, and then some! |
I completely agree that we need this. And I think it should be an option for |
i think a subcommand that rearranged the individuals while adding a changelog entry and incrementing the patch version also makes sense. afaik forge always creates a new package, potentially losing any info that was in the YAML file, right? |
Good point. I mean, the functionality is the bit that we need to look into. The API can be discussed. I can also imagine both. If we have a |
Brilliant, @stschiff! Regarding the bigger workflow of rearranging the individuals in a package according to some property I would like to point out that qjanno "SELECT '<'||Poseidon_ID||'>' FROM d(2018_Lamnidis_Fennoscandia) ORDER BY Poseidon_ID DESC" --raw --noOutHeader Maybe the recommended workflow to reorder a package should indeed be as follows:
An in-place reordering of the package where data does not get copied is not a good idea, @TCLamnidis. This can easily lead to corrupted packages and data loss. So using forge for this is overall a good idea, I think. The only issue not covered by this workflow is the loss of the .yml file information. For this I wonder if we should add yet another flag to forge. It could be called |
yes, I was thinking along the same lines. I was even wondering whether we should make it a default that forge always copies the old YAML, README and CHANGELOG as long as there is only a single source package? Wouldn't that make sense and be the expected behaviour? We could even output a log-info "All output comes from a single source package, copying README, CHANGELOG and YAML" or so? |
I would prefer a specific flag. Otherwise adding or removing a single individual from a selection could lead to a very different output. That is confusing - especially for automatic processing. |
OK, I agree. Also to not make this a breaking change, a flag would be good. There are two basic options: 1.) A flag 2.) An option I would vote for option 1. |
Implemented and merged 1.) in #304 |
A feature in trident where it can arrange the individuals in a poseidon package by the Poseidon_ID or Group_ID would be nice to have.
Practically, the operation is somewhat superfluous, as most tools do not care about the order of individuals on input. However, being able to rearrange the janno file in some basic ways without breaking the package can be very helpful for basic categorisation of information. Currently, changing the order of individuals in the janno is prohibited without also changing the order in the fam/ind file, which in turn means also changing the order in the bed/geno file.
This is a problem when dealing with reviewing PRs that add or update packages where the information is not in an intuitive order. Minotaur suffers from this problem particularly, because nextflow's asynchronous nature means that the order of individuals in the package will be different with each rerun of a package, and will always be random unless steps are taken to reorder janno/ind/geno files during package creation.
The minotaur-specific issue of order when rerunning packages is somewhat mitigated by
jannocoalesce
, but reordering of individuals is a data manipulation that might be useful for many users, and it makes sense to do within trident since it requires fiddling with the actual genotype data as well as the janno.I assume this is technically already possible through some complicated
forge
command, where a user specifies all the poseidon_IDs in<>
and in the desired order, but having an option or subcommand to sort the package without needing to explicitly state the order would be a nice quality of life feature imo.The text was updated successfully, but these errors were encountered: