-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Get lineage phenotypes #1
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@plsteinberg, nice to see work on this.
Honestly, the scope of what you are doing here is too much for me to provide detailed code reviews, so I am approving this (so you can merge whatever you want) but also making some high-level comments of things you might want to consider changing before you merge. Also tagging @trvrb in case he has input.
- The README seems out of date after your changes. (This is fine if this is just an intermediate pull request, but wanted to note it).
- You can use your judgment how much to keep my code versus just totally replace it. Repo might be clunky if you add without replacing. Don't worry too much about overwriting stuff. We can always go back in
git
history to get older things. - Regarding your question on NTD. Basically, spike has several domains. It can be divided various ways, but typically it is
NTD - RBD - [a bit more of S1 sometimes called SD1] - S2
. The NTD is probably the second most important part evolutionarily (at least for antibody escape) after the RBD. So you do not want to drop the NTD (or any region really). I think it would be reasonable to divide in either of the following ways: (a) RBD and everything else; (b) NTD, RBD, and everything else. The "everything else" category will mostly include S2, but will also include bits of S1. - This is really minor, but a few of your CSVs have a first column that is just the index (empty column with sequential numbers). Write your CSV files using pandas with
.to_csv(<file>, index=False)
to get rid of this. - I think you might eventually want different configs for different variants as best I remember, but I'm not sure.
Thanks for your comments! Good point, I will update the README with the commands and descriptions of the second notebook once I get other people's input on whether this is what we want/need. And I just changed the config to also include the NTD, thanks! |
@plsteinberg and @jbloom --- Is this now ready to merge? Is looks like it might be? |
@plsteinberg is probably better suited than me to answer that question, but I think yes. Although I should not that it looks like Cornelius is no longer maintaining his Pango lineage definition JSON starting as of a few months ago. |
sorry! It does not seem like Marlin ended up using the lineage_phenotypes, so I don't know if it should be merged. I am not sure if that is because we need to be using different data (ie not use the yeast RBD data or use updated data in general) or I need to fix my code. If there is a specific ask let me know. |
Modified run for assigning phenotpes to lineages.
Added:
lineage_phenotypes.ipnyb
(based on SARS2-spike-predictor-phenos.ipnyb)XBB_configs.yaml
(based on config.yaml)lineage_phenotypes.csv
andlineage_phenotypes_randomized.csv
pivot_spike_pseudovirus_DMS_XBB.1.5.csv
Modified:
lineage_phenotypes.ipnyb
only outputs the intermediate files andlineage_phenotypes.csv
and replaced "clade" to "lineage" throughout notebook and config.pivot_spike_pseudovirus_DMS_XBB.1.5.csv
.XBB_configs.yaml
selects for spike RBD (region=RBD) and spike non-RBD (region=S2). Should I keep NDT and other as well?XBB_configs.yaml
only uses XBB as reference. Should we eventually create a seperate<variant>_config.yaml
for BA2.86 and BA.2? In that case, it would make sense to provide the config in the papermill command rather than in the notebook itself.mutation_phenotypes.csv
andmutation_phenotypes_randomized.csv
while runninglineage_phenotypes.ipnyb
. Sorry, I am guessing it would be good to keep the original ones too.Removed from SARS2-spike-predictor-phenos.ipnyb:
Results:
Header of
lineage_phenotypes.csv
Missing from Trevor's list in slack: