Get lineage phenotypes #1

plsteinberg · 2024-09-20T22:41:21Z

Modified run for assigning phenotypes to lineages.

papermill -p config_yaml XBB_config.yaml lineage_phenotypes.ipynb results/lineage_phenotypes.ipynb

Added:

lineage_phenotypes.ipynb (based on SARS2-spike-predictor-phenos.ipynb)
XBB_configs.yaml (based on config.yaml)
lineage_phenotypes.csv and lineage_phenotypes_randomized.csv
pivot_spike_pseudovirus_DMS_XBB.1.5.csv

Modified:

lineage_phenotypes.ipynb only outputs the intermediate files and lineage_phenotypes.csv, and "clade" is replaced with "lineage" throughout the notebook and config.
Split the spike pseudovirus DMS input data by region (RBD, S2, NTD, other) and saved the intermediate file pivot_spike_pseudovirus_DMS_XBB.1.5.csv.
XBB_configs.yaml selects spike RBD (region=RBD) and spike non-RBD (region=S2). Should I keep NTD and other as well?
XBB_configs.yaml only uses XBB as the reference. Should we eventually create a separate <variant>_config.yaml for BA.2.86 and BA.2? In that case, it would make sense to provide the config in the papermill command rather than in the notebook itself.
Overwrote mutation_phenotypes.csv and mutation_phenotypes_randomized.csv while running lineage_phenotypes.ipynb. Sorry, I am guessing it would be good to keep the original ones too.
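
On the per-variant config question, here is a minimal sketch of how the papermill call could be generated for each variant. It assumes the `<variant>_config.yaml` naming suggested above; the `papermill_command` helper and the per-variant output notebook names are hypothetical and not part of this PR.

```python
import shlex


def papermill_command(variant, notebook="lineage_phenotypes.ipynb", outdir="results"):
    """Build the papermill argument list for one variant's config.

    The config and output naming below are assumptions for illustration.
    """
    config = f"{variant}_config.yaml"          # assumed naming convention
    output = f"{outdir}/{variant}_{notebook}"  # hypothetical output name
    return ["papermill", "-p", "config_yaml", config, notebook, output]


if __name__ == "__main__":
    # Print one command per variant; nothing is executed here.
    for variant in ["XBB", "BA.2.86", "BA.2"]:
        print(shlex.join(papermill_command(variant)))
```

Keeping the config as a papermill parameter means the notebook itself stays unchanged across variants, and each run writes its own executed copy.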

Removed from SARS2-spike-predictor-phenos.ipynb:

Growth data step
Summary plot step
Removed code to avoid ValueError: some clades have growth data but are not defined {'XDV.1'} (see corneliusroemer/pango-sequences#9).

Results:

Header of lineage_phenotypes.csv

'lineage', 'date', 'parent', 'spike muts from Wuhan-Hu-1',
     'number spike muts from Wuhan-Hu-1', 'spike muts from XBB.1.5',
     'descendant of XBB',
     'spike pseudovirus DMS RBD human sera escape relative to XBB.1.5',
     'spike pseudovirus DMS RBD ACE2 binding relative to XBB.1.5',
     'spike pseudovirus DMS RBD spike mediated entry relative to XBB.1.5',
     'spike pseudovirus DMS S2 human sera escape relative to XBB.1.5',
     'spike pseudovirus DMS S2 ACE2 binding relative to XBB.1.5',
     'spike pseudovirus DMS S2 spike mediated entry relative to XBB.1.5',
     'RBD yeast-display DMS ACE2 affinity relative to XBB.1.5',
     'RBD yeast-display DMS RBD expression relative to XBB.1.5',
     'RBD yeast-display DMS escape relative to XBB.1.5',
     'EVEscape relative to XBB.1.5', 'Hamming distance relative to XBB.1.5'

Missing from Trevor's list in Slack:

number of ORF1ab mutations
number of accessory protein mutations

jbloom

@plsteinberg, nice to see work on this.

Honestly, the scope of what you are doing here is too much for me to provide detailed code reviews, so I am approving this (so you can merge whatever you want) but also making some high-level comments of things you might want to consider changing before you merge. Also tagging @trvrb in case he has input.

The README seems out of date after your changes. (This is fine if this is just an intermediate pull request, but wanted to note it).
You can use your judgment how much to keep my code versus just totally replace it. Repo might be clunky if you add without replacing. Don't worry too much about overwriting stuff. We can always go back in git history to get older things.
Regarding your question on NTD. Basically, spike has several domains. It can be divided various ways, but typically it is NTD - RBD - [a bit more of S1 sometimes called SD1] - S2. The NTD is probably the second most important part evolutionarily (at least for antibody escape) after the RBD. So you do not want to drop the NTD (or any region really). I think it would be reasonable to divide in either of the following ways: (a) RBD and everything else; (b) NTD, RBD, and everything else. The "everything else" category will mostly include S2, but will also include bits of S1.
This is really minor, but a few of your CSVs have a first column that is just the index (empty column with sequential numbers). Write your CSV files using pandas with .to_csv(<file>, index=False) to get rid of this.
I think you might eventually want different configs for different variants as best I remember, but I'm not sure.
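
The two ways of dividing spike described above could be sketched roughly as follows. The residue ranges are approximate, illustrative assumptions (they are not the boundaries used in this PR), and the function names are hypothetical.

```python
# Approximate spike domain boundaries in Wuhan-Hu-1 residue numbering.
# These ranges are assumptions for this sketch, not values taken from the PR.
NTD_SITES = range(14, 306)   # roughly residues 14-305
RBD_SITES = range(331, 532)  # roughly residues 331-531


def region_two_way(site):
    """Option (a): RBD versus everything else."""
    return "RBD" if site in RBD_SITES else "non-RBD"


def region_three_way(site):
    """Option (b): NTD, RBD, and everything else (mostly S2, plus bits of S1)."""
    if site in NTD_SITES:
        return "NTD"
    if site in RBD_SITES:
        return "RBD"
    return "other"


if __name__ == "__main__":
    for site in (144, 484, 950):
        print(site, region_two_way(site), region_three_way(site))
```

Either function could be applied to the site column of the DMS data before pivoting, so that no region is dropped.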

plsteinberg · 2024-09-23T18:59:24Z

Thanks for your comments!

Good point, I will update the README with the commands and descriptions of the second notebook once I get other people's input on whether this is what we want/need.

And I just changed the config to also include the NTD, thanks!

trvrb · 2024-12-23T23:53:34Z

@plsteinberg and @jbloom --- Is this now ready to merge? It looks like it might be?

jbloom · 2025-01-02T04:13:59Z

@plsteinberg is probably better suited than me to answer that question, but I think yes. Although I should note that it looks like Cornelius stopped maintaining his Pango lineage definition JSON a few months ago.

plsteinberg · 2025-01-10T18:25:25Z

Sorry! It does not seem like Marlin ended up using the lineage_phenotypes, so I don't know if it should be merged. I am not sure if that is because we need to be using different data (i.e., not use the yeast RBD data, or use updated data in general) or because I need to fix my code. If there is a specific ask, let me know.

philippa-steinberg added 4 commits September 20, 2024 15:13

Modify SARS2-spike-predictor-phenos, rename clade to lineage

f732519

Modify config.yaml for lineage_phenotypes.ipnyb

db49d15

Intermediate pivot df by region

d1e0194

Results from lineage_phenotypes.ipnyb

497b7d4

jbloom approved these changes Sep 21, 2024

Include NTD in phenotypes

f1a7803

philippa-steinberg added 3 commits October 1, 2024 13:33

Split RBD and Non-RBD

b873472

Add ORF1a, accessory, mut counts

04bf002

Change domain range

f4f722e

marlinfiggins mentioned this pull request Dec 19, 2024

First pass at making snakemake workflow for innovation model blab/ncov-escape#11

Merged

6 tasks
