Hello, and thank you for this amazing tool!
I recently fine-tuned scGPT using perturb-seq data to tackle a perturbation prediction task. Specifically, I aimed to predict gene expression levels for each perturbation condition where a single gene was perturbed at a time.
Here are the key details:
- Fine-tuning process: I fine-tuned scGPT on the perturb-seq data with standard hyperparameters, following the recommended pipeline.
- Prediction results: After fine-tuning, I generated predictions for all perturbation conditions. However, the squared Pearson correlations (R²) between the predicted gene expression profiles of different perturbations are consistently around 0.99, suggesting the model produces highly similar predictions regardless of the perturbation (see the sketch below).
This high similarity in predictions was unexpected, as I anticipated more variation in the predicted expression profiles for different perturbations.
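For concreteness, here is a minimal sketch of the kind of check behind those numbers (not my exact code; the file path and array layout are placeholders):

```python
import numpy as np

# pred: (n_conditions, n_genes) array of predicted mean expression
# profiles, one row per perturbation condition (placeholder path).
pred = np.load("predicted_profiles.npy")

# Pairwise Pearson correlations between rows (predicted profiles).
r = np.corrcoef(pred)

# Off-diagonal entries compare predictions for *different*
# perturbations; values near 1.0 mean the model predicts nearly
# the same profile no matter which gene was perturbed.
off_diag = r[~np.eye(r.shape[0], dtype=bool)]
print(f"mean off-diagonal R^2: {(off_diag ** 2).mean():.3f}")
print(f"min  off-diagonal R^2: {(off_diag ** 2).min():.3f}")
```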
My questions:
1. Have others encountered similar results when using scGPT for perturb-seq data or similar tasks?
2. What could be the possible reasons for this behavior? Could it be related to:
   - Model architecture or loss function configuration?
   - Insufficient fine-tuning or suboptimal hyperparameters?
   - Data preprocessing or the inherent nature of perturb-seq data?
3. What strategies would you recommend to improve the model's sensitivity to different perturbations and generate more distinct predictions?
Thank you for your time and support! I'm looking forward to any insights or suggestions on how to address this issue.
Hi @ellieujin!
I am currently stuck on finding the so-called condition tokens. I stepped through the model's input, and it seems that only the binned gene values are gathered, not, as the paper claims, emb = gene_id + gene_value + condition_token.
I wonder how you set your perturbation conditions. Were they part of the condition tokens?
Hi @jumbokun,
I believe the TransformerGenerator in scgpt/model/generation_model.py handles the gene_id + gene_value + condition_token embedding, particularly in the _encode part.
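For reference, my reading of the _encode step is roughly the following (a paraphrased sketch, not the exact source; attribute names follow the repo but may differ slightly):

```python
# Paraphrased sketch of TransformerGenerator._encode as I read it
# (not a verbatim copy of scgpt/model/generation_model.py).
def _encode(self, src, values, input_pert_flags, src_key_padding_mask):
    gene_embs = self.encoder(src)                    # gene_id embedding
    value_embs = self.value_encoder(values)          # binned gene_value embedding
    pert_embs = self.pert_encoder(input_pert_flags)  # condition token (perturbed vs. not)
    # This sum is the "emb = gene_id + gene_value + condition_token" from the paper.
    total_embs = gene_embs + value_embs + pert_embs
    return self.transformer_encoder(
        total_embs, src_key_padding_mask=src_key_padding_mask
    )
```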
When setting the perturbation conditions, I simply used the condition column from adata.obs before processing it with PertData.
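And for the conditions themselves, something along these lines (a sketch rather than my exact code; target_gene and the file path are hypothetical, and the "GENE+ctrl" naming follows the GEARS convention that PertData expects):

```python
import scanpy as sc
from gears import PertData

adata = sc.read_h5ad("perturb_seq.h5ad")  # placeholder path

# Label each cell with its perturbation before handing the AnnData
# object to PertData. GEARS expects "ctrl" for control cells and
# "GENE+ctrl" for single-gene perturbations (e.g. "KLF1+ctrl");
# "target_gene" is a hypothetical column in your own metadata.
# (PertData also expects gene symbols in adata.var["gene_name"].)
adata.obs["condition"] = [
    "ctrl" if g == "control" else f"{g}+ctrl" for g in adata.obs["target_gene"]
]

pert_data = PertData("./data")  # local directory for processed files
pert_data.new_data_process(dataset_name="my_perturb_seq", adata=adata)
pert_data.prepare_split(split="simulation", seed=1)
pert_data.get_dataloader(batch_size=64, test_batch_size=64)
```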