
lingxusb/TXpredict


TXpredict: predicting microbial transcriptomes from genome sequences


We present TXpredict, a transcriptome prediction tool that generalizes to novel microbial genomes. By leveraging information learned by a large protein language model (ESM2), TXpredict achieves an average Spearman correlation of 0.53 when predicting gene expression for new bacterial genomes. We further extend this framework to predict transcriptomes for 900 additional microbial genomes spanning 280 genera, a large proportion of which remain uncharacterized at the transcriptional level. TXpredict also predicts condition-specific gene expression, making it a useful tool for understanding microbial adaptation and for the rational design of gene regulatory sequences.
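Spearman correlation compares the rank order of predicted and measured expression levels across the genes of a genome. The following is a minimal pure-Python sketch of the metric. It is not TXpredict's evaluation code, and the function names and toy values are hypothetical. Tied values receive the average of their ranks, and the coefficient is the Pearson correlation of those ranks.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Spearman correlation: the Pearson correlation of the average ranks."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    rx, ry = average_ranks(predicted), average_ranks(measured)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical predicted scores vs. measured expression for four genes.
    print(spearman([2.1, 5.3, 0.4, 3.3], [15.0, 40.2, 3.1, 60.0]))  # -> 0.8
```

Because only ranks matter, the metric is insensitive to the scale of the predictions, such as log versus linear units. Averaging tied ranks follows the standard definition, which is also what `scipy.stats.spearmanr` does.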

Models

Our transcriptome prediction models are available from Hugging Face.

Colab notebooks

We provide Colab notebooks for running transcriptome prediction in a web browser. Please also check our Colab instructions.

  • The only required inputs are the genome sequence file (.fna or .fasta) and the annotation file (.gtf, .gff or .gff3). Please check our example data.
  • Please connect to a GPU instance (e.g. T4: Runtime -> Change runtime type -> T4 GPU).
  • Predicting the transcriptome of a genome with ~4k genes takes ~20 min.
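The notebook's preprocessing is not described here. ESM2 is a protein language model, so we assume that each annotated gene is translated into a protein sequence before it is embedded. The following is a minimal sketch of that step under this assumption. It pairs a FASTA genome with CDS rows from a GFF3/GTF annotation and translates each one. The function names are hypothetical and not part of TXpredict.

The sketch makes several simplifications:
- Each CDS row is treated as one complete gene.
- The phase column and alternative start codons (GTG/TTG) are ignored.
- Labels are read from GFF3 `ID=` attributes. GTF rows fall back to coordinate labels.

```python
from typing import Iterable, Iterator

# Standard codon table in TCAG order (NCBI tables 1 and 11 share these assignments).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def read_fasta(lines: Iterable[str]) -> dict[str, str]:
    """Map each record ID (first word after '>') to its uppercase sequence."""
    chunks: dict[str, list[str]] = {}
    current = None
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            current = line[1:].split()[0]
            chunks[current] = []
        elif line and current is not None:
            chunks[current].append(line.upper())
    return {seqid: "".join(parts) for seqid, parts in chunks.items()}


def cds_rows(gff_lines: Iterable[str]) -> Iterator[tuple[str, int, int, str, str]]:
    """Yield (seqid, start, end, strand, attributes) for every CDS row."""
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] == "CDS":
            yield cols[0], int(cols[3]), int(cols[4]), cols[6], cols[8]


def translate(dna: str) -> str:
    """Translate full codons; unknown codons become 'X', one trailing stop is dropped."""
    protein = "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )
    return protein[:-1] if protein.endswith("*") else protein


def cds_proteins(fasta_lines: Iterable[str], gff_lines: Iterable[str]) -> list[tuple[str, str]]:
    """Return (label, protein) pairs, treating each CDS row as one complete gene."""
    genome = read_fasta(fasta_lines)
    proteins = []
    for seqid, start, end, strand, attrs in cds_rows(gff_lines):
        dna = genome[seqid][start - 1:end]  # GFF/GTF coordinates are 1-based, inclusive
        if strand == "-":
            dna = dna.translate(COMPLEMENT)[::-1]  # reverse complement
        fields = dict(kv.split("=", 1) for kv in attrs.split(";") if "=" in kv)
        label = fields.get("ID", f"{seqid}:{start}-{end}")
        proteins.append((label, translate(dna)))
    return proteins


if __name__ == "__main__":
    # Hypothetical toy inputs: one plus-strand and one minus-strand gene.
    fasta = [">chr1 toy genome", "ATGAAATAG", "TTAGCCCAT"]
    gff = [
        "##gff-version 3",
        "chr1\ttoy\tCDS\t1\t9\t.\t+\t0\tID=geneA",
        "chr1\ttoy\tCDS\t10\t18\t.\t-\t0\tID=geneB",
    ]
    print(cds_proteins(fasta, gff))  # -> [('geneA', 'MK'), ('geneB', 'MG')]
```

On real annotations, a maintained parser and translator would handle the cases skipped here, such as multi-segment CDS features and bacterial start codons.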

Acknowledgement

We deeply appreciate the experimental works and datasets that make our work possible.
