DePTH is a predictive model designed to assess the interaction between human leukocyte antigens (HLAs) and T-cell receptors (TCRs). Given an HLA and a TCR, the model outputs a score representing their association. For details on the model structure and usage, please refer to https://liusi2019.github.io/DePTH-tutorial/. This GitHub repository provides an analysis of several significant confounding factors in the DePTH model, along with improvements to the original model that result in DePTH 2.0. All code in this repository follows the methodology described in the paper: Improved Deep Learning Prediction of TCR-HLA Associations.
This repository focuses on the further development and analysis of the DePTH model, including:
- Confounding Factors Analysis
  - Analysis of factors such as:
    - TCR Generation Probability
    - CDR3 Length
- DePTH 2.0 Enhancements
  - Updates and improvements to the original DePTH model for better performance and usability.
In our analysis, we primarily use the Delmonte data and the TCGA data. The data provided in the data folder has already been preprocessed and is ready for testing. For the raw Delmonte and TCGA data, please refer to Delmonte and TCGA. The preprocessing steps are not currently included in this repository, but we plan to add them in the future. The preprocessed data provides the HLA-TCR pairs in the Positive Set and the Negative Set. For details on the selection criteria, please refer to the supplementary materials of Improved Deep Learning Prediction of TCR-HLA Associations.
All analyses in this project are implemented in R. Different analyses require different R packages; for convenience, all required packages are listed below. Please ensure they are installed before running the scripts:
- tidyr
- dplyr
- lmtest
- stringr
- ggplot2
- gridExtra
- plotly
- Matrix
- uwot
- irlba
- pROC
- ggpointdensity
- viridis
- purrr
- igraph
- pbapply
- data.table
- Rtsne
- leidenbase
- reshape2
You can install the required packages in R using the following command:
required_packages <- c(
"tidyr", "dplyr", "lmtest", "stringr", "ggplot2", "gridExtra", "plotly",
"Matrix", "uwot", "irlba", "pROC", "ggpointdensity", "viridis", "purrr",
"igraph", "pbapply", "data.table", "Rtsne", "leidenbase", "reshape2"
)
install.packages(setdiff(required_packages, installed.packages()[, "Package"]))
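After installation, it can be useful to confirm that every package is actually available before running the scripts. The following is a minimal sketch rather than part of the repository; the helper name `missing_packages` is hypothetical.

```r
# Hypothetical helper (not part of the repository): returns the names in
# `pkgs` whose namespaces cannot be loaded from the current library paths.
missing_packages <- function(pkgs) {
  is_available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!is_available]
}

required_packages <- c(
  "tidyr", "dplyr", "lmtest", "stringr", "ggplot2", "gridExtra", "plotly",
  "Matrix", "uwot", "irlba", "pROC", "ggpointdensity", "viridis", "purrr",
  "igraph", "pbapply", "data.table", "Rtsne", "leidenbase", "reshape2"
)

still_missing <- missing_packages(required_packages)
if (length(still_missing) > 0) {
  message("Not installed: ", paste(still_missing, collapse = ", "))
} else {
  message("All required packages are available.")
}
```

Note that `requireNamespace()` loads each namespace without attaching it, so the check does not change which functions are visible in your session.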
The analysis of TCR generation probability focuses on understanding the relationship between TCR generation probability and the scores generated by the DePTH model. This is further divided into two main aspects:
- Investigating the relationship between TCR generation probability and the DePTH mean score (the average score between a given TCR and all HLAs).
- Exploring whether TCR generation probability alone provides valuable insights into the association between HLA and TCR.
This analysis helps to assess the extent to which TCR generation probability influences or correlates with DePTH model predictions.
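The two aspects above can be sketched in base R as follows. This is only an illustration on simulated data, not the analysis code from the repository: the column names (`tcr`, `hla`, `score`, `log10_pgen`, `label`), the simulated values, and the helper `auc_from_ranks` are all assumptions made for the example. The first part correlates the per-TCR mean score with the generation probability; the second computes an AUC for generation probability alone as a predictor of the pair label, using the rank-sum (Mann-Whitney) formula.

```r
set.seed(1)

# Simulated stand-in data (NOT from the repository). Assumed columns:
#   tcr        - TCR identifier
#   hla        - HLA identifier
#   log10_pgen - log10 generation probability of the TCR
#   score      - model score for the (TCR, HLA) pair
n_tcr <- 200
hlas <- paste0("HLA_", 1:5)
tcr_info <- data.frame(
  tcr = paste0("TCR_", seq_len(n_tcr)),
  log10_pgen = rnorm(n_tcr, mean = -9, sd = 1.5)
)
pairs <- expand.grid(tcr = tcr_info$tcr, hla = hlas, stringsAsFactors = FALSE)
pairs$log10_pgen <- tcr_info$log10_pgen[match(pairs$tcr, tcr_info$tcr)]
# The simulated score is built to depend on pgen so that there is a signal.
pairs$score <- plogis(0.3 * (pairs$log10_pgen + 9) + rnorm(nrow(pairs)))

# Aspect 1: mean score of each TCR across all HLAs vs. generation probability.
mean_score <- aggregate(score ~ tcr, data = pairs, FUN = mean)
names(mean_score)[2] <- "mean_score"
per_tcr <- merge(tcr_info, mean_score, by = "tcr")
rho <- cor(per_tcr$log10_pgen, per_tcr$mean_score, method = "spearman")

# Aspect 2: does generation probability alone separate positive from negative
# pairs? AUC computed from ranks (equivalent to the Mann-Whitney U statistic).
auc_from_ranks <- function(predictor, label) {
  r <- rank(predictor)
  n_pos <- sum(label == 1)
  n_neg <- sum(label == 0)
  (sum(r[label == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}
pairs$label <- rbinom(nrow(pairs), 1, 0.5)  # random labels, for illustration
auc_pgen <- auc_from_ranks(pairs$log10_pgen, pairs$label)

cat("Spearman rho (log10 pgen vs. mean score):", round(rho, 3), "\n")
cat("AUC of pgen alone:", round(auc_pgen, 3), "\n")
```

An AUC close to 0.5 would suggest that generation probability alone carries little information about the label, whereas a value far from 0.5 would point to a confounding effect worth controlling for.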
The analysis of CDR3 length focuses on understanding its relationship with the scores provided by the DePTH model. This is divided into two main aspects:
- Examining the distribution of CDR3 length in the Positive Set and Negative Set.
- Exploring the relationship between CDR3 length and the DePTH mean score (the average score between a given TCR and all HLAs).
This analysis helps to assess the extent to which CDR3 length influences or correlates with DePTH model predictions.
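A minimal base R sketch of these two aspects is shown below. It runs on simulated sequences and scores, not on the repository data; the column names (`cdr3`, `set`, `mean_score`) and the helper `random_cdr3` are assumptions introduced for illustration only.

```r
set.seed(2)

# Hypothetical helper: generate n random amino-acid strings whose lengths are
# drawn from `lengths`. Used only to create stand-in data for this sketch.
random_cdr3 <- function(n, lengths) {
  aa <- strsplit("ACDEFGHIKLMNPQRSTVWY", "")[[1]]
  vapply(
    sample(lengths, n, replace = TRUE),
    function(len) paste(sample(aa, len, replace = TRUE), collapse = ""),
    character(1)
  )
}

# Simulated Positive and Negative sets (NOT from the repository).
positive_set <- data.frame(cdr3 = random_cdr3(300, 12:16), set = "Positive")
negative_set <- data.frame(cdr3 = random_cdr3(300, 11:17), set = "Negative")
all_pairs <- rbind(positive_set, negative_set)
all_pairs$cdr3_length <- nchar(all_pairs$cdr3)

# Aspect 1: distribution of CDR3 length within each set (each row sums to 1).
length_dist <- prop.table(
  table(all_pairs$set, all_pairs$cdr3_length),
  margin = 1
)
print(round(length_dist, 3))

# Rank-based two-sample comparison of the length distributions.
length_test <- wilcox.test(cdr3_length ~ set, data = all_pairs, exact = FALSE)

# Aspect 2: mean score by CDR3 length. The scores here are random placeholders.
all_pairs$mean_score <- runif(nrow(all_pairs))
score_by_length <- tapply(all_pairs$mean_score, all_pairs$cdr3_length, mean)
print(round(score_by_length, 3))
```

With real data, a systematic trend in `score_by_length` or a clear difference between the rows of `length_dist` would indicate that CDR3 length should be accounted for when interpreting the scores.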
Additionally, other analyses and validations mentioned in the paper can be found in the corresponding files within the analysis folder. To run these analyses:
- Download the data folder.
- Update the data directory paths in the analysis scripts.
- Execute the scripts to reproduce the results.
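One way to make the path update less error-prone is to define the data directory once and build every file path from it. The sketch below is only a suggestion under stated assumptions: `data_dir`, the helper `data_path`, and the file name in the usage comment are hypothetical and do not come from the repository scripts.

```r
# Hypothetical location of the downloaded data folder; change to your own path.
data_dir <- file.path("path", "to", "data")

# Hypothetical helper: build a full path inside the data folder and stop early
# with a clear message if the file does not exist.
data_path <- function(file_name, dir = data_dir) {
  full_path <- file.path(dir, file_name)
  if (!file.exists(full_path)) {
    stop("File not found: ", full_path, call. = FALSE)
  }
  full_path
}

# Example usage ("example_pairs.csv" is a placeholder, not a real file name):
# pairs <- read.csv(data_path("example_pairs.csv"))
```

Failing at path construction time, rather than deep inside an analysis, makes a wrong directory setting easier to spot.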