dsMTL (Federated Multi-Task Learning based on DataSHIELD) provided federated, privacy-preserving MTL analysis. dsMTL was developed based on DataSHIELD, an ecosystem supporting the federated analysis of sensitive individual-level data that remains stored behind the data owner’s firewall throughout analysis. Multi-task Learning (MTL) aimed at simultaneously learning the outcome (e.g. diagnosis) associated patterns across datasets with dataset-specific, as well as shared, effects. MTL has numerous exciting application areas, such as comorbidity modeling, and has already been applied successfully for e.g. disease progression analysis.
dsMTL currently includes three supervised and one unsupervised federated multi-task learning as well as two federated machine learning algorithms. Each algorithm captured a specific form of cross-cohort heterogeneity, which was linked to different applications in molecular studies.
Name | Type | Task | Effect |
---|---|---|---|
dsLasso |
ML | Classification/Regression | Train a Lasso model on the conbained cohorts |
dsLassoCov |
ML | Classification/Regression | Federated ML model that can capture the covariate effect |
dsMTL_L21 |
MTL | Classification/Regression | Screen out unimportant features to all tasks |
dsMTL_trace |
MTL | Classification/Regression | Identify models represented in low-dimentional spcae |
dsMTL_net |
MTL | Classification/Regression | Incorporate task-relatedness described as a graph |
dsMTL_iNMF |
MTL | Matrix factorization | Factorize matrices into shared and specific components |
Note
In addition to the original architecture for federated MTL approaches as presented in the initial publication (Cao et al., 2022) and described above, which supports the analysis of sensitive individual-level data from geographically distributed data sources using the DataSHIELD platform, the concept of differential privacy has been included into the dsMTL package in 2025. This optional feature offers an additional security mechanism that is specific to the MTL models. In particular, differential privacy can provide a protection against so-called membership inference attacks to the models. Instructions and examples on how to set up the optional differential privacy feature are given in the documentation for dsMTLClient.
The Client-side package can be found:dsMTLClient
To enable dsMTLBase functions, the DataSHIELD server has to be installed first. dsMTLBase can be installed smoothly based on DataSHIELD server in several ways.
dsMTLBase was tested on the dependent softwares with minimum versions
Opal 3.0.3
dsBase 6.1.0
resourcer 1.0.1
R >= 3.5.0
The complete document for describing the installation of DataSHIELD server from scratch was here. Alternatively, to test dsMTL functions, one was recommonded to dowload a well-configured DataSHIELD server and install locally using Virtualbox. This tutorial was provided by DataSHIELD team. A quick test of dsMTLBase installation is possible based on opal demo server. This is an open server for testing DataSHIELD-derived functions, and will be regularly reset everyday. (username: administrator, password: password)
There are two ways to install dsMTLBase on a well-configured DataSHIELD server. With an administrator account, one could login the backend administration page from the web-browser, and let DataSHIELD server yield dsMTLBase codes from github directly. Alternatively, one could use the script to upload the dsMTLBase functions from the local computer. To use the script successfully, please fill your username, password and server IP in the top lines of the script.
The entire tutorial can be found here. After the login of the administration page, go to "Administration -> DataSHIELD -> Add Package". In the dialog (as shown below), filled with the repository information of dsMTLBase on github ( organization name: transbioZI; package name: dsMTLBase; git branch: main ).
1, Install DataSHIELD server management package opalr in R
install.packages("opalr")
2, Download dsMTLBase sources from github in shell
git clone https://github.com/transbioZI/dsMTLBase.git
cd dsMTLBase
gedit ./inst/uploadFuntions.R
3, Change the server information to yours (server IP, user name and passowrd) and run
Rscript ./inst/uploadFunctions.R
- For small-scale and uncompressed datasets, it is recommended to upload and import directly into DataSHIELD. Check "Management data in Opal" for detail tutorial.
- For large-scale and compressed datasets, i.e. *.rda files in R, it is recommended to attach the data sources using the R package resources. The tutorial can be found here
Cao, H., Zhang, Y., Baumbach, J., Burton, P. R., Dwyer, D., Koutsouleris, N., Matschinske, J., Marcon, Y., Rajan, S., Rieg, T., Ryser-Welch, P., Späth, J., The COMMITMENT Consortium, Herrmann, C., and Schwarz, E. (2022). dsMTL: a computational framework for privacy-preserving, distributed multi-task machine learning. Bioinformatics, 38(21), 4919-4926. DOI: 10.1093/bioinformatics/btac616
Han Cao ([email protected])
- dsMTLClient - federated, privacy-preserving machine-learning and multi-task learning analysis: https://github.com/transbioZI/dsMTLClient
- Documents of opal servers: https://opaldoc.obiba.org/en/latest/index.html
- Tutorial of DataSHIELD for beginers: https://data2knowledge.atlassian.net/wiki/spaces/DSDEV/pages/12943395/Beginners+Hub
- Forum of DataSHIELD: https://datashield.discourse.group/
- opalr - an R package for managing DataSHIELD server from script: https://cran.r-project.org/web/packages/opalr/index.html
- resources - an R package for importing data of different sources: https://opaldoc.obiba.org/en/latest/resources.html
- Tutorial of resources: https://rpubs.com/jrgonzalezISGlobal/tutorial_resources
- dsOmics - an R package based on DataSHIELD for omics analysis: https://github.com/isglobal-brge/dsOmics
- Tutorial of omics analysis using dsOmics: https://rpubs.com/jrgonzalezISGlobal/tutorial_DSomics
- Tutorial of omics analysis using dsOmics2: https://htmlpreview.github.io/?https://github.com/isglobal-brge/dsOmicsClient/blob/master/vignettes/dsOmics.html
- A book of DataSHIELD book with detailed explainations of esential packages: https://isglobal-brge.github.io/resource_bookdown/