Skip to content

transbioZI/dsMTLBase

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

45 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

dsMTLBase: dsMTL server site functions

dsMTL (Federated Multi-Task Learning based on DataSHIELD) provided federated, privacy-preserving MTL analysis. dsMTL was developed based on DataSHIELD, an ecosystem supporting the federated analysis of sensitive individual-level data that remains stored behind the data owner’s firewall throughout analysis. Multi-task Learning (MTL) aimed at simultaneously learning the outcome (e.g. diagnosis) associated patterns across datasets with dataset-specific, as well as shared, effects. MTL has numerous exciting application areas, such as comorbidity modeling, and has already been applied successfully for e.g. disease progression analysis.

dsMTL currently includes three supervised and one unsupervised federated multi-task learning as well as two federated machine learning algorithms. Each algorithm captured a specific form of cross-cohort heterogeneity, which was linked to different applications in molecular studies.

Name Type Task Effect
dsLasso ML Classification/Regression Train a Lasso model on the conbained cohorts
dsLassoCov ML Classification/Regression Federated ML model that can capture the covariate effect
dsMTL_L21 MTL Classification/Regression Screen out unimportant features to all tasks
dsMTL_trace MTL Classification/Regression Identify models represented in low-dimentional spcae
dsMTL_net MTL Classification/Regression Incorporate task-relatedness described as a graph
dsMTL_iNMF MTL Matrix factorization Factorize matrices into shared and specific components

Note

In addition to the original architecture for federated MTL approaches as presented in the initial publication (Cao et al., 2022) and described above, which supports the analysis of sensitive individual-level data from geographically distributed data sources using the DataSHIELD platform, the concept of differential privacy has been included into the dsMTL package in 2025. This optional feature offers an additional security mechanism that is specific to the MTL models. In particular, differential privacy can provide a protection against so-called membership inference attacks to the models. Instructions and examples on how to set up the optional differential privacy feature are given in the documentation for dsMTLClient.

Client-side Package of dsMTL

The Client-side package can be found:dsMTLClient

Installation

To enable dsMTLBase functions, the DataSHIELD server has to be installed first. dsMTLBase can be installed smoothly based on DataSHIELD server in several ways.

Requirements

dsMTLBase was tested on the dependent softwares with minimum versions

Opal 3.0.3
dsBase 6.1.0
resourcer 1.0.1
R >= 3.5.0

Install an DataSHIELD server

The complete document for describing the installation of DataSHIELD server from scratch was here. Alternatively, to test dsMTL functions, one was recommonded to dowload a well-configured DataSHIELD server and install locally using Virtualbox. This tutorial was provided by DataSHIELD team. A quick test of dsMTLBase installation is possible based on opal demo server. This is an open server for testing DataSHIELD-derived functions, and will be regularly reset everyday. (username: administrator, password: password)

Install dsMTLBase

There are two ways to install dsMTLBase on a well-configured DataSHIELD server. With an administrator account, one could login the backend administration page from the web-browser, and let DataSHIELD server yield dsMTLBase codes from github directly. Alternatively, one could use the script to upload the dsMTLBase functions from the local computer. To use the script successfully, please fill your username, password and server IP in the top lines of the script.

Using DataSHIELD backend administration page

The entire tutorial can be found here. After the login of the administration page, go to "Administration -> DataSHIELD -> Add Package". In the dialog (as shown below), filled with the repository information of dsMTLBase on github ( organization name: transbioZI; package name: dsMTLBase; git branch: main ).

Using R scripts

1, Install DataSHIELD server management package opalr in R

  install.packages("opalr")

2, Download dsMTLBase sources from github in shell

  git clone https://github.com/transbioZI/dsMTLBase.git
  cd dsMTLBase
  gedit ./inst/uploadFuntions.R

3, Change the server information to yours (server IP, user name and passowrd) and run

  Rscript ./inst/uploadFunctions.R

Data upload, import and management

  1. For small-scale and uncompressed datasets, it is recommended to upload and import directly into DataSHIELD. Check "Management data in Opal" for detail tutorial.
  2. For large-scale and compressed datasets, i.e. *.rda files in R, it is recommended to attach the data sources using the R package resources. The tutorial can be found here

References

Cao, H., Zhang, Y., Baumbach, J., Burton, P. R., Dwyer, D., Koutsouleris, N., Matschinske, J., Marcon, Y., Rajan, S., Rieg, T., Ryser-Welch, P., Späth, J., The COMMITMENT Consortium, Herrmann, C., and Schwarz, E. (2022). dsMTL: a computational framework for privacy-preserving, distributed multi-task machine learning. Bioinformatics, 38(21), 4919-4926. DOI: 10.1093/bioinformatics/btac616

Contact

Han Cao ([email protected])

Useful links

  1. dsMTLClient - federated, privacy-preserving machine-learning and multi-task learning analysis: https://github.com/transbioZI/dsMTLClient
  2. Documents of opal servers: https://opaldoc.obiba.org/en/latest/index.html
  3. Tutorial of DataSHIELD for beginers: https://data2knowledge.atlassian.net/wiki/spaces/DSDEV/pages/12943395/Beginners+Hub
  4. Forum of DataSHIELD: https://datashield.discourse.group/
  5. opalr - an R package for managing DataSHIELD server from script: https://cran.r-project.org/web/packages/opalr/index.html
  6. resources - an R package for importing data of different sources: https://opaldoc.obiba.org/en/latest/resources.html
  7. Tutorial of resources: https://rpubs.com/jrgonzalezISGlobal/tutorial_resources
  8. dsOmics - an R package based on DataSHIELD for omics analysis: https://github.com/isglobal-brge/dsOmics
  9. Tutorial of omics analysis using dsOmics: https://rpubs.com/jrgonzalezISGlobal/tutorial_DSomics
  10. Tutorial of omics analysis using dsOmics2: https://htmlpreview.github.io/?https://github.com/isglobal-brge/dsOmicsClient/blob/master/vignettes/dsOmics.html
  11. A book of DataSHIELD book with detailed explainations of esential packages: https://isglobal-brge.github.io/resource_bookdown/