
Welcome to the GenNet-multi-omic wiki!

Since there is no one-size-fits-all solution for most multi-omics data, you should preprocess the data yourself. For this multi-omic network you need to provide two inputs: methylation and gene-expression data. However, all types of inputs can be used as long as you can group your inputs per gene.

The information needed to group your omics (e.g., how to group your SNPs, CpGs, genes and/or pathways) is also unique to your data. This framework allows you to shape your network the way you want it. You shape your network by providing sparse matrices of zeros and ones, where each input is a row and each output is a column. Simply add a one if an input (row) should be connected to an output neuron (column).

Mask

The custom connections are defined by the mask input, a sparse (COO) connectivity matrix.

The matrix has the shape (N_nodes_layer_1, N_nodes_layer_2). It is a sparse matrix with zeros where there is no connection and ones where there is a connection. For example:

            output
          1 2 3 4 5
input 1 | 1 0 0 0 0 |
input 2 | 1 1 0 0 0 |
input 3 | 0 1 0 0 0 |
input 4 | 0 1 0 0 0 |
input 5 | 0 0 1 0 0 |
input 6 | 0 0 0 1 0 |
input 7 | 0 0 0 1 0 |

This connects the first two inputs (1, 2) to the first neuron in the second layer, connects inputs 2, 3 and 4 to output neuron 2, connects input 5 to output neuron 3, connects inputs 6 and 7 to the 4th neuron in the subsequent layer, and connects nothing to the 5th neuron.

Select an appropriate model from the classification and regression models in the models folder, replace the link to the masks (a COO matrix as described above), and then proceed to running the model.
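
As a sketch of how such a mask can be created, the 7x5 example above could be built and saved as a COO matrix with scipy (the file name mask.npz is just a placeholder):

import numpy as np
import scipy.sparse

# Rows are inputs, columns are output neurons; a one means "connected"
rows = np.array([0, 1, 1, 2, 3, 4, 5, 6])   # input indices (0-based)
cols = np.array([0, 0, 1, 1, 1, 2, 3, 3])   # output neuron each input connects to
data = np.ones(len(rows), dtype=np.uint8)

mask = scipy.sparse.coo_matrix((data, (rows, cols)), shape=(7, 5))
scipy.sparse.save_npz("mask.npz", mask)     # load later with scipy.sparse.load_npz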

Data format

Note

For a cohort-wise cross-validation, normalize per cohort. Take care that you compute the standardization/normalization statistics on the training data only, not on the training and test data together!
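
A minimal sketch of what this means in practice, with placeholder numpy arrays standing in for one cohort's training and test matrices:

import numpy as np

rng = np.random.default_rng(0)
x_train = rng.normal(size=(100, 10))   # placeholder training features for one cohort
x_test = rng.normal(size=(20, 10))     # placeholder test features for the same cohort

# Compute the statistics on the training data only, then reuse them for the test data
mean, std = x_train.mean(axis=0), x_train.std(axis=0) + 1e-8
x_train = (x_train - mean) / std
x_test = (x_test - mean) / std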

Prepare the following files:

  • Methylation.h5 : Each column is an input feature and each row is an individual. This should be preprocessed data.
  • GeneExpression.h5 : Each column is an input feature and each row is an individual. This should be preprocessed data.
  • ytrain_<phenotype>_<fold>.csv : comma-separated file with patient information
  • yval_<phenotype>_<fold>.csv : comma-separated file with patient information
  • ytest_<phenotype>_<fold>.csv : comma-separated file with patient information

For ytrain, yval and ytest, prepare a split of the data into training, validation and test sets for each fold. Fill in <phenotype> with your phenotype name and <fold> with the fold number. These .csv files should contain a column labels with the ground-truth annotations and a column row with the indices of the individuals in the .h5 files. Other columns should contain covariates.
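
As an illustration (the 80/10/10 split, the phenotype name "disease", and the covariate "age" are placeholders), such files could be written with pandas:

import numpy as np
import pandas as pd

n_individuals = 1000                                   # number of rows in the .h5 files
y = pd.DataFrame({
    "labels": np.random.randint(0, 2, n_individuals),  # ground-truth annotations
    "row": np.arange(n_individuals),                   # index of the individual in the .h5 files
    "age": np.random.randint(40, 80, n_individuals),   # hypothetical covariate
})

# Placeholder 80/10/10 split for fold 0
shuffled = y.sample(frac=1, random_state=0)
shuffled.iloc[:800].to_csv("ytrain_disease_0.csv", index=False)
shuffled.iloc[800:900].to_csv("yval_disease_0.csv", index=False)
shuffled.iloc[900:].to_csv("ytest_disease_0.csv", index=False)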

Classification

For classification tasks use GenNet_ME_GE_classification.py; it includes the standard networks and the workflow for a classification task.

usage: GenNet_ME_GE_classification.py [-h] -j J [-lr LR] [-bs BS] [-l1 L1] [-mt MT] [-pn PN] [-fold FOLD] [-omic_l1 OMIC_L1] [-datapath DATAPATH]

optional arguments:
  -h, --help          show this help message and exit
  -j J                jobid: identifier for the experiment (experiment number, must be int)
  -lr LR              learning rate : float
  -bs BS              batch size: integer
  -l1 L1              L1 penalty, must be a float
  -mt MT              Network name, select a name from models
  -pn PN              phenotype name
  -fold FOLD          fold number, must be integer
  -omic_l1 OMIC_L1    Omic-specific L1 penalty
  -datapath DATAPATH  Path to processed data
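
For example, a hypothetical call (the model name, phenotype name, and data path are placeholders to replace with your own values):

python GenNet_ME_GE_classification.py -j 1 -fold 0 -pn disease -lr 0.001 -bs 64 -l1 0.01 -omic_l1 0.01 -mt GenNet_classification_model_two_omics -datapath /path/to/processed_data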

Regression

GenNet_ME_GE_Regression.py contains the workflow for regression tasks. Running GenNet_ME_GE_Regression.py --help will provide more information about the inputs.

usage: GenNet_ME_GE_Regression.py [-h] [-j J] [-lr LR] [-bs BS] [-l1 L1] [-mt MT] [-pn PN] [-fold FOLD] [-omic_l1 OMIC_L1] [-datapath DATAPATH]

Argument parser for the experiment

optional arguments:
  -h, --help          show this help message and exit
  -j J                jobid: identifier for the experiment (experiment number, must be int)
  -lr LR              learning rate : float
  -bs BS              batch size: integer
  -l1 L1              L1 penalty, must be a float
  -mt MT              Network name, select a name from models
  -pn PN              phenotype name
  -fold FOLD          fold number, must be integer
  -omic_l1 OMIC_L1    Omic-specific L1 penalty
  -datapath DATAPATH  Path to processed data

Creating a custom model:

If you have methylation and gene expression inputs as in our study, you can select a model from ClassicationModels or RegressionModels. To create an interpretable neural network with your own multi-omics data, you can modify the given neural networks to your needs. Below you will find an annotated network to help you get started:

# Imports assumed by this example; the import path of LocallyDirectedConnected may differ in your setup
import scipy.sparse
import tensorflow as tf
import tensorflow.keras as K
import LocallyDirectedConnected

def GenNet_classification_model_two_omics(inputsize_ME, inputsize_GE, l1_value):
    """
    Generates a classification model that integrates two different omics data: methylation (ME) and gene expression (GE).
    
    Parameters:
    inputsize_ME (int): The input size for methylation data.
    inputsize_GE (int): The input size for gene expression data.
    l1_value (float): The L1 penalty value for the regularizer on the weights to make the model more interpretable.
    
    Note:
    - Add an input_size_omic_3 if you have three different omics.
    - This network has a methylation input (ME) and a gene expression input (GE).
    """

    # Load the mask that describes how methylations should be grouped into genes
    # (datapath is assumed to be defined globally, e.g. taken from the -datapath argument)
    mask_meth = scipy.sparse.load_npz(datapath + '/ME_gene.npz')
    # Load the mask combining the methylation genes and the gene expression genes
    combine_mask = scipy.sparse.load_npz(datapath + '/ME_GE_gene.npz')

    # Define the inputs for GE
    input_GE = K.Input(inputsize_GE)
    # Define the inputs for ME
    input_ME = K.Input(inputsize_ME)

    # The layer that groups methylation into genes
    # Each input needs to be (size, 1) for the LocallyDirectedConnected layer
    gene_layer_ME = K.layers.Reshape(input_shape=(inputsize_ME,), target_shape=(inputsize_ME, 1))(input_ME)
    # Normalization layer for methylation input
    gene_layer_ME = K.layers.BatchNormalization(center=False, scale=False, name="inter_out_gene_layer_me")(gene_layer_ME)
    # The sparse layer that connects input CpGs to genes (methylation only representation)
    gene_layer_ME = LocallyDirectedConnected.LocallyDirected1D(mask=mask_meth, filters=1, input_shape=(inputsize_ME, 1), name="gene_layer_me")(gene_layer_ME)
    # Activation function for methylation input
    gene_layer_ME = K.layers.Activation("tanh", name="activation_ME")(gene_layer_ME)

    # Reshape the gene expression input
    input_GE_reshape = K.layers.Reshape(input_shape=(inputsize_GE,), target_shape=(inputsize_GE, 1))(input_GE)
    
    # Combine methylation and gene expression inputs
    combined = K.layers.concatenate([gene_layer_ME, input_GE_reshape], axis=1)
    # Normalization layer for combined inputs
    combined = K.layers.BatchNormalization(center=False, scale=False, name="inter_out_combined")(combined)

    # The sparse layer that connects combined inputs to genes
    gene_layer = LocallyDirectedConnected.LocallyDirected1D(mask=combine_mask, filters=1, name="gene_layer")(combined)
    # Flatten the layer output
    gene_layer = K.layers.Flatten()(gene_layer)
    # Activation function for combined inputs
    gene_layer = K.layers.Activation("tanh", name="activation_ME_GE")(gene_layer)
    # Final normalization layer
    gene_layer = K.layers.BatchNormalization(center=False, scale=False, name="inter_out_gene")(gene_layer)

    # Output node with L1 regularization
    end_node = K.layers.Dense(units=1, name="end_node", kernel_regularizer=tf.keras.regularizers.l1(l1_value))(gene_layer)
    # Sigmoid activation function for output node
    end_node = K.layers.Activation("sigmoid", name="activation_end")(end_node)
    # Flatten the final output
    end_node = K.layers.Flatten()(end_node)

    # Define the model with ME and GE inputs
    model = K.Model(inputs=[input_GE, input_ME], outputs=end_node)
    return model
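
A hedged usage sketch (the optimizer, learning rate, batch size, and the already-loaded input arrays are assumptions; the masks referenced inside the model must exist under datapath):

# xtrain_ME and xtrain_GE are assumed to be numpy arrays read from Methylation.h5 and
# GeneExpression.h5; ytrain holds the labels column of ytrain_<phenotype>_<fold>.csv
model = GenNet_classification_model_two_omics(inputsize_ME=xtrain_ME.shape[1],
                                              inputsize_GE=xtrain_GE.shape[1],
                                              l1_value=0.01)
model.compile(loss="binary_crossentropy",
              optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
              metrics=["accuracy"])
model.fit([xtrain_GE, xtrain_ME], ytrain, batch_size=64, epochs=10)  # input order matches the K.Model definition above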
