This repository has been archived by the owner on Apr 19, 2023. It is now read-only.

Develop #131

Merged
merged 18 commits into from
Feb 17, 2020
35 changes: 35 additions & 0 deletions .github/workflows/mnncorrect.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
name: mnncorrect

on:
push:
branches:
- master
pull_request:
branches:
- master

jobs:
build:

runs-on: ubuntu-latest

steps:
- uses: actions/checkout@v1
with:
submodules: true
- name: Install Nextflow
run: |
export NXF_VER='19.12.0-edge'
wget -qO- get.nextflow.io | bash
sudo mv nextflow /usr/local/bin/
- name: Get sample data
run: |
mkdir testdata
wget https://raw.githubusercontent.com/aertslab/SCENICprotocol/master/example/sample_data_tiny.tar.gz
tar xvf sample_data_tiny.tar.gz
cp -r sample_data testdata/sample1
mv sample_data testdata/sample2
- name: Run mnncorrect test
run: |
nextflow run ${GITHUB_WORKSPACE} -profile mnncorrect,test__mnncorrect,docker -entry mnncorrect -ansi-log false
cat .nextflow.log
1,541 changes: 1,541 additions & 0 deletions assets/images/mnncorrect.svg
532 changes: 286 additions & 246 deletions assets/images/scenic.svg
1,388 changes: 758 additions & 630 deletions assets/images/scenic_multiruns.svg
1,261 changes: 828 additions & 433 deletions assets/images/single_sample.svg
1,768 changes: 1,045 additions & 723 deletions assets/images/single_sample_scenic.svg
28 changes: 28 additions & 0 deletions conf/test__mnncorrect.config
@@ -0,0 +1,28 @@

params {
global {
project_name = 'mnncorrect_CI'
}
data {
tenx {
cellranger_outs_dir_path = "testdata/*/outs/"
}
}
sc {
file_annotator {
metaDataFilePath = ''
}
scanpy {
filter {
cellFilterMinNGenes = 1
}
dim_reduction {
pca {
dimReductionMethod = 'PCA'
nComps = 2
}
}
}
}
}

33 changes: 27 additions & 6 deletions docs/features.rst
@@ -1,10 +1,31 @@
Features
=========

Change log fold change and FDR thresholds for markers stored in SCope loom
--------------------------------------------------------------------------
Set the seed
------------
Some steps in the pipelines are nondeterministic. To keep the results reproducible over time, a seed is set by default to:

By default, log fold change and FDR thresholds are set to 0 and 0.05 respectively.
.. code:: groovy

workflow.manifest.version.replaceAll("\\.","").toInteger()

The seed is a number derived from the version of the pipeline used at the time of the analysis run.
To override the seed (an integer), edit the ``nextflow.config`` file with:

.. code:: groovy

params {
global {
seed = [your-custom-seed]
}
}
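
As a rough illustration of how the default seed is derived, the Groovy expression above can be mirrored in Python. The function name ``default_seed`` is hypothetical and not part of the pipeline; the sketch only assumes that the version string consists of digits and dots.

```python
def default_seed(version: str) -> int:
    """Mirror of the Groovy expression: drop the dots, parse the rest as a base-10 integer."""
    # Hypothetical helper for illustration only; not part of VSN-Pipelines.
    return int(version.replace(".", ""))


print(default_seed("0.11.0"))  # 110 (the leading zero is dropped)
print(default_seed("0.10.0"))  # 100
```

Because the dots are simply dropped, two different version strings can map to the same seed (``0.11.0`` and ``0.1.10`` both give 110), so setting ``params.global.seed`` explicitly may be preferable when comparing runs across versions.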

This filter will only be applied to the final loom file of the VSN-Pipelines. All the intermediate files prior to the loom file will still contain all of the markers.

Change log fold change (logFC) and false discovery rate (FDR) thresholds for the marker genes stored in the final SCope loom
----------------------------------------------------------------------------------------------------------------------------

By default, the logFC and FDR thresholds are set to 0 and 0.05 respectively.
If you want to change the thresholds applied to the marker genes, add the following entries to ``nextflow.config``:

.. code:: groovy
@@ -22,8 +43,8 @@ If you want to change those thresholds applied on the markers genes, edit the ``

This filter will only be applied to the final loom file of the VSN-Pipelines. All the intermediate files prior to the loom file will still contain all of the markers.

Select the optimal number of principal components
-------------------------------------------------
Automated selection of the optimal number of principal components
-----------------------------------------------------------------

When generating the config using ``nextflow config`` (see above), add the ``pcacv`` profile.

@@ -87,7 +108,7 @@ The latest version only implements this feature for the following pipelines:
- ``single_sample``
- ``bbknn``

Since ``v0.9.0``, it is possible to explore several combinations of parameters. The current version (``v0.9.0``) of the VSN-Pipelines allows to explore the following parameters:
Since ``v0.9.0``, it is possible to explore several combinations of parameters. The latest version of the VSN-Pipelines allows the following parameters to be explored:

- ``params.sc.scanpy.clustering``

58 changes: 58 additions & 0 deletions docs/pipelines.rst
@@ -182,6 +182,18 @@ The output is a loom file with the results embedded.

.. |Harmony Workflow| image:: https://raw.githubusercontent.com/vib-singlecell-nf/vsn-pipelines/master/assets/images/harmony.svg?sanitize=true

**mnncorrect** |mnncorrect|
---------------------------

.. |mnncorrect| image:: https://github.com/vib-singlecell-nf/vsn-pipelines/workflows/mnncorrect/badge.svg

Runs the ``mnncorrect`` workflow (sample-specific filtering, merging of individual samples, normalization, log-transformation, HVG selection, PCA analysis, batch-effect correction (mnnCorrect), clustering, dimensionality reduction (t-SNE and UMAP)).
The output is a loom file with the results embedded.

|mnnCorrect Workflow|

.. |mnnCorrect Workflow| image:: https://raw.githubusercontent.com/vib-singlecell-nf/vsn-pipelines/master/assets/images/mnncorrect.svg?sanitize=true


Input Data Formats
*******************
@@ -253,3 +265,49 @@ In the generated .config file, make sure the ``file_paths`` parameter is set wit
Make sure that ``sc.file_converter.iff`` is set to ``h5ad``.

Currently H5AD input is only implemented in the ``h5ad_single_sample`` entry point.

TSV
---
::

-profile tsv


In the generated .config file, make sure the ``file_paths`` parameter is set with the paths to the ``.tsv`` files::

[...]
tsv {
file_paths = "data/1k_pbmc_v*_chemistry_SUFFIX.SC__FILE_CONVERTER.tsv"
suffix = "_SUFFIX.SC__FILE_CONVERTER.tsv"
}
[...]

- The ``suffix`` parameter is used to infer the sample name from the file paths (it is removed from the input file path to derive a sample name).
- The ``file_paths`` parameter accepts glob patterns as well as comma-separated paths.
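
To make the role of ``suffix`` concrete, here is a minimal Python sketch of deriving a sample name by stripping the suffix from a file path. It is loosely modelled on the Groovy ``extractSample`` helper touched by this pull request, but ``extract_sample`` and the example paths are illustrative assumptions, and ``re.escape`` escapes every regex metacharacter rather than only the dots.

```python
import re


def extract_sample(path: str, suffix: str) -> str:
    """Return the file name with the given suffix removed (illustrative only)."""
    # Everything after the last "/" and before the suffix is taken as the sample name.
    pattern = r"(.+)/(.+)" + re.escape(suffix)
    match = re.search(pattern, path)
    if match is None:
        raise ValueError(f"Path {path!r} does not match suffix {suffix!r}")
    return match.group(2)


print(extract_sample(
    "data/1k_pbmc_v2_chemistry_SUFFIX.SC__FILE_CONVERTER.tsv",
    "_SUFFIX.SC__FILE_CONVERTER.tsv",
))  # 1k_pbmc_v2_chemistry
```

With a ``suffix`` that is only the file extension, the whole base name becomes the sample name, so a longer suffix is one way to trim shared trailing text from the sample names.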

Make sure that ``sc.file_converter.iff`` is set to ``tsv``.

Currently TSV input is only implemented in the ``tsv_single_sample`` entry point.

CSV
---
::

-profile csv


In the generated .config file, make sure the ``file_paths`` parameter is set with the paths to the ``.csv`` files::

[...]
csv {
file_paths = "data/1k_pbmc_v*_chemistry_SUFFIX.SC__FILE_CONVERTER.csv"
suffix = "_SUFFIX.SC__FILE_CONVERTER.csv"
}
[...]

- The ``suffix`` parameter is used to infer the sample name from the file paths (it is removed from the input file path to derive a sample name).
- The ``file_paths`` parameter accepts glob patterns as well as comma-separated paths.
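
The following Python sketch illustrates one plausible reading of a ``file_paths`` value that mixes comma-separated entries and glob patterns. It is not the pipeline's implementation (file resolution there is handled by Nextflow), and ``expand_file_paths`` as well as the file names are made up for the example.

```python
import glob
import os
import tempfile
from typing import List


def expand_file_paths(file_paths: str) -> List[str]:
    """Split on commas, then expand each entry as a glob pattern (illustrative only)."""
    resolved: List[str] = []
    for entry in file_paths.split(","):
        entry = entry.strip()
        if entry:
            # glob.glob returns an empty list when nothing matches.
            resolved.extend(sorted(glob.glob(entry)))
    return resolved


# Demo in a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("a_v2.csv", "a_v3.csv", "b.csv"):
        open(os.path.join(tmp, name), "w").close()
    value = os.path.join(tmp, "a_v*.csv") + "," + os.path.join(tmp, "b.csv")
    print([os.path.basename(p) for p in expand_file_paths(value)])
```

In this sketch an entry that matches no file is silently dropped; whether the pipeline warns or fails in that case is not stated here.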

Make sure that ``sc.file_converter.iff`` is set to ``csv``.

Currently CSV input is only implemented in the ``csv_single_sample`` entry point.
43 changes: 42 additions & 1 deletion main.nf
@@ -2,6 +2,18 @@ import static groovy.json.JsonOutput.*

nextflow.preview.dsl=2

if(!params.global.containsKey('seed')) {
params.seed = workflow.manifest.version.replaceAll("\\.","").toInteger()

Channel.from('').view {
"""
------------------------------------------------------------------
\u001B[32m No seed detected in the config \u001B[0m
\u001B[32m To ensure reproducibility the seed has been set to ${params.seed} \u001B[0m
------------------------------------------------------------------
"""
}
}

// run multi-sample with bbknn, output a scope loom file
workflow bbknn {
@@ -89,6 +101,13 @@ workflow cellranger {

}

workflow cellranger_metadata {

include CELLRANGER_COUNT_WITH_METADATA from './src/cellranger/workflows/cellRangerCountWithMetadata' params(params)
CELLRANGER_COUNT_WITH_METADATA(file(params.sc.cellranger.count.metadata))

}


// runs mkfastq, CellRanger count, then single_sample:
workflow single_sample_cellranger {
@@ -100,7 +119,7 @@ workflow single_sample_cellranger {

workflow h5ad_single_sample {

include getChannel as getH5ADChannel from './src/channels/h5ad' params(params)
include getChannel as getH5ADChannel from './src/channels/file' params(params)
include single_sample as SINGLE_SAMPLE from './workflows/single_sample' params(params)
data = getH5ADChannel(
params.data.h5ad.file_paths,
@@ -109,6 +128,28 @@ workflow h5ad_single_sample {

}

workflow tsv_single_sample {

include getChannel as getTSVChannel from './src/channels/file' params(params)
include single_sample as SINGLE_SAMPLE from './workflows/single_sample' params(params)
data = getTSVChannel(
params.data.tsv.file_paths,
params.data.tsv.suffix
).view() | SINGLE_SAMPLE

}

workflow csv_single_sample {

include getChannel as getCSVChannel from './src/channels/file' params(params)
include single_sample as SINGLE_SAMPLE from './workflows/single_sample' params(params)
data = getCSVChannel(
params.data.csv.file_paths,
params.data.csv.suffix
).view() | SINGLE_SAMPLE

}


workflow star {

27 changes: 26 additions & 1 deletion nextflow.config
@@ -3,7 +3,7 @@ manifest {
name = 'vib-singlecell-nf/vsn-pipelines'
description = 'A repository of pipelines for single-cell data in Nextflow DSL2'
homePage = 'https://github.com/vib-singlecell-nf/vsn-pipelines'
version = '0.10.0'
version = '0.11.0'
mainScript = 'main.nf'
defaultBranch = 'master'
nextflowVersion = '!19.12.0-edge' // with ! prefix, stop execution if current version does not match required version.
@@ -63,14 +63,17 @@ profiles {
includeConfig 'src/star/star.config'
}
bbknn {
includeConfig 'src/utils/conf/h5ad_concatenate.config'
includeConfig 'src/scanpy/scanpy.config'
includeConfig 'src/scanpy/conf/bbknn.config'
}
mnncorrect {
includeConfig 'src/utils/conf/h5ad_concatenate.config'
includeConfig 'src/scanpy/scanpy.config'
includeConfig 'src/scanpy/conf/mnncorrect.config'
}
harmony {
includeConfig 'src/utils/conf/h5ad_concatenate.config'
includeConfig 'src/scanpy/scanpy.config'
includeConfig 'src/harmony/harmony.config'
}
@@ -111,9 +114,19 @@ profiles {
includeConfig 'src/star/star.config'
includeConfig 'src/dropletutils/dropletutils.config'
}

cellranger {
includeConfig 'src/cellranger/cellranger.config'
}
cellranger_count {
includeConfig 'src/cellranger/conf/base.config'
includeConfig 'src/cellranger/conf/count.config'
}
cellranger_count_metadata {
includeConfig 'src/cellranger/conf/base.config'
includeConfig 'src/cellranger/conf/count.config'
includeConfig 'src/cellranger/conf/count_metadata.config'
}

// data profiles
tenx {
@@ -122,6 +135,12 @@ profiles {
h5ad {
includeConfig 'src/channels/conf/h5ad.config'
}
tsv {
includeConfig 'src/channels/conf/tsv.config'
}
csv {
includeConfig 'src/channels/conf/csv.config'
}
sra {
includeConfig 'src/channels/conf/sra.config'
includeConfig 'src/utils/conf/sra_metadata.config'
@@ -189,11 +208,17 @@ profiles {
includeConfig 'conf/test__single_sample_scenic_multiruns.config'
}
test__bbknn {
includeConfig 'src/utils/conf/h5ad_concatenate.config'
includeConfig 'conf/test__bbknn.config'
}
test__harmony {
includeConfig 'src/utils/conf/h5ad_concatenate.config'
includeConfig 'conf/test__harmony.config'
}
test__mnncorrect {
includeConfig 'src/utils/conf/h5ad_concatenate.config'
includeConfig 'conf/test__mnncorrect.config'
}

}

13 changes: 13 additions & 0 deletions src/channels/conf/csv.config
@@ -0,0 +1,13 @@
params {
data {
csv {
file_paths = ''
suffix = '.csv'
}
}
sc {
file_converter {
iff = 'csv'
}
}
}
13 changes: 13 additions & 0 deletions src/channels/conf/tsv.config
Original file line number Diff line number Diff line change
@@ -0,0 +1,13 @@
params {
data {
tsv {
file_paths = ''
suffix = '.tsv'
}
}
sc {
file_converter {
iff = 'tsv'
}
}
}
10 changes: 1 addition & 9 deletions src/channels/h5ad.nf → src/channels/file.nf
@@ -1,14 +1,6 @@
nextflow.preview.dsl=2

def extractSample(path, suffix) {
if(!path.endsWith(".h5ad"))
throw new Exception("Wrong channel used for data: "+ path)
// Extract the sample name based on the given path and on the given suffix
suffix = suffix.replace(".","\\.")
pattern = /(.+)\/(.+)${suffix}/
(full, parentDir, id) = (path =~ pattern)[0]
return id
}
include '../utils/processes/files.nf'

workflow getChannel {

2 changes: 1 addition & 1 deletion src/pcacv
2 changes: 1 addition & 1 deletion src/scenic