test: standardized usage (#118)
* docs: update main README with new test paths

* ci: update tests

* refactor: add .upper() to adapter retrieve

* docs: update main README

* refactor: add convert_lib_format function

* style: apply snakemake formatting

* ci: swap conda and singularity tests
deliaBlue authored Nov 16, 2023
1 parent 7d70304 commit 4ce5e60
Showing 9 changed files with 249 additions and 34 deletions.
51 changes: 38 additions & 13 deletions .github/workflows/tests.yml
@@ -48,17 +48,8 @@ jobs:
        working-directory: ./scripts
        run: pylint --rcfile=../pylint.cfg ./*.py

      - name: Check workflow descriptor files for lints
        working-directory: ./test
        run: |
          snakemake --snakefile="../workflow/Snakefile" --configfile="config.yaml" --lint
          snakemake --snakefile="../workflow/rules/common.smk" --configfile="config.yaml" --lint
          snakemake --snakefile="../workflow/rules/prepare.smk" --configfile="config.yaml" --lint
          snakemake --snakefile="../workflow/rules/map.smk" --configfile="config.yaml" --lint
          snakemake --snakefile="../workflow/rules/quantify.smk" --configfile="config.yaml" --lint
  snakemake-test:

  snakemake-format-graph-test:
    runs-on: ubuntu-latest
    defaults:
      run:
@@ -77,17 +68,51 @@ jobs:
          environment-file: environment.yml
          auto-activate-base: false

      - name: update mirflowz env with root packages
        run: mamba env update -n mirflowz -f environment.root.yml
      - name: update mirflowz env with dev packages
        run: mamba env update -n mirflowz -f environment.dev.yml

      - name: display environment info
        run: |
          conda info -a
          conda list
      - name: run test for snakemake format
        run: bash test/test_snakefmt.sh

      - name: run test for snakemake lint
        run: bash test/test_snakemake_lint.sh

      - name: run test for rule graph
        run: bash test/test_rule_graph.sh


  snakemake-integration-test:
    runs-on: ubuntu-latest
    defaults:
      run:
        shell: bash -l {0}

    steps:

      - name: check out repository
        uses: actions/checkout@v4

      - name: setup Conda/Mamba
        uses: conda-incubator/setup-miniconda@v2
        with:
          mamba-version: "*"
          activate-environment: mirflowz
          environment-file: environment.yml
          auto-activate-base: false

      - name: update mirflowz env with root packages
        run: mamba env update -n mirflowz -f environment.root.yml

      - name: display environment info
        run: |
          conda info -a
          conda list
      - name: run local test with Singularity
        run: bash test/test_workflow_local_with_singularity.sh

5 changes: 5 additions & 0 deletions .snakemake-workflow-catalog.yml
@@ -0,0 +1,5 @@
usage:
  software-stack-deployment:
    conda: true
    singularity: true
  report: true
32 changes: 16 additions & 16 deletions README.md
@@ -21,7 +21,7 @@ _MIRFLOWZ_ is a [Snakemake][snakemake] workflow for mapping miRNAs and isomiRs.
## Installation

The workflow lives inside this repository and will be available for you to run
after following the installation instructions layed out in this section.
after following the installation instructions laid out in this section.

### Cloning the repository

@@ -58,8 +58,8 @@ conda env create -f environment.yml
conda activate mirflowz
```

If you plan to run _MIRFLOWZ_ via Conda, we recommend to use the following
command for a faster environment creation specially if it you will run it on a
If you plan to run _MIRFLOWZ_ via Conda, we recommend using the following
command for a faster environment creation, especially if you will run it on an
HPC cluster.

```bash
@@ -157,7 +157,7 @@ tested, you can go ahead and run the workflow on your samples.

It is suggested to have all the input files for a given run (or hard links
pointing to them) inside a dedicated directory, for instance under the
_MIRFLOWZ_ root directory. This way it is easier to keep the data together,
_MIRFLOWZ_ root directory. This way, it is easier to keep the data together,
reproduce an analysis and set up Singularity access to them.

#### 1. Prepare a sample table
@@ -170,13 +170,12 @@ touch path/to/your/sample/table.tsv
```
> Fill the sample table according to the following requirements:
>
> - `sample`. This column contains the library name.
> - `sample_file`. In this column, you must provide the path to the library file.
> The path must be relative to the working directory.
> - `adapter`. This field must contain the adapter sequence in capital letters.
> - `format`. In this field you must state the library format. It can either be
> `fa` if providing a FASTA file or `fastq` if the library is a FASTQ file.
>
> - `sample`. Arbitrary name for the miRNA sequencing library.
> - `sample_file`. Path to the miRNA sequencing library file. The path must be
> relative to the directory where the workflow will be run.
> - `adapter`. Sequence of the 3'-end adapter used during library preparation.
> - `format`. One of `fa`/`fasta` or `fq`/`fastq`, if the library file is in
> FASTA or FASTQ format, respectively.
#### 2. Prepare genome resources

@@ -190,15 +189,16 @@ There are 4 files you must provide:

> _MIRFLOWZ_ expects both the reference sequence and gene annotation files to
> follow [Ensembl][ensembl] style/formatting. If you obtained these files from
> a source other than Ensembl, you may first need to convert them to the
> expected style to avoid issues!
> a source other than Ensembl, you must ensure that they adhere to the
> expected format by converting them, if necessary.
3. An **uncompressed GFF3** file with **microRNA annotations** for the reference
sequences above.

> _MIRFLOWZ_ expects the miRNA annotations to follow [miRBase][mirbase]
> style/formatting. If you obtained this file from a source other than miRBase,
> you may first need to convert it to the expected style to avoid issues!
> you must ensure that it adheres to the expected format by converting it, if
> necessary.
4. An **uncompressed tab-separated file** with a **mapping between the
reference names** used in the miRNA annotation file (column 1; "UCSC style")
Expand All @@ -223,7 +223,7 @@ cp config/config_template.yaml path/to/config.yaml

Open the new copy in your editor of choice and adjust the configuration
parameters to your liking. The template explains what each of the
parameters means and how you can meaningfully adjust them.
parameters mean and how you can meaningfully adjust them.

### Running the workflow

@@ -243,7 +243,7 @@ snakemake \
```

> **NOTE:** Depending on your working directory, you do not need to use the
> parameters `--snakefile` and `--configfile`. For instance, if the `Snakefile`
> parameters `--snakefile` and `--configfile`. For instance, if the `Snakefile`
> is in the same directory or the `workflow/` directory is beneath the current
> working directory, there's no need for the `--snakefile` parameter. Refer to
> the [Snakemake documentation][snakemakeDocu] for more information.
125 changes: 125 additions & 0 deletions config/README.md
@@ -0,0 +1,125 @@
# Dependencies installation

Create and activate the virtual environment with the required dependencies
with Conda:

```bash
conda env create -f environment.yml
conda activate mirflowz
```

If you plan to run _MIRFLOWZ_ via Conda, we recommend using the following
command for a faster environment creation, especially if you will run it on an
HPC cluster.

```bash
conda config --set channel_priority strict
```

For a faster creation of the environment (and Conda environments in general),
you can also install [Mamba][mamba] on top of Conda. In that case, replace
`conda` with `mamba` in the commands above (particularly in
`conda env create`).

## Running _MIRFLOWZ_ with Singularity

If you want to run _MIRFLOWZ_ via Singularity and do not already
have it installed globally on your system, you must further update the Conda
environment with:

```bash
conda env update -f environment.root.yml
```

> Mind that you must have the environment activated and root permissions on
> your system to install Singularity. If you want to run _MIRFLOWZ_ on an HPC
> cluster (recommended in almost all cases), ask your system administrator
> about Singularity.
# Run the workflow on your own samples

In order to run _MIRFLOWZ_ on your own samples, we recommend having all the
input files inside a dedicated directory. This way, it is easier to keep the
data together and reproduce an analysis. Assuming that your current directory
is the repository's root directory, create a directory to store all your data
and change into it with:

```bash
mkdir path/to/your_run
cd path/to/your_run
```

## 1. Prepare the sample table

Create an empty sample table. Refer to the
[samples_table.tsv](../test/test_files/samples_table.tsv) test file to see what
the table must look like, or use it as a template.

```bash
touch samples.tsv
```

> Fill the sample table according to the following requirements:
>
> - `sample`. Arbitrary name for the miRNA sequencing library.
> - `sample_file`. Path to the miRNA sequencing library file. The path must be
> relative to the directory where the workflow will be run.
> - `adapter`. Sequence of the 3'-end adapter used during library preparation.
> - `format`. One of `fa`/`fasta` or `fq`/`fastq`, if the library file is in
> FASTA or FASTQ format, respectively.
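
The requirements above can be checked mechanically before starting a run. The following is a minimal sketch and not part of _MIRFLOWZ_: the function name `check_sample_table`, the assumption that the table starts with a header row using exactly these column names, and the `ACGTN` adapter alphabet are all illustrative. The accepted format labels mirror the keys of `convert_lib_format` in `workflow/rules/common.smk`.

```python
import csv
import io

REQUIRED_COLUMNS = ("sample", "sample_file", "adapter", "format")
# Labels accepted by convert_lib_format in workflow/rules/common.smk
ALLOWED_FORMATS = {"fa", "fasta", "FASTA", "fq", "fastq", "FASTQ"}


def check_sample_table(handle) -> list:
    """Return a list of human-readable problems found in a sample table.

    Hypothetical helper; assumes a tab-separated table with a header row.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    header = reader.fieldnames or []
    missing = [col for col in REQUIRED_COLUMNS if col not in header]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    # Row 1 is the header, so data rows are numbered from 2.
    for row_no, row in enumerate(reader, start=2):
        lib_format = (row["format"] or "").strip()
        adapter = (row["adapter"] or "").strip().upper()
        sample_file = (row["sample_file"] or "").strip()
        if lib_format not in ALLOWED_FORMATS:
            problems.append(f"row {row_no}: unknown format {lib_format!r}")
        # Assumption: adapters consist of the letters A, C, G, T and N only.
        if not adapter or set(adapter) - set("ACGTN"):
            problems.append(f"row {row_no}: adapter is not a nucleotide sequence")
        if sample_file.startswith("/"):
            problems.append(f"row {row_no}: sample_file must be a relative path")
    return problems


# Made-up example table; the second library has two problems.
example = io.StringIO(
    "sample\tsample_file\tadapter\tformat\n"
    "lib_1\tdata/lib_1.fa.gz\tAACTGTAGGCACCATCAAT\tfa\n"
    "lib_2\t/abs/lib_2.fq.gz\taactgtaggc\tbam\n"
)
for problem in check_sample_table(example):
    print(problem)
```

Lowercase adapters pass the check because the workflow upper-cases the adapter when retrieving it.
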
## 2. Prepare the genome resources

There are 4 files you must provide:

1. A **`gzip`ped FASTA** file containing **reference sequences**, typically the
genome of the source/organism from which the library was extracted.

2. A **`gzip`ped GTF** file with matching **gene annotations** for the
reference sequences above.

> _MIRFLOWZ_ expects both the reference sequence and gene annotation files to
> follow [Ensembl][ensembl] style/formatting. If you obtained these files from
> a source other than Ensembl, you must ensure that they adhere to the
> expected format by converting them, if necessary.
3. An **uncompressed GFF3** file with **microRNA annotations** for the reference
sequences above.

> _MIRFLOWZ_ expects the miRNA annotations to follow [miRBase][mirbase]
> style/formatting. If you obtained this file from a source other than miRBase,
> you must ensure that it adheres to the expected format by converting it, if
> necessary.

4. An **uncompressed tab-separated file** with a **mapping between the
reference names** used in the miRNA annotation file (column 1; "UCSC style")
and in the gene annotations and reference sequence files (column 2; "Ensembl
style"). Values in column 1 are expected to be unique, no header is
expected, and any additional columns will be ignored. [This
resource][chrMap] provides such files for various organisms, and in the
expected format.
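
The expectations for the mapping file in item 4 (no header, unique values in column 1, extra columns ignored) can be illustrated with a short reader. This is a minimal sketch under those stated rules only; the function name `read_chr_map` is hypothetical, and skipping empty lines is an assumption rather than documented _MIRFLOWZ_ behavior.

```python
import io


def read_chr_map(handle) -> dict:
    """Read a header-less, tab-separated mapping of reference names.

    Column 1 (e.g. UCSC style) maps to column 2 (e.g. Ensembl style).
    Hypothetical helper, not part of the workflow.
    """
    mapping = {}
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # assumption: empty lines are skipped
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"line {line_no}: expected at least 2 columns")
        source, target = fields[0], fields[1]  # further columns are ignored
        if source in mapping:
            raise ValueError(
                f"line {line_no}: duplicate name {source!r} in column 1"
            )
        mapping[source] = target
    return mapping


# Made-up example content with an extra third column on the first line.
example = io.StringIO("chr1\t1\textra\nchrM\tMT\n")
print(read_chr_map(example))
```
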

> General note: If you want to process the genome resources before use (e.g.,
> filtering), you can do that, but make sure the formats of any modified
> resource files meet the formatting expectations outlined above!
## 3. Prepare the configuration file

We recommend creating a copy of the
[configuration file template](config_template.yaml).

```bash
cp ../config/config_template.yaml config.yaml

```

Open the new copy in your editor of choice and adjust the configuration
parameters to your liking. The template explains what each of the parameters
means and how you can meaningfully adjust them.


[chrMap]: <https://github.com/dpryan79/ChromosomeMappings>
[ensembl]: <https://ensembl.org/>
[mamba]: <https://github.com/mamba-org/mamba>
[mirbase]: <https://mirbase.org/>
2 changes: 1 addition & 1 deletion config/config_schema.json
@@ -99,4 +99,4 @@
"default": ["isomir", "mirna", "pri-mir"]
}
}
}
}
20 changes: 20 additions & 0 deletions test/test_snakefmt.sh
@@ -0,0 +1,20 @@
#!/bin/bash

# Tear down test environment
cleanup () {
    rc=$?
    cd "$user_dir"
    echo "Exit status: $rc"
}
trap cleanup EXIT

# Set up test environment
set -eo pipefail # ensures that script exits at first command that exits with non-zero status
set -u # ensures that script exits when unset variables are used
set -x # facilitates debugging by printing out executed commands
user_dir=$PWD
script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" >/dev/null 2>&1 && pwd)"
cd "$script_dir"

# Run tests
snakefmt --check -l 80 ../workflow
26 changes: 26 additions & 0 deletions test/test_snakemake_lint.sh
@@ -0,0 +1,26 @@
#!/bin/bash

# This script is currently exiting with non-zero status.
# This is expected behaviour though, as several parameters can't be inferred from the test files.

# Tear down test environment
cleanup () {
    rc=$?
    cd "$user_dir"
    echo "Exit status: $rc"
}
trap cleanup EXIT

# Set up test environment
set -eo pipefail # ensures that script exits at first command that exits with non-zero status
set -u # ensures that script exits when unset variables are used
set -x # facilitates debugging by printing out executed commands
user_dir=$PWD
script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" >/dev/null 2>&1 && pwd)"
cd "$script_dir"

# Run tests
snakemake \
    --snakefile="../workflow/Snakefile" \
    --configfile="config.yaml" \
    --lint
13 changes: 13 additions & 0 deletions workflow/rules/common.smk
@@ -11,3 +11,16 @@ def get_sample(column_id: str, sample_id: int = None) -> str:
        )
    else:
        return str(samples_table[column_id].iloc[0])


def convert_lib_format(lib_format: str) -> str:
    """Convert library file format."""
    formats = {
        "fa": "fa",
        "fasta": "fa",
        "FASTA": "fa",
        "fq": "fastq",
        "fastq": "fastq",
        "FASTQ": "fastq",
    }
    return formats[lib_format]
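
The lookup in `convert_lib_format` raises a bare `KeyError` for any spelling that is not listed, for example `Fasta`. One possible alternative is sketched below, assuming that mixed-case spellings should be accepted and that a `ValueError` with an explanatory message is preferred. The name `convert_lib_format_relaxed` is hypothetical and not part of this commit.

```python
def convert_lib_format_relaxed(lib_format: str) -> str:
    """Map a library format label to 'fa' or 'fastq', ignoring case.

    Hypothetical variant of convert_lib_format; not part of the commit.
    """
    formats = {
        "fa": "fa",
        "fasta": "fa",
        "fq": "fastq",
        "fastq": "fastq",
    }
    # Normalize once instead of listing every upper-/lowercase spelling.
    key = lib_format.strip().lower()
    if key not in formats:
        raise ValueError(
            f"Unsupported library format {lib_format!r}; "
            f"expected one of {sorted(formats)}"
        )
    return formats[key]


print(convert_lib_format_relaxed("Fasta"))  # prints "fa"
```

Normalizing the key keeps the dictionary at four entries, at the cost of accepting spellings that the committed function rejects.
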