Merge pull request #7 from jlab/dev
dev
tensulin authored Nov 11, 2024
2 parents e2a70d2 + 941574e commit 4b11aa2
Showing 10 changed files with 561 additions and 94 deletions.
6 changes: 5 additions & 1 deletion .github/workflows/github_tests.yml
@@ -13,6 +13,10 @@ jobs:
steps:
- name: Checkout Repo
uses: actions/checkout@v2
with:
lfs: true
- name: Checkout LFS objects
run: git lfs checkout

- name: Set up Python
uses: actions/setup-python@v2
@@ -27,7 +31,7 @@ jobs:
- name: Run tests with pytest
run: |
$CONDA/bin/pytest tests --doctest-modules --cov=src/marbel --cov-report=xml
$CONDA/bin/pytest tests --doctest-modules --cov=src/marbel --cov-report=xml
- name: Convert coverage to lcov format
run: |
139 changes: 107 additions & 32 deletions README.md
@@ -2,47 +2,72 @@

# marbel (MetAtranscriptomic Reference Builder Evaluation Library)

This project generates an in silico metatranscriptomic dataset based on specified parameters.
This project generates an *in silico* metatranscriptomic dataset based on specified parameters.

## Installation

### Conda build and install (recommended)
### Install guide for development purposes

It is recommended to install the package with conda install.
#### Install git-lfs (absolutely necessary)

Build the package with:
Before cloning the repo you need to have git-lfs installed! If you do not have git-lfs yet and have root rights, install it with:

`conda build . `
```
sudo apt-get install git-lfs
```

For this you need to have conda-build installed `(conda install conda-build`)
If you already cloned the repo, remove it, install git-lfs and clone again.
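
A clone made without git-lfs leaves small text stubs in place of the large data files. The following is a minimal sketch for detecting such a stub, assuming the standard git-lfs pointer format, whose first line begins with `version https://git-lfs.github.com/spec/v1`. The helper name `is_lfs_pointer` is hypothetical and not part of marbel.

```python
# Prefix that opens every git-lfs pointer file (standard pointer format).
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """Return True if the file at `path` looks like an unresolved git-lfs stub.

    Hypothetical helper for illustration; it only inspects the first bytes.
    """
    with open(path, "rb") as handle:
        head = handle.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX
```

If this returns `True` for a file under `src/marbel/data/`, the data was not fetched and the repository should be cloned again after installing git-lfs.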

Create new environment and install package:
#### Install miniconda (if not installed already)

```
conda create -n marbel
conda activate marbel
conda install --use-local marbel
```
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
### Install by hand (for development purposes)
bash Miniconda3-latest-Linux-x86_64.sh
```

You need to install [R](https://www.r-project.org/about.html) and the R library polyester. Polyester can be installed with
#### Create conda env

```
R
if (!require("BiocManager", quietly = TRUE))
install.packages("BiocManager")
BiocManager::install("polyester")
conda create -n marbel python=3.10 r-base
conda activate marbel
```

#### Install g++ (Optional, for performance)

```
sudo apt-get install g++
```

#### Clone repository

Install the package:
git clone https://github.com/jlab/marbel.git

#### Install the package:

```
cd marbel
pip install -e .
```

### (Not ready, this is for later) Conda build and install

It is recommended to install the package with conda install.

Build the package with:

`conda build . `

For this you need to have conda-build installed (`conda install conda-build`).

Create new environment and install package:

```
conda create -n marbel
conda activate marbel
conda install --use-local marbel
```

## Usage

To get help on how to use the script, run:
@@ -54,20 +79,70 @@ marbel --help
### Command Line Arguments

```
Usage: marbel [OPTIONS]
Options:
--n-species INTEGER Number of species to be drawn for the metatranscriptomic in silico dataset [default: 20]
--n-orthogroups INTEGER Number of orthologous groups to be drawn for the metatranscriptomic in silico dataset [default: 1000]
--n-samples <INTEGER INTEGER>... Number of samples to be created for the metatranscriptomic in silico dataset. The first number is the number of samples for group 1 and the second is the number of samples for group 2 [default: 10, 10]
--outdir TEXT Output directory for the metatranscriptomic in silico dataset [default: simulated_reads]
--max-phylo-distance TEXT Maximum mean phylogenetic distance for orthologous groups. Specify stricter limit to avoid groups with a more diverse phylogenetic distance. [default: None]
--min-identity FLOAT Minimum mean sequence identity score for orthologous groups. Specify for more stringent identity requirements. [default: None]
--deg-ratio <FLOAT FLOAT>... Ratio of up- and down-regulated genes. The first value is the ratio of up-regulated genes, the second represents the ratio of down-regulated genes [default: 0.1, 0.1]
--seed INTEGER Seed for sampling. Set for reproducibility [default: None]
--read-length INTEGER Read length for the generated reads [default: 100]
--output-format [fastq.gz|fastq|fasta] Output format for the reads [default: fastq.gz]
--version Show the version and exit.
--help Show this message and exit.
# Usage: marbel [OPTIONS]
## Options:
- `--n-species` **INTEGER**
Number of species to be drawn for the metatranscriptomic in silico dataset.
**[default: 20]**
- `--n-orthogroups` **INTEGER**
Number of orthologous groups to be drawn for the metatranscriptomic in silico dataset.
**[default: 1000]**
- `--n-samples` **<INTEGER INTEGER>...**
Number of samples to be created for the metatranscriptomic in silico dataset. The first number represents the number of samples for group 1, and the second is for group 2.
**[default: 10, 10]**
- `--outdir` **TEXT**
Output directory for the metatranscriptomic in silico dataset.
**[default: simulated_reads]**
- `--max-phylo-distance` **[phylum|class|order|family|genus]**
Maximum mean phylogenetic distance for orthologous groups. Specify a stricter limit to avoid groups with a more diverse phylogenetic distance.
**[default: None]**
- `--min-identity` **FLOAT**
Minimum mean sequence identity score for orthologous groups. Specify for more stringent identity requirements.
**[default: None]**
- `--dge-ratio` **FLOAT**
Ratio of up- and down-regulated genes. The first value is the ratio of up-regulated genes, and the second represents the ratio of down-regulated genes.
**[default: 0.1]**
- `--seed` **INTEGER**
Seed for sampling. Set for reproducibility.
**[default: None]**
- `--error-model` **[basic|perfect|HiSeq|NextSeq|NovaSeq|Miseq-20|Miseq-24|Miseq-28|Miseq-32]**
Sequencer model for the reads. Use `basic` or `perfect` (no errors) for custom read length.
**[default: HiSeq]**
- `--compressed / --no-compressed`
Compress the output FASTQ files.
**[default: compressed]**
- `--read-length` **INTEGER**
Read length for the generated reads. Only available when using `error_model` basic or perfect.
**[default: None]**
- `--library-size` **INTEGER**
Library size for the reads.
**[default: 100000]**
- `--library-size-distribution` **[poisson|uniform|negative_binomial]**
Distribution for the library size.
**[default: uniform]**
- `--threads` **INTEGER**
Number of threads to be used.
**[default: 10]**
- `--version`
Show the version and exit.
- `--help`
Show this message and exit.
```
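
The option list states that `--read-length` is only available with the `basic` or `perfect` error model. The sketch below assembles a marbel command line from Python and checks that constraint before running anything. The function `build_marbel_command` is hypothetical, the flags are taken from the list above, and passing `--n-samples` as two values after the flag is an assumption based on its `<INTEGER INTEGER>` signature. How marbel itself enforces the constraint is not shown in this commit.

```python
# Error models that, per the option list, allow a custom read length.
CUSTOM_LENGTH_MODELS = {"basic", "perfect"}


def build_marbel_command(n_species=20, n_orthogroups=1000, n_samples=(10, 10),
                         error_model="HiSeq", read_length=None, seed=None,
                         outdir="simulated_reads"):
    """Return an argument list for the marbel CLI (hypothetical helper)."""
    if read_length is not None and error_model not in CUSTOM_LENGTH_MODELS:
        raise ValueError(
            "--read-length requires --error-model basic or perfect, "
            f"got {error_model!r}"
        )
    cmd = [
        "marbel",
        "--n-species", str(n_species),
        "--n-orthogroups", str(n_orthogroups),
        # Assumption: the two group sizes follow the flag as separate values.
        "--n-samples", str(n_samples[0]), str(n_samples[1]),
        "--outdir", outdir,
        "--error-model", error_model,
    ]
    if read_length is not None:
        cmd += ["--read-length", str(read_length)]
    if seed is not None:
        cmd += ["--seed", str(seed)]
    return cmd
```

The returned list can be handed to `subprocess.run(cmd, check=True)`; setting `seed` keeps repeated runs reproducible, as the `--seed` description notes.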

5 changes: 5 additions & 0 deletions environment.yml
@@ -4,7 +4,12 @@ channels:
- conda-forge
- defaults
dependencies:
- pandas
- numpy
- flake8
- pytest
- pytest-cov
- coverage >= 6 # to ensure lcov option is available
- pip:
- ./

5 changes: 4 additions & 1 deletion pyproject.toml
@@ -4,7 +4,7 @@ build-backend = "hatchling.build"

[project]
name = "marbel"
version = "0.0.1"
version = "0.0.2"
authors = [
{ name="Timo Wentong Lin", email="[email protected]" },

@@ -41,3 +41,6 @@ include = [
"src/marbel/data/orthologues_processed_combined_all.parquet",
"src/marbel/data/EDGAR_all_species.newick",
]

[tool.hatch.metadata]
allow-direct-references = true
5 changes: 3 additions & 2 deletions requirements.txt
@@ -1,8 +1,9 @@
arviz==0.18.0
arviz
pymc
typer
rpy2
biopython
pyarrow
typing_extensions
ete3
ete3
InSilicoSeq @ git+https://github.com/jlab/InSilicoSeq.git
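
The `InSilicoSeq @ git+https://...` entry is a direct reference: a package name followed by `@` and a URL instead of a version specifier. Hatchling rejects direct references in project dependencies unless `allow-direct-references = true` is set, which is presumably why the `[tool.hatch.metadata]` table was added to `pyproject.toml` in this commit. A minimal sketch of telling the two requirement forms apart follows; `split_direct_reference` is a hypothetical helper and does not handle the full requirement grammar (extras, environment markers).

```python
def split_direct_reference(requirement):
    """Split 'name @ url' into (name, url); url is None for a plain requirement.

    Hypothetical helper: splits at the first '@' only, so an '@' later in the
    URL (for example a branch suffix) stays part of the URL.
    """
    name, sep, url = requirement.partition("@")
    if not sep:
        return requirement.strip(), None
    return name.strip(), url.strip()
```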