Merge branch 'main' of https://github.com/gymrek-lab/pheno_sim into new_plotting
RossDeVito committed Oct 31, 2024
2 parents 2b69139 + e2d2f7e commit b7352b9
Showing 30 changed files with 4,172 additions and 2,813 deletions.
100 changes: 100 additions & 0 deletions .github/workflows/tests.yml
@@ -0,0 +1,100 @@
name: Tests

on: [pull_request, workflow_call]

concurrency:
  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref }}
  cancel-in-progress: true

jobs:
  tests:
    name: ${{ matrix.session }} / ${{ matrix.python }} / ${{ matrix.os }}
    runs-on: ${{ matrix.os }}
    strategy:
      fail-fast: false
      matrix:
        include:
          - { python: "3.11", os: "ubuntu-latest", session: "tests" }
          - { python: "3.12", os: "ubuntu-latest", session: "tests" }

    env:
      NOXSESSION: ${{ matrix.session }}
      FORCE_COLOR: "1"
      PRE_COMMIT_COLOR: "always"

    steps:
      - name: Check out the repository
        uses: actions/checkout@v4

      - name: Setup Mambaforge
        uses: conda-incubator/setup-miniconda@v3
        with:
          activate-environment: citrus
          miniforge-variant: Mambaforge
          auto-activate-base: false
          miniforge-version: latest
          use-mamba: true

      - name: Get Date
        id: get-date
        run: echo "today=$(/bin/date -u '+%Y%m%d')" >> $GITHUB_OUTPUT
        shell: bash

      - name: Cache Conda env
        uses: actions/cache@v4
        with:
          path: ${{ env.CONDA }}/envs
          key:
            conda-${{ runner.os }}--${{ runner.arch }}--${{ steps.get-date.outputs.today }}-${{ hashFiles('dev-env.yml') }}-${{ env.CACHE_NUMBER }}
        env:
          # Increase this value to reset cache if dev-env.yml has not changed
          CACHE_NUMBER: 0
        id: cache

      - name: Install dev environment
        run:
          mamba env update -n citrus -f dev-env.yml
        if: steps.cache.outputs.cache-hit != 'true'

      - name: Try to build citrus
        shell: bash -el {0}
        run: |
          poetry build --no-ansi
      - name: Check distribution size
        if: matrix.session == 'size'
        run: |
          du -csh dist/*
          tar -ztvf dist/*.tar.gz
          # check that the generated dist/ directory does not exceed 0.3 MB
          # if this check fails, it's because you forgot to list large files in the "exclude" section of our pyproject.toml
          # https://python-poetry.org/docs/pyproject/#include-and-exclude
          [ $(du -b dist | cut -f1) -lt 300000 ]
      - name: Check code coverage
        if: matrix.session == 'coverage'
        shell: bash -el {0}
        run: |
          export NOXSESSION=tests
          # check that code coverage is not lower than specific percent
          nox --verbose --python=${{ matrix.python }} -- --cov=. --cov-report=term-missing --cov-fail-under=0 # TODO update!!
      - name: Run tests with nox
        if: matrix.session == 'tests'
        shell: bash -el {0}
        run: |
          nox --verbose --python=${{ matrix.python }}
  large-files:
    name: File sizes
    runs-on: ubuntu-latest
    steps:
      - name: Check out the repository
        uses: actions/checkout@v4

      - name: Check for large files
        uses: actionsdesk/[email protected]
        with:
          token: ${{ secrets.GITHUB_TOKEN }} # Optional
          filesizelimit: 500000b
          labelName: large-files
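
The `Check distribution size` step in this workflow passes only while `du -b dist | cut -f1` reports fewer than 300000 bytes. A rough Python equivalent is sketched below for local use. The helper name `dist_size_ok` is hypothetical and not part of the repository, and the sketch sums file sizes only, so it can report slightly less than `du -b`, which also counts the directory entries themselves.

```python
from pathlib import Path


def dist_size_ok(dist_dir: str, limit_bytes: int = 300_000) -> bool:
    """Return True if the files under dist_dir total fewer than limit_bytes.

    Hypothetical helper: approximates `[ $(du -b dist | cut -f1) -lt 300000 ]`
    by summing regular-file sizes (du -b also counts directories).
    """
    total = sum(p.stat().st_size for p in Path(dist_dir).rglob("*") if p.is_file())
    return total < limit_bytes
```

Calling `dist_size_ok("dist")` after `poetry build` gives a quick local check before pushing.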
51 changes: 43 additions & 8 deletions README.md
@@ -1,22 +1,57 @@
# CITRUS🍊: A phenotype simulation tool with the flexibility to model complex interactions

CITRUS, the CIs and Trans inteRaction nUmerical Simulator, is a tool for simulating phenotypes with complex genetic archetectures that go beyond simple models that assume linear, additive contributions of individual SNPs. The goal of this tool is to provide better simulations for benchmarking GWAS/PRS models.
CITRUS, the CIs and Trans inteRaction nUmerical Simulator, is a collection of tools for simulating phenotypes with complex genetic architectures that go beyond simple models that assume linear, additive contributions of individual SNPs. The goal of CITRUS is to provide better simulations for benchmarking GWAS/PRS models.

## Getting Started
The key component of CITRUS is the ability to specify custom models relating genotypes to phenotypes. See the [designing simulations](doc/designing_simulations.md) for details on specifying models. Example models are provided in `example-files/`.

[User Guide](doc/user_guide.md)
CITRUS provides multiple command line utilities for performing and analyzing simulations:

[Designing Simulations](doc/designing_simulations.md)
* [citrus simulate](doc/cli.md#simulate): Perform a simulation using a given model
* [citrus plot](doc/cli.md#plot): Visualize a phenotype model
* [citrus shap](doc/cli.md#shap): Generate SHAP values for a model

[Command Line Interface](doc/cli.md)
## Installation

### With conda

## Installation
**TODO - conda install instructions**

### With pip

**TODO - pip install instructions**

### From source

To install from source (only recommended for development), clone the CITRUS repository and check out the branch you're interested in:

```bash
git clone https://github.com/gymrek-lab/CITRUS.git
cd CITRUS/
pip install .
cd CITRUS
```

Now, create 1) a conda environment with our development tools and 2) a virtual environment with our dependencies and an editable install of CITRUS:

```bash
conda env create -n citrus -f dev-env.yml
conda run -n citrus poetry install
conda activate citrus
```

Note, for plotting models, you will need to have [graphviz](https://graphviz.org/) installed.
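
One way to confirm that Graphviz is reachable before plotting is to look for its `dot` executable on `PATH`. This is a minimal sketch: the helper name `graphviz_available` is hypothetical, and it assumes CITRUS relies on the `dot` binary, which this README does not state explicitly.

```python
import shutil


def graphviz_available(executable: str = "dot") -> bool:
    """Return True if the given Graphviz executable is found on PATH."""
    return shutil.which(executable) is not None


if __name__ == "__main__":
    if not graphviz_available():
        # Assumption: plotting needs the `dot` binary from Graphviz.
        print("Graphviz 'dot' not found on PATH; 'citrus plot' may fail.")
```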

## Quickstart

```bash
# Visualize a model
citrus plot -c example-files/linear_additive.json
```

## Full documentation

[Command Line Interface](doc/cli.md)

[User Guide](doc/user_guide.md)

[Designing Simulations](doc/designing_simulations.md)


6 changes: 6 additions & 0 deletions citrus/__init__.py
@@ -0,0 +1,6 @@
from importlib.metadata import version, PackageNotFoundError

try:
    __version__ = version(__name__)
except PackageNotFoundError:
    __version__ = "unknown"
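
The same fallback pattern can be wrapped in a function so it is easy to exercise outside the package. This is only a sketch; `resolve_version` is a hypothetical name that does not appear in the commit.

```python
from importlib.metadata import PackageNotFoundError, version


def resolve_version(dist_name: str) -> str:
    """Return the installed version of dist_name, or "unknown" if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed (e.g. running from a bare checkout).
        return "unknown"
```

The fallback matters when the code is imported from a source tree that was never installed, since no distribution metadata exists in that case.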
87 changes: 54 additions & 33 deletions cl_tool/cli.py → citrus/cli.py
@@ -1,35 +1,19 @@
"""CITRUS command line interface.
See CITRUS/doc/CLI.md for more information.
This tool can be used to run the simulation based on either:
1. A single configuration JSON file that specifies paths to genotype
data files.
CITRUS_sim -c <path_to_config_file>
2. A single configuration JSON file and a list of paths to genotype
data files. The list of paths must be the same length as the
number of input source files in the configuration file (i.e.
the length of the list under the 'input' key in the JSON). Any
paths in the configuration file will be ignored. The -g or
--genotype_files flag can be used to specify the paths to the
genotype files.
CITRUS_sim -c <path_to_config_file> -g <path_to_genotype_file> \\
<path_to_genotype_file> ...
CITRUS_sim -c <path_to_config_file> -g <path_to_genotype_file>
See CITRUS/doc/CLI.md and individual tools for more information.
"""

import click
import sys

@click.group()
@click.version_option(package_name="citrus", message="%(version)s")
def citrus():
    pass

"""
citrus simulate
"""
@citrus.command(no_args_is_help=True)
@click.option(
    '-c', '--config_file',
@@ -123,6 +107,9 @@ def simulate(
sep="\t" if tsv else ","
)

"""
citrus plot
"""
@citrus.command(no_args_is_help=True)
@click.option(
    '-c', '--config_file',
@@ -137,28 +124,39 @@ def simulate(
    help="Output filename (without extension) for saving plot."
)
@click.option(
    '-f', '--format',
    '-f', '--fmt',
    type=click.Choice(['jpg', 'png', 'svg']),
    default='png',
    show_default=True,
    help="File format and extension for the output plot."
)
def plot(config_file: str, out: str, format: str):
@click.option(
    '--verbose',
    is_flag=True,
    help="Print extra output to the terminal",
    default=False
)
def plot(config_file: str, out: str, fmt: str, verbose: bool):
    """
    Save a plot of the network defined by the simulation config file.
    Note: Colors correspond to cis, inheritance, and trans effects
    """

    from pheno_sim import plot
    from . import plot
    from json import load

    with open(config_file, "r") as f:
        config = load(f)

    # Create a plot of the model
    plot.visualize(input_spec=config, filename=out, img_format=format)
    retcode = plot.visualize(input_spec=config, filename=out,
                             img_format=fmt, verbose=verbose)
    sys.exit(retcode)
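
The new `plot` command forwards the value returned by `plot.visualize` to `sys.exit`, so the shell sees it as the exit status. The pattern can be sketched in isolation as follows. `run_and_exit` is a hypothetical helper, and the sketch assumes `visualize` returns an integer status (or `None` on success), which the diff implies but does not show.

```python
import sys
from typing import Callable, Optional


def run_and_exit(task: Callable[[], Optional[int]]) -> None:
    """Run task and exit the process with its return code.

    A return value of None is treated as success (exit status 0),
    matching how sys.exit(None) behaves.
    """
    retcode = task()
    sys.exit(retcode)
```

Because `sys.exit` raises `SystemExit`, callers such as tests can catch the exception and inspect its `code` attribute instead of terminating.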

"""
citrus shap
"""
@citrus.command(no_args_is_help=True)
@click.option(
    '-c', '--config_file',
@@ -179,6 +177,16 @@ def plot(config_file: str, out: str, format: str):
        " 'input' list and genotypes2.vcf to the second input source)."
    )
)
@click.option(
    '-i', '--included_samples',
    type=str,
    default=None,
    show_default=True,
    help=(
        "Path to file containing sample IDs to include in the SHAP analysis. "
        "File should contain one sample ID per line."
    )
)
@click.option(
    '-s', '--save_path',
    type=str,
@@ -202,12 +210,13 @@ def plot(config_file: str, out: str, format: str):
)
def shap(
    config_file: str,
    genotype_files: str,
    genotype_files: str,
    included_samples: str,
    save_path: str,
    save_config_path: str
):
    """
    Computes the local and global shapley values of a model.
    Computes the local shapley values of a model.
    """
    from pheno_sim import PhenoSimulation
    from pheno_sim.shap import run_SHAP
@@ -224,9 +233,21 @@ def shap(
for i, path in enumerate(genotype_files):
config['input'][i]['file'] = path

    run_SHAP(
        simulation,
        phenotype_key,
        save_path,
        save_config_path,
    )
    # Load optional sample IDs
    if included_samples:
        with open(included_samples, "r") as f:
            included_samples = [line.strip() for line in f] # type: ignore
        run_SHAP(
            simulation,
            phenotype_key,
            included_samples,
            save_path,
            save_config_path,
        )
    else:
        run_SHAP(
            simulation,
            phenotype_key,
            save_path,
            save_config_path,
        )
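
The sample-ID loading in `shap` could be factored into a small helper, sketched below. `load_sample_ids` is a hypothetical name that is not in the commit. Unlike the loop in the diff, this version skips blank lines, so a trailing newline in the file does not produce an empty sample ID. How the result is passed to `run_SHAP` is left out because its signature is not shown here.

```python
from typing import List, Optional


def load_sample_ids(path: Optional[str]) -> Optional[List[str]]:
    """Read one sample ID per line; return None when no path is given.

    Surrounding whitespace is stripped and blank lines are skipped.
    """
    if not path:
        return None
    with open(path, "r") as f:
        return [line.strip() for line in f if line.strip()]
```

Returning `None` when no file is given lets the caller keep a single code path and decide separately whether to pass the list on.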