APAlyzer utilizes the PAS (polyadenylation sites) collection in the PolyA_DB database to examine APA (alternative polyadenylation) events in all genomic regions, including 3′UTRs and introns.
Required files are specified in the input sample sheet, config/samples.csv.
Each row in the sample sheet has three columns:
- condition: name of the condition (e.g. control)
- sample: name of the sample (e.g. control_replicate1)
- bam: path to the BAM input file for the sample, either relative to the APAlyzer working directory or absolute
Samples of the same condition must use exactly the same name in the condition column, because APAlyzer groups samples by condition for processing.
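As an illustration, a sample sheet with two conditions might look like the one written below. This is a sketch only: the header row, the column order, the sample names, and the BAM paths are assumptions made for the example, and `count_per_condition` is a hypothetical helper, not part of the workflow. Counting samples per condition is a quick way to spot a misspelled condition name, which would appear as an unexpected extra group.

```shell
# Hypothetical sample sheet; sample names and BAM paths are placeholders.
sheet="$(mktemp)"
cat > "$sheet" <<'EOF'
condition,sample,bam
control,control_replicate1,bams/control_replicate1.bam
control,control_replicate2,bams/control_replicate2.bam
knockdown,knockdown_replicate1,bams/knockdown_replicate1.bam
knockdown,knockdown_replicate2,bams/knockdown_replicate2.bam
EOF

# Count samples per condition, skipping the assumed header row.
# A typo in a condition name shows up as an additional group.
count_per_condition() {
  awk -F',' 'NR > 1 { n[$1]++ } END { for (c in n) print c "\t" n[c] }' "$1" | sort
}

count_per_condition "$sheet"
```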
Parameters used to run APAlyzer are specified in config/config.APAlyzer.yaml.
In the config file, users can specify the output directory and the output file name with out_dir and differential_output_file.
In addition, the relative path from the working directory to the input sample file from the previous step is specified with the parameter sample_file.
Other parameters that should be set for each run are the path to the GTF annotation file, together with its organism, genome version, and Ensembl version:
gtf, gtf_organism, gtf_genome_version, gtf_ensemble_version.
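Before starting a run, it can help to confirm that the config file sets every parameter named above. The sketch below writes a hypothetical config and checks it. Only the key names come from this README: all values are placeholders whose accepted format is not documented here, the flat `key: value` layout is an assumption, and `missing_keys` is an illustrative helper rather than part of the workflow.

```shell
# Hypothetical config; every value is a placeholder, only the keys come from this README.
config="$(mktemp)"
cat > "$config" <<'EOF'
out_dir: results
differential_output_file: apalyzer_differential.tsv
sample_file: config/samples.csv
gtf: annotation/annotation.gtf
gtf_organism: mouse
gtf_genome_version: mm10
gtf_ensemble_version: 87
EOF

# Print each expected parameter that is not set as a top-level key.
missing_keys() {
  local key
  for key in out_dir differential_output_file sample_file gtf \
             gtf_organism gtf_genome_version gtf_ensemble_version; do
    grep -q "^${key}:" "$1" || echo "$key"
  done
}

missing_keys "$config"
```

An empty result means all seven parameters are present.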
To run the method workflow, first activate the apaeval conda environment by following the instructions in the APAeval README.
Before running, you can perform a 'dry run' to check which steps will be run and where output files will be generated given the provided parameters and input sample file:
bash dryrun.sh
To run the workflow locally, you can use the provided wrapper script run_local.sh, which executes the workflow with Singularity.
bash run_local.sh
Note: The run_local.sh script is currently set up to run with the APAeval test data.
If you have specified absolute paths in your sample sheet (e.g. config/samples.csv) or the config file (config/config.APAlyzer.yaml), or have input data that is not in the current directory, you will need to modify the Singularity bind arguments so that the input files are available to the container.
For example, the path to the input GTF file is /share/annotation/annotation.gtf, and the current working directory is /home/sam/DaPars2_snakemake/.
Modify the --singularity-args line in run_local.sh as shown below to ensure the file is available to the container:
--singularity-args="--bind /share/" \
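When several inputs live outside the working directory, the bind list can be derived from their paths. The following is a minimal sketch, not part of the workflow: `bind_arg_for` is a hypothetical helper and the example paths are placeholders. It relies on Singularity's `--bind` option accepting a comma-separated list of paths, and it assumes relative paths sit under the working directory and therefore need no extra bind.

```shell
# Print a --bind option covering the parent directories of all absolute paths given.
bind_arg_for() {
  local dirs
  dirs="$(for path in "$@"; do
    case "$path" in
      /*) dirname "$path" ;;   # absolute path: its directory must be bound
    esac                       # relative paths are skipped (assumed under the working directory)
  done | sort -u | paste -sd, -)"
  # Print nothing when no absolute paths were given.
  [ -n "$dirs" ] && printf '%s\n' "--bind $dirs"
  return 0
}

bind_arg_for /share/annotation/annotation.gtf /data/bams/control_replicate1.bam config/samples.csv
```

The printed value can then be placed inside the quotes of the `--singularity-args` line in run_local.sh.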
Once you are satisfied with the bind arguments, run the workflow locally with bash run_local.sh.
The output of APAlyzer qualifies for the differential challenge.
The output is postprocessed into a TSV file with one column of gene IDs and one column of p-values. The file is written to the out_dir specified in the config file config/config.APAlyzer.yaml.
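A typical next step is to pull out the genes below a p-value cutoff. The sketch below assumes the file has exactly two tab-separated columns (gene ID, then p-value). The gene IDs, p-values, and the `significant_genes` helper are made up for illustration, and rows whose second field is not numeric (for example a header line or NA) are skipped.

```shell
# Hypothetical postprocessed output; gene IDs and p-values are invented.
result="$(mktemp)"
printf 'ENSMUSG00000000001\t0.003\nENSMUSG00000000002\t0.41\nENSMUSG00000000003\t0.049\n' > "$result"

# Print gene IDs whose p-value is below the cutoff (second argument, default 0.05).
significant_genes() {
  awk -F'\t' -v cutoff="${2:-0.05}" \
    '$2 ~ /^[0-9.eE+-]+$/ && $2 + 0 < cutoff + 0 { print $1 }' "$1"
}

significant_genes "$result"
```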
The rulegraph gives an overview of the steps of the workflow.
To obtain it, adapt and run the rulegraph.sh script.
The current rulegraph is:
If you have any questions or comments about APAlyzer, please contact Dr. Ruijia Wang ([email protected]).