cleaned up and added readme file
AyushSemwal committed Jan 28, 2025
1 parent 4706486 commit 9dd6ccf
Showing 43 changed files with 382 additions and 3,802 deletions.
65 changes: 65 additions & 0 deletions README.md
@@ -0,0 +1,65 @@
# Tranquilizer

Tranquilizer is a deep learning (DL) based tool for single-cell long-read data, including **TRANQUIL-seq** and **scNanoRNASeq**. It annotates reads, visualizes the annotated reads, demultiplexes them, and generates some initial QC plots.

## I/O

The input is a directory containing all the raw reads in any of the following formats: **fasta/fa/fasta.gz/fa.gz/fastq/fq/fastq.gz/fq.gz**. The tool generates demultiplexed FASTA files, **.parquet** files of valid and invalid annotated reads, and some QC .pdf files. These outputs are produced in two separate steps/commands (`preprocessfasta` and `annotate-reads`). Check out the examples directory for both **TRANQUIL-seq** and **scNanoRNASeq** datasets.

## Usage

### <ins>Preprocessing</ins>

To enhance the efficiency of the annotation process, Tranquilizer organizes raw reads into separate .parquet files, grouping them based on their lengths. This approach optimizes data compression within each bin, accelerates the annotation of the entire dataset, and facilitates the visualization of user-specified annotated reads without dependence on the annotation status of the complete dataset.

Example usage:

```console
tranquilizer preprocessfasta /path/to/RAW_DATA/directory /path/to/OUTPUT/directory CPU_THREADS
```

It is recommended that you follow the directory structure used in the examples.
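
The length-based grouping described above can be pictured with a minimal sketch. The bin width, the bin labels, and the function name `bin_reads_by_length` are assumptions made for illustration; they are not taken from Tranquilizer's code, which writes its bins to .parquet files rather than keeping them in memory.

```python
from collections import defaultdict


def bin_reads_by_length(reads, bin_width=500):
    """Group (name, sequence) pairs into fixed-width length bins.

    `bin_width` is a hypothetical parameter; the real tool may choose
    its bin boundaries differently.
    """
    bins = defaultdict(list)
    for name, seq in reads:
        # Lower edge of the bin this read falls into, e.g. 730 -> 500.
        lower = (len(seq) // bin_width) * bin_width
        label = f"{lower}_{lower + bin_width - 1}bp"
        bins[label].append((name, seq))
    return dict(bins)


# Toy reads: two short ones share a bin, the longer one gets its own.
reads = [("r1", "A" * 120), ("r2", "C" * 480), ("r3", "G" * 730)]
binned = bin_reads_by_length(reads)
```

Because reads in one bin have similar lengths, each bin can be padded and processed as a batch with little wasted space, which is one plausible reason for the speed-up the section describes.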

### <ins>Read length distribution</ins>

As an initial quality control metric, users may wish to visualize the read length distribution. The `readlengthdist` command facilitates this by generating a plot with log10-transformed read lengths on the x-axis and their corresponding frequencies on the y-axis. The output is provided in .png format in the **/path/to/OUTPUT/directory/plots/** folder.

Example usage:

```console
tranquilizer readlengthdist /path/to/OUTPUT/directory
```
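
For readers who want to reproduce the numbers behind such a plot, the following sketch counts reads per log10-length bin using only the standard library. The function name and the `bins_per_decade` parameter are assumptions for illustration; `readlengthdist` itself may bin and render the data differently.

```python
import math
from collections import Counter


def log10_length_histogram(lengths, bins_per_decade=10):
    """Count reads per log10(length) bin.

    Returns a dict mapping the lower edge of each bin (in log10 units)
    to the number of reads in it. `bins_per_decade` is hypothetical.
    """
    counts = Counter()
    for n in lengths:
        if n <= 0:
            continue  # log10 is undefined for non-positive lengths
        edge = math.floor(math.log10(n) * bins_per_decade) / bins_per_decade
        counts[edge] += 1
    return dict(sorted(counts.items()))


# Toy read lengths spanning three orders of magnitude.
lengths = [150, 180, 1200, 1500, 25000]
hist = log10_length_histogram(lengths, bins_per_decade=1)
```

The keys of `hist` correspond to x-axis positions and the values to the frequencies on the y-axis.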

### <ins>Annotation, barcode correction and demultiplexing</ins>

Reads can be annotated, followed by barcode extraction, correction, and assignment to their respective cells (demultiplexing), using the single command `annotate-reads`. This command produces the following outputs:
* Demultiplexed FASTA files: Located in /path/to/OUTPUT/directory/demuxed_fasta/.
* Annotation metadata:
  1. Valid reads: /path/to/OUTPUT/directory/annotations_valid.parquet
  2. Invalid reads: /path/to/OUTPUT/directory/annotations_invalid.parquet
* Quality control (QC) plots:
  1. barcode_plots.pdf
  2. demux_plots.pdf
  3. full_read_annots.pdf

All QC plots are saved in /path/to/OUTPUT/directory/plots/.

**Note**: Before running the annotate-reads command, ensure you select the appropriate model for your dataset. If unsure, use the command `tranquilizer availablemodels` to view the available models.

Example usage:

```console
tranquilizer annotate-reads MODEL_NAME /path/to/OUTPUT/directory /path/to/BARCODE_WHITELIST --chunk-size 100000 --portion full --njobs CPU_THREADS
```
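
To illustrate what whitelist-based barcode correction means in general, here is a minimal sketch that assigns an observed barcode to the single whitelist entry within one mismatch. This is a common textbook approach and is only an assumption here: the source does not describe Tranquilizer's actual correction algorithm, and the function names and the `max_dist` parameter are hypothetical.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def correct_barcode(observed, whitelist, max_dist=1):
    """Return the unique whitelist barcode within `max_dist` mismatches.

    Returns None when no candidate, or more than one candidate, is
    found, so ambiguous barcodes are left unassigned.
    """
    if observed in whitelist:
        return observed
    hits = [
        bc
        for bc in whitelist
        if len(bc) == len(observed) and hamming(observed, bc) <= max_dist
    ]
    return hits[0] if len(hits) == 1 else None


whitelist = {"ACGTACGT", "TTGGCCAA"}
fixed = correct_barcode("ACGTACGA", whitelist)    # one mismatch
unknown = correct_barcode("GGGGGGGG", whitelist)  # too far from both
```

Hamming distance only handles substitutions. Long-read data also contains insertions and deletions, so a real implementation would more likely use an edit distance.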

### <ins>Read visualization</ins>

Individual reads can be annotated and inspected independently of the `annotate-reads` process, either before or after running the `annotate-reads` command, by providing their names to the `visualize` command. The resulting visualization is saved as a .pdf file in the **/path/to/OUTPUT/directory/plots/** folder.

Example usage:

```console
tranquilizer visualize MODEL_NAME /path/to/OUTPUT/directory --portion full --read-names READ_NAME_1,READ_NAME_2,READ_NAME_3
```
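
When the read names come from a script rather than being typed by hand, the comma-separated argument can be assembled programmatically. The helper below is a hypothetical convenience wrapper, not part of Tranquilizer; it uses only the positional arguments and flags shown in the example above and builds the argument list without running it.

```python
def build_visualize_command(model_name, output_dir, read_names, portion="full"):
    """Build the argument list for `tranquilizer visualize`.

    Hypothetical helper: it only mirrors the command line shown in
    this README and does not validate the model or the read names.
    """
    if not read_names:
        raise ValueError("at least one read name is required")
    return [
        "tranquilizer",
        "visualize",
        model_name,
        output_dir,
        "--portion",
        portion,
        "--read-names",
        ",".join(read_names),  # the CLI expects one comma-separated value
    ]


cmd = build_visualize_command(
    "MODEL_NAME", "/path/to/OUTPUT/directory", ["READ_NAME_1", "READ_NAME_2"]
)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once Tranquilizer is installed and on the `PATH`.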



**Installation instructions and prerequisites coming soon.**
497 changes: 115 additions & 382 deletions main.py
100755 → 100644

Large diffs are not rendered by default.

Binary file removed models/scNanoRNAseq.h5
Binary file not shown.
Binary file removed models/scNanoRNAseq_lbl_bin.pkl
Binary file not shown.
Binary file removed models/tranquil_complex.h5
Binary file not shown.
Binary file removed models/tranquil_complex_lbl_bin.pkl
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file removed scripts/._extract_read_ends.py
Binary file not shown.
Binary file removed scripts/._simulate_training_data.py
Binary file not shown.
Binary file removed scripts/._train_model.py
Binary file not shown.
Binary file modified scripts/__pycache__/annotate_new_data.cpython-310.pyc
Binary file not shown.
Binary file not shown.
Binary file removed scripts/__pycache__/correct_UMI.cpython-310.pyc
Binary file not shown.
Binary file modified scripts/__pycache__/correct_barcodes.cpython-310.pyc
Binary file not shown.
Binary file modified scripts/__pycache__/demultiplex.cpython-310.pyc
Binary file not shown.
Binary file modified scripts/__pycache__/export_annotations.cpython-310.pyc
Binary file not shown.
Binary file modified scripts/__pycache__/extract_annotated_seqs.cpython-310.pyc
Binary file not shown.
Binary file not shown.
Binary file not shown.
Binary file modified scripts/__pycache__/plot_read_len_distr.cpython-310.pyc
Binary file not shown.
Binary file modified scripts/__pycache__/preprocess_reads.cpython-310.pyc
Binary file not shown.
Binary file not shown.
Binary file removed scripts/__pycache__/train_model.cpython-310.pyc
Binary file not shown.
Binary file modified scripts/__pycache__/visualize_annot.cpython-310.pyc
Binary file not shown.
37 changes: 0 additions & 37 deletions scripts/barcode_correction.py

This file was deleted.

120 changes: 0 additions & 120 deletions scripts/correct_UMI.py

This file was deleted.
