Commit
Showing
3 changed files
with
51 additions
and
8 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,4 +1,48 @@ | ||
**** | ||
atac | ||
==== | ||
**** | ||
|
||
The ``atac`` command exposes the functionality of ``alevin-fry`` for processing RAD files containing scATAC-seq data. The ``atac`` command sets the *mode* of ``alevin-fry``, and it takes one of several sub-commands (``generate-permit-list`` and ``sort`` being the primary ones). | ||
|
||
generate-permit-list (atac) | ||
=========================== | ||
|
||
This command takes as input an output directory containing a RAD file (created by ``piscem``), and it determines which cell barcodes should be associated with "true" cells, which should be corrected to | ||
some "true" barcode, and which should simply be ignored / discarded. | ||
|
||
The required arguments of this command are the path to an input directory (``--input``),
the path to an output directory (``--output-dir``, which will be created if it
doesn't exist), and a path to the barcode permit-list file. The permit-list argument works as follows: | ||
|
||
* ``--unfiltered-pl <plist>``: This option accepts as an argument a list of *possible* barcodes for the sample. For example, this is the flag you should use if you wish to provide an "external permit list", like the 10x v2 or 10x v3 permit lists. Unlike with the ``--valid-bc`` flag, the list passed to this argument is the set of all possible barcodes for the technology being processed, and it is likely that most of the barcodes in the file will not correspond to cells present in this particular sample. When using this argument, you may also pass the ``--min-reads`` argument to set the minimum frequency with which a barcode must be seen in order to be retained. The algorithm used here passes over the input records (mapped reads) and counts how many times each barcode in the unfiltered permit list occurs exactly. Any barcode occurring >= ``min-reads`` times is considered a present cell. Subsequently, every barcode that did not match a present cell is searched (at an edit distance of up to 1) against the barcodes determined to correspond to present cells. If an initially non-matching barcode has a unique neighbor among the barcodes for present cells, it is corrected to that barcode; if it has no 1-edit neighbor, or if it has 2 or more 1-edit neighbors among that list (i.e. its correction would be ambiguous), the record is discarded. | ||
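The counting and correction steps described for ``--unfiltered-pl`` can be illustrated with a minimal Python sketch. This is not the ``alevin-fry`` implementation: the function names are hypothetical, barcodes are assumed to be fixed-length strings over ``ACGT``, and "edit distance of up to 1" is therefore modeled as a single substitution.

```python
from collections import Counter

ALPHABET = "ACGT"  # assumption: barcodes contain only these bases


def one_edit_neighbors(barcode):
    """Yield every barcode that differs from `barcode` by one substitution."""
    for i, base in enumerate(barcode):
        for alt in ALPHABET:
            if alt != base:
                yield barcode[:i] + alt + barcode[i + 1:]


def build_permit_map(observed, unfiltered_pl, min_reads):
    """Return (present, permit_map) for a list of observed barcodes.

    `observed` stands in for the barcodes of the mapped-read records.
    """
    allowed = set(unfiltered_pl)
    # Pass 1: count exact matches against the unfiltered permit list.
    counts = Counter(bc for bc in observed if bc in allowed)
    present = {bc for bc, n in counts.items() if n >= min_reads}
    # Present barcodes map to themselves.
    permit_map = {bc: bc for bc in present}
    # Pass 2: try to rescue every barcode that is not a present cell.
    for bc in set(observed) - present:
        hits = [nb for nb in one_edit_neighbors(bc) if nb in present]
        if len(hits) == 1:  # unique neighbor: correct to it
            permit_map[bc] = hits[0]
        # zero or several neighbors: the barcode is left out (discarded)
    return present, permit_map
```

Barcodes absent from the returned ``permit_map`` correspond to records that would be discarded.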
|
||
|
||
output | ||
------ | ||
|
||
The ``generate-permit-list`` command outputs a number of different files in the output directory. Not all files are relevant to users of ``alevin-fry``, but the files are described here. | ||
|
||
1. The file ``bin_lens.bin`` is a binary file that records the lengths of the bins used for creating temporary files for sorting. | ||
|
||
2. The file ``bin_recs.bin`` is a binary file that encodes where records should be routed during the sorting phase. | ||
|
||
3. The file ``permit_freq.bin`` is a binary file that encodes information about the frequency of occurrence of different barcodes in the permit list. | ||
|
||
4. The file ``permit_map.bin`` is a binary file (a serde serialized HashMap) that maps each barcode in the input RAD file that is within an edit distance of 1 of some *true* barcode to the barcode to which it is corrected. This allows the ``collate`` command to group together all of the read records corresponding to the same *corrected* barcode. | ||
|
||
5. The file ``generate_permit_list.json`` is a JSON file containing information about the run of the command. | ||
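To illustrate how a barcode correction map such as ``permit_map.bin`` lets records be grouped by *corrected* barcode, here is a small Python sketch. It uses a plain ``dict`` as a stand-in for the serialized map and does not read the binary file; the function name and the record layout are assumptions made for illustration only.

```python
from collections import defaultdict


def group_by_corrected_barcode(records, permit_map):
    """Group (barcode, payload) records under their corrected barcode.

    `permit_map` is a plain dict standing in for the serialized map.
    Records whose barcode is not in the map are dropped.
    """
    groups = defaultdict(list)
    for barcode, payload in records:
        corrected = permit_map.get(barcode)
        if corrected is not None:
            groups[corrected].append(payload)
    return dict(groups)
```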
|
||
|
||
sort (atac) | ||
=========== | ||
|
||
This command takes as input the directory containing the original RAD file (created by ``piscem``) and the output directory generated by the ``generate-permit-list`` command above. It parses the input RAD file, buckets and then sorts the records by genomic location, and produces a globally-sorted BED file for downstream analysis. The process is highly multi-threaded, and the number of threads can be chosen by passing the appropriate value to the ``--threads`` option. The output BED file can *optionally* be compressed if the ``--compress`` flag is passed to the ``sort`` command. The output of the ``sort`` command is described below. | ||
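The bucket-then-sort idea can be sketched in Python as follows. This is only an illustration under stated assumptions, not the tool's actual code: records are assumed to be ``(chrom, start, end, barcode)`` tuples, buckets are assumed to be one per chromosome, and ``sort_to_bed`` and ``chrom_order`` are hypothetical names.

```python
from collections import defaultdict


def sort_to_bed(records, chrom_order):
    """Bucket records by chromosome, sort each bucket, emit BED-style lines.

    records: iterable of (chrom, start, end, barcode) tuples.
    chrom_order: chromosome names in the desired output order.
    """
    # Bucketing: each record is routed to the bucket of its chromosome.
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec[0]].append(rec)
    lines = []
    # Buckets are independent, so each could be sorted by a separate thread;
    # concatenating them in chromosome order gives a globally sorted result.
    for chrom in chrom_order:
        bucket = sorted(buckets.get(chrom, []), key=lambda r: (r[1], r[2], r[3]))
        for c, start, end, barcode in bucket:
            lines.append(f"{c}\t{start}\t{end}\t{barcode}")
    return lines
```

Because every bucket covers a disjoint part of the genome, sorting within buckets and concatenating is enough to obtain a global order without one large sort.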
|
||
output | ||
------ | ||
|
||
The ``sort`` command outputs the following files: | ||
|
||
1. The ``sort.json`` file is a JSON file containing information about how the ``sort`` command was run. | ||
|
||
2. The ``map.bed`` file (or ``map.bed.gz`` if the ``--compress`` flag was passed) contains the output in BED format, which can be provided to a peak caller like `MACS <https://github.com/macs3-project/MACS/>`_. |