Skip to content

Commit

Permalink
fix: allow simphenotype to accept TR PGENs without --repeats (#263)
Browse files Browse the repository at this point in the history
  • Loading branch information
aryarm authored Nov 27, 2024
1 parent 808c31e commit 16a84d1
Show file tree
Hide file tree
Showing 4 changed files with 30 additions and 2 deletions.
2 changes: 2 additions & 0 deletions docs/api/data.rst
Original file line number Diff line number Diff line change
Expand Up @@ -183,6 +183,8 @@ The following methods from the :class:`Genotypes` class are disabled, however.
1. ``check_biallelic``
2. ``check_maf``

The constructor of the :class:`GenotypesTR` class also includes a :code:`vcftype` parameter. This can be helpful when the type of the TR file cannot be inferred automatically. Refer to `the TRTools docs <https://trtools.readthedocs.io/en/stable/trtools.utils.tr_harmonizer.html#trtools.utils.tr_harmonizer.VcfTypes>`_ for a list of accepted types.

.. _api-data-genotypestr:

GenotypesPLINK
Expand Down
11 changes: 11 additions & 0 deletions docs/formats/genotypes.rst
Original file line number Diff line number Diff line change
Expand Up @@ -37,3 +37,14 @@ To convert a VCF containing tandem repeats to PGEN, use this command, instead.
.. code-block:: bash
plink2 --vcf-half-call m --make-pgen 'pvar-cols=vcfheader,qual,filter,info' --vcf input.vcf --make-pgen --out output
Tandem repeats
~~~~~~~~~~~~~~
VCFs containing tandem repeats usually have a *type* indicating the tool from which they originated. We support whichever types are supported by `TRTools <https://trtools.readthedocs.io/en/stable/CALLERS.html>`_.

We do our best to infer the *type* of a TR VCF automatically. However, there will be some instances when it cannot be inferred.
Users of TRTools know to specify :code:`--vcftype` in that situation. However, most haptools commands do not yet support the :code:`--vcftype` parameter. Instead, you can modify the header of your VCF to trick haptools into being able to infer the *type*.

For example, if you're using HipSTR, you can add :code:`##command=hipstr...`. Refer to `this code in TRTools <https://trtools.readthedocs.io/en/stable/trtools.utils.tr_harmonizer.html#trtools.utils.tr_harmonizer.InferVCFType>`_ for more details.

Please note that all of this also applies to PVAR files created from TR VCFs.
8 changes: 6 additions & 2 deletions haptools/sim_phenotype.py
Original file line number Diff line number Diff line change
Expand Up @@ -405,8 +405,12 @@ def simulate_pt(
# check if these are all repeat IDs, haplotype IDs, or a mix of them
if len(hp.type_ids["R"]) >= len(haplotype_ids) and repeats is None:
# if they're all repeat IDs or --repeats was specified
log.info("Loading TR genotypes")
gt = GenotypesTR(fname=genotypes, log=log)
if genotypes.suffix == ".pgen":
log.info("Loading TR genotypes from PGEN file")
gt = GenotypesPLINKTR(fname=genotypes, log=log, chunk_size=chunk_size)
else:
log.info("Loading TR genotypes from VCF/BCF file")
gt = GenotypesTR(fname=genotypes, log=log)
load_as_haps = False
else:
# the genotypes variable must contain haplotype genotypes
Expand Down
11 changes: 11 additions & 0 deletions tests/test_simphenotype.py
Original file line number Diff line number Diff line change
Expand Up @@ -670,6 +670,17 @@ def test_repeat(self, capfd):
assert captured.out
assert result.exit_code == 0

def test_repeat_pgen(self, capfd):
gt_file = DATADIR / "simple-tr.pgen"
hp_file = DATADIR / "simple_tr.hap"

cmd = f"simphenotype --id 1:10114:GTT {gt_file} {hp_file}"
runner = CliRunner()
result = runner.invoke(main, cmd.split(" "), catch_exceptions=False)
captured = capfd.readouterr()
assert captured.out
assert result.exit_code == 0

def test_repeat_with_hapgts(self, capfd):
tmp_transform = Path("temp-transform.vcf")
with open(tmp_transform, "w") as file:
Expand Down

0 comments on commit 16a84d1

Please sign in to comment.