BUG: account for genes being mapped directly to drug classes without intermediate drugs in `confers_resistance_to()`.

Whenever genes were mapped to drug classes (i.e. immediate children of the antibiotic molecule node), `confers_resistance_to()` stored these mappings in a temporary list (`backup_drugs`) and used them only if the ARG was not mapped to any other drug.

This strategy fails when an ARG is mapped both to a drug class and to a drug that belongs to a different drug class. In that case, the ARG is mapped only to the drug, and the directly mapped drug class is lost.
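
The failure case can be reproduced on a toy example. The sketch below is not argnorm's code: `PARENT`, `is_drug_class()` and `old_strategy()` are hypothetical stand-ins for the ARO lookups, and the three terms are chosen only to show how a directly mapped drug class gets dropped by the fallback logic.

```python
# Toy ontology (hypothetical): each term maps to its parent term.
# Terms whose parent is the root play the role of drug classes.
ROOT = "antibiotic molecule"
PARENT = {
    "macrolide antibiotic": ROOT,
    "streptogramin antibiotic": ROOT,
    "erythromycin": "macrolide antibiotic",
}


def is_drug_class(term):
    """A drug class is an immediate child of the root node."""
    return PARENT[term] == ROOT


def old_strategy(mapped_terms):
    """Mimic the pre-fix behaviour: drug classes are only a fallback."""
    backup_drugs = []
    target = set()
    for term in mapped_terms:
        if is_drug_class(term):
            backup_drugs.append(term)
        else:
            target.add(term)
    # Drug classes are used only when no individual drug was found.
    if not target:
        target.update(backup_drugs)
    return sorted(target)


# A gene mapped to one drug and, directly, to an unrelated drug class.
print(old_strategy(["erythromycin", "streptogramin antibiotic"]))
```

Here the streptogramin class is discarded because a drug from a different class is present, which is the information loss described above.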

Now, `confers_resistance_to()` no longer uses `backup_drugs`. Instead, it maps ARGs to every possible drug and drug class, then iterates over them to check whether any of the mapped drugs are child nodes of the mapped drug classes. Mapped drug classes that are parents of mapped drugs are removed (they will be restored by `drugs_to_drug_classes()`).
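
The new filtering step can be sketched on a toy ontology. This is a minimal illustration under stated assumptions, not the library's implementation: `PARENT`, `ancestors()` and `new_strategy()` are hypothetical names, and a plain dictionary stands in for the ARO graph.

```python
# Toy ontology (hypothetical): each term maps to its parent term.
ROOT = "antibiotic molecule"
PARENT = {
    "macrolide antibiotic": ROOT,
    "streptogramin antibiotic": ROOT,
    "erythromycin": "macrolide antibiotic",
}


def ancestors(term):
    """Collect every ancestor of a term by walking up to the root."""
    found = set()
    while term in PARENT:
        term = PARENT[term]
        found.add(term)
    return found


def new_strategy(mapped_terms):
    """Keep all drugs, plus only those classes not implied by a mapped drug."""
    mapped = set(mapped_terms)
    drug_classes = {t for t in mapped if PARENT[t] == ROOT}
    drugs = mapped - drug_classes
    # A class is redundant when one of the mapped drugs descends from it.
    redundant = {
        c for c in drug_classes
        if any(c in ancestors(d) for d in drugs)
    }
    return sorted(drugs | (drug_classes - redundant))


# The unrelated class is kept alongside the drug.
print(new_strategy(["erythromycin", "streptogramin antibiotic"]))
# The parent class of a mapped drug is dropped as redundant.
print(new_strategy(["erythromycin", "macrolide antibiotic"]))
```

Dropping the redundant class avoids listing it twice once the drug is later expanded back to its class.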
Vedanth-Ramji authored and luispedro committed Feb 5, 2025
1 parent 2b064e2 commit 8ebfb35
Showing 13 changed files with 371 additions and 358 deletions.
43 changes: 27 additions & 16 deletions argnorm/drug_categorization.py
@@ -30,7 +30,7 @@ def _get_drug_classes(super_classes_list: List[str]) -> List[str]:

     return output

-def confers_resistance_to(aro_num: str) -> List[str]:
+def _get_drugs(aro_num: str) -> List[str]:
     '''
     Description: Returns a list of the drugs/antibiotics to which a gene confers resistance to.
@@ -41,34 +41,45 @@ def confers_resistance_to(aro_num: str) -> List[str]:
         target (list[str]):
             A list with ARO number of the drugs/antibiotics to which the input gene confers resistance to.
     '''
-    # some gene superclasses can map to drugs which are immediate children of 'antibiotic molecule'
-    # only use these if no other drugs can be found, as this information will be present in drugs_to_drug_classes

-    backup_drugs = []
     target = set()

     for superclass in ARO[aro_num].superclasses():
         for drug in ARO[superclass.id].relationships.get(confers_resistance_to_drug_class_rel, []):
-            if list(ARO[drug.id].superclasses())[1:] == antibiotic_molecule_node:
-                backup_drugs.append(drug.id)
-            else:
-                target.add(drug.id)
+            target.add(drug.id)

         for drug in ARO[superclass.id].relationships.get(confers_resistance_to_antibiotic_rel, []):
-            if list(ARO[drug.id].superclasses())[1:] == antibiotic_molecule_node:
-                backup_drugs.append(drug.id)
-            else:
-                target.add(drug.id)
+            target.add(drug.id)

         for rel in [regulates_rel, participates_in_rel, part_of_rel]:
             for term in superclass.relationships.get(rel, []):
-                target.update(confers_resistance_to(term.id))
-
-    if not target:
-        target.update(backup_drugs)
+                target.update(_get_drugs(term.id))

     return sorted(target)

+def confers_resistance_to(aro_num: str) -> List[str]:
+    # some gene superclasses can map to drugs which are immediate children of 'antibiotic molecule'
+    # only use these if no other drugs can be found, as this information will be present in drugs_to_drug_classes
+
+    drugs = set(_get_drugs(aro_num))
+
+    drug_classes = set()
+    for drug in drugs:
+        if list(ARO[drug].superclasses())[1:] == antibiotic_molecule_node:
+            drug_classes.add(drug)
+
+    drugs = drugs - drug_classes
+    redundant_drug_classes = set()
+    for drug in drugs:
+        for drug_class in drug_classes:
+            if ARO[drug_class] in list(ARO[drug].superclasses())[1:]:
+                redundant_drug_classes.add(drug_class)
+
+    drug_classes = drug_classes - redundant_drug_classes
+    drugs.update(drug_classes)
+
+    return sorted(drugs)
+
 def drugs_to_drug_classes(drugs_list: List[str]) -> List[str]:
     '''
     Description: Returns a list of categories of drug classes, e.g. cephem and penam are categorized as beta_lactam antibiotics.
132 changes: 66 additions & 66 deletions outputs/hamronized/abricate.argannot.tsv

Large diffs are not rendered by default.

142 changes: 71 additions & 71 deletions outputs/hamronized/abricate.megares.tsv

Large diffs are not rendered by default.

62 changes: 31 additions & 31 deletions outputs/hamronized/abricate.resfinder.tsv

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion outputs/hamronized/abricate.resfinderfg.tsv
@@ -3953,7 +3953,7 @@ Unnamed: 0 input_file_name gene_symbol gene_name reference_database_id reference
3950 GMGC10.95nr_block_0294 D-alanyl-D-alanine "D-alanyl-D-alanine carboxypeptidase DacB""|KU606669.1|Preterm_infant_stool|AMP" resfinderfg 2021-Oct-18 abricate abricate 1.0.1 84.47 GMGC10.269_887_786.DACB 598 1434 + 100.0
3951 GMGC10.95nr_block_0294 D-alanyl-D-alanine "D-alanyl-D-alanine carboxypeptidase DacB""|KU606669.1|Preterm_infant_stool|AMP" resfinderfg 2021-Oct-18 abricate abricate 1.0.1 81.39 GMGC10.270_123_852.DACB 598 1430 + 99.52
3952 GMGC10.95nr_block_0294 D-alanyl-D-alanine "D-alanyl-D-alanine carboxypeptidase DacB""|KU606669.1|Preterm_infant_stool|AMP" resfinderfg 2021-Oct-18 abricate abricate 1.0.1 81.12 GMGC10.280_419_486.DACB 598 1434 + 100.0
-3953 GMGC10.95nr_block_0294 ABC-F "ABC-F type ribosomal protection protein Msr(E)""|MG585948.1|pharmaceutical_effluent|AZM" resfinderfg 2021-Oct-18 abricate abricate 1.0.1 99.93 GMGC10.280_792_741.UNKNOWN 1 1476 + 100.0 ARO:3003109 ARO:0000006 ARO:0000000
+3953 GMGC10.95nr_block_0294 ABC-F "ABC-F type ribosomal protection protein Msr(E)""|MG585948.1|pharmaceutical_effluent|AZM" resfinderfg 2021-Oct-18 abricate abricate 1.0.1 99.93 GMGC10.280_792_741.UNKNOWN 1 1476 + 100.0 ARO:3003109 ARO:0000006,ARO:0000026 ARO:0000000,ARO:0000026
3954 GMGC10.95nr_block_0294 D-alanyl-D-alanine "D-alanyl-D-alanine carboxypeptidase DacB""|KU606669.1|Preterm_infant_stool|AMP" resfinderfg 2021-Oct-18 abricate abricate 1.0.1 99.04 GMGC10.281_471_119.DACB 637 1473 + 100.0
3955 GMGC10.95nr_block_0294 D-alanyl-D-alanine "D-alanyl-D-alanine carboxypeptidase DacB""|KU606669.1|Preterm_infant_stool|AMP" resfinderfg 2021-Oct-18 abricate abricate 1.0.1 80.05 GMGC10.281_560_329.DACB 598 1434 + 100.0
3956 GMGC10.95nr_block_0294 D-alanyl-D-alanine "D-alanyl-D-alanine carboxypeptidase DacB""|KU606669.1|Preterm_infant_stool|AMP" resfinderfg 2021-Oct-18 abricate abricate 1.0.1 80.14 GMGC10.281_806_382.DACB 598 1433 + 99.88
2 changes: 1 addition & 1 deletion outputs/hamronized/amrfinderplus.ncbi.orfs.tsv
@@ -18,6 +18,6 @@ amrfinderplus.ncbi.orfs.tsv bexA multidrug efflux MATE transporter BexA NCBI Ref
amrfinderplus.ncbi.orfs.tsv aad9 ANT(9) family aminoglycoside nucleotidyltransferase NCBI Reference Gene Database 2023-Nov-01 WP_002578722.1 amrfinderplus 3.10.30 gene_presence_detected AMINOGLYCOSIDE 100.0 AMINOGLYCOSIDE 68 841 258 k119_82797 258 - 99.61 ARO:3002630 ARO:0000039 ARO:0000016
amrfinderplus.ncbi.orfs.tsv erm(B) 23S rRNA (adenine(2058)-N(6))-methyltransferase Erm(B) NCBI Reference Gene Database 2023-Nov-01 WP_002292226.1 amrfinderplus 3.10.30 gene_presence_detected MACROLIDE 100.0 MACROLIDE 24650 25384 245 k119_84636 245 - 100.0 ARO:3000375 ARO:0000006,ARO:0000027,ARO:0000046,ARO:0000057,ARO:0000065,ARO:0000066,ARO:3000145,ARO:3000156,ARO:3000158,ARO:3000176,ARO:3000583,ARO:3000584,ARO:3000669,ARO:3000672,ARO:3000673,ARO:3000674,ARO:3000675,ARO:3000677,ARO:3000678,ARO:3000679,ARO:3000680,ARO:3000681,ARO:3000682,ARO:3000867 ARO:0000000,ARO:0000000,ARO:0000000,ARO:0000000,ARO:0000000,ARO:0000000,ARO:0000000,ARO:0000000,ARO:0000000,ARO:0000017,ARO:0000017,ARO:0000026,ARO:0000026,ARO:0000026,ARO:0000026,ARO:0000026,ARO:0000026,ARO:0000026,ARO:0000026,ARO:0000026,ARO:0000026,ARO:0000026,ARO:0000026,ARO:0000026
amrfinderplus.ncbi.orfs.tsv lnu(AN2) lincosamide nucleotidyltransferase Lnu(AN2) NCBI Reference Gene Database 2023-Nov-01 WP_004308783.1 amrfinderplus 3.10.30 gene_presence_detected LINCOSAMIDE 100.0 LINCOSAMIDE 1830 2339 170 k119_91457 170 - 100.0 ARO:3002835 ARO:0000046,ARO:3007169 ARO:0000017,ARO:0000017
-amrfinderplus.ncbi.orfs.tsv mef(En2) macrolide efflux MFS transporter Mef(En2) NCBI Reference Gene Database 2023-Nov-01 WP_063853729.1 amrfinderplus 3.10.30 gene_presence_detected MACROLIDE 100.0 MACROLIDE 2367 3569 401 k119_91457 401 - 99.5 ARO:3004659 ARO:0000046 ARO:0000017
+amrfinderplus.ncbi.orfs.tsv mef(En2) macrolide efflux MFS transporter Mef(En2) NCBI Reference Gene Database 2023-Nov-01 WP_063853729.1 amrfinderplus 3.10.30 gene_presence_detected MACROLIDE 100.0 MACROLIDE 2367 3569 401 k119_91457 401 - 99.5 ARO:3004659 ARO:0000000,ARO:0000046 ARO:0000000,ARO:0000017
amrfinderplus.ncbi.orfs.tsv tet(W) tetracycline resistance ribosomal protection protein Tet(W) NCBI Reference Gene Database 2023-Nov-01 WP_002586627.1 amrfinderplus 3.10.30 gene_presence_detected TETRACYCLINE 100.0 TETRACYCLINE 21199 23115 639 k119_9485 639 + 100.0 ARO:3000194 ARO:0000051,ARO:0000069,ARO:3000152,ARO:3000528,ARO:3000667,ARO:3000668 ARO:3000050,ARO:3000050,ARO:3000050,ARO:3000050,ARO:3000050,ARO:3000050
amrfinderplus.ncbi.orfs.tsv catA13 type A-13 chloramphenicol O-acetyltransferase NCBI Reference Gene Database 2023-Nov-01 WP_043774378.1 amrfinderplus 3.10.30 gene_presence_detected CHLORAMPHENICOL 100.0 PHENICOL 1975 2595 207 k119_95290 207 + 100.0 ARO:3004454 ARO:3000385 ARO:3000387