ChainParseError: 2 antibody domains in sequence #7

Open
deweihu96 opened this issue Mar 10, 2022 · 4 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@deweihu96

ANARCI supports two antibody domains in one sequence, while AbNumber does not and raises:

abnumber.exceptions.ChainParseError: Found 2 antibody domains in sequence: "DIQLTQSPSFLSASVGDRVTITCSARSSISFMYWYQQKPGKAPKLLIYDTSNLASGVPSRFSGSGSGTEFTLTISSLEAEDAATYYCQQWSSYPLTFGQGTKLEIKGGGSGGGGEVQLVESGGGLVQPGGSLRLSCAASGFTFSTYAMNWVRQAPGKGLEWVGRIRSKYNNYATYYADSVKDRFTISRDDSKNSLYLQMNSLKTEDTAVYYCVRHGNFGNSYVSWFAYWGQGTLVTVSSGGCGGGEVAALEKEVAALEKEVAALEKEVAALEKGGGDKTHTCPPCPAPEAAGGPSVFLFPPKPKDTLMISRTPEVTCVVVDVSHEDPEVKFNWYVDGVEVHNAKTKPREEQYNSTYRVVSVLTVLHQDWLNGKEYKCKVSNKALPAPIEKISKAKGQPREPQVYTLPPSREEMTKNQVSLWCLVKGFYPSDIAVEWESNGQPENNYKTTPPVLDSDGSFFLYSKLTVDKSRWQQGNVFSCSVMHEALHNHYTQKSLSLSPGK"

@prihoda
Owner

prihoda commented Mar 13, 2022

Hi @deweihu96, thanks for reporting this. I would like to support this in the future, and a pull request would be welcome.

The current AbNumber Chain object can only hold a single variable domain, with a single CDR3, etc. So this probably cannot be supported through chain = Chain(seq, 'imgt'), but it could be supported through a separate call such as chains = Chain.parse_domains(seq, 'imgt').

So if you have a sequence like Var1Const1Var2Const2, you would get two Chain objects, where chain.tail holds any sequence that immediately follows the variable domain (for example, chain1.tail = "Const1").
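
The tail semantics described here can be sketched without touching AbNumber itself. The snippet below is only an illustration: `DomainSlice` and `split_with_tails` are hypothetical names, not part of AbNumber, and the domain boundaries are assumed to come from some other tool as 0-based, end-exclusive (start, end) pairs.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class DomainSlice:
    """Hypothetical container: one variable domain plus whatever follows it."""
    variable: str
    tail: str


def split_with_tails(seq: str, spans: Sequence[Tuple[int, int]]) -> List[DomainSlice]:
    """Split seq into variable domains, each paired with its trailing sequence.

    spans are assumed to be 0-based, end-exclusive and non-overlapping.
    Residues before the first domain are dropped in this sketch.
    """
    ordered = sorted(spans)
    slices = []
    for i, (start, end) in enumerate(ordered):
        # The tail runs until the next variable domain starts, or to the end.
        next_start = ordered[i + 1][0] if i + 1 < len(ordered) else len(seq)
        slices.append(DomainSlice(variable=seq[start:end], tail=seq[end:next_start]))
    return slices


# Toy input using the placeholder names from the comment above.
toy = "Var1Const1Var2Const2"
for part in split_with_tails(toy, [(0, 4), (10, 14)]):
    print(part.variable, part.tail)
```

In a real implementation each `variable` slice would presumably be numbered as its own Chain, with the `tail` attached afterwards; how that would fit into AbNumber is left open here.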

@prihoda added the enhancement and help wanted labels on Mar 13, 2022
@deweihu96
Author

Hi @prihoda, thanks for your reply. The simplest approach I came up with is:

  1. Use anarci to find the two domains and slice the sequence into one segment per domain;
  2. Use abnumber to number each segment separately.

@prihoda
Owner

prihoda commented Apr 14, 2022

@deweihu96 sounds good. Can you share the part of the code where you parse the anarci output?

@deweihu96
Author

deweihu96 commented Apr 19, 2022

@prihoda

>>> import anarci
>>> seq = 'QIQLVQSGSELKKPGASVKVSCKASGYTFTHYAMNWVRQAPGQGLEWMGWINTNTGEPTYAQGFTGRFVFSLDTSVSTAYLQISSLKAEDTAVYYCAREREPGMDEWGQGTLVTVSSGGGGSSSSSSDVVMTQSPLSLPVTLGQPASISCRSSQSLVHANTNTYLEWYQQRPGQSPRLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDVGVYYCFQGTHVPNTFGQGTKLEIK'
>>> sequences, numbered, alignment_details, hit_tables = anarci.run_anarci(seq, 'kabat', allowed_species='human')
>>> alignment_details
[[{'id': 'human_H', 'description': '', 'evalue': 1.4e-55, 'bitscore': 178.0, 'bias': 1.0, 'query_start': 0, 'query_end': 117, 'species': 'human', 'chain_type': 'H', 'scheme': 'imgt', 'query_name': 'Input sequence'},
  {'id': 'human_K', 'description': '', 'evalue': 1.9e-56, 'bitscore': 180.6, 'bias': 0.1, 'query_start': 127, 'query_end': 239, 'species': 'human', 'chain_type': 'K', 'scheme': 'imgt', 'query_name': 'Input sequence'}]]

Once you have the start and end positions (query_start and query_end), slice the sequence and parse each slice with abnumber :)
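
A minimal sketch of that slicing step, assuming `query_start` and `query_end` are 0-based and end-exclusive, which appears consistent with the coordinates in the output above. `slice_domains` is a hypothetical helper, not part of anarci or abnumber, and the hits are copied by hand from the output so the snippet runs without either package installed.

```python
from typing import Dict, List, Tuple


def slice_domains(seq: str, domain_hits: List[Dict]) -> List[Tuple[str, str]]:
    """Return (chain_type, subsequence) for each domain hit, ordered by start.

    Assumes query_start/query_end are 0-based and end-exclusive.
    """
    ordered = sorted(domain_hits, key=lambda hit: hit["query_start"])
    return [
        (hit["chain_type"], seq[hit["query_start"]:hit["query_end"]])
        for hit in ordered
    ]


seq = 'QIQLVQSGSELKKPGASVKVSCKASGYTFTHYAMNWVRQAPGQGLEWMGWINTNTGEPTYAQGFTGRFVFSLDTSVSTAYLQISSLKAEDTAVYYCAREREPGMDEWGQGTLVTVSSGGGGSSSSSSDVVMTQSPLSLPVTLGQPASISCRSSQSLVHANTNTYLEWYQQRPGQSPRLLIYKVSNRFSGVPDRFSGSGSGTDFTLKISRVEAEDVGVYYCFQGTHVPNTFGQGTKLEIK'

# Only the fields needed for slicing, copied from alignment_details[0] above.
hits = [
    {"chain_type": "H", "query_start": 0, "query_end": 117},
    {"chain_type": "K", "query_start": 127, "query_end": 239},
]

for chain_type, domain_seq in slice_domains(seq, hits):
    print(chain_type, len(domain_seq), domain_seq[:10])
```

Each `domain_seq` can then be passed to abnumber on its own, for example `Chain(domain_seq, scheme='imgt')`.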

I noticed that you're also one of the authors of BioPhi. I want to say that it's really great work!
