Commit

Updated the README to be more explicit about how Mirage recognizes species and gene family under UniProt conventions
alexander-nord committed Apr 10, 2022
1 parent 499512b commit fe4caf4
Showing 1 changed file with 10 additions and 2 deletions.
12 changes: 10 additions & 2 deletions README.md
@@ -72,12 +72,20 @@ For example, the following is a valid sequence name entry:
```

Alternatively, protein sequences can be named following UniProt conventions,
where the `OS` and `GN` fields signify species and gene family:
where Mirage looks to the contents of the `OS` and `GN` fields to recognize the
sequence's species and gene family:

```
>sp|Q5VST9|OBSCN_HUMAN Obscurin OS=Homo_sapiens OX=9606 GN=OBSCN PE=1 SV=3
>sp|Q5VST9_iso1|OBSCN_HUMAN Obscurin OS=Homo_sapiens OX=9606 GN=OBSCN PE=1 SV=3
```

Because the simplified Mirage naming convention and the UniProt convention both
incorporate a triple of |-separated fields, it is critical to preserve the `OS`
and `GN` fields in sequences intended to be parsed under the UniProt convention.
In the above example, removing those fields would cause Mirage to mistakenly
identify the sequence as belonging to a species named 'sp' and a gene family
named 'Q5VST9_iso1.'
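
The ambiguity described above can be illustrated with a short sketch. This is
not Mirage's actual parser: the function name `parse_sequence_name` and the
regular expressions are hypothetical. It also assumes that `OS` and `GN` values
contain no whitespace (as in the example, where the species is written
`Homo_sapiens`) and that the first two |-separated fields of the simplified
convention are species and gene family, as the paragraph above implies.

```python
import re

# Hypothetical patterns for this illustration only (not taken from Mirage).
# Assumes field values contain no spaces, e.g. OS=Homo_sapiens.
_OS_RE = re.compile(r"\bOS=(\S+)")
_GN_RE = re.compile(r"\bGN=(\S+)")


def parse_sequence_name(header):
    """Return (species, gene_family) for a FASTA header line.

    If both OS= and GN= are present, the header is read under the UniProt
    convention. Otherwise the first two |-separated fields of the first
    whitespace-delimited token are taken as species and gene family.
    """
    name = header.lstrip(">").strip()

    os_match = _OS_RE.search(name)
    gn_match = _GN_RE.search(name)
    if os_match and gn_match:
        return os_match.group(1), gn_match.group(1)

    # Fallback: treat the name as a triple of |-separated fields.
    tokens = name.split()
    fields = tokens[0].split("|") if tokens else []
    if len(fields) < 3:
        raise ValueError("expected three |-separated fields: " + repr(header))
    return fields[0], fields[1]
```

Under these assumptions, the second example header yields
`("Homo_sapiens", "OBSCN")` when its `OS` and `GN` fields are intact. With
those fields stripped, the fallback branch yields `("sp", "Q5VST9_iso1")`,
which is the misidentification the paragraph above warns about.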


**Species Guide File**
