-
Hi @eboileau, tRNA modifications are usually annotated according to the cloverleaf structure, see e.g. This annotation is usually not consistent with the Ensembl coordinates. @piechottam has done quite some work here to deal with this. For tRNAs, a column with the "cloverleaf position" would be helpful.
-
tRNAs can be classified into:
This is reflected in their naming: http://gtrnadb.ucsc.edu/docs/naming/
Biologists prefer to refer to positions within a tRNA by cloverleaf or Sprinzl coordinates. Some tRNAs lack some coordinates, e.g., [...], 16, 18, [...] (position 17 of the consensus model is missing).
To transform genomic to Sprinzl coordinates, you need an appropriate consensus secondary structure model (correct organism, introns yes/no, ...) and the tRNA sequences that you want to transform. As a result, you get a mapping file with sequence coordinates -> Sprinzl coordinates. I have done this in QutRNA and can provide you with the mapping files.
My thoughts:
-
Thanks @piechottam. Our starting point is a bedRMod file, i.e. a BED9+2 file, with single sites, so
While your reply really helps to put this in perspective, I don't really know what to do.
-
bedRMod has the following columns:
Yes, we know the organism, and the assembly is fixed in SCIMODOM, e.g. we have GRCm39 for mouse, and if a dataset is GRCm38, then data records are lifted over during upload. So, if I understand correctly, we could in principle allow users to decide which coordinate system they want to use for their tRNA bedRMod file?
@piechottam let me know if I didn't understand correctly. I still have to wrap my head around the mapping though, I need to try it... As for the gtrnadb annotation, I found some BED12 files, but I'm not sure whether you suggest using a particular file? I can't find an easy, up-to-date API to download files, any suggestions?
-
In principle, you could allow genomic and Sprinzl coordinates. My suggestion: don't implement Sprinzl coordinates for the columns Start and End. Add an extra column for metadata, e.g.: sprinzl=17A.
You are correct about the mapping: genomic -> Sprinzl and Sprinzl -> genomic. However, as mentioned earlier, I would not support Sprinzl coordinates via the columns Start and End.
The mapping needs to be done ONCE per assembly (organism) and release of gtrnadb. It assumes that a reference FASTA (genomic coordinates) of the tRNAs for the organism is given. In the next step, you use cmalign to calculate a secondary structure alignment of your FASTA against a CM model. The scripts that I have written for this are:
https://github.com/dieterich-lab/QutRNA/blob/main/workflow/scripts/ss_consensus_to_sprinzl.py
https://github.com/dieterich-lab/QutRNA/blob/main/workflow/scripts/seq_to_sprinzl.py
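To make the last step more concrete, here is a minimal sketch of how a sequence -> Sprinzl mapping can be read off a structural alignment. It is not the QutRNA implementation: the function name, the input format (one gapped alignment row plus one Sprinzl label per alignment column) and the toy data are assumptions made for illustration only.

```python
from typing import Dict, List


def seq_to_sprinzl(aligned_seq: str, column_labels: List[str]) -> Dict[int, str]:
    """Map 1-based sequence positions to Sprinzl labels.

    aligned_seq   -- one row of a structural alignment, '-' or '.' mark gaps
                     (hypothetical input format)
    column_labels -- Sprinzl label of each alignment column, '' if the
                     column carries no label (hypothetical input format)
    """
    if len(aligned_seq) != len(column_labels):
        raise ValueError("alignment row and label list differ in length")
    mapping: Dict[int, str] = {}
    seq_pos = 0
    for base, label in zip(aligned_seq, column_labels):
        if base in "-.":
            continue  # this consensus position is absent from the tRNA
        seq_pos += 1
        if label:
            mapping[seq_pos] = label
    return mapping


# Toy example: the tRNA has no residue at consensus position 17.
labels = ["15", "16", "17", "18", "19"]
row = "GA-GC"
mapping = seq_to_sprinzl(row, labels)

# The reverse direction (Sprinzl -> sequence) is the inverted dictionary.
reverse = {label: pos for pos, label in mapping.items()}
```

In this toy case the third base of the sequence gets the label 18, and label 17 never appears, which is the "16, 18" situation described above. Going from sequence positions to genomic coordinates additionally needs the start, end and strand of the tRNA gene.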
-
bedtools wouldn't be a problem as such, as operations would only be done on genomic coordinates, but indeed it would be "against" the specs to allow start/end to be a string. Adding an extra column is a possibility, but in our case we need to be careful, as this would make exactly 12 columns, and some tools, like bedtools, then assume it's a BED12 file. But we can handle this internally.
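As a rough illustration of the column count issue, the sketch below appends a `sprinzl=...` metadata field to an 11-column (BED9+2) record and leaves start/end untouched. The helper name and the sample values are made up for the example, and this is not how SCIMODOM handles it.

```python
BEDRMOD_NCOLS = 11  # BED9+2, as described in this thread


def add_sprinzl_column(line: str, sprinzl: str) -> str:
    """Append a 'sprinzl=<label>' field to a tab-separated bedRMod record.

    Hypothetical helper: genomic start/end stay numeric, the Sprinzl
    label only travels along as extra metadata.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != BEDRMOD_NCOLS:
        raise ValueError(f"expected {BEDRMOD_NCOLS} columns, got {len(fields)}")
    fields.append(f"sprinzl={sprinzl}")
    return "\t".join(fields)


# Made-up record, values are placeholders only.
record = "\t".join(
    ["1", "100", "101", "m1A", "900", "+", "100", "101", "0,0,0", "50", "80"]
)
extended = add_sprinzl_column(record, "17A")
ncols = len(extended.split("\t"))  # 12, the width some tools read as BED12
```

Since the extended record has 12 columns, it would need to be cut back to the original 11 (or fewer) before being passed to tools that infer BED12 from the column count.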
-
Some questions about mapping genomic <-> Sprinzl
-
So what happens is
-
We have started adding data, for now only mRNA. The display is based on bedRMod and shows site-specific modifications arranged by chrom, start, and strand; it includes, among other fields, score, coverage, and frequency (dataset-specific information).
Ensembl annotation has all tRNA genes, and we are using Ensembl already, so this is the default choice. When searching, we can filter by RNA type to display either tRNA, mRNA, etc., using the same format.
In general, how do you annotate your tRNA data? What about the annotation of other RNA types (mt-tRNA, ...)?