Some rRNA not mapped to the reference genome #2

Open
Xuyuch opened this issue Sep 12, 2024 · 7 comments

Comments

@Xuyuch

Xuyuch commented Sep 12, 2024

Dear team,
Thanks for providing these wonderful reference genomes. They help me a lot in my research. I ran into a problem: many of my reads map to an rRNA region but were classified as not aligned by Bowtie2. Here are the details.
I ran

bowtie2 -x rRNA_mm10_reference_genome/bowtie2_index/rRNA -U ../clean_Sample_1_R1.fastq.gz -S rRNA_trash.sam --un mrna_sample1.fastq

and treated the unmapped reads as normal mRNA.
As the next step, I passed the unmapped reads to the STAR aligner and obtained the mapped mRNA reads.

STAR --genomeDir reference_genome/mm10_Gencode_STAR \
	--runThreadN 40 \
	--readFilesIn mrna_sample1.fastq \
	--outFileNamePrefix sample1_mRNA_R1 \
	--outSAMtype BAM SortedByCoordinate

Later, I used featureCounts to get the count matrix:

featureCounts -a gencode.vM10.annotation.gtf -g gene_name -o count_sample1_R1_ds sample1_mRNA_R1Aligned.sortedByCoord.out.bam

I found a huge number of reads assigned to the Rn18s-rs5 gene, which is a typical 18S rRNA. But they show up in the mRNA file, which should not contain any rRNA (all rRNA should have been filtered out). I double-checked those reads in a genome browser (IGV), and the majority of them do not have splice sites. Could you help me look into this issue? Thanks a lot.
Yuchen
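
To see which genes dominate a featureCounts table like the one described above, the output can be ranked by count. This is a minimal sketch: `top_genes` is a hypothetical helper name, and it assumes a single BAM file was counted, so the counts sit in column 7 after the standard `Geneid`, `Chr`, `Start`, `End`, `Strand`, `Length` columns.

```shell
# Hypothetical helper: print the N highest-count genes from a featureCounts table.
# Assumes one BAM was counted, so the read counts are in column 7.
top_genes() {
    counts_file=$1
    n=${2:-10}
    # Skip the leading "#" comment line and the header row; keep gene name and count.
    awk -F'\t' '!/^#/ && $1 != "Geneid" { print $1 "\t" $7 }' "$counts_file" |
        sort -t "$(printf '\t')" -k2,2nr |
        head -n "$n"
}
```

For the run above, this could be called as `top_genes count_sample1_R1_ds 20`, where the file name is the `-o` value passed to featureCounts.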

@vikramparalkar
Owner

Hello:

  1. What is the genomic locus of Rn18s-rs5 where you are seeing these reads mapped? We can check to see whether we had masked that region.
  2. Do you know if the sequence of that region is different from our chromosome R? Because it is possible that there may be rRNA variants that could be causing this result.
  3. Can you try remapping with our mm39 genome? Does it give the same problem as well?

@Xuyuch
Author

Xuyuch commented Sep 25, 2024

Hello,
Thanks for your help, and sorry for the late reply. Here is the information I have:

  1. The genomic locus of Rn18s-rs5 is chr17:39846354-39848202. More information can be found in the NCBI Gene database under Gene ID 110183.
  2. I ran a BLAST between Rn18s-rs5 and chrR. Most of the sequence matches; there are small mismatched regions (no more than 20 bp) between the Rn18s-rs5 sequence and BK000964.3 (the NCBI rRNA region). According to the paper, you also used BK000964.3 as the base to build chromosome R.
  3. I haven't tried remapping with the mm39 genome, but I tried mapping to BK000964.3, and the same problem occurs.

Thank you.
Yuchen
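
One way to check whether the locus from point 1 was masked in the custom genome is to extract that region and measure what fraction of its bases are `N`. This is only a sketch under stated assumptions: that masking replaces sequence with `N` (the thread does not say how masking was done), and that the custom genome FASTA uses the same `chr17` naming and mm10 coordinates. `masked_fraction` and `genome.fa` are hypothetical names.

```shell
# Hypothetical helper: read FASTA on stdin and print the fraction of N/n bases.
# Assumption: masked sequence is represented as N in the genome FASTA.
masked_fraction() {
    awk '!/^>/ {
             total += length($0)            # count bases before gsub edits the line
             masked += gsub(/[Nn]/, "")     # gsub returns the number of N/n removed
         }
         END {
             if (total > 0) printf "%.3f\n", masked / total
             else print "NA"
         }'
}

# Possible usage (genome.fa is a placeholder for the custom genome FASTA):
#   samtools faidx genome.fa chr17:39846354-39848202 | masked_fraction
```

A value near 1 would suggest the region is masked, while a value near 0 would suggest the original sequence is still present at that locus.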

@vikramparalkar
Owner

vikramparalkar commented Sep 25, 2024 via email

@Xuyuch
Author

Xuyuch commented Sep 25, 2024

Hi Vikram.
Thanks for clarifying. I get what you mean, and I understand that those reads should not map to the customized genome given the masking method you used. But that is what worries me: according to RefSeq, it is an 18S rRNA, yet the results show that it maps neither to chrR nor to the autosomes.

Here is how I found Rn18s-rs5. In short, I found it in the unmapped reads after mapping to the genome from the GitHub repository.

I used the custom genome from the GitHub repository (Mouse_mm10-rDNA_genome_v1.0.tar.gz) and collected all mapped rRNA reads. I then double-checked the remaining unmapped reads, which I assumed should not contain any rRNA reads. However, among the unmapped reads I found Rn18s-rs5, which should be an rRNA gene.

In other words, I assumed all rRNA reads would align perfectly to chrR, but some reads that map to Rn18s-rs5 do not map to chrR and are left among the unmapped reads.
I hope I have explained my concern clearly enough. Please let me know if my reasoning makes sense to you.

@vikramparalkar
Owner

vikramparalkar commented Sep 26, 2024 via email

@Xuyuch
Author

Xuyuch commented Sep 26, 2024

Hi Vikram:
Yep! I ran the Bowtie2 alignment twice, and the second run did not give me any mapped reads, so I think STAR may align more reads in some specific circumstances.

I raised this point because Rn18s-rs5 reads seem dominant in my unmapped reads. Of the 2,932,041 unmapped reads I put into STAR (around 30% of total reads), 457,111 are Rn18s-rs5 reads. Because the data come from a ribosome footprinting experiment, most of the reads are rRNA. Rn18s-rs5 reads are not a large fraction of total counts. However, for a single gene, its count is far larger than that of a typical mRNA (for example Actb, a typical housekeeping gene, has only 4,166 reads) and comparable to most of our 18S rRNA reads that aligned to the chrR from the GitHub repository.
Thank you!
Yuchen
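
The proportions implied by the counts quoted above can be worked out directly. The sketch below uses only the three numbers from the comment; the variable names are illustrative.

```shell
# Read counts quoted in the comment above.
unmapped=2932041   # unmapped reads passed to STAR
rn18s=457111       # reads counted for Rn18s-rs5
actb=4166          # reads counted for Actb

# Share of the STAR input taken by Rn18s-rs5, in percent.
pct=$(awk -v r="$rn18s" -v u="$unmapped" 'BEGIN { printf "%.1f", 100 * r / u }')
# Rn18s-rs5 count relative to Actb.
fold=$(awk -v r="$rn18s" -v a="$actb" 'BEGIN { printf "%.1f", r / a }')

echo "Rn18s-rs5: ${pct}% of STAR input, ${fold}x Actb"
```

With these inputs, Rn18s-rs5 accounts for about 15.6% of the reads passed to STAR and about 110 times the Actb count.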

@vikramparalkar
Owner

vikramparalkar commented Sep 26, 2024 via email
