Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clonotype counting with lower cell cell count #321

Open
marcoco90 opened this issue Oct 23, 2024 · 3 comments
Open

Clonotype counting with lower cell cell count #321

marcoco90 opened this issue Oct 23, 2024 · 3 comments

Comments

@marcoco90
Copy link

Hello team,

We recently processed some T-cells samples with the Takara SMART-Seq Human TCR (with UMIs).

We used the command "run-trust4 --barcodeLevel molecule -f $Genome/hg38_bcrtcr.fa --ref $Genome/human_IMGT+C.fa -1 $read1 -2 $read2 --barcode $read2 --readFormat bc:0:11,r2:19:-1 -o $samid -t 10".

We accounted and corrected for UMIs to remove PCR duplicates.
However when looking at the number of TRB clonotypes, we obtained way more then the cell input number used for the experiment.

Sample Name Cell Count Reads Obtained Clonotype TRB
121242762 291 1,644,198 51,754
121488195 629 718,832 14,654
120483253 696 390,422 8,183
121109290 4,079 126,394 414
121155824 382 136,938 2,185
121552040 282 75,900 1,175
121545458 255 388,964 10,504
121605112 859 71,206 2
121605113 1,339 540,868 12,340

This number is directly taken from the script trust-stats.py

In your experience have you observed a higher number of clonotypes respect to the number of cells?
In theory we should always obtain clonotypes <= cell number, so when counting unique CDR3 that should be less than the cell number(?).
Even if I count the unique CDR3 (clonotypes) within each sample, the number is still higher then the number of cells as input

Is something that am I missing?
I am trying to work backwards to the explanation making sure the number of clonotype obtained from the TRUST4 is correct and not overestimating.

Thanks in advance for the help.

Best,
Marco

@mourisl
Copy link
Collaborator

mourisl commented Oct 23, 2024

Have you tried to filter the results? Some UMIs may only have one or two reads and are likely to be errors.

@marcoco90
Copy link
Author

@mourisl No I have not. Just filtered the out of frame CDR3. How do you filter using based on reads? Is there any output that has the number of reads associated to a specific CDR3 or umi?

@mourisl
Copy link
Collaborator

mourisl commented Oct 24, 2024

If you are using the barcode_report/airr file, there is an entry in the format represent the number of reads support the CDR3 in this barcode/UMI. You can filter based on that. If you are using the aggregated report file, you can regenerate the report file using the trust-simplerep.pl script with the same parameter as the running log but with an additional option "--filterBarcoderepReadCnt" to ignore UMIs with less than specified threshold's read support.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants