Clonotype counting with lower cell count #321

marcoco90 · 2024-10-23T02:21:23Z

Hello team,

We recently processed some T-cell samples with the Takara SMART-Seq Human TCR kit (with UMIs).

We used the command "run-trust4 --barcodeLevel molecule -f $Genome/hg38_bcrtcr.fa --ref $Genome/human_IMGT+C.fa -1 $read1 -2 $read2 --barcode $read2 --readFormat bc:0:11,r2:19:-1 -o $samid -t 10".

We accounted and corrected for UMIs to remove PCR duplicates.
However, when looking at the number of TRB clonotypes, we obtained far more than the number of input cells used for the experiment.

| Sample Name | Cell Count | Reads Obtained | TRB Clonotypes |
|---|---|---|---|
| 121242762 | 291 | 1,644,198 | 51,754 |
| 121488195 | 629 | 718,832 | 14,654 |
| 120483253 | 696 | 390,422 | 8,183 |
| 121109290 | 4,079 | 126,394 | 414 |
| 121155824 | 382 | 136,938 | 2,185 |
| 121552040 | 282 | 75,900 | 1,175 |
| 121545458 | 255 | 388,964 | 10,504 |
| 121605112 | 859 | 71,206 | 2 |
| 121605113 | 1,339 | 540,868 | 12,340 |

These numbers are taken directly from the trust-stats.py script.

In your experience, have you observed a higher number of clonotypes relative to the number of cells?
In theory we should always obtain clonotypes <= cell number, so the count of unique CDR3s should not exceed the cell number(?).
Even if I count the unique CDR3s (clonotypes) within each sample, the number is still higher than the number of input cells.
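To make the size of the discrepancy concrete, here is a small sketch that computes clonotypes per input cell from the table above. The numbers are copied from that table; the helper name `clonotypes_per_cell` is only illustrative and is not part of TRUST4.

```python
# (sample name, input cell count, TRB clonotypes), copied from the table above
samples = [
    ("121242762", 291, 51754),
    ("121488195", 629, 14654),
    ("120483253", 696, 8183),
    ("121109290", 4079, 414),
    ("121155824", 382, 2185),
    ("121552040", 282, 1175),
    ("121545458", 255, 10504),
    ("121605112", 859, 2),
    ("121605113", 1339, 12340),
]


def clonotypes_per_cell(cells, clonotypes):
    """Return the number of reported clonotypes divided by the input cell count."""
    return clonotypes / cells


# Samples where the reported clonotypes exceed the number of input cells
exceeding = [name for name, cells, clono in samples if clono > cells]

for name, cells, clono in samples:
    flag = "exceeds cells" if clono > cells else "within cells"
    print(f"{name}\t{clonotypes_per_cell(cells, clono):.2f}\t{flag}")

print(f"{len(exceeding)} of {len(samples)} samples exceed the input cell count")
```

A ratio above 1 is what I would not expect if every clonotype came from a real cell.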

Is there something I am missing?
I am trying to work backwards to an explanation, to make sure the number of clonotypes obtained from TRUST4 is correct and not an overestimate.

Thanks in advance for the help.

Best,
Marco

mourisl · 2024-10-23T03:43:45Z

Have you tried to filter the results? Some UMIs may only have one or two reads and are likely to be errors.

marcoco90 · 2024-10-23T06:40:20Z

@mourisl No, I have not. I only filtered out the out-of-frame CDR3s. How do you filter based on reads? Is there any output that gives the number of reads associated with a specific CDR3 or UMI?

mourisl · 2024-10-24T03:09:30Z

If you are using the barcode_report/airr file, there is a field in the format that gives the number of reads supporting the CDR3 in that barcode/UMI. You can filter based on that. If you are using the aggregated report file, you can regenerate it with the trust-simplerep.pl script, using the same parameters as in the run log plus the additional option "--filterBarcoderepReadCnt", which ignores UMIs with less read support than the specified threshold.
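The read-support filter on the barcode-level file could be sketched roughly as below. This is only an illustration: the column names `junction_aa` and `consensus_count` are assumptions based on AIRR-style naming and should be checked against the header of your own file, and the function `count_supported_cdr3` and the toy rows are hypothetical, not real TRUST4 output.

```python
import csv
import io


def count_supported_cdr3(tsv_text, min_reads,
                         cdr3_col="junction_aa",       # assumed column name
                         reads_col="consensus_count"):  # assumed column name
    """Count unique CDR3s before and after dropping low-support rows.

    Each row is taken to describe one barcode/UMI; a row is kept when its
    read support is at least min_reads.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    all_cdr3, kept_cdr3 = set(), set()
    for row in reader:
        cdr3 = row[cdr3_col]
        if not cdr3:
            continue  # skip rows without a CDR3 call
        all_cdr3.add(cdr3)
        # float() tolerates both integer and fractional read counts
        if float(row[reads_col]) >= min_reads:
            kept_cdr3.add(cdr3)
    return len(all_cdr3), len(kept_cdr3)


# Toy rows, made up for illustration only
toy = "\n".join([
    "cell_id\tjunction_aa\tconsensus_count",
    "UMI1\tCASSLGTDTQYF\t25",
    "UMI2\tCASSLGTDTQYF\t18",
    "UMI3\tCASSLGADTQYF\t1",
    "UMI4\tCASSPGTDTQYF\t2",
    "UMI5\tCASSIRSSYEQYF\t9",
]) + "\n"

before, after = count_supported_cdr3(toy, min_reads=3)
print(f"unique CDR3 before filter: {before}, after filter: {after}")
```

For a real run, the text of the barcode-level file would be passed in place of `toy`. The threshold of 3 is arbitrary here; trying a few values and watching how quickly the unique CDR3 count drops is one way to choose it.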
