From 74ee18c2019e92728ba11d81f22f44642045f73a Mon Sep 17 00:00:00 2001
From: Alexander Shlemov
Date: Tue, 21 Mar 2017 05:30:46 +0300
Subject: [PATCH] Man updated

---
 barcodedIgReC_manual.html | 304 ++++++++++++++++++
 .../BarcodedIgReC_pipeline.png | Bin 0 -> 15526 bytes
 igquast_manual.html | 74 +++--
 3 files changed, 340 insertions(+), 38 deletions(-)
 create mode 100644 barcodedIgReC_manual.html
 create mode 100644 docs/barigrec_figures/BarcodedIgReC_pipeline.png

diff --git a/barcodedIgReC_manual.html b/barcodedIgReC_manual.html
new file mode 100644
index 00000000..f23eccfe
--- /dev/null
+++ b/barcodedIgReC_manual.html
@@ -0,0 +1,304 @@
+ BarcodedIgReC Manual

BarcodedIgReC manual

+1. What is BarcodedIgReC?
+ +2. Installation
+    2.1. Verifying your installation
+ +3. BarcodedIgReC usage
+    3.1. Basic options
+    3.2. Advanced options
+    3.3. Examples
+    3.4. Output files
+4. Citations
+ +5. Feedback and bug reports
+ + + +

1. What is BarcodedIgReC?

+BarcodedIgReC is a modification of the IgReC full-length antibody repertoire construction tool for barcoded datasets. The BarcodedIgReC pipeline is shown below:
+

+ BarcodedIgReC_pipeline +

+ +

Input:

+

+ BarcodedIgReC takes as input demultiplexed paired-end or single-end reads with unique molecular identifiers (UMIs). Please note that IgRepertoireConstructor constructs a full-length repertoire and expects input reads to cover the variable region of the antibody/TCR.

+ +

Output:

+

+ BarcodedIgReC corrects sequencing and amplification errors and groups together reads corresponding to identical antibodies. Thus, the constructed repertoire is a set of antibody clusters characterized by sequence, read multiplicity and molecule multiplicity. Read multiplicity is the number of reads in an antibody cluster, while molecule multiplicity is an estimate of the number of RNA molecules related to the cluster. BarcodedIgReC provides the user with the following information about the constructed repertoire:

+ + + +
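The difference between the two multiplicities can be illustrated with a small Python sketch. It is not BarcodedIgReC code: the `Read` record and both helper functions are hypothetical, and molecule multiplicity is estimated naively as the number of distinct UMIs in a cluster, assuming the UMIs are already error-free. The actual estimate comes from the barcode clustering stage, which also handles barcode errors and barcodes shared by unrelated molecules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Read:
    # Hypothetical read record: read ID and its (assumed already corrected) UMI.
    read_id: str
    umi: str


def read_multiplicity(cluster: list[Read]) -> int:
    """Number of reads assigned to the antibody cluster."""
    return len(cluster)


def naive_molecule_multiplicity(cluster: list[Read]) -> int:
    """Naive estimate: number of distinct UMIs among the cluster's reads.

    Assumes each UMI is error-free and unique per molecule; this is only an
    illustration, not how BarcodedIgReC computes molecule multiplicity.
    """
    return len({read.umi for read in cluster})


cluster = [
    Read("r1", "ACGTAC"),
    Read("r2", "ACGTAC"),  # amplified copy of the same molecule
    Read("r3", "TTGCAA"),
    Read("r4", "TTGCAA"),
    Read("r5", "GGATCC"),
]
print(read_multiplicity(cluster), naive_molecule_multiplicity(cluster))  # 5 3
```

Five reads originate from three tagged molecules here, so read multiplicity overstates the molecule count whenever molecules are amplified unevenly.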

Stages:

+The BarcodedIgReC pipeline consists of the following steps:
  1. VJ Finder: cleaning input reads using alignment against Ig germline genes.
  2. Barcode clustering: correcting errors in reads sharing a barcode, correcting barcode errors, and handling identical and close barcodes assigned to unrelated molecules. Chimeric reads are also discarded at this step. As a result, all reads are grouped by their original molecules.
  3. IgReC: grouping very close molecules, thus correcting minor remaining errors. This step also forms the output.
+ + + + + + + + + +
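A heavily simplified Python sketch of the barcode clustering idea: reads sharing a barcode are grouped, and a per-position majority vote over each group removes isolated sequencing and amplification errors. This is an illustration only, not the algorithm BarcodedIgReC implements. It assumes reads are given as hypothetical `(barcode, sequence)` tuples of equal length, and it skips barcode error correction, splitting of unrelated molecules and chimera detection.

```python
from collections import Counter, defaultdict


def group_by_barcode(reads):
    """Group (barcode, sequence) pairs into {barcode: [sequences]}."""
    groups = defaultdict(list)
    for barcode, sequence in reads:
        groups[barcode].append(sequence)
    return dict(groups)


def majority_consensus(sequences):
    """Column-wise majority vote over equal-length sequences.

    Ties go to the base seen first in the column, because
    Counter.most_common orders equal counts by first occurrence.
    """
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("this sketch requires equal-length sequences")
    return "".join(Counter(column).most_common(1)[0][0] for column in zip(*sequences))


reads = [
    ("AAAA", "ACGTACGT"),
    ("AAAA", "ACGTTCGT"),  # one error at position 4
    ("AAAA", "ACGTACGT"),
    ("CCCC", "TTGGCCAA"),
]
consensuses = {bc: majority_consensus(seqs) for bc, seqs in group_by_barcode(reads).items()}
print(consensuses)  # {'AAAA': 'ACGTACGT', 'CCCC': 'TTGGCCAA'}
```

Majority voting only works when most reads of a molecule are correct at each position, which is why barcodes with very few reads give weaker correction.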

2. Installation

+ +BarcodedIgReC has the following dependencies: + + +To install BarcodedIgReC, type: +
+    
+    ./prepare_cfg
+    
+
+and: +
+    
+    make
+    
+
+ + +

2.1. Verifying your installation

+For testing purposes, BarcodedIgReC comes with a toy dataset.

+ +To try BarcodedIgReC on the test dataset, run:

+    ./barcoded_igrec.py --test
+
+
+ +If the installation of BarcodedIgReC is successful, you will find the following information at the end of the log: + +
+    
+    Thank you for using BarcodedIgReC!
+    Log was written to barigrec_test/igrc.log
+    
+
+ + + + + +

3. BarcodedIgReC usage

+

+ BarcodedIgReC takes as input demultiplexed barcoded Illumina reads covering the variable region of antibodies and constructs a repertoire in CLUSTERS.FA and RCM formats.

+ +To run BarcodedIgReC, type: +
+    
+    ./barcoded_igrec.py [options] -s <single_reads.fastq> -o <output_dir>
+    
+
+ +OR + +
+    
+    ./barcoded_igrec.py [options] -1 <left_reads.fastq> -2 <right_reads.fastq> -o <output_dir>
+    
+
+ + + + +

3.1. Basic options:

+ +-s <single_reads.fastq>
+FASTQ file with single-end Illumina reads (required unless -1 and -2 are specified).

+ +-1 <left_reads.fastq> -2 <right_reads.fastq>
+FASTQ files with paired-end Illumina reads (required unless -s is specified).

+ +-o / --output <output_dir>
+Output directory (required). + +

+ +-t / --threads <int>
+The number of parallel threads. The default value is 16. + +

+ +--test
+Runs BarcodedIgReC on the toy test dataset. The test run is equivalent to the following command line:
+    
+    ./barcoded_igrec.py -s test_dataset/barcodedIgReC_test.fasta -l all -o barigrec_test
+    
+
+ +--loci / -l <str>
+Immunological loci to align input reads against; reads with low alignment scores are discarded.
+Available values are IGH / IGL / IGK / IG (for all BCRs) / TRA / TRB / TRG / TRD / TR (for all TCRs) or all. This is a required parameter.

+ +--help
+Prints help.

+ + + + +

3.2. Advanced options:

+ +--organism <str>
+Organism whose germline genes are used. Available values are human, mouse, pig, rabbit, rat and rhesus_monkey. The default value is human.

+ +--igrec-tau <int>
+Maximal allowed number of mismatches between two barcode cluster consensuses corresponding to identical antibodies. The default (and recommended) value is 2. This value allows each of the two barcode cluster consensuses to contain a single error, so two consensuses of the same antibody may differ in up to two positions. Higher values can reduce the advantage of barcoding. Lower values may produce better results for large clusters because close sequences are not glued together; at the same time, small clusters can suffer from undercorrection in this case.

+ +--clustering-thr
+Maximal allowed distance between reads sharing a barcode for putting them into one cluster. The default value is 20. Our analysis shows that this value both keeps unrelated antibodies out of the same cluster and corrects all amplification errors. You can increase this value for overamplified datasets to ensure better error correction, or decrease it for datasets with high clonality to better distinguish close antibodies.
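The two advanced parameters act at different stages, which the following Python sketch tries to make concrete under stated assumptions. It uses Hamming distance and single-linkage grouping (through a small union-find) to split reads sharing a barcode with a `clustering_thr`-like threshold, then applies an `igrec_tau`-like threshold to molecule consensuses. The distance measure, the linkage rule and all function names are assumptions for illustration, not BarcodedIgReC's implementation; the toy threshold of 2 stands in for the default of 20 used on full-length reads.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def split_barcode_group(sequences: list[str], clustering_thr: int) -> list[list[str]]:
    """Single-linkage split of reads that share one barcode (hypothetical helper).

    Reads within clustering_thr of each other (directly or through a chain
    of such reads) land in one cluster; distant reads, e.g. unrelated
    molecules that received the same barcode, are separated.
    """
    parent = list(range(len(sequences)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(sequences)), 2):
        if hamming(sequences[i], sequences[j]) <= clustering_thr:
            parent[find(i)] = find(j)

    clusters: dict[int, list[str]] = {}
    for i, seq in enumerate(sequences):
        clusters.setdefault(find(i), []).append(seq)
    return list(clusters.values())


def same_antibody(consensus_a: str, consensus_b: str, igrec_tau: int = 2) -> bool:
    """Glue two molecule consensuses if they differ in at most igrec_tau positions."""
    return hamming(consensus_a, consensus_b) <= igrec_tau


# Two reads of one molecule plus an unrelated molecule with the same barcode.
print(split_barcode_group(["AAAAAAAA", "AAAAAAAT", "CCCCCCCC"], clustering_thr=2))
# [['AAAAAAAA', 'AAAAAAAT'], ['CCCCCCCC']]

# Each consensus carries one error relative to ACGTACGTAC, so they differ in 2 positions.
print(same_antibody("ACGTACGTAA", "TCGTACGTAC"))  # True
```

The last call shows the reasoning behind the default `--igrec-tau` of 2: two consensuses with one independent error each are still recognized as the same antibody.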

3.3. Examples

+To construct an antibody repertoire from single-end reads stored in reads.fastq, type:
+    
+    ./barcoded_igrec.py -s reads.fastq -o output_dir -l all
+    
+
+ + + +

3.4. Output files

+BarcodedIgReC creates an output directory (whose name is specified using option -o) and outputs the following files there:
+ + +

4. Citations

+If you use BarcodedIgReC in your research, please cite:

+ Alexander Shlemov, Sergey Bankevich, Andrey Bzikadze, + Dmitriy M. Chudakov, + Yana Safonova, + and Pavel A. Pevzner. + Reconstructing antibody repertoires from error-prone immunosequencing datasets (submitted) +

+ + +

5. Feedback and bug reports

+Your comments, bug reports, and suggestions are very welcome. +They will help us to further improve BarcodedIgReC. +

+If you have any trouble running BarcodedIgReC, please send us the log file from the output directory. +

+Address for communications: igtools_support@googlegroups.com.

diff --git a/docs/barigrec_figures/BarcodedIgReC_pipeline.png b/docs/barigrec_figures/BarcodedIgReC_pipeline.png
new file mode 100644
index 0000000000000000000000000000000000000000..796383fee1107aad800c4c1f889666d5e0029377
GIT binary patch
literal 15526
1. What is IgQUAST?
IgQUAST can be used for benchmarking of adaptive immune repertoire construction tools and for quality estimation of constructed repertoires. IgQUAST performs reference-based and reference-free analysis:
    -
  • During reference-based analysis the tool compares two input repertoires, the reference repertoire and the constructed repertoire. - This analysis is separated into two scenarios: +
  • During reference-based analysis the tool compares two input repertoires: the reference repertoire and the constructed repertoire. + The analysis is separated into two scenarios:
      -
    • Repertoire-to-repertoire matching works with repertoire sequences only. - It aligns each from two input repertoires against another one and computes metrics like sensitivity and precision, +
    • Repertoire-to-repertoire matching only uses repertoire sequences. + The tool aligns each of two repertoires against the other one and computes + sensitivity and precision metrics, detects error positions in erroneously constructed sequences, - and compare reference and constructed abundances for ideally reconstructed sequences. + and compares reference and constructed abundances for ideally reconstructed sequences. +
    • -
    • Partition-based analysis works with initial reads partitions (RCMs, read-to-cluster maps) only. - It compares two partitions and computes partition similarity metrics (like Rand index). +
    • Partition-based analysis only uses partitions induced by the RCMs (read-to-cluster maps). The tool compares two partitions and computes partition similarity metrics (such as the Rand index). It also computes cluster quality measures (such as purity and discordance) and plots their distributions for both input repertoires.
  • -
  • Reference-free analysis is performed to the constructed repertoire. +
  • Reference-free analysis is performed on the constructed repertoire. The tool detects overestimated clusters in the repertoire using an amplification-free Poisson model. - Also it estimates error rate and error profile of the initial read library. - In case of the reference repertoire is provided, the same analysis is performed for it. - All reference-free analysis presented in the corresponding scenario requiring both repertoire sequences and read-to-cluster map (RCM). + It also estimates the error rate and error profile of the initial read library. + The same analysis is performed on the reference repertoire if it is provided. + All reference-free analyses require both repertoire sequences and a read-to-cluster map (RCM).
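The Rand index mentioned for partition-based analysis is the fraction of read pairs on which two partitions agree: both place the pair in one cluster, or both place it in different clusters. Below is a minimal Python sketch, assuming each RCM has already been parsed into a hypothetical `dict` mapping read IDs to cluster IDs. It enumerates all pairs, so it is quadratic in the number of reads and only meant for small examples; IgQUAST's own implementation may compute the index differently.

```python
from itertools import combinations


def rand_index(rcm_a: dict[str, str], rcm_b: dict[str, str]) -> float:
    """Rand index of two read partitions given as read -> cluster maps.

    Only reads present in both maps are compared. The result is the
    fraction of read pairs on which the two partitions agree.
    """
    reads = sorted(rcm_a.keys() & rcm_b.keys())
    pairs = list(combinations(reads, 2))
    if not pairs:
        raise ValueError("need at least two reads shared by both maps")
    agreements = sum(
        (rcm_a[x] == rcm_a[y]) == (rcm_b[x] == rcm_b[y]) for x, y in pairs
    )
    return agreements / len(pairs)


constructed = {"r1": "c1", "r2": "c1", "r3": "c2", "r4": "c2"}
reference = {"r1": "a", "r2": "a", "r3": "a", "r4": "b"}
print(rand_index(constructed, reference))  # 0.5
```

A value of 1.0 means the constructed clusters induce exactly the same read partition as the reference.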
@@ -82,9 +85,9 @@

1.1. Input

IgQUAST takes as input repertoire sequences in FASTA format and a read-to-cluster map (RCM) file in a special format. See the IgRepertoireConstructor manual - for the comprehensive repertoire format description. One of these two - files could be missing, in this case the tool optionally tries to reconstruct it using - available information. + for the comprehensive repertoire format description. + If you have only one of these files, + the tool can reconstruct it using available information (use the --reconstruct option).

1.2. Output

IgQUAST reports: @@ -117,16 +120,16 @@

2. Installation



Please verify your IgQUAST installation - prior to initiate the IgQUAST: + before the first run of IgQUAST:
 
     ./igquast.py --test
 
-
If the installation is successful, you will find the following information at the end of the log: + If the installation is successful, you will find the following information at the end of the log:
 
-  Thank you for using IgQUAST!
+  Thank you for using IgQUAST!
   Log was written to igquast_test/igquast.log
 
 
@@ -141,19 +144,19 @@

3. IgQUAST usage

3.1. Input options

- -c / --constructed-repertoire <constructed repertoire FASTA>
FASTA file with constructed repertoire sequences. Could be gzipped. + -c / --constructed-repertoire <constructed repertoire FASTA>
FASTA file with constructed repertoire sequences. Can be gzipped.

- -C / --constructed-rcm <constructed repertoire RCM>
RCM file with constructed repertoire read-cluster map. Could be gzipped. + -C / --constructed-rcm <constructed repertoire RCM>
RCM file with constructed repertoire read-cluster map. Can be gzipped.

- -r / --reference-repertoire <reference repertoire FASTA>
FASTA file with reference repertoire sequences. Could be gzipped. + -r / --reference-repertoire <reference repertoire FASTA>
FASTA file with reference repertoire sequences. Can be gzipped.

- -R / --reference-rcm <reference repertoire RCM>
RCM file with reference repertoire read-cluster map. Could be gzipped. + -R / --reference-rcm <reference repertoire RCM>
RCM file with reference repertoire read-cluster map. Can be gzipped.

- -s / --initial-reads <initial reads>
Initial Rep-seq reads in FASTA or FASTQ format. Could be gzipped. + -s / --initial-reads <initial reads>
Initial Rep-seq reads in FASTA or FASTQ format. Can be gzipped.

--reconstruct | --no-reconstruct
Whether to reconstruct missing repertoire files if possible. Disabled by default. @@ -172,7 +175,7 @@

3.2. Output options



-F / --figure-format <figure format(s)>
Figure - format(s) for outputted plots. Allowed values are png, + format(s) for plots. Allowed values are png, pdf and svg. Several values can be passed, separated by commas. An empty string means that no plots are produced. @@ -188,15 +191,10 @@

3.3. Performed scenarios



--export-bad-clusters | --no-export-bad-clusters
Whether to export untrustworthy clusters during reference-free analysis. Disabled by default.

- --experimental | --no-experimental
Enable/disable experimental features. Disabled by default. -

3.4. Algorithm parameters

- --tau <positive integer>
Maximal distance for repertoire-to-repertoire matching. Default value is 6. -

- --reference-size-cutoff <positive integer>
Cutoff - for reference cluster size. Lesser reference clusters are discarded during + for reference cluster size. Smaller reference clusters are discarded during repertoire-to-repertoire comparison. Default value is 5.
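One way to picture how size thresholds shape sensitivity and precision is the Python sketch below. It assumes exact sequence matching between repertoires (IgQUAST aligns the repertoires against each other, so its matching is more elaborate). Each repertoire is represented as hypothetical `(sequence, cluster_size)` pairs, and reference clusters below a cutoff are dropped in the spirit of `--reference-size-cutoff`.

```python
def sensitivity_precision(constructed, reference,
                          constructed_size_threshold=1, reference_size_cutoff=5):
    """Exact-match sensitivity and precision of a constructed repertoire.

    Both repertoires are lists of (sequence, cluster_size) pairs. Reference
    clusters smaller than reference_size_cutoff and constructed clusters
    smaller than constructed_size_threshold are ignored.
    """
    ref = {seq for seq, size in reference if size >= reference_size_cutoff}
    con = {seq for seq, size in constructed if size >= constructed_size_threshold}
    if not ref or not con:
        raise ValueError("a repertoire is empty after size filtering")
    matched = ref & con
    sensitivity = len(matched) / len(ref)  # share of reference sequences recovered
    precision = len(matched) / len(con)    # share of constructed sequences confirmed
    return sensitivity, precision


reference = [("AAA", 10), ("CCC", 7), ("GGG", 2)]  # GGG falls below the cutoff of 5
constructed = [("AAA", 9), ("CCC", 1), ("TTT", 4)]
for threshold in (1, 3, 5):
    print(threshold, sensitivity_precision(constructed, reference, threshold))
```

Raising the constructed-size threshold removes small, often erroneous clusters and tends to raise precision at the cost of sensitivity, which is the trade-off the sensitivity_precision plot is meant to show.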

@@ -224,40 +222,41 @@

3.5. Examples

-

3.6. Output files

IgQUAST creates working directory - (which name is specified using option -o) and outputs the following files there: +

3.6. Output files

IgQUAST creates output directory + (its name is specified using option -o) and outputs the following files there:
  • reference_based — Directory with reference-based plots:
    • error_position_distribution — distribution of error positions for constructed repertoire sequences reconstructed with only one error. - This plot helps to detect sequencing technology and repertoire construction strategy artifacts
    • + This plot helps to detect sequencing technology artifacts and repertoire construction strategy artifacts
    • sensitivity_precision — sensitivity and precision depending on cluster size threshold for the constructed repertoire
    • -
    • distance_distribution — constructed to reference and reference to constructed distance distribution depending on cluster size threshold for the constructed repertoire. 8 plots in all on one figure
    • +
    • distance_distribution — constructed-to-reference and reference-to-constructed distance distributions depending on the cluster size threshold for the constructed repertoire (8 plots in one figure)
    • {constructed_to_reference,reference_to_constructed}_distance_distribution_size_{1,3,5,10} — the same 8 plots separately
    • abundance_distributions — cluster size distribution for the constructed and reference repertoires
    • +
    • abundance_distributions_log — the same plot with logarithmic Y-scale
    • cluster_abundances_scatterplot — scatterplot of constructed cluster sizes against reference cluster sizes for ideally reconstructed clusters
    • constructed_purity_distribution — distribution of cluster purity in the constructed repertoire. This plot helps to detect overcorrection
    • constructed_purity_distribution_large — the same plot for large clusters only
    • -
    • constructed_purity_distribution_large_ylog — the same plot with logarithmic Y-scale
    • +
    • reference_discordance_distribution — distribution of constructed cluster discordance (relative contribution of the second most popular reference cluster to the particular constructed cluster). This plot helps to detect overcorrection
    • reference_discordance_distribution_large — the same plot for large clusters only
    • -
    • reference_discordance_distribution_large_ylog — the same plot with logarithmic Y-scale
    • +
    • reference_purity_distribution — distribution of cluster purity in the reference repertoire. This plot helps to detect undercorrection
    • reference_purity_distribution_large — the same plot for large clusters only
    • -
    • reference_purity_distribution_large_ylog — the same plot with logarithmic Y-scale
    • +
    • constructed_discordance_distribution — distribution of reference cluster discordance (relative contribution of the second most popular constructed cluster to the particular reference cluster). This plot helps to detect undercorrection
    • constructed_discordance_distribution_large — the same plot for large clusters only
    • -
    • constructed_discordance_distribution_large_ylog — the same plot with logarithmic Y-scale
    • +
  • @@ -293,7 +292,6 @@
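The purity and discordance plots can be related to RCMs with the following Python sketch. Discordance follows the description in the list above; "relative contribution" is assumed to mean the share of the cluster's reads, and purity, which is not defined here, is assumed to be the share of the most popular cluster of the other partition. RCMs are assumed to be parsed into hypothetical read-to-cluster dicts. Swapping the two arguments gives the reference-side measures.

```python
from collections import Counter


def purity_and_discordance(rcm_clusters, rcm_other):
    """Per-cluster purity and discordance of one partition w.r.t. another.

    rcm_clusters and rcm_other map read IDs to cluster IDs. For each cluster
    of rcm_clusters, count the clusters of rcm_other its reads fall into:
    purity is the share of the most popular one, discordance the share of
    the second most popular one (0.0 if there is none).
    """
    members = {}
    for read, cluster in rcm_clusters.items():
        if read in rcm_other:
            members.setdefault(cluster, Counter())[rcm_other[read]] += 1
    result = {}
    for cluster, counts in members.items():
        total = sum(counts.values())
        top = [n for _, n in counts.most_common(2)]
        purity = top[0] / total
        discordance = top[1] / total if len(top) > 1 else 0.0
        result[cluster] = (purity, discordance)
    return result


constructed = {"r1": "c1", "r2": "c1", "r3": "c1", "r4": "c1", "r5": "c2"}
reference = {"r1": "a", "r2": "a", "r3": "a", "r4": "b", "r5": "b"}
print(purity_and_discordance(constructed, reference))
# {'c1': (0.75, 0.25), 'c2': (1.0, 0.0)}
```

Here constructed cluster c1 mixes reads of two reference clusters, so its discordance is nonzero, the pattern that points to overcorrection.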

    3.6. Output files

    IgQUAST--reference-free to enable it. - Experimental plots are not described here.

    4. Citations