From b7e6c7f93040ba2252e824c2af7428f7fac85872 Mon Sep 17 00:00:00 2001
From: Takuya Kitazawa
Date: Fri, 25 Nov 2022 09:43:29 -0800
Subject: [PATCH] Complete the first draft of JuliaCon proceeding paper

---
 paper/images/tradeoff.pdf    | Bin 0 -> 16575 bytes
 paper/paper.tex              |   4 +
 paper/ref.bib                | 439 +++++++++++++++++------------------
 paper/section/algorithm.tex  |  72 +++---
 paper/section/conclusion.tex |   5 +
 paper/section/data.tex       |   4 +-
 paper/section/evaluation.tex |  98 ++++----
 paper/section/experiment.tex |  49 ++++
 8 files changed, 365 insertions(+), 306 deletions(-)
 create mode 100644 paper/images/tradeoff.pdf
 create mode 100644 paper/section/conclusion.tex
 create mode 100644 paper/section/experiment.tex

diff --git a/paper/images/tradeoff.pdf b/paper/images/tradeoff.pdf
new file mode 100644
index 0000000000000000000000000000000000000000..3eeab281ed0baea75f6ca4c3fc4597151c94a2c0
GIT binary patch
literal 16575
diff --git a/paper/section/algorithm.tex b/paper/section/algorithm.tex

 [item# => popularity] : [4 => 4.0, 6 => 4.0]
 \end{lstlisting}

-As of writing, the other non-personalized options implemented in the package recommend items: that are most frequently co-occurred with a specific reference item (\texttt{CoOccurrence}), based on a percentage of observed \texttt{Event} values that are greater than a certain threshold (\texttt{ThresholdPercentage}), or based on a global mean of observed \texttt{Event} values (\texttt{UserMean}, \texttt{ItemMean}).
+As of writing, the other non-personalized options implemented in the package recommend items that are most frequently co-occurring with a specific reference item (\texttt{CoOccurrence}), based on the percentage of observed \texttt{Event} values that are greater than a certain threshold (\texttt{ThresholdPercentage}), or based on a global mean of observed \texttt{Event} values (\texttt{UserMean}, \texttt{ItemMean}).
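To give a feel for these options, the scoring rule behind \texttt{ItemMean} can be sketched in a few lines of plain Julia. The snippet below is a simplified standalone example on a hypothetical rating matrix (\texttt{R} and \texttt{item\_mean} are illustrative names), not the package's implementation:

\begin{lstlisting}[language = Julia]
# hypothetical 3x6 rating matrix; zeros denote missing values
R = [1 0 0 2 5 0;
     0 4 0 3 4 0;
     2 0 0 5 4 3]

# mean of the observed (nonzero) ratings in one item column
item_mean(r) = sum(r) / max(count(!iszero, r), 1)

# score every item and rank the highest-mean items first
scores = [item_mean(R[:, i]) for i in 1:size(R, 2)]
ranking = sortperm(scores, rev=true)
\end{lstlisting}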
\subsection{Collaborative Filtering}
\label{sec:cf}

-Collaborative filtering (CF) is one of the earliest recommendation techniques that was initially introduced in 1992 \cite{Goldberg1992}. The goal of CF algorithm is to suggest new items for a particular user based on a similarity metric. From a users' perspective, CF assumes that users who behaved similarly on a service share common tastes for items. On the other hand, items which resemble each other are likely to be preferred by the same users.
+Collaborative filtering (CF) is one of the earliest recommendation techniques, initially introduced in 1992 \cite{Goldberg1992}. The goal of the CF algorithm is to suggest new items for a particular user based on a similarity metric. From a user's perspective, CF assumes that users who behaved similarly on a service share common tastes for items. On the other hand, items which resemble each other are likely to be preferred by the same users.

\subsubsection{$k$-Nearest Neighbor}

-A $k$-nearest neighbor ($k$-NN) approach, one of the simplest CF algorithms, runs in two-fold. First, missing values in $R$ is predicted based on the past observations. Here, a $(u, i)$ element between a target user $u$ and item $i$ is estimated by computing the similarities of users (items). Second, a recommender chooses top-$N$ items from the results of the prediction step.
+A $k$-nearest neighbor ($k$-NN) approach, one of the simplest CF algorithms, proceeds in two steps. First, missing values in $R$ are predicted based on past observations. Here, a $(u, i)$ element between a target user $u$ and item $i$ is estimated by computing the similarities of users (items). Second, a recommender chooses the top-$N$ items from the results of the prediction step.

-Importantly, $k$-NN can be classified into a \textit{user-based} and \textit{item-based} algorithm. In a user-based algorithm, user-user similarities are computed for every pairs of rows in $R$. By contrast, item-based CF stands on column-wise similarities between items. \fig{cf} illustrates how CF works on a user-item matrix $R$. The elements are ratings in a $[1, 5]$ range for each user-item pair, so $1$ and $2$ mean relatively negative feedback and vice versa. In the figure, user $a$ and $c$ seem to have similar tastes because both of them gave nearly identical feedback to item $1$, $4$ and $6$. From an item-item perspective, item $4$ and $6$ are similarly rated by user $a$, $b$ and $c$.
+Importantly, $k$-NN can be classified into \textit{user-based} and \textit{item-based} algorithms. In a user-based algorithm, user-user similarities are computed for every pair of rows in $R$. By contrast, item-based CF stands on column-wise similarities between items. \fig{cf} illustrates how CF works on a user-item matrix $R$. The elements are ratings in a $[1, 5]$ range for each user-item pair, so $1$ and $2$ mean relatively negative feedback and vice versa. In the figure, users $a$ and $c$ seem to have similar tastes because both of them gave nearly identical feedback to items $1$, $4$, and $6$. From an item-item perspective, items $4$ and $6$ are similarly rated by users $a$, $b$, and $c$.

 \begin{figure}[htbp]
 \centering
 \includegraphics[width=1.0\linewidth]{images/cf.pdf}
-    \caption{A schematic diagram of the $k$-NN-based recommender systems on a five-level rating matrix. This figure used Figure~1 in \cite{Sarwar2001} as a reference. For an active user $u$, his/her missing elements $r_{u,i}$ are estimated based on either user-user or item-item similarities, and a recommendation list includes highest-scored items.}
+    \caption{A schematic diagram of $k$-NN-based recommender systems on a five-level rating matrix.
This figure is based on Figure~1 in \cite{Sarwar2001}. For an active user $u$, his/her missing elements $r_{u,i}$ are estimated based on either user-user or item-item similarities, and a recommendation list contains the highest-scored items.}
 \label{fig:cf}
 \end{figure}

-In order to measure the similarities between rows (columns), the Pearson correlation and cosine similarity are widely used. For $d$-dimensional vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$, the Pearson correlation $\mathrm{corr}(\mathbf{x}, \mathbf{y})$ and cosine similarity $\mathrm{cos}(\mathbf{x}, \mathbf{y})$ are respectively defined as:
+To measure the similarities between rows (columns), the Pearson correlation and cosine similarity are widely used. For $d$-dimensional vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^d$, the Pearson correlation $\mathrm{corr}(\mathbf{x}, \mathbf{y})$ and cosine similarity $\mathrm{cos}(\mathbf{x}, \mathbf{y})$ are respectively defined as:

$$
\mathrm{corr}(\mathbf{x}, \mathbf{y}) = \frac{\sum_i (x_{i} - \overline{x})(y_{i} - \overline{y})}{\sqrt{\sum_i (x_{i} - \overline{x})^2} \sqrt{\sum_i (y_{i} - \overline{y})^2}},
$$
$$
\mathrm{cos}(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x} \cdot \mathbf{y}}{\| \mathbf{x} \| \| \mathbf{y} \|} = \frac{\sum_i x_{i} y_{i}}{\sqrt{\sum_i x_{i}^2} \sqrt{\sum_i y_{i}^2}},
$$

-where $\overline{x} = \frac{1}{d} \sum^d_{i=1} x_i$ and $\overline{y} = \frac{1}{d} \sum^d_{i=1} y_i$ denote mean values of the elements in a vector. Additionally, in a context of data mining, elements in $\mathbf{x}$ and $\mathbf{y}$ can be distributed on a different scale, so mean-centering of the vectors usually leads better results \cite{Sarwar2001}. Note that cosine similarity between the mean-centered vectors, $\hat{\mathbf{x}} = (x_1 - \overline{x}, x_2 - \overline{x}, \dots, x_n - \overline{x})$ and $\hat{\mathbf{y}} = (y_1 - \overline{y}, y_2 - \overline{y}, \dots, y_n - \overline{y})$, is mathematically equivalent to the Pearson correlation $\mathrm{corr}(\mathbf{x}, \mathbf{y})$, meaning $\mathrm{cos}(\hat{\mathbf{x}}, \hat{\mathbf{y}}) = \mathrm{corr}(\mathbf{x}, \mathbf{y})$, and the following code snippet demonstrates its implementation in the Julia ecosystem.
+where $\overline{x} = \frac{1}{d} \sum^d_{i=1} x_i$ and $\overline{y} = \frac{1}{d} \sum^d_{i=1} y_i$ denote mean values of the elements in a vector. Additionally, in the context of data mining, elements in $\mathbf{x}$ and $\mathbf{y}$ can be distributed on different scales, so mean-centering the vectors usually leads to better results \cite{Sarwar2001}. Note that the cosine similarity between the mean-centered vectors, $\hat{\mathbf{x}} = (x_1 - \overline{x}, x_2 - \overline{x}, \dots, x_d - \overline{x})$ and $\hat{\mathbf{y}} = (y_1 - \overline{y}, y_2 - \overline{y}, \dots, y_d - \overline{y})$, is mathematically equivalent to the Pearson correlation $\mathrm{corr}(\mathbf{x}, \mathbf{y})$, meaning $\mathrm{cos}(\hat{\mathbf{x}}, \hat{\mathbf{y}}) = \mathrm{corr}(\mathbf{x}, \mathbf{y})$, and the following code snippet demonstrates its implementation in the Julia ecosystem.
\begin{lstlisting}[language = Julia]
 import LinearAlgebra: dot, norm
 import Statistics: mean

@@ -74,11 +74,12 @@ \subsubsection{$k$-Nearest Neighbor}
 function similarity(x::AbstractVector, y::AbstractVector)
     x_hat, y_hat = x .- mean(x), y .- mean(y)
-    dot(x_hat, y_hat) / (norm(x_hat) * norm(y_hat))
+    dot(x_hat, y_hat) / (
+        norm(x_hat) * norm(y_hat))
 end
 \end{lstlisting}

-Based on the similarity definition, user-based CF using the Pearson correlation \cite{Herlocker1999} sees $\mathbf{x}$ and $\mathbf{y}$ as two different rows in $R$, respectively, and gives a weight to a user-user pair by the similarity. In the \texttt{fit!()} phase, the weights allow a recommender to (1) select the top-$k$ highest-weighted users (i.e., nearest neighbors) of a target user $u$, and (2) predict missing elements based on a mean value of neighbors' feedback. Ultimately, sorting items by the predicted values enables \texttt{recommend()} to generate a ranked list of recommended items for a user $u$. Simply put, a constructor of user-based CF in \texttt{Recommendation.jl} is as follows.
+Based on the similarity definition, user-based CF using the Pearson correlation \cite{Herlocker1999} sees $\mathbf{x}$ and $\mathbf{y}$ as two different rows in $R$, respectively, and weights a user-user pair by their similarity. In the \texttt{fit!()} phase, the weights allow a recommender to (1) select the top-$k$ highest-weighted users (i.e., nearest neighbors) of a target user $u$, and (2) predict missing elements based on a mean value of the neighbors' feedback. Ultimately, sorting items by the predicted values enables \texttt{recommend()} to generate a ranked list of recommended items for a user $u$. Simply put, a constructor of user-based CF in \texttt{Recommendation.jl} is as follows.

\begin{lstlisting}[language = Julia]
UserKNN(data::DataAccessor, n_neighbors::Integer)
@@ -91,8 +92,9 @@ \subsubsection{$k$-Nearest Neighbor}
 \end{lstlisting}
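For intuition, the two-step prediction scheme described above can be illustrated by the following standalone sketch, which reuses \texttt{similarity()} and the imports from the earlier listing. The function name \texttt{predict} and the dense-matrix setting are assumptions for illustration, not the package's actual implementation:

\begin{lstlisting}[language = Julia]
# predict user u's missing feedback on item i from the
# k most similar users in a dense rating matrix R
function predict(R::AbstractMatrix, u::Integer,
                 i::Integer, k::Integer)
    others = [v for v in 1:size(R, 1) if v != u]
    sims = [similarity(R[u, :], R[v, :]) for v in others]
    # nearest neighbors = the k highest-weighted users
    neighbors = others[sortperm(sims, rev=true)[1:k]]
    # plain mean of the neighbors' feedback on item i
    mean(R[v, i] for v in neighbors)
end
\end{lstlisting}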
\subsubsection{Singular Value Decomposition}
+\label{sec:svd}

-Along with the development of the CF techniques, researchers noticed that handling the original huge user-item matrices is computationally expensive. Moreover, CF-based recommendation leads overfitting to individual taste due to the sparsity of $R$. Thus, dimensionality reduction techniques were applied to recommendation in order to capture more abstract preferences \cite{Sarwar2000}.
+Along with the development of the CF techniques, researchers noticed that handling the original huge user-item matrices is computationally expensive. Moreover, CF-based recommendation leads to overfitting to individual tastes due to the sparsity of $R$. Thus, dimensionality reduction techniques were applied to recommendation to capture more abstract preferences \cite{Sarwar2000}.

Singular value decomposition (SVD) is one of the most popular dimensionality reduction techniques that decomposes an $m$-by-$n$ matrix $A$ to $U \in \mathbb{R}^{m \times m}$, $\Sigma \in \mathbb{R}^{m \times n}$ and $V \in \mathbb{R}^{n \times n}$:
\begin{align*}
@@ -100,9 +102,9 @@ \subsubsection{Singular Value Decomposition}
 = & \ \left[\mathbf{u}_1, \mathbf{u}_2, \cdots, \mathbf{u}_m\right] \cdot \mathrm{diag}\left(\sigma_1, \sigma_2, \dots, \sigma_{\min(m, n)}\right) \cdot \\
 & \ \left[\mathbf{v}_1, \mathbf{v}_2, \cdots, \mathbf{v}_n\right]^{\mathrm{T}},
 \end{align*}
-by letting $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_{\min(m, n)} \geq 0$. An orthogonal matrix $U$ ($V$) is called left (right) singular vectors which represents characteristics of columns (rows) in $R$, and a diagonal matrix $\Sigma$ holds singular values on the diagonal elements as weights of each singular vector.
+by letting $\sigma_1 \geq \sigma_2 \geq \cdots \geq \sigma_{\min(m, n)} \geq 0$. An orthogonal matrix $U$ ($V$) holds the left (right) singular vectors, which represent characteristics of columns (rows) in $R$, and a diagonal matrix $\Sigma$ holds singular values on the diagonal elements as weights of each singular vector.

-In practice, the most lower singular values of real-world matrices are very close to zero, and hence using only top-$k$ singular values $\Sigma_k \in \mathbb{R}^{k \times k}$ and corresponding singular vectors $U_k \in \mathbb{R}^{m \times k}$, $V_k \in \mathbb{R}^{n \times k}$ is sufficient to make reasonable rank-$k$ approximation of a matrix $A$ as: $\mathrm{SVD}_k(A) = U_k \Sigma_k V_k^{\mathrm{T}}$. It is mathematically proven that $\mathrm{SVD}_k(A)$ is the best rank-$k$ approximation of the matrix $A$ in both the spectral and Frobenius norm, where the spectral norm of a matrix equals to its largest singular value.
+In practice, most of the lower singular values of real-world matrices are very close to zero, and hence using only the top-$k$ singular values $\Sigma_k \in \mathbb{R}^{k \times k}$ and corresponding singular vectors $U_k \in \mathbb{R}^{m \times k}$, $V_k \in \mathbb{R}^{n \times k}$ is sufficient to make a reasonable rank-$k$ approximation of a matrix $A$ as $\mathrm{SVD}_k(A) = U_k \Sigma_k V_k^{\mathrm{T}}$. It is mathematically proven that $\mathrm{SVD}_k(A)$ is the best rank-$k$ approximation of the matrix $A$ in both the spectral and Frobenius norm, where the spectral norm of a matrix equals its largest singular value.

 \begin{figure}[htbp]
 \centering
@@ -111,21 +113,21 @@ \subsubsection{Singular Value Decomposition}
 \label{fig:svd}
 \end{figure}

-Sarwar et~al. \cite{Sarwar2000} studied the use of SVD on user-item matrix $R \in \mathbb{R}^{|\mathcal{U}| \times |\mathcal{I}|}$. In a context of recommendation, $U_k \in \mathbb{R}^{|\mathcal{U}| \times k}$, $V \in \mathbb{R}^{|\mathcal{I}| \times k}$ and $\Sigma \in \mathbb{R}^{k \times k}$ are respectively seen as $k$ user/item feature vectors and corresponding weights. The idea of low-rank approximation that discards lower singular values intuitively works as \textit{compression} or \textit{denoising} of the original matrix; that is, each element in a rank-$k$ matrix $A_k$ holds the best \textit{compressed} (or \textit{denoised}) value of the original element in $A$. Thus, $R_k = \mathrm{SVD}_k(R)$, the best rank-$k$ approximation of $R$, captures as much as possible of underlying users' preferences. Once $R$ is decomposed into $U, \Sigma$ and $V$, a $(u, i)$ element of $R_k$ calculated by $\sum^k_{j=1} \sigma_j u_{u, j} v_{i, j}$ could be a prediction for the user-item pair. In the Julia ecosystem, the process can be implemented in a few lines of code with the standard \texttt{LinearAlgebra} library:
+Sarwar et~al. \cite{Sarwar2000} studied the use of SVD on a user-item matrix $R \in \mathbb{R}^{|\mathcal{U}| \times |\mathcal{I}|}$. In the context of recommendation, $U_k \in \mathbb{R}^{|\mathcal{U}| \times k}$, $V_k \in \mathbb{R}^{|\mathcal{I}| \times k}$ and $\Sigma_k \in \mathbb{R}^{k \times k}$ are respectively seen as $k$ user/item feature vectors and corresponding weights.
The idea of low-rank approximation that discards lower singular values intuitively works as \textit{compression} or \textit{denoising} of the original matrix; that is, each element in a rank-$k$ matrix $A_k$ holds the best \textit{compressed} (or \textit{denoised}) value of the original element in $A$. Thus, $R_k = \mathrm{SVD}_k(R)$, the best rank-$k$ approximation of $R$, captures as much of the users' underlying preferences as possible. Once $R$ is decomposed into $U, \Sigma$ and $V$, a $(u, i)$ element of $R_k$ calculated by $\sum^k_{j=1} \sigma_j u_{u, j} v_{i, j}$ could be a prediction for the user-item pair. In the Julia ecosystem, the process can be implemented in a few lines of code with the standard \texttt{LinearAlgebra} library:

\begin{lstlisting}[language = Julia]
import LinearAlgebra: dot, svd

F = svd(data.R)
U, S, Vt = F.U[:, 1:k], F.S[1:k], F.Vt[1:k, :]
-# predict a missing value between user and item
+# predict a value for an arbitrary user-item pair
r_k = dot(U[user, :] .* S, Vt[:, item])
\end{lstlisting}

\subsubsection{Matrix Factorization}

-Even though dimensionality reduction is a promising approach to make effective recommendation, the feasibility of SVD is still questionable due to the computational cost of decomposition and need for uncertain preliminary work such as missing value imputation and searching an optimal $k$. As a result, a new technique generally called matrix factorization (MF) was introduced \cite{Koren2009} as an alternative.
+Even though dimensionality reduction is a promising approach to making effective recommendations, the feasibility of SVD is still questionable due to the computational cost of decomposition and the need for uncertain preliminary work such as missing value imputation and searching for an optimal $k$. As a result, a new technique generally called matrix factorization (MF) was introduced \cite{Koren2009} as an alternative.

-The initial MF technique was invented by Funk \cite{Funk2006} during the Netflix Prize \cite{Bennett07thenetflix}, and the method is also known as \textit{regularized SVD} because it can be seen as an extension of the conventional SVD-based recommendation that gives efficient approximation of the original SVD. The basic idea of MF is to factorize a user-item matrix $R$ to a user factored matrix $P \in \mathbb{R}^{|\mathcal{U}| \times k}$ and item factored matrix $Q \in \mathbb{R}^{|\mathcal{I}| \times k}$, by solving the following minimization problem for a set of observed user-item interactions $\mathcal{S} = \{(u, i) \in \mathcal{U} \times \mathcal{I}\}$:
+The initial MF technique was invented by Funk \cite{Funk2006} during the Netflix Prize \cite{Bennett07thenetflix}, and the method is also known as \textit{regularized SVD} because it can be seen as an extension of the conventional SVD-based recommendation that gives an efficient approximation of the original SVD.
The basic idea of MF is to factorize a user-item matrix $R$ to a user-factored matrix $P \in \mathbb{R}^{|\mathcal{U}| \times k}$ and an item-factored matrix $Q \in \mathbb{R}^{|\mathcal{I}| \times k}$, by solving the following minimization problem for a set of observed user-item interactions $\mathcal{S} = \{(u, i) \in \mathcal{U} \times \mathcal{I}\}$:
$$
\min_{P, Q} \sum_{(u, i) \in \mathcal{S}} \left( r_{u,i} - \mathbf{p}_u^{\mathrm{T}} \mathbf{q}_i \right)^2 + \lambda \ (\|\mathbf{p}_u\|^2 + \|\mathbf{q}_i\|^2),
$$
@@ -140,29 +142,29 @@ \subsubsection{Matrix Factorization}
 end
 \end{lstlisting}
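Concretely, a single SGD update for the minimization problem above can be sketched as follows. This is a simplified standalone version with a hypothetical helper name \texttt{sgd\_step!}, not the package's implementation:

\begin{lstlisting}[language = Julia]
import LinearAlgebra: dot

# one SGD step on an observed rating r = R[u, i],
# with learning rate eta and regularization lambda
function sgd_step!(P::AbstractMatrix, Q::AbstractMatrix,
                   u::Integer, i::Integer, r::Real,
                   eta::Real, lambda::Real)
    p, q = P[u, :], Q[i, :]  # copies of the current factors
    err = r - dot(p, q)
    P[u, :] = p + eta * (err * q - lambda * p)
    Q[i, :] = q + eta * (err * p - lambda * q)
end
\end{lstlisting}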
-Eventually, $R$ is approximated by $PQ^{\mathrm{T}}$ as shown in \fig{mf}, and a recommender can rank items by the prediction. Notice that mathematically tractable properties of SVD such as orthogonality of factored matrices will be lost over the course of approximation.
+Eventually, $R$ is approximated by $PQ^{\mathrm{T}}$ as shown in \fig{mf}, and a recommender can rank items by the prediction. Notice that mathematically tractable properties of SVD, such as the orthogonality of the factored matrices, will be lost over the course of the approximation.

 \begin{figure}[htbp]
 \centering
 \includegraphics[width=0.8\linewidth]{images/mf.pdf}
-    \caption{MF for an $m$-by-$n$ rating matrix $R$. Unlike SVD, singular values in $\Sigma$ are considered to be embedded to the factored matrices.}
+    \caption{MF for an $m$-by-$n$ rating matrix $R$. Unlike SVD, singular values in $\Sigma$ are considered to be embedded in the factored matrices.}
 \label{fig:mf}
 \end{figure}

-MF is attractive in terms of not only efficiency but extensibility. Since prediction for each user-item pair can be written by a simple vector product as $r_{u,i} = \mathbf{p}_u^{\mathrm{T}} \mathbf{q}_i$, incorporating different features (e.g., biases and temporal factors) into the model as linear combinations is straightforward. For example, let $\mu$ be a global mean of all elements in $R$, and $b_u, b_i$ be respectively a user and item bias term. Here, we assume that each observation can be represented as $r_{u,i} = \mu + b_u + b_i + \mathbf{p}_u^{\mathrm{T}} \mathbf{q}_i$. This formulation is known as biased MF \cite{Koren2009}, and it is possible to capture more information than the original MF even on the same set of events $\mathcal{S}$. It should be noted that advanced methods such as tensor factorization \cite{Karatzoglou2010} would require higher dimensionality and more costly optimization scheme to enrich MF.
+MF is attractive in terms of not only efficiency but also extensibility. Since the prediction for each user-item pair can be written as a simple vector product, $r_{u,i} = \mathbf{p}_u^{\mathrm{T}} \mathbf{q}_i$, incorporating different features (e.g., biases and temporal factors) into the model as linear combinations is straightforward. For example, let $\mu$ be a global mean of all elements in $R$, and $b_u, b_i$ be respectively a user and an item bias term. Here, we assume that each observation can be represented as $r_{u,i} = \mu + b_u + b_i + \mathbf{p}_u^{\mathrm{T}} \mathbf{q}_i$. This formulation is known as biased MF \cite{Koren2009}, and it is possible to capture more information than the original MF even on the same set of events $\mathcal{S}$. There are also other advanced methods such as tensor factorization \cite{Karatzoglou2010} that require higher dimensionality and a more costly optimization scheme to enrich MF.

-Meanwhile, there are different options for loss functions to optimize MF. To give an example, Chen et~al. \cite{Chen2011} showed various types of features and loss functions which can be incorporated into a MF scheme. An appropriate choice of their combinations is likely to lead surprisingly better accuracy compared to the classical MF, and \texttt{Recommendation.jl} currently supports Bayesian personalized ranking (BPR) loss \cite{10.5555/1795114.1795167} as an alternative option via \texttt{BPRMatrixFactorization <: Recommender}.
+Meanwhile, there are different options for loss functions to optimize MF. To give an example, Chen et~al. \cite{Chen2011} showed various types of features and loss functions which can be incorporated into an MF scheme. An appropriate choice of their combinations is likely to lead to surprisingly better accuracy compared to the classical MF, and \texttt{Recommendation.jl} currently supports the Bayesian personalized ranking (BPR) loss \cite{10.5555/1795114.1795167} as an alternative option via \texttt{BPRMatrixFactorization <: Recommender}.

\subsection{Factorization Machines}

-Beyond numerous discussions about MF, factorization machines (FMs) have been recently developed as its generalized model. In contrast to MF, FMs are formulated by a equation that is similar to the polynomial regression, and the model can be applied all of regression, classification and ranking problems depending on a choice of loss function with or without SGD-based optimization.
+Beyond numerous discussions about MF, factorization machines (FMs) have recently been developed as a generalization of MF. In contrast to MF, FMs are formulated by an equation similar to polynomial regression, and the model can be applied to regression, classification, and ranking problems alike, depending on the choice of loss function, with or without SGD-based optimization.

-First of all, for an input vector $\mathbf{x} \in \mathbb{R}^d$, let us imagine the following second-order polynomial model parameterized by $w_0 \in \mathbb{R}$, $\mathbf{w} \in \mathbb{R}^d$ as: $\hat{y}(\mathbf{x}) := w_0 + \mathbf{w}^{\mathrm{T}} \mathbf{x} + \sum_{i=1}^d \sum_{j=i}^d w_{i,j} x_i x_j,$ where $w_{i,j}$ is an element in a symmetric matrix $W \in \mathbb{R}^{d \times d}$, and it indicates a weight of $x_i x_j$, an interaction between the $i$-th and $j$-th element in $\mathbf{x}$. Here, FMs assume that $W$ can be approximated by a low-rank matrix $V \in \mathbb{R}^{d \times k}$ for $k < d$, and the weights are replaced with inner products of $k$ dimensional vectors as $w_{i, j} \approx \mathbf{v}_i^{\mathrm{T}} \mathbf{v}_j$ for $\mathbf{v}_1, \cdots, \mathbf{v}_d \in \mathbb{R}^k$. As a result, the formulation of FM model is:
+First of all, for an input vector $\mathbf{x} \in \mathbb{R}^d$, let us imagine the following second-order polynomial model parameterized by $w_0 \in \mathbb{R}$, $\mathbf{w} \in \mathbb{R}^d$ as: $\hat{y}(\mathbf{x}) := w_0 + \mathbf{w}^{\mathrm{T}} \mathbf{x} + \sum_{i=1}^d \sum_{j=i}^d w_{i,j} x_i x_j,$ where $w_{i,j}$ is an element in a symmetric matrix $W \in \mathbb{R}^{d \times d}$, and it indicates a weight of $x_i x_j$, an interaction between the $i$-th and $j$-th element in $\mathbf{x}$. Here, FMs assume that $W$ can be approximated by a low-rank matrix $V \in \mathbb{R}^{d \times k}$ for $k < d$, and the weights are replaced with inner products of $k$-dimensional vectors as $w_{i, j} \approx \mathbf{v}_i^{\mathrm{T}} \mathbf{v}_j$ for $\mathbf{v}_1, \cdots, \mathbf{v}_d \in \mathbb{R}^k$. As a result, the formulation of the FM model is:
\begin{equation}
\hat{y}^{\mathrm{FM}}(\mathbf{x}) := \underbrace{w_0}_{\textbf{global bias}} + \underbrace{\mathbf{w}^{\mathrm{T}} \mathbf{x}_{ }}_{\textbf{linear}} + \sum_{i=1}^d \sum_{j=i}^d \underbrace{\mathbf{v}_i^{\mathrm{T}} \mathbf{v}_j}_{\textbf{interaction}} x_i x_j.
\label{eq:FMs}
\end{equation}
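For intuition, this prediction can be evaluated naively as a double loop over feature pairs. The following is an illustrative standalone sketch with a hypothetical function name \texttt{predict\_fm}, not the package's code:

\begin{lstlisting}[language = Julia]
import LinearAlgebra: dot

# naive evaluation of the second-order FM prediction
function predict_fm(w0::Real, w::AbstractVector,
                    V::AbstractMatrix, x::AbstractVector)
    y = w0 + dot(w, x)       # global bias + linear terms
    d = length(x)
    for i in 1:d, j in i:d   # pairwise interactions
        y += dot(V[i, :], V[j, :]) * x[i] * x[j]
    end
    y
end
\end{lstlisting}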
-Several studies \cite{Geuens2015,Rendle2012-1,Rendle2012-3} prove that the flexibility of feature representations $\mathbf{x}$ is one of the most important characteristics that makes FMs versatile. The code snippet below demonstrates how an input vector is created with \texttt{Recommendation.jl}'s utility function \texttt{onehot()}.
+Several studies \cite{Geuens2015,Rendle2012-1,Rendle2012-3} prove that the flexibility of feature representations $\mathbf{x}$ is one of the most important characteristics that makes FMs versatile. The code snippet below demonstrates how a concatenated input vector is created with \texttt{Recommendation.jl}'s utility function \texttt{onehot()}.

\begin{lstlisting}[language = Julia]
x = vcat(
@@ -170,8 +172,8 @@ \subsection{Factorization Machines}
     onehot(3, collect(1:n_items)), # item ID
     2.5, # rating
     # ...
-    onehot("Male", # gender
-        ["Male", "Female", "Others", missing]),
+    onehot("Weekly", # email preference
+        ["Daily", "Weekly", "Monthly", missing]),
     onehot(2, collect(1:7)) # day of week
 )
 \end{lstlisting}
@@ -180,7 +182,7 @@ \subsection{Factorization Machines}
 \begin{align*}
 \hat{y}^{\mathrm{FM}^{(p)}}(\mathbf{x}) &:= w_0 + \mathbf{w}^{\mathrm{T}} \mathbf{x} \\
 &+ \sum^p_{\ell=2} \sum^d_{j_1 = 1} \cdots \sum^d_{j_p = j_{p-1} + 1} \left( \prod^{\ell}_{i=1} x_{j_i} \right) \sum^{k_{\ell}}_{f=1} \prod^{\ell}_{i=1} v_{j_i,f},
 \end{align*}
-with the model parameters $w_0 \in \mathbb{R}, \ \mathbf{w} \in \mathbb{R}^d, \ V_{\ell} \in \mathbb{R}^{d \times k_{\ell}},$ where $\ell \in \{2, \cdots, p\}$. Although the higher-order FMs are attractive to capture more complex underlying concepts from dynamic data, the computational cost should become more expensive accordingly. In favor of balancing the algorithmic sophistication and its efficiency, \texttt{Recommendation.jl} only considers the second-order model trained by SGD for the time being.
+with the model parameters $w_0 \in \mathbb{R}, \ \mathbf{w} \in \mathbb{R}^d, \ V_{\ell} \in \mathbb{R}^{d \times k_{\ell}},$ where $\ell \in \{2, \cdots, p\}$. Although the higher-order FMs are attractive for capturing more complex underlying concepts from dynamic data, the computational cost grows accordingly. To balance algorithmic sophistication and efficiency, \texttt{Recommendation.jl} only considers the second-order model trained by SGD for the time being.

\begin{lstlisting}[language = Julia]
struct FactorizationMachines <: Recommender
@@ -195,11 +197,11 @@ \subsection{Factorization Machines}

\subsection{Content-Based Filtering}

-All techniques introduced so far rely on users' historical behavior on a service, but these kinds of recommenders easily face a challenge so-called \textit{cold-start} when it comes to recommending new items (for new users) that do not have sufficient amount of historical data to capture meaningful information. In order to work around the difficulty, content-based recommender systems \cite{Lops2011} are likely to be preferred in reality.
+All techniques introduced so far rely on users' historical behavior on a service, but these kinds of recommenders easily face the so-called \textit{cold-start} challenge when it comes to recommending new items (or serving new users) that do not have a sufficient amount of historical data to capture meaningful information. To work around the difficulty, content-based recommender systems \cite{Lops2011} are likely to be preferred in reality.

-Most importantly, content-based recommenders make recommendation without using the other users' feedbacks. In particular, a content-based approach gives scores to items based on two kinds of information: item model and (static) user preference. In order to model the items, an item-attribute matrix is defined as: $I \in \mathbb{R}^{|\mathcal{I}| \times |\mathcal{A}|}$, where $\mathcal{A}$ is a set of item attributes. Meanwhile, user attributes can be captured through \texttt{DataAccessor}'s \texttt{user\_attributes} property, which is independent from what kind of \texttt{Event}s a system has observed.
+Most importantly, content-based recommenders make recommendations without using other users' feedback. In particular, a content-based approach gives scores to items based on two kinds of information: an item model and a (static) user preference. To model the items, an item-attribute matrix is defined as $I \in \mathbb{R}^{|\mathcal{I}| \times |\mathcal{A}|}$, where $\mathcal{A}$ is a set of item attributes. Meanwhile, user attributes can be captured through \texttt{DataAccessor}'s \texttt{user\_attributes} property, which is independent of what kind of \texttt{Event}s a system has observed.

-From a practical perspective, choosing a set of attributes $\mathcal{A}$ is an essential problem to launch a content-based recommender successfully. In fact, there tend to be numerous candidates on a real-world dataset such as item category and brand, but using too much attributes may increase sparsity and complexity of the vectors, which ends up with poor recommendation performance. With that in mind, one of the most well-studied types of attribute \texttt{Recommendation.jl} also supports is ``term''. More concretely, each item is represented by a set of words, and the items are modeled by TF-IDF weighting \cite{Manning2008}. For instance, if we like to recommend web pages to users, we first need to parse sentences on a page and then construct a vector based on the frequency of each term as:
+From a practical perspective, choosing a set of attributes $\mathcal{A}$ is essential to launching a content-based recommender successfully. In fact, there tend to be numerous candidates in a real-world dataset, such as item category and brand, but using too many attributes may increase the sparsity and complexity of the vectors, which ends up with poor recommendation performance. With that in mind, one of the most well-studied types of attribute \texttt{Recommendation.jl} also supports is ``term''. More concretely, each item is represented by a set of words, and the items are modeled by TF-IDF weighting \cite{Manning2008}.
For instance, if we would like to recommend web pages to users, we first need to parse the sentences on a page and then construct a vector based on the frequency of each term as:

\begin{equation*}
I=
\begin{blockarray}
\end{blockarray}
\end{equation*}

-In case of our item-word matrices, for a given item $i$, term frequency (TF) for a term $t$ is defined as: $\mathrm{tf}(t, i) = \frac{n_{t,i}}{N_i},$ where $n_{t,i}$ denotes an $(i, t)$ element in $I$, and $N_i$ is the total number of words that an item $i$ contains. Meanwhile, inverse document frequency (IDF) is computed over $M$ items as: $\mathrm{idf}(t) = \log \frac{M}{\mathrm{df}(t)} + 1,$ where $\mathrm{df}(t)$ counts the number of items which associate with a term $t$. Finally, each item-term pair is weighted by: $\mathrm{tf}(t, i) \cdot \mathrm{idf}(t)$ in the TF-IDF scheme.
+In the case of our item-word matrices, for a given item $i$, the term frequency (TF) for a term $t$ is defined as $\mathrm{tf}(t, i) = \frac{n_{t,i}}{N_i},$ where $n_{t,i}$ denotes an $(i, t)$ element in $I$, and $N_i$ is the total number of words that an item $i$ contains. Meanwhile, the inverse document frequency (IDF) is computed over $M$ items as $\mathrm{idf}(t) = \log \frac{M}{\mathrm{df}(t)} + 1,$ where $\mathrm{df}(t)$ counts the number of items associated with a term $t$. Finally, each item-term pair is weighted by $\mathrm{tf}(t, i) \cdot \mathrm{idf}(t)$ in the TF-IDF scheme.

-Since there are several variations of how to calculate $\mathrm{tf}(t, i)$ and $\mathrm{idf}(t)$, \texttt{Recommendation.jl} requires users to pre-compute these numbers in order to maximize the feasibility of the recommender:
+Since there are several variations of how to calculate $\mathrm{tf}(t, i)$ and $\mathrm{idf}(t)$, \texttt{Recommendation.jl} requires users to pre-compute these numbers to maximize the feasibility of the recommender:

\begin{lstlisting}[language = Julia]
struct TFIDF <: Recommender
@@ -228,4 +230,4 @@ \subsection{Content-Based Filtering}
 end
 \end{lstlisting}
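As a reference point, these inputs can be derived from an item-term count matrix by directly following the definitions above. The snippet below is a standalone sketch with a hypothetical count matrix \texttt{counts}, not the package's internals:

\begin{lstlisting}[language = Julia]
# hypothetical item-term counts (rows: items, cols: terms)
counts = [2 0 1;
          0 3 1;
          1 1 0]
M = size(counts, 1)                # number of items
tf = counts ./ sum(counts, dims=2) # tf(t, i) = n_{t,i} / N_i
df = vec(sum(counts .> 0, dims=1)) # df(t): items with term t
idf = log.(M ./ df) .+ 1           # idf(t) = log(M/df(t)) + 1
tfidf = tf .* idf'                 # weight per item-term pair
\end{lstlisting}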
-% If the features were chosen appropriately, content-based recommenders could work well even on challenging settings which cannot be handled by the conventional recommenders. To give an example, when a new item is added to a system, making reasonable prediction for the item is impossible by using the classical approaches such as CF. By contrast, since content-based recommenders only require the attributes of items, new items can show up in a recommendation list with equal chance to the old items. Furthermore, explaining the results of content-based recommendation is possible because the attributes are manually selected by humans.
+% If the features were chosen appropriately, content-based recommenders could work well even in challenging settings that cannot be handled by conventional recommenders. To give an example, when a new item is added to a system, making a reasonable prediction for the item is impossible using classical approaches such as CF. By contrast, since content-based recommenders only require the attributes of items, new items can show up in a recommendation list with an equal chance to the old items. Furthermore, explaining the results of content-based recommendations is possible because the attributes are manually selected by humans.
diff --git a/paper/section/conclusion.tex b/paper/section/conclusion.tex
new file mode 100644
index 0000000..44e0750
--- /dev/null
+++ b/paper/section/conclusion.tex
@@ -0,0 +1,5 @@
+This paper introduced \texttt{Recommendation.jl}, an open-source package for building recommender systems in the Julia programming language. First, by reviewing each of the core features of practical recommender pipelines, namely the data model (\sect{data}), the recommender interface and algorithms (\sect{algorithm}), and the evaluation methods (\sect{evaluation}), we observed how diverse the requirements of recommender systems can be; the applications must be able to address both explicit and implicit representations of user feedback, hybridize rule-based and machine learning-based algorithms, and assess the outcomes from wide-ranging perspectives in terms of not only accuracy but also diversity, coverage, novelty, and serendipity. Thus, Julia's extensible and mathematical operation-friendly APIs come in handy for working with the unique characteristics that we demonstrated through formulations and corresponding code snippets throughout the paper.
+
+Moreover, we conducted a benchmark with multiple recommender-metric pairs provided by \texttt{Recommendation.jl} and confirmed that there are no one-size-fits-all approaches to making ``good'' recommendations. On the one hand, we can maximize prediction accuracy by training a sophisticated model-based recommender with an optimal set of hyperparameters. At the same time, however, the best prediction accuracy does not always yield the most diverse recommendations, which might eventually hinder recommenders from addressing fairness implications. These observations tell us that one of the most important requirements for recommender frameworks is to make a wide variety of options available for developers while leaving enough space for customization, which \texttt{Recommendation.jl} has tried to incorporate by design.
+
+Finally, there are numerous possible directions to improve the package, as we learned from the other open-source solutions in \sect{introduction}. For instance, the availability of state-of-the-art recommendation algorithms makes a framework more promising in a competitive industry environment, where Python-based machine learning packages play a dominant role. Meanwhile, since computational efficiency is a key criterion that directly leads to a developer's productivity, the use of acceleration techniques such as distributed multiprocessing and GPU programming would be a necessary next step. Last but not least, making it easier to run an end-to-end recommendation pipeline iteratively is a foundational challenge for bridging the gap between offline and online setups. In particular, evaluation phases pose a crucial challenge in reproducibility, as mentioned in \sect{evaluation}.
diff --git a/paper/section/data.tex b/paper/section/data.tex
index 9caf4e3..467dd17 100644
--- a/paper/section/data.tex
+++ b/paper/section/data.tex
@@ -1,6 +1,6 @@
 As depicted in \fig{recommender}, a common first step of building a recommender is to capture user-item events and translate them into matrix representation. Here, \texttt{Recommendation.jl} eases the step by providing a unified wrapper called \texttt{DataAccessor}. Since data for recommender systems is easily standardizable as a collection of user, item, and auxiliary attributes, the common interface helps developers follow the separation-of-concerns principle and ensures the ease and reliability of data manipulation.
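As a minimal sketch of that workflow, a small rating matrix can be wrapped as follows. This assumes the matrix-accepting constructor form, where zero elements denote missing observations; the exact constructor signature may differ:

\begin{lstlisting}[language = Julia]
using Recommendation

# hypothetical 3x6 rating matrix; zeros denote missing values
R = [1 0 0 2 5 0;
     0 4 0 3 4 0;
     2 0 0 5 4 3]
data = DataAccessor(R)
\end{lstlisting}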
-To be more precise, raw data is always converted into a \texttt{DataAccessor} instance at the data preprocessing phase with proper validation (e.g., data type check, missing value handling), and hence the subsequent steps can simply take the instance and access the data (or metadata) without worrying about unexpected input. \fig{accessor} illustrates the procedure.
+To be more precise, raw data is always converted into a \texttt{DataAccessor} instance at the data preprocessing phase with proper validation (e.g., data type check, missing value handling), and hence the subsequent steps can simply take the instance and access the data (or metadata) without worrying about unexpected inputs. \fig{accessor} illustrates the procedure.

 \begin{figure}[htbp]
 \centering
@@ -57,4 +57,4 @@
attribute::AbstractVector)
 \end{lstlisting}

-Additionally, the package provides data loaders that import publicly available datasets such as MovieLens \cite{harper2015movielens}, Amazon Reviews \cite{ni2019justifying}, HetRec 2011 Last.FM\footnote{\url{https://www.last.fm/}} dataset \cite{Cantador:RecSys2011}, as well as a synthetic implicit feedback generator using a simple rule-based method demonstrated in \cite{Aharon2013}. These modules return a ready-to-use \texttt{DataAccessor} instance for easing experiments.
+Additionally, the package provides data loaders that import publicly available datasets such as MovieLens \cite{harper2015movielens}, Amazon Reviews \cite{ni2019justifying}, and the HetRec 2011 Last.FM\footnote{\url{https://www.last.fm/}} dataset \cite{Cantador:RecSys2011}, as well as a synthetic implicit feedback generator using a simple rule-based method demonstrated in \cite{Aharon2013}. These modules return a ready-to-use \texttt{DataAccessor} instance to ease experimentation.
diff --git a/paper/section/evaluation.tex b/paper/section/evaluation.tex
index c5fd3bf..8b63064 100644
--- a/paper/section/evaluation.tex
+++ b/paper/section/evaluation.tex
@@ -1,25 +1,29 @@
-One of the notable characteristics of \texttt{Recommendation.jl} is a diverse set of evaluation metrics, including not only the standard accuracy metrics but fairness metrics such as diversity and serendipity. Even though the idea of diverse or serendipitous recommendation is not new in the literature, the topic has rapidly gained traction in these days as the society realizes the importance of fairness in intelligent systems. This section highlights the high-level concept of these metrics and their implementation in Julia based on a common abstract type, \texttt{Matric}.
+One of the notable characteristics of \texttt{Recommendation.jl} is a diverse set of evaluation metrics, including not only the standard accuracy metrics but also fairness metrics such as diversity and serendipity. Even though the idea of diverse or serendipitous recommendations is not new in the literature, the topic has rapidly gained traction these days as society realizes the importance of ethical implications in intelligent systems \cite{milano2020recommender}. This section highlights the high-level concept of these metrics and their implementation in Julia based on a common abstract type, \texttt{Metric}.

\begin{lstlisting}[language = Julia]
abstract type Metric end
\end{lstlisting}

-For accuracy metrics, users can use the standard evaluation scheme, \texttt{cross\_validation} and \texttt{leave\_one\_out}, provided by the package. For instance, the following module runs \texttt{n\_folds} cross validation for a specific combination of recommender and ranking metric. Notice that a recommender is initialized with \texttt{recommender\_args} and runs top-k recommendation.
+For accuracy metrics, users can use the standard evaluation schemes, \texttt{cross\_validation} and \texttt{leave\_one\_out}, provided by the package. For instance, the following function runs \texttt{n\_folds} cross-validation for a specific combination of recommender and ranking metric. Notice that a recommender is initialized with \texttt{recommender\_args} for making a top-$k$ recommendation.

\begin{lstlisting}[language = Julia]
cross_validation(
-    n_folds::Integer,
-    metric::Type{<:RankingMetric},
-    topk::Integer,
-    recommender_type::Type{<:Recommender},
-    data::DataAccessor,
-    recommender_args...
+    n_folds::Integer,
+    metric::Metric,
+    topk::Integer,
+    recommender_type::Type{<:Recommender},
+    data::DataAccessor,
+    recommender_args...;
+    # control whether recommending the same item to
+    # the same user multiple times is allowed
+    allow_repeat=false
)
\end{lstlisting}
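For illustration, a hypothetical invocation would look like the following, assuming a ranking metric type such as \texttt{Recall} exported by the package and the \texttt{UserKNN} recommender introduced in \sect{algorithm}; the argument values are arbitrary:

\begin{lstlisting}[language = Julia]
# 5-fold cross-validation of Recall for top-10
# recommendation by user-based k-NN (30 neighbors)
cross_validation(
    5, Recall(), 10, UserKNN, data, 30)
\end{lstlisting}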
-It should be noted that evaluating recommender systems is not always the same as measuring the accuracy of machine learning-based prediction, and there is a separate research domain discussing about what an appropriate evaluation method is. In the open-source community, the Python-based \texttt{RecPack} package \cite{michiels2022recpack} takes this point into consideration and provides a dedicated layer called \texttt{Scenario}, which can be a future direction \texttt{Recommendation.jl} possibly aims for.
+It should be noted that evaluating recommender systems is not always the same as measuring the accuracy of machine learning-based prediction, and there is a separate research domain discussing what an appropriate evaluation method is. In the open-source community, the Python-based \texttt{RecPack} package \cite{michiels2022recpack} considers this point and provides a dedicated layer called \texttt{Scenario}, which suggests a future direction that \texttt{Recommendation.jl} could aim for.

\subsection{Rating Metrics}
+\label{sec:rating-metrics}

First and foremost, even though the community focuses more on implicit feedback-based ranking problems lately, rating prediction is still an important foundation in the field of recommender systems as the previous sections mentioned.

@@ -48,16 +52,16 @@ \subsection{Ranking Metrics}
 end
 \end{lstlisting}

-Although the interface is the same across the metrics, each of them has a different objective as part of its formulation. To review the differences with some intuition, let a target user $u \in \mathcal{U}$, set of all items $\mathcal{I}$, ordered set of top-$N$ recommended items $I_N(u) \subset \mathcal{I}$, and set of truth items $\mathcal{I}^+_u$.
+Although the interface is the same across the metrics, each of them has a different objective as part of its formulation. To review the differences with some intuition, let $u \in \mathcal{U}$ be a target user, $\mathcal{I}$ the set of all items, $I_k(u) \subset \mathcal{I}$ an ordered set of top-$k$ recommended items, and $\mathcal{I}^+_u$ the set of truth items.

-\subsubsection{Recall-at-$N$}
+\subsubsection{Recall-at-$k$}

-Recall-at-$N$ (Recall@$N$) indicates coverage of truth samples as a result of top-$N$ recommendation. The value is computed by the following equation:
+Recall-at-$k$ (Recall@$k$) indicates the coverage of truth samples as a result of top-$k$ recommendation. The value is computed by the following equation:
$$
-\mathrm{Recall@}N = \frac{|\mathcal{I}^+_u \cap I_N(u)|}{|\mathcal{I}^+_u|}.
+\mathrm{Recall@}k = \frac{|\mathcal{I}^+_u \cap I_k(u)|}{|\mathcal{I}^+_u|}.
$$

-Here, $|\mathcal{I}^+_u \cap I_N(u)|$ is the number of \textit{true positives} which can be simply computed by the following piece of code:
+Here, $|\mathcal{I}^+_u \cap I_k(u)|$ is the number of \textit{true positives}, which can be simply computed by the following piece of code:

\begin{lstlisting}[language = Julia]
function count_intersect(
@@ -67,27 +71,27 @@ \subsubsection{Recall-at-$N$}
 end
 \end{lstlisting}

-\subsubsection{Precision-at-$N$}
+\subsubsection{Precision-at-$k$}

-Unlike Recall@$N$, Precision-at-$N$ (Precision@$N$) evaluates correctness of a top-$N$ recommendation list $I_N(u)$ according to the portion of true positives in the list as:
+Unlike Recall@$k$, Precision-at-$k$ (Precision@$k$) evaluates the correctness of a top-$k$ recommendation list $I_k(u)$ according to the portion of true positives in the list as:
$$
-\mathrm{Precision@}N = \frac{|\mathcal{I}^+_u \cap \mathcal{I}_N(u)|}{|\mathcal{I}_N(u)|}.
+\mathrm{Precision@}k = \frac{|\mathcal{I}^+_u \cap I_k(u)|}{|I_k(u)|}.
$$
-In other words, Precision@$N$ means how much the recommendation list covers true pairs.
+In other words, Precision@$k$ measures what portion of the recommendation list consists of true positives.

\subsubsection{Mean Average Precision (MAP)}

-While the original Precision@$N$ provides a score for a fixed-length recommendation list $I_N(u)$, mean average precision (MAP) computes an average of the scores over all recommendation sizes from 1 to $|\mathcal{I}|$. MAP is formulated with an indicator function for $i_n$, the $n$-th item of $I(u)$, as:
+While the original Precision@$k$ provides a score for a fixed-length recommendation list $I_k(u)$, mean average precision (MAP) computes an average of the scores over all possible recommendation sizes from 1 to $|\mathcal{I}|$. MAP is formulated with an indicator function for $i_n$, the $n$-th item of $I(u)$, as:
\begin{equation*}
\mathrm{MAP} = \frac{1}{|\mathcal{I}^+_u|} \sum_{n = 1}^{|\mathcal{I}|} \mathrm{Precision@}n \cdot \mathds{1}_{\mathcal{I}^+_u}(i_n).
\end{equation*}
-It should be noticed that, MAP is not a simple mean of sum of Precision@$1$, Precision@$2$, $\dots$, Precision@$|\mathcal{I}|$, and higher-ranked true positives lead better MAP.
+It should be noticed that MAP is not a simple mean of Precision@$1$, Precision@$2$, $\dots$, Precision@$|\mathcal{I}|$, and higher-ranked true positives lead to a better MAP.

\subsubsection{Area under the ROC Curve (AUC)}

-ROC curve and area under the ROC curve (AUC) are generally used in evaluation of the classification problems, but these concepts can also be interpreted in a context of ranking problem. Basically, the AUC metric for ranking considers all possible pairs of truth and other items which are respectively denoted by $i^+ \in \mathcal{I}^+_u$ and $i^- \in \mathcal{I}^-_u$, and it expects that the ``best'' recommender completely ranks $i^+$ higher than $i^-$.
+The ROC curve and the area under the ROC curve (AUC) are generally used in the evaluation of classification problems, but these concepts can also be interpreted in the context of the ranking problem. The AUC metric for ranking considers all possible pairs of truth and other items, which are respectively denoted by $i^+ \in \mathcal{I}^+_u$ and $i^- \in \mathcal{I}^-_u$, and it expects that the ``best'' recommender completely ranks $i^+$ higher than $i^-$.

-AUC calculation keeps track the number of true positives at different rank in $\mathcal{I}$.
\subsubsection{Area under the ROC Curve (AUC)} -ROC curve and area under the ROC curve (AUC) are generally used in evaluation of the classification problems, but these concepts can also be interpreted in a context of ranking problem. Basically, the AUC metric for ranking considers all possible pairs of truth and other items which are respectively denoted by $i^+ \in \mathcal{I}^+_u$ and $i^- \in \mathcal{I}^-_u$, and it expects that the ``best'' recommender completely ranks $i^+$ higher than $i^-$. +ROC curve and area under the ROC curve (AUC) are generally used in the evaluation of classification problems, but these concepts can also be interpreted in the context of the ranking problem. The AUC metric for ranking considers all possible pairs of truth and other items which are respectively denoted by $i^+ \in \mathcal{I}^+_u$ and $i^- \in \mathcal{I}^-_u$, and it expects that the ``best'' recommender completely ranks $i^+$ higher than $i^-$. -AUC calculation keeps track the number of true positives at different rank in $\mathcal{I}$. In the implementation of \texttt{measure()}, the code adds the number of true positives which were ranked higher than the current non-truth sample to the accumulated count of correct pairs. Ultimately, an AUC score is computed as portion of the correct ordered $(i^+, i^-)$ pairs in the all possible combinations determined by $|\mathcal{I}^+_u| \times |\mathcal{I}^-_u|$ in set notation. +AUC calculation keeps track of the number of true positives at different ranks in $\mathcal{I}$. In the implementation of \texttt{measure()}, the code adds the number of true positives which were ranked higher than the current non-truth sample to the accumulated count of correct pairs. Ultimately, an AUC score is computed as the proportion of correctly ordered $(i^+, i^-)$ pairs among all $|\mathcal{I}^+_u| \times |\mathcal{I}^-_u|$ possible combinations.
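+This pair-counting procedure can be sketched as follows, again with illustrative names and under the assumption that \texttt{ranked} orders the entire item set:
+\begin{lstlisting}[language = Julia]
+function auc(truth::Set{Int}, ranked::Vector{Int})
+    tp, correct = 0, 0
+    for item in ranked
+        if item in truth
+            tp += 1
+        else
+            # every truth sample seen so far is
+            # ranked higher than this non-truth item
+            correct += tp
+        end
+    end
+    correct / (tp * (length(ranked) - tp))
+end
+\end{lstlisting}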
\subsubsection{Reciprocal Rank (RR)} @@ -98,19 +102,20 @@ \subsubsection{Reciprocal Rank (RR)} RR can be zero if and only if $\mathcal{I}^+_u$ is empty. \subsubsection{Mean Percentile Rank (MPR)} -Mean percentile rank (MPR) is a ranking metric based on $r_{i} \in [0, 100]$, the percentile-ranking of an item $i$ within the sorted list of all items for a user $u$. It can be formulated as: +Mean percentile rank (MPR) is a ranking metric based on $r_{i} \in [0, 100]$, the percentile ranking of an item $i$ within the sorted list of all items for a user $u$. It can be formulated as: \begin{equation*} \mathrm{MPR} = \frac{1}{|\mathcal{I}^+_u|} \sum_{i \in \mathcal{I}^+_u} r_{i}. \end{equation*} -$r_{i} = 0\%$ is the best value that means the truth item $i$ is ranked at the highest position in a recommendation list. On the other hand, $r_{i} = 100\%$ is the worst case that the item $i$ is at the lowest rank. +$r_{i} = 0\%$ is the best value, meaning that the truth item $i$ is ranked at the highest position in a recommendation list. On the other hand, $r_{i} = 100\%$ is the worst case, in which the item $i$ is placed at the lowest rank. -MPR internally considers not only top-$N$ recommended items also all of the non-recommended items, and it accumulates the percentile ranks for all true positives unlike MRR. So, the measure is suitable to estimate users' overall satisfaction for a recommender. Intuitively, $\mathrm{MPR} > 50\%$ should be worse than random ranking from a users' point of view. +MPR internally considers not only top-$k$ recommended items but also all of the non-recommended items, and it accumulates the percentile ranks for all true positives, unlike MRR. So, the measure is suitable for estimating users' overall satisfaction with a recommender. Intuitively, $\mathrm{MPR} > 50\%$ should be worse than random ranking from a user's point of view. \subsubsection{Normalized Discounted Cumulative Gain (NDCG)} -Like MPR, normalized discounted cumulative gain (NDCG) computes a score for $I(u)$ which places emphasis on higher-ranked true positives. +Like MPR, normalized discounted cumulative gain (NDCG) computes a score for $I(u)$ which emphasizes higher-ranked true positives. In addition to being a more rigorously formulated measure, the difference between NDCG and MPR is that NDCG allows us to specify an expected ranking within $\mathcal{I}^+_u$; that is, the metric can incorporate $\mathrm{rel}_n$, a relevance score which suggests how likely the $n$-th sample is to be ranked at the top of a recommendation list, and it directly corresponds to an expected ranking of the truth samples. \subsection{Aggregated Metrics} +\label{sec:aggregated-metrics} Aggregated metrics return a single score for an array of multiple top-$k$ recommendation lists as the following function signature illustrates. @@ -118,45 +123,46 @@ \subsection{Aggregated Metrics} abstract type AggregatedMetric <: Metric end function measure( metric::AggregatedMetric, - recommendations::AbstractVector{ - <:AbstractVector{<:Integer}}; - kwargs...) + recommendations:: + AbstractVector{<:AbstractVector{<:Integer}}; + topk::Union{Integer, Nothing}) end \end{lstlisting} -A comprehensive summary of these metrics are available in \cite{shani2011evaluating}, and Eq.~(20) and (21) on its page 26 provide the formulation of two metrics that are available in \texttt{Recommendation.jl} supports, Gini index and Shannon Entropy. Unlike calculating errors for every truth-prediction pair as we have seen in the previous sections, aggregating multiple recommendation lists gives a bird's eye view of how good a recommender system is as a whole. Thus, the metrics are useful to measure the global diversity of recommender's outputs. +A comprehensive summary of these metrics is available in \cite{shani2011evaluating}, and Equations~(20) and (21) on page 26 provide the formulation of two metrics that are available in \texttt{Recommendation.jl}, the Gini index and Shannon Entropy. Unlike calculating errors for every truth-prediction pair as we have seen in the previous sections, aggregating multiple recommendation lists gives a bird's eye view of how good a recommender system is as a whole. Thus, the metrics are useful for measuring the global diversity of the recommender's outputs. \subsubsection{Aggregated Diversity} -\texttt{AggregatedDiversity} calculates the number of distinct items recommended across all suers. A larger value indicates more diverse recommendation result overall. +\texttt{AggregatedDiversity} calculates the number of distinct items recommended across all users. A larger value indicates a more diverse recommendation result overall. -Let $\mathcal{U}$ and $\mathcal{I}$ be a set of users and items, respectively, and $L_N(u)$ a list of top-$N$ recommended items for a user $u$. Here, an aggregated diversity can be calculated as: +Let $\mathcal{U}$ and $\mathcal{I}$ be a set of users and items, respectively, and $L_k(u)$ a list of top-$k$ recommended items for a user $u$. Here, an aggregated diversity can be calculated as: \begin{equation*} -\left| \bigcup\limits_{u \in \mathcal{U}} L_N(u) \right|. +\left| \bigcup\limits_{u \in \mathcal{U}} L_k(u) \right|. \end{equation*} Notably, the equation translates into a simple set operation in Julia.
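+For instance, given hypothetical per-user lists, \texttt{union} directly yields the aggregated diversity:
+\begin{lstlisting}[language = Julia]
+# one top-k list per user (illustrative data)
+recommendations = [[1, 2, 3], [2, 3, 4], [5, 1, 2]]
+# number of distinct items recommended overall
+length(union(recommendations...))  # 5
+\end{lstlisting}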
\subsubsection{Shannon Entropy} -If we focus more on individual items and how many users are recommended a particular item, the diversity of top-$N$ recommender can be defined by Shannon Entropy (\texttt{ShannonEntropy}): +If we focus more on individual items and how many users are recommended a particular item, the diversity of a top-$k$ recommender can be defined by Shannon Entropy (\texttt{ShannonEntropy}): \begin{align*} --\sum_{j = 1}^{|\mathcal{I}|} \Bigg( & \frac{\left|\{u \mid u \in \mathcal{U} \wedge i_j \in L_N(u) \}\right|}{N |\mathcal{U}|} \cdot \\ -& \ln \left( \frac{\left|\{u \mid u \in \mathcal{U} \wedge i_j \in L_N(u) \}\right|}{N |\mathcal{U}|} \right) \Bigg), +-\sum_{j = 1}^{|\mathcal{I}|} \Bigg( & \frac{\left|\{u \mid u \in \mathcal{U} \wedge i_j \in L_k(u) \}\right|}{k |\mathcal{U}|} \cdot \\ +& \ln \left( \frac{\left|\{u \mid u \in \mathcal{U} \wedge i_j \in L_k(u) \}\right|}{k |\mathcal{U}|} \right) \Bigg), \end{align*} -where $i_j$ denotes $j$-th item in the available item set $\mathcal{I}$. +where $i_j$ denotes the $j$-th item in the available item set $\mathcal{I}$. The ``worst'' entropy is zero when a single item is always recommended. \subsubsection{Gini Index} -Gini Index, which is normally used to measure a degree of inequality in a distribution of income, can be applied to assess diversity in the context of top-$N$ recommendation: +The Gini Index, which is normally used to measure a degree of inequality in the distribution of income, can also be applied to assess diversity in the context of top-$k$ recommendation: \begin{equation*} -\frac{1}{|\mathcal{I}| - 1} \sum_{j = 1}^{|\mathcal{I}|} \left( (2j - |\mathcal{I}| - 1) \cdot \frac{\left|\{u \mid u \in \mathcal{U} \wedge i_j \in L_N(u) \}\right|}{N |\mathcal{U}|} \right). +\frac{1}{|\mathcal{I}| - 1} \sum_{j = 1}^{|\mathcal{I}|} \left( (2j - |\mathcal{I}| - 1) \cdot \frac{\left|\{u \mid u \in \mathcal{U} \wedge i_j \in L_k(u) \}\right|}{k |\mathcal{U}|} \right). \end{equation*} -\texttt{measure(metric::GiniIndex, recommendations, topk)} is 0 when all items are equally chosen in terms of the number of recommended users. +\texttt{measure(metric::GiniIndex, recommendations, topk)} returns 0 when all items are equally chosen (``best''), and 1 when a single item is always chosen.
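+Both scores can be derived from the per-item recommendation ratios, as the following sketch with made-up inputs illustrates; following the standard Gini computation, items are sorted in ascending order of their ratios before the weighted sum is applied:
+\begin{lstlisting}[language = Julia]
+recommendations = [[1, 2], [1, 3], [1, 2]]
+n_items = 4  # |I|: size of the item catalog
+k, n_users = 2, length(recommendations)
+
+# how often each item appears in a top-k list
+counts = zeros(Int, n_items)
+for list in recommendations, i in list
+    counts[i] += 1
+end
+p = counts ./ (k * n_users)
+
+entropy = -sum(x * log(x) for x in p if x > 0)
+
+q = sort(p)  # ascending ratios
+gini = sum((2j - n_items - 1) * q[j]
+           for j in 1:n_items) / (n_items - 1)
+\end{lstlisting}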
\subsection{Intra-List Metrics} +\label{sec:intra-list-metrics} -Given a list of recommended items (for a single user), intra-list metrics quantifies the quality of the recommendation list from a non-accuracy perspective. Kotkov et~al. \cite{kotkov2016survey} highlighted the foundation of these metrics, and \texttt{Recommendation.jl} implements four of them: \texttt{Coverage}, \texttt{Novelty}, \texttt{IntraListSimilarity}, and \texttt{Serendipity} under the following schema. +Given a list of recommended items (for a single user), intra-list metrics quantify the quality of the recommendation list from a non-accuracy perspective. Kotkov et~al. \cite{kotkov2016survey} highlighted the foundation of these metrics, and \texttt{Recommendation.jl} implements four of them: \texttt{Coverage}, \texttt{Novelty}, \texttt{IntraListSimilarity}, and \texttt{Serendipity} under the following schema. \begin{lstlisting}[language = Julia] abstract type IntraListMetric <: Metric end @@ -168,7 +174,7 @@ \subsection{Intra-List Metrics} end \end{lstlisting} -Notice that standardizing an interface for the quality measures is not straightforward because the definition of ``quality'' is ambiguous. Hence, a list of \texttt{recommendations} can be given either as a set or array (vector) depending on whether the uniqueness of items in the list matters, for example. Meanwhile, \texttt{kwargs...} differ a lot depending on a choice of metric. +Notice that standardizing an interface for the quality measures is not straightforward because the definition of ``quality'' is ambiguous. Hence, a list of \texttt{recommendations} can be given either as a set or array (vector) depending on whether the uniqueness of items in the list matters, for example. Meanwhile, \texttt{kwargs...} differ depending on a choice of metric. \subsubsection{Coverage} @@ -182,7 +188,7 @@ \subsubsection{Coverage} ) \end{lstlisting} -The set operation could leverage \texttt{count\_intersect()} \sect{ranking-metrics} highlighted. +A larger coverage can indicate that a recommender is less likely to be biased toward a limited set of items. The set operation could leverage \texttt{count\_intersect()}, which \sect{ranking-metrics} highlighted. \subsubsection{Novelty} @@ -196,9 +202,11 @@ \subsubsection{Novelty} ) \end{lstlisting} +The metric quantifies the recommender's capability to surface unseen items, which helps users discover unexpected items. + \subsubsection{Intra-List Similarity} -Ziegler et~al. \cite{ziegler2005improving} demonstrated a metric that computes a sum of similarities between every pairs of recommended items. A larger value represents less diversity. +Ziegler et~al. \cite{ziegler2005improving} demonstrated a metric that computes a sum of similarities between every pair of recommended items. A larger value represents less diversity. \begin{lstlisting}[language = Julia] struct IntraListSimilarity <: IntraListMetric end @@ -223,4 +231,4 @@ \subsubsection{Serendipity} ) \end{lstlisting} -It should be noticed that quantifying relevance and unexpectedness is another task we must undergo before calculating the metric, and the results must be largely affected by how these factors are calculated. +It should be noticed that we must first quantify \texttt{relevance} and \texttt{unexpectedness} before calculating the metric, and the results can be largely affected by how these factors are calculated.
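+As a usage illustration, the following hypothetical calls evaluate a single recommendation list with the \texttt{catalog} and \texttt{observed} keyword arguments; all values are made up for demonstration:
+\begin{lstlisting}[language = Julia]
+rec = Set([1, 2, 3])  # one user's top-3 items
+# portion of a 100-item catalog covered by rec
+measure(Coverage(), rec, catalog=Set(1:100))
+# novelty with respect to already-seen items
+measure(Novelty(), rec, observed=Set([2, 9]))
+\end{lstlisting}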
diff --git a/paper/section/experiment.tex b/paper/section/experiment.tex new file mode 100644 index 0000000..7f6db46 --- /dev/null +++ b/paper/section/experiment.tex @@ -0,0 +1,49 @@ +So far, this paper has introduced various recommendation techniques and metrics implemented in \texttt{Recommendation.jl}. This section finally evaluates the recommenders on different metrics. Since the purpose of the following experiment is to demonstrate the capability of \texttt{Recommendation.jl} and to discuss trade-offs among different metrics, we test only on the minimal MovieLens 100k dataset \cite{harper2015movielens} and use the \texttt{SVD} recommender (\sect{svd}) as a model-based advanced option, which requires the simplest set of hyperparameters, along with multiple baselines. However, developers can easily evaluate larger datasets with more complex models in the same way as we describe below. + +We conducted a 5-fold cross-validation of top-10 recommendations on the 100,000 user-item-rating pairs by randomly splitting the data into five distinct sets. For each trial, we call \texttt{fit!()} on four-fifths of them (80\% of the samples) and then run top-10 \texttt{recommend()} for every user. Ultimately, the resulting recommendations, as well as predicted ratings, are compared with the ones observed in the remaining 20\% of the samples for validation.\footnote{A complete Julia script used for the experiment can be found at \url{https://github.com/takuti/Recommendation.jl/blob/v1.0.0/examples/benchmark.jl}.} + +\begin{lstlisting}[language = Julia] +n_folds = 5 +topk = 10 +data = load_movielens_100k() +cross_validation( + n_folds, metric, topk, + recommender, data, params...) +\end{lstlisting} + +\tab{results} summarizes the results obtained from each recommender-metric pair. On the one hand, model-based SVD recommenders showed higher accuracy than the baselines in terms of both rating and ranking metrics. In particular, as the accuracy changes by $k$ for $\mathrm{SVD}_k$, we see $k = 16$ can be an optimal hyperparameter for the recommender. On the other hand, aggregated and intra-list metrics do not yield the same conclusion; since larger $k$ gives a closer approximation to real-world diverse user-item behaviors, $\mathrm{SVD}_{32}$ shows the highest aggregated diversity and Shannon entropy. These observations demonstrate the trade-off between accuracy and non-accuracy metrics as \fig{tradeoff} depicts. + +\begin{figure}[htbp] + \centering + \includegraphics[width=1.0\linewidth]{images/tradeoff.pdf} + \caption{$F_1$ score (accuracy metric calculated by $2 \frac{\mathrm{recall} \cdot \mathrm{precision}}{\mathrm{recall} + \mathrm{precision}}$) and aggregated diversity (non-accuracy metric) for $\mathrm{SVD}_k$ recommenders, based on the numbers in \tab{results}. The accuracy graph shows that an optimal $k$ is $16$, where the $F_1$ score is maximized, whereas diversity monotonically increases as $k$ gets larger. Best baseline metrics are illustrated as dashed lines for reference.} + \label{fig:tradeoff} +\end{figure}
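+For instance, plugging the $\mathrm{SVD}_{16}$ recall and precision from \tab{results} into this $F_1$ formula gives $2 \cdot \frac{0.228 \cdot 0.353}{0.228 + 0.353} \approx 0.277$.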
+\begin{table*}[] + \centering + \tbl{Results from 5-fold cross-validation of top-10 recommendation conducted on MovieLens 100k user-item-rating pairs. Numbers are rounded to 3 decimal places, and those in the bold font indicate the ``best'' values for each metric. Rating metrics for \texttt{MostPopular} are not calculated because the recommender does not explicitly predict ratings.}{ + \begin{tabular}{|cl||r|r|r|r|r|r|r|} + \hline + \multicolumn{2}{|c||}{} & \texttt{ItemMean} & \texttt{UserMean} & \texttt{MostPopular} & \texttt{SVD(4)} & \texttt{SVD(8)} & \texttt{SVD(16)} & \texttt{SVD(32)} \\ \hline \hline + \multicolumn{1}{|c|}{\multirow{2}{*}{\begin{tabular}[c]{@{}c@{}}Rating\\ (\sect{rating-metrics})\end{tabular}}} & \texttt{RMSE } & 0.642 & 0.681 & - & 0.545 & \textbf{0.524} & \textbf{0.524} & 0.550 \\ \cline{2-9} + \multicolumn{1}{|c|}{} & \texttt{MAE } & 0.603 & 0.642 & - & 0.493 & 0.471 & \textbf{0.470} & 0.496 \\ \hline \hline + \multicolumn{1}{|c|}{\multirow{6}{*}{\begin{tabular}[c]{@{}c@{}}Ranking\\ (\sect{ranking-metrics})\end{tabular}}} & \texttt{Recall } & 0.108 & 0.002 & 0.114 & 0.182 & 0.212 & \textbf{0.228} & 0.218 \\ \cline{2-9} + \multicolumn{1}{|c|}{} & \texttt{Precision } & 0.185 & 0.004 & 0.189 & 0.297 & 0.335 & \textbf{0.353} & 0.328 \\ \cline{2-9} + \multicolumn{1}{|c|}{} & \texttt{AUC } & 0.417 & 0.018 & 0.429 & 0.531 & 0.558 & \textbf{0.579} & 0.571 \\ \cline{2-9} + \multicolumn{1}{|c|}{} & \texttt{ReciprocalRank } & 0.415 & 0.011 & 0.409 & 0.583 & 0.642 & \textbf{0.670} & 0.645 \\ \cline{2-9} + \multicolumn{1}{|c|}{} & \texttt{MPR } & 84.671 & 89.784 & 84.021 & 80.192 & 78.431 & \textbf{77.417} & 78.023 \\ \cline{2-9} + \multicolumn{1}{|c|}{} & \texttt{NDCG } & 0.201 & 0.004 & 0.203 & 0.327 & 0.371 & \textbf{0.392} & 0.365 \\ \hline \hline + \multicolumn{1}{|c|}{\multirow{3}{*}{\begin{tabular}[c]{@{}c@{}}Aggregated\\ (\sect{aggregated-metrics})\end{tabular}}} & \texttt{AggregatedDiversity} & 52.2 & 145.0 & 52.4 & 163.8 & 253.0 & 328.4 & \textbf{403.4} \\ \cline{2-9} + \multicolumn{1}{|c|}{} & \texttt{ShannonEntropy } & 3.149 & 4.170 & 3.160 & 4.486 & 4.847 & 5.138 & \textbf{5.386} \\ \cline{2-9} + \multicolumn{1}{|c|}{} & \texttt{GiniIndex } & 0.662 & 0.669 & 0.658 & \textbf{0.597} & 0.629 & 0.616 & 0.599 \\ \hline \hline + \multicolumn{1}{|c|}{\multirow{2}{*}{\begin{tabular}[c]{@{}c@{}}Intra-list\\ (\sect{intra-list-metrics})\end{tabular}}} & \texttt{Coverage } & 0.006 & 0.006 & 0.006 & 0.006 & 0.006 & 0.006 & 0.006 \\ \cline{2-9} + \multicolumn{1}{|c|}{} & \texttt{Novelty } & 8.998 & \textbf{9.944} & 8.970 & 8.763 & 8.751 & 8.991 & 9.424 \\ \hline + \end{tabular} + } + \label{tab:results} +\end{table*} + +Meanwhile, the rule-based \texttt{UserMean} recommender, which simply scores items by a mean rating per user, was the best in terms of novelty, demonstrating a higher ability to surface unseen items at the top. In combination with the trade-off discussion above, the results tell us that focusing only on a single metric can easily confuse developers and mislead the users of recommender systems. Therefore, it is crucial to holistically assess the systems from multiple perspectives, and the design principle of \texttt{Recommendation.jl} follows this point, as we explained in \sect{introduction}. + +It should be noticed that, as \texttt{kwargs...} in \sect{intra-list-metrics} indicate, evaluation with intra-list metrics is not straightforward due to the need for specifying additional arguments to set up a scenario.
For the sake of simplicity, this section assumes \texttt{catalog} for \texttt{Coverage} is a set of all items available in the dataset, and \texttt{observed} for \texttt{Novelty} is a set of items in the target user's training samples, allowing the recommenders to recommend the same items in a training set to the same user. Thus, \texttt{Coverage} in \tab{results} is the same across the recommenders because we always recommend 10 items per user from the fixed set of all items. Moreover, we did not evaluate \texttt{IntraListSimilarity} and \texttt{Serendipity} because there is no obvious way to define item-item similarities, relevance, and unexpectedness; the choices depend largely on the developer's hypotheses and objectives, which this paper does not discuss in detail.