From c9530f922d022abbaf884bf6658f31e7a3f7324f Mon Sep 17 00:00:00 2001
From: pablo-gar
Date: Fri, 5 Apr 2024 20:54:30 +0000
Subject: [PATCH] =?UTF-8?q?Deploying=20to=20gh-pages=20from=20@=20chanzuck?=
 =?UTF-8?q?erberg/cellxgene-census@861444a2100333459e7cdfed63519facc14a9f4?=
 =?UTF-8?q?2=20=F0=9F=9A=80?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 _sources/articles.rst.txt | 8 +++++++
 articles.html | 10 ++++++++
 articles/2023/20230808-r_api_release.html | 2 +-
 .../2023/20230919-out_of_core_methods.html | 2 +-
 ...231012-normalized_layer_precalc_stats.html | 2 +-
 articles/2024/20240404-categoricals.html | 6 ++---
 notebooks/experimental/pytorch.ipynb | 4 ++--
 objects.inv | Bin 13189 -> 13198 bytes
 r/articles/comp_bio_data_integration.html | 22 +++++++++---------
 r/pkgdown.yml | 2 +-
 r/search.json | 2 +-
 searchindex.js | 2 +-
 12 files changed, 39 insertions(+), 23 deletions(-)

diff --git a/_sources/articles.rst.txt b/_sources/articles.rst.txt
index 788091d8b..b4e3c174e 100644
--- a/_sources/articles.rst.txt
+++ b/_sources/articles.rst.txt
@@ -9,4 +9,12 @@ What's new?
    :maxdepth: 1
 
    articles/2023/*
+
+2024
+----------
+
+.. toctree::
+   :glob:
+   :maxdepth: 1
+
    articles/2024/*
diff --git a/articles.html b/articles.html
index bccad5aff..388776b3a 100644
--- a/articles.html
+++ b/articles.html
@@ -147,6 +147,9 @@
  • R package cellxgene.census V1 is out!
  • Memory-efficient implementations of commonly used single-cell methods
  • Introducing a normalized layer and pre-calculated cell and gene statistics in Census
  • + + +
  • 2024
  • @@ -205,6 +208,13 @@

    2023R package cellxgene.census V1 is out!
  • Memory-efficient implementations of commonly used single-cell methods
  • Introducing a normalized layer and pre-calculated cell and gene statistics in Census
  • + + + +
    +

    2024

    + diff --git a/articles/2023/20230808-r_api_release.html b/articles/2023/20230808-r_api_release.html index 44e8fad24..e000f39ab 100644 --- a/articles/2023/20230808-r_api_release.html +++ b/articles/2023/20230808-r_api_release.html @@ -152,9 +152,9 @@
  • Memory-efficient implementations of commonly used single-cell methods
  • Introducing a normalized layer and pre-calculated cell and gene statistics in Census
  • -
  • Census supports categoricals for cell metadata
  • +
  • 2024
  • Census in AWS ☁️
  • diff --git a/articles/2023/20230919-out_of_core_methods.html b/articles/2023/20230919-out_of_core_methods.html index 6a0741f0c..63bd04933 100644 --- a/articles/2023/20230919-out_of_core_methods.html +++ b/articles/2023/20230919-out_of_core_methods.html @@ -151,9 +151,9 @@
  • Introducing a normalized layer and pre-calculated cell and gene statistics in Census
  • -
  • Census supports categoricals for cell metadata
  • +
  • 2024
  • Census in AWS ☁️
  • diff --git a/articles/2023/20231012-normalized_layer_precalc_stats.html b/articles/2023/20231012-normalized_layer_precalc_stats.html index df907b78b..959f90af6 100644 --- a/articles/2023/20231012-normalized_layer_precalc_stats.html +++ b/articles/2023/20231012-normalized_layer_precalc_stats.html @@ -152,9 +152,9 @@
  • Help us improve these data additions
  • -
  • Census supports categoricals for cell metadata
  • +
  • 2024
  • Census in AWS ☁️
  • diff --git a/articles/2024/20240404-categoricals.html b/articles/2024/20240404-categoricals.html index 7f735298b..58f5ddb61 100644 --- a/articles/2024/20240404-categoricals.html +++ b/articles/2024/20240404-categoricals.html @@ -143,10 +143,8 @@
  • Installation
  • Quick start
  • What’s new?
      -
    • 2023
        -
      • R package cellxgene.census V1 is out!
      • -
      • Memory-efficient implementations of commonly used single-cell methods
      • -
      • Introducing a normalized layer and pre-calculated cell and gene statistics in Census
      • +
      • 2023
      • +
      • 2024
        • Census supports categoricals for cell metadata
          • Potential breaking changes
          • Identifying the obs columns encoded as categorical
          •
diff --git a/notebooks/experimental/pytorch.ipynb b/notebooks/experimental/pytorch.ipynb
index c7ab9c710..e7e6d721a 100644
--- a/notebooks/experimental/pytorch.ipynb
+++ b/notebooks/experimental/pytorch.ipynb
@@ -118,7 +118,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "ae75cc69",
+   "id": "a99f825d",
    "metadata": {
     "collapsed": false
    },
@@ -130,7 +130,7 @@
   },
   {
    "cell_type": "markdown",
-   "id": "3e7b33fd",
+   "id": "1268d196",
    "metadata": {
     "collapsed": false
    },
diff --git a/objects.inv b/objects.inv
index 35cc3a6d6b8c9a981b0e5d072be9baaee6f46b93..88a566f2aa6ef9d704f8fd657f12339e27478b31 100644
GIT binary patch
delta 12012
[base85-encoded binary delta data omitted]

diff --git a/r/articles/comp_bio_data_integration.html b/r/articles/comp_bio_data_integration.html
index 2ef392e32..b7dd2196a 100644
--- a/r/articles/comp_bio_data_integration.html
+++ b/r/articles/comp_bio_data_integration.html
@@ -312,20 +312,20 @@

 # Run the standard workflow for visualization and clustering
 seurat_obj.combined <- RunPCA(seurat_obj.combined, npcs = 30, verbose = FALSE)
 seurat_obj.combined <- RunUMAP(seurat_obj.combined, reduction = "pca", dims = 1:30)
-#> 17:51:31 UMAP embedding parameters a = 0.9922 b = 1.112
-#> 17:51:31 Read 10153 rows and found 30 numeric columns
-#> 17:51:31 Using Annoy for neighbor search, n_neighbors = 30
-#> 17:51:31 Building Annoy index with metric = cosine, n_trees = 50
+#> 20:51:25 UMAP embedding parameters a = 0.9922 b = 1.112
+#> 20:51:25 Read 10153 rows and found 30 numeric columns
+#> 20:51:25 Using Annoy for neighbor search, n_neighbors = 30
+#> 20:51:25 Building Annoy index with metric = cosine, n_trees = 50
 #> 0% 10 20 30 40 50 60 70 80 90 100%
 #> [----|----|----|----|----|----|----|----|----|----|
 #> **************************************************|
-#> 17:51:33 Writing NN index file to temp file /tmp/RtmpHvsscV/file11b137946e7a4
-#> 17:51:33 Searching Annoy index using 1 thread, search_k = 3000
-#> 17:51:37 Annoy recall = 100%
-#> 17:51:38 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
-#> 17:51:39 Initializing from normalized Laplacian + noise (using RSpectra)
-#> 17:51:39 Commencing optimization for 200 epochs, with 409958 positive edges
-#> 17:51:44 Optimization finished
+#> 20:51:27 Writing NN index file to temp file /tmp/RtmpHRWXl8/file40a97c51cfd1
+#> 20:51:27 Searching Annoy index using 1 thread, search_k = 3000
+#> 20:51:31 Annoy recall = 100%
+#> 20:51:31 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30
+#> 20:51:32 Initializing from normalized Laplacian + noise (using RSpectra)
+#> 20:51:32 Commencing optimization for 200 epochs, with 409958 positive edges
+#> 20:51:37 Optimization finished

            Plot the UMAP.

             # By assay
            diff --git a/r/pkgdown.yml b/r/pkgdown.yml
            index 3d6c45e9b..920bf8e4d 100644
            --- a/r/pkgdown.yml
            +++ b/r/pkgdown.yml
            @@ -12,5 +12,5 @@ articles:
               comp_bio_data_integration: comp_bio_data_integration.html
               comp_bio_normalizing_full_gene_sequencing: comp_bio_normalizing_full_gene_sequencing.html
               comp_bio_summarize_axis_query: comp_bio_summarize_axis_query.html
            -last_built: 2024-04-05T16:56Z
            +last_built: 2024-04-05T19:56Z
             
            diff --git a/r/search.json b/r/search.json
            index 52cdbdc78..d372befa9 100644
            --- a/r/search.json
            +++ b/r/search.json
            @@ -1 +1 @@
            -[{"path":"/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2022, Chan Zuckerberg Initiative Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"/articles/census_access_maintained_embeddings.html","id":"open-census","dir":"Articles","previous_headings":"","what":"Open Census","title":"Access CELLxGENE collaboration embeddings (scVI, Geneformer)","text":"","code":"library(\"cellxgene.census\") census <- open_soma(census_version = \"2023-12-15\")"},{"path":"/articles/census_access_maintained_embeddings.html","id":"load-embeddings-as-seurat-reductions","dir":"Articles","previous_headings":"","what":"Load embeddings as Seurat reductions","title":"Access CELLxGENE collaboration embeddings (scVI, Geneformer)","text":"high-level cellxgene.census::get_seurat() function can query Census load embeddings dimensional reductions Seurat object. ask Seurat object expression data human cells tissue_general equal 'central nervous system', along scVI geneformer embeddings (obsm_layers). 
embeddings stored dimensional reductions seurat_obj, can take quick look scVI embeddings 2D scatter plot via UMAP, colored Census cell_type annotations.","code":"library(\"Seurat\")  seurat_obj <- get_seurat(   census,   organism = \"homo_sapiens\",   obs_value_filter = \"tissue_general == 'central nervous system'\",   obs_column_names = c(\"cell_type\"),   obsm_layers = c(\"scvi\", \"geneformer\") ) seurat_obj <- RunUMAP(   seurat_obj,   reduction = \"scvi\",   dims = 1:ncol(Embeddings(seurat_obj, \"scvi\")) )  DimPlot(seurat_obj, reduction = \"umap\", group.by = \"cell_type\") +   theme(legend.text = element_text(size = 8))"},{"path":"/articles/census_access_maintained_embeddings.html","id":"load-embeddings-as-singlecellexperiment-reductions","dir":"Articles","previous_headings":"","what":"Load embeddings as SingleCellExperiment reductions","title":"Access CELLxGENE collaboration embeddings (scVI, Geneformer)","text":"Similarly, cellxgene.census::get_single_cell_experiment() can query Census store embeddings dimensionality reduction results Bioconductor SingleCellExperiment object. , can view UMAP Geneformer embeddings colored cell_type.","code":"library(\"SingleCellExperiment\") sce_obj <- get_single_cell_experiment(   census,   organism = \"homo_sapiens\",   obs_value_filter = \"tissue_general == 'central nervous system'\",   obs_column_names = c(\"cell_type\"),   obsm_layers = c(\"scvi\", \"geneformer\") ) sce_obj <- scater::runUMAP(sce_obj, dimred = \"geneformer\") scater::plotReducedDim(sce_obj, dimred = \"UMAP\", colour_by = \"cell_type\")"},{"path":"/articles/census_access_maintained_embeddings.html","id":"load-embeddings-as-sparsematrix","dir":"Articles","previous_headings":"","what":"Load embeddings as sparseMatrix","title":"Access CELLxGENE collaboration embeddings (scVI, Geneformer)","text":"Lastly, can use SOMAExperimentAxisQuery lower-level access embeddings’ numerical data. can performant use cases don’t need features Seurat SingleCellExperiment. 
row embeddings sparseMatrix provides fine-tuned Geneformer model’s 512-dimensional embedding vector cell, cell soma_joinids row names. different arguments, SOMAExperimentAxisQuery$to_sparse_matrix() can also read scVI embeddings expression data. Still lower-level access available SOMAExperimentAxisQuery$read(), streams Arrow tables. methods SOMAExperimentAxisQuery can fetch metadata like cell_type: SOMAExperimentAxisQuery loads ask Census, unlike high-level get_seurat() get_single_cell_experiment() functions, eagerly populate objects based query.","code":"query <- census$get(\"census_data\")$get(\"homo_sapiens\")$axis_query(   \"RNA\",   obs_query = tiledbsoma::SOMAAxisQuery$new(value_filter = \"tissue == 'tongue'\") ) embeddings <- query$to_sparse_matrix(\"obsm\", \"geneformer\") str(embeddings) #> Formal class 'dgTMatrix' [package \"Matrix\"] with 6 slots #>   ..@ i       : int [1:190464] 0 0 0 0 0 0 0 0 0 0 ... #>   ..@ j       : int [1:190464] 0 1 2 3 4 5 6 7 8 9 ... #>   ..@ Dim     : int [1:2] 372 512 #>   ..@ Dimnames:List of 2 #>   .. ..$ : chr [1:372] \"51784858\" \"51784859\" \"51784860\" \"51784861\" ... #>   .. ..$ : chr [1:512] \"0\" \"1\" \"2\" \"3\" ... #>   ..@ x       : num [1:190464] 0.1104 -1.2031 1.0078 0.0131 1.2422 ... #>   ..@ factors : list() head(as.data.frame(query$obs(column_names = c(\"soma_joinid\", \"cell_type\"))$concat())) #>   soma_joinid  cell_type #> 1    51784858 basal cell #> 2    51784859 basal cell #> 3    51784860 fibroblast #> 4    51784861 fibroblast #> 5    51784862 basal cell #> 6    51784863 basal cell census$close()"},{"path":"/articles/census_axis_query.html","id":"axis-query-example","dir":"Articles","previous_headings":"","what":"Axis Query Example","title":"Axis Query Example","text":"Goal: demonstrate basic axis metadata handling. CZ CELLxGENE Census stores obs (cell) metadata SOMA DataFrame, can queried read R data frame. Census also convenience package simplifies opening census. R data frames -memory objects. 
Take care queries small enough results fit memory.","code":""},{"path":"/articles/census_axis_query.html","id":"opening-the-census","dir":"Articles","previous_headings":"Axis Query Example","what":"Opening the census","title":"Axis Query Example","text":"cellxgene.census R package contains convenient API open latest version Census. can learn cellxgene.census methods accessing corresponding documentation. example ?cellxgene.census::open_soma.","code":"census <- cellxgene.census::open_soma()"},{"path":"/articles/census_axis_query.html","id":"summarize-census-cell-metadata","dir":"Articles","previous_headings":"Axis Query Example","what":"Summarize Census cell metadata","title":"Axis Query Example","text":"Tips: can read entire SOMA dataframe R using .data.frame(soma_df$read()). Queries much faster request DataFrame columns required analysis (e.g. column_names = c(\"soma_joinid\", \"cell_type_ontology_term_id\")). can also refine query results using value_filter, filter census matching records.","code":""},{"path":"/articles/census_axis_query.html","id":"summarize-all-cell-types","dir":"Articles","previous_headings":"Axis Query Example > Summarize Census cell metadata","what":"Summarize all cell types","title":"Axis Query Example","text":"example reads cell metadata (obs) R data frame summarize variety ways.","code":"human <- census$get(\"census_data\")$get(\"homo_sapiens\")  # Read obs into an R data frame (tibble). obs_df <- as.data.frame(human$obs$read(   column_names = c(\"soma_joinid\", \"cell_type_ontology_term_id\") ))  # Find all unique values in the cell_type_ontology_term_id column. unique_cell_type_ontology_term_id <- unique(obs_df$cell_type_ontology_term_id)  cat(paste(   \"There are\",   length(unique_cell_type_ontology_term_id),   \"cell types in the Census! The first few are:\" )) #> There are 604 cell types in the Census! 
The first few are: head(unique_cell_type_ontology_term_id) #> [1] \"CL:0000540\" \"CL:0000738\" \"CL:0000763\" \"CL:0000136\" \"CL:0000235\" #> [6] \"CL:0000115\""},{"path":"/articles/census_axis_query.html","id":"summarize-a-subset-of-cell-types-selected-with-a-value_filter","dir":"Articles","previous_headings":"Axis Query Example > Summarize Census cell metadata","what":"Summarize a subset of cell types, selected with a value_filter","title":"Axis Query Example","text":"example utilizes SOMA “value filter” read subset cells tissue_ontology_term_id equal UBERON:0002048 (lung tissue), summarizes query result. can also define much complex value filters. example: combine terms use %% operator query multiple values","code":"# Read cell_type terms for cells which have a specific tissue term LUNG_TISSUE <- \"UBERON:0002048\"  obs_df <- as.data.frame(human$obs$read(   column_names = c(\"cell_type_ontology_term_id\"),   value_filter = paste(\"tissue_ontology_term_id == '\", LUNG_TISSUE, \"'\", sep = \"\") ))  # Find all unique values in the cell_type_ontology_term_id column as an R data frame. unique_cell_type_ontology_term_id <- unique(obs_df$cell_type_ontology_term_id) cat(paste(   \"There are \",   length(unique_cell_type_ontology_term_id),   \" cell types in the Census where tissue_ontology_term_id == \",   LUNG_TISSUE,   \"!\\nThe first few are:\",   sep = \"\" )) #> There are 185 cell types in the Census where tissue_ontology_term_id == UBERON:0002048! 
#> The first few are: head(unique_cell_type_ontology_term_id) #> [1] \"CL:0000003\" \"CL:4028004\" \"CL:0002145\" \"CL:0000625\" \"CL:0000624\" #> [6] \"CL:4028006\"  # Report the 10 most common top_10 <- sort(table(obs_df$cell_type_ontology_term_id), decreasing = TRUE)[1:10] cat(paste(\"The top 10 cell types where tissue_ontology_term_id ==\", LUNG_TISSUE)) #> The top 10 cell types where tissue_ontology_term_id == UBERON:0002048 print(top_10) #>  #> CL:0000003 CL:0000583 CL:0000625 CL:0000624 CL:0000235 CL:0002063 CL:0000860  #>     562038     526859     323433     323067     254173     246279     203526  #> CL:0000623 CL:0001064 CL:0002632  #>     164944     149067     132243 # You can also do more complex queries, such as testing for inclusion in a list of values obs_df <- as.data.frame(human$obs$read(   column_names = c(\"cell_type_ontology_term_id\"),   value_filter = \"tissue_ontology_term_id %in% c('UBERON:0002082', 'UBERON:OOO2084', 'UBERON:0002080')\" ))  # Summarize top_10 <- sort(table(obs_df$cell_type_ontology_term_id), decreasing = TRUE)[1:10] print(top_10) #>  #> CL:0000746 CL:0008034 CL:0002548 CL:0000115 CL:0002131 CL:0000763 CL:0000669  #>     159096      84750      79618      64190      61830      32088      27515  #> CL:0000003 CL:0000057 CL:0002144  #>      22707      20117      18593"},{"path":"/articles/census_axis_query.html","id":"full-census-stats","dir":"Articles","previous_headings":"Axis Query Example > Summarize Census cell metadata","what":"Full census stats","title":"Axis Query Example","text":"example queries organisms Census, summarizes diversity various metadata labels.","code":"cols_to_query <- c(   \"cell_type_ontology_term_id\",   \"assay_ontology_term_id\",   \"tissue_ontology_term_id\" )  total_cells <- 0 for (organism in census$get(\"census_data\")$names()) {   print(organism)   obs_df <- as.data.frame(     census$get(\"census_data\")$get(organism)$obs$read(column_names = cols_to_query)   )   total_cells <- total_cells + 
nrow(obs_df)   for (col in cols_to_query) {     cat(paste(\"  Unique \", col, \" values: \", length(unique(obs_df[[col]])), \"\\n\", sep = \"\"))   } } #> [1] \"homo_sapiens\" #>   Unique cell_type_ontology_term_id values: 604 #>   Unique assay_ontology_term_id values: 20 #>   Unique tissue_ontology_term_id values: 227 #> [1] \"mus_musculus\" #>   Unique cell_type_ontology_term_id values: 226 #>   Unique assay_ontology_term_id values: 9 #>   Unique tissue_ontology_term_id values: 51 cat(paste(\"Complete Census contains\", total_cells, \"cells.\")) #> Complete Census contains 60361716 cells."},{"path":"/articles/census_citation_generation.html","id":"requirements","dir":"Articles","previous_headings":"","what":"Requirements","title":"Generating citations for Census slices","text":"notebook requires: cellxgene_census Python package. Census data release schema version 1.3.0 greater.","code":""},{"path":"/articles/census_citation_generation.html","id":"generating-citation-strings","dir":"Articles","previous_headings":"","what":"Generating citation strings","title":"Generating citations for Census slices","text":"First open handle Census data. ensure open data release schema version 1.3.0 greater, use census_version=\"latest\" load dataset table contains column \"citation\" dataset included Census. 
now can use column \"dataset_id\" present dataset table Census cell metadata create citation strings Census slice.","code":"library(\"tiledb\") library(\"cellxgene.census\")  census <- open_soma(census_version = \"latest\") census_release_info <- census$get(\"census_info\")$get(\"summary\")$read()$concat() as.data.frame(census_release_info) #>   soma_joinid                      label      value #> 1           0      census_schema_version      2.0.0 #> 2           1          census_build_date 2024-04-01 #> 3           2     dataset_schema_version      5.0.0 #> 4           3           total_cell_count  114405937 #> 5           4          unique_cell_count   59761180 #> 6           5 number_donors_homo_sapiens      17082 #> 7           6 number_donors_mus_musculus       4186 datasets <- census$get(\"census_info\")$get(\"datasets\")$read()$concat() datasets <- as.data.frame(datasets) head(datasets[\"citation\"]) #>                                                                                                                                                                                                                                                                                                           citation #> 1            Publication: https://doi.org/10.1002/hep4.1854 Dataset Version: https://datasets.cellxgene.cziscience.com/fb76c95f-0391-4fac-9fb9-082ce2430b59.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/44531dd9-1388-4416-a117-af0a99de2294 #> 2   Publication: https://doi.org/10.1126/sciimmunol.abe6291 Dataset Version: https://datasets.cellxgene.cziscience.com/b6737a5e-9069-4dd6-9a57-92e17a746df9.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/3a2af25b-2338-4266-aad3-aa8d07473f50 #> 3   Publication: https://doi.org/10.1038/s41593-020-00764-7 Dataset Version: 
https://datasets.cellxgene.cziscience.com/0e02290f-b992-450b-8a19-554f73cd7f09.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/180bff9c-c8a5-4539-b13b-ddbc00d643e6 #> 4   Publication: https://doi.org/10.1038/s41467-022-29450-x Dataset Version: https://datasets.cellxgene.cziscience.com/40832710-d7b1-43fb-b2c2-1cd2255bc3ac.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/bf325905-5e8e-42e3-933d-9a9053e9af80 #> 5   Publication: https://doi.org/10.1038/s41590-021-01059-0 Dataset Version: https://datasets.cellxgene.cziscience.com/eb6c070c-ff67-4c1f-8d4d-65f9fe2119ee.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/93eebe82-d8c3-41bc-a906-63b5b5f24a9d #> 6 Publication: https://doi.org/10.1016/j.celrep.2019.12.082 Dataset Version: https://datasets.cellxgene.cziscience.com/650a47be-6666-4f70-ac47-8414c50bbd8e.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/939769a8-d8d2-4d01-abfc-55699893fd49"},{"path":"/articles/census_citation_generation.html","id":"via-cell-metadata-query","dir":"Articles","previous_headings":"Generating citation strings","what":"Via cell metadata query","title":"Generating citations for Census slices","text":"","code":"# Query cell metadata cell_metadata <- census$get(\"census_data\")$get(\"homo_sapiens\")$obs$read(   value_filter = \"tissue == 'cardiac atrium'\",   column_names = c(\"dataset_id\", \"cell_type\") )  cell_metadata <- as.data.frame(cell_metadata$concat())  # Get a citation string for the slice slice_datasets <- datasets[datasets$dataset_id %in% cell_metadata$dataset_id, ] print(slice_datasets$citation) #> [1] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/9227d155-6f2d-4534-be73-b86c5c34d8e6.h5ad curated and 
distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [2] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/017c9ef2-a5e5-429e-a9a1-919e330c4087.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [3] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/8c189c08-4eba-45d4-925f-a5fe1a13d2ae.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [4] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/b76d37f6-0654-447f-bd1b-477be2c747f9.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [5] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/860c49d4-8ab1-4576-b67e-02d66e4a6ddd.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [6] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/b84def55-a776-4aa4-a9a6-7aab8b973086.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\""},{"path":"/articles/census_citation_generation.html","id":"via-seurat-query","dir":"Articles","previous_headings":"Generating citation strings","what":"Via Seurat query","title":"Generating citations for Census slices","text":"","code":"# Fetch a Seurat object seurat_obj <- get_seurat(   census = census,   
organism = \"homo_sapiens\",   measurement_name = \"RNA\",   obs_value_filter = \"tissue == 'cardiac atrium'\",   var_value_filter = \"feature_name == 'MYBPC3'\",   obs_column_names = c(\"dataset_id\", \"cell_type\") )  # Get a citation string for the slice slice_datasets <- datasets[datasets$dataset_id %in% seurat_obj[[]]$dataset_id, ] print(slice_datasets$citation) #> [1] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/9227d155-6f2d-4534-be73-b86c5c34d8e6.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [2] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/017c9ef2-a5e5-429e-a9a1-919e330c4087.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [3] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/8c189c08-4eba-45d4-925f-a5fe1a13d2ae.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [4] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/b76d37f6-0654-447f-bd1b-477be2c747f9.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [5] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/860c49d4-8ab1-4576-b67e-02d66e4a6ddd.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [6] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset 
Version: https://datasets.cellxgene.cziscience.com/b84def55-a776-4aa4-a9a6-7aab8b973086.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\""},{"path":"/articles/census_citation_generation.html","id":"via-singlecellexperiment-query","dir":"Articles","previous_headings":"Generating citation strings","what":"Via SingleCellExperiment query","title":"Generating citations for Census slices","text":"","code":"# Fetch a SingleCellExperiment object sce_obj <- get_single_cell_experiment(   census = census,   organism = \"homo_sapiens\",   measurement_name = \"RNA\",   obs_value_filter = \"tissue == 'cardiac atrium'\",   var_value_filter = \"feature_name == 'MYBPC3'\",   obs_column_names = c(\"dataset_id\", \"cell_type\") )  # Get a citation string for the slice slice_datasets <- datasets[datasets$dataset_id %in% sce_obj$dataset_id, ] print(slice_datasets$citation) #> [1] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/9227d155-6f2d-4534-be73-b86c5c34d8e6.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [2] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/017c9ef2-a5e5-429e-a9a1-919e330c4087.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [3] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/8c189c08-4eba-45d4-925f-a5fe1a13d2ae.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [4] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: 
https://datasets.cellxgene.cziscience.com/b76d37f6-0654-447f-bd1b-477be2c747f9.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [5] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/860c49d4-8ab1-4576-b67e-02d66e4a6ddd.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [6] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/b84def55-a776-4aa4-a9a6-7aab8b973086.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\""},{"path":"/articles/census_compute_over_X.html","id":"incremental-mean-calculation","dir":"Articles","previous_headings":"","what":"Incremental mean calculation","title":"Computing on X using online (incremental) algorithms","text":"Many statistics, marginal means, easy calculate incrementally. Let’s begin query X$raw sparse matrix unnormalized read counts, return results shards incrementally accumulate read count gene, divide cell count get mean reads per cell gene. First define query - case slice obs axis cells specific tissue & sex value, genes var axis. query$X() method returns iterator results, Arrow Table. table contain sparse X data obs/var coordinates, using standard SOMA names: soma_data - X values (float32) soma_dim_0 - obs coordinate (int64) soma_dim_1 - var coordinate (int64) Important: X matrices joined var/obs axis DataFrames integer join “id” (aka soma_joinid). positionally indexed, given cell gene may soma_joinid value (e.g., large integer). words, given X value, soma_dim_0 corresponds soma_joinid obs dataframe, soma_dim_1 coordinate corresponds soma_joinid var dataframe. 
convenience, query class includes utility simplify operations query slices. query$indexer indexer used wrap output query$X(), converting soma_joinids positional indexing query results. Positions [0, N), N number results query given axis. Key points: expensive query read results - rather make multiple passes data, read perform multiple computations. default, data census indexed soma_joinid positionally.","code":"library(\"tiledbsoma\") library(\"cellxgene.census\") census <- open_soma()  query <- census$get(\"census_data\")$get(\"mus_musculus\")$axis_query(   measurement_name = \"RNA\",   obs_query = SOMAAxisQuery$new(value_filter = \"tissue=='brain' && sex=='male'\") )  genes_df <- query$var(column_names = c(\"feature_id\", \"feature_name\"))$concat() genes_df <- as.data.frame(genes_df) n_genes <- nrow(genes_df)  # accumulator vector (for each gene) for the total count over all cells in X(\"raw\") raw_sum_by_gene <- numeric(n_genes) names(raw_sum_by_gene) <- genes_df$feature_id  # iterate through in-memory shards of query results tables <- query$X(\"raw\")$tables() while (!tables$read_complete()) {   table_part <- tables$read_next()   # table_part is an Arrow table with the columns mentioned above. The result   # order is not guaranteed!    # table_part$soma_dim_1 is the var/gene soma_joinid. But note that these are   # arbitrary int64 id's, and moreover each table_part may exhibit only a subset   # of the values we'll see over all query results. query$indexer helps us map   # any given soma_dim_1 values onto positions in query$var() (genes_df), that is   # the union of all values we'll see.   
gene_indexes <- query$indexer$by_var(table_part$soma_dim_1)$as_vector()   stopifnot(sum(gene_indexes >= n_genes) == 0)   # sum(table_part) group by gene, yielding a numeric vector with the gene_index   # in its names   sum_part <- tapply(as.vector(table_part$soma_data), gene_indexes, sum)   # update the accumulator vector   which_genes <- as.integer(names(sum_part)) + 1 # nb: gene_indexes is zero-based   stopifnot(sum(which_genes > n_genes) == 0)   raw_sum_by_gene[which_genes] <- raw_sum_by_gene[which_genes] + sum_part }  # Divide each sum by cell count to get mean reads per cell (for each gene), # implicitly averaging in all zero entries in X even though they weren't included # in the sparse query results. genes_df$raw_mean <- raw_sum_by_gene / query$n_obs genes_df #>            feature_id  feature_name     raw_mean #> 1  ENSMUSG00000051951          Xkr4 1.397121e+00 #> 2  ENSMUSG00000025900           Rp1 3.162902e-01 #> 3  ENSMUSG00000025902         Sox17 6.604085e+01 #> 4  ENSMUSG00000033845        Mrpl15 3.939172e+01 #> 5  ENSMUSG00000025903        Lypla1 1.986548e+01 #> 6  ENSMUSG00000033813         Tcea1 4.305924e+01 #> 7  ENSMUSG00000002459         Rgs20 3.496194e+00 #> 8  ENSMUSG00000033793       Atp6v1h 7.470932e+01 #> 9  ENSMUSG00000025905         Oprk1 4.568752e-01 #> 10 ENSMUSG00000033774        Npbwr1 1.241003e-04 #> 11 ENSMUSG00000025907        Rb1cc1 3.631679e+01 #> 12 ENSMUSG00000033740          St18 1.660110e+01 #> 13 ENSMUSG00000051285        Pcmtd1 5.410501e+01 #> 14 ENSMUSG00000025909         Sntg1 1.178725e+00 #> 15 ENSMUSG00000061024          Rrs1 2.098927e+01 #> 16 ENSMUSG00000025911        Adhfe1 1.266112e+01 #> 17 ENSMUSG00000079671 2610203C22Rik 9.474621e+00 #> 18 ENSMUSG00000025912         Mybl1 2.643129e-01 #> 19 ENSMUSG00000045210        Vcpip1 3.456668e+01 #> 20 ENSMUSG00000097893 1700034P13Rik 5.721023e-01 #> 21 ENSMUSG00000025915          Sgk3 2.012592e+01 #> 22 ENSMUSG00000098234         Snhg6 6.784314e+00 #> 23 ENSMUSG00000025916   
    Ppp1r42 2.585422e-01 #> 24 ENSMUSG00000025917         Cops5 7.909310e+01 #> 25 ENSMUSG00000056763         Cspp1 1.635604e+01 #> 26 ENSMUSG00000067851       Arfgef1 1.582897e+01 #> 27 ENSMUSG00000042501          Cpa6 1.880119e-02 #> 28 ENSMUSG00000048960         Prex2 2.283623e+01 #> 29 ENSMUSG00000057715 A830018L16Rik 9.992140e-01 #> 30 ENSMUSG00000016918         Sulf1 5.567469e+00 #> 31 ENSMUSG00000025938       Slco5a1 2.452015e-01 #> 32 ENSMUSG00000042414        Prdm14 6.142964e-03 #> 33 ENSMUSG00000005886         Ncoa2 1.707928e+01 #>  [ reached 'max' / getOption(\"max.print\") -- omitted 52384 rows ]"},{"path":"/articles/census_compute_over_X.html","id":"counting-cells-grouped-by-dataset-and-gene","dir":"Articles","previous_headings":"","what":"Counting cells grouped by dataset and gene","title":"Computing on X using online (incremental) algorithms","text":"goal example count number cells nonzero reads, grouped gene Census dataset_id. result data frame dataset, gene, number cells nonzero reads dataset gene. multi-factor aggregation, ’ll take advantage dplyr routines instead lower-level vector indexer shown . presentation purposes, ’ll limit query four genes, can expanded genes easily. 
Don’t forget close census.","code":"library(\"dplyr\")  query <- census$get(\"census_data\")$get(\"mus_musculus\")$axis_query(   measurement_name = \"RNA\",   obs_query = SOMAAxisQuery$new(value_filter = \"tissue=='brain'\"),   var_query = SOMAAxisQuery$new(value_filter = \"feature_name %in% c('Malat1', 'Ptprd', 'Dlg2', 'Pcdh9')\") )  obs_tbl <- query$obs(column_names = c(\"soma_joinid\", \"dataset_id\"))$concat() obs_df <- data.frame(   # materialize soma_joinid as character to avoid overflowing R 32-bit integer   cell_id = as.character(obs_tbl$soma_joinid),   dataset_id = obs_tbl$dataset_id$as_vector() ) var_tbl <- query$var(column_names = c(\"soma_joinid\", \"feature_name\"))$concat() var_df <- data.frame(   gene_id = as.character(var_tbl$soma_joinid),   feature_name = var_tbl$feature_name$as_vector() )  # accumulator for # cells by dataset & gene n_cells_grouped <- data.frame(   \"dataset_id\" = character(0),   \"gene_id\" = character(0),   \"n_cells\" = numeric(0) )  # iterate through in-memory shards of query results tables <- query$X(\"raw\")$tables() while (!tables$read_complete()) {   table_part <- tables$read_next()    # prepare a (dataset,gene,1) tuple for each entry in table_part   n_cells_part <- data.frame(     \"cell_id\" = as.character(table_part$soma_dim_0),     \"gene_id\" = as.character(table_part$soma_dim_1),     \"n_cells\" = 1   )   n_cells_part <- left_join(n_cells_part, obs_df, by = \"cell_id\")   stopifnot(sum(is.null(n_cells_part$dataset_id)) == 0)    # fold those into n_cells_grouped   n_cells_grouped <- n_cells_part %>%     select(-cell_id) %>%     bind_rows(n_cells_grouped) %>%     group_by(dataset_id, gene_id) %>%     summarise(n_cells = sum(n_cells)) %>%     ungroup() }  # add gene names for display n_cells_grouped <- left_join(n_cells_grouped, var_df, by = \"gene_id\") stopifnot(sum(is.null(n_cells_grouped$feature_name)) == 0) n_cells_grouped[c(\"dataset_id\", \"feature_name\", \"n_cells\")] #> # A tibble: 21 x 3 #>    dataset_id     
                      feature_name n_cells #>                                               #>  1 3bbb6cf9-72b9-41be-b568-656de6eb18b5 Ptprd          79578 #>  2 3bbb6cf9-72b9-41be-b568-656de6eb18b5 Dlg2           79513 #>  3 3bbb6cf9-72b9-41be-b568-656de6eb18b5 Pcdh9          79476 #>  4 3bbb6cf9-72b9-41be-b568-656de6eb18b5 Malat1         79667 #>  5 58b01044-c5e5-4b0f-8a2d-6ebf951e01ff Ptprd            474 #>  6 58b01044-c5e5-4b0f-8a2d-6ebf951e01ff Dlg2              81 #>  7 58b01044-c5e5-4b0f-8a2d-6ebf951e01ff Pcdh9            125 #>  8 58b01044-c5e5-4b0f-8a2d-6ebf951e01ff Malat1         12622 #>  9 66ff82b4-9380-469c-bc4b-cfa08eacd325 Dlg2             856 #> 10 66ff82b4-9380-469c-bc4b-cfa08eacd325 Pcdh9           2910 #> # i 11 more rows census$close()"},{"path":"/articles/census_dataset_presence.html","id":"opening-the-census","dir":"Articles","previous_headings":"","what":"Opening the Census","title":"Genes measured in each cell (dataset presence matrix)","text":"cellxgene.census R package contains convenient API open version Census (default, newest stable version).","code":"library(\"cellxgene.census\") census <- open_soma()"},{"path":"/articles/census_dataset_presence.html","id":"fetching-the-ids-of-the-census-datasets","dir":"Articles","previous_headings":"","what":"Fetching the IDs of the Census datasets","title":"Genes measured in each cell (dataset presence matrix)","text":"Let’s grab table datasets included Census use table combination presence matrix .","code":"# Grab the experiment containing human data, and the measurement therein with RNA human <- census$get(\"census_data\")$get(\"homo_sapiens\") human_rna <- human$ms$get(\"RNA\")  # The census-wide datasets datasets_df <- as.data.frame(census$get(\"census_info\")$get(\"datasets\")$read()$concat()) print(datasets_df) #>    soma_joinid                        collection_id #> 1            0 4dca242c-d302-4dba-a68f-4c61e7bad553 #> 2            1 d17249d2-0e6e-4500-abb8-e6c93fa1ac6f #> 3            2 
d17249d2-0e6e-4500-abb8-e6c93fa1ac6f #> 4            3 d17249d2-0e6e-4500-abb8-e6c93fa1ac6f #> 5            4 d17249d2-0e6e-4500-abb8-e6c93fa1ac6f #> 6            5 d17249d2-0e6e-4500-abb8-e6c93fa1ac6f #> 7            6 d17249d2-0e6e-4500-abb8-e6c93fa1ac6f #> 8            7 d17249d2-0e6e-4500-abb8-e6c93fa1ac6f #> 9            8 d17249d2-0e6e-4500-abb8-e6c93fa1ac6f #> 10           9 d17249d2-0e6e-4500-abb8-e6c93fa1ac6f #> 11          10 d17249d2-0e6e-4500-abb8-e6c93fa1ac6f #>                                                                       collection_name #> 1                Comparative transcriptomics reveals human-specific cortical features #> 2  Transcriptomic cytoarchitecture reveals principles of human neocortex organization #> 3  Transcriptomic cytoarchitecture reveals principles of human neocortex organization #> 4  Transcriptomic cytoarchitecture reveals principles of human neocortex organization #> 5  Transcriptomic cytoarchitecture reveals principles of human neocortex organization #> 6  Transcriptomic cytoarchitecture reveals principles of human neocortex organization #> 7  Transcriptomic cytoarchitecture reveals principles of human neocortex organization #> 8  Transcriptomic cytoarchitecture reveals principles of human neocortex organization #> 9  Transcriptomic cytoarchitecture reveals principles of human neocortex organization #> 10 Transcriptomic cytoarchitecture reveals principles of human neocortex organization #> 11 Transcriptomic cytoarchitecture reveals principles of human neocortex organization #>             collection_doi                           dataset_id #> 1  10.1126/science.ade9516 2bdd3a2c-2ff4-4314-adf3-8a06b797a33a #> 2  10.1126/science.adf6812 f5b0810c-1664-4a62-ad06-be1d9964aa8b #> 3  10.1126/science.adf6812 e4ddac12-f48f-4455-8e8d-c2a48a683437 #> 4  10.1126/science.adf6812 e2808a6e-e2ea-41b9-b38c-4a08f1677f02 #> 5  10.1126/science.adf6812 d01c9dff-abd1-4825-bf30-2eb2ba74597e #> 6  10.1126/science.adf6812 
c3aa4f95-7a18-4a7d-8dd8-ca324d714363 #> 7  10.1126/science.adf6812 be401db3-d732-408a-b0c4-71af0458b8ab #> 8  10.1126/science.adf6812 a5d5c529-8a1f-40b5-bda3-35208970070d #> 9  10.1126/science.adf6812 9c63201d-bfd9-41a8-bbbc-18d947556f3d #> 10 10.1126/science.adf6812 93cb76aa-a84b-4a92-8e6c-66a914e26d4c #> 11 10.1126/science.adf6812 8d1dd010-5cbc-43fb-83f8-e0de8e8517da #>                      dataset_version_id #> 1  7eb7f2fd-fd74-4c99-863c-97836415652e #> 2  d4427196-7876-4bdd-a929-ae4d177ec776 #> 3  3280113b-7148-4a3e-98d4-015f443aab8a #> 4  dc092185-3b8e-4fcb-ae21-1dc106d683ac #> 5  c4959ded-83dc-4442-aac7-9a59bdb47801 #> 6  0476ef54-aefe-4754-b0e9-d9fcd75adff4 #> 7  ee027704-72aa-4195-a467-0754db1ed65d #> 8  d47c0742-cea2-46c1-9e72-4d479214041c #> 9  8b09695a-1426-4867-961e-c40a1fbcc2da #> 10 98ad7381-f464-4f49-b850-5321b4f98be6 #> 11 c56683d2-452a-45dc-b402-35397e27e325 #>                                           dataset_title #> 1                               Human: Great apes study #> 2                       Dissection: Angular gyrus (AnG) #> 3                Supercluster: CGE-derived interneurons #> 4               Dissection: Primary auditory cortex(A1) #> 5  Supercluster: Deep layer (non-IT) excitatory neurons #> 6        Supercluster: IT-projecting excitatory neurons #> 7           Dissection: Anterior cingulate cortex (ACC) #> 8               Human Multiple Cortical Areas SMART-seq #> 9                Supercluster: MGE-derived interneurons #> 10        Dissection: Primary somatosensory cortex (S1) #> 11                Dissection: Primary visual cortex(V1) #>                            dataset_h5ad_path dataset_total_cell_count #> 1  2bdd3a2c-2ff4-4314-adf3-8a06b797a33a.h5ad                   156285 #> 2  f5b0810c-1664-4a62-ad06-be1d9964aa8b.h5ad                   110752 #> 3  e4ddac12-f48f-4455-8e8d-c2a48a683437.h5ad                   129495 #> 4  e2808a6e-e2ea-41b9-b38c-4a08f1677f02.h5ad                   139054 #> 5  
d01c9dff-abd1-4825-bf30-2eb2ba74597e.h5ad                    92969 #> 6  c3aa4f95-7a18-4a7d-8dd8-ca324d714363.h5ad                   638941 #> 7  be401db3-d732-408a-b0c4-71af0458b8ab.h5ad                   135462 #> 8  a5d5c529-8a1f-40b5-bda3-35208970070d.h5ad                    49417 #> 9  9c63201d-bfd9-41a8-bbbc-18d947556f3d.h5ad                   185477 #> 10 93cb76aa-a84b-4a92-8e6c-66a914e26d4c.h5ad                   153159 #> 11 8d1dd010-5cbc-43fb-83f8-e0de8e8517da.h5ad                   241077 #>  [ reached 'max' / getOption(\"max.print\") -- omitted 640 rows ]"},{"path":"/articles/census_dataset_presence.html","id":"fetching-the-dataset-presence-matrix","dir":"Articles","previous_headings":"","what":"Fetching the dataset presence matrix","title":"Genes measured in each cell (dataset presence matrix)","text":"Now let’s fetch dataset presence matrix. convenience, read entire presence matrix (Homo sapiens) sparse matrix. convenience function providing capability: also need var dataframe, read R data frame convenient manipulation:","code":"presence_matrix <- get_presence_matrix(census, \"Homo sapiens\", \"RNA\") print(dim(presence_matrix)) #> NULL var_df <- as.data.frame(human_rna$var$read()$concat()) print(var_df) #>    soma_joinid      feature_id feature_name feature_length      nnz n_measured_obs #> 1            0 ENSG00000233576      HTR3C2P           1057    69370       19581263 #> 2            1 ENSG00000121410         A1BG           3999  5640476       62641311 #> 3            2 ENSG00000268895     A1BG-AS1           3374  3071864       61946057 #> 4            3 ENSG00000148584         A1CF           9603   734347       58195911 #> 5            4 ENSG00000175899          A2M           6318  7894261       62704378 #> 6            5 ENSG00000245105      A2M-AS1           2948  1637794       62086816 #> 7            6 ENSG00000166535        A2ML1           7156  2156616       60911688 #> 8            7 ENSG00000256069        A2MP1           4657   835384    
   23554778 #> 9            8 ENSG00000184389      A3GALT2           1023   439067       53780311 #> 10           9 ENSG00000128274       A4GALT           3358  2432348       62706770 #> 11          10 ENSG00000118017        A4GNT           1779    52430       56117399 #> 12          11 ENSG00000265544         AA06            632   220755       22545140 #> 13          12 ENSG00000081760         AACS          16039 11280800       62842909 #> 14          13 ENSG00000250420       AACSP1           3380   211588       22831831 #> 15          14 ENSG00000114771        AADAC           1632   552258       54941618 #> 16          15 ENSG00000188984      AADACL3           4055    24626       43074608 #>  [ reached 'max' / getOption(\"max.print\") -- omitted 60648 rows ]"},{"path":"/articles/census_dataset_presence.html","id":"identifying-genes-measured-in-a-specific-dataset","dir":"Articles","previous_headings":"","what":"Identifying genes measured in a specific dataset","title":"Genes measured in each cell (dataset presence matrix)","text":"Now dataset table, genes metadata table, dataset presence matrix, can check gene set genes measured specific dataset. Important: presence matrix indexed soma_joinid, positionally indexed. words: first dimension presence matrix dataset’s soma_joinid, stored census_datasets dataframe. second dimension presence matrix feature’s soma_joinid, stored var dataframe. presence matrix method $take() lets slice soma_joinids census_datasets var. 
full presence matrix, slices , can exported regular matrix method $get_one_based_matrix() Let’s find gene \"ENSG00000286096\" measured dataset id \"97a17473-e2b1-4f31-a544-44a60773e2dd\".","code":"# Get soma_joinid for datasets and genes of interest var_joinid <- var_df$soma_joinid[var_df$feature_id == \"ENSG00000286096\"] dataset_joinid <- datasets_df$soma_joinid[datasets_df$dataset_id == \"97a17473-e2b1-4f31-a544-44a60773e2dd\"]  # Slice presence matrix with datasets and genes of interest presence_matrix_slice <- presence_matrix$take(i = dataset_joinid, j = var_joinid)  # Convert presence matrix to regular matrix presence_matrix_slice <- presence_matrix_slice$get_one_based_matrix()  # Find how if the gene is present in this dataset is_present <- presence_matrix_slice[, , drop = TRUE] cat(paste(\"Feature is\", if (is_present) \"present.\" else \"not present.\")) #> Feature is present."},{"path":"/articles/census_dataset_presence.html","id":"identifying-datasets-that-measured-specific-genes","dir":"Articles","previous_headings":"","what":"Identifying datasets that measured specific genes","title":"Genes measured in each cell (dataset presence matrix)","text":"Similarly, can determine datasets measured specific gene set genes.","code":"# Grab the feature's soma_joinid from the var dataframe var_joinid <- var_df$soma_joinid[var_df$feature_id == \"ENSG00000286096\"]  # The presence matrix is indexed by the joinids of the dataset and var dataframes, # so slice out the feature of interest by its joinid. 
presence_matrix_slice <- presence_matrix$take(j = var_joinid)$get_one_based_matrix() measured_datasets <- presence_matrix_slice[, , drop = TRUE] != 0 dataset_joinids <- datasets_df$soma_joinid[measured_datasets]  # From the datasets dataframe, slice out the datasets which have a joinid in the list print(datasets_df[dataset_joinids, ]) #>    soma_joinid                        collection_id #> 63          62 3f50314f-bdc9-40c6-8e4a-b0901ebfbe4c #> 64          63 e5f58829-1a66-40b5-a624-9046778e74f5 #> 65          64 e5f58829-1a66-40b5-a624-9046778e74f5 #> 66          65 e5f58829-1a66-40b5-a624-9046778e74f5 #> 67          66 e5f58829-1a66-40b5-a624-9046778e74f5 #> 69          68 e5f58829-1a66-40b5-a624-9046778e74f5 #> 70          69 e5f58829-1a66-40b5-a624-9046778e74f5 #> 72          71 e5f58829-1a66-40b5-a624-9046778e74f5 #> 73          72 e5f58829-1a66-40b5-a624-9046778e74f5 #> 77          76 e5f58829-1a66-40b5-a624-9046778e74f5 #> 78          77 e5f58829-1a66-40b5-a624-9046778e74f5 #>                                                                                                                             collection_name #> 63 Single-cell sequencing links multiregional immune landscapes and tissue-resident T cells in ccRCC to tumor topology and therapy efficacy #> 64                                                                                                                           Tabula Sapiens #> 65                                                                                                                           Tabula Sapiens #> 66                                                                                                                           Tabula Sapiens #> 67                                                                                                                           Tabula Sapiens #> 69                                                                                                                           Tabula Sapiens 
#> 70                                                                                                                           Tabula Sapiens #> 72                                                                                                                           Tabula Sapiens #> 73                                                                                                                           Tabula Sapiens #> 77                                                                                                                           Tabula Sapiens #> 78                                                                                                                           Tabula Sapiens #>                 collection_doi                           dataset_id #> 63 10.1016/j.ccell.2021.03.007 bd65a70f-b274-4133-b9dd-0d1431b6af34 #> 64     10.1126/science.abl4896 ff45e623-7f5f-46e3-b47d-56be0341f66b #> 65     10.1126/science.abl4896 f01bdd17-4902-40f5-86e3-240d66dd2587 #> 66     10.1126/science.abl4896 e6a11140-2545-46bc-929e-da243eed2cae #> 67     10.1126/science.abl4896 e5c63d94-593c-4338-a489-e1048599e751 #> 69     10.1126/science.abl4896 d77ec7d6-ef2e-49d6-9e79-05b7f8881484 #> 70     10.1126/science.abl4896 cee11228-9f0b-4e57-afe2-cfe15ee56312 #> 72     10.1126/science.abl4896 a2d4d33e-4c62-4361-b80a-9be53d2e50e8 #> 73     10.1126/science.abl4896 a0754256-f44b-4c4a-962c-a552e47d3fdc #> 77     10.1126/science.abl4896 6d41668c-168c-4500-b06a-4674ccf3e19d #> 78     10.1126/science.abl4896 5e5e7a2f-8f1c-42ac-90dc-b4f80f38e84c #>                      dataset_version_id #> 63 71815674-a8cf-4add-95dd-c5d5d1631597 #> 64 0b29f4ce-5e72-4356-b74b-b54714979234 #> 65 bd13c169-af97-4d8f-ba45-7588808c2e48 #> 66 47615a3d-0a9f-4a78-88ef-5cce2a84637d #> 67 ac7714f0-dce2-40ba-9912-324de6c9a77f #> 69 c7679ec2-652d-437a-bded-3ec2344829e4 #> 70 f89fa18f-c32b-4bae-9511-1a4d18f200e1 #> 72 37ada0d2-9970-4ff2-8bcd-41e80ab6e081 #> 73 1cda78aa-f0d9-4d50-96bf-8bc309318802 #> 
77 5297a910-453f-4e3f-af16-e18fd5a79090 #> 78 b783b036-c837-4290-a07d-f6b79a301f59 #>                                                                                                                               dataset_title #> 63 Single-cell sequencing links multiregional immune landscapes and tissue-resident T cells in ccRCC to tumor topology and therapy efficacy #> 64                                                                                                                Tabula Sapiens - Pancreas #> 65                                                                                                          Tabula Sapiens - Salivary_Gland #> 66                                                                                                                   Tabula Sapiens - Heart #> 67                                                                                                                 Tabula Sapiens - Bladder #> 69                                                                                                                Tabula Sapiens - Prostate #> 70                                                                                                                  Tabula Sapiens - Spleen #> 72                                                                                                             Tabula Sapiens - Vasculature #> 73                                                                                                                     Tabula Sapiens - Eye #> 77                                                                                                                   Tabula Sapiens - Liver #> 78                                                                                                                     Tabula Sapiens - Fat #>                            dataset_h5ad_path dataset_total_cell_count #> 63 bd65a70f-b274-4133-b9dd-0d1431b6af34.h5ad                   167283 #> 64 ff45e623-7f5f-46e3-b47d-56be0341f66b.h5ad        
            13497 #> 65 f01bdd17-4902-40f5-86e3-240d66dd2587.h5ad                    27199 #> 66 e6a11140-2545-46bc-929e-da243eed2cae.h5ad                    11505 #> 67 e5c63d94-593c-4338-a489-e1048599e751.h5ad                    24583 #> 69 d77ec7d6-ef2e-49d6-9e79-05b7f8881484.h5ad                    16375 #> 70 cee11228-9f0b-4e57-afe2-cfe15ee56312.h5ad                    34004 #> 72 a2d4d33e-4c62-4361-b80a-9be53d2e50e8.h5ad                    16037 #> 73 a0754256-f44b-4c4a-962c-a552e47d3fdc.h5ad                    10650 #> 77 6d41668c-168c-4500-b06a-4674ccf3e19d.h5ad                     5007 #> 78 5e5e7a2f-8f1c-42ac-90dc-b4f80f38e84c.h5ad                    20263 #>  [ reached 'max' / getOption(\"max.print\") -- omitted 31 rows ]"},{"path":"/articles/census_dataset_presence.html","id":"identifying-all-genes-measured-in-a-dataset","dir":"Articles","previous_headings":"","what":"Identifying all genes measured in a dataset","title":"Genes measured in each cell (dataset presence matrix)","text":"Finally, can find set genes measured cells given dataset.","code":"# Slice the dataset(s) of interest, and get the joinid(s) dataset_joinids <- datasets_df$soma_joinid[datasets_df$collection_id == \"17481d16-ee44-49e5-bcf0-28c0780d8c4a\"]  # Slice the presence matrix by the first dimension, i.e., by dataset presence_matrix_slice <- presence_matrix$take(i = dataset_joinids)$get_one_based_matrix() genes_measured <- Matrix::colSums(presence_matrix_slice) > 0 var_joinids <- var_df$soma_joinid[genes_measured]  print(var_df[var_joinids, ]) #>    soma_joinid      feature_id feature_name feature_length      nnz n_measured_obs #> 1            0 ENSG00000233576      HTR3C2P           1057    69370       19581263 #> 2            1 ENSG00000121410         A1BG           3999  5640476       62641311 #> 3            2 ENSG00000268895     A1BG-AS1           3374  3071864       61946057 #> 4            3 ENSG00000148584         A1CF           9603   734347       58195911 #> 5            4 
ENSG00000175899          A2M           6318  7894261       62704378 #> 6            5 ENSG00000245105      A2M-AS1           2948  1637794       62086816 #> 9            8 ENSG00000184389      A3GALT2           1023   439067       53780311 #> 10           9 ENSG00000128274       A4GALT           3358  2432348       62706770 #> 12          11 ENSG00000265544         AA06            632   220755       22545140 #> 14          13 ENSG00000250420       AACSP1           3380   211588       22831831 #> 16          15 ENSG00000188984      AADACL3           4055    24626       43074608 #> 18          17 ENSG00000240602      AADACP1           2012    29491       23133490 #> 19          18 ENSG00000109576        AADAT           2970  4524608       61559099 #> 20          19 ENSG00000158122       PRXL2C           3098  5424472       55618144 #> 21          20 ENSG00000103591        AAGAB           4138 12427442       62843055 #> 22          21 ENSG00000115977         AAK1          24843 29280566       62664775 #>  [ reached 'max' / getOption(\"max.print\") -- omitted 27195 rows ]"},{"path":"/articles/census_dataset_presence.html","id":"close-the-census","dir":"Articles","previous_headings":"Identifying all genes measured in a dataset","what":"Close the census","title":"Genes measured in each cell (dataset presence matrix)","text":"use, census object closed release memory resources. also closes SOMA objects accessed via top-level census. Closing can automated using .exit(census$close(), add = TRUE) immediately census <- open_soma().","code":"census$close()"},{"path":"/articles/census_datasets.html","id":"fetching-the-datasets-table","dir":"Articles","previous_headings":"","what":"Fetching the datasets table","title":"Census Datasets example","text":"Census contains top-level data frame itemizing datasets contained therein. 
can read SOMADataFrame Arrow Table: R data frame: sum cell counts across datasets match number cells across SOMA experiments (human, mouse).","code":"library(\"cellxgene.census\") census <- open_soma() census_datasets <- census$get(\"census_info\")$get(\"datasets\")$read()$concat() print(census_datasets) #> Table #> 651 rows x 9 columns #> $soma_joinid  #> $collection_id  #> $collection_name  #> $collection_doi  #> $dataset_id  #> $dataset_version_id  #> $dataset_title  #> $dataset_h5ad_path  #> $dataset_total_cell_count  census_datasets <- as.data.frame(census_datasets) print(census_datasets[, c(   \"dataset_id\",   \"dataset_title\",   \"dataset_total_cell_count\" )]) #>                              dataset_id #> 1  2bdd3a2c-2ff4-4314-adf3-8a06b797a33a #> 2  f5b0810c-1664-4a62-ad06-be1d9964aa8b #> 3  e4ddac12-f48f-4455-8e8d-c2a48a683437 #> 4  e2808a6e-e2ea-41b9-b38c-4a08f1677f02 #> 5  d01c9dff-abd1-4825-bf30-2eb2ba74597e #> 6  c3aa4f95-7a18-4a7d-8dd8-ca324d714363 #> 7  be401db3-d732-408a-b0c4-71af0458b8ab #> 8  a5d5c529-8a1f-40b5-bda3-35208970070d #> 9  9c63201d-bfd9-41a8-bbbc-18d947556f3d #> 10 93cb76aa-a84b-4a92-8e6c-66a914e26d4c #> 11 8d1dd010-5cbc-43fb-83f8-e0de8e8517da #> 12 716a4acc-919e-4326-9672-ebe06ede84e6 #> 13 5bdc423a-59e6-457d-aa01-debd2c9c564f #> 14 5346f9c6-755e-4336-94cc-38706ec00c2f #> 15 015c230d-650c-4527-870d-8a805849a382 #> 16 d567b692-c374-4628-a508-8008f6778f22 #> 17 cf83c98a-3791-4537-bbde-a719f6d73c13 #> 18 738942eb-ac72-44ff-a64b-8943b5ecd8d9 #> 19 f8d8b443-bca6-4c3c-9042-669dfb7f8030 #> 20 f5be4b96-f5a3-4c3d-84ac-6f69daf744d5 #> 21 dea1aa78-c0a2-413f-b375-f91cce49e4d0 #> 22 92161459-9103-4379-ae34-73a38eee1d1d #> 23 5829c7ba-697f-418e-8b98-d605b192dc48 #> 24 4dd1cd23-fc4d-4fd1-9709-602540f3ca6f #> 25 2856d06c-0ff9-4e01-bfc9-202b74d0b60f #> 26 251b1a7e-d050-4486-8d50-4c2619eb0f46 #> 27 07760522-707a-4a1c-8891-dbd1226d6b27 #> 28 9fcb0b73-c734-40a5-be9c-ace7eea401c9 #> 29 1a38e762-2465-418f-b81c-6a4bce261c34 #> 30 
f16a8f4d-bc97-43c5-a2f6-bbda952e4c5c #> 31 94c41723-b2c4-4b59-a49a-64c9b851903e #> 32 6ceeaa86-9ceb-4582-b390-6d4dd6ff0572 #> 33 9a64bf99-ebe5-4276-93a8-bee9dff1cd47 #> 34 fc0ceb80-d2d9-47c1-9d78-b0e45c64c500 #> 35 d0ea3ec4-0f3b-4649-9146-1c0b5f303a55 #> 36 b8920ef5-7d22-497b-abca-a7a9eb76d79a #> 37 b1d37bbd-9ae4-4404-b2f9-f2fe66750e4e #> 38 a4e89c26-e8d4-4471-9b06-16a1405880f0 #> 39 a190b2e9-3796-4785-9a2f-013e2a9a43e6 #> 40 9ff9f9ba-016b-4cbb-8899-45dc20860b8b #> 41 9940f951-3dc0-4579-bbb2-2392786e59a3 #> 42 74d584f0-74fc-482e-b944-e76f29c1ab85 #> 43 6f7fd0f1-a2ed-4ff1-80d3-33dde731cbc3 #> 44 6cda07c7-5d7a-41ba-9799-5bb73da25a60 #> 45 646e3e87-e46b-4b12-85b5-8d8589e26088 #> 46 6437bc9c-16cb-46c8-8f79-9a7384a0212a #> 47 58c43cc2-e00e-43c4-94eb-8501369264e1 #> 48 53bc5729-6202-4351-bc99-1f36139e9dc4 #> 49 44c83972-e5d2-4858-ac58-2df9f4bf564b #> 50 2ecc72f8-085f-4e86-8692-771f316c54f6 #> 51 2e5a9b5d-d31b-4e9f-a179-d5d70ba459fb #> 52 1c9f5c6b-73da-4d17-95de-df080ffe0df1 #> 53 100c6145-7b0e-4ba6-81c1-ffebed0d1ac4 #> 54 0ed60482-a34f-4268-b576-d69cc30210f6 #> 55 0eccaf0c-19d2-4900-9962-899378adf8be #> 56 04c94a7d-1133-42c9-bb48-c697bd302a8d #> 57 0374f03c-62e2-4859-8a14-acb00b0627d5 #> 58 03181d87-4769-41e7-8c39-d9a81835f0d2 #> 59 f171db61-e57e-4535-a06a-35d8b6ef8f2b #> 60 ecf2e08e-2032-4a9e-b466-b65b395f4a02 #> 61 74cff64f-9da9-4b2a-9b3b-8a04a1598040 #> 62 5af90777-6760-4003-9dba-8f945fec6fdf #> 63 bd65a70f-b274-4133-b9dd-0d1431b6af34 #> 64 ff45e623-7f5f-46e3-b47d-56be0341f66b #> 65 f01bdd17-4902-40f5-86e3-240d66dd2587 #> 66 e6a11140-2545-46bc-929e-da243eed2cae #> 67 e5c63d94-593c-4338-a489-e1048599e751 #> 68 d8732da6-8d1d-42d9-b625-f2416c30054b #> 69 d77ec7d6-ef2e-49d6-9e79-05b7f8881484 #> 70 cee11228-9f0b-4e57-afe2-cfe15ee56312 #> 71 a357414d-2042-4eb5-95f0-c58604a18bdd #> 72 a2d4d33e-4c62-4361-b80a-9be53d2e50e8 #> 73 a0754256-f44b-4c4a-962c-a552e47d3fdc #> 74 983d5ec9-40e8-4512-9e65-a572a9c486cb #> 75 7357cee7-9f7f-4ab0-8cec-90de8f047e38 #> 76 
6ec405bb-4727-4c6d-ab4e-01fe489af7ea #> 77 6d41668c-168c-4500-b06a-4674ccf3e19d #> 78 5e5e7a2f-8f1c-42ac-90dc-b4f80f38e84c #> 79 55cf0ea3-9d2b-4294-871e-bb4b49a79fc7 #> 80 4f1555bc-4664-46c3-a606-78d34dd10d92 #> 81 2ba40233-8576-4dec-a5f1-2adfa115e2dc #> 82 2423ce2c-3149-4cca-a2ff-cf682ea29b5f #> 83 1c9eb291-6d31-47e1-96b2-129b5e1ae64f #> 84 18eb630b-a754-4111-8cd4-c24ec80aa5ec #> 85 0d2ee4ac-05ee-40b2-afb6-ebb584caa867 #>                                                                                                                               dataset_title #> 1                                                                                                                   Human: Great apes study #> 2                                                                                                           Dissection: Angular gyrus (AnG) #> 3                                                                                                    Supercluster: CGE-derived interneurons #> 4                                                                                                   Dissection: Primary auditory cortex(A1) #> 5                                                                                      Supercluster: Deep layer (non-IT) excitatory neurons #> 6                                                                                            Supercluster: IT-projecting excitatory neurons #> 7                                                                                               Dissection: Anterior cingulate cortex (ACC) #> 8                                                                                                   Human Multiple Cortical Areas SMART-seq #> 9                                                                                                    Supercluster: MGE-derived interneurons #> 10                                                                                            Dissection: Primary somatosensory cortex (S1) #> 
11                                                                                                    Dissection: Primary visual cortex(V1) #> 12                                                                                         Dissection: Dorsolateral prefrontal cortex (DFC) #> 13                                                                                                    Dissection: Primary motor cortex (M1) #> 14                                                                                                         Supercluster: Non-neuronal cells #> 15                                                                                                  Dissection: Middle temporal gyrus (MTG) #> 16                                                                       Combined single cell and single nuclei RNA-Seq data - Heart Global #> 17                                                                                                    Global dataset of infant KMT2Ar B-ALL #> 18                                                                                     Normal immune cells landscape of infant KMT2Ar B-ALL #> 19                                                                                                      Human Human Microglia 10x scRNA-seq #> 20                                                                                                    Human Endothelial cells 10x scRNA-seq #> 21                                                                                                 Human Nurr-Negative Nuclei 10x scRNA-seq #> 22                                                                                                 Human Nurr-Positive Nuclei 10x scRNA-seq #> 23                                                                                                     Human Oligodendrocytes 10x scRNA-seq #> 24                                                                                                            Human OPC Cells 10x scRNA-seq 
#> 25                                                                                                           Human DA Neurons 10x scRNA-seq #> 26                                                                                                       Human Non-DA Neurons 10x scRNA-seq #> 27                                                                                                           Human Astrocytes 10x scRNA-seq #> 28                                                                              An Integrated Single Cell Meta-atlas of Human Periodontitis #> 29                                                                Single-cell analysis of prenatal and postnatal human cortical development #> 30                                                       All - A single-cell transcriptomic atlas characterizes ageing tissues in the mouse #> 31                                                                                    snRNA-seq of human anterior and posterior hippocampus #> 32                                                                                                                        3-prime FGID data #> 33                                                      Single-Cell RNA Sequencing of Breast Tissues: Cell Subtypes and Cancer Risk Factors #> 34                                                                            Sst Chodl - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 35                                                                                  L6b - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 36                                                                              L5/6 NP - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 37                                                                                 Sncg - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 38                                                                                L6 CT - DLPFC: Seattle Alzheimer's Disease Atlas 
(SEA-AD) #> 39                                                                           Lamp5 Lhx6 - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 40                                                                                L4 IT - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 41                                                                      Oligodendrocyte - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 42                                                                            Astrocyte - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 43                                                                       Whole Taxonomy - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 44                                                                                L5 ET - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 45                                                                              L2/3 IT - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 46                                                                                L6 IT - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 47                                                                                  OPC - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 48                                                                                  Vip - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 49                                                                                L5 IT - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 50                                                                          Endothelial - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 51                                                                                 VLMC - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 52                                                                           L6 IT Car3 - DLPFC: Seattle Alzheimer's Disease 
Atlas (SEA-AD) #> 53                                                                        Microglia-PVM - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 54                                                                                Lamp5 - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 55                                                                                 Pax6 - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 56                                                                                Pvalb - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 57                                                                           Chandelier - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 58                                                                                  Sst - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 59                                                                                                                   donor_p13_trophoblasts #> 60                                                                                                                  All donors trophoblasts #> 61                                                                                                     All donors all cell states (in vivo) #> 62                                                                     Single-cell transcriptomic datasets of Renal cell carcinoma patients #> 63 Single-cell sequencing links multiregional immune landscapes and tissue-resident T cells in ccRCC to tumor topology and therapy efficacy #> 64                                                                                                                Tabula Sapiens - Pancreas #> 65                                                                                                          Tabula Sapiens - Salivary_Gland #> 66                                                                                                                   
Tabula Sapiens - Heart #> 67                                                                                                                 Tabula Sapiens - Bladder #> 68                                                                                                                 Tabula Sapiens - Trachea #> 69                                                                                                                Tabula Sapiens - Prostate #> 70                                                                                                                  Tabula Sapiens - Spleen #> 71                                                                                                         Tabula Sapiens - Small_Intestine #> 72                                                                                                             Tabula Sapiens - Vasculature #> 73                                                                                                                     Tabula Sapiens - Eye #> 74                                                                                                                   Tabula Sapiens - Blood #> 75                                                                                                         Tabula Sapiens - Large_Intestine #> 76                                                                                                                  Tabula Sapiens - Uterus #> 77                                                                                                                   Tabula Sapiens - Liver #> 78                                                                                                                     Tabula Sapiens - Fat #> 79                                                                                                                  Tabula Sapiens - Tongue #> 80                                                                                                             
Tabula Sapiens - Bone_Marrow #> 81                                                                                                                 Tabula Sapiens - Mammary #> 82                                                                                                                  Tabula Sapiens - Kidney #> 83                                                                                                                  Tabula Sapiens - Muscle #> 84                                                                                                              Tabula Sapiens - Lymph_Node #> 85                                                                                                                    Tabula Sapiens - Lung #>    dataset_total_cell_count #> 1                    156285 #> 2                    110752 #> 3                    129495 #> 4                    139054 #> 5                     92969 #> 6                    638941 #> 7                    135462 #> 8                     49417 #> 9                    185477 #> 10                   153159 #> 11                   241077 #> 12                   113339 #> 13                   114605 #> 14                   108940 #> 15                   148374 #> 16                   493236 #> 17                   128588 #> 18                    36313 #> 19                    33041 #> 20                    14903 #> 21                   104097 #> 22                    80576 #> 23                   178815 #> 24                    13691 #> 25                    22048 #> 26                    91479 #> 27                    33506 #> 28                   105918 #> 29                   700391 #> 30                   356213 #> 31                   129905 #> 32                    89849 #> 33                    52681 #> 34                     1772 #> 35                    17996 #> 36                    18154 #> 37                    23640 #> 38                    27454 #> 39                    21603 #> 40           
         76195 #> 41                   136076 #> 42                    82936 #> 43                  1309414 #> 44                     3848 #> 45                   317116 #> 46                    44174 #> 47                    27670 #> 48                    95014 #> 49                    97173 #> 50                     2496 #> 51                     4619 #> 52                    13007 #> 53                    40625 #> 54                    52828 #> 55                     8984 #> 56                   109618 #> 57                    14871 #> 58                    71545 #> 59                    31497 #> 60                    67070 #> 61                   286326 #> 62                   270855 #> 63                   167283 #> 64                    13497 #> 65                    27199 #> 66                    11505 #> 67                    24583 #> 68                     9522 #> 69                    16375 #> 70                    34004 #> 71                    12467 #> 72                    16037 #> 73                    10650 #> 74                    50115 #> 75                    13680 #> 76                     7124 #> 77                     5007 #> 78                    20263 #> 79                    15020 #> 80                    12297 #> 81                    11375 #> 82                     9641 #> 83                    30746 #> 84                    53275 #> 85                    35682 #>  [ reached 'max' / getOption(\"max.print\") -- omitted 566 rows ] census_data <- census$get(\"census_data\") all_experiments <- lapply(census_data$to_list(), function(x) census_data$get(x$name)) print(all_experiments) #> $homo_sapiens #>  #>   uri: s3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/soma/census_data/homo_sapiens  #>   arrays: obs*  #>   groups: ms*  #>  #> $mus_musculus #>  #>   uri: s3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/soma/census_data/mus_musculus  #>   arrays: obs*  #>   groups: ms* experiments_total_cells <- 
sum(sapply(all_experiments, function(x) {   nrow(x$obs$read(column_names = c(\"soma_joinid\"))$concat()) }))  print(paste(\"Found\", experiments_total_cells, \"cells in all experiments.\")) #> [1] \"Found 68683222 cells in all experiments.\" print(paste(   \"Found\", sum(as.vector(census_datasets$dataset_total_cell_count)),   \"cells in all datasets.\" )) #> [1] \"Found 68683222 cells in all datasets.\""},{"path":"/articles/census_datasets.html","id":"fetching-the-expression-data-from-a-single-dataset","dir":"Articles","previous_headings":"","what":"Fetching the expression data from a single dataset","title":"Census Datasets example","text":"Let’s pick one dataset slice census, turn Seurat -memory object. (requires Seurat package installed beforehand.) Create query mouse experiment, “RNA” measurement, dataset_id.","code":"census_datasets[census_datasets$dataset_id == \"0bd1a1de-3aee-40e0-b2ec-86c7a30c7149\", ] #>     soma_joinid                        collection_id    collection_name #> 581         580 0b9d8a04-bb9d-44da-aa27-705bb65b54eb Tabula Muris Senis #>                collection_doi                           dataset_id #> 581 10.1038/s41586-020-2496-1 0bd1a1de-3aee-40e0-b2ec-86c7a30c7149 #>                       dataset_version_id #> 581 ff352f35-58a2-4962-b716-649d1f9e9f44 #>                                                                                        dataset_title #> 581 Bone marrow - A single-cell transcriptomic atlas characterizes ageing tissues in the mouse - 10x #>                             dataset_h5ad_path dataset_total_cell_count #> 581 0bd1a1de-3aee-40e0-b2ec-86c7a30c7149.h5ad                    40220 library(\"tiledbsoma\") obs_query <- SOMAAxisQuery$new(   value_filter = \"dataset_id == '0bd1a1de-3aee-40e0-b2ec-86c7a30c7149'\" ) expt_query <- census_data$get(\"mus_musculus\")$axis_query(   measurement_name = \"RNA\",   obs_query = obs_query ) dataset_seurat <- expt_query$to_seurat(c(counts = \"raw\")) print(dataset_seurat) #> An 
object of class Seurat  #> 52417 features across 40220 samples within 1 assay  #> Active assay: RNA (52417 features, 0 variable features) #>  2 layers present: counts, data #>  1 dimensional reduction calculated: scvi"},{"path":"/articles/census_datasets.html","id":"downloading-the-original-source-h5ad-file-of-a-dataset","dir":"Articles","previous_headings":"","what":"Downloading the original source H5AD file of a dataset","title":"Census Datasets example","text":"can use cellxgene.census::get_source_h5ad_uri() API fetch URI pointing H5AD associated dataset_id. H5AD can download CZ CELLxGENE Discover, may contain additional data-submitter provided information included Census. can fetch location cloud directly download system. local H5AD file can used R using SeuratDisk’s anndata converter.","code":"# Option 1: Direct download download_source_h5ad(   dataset_id = \"0bd1a1de-3aee-40e0-b2ec-86c7a30c7149\",   file = \"/tmp/Tabula_Muris_Senis-bone_marrow.h5ad\",   overwrite = TRUE ) # Option 2: Get location and download via preferred method get_source_h5ad_uri(\"0bd1a1de-3aee-40e0-b2ec-86c7a30c7149\") #> $uri #> [1] \"s3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/h5ads/0bd1a1de-3aee-40e0-b2ec-86c7a30c7149.h5ad\" #>  #> $s3_region #> [1] \"us-west-2\""},{"path":"/articles/census_datasets.html","id":"close-the-census","dir":"Articles","previous_headings":"Downloading the original source H5AD file of a dataset","what":"Close the census","title":"Census Datasets example","text":"use, census object closed release memory resources. also closes SOMA objects accessed via top-level census. 
Closing can automated using .exit(census$close(), add = TRUE) immediately census <- open_soma().","code":"census$close()"},{"path":"/articles/census_query_extract.html","id":"opening-the-census","dir":"Articles","previous_headings":"","what":"Opening the census","title":"Querying and fetching the single-cell data and cell/gene metadata","text":"cellxgene.census R package contains convenient API open version Census (default, newest stable version). can learn cellxgene.census methods accessing corresponding documentation, example ?cellxgene.census::open_soma.","code":"library(\"cellxgene.census\") census <- open_soma()"},{"path":"/articles/census_query_extract.html","id":"querying-cell-metadata-obs","dir":"Articles","previous_headings":"","what":"Querying cell metadata (obs)","title":"Querying and fetching the single-cell data and cell/gene metadata","text":"human gene metadata Census, RNA assays, located census$get(\"census_data\")$get(\"homo_sapiens\")$obs. SOMADataFrame can materialized R data frame (tibble) using .data.frame(obs$read()$concat()). mouse cell metadata census$get(\"census_data\")$get(\"mus_musculus\").obs. slicing cell metadata two relevant arguments can passed read(): column_names — character vector indicating metadata columns fetch. Expressions one comparisons Comparisons one       Expressions can combine comparisons using && || op one < | > | <= | >= | == | != %% learn metadata columns available fetching filtering can directly look keys cell metadata. soma_joinid special SOMADataFrame column used join operations. definition columns can found Census schema. can used fetch specific columns specific rows matching condition. latter need know values looking priori. example let’s see possible values available sex. can load cell metadata fetching column sex. can see three different values sex, \"male\", \"female\" \"unknown\". information can fetch cell metatadata specific sex value, example \"unknown\". 
can use column_names value_filter perform specific queries. example let’s fetch disease column cell_type \"B cell\" tissue_general \"lung\".","code":"census$get(\"census_data\")$get(\"homo_sapiens\")$obs$colnames() #>  [1] \"soma_joinid\"                              #>  [2] \"dataset_id\"                               #>  [3] \"assay\"                                    #>  [4] \"assay_ontology_term_id\"                   #>  [5] \"cell_type\"                                #>  [6] \"cell_type_ontology_term_id\"               #>  [7] \"development_stage\"                        #>  [8] \"development_stage_ontology_term_id\"       #>  [9] \"disease\"                                  #> [10] \"disease_ontology_term_id\"                 #> [11] \"donor_id\"                                 #> [12] \"is_primary_data\"                          #> [13] \"self_reported_ethnicity\"                  #> [14] \"self_reported_ethnicity_ontology_term_id\" #> [15] \"sex\"                                      #> [16] \"sex_ontology_term_id\"                     #> [17] \"suspension_type\"                          #> [18] \"tissue\"                                   #> [19] \"tissue_ontology_term_id\"                  #> [20] \"tissue_general\"                           #> [21] \"tissue_general_ontology_term_id\"          #> [22] \"raw_sum\"                                  #> [23] \"nnz\"                                      #> [24] \"raw_mean_nnz\"                             #> [25] \"raw_variance_nnz\"                         #> [26] \"n_measured_vars\" unique(as.data.frame(census$get(\"census_data\")$get(\"homo_sapiens\")$obs$read(column_names = \"sex\")$concat())) #>             sex #> 1          male #> 224      female #> 3747640 unknown as.data.frame(census$get(\"census_data\")$get(\"homo_sapiens\")$obs$read(value_filter = \"sex == 'unknown'\")$concat()) #>   soma_joinid                           dataset_id     assay assay_ontology_term_id #> 1     3747639 
9fcb0b73-c734-40a5-be9c-ace7eea401c9 10x 3' v2            EFO:0009899 #> 2     3747640 9fcb0b73-c734-40a5-be9c-ace7eea401c9 10x 3' v2            EFO:0009899 #> 3     3747641 9fcb0b73-c734-40a5-be9c-ace7eea401c9 10x 3' v2            EFO:0009899 #> 4     3747642 9fcb0b73-c734-40a5-be9c-ace7eea401c9 10x 3' v2            EFO:0009899 #> 5     3747643 9fcb0b73-c734-40a5-be9c-ace7eea401c9 10x 3' v2            EFO:0009899 #> 6     3747644 9fcb0b73-c734-40a5-be9c-ace7eea401c9 10x 3' v2            EFO:0009899 #> 7     3747645 9fcb0b73-c734-40a5-be9c-ace7eea401c9 10x 3' v2            EFO:0009899 #> 8     3747646 9fcb0b73-c734-40a5-be9c-ace7eea401c9 10x 3' v2            EFO:0009899 #> 9     3747647 9fcb0b73-c734-40a5-be9c-ace7eea401c9 10x 3' v2            EFO:0009899 #>    cell_type cell_type_ontology_term_id development_stage #> 1 fibroblast                 CL:0000057 human adult stage #> 2 fibroblast                 CL:0000057 human adult stage #> 3 fibroblast                 CL:0000057 human adult stage #> 4 fibroblast                 CL:0000057 human adult stage #> 5 fibroblast                 CL:0000057 human adult stage #> 6 fibroblast                 CL:0000057 human adult stage #> 7 fibroblast                 CL:0000057 human adult stage #> 8 fibroblast                 CL:0000057 human adult stage #> 9 fibroblast                 CL:0000057 human adult stage #>   development_stage_ontology_term_id disease disease_ontology_term_id #> 1                     HsapDv:0000087  normal             PATO:0000461 #> 2                     HsapDv:0000087  normal             PATO:0000461 #> 3                     HsapDv:0000087  normal             PATO:0000461 #> 4                     HsapDv:0000087  normal             PATO:0000461 #> 5                     HsapDv:0000087  normal             PATO:0000461 #> 6                     HsapDv:0000087  normal             PATO:0000461 #> 7                     HsapDv:0000087  normal             PATO:0000461 #> 8                     HsapDv:0000087 
 normal             PATO:0000461 #> 9                     HsapDv:0000087  normal             PATO:0000461 #>                       donor_id is_primary_data self_reported_ethnicity #> 1 Pagella_GSE161267_GSM4904134            TRUE                 unknown #> 2 Pagella_GSE161267_GSM4904134            TRUE                 unknown #> 3 Pagella_GSE161267_GSM4904134            TRUE                 unknown #> 4 Pagella_GSE161267_GSM4904134            TRUE                 unknown #> 5 Pagella_GSE161267_GSM4904134            TRUE                 unknown #> 6 Pagella_GSE161267_GSM4904134            TRUE                 unknown #> 7 Pagella_GSE161267_GSM4904134            TRUE                 unknown #> 8 Pagella_GSE161267_GSM4904134            TRUE                 unknown #> 9 Pagella_GSE161267_GSM4904134            TRUE                 unknown #>   self_reported_ethnicity_ontology_term_id     sex sex_ontology_term_id suspension_type #> 1                                  unknown unknown              unknown            cell #> 2                                  unknown unknown              unknown            cell #> 3                                  unknown unknown              unknown            cell #> 4                                  unknown unknown              unknown            cell #> 5                                  unknown unknown              unknown            cell #> 6                                  unknown unknown              unknown            cell #> 7                                  unknown unknown              unknown            cell #> 8                                  unknown unknown              unknown            cell #> 9                                  unknown unknown              unknown            cell #>    tissue tissue_ontology_term_id tissue_general tissue_general_ontology_term_id #> 1 gingiva          UBERON:0001828         mucosa                  UBERON:0000344 #> 2 gingiva          UBERON:0001828         mucosa                  
UBERON:0000344 #> 3 gingiva          UBERON:0001828         mucosa                  UBERON:0000344 #> 4 gingiva          UBERON:0001828         mucosa                  UBERON:0000344 #> 5 gingiva          UBERON:0001828         mucosa                  UBERON:0000344 #> 6 gingiva          UBERON:0001828         mucosa                  UBERON:0000344 #> 7 gingiva          UBERON:0001828         mucosa                  UBERON:0000344 #> 8 gingiva          UBERON:0001828         mucosa                  UBERON:0000344 #> 9 gingiva          UBERON:0001828         mucosa                  UBERON:0000344 #>   raw_sum  nnz raw_mean_nnz raw_variance_nnz n_measured_vars #> 1     547  329     1.662614        14.559604           31602 #> 2     982  563     1.744227         5.315247           31602 #> 3   12467 3809     3.273038       109.305683           31602 #> 4    1053  566     1.860424         7.430042           31602 #> 5     548  363     1.509642         2.410818           31602 #> 6     678  429     1.580420        11.379616           31602 #> 7     848  524     1.618321         9.437216           31602 #> 8     935  608     1.537829         4.868418           31602 #> 9     735  485     1.515464         6.213087           31602 #>  [ reached 'max' / getOption(\"max.print\") -- omitted 3301779 rows ] cell_metadata_b_cell <- census$get(\"census_data\")$get(\"homo_sapiens\")$obs$read(   value_filter = \"cell_type == 'B cell' & tissue_general == 'lung'\",   column_names = \"disease\" )  cell_metadata_b_cell <- as.data.frame(cell_metadata_b_cell$concat())  table(cell_metadata_b_cell) #> disease #>                              COVID-19 chronic obstructive pulmonary disease  #>                                  2729                                  6369  #>          hypersensitivity pneumonitis             interstitial lung disease  #>                                    52                                   376  #>                   lung adenocarcinoma             lung large 
cell carcinoma  #>                                 62351                                  1534  #>              lymphangioleiomyomatosis         non-small cell lung carcinoma  #>                                   133                                 17484  #>   non-specific interstitial pneumonia                                normal  #>                                   231                                 25461  #>                 pleomorphic carcinoma                             pneumonia  #>                                  1210                                    50  #>                   pulmonary emphysema                    pulmonary fibrosis  #>                                  1512                                  6798  #>                 pulmonary sarcoidosis             small cell lung carcinoma  #>                                     6                                   583  #>          squamous cell lung carcinoma  #>                                 11920"},{"path":"/articles/census_query_extract.html","id":"querying-gene-metadata-var","dir":"Articles","previous_headings":"","what":"Querying gene metadata (var)","title":"Querying and fetching the single-cell data and cell/gene metadata","text":"human gene metadata Census located census$get(\"census_data\")$get(\"homo_sapiens\")$ms$get(\"RNA\")$var. Similarly cell metadata, SOMADataFrame thus can also use method read(). mouse gene metadata census$get(\"census_data\")$get(\"mus_musculus\")$ms$get(\"RNA\")$var. Let’s take look metadata available column selection row filtering. exception soma_joinid columns defined Census schema. Similarly cell metadata, can use operations learn fetch gene metadata. 
example, get feature_name feature_length genes \"ENSG00000161798\" \"ENSG00000188229\" can following.","code":"census$get(\"census_data\")$get(\"homo_sapiens\")$ms$get(\"RNA\")$var$colnames() #> [1] \"soma_joinid\"    \"feature_id\"     \"feature_name\"   \"feature_length\" \"nnz\"            #> [6] \"n_measured_obs\" var_df <- census$get(\"census_data\")$get(\"homo_sapiens\")$ms$get(\"RNA\")$var$read(   value_filter = \"feature_id %in% c('ENSG00000161798', 'ENSG00000188229')\",   column_names = c(\"feature_name\", \"feature_length\") )  as.data.frame(var_df$concat()) #>   feature_name feature_length #> 1         AQP5           1884 #> 2       TUBB4B           2037"},{"path":"/articles/census_query_extract.html","id":"querying-expression-data-as-seurat","dir":"Articles","previous_headings":"","what":"Querying expression data as Seurat","title":"Querying and fetching the single-cell data and cell/gene metadata","text":"convenient way query fetch expression data use get_seurat method cellxgene.census API. method combines column selection value filtering described obtain slices expression data based metadata queries. method return Seurat object, takes input census object, string organism, cell gene metadata can specify filters column selection described following arguments: obs_column_names — character vector indicating columns select cell metadata. obs_value_filter — expression selection conditions fetch cells meeting criteria. var_column_names — character vector indicating columns select gene metadata. var_value_filter — expression selection conditions fetch genes meeting criteria. example want fetch expression data : Genes \"ENSG00000161798\" \"ENSG00000188229\". \"B cells\" \"lung\" \"COVID-19\". gene metadata adding sex cell metadata. 
full description refer ?cellxgene.census::get_seurat.","code":"library(\"Seurat\")  seurat_obj <- get_seurat(   census, \"Homo sapiens\",   obs_column_names = c(\"cell_type\", \"tissue_general\", \"disease\", \"sex\"),   var_value_filter = \"feature_id %in% c('ENSG00000161798', 'ENSG00000188229')\",   obs_value_filter = \"cell_type == 'B cell' & tissue_general == 'lung' & disease == 'COVID-19'\" ) seurat_obj #> An object of class Seurat  #> 2 features across 2729 samples within 1 assay  #> Active assay: RNA (2 features, 0 variable features) #>  2 layers present: counts, data head(seurat_obj[[]]) #>                 orig.ident nCount_RNA nFeature_RNA cell_type tissue_general  disease #> cell13391229 SeuratProject          0            0    B cell           lung COVID-19 #> cell13393737 SeuratProject          1            1    B cell           lung COVID-19 #> cell13394391 SeuratProject          0            0    B cell           lung COVID-19 #> cell13394897 SeuratProject          0            0    B cell           lung COVID-19 #> cell13395941 SeuratProject          0            0    B cell           lung COVID-19 #> cell13397408 SeuratProject          0            0    B cell           lung COVID-19 #>                  sex #> cell13391229    male #> cell13393737 unknown #> cell13394391    male #> cell13394897 unknown #> cell13395941    male #> cell13397408 unknown head(seurat_obj$RNA[[]]) #>                 feature_name feature_length      nnz n_measured_obs #> ENSG00000161798         AQP5           1884  1029069       58250439 #> ENSG00000188229       TUBB4B           2037 21416107       62655002"},{"path":"/articles/census_query_extract.html","id":"querying-expression-data-as-singlecellexperiment","dir":"Articles","previous_headings":"","what":"Querying expression data as SingleCellExperiment","title":"Querying and fetching the single-cell data and cell/gene metadata","text":"Similarly previous section, get_single_cell_experiment method cellxgene.census API. 
behaves exactly get_seurat returns SingleCellExperiment object. example, repeat query can simply following. full description refer ?cellxgene.census::get_single_cell_experiment.","code":"library(\"SingleCellExperiment\")  sce_obj <- get_single_cell_experiment(   census, \"Homo sapiens\",   obs_column_names = c(\"cell_type\", \"tissue_general\", \"disease\", \"sex\"),   var_value_filter = \"feature_id %in% c('ENSG00000161798', 'ENSG00000188229')\",   obs_value_filter = \"cell_type == 'B cell' & tissue_general == 'lung' & disease == 'COVID-19'\" ) sce_obj #> class: SingleCellExperiment  #> dim: 2 2729  #> metadata(0): #> assays(1): counts #> rownames(2): ENSG00000161798 ENSG00000188229 #> rowData names(4): feature_name feature_length nnz n_measured_obs #> colnames(2729): obs13391229 obs13393737 ... obs54635684 obs54635708 #> colData names(4): cell_type tissue_general disease sex #> reducedDimNames(0): #> mainExpName: RNA #> altExpNames(0): head(colData(sce_obj)) #> DataFrame with 6 rows and 4 columns #>               cell_type tissue_general     disease         sex #>                    #> obs13391229      B cell           lung    COVID-19        male #> obs13393737      B cell           lung    COVID-19     unknown #> obs13394391      B cell           lung    COVID-19        male #> obs13394897      B cell           lung    COVID-19     unknown #> obs13395941      B cell           lung    COVID-19        male #> obs13397408      B cell           lung    COVID-19     unknown head(rowData(sce_obj)) #> DataFrame with 2 rows and 4 columns #>                 feature_name feature_length       nnz n_measured_obs #>                                #> ENSG00000161798         AQP5           1884   1029069       58250439 #> ENSG00000188229       TUBB4B           2037  21416107       62655002"},{"path":"/articles/census_query_extract.html","id":"close-the-census","dir":"Articles","previous_headings":"Querying expression data as SingleCellExperiment","what":"Close the 
census","title":"Querying and fetching the single-cell data and cell/gene metadata","text":"use, census object closed release memory resources. also closes SOMA objects accessed via top-level census. Closing can automated using .exit(census$close(), add = TRUE) immediately census <- open_soma().","code":"census$close()"},{"path":"/articles/comp_bio_census_info.html","id":"opening-the-census","dir":"Articles","previous_headings":"","what":"Opening the Census","title":"Learning about the CZ CELLxGENE Census","text":"cellxgene.census R package contains convenient open_soma() API open version Census (stable default). can learn cellxgene.census methods accessing corresponding documentation, example ?cellxgene.census::open_soma.","code":"library(\"cellxgene.census\") census <- open_soma()"},{"path":"/articles/comp_bio_census_info.html","id":"census-organization","dir":"Articles","previous_headings":"","what":"Census organization","title":"Learning about the CZ CELLxGENE Census","text":"Census schema defines structure Census. short, can think Census structured collection items stores different pieces information. items parent collection SOMA objects various types can accessed TileDB-SOMA API (documentation). cellxgene.census package contains convenient wrappers TileDB-SOMA API. example function used open Census: cellxgene_census.open_soma().","code":""},{"path":"/articles/comp_bio_census_info.html","id":"main-census-components","dir":"Articles","previous_headings":"Census organization","what":"Main Census components","title":"Learning about the CZ CELLxGENE Census","text":"command created census, SOMACollection, R6 class providing key-value associative map. 
get() method can access two top-level collection members, census_info census_data, instances SOMACollection.","code":""},{"path":"/articles/comp_bio_census_info.html","id":"census-summary-info","dir":"Articles","previous_headings":"Census organization","what":"Census summary info","title":"Learning about the CZ CELLxGENE Census","text":"census$get(\"census_info\")$get(\"summary\"): data frame high-level information Census, e.g. build date, total cell count, etc. census$get(\"census_info\")$get(\"datasets\"): data frame datasets CELLxGENE Discover used create Census. census$get(\"census_info\")$get(\"summary_cell_counts\"): data frame cell counts stratified relevant cell metadata Census data Data organism stored independent SOMAExperiment objects specialized form SOMACollection. store data matrix (cell genes), cell metadata, gene metadata, useful components covered notebook. data organized one organism – Homo sapiens: census$get(\"census_data\")$get(\"homo_sapiens\")$obs: Cell metadata census$get(\"census_data\")$get(\"homo_sapiens\")$ms$get(\"RNA\"): Data matrices, currently … census$get(\"census_data\")$get(\"homo_sapiens\")$ms$get(\"RNA\")$X$get(\"raw\"): matrix raw counts SOMASparseNDArray census$get(\"census_data\")$get(\"homo_sapiens\")$ms$get(\"RNA\")$var: Gene Metadata","code":""},{"path":"/articles/comp_bio_census_info.html","id":"cell-metadata","dir":"Articles","previous_headings":"","what":"Cell metadata","title":"Learning about the CZ CELLxGENE Census","text":"can obtain cell metadata variables directly querying columns corresponding SOMADataFrame. variables can used querying Census case want work specific cells. variables defined CELLxGENE dataset schema except following: soma_joinid: SOMA-defined value use join operations. dataset_id: dataset id encoded census$get(\"census_info\")$get(\"datasets\"). 
tissue_general tissue_general_ontology_term_id: high-level tissue mapping.","code":"census$get(\"census_data\")$get(\"homo_sapiens\")$obs$colnames() #>  [1] \"soma_joinid\"                              #>  [2] \"dataset_id\"                               #>  [3] \"assay\"                                    #>  [4] \"assay_ontology_term_id\"                   #>  [5] \"cell_type\"                                #>  [6] \"cell_type_ontology_term_id\"               #>  [7] \"development_stage\"                        #>  [8] \"development_stage_ontology_term_id\"       #>  [9] \"disease\"                                  #> [10] \"disease_ontology_term_id\"                 #> [11] \"donor_id\"                                 #> [12] \"is_primary_data\"                          #> [13] \"self_reported_ethnicity\"                  #> [14] \"self_reported_ethnicity_ontology_term_id\" #> [15] \"sex\"                                      #> [16] \"sex_ontology_term_id\"                     #> [17] \"suspension_type\"                          #> [18] \"tissue\"                                   #> [19] \"tissue_ontology_term_id\"                  #> [20] \"tissue_general\"                           #> [21] \"tissue_general_ontology_term_id\"          #> [22] \"raw_sum\"                                  #> [23] \"nnz\"                                      #> [24] \"raw_mean_nnz\"                             #> [25] \"raw_variance_nnz\"                         #> [26] \"n_measured_vars\""},{"path":"/articles/comp_bio_census_info.html","id":"gene-metadata","dir":"Articles","previous_headings":"","what":"Gene metadata","title":"Learning about the CZ CELLxGENE Census","text":"Similarly, can obtain gene metadata variables directly querying columns corresponding SOMADataFrame. variables can use querying Census case specific genes interested . variables defined CELLxGENE dataset schema except following: soma_joinid: SOMA-defined value use join operations. 
feature_length: length base pairs gene.","code":"census$get(\"census_data\")$get(\"homo_sapiens\")$ms$get(\"RNA\")$var$colnames() #> [1] \"soma_joinid\"    \"feature_id\"     \"feature_name\"   \"feature_length\" \"nnz\"            #> [6] \"n_measured_obs\""},{"path":"/articles/comp_bio_census_info.html","id":"census-summary-content-tables","dir":"Articles","previous_headings":"","what":"Census summary content tables","title":"Learning about the CZ CELLxGENE Census","text":"can take quick look high-level Census information looking census$get(\"census_info\")$get(\"summary\"): special interest label-value combinations : total_cell_count total number cells Census. unique_cell_count number unique cells, cells may present twice due meta-analysis consortia-like data. number_donors_homo_sapiens number_donors_mus_musculus number individuals human mouse. guaranteed unique one individual ID may present identical different datasets.","code":"as.data.frame(census$get(\"census_info\")$get(\"summary\")$read()$concat()) #>   soma_joinid                      label      value #> 1           0      census_schema_version      1.2.0 #> 2           1          census_build_date 2023-10-23 #> 3           2     dataset_schema_version      3.1.0 #> 4           3           total_cell_count   68683222 #> 5           4          unique_cell_count   40356133 #> 6           5 number_donors_homo_sapiens      15588 #> 7           6 number_donors_mus_musculus       1990"},{"path":"/articles/comp_bio_census_info.html","id":"cell-counts-by-cell-metadata","dir":"Articles","previous_headings":"Census summary content tables","what":"Cell counts by cell metadata","title":"Learning about the CZ CELLxGENE Census","text":"looking census$get(\"census_info)$get(\"summary_cell_counts\") can get general idea cell counts stratified relevant cell metadata. cell metadata included table, can take look cell gene metadata available sections “Cell metadata” “Gene metadata”. 
line retrieves table casts R data frame: combination organism values category cell metadata can take look total_cell_count unique_cell_count cell counts combination. values category specified ontology_term_id label, value’s IDs labels, respectively.","code":"census_counts <- as.data.frame(census$get(\"census_info\")$get(\"summary_cell_counts\")$read()$concat()) head(census_counts) #>   soma_joinid     organism category ontology_term_id unique_cell_count total_cell_count #> 1           0 Homo sapiens      all               na          36227903         62998417 #> 2           1 Homo sapiens    assay      EFO:0008722            264166           279635 #> 3           2 Homo sapiens    assay      EFO:0008780             25652            51304 #> 4           3 Homo sapiens    assay      EFO:0008796             54753            54753 #> 5           4 Homo sapiens    assay      EFO:0008919             89477           206754 #> 6           5 Homo sapiens    assay      EFO:0008931             78750           188248 #>        label #> 1         na #> 2   Drop-seq #> 3     inDrop #> 4   MARS-seq #> 5   Seq-Well #> 6 Smart-seq2"},{"path":"/articles/comp_bio_census_info.html","id":"example-cell-metadata-included-in-the-summary-counts-table","dir":"Articles","previous_headings":"Census summary content tables > Cell counts by cell metadata","what":"Example: cell metadata included in the summary counts table","title":"Learning about the CZ CELLxGENE Census","text":"get available cell metadata summary counts table can following. 
Remember cell metadata available, variables omitted creation table.","code":"t(table(census_counts$organism, census_counts$category)) #>                           #>                           Homo sapiens Mus musculus #>   all                                1            1 #>   assay                             20           10 #>   cell_type                        631          248 #>   disease                           72            5 #>   self_reported_ethnicity           30            1 #>   sex                                3            3 #>   suspension_type                    1            1 #>   tissue                           230           74 #>   tissue_general                    53           27"},{"path":"/articles/comp_bio_census_info.html","id":"example-cell-counts-for-each-sequencing-assay-in-human-data","dir":"Articles","previous_headings":"Census summary content tables > Cell counts by cell metadata","what":"Example: cell counts for each sequencing assay in human data","title":"Learning about the CZ CELLxGENE Census","text":"get cell counts sequencing assay type human data, can perform following operations:","code":"human_assay_counts <- census_counts[census_counts$organism == \"Homo sapiens\" & census_counts$category == \"assay\", ] human_assay_counts <- human_assay_counts[order(human_assay_counts$total_cell_count, decreasing = TRUE), ]"},{"path":"/articles/comp_bio_census_info.html","id":"example-number-of-microglial-cells-in-the-census","dir":"Articles","previous_headings":"Census summary content tables > Cell counts by cell metadata","what":"Example: number of microglial cells in the Census","title":"Learning about the CZ CELLxGENE Census","text":"specific term categories shown can directly find number cells term.","code":"census_counts[census_counts$label == \"microglial cell\", ] #>      soma_joinid     organism  category ontology_term_id unique_cell_count #> 72            71 Homo sapiens cell_type       CL:0000129            359243 #> 1080      
  1079 Mus musculus cell_type       CL:0000129             48998 #>      total_cell_count           label #> 72             544977 microglial cell #> 1080            75885 microglial cell"},{"path":"/articles/comp_bio_census_info.html","id":"understanding-census-contents-beyond-the-summary-tables","dir":"Articles","previous_headings":"","what":"Understanding Census contents beyond the summary tables","title":"Learning about the CZ CELLxGENE Census","text":"using pre-computed tables census$get(\"census_info\") easy quick way understand contents Census, falls short want learn certain slices Census. example, may want learn : cell types available human liver? total number cells lung datasets stratified sequencing technology? sex distribution cells brain mouse? diseases available T cells? questions can answered directly querying cell metadata shown examples .","code":""},{"path":"/articles/comp_bio_census_info.html","id":"example-all-cell-types-available-in-human","dir":"Articles","previous_headings":"Understanding Census contents beyond the summary tables","what":"Example: all cell types available in human","title":"Learning about the CZ CELLxGENE Census","text":"exemplify process accessing slicing cell metadata summary stats, let’s start trivial example take look human cell types available Census: number rows total number cells humans. Now, wish get cell counts per cell type can work data frame. addition, focus cells marked is_primary_data=TRUE ensures de-duplicate cells appear CELLxGENE Discover. number unique cells. Now let’s look counts per cell type: shows abundant cell types “glutamatergic neuron”, “CD8-positive, alpha-beta T cell”, “CD4-positive, alpha-beta T cell”. Now let’s take look number unique cell types: total number different cell types human. information example can quickly obtained summary table census$get(\"census-info\")$get(\"summary_cell_counts\"). 
examples complex can achieved accessing cell metadata.","code":"obs_df <- census$get(\"census_data\")$get(\"homo_sapiens\")$obs$read(column_names = c(\"cell_type\", \"is_primary_data\")) as.data.frame(obs_df$concat()) #>                            cell_type is_primary_data #> 1                    oligodendrocyte           FALSE #> 2     oligodendrocyte precursor cell           FALSE #> 3   astrocyte of the cerebral cortex           FALSE #> 4   astrocyte of the cerebral cortex           FALSE #> 5   astrocyte of the cerebral cortex           FALSE #> 6     oligodendrocyte precursor cell           FALSE #> 7   astrocyte of the cerebral cortex           FALSE #> 8                    microglial cell           FALSE #> 9   astrocyte of the cerebral cortex           FALSE #> 10  astrocyte of the cerebral cortex           FALSE #> 11  astrocyte of the cerebral cortex           FALSE #> 12  astrocyte of the cerebral cortex           FALSE #> 13  astrocyte of the cerebral cortex           FALSE #> 14  astrocyte of the cerebral cortex           FALSE #> 15  astrocyte of the cerebral cortex           FALSE #> 16    oligodendrocyte precursor cell           FALSE #> 17                   oligodendrocyte           FALSE #> 18  astrocyte of the cerebral cortex           FALSE #> 19  astrocyte of the cerebral cortex           FALSE #> 20  astrocyte of the cerebral cortex           FALSE #> 21  astrocyte of the cerebral cortex           FALSE #> 22  astrocyte of the cerebral cortex           FALSE #> 23    oligodendrocyte precursor cell           FALSE #> 24  astrocyte of the cerebral cortex           FALSE #> 25  astrocyte of the cerebral cortex           FALSE #> 26    oligodendrocyte precursor cell           FALSE #> 27                   microglial cell           FALSE #> 28                   oligodendrocyte           FALSE #> 29  astrocyte of the cerebral cortex           FALSE #> 30  cerebral cortex endothelial cell           FALSE #> 31                   microglial cell       
    FALSE #> 32                   microglial cell           FALSE #> 33                   microglial cell           FALSE #> 34                   oligodendrocyte           FALSE #> 35                   oligodendrocyte           FALSE #> 36                   microglial cell           FALSE #> 37                   oligodendrocyte           FALSE #> 38                   oligodendrocyte           FALSE #> 39  astrocyte of the cerebral cortex           FALSE #> 40                   oligodendrocyte           FALSE #> 41  astrocyte of the cerebral cortex           FALSE #> 42                   oligodendrocyte           FALSE #> 43    oligodendrocyte precursor cell           FALSE #> 44                   oligodendrocyte           FALSE #> 45  astrocyte of the cerebral cortex           FALSE #> 46    oligodendrocyte precursor cell           FALSE #> 47                   oligodendrocyte           FALSE #> 48    oligodendrocyte precursor cell           FALSE #> 49  astrocyte of the cerebral cortex           FALSE #> 50  astrocyte of the cerebral cortex           FALSE #> 51  astrocyte of the cerebral cortex           FALSE #> 52                   oligodendrocyte           FALSE #> 53                   oligodendrocyte           FALSE #> 54                   oligodendrocyte           FALSE #> 55  astrocyte of the cerebral cortex           FALSE #> 56  cerebral cortex endothelial cell           FALSE #> 57                   oligodendrocyte           FALSE #> 58                   oligodendrocyte           FALSE #> 59                   oligodendrocyte           FALSE #> 60                   microglial cell           FALSE #> 61                   microglial cell           FALSE #> 62    oligodendrocyte precursor cell           FALSE #> 63    oligodendrocyte precursor cell           FALSE #> 64                   oligodendrocyte           FALSE #> 65    oligodendrocyte precursor cell           FALSE #> 66                   oligodendrocyte           FALSE #> 67  astrocyte of the 
cerebral cortex           FALSE #> 68                   oligodendrocyte           FALSE #> 69    oligodendrocyte precursor cell           FALSE #> 70                   oligodendrocyte           FALSE #> 71  astrocyte of the cerebral cortex           FALSE #> 72  astrocyte of the cerebral cortex           FALSE #> 73  astrocyte of the cerebral cortex           FALSE #> 74    oligodendrocyte precursor cell           FALSE #> 75  astrocyte of the cerebral cortex           FALSE #> 76    oligodendrocyte precursor cell           FALSE #> 77                   microglial cell           FALSE #> 78                   microglial cell           FALSE #> 79    oligodendrocyte precursor cell           FALSE #> 80                   oligodendrocyte           FALSE #> 81                   oligodendrocyte           FALSE #> 82  astrocyte of the cerebral cortex           FALSE #> 83                   oligodendrocyte           FALSE #> 84  astrocyte of the cerebral cortex           FALSE #> 85  astrocyte of the cerebral cortex           FALSE #> 86                   oligodendrocyte           FALSE #> 87  astrocyte of the cerebral cortex           FALSE #> 88                   oligodendrocyte           FALSE #> 89    oligodendrocyte precursor cell           FALSE #> 90    oligodendrocyte precursor cell           FALSE #> 91  astrocyte of the cerebral cortex           FALSE #> 92  astrocyte of the cerebral cortex           FALSE #> 93  astrocyte of the cerebral cortex           FALSE #> 94                   oligodendrocyte           FALSE #> 95  astrocyte of the cerebral cortex           FALSE #> 96  astrocyte of the cerebral cortex           FALSE #> 97                   oligodendrocyte           FALSE #> 98                   oligodendrocyte           FALSE #> 99    oligodendrocyte precursor cell           FALSE #> 100                  oligodendrocyte           FALSE #> 101                  oligodendrocyte           FALSE #> 102                  oligodendrocyte           FALSE #> 103 
astrocyte of the cerebral cortex           FALSE #> 104   oligodendrocyte precursor cell           FALSE #> 105                  oligodendrocyte           FALSE #> 106   oligodendrocyte precursor cell           FALSE #> 107                  oligodendrocyte           FALSE #> 108                  oligodendrocyte           FALSE #> 109                  oligodendrocyte           FALSE #> 110                  oligodendrocyte           FALSE #> 111   oligodendrocyte precursor cell           FALSE #> 112                  oligodendrocyte           FALSE #> 113                  oligodendrocyte           FALSE #> 114 astrocyte of the cerebral cortex           FALSE #> 115                  oligodendrocyte           FALSE #> 116 astrocyte of the cerebral cortex           FALSE #> 117                  oligodendrocyte           FALSE #> 118                  oligodendrocyte           FALSE #> 119                  oligodendrocyte           FALSE #> 120 astrocyte of the cerebral cortex           FALSE #> 121 astrocyte of the cerebral cortex           FALSE #> 122   oligodendrocyte precursor cell           FALSE #> 123                  microglial cell           FALSE #> 124 astrocyte of the cerebral cortex           FALSE #> 125 astrocyte of the cerebral cortex           FALSE #> 126                  microglial cell           FALSE #> 127 cerebral cortex endothelial cell           FALSE #> 128   oligodendrocyte precursor cell           FALSE #>  [ reached 'max' / getOption(\"max.print\") -- omitted 62998289 rows ] obs_df <- census$get(\"census_data\")$get(\"homo_sapiens\")$obs$read(   column_names = \"cell_type\",   value_filter = \"is_primary_data == TRUE\" )  obs_df <- as.data.frame(obs_df$concat()) nrow(obs_df) #> [1] 36227903 human_cell_type_counts <- table(obs_df$cell_type) sort(human_cell_type_counts, decreasing = TRUE)[1:10] #>  #>                                                             neuron  #>                                                            2815336  #>     
                                          glutamatergic neuron  #>                                                            1563446  #>                                    CD4-positive, alpha-beta T cell  #>                                                            1243885  #>                                    CD8-positive, alpha-beta T cell  #>                                                            1197715  #> L2/3-6 intratelencephalic projecting glutamatergic cortical neuron  #>                                                            1123360  #>                                                    oligodendrocyte  #>                                                            1063874  #>                                                 classical monocyte  #>                                                            1030996  #>                                                        native cell  #>                                                            1011949  #>                                                             B cell  #>                                                             934060  #>                                                natural killer cell  #>                                                             770637 length(human_cell_type_counts) #> [1] 610"},{"path":"/articles/comp_bio_census_info.html","id":"example-cell-types-available-in-human-liver","dir":"Articles","previous_headings":"Understanding Census contents beyond the summary tables","what":"Example: cell types available in human liver","title":"Learning about the CZ CELLxGENE Census","text":"Similar example , can learn cell types available specific tissue, e.g. liver. achieve goal just need limit cell metadata tissue. use information cell metadata variable tissue_general. 
variable contains high-level tissue label cells Census: cell types cell counts human liver.","code":"obs_liver_df <- census$get(\"census_data\")$get(\"homo_sapiens\")$obs$read(   column_names = \"cell_type\",   value_filter = \"is_primary_data == TRUE & tissue_general == 'liver'\" )  obs_liver_df <- as.data.frame(obs_liver_df$concat())  sort(table(obs_liver_df$cell_type), decreasing = TRUE)[1:10] #>  #>                          T cell                     hepatoblast  #>                           85739                           58447  #>                 neoplastic cell                    erythroblast  #>                           52431                           45605  #>                        monocyte                      hepatocyte  #>                           31388                           28309  #>             natural killer cell    periportal region hepatocyte  #>                           26871                           23509  #>                      macrophage centrilobular region hepatocyte  #>                           16707                           15819"},{"path":"/articles/comp_bio_census_info.html","id":"example-diseased-t-cells-in-human-tissues","dir":"Articles","previous_headings":"Understanding Census contents beyond the summary tables","what":"Example: diseased T cells in human tissues","title":"Learning about the CZ CELLxGENE Census","text":"example going get counts diseased cells annotated T cells. 
sake example focus “CD8-positive, alpha-beta T cell” “CD4-positive, alpha-beta T cell”: cell counts annotated indicated disease across human tissues “CD8-positive, alpha-beta T cell” “CD4-positive, alpha-beta T cell”.","code":"obs_t_cells_df <- census$get(\"census_data\")$get(\"homo_sapiens\")$obs$read(   column_names = c(\"disease\", \"tissue_general\"),   value_filter = \"is_primary_data == TRUE & disease != 'normal' & cell_type %in% c('CD8-positive, alpha-beta T cell', 'CD4-positive, alpha-beta T cell')\" )  obs_t_cells_df <- as.data.frame(obs_t_cells_df$concat())  print(table(obs_t_cells_df)) #>                                        tissue_general #> disease                                 adrenal gland  blood bone marrow  brain breast #>   COVID-19                                          0 819428           0      0      0 #>   Crohn disease                                     0      0           0      0      0 #>   Down syndrome                                     0      0         181      0      0 #>   breast cancer                                     0      0           0      0   1850 #>   chronic obstructive pulmonary disease             0      0           0      0      0 #>   chronic rhinitis                                  0      0           0      0      0 #>   clear cell renal carcinoma                        0   6548           0      0      0 #>   cystic fibrosis                                   0      0           0      0      0 #>   follicular lymphoma                               0      0           0      0      0 #>   influenza                                         0   8871           0      0      0 #>   interstitial lung disease                         0      0           0      0      0 #>   kidney benign neoplasm                            0      0           0      0      0 #>   kidney oncocytoma                                 0      0           0      0      0 #>   lung adenocarcinoma                             205      0           0   
3274      0 #>   lung large cell carcinoma                         0      0           0      0      0 #>   lymphangioleiomyomatosis                          0      0           0      0      0 #>                                        tissue_general #> disease                                  colon kidney  liver   lung lymph node   nose #>   COVID-19                                   0      0      0  30578          0     13 #>   Crohn disease                          17490      0      0      0          0      0 #>   Down syndrome                              0      0      0      0          0      0 #>   breast cancer                              0      0      0      0          0      0 #>   chronic obstructive pulmonary disease      0      0      0   9382          0      0 #>   chronic rhinitis                           0      0      0      0          0    909 #>   clear cell renal carcinoma                 0  20540      0      0         36      0 #>   cystic fibrosis                            0      0      0      7          0      0 #>   follicular lymphoma                        0      0      0      0       1089      0 #>   influenza                                  0      0      0      0          0      0 #>   interstitial lung disease                  0      0      0   1803          0      0 #>   kidney benign neoplasm                     0     10      0      0          0      0 #>   kidney oncocytoma                          0   2303      0      0          0      0 #>   lung adenocarcinoma                        0      0    507 215013      24969      0 #>   lung large cell carcinoma                  0      0      0   5922          0      0 #>   lymphangioleiomyomatosis                   0      0      0    513          0      0 #>                                        tissue_general #> disease                                 pleural fluid respiratory system saliva #>   COVID-19                                          0                  4     41 #>   Crohn 
disease                                     0                  0      0 #>   Down syndrome                                     0                  0      0 #>   breast cancer                                     0                  0      0 #>   chronic obstructive pulmonary disease             0                  0      0 #>   chronic rhinitis                                  0                  0      0 #>   clear cell renal carcinoma                        0                  0      0 #>   cystic fibrosis                                   0                  0      0 #>   follicular lymphoma                               0                  0      0 #>   influenza                                         0                  0      0 #>   interstitial lung disease                         0                  0      0 #>   kidney benign neoplasm                            0                  0      0 #>   kidney oncocytoma                                 0                  0      0 #>   lung adenocarcinoma                           11558                  0      0 #>   lung large cell carcinoma                         0                  0      0 #>   lymphangioleiomyomatosis                          0                  0      0 #>                                        tissue_general #> disease                                 small intestine vasculature #>   COVID-19                                            0           0 #>   Crohn disease                                   52029           0 #>   Down syndrome                                       0           0 #>   breast cancer                                       0           0 #>   chronic obstructive pulmonary disease               0           0 #>   chronic rhinitis                                    0           0 #>   clear cell renal carcinoma                          0           0 #>   cystic fibrosis                                     0           0 #>   follicular lymphoma                                 0           
0 #>   influenza                                           0           0 #>   interstitial lung disease                           0           0 #>   kidney benign neoplasm                              0           0 #>   kidney oncocytoma                                   0           0 #>   lung adenocarcinoma                                 0           0 #>   lung large cell carcinoma                           0           0 #>   lymphangioleiomyomatosis                            0           0 #>  [ reached getOption(\"max.print\") -- omitted 8 rows ]"},{"path":"/articles/comp_bio_data_integration.html","id":"finding-and-fetching-data-from-mouse-liver-10x-genomics-and-smart-seq2","dir":"Articles","previous_headings":"","what":"Finding and fetching data from mouse liver (10X Genomics and Smart-Seq2)","title":"Integrating multi-dataset slices of data with Seurat","text":"Let’s load packages needed notebook. Now can open Census. notebook use Tabula Muris Senis data liver contains cells 10X Genomics Smart-Seq2 technologies. Let’s query datasets table Census filtering collection_name “Tabula Muris Senis” dataset_title “liver”. Now can use values dataset_id query load Seurat object cells datasets. 
can check cell counts 10X Genomics Smart-Seq2 data looking assay metadata.","code":"library(\"cellxgene.census\") library(\"Seurat\") census <- open_soma() census_datasets <- census$get(\"census_info\")$get(\"datasets\") census_datasets <- census_datasets$read(value_filter = \"collection_name == 'Tabula Muris Senis'\") census_datasets <- as.data.frame(census_datasets$concat())  # Print rows with liver data census_datasets[grep(\"Liver\", census_datasets$dataset_title), ] #>    soma_joinid                        collection_id    collection_name #> 15         583 0b9d8a04-bb9d-44da-aa27-705bb65b54eb Tabula Muris Senis #> 36         605 0b9d8a04-bb9d-44da-aa27-705bb65b54eb Tabula Muris Senis #>               collection_doi                           dataset_id #> 15 10.1038/s41586-020-2496-1 4546e757-34d0-4d17-be06-538318925fcd #> 36 10.1038/s41586-020-2496-1 6202a243-b713-4e12-9ced-c387f8483dea #>                      dataset_version_id #> 15 0a851e26-a629-4e59-9b52-9b4d1ce4440b #> 36 70f4f091-86a9-44e3-a92a-54cee98cc223 #>                                                                                        dataset_title #> 15 Liver - A single-cell transcriptomic atlas characterizes ageing tissues in the mouse - Smart-seq2 #> 36        Liver - A single-cell transcriptomic atlas characterizes ageing tissues in the mouse - 10x #>                            dataset_h5ad_path dataset_total_cell_count #> 15 4546e757-34d0-4d17-be06-538318925fcd.h5ad                     2859 #> 36 6202a243-b713-4e12-9ced-c387f8483dea.h5ad                     7294 tabula_muris_liver_ids <- c(\"4546e757-34d0-4d17-be06-538318925fcd\", \"6202a243-b713-4e12-9ced-c387f8483dea\")  seurat_obj <- get_seurat(   census,   organism = \"Mus musculus\",   obs_value_filter = \"dataset_id %in% tabula_muris_liver_ids\" ) table(seurat_obj$assay) #>  #>  10x 3' v2 Smart-seq2  #>       7294       
2859"},{"path":"/articles/comp_bio_data_integration.html","id":"gene-length-normalization-of-smart-seq2-data-","dir":"Articles","previous_headings":"","what":"Gene-length normalization of Smart-Seq2 data.","title":"Integrating multi-dataset slices of data with Seurat","text":"Smart-seq2 read counts normalized gene length. Let’s first get gene lengths var.feature_length. Now can use normalize Smart-seq data. let’s split object assay. normalize Smart-seq slice using gene lengths merge back single object.","code":"smart_seq_gene_lengths <- seurat_obj$RNA[[]]$feature_length seurat_obj.list <- SplitObject(seurat_obj, split.by = \"assay\") seurat_obj.list[[\"Smart-seq2\"]][[\"RNA\"]]@counts <- seurat_obj.list[[\"Smart-seq2\"]][[\"RNA\"]]@counts / smart_seq_gene_lengths seurat_obj <- merge(seurat_obj.list[[1]], seurat_obj.list[[2]])"},{"path":"/articles/comp_bio_data_integration.html","id":"integration-with-seurat","dir":"Articles","previous_headings":"","what":"Integration with Seurat","title":"Integrating multi-dataset slices of data with Seurat","text":"use native integration capabilities Seurat. comprehensive usage best practices Seurat integration please refer doc site Seurat.","code":""},{"path":"/articles/comp_bio_data_integration.html","id":"inspecting-data-prior-to-integration","dir":"Articles","previous_headings":"Integration with Seurat","what":"Inspecting data prior to integration","title":"Integrating multi-dataset slices of data with Seurat","text":"Let’s take look strength batch effects data. perform embedding visualization via UMAP. Let’s basic data normalization variable gene selection now perform PCA UMAP   can see batch effects strong cells cluster primarily assay cell_type. 
Properly integrated embedding principle cluster primarily cell_type, assay best randomly distributed.","code":"seurat_obj <- SCTransform(seurat_obj) seurat_obj <- FindVariableFeatures(seurat_obj, selection.method = \"vst\", nfeatures = 2000) seurat_obj <- RunPCA(seurat_obj, features = VariableFeatures(object = seurat_obj)) seurat_obj <- RunUMAP(seurat_obj, dims = 1:30) # By assay p1 <- DimPlot(seurat_obj, reduction = \"umap\", group.by = \"assay\") p1 # By cell type p2 <- DimPlot(seurat_obj, reduction = \"umap\", group.by = \"cell_type\") p2"},{"path":"/articles/comp_bio_data_integration.html","id":"data-integration-with-seurat","dir":"Articles","previous_headings":"Integration with Seurat","what":"Data integration with Seurat","title":"Integrating multi-dataset slices of data with Seurat","text":"Whenever query fetch Census data multiple datasets integration needs performed evidenced batch effects observed. parameters Seurat used notebook selected model run quickly. best practices integration single-cell data using Seurat please refer documentation page. seurat_d reading article integrated cell atlas human lung health disease Sikkema et al. performed integration 43 datasets Lung. focus metadata Census can batch information integration.","code":""},{"path":"/articles/comp_bio_data_integration.html","id":"integration-across-datasets-using-dataset_id","dir":"Articles","previous_headings":"Integration with Seurat > Data integration with Seurat","what":"Integration across datasets using dataset_id","title":"Integrating multi-dataset slices of data with Seurat","text":"cells Census annotated dataset come \"dataset_id\". great place start integration. let’s run Seurat integration pipeline. First define model batch set dataset_id. First normalize select variable genes separated batch key dataset_id Now perform integration. Let’s inspect results normalization UMAP visualization. plot UMAP.   
Great! can see clustering longer mainly driven assay, albeit still contributing .","code":"# split the dataset into a list of two seurat objects for each dataset seurat_obj.list <- SplitObject(seurat_obj, split.by = \"dataset_id\")  # normalize each dataset independently seurat_obj.list <- lapply(X = seurat_obj.list, FUN = function(x) {   x <- SCTransform(x) })  # select features for integration features <- SelectIntegrationFeatures(object.list = seurat_obj.list) seurat_obj.list <- PrepSCTIntegration(seurat_obj.list, anchor.features = features) seurat_obj.anchors <- FindIntegrationAnchors(object.list = seurat_obj.list, anchor.features = features, normalization.method = \"SCT\") seurat_obj.combined <- IntegrateData(anchorset = seurat_obj.anchors, normalization.method = \"SCT\") DefaultAssay(seurat_obj.combined) <- \"integrated\"  # Run the standard workflow for visualization and clustering seurat_obj.combined <- ScaleData(seurat_obj.combined, verbose = FALSE) seurat_obj.combined <- RunPCA(seurat_obj.combined, npcs = 30, verbose = FALSE) seurat_obj.combined <- RunUMAP(seurat_obj.combined, reduction = \"pca\", dims = 1:30) # By assay p1 <- DimPlot(seurat_obj.combined, reduction = \"umap\", group.by = \"assay\") p1 # By cell type p2 <- DimPlot(seurat_obj.combined, reduction = \"umap\", group.by = \"cell_type\") p2"},{"path":"/articles/comp_bio_data_integration.html","id":"integration-across-datasets-using-dataset_id-and-controlling-for-batch-using-donor_id","dir":"Articles","previous_headings":"Integration with Seurat > Data integration with Seurat","what":"Integration across datasets using dataset_id and controlling for batch using donor_id","title":"Integrating multi-dataset slices of data with Seurat","text":"Similar dataset_id, cells Census annotated donor_id. definition donor_id depends dataset left discretion data curators. However still rich information can used batch variable integration. 
donor_id guaranteed unique across cells Census, strongly recommend concatenating dataset_id donor_id use batch separator Seurat Now perform integration. inspect new results UMAP. Plot UMAP.   can see using dataset_id donor_id batch cells now mostly cluster cell type.","code":"# split the dataset into a list of two seurat objects for each dataset seurat_obj.list <- SplitObject(seurat_obj, split.by = \"dataset_id\")  # normalize each dataset independently controlling for batch seurat_obj.list <- lapply(X = seurat_obj.list, FUN = function(x) {   x <- SCTransform(x, vars.to.regress = \"donor_id\") })  # select features for integration features <- SelectIntegrationFeatures(object.list = seurat_obj.list) seurat_obj.list <- PrepSCTIntegration(seurat_obj.list, anchor.features = features) seurat_obj.anchors <- FindIntegrationAnchors(object.list = seurat_obj.list, anchor.features = features, normalization.method = \"SCT\") #> Finding all pairwise anchors #> Running CCA #> Merging objects #> Finding neighborhoods #> Finding anchors #>  Found 7161 anchors #> Filtering anchors #>  Retained 4990 anchors seurat_obj.combined <- IntegrateData(anchorset = seurat_obj.anchors, normalization.method = \"SCT\") #> [1] 1 #> Warning: Different cells and/or features from existing assay SCT #> [1] 2 #> Warning: Different cells and/or features from existing assay SCT #> Merging dataset 1 into 2 #> Extracting anchors for merged samples #> Finding integration vectors #> Finding integration vector weights #> Integrating data #> Warning: Assay integrated changing from Assay to SCTAssay  #> Warning: Different cells and/or features from existing assay SCT DefaultAssay(seurat_obj.combined) <- \"integrated\"  # Run the standard workflow for visualization and clustering seurat_obj.combined <- RunPCA(seurat_obj.combined, npcs = 30, verbose = FALSE) seurat_obj.combined <- RunUMAP(seurat_obj.combined, reduction = \"pca\", dims = 1:30) #> 17:51:31 UMAP embedding parameters a = 0.9922 b = 1.112 #> 17:51:31 
Read 10153 rows and found 30 numeric columns #> 17:51:31 Using Annoy for neighbor search, n_neighbors = 30 #> 17:51:31 Building Annoy index with metric = cosine, n_trees = 50 #> 0%   10   20   30   40   50   60   70   80   90   100% #> [----|----|----|----|----|----|----|----|----|----| #> **************************************************| #> 17:51:33 Writing NN index file to temp file /tmp/RtmpHvsscV/file11b137946e7a4 #> 17:51:33 Searching Annoy index using 1 thread, search_k = 3000 #> 17:51:37 Annoy recall = 100% #> 17:51:38 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30 #> 17:51:39 Initializing from normalized Laplacian + noise (using RSpectra) #> 17:51:39 Commencing optimization for 200 epochs, with 409958 positive edges #> 17:51:44 Optimization finished # By assay p1 <- DimPlot(seurat_obj.combined, reduction = \"umap\", group.by = \"assay\") p1 # By cell type p2 <- DimPlot(seurat_obj.combined, reduction = \"umap\", group.by = \"cell_type\") p2"},{"path":"/articles/comp_bio_data_integration.html","id":"integration-across-datasets-using-dataset_id-and-controlling-for-batch-using-donor_id-assay_ontology_term_id-suspension_type-","dir":"Articles","previous_headings":"Integration with Seurat > Data integration with Seurat","what":"Integration across datasets using dataset_id and controlling for batch using donor_id + assay_ontology_term_id + suspension_type.","title":"Integrating multi-dataset slices of data with Seurat","text":"cases one dataset may contain multiple assay types /multiple suspension types (cell vs nucleus), important consider metadata batches. Therefore, comprehensive definition batch Census can accomplished combining cell metadata dataset_id, donor_id, assay_ontology_term_id suspension_type, latter encode EFO ids assay types. example, two datasets used contain cells one assay , one suspension type . Thus make difference include metadata part batch. implementation look line","code":"# EXAMPLE, DON'T RUN.  
# split the dataset into a list of seurat objects for each dataset seurat_obj.list <- SplitObject(seurat_obj, split.by = \"dataset_id\")  # normalize each dataset independently controlling for batch seurat_obj.list <- lapply(X = seurat_obj.list, FUN = function(x) {   x <- SCTransform(x, vars.to.regress = c(\"donor_id\", \"assay_ontology_term_id\", \"suspension_type\")) })  # select features for integration features <- SelectIntegrationFeatures(object.list = seurat_obj.list)  # integrate seurat_obj.list <- PrepSCTIntegration(seurat_obj.list, anchor.features = features) seurat_obj.anchors <- FindIntegrationAnchors(object.list = seurat_obj.list, anchor.features = features, normalization.method = \"SCT\") seurat_obj.combined <- IntegrateData(anchorset = seurat_obj.anchors, normalization.method = \"SCT\")"},{"path":"/articles/comp_bio_normalizing_full_gene_sequencing.html","id":"opening-the-census","dir":"Articles","previous_headings":"","what":"Opening the census","title":"Normalizing full-length gene sequencing data","text":"First open Census: can learn cellxgene.census methods accessing corresponding documentation, example ?cellxgene.census::open_soma.","code":"library(\"Seurat\") census <- cellxgene.census::open_soma()"},{"path":"/articles/comp_bio_normalizing_full_gene_sequencing.html","id":"fetching-full-length-example-sequencing-data-smart-seq","dir":"Articles","previous_headings":"","what":"Fetching full-length example sequencing data (Smart-Seq)","title":"Normalizing full-length gene sequencing data","text":"Let’s get example data, case ’ll fetch cells relatively small dataset derived Smart-Seq2 technology performs full-length gene sequencing: Collection: Tabula Muris Senis Dataset: Liver - single-cell transcriptomic atlas characterizes ageing tissues mouse - Smart-seq2 Let’s first find dataset’s id using dataset table Census. Now can use id fetch data. Let’s make sure data contains Smart-Seq2 cells. Great! can see small dataset containing 2,859 cells. 
Now let’s proceed normalize gene lengths.","code":"liver_dataset <- as.data.frame(   census$get(\"census_info\")$get(\"datasets\")   $read(value_filter = \"dataset_title == 'Liver - A single-cell transcriptomic atlas characterizes ageing tissues in the mouse - Smart-seq2'\")   $concat() ) liver_dataset #>   soma_joinid                        collection_id    collection_name #> 1         583 0b9d8a04-bb9d-44da-aa27-705bb65b54eb Tabula Muris Senis #>              collection_doi                           dataset_id #> 1 10.1038/s41586-020-2496-1 4546e757-34d0-4d17-be06-538318925fcd #>                     dataset_version_id #> 1 0a851e26-a629-4e59-9b52-9b4d1ce4440b #>                                                                                       dataset_title #> 1 Liver - A single-cell transcriptomic atlas characterizes ageing tissues in the mouse - Smart-seq2 #>                           dataset_h5ad_path dataset_total_cell_count #> 1 4546e757-34d0-4d17-be06-538318925fcd.h5ad                     2859 liver_dataset_id <- liver_dataset[1, \"dataset_id\"] liver_seurat <- cellxgene.census::get_seurat(   census,   organism = \"Mus musculus\",   obs_value_filter = paste0(\"dataset_id == '\", liver_dataset_id, \"'\") ) table(liver_seurat$assay) #>  #> Smart-seq2  #>       2859"},{"path":"/articles/comp_bio_normalizing_full_gene_sequencing.html","id":"normalizing-expression-to-account-for-gene-length","dir":"Articles","previous_headings":"","what":"Normalizing expression to account for gene length","title":"Normalizing full-length gene sequencing data","text":"default cellxgene.census::get_seurat() fetches genes Census. let’s first identify genes measured dataset subset Seurat object include . goal can use “Dataset Presence Matrix” census$get(\"census_data\")$get(\"mus_musculus\")$ms$get(\"RNA\")$get(\"feature_dataset_presence_matrix\"). boolean matrix N x M N number datasets, M number genes Census, 1 entry indicates gene measured dataset. 
(Note Seurat objects transposed layout M x N.) Let’s get genes measured dataset. can see genes Census 17,992 measured dataset. Now let’s normalize genes gene length. can easily Census gene lengths included gene metadata feature_length. done! can now see real numbers instead integers.","code":"liver_seurat #> An object of class Seurat  #> 52417 features across 2859 samples within 1 assay  #> Active assay: RNA (52417 features, 0 variable features) #>  2 layers present: counts, data liver_dataset_joinid <- liver_dataset$soma_joinid[1] presence_matrix <- cellxgene.census::get_presence_matrix(census, \"Mus musculus\", \"RNA\") presence_matrix <- presence_matrix$take(liver_dataset_joinid) gene_presence <- as.vector(presence_matrix$get_one_based_matrix())  liver_seurat <- liver_seurat[gene_presence, ] liver_seurat #> An object of class Seurat  #> 17992 features across 2859 samples within 1 assay  #> Active assay: RNA (17992 features, 0 variable features) #>  2 layers present: counts, data GetAssayData(liver_seurat[1:5, 1:5], slot = \"data\") #> Warning: The `slot` argument of `GetAssayData()` is deprecated as of SeuratObject 5.0.0. #> i Please use the `layer` argument instead. #> This warning is displayed once every 8 hours. #> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated. #> 5 x 5 sparse Matrix of class \"dgCMatrix\" #>                    cell3959639 cell3959640 cell3959641 cell3959642 cell3959643 #> ENSMUSG00000025900           .           .           .           .           . #> ENSMUSG00000025902           .           .           .           .        2250 #> ENSMUSG00000033845           .         559        1969           .           . #> ENSMUSG00000025903           .           .           .           .           . #> ENSMUSG00000033813           .           .         
828           1          54 gene_lengths <- liver_seurat$RNA@meta.features$feature_length liver_seurat <- SetAssayData(   liver_seurat,   new.data = sweep(GetAssayData(liver_seurat, slot = \"data\"), 1, gene_lengths, \"/\") ) GetAssayData(liver_seurat[1:5, 1:5], slot = \"data\") #> 5 x 5 sparse Matrix of class \"dgCMatrix\" #>                    cell3959639 cell3959640 cell3959641  cell3959642 cell3959643 #> ENSMUSG00000025900           .  .            .         .             .          #> ENSMUSG00000025902           .  .            .         .             0.47150042 #> ENSMUSG00000033845           .  0.06586544   0.2320019 .             .          #> ENSMUSG00000025903           .  .            .         .             .          #> ENSMUSG00000033813           .  .            0.2744448 0.0003314551  0.01789857"},{"path":"/articles/comp_bio_normalizing_full_gene_sequencing.html","id":"validation-through-clustering-exploration","dir":"Articles","previous_headings":"","what":"Validation through clustering exploration","title":"Normalizing full-length gene sequencing data","text":"Let’s perform basic clustering analysis see cell types cluster expected using normalized counts. First basic filtering cells genes. normalize account sequencing depth transform data log scale. subset highly variable genes. finally scale values across gene axis. Now can proceed clustering analysis.  exceptions can see cells cell type cluster near serves sanity check gene-length normalization applied. 
Don’t forget close census.","code":"cells_per_gene <- rowSums(GetAssayData(liver_seurat, slot = \"counts\") > 0) genes_per_cell <- Matrix::colSums(liver_seurat$RNA@counts > 0) liver_seurat <- liver_seurat[cells_per_gene >= 5, genes_per_cell >= 500] liver_seurat <- Seurat::NormalizeData(   liver_seurat,   normalization.method = \"LogNormalize\",   scale.factor = 10000 ) liver_seurat <- Seurat::FindVariableFeatures(   liver_seurat,   selection.method = \"vst\",   nfeatures = 1000 ) all.genes <- rownames(liver_seurat) liver_seurat <- Seurat::ScaleData(liver_seurat, features = all.genes) liver_seurat <- RunPCA(   liver_seurat,   features = VariableFeatures(object = liver_seurat) ) liver_seurat <- FindNeighbors(liver_seurat, dims = 1:40) liver_seurat <- RunUMAP(liver_seurat, dims = 1:40) DimPlot(liver_seurat, reduction = \"umap\", group.by = \"cell_type\") census$close()"},{"path":"/articles/comp_bio_summarize_axis_query.html","id":"opening-the-census","dir":"Articles","previous_headings":"","what":"Opening the Census","title":"Summarizing cell and gene metadata","text":"cellxgene.census R package contains convenient API open version Census (default, newest stable version). open Census, close census$close(). can automated using .exit(census$close(), add = TRUE) immediately census <- open_soma(). can learn cellxgene.census methods accessing corresponding documentation. example ?cellxgene.census::open_soma.","code":"library(\"cellxgene.census\") census <- open_soma()"},{"path":"/articles/comp_bio_summarize_axis_query.html","id":"summarizing-cell-metadata","dir":"Articles","previous_headings":"","what":"Summarizing cell metadata","title":"Summarizing cell and gene metadata","text":"Census open can use TileDB-SOMA methods SOMACollection. can thus access metadata SOMADataFrame objects encoding cell gene metadata. Tips: can read entire SOMADataFrame R using .data.frame(soma_df$read()$concat()). Queries much faster request DataFrame columns required analysis (e.g. 
column_names = c(\"soma_joinid\", \"cell_type_ontology_term_id\")). can also refine query results using value_filter, filter census matching records.","code":""},{"path":"/articles/comp_bio_summarize_axis_query.html","id":"example-summarize-all-cell-types","dir":"Articles","previous_headings":"Summarizing cell metadata","what":"Example: Summarize all cell types","title":"Summarizing cell and gene metadata","text":"example reads cell metadata (obs) R data frame summarize variety ways.","code":"human <- census$get(\"census_data\")$get(\"homo_sapiens\")  # Read obs into an R data frame (tibble). obs_df <- human$obs$read(column_names = c(\"cell_type\")) obs_df <- as.data.frame(obs_df$concat())  # Find all unique values in the cell_type column. unique_cell_type <- unique(obs_df$cell_type)  cat(   \"There are\",   length(unique_cell_type),   \"cell types in the Census! The first few are: \",   paste(head(unique_cell_type), collapse = \", \") ) #> There are 631 cell types in the Census! The first few are:  oligodendrocyte, oligodendrocyte precursor cell, astrocyte of the cerebral cortex, microglial cell, cerebral cortex endothelial cell, vascular leptomeningeal cell"},{"path":"/articles/comp_bio_summarize_axis_query.html","id":"example-summarize-a-subset-of-cell-types-selected-with-a-value_filter","dir":"Articles","previous_headings":"Summarizing cell metadata","what":"Example: Summarize a subset of cell types, selected with a value_filter","title":"Summarizing cell and gene metadata","text":"example utilizes SOMA “value filter” read subset cells tissue_ontology_term_id equal UBERON:0002048 (lung tissue), summarizes query result. can also define much complex value filters. 
example: combine terms & | use %% operator query multiple values","code":"# Read cell_type terms for cells which have a specific tissue term LUNG_TISSUE <- \"UBERON:0002048\"  obs_df <- human$obs$read(column_names = c(\"cell_type\"), value_filter = paste0(\"tissue_ontology_term_id == '\", LUNG_TISSUE, \"'\")) obs_df <- as.data.frame(obs_df$concat())  # Find all unique values in the cell_type column as an R data frame. unique_cell_type <- unique(obs_df$cell_type) cat(   \"There are \",   length(unique_cell_type),   \" cell types in the Census where tissue_ontology_term_id == \",   LUNG_TISSUE,   \"!\\nThe first few are:\",   paste(head(unique_cell_type), collapse = \", \"),   \"\\n\" ) #> There are  185  cell types in the Census where tissue_ontology_term_id ==  UBERON:0002048 ! #> The first few are: type II pneumocyte, neutrophil, effector CD4-positive, alpha-beta T cell, effector CD8-positive, alpha-beta T cell, mature NK T cell, blood vessel endothelial cell  # Report the 10 most common top_10 <- sort(table(obs_df$cell_type), decreasing = TRUE)[1:10] cat(   \"The top 10 cell types where tissue_ontology_term_id ==\",   LUNG_TISSUE,   \"are: \",   paste(names(top_10), collapse = \", \") ) #> The top 10 cell types where tissue_ontology_term_id == UBERON:0002048 are:  native cell, alveolar macrophage, CD8-positive, alpha-beta T cell, CD4-positive, alpha-beta T cell, macrophage, type II pneumocyte, classical monocyte, natural killer cell, malignant cell, epithelial cell of lower respiratory tract # You can also do more complex queries, such as testing for inclusion in a list of values obs_df <- human$obs$read(   column_names = c(\"cell_type_ontology_term_id\"),   value_filter = \"tissue_ontology_term_id %in% c('UBERON:0002082', 'UBERON:OOO2084', 'UBERON:0002080')\" )  obs_df <- as.data.frame(obs_df$concat())  # Summarize top_10 <- sort(table(obs_df$cell_type_ontology_term_id), decreasing = TRUE)[1:10] print(top_10) #>  #> CL:0000746 CL:0008034 CL:0002131 CL:0002548 
CL:0000115 CL:0000763 CL:0000057 CL:0000669  #>     160974      99458      96953      79733      79626      35560      33075      27515  #> CL:0000003 CL:0002144  #>      23613      18593"},{"path":"/articles/comp_bio_summarize_axis_query.html","id":"full-census-metadata-stats","dir":"Articles","previous_headings":"","what":"Full Census metadata stats","title":"Summarizing cell and gene metadata","text":"example queries organisms Census, summarizes diversity various metadata labels.","code":"cols_to_query <- c(   \"cell_type_ontology_term_id\",   \"assay_ontology_term_id\",   \"tissue_ontology_term_id\" )  total_cells <- 0 for (organism in census$get(\"census_data\")$names()) {   print(organism)    obs_df <- census$get(\"census_data\")$get(organism)$obs$read(column_names = cols_to_query)   obs_df <- as.data.frame(obs_df$concat())    total_cells <- total_cells + nrow(obs_df)   for (col in cols_to_query) {     cat(\"  Unique \", col, \" values: \", length(unique(obs_df[[col]])), \"\\n\")   } } #> [1] \"homo_sapiens\" #>   Unique  cell_type_ontology_term_id  values:  631  #>   Unique  assay_ontology_term_id  values:  20  #>   Unique  tissue_ontology_term_id  values:  230  #> [1] \"mus_musculus\" #>   Unique  cell_type_ontology_term_id  values:  248  #>   Unique  assay_ontology_term_id  values:  10  #>   Unique  tissue_ontology_term_id  values:  74 cat(\"Complete Census contains \", total_cells, \" cells.\") #> Complete Census contains  68683222  cells."},{"path":"/articles/comp_bio_summarize_axis_query.html","id":"close-the-census","dir":"Articles","previous_headings":"Full Census metadata stats","what":"Close the census","title":"Summarizing cell and gene metadata","text":"use, census object closed release memory resources. also closes SOMA objects accessed via top-level census. 
Closing can automated using .exit(census$close(), add = TRUE) immediately census <- open_soma().","code":"census$close()"},{"path":"/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Chan Zuckerberg Initiative Foundation. Author, maintainer, copyright holder, funder.","code":""},{"path":"/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Chan Zuckerberg Initiative Foundation (2024). cellxgene.census: CZ CELLxGENE Discover Cell Census. R package version 1.13.0, https://github.com/chanzuckerberg/cellxgene-census.","code":"@Manual{,   title = {cellxgene.census: CZ CELLxGENE Discover Cell Census},   author = {{Chan Zuckerberg Initiative Foundation}},   year = {2024},   note = {R package version 1.13.0},   url = {https://github.com/chanzuckerberg/cellxgene-census}, }"},{"path":"/index.html","id":"r-package-of-cz-cellxgene-discover-census","dir":"","previous_headings":"","what":"CZ CELLxGENE Discover Cell Census","title":"CZ CELLxGENE Discover Cell Census","text":"documentation R package cellxgene.census part CZ CELLxGENE Discover Census. full details Census data capabilities please go main Census site. cellxgene.census provides API efficiently access cloud-hosted Census single-cell data R. just seconds users can access slice Census data using cell gene filters across hundreds single-cell datasets. Census data can fetched iterative fashion bigger--memory slices data, quickly exported basic R structures, well Seurat SingleCellExperiment objects downstream analysis.","code":""},{"path":"/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"CZ CELLxGENE Discover Cell Census","text":"installing Ubuntu, may need install following libraries via apt install, libxml2-dev libssl-dev libcurl4-openssl-dev. addition must cmake v3.21 greater. installing MacOS, need install developer tools Xcode. Windows supported. 
R session install cellxgene.census R-Universe. able export Census data Seurat SingleCellExperiment also need install respective packages.","code":"install.packages(   \"cellxgene.census\",   repos=c('https://chanzuckerberg.r-universe.dev', 'https://cloud.r-project.org') ) # Seurat install.packages(\"Seurat\")  # SingleCellExperiment if (!require(\"BiocManager\", quietly = TRUE))     install.packages(\"BiocManager\")  BiocManager::install(\"SingleCellExperiment\")"},{"path":"/index.html","id":"usage","dir":"","previous_headings":"","what":"Usage","title":"CZ CELLxGENE Discover Cell Census","text":"Check vignettes “Articles” section navigation bar site. highly recommend following vignettes starting point: Querying fetching single-cell data cell/gene metadata Learning CZ CELLxGENE Discover Census can also check quick start guide main Census site.","code":""},{"path":"/index.html","id":"example-seurat-and-singlecellexperiment-query","dir":"","previous_headings":"Usage","what":"Example Seurat and SingleCellExperiment query","title":"CZ CELLxGENE Discover Cell Census","text":"following creates Seurat object -demand sympathetic neurons Census filtering genes ENSG00000161798, ENSG00000188229. 
following retrieves data SingleCellExperiment object.","code":"library(\"cellxgene.census\") library(\"Seurat\")  census <- open_soma()  organism <- \"Homo sapiens\" gene_filter <- \"feature_id %in% c('ENSG00000107317', 'ENSG00000106034')\" cell_filter <-  \"cell_type == 'sympathetic neuron'\" cell_columns <- c(\"assay\", \"cell_type\", \"tissue\", \"tissue_general\", \"suspension_type\", \"disease\")  seurat_obj <- get_seurat(    census = census,    organism = organism,    var_value_filter = gene_filter,    obs_value_filter = cell_filter,    obs_column_names = cell_columns ) library(\"SingleCellExperiment\")  sce_obj <- get_single_cell_experiment(    census = census,    organism = organism,    var_value_filter = gene_filter,    obs_value_filter = cell_filter,    obs_column_names = cell_columns )"},{"path":"/index.html","id":"for-more-help","dir":"","previous_headings":"","what":"For More Help","title":"CZ CELLxGENE Discover Cell Census","text":"help, please go visit main Census site. believe found security issue, appreciate notification. Please send email security@chanzuckerberg.com.","code":""},{"path":"/reference/download_source_h5ad.html","id":null,"dir":"Reference","previous_headings":"","what":"Download source H5AD to local file name. — download_source_h5ad","title":"Download source H5AD to local file name. — download_source_h5ad","text":"Download source H5AD local file name.","code":""},{"path":"/reference/download_source_h5ad.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Download source H5AD to local file name. — download_source_h5ad","text":"","code":"download_source_h5ad(   dataset_id,   file,   overwrite = FALSE,   census_version = \"stable\",   census = NULL )"},{"path":"/reference/download_source_h5ad.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Download source H5AD to local file name. — download_source_h5ad","text":"dataset_id dataset_id interest. 
file Local file name store H5AD file. overwrite TRUE allow overwriting existing file. census_version desired Census version. census open Census handle census_version. provided, opened closed automatically; efficient reuse handle calling download_source_h5ad() multiple times.","code":""},{"path":"/reference/download_source_h5ad.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Download source H5AD to local file name. — download_source_h5ad","text":"","code":"download_source_h5ad(\"0895c838-e550-48a3-a777-dbcd35d30272\", \"/tmp/data.h5ad\", overwrite = TRUE)"},{"path":"/reference/get_census_mirror.html","id":null,"dir":"Reference","previous_headings":"","what":"Get locator information about a Census mirror — get_census_mirror","title":"Get locator information about a Census mirror — get_census_mirror","text":"Get locator information Census mirror","code":""},{"path":"/reference/get_census_mirror.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get locator information about a Census mirror — get_census_mirror","text":"","code":"get_census_mirror(mirror)"},{"path":"/reference/get_census_mirror.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get locator information about a Census mirror — get_census_mirror","text":"mirror Name mirror.","code":""},{"path":"/reference/get_census_mirror.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get locator information about a Census mirror — get_census_mirror","text":"List mirror information","code":""},{"path":"/reference/get_census_mirror.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get locator information about a Census mirror — get_census_mirror","text":"","code":"get_census_mirror(\"AWS-S3-us-west-2\") #> $provider #> [1] \"S3\" #>  #> $base_uri #> [1] \"s3://cellxgene-census-public-us-west-2/\" #>  #> $region #> [1] 
\"us-west-2\" #>  #> $alias #> [1] \"\" #>"},{"path":"/reference/get_census_mirror_directory.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the directory of Census mirrors currently available — get_census_mirror_directory","title":"Get the directory of Census mirrors currently available — get_census_mirror_directory","text":"Get directory Census mirrors currently available","code":""},{"path":"/reference/get_census_mirror_directory.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the directory of Census mirrors currently available — get_census_mirror_directory","text":"","code":"get_census_mirror_directory()"},{"path":"/reference/get_census_mirror_directory.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the directory of Census mirrors currently available — get_census_mirror_directory","text":"Nested list information available mirrors","code":""},{"path":"/reference/get_census_mirror_directory.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get the directory of Census mirrors currently available — get_census_mirror_directory","text":"","code":"get_census_mirror_directory() #> $default #> $default$provider #> [1] \"S3\" #>  #> $default$base_uri #> [1] \"s3://cellxgene-census-public-us-west-2/\" #>  #> $default$region #> [1] \"us-west-2\" #>  #> $default$alias #> [1] \"default\" #>  #>  #> $`AWS-S3-us-west-2` #> $`AWS-S3-us-west-2`$provider #> [1] \"S3\" #>  #> $`AWS-S3-us-west-2`$base_uri #> [1] \"s3://cellxgene-census-public-us-west-2/\" #>  #> $`AWS-S3-us-west-2`$region #> [1] \"us-west-2\" #>  #> $`AWS-S3-us-west-2`$alias #> [1] \"\" #>  #>"},{"path":"/reference/get_census_version_description.html","id":null,"dir":"Reference","previous_headings":"","what":"Get release description for a Census version — get_census_version_description","title":"Get release description for a Census version — 
get_census_version_description","text":"Get release description Census version","code":""},{"path":"/reference/get_census_version_description.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get release description for a Census version — get_census_version_description","text":"","code":"get_census_version_description(census_version)"},{"path":"/reference/get_census_version_description.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get release description for a Census version — get_census_version_description","text":"census_version census version name.","code":""},{"path":"/reference/get_census_version_description.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get release description for a Census version — get_census_version_description","text":"List release location metadata","code":""},{"path":"/reference/get_census_version_description.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get release description for a Census version — get_census_version_description","text":"","code":"as.data.frame(get_census_version_description(\"stable\")) #>   release_date release_build #> 1                 2023-12-15 #>                                                              soma.uri #> 1 s3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/soma/ #>               soma.relative_uri soma.s3_region #> 1 /cell-census/2023-12-15/soma/      us-west-2 #>                                                              h5ads.uri #> 1 s3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/h5ads/ #>               h5ads.relative_uri h5ads.s3_region do_not_delete  lts  alias #> 1 /cell-census/2023-12-15/h5ads/       us-west-2          TRUE TRUE stable #>   census_version #> 1         stable"},{"path":"/reference/get_census_version_directory.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the 
directory of Census releases currently available — get_census_version_directory","title":"Get the directory of Census releases currently available — get_census_version_directory","text":"Get directory Census releases currently available","code":""},{"path":"/reference/get_census_version_directory.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the directory of Census releases currently available — get_census_version_directory","text":"","code":"get_census_version_directory()"},{"path":"/reference/get_census_version_directory.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the directory of Census releases currently available — get_census_version_directory","text":"Data frame available cell census releases, including location metadata.","code":""},{"path":"/reference/get_census_version_directory.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get the directory of Census releases currently available — get_census_version_directory","text":"","code":"get_census_version_directory() #>            release_date release_build #> stable                     2023-12-15 #> latest                     2024-04-01 #> 2023-05-15                 2023-05-15 #> 2023-07-25                 2023-07-25 #> 2023-12-15                 2023-12-15 #> 2024-03-04                 2024-03-04 #> 2024-03-11                 2024-03-11 #> 2024-03-12                 2024-03-12 #> 2024-03-18                 2024-03-18 #> 2024-03-25                 2024-03-25 #> 2024-03-26                 2024-03-26 #> 2024-04-01                 2024-04-01 #>                                                                       soma.uri #> stable     s3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/soma/ #> latest     s3://cellxgene-census-public-us-west-2/cell-census/2024-04-01/soma/ #> 2023-05-15 s3://cellxgene-census-public-us-west-2/cell-census/2023-05-15/soma/ #> 2023-07-25 
s3://cellxgene-census-public-us-west-2/cell-census/2023-07-25/soma/ #> 2023-12-15 s3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/soma/ #> 2024-03-04 s3://cellxgene-census-public-us-west-2/cell-census/2024-03-04/soma/ #> 2024-03-11 s3://cellxgene-census-public-us-west-2/cell-census/2024-03-11/soma/ #> 2024-03-12 s3://cellxgene-census-public-us-west-2/cell-census/2024-03-12/soma/ #> 2024-03-18 s3://cellxgene-census-public-us-west-2/cell-census/2024-03-18/soma/ #> 2024-03-25 s3://cellxgene-census-public-us-west-2/cell-census/2024-03-25/soma/ #> 2024-03-26 s3://cellxgene-census-public-us-west-2/cell-census/2024-03-26/soma/ #> 2024-04-01 s3://cellxgene-census-public-us-west-2/cell-census/2024-04-01/soma/ #>                        soma.relative_uri soma.s3_region #> stable     /cell-census/2023-12-15/soma/      us-west-2 #> latest     /cell-census/2024-04-01/soma/      us-west-2 #> 2023-05-15 /cell-census/2023-05-15/soma/      us-west-2 #> 2023-07-25 /cell-census/2023-07-25/soma/      us-west-2 #> 2023-12-15 /cell-census/2023-12-15/soma/      us-west-2 #> 2024-03-04 /cell-census/2024-03-04/soma/      us-west-2 #> 2024-03-11 /cell-census/2024-03-11/soma/      us-west-2 #> 2024-03-12 /cell-census/2024-03-12/soma/      us-west-2 #> 2024-03-18 /cell-census/2024-03-18/soma/      us-west-2 #> 2024-03-25 /cell-census/2024-03-25/soma/      us-west-2 #> 2024-03-26 /cell-census/2024-03-26/soma/      us-west-2 #> 2024-04-01 /cell-census/2024-04-01/soma/      us-west-2 #>                                                                       h5ads.uri #> stable     s3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/h5ads/ #> latest     s3://cellxgene-census-public-us-west-2/cell-census/2024-04-01/h5ads/ #> 2023-05-15 s3://cellxgene-census-public-us-west-2/cell-census/2023-05-15/h5ads/ #> 2023-07-25 s3://cellxgene-census-public-us-west-2/cell-census/2023-07-25/h5ads/ #> 2023-12-15 s3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/h5ads/ #> 
2024-03-04 s3://cellxgene-census-public-us-west-2/cell-census/2024-03-04/h5ads/ #> 2024-03-11 s3://cellxgene-census-public-us-west-2/cell-census/2024-03-11/h5ads/ #> 2024-03-12 s3://cellxgene-census-public-us-west-2/cell-census/2024-03-12/h5ads/ #> 2024-03-18 s3://cellxgene-census-public-us-west-2/cell-census/2024-03-18/h5ads/ #> 2024-03-25 s3://cellxgene-census-public-us-west-2/cell-census/2024-03-25/h5ads/ #> 2024-03-26 s3://cellxgene-census-public-us-west-2/cell-census/2024-03-26/h5ads/ #> 2024-04-01 s3://cellxgene-census-public-us-west-2/cell-census/2024-04-01/h5ads/ #>                        h5ads.relative_uri h5ads.s3_region do_not_delete  lts #> stable     /cell-census/2023-12-15/h5ads/       us-west-2          TRUE TRUE #> latest     /cell-census/2024-04-01/h5ads/       us-west-2         FALSE   NA #> 2023-05-15 /cell-census/2023-05-15/h5ads/       us-west-2          TRUE TRUE #> 2023-07-25 /cell-census/2023-07-25/h5ads/       us-west-2          TRUE TRUE #> 2023-12-15 /cell-census/2023-12-15/h5ads/       us-west-2          TRUE TRUE #> 2024-03-04 /cell-census/2024-03-04/h5ads/       us-west-2         FALSE   NA #> 2024-03-11 /cell-census/2024-03-11/h5ads/       us-west-2         FALSE   NA #> 2024-03-12 /cell-census/2024-03-12/h5ads/       us-west-2         FALSE   NA #> 2024-03-18 /cell-census/2024-03-18/h5ads/       us-west-2         FALSE   NA #> 2024-03-25 /cell-census/2024-03-25/h5ads/       us-west-2         FALSE   NA #> 2024-03-26 /cell-census/2024-03-26/h5ads/       us-west-2         FALSE   NA #> 2024-04-01 /cell-census/2024-04-01/h5ads/       us-west-2         FALSE   NA #>             alias #> stable     stable #> latest     latest #> 2023-05-15        #> 2023-07-25        #> 2023-12-15        #> 2024-03-04        #> 2024-03-11        #> 2024-03-12        #> 2024-03-18        #> 2024-03-25        #> 2024-03-26        #> 2024-04-01"},{"path":"/reference/get_presence_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Read the 
feature dataset presence matrix. — get_presence_matrix","title":"Read the feature dataset presence matrix. — get_presence_matrix","text":"Read feature dataset presence matrix.","code":""},{"path":"/reference/get_presence_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read the feature dataset presence matrix. — get_presence_matrix","text":"","code":"get_presence_matrix(census, organism, measurement_name = \"RNA\")"},{"path":"/reference/get_presence_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read the feature dataset presence matrix. — get_presence_matrix","text":"census census object cellxgene.census::open_soma(). organism organism query, usually one Homo sapiens Mus musculus measurement_name measurement object query. Defaults RNA.","code":""},{"path":"/reference/get_presence_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read the feature dataset presence matrix. — get_presence_matrix","text":"tiledbsoma::matrixZeroBasedView object dataset join id & feature join id dimensions, filled 1s indicating presence. sparse matrix accessed zero-based indexes since join id's may zero.","code":""},{"path":"/reference/get_presence_matrix.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read the feature dataset presence matrix. — get_presence_matrix","text":"","code":"census <- open_soma() #> The stable Census release is currently 2023-12-15. Specify census_version = \"2023-12-15\" in future calls to open_soma() to ensure data consistency. on.exit(census$close(), add = TRUE) print(get_presence_matrix(census, \"Homo sapiens\")$dim()) #> Error in private$check_open_for_read_or_write(): Item must be open for read or write. 
s3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/soma/"},{"path":"/reference/get_seurat.html","id":null,"dir":"Reference","previous_headings":"","what":"Export Census slices to Seurat — get_seurat","title":"Export Census slices to Seurat — get_seurat","text":"Convenience wrapper around SOMAExperimentAxisQuery, build execute query, return Seurat object.","code":""},{"path":"/reference/get_seurat.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Export Census slices to Seurat — get_seurat","text":"","code":"get_seurat(   census,   organism,   measurement_name = \"RNA\",   X_layers = c(counts = \"raw\", data = NULL),   obs_value_filter = NULL,   obs_coords = NULL,   obs_column_names = NULL,   obsm_layers = FALSE,   var_value_filter = NULL,   var_coords = NULL,   var_column_names = NULL,   var_index = \"feature_id\" )"},{"path":"/reference/get_seurat.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Export Census slices to Seurat — get_seurat","text":"census census object, usually returned cellxgene.census::open_soma(). organism organism query, usually one Homo sapiens Mus musculus measurement_name measurement object query. Defaults RNA. X_layers named character X layers add Seurat assay, names names Seurat slots (counts data) values names layers within X. obs_value_filter SOMA value_filter across columns obs dataframe, expressed string. obs_coords set coordinates obs dataframe index, expressed type format supported SOMADataFrame's read() method. obs_column_names Columns fetch obs data frame. obsm_layers Names arrays obsm add cell embeddings; pass FALSE suppress loading dimensional reductions. var_value_filter obs_value_filter var. var_coords obs_coords var. var_column_names Columns fetch var data frame. 
var_index Name column ‘var’ add feature names.","code":""},{"path":"/reference/get_seurat.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Export Census slices to Seurat — get_seurat","text":"Seurat object containing census slice.","code":""},{"path":"/reference/get_seurat.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Export Census slices to Seurat — get_seurat","text":"","code":"if (FALSE) { census <- open_soma() seurat_obj <- get_seurat(   census,   organism = \"Homo sapiens\",   obs_value_filter = \"cell_type == 'leptomeningeal cell'\",   var_value_filter = \"feature_id %in% c('ENSG00000107317', 'ENSG00000106034')\" )  seurat_obj  census$close() }"},{"path":"/reference/get_single_cell_experiment.html","id":null,"dir":"Reference","previous_headings":"","what":"Export Census slices to SingleCellExperiment — get_single_cell_experiment","title":"Export Census slices to SingleCellExperiment — get_single_cell_experiment","text":"Convenience wrapper around SOMAExperimentAxisQuery, build execute query, return SingleCellExperiment object.","code":""},{"path":"/reference/get_single_cell_experiment.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Export Census slices to SingleCellExperiment — get_single_cell_experiment","text":"","code":"get_single_cell_experiment(   census,   organism,   measurement_name = \"RNA\",   X_layers = c(counts = \"raw\"),   obs_value_filter = NULL,   obs_coords = NULL,   obs_column_names = NULL,   obsm_layers = FALSE,   var_value_filter = NULL,   var_coords = NULL,   var_column_names = NULL,   var_index = \"feature_id\" )"},{"path":"/reference/get_single_cell_experiment.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Export Census slices to SingleCellExperiment — get_single_cell_experiment","text":"census census object, usually returned cellxgene.census::open_soma(). 
organism organism query, usually one Homo sapiens Mus musculus measurement_name measurement object query. Defaults RNA. X_layers character vector X layers add assays main experiment; may optionally named set name resulting assay (eg. ‘X_layers = c(counts = \"raw\")’ load X layer “‘raw’” assay “‘counts’”); default, loads X layers obs_value_filter SOMA value_filter across columns obs dataframe, expressed string. obs_coords set coordinates obs dataframe index, expressed type format supported SOMADataFrame's read() method. obs_column_names Columns fetch obs data frame. obsm_layers Names arrays obsm add cell embeddings; pass FALSE suppress loading dimensional reductions. var_value_filter obs_value_filter var. var_coords obs_coords var. var_column_names Columns fetch var data frame. var_index Name column ‘var’ add feature names.","code":""},{"path":"/reference/get_single_cell_experiment.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Export Census slices to SingleCellExperiment — get_single_cell_experiment","text":"SingleCellExperiment object containing sensus slice.","code":""},{"path":"/reference/get_single_cell_experiment.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Export Census slices to SingleCellExperiment — get_single_cell_experiment","text":"","code":"if (FALSE) { census <- open_soma() sce_obj <- get_single_cell_experiment(   census,   organism = \"Homo sapiens\",   obs_value_filter = \"cell_type == 'leptomeningeal cell'\",   var_value_filter = \"feature_id %in% c('ENSG00000107317', 'ENSG00000106034')\" )  sce_obj  census$close() }"},{"path":"/reference/get_source_h5ad_uri.html","id":null,"dir":"Reference","previous_headings":"","what":"Locate source h5ad file for a dataset. — get_source_h5ad_uri","title":"Locate source h5ad file for a dataset. 
— get_source_h5ad_uri","text":"Locate source h5ad file dataset.","code":""},{"path":"/reference/get_source_h5ad_uri.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Locate source h5ad file for a dataset. — get_source_h5ad_uri","text":"","code":"get_source_h5ad_uri(dataset_id, census_version = \"stable\", census = NULL)"},{"path":"/reference/get_source_h5ad_uri.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Locate source h5ad file for a dataset. — get_source_h5ad_uri","text":"dataset_id dataset_id interest. census_version desired Census version. census open Census handle census_version. provided, opened closed automatically; efficient reuse handle calling get_source_h5ad_uri() multiple times.","code":""},{"path":"/reference/get_source_h5ad_uri.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Locate source h5ad file for a dataset. — get_source_h5ad_uri","text":"list uri optional s3_region.","code":""},{"path":"/reference/get_source_h5ad_uri.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Locate source h5ad file for a dataset. — get_source_h5ad_uri","text":"","code":"get_source_h5ad_uri(\"0895c838-e550-48a3-a777-dbcd35d30272\") #> $uri #> [1] \"s3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/h5ads/0895c838-e550-48a3-a777-dbcd35d30272.h5ad\" #>  #> $s3_region #> [1] \"us-west-2\" #>"},{"path":"/reference/new_SOMATileDBContext_for_census.html","id":null,"dir":"Reference","previous_headings":"","what":"Create SOMATileDBContext for Census — new_SOMATileDBContext_for_census","title":"Create SOMATileDBContext for Census — new_SOMATileDBContext_for_census","text":"Create SOMATileDBContext suitable using open_soma(). 
Typically open_soma() creates context automatically, one can created separately order set custom configuration options, share multiple open Census handles.","code":""},{"path":"/reference/new_SOMATileDBContext_for_census.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create SOMATileDBContext for Census — new_SOMATileDBContext_for_census","text":"","code":"new_SOMATileDBContext_for_census(   census_version_description,   mirror = \"default\",   ... )"},{"path":"/reference/new_SOMATileDBContext_for_census.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create SOMATileDBContext for Census — new_SOMATileDBContext_for_census","text":"census_version_description result get_census_version_description() desired Census version. mirror name intended census mirror (get_census_mirror_directory()[[name]] save lookup), NULL configure local file access. ... Custom configuration options.","code":""},{"path":"/reference/new_SOMATileDBContext_for_census.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create SOMATileDBContext for Census — new_SOMATileDBContext_for_census","text":"SOMATileDBContext object open_soma().","code":""},{"path":"/reference/new_SOMATileDBContext_for_census.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create SOMATileDBContext for Census — new_SOMATileDBContext_for_census","text":"","code":"census_desc <- get_census_version_description(\"stable\") ctx <- new_SOMATileDBContext_for_census(census_desc, \"soma.init_buffer_bytes\" = paste(4 * 1024**3)) census <- open_soma(\"stable\", tiledbsoma_ctx = ctx) #> The stable Census release is currently 2023-12-15. Specify census_version = \"2023-12-15\" in future calls to open_soma() to ensure data consistency. 
census$close()"},{"path":"/reference/open_soma.html","id":null,"dir":"Reference","previous_headings":"","what":"Open the Census — open_soma","title":"Open the Census — open_soma","text":"Open Census","code":""},{"path":"/reference/open_soma.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Open the Census — open_soma","text":"","code":"open_soma(   census_version = \"stable\",   uri = NULL,   tiledbsoma_ctx = NULL,   mirror = NULL )"},{"path":"/reference/open_soma.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Open the Census — open_soma","text":"census_version version Census, e.g., \"stable\". uri URI containing Census SOMA objects open instead released version. (supplied, takes precedence census_version.) tiledbsoma_ctx tiledbsoma::SOMATileDBContext built using new_SOMATileDBContext_for_census(). Optional (created automatically) using census_version context need reused. mirror Census mirror access; one names(get_census_mirror_directory()).","code":""},{"path":"/reference/open_soma.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Open the Census — open_soma","text":"Top-level tiledbsoma::SOMACollection object. use, census closed release memory resources, usually .exit(census$close(), add = TRUE). Closing top-level census also close SOMA objects accessed .","code":""},{"path":"/reference/open_soma.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Open the Census — open_soma","text":"","code":"census <- open_soma() #> The stable Census release is currently 2023-12-15. Specify census_version = \"2023-12-15\" in future calls to open_soma() to ensure data consistency. 
as.data.frame(census$get(\"census_info\")$get(\"summary\")$read()$concat()) #>   soma_joinid                      label      value #> 1           0      census_schema_version      1.2.0 #> 2           1          census_build_date 2023-10-23 #> 3           2     dataset_schema_version      3.1.0 #> 4           3           total_cell_count   68683222 #> 5           4          unique_cell_count   40356133 #> 6           5 number_donors_homo_sapiens      15588 #> 7           6 number_donors_mus_musculus       1990 census$close()"}]
            +[{"path":"/LICENSE.html","id":null,"dir":"","previous_headings":"","what":"MIT License","title":"MIT License","text":"Copyright (c) 2022, Chan Zuckerberg Initiative Permission hereby granted, free charge, person obtaining copy software associated documentation files (“Software”), deal Software without restriction, including without limitation rights use, copy, modify, merge, publish, distribute, sublicense, /sell copies Software, permit persons Software furnished , subject following conditions: copyright notice permission notice shall included copies substantial portions Software. SOFTWARE PROVIDED “”, WITHOUT WARRANTY KIND, EXPRESS IMPLIED, INCLUDING LIMITED WARRANTIES MERCHANTABILITY, FITNESS PARTICULAR PURPOSE NONINFRINGEMENT. EVENT SHALL AUTHORS COPYRIGHT HOLDERS LIABLE CLAIM, DAMAGES LIABILITY, WHETHER ACTION CONTRACT, TORT OTHERWISE, ARISING , CONNECTION SOFTWARE USE DEALINGS SOFTWARE.","code":""},{"path":"/articles/census_access_maintained_embeddings.html","id":"open-census","dir":"Articles","previous_headings":"","what":"Open Census","title":"Access CELLxGENE collaboration embeddings (scVI, Geneformer)","text":"","code":"library(\"cellxgene.census\") census <- open_soma(census_version = \"2023-12-15\")"},{"path":"/articles/census_access_maintained_embeddings.html","id":"load-embeddings-as-seurat-reductions","dir":"Articles","previous_headings":"","what":"Load embeddings as Seurat reductions","title":"Access CELLxGENE collaboration embeddings (scVI, Geneformer)","text":"high-level cellxgene.census::get_seurat() function can query Census load embeddings dimensional reductions Seurat object. ask Seurat object expression data human cells tissue_general equal 'central nervous system', along scVI geneformer embeddings (obsm_layers). 
embeddings stored dimensional reductions seurat_obj, can take quick look scVI embeddings 2D scatter plot via UMAP, colored Census cell_type annotations.","code":"library(\"Seurat\")  seurat_obj <- get_seurat(   census,   organism = \"homo_sapiens\",   obs_value_filter = \"tissue_general == 'central nervous system'\",   obs_column_names = c(\"cell_type\"),   obsm_layers = c(\"scvi\", \"geneformer\") ) seurat_obj <- RunUMAP(   seurat_obj,   reduction = \"scvi\",   dims = 1:ncol(Embeddings(seurat_obj, \"scvi\")) )  DimPlot(seurat_obj, reduction = \"umap\", group.by = \"cell_type\") +   theme(legend.text = element_text(size = 8))"},{"path":"/articles/census_access_maintained_embeddings.html","id":"load-embeddings-as-singlecellexperiment-reductions","dir":"Articles","previous_headings":"","what":"Load embeddings as SingleCellExperiment reductions","title":"Access CELLxGENE collaboration embeddings (scVI, Geneformer)","text":"Similarly, cellxgene.census::get_single_cell_experiment() can query Census store embeddings dimensionality reduction results Bioconductor SingleCellExperiment object. , can view UMAP Geneformer embeddings colored cell_type.","code":"library(\"SingleCellExperiment\") sce_obj <- get_single_cell_experiment(   census,   organism = \"homo_sapiens\",   obs_value_filter = \"tissue_general == 'central nervous system'\",   obs_column_names = c(\"cell_type\"),   obsm_layers = c(\"scvi\", \"geneformer\") ) sce_obj <- scater::runUMAP(sce_obj, dimred = \"geneformer\") scater::plotReducedDim(sce_obj, dimred = \"UMAP\", colour_by = \"cell_type\")"},{"path":"/articles/census_access_maintained_embeddings.html","id":"load-embeddings-as-sparsematrix","dir":"Articles","previous_headings":"","what":"Load embeddings as sparseMatrix","title":"Access CELLxGENE collaboration embeddings (scVI, Geneformer)","text":"Lastly, can use SOMAExperimentAxisQuery lower-level access embeddings’ numerical data. can performant use cases don’t need features Seurat SingleCellExperiment. 
row embeddings sparseMatrix provides fine-tuned Geneformer model’s 512-dimensional embedding vector cell, cell soma_joinids row names. different arguments, SOMAExperimentAxisQuery$to_sparse_matrix() can also read scVI embeddings expression data. Still lower-level access available SOMAExperimentAxisQuery$read(), streams Arrow tables. methods SOMAExperimentAxisQuery can fetch metadata like cell_type: SOMAExperimentAxisQuery loads ask Census, unlike high-level get_seurat() get_single_cell_experiment() functions, eagerly populate objects based query.","code":"query <- census$get(\"census_data\")$get(\"homo_sapiens\")$axis_query(   \"RNA\",   obs_query = tiledbsoma::SOMAAxisQuery$new(value_filter = \"tissue == 'tongue'\") ) embeddings <- query$to_sparse_matrix(\"obsm\", \"geneformer\") str(embeddings) #> Formal class 'dgTMatrix' [package \"Matrix\"] with 6 slots #>   ..@ i       : int [1:190464] 0 0 0 0 0 0 0 0 0 0 ... #>   ..@ j       : int [1:190464] 0 1 2 3 4 5 6 7 8 9 ... #>   ..@ Dim     : int [1:2] 372 512 #>   ..@ Dimnames:List of 2 #>   .. ..$ : chr [1:372] \"51784858\" \"51784859\" \"51784860\" \"51784861\" ... #>   .. ..$ : chr [1:512] \"0\" \"1\" \"2\" \"3\" ... #>   ..@ x       : num [1:190464] 0.1104 -1.2031 1.0078 0.0131 1.2422 ... #>   ..@ factors : list() head(as.data.frame(query$obs(column_names = c(\"soma_joinid\", \"cell_type\"))$concat())) #>   soma_joinid  cell_type #> 1    51784858 basal cell #> 2    51784859 basal cell #> 3    51784860 fibroblast #> 4    51784861 fibroblast #> 5    51784862 basal cell #> 6    51784863 basal cell census$close()"},{"path":"/articles/census_axis_query.html","id":"axis-query-example","dir":"Articles","previous_headings":"","what":"Axis Query Example","title":"Axis Query Example","text":"Goal: demonstrate basic axis metadata handling. CZ CELLxGENE Census stores obs (cell) metadata SOMA DataFrame, can queried read R data frame. Census also convenience package simplifies opening census. R data frames -memory objects. 
Take care queries small enough results fit memory.","code":""},{"path":"/articles/census_axis_query.html","id":"opening-the-census","dir":"Articles","previous_headings":"Axis Query Example","what":"Opening the census","title":"Axis Query Example","text":"cellxgene.census R package contains convenient API open latest version Census. can learn cellxgene.census methods accessing corresponding documentation. example ?cellxgene.census::open_soma.","code":"census <- cellxgene.census::open_soma()"},{"path":"/articles/census_axis_query.html","id":"summarize-census-cell-metadata","dir":"Articles","previous_headings":"Axis Query Example","what":"Summarize Census cell metadata","title":"Axis Query Example","text":"Tips: can read entire SOMA dataframe R using .data.frame(soma_df$read()). Queries much faster request DataFrame columns required analysis (e.g. column_names = c(\"soma_joinid\", \"cell_type_ontology_term_id\")). can also refine query results using value_filter, filter census matching records.","code":""},{"path":"/articles/census_axis_query.html","id":"summarize-all-cell-types","dir":"Articles","previous_headings":"Axis Query Example > Summarize Census cell metadata","what":"Summarize all cell types","title":"Axis Query Example","text":"example reads cell metadata (obs) R data frame summarize variety ways.","code":"human <- census$get(\"census_data\")$get(\"homo_sapiens\")  # Read obs into an R data frame (tibble). obs_df <- as.data.frame(human$obs$read(   column_names = c(\"soma_joinid\", \"cell_type_ontology_term_id\") ))  # Find all unique values in the cell_type_ontology_term_id column. unique_cell_type_ontology_term_id <- unique(obs_df$cell_type_ontology_term_id)  cat(paste(   \"There are\",   length(unique_cell_type_ontology_term_id),   \"cell types in the Census! The first few are:\" )) #> There are 604 cell types in the Census! 
The first few are: head(unique_cell_type_ontology_term_id) #> [1] \"CL:0000540\" \"CL:0000738\" \"CL:0000763\" \"CL:0000136\" \"CL:0000235\" #> [6] \"CL:0000115\""},{"path":"/articles/census_axis_query.html","id":"summarize-a-subset-of-cell-types-selected-with-a-value_filter","dir":"Articles","previous_headings":"Axis Query Example > Summarize Census cell metadata","what":"Summarize a subset of cell types, selected with a value_filter","title":"Axis Query Example","text":"example utilizes SOMA “value filter” read subset cells tissue_ontology_term_id equal UBERON:0002048 (lung tissue), summarizes query result. can also define much complex value filters. example: combine terms use %% operator query multiple values","code":"# Read cell_type terms for cells which have a specific tissue term LUNG_TISSUE <- \"UBERON:0002048\"  obs_df <- as.data.frame(human$obs$read(   column_names = c(\"cell_type_ontology_term_id\"),   value_filter = paste(\"tissue_ontology_term_id == '\", LUNG_TISSUE, \"'\", sep = \"\") ))  # Find all unique values in the cell_type_ontology_term_id column as an R data frame. unique_cell_type_ontology_term_id <- unique(obs_df$cell_type_ontology_term_id) cat(paste(   \"There are \",   length(unique_cell_type_ontology_term_id),   \" cell types in the Census where tissue_ontology_term_id == \",   LUNG_TISSUE,   \"!\\nThe first few are:\",   sep = \"\" )) #> There are 185 cell types in the Census where tissue_ontology_term_id == UBERON:0002048! 
#> The first few are: head(unique_cell_type_ontology_term_id) #> [1] \"CL:0000003\" \"CL:4028004\" \"CL:0002145\" \"CL:0000625\" \"CL:0000624\" #> [6] \"CL:4028006\"  # Report the 10 most common top_10 <- sort(table(obs_df$cell_type_ontology_term_id), decreasing = TRUE)[1:10] cat(paste(\"The top 10 cell types where tissue_ontology_term_id ==\", LUNG_TISSUE)) #> The top 10 cell types where tissue_ontology_term_id == UBERON:0002048 print(top_10) #>  #> CL:0000003 CL:0000583 CL:0000625 CL:0000624 CL:0000235 CL:0002063 CL:0000860  #>     562038     526859     323433     323067     254173     246279     203526  #> CL:0000623 CL:0001064 CL:0002632  #>     164944     149067     132243 # You can also do more complex queries, such as testing for inclusion in a list of values obs_df <- as.data.frame(human$obs$read(   column_names = c(\"cell_type_ontology_term_id\"),   value_filter = \"tissue_ontology_term_id %in% c('UBERON:0002082', 'UBERON:OOO2084', 'UBERON:0002080')\" ))  # Summarize top_10 <- sort(table(obs_df$cell_type_ontology_term_id), decreasing = TRUE)[1:10] print(top_10) #>  #> CL:0000746 CL:0008034 CL:0002548 CL:0000115 CL:0002131 CL:0000763 CL:0000669  #>     159096      84750      79618      64190      61830      32088      27515  #> CL:0000003 CL:0000057 CL:0002144  #>      22707      20117      18593"},{"path":"/articles/census_axis_query.html","id":"full-census-stats","dir":"Articles","previous_headings":"Axis Query Example > Summarize Census cell metadata","what":"Full census stats","title":"Axis Query Example","text":"example queries organisms Census, summarizes diversity various metadata labels.","code":"cols_to_query <- c(   \"cell_type_ontology_term_id\",   \"assay_ontology_term_id\",   \"tissue_ontology_term_id\" )  total_cells <- 0 for (organism in census$get(\"census_data\")$names()) {   print(organism)   obs_df <- as.data.frame(     census$get(\"census_data\")$get(organism)$obs$read(column_names = cols_to_query)   )   total_cells <- total_cells + 
nrow(obs_df)   for (col in cols_to_query) {     cat(paste(\"  Unique \", col, \" values: \", length(unique(obs_df[[col]])), \"\\n\", sep = \"\"))   } } #> [1] \"homo_sapiens\" #>   Unique cell_type_ontology_term_id values: 604 #>   Unique assay_ontology_term_id values: 20 #>   Unique tissue_ontology_term_id values: 227 #> [1] \"mus_musculus\" #>   Unique cell_type_ontology_term_id values: 226 #>   Unique assay_ontology_term_id values: 9 #>   Unique tissue_ontology_term_id values: 51 cat(paste(\"Complete Census contains\", total_cells, \"cells.\")) #> Complete Census contains 60361716 cells."},{"path":"/articles/census_citation_generation.html","id":"requirements","dir":"Articles","previous_headings":"","what":"Requirements","title":"Generating citations for Census slices","text":"notebook requires: cellxgene_census Python package. Census data release schema version 1.3.0 greater.","code":""},{"path":"/articles/census_citation_generation.html","id":"generating-citation-strings","dir":"Articles","previous_headings":"","what":"Generating citation strings","title":"Generating citations for Census slices","text":"First open handle Census data. ensure open data release schema version 1.3.0 greater, use census_version=\"latest\" load dataset table contains column \"citation\" dataset included Census. 
now can use column \"dataset_id\" present dataset table Census cell metadata create citation strings Census slice.","code":"library(\"tiledb\") library(\"cellxgene.census\")  census <- open_soma(census_version = \"latest\") census_release_info <- census$get(\"census_info\")$get(\"summary\")$read()$concat() as.data.frame(census_release_info) #>   soma_joinid                      label      value #> 1           0      census_schema_version      2.0.0 #> 2           1          census_build_date 2024-04-01 #> 3           2     dataset_schema_version      5.0.0 #> 4           3           total_cell_count  114405937 #> 5           4          unique_cell_count   59761180 #> 6           5 number_donors_homo_sapiens      17082 #> 7           6 number_donors_mus_musculus       4186 datasets <- census$get(\"census_info\")$get(\"datasets\")$read()$concat() datasets <- as.data.frame(datasets) head(datasets[\"citation\"]) #>                                                                                                                                                                                                                                                                                                           citation #> 1            Publication: https://doi.org/10.1002/hep4.1854 Dataset Version: https://datasets.cellxgene.cziscience.com/fb76c95f-0391-4fac-9fb9-082ce2430b59.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/44531dd9-1388-4416-a117-af0a99de2294 #> 2   Publication: https://doi.org/10.1126/sciimmunol.abe6291 Dataset Version: https://datasets.cellxgene.cziscience.com/b6737a5e-9069-4dd6-9a57-92e17a746df9.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/3a2af25b-2338-4266-aad3-aa8d07473f50 #> 3   Publication: https://doi.org/10.1038/s41593-020-00764-7 Dataset Version: 
https://datasets.cellxgene.cziscience.com/0e02290f-b992-450b-8a19-554f73cd7f09.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/180bff9c-c8a5-4539-b13b-ddbc00d643e6 #> 4   Publication: https://doi.org/10.1038/s41467-022-29450-x Dataset Version: https://datasets.cellxgene.cziscience.com/40832710-d7b1-43fb-b2c2-1cd2255bc3ac.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/bf325905-5e8e-42e3-933d-9a9053e9af80 #> 5   Publication: https://doi.org/10.1038/s41590-021-01059-0 Dataset Version: https://datasets.cellxgene.cziscience.com/eb6c070c-ff67-4c1f-8d4d-65f9fe2119ee.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/93eebe82-d8c3-41bc-a906-63b5b5f24a9d #> 6 Publication: https://doi.org/10.1016/j.celrep.2019.12.082 Dataset Version: https://datasets.cellxgene.cziscience.com/650a47be-6666-4f70-ac47-8414c50bbd8e.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/939769a8-d8d2-4d01-abfc-55699893fd49"},{"path":"/articles/census_citation_generation.html","id":"via-cell-metadata-query","dir":"Articles","previous_headings":"Generating citation strings","what":"Via cell metadata query","title":"Generating citations for Census slices","text":"","code":"# Query cell metadata cell_metadata <- census$get(\"census_data\")$get(\"homo_sapiens\")$obs$read(   value_filter = \"tissue == 'cardiac atrium'\",   column_names = c(\"dataset_id\", \"cell_type\") )  cell_metadata <- as.data.frame(cell_metadata$concat())  # Get a citation string for the slice slice_datasets <- datasets[datasets$dataset_id %in% cell_metadata$dataset_id, ] print(slice_datasets$citation) #> [1] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/9227d155-6f2d-4534-be73-b86c5c34d8e6.h5ad curated and 
distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [2] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/017c9ef2-a5e5-429e-a9a1-919e330c4087.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [3] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/8c189c08-4eba-45d4-925f-a5fe1a13d2ae.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [4] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/b76d37f6-0654-447f-bd1b-477be2c747f9.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [5] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/860c49d4-8ab1-4576-b67e-02d66e4a6ddd.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [6] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/b84def55-a776-4aa4-a9a6-7aab8b973086.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\""},{"path":"/articles/census_citation_generation.html","id":"via-seurat-query","dir":"Articles","previous_headings":"Generating citation strings","what":"Via Seurat query","title":"Generating citations for Census slices","text":"","code":"# Fetch a Seurat object seurat_obj <- get_seurat(   census = census,   
organism = \"homo_sapiens\",   measurement_name = \"RNA\",   obs_value_filter = \"tissue == 'cardiac atrium'\",   var_value_filter = \"feature_name == 'MYBPC3'\",   obs_column_names = c(\"dataset_id\", \"cell_type\") )  # Get a citation string for the slice slice_datasets <- datasets[datasets$dataset_id %in% seurat_obj[[]]$dataset_id, ] print(slice_datasets$citation) #> [1] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/9227d155-6f2d-4534-be73-b86c5c34d8e6.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [2] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/017c9ef2-a5e5-429e-a9a1-919e330c4087.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [3] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/8c189c08-4eba-45d4-925f-a5fe1a13d2ae.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [4] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/b76d37f6-0654-447f-bd1b-477be2c747f9.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [5] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/860c49d4-8ab1-4576-b67e-02d66e4a6ddd.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [6] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset 
Version: https://datasets.cellxgene.cziscience.com/b84def55-a776-4aa4-a9a6-7aab8b973086.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\""},{"path":"/articles/census_citation_generation.html","id":"via-singlecellexperiment-query","dir":"Articles","previous_headings":"Generating citation strings","what":"Via SingleCellExperiment query","title":"Generating citations for Census slices","text":"","code":"# Fetch a Seurat object sce_obj <- get_single_cell_experiment(   census = census,   organism = \"homo_sapiens\",   measurement_name = \"RNA\",   obs_value_filter = \"tissue == 'cardiac atrium'\",   var_value_filter = \"feature_name == 'MYBPC3'\",   obs_column_names = c(\"dataset_id\", \"cell_type\") )  # Get a citation string for the slice slice_datasets <- datasets[datasets$dataset_id %in% sce_obj$dataset_id, ] print(slice_datasets$citation) #> [1] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/9227d155-6f2d-4534-be73-b86c5c34d8e6.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [2] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/017c9ef2-a5e5-429e-a9a1-919e330c4087.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [3] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/8c189c08-4eba-45d4-925f-a5fe1a13d2ae.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [4] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: 
https://datasets.cellxgene.cziscience.com/b76d37f6-0654-447f-bd1b-477be2c747f9.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [5] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/860c49d4-8ab1-4576-b67e-02d66e4a6ddd.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\" #> [6] \"Publication: https://doi.org/10.1126/science.abl4896 Dataset Version: https://datasets.cellxgene.cziscience.com/b84def55-a776-4aa4-a9a6-7aab8b973086.h5ad curated and distributed by CZ CELLxGENE Discover in Collection: https://cellxgene.cziscience.com/collections/e5f58829-1a66-40b5-a624-9046778e74f5\""},{"path":"/articles/census_compute_over_X.html","id":"incremental-mean-calculation","dir":"Articles","previous_headings":"","what":"Incremental mean calculation","title":"Computing on X using online (incremental) algorithms","text":"Many statistics, marginal means, easy calculate incrementally. Let’s begin query X$raw sparse matrix unnormalized read counts, return results shards incrementally accumulate read count gene, divide cell count get mean reads per cell gene. First define query - case slice obs axis cells specific tissue & sex value, genes var axis. query$X() method returns iterator results, Arrow Table. table contain sparse X data obs/var coordinates, using standard SOMA names: soma_data - X values (float32) soma_dim_0 - obs coordinate (int64) soma_dim_1 - var coordinate (int64) Important: X matrices joined var/obs axis DataFrames integer join “id” (aka soma_joinid). positionally indexed, given cell gene may soma_joinid value (e.g., large integer). words, given X value, soma_dim_0 corresponds soma_joinid obs dataframe, soma_dim_1 coordinate corresponds soma_joinid var dataframe. 
convenience, query class includes utility simplify operations query slices. query$indexer indexer used wrap output query$X(), converting soma_joinids positional indexing query results. Positions [0, N), N number results query given axis. Key points: expensive query read results - rather make multiple passes data, read perform multiple computations. default, data census indexed soma_joinid positionally.","code":"library(\"tiledbsoma\") library(\"cellxgene.census\") census <- open_soma()  query <- census$get(\"census_data\")$get(\"mus_musculus\")$axis_query(   measurement_name = \"RNA\",   obs_query = SOMAAxisQuery$new(value_filter = \"tissue=='brain' && sex=='male'\") )  genes_df <- query$var(column_names = c(\"feature_id\", \"feature_name\"))$concat() genes_df <- as.data.frame(genes_df) n_genes <- nrow(genes_df)  # accumulator vector (for each gene) for the total count over all cells in X(\"raw\") raw_sum_by_gene <- numeric(n_genes) names(raw_sum_by_gene) <- genes_df$feature_id  # iterate through in-memory shards of query results tables <- query$X(\"raw\")$tables() while (!tables$read_complete()) {   table_part <- tables$read_next()   # table_part is an Arrow table with the columns mentioned above. The result   # order is not guaranteed!    # table_part$soma_dim_1 is the var/gene soma_joinid. But note that these are   # arbitrary int64 id's, and moreover each table_part may exhibit only a subset   # of the values we'll see over all query results. query$indexer helps us map   # any given soma_dim_1 values onto positions in query$var() (genes_df), that is   # the union of all values we'll see.   
gene_indexes <- query$indexer$by_var(table_part$soma_dim_1)$as_vector()   stopifnot(sum(gene_indexes >= n_genes) == 0)   # sum(table_part) group by gene, yielding a numeric vector with the gene_index   # in its names   sum_part <- tapply(as.vector(table_part$soma_data), gene_indexes, sum)   # update the accumulator vector   which_genes <- as.integer(names(sum_part)) + 1 # nb: gene_indexes is zero-based   stopifnot(sum(which_genes > n_genes) == 0)   raw_sum_by_gene[which_genes] <- raw_sum_by_gene[which_genes] + sum_part }  # Divide each sum by cell count to get mean reads per cell (for each gene), # implicitly averaging in all zero entries in X even though they weren't included # in the sparse query results. genes_df$raw_mean <- raw_sum_by_gene / query$n_obs genes_df #>            feature_id  feature_name     raw_mean #> 1  ENSMUSG00000051951          Xkr4 1.397121e+00 #> 2  ENSMUSG00000025900           Rp1 3.162902e-01 #> 3  ENSMUSG00000025902         Sox17 6.604085e+01 #> 4  ENSMUSG00000033845        Mrpl15 3.939172e+01 #> 5  ENSMUSG00000025903        Lypla1 1.986548e+01 #> 6  ENSMUSG00000033813         Tcea1 4.305924e+01 #> 7  ENSMUSG00000002459         Rgs20 3.496194e+00 #> 8  ENSMUSG00000033793       Atp6v1h 7.470932e+01 #> 9  ENSMUSG00000025905         Oprk1 4.568752e-01 #> 10 ENSMUSG00000033774        Npbwr1 1.241003e-04 #> 11 ENSMUSG00000025907        Rb1cc1 3.631679e+01 #> 12 ENSMUSG00000033740          St18 1.660110e+01 #> 13 ENSMUSG00000051285        Pcmtd1 5.410501e+01 #> 14 ENSMUSG00000025909         Sntg1 1.178725e+00 #> 15 ENSMUSG00000061024          Rrs1 2.098927e+01 #> 16 ENSMUSG00000025911        Adhfe1 1.266112e+01 #> 17 ENSMUSG00000079671 2610203C22Rik 9.474621e+00 #> 18 ENSMUSG00000025912         Mybl1 2.643129e-01 #> 19 ENSMUSG00000045210        Vcpip1 3.456668e+01 #> 20 ENSMUSG00000097893 1700034P13Rik 5.721023e-01 #> 21 ENSMUSG00000025915          Sgk3 2.012592e+01 #> 22 ENSMUSG00000098234         Snhg6 6.784314e+00 #> 23 ENSMUSG00000025916   
    Ppp1r42 2.585422e-01 #> 24 ENSMUSG00000025917         Cops5 7.909310e+01 #> 25 ENSMUSG00000056763         Cspp1 1.635604e+01 #> 26 ENSMUSG00000067851       Arfgef1 1.582897e+01 #> 27 ENSMUSG00000042501          Cpa6 1.880119e-02 #> 28 ENSMUSG00000048960         Prex2 2.283623e+01 #> 29 ENSMUSG00000057715 A830018L16Rik 9.992140e-01 #> 30 ENSMUSG00000016918         Sulf1 5.567469e+00 #> 31 ENSMUSG00000025938       Slco5a1 2.452015e-01 #> 32 ENSMUSG00000042414        Prdm14 6.142964e-03 #> 33 ENSMUSG00000005886         Ncoa2 1.707928e+01 #>  [ reached 'max' / getOption(\"max.print\") -- omitted 52384 rows ]"},{"path":"/articles/census_compute_over_X.html","id":"counting-cells-grouped-by-dataset-and-gene","dir":"Articles","previous_headings":"","what":"Counting cells grouped by dataset and gene","title":"Computing on X using online (incremental) algorithms","text":"goal example count number cells nonzero reads, grouped gene Census dataset_id. result data frame dataset, gene, number cells nonzero reads dataset gene. multi-factor aggregation, ’ll take advantage dplyr routines instead lower-level vector indexer shown . presentation purposes, ’ll limit query four genes, can expanded genes easily. 
Don’t forget close census.","code":"library(\"dplyr\")  query <- census$get(\"census_data\")$get(\"mus_musculus\")$axis_query(   measurement_name = \"RNA\",   obs_query = SOMAAxisQuery$new(value_filter = \"tissue=='brain'\"),   var_query = SOMAAxisQuery$new(value_filter = \"feature_name %in% c('Malat1', 'Ptprd', 'Dlg2', 'Pcdh9')\") )  obs_tbl <- query$obs(column_names = c(\"soma_joinid\", \"dataset_id\"))$concat() obs_df <- data.frame(   # materialize soma_joinid as character to avoid overflowing R 32-bit integer   cell_id = as.character(obs_tbl$soma_joinid),   dataset_id = obs_tbl$dataset_id$as_vector() ) var_tbl <- query$var(column_names = c(\"soma_joinid\", \"feature_name\"))$concat() var_df <- data.frame(   gene_id = as.character(var_tbl$soma_joinid),   feature_name = var_tbl$feature_name$as_vector() )  # accumulator for # cells by dataset & gene n_cells_grouped <- data.frame(   \"dataset_id\" = character(0),   \"gene_id\" = character(0),   \"n_cells\" = numeric(0) )  # iterate through in-memory shards of query results tables <- query$X(\"raw\")$tables() while (!tables$read_complete()) {   table_part <- tables$read_next()    # prepare a (dataset,gene,1) tuple for each entry in table_part   n_cells_part <- data.frame(     \"cell_id\" = as.character(table_part$soma_dim_0),     \"gene_id\" = as.character(table_part$soma_dim_1),     \"n_cells\" = 1   )   n_cells_part <- left_join(n_cells_part, obs_df, by = \"cell_id\")   stopifnot(sum(is.na(n_cells_part$dataset_id)) == 0)    # fold those into n_cells_grouped   n_cells_grouped <- n_cells_part %>%     select(-cell_id) %>%     bind_rows(n_cells_grouped) %>%     group_by(dataset_id, gene_id) %>%     summarise(n_cells = sum(n_cells)) %>%     ungroup() }  # add gene names for display n_cells_grouped <- left_join(n_cells_grouped, var_df, by = \"gene_id\") stopifnot(sum(is.na(n_cells_grouped$feature_name)) == 0) n_cells_grouped[c(\"dataset_id\", \"feature_name\", \"n_cells\")] #> # A tibble: 21 x 3 #>    dataset_id     
                      feature_name n_cells #>                                               #>  1 3bbb6cf9-72b9-41be-b568-656de6eb18b5 Ptprd          79578 #>  2 3bbb6cf9-72b9-41be-b568-656de6eb18b5 Dlg2           79513 #>  3 3bbb6cf9-72b9-41be-b568-656de6eb18b5 Pcdh9          79476 #>  4 3bbb6cf9-72b9-41be-b568-656de6eb18b5 Malat1         79667 #>  5 58b01044-c5e5-4b0f-8a2d-6ebf951e01ff Ptprd            474 #>  6 58b01044-c5e5-4b0f-8a2d-6ebf951e01ff Dlg2              81 #>  7 58b01044-c5e5-4b0f-8a2d-6ebf951e01ff Pcdh9            125 #>  8 58b01044-c5e5-4b0f-8a2d-6ebf951e01ff Malat1         12622 #>  9 66ff82b4-9380-469c-bc4b-cfa08eacd325 Dlg2             856 #> 10 66ff82b4-9380-469c-bc4b-cfa08eacd325 Pcdh9           2910 #> # i 11 more rows census$close()"},{"path":"/articles/census_dataset_presence.html","id":"opening-the-census","dir":"Articles","previous_headings":"","what":"Opening the Census","title":"Genes measured in each cell (dataset presence matrix)","text":"cellxgene.census R package contains convenient API open version Census (default, newest stable version).","code":"library(\"cellxgene.census\") census <- open_soma()"},{"path":"/articles/census_dataset_presence.html","id":"fetching-the-ids-of-the-census-datasets","dir":"Articles","previous_headings":"","what":"Fetching the IDs of the Census datasets","title":"Genes measured in each cell (dataset presence matrix)","text":"Let’s grab table datasets included Census use table combination presence matrix .","code":"# Grab the experiment containing human data, and the measurement therein with RNA human <- census$get(\"census_data\")$get(\"homo_sapiens\") human_rna <- human$ms$get(\"RNA\")  # The census-wide datasets datasets_df <- as.data.frame(census$get(\"census_info\")$get(\"datasets\")$read()$concat()) print(datasets_df) #>    soma_joinid                        collection_id #> 1            0 4dca242c-d302-4dba-a68f-4c61e7bad553 #> 2            1 d17249d2-0e6e-4500-abb8-e6c93fa1ac6f #> 3            2 
d17249d2-0e6e-4500-abb8-e6c93fa1ac6f #> 4            3 d17249d2-0e6e-4500-abb8-e6c93fa1ac6f #> 5            4 d17249d2-0e6e-4500-abb8-e6c93fa1ac6f #> 6            5 d17249d2-0e6e-4500-abb8-e6c93fa1ac6f #> 7            6 d17249d2-0e6e-4500-abb8-e6c93fa1ac6f #> 8            7 d17249d2-0e6e-4500-abb8-e6c93fa1ac6f #> 9            8 d17249d2-0e6e-4500-abb8-e6c93fa1ac6f #> 10           9 d17249d2-0e6e-4500-abb8-e6c93fa1ac6f #> 11          10 d17249d2-0e6e-4500-abb8-e6c93fa1ac6f #>                                                                       collection_name #> 1                Comparative transcriptomics reveals human-specific cortical features #> 2  Transcriptomic cytoarchitecture reveals principles of human neocortex organization #> 3  Transcriptomic cytoarchitecture reveals principles of human neocortex organization #> 4  Transcriptomic cytoarchitecture reveals principles of human neocortex organization #> 5  Transcriptomic cytoarchitecture reveals principles of human neocortex organization #> 6  Transcriptomic cytoarchitecture reveals principles of human neocortex organization #> 7  Transcriptomic cytoarchitecture reveals principles of human neocortex organization #> 8  Transcriptomic cytoarchitecture reveals principles of human neocortex organization #> 9  Transcriptomic cytoarchitecture reveals principles of human neocortex organization #> 10 Transcriptomic cytoarchitecture reveals principles of human neocortex organization #> 11 Transcriptomic cytoarchitecture reveals principles of human neocortex organization #>             collection_doi                           dataset_id #> 1  10.1126/science.ade9516 2bdd3a2c-2ff4-4314-adf3-8a06b797a33a #> 2  10.1126/science.adf6812 f5b0810c-1664-4a62-ad06-be1d9964aa8b #> 3  10.1126/science.adf6812 e4ddac12-f48f-4455-8e8d-c2a48a683437 #> 4  10.1126/science.adf6812 e2808a6e-e2ea-41b9-b38c-4a08f1677f02 #> 5  10.1126/science.adf6812 d01c9dff-abd1-4825-bf30-2eb2ba74597e #> 6  10.1126/science.adf6812 
c3aa4f95-7a18-4a7d-8dd8-ca324d714363 #> 7  10.1126/science.adf6812 be401db3-d732-408a-b0c4-71af0458b8ab #> 8  10.1126/science.adf6812 a5d5c529-8a1f-40b5-bda3-35208970070d #> 9  10.1126/science.adf6812 9c63201d-bfd9-41a8-bbbc-18d947556f3d #> 10 10.1126/science.adf6812 93cb76aa-a84b-4a92-8e6c-66a914e26d4c #> 11 10.1126/science.adf6812 8d1dd010-5cbc-43fb-83f8-e0de8e8517da #>                      dataset_version_id #> 1  7eb7f2fd-fd74-4c99-863c-97836415652e #> 2  d4427196-7876-4bdd-a929-ae4d177ec776 #> 3  3280113b-7148-4a3e-98d4-015f443aab8a #> 4  dc092185-3b8e-4fcb-ae21-1dc106d683ac #> 5  c4959ded-83dc-4442-aac7-9a59bdb47801 #> 6  0476ef54-aefe-4754-b0e9-d9fcd75adff4 #> 7  ee027704-72aa-4195-a467-0754db1ed65d #> 8  d47c0742-cea2-46c1-9e72-4d479214041c #> 9  8b09695a-1426-4867-961e-c40a1fbcc2da #> 10 98ad7381-f464-4f49-b850-5321b4f98be6 #> 11 c56683d2-452a-45dc-b402-35397e27e325 #>                                           dataset_title #> 1                               Human: Great apes study #> 2                       Dissection: Angular gyrus (AnG) #> 3                Supercluster: CGE-derived interneurons #> 4               Dissection: Primary auditory cortex(A1) #> 5  Supercluster: Deep layer (non-IT) excitatory neurons #> 6        Supercluster: IT-projecting excitatory neurons #> 7           Dissection: Anterior cingulate cortex (ACC) #> 8               Human Multiple Cortical Areas SMART-seq #> 9                Supercluster: MGE-derived interneurons #> 10        Dissection: Primary somatosensory cortex (S1) #> 11                Dissection: Primary visual cortex(V1) #>                            dataset_h5ad_path dataset_total_cell_count #> 1  2bdd3a2c-2ff4-4314-adf3-8a06b797a33a.h5ad                   156285 #> 2  f5b0810c-1664-4a62-ad06-be1d9964aa8b.h5ad                   110752 #> 3  e4ddac12-f48f-4455-8e8d-c2a48a683437.h5ad                   129495 #> 4  e2808a6e-e2ea-41b9-b38c-4a08f1677f02.h5ad                   139054 #> 5  
d01c9dff-abd1-4825-bf30-2eb2ba74597e.h5ad                    92969 #> 6  c3aa4f95-7a18-4a7d-8dd8-ca324d714363.h5ad                   638941 #> 7  be401db3-d732-408a-b0c4-71af0458b8ab.h5ad                   135462 #> 8  a5d5c529-8a1f-40b5-bda3-35208970070d.h5ad                    49417 #> 9  9c63201d-bfd9-41a8-bbbc-18d947556f3d.h5ad                   185477 #> 10 93cb76aa-a84b-4a92-8e6c-66a914e26d4c.h5ad                   153159 #> 11 8d1dd010-5cbc-43fb-83f8-e0de8e8517da.h5ad                   241077 #>  [ reached 'max' / getOption(\"max.print\") -- omitted 640 rows ]"},{"path":"/articles/census_dataset_presence.html","id":"fetching-the-dataset-presence-matrix","dir":"Articles","previous_headings":"","what":"Fetching the dataset presence matrix","title":"Genes measured in each cell (dataset presence matrix)","text":"Now let’s fetch dataset presence matrix. convenience, read entire presence matrix (Homo sapiens) sparse matrix. convenience function providing capability: also need var dataframe, read R data frame convenient manipulation:","code":"presence_matrix <- get_presence_matrix(census, \"Homo sapiens\", \"RNA\") print(dim(presence_matrix)) #> NULL var_df <- as.data.frame(human_rna$var$read()$concat()) print(var_df) #>    soma_joinid      feature_id feature_name feature_length      nnz n_measured_obs #> 1            0 ENSG00000233576      HTR3C2P           1057    69370       19581263 #> 2            1 ENSG00000121410         A1BG           3999  5640476       62641311 #> 3            2 ENSG00000268895     A1BG-AS1           3374  3071864       61946057 #> 4            3 ENSG00000148584         A1CF           9603   734347       58195911 #> 5            4 ENSG00000175899          A2M           6318  7894261       62704378 #> 6            5 ENSG00000245105      A2M-AS1           2948  1637794       62086816 #> 7            6 ENSG00000166535        A2ML1           7156  2156616       60911688 #> 8            7 ENSG00000256069        A2MP1           4657   835384    
   23554778 #> 9            8 ENSG00000184389      A3GALT2           1023   439067       53780311 #> 10           9 ENSG00000128274       A4GALT           3358  2432348       62706770 #> 11          10 ENSG00000118017        A4GNT           1779    52430       56117399 #> 12          11 ENSG00000265544         AA06            632   220755       22545140 #> 13          12 ENSG00000081760         AACS          16039 11280800       62842909 #> 14          13 ENSG00000250420       AACSP1           3380   211588       22831831 #> 15          14 ENSG00000114771        AADAC           1632   552258       54941618 #> 16          15 ENSG00000188984      AADACL3           4055    24626       43074608 #>  [ reached 'max' / getOption(\"max.print\") -- omitted 60648 rows ]"},{"path":"/articles/census_dataset_presence.html","id":"identifying-genes-measured-in-a-specific-dataset","dir":"Articles","previous_headings":"","what":"Identifying genes measured in a specific dataset","title":"Genes measured in each cell (dataset presence matrix)","text":"Now dataset table, genes metadata table, dataset presence matrix, can check gene set genes measured specific dataset. Important: presence matrix indexed soma_joinid, positionally indexed. words: first dimension presence matrix dataset’s soma_joinid, stored census_datasets dataframe. second dimension presence matrix feature’s soma_joinid, stored var dataframe. presence matrix method $take() lets slice soma_joinids census_datasets var. 
full presence matrix, slices , can exported regular matrix method $get_one_based_matrix() Let’s find gene \"ENSG00000286096\" measured dataset id \"97a17473-e2b1-4f31-a544-44a60773e2dd\".","code":"# Get soma_joinid for datasets and genes of interest var_joinid <- var_df$soma_joinid[var_df$feature_id == \"ENSG00000286096\"] dataset_joinid <- datasets_df$soma_joinid[datasets_df$dataset_id == \"97a17473-e2b1-4f31-a544-44a60773e2dd\"]  # Slice presence matrix with datasets and genes of interest presence_matrix_slice <- presence_matrix$take(i = dataset_joinid, j = var_joinid)  # Convert presence matrix to regular matrix presence_matrix_slice <- presence_matrix_slice$get_one_based_matrix()  # Find out if the gene is present in this dataset is_present <- presence_matrix_slice[, , drop = TRUE] cat(paste(\"Feature is\", if (is_present) \"present.\" else \"not present.\")) #> Feature is present."},{"path":"/articles/census_dataset_presence.html","id":"identifying-datasets-that-measured-specific-genes","dir":"Articles","previous_headings":"","what":"Identifying datasets that measured specific genes","title":"Genes measured in each cell (dataset presence matrix)","text":"Similarly, can determine datasets measured specific gene set genes.","code":"# Grab the feature's soma_joinid from the var dataframe var_joinid <- var_df$soma_joinid[var_df$feature_id == \"ENSG00000286096\"]  # The presence matrix is indexed by the joinids of the dataset and var dataframes, # so slice out the feature of interest by its joinid. 
presence_matrix_slice <- presence_matrix$take(j = var_joinid)$get_one_based_matrix() measured_datasets <- presence_matrix_slice[, , drop = TRUE] != 0 dataset_joinids <- datasets_df$soma_joinid[measured_datasets]  # From the datasets dataframe, slice out the datasets which have a joinid in the list print(datasets_df[dataset_joinids, ]) #>    soma_joinid                        collection_id #> 63          62 3f50314f-bdc9-40c6-8e4a-b0901ebfbe4c #> 64          63 e5f58829-1a66-40b5-a624-9046778e74f5 #> 65          64 e5f58829-1a66-40b5-a624-9046778e74f5 #> 66          65 e5f58829-1a66-40b5-a624-9046778e74f5 #> 67          66 e5f58829-1a66-40b5-a624-9046778e74f5 #> 69          68 e5f58829-1a66-40b5-a624-9046778e74f5 #> 70          69 e5f58829-1a66-40b5-a624-9046778e74f5 #> 72          71 e5f58829-1a66-40b5-a624-9046778e74f5 #> 73          72 e5f58829-1a66-40b5-a624-9046778e74f5 #> 77          76 e5f58829-1a66-40b5-a624-9046778e74f5 #> 78          77 e5f58829-1a66-40b5-a624-9046778e74f5 #>                                                                                                                             collection_name #> 63 Single-cell sequencing links multiregional immune landscapes and tissue-resident T cells in ccRCC to tumor topology and therapy efficacy #> 64                                                                                                                           Tabula Sapiens #> 65                                                                                                                           Tabula Sapiens #> 66                                                                                                                           Tabula Sapiens #> 67                                                                                                                           Tabula Sapiens #> 69                                                                                                                           Tabula Sapiens 
#> 70                                                                                                                           Tabula Sapiens #> 72                                                                                                                           Tabula Sapiens #> 73                                                                                                                           Tabula Sapiens #> 77                                                                                                                           Tabula Sapiens #> 78                                                                                                                           Tabula Sapiens #>                 collection_doi                           dataset_id #> 63 10.1016/j.ccell.2021.03.007 bd65a70f-b274-4133-b9dd-0d1431b6af34 #> 64     10.1126/science.abl4896 ff45e623-7f5f-46e3-b47d-56be0341f66b #> 65     10.1126/science.abl4896 f01bdd17-4902-40f5-86e3-240d66dd2587 #> 66     10.1126/science.abl4896 e6a11140-2545-46bc-929e-da243eed2cae #> 67     10.1126/science.abl4896 e5c63d94-593c-4338-a489-e1048599e751 #> 69     10.1126/science.abl4896 d77ec7d6-ef2e-49d6-9e79-05b7f8881484 #> 70     10.1126/science.abl4896 cee11228-9f0b-4e57-afe2-cfe15ee56312 #> 72     10.1126/science.abl4896 a2d4d33e-4c62-4361-b80a-9be53d2e50e8 #> 73     10.1126/science.abl4896 a0754256-f44b-4c4a-962c-a552e47d3fdc #> 77     10.1126/science.abl4896 6d41668c-168c-4500-b06a-4674ccf3e19d #> 78     10.1126/science.abl4896 5e5e7a2f-8f1c-42ac-90dc-b4f80f38e84c #>                      dataset_version_id #> 63 71815674-a8cf-4add-95dd-c5d5d1631597 #> 64 0b29f4ce-5e72-4356-b74b-b54714979234 #> 65 bd13c169-af97-4d8f-ba45-7588808c2e48 #> 66 47615a3d-0a9f-4a78-88ef-5cce2a84637d #> 67 ac7714f0-dce2-40ba-9912-324de6c9a77f #> 69 c7679ec2-652d-437a-bded-3ec2344829e4 #> 70 f89fa18f-c32b-4bae-9511-1a4d18f200e1 #> 72 37ada0d2-9970-4ff2-8bcd-41e80ab6e081 #> 73 1cda78aa-f0d9-4d50-96bf-8bc309318802 #> 
77 5297a910-453f-4e3f-af16-e18fd5a79090 #> 78 b783b036-c837-4290-a07d-f6b79a301f59 #>                                                                                                                               dataset_title #> 63 Single-cell sequencing links multiregional immune landscapes and tissue-resident T cells in ccRCC to tumor topology and therapy efficacy #> 64                                                                                                                Tabula Sapiens - Pancreas #> 65                                                                                                          Tabula Sapiens - Salivary_Gland #> 66                                                                                                                   Tabula Sapiens - Heart #> 67                                                                                                                 Tabula Sapiens - Bladder #> 69                                                                                                                Tabula Sapiens - Prostate #> 70                                                                                                                  Tabula Sapiens - Spleen #> 72                                                                                                             Tabula Sapiens - Vasculature #> 73                                                                                                                     Tabula Sapiens - Eye #> 77                                                                                                                   Tabula Sapiens - Liver #> 78                                                                                                                     Tabula Sapiens - Fat #>                            dataset_h5ad_path dataset_total_cell_count #> 63 bd65a70f-b274-4133-b9dd-0d1431b6af34.h5ad                   167283 #> 64 ff45e623-7f5f-46e3-b47d-56be0341f66b.h5ad        
            13497 #> 65 f01bdd17-4902-40f5-86e3-240d66dd2587.h5ad                    27199 #> 66 e6a11140-2545-46bc-929e-da243eed2cae.h5ad                    11505 #> 67 e5c63d94-593c-4338-a489-e1048599e751.h5ad                    24583 #> 69 d77ec7d6-ef2e-49d6-9e79-05b7f8881484.h5ad                    16375 #> 70 cee11228-9f0b-4e57-afe2-cfe15ee56312.h5ad                    34004 #> 72 a2d4d33e-4c62-4361-b80a-9be53d2e50e8.h5ad                    16037 #> 73 a0754256-f44b-4c4a-962c-a552e47d3fdc.h5ad                    10650 #> 77 6d41668c-168c-4500-b06a-4674ccf3e19d.h5ad                     5007 #> 78 5e5e7a2f-8f1c-42ac-90dc-b4f80f38e84c.h5ad                    20263 #>  [ reached 'max' / getOption(\"max.print\") -- omitted 31 rows ]"},{"path":"/articles/census_dataset_presence.html","id":"identifying-all-genes-measured-in-a-dataset","dir":"Articles","previous_headings":"","what":"Identifying all genes measured in a dataset","title":"Genes measured in each cell (dataset presence matrix)","text":"Finally, can find set genes measured cells given dataset.","code":"# Slice the dataset(s) of interest, and get the joinid(s) dataset_joinids <- datasets_df$soma_joinid[datasets_df$collection_id == \"17481d16-ee44-49e5-bcf0-28c0780d8c4a\"]  # Slice the presence matrix by the first dimension, i.e., by dataset presence_matrix_slice <- presence_matrix$take(i = dataset_joinids)$get_one_based_matrix() genes_measured <- Matrix::colSums(presence_matrix_slice) > 0 var_joinids <- var_df$soma_joinid[genes_measured]  print(var_df[var_joinids, ]) #>    soma_joinid      feature_id feature_name feature_length      nnz n_measured_obs #> 1            0 ENSG00000233576      HTR3C2P           1057    69370       19581263 #> 2            1 ENSG00000121410         A1BG           3999  5640476       62641311 #> 3            2 ENSG00000268895     A1BG-AS1           3374  3071864       61946057 #> 4            3 ENSG00000148584         A1CF           9603   734347       58195911 #> 5            4 
ENSG00000175899          A2M           6318  7894261       62704378 #> 6            5 ENSG00000245105      A2M-AS1           2948  1637794       62086816 #> 9            8 ENSG00000184389      A3GALT2           1023   439067       53780311 #> 10           9 ENSG00000128274       A4GALT           3358  2432348       62706770 #> 12          11 ENSG00000265544         AA06            632   220755       22545140 #> 14          13 ENSG00000250420       AACSP1           3380   211588       22831831 #> 16          15 ENSG00000188984      AADACL3           4055    24626       43074608 #> 18          17 ENSG00000240602      AADACP1           2012    29491       23133490 #> 19          18 ENSG00000109576        AADAT           2970  4524608       61559099 #> 20          19 ENSG00000158122       PRXL2C           3098  5424472       55618144 #> 21          20 ENSG00000103591        AAGAB           4138 12427442       62843055 #> 22          21 ENSG00000115977         AAK1          24843 29280566       62664775 #>  [ reached 'max' / getOption(\"max.print\") -- omitted 27195 rows ]"},{"path":"/articles/census_dataset_presence.html","id":"close-the-census","dir":"Articles","previous_headings":"Identifying all genes measured in a dataset","what":"Close the census","title":"Genes measured in each cell (dataset presence matrix)","text":"use, census object closed release memory resources. also closes SOMA objects accessed via top-level census. Closing can automated using .exit(census$close(), add = TRUE) immediately census <- open_soma().","code":"census$close()"},{"path":"/articles/census_datasets.html","id":"fetching-the-datasets-table","dir":"Articles","previous_headings":"","what":"Fetching the datasets table","title":"Census Datasets example","text":"Census contains top-level data frame itemizing datasets contained therein. 
can read SOMADataFrame Arrow Table: R data frame: sum cell counts across datasets match number cells across SOMA experiments (human, mouse).","code":"library(\"cellxgene.census\") census <- open_soma() census_datasets <- census$get(\"census_info\")$get(\"datasets\")$read()$concat() print(census_datasets) #> Table #> 651 rows x 9 columns #> $soma_joinid  #> $collection_id  #> $collection_name  #> $collection_doi  #> $dataset_id  #> $dataset_version_id  #> $dataset_title  #> $dataset_h5ad_path  #> $dataset_total_cell_count  census_datasets <- as.data.frame(census_datasets) print(census_datasets[, c(   \"dataset_id\",   \"dataset_title\",   \"dataset_total_cell_count\" )]) #>                              dataset_id #> 1  2bdd3a2c-2ff4-4314-adf3-8a06b797a33a #> 2  f5b0810c-1664-4a62-ad06-be1d9964aa8b #> 3  e4ddac12-f48f-4455-8e8d-c2a48a683437 #> 4  e2808a6e-e2ea-41b9-b38c-4a08f1677f02 #> 5  d01c9dff-abd1-4825-bf30-2eb2ba74597e #> 6  c3aa4f95-7a18-4a7d-8dd8-ca324d714363 #> 7  be401db3-d732-408a-b0c4-71af0458b8ab #> 8  a5d5c529-8a1f-40b5-bda3-35208970070d #> 9  9c63201d-bfd9-41a8-bbbc-18d947556f3d #> 10 93cb76aa-a84b-4a92-8e6c-66a914e26d4c #> 11 8d1dd010-5cbc-43fb-83f8-e0de8e8517da #> 12 716a4acc-919e-4326-9672-ebe06ede84e6 #> 13 5bdc423a-59e6-457d-aa01-debd2c9c564f #> 14 5346f9c6-755e-4336-94cc-38706ec00c2f #> 15 015c230d-650c-4527-870d-8a805849a382 #> 16 d567b692-c374-4628-a508-8008f6778f22 #> 17 cf83c98a-3791-4537-bbde-a719f6d73c13 #> 18 738942eb-ac72-44ff-a64b-8943b5ecd8d9 #> 19 f8d8b443-bca6-4c3c-9042-669dfb7f8030 #> 20 f5be4b96-f5a3-4c3d-84ac-6f69daf744d5 #> 21 dea1aa78-c0a2-413f-b375-f91cce49e4d0 #> 22 92161459-9103-4379-ae34-73a38eee1d1d #> 23 5829c7ba-697f-418e-8b98-d605b192dc48 #> 24 4dd1cd23-fc4d-4fd1-9709-602540f3ca6f #> 25 2856d06c-0ff9-4e01-bfc9-202b74d0b60f #> 26 251b1a7e-d050-4486-8d50-4c2619eb0f46 #> 27 07760522-707a-4a1c-8891-dbd1226d6b27 #> 28 9fcb0b73-c734-40a5-be9c-ace7eea401c9 #> 29 1a38e762-2465-418f-b81c-6a4bce261c34 #> 30 
f16a8f4d-bc97-43c5-a2f6-bbda952e4c5c #> 31 94c41723-b2c4-4b59-a49a-64c9b851903e #> 32 6ceeaa86-9ceb-4582-b390-6d4dd6ff0572 #> 33 9a64bf99-ebe5-4276-93a8-bee9dff1cd47 #> 34 fc0ceb80-d2d9-47c1-9d78-b0e45c64c500 #> 35 d0ea3ec4-0f3b-4649-9146-1c0b5f303a55 #> 36 b8920ef5-7d22-497b-abca-a7a9eb76d79a #> 37 b1d37bbd-9ae4-4404-b2f9-f2fe66750e4e #> 38 a4e89c26-e8d4-4471-9b06-16a1405880f0 #> 39 a190b2e9-3796-4785-9a2f-013e2a9a43e6 #> 40 9ff9f9ba-016b-4cbb-8899-45dc20860b8b #> 41 9940f951-3dc0-4579-bbb2-2392786e59a3 #> 42 74d584f0-74fc-482e-b944-e76f29c1ab85 #> 43 6f7fd0f1-a2ed-4ff1-80d3-33dde731cbc3 #> 44 6cda07c7-5d7a-41ba-9799-5bb73da25a60 #> 45 646e3e87-e46b-4b12-85b5-8d8589e26088 #> 46 6437bc9c-16cb-46c8-8f79-9a7384a0212a #> 47 58c43cc2-e00e-43c4-94eb-8501369264e1 #> 48 53bc5729-6202-4351-bc99-1f36139e9dc4 #> 49 44c83972-e5d2-4858-ac58-2df9f4bf564b #> 50 2ecc72f8-085f-4e86-8692-771f316c54f6 #> 51 2e5a9b5d-d31b-4e9f-a179-d5d70ba459fb #> 52 1c9f5c6b-73da-4d17-95de-df080ffe0df1 #> 53 100c6145-7b0e-4ba6-81c1-ffebed0d1ac4 #> 54 0ed60482-a34f-4268-b576-d69cc30210f6 #> 55 0eccaf0c-19d2-4900-9962-899378adf8be #> 56 04c94a7d-1133-42c9-bb48-c697bd302a8d #> 57 0374f03c-62e2-4859-8a14-acb00b0627d5 #> 58 03181d87-4769-41e7-8c39-d9a81835f0d2 #> 59 f171db61-e57e-4535-a06a-35d8b6ef8f2b #> 60 ecf2e08e-2032-4a9e-b466-b65b395f4a02 #> 61 74cff64f-9da9-4b2a-9b3b-8a04a1598040 #> 62 5af90777-6760-4003-9dba-8f945fec6fdf #> 63 bd65a70f-b274-4133-b9dd-0d1431b6af34 #> 64 ff45e623-7f5f-46e3-b47d-56be0341f66b #> 65 f01bdd17-4902-40f5-86e3-240d66dd2587 #> 66 e6a11140-2545-46bc-929e-da243eed2cae #> 67 e5c63d94-593c-4338-a489-e1048599e751 #> 68 d8732da6-8d1d-42d9-b625-f2416c30054b #> 69 d77ec7d6-ef2e-49d6-9e79-05b7f8881484 #> 70 cee11228-9f0b-4e57-afe2-cfe15ee56312 #> 71 a357414d-2042-4eb5-95f0-c58604a18bdd #> 72 a2d4d33e-4c62-4361-b80a-9be53d2e50e8 #> 73 a0754256-f44b-4c4a-962c-a552e47d3fdc #> 74 983d5ec9-40e8-4512-9e65-a572a9c486cb #> 75 7357cee7-9f7f-4ab0-8cec-90de8f047e38 #> 76 
6ec405bb-4727-4c6d-ab4e-01fe489af7ea #> 77 6d41668c-168c-4500-b06a-4674ccf3e19d #> 78 5e5e7a2f-8f1c-42ac-90dc-b4f80f38e84c #> 79 55cf0ea3-9d2b-4294-871e-bb4b49a79fc7 #> 80 4f1555bc-4664-46c3-a606-78d34dd10d92 #> 81 2ba40233-8576-4dec-a5f1-2adfa115e2dc #> 82 2423ce2c-3149-4cca-a2ff-cf682ea29b5f #> 83 1c9eb291-6d31-47e1-96b2-129b5e1ae64f #> 84 18eb630b-a754-4111-8cd4-c24ec80aa5ec #> 85 0d2ee4ac-05ee-40b2-afb6-ebb584caa867 #>                                                                                                                               dataset_title #> 1                                                                                                                   Human: Great apes study #> 2                                                                                                           Dissection: Angular gyrus (AnG) #> 3                                                                                                    Supercluster: CGE-derived interneurons #> 4                                                                                                   Dissection: Primary auditory cortex(A1) #> 5                                                                                      Supercluster: Deep layer (non-IT) excitatory neurons #> 6                                                                                            Supercluster: IT-projecting excitatory neurons #> 7                                                                                               Dissection: Anterior cingulate cortex (ACC) #> 8                                                                                                   Human Multiple Cortical Areas SMART-seq #> 9                                                                                                    Supercluster: MGE-derived interneurons #> 10                                                                                            Dissection: Primary somatosensory cortex (S1) #> 
11                                                                                                    Dissection: Primary visual cortex(V1) #> 12                                                                                         Dissection: Dorsolateral prefrontal cortex (DFC) #> 13                                                                                                    Dissection: Primary motor cortex (M1) #> 14                                                                                                         Supercluster: Non-neuronal cells #> 15                                                                                                  Dissection: Middle temporal gyrus (MTG) #> 16                                                                       Combined single cell and single nuclei RNA-Seq data - Heart Global #> 17                                                                                                    Global dataset of infant KMT2Ar B-ALL #> 18                                                                                     Normal immune cells landscape of infant KMT2Ar B-ALL #> 19                                                                                                      Human Human Microglia 10x scRNA-seq #> 20                                                                                                    Human Endothelial cells 10x scRNA-seq #> 21                                                                                                 Human Nurr-Negative Nuclei 10x scRNA-seq #> 22                                                                                                 Human Nurr-Positive Nuclei 10x scRNA-seq #> 23                                                                                                     Human Oligodendrocytes 10x scRNA-seq #> 24                                                                                                            Human OPC Cells 10x scRNA-seq 
#> 25                                                                                                           Human DA Neurons 10x scRNA-seq #> 26                                                                                                       Human Non-DA Neurons 10x scRNA-seq #> 27                                                                                                           Human Astrocytes 10x scRNA-seq #> 28                                                                              An Integrated Single Cell Meta-atlas of Human Periodontitis #> 29                                                                Single-cell analysis of prenatal and postnatal human cortical development #> 30                                                       All - A single-cell transcriptomic atlas characterizes ageing tissues in the mouse #> 31                                                                                    snRNA-seq of human anterior and posterior hippocampus #> 32                                                                                                                        3-prime FGID data #> 33                                                      Single-Cell RNA Sequencing of Breast Tissues: Cell Subtypes and Cancer Risk Factors #> 34                                                                            Sst Chodl - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 35                                                                                  L6b - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 36                                                                              L5/6 NP - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 37                                                                                 Sncg - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 38                                                                                L6 CT - DLPFC: Seattle Alzheimer's Disease Atlas 
(SEA-AD) #> 39                                                                           Lamp5 Lhx6 - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 40                                                                                L4 IT - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 41                                                                      Oligodendrocyte - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 42                                                                            Astrocyte - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 43                                                                       Whole Taxonomy - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 44                                                                                L5 ET - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 45                                                                              L2/3 IT - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 46                                                                                L6 IT - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 47                                                                                  OPC - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 48                                                                                  Vip - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 49                                                                                L5 IT - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 50                                                                          Endothelial - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 51                                                                                 VLMC - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 52                                                                           L6 IT Car3 - DLPFC: Seattle Alzheimer's Disease 
Atlas (SEA-AD) #> 53                                                                        Microglia-PVM - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 54                                                                                Lamp5 - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 55                                                                                 Pax6 - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 56                                                                                Pvalb - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 57                                                                           Chandelier - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 58                                                                                  Sst - DLPFC: Seattle Alzheimer's Disease Atlas (SEA-AD) #> 59                                                                                                                   donor_p13_trophoblasts #> 60                                                                                                                  All donors trophoblasts #> 61                                                                                                     All donors all cell states (in vivo) #> 62                                                                     Single-cell transcriptomic datasets of Renal cell carcinoma patients #> 63 Single-cell sequencing links multiregional immune landscapes and tissue-resident T cells in ccRCC to tumor topology and therapy efficacy #> 64                                                                                                                Tabula Sapiens - Pancreas #> 65                                                                                                          Tabula Sapiens - Salivary_Gland #> 66                                                                                                                   
Tabula Sapiens - Heart #> 67                                                                                                                 Tabula Sapiens - Bladder #> 68                                                                                                                 Tabula Sapiens - Trachea #> 69                                                                                                                Tabula Sapiens - Prostate #> 70                                                                                                                  Tabula Sapiens - Spleen #> 71                                                                                                         Tabula Sapiens - Small_Intestine #> 72                                                                                                             Tabula Sapiens - Vasculature #> 73                                                                                                                     Tabula Sapiens - Eye #> 74                                                                                                                   Tabula Sapiens - Blood #> 75                                                                                                         Tabula Sapiens - Large_Intestine #> 76                                                                                                                  Tabula Sapiens - Uterus #> 77                                                                                                                   Tabula Sapiens - Liver #> 78                                                                                                                     Tabula Sapiens - Fat #> 79                                                                                                                  Tabula Sapiens - Tongue #> 80                                                                                                             
Tabula Sapiens - Bone_Marrow #> 81                                                                                                                 Tabula Sapiens - Mammary #> 82                                                                                                                  Tabula Sapiens - Kidney #> 83                                                                                                                  Tabula Sapiens - Muscle #> 84                                                                                                              Tabula Sapiens - Lymph_Node #> 85                                                                                                                    Tabula Sapiens - Lung #>    dataset_total_cell_count #> 1                    156285 #> 2                    110752 #> 3                    129495 #> 4                    139054 #> 5                     92969 #> 6                    638941 #> 7                    135462 #> 8                     49417 #> 9                    185477 #> 10                   153159 #> 11                   241077 #> 12                   113339 #> 13                   114605 #> 14                   108940 #> 15                   148374 #> 16                   493236 #> 17                   128588 #> 18                    36313 #> 19                    33041 #> 20                    14903 #> 21                   104097 #> 22                    80576 #> 23                   178815 #> 24                    13691 #> 25                    22048 #> 26                    91479 #> 27                    33506 #> 28                   105918 #> 29                   700391 #> 30                   356213 #> 31                   129905 #> 32                    89849 #> 33                    52681 #> 34                     1772 #> 35                    17996 #> 36                    18154 #> 37                    23640 #> 38                    27454 #> 39                    21603 #> 40           
         76195 #> 41                   136076 #> 42                    82936 #> 43                  1309414 #> 44                     3848 #> 45                   317116 #> 46                    44174 #> 47                    27670 #> 48                    95014 #> 49                    97173 #> 50                     2496 #> 51                     4619 #> 52                    13007 #> 53                    40625 #> 54                    52828 #> 55                     8984 #> 56                   109618 #> 57                    14871 #> 58                    71545 #> 59                    31497 #> 60                    67070 #> 61                   286326 #> 62                   270855 #> 63                   167283 #> 64                    13497 #> 65                    27199 #> 66                    11505 #> 67                    24583 #> 68                     9522 #> 69                    16375 #> 70                    34004 #> 71                    12467 #> 72                    16037 #> 73                    10650 #> 74                    50115 #> 75                    13680 #> 76                     7124 #> 77                     5007 #> 78                    20263 #> 79                    15020 #> 80                    12297 #> 81                    11375 #> 82                     9641 #> 83                    30746 #> 84                    53275 #> 85                    35682 #>  [ reached 'max' / getOption(\"max.print\") -- omitted 566 rows ] census_data <- census$get(\"census_data\") all_experiments <- lapply(census_data$to_list(), function(x) census_data$get(x$name)) print(all_experiments) #> $homo_sapiens #>  #>   uri: s3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/soma/census_data/homo_sapiens  #>   arrays: obs*  #>   groups: ms*  #>  #> $mus_musculus #>  #>   uri: s3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/soma/census_data/mus_musculus  #>   arrays: obs*  #>   groups: ms* experiments_total_cells <- 
sum(sapply(all_experiments, function(x) {   nrow(x$obs$read(column_names = c(\"soma_joinid\"))$concat()) }))  print(paste(\"Found\", experiments_total_cells, \"cells in all experiments.\")) #> [1] \"Found 68683222 cells in all experiments.\" print(paste(   \"Found\", sum(as.vector(census_datasets$dataset_total_cell_count)),   \"cells in all datasets.\" )) #> [1] \"Found 68683222 cells in all datasets.\""},{"path":"/articles/census_datasets.html","id":"fetching-the-expression-data-from-a-single-dataset","dir":"Articles","previous_headings":"","what":"Fetching the expression data from a single dataset","title":"Census Datasets example","text":"Let’s pick one dataset slice census, turn Seurat -memory object. (requires Seurat package installed beforehand.) Create query mouse experiment, “RNA” measurement, dataset_id.","code":"census_datasets[census_datasets$dataset_id == \"0bd1a1de-3aee-40e0-b2ec-86c7a30c7149\", ] #>     soma_joinid                        collection_id    collection_name #> 581         580 0b9d8a04-bb9d-44da-aa27-705bb65b54eb Tabula Muris Senis #>                collection_doi                           dataset_id #> 581 10.1038/s41586-020-2496-1 0bd1a1de-3aee-40e0-b2ec-86c7a30c7149 #>                       dataset_version_id #> 581 ff352f35-58a2-4962-b716-649d1f9e9f44 #>                                                                                        dataset_title #> 581 Bone marrow - A single-cell transcriptomic atlas characterizes ageing tissues in the mouse - 10x #>                             dataset_h5ad_path dataset_total_cell_count #> 581 0bd1a1de-3aee-40e0-b2ec-86c7a30c7149.h5ad                    40220 library(\"tiledbsoma\") obs_query <- SOMAAxisQuery$new(   value_filter = \"dataset_id == '0bd1a1de-3aee-40e0-b2ec-86c7a30c7149'\" ) expt_query <- census_data$get(\"mus_musculus\")$axis_query(   measurement_name = \"RNA\",   obs_query = obs_query ) dataset_seurat <- expt_query$to_seurat(c(counts = \"raw\")) print(dataset_seurat) #> An 
object of class Seurat  #> 52417 features across 40220 samples within 1 assay  #> Active assay: RNA (52417 features, 0 variable features) #>  2 layers present: counts, data #>  1 dimensional reduction calculated: scvi"},{"path":"/articles/census_datasets.html","id":"downloading-the-original-source-h5ad-file-of-a-dataset","dir":"Articles","previous_headings":"","what":"Downloading the original source H5AD file of a dataset","title":"Census Datasets example","text":"can use cellxgene.census::get_source_h5ad_uri() API fetch URI pointing H5AD associated dataset_id. H5AD can download CZ CELLxGENE Discover, may contain additional data-submitter provided information included Census. can fetch location cloud directly download system. local H5AD file can used R using SeuratDisk’s anndata converter.","code":"# Option 1: Direct download download_source_h5ad(   dataset_id = \"0bd1a1de-3aee-40e0-b2ec-86c7a30c7149\",   file = \"/tmp/Tabula_Muris_Senis-bone_marrow.h5ad\",   overwrite = TRUE ) # Option 2: Get location and download via preferred method get_source_h5ad_uri(\"0bd1a1de-3aee-40e0-b2ec-86c7a30c7149\") #> $uri #> [1] \"s3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/h5ads/0bd1a1de-3aee-40e0-b2ec-86c7a30c7149.h5ad\" #>  #> $s3_region #> [1] \"us-west-2\""},{"path":"/articles/census_datasets.html","id":"close-the-census","dir":"Articles","previous_headings":"Downloading the original source H5AD file of a dataset","what":"Close the census","title":"Census Datasets example","text":"use, census object closed release memory resources. also closes SOMA objects accessed via top-level census. 
Closing can automated using .exit(census$close(), add = TRUE) immediately census <- open_soma().","code":"census$close()"},{"path":"/articles/census_query_extract.html","id":"opening-the-census","dir":"Articles","previous_headings":"","what":"Opening the census","title":"Querying and fetching the single-cell data and cell/gene metadata","text":"cellxgene.census R package contains convenient API open version Census (default, newest stable version). can learn cellxgene.census methods accessing corresponding documentation, example ?cellxgene.census::open_soma.","code":"library(\"cellxgene.census\") census <- open_soma()"},{"path":"/articles/census_query_extract.html","id":"querying-cell-metadata-obs","dir":"Articles","previous_headings":"","what":"Querying cell metadata (obs)","title":"Querying and fetching the single-cell data and cell/gene metadata","text":"human gene metadata Census, RNA assays, located census$get(\"census_data\")$get(\"homo_sapiens\")$obs. SOMADataFrame can materialized R data frame (tibble) using .data.frame(obs$read()$concat()). mouse cell metadata census$get(\"census_data\")$get(\"mus_musculus\").obs. slicing cell metadata two relevant arguments can passed read(): column_names — character vector indicating metadata columns fetch. Expressions one comparisons Comparisons one       Expressions can combine comparisons using && || op one < | > | <= | >= | == | != %% learn metadata columns available fetching filtering can directly look keys cell metadata. soma_joinid special SOMADataFrame column used join operations. definition columns can found Census schema. can used fetch specific columns specific rows matching condition. latter need know values looking priori. example let’s see possible values available sex. can load cell metadata fetching column sex. can see three different values sex, \"male\", \"female\" \"unknown\". information can fetch cell metatadata specific sex value, example \"unknown\". 
can use column_names value_filter perform specific queries. example let’s fetch disease column cell_type \"B cell\" tissue_general \"lung\".","code":"census$get(\"census_data\")$get(\"homo_sapiens\")$obs$colnames() #>  [1] \"soma_joinid\"                              #>  [2] \"dataset_id\"                               #>  [3] \"assay\"                                    #>  [4] \"assay_ontology_term_id\"                   #>  [5] \"cell_type\"                                #>  [6] \"cell_type_ontology_term_id\"               #>  [7] \"development_stage\"                        #>  [8] \"development_stage_ontology_term_id\"       #>  [9] \"disease\"                                  #> [10] \"disease_ontology_term_id\"                 #> [11] \"donor_id\"                                 #> [12] \"is_primary_data\"                          #> [13] \"self_reported_ethnicity\"                  #> [14] \"self_reported_ethnicity_ontology_term_id\" #> [15] \"sex\"                                      #> [16] \"sex_ontology_term_id\"                     #> [17] \"suspension_type\"                          #> [18] \"tissue\"                                   #> [19] \"tissue_ontology_term_id\"                  #> [20] \"tissue_general\"                           #> [21] \"tissue_general_ontology_term_id\"          #> [22] \"raw_sum\"                                  #> [23] \"nnz\"                                      #> [24] \"raw_mean_nnz\"                             #> [25] \"raw_variance_nnz\"                         #> [26] \"n_measured_vars\" unique(as.data.frame(census$get(\"census_data\")$get(\"homo_sapiens\")$obs$read(column_names = \"sex\")$concat())) #>             sex #> 1          male #> 224      female #> 3747640 unknown as.data.frame(census$get(\"census_data\")$get(\"homo_sapiens\")$obs$read(value_filter = \"sex == 'unknown'\")$concat()) #>   soma_joinid                           dataset_id     assay assay_ontology_term_id #> 1     3747639 
9fcb0b73-c734-40a5-be9c-ace7eea401c9 10x 3' v2            EFO:0009899 #> 2     3747640 9fcb0b73-c734-40a5-be9c-ace7eea401c9 10x 3' v2            EFO:0009899 #> 3     3747641 9fcb0b73-c734-40a5-be9c-ace7eea401c9 10x 3' v2            EFO:0009899 #> 4     3747642 9fcb0b73-c734-40a5-be9c-ace7eea401c9 10x 3' v2            EFO:0009899 #> 5     3747643 9fcb0b73-c734-40a5-be9c-ace7eea401c9 10x 3' v2            EFO:0009899 #> 6     3747644 9fcb0b73-c734-40a5-be9c-ace7eea401c9 10x 3' v2            EFO:0009899 #> 7     3747645 9fcb0b73-c734-40a5-be9c-ace7eea401c9 10x 3' v2            EFO:0009899 #> 8     3747646 9fcb0b73-c734-40a5-be9c-ace7eea401c9 10x 3' v2            EFO:0009899 #> 9     3747647 9fcb0b73-c734-40a5-be9c-ace7eea401c9 10x 3' v2            EFO:0009899 #>    cell_type cell_type_ontology_term_id development_stage #> 1 fibroblast                 CL:0000057 human adult stage #> 2 fibroblast                 CL:0000057 human adult stage #> 3 fibroblast                 CL:0000057 human adult stage #> 4 fibroblast                 CL:0000057 human adult stage #> 5 fibroblast                 CL:0000057 human adult stage #> 6 fibroblast                 CL:0000057 human adult stage #> 7 fibroblast                 CL:0000057 human adult stage #> 8 fibroblast                 CL:0000057 human adult stage #> 9 fibroblast                 CL:0000057 human adult stage #>   development_stage_ontology_term_id disease disease_ontology_term_id #> 1                     HsapDv:0000087  normal             PATO:0000461 #> 2                     HsapDv:0000087  normal             PATO:0000461 #> 3                     HsapDv:0000087  normal             PATO:0000461 #> 4                     HsapDv:0000087  normal             PATO:0000461 #> 5                     HsapDv:0000087  normal             PATO:0000461 #> 6                     HsapDv:0000087  normal             PATO:0000461 #> 7                     HsapDv:0000087  normal             PATO:0000461 #> 8                     HsapDv:0000087 
 normal             PATO:0000461 #> 9                     HsapDv:0000087  normal             PATO:0000461 #>                       donor_id is_primary_data self_reported_ethnicity #> 1 Pagella_GSE161267_GSM4904134            TRUE                 unknown #> 2 Pagella_GSE161267_GSM4904134            TRUE                 unknown #> 3 Pagella_GSE161267_GSM4904134            TRUE                 unknown #> 4 Pagella_GSE161267_GSM4904134            TRUE                 unknown #> 5 Pagella_GSE161267_GSM4904134            TRUE                 unknown #> 6 Pagella_GSE161267_GSM4904134            TRUE                 unknown #> 7 Pagella_GSE161267_GSM4904134            TRUE                 unknown #> 8 Pagella_GSE161267_GSM4904134            TRUE                 unknown #> 9 Pagella_GSE161267_GSM4904134            TRUE                 unknown #>   self_reported_ethnicity_ontology_term_id     sex sex_ontology_term_id suspension_type #> 1                                  unknown unknown              unknown            cell #> 2                                  unknown unknown              unknown            cell #> 3                                  unknown unknown              unknown            cell #> 4                                  unknown unknown              unknown            cell #> 5                                  unknown unknown              unknown            cell #> 6                                  unknown unknown              unknown            cell #> 7                                  unknown unknown              unknown            cell #> 8                                  unknown unknown              unknown            cell #> 9                                  unknown unknown              unknown            cell #>    tissue tissue_ontology_term_id tissue_general tissue_general_ontology_term_id #> 1 gingiva          UBERON:0001828         mucosa                  UBERON:0000344 #> 2 gingiva          UBERON:0001828         mucosa                  
UBERON:0000344 #> 3 gingiva          UBERON:0001828         mucosa                  UBERON:0000344 #> 4 gingiva          UBERON:0001828         mucosa                  UBERON:0000344 #> 5 gingiva          UBERON:0001828         mucosa                  UBERON:0000344 #> 6 gingiva          UBERON:0001828         mucosa                  UBERON:0000344 #> 7 gingiva          UBERON:0001828         mucosa                  UBERON:0000344 #> 8 gingiva          UBERON:0001828         mucosa                  UBERON:0000344 #> 9 gingiva          UBERON:0001828         mucosa                  UBERON:0000344 #>   raw_sum  nnz raw_mean_nnz raw_variance_nnz n_measured_vars #> 1     547  329     1.662614        14.559604           31602 #> 2     982  563     1.744227         5.315247           31602 #> 3   12467 3809     3.273038       109.305683           31602 #> 4    1053  566     1.860424         7.430042           31602 #> 5     548  363     1.509642         2.410818           31602 #> 6     678  429     1.580420        11.379616           31602 #> 7     848  524     1.618321         9.437216           31602 #> 8     935  608     1.537829         4.868418           31602 #> 9     735  485     1.515464         6.213087           31602 #>  [ reached 'max' / getOption(\"max.print\") -- omitted 3301779 rows ] cell_metadata_b_cell <- census$get(\"census_data\")$get(\"homo_sapiens\")$obs$read(   value_filter = \"cell_type == 'B cell' & tissue_general == 'lung'\",   column_names = \"disease\" )  cell_metadata_b_cell <- as.data.frame(cell_metadata_b_cell$concat())  table(cell_metadata_b_cell) #> disease #>                              COVID-19 chronic obstructive pulmonary disease  #>                                  2729                                  6369  #>          hypersensitivity pneumonitis             interstitial lung disease  #>                                    52                                   376  #>                   lung adenocarcinoma             lung large 
cell carcinoma  #>                                 62351                                  1534  #>              lymphangioleiomyomatosis         non-small cell lung carcinoma  #>                                   133                                 17484  #>   non-specific interstitial pneumonia                                normal  #>                                   231                                 25461  #>                 pleomorphic carcinoma                             pneumonia  #>                                  1210                                    50  #>                   pulmonary emphysema                    pulmonary fibrosis  #>                                  1512                                  6798  #>                 pulmonary sarcoidosis             small cell lung carcinoma  #>                                     6                                   583  #>          squamous cell lung carcinoma  #>                                 11920"},{"path":"/articles/census_query_extract.html","id":"querying-gene-metadata-var","dir":"Articles","previous_headings":"","what":"Querying gene metadata (var)","title":"Querying and fetching the single-cell data and cell/gene metadata","text":"human gene metadata Census located census$get(\"census_data\")$get(\"homo_sapiens\")$ms$get(\"RNA\")$var. Similarly cell metadata, SOMADataFrame thus can also use method read(). mouse gene metadata census$get(\"census_data\")$get(\"mus_musculus\")$ms$get(\"RNA\")$var. Let’s take look metadata available column selection row filtering. exception soma_joinid columns defined Census schema. Similarly cell metadata, can use operations learn fetch gene metadata. 
example, get feature_name feature_length genes \"ENSG00000161798\" \"ENSG00000188229\" can following.","code":"census$get(\"census_data\")$get(\"homo_sapiens\")$ms$get(\"RNA\")$var$colnames() #> [1] \"soma_joinid\"    \"feature_id\"     \"feature_name\"   \"feature_length\" \"nnz\"            #> [6] \"n_measured_obs\" var_df <- census$get(\"census_data\")$get(\"homo_sapiens\")$ms$get(\"RNA\")$var$read(   value_filter = \"feature_id %in% c('ENSG00000161798', 'ENSG00000188229')\",   column_names = c(\"feature_name\", \"feature_length\") )  as.data.frame(var_df$concat()) #>   feature_name feature_length #> 1         AQP5           1884 #> 2       TUBB4B           2037"},{"path":"/articles/census_query_extract.html","id":"querying-expression-data-as-seurat","dir":"Articles","previous_headings":"","what":"Querying expression data as Seurat","title":"Querying and fetching the single-cell data and cell/gene metadata","text":"convenient way query fetch expression data use get_seurat method cellxgene.census API. method combines column selection value filtering described obtain slices expression data based metadata queries. method return Seurat object, takes input census object, string organism, cell gene metadata can specify filters column selection described following arguments: obs_column_names — character vector indicating columns select cell metadata. obs_value_filter — expression selection conditions fetch cells meeting criteria. var_column_names — character vector indicating columns select gene metadata. var_value_filter — expression selection conditions fetch genes meeting criteria. example want fetch expression data : Genes \"ENSG00000161798\" \"ENSG00000188229\". \"B cells\" \"lung\" \"COVID-19\". gene metadata adding sex cell metadata. 
full description refer ?cellxgene.census::get_seurat.","code":"library(\"Seurat\")  seurat_obj <- get_seurat(   census, \"Homo sapiens\",   obs_column_names = c(\"cell_type\", \"tissue_general\", \"disease\", \"sex\"),   var_value_filter = \"feature_id %in% c('ENSG00000161798', 'ENSG00000188229')\",   obs_value_filter = \"cell_type == 'B cell' & tissue_general == 'lung' & disease == 'COVID-19'\" ) seurat_obj #> An object of class Seurat  #> 2 features across 2729 samples within 1 assay  #> Active assay: RNA (2 features, 0 variable features) #>  2 layers present: counts, data head(seurat_obj[[]]) #>                 orig.ident nCount_RNA nFeature_RNA cell_type tissue_general  disease #> cell13391229 SeuratProject          0            0    B cell           lung COVID-19 #> cell13393737 SeuratProject          1            1    B cell           lung COVID-19 #> cell13394391 SeuratProject          0            0    B cell           lung COVID-19 #> cell13394897 SeuratProject          0            0    B cell           lung COVID-19 #> cell13395941 SeuratProject          0            0    B cell           lung COVID-19 #> cell13397408 SeuratProject          0            0    B cell           lung COVID-19 #>                  sex #> cell13391229    male #> cell13393737 unknown #> cell13394391    male #> cell13394897 unknown #> cell13395941    male #> cell13397408 unknown head(seurat_obj$RNA[[]]) #>                 feature_name feature_length      nnz n_measured_obs #> ENSG00000161798         AQP5           1884  1029069       58250439 #> ENSG00000188229       TUBB4B           2037 21416107       62655002"},{"path":"/articles/census_query_extract.html","id":"querying-expression-data-as-singlecellexperiment","dir":"Articles","previous_headings":"","what":"Querying expression data as SingleCellExperiment","title":"Querying and fetching the single-cell data and cell/gene metadata","text":"Similarly previous section, get_single_cell_experiment method cellxgene.census API. 
behaves exactly get_seurat returns SingleCellExperiment object. example, repeat query can simply following. full description refer ?cellxgene.census::get_single_cell_experiment.","code":"library(\"SingleCellExperiment\")  sce_obj <- get_single_cell_experiment(   census, \"Homo sapiens\",   obs_column_names = c(\"cell_type\", \"tissue_general\", \"disease\", \"sex\"),   var_value_filter = \"feature_id %in% c('ENSG00000161798', 'ENSG00000188229')\",   obs_value_filter = \"cell_type == 'B cell' & tissue_general == 'lung' & disease == 'COVID-19'\" ) sce_obj #> class: SingleCellExperiment  #> dim: 2 2729  #> metadata(0): #> assays(1): counts #> rownames(2): ENSG00000161798 ENSG00000188229 #> rowData names(4): feature_name feature_length nnz n_measured_obs #> colnames(2729): obs13391229 obs13393737 ... obs54635684 obs54635708 #> colData names(4): cell_type tissue_general disease sex #> reducedDimNames(0): #> mainExpName: RNA #> altExpNames(0): head(colData(sce_obj)) #> DataFrame with 6 rows and 4 columns #>               cell_type tissue_general     disease         sex #>                    #> obs13391229      B cell           lung    COVID-19        male #> obs13393737      B cell           lung    COVID-19     unknown #> obs13394391      B cell           lung    COVID-19        male #> obs13394897      B cell           lung    COVID-19     unknown #> obs13395941      B cell           lung    COVID-19        male #> obs13397408      B cell           lung    COVID-19     unknown head(rowData(sce_obj)) #> DataFrame with 2 rows and 4 columns #>                 feature_name feature_length       nnz n_measured_obs #>                                #> ENSG00000161798         AQP5           1884   1029069       58250439 #> ENSG00000188229       TUBB4B           2037  21416107       62655002"},{"path":"/articles/census_query_extract.html","id":"close-the-census","dir":"Articles","previous_headings":"Querying expression data as SingleCellExperiment","what":"Close the 
census","title":"Querying and fetching the single-cell data and cell/gene metadata","text":"use, census object closed release memory resources. also closes SOMA objects accessed via top-level census. Closing can automated using on.exit(census$close(), add = TRUE) immediately census <- open_soma().","code":"census$close()"},{"path":"/articles/comp_bio_census_info.html","id":"opening-the-census","dir":"Articles","previous_headings":"","what":"Opening the Census","title":"Learning about the CZ CELLxGENE Census","text":"cellxgene.census R package contains convenient open_soma() API open version Census (stable default). can learn cellxgene.census methods accessing corresponding documentation, example ?cellxgene.census::open_soma.","code":"library(\"cellxgene.census\") census <- open_soma()"},{"path":"/articles/comp_bio_census_info.html","id":"census-organization","dir":"Articles","previous_headings":"","what":"Census organization","title":"Learning about the CZ CELLxGENE Census","text":"Census schema defines structure Census. short, can think Census structured collection items stores different pieces information. items parent collection SOMA objects various types can accessed TileDB-SOMA API (documentation). cellxgene.census package contains convenient wrappers TileDB-SOMA API. example function used open Census: cellxgene.census::open_soma().","code":""},{"path":"/articles/comp_bio_census_info.html","id":"main-census-components","dir":"Articles","previous_headings":"Census organization","what":"Main Census components","title":"Learning about the CZ CELLxGENE Census","text":"command created census, SOMACollection, R6 class providing key-value associative map.
get() method can access two top-level collection members, census_info census_data, instances SOMACollection.","code":""},{"path":"/articles/comp_bio_census_info.html","id":"census-summary-info","dir":"Articles","previous_headings":"Census organization","what":"Census summary info","title":"Learning about the CZ CELLxGENE Census","text":"census$get(\"census_info\")$get(\"summary\"): data frame high-level information Census, e.g. build date, total cell count, etc. census$get(\"census_info\")$get(\"datasets\"): data frame datasets CELLxGENE Discover used create Census. census$get(\"census_info\")$get(\"summary_cell_counts\"): data frame cell counts stratified relevant cell metadata Census data Data organism stored independent SOMAExperiment objects specialized form SOMACollection. store data matrix (cell genes), cell metadata, gene metadata, useful components covered notebook. data organized one organism – Homo sapiens: census$get(\"census_data\")$get(\"homo_sapiens\")$obs: Cell metadata census$get(\"census_data\")$get(\"homo_sapiens\")$ms$get(\"RNA\"): Data matrices, currently … census$get(\"census_data\")$get(\"homo_sapiens\")$ms$get(\"RNA\")$X$get(\"raw\"): matrix raw counts SOMASparseNDArray census$get(\"census_data\")$get(\"homo_sapiens\")$ms$get(\"RNA\")$var: Gene Metadata","code":""},{"path":"/articles/comp_bio_census_info.html","id":"cell-metadata","dir":"Articles","previous_headings":"","what":"Cell metadata","title":"Learning about the CZ CELLxGENE Census","text":"can obtain cell metadata variables directly querying columns corresponding SOMADataFrame. variables can used querying Census case want work specific cells. variables defined CELLxGENE dataset schema except following: soma_joinid: SOMA-defined value use join operations. dataset_id: dataset id encoded census$get(\"census_info\")$get(\"datasets\"). 
tissue_general tissue_general_ontology_term_id: high-level tissue mapping.","code":"census$get(\"census_data\")$get(\"homo_sapiens\")$obs$colnames() #>  [1] \"soma_joinid\"                              #>  [2] \"dataset_id\"                               #>  [3] \"assay\"                                    #>  [4] \"assay_ontology_term_id\"                   #>  [5] \"cell_type\"                                #>  [6] \"cell_type_ontology_term_id\"               #>  [7] \"development_stage\"                        #>  [8] \"development_stage_ontology_term_id\"       #>  [9] \"disease\"                                  #> [10] \"disease_ontology_term_id\"                 #> [11] \"donor_id\"                                 #> [12] \"is_primary_data\"                          #> [13] \"self_reported_ethnicity\"                  #> [14] \"self_reported_ethnicity_ontology_term_id\" #> [15] \"sex\"                                      #> [16] \"sex_ontology_term_id\"                     #> [17] \"suspension_type\"                          #> [18] \"tissue\"                                   #> [19] \"tissue_ontology_term_id\"                  #> [20] \"tissue_general\"                           #> [21] \"tissue_general_ontology_term_id\"          #> [22] \"raw_sum\"                                  #> [23] \"nnz\"                                      #> [24] \"raw_mean_nnz\"                             #> [25] \"raw_variance_nnz\"                         #> [26] \"n_measured_vars\""},{"path":"/articles/comp_bio_census_info.html","id":"gene-metadata","dir":"Articles","previous_headings":"","what":"Gene metadata","title":"Learning about the CZ CELLxGENE Census","text":"Similarly, can obtain gene metadata variables directly querying columns corresponding SOMADataFrame. variables can use querying Census case specific genes interested . variables defined CELLxGENE dataset schema except following: soma_joinid: SOMA-defined value use join operations. 
feature_length: length base pairs gene.","code":"census$get(\"census_data\")$get(\"homo_sapiens\")$ms$get(\"RNA\")$var$colnames() #> [1] \"soma_joinid\"    \"feature_id\"     \"feature_name\"   \"feature_length\" \"nnz\"            #> [6] \"n_measured_obs\""},{"path":"/articles/comp_bio_census_info.html","id":"census-summary-content-tables","dir":"Articles","previous_headings":"","what":"Census summary content tables","title":"Learning about the CZ CELLxGENE Census","text":"can take quick look high-level Census information looking census$get(\"census_info\")$get(\"summary\"): special interest label-value combinations : total_cell_count total number cells Census. unique_cell_count number unique cells, cells may present twice due meta-analysis consortia-like data. number_donors_homo_sapiens number_donors_mus_musculus number individuals human mouse. guaranteed unique one individual ID may present identical different datasets.","code":"as.data.frame(census$get(\"census_info\")$get(\"summary\")$read()$concat()) #>   soma_joinid                      label      value #> 1           0      census_schema_version      1.2.0 #> 2           1          census_build_date 2023-10-23 #> 3           2     dataset_schema_version      3.1.0 #> 4           3           total_cell_count   68683222 #> 5           4          unique_cell_count   40356133 #> 6           5 number_donors_homo_sapiens      15588 #> 7           6 number_donors_mus_musculus       1990"},{"path":"/articles/comp_bio_census_info.html","id":"cell-counts-by-cell-metadata","dir":"Articles","previous_headings":"Census summary content tables","what":"Cell counts by cell metadata","title":"Learning about the CZ CELLxGENE Census","text":"looking census$get(\"census_info\")$get(\"summary_cell_counts\") can get general idea cell counts stratified relevant cell metadata. cell metadata included table, can take look cell gene metadata available sections “Cell metadata” “Gene metadata”.
line retrieves table casts R data frame: combination organism values category cell metadata can take look total_cell_count unique_cell_count cell counts combination. values category specified ontology_term_id label, value’s IDs labels, respectively.","code":"census_counts <- as.data.frame(census$get(\"census_info\")$get(\"summary_cell_counts\")$read()$concat()) head(census_counts) #>   soma_joinid     organism category ontology_term_id unique_cell_count total_cell_count #> 1           0 Homo sapiens      all               na          36227903         62998417 #> 2           1 Homo sapiens    assay      EFO:0008722            264166           279635 #> 3           2 Homo sapiens    assay      EFO:0008780             25652            51304 #> 4           3 Homo sapiens    assay      EFO:0008796             54753            54753 #> 5           4 Homo sapiens    assay      EFO:0008919             89477           206754 #> 6           5 Homo sapiens    assay      EFO:0008931             78750           188248 #>        label #> 1         na #> 2   Drop-seq #> 3     inDrop #> 4   MARS-seq #> 5   Seq-Well #> 6 Smart-seq2"},{"path":"/articles/comp_bio_census_info.html","id":"example-cell-metadata-included-in-the-summary-counts-table","dir":"Articles","previous_headings":"Census summary content tables > Cell counts by cell metadata","what":"Example: cell metadata included in the summary counts table","title":"Learning about the CZ CELLxGENE Census","text":"get available cell metadata summary counts table can following. 
Remember cell metadata available, variables omitted creation table.","code":"t(table(census_counts$organism, census_counts$category)) #>                           #>                           Homo sapiens Mus musculus #>   all                                1            1 #>   assay                             20           10 #>   cell_type                        631          248 #>   disease                           72            5 #>   self_reported_ethnicity           30            1 #>   sex                                3            3 #>   suspension_type                    1            1 #>   tissue                           230           74 #>   tissue_general                    53           27"},{"path":"/articles/comp_bio_census_info.html","id":"example-cell-counts-for-each-sequencing-assay-in-human-data","dir":"Articles","previous_headings":"Census summary content tables > Cell counts by cell metadata","what":"Example: cell counts for each sequencing assay in human data","title":"Learning about the CZ CELLxGENE Census","text":"get cell counts sequencing assay type human data, can perform following operations:","code":"human_assay_counts <- census_counts[census_counts$organism == \"Homo sapiens\" & census_counts$category == \"assay\", ] human_assay_counts <- human_assay_counts[order(human_assay_counts$total_cell_count, decreasing = TRUE), ]"},{"path":"/articles/comp_bio_census_info.html","id":"example-number-of-microglial-cells-in-the-census","dir":"Articles","previous_headings":"Census summary content tables > Cell counts by cell metadata","what":"Example: number of microglial cells in the Census","title":"Learning about the CZ CELLxGENE Census","text":"specific term categories shown can directly find number cells term.","code":"census_counts[census_counts$label == \"microglial cell\", ] #>      soma_joinid     organism  category ontology_term_id unique_cell_count #> 72            71 Homo sapiens cell_type       CL:0000129            359243 #> 1080      
  1079 Mus musculus cell_type       CL:0000129             48998 #>      total_cell_count           label #> 72             544977 microglial cell #> 1080            75885 microglial cell"},{"path":"/articles/comp_bio_census_info.html","id":"understanding-census-contents-beyond-the-summary-tables","dir":"Articles","previous_headings":"","what":"Understanding Census contents beyond the summary tables","title":"Learning about the CZ CELLxGENE Census","text":"using pre-computed tables census$get(\"census_info\") easy quick way understand contents Census, falls short want learn certain slices Census. example, may want learn : cell types available human liver? total number cells lung datasets stratified sequencing technology? sex distribution cells brain mouse? diseases available T cells? questions can answered directly querying cell metadata shown examples .","code":""},{"path":"/articles/comp_bio_census_info.html","id":"example-all-cell-types-available-in-human","dir":"Articles","previous_headings":"Understanding Census contents beyond the summary tables","what":"Example: all cell types available in human","title":"Learning about the CZ CELLxGENE Census","text":"exemplify process accessing slicing cell metadata summary stats, let’s start trivial example take look human cell types available Census: number rows total number cells humans. Now, wish get cell counts per cell type can work data frame. addition, focus cells marked is_primary_data=TRUE ensures de-duplicate cells appear CELLxGENE Discover. number unique cells. Now let’s look counts per cell type: shows abundant cell types “glutamatergic neuron”, “CD8-positive, alpha-beta T cell”, “CD4-positive, alpha-beta T cell”. Now let’s take look number unique cell types: total number different cell types human. information example can quickly obtained summary table census$get(\"census_info\")$get(\"summary_cell_counts\").
examples complex can achieved accessing cell metadata.","code":"obs_df <- census$get(\"census_data\")$get(\"homo_sapiens\")$obs$read(column_names = c(\"cell_type\", \"is_primary_data\")) as.data.frame(obs_df$concat()) #>                            cell_type is_primary_data #> 1                    oligodendrocyte           FALSE #> 2     oligodendrocyte precursor cell           FALSE #> 3   astrocyte of the cerebral cortex           FALSE #> 4   astrocyte of the cerebral cortex           FALSE #> 5   astrocyte of the cerebral cortex           FALSE #> 6     oligodendrocyte precursor cell           FALSE #> 7   astrocyte of the cerebral cortex           FALSE #> 8                    microglial cell           FALSE #> 9   astrocyte of the cerebral cortex           FALSE #> 10  astrocyte of the cerebral cortex           FALSE #> 11  astrocyte of the cerebral cortex           FALSE #> 12  astrocyte of the cerebral cortex           FALSE #> 13  astrocyte of the cerebral cortex           FALSE #> 14  astrocyte of the cerebral cortex           FALSE #> 15  astrocyte of the cerebral cortex           FALSE #> 16    oligodendrocyte precursor cell           FALSE #> 17                   oligodendrocyte           FALSE #> 18  astrocyte of the cerebral cortex           FALSE #> 19  astrocyte of the cerebral cortex           FALSE #> 20  astrocyte of the cerebral cortex           FALSE #> 21  astrocyte of the cerebral cortex           FALSE #> 22  astrocyte of the cerebral cortex           FALSE #> 23    oligodendrocyte precursor cell           FALSE #> 24  astrocyte of the cerebral cortex           FALSE #> 25  astrocyte of the cerebral cortex           FALSE #> 26    oligodendrocyte precursor cell           FALSE #> 27                   microglial cell           FALSE #> 28                   oligodendrocyte           FALSE #> 29  astrocyte of the cerebral cortex           FALSE #> 30  cerebral cortex endothelial cell           FALSE #> 31                   microglial cell       
    FALSE #> 32                   microglial cell           FALSE #> 33                   microglial cell           FALSE #> 34                   oligodendrocyte           FALSE #> 35                   oligodendrocyte           FALSE #> 36                   microglial cell           FALSE #> 37                   oligodendrocyte           FALSE #> 38                   oligodendrocyte           FALSE #> 39  astrocyte of the cerebral cortex           FALSE #> 40                   oligodendrocyte           FALSE #> 41  astrocyte of the cerebral cortex           FALSE #> 42                   oligodendrocyte           FALSE #> 43    oligodendrocyte precursor cell           FALSE #> 44                   oligodendrocyte           FALSE #> 45  astrocyte of the cerebral cortex           FALSE #> 46    oligodendrocyte precursor cell           FALSE #> 47                   oligodendrocyte           FALSE #> 48    oligodendrocyte precursor cell           FALSE #> 49  astrocyte of the cerebral cortex           FALSE #> 50  astrocyte of the cerebral cortex           FALSE #> 51  astrocyte of the cerebral cortex           FALSE #> 52                   oligodendrocyte           FALSE #> 53                   oligodendrocyte           FALSE #> 54                   oligodendrocyte           FALSE #> 55  astrocyte of the cerebral cortex           FALSE #> 56  cerebral cortex endothelial cell           FALSE #> 57                   oligodendrocyte           FALSE #> 58                   oligodendrocyte           FALSE #> 59                   oligodendrocyte           FALSE #> 60                   microglial cell           FALSE #> 61                   microglial cell           FALSE #> 62    oligodendrocyte precursor cell           FALSE #> 63    oligodendrocyte precursor cell           FALSE #> 64                   oligodendrocyte           FALSE #> 65    oligodendrocyte precursor cell           FALSE #> 66                   oligodendrocyte           FALSE #> 67  astrocyte of the 
cerebral cortex           FALSE #> 68                   oligodendrocyte           FALSE #> 69    oligodendrocyte precursor cell           FALSE #> 70                   oligodendrocyte           FALSE #> 71  astrocyte of the cerebral cortex           FALSE #> 72  astrocyte of the cerebral cortex           FALSE #> 73  astrocyte of the cerebral cortex           FALSE #> 74    oligodendrocyte precursor cell           FALSE #> 75  astrocyte of the cerebral cortex           FALSE #> 76    oligodendrocyte precursor cell           FALSE #> 77                   microglial cell           FALSE #> 78                   microglial cell           FALSE #> 79    oligodendrocyte precursor cell           FALSE #> 80                   oligodendrocyte           FALSE #> 81                   oligodendrocyte           FALSE #> 82  astrocyte of the cerebral cortex           FALSE #> 83                   oligodendrocyte           FALSE #> 84  astrocyte of the cerebral cortex           FALSE #> 85  astrocyte of the cerebral cortex           FALSE #> 86                   oligodendrocyte           FALSE #> 87  astrocyte of the cerebral cortex           FALSE #> 88                   oligodendrocyte           FALSE #> 89    oligodendrocyte precursor cell           FALSE #> 90    oligodendrocyte precursor cell           FALSE #> 91  astrocyte of the cerebral cortex           FALSE #> 92  astrocyte of the cerebral cortex           FALSE #> 93  astrocyte of the cerebral cortex           FALSE #> 94                   oligodendrocyte           FALSE #> 95  astrocyte of the cerebral cortex           FALSE #> 96  astrocyte of the cerebral cortex           FALSE #> 97                   oligodendrocyte           FALSE #> 98                   oligodendrocyte           FALSE #> 99    oligodendrocyte precursor cell           FALSE #> 100                  oligodendrocyte           FALSE #> 101                  oligodendrocyte           FALSE #> 102                  oligodendrocyte           FALSE #> 103 
astrocyte of the cerebral cortex           FALSE #> 104   oligodendrocyte precursor cell           FALSE #> 105                  oligodendrocyte           FALSE #> 106   oligodendrocyte precursor cell           FALSE #> 107                  oligodendrocyte           FALSE #> 108                  oligodendrocyte           FALSE #> 109                  oligodendrocyte           FALSE #> 110                  oligodendrocyte           FALSE #> 111   oligodendrocyte precursor cell           FALSE #> 112                  oligodendrocyte           FALSE #> 113                  oligodendrocyte           FALSE #> 114 astrocyte of the cerebral cortex           FALSE #> 115                  oligodendrocyte           FALSE #> 116 astrocyte of the cerebral cortex           FALSE #> 117                  oligodendrocyte           FALSE #> 118                  oligodendrocyte           FALSE #> 119                  oligodendrocyte           FALSE #> 120 astrocyte of the cerebral cortex           FALSE #> 121 astrocyte of the cerebral cortex           FALSE #> 122   oligodendrocyte precursor cell           FALSE #> 123                  microglial cell           FALSE #> 124 astrocyte of the cerebral cortex           FALSE #> 125 astrocyte of the cerebral cortex           FALSE #> 126                  microglial cell           FALSE #> 127 cerebral cortex endothelial cell           FALSE #> 128   oligodendrocyte precursor cell           FALSE #>  [ reached 'max' / getOption(\"max.print\") -- omitted 62998289 rows ] obs_df <- census$get(\"census_data\")$get(\"homo_sapiens\")$obs$read(   column_names = \"cell_type\",   value_filter = \"is_primary_data == TRUE\" )  obs_df <- as.data.frame(obs_df$concat()) nrow(obs_df) #> [1] 36227903 human_cell_type_counts <- table(obs_df$cell_type) sort(human_cell_type_counts, decreasing = TRUE)[1:10] #>  #>                                                             neuron  #>                                                            2815336  #>     
                                          glutamatergic neuron  #>                                                            1563446  #>                                    CD4-positive, alpha-beta T cell  #>                                                            1243885  #>                                    CD8-positive, alpha-beta T cell  #>                                                            1197715  #> L2/3-6 intratelencephalic projecting glutamatergic cortical neuron  #>                                                            1123360  #>                                                    oligodendrocyte  #>                                                            1063874  #>                                                 classical monocyte  #>                                                            1030996  #>                                                        native cell  #>                                                            1011949  #>                                                             B cell  #>                                                             934060  #>                                                natural killer cell  #>                                                             770637 length(human_cell_type_counts) #> [1] 610"},{"path":"/articles/comp_bio_census_info.html","id":"example-cell-types-available-in-human-liver","dir":"Articles","previous_headings":"Understanding Census contents beyond the summary tables","what":"Example: cell types available in human liver","title":"Learning about the CZ CELLxGENE Census","text":"Similar example , can learn cell types available specific tissue, e.g. liver. achieve goal just need limit cell metadata tissue. use information cell metadata variable tissue_general. 
variable contains high-level tissue label cells Census: cell types cell counts human liver.","code":"obs_liver_df <- census$get(\"census_data\")$get(\"homo_sapiens\")$obs$read(   column_names = \"cell_type\",   value_filter = \"is_primary_data == TRUE & tissue_general == 'liver'\" )  obs_liver_df <- as.data.frame(obs_liver_df$concat())  sort(table(obs_liver_df$cell_type), decreasing = TRUE)[1:10] #>  #>                          T cell                     hepatoblast  #>                           85739                           58447  #>                 neoplastic cell                    erythroblast  #>                           52431                           45605  #>                        monocyte                      hepatocyte  #>                           31388                           28309  #>             natural killer cell    periportal region hepatocyte  #>                           26871                           23509  #>                      macrophage centrilobular region hepatocyte  #>                           16707                           15819"},{"path":"/articles/comp_bio_census_info.html","id":"example-diseased-t-cells-in-human-tissues","dir":"Articles","previous_headings":"Understanding Census contents beyond the summary tables","what":"Example: diseased T cells in human tissues","title":"Learning about the CZ CELLxGENE Census","text":"example going get counts diseased cells annotated T cells. 
sake example focus “CD8-positive, alpha-beta T cell” “CD4-positive, alpha-beta T cell”: cell counts annotated indicated disease across human tissues “CD8-positive, alpha-beta T cell” “CD4-positive, alpha-beta T cell”.","code":"obs_t_cells_df <- census$get(\"census_data\")$get(\"homo_sapiens\")$obs$read(   column_names = c(\"disease\", \"tissue_general\"),   value_filter = \"is_primary_data == TRUE & disease != 'normal' & cell_type %in% c('CD8-positive, alpha-beta T cell', 'CD4-positive, alpha-beta T cell')\" )  obs_t_cells_df <- as.data.frame(obs_t_cells_df$concat())  print(table(obs_t_cells_df)) #>                                        tissue_general #> disease                                 adrenal gland  blood bone marrow  brain breast #>   COVID-19                                          0 819428           0      0      0 #>   Crohn disease                                     0      0           0      0      0 #>   Down syndrome                                     0      0         181      0      0 #>   breast cancer                                     0      0           0      0   1850 #>   chronic obstructive pulmonary disease             0      0           0      0      0 #>   chronic rhinitis                                  0      0           0      0      0 #>   clear cell renal carcinoma                        0   6548           0      0      0 #>   cystic fibrosis                                   0      0           0      0      0 #>   follicular lymphoma                               0      0           0      0      0 #>   influenza                                         0   8871           0      0      0 #>   interstitial lung disease                         0      0           0      0      0 #>   kidney benign neoplasm                            0      0           0      0      0 #>   kidney oncocytoma                                 0      0           0      0      0 #>   lung adenocarcinoma                             205      0           0   
3274      0 #>   lung large cell carcinoma                         0      0           0      0      0 #>   lymphangioleiomyomatosis                          0      0           0      0      0 #>                                        tissue_general #> disease                                  colon kidney  liver   lung lymph node   nose #>   COVID-19                                   0      0      0  30578          0     13 #>   Crohn disease                          17490      0      0      0          0      0 #>   Down syndrome                              0      0      0      0          0      0 #>   breast cancer                              0      0      0      0          0      0 #>   chronic obstructive pulmonary disease      0      0      0   9382          0      0 #>   chronic rhinitis                           0      0      0      0          0    909 #>   clear cell renal carcinoma                 0  20540      0      0         36      0 #>   cystic fibrosis                            0      0      0      7          0      0 #>   follicular lymphoma                        0      0      0      0       1089      0 #>   influenza                                  0      0      0      0          0      0 #>   interstitial lung disease                  0      0      0   1803          0      0 #>   kidney benign neoplasm                     0     10      0      0          0      0 #>   kidney oncocytoma                          0   2303      0      0          0      0 #>   lung adenocarcinoma                        0      0    507 215013      24969      0 #>   lung large cell carcinoma                  0      0      0   5922          0      0 #>   lymphangioleiomyomatosis                   0      0      0    513          0      0 #>                                        tissue_general #> disease                                 pleural fluid respiratory system saliva #>   COVID-19                                          0                  4     41 #>   Crohn 
disease                                     0                  0      0 #>   Down syndrome                                     0                  0      0 #>   breast cancer                                     0                  0      0 #>   chronic obstructive pulmonary disease             0                  0      0 #>   chronic rhinitis                                  0                  0      0 #>   clear cell renal carcinoma                        0                  0      0 #>   cystic fibrosis                                   0                  0      0 #>   follicular lymphoma                               0                  0      0 #>   influenza                                         0                  0      0 #>   interstitial lung disease                         0                  0      0 #>   kidney benign neoplasm                            0                  0      0 #>   kidney oncocytoma                                 0                  0      0 #>   lung adenocarcinoma                           11558                  0      0 #>   lung large cell carcinoma                         0                  0      0 #>   lymphangioleiomyomatosis                          0                  0      0 #>                                        tissue_general #> disease                                 small intestine vasculature #>   COVID-19                                            0           0 #>   Crohn disease                                   52029           0 #>   Down syndrome                                       0           0 #>   breast cancer                                       0           0 #>   chronic obstructive pulmonary disease               0           0 #>   chronic rhinitis                                    0           0 #>   clear cell renal carcinoma                          0           0 #>   cystic fibrosis                                     0           0 #>   follicular lymphoma                                 0           
0 #>   influenza                                           0           0 #>   interstitial lung disease                           0           0 #>   kidney benign neoplasm                              0           0 #>   kidney oncocytoma                                   0           0 #>   lung adenocarcinoma                                 0           0 #>   lung large cell carcinoma                           0           0 #>   lymphangioleiomyomatosis                            0           0 #>  [ reached getOption(\"max.print\") -- omitted 8 rows ]"},{"path":"/articles/comp_bio_data_integration.html","id":"finding-and-fetching-data-from-mouse-liver-10x-genomics-and-smart-seq2","dir":"Articles","previous_headings":"","what":"Finding and fetching data from mouse liver (10X Genomics and Smart-Seq2)","title":"Integrating multi-dataset slices of data with Seurat","text":"Let’s load packages needed notebook. Now can open Census. notebook use Tabula Muris Senis data liver contains cells 10X Genomics Smart-Seq2 technologies. Let’s query datasets table Census filtering collection_name “Tabula Muris Senis” dataset_title “liver”. Now can use values dataset_id query load Seurat object cells datasets. 
can check cell counts 10X Genomics Smart-Seq2 data looking assay metadata.","code":"library(\"cellxgene.census\") library(\"Seurat\") census <- open_soma() census_datasets <- census$get(\"census_info\")$get(\"datasets\") census_datasets <- census_datasets$read(value_filter = \"collection_name == 'Tabula Muris Senis'\") census_datasets <- as.data.frame(census_datasets$concat())  # Print rows with liver data census_datasets[grep(\"Liver\", census_datasets$dataset_title), ] #>    soma_joinid                        collection_id    collection_name #> 15         583 0b9d8a04-bb9d-44da-aa27-705bb65b54eb Tabula Muris Senis #> 36         605 0b9d8a04-bb9d-44da-aa27-705bb65b54eb Tabula Muris Senis #>               collection_doi                           dataset_id #> 15 10.1038/s41586-020-2496-1 4546e757-34d0-4d17-be06-538318925fcd #> 36 10.1038/s41586-020-2496-1 6202a243-b713-4e12-9ced-c387f8483dea #>                      dataset_version_id #> 15 0a851e26-a629-4e59-9b52-9b4d1ce4440b #> 36 70f4f091-86a9-44e3-a92a-54cee98cc223 #>                                                                                        dataset_title #> 15 Liver - A single-cell transcriptomic atlas characterizes ageing tissues in the mouse - Smart-seq2 #> 36        Liver - A single-cell transcriptomic atlas characterizes ageing tissues in the mouse - 10x #>                            dataset_h5ad_path dataset_total_cell_count #> 15 4546e757-34d0-4d17-be06-538318925fcd.h5ad                     2859 #> 36 6202a243-b713-4e12-9ced-c387f8483dea.h5ad                     7294 tabula_muris_liver_ids <- c(\"4546e757-34d0-4d17-be06-538318925fcd\", \"6202a243-b713-4e12-9ced-c387f8483dea\")  seurat_obj <- get_seurat(   census,   organism = \"Mus musculus\",   obs_value_filter = \"dataset_id %in% tabula_muris_liver_ids\" ) table(seurat_obj$assay) #>  #>  10x 3' v2 Smart-seq2  #>       7294       
2859"},{"path":"/articles/comp_bio_data_integration.html","id":"gene-length-normalization-of-smart-seq2-data-","dir":"Articles","previous_headings":"","what":"Gene-length normalization of Smart-Seq2 data.","title":"Integrating multi-dataset slices of data with Seurat","text":"Smart-seq2 read counts normalized gene length. Lets first get gene lengths var.feature_length. Now can use normalize Smart-seq data. let’s split object assay. normalize Smart-seq slice using gene lengths merge back single object.","code":"smart_seq_gene_lengths <- seurat_obj$RNA[[]]$feature_length seurat_obj.list <- SplitObject(seurat_obj, split.by = \"assay\") seurat_obj.list[[\"Smart-seq2\"]][[\"RNA\"]]@counts <- seurat_obj.list[[\"Smart-seq2\"]][[\"RNA\"]]@counts / smart_seq_gene_lengths seurat_obj <- merge(seurat_obj.list[[1]], seurat_obj.list[[2]])"},{"path":"/articles/comp_bio_data_integration.html","id":"integration-with-seurat","dir":"Articles","previous_headings":"","what":"Integration with Seurat","title":"Integrating multi-dataset slices of data with Seurat","text":"use native integration capabilities Seurat. comprehensive usage best practices Seurat intergation please refer doc site Seurat.","code":""},{"path":"/articles/comp_bio_data_integration.html","id":"inspecting-data-prior-to-integration","dir":"Articles","previous_headings":"Integration with Seurat","what":"Inspecting data prior to integration","title":"Integrating multi-dataset slices of data with Seurat","text":"Let’s take look strength batch effects data. perform embedding visualization via UMAP. Let’s basic data normalization variable gene selection now perform PCA UMAP   can see batch effects strong cells cluster primarily assay cell_type. 
Properly integrated embedding principle cluster primarily cell_type, assay best randomly distributed.","code":"seurat_obj <- SCTransform(seurat_obj) seurat_obj <- FindVariableFeatures(seurat_obj, selection.method = \"vst\", nfeatures = 2000) seurat_obj <- RunPCA(seurat_obj, features = VariableFeatures(object = seurat_obj)) seurat_obj <- RunUMAP(seurat_obj, dims = 1:30) # By assay p1 <- DimPlot(seurat_obj, reduction = \"umap\", group.by = \"assay\") p1 # By cell type p2 <- DimPlot(seurat_obj, reduction = \"umap\", group.by = \"cell_type\") p2"},{"path":"/articles/comp_bio_data_integration.html","id":"data-integration-with-seurat","dir":"Articles","previous_headings":"Integration with Seurat","what":"Data integration with Seurat","title":"Integrating multi-dataset slices of data with Seurat","text":"Whenever query fetch Census data multiple datasets integration needs performed evidenced batch effects observed. paramaters Seurat used notebook selected model run quickly. best practices integration single-cell data using Seurat please refer documentation page. seurat_d reading article integrated cell atlas human lung health disease Sikkema et al. perfomed integration 43 datasets Lung. focus metadata Census can batch information integration.","code":""},{"path":"/articles/comp_bio_data_integration.html","id":"integration-across-datasets-using-dataset_id","dir":"Articles","previous_headings":"Integration with Seurat > Data integration with Seurat","what":"Integration across datasets using dataset_id","title":"Integrating multi-dataset slices of data with Seurat","text":"cells Census annotated dataset come \"dataset_id\". great place start integration. let’s run Seurat integration pipeline. First define model batch set dataset_id. Firs normalize select variable genes seperated batch key dataset_id Now perform integration. Let’s inspect results normalization UMAP visulization. plot UMAP.   Great! can see clustering longer mainly driven assay, albeit still contributing . 
Great! can see clustering longer mainly driven assay, albeit still contributing .","code":"# split the dataset into a list of two seurat objects for each dataset seurat_obj.list <- SplitObject(seurat_obj, split.by = \"dataset_id\")  # normalize each dataset independently seurat_obj.list <- lapply(X = seurat_obj.list, FUN = function(x) {   x <- SCTransform(x) })  # select features for integration features <- SelectIntegrationFeatures(object.list = seurat_obj.list) seurat_obj.list <- PrepSCTIntegration(seurat_obj.list, anchor.features = features) seurat_obj.anchors <- FindIntegrationAnchors(object.list = seurat_obj.list, anchor.features = features, normalization.method = \"SCT\") seurat_obj.combined <- IntegrateData(anchorset = seurat_obj.anchors, normalization.method = \"SCT\") DefaultAssay(seurat_obj.combined) <- \"integrated\"  # Run the standard workflow for visualization and clustering seurat_obj.combined <- ScaleData(seurat_obj.combined, verbose = FALSE) seurat_obj.combined <- RunPCA(seurat_obj.combined, npcs = 30, verbose = FALSE) seurat_obj.combined <- RunUMAP(seurat_obj.combined, reduction = \"pca\", dims = 1:30) # By assay p1 <- DimPlot(seurat_obj.combined, reduction = \"umap\", group.by = \"assay\") p1 # By cell type p2 <- DimPlot(seurat_obj.combined, reduction = \"umap\", group.by = \"cell_type\") p2"},{"path":"/articles/comp_bio_data_integration.html","id":"integration-across-datasets-using-dataset_id-and-controlling-for-batch-using-donor_id","dir":"Articles","previous_headings":"Integration with Seurat > Data integration with Seurat","what":"Integration across datasets using dataset_id and controlling for batch using donor_id","title":"Integrating multi-dataset slices of data with Seurat","text":"Similar dataset_id, cells Census annotated donor_id. definition donor_id depends dataset left discretion data curators. However still rich information can used batch variable integration. 
donor_id guaranteed unique across cells Census, strongly recommend concatenating dataset_id donor_id use batch separator Seurat Now perform integration. inspect new results UMAP. Plot UMAP.   can see using dataset_id donor_id batch cells now mostly cluster cell type.","code":"# split the dataset into a list of two seurat objects for each dataset seurat_obj.list <- SplitObject(seurat_obj, split.by = \"dataset_id\")  # normalize each dataset independently controlling for batch seurat_obj.list <- lapply(X = seurat_obj.list, FUN = function(x) {   x <- SCTransform(x, vars.to.regress = \"donor_id\") })  # select features for integration features <- SelectIntegrationFeatures(object.list = seurat_obj.list) seurat_obj.list <- PrepSCTIntegration(seurat_obj.list, anchor.features = features) seurat_obj.anchors <- FindIntegrationAnchors(object.list = seurat_obj.list, anchor.features = features, normalization.method = \"SCT\") #> Finding all pairwise anchors #> Running CCA #> Merging objects #> Finding neighborhoods #> Finding anchors #>  Found 7161 anchors #> Filtering anchors #>  Retained 4990 anchors seurat_obj.combined <- IntegrateData(anchorset = seurat_obj.anchors, normalization.method = \"SCT\") #> [1] 1 #> Warning: Different cells and/or features from existing assay SCT #> [1] 2 #> Warning: Different cells and/or features from existing assay SCT #> Merging dataset 1 into 2 #> Extracting anchors for merged samples #> Finding integration vectors #> Finding integration vector weights #> Integrating data #> Warning: Assay integrated changing from Assay to SCTAssay  #> Warning: Different cells and/or features from existing assay SCT DefaultAssay(seurat_obj.combined) <- \"integrated\"  # Run the standard workflow for visualization and clustering seurat_obj.combined <- RunPCA(seurat_obj.combined, npcs = 30, verbose = FALSE) seurat_obj.combined <- RunUMAP(seurat_obj.combined, reduction = \"pca\", dims = 1:30) #> 20:51:25 UMAP embedding parameters a = 0.9922 b = 1.112 #> 20:51:25 
Read 10153 rows and found 30 numeric columns #> 20:51:25 Using Annoy for neighbor search, n_neighbors = 30 #> 20:51:25 Building Annoy index with metric = cosine, n_trees = 50 #> 0%   10   20   30   40   50   60   70   80   90   100% #> [----|----|----|----|----|----|----|----|----|----| #> **************************************************| #> 20:51:27 Writing NN index file to temp file /tmp/RtmpHRWXl8/file40a97c51cfd1 #> 20:51:27 Searching Annoy index using 1 thread, search_k = 3000 #> 20:51:31 Annoy recall = 100% #> 20:51:31 Commencing smooth kNN distance calibration using 1 thread with target n_neighbors = 30 #> 20:51:32 Initializing from normalized Laplacian + noise (using RSpectra) #> 20:51:32 Commencing optimization for 200 epochs, with 409958 positive edges #> 20:51:37 Optimization finished # By assay p1 <- DimPlot(seurat_obj.combined, reduction = \"umap\", group.by = \"assay\") p1 # By cell type p2 <- DimPlot(seurat_obj.combined, reduction = \"umap\", group.by = \"cell_type\") p2"},{"path":"/articles/comp_bio_data_integration.html","id":"integration-across-datasets-using-dataset_id-and-controlling-for-batch-using-donor_id-assay_ontology_term_id-suspension_type-","dir":"Articles","previous_headings":"Integration with Seurat > Data integration with Seurat","what":"Integration across datasets using dataset_id and controlling for batch using donor_id + assay_ontology_term_id + suspension_type.","title":"Integrating multi-dataset slices of data with Seurat","text":"cases one dataset may contain multiple assay types /multiple suspension types (cell vs nucleus), important consider metadata batches. Therefore, comprehensive definition batch Census can accomplished combining cell metadata dataset_id, donor_id, assay_ontology_term_id suspension_type, latter encode EFO ids assay types. example, two datasets used contain cells one assay , one suspension type . Thus make difference include metadata part batch. implementation look line","code":"# EXAMPLE, DON'T RUN.  
# split the dataset into a list of seurat objects for each dataset seurat_obj.list <- SplitObject(seurat_obj, split.by = \"dataset_id\")  # normalize each dataset independently controlling for batch seurat_obj.list <- lapply(X = seurat_obj.list, FUN = function(x) {   x <- SCTransform(x, vars.to.regress = c(\"donor_id\", \"assay_ontology_term_id\", \"suspension_type\")) })  # select features for integration features <- SelectIntegrationFeatures(object.list = seurat_obj.list)  # integrate seurat_obj.list <- PrepSCTIntegration(seurat_obj.list, anchor.features = features) seurat_obj.anchors <- FindIntegrationAnchors(object.list = seurat_obj.list, anchor.features = features, normalization.method = \"SCT\") seurat_obj.combined <- IntegrateData(anchorset = seurat_obj.anchors, normalization.method = \"SCT\")"},{"path":"/articles/comp_bio_normalizing_full_gene_sequencing.html","id":"opening-the-census","dir":"Articles","previous_headings":"","what":"Opening the census","title":"Normalizing full-length gene sequencing data","text":"First open Census: can learn cellxgene.census methods accessing corresponding documentation, example ?cellxgene.census::open_soma.","code":"library(\"Seurat\") census <- cellxgene.census::open_soma()"},{"path":"/articles/comp_bio_normalizing_full_gene_sequencing.html","id":"fetching-full-length-example-sequencing-data-smart-seq","dir":"Articles","previous_headings":"","what":"Fetching full-length example sequencing data (Smart-Seq)","title":"Normalizing full-length gene sequencing data","text":"Let’s get example data, case ’ll fetch cells relatively small dataset derived Smart-Seq2 technology performs full-length gene sequencing: Collection: Tabula Muris Senis Dataset: Liver - single-cell transcriptomic atlas characterizes ageing tissues mouse - Smart-seq2 Let’s first find dataset’s id using dataset table Census. Now can use id fetch data. Let’s make sure data contains Smart-Seq2 cells. Great! can see small dataset containing 2,859 cells. 
Now let’s proceed normalize gene lengths.","code":"liver_dataset <- as.data.frame(   census$get(\"census_info\")$get(\"datasets\")   $read(value_filter = \"dataset_title == 'Liver - A single-cell transcriptomic atlas characterizes ageing tissues in the mouse - Smart-seq2'\")   $concat() ) liver_dataset #>   soma_joinid                        collection_id    collection_name #> 1         583 0b9d8a04-bb9d-44da-aa27-705bb65b54eb Tabula Muris Senis #>              collection_doi                           dataset_id #> 1 10.1038/s41586-020-2496-1 4546e757-34d0-4d17-be06-538318925fcd #>                     dataset_version_id #> 1 0a851e26-a629-4e59-9b52-9b4d1ce4440b #>                                                                                       dataset_title #> 1 Liver - A single-cell transcriptomic atlas characterizes ageing tissues in the mouse - Smart-seq2 #>                           dataset_h5ad_path dataset_total_cell_count #> 1 4546e757-34d0-4d17-be06-538318925fcd.h5ad                     2859 liver_dataset_id <- liver_dataset[1, \"dataset_id\"] liver_seurat <- cellxgene.census::get_seurat(   census,   organism = \"Mus musculus\",   obs_value_filter = paste0(\"dataset_id == '\", liver_dataset_id, \"'\") ) table(liver_seurat$assay) #>  #> Smart-seq2  #>       2859"},{"path":"/articles/comp_bio_normalizing_full_gene_sequencing.html","id":"normalizing-expression-to-account-for-gene-length","dir":"Articles","previous_headings":"","what":"Normalizing expression to account for gene length","title":"Normalizing full-length gene sequencing data","text":"default cellxgene_census::get_seurat() fetches genes Census. let’s first identify genes measured dataset subset Seurat obect include . goal can use “Dataset Presence Matrix” census$get(\"census_data\")$get(\"mus_musculus\")$ms$get(\"RNA\")$get(\"feature_dataset_presence_matrix\"). boolean matrix N x M N number datasets, M number genes Census, 1 entry indicates gene measured dataset. 
(Note Seurat objects transposed layout M x N.) Let’s get genes measured dataset. can see genes Census 17,992 measured dataset. Now let’s normalize genes gene length. can easily Census gene lengths included gene metadata feature_length. done! can now see real numbers instead integers.","code":"liver_seurat #> An object of class Seurat  #> 52417 features across 2859 samples within 1 assay  #> Active assay: RNA (52417 features, 0 variable features) #>  2 layers present: counts, data liver_dataset_joinid <- liver_dataset$soma_joinid[1] presence_matrix <- cellxgene.census::get_presence_matrix(census, \"Mus musculus\", \"RNA\") presence_matrix <- presence_matrix$take(liver_dataset_joinid) gene_presence <- as.vector(presence_matrix$get_one_based_matrix())  liver_seurat <- liver_seurat[gene_presence, ] liver_seurat #> An object of class Seurat  #> 17992 features across 2859 samples within 1 assay  #> Active assay: RNA (17992 features, 0 variable features) #>  2 layers present: counts, data GetAssayData(liver_seurat[1:5, 1:5], slot = \"data\") #> Warning: The `slot` argument of `GetAssayData()` is deprecated as of SeuratObject 5.0.0. #> i Please use the `layer` argument instead. #> This warning is displayed once every 8 hours. #> Call `lifecycle::last_lifecycle_warnings()` to see where this warning was generated. #> 5 x 5 sparse Matrix of class \"dgCMatrix\" #>                    cell3959639 cell3959640 cell3959641 cell3959642 cell3959643 #> ENSMUSG00000025900           .           .           .           .           . #> ENSMUSG00000025902           .           .           .           .        2250 #> ENSMUSG00000033845           .         559        1969           .           . #> ENSMUSG00000025903           .           .           .           .           . #> ENSMUSG00000033813           .           .         
828           1          54 gene_lengths <- liver_seurat$RNA@meta.features$feature_length liver_seurat <- SetAssayData(   liver_seurat,   new.data = sweep(GetAssayData(liver_seurat, slot = \"data\"), 1, gene_lengths, \"/\") ) GetAssayData(liver_seurat[1:5, 1:5], slot = \"data\") #> 5 x 5 sparse Matrix of class \"dgCMatrix\" #>                    cell3959639 cell3959640 cell3959641  cell3959642 cell3959643 #> ENSMUSG00000025900           .  .            .         .             .          #> ENSMUSG00000025902           .  .            .         .             0.47150042 #> ENSMUSG00000033845           .  0.06586544   0.2320019 .             .          #> ENSMUSG00000025903           .  .            .         .             .          #> ENSMUSG00000033813           .  .            0.2744448 0.0003314551  0.01789857"},{"path":"/articles/comp_bio_normalizing_full_gene_sequencing.html","id":"validation-through-clustering-exploration","dir":"Articles","previous_headings":"","what":"Validation through clustering exploration","title":"Normalizing full-length gene sequencing data","text":"Let’s perform basic clustering analysis see cell types cluster expected using normalized counts. First basic filtering cells genes. normalize account sequencing depth transform data log scale. subset highly variable genes. finally scale values across gene axis. Now can proceed clustering analysis.  exceptions can see cells cell type cluster near serves sanity check gene-length normalization applied. 
Don’t forget close census.","code":"cells_per_gene <- rowSums(GetAssayData(liver_seurat, slot = \"counts\") > 0) genes_per_cell <- Matrix::colSums(liver_seurat$RNA@counts > 0) liver_seurat <- liver_seurat[cells_per_gene >= 5, genes_per_cell >= 500] liver_seurat <- Seurat::NormalizeData(   liver_seurat,   normalization.method = \"LogNormalize\",   scale.factor = 10000 ) liver_seurat <- Seurat::FindVariableFeatures(   liver_seurat,   selection.method = \"vst\",   nfeatures = 1000 ) all.genes <- rownames(liver_seurat) liver_seurat <- Seurat::ScaleData(liver_seurat, features = all.genes) liver_seurat <- RunPCA(   liver_seurat,   features = VariableFeatures(object = liver_seurat) ) liver_seurat <- FindNeighbors(liver_seurat, dims = 1:40) liver_seurat <- RunUMAP(liver_seurat, dims = 1:40) DimPlot(liver_seurat, reduction = \"umap\", group.by = \"cell_type\") census$close()"},{"path":"/articles/comp_bio_summarize_axis_query.html","id":"opening-the-census","dir":"Articles","previous_headings":"","what":"Opening the Census","title":"Summarizing cell and gene metadata","text":"cellxgene.census R package contains convenient API open version Census (default, newest stable version). open Census, close census$close(). can automated using .exit(census$close(), add = TRUE) immediately census <- open_soma(). can learn cellxgene.census methods accessing corresponding documentation. example ?cellxgene.census::open_soma.","code":"library(\"cellxgene.census\") census <- open_soma()"},{"path":"/articles/comp_bio_summarize_axis_query.html","id":"summarizing-cell-metadata","dir":"Articles","previous_headings":"","what":"Summarizing cell metadata","title":"Summarizing cell and gene metadata","text":"Census open can use TileDB-SOMA methods SOMACollection. can thus access metadata SOMADataFrame objects encoding cell gene metadata. Tips: can read entire SOMADataFrame R using .data.frame(soma_df$read()$concat()). Queries much faster request DataFrame columns required analysis (e.g. 
column_names = c(\"soma_joinid\", \"cell_type_ontology_term_id\")). can also refine query results using value_filter, filter census matching records.","code":""},{"path":"/articles/comp_bio_summarize_axis_query.html","id":"example-summarize-all-cell-types","dir":"Articles","previous_headings":"Summarizing cell metadata","what":"Example: Summarize all cell types","title":"Summarizing cell and gene metadata","text":"example reads cell metadata (obs) R data frame summarize variety ways.","code":"human <- census$get(\"census_data\")$get(\"homo_sapiens\")  # Read obs into an R data frame (tibble). obs_df <- human$obs$read(column_names = c(\"cell_type\")) obs_df <- as.data.frame(obs_df$concat())  # Find all unique values in the cell_type column. unique_cell_type <- unique(obs_df$cell_type)  cat(   \"There are\",   length(unique_cell_type),   \"cell types in the Census! The first few are: \",   paste(head(unique_cell_type), collapse = \", \") ) #> There are 631 cell types in the Census! The first few are:  oligodendrocyte, oligodendrocyte precursor cell, astrocyte of the cerebral cortex, microglial cell, cerebral cortex endothelial cell, vascular leptomeningeal cell"},{"path":"/articles/comp_bio_summarize_axis_query.html","id":"example-summarize-a-subset-of-cell-types-selected-with-a-value_filter","dir":"Articles","previous_headings":"Summarizing cell metadata","what":"Example: Summarize a subset of cell types, selected with a value_filter","title":"Summarizing cell and gene metadata","text":"example utilizes SOMA “value filter” read subset cells tissue_ontology_term_id equal UBERON:0002048 (lung tissue), summarizes query result. can also define much complex value filters. 
example: combine terms & | use %% operator query multiple values","code":"# Read cell_type terms for cells which have a specific tissue term LUNG_TISSUE <- \"UBERON:0002048\"  obs_df <- human$obs$read(column_names = c(\"cell_type\"), value_filter = paste0(\"tissue_ontology_term_id == '\", LUNG_TISSUE, \"'\")) obs_df <- as.data.frame(obs_df$concat())  # Find all unique values in the cell_type column as an R data frame. unique_cell_type <- unique(obs_df$cell_type) cat(   \"There are \",   length(unique_cell_type),   \" cell types in the Census where tissue_ontology_term_id == \",   LUNG_TISSUE,   \"!\\nThe first few are:\",   paste(head(unique_cell_type), collapse = \", \"),   \"\\n\" ) #> There are  185  cell types in the Census where tissue_ontology_term_id ==  UBERON:0002048 ! #> The first few are: type II pneumocyte, neutrophil, effector CD4-positive, alpha-beta T cell, effector CD8-positive, alpha-beta T cell, mature NK T cell, blood vessel endothelial cell  # Report the 10 most common top_10 <- sort(table(obs_df$cell_type), decreasing = TRUE)[1:10] cat(   \"The top 10 cell types where tissue_ontology_term_id ==\",   LUNG_TISSUE,   \"are: \",   paste(names(top_10), collapse = \", \") ) #> The top 10 cell types where tissue_ontology_term_id == UBERON:0002048 are:  native cell, alveolar macrophage, CD8-positive, alpha-beta T cell, CD4-positive, alpha-beta T cell, macrophage, type II pneumocyte, classical monocyte, natural killer cell, malignant cell, epithelial cell of lower respiratory tract # You can also do more complex queries, such as testing for inclusion in a list of values obs_df <- human$obs$read(   column_names = c(\"cell_type_ontology_term_id\"),   value_filter = \"tissue_ontology_term_id %in% c('UBERON:0002082', 'UBERON:OOO2084', 'UBERON:0002080')\" )  obs_df <- as.data.frame(obs_df$concat())  # Summarize top_10 <- sort(table(obs_df$cell_type_ontology_term_id), decreasing = TRUE)[1:10] print(top_10) #>  #> CL:0000746 CL:0008034 CL:0002131 CL:0002548 
CL:0000115 CL:0000763 CL:0000057 CL:0000669  #>     160974      99458      96953      79733      79626      35560      33075      27515  #> CL:0000003 CL:0002144  #>      23613      18593"},{"path":"/articles/comp_bio_summarize_axis_query.html","id":"full-census-metadata-stats","dir":"Articles","previous_headings":"","what":"Full Census metadata stats","title":"Summarizing cell and gene metadata","text":"example queries organisms Census, summarizes diversity various metadata labels.","code":"cols_to_query <- c(   \"cell_type_ontology_term_id\",   \"assay_ontology_term_id\",   \"tissue_ontology_term_id\" )  total_cells <- 0 for (organism in census$get(\"census_data\")$names()) {   print(organism)    obs_df <- census$get(\"census_data\")$get(organism)$obs$read(column_names = cols_to_query)   obs_df <- as.data.frame(obs_df$concat())    total_cells <- total_cells + nrow(obs_df)   for (col in cols_to_query) {     cat(\"  Unique \", col, \" values: \", length(unique(obs_df[[col]])), \"\\n\")   } } #> [1] \"homo_sapiens\" #>   Unique  cell_type_ontology_term_id  values:  631  #>   Unique  assay_ontology_term_id  values:  20  #>   Unique  tissue_ontology_term_id  values:  230  #> [1] \"mus_musculus\" #>   Unique  cell_type_ontology_term_id  values:  248  #>   Unique  assay_ontology_term_id  values:  10  #>   Unique  tissue_ontology_term_id  values:  74 cat(\"Complete Census contains \", total_cells, \" cells.\") #> Complete Census contains  68683222  cells."},{"path":"/articles/comp_bio_summarize_axis_query.html","id":"close-the-census","dir":"Articles","previous_headings":"Full Census metadata stats","what":"Close the census","title":"Summarizing cell and gene metadata","text":"use, census object closed release memory resources. also closes SOMA objects accessed via top-level census. 
Closing can automated using .exit(census$close(), add = TRUE) immediately census <- open_soma().","code":"census$close()"},{"path":"/authors.html","id":null,"dir":"","previous_headings":"","what":"Authors","title":"Authors and Citation","text":"Chan Zuckerberg Initiative Foundation. Author, maintainer, copyright holder, funder.","code":""},{"path":"/authors.html","id":"citation","dir":"","previous_headings":"","what":"Citation","title":"Authors and Citation","text":"Chan Zuckerberg Initiative Foundation (2024). cellxgene.census: CZ CELLxGENE Discover Cell Census. R package version 1.13.0, https://github.com/chanzuckerberg/cellxgene-census.","code":"@Manual{,   title = {cellxgene.census: CZ CELLxGENE Discover Cell Census},   author = {{Chan Zuckerberg Initiative Foundation}},   year = {2024},   note = {R package version 1.13.0},   url = {https://github.com/chanzuckerberg/cellxgene-census}, }"},{"path":"/index.html","id":"r-package-of-cz-cellxgene-discover-census","dir":"","previous_headings":"","what":"CZ CELLxGENE Discover Cell Census","title":"CZ CELLxGENE Discover Cell Census","text":"documentation R package cellxgene.census part CZ CELLxGENE Discover Census. full details Census data capabilities please go main Census site. cellxgene.census provides API efficiently access cloud-hosted Census single-cell data R. just seconds users can access slice Census data using cell gene filters across hundreds single-cell datasets. Census data can fetched iterative fashion bigger--memory slices data, quickly exported basic R structures, well Seurat SingleCellExperiment objects downstream analysis.","code":""},{"path":"/index.html","id":"installation","dir":"","previous_headings":"","what":"Installation","title":"CZ CELLxGENE Discover Cell Census","text":"installing Ubuntu, may need install following libraries via apt install, libxml2-dev libssl-dev libcurl4-openssl-dev. addition must cmake v3.21 greater. installing MacOS, need install developer tools Xcode. Windows supported. 
R session install cellxgene.census R-Universe. able export Census data Seurat SingleCellExperiment also need install respective packages.","code":"install.packages(   \"cellxgene.census\",   repos=c('https://chanzuckerberg.r-universe.dev', 'https://cloud.r-project.org') ) # Seurat install.packages(\"Seurat\")  # SingleCellExperiment if (!require(\"BiocManager\", quietly = TRUE))     install.packages(\"BiocManager\")  BiocManager::install(\"SingleCellExperiment\")"},{"path":"/index.html","id":"usage","dir":"","previous_headings":"","what":"Usage","title":"CZ CELLxGENE Discover Cell Census","text":"Check vignettes “Articles” section navigation bar site. highly recommend following vignettes starting point: Querying fetching single-cell data cell/gene metadata Learning CZ CELLxGENE Discover Census can also check quick start guide main Census site.","code":""},{"path":"/index.html","id":"example-seurat-and-singlecellexperiment-query","dir":"","previous_headings":"Usage","what":"Example Seurat and SingleCellExperiment query","title":"CZ CELLxGENE Discover Cell Census","text":"following creates Seurat object -demand sympathetic neurons Census filtering genes ENSG00000161798, ENSG00000188229. 
following retrieves data SingleCellExperiment object.","code":"library(\"cellxgene.census\") library(\"Seurat\")  census <- open_soma()  organism <- \"Homo sapiens\" gene_filter <- \"feature_id %in% c('ENSG00000107317', 'ENSG00000106034')\" cell_filter <-  \"cell_type == 'sympathetic neuron'\" cell_columns <- c(\"assay\", \"cell_type\", \"tissue\", \"tissue_general\", \"suspension_type\", \"disease\")  seurat_obj <- get_seurat(    census = census,    organism = organism,    var_value_filter = gene_filter,    obs_value_filter = cell_filter,    obs_column_names = cell_columns ) library(\"SingleCellExperiment\")  sce_obj <- get_single_cell_experiment(    census = census,    organism = organism,    var_value_filter = gene_filter,    obs_value_filter = cell_filter,    obs_column_names = cell_columns )"},{"path":"/index.html","id":"for-more-help","dir":"","previous_headings":"","what":"For More Help","title":"CZ CELLxGENE Discover Cell Census","text":"help, please go visit main Census site. believe found security issue, appreciate notification. Please send email security@chanzuckerberg.com.","code":""},{"path":"/reference/download_source_h5ad.html","id":null,"dir":"Reference","previous_headings":"","what":"Download source H5AD to local file name. — download_source_h5ad","title":"Download source H5AD to local file name. — download_source_h5ad","text":"Download source H5AD local file name.","code":""},{"path":"/reference/download_source_h5ad.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Download source H5AD to local file name. — download_source_h5ad","text":"","code":"download_source_h5ad(   dataset_id,   file,   overwrite = FALSE,   census_version = \"stable\",   census = NULL )"},{"path":"/reference/download_source_h5ad.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Download source H5AD to local file name. — download_source_h5ad","text":"dataset_id dataset_id interest. 
file Local file name store H5AD file. overwrite TRUE allow overwriting existing file. census_version desired Census version. census open Census handle census_version. provided, opened closed automatically; efficient reuse handle calling download_source_h5ad() multiple times.","code":""},{"path":"/reference/download_source_h5ad.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Download source H5AD to local file name. — download_source_h5ad","text":"","code":"download_source_h5ad(\"0895c838-e550-48a3-a777-dbcd35d30272\", \"/tmp/data.h5ad\", overwrite = TRUE)"},{"path":"/reference/get_census_mirror.html","id":null,"dir":"Reference","previous_headings":"","what":"Get locator information about a Census mirror — get_census_mirror","title":"Get locator information about a Census mirror — get_census_mirror","text":"Get locator information Census mirror","code":""},{"path":"/reference/get_census_mirror.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get locator information about a Census mirror — get_census_mirror","text":"","code":"get_census_mirror(mirror)"},{"path":"/reference/get_census_mirror.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get locator information about a Census mirror — get_census_mirror","text":"mirror Name mirror.","code":""},{"path":"/reference/get_census_mirror.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get locator information about a Census mirror — get_census_mirror","text":"List mirror information","code":""},{"path":"/reference/get_census_mirror.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get locator information about a Census mirror — get_census_mirror","text":"","code":"get_census_mirror(\"AWS-S3-us-west-2\") #> $provider #> [1] \"S3\" #>  #> $base_uri #> [1] \"s3://cellxgene-census-public-us-west-2/\" #>  #> $region #> [1] 
\"us-west-2\" #>  #> $alias #> [1] \"\" #>"},{"path":"/reference/get_census_mirror_directory.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the directory of Census mirrors currently available — get_census_mirror_directory","title":"Get the directory of Census mirrors currently available — get_census_mirror_directory","text":"Get directory Census mirrors currently available","code":""},{"path":"/reference/get_census_mirror_directory.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the directory of Census mirrors currently available — get_census_mirror_directory","text":"","code":"get_census_mirror_directory()"},{"path":"/reference/get_census_mirror_directory.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the directory of Census mirrors currently available — get_census_mirror_directory","text":"Nested list information available mirrors","code":""},{"path":"/reference/get_census_mirror_directory.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get the directory of Census mirrors currently available — get_census_mirror_directory","text":"","code":"get_census_mirror_directory() #> $default #> $default$provider #> [1] \"S3\" #>  #> $default$base_uri #> [1] \"s3://cellxgene-census-public-us-west-2/\" #>  #> $default$region #> [1] \"us-west-2\" #>  #> $default$alias #> [1] \"default\" #>  #>  #> $`AWS-S3-us-west-2` #> $`AWS-S3-us-west-2`$provider #> [1] \"S3\" #>  #> $`AWS-S3-us-west-2`$base_uri #> [1] \"s3://cellxgene-census-public-us-west-2/\" #>  #> $`AWS-S3-us-west-2`$region #> [1] \"us-west-2\" #>  #> $`AWS-S3-us-west-2`$alias #> [1] \"\" #>  #>"},{"path":"/reference/get_census_version_description.html","id":null,"dir":"Reference","previous_headings":"","what":"Get release description for a Census version — get_census_version_description","title":"Get release description for a Census version — 
get_census_version_description","text":"Get release description Census version","code":""},{"path":"/reference/get_census_version_description.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get release description for a Census version — get_census_version_description","text":"","code":"get_census_version_description(census_version)"},{"path":"/reference/get_census_version_description.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Get release description for a Census version — get_census_version_description","text":"census_version census version name.","code":""},{"path":"/reference/get_census_version_description.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get release description for a Census version — get_census_version_description","text":"List release location metadata","code":""},{"path":"/reference/get_census_version_description.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get release description for a Census version — get_census_version_description","text":"","code":"as.data.frame(get_census_version_description(\"stable\")) #>   release_date release_build #> 1                 2023-12-15 #>                                                              soma.uri #> 1 s3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/soma/ #>               soma.relative_uri soma.s3_region #> 1 /cell-census/2023-12-15/soma/      us-west-2 #>                                                              h5ads.uri #> 1 s3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/h5ads/ #>               h5ads.relative_uri h5ads.s3_region do_not_delete  lts  alias #> 1 /cell-census/2023-12-15/h5ads/       us-west-2          TRUE TRUE stable #>   census_version #> 1         stable"},{"path":"/reference/get_census_version_directory.html","id":null,"dir":"Reference","previous_headings":"","what":"Get the 
directory of Census releases currently available — get_census_version_directory","title":"Get the directory of Census releases currently available — get_census_version_directory","text":"Get directory Census releases currently available","code":""},{"path":"/reference/get_census_version_directory.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Get the directory of Census releases currently available — get_census_version_directory","text":"","code":"get_census_version_directory()"},{"path":"/reference/get_census_version_directory.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Get the directory of Census releases currently available — get_census_version_directory","text":"Data frame available cell census releases, including location metadata.","code":""},{"path":"/reference/get_census_version_directory.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Get the directory of Census releases currently available — get_census_version_directory","text":"","code":"get_census_version_directory() #>            release_date release_build #> stable                     2023-12-15 #> latest                     2024-04-01 #> 2023-05-15                 2023-05-15 #> 2023-07-25                 2023-07-25 #> 2023-12-15                 2023-12-15 #> 2024-03-04                 2024-03-04 #> 2024-03-11                 2024-03-11 #> 2024-03-12                 2024-03-12 #> 2024-03-18                 2024-03-18 #> 2024-03-25                 2024-03-25 #> 2024-03-26                 2024-03-26 #> 2024-04-01                 2024-04-01 #>                                                                       soma.uri #> stable     s3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/soma/ #> latest     s3://cellxgene-census-public-us-west-2/cell-census/2024-04-01/soma/ #> 2023-05-15 s3://cellxgene-census-public-us-west-2/cell-census/2023-05-15/soma/ #> 2023-07-25 
s3://cellxgene-census-public-us-west-2/cell-census/2023-07-25/soma/ #> 2023-12-15 s3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/soma/ #> 2024-03-04 s3://cellxgene-census-public-us-west-2/cell-census/2024-03-04/soma/ #> 2024-03-11 s3://cellxgene-census-public-us-west-2/cell-census/2024-03-11/soma/ #> 2024-03-12 s3://cellxgene-census-public-us-west-2/cell-census/2024-03-12/soma/ #> 2024-03-18 s3://cellxgene-census-public-us-west-2/cell-census/2024-03-18/soma/ #> 2024-03-25 s3://cellxgene-census-public-us-west-2/cell-census/2024-03-25/soma/ #> 2024-03-26 s3://cellxgene-census-public-us-west-2/cell-census/2024-03-26/soma/ #> 2024-04-01 s3://cellxgene-census-public-us-west-2/cell-census/2024-04-01/soma/ #>                        soma.relative_uri soma.s3_region #> stable     /cell-census/2023-12-15/soma/      us-west-2 #> latest     /cell-census/2024-04-01/soma/      us-west-2 #> 2023-05-15 /cell-census/2023-05-15/soma/      us-west-2 #> 2023-07-25 /cell-census/2023-07-25/soma/      us-west-2 #> 2023-12-15 /cell-census/2023-12-15/soma/      us-west-2 #> 2024-03-04 /cell-census/2024-03-04/soma/      us-west-2 #> 2024-03-11 /cell-census/2024-03-11/soma/      us-west-2 #> 2024-03-12 /cell-census/2024-03-12/soma/      us-west-2 #> 2024-03-18 /cell-census/2024-03-18/soma/      us-west-2 #> 2024-03-25 /cell-census/2024-03-25/soma/      us-west-2 #> 2024-03-26 /cell-census/2024-03-26/soma/      us-west-2 #> 2024-04-01 /cell-census/2024-04-01/soma/      us-west-2 #>                                                                       h5ads.uri #> stable     s3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/h5ads/ #> latest     s3://cellxgene-census-public-us-west-2/cell-census/2024-04-01/h5ads/ #> 2023-05-15 s3://cellxgene-census-public-us-west-2/cell-census/2023-05-15/h5ads/ #> 2023-07-25 s3://cellxgene-census-public-us-west-2/cell-census/2023-07-25/h5ads/ #> 2023-12-15 s3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/h5ads/ #> 
2024-03-04 s3://cellxgene-census-public-us-west-2/cell-census/2024-03-04/h5ads/ #> 2024-03-11 s3://cellxgene-census-public-us-west-2/cell-census/2024-03-11/h5ads/ #> 2024-03-12 s3://cellxgene-census-public-us-west-2/cell-census/2024-03-12/h5ads/ #> 2024-03-18 s3://cellxgene-census-public-us-west-2/cell-census/2024-03-18/h5ads/ #> 2024-03-25 s3://cellxgene-census-public-us-west-2/cell-census/2024-03-25/h5ads/ #> 2024-03-26 s3://cellxgene-census-public-us-west-2/cell-census/2024-03-26/h5ads/ #> 2024-04-01 s3://cellxgene-census-public-us-west-2/cell-census/2024-04-01/h5ads/ #>                        h5ads.relative_uri h5ads.s3_region do_not_delete  lts #> stable     /cell-census/2023-12-15/h5ads/       us-west-2          TRUE TRUE #> latest     /cell-census/2024-04-01/h5ads/       us-west-2         FALSE   NA #> 2023-05-15 /cell-census/2023-05-15/h5ads/       us-west-2          TRUE TRUE #> 2023-07-25 /cell-census/2023-07-25/h5ads/       us-west-2          TRUE TRUE #> 2023-12-15 /cell-census/2023-12-15/h5ads/       us-west-2          TRUE TRUE #> 2024-03-04 /cell-census/2024-03-04/h5ads/       us-west-2         FALSE   NA #> 2024-03-11 /cell-census/2024-03-11/h5ads/       us-west-2         FALSE   NA #> 2024-03-12 /cell-census/2024-03-12/h5ads/       us-west-2         FALSE   NA #> 2024-03-18 /cell-census/2024-03-18/h5ads/       us-west-2         FALSE   NA #> 2024-03-25 /cell-census/2024-03-25/h5ads/       us-west-2         FALSE   NA #> 2024-03-26 /cell-census/2024-03-26/h5ads/       us-west-2         FALSE   NA #> 2024-04-01 /cell-census/2024-04-01/h5ads/       us-west-2         FALSE   NA #>             alias #> stable     stable #> latest     latest #> 2023-05-15        #> 2023-07-25        #> 2023-12-15        #> 2024-03-04        #> 2024-03-11        #> 2024-03-12        #> 2024-03-18        #> 2024-03-25        #> 2024-03-26        #> 2024-04-01"},{"path":"/reference/get_presence_matrix.html","id":null,"dir":"Reference","previous_headings":"","what":"Read the 
feature dataset presence matrix. — get_presence_matrix","title":"Read the feature dataset presence matrix. — get_presence_matrix","text":"Read feature dataset presence matrix.","code":""},{"path":"/reference/get_presence_matrix.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Read the feature dataset presence matrix. — get_presence_matrix","text":"","code":"get_presence_matrix(census, organism, measurement_name = \"RNA\")"},{"path":"/reference/get_presence_matrix.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Read the feature dataset presence matrix. — get_presence_matrix","text":"census census object cellxgene.census::open_soma(). organism organism query, usually one Homo sapiens Mus musculus measurement_name measurement object query. Defaults RNA.","code":""},{"path":"/reference/get_presence_matrix.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Read the feature dataset presence matrix. — get_presence_matrix","text":"tiledbsoma::matrixZeroBasedView object dataset join id & feature join id dimensions, filled 1s indicating presence. sparse matrix accessed zero-based indexes since join id's may zero.","code":""},{"path":"/reference/get_presence_matrix.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Read the feature dataset presence matrix. — get_presence_matrix","text":"","code":"census <- open_soma() #> The stable Census release is currently 2023-12-15. Specify census_version = \"2023-12-15\" in future calls to open_soma() to ensure data consistency. on.exit(census$close(), add = TRUE) print(get_presence_matrix(census, \"Homo sapiens\")$dim()) #> Error in private$check_open_for_read_or_write(): Item must be open for read or write. 
s3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/soma/"},{"path":"/reference/get_seurat.html","id":null,"dir":"Reference","previous_headings":"","what":"Export Census slices to Seurat — get_seurat","title":"Export Census slices to Seurat — get_seurat","text":"Convenience wrapper around SOMAExperimentAxisQuery, build execute query, return Seurat object.","code":""},{"path":"/reference/get_seurat.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Export Census slices to Seurat — get_seurat","text":"","code":"get_seurat(   census,   organism,   measurement_name = \"RNA\",   X_layers = c(counts = \"raw\", data = NULL),   obs_value_filter = NULL,   obs_coords = NULL,   obs_column_names = NULL,   obsm_layers = FALSE,   var_value_filter = NULL,   var_coords = NULL,   var_column_names = NULL,   var_index = \"feature_id\" )"},{"path":"/reference/get_seurat.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Export Census slices to Seurat — get_seurat","text":"census census object, usually returned cellxgene.census::open_soma(). organism organism query, usually one Homo sapiens Mus musculus measurement_name measurement object query. Defaults RNA. X_layers named character X layers add Seurat assay, names names Seurat slots (counts data) values names layers within X. obs_value_filter SOMA value_filter across columns obs dataframe, expressed string. obs_coords set coordinates obs dataframe index, expressed type format supported SOMADataFrame's read() method. obs_column_names Columns fetch obs data frame. obsm_layers Names arrays obsm add cell embeddings; pass FALSE suppress loading dimensional reductions. var_value_filter obs_value_filter var. var_coords obs_coords var. var_column_names Columns fetch var data frame. 
var_index Name column ‘var’ add feature names.","code":""},{"path":"/reference/get_seurat.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Export Census slices to Seurat — get_seurat","text":"Seurat object containing sensus slice.","code":""},{"path":"/reference/get_seurat.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Export Census slices to Seurat — get_seurat","text":"","code":"if (FALSE) { census <- open_soma() seurat_obj <- get_seurat(   census,   organism = \"Homo sapiens\",   obs_value_filter = \"cell_type == 'leptomeningeal cell'\",   var_value_filter = \"feature_id %in% c('ENSG00000107317', 'ENSG00000106034')\" )  seurat_obj  census$close() }"},{"path":"/reference/get_single_cell_experiment.html","id":null,"dir":"Reference","previous_headings":"","what":"Export Census slices to SingleCellExperiment — get_single_cell_experiment","title":"Export Census slices to SingleCellExperiment — get_single_cell_experiment","text":"Convenience wrapper around SOMAExperimentAxisQuery, build execute query, return SingleCellExperiment object.","code":""},{"path":"/reference/get_single_cell_experiment.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Export Census slices to SingleCellExperiment — get_single_cell_experiment","text":"","code":"get_single_cell_experiment(   census,   organism,   measurement_name = \"RNA\",   X_layers = c(counts = \"raw\"),   obs_value_filter = NULL,   obs_coords = NULL,   obs_column_names = NULL,   obsm_layers = FALSE,   var_value_filter = NULL,   var_coords = NULL,   var_column_names = NULL,   var_index = \"feature_id\" )"},{"path":"/reference/get_single_cell_experiment.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Export Census slices to SingleCellExperiment — get_single_cell_experiment","text":"census census object, usually returned cellxgene.census::open_soma(). 
organism organism query, usually one Homo sapiens Mus musculus measurement_name measurement object query. Defaults RNA. X_layers character vector X layers add assays main experiment; may optionally named set name resulting assay (eg. ‘X_layers = c(counts = \"raw\")’ load X layer “‘raw’” assay “‘counts’”); default, loads X layers obs_value_filter SOMA value_filter across columns obs dataframe, expressed string. obs_coords set coordinates obs dataframe index, expressed type format supported SOMADataFrame's read() method. obs_column_names Columns fetch obs data frame. obsm_layers Names arrays obsm add cell embeddings; pass FALSE suppress loading dimensional reductions. var_value_filter obs_value_filter var. var_coords obs_coords var. var_column_names Columns fetch var data frame. var_index Name column ‘var’ add feature names.","code":""},{"path":"/reference/get_single_cell_experiment.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Export Census slices to SingleCellExperiment — get_single_cell_experiment","text":"SingleCellExperiment object containing sensus slice.","code":""},{"path":"/reference/get_single_cell_experiment.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Export Census slices to SingleCellExperiment — get_single_cell_experiment","text":"","code":"if (FALSE) { census <- open_soma() sce_obj <- get_single_cell_experiment(   census,   organism = \"Homo sapiens\",   obs_value_filter = \"cell_type == 'leptomeningeal cell'\",   var_value_filter = \"feature_id %in% c('ENSG00000107317', 'ENSG00000106034')\" )  sce_obj  census$close() }"},{"path":"/reference/get_source_h5ad_uri.html","id":null,"dir":"Reference","previous_headings":"","what":"Locate source h5ad file for a dataset. — get_source_h5ad_uri","title":"Locate source h5ad file for a dataset. 
— get_source_h5ad_uri","text":"Locate source h5ad file dataset.","code":""},{"path":"/reference/get_source_h5ad_uri.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Locate source h5ad file for a dataset. — get_source_h5ad_uri","text":"","code":"get_source_h5ad_uri(dataset_id, census_version = \"stable\", census = NULL)"},{"path":"/reference/get_source_h5ad_uri.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Locate source h5ad file for a dataset. — get_source_h5ad_uri","text":"dataset_id dataset_id interest. census_version desired Census version. census open Census handle census_version. provided, opened closed automatically; efficient reuse handle calling get_source_h5ad_uri() multiple times.","code":""},{"path":"/reference/get_source_h5ad_uri.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Locate source h5ad file for a dataset. — get_source_h5ad_uri","text":"list uri optional s3_region.","code":""},{"path":"/reference/get_source_h5ad_uri.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Locate source h5ad file for a dataset. — get_source_h5ad_uri","text":"","code":"get_source_h5ad_uri(\"0895c838-e550-48a3-a777-dbcd35d30272\") #> $uri #> [1] \"s3://cellxgene-census-public-us-west-2/cell-census/2023-12-15/h5ads/0895c838-e550-48a3-a777-dbcd35d30272.h5ad\" #>  #> $s3_region #> [1] \"us-west-2\" #>"},{"path":"/reference/new_SOMATileDBContext_for_census.html","id":null,"dir":"Reference","previous_headings":"","what":"Create SOMATileDBContext for Census — new_SOMATileDBContext_for_census","title":"Create SOMATileDBContext for Census — new_SOMATileDBContext_for_census","text":"Create SOMATileDBContext suitable using open_soma(). 
Typically open_soma() creates context automatically, one can created separately order set custom configuration options, share multiple open Census handles.","code":""},{"path":"/reference/new_SOMATileDBContext_for_census.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Create SOMATileDBContext for Census — new_SOMATileDBContext_for_census","text":"","code":"new_SOMATileDBContext_for_census(   census_version_description,   mirror = \"default\",   ... )"},{"path":"/reference/new_SOMATileDBContext_for_census.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Create SOMATileDBContext for Census — new_SOMATileDBContext_for_census","text":"census_version_description result get_census_version_description() desired Census version. mirror name intended census mirror (get_census_mirror_directory()[[name]] save lookup), NULL configure local file access. ... Custom configuration options.","code":""},{"path":"/reference/new_SOMATileDBContext_for_census.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Create SOMATileDBContext for Census — new_SOMATileDBContext_for_census","text":"SOMATileDBContext object open_soma().","code":""},{"path":"/reference/new_SOMATileDBContext_for_census.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Create SOMATileDBContext for Census — new_SOMATileDBContext_for_census","text":"","code":"census_desc <- get_census_version_description(\"stable\") ctx <- new_SOMATileDBContext_for_census(census_desc, \"soma.init_buffer_bytes\" = paste(4 * 1024**3)) census <- open_soma(\"stable\", tiledbsoma_ctx = ctx) #> The stable Census release is currently 2023-12-15. Specify census_version = \"2023-12-15\" in future calls to open_soma() to ensure data consistency. 
census$close()"},{"path":"/reference/open_soma.html","id":null,"dir":"Reference","previous_headings":"","what":"Open the Census — open_soma","title":"Open the Census — open_soma","text":"Open Census","code":""},{"path":"/reference/open_soma.html","id":"ref-usage","dir":"Reference","previous_headings":"","what":"Usage","title":"Open the Census — open_soma","text":"","code":"open_soma(   census_version = \"stable\",   uri = NULL,   tiledbsoma_ctx = NULL,   mirror = NULL )"},{"path":"/reference/open_soma.html","id":"arguments","dir":"Reference","previous_headings":"","what":"Arguments","title":"Open the Census — open_soma","text":"census_version version Census, e.g., \"stable\". uri URI containing Census SOMA objects open instead released version. (supplied, takes precedence census_version.) tiledbsoma_ctx tiledbsoma::SOMATileDBContext built using new_SOMATileDBContext_for_census(). Optional (created automatically) using census_version context need reused. mirror Census mirror access; one names(get_census_mirror_directory()).","code":""},{"path":"/reference/open_soma.html","id":"value","dir":"Reference","previous_headings":"","what":"Value","title":"Open the Census — open_soma","text":"Top-level tiledbsoma::SOMACollection object. use, census closed release memory resources, usually .exit(census$close(), add = TRUE). Closing top-level census also close SOMA objects accessed .","code":""},{"path":"/reference/open_soma.html","id":"ref-examples","dir":"Reference","previous_headings":"","what":"Examples","title":"Open the Census — open_soma","text":"","code":"census <- open_soma() #> The stable Census release is currently 2023-12-15. Specify census_version = \"2023-12-15\" in future calls to open_soma() to ensure data consistency. 
as.data.frame(census$get(\"census_info\")$get(\"summary\")$read()$concat()) #>   soma_joinid                      label      value #> 1           0      census_schema_version      1.2.0 #> 2           1          census_build_date 2023-10-23 #> 3           2     dataset_schema_version      3.1.0 #> 4           3           total_cell_count   68683222 #> 5           4          unique_cell_count   40356133 #> 6           5 number_donors_homo_sapiens      15588 #> 7           6 number_donors_mus_musculus       1990 census$close()"}]
            diff --git a/searchindex.js b/searchindex.js
            index 42756be0b..3766d1b9a 100644
            --- a/searchindex.js
            +++ b/searchindex.js
            @@ -1 +1 @@
            -Search.setIndex({"docnames": ["README", "_autosummary/cellxgene_census.download_source_h5ad", "_autosummary/cellxgene_census.experimental.get_all_available_embeddings", "_autosummary/cellxgene_census.experimental.get_all_census_versions_with_embedding", "_autosummary/cellxgene_census.experimental.get_embedding", "_autosummary/cellxgene_census.experimental.get_embedding_metadata", "_autosummary/cellxgene_census.experimental.get_embedding_metadata_by_name", "_autosummary/cellxgene_census.experimental.ml.huggingface.CellDatasetBuilder", "_autosummary/cellxgene_census.experimental.ml.huggingface.GeneformerTokenizer", "_autosummary/cellxgene_census.experimental.ml.pytorch.ExperimentDataPipe", "_autosummary/cellxgene_census.experimental.ml.pytorch.Stats", "_autosummary/cellxgene_census.experimental.ml.pytorch.experiment_dataloader", "_autosummary/cellxgene_census.experimental.pp.get_highly_variable_genes", "_autosummary/cellxgene_census.experimental.pp.highly_variable_genes", "_autosummary/cellxgene_census.experimental.pp.mean_variance", "_autosummary/cellxgene_census.get_anndata", "_autosummary/cellxgene_census.get_census_version_description", "_autosummary/cellxgene_census.get_census_version_directory", "_autosummary/cellxgene_census.get_default_soma_context", "_autosummary/cellxgene_census.get_presence_matrix", "_autosummary/cellxgene_census.get_source_h5ad_uri", "_autosummary/cellxgene_census.open_soma", "articles", "articles/2023/20230808-r_api_release", "articles/2023/20230919-out_of_core_methods", "articles/2023/20231012-normalized_layer_precalc_stats", "articles/2024/20240404-categoricals", "cellxgene_census_aws_open_data", "cellxgene_census_docsite_FAQ", "cellxgene_census_docsite_data_release_info", "cellxgene_census_docsite_installation", "cellxgene_census_docsite_landing", "cellxgene_census_docsite_quick_start", "cellxgene_census_docsite_schema", "cellxgene_census_mirroring", "cellxgene_census_schema", 
"cellxgene_census_storage_and_release_policy", "census_article_guidelines", "census_notebook_guidelines", "examples", "index", "notebooks/analysis_demo/comp_bio_census_info", "notebooks/analysis_demo/comp_bio_data_integration_scvi", "notebooks/analysis_demo/comp_bio_embedding_exploration", "notebooks/analysis_demo/comp_bio_explore_and_load_lung_data", "notebooks/analysis_demo/comp_bio_geneformer_prediction", "notebooks/analysis_demo/comp_bio_normalizing_full_gene_sequencing", "notebooks/analysis_demo/comp_bio_scvi_model_use", "notebooks/analysis_demo/comp_bio_summarize_axis_query", "notebooks/api_demo/census_access_maintained_embeddings", "notebooks/api_demo/census_citation_generation", "notebooks/api_demo/census_compute_over_X", "notebooks/api_demo/census_dataset_presence", "notebooks/api_demo/census_datasets", "notebooks/api_demo/census_duplicated_cells", "notebooks/api_demo/census_embedding", "notebooks/api_demo/census_gget_demo", "notebooks/api_demo/census_query_extract", "notebooks/api_demo/census_summary_cell_counts", "notebooks/experimental/highly_variable_genes", "notebooks/experimental/mean_variance", "notebooks/experimental/pytorch", "python-api", "setup", "soma"], "filenames": ["README.md", "_autosummary/cellxgene_census.download_source_h5ad.rst", "_autosummary/cellxgene_census.experimental.get_all_available_embeddings.rst", "_autosummary/cellxgene_census.experimental.get_all_census_versions_with_embedding.rst", "_autosummary/cellxgene_census.experimental.get_embedding.rst", "_autosummary/cellxgene_census.experimental.get_embedding_metadata.rst", "_autosummary/cellxgene_census.experimental.get_embedding_metadata_by_name.rst", "_autosummary/cellxgene_census.experimental.ml.huggingface.CellDatasetBuilder.rst", "_autosummary/cellxgene_census.experimental.ml.huggingface.GeneformerTokenizer.rst", "_autosummary/cellxgene_census.experimental.ml.pytorch.ExperimentDataPipe.rst", "_autosummary/cellxgene_census.experimental.ml.pytorch.Stats.rst", 
"_autosummary/cellxgene_census.experimental.ml.pytorch.experiment_dataloader.rst", "_autosummary/cellxgene_census.experimental.pp.get_highly_variable_genes.rst", "_autosummary/cellxgene_census.experimental.pp.highly_variable_genes.rst", "_autosummary/cellxgene_census.experimental.pp.mean_variance.rst", "_autosummary/cellxgene_census.get_anndata.rst", "_autosummary/cellxgene_census.get_census_version_description.rst", "_autosummary/cellxgene_census.get_census_version_directory.rst", "_autosummary/cellxgene_census.get_default_soma_context.rst", "_autosummary/cellxgene_census.get_presence_matrix.rst", "_autosummary/cellxgene_census.get_source_h5ad_uri.rst", "_autosummary/cellxgene_census.open_soma.rst", "articles.rst", "articles/2023/20230808-r_api_release.md", "articles/2023/20230919-out_of_core_methods.md", "articles/2023/20231012-normalized_layer_precalc_stats.md", "articles/2024/20240404-categoricals.md", "cellxgene_census_aws_open_data.md", "cellxgene_census_docsite_FAQ.md", "cellxgene_census_docsite_data_release_info.md", "cellxgene_census_docsite_installation.md", "cellxgene_census_docsite_landing.md", "cellxgene_census_docsite_quick_start.md", "cellxgene_census_docsite_schema.md", "cellxgene_census_mirroring.md", "cellxgene_census_schema.md", "cellxgene_census_storage_and_release_policy.md", "census_article_guidelines.md", "census_notebook_guidelines.md", "examples.rst", "index.rst", "notebooks/analysis_demo/comp_bio_census_info.ipynb", "notebooks/analysis_demo/comp_bio_data_integration_scvi.ipynb", "notebooks/analysis_demo/comp_bio_embedding_exploration.ipynb", "notebooks/analysis_demo/comp_bio_explore_and_load_lung_data.ipynb", "notebooks/analysis_demo/comp_bio_geneformer_prediction.ipynb", "notebooks/analysis_demo/comp_bio_normalizing_full_gene_sequencing.ipynb", "notebooks/analysis_demo/comp_bio_scvi_model_use.ipynb", "notebooks/analysis_demo/comp_bio_summarize_axis_query.ipynb", "notebooks/api_demo/census_access_maintained_embeddings.ipynb", 
"notebooks/api_demo/census_citation_generation.ipynb", "notebooks/api_demo/census_compute_over_X.ipynb", "notebooks/api_demo/census_dataset_presence.ipynb", "notebooks/api_demo/census_datasets.ipynb", "notebooks/api_demo/census_duplicated_cells.ipynb", "notebooks/api_demo/census_embedding.ipynb", "notebooks/api_demo/census_gget_demo.ipynb", "notebooks/api_demo/census_query_extract.ipynb", "notebooks/api_demo/census_summary_cell_counts.ipynb", "notebooks/experimental/highly_variable_genes.ipynb", "notebooks/experimental/mean_variance.ipynb", "notebooks/experimental/pytorch.ipynb", "python-api.rst", "setup.rst", "soma.rst"], "titles": ["API Documentation", "cellxgene_census.download_source_h5ad", "cellxgene_census.experimental.get_all_available_embeddings", "cellxgene_census.experimental.get_all_census_versions_with_embedding", "cellxgene_census.experimental.get_embedding", "cellxgene_census.experimental.get_embedding_metadata", "cellxgene_census.experimental.get_embedding_metadata_by_name", "cellxgene_census.experimental.ml.huggingface.CellDatasetBuilder", "cellxgene_census.experimental.ml.huggingface.GeneformerTokenizer", "cellxgene_census.experimental.ml.pytorch.ExperimentDataPipe", "cellxgene_census.experimental.ml.pytorch.Stats", "cellxgene_census.experimental.ml.pytorch.experiment_dataloader", "cellxgene_census.experimental.pp.get_highly_variable_genes", "cellxgene_census.experimental.pp.highly_variable_genes", "cellxgene_census.experimental.pp.mean_variance", "cellxgene_census.get_anndata", "cellxgene_census.get_census_version_description", "cellxgene_census.get_census_version_directory", "cellxgene_census.get_default_soma_context", "cellxgene_census.get_presence_matrix", "cellxgene_census.get_source_h5ad_uri", "cellxgene_census.open_soma", "What\u2019s new?", "R package cellxgene.census V1 is out!", "Memory-efficient implementations of commonly used single-cell methods", "Introducing a normalized layer and pre-calculated cell and gene statistics in Census", 
"Census supports categoricals for cell metadata", "CZ CELLxGENE Discover Census in AWS", "FAQ", "Census data releases", "Installation", "CZ CELLxGENE Discover Census", "Quick start", "Census data and schema", "CELLxGENE Census Mirroring", "CZ CELLxGENE Discover Census Schema", "CZ CELLxGENE Discover Census storage & release policy", "Census \u201cwhat\u2019s new?\u201d article editorial guidelines", "Census API notebook/vignette editorial guidelines", "Python tutorials", "CZ CELLxGENE Discover Census", "Learning about the CZ CELLxGENE Census", "Integrating multi-dataset slices of data", "Exploring biologically relevant clusters in Census embeddings", "Exploring all data from a tissue", "Geneformer for cell class prediction and data projection", "Normalizing full-length gene sequencing data", "scVI for cell type prediction and data projection", "Summarizing cell and gene metadata", "Access CELLxGENE collaboration embeddings (scVI, Geneformer)", "Generating citations for Census slices", "Computing on X using online (incremental) algorithms", "Genes measured in each cell (dataset presence matrix)", "Exploring the Census Datasets table", "Understanding and filtering out duplicate cells", "Access CELLxGENE-hosted embeddings", "Querying data using the gget cellxgene module", "Querying and fetching the single-cell data and cell/gene metadata.", "Exploring pre-calculated summary cell counts", "Experimental Highly Variable Genes API", "Out-of-core (incremental) mean and variance calculation", "Training a PyTorch Model", "Python API", "Installation", "What is SOMA"], "terms": {"The": [0, 1, 2, 3, 4, 5, 6, 7, 9, 12, 13, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 52, 53, 54, 55, 56, 57, 58, 59, 61, 62, 64], "websit": 0, "i": [0, 1, 3, 4, 6, 9, 10, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 24, 25, 26, 27, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 
53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63], "current": [0, 17, 24, 25, 31, 32, 34, 40, 41, 42, 44, 45, 46, 47, 48, 49, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 64], "host": [0, 27, 29, 30, 31, 34, 35, 36, 38, 39, 40, 43, 45, 47, 49, 62, 63, 64], "http": [0, 8, 13, 17, 28, 30, 34, 38, 42, 44, 45, 46, 47, 50, 55, 56, 63], "chanzuckerberg": [0, 25, 30, 31, 35, 37, 38, 40, 56, 63], "github": [0, 25, 28, 31, 35, 38, 40, 55, 56, 63], "io": [0, 13, 42, 44, 46], "cellxgen": [0, 7, 8, 16, 17, 20, 22, 25, 26, 28, 29, 30, 32, 33, 37, 38, 39, 42, 43, 44, 45, 46, 47, 50, 53, 54, 62, 63, 64], "censu": [0, 1, 2, 3, 4, 6, 7, 8, 12, 15, 16, 17, 18, 19, 20, 21, 22, 24, 30, 32, 42, 47, 49, 51, 56, 58, 59, 63, 64], "site": [0, 28, 37, 38, 42, 44, 46], "rebuilt": 0, "each": [0, 2, 7, 8, 9, 17, 24, 25, 26, 28, 29, 32, 33, 34, 35, 39, 42, 43, 44, 45, 46, 48, 49, 50, 51, 53, 55, 56, 58, 59, 61, 62], "time": [0, 17, 24, 28, 35, 54, 56, 61], "tag": [0, 2, 4, 6, 27, 29, 36], "creat": [0, 12, 13, 27, 28, 31, 32, 33, 36, 38, 40, 41, 42, 45, 49, 50, 53, 55, 59], "repo": [0, 30, 62], "which": [0, 3, 4, 5, 6, 7, 9, 11, 12, 13, 14, 15, 17, 19, 21, 23, 24, 25, 26, 33, 34, 35, 36, 37, 38, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 57, 58, 59, 60, 61], "happen": [0, 17, 34], "releas": [0, 16, 17, 23, 25, 30, 32, 34, 35, 37, 41, 42, 44, 46, 48, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61], "includ": [0, 17, 24, 27, 28, 31, 37, 38, 40, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 57, 58, 60, 62, 64], "regener": 0, "sphinx": 0, "python": [0, 5, 9, 23, 24, 25, 26, 29, 31, 37, 38, 40, 41, 45, 48, 50, 52, 55, 56, 57, 64], "doc": [0, 23, 28, 37, 38, 42, 61], "r": [0, 22, 25, 26, 28, 29, 31, 37, 38, 40, 44, 64], "pkgdown": 0, "check": [0, 23, 26, 31, 32, 34, 40, 42, 46, 52, 63], "git": [0, 8, 63], "simpli": [0, 28, 45, 63], "copi": [0, 18, 27, 36, 42, 43, 44, 46, 47], "dure": [0, 42, 45], "rebuild": 0, "see": [0, 9, 13, 24, 25, 26, 27, 28, 30, 32, 33, 35, 42, 43, 44, 45, 
46, 54, 55, 56, 57, 59, 61, 62], "vignettes_": 0, "further": [0, 18, 25, 37, 43, 48, 55], "explan": [0, 37, 38, 54], "A": [0, 2, 3, 4, 5, 6, 9, 11, 13, 14, 17, 18, 19, 20, 21, 27, 29, 31, 32, 33, 34, 35, 36, 37, 39, 40, 41, 42, 44, 45, 46, 47, 52, 53, 54, 55, 56, 57], "docsit": 0, "can": [0, 9, 11, 12, 13, 18, 21, 23, 24, 25, 26, 27, 29, 31, 32, 33, 34, 35, 37, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 60, 61, 63], "trigger": 0, "manual": 0, "through": [0, 7, 30, 31, 40, 47, 55, 57, 61], "workflow_dispatch": 0, "run": [0, 42, 43, 45, 47, 56, 61, 63], "workflow": [0, 39, 45], "thi": [0, 1, 8, 9, 10, 11, 13, 15, 17, 20, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 63], "should": [0, 27, 29, 35, 36, 37, 38, 41, 42, 43, 44, 46, 48, 53, 61, 63], "done": [0, 12, 13, 14, 24, 27, 35, 44, 46, 59, 61], "bug": [0, 31, 40], "found": [0, 6, 19, 21, 23, 31, 40, 42, 43, 44, 46, 47, 53, 57], "necessari": [0, 7, 24, 31, 38, 40, 43], "In": [0, 4, 24, 25, 26, 29, 30, 31, 35, 38, 40, 41, 42, 43, 44, 45, 49, 51, 52, 54, 55, 59, 61, 63], "order": [0, 9, 26, 29, 38, 45, 61], "test": [0, 45, 48, 61, 63], "chang": [0, 25, 35, 36], "local": [0, 9, 27, 43, 53, 61, 63], "first": [0, 9, 19, 23, 24, 30, 32, 42, 44, 45, 46, 48, 49, 50, 51, 52, 54, 55, 61], "instal": [0, 8, 37], "requir": [0, 8, 9, 27, 35, 36, 44, 48, 49, 55, 56, 61], "pip": [0, 8, 28, 30, 56, 63], "txt": 0, "brew": 0, "pandoc": 0, "mac": 0, "o": [0, 18, 45, 47, 56], "Then": [0, 42, 45, 46, 49, 50, 55, 61], "And": [0, 25, 27, 32, 41, 42, 44, 45, 46, 49, 50, 54, 55, 57], "follow": [0, 23, 24, 25, 27, 28, 29, 30, 31, 32, 33, 35, 36, 37, 38, 40, 41, 43, 44, 45, 47, 54, 55, 57, 60, 61, 63], "command": [0, 28, 41, 45, 47], "cd": [0, 63], "make": [0, 23, 30, 35, 42, 44, 45, 46, 51, 63], "html": [0, 13, 28, 42, 44, 46], "gener": [0, 8, 10, 12, 13, 24, 28, 29, 31, 35, 39, 40, 41, 42, 43, 55, 56], "_build": 
0, "index": [0, 4, 9, 12, 14, 15, 19, 33, 35, 43, 45, 47, 51, 52, 53, 59, 60], "dataset_id": [1, 13, 20, 24, 26, 35, 38, 41, 43, 44, 45, 46, 47, 49, 50, 52, 53, 54, 55, 56, 57, 60], "str": [1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 35, 42, 43, 45, 56], "to_path": [1, 53], "census_vers": [1, 2, 4, 6, 8, 12, 16, 17, 20, 21, 25, 26, 29, 32, 38, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61], "stabl": [1, 12, 13, 17, 20, 21, 23, 29, 30, 32, 41, 42, 44, 46, 48, 51, 52, 53, 54, 56, 57, 58, 59, 60, 61], "none": [1, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 15, 16, 17, 18, 21, 35, 43, 44, 45, 51, 56], "download": [1, 24, 28, 49, 55, 62], "sourc": [1, 20, 21, 27, 30, 35, 36, 55, 56, 61, 63], "h5ad": [1, 16, 17, 20, 21, 27, 35, 36, 42, 45, 46, 50, 52, 56, 62], "dataset": [1, 7, 8, 12, 13, 19, 23, 25, 27, 29, 31, 33, 37, 38, 39, 40, 41, 43, 46, 47, 48, 49, 50, 51, 54, 55, 56, 57, 58], "given": [1, 2, 9, 16, 24, 27, 29, 35, 36, 43, 44, 49, 51, 52, 53, 55, 61], "user": [1, 12, 13, 18, 20, 23, 24, 25, 26, 27, 28, 31, 33, 34, 37, 38, 40, 42, 44, 45, 46, 47, 51, 59, 61], "specifi": [1, 3, 6, 8, 9, 12, 13, 14, 17, 18, 21, 25, 27, 29, 32, 34, 36, 41, 42, 44, 46, 48, 51, 52, 53, 54, 56, 57, 58, 59, 60, 61], "file": [1, 8, 21, 27, 28, 29, 34, 35, 36, 41, 45, 47, 48, 56], "name": [1, 3, 6, 7, 9, 12, 13, 16, 17, 20, 27, 29, 32, 33, 35, 36, 37, 41, 42, 43, 44, 46, 48, 50, 51, 54, 55, 56, 57, 59, 62], "paramet": [1, 2, 3, 4, 5, 6, 7, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 42, 44, 46, 55], "fetch": [1, 4, 9, 12, 13, 15, 23, 37, 38, 39, 45, 47, 49, 50, 54, 61], "origin": [1, 9, 25, 35, 43, 44, 45, 47, 54, 61], "associ": [1, 3, 6, 35, 39, 44, 45], "where": [1, 9, 14, 34, 35, 36, 38, 42, 43, 44, 46, 48, 49, 51, 54, 55, 59, 60, 61], "written": [1, 12, 15, 37], "must": [1, 9, 12, 13, 30, 32, 35, 36, 37, 38, 43, 54, 63], "alreadi": [1, 43, 47], "exist": [1, 20, 26, 27, 28, 31, 34, 36, 40, 41, 44, 45, 54], "version": [1, 2, 3, 4, 6, 15, 
16, 17, 20, 21, 23, 25, 28, 30, 34, 38, 41, 42, 43, 45, 47, 48, 49, 50, 52, 54, 55, 56, 57], "default": [1, 3, 4, 5, 6, 7, 8, 9, 12, 14, 15, 17, 18, 20, 21, 26, 29, 34, 42, 46, 51, 56, 60, 61], "rais": [1, 4, 6, 11, 12, 13, 16, 19, 20, 21, 41, 48], "valueerror": [1, 4, 6, 11, 12, 13, 16, 19, 21], "path": [1, 21, 27, 35, 36, 45, 56], "e": [1, 2, 3, 4, 6, 7, 9, 13, 14, 21, 25, 27, 29, 31, 33, 34, 35, 36, 37, 40, 41, 43, 44, 45, 48, 49, 51, 52, 53, 54, 55, 56, 59, 63], "overwrit": 1, "an": [1, 2, 4, 7, 9, 11, 12, 14, 15, 17, 21, 23, 24, 25, 27, 30, 31, 32, 35, 36, 37, 38, 40, 41, 42, 43, 44, 45, 48, 50, 51, 53, 57, 60, 62, 63], "lifecycl": [1, 4, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21], "matur": [1, 16, 17, 19, 20, 21, 41, 43], "get_source_h5ad_uri": [1, 21, 53], "look": [1, 21, 25, 34, 41, 42, 44, 45, 46, 47, 49, 54, 55, 56, 57, 61, 64], "up": [1, 21, 24, 47, 51, 54], "locat": [1, 18, 21, 28, 34, 36, 53, 55, 57], "exampl": [1, 2, 4, 5, 8, 12, 13, 15, 16, 17, 18, 19, 20, 21, 23, 25, 27, 28, 30, 32, 35, 36, 39, 42, 43, 44, 49, 51, 55, 56, 57, 61, 63], "8e47ed12": 1, "c658": 1, "4252": [1, 44, 52], "b126": 1, "381df8d52a3d": 1, "tmp": [1, 21], "data": [1, 4, 9, 10, 11, 12, 13, 16, 17, 20, 24, 26, 30, 32, 34, 37, 38, 43, 48, 49, 50, 51, 52, 58, 59, 60, 61, 64], "list": [2, 3, 12, 13, 15, 27, 31, 33, 34, 35, 37, 38, 40, 41, 43, 44, 45, 47, 48, 52, 56, 57, 62], "dict": [2, 5, 6, 16, 17, 18, 21, 43, 47], "ani": [2, 4, 5, 6, 7, 8, 9, 11, 12, 13, 15, 18, 21, 23, 24, 25, 27, 28, 29, 31, 32, 35, 38, 40, 41, 43, 47, 49, 50, 51, 52, 53, 55, 58, 59, 61], "return": [2, 3, 4, 5, 6, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 41, 43, 48, 51, 52, 56, 57, 58, 59, 60, 61], "dictionari": [2, 5, 6, 16, 17, 18, 21, 26, 34, 36, 41, 45, 55, 57], "all": [2, 3, 8, 9, 12, 13, 15, 17, 23, 25, 26, 27, 28, 29, 31, 32, 33, 35, 36, 37, 38, 39, 40, 42, 43, 45, 46, 47, 49, 50, 51, 53, 54, 55, 56, 57, 58, 63, 64], "avail": [2, 9, 13, 15, 17, 24, 25, 28, 29, 30, 34, 36, 42, 44, 45, 49, 55, 
56, 57, 59, 62], "embed": [2, 3, 4, 5, 6, 15, 31, 40, 42, 47], "g": [2, 3, 4, 6, 7, 13, 14, 21, 27, 29, 31, 33, 34, 35, 36, 37, 40, 41, 43, 45, 48, 51, 53, 55, 56, 59, 63], "2023": [2, 4, 6, 23, 24, 25, 27, 31, 32, 34, 36, 37, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61], "12": [2, 4, 6, 16, 17, 20, 21, 25, 36, 41, 42, 43, 44, 45, 46, 47, 49, 52, 54, 55, 57], "15": [2, 4, 6, 17, 27, 36, 37, 41, 42, 43, 44, 45, 46, 47, 49, 52, 54, 55, 56, 60], "contain": [2, 3, 4, 5, 6, 8, 12, 13, 15, 16, 17, 19, 20, 21, 33, 34, 35, 36, 37, 38, 41, 42, 43, 44, 45, 46, 48, 50, 51, 52, 53, 54, 55, 56, 57, 58, 60, 61, 62], "metadata": [2, 5, 6, 8, 12, 15, 22, 24, 27, 28, 31, 33, 38, 39, 40, 42, 43, 45, 46, 47, 49, 51, 52, 53, 54, 58, 59, 61, 63], "describ": [2, 5, 6, 27, 33, 35, 36, 38, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 55, 57, 58, 59, 60], "experiment_nam": [2, 43, 49, 55, 60], "experiment_1": 2, "measurement_nam": [2, 7, 9, 12, 15, 19, 24, 25, 32, 43, 45, 47, 49, 50, 51, 52, 54, 55, 59, 60, 61], "rna": [2, 7, 12, 15, 19, 24, 25, 28, 31, 32, 33, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 57, 58, 59, 60, 61], "organ": [2, 3, 6, 12, 15, 19, 23, 24, 25, 28, 31, 32, 33, 37, 38, 39, 40, 42, 43, 44, 45, 46, 47, 48, 49, 50, 52, 53, 54, 55, 57, 58, 59, 61], "homo_sapien": [2, 7, 8, 23, 24, 25, 26, 27, 32, 33, 35, 41, 43, 44, 45, 47, 48, 49, 50, 52, 53, 54, 55, 56, 57, 58, 61], "n_embed": [2, 55], "1000": [2, 12, 13, 15, 24, 35, 42, 46], "n_featur": [2, 55], "200": [2, 4], "uri": [2, 4, 5, 16, 17, 18, 20, 21, 27, 34, 36, 43, 45, 47, 53, 62], "s3": [2, 16, 17, 18, 20, 21, 27, 28, 30, 34, 36, 43, 45, 47, 49, 53, 55, 63], "bucket": [2, 18, 21, 27, 28, 30, 35, 36], "embedding_1": 2, "embedding_nam": [3, 6, 43, 49, 55], "embedding_typ": [3, 6], "obs_embed": [3, 6, 15, 49, 55], "get": [3, 12, 16, 17, 21, 23, 24, 25, 26, 31, 32, 38, 40, 41, 42, 43, 44, 45, 46, 47, 49, 50, 52, 53, 54, 55, 57, 64], "specif": [3, 6, 21, 
28, 29, 31, 33, 35, 36, 40, 41, 43, 48, 51, 54, 57], "scvi": [3, 6, 29, 38, 39, 43, 55, 63], "type": [3, 9, 19, 23, 26, 27, 29, 32, 33, 36, 39, 42, 43, 45, 46, 51, 52, 58, 61, 63], "embedding_uri": [4, 5, 43, 49, 55], "obs_soma_joinid": [4, 55], "ndarrai": [4, 12, 15, 49, 51, 55], "dtype": [4, 9, 12, 15, 41, 42, 43, 44, 46, 48, 49, 50, 51, 54, 55, 57, 61], "int64": [4, 9, 26, 35, 41, 42, 44, 46, 48, 51, 54, 57], "arrai": [4, 12, 15, 19, 28, 31, 33, 40, 44, 46, 51, 52, 61], "context": [4, 5, 7, 18, 21, 25, 27, 32, 41, 44, 48, 54, 55], "somatiledbcontext": [4, 5, 18, 21, 55], "float32": [4, 35, 46, 49, 51, 55], "read": [4, 5, 9, 10, 15, 19, 24, 25, 27, 28, 31, 32, 33, 35, 40, 41, 42, 43, 44, 45, 46, 48, 49, 50, 51, 52, 53, 55, 57, 58, 59, 61], "cell": [4, 7, 8, 9, 12, 13, 16, 17, 20, 22, 27, 28, 31, 34, 36, 37, 38, 39, 40, 42, 46, 53, 59, 60, 61, 64], "ob": [4, 8, 9, 12, 13, 14, 15, 23, 27, 32, 33, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 55, 58, 59, 60, 61], "dens": [4, 9, 28, 31, 33, 40], "numpi": [4, 19, 28, 31, 38, 40, 42, 43, 44, 45, 47, 51, 52, 61], "without": [4, 31, 36, 40, 43, 45, 61], "nan": [4, 43, 55, 59], "valu": [4, 8, 9, 12, 13, 14, 15, 16, 19, 23, 24, 25, 26, 28, 29, 33, 35, 36, 41, 42, 43, 44, 46, 47, 48, 49, 50, 51, 54, 55, 56, 57, 59, 60, 61, 64], "us": [4, 5, 9, 10, 11, 12, 13, 14, 15, 17, 18, 21, 22, 26, 27, 29, 30, 32, 33, 34, 35, 36, 37, 38, 41, 42, 43, 44, 46, 48, 49, 50, 52, 53, 54, 55, 57, 58, 59, 60, 61, 62, 63, 64], "verifi": 4, "content": [4, 8, 27, 29, 32, 33, 34, 35, 36, 37, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 56, 57, 58, 60, 61], "from": [4, 7, 8, 9, 14, 16, 19, 24, 25, 27, 29, 30, 31, 32, 33, 34, 35, 37, 38, 39, 40, 41, 46, 47, 49, 51, 52, 54, 55, 57, 58, 59, 60, 61, 63], "same": [4, 24, 25, 32, 33, 43, 45, 46, 49, 53, 54, 55, 57, 59], "slice": [4, 12, 15, 24, 25, 28, 31, 37, 38, 39, 40, 41, 44, 49, 51, 52, 53, 55, 57, 64], "custom": [4, 5, 18, 21, 27], "tiledbsoma": [4, 5, 7, 8, 9, 12, 13, 14, 15, 18, 21, 
24, 25, 27, 32, 43, 49, 51, 54, 55, 59, 60, 61, 62], "open": [4, 5, 18, 20, 21, 23, 24, 25, 27, 29, 31, 32, 38, 40, 42, 44, 45, 50, 55, 56, 59, 64], "soma": [4, 5, 9, 10, 12, 15, 16, 17, 18, 21, 23, 24, 28, 29, 31, 32, 34, 35, 36, 38, 40, 41, 43, 48, 49, 51, 52, 53, 55, 59, 60, 61, 62], "object": [4, 5, 9, 12, 15, 18, 19, 20, 21, 23, 25, 28, 31, 35, 37, 40, 41, 42, 43, 44, 48, 49, 50, 53, 55, 57, 61, 64], "option": [4, 5, 12, 13, 17, 20, 21, 27, 30, 35, 36, 53, 56, 63], "ar": [4, 6, 9, 11, 12, 13, 14, 17, 21, 23, 24, 25, 26, 27, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 55, 56, 57, 59, 60, 61, 63], "position": [4, 51, 52], "other": [4, 7, 11, 24, 25, 35, 37, 41, 43, 46, 49, 51, 52, 53, 54, 55, 57, 63], "word": [4, 35, 36, 38, 43, 51, 52, 55], "identifi": [4, 12, 13, 17, 24, 29, 34, 36, 43, 46], "correspond": [4, 14, 17, 25, 27, 35, 38, 41, 43, 44, 45, 46, 47, 48, 49, 51, 54, 55, 57], "ith": 4, "posit": [4, 9, 41, 44, 45, 51], "mismatch": 4, "obs_somaids_to_fetch": 4, "np": [4, 38, 42, 43, 44, 45, 47, 51, 55], "10": [4, 17, 28, 29, 32, 37, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 52, 53, 54, 55, 56, 57, 60, 61], "11": [4, 17, 28, 30, 41, 42, 43, 44, 45, 46, 47, 49, 52, 53, 54, 55, 56, 57, 60, 61, 63], "emb": [4, 43, 45, 55], "shape": [4, 41, 43, 44, 49, 51, 54, 55, 61], "2": [4, 9, 16, 17, 18, 20, 21, 24, 27, 28, 29, 30, 32, 34, 36, 37, 38, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 63], "0": [4, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 29, 32, 33, 34, 37, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 55, 56, 57, 58, 59, 60, 61], "4": [4, 9, 24, 29, 32, 35, 38, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 61], "02954102": 4, "1": [4, 9, 14, 18, 24, 25, 27, 29, 32, 33, 34, 36, 37, 38, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 63], "0390625": 4, "14550781": 4, "40820312": 4, "00224304": 4, "265625": 4, 
"05883789": 4, "7890625": 4, "get_experiment_metadata": 5, "If": [6, 7, 9, 11, 12, 13, 14, 17, 18, 21, 27, 28, 29, 30, 31, 34, 35, 36, 38, 40, 41, 44, 48, 54, 55, 56, 61, 63], "more": [6, 9, 13, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 35, 36, 40, 41, 42, 43, 45, 46, 47, 48, 49, 51, 54, 55, 56, 57, 59, 61, 62], "match": [6, 12, 45, 47, 48, 53, 55, 56, 57, 59], "queri": [6, 7, 8, 9, 12, 13, 14, 15, 19, 24, 29, 31, 35, 38, 39, 40, 41, 42, 44, 47, 48, 51, 53, 54, 58, 59, 60, 61], "most": [6, 17, 24, 28, 29, 35, 41, 42, 43, 44, 45, 48, 54, 59, 61, 64], "recent": [6, 17, 23, 29], "one": [6, 7, 12, 15, 19, 21, 28, 29, 33, 34, 35, 36, 37, 38, 41, 42, 43, 45, 47, 53, 54, 55, 56, 57, 61], "either": [6, 17, 20, 27, 28, 35, 61], "var_embed": [6, 15, 55], "class": [7, 8, 9, 10, 19, 32, 33, 39, 49, 51, 52, 55], "experi": [7, 8, 9, 12, 15, 18, 25, 33, 35, 38, 43, 48, 49, 52, 53, 55, 58, 59, 61], "layer_nam": 7, "raw": [7, 9, 12, 13, 14, 15, 24, 25, 32, 33, 41, 43, 44, 49, 51, 54, 55, 60, 61], "block_siz": 7, "int": [7, 8, 9, 10, 11, 12, 13, 14, 15, 35, 43, 47, 51], "kwarg": [7, 8], "abstract": 7, "base": [7, 9, 12, 17, 23, 24, 25, 31, 33, 35, 36, 40, 41, 43, 44, 45, 47, 49, 54, 55, 56, 57, 59, 64], "method": [7, 8, 9, 10, 11, 12, 13, 18, 22, 25, 26, 27, 28, 29, 41, 43, 46, 48, 49, 51, 53, 55, 57, 59, 61, 64], "process": [7, 8, 9, 11, 24, 25, 28, 41, 45, 51, 54], "experimentaxisqueri": [7, 8, 13, 14, 59, 60], "result": [7, 8, 9, 12, 13, 14, 17, 24, 32, 42, 43, 47, 48, 49, 51, 55, 57, 59, 60, 61], "hug": [7, 8], "face": [7, 8, 38], "item": [7, 8, 33, 41, 48, 53, 61], "repres": [7, 14, 23, 29, 33, 35, 44, 55, 60], "subclass": [7, 43], "implement": [7, 22, 28, 31, 35, 40, 51, 59, 61, 64], "cell_item": 7, "row": [7, 9, 12, 13, 14, 19, 25, 32, 35, 41, 43, 44, 49, 51, 52, 53, 55, 56, 57, 58, 59, 60, 61], "x": [7, 9, 12, 13, 14, 15, 25, 32, 33, 37, 38, 41, 42, 43, 44, 45, 46, 47, 54, 55, 60, 61], "layer": [7, 9, 12, 13, 14, 15, 22, 31, 35, 37, 40, 42, 52, 56, 60], "mai": [7, 9, 12, 15, 
17, 23, 26, 28, 29, 30, 31, 35, 36, 37, 38, 40, 41, 42, 43, 51, 52, 53, 54, 55, 61], "also": [7, 9, 17, 24, 26, 27, 28, 30, 43, 45, 47, 48, 52, 53, 54, 55, 56, 57, 59, 61, 63], "overrid": [7, 18, 21], "__init__": [7, 8, 9, 10, 51, 61], "__enter__": 7, "perform": [7, 9, 17, 24, 25, 29, 30, 31, 32, 35, 40, 41, 42, 44, 46, 51, 54, 55, 57, 60, 61], "preprocess": [7, 44], "inherit": 7, "so": [7, 9, 28, 41, 42, 43, 44, 45, 46, 47, 48, 51, 52, 61], "typic": [7, 43, 61], "usag": [7, 8, 9, 24, 27, 28, 32, 37, 42, 54, 61], "would": [7, 9, 42, 54, 61], "import": [7, 8, 24, 25, 26, 27, 29, 32, 37, 38, 41, 42, 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 63], "geneformertoken": 7, "open_soma": [7, 8, 12, 15, 18, 23, 24, 25, 26, 27, 29, 32, 34, 38, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 63], "subclassofcelldatasetbuild": 7, "census_data": [7, 8, 23, 24, 25, 26, 27, 32, 41, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 57, 58, 59, 60, 61], "obs_queri": [7, 8, 9, 24, 25, 32, 49, 51, 54, 55, 59, 60, 61], "tilebsoma": 7, "axisqueri": [7, 8, 9, 24, 25, 32, 49, 51, 54, 55, 59, 60, 61], "defin": [7, 8, 12, 13, 28, 33, 34, 35, 36, 38, 41, 48, 51, 56, 57], "some": [7, 8, 9, 11, 24, 26, 35, 41, 42, 43, 44, 45, 46, 47, 54, 56, 63], "subset": [7, 8, 12, 26, 42, 43, 44, 45, 46, 47, 55, 60, 61], "var_queri": [7, 9, 24, 51, 61], "builder": 7, "build": [7, 8, 12, 15, 26, 27, 28, 29, 30, 33, 35, 36, 41, 44, 49, 55, 56], "initi": [7, 32, 35, 47, 49, 54, 55], "measur": [7, 9, 12, 15, 19, 24, 25, 33, 35, 39, 44, 46, 53, 55], "number": [7, 8, 9, 11, 12, 13, 14, 17, 25, 29, 35, 44, 45, 46, 47, 49, 51, 53, 54, 55, 59, 60, 61, 62], "memori": [7, 9, 18, 22, 23, 25, 26, 28, 30, 31, 37, 39, 40, 48, 51, 53, 54, 56, 57, 61, 63, 64], "onc": [7, 12, 13, 17, 23, 29, 41, 48, 51, 61], "unspecifi": 7, "sparsendarrayread": 7, "blockwis": [7, 55], "select": [7, 12, 13, 14, 15, 25, 32, 34, 35, 38, 42, 43, 44, 45, 49, 52, 53, 54, 55, 57, 59], 
"pass": [7, 9, 11, 18, 42, 47, 51, 56, 57, 61], "especi": 7, "attribut": [7, 8, 9, 10, 45, 49, 55, 56, 61], "obs_column_nam": [8, 9, 23, 25, 32, 61], "sequenc": [8, 9, 12, 13, 15, 31, 33, 38, 39, 40, 42, 43, 44, 52, 53, 55], "obs_attribut": 8, "max_input_token": 8, "2048": 8, "token_dictionary_fil": 8, "gene_median_fil": 8, "geneform": [8, 29, 39, 43, 55], "token": 8, "human": [8, 23, 25, 32, 33, 35, 36, 38, 39, 42, 43, 45, 48, 49, 52, 53, 54, 57, 58], "packag": [8, 22, 28, 30, 31, 32, 37, 40, 41, 42, 43, 44, 46, 47, 48, 49, 50, 51, 52, 55, 56, 57, 59, 60, 63], "separ": [8, 35, 43, 54, 56, 59], "co": [8, 28, 31, 40], "ctheodori": 8, "8df5dc1": 8, "latest": [8, 16, 17, 21, 25, 26, 30, 36, 38, 41, 42, 48, 50, 52, 55, 56, 57], "set": [8, 9, 18, 21, 24, 25, 32, 36, 42, 43, 45, 47, 52, 59, 61], "value_filt": [8, 12, 15, 23, 24, 25, 27, 29, 32, 41, 42, 43, 44, 46, 49, 50, 51, 54, 55, 57, 58, 59, 60, 61], "is_primary_data": [8, 12, 24, 31, 33, 35, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61], "true": [8, 9, 12, 14, 17, 24, 30, 35, 36, 38, 41, 42, 43, 44, 46, 47, 48, 51, 54, 55, 56, 57, 58, 59, 60, 61], "tissue_gener": [8, 12, 15, 23, 24, 27, 32, 35, 41, 43, 44, 49, 53, 54, 55, 56, 57, 58, 59, 60, 61], "tongu": [8, 49, 52, 54, 55, 61], "soma_joinid": [8, 9, 12, 14, 15, 19, 24, 26, 29, 32, 35, 41, 42, 43, 44, 46, 49, 50, 51, 52, 53, 55, 56, 57, 58, 59, 60], "cell_type_ontology_term_id": [8, 26, 35, 41, 44, 48, 49, 53, 55, 56, 57, 58, 60], "input_id": [8, 45], "length": [8, 35, 38, 39, 41, 44, 45, 50], "datafram": [8, 9, 12, 13, 14, 15, 19, 23, 25, 26, 32, 33, 35, 41, 43, 44, 48, 49, 51, 52, 53, 55, 56, 57, 59, 60, 61], "column": [8, 9, 12, 13, 14, 15, 32, 33, 35, 41, 43, 44, 45, 48, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61], "propag": [8, 61], "maximum": [8, 9, 12, 13, 61], "input": [8, 25, 51, 57, 61], "pickl": [8, 61], "suppli": 8, "map": [8, 23, 35, 41, 44, 45, 47, 51, 52, 53], "ensembl": [8, 45, 47, 56], "gene": [8, 9, 
12, 13, 22, 23, 28, 31, 32, 33, 37, 38, 39, 40, 43, 45, 47, 49, 54, 55, 61], "id": [8, 35, 41, 42, 43, 45, 46, 47, 49, 51, 55, 56], "onto": 8, "median": 8, "express": [8, 25, 28, 35, 42, 43, 47, 49, 51, 55], "By": [8, 23, 24, 25, 26, 37, 41, 46, 56], "load": [8, 11, 23, 26, 28, 31, 38, 40, 42, 44, 47, 50, 57, 61, 63], "x_name": [9, 12, 15, 25, 49, 55, 61], "batch_siz": [9, 11, 61], "shuffl": [9, 11, 61], "bool": [9, 14, 17, 43], "fals": [9, 14, 17, 18, 24, 26, 27, 35, 36, 41, 42, 43, 44, 45, 47, 54, 56, 57, 58, 59, 60], "seed": [9, 42, 61], "return_sparse_x": 9, "soma_chunk_s": [9, 61], "use_eager_fetch": 9, "torchdata": [9, 11, 61], "datapip": [9, 11, 61], "iter": [9, 11, 23, 25, 27, 32, 37, 51, 54, 61, 64], "iterdatapip": [9, 11, 61], "upon": [9, 12, 21, 29, 41, 48, 59], "along": [9, 14, 23, 25, 36, 45, 60, 61], "var": [9, 12, 13, 14, 15, 19, 24, 32, 33, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 52, 53, 55, 56, 59, 60, 61], "ax": [9, 14, 61], "provid": [9, 21, 27, 28, 29, 31, 32, 33, 34, 35, 37, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 52, 53, 54, 55, 56, 58, 59, 61, 62], "over": [9, 13, 14, 21, 25, 32, 51, 55, 56, 60], "when": [9, 11, 12, 13, 26, 34, 35, 36, 43, 51, 55, 56, 58, 59, 61], "": [9, 13, 17, 23, 27, 28, 35, 38, 39, 41, 42, 43, 44, 45, 46, 47, 49, 51, 52, 54, 55, 57, 60, 61], "built": [9, 25, 31, 35, 40, 62, 64], "function": [9, 12, 13, 25, 28, 29, 41, 51, 55, 56, 58, 59, 61, 62], "batch": [9, 12, 13, 24, 38, 43, 45, 47, 51, 59, 61], "x_batch": [9, 61], "y_batch": [9, 61], "control": [9, 24, 59, 61], "tensor": [9, 61], "have": [9, 17, 23, 25, 29, 30, 31, 35, 37, 40, 41, 42, 43, 46, 47, 48, 49, 51, 52, 55, 59, 61], "rank": [9, 12, 13, 59, 61], "2415": 9, "torch": [9, 11, 61], "encod": [9, 41, 42, 48, 49, 51, 55, 61], "For": [9, 14, 23, 25, 26, 27, 28, 30, 31, 32, 34, 35, 36, 37, 40, 41, 42, 43, 44, 45, 46, 47, 48, 51, 52, 54, 55, 56, 57, 59, 61, 62, 63], "larger": [9, 28, 31, 32, 40, 43, 51, 64], "dataload": [9, 11], "3": [9, 12, 13, 24, 28, 29, 30, 
32, 38, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 63], "2416": 9, "2417": 9, "whether": [9, 56], "spars": [9, 14, 19, 28, 31, 32, 33, 38, 40, 42, 46, 51, 52, 55], "model": [9, 12, 13, 29, 31, 35, 40, 42, 43, 49, 55, 59, 64], "support": [9, 11, 22, 27, 30, 31, 33, 34, 36, 40, 43, 45, 56, 60, 61, 64], "reduc": [9, 18, 25, 44, 49, 54, 55, 61], "determin": [9, 52, 61], "element": [9, 14, 19, 51, 52, 60], "alwai": [9, 17, 26, 33, 35, 54], "panda": [9, 12, 13, 14, 19, 26, 28, 31, 32, 40, 41, 43, 44, 48, 51, 52, 53, 57, 58, 59, 60, 61], "equival": [9, 25, 49, 51, 55], "soma_dim_0": [9, 49, 51, 54, 55], "matrix": [9, 14, 19, 25, 28, 31, 32, 33, 39, 40, 41, 42, 43, 44, 46, 49, 55, 56], "remain": [9, 43], "string": [9, 12, 13, 26, 34, 35, 36, 55, 57, 61], "integ": [9, 12, 15, 33, 36, 44, 46, 51, 61], "need": [9, 26, 30, 32, 35, 38, 41, 42, 45, 47, 52, 54, 57, 63], "decod": [9, 55, 61], "obtain": [9, 24, 38, 41, 42, 44, 45, 47, 54, 57, 61], "call": [9, 12, 13, 27, 29, 32, 34, 41, 42, 44, 46, 48, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61], "its": [9, 18, 23, 25, 27, 29, 31, 33, 35, 40, 42, 45, 47, 48, 52, 54, 57, 61, 64], "inverse_transform": [9, 61], "exp_data_pip": 9, "obs_encod": [9, 61], "obs_attr_nam": 9, "encoded_valu": 9, "construct": [9, 41, 43, 44, 52, 53, 55], "new": [9, 29, 31, 32, 35, 40, 42, 45, 56, 61], "filter": [9, 12, 15, 17, 23, 24, 25, 28, 29, 31, 32, 33, 37, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 55, 56, 57, 58, 61], "axi": [9, 12, 14, 15, 24, 33, 35, 43, 44, 45, 46, 47, 48, 49, 51, 54, 55, 59, 60, 61], "veri": [9, 58], "larg": [9, 23, 41, 44, 51, 54, 55, 56, 57, 58], "featur": [9, 19, 31, 32, 33, 37, 39, 40, 44, 45, 47, 49, 52, 55, 56, 59], "doe": [9, 20, 34, 47, 51, 55, 61], "onli": [9, 13, 14, 17, 21, 24, 25, 29, 32, 33, 34, 35, 36, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 54, 55, 57, 59, 61], "being": [9, 61], "singl": [9, 12, 13, 22, 27, 31, 34, 35, 37, 38, 39, 40, 41, 42, 43, 46, 52, 54, 
55, 56, 61, 62, 64], "multipl": [9, 12, 13, 17, 31, 33, 35, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 55, 57, 58], "reason": [9, 17, 36, 43], "two": [9, 25, 27, 33, 35, 38, 41, 42, 49, 55, 56, 57, 59], "step": [9, 24, 28, 44, 45, 61], "global": [9, 43, 44, 61], "contigu": 9, "group": [9, 35, 41, 43, 45, 58], "chunk": [9, 14, 28, 54, 61], "random": [9, 42, 43, 44, 45, 47, 61], "within": [9, 32, 35, 37, 38, 41, 43, 55, 61], "sinc": [9, 11, 23, 28, 29, 37, 42, 44, 54, 56, 61], "retriev": [9, 10, 15, 21, 23, 25, 34, 35, 39, 41, 49, 61], "keep": [9, 25, 45, 58, 63], "fix": [9, 28, 35, 61], "size": [9, 14, 33, 35, 43, 45, 47, 55, 58, 61], "ensur": [9, 25, 28, 32, 38, 41, 42, 44, 46, 48, 50, 51, 52, 53, 54, 56, 57, 58, 59, 60, 61], "non": [9, 17, 24, 28, 33, 35, 38, 41, 43, 44, 45, 51, 54, 55, 57], "occur": [9, 12, 13, 28, 55], "second": [9, 19, 23, 37, 38, 49, 52, 55, 61, 63], "note": [9, 23, 26, 31, 33, 34, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 55, 56, 57, 58, 61], "maintain": [9, 42, 49, 55], "proxim": [9, 44, 58], "even": [9, 26, 55], "after": [9, 28, 37, 38, 44], "suffici": [9, 28, 61, 63], "train": [9, 39, 42, 55], "To": [9, 18, 23, 25, 28, 29, 30, 31, 35, 38, 40, 41, 42, 43, 44, 45, 46, 47, 49, 50, 53, 54, 55, 56, 57, 61, 64], "end": [9, 35, 42, 43, 54], "treat": 9, "hyperparamet": 9, "tune": [9, 29, 49], "nn": [9, 61], "parallel": [9, 51], "distributeddataparallel": 9, "partit": 9, "disjoint": [9, 33], "across": [9, 23, 28, 31, 32, 33, 35, 37, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 57, 58], "worker": [9, 11], "As": [9, 24, 31, 32, 33, 35, 40, 42, 46, 49, 52, 55, 57, 60, 64], "still": [9, 42], "impact": [9, 43], "aspect": 9, "behavior": 9, "util": [9, 11, 24, 28, 38, 43, 45, 47, 48, 51, 54, 55, 63], "better": [9, 35, 39], "granular": [9, 61], "detail": [9, 24, 26, 27, 31, 32, 40, 42, 43, 57, 61], "gib": 9, "ram": [9, 51, 56, 63], "per": [9, 12, 13, 24, 25, 28, 35, 41, 44, 46, 52, 61], "request": 
[9, 27, 28, 31, 40, 45, 47, 48, 51, 58, 59, 61], "assum": [9, 13, 35, 43, 51, 61], "sparsiti": 9, "95": 9, "depend": [9, 23, 28, 30, 42, 45, 47], "next": [9, 23, 25, 27, 32, 61], "immedi": [9, 37, 38], "previous": [9, 43, 44], "made": [9, 43], "via": [9, 10, 27, 28, 29, 30, 31, 32, 34, 35, 40, 41, 42, 44, 45, 46, 47, 48, 53, 57, 61, 63, 64], "allow": [9, 23, 25, 47, 48, 54, 61], "network": 9, "filesystem": 9, "client": [9, 28], "side": 9, "potenti": [9, 31, 40, 43], "improv": 9, "overal": [9, 26, 61], "cost": [9, 28], "doubl": [9, 35], "n_ob": [10, 32, 44, 45, 47, 49, 51, 53, 55, 56, 57], "nnz": [10, 14, 25, 35, 49, 55], "elaps": 10, "n_soma_chunk": 10, "statist": [10, 14, 22, 51, 58], "about": [10, 23, 25, 28, 31, 32, 35, 36, 38, 39, 40, 42, 46, 48, 49, 54, 55, 56, 57], "experimentdatapip": [10, 11], "api": [10, 13, 24, 25, 29, 30, 31, 32, 34, 35, 37, 39, 40, 41, 42, 44, 46, 48, 52, 53, 56, 57, 61, 63, 64], "assess": [10, 43, 44], "throughput": 10, "attr": 10, "num_work": 11, "dataloader_kwarg": 11, "factori": 11, "safe": 11, "instanti": [11, 61], "work": [11, 23, 25, 30, 31, 40, 41, 63], "constructor": [11, 61], "applic": [11, 55], "sampler": [11, 61], "batch_sampl": [11, 61], "collate_fn": [11, 61], "ha": [11, 12, 13, 23, 25, 31, 33, 35, 37, 38, 40, 41, 42, 45, 46, 49, 52, 54, 55, 63, 64], "been": [11, 23, 25, 29, 37, 55, 63], "chain": [11, 61], "main": [11, 28, 30, 33, 38, 43, 49, 54, 55], "addit": [11, 15, 30, 31, 35, 38, 40, 41, 45, 47, 53, 56, 59, 60], "keyword": [11, 37], "argument": [11, 12, 13, 18, 21, 24, 25, 56, 57, 59, 60], "except": [11, 41, 43, 46, 57], "param": [11, 21], "collect": [12, 15, 19, 21, 27, 29, 33, 36, 41, 44, 45, 46, 47, 50, 52, 56], "obs_value_filt": [12, 15, 23, 24, 25, 29, 32, 42, 43, 45, 46, 47, 49, 50, 53, 54, 55, 57, 59, 60], "obs_coord": [12, 15, 43, 44], "byte": [12, 15], "float": [12, 13, 15, 25, 54, 61], "datetime64": [12, 15], "timestamptyp": [12, 15], "chunkedarrai": [12, 15], "var_value_filt": [12, 15, 23, 25, 32, 50, 54, 
57], "var_coord": [12, 15, 44], "n_top_gen": [12, 13, 24, 42, 44, 46, 59], "flavor": [12, 13, 42, 44], "liter": [12, 13], "seurat_v3": [12, 13, 42, 44, 59], "span": [12, 13, 28, 43, 59], "batch_kei": [12, 13, 24, 42, 59], "max_loess_jitt": [12, 13], "1e": [12, 13, 61], "06": [12, 13, 36], "batch_key_func": [12, 13], "callabl": [12, 13], "convienc": 12, "wrapper": [12, 15, 27, 41, 59], "around": [12, 15, 32, 59], "highly_variable_gen": [12, 24, 42, 44, 45, 46], "execut": [12, 15, 27, 54], "annot": [12, 13, 28, 33, 35, 41, 42, 44, 45, 47, 59], "variabl": [12, 13, 25, 26, 28, 31, 32, 33, 35, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 57, 58, 61], "usual": [12, 15, 19, 24, 28, 61], "homo": [12, 15, 19, 23, 24, 25, 29, 32, 33, 35, 41, 44, 45, 47, 52, 54, 57, 58], "sapien": [12, 15, 19, 23, 24, 25, 29, 32, 33, 35, 41, 44, 45, 47, 52, 54, 57, 58], "mu": [12, 15, 19, 29, 35, 41, 42, 46, 53, 58], "musculu": [12, 15, 19, 29, 35, 41, 42, 46, 53, 58], "syntax": [12, 15], "coordin": [12, 15, 43, 51], "fraction": [12, 13, 24, 59], "estim": [12, 13, 59], "loess": [12, 13, 59], "varianc": [12, 13, 14, 25, 35, 39, 59], "fit": [12, 13, 42, 47, 48, 51, 59], "combin": [12, 13, 23, 28, 35, 41, 42, 43, 44, 47, 48, 51, 52, 54, 57], "kei": [12, 13, 35, 36, 41, 42, 43, 44, 49, 51, 55, 57], "convert": [12, 13, 23, 32, 51], "concaten": [12, 13, 32, 42, 54, 55, 60], "them": [12, 13, 23, 27, 28, 42, 45, 49, 54, 55, 57], "max_lowess_jitt": [12, 13, 59], "jitter": [12, 13, 44, 59], "add": [12, 13, 15, 25, 30, 35, 45, 46, 49, 51, 55], "case": [12, 13, 33, 35, 41, 42, 43, 46, 51, 54, 55, 59, 60, 61], "failur": [12, 13], "low": [12, 13, 28, 31, 40], "entri": [12, 13, 34], "count": [12, 13, 24, 25, 28, 31, 32, 33, 39, 40, 42, 44, 45, 46, 48, 53, 54, 57], "receiv": [12, 13, 44], "seri": [12, 13, 26, 35, 44, 51], "paramat": [12, 42], "hvg": [12, 13, 24, 59], "lung": [12, 15, 28, 35, 38, 41, 42, 45, 48, 52, 53, 54, 56, 57], "500": [12, 25, 28, 44, 46, 59], "anndata": [12, 
15, 25, 27, 28, 31, 35, 40, 42, 43, 44, 45, 46, 47, 51, 53, 57, 64], "top": [12, 21, 24, 35, 36, 44, 48, 53, 58, 59], "mus_musculu": [12, 35, 46, 48, 51, 53, 54, 55, 56, 57, 59, 60], "highli": [12, 13, 28, 39, 42, 43, 44, 45, 46, 47, 61, 63, 64], "just": [12, 24, 28, 41, 44, 51, 54, 56], "hvg_soma_id": 12, "highly_vari": [12, 24, 44, 45, 46, 59], "adata": [12, 25, 32, 35, 42, 43, 44, 45, 47, 49, 50, 53, 54, 55, 56, 57], "get_anndata": [12, 25, 32, 42, 43, 44, 45, 46, 47, 50, 53, 54, 57, 59], "scanpi": [13, 24, 28, 32, 38, 42, 43, 44, 45, 46, 47, 49, 53, 55, 56, 59, 62], "mimic": 13, "seurat": [13, 24, 25, 28, 30, 31, 37, 40, 64], "v3": [13, 24, 30, 32, 41, 44, 57], "readthedoc": [13, 42, 44, 46], "en": [13, 42, 44, 46], "inform": [13, 25, 27, 28, 31, 33, 34, 37, 38, 40, 41, 42, 43, 44, 45, 47, 53, 54, 55, 56, 57, 59, 62], "ident": [13, 41], "those": [13, 24, 35, 42, 44, 46, 51], "produc": 13, "donor_id": [13, 35, 38, 41, 44, 49, 53, 55, 56, 57, 60], "lambda": [13, 47], "batch0": 13, "99": 13, "els": [13, 43, 52, 61], "batch1": 13, "calculate_mean": [14, 24, 60], "calculate_vari": [14, 24, 60], "ddof": [14, 60], "nnz_onli": 14, "calcul": [14, 22, 35, 39, 42, 43, 45], "mean": [14, 24, 35, 38, 39, 59, 63], "accumul": [14, 24, 51], "fashion": [14, 23, 24, 37], "total": [14, 24, 28, 29, 33, 35, 41, 44, 46], "n": [14, 25, 28, 32, 33, 35, 41, 44, 46, 49, 50, 51, 55, 60], "dimens": [14, 19, 33, 49, 52, 55, 61], "wise": [14, 44], "metric": [14, 38, 43, 47], "explicitli": [14, 25, 35, 55], "store": [14, 19, 25, 33, 35, 36, 38, 41, 43, 45, 48, 49, 52, 55, 56], "comput": [14, 23, 24, 28, 31, 40, 41, 60, 61], "otherwis": [14, 35, 36, 54], "skip": 14, "delta": [14, 51, 60], "degre": [14, 43, 60], "freedom": [14, 60], "divisor": [14, 60], "x_layer": [15, 25], "obsm_lay": [15, 43, 45, 47], "obsp_lay": 15, "varm_lay": 15, "varp_lay": 15, "column_nam": [15, 23, 25, 27, 32, 41, 43, 44, 48, 49, 50, 51, 53, 54, 55, 56, 57, 58], "axiscolumnnam": 15, "conveni": [15, 27, 41, 48, 51, 52, 
53, 57, 59, 64], "obsm": [15, 29, 33, 42, 43, 45, 47], "slot": [15, 29, 43], "obsp": [15, 43], "varm": [15, 33], "varp": [15, 35, 52], "part": [15, 38, 42, 43], "get_all_available_embed": [15, 55], "experiment": [15, 18, 24, 30, 35, 39, 43, 49, 55, 60, 61], "brain": [15, 25, 32, 41, 51], "tissu": [15, 23, 25, 27, 29, 32, 35, 38, 39, 45, 46, 48, 49, 50, 51, 53, 54, 55, 57, 60], "censusversiondescript": [16, 17], "descript": [16, 17, 28, 31, 33, 35, 37, 40, 55, 57, 62], "directori": [16, 17, 30, 34, 36, 63], "unknown": [16, 44, 56, 57], "get_census_version_directori": 16, "entir": [16, 44, 48, 52, 61], "release_d": [16, 17, 36], "release_build": [16, 17, 36], "2022": [16, 17, 20, 21, 52, 53], "01": [16, 17, 20, 26, 35, 42, 46, 49, 50, 55], "public": [16, 17, 20, 27, 29, 34, 35, 36, 43, 45, 47, 49, 50, 53, 55, 56], "s3_region": [16, 17, 20, 34, 36, 53], "u": [16, 17, 18, 20, 21, 27, 28, 30, 31, 34, 36, 40, 44, 51, 53, 55, 63], "west": [16, 17, 20, 21, 27, 28, 30, 34, 36, 53, 55, 63], "lt": [17, 25, 27, 36, 42, 52], "retract": [17, 36], "flag": [17, 36, 61], "both": [17, 23, 25, 28, 35, 37, 38, 42, 43, 50, 54, 55, 57, 59, 61], "long": [17, 23, 27, 31, 36, 37, 38, 40, 61], "term": [17, 27, 31, 35, 36, 40, 41, 48, 51, 56, 61], "weekli": [17, 27, 31, 36, 40], "exclud": [17, 35, 44, 54, 61], "date": [17, 27, 29, 33, 35, 36, 41, 55], "yyyi": [17, 29, 36, 37], "mm": [17, 29, 36], "dd": [17, 29, 36, 37], "alias": 17, "alia": [17, 36], "appear": [17, 35, 36, 41, 43, 61], "under": [17, 34, 35, 36, 38, 44, 46], "again": [17, 56], "v": [17, 36, 42, 51], "sequenti": 17, "increment": [17, 24, 39], "get_census_version_descript": 17, "29": [17, 44, 45, 61], "v2": [17, 41, 42, 44, 56, 60], "v1": [17, 22, 25, 33, 34, 36, 41, 42, 44], "30": [17, 29, 42, 44, 45, 55, 61], "mistak": 17, "info_url": 17, "com": [17, 25, 28, 31, 34, 37, 38, 40, 45, 47, 50, 55, 56, 63], "errata": 17, "replaced_bi": [17, 36], "tiledb_config": [18, 21, 27, 55], "sensibl": 18, "somacor": 18, "somaobject": 18, 
"replac": [18, 36, 43, 45, 47], "tiledb": [18, 21, 23, 28, 29, 30, 31, 32, 38, 40, 41, 48, 57, 64], "configur": [18, 21, 27, 28, 61], "amount": [18, 56, 58], "oper": [18, 26, 28, 32, 41, 48, 51, 57, 61], "ctx": [18, 27, 55], "py": [18, 21, 42, 44, 46, 56], "init_buffer_byt": [18, 21], "128": [18, 21, 29, 44, 59, 61], "1024": [18, 21], "c": [18, 23, 25, 30, 32, 42, 44, 45, 46, 47, 52, 53, 63], "my": [18, 27], "privat": [18, 27], "access": [18, 20, 28, 29, 31, 32, 33, 34, 35, 37, 38, 39, 40, 41, 43, 46, 48, 57, 58, 61, 64], "differ": [18, 28, 35, 41, 42, 43, 49, 52, 54, 55, 57], "region": [18, 20, 27, 28, 30, 34, 36, 55, 63], "vf": [18, 27, 55], "no_sign_request": [18, 27, 55], "east": [18, 27], "csr_matrix": [19, 38, 42, 46], "presenc": [19, 33, 37, 38, 39, 43, 44, 46], "scipi": [19, 28, 31, 38, 40, 42, 43, 46, 52, 55], "csr_arrai": 19, "deafult": 19, "cannot": [19, 21], "321x60554": 19, "uint8": [19, 52], "6441269": 19, "compress": [19, 52], "format": [19, 27, 35, 36, 37, 51, 52, 62], "censusloc": 20, "guarante": [20, 31, 35, 40, 41, 42], "interest": [20, 31, 33, 40, 41, 43, 52, 54, 56], "_release_directori": 20, "keyerror": 20, "do": [20, 25, 30, 32, 35, 39, 41, 42, 43, 44, 45, 46, 47, 48, 53, 55, 57, 58, 60, 63], "cb5efdb0": 20, "f91c": 20, "4cbd": 20, "9ad4": 20, "9d4fa41c572d": 20, "mirror": 21, "suitabl": [21, 55], "chosen": [21, 34], "automat": [21, 28, 38, 41, 48], "take": [21, 24, 41, 42, 44, 45, 46, 49, 54, 55, 56, 57, 61, 63, 64], "preced": 21, "get_default_soma_context": [21, 27], "level": [21, 33, 35, 36, 37, 38, 41, 45, 51, 53, 54, 56, 58, 59], "It": [21, 28, 29, 33, 35, 37, 38, 41, 55, 59], "manag": [21, 25, 32, 41, 48, 58, 59], "close": [21, 23, 24, 25, 32, 41, 42, 43, 44, 46, 48, 49, 50, 53, 55, 57, 58], "exit": 21, "neither": 21, "invalid": [21, 51], "updat": [21, 24, 28, 35, 37, 42, 44, 46, 51, 55, 56], "31": [21, 44, 45, 61], "rather": [21, 44, 51], "than": [21, 23, 25, 28, 30, 31, 32, 35, 37, 40, 41, 43, 44, 51, 64], "out": [22, 25, 28, 29, 31, 
33, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 55, 56, 57, 58, 61, 63, 64], "effici": [22, 25, 28, 31, 38, 39, 40, 54, 56, 64], "commonli": [22, 56], "introduc": [22, 43, 56], "normal": [22, 27, 29, 31, 32, 33, 37, 38, 39, 40, 41, 43, 47, 55, 57, 59, 60], "pre": [22, 24, 28, 36, 39, 41, 45, 54, 55], "categor": [22, 31, 40, 56], "publish": [23, 24, 25, 26, 28, 29, 31, 35, 37, 40], "august": [23, 37], "7th": 23, "pablo": [23, 24, 25, 26, 37], "garcia": [23, 24, 25, 26, 37], "nieto": [23, 24, 25, 26, 37], "team": [23, 24, 25, 28, 37], "pleas": [23, 25, 27, 28, 31, 37, 40, 42, 43, 44, 45, 46, 47, 54, 56, 64], "announc": [23, 24, 25, 37], "come": [23, 32, 37, 42, 43, 44], "our": [23, 25, 28, 32, 37, 41, 42, 43, 45, 47, 49, 55], "back": [23, 37, 42, 45, 61], "now": [23, 24, 25, 26, 31, 32, 37, 38, 40, 41, 42, 44, 45, 46, 49, 50, 52, 53, 54, 55, 57, 60, 61, 63], "biologist": 23, "largest": [23, 28, 37], "standard": [23, 28, 31, 33, 40, 48, 51], "aggreg": [23, 37], "compos": [23, 33, 37], "60k": [23, 28, 37], "With": [23, 24, 25, 37, 41, 43, 46, 49, 55, 57, 61], "few": [23, 24, 39, 43, 45, 46, 54, 55, 56, 63], "hundr": [23, 37], "bigger": [23, 37], "quickli": [23, 29, 41, 42], "basic": [23, 42, 43, 44, 45, 46, 48, 49, 53, 55, 61], "structur": [23, 31, 35, 36, 40, 41, 43], "downstream": [23, 24, 25, 26, 32, 35, 55], "analysi": [23, 25, 32, 35, 37, 39, 41, 42, 43, 44, 46, 48, 54, 55], "instruct": [23, 28, 32], "learn": [23, 35, 38, 42, 43, 46, 48, 54, 55, 57], "sure": [23, 46], "resourc": [23, 34, 44], "quick": [23, 27, 28, 31, 39, 40, 41, 58, 61], "start": [23, 26, 27, 28, 29, 31, 39, 40, 41, 42, 44], "guid": [23, 27, 38, 42], "refer": [23, 25, 27, 28, 31, 32, 35, 37, 40, 42, 43, 45, 47, 57], "tutori": [23, 24, 28, 29, 31, 32, 40, 43, 44, 45, 46, 47, 49, 51, 53, 54, 55, 57, 58, 59, 60, 61], "reli": 23, "capabl": [23, 37, 39, 43, 52, 64], "shown": [23, 26, 35, 36, 41, 43, 49, 61], "section": [23, 27, 35, 41, 44, 45, 49, 54, 55], "czi": [23, 28, 31, 
40, 62], "develop": [23, 29, 30, 37, 42, 44, 56], "upgrad": [23, 28, 56], "beta": [23, 41, 44, 45], "here": [23, 24, 28, 31, 32, 33, 35, 36, 38, 40, 42, 43, 54, 55, 56, 61], "ever": 23, "grow": 23, "cz": [23, 28, 29, 33, 39, 44, 46, 50, 53, 54], "discov": [23, 28, 29, 33, 38, 41, 44, 45, 49, 50, 53, 54, 55, 62], "accompani": 23, "ontologi": [23, 35, 45, 56], "cl": [23, 35, 41, 44, 45, 48, 57, 58, 60], "uberon": [23, 35, 41, 44, 48, 56, 57, 58, 60], "respect": [23, 25, 30, 35, 41, 43, 56, 57], "you": [23, 25, 28, 29, 30, 31, 32, 33, 38, 39, 40, 41, 42, 44, 45, 46, 47, 48, 49, 51, 53, 54, 55, 56, 57, 58, 61, 63], "find": [23, 25, 29, 31, 33, 38, 40, 41, 43, 45, 46, 47, 48, 49, 52, 55, 59], "schema": [23, 25, 26, 27, 28, 29, 36, 38, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58], "page": [23, 27, 28, 29, 32, 33, 42, 43, 45, 47, 49, 55, 64], "research": [23, 25, 28, 31, 40], "directli": [23, 27, 28, 29, 39, 41, 43, 44, 48, 53, 57, 61, 62], "session": [23, 27, 30], "librari": [23, 26, 28, 29, 30, 32, 33, 35, 41, 44, 61], "your": [23, 25, 28, 30, 31, 39, 40, 48, 53, 54, 55, 58], "navig": 23, "300k": [23, 32], "microgli": [23, 27, 32], "neuron": [23, 25, 27, 32, 41, 45, 52, 58], "femal": [23, 27, 32, 44, 54, 56, 57, 60], "donor": [23, 35, 44, 52, 53, 56], "somadatafram": [23, 32, 41, 48, 57], "cell_metadata": [23, 27, 32, 50], "arrow": [23, 25, 26, 28, 31, 32, 37, 40], "tabl": [23, 25, 32, 33, 39, 42, 43, 44, 46, 50, 51, 52, 54], "sex": [23, 25, 27, 29, 32, 35, 41, 49, 51, 53, 54, 55, 56, 57, 60], "cell_typ": [23, 24, 25, 26, 27, 32, 35, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 53, 54, 55, 56, 57, 58, 60, 61], "assai": [23, 26, 27, 29, 32, 42, 43, 46, 49, 53, 55, 56, 57, 58, 60], "suspension_typ": [23, 27, 32, 35, 38, 41, 44, 49, 53, 55, 56, 57, 60], "diseas": [23, 27, 29, 32, 35, 42, 43, 49, 53, 54, 55, 56, 57, 60], "concat": [23, 24, 32, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 57, 58, 59, 60], "tibbl": [23, 32], "frame": [23, 
27, 28, 31, 32, 33, 40, 41, 52], "similarli": [23, 25, 26, 32, 41, 52, 57], "gene_filt": [23, 24, 25, 32], "feature_id": [23, 24, 25, 32, 35, 41, 44, 45, 47, 49, 51, 52, 53, 55, 56, 57, 59], "ensg00000107317": [23, 25, 32], "ensg00000106034": [23, 25, 32], "cell_filt": [23, 24, 25, 32], "leptomening": 23, "cell_column": [23, 25, 32], "seurat_obj": [23, 25, 32], "get_seurat": [23, 25, 32], "sce_obj": [23, 25, 32], "get_single_cell_experi": [23, 25, 32], "sometim": 23, "too": 23, "overview": [23, 33, 58], "septemb": 24, "18": [24, 41, 42, 44, 45, 47, 55, 60], "thrill": 24, "offici": [24, 35], "wide": [24, 27, 31, 40, 43, 52], "algorithm": [24, 43, 59, 60], "line": [24, 41, 45, 47, 61], "code": [24, 25, 38, 51, 56, 58, 61], "task": [24, 28, 43], "ten": 24, "convent": [24, 36, 41], "laptop": 24, "8gb": 24, "below": [24, 25, 26, 32, 35, 36, 37, 41, 44, 45, 49, 52, 58, 61], "full": [24, 27, 31, 33, 36, 37, 38, 39, 40, 42, 43, 57, 58, 61], "correct": [24, 29, 35, 61], "These": [24, 25, 28, 31, 34, 35, 38, 40, 41, 43, 44, 45, 47, 55], "interwoven": 24, "wai": [24, 41, 48, 49, 52, 54, 55, 57], "seamlessli": 24, "appli": [24, 43, 46, 47], "33m": [24, 28], "continu": [24, 32], "cellxgene_censu": [24, 25, 26, 27, 29, 32, 38, 41, 42, 43, 44, 45, 46, 47, 48, 50, 51, 52, 53, 54, 56, 57, 58, 59, 60, 61, 62, 63], "pp": [24, 42, 43, 44, 45, 46, 47, 49, 55, 59, 60], "mean_vari": [24, 60], "small": [24, 25, 35, 37, 41, 43, 44, 46, 48, 51, 56, 57], "advantag": [24, 49, 55], "cpu": [24, 42, 45, 61], "multiprocess": 24, "speed": [24, 28], "popul": 24, "zero": [24, 25, 33, 35, 43, 51, 55, 59], "futur": [24, 29, 32, 34, 41, 42, 44, 45, 46, 48, 51, 52, 53, 54, 56, 57, 58, 59, 60, 61], "we": [24, 25, 28, 31, 32, 34, 38, 40, 41, 42, 43, 44, 45, 46, 47, 49, 50, 51, 52, 54, 55, 56, 57, 60, 61], "enabl": [24, 28, 29, 35, 56], "easili": [24, 25, 28, 46, 49], "switch": [24, 56], "human_data": 24, "feature_nam": [24, 32, 35, 41, 43, 44, 49, 50, 51, 52, 53, 54, 55, 56, 57, 59], "axis_queri": [24, 
25, 32, 49, 51, 54, 55, 59, 60], "mean_variance_df": 24, "gene_df": 24, "to_panda": [24, 32, 41, 42, 43, 44, 46, 48, 49, 50, 51, 52, 53, 54, 55, 57, 58, 59, 60], "8624": 24, "071926": 24, "5741": 24, "242485": 24, "16437": 24, "8": [24, 30, 32, 41, 42, 43, 44, 45, 46, 47, 49, 52, 54, 55, 56, 57, 59, 60, 61, 63], "233282": 24, "452": 24, "119153": 24, "feature_length": [24, 32, 35, 41, 42, 44, 46, 49, 51, 52, 53, 55, 56, 57, 59], "ensg00000171885": 24, "5943": 24, "ensg00000133703": 24, "6845": 24, "get_highly_variable_gen": 24, "while": [24, 32, 41, 43, 45, 49, 55, 59], "account": [24, 42, 61], "effect": [24, 25, 42, 43, 55], "integr": [24, 28, 31, 38, 40, 43, 44], "particular": [24, 26, 43, 61], "design": [24, 56], "paradigm": [24, 31, 40], "abov": [24, 28, 32, 33, 35, 41, 45, 54, 56, 57, 58], "tweak": 24, "compli": 24, "rule": 24, "thumb": 24, "good": [24, 43, 46, 55], "variances_norm": [24, 59], "003692": 24, "004627": 24, "748221": 24, "003084": 24, "003203": 24, "898657": 24, "014962": 24, "037395": 24, "513473": 24, "218865": 24, "547648": 24, "786928": 24, "002142": 24, "002242": 24, "894955": 24, "60659": [24, 44, 52], "000000": [24, 43, 51, 59], "60660": [24, 44, 52], "60661": [24, 44, 52], "60662": [24, 44, 52], "60663": [24, 44, 52], "octob": 25, "maximilian": 25, "lombardo": 25, "happi": 25, "introduct": 25, "tailor": 25, "empow": 25, "reflect": [25, 35, 43], "expand": [25, 35, 43, 51], "exclus": 25, "thei": [25, 35, 36, 42, 43, 49, 51, 52, 54, 55], "invit": 25, "feedback": 25, "explor": [25, 28, 31, 38, 39, 40, 55], "novel": [25, 44], "were": [25, 28, 33, 35, 41, 42, 43, 44, 46, 52, 54, 55], "mous": [25, 33, 35, 38, 41, 46, 51, 53, 54, 57, 59, 60], "divid": [25, 51, 54], "sum": [25, 26, 35, 43, 44, 45, 47, 48, 51, 53, 61], "point": [25, 33, 36, 43, 51], "precis": [25, 49, 55], "round": 25, "sigma": 25, "artifact": [25, 34, 35, 43], "m": [25, 30, 33, 41, 44, 45, 46, 47, 52, 57, 59, 63], "enrich": 25, "field": [25, 34, 35, 55, 64], "n_measured_ob": [25, 
35, 49, 55], "wa": [25, 35, 43, 46, 47, 52, 53, 55, 56, 61], "augment": 25, "forego": 25, "common": [25, 32, 43, 48, 55, 57, 59, 61], "earli": 25, "raw_sum": [25, 35, 49, 51, 55], "deriv": [25, 45, 46, 55], "raw_mean_nnz": [25, 35, 49, 55], "averag": 25, "raw_variance_nnz": [25, 35, 49, 55], "n_measured_var": [25, 35, 49, 55], "thu": [25, 28, 31, 35, 38, 40, 42, 45, 48, 57], "ensg00000161798": [25, 32, 57], "ensg00000188229": [25, 32, 57], "sympathet": [25, 32], "singlecellexperi": [25, 30, 31, 37, 40], "outlin": 25, "like": [25, 26, 28, 34, 37, 41, 43, 44, 45, 48, 55, 61], "male": [25, 32, 44, 45, 51, 56, 57, 58, 60], "pyarrow": [25, 28, 31, 32, 40, 51, 54], "raw_slic": [25, 32], "somaaxisqueri": [25, 32], "read_next": [25, 32], "print": [25, 32, 43, 48, 50, 52, 53, 54, 55, 56, 61, 63], "encourag": [25, 31, 40], "engag": 25, "share": [25, 28, 31, 40], "invalu": 25, "ongo": 25, "project": [25, 30, 39, 43], "reach": [25, 31, 40, 42], "report": [25, 29, 43, 56], "issu": [25, 28, 29, 43], "repositori": [25, 28, 31, 35, 40, 55, 63], "april": [26, 36], "4th": 26, "2024": [26, 28, 31, 35, 40, 50], "emanuel": 26, "bezzi": 26, "04": [26, 35, 46, 49, 55], "instead": [26, 42, 43, 46, 56, 61], "observ": [26, 33, 35, 42, 51, 54, 56, 58], "smaller": [26, 32, 61], "footprint": 26, "howev": [26, 28, 42, 43, 44, 61], "pipelin": [26, 31, 39, 40], "explain": 26, "adapt": [26, 51, 55], "link": [26, 37, 44, 52, 53], "value_count": [26, 41, 42, 44, 46, 48, 51, 54, 57], "categori": [26, 29, 35, 41, 44, 45, 58], "present": [26, 28, 31, 33, 35, 36, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 55, 57, 58], "groupbi": [26, 44, 47, 56, 58], "pivot": 26, "show": [26, 35, 38, 39, 41, 43, 45, 46, 47, 51, 54, 61], "unus": 26, "factor": [26, 43], "interfac": [26, 45, 47, 49, 55, 56, 61], "inspect": [26, 38, 49, 55, 61], "null": [26, 36], "indic": [26, 28, 33, 35, 38, 41, 43, 44, 46, 51, 52, 55, 57], "int16": 26, "int8": 26, "assay_ontology_term_id": [26, 35, 38, 41, 44, 48, 49, 53, 
55, 56, 57, 60], "development_stag": [26, 35, 41, 44, 49, 53, 55, 56, 57, 60], "development_stage_ontology_term_id": [26, 35, 41, 44, 49, 53, 55, 56, 57, 60], "output": [26, 32, 51, 61], "truncat": 26, "amazon": [27, 28], "web": [27, 28], "servic": [27, 28, 34], "what": [27, 34, 35, 38, 41, 42, 43, 44, 54, 55, 57], "inclus": [27, 35, 48], "criteria": [27, 28, 32, 33, 35, 57], "individu": [27, 31, 35, 40, 41, 42, 46, 54], "root": [27, 35, 36, 63], "definit": [27, 42, 57], "publicli": [27, 28, 29, 31, 36, 40, 64], "uniqu": [27, 28, 29, 35, 41, 42, 43, 44, 48, 51, 54], "05": [27, 36, 37, 49, 54, 56, 61], "bulk": 27, "07": [27, 32, 34, 36, 41, 42, 44, 46, 48, 51, 52, 53, 56, 57, 58, 59, 60, 61], "25": [27, 32, 34, 41, 42, 43, 44, 45, 46, 48, 51, 52, 53, 56, 57, 58, 59, 60, 61], "shell": [27, 45, 47, 53], "sync": [27, 45], "sign": [27, 43, 45, 47], "recommend": [27, 28, 30, 32, 35, 36, 42, 43, 45, 47, 54, 56, 63, 64], "folder": [27, 36, 37, 38, 47], "interact": [27, 31, 35, 40], "document": [27, 28, 32, 35, 36, 38, 41, 42, 46, 48, 55, 57, 64], "last": [28, 29, 35, 36], "jan": 28, "latenc": [28, 31, 40], "acceler": [28, 31, 40], "50m": 28, "mice": 28, "harmon": [28, 31, 37, 40], "label": [28, 35, 36, 41, 43, 44, 45, 47, 50, 54, 56, 58, 61], "multi": [28, 33, 39, 44, 55], "core": [28, 39, 42, 51], "k": [28, 43], "onlin": [28, 29, 31, 36, 40, 60, 64], "t": [28, 35, 42, 44, 45, 46, 47, 48, 50, 53, 54, 57, 58], "covid": [28, 41, 44, 54, 57], "19": [28, 29, 41, 42, 44, 45, 47, 48, 52, 54, 55, 57], "suit": 28, "author": [28, 35], "spatial": [28, 33, 35, 42, 43, 44, 52, 53], "yet": [28, 30], "d": [28, 55, 63], "click": [28, 32], "citat": [28, 31, 35, 39, 40], "guidelin": [28, 31, 40], "offer": [28, 31, 40, 43, 49, 55, 64], "becaus": [28, 42, 44, 46, 54], "therefor": [28, 42, 46, 48, 54, 55], "numer": [28, 43], "incompat": [28, 35], "purpos": 28, "suggest": [28, 43], "fast": 28, "corpu": 28, "60": [28, 45, 54], "gencod": 28, "readi": [28, 45, 61], "cloud": [28, 30, 31, 34, 40, 
53, 64], "matric": [28, 31, 32, 33, 40, 41, 43, 51], "possibl": [28, 35, 38, 45, 57], "due": [28, 41, 43, 51, 61], "free": [28, 56], "aw": [28, 30, 34, 45, 47, 53, 63], "ye": 28, "download_source_h5ad": [28, 53], "help": [28, 32, 38, 41, 46, 48, 55, 56, 57, 59, 61], "pattern": [28, 43], "internet": [28, 30, 56], "limit": [28, 41, 54], "bandwidth": [28, 54, 63], "tactic": 28, "connect": [28, 30, 44, 45, 56, 58, 63], "high": [28, 33, 35, 41, 43, 44, 45, 54, 56, 59, 63], "ethernet": 28, "wifi": 28, "coast": 28, "ec2": [28, 30], "instanc": [28, 30, 35, 43, 48, 56, 63], "There": [28, 30, 44, 45, 48, 49, 52, 54, 55, 59], "environ": [28, 30], "census_env": 28, "activ": [28, 30, 32, 55, 63], "submit": [28, 31, 40], "join": [28, 31, 40, 41, 44, 51, 53, 57, 59], "scienc": [28, 31, 40, 50, 52, 62], "commun": [28, 31, 40, 43, 49, 55], "slack": [28, 31, 37, 40], "question": [28, 41], "channel": [28, 31, 37, 40], "inquir": 28, "accept": [28, 35, 59], "meet": [28, 32, 57, 59], "biolog": [28, 39, 54, 55, 61], "try": [28, 61], "old": [28, 44, 60], "persist": [28, 33], "notebook": [28, 30, 37, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 52, 55, 56, 61, 63], "sh": [28, 30], "restart": 28, "runtim": 28, "reload": [28, 45], "numba": [28, 51], "relat": [28, 36], "magic": 28, "similar": [28, 41, 42, 43, 44, 47, 57, 58, 59], "dbutil": 28, "restartpython": 28, "addition": [28, 42, 43], "node": [28, 41], "cluster": [28, 39, 42, 47], "0d53f00001ghvp3cap": 28, "between": [28, 35, 43, 45], "altern": [28, 61], "ad": [28, 35, 37, 43, 56, 57], "tab": 28, "edit": [29, 35, 36], "decemb": 29, "15th": [29, 31, 40], "stabil": 29, "scientif": 29, "reproduc": [29, 42, 56, 58], "plan": [29, 31, 40], "regular": 29, "everi": [29, 31, 40], "six": [29, 31, 40], "month": [29, 31, 36, 37, 40, 60], "least": [29, 31, 35, 40], "5": [29, 30, 32, 35, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 59, 60, 61], "year": [29, 31, 36, 37, 40, 44], "recogn": 29, "previou": [29, 34, 42, 44, 49, 55], 
"ingest": [29, 54], "hand": 29, "week": [29, 57], "651": 29, "62": [29, 44, 45, 47, 54], "998": 29, "417": 29, "684": 29, "805": 29, "36": [29, 41, 45, 61], "227": [29, 59], "903": 29, "230": 29, "588": [29, 44, 52, 53], "990": 29, "20": [29, 41, 42, 44, 45, 47, 50, 52, 55, 60, 63], "631": 29, "248": [29, 41, 48], "stage": [29, 44, 56, 57, 60], "173": [29, 59], "72": [29, 45], "self": [29, 37, 38, 42, 51, 56, 61], "ethnic": [29, 56], "na": [29, 35, 41, 58, 60], "suspens": [29, 42, 56], "74": [29, 45], "53": [29, 45], "27": [29, 41, 42, 44, 45, 52, 61], "fine": [29, 49, 63], "593": [29, 44, 52, 53], "56": [29, 44, 45], "400": 29, "873": 29, "255": 29, "245": [29, 52], "33": [29, 44, 45, 55, 61], "364": 29, "242": 29, "083": 29, "531": [29, 44], "13": [29, 41, 42, 44, 45, 46, 49, 54, 55], "035": 29, "9": [29, 32, 41, 42, 43, 44, 45, 46, 47, 48, 49, 52, 54, 55, 56, 57, 61], "613": [29, 41, 48, 58], "164": 29, "64": [29, 41, 45], "26": [29, 41, 42, 44, 45, 52, 61], "220": [29, 41, 48, 52], "66": [29, 41, 45, 48], "54": [29, 41, 45], "prevent": [29, 55], "analys": [29, 56], "mark": [29, 35, 41, 43, 54], "is_primari": 29, "exactli": [29, 35], "243": [29, 41, 52], "569": 29, "twice": [29, 41], "wish": [29, 41, 59], "consid": [29, 42], "duplicate_cells_census_lts_2023": 29, "csv": [29, 56], "zip": [29, 47, 51], "562": 29, "794": 29, "728": 29, "086": 29, "032": 29, "758": 29, "887": 29, "914": 29, "318": 29, "493": 29, "362": 29, "604": 29, "226": 29, "68": [29, 45], "51": [29, 44, 45], "61": [29, 45], "linux": [30, 63], "maco": [30, 63], "system": [30, 41, 43, 49, 53, 55, 63], "Or": 30, "tbd": 30, "16": [30, 41, 42, 44, 45, 46, 47, 49, 55, 56, 60, 61], "gb": [30, 56], "mbp": [30, 56], "increas": [30, 31, 40, 56], "virtual": [30, 63], "conda": 30, "venv": [30, 42, 44, 46, 63], "bin": [30, 63], "modul": [30, 38, 39, 42, 61], "less": [30, 31, 40, 43, 61], "complex": [30, 41, 43, 48, 51, 52], "databrick": 30, "faq": [30, 31, 40], "ubuntu": [30, 63], "apt": 30, "libxml2": 30, 
"dev": 30, "libssl": 30, "libcurl4": 30, "openssl": 30, "cmake": 30, "21": [30, 42, 44, 45, 46, 47, 52, 54, 57, 60], "greater": [30, 35, 50], "tool": [30, 38, 43, 47, 56, 63], "xcode": 30, "window": [30, 61], "univers": [30, 43, 55], "cran": 30, "org": [30, 50], "abl": [30, 34], "export": [30, 37, 49, 64], "biocmanag": 30, "quietli": 30, "break": [31, 40, 54], "ve": [31, 40], "central": [31, 40, 49, 55], "hub": [31, 40], "analyz": [31, 40], "significantli": [31, 40], "minim": [31, 40, 43], "studi": [31, 40, 42, 43], "scale": [31, 40, 42, 44, 45, 46], "interoper": [31, 40, 56], "toolkit": [31, 39, 40], "smart": [31, 33, 38, 40, 41, 44, 52, 53, 58, 60], "seq2": [31, 33, 38, 40, 41, 44, 46, 52, 53, 58, 60], "molecul": [31, 33, 35, 40], "10x": [31, 32, 33, 38, 40, 41, 43, 44, 47, 52, 53, 54, 56, 57, 60], "duplic": [31, 33, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 55, 57, 58], "five": [31, 40], "perman": [31, 40], "ask": [31, 40], "email": [31, 37, 40, 55], "believ": [31, 40], "secur": [31, 40], "disclos": [31, 40], "contact": [31, 40], "seamless": [31, 40], "pytorch": [31, 39, 40], "usabl": [31, 40, 61], "area": [31, 40], "On": [31, 40], "demand": [31, 32, 40], "rich": [31, 40, 42], "subsampl": [31, 40], "vignett": [32, 47], "soon": 32, "remind": [32, 49, 52, 55], "etc": [32, 33, 38, 41], "consist": [32, 38, 41, 42, 43, 44, 46, 48, 51, 52, 53, 54, 56, 57, 58, 59, 60, 61], "ey": [32, 52], "379219": 32, "microwel": [32, 41, 44, 57], "seq": [32, 41, 42, 44, 57, 58], "adren": [32, 41], "gland": [32, 41, 45, 54, 55, 58], "379220": 32, "379221": 32, "379222": 32, "379223": 32, "379224": 32, "7": [32, 41, 42, 43, 44, 45, 46, 47, 49, 52, 53, 54, 55, 56, 57, 61], "n_var": [32, 44, 46, 49, 51, 52, 53, 55, 56, 57], "demonstr": [32, 38, 39, 41, 42, 43, 47, 49, 50, 51, 53, 55, 56, 59, 61], "lazi": [32, 49, 54, 55], "evalu": 32, "well": [32, 41, 42, 44, 54, 58], "logic": [32, 44], "wrap": [32, 51, 61], "loop": 32, "r6": 32, "familiar": [32, 35, 42, 44, 46, 61, 
64], "379": 32, "224": 32, "chr": 32, "fema": 32, "6": [32, 34, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 52, 53, 54, 55, 56, 57, 58, 61], "\u2139": 32, "214": 32, "4k": 32, "4744": 32, "sampl": [32, 44, 45, 47, 51], "bioconductor": 32, "ecosystem": 32, "dim": 32, "rownam": 32, "rowdata": 32, "colnam": 32, "obs48350835": 32, "obs48351829": 32, "obs52469564": 32, "obs52470190": 32, "coldata": 32, "reduceddimnam": 32, "mainexpnam": 32, "altexpnam": 32, "sparse_matrix": 32, "state": [32, 43, 44, 52, 53], "monitor": 32, "read_complet": 32, "friendli": [33, 35], "varieti": [33, 43, 48, 51, 55], "hierarchi": 33, "somacollect": [33, 41, 48, 63], "whole": [33, 41, 44], "summary_cell_count": [33, 41, 44, 58], "stratifi": [33, 41, 45], "relev": [33, 35, 39, 41, 57], "independ": [33, 41], "somaexperi": [33, 41, 51], "special": [33, 35, 41, 57], "form": [33, 41, 52, 61], "how": [33, 38, 39, 41, 43, 44, 46, 50, 54, 55, 58, 61], "avialbl": 33, "feature_dataset_presence_matrix": [33, 44, 46], "boolean": [33, 35, 44, 46, 52], "adher": 33, "technologi": [33, 35, 38, 41, 42, 44, 46], "short": [33, 37, 41], "densendarrai": 33, "dimension": [33, 35, 43, 44], "offset": 33, "sparsendarrai": [33, 49, 55], "primari": [33, 35, 38, 43, 45, 58], "geograph": 34, "json": [34, 45, 47, 55], "cziscienc": [34, 45, 47, 50, 55, 56], "base_uri": 34, "three": [34, 56, 57], "gc": 34, "rememb": [34, 41, 54], "relative_uri": 34, "hood": 34, "cloudfront": 34, "registri": 34, "resolv": 34, "against": 34, "onward": 34, "togeth": [34, 61], "could": [34, 43, 47, 61], "deprec": [34, 42], "march": 35, "NOT": [35, 36, 51, 52], "shall": [35, 36], "interpret": [35, 36, 43], "bcp": [35, 36], "14": [35, 36, 41, 42, 44, 45, 46, 49, 52, 55], "rfc2119": [35, 36], "rfc8174": [35, 36], "capit": [35, 36], "hereaft": 35, "visit": [35, 43, 62], "understand": [35, 38, 43], "reader": [35, 38], "throughout": [35, 45, 47, 54, 55], "serv": [35, 46], "deposit": [35, 36, 38], "heart": [35, 52, 54, 59], "left": [35, 37, 38, 42, 44], 
"ventricl": [35, 48], "semver": 35, "major": [35, 44], "delet": 35, "modal": 35, "minor": 35, "compat": 35, "patch": 35, "editori": 35, "impos": 35, "organism_ontology_term_id": 35, "ncbitaxon": 35, "10090": 35, "9606": 35, "feature_refer": 35, "speic": 35, "AND": 35, "compris": 35, "children": 35, "efo": [35, 41, 42, 44, 57, 58, 60], "0002772": 35, "0010183": [35, 41], "nascent": 35, "elong": 35, "target": [35, 41], "manner": [35, 49, 55, 61], "doesn": [35, 44], "concurr": 35, "perturb": 35, "intend": [35, 37, 59, 61], "primarili": [35, 42, 43, 44], "fusion": 35, "modif": 35, "mrna": [35, 41], "trna": 35, "rrna": 35, "viral": 35, "intron": 35, "ribosom": 35, "profil": [35, 41, 44], "umi": 35, "tissue_typ": 35, "equal": [35, 38, 48], "referenc": [35, 44], "whose": [35, 44, 57], "readabl": [35, 36, 38, 44], "census_schema_vers": [35, 41, 50], "census_build_d": [35, 41, 50], "iso": [35, 36, 55], "8601": [35, 36], "dataset_schema_vers": [35, 41, 50], "total_cell_count": [35, 41, 44, 50, 58], "unique_cell_count": [35, 41, 44, 50, 58], "number_donors_homo_sapien": [35, 41, 50], "number_donors_mus_musculu": [35, 41, 50], "10000": [35, 43], "100": [35, 41, 42, 44], "collection_id": [35, 42, 46, 52, 53], "quot": 35, "collection_nam": [35, 38, 42, 44, 46, 52, 53], "collection_doi": [35, 42, 46, 52, 53], "dataset_titl": [35, 38, 42, 44, 46, 52, 53], "dataset_h5ad_path": [35, 42, 46, 52, 53], "rel": [35, 46, 60], "storag": [35, 64], "dataset_total_cell_count": [35, 42, 46, 52, 53], "dataset_version_id": 35, "self_reported_ethn": [35, 41, 44, 49, 53, 55, 56, 57], "ontology_term_id": [35, 41, 44, 58], "0002048": [35, 44, 48], "cell_type_a": 35, "xxxxx": 35, "cell_type_n": 35, "assay_a": 35, "assay_n": 35, "tissue_a": 35, "tissue_n": 35, "tissue_general_a": 35, "tissue_general_n": 35, "disease_a": 35, "mondo": [35, 44], "disease_n": 35, "self_reported_ethnicity_a": 35, "hancestro": [35, 57], "self_reported_ethnicity_n": 35, "sex_a": 35, "pato": [35, 44, 57, 60], "sex_n": 35, 
"suspension_type_a": 35, "suspension_type_n": 35, "organism_label": 35, "machin": [35, 36, 45], "somameasur": 35, "somaindexeddatafram": 35, "fill": [35, 55], "remov": [35, 42, 44, 54], "variant": 35, "j": [35, 43, 50, 52, 53], "feature_biotyp": 35, "pin": 35, "clarifi": 35, "feature_1": 35, "feature_m": 35, "dataset_soma_joinid_1": 35, "dataset_soma_joinid_n": 35, "tissue_general_ontology_term_id": [35, 41, 44, 49, 53, 55, 56, 57, 60], "disease_ontology_term_id": [35, 41, 44, 49, 53, 55, 56, 57, 60], "observation_joinid": 35, "self_reported_ethnicity_ontology_term_id": [35, 41, 44, 49, 53, 55, 56, 57, 60], "sex_ontology_term_id": [35, 41, 44, 49, 53, 55, 56, 57, 60], "tissue_ontology_term_id": [35, 41, 44, 48, 49, 53, 55, 56, 57, 60], "handl": [35, 41, 48, 50, 54, 61], "text": [35, 36, 37, 38], "cell_census_build_d": 35, "cell_census_schema_vers": 35, "renam": [35, 44], "move": [35, 61], "dataset_presence_matrix": 35, "ascii": [35, 36], "0x22": 35, "exclam": 36, "intern": 36, "Its": 36, "notic": [36, 56], "printabl": 36, "charact": 36, "record": [36, 48], "parent": [36, 41], "longer": [36, 42], "dai": 36, "info_permalink": 36, "later": [36, 43, 45, 47, 49, 55], "release_alia": 36, "release_nam": 36, "url": [36, 45, 47], "blog": 37, "piec": [37, 41], "deliv": 37, "hous": 37, "blurb": 37, "extern": 37, "goal": [37, 38, 41, 42, 46, 51], "master": 37, "twitter": 37, "One": [37, 43], "stop": [37, 42, 54], "place": [37, 38, 42, 61], "histor": 37, "view": [37, 38, 44, 55, 56, 59], "great": [37, 42, 46], "approach": [37, 43], "apach": 37, "subdirectori": 37, "markdown": [37, 38], "md": [37, 38], "prefix": 37, "yyyymmdd": 37, "discret": [37, 38, 42], "20230810": 37, "r_api_is_out": 37, "highest": [37, 38], "header": [37, 38], "concis": [37, 38], "explanatori": [37, 38], "white_check_mark": [37, 38], "cool": 37, "error": [37, 41, 45, 47, 48], "ital": 37, "keyboard": 37, "john": 37, "smith": 37, "author1": 37, "phil": 37, "scoot": 37, "author2": 37, "introductori": [37, 38], 
"paragraph": [37, 38], "right": [37, 38, 44, 54], "underneath": [37, 38], "summari": [37, 38, 39, 50], "30m": 37, "rest": [37, 38, 44], "render": [37, 38], "sidebar": [37, 38], "absenc": [37, 38], "sub": [37, 38, 56], "writer": [37, 38], "pgarcia": 37, "capabitli": 37, "cellcensu": 38, "symlink": 38, "asset": 38, "onboard": 38, "product": 38, "unless": 38, "direct": [38, 53], "mention": 38, "action": 38, "extract": [38, 51, 61], "exhaust": [38, 42], "proper": [38, 42], "showcas": [38, 41, 42, 51, 54, 55, 57], "clear": [38, 41, 43, 54], "power": 38, "bold": 38, "lower": [38, 44, 59, 61], "qc": 38, "much": [38, 43, 48], "kept": 38, "succinct": 38, "liver": [38, 46, 54], "prior": 38, "blob": [38, 56], "cellxgene_census_schema": 38, "repeat": [38, 54], "let": [38, 41, 42, 43, 44, 45, 46, 47, 49, 52, 53, 54, 55, 56, 57], "sc": [38, 42, 43, 44, 45, 46, 47, 56], "tabula": [38, 42, 44, 46, 52, 53], "muri": [38, 42, 46, 53], "seni": [38, 42, 46, 53], "genom": [38, 56], "stream": [39, 64], "gget": 39, "collabor": [39, 43, 45], "predict": [39, 43], "biologi": [39, 55], "gain": 39, "natur": [39, 44, 45, 54, 56], "summar": [39, 41, 44, 58], "leverag": 39, "cover": 41, "simpl": [41, 43, 47, 51, 56, 61], "sever": [41, 48, 49], "prefer": [41, 48, 53], "34": [41, 42, 44, 45, 46, 47, 48, 51, 52, 53, 54, 56, 57, 58, 59, 60, 61], "39": [41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61], "think": [41, 47], "variou": [41, 43, 48, 58], "analog": 41, "census_info": [41, 42, 44, 46, 50, 52, 53, 58], "census_obj": 41, "want": [41, 51, 54, 57, 61, 63], "pair": [41, 51], "61656118": [41, 48, 53], "37447773": 41, "13035": 41, "1417": 41, "Of": 41, "meta": [41, 54, 56], "consortia": 41, "idea": 41, "Not": 41, "cast": 41, "census_count": 41, "33364242": [41, 58], "56400873": [41, 53, 58], "0008722": [41, 44, 58], "264166": [41, 58], "279635": [41, 58], "drop": [41, 44, 45, 51, 58], "0008780": [41, 58], "25652": [41, 44, 58], "51304": [41, 58], "indrop": [41, 44, 
58], "0008919": [41, 58], "89477": [41, 58], "206754": [41, 58], "0008931": [41, 58, 60], "78750": [41, 58], "188248": [41, 58], "1357": [41, 58], "0002113": [41, 58], "179684": [41, 58], "208324": [41, 58], "kidnei": [41, 45, 52, 54, 58], "1358": [41, 58], "0002365": [41, 58], "15577": [41, 58], "31154": [41, 58], "exocrin": [41, 45, 55, 58], "1359": [41, 58], "0002367": [41, 58], "37715": [41, 58], "130135": [41, 58], "prostat": [41, 58], "1360": [41, 58], "0002368": [41, 58], "13322": [41, 58], "26644": [41, 58], "endocrin": [41, 45, 58], "1361": [41, 58], "0002371": [41, 58], "90225": [41, 58], "144962": [41, 58], "bone": [41, 45, 53, 54, 58], "marrow": [41, 53, 54, 58], "1362": [41, 58], "omit": 41, "creation": 41, "sort": 41, "census_human_assai": 41, "sort_valu": [41, 45], "ascend": 41, "0009922": [41, 57], "11845077": 41, "25597563": 41, "0009899": [41, 44, 60], "7559102": 41, "12638794": 41, "0011025": 41, "3872375": 41, "6139786": 41, "0010550": 41, "4062980": 41, "5064268": 41, "sci": [41, 44], "0009900": 41, "2930054": 41, "3139770": 41, "17": [41, 42, 44, 45, 46, 47, 49, 55, 56, 60], "0030004": 41, "915037": 41, "1084235": 41, "transcript": [41, 44], "0030003": [41, 44], "744798": 41, "811422": 41, "0030002": [41, 57], "625175": 41, "642559": 41, "0700003": 41, "146278": 41, "177276": 41, "bd": [41, 44], "rhapsodi": [41, 44], "transcriptom": [41, 42, 44, 46, 52, 53, 54], "0009901": 41, "42397": 41, "121394": 41, "58981": [41, 44], "117962": 41, "0700004": 41, "96145": 41, "0008995": 41, "29128": 41, "0008953": 41, "4693": 41, "9386": 41, "strt": 41, "0010010": 41, "3105": 41, "5244": 41, "cel": 41, "69": [41, 45], "0000129": 41, "268114": 41, "370771": 41, "1038": [41, 42, 46, 50, 52, 53], "48998": 41, "62617": 41, "easi": [41, 51, 55], "fall": [41, 42], "certain": [41, 43, 61], "distribut": [41, 42, 50], "answer": 41, "exemplifi": 41, "stat": 41, "trivial": 41, "human_cell_typ": 41, "syncytiotrophoblast": [41, 57], "placent": [41, 57], "villou": [41, 
57], "trophoblast": [41, 44, 45, 52, 53, 57], "extravil": [41, 57], "56400868": [41, 44], "pericyt": [41, 44, 45, 61], "56400869": [41, 44], "56400870": [41, 44], "56400871": [41, 44], "56400872": [41, 44], "focu": [41, 42, 43, 46], "de": 41, "human_cell_type_count": 41, "2673669": 41, "glutamaterg": [41, 45], "1541605": 41, "cd4": [41, 44, 45, 47], "alpha": [41, 44, 45], "1258976": 41, "cd8": [41, 44, 45, 47], "1235987": 41, "classic": [41, 44], "monocyt": [41, 44, 45, 47], "1030996": 41, "microfold": 41, "epithelium": 41, "intestin": [41, 45, 54], "dendrit": [41, 45, 47], "serou": 41, "bronchu": 41, "sperm": [41, 58], "enteroendocrin": 41, "599": 41, "abund": [41, 44], "That": 41, "achiev": [41, 55], "human_liver_cell_typ": 41, "85739": 41, "hepatoblast": 41, "58447": 41, "neoplast": [41, 45], "52431": 41, "erythroblast": 41, "45605": 41, "31388": 41, "pulmonari": [41, 44, 56, 57], "arteri": 41, "endotheli": [41, 44, 45, 52, 54, 61], "germin": 41, "center": 41, "b": [41, 44, 45, 47, 57], "pneumocyt": [41, 44], "innat": 41, "lymphoid": 41, "126": [41, 61], "go": 41, "sake": [41, 44, 51], "t_cells_list": 41, "t_cells_diseas": 41, "f": [41, 42, 43, 44, 45, 46, 47, 48, 49, 52, 53, 54, 55, 60, 61], "hodgkin": 41, "lymphoma": 41, "blood": [41, 52, 54, 56, 57], "62499": 41, "819428": 41, "30578": 41, "nose": 41, "respiratori": [41, 44, 58], "saliva": 41, "41": [41, 45], "crohn": 41, "colon": 41, "17490": 41, "52029": 41, "down": 41, "syndrom": 41, "181": 41, "breast": 41, "cancer": [41, 44], "1850": 41, "chronic": [41, 44, 57], "obstruct": [41, 44, 57], "9382": 41, "rhiniti": 41, "909": 41, "renal": [41, 44, 52, 53], "carcinoma": [41, 44, 57], "6548": 41, "20540": 41, "lymph": 41, "cystic": [41, 44], "fibrosi": [41, 44, 57], "follicular": 41, "1089": 41, "influenza": 41, "8871": 41, "interstiti": [41, 44, 45, 56, 57], "1803": 41, "benign": 41, "neoplasm": 41, "oncocytoma": 41, "2408": 41, "adenocarcinoma": [41, 44, 57], "205": 41, "3274": 41, "507": 41, "215013": 41, 
"24969": 41, "pleural": 41, "fluid": 41, "11558": 41, "5922": 41, "lymphangioleiomyomatosi": [41, 44, 57], "513": 41, "36573": 41, "nonpapillari": 41, "adipos": [41, 54], "4828": 41, "288": [41, 52], "clot": 41, "1717": 41, "69136": 41, "pleomorph": [41, 44, 57], "1715": 41, "pneumonia": [41, 44, 57], "856": [41, 51], "1671": 41, "disord": 41, "34301": 41, "squamou": [41, 44, 45, 57], "52053": 41, "lupu": 41, "erythematosu": 41, "355471": 41, "don": [41, 46, 48, 50, 54, 57], "forget": [41, 46, 48, 50, 57], "del": [41, 42, 43, 44], "opportun": 42, "inter": 42, "ignor": [42, 43, 44, 45, 46, 47, 49, 51, 55], "home": [42, 44, 46], "ssm": [42, 44, 46], "lib": [42, 44, 46], "python3": [42, 44, 46], "_set": 42, "63": [42, 45], "userwarn": [42, 44, 46], "70": [42, 45], "dl_pin_memory_gpu_train": 42, "pin_memori": 42, "loader": 42, "tqdm": [42, 44, 46], "auto": [42, 44, 46], "tqdmwarn": [42, 44, 46], "iprogress": [42, 44, 46], "jupyt": [42, 44, 46, 63], "ipywidget": [42, 44, 46], "user_instal": [42, 44, 46], "autonotebook": [42, 44, 46], "notebook_tqdm": [42, 44, 46], "census_dataset": [42, 44, 52, 53], "tabula_liv": 42, "loc": [42, 52], "525": [42, 46], "0b9d8a04": [42, 46, 53], "bb9d": [42, 46, 53], "44da": [42, 46, 53], "aa27": [42, 46, 53], "705bb65b54eb": [42, 46, 53], "s41586": [42, 46, 50, 52, 53], "020": [42, 46, 52, 53], "2496": [42, 46, 53], "4546e757": [42, 46], "34d0": [42, 46], "4d17": [42, 46], "be06": [42, 46], "538318925fcd": [42, 46], "atla": [42, 44, 46, 52, 53, 54], "cha": [42, 46], "2859": [42, 46], "547": 42, "6202a243": [42, 54], "b713": [42, 54], "4e12": [42, 54], "9ced": [42, 54], "c387f8483dea": [42, 54], "7294": [42, 54], "tabula_muris_liver_id": 42, "smart_seq_gene_length": 42, "to_numpi": [42, 43, 44, 45, 46, 49, 51, 55], "smart_seq_index": 42, "smart_seq_x": 42, "proce": [42, 46], "ceil": 42, "put": [42, 55], "omic": [42, 55], "yosef": 42, "lab": [42, 44, 52, 53, 55], "uc": [42, 43, 55], "berkelei": 42, "variat": [42, 43], "infer": [42, 61], 
"deep": 42, "scrna": [42, 44], "comprehens": 42, "best": [42, 43], "practic": [42, 46], "strength": 42, "bread": [42, 44], "butter": [42, 44], "neighbor": [42, 43, 44, 45, 46, 47, 49, 55], "graph": [42, 43], "visual": [42, 43, 44, 45, 47], "umap": [42, 43, 44, 45, 46, 47, 49, 55], "But": [42, 54], "save": [42, 49, 53, 55, 56, 61], "normalize_tot": [42, 43, 44, 45, 46, 47], "target_sum": [42, 43, 44, 45, 46, 47], "1e4": [42, 44, 45, 46, 47], "log1p": [42, 43, 44, 45, 46, 47], "max_valu": [42, 44, 45, 46], "final": [42, 43, 45, 46, 49, 51, 52, 54, 55, 59, 61], "tl": [42, 43, 44, 45, 46, 47, 49, 55], "pca": [42, 44, 45, 46], "n_neighbor": [42, 43, 45, 47], "n_pc": [42, 45], "40": [42, 45], "pl": [42, 43, 44, 45, 46, 47, 49, 55, 56], "color": [42, 43, 44, 45, 46, 47, 49, 55], "plot": [42, 43, 44, 46, 47, 49, 55], "_tool": [42, 44, 46], "scatterplot": [42, 43, 44, 46], "392": [42, 44, 46], "No": [42, 44, 46], "colormap": [42, 44, 46], "cmap": [42, 44, 46], "cax": [42, 44, 46], "scatter": [42, 43, 44, 46, 47, 49, 55], "strong": [42, 44], "properli": 42, "principl": 42, "randomli": [42, 43], "whenev": 42, "evidenc": 42, "articl": 42, "health": 42, "sikkema": 42, "et": [42, 54], "al": [42, 54], "whom": 42, "perfom": 42, "43": [42, 45, 52, 59], "latent": [42, 43, 47], "setup_anndata": 42, "vae": 42, "n_layer": 42, "n_latent": 42, "gene_likelihood": 42, "nb": 42, "n_hidden": 42, "50": [42, 45, 49, 57], "gpu": [42, 45, 47], "tpu": 42, "tf_cpp_min_log_level": 42, "rerun": [42, 43], "info": [42, 44, 47, 56], "max_epoch": 42, "ipu": 42, "hpu": 42, "epoch": [42, 61], "00": [42, 46, 49], "15it": 42, "v_num": 42, "train_loss_step": 42, "545": 42, "train_loss_epoch": 42, "560": 42, "trainer": [42, 45], "17it": 42, "represent": [42, 43, 45], "x_scvi": 42, "get_latent_represent": [42, 47], "use_rep": [42, 43, 45, 47, 49, 55], "mainli": 42, "driven": [42, 43], "albeit": 42, "contribut": [42, 43, 44, 49, 55], "curat": [42, 50, 56], "strongli": 42, "22": [42, 44, 45, 56, 58, 60, 63], 
"dataset_id_donor_id": 42, "astyp": [42, 43, 45], "23": [42, 44, 45, 52, 56], "24": [42, 44, 45, 52, 60], "27it": 42, "520": 42, "550": 42, "25it": 42, "mostli": [42, 44], "nucleu": [42, 55, 57], "accomplish": [42, 44], "latter": [42, 57], "knowledg": 43, "journei": 43, "2d": [43, 49, 55], "involv": 43, "nonlinear": 43, "transform": [43, 44, 45, 46, 47, 55], "Such": 43, "affect": [43, 61], "manifold": 43, "overclust": 43, "reduct": [43, 54], "mind": [43, 58], "hypothes": 43, "focus": 43, "ultim": 43, "underli": [43, 61, 62], "investig": 43, "behind": 43, "foundat": [43, 55], "technic": 43, "often": 43, "might": [43, 56], "pure": 43, "systemat": 43, "bias": [43, 44], "complic": 43, "matter": 43, "techniqu": 43, "nearest": 43, "themselv": 43, "amplifi": [43, 45], "rigor": 43, "benchmark": 43, "fulli": 43, "space": [43, 45], "highlight": 43, "challeng": 43, "unsolv": 43, "problem": 43, "briefli": [43, 56], "illustr": [43, 55], "capac": 43, "captur": 43, "intrigu": 43, "phenomena": 43, "disclaim": 43, "depth": [43, 44, 46], "insight": [43, 55], "glean": 43, "innacur": 43, "leidenalg": 43, "hdbscan": 43, "scikit": [43, 63], "warn": [43, 44, 45, 47, 49, 55], "get_embed": [43, 49, 55], "filterwarn": [43, 45, 47, 49, 55], "def": [43, 51, 61], "generate_umaps_from_embed": 43, "emb_nam": [43, 49], "euclidean": 43, "key_ad": 43, "neighbors_kei": 43, "x_emb_nam": 43, "x_": 43, "_": [43, 55], "_umap": 43, "x_umap": 43, "var_nam": [43, 44, 45, 47], "build_anndata_with_embed": 43, "coord": [43, 55], "miss": [43, 47, 51, 55], "intersect": 43, "accordingli": 43, "filt": 43, "ones": 43, "nan_row_sum": 43, "isnan": [43, 51], "total_column": 43, "embedding_uris_commun": 43, "scgpt": [43, 55], "contrib": [43, 45, 47, 49, 55], "cxg": [43, 55], "embedding_names_censu": 43, "embedding_names_al": 43, "obs_df": [43, 48, 49, 51, 55, 58, 60], "n_subset_cel": 43, "150000": 43, "idx_rand": 43, "choic": [43, 45, 47, 56], "soma_joinids_subset": 43, "tolist": [43, 44, 47, 48], "799353": 43, 
"distinctli": 43, "oca2": 43, "marker": [43, 47], "kit": 43, "vari": 43, "immatur": 43, "clearli": 43, "slight": 43, "extens": [43, 54], "concentr": 43, "seen": 43, "satellit": 43, "signatur": 43, "probabl": [43, 45, 61], "mani": [43, 51, 61], "disconnect": 43, "compon": 43, "tend": 43, "extent": 43, "versu": 43, "unclear": 43, "qualit": 43, "pronounc": 43, "basi": 43, "geneformer_umap": 43, "use_raw": 43, "scgpt_umap": 43, "uce_umap": 43, "scvi_umap": 43, "subclust": 43, "leiden": [43, 45, 47], "emploi": 43, "densiti": 43, "pairwis": 43, "distanc": [43, 51], "compar": [43, 47], "reveal": [43, 44], "distinct": [43, 61], "signific": [43, 58], "agreement": 43, "mutual": 43, "nmi": 43, "score": 43, "assign": [43, 51], "yield": 43, "65": [43, 45], "inher": 43, "expect": [43, 44, 46, 55], "finetun": 43, "homogen": [43, 61], "belong": 43, "underscor": 43, "draw": 43, "coupl": 43, "conclus": 43, "lead": 43, "identif": 43, "evid": 43, "examin": [43, 61], "relianc": 43, "unjustifi": 43, "known": 43, "phenomenon": 43, "cross": [43, 44], "fuller": 43, "hold": [43, 61], "lack": 43, "necessit": 43, "thereof": 43, "pd": [43, 44, 51, 59, 60, 61], "pdist": 43, "squareform": 43, "sklearn": [43, 47], "normalized_mutual_info_scor": 43, "adata_rbn": 43, "_connect": 43, "_leiden": 43, "pairwise_dist": 43, "_hdbscan": 43, "min_cluster_s": 43, "min_sampl": 43, "precomput": [43, 58], "fit_predict": 43, "displai": [43, 47, 48, 51, 55, 56, 61], "embedding_kei": 43, "sim_scores_leiden": 43, "len": [43, 44, 45, 47, 48, 51, 53, 54, 61], "sim_scores_hdbscan": 43, "embedding_i": 43, "enumer": 43, "embedding_j": 43, "sim_scores_leiden_t": 43, "sim_scores_hdbscan_t": 43, "seem": [43, 44], "log": [43, 44, 46, 47], "08115140648299893": 43, "7314893672395334": 43, "33702547333985217": 43, "7730928192948211": 43, "723355": 43, "721222": 43, "677754": 43, "775717": 43, "753719": 43, "822202": 43, "089308": 43, "106379": 43, "073141": 43, "480575": 43, "646415": 43, "356779": 43, "11896761": 43, "th": 
43, "wherea": [43, 55], "tendenc": 43, "condit": [43, 57], "glioblastoma": 43, "pilocyt": 43, "astrocytoma": 43, "mix": 43, "outsid": 43, "53d208b0": [43, 44, 52], "2cfd": [43, 44, 52], "4366": [43, 44, 52], "9866": [43, 44, 52], "c3c6114081bc": [43, 44, 52], "smartseq": 43, "cftr": 43, "rare": 43, "recogniz": 43, "summary_t": 44, "980": [44, 59], "2907156": 44, "6011592": 44, "lung_ob": 44, "5945423": 44, "9f222629": [44, 56], "9e39": [44, 56], "47d0": [44, 56], "b83f": [44, 56], "e08d610c7479": [44, 56], "nativ": [44, 58], "0000003": [44, 48, 58], "0000461": [44, 57, 60], "5945426": 44, "ciliat": [44, 45], "columnar": [44, 45], "tracheobronchi": 44, "tree": 44, "0002145": 44, "57": [44, 45], "hsapdv": [44, 57], "0000151": 44, "0002771": 44, "0000384": [44, 60], "5945428": 44, "0000625": [44, 48], "0005097": 44, "5945432": 44, "0000624": [44, 48], "0005061": 44, "5945441": 44, "2907151": 44, "8c42cfd0": [44, 52, 53, 56], "0b0a": [44, 52, 53, 56], "46d5": [44, 52, 53, 56], "910c": [44, 52, 53, 56], "fc833d83c45e": [44, 52, 53, 56], "0000669": [44, 48], "0000145": 44, "0000383": [44, 60], "2907152": 44, "2907153": 44, "2907154": 44, "2907155": 44, "deeper": 44, "dive": 44, "characterist": 44, "set_index": [44, 47, 51, 53, 59, 60], "f171db61": [44, 52, 53, 57], "e57": [44, 52, 53, 57], "4535": [44, 52, 53, 57], "a06a": [44, 52, 53, 57], "35d8b6ef8f2b": [44, 52, 53, 57], "multiom": [44, 52, 53], "developm": [44, 52, 53], "donor_p13_trophoblast": [44, 52, 53], "ecf2e08": [44, 52, 53], "2032": [44, 52, 53], "4a9e": [44, 52, 53], "b466": [44, 52, 53], "b65b395f4a02": [44, 52, 53], "74cff64f": [44, 52, 53], "9da9": [44, 52, 53], "4b2a": [44, 52, 53], "9b3b": [44, 52, 53], "8a04a1598040": [44, 52, 53], "vivo": [44, 52, 53], "5af90777": [44, 52, 53], "6760": [44, 52, 53], "4003": [44, 52, 53], "9dba": [44, 52, 53], "8f945fec6fdf": [44, 52, 53], "intr": [44, 52, 53], "bd65a70f": [44, 52, 53], "b274": [44, 52, 53], "4133": [44, 52, 53], "b9dd": [44, 52, 53], "0d1431b6af34": 
[44, 52, 53], "multiregion": [44, 52, 53], "imm": [44, 52, 53], "f9ad5649": [44, 52, 53], "f372": [44, 52, 53], "43e1": [44, 52, 53], "a3a8": [44, 52, 53], "423383e5a8a2": [44, 52, 53], "molecular": [44, 52, 53], "character": [44, 46, 52, 53, 54], "vuln": [44, 52, 53], "456e8b9b": [44, 52, 53], "f872": [44, 52, 53], "488b": [44, 52, 53], "871d": [44, 52, 53], "94534090a865": [44, 52, 53], "peripher": [44, 52, 53], "immun": [44, 52, 53, 54], "respon": [44, 52, 53], "589": [44, 52, 53], "2adb1f8a": [44, 52, 53, 57], "a6b1": [44, 52, 53, 57], "4909": [44, 52, 53, 57], "8ee8": [44, 52, 53, 57], "484814e2d4bf": [44, 52, 53, 57], "landscap": [44, 52, 53], "sing": [44, 52, 53], "590": [44, 52, 53], "e04daea4": [44, 52, 53], "4412": [44, 52, 53], "45b5": [44, 52, 53], "989e": [44, 52, 53], "76a9be070a89": [44, 52, 53], "krasnow": [44, 52, 53], "591": [44, 52, 53], "592": [44, 52, 53], "append": [44, 55], "dataset_cell_count": 44, "cell_count": 44, "merg": [44, 45, 55, 59], "1e6a6ef9": 44, "7ec9": 44, "4c90": 44, "bbfb": 44, "2ad3c3165fd1": 44, "1028006": 44, "resolut": [44, 56], "luca": 44, "ex": 44, "314": 44, "784630": 44, "f7c1c579": 44, "2dc0": 44, "47e2": 44, "ba19": 44, "8165c5a0e353": 44, "217738": 44, "fetal": 44, "survei": 44, "embryon": 44, "483": 44, "d8da613f": 44, "e681": 44, "4c69": 44, "b463": 44, "e94f5e66847f": 44, "116313": 44, "lethal": 44, "80": [44, 45, 58], "576f193c": 44, "75d0": 44, "4a11": 44, "bd25": 44, "8676587e6dc2": 44, "90384": 44, "htan": 44, "msk": 44, "377": 44, "d41f45c1": 44, "1b7b": 44, "4573": 44, "a998": 44, "ac5c5acb1647": 44, "82991": 44, "reg": 44, "regulatori": 44, "58": [44, 45], "3dc61ca1": 44, "ce40": 44, "46b6": 44, "8337": 44, "f27260fd9a03": 44, "71752": 44, "uncov": 44, "proxima": 44, "325": 44, "60993": 44, "2672b679": 44, "8048": 44, "4f5e": 44, "9786": 44, "f1b196ccfd08": 44, "57019": 44, "spleen": [44, 52, 54], "parenchyma": 44, "416": 44, "9dbab10c": 44, "118d": 44, "496b": 44, "966a": 44, "67f1763a6b7d": 44, "49014": 
44, "criti": 44, "482": 44, "9968be68": 44, "ab65": 44, "4a38": 44, "9e1a": 44, "c9b6abece194": 44, "47909": 44, "chart": 44, "endod": 44, "78": [44, 45], "3de0ad6d": 44, "4378": 44, "4f62": 44, "b37b": 44, "ec0b75a50d94": 44, "46500": 44, "lungmap": 44, "broad": 44, "ag": [44, 46, 54], "healthi": 44, "456": 44, "2f132ec9": 44, "24b5": 44, "422f": 44, "9be0": 44, "ccef03b4fe28": 44, "39778": 44, "sar": 44, "cov": 44, "receptor": [44, 58], "ace2": [44, 56], "tmprss2": 44, "prima": 44, "312": 44, "1e5bd3b8": 44, "6a0e": 44, "4959": 44, "8d69": 44, "cafed30fe814": 44, "35699": 44, "emphysema": [44, 57], "130": 44, "35682": [44, 52], "475": [44, 52], "1b9d8702": 44, "5af8": 44, "4142": 44, "85ed": 44, "020eb06ec4f6": 44, "35419": 44, "tiss": 44, "411": 44, "4ed927e9": 44, "c099": 44, "49af": 44, "b8ce": 44, "a2652d069333": 44, "35284": 44, "367": 44, "33698": 44, "4b6af54a": 44, "4a21": 44, "46e0": 44, "bc8d": 44, "673c0561a836": 44, "18386": 44, "01209dce": 44, "3575": 44, "4bed": 44, "b1df": 44, "129f57fbc031": 44, "11059": 44, "8657": 44, "f9846bb4": 44, "784d": 44, "4582": 44, "92c1": 44, "3f279e4c6f0c": 44, "176": [44, 52], "fibroblast": [44, 45, 56, 58], "smooth": 44, "muscl": [44, 45, 52, 54], "317": 44, "f64e1be1": 44, "de15": 44, "4d27": 44, "8da4": 44, "82225cd4c035": 44, "55": [44, 45, 60], "370": 44, "810ac45f": 44, "8969": 44, "4698": 44, "b42c": 44, "652f802f75c2": 44, "endothelium": 44, "320": 44, "0ba16f4b": 44, "cb87": 44, "4fa3": 44, "9363": 44, "19fc51eec6e7": 44, "myeloid": [44, 45], "326": 44, "reprens": 44, "divers": [44, 48, 52, 55], "plastic": 44, "tumor": 44, "neutrophil": 44, "subpopul": 44, "distal": 44, "gradient": 44, "differenti": [44, 45], "regul": 44, "epitheli": [44, 45, 52, 54, 58, 61], "fate": 44, "tell": 44, "1236968": 44, "702074": 44, "262323": 44, "122902": 44, "97432": 44, "65220": 44, "41852": 44, "25662": 44, "8638": 44, "8016": 44, "1164084": 44, "772120": 44, "331019": 44, "209675": 44, "120796": 44, "55254": 44, "51343": 44, 
"45714": 44, "31923": 44, "31792": 44, "31540": 44, "21167": 44, "17590": 44, "12374": 44, "10765": 44, "1402565": 44, "1122990": 44, "381601": 44, "2468587": 44, "438569": 44, "head": [44, 52], "alveolar": [44, 58], "macrophag": [44, 45], "291507": 44, "263362": 44, "211456": 44, "189471": 44, "154415": 44, "ii": 44, "128463": 44, "tract": 44, "105090": 44, "102303": 44, "killer": [44, 45, 54, 56], "95953": 44, "92846": 44, "stromal": [44, 45, 52, 54], "87714": 44, "81125": 44, "malign": 44, "75917": 44, "plasma": 44, "64551": 44, "59353": 44, "45305": 44, "capillari": 44, "39416": 44, "36381": 44, "36049": 44, "35467": 44, "2576327": 44, "147410": 44, "alveolu": 44, "54085": 44, "lingula": 44, "upper": [44, 52], "lobe": 44, "32099": 44, "17854": 44, "12880": 44, "10113": 44, "9276": 44, "7981": 44, "middl": 44, "3847": 44, "lung_var": 44, "ensg00000121410": [44, 52], "a1bg": [44, 52], "3999": [44, 52], "ensg00000268895": [44, 52], "as1": [44, 52], "3374": [44, 52], "ensg00000148584": [44, 52], "a1cf": [44, 52], "9603": [44, 52], "ensg00000175899": [44, 52], "a2m": [44, 52], "6318": [44, 52], "ensg00000245105": [44, 52], "2948": [44, 52], "ensg00000288719": [44, 52], "rp4": [44, 52], "669p10": [44, 52], "ensg00000288720": [44, 52], "rp11": [44, 52], "852e15": [44, 52], "7007": [44, 52], "ensg00000288721": [44, 52], "rp5": [44, 52], "973n23": [44, 52], "7765": [44, 52], "ensg00000288723": [44, 52], "553n16": [44, 52], "1015": [44, 52], "ensg00000288724": [44, 52], "rp13": [44, 52], "546i2": [44, 52], "625": [44, 52], "60664": [44, 49, 52, 55, 61], "actual": [44, 61], "mislead": 44, "know": [44, 54, 57], "presence_matrix": [44, 46, 52], "get_presence_matrix": [44, 46, 52], "a1": 44, "17811": 44, "50259": 44, "44150": 44, "34265": 44, "22447": 44, "23642": 44, "26347": 44, "20921": 44, "24672": 44, "27705": 44, "27243": 44, "26323": 44, "27181": 44, "23203": 44, "57042": 44, "32610": 44, "29620": 44, "26454": 44, "23705": 44, "38676": 44, "47307": 44, "23740": 44, 
"22552": 44, "20594": 44, "19952": 44, "uint64": 44, "genes_measur": 44, "var_somaid": 44, "nonzero": [44, 46], "ensg00000128274": 44, "a4galt": 44, "3358": 44, "ensg00000094914": 44, "aaa": 44, "4727": 44, "ensg00000081760": 44, "aac": 44, "16039": 44, "29951": 44, "ensg00000177272": 44, "kcna3": 44, "2476": 44, "30157": 44, "ensg00000184709": 44, "lrrc26": 44, "1209": 44, "30185": 44, "ensg00000087250": 44, "mt3": 44, "1679": 44, "30202": 44, "ensg00000136352": 44, "nkx2": 44, "3165": 44, "30512": 44, "ensg00000231439": 44, "wasir2": 44, "1054": 44, "11595": 44, "composit": 44, "infect": 44, "12k": 44, "intens": 44, "exercis": 44, "exploratori": 44, "000": 44, "lung_cell_subsampled_n": 44, "100000": 44, "lung_cell_subsampled_id": 44, "random_st": 44, "lung_gene_id": 44, "lung_adata": 44, "highest_expr_gen": 44, "n_top": 44, "calculate_qc_metr": 44, "percent_top": 44, "inplac": [44, 47], "violin": [44, 47], "n_genes_by_count": 44, "rotat": 44, "90": 44, "total_count": 44, "outlier": 44, "exlcud": 44, "ll": [44, 46, 55, 60], "extra": 44, "_highly_variable_gen": 44, "_simpl": 44, "843": 44, "view_to_actu": 44, "28": [44, 45, 56, 61], "n_cell_typ": 44, "drop_dupl": [44, 57], "randint": 44, "rang": [44, 45, 47, 49, 55, 61], "06x": 44, "0xffffff": 44, "palett": 44, "legend_loc": 44, "hard": 44, "32": [44, 45, 61], "top_cell_typ": 44, "reset_index": [44, 51], "lung_adata_top_cell_typ": 44, "unix": [45, 47], "mkdir": [45, 47], "p": [45, 47, 50, 51, 59], "wget": [45, 47], "nv": [45, 47], "pbmc3k_filtered_gene_bc_matric": [45, 47], "tar": [45, 47], "gz": [45, 47], "cf": [45, 47], "10xgenom": [45, 47], "exp": [45, 47], "pbmc3k": [45, 47], "xzf": [45, 47], "09": [45, 55], "38": [45, 56, 59], "7621991": [45, 47], "gt": [45, 47, 52, 56], "deatail": [45, 47], "insid": [45, 47], "geneformer_info": 45, "cxg_embedding_info": [45, 47], "model_link": [45, 47, 55], "cli": [45, 53], "progress": [45, 47, 56], "fine_tuned_geneform": 45, "datacollatorforcellclassif": 45, "embextractor": 
45, "transcriptometoken": 45, "bertforsequenceclassif": 45, "ensembl_id": [45, 47], "ensg00000139618": 45, "suffix": 45, "n_count": [45, 47], "joinid": [45, 47, 52, 55], "write": [45, 53], "disk": 45, "read_10x_mtx": [45, 47], "filtered_gene_bc_matric": [45, 47], "hg19": [45, 47], "gene_id": [45, 47], "h5ad_dir": 45, "makedir": 45, "track": 45, "token_dir": 45, "tokenized_data": 45, "custom_attr_name_dict": 45, "tokenize_data": 45, "data_directori": 45, "output_directori": 45, "output_prefix": 45, "file_format": 45, "filter_pass": 45, "model_dir": 45, "label_mapping_dict_fil": 45, "label_to_cell_subclass": 45, "fp": 45, "label_mapping_dict": 45, "best4": 45, "cn": 45, "sensu": 45, "vertebrata": 45, "gabaerg": 45, "abnorm": 45, "adventiti": [45, 56], "anim": 45, "cardiocyt": 45, "skelet": 45, "cuboid": 45, "contractil": 45, "defens": 45, "duct": 45, "ecto": 45, "ectoderm": 45, "endo": 45, "pancrea": [45, 52, 54], "urethra": 45, "eukaryot": 45, "fat": [45, 52], "germ": [45, 58], "glandular": 45, "35": [45, 61], "glial": 45, "37": 45, "hematopoiet": [45, 57], "precursor": 45, "hepatocyt": 45, "inflammatori": 45, "interneuron": [45, 52], "42": 45, "ionocyt": 45, "44": [45, 47, 56], "45": [45, 59], "46": 45, "leukocyt": [45, 61], "47": 45, "lymphocyt": 45, "48": [45, 51], "49": 45, "mammari": [45, 54], "mesenchym": [45, 56], "52": [45, 51], "meso": 45, "mesoderm": 45, "motor": 45, "mural": 45, "59": [45, 54], "myofibroblast": 45, "neural": 45, "termin": 45, "ovarian": 45, "surfac": 45, "67": [45, 59], "phagocyt": 45, "pigment": 45, "cultur": [45, 58], "71": 45, "primordi": 45, "progenitor": [45, 56], "73": 45, "salivari": 45, "sebac": 45, "75": [45, 52], "secretori": 45, "76": 45, "sensori": 45, "77": 45, "seromucu": 45, "secret": [45, 56], "somat": 45, "79": 45, "stem": [45, 56, 57, 60], "81": [45, 51], "82": 45, "83": [45, 51, 59], "84": 45, "transit": 45, "85": 45, "86": 45, "87": 45, "vertebr": 45, "load_from_disk": 45, "num_row": 45, "2700": 45, "dummi": [45, 47], 
"add_column": 45, "slow": 45, "pretrain": 45, "from_pretrain": 45, "data_col": 45, "vector": 45, "predicted_label_id": 45, "argmax": [45, 61], "predicted_label": 45, "predicted_cell_subclass": 45, "min_mean": 45, "0125": 45, "max_mean": 45, "min_disp": 45, "svd_solver": 45, "arpack": 45, "scapi": 45, "original_cell_typ": [45, 47], "cd14": [45, 47], "fcgr3a": [45, 47], "megakaryocyt": [45, 47], "rename_categori": 45, "titl": [45, 49, 55], "n_class": 45, "output_dir": 45, "geneformer_embed": 45, "embex": 45, "model_typ": 45, "cellclassifi": 45, "num_class": 45, "max_ncel": 45, "emb_label": 45, "emb_lay": 45, "forward_batch_s": 45, "nproc": 45, "extract_emb": 45, "model_directori": 45, "input_data_fil": 45, "re": [45, 52], "grab": [45, 48, 52, 55, 59], "c697eaaf": [45, 47], "a3b": [45, 47], "4251": [45, 47], "b036": [45, 47], "5f9052179e70": [45, 47], "f2a488bf": [45, 47], "782f": [45, 47], "4c20": [45, 47], "a8e5": [45, 47], "cb34d48c1f7e": [45, 47], "fa8605cf": [45, 47], "f27e": [45, 47], "44af": [45, 47], "ac2a": [45, 47], "476bee4410d3": [45, 47], "3c75a463": [45, 47], "6a87": [45, 47], "4132": [45, 47], "83a8": [45, 47], "c3002624394d": [45, 47], "adata_censu": [45, 47], "simplifi": [45, 51], "shared_gen": 45, "index_subset": [45, 47], "3000": [45, 47], "adata_join": 45, "outer": 45, "liver_dataset": 46, "liver_dataset_id": 46, "liver_adata": 46, "859": 46, "52392": [46, 51, 53, 59], "gene_pres": 46, "17992": 46, "992": 46, "toarrai": [46, 55], "000e": 46, "590e": 46, "02": [46, 49, 50, 55], "969e": 46, "03": [46, 49, 52, 53], "280e": 46, "250e": 46, "400e": 46, "gene_length": 46, "00000000e": [46, 49], "58654413e": 46, "32001885e": 46, "74444813e": 46, "31455088e": 46, "71500419e": 46, "78985747e": 46, "real": 46, "filter_cel": 46, "min_gen": 46, "filter_gen": 46, "min_cel": 46, "saniti": 46, "prepar": 47, "pbmc": 47, "3k": 47, "scvi_info": 47, "pt": 47, "cp": [47, 53], "randomforestclassifi": 47, "unassign": 47, "model_filenam": 47, "prepare_query_anndata": 47, 
"is_train": 47, "trick": 47, "forward": [47, 61], "reprsent": 47, "vae_q": 47, "load_query_data": 47, "gene_symbol": [47, 56], "notnul": 47, "perfectli": 47, "appropri": 47, "markers_row1": 47, "il7r": 47, "lyz": 47, "ms4a1": 47, "cd8a": 47, "gnly": 47, "markers_row2": 47, "nkg7": 47, "ms4a7": 47, "fcer1a": 47, "cst3": 47, "ppbp": 47, "catch_warn": 47, "nk": 47, "label_map": 47, "adata_census_subset": 47, "adata_combin": 47, "correl": 47, "forest": 47, "classifi": 47, "rfc": 47, "predicted_cell_typ": [47, 61], "enough": [48, 51], "itself": 48, "tip": 48, "soma_df": 48, "faster": 48, "refin": 48, "_obs_": 48, "unique_cell_type_ontology_term_id": 48, "lot": 48, "top_10": 48, "nthe": 48, "0000525": [48, 57], "2000060": [48, 57], "0008036": [48, 57], "0002488": 48, "0002343": 48, "0000084": 48, "0001078": 48, "0000815": 48, "0000235": 48, "3000001": 48, "0000540": 48, "7665340": 48, "0000679": 48, "1894047": 48, "0000128": 48, "1881077": 48, "1508920": 48, "1477453": 48, "1419507": 48, "0000057": 48, "1397813": 48, "0000860": 48, "1369142": 48, "1308000": [48, 58], "4023040": 48, "1229658": 48, "occurr": 48, "lung_tissu": 48, "ntop": 48, "185": 48, "0002063": 48, "0000775": 48, "0001044": 48, "0001050": 48, "0000814": 48, "0000071": 48, "0000192": 48, "0002503": 48, "0002370": 48, "562038": 48, "0000583": 48, "526859": 48, "323985": 48, "323610": 48, "266333": 48, "255425": 48, "205013": 48, "0000623": 48, "164944": 48, "0001064": 48, "149067": 48, "0002632": 48, "132243": 48, "0002082": 48, "ooo2084": 48, "0002080": 48, "0000746": 48, "49929": 48, "0008034": 48, "33361": 48, "0002548": 48, "33180": 48, "0002131": 48, "30915": 48, "0000115": 48, "30054": 48, "18391": 48, "0000763": 48, "14408": 48, "13552": 48, "9690": 48, "0002144": 48, "9025": 48, "labl": 48, "cols_to_queri": 48, "complet": [48, 58], "df": [48, 56], "col": [48, 51, 52], "tuniqu": 48, "372": [49, 55], "axisarrai": [49, 55], "soma_dim_1": [49, 51, 54, 55], "soma_data": [49, 51, 54, 55], "bfloat16": 
[49, 55], "bit": [49, 55], "expon": [49, 55], "mantissa": [49, 55], "simplest": [49, 55], "nervou": [49, 55], "befor": [49, 55], "correspondong": [49, 55], "31780": [49, 55], "get_embedding_metadata_by_nam": 49, "to_anndata": [49, 55], "obs_joinid": [49, 55], "embeddinng": [49, 55], "stand": [49, 55], "alon": [49, 55], "17187500e": 49, "82995605e": 49, "50000000e": 49, "39941406e": 49, "71606445e": 49, "39843750e": 49, "71115112e": 49, "32031250e": 49, "00781250e": 49, "55310059e": 49, "85009766e": 49, "10156250e": 49, "42614746e": 49, "45312500e": 49, "53295898e": 49, "12915039e": 49, "84765625e": 49, "54113770e": 49, "94531250e": 49, "38281250e": 49, "03149414e": 49, "28881836e": 49, "14111328e": 49, "78125000e": 49, "15234375e": 49, "39562988e": 49, "79687500e": 49, "48388672e": 49, "19628906e": 49, "62803650e": 49, "88446045e": 49, "75694072": 50, "45846761": 50, "16292": 50, "2153": 50, "doi": [50, 55], "1002": 50, "ctm2": 50, "1356": 50, "695": 50, "696": 50, "697": 50, "1016": [50, 52, 53], "isci": 50, "698": 50, "1371": 50, "journal": 50, "699": 50, "700": 50, "cardiac": 50, "atrium": 50, "slice_dataset": 50, "isin": [50, 52], "sep": 50, "1126": [50, 52], "abl4896": [50, 52], "4866a804": 50, "37eb": 50, "436f": 50, "8c87": 50, "9cd585260061": 50, "e5f58829": [50, 52], "1a66": [50, 52], "40b5": [50, 52], "a624": [50, 52], "9046778e74f5": [50, 52], "bfd80f12": 50, "725c": 50, "4482": 50, "ad7f": 50, "1ed2b4909b0d": 50, "e6df8a57": 50, "f54f": 50, "413a": 50, "9d4d": 50, "dee03294d778": 50, "8d599205": 50, "5c51": 50, "4b50": 50, "9d48": 50, "3dec31238587": 50, "f6065c51": 50, "bd26": 50, "4aa5": 50, "a05d": 50, "2805aeea48d9": 50, "8cdbf790": 50, "4d29": 50, "4f46": 50, "9aef": 50, "21adfb2e21da": 50, "mybpc3": 50, "easier": 51, "experiment_queri": 51, "x_as_seri": 51, "nd": 51, "raw_n": 51, "aka": 51, "iloc": 51, "expens": 51, "var_df": [51, 52, 59], "float64": 51, "coo": 51, "arrow_tbl": 51, "var_dim": 51, "by_var": 51, "errstat": 51, "raw_mean": 51, 
"ensmusg00000051951": [51, 59], "xkr4": [51, 59], "6094": [51, 59], "202": 51, "032743": 51, "ensmusg00000089699": [51, 59], "gm1992": [51, 59], "250": [51, 59], "ensmusg00000102343": [51, 59], "gm37381": [51, 59], "1364": [51, 59], "ensmusg00000025900": [51, 59], "rp1": [51, 59], "12311": [51, 59], "106": 51, "236265": 51, "ensmusg00000025902": [51, 59], "sox17": [51, 59], "4772": [51, 59], "3259": 51, "991975": 51, "52387": [51, 59], "ensmusg00000081591": [51, 59], "btf3": [51, 59], "ps9": [51, 59], "496": [51, 59], "52388": [51, 59], "ensmusg00000118710": [51, 59], "mmu": [51, 59], "mir": [51, 59], "467a": [51, 59], "3_ensmusg00000118710": [51, 59], "52389": [51, 59], "ensmusg00000119584": [51, 59], "rn18": [51, 59], "1849": [51, 59], "52390": [51, 59], "ensmusg00000118538": [51, 59], "gm18218": [51, 59], "970": [51, 59], "52391": [51, 59], "ensmusg00000084217": [51, 59], "setd9": [51, 59], "670": [51, 59], "welford": [51, 60], "npt": 51, "onlinematrixmeanvari": 51, "n_sampl": 51, "n_variabl": 51, "axix": 51, "n_a": 51, "int32": [51, 61], "u_a": 51, "m2_a": 51, "coord_vec": 51, "value_vec": 51, "_mean_variance_upd": 51, "tupl": 51, "m2": 51, "_mean_variance_fin": 51, "max": 51, "jit": 51, "nopython": 51, "col_arr": 51, "val_arr": 51, "squar": 51, "val": 51, "u_prev": 51, "m2_prev": 51, "accont": 51, "chan": 51, "n_b": 51, "u_b": 51, "m2_b": 51, "mvn": 51, "raw_vari": 51, "848": 51, "312801": 51, "169": 51, "182975": 51, "279575": 51, "656207": 51, "malat1": 51, "ptprd": 51, "dlg2": 51, "pcdh9": 51, "n_cells_by_dataset": 51, "multiindex": 51, "from_product": 51, "n_cell": 51, "x_tbl": 51, "to_fram": 51, "get_index": 51, "pick": [51, 53], "3bbb6cf9": 51, "72b9": 51, "41be": 51, "b568": 51, "656de6eb18b5": 51, "ensmusg00000028399": 51, "79578": 51, "58b01044": 51, "c5e5": 51, "4b0f": 51, "8a2d": 51, "6ebf951e01ff": 51, "474": 51, "ensmusg00000052572": 51, "79513": 51, "98e5ea9f": [51, 60], "16d6": [51, 60], "47ec": [51, 60], "a529": [51, 60], "686e76515e39": [51, 
60], "908": 51, "66ff82b4": 51, "9380": 51, "469c": 51, "bc4b": 51, "cfa08eacd325": 51, "c08f8441": 51, "4a10": 51, "4748": 51, "872a": 51, "e70c0bcccdba": 51, "ensmusg00000055421": 51, "79476": 51, "125": [51, 61], "3027": 51, "2910": 51, "117": 51, "ensmusg00000092341": 51, "79667": 51, "12622": 51, "20094": 51, "7102": 51, "12992": 51, "compil": 52, "n_dataset": 52, "therein": [52, 53], "human_rna": 52, "datasets_df": 52, "e2c257e7": [52, 53], "6f79": [52, 53], "487c": [52, 53], "b81c": [52, 53], "39451cd4ab3c": [52, 53], "023": [52, 53], "05869": [52, 53], "31497": [52, 53], "67070": [52, 53], "286326": [52, 53], "f7cecffa": [52, 53], "00b4": [52, 53], "4560": [52, 53], "a29a": [52, 53], "8ad626b8ee08": [52, 53], "ccell": [52, 53], "001": [52, 53], "270855": [52, 53], "3f50314f": [52, 53], "bdc9": [52, 53], "40c6": [52, 53], "8e4a": [52, 53], "b0901ebfbe4c": [52, 53], "2021": [52, 53], "007": [52, 53], "167283": [52, 53], "180bff9c": [52, 53], "c8a5": [52, 53], "4539": [52, 53], "b13b": [52, 53], "ddbc00d643e6": [52, 53], "s41593": [52, 53], "00764": [52, 53], "8168": [52, 53], "a72afd53": [52, 53], "ab92": [52, 53], "4511": [52, 53], "88da": [52, 53], "252fb0e26b9a": [52, 53], "s41591": [52, 53], "0944": [52, 53], "y": [52, 53], "44721": [52, 53], "38833785": [52, 53], "fac5": [52, 53], "48fd": [52, 53], "944a": [52, 53], "0f62a4c23ed1": [52, 53], "2157": [52, 53], "598266": [52, 53], "5d445965": [52, 53], "6f1a": [52, 53], "4b68": [52, 53], "ba3a": [52, 53], "b8f765155d3a": [52, 53], "2922": [52, 53], "9409": [52, 53], "65662": [52, 53], "593x60664": 52, "16133717": 52, "manipul": 52, "ensg00000286096": 52, "97a17473": 52, "e2b1": 52, "4f31": 52, "a544": 52, "44a60773e2dd": 52, "var_joinid": 52, "dataset_joinid": 52, "is_pres": 52, "tocoo": 52, "ff45e623": 52, "7f5f": 52, "46e3": 52, "b47d": 52, "56be0341f66b": 52, "13497": 52, "f01bdd17": 52, "4902": 52, "40f5": 52, "86e3": 52, "240d66dd2587": 52, "salivary_gland": 52, "27199": 52, "e6a11140": 52, "2545": 
52, "46bc": 52, "929e": 52, "da243eed2ca": 52, "11505": 52, "e5c63d94": 52, "593c": 52, "4338": 52, "a489": 52, "e1048599e751": 52, "bladder": [52, 54], "24583": 52, "d8732da6": 52, "8d1d": 52, "42d9": 52, "b625": 52, "f2416c30054b": 52, "trachea": [52, 54], "9522": 52, "cee11228": 52, "9f0b": 52, "4e57": 52, "afe2": 52, "cfe15ee56312": 52, "34004": 52, "a357414d": 52, "2042": 52, "4eb5": 52, "95f0": 52, "c58604a18bdd": 52, "small_intestin": 52, "12467": 52, "a0754256": 52, "f44b": 52, "4c4a": 52, "962c": 52, "a552e47d3fdc": 52, "10650": 52, "983d5ec9": 52, "40e8": 52, "4512": 52, "9e65": 52, "a572a9c486cb": 52, "50115": 52, "5e5e7a2f": 52, "8f1c": 52, "42ac": 52, "90dc": 52, "b4f80f38e84c": 52, "20263": 52, "55cf0ea3": 52, "9d2b": 52, "4294": 52, "871e": 52, "bb4b49a79fc7": 52, "15020": [52, 61], "4f1555bc": 52, "4664": 52, "46c3": 52, "a606": 52, "78d34dd10d92": 52, "bone_marrow": [52, 53], "12297": 52, "2423ce2c": 52, "3149": 52, "4cca": 52, "a2ff": 52, "cf682ea29b5f": 52, "9641": 52, "1c9eb291": 52, "6d31": 52, "47e1": 52, "96b2": 52, "129b5e1ae64f": 52, "30746": 52, "18eb630b": 52, "a754": 52, "4111": 52, "8cd4": 52, "c24ec80aa5ec": 52, "lymph_nod": 52, "53275": 52, "0d2ee4ac": 52, "05ee": 52, "40b2": 52, "afb6": 52, "ebb584caa867": 52, "0ced5e76": 52, "6040": 52, "47ff": 52, "8a72": 52, "93847965afc0": 52, "thymu": [52, 54], "33664": 52, "283d65eb": 52, "dd53": 52, "496d": 52, "adb7": 52, "7570c7caa443": 52, "1101": [52, 55], "511898": 52, "8e10f1c4": 52, "8e98": 52, "41e5": 52, "b65f": 52, "8cd89a887122": 52, "2480956": 52, "139": 52, "fe1a73ab": 52, "a203": 52, "45fd": 52, "84e9": 52, "0f7fd19efcbd": 52, "dissect": 52, "amygdaloid": 52, "ami": [52, 63], "basolat": 52, "35285": 52, "143": 52, "f8dda921": 52, "5fb4": 52, "4c94": 52, "a654": 52, "c6fc346bfd6d": 52, "cerebr": 52, "cortex": 52, "cx": 52, "occipitotem": 52, "31899": 52, "160": 52, "dd03ce70": 52, "3243": 52, "4c96": 52, "9561": 52, "330cc461e4d7": 52, "perirhin": 52, "23732": 52, "165": 52, 
"d2b5efc1": 52, "14c6": 52, "4b5f": 52, "bd98": 52, "40f9084872d7": 52, "tail": 52, "hippocampu": 52, "hit": 52, "caudal": 52, "36886": 52, "175": 52, "c4b03352": 52, "af8d": 52, "492a": 52, "8d6b": 52, "40f304e0a122": 52, "superclust": 52, "medium": 52, "spini": 52, "152189": 52, "c2aad8fc": 52, "b63b": 52, "4f9b": 52, "9cfd": 52, "baf7bc9c1771": 52, "tempor": 52, "po": 52, "37642": 52, "177": 52, "c202b243": 52, "1aa1": 52, "4b16": 52, "bc9a": 52, "b36241f3b1e3": 52, "amygdala": 52, "excitatori": 52, "109452": 52, "178": 52, "bdb26abd": 52, "f4ba": 52, "4ea3": 52, "8862": 52, "c2340e7a4f55": 52, "cge": 52, "227671": 52, "183": 52, "acae7679": 52, "d077": 52, "461c": 52, "b857": 52, "ee6ccfeb267f": 52, "hih": 52, "ca1": 52, "39147": 52, "196": 52, "9372df2d": 52, "13d6": 52, "4fac": 52, "980b": 52, "919a5b7eb483": 52, "midbrain": 52, "periaqueduct": 52, "grai": 52, "33794": 52, "197": 52, "93131426": 52, "0124": 52, "4ab4": 52, "a013": 52, "9dfbcd99d467": 52, "epithalamu": 52, "eth": 52, "24327": 52, "206": [52, 59], "7c1c3d47": 52, "3166": 52, "43e5": 52, "9a95": 52, "65ceb2d45f78": 52, "pon": 52, "pn": 52, "pontin": 52, "reticular": 52, "49512": 52, "208": 52, "7a0a8891": 52, "9a22": 52, "4549": 52, "a55b": 52, "c2aca23c3a2a": 52, "hippocamp": 52, "74979": 52, "5e5ab909": 52, "f73f": 52, "4b57": 52, "98a0": 52, "6d2c5662f6a4": 52, "inferior": 52, "colliculu": 52, "32306": 52, "3f56901c": 52, "dd4a": 52, "47d6": 52, "b60b": 52, "7b0c0111cfb2": 52, "37911": 52, "3a7f3ab4": 52, "a280": 52, "4b3b": 52, "b2c0": 52, "6dd05614a78c": 52, "splatter": 52, "291833": 52, "249": 52, "35c8a04c": 52, "8639": 52, "4d15": 52, "8228": 52, "765d8d93fc96": 52, "hypothalamu": 52, "hth": 52, "supraopt": 52, "16753": 52, "270": 52, "07b1d7c8": 52, "5c2e": 52, "42f7": 52, "9246": 52, "26f746cd6013": 52, "myelencephalon": 52, "medulla": 52, "oblongata": 52, "27210": 52, "273": 52, "0325478a": 52, "9b52": 52, "b40a": 52, "2e2ab0d72eb1": 52, "intratelencephal": 52, "455006": 52, "483152": 
52, "476": 52, "a68b64d8": 52, "aee3": 52, "4947": 52, "81b7": 52, "36b8fe5a44d2": 52, "82478": 52, "477": 52, "c5d88abe": 52, "f23a": 52, "45fa": 52, "a534": 52, "788985e93dad": 52, "264824": 52, "478": 52, "5a11f879": 52, "d1ef": 52, "458a": 52, "9b0bdfca5ebf": 52, "31691": 52, "479": 52, "104148": 52, "17481d16": 52, "ee44": 52, "49e5": 52, "bcf0": 52, "28c0780d8c4a": 52, "58109": 52, "ensg00000277745": 52, "h2ab3": 52, "58354": 52, "ensg00000233522": 52, "fam224a": 52, "2031": 52, "58411": 52, "ensg00000183146": 52, "prori": 52, "878": 52, "58523": 52, "ensg00000279274": 52, "533e23": 52, "58632": 52, "ensg00000277836": 52, "27211": 52, "all_experi": 53, "organism_nam": 53, "organism_experi": 53, "experiments_total_cel": 53, "num_cel": 53, "nfound": 53, "5255245": 53, "turn": 53, "toolchain": 53, "0bd1a1d": 53, "3aee": 53, "40e0": 53, "b2ec": 53, "86c7a30c7149": 53, "522": 53, "atl": 53, "40220": [53, 54], "submitt": 53, "tabula_muris_seni": 53, "lineag": [54, 55], "jin": 54, "tabula_muris_dataset_id": 54, "48b37086": [54, 56, 60], "25f7": [54, 56, 60], "4ecd": [54, 56, 60], "be66": [54, 56, 60], "f5bb378e3aea": [54, 56, 60], "tabula_muris_ob": 54, "35718": 54, "limb": 54, "28867": 54, "24540": 54, "21647": 54, "20680": 54, "12295": 54, "9275": 54, "lumen": 54, "8945": 54, "8613": 54, "7976": 54, "6777": 54, "6201": 54, "skin": [54, 60], "bodi": [54, 60], "4454": 54, "1887": 54, "tabula_muris_liver_dataset_id": 54, "tabula_muris_liver_ob": 54, "awar": 54, "chanc": 54, "priori": [54, 57], "sai": 54, "nk_cell": 54, "80935": 54, "nk_cells_primari": 54, "59109": 54, "aqp5": [54, 57], "adata_primari": 54, "demo": [54, 58], "awai": 54, "8448858": 54, "52812487": 54, "52812553": 54, "52812556": 54, "52812566": 54, "113": 54, "170": 54, "37033": 54, "37052": 54, "36904": 54, "36919": 54, "meaning": 55, "confirm": 55, "easiest": [55, 57], "data_typ": 55, "nmf": 55, "featu": 55, "impli": 55, "anoth": 55, "get_embedding_metadata": 55, "00506592": 55, "01348877": 55, 
"03173828": 55, "02331543": 55, "02404785": 55, "02441406": 55, "00595093": 55, "0065918": 55, "00070572": 55, "00187683": 55, "04663086": 55, "04614258": 55, "115722": 55, "512": [55, 59], "advanc": [55, 59], "portion": 55, "caution": 55, "quit": 55, "500_000": 55, "fail": [55, 59], "embedding_slic": 55, "emb_data": 55, "emb_joinid": 55, "reindex_disable_on_axi": 55, "embedding_presence_mask": 55, "getnnz": 55, "embedding_data": 55, "vstack": 55, "embedding_joinid": 55, "00762939": 55, "00076675": 55, "00047874": 55, "03588867": 55, "00405884": 55, "00239563": 55, "00982666": 55, "00946045": 55, "00473022": 55, "0135498": 55, "01049805": 55, "03051758": 55, "critic": 55, "meaningless": 55, "embedding_metadata": 55, "toward": 55, "ai": 55, "burgeon": 55, "pioneer": 55, "million": 55, "distil": 55, "concern": 55, "transfer": 55, "optim": [55, 61], "superior": 55, "primary_contact": 55, "bo": 55, "wang": 55, "bowang": 55, "vectorinstitut": 55, "affili": 55, "toronto": 55, "additional_contact": 55, "538439": 55, "additional_inform": 55, "62998417": 55, "submission_d": 55, "nonsens": 55, "assert": 55, "laura": 56, "luebbert": 56, "lauraluebbert": 56, "caltech": 56, "edu": 56, "databas": 56, "facilit": [56, 62], "cite": 56, "googl": 56, "colab": 56, "q": 56, "setup": 56, "fri": 56, "jul": 56, "succesfulli": 56, "gget_cellxgen": 56, "speci": 56, "meta_onli": 56, "verbos": 56, "arg": 56, "slc5a1": 56, "ensg00000130234": 56, "ensg00000100170": 56, "ui": 56, "celltyp": 56, "mucu": 56, "neuroendocrin": 56, "canon": 56, "cellular": 56, "reus": 56, "secondari": 56, "portal": 56, "9b94ccb0a2e0a8f6182b213aa4852c491f6f6aff": 56, "backend": 56, "wmg": 56, "tissue_mapp": 56, "abca1": 56, "minut": 56, "3679": 56, "thousand": 56, "ensg00000165029": 56, "11343": 56, "5332": 56, "9739": 56, "24539": 56, "5081": 56, "3674": 56, "3675": 56, "3676": 56, "3677": 56, "3678": 56, "retina": 56, "config": 56, "inlinebackend": 56, "figure_format": 56, "dotplot": 56, "ensmusg00000015405": 56, 
"047d57f2": 56, "4d14": 56, "45de": 56, "aa98": 56, "336c6f583750": 56, "97547": 56, "97548": 56, "97549": 56, "97550": 56, "97551": 56, "97552": 56, "example_adata": 56, "example_meta": 56, "querycondit": 57, "2313": 57, "2308": 57, "2309": 57, "2310": 57, "2311": 57, "2312": 57, "8626": 57, "1884": 57, "27047": 57, "tubb4b": 57, "2037": 57, "materi": 57, "shortli": 57, "comparison": 57, "op": 57, "sex_cell_metadata": 57, "669": 57, "385437": 57, "metatadata": 57, "cell_metadata_all_unknown_sex": 57, "9th": 57, "post": 57, "fertil": 57, "0000046": 57, "decidua": 57, "basali": 57, "0000453": 57, "placenta": 57, "0001987": 57, "3251329": 57, "56274573": 57, "cord": 57, "2000095": 57, "newborn": 57, "0000082": 57, "han": 57, "chines": 57, "0027": 57, "umbil": 57, "0012168": 57, "0000178": 57, "3251330": 57, "56274574": 57, "3251331": 57, "56274575": 57, "3251332": 57, "56274576": 57, "3251333": 57, "56274577": 57, "3251334": 57, "cell_metadata_b_cel": 57, "42720": 57, "10631": 57, "8742": 57, "8187": 57, "2083": 57, "1534": 57, "1512": 57, "1474": 57, "1210": 57, "332": 57, "204": 57, "133": 57, "gene_metadata": 57, "isn": 58, "narrow": 58, "as_index": 58, "0000001": 58, "0000006": 58, "2502": 58, "0000015": 58, "621": 58, "0000019": 58, "608": 58, "4028006": 58, "38250": 58, "609": 58, "4030009": 58, "tubul": 58, "segment": 58, "777": 58, "610": 58, "4030011": 58, "989": 58, "611": 58, "4030018": 58, "princip": 58, "107": [58, 59], "612": 58, "4030023": 58, "hillock": 58, "10170": 58, "semant": 59, "maxmimum": 59, "nois": 59, "disabl": 59, "docstr": 59, "hvgs_df": 59, "highly_variable_rank": 59, "230445": 59, "116": 59, "044863": 59, "749637": 59, "287551": 59, "276809": 59, "461324": 59, "407450": 59, "363945": 59, "055626": 59, "280": 59, "958509": 59, "combined_df": [59, 60], "188": 59, "ensmusg00000026117": 59, "zap70": 59, "2992": 59, "409091": 59, "14793": 59, "026717": 59, "350": 59, "775560": 59, "233": 59, "ensmusg00000026073": 59, "il1r2": 59, "1908": 59, 
"764085": 59, "41918": 59, "471500": 59, "402176": 59, "ensmusg00000026185": 59, "igfbp5": 59, "6006": 59, "234876": 59, "314355": 59, "591239": 59, "156": 59, "825651": 59, "ensmusg00000026180": 59, "cxcr2": 59, "3048": 59, "379390": 59, "10491": 59, "033344": 59, "640129": 59, "30296": 59, "ensmusg00000024803": 59, "ankrd1": 59, "2886": 59, "548572": 59, "274005": 59, "455137": 59, "741864": 59, "30313": 59, "ensmusg00000024987": 59, "cyp26a1": 59, "1983": 59, "186686": 59, "12973": 59, "622003": 59, "454": 59, "580162": 59, "30379": 59, "ensmusg00000018822": 59, "sfrp5": 59, "1900": 59, "927853": 59, "10943": 59, "645525": 59, "410": 59, "637004": 59, "32042": 59, "ensmusg00000031838": 59, "ifi30": 59, "91": 59, "676950": 59, "995276": 59, "564962": 59, "205886": 59, "33314": 59, "ensmusg00000092572": 59, "serpinb10": 59, "3490": 59, "264085": 59, "239812": 59, "487": 59, "535469": 59, "who": 59, "own": 59, "mv_df": 60, "3095357": 60, "915025": 60, "69571": 60, "774917": 60, "3095359": 60, "972801": 60, "9471": 60, "427044": 60, "3095363": 60, "169472": 60, "139042": 60, "208628": 60, "3095366": 60, "049836": 60, "24762": 60, "926397": 60, "3095368": 60, "345415": 60, "150412": 60, "440839": 60, "3278898": 60, "164319": 60, "339741": 60, "3278899": 60, "368339": 60, "930156": 60, "3278900": 60, "246049": 60, "886186": 60, "3278901": 60, "240724": 60, "307266": 60, "3278902": 60, "278420": 60, "086994": 60, "9314": 60, "keratinocyt": [60, 61], "0002337": 60, "mmusdv": 60, "0000089": 60, "18_53_m": 60, "0002097": 60, "18_47_f": 60, "basal": [60, 61], "epidermi": 60, "0002187": 60, "0000091": 60, "epiderm": 60, "0000362": 60, "logist": 61, "regress": 61, "ml": 61, "primer": 61, "census_ml": 61, "experiment_datapip": 61, "10_000": 61, "mechan": 61, "encapsul": 61, "caller": 61, "importantli": 61, "lazili": 61, "avoid": 61, "legaci": 61, "interchang": 61, "shuffler": 61, "layout": 61, "strategi": 61, "held": 61, "1gb": 61, "caus": 61, "valid": 61, "randomsplitt": 61, 
"train_datapip": 61, "test_datapip": 61, "random_split": 61, "weight": 61, "experiment_dataload": 61, "style": 61, "enforc": 61, "linear": 61, "logisticregress": 61, "input_dim": 61, "output_dim": 61, "super": 61, "noqa": 61, "up008": 61, "sigmoid": 61, "train_epoch": 61, "train_dataload": 61, "loss_fn": 61, "devic": 61, "train_loss": 61, "train_correct": 61, "train_tot": 61, "zero_grad": 61, "softmax": 61, "loss": 61, "backward": 61, "train_accuraci": 61, "secondli": 61, "42496620": 61, "42496621": 61, "42496622": 61, "42496633": 61, "42496634": 61, "42496635": 61, "desir": 61, "cuda": 61, "is_avail": 61, "cell_type_encod": 61, "classes_": 61, "crossentropyloss": 61, "adam": 61, "lr": 61, "7f": 61, "accuraci": 61, "4f": 61, "0167253": 61, "4856": 61, "0156710": 61, "4943": 61, "0149408": 61, "4813": 61, "0144469": 61, "5040": 61, "0141749": 61, "5669": 61, "0139776": 61, "6672": 61, "0138565": 61, "7920": 61, "0138094": 61, "8088": 61, "0136689": 61, "8757": 61, "0136101": 61, "8923": 61, "invok": 61, "eval": 61, "recov": 61, "At": 61, "unpickl": 61, "vein": 61, "123": 61, "124": 61, "127": 61, "helper": 62, "vscode": 63, "m6i": 63, "8xlarg": 63, "mount": 63, "nvme": 63, "drive": 63, "swap": 63, "third": 63, "parti": 63, "misc": 63, "soma_typ": 63, "clone": 63, "absent": 64, "paralleliz": 64}, "objects": {"": [[62, 0, 0, "-", "cellxgene_census"]], "cellxgene_census": [[1, 1, 1, "", "download_source_h5ad"], [15, 1, 1, "", "get_anndata"], [16, 1, 1, "", "get_census_version_description"], [17, 1, 1, "", "get_census_version_directory"], [18, 1, 1, "", "get_default_soma_context"], [19, 1, 1, "", "get_presence_matrix"], [20, 1, 1, "", "get_source_h5ad_uri"], [21, 1, 1, "", "open_soma"]], "cellxgene_census.experimental": [[2, 1, 1, "", "get_all_available_embeddings"], [3, 1, 1, "", "get_all_census_versions_with_embedding"], [4, 1, 1, "", "get_embedding"], [5, 1, 1, "", "get_embedding_metadata"], [6, 1, 1, "", "get_embedding_metadata_by_name"]], 
"cellxgene_census.experimental.ml.huggingface": [[7, 2, 1, "", "CellDatasetBuilder"], [8, 2, 1, "", "GeneformerTokenizer"]], "cellxgene_census.experimental.ml.huggingface.CellDatasetBuilder": [[7, 3, 1, "", "__init__"]], "cellxgene_census.experimental.ml.huggingface.GeneformerTokenizer": [[8, 3, 1, "", "__init__"]], "cellxgene_census.experimental.ml.pytorch": [[9, 2, 1, "", "ExperimentDataPipe"], [10, 2, 1, "", "Stats"], [11, 1, 1, "", "experiment_dataloader"]], "cellxgene_census.experimental.ml.pytorch.ExperimentDataPipe": [[9, 3, 1, "", "__init__"]], "cellxgene_census.experimental.ml.pytorch.Stats": [[10, 3, 1, "", "__init__"]], "cellxgene_census.experimental.pp": [[12, 1, 1, "", "get_highly_variable_genes"], [13, 1, 1, "", "highly_variable_genes"], [14, 1, 1, "", "mean_variance"]]}, "objtypes": {"0": "py:module", "1": "py:function", "2": "py:class", "3": "py:method"}, "objnames": {"0": ["py", "module", "Python module"], "1": ["py", "function", "Python function"], "2": ["py", "class", "Python class"], "3": ["py", "method", "Python method"]}, "titleterms": {"api": [0, 27, 28, 38, 59, 60, 62], "document": 0, "cellxgene_censu": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 28, 49, 55], "download_source_h5ad": 1, "experiment": [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 59, 62], "get_all_available_embed": 2, "get_all_census_versions_with_embed": 3, "get_embed": 4, "get_embedding_metadata": 5, "get_embedding_metadata_by_nam": 6, "ml": [7, 8, 9, 10, 11], "huggingfac": [7, 8], "celldatasetbuild": 7, "geneformertoken": 8, "pytorch": [9, 10, 11, 61], "experimentdatapip": [9, 61], "stat": [10, 25, 48], "experiment_dataload": 11, "pp": [12, 13, 14], "get_highly_variable_gen": [12, 59], "highly_variable_gen": [13, 59], "mean_vari": 14, "get_anndata": [15, 49, 55], "get_census_version_descript": 16, "get_census_version_directori": 17, "get_default_soma_context": 18, "get_presence_matrix": 19, "get_source_h5ad_uri": 20, "open_soma": 21, "what": 
[22, 28, 29, 37, 64], "": [22, 37, 56], "new": [22, 25, 28, 37, 39], "2023": [22, 29], "r": [23, 27, 30, 32], "packag": [23, 45], "cellxgen": [23, 27, 31, 34, 35, 36, 40, 41, 49, 55, 56], "censu": [23, 25, 26, 27, 28, 29, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43, 44, 45, 46, 48, 50, 52, 53, 54, 55, 57, 60, 61, 62], "v1": 23, "i": [23, 28, 29, 64], "out": [23, 54, 60], "instal": [23, 28, 30, 32, 56, 63], "usag": 23, "made": 23, "possibl": 23, "tiledbsoma": 23, "effici": [23, 24, 32], "access": [23, 25, 27, 49, 55], "singl": [23, 24, 25, 28, 33, 44, 45, 53, 57], "cell": [23, 24, 25, 26, 29, 32, 33, 35, 41, 43, 44, 45, 47, 48, 49, 50, 51, 52, 54, 55, 56, 57, 58, 62], "data": [23, 25, 27, 28, 29, 31, 33, 35, 36, 39, 40, 41, 42, 44, 45, 46, 47, 53, 54, 55, 56, 57, 62], "33m": 23, "from": [23, 28, 42, 43, 44, 45, 53, 56], "easi": 23, "us": [23, 24, 25, 28, 31, 39, 40, 45, 47, 51, 56], "handl": 23, "cloud": 23, "host": [23, 28, 55], "queri": [23, 25, 28, 32, 49, 50, 55, 56, 57], "read": [23, 54], "metadata": [23, 25, 26, 29, 32, 35, 41, 44, 48, 50, 55, 56, 57], "export": [23, 25, 39], "slice": [23, 32, 42, 50, 60, 62], "seurat": [23, 32], "singlecellexperi": [23, 32], "stream": 23, "increment": [23, 51, 60], "chunk": 23, "memori": [24, 32], "implement": 24, "commonli": 24, "method": 24, "calcul": [24, 25, 44, 51, 58, 60], "averag": 24, "varianc": [24, 51, 60], "gene": [24, 25, 35, 41, 42, 44, 46, 48, 51, 52, 56, 57, 59], "express": [24, 44, 46, 53, 56, 57], "across": 24, "million": 24, "how": [24, 25, 27, 28], "work": 24, "exampl": [24, 37, 38, 41, 45, 46, 47, 48, 54, 60], "kra": 24, "aqp4": 24, "lung": [24, 43, 44], "epitheli": 24, "highli": [24, 59], "variabl": [24, 59], "find": [24, 42], "all": [24, 41, 44, 48, 52], "human": [24, 28, 41, 44], "esophagu": 24, "introduc": 25, "normal": [25, 28, 35, 42, 44, 46], "layer": [25, 28, 44], "pre": [25, 58], "statist": 25, "descript": 25, "ad": 25, "librari": 25, "size": 25, "enhanc": 25, "featur": [25, 28, 35, 62], "exist": 
25, "toolkit": 25, "via": [25, 49, 50, 55], "tiledb": [25, 27], "soma": [25, 27, 33, 64], "util": 25, "ob": [25, 26, 35, 54, 56, 57], "var": [25, 35, 57], "help": 25, "u": 25, "improv": 25, "addit": 25, "support": [26, 28, 29], "categor": 26, "potenti": 26, "break": 26, "chang": 26, "identifi": [26, 52], "column": 26, "encod": [26, 35], "cz": [27, 31, 35, 36, 40, 41, 56], "discov": [27, 31, 35, 36, 40, 56], "aw": 27, "avail": [27, 41], "specif": [27, 52], "releas": [27, 29, 31, 36, 40], "version": [27, 29, 35, 62, 63], "cli": 27, "programat": 27, "download": [27, 45, 47, 53], "python": [27, 28, 30, 32, 39, 62, 63], "faq": 28, "why": [28, 54], "should": 28, "contain": 28, "do": 28, "cite": [28, 31, 40], "public": 28, "doe": 28, "have": 28, "embed": [28, 39, 43, 44, 45, 49, 55, 62], "differenti": 28, "other": [28, 45], "tool": [28, 31, 40, 42], "can": 28, "mous": [28, 42], "where": 28, "ar": [28, 54], "retriev": [28, 62], "origin": [28, 53], "h5ad": [28, 53], "dataset": [28, 35, 42, 44, 45, 52, 53, 61], "which": 28, "wa": 28, "built": 28, "increas": 28, "perform": [28, 45], "my": 28, "conda": 28, "ask": 28, "contribut": 28, "get": [28, 62], "an": [28, 49, 54, 55, 56, 61], "arrayschema": 28, "error": 28, "when": [28, 54], "open": [28, 41, 46, 48, 52, 57, 61, 62], "run": 28, "import": [28, 43, 45], "databrick": 28, "long": 29, "term": 29, "lt": 29, "weekli": 29, "latest": [29, 63], "list": 29, "12": 29, "15": 29, "inform": [29, 35, 36], "donor": 29, "count": [29, 35, 41, 51, 58], "embbed": 29, "07": 29, "25": 29, "05": 29, "errata": 29, "duplic": [29, 54], "observ": [29, 43], "is_primary_data": [29, 38], "true": 29, "requir": [30, 43, 45, 47, 50], "capabl": [31, 40], "schema": [31, 33, 35, 40], "question": [31, 40], "feedback": [31, 40], "issu": [31, 40], "come": [31, 40], "soon": [31, 40], "project": [31, 40, 45, 47], "quick": [32, 49, 55], "start": [32, 49, 55], "obtain": 32, "anndata": [32, 49, 50, 54, 55, 56, 62], "object": [32, 33, 56], "summari": [33, 35, 41, 44, 
58], "info": [33, 41], "census_info": [33, 35], "census_data": [33, 35], "includ": [33, 35, 41], "mirror": 34, "overview": 35, "definit": [35, 36, 43], "speci": 35, "multi": [35, 42], "constraint": 35, "assai": [35, 41, 44], "full": [35, 46, 48], "sequenc": [35, 41, 46], "matrix": [35, 52, 62], "type": [35, 41, 44, 47, 48, 56], "sampl": [35, 43], "repeat": 35, "organ": [35, 41], "census_obj": 35, "somacollect": 35, "somadatafram": 35, "tabl": [35, 38, 41, 53], "summary_cell_count": 35, "somaexperi": 35, "raw": 35, "m": 35, "rna": 35, "x": [35, 51], "somasparsendarrai": 35, "presenc": [35, 52, 62], "feature_dataset_presence_matrix": 35, "changelog": 35, "2": 35, "0": 35, "1": 35, "3": 35, "storag": [36, 49, 55], "polici": 36, "json": 36, "articl": 37, "editori": [37, 38], "guidelin": [37, 38], "locat": 37, "titl": [37, 38], "date": 37, "author": 37, "introduct": [37, 38], "section": [37, 38], "notebook": 38, "vignett": 38, "content": [38, 41, 55], "knowledg": 38, "reinforc": 38, "tutori": 39, "integr": [39, 42], "model": [39, 45, 47, 61], "understand": [39, 41, 54], "analyz": 39, "scalabl": 39, "comput": [39, 51], "machin": [39, 62], "learn": [39, 41, 44, 62], "about": [41, 44], "main": 41, "compon": 41, "each": [41, 52], "number": 41, "microgli": 41, "beyond": [41, 58], "liver": [41, 42], "diseas": [41, 44], "t": 41, "tissu": [41, 43, 44, 56], "fetch": [42, 43, 44, 46, 52, 53, 55, 56, 57, 58], "10x": [42, 45], "genom": 42, "smart": [42, 46], "seq2": 42, "length": [42, 46], "scvi": [42, 47, 49], "inspect": [42, 45], "prior": 42, "batch": 42, "defin": [42, 61], "dataset_id": [42, 51], "donor_id": 42, "assay_ontology_term_id": 42, "suspension_typ": 42, "explor": [43, 44, 46, 53, 58], "biolog": 43, "relev": 43, "cluster": [43, 46], "background": [43, 55], "function": 43, "melanocyt": 43, "ey": 43, "150k": 43, "retin": 43, "bipolar": 43, "neuron": 43, "dopaminerg": 43, "brain": 43, "pulmonari": 43, "ionocyt": 43, "tabula": [43, 54], "sapien": 43, "sex": 44, "v": 44, 
"nucleu": 44, "sub": 44, "qc": 44, "metric": 44, "creat": [44, 54, 58, 61], "geneform": [45, 49], "class": [45, 61], "predict": [45, 47, 61], "system": [45, 47], "fine": 45, "tune": 45, "prepar": 45, "subclass": 45, "infer": [45, 47], "load": [45, 49, 55], "token": 45, "result": 45, "gener": [45, 50], "pbmc": 45, "3k": 45, "join": 45, "seq": 46, "account": 46, "valid": 46, "through": 46, "train": [47, 61], "pretrain": 47, "summar": 48, "subset": 48, "select": [48, 56], "value_filt": 48, "collabor": 49, "format": [49, 55], "associ": [49, 55], "obsm": [49, 55], "slot": [49, 55], "experimentaxisqueri": [49, 55], "dens": [49, 55], "numpi": [49, 55], "arrai": [49, 55], "citat": 50, "string": 50, "onlin": 51, "algorithm": 51, "mean": [51, 60], "per": 51, "group": 51, "measur": 52, "id": 52, "sourc": 53, "file": 53, "filter": 54, "muri": 54, "seni": 54, "frame": 54, "core": [54, 60], "oper": 54, "gget": 56, "modul": 56, "set": [56, 63], "up": [56, 63], "plot": 56, "dot": 56, "similar": 56, "those": 56, "shown": 56, "onli": 56, "correspond": 56, "command": 56, "line": 56, "census_summary_cell_count": 58, "datafram": 58, "valu": 58, "The": 60, "explain": 61, "paramet": 61, "split": 61, "dataload": 61, "make": 61, "build": 62, "process": 62, "depend": 63, "environ": 63, "verifi": 63, "your": 63, "develop": 63}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "nbsphinx": 4, "sphinx.ext.intersphinx": 1, "sphinx": 57}, "alltitles": {"API Documentation": [[0, "api-documentation"]], "cellxgene_census.download_source_h5ad": [[1, "cellxgene-census-download-source-h5ad"]], "cellxgene_census.experimental.get_all_available_embeddings": [[2, "cellxgene-census-experimental-get-all-available-embeddings"]], 
"cellxgene_census.experimental.get_all_census_versions_with_embedding": [[3, "cellxgene-census-experimental-get-all-census-versions-with-embedding"]], "cellxgene_census.experimental.get_embedding": [[4, "cellxgene-census-experimental-get-embedding"]], "cellxgene_census.experimental.get_embedding_metadata": [[5, "cellxgene-census-experimental-get-embedding-metadata"]], "cellxgene_census.experimental.get_embedding_metadata_by_name": [[6, "cellxgene-census-experimental-get-embedding-metadata-by-name"]], "cellxgene_census.experimental.ml.huggingface.CellDatasetBuilder": [[7, "cellxgene-census-experimental-ml-huggingface-celldatasetbuilder"]], "cellxgene_census.experimental.ml.huggingface.GeneformerTokenizer": [[8, "cellxgene-census-experimental-ml-huggingface-geneformertokenizer"]], "cellxgene_census.experimental.ml.pytorch.ExperimentDataPipe": [[9, "cellxgene-census-experimental-ml-pytorch-experimentdatapipe"]], "cellxgene_census.experimental.ml.pytorch.Stats": [[10, "cellxgene-census-experimental-ml-pytorch-stats"]], "cellxgene_census.experimental.ml.pytorch.experiment_dataloader": [[11, "cellxgene-census-experimental-ml-pytorch-experiment-dataloader"]], "cellxgene_census.experimental.pp.get_highly_variable_genes": [[12, "cellxgene-census-experimental-pp-get-highly-variable-genes"]], "cellxgene_census.experimental.pp.highly_variable_genes": [[13, "cellxgene-census-experimental-pp-highly-variable-genes"]], "cellxgene_census.experimental.pp.mean_variance": [[14, "cellxgene-census-experimental-pp-mean-variance"]], "cellxgene_census.get_anndata": [[15, "cellxgene-census-get-anndata"]], "cellxgene_census.get_census_version_description": [[16, "cellxgene-census-get-census-version-description"]], "cellxgene_census.get_census_version_directory": [[17, "cellxgene-census-get-census-version-directory"]], "cellxgene_census.get_default_soma_context": [[18, "cellxgene-census-get-default-soma-context"]], "cellxgene_census.get_presence_matrix": [[19, 
"cellxgene-census-get-presence-matrix"]], "cellxgene_census.get_source_h5ad_uri": [[20, "cellxgene-census-get-source-h5ad-uri"]], "cellxgene_census.open_soma": [[21, "cellxgene-census-open-soma"]], "What\u2019s new?": [[22, "what-s-new"]], "2023": [[22, "id1"]], "R package cellxgene.census V1 is out!": [[23, "r-package-cellxgene-census-v1-is-out"]], "Installation and usage": [[23, "installation-and-usage"]], "Census R package is made possible by tiledbsoma": [[23, "census-r-package-is-made-possible-by-tiledbsoma"]], "Efficient access to single-cell data for >33M cells from R": [[23, "efficient-access-to-single-cell-data-for-33m-cells-from-r"]], "Easy-to-use handles to the cloud-hosted Census data": [[23, "easy-to-use-handles-to-the-cloud-hosted-census-data"]], "Querying and reading single-cell metadata from Census": [[23, "querying-and-reading-single-cell-metadata-from-census"]], "Exporting Census slices to Seurat and SingleCellExperiment": [[23, "exporting-census-slices-to-seurat-and-singlecellexperiment"]], "Streaming data incrementally in chunks": [[23, "streaming-data-incrementally-in-chunks"]], "Memory-efficient implementations of commonly used single-cell methods": [[24, "memory-efficient-implementations-of-commonly-used-single-cell-methods"]], "Efficient calculation of average and variance gene expression across millions of cells": [[24, "efficient-calculation-of-average-and-variance-gene-expression-across-millions-of-cells"]], "How it works": [[24, "how-it-works"], [24, "id1"]], "Example: KRAS and AQP4 average and variance expression in lung epithelial cells": [[24, "example-kras-and-aqp4-average-and-variance-expression-in-lung-epithelial-cells"]], "Efficient calculation of highly variable genes across millions of cells": [[24, "efficient-calculation-of-highly-variable-genes-across-millions-of-cells"]], "Example: Finding highly variable genes for all cells of the human esophagus": [[24, 
"example-finding-highly-variable-genes-for-all-cells-of-the-human-esophagus"]], "Introducing a normalized layer and pre-calculated cell and gene statistics in Census": [[25, "introducing-a-normalized-layer-and-pre-calculated-cell-and-gene-statistics-in-census"]], "Description of new data added to Census": [[25, "description-of-new-data-added-to-census"]], "Added a new library-size normalized layer": [[25, "added-a-new-library-size-normalized-layer"]], "Enhanced gene metadata": [[25, "enhanced-gene-metadata"]], "Enhanced cell metadata": [[25, "enhanced-cell-metadata"]], "How to use the new features": [[25, "how-to-use-the-new-features"]], "Exporting the normalized data to existing single-cell toolkits": [[25, "exporting-the-normalized-data-to-existing-single-cell-toolkits"]], "Accessing library-size normalized data layer via TileDB-SOMA": [[25, "accessing-library-size-normalized-data-layer-via-tiledb-soma"]], "Utilizing pre-calculated stats for querying obs and var": [[25, "utilizing-pre-calculated-stats-for-querying-obs-and-var"]], "Help us improve these data additions": [[25, "help-us-improve-these-data-additions"]], "Census supports categoricals for cell metadata": [[26, "census-supports-categoricals-for-cell-metadata"]], "Potential breaking changes": [[26, "potential-breaking-changes"]], "Identifying the obs columns encoded as categorical": [[26, "identifying-the-obs-columns-encoded-as-categorical"]], "CZ CELLxGENE Discover Census in AWS": [[27, "cz-cellxgene-discover-census-in-aws"]], "Census data available in AWS": [[27, "census-data-available-in-aws"]], "Data specifications": [[27, "data-specifications"]], "Data release versioning": [[27, "data-release-versioning"]], "How to access AWS Census data": [[27, "how-to-access-aws-census-data"]], "AWS CLI for programatic downloads": [[27, "aws-cli-for-programatic-downloads"]], "CELLxGENE Census API (Python and R)": [[27, "cellxgene-census-api-python-and-r"]], "TileDB-SOMA API (Python and R)": [[27, 
"tiledb-soma-api-python-and-r"]], "FAQ": [[28, "faq"]], "Why should I use the Census?": [[28, "why-should-i-use-the-census"]], "What data is contained in the Census?": [[28, "what-data-is-contained-in-the-census"]], "How do I cite the use of the Census for a publication?": [[28, "how-do-i-cite-the-use-of-the-census-for-a-publication"]], "Why does the Census not have a normalized layer or embeddings?": [[28, "why-does-the-census-not-have-a-normalized-layer-or-embeddings"]], "How does the Census differentiate from other tools?": [[28, "how-does-the-census-differentiate-from-other-tools"]], "Can I query human and mouse data in a single query?": [[28, "can-i-query-human-and-mouse-data-in-a-single-query"]], "Where are the Census data hosted?": [[28, "where-are-the-census-data-hosted"]], "Can I retrieve the original H5AD datasets from which the Census was built?": [[28, "can-i-retrieve-the-original-h5ad-datasets-from-which-the-census-was-built"]], "How can I increase the performance of my queries?": [[28, "how-can-i-increase-the-performance-of-my-queries"]], "Can I use conda to install the Census Python API?": [[28, "can-i-use-conda-to-install-the-census-python-api"]], "How can I ask for support?": [[28, "how-can-i-ask-for-support"]], "How can I ask for new features?": [[28, "how-can-i-ask-for-new-features"]], "How can I contribute my data to the Census?": [[28, "how-can-i-contribute-my-data-to-the-census"]], "Why do I get an ArraySchema error when opening the Census?": [[28, "why-do-i-get-an-arrayschema-error-when-opening-the-census"]], "Why do I get an error when running import cellxgene_census on Databricks?": [[28, "why-do-i-get-an-error-when-running-import-cellxgene-census-on-databricks"]], "Census data releases": [[29, "census-data-releases"]], "What is a Census data release?": [[29, "what-is-a-census-data-release"]], "Long-term supported (LTS) Census releases": [[29, "long-term-supported-lts-census-releases"]], "Weekly Census releases (latest)": [[29, 
"weekly-census-releases-latest"]], "List of LTS Census data releases": [[29, "list-of-lts-census-data-releases"]], "LTS 2023-12-15": [[29, "lts-2023-12-15"]], "Version information": [[29, "version-information"], [29, "id1"], [29, "id4"]], "Cell and donor counts": [[29, "cell-and-donor-counts"], [29, "id2"], [29, "id5"]], "Cell metadata": [[29, "cell-metadata"], [29, "id3"], [29, "id6"], [41, "Cell-metadata"]], "Cell embbedings": [[29, "cell-embbedings"]], "LTS 2023-07-25": [[29, "lts-2023-07-25"]], "LTS 2023-05-15": [[29, "lts-2023-05-15"]], "\ud83d\udd34 Errata \ud83d\udd34": [[29, "errata"]], "Duplicate observations with  is_primary_data = True": [[29, "duplicate-observations-with-is-primary-data-true"]], "Installation": [[30, "installation"], [32, "installation"], [63, "installation"]], "Requirements": [[30, "requirements"], [43, "Requirements"], [45, "Requirements"], [47, "Requirements"], [50, "Requirements"]], "Python": [[30, "python"]], "R": [[30, "r"]], "CZ CELLxGENE Discover Census": [[31, "cz-cellxgene-discover-census"], [40, "cz-cellxgene-discover-census"]], "Citing Census": [[31, "citing-census"], [40, "citing-census"]], "Census Capabilities": [[31, "census-capabilities"], [40, "census-capabilities"]], "Census Data and Schema": [[31, "census-data-and-schema"], [40, "census-data-and-schema"]], "Census Data Releases": [[31, "census-data-releases"], [40, "census-data-releases"]], "Questions, Feedback and Issues": [[31, "questions-feedback-and-issues"], [40, "questions-feedback-and-issues"]], "Coming Soon!": [[31, "coming-soon"], [40, "coming-soon"]], "Projects and Tools Using Census": [[31, "projects-and-tools-using-census"], [40, "projects-and-tools-using-census"]], "Quick start": [[32, "quick-start"], [49, "Quick-start"], [55, "Quick-start"]], "Python quick start": [[32, "python-quick-start"]], "Querying a slice of cell metadata": [[32, "querying-a-slice-of-cell-metadata"], [32, "id1"]], "Obtaining a slice as AnnData": [[32, 
"obtaining-a-slice-as-anndata"]], "Memory-efficient queries": [[32, "memory-efficient-queries"], [32, "id2"]], "R quick start": [[32, "r-quick-start"]], "Obtaining a slice as a Seurat or SingleCellExperiment object": [[32, "obtaining-a-slice-as-a-seurat-or-singlecellexperiment-object"]], "Census data and schema": [[33, "census-data-and-schema"]], "Schema": [[33, "schema"], [35, "schema"]], "Census summary info \"census_info\"": [[33, "census-summary-info-census-info"]], "Census single-cell data \"census_data\"": [[33, "census-single-cell-data-census-data"]], "Data included in the Census": [[33, "data-included-in-the-census"]], "SOMA objects": [[33, "soma-objects"]], "CELLxGENE Census Mirroring": [[34, "cellxgene-census-mirroring"]], "CZ CELLxGENE Discover Census Schema": [[35, "cz-cellxgene-discover-census-schema"]], "Census overview": [[35, "census-overview"]], "Definitions": [[35, "definitions"], [36, "definitions"]], "Census Schema versioning": [[35, "census-schema-versioning"]], "Data included": [[35, "data-included"]], "Species": [[35, "species"]], "Multi-species data constraints": [[35, "multi-species-data-constraints"]], "Assays": [[35, "assays"], [44, "Assays"]], "Full-gene sequencing assays": [[35, "full-gene-sequencing-assays"]], "Data matrix types": [[35, "data-matrix-types"]], "Sample types": [[35, "sample-types"]], "Repeated data": [[35, "repeated-data"]], "Data encoding and organization": [[35, "data-encoding-and-organization"]], "Census information census_obj[\"census_info\"] - SOMACollection": [[35, "census-information-census-obj-census-info-somacollection"]], "Census metadata \u2013 census_obj\u200b\u200b[\"census_info\"][\"summary\"] \u2013 SOMADataFrame": [[35, "census-metadata-census-obj-census-info-summary-somadataframe"]], "Census table of CELLxGENE Discover datasets \u2013 census_obj[\"census_info\"][\"datasets\"] \u2013 SOMADataFrame": [[35, "census-table-of-cellxgene-discover-datasets-census-obj-census-info-datasets-somadataframe"]], 
"Census summary cell counts  \u2013 census_obj[\"census_info\"][\"summary_cell_counts\"] \u2013 SOMADataframe": [[35, "census-summary-cell-counts-census-obj-census-info-summary-cell-counts-somadataframe"]], "Census table of organisms  \u2013 census_obj[\"census_info\"][\"organisms\"] \u2013 SOMADataframe": [[35, "census-table-of-organisms-census-obj-census-info-organisms-somadataframe"]], "Census Data \u2013 census_obj[\"census_data\"][organism] \u2013 SOMAExperiment": [[35, "census-data-census-obj-census-data-organism-somaexperiment"]], "Matrix Data, count (raw) matrix \u2013 census_obj[\"census_data\"][organism].ms[\"RNA\"].X[\"raw\"] \u2013 SOMASparseNDArray": [[35, "matrix-data-count-raw-matrix-census-obj-census-data-organism-ms-rna-x-raw-somasparsendarray"]], "Matrix Data, normalized count matrix \u2013 census_obj[\"census_data\"][organism].ms[\"RNA\"].X[\"normalized\"] \u2013 SOMASparseNDArray": [[35, "matrix-data-normalized-count-matrix-census-obj-census-data-organism-ms-rna-x-normalized-somasparsendarray"]], "Feature metadata \u2013 census_obj[\"census_data\"][organism].ms[\"RNA\"].var \u2013 SOMADataFrame": [[35, "feature-metadata-census-obj-census-data-organism-ms-rna-var-somadataframe"]], "Feature dataset presence matrix \u2013 census_obj[\"census_data\"][organism].ms[\"RNA\"][\"feature_dataset_presence_matrix\"] \u2013 SOMASparseNDArray": [[35, "feature-dataset-presence-matrix-census-obj-census-data-organism-ms-rna-feature-dataset-presence-matrix-somasparsendarray"]], "Cell metadata \u2013 census_obj[\"census_data\"][organism].obs \u2013 SOMADataFrame": [[35, "cell-metadata-census-obj-census-data-organism-obs-somadataframe"]], "Changelog": [[35, "changelog"]], "Version 2.0.0": [[35, "version-2-0-0"]], "Version 1.3.0": [[35, "version-1-3-0"]], "Version 1.2.0": [[35, "version-1-2-0"]], "Version 1.1.0": [[35, "version-1-1-0"]], "Version 1.0.0": [[35, "version-1-0-0"]], "Version 0.1.1": [[35, "version-0-1-1"]], "Version 0.1.0": [[35, "version-0-1-0"]], 
"Version 0.0.1": [[35, "version-0-0-1"]], "CZ CELLxGENE Discover Census storage & release policy": [[36, "cz-cellxgene-discover-census-storage-release-policy"]], "Census data storage policy": [[36, "census-data-storage-policy"]], "Census release information json": [[36, "census-release-information-json"]], "Census \u201cwhat\u2019s new?\u201d article editorial guidelines": [[37, "census-what-s-new-article-editorial-guidelines"]], "Location": [[37, "location"]], "Guidelines": [[37, "guidelines"], [38, "guidelines"]], "Title": [[37, "title"], [38, "title"]], "Date & author": [[37, "date-author"]], "Introduction": [[37, "introduction"], [38, "introduction"]], "Sections": [[37, "sections"], [38, "sections"]], "Example article": [[37, "example-article"]], "Census API notebook/vignette editorial guidelines": [[38, "census-api-notebook-vignette-editorial-guidelines"]], "Table of Contents": [[38, "table-of-contents"]], "is_primary_data knowledge reinforcement": [[38, "is-primary-data-knowledge-reinforcement"]], "Example notebook/vignette": [[38, "example-notebook-vignette"]], "Python tutorials": [[39, "python-tutorials"]], "Exporting data": [[39, "exporting-data"]], "[NEW! 
\ud83d\ude80] Using integrated embeddings and models": [[39, "new-using-integrated-embeddings-and-models"]], "Understanding Census data": [[39, "understanding-census-data"]], "Analyzing Census data": [[39, "analyzing-census-data"]], "Scalable computing": [[39, "scalable-computing"]], "Scalable machine learning": [[39, "scalable-machine-learning"]], "Learning about the CZ CELLxGENE Census": [[41, "Learning-about-the-CZ-CELLxGENE-Census"]], "Opening the Census": [[41, "Opening-the-Census"], [48, "Opening-the-Census"], [52, "Opening-the-Census"]], "Census organization": [[41, "Census-organization"]], "Main Census components": [[41, "Main-Census-components"]], "Census summary info": [[41, "Census-summary-info"]], "Census data": [[41, "Census-data"]], "Gene metadata": [[41, "Gene-metadata"]], "Census summary content tables": [[41, "Census-summary-content-tables"]], "Cell counts by cell metadata": [[41, "Cell-counts-by-cell-metadata"]], "Example: cell metadata included in the summary counts table": [[41, "Example:-cell-metadata-included-in-the-summary-counts-table"]], "Example: cell counts for each sequencing assay in human data": [[41, "Example:-cell-counts-for-each-sequencing-assay-in-human-data"]], "Example: number of microglial cells in the Census": [[41, "Example:-number-of-microglial-cells-in-the-Census"]], "Understanding Census contents beyond the summary tables": [[41, "Understanding-Census-contents-beyond-the-summary-tables"]], "Example: all cell types available in human": [[41, "Example:-all-cell-types-available-in-human"]], "Example: cell types available in human liver": [[41, "Example:-cell-types-available-in-human-liver"]], "Example: diseased T cells in human tissues": [[41, "Example:-diseased-T-cells-in-human-tissues"]], "Integrating multi-dataset slices of data": [[42, "Integrating-multi-dataset-slices-of-data"]], "Finding and fetching data from mouse liver (10X Genomics and Smart-Seq2)": [[42, 
"Finding-and-fetching-data-from-mouse-liver-(10X-Genomics-and-Smart-Seq2)"]], "Gene-length normalization of Smart-Seq2 data.": [[42, "Gene-length-normalization-of-Smart-Seq2-data."]], "Integration with scvi-tools": [[42, "Integration-with-scvi-tools"]], "Inspecting data prior to integration": [[42, "Inspecting-data-prior-to-integration"]], "Data integration with scVI": [[42, "Data-integration-with-scVI"]], "Integration with batch defined as dataset_id": [[42, "Integration-with-batch-defined-as-dataset_id"]], "Integration with batch defined as dataset_id + donor_id": [[42, "Integration-with-batch-defined-as-dataset_id-+-donor_id"]], "Integration with batch defined as dataset_id + donor_id + assay_ontology_term_id + suspension_type": [[42, "Integration-with-batch-defined-as-dataset_id-+-donor_id-+-assay_ontology_term_id-+-suspension_type"]], "Exploring biologically relevant clusters in Census embeddings": [[43, "Exploring-biologically-relevant-clusters-in-Census-embeddings"]], "Background": [[43, "Background"], [55, "Background"]], "Imports and function definitions": [[43, "Imports-and-function-definitions"]], "Melanocytes in eye": [[43, "Melanocytes-in-eye"]], "Sample and fetch 150k cells from eye tissue": [[43, "Sample-and-fetch-150k-cells-from-eye-tissue"]], "Observations": [[43, "Observations"], [43, "id1"], [43, "id2"]], "Retinal bipolar neurons in eye": [[43, "Retinal-bipolar-neurons-in-eye"]], "Dopaminergic neurons in brain": [[43, "Dopaminergic-neurons-in-brain"]], "Sample and fetch 150k cells from brain tissue": [[43, "Sample-and-fetch-150k-cells-from-brain-tissue"]], "Pulmonary ionocytes in lung (Tabula Sapiens)": [[43, "Pulmonary-ionocytes-in-lung-(Tabula-Sapiens)"]], "Fetch lung cells from Tabula Sapiens": [[43, "Fetch-lung-cells-from-Tabula-Sapiens"]], "Exploring all data from a tissue": [[44, "Exploring-all-data-from-a-tissue"]], "Learning about the lung data in the Census": [[44, "Learning-about-the-lung-data-in-the-Census"]], "Learning about cells of 
lung data": [[44, "Learning-about-cells-of-lung-data"]], "Datasets": [[44, "Datasets"]], "Disease": [[44, "Disease"]], "Sex": [[44, "Sex"]], "Cell vs nucleus": [[44, "Cell-vs-nucleus"]], "Cell types": [[44, "Cell-types"]], "Sub-tissues": [[44, "Sub-tissues"]], "Learning about genes of lung data": [[44, "Learning-about-genes-of-lung-data"]], "Summary of lung metadata": [[44, "Summary-of-lung-metadata"]], "Fetching all single-cell human lung data from the Census": [[44, "Fetching-all-single-cell-human-lung-data-from-the-Census"]], "Calculating QC metrics of the lung data": [[44, "Calculating-QC-metrics-of-the-lung-data"]], "Creating a normalized expression layer and embeddings": [[44, "Creating-a-normalized-expression-layer-and-embeddings"]], "Geneformer for cell class prediction and data projection": [[45, "Geneformer-for-cell-class-prediction-and-data-projection"]], "System requirements": [[45, "System-requirements"], [47, "System-requirements"]], "Downloading example data": [[45, "Downloading-example-data"], [47, "Downloading-example-data"]], "Downloading the fine-tuned Geneformer model": [[45, "Downloading-the-fine-tuned-Geneformer-model"]], "Importing required packages": [[45, "Importing-required-packages"]], "Preparing data and model": [[45, "Preparing-data-and-model"]], "Preparing single-cell data": [[45, "Preparing-single-cell-data"]], "Preparing data from model": [[45, "Preparing-data-from-model"]], "Using the Geneformer fine-tuned model for cell subclass inference": [[45, "Using-the-Geneformer-fine-tuned-model-for-cell-subclass-inference"]], "Loading tokenized data": [[45, "Loading-tokenized-data"]], "Performing inference of cell subclass": [[45, "Performing-inference-of-cell-subclass"]], "Inspecting inference results": [[45, "Inspecting-inference-results"]], "Using the Geneformer fine-tuned model for data projection": [[45, "Using-the-Geneformer-fine-tuned-model-for-data-projection"]], "Generating Geneformer embeddings for 10X PBMC 3K data": [[45, 
"Generating-Geneformer-embeddings-for-10X-PBMC-3K-data"]], "Joining Geneformer embeddings from 10X PBMC 3K data with other Census datasets": [[45, "Joining-Geneformer-embeddings-from-10X-PBMC-3K-data-with-other-Census-datasets"]], "Normalizing full-length gene sequencing data": [[46, "Normalizing-full-length-gene-sequencing-data"]], "Opening the census": [[46, "Opening-the-census"], [57, "Opening-the-census"]], "Fetching full-length example sequencing data (Smart-Seq)": [[46, "Fetching-full-length-example-sequencing-data-(Smart-Seq)"]], "Normalizing expression to account for gene length": [[46, "Normalizing-expression-to-account-for-gene-length"]], "Validation through clustering exploration": [[46, "Validation-through-clustering-exploration"]], "scVI for cell type prediction and data projection": [[47, "scVI-for-cell-type-prediction-and-data-projection"]], "Downloading the trained scVI model": [[47, "Downloading-the-trained-scVI-model"]], "Using the scVI pretrained model for data projection": [[47, "Using-the-scVI-pretrained-model-for-data-projection"]], "Using the scVI pretrained model for cell cell type inference.": [[47, "Using-the-scVI-pretrained-model-for-cell-cell-type-inference."]], "Summarizing cell and gene metadata": [[48, "Summarizing-cell-and-gene-metadata"]], "Summarizing cell metadata": [[48, "Summarizing-cell-metadata"]], "Example: Summarize all cell types": [[48, "Example:-Summarize-all-cell-types"]], "Example: Summarize a subset of cell types, selected with a value_filter": [[48, "Example:-Summarize-a-subset-of-cell-types,-selected-with-a-value_filter"]], "Full Census metadata stats": [[48, "Full-Census-metadata-stats"]], "Access CELLxGENE collaboration embeddings (scVI, Geneformer)": [[49, "Access-CELLxGENE-collaboration-embeddings-(scVI,-Geneformer)"]], "Storage format": [[49, "Storage-format"], [55, "Storage-format"]], "Query cells and load associated embeddings": [[49, "Query-cells-and-load-associated-embeddings"], [55, 
"Query-cells-and-load-associated-embeddings"]], "Loading embeddings into an AnnData obsm slot": [[49, "Loading-embeddings-into-an-AnnData-obsm-slot"]], "AnnData embeddings via cellxgene_census.get_anndata()": [[49, "AnnData-embeddings-via-cellxgene_census.get_anndata()"], [55, "AnnData-embeddings-via-cellxgene_census.get_anndata()"]], "AnnData embeddings via ExperimentAxisQuery": [[49, "AnnData-embeddings-via-ExperimentAxisQuery"], [55, "AnnData-embeddings-via-ExperimentAxisQuery"]], "Load an embedding into a dense NumPy array": [[49, "Load-an-embedding-into-a-dense-NumPy-array"], [55, "Load-an-embedding-into-a-dense-NumPy-array"]], "Generating citations for Census slices": [[50, "Generating-citations-for-Census-slices"]], "Generating citation strings": [[50, "Generating-citation-strings"]], "Via cell metadata query": [[50, "Via-cell-metadata-query"]], "Via AnnData query": [[50, "Via-AnnData-query"]], "Computing on X using online (incremental) algorithms": [[51, "Computing-on-X-using-online-(incremental)-algorithms"]], "Incremental count and mean calculation.": [[51, "Incremental-count-and-mean-calculation."]], "Incremental variance calculation": [[51, "Incremental-variance-calculation"]], "Counting cells per gene, grouped by dataset_id": [[51, "Counting-cells-per-gene,-grouped-by-dataset_id"]], "Genes measured in each cell (dataset presence matrix)": [[52, "Genes-measured-in-each-cell-(dataset-presence-matrix)"]], "Fetching the IDs of the Census datasets": [[52, "Fetching-the-IDs-of-the-Census-datasets"]], "Fetching the dataset presence matrix": [[52, "Fetching-the-dataset-presence-matrix"]], "Identifying genes measured in a specific dataset.": [[52, "Identifying-genes-measured-in-a-specific-dataset."]], "Identifying datasets that measured specific genes": [[52, "Identifying-datasets-that-measured-specific-genes"]], "Identifying all genes measured in a dataset": [[52, "Identifying-all-genes-measured-in-a-dataset"]], "Exploring the Census Datasets table": [[53, 
"Exploring-the-Census-Datasets-table"]], "Fetching the datasets table": [[53, "Fetching-the-datasets-table"]], "Fetching the expression data from a single dataset": [[53, "Fetching-the-expression-data-from-a-single-dataset"]], "Downloading the original source H5AD file of a dataset.": [[53, "Downloading-the-original-source-H5AD-file-of-a-dataset."]], "Understanding and filtering out duplicate cells": [[54, "Understanding-and-filtering-out-duplicate-cells"]], "Why are there duplicate cells in the Census?": [[54, "Why-are-there-duplicate-cells-in-the-Census?"]], "An example: duplicate cells in the Tabula Muris Senis data": [[54, "An-example:-duplicate-cells-in-the-Tabula-Muris-Senis-data"]], "Filtering out duplicate cells": [[54, "Filtering-out-duplicate-cells"]], "Filtering out duplicate cells when reading the obs data frame.": [[54, "Filtering-out-duplicate-cells-when-reading-the-obs-data-frame."]], "Filtering out duplicate cells when creating an AnnData": [[54, "Filtering-out-duplicate-cells-when-creating-an-AnnData"]], "Filtering out duplicate cells for out-of-core operations.": [[54, "Filtering-out-duplicate-cells-for-out-of-core-operations."]], "Access CELLxGENE-hosted embeddings": [[55, "Access-CELLxGENE-hosted-embeddings"]], "Contents": [[55, "Contents"]], "Load an embedding into an AnnData obsm slot": [[55, "Load-an-embedding-into-an-AnnData-obsm-slot"]], "Load embeddings and fetch associated Census data": [[55, "Load-embeddings-and-fetch-associated-Census-data"]], "Embedding Metadata": [[55, "Embedding-Metadata"]], "Querying data using the gget cellxgene module": [[56, "Querying-data-using-the-gget-cellxgene-module"]], "Install gget and set up cellxgene module": [[56, "Install-gget-and-set-up-cellxgene-module"]], "Fetch an AnnData object by selecting gene(s), tissue(s) and cell type(s)": [[56, "Fetch-an-AnnData-object-by-selecting-gene(s),-tissue(s)-and-cell-type(s)"]], "Plot a dot plot similar to those shown on the CZ CELLxGENE Discover Gene Expression": 
[[56, "Plot-a-dot-plot-similar-to-those-shown-on-the-CZ-CELLxGENE-Discover-Gene-Expression"]], "Fetch only cell metadata (corresponds to AnnData.obs)": [[56, "Fetch-only-cell-metadata-(corresponds-to-AnnData.obs)"]], "Use gget cellxgene from the command line": [[56, "Use-gget-cellxgene-from-the-command-line"]], "Querying and fetching the single-cell data and cell/gene metadata.": [[57, "Querying-and-fetching-the-single-cell-data-and-cell/gene-metadata."]], "Querying expression data": [[57, "Querying-expression-data"]], "Querying cell metadata (obs)": [[57, "Querying-cell-metadata-(obs)"]], "Querying gene metadata (var)": [[57, "Querying-gene-metadata-(var)"]], "Exploring pre-calculated summary cell counts": [[58, "Exploring-pre-calculated-summary-cell-counts"]], "Fetching the census_summary_cell_counts dataframe": [[58, "Fetching-the-census_summary_cell_counts-dataframe"]], "Creating summary counts beyond pre-calculated values.": [[58, "Creating-summary-counts-beyond-pre-calculated-values."]], "Experimental Highly Variable Genes API": [[59, "Experimental-Highly-Variable-Genes-API"]], "get_highly_variable_genes": [[59, "get_highly_variable_genes"]], "highly_variable_genes": [[59, "highly_variable_genes"]], "Out-of-core (incremental) mean and variance calculation": [[60, "Out-of-core-(incremental)-mean-and-variance-calculation"]], "The mean and variance API": [[60, "The-mean-and-variance-API"]], "Example: calculate mean and variance for a slice of the Census": [[60, "Example:-calculate-mean-and-variance-for-a-slice-of-the-Census"]], "Training a PyTorch Model": [[61, "Training-a-PyTorch-Model"]], "Open the Census": [[61, "Open-the-Census"]], "Create an ExperimentDataPipe": [[61, "Create-an-ExperimentDataPipe"]], "ExperimentDataPipe class explained": [[61, "ExperimentDataPipe-class-explained"]], "ExperimentDataPipe parameters explained": [[61, "ExperimentDataPipe-parameters-explained"]], "Split the dataset": [[61, "Split-the-dataset"]], "Create the DataLoader": [[61, 
"Create-the-DataLoader"]], "Define the model": [[61, "Define-the-model"]], "Train the model": [[61, "Train-the-model"]], "Make predictions with the model": [[61, "Make-predictions-with-the-model"]], "Python API": [[62, "module-cellxgene_census"]], "Open/retrieve Cell Census data": [[62, "open-retrieve-cell-census-data"]], "Get slice as AnnData": [[62, "get-slice-as-anndata"]], "Feature presence matrix": [[62, "feature-presence-matrix"]], "Versioning of Cell Census builds": [[62, "versioning-of-cell-census-builds"]], "Experimental: Machine Learning": [[62, "experimental-machine-learning"]], "Experimental: Processing": [[62, "experimental-processing"]], "Experimental: Embeddings": [[62, "experimental-embeddings"]], "Dependencies": [[63, "dependencies"]], "Set up Python environment": [[63, "set-up-python-environment"]], "Verify your installation": [[63, "verify-your-installation"]], "Latest development version": [[63, "latest-development-version"]], "What is SOMA": [[64, "what-is-soma"]]}, "indexentries": {"download_source_h5ad() (in module cellxgene_census)": [[1, "cellxgene_census.download_source_h5ad"]], "get_all_available_embeddings() (in module cellxgene_census.experimental)": [[2, "cellxgene_census.experimental.get_all_available_embeddings"]], "get_all_census_versions_with_embedding() (in module cellxgene_census.experimental)": [[3, "cellxgene_census.experimental.get_all_census_versions_with_embedding"]], "get_embedding() (in module cellxgene_census.experimental)": [[4, "cellxgene_census.experimental.get_embedding"]], "get_embedding_metadata() (in module cellxgene_census.experimental)": [[5, "cellxgene_census.experimental.get_embedding_metadata"]], "get_embedding_metadata_by_name() (in module cellxgene_census.experimental)": [[6, "cellxgene_census.experimental.get_embedding_metadata_by_name"]], "celldatasetbuilder (class in cellxgene_census.experimental.ml.huggingface)": [[7, "cellxgene_census.experimental.ml.huggingface.CellDatasetBuilder"]], "__init__() 
(cellxgene_census.experimental.ml.huggingface.celldatasetbuilder method)": [[7, "cellxgene_census.experimental.ml.huggingface.CellDatasetBuilder.__init__"]], "geneformertokenizer (class in cellxgene_census.experimental.ml.huggingface)": [[8, "cellxgene_census.experimental.ml.huggingface.GeneformerTokenizer"]], "__init__() (cellxgene_census.experimental.ml.huggingface.geneformertokenizer method)": [[8, "cellxgene_census.experimental.ml.huggingface.GeneformerTokenizer.__init__"]], "experimentdatapipe (class in cellxgene_census.experimental.ml.pytorch)": [[9, "cellxgene_census.experimental.ml.pytorch.ExperimentDataPipe"]], "__init__() (cellxgene_census.experimental.ml.pytorch.experimentdatapipe method)": [[9, "cellxgene_census.experimental.ml.pytorch.ExperimentDataPipe.__init__"]], "stats (class in cellxgene_census.experimental.ml.pytorch)": [[10, "cellxgene_census.experimental.ml.pytorch.Stats"]], "__init__() (cellxgene_census.experimental.ml.pytorch.stats method)": [[10, "cellxgene_census.experimental.ml.pytorch.Stats.__init__"]], "experiment_dataloader() (in module cellxgene_census.experimental.ml.pytorch)": [[11, "cellxgene_census.experimental.ml.pytorch.experiment_dataloader"]], "get_highly_variable_genes() (in module cellxgene_census.experimental.pp)": [[12, "cellxgene_census.experimental.pp.get_highly_variable_genes"]], "highly_variable_genes() (in module cellxgene_census.experimental.pp)": [[13, "cellxgene_census.experimental.pp.highly_variable_genes"]], "mean_variance() (in module cellxgene_census.experimental.pp)": [[14, "cellxgene_census.experimental.pp.mean_variance"]], "get_anndata() (in module cellxgene_census)": [[15, "cellxgene_census.get_anndata"]], "get_census_version_description() (in module cellxgene_census)": [[16, "cellxgene_census.get_census_version_description"]], "get_census_version_directory() (in module cellxgene_census)": [[17, "cellxgene_census.get_census_version_directory"]], "get_default_soma_context() (in module cellxgene_census)": 
[[18, "cellxgene_census.get_default_soma_context"]], "get_presence_matrix() (in module cellxgene_census)": [[19, "cellxgene_census.get_presence_matrix"]], "get_source_h5ad_uri() (in module cellxgene_census)": [[20, "cellxgene_census.get_source_h5ad_uri"]], "open_soma() (in module cellxgene_census)": [[21, "cellxgene_census.open_soma"]], "cellxgene_census": [[62, "module-cellxgene_census"]], "module": [[62, "module-cellxgene_census"]]}})
            \ No newline at end of file
            +Search.setIndex({"docnames": ["README", "_autosummary/cellxgene_census.download_source_h5ad", "_autosummary/cellxgene_census.experimental.get_all_available_embeddings", "_autosummary/cellxgene_census.experimental.get_all_census_versions_with_embedding", "_autosummary/cellxgene_census.experimental.get_embedding", "_autosummary/cellxgene_census.experimental.get_embedding_metadata", "_autosummary/cellxgene_census.experimental.get_embedding_metadata_by_name", "_autosummary/cellxgene_census.experimental.ml.huggingface.CellDatasetBuilder", "_autosummary/cellxgene_census.experimental.ml.huggingface.GeneformerTokenizer", "_autosummary/cellxgene_census.experimental.ml.pytorch.ExperimentDataPipe", "_autosummary/cellxgene_census.experimental.ml.pytorch.Stats", "_autosummary/cellxgene_census.experimental.ml.pytorch.experiment_dataloader", "_autosummary/cellxgene_census.experimental.pp.get_highly_variable_genes", "_autosummary/cellxgene_census.experimental.pp.highly_variable_genes", "_autosummary/cellxgene_census.experimental.pp.mean_variance", "_autosummary/cellxgene_census.get_anndata", "_autosummary/cellxgene_census.get_census_version_description", "_autosummary/cellxgene_census.get_census_version_directory", "_autosummary/cellxgene_census.get_default_soma_context", "_autosummary/cellxgene_census.get_presence_matrix", "_autosummary/cellxgene_census.get_source_h5ad_uri", "_autosummary/cellxgene_census.open_soma", "articles", "articles/2023/20230808-r_api_release", "articles/2023/20230919-out_of_core_methods", "articles/2023/20231012-normalized_layer_precalc_stats", "articles/2024/20240404-categoricals", "cellxgene_census_aws_open_data", "cellxgene_census_docsite_FAQ", "cellxgene_census_docsite_data_release_info", "cellxgene_census_docsite_installation", "cellxgene_census_docsite_landing", "cellxgene_census_docsite_quick_start", "cellxgene_census_docsite_schema", "cellxgene_census_mirroring", "cellxgene_census_schema", 
"cellxgene_census_storage_and_release_policy", "census_article_guidelines", "census_notebook_guidelines", "examples", "index", "notebooks/analysis_demo/comp_bio_census_info", "notebooks/analysis_demo/comp_bio_data_integration_scvi", "notebooks/analysis_demo/comp_bio_embedding_exploration", "notebooks/analysis_demo/comp_bio_explore_and_load_lung_data", "notebooks/analysis_demo/comp_bio_geneformer_prediction", "notebooks/analysis_demo/comp_bio_normalizing_full_gene_sequencing", "notebooks/analysis_demo/comp_bio_scvi_model_use", "notebooks/analysis_demo/comp_bio_summarize_axis_query", "notebooks/api_demo/census_access_maintained_embeddings", "notebooks/api_demo/census_citation_generation", "notebooks/api_demo/census_compute_over_X", "notebooks/api_demo/census_dataset_presence", "notebooks/api_demo/census_datasets", "notebooks/api_demo/census_duplicated_cells", "notebooks/api_demo/census_embedding", "notebooks/api_demo/census_gget_demo", "notebooks/api_demo/census_query_extract", "notebooks/api_demo/census_summary_cell_counts", "notebooks/experimental/highly_variable_genes", "notebooks/experimental/mean_variance", "notebooks/experimental/pytorch", "python-api", "setup", "soma"], "filenames": ["README.md", "_autosummary/cellxgene_census.download_source_h5ad.rst", "_autosummary/cellxgene_census.experimental.get_all_available_embeddings.rst", "_autosummary/cellxgene_census.experimental.get_all_census_versions_with_embedding.rst", "_autosummary/cellxgene_census.experimental.get_embedding.rst", "_autosummary/cellxgene_census.experimental.get_embedding_metadata.rst", "_autosummary/cellxgene_census.experimental.get_embedding_metadata_by_name.rst", "_autosummary/cellxgene_census.experimental.ml.huggingface.CellDatasetBuilder.rst", "_autosummary/cellxgene_census.experimental.ml.huggingface.GeneformerTokenizer.rst", "_autosummary/cellxgene_census.experimental.ml.pytorch.ExperimentDataPipe.rst", "_autosummary/cellxgene_census.experimental.ml.pytorch.Stats.rst", 
"_autosummary/cellxgene_census.experimental.ml.pytorch.experiment_dataloader.rst", "_autosummary/cellxgene_census.experimental.pp.get_highly_variable_genes.rst", "_autosummary/cellxgene_census.experimental.pp.highly_variable_genes.rst", "_autosummary/cellxgene_census.experimental.pp.mean_variance.rst", "_autosummary/cellxgene_census.get_anndata.rst", "_autosummary/cellxgene_census.get_census_version_description.rst", "_autosummary/cellxgene_census.get_census_version_directory.rst", "_autosummary/cellxgene_census.get_default_soma_context.rst", "_autosummary/cellxgene_census.get_presence_matrix.rst", "_autosummary/cellxgene_census.get_source_h5ad_uri.rst", "_autosummary/cellxgene_census.open_soma.rst", "articles.rst", "articles/2023/20230808-r_api_release.md", "articles/2023/20230919-out_of_core_methods.md", "articles/2023/20231012-normalized_layer_precalc_stats.md", "articles/2024/20240404-categoricals.md", "cellxgene_census_aws_open_data.md", "cellxgene_census_docsite_FAQ.md", "cellxgene_census_docsite_data_release_info.md", "cellxgene_census_docsite_installation.md", "cellxgene_census_docsite_landing.md", "cellxgene_census_docsite_quick_start.md", "cellxgene_census_docsite_schema.md", "cellxgene_census_mirroring.md", "cellxgene_census_schema.md", "cellxgene_census_storage_and_release_policy.md", "census_article_guidelines.md", "census_notebook_guidelines.md", "examples.rst", "index.rst", "notebooks/analysis_demo/comp_bio_census_info.ipynb", "notebooks/analysis_demo/comp_bio_data_integration_scvi.ipynb", "notebooks/analysis_demo/comp_bio_embedding_exploration.ipynb", "notebooks/analysis_demo/comp_bio_explore_and_load_lung_data.ipynb", "notebooks/analysis_demo/comp_bio_geneformer_prediction.ipynb", "notebooks/analysis_demo/comp_bio_normalizing_full_gene_sequencing.ipynb", "notebooks/analysis_demo/comp_bio_scvi_model_use.ipynb", "notebooks/analysis_demo/comp_bio_summarize_axis_query.ipynb", "notebooks/api_demo/census_access_maintained_embeddings.ipynb", 
"notebooks/api_demo/census_citation_generation.ipynb", "notebooks/api_demo/census_compute_over_X.ipynb", "notebooks/api_demo/census_dataset_presence.ipynb", "notebooks/api_demo/census_datasets.ipynb", "notebooks/api_demo/census_duplicated_cells.ipynb", "notebooks/api_demo/census_embedding.ipynb", "notebooks/api_demo/census_gget_demo.ipynb", "notebooks/api_demo/census_query_extract.ipynb", "notebooks/api_demo/census_summary_cell_counts.ipynb", "notebooks/experimental/highly_variable_genes.ipynb", "notebooks/experimental/mean_variance.ipynb", "notebooks/experimental/pytorch.ipynb", "python-api.rst", "setup.rst", "soma.rst"], "titles": ["API Documentation", "cellxgene_census.download_source_h5ad", "cellxgene_census.experimental.get_all_available_embeddings", "cellxgene_census.experimental.get_all_census_versions_with_embedding", "cellxgene_census.experimental.get_embedding", "cellxgene_census.experimental.get_embedding_metadata", "cellxgene_census.experimental.get_embedding_metadata_by_name", "cellxgene_census.experimental.ml.huggingface.CellDatasetBuilder", "cellxgene_census.experimental.ml.huggingface.GeneformerTokenizer", "cellxgene_census.experimental.ml.pytorch.ExperimentDataPipe", "cellxgene_census.experimental.ml.pytorch.Stats", "cellxgene_census.experimental.ml.pytorch.experiment_dataloader", "cellxgene_census.experimental.pp.get_highly_variable_genes", "cellxgene_census.experimental.pp.highly_variable_genes", "cellxgene_census.experimental.pp.mean_variance", "cellxgene_census.get_anndata", "cellxgene_census.get_census_version_description", "cellxgene_census.get_census_version_directory", "cellxgene_census.get_default_soma_context", "cellxgene_census.get_presence_matrix", "cellxgene_census.get_source_h5ad_uri", "cellxgene_census.open_soma", "What\u2019s new?", "R package cellxgene.census V1 is out!", "Memory-efficient implementations of commonly used single-cell methods", "Introducing a normalized layer and pre-calculated cell and gene statistics in Census", 
"Census supports categoricals for cell metadata", "CZ CELLxGENE Discover Census in AWS", "FAQ", "Census data releases", "Installation", "CZ CELLxGENE Discover Census", "Quick start", "Census data and schema", "CELLxGENE Census Mirroring", "CZ CELLxGENE Discover Census Schema", "CZ CELLxGENE Discover Census storage & release policy", "Census \u201cwhat\u2019s new?\u201d article editorial guidelines", "Census API notebook/vignette editorial guidelines", "Python tutorials", "CZ CELLxGENE Discover Census", "Learning about the CZ CELLxGENE Census", "Integrating multi-dataset slices of data", "Exploring biologically relevant clusters in Census embeddings", "Exploring all data from a tissue", "Geneformer for cell class prediction and data projection", "Normalizing full-length gene sequencing data", "scVI for cell type prediction and data projection", "Summarizing cell and gene metadata", "Access CELLxGENE collaboration embeddings (scVI, Geneformer)", "Generating citations for Census slices", "Computing on X using online (incremental) algorithms", "Genes measured in each cell (dataset presence matrix)", "Exploring the Census Datasets table", "Understanding and filtering out duplicate cells", "Access CELLxGENE-hosted embeddings", "Querying data using the gget cellxgene module", "Querying and fetching the single-cell data and cell/gene metadata.", "Exploring pre-calculated summary cell counts", "Experimental Highly Variable Genes API", "Out-of-core (incremental) mean and variance calculation", "Training a PyTorch Model", "Python API", "Installation", "What is SOMA"], "terms": {"The": [0, 1, 2, 3, 4, 5, 6, 7, 9, 12, 13, 15, 16, 17, 18, 19, 20, 21, 23, 24, 25, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 52, 53, 54, 55, 56, 57, 58, 59, 61, 62, 64], "websit": 0, "i": [0, 1, 3, 4, 6, 9, 10, 12, 13, 14, 15, 17, 18, 19, 20, 21, 22, 24, 25, 26, 27, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 
53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63], "current": [0, 17, 24, 25, 31, 32, 34, 40, 41, 42, 44, 45, 46, 47, 48, 49, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 64], "host": [0, 27, 29, 30, 31, 34, 35, 36, 38, 39, 40, 43, 45, 47, 49, 62, 63, 64], "http": [0, 8, 13, 17, 28, 30, 34, 38, 42, 44, 45, 46, 47, 50, 55, 56, 63], "chanzuckerberg": [0, 25, 30, 31, 35, 37, 38, 40, 56, 63], "github": [0, 25, 28, 31, 35, 38, 40, 55, 56, 63], "io": [0, 13, 42, 44, 46], "cellxgen": [0, 7, 8, 16, 17, 20, 22, 25, 26, 28, 29, 30, 32, 33, 37, 38, 39, 42, 43, 44, 45, 46, 47, 50, 53, 54, 62, 63, 64], "censu": [0, 1, 2, 3, 4, 6, 7, 8, 12, 15, 16, 17, 18, 19, 20, 21, 22, 24, 30, 32, 42, 47, 49, 51, 56, 58, 59, 63, 64], "site": [0, 28, 37, 38, 42, 44, 46], "rebuilt": 0, "each": [0, 2, 7, 8, 9, 17, 24, 25, 26, 28, 29, 32, 33, 34, 35, 39, 42, 43, 44, 45, 46, 48, 49, 50, 51, 53, 55, 56, 58, 59, 61, 62], "time": [0, 17, 24, 28, 35, 54, 56, 61], "tag": [0, 2, 4, 6, 27, 29, 36], "creat": [0, 12, 13, 27, 28, 31, 32, 33, 36, 38, 40, 41, 42, 45, 49, 50, 53, 55, 59], "repo": [0, 30, 62], "which": [0, 3, 4, 5, 6, 7, 9, 11, 12, 13, 14, 15, 17, 19, 21, 23, 24, 25, 26, 33, 34, 35, 36, 37, 38, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 57, 58, 59, 60, 61], "happen": [0, 17, 34], "releas": [0, 16, 17, 23, 25, 30, 32, 34, 35, 37, 41, 42, 44, 46, 48, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61], "includ": [0, 17, 24, 27, 28, 31, 37, 38, 40, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 57, 58, 60, 62, 64], "regener": 0, "sphinx": 0, "python": [0, 5, 9, 23, 24, 25, 26, 29, 31, 37, 38, 40, 41, 45, 48, 50, 52, 55, 56, 57, 64], "doc": [0, 23, 28, 37, 38, 42, 61], "r": [0, 22, 25, 26, 28, 29, 31, 37, 38, 40, 44, 64], "pkgdown": 0, "check": [0, 23, 26, 31, 32, 34, 40, 42, 46, 52, 63], "git": [0, 8, 63], "simpli": [0, 28, 45, 63], "copi": [0, 18, 27, 36, 42, 43, 44, 46, 47], "dure": [0, 42, 45], "rebuild": 0, "see": [0, 9, 13, 24, 25, 26, 27, 28, 30, 32, 33, 35, 42, 43, 44, 45, 
46, 54, 55, 56, 57, 59, 61, 62], "vignettes_": 0, "further": [0, 18, 25, 37, 43, 48, 55], "explan": [0, 37, 38, 54], "A": [0, 2, 3, 4, 5, 6, 9, 11, 13, 14, 17, 18, 19, 20, 21, 27, 29, 31, 32, 33, 34, 35, 36, 37, 39, 40, 41, 42, 44, 45, 46, 47, 52, 53, 54, 55, 56, 57], "docsit": 0, "can": [0, 9, 11, 12, 13, 18, 21, 23, 24, 25, 26, 27, 29, 31, 32, 33, 34, 35, 37, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 60, 61, 63], "trigger": 0, "manual": 0, "through": [0, 7, 30, 31, 40, 47, 55, 57, 61], "workflow_dispatch": 0, "run": [0, 42, 43, 45, 47, 56, 61, 63], "workflow": [0, 39, 45], "thi": [0, 1, 8, 9, 10, 11, 13, 15, 17, 20, 23, 24, 25, 26, 27, 28, 29, 32, 33, 34, 35, 36, 37, 38, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 63], "should": [0, 27, 29, 35, 36, 37, 38, 41, 42, 43, 44, 46, 48, 53, 61, 63], "done": [0, 12, 13, 14, 24, 27, 35, 44, 46, 59, 61], "bug": [0, 31, 40], "found": [0, 6, 19, 21, 23, 31, 40, 42, 43, 44, 46, 47, 53, 57], "necessari": [0, 7, 24, 31, 38, 40, 43], "In": [0, 4, 24, 25, 26, 29, 30, 31, 35, 38, 40, 41, 42, 43, 44, 45, 49, 51, 52, 54, 55, 59, 61, 63], "order": [0, 9, 26, 29, 38, 45, 61], "test": [0, 45, 48, 61, 63], "chang": [0, 25, 35, 36], "local": [0, 9, 27, 43, 53, 61, 63], "first": [0, 9, 19, 23, 24, 30, 32, 42, 44, 45, 46, 48, 49, 50, 51, 52, 54, 55, 61], "instal": [0, 8, 37], "requir": [0, 8, 9, 27, 35, 36, 44, 48, 49, 55, 56, 61], "pip": [0, 8, 28, 30, 56, 63], "txt": 0, "brew": 0, "pandoc": 0, "mac": 0, "o": [0, 18, 45, 47, 56], "Then": [0, 42, 45, 46, 49, 50, 55, 61], "And": [0, 25, 27, 32, 41, 42, 44, 45, 46, 49, 50, 54, 55, 57], "follow": [0, 23, 24, 25, 27, 28, 29, 30, 31, 32, 33, 35, 36, 37, 38, 40, 41, 43, 44, 45, 47, 54, 55, 57, 60, 61, 63], "command": [0, 28, 41, 45, 47], "cd": [0, 63], "make": [0, 23, 30, 35, 42, 44, 45, 46, 51, 63], "html": [0, 13, 28, 42, 44, 46], "gener": [0, 8, 10, 12, 13, 24, 28, 29, 31, 35, 39, 40, 41, 42, 43, 55, 56], "_build": 
0, "index": [0, 4, 9, 12, 14, 15, 19, 33, 35, 43, 45, 47, 51, 52, 53, 59, 60], "dataset_id": [1, 13, 20, 24, 26, 35, 38, 41, 43, 44, 45, 46, 47, 49, 50, 52, 53, 54, 55, 56, 57, 60], "str": [1, 2, 3, 4, 5, 6, 7, 8, 9, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 35, 42, 43, 45, 56], "to_path": [1, 53], "census_vers": [1, 2, 4, 6, 8, 12, 16, 17, 20, 21, 25, 26, 29, 32, 38, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61], "stabl": [1, 12, 13, 17, 20, 21, 23, 29, 30, 32, 41, 42, 44, 46, 48, 51, 52, 53, 54, 56, 57, 58, 59, 60, 61], "none": [1, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 15, 16, 17, 18, 21, 35, 43, 44, 45, 51, 56], "download": [1, 24, 28, 49, 55, 62], "sourc": [1, 20, 21, 27, 30, 35, 36, 55, 56, 61, 63], "h5ad": [1, 16, 17, 20, 21, 27, 35, 36, 42, 45, 46, 50, 52, 56, 62], "dataset": [1, 7, 8, 12, 13, 19, 23, 25, 27, 29, 31, 33, 37, 38, 39, 40, 41, 43, 46, 47, 48, 49, 50, 51, 54, 55, 56, 57, 58], "given": [1, 2, 9, 16, 24, 27, 29, 35, 36, 43, 44, 49, 51, 52, 53, 55, 61], "user": [1, 12, 13, 18, 20, 23, 24, 25, 26, 27, 28, 31, 33, 34, 37, 38, 40, 42, 44, 45, 46, 47, 51, 59, 61], "specifi": [1, 3, 6, 8, 9, 12, 13, 14, 17, 18, 21, 25, 27, 29, 32, 34, 36, 41, 42, 44, 46, 48, 51, 52, 53, 54, 56, 57, 58, 59, 60, 61], "file": [1, 8, 21, 27, 28, 29, 34, 35, 36, 41, 45, 47, 48, 56], "name": [1, 3, 6, 7, 9, 12, 13, 16, 17, 20, 27, 29, 32, 33, 35, 36, 37, 41, 42, 43, 44, 46, 48, 50, 51, 54, 55, 56, 57, 59, 62], "paramet": [1, 2, 3, 4, 5, 6, 7, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 42, 44, 46, 55], "fetch": [1, 4, 9, 12, 13, 15, 23, 37, 38, 39, 45, 47, 49, 50, 54, 61], "origin": [1, 9, 25, 35, 43, 44, 45, 47, 54, 61], "associ": [1, 3, 6, 35, 39, 44, 45], "where": [1, 9, 14, 34, 35, 36, 38, 42, 43, 44, 46, 48, 49, 51, 54, 55, 59, 60, 61], "written": [1, 12, 15, 37], "must": [1, 9, 12, 13, 30, 32, 35, 36, 37, 38, 43, 54, 63], "alreadi": [1, 43, 47], "exist": [1, 20, 26, 27, 28, 31, 34, 36, 40, 41, 44, 45, 54], "version": [1, 2, 3, 4, 6, 15, 
16, 17, 20, 21, 23, 25, 28, 30, 34, 38, 41, 42, 43, 45, 47, 48, 49, 50, 52, 54, 55, 56, 57], "default": [1, 3, 4, 5, 6, 7, 8, 9, 12, 14, 15, 17, 18, 20, 21, 26, 29, 34, 42, 46, 51, 56, 60, 61], "rais": [1, 4, 6, 11, 12, 13, 16, 19, 20, 21, 41, 48], "valueerror": [1, 4, 6, 11, 12, 13, 16, 19, 21], "path": [1, 21, 27, 35, 36, 45, 56], "e": [1, 2, 3, 4, 6, 7, 9, 13, 14, 21, 25, 27, 29, 31, 33, 34, 35, 36, 37, 40, 41, 43, 44, 45, 48, 49, 51, 52, 53, 54, 55, 56, 59, 63], "overwrit": 1, "an": [1, 2, 4, 7, 9, 11, 12, 14, 15, 17, 21, 23, 24, 25, 27, 30, 31, 32, 35, 36, 37, 38, 40, 41, 42, 43, 44, 45, 48, 50, 51, 53, 57, 60, 62, 63], "lifecycl": [1, 4, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21], "matur": [1, 16, 17, 19, 20, 21, 41, 43], "get_source_h5ad_uri": [1, 21, 53], "look": [1, 21, 25, 34, 41, 42, 44, 45, 46, 47, 49, 54, 55, 56, 57, 61, 64], "up": [1, 21, 24, 47, 51, 54], "locat": [1, 18, 21, 28, 34, 36, 53, 55, 57], "exampl": [1, 2, 4, 5, 8, 12, 13, 15, 16, 17, 18, 19, 20, 21, 23, 25, 27, 28, 30, 32, 35, 36, 39, 42, 43, 44, 49, 51, 55, 56, 57, 61, 63], "8e47ed12": 1, "c658": 1, "4252": [1, 44, 52], "b126": 1, "381df8d52a3d": 1, "tmp": [1, 21], "data": [1, 4, 9, 10, 11, 12, 13, 16, 17, 20, 24, 26, 30, 32, 34, 37, 38, 43, 48, 49, 50, 51, 52, 58, 59, 60, 61, 64], "list": [2, 3, 12, 13, 15, 27, 31, 33, 34, 35, 37, 38, 40, 41, 43, 44, 45, 47, 48, 52, 56, 57, 62], "dict": [2, 5, 6, 16, 17, 18, 21, 43, 47], "ani": [2, 4, 5, 6, 7, 8, 9, 11, 12, 13, 15, 18, 21, 23, 24, 25, 27, 28, 29, 31, 32, 35, 38, 40, 41, 43, 47, 49, 50, 51, 52, 53, 55, 58, 59, 61], "return": [2, 3, 4, 5, 6, 9, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 41, 43, 48, 51, 52, 56, 57, 58, 59, 60, 61], "dictionari": [2, 5, 6, 16, 17, 18, 21, 26, 34, 36, 41, 45, 55, 57], "all": [2, 3, 8, 9, 12, 13, 15, 17, 23, 25, 26, 27, 28, 29, 31, 32, 33, 35, 36, 37, 38, 39, 40, 42, 43, 45, 46, 47, 49, 50, 51, 53, 54, 55, 56, 57, 58, 63, 64], "avail": [2, 9, 13, 15, 17, 24, 25, 28, 29, 30, 34, 36, 42, 44, 45, 49, 55, 
56, 57, 59, 62], "embed": [2, 3, 4, 5, 6, 15, 31, 40, 42, 47], "g": [2, 3, 4, 6, 7, 13, 14, 21, 27, 29, 31, 33, 34, 35, 36, 37, 40, 41, 43, 45, 48, 51, 53, 55, 56, 59, 63], "2023": [2, 4, 6, 23, 24, 25, 27, 31, 32, 34, 36, 37, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61], "12": [2, 4, 6, 16, 17, 20, 21, 25, 36, 41, 42, 43, 44, 45, 46, 47, 49, 52, 54, 55, 57], "15": [2, 4, 6, 17, 27, 36, 37, 41, 42, 43, 44, 45, 46, 47, 49, 52, 54, 55, 56, 60], "contain": [2, 3, 4, 5, 6, 8, 12, 13, 15, 16, 17, 19, 20, 21, 33, 34, 35, 36, 37, 38, 41, 42, 43, 44, 45, 46, 48, 50, 51, 52, 53, 54, 55, 56, 57, 58, 60, 61, 62], "metadata": [2, 5, 6, 8, 12, 15, 22, 24, 27, 28, 31, 33, 38, 39, 40, 42, 43, 45, 46, 47, 49, 51, 52, 53, 54, 58, 59, 61, 63], "describ": [2, 5, 6, 27, 33, 35, 36, 38, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 55, 57, 58, 59, 60], "experiment_nam": [2, 43, 49, 55, 60], "experiment_1": 2, "measurement_nam": [2, 7, 9, 12, 15, 19, 24, 25, 32, 43, 45, 47, 49, 50, 51, 52, 54, 55, 59, 60, 61], "rna": [2, 7, 12, 15, 19, 24, 25, 28, 31, 32, 33, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 57, 58, 59, 60, 61], "organ": [2, 3, 6, 12, 15, 19, 23, 24, 25, 28, 31, 32, 33, 37, 38, 39, 40, 42, 43, 44, 45, 46, 47, 48, 49, 50, 52, 53, 54, 55, 57, 58, 59, 61], "homo_sapien": [2, 7, 8, 23, 24, 25, 26, 27, 32, 33, 35, 41, 43, 44, 45, 47, 48, 49, 50, 52, 53, 54, 55, 56, 57, 58, 61], "n_embed": [2, 55], "1000": [2, 12, 13, 15, 24, 35, 42, 46], "n_featur": [2, 55], "200": [2, 4], "uri": [2, 4, 5, 16, 17, 18, 20, 21, 27, 34, 36, 43, 45, 47, 53, 62], "s3": [2, 16, 17, 18, 20, 21, 27, 28, 30, 34, 36, 43, 45, 47, 49, 53, 55, 63], "bucket": [2, 18, 21, 27, 28, 30, 35, 36], "embedding_1": 2, "embedding_nam": [3, 6, 43, 49, 55], "embedding_typ": [3, 6], "obs_embed": [3, 6, 15, 49, 55], "get": [3, 12, 16, 17, 21, 23, 24, 25, 26, 31, 32, 38, 40, 41, 42, 43, 44, 45, 46, 47, 49, 50, 52, 53, 54, 55, 57, 64], "specif": [3, 6, 21, 
28, 29, 31, 33, 35, 36, 40, 41, 43, 48, 51, 54, 57], "scvi": [3, 6, 29, 38, 39, 43, 55, 63], "type": [3, 9, 19, 23, 26, 27, 29, 32, 33, 36, 39, 42, 43, 45, 46, 51, 52, 58, 61, 63], "embedding_uri": [4, 5, 43, 49, 55], "obs_soma_joinid": [4, 55], "ndarrai": [4, 12, 15, 49, 51, 55], "dtype": [4, 9, 12, 15, 41, 42, 43, 44, 46, 48, 49, 50, 51, 54, 55, 57, 61], "int64": [4, 9, 26, 35, 41, 42, 44, 46, 48, 51, 54, 57], "arrai": [4, 12, 15, 19, 28, 31, 33, 40, 44, 46, 51, 52, 61], "context": [4, 5, 7, 18, 21, 25, 27, 32, 41, 44, 48, 54, 55], "somatiledbcontext": [4, 5, 18, 21, 55], "float32": [4, 35, 46, 49, 51, 55], "read": [4, 5, 9, 10, 15, 19, 24, 25, 27, 28, 31, 32, 33, 35, 40, 41, 42, 43, 44, 45, 46, 48, 49, 50, 51, 52, 53, 55, 57, 58, 59, 61], "cell": [4, 7, 8, 9, 12, 13, 16, 17, 20, 22, 27, 28, 31, 34, 36, 37, 38, 39, 40, 42, 46, 53, 59, 60, 61, 64], "ob": [4, 8, 9, 12, 13, 14, 15, 23, 27, 32, 33, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 55, 58, 59, 60, 61], "dens": [4, 9, 28, 31, 33, 40], "numpi": [4, 19, 28, 31, 38, 40, 42, 43, 44, 45, 47, 51, 52, 61], "without": [4, 31, 36, 40, 43, 45, 61], "nan": [4, 43, 55, 59], "valu": [4, 8, 9, 12, 13, 14, 15, 16, 19, 23, 24, 25, 26, 28, 29, 33, 35, 36, 41, 42, 43, 44, 46, 47, 48, 49, 50, 51, 54, 55, 56, 57, 59, 60, 61, 64], "us": [4, 5, 9, 10, 11, 12, 13, 14, 15, 17, 18, 21, 22, 26, 27, 29, 30, 32, 33, 34, 35, 36, 37, 38, 41, 42, 43, 44, 46, 48, 49, 50, 52, 53, 54, 55, 57, 58, 59, 60, 61, 62, 63, 64], "verifi": 4, "content": [4, 8, 27, 29, 32, 33, 34, 35, 36, 37, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 56, 57, 58, 60, 61], "from": [4, 7, 8, 9, 14, 16, 19, 24, 25, 27, 29, 30, 31, 32, 33, 34, 35, 37, 38, 39, 40, 41, 46, 47, 49, 51, 52, 54, 55, 57, 58, 59, 60, 61, 63], "same": [4, 24, 25, 32, 33, 43, 45, 46, 49, 53, 54, 55, 57, 59], "slice": [4, 12, 15, 24, 25, 28, 31, 37, 38, 39, 40, 41, 44, 49, 51, 52, 53, 55, 57, 64], "custom": [4, 5, 18, 21, 27], "tiledbsoma": [4, 5, 7, 8, 9, 12, 13, 14, 15, 18, 21, 
24, 25, 27, 32, 43, 49, 51, 54, 55, 59, 60, 61, 62], "open": [4, 5, 18, 20, 21, 23, 24, 25, 27, 29, 31, 32, 38, 40, 42, 44, 45, 50, 55, 56, 59, 64], "soma": [4, 5, 9, 10, 12, 15, 16, 17, 18, 21, 23, 24, 28, 29, 31, 32, 34, 35, 36, 38, 40, 41, 43, 48, 49, 51, 52, 53, 55, 59, 60, 61, 62], "object": [4, 5, 9, 12, 15, 18, 19, 20, 21, 23, 25, 28, 31, 35, 37, 40, 41, 42, 43, 44, 48, 49, 50, 53, 55, 57, 61, 64], "option": [4, 5, 12, 13, 17, 20, 21, 27, 30, 35, 36, 53, 56, 63], "ar": [4, 6, 9, 11, 12, 13, 14, 17, 21, 23, 24, 25, 26, 27, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 55, 56, 57, 59, 60, 61, 63], "position": [4, 51, 52], "other": [4, 7, 11, 24, 25, 35, 37, 41, 43, 46, 49, 51, 52, 53, 54, 55, 57, 63], "word": [4, 35, 36, 38, 43, 51, 52, 55], "identifi": [4, 12, 13, 17, 24, 29, 34, 36, 43, 46], "correspond": [4, 14, 17, 25, 27, 35, 38, 41, 43, 44, 45, 46, 47, 48, 49, 51, 54, 55, 57], "ith": 4, "posit": [4, 9, 41, 44, 45, 51], "mismatch": 4, "obs_somaids_to_fetch": 4, "np": [4, 38, 42, 43, 44, 45, 47, 51, 55], "10": [4, 17, 28, 29, 32, 37, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 52, 53, 54, 55, 56, 57, 60, 61], "11": [4, 17, 28, 30, 41, 42, 43, 44, 45, 46, 47, 49, 52, 53, 54, 55, 56, 57, 60, 61, 63], "emb": [4, 43, 45, 55], "shape": [4, 41, 43, 44, 49, 51, 54, 55, 61], "2": [4, 9, 16, 17, 18, 20, 21, 24, 27, 28, 29, 30, 32, 34, 36, 37, 38, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 63], "0": [4, 9, 10, 11, 12, 13, 14, 15, 24, 25, 26, 29, 32, 33, 34, 37, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 55, 56, 57, 58, 59, 60, 61], "4": [4, 9, 24, 29, 32, 35, 38, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 61], "02954102": 4, "1": [4, 9, 14, 18, 24, 25, 27, 29, 32, 33, 34, 36, 37, 38, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 63], "0390625": 4, "14550781": 4, "40820312": 4, "00224304": 4, "265625": 4, 
"05883789": 4, "7890625": 4, "get_experiment_metadata": 5, "If": [6, 7, 9, 11, 12, 13, 14, 17, 18, 21, 27, 28, 29, 30, 31, 34, 35, 36, 38, 40, 41, 44, 48, 54, 55, 56, 61, 63], "more": [6, 9, 13, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 35, 36, 40, 41, 42, 43, 45, 46, 47, 48, 49, 51, 54, 55, 56, 57, 59, 61, 62], "match": [6, 12, 45, 47, 48, 53, 55, 56, 57, 59], "queri": [6, 7, 8, 9, 12, 13, 14, 15, 19, 24, 29, 31, 35, 38, 39, 40, 41, 42, 44, 47, 48, 51, 53, 54, 58, 59, 60, 61], "most": [6, 17, 24, 28, 29, 35, 41, 42, 43, 44, 45, 48, 54, 59, 61, 64], "recent": [6, 17, 23, 29], "one": [6, 7, 12, 15, 19, 21, 28, 29, 33, 34, 35, 36, 37, 38, 41, 42, 43, 45, 47, 53, 54, 55, 56, 57, 61], "either": [6, 17, 20, 27, 28, 35, 61], "var_embed": [6, 15, 55], "class": [7, 8, 9, 10, 19, 32, 33, 39, 49, 51, 52, 55], "experi": [7, 8, 9, 12, 15, 18, 25, 33, 35, 38, 43, 48, 49, 52, 53, 55, 58, 59, 61], "layer_nam": 7, "raw": [7, 9, 12, 13, 14, 15, 24, 25, 32, 33, 41, 43, 44, 49, 51, 54, 55, 60, 61], "block_siz": 7, "int": [7, 8, 9, 10, 11, 12, 13, 14, 15, 35, 43, 47, 51], "kwarg": [7, 8], "abstract": 7, "base": [7, 9, 12, 17, 23, 24, 25, 31, 33, 35, 36, 40, 41, 43, 44, 45, 47, 49, 54, 55, 56, 57, 59, 64], "method": [7, 8, 9, 10, 11, 12, 13, 18, 22, 25, 26, 27, 28, 29, 41, 43, 46, 48, 49, 51, 53, 55, 57, 59, 61, 64], "process": [7, 8, 9, 11, 24, 25, 28, 41, 45, 51, 54], "experimentaxisqueri": [7, 8, 13, 14, 59, 60], "result": [7, 8, 9, 12, 13, 14, 17, 24, 32, 42, 43, 47, 48, 49, 51, 55, 57, 59, 60, 61], "hug": [7, 8], "face": [7, 8, 38], "item": [7, 8, 33, 41, 48, 53, 61], "repres": [7, 14, 23, 29, 33, 35, 44, 55, 60], "subclass": [7, 43], "implement": [7, 22, 28, 31, 35, 40, 51, 59, 61, 64], "cell_item": 7, "row": [7, 9, 12, 13, 14, 19, 25, 32, 35, 41, 43, 44, 49, 51, 52, 53, 55, 56, 57, 58, 59, 60, 61], "x": [7, 9, 12, 13, 14, 15, 25, 32, 33, 37, 38, 41, 42, 43, 44, 45, 46, 47, 54, 55, 60, 61], "layer": [7, 9, 12, 13, 14, 15, 22, 31, 35, 37, 40, 42, 52, 56, 60], "mai": [7, 9, 12, 15, 
17, 23, 26, 28, 29, 30, 31, 35, 36, 37, 38, 40, 41, 42, 43, 51, 52, 53, 54, 55, 61], "also": [7, 9, 17, 24, 26, 27, 28, 30, 43, 45, 47, 48, 52, 53, 54, 55, 56, 57, 59, 61, 63], "overrid": [7, 18, 21], "__init__": [7, 8, 9, 10, 51, 61], "__enter__": 7, "perform": [7, 9, 17, 24, 25, 29, 30, 31, 32, 35, 40, 41, 42, 44, 46, 51, 54, 55, 57, 60, 61], "preprocess": [7, 44], "inherit": 7, "so": [7, 9, 28, 41, 42, 43, 44, 45, 46, 47, 48, 51, 52, 61], "typic": [7, 43, 61], "usag": [7, 8, 9, 24, 27, 28, 32, 37, 42, 54, 61], "would": [7, 9, 42, 54, 61], "import": [7, 8, 24, 25, 26, 27, 29, 32, 37, 38, 41, 42, 44, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 63], "geneformertoken": 7, "open_soma": [7, 8, 12, 15, 18, 23, 24, 25, 26, 27, 29, 32, 34, 38, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 63], "subclassofcelldatasetbuild": 7, "census_data": [7, 8, 23, 24, 25, 26, 27, 32, 41, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 57, 58, 59, 60, 61], "obs_queri": [7, 8, 9, 24, 25, 32, 49, 51, 54, 55, 59, 60, 61], "tilebsoma": 7, "axisqueri": [7, 8, 9, 24, 25, 32, 49, 51, 54, 55, 59, 60, 61], "defin": [7, 8, 12, 13, 28, 33, 34, 35, 36, 38, 41, 48, 51, 56, 57], "some": [7, 8, 9, 11, 24, 26, 35, 41, 42, 43, 44, 45, 46, 47, 54, 56, 63], "subset": [7, 8, 12, 26, 42, 43, 44, 45, 46, 47, 55, 60, 61], "var_queri": [7, 9, 24, 51, 61], "builder": 7, "build": [7, 8, 12, 15, 26, 27, 28, 29, 30, 33, 35, 36, 41, 44, 49, 55, 56], "initi": [7, 32, 35, 47, 49, 54, 55], "measur": [7, 9, 12, 15, 19, 24, 25, 33, 35, 39, 44, 46, 53, 55], "number": [7, 8, 9, 11, 12, 13, 14, 17, 25, 29, 35, 44, 45, 46, 47, 49, 51, 53, 54, 55, 59, 60, 61, 62], "memori": [7, 9, 18, 22, 23, 25, 26, 28, 30, 31, 37, 39, 40, 48, 51, 53, 54, 56, 57, 61, 63, 64], "onc": [7, 12, 13, 17, 23, 29, 41, 48, 51, 61], "unspecifi": 7, "sparsendarrayread": 7, "blockwis": [7, 55], "select": [7, 12, 13, 14, 15, 25, 32, 34, 35, 38, 42, 43, 44, 45, 49, 52, 53, 54, 55, 57, 59], 
"pass": [7, 9, 11, 18, 42, 47, 51, 56, 57, 61], "especi": 7, "attribut": [7, 8, 9, 10, 45, 49, 55, 56, 61], "obs_column_nam": [8, 9, 23, 25, 32, 61], "sequenc": [8, 9, 12, 13, 15, 31, 33, 38, 39, 40, 42, 43, 44, 52, 53, 55], "obs_attribut": 8, "max_input_token": 8, "2048": 8, "token_dictionary_fil": 8, "gene_median_fil": 8, "geneform": [8, 29, 39, 43, 55], "token": 8, "human": [8, 23, 25, 32, 33, 35, 36, 38, 39, 42, 43, 45, 48, 49, 52, 53, 54, 57, 58], "packag": [8, 22, 28, 30, 31, 32, 37, 40, 41, 42, 43, 44, 46, 47, 48, 49, 50, 51, 52, 55, 56, 57, 59, 60, 63], "separ": [8, 35, 43, 54, 56, 59], "co": [8, 28, 31, 40], "ctheodori": 8, "8df5dc1": 8, "latest": [8, 16, 17, 21, 25, 26, 30, 36, 38, 41, 42, 48, 50, 52, 55, 56, 57], "set": [8, 9, 18, 21, 24, 25, 32, 36, 42, 43, 45, 47, 52, 59, 61], "value_filt": [8, 12, 15, 23, 24, 25, 27, 29, 32, 41, 42, 43, 44, 46, 49, 50, 51, 54, 55, 57, 58, 59, 60, 61], "is_primary_data": [8, 12, 24, 31, 33, 35, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61], "true": [8, 9, 12, 14, 17, 24, 30, 35, 36, 38, 41, 42, 43, 44, 46, 47, 48, 51, 54, 55, 56, 57, 58, 59, 60, 61], "tissue_gener": [8, 12, 15, 23, 24, 27, 32, 35, 41, 43, 44, 49, 53, 54, 55, 56, 57, 58, 59, 60, 61], "tongu": [8, 49, 52, 54, 55, 61], "soma_joinid": [8, 9, 12, 14, 15, 19, 24, 26, 29, 32, 35, 41, 42, 43, 44, 46, 49, 50, 51, 52, 53, 55, 56, 57, 58, 59, 60], "cell_type_ontology_term_id": [8, 26, 35, 41, 44, 48, 49, 53, 55, 56, 57, 58, 60], "input_id": [8, 45], "length": [8, 35, 38, 39, 41, 44, 45, 50], "datafram": [8, 9, 12, 13, 14, 15, 19, 23, 25, 26, 32, 33, 35, 41, 43, 44, 48, 49, 51, 52, 53, 55, 56, 57, 59, 60, 61], "column": [8, 9, 12, 13, 14, 15, 32, 33, 35, 41, 43, 44, 45, 48, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61], "propag": [8, 61], "maximum": [8, 9, 12, 13, 61], "input": [8, 25, 51, 57, 61], "pickl": [8, 61], "suppli": 8, "map": [8, 23, 35, 41, 44, 45, 47, 51, 52, 53], "ensembl": [8, 45, 47, 56], "gene": [8, 9, 
12, 13, 22, 23, 28, 31, 32, 33, 37, 38, 39, 40, 43, 45, 47, 49, 54, 55, 61], "id": [8, 35, 41, 42, 43, 45, 46, 47, 49, 51, 55, 56], "onto": 8, "median": 8, "express": [8, 25, 28, 35, 42, 43, 47, 49, 51, 55], "By": [8, 23, 24, 25, 26, 37, 41, 46, 56], "load": [8, 11, 23, 26, 28, 31, 38, 40, 42, 44, 47, 50, 57, 61, 63], "x_name": [9, 12, 15, 25, 49, 55, 61], "batch_siz": [9, 11, 61], "shuffl": [9, 11, 61], "bool": [9, 14, 17, 43], "fals": [9, 14, 17, 18, 24, 26, 27, 35, 36, 41, 42, 43, 44, 45, 47, 54, 56, 57, 58, 59, 60], "seed": [9, 42, 61], "return_sparse_x": 9, "soma_chunk_s": [9, 61], "use_eager_fetch": 9, "torchdata": [9, 11, 61], "datapip": [9, 11, 61], "iter": [9, 11, 23, 25, 27, 32, 37, 51, 54, 61, 64], "iterdatapip": [9, 11, 61], "upon": [9, 12, 21, 29, 41, 48, 59], "along": [9, 14, 23, 25, 36, 45, 60, 61], "var": [9, 12, 13, 14, 15, 19, 24, 32, 33, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 52, 53, 55, 56, 59, 60, 61], "ax": [9, 14, 61], "provid": [9, 21, 27, 28, 29, 31, 32, 33, 34, 35, 37, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 52, 53, 54, 55, 56, 58, 59, 61, 62], "over": [9, 13, 14, 21, 25, 32, 51, 55, 56, 60], "when": [9, 11, 12, 13, 26, 34, 35, 36, 43, 51, 55, 56, 58, 59, 61], "": [9, 13, 17, 23, 27, 28, 35, 38, 39, 41, 42, 43, 44, 45, 46, 47, 49, 51, 52, 54, 55, 57, 60, 61], "built": [9, 25, 31, 35, 40, 62, 64], "function": [9, 12, 13, 25, 28, 29, 41, 51, 55, 56, 58, 59, 61, 62], "batch": [9, 12, 13, 24, 38, 43, 45, 47, 51, 59, 61], "x_batch": [9, 61], "y_batch": [9, 61], "control": [9, 24, 59, 61], "tensor": [9, 61], "have": [9, 17, 23, 25, 29, 30, 31, 35, 37, 40, 41, 42, 43, 46, 47, 48, 49, 51, 52, 55, 59, 61], "rank": [9, 12, 13, 59, 61], "2415": 9, "torch": [9, 11, 61], "encod": [9, 41, 42, 48, 49, 51, 55, 61], "For": [9, 14, 23, 25, 26, 27, 28, 30, 31, 32, 34, 35, 36, 37, 40, 41, 42, 43, 44, 45, 46, 47, 48, 51, 52, 54, 55, 56, 57, 59, 61, 62, 63], "larger": [9, 28, 31, 32, 40, 43, 51, 64], "dataload": [9, 11], "3": [9, 12, 13, 24, 28, 29, 30, 
32, 38, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 63], "2416": 9, "2417": 9, "whether": [9, 56], "spars": [9, 14, 19, 28, 31, 32, 33, 38, 40, 42, 46, 51, 52, 55], "model": [9, 12, 13, 29, 31, 35, 40, 42, 43, 49, 55, 59, 64], "support": [9, 11, 22, 27, 30, 31, 33, 34, 36, 40, 43, 45, 56, 60, 61, 64], "reduc": [9, 18, 25, 44, 49, 54, 55, 61], "determin": [9, 52, 61], "element": [9, 14, 19, 51, 52, 60], "alwai": [9, 17, 26, 33, 35, 54], "panda": [9, 12, 13, 14, 19, 26, 28, 31, 32, 40, 41, 43, 44, 48, 51, 52, 53, 57, 58, 59, 60, 61], "equival": [9, 25, 49, 51, 55], "soma_dim_0": [9, 49, 51, 54, 55], "matrix": [9, 14, 19, 25, 28, 31, 32, 33, 39, 40, 41, 42, 43, 44, 46, 49, 55, 56], "remain": [9, 43], "string": [9, 12, 13, 26, 34, 35, 36, 55, 57, 61], "integ": [9, 12, 15, 33, 36, 44, 46, 51, 61], "need": [9, 26, 30, 32, 35, 38, 41, 42, 45, 47, 52, 54, 57, 63], "decod": [9, 55, 61], "obtain": [9, 24, 38, 41, 42, 44, 45, 47, 54, 57, 61], "call": [9, 12, 13, 27, 29, 32, 34, 41, 42, 44, 46, 48, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61], "its": [9, 18, 23, 25, 27, 29, 31, 33, 35, 40, 42, 45, 47, 48, 52, 54, 57, 61, 64], "inverse_transform": [9, 61], "exp_data_pip": 9, "obs_encod": [9, 61], "obs_attr_nam": 9, "encoded_valu": 9, "construct": [9, 41, 43, 44, 52, 53, 55], "new": [9, 29, 31, 32, 35, 40, 42, 45, 56, 61], "filter": [9, 12, 15, 17, 23, 24, 25, 28, 29, 31, 32, 33, 37, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 55, 56, 57, 58, 61], "axi": [9, 12, 14, 15, 24, 33, 35, 43, 44, 45, 46, 47, 48, 49, 51, 54, 55, 59, 60, 61], "veri": [9, 58], "larg": [9, 23, 41, 44, 51, 54, 55, 56, 57, 58], "featur": [9, 19, 31, 32, 33, 37, 39, 40, 44, 45, 47, 49, 52, 55, 56, 59], "doe": [9, 20, 34, 47, 51, 55, 61], "onli": [9, 13, 14, 17, 21, 24, 25, 29, 32, 33, 34, 35, 36, 41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 54, 55, 57, 59, 61], "being": [9, 61], "singl": [9, 12, 13, 22, 27, 31, 34, 35, 37, 38, 39, 40, 41, 42, 43, 46, 52, 54, 
55, 56, 61, 62, 64], "multipl": [9, 12, 13, 17, 31, 33, 35, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 55, 57, 58], "reason": [9, 17, 36, 43], "two": [9, 25, 27, 33, 35, 38, 41, 42, 49, 55, 56, 57, 59], "step": [9, 24, 28, 44, 45, 61], "global": [9, 43, 44, 61], "contigu": 9, "group": [9, 35, 41, 43, 45, 58], "chunk": [9, 14, 28, 54, 61], "random": [9, 42, 43, 44, 45, 47, 61], "within": [9, 32, 35, 37, 38, 41, 43, 55, 61], "sinc": [9, 11, 23, 28, 29, 37, 42, 44, 54, 56, 61], "retriev": [9, 10, 15, 21, 23, 25, 34, 35, 39, 41, 49, 61], "keep": [9, 25, 45, 58, 63], "fix": [9, 28, 35, 61], "size": [9, 14, 33, 35, 43, 45, 47, 55, 58, 61], "ensur": [9, 25, 28, 32, 38, 41, 42, 44, 46, 48, 50, 51, 52, 53, 54, 56, 57, 58, 59, 60, 61], "non": [9, 17, 24, 28, 33, 35, 38, 41, 43, 44, 45, 51, 54, 55, 57], "occur": [9, 12, 13, 28, 55], "second": [9, 19, 23, 37, 38, 49, 52, 55, 61, 63], "note": [9, 23, 26, 31, 33, 34, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 55, 56, 57, 58, 61], "maintain": [9, 42, 49, 55], "proxim": [9, 44, 58], "even": [9, 26, 55], "after": [9, 28, 37, 38, 44], "suffici": [9, 28, 61, 63], "train": [9, 39, 42, 55], "To": [9, 18, 23, 25, 28, 29, 30, 31, 35, 38, 40, 41, 42, 43, 44, 45, 46, 47, 49, 50, 53, 54, 55, 56, 57, 61, 64], "end": [9, 35, 42, 43, 54], "treat": 9, "hyperparamet": 9, "tune": [9, 29, 49], "nn": [9, 61], "parallel": [9, 51], "distributeddataparallel": 9, "partit": 9, "disjoint": [9, 33], "across": [9, 23, 28, 31, 32, 33, 35, 37, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 57, 58], "worker": [9, 11], "As": [9, 24, 31, 32, 33, 35, 40, 42, 46, 49, 52, 55, 57, 60, 64], "still": [9, 42], "impact": [9, 43], "aspect": 9, "behavior": 9, "util": [9, 11, 24, 28, 38, 43, 45, 47, 48, 51, 54, 55, 63], "better": [9, 35, 39], "granular": [9, 61], "detail": [9, 24, 26, 27, 31, 32, 40, 42, 43, 57, 61], "gib": 9, "ram": [9, 51, 56, 63], "per": [9, 12, 13, 24, 25, 28, 35, 41, 44, 46, 52, 61], "request": 
[9, 27, 28, 31, 40, 45, 47, 48, 51, 58, 59, 61], "assum": [9, 13, 35, 43, 51, 61], "sparsiti": 9, "95": 9, "depend": [9, 23, 28, 30, 42, 45, 47], "next": [9, 23, 25, 27, 32, 61], "immedi": [9, 37, 38], "previous": [9, 43, 44], "made": [9, 43], "via": [9, 10, 27, 28, 29, 30, 31, 32, 34, 35, 40, 41, 42, 44, 45, 46, 47, 48, 53, 57, 61, 63, 64], "allow": [9, 23, 25, 47, 48, 54, 61], "network": 9, "filesystem": 9, "client": [9, 28], "side": 9, "potenti": [9, 31, 40, 43], "improv": 9, "overal": [9, 26, 61], "cost": [9, 28], "doubl": [9, 35], "n_ob": [10, 32, 44, 45, 47, 49, 51, 53, 55, 56, 57], "nnz": [10, 14, 25, 35, 49, 55], "elaps": 10, "n_soma_chunk": 10, "statist": [10, 14, 22, 51, 58], "about": [10, 23, 25, 28, 31, 32, 35, 36, 38, 39, 40, 42, 46, 48, 49, 54, 55, 56, 57], "experimentdatapip": [10, 11], "api": [10, 13, 24, 25, 29, 30, 31, 32, 34, 35, 37, 39, 40, 41, 42, 44, 46, 48, 52, 53, 56, 57, 61, 63, 64], "assess": [10, 43, 44], "throughput": 10, "attr": 10, "num_work": 11, "dataloader_kwarg": 11, "factori": 11, "safe": 11, "instanti": [11, 61], "work": [11, 23, 25, 30, 31, 40, 41, 63], "constructor": [11, 61], "applic": [11, 55], "sampler": [11, 61], "batch_sampl": [11, 61], "collate_fn": [11, 61], "ha": [11, 12, 13, 23, 25, 31, 33, 35, 37, 38, 40, 41, 42, 45, 46, 49, 52, 54, 55, 63, 64], "been": [11, 23, 25, 29, 37, 55, 63], "chain": [11, 61], "main": [11, 28, 30, 33, 38, 43, 49, 54, 55], "addit": [11, 15, 30, 31, 35, 38, 40, 41, 45, 47, 53, 56, 59, 60], "keyword": [11, 37], "argument": [11, 12, 13, 18, 21, 24, 25, 56, 57, 59, 60], "except": [11, 41, 43, 46, 57], "param": [11, 21], "collect": [12, 15, 19, 21, 27, 29, 33, 36, 41, 44, 45, 46, 47, 50, 52, 56], "obs_value_filt": [12, 15, 23, 24, 25, 29, 32, 42, 43, 45, 46, 47, 49, 50, 53, 54, 55, 57, 59, 60], "obs_coord": [12, 15, 43, 44], "byte": [12, 15], "float": [12, 13, 15, 25, 54, 61], "datetime64": [12, 15], "timestamptyp": [12, 15], "chunkedarrai": [12, 15], "var_value_filt": [12, 15, 23, 25, 32, 50, 54, 
57], "var_coord": [12, 15, 44], "n_top_gen": [12, 13, 24, 42, 44, 46, 59], "flavor": [12, 13, 42, 44], "liter": [12, 13], "seurat_v3": [12, 13, 42, 44, 59], "span": [12, 13, 28, 43, 59], "batch_kei": [12, 13, 24, 42, 59], "max_loess_jitt": [12, 13], "1e": [12, 13, 61], "06": [12, 13, 36], "batch_key_func": [12, 13], "callabl": [12, 13], "convienc": 12, "wrapper": [12, 15, 27, 41, 59], "around": [12, 15, 32, 59], "highly_variable_gen": [12, 24, 42, 44, 45, 46], "execut": [12, 15, 27, 54], "annot": [12, 13, 28, 33, 35, 41, 42, 44, 45, 47, 59], "variabl": [12, 13, 25, 26, 28, 31, 32, 33, 35, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 57, 58, 61], "usual": [12, 15, 19, 24, 28, 61], "homo": [12, 15, 19, 23, 24, 25, 29, 32, 33, 35, 41, 44, 45, 47, 52, 54, 57, 58], "sapien": [12, 15, 19, 23, 24, 25, 29, 32, 33, 35, 41, 44, 45, 47, 52, 54, 57, 58], "mu": [12, 15, 19, 29, 35, 41, 42, 46, 53, 58], "musculu": [12, 15, 19, 29, 35, 41, 42, 46, 53, 58], "syntax": [12, 15], "coordin": [12, 15, 43, 51], "fraction": [12, 13, 24, 59], "estim": [12, 13, 59], "loess": [12, 13, 59], "varianc": [12, 13, 14, 25, 35, 39, 59], "fit": [12, 13, 42, 47, 48, 51, 59], "combin": [12, 13, 23, 28, 35, 41, 42, 43, 44, 47, 48, 51, 52, 54, 57], "kei": [12, 13, 35, 36, 41, 42, 43, 44, 49, 51, 55, 57], "convert": [12, 13, 23, 32, 51], "concaten": [12, 13, 32, 42, 54, 55, 60], "them": [12, 13, 23, 27, 28, 42, 45, 49, 54, 55, 57], "max_lowess_jitt": [12, 13, 59], "jitter": [12, 13, 44, 59], "add": [12, 13, 15, 25, 30, 35, 45, 46, 49, 51, 55], "case": [12, 13, 33, 35, 41, 42, 43, 46, 51, 54, 55, 59, 60, 61], "failur": [12, 13], "low": [12, 13, 28, 31, 40], "entri": [12, 13, 34], "count": [12, 13, 24, 25, 28, 31, 32, 33, 39, 40, 42, 44, 45, 46, 48, 53, 54, 57], "receiv": [12, 13, 44], "seri": [12, 13, 26, 35, 44, 51], "paramat": [12, 42], "hvg": [12, 13, 24, 59], "lung": [12, 15, 28, 35, 38, 41, 42, 45, 48, 52, 53, 54, 56, 57], "500": [12, 25, 28, 44, 46, 59], "anndata": [12, 
15, 25, 27, 28, 31, 35, 40, 42, 43, 44, 45, 46, 47, 51, 53, 57, 64], "top": [12, 21, 24, 35, 36, 44, 48, 53, 58, 59], "mus_musculu": [12, 35, 46, 48, 51, 53, 54, 55, 56, 57, 59, 60], "highli": [12, 13, 28, 39, 42, 43, 44, 45, 46, 47, 61, 63, 64], "just": [12, 24, 28, 41, 44, 51, 54, 56], "hvg_soma_id": 12, "highly_vari": [12, 24, 44, 45, 46, 59], "adata": [12, 25, 32, 35, 42, 43, 44, 45, 47, 49, 50, 53, 54, 55, 56, 57], "get_anndata": [12, 25, 32, 42, 43, 44, 45, 46, 47, 50, 53, 54, 57, 59], "scanpi": [13, 24, 28, 32, 38, 42, 43, 44, 45, 46, 47, 49, 53, 55, 56, 59, 62], "mimic": 13, "seurat": [13, 24, 25, 28, 30, 31, 37, 40, 64], "v3": [13, 24, 30, 32, 41, 44, 57], "readthedoc": [13, 42, 44, 46], "en": [13, 42, 44, 46], "inform": [13, 25, 27, 28, 31, 33, 34, 37, 38, 40, 41, 42, 43, 44, 45, 47, 53, 54, 55, 56, 57, 59, 62], "ident": [13, 41], "those": [13, 24, 35, 42, 44, 46, 51], "produc": 13, "donor_id": [13, 35, 38, 41, 44, 49, 53, 55, 56, 57, 60], "lambda": [13, 47], "batch0": 13, "99": 13, "els": [13, 43, 52, 61], "batch1": 13, "calculate_mean": [14, 24, 60], "calculate_vari": [14, 24, 60], "ddof": [14, 60], "nnz_onli": 14, "calcul": [14, 22, 35, 39, 42, 43, 45], "mean": [14, 24, 35, 38, 39, 59, 63], "accumul": [14, 24, 51], "fashion": [14, 23, 24, 37], "total": [14, 24, 28, 29, 33, 35, 41, 44, 46], "n": [14, 25, 28, 32, 33, 35, 41, 44, 46, 49, 50, 51, 55, 60], "dimens": [14, 19, 33, 49, 52, 55, 61], "wise": [14, 44], "metric": [14, 38, 43, 47], "explicitli": [14, 25, 35, 55], "store": [14, 19, 25, 33, 35, 36, 38, 41, 43, 45, 48, 49, 52, 55, 56], "comput": [14, 23, 24, 28, 31, 40, 41, 60, 61], "otherwis": [14, 35, 36, 54], "skip": 14, "delta": [14, 51, 60], "degre": [14, 43, 60], "freedom": [14, 60], "divisor": [14, 60], "x_layer": [15, 25], "obsm_lay": [15, 43, 45, 47], "obsp_lay": 15, "varm_lay": 15, "varp_lay": 15, "column_nam": [15, 23, 25, 27, 32, 41, 43, 44, 48, 49, 50, 51, 53, 54, 55, 56, 57, 58], "axiscolumnnam": 15, "conveni": [15, 27, 41, 48, 51, 52, 
53, 57, 59, 64], "obsm": [15, 29, 33, 42, 43, 45, 47], "slot": [15, 29, 43], "obsp": [15, 43], "varm": [15, 33], "varp": [15, 35, 52], "part": [15, 38, 42, 43], "get_all_available_embed": [15, 55], "experiment": [15, 18, 24, 30, 35, 39, 43, 49, 55, 60, 61], "brain": [15, 25, 32, 41, 51], "tissu": [15, 23, 25, 27, 29, 32, 35, 38, 39, 45, 46, 48, 49, 50, 51, 53, 54, 55, 57, 60], "censusversiondescript": [16, 17], "descript": [16, 17, 28, 31, 33, 35, 37, 40, 55, 57, 62], "directori": [16, 17, 30, 34, 36, 63], "unknown": [16, 44, 56, 57], "get_census_version_directori": 16, "entir": [16, 44, 48, 52, 61], "release_d": [16, 17, 36], "release_build": [16, 17, 36], "2022": [16, 17, 20, 21, 52, 53], "01": [16, 17, 20, 26, 35, 42, 46, 49, 50, 55], "public": [16, 17, 20, 27, 29, 34, 35, 36, 43, 45, 47, 49, 50, 53, 55, 56], "s3_region": [16, 17, 20, 34, 36, 53], "u": [16, 17, 18, 20, 21, 27, 28, 30, 31, 34, 36, 40, 44, 51, 53, 55, 63], "west": [16, 17, 20, 21, 27, 28, 30, 34, 36, 53, 55, 63], "lt": [17, 25, 27, 36, 42, 52], "retract": [17, 36], "flag": [17, 36, 61], "both": [17, 23, 25, 28, 35, 37, 38, 42, 43, 50, 54, 55, 57, 59, 61], "long": [17, 23, 27, 31, 36, 37, 38, 40, 61], "term": [17, 27, 31, 35, 36, 40, 41, 48, 51, 56, 61], "weekli": [17, 27, 31, 36, 40], "exclud": [17, 35, 44, 54, 61], "date": [17, 27, 29, 33, 35, 36, 41, 55], "yyyi": [17, 29, 36, 37], "mm": [17, 29, 36], "dd": [17, 29, 36, 37], "alias": 17, "alia": [17, 36], "appear": [17, 35, 36, 41, 43, 61], "under": [17, 34, 35, 36, 38, 44, 46], "again": [17, 56], "v": [17, 36, 42, 51], "sequenti": 17, "increment": [17, 24, 39], "get_census_version_descript": 17, "29": [17, 44, 45, 61], "v2": [17, 41, 42, 44, 56, 60], "v1": [17, 22, 25, 33, 34, 36, 41, 42, 44], "30": [17, 29, 42, 44, 45, 55, 61], "mistak": 17, "info_url": 17, "com": [17, 25, 28, 31, 34, 37, 38, 40, 45, 47, 50, 55, 56, 63], "errata": 17, "replaced_bi": [17, 36], "tiledb_config": [18, 21, 27, 55], "sensibl": 18, "somacor": 18, "somaobject": 18, 
"replac": [18, 36, 43, 45, 47], "tiledb": [18, 21, 23, 28, 29, 30, 31, 32, 38, 40, 41, 48, 57, 64], "configur": [18, 21, 27, 28, 61], "amount": [18, 56, 58], "oper": [18, 26, 28, 32, 41, 48, 51, 57, 61], "ctx": [18, 27, 55], "py": [18, 21, 42, 44, 46, 56], "init_buffer_byt": [18, 21], "128": [18, 21, 29, 44, 59, 61], "1024": [18, 21], "c": [18, 23, 25, 30, 32, 42, 44, 45, 46, 47, 52, 53, 63], "my": [18, 27], "privat": [18, 27], "access": [18, 20, 28, 29, 31, 32, 33, 34, 35, 37, 38, 39, 40, 41, 43, 46, 48, 57, 58, 61, 64], "differ": [18, 28, 35, 41, 42, 43, 49, 52, 54, 55, 57], "region": [18, 20, 27, 28, 30, 34, 36, 55, 63], "vf": [18, 27, 55], "no_sign_request": [18, 27, 55], "east": [18, 27], "csr_matrix": [19, 38, 42, 46], "presenc": [19, 33, 37, 38, 39, 43, 44, 46], "scipi": [19, 28, 31, 38, 40, 42, 43, 46, 52, 55], "csr_arrai": 19, "deafult": 19, "cannot": [19, 21], "321x60554": 19, "uint8": [19, 52], "6441269": 19, "compress": [19, 52], "format": [19, 27, 35, 36, 37, 51, 52, 62], "censusloc": 20, "guarante": [20, 31, 35, 40, 41, 42], "interest": [20, 31, 33, 40, 41, 43, 52, 54, 56], "_release_directori": 20, "keyerror": 20, "do": [20, 25, 30, 32, 35, 39, 41, 42, 43, 44, 45, 46, 47, 48, 53, 55, 57, 58, 60, 63], "cb5efdb0": 20, "f91c": 20, "4cbd": 20, "9ad4": 20, "9d4fa41c572d": 20, "mirror": 21, "suitabl": [21, 55], "chosen": [21, 34], "automat": [21, 28, 38, 41, 48], "take": [21, 24, 41, 42, 44, 45, 46, 49, 54, 55, 56, 57, 61, 63, 64], "preced": 21, "get_default_soma_context": [21, 27], "level": [21, 33, 35, 36, 37, 38, 41, 45, 51, 53, 54, 56, 58, 59], "It": [21, 28, 29, 33, 35, 37, 38, 41, 55, 59], "manag": [21, 25, 32, 41, 48, 58, 59], "close": [21, 23, 24, 25, 32, 41, 42, 43, 44, 46, 48, 49, 50, 53, 55, 57, 58], "exit": 21, "neither": 21, "invalid": [21, 51], "updat": [21, 24, 28, 35, 37, 42, 44, 46, 51, 55, 56], "31": [21, 44, 45, 61], "rather": [21, 44, 51], "than": [21, 23, 25, 28, 30, 31, 32, 35, 37, 40, 41, 43, 44, 51, 64], "out": [22, 25, 28, 29, 31, 
33, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 55, 56, 57, 58, 61, 63, 64], "effici": [22, 25, 28, 31, 38, 39, 40, 54, 56, 64], "commonli": [22, 56], "introduc": [22, 43, 56], "normal": [22, 27, 29, 31, 32, 33, 37, 38, 39, 40, 41, 43, 47, 55, 57, 59, 60], "pre": [22, 24, 28, 36, 39, 41, 45, 54, 55], "categor": [22, 31, 40, 56], "publish": [23, 24, 25, 26, 28, 29, 31, 35, 37, 40], "august": [23, 37], "7th": 23, "pablo": [23, 24, 25, 26, 37], "garcia": [23, 24, 25, 26, 37], "nieto": [23, 24, 25, 26, 37], "team": [23, 24, 25, 28, 37], "pleas": [23, 25, 27, 28, 31, 37, 40, 42, 43, 44, 45, 46, 47, 54, 56, 64], "announc": [23, 24, 25, 37], "come": [23, 32, 37, 42, 43, 44], "our": [23, 25, 28, 32, 37, 41, 42, 43, 45, 47, 49, 55], "back": [23, 37, 42, 45, 61], "now": [23, 24, 25, 26, 31, 32, 37, 38, 40, 41, 42, 44, 45, 46, 49, 50, 52, 53, 54, 55, 57, 60, 61, 63], "biologist": 23, "largest": [23, 28, 37], "standard": [23, 28, 31, 33, 40, 48, 51], "aggreg": [23, 37], "compos": [23, 33, 37], "60k": [23, 28, 37], "With": [23, 24, 25, 37, 41, 43, 46, 49, 55, 57, 61], "few": [23, 24, 39, 43, 45, 46, 54, 55, 56, 63], "hundr": [23, 37], "bigger": [23, 37], "quickli": [23, 29, 41, 42], "basic": [23, 42, 43, 44, 45, 46, 48, 49, 53, 55, 61], "structur": [23, 31, 35, 36, 40, 41, 43], "downstream": [23, 24, 25, 26, 32, 35, 55], "analysi": [23, 25, 32, 35, 37, 39, 41, 42, 43, 44, 46, 48, 54, 55], "instruct": [23, 28, 32], "learn": [23, 35, 38, 42, 43, 46, 48, 54, 55, 57], "sure": [23, 46], "resourc": [23, 34, 44], "quick": [23, 27, 28, 31, 39, 40, 41, 58, 61], "start": [23, 26, 27, 28, 29, 31, 39, 40, 41, 42, 44], "guid": [23, 27, 38, 42], "refer": [23, 25, 27, 28, 31, 32, 35, 37, 40, 42, 43, 45, 47, 57], "tutori": [23, 24, 28, 29, 31, 32, 40, 43, 44, 45, 46, 47, 49, 51, 53, 54, 55, 57, 58, 59, 60, 61], "reli": 23, "capabl": [23, 37, 39, 43, 52, 64], "shown": [23, 26, 35, 36, 41, 43, 49, 61], "section": [23, 27, 35, 41, 44, 45, 49, 54, 55], "czi": [23, 28, 31, 
40, 62], "develop": [23, 29, 30, 37, 42, 44, 56], "upgrad": [23, 28, 56], "beta": [23, 41, 44, 45], "here": [23, 24, 28, 31, 32, 33, 35, 36, 38, 40, 42, 43, 54, 55, 56, 61], "ever": 23, "grow": 23, "cz": [23, 28, 29, 33, 39, 44, 46, 50, 53, 54], "discov": [23, 28, 29, 33, 38, 41, 44, 45, 49, 50, 53, 54, 55, 62], "accompani": 23, "ontologi": [23, 35, 45, 56], "cl": [23, 35, 41, 44, 45, 48, 57, 58, 60], "uberon": [23, 35, 41, 44, 48, 56, 57, 58, 60], "respect": [23, 25, 30, 35, 41, 43, 56, 57], "you": [23, 25, 28, 29, 30, 31, 32, 33, 38, 39, 40, 41, 42, 44, 45, 46, 47, 48, 49, 51, 53, 54, 55, 56, 57, 58, 61, 63], "find": [23, 25, 29, 31, 33, 38, 40, 41, 43, 45, 46, 47, 48, 49, 52, 55, 59], "schema": [23, 25, 26, 27, 28, 29, 36, 38, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58], "page": [23, 27, 28, 29, 32, 33, 42, 43, 45, 47, 49, 55, 64], "research": [23, 25, 28, 31, 40], "directli": [23, 27, 28, 29, 39, 41, 43, 44, 48, 53, 57, 61, 62], "session": [23, 27, 30], "librari": [23, 26, 28, 29, 30, 32, 33, 35, 41, 44, 61], "your": [23, 25, 28, 30, 31, 39, 40, 48, 53, 54, 55, 58], "navig": 23, "300k": [23, 32], "microgli": [23, 27, 32], "neuron": [23, 25, 27, 32, 41, 45, 52, 58], "femal": [23, 27, 32, 44, 54, 56, 57, 60], "donor": [23, 35, 44, 52, 53, 56], "somadatafram": [23, 32, 41, 48, 57], "cell_metadata": [23, 27, 32, 50], "arrow": [23, 25, 26, 28, 31, 32, 37, 40], "tabl": [23, 25, 32, 33, 39, 42, 43, 44, 46, 50, 51, 52, 54], "sex": [23, 25, 27, 29, 32, 35, 41, 49, 51, 53, 54, 55, 56, 57, 60], "cell_typ": [23, 24, 25, 26, 27, 32, 35, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 53, 54, 55, 56, 57, 58, 60, 61], "assai": [23, 26, 27, 29, 32, 42, 43, 46, 49, 53, 55, 56, 57, 58, 60], "suspension_typ": [23, 27, 32, 35, 38, 41, 44, 49, 53, 55, 56, 57, 60], "diseas": [23, 27, 29, 32, 35, 42, 43, 49, 53, 54, 55, 56, 57, 60], "concat": [23, 24, 32, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 57, 58, 59, 60], "tibbl": [23, 32], "frame": [23, 
27, 28, 31, 32, 33, 40, 41, 52], "similarli": [23, 25, 26, 32, 41, 52, 57], "gene_filt": [23, 24, 25, 32], "feature_id": [23, 24, 25, 32, 35, 41, 44, 45, 47, 49, 51, 52, 53, 55, 56, 57, 59], "ensg00000107317": [23, 25, 32], "ensg00000106034": [23, 25, 32], "cell_filt": [23, 24, 25, 32], "leptomening": 23, "cell_column": [23, 25, 32], "seurat_obj": [23, 25, 32], "get_seurat": [23, 25, 32], "sce_obj": [23, 25, 32], "get_single_cell_experi": [23, 25, 32], "sometim": 23, "too": 23, "overview": [23, 33, 58], "septemb": 24, "18": [24, 41, 42, 44, 45, 47, 55, 60], "thrill": 24, "offici": [24, 35], "wide": [24, 27, 31, 40, 43, 52], "algorithm": [24, 43, 59, 60], "line": [24, 41, 45, 47, 61], "code": [24, 25, 38, 51, 56, 58, 61], "task": [24, 28, 43], "ten": 24, "convent": [24, 36, 41], "laptop": 24, "8gb": 24, "below": [24, 25, 26, 32, 35, 36, 37, 41, 44, 45, 49, 52, 58, 61], "full": [24, 27, 31, 33, 36, 37, 38, 39, 40, 42, 43, 57, 58, 61], "correct": [24, 29, 35, 61], "These": [24, 25, 28, 31, 34, 35, 38, 40, 41, 43, 44, 45, 47, 55], "interwoven": 24, "wai": [24, 41, 48, 49, 52, 54, 55, 57], "seamlessli": 24, "appli": [24, 43, 46, 47], "33m": [24, 28], "continu": [24, 32], "cellxgene_censu": [24, 25, 26, 27, 29, 32, 38, 41, 42, 43, 44, 45, 46, 47, 48, 50, 51, 52, 53, 54, 56, 57, 58, 59, 60, 61, 62, 63], "pp": [24, 42, 43, 44, 45, 46, 47, 49, 55, 59, 60], "mean_vari": [24, 60], "small": [24, 25, 35, 37, 41, 43, 44, 46, 48, 51, 56, 57], "advantag": [24, 49, 55], "cpu": [24, 42, 45, 61], "multiprocess": 24, "speed": [24, 28], "popul": 24, "zero": [24, 25, 33, 35, 43, 51, 55, 59], "futur": [24, 29, 32, 34, 41, 42, 44, 45, 46, 48, 51, 52, 53, 54, 56, 57, 58, 59, 60, 61], "we": [24, 25, 28, 31, 32, 34, 38, 40, 41, 42, 43, 44, 45, 46, 47, 49, 50, 51, 52, 54, 55, 56, 57, 60, 61], "enabl": [24, 28, 29, 35, 56], "easili": [24, 25, 28, 46, 49], "switch": [24, 56], "human_data": 24, "feature_nam": [24, 32, 35, 41, 43, 44, 49, 50, 51, 52, 53, 54, 55, 56, 57, 59], "axis_queri": [24, 
25, 32, 49, 51, 54, 55, 59, 60], "mean_variance_df": 24, "gene_df": 24, "to_panda": [24, 32, 41, 42, 43, 44, 46, 48, 49, 50, 51, 52, 53, 54, 55, 57, 58, 59, 60], "8624": 24, "071926": 24, "5741": 24, "242485": 24, "16437": 24, "8": [24, 30, 32, 41, 42, 43, 44, 45, 46, 47, 49, 52, 54, 55, 56, 57, 59, 60, 61, 63], "233282": 24, "452": 24, "119153": 24, "feature_length": [24, 32, 35, 41, 42, 44, 46, 49, 51, 52, 53, 55, 56, 57, 59], "ensg00000171885": 24, "5943": 24, "ensg00000133703": 24, "6845": 24, "get_highly_variable_gen": 24, "while": [24, 32, 41, 43, 45, 49, 55, 59], "account": [24, 42, 61], "effect": [24, 25, 42, 43, 55], "integr": [24, 28, 31, 38, 40, 43, 44], "particular": [24, 26, 43, 61], "design": [24, 56], "paradigm": [24, 31, 40], "abov": [24, 28, 32, 33, 35, 41, 45, 54, 56, 57, 58], "tweak": 24, "compli": 24, "rule": 24, "thumb": 24, "good": [24, 43, 46, 55], "variances_norm": [24, 59], "003692": 24, "004627": 24, "748221": 24, "003084": 24, "003203": 24, "898657": 24, "014962": 24, "037395": 24, "513473": 24, "218865": 24, "547648": 24, "786928": 24, "002142": 24, "002242": 24, "894955": 24, "60659": [24, 44, 52], "000000": [24, 43, 51, 59], "60660": [24, 44, 52], "60661": [24, 44, 52], "60662": [24, 44, 52], "60663": [24, 44, 52], "octob": 25, "maximilian": 25, "lombardo": 25, "happi": 25, "introduct": 25, "tailor": 25, "empow": 25, "reflect": [25, 35, 43], "expand": [25, 35, 43, 51], "exclus": 25, "thei": [25, 35, 36, 42, 43, 49, 51, 52, 54, 55], "invit": 25, "feedback": 25, "explor": [25, 28, 31, 38, 39, 40, 55], "novel": [25, 44], "were": [25, 28, 33, 35, 41, 42, 43, 44, 46, 52, 54, 55], "mous": [25, 33, 35, 38, 41, 46, 51, 53, 54, 57, 59, 60], "divid": [25, 51, 54], "sum": [25, 26, 35, 43, 44, 45, 47, 48, 51, 53, 61], "point": [25, 33, 36, 43, 51], "precis": [25, 49, 55], "round": 25, "sigma": 25, "artifact": [25, 34, 35, 43], "m": [25, 30, 33, 41, 44, 45, 46, 47, 52, 57, 59, 63], "enrich": 25, "field": [25, 34, 35, 55, 64], "n_measured_ob": [25, 
35, 49, 55], "wa": [25, 35, 43, 46, 47, 52, 53, 55, 56, 61], "augment": 25, "forego": 25, "common": [25, 32, 43, 48, 55, 57, 59, 61], "earli": 25, "raw_sum": [25, 35, 49, 51, 55], "deriv": [25, 45, 46, 55], "raw_mean_nnz": [25, 35, 49, 55], "averag": 25, "raw_variance_nnz": [25, 35, 49, 55], "n_measured_var": [25, 35, 49, 55], "thu": [25, 28, 31, 35, 38, 40, 42, 45, 48, 57], "ensg00000161798": [25, 32, 57], "ensg00000188229": [25, 32, 57], "sympathet": [25, 32], "singlecellexperi": [25, 30, 31, 37, 40], "outlin": 25, "like": [25, 26, 28, 34, 37, 41, 43, 44, 45, 48, 55, 61], "male": [25, 32, 44, 45, 51, 56, 57, 58, 60], "pyarrow": [25, 28, 31, 32, 40, 51, 54], "raw_slic": [25, 32], "somaaxisqueri": [25, 32], "read_next": [25, 32], "print": [25, 32, 43, 48, 50, 52, 53, 54, 55, 56, 61, 63], "encourag": [25, 31, 40], "engag": 25, "share": [25, 28, 31, 40], "invalu": 25, "ongo": 25, "project": [25, 30, 39, 43], "reach": [25, 31, 40, 42], "report": [25, 29, 43, 56], "issu": [25, 28, 29, 43], "repositori": [25, 28, 31, 35, 40, 55, 63], "april": [26, 36], "4th": 26, "2024": [26, 28, 31, 35, 40, 50], "emanuel": 26, "bezzi": 26, "04": [26, 35, 46, 49, 55], "instead": [26, 42, 43, 46, 56, 61], "observ": [26, 33, 35, 42, 51, 54, 56, 58], "smaller": [26, 32, 61], "footprint": 26, "howev": [26, 28, 42, 43, 44, 61], "pipelin": [26, 31, 39, 40], "explain": 26, "adapt": [26, 51, 55], "link": [26, 37, 44, 52, 53], "value_count": [26, 41, 42, 44, 46, 48, 51, 54, 57], "categori": [26, 29, 35, 41, 44, 45, 58], "present": [26, 28, 31, 33, 35, 36, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 55, 57, 58], "groupbi": [26, 44, 47, 56, 58], "pivot": 26, "show": [26, 35, 38, 39, 41, 43, 45, 46, 47, 51, 54, 61], "unus": 26, "factor": [26, 43], "interfac": [26, 45, 47, 49, 55, 56, 61], "inspect": [26, 38, 49, 55, 61], "null": [26, 36], "indic": [26, 28, 33, 35, 38, 41, 43, 44, 46, 51, 52, 55, 57], "int16": 26, "int8": 26, "assay_ontology_term_id": [26, 35, 38, 41, 44, 48, 49, 53, 
55, 56, 57, 60], "development_stag": [26, 35, 41, 44, 49, 53, 55, 56, 57, 60], "development_stage_ontology_term_id": [26, 35, 41, 44, 49, 53, 55, 56, 57, 60], "output": [26, 32, 51, 61], "truncat": 26, "amazon": [27, 28], "web": [27, 28], "servic": [27, 28, 34], "what": [27, 34, 35, 38, 41, 42, 43, 44, 54, 55, 57], "inclus": [27, 35, 48], "criteria": [27, 28, 32, 33, 35, 57], "individu": [27, 31, 35, 40, 41, 42, 46, 54], "root": [27, 35, 36, 63], "definit": [27, 42, 57], "publicli": [27, 28, 29, 31, 36, 40, 64], "uniqu": [27, 28, 29, 35, 41, 42, 43, 44, 48, 51, 54], "05": [27, 36, 37, 49, 54, 56, 61], "bulk": 27, "07": [27, 32, 34, 36, 41, 42, 44, 46, 48, 51, 52, 53, 56, 57, 58, 59, 60, 61], "25": [27, 32, 34, 41, 42, 43, 44, 45, 46, 48, 51, 52, 53, 56, 57, 58, 59, 60, 61], "shell": [27, 45, 47, 53], "sync": [27, 45], "sign": [27, 43, 45, 47], "recommend": [27, 28, 30, 32, 35, 36, 42, 43, 45, 47, 54, 56, 63, 64], "folder": [27, 36, 37, 38, 47], "interact": [27, 31, 35, 40], "document": [27, 28, 32, 35, 36, 38, 41, 42, 46, 48, 55, 57, 64], "last": [28, 29, 35, 36], "jan": 28, "latenc": [28, 31, 40], "acceler": [28, 31, 40], "50m": 28, "mice": 28, "harmon": [28, 31, 37, 40], "label": [28, 35, 36, 41, 43, 44, 45, 47, 50, 54, 56, 58, 61], "multi": [28, 33, 39, 44, 55], "core": [28, 39, 42, 51], "k": [28, 43], "onlin": [28, 29, 31, 36, 40, 60, 64], "t": [28, 35, 42, 44, 45, 46, 47, 48, 50, 53, 54, 57, 58], "covid": [28, 41, 44, 54, 57], "19": [28, 29, 41, 42, 44, 45, 47, 48, 52, 54, 55, 57], "suit": 28, "author": [28, 35], "spatial": [28, 33, 35, 42, 43, 44, 52, 53], "yet": [28, 30], "d": [28, 55, 63], "click": [28, 32], "citat": [28, 31, 35, 39, 40], "guidelin": [28, 31, 40], "offer": [28, 31, 40, 43, 49, 55, 64], "becaus": [28, 42, 44, 46, 54], "therefor": [28, 42, 46, 48, 54, 55], "numer": [28, 43], "incompat": [28, 35], "purpos": 28, "suggest": [28, 43], "fast": 28, "corpu": 28, "60": [28, 45, 54], "gencod": 28, "readi": [28, 45, 61], "cloud": [28, 30, 31, 34, 40, 
53, 64], "matric": [28, 31, 32, 33, 40, 41, 43, 51], "possibl": [28, 35, 38, 45, 57], "due": [28, 41, 43, 51, 61], "free": [28, 56], "aw": [28, 30, 34, 45, 47, 53, 63], "ye": 28, "download_source_h5ad": [28, 53], "help": [28, 32, 38, 41, 46, 48, 55, 56, 57, 59, 61], "pattern": [28, 43], "internet": [28, 30, 56], "limit": [28, 41, 54], "bandwidth": [28, 54, 63], "tactic": 28, "connect": [28, 30, 44, 45, 56, 58, 63], "high": [28, 33, 35, 41, 43, 44, 45, 54, 56, 59, 63], "ethernet": 28, "wifi": 28, "coast": 28, "ec2": [28, 30], "instanc": [28, 30, 35, 43, 48, 56, 63], "There": [28, 30, 44, 45, 48, 49, 52, 54, 55, 59], "environ": [28, 30], "census_env": 28, "activ": [28, 30, 32, 55, 63], "submit": [28, 31, 40], "join": [28, 31, 40, 41, 44, 51, 53, 57, 59], "scienc": [28, 31, 40, 50, 52, 62], "commun": [28, 31, 40, 43, 49, 55], "slack": [28, 31, 37, 40], "question": [28, 41], "channel": [28, 31, 37, 40], "inquir": 28, "accept": [28, 35, 59], "meet": [28, 32, 57, 59], "biolog": [28, 39, 54, 55, 61], "try": [28, 61], "old": [28, 44, 60], "persist": [28, 33], "notebook": [28, 30, 37, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 52, 55, 56, 61, 63], "sh": [28, 30], "restart": 28, "runtim": 28, "reload": [28, 45], "numba": [28, 51], "relat": [28, 36], "magic": 28, "similar": [28, 41, 42, 43, 44, 47, 57, 58, 59], "dbutil": 28, "restartpython": 28, "addition": [28, 42, 43], "node": [28, 41], "cluster": [28, 39, 42, 47], "0d53f00001ghvp3cap": 28, "between": [28, 35, 43, 45], "altern": [28, 61], "ad": [28, 35, 37, 43, 56, 57], "tab": 28, "edit": [29, 35, 36], "decemb": 29, "15th": [29, 31, 40], "stabil": 29, "scientif": 29, "reproduc": [29, 42, 56, 58], "plan": [29, 31, 40], "regular": 29, "everi": [29, 31, 40], "six": [29, 31, 40], "month": [29, 31, 36, 37, 40, 60], "least": [29, 31, 35, 40], "5": [29, 30, 32, 35, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 59, 60, 61], "year": [29, 31, 36, 37, 40, 44], "recogn": 29, "previou": [29, 34, 42, 44, 49, 55], 
"ingest": [29, 54], "hand": 29, "week": [29, 57], "651": 29, "62": [29, 44, 45, 47, 54], "998": 29, "417": 29, "684": 29, "805": 29, "36": [29, 41, 45, 61], "227": [29, 59], "903": 29, "230": 29, "588": [29, 44, 52, 53], "990": 29, "20": [29, 41, 42, 44, 45, 47, 50, 52, 55, 60, 63], "631": 29, "248": [29, 41, 48], "stage": [29, 44, 56, 57, 60], "173": [29, 59], "72": [29, 45], "self": [29, 37, 38, 42, 51, 56, 61], "ethnic": [29, 56], "na": [29, 35, 41, 58, 60], "suspens": [29, 42, 56], "74": [29, 45], "53": [29, 45], "27": [29, 41, 42, 44, 45, 52, 61], "fine": [29, 49, 63], "593": [29, 44, 52, 53], "56": [29, 44, 45], "400": 29, "873": 29, "255": 29, "245": [29, 52], "33": [29, 44, 45, 55, 61], "364": 29, "242": 29, "083": 29, "531": [29, 44], "13": [29, 41, 42, 44, 45, 46, 49, 54, 55], "035": 29, "9": [29, 32, 41, 42, 43, 44, 45, 46, 47, 48, 49, 52, 54, 55, 56, 57, 61], "613": [29, 41, 48, 58], "164": 29, "64": [29, 41, 45], "26": [29, 41, 42, 44, 45, 52, 61], "220": [29, 41, 48, 52], "66": [29, 41, 45, 48], "54": [29, 41, 45], "prevent": [29, 55], "analys": [29, 56], "mark": [29, 35, 41, 43, 54], "is_primari": 29, "exactli": [29, 35], "243": [29, 41, 52], "569": 29, "twice": [29, 41], "wish": [29, 41, 59], "consid": [29, 42], "duplicate_cells_census_lts_2023": 29, "csv": [29, 56], "zip": [29, 47, 51], "562": 29, "794": 29, "728": 29, "086": 29, "032": 29, "758": 29, "887": 29, "914": 29, "318": 29, "493": 29, "362": 29, "604": 29, "226": 29, "68": [29, 45], "51": [29, 44, 45], "61": [29, 45], "linux": [30, 63], "maco": [30, 63], "system": [30, 41, 43, 49, 53, 55, 63], "Or": 30, "tbd": 30, "16": [30, 41, 42, 44, 45, 46, 47, 49, 55, 56, 60, 61], "gb": [30, 56], "mbp": [30, 56], "increas": [30, 31, 40, 56], "virtual": [30, 63], "conda": 30, "venv": [30, 42, 44, 46, 63], "bin": [30, 63], "modul": [30, 38, 39, 42, 61], "less": [30, 31, 40, 43, 61], "complex": [30, 41, 43, 48, 51, 52], "databrick": 30, "faq": [30, 31, 40], "ubuntu": [30, 63], "apt": 30, "libxml2": 30, 
"dev": 30, "libssl": 30, "libcurl4": 30, "openssl": 30, "cmake": 30, "21": [30, 42, 44, 45, 46, 47, 52, 54, 57, 60], "greater": [30, 35, 50], "tool": [30, 38, 43, 47, 56, 63], "xcode": 30, "window": [30, 61], "univers": [30, 43, 55], "cran": 30, "org": [30, 50], "abl": [30, 34], "export": [30, 37, 49, 64], "biocmanag": 30, "quietli": 30, "break": [31, 40, 54], "ve": [31, 40], "central": [31, 40, 49, 55], "hub": [31, 40], "analyz": [31, 40], "significantli": [31, 40], "minim": [31, 40, 43], "studi": [31, 40, 42, 43], "scale": [31, 40, 42, 44, 45, 46], "interoper": [31, 40, 56], "toolkit": [31, 39, 40], "smart": [31, 33, 38, 40, 41, 44, 52, 53, 58, 60], "seq2": [31, 33, 38, 40, 41, 44, 46, 52, 53, 58, 60], "molecul": [31, 33, 35, 40], "10x": [31, 32, 33, 38, 40, 41, 43, 44, 47, 52, 53, 54, 56, 57, 60], "duplic": [31, 33, 38, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 55, 57, 58], "five": [31, 40], "perman": [31, 40], "ask": [31, 40], "email": [31, 37, 40, 55], "believ": [31, 40], "secur": [31, 40], "disclos": [31, 40], "contact": [31, 40], "seamless": [31, 40], "pytorch": [31, 39, 40], "usabl": [31, 40, 61], "area": [31, 40], "On": [31, 40], "demand": [31, 32, 40], "rich": [31, 40, 42], "subsampl": [31, 40], "vignett": [32, 47], "soon": 32, "remind": [32, 49, 52, 55], "etc": [32, 33, 38, 41], "consist": [32, 38, 41, 42, 43, 44, 46, 48, 51, 52, 53, 54, 56, 57, 58, 59, 60, 61], "ey": [32, 52], "379219": 32, "microwel": [32, 41, 44, 57], "seq": [32, 41, 42, 44, 57, 58], "adren": [32, 41], "gland": [32, 41, 45, 54, 55, 58], "379220": 32, "379221": 32, "379222": 32, "379223": 32, "379224": 32, "7": [32, 41, 42, 43, 44, 45, 46, 47, 49, 52, 53, 54, 55, 56, 57, 61], "n_var": [32, 44, 46, 49, 51, 52, 53, 55, 56, 57], "demonstr": [32, 38, 39, 41, 42, 43, 47, 49, 50, 51, 53, 55, 56, 59, 61], "lazi": [32, 49, 54, 55], "evalu": 32, "well": [32, 41, 42, 44, 54, 58], "logic": [32, 44], "wrap": [32, 51, 61], "loop": 32, "r6": 32, "familiar": [32, 35, 42, 44, 46, 61, 
64], "379": 32, "224": 32, "chr": 32, "fema": 32, "6": [32, 34, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 52, 53, 54, 55, 56, 57, 58, 61], "\u2139": 32, "214": 32, "4k": 32, "4744": 32, "sampl": [32, 44, 45, 47, 51], "bioconductor": 32, "ecosystem": 32, "dim": 32, "rownam": 32, "rowdata": 32, "colnam": 32, "obs48350835": 32, "obs48351829": 32, "obs52469564": 32, "obs52470190": 32, "coldata": 32, "reduceddimnam": 32, "mainexpnam": 32, "altexpnam": 32, "sparse_matrix": 32, "state": [32, 43, 44, 52, 53], "monitor": 32, "read_complet": 32, "friendli": [33, 35], "varieti": [33, 43, 48, 51, 55], "hierarchi": 33, "somacollect": [33, 41, 48, 63], "whole": [33, 41, 44], "summary_cell_count": [33, 41, 44, 58], "stratifi": [33, 41, 45], "relev": [33, 35, 39, 41, 57], "independ": [33, 41], "somaexperi": [33, 41, 51], "special": [33, 35, 41, 57], "form": [33, 41, 52, 61], "how": [33, 38, 39, 41, 43, 44, 46, 50, 54, 55, 58, 61], "avialbl": 33, "feature_dataset_presence_matrix": [33, 44, 46], "boolean": [33, 35, 44, 46, 52], "adher": 33, "technologi": [33, 35, 38, 41, 42, 44, 46], "short": [33, 37, 41], "densendarrai": 33, "dimension": [33, 35, 43, 44], "offset": 33, "sparsendarrai": [33, 49, 55], "primari": [33, 35, 38, 43, 45, 58], "geograph": 34, "json": [34, 45, 47, 55], "cziscienc": [34, 45, 47, 50, 55, 56], "base_uri": 34, "three": [34, 56, 57], "gc": 34, "rememb": [34, 41, 54], "relative_uri": 34, "hood": 34, "cloudfront": 34, "registri": 34, "resolv": 34, "against": 34, "onward": 34, "togeth": [34, 61], "could": [34, 43, 47, 61], "deprec": [34, 42], "march": 35, "NOT": [35, 36, 51, 52], "shall": [35, 36], "interpret": [35, 36, 43], "bcp": [35, 36], "14": [35, 36, 41, 42, 44, 45, 46, 49, 52, 55], "rfc2119": [35, 36], "rfc8174": [35, 36], "capit": [35, 36], "hereaft": 35, "visit": [35, 43, 62], "understand": [35, 38, 43], "reader": [35, 38], "throughout": [35, 45, 47, 54, 55], "serv": [35, 46], "deposit": [35, 36, 38], "heart": [35, 52, 54, 59], "left": [35, 37, 38, 42, 44], 
"ventricl": [35, 48], "semver": 35, "major": [35, 44], "delet": 35, "modal": 35, "minor": 35, "compat": 35, "patch": 35, "editori": 35, "impos": 35, "organism_ontology_term_id": 35, "ncbitaxon": 35, "10090": 35, "9606": 35, "feature_refer": 35, "speic": 35, "AND": 35, "compris": 35, "children": 35, "efo": [35, 41, 42, 44, 57, 58, 60], "0002772": 35, "0010183": [35, 41], "nascent": 35, "elong": 35, "target": [35, 41], "manner": [35, 49, 55, 61], "doesn": [35, 44], "concurr": 35, "perturb": 35, "intend": [35, 37, 59, 61], "primarili": [35, 42, 43, 44], "fusion": 35, "modif": 35, "mrna": [35, 41], "trna": 35, "rrna": 35, "viral": 35, "intron": 35, "ribosom": 35, "profil": [35, 41, 44], "umi": 35, "tissue_typ": 35, "equal": [35, 38, 48], "referenc": [35, 44], "whose": [35, 44, 57], "readabl": [35, 36, 38, 44], "census_schema_vers": [35, 41, 50], "census_build_d": [35, 41, 50], "iso": [35, 36, 55], "8601": [35, 36], "dataset_schema_vers": [35, 41, 50], "total_cell_count": [35, 41, 44, 50, 58], "unique_cell_count": [35, 41, 44, 50, 58], "number_donors_homo_sapien": [35, 41, 50], "number_donors_mus_musculu": [35, 41, 50], "10000": [35, 43], "100": [35, 41, 42, 44], "collection_id": [35, 42, 46, 52, 53], "quot": 35, "collection_nam": [35, 38, 42, 44, 46, 52, 53], "collection_doi": [35, 42, 46, 52, 53], "dataset_titl": [35, 38, 42, 44, 46, 52, 53], "dataset_h5ad_path": [35, 42, 46, 52, 53], "rel": [35, 46, 60], "storag": [35, 64], "dataset_total_cell_count": [35, 42, 46, 52, 53], "dataset_version_id": 35, "self_reported_ethn": [35, 41, 44, 49, 53, 55, 56, 57], "ontology_term_id": [35, 41, 44, 58], "0002048": [35, 44, 48], "cell_type_a": 35, "xxxxx": 35, "cell_type_n": 35, "assay_a": 35, "assay_n": 35, "tissue_a": 35, "tissue_n": 35, "tissue_general_a": 35, "tissue_general_n": 35, "disease_a": 35, "mondo": [35, 44], "disease_n": 35, "self_reported_ethnicity_a": 35, "hancestro": [35, 57], "self_reported_ethnicity_n": 35, "sex_a": 35, "pato": [35, 44, 57, 60], "sex_n": 35, 
"suspension_type_a": 35, "suspension_type_n": 35, "organism_label": 35, "machin": [35, 36, 45], "somameasur": 35, "somaindexeddatafram": 35, "fill": [35, 55], "remov": [35, 42, 44, 54], "variant": 35, "j": [35, 43, 50, 52, 53], "feature_biotyp": 35, "pin": 35, "clarifi": 35, "feature_1": 35, "feature_m": 35, "dataset_soma_joinid_1": 35, "dataset_soma_joinid_n": 35, "tissue_general_ontology_term_id": [35, 41, 44, 49, 53, 55, 56, 57, 60], "disease_ontology_term_id": [35, 41, 44, 49, 53, 55, 56, 57, 60], "observation_joinid": 35, "self_reported_ethnicity_ontology_term_id": [35, 41, 44, 49, 53, 55, 56, 57, 60], "sex_ontology_term_id": [35, 41, 44, 49, 53, 55, 56, 57, 60], "tissue_ontology_term_id": [35, 41, 44, 48, 49, 53, 55, 56, 57, 60], "handl": [35, 41, 48, 50, 54, 61], "text": [35, 36, 37, 38], "cell_census_build_d": 35, "cell_census_schema_vers": 35, "renam": [35, 44], "move": [35, 61], "dataset_presence_matrix": 35, "ascii": [35, 36], "0x22": 35, "exclam": 36, "intern": 36, "Its": 36, "notic": [36, 56], "printabl": 36, "charact": 36, "record": [36, 48], "parent": [36, 41], "longer": [36, 42], "dai": 36, "info_permalink": 36, "later": [36, 43, 45, 47, 49, 55], "release_alia": 36, "release_nam": 36, "url": [36, 45, 47], "blog": 37, "piec": [37, 41], "deliv": 37, "hous": 37, "blurb": 37, "extern": 37, "goal": [37, 38, 41, 42, 46, 51], "master": 37, "twitter": 37, "One": [37, 43], "stop": [37, 42, 54], "place": [37, 38, 42, 61], "histor": 37, "view": [37, 38, 44, 55, 56, 59], "great": [37, 42, 46], "approach": [37, 43], "apach": 37, "subdirectori": 37, "markdown": [37, 38], "md": [37, 38], "prefix": 37, "yyyymmdd": 37, "discret": [37, 38, 42], "20230810": 37, "r_api_is_out": 37, "highest": [37, 38], "header": [37, 38], "concis": [37, 38], "explanatori": [37, 38], "white_check_mark": [37, 38], "cool": 37, "error": [37, 41, 45, 47, 48], "ital": 37, "keyboard": 37, "john": 37, "smith": 37, "author1": 37, "phil": 37, "scoot": 37, "author2": 37, "introductori": [37, 38], 
"paragraph": [37, 38], "right": [37, 38, 44, 54], "underneath": [37, 38], "summari": [37, 38, 39, 50], "30m": 37, "rest": [37, 38, 44], "render": [37, 38], "sidebar": [37, 38], "absenc": [37, 38], "sub": [37, 38, 56], "writer": [37, 38], "pgarcia": 37, "capabitli": 37, "cellcensu": 38, "symlink": 38, "asset": 38, "onboard": 38, "product": 38, "unless": 38, "direct": [38, 53], "mention": 38, "action": 38, "extract": [38, 51, 61], "exhaust": [38, 42], "proper": [38, 42], "showcas": [38, 41, 42, 51, 54, 55, 57], "clear": [38, 41, 43, 54], "power": 38, "bold": 38, "lower": [38, 44, 59, 61], "qc": 38, "much": [38, 43, 48], "kept": 38, "succinct": 38, "liver": [38, 46, 54], "prior": 38, "blob": [38, 56], "cellxgene_census_schema": 38, "repeat": [38, 54], "let": [38, 41, 42, 43, 44, 45, 46, 47, 49, 52, 53, 54, 55, 56, 57], "sc": [38, 42, 43, 44, 45, 46, 47, 56], "tabula": [38, 42, 44, 46, 52, 53], "muri": [38, 42, 46, 53], "seni": [38, 42, 46, 53], "genom": [38, 56], "stream": [39, 64], "gget": 39, "collabor": [39, 43, 45], "predict": [39, 43], "biologi": [39, 55], "gain": 39, "natur": [39, 44, 45, 54, 56], "summar": [39, 41, 44, 58], "leverag": 39, "cover": 41, "simpl": [41, 43, 47, 51, 56, 61], "sever": [41, 48, 49], "prefer": [41, 48, 53], "34": [41, 42, 44, 45, 46, 47, 48, 51, 52, 53, 54, 56, 57, 58, 59, 60, 61], "39": [41, 42, 43, 44, 45, 46, 47, 48, 49, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61], "think": [41, 47], "variou": [41, 43, 48, 58], "analog": 41, "census_info": [41, 42, 44, 46, 50, 52, 53, 58], "census_obj": 41, "want": [41, 51, 54, 57, 61, 63], "pair": [41, 51], "61656118": [41, 48, 53], "37447773": 41, "13035": 41, "1417": 41, "Of": 41, "meta": [41, 54, 56], "consortia": 41, "idea": 41, "Not": 41, "cast": 41, "census_count": 41, "33364242": [41, 58], "56400873": [41, 53, 58], "0008722": [41, 44, 58], "264166": [41, 58], "279635": [41, 58], "drop": [41, 44, 45, 51, 58], "0008780": [41, 58], "25652": [41, 44, 58], "51304": [41, 58], "indrop": [41, 44, 
58], "0008919": [41, 58], "89477": [41, 58], "206754": [41, 58], "0008931": [41, 58, 60], "78750": [41, 58], "188248": [41, 58], "1357": [41, 58], "0002113": [41, 58], "179684": [41, 58], "208324": [41, 58], "kidnei": [41, 45, 52, 54, 58], "1358": [41, 58], "0002365": [41, 58], "15577": [41, 58], "31154": [41, 58], "exocrin": [41, 45, 55, 58], "1359": [41, 58], "0002367": [41, 58], "37715": [41, 58], "130135": [41, 58], "prostat": [41, 58], "1360": [41, 58], "0002368": [41, 58], "13322": [41, 58], "26644": [41, 58], "endocrin": [41, 45, 58], "1361": [41, 58], "0002371": [41, 58], "90225": [41, 58], "144962": [41, 58], "bone": [41, 45, 53, 54, 58], "marrow": [41, 53, 54, 58], "1362": [41, 58], "omit": 41, "creation": 41, "sort": 41, "census_human_assai": 41, "sort_valu": [41, 45], "ascend": 41, "0009922": [41, 57], "11845077": 41, "25597563": 41, "0009899": [41, 44, 60], "7559102": 41, "12638794": 41, "0011025": 41, "3872375": 41, "6139786": 41, "0010550": 41, "4062980": 41, "5064268": 41, "sci": [41, 44], "0009900": 41, "2930054": 41, "3139770": 41, "17": [41, 42, 44, 45, 46, 47, 49, 55, 56, 60], "0030004": 41, "915037": 41, "1084235": 41, "transcript": [41, 44], "0030003": [41, 44], "744798": 41, "811422": 41, "0030002": [41, 57], "625175": 41, "642559": 41, "0700003": 41, "146278": 41, "177276": 41, "bd": [41, 44], "rhapsodi": [41, 44], "transcriptom": [41, 42, 44, 46, 52, 53, 54], "0009901": 41, "42397": 41, "121394": 41, "58981": [41, 44], "117962": 41, "0700004": 41, "96145": 41, "0008995": 41, "29128": 41, "0008953": 41, "4693": 41, "9386": 41, "strt": 41, "0010010": 41, "3105": 41, "5244": 41, "cel": 41, "69": [41, 45], "0000129": 41, "268114": 41, "370771": 41, "1038": [41, 42, 46, 50, 52, 53], "48998": 41, "62617": 41, "easi": [41, 51, 55], "fall": [41, 42], "certain": [41, 43, 61], "distribut": [41, 42, 50], "answer": 41, "exemplifi": 41, "stat": 41, "trivial": 41, "human_cell_typ": 41, "syncytiotrophoblast": [41, 57], "placent": [41, 57], "villou": [41, 
57], "trophoblast": [41, 44, 45, 52, 53, 57], "extravil": [41, 57], "56400868": [41, 44], "pericyt": [41, 44, 45, 61], "56400869": [41, 44], "56400870": [41, 44], "56400871": [41, 44], "56400872": [41, 44], "focu": [41, 42, 43, 46], "de": 41, "human_cell_type_count": 41, "2673669": 41, "glutamaterg": [41, 45], "1541605": 41, "cd4": [41, 44, 45, 47], "alpha": [41, 44, 45], "1258976": 41, "cd8": [41, 44, 45, 47], "1235987": 41, "classic": [41, 44], "monocyt": [41, 44, 45, 47], "1030996": 41, "microfold": 41, "epithelium": 41, "intestin": [41, 45, 54], "dendrit": [41, 45, 47], "serou": 41, "bronchu": 41, "sperm": [41, 58], "enteroendocrin": 41, "599": 41, "abund": [41, 44], "That": 41, "achiev": [41, 55], "human_liver_cell_typ": 41, "85739": 41, "hepatoblast": 41, "58447": 41, "neoplast": [41, 45], "52431": 41, "erythroblast": 41, "45605": 41, "31388": 41, "pulmonari": [41, 44, 56, 57], "arteri": 41, "endotheli": [41, 44, 45, 52, 54, 61], "germin": 41, "center": 41, "b": [41, 44, 45, 47, 57], "pneumocyt": [41, 44], "innat": 41, "lymphoid": 41, "126": [41, 61], "go": 41, "sake": [41, 44, 51], "t_cells_list": 41, "t_cells_diseas": 41, "f": [41, 42, 43, 44, 45, 46, 47, 48, 49, 52, 53, 54, 55, 60, 61], "hodgkin": 41, "lymphoma": 41, "blood": [41, 52, 54, 56, 57], "62499": 41, "819428": 41, "30578": 41, "nose": 41, "respiratori": [41, 44, 58], "saliva": 41, "41": [41, 45], "crohn": 41, "colon": 41, "17490": 41, "52029": 41, "down": 41, "syndrom": 41, "181": 41, "breast": 41, "cancer": [41, 44], "1850": 41, "chronic": [41, 44, 57], "obstruct": [41, 44, 57], "9382": 41, "rhiniti": 41, "909": 41, "renal": [41, 44, 52, 53], "carcinoma": [41, 44, 57], "6548": 41, "20540": 41, "lymph": 41, "cystic": [41, 44], "fibrosi": [41, 44, 57], "follicular": 41, "1089": 41, "influenza": 41, "8871": 41, "interstiti": [41, 44, 45, 56, 57], "1803": 41, "benign": 41, "neoplasm": 41, "oncocytoma": 41, "2408": 41, "adenocarcinoma": [41, 44, 57], "205": 41, "3274": 41, "507": 41, "215013": 41, 
"24969": 41, "pleural": 41, "fluid": 41, "11558": 41, "5922": 41, "lymphangioleiomyomatosi": [41, 44, 57], "513": 41, "36573": 41, "nonpapillari": 41, "adipos": [41, 54], "4828": 41, "288": [41, 52], "clot": 41, "1717": 41, "69136": 41, "pleomorph": [41, 44, 57], "1715": 41, "pneumonia": [41, 44, 57], "856": [41, 51], "1671": 41, "disord": 41, "34301": 41, "squamou": [41, 44, 45, 57], "52053": 41, "lupu": 41, "erythematosu": 41, "355471": 41, "don": [41, 46, 48, 50, 54, 57], "forget": [41, 46, 48, 50, 57], "del": [41, 42, 43, 44], "opportun": 42, "inter": 42, "ignor": [42, 43, 44, 45, 46, 47, 49, 51, 55], "home": [42, 44, 46], "ssm": [42, 44, 46], "lib": [42, 44, 46], "python3": [42, 44, 46], "_set": 42, "63": [42, 45], "userwarn": [42, 44, 46], "70": [42, 45], "dl_pin_memory_gpu_train": 42, "pin_memori": 42, "loader": 42, "tqdm": [42, 44, 46], "auto": [42, 44, 46], "tqdmwarn": [42, 44, 46], "iprogress": [42, 44, 46], "jupyt": [42, 44, 46, 63], "ipywidget": [42, 44, 46], "user_instal": [42, 44, 46], "autonotebook": [42, 44, 46], "notebook_tqdm": [42, 44, 46], "census_dataset": [42, 44, 52, 53], "tabula_liv": 42, "loc": [42, 52], "525": [42, 46], "0b9d8a04": [42, 46, 53], "bb9d": [42, 46, 53], "44da": [42, 46, 53], "aa27": [42, 46, 53], "705bb65b54eb": [42, 46, 53], "s41586": [42, 46, 50, 52, 53], "020": [42, 46, 52, 53], "2496": [42, 46, 53], "4546e757": [42, 46], "34d0": [42, 46], "4d17": [42, 46], "be06": [42, 46], "538318925fcd": [42, 46], "atla": [42, 44, 46, 52, 53, 54], "cha": [42, 46], "2859": [42, 46], "547": 42, "6202a243": [42, 54], "b713": [42, 54], "4e12": [42, 54], "9ced": [42, 54], "c387f8483dea": [42, 54], "7294": [42, 54], "tabula_muris_liver_id": 42, "smart_seq_gene_length": 42, "to_numpi": [42, 43, 44, 45, 46, 49, 51, 55], "smart_seq_index": 42, "smart_seq_x": 42, "proce": [42, 46], "ceil": 42, "put": [42, 55], "omic": [42, 55], "yosef": 42, "lab": [42, 44, 52, 53, 55], "uc": [42, 43, 55], "berkelei": 42, "variat": [42, 43], "infer": [42, 61], 
"deep": 42, "scrna": [42, 44], "comprehens": 42, "best": [42, 43], "practic": [42, 46], "strength": 42, "bread": [42, 44], "butter": [42, 44], "neighbor": [42, 43, 44, 45, 46, 47, 49, 55], "graph": [42, 43], "visual": [42, 43, 44, 45, 47], "umap": [42, 43, 44, 45, 46, 47, 49, 55], "But": [42, 54], "save": [42, 49, 53, 55, 56, 61], "normalize_tot": [42, 43, 44, 45, 46, 47], "target_sum": [42, 43, 44, 45, 46, 47], "1e4": [42, 44, 45, 46, 47], "log1p": [42, 43, 44, 45, 46, 47], "max_valu": [42, 44, 45, 46], "final": [42, 43, 45, 46, 49, 51, 52, 54, 55, 59, 61], "tl": [42, 43, 44, 45, 46, 47, 49, 55], "pca": [42, 44, 45, 46], "n_neighbor": [42, 43, 45, 47], "n_pc": [42, 45], "40": [42, 45], "pl": [42, 43, 44, 45, 46, 47, 49, 55, 56], "color": [42, 43, 44, 45, 46, 47, 49, 55], "plot": [42, 43, 44, 46, 47, 49, 55], "_tool": [42, 44, 46], "scatterplot": [42, 43, 44, 46], "392": [42, 44, 46], "No": [42, 44, 46], "colormap": [42, 44, 46], "cmap": [42, 44, 46], "cax": [42, 44, 46], "scatter": [42, 43, 44, 46, 47, 49, 55], "strong": [42, 44], "properli": 42, "principl": 42, "randomli": [42, 43], "whenev": 42, "evidenc": 42, "articl": 42, "health": 42, "sikkema": 42, "et": [42, 54], "al": [42, 54], "whom": 42, "perfom": 42, "43": [42, 45, 52, 59], "latent": [42, 43, 47], "setup_anndata": 42, "vae": 42, "n_layer": 42, "n_latent": 42, "gene_likelihood": 42, "nb": 42, "n_hidden": 42, "50": [42, 45, 49, 57], "gpu": [42, 45, 47], "tpu": 42, "tf_cpp_min_log_level": 42, "rerun": [42, 43], "info": [42, 44, 47, 56], "max_epoch": 42, "ipu": 42, "hpu": 42, "epoch": [42, 61], "00": [42, 46, 49], "15it": 42, "v_num": 42, "train_loss_step": 42, "545": 42, "train_loss_epoch": 42, "560": 42, "trainer": [42, 45], "17it": 42, "represent": [42, 43, 45], "x_scvi": 42, "get_latent_represent": [42, 47], "use_rep": [42, 43, 45, 47, 49, 55], "mainli": 42, "driven": [42, 43], "albeit": 42, "contribut": [42, 43, 44, 49, 55], "curat": [42, 50, 56], "strongli": 42, "22": [42, 44, 45, 56, 58, 60, 63], 
"dataset_id_donor_id": 42, "astyp": [42, 43, 45], "23": [42, 44, 45, 52, 56], "24": [42, 44, 45, 52, 60], "27it": 42, "520": 42, "550": 42, "25it": 42, "mostli": [42, 44], "nucleu": [42, 55, 57], "accomplish": [42, 44], "latter": [42, 57], "knowledg": 43, "journei": 43, "2d": [43, 49, 55], "involv": 43, "nonlinear": 43, "transform": [43, 44, 45, 46, 47, 55], "Such": 43, "affect": [43, 61], "manifold": 43, "overclust": 43, "reduct": [43, 54], "mind": [43, 58], "hypothes": 43, "focus": 43, "ultim": 43, "underli": [43, 61, 62], "investig": 43, "behind": 43, "foundat": [43, 55], "technic": 43, "often": 43, "might": [43, 56], "pure": 43, "systemat": 43, "bias": [43, 44], "complic": 43, "matter": 43, "techniqu": 43, "nearest": 43, "themselv": 43, "amplifi": [43, 45], "rigor": 43, "benchmark": 43, "fulli": 43, "space": [43, 45], "highlight": 43, "challeng": 43, "unsolv": 43, "problem": 43, "briefli": [43, 56], "illustr": [43, 55], "capac": 43, "captur": 43, "intrigu": 43, "phenomena": 43, "disclaim": 43, "depth": [43, 44, 46], "insight": [43, 55], "glean": 43, "innacur": 43, "leidenalg": 43, "hdbscan": 43, "scikit": [43, 63], "warn": [43, 44, 45, 47, 49, 55], "get_embed": [43, 49, 55], "filterwarn": [43, 45, 47, 49, 55], "def": [43, 51, 61], "generate_umaps_from_embed": 43, "emb_nam": [43, 49], "euclidean": 43, "key_ad": 43, "neighbors_kei": 43, "x_emb_nam": 43, "x_": 43, "_": [43, 55], "_umap": 43, "x_umap": 43, "var_nam": [43, 44, 45, 47], "build_anndata_with_embed": 43, "coord": [43, 55], "miss": [43, 47, 51, 55], "intersect": 43, "accordingli": 43, "filt": 43, "ones": 43, "nan_row_sum": 43, "isnan": [43, 51], "total_column": 43, "embedding_uris_commun": 43, "scgpt": [43, 55], "contrib": [43, 45, 47, 49, 55], "cxg": [43, 55], "embedding_names_censu": 43, "embedding_names_al": 43, "obs_df": [43, 48, 49, 51, 55, 58, 60], "n_subset_cel": 43, "150000": 43, "idx_rand": 43, "choic": [43, 45, 47, 56], "soma_joinids_subset": 43, "tolist": [43, 44, 47, 48], "799353": 43, 
"distinctli": 43, "oca2": 43, "marker": [43, 47], "kit": 43, "vari": 43, "immatur": 43, "clearli": 43, "slight": 43, "extens": [43, 54], "concentr": 43, "seen": 43, "satellit": 43, "signatur": 43, "probabl": [43, 45, 61], "mani": [43, 51, 61], "disconnect": 43, "compon": 43, "tend": 43, "extent": 43, "versu": 43, "unclear": 43, "qualit": 43, "pronounc": 43, "basi": 43, "geneformer_umap": 43, "use_raw": 43, "scgpt_umap": 43, "uce_umap": 43, "scvi_umap": 43, "subclust": 43, "leiden": [43, 45, 47], "emploi": 43, "densiti": 43, "pairwis": 43, "distanc": [43, 51], "compar": [43, 47], "reveal": [43, 44], "distinct": [43, 61], "signific": [43, 58], "agreement": 43, "mutual": 43, "nmi": 43, "score": 43, "assign": [43, 51], "yield": 43, "65": [43, 45], "inher": 43, "expect": [43, 44, 46, 55], "finetun": 43, "homogen": [43, 61], "belong": 43, "underscor": 43, "draw": 43, "coupl": 43, "conclus": 43, "lead": 43, "identif": 43, "evid": 43, "examin": [43, 61], "relianc": 43, "unjustifi": 43, "known": 43, "phenomenon": 43, "cross": [43, 44], "fuller": 43, "hold": [43, 61], "lack": 43, "necessit": 43, "thereof": 43, "pd": [43, 44, 51, 59, 60, 61], "pdist": 43, "squareform": 43, "sklearn": [43, 47], "normalized_mutual_info_scor": 43, "adata_rbn": 43, "_connect": 43, "_leiden": 43, "pairwise_dist": 43, "_hdbscan": 43, "min_cluster_s": 43, "min_sampl": 43, "precomput": [43, 58], "fit_predict": 43, "displai": [43, 47, 48, 51, 55, 56, 61], "embedding_kei": 43, "sim_scores_leiden": 43, "len": [43, 44, 45, 47, 48, 51, 53, 54, 61], "sim_scores_hdbscan": 43, "embedding_i": 43, "enumer": 43, "embedding_j": 43, "sim_scores_leiden_t": 43, "sim_scores_hdbscan_t": 43, "seem": [43, 44], "log": [43, 44, 46, 47], "08115140648299893": 43, "7314893672395334": 43, "33702547333985217": 43, "7730928192948211": 43, "723355": 43, "721222": 43, "677754": 43, "775717": 43, "753719": 43, "822202": 43, "089308": 43, "106379": 43, "073141": 43, "480575": 43, "646415": 43, "356779": 43, "11896761": 43, "th": 
43, "wherea": [43, 55], "tendenc": 43, "condit": [43, 57], "glioblastoma": 43, "pilocyt": 43, "astrocytoma": 43, "mix": 43, "outsid": 43, "53d208b0": [43, 44, 52], "2cfd": [43, 44, 52], "4366": [43, 44, 52], "9866": [43, 44, 52], "c3c6114081bc": [43, 44, 52], "smartseq": 43, "cftr": 43, "rare": 43, "recogniz": 43, "summary_t": 44, "980": [44, 59], "2907156": 44, "6011592": 44, "lung_ob": 44, "5945423": 44, "9f222629": [44, 56], "9e39": [44, 56], "47d0": [44, 56], "b83f": [44, 56], "e08d610c7479": [44, 56], "nativ": [44, 58], "0000003": [44, 48, 58], "0000461": [44, 57, 60], "5945426": 44, "ciliat": [44, 45], "columnar": [44, 45], "tracheobronchi": 44, "tree": 44, "0002145": 44, "57": [44, 45], "hsapdv": [44, 57], "0000151": 44, "0002771": 44, "0000384": [44, 60], "5945428": 44, "0000625": [44, 48], "0005097": 44, "5945432": 44, "0000624": [44, 48], "0005061": 44, "5945441": 44, "2907151": 44, "8c42cfd0": [44, 52, 53, 56], "0b0a": [44, 52, 53, 56], "46d5": [44, 52, 53, 56], "910c": [44, 52, 53, 56], "fc833d83c45e": [44, 52, 53, 56], "0000669": [44, 48], "0000145": 44, "0000383": [44, 60], "2907152": 44, "2907153": 44, "2907154": 44, "2907155": 44, "deeper": 44, "dive": 44, "characterist": 44, "set_index": [44, 47, 51, 53, 59, 60], "f171db61": [44, 52, 53, 57], "e57": [44, 52, 53, 57], "4535": [44, 52, 53, 57], "a06a": [44, 52, 53, 57], "35d8b6ef8f2b": [44, 52, 53, 57], "multiom": [44, 52, 53], "developm": [44, 52, 53], "donor_p13_trophoblast": [44, 52, 53], "ecf2e08": [44, 52, 53], "2032": [44, 52, 53], "4a9e": [44, 52, 53], "b466": [44, 52, 53], "b65b395f4a02": [44, 52, 53], "74cff64f": [44, 52, 53], "9da9": [44, 52, 53], "4b2a": [44, 52, 53], "9b3b": [44, 52, 53], "8a04a1598040": [44, 52, 53], "vivo": [44, 52, 53], "5af90777": [44, 52, 53], "6760": [44, 52, 53], "4003": [44, 52, 53], "9dba": [44, 52, 53], "8f945fec6fdf": [44, 52, 53], "intr": [44, 52, 53], "bd65a70f": [44, 52, 53], "b274": [44, 52, 53], "4133": [44, 52, 53], "b9dd": [44, 52, 53], "0d1431b6af34": 
[44, 52, 53], "multiregion": [44, 52, 53], "imm": [44, 52, 53], "f9ad5649": [44, 52, 53], "f372": [44, 52, 53], "43e1": [44, 52, 53], "a3a8": [44, 52, 53], "423383e5a8a2": [44, 52, 53], "molecular": [44, 52, 53], "character": [44, 46, 52, 53, 54], "vuln": [44, 52, 53], "456e8b9b": [44, 52, 53], "f872": [44, 52, 53], "488b": [44, 52, 53], "871d": [44, 52, 53], "94534090a865": [44, 52, 53], "peripher": [44, 52, 53], "immun": [44, 52, 53, 54], "respon": [44, 52, 53], "589": [44, 52, 53], "2adb1f8a": [44, 52, 53, 57], "a6b1": [44, 52, 53, 57], "4909": [44, 52, 53, 57], "8ee8": [44, 52, 53, 57], "484814e2d4bf": [44, 52, 53, 57], "landscap": [44, 52, 53], "sing": [44, 52, 53], "590": [44, 52, 53], "e04daea4": [44, 52, 53], "4412": [44, 52, 53], "45b5": [44, 52, 53], "989e": [44, 52, 53], "76a9be070a89": [44, 52, 53], "krasnow": [44, 52, 53], "591": [44, 52, 53], "592": [44, 52, 53], "append": [44, 55], "dataset_cell_count": 44, "cell_count": 44, "merg": [44, 45, 55, 59], "1e6a6ef9": 44, "7ec9": 44, "4c90": 44, "bbfb": 44, "2ad3c3165fd1": 44, "1028006": 44, "resolut": [44, 56], "luca": 44, "ex": 44, "314": 44, "784630": 44, "f7c1c579": 44, "2dc0": 44, "47e2": 44, "ba19": 44, "8165c5a0e353": 44, "217738": 44, "fetal": 44, "survei": 44, "embryon": 44, "483": 44, "d8da613f": 44, "e681": 44, "4c69": 44, "b463": 44, "e94f5e66847f": 44, "116313": 44, "lethal": 44, "80": [44, 45, 58], "576f193c": 44, "75d0": 44, "4a11": 44, "bd25": 44, "8676587e6dc2": 44, "90384": 44, "htan": 44, "msk": 44, "377": 44, "d41f45c1": 44, "1b7b": 44, "4573": 44, "a998": 44, "ac5c5acb1647": 44, "82991": 44, "reg": 44, "regulatori": 44, "58": [44, 45], "3dc61ca1": 44, "ce40": 44, "46b6": 44, "8337": 44, "f27260fd9a03": 44, "71752": 44, "uncov": 44, "proxima": 44, "325": 44, "60993": 44, "2672b679": 44, "8048": 44, "4f5e": 44, "9786": 44, "f1b196ccfd08": 44, "57019": 44, "spleen": [44, 52, 54], "parenchyma": 44, "416": 44, "9dbab10c": 44, "118d": 44, "496b": 44, "966a": 44, "67f1763a6b7d": 44, "49014": 
44, "criti": 44, "482": 44, "9968be68": 44, "ab65": 44, "4a38": 44, "9e1a": 44, "c9b6abece194": 44, "47909": 44, "chart": 44, "endod": 44, "78": [44, 45], "3de0ad6d": 44, "4378": 44, "4f62": 44, "b37b": 44, "ec0b75a50d94": 44, "46500": 44, "lungmap": 44, "broad": 44, "ag": [44, 46, 54], "healthi": 44, "456": 44, "2f132ec9": 44, "24b5": 44, "422f": 44, "9be0": 44, "ccef03b4fe28": 44, "39778": 44, "sar": 44, "cov": 44, "receptor": [44, 58], "ace2": [44, 56], "tmprss2": 44, "prima": 44, "312": 44, "1e5bd3b8": 44, "6a0e": 44, "4959": 44, "8d69": 44, "cafed30fe814": 44, "35699": 44, "emphysema": [44, 57], "130": 44, "35682": [44, 52], "475": [44, 52], "1b9d8702": 44, "5af8": 44, "4142": 44, "85ed": 44, "020eb06ec4f6": 44, "35419": 44, "tiss": 44, "411": 44, "4ed927e9": 44, "c099": 44, "49af": 44, "b8ce": 44, "a2652d069333": 44, "35284": 44, "367": 44, "33698": 44, "4b6af54a": 44, "4a21": 44, "46e0": 44, "bc8d": 44, "673c0561a836": 44, "18386": 44, "01209dce": 44, "3575": 44, "4bed": 44, "b1df": 44, "129f57fbc031": 44, "11059": 44, "8657": 44, "f9846bb4": 44, "784d": 44, "4582": 44, "92c1": 44, "3f279e4c6f0c": 44, "176": [44, 52], "fibroblast": [44, 45, 56, 58], "smooth": 44, "muscl": [44, 45, 52, 54], "317": 44, "f64e1be1": 44, "de15": 44, "4d27": 44, "8da4": 44, "82225cd4c035": 44, "55": [44, 45, 60], "370": 44, "810ac45f": 44, "8969": 44, "4698": 44, "b42c": 44, "652f802f75c2": 44, "endothelium": 44, "320": 44, "0ba16f4b": 44, "cb87": 44, "4fa3": 44, "9363": 44, "19fc51eec6e7": 44, "myeloid": [44, 45], "326": 44, "reprens": 44, "divers": [44, 48, 52, 55], "plastic": 44, "tumor": 44, "neutrophil": 44, "subpopul": 44, "distal": 44, "gradient": 44, "differenti": [44, 45], "regul": 44, "epitheli": [44, 45, 52, 54, 58, 61], "fate": 44, "tell": 44, "1236968": 44, "702074": 44, "262323": 44, "122902": 44, "97432": 44, "65220": 44, "41852": 44, "25662": 44, "8638": 44, "8016": 44, "1164084": 44, "772120": 44, "331019": 44, "209675": 44, "120796": 44, "55254": 44, "51343": 44, 
"45714": 44, "31923": 44, "31792": 44, "31540": 44, "21167": 44, "17590": 44, "12374": 44, "10765": 44, "1402565": 44, "1122990": 44, "381601": 44, "2468587": 44, "438569": 44, "head": [44, 52], "alveolar": [44, 58], "macrophag": [44, 45], "291507": 44, "263362": 44, "211456": 44, "189471": 44, "154415": 44, "ii": 44, "128463": 44, "tract": 44, "105090": 44, "102303": 44, "killer": [44, 45, 54, 56], "95953": 44, "92846": 44, "stromal": [44, 45, 52, 54], "87714": 44, "81125": 44, "malign": 44, "75917": 44, "plasma": 44, "64551": 44, "59353": 44, "45305": 44, "capillari": 44, "39416": 44, "36381": 44, "36049": 44, "35467": 44, "2576327": 44, "147410": 44, "alveolu": 44, "54085": 44, "lingula": 44, "upper": [44, 52], "lobe": 44, "32099": 44, "17854": 44, "12880": 44, "10113": 44, "9276": 44, "7981": 44, "middl": 44, "3847": 44, "lung_var": 44, "ensg00000121410": [44, 52], "a1bg": [44, 52], "3999": [44, 52], "ensg00000268895": [44, 52], "as1": [44, 52], "3374": [44, 52], "ensg00000148584": [44, 52], "a1cf": [44, 52], "9603": [44, 52], "ensg00000175899": [44, 52], "a2m": [44, 52], "6318": [44, 52], "ensg00000245105": [44, 52], "2948": [44, 52], "ensg00000288719": [44, 52], "rp4": [44, 52], "669p10": [44, 52], "ensg00000288720": [44, 52], "rp11": [44, 52], "852e15": [44, 52], "7007": [44, 52], "ensg00000288721": [44, 52], "rp5": [44, 52], "973n23": [44, 52], "7765": [44, 52], "ensg00000288723": [44, 52], "553n16": [44, 52], "1015": [44, 52], "ensg00000288724": [44, 52], "rp13": [44, 52], "546i2": [44, 52], "625": [44, 52], "60664": [44, 49, 52, 55, 61], "actual": [44, 61], "mislead": 44, "know": [44, 54, 57], "presence_matrix": [44, 46, 52], "get_presence_matrix": [44, 46, 52], "a1": 44, "17811": 44, "50259": 44, "44150": 44, "34265": 44, "22447": 44, "23642": 44, "26347": 44, "20921": 44, "24672": 44, "27705": 44, "27243": 44, "26323": 44, "27181": 44, "23203": 44, "57042": 44, "32610": 44, "29620": 44, "26454": 44, "23705": 44, "38676": 44, "47307": 44, "23740": 44, 
"22552": 44, "20594": 44, "19952": 44, "uint64": 44, "genes_measur": 44, "var_somaid": 44, "nonzero": [44, 46], "ensg00000128274": 44, "a4galt": 44, "3358": 44, "ensg00000094914": 44, "aaa": 44, "4727": 44, "ensg00000081760": 44, "aac": 44, "16039": 44, "29951": 44, "ensg00000177272": 44, "kcna3": 44, "2476": 44, "30157": 44, "ensg00000184709": 44, "lrrc26": 44, "1209": 44, "30185": 44, "ensg00000087250": 44, "mt3": 44, "1679": 44, "30202": 44, "ensg00000136352": 44, "nkx2": 44, "3165": 44, "30512": 44, "ensg00000231439": 44, "wasir2": 44, "1054": 44, "11595": 44, "composit": 44, "infect": 44, "12k": 44, "intens": 44, "exercis": 44, "exploratori": 44, "000": 44, "lung_cell_subsampled_n": 44, "100000": 44, "lung_cell_subsampled_id": 44, "random_st": 44, "lung_gene_id": 44, "lung_adata": 44, "highest_expr_gen": 44, "n_top": 44, "calculate_qc_metr": 44, "percent_top": 44, "inplac": [44, 47], "violin": [44, 47], "n_genes_by_count": 44, "rotat": 44, "90": 44, "total_count": 44, "outlier": 44, "exlcud": 44, "ll": [44, 46, 55, 60], "extra": 44, "_highly_variable_gen": 44, "_simpl": 44, "843": 44, "view_to_actu": 44, "28": [44, 45, 56, 61], "n_cell_typ": 44, "drop_dupl": [44, 57], "randint": 44, "rang": [44, 45, 47, 49, 55, 61], "06x": 44, "0xffffff": 44, "palett": 44, "legend_loc": 44, "hard": 44, "32": [44, 45, 61], "top_cell_typ": 44, "reset_index": [44, 51], "lung_adata_top_cell_typ": 44, "unix": [45, 47], "mkdir": [45, 47], "p": [45, 47, 50, 51, 59], "wget": [45, 47], "nv": [45, 47], "pbmc3k_filtered_gene_bc_matric": [45, 47], "tar": [45, 47], "gz": [45, 47], "cf": [45, 47], "10xgenom": [45, 47], "exp": [45, 47], "pbmc3k": [45, 47], "xzf": [45, 47], "09": [45, 55], "38": [45, 56, 59], "7621991": [45, 47], "gt": [45, 47, 52, 56], "deatail": [45, 47], "insid": [45, 47], "geneformer_info": 45, "cxg_embedding_info": [45, 47], "model_link": [45, 47, 55], "cli": [45, 53], "progress": [45, 47, 56], "fine_tuned_geneform": 45, "datacollatorforcellclassif": 45, "embextractor": 
45, "transcriptometoken": 45, "bertforsequenceclassif": 45, "ensembl_id": [45, 47], "ensg00000139618": 45, "suffix": 45, "n_count": [45, 47], "joinid": [45, 47, 52, 55], "write": [45, 53], "disk": 45, "read_10x_mtx": [45, 47], "filtered_gene_bc_matric": [45, 47], "hg19": [45, 47], "gene_id": [45, 47], "h5ad_dir": 45, "makedir": 45, "track": 45, "token_dir": 45, "tokenized_data": 45, "custom_attr_name_dict": 45, "tokenize_data": 45, "data_directori": 45, "output_directori": 45, "output_prefix": 45, "file_format": 45, "filter_pass": 45, "model_dir": 45, "label_mapping_dict_fil": 45, "label_to_cell_subclass": 45, "fp": 45, "label_mapping_dict": 45, "best4": 45, "cn": 45, "sensu": 45, "vertebrata": 45, "gabaerg": 45, "abnorm": 45, "adventiti": [45, 56], "anim": 45, "cardiocyt": 45, "skelet": 45, "cuboid": 45, "contractil": 45, "defens": 45, "duct": 45, "ecto": 45, "ectoderm": 45, "endo": 45, "pancrea": [45, 52, 54], "urethra": 45, "eukaryot": 45, "fat": [45, 52], "germ": [45, 58], "glandular": 45, "35": [45, 61], "glial": 45, "37": 45, "hematopoiet": [45, 57], "precursor": 45, "hepatocyt": 45, "inflammatori": 45, "interneuron": [45, 52], "42": 45, "ionocyt": 45, "44": [45, 47, 56], "45": [45, 59], "46": 45, "leukocyt": [45, 61], "47": 45, "lymphocyt": 45, "48": [45, 51], "49": 45, "mammari": [45, 54], "mesenchym": [45, 56], "52": [45, 51], "meso": 45, "mesoderm": 45, "motor": 45, "mural": 45, "59": [45, 54], "myofibroblast": 45, "neural": 45, "termin": 45, "ovarian": 45, "surfac": 45, "67": [45, 59], "phagocyt": 45, "pigment": 45, "cultur": [45, 58], "71": 45, "primordi": 45, "progenitor": [45, 56], "73": 45, "salivari": 45, "sebac": 45, "75": [45, 52], "secretori": 45, "76": 45, "sensori": 45, "77": 45, "seromucu": 45, "secret": [45, 56], "somat": 45, "79": 45, "stem": [45, 56, 57, 60], "81": [45, 51], "82": 45, "83": [45, 51, 59], "84": 45, "transit": 45, "85": 45, "86": 45, "87": 45, "vertebr": 45, "load_from_disk": 45, "num_row": 45, "2700": 45, "dummi": [45, 47], 
"add_column": 45, "slow": 45, "pretrain": 45, "from_pretrain": 45, "data_col": 45, "vector": 45, "predicted_label_id": 45, "argmax": [45, 61], "predicted_label": 45, "predicted_cell_subclass": 45, "min_mean": 45, "0125": 45, "max_mean": 45, "min_disp": 45, "svd_solver": 45, "arpack": 45, "scapi": 45, "original_cell_typ": [45, 47], "cd14": [45, 47], "fcgr3a": [45, 47], "megakaryocyt": [45, 47], "rename_categori": 45, "titl": [45, 49, 55], "n_class": 45, "output_dir": 45, "geneformer_embed": 45, "embex": 45, "model_typ": 45, "cellclassifi": 45, "num_class": 45, "max_ncel": 45, "emb_label": 45, "emb_lay": 45, "forward_batch_s": 45, "nproc": 45, "extract_emb": 45, "model_directori": 45, "input_data_fil": 45, "re": [45, 52], "grab": [45, 48, 52, 55, 59], "c697eaaf": [45, 47], "a3b": [45, 47], "4251": [45, 47], "b036": [45, 47], "5f9052179e70": [45, 47], "f2a488bf": [45, 47], "782f": [45, 47], "4c20": [45, 47], "a8e5": [45, 47], "cb34d48c1f7e": [45, 47], "fa8605cf": [45, 47], "f27e": [45, 47], "44af": [45, 47], "ac2a": [45, 47], "476bee4410d3": [45, 47], "3c75a463": [45, 47], "6a87": [45, 47], "4132": [45, 47], "83a8": [45, 47], "c3002624394d": [45, 47], "adata_censu": [45, 47], "simplifi": [45, 51], "shared_gen": 45, "index_subset": [45, 47], "3000": [45, 47], "adata_join": 45, "outer": 45, "liver_dataset": 46, "liver_dataset_id": 46, "liver_adata": 46, "859": 46, "52392": [46, 51, 53, 59], "gene_pres": 46, "17992": 46, "992": 46, "toarrai": [46, 55], "000e": 46, "590e": 46, "02": [46, 49, 50, 55], "969e": 46, "03": [46, 49, 52, 53], "280e": 46, "250e": 46, "400e": 46, "gene_length": 46, "00000000e": [46, 49], "58654413e": 46, "32001885e": 46, "74444813e": 46, "31455088e": 46, "71500419e": 46, "78985747e": 46, "real": 46, "filter_cel": 46, "min_gen": 46, "filter_gen": 46, "min_cel": 46, "saniti": 46, "prepar": 47, "pbmc": 47, "3k": 47, "scvi_info": 47, "pt": 47, "cp": [47, 53], "randomforestclassifi": 47, "unassign": 47, "model_filenam": 47, "prepare_query_anndata": 47, 
"is_train": 47, "trick": 47, "forward": [47, 61], "reprsent": 47, "vae_q": 47, "load_query_data": 47, "gene_symbol": [47, 56], "notnul": 47, "perfectli": 47, "appropri": 47, "markers_row1": 47, "il7r": 47, "lyz": 47, "ms4a1": 47, "cd8a": 47, "gnly": 47, "markers_row2": 47, "nkg7": 47, "ms4a7": 47, "fcer1a": 47, "cst3": 47, "ppbp": 47, "catch_warn": 47, "nk": 47, "label_map": 47, "adata_census_subset": 47, "adata_combin": 47, "correl": 47, "forest": 47, "classifi": 47, "rfc": 47, "predicted_cell_typ": [47, 61], "enough": [48, 51], "itself": 48, "tip": 48, "soma_df": 48, "faster": 48, "refin": 48, "_obs_": 48, "unique_cell_type_ontology_term_id": 48, "lot": 48, "top_10": 48, "nthe": 48, "0000525": [48, 57], "2000060": [48, 57], "0008036": [48, 57], "0002488": 48, "0002343": 48, "0000084": 48, "0001078": 48, "0000815": 48, "0000235": 48, "3000001": 48, "0000540": 48, "7665340": 48, "0000679": 48, "1894047": 48, "0000128": 48, "1881077": 48, "1508920": 48, "1477453": 48, "1419507": 48, "0000057": 48, "1397813": 48, "0000860": 48, "1369142": 48, "1308000": [48, 58], "4023040": 48, "1229658": 48, "occurr": 48, "lung_tissu": 48, "ntop": 48, "185": 48, "0002063": 48, "0000775": 48, "0001044": 48, "0001050": 48, "0000814": 48, "0000071": 48, "0000192": 48, "0002503": 48, "0002370": 48, "562038": 48, "0000583": 48, "526859": 48, "323985": 48, "323610": 48, "266333": 48, "255425": 48, "205013": 48, "0000623": 48, "164944": 48, "0001064": 48, "149067": 48, "0002632": 48, "132243": 48, "0002082": 48, "ooo2084": 48, "0002080": 48, "0000746": 48, "49929": 48, "0008034": 48, "33361": 48, "0002548": 48, "33180": 48, "0002131": 48, "30915": 48, "0000115": 48, "30054": 48, "18391": 48, "0000763": 48, "14408": 48, "13552": 48, "9690": 48, "0002144": 48, "9025": 48, "labl": 48, "cols_to_queri": 48, "complet": [48, 58], "df": [48, 56], "col": [48, 51, 52], "tuniqu": 48, "372": [49, 55], "axisarrai": [49, 55], "soma_dim_1": [49, 51, 54, 55], "soma_data": [49, 51, 54, 55], "bfloat16": 
[49, 55], "bit": [49, 55], "expon": [49, 55], "mantissa": [49, 55], "simplest": [49, 55], "nervou": [49, 55], "befor": [49, 55], "correspondong": [49, 55], "31780": [49, 55], "get_embedding_metadata_by_nam": 49, "to_anndata": [49, 55], "obs_joinid": [49, 55], "embeddinng": [49, 55], "stand": [49, 55], "alon": [49, 55], "17187500e": 49, "82995605e": 49, "50000000e": 49, "39941406e": 49, "71606445e": 49, "39843750e": 49, "71115112e": 49, "32031250e": 49, "00781250e": 49, "55310059e": 49, "85009766e": 49, "10156250e": 49, "42614746e": 49, "45312500e": 49, "53295898e": 49, "12915039e": 49, "84765625e": 49, "54113770e": 49, "94531250e": 49, "38281250e": 49, "03149414e": 49, "28881836e": 49, "14111328e": 49, "78125000e": 49, "15234375e": 49, "39562988e": 49, "79687500e": 49, "48388672e": 49, "19628906e": 49, "62803650e": 49, "88446045e": 49, "75694072": 50, "45846761": 50, "16292": 50, "2153": 50, "doi": [50, 55], "1002": 50, "ctm2": 50, "1356": 50, "695": 50, "696": 50, "697": 50, "1016": [50, 52, 53], "isci": 50, "698": 50, "1371": 50, "journal": 50, "699": 50, "700": 50, "cardiac": 50, "atrium": 50, "slice_dataset": 50, "isin": [50, 52], "sep": 50, "1126": [50, 52], "abl4896": [50, 52], "4866a804": 50, "37eb": 50, "436f": 50, "8c87": 50, "9cd585260061": 50, "e5f58829": [50, 52], "1a66": [50, 52], "40b5": [50, 52], "a624": [50, 52], "9046778e74f5": [50, 52], "bfd80f12": 50, "725c": 50, "4482": 50, "ad7f": 50, "1ed2b4909b0d": 50, "e6df8a57": 50, "f54f": 50, "413a": 50, "9d4d": 50, "dee03294d778": 50, "8d599205": 50, "5c51": 50, "4b50": 50, "9d48": 50, "3dec31238587": 50, "f6065c51": 50, "bd26": 50, "4aa5": 50, "a05d": 50, "2805aeea48d9": 50, "8cdbf790": 50, "4d29": 50, "4f46": 50, "9aef": 50, "21adfb2e21da": 50, "mybpc3": 50, "easier": 51, "experiment_queri": 51, "x_as_seri": 51, "nd": 51, "raw_n": 51, "aka": 51, "iloc": 51, "expens": 51, "var_df": [51, 52, 59], "float64": 51, "coo": 51, "arrow_tbl": 51, "var_dim": 51, "by_var": 51, "errstat": 51, "raw_mean": 51, 
"ensmusg00000051951": [51, 59], "xkr4": [51, 59], "6094": [51, 59], "202": 51, "032743": 51, "ensmusg00000089699": [51, 59], "gm1992": [51, 59], "250": [51, 59], "ensmusg00000102343": [51, 59], "gm37381": [51, 59], "1364": [51, 59], "ensmusg00000025900": [51, 59], "rp1": [51, 59], "12311": [51, 59], "106": 51, "236265": 51, "ensmusg00000025902": [51, 59], "sox17": [51, 59], "4772": [51, 59], "3259": 51, "991975": 51, "52387": [51, 59], "ensmusg00000081591": [51, 59], "btf3": [51, 59], "ps9": [51, 59], "496": [51, 59], "52388": [51, 59], "ensmusg00000118710": [51, 59], "mmu": [51, 59], "mir": [51, 59], "467a": [51, 59], "3_ensmusg00000118710": [51, 59], "52389": [51, 59], "ensmusg00000119584": [51, 59], "rn18": [51, 59], "1849": [51, 59], "52390": [51, 59], "ensmusg00000118538": [51, 59], "gm18218": [51, 59], "970": [51, 59], "52391": [51, 59], "ensmusg00000084217": [51, 59], "setd9": [51, 59], "670": [51, 59], "welford": [51, 60], "npt": 51, "onlinematrixmeanvari": 51, "n_sampl": 51, "n_variabl": 51, "axix": 51, "n_a": 51, "int32": [51, 61], "u_a": 51, "m2_a": 51, "coord_vec": 51, "value_vec": 51, "_mean_variance_upd": 51, "tupl": 51, "m2": 51, "_mean_variance_fin": 51, "max": 51, "jit": 51, "nopython": 51, "col_arr": 51, "val_arr": 51, "squar": 51, "val": 51, "u_prev": 51, "m2_prev": 51, "accont": 51, "chan": 51, "n_b": 51, "u_b": 51, "m2_b": 51, "mvn": 51, "raw_vari": 51, "848": 51, "312801": 51, "169": 51, "182975": 51, "279575": 51, "656207": 51, "malat1": 51, "ptprd": 51, "dlg2": 51, "pcdh9": 51, "n_cells_by_dataset": 51, "multiindex": 51, "from_product": 51, "n_cell": 51, "x_tbl": 51, "to_fram": 51, "get_index": 51, "pick": [51, 53], "3bbb6cf9": 51, "72b9": 51, "41be": 51, "b568": 51, "656de6eb18b5": 51, "ensmusg00000028399": 51, "79578": 51, "58b01044": 51, "c5e5": 51, "4b0f": 51, "8a2d": 51, "6ebf951e01ff": 51, "474": 51, "ensmusg00000052572": 51, "79513": 51, "98e5ea9f": [51, 60], "16d6": [51, 60], "47ec": [51, 60], "a529": [51, 60], "686e76515e39": [51, 
60], "908": 51, "66ff82b4": 51, "9380": 51, "469c": 51, "bc4b": 51, "cfa08eacd325": 51, "c08f8441": 51, "4a10": 51, "4748": 51, "872a": 51, "e70c0bcccdba": 51, "ensmusg00000055421": 51, "79476": 51, "125": [51, 61], "3027": 51, "2910": 51, "117": 51, "ensmusg00000092341": 51, "79667": 51, "12622": 51, "20094": 51, "7102": 51, "12992": 51, "compil": 52, "n_dataset": 52, "therein": [52, 53], "human_rna": 52, "datasets_df": 52, "e2c257e7": [52, 53], "6f79": [52, 53], "487c": [52, 53], "b81c": [52, 53], "39451cd4ab3c": [52, 53], "023": [52, 53], "05869": [52, 53], "31497": [52, 53], "67070": [52, 53], "286326": [52, 53], "f7cecffa": [52, 53], "00b4": [52, 53], "4560": [52, 53], "a29a": [52, 53], "8ad626b8ee08": [52, 53], "ccell": [52, 53], "001": [52, 53], "270855": [52, 53], "3f50314f": [52, 53], "bdc9": [52, 53], "40c6": [52, 53], "8e4a": [52, 53], "b0901ebfbe4c": [52, 53], "2021": [52, 53], "007": [52, 53], "167283": [52, 53], "180bff9c": [52, 53], "c8a5": [52, 53], "4539": [52, 53], "b13b": [52, 53], "ddbc00d643e6": [52, 53], "s41593": [52, 53], "00764": [52, 53], "8168": [52, 53], "a72afd53": [52, 53], "ab92": [52, 53], "4511": [52, 53], "88da": [52, 53], "252fb0e26b9a": [52, 53], "s41591": [52, 53], "0944": [52, 53], "y": [52, 53], "44721": [52, 53], "38833785": [52, 53], "fac5": [52, 53], "48fd": [52, 53], "944a": [52, 53], "0f62a4c23ed1": [52, 53], "2157": [52, 53], "598266": [52, 53], "5d445965": [52, 53], "6f1a": [52, 53], "4b68": [52, 53], "ba3a": [52, 53], "b8f765155d3a": [52, 53], "2922": [52, 53], "9409": [52, 53], "65662": [52, 53], "593x60664": 52, "16133717": 52, "manipul": 52, "ensg00000286096": 52, "97a17473": 52, "e2b1": 52, "4f31": 52, "a544": 52, "44a60773e2dd": 52, "var_joinid": 52, "dataset_joinid": 52, "is_pres": 52, "tocoo": 52, "ff45e623": 52, "7f5f": 52, "46e3": 52, "b47d": 52, "56be0341f66b": 52, "13497": 52, "f01bdd17": 52, "4902": 52, "40f5": 52, "86e3": 52, "240d66dd2587": 52, "salivary_gland": 52, "27199": 52, "e6a11140": 52, "2545": 
52, "46bc": 52, "929e": 52, "da243eed2ca": 52, "11505": 52, "e5c63d94": 52, "593c": 52, "4338": 52, "a489": 52, "e1048599e751": 52, "bladder": [52, 54], "24583": 52, "d8732da6": 52, "8d1d": 52, "42d9": 52, "b625": 52, "f2416c30054b": 52, "trachea": [52, 54], "9522": 52, "cee11228": 52, "9f0b": 52, "4e57": 52, "afe2": 52, "cfe15ee56312": 52, "34004": 52, "a357414d": 52, "2042": 52, "4eb5": 52, "95f0": 52, "c58604a18bdd": 52, "small_intestin": 52, "12467": 52, "a0754256": 52, "f44b": 52, "4c4a": 52, "962c": 52, "a552e47d3fdc": 52, "10650": 52, "983d5ec9": 52, "40e8": 52, "4512": 52, "9e65": 52, "a572a9c486cb": 52, "50115": 52, "5e5e7a2f": 52, "8f1c": 52, "42ac": 52, "90dc": 52, "b4f80f38e84c": 52, "20263": 52, "55cf0ea3": 52, "9d2b": 52, "4294": 52, "871e": 52, "bb4b49a79fc7": 52, "15020": [52, 61], "4f1555bc": 52, "4664": 52, "46c3": 52, "a606": 52, "78d34dd10d92": 52, "bone_marrow": [52, 53], "12297": 52, "2423ce2c": 52, "3149": 52, "4cca": 52, "a2ff": 52, "cf682ea29b5f": 52, "9641": 52, "1c9eb291": 52, "6d31": 52, "47e1": 52, "96b2": 52, "129b5e1ae64f": 52, "30746": 52, "18eb630b": 52, "a754": 52, "4111": 52, "8cd4": 52, "c24ec80aa5ec": 52, "lymph_nod": 52, "53275": 52, "0d2ee4ac": 52, "05ee": 52, "40b2": 52, "afb6": 52, "ebb584caa867": 52, "0ced5e76": 52, "6040": 52, "47ff": 52, "8a72": 52, "93847965afc0": 52, "thymu": [52, 54], "33664": 52, "283d65eb": 52, "dd53": 52, "496d": 52, "adb7": 52, "7570c7caa443": 52, "1101": [52, 55], "511898": 52, "8e10f1c4": 52, "8e98": 52, "41e5": 52, "b65f": 52, "8cd89a887122": 52, "2480956": 52, "139": 52, "fe1a73ab": 52, "a203": 52, "45fd": 52, "84e9": 52, "0f7fd19efcbd": 52, "dissect": 52, "amygdaloid": 52, "ami": [52, 63], "basolat": 52, "35285": 52, "143": 52, "f8dda921": 52, "5fb4": 52, "4c94": 52, "a654": 52, "c6fc346bfd6d": 52, "cerebr": 52, "cortex": 52, "cx": 52, "occipitotem": 52, "31899": 52, "160": 52, "dd03ce70": 52, "3243": 52, "4c96": 52, "9561": 52, "330cc461e4d7": 52, "perirhin": 52, "23732": 52, "165": 52, 
"d2b5efc1": 52, "14c6": 52, "4b5f": 52, "bd98": 52, "40f9084872d7": 52, "tail": 52, "hippocampu": 52, "hit": 52, "caudal": 52, "36886": 52, "175": 52, "c4b03352": 52, "af8d": 52, "492a": 52, "8d6b": 52, "40f304e0a122": 52, "superclust": 52, "medium": 52, "spini": 52, "152189": 52, "c2aad8fc": 52, "b63b": 52, "4f9b": 52, "9cfd": 52, "baf7bc9c1771": 52, "tempor": 52, "po": 52, "37642": 52, "177": 52, "c202b243": 52, "1aa1": 52, "4b16": 52, "bc9a": 52, "b36241f3b1e3": 52, "amygdala": 52, "excitatori": 52, "109452": 52, "178": 52, "bdb26abd": 52, "f4ba": 52, "4ea3": 52, "8862": 52, "c2340e7a4f55": 52, "cge": 52, "227671": 52, "183": 52, "acae7679": 52, "d077": 52, "461c": 52, "b857": 52, "ee6ccfeb267f": 52, "hih": 52, "ca1": 52, "39147": 52, "196": 52, "9372df2d": 52, "13d6": 52, "4fac": 52, "980b": 52, "919a5b7eb483": 52, "midbrain": 52, "periaqueduct": 52, "grai": 52, "33794": 52, "197": 52, "93131426": 52, "0124": 52, "4ab4": 52, "a013": 52, "9dfbcd99d467": 52, "epithalamu": 52, "eth": 52, "24327": 52, "206": [52, 59], "7c1c3d47": 52, "3166": 52, "43e5": 52, "9a95": 52, "65ceb2d45f78": 52, "pon": 52, "pn": 52, "pontin": 52, "reticular": 52, "49512": 52, "208": 52, "7a0a8891": 52, "9a22": 52, "4549": 52, "a55b": 52, "c2aca23c3a2a": 52, "hippocamp": 52, "74979": 52, "5e5ab909": 52, "f73f": 52, "4b57": 52, "98a0": 52, "6d2c5662f6a4": 52, "inferior": 52, "colliculu": 52, "32306": 52, "3f56901c": 52, "dd4a": 52, "47d6": 52, "b60b": 52, "7b0c0111cfb2": 52, "37911": 52, "3a7f3ab4": 52, "a280": 52, "4b3b": 52, "b2c0": 52, "6dd05614a78c": 52, "splatter": 52, "291833": 52, "249": 52, "35c8a04c": 52, "8639": 52, "4d15": 52, "8228": 52, "765d8d93fc96": 52, "hypothalamu": 52, "hth": 52, "supraopt": 52, "16753": 52, "270": 52, "07b1d7c8": 52, "5c2e": 52, "42f7": 52, "9246": 52, "26f746cd6013": 52, "myelencephalon": 52, "medulla": 52, "oblongata": 52, "27210": 52, "273": 52, "0325478a": 52, "9b52": 52, "b40a": 52, "2e2ab0d72eb1": 52, "intratelencephal": 52, "455006": 52, "483152": 
52, "476": 52, "a68b64d8": 52, "aee3": 52, "4947": 52, "81b7": 52, "36b8fe5a44d2": 52, "82478": 52, "477": 52, "c5d88abe": 52, "f23a": 52, "45fa": 52, "a534": 52, "788985e93dad": 52, "264824": 52, "478": 52, "5a11f879": 52, "d1ef": 52, "458a": 52, "9b0bdfca5ebf": 52, "31691": 52, "479": 52, "104148": 52, "17481d16": 52, "ee44": 52, "49e5": 52, "bcf0": 52, "28c0780d8c4a": 52, "58109": 52, "ensg00000277745": 52, "h2ab3": 52, "58354": 52, "ensg00000233522": 52, "fam224a": 52, "2031": 52, "58411": 52, "ensg00000183146": 52, "prori": 52, "878": 52, "58523": 52, "ensg00000279274": 52, "533e23": 52, "58632": 52, "ensg00000277836": 52, "27211": 52, "all_experi": 53, "organism_nam": 53, "organism_experi": 53, "experiments_total_cel": 53, "num_cel": 53, "nfound": 53, "5255245": 53, "turn": 53, "toolchain": 53, "0bd1a1d": 53, "3aee": 53, "40e0": 53, "b2ec": 53, "86c7a30c7149": 53, "522": 53, "atl": 53, "40220": [53, 54], "submitt": 53, "tabula_muris_seni": 53, "lineag": [54, 55], "jin": 54, "tabula_muris_dataset_id": 54, "48b37086": [54, 56, 60], "25f7": [54, 56, 60], "4ecd": [54, 56, 60], "be66": [54, 56, 60], "f5bb378e3aea": [54, 56, 60], "tabula_muris_ob": 54, "35718": 54, "limb": 54, "28867": 54, "24540": 54, "21647": 54, "20680": 54, "12295": 54, "9275": 54, "lumen": 54, "8945": 54, "8613": 54, "7976": 54, "6777": 54, "6201": 54, "skin": [54, 60], "bodi": [54, 60], "4454": 54, "1887": 54, "tabula_muris_liver_dataset_id": 54, "tabula_muris_liver_ob": 54, "awar": 54, "chanc": 54, "priori": [54, 57], "sai": 54, "nk_cell": 54, "80935": 54, "nk_cells_primari": 54, "59109": 54, "aqp5": [54, 57], "adata_primari": 54, "demo": [54, 58], "awai": 54, "8448858": 54, "52812487": 54, "52812553": 54, "52812556": 54, "52812566": 54, "113": 54, "170": 54, "37033": 54, "37052": 54, "36904": 54, "36919": 54, "meaning": 55, "confirm": 55, "easiest": [55, 57], "data_typ": 55, "nmf": 55, "featu": 55, "impli": 55, "anoth": 55, "get_embedding_metadata": 55, "00506592": 55, "01348877": 55, 
"03173828": 55, "02331543": 55, "02404785": 55, "02441406": 55, "00595093": 55, "0065918": 55, "00070572": 55, "00187683": 55, "04663086": 55, "04614258": 55, "115722": 55, "512": [55, 59], "advanc": [55, 59], "portion": 55, "caution": 55, "quit": 55, "500_000": 55, "fail": [55, 59], "embedding_slic": 55, "emb_data": 55, "emb_joinid": 55, "reindex_disable_on_axi": 55, "embedding_presence_mask": 55, "getnnz": 55, "embedding_data": 55, "vstack": 55, "embedding_joinid": 55, "00762939": 55, "00076675": 55, "00047874": 55, "03588867": 55, "00405884": 55, "00239563": 55, "00982666": 55, "00946045": 55, "00473022": 55, "0135498": 55, "01049805": 55, "03051758": 55, "critic": 55, "meaningless": 55, "embedding_metadata": 55, "toward": 55, "ai": 55, "burgeon": 55, "pioneer": 55, "million": 55, "distil": 55, "concern": 55, "transfer": 55, "optim": [55, 61], "superior": 55, "primary_contact": 55, "bo": 55, "wang": 55, "bowang": 55, "vectorinstitut": 55, "affili": 55, "toronto": 55, "additional_contact": 55, "538439": 55, "additional_inform": 55, "62998417": 55, "submission_d": 55, "nonsens": 55, "assert": 55, "laura": 56, "luebbert": 56, "lauraluebbert": 56, "caltech": 56, "edu": 56, "databas": 56, "facilit": [56, 62], "cite": 56, "googl": 56, "colab": 56, "q": 56, "setup": 56, "fri": 56, "jul": 56, "succesfulli": 56, "gget_cellxgen": 56, "speci": 56, "meta_onli": 56, "verbos": 56, "arg": 56, "slc5a1": 56, "ensg00000130234": 56, "ensg00000100170": 56, "ui": 56, "celltyp": 56, "mucu": 56, "neuroendocrin": 56, "canon": 56, "cellular": 56, "reus": 56, "secondari": 56, "portal": 56, "9b94ccb0a2e0a8f6182b213aa4852c491f6f6aff": 56, "backend": 56, "wmg": 56, "tissue_mapp": 56, "abca1": 56, "minut": 56, "3679": 56, "thousand": 56, "ensg00000165029": 56, "11343": 56, "5332": 56, "9739": 56, "24539": 56, "5081": 56, "3674": 56, "3675": 56, "3676": 56, "3677": 56, "3678": 56, "retina": 56, "config": 56, "inlinebackend": 56, "figure_format": 56, "dotplot": 56, "ensmusg00000015405": 56, 
"047d57f2": 56, "4d14": 56, "45de": 56, "aa98": 56, "336c6f583750": 56, "97547": 56, "97548": 56, "97549": 56, "97550": 56, "97551": 56, "97552": 56, "example_adata": 56, "example_meta": 56, "querycondit": 57, "2313": 57, "2308": 57, "2309": 57, "2310": 57, "2311": 57, "2312": 57, "8626": 57, "1884": 57, "27047": 57, "tubb4b": 57, "2037": 57, "materi": 57, "shortli": 57, "comparison": 57, "op": 57, "sex_cell_metadata": 57, "669": 57, "385437": 57, "metatadata": 57, "cell_metadata_all_unknown_sex": 57, "9th": 57, "post": 57, "fertil": 57, "0000046": 57, "decidua": 57, "basali": 57, "0000453": 57, "placenta": 57, "0001987": 57, "3251329": 57, "56274573": 57, "cord": 57, "2000095": 57, "newborn": 57, "0000082": 57, "han": 57, "chines": 57, "0027": 57, "umbil": 57, "0012168": 57, "0000178": 57, "3251330": 57, "56274574": 57, "3251331": 57, "56274575": 57, "3251332": 57, "56274576": 57, "3251333": 57, "56274577": 57, "3251334": 57, "cell_metadata_b_cel": 57, "42720": 57, "10631": 57, "8742": 57, "8187": 57, "2083": 57, "1534": 57, "1512": 57, "1474": 57, "1210": 57, "332": 57, "204": 57, "133": 57, "gene_metadata": 57, "isn": 58, "narrow": 58, "as_index": 58, "0000001": 58, "0000006": 58, "2502": 58, "0000015": 58, "621": 58, "0000019": 58, "608": 58, "4028006": 58, "38250": 58, "609": 58, "4030009": 58, "tubul": 58, "segment": 58, "777": 58, "610": 58, "4030011": 58, "989": 58, "611": 58, "4030018": 58, "princip": 58, "107": [58, 59], "612": 58, "4030023": 58, "hillock": 58, "10170": 58, "semant": 59, "maxmimum": 59, "nois": 59, "disabl": 59, "docstr": 59, "hvgs_df": 59, "highly_variable_rank": 59, "230445": 59, "116": 59, "044863": 59, "749637": 59, "287551": 59, "276809": 59, "461324": 59, "407450": 59, "363945": 59, "055626": 59, "280": 59, "958509": 59, "combined_df": [59, 60], "188": 59, "ensmusg00000026117": 59, "zap70": 59, "2992": 59, "409091": 59, "14793": 59, "026717": 59, "350": 59, "775560": 59, "233": 59, "ensmusg00000026073": 59, "il1r2": 59, "1908": 59, 
"764085": 59, "41918": 59, "471500": 59, "402176": 59, "ensmusg00000026185": 59, "igfbp5": 59, "6006": 59, "234876": 59, "314355": 59, "591239": 59, "156": 59, "825651": 59, "ensmusg00000026180": 59, "cxcr2": 59, "3048": 59, "379390": 59, "10491": 59, "033344": 59, "640129": 59, "30296": 59, "ensmusg00000024803": 59, "ankrd1": 59, "2886": 59, "548572": 59, "274005": 59, "455137": 59, "741864": 59, "30313": 59, "ensmusg00000024987": 59, "cyp26a1": 59, "1983": 59, "186686": 59, "12973": 59, "622003": 59, "454": 59, "580162": 59, "30379": 59, "ensmusg00000018822": 59, "sfrp5": 59, "1900": 59, "927853": 59, "10943": 59, "645525": 59, "410": 59, "637004": 59, "32042": 59, "ensmusg00000031838": 59, "ifi30": 59, "91": 59, "676950": 59, "995276": 59, "564962": 59, "205886": 59, "33314": 59, "ensmusg00000092572": 59, "serpinb10": 59, "3490": 59, "264085": 59, "239812": 59, "487": 59, "535469": 59, "who": 59, "own": 59, "mv_df": 60, "3095357": 60, "915025": 60, "69571": 60, "774917": 60, "3095359": 60, "972801": 60, "9471": 60, "427044": 60, "3095363": 60, "169472": 60, "139042": 60, "208628": 60, "3095366": 60, "049836": 60, "24762": 60, "926397": 60, "3095368": 60, "345415": 60, "150412": 60, "440839": 60, "3278898": 60, "164319": 60, "339741": 60, "3278899": 60, "368339": 60, "930156": 60, "3278900": 60, "246049": 60, "886186": 60, "3278901": 60, "240724": 60, "307266": 60, "3278902": 60, "278420": 60, "086994": 60, "9314": 60, "keratinocyt": [60, 61], "0002337": 60, "mmusdv": 60, "0000089": 60, "18_53_m": 60, "0002097": 60, "18_47_f": 60, "basal": [60, 61], "epidermi": 60, "0002187": 60, "0000091": 60, "epiderm": 60, "0000362": 60, "logist": 61, "regress": 61, "ml": 61, "primer": 61, "census_ml": 61, "experiment_datapip": 61, "10_000": 61, "mechan": 61, "encapsul": 61, "caller": 61, "importantli": 61, "lazili": 61, "avoid": 61, "legaci": 61, "interchang": 61, "shuffler": 61, "layout": 61, "strategi": 61, "held": 61, "1gb": 61, "caus": 61, "valid": 61, "randomsplitt": 61, 
"train_datapip": 61, "test_datapip": 61, "random_split": 61, "weight": 61, "experiment_dataload": 61, "style": 61, "enforc": 61, "linear": 61, "logisticregress": 61, "input_dim": 61, "output_dim": 61, "super": 61, "noqa": 61, "up008": 61, "sigmoid": 61, "train_epoch": 61, "train_dataload": 61, "loss_fn": 61, "devic": 61, "train_loss": 61, "train_correct": 61, "train_tot": 61, "zero_grad": 61, "softmax": 61, "loss": 61, "backward": 61, "train_accuraci": 61, "secondli": 61, "42496620": 61, "42496621": 61, "42496622": 61, "42496633": 61, "42496634": 61, "42496635": 61, "desir": 61, "cuda": 61, "is_avail": 61, "cell_type_encod": 61, "classes_": 61, "crossentropyloss": 61, "adam": 61, "lr": 61, "7f": 61, "accuraci": 61, "4f": 61, "0167253": 61, "4856": 61, "0156710": 61, "4943": 61, "0149408": 61, "4813": 61, "0144469": 61, "5040": 61, "0141749": 61, "5669": 61, "0139776": 61, "6672": 61, "0138565": 61, "7920": 61, "0138094": 61, "8088": 61, "0136689": 61, "8757": 61, "0136101": 61, "8923": 61, "invok": 61, "eval": 61, "recov": 61, "At": 61, "unpickl": 61, "vein": 61, "123": 61, "124": 61, "127": 61, "helper": 62, "vscode": 63, "m6i": 63, "8xlarg": 63, "mount": 63, "nvme": 63, "drive": 63, "swap": 63, "third": 63, "parti": 63, "misc": 63, "soma_typ": 63, "clone": 63, "absent": 64, "paralleliz": 64}, "objects": {"": [[62, 0, 0, "-", "cellxgene_census"]], "cellxgene_census": [[1, 1, 1, "", "download_source_h5ad"], [15, 1, 1, "", "get_anndata"], [16, 1, 1, "", "get_census_version_description"], [17, 1, 1, "", "get_census_version_directory"], [18, 1, 1, "", "get_default_soma_context"], [19, 1, 1, "", "get_presence_matrix"], [20, 1, 1, "", "get_source_h5ad_uri"], [21, 1, 1, "", "open_soma"]], "cellxgene_census.experimental": [[2, 1, 1, "", "get_all_available_embeddings"], [3, 1, 1, "", "get_all_census_versions_with_embedding"], [4, 1, 1, "", "get_embedding"], [5, 1, 1, "", "get_embedding_metadata"], [6, 1, 1, "", "get_embedding_metadata_by_name"]], 
"cellxgene_census.experimental.ml.huggingface": [[7, 2, 1, "", "CellDatasetBuilder"], [8, 2, 1, "", "GeneformerTokenizer"]], "cellxgene_census.experimental.ml.huggingface.CellDatasetBuilder": [[7, 3, 1, "", "__init__"]], "cellxgene_census.experimental.ml.huggingface.GeneformerTokenizer": [[8, 3, 1, "", "__init__"]], "cellxgene_census.experimental.ml.pytorch": [[9, 2, 1, "", "ExperimentDataPipe"], [10, 2, 1, "", "Stats"], [11, 1, 1, "", "experiment_dataloader"]], "cellxgene_census.experimental.ml.pytorch.ExperimentDataPipe": [[9, 3, 1, "", "__init__"]], "cellxgene_census.experimental.ml.pytorch.Stats": [[10, 3, 1, "", "__init__"]], "cellxgene_census.experimental.pp": [[12, 1, 1, "", "get_highly_variable_genes"], [13, 1, 1, "", "highly_variable_genes"], [14, 1, 1, "", "mean_variance"]]}, "objtypes": {"0": "py:module", "1": "py:function", "2": "py:class", "3": "py:method"}, "objnames": {"0": ["py", "module", "Python module"], "1": ["py", "function", "Python function"], "2": ["py", "class", "Python class"], "3": ["py", "method", "Python method"]}, "titleterms": {"api": [0, 27, 28, 38, 59, 60, 62], "document": 0, "cellxgene_censu": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 28, 49, 55], "download_source_h5ad": 1, "experiment": [2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 59, 62], "get_all_available_embed": 2, "get_all_census_versions_with_embed": 3, "get_embed": 4, "get_embedding_metadata": 5, "get_embedding_metadata_by_nam": 6, "ml": [7, 8, 9, 10, 11], "huggingfac": [7, 8], "celldatasetbuild": 7, "geneformertoken": 8, "pytorch": [9, 10, 11, 61], "experimentdatapip": [9, 61], "stat": [10, 25, 48], "experiment_dataload": 11, "pp": [12, 13, 14], "get_highly_variable_gen": [12, 59], "highly_variable_gen": [13, 59], "mean_vari": 14, "get_anndata": [15, 49, 55], "get_census_version_descript": 16, "get_census_version_directori": 17, "get_default_soma_context": 18, "get_presence_matrix": 19, "get_source_h5ad_uri": 20, "open_soma": 21, "what": 
[22, 28, 29, 37, 64], "": [22, 37, 56], "new": [22, 25, 28, 37, 39], "2023": [22, 29], "2024": 22, "r": [23, 27, 30, 32], "packag": [23, 45], "cellxgen": [23, 27, 31, 34, 35, 36, 40, 41, 49, 55, 56], "censu": [23, 25, 26, 27, 28, 29, 31, 33, 34, 35, 36, 37, 38, 39, 40, 41, 43, 44, 45, 46, 48, 50, 52, 53, 54, 55, 57, 60, 61, 62], "v1": 23, "i": [23, 28, 29, 64], "out": [23, 54, 60], "instal": [23, 28, 30, 32, 56, 63], "usag": 23, "made": 23, "possibl": 23, "tiledbsoma": 23, "effici": [23, 24, 32], "access": [23, 25, 27, 49, 55], "singl": [23, 24, 25, 28, 33, 44, 45, 53, 57], "cell": [23, 24, 25, 26, 29, 32, 33, 35, 41, 43, 44, 45, 47, 48, 49, 50, 51, 52, 54, 55, 56, 57, 58, 62], "data": [23, 25, 27, 28, 29, 31, 33, 35, 36, 39, 40, 41, 42, 44, 45, 46, 47, 53, 54, 55, 56, 57, 62], "33m": 23, "from": [23, 28, 42, 43, 44, 45, 53, 56], "easi": 23, "us": [23, 24, 25, 28, 31, 39, 40, 45, 47, 51, 56], "handl": 23, "cloud": 23, "host": [23, 28, 55], "queri": [23, 25, 28, 32, 49, 50, 55, 56, 57], "read": [23, 54], "metadata": [23, 25, 26, 29, 32, 35, 41, 44, 48, 50, 55, 56, 57], "export": [23, 25, 39], "slice": [23, 32, 42, 50, 60, 62], "seurat": [23, 32], "singlecellexperi": [23, 32], "stream": 23, "increment": [23, 51, 60], "chunk": 23, "memori": [24, 32], "implement": 24, "commonli": 24, "method": 24, "calcul": [24, 25, 44, 51, 58, 60], "averag": 24, "varianc": [24, 51, 60], "gene": [24, 25, 35, 41, 42, 44, 46, 48, 51, 52, 56, 57, 59], "express": [24, 44, 46, 53, 56, 57], "across": 24, "million": 24, "how": [24, 25, 27, 28], "work": 24, "exampl": [24, 37, 38, 41, 45, 46, 47, 48, 54, 60], "kra": 24, "aqp4": 24, "lung": [24, 43, 44], "epitheli": 24, "highli": [24, 59], "variabl": [24, 59], "find": [24, 42], "all": [24, 41, 44, 48, 52], "human": [24, 28, 41, 44], "esophagu": 24, "introduc": 25, "normal": [25, 28, 35, 42, 44, 46], "layer": [25, 28, 44], "pre": [25, 58], "statist": 25, "descript": 25, "ad": 25, "librari": 25, "size": 25, "enhanc": 25, "featur": [25, 28, 35, 
62], "exist": 25, "toolkit": 25, "via": [25, 49, 50, 55], "tiledb": [25, 27], "soma": [25, 27, 33, 64], "util": 25, "ob": [25, 26, 35, 54, 56, 57], "var": [25, 35, 57], "help": 25, "u": 25, "improv": 25, "addit": 25, "support": [26, 28, 29], "categor": 26, "potenti": 26, "break": 26, "chang": 26, "identifi": [26, 52], "column": 26, "encod": [26, 35], "cz": [27, 31, 35, 36, 40, 41, 56], "discov": [27, 31, 35, 36, 40, 56], "aw": 27, "avail": [27, 41], "specif": [27, 52], "releas": [27, 29, 31, 36, 40], "version": [27, 29, 35, 62, 63], "cli": 27, "programat": 27, "download": [27, 45, 47, 53], "python": [27, 28, 30, 32, 39, 62, 63], "faq": 28, "why": [28, 54], "should": 28, "contain": 28, "do": 28, "cite": [28, 31, 40], "public": 28, "doe": 28, "have": 28, "embed": [28, 39, 43, 44, 45, 49, 55, 62], "differenti": 28, "other": [28, 45], "tool": [28, 31, 40, 42], "can": 28, "mous": [28, 42], "where": 28, "ar": [28, 54], "retriev": [28, 62], "origin": [28, 53], "h5ad": [28, 53], "dataset": [28, 35, 42, 44, 45, 52, 53, 61], "which": 28, "wa": 28, "built": 28, "increas": 28, "perform": [28, 45], "my": 28, "conda": 28, "ask": 28, "contribut": 28, "get": [28, 62], "an": [28, 49, 54, 55, 56, 61], "arrayschema": 28, "error": 28, "when": [28, 54], "open": [28, 41, 46, 48, 52, 57, 61, 62], "run": 28, "import": [28, 43, 45], "databrick": 28, "long": 29, "term": 29, "lt": 29, "weekli": 29, "latest": [29, 63], "list": 29, "12": 29, "15": 29, "inform": [29, 35, 36], "donor": 29, "count": [29, 35, 41, 51, 58], "embbed": 29, "07": 29, "25": 29, "05": 29, "errata": 29, "duplic": [29, 54], "observ": [29, 43], "is_primary_data": [29, 38], "true": 29, "requir": [30, 43, 45, 47, 50], "capabl": [31, 40], "schema": [31, 33, 35, 40], "question": [31, 40], "feedback": [31, 40], "issu": [31, 40], "come": [31, 40], "soon": [31, 40], "project": [31, 40, 45, 47], "quick": [32, 49, 55], "start": [32, 49, 55], "obtain": 32, "anndata": [32, 49, 50, 54, 55, 56, 62], "object": [32, 33, 56], "summari": 
[33, 35, 41, 44, 58], "info": [33, 41], "census_info": [33, 35], "census_data": [33, 35], "includ": [33, 35, 41], "mirror": 34, "overview": 35, "definit": [35, 36, 43], "speci": 35, "multi": [35, 42], "constraint": 35, "assai": [35, 41, 44], "full": [35, 46, 48], "sequenc": [35, 41, 46], "matrix": [35, 52, 62], "type": [35, 41, 44, 47, 48, 56], "sampl": [35, 43], "repeat": 35, "organ": [35, 41], "census_obj": 35, "somacollect": 35, "somadatafram": 35, "tabl": [35, 38, 41, 53], "summary_cell_count": 35, "somaexperi": 35, "raw": 35, "m": 35, "rna": 35, "x": [35, 51], "somasparsendarrai": 35, "presenc": [35, 52, 62], "feature_dataset_presence_matrix": 35, "changelog": 35, "2": 35, "0": 35, "1": 35, "3": 35, "storag": [36, 49, 55], "polici": 36, "json": 36, "articl": 37, "editori": [37, 38], "guidelin": [37, 38], "locat": 37, "titl": [37, 38], "date": 37, "author": 37, "introduct": [37, 38], "section": [37, 38], "notebook": 38, "vignett": 38, "content": [38, 41, 55], "knowledg": 38, "reinforc": 38, "tutori": 39, "integr": [39, 42], "model": [39, 45, 47, 61], "understand": [39, 41, 54], "analyz": 39, "scalabl": 39, "comput": [39, 51], "machin": [39, 62], "learn": [39, 41, 44, 62], "about": [41, 44], "main": 41, "compon": 41, "each": [41, 52], "number": 41, "microgli": 41, "beyond": [41, 58], "liver": [41, 42], "diseas": [41, 44], "t": 41, "tissu": [41, 43, 44, 56], "fetch": [42, 43, 44, 46, 52, 53, 55, 56, 57, 58], "10x": [42, 45], "genom": 42, "smart": [42, 46], "seq2": 42, "length": [42, 46], "scvi": [42, 47, 49], "inspect": [42, 45], "prior": 42, "batch": 42, "defin": [42, 61], "dataset_id": [42, 51], "donor_id": 42, "assay_ontology_term_id": 42, "suspension_typ": 42, "explor": [43, 44, 46, 53, 58], "biolog": 43, "relev": 43, "cluster": [43, 46], "background": [43, 55], "function": 43, "melanocyt": 43, "ey": 43, "150k": 43, "retin": 43, "bipolar": 43, "neuron": 43, "dopaminerg": 43, "brain": 43, "pulmonari": 43, "ionocyt": 43, "tabula": [43, 54], "sapien": 43, "sex": 
44, "v": 44, "nucleu": 44, "sub": 44, "qc": 44, "metric": 44, "creat": [44, 54, 58, 61], "geneform": [45, 49], "class": [45, 61], "predict": [45, 47, 61], "system": [45, 47], "fine": 45, "tune": 45, "prepar": 45, "subclass": 45, "infer": [45, 47], "load": [45, 49, 55], "token": 45, "result": 45, "gener": [45, 50], "pbmc": 45, "3k": 45, "join": 45, "seq": 46, "account": 46, "valid": 46, "through": 46, "train": [47, 61], "pretrain": 47, "summar": 48, "subset": 48, "select": [48, 56], "value_filt": 48, "collabor": 49, "format": [49, 55], "associ": [49, 55], "obsm": [49, 55], "slot": [49, 55], "experimentaxisqueri": [49, 55], "dens": [49, 55], "numpi": [49, 55], "arrai": [49, 55], "citat": 50, "string": 50, "onlin": 51, "algorithm": 51, "mean": [51, 60], "per": 51, "group": 51, "measur": 52, "id": 52, "sourc": 53, "file": 53, "filter": 54, "muri": 54, "seni": 54, "frame": 54, "core": [54, 60], "oper": 54, "gget": 56, "modul": 56, "set": [56, 63], "up": [56, 63], "plot": 56, "dot": 56, "similar": 56, "those": 56, "shown": 56, "onli": 56, "correspond": 56, "command": 56, "line": 56, "census_summary_cell_count": 58, "datafram": 58, "valu": 58, "The": 60, "explain": 61, "paramet": 61, "split": 61, "dataload": 61, "make": 61, "build": 62, "process": 62, "depend": 63, "environ": 63, "verifi": 63, "your": 63, "develop": 63}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "nbsphinx": 4, "sphinx.ext.intersphinx": 1, "sphinx": 57}, "alltitles": {"API Documentation": [[0, "api-documentation"]], "cellxgene_census.download_source_h5ad": [[1, "cellxgene-census-download-source-h5ad"]], "cellxgene_census.experimental.get_all_available_embeddings": [[2, "cellxgene-census-experimental-get-all-available-embeddings"]], 
"cellxgene_census.experimental.get_all_census_versions_with_embedding": [[3, "cellxgene-census-experimental-get-all-census-versions-with-embedding"]], "cellxgene_census.experimental.get_embedding": [[4, "cellxgene-census-experimental-get-embedding"]], "cellxgene_census.experimental.get_embedding_metadata": [[5, "cellxgene-census-experimental-get-embedding-metadata"]], "cellxgene_census.experimental.get_embedding_metadata_by_name": [[6, "cellxgene-census-experimental-get-embedding-metadata-by-name"]], "cellxgene_census.experimental.ml.huggingface.CellDatasetBuilder": [[7, "cellxgene-census-experimental-ml-huggingface-celldatasetbuilder"]], "cellxgene_census.experimental.ml.huggingface.GeneformerTokenizer": [[8, "cellxgene-census-experimental-ml-huggingface-geneformertokenizer"]], "cellxgene_census.experimental.ml.pytorch.ExperimentDataPipe": [[9, "cellxgene-census-experimental-ml-pytorch-experimentdatapipe"]], "cellxgene_census.experimental.ml.pytorch.Stats": [[10, "cellxgene-census-experimental-ml-pytorch-stats"]], "cellxgene_census.experimental.ml.pytorch.experiment_dataloader": [[11, "cellxgene-census-experimental-ml-pytorch-experiment-dataloader"]], "cellxgene_census.experimental.pp.get_highly_variable_genes": [[12, "cellxgene-census-experimental-pp-get-highly-variable-genes"]], "cellxgene_census.experimental.pp.highly_variable_genes": [[13, "cellxgene-census-experimental-pp-highly-variable-genes"]], "cellxgene_census.experimental.pp.mean_variance": [[14, "cellxgene-census-experimental-pp-mean-variance"]], "cellxgene_census.get_anndata": [[15, "cellxgene-census-get-anndata"]], "cellxgene_census.get_census_version_description": [[16, "cellxgene-census-get-census-version-description"]], "cellxgene_census.get_census_version_directory": [[17, "cellxgene-census-get-census-version-directory"]], "cellxgene_census.get_default_soma_context": [[18, "cellxgene-census-get-default-soma-context"]], "cellxgene_census.get_presence_matrix": [[19, 
"cellxgene-census-get-presence-matrix"]], "cellxgene_census.get_source_h5ad_uri": [[20, "cellxgene-census-get-source-h5ad-uri"]], "cellxgene_census.open_soma": [[21, "cellxgene-census-open-soma"]], "What\u2019s new?": [[22, "what-s-new"]], "2023": [[22, "id1"]], "2024": [[22, "id2"]], "R package cellxgene.census V1 is out!": [[23, "r-package-cellxgene-census-v1-is-out"]], "Installation and usage": [[23, "installation-and-usage"]], "Census R package is made possible by tiledbsoma": [[23, "census-r-package-is-made-possible-by-tiledbsoma"]], "Efficient access to single-cell data for >33M cells from R": [[23, "efficient-access-to-single-cell-data-for-33m-cells-from-r"]], "Easy-to-use handles to the cloud-hosted Census data": [[23, "easy-to-use-handles-to-the-cloud-hosted-census-data"]], "Querying and reading single-cell metadata from Census": [[23, "querying-and-reading-single-cell-metadata-from-census"]], "Exporting Census slices to Seurat and SingleCellExperiment": [[23, "exporting-census-slices-to-seurat-and-singlecellexperiment"]], "Streaming data incrementally in chunks": [[23, "streaming-data-incrementally-in-chunks"]], "Memory-efficient implementations of commonly used single-cell methods": [[24, "memory-efficient-implementations-of-commonly-used-single-cell-methods"]], "Efficient calculation of average and variance gene expression across millions of cells": [[24, "efficient-calculation-of-average-and-variance-gene-expression-across-millions-of-cells"]], "How it works": [[24, "how-it-works"], [24, "id1"]], "Example: KRAS and AQP4 average and variance expression in lung epithelial cells": [[24, "example-kras-and-aqp4-average-and-variance-expression-in-lung-epithelial-cells"]], "Efficient calculation of highly variable genes across millions of cells": [[24, "efficient-calculation-of-highly-variable-genes-across-millions-of-cells"]], "Example: Finding highly variable genes for all cells of the human esophagus": [[24, 
"example-finding-highly-variable-genes-for-all-cells-of-the-human-esophagus"]], "Introducing a normalized layer and pre-calculated cell and gene statistics in Census": [[25, "introducing-a-normalized-layer-and-pre-calculated-cell-and-gene-statistics-in-census"]], "Description of new data added to Census": [[25, "description-of-new-data-added-to-census"]], "Added a new library-size normalized layer": [[25, "added-a-new-library-size-normalized-layer"]], "Enhanced gene metadata": [[25, "enhanced-gene-metadata"]], "Enhanced cell metadata": [[25, "enhanced-cell-metadata"]], "How to use the new features": [[25, "how-to-use-the-new-features"]], "Exporting the normalized data to existing single-cell toolkits": [[25, "exporting-the-normalized-data-to-existing-single-cell-toolkits"]], "Accessing library-size normalized data layer via TileDB-SOMA": [[25, "accessing-library-size-normalized-data-layer-via-tiledb-soma"]], "Utilizing pre-calculated stats for querying obs and var": [[25, "utilizing-pre-calculated-stats-for-querying-obs-and-var"]], "Help us improve these data additions": [[25, "help-us-improve-these-data-additions"]], "Census supports categoricals for cell metadata": [[26, "census-supports-categoricals-for-cell-metadata"]], "Potential breaking changes": [[26, "potential-breaking-changes"]], "Identifying the obs columns encoded as categorical": [[26, "identifying-the-obs-columns-encoded-as-categorical"]], "CZ CELLxGENE Discover Census in AWS": [[27, "cz-cellxgene-discover-census-in-aws"]], "Census data available in AWS": [[27, "census-data-available-in-aws"]], "Data specifications": [[27, "data-specifications"]], "Data release versioning": [[27, "data-release-versioning"]], "How to access AWS Census data": [[27, "how-to-access-aws-census-data"]], "AWS CLI for programatic downloads": [[27, "aws-cli-for-programatic-downloads"]], "CELLxGENE Census API (Python and R)": [[27, "cellxgene-census-api-python-and-r"]], "TileDB-SOMA API (Python and R)": [[27, 
"tiledb-soma-api-python-and-r"]], "FAQ": [[28, "faq"]], "Why should I use the Census?": [[28, "why-should-i-use-the-census"]], "What data is contained in the Census?": [[28, "what-data-is-contained-in-the-census"]], "How do I cite the use of the Census for a publication?": [[28, "how-do-i-cite-the-use-of-the-census-for-a-publication"]], "Why does the Census not have a normalized layer or embeddings?": [[28, "why-does-the-census-not-have-a-normalized-layer-or-embeddings"]], "How does the Census differentiate from other tools?": [[28, "how-does-the-census-differentiate-from-other-tools"]], "Can I query human and mouse data in a single query?": [[28, "can-i-query-human-and-mouse-data-in-a-single-query"]], "Where are the Census data hosted?": [[28, "where-are-the-census-data-hosted"]], "Can I retrieve the original H5AD datasets from which the Census was built?": [[28, "can-i-retrieve-the-original-h5ad-datasets-from-which-the-census-was-built"]], "How can I increase the performance of my queries?": [[28, "how-can-i-increase-the-performance-of-my-queries"]], "Can I use conda to install the Census Python API?": [[28, "can-i-use-conda-to-install-the-census-python-api"]], "How can I ask for support?": [[28, "how-can-i-ask-for-support"]], "How can I ask for new features?": [[28, "how-can-i-ask-for-new-features"]], "How can I contribute my data to the Census?": [[28, "how-can-i-contribute-my-data-to-the-census"]], "Why do I get an ArraySchema error when opening the Census?": [[28, "why-do-i-get-an-arrayschema-error-when-opening-the-census"]], "Why do I get an error when running import cellxgene_census on Databricks?": [[28, "why-do-i-get-an-error-when-running-import-cellxgene-census-on-databricks"]], "Census data releases": [[29, "census-data-releases"]], "What is a Census data release?": [[29, "what-is-a-census-data-release"]], "Long-term supported (LTS) Census releases": [[29, "long-term-supported-lts-census-releases"]], "Weekly Census releases (latest)": [[29, 
"weekly-census-releases-latest"]], "List of LTS Census data releases": [[29, "list-of-lts-census-data-releases"]], "LTS 2023-12-15": [[29, "lts-2023-12-15"]], "Version information": [[29, "version-information"], [29, "id1"], [29, "id4"]], "Cell and donor counts": [[29, "cell-and-donor-counts"], [29, "id2"], [29, "id5"]], "Cell metadata": [[29, "cell-metadata"], [29, "id3"], [29, "id6"], [41, "Cell-metadata"]], "Cell embbedings": [[29, "cell-embbedings"]], "LTS 2023-07-25": [[29, "lts-2023-07-25"]], "LTS 2023-05-15": [[29, "lts-2023-05-15"]], "\ud83d\udd34 Errata \ud83d\udd34": [[29, "errata"]], "Duplicate observations with  is_primary_data = True": [[29, "duplicate-observations-with-is-primary-data-true"]], "Installation": [[30, "installation"], [32, "installation"], [63, "installation"]], "Requirements": [[30, "requirements"], [43, "Requirements"], [45, "Requirements"], [47, "Requirements"], [50, "Requirements"]], "Python": [[30, "python"]], "R": [[30, "r"]], "CZ CELLxGENE Discover Census": [[31, "cz-cellxgene-discover-census"], [40, "cz-cellxgene-discover-census"]], "Citing Census": [[31, "citing-census"], [40, "citing-census"]], "Census Capabilities": [[31, "census-capabilities"], [40, "census-capabilities"]], "Census Data and Schema": [[31, "census-data-and-schema"], [40, "census-data-and-schema"]], "Census Data Releases": [[31, "census-data-releases"], [40, "census-data-releases"]], "Questions, Feedback and Issues": [[31, "questions-feedback-and-issues"], [40, "questions-feedback-and-issues"]], "Coming Soon!": [[31, "coming-soon"], [40, "coming-soon"]], "Projects and Tools Using Census": [[31, "projects-and-tools-using-census"], [40, "projects-and-tools-using-census"]], "Quick start": [[32, "quick-start"], [49, "Quick-start"], [55, "Quick-start"]], "Python quick start": [[32, "python-quick-start"]], "Querying a slice of cell metadata": [[32, "querying-a-slice-of-cell-metadata"], [32, "id1"]], "Obtaining a slice as AnnData": [[32, 
"obtaining-a-slice-as-anndata"]], "Memory-efficient queries": [[32, "memory-efficient-queries"], [32, "id2"]], "R quick start": [[32, "r-quick-start"]], "Obtaining a slice as a Seurat or SingleCellExperiment object": [[32, "obtaining-a-slice-as-a-seurat-or-singlecellexperiment-object"]], "Census data and schema": [[33, "census-data-and-schema"]], "Schema": [[33, "schema"], [35, "schema"]], "Census summary info \"census_info\"": [[33, "census-summary-info-census-info"]], "Census single-cell data \"census_data\"": [[33, "census-single-cell-data-census-data"]], "Data included in the Census": [[33, "data-included-in-the-census"]], "SOMA objects": [[33, "soma-objects"]], "CELLxGENE Census Mirroring": [[34, "cellxgene-census-mirroring"]], "CZ CELLxGENE Discover Census Schema": [[35, "cz-cellxgene-discover-census-schema"]], "Census overview": [[35, "census-overview"]], "Definitions": [[35, "definitions"], [36, "definitions"]], "Census Schema versioning": [[35, "census-schema-versioning"]], "Data included": [[35, "data-included"]], "Species": [[35, "species"]], "Multi-species data constraints": [[35, "multi-species-data-constraints"]], "Assays": [[35, "assays"], [44, "Assays"]], "Full-gene sequencing assays": [[35, "full-gene-sequencing-assays"]], "Data matrix types": [[35, "data-matrix-types"]], "Sample types": [[35, "sample-types"]], "Repeated data": [[35, "repeated-data"]], "Data encoding and organization": [[35, "data-encoding-and-organization"]], "Census information census_obj[\"census_info\"] - SOMACollection": [[35, "census-information-census-obj-census-info-somacollection"]], "Census metadata \u2013 census_obj\u200b\u200b[\"census_info\"][\"summary\"] \u2013 SOMADataFrame": [[35, "census-metadata-census-obj-census-info-summary-somadataframe"]], "Census table of CELLxGENE Discover datasets \u2013 census_obj[\"census_info\"][\"datasets\"] \u2013 SOMADataFrame": [[35, "census-table-of-cellxgene-discover-datasets-census-obj-census-info-datasets-somadataframe"]], 
"Census summary cell counts  \u2013 census_obj[\"census_info\"][\"summary_cell_counts\"] \u2013 SOMADataframe": [[35, "census-summary-cell-counts-census-obj-census-info-summary-cell-counts-somadataframe"]], "Census table of organisms  \u2013 census_obj[\"census_info\"][\"organisms\"] \u2013 SOMADataframe": [[35, "census-table-of-organisms-census-obj-census-info-organisms-somadataframe"]], "Census Data \u2013 census_obj[\"census_data\"][organism] \u2013 SOMAExperiment": [[35, "census-data-census-obj-census-data-organism-somaexperiment"]], "Matrix Data, count (raw) matrix \u2013 census_obj[\"census_data\"][organism].ms[\"RNA\"].X[\"raw\"] \u2013 SOMASparseNDArray": [[35, "matrix-data-count-raw-matrix-census-obj-census-data-organism-ms-rna-x-raw-somasparsendarray"]], "Matrix Data, normalized count matrix \u2013 census_obj[\"census_data\"][organism].ms[\"RNA\"].X[\"normalized\"] \u2013 SOMASparseNDArray": [[35, "matrix-data-normalized-count-matrix-census-obj-census-data-organism-ms-rna-x-normalized-somasparsendarray"]], "Feature metadata \u2013 census_obj[\"census_data\"][organism].ms[\"RNA\"].var \u2013 SOMADataFrame": [[35, "feature-metadata-census-obj-census-data-organism-ms-rna-var-somadataframe"]], "Feature dataset presence matrix \u2013 census_obj[\"census_data\"][organism].ms[\"RNA\"][\"feature_dataset_presence_matrix\"] \u2013 SOMASparseNDArray": [[35, "feature-dataset-presence-matrix-census-obj-census-data-organism-ms-rna-feature-dataset-presence-matrix-somasparsendarray"]], "Cell metadata \u2013 census_obj[\"census_data\"][organism].obs \u2013 SOMADataFrame": [[35, "cell-metadata-census-obj-census-data-organism-obs-somadataframe"]], "Changelog": [[35, "changelog"]], "Version 2.0.0": [[35, "version-2-0-0"]], "Version 1.3.0": [[35, "version-1-3-0"]], "Version 1.2.0": [[35, "version-1-2-0"]], "Version 1.1.0": [[35, "version-1-1-0"]], "Version 1.0.0": [[35, "version-1-0-0"]], "Version 0.1.1": [[35, "version-0-1-1"]], "Version 0.1.0": [[35, "version-0-1-0"]], 
"Version 0.0.1": [[35, "version-0-0-1"]], "CZ CELLxGENE Discover Census storage & release policy": [[36, "cz-cellxgene-discover-census-storage-release-policy"]], "Census data storage policy": [[36, "census-data-storage-policy"]], "Census release information json": [[36, "census-release-information-json"]], "Census \u201cwhat\u2019s new?\u201d article editorial guidelines": [[37, "census-what-s-new-article-editorial-guidelines"]], "Location": [[37, "location"]], "Guidelines": [[37, "guidelines"], [38, "guidelines"]], "Title": [[37, "title"], [38, "title"]], "Date & author": [[37, "date-author"]], "Introduction": [[37, "introduction"], [38, "introduction"]], "Sections": [[37, "sections"], [38, "sections"]], "Example article": [[37, "example-article"]], "Census API notebook/vignette editorial guidelines": [[38, "census-api-notebook-vignette-editorial-guidelines"]], "Table of Contents": [[38, "table-of-contents"]], "is_primary_data knowledge reinforcement": [[38, "is-primary-data-knowledge-reinforcement"]], "Example notebook/vignette": [[38, "example-notebook-vignette"]], "Python tutorials": [[39, "python-tutorials"]], "Exporting data": [[39, "exporting-data"]], "[NEW! 
\ud83d\ude80] Using integrated embeddings and models": [[39, "new-using-integrated-embeddings-and-models"]], "Understanding Census data": [[39, "understanding-census-data"]], "Analyzing Census data": [[39, "analyzing-census-data"]], "Scalable computing": [[39, "scalable-computing"]], "Scalable machine learning": [[39, "scalable-machine-learning"]], "Learning about the CZ CELLxGENE Census": [[41, "Learning-about-the-CZ-CELLxGENE-Census"]], "Opening the Census": [[41, "Opening-the-Census"], [48, "Opening-the-Census"], [52, "Opening-the-Census"]], "Census organization": [[41, "Census-organization"]], "Main Census components": [[41, "Main-Census-components"]], "Census summary info": [[41, "Census-summary-info"]], "Census data": [[41, "Census-data"]], "Gene metadata": [[41, "Gene-metadata"]], "Census summary content tables": [[41, "Census-summary-content-tables"]], "Cell counts by cell metadata": [[41, "Cell-counts-by-cell-metadata"]], "Example: cell metadata included in the summary counts table": [[41, "Example:-cell-metadata-included-in-the-summary-counts-table"]], "Example: cell counts for each sequencing assay in human data": [[41, "Example:-cell-counts-for-each-sequencing-assay-in-human-data"]], "Example: number of microglial cells in the Census": [[41, "Example:-number-of-microglial-cells-in-the-Census"]], "Understanding Census contents beyond the summary tables": [[41, "Understanding-Census-contents-beyond-the-summary-tables"]], "Example: all cell types available in human": [[41, "Example:-all-cell-types-available-in-human"]], "Example: cell types available in human liver": [[41, "Example:-cell-types-available-in-human-liver"]], "Example: diseased T cells in human tissues": [[41, "Example:-diseased-T-cells-in-human-tissues"]], "Integrating multi-dataset slices of data": [[42, "Integrating-multi-dataset-slices-of-data"]], "Finding and fetching data from mouse liver (10X Genomics and Smart-Seq2)": [[42, 
"Finding-and-fetching-data-from-mouse-liver-(10X-Genomics-and-Smart-Seq2)"]], "Gene-length normalization of Smart-Seq2 data.": [[42, "Gene-length-normalization-of-Smart-Seq2-data."]], "Integration with scvi-tools": [[42, "Integration-with-scvi-tools"]], "Inspecting data prior to integration": [[42, "Inspecting-data-prior-to-integration"]], "Data integration with scVI": [[42, "Data-integration-with-scVI"]], "Integration with batch defined as dataset_id": [[42, "Integration-with-batch-defined-as-dataset_id"]], "Integration with batch defined as dataset_id + donor_id": [[42, "Integration-with-batch-defined-as-dataset_id-+-donor_id"]], "Integration with batch defined as dataset_id + donor_id + assay_ontology_term_id + suspension_type": [[42, "Integration-with-batch-defined-as-dataset_id-+-donor_id-+-assay_ontology_term_id-+-suspension_type"]], "Exploring biologically relevant clusters in Census embeddings": [[43, "Exploring-biologically-relevant-clusters-in-Census-embeddings"]], "Background": [[43, "Background"], [55, "Background"]], "Imports and function definitions": [[43, "Imports-and-function-definitions"]], "Melanocytes in eye": [[43, "Melanocytes-in-eye"]], "Sample and fetch 150k cells from eye tissue": [[43, "Sample-and-fetch-150k-cells-from-eye-tissue"]], "Observations": [[43, "Observations"], [43, "id1"], [43, "id2"]], "Retinal bipolar neurons in eye": [[43, "Retinal-bipolar-neurons-in-eye"]], "Dopaminergic neurons in brain": [[43, "Dopaminergic-neurons-in-brain"]], "Sample and fetch 150k cells from brain tissue": [[43, "Sample-and-fetch-150k-cells-from-brain-tissue"]], "Pulmonary ionocytes in lung (Tabula Sapiens)": [[43, "Pulmonary-ionocytes-in-lung-(Tabula-Sapiens)"]], "Fetch lung cells from Tabula Sapiens": [[43, "Fetch-lung-cells-from-Tabula-Sapiens"]], "Exploring all data from a tissue": [[44, "Exploring-all-data-from-a-tissue"]], "Learning about the lung data in the Census": [[44, "Learning-about-the-lung-data-in-the-Census"]], "Learning about cells of 
lung data": [[44, "Learning-about-cells-of-lung-data"]], "Datasets": [[44, "Datasets"]], "Disease": [[44, "Disease"]], "Sex": [[44, "Sex"]], "Cell vs nucleus": [[44, "Cell-vs-nucleus"]], "Cell types": [[44, "Cell-types"]], "Sub-tissues": [[44, "Sub-tissues"]], "Learning about genes of lung data": [[44, "Learning-about-genes-of-lung-data"]], "Summary of lung metadata": [[44, "Summary-of-lung-metadata"]], "Fetching all single-cell human lung data from the Census": [[44, "Fetching-all-single-cell-human-lung-data-from-the-Census"]], "Calculating QC metrics of the lung data": [[44, "Calculating-QC-metrics-of-the-lung-data"]], "Creating a normalized expression layer and embeddings": [[44, "Creating-a-normalized-expression-layer-and-embeddings"]], "Geneformer for cell class prediction and data projection": [[45, "Geneformer-for-cell-class-prediction-and-data-projection"]], "System requirements": [[45, "System-requirements"], [47, "System-requirements"]], "Downloading example data": [[45, "Downloading-example-data"], [47, "Downloading-example-data"]], "Downloading the fine-tuned Geneformer model": [[45, "Downloading-the-fine-tuned-Geneformer-model"]], "Importing required packages": [[45, "Importing-required-packages"]], "Preparing data and model": [[45, "Preparing-data-and-model"]], "Preparing single-cell data": [[45, "Preparing-single-cell-data"]], "Preparing data from model": [[45, "Preparing-data-from-model"]], "Using the Geneformer fine-tuned model for cell subclass inference": [[45, "Using-the-Geneformer-fine-tuned-model-for-cell-subclass-inference"]], "Loading tokenized data": [[45, "Loading-tokenized-data"]], "Performing inference of cell subclass": [[45, "Performing-inference-of-cell-subclass"]], "Inspecting inference results": [[45, "Inspecting-inference-results"]], "Using the Geneformer fine-tuned model for data projection": [[45, "Using-the-Geneformer-fine-tuned-model-for-data-projection"]], "Generating Geneformer embeddings for 10X PBMC 3K data": [[45, 
"Generating-Geneformer-embeddings-for-10X-PBMC-3K-data"]], "Joining Geneformer embeddings from 10X PBMC 3K data with other Census datasets": [[45, "Joining-Geneformer-embeddings-from-10X-PBMC-3K-data-with-other-Census-datasets"]], "Normalizing full-length gene sequencing data": [[46, "Normalizing-full-length-gene-sequencing-data"]], "Opening the census": [[46, "Opening-the-census"], [57, "Opening-the-census"]], "Fetching full-length example sequencing data (Smart-Seq)": [[46, "Fetching-full-length-example-sequencing-data-(Smart-Seq)"]], "Normalizing expression to account for gene length": [[46, "Normalizing-expression-to-account-for-gene-length"]], "Validation through clustering exploration": [[46, "Validation-through-clustering-exploration"]], "scVI for cell type prediction and data projection": [[47, "scVI-for-cell-type-prediction-and-data-projection"]], "Downloading the trained scVI model": [[47, "Downloading-the-trained-scVI-model"]], "Using the scVI pretrained model for data projection": [[47, "Using-the-scVI-pretrained-model-for-data-projection"]], "Using the scVI pretrained model for cell cell type inference.": [[47, "Using-the-scVI-pretrained-model-for-cell-cell-type-inference."]], "Summarizing cell and gene metadata": [[48, "Summarizing-cell-and-gene-metadata"]], "Summarizing cell metadata": [[48, "Summarizing-cell-metadata"]], "Example: Summarize all cell types": [[48, "Example:-Summarize-all-cell-types"]], "Example: Summarize a subset of cell types, selected with a value_filter": [[48, "Example:-Summarize-a-subset-of-cell-types,-selected-with-a-value_filter"]], "Full Census metadata stats": [[48, "Full-Census-metadata-stats"]], "Access CELLxGENE collaboration embeddings (scVI, Geneformer)": [[49, "Access-CELLxGENE-collaboration-embeddings-(scVI,-Geneformer)"]], "Storage format": [[49, "Storage-format"], [55, "Storage-format"]], "Query cells and load associated embeddings": [[49, "Query-cells-and-load-associated-embeddings"], [55, 
"Query-cells-and-load-associated-embeddings"]], "Loading embeddings into an AnnData obsm slot": [[49, "Loading-embeddings-into-an-AnnData-obsm-slot"]], "AnnData embeddings via cellxgene_census.get_anndata()": [[49, "AnnData-embeddings-via-cellxgene_census.get_anndata()"], [55, "AnnData-embeddings-via-cellxgene_census.get_anndata()"]], "AnnData embeddings via ExperimentAxisQuery": [[49, "AnnData-embeddings-via-ExperimentAxisQuery"], [55, "AnnData-embeddings-via-ExperimentAxisQuery"]], "Load an embedding into a dense NumPy array": [[49, "Load-an-embedding-into-a-dense-NumPy-array"], [55, "Load-an-embedding-into-a-dense-NumPy-array"]], "Generating citations for Census slices": [[50, "Generating-citations-for-Census-slices"]], "Generating citation strings": [[50, "Generating-citation-strings"]], "Via cell metadata query": [[50, "Via-cell-metadata-query"]], "Via AnnData query": [[50, "Via-AnnData-query"]], "Computing on X using online (incremental) algorithms": [[51, "Computing-on-X-using-online-(incremental)-algorithms"]], "Incremental count and mean calculation.": [[51, "Incremental-count-and-mean-calculation."]], "Incremental variance calculation": [[51, "Incremental-variance-calculation"]], "Counting cells per gene, grouped by dataset_id": [[51, "Counting-cells-per-gene,-grouped-by-dataset_id"]], "Genes measured in each cell (dataset presence matrix)": [[52, "Genes-measured-in-each-cell-(dataset-presence-matrix)"]], "Fetching the IDs of the Census datasets": [[52, "Fetching-the-IDs-of-the-Census-datasets"]], "Fetching the dataset presence matrix": [[52, "Fetching-the-dataset-presence-matrix"]], "Identifying genes measured in a specific dataset.": [[52, "Identifying-genes-measured-in-a-specific-dataset."]], "Identifying datasets that measured specific genes": [[52, "Identifying-datasets-that-measured-specific-genes"]], "Identifying all genes measured in a dataset": [[52, "Identifying-all-genes-measured-in-a-dataset"]], "Exploring the Census Datasets table": [[53, 
"Exploring-the-Census-Datasets-table"]], "Fetching the datasets table": [[53, "Fetching-the-datasets-table"]], "Fetching the expression data from a single dataset": [[53, "Fetching-the-expression-data-from-a-single-dataset"]], "Downloading the original source H5AD file of a dataset.": [[53, "Downloading-the-original-source-H5AD-file-of-a-dataset."]], "Understanding and filtering out duplicate cells": [[54, "Understanding-and-filtering-out-duplicate-cells"]], "Why are there duplicate cells in the Census?": [[54, "Why-are-there-duplicate-cells-in-the-Census?"]], "An example: duplicate cells in the Tabula Muris Senis data": [[54, "An-example:-duplicate-cells-in-the-Tabula-Muris-Senis-data"]], "Filtering out duplicate cells": [[54, "Filtering-out-duplicate-cells"]], "Filtering out duplicate cells when reading the obs data frame.": [[54, "Filtering-out-duplicate-cells-when-reading-the-obs-data-frame."]], "Filtering out duplicate cells when creating an AnnData": [[54, "Filtering-out-duplicate-cells-when-creating-an-AnnData"]], "Filtering out duplicate cells for out-of-core operations.": [[54, "Filtering-out-duplicate-cells-for-out-of-core-operations."]], "Access CELLxGENE-hosted embeddings": [[55, "Access-CELLxGENE-hosted-embeddings"]], "Contents": [[55, "Contents"]], "Load an embedding into an AnnData obsm slot": [[55, "Load-an-embedding-into-an-AnnData-obsm-slot"]], "Load embeddings and fetch associated Census data": [[55, "Load-embeddings-and-fetch-associated-Census-data"]], "Embedding Metadata": [[55, "Embedding-Metadata"]], "Querying data using the gget cellxgene module": [[56, "Querying-data-using-the-gget-cellxgene-module"]], "Install gget and set up cellxgene module": [[56, "Install-gget-and-set-up-cellxgene-module"]], "Fetch an AnnData object by selecting gene(s), tissue(s) and cell type(s)": [[56, "Fetch-an-AnnData-object-by-selecting-gene(s),-tissue(s)-and-cell-type(s)"]], "Plot a dot plot similar to those shown on the CZ CELLxGENE Discover Gene Expression": 
[[56, "Plot-a-dot-plot-similar-to-those-shown-on-the-CZ-CELLxGENE-Discover-Gene-Expression"]], "Fetch only cell metadata (corresponds to AnnData.obs)": [[56, "Fetch-only-cell-metadata-(corresponds-to-AnnData.obs)"]], "Use gget cellxgene from the command line": [[56, "Use-gget-cellxgene-from-the-command-line"]], "Querying and fetching the single-cell data and cell/gene metadata.": [[57, "Querying-and-fetching-the-single-cell-data-and-cell/gene-metadata."]], "Querying expression data": [[57, "Querying-expression-data"]], "Querying cell metadata (obs)": [[57, "Querying-cell-metadata-(obs)"]], "Querying gene metadata (var)": [[57, "Querying-gene-metadata-(var)"]], "Exploring pre-calculated summary cell counts": [[58, "Exploring-pre-calculated-summary-cell-counts"]], "Fetching the census_summary_cell_counts dataframe": [[58, "Fetching-the-census_summary_cell_counts-dataframe"]], "Creating summary counts beyond pre-calculated values.": [[58, "Creating-summary-counts-beyond-pre-calculated-values."]], "Experimental Highly Variable Genes API": [[59, "Experimental-Highly-Variable-Genes-API"]], "get_highly_variable_genes": [[59, "get_highly_variable_genes"]], "highly_variable_genes": [[59, "highly_variable_genes"]], "Out-of-core (incremental) mean and variance calculation": [[60, "Out-of-core-(incremental)-mean-and-variance-calculation"]], "The mean and variance API": [[60, "The-mean-and-variance-API"]], "Example: calculate mean and variance for a slice of the Census": [[60, "Example:-calculate-mean-and-variance-for-a-slice-of-the-Census"]], "Training a PyTorch Model": [[61, "Training-a-PyTorch-Model"]], "Open the Census": [[61, "Open-the-Census"]], "Create an ExperimentDataPipe": [[61, "Create-an-ExperimentDataPipe"]], "ExperimentDataPipe class explained": [[61, "ExperimentDataPipe-class-explained"]], "ExperimentDataPipe parameters explained": [[61, "ExperimentDataPipe-parameters-explained"]], "Split the dataset": [[61, "Split-the-dataset"]], "Create the DataLoader": [[61, 
"Create-the-DataLoader"]], "Define the model": [[61, "Define-the-model"]], "Train the model": [[61, "Train-the-model"]], "Make predictions with the model": [[61, "Make-predictions-with-the-model"]], "Python API": [[62, "module-cellxgene_census"]], "Open/retrieve Cell Census data": [[62, "open-retrieve-cell-census-data"]], "Get slice as AnnData": [[62, "get-slice-as-anndata"]], "Feature presence matrix": [[62, "feature-presence-matrix"]], "Versioning of Cell Census builds": [[62, "versioning-of-cell-census-builds"]], "Experimental: Machine Learning": [[62, "experimental-machine-learning"]], "Experimental: Processing": [[62, "experimental-processing"]], "Experimental: Embeddings": [[62, "experimental-embeddings"]], "Dependencies": [[63, "dependencies"]], "Set up Python environment": [[63, "set-up-python-environment"]], "Verify your installation": [[63, "verify-your-installation"]], "Latest development version": [[63, "latest-development-version"]], "What is SOMA": [[64, "what-is-soma"]]}, "indexentries": {"download_source_h5ad() (in module cellxgene_census)": [[1, "cellxgene_census.download_source_h5ad"]], "get_all_available_embeddings() (in module cellxgene_census.experimental)": [[2, "cellxgene_census.experimental.get_all_available_embeddings"]], "get_all_census_versions_with_embedding() (in module cellxgene_census.experimental)": [[3, "cellxgene_census.experimental.get_all_census_versions_with_embedding"]], "get_embedding() (in module cellxgene_census.experimental)": [[4, "cellxgene_census.experimental.get_embedding"]], "get_embedding_metadata() (in module cellxgene_census.experimental)": [[5, "cellxgene_census.experimental.get_embedding_metadata"]], "get_embedding_metadata_by_name() (in module cellxgene_census.experimental)": [[6, "cellxgene_census.experimental.get_embedding_metadata_by_name"]], "celldatasetbuilder (class in cellxgene_census.experimental.ml.huggingface)": [[7, "cellxgene_census.experimental.ml.huggingface.CellDatasetBuilder"]], "__init__() 
(cellxgene_census.experimental.ml.huggingface.celldatasetbuilder method)": [[7, "cellxgene_census.experimental.ml.huggingface.CellDatasetBuilder.__init__"]], "geneformertokenizer (class in cellxgene_census.experimental.ml.huggingface)": [[8, "cellxgene_census.experimental.ml.huggingface.GeneformerTokenizer"]], "__init__() (cellxgene_census.experimental.ml.huggingface.geneformertokenizer method)": [[8, "cellxgene_census.experimental.ml.huggingface.GeneformerTokenizer.__init__"]], "experimentdatapipe (class in cellxgene_census.experimental.ml.pytorch)": [[9, "cellxgene_census.experimental.ml.pytorch.ExperimentDataPipe"]], "__init__() (cellxgene_census.experimental.ml.pytorch.experimentdatapipe method)": [[9, "cellxgene_census.experimental.ml.pytorch.ExperimentDataPipe.__init__"]], "stats (class in cellxgene_census.experimental.ml.pytorch)": [[10, "cellxgene_census.experimental.ml.pytorch.Stats"]], "__init__() (cellxgene_census.experimental.ml.pytorch.stats method)": [[10, "cellxgene_census.experimental.ml.pytorch.Stats.__init__"]], "experiment_dataloader() (in module cellxgene_census.experimental.ml.pytorch)": [[11, "cellxgene_census.experimental.ml.pytorch.experiment_dataloader"]], "get_highly_variable_genes() (in module cellxgene_census.experimental.pp)": [[12, "cellxgene_census.experimental.pp.get_highly_variable_genes"]], "highly_variable_genes() (in module cellxgene_census.experimental.pp)": [[13, "cellxgene_census.experimental.pp.highly_variable_genes"]], "mean_variance() (in module cellxgene_census.experimental.pp)": [[14, "cellxgene_census.experimental.pp.mean_variance"]], "get_anndata() (in module cellxgene_census)": [[15, "cellxgene_census.get_anndata"]], "get_census_version_description() (in module cellxgene_census)": [[16, "cellxgene_census.get_census_version_description"]], "get_census_version_directory() (in module cellxgene_census)": [[17, "cellxgene_census.get_census_version_directory"]], "get_default_soma_context() (in module cellxgene_census)": 
[[18, "cellxgene_census.get_default_soma_context"]], "get_presence_matrix() (in module cellxgene_census)": [[19, "cellxgene_census.get_presence_matrix"]], "get_source_h5ad_uri() (in module cellxgene_census)": [[20, "cellxgene_census.get_source_h5ad_uri"]], "open_soma() (in module cellxgene_census)": [[21, "cellxgene_census.open_soma"]], "cellxgene_census": [[62, "module-cellxgene_census"]], "module": [[62, "module-cellxgene_census"]]}})
            \ No newline at end of file