From 00e805dd585cb9cd288461eb483a5f4bdb04eb83 Mon Sep 17 00:00:00 2001 From: "github-actions[bot]" Date: Fri, 16 Aug 2024 23:10:02 +0000 Subject: [PATCH] Render course --- docs/02-data-structures.md | 8 +++++++- docs/404.html | 1 + docs/Introduction-to-Python.docx | Bin 390288 -> 390446 bytes docs/about-the-authors.html | 1 + docs/index.html | 1 + docs/intro-to-computing.html | 1 + docs/reference-keys.txt | 1 + docs/references.html | 1 + docs/search_index.json | 2 +- docs/working-with-data-structures.html | 10 ++++++++-- 10 files changed, 22 insertions(+), 4 deletions(-) diff --git a/docs/02-data-structures.md b/docs/02-data-structures.md index 55b0508..0111df1 100644 --- a/docs/02-data-structures.md +++ b/docs/02-data-structures.md @@ -111,7 +111,7 @@ The list data structure has an organization and functionality that metaphoricall And if it "makes sense" to us, then it is well-designed. -The list data structure we have been working with is an example of an **Object**. The definition of an object allows us to ask the questions above: what does it contain, and what can it do. It is an organizational tool for a collection of data and functions that we can relate to. Formally, an object contains the following: +The list data structure we have been working with is an example of an **Object**. The definition of an object allows us to ask the questions above: what does it contain, and what can it do? It is an organizational tool for a collection of data and functions that we can relate to. Formally, an object contains the following: - **Value** that holds the essential data for the object. @@ -131,6 +131,8 @@ Let's see how this applies to the list: Object methods are functions that does something with the object you are using it on. You should think about `chrNum.count(2)` as a function that takes in `chrNum` and `2` as inputs. If you want to use the count function on list `mixedList`, you would use `mixedList.count(x)`. 
+Here are some more examples of methods with lists: + | Function method | What it takes in | What it does | Returns | |----------------|----------------|-------------------------------------|------------------| | `chrNum.count(x)` | list `chrNum`, data type `x` | Counts the number of instances `x` appears as an element of `chrNum`. | Integer | @@ -349,3 +351,7 @@ metadata.iloc[5:, [1, 10, 21]] This is a great way to start thinking about subsetting your dataframes for analysis, but this way of of subsetting can lead to some inconsistencies in the long run. For instance, suppose your collaborator added a new cell line to the metadata and changed the order of the column. Then your code to subset the last 5 rows and the columns will get you a different answer once the spreadsheet is changed. The second way is to subset by the column name, and this is much more preferred in data analysis practice. You will learn about it next week! + +## Exercises + +Exercise for week 2 can be found [here](https://colab.research.google.com/drive/1oIL3gKEZR2Lq16k6XY0HXIhjYl34pEjr?usp=sharing). diff --git a/docs/404.html b/docs/404.html index ad25d9c..21841c8 100644 --- a/docs/404.html +++ b/docs/404.html @@ -178,6 +178,7 @@
  • 2.3.1 What does a Dataframe contain (in terms of data)?
  • 2.3.2 What can a Dataframe do (in terms of operations and functions)?
  • +
  • 2.4 Exercises
  • About the Authors
  • 3 References
  • diff --git a/docs/Introduction-to-Python.docx b/docs/Introduction-to-Python.docx index ac612d902c42de1faae903bde6d863038436c054..bac65485f861111673b4d2d027708132aa161a3f 100644 GIT binary patch [base85-encoded binary delta omitted; Bin 390288 -> 390446 bytes] diff --git a/docs/about-the-authors.html b/docs/about-the-authors.html index ce51ed7..6b6f1cb 100644 --- a/docs/about-the-authors.html +++ b/docs/about-the-authors.html @@ -178,6 +178,7 @@
  • 2.3.1 What does a Dataframe contain (in terms of data)?
  • 2.3.2 What can a Dataframe do (in terms of operations and functions)?
  • +
  • 2.4 Exercises
  • About the Authors
  • 3 References
  • diff --git a/docs/index.html b/docs/index.html index db7af10..1768266 100644 --- a/docs/index.html +++ b/docs/index.html @@ -178,6 +178,7 @@
  • 2.3.1 What does a Dataframe contain (in terms of data)?
  • 2.3.2 What can a Dataframe do (in terms of operations and functions)?
  • +
  • 2.4 Exercises
  • About the Authors
  • 3 References
  • diff --git a/docs/intro-to-computing.html b/docs/intro-to-computing.html index 78b63d6..eaafaad 100644 --- a/docs/intro-to-computing.html +++ b/docs/intro-to-computing.html @@ -178,6 +178,7 @@
  • 2.3.1 What does a Dataframe contain (in terms of data)?
  • 2.3.2 What can a Dataframe do (in terms of operations and functions)?
  • +
  • 2.4 Exercises
  • About the Authors
  • 3 References
  • diff --git a/docs/reference-keys.txt b/docs/reference-keys.txt index 4bc5ecf..80c2935 100644 --- a/docs/reference-keys.txt +++ b/docs/reference-keys.txt @@ -24,4 +24,5 @@ dataframes what-does-a-dataframe-contain-in-terms-of-data what-can-a-dataframe-do-in-terms-of-operations-and-functions subsetting-dataframes +exercises-1 references diff --git a/docs/references.html b/docs/references.html index 8282dc6..a23e8e6 100644 --- a/docs/references.html +++ b/docs/references.html @@ -178,6 +178,7 @@
  • 2.3.1 What does a Dataframe contain (in terms of data)?
  • 2.3.2 What can a Dataframe do (in terms of operations and functions)?
  • +
  • 2.4 Exercises
  • About the Authors
  • 3 References
  • diff --git a/docs/search_index.json b/docs/search_index.json index ef8898a..9734a06 100644 --- a/docs/search_index.json +++ b/docs/search_index.json @@ -1 +1 @@ -[["index.html", "Introduction to Python About this Course 0.1 Curriculum 0.2 Target Audience 0.3 Learning Objectives 0.4 Offerings", " Introduction to Python August, 2024 About this Course 0.1 Curriculum The course covers fundamentals of Python, a high-level programming language, and use it to wrangle data for analysis and visualization. 0.2 Target Audience The course is intended for researchers who want to learn coding for the first time with a data science application via the Python language. This course is also appropriate for folks who have explored data science or programming on their own and want to focus on some fundamentals. 0.3 Learning Objectives Analyze Tidy datasets in the Python programming language via data subsetting, joining, and transformations. Evaluate summary statistics and data visualization to understand scientific questions. Describe how the Python programming environment interpret complex expressions made out of functions, operations, and data structures, in a step-by-step way. Apply problem solving strategies to debug broken code. 0.4 Offerings This course is taught on a regular basis at Fred Hutch Cancer Center through the Data Science Lab. Announcements of course offering can be found here. "],["intro-to-computing.html", "Chapter 1 Intro to Computing 1.1 Goals of the course 1.2 What is a computer program? 1.3 A programming language has following elements: 1.4 Google Colab Setup 1.5 Grammar Structure 1: Evaluation of Expressions 1.6 Grammar Structure 2: Storing data types in the Variable Environment 1.7 Grammar Structure 3: Evaluation of Functions 1.8 Tips on writing your first code 1.9 Exercises", " Chapter 1 Intro to Computing Welcome to Introduction to Python! Each week, we cover a chapter, which consists of a lesson and exercise. 
In our first week together, we will look at big conceptual themes in programming, see how code is run, and learn some basic grammar structures of programming. 1.1 Goals of the course In the next 6 weeks, we will explore: Fundamental concepts in high-level programming languages (Python, R, Julia, etc.) that is transferable: How do programs run, and how do we solve problems using functions and data structures? Beginning of data science fundamentals: How do you translate your scientific question to a data wrangling problem and answer it? Data science workflow. Image source: R for Data Science. Find a nice balance between the two throughout the course: we will try to reproduce a figure from a scientific publication using new data. 1.2 What is a computer program? A sequence of instructions to manipulate data for the computer to execute. A series of translations: English <-> Programming Code for Interpreter <-> Machine Code for Central Processing Unit (CPU) We will focus on English <-> Programming Code for Python Interpreter in this class. More importantly: How we organize ideas <-> Instructing a computer to do something. 1.3 A programming language has following elements: Grammar structure to construct expressions; combining expressions to create more complex expressions Encapsulate complex expressions via functions to create modular and reusable tasks Encapsulate complex data via data structures to allow efficient manipulation of data 1.4 Google Colab Setup Google Colab is a Integrated Development Environment (IDE) on a web browser. Think about it as Microsoft Word to a plain text editor. It provides extra bells and whistles to using Python that is easier for the user. Let’s open up the KRAS analysis in Google Colab. If you are taking this course while it is in session, the project name is probably named “KRAS Demo” in your Google Classroom workspace. If you are taking this course on your own time, you can view it here. 
Today, we will pay close attention to: Python Console (“Executions”): Open it via View -> Executed code history. You give it one line of Python code, and the console executes that single line of code; you give it a single piece of instruction, and it executes it for you. Notebook: in the central panel of the website, you will see Python code interspersed with word document text. This is called a Python Notebook (other similar services include Jupyter Notebook, iPython Notebook), which has chunks of plain text and Python code, and it helps us understand better the code we are writing. Variable Environment: Open it by clicking on the “{x}” button on the left-hand panel. Often, your code will store information in the Variable Environment, so that information can be reused. For instance, we often load in data and store it in the Variable Environment, and use it throughout rest of your Python code. The first thing we will do is see the different ways we can run Python code. You can do the following: Type something into the Python Console (Execution) and click the arrow button, such as 2+2. The Python Console will run it and give you an output. Look through the Python Notebook, and when you see a chunk of Python Code, click the arrow button. It will copy the Python code chunk to the Python Console and run all of it. You will likely see variables created in the Variables panel as you load in and manipulate data. Run every single Python code chunk via Runtime -> Run all. Remember that the order that you run your code matters in programming. Your final product would be the result of Option 3, in which you run every Python code chunk from start to finish. However, sometimes it is nice to try out smaller parts of your code via Options 1 or 2. But you will be at risk of running your code out of order! To create your own content in the notebook, click on a section you want to insert content, and then click on “+ Code” or “+ Text” to add Python code or text, respectively. 
Python Notebook is great for data science work, because: It encourages reproducible data analysis, when you run your analysis from start to finish. It encourages excellent documentation, as you can have code, output from code, and prose combined together. It is flexible to use other programming languages, such as R. Now, we will get to the basics of programming grammar. 1.5 Grammar Structure 1: Evaluation of Expressions Expressions are be built out of operations or functions. Functions and operations take in data types, do something with them, and return another data type. We can combine multiple expressions together to form more complex expressions: an expression can have other expressions nested inside it. For instance, consider the following expressions entered to the Python Console: 18 + 21 ## 39 max(18, 21) ## 21 max(18 + 21, 65) ## 65 18 + (21 + 65) ## 104 len("ATCG") ## 4 Here, our input data types to the operation are integer in lines 1-4 and our input data type to the function is string in line 5. We will go over common data types shortly. Operations are just functions in hiding. We could have written: from operator import add add(18, 21) ## 39 add(18, add(21, 65)) ## 104 Remember that the Python language is supposed to help us understand what we are writing in code easily, lending to readable code. Therefore, it is sometimes useful to come up with operations that is easier to read. (Most functions in Python are stored in a collection of functions called modules that needs to be loaded. The import statement gives us permission to access the functions in the module “operator”.) 1.5.1 Data types Here are some common data types we will be using in this course. Data type name Data type shorthand Examples Integer int 2, 4 Float float 3.5, -34.1009 String str “hello”, “234-234-8594” Boolean bool True, False A nice way to summarize this first grammar structure is using the function machine schema, way back from algebra class: Function machine from algebra class. 
Here are some aspects of this schema to pay attention to: A programmer should not need to know how the function or operation is implemented in order to use it - this emphasizes abstraction and modular thinking, a foundation in any programming language. A function can have different kinds of inputs and outputs - it doesn’t need to be numbers. In the len() function, the input is a String, and the output is an Integer. We will see increasingly complex functions with all sorts of different inputs and outputs. 1.6 Grammar Structure 2: Storing data types in the Variable Environment To build up a computer program, we need to store our returned data type from our expression somewhere for downstream use. We can assign a variable to it as follows: x = 18 + 21 If you enter this in the Console, you will see that in the Variable Environment, the variable x has a value of 39. 1.6.1 Execution rule for variable assignment Evaluate the expression to the right of =. Bind variable to the left of = to the resulting value. The variable is stored in the Variable Environment. The Variable Environment is where all the variables are stored, and can be used for an expression anytime once it is defined. Only one unique variable name can be defined. The variable is stored in the working memory of your computer, Random Access Memory (RAM). This is temporary memory storage on the computer that can be accessed quickly. Typically a personal computer has 8, 16, 32 Gigabytes of RAM. Look, now x can be reused downstream: x - 2 ## 37 y = x * 2 It is quite common for programmers to not know what data type a variable is while they are coding. To learn about the data type of a variable, use the type() function on any variable in Python: type(y) ## <class 'int'> We should give useful variable names so that we know what to expect! If you are working with sales data, consider num_sales instead of y. 
1.7 Grammar Structure 3: Evaluation of Functions Let’s look at functions a little bit more formally: A function has a function name, arguments, and returns a data type. 1.7.1 Execution rule for functions: Evaluate the function by its arguments if there’s any, and if the arguments are functions or contains operations, evaluate those functions or operations first. The output of functions is called the returned value. Often, we will use multiple functions in a nested way, and it is important to understand how the Python console understand the order of operation. We can also use paranthesis to change the order of operation. Think about what the Python is going to do step-by–step in the lines of code below: max(len("hello"), 4) ## 5 (len("pumpkin") - 8) * 2 ## -2 If we don’t know how to use a function, such as pow(), we can ask for help: ?pow pow(base, exp, mod=None) Equivalent to base**exp with 2 arguments or base**exp % mod with 3 arguments Some types, such as ints, are able to use a more efficient algorithm when invoked using the three argument form. This shows the function takes in three input arguments: base, exp, and mod=None. When an argument has an assigned value of mod=None, that means the input argument already has a value, and you don’t need to specify anything, unless you want to. The following ways are equivalent ways of using the pow() function: pow(2, 3) ## 8 pow(base=2, exp=3) ## 8 pow(exp=3, base=2) ## 8 but this will give you something different: pow(3, 2) ## 9 And there is an operational equivalent: 2 ** 3 ## 8 We will mostly look at functions with input arguments and return types in this course, but not all functions need to have input arguments and output return. Here are some varieties of functions to stretch your horizons. Function call What it takes in What it does Returns pow(a, b) integer a, integer b Raises a to the bth power. Integer print(x) any data type x Prints out the value of x to the console. 
None datetime.now() Nothing Gets the current time. String 1.8 Tips on writing your first code Computer = powerful + stupid Computers are excellent at doing something specific over and over again, but is extremely rigid and lack flexibility. Here are some tips that is helpful for beginners: Write incrementally, test often Check your assumptions, especially using new functions, operations, and new data types. Live environments are great for testing, but not great for reproducibility. Ask for help! To get more familiar with the errors Python gives you, take a look at this summary of Python error messages. 1.9 Exercises Exercise for week 1 can be found here. "],["working-with-data-structures.html", "Chapter 2 Working with data structures 2.1 Lists 2.2 Objects in Python 2.3 Dataframes", " Chapter 2 Working with data structures In our second lesson, we start to look at two data structures, Lists and Dataframes, that can handle a large amount of data for analysis. 2.1 Lists In the first exercise, you started to explore data structures, which store information about data types. You explored lists, which is an ordered collection of data types or data structures. Each element of a list contains a data type or another data structure. We can now store a vast amount of information in a list, and assign it to a single variable. Even more, we can use operations and functions on a list, modifying many elements within the list at once! This makes analyzing data much more scalable and less repetitive. We create a list via the bracket [ ] operation. staff = ["chris", "ted", "jeff"] chrNum = [2, 3, 1, 2, 2] mixedList = [False, False, False, "A", "B", 92] 2.1.1 Subsetting lists To access an element of a list, you can use the bracket notation [ ] to access the elements of the list. We simply access an element via the “index” number - the location of the data within the list. Here’s the tricky thing about the index number: it starts at 0! 
1st element of chrNum: chrNum[0] 2nd element of chrNum: chrNum[1] … 5th element of chrNum: chrNum[4] With subsetting, you can modify elements of a list or use the element of a list as part of an expression. 2.1.2 Subsetting multiple elements of lists Suppose you want to access multiple elements of a list, such as accessing the first three elements of chrNum. You would use the slice operator :, which specifies: the index number to start the index number to stop, plus one. If you want to access the first three elements of chrNum: chrNum[0:3] ## [2, 3, 1] The first element’s index number is 0, the third element’s index number is 2, plus 1, which is 3. If you want to access the second and third elements of chrNum: chrNum[1:3] ## [3, 1] If you want to access everything but the first three elements of chrNum: chrNum[3:len(chrNum)] ## [2, 2] where len(chrNum) is the length of the list. When the start or stop index is specified, it implies that you are subsetting starting the from the beginning of the list or subsetting to the end of the list, respectively: chrNum[:3] ## [2, 3, 1] chrNum[3:] ## [2, 2] More discussion of list slicing can be found here. 2.2 Objects in Python The list data structure has an organization and functionality that metaphorically represents a pen-and-paper list in our physical world. Like a physical object, we have examined: What does it contain (in terms of data)? What can it do (in terms of operations and functions)? And if it “makes sense” to us, then it is well-designed. The list data structure we have been working with is an example of an Object. The definition of an object allows us to ask the questions above: what does it contain, and what can it do. It is an organizational tool for a collection of data and functions that we can relate to. Formally, an object contains the following: Value that holds the essential data for the object. Attributes that store additional data for the object. Functions called Methods that can be used on the object. 
This organizing structure on an object applies to pretty much all Python data types and data structures. Let’s see how this applies to the list: Value: the contents of the list, such as [2, 3, 4]. Attributes that store additional values: Not relevant for lists. Methods that can be used on the object: chrNum.count(2) counts the number of instances 2 appears as an element of chrNum. Object methods are functions that does something with the object you are using it on. You should think about chrNum.count(2) as a function that takes in chrNum and 2 as inputs. If you want to use the count function on list mixedList, you would use mixedList.count(x). Function method What it takes in What it does Returns chrNum.count(x) list chrNum, data type x Counts the number of instances x appears as an element of chrNum. Integer chrNum.append(x) list chrNum, data type x Appends x to the end of the chrNum. None (but chrNum is modified!) chrNum.sort() list chrNum Sorts chrNum by ascending order. None (but chrNum is modified!) chrNum.reverse() list chrNum Reverses the order of chrNum. None (but chrNum is modified!) 2.3 Dataframes A Dataframe is a two-dimensional data structure that stores data like a spreadsheet does. The Dataframe data structure is found within a Python module called “Pandas”. A Python module is an organized collection of functions and data structures. The import statement below gives us permission to access the “Pandas” module via the variable pd. To load in a Dataframe from existing spreadsheet data, we use the function pd.read_csv(): import pandas as pd metadata = pd.read_csv("classroom_data/metadata.csv") type(metadata) ## <class 'pandas.core.frame.DataFrame'> There is a similar function pd.read_excel() for loading in Excel spreadsheets. Let’s investigate the Dataframe as an object: What does a Dataframe contain (in terms of data)? What can a Dataframe do (in terms of operations and functions)? 2.3.1 What does a Dataframe contain (in terms of data)? 
We first take a look at the contents: metadata ## ModelID ... OncotreeLineage ## 0 ACH-000001 ... Ovary/Fallopian Tube ## 1 ACH-000002 ... Myeloid ## 2 ACH-000003 ... Bowel ## 3 ACH-000004 ... Myeloid ## 4 ACH-000005 ... Myeloid ## ... ... ... ... ## 1859 ACH-002968 ... Esophagus/Stomach ## 1860 ACH-002972 ... Esophagus/Stomach ## 1861 ACH-002979 ... Esophagus/Stomach ## 1862 ACH-002981 ... Esophagus/Stomach ## 1863 ACH-003071 ... Lung ## ## [1864 rows x 30 columns] It looks like there are 1864 rows and 30 columns in this Dataframe, and when we display it it shows some of the data. We can look at specific columns by looking at attributes via the dot operation. We can also look at the columns via the bracket operation. metadata.ModelID ## 0 ACH-000001 ## 1 ACH-000002 ## 2 ACH-000003 ## 3 ACH-000004 ## 4 ACH-000005 ## ... ## 1859 ACH-002968 ## 1860 ACH-002972 ## 1861 ACH-002979 ## 1862 ACH-002981 ## 1863 ACH-003071 ## Name: ModelID, Length: 1864, dtype: object metadata['ModelID'] ## 0 ACH-000001 ## 1 ACH-000002 ## 2 ACH-000003 ## 3 ACH-000004 ## 4 ACH-000005 ## ... ## 1859 ACH-002968 ## 1860 ACH-002972 ## 1861 ACH-002979 ## 1862 ACH-002981 ## 1863 ACH-003071 ## Name: ModelID, Length: 1864, dtype: object The names of all columns is stored as an attribute, which can be accessed via the dot operation. 
metadata.columns ## Index(['ModelID', 'PatientID', 'CellLineName', 'StrippedCellLineName', 'Age', ## 'SourceType', 'SangerModelID', 'RRID', 'DepmapModelType', 'AgeCategory', ## 'GrowthPattern', 'LegacyMolecularSubtype', 'PrimaryOrMetastasis', ## 'SampleCollectionSite', 'Sex', 'SourceDetail', 'LegacySubSubtype', ## 'CatalogNumber', 'CCLEName', 'COSMICID', 'PublicComments', ## 'WTSIMasterCellID', 'EngineeredModel', 'TreatmentStatus', ## 'OnboardedMedia', 'PlateCoating', 'OncotreeCode', 'OncotreeSubtype', ## 'OncotreePrimaryDisease', 'OncotreeLineage'], ## dtype='object') The number of rows and columns are also stored as an attribute: metadata.shape ## (1864, 30) 2.3.2 What can a Dataframe do (in terms of operations and functions)? We can use the head() and tail() functions to look at the first few rows and last few rows of metadata, respectively: metadata.head() ## ModelID PatientID ... OncotreePrimaryDisease OncotreeLineage ## 0 ACH-000001 PT-gj46wT ... Ovarian Epithelial Tumor Ovary/Fallopian Tube ## 1 ACH-000002 PT-5qa3uk ... Acute Myeloid Leukemia Myeloid ## 2 ACH-000003 PT-puKIyc ... Colorectal Adenocarcinoma Bowel ## 3 ACH-000004 PT-q4K2cp ... Acute Myeloid Leukemia Myeloid ## 4 ACH-000005 PT-q4K2cp ... Acute Myeloid Leukemia Myeloid ## ## [5 rows x 30 columns] metadata.tail() ## ModelID PatientID ... OncotreePrimaryDisease OncotreeLineage ## 1859 ACH-002968 PT-pjhrsc ... Esophagogastric Adenocarcinoma Esophagus/Stomach ## 1860 ACH-002972 PT-dkXZB1 ... Esophagogastric Adenocarcinoma Esophagus/Stomach ## 1861 ACH-002979 PT-lyHTzo ... Esophagogastric Adenocarcinoma Esophagus/Stomach ## 1862 ACH-002981 PT-Z9akXf ... Esophagogastric Adenocarcinoma Esophagus/Stomach ## 1863 ACH-003071 PT-LAGmLq ... Lung Neuroendocrine Tumor Lung ## ## [5 rows x 30 columns] Both of these functions (without input arguments) are considered as methods: they are functions that does something with the Dataframe you are using it on. 
You should think about metadata.head() as a function that takes in metadata as an input. If we had another Dataframe called my_data and you want to use the same function, you will have to say my_data.head(). 2.3.2.1 Subsetting Dataframes Perhaps the most important operation you will can do with Dataframes is subsetting them. There are two ways to do it. The first way is to subset by numerical indicies, exactly like how we did for lists. You will use the iloc and bracket operations, and you give two slices: one for the row, and one for the column. Subset the first 5 rows, and first two columns: metadata.iloc[:5, :2] ## ModelID PatientID ## 0 ACH-000001 PT-gj46wT ## 1 ACH-000002 PT-5qa3uk ## 2 ACH-000003 PT-puKIyc ## 3 ACH-000004 PT-q4K2cp ## 4 ACH-000005 PT-q4K2cp If we want a custom slice that is not sequential, we can use an integer list. Subset the last 5 rows, and the 1st and 10 and 21th column: metadata.iloc[5:, [1, 10, 21]] ## PatientID GrowthPattern WTSIMasterCellID ## 5 PT-ej13Dz Suspension 2167.0 ## 6 PT-NOXwpH Adherent 569.0 ## 7 PT-fp8PeY Adherent 1806.0 ## 8 PT-puKIyc Adherent 2104.0 ## 9 PT-AR7W9o Adherent NaN ## ... ... ... ... ## 1859 PT-pjhrsc Organoid NaN ## 1860 PT-dkXZB1 Organoid NaN ## 1861 PT-lyHTzo Organoid NaN ## 1862 PT-Z9akXf Organoid NaN ## 1863 PT-LAGmLq Suspension NaN ## ## [1859 rows x 3 columns] This is a great way to start thinking about subsetting your dataframes for analysis, but this way of of subsetting can lead to some inconsistencies in the long run. For instance, suppose your collaborator added a new cell line to the metadata and changed the order of the column. Then your code to subset the last 5 rows and the columns will get you a different answer once the spreadsheet is changed. The second way is to subset by the column name, and this is much more preferred in data analysis practice. You will learn about it next week! 
"],["about-the-authors.html", "About the Authors", " About the Authors These credits are based on our course contributors table guidelines.     Credits Names Pedagogy Lead Content Instructor(s) FirstName LastName Lecturer(s) (include chapter name/link in parentheses if only for specific chapters) - make new line if more than one chapter involved Delivered the course in some way - video or audio Content Author(s) (include chapter name/link in parentheses if only for specific chapters) - make new line if more than one chapter involved If any other authors besides lead instructor Content Contributor(s) (include section name/link in parentheses) - make new line if more than one section involved Wrote less than a chapter Content Editor(s)/Reviewer(s) Checked your content Content Director(s) Helped guide the content direction Content Consultants (include chapter name/link in parentheses or word “General”) - make new line if more than one chapter involved Gave high level advice on content Acknowledgments Gave small assistance to content but not to the level of consulting Production Content Publisher(s) Helped with publishing platform Content Publishing Reviewer(s) Reviewed overall content and aesthetics on publishing platform Technical Course Publishing Engineer(s) Helped with the code for the technical aspects related to the specific course generation Template Publishing Engineers Candace Savonen, Carrie Wright, Ava Hoffman Publishing Maintenance Engineer Candace Savonen Technical Publishing Stylists Carrie Wright, Ava Hoffman, Candace Savonen Package Developers (ottrpal) Candace Savonen, John Muschelli, Carrie Wright Art and Design Illustrator(s) Created graphics for the course Figure Artist(s) Created figures/plots for course Videographer(s) Filmed videos Videography Editor(s) Edited film Audiographer(s) Recorded audio Audiography Editor(s) Edited audio recordings Funding Funder(s) Institution/individual who funded course including grant number Funding Staff Staff 
members who help with funding   ## ─ Session info ─────────────────────────────────────────────────────────────── ## setting value ## version R version 4.3.2 (2023-10-31) ## os Ubuntu 22.04.4 LTS ## system x86_64, linux-gnu ## ui X11 ## language (EN) ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC ## date 2024-08-16 ## pandoc 3.1.1 @ /usr/local/bin/ (via rmarkdown) ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date (UTC) lib source ## bookdown 0.39.1 2024-06-11 [1] Github (rstudio/bookdown@f244cf1) ## bslib 0.6.1 2023-11-28 [1] RSPM (R 4.3.0) ## cachem 1.0.8 2023-05-01 [1] RSPM (R 4.3.0) ## cli 3.6.2 2023-12-11 [1] RSPM (R 4.3.0) ## devtools 2.4.5 2022-10-11 [1] RSPM (R 4.3.0) ## digest 0.6.34 2024-01-11 [1] RSPM (R 4.3.0) ## ellipsis 0.3.2 2021-04-29 [1] RSPM (R 4.3.0) ## evaluate 0.23 2023-11-01 [1] RSPM (R 4.3.0) ## fastmap 1.1.1 2023-02-24 [1] RSPM (R 4.3.0) ## fs 1.6.3 2023-07-20 [1] RSPM (R 4.3.0) ## glue 1.7.0 2024-01-09 [1] RSPM (R 4.3.0) ## htmltools 0.5.7 2023-11-03 [1] RSPM (R 4.3.0) ## htmlwidgets 1.6.4 2023-12-06 [1] RSPM (R 4.3.0) ## httpuv 1.6.14 2024-01-26 [1] RSPM (R 4.3.0) ## jquerylib 0.1.4 2021-04-26 [1] RSPM (R 4.3.0) ## jsonlite 1.8.8 2023-12-04 [1] RSPM (R 4.3.0) ## knitr 1.47.3 2024-06-11 [1] Github (yihui/knitr@e1edd34) ## later 1.3.2 2023-12-06 [1] RSPM (R 4.3.0) ## lifecycle 1.0.4 2023-11-07 [1] RSPM (R 4.3.0) ## magrittr 2.0.3 2022-03-30 [1] RSPM (R 4.3.0) ## memoise 2.0.1 2021-11-26 [1] RSPM (R 4.3.0) ## mime 0.12 2021-09-28 [1] RSPM (R 4.3.0) ## miniUI 0.1.1.1 2018-05-18 [1] RSPM (R 4.3.0) ## pkgbuild 1.4.3 2023-12-10 [1] RSPM (R 4.3.0) ## pkgload 1.3.4 2024-01-16 [1] RSPM (R 4.3.0) ## profvis 0.3.8 2023-05-02 [1] RSPM (R 4.3.0) ## promises 1.2.1 2023-08-10 [1] RSPM (R 4.3.0) ## purrr 1.0.2 2023-08-10 [1] RSPM (R 4.3.0) ## R6 2.5.1 2021-08-19 [1] RSPM (R 4.3.0) ## Rcpp 1.0.12 2024-01-09 [1] RSPM (R 4.3.0) ## remotes 2.4.2.1 2023-07-18 [1] RSPM (R 4.3.0) ## rlang 
1.1.4 2024-06-04 [1] CRAN (R 4.3.2) ## rmarkdown 2.27.1 2024-06-11 [1] Github (rstudio/rmarkdown@e1c93a9) ## sass 0.4.8 2023-12-06 [1] RSPM (R 4.3.0) ## sessioninfo 1.2.2 2021-12-06 [1] RSPM (R 4.3.0) ## shiny 1.8.0 2023-11-17 [1] RSPM (R 4.3.0) ## stringi 1.8.3 2023-12-11 [1] RSPM (R 4.3.0) ## stringr 1.5.1 2023-11-14 [1] RSPM (R 4.3.0) ## urlchecker 1.0.1 2021-11-30 [1] RSPM (R 4.3.0) ## usethis 2.2.3 2024-02-19 [1] RSPM (R 4.3.0) ## vctrs 0.6.5 2023-12-01 [1] RSPM (R 4.3.0) ## xfun 0.44.4 2024-06-11 [1] Github (yihui/xfun@9da62cc) ## xtable 1.8-4 2019-04-21 [1] RSPM (R 4.3.0) ## yaml 2.3.8 2023-12-11 [1] RSPM (R 4.3.0) ## ## [1] /usr/local/lib/R/site-library ## [2] /usr/local/lib/R/library ## ## ────────────────────────────────────────────────────────────────────────────── "],["references.html", "Chapter 3 References", " Chapter 3 References "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. "]] +[["index.html", "Introduction to Python About this Course 0.1 Curriculum 0.2 Target Audience 0.3 Learning Objectives 0.4 Offerings", " Introduction to Python August, 2024 About this Course 0.1 Curriculum The course covers fundamentals of Python, a high-level programming language, and use it to wrangle data for analysis and visualization. 0.2 Target Audience The course is intended for researchers who want to learn coding for the first time with a data science application via the Python language. This course is also appropriate for folks who have explored data science or programming on their own and want to focus on some fundamentals. 0.3 Learning Objectives Analyze Tidy datasets in the Python programming language via data subsetting, joining, and transformations. Evaluate summary statistics and data visualization to understand scientific questions. 
Describe how the Python programming environment interprets complex expressions made out of functions, operations, and data structures, in a step-by-step way. Apply problem solving strategies to debug broken code. 0.4 Offerings This course is taught on a regular basis at Fred Hutch Cancer Center through the Data Science Lab. Announcements of course offerings can be found here. "],["intro-to-computing.html", "Chapter 1 Intro to Computing 1.1 Goals of the course 1.2 What is a computer program? 1.3 A programming language has following elements: 1.4 Google Colab Setup 1.5 Grammar Structure 1: Evaluation of Expressions 1.6 Grammar Structure 2: Storing data types in the Variable Environment 1.7 Grammar Structure 3: Evaluation of Functions 1.8 Tips on writing your first code 1.9 Exercises", " Chapter 1 Intro to Computing Welcome to Introduction to Python! Each week, we cover a chapter, which consists of a lesson and an exercise. In our first week together, we will look at big conceptual themes in programming, see how code is run, and learn some basic grammar structures of programming. 1.1 Goals of the course In the next 6 weeks, we will explore: Fundamental concepts in high-level programming languages (Python, R, Julia, etc.) that are transferable: How do programs run, and how do we solve problems using functions and data structures? Beginning of data science fundamentals: How do you translate your scientific question to a data wrangling problem and answer it? Data science workflow. Image source: R for Data Science. Find a nice balance between the two throughout the course: we will try to reproduce a figure from a scientific publication using new data. 1.2 What is a computer program? A sequence of instructions to manipulate data for the computer to execute. A series of translations: English <-> Programming Code for Interpreter <-> Machine Code for Central Processing Unit (CPU) We will focus on English <-> Programming Code for Python Interpreter in this class.
More importantly: How we organize ideas <-> Instructing a computer to do something. 1.3 A programming language has following elements: Grammar structure to construct expressions; combining expressions to create more complex expressions Encapsulate complex expressions via functions to create modular and reusable tasks Encapsulate complex data via data structures to allow efficient manipulation of data 1.4 Google Colab Setup Google Colab is an Integrated Development Environment (IDE) that runs in a web browser. Think of it as what Microsoft Word is to a plain text editor: it provides extra bells and whistles that make using Python easier. Let’s open up the KRAS analysis in Google Colab. If you are taking this course while it is in session, the project is probably named “KRAS Demo” in your Google Classroom workspace. If you are taking this course on your own time, you can view it here. Today, we will pay close attention to: Python Console (“Executions”): Open it via View -> Executed code history. You give it one line of Python code, and the console executes that single line of code; you give it a single piece of instruction, and it executes it for you. Notebook: in the central panel of the website, you will see Python code interspersed with word document text. This is called a Python Notebook (other similar services include Jupyter Notebook, iPython Notebook), which has chunks of plain text and Python code, and it helps us better understand the code we are writing. Variable Environment: Open it by clicking on the “{x}” button on the left-hand panel. Often, your code will store information in the Variable Environment, so that information can be reused. For instance, we often load in data and store it in the Variable Environment, and use it throughout the rest of our Python code. The first thing we will do is see the different ways we can run Python code.
You can do the following: Type something into the Python Console (Execution), such as 2+2, and click the arrow button. The Python Console will run it and give you an output. Look through the Python Notebook, and when you see a chunk of Python Code, click the arrow button. It will copy the Python code chunk to the Python Console and run all of it. You will likely see variables created in the Variables panel as you load in and manipulate data. Run every single Python code chunk via Runtime -> Run all. Remember that the order in which you run your code matters in programming. Your final product would be the result of Option 3, in which you run every Python code chunk from start to finish. However, sometimes it is nice to try out smaller parts of your code via Options 1 or 2. But you will be at risk of running your code out of order! To create your own content in the notebook, click on the section where you want to insert content, and then click on “+ Code” or “+ Text” to add Python code or text, respectively. Python Notebook is great for data science work, because: It encourages reproducible data analysis, when you run your analysis from start to finish. It encourages excellent documentation, as you can have code, output from code, and prose combined together. It is flexible enough to use other programming languages, such as R. Now, we will get to the basics of programming grammar. 1.5 Grammar Structure 1: Evaluation of Expressions Expressions are built out of operations or functions. Functions and operations take in data types, do something with them, and return another data type. We can combine multiple expressions together to form more complex expressions: an expression can have other expressions nested inside it.
For instance, consider the following expressions entered into the Python Console: 18 + 21 ## 39 max(18, 21) ## 21 max(18 + 21, 65) ## 65 18 + (21 + 65) ## 104 len("ATCG") ## 4 Here, the inputs to the operations and to max() are integers in lines 1-4, and the input to len() is a string in line 5. We will go over common data types shortly. Operations are just functions in hiding. We could have written: from operator import add add(18, 21) ## 39 add(18, add(21, 65)) ## 104 Remember that the Python language is supposed to help us easily understand what we are writing in code, which leads to readable code. Therefore, it is sometimes useful to come up with operations that are easier to read. (Most functions in Python are stored in collections of functions called modules, which need to be loaded before use. The import statement gives us permission to access the functions in the module “operator”.) 1.5.1 Data types Here are some common data types we will be using in this course. Data type name Data type shorthand Examples Integer int 2, 4 Float float 3.5, -34.1009 String str “hello”, “234-234-8594” Boolean bool True, False A nice way to summarize this first grammar structure is using the function machine schema, way back from algebra class: Function machine from algebra class. Here are some aspects of this schema to pay attention to: A programmer should not need to know how the function or operation is implemented in order to use it - this emphasizes abstraction and modular thinking, a foundation in any programming language. A function can have different kinds of inputs and outputs - they don’t need to be numbers. In the len() function, the input is a String, and the output is an Integer. We will see increasingly complex functions with all sorts of different inputs and outputs. 1.6 Grammar Structure 2: Storing data types in the Variable Environment To build up a computer program, we need to store our returned data type from our expression somewhere for downstream use.
We can assign a variable to it as follows: x = 18 + 21 If you enter this in the Console, you will see that in the Variable Environment, the variable x has a value of 39. 1.6.1 Execution rule for variable assignment Evaluate the expression to the right of =. Bind the variable to the left of = to the resulting value. The variable is stored in the Variable Environment. The Variable Environment is where all the variables are stored; a variable can be used in an expression anytime once it is defined. Each variable name is unique: assigning to an existing name replaces its previous value. The variable is stored in the working memory of your computer, Random Access Memory (RAM). This is temporary memory storage on the computer that can be accessed quickly. Typically, a personal computer has 8, 16, or 32 gigabytes of RAM. Look, now x can be reused downstream: x - 2 ## 37 y = x * 2 It is quite common for programmers to not know what data type a variable is while they are coding. To learn about the data type of a variable, use the type() function on any variable in Python: type(y) ## <class 'int'> We should give useful variable names so that we know what to expect! If you are working with sales data, consider num_sales instead of y. 1.7 Grammar Structure 3: Evaluation of Functions Let’s look at functions a little bit more formally: A function has a function name, arguments, and returns a data type. 1.7.1 Execution rule for functions: Evaluate the function by its arguments if there are any, and if the arguments are functions or contain operations, evaluate those functions or operations first. The output of a function is called the returned value. Often, we will use multiple functions in a nested way, and it is important to understand how the Python console understands the order of operations. We can also use parentheses to change the order of operations.
Think about what Python is going to do step-by-step in the lines of code below: max(len("hello"), 4) ## 5 (len("pumpkin") - 8) * 2 ## -2 If we don’t know how to use a function, such as pow(), we can ask for help: ?pow pow(base, exp, mod=None) Equivalent to base**exp with 2 arguments or base**exp % mod with 3 arguments Some types, such as ints, are able to use a more efficient algorithm when invoked using the three argument form. This shows the function takes in three input arguments: base, exp, and mod=None. When an argument has an assigned value, such as mod=None, the argument already has a default value, and you don’t need to specify it unless you want to. The following are equivalent ways of using the pow() function: pow(2, 3) ## 8 pow(base=2, exp=3) ## 8 pow(exp=3, base=2) ## 8 but this will give you something different: pow(3, 2) ## 9 And there is an operational equivalent: 2 ** 3 ## 8 We will mostly look at functions with input arguments and return types in this course, but not all functions need to have input arguments and return values. Here are some varieties of functions to stretch your horizons. Function call What it takes in What it does Returns pow(a, b) integer a, integer b Raises a to the bth power. Integer print(x) any data type x Prints out the value of x to the console. None datetime.now() Nothing Gets the current time. A datetime object 1.8 Tips on writing your first code Computer = powerful + stupid Computers are excellent at doing something specific over and over again, but are extremely rigid and lack flexibility. Here are some tips that are helpful for beginners: Write incrementally, test often Check your assumptions, especially when using new functions, operations, and data types. Live environments are great for testing, but not great for reproducibility. Ask for help! To get more familiar with the errors Python gives you, take a look at this summary of Python error messages. 1.9 Exercises The exercise for week 1 can be found here. 
"],["working-with-data-structures.html", "Chapter 2 Working with data structures 2.1 Lists 2.2 Objects in Python 2.3 Dataframes 2.4 Exercises", " Chapter 2 Working with data structures In our second lesson, we start to look at two data structures, Lists and Dataframes, that can handle a large amount of data for analysis. 2.1 Lists In the first exercise, you started to explore data structures, which store information about data types. You explored lists, which is an ordered collection of data types or data structures. Each element of a list contains a data type or another data structure. We can now store a vast amount of information in a list, and assign it to a single variable. Even more, we can use operations and functions on a list, modifying many elements within the list at once! This makes analyzing data much more scalable and less repetitive. We create a list via the bracket [ ] operation. staff = ["chris", "ted", "jeff"] chrNum = [2, 3, 1, 2, 2] mixedList = [False, False, False, "A", "B", 92] 2.1.1 Subsetting lists To access an element of a list, you can use the bracket notation [ ] to access the elements of the list. We simply access an element via the “index” number - the location of the data within the list. Here’s the tricky thing about the index number: it starts at 0! 1st element of chrNum: chrNum[0] 2nd element of chrNum: chrNum[1] … 5th element of chrNum: chrNum[4] With subsetting, you can modify elements of a list or use the element of a list as part of an expression. 2.1.2 Subsetting multiple elements of lists Suppose you want to access multiple elements of a list, such as accessing the first three elements of chrNum. You would use the slice operator :, which specifies: the index number to start the index number to stop, plus one. If you want to access the first three elements of chrNum: chrNum[0:3] ## [2, 3, 1] The first element’s index number is 0, the third element’s index number is 2, plus 1, which is 3. 
If you want to access the second and third elements of chrNum: chrNum[1:3] ## [3, 1] If you want to access everything but the first three elements of chrNum: chrNum[3:len(chrNum)] ## [2, 2] where len(chrNum) is the length of the list. When the start or stop index is omitted, the slice starts from the beginning of the list or runs to the end of the list, respectively: chrNum[:3] ## [2, 3, 1] chrNum[3:] ## [2, 2] More discussion of list slicing can be found here. 2.2 Objects in Python The list data structure has an organization and functionality that metaphorically represents a pen-and-paper list in our physical world. As with a physical object, we have examined: What does it contain (in terms of data)? What can it do (in terms of operations and functions)? And if it “makes sense” to us, then it is well-designed. The list data structure we have been working with is an example of an Object. The definition of an object allows us to ask the questions above: what does it contain, and what can it do? It is an organizational tool for a collection of data and functions that we can relate to. Formally, an object contains the following: Value that holds the essential data for the object. Attributes that store additional data for the object. Functions called Methods that can be used on the object. This organizing structure applies to pretty much all Python data types and data structures. Let’s see how this applies to the list: Value: the contents of the list, such as [2, 3, 4]. Attributes that store additional values: Not relevant for lists. Methods that can be used on the object: chrNum.count(2) counts the number of times 2 appears as an element of chrNum. Object methods are functions that do something with the object you are using them on. You should think about chrNum.count(2) as a function that takes in chrNum and 2 as inputs. If you want to use the count function on list mixedList, you would use mixedList.count(x). 
Here are some more examples of methods with lists: Function method What it takes in What it does Returns chrNum.count(x) list chrNum, data type x Counts the number of times x appears as an element of chrNum. Integer chrNum.append(x) list chrNum, data type x Appends x to the end of chrNum. None (but chrNum is modified!) chrNum.sort() list chrNum Sorts chrNum in ascending order. None (but chrNum is modified!) chrNum.reverse() list chrNum Reverses the order of chrNum. None (but chrNum is modified!) 2.3 Dataframes A Dataframe is a two-dimensional data structure that stores data like a spreadsheet does. The Dataframe data structure is found within a Python module called “Pandas”. A Python module is an organized collection of functions and data structures. The import statement below gives us permission to access the “Pandas” module via the variable pd. To load in a Dataframe from existing spreadsheet data, we use the function pd.read_csv(): import pandas as pd metadata = pd.read_csv("classroom_data/metadata.csv") type(metadata) ## <class 'pandas.core.frame.DataFrame'> There is a similar function pd.read_excel() for loading in Excel spreadsheets. Let’s investigate the Dataframe as an object: What does a Dataframe contain (in terms of data)? What can a Dataframe do (in terms of operations and functions)? 2.3.1 What does a Dataframe contain (in terms of data)? We first take a look at the contents: metadata ## ModelID ... OncotreeLineage ## 0 ACH-000001 ... Ovary/Fallopian Tube ## 1 ACH-000002 ... Myeloid ## 2 ACH-000003 ... Bowel ## 3 ACH-000004 ... Myeloid ## 4 ACH-000005 ... Myeloid ## ... ... ... ... ## 1859 ACH-002968 ... Esophagus/Stomach ## 1860 ACH-002972 ... Esophagus/Stomach ## 1861 ACH-002979 ... Esophagus/Stomach ## 1862 ACH-002981 ... Esophagus/Stomach ## 1863 ACH-003071 ... Lung ## ## [1864 rows x 30 columns] This Dataframe has 1864 rows and 30 columns, and displaying it shows only some of the data.
We can look at a specific column by accessing it as an attribute via the dot operation. We can also look at a column via the bracket operation. metadata.ModelID ## 0 ACH-000001 ## 1 ACH-000002 ## 2 ACH-000003 ## 3 ACH-000004 ## 4 ACH-000005 ## ... ## 1859 ACH-002968 ## 1860 ACH-002972 ## 1861 ACH-002979 ## 1862 ACH-002981 ## 1863 ACH-003071 ## Name: ModelID, Length: 1864, dtype: object metadata['ModelID'] ## 0 ACH-000001 ## 1 ACH-000002 ## 2 ACH-000003 ## 3 ACH-000004 ## 4 ACH-000005 ## ... ## 1859 ACH-002968 ## 1860 ACH-002972 ## 1861 ACH-002979 ## 1862 ACH-002981 ## 1863 ACH-003071 ## Name: ModelID, Length: 1864, dtype: object The names of all columns are stored as an attribute, which can be accessed via the dot operation. metadata.columns ## Index(['ModelID', 'PatientID', 'CellLineName', 'StrippedCellLineName', 'Age', ## 'SourceType', 'SangerModelID', 'RRID', 'DepmapModelType', 'AgeCategory', ## 'GrowthPattern', 'LegacyMolecularSubtype', 'PrimaryOrMetastasis', ## 'SampleCollectionSite', 'Sex', 'SourceDetail', 'LegacySubSubtype', ## 'CatalogNumber', 'CCLEName', 'COSMICID', 'PublicComments', ## 'WTSIMasterCellID', 'EngineeredModel', 'TreatmentStatus', ## 'OnboardedMedia', 'PlateCoating', 'OncotreeCode', 'OncotreeSubtype', ## 'OncotreePrimaryDisease', 'OncotreeLineage'], ## dtype='object') The number of rows and columns is also stored as an attribute: metadata.shape ## (1864, 30) 2.3.2 What can a Dataframe do (in terms of operations and functions)? We can use the head() and tail() functions to look at the first few rows and last few rows of metadata, respectively: metadata.head() ## ModelID PatientID ... OncotreePrimaryDisease OncotreeLineage ## 0 ACH-000001 PT-gj46wT ... Ovarian Epithelial Tumor Ovary/Fallopian Tube ## 1 ACH-000002 PT-5qa3uk ... Acute Myeloid Leukemia Myeloid ## 2 ACH-000003 PT-puKIyc ... Colorectal Adenocarcinoma Bowel ## 3 ACH-000004 PT-q4K2cp ... Acute Myeloid Leukemia Myeloid ## 4 ACH-000005 PT-q4K2cp ... 
Acute Myeloid Leukemia Myeloid ## ## [5 rows x 30 columns] metadata.tail() ## ModelID PatientID ... OncotreePrimaryDisease OncotreeLineage ## 1859 ACH-002968 PT-pjhrsc ... Esophagogastric Adenocarcinoma Esophagus/Stomach ## 1860 ACH-002972 PT-dkXZB1 ... Esophagogastric Adenocarcinoma Esophagus/Stomach ## 1861 ACH-002979 PT-lyHTzo ... Esophagogastric Adenocarcinoma Esophagus/Stomach ## 1862 ACH-002981 PT-Z9akXf ... Esophagogastric Adenocarcinoma Esophagus/Stomach ## 1863 ACH-003071 PT-LAGmLq ... Lung Neuroendocrine Tumor Lung ## ## [5 rows x 30 columns] Both of these functions (called here without input arguments) are methods: they are functions that do something with the Dataframe you call them on. You should think about metadata.head() as a function that takes in metadata as an input. If you had another Dataframe called my_data and wanted to use the same function, you would write my_data.head(). 2.3.2.1 Subsetting Dataframes Perhaps the most important operation you can do with Dataframes is subsetting them. There are two ways to do it. The first way is to subset by numerical indices, exactly as we did for lists. You will use the iloc and bracket operations, and you give two slices: one for the rows, and one for the columns. Subset the first 5 rows, and first two columns: metadata.iloc[:5, :2] ## ModelID PatientID ## 0 ACH-000001 PT-gj46wT ## 1 ACH-000002 PT-5qa3uk ## 2 ACH-000003 PT-puKIyc ## 3 ACH-000004 PT-q4K2cp ## 4 ACH-000005 PT-q4K2cp If we want a custom slice that is not sequential, we can use an integer list. Subset every row after the first 5, and the columns at positions 1, 10, and 21 (the 2nd, 11th, and 22nd columns, since counting starts at 0): metadata.iloc[5:, [1, 10, 21]] ## PatientID GrowthPattern WTSIMasterCellID ## 5 PT-ej13Dz Suspension 2167.0 ## 6 PT-NOXwpH Adherent 569.0 ## 7 PT-fp8PeY Adherent 1806.0 ## 8 PT-puKIyc Adherent 2104.0 ## 9 PT-AR7W9o Adherent NaN ## ... ... ... ... 
## 1859 PT-pjhrsc Organoid NaN ## 1860 PT-dkXZB1 Organoid NaN ## 1861 PT-lyHTzo Organoid NaN ## 1862 PT-Z9akXf Organoid NaN ## 1863 PT-LAGmLq Suspension NaN ## ## [1859 rows x 3 columns] This is a great way to start thinking about subsetting your dataframes for analysis, but this way of subsetting can lead to some inconsistencies in the long run. For instance, suppose your collaborator added a new cell line to the metadata and changed the order of the columns. Then code that subsets rows and columns by position will give you a different answer once the spreadsheet is changed. The second way is to subset by column name, which is strongly preferred in data analysis practice. You will learn about it next week! 2.4 Exercises Exercises for week 2 can be found here. "],["about-the-authors.html", "About the Authors", " About the Authors These credits are based on our course contributors table guidelines.     Credits Names Pedagogy Lead Content Instructor(s) FirstName LastName Lecturer(s) (include chapter name/link in parentheses if only for specific chapters) - make new line if more than one chapter involved Delivered the course in some way - video or audio Content Author(s) (include chapter name/link in parentheses if only for specific chapters) - make new line if more than one chapter involved If any other authors besides lead instructor Content Contributor(s) (include section name/link in parentheses) - make new line if more than one section involved Wrote less than a chapter Content Editor(s)/Reviewer(s) Checked your content Content Director(s) Helped guide the content direction Content Consultants (include chapter name/link in parentheses or word “General”) - make new line if more than one chapter involved Gave high level advice on content Acknowledgments Gave small assistance to content but not to the level of consulting Production Content Publisher(s) Helped with publishing platform Content Publishing Reviewer(s) Reviewed overall content and aesthetics 
on publishing platform Technical Course Publishing Engineer(s) Helped with the code for the technical aspects related to the specific course generation Template Publishing Engineers Candace Savonen, Carrie Wright, Ava Hoffman Publishing Maintenance Engineer Candace Savonen Technical Publishing Stylists Carrie Wright, Ava Hoffman, Candace Savonen Package Developers (ottrpal) Candace Savonen, John Muschelli, Carrie Wright Art and Design Illustrator(s) Created graphics for the course Figure Artist(s) Created figures/plots for course Videographer(s) Filmed videos Videography Editor(s) Edited film Audiographer(s) Recorded audio Audiography Editor(s) Edited audio recordings Funding Funder(s) Institution/individual who funded course including grant number Funding Staff Staff members who help with funding   ## ─ Session info ─────────────────────────────────────────────────────────────── ## setting value ## version R version 4.3.2 (2023-10-31) ## os Ubuntu 22.04.4 LTS ## system x86_64, linux-gnu ## ui X11 ## language (EN) ## collate en_US.UTF-8 ## ctype en_US.UTF-8 ## tz Etc/UTC ## date 2024-08-16 ## pandoc 3.1.1 @ /usr/local/bin/ (via rmarkdown) ## ## ─ Packages ─────────────────────────────────────────────────────────────────── ## package * version date (UTC) lib source ## bookdown 0.39.1 2024-06-11 [1] Github (rstudio/bookdown@f244cf1) ## bslib 0.6.1 2023-11-28 [1] RSPM (R 4.3.0) ## cachem 1.0.8 2023-05-01 [1] RSPM (R 4.3.0) ## cli 3.6.2 2023-12-11 [1] RSPM (R 4.3.0) ## devtools 2.4.5 2022-10-11 [1] RSPM (R 4.3.0) ## digest 0.6.34 2024-01-11 [1] RSPM (R 4.3.0) ## ellipsis 0.3.2 2021-04-29 [1] RSPM (R 4.3.0) ## evaluate 0.23 2023-11-01 [1] RSPM (R 4.3.0) ## fastmap 1.1.1 2023-02-24 [1] RSPM (R 4.3.0) ## fs 1.6.3 2023-07-20 [1] RSPM (R 4.3.0) ## glue 1.7.0 2024-01-09 [1] RSPM (R 4.3.0) ## htmltools 0.5.7 2023-11-03 [1] RSPM (R 4.3.0) ## htmlwidgets 1.6.4 2023-12-06 [1] RSPM (R 4.3.0) ## httpuv 1.6.14 2024-01-26 [1] RSPM (R 4.3.0) ## jquerylib 0.1.4 2021-04-26 [1] RSPM (R 
4.3.0) ## jsonlite 1.8.8 2023-12-04 [1] RSPM (R 4.3.0) ## knitr 1.47.3 2024-06-11 [1] Github (yihui/knitr@e1edd34) ## later 1.3.2 2023-12-06 [1] RSPM (R 4.3.0) ## lifecycle 1.0.4 2023-11-07 [1] RSPM (R 4.3.0) ## magrittr 2.0.3 2022-03-30 [1] RSPM (R 4.3.0) ## memoise 2.0.1 2021-11-26 [1] RSPM (R 4.3.0) ## mime 0.12 2021-09-28 [1] RSPM (R 4.3.0) ## miniUI 0.1.1.1 2018-05-18 [1] RSPM (R 4.3.0) ## pkgbuild 1.4.3 2023-12-10 [1] RSPM (R 4.3.0) ## pkgload 1.3.4 2024-01-16 [1] RSPM (R 4.3.0) ## profvis 0.3.8 2023-05-02 [1] RSPM (R 4.3.0) ## promises 1.2.1 2023-08-10 [1] RSPM (R 4.3.0) ## purrr 1.0.2 2023-08-10 [1] RSPM (R 4.3.0) ## R6 2.5.1 2021-08-19 [1] RSPM (R 4.3.0) ## Rcpp 1.0.12 2024-01-09 [1] RSPM (R 4.3.0) ## remotes 2.4.2.1 2023-07-18 [1] RSPM (R 4.3.0) ## rlang 1.1.4 2024-06-04 [1] CRAN (R 4.3.2) ## rmarkdown 2.27.1 2024-06-11 [1] Github (rstudio/rmarkdown@e1c93a9) ## sass 0.4.8 2023-12-06 [1] RSPM (R 4.3.0) ## sessioninfo 1.2.2 2021-12-06 [1] RSPM (R 4.3.0) ## shiny 1.8.0 2023-11-17 [1] RSPM (R 4.3.0) ## stringi 1.8.3 2023-12-11 [1] RSPM (R 4.3.0) ## stringr 1.5.1 2023-11-14 [1] RSPM (R 4.3.0) ## urlchecker 1.0.1 2021-11-30 [1] RSPM (R 4.3.0) ## usethis 2.2.3 2024-02-19 [1] RSPM (R 4.3.0) ## vctrs 0.6.5 2023-12-01 [1] RSPM (R 4.3.0) ## xfun 0.44.4 2024-06-11 [1] Github (yihui/xfun@9da62cc) ## xtable 1.8-4 2019-04-21 [1] RSPM (R 4.3.0) ## yaml 2.3.8 2023-12-11 [1] RSPM (R 4.3.0) ## ## [1] /usr/local/lib/R/site-library ## [2] /usr/local/lib/R/library ## ## ────────────────────────────────────────────────────────────────────────────── "],["references.html", "Chapter 3 References", " Chapter 3 References "],["404.html", "Page not found", " Page not found The page you requested cannot be found (perhaps it was moved or renamed). You may want to try searching to find the page's new location, or use the table of contents to find the page you are looking for. 
"]] diff --git a/docs/working-with-data-structures.html b/docs/working-with-data-structures.html index 8300f38..adbe899 100644 --- a/docs/working-with-data-structures.html +++ b/docs/working-with-data-structures.html @@ -178,6 +178,7 @@
  • 2.3.1 What does a Dataframe contain (in terms of data)?
  • 2.3.2 What can a Dataframe do (in terms of operations and functions)?
  • +
  • 2.4 Exercises
  • About the Authors
  • 3 References
  • @@ -270,7 +271,7 @@

    2.2 Objects in Python2.2 Objects in Python @@ -455,9 +457,13 @@

    2.3.2.1 Subsetting Dataframes

    This is a great way to start thinking about subsetting your dataframes for analysis, but this way of subsetting can lead to some inconsistencies in the long run. For instance, suppose your collaborator added a new cell line to the metadata and changed the order of the columns. Then code that subsets rows and columns by position will give you a different answer once the spreadsheet is changed.

    The second way is to subset by column name, which is strongly preferred in data analysis practice. You will learn about it next week!
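As a small preview of why position-based subsetting is fragile, here is a minimal sketch. It uses a made-up three-row Dataframe (the name small and all of its values are hypothetical, not taken from metadata.csv) and only the iloc and bracket operations already shown in this chapter:

```python
import pandas as pd

# Hypothetical miniature table; the values are invented for illustration.
small = pd.DataFrame({
    "ModelID": ["ACH-A", "ACH-B", "ACH-C"],
    "PatientID": ["PT-1", "PT-2", "PT-3"],
    "Sex": ["Female", "Male", "Female"],
})

# Subset by position: every row, and the column at position 1.
by_position_before = list(small.iloc[:, 1])

# Suppose a collaborator reorders the columns.
reordered = small[["Sex", "ModelID", "PatientID"]]

# The same positional code now picks up a different column.
by_position_after = list(reordered.iloc[:, 1])

# Selecting by column name is unaffected by the reordering.
by_name_after = list(reordered["PatientID"])

print(by_position_before)
print(by_position_after)
print(by_name_after)
```

The positional subset silently switches from PatientID to ModelID after the reorder, while the name-based subset still returns the PatientID values.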

    - + +
    +

    2.4 Exercises

    +

    Exercises for week 2 can be found here.

    +