Render toc-less
github-actions[bot] committed Nov 14, 2024
1 parent cfd822c commit 41f7f2b
Showing 17 changed files with 71 additions and 53 deletions.
6 changes: 3 additions & 3 deletions docs/no_toc/02-data-structures.md
@@ -124,7 +124,7 @@ Object methods are functions that does something with the object you are using i
Here are some more examples of methods with lists:

| Function method | What it takes in | What it does | Returns |
|---------------|---------------|---------------------------|---------------|
|------------------------------------------------------------------------------|------------------------------|-----------------------------------------------------------------------|----------------------------------|
| [`chrNum.count(x)`](https://docs.python.org/3/tutorial/datastructures.html) | list `chrNum`, data type `x` | Counts the number of instances `x` appears as an element of `chrNum`. | Integer |
| [`chrNum.append(x)`](https://docs.python.org/3/tutorial/datastructures.html) | list `chrNum`, data type `x` | Appends `x` to the end of the `chrNum`. | None (but `chrNum` is modified!) |
| [`chrNum.sort()`](https://docs.python.org/3/tutorial/datastructures.html) | list `chrNum` | Sorts `chrNum` by ascending order. | None (but `chrNum` is modified!) |
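
The three list methods in the table can be tried on a small list. This is a minimal sketch. The values in `chrNum` below are invented for illustration. It shows the difference between a method that returns a value (`.count()`) and methods that modify the list in place and return `None` (`.append()`, `.sort()`).

```python
# A small, made-up list of chromosome numbers (hypothetical values)
chrNum = [2, 3, 1, 2, 3, 2]

# .count(x) returns an integer and leaves the list unchanged
n_twos = chrNum.count(2)

# .append(x) adds x to the end of chrNum in place and returns None
result_append = chrNum.append(5)

# .sort() reorders chrNum in place (ascending) and returns None
result_sort = chrNum.sort()

print(n_twos)         # 3
print(result_append)  # None
print(result_sort)    # None
print(chrNum)         # [1, 2, 2, 2, 3, 3, 5]
```

Because `.append()` and `.sort()` return `None`, writing `chrNum = chrNum.sort()` would replace the list with `None`. Call these methods on their own line instead.
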
@@ -324,7 +324,7 @@ metadata.tail()

Both of these functions (without input arguments) are considered as **methods**: they are functions that does something with the Dataframe you are using it on. You should think about `metadata.head()` as a function that takes in `metadata` as an input. If we had another Dataframe called `my_data` and you want to use the same function, you will have to say `my_data.head()`.

## Subsetting Dataframes
## Subsetting Dataframes

Perhaps the most important operation you will can do with Dataframes is subsetting them. There are two ways to do it. The first way is to subset by numerical indicies, exactly like how we did for lists.

@@ -355,7 +355,7 @@ Here is how the dataframe looks like with the row and column index numbers:

Subset the first fourth rows, and the first two columns:

![](images/pandas subset_1.png)
![](images/pandas%20subset_1.png)

Now, back to `metadata` dataframe:

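The subset described in this hunk (the first four rows and the first two columns) is positional, so in pandas it is written with `.iloc`. This is a minimal sketch. It assumes a small invented DataFrame `df` in place of the course's data, and the column names and values are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for a DataFrame such as `metadata`
df = pd.DataFrame({
    "ModelID": ["model_1", "model_2", "model_3", "model_4", "model_5"],
    "Age": [60, 39, 52, 71, 45],
    "Sex": ["Male", "Female", "Female", "Male", "Female"],
})

# .iloc selects by integer position: [rows, columns]
# 0:4 -> row positions 0, 1, 2, 3; 0:2 -> column positions 0, 1
subset = df.iloc[0:4, 0:2]
print(subset)
```

As with lists, the slice `0:4` stops before position 4, so it selects positions 0 through 3.
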
12 changes: 6 additions & 6 deletions docs/no_toc/03-data-wrangling1.md
@@ -100,7 +100,7 @@ expression.head()
```

| Dataframe | The observation is | Some variables are | Some values are |
|-----------------|-----------------|--------------------|------------------|
|------------|--------------------|-------------------------------|-----------------------------|
| metadata | Cell line | ModelID, Age, OncotreeLineage | "ACH-000001", 60, "Myeloid" |
| expression | Cell line | KRAS_Exp | 2.4, .3 |
| mutation | Cell line | KRAS_Mut | TRUE, FALSE |
@@ -117,9 +117,9 @@ Here's a starting prompt:
We have been using **explicit subsetting** with numerical indicies, such as "I want to filter for rows 20-50 and select columns 2 and 8". We are now going to switch to **implicit subsetting** in which we describe the subsetting criteria via comparision operators and column names, such as:

*"I want to subset for rows such that the OncotreeLineage is breast cancer and subset for columns Age and Sex."*
*"I want to subset for rows such that the OncotreeLineage is lung cancer and subset for columns Age and Sex."*

Notice that when we subset for rows in an implicit way, we formulate our criteria in terms of the columns.This is because we are guaranteed to have column names in Dataframes, but not row names.
Notice that when we subset for rows in an implicit way, we formulate our criteria in terms of the columns. This is because we are guaranteed to have column names in Dataframes, but not row names.

#### Let's convert our implicit subsetting criteria into code!

@@ -145,7 +145,7 @@ metadata['OncotreeLineage'] == "Lung"
## Name: OncotreeLineage, Length: 1864, dtype: bool
```

Then, we will use the [`.loc`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html) operation (which is different than [`.iloc`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html) operation!) and subsetting brackets to subset rows and columns Age and Sex at the same time:
Then, we will use the [`.loc`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html) attribute (which is different than [`.iloc`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html) attribute!) and subsetting brackets to subset rows and columns Age and Sex at the same time:


``` python
@@ -213,7 +213,7 @@ Now that your Dataframe has be transformed based on your scientific question, yo
If we look at the data structure of a Dataframe's column, it is actually not a List, but an object called Series. It has methods can compute summary statistics for us. Let's take a look at a few popular examples:

| Function method | What it takes in | What it does | Returns |
|----------------|----------------|------------------------|----------------|
|---------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------|-------------------------------------------------------------------------------|---------------|
| [`metadata.Age.mean()`](https://pandas.pydata.org/docs/reference/api/pandas.Series.mean.html) | `metadata.Age` as a numeric Series | Computes the mean value of the `Age` column. | Float (NumPy) |
| [`metadata['Age'].median()`](https://pandas.pydata.org/docs/reference/api/pandas.Series.median.html) | `metadata['Age']` as a numeric Series | Computes the median value of the `Age` column. | Float (NumPy) |
| [`metadata.Age.max()`](https://pandas.pydata.org/docs/reference/api/pandas.Series.max.html) | `metadata.Age` as a numeric Series | Computes the max value of the `Age` column. | Float (NumPy) |
@@ -277,7 +277,7 @@ Notice that the output of some of these methods are Float (NumPy). This refers t
We will dedicate extensive time later this course to talk about data visualization, but the Dataframe's column, Series, has a method called [`.plot()`](https://pandas.pydata.org/docs/reference/api/pandas.Series.plot.html) that can help us make simple plots for one variable. The `.plot()` method will by default make a line plot, but it is not necessary the plot style we want, so we can give the optional argument `kind` a String value to specify the plot style. We use it for making a histogram or bar plot.

| Plot style | Useful for | kind = | Code |
|-------------|-------------|-------------|---------------------------------|
|------------|------------|--------|--------------------------------------------------------------|
| Histogram | Numerics | "hist" | `metadata.Age.plot(kind = "hist")` |
| Bar plot | Strings | "bar" | `metadata.OncotreeSubtype.value_counts().plot(kind = "bar")` |

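The implicit subsetting and summary statistics described in the `03-data-wrangling1.md` changes can be sketched end to end. The sketch assumes a small, invented `metadata` DataFrame with only the `OncotreeLineage`, `Age`, and `Sex` columns. The real course data is larger and its values differ. Note that `.loc` is used with square brackets rather than called like a method.

```python
import pandas as pd

# Hypothetical stand-in for `metadata`; values are invented for illustration
metadata = pd.DataFrame({
    "OncotreeLineage": ["Lung", "Breast", "Lung", "Myeloid", "Lung"],
    "Age": [39.0, 60.0, 52.0, 71.0, 45.0],
    "Sex": ["Female", "Female", "Male", "Male", "Female"],
})

# A comparison on a column yields a boolean Series, one value per row
is_lung = metadata["OncotreeLineage"] == "Lung"

# .loc[row selector, column selector]: rows where is_lung is True,
# columns chosen by name; the original row labels (0, 2, 4) are kept
lung = metadata.loc[is_lung, ["Age", "Sex"]]
print(lung)

# A single column is a Series, which has summary-statistic methods
print(lung["Age"].mean())   # about 45.33 (mean of 39, 52, 45)
print(lung.Age.median())    # 45.0
```
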
4 changes: 2 additions & 2 deletions docs/no_toc/04-data-wrangling2.md
@@ -164,7 +164,7 @@ To get there, we need to:

- **Summarize** each group via a summary statistic performed on a column, such as `Age`.

We first subset the the two columns we need, and then use the methods [`.group_by(x)`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html) and `.mean()`.
We use the methods [`.group_by(x)`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html) and `.mean()`.


``` python
@@ -210,7 +210,7 @@ metadata_grouped['Age'].mean()

Here's what's going on:

- We use the Dataframe method [`.group_by(x)`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html) and specify the column we want to group by. The output of this method is a Grouped Dataframe object. It still contains all the information of the `metadata` Dataframe, but it makes a note that it's been grouped.
- We use the Dataframe method [`.group_by(x)`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html) and specify the column we want to group by. The output of this method is a **Grouped Dataframe object**. It still contains all the information of the `metadata` Dataframe, but it makes a note that it's been grouped.

- We subset to the column `Age`. The grouping information still persists (This is a Grouped Series object).

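The grouping steps described in the `04-data-wrangling2.md` changes can be checked on a toy DataFrame. The pandas method is spelled `groupby`, as in the linked `pandas.DataFrame.groupby` page. A DataFrame has no `group_by` method. This is a minimal sketch with invented values.

```python
import pandas as pd

# Hypothetical stand-in for `metadata`; values are invented for illustration
metadata = pd.DataFrame({
    "OncotreeLineage": ["Lung", "Breast", "Lung", "Breast", "Myeloid"],
    "Age": [40.0, 60.0, 50.0, 70.0, 30.0],
})

# Step 1: group rows by lineage (returns a DataFrameGroupBy object)
metadata_grouped = metadata.groupby("OncotreeLineage")

# Step 2: select the Age column; the grouping is kept (SeriesGroupBy)
age_grouped = metadata_grouped["Age"]

# Step 3: summarize each group with its mean; the result is a Series
# indexed by the group labels, sorted by default
mean_age = age_grouped.mean()
print(mean_age)
```

Passing `sort=False` to `groupby` would keep the groups in order of first appearance instead of sorted order.
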
8 changes: 6 additions & 2 deletions docs/no_toc/05-data-visualization.md
@@ -30,7 +30,7 @@ Categorical (between 1 categorical and 1 continuous variable)

- Violin plots

[![Image source: Seaborn's overview of plotting functions](https://seaborn.pydata.org/_images/function_overview_8_0.png)](https://seaborn.pydata.org/tutorial/function_overview.html)
[![Image source: Seaborn\'s overview of plotting functions](https://seaborn.pydata.org/_images/function_overview_8_0.png)](https://seaborn.pydata.org/tutorial/function_overview.html)

Why do we focus on these common plots? Our eyes are better at distinguishing certain visual features more than others. All of these plots are focused on their position to depict data, which gives us the most effective visual scale.

@@ -221,6 +221,10 @@ plot = sns.displot(data=metadata, x="Age", hue="Sex", multiple="dodge", palette=

<img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-15-27.png" width="590" />

## Other resources

We recommend checking out the workshop [Better Plots](https://hutchdatascience.org/better_plots/), which showcase examples of how to clean up your plots for clearer communication.

## Exercises

Exercise for week 5 can be found [here](https://colab.research.google.com/drive/1kT3zzq2rrhL1vHl01IdW5L1V7v0iK0wY?usp=sharing).
Exercise for week 5 can be found [here](https://colab.research.google.com/drive/17iwr8NwLLrmzRj4a6zRZucETXpPkmDNR?usp=sharing).
3 changes: 2 additions & 1 deletion docs/no_toc/404.html
@@ -206,7 +206,8 @@
<li class="chapter" data-level="5.2" data-path="data-visualization.html"><a href="data-visualization.html#relational-between-2-continuous-variables"><i class="fa fa-check"></i><b>5.2</b> Relational (between 2 continuous variables)</a></li>
<li class="chapter" data-level="5.3" data-path="data-visualization.html"><a href="data-visualization.html#categorical-between-1-categorical-and-1-continuous-variable"><i class="fa fa-check"></i><b>5.3</b> Categorical (between 1 categorical and 1 continuous variable)</a></li>
<li class="chapter" data-level="5.4" data-path="data-visualization.html"><a href="data-visualization.html#basic-plot-customization"><i class="fa fa-check"></i><b>5.4</b> Basic plot customization</a></li>
<li class="chapter" data-level="5.5" data-path="data-visualization.html"><a href="data-visualization.html#exercises-4"><i class="fa fa-check"></i><b>5.5</b> Exercises</a></li>
<li class="chapter" data-level="5.5" data-path="data-visualization.html"><a href="data-visualization.html#other-resources"><i class="fa fa-check"></i><b>5.5</b> Other resources</a></li>
<li class="chapter" data-level="5.6" data-path="data-visualization.html"><a href="data-visualization.html#exercises-4"><i class="fa fa-check"></i><b>5.6</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="about-the-authors.html"><a href="about-the-authors.html"><i class="fa fa-check"></i>About the Authors</a></li>
<li class="chapter" data-level="6" data-path="references.html"><a href="references.html"><i class="fa fa-check"></i><b>6</b> References</a></li>
2 changes: 1 addition & 1 deletion docs/no_toc/About.md
@@ -51,7 +51,7 @@ These credits are based on our [course contributors table guidelines](https://ww
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz Etc/UTC
## date 2024-09-26
## date 2024-11-14
## pandoc 3.1.1 @ /usr/local/bin/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────
5 changes: 3 additions & 2 deletions docs/no_toc/about-the-authors.html
@@ -206,7 +206,8 @@
<li class="chapter" data-level="5.2" data-path="data-visualization.html"><a href="data-visualization.html#relational-between-2-continuous-variables"><i class="fa fa-check"></i><b>5.2</b> Relational (between 2 continuous variables)</a></li>
<li class="chapter" data-level="5.3" data-path="data-visualization.html"><a href="data-visualization.html#categorical-between-1-categorical-and-1-continuous-variable"><i class="fa fa-check"></i><b>5.3</b> Categorical (between 1 categorical and 1 continuous variable)</a></li>
<li class="chapter" data-level="5.4" data-path="data-visualization.html"><a href="data-visualization.html#basic-plot-customization"><i class="fa fa-check"></i><b>5.4</b> Basic plot customization</a></li>
<li class="chapter" data-level="5.5" data-path="data-visualization.html"><a href="data-visualization.html#exercises-4"><i class="fa fa-check"></i><b>5.5</b> Exercises</a></li>
<li class="chapter" data-level="5.5" data-path="data-visualization.html"><a href="data-visualization.html#other-resources"><i class="fa fa-check"></i><b>5.5</b> Other resources</a></li>
<li class="chapter" data-level="5.6" data-path="data-visualization.html"><a href="data-visualization.html#exercises-4"><i class="fa fa-check"></i><b>5.6</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="about-the-authors.html"><a href="about-the-authors.html"><i class="fa fa-check"></i>About the Authors</a></li>
<li class="chapter" data-level="6" data-path="references.html"><a href="references.html"><i class="fa fa-check"></i><b>6</b> References</a></li>
@@ -386,7 +387,7 @@ <h1>About the Authors<a href="about-the-authors.html#about-the-authors" class="a
## collate en_US.UTF-8
## ctype en_US.UTF-8
## tz Etc/UTC
## date 2024-09-26
## date 2024-11-14
## pandoc 3.1.1 @ /usr/local/bin/ (via rmarkdown)
##
## ─ Packages ───────────────────────────────────────────────────────────────────
15 changes: 10 additions & 5 deletions docs/no_toc/data-visualization.html
@@ -206,7 +206,8 @@
<li class="chapter" data-level="5.2" data-path="data-visualization.html"><a href="data-visualization.html#relational-between-2-continuous-variables"><i class="fa fa-check"></i><b>5.2</b> Relational (between 2 continuous variables)</a></li>
<li class="chapter" data-level="5.3" data-path="data-visualization.html"><a href="data-visualization.html#categorical-between-1-categorical-and-1-continuous-variable"><i class="fa fa-check"></i><b>5.3</b> Categorical (between 1 categorical and 1 continuous variable)</a></li>
<li class="chapter" data-level="5.4" data-path="data-visualization.html"><a href="data-visualization.html#basic-plot-customization"><i class="fa fa-check"></i><b>5.4</b> Basic plot customization</a></li>
<li class="chapter" data-level="5.5" data-path="data-visualization.html"><a href="data-visualization.html#exercises-4"><i class="fa fa-check"></i><b>5.5</b> Exercises</a></li>
<li class="chapter" data-level="5.5" data-path="data-visualization.html"><a href="data-visualization.html#other-resources"><i class="fa fa-check"></i><b>5.5</b> Other resources</a></li>
<li class="chapter" data-level="5.6" data-path="data-visualization.html"><a href="data-visualization.html#exercises-4"><i class="fa fa-check"></i><b>5.6</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="about-the-authors.html"><a href="about-the-authors.html"><i class="fa fa-check"></i>About the Authors</a></li>
<li class="chapter" data-level="6" data-path="references.html"><a href="references.html"><i class="fa fa-check"></i><b>6</b> References</a></li>
@@ -268,7 +269,7 @@ <h1><span class="header-section-number">Chapter 5</span> Data Visualization<a hr
<li><p>Bar plots</p></li>
<li><p>Violin plots</p></li>
</ul>
<p><a href="https://seaborn.pydata.org/tutorial/function_overview.html"><img src="https://seaborn.pydata.org/_images/function_overview_8_0.png" alt="Image source: Seaborns overview of plotting functions" /></a></p>
<p><a href="https://seaborn.pydata.org/tutorial/function_overview.html"><img src="https://seaborn.pydata.org/_images/function_overview_8_0.png" alt="Image source: Seaborn&#39;s overview of plotting functions" /></a></p>
<p>Why do we focus on these common plots? Our eyes are better at distinguishing certain visual features more than others. All of these plots are focused on their position to depict data, which gives us the most effective visual scale.</p>
<p><a href="https://www.oreilly.com/library/view/visualization-analysis-and/9781466508910/K14708_C005.xhtml"><img src="https://www.oreilly.com/api/v2/epubs/9781466508910/files/image/fig5-1.png" alt="Image Source: Visualization Analysis and Design by [Tamara Munzner](https://www.oreilly.com/search?q=author:%22Tamara%20Munzner%22)" /></a></p>
<p>Let’s load in our genomics datasets and start making some plots from them.</p>
@@ -363,9 +364,13 @@ <h2><span class="header-section-number">5.4</span> Basic plot customization<a hr
<pre><code>## &lt;string&gt;:1: UserWarning: The palette list has more values (6) than needed (3), which may not be intended.</code></pre>
<p><img src="resources/images/05-data-visualization_files/figure-html/unnamed-chunk-15-27.png" width="590" /></p>
</div>
<div id="exercises-4" class="section level2 hasAnchor" number="5.5">
<h2><span class="header-section-number">5.5</span> Exercises<a href="data-visualization.html#exercises-4" class="anchor-section" aria-label="Anchor link to header"></a></h2>
<p>Exercise for week 5 can be found <a href="https://colab.research.google.com/drive/1kT3zzq2rrhL1vHl01IdW5L1V7v0iK0wY?usp=sharing">here</a>.</p>
<div id="other-resources" class="section level2 hasAnchor" number="5.5">
<h2><span class="header-section-number">5.5</span> Other resources<a href="data-visualization.html#other-resources" class="anchor-section" aria-label="Anchor link to header"></a></h2>
<p>We recommend checking out the workshop <a href="https://hutchdatascience.org/better_plots/">Better Plots</a>, which showcase examples of how to clean up your plots for clearer communication.</p>
</div>
<div id="exercises-4" class="section level2 hasAnchor" number="5.6">
<h2><span class="header-section-number">5.6</span> Exercises<a href="data-visualization.html#exercises-4" class="anchor-section" aria-label="Anchor link to header"></a></h2>
<p>Exercise for week 5 can be found <a href="https://colab.research.google.com/drive/17iwr8NwLLrmzRj4a6zRZucETXpPkmDNR?usp=sharing">here</a>.</p>

</div>
</div>
33 changes: 17 additions & 16 deletions docs/no_toc/data-wrangling-part-1.html
@@ -206,7 +206,8 @@
<li class="chapter" data-level="5.2" data-path="data-visualization.html"><a href="data-visualization.html#relational-between-2-continuous-variables"><i class="fa fa-check"></i><b>5.2</b> Relational (between 2 continuous variables)</a></li>
<li class="chapter" data-level="5.3" data-path="data-visualization.html"><a href="data-visualization.html#categorical-between-1-categorical-and-1-continuous-variable"><i class="fa fa-check"></i><b>5.3</b> Categorical (between 1 categorical and 1 continuous variable)</a></li>
<li class="chapter" data-level="5.4" data-path="data-visualization.html"><a href="data-visualization.html#basic-plot-customization"><i class="fa fa-check"></i><b>5.4</b> Basic plot customization</a></li>
<li class="chapter" data-level="5.5" data-path="data-visualization.html"><a href="data-visualization.html#exercises-4"><i class="fa fa-check"></i><b>5.5</b> Exercises</a></li>
<li class="chapter" data-level="5.5" data-path="data-visualization.html"><a href="data-visualization.html#other-resources"><i class="fa fa-check"></i><b>5.5</b> Other resources</a></li>
<li class="chapter" data-level="5.6" data-path="data-visualization.html"><a href="data-visualization.html#exercises-4"><i class="fa fa-check"></i><b>5.6</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="about-the-authors.html"><a href="about-the-authors.html"><i class="fa fa-check"></i>About the Authors</a></li>
<li class="chapter" data-level="6" data-path="references.html"><a href="references.html"><i class="fa fa-check"></i><b>6</b> References</a></li>
@@ -315,10 +316,10 @@ <h2><span class="header-section-number">3.2</span> Our working Tidy Data: DepMap
## [5 rows x 536 columns]</code></pre>
<table>
<colgroup>
<col width="23%" />
<col width="23%" />
<col width="27%" />
<col width="25%" />
<col width="13%" />
<col width="21%" />
<col width="33%" />
<col width="31%" />
</colgroup>
<thead>
<tr class="header">
@@ -359,8 +360,8 @@ <h2><span class="header-section-number">3.3</span> Transform: “What do you wan
<p>In the <code>metadata</code> dataframe, which rows would you subset for and columns would you subset for that relate to a scientific question?</p>
</blockquote>
<p>We have been using <strong>explicit subsetting</strong> with numerical indicies, such as “I want to filter for rows 20-50 and select columns 2 and 8”. We are now going to switch to <strong>implicit subsetting</strong> in which we describe the subsetting criteria via comparision operators and column names, such as:</p>
<p><em>“I want to subset for rows such that the OncotreeLineage is breast cancer and subset for columns Age and Sex.”</em></p>
<p>Notice that when we subset for rows in an implicit way, we formulate our criteria in terms of the columns.This is because we are guaranteed to have column names in Dataframes, but not row names.</p>
<p><em>“I want to subset for rows such that the OncotreeLineage is lung cancer and subset for columns Age and Sex.”</em></p>
<p>Notice that when we subset for rows in an implicit way, we formulate our criteria in terms of the columns. This is because we are guaranteed to have column names in Dataframes, but not row names.</p>
<div id="lets-convert-our-implicit-subsetting-criteria-into-code" class="section level4 hasAnchor" number="3.3.0.1">
<h4><span class="header-section-number">3.3.0.1</span> Let’s convert our implicit subsetting criteria into code!<a href="data-wrangling-part-1.html#lets-convert-our-implicit-subsetting-criteria-into-code" class="anchor-section" aria-label="Anchor link to header"></a></h4>
<p>To subset for rows implicitly, we will use the conditional operators on Dataframe columns you used in Exercise 2. To formulate a conditional operator expression that OncotreeLineage is breast cancer:</p>
@@ -377,7 +378,7 @@ <h4><span class="header-section-number">3.3.0.1</span> Let’s convert our impli
## 1862 False
## 1863 True
## Name: OncotreeLineage, Length: 1864, dtype: bool</code></pre>
<p>Then, we will use the <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html"><code>.loc</code></a> operation (which is different than <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html"><code>.iloc</code></a> operation!) and subsetting brackets to subset rows and columns Age and Sex at the same time:</p>
<p>Then, we will use the <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html"><code>.loc</code></a> attribute (which is different than <a href="https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html"><code>.iloc</code></a> attribute!) and subsetting brackets to subset rows and columns Age and Sex at the same time:</p>
<div class="sourceCode" id="cb78"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb78-1"><a href="data-wrangling-part-1.html#cb78-1" tabindex="-1"></a>metadata.loc[metadata[<span class="st">&#39;OncotreeLineage&#39;</span>] <span class="op">==</span> <span class="st">&quot;Lung&quot;</span>, [<span class="st">&quot;Age&quot;</span>, <span class="st">&quot;Sex&quot;</span>]]</span></code></pre></div>
<pre><code>## Age Sex
## 10 39.0 Female
@@ -420,10 +421,10 @@ <h2><span class="header-section-number">3.4</span> Summary Statistics<a href="da
<p>If we look at the data structure of a Dataframe’s column, it is actually not a List, but an object called Series. It has methods can compute summary statistics for us. Let’s take a look at a few popular examples:</p>
<table style="width:100%;">
<colgroup>
<col width="22%" />
<col width="22%" />
<col width="33%" />
<col width="22%" />
<col width="46%" />
<col width="17%" />
<col width="29%" />
<col width="5%" />
</colgroup>
<thead>
<tr class="header">
@@ -504,10 +505,10 @@ <h2><span class="header-section-number">3.5</span> Simple data visualization<a h
<p>We will dedicate extensive time later this course to talk about data visualization, but the Dataframe’s column, Series, has a method called <a href="https://pandas.pydata.org/docs/reference/api/pandas.Series.plot.html"><code>.plot()</code></a> that can help us make simple plots for one variable. The <code>.plot()</code> method will by default make a line plot, but it is not necessary the plot style we want, so we can give the optional argument <code>kind</code> a String value to specify the plot style. We use it for making a histogram or bar plot.</p>
<table>
<colgroup>
<col width="18%" />
<col width="18%" />
<col width="18%" />
<col width="45%" />
<col width="12%" />
<col width="12%" />
<col width="8%" />
<col width="65%" />
</colgroup>
<thead>
<tr class="header">
7 changes: 4 additions & 3 deletions docs/no_toc/data-wrangling-part-2.html
@@ -206,7 +206,8 @@
<li class="chapter" data-level="5.2" data-path="data-visualization.html"><a href="data-visualization.html#relational-between-2-continuous-variables"><i class="fa fa-check"></i><b>5.2</b> Relational (between 2 continuous variables)</a></li>
<li class="chapter" data-level="5.3" data-path="data-visualization.html"><a href="data-visualization.html#categorical-between-1-categorical-and-1-continuous-variable"><i class="fa fa-check"></i><b>5.3</b> Categorical (between 1 categorical and 1 continuous variable)</a></li>
<li class="chapter" data-level="5.4" data-path="data-visualization.html"><a href="data-visualization.html#basic-plot-customization"><i class="fa fa-check"></i><b>5.4</b> Basic plot customization</a></li>
<li class="chapter" data-level="5.5" data-path="data-visualization.html"><a href="data-visualization.html#exercises-4"><i class="fa fa-check"></i><b>5.5</b> Exercises</a></li>
<li class="chapter" data-level="5.5" data-path="data-visualization.html"><a href="data-visualization.html#other-resources"><i class="fa fa-check"></i><b>5.5</b> Other resources</a></li>
<li class="chapter" data-level="5.6" data-path="data-visualization.html"><a href="data-visualization.html#exercises-4"><i class="fa fa-check"></i><b>5.6</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="about-the-authors.html"><a href="about-the-authors.html"><i class="fa fa-check"></i>About the Authors</a></li>
<li class="chapter" data-level="6" data-path="references.html"><a href="references.html"><i class="fa fa-check"></i><b>6</b> References</a></li>
@@ -459,7 +460,7 @@ <h2><span class="header-section-number">4.3</span> Grouping and summarizing Data
<li><p><strong>Group</strong> the data based on some criteria, elements of <code>OncotreeLineage</code></p></li>
<li><p><strong>Summarize</strong> each group via a summary statistic performed on a column, such as <code>Age</code>.</p></li>
</ul>
<p>We first subset the the two columns we need, and then use the methods <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html"><code>.group_by(x)</code></a> and <code>.mean()</code>.</p>
<p>We use the methods <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html"><code>.group_by(x)</code></a> and <code>.mean()</code>.</p>
<div class="sourceCode" id="cb103"><pre class="sourceCode python"><code class="sourceCode python"><span id="cb103-1"><a href="data-wrangling-part-2.html#cb103-1" tabindex="-1"></a>metadata_grouped <span class="op">=</span> metadata.groupby(<span class="st">&quot;OncotreeLineage&quot;</span>)</span>
<span id="cb103-2"><a href="data-wrangling-part-2.html#cb103-2" tabindex="-1"></a>metadata_grouped[<span class="st">&#39;Age&#39;</span>].mean()</span></code></pre></div>
<pre><code>## OncotreeLineage
@@ -497,7 +498,7 @@ <h2><span class="header-section-number">4.3</span> Grouping and summarizing Data
## Name: Age, dtype: float64</code></pre>
<p>Here’s what’s going on:</p>
<ul>
<li><p>We use the Dataframe method <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html"><code>.group_by(x)</code></a> and specify the column we want to group by. The output of this method is a Grouped Dataframe object. It still contains all the information of the <code>metadata</code> Dataframe, but it makes a note that it’s been grouped.</p></li>
<li><p>We use the Dataframe method <a href="https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html"><code>.group_by(x)</code></a> and specify the column we want to group by. The output of this method is a <strong>Grouped Dataframe object</strong>. It still contains all the information of the <code>metadata</code> Dataframe, but it makes a note that it’s been grouped.</p></li>
<li><p>We subset to the column <code>Age</code>. The grouping information still persists (This is a Grouped Series object).</p></li>
<li><p>We use the method <code>.mean()</code> to calculate the mean value of <code>Age</code> within each group defined by <code>OncotreeLineage</code>.</p></li>
</ul>
5 changes: 3 additions & 2 deletions docs/no_toc/index.html
@@ -206,7 +206,8 @@
<li class="chapter" data-level="5.2" data-path="data-visualization.html"><a href="data-visualization.html#relational-between-2-continuous-variables"><i class="fa fa-check"></i><b>5.2</b> Relational (between 2 continuous variables)</a></li>
<li class="chapter" data-level="5.3" data-path="data-visualization.html"><a href="data-visualization.html#categorical-between-1-categorical-and-1-continuous-variable"><i class="fa fa-check"></i><b>5.3</b> Categorical (between 1 categorical and 1 continuous variable)</a></li>
<li class="chapter" data-level="5.4" data-path="data-visualization.html"><a href="data-visualization.html#basic-plot-customization"><i class="fa fa-check"></i><b>5.4</b> Basic plot customization</a></li>
<li class="chapter" data-level="5.5" data-path="data-visualization.html"><a href="data-visualization.html#exercises-4"><i class="fa fa-check"></i><b>5.5</b> Exercises</a></li>
<li class="chapter" data-level="5.5" data-path="data-visualization.html"><a href="data-visualization.html#other-resources"><i class="fa fa-check"></i><b>5.5</b> Other resources</a></li>
<li class="chapter" data-level="5.6" data-path="data-visualization.html"><a href="data-visualization.html#exercises-4"><i class="fa fa-check"></i><b>5.6</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="about-the-authors.html"><a href="about-the-authors.html"><i class="fa fa-check"></i>About the Authors</a></li>
<li class="chapter" data-level="6" data-path="references.html"><a href="references.html"><i class="fa fa-check"></i><b>6</b> References</a></li>
@@ -246,7 +247,7 @@ <h1>
</div>
<div id="header">
<h1 class="title">Introduction to Python</h1>
<p class="date"><em>September, 2024</em></p>
<p class="date"><em>November, 2024</em></p>
</div>
<div id="about-this-course" class="section level1 unnumbered hasAnchor">
<h1>About this Course<a href="index.html#about-this-course" class="anchor-section" aria-label="Anchor link to header"></a></h1>
2 changes: 1 addition & 1 deletion docs/no_toc/index.md
@@ -1,6 +1,6 @@
---
title: "Introduction to Python"
date: "September, 2024"
date: "November, 2024"
site: bookdown::bookdown_site
documentclass: book
bibliography: [book.bib]
3 changes: 2 additions & 1 deletion docs/no_toc/intro-to-computing.html
@@ -206,7 +206,8 @@
<li class="chapter" data-level="5.2" data-path="data-visualization.html"><a href="data-visualization.html#relational-between-2-continuous-variables"><i class="fa fa-check"></i><b>5.2</b> Relational (between 2 continuous variables)</a></li>
<li class="chapter" data-level="5.3" data-path="data-visualization.html"><a href="data-visualization.html#categorical-between-1-categorical-and-1-continuous-variable"><i class="fa fa-check"></i><b>5.3</b> Categorical (between 1 categorical and 1 continuous variable)</a></li>
<li class="chapter" data-level="5.4" data-path="data-visualization.html"><a href="data-visualization.html#basic-plot-customization"><i class="fa fa-check"></i><b>5.4</b> Basic plot customization</a></li>
<li class="chapter" data-level="5.5" data-path="data-visualization.html"><a href="data-visualization.html#exercises-4"><i class="fa fa-check"></i><b>5.5</b> Exercises</a></li>
<li class="chapter" data-level="5.5" data-path="data-visualization.html"><a href="data-visualization.html#other-resources"><i class="fa fa-check"></i><b>5.5</b> Other resources</a></li>
<li class="chapter" data-level="5.6" data-path="data-visualization.html"><a href="data-visualization.html#exercises-4"><i class="fa fa-check"></i><b>5.6</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="about-the-authors.html"><a href="about-the-authors.html"><i class="fa fa-check"></i>About the Authors</a></li>
<li class="chapter" data-level="6" data-path="references.html"><a href="references.html"><i class="fa fa-check"></i><b>6</b> References</a></li>
1 change: 1 addition & 0 deletions docs/no_toc/reference-keys.txt
@@ -47,5 +47,6 @@ distributions-one-variable
relational-between-2-continuous-variables
categorical-between-1-categorical-and-1-continuous-variable
basic-plot-customization
other-resources
exercises-4
references
3 changes: 2 additions & 1 deletion docs/no_toc/references.html
@@ -206,7 +206,8 @@
<li class="chapter" data-level="5.2" data-path="data-visualization.html"><a href="data-visualization.html#relational-between-2-continuous-variables"><i class="fa fa-check"></i><b>5.2</b> Relational (between 2 continuous variables)</a></li>
<li class="chapter" data-level="5.3" data-path="data-visualization.html"><a href="data-visualization.html#categorical-between-1-categorical-and-1-continuous-variable"><i class="fa fa-check"></i><b>5.3</b> Categorical (between 1 categorical and 1 continuous variable)</a></li>
<li class="chapter" data-level="5.4" data-path="data-visualization.html"><a href="data-visualization.html#basic-plot-customization"><i class="fa fa-check"></i><b>5.4</b> Basic plot customization</a></li>
<li class="chapter" data-level="5.5" data-path="data-visualization.html"><a href="data-visualization.html#exercises-4"><i class="fa fa-check"></i><b>5.5</b> Exercises</a></li>
<li class="chapter" data-level="5.5" data-path="data-visualization.html"><a href="data-visualization.html#other-resources"><i class="fa fa-check"></i><b>5.5</b> Other resources</a></li>
<li class="chapter" data-level="5.6" data-path="data-visualization.html"><a href="data-visualization.html#exercises-4"><i class="fa fa-check"></i><b>5.6</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="about-the-authors.html"><a href="about-the-authors.html"><i class="fa fa-check"></i>About the Authors</a></li>
<li class="chapter" data-level="6" data-path="references.html"><a href="references.html"><i class="fa fa-check"></i><b>6</b> References</a></li>
2 changes: 1 addition & 1 deletion docs/no_toc/search_index.json

Large diffs are not rendered by default.

13 changes: 7 additions & 6 deletions docs/no_toc/working-with-data-structures.html
@@ -206,7 +206,8 @@
<li class="chapter" data-level="5.2" data-path="data-visualization.html"><a href="data-visualization.html#relational-between-2-continuous-variables"><i class="fa fa-check"></i><b>5.2</b> Relational (between 2 continuous variables)</a></li>
<li class="chapter" data-level="5.3" data-path="data-visualization.html"><a href="data-visualization.html#categorical-between-1-categorical-and-1-continuous-variable"><i class="fa fa-check"></i><b>5.3</b> Categorical (between 1 categorical and 1 continuous variable)</a></li>
<li class="chapter" data-level="5.4" data-path="data-visualization.html"><a href="data-visualization.html#basic-plot-customization"><i class="fa fa-check"></i><b>5.4</b> Basic plot customization</a></li>
<li class="chapter" data-level="5.5" data-path="data-visualization.html"><a href="data-visualization.html#exercises-4"><i class="fa fa-check"></i><b>5.5</b> Exercises</a></li>
<li class="chapter" data-level="5.5" data-path="data-visualization.html"><a href="data-visualization.html#other-resources"><i class="fa fa-check"></i><b>5.5</b> Other resources</a></li>
<li class="chapter" data-level="5.6" data-path="data-visualization.html"><a href="data-visualization.html#exercises-4"><i class="fa fa-check"></i><b>5.6</b> Exercises</a></li>
</ul></li>
<li class="chapter" data-level="" data-path="about-the-authors.html"><a href="about-the-authors.html"><i class="fa fa-check"></i>About the Authors</a></li>
<li class="chapter" data-level="6" data-path="references.html"><a href="references.html"><i class="fa fa-check"></i><b>6</b> References</a></li>
@@ -311,12 +312,12 @@ <h2><span class="header-section-number">2.2</span> Objects in Python<a href="wor
</ul>
<p>Object methods are functions that does something with the object you are using it on. You should think about <code>chrNum.count(2)</code> as a function that takes in <code>chrNum</code> and <code>2</code> as inputs. If you want to use the count function on list <code>mixedList</code>, you would use <code>mixedList.count(x)</code>.</p>
<p>Here are some more examples of methods with lists:</p>
<table>
<table style="width:100%;">
<colgroup>
<col width="20%" />
<col width="20%" />
<col width="37%" />
<col width="20%" />
<col width="36%" />
<col width="14%" />
<col width="33%" />
<col width="15%" />
</colgroup>
<thead>
<tr class="header">
