wk4 #19

Merged
merged 3 commits into from
Aug 27, 2024

14 changes: 8 additions & 6 deletions 01-intro-to-computing.Rmd
@@ -195,7 +195,9 @@ Some types, such as ints, are able to use a more efficient algorithm when
invoked using the three argument form.
```

This shows the function takes in three input arguments: `base`, `exp`, and `mod=None`. When an argument is written with an assigned value, as in `mod=None`, it has a default value, so you don't need to specify it unless you want to.
We can also find a similar help document in a [nicer rendered form online](https://docs.python.org/3/library/functions.html#pow). We will practice looking at function documentation throughout the course, because it is a fundamental skill for learning new functions on your own.

The documentation shows the function takes in three input arguments: `base`, `exp`, and `mod=None`. When an argument is written with an assigned value, as in `mod=None`, it has a default value, so you don't need to specify it unless you want to.
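As a minimal sketch of what a default value means in practice (the numbers are made up for illustration), leaving out `mod` behaves the same as passing `None`, while supplying it changes the result:

```python
# Leaving `mod` out uses its default value, None.
without_mod = pow(2, 3)
explicit_none = pow(2, 3, None)

# Supplying `mod` overrides the default: (2 ** 3) % 5
with_mod = pow(2, 3, 5)

print(without_mod, explicit_none)  # 8 8
print(with_mod)                    # 3
```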

The following are equivalent ways of using the `pow()` function:

@@ -219,11 +221,11 @@ And there is an operational equivalent:

We will mostly look at functions with input arguments and return values in this course, but not all functions need input arguments or a return value. Let's look at some examples of functions that don't always have an input or output:

| Function call | What it takes in | What it does | Returns |
|---------------|---------------|----------------------------|---------------|
| `pow(a, b)` | integer `a`, integer `b` | Raises `a` to the `b`th power. | Integer |
| `print(x)` | any data type `x` | Prints out the value of `x` to the console. | None |
| `dir()` | Nothing | Gives a list of all the variables defined in the environment. | List |
| Function call | What it takes in | What it does | Returns |
|----------------|----------------|-------------------------|----------------|
| [`pow(a, b)`](https://docs.python.org/3/library/functions.html#pow) | integer `a`, integer `b` | Raises `a` to the `b`th power. | Integer |
| [`print(x)`](https://docs.python.org/3/library/functions.html#print) | any data type `x` | Prints out the value of `x` to the console. | None |
| [`dir()`](https://docs.python.org/3/library/functions.html#dir) | Nothing | Gives a list of all the variables defined in the environment. | List |
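A short sketch of the three rows in the table, using hypothetical variable names (`power`, `printed`, `names`) to capture what each call returns:

```python
power = pow(2, 3)        # takes two integers, returns an integer
printed = print(power)   # prints 8 to the console, but returns None
names = dir()            # takes nothing, returns a list of names

print(type(power))   # <class 'int'>
print(printed)       # None
print(type(names))   # <class 'list'>
```

Storing the result of `print()` in a variable is rarely useful; it is done here only to show that the return value is `None`.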

## Tips on writing your first code

20 changes: 10 additions & 10 deletions 02-data-structures.Rmd
@@ -105,20 +105,20 @@ Object methods are functions that does something with the object you are using i

Here are some more examples of methods with lists:

| Function method | What it takes in | What it does | Returns |
|----------------|----------------|-------------------------------------|------------------|
| `chrNum.count(x)` | list `chrNum`, data type `x` | Counts the number of times `x` appears as an element of `chrNum`. | Integer |
| `chrNum.append(x)` | list `chrNum`, data type `x` | Appends `x` to the end of `chrNum`. | None (but `chrNum` is modified!) |
| `chrNum.sort()` | list `chrNum` | Sorts `chrNum` in ascending order. | None (but `chrNum` is modified!) |
| `chrNum.reverse()` | list `chrNum` | Reverses the order of `chrNum`. | None (but `chrNum` is modified!) |
| Function method | What it takes in | What it does | Returns |
|---------------|---------------|---------------------------|---------------|
| [`chrNum.count(x)`](https://docs.python.org/3/tutorial/datastructures.html) | list `chrNum`, data type `x` | Counts the number of times `x` appears as an element of `chrNum`. | Integer |
| [`chrNum.append(x)`](https://docs.python.org/3/tutorial/datastructures.html) | list `chrNum`, data type `x` | Appends `x` to the end of `chrNum`. | None (but `chrNum` is modified!) |
| [`chrNum.sort()`](https://docs.python.org/3/tutorial/datastructures.html) | list `chrNum` | Sorts `chrNum` in ascending order. | None (but `chrNum` is modified!) |
| [`chrNum.reverse()`](https://docs.python.org/3/tutorial/datastructures.html) | list `chrNum` | Reverses the order of `chrNum`. | None (but `chrNum` is modified!) |
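To see the difference between a method that returns a value and one that modifies the list in place, here is a small sketch. The contents of `chrNum` are made up for this example and are not the values used elsewhere in the course:

```python
chrNum = [2, 3, 1, 2, 2]  # made-up values for illustration

n_twos = chrNum.count(2)        # returns an integer, list is unchanged
print(n_twos)                   # 3

append_result = chrNum.append(3)  # returns None, list is modified
print(append_result)              # None
print(chrNum)                     # [2, 3, 1, 2, 2, 3]

chrNum.sort()                   # sorts in place, ascending
print(chrNum)                   # [1, 2, 2, 2, 3, 3]

chrNum.reverse()                # reverses in place
print(chrNum)                   # [3, 3, 2, 2, 2, 1]
```

A common mistake is writing `chrNum = chrNum.sort()`, which replaces the list with `None`.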

## Dataframes

A Dataframe is a two-dimensional data structure that stores data like a spreadsheet does.

The Dataframe data structure is found within a Python module called "Pandas". A Python module is an organized collection of functions and data structures. The `import` statement below lets us access the "Pandas" module via the variable `pd`.

To load in a Dataframe from existing spreadsheet data, we use the function `pd.read_csv()`:
To load in a Dataframe from existing spreadsheet data, we use the function [`pd.read_csv()`](https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html):

```{python}
import pandas as pd
@@ -127,7 +127,7 @@ metadata = pd.read_csv("classroom_data/metadata.csv")
type(metadata)
```

There is a similar function `pd.read_excel()` for loading in Excel spreadsheets.
There is a similar function [`pd.read_excel()`](https://pandas.pydata.org/docs/reference/api/pandas.read_excel.html) for loading in Excel spreadsheets.
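If the classroom file is not at hand, `pd.read_csv()` can be tried on text held in memory. This is only a sketch: the CSV text, the variable name `toy`, and most of its values are made up, and `io.StringIO` stands in for a file on disk:

```python
import io

import pandas as pd

# Hypothetical CSV text standing in for a spreadsheet file on disk.
csv_text = "ModelID,Age,Sex\nACH-000001,60,Female\nACH-000002,36,Male\n"

# read_csv accepts a file path or a file-like object such as StringIO.
toy = pd.read_csv(io.StringIO(csv_text))

print(type(toy))
print(toy.shape)  # (2, 3): two rows, three columns
```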

Let's investigate the Dataframe as an object:

@@ -166,7 +166,7 @@ metadata.shape

### What can a Dataframe do (in terms of operations and functions)?

We can use the `head()` and `tail()` functions to look at the first few rows and last few rows of `metadata`, respectively:
We can use the [`.head()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.head.html) and [`.tail()`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.tail.html) methods to look at the first few rows and last few rows of `metadata`, respectively:

```{python}
metadata.head()
@@ -179,7 +179,7 @@ Both of these functions (without input arguments) are considered as **methods**:

Perhaps the most important operation you can do with Dataframes is subsetting them. There are two ways to do it. The first way is to subset by numerical indices, exactly like how we did for lists.

You will use the `iloc` and bracket operations, and you give two slices: one for the row, and one for the column.
You will use the [`iloc`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iloc.html) and bracket operations, and you give two slices: one for the row, and one for the column.

Let's start with a small dataframe to see how it works before returning to `metadata`:
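A minimal sketch of such a small dataframe, with made-up column names and values (not necessarily the ones used in the course), showing the row slice followed by the column slice:

```python
import pandas as pd

# Made-up data for illustration only.
df = pd.DataFrame({
    "status": ["treated", "untreated", "untreated", "treated"],
    "age": [25, 43, 21, 65],
    "score": [4.0, 3.5, 4.5, 2.0],
})

first_two_rows = df.iloc[0:2, :]   # rows 0 and 1, all columns
corner = df.iloc[0:2, 1:3]         # rows 0 and 1, columns 1 and 2
single = df.iloc[0, 1]             # one value: row 0, column 1

print(first_two_rows.shape)    # (2, 3)
print(list(corner.columns))    # ['age', 'score']
print(single)                  # 25
```

As with lists, the end of each slice is exclusive, so `0:2` selects positions 0 and 1.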

20 changes: 10 additions & 10 deletions 03-data-wrangling1.Rmd
@@ -65,7 +65,7 @@ expression.head()
```

| Dataframe | The observation is | Some variables are | Some values are |
|------------------|------------------|-------------------|------------------|
|-----------------|-----------------|--------------------|------------------|
| metadata | Cell line | ModelID, Age, OncotreeLineage | "ACH-000001", 60, "Myeloid" |
| expression | Cell line | KRAS_Exp | 2.4, .3 |
| mutation | Cell line | KRAS_Mut | TRUE, FALSE |
@@ -94,7 +94,7 @@ To subset for rows implicitly, we will use the conditional operators on Datafram
metadata['OncotreeLineage'] == "Lung"
```
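A comparison like the one above does not give a single `True` or `False`; it gives one Boolean per row. Here is a sketch on a made-up dataframe (`toy` and its values are hypothetical) that shows what the result looks like:

```python
import pandas as pd

# Made-up rows for illustration only.
toy = pd.DataFrame({
    "OncotreeLineage": ["Lung", "Myeloid", "Lung"],
    "Age": [60, 36, 72],
})

# Comparing a column to a value yields a Boolean Series, one entry per row.
is_lung = toy["OncotreeLineage"] == "Lung"

print(is_lung.tolist())  # [True, False, True]
print(is_lung.sum())     # 2, since True counts as 1
```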

Then, we will use the `.loc` operation (which is different from the `.iloc` operation!) and subsetting brackets to subset rows and the columns `Age` and `Sex` at the same time:
Then, we will use the [`.loc`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.loc.html) operation (which is different from the [`.iloc`](https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iloc.html) operation!) and subsetting brackets to subset rows and the columns `Age` and `Sex` at the same time:

```{python}
metadata.loc[metadata['OncotreeLineage'] == "Lung", ["Age", "Sex"]]
@@ -126,12 +126,12 @@ Now that your Dataframe has be transformed based on your scientific question, yo

If we look at the data structure of a Dataframe's column, it is actually not a List, but an object called Series. It has methods that can compute summary statistics for us. Let's take a look at a few popular examples:

| Function method | What it takes in | What it does | Returns |
|----------------|----------------|-------------------------|----------------|
| `metadata.Age.mean()` | `metadata.Age` as a numeric Series | Computes the mean value of the `Age` column. | Float (NumPy) |
| `metadata['Age'].median()` | `metadata['Age']` as a numeric Series | Computes the median value of the `Age` column. | Float (NumPy) |
| `metadata.Age.max()` | `metadata.Age` as a numeric Series | Computes the max value of the `Age` column. | Float (NumPy) |
| `metadata.OncotreeSubtype.value_counts()` | `metadata.OncotreeSubtype` as a string Series | Creates a frequency table of all unique elements in `OncotreeSubtype` column. | Series |
| Function method | What it takes in | What it does | Returns |
|----------------|----------------|------------------------|----------------|
| [`metadata.Age.mean()`](https://pandas.pydata.org/docs/reference/api/pandas.Series.mean.html) | `metadata.Age` as a numeric Series | Computes the mean value of the `Age` column. | Float (NumPy) |
| [`metadata['Age'].median()`](https://pandas.pydata.org/docs/reference/api/pandas.Series.median.html) | `metadata['Age']` as a numeric Series | Computes the median value of the `Age` column. | Float (NumPy) |
| [`metadata.Age.max()`](https://pandas.pydata.org/docs/reference/api/pandas.Series.max.html) | `metadata.Age` as a numeric Series | Computes the max value of the `Age` column. | Float (NumPy) |
| [`metadata.OncotreeSubtype.value_counts()`](https://pandas.pydata.org/docs/reference/api/pandas.Series.value_counts.html) | `metadata.OncotreeSubtype` as a string Series | Creates a frequency table of all unique elements in `OncotreeSubtype` column. | Series |
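The same methods can be tried on small standalone Series. The Series names and values below are made up for illustration and are not taken from `metadata`:

```python
import pandas as pd

# Made-up values standing in for the Age and OncotreeSubtype columns.
age = pd.Series([60.0, 36.0, 72.0, 36.0])
subtype = pd.Series(["A", "B", "A", "A"])

mean_age = age.mean()            # (60 + 36 + 72 + 36) / 4
median_age = age.median()        # midpoint of the two middle values, 36 and 60
max_age = age.max()
counts = subtype.value_counts()  # frequency table, returned as a Series

print(mean_age)     # 51.0
print(median_age)   # 48.0
print(max_age)      # 72.0
print(counts["A"])  # 3
```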

Let's try it out, with some nice print formatting:

@@ -144,10 +144,10 @@ Notice that the output of some of these methods are Float (NumPy). This refers t

## Simple data visualization

We will dedicate extensive time later in this course to data visualization, but the Dataframe's column, Series, has a method called `.plot()` that can help us make simple plots for one variable. By default, `.plot()` makes a line plot, which is not necessarily the plot style we want, so we can give the optional argument `kind` a String value to specify the plot style. We use it for making a histogram or bar plot.
We will dedicate extensive time later in this course to data visualization, but the Dataframe's column, Series, has a method called [`.plot()`](https://pandas.pydata.org/docs/reference/api/pandas.Series.plot.html) that can help us make simple plots for one variable. By default, `.plot()` makes a line plot, which is not necessarily the plot style we want, so we can give the optional argument `kind` a String value to specify the plot style. We use it for making a histogram or bar plot.

| Plot style | Useful for | kind = | Code |
|-----------|-----------|-----------|--------------------------------------|
|-------------|-------------|-------------|---------------------------------|
| Histogram | Numerics | "hist" | `metadata.Age.plot(kind = "hist")` |
| Bar plot | Strings | "bar" | `metadata.OncotreeSubtype.value_counts().plot(kind = "bar")` |
