diff --git a/02_Procedural_Python/Lecture-Python-And-Data.ipynb b/02_Procedural_Python/Lecture-Python-And-Data.ipynb new file mode 100644 index 0000000..e6d0bc7 --- /dev/null +++ b/02_Procedural_Python/Lecture-Python-And-Data.ipynb @@ -0,0 +1,3336 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Software Engineering for Data Scientists\n", + "\n", + "## *Manipulating Data with Python*\n", + "## CSE 583" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Today's Objectives\n", + "\n", + "#### 0. Cloning LectureNotes\n", + "\n", + "#### 1. Opening & Navigating the Jupyter Notebook\n", + "\n", + "#### 2. Data type basics\n", + "\n", + "#### 3. Loading data with ``pandas``\n", + "\n", + "#### 4. Cleaning and Manipulating data with ``pandas``\n", + "\n", + "#### 5. Visualizing data with ``pandas`` & ``matplotlib``" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 0. Cloning Lecture Notes\n", + "\n", + "The course materials are maintained on GitHub. The next lecture will discuss GitHub in detail. For now, the minimal instructions below are enough to get access to today's lecture materials.\n", + "\n", + "1. Open a terminal session\n", + "1. Type 'git clone https://github.com/UWSEDS/LectureNotes.git'\n", + "1. Wait until the download is complete" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 1. Opening and Navigating the Jupyter Notebook\n", + "\n", + "We will start today with the interactive environment that we will be using often throughout the course: the [Jupyter Notebook](http://jupyter.org).\n", + "\n", + "We will walk through the following steps together:\n", + "\n", + "1. Download [miniconda](https://conda.io/miniconda.html) (be sure to get Version 3.6) and install it on your system (hopefully you have done this before coming to class)\n", + " ```\n", + " ```\n", + "\n", + "2. 
Use the ``conda`` command-line tool to update your package listing and install the Jupyter notebook:\n", + "\n", + " Update ``conda``'s listing of packages for your system:\n", + " ```\n", + " $ conda update conda\n", + " ```\n", + " \n", + " Install the Jupyter notebook and all its requirements\n", + " ```\n", + " $ conda install jupyter notebook\n", + " ```\n", + " \n", + "3. Navigate to the directory containing the course material. For example:\n", + "\n", + " ```\n", + " $ cd LectureNotes/02_Procedural_Python\n", + " ```\n", + " \n", + " You should see a number of files in the directory, including these:\n", + " \n", + " ```\n", + " $ ls\n", + " \n", + " ```\n", + "\n", + "4. Type ``jupyter notebook`` in the terminal to start the notebook\n", + "\n", + " ```\n", + " $ jupyter notebook\n", + " ```\n", + " \n", + " If everything has worked correctly, it should automatically launch your default browser\n", + " ```\n", + " ```\n", + " \n", + "5. Click on ``Lecture-Python-And-Data.ipynb`` to open the notebook containing the content for this lecture.\n", + "\n", + "With that, you're set up to use the Jupyter notebook!" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 2. Data Type Basics\n", + "\n", + "### 2.1 Data type theory\n", + "- Components with the same capabilities are of the same *type*. \n", + " - For example, the numbers 2 and 200 are both integers.\n", + "- A type is defined recursively. Some examples:\n", + " - A list is a collection of objects that can be indexed by position.\n", + " - A list of integers contains an integer at each position.\n", + "- A type has a set of supported operations. For example:\n", + " - Integers can be added\n", + " - Strings can be concatenated\n", + " - A table can find the names of its columns\n", + " - What type is returned from the operation?\n", + "- In Python, members (components and operations) are indicated by a '.'\n", + " - If `a` is a list, then `a.append(1)` adds `1` to the list." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.2 Primitive types\n", + "\n", + "The primitive types are integers, floats, strings, and booleans." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 2.2.1 Integers" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "2" + ] + }, + "execution_count": 1, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Integer arithmetic\n", + "1 + 1" + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "1 1.5\n" + ] + } + ], + "source": [ + "# Integer division versus floating point division\n", + "print(6 // 4, 6 / 4)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 2.2.2 Floats" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "18.0 -2.4492935982947064e-16\n" + ] + } + ], + "source": [ + "# Floats support the full set of \"calculator functions\", but these need the numpy package\n", + "import numpy as np\n", + "print(6.0 * 3, np.sin(2*np.pi))" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "nan" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "# Floats can have a null value called nan (\"not a number\")\n", + "a = np.nan\n", + "3*a" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### 2.2.3 Strings" + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [], + "source": [ + "# Can concatenate, substring, find, count, ..." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Concatenation: The lazybrown fox\n", + "First three letters: The\n", + "Index of 'z': 6\n" + ] + } + ], + "source": [ + "a = \"The lazy\"\n", + "b = \"brown fox\"\n", + "print(\"Concatenation: \", a + b)\n", + "print(\"First three letters: \" + a[0:3])\n", + "print(\"Index of 'z': \" + str(a.find('z')))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.3 Tuples\n", + "A tuple is an ordered sequence of objects. Tuples cannot be changed; they are immutable." + ] + }, + { + "cell_type": "code", + "execution_count": 48, + "metadata": {}, + "outputs": [], + "source": [ + "a_tuple = (1, 'a', [1,2])" + ] + }, + { + "cell_type": "code", + "execution_count": 49, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1" + ] + }, + "execution_count": 49, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "a_tuple[0]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.4 Lists\n", + "A list is an ordered sequence of objects that can be changed." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 7, + "metadata": {}, + "outputs": [], + "source": [ + "a_list = [1, 'a', [1,2]]" + ] + }, + { + "cell_type": "code", + "execution_count": 50, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1" + ] + }, + "execution_count": 50, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "a_list[0]" + ] + }, + { + "cell_type": "code", + "execution_count": 8, + "metadata": {}, + "outputs": [], + "source": [ + "a_list.append(2)" + ] + }, + { + "cell_type": "code", + "execution_count": 9, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "[1, 'a', [1, 2], 2]" + ] + }, + "execution_count": 9, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "a_list" + ] + }, + { + "cell_type": "code", + "execution_count": 10, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['__add__',\n", + " '__class__',\n", + " '__contains__',\n", + " '__delattr__',\n", + " '__delitem__',\n", + " '__dir__',\n", + " '__doc__',\n", + " '__eq__',\n", + " '__format__',\n", + " '__ge__',\n", + " '__getattribute__',\n", + " '__getitem__',\n", + " '__gt__',\n", + " '__hash__',\n", + " '__iadd__',\n", + " '__imul__',\n", + " '__init__',\n", + " '__init_subclass__',\n", + " '__iter__',\n", + " '__le__',\n", + " '__len__',\n", + " '__lt__',\n", + " '__mul__',\n", + " '__ne__',\n", + " '__new__',\n", + " '__reduce__',\n", + " '__reduce_ex__',\n", + " '__repr__',\n", + " '__reversed__',\n", + " '__rmul__',\n", + " '__setattr__',\n", + " '__setitem__',\n", + " '__sizeof__',\n", + " '__str__',\n", + " '__subclasshook__',\n", + " 'append',\n", + " 'clear',\n", + " 'copy',\n", + " 'count',\n", + " 'extend',\n", + " 'index',\n", + " 'insert',\n", + " 'pop',\n", + " 'remove',\n", + " 'reverse',\n", + " 'sort']" + ] + }, + "execution_count": 10, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dir(a_list)" + ] + }, + { + 
"cell_type": "code", + "execution_count": 11, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "1" + ] + }, + "execution_count": 11, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "a_list.count(1)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "collapsed": true + }, + "source": [ + "### 2.5 Dictionaries\n", + "A dictionary associates a *key* with a *value*. A value can be any object, even another dictionary." + ] + }, + { + "cell_type": "code", + "execution_count": 12, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "{'Dave': 'Cake', 'Joe': ['Cake', 'Pie']}\n" + ] + } + ], + "source": [ + "dessert_dict = {} # Empty dictionary\n", + "dessert_dict['Dave'] = \"Cake\"\n", + "dessert_dict[\"Joe\"] = [\"Cake\", \"Pie\"]\n", + "print(dessert_dict)" + ] + }, + { + "cell_type": "code", + "execution_count": 13, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'Cake'" + ] + }, + "execution_count": 13, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dessert_dict[\"Dave\"]" + ] + }, + { + "cell_type": "code", + "execution_count": 14, + "metadata": {}, + "outputs": [], + "source": [ + "# This produces a KeyError because the key is not in the dictionary\n", + "#dessert_dict[\"Bernease\"]" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### 2.6 Summary\n", + "
\n", + "\n", + "| type | description |\n", + "|------|------------|\n", + "| primitive | int, float, string, bool |\n", + "| tuple | An immutable collection of ordered objects |\n", + "| list | A mutable collection of ordered objects |\n", + "| dictionary | A mutable collection of named objects |\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 3. Loading data with ``pandas``\n", + "\n", + "With this simple Python computation experience under our belt, we can now move to doing some more interesting analysis." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Python's Data Science Ecosystem\n", + "\n", + "In addition to Python's built-in modules like the ``math`` module we explored above, there are also many often-used third-party modules that are core tools for doing data science with Python.\n", + "Some of the most important ones are:\n", + "\n", + "#### [``numpy``](http://numpy.org/): Numerical Python\n", + "\n", + "Numpy is short for \"Numerical Python\", and contains tools for efficient manipulation of arrays of data.\n", + "If you have used other computational tools like IDL or MatLab, Numpy should feel very familiar.\n", + "\n", + "#### [``scipy``](http://scipy.org/): Scientific Python\n", + "\n", + "Scipy is short for \"Scientific Python\", and contains a wide range of functionality for accomplishing common scientific tasks, such as optimization/minimization, numerical integration, interpolation, and much more.\n", + "We will not look closely at Scipy today, but we will use its functionality later in the course.\n", + "\n", + "#### [``pandas``](http://pandas.pydata.org/): Labeled Data Manipulation in Python\n", + "\n", + "Pandas is short for \"Panel Data\", and contains tools for doing more advanced manipulation of labeled data in Python, in particular with a columnar data structure called a *Data Frame*.\n", + "If you've used the [R](http://rstats.org) statistical language (and in particular the 
so-called \"Hadley Stack\"), much of the functionality in Pandas should feel very familiar.\n", + "\n", + "#### [``matplotlib``](http://matplotlib.org): Visualization in Python\n", + "\n", + "Matplotlib started out as a Matlab plotting clone in Python, and has grown from there in the 15 years since its creation. It is the most popular data visualization tool currently in the Python data world (though other recent packages are starting to encroach on its monopoly)." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Installing Pandas & friends\n", + "\n", + "Because the above packages are not included in Python itself, you need to install them separately. While it is possible to install these from source (compiling the C and/or Fortran code that does the heavy lifting under the hood) it is much easier to use a package manager like ``conda``. All it takes is to run\n", + "\n", + "```\n", + "$ conda install numpy scipy pandas matplotlib\n", + "```\n", + "\n", + "and (so long as your conda setup is working) the packages will be downloaded and installed on your system." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Downloading the data\n", + "\n", + "Shell commands can be run from the notebook by preceding them with an exclamation point:" + ] + }, + { + "cell_type": "code", + "execution_count": 15, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "01-Course-Introduction-And-Data-Essentials.ppt\r\n", + "02-Python-and-Data.pdf\r\n", + "04-Procedural_Python.ipynb\r\n", + "2015_trip_data.csv\r\n", + "Breakout-Simple-Math.ipynb\r\n", + "(Completed)Breakout-Simple-Math.ipynb\r\n", + "(Completed)Lecture-Python-And-Data.ipynb\r\n", + "Lecture-Python-And-Data-Autum-2016.ipynb\r\n", + "Lecture-Python-And-Data-Autumn-2017.ipynb\r\n", + "Lecture-Python-And-Data-CSE515A.ipynb\r\n", + "Lecture-Python-and-Data.ipynb\r\n", + "Lecture-Python-And-Data.ipynb\r\n", + "Play With Notebooks.ipynb\r\n", + "procedural_programming_in_python.ipynb\r\n", + "pronto.csv\r\n", + "__pycache__\r\n", + "split_apply_combine.png\r\n", + "table_modifiers.py\r\n" + ] + } + ], + "source": [ + "!ls" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Uncomment this to download the data:" + ] + }, + { + "cell_type": "code", + "execution_count": 16, + "metadata": {}, + "outputs": [], + "source": [ + "#!curl -o pronto.csv 'https://data.seattle.gov/api/views/tw7j-dfaw/rows.csv?accessType=DOWNLOAD'" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Loading Data with Pandas" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Because we'll use pandas so much, we import it under a shortened name using the ``import ... 
as ...`` pattern:" + ] + }, + { + "cell_type": "code", + "execution_count": 17, + "metadata": {}, + "outputs": [], + "source": [ + "import pandas as pd\n", + "df = pd.read_csv('pronto.csv')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we can use the ``read_csv`` command to read the comma-separated-value data:" + ] + }, + { + "cell_type": "code", + "execution_count": 18, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "01-Course-Introduction-And-Data-Essentials.ppt\r\n", + "02-Python-and-Data.pdf\r\n", + "04-Procedural_Python.ipynb\r\n", + "2015_trip_data.csv\r\n", + "Breakout-Simple-Math.ipynb\r\n", + "(Completed)Breakout-Simple-Math.ipynb\r\n", + "(Completed)Lecture-Python-And-Data.ipynb\r\n", + "Lecture-Python-And-Data-Autum-2016.ipynb\r\n", + "Lecture-Python-And-Data-Autumn-2017.ipynb\r\n", + "Lecture-Python-And-Data-CSE515A.ipynb\r\n", + "Lecture-Python-and-Data.ipynb\r\n", + "Lecture-Python-And-Data.ipynb\r\n", + "Play With Notebooks.ipynb\r\n", + "procedural_programming_in_python.ipynb\r\n", + "pronto.csv\r\n", + "__pycache__\r\n", + "split_apply_combine.png\r\n", + "table_modifiers.py\r\n" + ] + } + ], + "source": [ + "!ls" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "*Note: strings in Python can be defined either with double quotes or single quotes*" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Viewing Pandas Dataframes" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The ``head()`` and ``tail()`` methods show us the first and last rows of the data" + ] + }, + { + "cell_type": "code", + "execution_count": 19, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
[HTML rendering of the df.head() table omitted; the same five rows appear in the text/plain output below]
" + ], + "text/plain": [ + " trip_id starttime stoptime bikeid \\\n", + "0 431 10/13/2014 10:31:00 AM 10/13/2014 10:48:00 AM SEA00298 \n", + "1 432 10/13/2014 10:32:00 AM 10/13/2014 10:48:00 AM SEA00195 \n", + "2 433 10/13/2014 10:33:00 AM 10/13/2014 10:48:00 AM SEA00486 \n", + "3 434 10/13/2014 10:34:00 AM 10/13/2014 10:48:00 AM SEA00333 \n", + "4 435 10/13/2014 10:34:00 AM 10/13/2014 10:49:00 AM SEA00202 \n", + "\n", + " tripduration from_station_name \\\n", + "0 985.935 2nd Ave & Spring St \n", + "1 926.375 2nd Ave & Spring St \n", + "2 883.831 2nd Ave & Spring St \n", + "3 865.937 2nd Ave & Spring St \n", + "4 923.923 2nd Ave & Spring St \n", + "\n", + " to_station_name from_station_id \\\n", + "0 Occidental Park / Occidental Ave S & S Washing... CBD-06 \n", + "1 Occidental Park / Occidental Ave S & S Washing... CBD-06 \n", + "2 Occidental Park / Occidental Ave S & S Washing... CBD-06 \n", + "3 Occidental Park / Occidental Ave S & S Washing... CBD-06 \n", + "4 Occidental Park / Occidental Ave S & S Washing... 
CBD-06 \n", + "\n", + " to_station_id usertype gender birthyear \n", + "0 PS-04 Member Male 1960.0 \n", + "1 PS-04 Member Male 1970.0 \n", + "2 PS-04 Member Female 1988.0 \n", + "3 PS-04 Member Female 1977.0 \n", + "4 PS-04 Member Male 1971.0 " + ] + }, + "execution_count": 19, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 20, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "Index(['trip_id', 'starttime', 'stoptime', 'bikeid', 'tripduration',\n", + " 'from_station_name', 'to_station_name', 'from_station_id',\n", + " 'to_station_id', 'usertype', 'gender', 'birthyear'],\n", + " dtype='object')" + ] + }, + "execution_count": 20, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.columns" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The ``shape`` attribute gives the number of rows and columns:" + ] + }, + { + "cell_type": "code", + "execution_count": 21, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "(275091, 12)" + ] + }, + "execution_count": 21, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.shape" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The ``columns`` attribute gives us the column names" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The ``index`` attribute gives us the index names" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The ``dtypes`` attribute gives the data types of each column:" + ] + }, + { + "cell_type": "code", 
+ "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "trip_id int64\n", + "starttime object\n", + "stoptime object\n", + "bikeid object\n", + "tripduration float64\n", + "from_station_name object\n", + "to_station_name object\n", + "from_station_id object\n", + "to_station_id object\n", + "usertype object\n", + "gender object\n", + "birthyear float64\n", + "dtype: object" + ] + }, + "execution_count": 22, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.dtypes" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 4. Manipulating data with ``pandas``\n", + "\n", + "Here we'll cover some key features of manipulating data with pandas" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Access columns by name using square-bracket indexing:" + ] + }, + { + "cell_type": "code", + "execution_count": 23, + "metadata": {}, + "outputs": [], + "source": [ + "df_small = df[ 'stoptime']" + ] + }, + { + "cell_type": "code", + "execution_count": 24, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "pandas.core.series.Series" + ] + }, + "execution_count": 24, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "type(df_small)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Mathematical operations on columns happen *element-wise*:" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "0 0.273871\n", + "1 0.257326\n", + "2 0.245509\n", + "Name: tripduration, dtype: float64" + ] + }, + "execution_count": 25, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "trip_duration_hours = df['tripduration']/3600\n", + "trip_duration_hours[:3]" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [], + "source": [ + "df['trip_duration_hours'] = 
df['tripduration']/3600" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [], + "source": [ + "del df['trip_duration_hours']" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
[HTML rendering of the df.head() table omitted; the same five rows appear in the text/plain output below]
" + ], + "text/plain": [ + " trip_id starttime stoptime bikeid \\\n", + "0 431 10/13/2014 10:31:00 AM 10/13/2014 10:48:00 AM SEA00298 \n", + "1 432 10/13/2014 10:32:00 AM 10/13/2014 10:48:00 AM SEA00195 \n", + "2 433 10/13/2014 10:33:00 AM 10/13/2014 10:48:00 AM SEA00486 \n", + "3 434 10/13/2014 10:34:00 AM 10/13/2014 10:48:00 AM SEA00333 \n", + "4 435 10/13/2014 10:34:00 AM 10/13/2014 10:49:00 AM SEA00202 \n", + "\n", + " tripduration from_station_name \\\n", + "0 985.935 2nd Ave & Spring St \n", + "1 926.375 2nd Ave & Spring St \n", + "2 883.831 2nd Ave & Spring St \n", + "3 865.937 2nd Ave & Spring St \n", + "4 923.923 2nd Ave & Spring St \n", + "\n", + " to_station_name from_station_id \\\n", + "0 Occidental Park / Occidental Ave S & S Washing... CBD-06 \n", + "1 Occidental Park / Occidental Ave S & S Washing... CBD-06 \n", + "2 Occidental Park / Occidental Ave S & S Washing... CBD-06 \n", + "3 Occidental Park / Occidental Ave S & S Washing... CBD-06 \n", + "4 Occidental Park / Occidental Ave S & S Washing... CBD-06 \n", + "\n", + " to_station_id usertype gender birthyear \n", + "0 PS-04 Member Male 1960.0 \n", + "1 PS-04 Member Male 1970.0 \n", + "2 PS-04 Member Female 1988.0 \n", + "3 PS-04 Member Female 1977.0 \n", + "4 PS-04 Member Male 1971.0 " + ] + }, + "execution_count": 28, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
[HTML rendering of the df.loc[[0,1],:] table omitted; the same two rows appear in the text/plain output below]
" + ], + "text/plain": [ + " trip_id starttime stoptime bikeid \\\n", + "0 431 10/13/2014 10:31:00 AM 10/13/2014 10:48:00 AM SEA00298 \n", + "1 432 10/13/2014 10:32:00 AM 10/13/2014 10:48:00 AM SEA00195 \n", + "\n", + " tripduration from_station_name \\\n", + "0 985.935 2nd Ave & Spring St \n", + "1 926.375 2nd Ave & Spring St \n", + "\n", + " to_station_name from_station_id \\\n", + "0 Occidental Park / Occidental Ave S & S Washing... CBD-06 \n", + "1 Occidental Park / Occidental Ave S & S Washing... CBD-06 \n", + "\n", + " to_station_id usertype gender birthyear \n", + "0 PS-04 Member Male 1960.0 \n", + "1 PS-04 Member Male 1970.0 " + ] + }, + "execution_count": 29, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.loc[[0,1],:]" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [], + "source": [ + "df_long_trips = df[df['tripduration'] >10000]" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [], + "source": [ + "sel = df['tripduration'] >10000 \n", + "df_long_trips = df[sel]" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "275091" + ] + }, + "execution_count": 32, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "len(df)" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [], + "source": [ + "# Make a copy of a slice\n", + "df_subset = df[['starttime', 'stoptime']].copy()\n", + "df_subset['trip_hours'] = df['tripduration']/3600" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Columns can be created (or overwritten) with the assignment operator.\n", + "Let's create a *tripminutes* column with the number of minutes for each trip" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": 
"markdown", + "metadata": {}, + "source": [ + "More complicated mathematical operations can be done with tools in the ``numpy`` package:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Working with Times" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "One trick to know is that the Pandas ``DatetimeIndex`` provides a nice interface for working with columns of times.\n", + "\n", + "For a dataset of this size, using ``pd.to_datetime`` and specifying the date format can make things much faster (from the [strftime reference](http://strftime.org/), we see that the pronto data has format ``\"%m/%d/%Y %I:%M:%S %p\"``)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "(Note: you can also use ``infer_datetime_format=True`` in most cases to automatically infer the correct format, though due to a bug it doesn't work when AM/PM are present)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "With a ``DatetimeIndex``, we can extract the hour of the day, the day of the week, the month, and a wide range of other views of the time:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Simple Grouping of Data\n", + "\n", + "The real power of Pandas comes in its tools for grouping and aggregating data. Here we'll look at *value counts* and the basics of *group-by* operations." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### Value Counts" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Pandas includes an array of useful functionality for manipulating and analyzing tabular data.\n", + "We'll take a look at two of these here." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The ``value_counts()`` method returns the number of times each unique value occurs in a column.\n", + "\n", + "We can use it, for example, to break down rides by gender:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Or to break down rides by birth year:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "By default, the result is sorted by count rather than by index. Use ``sort=False`` to turn this behavior off:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "We can explore other things as well: day of week, hour of day, etc." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Group-by Operation\n", + "\n", + "One of the killer features of the Pandas dataframe is the ability to do group-by operations.\n", + "You can visualize the group-by like this (image borrowed from the [Python Data Science Handbook](http://shop.oreilly.com/product/0636920034919.do))." + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr><th></th><th>trip_id</th><th>starttime</th><th>stoptime</th><th>bikeid</th><th>tripduration</th><th>from_station_name</th><th>to_station_name</th><th>from_station_id</th><th>to_station_id</th><th>usertype</th><th>gender</th><th>birthyear</th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>0</th><td>431</td><td>10/13/2014 10:31:00 AM</td><td>10/13/2014 10:48:00 AM</td><td>SEA00298</td><td>985.935</td><td>2nd Ave &amp; Spring St</td><td>Occidental Park / Occidental Ave S &amp; S Washing...</td><td>CBD-06</td><td>PS-04</td><td>Member</td><td>Male</td><td>1960.0</td></tr>\n",
+ "    <tr><th>1</th><td>432</td><td>10/13/2014 10:32:00 AM</td><td>10/13/2014 10:48:00 AM</td><td>SEA00195</td><td>926.375</td><td>2nd Ave &amp; Spring St</td><td>Occidental Park / Occidental Ave S &amp; S Washing...</td><td>CBD-06</td><td>PS-04</td><td>Member</td><td>Male</td><td>1970.0</td></tr>\n",
+ "    <tr><th>2</th><td>433</td><td>10/13/2014 10:33:00 AM</td><td>10/13/2014 10:48:00 AM</td><td>SEA00486</td><td>883.831</td><td>2nd Ave &amp; Spring St</td><td>Occidental Park / Occidental Ave S &amp; S Washing...</td><td>CBD-06</td><td>PS-04</td><td>Member</td><td>Female</td><td>1988.0</td></tr>\n",
+ "    <tr><th>3</th><td>434</td><td>10/13/2014 10:34:00 AM</td><td>10/13/2014 10:48:00 AM</td><td>SEA00333</td><td>865.937</td><td>2nd Ave &amp; Spring St</td><td>Occidental Park / Occidental Ave S &amp; S Washing...</td><td>CBD-06</td><td>PS-04</td><td>Member</td><td>Female</td><td>1977.0</td></tr>\n",
+ "    <tr><th>4</th><td>435</td><td>10/13/2014 10:34:00 AM</td><td>10/13/2014 10:49:00 AM</td><td>SEA00202</td><td>923.923</td><td>2nd Ave &amp; Spring St</td><td>Occidental Park / Occidental Ave S &amp; S Washing...</td><td>CBD-06</td><td>PS-04</td><td>Member</td><td>Male</td><td>1971.0</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>
" + ], + "text/plain": [ + " trip_id starttime stoptime bikeid \\\n", + "0 431 10/13/2014 10:31:00 AM 10/13/2014 10:48:00 AM SEA00298 \n", + "1 432 10/13/2014 10:32:00 AM 10/13/2014 10:48:00 AM SEA00195 \n", + "2 433 10/13/2014 10:33:00 AM 10/13/2014 10:48:00 AM SEA00486 \n", + "3 434 10/13/2014 10:34:00 AM 10/13/2014 10:48:00 AM SEA00333 \n", + "4 435 10/13/2014 10:34:00 AM 10/13/2014 10:49:00 AM SEA00202 \n", + "\n", + " tripduration from_station_name \\\n", + "0 985.935 2nd Ave & Spring St \n", + "1 926.375 2nd Ave & Spring St \n", + "2 883.831 2nd Ave & Spring St \n", + "3 865.937 2nd Ave & Spring St \n", + "4 923.923 2nd Ave & Spring St \n", + "\n", + " to_station_name from_station_id \\\n", + "0 Occidental Park / Occidental Ave S & S Washing... CBD-06 \n", + "1 Occidental Park / Occidental Ave S & S Washing... CBD-06 \n", + "2 Occidental Park / Occidental Ave S & S Washing... CBD-06 \n", + "3 Occidental Park / Occidental Ave S & S Washing... CBD-06 \n", + "4 Occidental Park / Occidental Ave S & S Washing... CBD-06 \n", + "\n", + " to_station_id usertype gender birthyear \n", + "0 PS-04 Member Male 1960.0 \n", + "1 PS-04 Member Male 1970.0 \n", + "2 PS-04 Member Female 1988.0 \n", + "3 PS-04 Member Female 1977.0 \n", + "4 PS-04 Member Male 1971.0 " + ] + }, + "execution_count": 34, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 35, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr><th></th><th>trip_id</th><th>starttime</th><th>stoptime</th><th>bikeid</th><th>tripduration</th><th>from_station_name</th><th>to_station_name</th><th>to_station_id</th><th>usertype</th><th>gender</th><th>birthyear</th></tr>\n",
+ "    <tr><th>from_station_id</th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th><th></th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>BT-01</th><td>10463</td><td>10463</td><td>10463</td><td>10463</td><td>10463</td><td>10463</td><td>10463</td><td>10463</td><td>10463</td><td>4162</td><td>4162</td></tr>\n",
+ "    <tr><th>BT-03</th><td>7334</td><td>7334</td><td>7334</td><td>7334</td><td>7334</td><td>7334</td><td>7334</td><td>7334</td><td>7334</td><td>4862</td><td>4862</td></tr>\n",
+ "    <tr><th>BT-04</th><td>4666</td><td>4666</td><td>4666</td><td>4666</td><td>4666</td><td>4666</td><td>4666</td><td>4666</td><td>4666</td><td>3424</td><td>3424</td></tr>\n",
+ "    <tr><th>BT-05</th><td>5699</td><td>5699</td><td>5699</td><td>5699</td><td>5699</td><td>5699</td><td>5699</td><td>5699</td><td>5699</td><td>2975</td><td>2975</td></tr>\n",
+ "    <tr><th>BT-06</th><td>150</td><td>150</td><td>150</td><td>150</td><td>150</td><td>150</td><td>150</td><td>150</td><td>150</td><td>130</td><td>130</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>
" + ], + "text/plain": [ + " trip_id starttime stoptime bikeid tripduration \\\n", + "from_station_id \n", + "BT-01 10463 10463 10463 10463 10463 \n", + "BT-03 7334 7334 7334 7334 7334 \n", + "BT-04 4666 4666 4666 4666 4666 \n", + "BT-05 5699 5699 5699 5699 5699 \n", + "BT-06 150 150 150 150 150 \n", + "\n", + " from_station_name to_station_name to_station_id usertype \\\n", + "from_station_id \n", + "BT-01 10463 10463 10463 10463 \n", + "BT-03 7334 7334 7334 7334 \n", + "BT-04 4666 4666 4666 4666 \n", + "BT-05 5699 5699 5699 5699 \n", + "BT-06 150 150 150 150 \n", + "\n", + " gender birthyear \n", + "from_station_id \n", + "BT-01 4162 4162 \n", + "BT-03 4862 4862 \n", + "BT-04 3424 3424 \n", + "BT-05 2975 2975 \n", + "BT-06 130 130 " + ] + }, + "execution_count": 35, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_count = df.groupby(['from_station_id']).count()\n", + "df_count.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 36, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr><th></th><th>count</th><th>new</th></tr>\n",
+ "    <tr><th>from_station_id</th><th></th><th></th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>BT-01</th><td>10463</td><td>1</td></tr>\n",
+ "    <tr><th>BT-03</th><td>7334</td><td>1</td></tr>\n",
+ "    <tr><th>BT-04</th><td>4666</td><td>1</td></tr>\n",
+ "    <tr><th>BT-05</th><td>5699</td><td>1</td></tr>\n",
+ "    <tr><th>BT-06</th><td>150</td><td>1</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>
" + ], + "text/plain": [ + " count new\n", + "from_station_id \n", + "BT-01 10463 1\n", + "BT-03 7334 1\n", + "BT-04 4666 1\n", + "BT-05 5699 1\n", + "BT-06 150 1" + ] + }, + "execution_count": 36, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_count1 = df_count[['trip_id']]\n", + "df_count2 = df_count1.rename(columns={'trip_id': 'count'})\n", + "df_count2['new'] = 1\n", + "df_count2.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 37, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr><th></th><th>trip_id</th><th>tripduration</th><th>birthyear</th></tr>\n",
+ "    <tr><th>from_station_id</th><th></th><th></th><th></th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>BT-01</th><td>147831.009844</td><td>1375.031203</td><td>1980.131427</td></tr>\n",
+ "    <tr><th>BT-03</th><td>139404.294655</td><td>1019.200684</td><td>1976.505142</td></tr>\n",
+ "    <tr><th>BT-04</th><td>157992.809687</td><td>891.095897</td><td>1979.877044</td></tr>\n",
+ "    <tr><th>BT-05</th><td>139283.572381</td><td>1199.949481</td><td>1975.937479</td></tr>\n",
+ "    <tr><th>BT-06</th><td>291807.953333</td><td>659.770547</td><td>1975.830769</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>
" + ], + "text/plain": [ + " trip_id tripduration birthyear\n", + "from_station_id \n", + "BT-01 147831.009844 1375.031203 1980.131427\n", + "BT-03 139404.294655 1019.200684 1976.505142\n", + "BT-04 157992.809687 891.095897 1979.877044\n", + "BT-05 139283.572381 1199.949481 1975.937479\n", + "BT-06 291807.953333 659.770547 1975.830769" + ] + }, + "execution_count": 37, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_mean = df.groupby(['from_station_id']).mean()\n", + "df_mean.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 38, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "{'BT-01': Int64Index([ 217, 227, 228, 282, 283, 310, 326, 327,\n", + " 329, 331,\n", + " ...\n", + " 274971, 274973, 274974, 274975, 274976, 274979, 275032, 275033,\n", + " 275075, 275076],\n", + " dtype='int64', length=10463),\n", + " 'BT-03': Int64Index([ 87, 88, 230, 261, 366, 407, 414, 439,\n", + " 453, 754,\n", + " ...\n", + " 268122, 268181, 268307, 268318, 268319, 268391, 268392, 268467,\n", + " 268527, 268528],\n", + " dtype='int64', length=7334),\n", + " 'BT-04': Int64Index([ 66, 67, 94, 104, 108, 166, 233, 259,\n", + " 322, 333,\n", + " ...\n", + " 274350, 274361, 274424, 274704, 274789, 274970, 275009, 275064,\n", + " 275065, 275083],\n", + " dtype='int64', length=4666),\n", + " 'BT-05': Int64Index([ 110, 413, 426, 513, 585, 618, 744, 753,\n", + " 795, 1003,\n", + " ...\n", + " 274547, 274605, 274610, 274621, 274817, 274847, 274910, 274911,\n", + " 275029, 275034],\n", + " dtype='int64', length=5699),\n", + " 'BT-06': Int64Index([268581, 268642, 268667, 268718, 268735, 268781, 268897, 268903,\n", + " 268914, 268961,\n", + " ...\n", + " 274777, 274778, 274865, 274931, 274937, 274949, 274951, 274956,\n", + " 274962, 274965],\n", + " dtype='int64', length=150),\n", + " 'CBD-03': Int64Index([ 118, 119, 164, 229, 275, 285, 328, 339,\n", + " 356, 357,\n", + " ...\n", + " 274574, 274594, 274599, 274622, 274672, 274743, 
274774, 274810,\n", + " 274915, 275053],\n", + " dtype='int64', length=4822),\n", + " 'CBD-04': Int64Index([105392, 105458, 105467, 105472, 105614, 105615, 105835, 105836,\n", + " 105855, 105858,\n", + " ...\n", + " 274730, 274922, 274924, 274925, 274926, 274927, 274952, 274958,\n", + " 274983, 275080],\n", + " dtype='int64', length=3440),\n", + " 'CBD-05': Int64Index([ 54, 79, 95, 148, 149, 150, 151, 165,\n", + " 211, 219,\n", + " ...\n", + " 274081, 274085, 274086, 274138, 274139, 274623, 274624, 274692,\n", + " 274814, 274906],\n", + " dtype='int64', length=5068),\n", + " 'CBD-06': Int64Index([ 0, 1, 2, 3, 4, 5, 63, 68,\n", + " 70, 71,\n", + " ...\n", + " 274534, 274544, 274570, 274671, 274744, 274765, 274797, 274836,\n", + " 274838, 274980],\n", + " dtype='int64', length=4911),\n", + " 'CBD-07': Int64Index([ 42, 69, 78, 141, 196, 269, 478, 510,\n", + " 522, 542,\n", + " ...\n", + " 274317, 274413, 274439, 274467, 274579, 274673, 274688, 274752,\n", + " 274879, 274907],\n", + " dtype='int64', length=3263),\n", + " 'CBD-13': Int64Index([ 99, 139, 198, 249, 276, 334, 381, 388,\n", + " 424, 442,\n", + " ...\n", + " 274444, 274535, 274555, 274613, 274726, 274780, 274903, 274916,\n", + " 274966, 274968],\n", + " dtype='int64', length=9067),\n", + " 'CD-01': Int64Index([ 68531, 68532, 69169, 69170, 69954, 70367, 70529, 70546,\n", + " 70621, 70622,\n", + " ...\n", + " 224878, 225305, 225422, 225427, 225492, 225493, 225587, 225666,\n", + " 226277, 226597],\n", + " dtype='int64', length=958),\n", + " 'CH-01': Int64Index([ 256, 355, 382, 416, 437, 444, 502, 503,\n", + " 600, 679,\n", + " ...\n", + " 274585, 274632, 274641, 274656, 274670, 274827, 274829, 274871,\n", + " 274933, 274995],\n", + " dtype='int64', length=6409),\n", + " 'CH-02': Int64Index([ 55, 56, 58, 83, 113, 126, 127, 154,\n", + " 162, 163,\n", + " ...\n", + " 274450, 274488, 274593, 274660, 274668, 274868, 274917, 275067,\n", + " 275081, 275082],\n", + " dtype='int64', length=8546),\n", + " 'CH-03': 
Int64Index([ 290, 417, 428, 435, 436, 452, 494, 516,\n", + " 640, 665,\n", + " ...\n", + " 274625, 274650, 274658, 274693, 274822, 274844, 274872, 274893,\n", + " 274894, 274950],\n", + " dtype='int64', length=6218),\n", + " 'CH-05': Int64Index([ 134, 195, 205, 248, 250, 251, 315, 324,\n", + " 337, 390,\n", + " ...\n", + " 274457, 274597, 274714, 274850, 274855, 274873, 274889, 274932,\n", + " 274936, 274993],\n", + " dtype='int64', length=6948),\n", + " 'CH-06': Int64Index([ 212, 253, 277, 278, 279, 403, 449, 504,\n", + " 684, 881,\n", + " ...\n", + " 274679, 274745, 274746, 274750, 274751, 274781, 274788, 274834,\n", + " 274839, 274875],\n", + " dtype='int64', length=3765),\n", + " 'CH-07': Int64Index([ 146, 210, 299, 341, 374, 377, 401, 415,\n", + " 431, 466,\n", + " ...\n", + " 274832, 274846, 274890, 274891, 274892, 274955, 275002, 275049,\n", + " 275060, 275077],\n", + " dtype='int64', length=11568),\n", + " 'CH-08': Int64Index([ 120, 136, 144, 158, 159, 242, 262, 294,\n", + " 311, 321,\n", + " ...\n", + " 274824, 274848, 274853, 274904, 274905, 274935, 274943, 274982,\n", + " 275006, 275090],\n", + " dtype='int64', length=8573),\n", + " 'CH-09': Int64Index([ 101, 168, 222, 349, 380, 467, 567, 578,\n", + " 628, 647,\n", + " ...\n", + " 274357, 274466, 274468, 274475, 274500, 274595, 274611, 274732,\n", + " 274895, 275066],\n", + " dtype='int64', length=5246),\n", + " 'CH-12': Int64Index([ 319, 384, 411, 441, 451, 462, 540, 554,\n", + " 558, 605,\n", + " ...\n", + " 274577, 274603, 274609, 274615, 274631, 274680, 275050, 275056,\n", + " 275088, 275089],\n", + " dtype='int64', length=5857),\n", + " 'CH-15': Int64Index([ 109, 160, 244, 340, 402, 430, 459, 468,\n", + " 723, 724,\n", + " ...\n", + " 274674, 274756, 274791, 274812, 274826, 274840, 274852, 274857,\n", + " 274921, 274969],\n", + " dtype='int64', length=6550),\n", + " 'CH-16': Int64Index([175075, 175093, 175108, 175126, 175127, 175131, 175136, 175144,\n", + " 175145, 175210,\n", + " ...\n", + " 
274629, 274805, 274816, 274825, 274854, 274920, 275035, 275042,\n", + " 275051, 275072],\n", + " dtype='int64', length=2089),\n", + " 'DPD-01': Int64Index([ 59, 91, 93, 289, 568, 644, 667, 740,\n", + " 839, 974,\n", + " ...\n", + " 274530, 274604, 274665, 274681, 274798, 274823, 274939, 274957,\n", + " 274961, 275026],\n", + " dtype='int64', length=4822),\n", + " 'DPD-03': Int64Index([ 131, 197, 345, 347, 727, 844, 1075, 1144,\n", + " 1347, 1430,\n", + " ...\n", + " 273047, 273048, 274048, 274092, 274776, 274800, 275001, 275003,\n", + " 275004, 275005],\n", + " dtype='int64', length=1423),\n", + " 'EL-01': Int64Index([ 199, 400, 700, 702, 715, 716, 769, 1175,\n", + " 1350, 1351,\n", + " ...\n", + " 274676, 274718, 274731, 274770, 274837, 274928, 274929, 274948,\n", + " 274959, 275038],\n", + " dtype='int64', length=3604),\n", + " 'EL-03': Int64Index([ 344, 358, 360, 425, 492, 583, 927, 1027,\n", + " 1071, 1110,\n", + " ...\n", + " 274757, 274861, 274862, 274882, 274984, 274988, 274990, 274994,\n", + " 275031, 275052],\n", + " dtype='int64', length=5788),\n", + " 'EL-05': Int64Index([ 200, 201, 447, 456, 488, 615, 646, 694,\n", + " 763, 858,\n", + " ...\n", + " 274019, 274157, 274162, 274253, 274368, 274477, 274584, 274725,\n", + " 274877, 274878],\n", + " dtype='int64', length=3400),\n", + " 'FH-01': Int64Index([ 100, 231, 325, 330, 373, 386, 455, 485,\n", + " 505, 521,\n", + " ...\n", + " 173748, 173988, 174253, 174384, 174549, 174647, 174657, 174690,\n", + " 174986, 175005],\n", + " dtype='int64', length=2349),\n", + " 'FH-04': Int64Index([ 364, 371, 392, 396, 460, 482, 529, 950,\n", + " 970, 984,\n", + " ...\n", + " 274428, 274519, 274600, 274640, 274646, 274648, 274849, 274851,\n", + " 274964, 275000],\n", + " dtype='int64', length=4208),\n", + " 'ID-04': Int64Index([ 89, 123, 155, 156, 169, 170, 214, 223,\n", + " 237, 309,\n", + " ...\n", + " 274353, 274445, 274548, 274792, 274930, 275014, 275057, 275058,\n", + " 275084, 275085],\n", + " dtype='int64', 
length=2474),\n", + " 'PS-04': Int64Index([ 6, 7, 8, 9, 10, 11, 12, 13,\n", + " 14, 15,\n", + " ...\n", + " 274446, 274471, 274572, 274734, 274766, 274874, 274901, 274902,\n", + " 274944, 275068],\n", + " dtype='int64', length=5409),\n", + " 'PS-05': Int64Index([ 45, 49, 53, 57, 90, 130, 202, 218,\n", + " 246, 247,\n", + " ...\n", + " 274633, 274634, 274635, 274666, 274820, 274828, 274978, 274987,\n", + " 275022, 275073],\n", + " dtype='int64', length=3969),\n", + " 'SLU-01': Int64Index([ 111, 142, 143, 147, 152, 153, 220, 308,\n", + " 312, 370,\n", + " ...\n", + " 274686, 274722, 274768, 274769, 274883, 274884, 274997, 275041,\n", + " 275059, 275071],\n", + " dtype='int64', length=7084),\n", + " 'SLU-02': Int64Index([ 137, 181, 296, 397, 427, 458, 464, 500,\n", + " 530, 531,\n", + " ...\n", + " 274678, 274684, 274687, 274747, 274753, 274833, 274835, 274885,\n", + " 274899, 274991],\n", + " dtype='int64', length=7018),\n", + " 'SLU-04': Int64Index([ 213, 245, 273, 288, 291, 295, 316, 432,\n", + " 589, 639,\n", + " ...\n", + " 274185, 274309, 274311, 274415, 274493, 274711, 274887, 274941,\n", + " 275027, 275048],\n", + " dtype='int64', length=5226),\n", + " 'SLU-07': Int64Index([ 368, 454, 551, 552, 575, 577, 633, 648,\n", + " 735, 741,\n", + " ...\n", + " 274607, 274608, 274647, 274715, 274716, 274771, 274845, 274898,\n", + " 274946, 275040],\n", + " dtype='int64', length=6339),\n", + " 'SLU-15': Int64Index([ 102, 178, 232, 243, 284, 287, 292, 313,\n", + " 318, 338,\n", + " ...\n", + " 274808, 274863, 274897, 274923, 274953, 274967, 274985, 275036,\n", + " 275037, 275061],\n", + " dtype='int64', length=9741),\n", + " 'SLU-16': Int64Index([ 391, 406, 420, 448, 486, 487, 532, 536,\n", + " 537, 538,\n", + " ...\n", + " 274099, 274103, 274296, 274442, 274602, 274720, 274763, 274764,\n", + " 274867, 274896],\n", + " dtype='int64', length=5045),\n", + " 'SLU-18': Int64Index([ 103, 320, 359, 446, 544, 556, 565, 566,\n", + " 591, 614,\n", + " ...\n", + " 209477, 209600, 
209625, 209663, 209671, 209907, 209917, 209918,\n", + " 209929, 210002],\n", + " dtype='int64', length=3461),\n", + " 'SLU-19': Int64Index([ 129, 280, 304, 350, 351, 353, 354, 457,\n", + " 493, 564,\n", + " ...\n", + " 274407, 274512, 274590, 274651, 274701, 274702, 274703, 274841,\n", + " 274963, 275062],\n", + " dtype='int64', length=7285),\n", + " 'SLU-20': Int64Index([ 79307, 79441, 79473, 79584, 79657, 79658, 79659, 79864,\n", + " 79868, 79994,\n", + " ...\n", + " 273606, 274539, 274540, 274561, 274606, 274758, 274806, 274807,\n", + " 274918, 275039],\n", + " dtype='int64', length=2452),\n", + " 'SLU-21': Int64Index([133364, 133365, 133388, 133620, 133621, 133744, 133745, 134178,\n", + " 134179, 135016,\n", + " ...\n", + " 274136, 274193, 274497, 274612, 274698, 274801, 274960, 275030,\n", + " 275043, 275074],\n", + " dtype='int64', length=1114),\n", + " 'SLU-22': Int64Index([210885, 210897, 210898, 210899, 210913, 210918, 211084, 211085,\n", + " 211264, 211318,\n", + " ...\n", + " 274525, 274652, 274669, 274683, 274699, 274772, 274803, 274804,\n", + " 274842, 274866],\n", + " dtype='int64', length=1748),\n", + " 'SLU-23': Int64Index([ 192, 206, 224, 225, 226, 305, 306, 549,\n", + " 550, 635,\n", + " ...\n", + " 275010, 275011, 275012, 275016, 275017, 275018, 275019, 275020,\n", + " 275021, 275023],\n", + " dtype='int64', length=5739),\n", + " 'UD-01': Int64Index([ 60, 61, 76, 177, 182, 208, 608, 942,\n", + " 943, 1054,\n", + " ...\n", + " 274123, 274124, 274158, 274460, 274575, 274700, 274705, 274869,\n", + " 274881, 275025],\n", + " dtype='int64', length=3889),\n", + " 'UD-02': Int64Index([ 92, 97, 183, 193, 204, 240, 241, 543,\n", + " 654, 655,\n", + " ...\n", + " 274286, 274287, 274369, 274418, 274502, 274815, 274919, 275024,\n", + " 275086, 275087],\n", + " dtype='int64', length=1417),\n", + " 'UD-04': Int64Index([ 96, 161, 184, 188, 260, 372, 499, 611,\n", + " 678, 891,\n", + " ...\n", + " 274104, 274105, 274160, 274259, 274264, 274283, 274400, 
274528,\n", + " 274659, 274870],\n", + " dtype='int64', length=3534),\n", + " 'UD-07': Int64Index([ 115, 116, 281, 469, 669, 696, 738, 904,\n", + " 963, 1040,\n", + " ...\n", + " 273080, 273086, 273331, 273359, 273545, 273783, 274165, 274175,\n", + " 274281, 274759],\n", + " dtype='int64', length=2429),\n", + " 'UW-01': Int64Index([ 730, 1691, 1759, 2124, 2383, 2746, 3087, 3356,\n", + " 3404, 3510,\n", + " ...\n", + " 142135, 142136, 142249, 142254, 142259, 143101, 144918, 145571,\n", + " 147714, 147773],\n", + " dtype='int64', length=480),\n", + " 'UW-02': Int64Index([ 72, 73, 74, 80, 421, 857, 964, 1026,\n", + " 1183, 1433,\n", + " ...\n", + " 274078, 274079, 274088, 274089, 274300, 274301, 274537, 274586,\n", + " 274998, 275063],\n", + " dtype='int64', length=2002),\n", + " 'UW-04': Int64Index([ 187, 343, 375, 463, 477, 580, 673, 762,\n", + " 781, 833,\n", + " ...\n", + " 274811, 274856, 274858, 274876, 274888, 274996, 275045, 275054,\n", + " 275069, 275070],\n", + " dtype='int64', length=2688),\n", + " 'UW-06': Int64Index([ 167, 272, 385, 631, 774, 951, 1011, 1048,\n", + " 1078, 1083,\n", + " ...\n", + " 274454, 274499, 274562, 274617, 274742, 274773, 274786, 274802,\n", + " 274809, 274945],\n", + " dtype='int64', length=2383),\n", + " 'UW-07': Int64Index([ 121, 122, 215, 216, 365, 367, 404, 721,\n", + " 1090, 1141,\n", + " ...\n", + " 273936, 273944, 273961, 274498, 274571, 274576, 274721, 274727,\n", + " 274947, 275046],\n", + " dtype='int64', length=1905),\n", + " 'UW-10': Int64Index([ 105, 124, 128, 314, 619, 896, 934, 935,\n", + " 999, 1006,\n", + " ...\n", + " 238201, 238453, 238514, 238816, 238854, 239274, 239545, 240295,\n", + " 240446, 240775],\n", + " dtype='int64', length=1175),\n", + " 'UW-11': Int64Index([150250, 150776, 151044, 151373, 151690, 152037, 152061, 153903,\n", + " 153941, 154905,\n", + " ...\n", + " 273385, 273607, 273760, 273892, 273904, 274016, 274017, 274018,\n", + " 274473, 274675],\n", + " dtype='int64', length=1237),\n", + " 
'UW-12': Int64Index([241157, 241173, 241175, 241194, 241208, 241245, 241292, 241403,\n", + " 241435, 241447,\n", + " ...\n", + " 274748, 274794, 274795, 274843, 274859, 274864, 274938, 274940,\n", + " 274999, 275055],\n", + " dtype='int64', length=689),\n", + " 'WF-01': Int64Index([ 133, 135, 297, 298, 300, 302, 307, 369,\n", + " 475, 514,\n", + " ...\n", + " 274972, 274977, 274989, 275013, 275015, 275028, 275044, 275047,\n", + " 275078, 275079],\n", + " dtype='int64', length=13038),\n", + " 'WF-03': Int64Index([226781, 226784, 226827, 227100, 227321, 227322, 227569, 227570,\n", + " 227768, 227769,\n", + " ...\n", + " 274097, 274100, 274306, 274307, 274325, 274383, 274667, 274708,\n", + " 274909, 274934],\n", + " dtype='int64', length=646),\n", + " 'WF-04': Int64Index([ 64, 65, 132, 157, 203, 207, 236, 264,\n", + " 266, 267,\n", + " ...\n", + " 274382, 274630, 274637, 274697, 274707, 274709, 274880, 274913,\n", + " 274981, 274986],\n", + " dtype='int64', length=6271)}" + ] + }, + "execution_count": 38, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dfgroup = df.groupby(['from_station_id'])\n", + "dfgroup.groups" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The simplest version of a groupby looks like this, and you can use almost any aggregation function you wish (mean, median, sum, minimum, maximum, standard deviation, count, etc.):\n", + "\n", + "```\n", + "<data object>.groupby(<grouping values>).<aggregate>()\n", + "```\n", + "\n", + "For example, we can group by gender and find the average of all numerical columns:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It's also possible to index the grouped object as if it were a dataframe:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": 
[ + "You can even group by multiple values: for example we can look at the trip duration by time of day and by gender:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "The ``unstack()`` operation can help make sense of this type of multiply-grouped data. What this technically does is split a multiple-valued index into an index plus columns:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## 5. Visualizing data with ``pandas``\n", + "\n", + "Of course, looking at tables of data is not very intuitive.\n", + "Fortunately Pandas has many useful plotting functions built-in, all of which make use of the ``matplotlib`` library to generate plots.\n", + "\n", + "Whenever you do plotting in the IPython notebook, you will want to first run this *magic command* which configures the notebook to work well with plots:" + ] + }, + { + "cell_type": "code", + "execution_count": 39, + "metadata": {}, + "outputs": [], + "source": [ + "%matplotlib inline" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Now we can simply call the ``plot()`` method of any series or dataframe to get a reasonable view of the data:" + ] + }, + { + "cell_type": "code", + "execution_count": 40, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "" + ] + }, + "execution_count": 40, + "metadata": {}, + "output_type": "execute_result" + }, + { + "data": { + "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAZcAAAD8CAYAAAC7IukgAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADl0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uIDIuMi4yLCBodHRwOi8vbWF0cGxvdGxpYi5vcmcvhp/UCwAAFTlJREFUeJzt3H+sXHWZx/H3sy1ggz8oojdN22xx7R+i7CLeQDdszF3ZLQX/KCaQ1BBbkaTGhawm3cSqyeKKJroJmsAqbg2NxbACi5o2sWxtkIkxkR9FK6V2sVfsyrUNDbYiV6Nu8dk/5ntlvMyduZ37nd6Z7vuVTObMc77nnO9zZ3o/nTPnTmQmkiTV9GfzPQFJ0unHcJEkVWe4SJKqM1wkSdUZLpKk6gwXSVJ1hoskqTrDRZJUneEiSapu4XxPoLbzzjsvV6xY0dO2v/71rzn77LPrTmgenW79gD0NC3saDq09Pf74489l5utq7fu0C5cVK1awZ8+enrZtNBqMjY3VndA8Ot36AXsaFvY0HFp7ioj/qblvT4tJkqozXCRJ1RkukqTqDBdJUnWGiySpOsNFklSd4SJJqs5wkSRVZ7hIkqo77f5Cfy72/fx53rv5m/Ny7EOffue8HFeS+qHrO5eIWB4RD0XEgYjYHxEfLPWPR8TPI2JvuV3Vss1HImI8Ip6KiCta6mtKbTwiNrfUz4+IRyLiYETcGxFnlvpZ5fF4Wb+iZvOSpP6YzWmxE8CmzHwTsAq4MSIuKOs+l5kXldtOgLJuHfBmYA3whYhYEBELgM8DVwIXAO9u2c9nyr5WAseBG0r9BuB4Zr4R+FwZJ0kacF3DJTOPZOb3y/ILwAFgaYdN1gL3ZObvMvOnwDhwSbmNZ+bTmfl74B5gbUQE8A7g/rL9NuDqln1tK8v3A5eX8ZKkAXZSn7mU01JvBR4BLgNuioj1wB6a726O0wyeh1s2m+ClMHpmWv1S4LXALzPzRJvxS6e2ycwTEfF8Gf/ctHltBDYCjIyM0Gg0TqatPxpZBJsuPNF9YB/0OudOJicn+7Lf+WRPw8GehkM/e5p1uETEK4GvAR/KzF9FxB3ALUCW+1uB9wHt3lkk7d8lZYfxdFn3UiFzC7AFYHR0NHv9Wuzb797Orfvm5xqHQ9eNVd/n6f4V4acLexoO9nRyZnUpckScQTNY7s7MrwNk5rOZ+WJm/gH4Es3TXtB857G8ZfNlwOEO9eeAcyJi4bT6n+yrrH8NcOxkGpQknXqzuVosgDuBA5n52Zb6kpZh7wKeLMs7gHXlSq/zgZXAo8BjwMpyZdiZND/035GZCTwEXFO23wBsb9nXhrJ8DfDtMl6SNMBmcw7oMuA9wL6I2FtqH6V5tddFNE9THQLeD5CZ+yPiPuBHNK80uzEzXwSIiJuAXcACYGtm7i/7+zBwT0R8EvgBzTCj3H8lIsZpvmNZN4deJUmnSNdwyczv0v6zj50dtvkU8Kk29Z3ttsvMp3nptFpr/bfAtd3mKEkaLH79iySpOsNFklSd4SJJqs5wkSRVZ7hIkqozXCRJ1RkukqTqDBdJUnWGiySpOsNFklSd4SJJqs5wkSRVZ7hIkqozXCRJ1RkukqTqDBdJUnWGiySpOsNFklSd4SJJqs5wkSRVZ7hIkqozXCRJ1RkukqTqDBdJUnWGiySpOsNFklSd4SJJqs5wkSRVZ7hIkqozXCRJ1RkukqTquoZLRCyPiIci4kBE7I+ID5b6uRGxOyIOlvvFpR4RcVtEjEfEExFxccu+NpTxByNiQ0v9bRGxr2xzW0REp2NIkgbbbN65nAA2ZeabgFXAjRFxAbAZeDAzVwIPlscAVwIry20jcAc0gwK4GbgUuAS4uSUs7ihjp7ZbU+ozHUOSNMC6hktmHsnM75flF4ADwFJgLbCtDNsGXF2W1wJ3ZdPDwDkRsQS4Atidmccy8ziwG1hT1r06M7+XmQncNW1f7Y4hSRpgJ/WZS0SsAN4
KPAKMZOYRaAYQ8PoybCnwTMtmE6XWqT7Rpk6HY0iSBtjC2Q6MiFcCXwM+lJm/Kh+LtB3appY91GctIjbSPK3GyMgIjUbjZDb/o5FFsOnCEz1tO1e9zrmTycnJvux3PtnTcLCn4dDPnmYVLhFxBs1guTszv17Kz0bEksw8Uk5tHS31CWB5y+bLgMOlPjat3ij1ZW3GdzrGn8jMLcAWgNHR0RwbG2s3rKvb797OrftmnbdVHbpurPo+G40Gvf4sBpU9DQd7Gg797Gk2V4sFcCdwIDM/27JqBzB1xdcGYHtLfX25amwV8Hw5pbULWB0Ri8sH+auBXWXdCxGxqhxr/bR9tTuGJGmAzea/6ZcB7wH2RcTeUvso8Gngvoi4AfgZcG1ZtxO4ChgHfgNcD5CZxyLiFuCxMu4TmXmsLH8A+DKwCHig3OhwDEnSAOsaLpn5Xdp/LgJweZvxCdw4w762Alvb1PcAb2lT/0W7Y0iSBpt/oS9Jqs5wkSRVZ7hIkqozXCRJ1RkukqTqDBdJUnWGiySpOsNFklSd4SJJqs5wkSRVZ7hIkqozXCRJ1RkukqTqDBdJUnWGiySpOsNFklSd4SJJqs5wkSRVZ7hIkqozXCRJ1RkukqTqDBdJUnWGiySpOsNFklSd4SJJqs5wkSRVZ7hIkqozXCRJ1RkukqTqDBdJUnWGiySpOsNFklRd13CJiK0RcTQinmypfTwifh4Re8vtqpZ1H4mI8Yh4KiKuaKmvKbXxiNjcUj8/Ih6JiIMRcW9EnFnqZ5XH42X9ilpNS5L6azbvXL4MrGlT/1xmXlRuOwEi4gJgHfDmss0XImJBRCwAPg9cCVwAvLuMBfhM2ddK4DhwQ6nfABzPzDcCnyvjJElDoGu4ZOZ3gGOz3N9a4J7M/F1m/hQYBy4pt/HMfDozfw/cA6yNiADeAdxftt8GXN2yr21l+X7g8jJekjTgFs5h25siYj2wB9iUmceBpcDDLWMmSg3gmWn1S4HXAr/MzBNtxi+d2iYzT0TE82X8c9MnEhEbgY0AIyMjNBqNnhoaWQSbLjzRfWAf9DrnTiYnJ/uy3/lkT8PBnoZDP3vqNVzuAG4BstzfCrwPaPfOImn/Dik7jKfLuj8tZm4BtgCMjo7m2NhYh6nP7Pa7t3Prvrnkbe8OXTdWfZ+NRoNefxaDyp6Ggz0Nh3721NPVYpn5bGa+mJl/AL5E87QXNN95LG8Zugw43KH+HHBORCycVv+TfZX1r2H2p+ckSfOop3CJiCUtD98FTF1JtgNYV670Oh9YCTwKPAasLFeGnUnzQ/8dmZnAQ8A1ZfsNwPaWfW0oy9cA3y7jJUkDrus5oIj4KjAGnBcRE8DNwFhEXETzNNUh4P0Ambk/Iu4DfgScAG7MzBfLfm4CdgELgK2Zub8c4sPAPRHxSeAHwJ2lfifwlYgYp/mOZd2cu5UknRJdwyUz392mfGeb2tT4TwGfalPfCexsU3+al06rtdZ/C1zbbX6SpMHjX+hLkqozXCRJ1RkukqTqDBdJUnWGiySpOsNFklSd4SJJqs5wkSRVZ7hIkqozXCRJ1RkukqTqDBdJUnWGiySpOsNFklSd4SJJqs5wkSRVZ7hIkqozXCRJ1RkukqTqDBdJUnWGiySpOsNFklSd4SJJqs5wkSRVZ7hIkqozXCRJ1RkukqTqDBdJUnWGiySpOsNFklSd4SJJqq5ruETE1og4GhFPttTOjYjdEXGw3C8u9YiI2yJiPCKeiIiLW7bZUMYfjIgNLfW3RcS+ss1tERGdjiFJGnyzeefyZWDNtNpm4MHMXAk8WB4DXAmsLLeNwB3QDArgZuBS4BLg5pawuKOMndpuTZdjSJIGXNdwyczvAMemldcC28ryNuDqlvpd2fQwcE5ELAGuAHZn5rHMPA7sBtaUda/OzO9lZgJ3TdtXu2NIkgZcr5+5jGTmEYBy//pSXwo80zJuotQ61Sfa1DsdQ5I04BZW3l+0qWUP9ZM7aMRGmqfWGBkZodF
onOwuABhZBJsuPNHTtnPV65w7mZyc7Mt+55M9DQd7Gg797KnXcHk2IpZk5pFyautoqU8Ay1vGLQMOl/rYtHqj1Je1Gd/pGC+TmVuALQCjo6M5NjY209CObr97O7fuq523s3PourHq+2w0GvT6sxhU9jQc7Gk49LOnXk+L7QCmrvjaAGxvqa8vV42tAp4vp7R2AasjYnH5IH81sKuseyEiVpWrxNZP21e7Y0iSBlzX/6ZHxFdpvus4LyImaF719Wngvoi4AfgZcG0ZvhO4ChgHfgNcD5CZxyLiFuCxMu4TmTl1kcAHaF6Rtgh4oNzocAxJ0oDrGi6Z+e4ZVl3eZmwCN86wn63A1jb1PcBb2tR/0e4YkqTB51/oS5KqM1wkSdUZLpKk6gwXSVJ1hoskqTrDRZJUneEiSarOcJEkVWe4SJKqM1wkSdUZLpKk6gwXSVJ1hoskqTrDRZJUneEiSarOcJEkVWe4SJKqM1wkSdUZLpKk6gwXSVJ1hoskqTrDRZJUneEiSarOcJEkVWe4SJKqM1wkSdUZLpKk6gwXSVJ1hoskqTrDRZJUneEiSarOcJEkVTencImIQxGxLyL2RsSeUjs3InZHxMFyv7jUIyJui4jxiHgiIi5u2c+GMv5gRGxoqb+t7H+8bBtzma8k6dSo8c7lbzPzoswcLY83Aw9m5krgwfIY4EpgZbltBO6AZhgBNwOXApcAN08FUhmzsWW7NRXmK0nqs36cFlsLbCvL24CrW+p3ZdPDwDkRsQS4Atidmccy8ziwG1hT1r06M7+XmQnc1bIvSdIAWzjH7RP4VkQk8O+ZuQUYycwjAJl5JCJeX8YuBZ5p2Xai1DrVJ9rUXyYiNtJ8h8PIyAiNRqOnZkYWwaYLT/S07Vz1OudOJicn+7Lf+WRPw8GehkM/e5pruFyWmYdLgOyOiP/uMLbd5yXZQ/3lxWaobQEYHR3NsbGxjpOeye13b+fWfXP9kfTm0HVj1ffZaDTo9WcxqOxpONjTcOhnT3M6LZaZh8v9UeAbND8zebac0qLcHy3DJ4DlLZsvAw53qS9rU5ckDbiewyUizo6IV00tA6uBJ4EdwNQVXxuA7WV5B7C+XDW2Cni+nD7bBayOiMXlg/zVwK6y7oWIWFWuElvfsi9J0gCbyzmgEeAb5erghcB/ZOZ/RcRjwH0RcQPwM+DaMn4ncBUwDvwGuB4gM49FxC3AY2XcJzLzWFn+APBlYBHwQLlJkgZcz+GSmU8Df9Wm/gvg8jb1BG6cYV9bga1t6nuAt/Q6R0nS/PAv9CVJ1RkukqTqDBdJUnWGiySpOsNFklSd4SJJqs5wkSRVZ7hIkqozXCRJ1RkukqTqDBdJUnWGiySpOsNFklSd4SJJqs5wkSRVZ7hIkqozXCRJ1RkukqTqDBdJUnWGiySpOsNFklSd4SJJqs5wkSRVZ7hIkqozXCRJ1RkukqTqFs73BNS0YvM3q+9z04UneG+X/R769DurH1eSfOciSarOcJEkVWe4SJKqM1wkSdUNfLhExJqIeCoixiNi83zPR5LU3UBfLRYRC4DPA38PTACPRcSOzPzR/M7s9NGPq9Rmw6vUpNPbQIcLcAkwnplPA0TEPcBawHAZcr2G2mwur+7GYJP6b9DDZSnwTMvjCeDSeZqLThPz9W5tJjUCsxsDVafaoIdLtKnlywZFbAQ2loeTEfFUj8c7D3iux20Hzj+eZv2APfUqPtPPvbd12j1PnP49/XnNHQ96uEwAy1seLwMOTx+UmVuALXM9WETsyczRue5nUJxu/YA9DQt7Gg797GnQrxZ7DFgZEedHxJnAOmDHPM9JktTFQL9zycwTEXETsAtYAGzNzP3zPC1JUhcDHS4AmbkT2HmKDjfnU2sD5nTrB+xpWNjTcOhbT5H5ss/HJUmak0H/zEWSNIQMF4bvK2Yi4lBE7IuIvRGxp9TOjYjdEXGw3C8u9YiI20pvT0TExS372VDGH4yIDae4h60RcTQinmypVeshIt5
WfkbjZdt2l7Wfip4+HhE/L8/V3oi4qmXdR8r8noqIK1rqbV+P5cKWR0qv95aLXPrZz/KIeCgiDkTE/oj4YKkP7fPUoadhfp5eERGPRsQPS0//0mkeEXFWeTxe1q/otdeOMvP/9Y3mhQI/Ad4AnAn8ELhgvufVZc6HgPOm1f4V2FyWNwOfKctXAQ/Q/JuhVcAjpX4u8HS5X1yWF5/CHt4OXAw82Y8egEeBvy7bPABcOU89fRz4pzZjLyivtbOA88trcEGn1yNwH7CuLH8R+ECf+1kCXFyWXwX8uMx7aJ+nDj0N8/MUwCvL8hnAI+Xn33YewD8AXyzL64B7e+210813Li1fMZOZvwemvmJm2KwFtpXlbcDVLfW7sulh4JyIWAJcAezOzGOZeRzYDaw5VZPNzO8Ax6aVq/RQ1r06M7+XzX81d7Xsq29m6Gkma4F7MvN3mflTYJzma7Ht67H8j/4dwP1l+9afT19k5pHM/H5ZfgE4QPNbM4b2eerQ00yG4XnKzJwsD88ot+wwj9bn737g8jLvk+q127wMl/ZfMdPpxTYIEvhWRDwezW8nABjJzCPQ/AcEvL7UZ+pvEPuu1cPSsjy9Pl9uKqeJtk6dQuLke3ot8MvMPDGtfkqUUydvpfm/4tPieZrWEwzx8xQRCyJiL3CUZnj/pMM8/jj3sv75Mu+qvysMl1l+xcyAuSwzLwauBG6MiLd3GDtTf8PU98n2MEi93QH8BXARcAS4tdSHpqeIeCXwNeBDmfmrTkPb1Ialp6F+njLzxcy8iOa3mFwCvKnDPE5JT4bLLL9iZpBk5uFyfxT4Bs0X07PlNAPl/mgZPlN/g9h3rR4myvL0+imXmc+Wf/h/AL5E87mCk+/pOZqnmRZOq/dVRJxB85fw3Zn59VIe6uepXU/D/jxNycxfAg2an7nMNI8/zr2sfw3N07lVf1cYLkP2FTMRcXZEvGpqGVgNPElzzlNX4WwAtpflHcD6ciXPKuD5cipjF7A6IhaXUwCrS20+VemhrHshIlaVc8nrW/Z1Sk39Ei7eRfO5gmZP68qVO+cDK2l+uN329Vg+k3gIuKZs3/rz6dfcA7gTOJCZn21ZNbTP00w9Dfnz9LqIOKcsLwL+juZnSTPNo/X5uwb4dpn3SfXadWL9uoJhmG40r3L5Mc3zlB+b7/l0mesbaF6t8UNg/9R8aZ4zfRA4WO7PzZeuJPl86W0fMNqyr/fR/NBuHLj+FPfxVZqnH/6X5v+MbqjZAzBK8xfET4B/o/zB8Dz09JUy5yfKP8glLeM/Vub3FC1XSc30eizP/aOl1/8EzupzP39D8/THE8DecrtqmJ+nDj0N8/P0l8APytyfBP650zyAV5TH42X9G3rttdPNv9CXJFXnaTFJUnWGiySpOsNFklSd4SJJqs5wkSRVZ7hIkqozXCRJ1RkukqTq/g/5Os7LxcVukwAAAABJRU5ErkJggg==\n", + "text/plain": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import matplotlib.pyplot as plt\n", + "df['tripduration'].hist()" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Adjusting the Plot Style\n", + "\n", + "Matplotlib has a number of plot styles you can use. For example, if you like R you might use the ggplot style:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Other plot types\n", + "\n", + "Pandas supports a range of other plotting types; you can find these by using the autocomplete on the ``plot`` method:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "For example, we can create a histogram of trip durations:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you'd like to adjust the x and y limits of the plot, you can use the ``set_xlim()`` and ``set_ylim()`` methods of the resulting object:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Breakout: Exploring the Data\n", + "\n", + "Make a plot of the total number of rides as a function of month of the year (You'll need to extract the month, use a ``groupby``, and find the appropriate aggregation to count the number in each group)." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Split this plot by gender. Do you see any seasonal ridership patterns by gender?"
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Split this plot by user type. Do you see any seasonal ridership patterns by user type?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Repeat the above three steps, counting the number of rides by time of day rather than by month." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Are there any other interesting insights you can discover in the data using these tools?" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Using Files\n", + "- Writing and running Python modules\n", + "- Using Python modules in your Jupyter Notebook" + ] + }, + { + "cell_type": "code", + "execution_count": 41, + "metadata": {}, + "outputs": [], + "source": [ + "# A script for creating a dataframe with counts of the occurrences of a column's values\n", + "df_count = df.groupby('from_station_id').count()\n", + "df_count1 = df_count[['trip_id']]\n", + "df_count2 = df_count1.rename(columns={'trip_id': 'count'})" + ] + }, + { + "cell_type": "code", + "execution_count": 42, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n",
+ "<table>\n",
+ "  <thead>\n",
+ "    <tr><th></th><th>count</th></tr>\n",
+ "    <tr><th>from_station_id</th><th></th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>BT-01</th><td>10463</td></tr>\n",
+ "    <tr><th>BT-03</th><td>7334</td></tr>\n",
+ "    <tr><th>BT-04</th><td>4666</td></tr>\n",
+ "    <tr><th>BT-05</th><td>5699</td></tr>\n",
+ "    <tr><th>BT-06</th><td>150</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>
" + ], + "text/plain": [ + " count\n", + "from_station_id \n", + "BT-01 10463\n", + "BT-03 7334\n", + "BT-04 4666\n", + "BT-05 5699\n", + "BT-06 150" + ] + }, + "execution_count": 42, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "df_count2.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 43, + "metadata": {}, + "outputs": [], + "source": [ + "def make_table_count(df_arg, groupby_column):\n", + "    df_count = df_arg.groupby(groupby_column).count()\n", + "    # Use the first column of the argument, not the notebook-global df\n", + "    column_name = df_arg.columns[0]\n", + "    df_count1 = df_count[[column_name]]\n", + "    df_count2 = df_count1.rename(columns={column_name: 'count'})\n", + "    return df_count2" + ] + }, + { + "cell_type": "code", + "execution_count": 44, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n",
+ "<table>\n",
+ "  <thead>\n",
+ "    <tr><th></th><th>count</th></tr>\n",
+ "    <tr><th>from_station_id</th><th></th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>BT-01</th><td>10463</td></tr>\n",
+ "    <tr><th>BT-03</th><td>7334</td></tr>\n",
+ "    <tr><th>BT-04</th><td>4666</td></tr>\n",
+ "    <tr><th>BT-05</th><td>5699</td></tr>\n",
+ "    <tr><th>BT-06</th><td>150</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>
" + ], + "text/plain": [ + " count\n", + "from_station_id \n", + "BT-01 10463\n", + "BT-03 7334\n", + "BT-04 4666\n", + "BT-05 5699\n", + "BT-06 150" + ] + }, + "execution_count": 44, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dff = make_table_count(df, 'from_station_id')\n", + "dff.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 45, + "metadata": {}, + "outputs": [], + "source": [ + "import table_modifiers as tm" + ] + }, + { + "cell_type": "code", + "execution_count": 46, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['__builtins__',\n", + " '__cached__',\n", + " '__doc__',\n", + " '__file__',\n", + " '__loader__',\n", + " '__name__',\n", + " '__package__',\n", + " '__spec__',\n", + " 'table_counter']" + ] + }, + "execution_count": 46, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "dir(tm)" + ] + }, + { + "cell_type": "code", + "execution_count": 47, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
<div>\n",
+ "<table>\n",
+ "  <thead>\n",
+ "    <tr><th></th><th>count</th></tr>\n",
+ "    <tr><th>from_station_id</th><th></th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>BT-01</th><td>10463</td></tr>\n",
+ "    <tr><th>BT-03</th><td>7334</td></tr>\n",
+ "    <tr><th>BT-04</th><td>4666</td></tr>\n",
+ "    <tr><th>BT-05</th><td>5699</td></tr>\n",
+ "    <tr><th>BT-06</th><td>150</td></tr>\n",
+ "    <tr><th>CBD-03</th><td>4822</td></tr>\n",
+ "    <tr><th>CBD-04</th><td>3440</td></tr>\n",
+ "    <tr><th>CBD-05</th><td>5068</td></tr>\n",
+ "    <tr><th>CBD-06</th><td>4911</td></tr>\n",
+ "    <tr><th>CBD-07</th><td>3263</td></tr>\n",
+ "    <tr><th>CBD-13</th><td>9067</td></tr>\n",
+ "    <tr><th>CD-01</th><td>958</td></tr>\n",
+ "    <tr><th>CH-01</th><td>6409</td></tr>\n",
+ "    <tr><th>CH-02</th><td>8546</td></tr>\n",
+ "    <tr><th>CH-03</th><td>6218</td></tr>\n",
+ "    <tr><th>CH-05</th><td>6948</td></tr>\n",
+ "    <tr><th>CH-06</th><td>3765</td></tr>\n",
+ "    <tr><th>CH-07</th><td>11568</td></tr>\n",
+ "    <tr><th>CH-08</th><td>8573</td></tr>\n",
+ "    <tr><th>CH-09</th><td>5246</td></tr>\n",
+ "    <tr><th>CH-12</th><td>5857</td></tr>\n",
+ "    <tr><th>CH-15</th><td>6550</td></tr>\n",
+ "    <tr><th>CH-16</th><td>2089</td></tr>\n",
+ "    <tr><th>DPD-01</th><td>4822</td></tr>\n",
+ "    <tr><th>DPD-03</th><td>1423</td></tr>\n",
+ "    <tr><th>EL-01</th><td>3604</td></tr>\n",
+ "    <tr><th>EL-03</th><td>5788</td></tr>\n",
+ "    <tr><th>EL-05</th><td>3400</td></tr>\n",
+ "    <tr><th>FH-01</th><td>2349</td></tr>\n",
+ "    <tr><th>FH-04</th><td>4208</td></tr>\n",
+ "    <tr><th>ID-04</th><td>2474</td></tr>\n",
+ "    <tr><th>PS-04</th><td>5409</td></tr>\n",
+ "    <tr><th>PS-05</th><td>3969</td></tr>\n",
+ "    <tr><th>SLU-01</th><td>7084</td></tr>\n",
+ "    <tr><th>SLU-02</th><td>7018</td></tr>\n",
+ "    <tr><th>SLU-04</th><td>5226</td></tr>\n",
+ "    <tr><th>SLU-07</th><td>6339</td></tr>\n",
+ "    <tr><th>SLU-15</th><td>9741</td></tr>\n",
+ "    <tr><th>SLU-16</th><td>5045</td></tr>\n",
+ "    <tr><th>SLU-18</th><td>3461</td></tr>\n",
+ "    <tr><th>SLU-19</th><td>7285</td></tr>\n",
+ "    <tr><th>SLU-20</th><td>2452</td></tr>\n",
+ "    <tr><th>SLU-21</th><td>1114</td></tr>\n",
+ "    <tr><th>SLU-22</th><td>1748</td></tr>\n",
+ "    <tr><th>SLU-23</th><td>5739</td></tr>\n",
+ "    <tr><th>UD-01</th><td>3889</td></tr>\n",
+ "    <tr><th>UD-02</th><td>1417</td></tr>\n",
+ "    <tr><th>UD-04</th><td>3534</td></tr>\n",
+ "    <tr><th>UD-07</th><td>2429</td></tr>\n",
+ "    <tr><th>UW-01</th><td>480</td></tr>\n",
+ "    <tr><th>UW-02</th><td>2002</td></tr>\n",
+ "    <tr><th>UW-04</th><td>2688</td></tr>\n",
+ "    <tr><th>UW-06</th><td>2383</td></tr>\n",
+ "    <tr><th>UW-07</th><td>1905</td></tr>\n",
+ "    <tr><th>UW-10</th><td>1175</td></tr>\n",
+ "    <tr><th>UW-11</th><td>1237</td></tr>\n",
+ "    <tr><th>UW-12</th><td>689</td></tr>\n",
+ "    <tr><th>WF-01</th><td>13038</td></tr>\n",
+ "    <tr><th>WF-03</th><td>646</td></tr>\n",
+ "    <tr><th>WF-04</th><td>6271</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>
" + ], + "text/plain": [ + " count\n", + "from_station_id \n", + "BT-01 10463\n", + "BT-03 7334\n", + "BT-04 4666\n", + "BT-05 5699\n", + "BT-06 150\n", + "CBD-03 4822\n", + "CBD-04 3440\n", + "CBD-05 5068\n", + "CBD-06 4911\n", + "CBD-07 3263\n", + "CBD-13 9067\n", + "CD-01 958\n", + "CH-01 6409\n", + "CH-02 8546\n", + "CH-03 6218\n", + "CH-05 6948\n", + "CH-06 3765\n", + "CH-07 11568\n", + "CH-08 8573\n", + "CH-09 5246\n", + "CH-12 5857\n", + "CH-15 6550\n", + "CH-16 2089\n", + "DPD-01 4822\n", + "DPD-03 1423\n", + "EL-01 3604\n", + "EL-03 5788\n", + "EL-05 3400\n", + "FH-01 2349\n", + "FH-04 4208\n", + "ID-04 2474\n", + "PS-04 5409\n", + "PS-05 3969\n", + "SLU-01 7084\n", + "SLU-02 7018\n", + "SLU-04 5226\n", + "SLU-07 6339\n", + "SLU-15 9741\n", + "SLU-16 5045\n", + "SLU-18 3461\n", + "SLU-19 7285\n", + "SLU-20 2452\n", + "SLU-21 1114\n", + "SLU-22 1748\n", + "SLU-23 5739\n", + "UD-01 3889\n", + "UD-02 1417\n", + "UD-04 3534\n", + "UD-07 2429\n", + "UW-01 480\n", + "UW-02 2002\n", + "UW-04 2688\n", + "UW-06 2383\n", + "UW-07 1905\n", + "UW-10 1175\n", + "UW-11 1237\n", + "UW-12 689\n", + "WF-01 13038\n", + "WF-03 646\n", + "WF-04 6271" + ] + }, + "execution_count": 47, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "tm.table_counter(df, 'from_station_id')" + ] + } + ], + "metadata": { + "anaconda-cloud": {}, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.4" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} diff --git a/02_Procedural_Python/procedural_programming_in_python.ipynb b/02_Procedural_Python/procedural_programming_in_python.ipynb new file mode 100644 index 0000000..e885e03 --- /dev/null +++ 
b/02_Procedural_Python/procedural_programming_in_python.ipynb @@ -0,0 +1,922 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Procedural programming in python\n", + "\n", + "## Topics\n", + "* Flow control, part 1\n", + " * If\n", + " * For\n", + " * range() function\n", + "* Some hacky hack time\n", + "* Exercises" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "### Review of Data Types\n", + "\n", + "| type | description |\n", + "|------|------------|\n", + "| primitive | int, float, string, bool |\n", + "| tuple | An immutable collection of ordered objects |\n", + "| list | A mutable collection of ordered objects |\n", + "| dictionary | A mutable collection of named objects |\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "## Flow control\n", + "\n", + "Flow control figure\n", + "\n", + "Flow control refers to how programs handle loops, conditional execution, and the order of functional operations. Let's start with conditionals, or the venerable ``if`` statement.\n", + "\n", + "Let's start with a simple list of instructors for these classes." + ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "['Dave', 'Joe', 'Bernease', 'Dorkus the Clown']" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "instructors = ['Dave', 'Joe', 'Bernease', 'Dorkus the Clown']\n", + "instructors" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### If\n", + "If statements can be used to execute a line or block of code when a particular condition is satisfied. For example, let's print something based on the entries in the list." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "#fakeinstructor\n" + ] + } + ], + "source": [ + "if 'Dorkus the Clown' in instructors:\n", + "    print('#fakeinstructor')" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Usually we want conditional logic on both sides of a binary condition, e.g. some action when ``True`` and some other action when ``False``." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if 'Dorkus the Clown' in instructors:\n", + "    print('There are fake names for class instructors in your list!')\n", + "else:\n", + "    print(\"Nothing to see here\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "There is a special do-nothing keyword, `pass`, that skips over an arm of a conditional, e.g."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "if 'Joe' in instructors:\n", + "    print(\"Congratulations! Joe is teaching, your class won't stink!\")\n", + "else:\n", + "    pass" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "_Note_: what have you noticed in this session about quotes? What is the difference between ``'`` and ``\"``?\n", + "\n", + "\n", + "Another simple example:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "scrolled": true + }, + "outputs": [], + "source": [ + "if True is False:\n", + "    print(\"I'm so confused\")\n", + "else:\n", + "    print(\"Everything is right with the world\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "It is always good practice to handle all cases explicitly. `Conditional fall through` is a common source of bugs.\n", + "\n", + "Sometimes we wish to test multiple conditions. Use `if`, `elif`, and `else`." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "my_favorite = 'pie'\n", + "\n", + "# Compare string values with ==; `is` tests object identity, not equality\n", + "if my_favorite == 'cake':\n", + "    print(\"He likes cake! I'll start making a double chocolate velvet cake right now!\")\n", + "elif my_favorite == 'pie':\n", + "    print(\"He likes pie! I'll start making a cherry pie right now!\")\n", + "else:\n", + "    print(\"He likes \" + my_favorite + \". I don't know how to make that.\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Conditionals can be combined with ``and``, ``or``, and ``not``. E.g." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "my_favorite = 'pie'\n", + "\n", + "if my_favorite == 'cake' or my_favorite == 'pie':\n", + "    print(my_favorite + \" : I have a recipe for that!\")\n", + "else:\n", + "    print(\"Ew! 
Who eats that?\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### For\n", + "\n", + "For loops are the standard loop, though `while` is also common. For has the general form:\n", + "```\n", + "for items in list:\n", + "    do stuff\n", + "```\n", + "\n", + "For loops and collections like tuples, lists and dictionaries are natural friends." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for instructor in instructors:\n", + "    print(instructor)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "You can combine loops and conditionals:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for instructor in instructors:\n", + "    if instructor.endswith('Clown'):\n", + "        print(instructor + \" doesn't sound like a real instructor name!\")\n", + "    else:\n", + "        print(instructor + \" is so smart... all those gooey brains!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Dictionaries can use the `keys` method for iterating."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "# my_dict is a hypothetical stand-in dictionary; substitute your own\n", + "my_dict = {'short': 1, 'longer_key': 2}\n", + "for key in my_dict.keys():\n", + "    if len(key) > 5:\n", + "        print(my_dict[key])" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### range()\n", + "\n", + "Since `for` operates over sequences such as lists, it is common to want to do something like:\n", + "```\n", + "NOTE: C-like\n", + "for (i = 0; i < 3; ++i) {\n", + "    print(i);\n", + "}\n", + "```\n", + "\n", + "The Python equivalent is:\n", + "\n", + "```\n", + "for i in [0, 1, 2]:\n", + "    do something with i\n", + "```\n", + "\n", + "What happens when the range you want to sample is big, e.g.\n", + "```\n", + "NOTE: C-like\n", + "for (i = 0; i < 1000000000; ++i) {\n", + "    print(i);\n", + "}\n", + "```\n", + "\n", + "It would be a real pain in the rear to have to write out the entire list from 0 to 999999999.\n", + "\n", + "Enter the `range()` function. E.g.\n", + " ```range(3) yields 0, 1, 2```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "range(3)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Notice that Python (in the newest versions, e.g. 3+) has an object type that is a range. This saves memory and speeds up calculations vs. an explicit representation of a range as a list, and it can be converted to a list whenever you need one. To show the contents as a `list` we can use a type cast, as with the tuple above.\n", + "\n", + "Sometimes, in older Python docs, you will see `xrange`. In Python 2, `xrange` returned the lazy range-like object while `range` returned an actual list. Beware of this!" 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "list(range(3))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Remember earlier with slicing, the syntax `:3` meant `[0, 1, 2]`? 
Well, the same upper bound philosophy applies here.\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for index in range(3):\n", + "    instructor = instructors[index]\n", + "    if instructor.endswith('Clown'):\n", + "        print(instructor + \" doesn't sound like a real instructor name!\")\n", + "    else:\n", + "        print(instructor + \" is so smart... all those gooey brains!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "This would probably be better written as" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for index in range(len(instructors)):\n", + "    instructor = instructors[index]\n", + "    if instructor.endswith('Clown'):\n", + "        print(instructor + \" doesn't sound like a real instructor name!\")\n", + "    else:\n", + "        print(instructor + \" is so smart... all those gooey brains!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "All in all, though, it isn't very Pythonic to use indexes like that (unless you need the index for another reason in the loop), and you would opt instead for the `instructor in instructors` form. \n", + "\n", + "More often, you are doing something with the numbers that requires them to be integers, e.g. math."
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "total = 0  # named total to avoid shadowing the built-in sum()\n", + "for i in range(10):\n", + "    total += i\n", + "print(total)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### For loops can be nested\n", + "\n", + "_Note_: for more on formatting strings, see: [https://pyformat.info](https://pyformat.info)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for i in range(1, 4):\n", + "    for j in range(1, 4):\n", + "        print('%d * %d = %d' % (i, j, i*j)) # Note string formatting here, %d means an integer" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### You can exit loops early if a condition is met:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "for i in range(10):\n", + "    if i == 4:\n", + "        break\n", + "i" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### You can skip stuff in a loop with `continue`" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "total = 0\n", + "for i in range(10):\n", + "    if i == 5:\n", + "        continue\n", + "    else:\n", + "        total += i\n", + "print(total)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### There is a unique language feature called ``for...else``; the ``else`` block runs when the loop finishes without hitting ``break``" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "total = 0\n", + "for i in range(10):\n", + "    total += i\n", + "else:\n", + "    print('final i = %d, and sum = %d' % (i, total))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### You can iterate over letters in a string" + ] 
+ }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "my_string = \"DIRECT\"\n", + "for c in my_string:\n", + "    
print(c)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
\n", + "## Exercise\n", + "\n", + "Objective: Replace the `bash magic` bits for downloading the Pronto data and uncompressing it with Python code. Since the download is big, check if the zip file exists first before downloading it again. Then load it into a pandas dataframe.\n", + "\n", + "Notes:\n", + "* The `os` package has tools for checking if a file exists: ``os.path.exists``\n", + "```\n", + "import os\n", + "filename = 'pronto.csv'\n", + "if os.path.exists(filename):\n", + " print(\"wahoo!\")\n", + "```\n", + "* Use the `requests` package to get the file given a url (got this from the requests docs)\n", + "```\n", + "import requests\n", + "url = 'https://s3.amazonaws.com/pronto-data/open_data_year_two.zip'\n", + "req = requests.get(url)\n", + "assert req.status_code == 200 # if the download failed, this line will generate an error\n", + "with open(filename, 'wb') as f:\n", + " f.write(req.content)\n", + "```\n", + "* Use the `zipfile` package to decompress the file while reading it into `pandas`\n", + "```\n", + "import pandas as pd\n", + "import zipfile\n", + "csv_filename = '2016_trip_data.csv'\n", + "zf = zipfile.ZipFile(filename)\n", + "data = pd.read_csv(zf.open(csv_filename))\n", + "```\n" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Now, use your code from above for the following URLs and filenames\n", + "\n", + "| URL | filename | csv_filename |\n", + "|-----|----------|--------------|\n", + "| https://github.com/UWSEDS/LectureNotes/blob/master/open_data_year_two_set1.zip?raw=true | open_data_year_two_set1.zip | 2016_trip_data_set1.csv |\n", + "| https://github.com/UWSEDS/LectureNotes/blob/master/open_data_year_two_set2.zip?raw=true | open_data_year_two_set2.zip | 2016_trip_data_set2.csv |\n", + "| https://github.com/UWSEDS/LectureNotes/blob/master/open_data_year_two_set3.zip?raw=true | open_data_year_two_set3.zip | 2016_trip_data_set3.csv |\n", + "\n", + "What pieces of the data structures and flow control that we 
talked about earlier can you use?" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "
" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "## Functions\n", + "\n", + "For loops let you repeat some code for every item in a list. Functions are similar in that they run the same lines of code for new values of some variable. They are different in that functions are not limited to looping over items.\n", + "\n", + "Functions are a critical part of writing easy-to-read, reusable code.\n", + "\n", + "Create a function like:\n", + "```\n", + "def function_name(parameters):\n", + "    \"\"\"\n", + "    optional docstring\n", + "    \"\"\"\n", + "    function expressions\n", + "    return [variable]\n", + "```\n", + "\n", + "_Note:_ Sometimes I use the word argument in place of parameter.\n", + "\n", + "Here is a simple example. It prints a string that was passed in and returns nothing." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "def print_string(text):\n", + "    \"\"\"This prints out a string passed as the parameter.\"\"\"\n", + "    print(text)\n", + "    return" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "To call the function, use:\n", + "```\n", + "print_string(\"Dave is awesome!\")\n", + "```\n", + "\n", + "_Note:_ The function has to be defined before you can call it!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print_string(\"Dave is awesome!\")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "If you don't provide an argument, or provide too many, you get an error." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Parameters (or arguments) in Python are all passed by object reference (sometimes called pass by assignment). 
This means that if you mutate a mutable argument, such as a list, inside the function, the change is visible outside of the function. Rebinding the parameter name to a new object does not affect the caller.\n", + "\n", + "See the following example:\n", + "\n", + "```\n", + "def change_list(my_list):\n", + "    \"\"\"This appends an item to the list passed into this function\"\"\"\n", + "    my_list.append('four')\n", + "    print('list inside the function: ', my_list)\n", + "    return\n", + "\n", + "my_list = [1, 2, 3]\n", + "print('list before the function: ', my_list)\n", + "change_list(my_list)\n", + "print('list after the function: ', my_list)\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def change_list(my_list):\n", + "    \"\"\"This appends an item to the list passed into this function\"\"\"\n", + "    my_list.append('four')\n", + "    print('list inside the function: ', my_list)\n", + "    return\n", + "\n", + "my_list = [1, 2, 3]\n", + "print('list before the function: ', my_list)\n", + "change_list(my_list)\n", + "print('list after the function: ', my_list)" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Variables have scope: `global` and `local`\n", + "\n", + "In a function, new variables that you create are not saved when the function returns - these are `local` variables. Variables defined outside of the function can be read but not reassigned - these are `global` variables. _Note:_ there is a way to reassign them from inside a function with the `global` keyword. 
Generally, the use of `global` variables is discouraged; use parameters instead.\n", + "\n", + "```\n", + "my_global_1 = 'bad idea'\n", + "my_global_2 = 'another bad one'\n", + "my_global_3 = 'better idea'\n", + "\n", + "def my_function():\n", + "    print(my_global_1)\n", + "    my_global_2 = 'broke your global, man!'\n", + "    global my_global_3\n", + "    my_global_3 = 'still a better idea'\n", + "    return\n", + "\n", + "my_function()\n", + "print(my_global_2)\n", + "print(my_global_3)\n", + "```" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "In general, you want to use parameters to provide data to a function and return a result with `return`. E.g.\n", + "\n", + "```\n", + "def add(x, y):\n", + "    total = x + y\n", + "    return total\n", + "```\n", + "\n", + "If you are going to return multiple objects, what data structure that we talked about can be used? Give an example below." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "### Parameters have three different types:\n", + "\n", + "| type | behavior |\n", + "|------|----------|\n", + "| required | positional, must be present or error, e.g. `my_func(first_name, last_name)` |\n", + "| keyword | position independent, e.g. 
`my_func(first_name, last_name)` can be called `my_func(first_name='Dave', last_name='Beck')` or `my_func(last_name='Beck', first_name='Dave')` |\n", + "| default | keyword params that default to a value if not provided |\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [ + "def print_name(first, last='the Clown'):\n", + "    print('Your name is %s %s' % (first, last))\n", + "    return" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Play around with the above function." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Functions can contain any code that you could put anywhere else, including:\n", + "* if...elif...else\n", + "* for...else\n", + "* while\n", + "* other function calls" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "def print_name_age(first, last, age):\n", + "    print_name(first, last)\n", + "    print('Your age is %d' % age)\n", + "    if age > 35:\n", + "        print('You are really old.')\n", + "    return" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "print_name_age(age=40, last='Beck', first='Dave')" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "#### How would you functionalize the above code for downloading, unzipping, and making a 
dataframe?"
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Once you have some code that is functionalized and not going to change, you can move it to a file that ends in `.py`, check it into version control, import it into your notebook and use it!\n", + "\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Homework:\n", + "Save your functions to `pronto_utils.py`. Import the functions and use them to rewrite HW1. This will be laid out in the homework repo for HW2. Check the website." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "collapsed": true + }, + "outputs": [], + "source": [] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.6.4" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} diff --git a/Autumn2017/03-Procedural-Python/Procedural-Python-Completed.ipynb b/Autumn2017/03-Procedural-Python/Procedural-Python-Completed.ipynb index 826b9ef..b6a87b3 100644 --- a/Autumn2017/03-Procedural-Python/Procedural-Python-Completed.ipynb +++ b/Autumn2017/03-Procedural-Python/Procedural-Python-Completed.ipynb @@ -266,7 +266,7 @@ "metadata": { "anaconda-cloud": {}, "kernelspec": { - "display_name": "Python [default]", + "display_name": "Python 3", "language": "python", "name": "python3" }, @@ -280,7 +280,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.2" + "version": "3.6.4" } }, "nbformat": 4, diff --git a/Autumn2017/03-Procedural-Python/Procedural-Python.ipynb b/Autumn2017/03-Procedural-Python/Procedural-Python.ipynb index 90414ab..66dcb35 100644 --- a/Autumn2017/03-Procedural-Python/Procedural-Python.ipynb +++ b/Autumn2017/03-Procedural-Python/Procedural-Python.ipynb @@ -49,9 +49,9 @@ "metadata": { "anaconda-cloud": {}, "kernelspec": { - "display_name": "Python 3.6", + "display_name": "Python 3", "language": "python", - "name": "python3.6" + "name": "python3" }, "language_info": { "codemirror_mode": { @@ -63,9 +63,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.6.0" + "version": "3.6.4" } }, "nbformat": 4, - "nbformat_minor": 0 + "nbformat_minor": 1 } diff --git 
a/PreFall2018/02-Python-and-Data/02-Python-and-Data.pdf b/PreFall2018/02-Python-and-Data/02-Python-and-Data.pdf deleted file mode 100644 index ec5feb5..0000000 Binary files a/PreFall2018/02-Python-and-Data/02-Python-and-Data.pdf and /dev/null differ diff --git a/PreFall2018/02-Python-and-Data/Lecture-Python-and-Data.ipynb b/PreFall2018/02-Python-and-Data/Lecture-Python-and-Data.ipynb index 7497cef..98b7d71 100644 --- a/PreFall2018/02-Python-and-Data/Lecture-Python-and-Data.ipynb +++ b/PreFall2018/02-Python-and-Data/Lecture-Python-and-Data.ipynb @@ -3,9 +3,7 @@ { "cell_type": "code", "execution_count": 1, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [ { "data": { @@ -232,18 +230,14 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [] }, @@ -264,18 +258,14 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [] }, @@ -289,9 +279,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [] }, @@ -305,9 +293,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [] }, @@ -321,9 +307,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [] }, @@ -362,9 +346,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [] }, @@ -378,9 +360,7 @@ { "cell_type": "code", "execution_count": 
null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [] }, @@ -395,9 +375,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [] }, @@ -434,9 +412,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [] }, @@ -503,9 +479,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [] }, @@ -572,9 +546,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "from IPython.display import Image\n", @@ -591,9 +563,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [] }, @@ -613,9 +583,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [] }, @@ -629,9 +597,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [] }, @@ -668,9 +634,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [] }, @@ -692,9 +656,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "import seaborn\n", @@ -711,9 +673,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [] }, @@ -729,9 +689,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [] }, @@ -875,9 +833,9 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 3.5", - "language": "", 
- "name": "python3.5" + "display_name": "Python 3", + "language": "python", + "name": "python3" }, "language_info": { "codemirror_mode": { @@ -889,9 +847,9 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.5.1" + "version": "3.6.4" } }, "nbformat": 4, - "nbformat_minor": 0 + "nbformat_minor": 1 } diff --git a/PreFall2018/04-Procedural_Python.ipynb b/PreFall2018/04-Procedural_Python.ipynb index c109307..aa7fae9 100644 --- a/PreFall2018/04-Procedural_Python.ipynb +++ b/PreFall2018/04-Procedural_Python.ipynb @@ -36,9 +36,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_tuple = ('I', 'like', 'cake')\n", @@ -55,9 +53,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_tuple[0]" @@ -73,9 +69,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_tuple[-1]" @@ -91,9 +85,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_tuple[0:2]" @@ -102,9 +94,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_tuple[0:3]" @@ -132,9 +122,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_tuple[1:]" @@ -143,9 +131,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_tuple[:-1]" @@ -154,9 +140,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_tuple[:]" @@ -172,9 +156,7 @@ { "cell_type": "code", "execution_count": 
null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_tuple[2]" @@ -190,9 +172,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "#my_tuple[2] = 'pie'" @@ -222,9 +202,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_tuple" @@ -233,9 +211,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_tuple = ('I', 'love', 'pie')\n", @@ -252,9 +228,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "'love' in my_tuple" @@ -270,9 +244,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "import math\n", @@ -292,9 +264,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [] }, @@ -308,9 +278,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_second_tuple + my_tuple" @@ -326,9 +294,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [] }, @@ -380,9 +346,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_list = ['I', 'like', 'cake']\n", @@ -399,9 +363,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_list[0]" @@ -410,9 +372,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], 
"source": [ "my_list[-1]" @@ -421,9 +381,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_list[0:3]" @@ -439,9 +397,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_list[2] = 'pie'\n", @@ -458,9 +414,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_list[1:] = ['love', 'puppies']\n", @@ -477,9 +431,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "'puppies' in my_list" @@ -488,9 +440,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "'kittens' in my_list" @@ -510,9 +460,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_new_list = []\n", @@ -529,9 +477,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_new_list.append('Now')\n", @@ -548,9 +494,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_new_list + my_list" @@ -566,9 +510,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "len(my_list)" @@ -603,9 +545,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "list(my_tuple)" @@ -623,9 +563,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "type(tuple)" @@ -634,9 +572,7 @@ { 
"cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "type(list(my_tuple))" @@ -685,9 +621,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_list.count('I')\n", @@ -720,9 +654,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [] }, @@ -772,9 +704,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_dict = { 'tuple' : 'An immutable collection of ordered objects',\n", @@ -793,9 +723,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_dict['dictionary']" @@ -811,9 +739,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_dict['dictionary'] = 'A mutable collection of named objects'\n", @@ -832,9 +758,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_dict['cabbage'] = 'Green leafy plant in the Brassica family'\n", @@ -851,9 +775,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_dict['cabbage'] = None\n", @@ -870,9 +792,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_dict.pop('cabbage', None)\n", @@ -889,9 +809,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_new_dict = {}\n", @@ -910,9 +828,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": 
false - }, + "metadata": {}, "outputs": [], "source": [ "my_dict.keys()" @@ -928,9 +844,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_dict.items()" @@ -946,9 +860,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "'dictionary' in my_dict.keys()" @@ -964,9 +876,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "'dictionary' in my_dict" @@ -982,9 +892,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "'A mutable collection of ordered objects' in my_dict" @@ -1053,9 +961,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "instructors = ['Dave', 'Joe', 'Dorkus the Clown']\n", @@ -1073,9 +979,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "if 'Dorkus the Clown' in instructors:\n", @@ -1092,9 +996,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "if 'Dorkus the Clown' in instructors:\n", @@ -1113,9 +1015,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "if 'Joe' in instructors:\n", @@ -1138,7 +1038,6 @@ "cell_type": "code", "execution_count": null, "metadata": { - "collapsed": false, "scrolled": true }, "outputs": [], @@ -1161,9 +1060,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_favorite = 'pie'\n", @@ -1186,9 +1083,7 @@ { "cell_type": "code", 
"execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_favorite = 'pie'\n", @@ -1217,9 +1112,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "for instructor in instructors:\n", @@ -1236,9 +1129,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "for instructor in instructors:\n", @@ -1258,9 +1149,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "for key in my_dict.keys():\n", @@ -1306,9 +1195,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "range(3)" @@ -1326,9 +1213,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "list(range(3))" @@ -1344,9 +1229,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "for index in range(3):\n", @@ -1367,9 +1250,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "for index in range(len(instructors)):\n", @@ -1392,9 +1273,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "sum = 0\n", @@ -1415,9 +1294,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "for i in range(1, 4):\n", @@ -1435,9 +1312,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "for i in range(10):\n", @@ -1456,9 +1331,7 @@ { 
"cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "sum = 0\n", @@ -1480,9 +1353,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "sum = 0\n", @@ -1502,9 +1373,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "my_string = \"DIRECT\"\n", @@ -1551,18 +1420,14 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [] }, @@ -1735,9 +1600,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "print_string(\"Dave is awesome!\")" @@ -1784,9 +1647,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "def change_list(my_list):\n", @@ -1934,9 +1795,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "def print_name_age(first, last, age):\n", @@ -1950,9 +1809,7 @@ { "cell_type": "code", "execution_count": null, - "metadata": { - "collapsed": false - }, + "metadata": {}, "outputs": [], "source": [ "print_name_age(age=40, last='Beck', first='Dave')" @@ -2038,23 +1895,23 @@ ], "metadata": { "kernelspec": { - "display_name": "Python 2", + "display_name": "Python 3", "language": "python", - "name": "python2" + "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", - "version": 2 + "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", - "pygments_lexer": 
"ipython2", - "version": "2.7.12" + "pygments_lexer": "ipython3", + "version": "3.6.4" } }, "nbformat": 4, - "nbformat_minor": 0 + "nbformat_minor": 1 }