---
title: "Demo Notebook Instructions"
output:
  html_document: default
---
# Introduction
<br>
The goal of this session is to give you a hands-on taste of the app's main features, including conducting analyses, publishing reports, and deploying models.
<br><br>
Imagine that you are a data scientist at a company that performs dynamic inventory management. A ride-sharing company is one example: it needs to know which parts of a city to direct its drivers to, depending on the time of day and other factors.
<br><br>
In the first part of the session, we'll run an analysis and publish the results for business users. We'll:

* Load pre-installed packages and install additional open-source packages from CRAN.
* Load data.
* Run a simple analysis.
* Publish the results as a Report.

<br><br>
In the second part of the session, we'll deploy a content recommender model that anyone in the organization can query for recommendations through a REST API.
## Getting Started
From the <span style="padding: 2px 5px 2px 5px; font-family:courier; background-color:#DDDDDD;">Projects</span> page, select the project you'll work in.
Before running a notebook, we first need to spin up an environment. Click the shortcut <img style = "display: inline; width:30px; height:30px" src =""></img> button on the left sidebar and select <span style="padding: 2px 5px 2px 5px; font-family:courier; background-color:#DDDDDD;">Launch a Session</span>. As you may recall, each project is linked to a GitHub repository. Select the repository branch where you'll write the new analysis code; for this demo, select the <span style="padding: 2px 5px 2px 5px; font-family:courier; background-color:#DDDDDD;">master</span> branch.
For this tutorial, we'll use RStudio for the analysis, so select it from the <span style="padding: 2px 5px 2px 5px; font-family:courier; background-color:#DDDDDD;">Select Tool</span> drop-down menu. The 2GB environment should be sufficient for this analysis, so select it under <span style="padding: 2px 5px 2px 5px; font-family:courier; background-color:#DDDDDD;">Compute Resources</span>, and choose R under <span style="padding: 2px 5px 2px 5px; font-family:courier; background-color:#DDDDDD;">Script Language</span>. You can also specify libraries to pre-install, but we'll skip that for now. Finally, click <span style="padding: 2px 5px 2px 5px; font-family:courier; background-color:#DDDDDD;">Launch</span>. We can now use this environment to run our analyses.
Navigate to the session using the Currently Running Resources drop-down and click the <span style="padding: 2px 5px 2px 5px; font-family:courier; background-color:#DDDDDD;">Go To</span> button. In the list of files that appears, open the notebook <span style="padding: 2px 5px 2px 5px; font-family:courier; background-color:#DDDDDD;">demo-notebook.ipynb</span>.
## In the Notebook: Loading Packages
The first step is to load a few packages. To load a package that is already installed, use:
> <span style="padding: 2px 5px 2px 5px; font-family:courier; background-color:#DDDDDD;">library("dplyr")</span>
New packages can be installed from CRAN and other repositories. For example, the following command installs the forecast package:
> <span style="padding: 2px 5px 2px 5px; font-family:courier; background-color:#DDDDDD;">install.packages("forecast")</span>
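In a shared environment it is often convenient to install a package only when it is missing and then load it, so that re-running the notebook does not reinstall anything. The helper below is a minimal base-R sketch of that pattern; the name `use_package` is hypothetical and is not part of the platform.

```r
# Hypothetical helper: install a CRAN package only if it is not already
# available in the session's library, then attach it.
use_package <- function(pkg, repos = "https://cloud.r-project.org") {
  # requireNamespace() returns FALSE (quietly) when the package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg, repos = repos)
  }
  # character.only = TRUE lets library() accept the package name as a string
  library(pkg, character.only = TRUE)
  invisible(pkg)
}
```

For example, `use_package("forecast")` would install forecast the first time it runs and simply attach it on later runs.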
## In the Notebook: Loading and Transforming Data
Next, we'll define a set of functions for loading and featurizing our data. The platform can pull data from databases, as well as from data files stored in S3 buckets or other online storage. For simplicity, this demo stores the data in an RData file.
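The notebook's own loading functions are not reproduced here, so the following is only a sketch under assumptions: the file name `taxi_pickups.RData`, the object stored in it, and the column name `pickup_datetime` are all hypothetical. It shows how `load()` can restore an RData file into a separate environment and how simple calendar features (date, hour of day, day of week) might be derived from a pickup timestamp.

```r
# Hypothetical loader: restores the objects in an .RData file into a fresh
# environment (so nothing in the session is overwritten) and returns the
# first one. load() returns the names of the objects it restored.
load_pickups <- function(path) {
  env <- new.env()
  restored <- load(path, envir = env)
  get(restored[1], envir = env)
}

# Hypothetical featurizer: adds calendar features derived from an assumed
# POSIXct column `pickup_datetime`, interpreted in New York local time.
featurize_pickups <- function(pickups, tz = "America/New_York") {
  lt <- as.POSIXlt(pickups$pickup_datetime, tz = tz)
  pickups$pickup_date <- as.Date(format(lt, "%Y-%m-%d"))
  pickups$hour <- as.integer(lt$hour)     # 0-23
  pickups$weekday <- as.integer(lt$wday)  # 0 = Sunday, ..., 6 = Saturday
  pickups
}

# Example usage (file name is hypothetical):
# pickups <- featurize_pickups(load_pickups("taxi_pickups.RData"))
```

Loading into a dedicated environment is a design choice: a plain `load()` call writes into the global environment and can silently replace existing variables with the same names.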
## In the Notebook: Analyzing Data
Let's explore how the mean number of daily pickups differs across taxi zones (neighborhoods) in New York City. We'll visualize the results as a heatmap in which a lighter shade of blue indicates higher demand in that zone. The heatmap shows substantial differences between zones, with demand concentrated mainly in Manhattan and at the airports.
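The per-zone statistic behind such a heatmap can be sketched in base R. This assumes one row per pickup with hypothetical `zone` and `pickup_date` columns; the notebook's actual code and column names may differ. Pickups are counted per zone and day, then averaged per zone.

```r
# Mean number of pickups per day for each zone. Assumes `pickups` has one
# row per pickup with (hypothetical) columns `zone` and `pickup_date`.
mean_daily_pickups <- function(pickups) {
  # Zone x day contingency table; a day on which a zone had no pickups
  # appears as 0, so it lowers that zone's mean instead of being skipped.
  counts <- table(zone = pickups$zone, day = pickups$pickup_date)
  sort(rowMeans(unclass(counts)), decreasing = TRUE)
}

# Toy, made-up input purely to illustrate the calculation
toy <- data.frame(
  zone = c("zone_a", "zone_a", "zone_a", "zone_b"),
  pickup_date = as.Date(c("2016-01-04", "2016-01-04", "2016-01-05", "2016-01-05"))
)
mean_daily_pickups(toy)  # zone_a: (2 + 1) / 2 = 1.5, zone_b: (0 + 1) / 2 = 0.5
```

Note that only days that appear somewhere in the data are counted; a day with no pickups in any zone would need to be added explicitly.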
Now that we've seen the data, let's note the insights it revealed. Here are some examples:
<blockquote>
<div style="padding: 2px 5px 2px 5px; font-family:courier; background-color:#DDDDDD;">
1. Demand is uneven across zones.
<br><br>
2. The highest demand is seen in Manhattan, specifically south of Central Park.
<br><br>
3. Brooklyn and Long Island have the most demand of the outer boroughs.
</div>
</blockquote>
## Publishing Results
To publish the analysis, first **save the notebook, then commit and push the changes to the repository branch** (click <span style="padding: 2px 5px 2px 5px; font-family:courier; background-color:#DDDDDD;">SYNC</span>). Once the notebook is committed and pushed, you're ready to publish a report.
Navigate to your project and click the <img style = "display: inline; width:30px; height:30px; valign:center" src =""></img> button.
Select <span style="padding: 2px 5px 2px 5px; font-family:courier; background-color:#DDDDDD;">Publish a Report</span>.
Select the branch that contains your notebook changes. If you followed this demo, that branch is <span style="padding: 2px 5px 2px 5px; font-family:courier; background-color:#DDDDDD;">master</span>. Under <span style="padding: 2px 5px 2px 5px; font-family:courier; background-color:#DDDDDD;">File</span>, select the notebook to publish, <span style="padding: 2px 5px 2px 5px; font-family:courier; background-color:#DDDDDD;">demo-notebook.ipynb</span>.
Next, fill in the <span style="padding: 2px 5px 2px 5px; font-family:courier; background-color:#DDDDDD;">Report Info:</span> fields, including a title and a description, and click the <span style="padding: 2px 5px 2px 5px; font-family:courier; background-color:#DDDDDD;">Publish</span> button. When publishing finishes, you can view the report immediately by clicking the <span style="padding: 2px 5px 2px 5px; font-family:courier; background-color:#DDDDDD;">Go To Report</span> button.
You can also view reports in the Outputs view, which is where a business user or collaborator would typically review reports, progress, or the project's history.
## Congratulations!
You've completed your first analysis on the DataScience Platform and published your first Report, all within a single environment. Because analysis code, outputs, and reports coexist inside the platform, moving from data analysis to reporting and sharing is a seamless process.