## LSE Data Analytics Online Career Accelerator
# DA301: Advanced Analytics for Organisational Impact
###############################################################################
# Assignment template
## Scenario
## You are a data analyst working for Turtle Games, a game manufacturer and
## retailer. They manufacture and sell their own products, along with sourcing
## and selling products manufactured by other companies. Their product range
## includes books, board games, video games and toys. They have a global
## customer base and have a business objective of improving overall sales
## performance by utilising customer trends.
## In particular, Turtle Games wants to understand:
## - how customers accumulate loyalty points (Week 1)
## - how useful remuneration and spending score data are (Week 2)
## - whether social data (e.g. customer reviews) can be used in marketing
## campaigns (Week 3)
## - what the impact on sales per product is (Week 4)
## - the reliability of the data (e.g. normal distribution, Skewness, Kurtosis)
## (Week 5)
## - whether there are any relationships in sales between North America,
## Europe, and global sales (Week 6).
################################################################################
# Week 4 assignment: EDA using R
## The sales department of Turtle Games prefers R to Python. As you can perform
## data analysis in R, you will explore and prepare the data set for analysis by
## utilising basic statistics and plots. You will use this data set in future
## modules as well, so you are strongly encouraged to clean the data first, as
## per the provided guidelines, and then save a copy of the clean data for
## future use.
# Instructions
# 1. Load and explore the data.
## - Remove redundant columns (Ranking, Year, Genre, Publisher) by creating
## a subset of the data frame.
## - Create a summary of the new data frame.
# 2. Create plots to review and determine insights into the data set.
## - Create scatterplots, histograms and boxplots to gain insights into
## the Sales data.
## - Note your observations and diagrams that could be used to provide
## insights to the business.
# 3. Include your insights and observations.
###############################################################################
# 1. Load and explore the data
# Install and import Tidyverse.
# Import the data set.
# Print the data frame.
# Create a new data frame from a subset of the sales data frame.
# Remove unnecessary columns.
# View the data frame.
# View the descriptive statistics.
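## A minimal sketch of step 1, not a model answer. The template does not name
## the data file, so a small made-up data frame stands in for the import step;
## with the real file, read.csv() or readr's read_csv() would be used instead.
## Product, Platform, NA_Sales, EU_Sales and Global_Sales are assumed column
## names. Base R is used here so that the sketch runs without Tidyverse.
demo_raw <- data.frame(
  Ranking      = 1:6,
  Product      = c(1, 1, 2, 2, 3, 3),
  Platform     = c("A", "B", "A", "B", "A", "B"),
  Year         = c(2001, 2002, 2003, 2004, 2005, 2006),
  Genre        = c("Puzzle", "Sports", "Puzzle", "Racing", "Sports", "Racing"),
  Publisher    = c("P1", "P1", "P2", "P2", "P3", "P3"),
  NA_Sales     = c(2.0, 3.5, 1.2, 0.8, 5.1, 4.4),
  EU_Sales     = c(1.1, 2.0, 0.9, 0.3, 2.6, 2.9),
  Global_Sales = c(4.0, 6.9, 2.8, 1.5, 9.6, 9.0)
)
# Drop the redundant columns by taking a subset of the data frame.
demo_subset <- subset(demo_raw, select = -c(Ranking, Year, Genre, Publisher))
# View the result and its descriptive statistics.
print(demo_subset)
summary(demo_subset)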
################################################################################
# 2. Review plots to determine insights into the data set.
## 2a) Scatterplots
# Create scatterplots.
## 2b) Histograms
# Create histograms.
## 2c) Boxplots
# Create boxplots.
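## One possible way to draw the three plot types, sketched with base graphics
## rather than ggplot2 so that it runs without extra packages. The data frame
## and its column names (NA_Sales, EU_Sales, Global_Sales) are made-up
## assumptions for illustration only.
demo <- data.frame(
  NA_Sales     = c(2.0, 3.5, 1.2, 0.8, 5.1, 4.4),
  EU_Sales     = c(1.1, 2.0, 0.9, 0.3, 2.6, 2.9),
  Global_Sales = c(4.0, 6.9, 2.8, 1.5, 9.6, 9.0)
)
# Scatterplot: regional sales against global sales.
plot(demo$NA_Sales, demo$Global_Sales,
     xlab = "NA sales", ylab = "Global sales", main = "NA vs global sales")
# Histogram: distribution of global sales.
demo_hist <- hist(demo$Global_Sales, xlab = "Global sales",
                  main = "Distribution of global sales")
# Boxplot: spread and outliers of each sales column side by side.
demo_box <- boxplot(demo, ylab = "Sales", main = "Sales by region")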
###############################################################################
# 3. Observations and insights
## Your observations and insights here ......
###############################################################################
###############################################################################
# Week 5 assignment: Cleaning and manipulating data using R
## Utilising R, you will explore, prepare and explain the normality of the data
## set based on plots, Skewness, Kurtosis, and a Shapiro-Wilk test. You will
## use this data set in future modules as well, so you are strongly encouraged
## to clean the data first, as per the provided guidelines, and then save a
## copy of the clean data for future use.
## Instructions
# 1. Load and explore the data.
## - Continue to use the data frame that you prepared in the Week 4 assignment.
## - View the data frame to sense-check the data set.
## - Determine the `min`, `max` and `mean` values of all the sales data.
## - Create a summary of the data frame.
# 2. Determine the impact on sales per product_id.
## - Use the group_by and aggregate functions to sum the values grouped by
## product.
## - Create a summary of the new data frame.
# 3. Create plots to review and determine insights into the data set.
## - Create scatterplots, histograms, and boxplots to gain insights into
## the Sales data.
## - Note your observations and diagrams that could be used to provide
## insights to the business.
# 4. Determine the normality of the data set.
## - Create and explore Q-Q plots for all sales data.
## - Perform a Shapiro-Wilk test on all the sales data.
## - Determine the Skewness and Kurtosis of all the sales data.
## - Determine if there is any correlation between the sales data columns.
# 5. Create plots to gain insights into the sales data.
## - Compare all the sales data (columns) for any correlation(s).
## - Add a trend line to the plots for ease of interpretation.
# 6. Include your insights and observations.
################################################################################
# 1. Load and explore the data
# View data frame created in Week 4.
# Check output: Determine the min, max, and mean values.
# View the descriptive statistics.
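## A short sketch of the min/max/mean check, assuming made-up data and the
## column names NA_Sales, EU_Sales and Global_Sales (not confirmed by this
## template). sapply() applies the same three functions to every sales column.
demo <- data.frame(
  Product      = c(1, 1, 2, 2, 3, 3),
  NA_Sales     = c(2.0, 3.5, 1.2, 0.8, 5.1, 4.4),
  EU_Sales     = c(1.1, 2.0, 0.9, 0.3, 2.6, 2.9),
  Global_Sales = c(4.0, 6.9, 2.8, 1.5, 9.6, 9.0)
)
sales_cols <- c("NA_Sales", "EU_Sales", "Global_Sales")
# One column per sales variable, one row per statistic.
demo_stats <- sapply(demo[sales_cols],
                     function(x) c(min = min(x), max = max(x), mean = mean(x)))
print(demo_stats)
# summary() gives the same figures plus quartiles.
summary(demo)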
###############################################################################
# 2. Determine the impact on sales per product_id.
## 2a) Use the group_by and aggregate functions.
# Group data based on Product and determine the sum per Product.
# View the data frame.
# Explore the data frame.
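## The grouping step could look roughly like this. It is a sketch using base
## R's aggregate() on made-up data; dplyr's group_by() with summarise() would
## be an equivalent route. The "_sum" suffix is chosen to match the
## NA_Sales_sum and EU_Sales_sum names used in Week 6; the other column names
## are assumptions.
demo <- data.frame(
  Product      = c(1, 1, 2, 2, 3, 3),
  NA_Sales     = c(2.0, 3.5, 1.2, 0.8, 5.1, 4.4),
  EU_Sales     = c(1.1, 2.0, 0.9, 0.3, 2.6, 2.9),
  Global_Sales = c(4.0, 6.9, 2.8, 1.5, 9.6, 9.0)
)
# Sum every sales column within each Product.
by_product <- aggregate(cbind(NA_Sales, EU_Sales, Global_Sales) ~ Product,
                        data = demo, FUN = sum)
# Rename the summed columns so their meaning is explicit.
names(by_product)[-1] <- paste0(names(by_product)[-1], "_sum")
print(by_product)
summary(by_product)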
## 2b) Determine which plot is the best to compare game sales.
# Create scatterplots.
# Create histograms.
# Create boxplots.
###############################################################################
# 3. Determine the normality of the data set.
## 3a) Create Q-Q Plots
# Create Q-Q Plots.
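## A minimal Q-Q plot sketch on made-up values (Global_Sales is an assumed
## column name). Points that follow the reference line suggest approximate
## normality; systematic curvature suggests skew or heavy tails.
demo_global <- c(4.0, 6.9, 2.8, 1.5, 9.6, 9.0)
# qqnorm() plots sample quantiles against theoretical normal quantiles.
demo_qq <- qqnorm(demo_global, main = "Q-Q plot: Global_Sales")
# qqline() adds the reference line through the first and third quartiles.
qqline(demo_global, col = "red")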
## 3b) Perform Shapiro-Wilk test
# Install and import Moments.
# Perform Shapiro-Wilk test.
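## Sketch of the Shapiro-Wilk step on made-up values. shapiro.test() ships with
## base R's stats package, so no extra package is needed for this part. A
## p-value below the chosen significance level (0.05 is a common choice, but
## it is a convention, not a rule) is evidence against normality.
demo_global <- c(4.0, 6.9, 2.8, 1.5, 9.6, 9.0)
demo_sw <- shapiro.test(demo_global)
print(demo_sw)
# The test statistic W and the p-value can be extracted for reporting.
demo_sw$statistic
demo_sw$p.value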
## 3c) Determine Skewness and Kurtosis
# Skewness and Kurtosis.
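## To show what the two measures compute, here is a hand-written sketch of the
## moment-based definitions. skew_demo() and kurt_demo() are hypothetical
## helper names; in the assignment itself the Moments package's skewness() and
## kurtosis() would normally be called, which (as far as I know) follow the
## same non-excess definitions, so a normal distribution has kurtosis near 3.
skew_demo <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5   # third central moment, standardised
}
kurt_demo <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2     # fourth central moment, standardised
}
demo_global <- c(4.0, 6.9, 2.8, 1.5, 9.6, 9.0)  # made-up values
skew_demo(demo_global)
kurt_demo(demo_global)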
## 3d) Determine correlation
# Determine correlation.
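## The correlation check could be sketched as below, on made-up data with
## assumed column names. cor() returns Pearson correlations by default; values
## near 1 or -1 indicate a strong linear relationship.
demo <- data.frame(
  NA_Sales     = c(2.0, 3.5, 1.2, 0.8, 5.1, 4.4),
  EU_Sales     = c(1.1, 2.0, 0.9, 0.3, 2.6, 2.9),
  Global_Sales = c(4.0, 6.9, 2.8, 1.5, 9.6, 9.0)
)
# Correlation between every pair of sales columns.
demo_cor <- cor(demo)
round(demo_cor, 2)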
###############################################################################
# 4. Plot the data
# Create plots to gain insights into the data.
# Choose the type of plot you think best suits the data set and what you want
# to investigate. Explain your answer in your report.
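## One option, sketched on made-up data: a scatterplot matrix that compares
## every pair of sales columns and overlays a least-squares trend line in each
## panel. trend_panel() is a hypothetical helper name and the column names are
## assumptions; a ggplot2 scatterplot with geom_smooth(method = "lm") would be
## an alternative.
demo <- data.frame(
  NA_Sales     = c(2.0, 3.5, 1.2, 0.8, 5.1, 4.4),
  EU_Sales     = c(1.1, 2.0, 0.9, 0.3, 2.6, 2.9),
  Global_Sales = c(4.0, 6.9, 2.8, 1.5, 9.6, 9.0)
)
# Panel function: draw the points, then the fitted straight line.
trend_panel <- function(x, y, ...) {
  points(x, y)
  abline(lm(y ~ x), col = "blue")
}
pairs(demo, panel = trend_panel, main = "Sales columns with trend lines")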
###############################################################################
# 5. Observations and insights
# Your observations and insights here...
###############################################################################
###############################################################################
# Week 6 assignment: Making recommendations to the business using R
## The sales department wants to better understand if there is any relationship
## between North America, Europe, and global sales. Therefore, you need to
## investigate any possible relationship(s) in the sales data by creating a
## simple and multiple linear regression model. Based on the models and your
## previous analysis (Weeks 1-5), you will then provide recommendations to
## Turtle Games based on:
## - Do you have confidence in the models based on goodness of fit and
## accuracy of predictions?
## - What would your suggestions and recommendations be to the business?
## - If needed, how would you improve the model(s)?
## - Explain your answers.
# Instructions
# 1. Load and explore the data.
## - Continue to use the data frame that you prepared in the Week 5 assignment.
# 2. Create a simple linear regression model.
## - Determine the correlation between the sales columns.
## - View the output.
## - Create plots to view the linear regression.
# 3. Create a multiple linear regression model
## - Select only the numeric columns.
## - Determine the correlation between the sales columns.
## - View the output.
# 4. Predict global sales based on provided values. Compare your prediction to
# the observed value(s).
## - NA_Sales_sum of 34.02 and EU_Sales_sum of 23.80.
## - NA_Sales_sum of 3.93 and EU_Sales_sum of 1.56.
## - NA_Sales_sum of 2.73 and EU_Sales_sum of 0.65.
## - NA_Sales_sum of 2.26 and EU_Sales_sum of 0.97.
## - NA_Sales_sum of 22.08 and EU_Sales_sum of 0.52.
# 5. Include your insights and observations.
###############################################################################
# 1. Load and explore the data
# View data frame created in Week 5.
# Determine a summary of the data frame.
###############################################################################
# 2. Create a simple linear regression model
## 2a) Determine the correlation between columns
# Create a linear regression model on the original data.
## 2b) Create a plot (simple linear regression)
# Basic visualisation.
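## A sketch of a simple linear regression on made-up aggregated data. Only
## NA_Sales_sum and EU_Sales_sum appear in this template; Global_Sales_sum is
## an assumed name for the response column. Which predictor to use is a
## modelling choice; NA_Sales_sum is picked here purely for illustration.
demo_sum <- data.frame(
  NA_Sales_sum     = c(5.5, 2.0, 9.5, 1.4, 7.2, 3.3),
  EU_Sales_sum     = c(3.1, 1.2, 5.5, 0.4, 2.9, 2.4),
  Global_Sales_sum = c(10.9, 4.3, 18.6, 2.4, 12.5, 7.1)
)
# Fit global sales as a linear function of North American sales.
model_simple <- lm(Global_Sales_sum ~ NA_Sales_sum, data = demo_sum)
# Coefficients, p-values and R-squared for judging goodness of fit.
summary(model_simple)
# Plot the observations and overlay the fitted regression line.
plot(demo_sum$NA_Sales_sum, demo_sum$Global_Sales_sum,
     xlab = "NA_Sales_sum", ylab = "Global_Sales_sum")
abline(model_simple, col = "blue")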
###############################################################################
# 3. Create a multiple linear regression model
# Select only numeric columns from the original data frame.
# Multiple linear regression model.
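## A sketch of the multiple regression step on made-up data. Platform is a
## hypothetical non-numeric column, included only to show the numeric-column
## selection; Global_Sales_sum is an assumed name for the response.
demo_sum <- data.frame(
  Platform         = c("A", "B", "A", "B", "A", "B"),
  NA_Sales_sum     = c(5.5, 2.0, 9.5, 1.4, 7.2, 3.3),
  EU_Sales_sum     = c(3.1, 1.2, 5.5, 0.4, 2.9, 2.4),
  Global_Sales_sum = c(10.9, 4.3, 18.6, 2.4, 12.5, 7.1)
)
# Keep only the numeric columns.
demo_numeric <- demo_sum[sapply(demo_sum, is.numeric)]
# Correlation between the remaining columns.
round(cor(demo_numeric), 2)
# Model global sales from both regional sales columns.
model_multi <- lm(Global_Sales_sum ~ NA_Sales_sum + EU_Sales_sum,
                  data = demo_numeric)
# Adjusted R-squared is the fairer comparison against the simple model.
summary(model_multi)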
###############################################################################
# 4. Predictions based on given values
# Compare with observed values for a number of records.
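## A sketch of the prediction step. The five NA_Sales_sum / EU_Sales_sum pairs
## are the ones listed in the instructions; the model, however, is fitted on
## made-up data, so the numbers it prints are illustrative only and say nothing
## about the real data set. Global_Sales_sum is an assumed column name.
demo_sum <- data.frame(
  NA_Sales_sum     = c(5.5, 2.0, 9.5, 1.4, 7.2, 3.3),
  EU_Sales_sum     = c(3.1, 1.2, 5.5, 0.4, 2.9, 2.4),
  Global_Sales_sum = c(10.9, 4.3, 18.6, 2.4, 12.5, 7.1)
)
model_multi <- lm(Global_Sales_sum ~ NA_Sales_sum + EU_Sales_sum,
                  data = demo_sum)
# The values provided in the instructions; names must match the model terms.
new_values <- data.frame(
  NA_Sales_sum = c(34.02, 3.93, 2.73, 2.26, 22.08),
  EU_Sales_sum = c(23.80, 1.56, 0.65, 0.97, 0.52)
)
# Point predictions with 95% confidence intervals (columns fit, lwr, upr).
demo_pred <- predict(model_multi, newdata = new_values,
                     interval = "confidence")
cbind(new_values, demo_pred)
## With the real data, the fit column would be set beside the observed global
## sales for the matching records to judge the accuracy of the predictions.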
###############################################################################
# 5. Observations and insights
# Your observations and insights here...
###############################################################################
###############################################################################