-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathsairr_ed.Rmd
172 lines (148 loc) · 5.96 KB
/
sairr_ed.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
---
title: "SAIRR Check"
author: "Simon Halliday"
date: "2017-June -27"
output: oilabs::lab_report
bibliography: uct/sairr_bib.bib
---
```{r setup, include = FALSE, message = FALSE, warnings = FALSE}
library(tidyverse)
library(mosaic)
library(knitr)
```
##Intro
I saw the report by @sairr17 and used by a friend in a Facebook discussion (you can get the report [here](http://sa-monitor.com/wp-content/uploads/2017/02/IRR-Auditing-racial-transformation-2017-01.pdf)). I didn't believe the numbers he cited from the report, so I wanted to download the data and check them. The main data are from @ghs2015. I use packages by @tidyverse2017 and @oilabs2016 for ease in this doc. Note, this isn't meant to be a comprehensive fact check, just me not sure about numbers a friend was citing.
##Data
The data can be obtained here: [microdata.worldbank.org/index.php/catalog/2773/related_materials](http://microdata.worldbank.org/index.php/catalog/2773/related_materials).
Import the data csv of the GHS 2015:
```{r data_import, warnings = FALSE, message = FALSE}
ghs15 <- read_csv("uct/csv/ghs_2015_person_v1.1_20160608.csv")
```
##Wrangling
Wrangle the data and select the sample of 20 and older.
Generate a variable that checks the proportion of people of each race that has Matric or higher.
Also try to find out their error with another variable.
Couldn't quite get the same proportions they did, but I wasn't trying too hard.
```{r}
educ20 <-
ghs15 %>%
filter(Age > 19) %>%
mutate(matricmore = ifelse(Q15HIEDU %in% c(12, 13, 16,
17, 18, 19,
21, 22, 23,
24, 25, 26,
27, 28, 29),
1, 0),
matriconly = ifelse(Q15HIEDU %in% c(12, 13, 23)
, 1, 0),
race = factor(race, labels = c("Black/African",
"Coloured",
"Indian/Asian",
"White")))
```
##Summary Stats
Generate summary data frame.
Note: I haven't used the sampling weights here because I haven't had the time to research them, they shouldn't change the estimates too much just yet.
```{r}
raceSum <-
educ20 %>%
group_by(race) %>%
summarise(matricprop = round(mean(matricmore), 3))
kable(raceSum, col.names = c("Race", "Proportion with at least Grade 12"))
```
##Other tallies
Screwing around with some other tallies to check my work above.
```{r}
mosaic::tally(~Q15HIEDU, data = educ20)
mosaic::tally(~race, data = educ20)
mosaic::tally(~matricmore, data = educ20)
mosaic::tally(~matricmore | race, data = educ20, format = "percent")
```
##Housing
For housing I need to read in the household data.
```{r}
ghs15hh <- read_csv("uct/csv/ghs_2015_house_v1.1_20160608.csv")
```
```{r}
mosaic::tally(~Q56Owner | head_popgrp, data = ghs15hh, format = "percent")
```
```{r}
mosaic::tally(~Q58Val | head_popgrp, data = ghs15hh, format = "percent")
```
##Wrangling HH data
I need to generate new variables to create the proportions of the different home values by race.
```{r}
ghshhval <-
ghs15hh %>%
filter(Q58Val != 99) %>%
select(head_popgrp, Q58Val)
hhcounts <-
ghshhval %>%
group_by(head_popgrp) %>%
summarise(n = n())
hhval <-
ghshhval %>%
group_by(Q58Val, head_popgrp) %>%
summarise(count = count(head_popgrp)) %>%
left_join(hhcounts) %>%
ungroup %>%
mutate(freq = count / n,
Q58Val = factor(Q58Val,
labels = c("Less than R50k", "R50 001 - R250 000", "R250 001 - R500
000", " R500 001 - R1 000 000", "R1 000 001 - R1 500 000",
"R1 500 001 - R2 000 000", "R2 000 001 - R3 000 000",
"More than R3 000 000", "Do not know")),
head_popgrp = factor(head_popgrp, labels = c("Black/African", "Coloured", "Indian/Asian",
"White")))
```
##Plotting the hh values
I now plot the wrangled data in `ggplot()`:
```{r}
hhplot <-
hhval %>%
ggplot(aes(x = Q58Val, y = freq, group = head_popgrp, fill = head_popgrp)) +
geom_bar(stat = "identity", position = "dodge") +
xlab("Home value estimate") +
ylab("Frequency") +
scale_fill_discrete(name = "Ethnic group of HH Head") +
labs(title = "Home value frequencies by ethnicity in South Africa",
subtitle = "Source: SA GHS, 2015") +
theme_bw() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
text = element_text(size = 16))
hhplot
```
I need to export the file to jpeg to post to FB.
```{r}
jpeg(file = "hhplot_sa_ghs15.jpg", width = 640, height = 480, units = "px")
hhplot
dev.off()
```
```{r}
hhplot2 <-
hhval %>%
filter(Q58Val != "Do not know") %>%
ggplot(aes(x = Q58Val, y = freq, fill = head_popgrp)) +
geom_bar(stat = "identity", position = "dodge") +
xlab("Home value estimate") +
ylab("Frequency") +
#facet_grid(. ~head_popgrp, scales = "free") +
facet_wrap(~ head_popgrp) +
scale_fill_discrete(guide = FALSE) +
#scale_fill_discrete(name = "Ethnic group of HH Head") +
labs(title = "Home value frequencies by ethnicity in South Africa",
subtitle = "Source: SA GHS, 2015 (Response \"Don't know\" Excluded)") +
theme_bw() +
theme(text = element_text(size = 16),
axis.text.x = element_text(angle = 45, hjust = 1, size = 10)
)
hhplot2
```
```{r}
jpeg(file = "hhplot_sa_ghs15_facet.jpg", width = 640, height = 480, units = "px")
hhplot2
dev.off()
```
##License
<a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/"><img alt="Creative Commons License" style="border-width:0" src="https://i.creativecommons.org/l/by-nc/4.0/88x31.png" /></a><br />This work is licensed under a <a rel="license" href="http://creativecommons.org/licenses/by-nc/4.0/">Creative Commons Attribution-NonCommercial 4.0 International License</a>.
##References