-
Notifications
You must be signed in to change notification settings - Fork 18
/
Copy path7-Mapping.qmd
237 lines (169 loc) · 12.2 KB
/
7-Mapping.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
---
title: "Worksheet 7: Mapping"
author: ""
date: ""
---
_This is the seventh in a series of worksheets for History 8500 at Clemson University. The goal of these worksheets is simple: practice, practice, practice. The worksheet introduces concepts and techniques and includes prompts for you to practice in this interactive document. When you are finished, you should change the author name (above), render your document to a pdf, and upload it to canvas. Don't forget to commit your changes as you go and push to github when you finish the worksheet._
## Mapping with `ggmap()` and `ggplot2()`
There are many different mapping packages for R. That means there is no single way to create a map in R. Different packages have different strengths and weaknesses and your use of them will depend on your goals and the historical questions you would like to ask. If your project is highly map centric - it may be better suited to ArcGIS which we will not cover in this class.
```{r message=FALSE, warning=FALSE}
library(ggplot2)
library(tidyverse)
library(DigitalMethodsData)
library(ggmap)
library(tidygeocoder)
```
### Geocoding
The first step in any project is to create geographical data. Depending on the time period you study and the nature of the data, this may or may not be able to be automated. The process of associating geographical coordinates (latitude/longitude) with data is called **geocoding**. There are numerous avenues and services for performing this service. Google Maps and Open Street Maps are the two most common. These services accept an address and return latitude and longitude coordinates. Google Maps does require an API Key which you can sign up for. Typically geocoding with Google costs .5 cents per entry but when you sign up with them, you get $300 in credit per year (at least at the time of writing this - that may change). Although we geocode a very large amount of data with Google on Mapping the Gay Guides, I've never been charged for geocoding.
However, for the purposes of this class we're going to use Open Street Map's geocoding API because it is open source and therefore free.
To use the geocoding service, lets first load some data. We'll use the recreation data that we used last week.
```{r}
rec.data <- read.csv("https://raw.githubusercontent.com/regan008/DigitalMethodsData/main/raw/Recreation-Expenditures.csv")
head(rec.data)
```
Notice in this dataset we have the city state and year but no geographical coordinates if we wanted to map this data. Even if we don't have an exact street address, we can still geocode this data to get coordinates. The function to do that is `geocode()` and we can pass it a city and street. Note the method option, which designates which geocoding service we want to use.
```{r}
rec.data.coordinates <- rec.data %>% geocode(city = city, state = state, method='osm', lat = latitude, long = longitude)
head(rec.data.coordinates)
```
Now we have latitude and longitude coordinates for our data.
(@) Use this approach to geocode the `UndergroundRR` data.
```{r}
```
(@) Geocode the Boston Women Voters dataset. Note that the data does include street addresses but because they are broken into parts - street address number, street, etc - you'll need to combine them into a full address to pass to the geocoding function.
```{r}
data("BostonWomenVoters")
```
### Maps with `ggplot()`
Just like charts in ggplot, maps in ggplot are plotted using a variety of layers. To build a map we need to supply it with geographic data that can use to plot a base map. Your base map will differ depending on the scale of your data, the questions you are asking, and your area of study. For the purposes of this worksheet lets map the gay guides data. Typically you'd need to geocode this data first, but this data has already been geolocated for you.
First we need to get a basemap. For this example we'll use the `map_data()` function which turns data from the `maps` package into a data frame that is suitable for plotting with ggplot.
(@) Look at the documentation for `map_data()`. Which geographies does this package provide access to?
>
Lets load the base map data for the US.
```{r}
usa <- map_data("state")
```
(@) `map_data()` generates a data frame. Take a look at this data frame, what types of data are included?
>
We can now pass this data to ggplot to create a simple basemap. When we wanted to create a bar plot using `ggplot()` we called `geom_bar`. When we wanted to create a line chart we used `geom_point()` and `geom_line()`. The sample principle applies here and `ggplot()` provides a geom for maps.
```{r}
ggplot() +
geom_map( data = usa, map = usa, aes(long, lat, map_id=region))
```
Now we have a basemap! But what if we want to layer data onto it. Lets add all of the locations in `gayguides` from 1965. First we need to set up our data:
```{r}
data(gayguides)
gayguides <- gayguides %>% filter(Year == 1965)
```
And then we can use the same mapping code from above but this time we're going to add an additional geom -- `geom_point()` which will point to each of our locations from 1965.
```{r}
ggplot() +
geom_map(data = usa, map = usa, aes(long, lat, map_id=region), fill = "lightgray", color = "black") +
geom_point(data = gayguides, mapping = aes(x = lon, y = lat), color = "red")
```
(@) This map looks slightly funny, but that is because the data includes entries outside of the contiguous United States. Try filtering out those entries and mapping this again. Can you change the color or size of the points? Can you add a title?
```{r}
```
(@) Can you map just locations in South Carolina (on a basemap of SC)?
```{r}
```
(@) Create a map that uses your geocoded data from the Boston Women Voters dataset.
```{r}
```
Lets return to the recreational data for a minute.
```{r}
#| eval: false
head(rec.data.coordinates)
```
One interesting way to visualize this map might be to plot each location as a point on the map but to use the total_expenditures values to determine the size of the points.
We can do that by making a small adjustment to the map we made previously. First lets recreate a basic map of all these locations using `ggplot()`
```{r}
ggplot() +
geom_map(data = usa, map = usa, aes(long, lat, map_id=region), fill = "lightgray", color = "black") +
geom_point(data = rec.data.coordinates, mapping = aes(x=longitude, y=latitude))
```
```{r}
ggplot() +
geom_map( data = usa, map = usa, aes(long, lat, map_id=region), fill="white", color="gray") +
geom_point(data = rec.data.coordinates, mapping = aes(x=longitude, y=latitude, size=total_expenditures))
```
---
```{r}
library(readr) #you may have to install it using `install.packages()`.
library(sf)
library(ipumsr) #you may need to install this. If you are on a mac, it may give you warnings. Try loading it to verify installation worked.
library(tidyverse)
#NHGIS data is stored in zip files. R has functions for dealing with these but we still need to download the file to our server. Here we're going to write a function that will create a new directory, download the data, and rename it.
dir.create("data/", showWarnings = FALSE)
get_data <- function(x) {
download.file("https://github.com/regan008/DigitalMethodsData/blob/main/raw/nhgis0005_shape_simplified.zip?raw=true", "data/nhgis_simplified_shape.zip")
download.file("https://github.com/regan008/DigitalMethodsData/blob/main/raw/nhgis0005_csv.zip?raw=true", "data/nhgis_data.zip")
}
get_data()
# Change these filepaths to the filepaths of your downloaded extract
nhgis_csv_file <- "data/nhgis_data.zip"
nhgis_shp_file <- "data/nhgis_simplified_shape.zip"
#load the shape file and then the data file into read_nhgis_sf
nhgis_shp <- read_ipums_sf(
shape_file = nhgis_shp_file
)
nhgis_data <- read_nhgis(nhgis_csv_file)
#Use the ipums join file to join both the data and shape file together.
nhgis <- ipums_shape_full_join(nhgis_data, nhgis_shp, by = "GISJOIN")
#filter nhgis so that the map focuses on the 48 contiguous states.
nhgis <- nhgis %>% filter(STATE != "Alaska Territory" & STATENAM != "Hawaii Territory")
#plot
ggplot(data = nhgis, aes(fill = AZF001)) +
geom_sf()
```
(@) In the code above, why filter out Hawaii and Alaska? Try commenting out that line and rerunning the code. What happens? Why might we want to do this? Why might we not want to do this? How does it shape the interpretation?
>
This is a great start. But using AZF001 (Native born males) as the fill does not create a useful visualization. It doesn't give us a sense of the proportion of that data. There are multiple problems with the map as it is, but one is that the color scale makes this incredibly hard to read. We can fix that by using a scale to break the values of AZF001 into bins and assign each a color. R has a function for this. It comes from the scales pacakge which you may need to install.
```{r}
library(scales)
ggplot(data = nhgis, aes(fill = AZF001)) +
geom_sf() + scale_fill_distiller(name="Native Born Males", palette = "Spectral" , breaks = pretty_breaks(n = 10))
```
This is now much more readable but the numbers represented are simply the raw population count. That may be fine depending on your question but what would be much better, is if we knew what percentage of the total population foreign born males represented. To get that we have to calculate it. The next few questions will walk build on the code above and walk you through doing this.
(@) First, create a variable called total_male_pop, with the total foreign and native born male population by summing the variables AZF001 and AZF003.
```{r}
```
(@) Now, use the total_male_pop variable and create a variable for the the percentage of foreign born males.
```{r}
```
(@) Now map your result. You'll want to replicate the code from the example above, but this time add another layer to the plot - a scale. Here we'll use this scale `scale_fill_continuous("", labels = scales::percent)`
Before you write that code, look up the documentation for the above code (and look at the examples). What does it do?
>
Now create the map:
```{r}
```
### Leaflet
In recent years Leaflet has become the most popular open source Javascript library for mapping. In comparison to `ggplot()` the advantage of leaflet is its interactivity. It allows you to zoom in, have pop ups, etc. While `ggplot()` is a powerful tool for static maps and would be useful for a book or journal article, leaflet would make a useful addition to an online digital component.
Like `ggplot()` it works by layering information on top of a basemap. You begin by loading the library and invoking leaflet.
```{r}
library(leaflet)
my.map <- leaflet()
my.map
```
Now that we've got a leaflet object created, we can add layers to it starting with a basemap.
```{r}
my.map %>% addTiles()
```
Leaflet refers to tiles - these are sort of like base maps. Next we need to add coordinates. In this example, lets use the coordinates for Dr. Regan's office.
```{r}
my.map %>% addTiles() %>% addMarkers(lng=-82.836856, lat=34.678286, popup = "Hardin 004")
```
We could also do this with a data frame. Notice that in this example, we use the leaflet function and call the data inside rather than passing the function coordinates manually. We then use the paste function to build out text for a pop up.
```{r}
leaflet(data=rec.data.coordinates) %>% addTiles() %>% addMarkers(~longitude, ~latitude, popup = paste("The total expenditures in ", rec.data.coordinates$city, ", ", rec.data.coordinates$state, " were ", rec.data.coordinates$total_expenditures, sep=""))
```
(@) Use leaflet to map a dataset of your choice:
```{r}
```
(@) Explain what choices you made in building this map? Why might you use leaflet over ggplot? When would ggplot be more desirable?
>
### Exercises
For the next portion of this worksheet you will use some data about national parks that Dr. Barczewski created. Using this data (link below) you should use ggplot (charts, maps) and other mapping tools to come up with several ways to visualize it. You should try to think about this from the perspective of her larger book project, how could you visualize this data to help her make a compelling argument? See the email I send for more details about her project. Pick a dimension and make maps based on it.
```{r}
parks <- read.csv("https://raw.githubusercontent.com/regan008/DigitalMethodsData/main/raw/parks-geocoded.csv")
```