-
Notifications
You must be signed in to change notification settings - Fork 10
/
Copy pathindex.Rmd
507 lines (361 loc) · 28.5 KB
/
index.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
---
pagetitle: "Vector - Raster"
author: "Loïc Dutrieux, Jan Verbesselt, Dainius Masiliunas and David Swinkels"
date: "`r format(Sys.time(), '%Y-%m-%d')`"
output:
rmdformats::html_clean:
title: "Vector - Raster"
highlight: zenburn
---
```{css, echo=FALSE}
@import url("https://netdna.bootstrapcdn.com/bootswatch/3.0.0/simplex/bootstrap.min.css");
.main-container {max-width: none;}
pre {color: inherit; background-color: inherit;}
code[class^="sourceCode"]::before {
content: attr(class);
display: block;
text-align: right;
font-size: 70%;
}
code[class^="sourceCode r"]::before { content: "R Source";}
code[class^="sourceCode python"]::before { content: "Python Source"; }
code[class^="sourceCode bash"]::before { content: "Bash Source"; }
```
<font size="6">[WUR Geoscripting](https://geoscripting-wur.github.io/)</font> <img src="https://www.wur.nl/upload/854757ab-168f-46d7-b415-f8b501eebaa5_WUR_RGB_standard_2021-site.svg" alt="WUR logo" style="height: 35px; margin:inherit;"/>
# Vector - Raster
## Learning objectives
* Plot spatial vector and raster data;
* Transform vector and raster data;
* Write and read spatial vector formats (e.g. KML, GML, GeoJSON and Shapefile);
* Apply basic operations on vector data, such as masking and cropping;
* Be able to extract raster data using vector data.
## Introduction
In the previous tutorials we saw how to deal with [raster](http://geoscripting-wur.github.io/IntroToRaster/) data using R. This tutorial is an introduction on how to handle vector data in R, as well as how to handle the combination of vector and raster data.
### Some packages for working with vector and raster data in R
In this tutorial we will use the *sf* package. This package focuses solely on vector data. It provides a standardized encoding of vector data and uses GDAL to read and write data, GEOS for geometrical operations and PROJ for projection conversions and datum transformations.
The GDAL library is well-documented (http://gdal.org/), but with a catch for R and Python programmers. The GDAL (and associated OGR) library and command line tools are all written in C and C++. Bindings are available that allow access from a variety of other languages including R and Python but the documentation is all written for the C++ version of the libraries. This can make reading the documentation rather challenging. Fortunately, the *sf* package, providing GDAL bindings in R, is also well documented with lots of examples. The same is valid for the Python libaries.
Thus, functionality that you commonly find in expensive GIS software is also available within R, using free but very powerful software libraries. Here is [handy 'cheatsheet'](https://github.com/rstudio/cheatsheets/blob/master/sf.pdf) for spatial operations with *sf*. The functions of the *sf* package are prefixed by `st_`, short for 'spatial type'.
The possibilities are huge. In this course we can only scratch the surface with some essentials, which hopefully invites you to experiment further and use them in your research. Details can be found in the book *Applied Spatial Data Analysis with R* and several vignettes authored by Roger Bivand, Edzer Pebesma and Virgilio Gomez-Rubio. This book can be accessed for free through the following [link](https://link.springer.com/book/10.1007/978-1-4614-7618-4)!
### Raster and vector integration and conversion
Historically, the first package handling vectors in R was the *sp* package, and the first package handling rasters was the *raster* package. They had cross-integration so that you could perform operations such as cropping a raster by a vector or extracting raster information over a vector location. However, the *sp* package got deprecated and *sf* was made as its successor, with much easier handling of the data (as a regular *data.frame*). Many of the packages that previously handled *sp* objects, including *raster*, got updated to also handle *sf* objects. Next, the *stars* package was developed by the creators of the *sf* package as a means of having multidimensional rasters. However, the older *raster* package was still the go-to solution for raster handling in R. Finally, as you noticed in a previous tutorial, in 2020 the *raster* package was deprecated and replaced with the *terra* package, which is a much faster C++ version of *raster*, but it also includes its own definition of vector data.
The result is that some packages provide or work with *sp*, *sf* and *terra* vectors, and some provide or work with *raster*, *stars* and *terra* rasters. This can be quite confusing. Therefore, in the course we currently use only the *sf* package for handling vectors and the *terra* package for handling rasters. The workflow is then to convert any object that is not *sf* or *terra* into *sf* or *terra*, do processing in *sf* and/or *terra*, and optionally convert the objects back if integration with some other package is necessary (e.g. `sf` to `SpatVector` for use in *terra* in some cases). Below is a matrix showing how to convert the various objects.
| From/To | *sf* | *terra* | *raster* |
|-------- |------------- |---------- |---------- |
| *sp* `SpatialPoints`/`SpatialLines`/`SpatialPolygons`| `st_as_sf()` | `vect()` | None |
| *sf* `sf data.frame` | None | `vect()` | None |
| *terra* `SpatVector` | `st_as_sf()` | None | None |
| *raster* `RasterLayer`/`RasterStack`/`RasterBrick` | None | `rast()` | None |
| *stars* `stars` | None | `rast()` | None |
| *terra* `SpatRaster` | None | None | `brick()` |
| *raster* `extent` | `st_bbox()` | `ext()` | None |
| *sf* `bbox` | None | `ext()` | `extent()` |
| *terra* `SpatExtent` | `st_bbox()` | None | None so far|
The matrix above shows lossless conversions, also known as *casting* or *coercing* data types. You can also convert data from raster to vector format and vice-versa. However, whenever you start converting between rasters and vectors, you should wonder whether you are taking the right approach to solve your problem. An approach that does not involve converting your data from vector to raster or the opposite should almost always be preferred.
As a result, because these functions are only useful for some very particular situations, we only give a brief description of them below.
- Vector to raster:
- There is one function that allows to convert a vector object to a raster object. It is the `rasterize()` function.
- Raster to vector:
- Three functions allow to convert raster data to vector: `as.points()`, `as.contour()` and `as.polygons()` produce `SpatVector` (*terra*) objects. The conversion to polygons can be useful to convert the result of a classification. In that case, set `dissolve =` to `TRUE` so that the polygons with the same attribute value will be dissolved into multi-polygon regions.
### Geometric operations
Raw raster data do not usually conform to any notion of administrative or geographical boundaries. Vector data (and extents) can be used to mask or crop data to a desired region of interest.
- Crop:
- Cropping consists in reducing the extent of a spatial object to a smaller extent. As a result, the output of `crop()` will automatically be rectangular and will not consider any features such as polygons to perform the subsetting. It is often useful to crop data as tightly as possible to the area under investigation to reduce the amount of data and have a more focused view when visualizing the data.
- `crop()` uses objects of class `SpatExtent` to define the new extent, or any object that can be coerced to an extent (see `?ext` for more info on this). This means that practically all spatial objects (raster or vector) can be used directly in crop. Considering two rasters `r1` and `r2` with `r2` smaller than `r1`, you can simply use `crop(r1, r2)` in order to crop `r1` to the extent of `r2`.
- You can easily define an extent interactively (by clicking) thanks to the `draw()` function.
- Mask:
- `mask()` can be used with almost all spatial objects to mask (= set to `NA`) values of a raster object. When used with a polygon object, `mask()` will keep values of the raster overlayed by polygons and mask the values outside of polygons.
- Note the very useful `inverse =` argument of `mask()`, which allows to mask the inverse of the area covered by the features. We will use this feature of mask later in the tutorial to exclude water areas of a raster, defined in an independent polygon object.
- Buffer:
- `buffer()` Calculate a buffer around all cells that are not NA in a SpatRaster, or around the geometries of a SpatVector (currently only implemented for points).
- Note that the distance unit of the buffer width parameter is meters if the CRS is (+proj=longlat), and in map units (typically also meters) if not.
- See `?buffer` for more info.
- Extract:
- The most common operation when combining vector and raster data is the extraction, using the function `extract()`. It simply consists in extracting the values of a raster object for locations specified by a vector object (e.g. points).
- When using `extract()` with polygons or lines, individual features of the vector layer may overlay or intersect several pixels. In that case a function (`fun =`) can be used to summarize the values into one. Note that although most often the functions `min`, `max`, `mean` or `median` are used for the spatial aggregation, also any custom-made function can be used. The result is a matrix of raster values at the given locations. See `?extract()` example section.
# Visualise spatial vector layer together with a Landsat 8 image from Wageningen
Step by step we will:
1. Download the Landsat 8 data of Wageningen;
2. Download and prepare administrative boundary data of the Netherlands;
3. Download water area data of Wageningen;
4. Mask the data to match the boundaries of the city;
5. Mask the data to exclude water bodies.
## Prepare the data
```{r}
# Install necessary packages and load them
if(!"terra" %in% installed.packages()){install.packages("terra")}
if(!"sf" %in% installed.packages()){install.packages("sf")}
library(terra)
library(sf)
# Create data directory if it does not yet exist
if (!dir.exists("data")) {
dir.create("data")
}
# Download to data directory
download.file(url = 'https://github.com/GeoScripting-WUR/VectorRaster/releases/download/tutorial-data/landsat8.zip', destfile = 'data/landsat8.zip')
# Unzip to data directory
unzip('data/landsat8.zip', exdir = "data")
# Identify the right file
landsatPath <- list.files(path = "data/", pattern = glob2rx('LC8*.tif'), full.names = TRUE)
wagLandsat <- rast(landsatPath)
```
We can start by visualizing the data. Since it is a multispectral image, we can use `plotRGB()` to do so.
```{r, fig.align='center'}
# plotRGB does not support negative values, so we remove them
wagLandsat[wagLandsat < 0] <- NA
# Band names can be changed here
names(wagLandsat) <- c("band1", "band2", "band3", "band4", "band5", "band6", "band7")
# Select which bands to assign to the red, green, and blue colour channels
plotRGB(wagLandsat, 5, 4, 3)
```
We have chosen to visualize the Landsat image as a false color composite, meaning that the chosen bands do not match the RGB channels. Indeed, we have plotted the near-infrared band as red, the red as green, and the green as blue.
```{r}
# Download and unzip the Dutch municipalities boundaries
download.file(url = 'https://github.com/GeoScripting-WUR/Scripting4GeoIntro/releases/download/data/data.zip', destfile = 'data/GADM.zip')
unzip('data/GADM.zip', exdir = "data")
# Load level 2 dataset
nlCitySf <- st_read("data/gadm41_NLD_2.json")
# Investigate the structure of the object
head(nlCitySf)
```
It seems that the municipality names are in the `NAME_2` column. So we can subset the `sf data.frame` to the city of Wageningen alone. To do so we can use simple data frame manipulation/subsetting syntax.
```{r}
# Remove rows with NA
nlCitySf <- nlCitySf[!is.na(nlCitySf$NAME_2),]
# Filter Wageningen
wagContour <- nlCitySf[nlCitySf$NAME_2 == 'Wageningen',]
```
We can use the resulting `wagContour` object, to mask the values out of Wageningen, but first, since the two objects are in different coordinate systems, we need to reproject the projection of one to the other.
```{block, type="alert alert-success"}
> **Question 1:** Would you rather reproject a raster or a vector layer? Give two reasons why you would choose to reproject a raster or vector.
```
```{r, message=FALSE}
# Get the target CRS from the wagLandsat raster object
targetCRS <- st_crs(wagLandsat)
# Use sf to transform wagContour to the same projection as wagLandsat
wagContourUTM <- st_transform(wagContour, targetCRS)
```
Now that the two objects are in the same CRS, we can do the masking and visualize the result. Let's first crop and then mask, to see the difference.
## Crop, mask and visualise
```{r, fig.align='center', fig.show='hold'}
wagLandsatCrop <- crop(wagLandsat, wagContourUTM)
wagLandsatSub <- mask(wagLandsat, wagContourUTM)
# Set graphical parameters (one row and two columns)
opar <- par(mfrow = c(1, 2))
plotRGB(wagLandsatCrop, 5, 4, 3)
plotRGB(wagLandsatSub, 5, 4, 3)
plot(st_geometry(wagContourUTM), add = TRUE, border = 'green', col = 'transparent', lwd = 3) # set fill colour to transparent
# Reset graphical parameters
par(opar)
```
In the figure above, the left panel displays the output of `crop`, while the second panel shows the result of masking the Landsat scene using the contour of Wageningen as input.
We also have a water mask of Wageningen in vector format. Let's download it and also reproject it to the CRS of the Landsat data.
```{block type="alert alert-info"}
Important functions are `st_read` and `st_write`. These are very powerful functions that enable reading and writing simple features or layers from a file or database.
```
```{r}
# Download and unzip files
download.file(url = 'https://github.com/GeoScripting-WUR/VectorRaster/releases/download/tutorial-data/wageningenWater.zip', destfile = 'data/wageningenWater.zip')
unzip('data/wageningenWater.zip', exdir = "data")
# Load the Water Shapefile directly as an sf object
water <- st_read('data/Water.shp')
names(water) # check attribute columns
# Transform the water object to match the projection of our other data
waterUTM <- st_transform(water, targetCRS)
```
Note the use of `inverse = TRUE` in the code below, to *mask* the pixels that intersect with the features of the vector object.
```{r, fig.align='center'}
# Mask the Landsat data
wagLandsatSubW <- mask(wagLandsatSub, mask = waterUTM, inverse = TRUE)
# Plot
plotRGB(wagLandsatSubW, 5, 4, 3)
plot(st_geometry(waterUTM), col = 'lightblue', add = TRUE, border = '#3A9AF0', lwd = 1) # use a site such as https://htmlcolorcodes.com/ to find the hexidecimal code for colours
```
Also, some of our friends want these exact data too (e.g. the `water` polygon object).
One friend of ours is a software engineer and he wants a GeoJSON. Another friend is a GIS-analyst in QGIS and as a backup he wants the file in Geographic Markup Language (GML). These fileformats (GeoJSON and GML, but also KML and Shapefile) are commonly used in spatial analysis. Let’s try to give them the files in those formats!
You can try for yourself and e.g. start by converting them to KML and opening them in [Google My Maps](https://mymaps.google.com/).
```{r, eval=FALSE}
# Create output directory if it does not yet exist
if (!dir.exists("output")) {
dir.create("output")
}
# Export the simple feature to a KML.
try(st_write(water, "output/water.kml", driver = "kml", delete_dsn = ifelse(file.exists("output/water.kml"), TRUE, FALSE)))
# Note that the code above checks for existence of the KML file and sets the delete_dsn option to true or false accordingly
```
You can now open the created `water.kml` file in [Google My Maps](https://mymaps.google.com/) or *Google Earth Pro*.
You can also try out other formats like e.g. `GeoJSON`. Another option for visualisation is an interactive map using [mapview](https://r-spatial.github.io/mapview/index.html), which in its turn is based on [leaflet](https://rstudio.github.io/leaflet/). The output of the simple example below can be viewed [here](mapview.html).
```{r, eval=FALSE, warning=FALSE}
# Install and load library
if(!"mapview" %in% installed.packages()){install.packages("mapview")}
if(!"webshot" %in% installed.packages()){install.packages("webshot")}
library(mapview)
# Plot the water polygons
map <- mapview(water)
map
# Save to file
mapshot(map, url = 'mapview.html')
```
# Extract raster values along a transect
Another use of the `extract()` function can be to visualize or analyse data along transects. In the following example, we will run a transect across Belgium and visualize the change in elevation.
Let's first download the elevation data of Belgium:
```{r}
# Download to data directory
download.file(url = 'https://github.com/GeoScripting-WUR/VectorRaster/releases/download/tutorial-data/belgium_elev.zip', destfile = 'data/belgium_elev.zip')
# Unzip to data directory
unzip('data/belgium_elev.zip', exdir = "data")
# Download data
bel <- rast('data/belgium_elev.tif')
# Display metadata
bel
```
We can start by visualizing the data.
```{r, fig.align='center'}
plot(bel)
```
Everything seems correct.
We want to look at a transect, which we can draw by hand by selecting two points by clicking. The `draw('line')` function will help us do that. Once you run the function, you will be able to click in the plotting window of R (The `bel` object should already be present in the plot panel before running `draw('line')`). Press *esc* once you have selected the two extremities of the line.
```{r, eval=FALSE}
line <- draw('line')
```
```{r, echo=FALSE, fig.align='center'}
line <- vect(readRDS('data/line.rds'))
plot(bel)
plot(line, add=TRUE)
```
Let's first write this line to a file, using a few different file formats. Note that `draw()` returns a SpatVector. That means we cannot (directly) use `st_write()` here like we did before. Using the overview table at the start of this tutorial, we can change that using `st_as_sf()`. After writing to disk, you might try opening these new files in R, or in different software, such as QGIS or ArcGIS.
```{r, eval=FALSE}
# To a GeoJSON
st_write(st_as_sf(line), 'output/line.geojson')
# To an ESRI Shapefile
st_write(st_as_sf(line), 'output/line.shp')
```
Now the elevation values can simply be extracted using `extract()`.
```{r}
alt <- extract(bel, line)
```
We can plot the result as follows.
```{r, fig.align='center'}
plot(alt$BEL_msk_alt, type = 'l', ylab = "Altitude (m)")
```
```{block, type="alert alert-success"}
> **Question 2:** Take a look at the x-axis of the graph above. What unit corresponds to the given values? Can you think of interpretation problems when following the method we apply here?
```
# Extra
Check out [this overview](https://mhallwor.github.io/_pages/MakingMaps) of examples for creating static and interactive maps in R, making use of packages like [mapview](https://r-spatial.github.io/mapview/) and [leaflet](https://rstudio.github.io/leaflet/).
<!-- In order to make an index for the x axis, we can calculate the distance between the two extremities of the transect, using `distHaversine()` from the *geosphere* package. -->
<!-- ```{r} -->
<!-- # check for the geosphere package and install if missing -->
<!-- if(!"geosphere" %in% rownames(installed.packages())){install.packages("geosphere")} -->
<!-- ``` -->
<!-- ```{r, eval=TRUE} -->
<!-- library(geosphere) -->
<!-- ## convert line to sf and assign the crs -->
<!-- line2 <- st_as_sf(line) -->
<!-- st_crs(line2) <- 4326 -->
<!-- ## Use sf to calculate the Great Circle distance between the two ends of the line -->
<!-- dist_sf <- st_length(line2) -->
<!-- ## Use geosphere to calculate the Great Circle distance between the two ends of the line -->
<!-- start <- sf::st_coordinates(line2)[1,1:2] ## get the coordinate pairs for the start and end of the line -->
<!-- end <- sf::st_coordinates(line2)[2,1:2] -->
<!-- dist_geosphere <- distHaversine(start,end) -->
<!-- ## print the results -->
<!-- writeLines(c(paste0(as.integer(dist_sf), "m - sf Great Circle [WGS coords]"), paste0(as.integer(dist_geosphere), "m - geosphere Great Circle [WGS coords]"))) -->
<!-- ``` -->
<!-- ```{block, type="alert alert-success"} -->
<!-- > **Question 2:** Why is there a difference between the line lengths calculated by *sf* and *geosphere*? -->
<!-- *Hint:* Look at the help page of the `distHaversine()` function. Also see the `distVincentyEllipsoid()` function. Does this one provide a more accurate distance? -->
<!-- ``` -->
<!-- Note that there is a small approximation on the position of the line and the distances between the samples as the real shortest path between two points is not a straight line in lat-long, while the distance we just calculated is the shortest path. For short distances, as in the example, this is acceptable. -->
<!-- Otherwise, we could also have projected the raster and the line to a projected coordinate system. -->
<!-- Calculating long distances accurately on an imperfect spheroid is very tricky. The *geosphere distHaversine()* assumes a perfectly spherical projection. Take care when using these functions that you understand how the distance is being calculated and what implications that may have for your work. -->
<!-- We will continue using the distance calculated by *sf*. -->
<!-- ```{r, eval = TRUE} -->
<!-- ## Format an array for use as x axis index with the same length as the alt[[1]] array -->
<!-- distanceVector <- seq(0, as.numeric(dist_sf), along.with = alt[[1]]) -->
<!-- ``` -->
<!-- Let's now visualize the final output. -->
<!-- ```{r, fig.align='center', fig.show='hold', eval = TRUE} -->
<!-- ## Visualize the output -->
<!-- plot(bel, main = 'Altitude (m)') -->
<!-- plot(line, add = TRUE) -->
<!-- plot(distanceVector/1000, alt[[1]], type = 'l', -->
<!-- main = 'Altitude transect Belgium', -->
<!-- xlab = 'Distance (Km)', -->
<!-- ylab = 'Altitude (m)', -->
<!-- las = 1) -->
<!-- ``` -->
<!-- # Extract raster values randomly using the `sampleRandom()` function -->
<!-- Below is an extra example and a simplification of the random sample demo shown [here](https://geoscripting-wur.github.io/Scripting4Geo/). We will use the `sampleRandom()` function to randomly sample altitude information from a DEM of Belgium: -->
<!-- ```{r} -->
<!-- # You can choose your own country here -->
<!-- bel <- getData('alt', country='BEL', mask=TRUE) ## SRTM 90m height data -->
<!-- belAdmin <- getData('GADM', country='BEL', level=2) ## administrative boundaries -->
<!-- ## Sample the raster randomly with 40 points -->
<!-- sRandomBel <- sampleRandom(bel, na.rm=TRUE, sp=TRUE, size = 40) -->
<!-- ## Create a data.frame containing relevant info -->
<!-- sRandomData <- data.frame(altitude = sRandomBel@data[[1]], -->
<!-- latitude = sRandomBel@coords[,'y'], -->
<!-- longitude = sRandomBel@coords[,'x']) -->
<!-- ``` -->
<!-- ```{r, fig.align='center', fig.show='hold'} -->
<!-- ## Plot -->
<!-- plot(bel) -->
<!-- plot(belAdmin, add=TRUE) -->
<!-- plot(sRandomBel, add = TRUE, col = "red") -->
<!-- ## Plot altitude versus latitude -->
<!-- plot(sRandomData$latitude,sRandomData$altitude, ylab = "Altitude (m)", xlab = "Latitude (degrees)") -->
<!-- ``` -->
<!-- ```{block, type="alert alert-success"} -->
<!-- > **Question 3**: If you would like to sample height data per province randomly, what would you need to do? -->
<!-- ``` -->
<!-- ### Build a calibration dataset in Google Maps -->
<!-- Below we'll see how we can deal with a calibration dataset in R, for calibrating a classifier. It involves working with `sf data.frame` classes, extracting reflectance data for the corresponding calibration samples, manipulating data frames and building a model that can be used to predict land cover. But first we need to build the calibration dataset (ground truth), and for that we will use Google Maps. -->
<!-- To begin with, make a KML of the study area extent which we will use as a guide. -->
<!-- ```{r, echo=FALSE} -->
<!-- # get bounding box of cropped raster and export it as KML -->
<!-- wagLandsatCropBbox <- sf::st_as_sfc(st_bbox(wagLandsatCrop)) -->
<!-- st_write(wagLandsatCropBbox, "wagLandsatCropBbox.kml", driver = "kml", delete_dsn=TRUE) -->
<!-- ``` -->
<!-- Open [Google My Maps](https://www.google.com/maps/about/mymaps/), click *get started*, login on your Google account, create a new map by clicking on the *+* sign, name the map training_landcover, under *Untitled layer* click *Import* and find the KML file that you just created. You will see a rectangle of the study area appear on the map with the name *wagLandsatCropBbox*, click *Add layer*, name the new, untitled layer *landcover_points*, change the basemap to a satellite map, click *Add marker*, draw points on top of a few landcover types, and name the point as the land cover type. Keep your points within the bounding box (the *wagLandsatCropBbox* layer), otherwise they are out of the extent of the Landsat tile. Keep it to a few classes, such as `agriculture`, `forest`, `water`, `urban`. When you are done (15 - 30 points, with at least 5 points for each class), export the file to KML. -->
<!-- **Note:** in the approach described above, we get to decide where the calibration samples are. Another approach would be to automatically generate randomly distributed samples. This can be done very easily in R using the `sf::st_sample()` function, which automatically returns a `sfc` (simple feature collection) object of any given size. Options for the `sf::st_sample()` include `type = regular` (for regular sampling) and `by_polygon = TRUE` for stratified sampling, to ensure that all classes (of a landcover classification) are equally represented in the sample using `multipolygon` delimiters. [See the documentation](https://www.rdocumentation.org/packages/sf/versions/0.9-6/topics/st_sample) for a more detailed explanation. -->
<!-- Load the newly created KML file using the `st_read()` function. -->
<!-- ```{r} -->
<!-- samples <- sf::st_read(dsn = './data/sampleLandcover.kml') -->
<!-- ``` -->
<!-- Okay, nice, the data has been read as a simple feature. Re-project the object to the CRS of the Landsat data. -->
<!-- ```{r} -->
<!-- ## Re-project sf data.frame -->
<!-- samplesUTM <- sf::st_transform(samples, targetCRS) -->
<!-- ``` -->
<!-- ### Calibrate the classifier -->
<!-- ```{r} -->
<!-- ## Extract the surface reflectance -->
<!-- calib <- raster::extract(wagLandsatCrop, samplesUTM, df=TRUE) ## df=TRUE i.e. return as a data.frame -->
<!-- ## Combine the newly created data.frame to the description column of the calibration dataset -->
<!-- calib2 <- cbind(samplesUTM$Name, calib) -->
<!-- ## Change the name of the first column, for convenience -->
<!-- colnames(calib2)[1] <- 'lc' -->
<!-- ## Make the lc column into a factor -->
<!-- calib2$lc <-as.factor(calib2$lc) -->
<!-- ## Inspect the structure of the data.frame -->
<!-- str(calib2) -->
<!-- ``` -->
<!-- **Note:** the use of `df = TRUE` in the `extract()` call is so that we get a data frame in return. Data frame is the most common class to work with all types of models, such as linear models (`lm()`) or random forest models as we use later. -->
<!-- Now we will calibrate a random forest model using the extracted data frame. -->
<!-- Do not focus too much on the algorithm used, the important part for this tutorial is the data extraction and the following data frame manipulation. More details will come about random forest classifiers tomorrow. -->
<!-- ```{r, message = FALSE} -->
<!-- if(!require(randomForest)) { -->
<!-- install.packages("randomForest") -->
<!-- } -->
<!-- library(randomForest) -->
<!-- ## Calibrate model -->
<!-- model <- randomForest(lc ~ band1 + band2 + band3 + band4 + band5 + band6 + band7, data = calib2) -->
<!-- ## Use the model to predict land cover -->
<!-- lcMap <- predict(wagLandsatCrop, model = model) -->
<!-- ``` -->
<!-- Let's visualize the output. The function `levelplot()` from the rasterVis package is a convenient function to plot categorical raster data. -->
<!-- ```{r, fig.align='center'} -->
<!-- library(rasterVis) -->
<!-- levelplot(lcMap, main = "Landcover Map of Wageningen", col.regions = c('lightgreen', 'darkgreen', 'orange', 'blue')) -->
<!-- ``` -->
<!-- OK, we've seen better land cover maps of Wageningen, but given the amount of training data we used (22 in my case), it is not too bad. A larger calibration dataset would certainly result in a better accuracy. -->