add all content
Francisco Rowe authored and Francisco Rowe committed Jul 4, 2022
1 parent 59a780c commit 7ef3ec7
Showing 22 changed files with 20,110 additions and 378 deletions.
14 changes: 12 additions & 2 deletions 01-gds.Rmd
@@ -21,6 +21,12 @@ options(htmltools.dir.version = FALSE)
}
```

```{marginfigure}
[**Back**](index.html) \
[**Next**](02-spatial-data.html)
```

# Rise of data

We are experiencing a data revolution. Technological advances in computational power, storage and network platforms have enabled the emergence of *Big Data*. These innovations have facilitated the production, processing, analysis and storage of large volumes of digital data. Information that previously could not be stored, or that was captured using analog devices, can now be recorded digitally. We can now digitally generate, store, manage and analyse data that were previously very challenging to access, such as books, newspapers, photographs and artwork. Mobile phones, social media platforms, satellites, emails, smart cards, CCTV and the internet have all contributed to the current data revolution.
@@ -40,9 +46,13 @@ Arribas-Bel, Dani, Mark Green, Francisco Rowe, and Alex Singleton. 2021. “Open

# Geographic Data Science

Geographic data science is a subfield of research in geography and sits at the intersection between geography and data science (Singleton and Arribas-Bel, 2019). Geographic data science entails a bidirectional relationship between geography and data science. Geographic data science argues for the benefits of *Geography* for *Data Science* to address spatially explicit problems, especially because much *Big Data* are spatial. At the same time, *Geography* has much to gain from *Data Science*, particularly in the methodological and technical aspects of working with *Big Data*.
```{marginfigure}
Singleton, A., and Arribas-Bel, D. 2019. “Geographic Data Science.” *Geographical Analysis*. 53 (1): 61–75. https://doi.org/10.1111/gean.12194.
```

Geographic data science is a subfield of research in geography and sits at the intersection between geography and data science (Singleton and Arribas-Bel, 2019). Geographic data science entails a bidirectional relationship between geography and data science. Geographic data science argues for the benefits of *Geography* for *Data Science* to address spatially explicit problems, especially because much *Big Data* are spatial. Explicit consideration of space is not simply a matter of adding an extra variable to a regression model; it requires understanding the conceptual and methodological complexities of geographical context, such as *spatial autocorrelation*, *spatial non-stationarity*, *spatial heterogeneity* and *local contextual contingencies* (see Rowe and Arribas-Bel 2022). At the same time, *Geography* has much to gain from *Data Science*, particularly in the methodological and technical aspects of working with *Big Data*.

```{marginfigure}
Singleton, Alex, and Daniel Arribas-Bel. 2019. “Geographic Data Science.” *Geographical Analysis*. 53 (1): 61–75. https://doi.org/10.1111/gean.12194.
Rowe, F. and Arribas-Bel, D. 2022. “Spatial Modelling for Data Scientists.” https://doi.org/10.17605/OSF.IO/8F6XR.
```

34 changes: 23 additions & 11 deletions 01-gds.html
@@ -12,7 +12,7 @@

<meta name="author" content="Francisco Rowe (@fcorowe)" />

<meta name="date" content="2022-06-26" />
<meta name="date" content="2022-07-02" />

<title>What is geographic data science?</title>

@@ -311,7 +311,7 @@

<h1 class="title toc-ignore">What is geographic data science?</h1>
<h4 class="author">Francisco Rowe (<a href="http://twitter.com/fcorowe"><code>@fcorowe</code></a>)</h4>
<h4 class="date">2022-06-26</h4>
<h4 class="date">2022-07-02</h4>



@@ -321,6 +321,8 @@ <h4 class="date">2022-06-26</h4>
border: 3px #000000;
}
</style>
<p><label for="tufte-mn-1" class="margin-toggle">⊕</label><input type="checkbox" id="tufte-mn-1" class="margin-toggle"><span class="marginnote"><span style="display: block;"><a href="index.html"><strong>Back</strong></a><br />
</span> <span style="display: block;"><a href="02-spatial-data.html"><strong>Next</strong></a></span></span></p>
<div id="rise-of-data" class="section level1">
<h1>Rise of data</h1>
<p>We are experiencing a data revolution. Technological advances in
@@ -334,7 +336,7 @@ <h1>Rise of data</h1>
newspapers, photographs and artwork. Mobile phones, social media
platforms, satellites, emails, smart cards, CCTV and the internet have
all led to the current data revolution we are living in.</p>
<p><label for="tufte-mn-1" class="margin-toggle">⊕</label><input type="checkbox" id="tufte-mn-1" class="margin-toggle"><span class="marginnote"><span style="display: block;">Rowe, F. 2021.
<p><label for="tufte-mn-2" class="margin-toggle">⊕</label><input type="checkbox" id="tufte-mn-2" class="margin-toggle"><span class="marginnote"><span style="display: block;">Rowe, F. 2021.
<a href="https://doi.org/10.31235/osf.io/phz3e">Big Data and Human
Geography</a>. In: Demeritt, D. and Lees L. (eds) <em>Concise
Encyclopedia of Human Geography</em>. Edward Elgar Encyclopedias in the
@@ -355,25 +357,35 @@ <h1>Rise of data</h1>
and design of specialised methods, software and expert knowledge, and
linkage to other data sources, in order to use most <em>Big Data</em>
sources (Arribas-Bel et al., 2021).</p>
<p><label for="tufte-mn-2" class="margin-toggle">⊕</label><input type="checkbox" id="tufte-mn-2" class="margin-toggle"><span class="marginnote"><span style="display: block;">Arribas-Bel, Dani, Mark
<p><label for="tufte-mn-3" class="margin-toggle">⊕</label><input type="checkbox" id="tufte-mn-3" class="margin-toggle"><span class="marginnote"><span style="display: block;">Arribas-Bel, Dani, Mark
Green, Francisco Rowe, and Alex Singleton. 2021. “Open Data Products-a
Framework for Creating Valuable Analysis Ready Data.” <em>Journal of
Geographical Systems</em>. 23 (4): 497–514.</span></span></p>
Geographical Systems</em>. 23 (4): 497–514.
<a href="https://doi.org/10.1007/s10109-021-00363-5" class="uri">https://doi.org/10.1007/s10109-021-00363-5</a></span></span></p>
</div>
<div id="geographic-data-science" class="section level1">
<h1>Geographic Data Science</h1>
<p><label for="tufte-mn-4" class="margin-toggle">⊕</label><input type="checkbox" id="tufte-mn-4" class="margin-toggle"><span class="marginnote"><span style="display: block;">Singleton, A., and
Arribas-Bel, D. 2019. “Geographic Data Science.” <em>Geographical
Analysis</em>. 53 (1): 61–75. <a href="https://doi.org/10.1111/gean.12194" class="uri">https://doi.org/10.1111/gean.12194</a>.</span></span></p>
<p>Geographic data science is a subfield of research in geography and
sits at the intersection between geography and data science (Singleton
and Arribas-Bel, 2019). Geographic data science entails a bidirectional
relationship between geography and data science. Geographic data science
argues for the benefits of <em>Geography</em> for <em>Data Science</em>
to address spatially explicit problems, especially because much <em>Big
Data</em> are spatial. At the same time, <em>Geography</em> has much to
gain from <em>Data Science</em>, particularly in the methodological and
technical aspects of working with <em>Big Data</em>.</p>
<p><label for="tufte-mn-3" class="margin-toggle">⊕</label><input type="checkbox" id="tufte-mn-3" class="margin-toggle"><span class="marginnote"><span style="display: block;">Singleton, Alex, and
Daniel Arribas-Bel. 2019. “Geographic Data Science.” <em>Geographical
Analysis</em>. 53 (1): 61–75. <a href="https://doi.org/10.1111/gean.12194" class="uri">https://doi.org/10.1111/gean.12194</a>.</span></span></p>
Data</em> are spatial. Explicit consideration of space is not simply a
matter of adding an extra variable to a regression model; it requires
understanding the conceptual and methodological complexities of
geographical context, such as <em>spatial autocorrelation</em>,
<em>spatial non-stationarity</em>, <em>spatial heterogeneity</em> and
<em>local contextual contingencies</em> (see Rowe and Arribas-Bel 2022).
At the same time, <em>Geography</em> has much to gain from <em>Data
Science</em>, particularly in the methodological and technical aspects
of working with <em>Big Data</em>.</p>
<p><label for="tufte-mn-5" class="margin-toggle">⊕</label><input type="checkbox" id="tufte-mn-5" class="margin-toggle"><span class="marginnote"><span style="display: block;">Rowe, F. and
Arribas-Bel, D. 2022. “Spatial Modelling for Data Scientists.”
<a href="https://doi.org/10.17605/OSF.IO/8F6XR" class="uri">https://doi.org/10.17605/OSF.IO/8F6XR</a>.</span></span></p>
</div>


138 changes: 15 additions & 123 deletions 02-spatial-data.Rmd
@@ -22,6 +22,12 @@ options(htmltools.dir.version = FALSE)
}
```

```{marginfigure}
[**Back**](01-gds.html) \
[**Next**](03-spatial_weights.html)
```

# Fundamental Geographic Data Structures

Three main structures are generally used to organise geographic data:
@@ -71,137 +77,23 @@ For more advanced map making, use dedicated visualisation packages such as `tmap
```{r}
plot(oa_shp$geometry)
```
We can colour the map thematically by any attribute in the spatial data frame by passing the name of that column to the `plot` function. Here we map the share of the unemployed population. We can adjust the key or legend position (`key.pos`), plot axes (`axes`), length of the scale bar (`key.length`), thickness/width of the scale bar (`key.width`), method or number to break the data attribute (`breaks`), line width (`lwd`) and colour of polygon borders (`border`).
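
A minimal, self-contained sketch of the same call on toy data may help when experimenting with these arguments. The 4 x 4 grid `toy_shp` and its `unemp` values are made up for illustration and stand in for `oa_shp`. Numeric `breaks` are used here rather than a classification style such as `"jenks"`, on the assumption that this avoids needing the `classInt` package.

```r
library(sf)

# Hypothetical stand-in for oa_shp: a 4 x 4 grid of square polygons
bbox <- st_bbox(c(xmin = 0, ymin = 0, xmax = 4, ymax = 4))
grid <- st_make_grid(st_as_sfc(bbox), n = c(4, 4))

# Made-up unemployment shares, one per grid cell
toy_shp <- st_sf(unemp = seq(0.02, 0.17, length.out = 16), geometry = grid)

# Thematic map of the unemp column with an adjusted legend and borders
plot(
  toy_shp["unemp"],
  key.pos = 4,                     # legend on the right-hand side
  axes = TRUE,                     # draw plot axes
  key.width = lcm(1.3),            # legend width in centimetres
  key.length = 1,                  # legend spans the full plot height
  breaks = seq(0, 0.2, by = 0.05), # explicit class boundaries
  lwd = 0.1,                       # thin polygon borders
  border = "grey"
)
```

Swapping the numeric `breaks` for a style name, as in the tutorial's own call, changes how the classes are derived but leaves the remaining arguments unchanged.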

```{r}
oa_cents = st_centroid(oa_shp)
plot(st_geometry(oa_cents))
```



```{r}
plot(oa_shp["unemp"])
```

# Spatial Data is *Special*

- Type of data
- Distinctive attributes
- Challenges

# Traditional data

Attributes:

- Collected for a purpose
- Granular information (deep)
- High quality

Challenges:

- Costly - resource intensive
- Coarse aggregations
- Temporally slow

# New forms of data

```{marginfigure}
Rowe, F. 2021. [Big Data and Human Geography](https://doi.org/10.31235/osf.io/phz3e). In: Demeritt, D. and Lees L. (eds) Concise Encyclopedia of Human Geography. Edward Elgar Encyclopedias in the Social Sciences series.
plot(oa_shp["unemp"], key.pos = 4, axes = TRUE, key.width = lcm(1.3), key.length = 1., breaks = "jenks", lwd = 0.1, border = 'grey')
```

## Spatial Data types

```{marginfigure}
Rowe, F. Arribas-Bel, D. 2021. [Spatial Modelling for Data Scientists](https://gdsl-ul.github.io/san/).
```

Different classifications of spatial data types exist. Knowing the structure of the data at hand is important for choosing appropriate analytical methods.

![Fig. 1. Data Types.](./figs/datatypes.png) Area / Lattice data source: [Önnerfors et al. (2019)](https://www.scribd.com/document/428488199/Eurostat-regional-yearbook-2019). Point data source: [Tao et al. (2018)](https://doi.org/10.1016/j.trc.2017.11.005). Flow data source: [Rowe and Patias (2020)](https://doi.org/10.1080/21681376.2020.1811139). Trajectory data source: [Kwan and Lee (2004)](http://www.meipokwan.org/Paper/Best_2003.pdf).

*Lattice/Areal Data*

- Correspond to records of attribute values (e.g. population counts) for a fixed geographical area.
- Regular (e.g. grids or pixels) or irregular shapes (e.g. states, counties or travel-to-work areas).

*Point Data*

- Records of the geographic location of a discrete event.\
- Number of occurrences of a geographical process at a given location.

*Flow Data*

- Records of measurements for a pair of geographic point locations or pair of areas.\
- Capture the linkage or spatial interaction between two locations.

*Trajectory Data*

- Records of moving objects at various points in time.
- Composed of a single string of data recording the geographic location of an object at various points in time.
Various types of geometries (i.e. lines, points and polygons) exist. We can convert polygon geometries into points (their centroids) by running:

# Hierarchical Structure of Data

Smaller geographical units are organised within larger geographical units.

```{r, echo=FALSE, warning=FALSE, message=FALSE, results="hide"}
library(sf)
oa_shp <- st_read("./data/Liverpool_OA.shp")
```

```{r, echo=FALSE}
head(oa_shp[,1:4])
```

# Key Challenges

Major challenges exist when working with spatial data.

## Modifiable Areal Unit Problem (MAUP)

The MAUP represents a challenge that has troubled geographers for decades.

Two aspects of the MAUP are normally recognised in empirical analysis:

- *Scale*. The idea that a geographical area can be divided into geographies with differing numbers of spatial units.

- *Zonation*. The idea that a geographical area can be divided into the same number of units in a variety of ways.

![Fig. 2. MAUP effect. (a) scale effect; and, (b) zonation effect.](./figs/maup.png) Source: [Loidl et al (2016)](https://doi.org/10.3390/safety2030017)
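
The scale and zonation effects can be illustrated with a small simulation. This is only a sketch on made-up data: the 8 x 8 grid `cells`, the helper `zone_means()` and the three zoning schemes are hypothetical and are not part of the tutorial's Liverpool data. The same cell values are aggregated in three ways, and the spread of the resulting zone means is compared.

```r
# Hypothetical 8 x 8 grid of small-area values (not the Liverpool data)
set.seed(1)
cells <- expand.grid(row = 1:8, col = 1:8)
cells$value <- rnorm(nrow(cells), mean = 10, sd = 3)

# Average the cell values within each zone of a given zoning scheme
zone_means <- function(zone_id) tapply(cells$value, zone_id, mean)

# Scale effect: the same area divided into 16 zones or into 4 zones
zones_16 <- interaction(ceiling(cells$row / 2), ceiling(cells$col / 2), drop = TRUE)
zones_4  <- interaction(ceiling(cells$row / 4), ceiling(cells$col / 4), drop = TRUE)

# Zonation effect: again 4 zones, but drawn as horizontal strips
strips_4 <- factor(ceiling(cells$row / 2))

# Compare how much the zone means vary under each scheme
c(
  var_16_zones = var(zone_means(zones_16)),
  var_4_blocks = var(zone_means(zones_4)),
  var_4_strips = var(zone_means(strips_4))
)
```

Because every zone contains the same number of cells, the overall mean is unchanged by aggregation; only the distribution of the zone-level values shifts, which is the sense in which results depend on the chosen units.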

```{marginfigure}
Loidl, M., Wallentin, G., Wendel, R. and Zagel, B., 2016. [Mapping bicycle crash risk patterns on the local scale](https://doi.org/10.3390/safety2030017). Safety, 2(3), p.17.
```{r, warning=FALSE}
oa_cents = st_centroid(oa_shp)
head(oa_cents[,1:4])
```
And visualise the data by running:

> MAUP can greatly impact our results and capacity to make inferences, leading to wrong conclusions
### Solutions?

No solution!

Potential mitigation strategies:

- Analysis at different geographical scales\
- Use the smallest geography available \> create random aggregations \> assess changes to the results
- Use functional areas

## Ecological Fallacy

An error in the interpretation of statistical data based on aggregate information e.g.

- [Robinson (1950)](https://doi.org/10.1093/ije/dyn357): country of birth and literacy

```{marginfigure}
WS Robinson, [Ecological Correlations and the Behavior of Individuals](https://doi.org/10.1093/ije/dyn357), International Journal of Epidemiology, Volume 38, Issue 2, April 2009, Pages 337–341.
```{r}
plot(st_geometry(oa_cents))
```

## Spatial Dependence

Refers to the spatial association of values of an indicator: nearby observations tend to be **more similar (or less similar)** than would be expected for randomly associated pairs of observations.

## Spatial Heterogeneity

Refers to the **uneven distribution** of a variable's values across space.

## Spatial nonstationarity

It refers to **variations in the relationship** between an outcome variable and a set of predictor variables **across space**.
211 changes: 48 additions & 163 deletions 02-spatial-data.html

Large diffs are not rendered by default.
