🔀 Merge branch:dev_lieberson into branch:main (#17)
* ✨ Initial commit for branch "dev_lieberson" (ndi v0.1.6.9002)

* Added `lieberson()` function to compute the aspatial racial/ethnic Isolation Index (_xPx\*_) based on Lieberson (1981; ISBN-13:978-1-032-53884-6) and [Bell (1954)](https://doi.org/10.2307/2574118)
* `bell()` function computes the Interaction Index (Bell) not the Isolation Index as previously documented. Updated documentation throughout
* Renamed *AI* as *A*, *DI* as *D*, *Gini* as *G*, and *II* as _xPy\*_ to align with the definitions from [Massey & Denton (1988)](https://doi.org/10.1093/sf/67.2.281). The output for `atkinson()` now produces `a` instead of `ai`. The output for `duncan()` now produces `d` instead of `di`. The output for `gini()` now produces `g` instead of `gini`. The output for `bell()` now produces `xPy_star` instead of `II`. The internal functions `ai_fun()`, `di_fun()` and `ii_fun()` were renamed `a_fun()`, `d_fun()` and `xpy_star_fun()`, respectively.
* 📝 Updated vignette for v0.1.6.9002
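
To make the distinction between the two indices concrete, here is a minimal sketch of the definitions given by Massey & Denton (1988): isolation is xPx\* = Σ (x_i / X)(x_i / t_i) and interaction is xPy\* = Σ (x_i / X)(y_i / t_i), where x_i and y_i are subgroup counts in small area i, t_i is the total population of area i, and X is the subgroup total across areas. The function names `isolation_sketch()` and `interaction_sketch()` and the toy counts are hypothetical; this is not the package's internal code.

```r
# Hypothetical toy counts for three small areas (not real census data).
x   <- c(10, 20, 70)    # subgroup of interest
tot <- c(100, 100, 100) # total population of each small area
y   <- tot - x          # comparison subgroup (here: everyone else)

# Isolation index xPx*: exposure of subgroup members to their own subgroup.
isolation_sketch <- function(x, tot) {
  sum((x / sum(x)) * (x / tot))
}

# Interaction index xPy*: exposure of subgroup members to the comparison subgroup.
interaction_sketch <- function(x, y, tot) {
  sum((x / sum(x)) * (y / tot))
}

xpx <- isolation_sketch(x, tot)
xpy <- interaction_sketch(x, y, tot)
print(c(xPx_star = xpx, xPy_star = xpy))
```

When the two subgroups together make up the whole population, as in this toy example, the two values sum to 1, which is one way to see why the Isolation and Interaction indices should not be documented interchangeably.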
idblr authored Aug 23, 2024
1 parent 50ba11c commit 2828b07
Showing 51 changed files with 1,373 additions and 399 deletions.
22 changes: 12 additions & 10 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: ndi
Title: Neighborhood Deprivation Indices
Version: 0.1.6.9001
Date: 2024-08-20
Version: 0.1.6.9002
Date: 2024-08-22
Authors@R:
c(person(given = "Ian D.",
family = "Buller",
@@ -31,10 +31,10 @@ Description: Computes various metrics of socio-economic deprivation and disparit
Concentration at the Extremes (ICE) based on Feldman et al. (2015)
<doi:10.1136/jech-2015-205728> and Krieger et al. (2016)
<doi:10.2105/AJPH.2015.302955>, (4) compute the aspatial racial/ethnic
Dissimilarity Index (DI) based on Duncan & Duncan (1955) <doi:10.2307/2088328>, (5)
compute the aspatial income or racial/ethnic Atkinson Index (AI) based on Atkinson
(1970) <doi:10.1016/0022-0531(70)90039-6>, (6) aspatial racial/ethnic Isolation
Index (II) based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and Bell
Dissimilarity Index (D) based on Duncan & Duncan (1955) <doi:10.2307/2088328>, (5)
compute the aspatial income or racial/ethnic Atkinson Index (A) based on Atkinson
(1970) <doi:10.1016/0022-0531(70)90039-6>, (6) aspatial racial/ethnic Interaction
Index (xPy*) based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and Bell
(1954) <doi:10.2307/2574118>, (7) aspatial racial/ethnic Correlation Ratio (V)
based on Bell (1954) <doi:10.2307/2574118> and White (1986) <doi:10.2307/3644339>,
(8) aspatial racial/ethnic Location Quotient (LQ) based on Merton (1939)
@@ -43,10 +43,12 @@ Description: Computes various metrics of socio-economic deprivation and disparit
Exposure and Isolation (LEx/Is) metric based on Bemanian & Beyer (2017)
<doi:10.1158/1055-9965.EPI-16-0926>, (10) aspatial racial/ethnic Delta (DEL)
based on Hoover (1941) <doi:10.1017/S0022050700052980> and Duncan et al. (1961;
LC:60007089), and (11) an index of spatial proximity based on White (1986)
<doi:10.2307/3644339> and Blau (1977; ISBN-13:978-0-029-03660-0). Also using data
from the ACS-5 (2005-2009 onward), the package can retrieve the aspatial Gini Index
(G) based Gini (1921) <doi:10.2307/2223319>.
LC:60007089), (11) an index of spatial proximity (SP) based on White (1986)
<doi:10.2307/3644339> and Blau (1977; ISBN-13:978-0-029-03660-0), and (12) the
aspatial racial/ethnic Isolation Index (xPx*) based on Lieberson (1981;
ISBN-13:978-1-032-53884-6) and Bell (1954) <doi:10.2307/2574118>. Also using data
from the ACS-5 (2005-2009 onward), the package can retrieve the aspatial Gini
Index (G) based on Gini (1921) <doi:10.2307/2223319>.
License: Apache License (>= 2.0)
Encoding: UTF-8
LazyData: true
1 change: 1 addition & 0 deletions NAMESPACE
@@ -9,6 +9,7 @@ export(duncan)
export(gini)
export(hoover)
export(krieger)
export(lieberson)
export(messer)
export(powell_wiley)
export(sudano)
15 changes: 9 additions & 6 deletions NEWS.md
@@ -1,15 +1,18 @@
# ndi (development version)

## ndi v0.1.6.9001
## ndi v0.1.6.9002

### New Features
* Added `hoover()` function to compute the aspatial racial/ethnic Delta (*DEL*) based on [Hoover (1941)](https://doi.org/10.1017/S0022050700052980) and Duncan et al. (1961; LC:60007089)
* Added `white_blau()` function to compute an index of spatial proximity (*SP*) based on [White (1986)](https://doi.org/10.2307/3644339) and Blau (1977; ISBN-13:978-0-029-03660-0)
* Added `geo_large = 'cbsa'` for Core Based Statistical Areas, `geo_large = 'csa'` for Combined Statistical Areas, and `geo_large = 'metro'` for Metropolitan Divisions as the larger geographical unit in `atkinson()`, `bell()`, `bemanian_beyer()`, `duncan()`, `hoover()`, `sudano()`, and `white()`, `white_blau()` functions.
* Added `lieberson()` function to compute the aspatial racial/ethnic Isolation Index (_xPx\*_) based on Lieberson (1981; ISBN-13:978-1-032-53884-6) and [Bell (1954)](https://doi.org/10.2307/2574118)
* Added `geo_large = 'cbsa'` for Core Based Statistical Areas, `geo_large = 'csa'` for Combined Statistical Areas, and `geo_large = 'metro'` for Metropolitan Divisions as the larger geographical unit in the `atkinson()`, `bell()`, `bemanian_beyer()`, `duncan()`, `hoover()`, `lieberson()`, `sudano()`, `white()`, and `white_blau()` functions.
* Thank you for the feature suggestions, [Symielle Gaston](https://orcid.org/0000-0001-9495-1592)

### Updates
* Fixed bug in `bell()`, `bemanian_beyer()`, `duncan()`, `sudano()`, and `white()` when a smaller geography contains n=0 total population, will assign a value of zero (0) in the internal calculation instead of NA
* `bell()` function computes the Interaction Index (Bell) not the Isolation Index as previously documented. Updated documentation throughout
* Fixed bug in `bell()`, `bemanian_beyer()`, `duncan()`, `sudano()`, and `white()` functions when a smaller geography contains n=0 total population. These functions now assign a value of zero (0) in the internal calculation instead of NA
* Renamed *AI* as *A*, *DI* as *D*, *Gini* as *G*, and *II* as _xPy\*_ to align with the definitions from [Massey & Denton (1988)](https://doi.org/10.1093/sf/67.2.281). The output for `atkinson()` now produces `a` instead of `ai`. The output for `duncan()` now produces `d` instead of `di`. The output for `gini()` now produces `g` instead of `gini`. The output for `bell()` now produces `xPy_star` instead of `II`. The internal functions `ai_fun()`, `di_fun()` and `ii_fun()` were renamed `a_fun()`, `d_fun()` and `xpy_star_fun()`, respectively.
* `tigris` and `units` are now Imports
* 'package.R' deprecated. Replaced with 'ndi-package.R'
* Re-formatted code and documentation throughout for consistent readability
@@ -34,8 +37,8 @@
## ndi v0.1.4

### New Features
* Added `atkinson()` function to compute the aspatial income or racial/ethnic Atkinson Index (*AI*) based on [Atkinson (1970)](https://doi.org/10.1016/0022-0531(70)90039-6) for specified counties/tracts 2009 onward
* Added `bell()` function to compute the aspatial racial/ethnic Isolation Index (*II*) based on Shevky & Williams (1949; ISBN-13:978-0837156378) and [Bell (1954)](https://doi.org/10.2307/2574118)
* Added `atkinson()` function to compute the aspatial income or racial/ethnic Atkinson Index (*A*) based on [Atkinson (1970)](https://doi.org/10.1016/0022-0531(70)90039-6) for specified counties/tracts 2009 onward
* Added `bell()` function to compute the aspatial racial/ethnic Interaction Index (_xPy\*_) based on Shevky & Williams (1949; ISBN-13:978-0837156378) and [Bell (1954)](https://doi.org/10.2307/2574118)
* Added `white()` function to compute the aspatial racial/ethnic Correlation Ratio (*V*) based on [Bell (1954)](https://doi.org/10.2307/2574118) and [White (1986)](https://doi.org/10.2307/3644339)
* Added `sudano()` function to compute the aspatial racial/ethnic Location Quotient (*LQ*) based on [Merton (1939)](https://doi.org/10.2307/2084686) and [Sudano et al. (2013)](https://doi.org/10.1016/j.healthplace.2012.09.015)
* Added `bemanian_beyer()` function to compute the aspatial racial/ethnic Local Exposure and Isolation (*LEx/Is*) metric based on [Bemanian & Beyer (2017)](https://doi.org/10.1158/1055-9965.EPI-16-0926)
@@ -56,7 +59,7 @@
## ndi v0.1.3

### New Features
* Added `duncan()` function to compute the Dissimilarity Index (*DI*) based on [Duncan & Duncan (1955)](https://doi.org/10.2307/2088328) for specified counties/tracts 2009 onward
* Added `duncan()` function to compute the Dissimilarity Index (*D*) based on [Duncan & Duncan (1955)](https://doi.org/10.2307/2088328) for specified counties/tracts 2009 onward
* Thank you for the feature suggestion, [Jessica Madrigal](https://orcid.org/0000-0001-5303-5109)
* Added 'utils.R' file with internal `di_fun()` function for `duncan()` function

52 changes: 26 additions & 26 deletions R/atkinson.R
@@ -11,7 +11,7 @@
#' @param quiet Logical. If TRUE, will display messages about potential missing census information. The default is FALSE.
#' @param ... Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics
#'
#' @details This function will compute the aspatial Atkinson Index (\emph{AI}) of income or selected racial/ethnic subgroups and U.S. geographies for a specified geographical extent (e.g., the entire U.S. or a single state) based on Atkinson (1970) \doi{10.1016/0022-0531(70)90039-6}. This function provides the computation of \emph{AI} for median household income and any of the U.S. Census Bureau race/ethnicity subgroups (including Hispanic and non-Hispanic individuals).
#' @details This function will compute the aspatial Atkinson Index (\emph{A}) of income or selected racial/ethnic subgroups and U.S. geographies for a specified geographical extent (e.g., the entire U.S. or a single state) based on Atkinson (1970) \doi{10.1016/0022-0531(70)90039-6}. This function provides the computation of \emph{A} for median household income and any of the U.S. Census Bureau race/ethnicity subgroups (including Hispanic and non-Hispanic individuals).
#'
#' The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the aspatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available (2010 onward for \code{geo_large = 'cbsa'} and 2011 onward for \code{geo_large = 'csa'} or \code{geo_large = 'metro'}) but may be available from other U.S. Census Bureau surveys. When \code{subgroup = 'MedHHInc'}, the metric will be computed for median household income ('B19013_001'). The twenty racial/ethnic subgroups (U.S. Census Bureau definitions) are:
#' \itemize{
@@ -39,18 +39,18 @@
#'
#' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output.
#'
#' \emph{AI} is a measure of the evenness of residential inequality (e.g., racial/ethnic segregation) when comparing smaller geographical areas to larger ones within which the smaller geographical areas are located. \emph{AI} can range in value from 0 to 1 with smaller values indicating lower levels of inequality (e.g., less segregation).
#' \emph{A} is a measure of the evenness of residential inequality (e.g., racial/ethnic segregation) when comparing smaller geographical areas to larger ones within which the smaller geographical areas are located. \emph{A} can range in value from 0 to 1 with smaller values indicating lower levels of inequality (e.g., less segregation).
#'
#' The \code{epsilon} argument determines how to weight the increments to inequality contributed by different proportions of the Lorenz curve. A user must explicitly decide how heavily to weight smaller geographical units at different points on the Lorenz curve (i.e., whether the index should take greater account of differences among areas of over- or under-representation). The \code{epsilon} argument must have values between 0 and 1.0. For \code{0 <= epsilon < 0.5} or less 'inequality-averse,' smaller geographical units with a subgroup proportion smaller than the subgroup proportion of the larger geographical unit contribute more to inequality ('over-representation'). For \code{0.5 < epsilon <= 1.0} or more 'inequality-averse,' smaller geographical units with a subgroup proportion larger than the subgroup proportion of the larger geographical unit contribute more to inequality ('under-representation'). If \code{epsilon = 0.5} (the default), units of over- and under-representation contribute equally to the index. See Section 2.3 of Saint-Jacques et al. (2020) \doi{10.48550/arXiv.2002.05819} for one method to select \code{epsilon}.
#'
#' Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, census tract \code{geo_large = 'tract'}, Core Based Statistical Area \code{geo_large = 'cbsa'}, Combined Statistical Area \code{geo_large = 'csa'}, and Metropolitan Division \code{geo_large = 'metro'} levels. Smaller geographies available include, county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the \emph{AI} value returned is NA. If the larger geographical unit is Combined Based Statistical Areas \code{geo_large = 'csa'} or Core Based Statistical Areas \code{geo_large = 'cbsa'}, only the smaller geographical units completely within a larger geographical unit are considered in the \emph{AI} computation (see internal \code{\link[sf]{st_within}} function for more information) and recommend specifying all states within which the interested larger geographical unit are located using the internal \code{state} argument to ensure all appropriate smaller geographical units are included in the \emph{AI} computation.
#' Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, census tract \code{geo_large = 'tract'}, Core Based Statistical Area \code{geo_large = 'cbsa'}, Combined Statistical Area \code{geo_large = 'csa'}, and Metropolitan Division \code{geo_large = 'metro'} levels. Smaller geographies available include county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is composed of only one smaller geographical area (e.g., a U.S. county contains only one census tract), then the \emph{A} value returned is NA. If the larger geographical unit is Combined Statistical Areas \code{geo_large = 'csa'} or Core Based Statistical Areas \code{geo_large = 'cbsa'}, only the smaller geographical units completely within a larger geographical unit are considered in the \emph{A} computation (see internal \code{\link[sf]{st_within}} function for more information). We recommend specifying all states within which the larger geographical unit of interest is located using the internal \code{state} argument to ensure all appropriate smaller geographical units are included in the \emph{A} computation.
#'
#' @return An object of class 'list'. This is a named list with the following components:
#'
#' \describe{
#' \item{\code{ai}}{An object of class 'tbl' for the GEOID, name, and \emph{AI} at specified larger census geographies.}
#' \item{\code{ai_data}}{An object of class 'tbl' for the raw census values at specified smaller census geographies.}
#' \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute \emph{AI}.}
#' \item{\code{a}}{An object of class 'tbl' for the GEOID, name, and \emph{A} at specified larger census geographies.}
#' \item{\code{a_data}}{An object of class 'tbl' for the raw census values at specified smaller census geographies.}
#' \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute \emph{A}.}
#' }
#'
#' @import dplyr
@@ -151,7 +151,7 @@ atkinson <- function(geo_large = 'county',
out_names <- names(selected_vars) # save for output
in_subgroup <- paste0(subgroup, 'E')

# Acquire AI variables and sf geometries
# Acquire A variables and sf geometries
out_dat <- suppressMessages(suppressWarnings(
tidycensus::get_acs(
geography = geo_small,
@@ -183,7 +183,7 @@ atkinson <- function(geo_large = 'county',
)
}

# Grouping IDs for AI computation
# Grouping IDs for A computation
if (geo_large == 'state') {
out_dat <- out_dat %>%
dplyr::mutate(
@@ -278,7 +278,7 @@ atkinson <- function(geo_large = 'county',
dplyr::mutate(subgroup = rowSums(.[, in_subgroup]))
}

# Compute AI
# Compute A
## From Atkinson (1970) https://doi.org/10.1016/0022-0531(70)90039-6
## A_{\epsilon}(x_{1},...,x_{n}) = \begin{Bmatrix}
## 1 - (\frac{1}{n}\sum_{i=1}^{n}x_{i}^{1-\epsilon})^{1/(1-\epsilon)}/(\frac{1}{n}\sum_{i=1}^{n}x_{i}) & \mathrm{if\:} \epsilon \neq 1 \\
@@ -291,19 +291,19 @@ atkinson <- function(geo_large = 'county',
## (\frac{1}{n}\sum_{i=1}^{n}x_{i}^{p})^{1/p} & \mathrm{if\:} p \neq 0 \\
## (\prod_{i=1}^{n}x_{i})^{1/n} & \mathrm{if\:} p = 0 \\
## \end{Bmatrix}
## then AI is
## then A is
## A_{\epsilon}(x_{1},...,x_{n}) = 1 - \frac{M_{1-\epsilon}(x_{1},...,x_{n})}{M_{1}(x_{1},...,x_{n})}

## Compute
out_tmp <- out_dat %>%
split(., f = list(out_dat$oid)) %>%
lapply(., FUN = ai_fun, epsilon = epsilon, omit_NAs = omit_NAs) %>%
lapply(., FUN = a_fun, epsilon = epsilon, omit_NAs = omit_NAs) %>%
utils::stack(.) %>%
dplyr::mutate(
AI = values,
A = values,
oid = ind
) %>%
dplyr::select(AI, oid)
dplyr::select(A, oid)

# Warning for missingness of census characteristics
missingYN <- as.data.frame(out_dat[, in_subgroup])
@@ -332,59 +332,59 @@ atkinson <- function(geo_large = 'county',
if (geo_large == 'state') {
out <- out_dat %>%
dplyr::left_join(out_tmp, by = dplyr::join_by(oid)) %>%
dplyr::select(oid, state, AI) %>%
dplyr::select(oid, state, A) %>%
unique(.) %>%
dplyr::mutate(GEOID = oid) %>%
dplyr::select(GEOID, state, AI) %>%
dplyr::select(GEOID, state, A) %>%
.[.$GEOID != 'NANA',]
}
if (geo_large == 'county') {
out <- out_dat %>%
dplyr::left_join(out_tmp, by = dplyr::join_by(oid)) %>%
dplyr::select(oid, state, county, AI) %>%
dplyr::select(oid, state, county, A) %>%
unique(.) %>%
dplyr::mutate(GEOID = oid) %>%
dplyr::select(GEOID, state, county, AI) %>%
dplyr::select(GEOID, state, county, A) %>%
.[.$GEOID != 'NANA',]
}
if (geo_large == 'tract') {
out <- out_dat %>%
dplyr::left_join(out_tmp, by = dplyr::join_by(oid)) %>%
dplyr::select(oid, state, county, tract, AI) %>%
dplyr::select(oid, state, county, tract, A) %>%
unique(.) %>%
dplyr::mutate(GEOID = oid) %>%
dplyr::select(GEOID, state, county, tract, AI) %>%
dplyr::select(GEOID, state, county, tract, A) %>%
.[.$GEOID != 'NANA',]
}
if (geo_large == 'cbsa') {
out <- out_dat %>%
dplyr::left_join(out_tmp, by = dplyr::join_by(oid)) %>%
dplyr::select(oid, cbsa, AI) %>%
dplyr::select(oid, cbsa, A) %>%
unique(.) %>%
dplyr::mutate(GEOID = oid) %>%
dplyr::select(GEOID, cbsa, AI) %>%
dplyr::select(GEOID, cbsa, A) %>%
.[.$GEOID != 'NANA', ] %>%
dplyr::distinct(GEOID, .keep_all = TRUE) %>%
dplyr::filter(stats::complete.cases(.))
}
if (geo_large == 'csa') {
out <- out_dat %>%
dplyr::left_join(out_tmp, by = dplyr::join_by(oid)) %>%
dplyr::select(oid, csa, AI) %>%
dplyr::select(oid, csa, A) %>%
unique(.) %>%
dplyr::mutate(GEOID = oid) %>%
dplyr::select(GEOID, csa, AI) %>%
dplyr::select(GEOID, csa, A) %>%
.[.$GEOID != 'NANA', ] %>%
dplyr::distinct(GEOID, .keep_all = TRUE) %>%
dplyr::filter(stats::complete.cases(.))
}
if (geo_large == 'metro') {
out <- out_dat %>%
dplyr::left_join(out_tmp, by = dplyr::join_by(oid)) %>%
dplyr::select(oid, metro, AI) %>%
dplyr::select(oid, metro, A) %>%
unique(.) %>%
dplyr::mutate(GEOID = oid) %>%
dplyr::select(GEOID, metro, AI) %>%
dplyr::select(GEOID, metro, A) %>%
.[.$GEOID != 'NANA', ] %>%
dplyr::distinct(GEOID, .keep_all = TRUE) %>%
dplyr::filter(stats::complete.cases(.))
@@ -398,7 +398,7 @@ atkinson <- function(geo_large = 'county',
dplyr::arrange(GEOID) %>%
dplyr::as_tibble()

out <- list(ai = out, ai_data = out_dat, missing = missingYN)
out <- list(a = out, a_data = out_dat, missing = missingYN)

return(out)
}
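
The formula quoted in the comments of `atkinson()` can be written out directly as a ratio of generalized means, A = 1 - M_{1-epsilon}(x) / M_1(x). The following is a minimal sketch of that form only, assuming strictly positive inputs. The names `gen_mean()` and `atkinson_sketch()` are hypothetical, and this is not the package's internal `a_fun()`, which may differ in how it handles subgroup proportions and missing values.

```r
# Generalized (power) mean M_p; p = 0 is the geometric mean.
gen_mean <- function(x, p) {
  if (p == 0) {
    exp(mean(log(x)))
  } else {
    mean(x^p)^(1 / p)
  }
}

# Atkinson index in generalized-mean form: A = 1 - M_{1 - epsilon}(x) / M_1(x).
# Assumes all values of x are strictly positive (an assumption of this sketch).
atkinson_sketch <- function(x, epsilon = 0.5) {
  stopifnot(all(x > 0), epsilon >= 0)
  1 - gen_mean(x, 1 - epsilon) / gen_mean(x, 1)
}

a_equal   <- atkinson_sketch(c(10, 10, 10))         # identical values: no inequality
a_unequal <- atkinson_sketch(c(1, 4), epsilon = 0.5) # about 0.1
a_eps_one <- atkinson_sketch(c(1, 4), epsilon = 1)   # geometric-mean case, about 0.2
print(c(a_equal, a_unequal, a_eps_one))
```

Identical values give an index of (numerically) zero, and the index grows as the values spread apart or as `epsilon` increases, which matches the interpretation in the function documentation that smaller values indicate lower inequality.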