diff --git a/DESCRIPTION b/DESCRIPTION index 696fcf2..d96b7de 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,6 +1,6 @@ Package: ndi Title: Neighborhood Deprivation Indices -Version: 0.1.6.9005 +Version: 0.1.6.9006 Date: 2024-08-24 Authors@R: c(person(given = "Ian D.", @@ -45,11 +45,13 @@ Description: Computes various metrics of socio-economic deprivation and disparit based on Hoover (1941) <doi:10.1017/S0022050700052980> and Duncan et al. (1961; LC:60007089), (11) an index of spatial proximity (SP) based on White (1986) <doi:10.2307/3644339> and Blau (1977; ISBN-13:978-0-029-03660-0), (12) the - aspatial racial or ethnic Isolatoin Index (xPx*) based on Lieberson (1981; + aspatial racial or ethnic Isolation Index (xPx*) based on Lieberson (1981; ISBN-13:978-1-032-53884-6) and Bell (1954) <doi:10.2307/2574118>, (13) the aspatial racial or ethnic Gini Index (G) based Gini (1921) <doi:10.2307/2223319>, - and (14) the aspatial racial or ethnic Dissimilarity Index (D) based on James & - Taeuber (1985) <doi:10.2307/270845>. Also using data from the ACS-5 (2005-2009 + (14) the aspatial racial or ethnic Dissimilarity Index (D) based on James & + Taeuber (1985) <doi:10.2307/270845>, and (15) the aspatial racial or ethnic + Entropy (H) based on Theil (1972; ISBN-13:978-0-444-10378-9) and Theil & Finizza + (1971) <doi:10.1080/0022250X.1971.9989795>. Also using data from the ACS-5 (2005-2009 onward), the package can retrieve the aspatial income Gini Index (G) based on Gini (1921) <doi:10.2307/2223319>. 
License: Apache License (>= 2.0) diff --git a/NAMESPACE b/NAMESPACE index 379bcc2..96424ae 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -14,6 +14,7 @@ export(lieberson) export(messer) export(powell_wiley) export(sudano) +export(theil) export(white) export(white_blau) import(dplyr) diff --git a/NEWS.md b/NEWS.md index c647ef0..9e148af 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,12 +1,13 @@ # ndi (development version) -## ndi v0.1.6.9005 +## ndi v0.1.6.9006 ### New Features * Added `hoover()` function to compute the aspatial racial or ethnic Delta (*DEL*) based on [Hoover (1941)](https://doi.org/10.1017/S0022050700052980) and Duncan et al. (1961; LC:60007089) * Added `white_blau()` function to compute an index of spatial proximity (*SP*) based on [White (1986)](https://doi.org/10.2307/3644339) and Blau (1977; ISBN-13:978-0-029-03660-0) * Added `lieberson()` function to compute the aspatial racial or ethnic Isolation Index (_xPx\*_) based on Lieberson (1981; ISBN-13:978-1-032-53884-6) and and [Bell (1954)](https://doi.org/10.2307/2574118) * Added `james_taeuber()` function to compute the aspatial racial or ethnic Dissimilarity Index (*D*) based on [James & Taeuber (1985)](https://doi.org/10.2307/270845) +* Added `theil()` function to compute the aspatial racial or ethnic Entropy (*H*) based on Theil (1972; ISBN-13:978-0-444-10378-9) and [Theil & Finizza (1971)](https://doi.org/10.1080/0022250X.1971.9989795) * Added `geo_large = 'cbsa'` for Core Based Statistical Areas, `geo_large = 'csa'` for Combined Statistical Areas, and `geo_large = 'metro'` for Metropolitan Divisions as the larger geographical unit in `atkinson()`, `bell()`, `bemanian_beyer()`, `duncan()`, `hoover()`, `lieberson()`, `sudano()`, and `white()`, `white_blau()` functions. * Thank you for the feature suggestions, [Symielle Gaston](https://orcid.org/0000-0001-9495-1592) * Added `holder` argument to `atkinson()` function to toggle the computation with or without the Hölder mean. 
The function can now compute *A* without the Hölder mean. The default is `holder = FALSE`. @@ -15,13 +16,13 @@ ### Updates * `bell()` function computes the Interaction Index (Bell) not the Isolation Index as previously documented. Updated documentation throughout * Fixed bug in `bell()`, `bemanian_beyer()`, `duncan()`, `sudano()`, and `white()` functions when a smaller geography contains n=0 total population, will assign a value of zero (0) in the internal calculation instead of NA -* Renamed *AI* as *A*, *DI* as *D*, *Gini* as *G*, and *II* as _xPy\*_ to align with the definitions from [Massey & Denton (1988)](https://doi.org/10.1093/sf/67.2.281). The output for `atkinson()` now produces `a` instead of `ai`. The output for `duncan()` now produces `d` instead of `ai`. The output for `gini()` now produces `g` instead of `gini`. The output for `bell()` now produces `xPy_star` instead of `II`. The internal functions `ai_fun()`, `di_fun()` and `ii_fun()` were renamed `a_fun()`, `d_fun()` and `xpy_star_fun()`, respectively. +* Renamed *AI* as *A*, *DI* as *D*, *Gini* as *G*, and *II* as _xPy\*_ to align with the definitions from [Massey & Denton (1988)](https://doi.org/10.1093/sf/67.2.281). The output for `atkinson()` now produces `a` instead of `ai`. The output for `duncan()` now produces `d` instead of `di`. The output for `gini()` now produces `g` instead of `gini`. The output for `bell()` now produces `xPy_star` instead of `II`. The internal functions `ai_fun()`, `di_fun()` and `ii_fun()` were renamed `a_fun()`, `ddd_fun()` and `xpy_star_fun()`, respectively. * `tigris` and `units` are now Imports * 'package.R' deprecated. 
Replaced with 'ndi-package.R' * Re-formatted code and documentation throughout for consistent readability * Renamed 'race/ethnicity' or 'racial/ethnic' to 'race or ethnicity' or 'racial or ethnic' throughout documentation to use more modern, inclusive, and appropriate language * Updated documentation about value range of *V* (White) from `{0 to 1}` to `{-Inf to Inf}` -* Added examples for `gini()`, `james_taeuber()`, `lieberson()`, `hoover()` and `white_blau()` functions in vignette and README +* Added examples for `gini()`, `james_taeuber()`, `lieberson()`, `hoover()`, `theil()`, and `white_blau()` functions in vignette and README * Added example for `holder` argument in `atkinson()` function in README * Reformatted functions for consistent internal structure * Updated examples in vignette to showcase a larger variety of U.S. states diff --git a/R/atkinson.R b/R/atkinson.R index a72a1c4..e0aaa6c 100644 --- a/R/atkinson.R +++ b/R/atkinson.R @@ -40,9 +40,9 @@ #' #' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. #' -#' \emph{A} is a measure of the evenness of residential inequality (e.g., racial or ethnic segregation) when comparing smaller geographical areas to larger ones within which the smaller geographical areas are located. \emph{A} can range in value from 0 to 1 with smaller values indicating lower levels of inequality (e.g., less segregation). +#' \emph{A} is a measure of the evenness of residential inequality (e.g., racial or ethnic segregation) when comparing smaller geographical units to larger ones within which the smaller geographical units are located. \emph{A} can range in value from 0 to 1 with smaller values indicating lower levels of inequality (e.g., less segregation). #' -#' The \code{epsilon} argument that determines how to weight the increments to inequality contributed by different proportions of the Lorenz curve. 
A user must explicitly decide how heavily to weight smaller geographical units at different points on the Lorenz curve (i.e., whether the index should take greater account of differences among areas of over- or under-representation). The \code{epsilon} argument must have values between 0 and 1.0. For \code{0 <= epsilon < 0.5} or less 'inequality-averse,' smaller geographical units with a subgroup proportion smaller than the subgroup proportion of the larger geographical unit contribute more to inequality ('over-representation'). For \code{0.5 < epsilon <= 1.0} or more 'inequality-averse,' smaller geographical units with a subgroup proportion larger than the subgroup proportion of the larger geographical unit contribute more to inequality ('under-representation'). If \code{epsilon = 0.5} (the default), units of over- and under-representation contribute equally to the index. See Section 2.3 of Saint-Jacques et al. (2020) \doi{10.48550/arXiv.2002.05819} for one method to select \code{epsilon}. +#' The \code{epsilon} argument determines how to weight the increments to inequality contributed by different proportions of the Lorenz curve. A user must explicitly decide how heavily to weight smaller geographical units at different points on the Lorenz curve (i.e., whether the index should take greater account of differences among units of over- or under-representation). The \code{epsilon} argument must have values between 0 and 1.0. For \code{0 <= epsilon < 0.5} or less 'inequality-averse,' smaller geographical units with a subgroup proportion smaller than the subgroup proportion of the larger geographical unit contribute more to inequality ('over-representation'). For \code{0.5 < epsilon <= 1.0} or more 'inequality-averse,' smaller geographical units with a subgroup proportion larger than the subgroup proportion of the larger geographical unit contribute more to inequality ('under-representation'). 
If \code{epsilon = 0.5} (the default), units of over- and under-representation contribute equally to the index. See Section 2.3 of Saint-Jacques et al. (2020) \doi{10.48550/arXiv.2002.05819} for one method to select \code{epsilon}. #' #' Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, census tract \code{geo_large = 'tract'}, Core Based Statistical Area \code{geo_large = 'cbsa'}, Combined Statistical Area \code{geo_large = 'csa'}, and Metropolitan Division \code{geo_large = 'metro'} levels. Smaller geographies available include, county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the \emph{A} value returned is NA. If the larger geographical unit is Combined Based Statistical Areas \code{geo_large = 'csa'} or Core Based Statistical Areas \code{geo_large = 'cbsa'}, only the smaller geographical units completely within a larger geographical unit are considered in the \emph{A} computation (see internal \code{\link[sf]{st_within}} function for more information) and recommend specifying all states within which the interested larger geographical unit are located using the internal \code{state} argument to ensure all appropriate smaller geographical units are included in the \emph{A} computation. #' diff --git a/R/bemanian_beyer.R b/R/bemanian_beyer.R index 1fdf986..481e9de 100644 --- a/R/bemanian_beyer.R +++ b/R/bemanian_beyer.R @@ -39,9 +39,9 @@ #' #' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. 
#' -#' \emph{LEx/Is} is a measure of the probability that two individuals living within a specific smaller geography (e.g., census tract) of either different (i.e., exposure) or the same (i.e., isolation) racial or ethnic subgroup(s) will interact, assuming that individuals within a smaller geography are randomly mixed. \emph{LEx/Is} is standardized with a logit transformation and centered against an expected case that all races or ethnicities are evenly distributed across a larger geography. (Note: will adjust data by 0.025 if probabilities are zero, one, or undefined. The output will include a warning if adjusted. See \code{\link[car]{logit}} for additional details.) +#' \emph{LEx/Is} is a measure of the probability that two individuals living within a specific smaller geographical unit (e.g., census tract) of either different (i.e., exposure) or the same (i.e., isolation) racial or ethnic subgroup(s) will interact, assuming that individuals within a smaller geographical unit are randomly mixed. \emph{LEx/Is} is standardized with a logit transformation and centered against an expected case that all races or ethnicities are evenly distributed across a larger geographical unit. (Note: will adjust data by 0.025 if probabilities are zero, one, or undefined. The output will include a warning if adjusted. See \code{\link[car]{logit}} for additional details.) #' -#' \emph{LEx/Is} can range from negative infinity to infinity. If \emph{LEx/Is} is zero then the estimated probability of the interaction between two people of the given subgroup(s) within a smaller geography is equal to the expected probability if the subgroup(s) were perfectly mixed in the larger geography. If \emph{LEx/Is} is greater than zero then the interaction is more likely to occur within the smaller geography than in the larger geography, and if \emph{LEx/Is} is less than zero then the interaction is less likely to occur within the smaller geography than in the larger geography. 
Note: the exponentiation of each \emph{LEx/Is} metric results in the odds ratio of the specific exposure or isolation of interest in a smaller geography relative to the larger geography. +#' \emph{LEx/Is} can range from negative infinity to infinity. If \emph{LEx/Is} is zero then the estimated probability of the interaction between two people of the given subgroup(s) within a smaller geographical unit is equal to the expected probability if the subgroup(s) were perfectly mixed in the larger geographical unit. If \emph{LEx/Is} is greater than zero then the interaction is more likely to occur within the smaller geographical unit than in the larger geographical unit, and if \emph{LEx/Is} is less than zero then the interaction is less likely to occur within the smaller geographical unit than in the larger geographical unit. Note: the exponentiation of each \emph{LEx/Is} metric results in the odds ratio of the specific exposure or isolation of interest in a smaller geographical unit relative to the larger geographical unit. #' #' Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, census tract \code{geo_large = 'tract'}, Core Based Statistical Area \code{geo_large = 'cbsa'}, Combined Statistical Area \code{geo_large = 'csa'}, and Metropolitan Division \code{geo_large = 'metro'} levels. Smaller geographies available include, county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the \emph{LEx/Is} value returned is NA. 
If the larger geographical unit is Combined Based Statistical Areas \code{geo_large = 'csa'} or Core Based Statistical Areas \code{geo_large = 'cbsa'}, only the smaller geographical units completely within a larger geographical unit are considered in the \emph{LEx/Is} computation (see internal \code{\link[sf]{st_within}} function for more information) and recommend specifying all states within which the interested larger geographical unit are located using the internal \code{state} argument to ensure all appropriate smaller geographical units are included in the \emph{LEx/Is} computation. #' diff --git a/R/duncan.R b/R/duncan.R index b483440..113a253 100644 --- a/R/duncan.R +++ b/R/duncan.R @@ -39,14 +39,14 @@ #' #' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. #' -#' \emph{D} is a measure of the evenness of racial or ethnic residential segregation when comparing smaller geographical areas to larger ones within which the smaller geographical areas are located. \emph{D} can range in value from 0 to 1 and represents the proportion of racial or ethnic subgroup members that would have to change their area of residence to achieve an even distribution within the larger geographical area under conditions of maximum segregation. +#' \emph{D} is a measure of the evenness of racial or ethnic residential segregation when comparing smaller geographical units to larger ones within which the smaller geographical units are located. \emph{D} can range in value from 0 to 1 and represents the proportion of racial or ethnic subgroup members that would have to change their area of residence to achieve an even distribution within the larger geographical area under conditions of maximum segregation. 
#' #' Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, census tract \code{geo_large = 'tract'}, Core Based Statistical Area \code{geo_large = 'cbsa'}, Combined Statistical Area \code{geo_large = 'csa'}, and Metropolitan Division \code{geo_large = 'metro'} levels. Smaller geographies available include, county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the \emph{D} value returned is NA. If the larger geographical unit is Combined Based Statistical Areas \code{geo_large = 'csa'} or Core Based Statistical Areas \code{geo_large = 'cbsa'}, only the smaller geographical units completely within a larger geographical unit are considered in the \emph{D} computation (see internal \code{\link[sf]{st_within}} function for more information) and recommend specifying all states within which the interested larger geographical unit are located using the internal \code{state} argument to ensure all appropriate smaller geographical units are included in the \emph{D} computation. #' #' @return An object of class 'list'. 
This is a named list with the following components: #' #' \describe{ -#' \item{\code{di}}{An object of class 'tbl' for the GEOID, name, and \emph{D} at specified larger census geographies.} +#' \item{\code{d}}{An object of class 'tbl' for the GEOID, name, and \emph{D} at specified larger census geographies.} #' \item{\code{d_data}}{An object of class 'tbl' for the raw census values at specified smaller census geographies.} #' \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute \emph{D}.} #' } @@ -315,11 +315,11 @@ duncan <- function(geo_large = 'county', ## From Duncan & Duncan (1955) https://doi.org/10.2307/2088328 ## D_{jt} = 1/2 \sum_{i=1}^{k} | \frac{x_{ijt}}{X_{jt}}-\frac{y_{ijt}}{Y_{jt}}| ## Where for k smaller geographies: - ## D_{jt} denotes the D of larger geography j at time t - ## x_{ijt} denotes the racial or ethnic subgroup population of smaller geography i within larger geography j at time t - ## X_{jt} denotes the racial or ethnic subgroup population of larger geography j at time t - ## y_{ijt} denotes the racial or ethnic referent subgroup population of smaller geography i within larger geography j at time t - ## Y_{jt} denotes the racial or ethnic referent subgroup population of larger geography j at time t + ## D_{jt} denotes the D of larger geographical unit j at time t + ## x_{ijt} denotes the racial or ethnic subgroup population of smaller geographical unit i within larger geographical unit j at time t + ## X_{jt} denotes the racial or ethnic subgroup population of larger geographical unit j at time t + ## y_{ijt} denotes the racial or ethnic referent subgroup population of smaller geographical unit i within larger geographical unit j at time t + ## Y_{jt} denotes the racial or ethnic referent subgroup population of larger geographical unit j at time t ## Compute out_tmp <- out_dat %>% diff --git a/R/gini.R b/R/gini.R index 6bc0cf8..48120b1 100644 --- a/R/gini.R +++ 
b/R/gini.R @@ -278,8 +278,8 @@ gini <- function(geo_large = 'county', ## t_{j} is the total population of area j ## p_{i} is the proportion of the subgroup population of area i ## p_{j} is the proportion of the subgroup population of area j - ## T is the total population of all areas - ## P is the proportion of the subgroup population of all areas + ## T is the total population of all smaller geographical units + ## P is the proportion of the subgroup population of all smaller geographical units ## Compute out_tmp <- out_dat %>% diff --git a/R/globals.R b/R/globals.R index cbd4fb5..fae714c 100644 --- a/R/globals.R +++ b/R/globals.R @@ -260,6 +260,7 @@ globalVariables( 'G_incE', 'G_re', 'xPx_star', - 'xPy_star' + 'xPy_star', + 'H' ) ) diff --git a/R/james_taeuber.R b/R/james_taeuber.R index 049f381..1fed059 100644 --- a/R/james_taeuber.R +++ b/R/james_taeuber.R @@ -38,14 +38,14 @@ #' #' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. #' -#' \emph{D} is a measure of the evenness of racial or ethnic residential segregation when comparing smaller geographical areas to larger ones within which the smaller geographical areas are located. \emph{D} can range in value from 0 to 1 and represents the proportion of racial or ethnic subgroup members that would have to change their area of residence to achieve an even distribution within the larger geographical area under conditions of maximum segregation. +#' \emph{D} is a measure of the evenness of racial or ethnic residential segregation when comparing smaller geographical units to larger ones within which the smaller geographical units are located. \emph{D} can range in value from 0 to 1 and represents the proportion of racial or ethnic subgroup members that would have to change their area of residence to achieve an even distribution within the larger geographical area under conditions of maximum segregation. 
#' #' Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, census tract \code{geo_large = 'tract'}, Core Based Statistical Area \code{geo_large = 'cbsa'}, Combined Statistical Area \code{geo_large = 'csa'}, and Metropolitan Division \code{geo_large = 'metro'} levels. Smaller geographies available include, county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the \emph{D} value returned is NA. If the larger geographical unit is Combined Based Statistical Areas \code{geo_large = 'csa'} or Core Based Statistical Areas \code{geo_large = 'cbsa'}, only the smaller geographical units completely within a larger geographical unit are considered in the \emph{D} computation (see internal \code{\link[sf]{st_within}} function for more information) and recommend specifying all states within which the interested larger geographical unit are located using the internal \code{state} argument to ensure all appropriate smaller geographical units are included in the \emph{D} computation. #' #' @return An object of class 'list'. 
This is a named list with the following components: #' #' \describe{ -#' \item{\code{di}}{An object of class 'tbl' for the GEOID, name, and \emph{D} at specified larger census geographies.} +#' \item{\code{d}}{An object of class 'tbl' for the GEOID, name, and \emph{D} at specified larger census geographies.} #' \item{\code{d_data}}{An object of class 'tbl' for the raw census values at specified smaller census geographies.} #' \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute \emph{D}.} #' } @@ -280,8 +280,8 @@ james_taeuber <- function(geo_large = 'county', ## Where for i smaller geographies: ## t_{i} is the total population of area i ## p_{i} is the proportion of the subgroup population of area i - ## T is the total population of all areas - ## P is the proportion of the subgroup population of all areas + ## T is the total population of all smaller geographical units + ## P is the proportion of the subgroup population of all smaller geographical units ## Compute out_tmp <- out_dat %>% diff --git a/R/ndi-package.R b/R/ndi-package.R index 17bff61..2de4f79 100644 --- a/R/ndi-package.R +++ b/R/ndi-package.R @@ -36,6 +36,8 @@ #' #' \code{\link{sudano}} Computes the aspatial racial or ethnic Location Quotient (\emph{LQ}) based on Merton (1939) \doi{10.2307/2084686} and Sudano et al. (2013) \doi{10.1016/j.healthplace.2012.09.015}. #' +#' \code{\link{theil}} Computes the aspatial racial or ethnic Entropy (\emph{H}) based on Theil (1972; ISBN-13:978-0-444-10378-9) and Theil & Finizza (1971) \doi{10.1080/0022250X.1971.9989795}. +#' #' \code{\link{white}} Computes the aspatial racial or ethnic Correlation Ratio (\emph{V}) based on Bell (1954) \doi{10.2307/2574118} and White (1986) \doi{10.2307/3644339}. #' #' \code{\link{white_blau}} Computes an index of spatial proximity (\emph{SP}) based on White (1986) \doi{10.2307/3644339} and Blau (1977; ISBN-13:978-0-029-03660-0). 
diff --git a/R/sudano.R b/R/sudano.R index d73d66f..306fbb6 100644 --- a/R/sudano.R +++ b/R/sudano.R @@ -38,7 +38,7 @@ #' #' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. #' -#' \emph{LQ} is some measure of relative racial homogeneity of each smaller geography within a larger geography. \emph{LQ} can range in value from 0 to infinity because it is ratio of two proportions in which the numerator is the proportion of subgroup population in a smaller geography and the denominator is the proportion of subgroup population in its larger geography. For example, a smaller geography with an \emph{LQ} of 5 means that the proportion of the subgroup population living in the smaller geography is five times the proportion of the subgroup population in its larger geography. +#' \emph{LQ} is some measure of relative racial homogeneity of each smaller geographical unit within a larger geographical unit. \emph{LQ} can range in value from 0 to infinity because it is a ratio of two proportions in which the numerator is the proportion of subgroup population in a smaller geographical unit and the denominator is the proportion of subgroup population in its larger geographical unit. For example, a smaller geographical unit with an \emph{LQ} of 5 means that the proportion of the subgroup population living in the smaller geographical unit is five times the proportion of the subgroup population in its larger geographical unit. #' #' Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, census tract \code{geo_large = 'tract'}, Core Based Statistical Area \code{geo_large = 'cbsa'}, Combined Statistical Area \code{geo_large = 'csa'}, and Metropolitan Division \code{geo_large = 'metro'} levels. 
Smaller geographies available include, county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the \emph{LQ} value returned is NA. If the larger geographical unit is Combined Based Statistical Areas \code{geo_large = 'csa'} or Core Based Statistical Areas \code{geo_large = 'cbsa'}, only the smaller geographical units completely within a larger geographical unit are considered in the \emph{LQ} computation (see internal \code{\link[sf]{st_within}} function for more information) and recommend specifying all states within which the interested larger geographical unit are located using the internal \code{state} argument to ensure all appropriate smaller geographical units are included in the \emph{LQ} computation. #' @@ -277,7 +277,7 @@ sudano <- function(geo_large = 'county', ## From Sudano (2013) https://doi.org/10.1016/j.healthplace.2012.09.015 ## LQ_{im} = (x_{im}/X_{i})/(X_{m}/X) ## for: - ## i smaller geography and subgroup m + ## i smaller geographical unit and subgroup m ## Compute out_tmp <- out_dat %>% diff --git a/R/theil.R b/R/theil.R new file mode 100644 index 0000000..0a16e1e --- /dev/null +++ b/R/theil.R @@ -0,0 +1,402 @@ +#' Entropy based on Theil (1972) and Theil & Finizza (1971) +#' +#' Compute the aspatial Entropy (Theil) of selected racial or ethnic subgroup(s) and U.S. geographies +#' +#' @param geo_large Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = 'county'}. +#' @param geo_small Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_small = 'tract'}. +#' @param year Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available. 
+#' @param subgroup Character string specifying the racial or ethnic subgroup(s) as the comparison population. See Details for available choices. +#' @param omit_NAs Logical. If FALSE, will compute index for a larger geographical unit only if all of its smaller geographical units have values. The default is TRUE. +#' @param quiet Logical. If TRUE, will display messages about potential missing census information. The default is FALSE. +#' @param ... Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics +#' +#' @details This function will compute the aspatial Entropy (\emph{H}) of selected racial or ethnic subgroups and U.S. geographies for a specified geographical extent (e.g., the entire U.S. or a single state) based on Theil (1972; ISBN-13:978-0-444-10378-9) and Theil & Finizza (1971) \doi{10.1080/0022250X.1971.9989795}. This function provides the computation of \emph{H} for any of the U.S. Census Bureau race or ethnicity subgroups (including Hispanic and non-Hispanic individuals). +#' +#' The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the aspatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available (2010 onward for \code{geo_large = 'cbsa'} and 2011 onward for \code{geo_large = 'csa'} or \code{geo_large = 'metro'}) but may be available from other U.S. Census Bureau surveys. The twenty racial or ethnic subgroups (U.S. 
Census Bureau definitions) are: +#' \itemize{ +#' \item \strong{B03002_002}: not Hispanic or Latino \code{'NHoL'} +#' \item \strong{B03002_003}: not Hispanic or Latino, white alone \code{'NHoLW'} +#' \item \strong{B03002_004}: not Hispanic or Latino, Black or African American alone \code{'NHoLB'} +#' \item \strong{B03002_005}: not Hispanic or Latino, American Indian and Alaska Native alone \code{'NHoLAIAN'} +#' \item \strong{B03002_006}: not Hispanic or Latino, Asian alone \code{'NHoLA'} +#' \item \strong{B03002_007}: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'NHoLNHOPI'} +#' \item \strong{B03002_008}: not Hispanic or Latino, Some other race alone \code{'NHoLSOR'} +#' \item \strong{B03002_009}: not Hispanic or Latino, Two or more races \code{'NHoLTOMR'} +#' \item \strong{B03002_010}: not Hispanic or Latino, Two races including Some other race \code{'NHoLTRiSOR'} +#' \item \strong{B03002_011}: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'NHoLTReSOR'} +#' \item \strong{B03002_012}: Hispanic or Latino \code{'HoL'} +#' \item \strong{B03002_013}: Hispanic or Latino, white alone \code{'HoLW'} +#' \item \strong{B03002_014}: Hispanic or Latino, Black or African American alone \code{'HoLB'} +#' \item \strong{B03002_015}: Hispanic or Latino, American Indian and Alaska Native alone \code{'HoLAIAN'} +#' \item \strong{B03002_016}: Hispanic or Latino, Asian alone \code{'HoLA'} +#' \item \strong{B03002_017}: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'HoLNHOPI'} +#' \item \strong{B03002_018}: Hispanic or Latino, Some other race alone \code{'HoLSOR'} +#' \item \strong{B03002_019}: Hispanic or Latino, Two or more races \code{'HoLTOMR'} +#' \item \strong{B03002_020}: Hispanic or Latino, Two races including Some other race \code{'HoLTRiSOR'} +#' \item \strong{B03002_021}: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'HoLTReSOR'} 
+#' } +#' +#' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify the geographic extent of the data output. +#' +#' \emph{H} is a measure of the evenness of racial or ethnic residential segregation when comparing smaller geographical units to the larger ones within which they are located. \emph{H} can range in value from 0 to 1 and represents the (weighted) average deviation of each smaller geographical unit from the larger geographical unit's "entropy" or racial and ethnic diversity, which is greatest when each group is equally represented in the larger geographical unit. \emph{H} varies from 0, when all smaller geographical units have the same racial or ethnic composition as the larger geographical unit (i.e., maximum integration), to a high of 1, when all smaller geographical units contain one group only (i.e., maximum segregation). +#' +#' Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, census tract \code{geo_large = 'tract'}, Core Based Statistical Area \code{geo_large = 'cbsa'}, Combined Statistical Area \code{geo_large = 'csa'}, and Metropolitan Division \code{geo_large = 'metro'} levels. Smaller geographies available include county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical unit comprises only one smaller geographical unit (e.g., a U.S. county contains only one census tract), then the \emph{H} value returned is NA. 
If the larger geographical unit is Combined Statistical Areas \code{geo_large = 'csa'}, Core Based Statistical Areas \code{geo_large = 'cbsa'}, or Metropolitan Divisions \code{geo_large = 'metro'}, only the smaller geographical units completely within a larger geographical unit are considered in the \emph{H} computation (see internal \code{\link[sf]{st_within}} function for more information). We recommend specifying all states within which the larger geographical unit of interest is located using the internal \code{state} argument to ensure all appropriate smaller geographical units are included in the \emph{H} computation. +#' +#' Note: The computation differs from Massey & Denton (1988) \doi{10.1093/sf/67.2.281} by taking the absolute value of \code{(E-E_{i})} so the extent of the output is \code{{0, 1}} as designed by Theil (1972; ISBN-13:978-0-444-10378-9) instead of \code{{-Inf, Inf}} as described in Massey & Denton (1988) \doi{10.1093/sf/67.2.281}. +#' +#' @return An object of class 'list'. This is a named list with the following components: +#' +#' \describe{ +#' \item{\code{h}}{An object of class 'tbl' for the GEOID, name, and \emph{H} at specified larger census geographies.} +#' \item{\code{h_data}}{An object of class 'tbl' for the raw census values at specified smaller census geographies.} +#' \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute \emph{H}.} +#' } +#' +#' @import dplyr +#' @importFrom sf st_drop_geometry st_within +#' @importFrom stats complete.cases +#' @importFrom stringr str_trim +#' @importFrom tidycensus get_acs +#' @importFrom tidyr pivot_longer separate +#' @importFrom tigris combined_statistical_areas core_based_statistical_areas metro_divisions +#' @importFrom utils stack +#' @export +#' +#' @seealso \code{\link[tidycensus]{get_acs}} for additional arguments for geographic extent selection (i.e., \code{state} and \code{county}). 
+#' +#' @examples +#' \dontrun{ +#' # Wrapped in \dontrun{} because these examples require a Census API key. +#' +#' # Entropy (Theil) +#' ## of Black populations +#' ## of census tracts within counties in Georgia, U.S.A. (2020) +#' theil( +#' geo_large = 'county', +#' geo_small = 'tract', +#' state = 'GA', +#' year = 2020, +#' subgroup = c('NHoLB', 'HoLB') +#' ) +#' +#' } +#' +theil <- function(geo_large = 'county', + geo_small = 'tract', + year = 2020, + subgroup, + omit_NAs = TRUE, + quiet = FALSE, + ...) { + + # Check arguments + match.arg(geo_large, choices = c('state', 'county', 'tract', 'cbsa', 'csa', 'metro')) + match.arg(geo_small, choices = c('county', 'tract', 'block group')) + stopifnot(is.numeric(year), year >= 2009) # all variables available 2009 onward + match.arg( + subgroup, + several.ok = TRUE, + choices = c( + 'NHoL', + 'NHoLW', + 'NHoLB', + 'NHoLAIAN', + 'NHoLA', + 'NHoLNHOPI', + 'NHoLSOR', + 'NHoLTOMR', + 'NHoLTRiSOR', + 'NHoLTReSOR', + 'HoL', + 'HoLW', + 'HoLB', + 'HoLAIAN', + 'HoLA', + 'HoLNHOPI', + 'HoLSOR', + 'HoLTOMR', + 'HoLTRiSOR', + 'HoLTReSOR' + ) + ) + + # Select census variables + vars <- c( + TotalPop = 'B03002_001', + NHoL = 'B03002_002', + NHoLW = 'B03002_003', + NHoLB = 'B03002_004', + NHoLAIAN = 'B03002_005', + NHoLA = 'B03002_006', + NHoLNHOPI = 'B03002_007', + NHoLSOR = 'B03002_008', + NHoLTOMR = 'B03002_009', + NHoLTRiSOR = 'B03002_010', + NHoLTReSOR = 'B03002_011', + HoL = 'B03002_012', + HoLW = 'B03002_013', + HoLB = 'B03002_014', + HoLAIAN = 'B03002_015', + HoLA = 'B03002_016', + HoLNHOPI = 'B03002_017', + HoLSOR = 'B03002_018', + HoLTOMR = 'B03002_019', + HoLTRiSOR = 'B03002_020', + HoLTReSOR = 'B03002_021' + ) + + selected_vars <- vars[c('TotalPop', subgroup)] + out_names <- names(selected_vars) # save for output + in_subgroup <- paste0(subgroup, 'E') + + # Acquire H variables and sf geometries + out_dat <- suppressMessages(suppressWarnings( + tidycensus::get_acs( + geography = geo_small, + year = year, 
output = 'wide', + variables = selected_vars, + geometry = TRUE, + keep_geo_vars = TRUE, + ... + ) + )) + + # Format output + if (geo_small == 'county') { + out_dat <- out_dat %>% + tidyr::separate(NAME.y, into = c('county', 'state'), sep = ',') + } + if (geo_small == 'tract') { + out_dat <- out_dat %>% + tidyr::separate(NAME.y, into = c('tract', 'county', 'state'), sep = ',') %>% + dplyr::mutate(tract = gsub('[^0-9\\.]', '', tract)) + } + if (geo_small == 'block group') { + out_dat <- out_dat %>% + tidyr::separate(NAME.y, into = c('block.group', 'tract', 'county', 'state'), sep = ',') %>% + dplyr::mutate( + tract = gsub('[^0-9\\.]', '', tract), + block.group = gsub('[^0-9\\.]', '', block.group) + ) + } + + # Grouping IDs for H computation + if (geo_large == 'state') { + out_dat <- out_dat %>% + dplyr::mutate( + oid = STATEFP, + state = stringr::str_trim(state) + ) %>% + sf::st_drop_geometry() + } + if (geo_large == 'tract') { + out_dat <- out_dat %>% + dplyr::mutate( + oid = paste0(STATEFP, COUNTYFP, TRACTCE), + state = stringr::str_trim(state), + county = stringr::str_trim(county) + ) %>% + sf::st_drop_geometry() + } + if (geo_large == 'county') { + out_dat <- out_dat %>% + dplyr::mutate( + oid = paste0(STATEFP, COUNTYFP), + state = stringr::str_trim(state), + county = stringr::str_trim(county) + ) %>% + sf::st_drop_geometry() + } + if (geo_large == 'cbsa') { + stopifnot(is.numeric(year), year >= 2010) # CBSAs only available 2010 onward + lgeom <- suppressMessages(suppressWarnings(tigris::core_based_statistical_areas(year = year))) + wlgeom <- sf::st_within(out_dat, lgeom) + out_dat <- out_dat %>% + dplyr::mutate( + oid = lapply(wlgeom, function(x) { + tmp <- lgeom[x, 3] %>% sf::st_drop_geometry() + lapply(tmp, function(x) { if (length(x) == 0) NA else x }) + }) %>% + unlist(), + cbsa = lapply(wlgeom, function(x) { + tmp <- lgeom[x, 4] %>% sf::st_drop_geometry() + lapply(tmp, function(x) { if (length(x) == 0) NA else x }) + }) %>% + unlist() + ) %>% + 
sf::st_drop_geometry() + } + if (geo_large == 'csa') { + stopifnot(is.numeric(year), year >= 2011) # CSAs only available 2011 onward + lgeom <- suppressMessages(suppressWarnings(tigris::combined_statistical_areas(year = year))) + wlgeom <- sf::st_within(out_dat, lgeom) + out_dat <- out_dat %>% + dplyr::mutate( + oid = lapply(wlgeom, function(x) { + tmp <- lgeom[x, 2] %>% sf::st_drop_geometry() + lapply(tmp, function(x) { if (length(x) == 0) NA else x }) + }) %>% + unlist(), + csa = lapply(wlgeom, function(x) { + tmp <- lgeom[x, 3] %>% sf::st_drop_geometry() + lapply(tmp, function(x) { if (length(x) == 0) NA else x }) + }) %>% + unlist() + ) %>% + sf::st_drop_geometry() + } + if (geo_large == 'metro') { + stopifnot(is.numeric(year), year >= 2011) # Metro Divisions only available 2011 onward + lgeom <- suppressMessages(suppressWarnings(tigris::metro_divisions(year = year))) + wlgeom <- sf::st_within(out_dat, lgeom) + out_dat <- out_dat %>% + dplyr::mutate( + oid = lapply(wlgeom, function(x) { + tmp <- lgeom[x, 4] %>% sf::st_drop_geometry() + lapply(tmp, function(x) { if (length(x) == 0) NA else x }) + }) %>% + unlist(), + metro = lapply(wlgeom, function(x) { + tmp <- lgeom[x, 5] %>% sf::st_drop_geometry() + lapply(tmp, function(x) { if (length(x) == 0) NA else x }) + }) %>% + unlist() + ) %>% + sf::st_drop_geometry() + } + + # Count of racial or ethnic subgroup populations + ## Count of racial or ethnic comparison subgroup population + if (length(in_subgroup) == 1) { + out_dat <- out_dat %>% + dplyr::mutate(subgroup = .[, in_subgroup]) + } else { + out_dat <- out_dat %>% + dplyr::mutate(subgroup = rowSums(.[, in_subgroup])) + } + + # Compute H + ## From Theil & Finizza (1971) https://doi.org/10.1080/0022250X.1971.9989795 + ## Note: Differs from Massey & Denton (1988) https://doi.org/10.1093/sf/67.2.281 + ## by taking the absolute value of (E-E_{i}) so extent of the output is + ## {0, 1} as designed by Theil (1972) instead of {-Inf, Inf} as described in + ## Massey & Denton 
(1988) + ## H = \sum_{i=1}^{n}\left [ t_{i} \left | E-E_{i} \right | /ET \right ] + ## Where for i smaller geographies: + ## E=(P)ln[1/P]+(1-P)ln[1/(1-P)] + ## E_{i}=(p_{i})ln[1/p_{i}]+(1-p_{i})ln[1/(1-p_{i})] + ## and + ## t_{i} is the total population of area i + ## p_{i} is the proportion of the subgroup population of area i + ## T is the total population of all smaller geographical units + ## P is the proportion of the subgroup population of all smaller geographical units + + ## Compute + out_tmp <- out_dat %>% + split(., f = list(out_dat$oid)) %>% + lapply(., FUN = h_fun, omit_NAs = omit_NAs) %>% + utils::stack(.) %>% + dplyr::mutate( + H = values, + oid = ind + ) %>% + dplyr::select(H, oid) + + # Warning for missingness of census characteristics + missingYN <- out_dat[, c('TotalPopE', in_subgroup)] + names(missingYN) <- out_names + missingYN <- missingYN %>% + tidyr::pivot_longer( + cols = dplyr::everything(), + names_to = 'variable', + values_to = 'val' + ) %>% + dplyr::group_by(variable) %>% + dplyr::summarise( + total = dplyr::n(), + n_missing = sum(is.na(val)), + percent_missing = paste0(round(mean(is.na(val)) * 100, 2), ' %') + ) + + if (quiet == FALSE) { + # Warning for missing census data + if (sum(missingYN$n_missing) > 0) { + message('Warning: Missing census data') + } + } + + # Format output + if (geo_large == 'state') { + out <- out_dat %>% + dplyr::left_join(out_tmp, by = dplyr::join_by(oid)) %>% + dplyr::select(oid, state, H) %>% + unique(.) %>% + dplyr::mutate(GEOID = oid) %>% + dplyr::select(GEOID, state, H) %>% + .[.$GEOID != 'NANA',] + } + if (geo_large == 'county') { + out <- out_dat %>% + dplyr::left_join(out_tmp, by = dplyr::join_by(oid)) %>% + dplyr::select(oid, state, county, H) %>% + unique(.) 
%>% + dplyr::mutate(GEOID = oid) %>% + dplyr::select(GEOID, state, county, H) %>% + .[.$GEOID != 'NANA',] + } + if (geo_large == 'tract') { + out <- out_dat %>% + dplyr::left_join(out_tmp, by = dplyr::join_by(oid)) %>% + dplyr::select(oid, state, county, tract, H) %>% + unique(.) %>% + dplyr::mutate(GEOID = oid) %>% + dplyr::select(GEOID, state, county, tract, H) %>% + .[.$GEOID != 'NANA',] + } + if (geo_large == 'cbsa') { + out <- out_dat %>% + dplyr::left_join(out_tmp, by = dplyr::join_by(oid)) %>% + dplyr::select(oid, cbsa, H) %>% + unique(.) %>% + dplyr::mutate(GEOID = oid) %>% + dplyr::select(GEOID, cbsa, H) %>% + .[.$GEOID != 'NANA', ] %>% + dplyr::distinct(GEOID, .keep_all = TRUE) %>% + dplyr::filter(stats::complete.cases(.)) + } + if (geo_large == 'csa') { + out <- out_dat %>% + dplyr::left_join(out_tmp, by = dplyr::join_by(oid)) %>% + dplyr::select(oid, csa, H) %>% + unique(.) %>% + dplyr::mutate(GEOID = oid) %>% + dplyr::select(GEOID, csa, H) %>% + .[.$GEOID != 'NANA', ] %>% + dplyr::distinct(GEOID, .keep_all = TRUE) %>% + dplyr::filter(stats::complete.cases(.)) + } + if (geo_large == 'metro') { + out <- out_dat %>% + dplyr::left_join(out_tmp, by = dplyr::join_by(oid)) %>% + dplyr::select(oid, metro, H) %>% + unique(.) 
%>% + dplyr::mutate(GEOID = oid) %>% + dplyr::select(GEOID, metro, H) %>% + .[.$GEOID != 'NANA', ] %>% + dplyr::distinct(GEOID, .keep_all = TRUE) %>% + dplyr::filter(stats::complete.cases(.)) + } + + out <- out %>% + dplyr::arrange(GEOID) %>% + dplyr::as_tibble() + + out_dat <- out_dat %>% + dplyr::arrange(GEOID) %>% + dplyr::as_tibble() + + out <- list(h = out, h_data = out_dat, missing = missingYN) + + return(out) +} diff --git a/R/utils.R b/R/utils.R index 4eea14b..9fa5301 100644 --- a/R/utils.R +++ b/R/utils.R @@ -1,4 +1,5 @@ -# Internal function for the Dissimilarity Index (Duncan & Duncan 1955) +# Internal function for the Dissimilarity Index +## Duncan & Duncan (1955) https://doi.org/10.2307/2088328 ## Returns NA value if only one smaller geography in a larger geography ddd_fun <- function(x, omit_NAs) { xx <- x[ , c('subgroup', 'subgroup_ref')] @@ -15,7 +16,8 @@ ddd_fun <- function(x, omit_NAs) { } } -# Internal function for the Atkinson Index (Atkinson 1970) +# Internal function for the Atkinson Index +## Atkinson (1970) https://doi.org/10.1016/0022-0531(70)90039-6 ## Returns NA value if only one smaller geography in a larger geography ## If denoting the Hölder mean a_fun <- function(x, epsilon, omit_NAs, holder) { @@ -48,7 +50,8 @@ a_fun <- function(x, epsilon, omit_NAs, holder) { } } -# Internal function for the aspatial Interaction Index (Bell 1954) +# Internal function for the aspatial Interaction Index +## Bell (1954) https://doi.org/10.2307/2574118 ## Returns NA value if only one smaller geography in a larger geography xpy_star_fun <- function(x, omit_NAs) { xx <- x[ , c('TotalPopE', 'subgroup', 'subgroup_ixn')] @@ -65,7 +68,8 @@ xpy_star_fun <- function(x, omit_NAs) { } } -# Internal function for the aspatial Isolation Index (Lieberson 1981) +# Internal function for the aspatial Isolation Index +## Lieberson (1981) ISBN-13:978-1-032-53884-6 ## Returns NA value if only one smaller geography in a larger geography xpx_star_fun <- function(x, omit_NAs) 
{ xx <- x[ , c('TotalPopE', 'subgroup')] @@ -81,7 +85,8 @@ xpx_star_fun <- function(x, omit_NAs) { } } -# Internal function for the aspatial Correlation Ratio (White 1986) +# Internal function for the aspatial Correlation Ratio +## White (1986) https://doi.org/10.2307/3644339 ## Returns NA value if only one smaller geography in a larger geography v_fun <- function(x, omit_NAs) { xx <- x[ , c('TotalPopE', 'subgroup')] @@ -100,7 +105,8 @@ v_fun <- function(x, omit_NAs) { } } -# Internal function for the aspatial Location Quotient (Sudano et al. 2013) +# Internal function for the aspatial Location Quotient +## Sudano et al. (2013) https://doi.org/10.1016/j.healthplace.2012.09.015 ## Returns NA value if only one smaller geography in a larger geography lq_fun <- function(x, omit_NAs) { xx <- x[ , c('TotalPopE', 'subgroup', 'GEOID')] @@ -120,9 +126,8 @@ lq_fun <- function(x, omit_NAs) { } } - - -# Internal function for the aspatial Local Exposure & Isolation (Bemanian & Beyer 2017) metric +# Internal function for the aspatial Local Exposure & Isolation metric +## Bemanian & Beyer (2017) https://doi.org/10.1158/1055-9965.EPI-16-0926 ## Returns NA value if only one smaller geography in a larger geography lexis_fun <- function(x, omit_NAs) { xx <- x[ , c('TotalPopE', 'subgroup', 'subgroup_ixn', 'GEOID')] @@ -142,7 +147,8 @@ lexis_fun <- function(x, omit_NAs) { } } -# Internal function for the aspatial Delta (Hoover 1941) +# Internal function for the aspatial Delta +## Hoover (1941) https://doi.org/10.1017/S0022050700052980 ## Returns NA value if only one smaller geography in a larger geography del_fun <- function(x, omit_NAs) { xx <- x[ , c('subgroup', 'ALAND')] @@ -159,7 +165,8 @@ del_fun <- function(x, omit_NAs) { } } -# Internal function for an index of spatial proximity (White 1986) +# Internal function for an index of spatial proximity +## White (1986) https://doi.org/10.2307/3644339 ## Returns NA value if only one smaller geography in a larger geography sp_fun <- function(x, 
omit_NAs) { xx <- x[ , c('TotalPopE', 'subgroup', 'subgroup_ref', 'ALAND')] @@ -184,7 +191,8 @@ sp_fun <- function(x, omit_NAs) { } } -# Internal function for the Gini Index (Gini 1921) +# Internal function for the Gini Index +## Gini (1921) https://doi.org/10.2307/2223319 ## Returns NA value if only one smaller geography in a larger geography g_fun <- function(x, omit_NAs) { xx <- x[ , c('TotalPopE', 'subgroup')] @@ -206,7 +214,8 @@ g_fun <- function(x, omit_NAs) { } } -# Internal function for the Dissimilarity Index (James & Taeuber 1985) +# Internal function for the Dissimilarity Index +## James & Taeuber (1985) https://doi.org/10.2307/270845 ## Returns NA value if only one smaller geography in a larger geography djt_fun <- function(x, omit_NAs) { xx <- x[ , c('TotalPopE', 'subgroup')] @@ -220,7 +229,39 @@ djt_fun <- function(x, omit_NAs) { N <- sum(xx$TotalPopE, na.rm = TRUE) p_i <- x_i / t_i P <- X / N - D <- sum(t_i * abs(p_i - P), na.rm = TRUE)/(2 * N * P * (1 - P)) + D <- sum(t_i * abs(p_i - P), na.rm = TRUE) / (2 * N * P * (1 - P)) return(D) } } + +# Internal function for Entropy +## Theil & Finizza (1971) https://doi.org/10.1080/0022250X.1971.9989795 +## Returns NA value if only one smaller geography in a larger geography +## Note: Differs from Massey & Denton (1988) https://doi.org/10.1093/sf/67.2.281 +## by taking the absolute value of (E-E_{i}) so extent of the output is +## {0, 1} as designed by Theil (1972) instead of {-Inf, Inf} as described in +## Massey & Denton (1988) +h_fun <- function(x, omit_NAs) { + xx <- x[ , c('TotalPopE', 'subgroup')] + if (omit_NAs == TRUE) { xx <- xx[stats::complete.cases(xx), ] } + if (nrow(x) < 2 || any(xx < 0) || any(is.na(xx))) { + NA + } else { + x_i <- xx$subgroup + X <- sum(xx$subgroup, na.rm = TRUE) + t_i <- xx$TotalPopE + N <- sum(xx$TotalPopE, na.rm = TRUE) + p_i <- x_i / t_i + p_i[is.infinite(p_i)] <- 0 + P <- X / N + if (is.infinite(P)) { P <- 0 } + E_i <- p_i * log(1 / p_i) + (1 - p_i) * log(1 / (1 - p_i)) + 
E_i[!is.finite(E_i)] <- 0 # 0 * log(1 / 0) is NaN (not Inf) in R; the limiting entropy is 0 + E <- P * log(1 / P) + (1 - P) * log(1 / (1 - P)) + if (!is.finite(E)) { E <- 0 } + H_i <- t_i * abs(E - E_i) / (E * N) + H_i[is.infinite(H_i)] <- NA + H <- sum(H_i, na.rm = TRUE) + return(H) + } +} diff --git a/R/white.R b/R/white.R index deaac8a..2ae7034 100644 --- a/R/white.R +++ b/R/white.R @@ -1,18 +1,18 @@ #' Correlation Ratio based on Bell (1954) and White (1986) #' -#' Compute the aspatial Correlation Ratio (White) of a selected racial or ethnnic subgroup(s) and U.S. geographies. +#' Compute the aspatial Correlation Ratio (White) of a selected racial or ethnic subgroup(s) and U.S. geographies. #' #' @param geo_large Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = 'county'}. #' @param geo_small Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_large = 'tract'}. #' @param year Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available. -#' @param subgroup Character string specifying the racial or ethnnic subgroup(s). See Details for available choices. +#' @param subgroup Character string specifying the racial or ethnic subgroup(s). See Details for available choices. #' @param omit_NAs Logical. If FALSE, will compute index for a larger geographical unit only if all of its smaller geographical units have values. The default is TRUE. #' @param quiet Logical. If TRUE, will display messages about potential missing census information. The default is FALSE. #' @param ... Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics #' -#' @details This function will compute the aspatial Correlation Ratio (\emph{V} or \eqn{Eta^{2}}{Eta^2}) of selected racial or ethnnic subgroups and U.S. geographies for a specified geographical extent (e.g., the entire U.S. 
or a single state) based on Bell (1954) \doi{10.2307/2574118} and White (1986) \doi{10.2307/3644339}. This function provides the computation of \emph{V} for any of the U.S. Census Bureau race or ethnnicity subgroups (including Hispanic and non-Hispanic individuals). +#' @details This function will compute the aspatial Correlation Ratio (\emph{V} or \eqn{Eta^{2}}{Eta^2}) of selected racial or ethnic subgroups and U.S. geographies for a specified geographical extent (e.g., the entire U.S. or a single state) based on Bell (1954) \doi{10.2307/2574118} and White (1986) \doi{10.2307/3644339}. This function provides the computation of \emph{V} for any of the U.S. Census Bureau race or ethnicity subgroups (including Hispanic and non-Hispanic individuals). #' -#' The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the aspatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available (2010 onward for \code{geo_large = 'cbsa'} and 2011 onward for \code{geo_large = 'csa'} or \code{geo_large = 'metro'}) but may be available from other U.S. Census Bureau surveys. The twenty racial or ethnnic subgroups (U.S. Census Bureau definitions) are: +#' The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the aspatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available (2010 onward for \code{geo_large = 'cbsa'} and 2011 onward for \code{geo_large = 'csa'} or \code{geo_large = 'metro'}) but may be available from other U.S. Census Bureau surveys. The twenty racial or ethnic subgroups (U.S. 
Census Bureau definitions) are: #' \itemize{ #' \item \strong{B03002_002}: not Hispanic or Latino \code{'NHoL'} #' \item \strong{B03002_003}: not Hispanic or Latino, white alone \code{'NHoLW'} @@ -259,8 +259,8 @@ white <- function(geo_large = 'county', sf::st_drop_geometry() } - # Count of racial or ethnnic subgroup populations - ## Count of racial or ethnnic comparison subgroup population + # Count of racial or ethnic subgroup populations + ## Count of racial or ethnic comparison subgroup population if (length(in_subgroup) == 1) { out_dat <- out_dat %>% dplyr::mutate(subgroup = .[, in_subgroup]) diff --git a/R/white_blau.R b/R/white_blau.R index 872321e..3f4767c 100644 --- a/R/white_blau.R +++ b/R/white_blau.R @@ -39,7 +39,7 @@ #' #' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. #' -#' \emph{SP} is a measure of clustering of racial or ethnic populations within smaller geographical areas that are located within larger geographical areas. \emph{SP} can range in value from 0 to Inf and represents the degree to which an area is a racial or ethnic enclave. A value of 1 indicates there is no differential clustering between subgroup and referent group members. A value greater than 1 indicates subgroup members live nearer to one another than to referent subgroup members. A value less than 1 indicates subgroup live nearer to and referent subgroup members than to their own subgroup members. +#' \emph{SP} is a measure of clustering of racial or ethnic populations within smaller geographical units that are located within larger geographical units. \emph{SP} can range in value from 0 to Inf and represents the degree to which an area is a racial or ethnic enclave. A value of 1 indicates there is no differential clustering between subgroup and referent group members. 
A value greater than 1 indicates subgroup members live nearer to one another than to referent subgroup members. A value less than 1 indicates subgroup live nearer to and referent subgroup members than to their own subgroup members. #' #' The metric uses the exponential transform of a distance matrix (kilometers) between smaller geographical area centroids, with a diagonal defined as \code{(0.6*a_{i})^{0.5}} where \code{a_{i}} is the area (square kilometers) of smaller geographical unit \code{i} as defined by White (1983) \doi{10.1086/227768}. #' @@ -312,11 +312,11 @@ white_blau <- function(geo_large = 'county', ## From White (1986) https://doi.org/10.2307/3644339} ## D_{jt} = 1/2 \sum_{i=1}^{k} | \frac{x_{ijt}}{X_{jt}}-\frac{y_{ijt}}{Y_{jt}}| ## Where for k smaller geographies: - ## D_{jt} denotes the DI of larger geography j at time t - ## x_{ijt} denotes the racial or ethnic subgroup population of smaller geography i within larger geography j at time t - ## X_{jt} denotes the racial or ethnic subgroup population of larger geography j at time t - ## y_{ijt} denotes the racial or ethnic referent subgroup population of smaller geography i within larger geography j at time t - ## Y_{jt} denotes the racial or ethnic referent subgroup population of larger geography j at time t + ## D_{jt} denotes the DI of larger geographical unit j at time t + ## x_{ijt} denotes the racial or ethnic subgroup population of smaller geographical unit i within larger geographical unit j at time t + ## X_{jt} denotes the racial or ethnic subgroup population of larger geographical unit j at time t + ## y_{ijt} denotes the racial or ethnic referent subgroup population of smaller geographical unit i within larger geographical unit j at time t + ## Y_{jt} denotes the racial or ethnic referent subgroup population of larger geographical unit j at time t ## Compute out_tmp <- out_dat %>% diff --git a/README.md b/README.md index 808bdcf..baa2d4c 100644 --- a/README.md +++ b/README.md @@ -16,7 
+16,7 @@ ### Overview -The *ndi* package is a suite of [**R**](https://cran.r-project.org/) functions to compute various metrics of socio-economic deprivation and disparity in the United States. Some metrics are considered 'spatial' because they consider the values of neighboring (i.e., adjacent) census geographies in their computation, while other metrics are 'aspatial' because they only consider the value within each census geography. Two types of aspatial neighborhood deprivation index (*NDI*) are available: (1) based on [Messer et al. (2006)](https://doi.org/10.1007/s11524-006-9094-x) and (2) based on [Andrews et al. (2020)](https://doi.org/10.1080/17445647.2020.1750066) and [Slotman et al. (2022)](https://doi.org/10.1016/j.dib.2022.108002) who use variables chosen by [Roux and Mair (2010)](https://doi.org/10.1111/j.1749-6632.2009.05333.x). Both are a decomposition of various demographic characteristics from the U.S. Census Bureau American Community Survey 5-year estimates (ACS-5; 2006-2010 onward) pulled by the [tidycensus](https://CRAN.R-project.org/package=tidycensus) package. Using data from the ACS-5 (2005-2009 onward), the *ndi* package can also compute the (1) spatial Racial Isolation Index (*RI*) based on [Anthopolos et al. (2011)](https://doi.org/10.1016/j.sste.2011.06.002), (2) spatial Educational Isolation Index (*EI*) based on [Bravo et al. (2021)](https://doi.org/10.3390/ijerph18179384), (3) aspatial Index of Concentration at the Extremes (*ICE*) based on [Feldman et al. (2015)](https://doi.org/10.1136/jech-2015-205728) and [Krieger et al. 
(2016)](https://doi.org/10.2105/AJPH.2015.302955), (4) aspatial racial or ethnic Dissimilarity Index (*D*) based on [Duncan & Duncan (1955)](https://doi.org/10.2307/2088328), (5) aspatial income or racial or ethnic Atkinson Index (*A*) based on [Atkinson (1970)](https://doi.org/10.1016/0022-0531(70)90039-6), (6) aspatial racial or ethnic Interaction Index (_xPy\*_) based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and [Bell (1954)](https://doi.org/10.2307/2574118), (7) aspatial racial or ethnic Correlation Ratio (*V*) based on [Bell (1954)](https://doi.org/10.2307/2574118) and [White (1986)](https://doi.org/10.2307/3644339), (8) aspatial racial or ethnic Location Quotient (*LQ*) based on [Merton (1939)](https://doi.org/10.2307/2084686) and [Sudano et al. (2013)](https://doi.org/10.1016/j.healthplace.2012.09.015), (9) aspatial racial or ethnic Local Exposure and Isolation (*LEx/Is*) metric based on [Bemanian & Beyer (2017)](https://doi.org/10.1158/1055-9965.EPI-16-0926), (10) aspatial racial or ethnic Delta (*DEL*) based on [Hoover (1941)](https://doi.org/10.1017/S0022050700052980) and Duncan et al. (1961; LC:60007089), (11) an index of spatial proximity (*SP*) based on [White (1986)](https://doi.org/10.2307/3644339) and Blau (1977; ISBN-13:978-0-029-03660-0), and (12) the aspatial racial or ethnic Isolation Index (_xPx\*_) based on Lieberson (1981; ISBN-13:978-1-032-53884-6) and [Bell (1954)](https://doi.org/10.2307/2574118), (13) the aspatial racial or ethnic Gini Index (*G*) based on [Gini (1921)](https://doi.org/10.2307/2223319), and (14) aspatial racial or ethnic Dissimilarity Index (*D*) based on [James & Taeuber (1985)](https://doi.org/10.2307/270845). Also using data from the ACS-5 (2005-2009 onward), the *ndi* package can retrieve the aspatial income Gini Index (*G*) based on [Gini (1921)](https://doi.org/10.2307/2223319). 
+The *ndi* package is a suite of [**R**](https://cran.r-project.org/) functions to compute various metrics of socio-economic deprivation and disparity in the United States. Some metrics are considered 'spatial' because they consider the values of neighboring (i.e., adjacent) census geographies in their computation, while other metrics are 'aspatial' because they only consider the value within each census geography. Two types of aspatial neighborhood deprivation index (*NDI*) are available: (1) based on [Messer et al. (2006)](https://doi.org/10.1007/s11524-006-9094-x) and (2) based on [Andrews et al. (2020)](https://doi.org/10.1080/17445647.2020.1750066) and [Slotman et al. (2022)](https://doi.org/10.1016/j.dib.2022.108002) who use variables chosen by [Roux and Mair (2010)](https://doi.org/10.1111/j.1749-6632.2009.05333.x). Both are a decomposition of various demographic characteristics from the U.S. Census Bureau American Community Survey 5-year estimates (ACS-5; 2006-2010 onward) pulled by the [tidycensus](https://CRAN.R-project.org/package=tidycensus) package. Using data from the ACS-5 (2005-2009 onward), the *ndi* package can also compute the (1) spatial Racial Isolation Index (*RI*) based on [Anthopolos et al. (2011)](https://doi.org/10.1016/j.sste.2011.06.002), (2) spatial Educational Isolation Index (*EI*) based on [Bravo et al. (2021)](https://doi.org/10.3390/ijerph18179384), (3) aspatial Index of Concentration at the Extremes (*ICE*) based on [Feldman et al. (2015)](https://doi.org/10.1136/jech-2015-205728) and [Krieger et al. 
(2016)](https://doi.org/10.2105/AJPH.2015.302955), (4) aspatial racial or ethnic Dissimilarity Index (*D*) based on [Duncan & Duncan (1955)](https://doi.org/10.2307/2088328), (5) aspatial income or racial or ethnic Atkinson Index (*A*) based on [Atkinson (1970)](https://doi.org/10.1016/0022-0531(70)90039-6), (6) aspatial racial or ethnic Interaction Index (_xPy\*_) based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and [Bell (1954)](https://doi.org/10.2307/2574118), (7) aspatial racial or ethnic Correlation Ratio (*V*) based on [Bell (1954)](https://doi.org/10.2307/2574118) and [White (1986)](https://doi.org/10.2307/3644339), (8) aspatial racial or ethnic Location Quotient (*LQ*) based on [Merton (1939)](https://doi.org/10.2307/2084686) and [Sudano et al. (2013)](https://doi.org/10.1016/j.healthplace.2012.09.015), (9) aspatial racial or ethnic Local Exposure and Isolation (*LEx/Is*) metric based on [Bemanian & Beyer (2017)](https://doi.org/10.1158/1055-9965.EPI-16-0926), (10) aspatial racial or ethnic Delta (*DEL*) based on [Hoover (1941)](https://doi.org/10.1017/S0022050700052980) and Duncan et al. (1961; LC:60007089), (11) an index of spatial proximity (*SP*) based on [White (1986)](https://doi.org/10.2307/3644339) and Blau (1977; ISBN-13:978-0-029-03660-0), (12) the aspatial racial or ethnic Isolation Index (_xPx\*_) based on Lieberson (1981; ISBN-13:978-1-032-53884-6) and [Bell (1954)](https://doi.org/10.2307/2574118), (13) the aspatial racial or ethnic Gini Index (*G*) based on [Gini (1921)](https://doi.org/10.2307/2223319), (14) the aspatial racial or ethnic Dissimilarity Index (*D*) based on [James & Taeuber (1985)](https://doi.org/10.2307/270845), and (15) the aspatial racial or ethnic Entropy (*H*) based on Theil (1972; ISBN-13:978-0-444-10378-9) and [Theil & Finizza (1971)](https://doi.org/10.1080/0022250X.1971.9989795). 
Also using data from the ACS-5 (2005-2009 onward), the *ndi* package can retrieve the aspatial income Gini Index (*G*) based on [Gini (1921)](https://doi.org/10.2307/2223319). ### Installation @@ -99,6 +99,10 @@ To install the development version from GitHub: <td>Compute the aspatial racial or ethnic Location Quotient (<i>LQ</i>) based on <a href='https://doi.org/10.2307/2084686'>Merton (1938)</a> and <a href='https://doi.org/10.1016/j.healthplace.2012.09.015'>Sudano et al. (2013)</a></td> </tr> <tr> +<td><a href='/R/theil.R'><code>theil</code></a></td> +<td>Compute the aspatial racial or ethnic Entropy (<i>H</i>) based on Theil (1972; ISBN-13:978-0-444-10378-9) and <a href='https://doi.org/10.1080/0022250X.1971.9989795'>Theil & Finizza (1971)</a></td> +</tr> +<tr> <td><a href='/R/white.R'><code>white</code></a></td> <td>Compute the aspatial racial or ethnic Correlation Ratio (<i>V</i>) based on <a href='https://doi.org/10.2307/2574118'>Bell (1954)</a> and <a href='https://doi.org/10.2307/3644339'>White (1986)</a></td> </tr> @@ -1283,6 +1287,50 @@ ggplot() +  +```r +# ------------------------------------------------- # +# Compute aspatial racial or ethnic Entropy (Theil) # +# ------------------------------------------------- # + +# Entropy based on Theil (1972; ISBN-13:978-0-444-10378-9) and Theil & Finizza (1971) +## Selected subgroup: Not Hispanic or Latino, Black or African American alone +## Selected large geography: census tract +## Selected small geography: census block group +H_2020_DC <- theil( + geo_large = 'tract', + geo_small = 'block group', + state = 'DC', + year = 2020, + subgroup = 'NHoLB' +) + +# Obtain the 2020 census tracts from the 'tigris' package +tract_2020_DC <- tracts(state = 'DC', year = 2020, cb = TRUE) + +# Join the H (Theil) values to the census tract geometry +H_2020_DC <- tract_2020_DC %>% + left_join(H_2020_DC$h, by = 'GEOID') + +ggplot() + + geom_sf( + data = H_2020_DC, + aes(fill = H), + color = 'white' + ) + + theme_bw() + 
scale_fill_viridis_c(limits = c(0, 1)) + + labs( + fill = 'Index (Continuous)', + caption = 'Source: U.S. Census ACS 2016-2020 estimates' + ) + + ggtitle( + 'Entropy (Theil)\nWashington, D.C. census block groups to tracts', + subtitle = 'Black non-Hispanic' + ) +``` + + + ### Funding This package was originally developed while the author was a postdoctoral fellow supported by the [Cancer Prevention Fellowship Program](https://cpfp.cancer.gov) at the [National Cancer Institute](https://www.cancer.gov). Any modifications since December 05, 2022 were made while the author was an employee of [DLH, LLC](https://www.dlhcorp.com) (formerly Social & Scientific Systems, Inc.). diff --git a/cran-comments.md b/cran-comments.md index 2b5d765..43f8db4 100644 --- a/cran-comments.md +++ b/cran-comments.md @@ -5,18 +5,19 @@ * Added `white_blau()` function to compute an index of spatial proximity (*SP*) based on [White (1986)](https://doi.org/10.2307/3644339) and Blau (1977; ISBN-13:978-0-029-03660-0) * Added `lieberson()` function to compute the aspatial racial or ethnic Isolation Index (_xPx\*_) based on Lieberson (1981; ISBN-13:978-1-032-53884-6) and and [Bell (1954)](https://doi.org/10.2307/2574118) * Added `james_taeuber()` function to compute the aspatial racial or ethnic Dissimilarity Index (*D*) based on [James & Taeuber (1985)](https://doi.org/10.2307/270845) + * Added `theil()` function to compute the aspatial racial or ethnic Entropy (*H*) based on Theil (1972; ISBN-13:978-0-444-10378-9) and [Theil & Finizza (1971)](https://doi.org/10.1080/0022250X.1971.9989795) * Added `geo_large = 'cbsa'` for Core Based Statistical Areas, `geo_large = 'csa'` for Combined Statistical Areas, and `geo_large = 'metro'` for Metropolitan Divisions as the larger geographical unit in `atkinson()`, `bell()`, `bemanian_beyer()`, `duncan()`, `hoover()`, `lieberson()`, `sudano()`, and `white()`, `white_blau()` functions. 
* Added `holder` argument to `atkinson()` function to toggle the computation with or without the Hölder mean. The function can now compute *A* without the Hölder mean. The default is `holder = FALSE`. * The `gini()` function now computes the aspatial racial or ethnic Gini Index (*G*) based on [Gini (1921)](https://doi.org/10.2307/2223319) as the main outcome. Arguments `geo_large`, `geo_small`, `subgroup`, and `omit_NAs` were added and argument `geo` was deprecated. The `gini()` function still retrieves the original output of the aspatial income Gini Index (*G*) at each smaller geography and is moved from the `g` output to `g_data` output. * `bell()` function computes the Interaction Index (Bell) not the Isolation Index as previously documented. Updated documentation throughout * Fixed bug in `bell()`, `bemanian_beyer()`, `duncan()`, `sudano()`, and `white()` functions when a smaller geography contains n=0 total population, will assign a value of zero (0) in the internal calculation instead of NA - * Renamed *AI* as *A*, *DI* as *D*, *Gini* as *G*, and *II* as _xPy\*_ to align with the definitions from [Massey & Denton (1988)](https://doi.org/10.1093/sf/67.2.281). The output for `atkinson()` now produces `a` instead of `ai`. The output for `duncan()` now produces `d` instead of `ai`. The output for `gini()` now produces `g` instead of `gini`. The output for `bell()` now produces `xPy_star` instead of `II`. The internal functions `ai_fun()`, `di_fun()` and `ii_fun()` were renamed `a_fun()`, `d_fun()` and `xpy_star_fun()`, respectively. + * Renamed *AI* as *A*, *DI* as *D*, *Gini* as *G*, and *II* as _xPy\*_ to align with the definitions from [Massey & Denton (1988)](https://doi.org/10.1093/sf/67.2.281). The output for `atkinson()` now produces `a` instead of `ai`. The output for `duncan()` now produces `d` instead of `ai`. The output for `gini()` now produces `g` instead of `gini`. The output for `bell()` now produces `xPy_star` instead of `II`. 
The internal functions `ai_fun()`, `di_fun()` and `ii_fun()` were renamed `a_fun()`, `d_fun()` and `xpy_star_fun()`, respectively. * `tigris` and `units` are now Imports * 'package.R' deprecated. Replaced with 'ndi-package.R' * Re-formatted code and documentation throughout for consistent readability * Renamed 'race/ethnicity' or 'racial/ethnic' to 'race or ethnicity' or 'racial or ethnic' throughout documentation to use more modern, inclusive, and appropriate language * Updated documentation about value range of *V* (White) from `{0 to 1}` to `{-Inf to Inf}` - * Added examples for `gini()`, `james_taeuber()`, `lieberson()`, `hoover()` and `white_blau()` functions in vignette and README + * Added examples for `gini()`, `james_taeuber()`, `lieberson()`, `hoover()`, `theil()`, and `white_blau()` functions in vignette and README * Added example for `holder` argument in `atkinson()` function in README * Reformatted functions for consistent internal structure * Updated examples in vignette to showcase a larger variety of U.S. 
states @@ -32,7 +33,7 @@ * <https://doi.org/10.2307/3644339> * <https://doi.org/10.2307/2084686> -* Some tests and examples for `anthopolos()`, `atkinson()`, `bell()`, `bemanian_beyer()`, `bravo()`, `duncan()`, `gini()`, `hoover()`, `james_taeuber()`, `krieger()`, `lieberson()`, `messer()`, `powell_wiley()`, `sudano()`, `white()`, and `white_blau()` functions require a Census API key so they are skipped if NULL or not run +* Some tests and examples for `anthopolos()`, `atkinson()`, `bell()`, `bemanian_beyer()`, `bravo()`, `duncan()`, `gini()`, `hoover()`, `james_taeuber()`, `krieger()`, `lieberson()`, `messer()`, `powell_wiley()`, `sudano()`, `theil()`, `white()`, and `white_blau()` functions require a Census API key so they are skipped if NULL or not run ## Test environments * local Windows install, R 4.4.1 diff --git a/inst/CITATION b/inst/CITATION index 6a4a5dd..2d98b10 100755 --- a/inst/CITATION +++ b/inst/CITATION @@ -3,7 +3,7 @@ bibentry(bibtype = 'manual', author = as.person('Ian D. Buller'), publisher = 'The Comprehensive R Archive Network', year = '2024', - number = '0.1.6.9000.', + number = '0.1.6.9006.', doi = '10.5281/zenodo.6989030', url = 'https://cran.r-project.org/package=ndi', @@ -517,3 +517,39 @@ bibentry(bibtype = 'Article', header = 'If you computed D (James & Taeuber) values, please also cite:' ) + +bibentry(bibtype = 'Book', + title = 'Statistical decomposition analysis. With applications in the social and administrative sciences', + author = as.person('Henri Theil'), + year = '1972', + city = 'Amsterdam', + publisher = 'North-Holland Publishing Company', + isbn = '978-0-444-10378-9', + + textVersion = + paste('Henri Theil (1972).', + 'Statistical decomposition analysis. 
With applications in the social and administrative sciences.', + 'Amsterdam: North-Holland Publishing Company.', + 'ISBN-13:978-0-444-10378-9'), + + header = 'If you computed H (Theil) values, please also cite (1):' +) + +bibentry(bibtype = 'Article', + title = 'A note on the measurement of racial integration of schools by means of informational concepts', + author = c(as.person('Henri Theil'), + as.person('Anthony J. Finizza')), + journal = 'Journal of Mathematical Sociology', + year = '1971', + volume = '1', + pages = '187--194', + doi = '10.1080/0022250X.1971.9989795', + + textVersion = + paste('Henri Theil & Anthony J. Finizza (1971).', + 'A note on the measurement of racial integration of schools by means of informational concepts.', + 'Journal of Mathematical Sociology, 1, 187-194.', + 'DOI:10.1080/0022250X.1971.9989795'), + + header = 'And (2):' +) diff --git a/man/atkinson.Rd b/man/atkinson.Rd index 8af7fe6..b097d0f 100644 --- a/man/atkinson.Rd +++ b/man/atkinson.Rd @@ -76,9 +76,9 @@ The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. -\emph{A} is a measure of the evenness of residential inequality (e.g., racial or ethnic segregation) when comparing smaller geographical areas to larger ones within which the smaller geographical areas are located. \emph{A} can range in value from 0 to 1 with smaller values indicating lower levels of inequality (e.g., less segregation). +\emph{A} is a measure of the evenness of residential inequality (e.g., racial or ethnic segregation) when comparing smaller geographical units to larger ones within which the smaller geographical units are located. \emph{A} can range in value from 0 to 1 with smaller values indicating lower levels of inequality (e.g., less segregation). 
-The \code{epsilon} argument that determines how to weight the increments to inequality contributed by different proportions of the Lorenz curve. A user must explicitly decide how heavily to weight smaller geographical units at different points on the Lorenz curve (i.e., whether the index should take greater account of differences among areas of over- or under-representation). The \code{epsilon} argument must have values between 0 and 1.0. For \code{0 <= epsilon < 0.5} or less 'inequality-averse,' smaller geographical units with a subgroup proportion smaller than the subgroup proportion of the larger geographical unit contribute more to inequality ('over-representation'). For \code{0.5 < epsilon <= 1.0} or more 'inequality-averse,' smaller geographical units with a subgroup proportion larger than the subgroup proportion of the larger geographical unit contribute more to inequality ('under-representation'). If \code{epsilon = 0.5} (the default), units of over- and under-representation contribute equally to the index. See Section 2.3 of Saint-Jacques et al. (2020) \doi{10.48550/arXiv.2002.05819} for one method to select \code{epsilon}. +The \code{epsilon} argument that determines how to weight the increments to inequality contributed by different proportions of the Lorenz curve. A user must explicitly decide how heavily to weight smaller geographical units at different points on the Lorenz curve (i.e., whether the index should take greater account of differences among units of over- or under-representation). The \code{epsilon} argument must have values between 0 and 1.0. For \code{0 <= epsilon < 0.5} or less 'inequality-averse,' smaller geographical units with a subgroup proportion smaller than the subgroup proportion of the larger geographical unit contribute more to inequality ('over-representation'). 
For \code{0.5 < epsilon <= 1.0} or more 'inequality-averse,' smaller geographical units with a subgroup proportion larger than the subgroup proportion of the larger geographical unit contribute more to inequality ('under-representation'). If \code{epsilon = 0.5} (the default), units of over- and under-representation contribute equally to the index. See Section 2.3 of Saint-Jacques et al. (2020) \doi{10.48550/arXiv.2002.05819} for one method to select \code{epsilon}. Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, census tract \code{geo_large = 'tract'}, Core Based Statistical Area \code{geo_large = 'cbsa'}, Combined Statistical Area \code{geo_large = 'csa'}, and Metropolitan Division \code{geo_large = 'metro'} levels. Smaller geographies available include, county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the \emph{A} value returned is NA. If the larger geographical unit is Combined Based Statistical Areas \code{geo_large = 'csa'} or Core Based Statistical Areas \code{geo_large = 'cbsa'}, only the smaller geographical units completely within a larger geographical unit are considered in the \emph{A} computation (see internal \code{\link[sf]{st_within}} function for more information) and recommend specifying all states within which the interested larger geographical unit are located using the internal \code{state} argument to ensure all appropriate smaller geographical units are included in the \emph{A} computation. } diff --git a/man/bemanian_beyer.Rd b/man/bemanian_beyer.Rd index 0697688..a4af420 100644 --- a/man/bemanian_beyer.Rd +++ b/man/bemanian_beyer.Rd @@ -73,9 +73,9 @@ The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. 
Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. -\emph{LEx/Is} is a measure of the probability that two individuals living within a specific smaller geography (e.g., census tract) of either different (i.e., exposure) or the same (i.e., isolation) racial or ethnic subgroup(s) will interact, assuming that individuals within a smaller geography are randomly mixed. \emph{LEx/Is} is standardized with a logit transformation and centered against an expected case that all races or ethnicities are evenly distributed across a larger geography. (Note: will adjust data by 0.025 if probabilities are zero, one, or undefined. The output will include a warning if adjusted. See \code{\link[car]{logit}} for additional details.) +\emph{LEx/Is} is a measure of the probability that two individuals living within a specific smaller geographical unit (e.g., census tract) of either different (i.e., exposure) or the same (i.e., isolation) racial or ethnic subgroup(s) will interact, assuming that individuals within a smaller geographical unit are randomly mixed. \emph{LEx/Is} is standardized with a logit transformation and centered against an expected case that all races or ethnicities are evenly distributed across a larger geographical unit. (Note: will adjust data by 0.025 if probabilities are zero, one, or undefined. The output will include a warning if adjusted. See \code{\link[car]{logit}} for additional details.) -\emph{LEx/Is} can range from negative infinity to infinity. If \emph{LEx/Is} is zero then the estimated probability of the interaction between two people of the given subgroup(s) within a smaller geography is equal to the expected probability if the subgroup(s) were perfectly mixed in the larger geography. 
If \emph{LEx/Is} is greater than zero then the interaction is more likely to occur within the smaller geography than in the larger geography, and if \emph{LEx/Is} is less than zero then the interaction is less likely to occur within the smaller geography than in the larger geography. Note: the exponentiation of each \emph{LEx/Is} metric results in the odds ratio of the specific exposure or isolation of interest in a smaller geography relative to the larger geography. +\emph{LEx/Is} can range from negative infinity to infinity. If \emph{LEx/Is} is zero then the estimated probability of the interaction between two people of the given subgroup(s) within a smaller geographical unit is equal to the expected probability if the subgroup(s) were perfectly mixed in the larger geographical unit. If \emph{LEx/Is} is greater than zero then the interaction is more likely to occur within the smaller geographical unit than in the larger geographical unit, and if \emph{LEx/Is} is less than zero then the interaction is less likely to occur within the smaller geographical unit than in the larger geographical unit. Note: the exponentiation of each \emph{LEx/Is} metric results in the odds ratio of the specific exposure or isolation of interest in a smaller geographical unit relative to the larger geographical unit. Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, census tract \code{geo_large = 'tract'}, Core Based Statistical Area \code{geo_large = 'cbsa'}, Combined Statistical Area \code{geo_large = 'csa'}, and Metropolitan Division \code{geo_large = 'metro'} levels. Smaller geographies available include, county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the \emph{LEx/Is} value returned is NA. 
If the larger geographical unit is Combined Based Statistical Areas \code{geo_large = 'csa'} or Core Based Statistical Areas \code{geo_large = 'cbsa'}, only the smaller geographical units completely within a larger geographical unit are considered in the \emph{LEx/Is} computation (see internal \code{\link[sf]{st_within}} function for more information) and recommend specifying all states within which the interested larger geographical unit are located using the internal \code{state} argument to ensure all appropriate smaller geographical units are included in the \emph{LEx/Is} computation. } diff --git a/man/duncan.Rd b/man/duncan.Rd index ed6b4e0..b38a167 100644 --- a/man/duncan.Rd +++ b/man/duncan.Rd @@ -36,7 +36,7 @@ duncan( An object of class 'list'. This is a named list with the following components: \describe{ -\item{\code{di}}{An object of class 'tbl' for the GEOID, name, and \emph{D} at specified larger census geographies.} +\item{\code{d}}{An object of class 'tbl' for the GEOID, name, and \emph{D} at specified larger census geographies.} \item{\code{d_data}}{An object of class 'tbl' for the raw census values at specified smaller census geographies.} \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute \emph{D}.} } @@ -73,7 +73,7 @@ The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. -\emph{D} is a measure of the evenness of racial or ethnic residential segregation when comparing smaller geographical areas to larger ones within which the smaller geographical areas are located. 
\emph{D} can range in value from 0 to 1 and represents the proportion of racial or ethnic subgroup members that would have to change their area of residence to achieve an even distribution within the larger geographical area under conditions of maximum segregation. +\emph{D} is a measure of the evenness of racial or ethnic residential segregation when comparing smaller geographical units to larger ones within which the smaller geographical units are located. \emph{D} can range in value from 0 to 1 and represents the proportion of racial or ethnic subgroup members that would have to change their area of residence to achieve an even distribution within the larger geographical area under conditions of maximum segregation. Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, census tract \code{geo_large = 'tract'}, Core Based Statistical Area \code{geo_large = 'cbsa'}, Combined Statistical Area \code{geo_large = 'csa'}, and Metropolitan Division \code{geo_large = 'metro'} levels. Smaller geographies available include, county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the \emph{D} value returned is NA. If the larger geographical unit is Combined Based Statistical Areas \code{geo_large = 'csa'} or Core Based Statistical Areas \code{geo_large = 'cbsa'}, only the smaller geographical units completely within a larger geographical unit are considered in the \emph{D} computation (see internal \code{\link[sf]{st_within}} function for more information) and recommend specifying all states within which the interested larger geographical unit are located using the internal \code{state} argument to ensure all appropriate smaller geographical units are included in the \emph{D} computation. 
} diff --git a/man/figures/h.png b/man/figures/h.png new file mode 100644 index 0000000..e4e2601 Binary files /dev/null and b/man/figures/h.png differ diff --git a/man/james_taeuber.Rd b/man/james_taeuber.Rd index f42ea3f..0fc86a0 100644 --- a/man/james_taeuber.Rd +++ b/man/james_taeuber.Rd @@ -33,7 +33,7 @@ james_taeuber( An object of class 'list'. This is a named list with the following components: \describe{ -\item{\code{di}}{An object of class 'tbl' for the GEOID, name, and \emph{D} at specified larger census geographies.} +\item{\code{d}}{An object of class 'tbl' for the GEOID, name, and \emph{D} at specified larger census geographies.} \item{\code{d_data}}{An object of class 'tbl' for the raw census values at specified smaller census geographies.} \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute \emph{D}.} } @@ -70,7 +70,7 @@ The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. -\emph{D} is a measure of the evenness of racial or ethnic residential segregation when comparing smaller geographical areas to larger ones within which the smaller geographical areas are located. \emph{D} can range in value from 0 to 1 and represents the proportion of racial or ethnic subgroup members that would have to change their area of residence to achieve an even distribution within the larger geographical area under conditions of maximum segregation. +\emph{D} is a measure of the evenness of racial or ethnic residential segregation when comparing smaller geographical units to larger ones within which the smaller geographical units are located. 
\emph{D} can range in value from 0 to 1 and represents the proportion of racial or ethnic subgroup members that would have to change their area of residence to achieve an even distribution within the larger geographical area under conditions of maximum segregation. Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, census tract \code{geo_large = 'tract'}, Core Based Statistical Area \code{geo_large = 'cbsa'}, Combined Statistical Area \code{geo_large = 'csa'}, and Metropolitan Division \code{geo_large = 'metro'} levels. Smaller geographies available include, county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the \emph{D} value returned is NA. If the larger geographical unit is Combined Based Statistical Areas \code{geo_large = 'csa'} or Core Based Statistical Areas \code{geo_large = 'cbsa'}, only the smaller geographical units completely within a larger geographical unit are considered in the \emph{D} computation (see internal \code{\link[sf]{st_within}} function for more information) and recommend specifying all states within which the interested larger geographical unit are located using the internal \code{state} argument to ensure all appropriate smaller geographical units are included in the \emph{D} computation. } diff --git a/man/ndi-package.Rd b/man/ndi-package.Rd index 3bf90b4..bf0d9a5 100644 --- a/man/ndi-package.Rd +++ b/man/ndi-package.Rd @@ -43,6 +43,8 @@ Key content of the 'ndi' package include:\cr \code{\link{sudano}} Computes the aspatial racial or ethnic Location Quotient (\emph{LQ}) based on Merton (1939) \doi{10.2307/2084686} and Sudano et al. (2013) \doi{10.1016/j.healthplace.2012.09.015}. 
+\code{\link{theil}} Computes the aspatial racial or ethnic Entropy (\emph{H}) based on Theil (1972; ISBN-13:978-0-444-10378-9) and Theil & Finizza (1971) \doi{10.1080/0022250X.1971.9989795}. + \code{\link{white}} Computes the aspatial racial or ethnic Correlation Ratio (\emph{V}) based on Bell (1954) \doi{10.2307/2574118} and White (1986) \doi{10.2307/3644339}. \code{\link{white_blau}} Computes an index of spatial proximity (\emph{SP}) based on White (1986) \doi{10.2307/3644339} and Blau (1977; ISBN-13:978-0-029-03660-0). diff --git a/man/sudano.Rd b/man/sudano.Rd index 977a278..b5c0542 100644 --- a/man/sudano.Rd +++ b/man/sudano.Rd @@ -70,7 +70,7 @@ The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. -\emph{LQ} is some measure of relative racial homogeneity of each smaller geography within a larger geography. \emph{LQ} can range in value from 0 to infinity because it is ratio of two proportions in which the numerator is the proportion of subgroup population in a smaller geography and the denominator is the proportion of subgroup population in its larger geography. For example, a smaller geography with an \emph{LQ} of 5 means that the proportion of the subgroup population living in the smaller geography is five times the proportion of the subgroup population in its larger geography. +\emph{LQ} is some measure of relative racial homogeneity of each smaller geographical unit within a larger geographical unit. \emph{LQ} can range in value from 0 to infinity because it is a ratio of two proportions in which the numerator is the proportion of subgroup population in a smaller geographical unit and the denominator is the proportion of subgroup population in its larger geographical unit. 
For example, a smaller geographical unit with an \emph{LQ} of 5 means that the proportion of the subgroup population living in the smaller geographical unit is five times the proportion of the subgroup population in its larger geographical unit. Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, census tract \code{geo_large = 'tract'}, Core Based Statistical Area \code{geo_large = 'cbsa'}, Combined Statistical Area \code{geo_large = 'csa'}, and Metropolitan Division \code{geo_large = 'metro'} levels. Smaller geographies available include, county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the \emph{LQ} value returned is NA. If the larger geographical unit is Combined Based Statistical Areas \code{geo_large = 'csa'} or Core Based Statistical Areas \code{geo_large = 'cbsa'}, only the smaller geographical units completely within a larger geographical unit are considered in the \emph{LQ} computation (see internal \code{\link[sf]{st_within}} function for more information) and recommend specifying all states within which the interested larger geographical unit are located using the internal \code{state} argument to ensure all appropriate smaller geographical units are included in the \emph{LQ} computation. } diff --git a/man/theil.Rd b/man/theil.Rd new file mode 100644 index 0000000..910c289 --- /dev/null +++ b/man/theil.Rd @@ -0,0 +1,99 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/theil.R +\name{theil} +\alias{theil} +\title{Entropy based on Theil (1972) and Theil & Finizza (1971)} +\usage{ +theil( + geo_large = "county", + geo_small = "tract", + year = 2020, + subgroup, + omit_NAs = TRUE, + quiet = FALSE, + ... 
+) +} +\arguments{ +\item{geo_large}{Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = 'county'}.} + +\item{geo_small}{Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_small = 'tract'}.} + +\item{year}{Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available.} + +\item{subgroup}{Character string specifying the racial or ethnic subgroup(s) as the comparison population. See Details for available choices.} + +\item{omit_NAs}{Logical. If FALSE, will compute index for a larger geographical unit only if all of its smaller geographical units have values. The default is TRUE.} + +\item{quiet}{Logical. If TRUE, will display messages about potential missing census information. The default is FALSE.} + +\item{...}{Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics} +} +\value{ +An object of class 'list'. This is a named list with the following components: + +\describe{ +\item{\code{h}}{An object of class 'tbl' for the GEOID, name, and \emph{H} at specified larger census geographies.} +\item{\code{h_data}}{An object of class 'tbl' for the raw census values at specified smaller census geographies.} +\item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute \emph{H}.} +} +} +\description{ +Compute the aspatial Entropy (Theil) of selected racial or ethnic subgroup(s) and U.S. geographies +} +\details{ +This function will compute the aspatial Entropy (\emph{H}) of selected racial or ethnic subgroups and U.S. geographies for a specified geographical extent (e.g., the entire U.S. or a single state) based on Theil (1972; ISBN-13:978-0-444-10378-9) and Theil & Finizza (1971) \doi{10.1080/0022250X.1971.9989795}. 
This function provides the computation of \emph{H} for any of the U.S. Census Bureau race or ethnicity subgroups (including Hispanic and non-Hispanic individuals). + +The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the aspatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available (2010 onward for \code{geo_large = 'cbsa'} and 2011 onward for \code{geo_large = 'csa'} or \code{geo_large = 'metro'}) but may be available from other U.S. Census Bureau surveys. The twenty racial or ethnic subgroups (U.S. Census Bureau definitions) are: +\itemize{ +\item \strong{B03002_002}: not Hispanic or Latino \code{'NHoL'} +\item \strong{B03002_003}: not Hispanic or Latino, white alone \code{'NHoLW'} +\item \strong{B03002_004}: not Hispanic or Latino, Black or African American alone \code{'NHoLB'} +\item \strong{B03002_005}: not Hispanic or Latino, American Indian and Alaska Native alone \code{'NHoLAIAN'} +\item \strong{B03002_006}: not Hispanic or Latino, Asian alone \code{'NHoLA'} +\item \strong{B03002_007}: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'NHoLNHOPI'} +\item \strong{B03002_008}: not Hispanic or Latino, Some other race alone \code{'NHoLSOR'} +\item \strong{B03002_009}: not Hispanic or Latino, Two or more races \code{'NHoLTOMR'} +\item \strong{B03002_010}: not Hispanic or Latino, Two races including Some other race \code{'NHoLTRiSOR'} +\item \strong{B03002_011}: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'NHoLTReSOR'} +\item \strong{B03002_012}: Hispanic or Latino \code{'HoL'} +\item \strong{B03002_013}: Hispanic or Latino, white alone \code{'HoLW'} +\item \strong{B03002_014}: Hispanic or Latino, Black or African American alone \code{'HoLB'} +\item \strong{B03002_015}: Hispanic or Latino, American Indian and Alaska Native alone \code{'HoLAIAN'} 
+\item \strong{B03002_016}: Hispanic or Latino, Asian alone \code{'HoLA'} +\item \strong{B03002_017}: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'HoLNHOPI'} +\item \strong{B03002_018}: Hispanic or Latino, Some other race alone \code{'HoLSOR'} +\item \strong{B03002_019}: Hispanic or Latino, Two or more races \code{'HoLTOMR'} +\item \strong{B03002_020}: Hispanic or Latino, Two races including Some other race \code{'HoLTRiSOR'} +\item \strong{B03002_021}: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'HoLTReSOR'} +} + +Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. + +\emph{H} is a measure of the evenness of racial or ethnic residential segregation when comparing smaller geographical units to larger ones within which the smaller geographical units are located. \emph{H} can range in value from 0 to 1 and represents the (weighted) average deviation of each smaller geographical unit from the larger geographical unit's "entropy" or racial and ethnic diversity, which is greatest when each group is equally represented in the larger geographical unit. \emph{H} varies between 0, when all smaller geographical units have the same racial or ethnic composition as the larger geographical area (i.e., maximum integration), to a high of 1, when all smaller geographical units contain one group only (maximum segregation). + +Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, census tract \code{geo_large = 'tract'}, Core Based Statistical Area \code{geo_large = 'cbsa'}, Combined Statistical Area \code{geo_large = 'csa'}, and Metropolitan Division \code{geo_large = 'metro'} levels. 
Smaller geographies available include county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S. county contains only one census tract), then the \emph{H} value returned is NA. If the larger geographical unit is Combined Statistical Areas \code{geo_large = 'csa'} or Core Based Statistical Areas \code{geo_large = 'cbsa'}, only the smaller geographical units completely within a larger geographical unit are considered in the \emph{H} computation (see internal \code{\link[sf]{st_within}} function for more information). We recommend specifying all states within which the larger geographical unit of interest is located using the internal \code{state} argument to ensure all appropriate smaller geographical units are included in the \emph{H} computation. + +Note: The computation differs from Massey & Denton (1988) \doi{10.1093/sf/67.2.281} by taking the absolute value of \code{(E-E_{i})} so the extent of the output is \code{{0, 1}} as designed by Theil (1972; ISBN-13:978-0-444-10378-9) instead of \code{{-Inf, Inf}} as described in Massey & Denton (1988) \doi{10.1093/sf/67.2.281}. +} +\examples{ +\dontrun{ +# Wrapped in \dontrun{} because these examples require a Census API key. + + # Entropy (Theil) + ## of Black populations + ## of census tracts within counties in Georgia, U.S.A. (2020) + theil( + geo_large = 'county', + geo_small = 'tract', + state = 'GA', + year = 2020, + subgroup = c('NHoLB', 'HoLB') + ) + +} + +} +\seealso{ +\code{\link[tidycensus]{get_acs}} for additional arguments for geographic extent selection (i.e., \code{state} and \code{county}). +} diff --git a/man/white.Rd b/man/white.Rd index 5a57532..893e86d 100644 --- a/man/white.Rd +++ b/man/white.Rd @@ -21,7 +21,7 @@ white( \item{year}{Numeric. The year to compute the estimate. 
The default is 2020, and the years 2009 onward are currently available.} -\item{subgroup}{Character string specifying the racial or ethnnic subgroup(s). See Details for available choices.} +\item{subgroup}{Character string specifying the racial or ethnic subgroup(s). See Details for available choices.} \item{omit_NAs}{Logical. If FALSE, will compute index for a larger geographical unit only if all of its smaller geographical units have values. The default is TRUE.} @@ -39,12 +39,12 @@ An object of class 'list'. This is a named list with the following components: } } \description{ -Compute the aspatial Correlation Ratio (White) of a selected racial or ethnnic subgroup(s) and U.S. geographies. +Compute the aspatial Correlation Ratio (White) of a selected racial or ethnic subgroup(s) and U.S. geographies. } \details{ -This function will compute the aspatial Correlation Ratio (\emph{V} or \eqn{Eta^{2}}{Eta^2}) of selected racial or ethnnic subgroups and U.S. geographies for a specified geographical extent (e.g., the entire U.S. or a single state) based on Bell (1954) \doi{10.2307/2574118} and White (1986) \doi{10.2307/3644339}. This function provides the computation of \emph{V} for any of the U.S. Census Bureau race or ethnnicity subgroups (including Hispanic and non-Hispanic individuals). +This function will compute the aspatial Correlation Ratio (\emph{V} or \eqn{Eta^{2}}{Eta^2}) of selected racial or ethnic subgroups and U.S. geographies for a specified geographical extent (e.g., the entire U.S. or a single state) based on Bell (1954) \doi{10.2307/2574118} and White (1986) \doi{10.2307/3644339}. This function provides the computation of \emph{V} for any of the U.S. Census Bureau race or ethnicity subgroups (including Hispanic and non-Hispanic individuals). -The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the aspatial computation. 
The yearly estimates are available for 2009 onward when ACS-5 data are available (2010 onward for \code{geo_large = 'cbsa'} and 2011 onward for \code{geo_large = 'csa'} or \code{geo_large = 'metro'}) but may be available from other U.S. Census Bureau surveys. The twenty racial or ethnnic subgroups (U.S. Census Bureau definitions) are: +The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the aspatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available (2010 onward for \code{geo_large = 'cbsa'} and 2011 onward for \code{geo_large = 'csa'} or \code{geo_large = 'metro'}) but may be available from other U.S. Census Bureau surveys. The twenty racial or ethnic subgroups (U.S. Census Bureau definitions) are: \itemize{ \item \strong{B03002_002}: not Hispanic or Latino \code{'NHoL'} \item \strong{B03002_003}: not Hispanic or Latino, white alone \code{'NHoLW'} diff --git a/man/white_blau.Rd b/man/white_blau.Rd index d01cb8c..d1fc37f 100644 --- a/man/white_blau.Rd +++ b/man/white_blau.Rd @@ -73,7 +73,7 @@ The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. -\emph{SP} is a measure of clustering of racial or ethnic populations within smaller geographical areas that are located within larger geographical areas. \emph{SP} can range in value from 0 to Inf and represents the degree to which an area is a racial or ethnic enclave. A value of 1 indicates there is no differential clustering between subgroup and referent group members. A value greater than 1 indicates subgroup members live nearer to one another than to referent subgroup members. A value less than 1 indicates subgroup live nearer to and referent subgroup members than to their own subgroup members. 
+\emph{SP} is a measure of clustering of racial or ethnic populations within smaller geographical units that are located within larger geographical units. \emph{SP} can range in value from 0 to Inf and represents the degree to which an area is a racial or ethnic enclave. A value of 1 indicates there is no differential clustering between subgroup and referent group members. A value greater than 1 indicates subgroup members live nearer to one another than to referent subgroup members. A value less than 1 indicates subgroup members live nearer to referent subgroup members than to their own subgroup members. The metric uses the exponential transform of a distance matrix (kilometers) between smaller geographical area centroids, with a diagonal defined as \code{(0.6*a_{i})^{0.5}} where \code{a_{i}} is the area (square kilometers) of smaller geographical unit \code{i} as defined by White (1983) \doi{10.1086/227768}. diff --git a/tests/testthat/test-theil.R b/tests/testthat/test-theil.R new file mode 100644 index 0000000..bdb1ac8 --- /dev/null +++ b/tests/testthat/test-theil.R @@ -0,0 +1,87 @@ +context('theil') + +# -------------- # +# theil testthat # +# -------------- # + +test_that('theil throws error with invalid arguments', { + # Unavailable geography + expect_error( + theil( + geo_small = 'zcta', + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + quiet = TRUE + ) + ) + expect_error( + theil( + geo_large = 'block group', + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + quiet = TRUE + ) + ) + + # Unavailable year + expect_error( + theil( + state = 'DC', + year = 2005, + subgroup = 'NHoLB', + quiet = TRUE + ) + ) + + # Unavailable subgroup + expect_error( + theil( + state = 'DC', + year = 2020, + subgroup = 'terran', + quiet = TRUE + ) + ) + + skip_if(Sys.getenv('CENSUS_API_KEY') == '') + + # Incorrect state + expect_error( + theil( + state = 'AB', + year = 2020, + subgroup = 'NHoLB', + quiet = TRUE + ) + ) + +}) + +test_that('theil works', { + 
skip_if(Sys.getenv('CENSUS_API_KEY') == '') + + expect_silent(theil( + state = 'DC', + year = 2020, + subgroup = c('NHoLB', 'HoLB'), + )) + + expect_silent( + theil( + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + quiet = TRUE + ) + ) + + expect_silent(theil( + state = 'DC', + year = 2020, + subgroup = c('NHoLB', 'HoLB'), + quiet = TRUE + )) + +}) diff --git a/vignettes/vignette.Rmd b/vignettes/vignette.Rmd index 3b49f43..477ad98 100644 --- a/vignettes/vignette.Rmd +++ b/vignettes/vignette.Rmd @@ -541,14 +541,15 @@ Since version v0.1.1, the [*ndi*](https://CRAN.R-project.org/package=ndi) packag 4. `krieger()` function that computes the Index of Concentration at the Extremes (*ICE*) based on based on [Feldman et al. (2015)](https://doi.org/10.1136/jech-2015-205728) and [Krieger et al. (2016)](https://doi.org/10.2105/AJPH.2015.302955) 5. `duncan()` function that computes the Dissimilarity Index (*D*) based on [Duncan & Duncan (1955)](https://doi.org/10.2307/2088328) 6. `atkinson()` function that computes the Atkinson Index (*A*) based on [Atkinson (1970)](https://doi.org/10.1016/0022-0531(70)90039-6) -7. `bell()` function that computes the aspatial racial or ethnic Interaction Index (_xPy\*_) based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and [Bell (1954)](https://doi.org/10.2307/2574118) -8. `white()` function that computes the aspatial racial or ethnic Correlation Ratio (*V*) based on [Bell (1954)](https://doi.org/10.2307/2574118) and [White (1986)](https://doi.org/10.2307/3644339) -9. `sudano()` function that computes the aspatial racial or ethnic Location Quotient (*LQ*) based on [Merton (1939)](https://doi.org/10.2307/2084686) and [Sudano et al. (2013)](https://doi.org/10.1016/j.healthplace.2012.09.015) -10. `bemanian_beyer()` function that computes the aspatial racial or ethnic Local Exposure and Isolation (*LEx/Is*) metric based on [Bemanian & Beyer (2017)](https://doi.org/10.1158/1055-9965.EPI-16-0926) -11. 
`hoover()` function that computes the aspatial racial or ethnic Delta (*DEL*) based on [Hoover (1941)](https://doi.org/10.1017/S0022050700052980) and Duncan et al. (1961; LC:60007089) +7. `bell()` function that computes the racial or ethnic Interaction Index (_xPy\*_) based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and [Bell (1954)](https://doi.org/10.2307/2574118) +8. `white()` function that computes the racial or ethnic Correlation Ratio (*V*) based on [Bell (1954)](https://doi.org/10.2307/2574118) and [White (1986)](https://doi.org/10.2307/3644339) +9. `sudano()` function that computes the racial or ethnic Location Quotient (*LQ*) based on [Merton (1939)](https://doi.org/10.2307/2084686) and [Sudano et al. (2013)](https://doi.org/10.1016/j.healthplace.2012.09.015) +10. `bemanian_beyer()` function that computes the racial or ethnic Local Exposure and Isolation (*LEx/Is*) metric based on [Bemanian & Beyer (2017)](https://doi.org/10.1158/1055-9965.EPI-16-0926) +11. `hoover()` function that computes the racial or ethnic Delta (*DEL*) based on [Hoover (1941)](https://doi.org/10.1017/S0022050700052980) and Duncan et al. (1961; LC:60007089) 12. `white_blau()` function that computes an index of spatial proximity (*SP*) based on [White (1986)](https://doi.org/10.2307/3644339) and Blau (1977; ISBN-13:978-0-029-03660-0) -13. `lieberson()` function that computes the aspatial racial or ethnic Isolation Index (_xPx\*_) based on Lieberson (1981; ISBN-13:978-1-032-53884-6) and [Bell (1954)](https://doi.org/10.2307/2574118) +13. `lieberson()` function that computes the racial or ethnic Isolation Index (_xPx\*_) based on Lieberson (1981; ISBN-13:978-1-032-53884-6) and [Bell (1954)](https://doi.org/10.2307/2574118) 14. `james_taeuber()` function that computes the Dissimilarity Index (*D*) based on based on [James & Taeuber (1985)](https://doi.org/10.2307/270845) +15. 
`theil()` function that computes the racial or ethnic Entropy (*H*) based on Theil (1972; ISBN-13:978-0-444-10378-9) and [Theil & Finizza (1971)](https://doi.org/10.1080/0022250X.1971.9989795) #### Compute Racial Isolation Index (*RI*) @@ -729,7 +730,7 @@ Can correct one source of edge effect in the same manner as shown for the *RI* m #### The racial or ethnic Gini Index (*G*) -Compute the aspatial racial or ethnic Gini Index (*G*) values (2006-2010 5-year ACS) for Massachusetts, U.S.A., census tracts within counties. This metric is based on [Gini (1921)](https://doi.org/10.2307/2223319). Multiple racial or ethnic subgroups are available in the `gini()` function, including: +Compute the racial or ethnic Gini Index (*G*) values (2006-2010 5-year ACS) for Massachusetts, U.S.A., census tracts within counties. This metric is based on [Gini (1921)](https://doi.org/10.2307/2223319). Multiple racial or ethnic subgroups are available in the `gini()` function, including: | ACS table source | racial or ethnic subgroup | character for `subgroup` arguments | | -------------- | ------------- | ---------------- | @@ -800,7 +801,7 @@ ggplot() + #### The income Gini Index (*G*) -Retrieve the aspatial income Gini Index (*G*) values (2006-2010 5-year ACS) for Massachusetts, U.S.A., census tracts. This metric is based on [Gini (1921)](https://doi.org/10.2307/2223319), and the `gini()` function retrieves the estimate from the ACS-5 when calculating the Gini Index (*G*) for racial or ethnic inequality. +Retrieve the income Gini Index (*G*) values (2006-2010 5-year ACS) for Massachusetts, U.S.A., census tracts. This metric is based on [Gini (1921)](https://doi.org/10.2307/2223319), and the `gini()` function retrieves the estimate from the ACS-5 when calculating the Gini Index (*G*) for racial or ethnic inequality. According to the [U.S. 
Census Bureau](https://census.gov/topics/income-poverty/income-inequality/about/metrics/gini-index.html): 'The Gini Index is a summary measure of income inequality. The Gini coefficient incorporates the detailed shares data into a single statistic, which summarizes the dispersion of income across the entire income distribution. The Gini coefficient ranges from 0, indicating perfect equality (where everyone receives an equal share), to 1, perfect inequality (where only one recipient or group of recipients receives all the income). The Gini Index is based on the difference between the Lorenz curve (the observed cumulative income distribution) and the notion of a perfectly equal income distribution.' @@ -850,7 +851,7 @@ ggplot() + ### Index of Concentration at the Extremes (*ICE*) -Compute the aspatial Index of Concentration at the Extremes values (2006-2010 5-year ACS) for Wayne County, Michigan, U.S.A., census tracts. Wayne County is the home of Detroit, Michigan, a highly segregated city in the U.S. This metric is based on [Feldman et al. (2015)](https://doi.org/10.1136/jech-2015-205728) and [Krieger et al. (2016)](https://doi.org/10.2105/AJPH.2015.302955) who expanded the metric designed by Massey in a chapter of [Booth & Crouter (2001)](https://doi.org/10.4324/9781410600141) initially designed for residential segregation. The `krieger()` function computes five *ICE* metrics using the following ACS-5 groups: +Compute the Index of Concentration at the Extremes values (2006-2010 5-year ACS) for Wayne County, Michigan, U.S.A., census tracts. Wayne County is the home of Detroit, Michigan, a highly segregated city in the U.S. This metric is based on [Feldman et al. (2015)](https://doi.org/10.1136/jech-2015-205728) and [Krieger et al. (2016)](https://doi.org/10.2105/AJPH.2015.302955) who expanded the metric designed by Massey in a chapter of [Booth & Crouter (2001)](https://doi.org/10.4324/9781410600141) initially designed for residential segregation. 
The `krieger()` function computes five *ICE* metrics using the following ACS-5 groups: | ACS table group | *ICE* metric | Comparison | | -------------- | ------------- | ---------------- | @@ -987,7 +988,7 @@ ggplot() + #### Compute racial or ethnic Dissimilarity Index (*D*) -Compute the aspatial racial or ethnic *D* values (2006-2010 5-year ACS) for Pennsylvania, U.S.A., counties from census tracts. This metric is based on [Duncan & Duncan (1955)](https://doi.org/10.2307/2088328) that assessed the racial or ethnic isolation of students that identify as non-Hispanic or Latino, Black or African American alone compared to students that identify as non-Hispanic or Latino, white alone between schools and school districts. Multiple racial or ethnic subgroups are available in the `duncan()` function, including: +Compute the racial or ethnic *D* values (2006-2010 5-year ACS) for Pennsylvania, U.S.A., counties from census tracts. This metric is based on [Duncan & Duncan (1955)](https://doi.org/10.2307/2088328) that assessed the racial or ethnic isolation of students that identify as non-Hispanic or Latino, Black or African American alone compared to students that identify as non-Hispanic or Latino, white alone between schools and school districts. Multiple racial or ethnic subgroups are available in the `duncan()` function, including: | ACS table source | racial or ethnic subgroup | character for `subgroup` or `subgroup_ref` arguments | | -------------- | ------------- | ---------------- | @@ -1058,9 +1059,9 @@ ggplot() + ) ``` -#### Compute aspatial income or racial or ethnic Atkinson Index (*A*) +#### Compute income or racial or ethnic Atkinson Index (*A*) -Compute the aspatial income or racial or ethnic *A* values (2017-2021 5-year ACS) for Kentucky, U.S.A., counties from census block groups. 
This metric is based on [Atkinson (1970)](https://doi.org/10.2307/2088328) that assessed the distribution of income within 12 counties but has since been adapted to study racial or ethnic segregation (see [James & Taeuber 1985](https://doi.org/10.2307/270845)). To compare median household income, specify `subgroup = 'MedHHInc'` which will use the ACS-5 variable 'B19013_001' in the computation. Multiple racial or ethnic subgroups are available in the `atkinson()` function, including: +Compute the income or racial or ethnic *A* values (2017-2021 5-year ACS) for Kentucky, U.S.A., counties from census block groups. This metric is based on [Atkinson (1970)](https://doi.org/10.2307/2088328) that assessed the distribution of income within 12 counties but has since been adapted to study racial or ethnic segregation (see [James & Taeuber 1985](https://doi.org/10.2307/270845)). To compare median household income, specify `subgroup = 'MedHHInc'` which will use the ACS-5 variable 'B19013_001' in the computation. Multiple racial or ethnic subgroups are available in the `atkinson()` function, including: | ACS table source | racial or ethnic subgroup | character for `subgroup` argument | | -------------- | ------------- | ---------------- | @@ -1133,7 +1134,7 @@ ggplot() + #### Compute racial or ethnic Interaction Index (_xPy\*_) -Compute the aspatial racial or ethnic _xPy\*_ values (2017-2021 5-year ACS) for Ohio, U.S.A., counties from census tracts. This metric is based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and adapted by [Bell (1954)](https://doi.org/10.2307/2574118). Multiple racial or ethnic subgroups are available in the `bell()` function, including: +Compute the racial or ethnic _xPy\*_ values (2017-2021 5-year ACS) for Ohio, U.S.A., counties from census tracts. This metric is based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and adapted by [Bell (1954)](https://doi.org/10.2307/2574118). 
Multiple racial or ethnic subgroups are available in the `bell()` function, including: | ACS table source | racial or ethnic subgroup | character for `subgroup` or `subgroup_ixn` argument | | -------------- | ------------- | ---------------- | @@ -1204,7 +1205,7 @@ ggplot() + #### Compute Correlation Ratio (*V*) -Compute the aspatial racial or ethnic *V* values (2017-2021 5-year ACS) for South Carolina, U.S.A., counties from census tracts. This metric is based on [Bell (1954)](https://doi.org/10.2307/2574118) and adapted by [White (1986)](https://doi.org/10.2307/3644339). Multiple racial or ethnic subgroups are available in the `white()` function, including: +Compute the racial or ethnic *V* values (2017-2021 5-year ACS) for South Carolina, U.S.A., counties from census tracts. This metric is based on [Bell (1954)](https://doi.org/10.2307/2574118) and adapted by [White (1986)](https://doi.org/10.2307/3644339). Multiple racial or ethnic subgroups are available in the `white()` function, including: | ACS table source | racial or ethnic subgroup | character for `subgroup` argument | | -------------- | ------------- | ---------------- | @@ -1279,7 +1280,7 @@ ggplot() + #### Compute Location Quotient (*LQ*) -Compute the aspatial racial or ethnic *LQ* values (2017-2021 5-year ACS) for Tennessee, U.S.A., counties vs. the state. This metric is based on [Merton (1939)](https://doi.org/10.2307/2084686) and adapted by [Sudano et al. (2013)](https://doi.org/10.1016/j.healthplace.2012.09.015). Multiple racial or ethnic subgroups are available in the `sudano()` function, including: +Compute the racial or ethnic *LQ* values (2017-2021 5-year ACS) for Tennessee, U.S.A., counties vs. the state. This metric is based on [Merton (1939)](https://doi.org/10.2307/2084686) and adapted by [Sudano et al. (2013)](https://doi.org/10.1016/j.healthplace.2012.09.015). 
Multiple racial or ethnic subgroups are available in the `sudano()` function, including: | ACS table source | racial or ethnic subgroup | character for `subgroup` argument | | -------------- | ------------- | ---------------- | @@ -1349,7 +1350,7 @@ ggplot() + #### Compute Local Exposure and Isolation (*LEx/Is*) -Compute the aspatial racial or ethnic Local Exposure and Isolation metric (2017-2021 5-year ACS) for Mississippi, U.S.A., counties vs. the state. This metric is based on [Bemanian & Beyer (2017)](https://doi.org/10.1158/1055-9965.EPI-16-0926). Multiple racial or ethnic subgroups are available in the `bemanian_beyer()` function, including: +Compute the racial or ethnic Local Exposure and Isolation metric (2017-2021 5-year ACS) for Mississippi, U.S.A., counties vs. the state. This metric is based on [Bemanian & Beyer (2017)](https://doi.org/10.1158/1055-9965.EPI-16-0926). Multiple racial or ethnic subgroups are available in the `bemanian_beyer()` function, including: | ACS table source | racial or ethnic subgroup | character for `subgroup` or `subgroup_ixn` argument | | -------------- | ------------- | ---------------- | @@ -1449,7 +1450,7 @@ ggplot() + #### Compute Delta (*DEL*) -Compute the aspatial racial or ethnic *DEL* values (2017-2021 5-year ACS) for Alabama, U.S.A., counties from census tracts. This metric is based on [Hoover (1941)](https://doi.org/10.1017/S0022050700052980) and Duncan et al. (1961; LC:60007089). Multiple racial or ethnic subgroups are available in the `hoover()` function, including: +Compute the racial or ethnic *DEL* values (2017-2021 5-year ACS) for Alabama, U.S.A., counties from census tracts. This metric is based on [Hoover (1941)](https://doi.org/10.1017/S0022050700052980) and Duncan et al. (1961; LC:60007089). 
Multiple racial or ethnic subgroups are available in the `hoover()` function, including: | ACS table source | racial or ethnic subgroup | character for `subgroup` argument | | -------------- | ------------- | ---------------- | @@ -1601,7 +1602,7 @@ ggplot() + #### Compute racial or ethnic Isolation Index (_xPx\*_) -Compute the aspatial racial or ethnic _xPx\*_ values (2015-2019 5-year ACS) for Delaware, U.S.A., census tracts from census block groups. This metric is based on [Bell (1954)](https://doi.org/10.2307/2574118) and adapted by Lieberson (1981; ISBN-13:978-1-032-53884-6). Multiple racial or ethnic subgroups are available in the `lieberson()` function, including: +Compute the racial or ethnic _xPx\*_ values (2015-2019 5-year ACS) for Delaware, U.S.A., census tracts from census block groups. This metric is based on [Bell (1954)](https://doi.org/10.2307/2574118) and adapted by Lieberson (1981; ISBN-13:978-1-032-53884-6). Multiple racial or ethnic subgroups are available in the `lieberson()` function, including: | ACS table source | racial or ethnic subgroup | character for `subgroup` argument | | -------------- | ------------- | ---------------- | @@ -1671,7 +1672,7 @@ ggplot() + #### Compute racial or ethnic Dissimilarity Index (*D*) -Compute the aspatial racial or ethnic *D* values (2006-2010 5-year ACS) for Pennsylvania, U.S.A., counties from census tracts. This metric is based on [James & Taeuber (1985)](https://doi.org/10.2307/270845). Multiple racial or ethnic subgroups are available in the `james_taeuber()` function, including: +Compute the racial or ethnic *D* values (2006-2010 5-year ACS) for Pennsylvania, U.S.A., counties from census tracts. This metric is based on [James & Taeuber (1985)](https://doi.org/10.2307/270845). 
Multiple racial or ethnic subgroups are available in the `james_taeuber()` function, including: | ACS table source | racial or ethnic subgroup | character for `subgroup`arguments | | -------------- | ------------- | ---------------- | @@ -1741,6 +1742,80 @@ ggplot() + ) ``` +#### Compute racial or ethnic Entropy (*H*) + +Compute the racial or ethnic *H* values (2010-2014 5-year ACS) for Pennsylvania, U.S.A., Metropolitan Divisions from census tracts. This metric is based on Theil (1972; ISBN-13:978-0-444-10378-9) and [Theil & Finizza (1971)](https://doi.org/10.1080/0022250X.1971.9989795). Multiple racial or ethnic subgroups are available in the `theil()` function, including: + +| ACS table source | racial or ethnic subgroup | character for `subgroup` argument | +| -------------- | ------------- | ---------------- | +| B03002_002 | not Hispanic or Latino | NHoL | +| B03002_003 | not Hispanic or Latino, white alone | NHoLW | +| B03002_004 | not Hispanic or Latino, Black or African American alone | NHoLB | +| B03002_005 | not Hispanic or Latino, American Indian and Alaska Native alone | NHoLAIAN | +| B03002_006 | not Hispanic or Latino, Asian alone | NHoLA | +| B03002_007 | not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone | NHoLNHOPI | +| B03002_008 | not Hispanic or Latino, some other race alone | NHoLSOR | +| B03002_009 | not Hispanic or Latino, two or more races | NHoLTOMR | +| B03002_010 | not Hispanic or Latino, two races including some other race | NHoLTRiSOR | +| B03002_011 | not Hispanic or Latino, two races excluding some other race, and three or more races | NHoLTReSOR | +| B03002_012 | Hispanic or Latino | HoL | +| B03002_013 | Hispanic or Latino, white alone | HoLW | +| B03002_014 | Hispanic or Latino, Black or African American alone | HoLB | +| B03002_015 | Hispanic or Latino, American Indian and Alaska Native alone | HoLAIAN | +| B03002_016 | Hispanic or Latino, Asian alone | HoLA | +| B03002_017 | Hispanic or Latino, Native Hawaiian and Other Pacific 
Islander alone | HoLNHOPI | +| B03002_018 | Hispanic or Latino, some other race alone | HoLSOR | +| B03002_019 | Hispanic or Latino, two or more races | HoLTOMR | +| B03002_020 | Hispanic or Latino, two races including some other race | HoLTRiSOR | +| B03002_021 | Hispanic or Latino, two races excluding some other race, and three or more races | HoLTReSOR | + +*H* is a measure of the evenness of racial or ethnic residential segregation when comparing smaller geographical areas to larger ones within which the smaller geographical areas are located. *H* can range in value from 0 to 1 and represents the (weighted) average deviation of each smaller geographical unit from the larger geographical unit's "entropy" or racial and ethnic diversity, which is greatest when each group is equally represented in the larger geographical unit. *H* varies between 0, when all smaller geographical units have the same racial or ethnic composition as the larger geographical area (i.e., maximum integration), to a high of 1, when all smaller geographical units contain one group only (maximum segregation). 
+ +```{r theil_prep, results = 'hide'} +theil2014PA <- theil( + geo_large = 'metro', + geo_small = 'tract', + state = c('PA', 'NJ', 'DE', 'MD', 'OH', 'WV', 'NY', 'CT'), + year = 2014, + subgroup = c('NHoLB', 'HoLB') +) + +# Obtain the 2014 Metropolitan Divisions from the 'tigris' package +metro2014 <- metro_divisions(year = 2014) +# Obtain the state boundaries from the 'tigris' package +state2014 <- states(cb = TRUE) + +# Join the H values to the Metropolitan Division geometries and filter for Pennsylvania +PA2014theil <- metro2014 %>% + left_join(theil2014PA$h, by = 'GEOID') %>% + filter(!st_is_empty(.)) %>% + filter(!is.na(H)) %>% + st_filter(state2014 %>% filter(STUSPS == 'PA')) %>% + st_make_valid() +``` + +```{r theil_plot, fig.height = 4, fig.width = 7} +# Visualize the H values (2010-2014 5-year ACS) for Pennsylvania, U.S.A., metro divisions +ggplot() + + geom_sf( + data = PA2014theil, + aes(fill = H) + ) + + geom_sf( + data = state2014 %>% filter(STUSPS == 'PA'), + fill = 'transparent', + color = 'black', + size = 0.2 + ) + + theme_minimal() + + scale_fill_viridis_c(limits = c(0, 1)) + + labs(fill = 'Index (Continuous)', caption = 'Source: U.S. 
Census ACS 2010-2014 estimates') + + ggtitle( + 'Entropy (Theil)\nCensus tracts to Metro Divisions in Pennsylvania', + subtitle = 'Black population' + ) +``` + ```{r system} sessionInfo() ``` diff --git a/vignettes/vignette.html b/vignettes/vignette.html index a96c4e2..9d40805 100644 --- a/vignettes/vignette.html +++ b/vignettes/vignette.html @@ -1135,8 +1135,8 @@ <h4>Assign the referent (U.S.-Standardized Metric)</h4> <span id="cb30-7"><a href="#cb30-7" tabindex="-1"></a> )</span></code></pre></div> <p><img src="" /><!-- --></p> <p>The process to compute a US-standardized <em>NDI</em> (Powell-Wiley) -took about 3 minutes to run on a machine with the features listed at the -end of the vignette.</p> +took about 2.8 minutes to run on a machine with the features listed at +the end of the vignette.</p> </div> </div> <div id="additional-metrics-socio-economic-deprivation-and-disparity" class="section level3"> @@ -1165,30 +1165,33 @@ <h3>Additional metrics socio-economic deprivation and disparity</h3> <li><code>atkinson()</code> function that computes the Atkinson Index (<em>A</em>) based on <a href="https://doi.org/10.1016/0022-0531(70)90039-6">Atkinson (1970)</a></li> -<li><code>bell()</code> function that computes the aspatial racial or -ethnic Interaction Index (<em>xPy*</em>) based on Shevky & Williams -(1949; ISBN-13:978-0-837-15637-8) and <a href="https://doi.org/10.2307/2574118">Bell (1954)</a></li> -<li><code>white()</code> function that computes the aspatial racial or -ethnic Correlation Ratio (<em>V</em>) based on <a href="https://doi.org/10.2307/2574118">Bell (1954)</a> and <a href="https://doi.org/10.2307/3644339">White (1986)</a></li> -<li><code>sudano()</code> function that computes the aspatial racial or -ethnic Location Quotient (<em>LQ</em>) based on <a href="https://doi.org/10.2307/2084686">Merton (1939)</a> and <a href="https://doi.org/10.1016/j.healthplace.2012.09.015">Sudano et +<li><code>bell()</code> function that computes the racial or ethnic 
+Interaction Index (<em>xPy*</em>) based on Shevky & Williams (1949; +ISBN-13:978-0-837-15637-8) and <a href="https://doi.org/10.2307/2574118">Bell (1954)</a></li> +<li><code>white()</code> function that computes the racial or ethnic +Correlation Ratio (<em>V</em>) based on <a href="https://doi.org/10.2307/2574118">Bell (1954)</a> and <a href="https://doi.org/10.2307/3644339">White (1986)</a></li> +<li><code>sudano()</code> function that computes the racial or ethnic +Location Quotient (<em>LQ</em>) based on <a href="https://doi.org/10.2307/2084686">Merton (1939)</a> and <a href="https://doi.org/10.1016/j.healthplace.2012.09.015">Sudano et al. (2013)</a></li> -<li><code>bemanian_beyer()</code> function that computes the aspatial -racial or ethnic Local Exposure and Isolation (<em>LEx/Is</em>) metric -based on <a href="https://doi.org/10.1158/1055-9965.EPI-16-0926">Bemanian & +<li><code>bemanian_beyer()</code> function that computes the racial or +ethnic Local Exposure and Isolation (<em>LEx/Is</em>) metric based on <a href="https://doi.org/10.1158/1055-9965.EPI-16-0926">Bemanian & Beyer (2017)</a></li> -<li><code>hoover()</code> function that computes the aspatial racial or -ethnic Delta (<em>DEL</em>) based on <a href="https://doi.org/10.1017/S0022050700052980">Hoover (1941)</a> and +<li><code>hoover()</code> function that computes the racial or ethnic +Delta (<em>DEL</em>) based on <a href="https://doi.org/10.1017/S0022050700052980">Hoover (1941)</a> and Duncan et al. 
(1961; LC:60007089)</li> <li><code>white_blau()</code> function that computes an index of spatial proximity (<em>SP</em>) based on <a href="https://doi.org/10.2307/3644339">White (1986)</a> and Blau (1977; ISBN-13:978-0-029-03660-0)</li> -<li><code>lieberson()</code> function that computes the aspatial racial -or ethnic Isolation Index (<em>xPx*</em>) based on Lieberson (1981; +<li><code>lieberson()</code> function that computes the racial or ethnic +Isolation Index (<em>xPx*</em>) based on Lieberson (1981; ISBN-13:978-1-032-53884-6) and <a href="https://doi.org/10.2307/2574118">Bell (1954)</a></li> <li><code>james_taeuber()</code> function that computes the Dissimilarity Index (<em>D</em>) based on based on <a href="https://doi.org/10.2307/270845">James & Taeuber (1985)</a></li> +<li><code>theil()</code> function that computes the racial or ethnic +Entropy (<em>H</em>) based on Theil (1972; <a href="ISBN:978-0-444-10378-9" class="uri">ISBN:978-0-444-10378-9</a>) +and <a href="https://doi.org/10.1080/0022250X.1971.9989795">Theil & +Finizza (1971)</a></li> </ol> <div id="compute-racial-isolation-index-ri" class="section level4"> <h4>Compute Racial Isolation Index (<em>RI</em>)</h4> @@ -1511,7 +1514,7 @@ <h4>Compute Educational Isolation Index (<em>EI</em>)</h4> </div> <div id="the-racial-or-ethnic-gini-index-g" class="section level4"> <h4>The racial or ethnic Gini Index (<em>G</em>)</h4> -<p>Compute the aspatial racial or ethnic Gini Index (<em>G</em>) values +<p>Compute the racial or ethnic Gini Index (<em>G</em>) values (2006-2010 5-year ACS) for Massachusetts, U.S.A., census tracts within counties. This metric is based on <a href="https://doi.org/10.2307/2223319">Gini (1921)</a>. 
Multiple racial or ethnic subgroups are available in the <code>gini()</code> function, @@ -1683,12 +1686,12 @@ <h4>The racial or ethnic Gini Index (<em>G</em>)</h4> </div> <div id="the-income-gini-index-g" class="section level4"> <h4>The income Gini Index (<em>G</em>)</h4> -<p>Retrieve the aspatial income Gini Index (<em>G</em>) values -(2006-2010 5-year ACS) for Massachusetts, U.S.A., census tracts. This -metric is based on <a href="https://doi.org/10.2307/2223319">Gini -(1921)</a>, and the <code>gini()</code> function retrieves the estimate -from the ACS-5 when calculating the Gini Index (<em>G</em>) for racial -or ethnic inequality.</p> +<p>Retrieve the income Gini Index (<em>G</em>) values (2006-2010 5-year +ACS) for Massachusetts, U.S.A., census tracts. This metric is based on +<a href="https://doi.org/10.2307/2223319">Gini (1921)</a>, and the +<code>gini()</code> function retrieves the estimate from the ACS-5 when +calculating the Gini Index (<em>G</em>) for racial or ethnic +inequality.</p> <p>According to the <a href="https://census.gov/topics/income-poverty/income-inequality/about/metrics/gini-index.html">U.S. Census Bureau</a>: ‘The Gini Index is a summary measure of income inequality. The Gini coefficient incorporates the detailed shares data @@ -1742,10 +1745,10 @@ <h4>The income Gini Index (<em>G</em>)</h4> </div> <div id="index-of-concentration-at-the-extremes-ice" class="section level3"> <h3>Index of Concentration at the Extremes (<em>ICE</em>)</h3> -<p>Compute the aspatial Index of Concentration at the Extremes values -(2006-2010 5-year ACS) for Wayne County, Michigan, U.S.A., census -tracts. Wayne County is the home of Detroit, Michigan, a highly -segregated city in the U.S. This metric is based on <a href="https://doi.org/10.1136/jech-2015-205728">Feldman et +<p>Compute the Index of Concentration at the Extremes values (2006-2010 +5-year ACS) for Wayne County, Michigan, U.S.A., census tracts. 
Wayne +County is the home of Detroit, Michigan, a highly segregated city in the +U.S. This metric is based on <a href="https://doi.org/10.1136/jech-2015-205728">Feldman et al. (2015)</a> and <a href="https://doi.org/10.2105/AJPH.2015.302955">Krieger et al. (2016)</a> who expanded the metric designed by Massey in a chapter of <a href="https://doi.org/10.4324/9781410600141">Booth & Crouter @@ -1922,15 +1925,15 @@ <h3>Index of Concentration at the Extremes (<em>ICE</em>)</h3> <p><img src="" /><img src="" /><img src="" /><img src="" /><img src="" /></p> <div id="compute-racial-or-ethnic-dissimilarity-index-d" class="section level4"> <h4>Compute racial or ethnic Dissimilarity Index (<em>D</em>)</h4> -<p>Compute the aspatial racial or ethnic <em>D</em> values (2006-2010 -5-year ACS) for Pennsylvania, U.S.A., counties from census tracts. This -metric is based on <a href="https://doi.org/10.2307/2088328">Duncan -& Duncan (1955)</a> that assessed the racial or ethnic isolation of -students that identify as non-Hispanic or Latino, Black or African -American alone compared to students that identify as non-Hispanic or -Latino, white alone between schools and school districts. Multiple -racial or ethnic subgroups are available in the <code>duncan()</code> -function, including:</p> +<p>Compute the racial or ethnic <em>D</em> values (2006-2010 5-year ACS) +for Pennsylvania, U.S.A., counties from census tracts. This metric is +based on <a href="https://doi.org/10.2307/2088328">Duncan & Duncan +(1955)</a> that assessed the racial or ethnic isolation of students that +identify as non-Hispanic or Latino, Black or African American alone +compared to students that identify as non-Hispanic or Latino, white +alone between schools and school districts. 
Multiple racial or ethnic +subgroups are available in the <code>duncan()</code> function, +including:</p> <table> <colgroup> <col width="32%" /> @@ -2099,12 +2102,11 @@ <h4>Compute racial or ethnic Dissimilarity Index (<em>D</em>)</h4> <span id="cb44-21"><a href="#cb44-21" tabindex="-1"></a> )</span></code></pre></div> <p><img src="" /><!-- --></p> </div> -<div id="compute-aspatial-income-or-racial-or-ethnic-atkinson-index-a" class="section level4"> -<h4>Compute aspatial income or racial or ethnic Atkinson Index -(<em>A</em>)</h4> -<p>Compute the aspatial income or racial or ethnic <em>A</em> values -(2017-2021 5-year ACS) for Kentucky, U.S.A., counties from census block -groups. This metric is based on <a href="https://doi.org/10.2307/2088328">Atkinson (1970)</a> that assessed +<div id="compute-income-or-racial-or-ethnic-atkinson-index-a" class="section level4"> +<h4>Compute income or racial or ethnic Atkinson Index (<em>A</em>)</h4> +<p>Compute the income or racial or ethnic <em>A</em> values (2017-2021 +5-year ACS) for Kentucky, U.S.A., counties from census block groups. +This metric is based on <a href="https://doi.org/10.2307/2088328">Atkinson (1970)</a> that assessed the distribution of income within 12 counties but has since been adapted to study racial or ethnic segregation (see <a href="https://doi.org/10.2307/270845">James & Taeuber 1985</a>). To compare median household income, specify @@ -2297,12 +2299,12 @@ <h4>Compute aspatial income or racial or ethnic Atkinson Index </div> <div id="compute-racial-or-ethnic-interaction-index-xpy" class="section level4"> <h4>Compute racial or ethnic Interaction Index (<em>xPy*</em>)</h4> -<p>Compute the aspatial racial or ethnic <em>xPy*</em> values (2017-2021 -5-year ACS) for Ohio, U.S.A., counties from census tracts. This metric -is based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and -adapted by <a href="https://doi.org/10.2307/2574118">Bell (1954)</a>. 
-Multiple racial or ethnic subgroups are available in the -<code>bell()</code> function, including:</p> +<p>Compute the racial or ethnic <em>xPy*</em> values (2017-2021 5-year +ACS) for Ohio, U.S.A., counties from census tracts. This metric is based +on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and adapted +by <a href="https://doi.org/10.2307/2574118">Bell (1954)</a>. Multiple +racial or ethnic subgroups are available in the <code>bell()</code> +function, including:</p> <table> <colgroup> <col width="32%" /> @@ -2470,12 +2472,12 @@ <h4>Compute racial or ethnic Interaction Index (<em>xPy*</em>)</h4> </div> <div id="compute-correlation-ratio-v" class="section level4"> <h4>Compute Correlation Ratio (<em>V</em>)</h4> -<p>Compute the aspatial racial or ethnic <em>V</em> values (2017-2021 -5-year ACS) for South Carolina, U.S.A., counties from census tracts. -This metric is based on <a href="https://doi.org/10.2307/2574118">Bell -(1954)</a> and adapted by <a href="https://doi.org/10.2307/3644339">White (1986)</a>. Multiple racial -or ethnic subgroups are available in the <code>white()</code> function, -including:</p> +<p>Compute the racial or ethnic <em>V</em> values (2017-2021 5-year ACS) +for South Carolina, U.S.A., counties from census tracts. This metric is +based on <a href="https://doi.org/10.2307/2574118">Bell (1954)</a> and +adapted by <a href="https://doi.org/10.2307/3644339">White (1986)</a>. +Multiple racial or ethnic subgroups are available in the +<code>white()</code> function, including:</p> <table> <colgroup> <col width="32%" /> @@ -2648,10 +2650,10 @@ <h4>Compute Correlation Ratio (<em>V</em>)</h4> </div> <div id="compute-location-quotient-lq" class="section level4"> <h4>Compute Location Quotient (<em>LQ</em>)</h4> -<p>Compute the aspatial racial or ethnic <em>LQ</em> values (2017-2021 -5-year ACS) for Tennessee, U.S.A., counties vs. the state. 
This metric -is based on <a href="https://doi.org/10.2307/2084686">Merton (1939)</a> -and adapted by <a href="https://doi.org/10.1016/j.healthplace.2012.09.015">Sudano et +<p>Compute the racial or ethnic <em>LQ</em> values (2017-2021 5-year +ACS) for Tennessee, U.S.A., counties vs. the state. This metric is based +on <a href="https://doi.org/10.2307/2084686">Merton (1939)</a> and +adapted by <a href="https://doi.org/10.1016/j.healthplace.2012.09.015">Sudano et al. (2013)</a>. Multiple racial or ethnic subgroups are available in the <code>sudano()</code> function, including:</p> <table> @@ -2824,9 +2826,9 @@ <h4>Compute Location Quotient (<em>LQ</em>)</h4> </div> <div id="compute-local-exposure-and-isolation-lexis" class="section level4"> <h4>Compute Local Exposure and Isolation (<em>LEx/Is</em>)</h4> -<p>Compute the aspatial racial or ethnic Local Exposure and Isolation -metric (2017-2021 5-year ACS) for Mississippi, U.S.A., counties vs. the -state. This metric is based on <a href="https://doi.org/10.1158/1055-9965.EPI-16-0926">Bemanian & +<p>Compute the racial or ethnic Local Exposure and Isolation metric +(2017-2021 5-year ACS) for Mississippi, U.S.A., counties vs. the state. +This metric is based on <a href="https://doi.org/10.1158/1055-9965.EPI-16-0926">Bemanian & Beyer (2017)</a>. Multiple racial or ethnic subgroups are available in the <code>bemanian_beyer()</code> function, including:</p> <table> @@ -3039,11 +3041,12 @@ <h4>Compute Local Exposure and Isolation (<em>LEx/Is</em>)</h4> </div> <div id="compute-delta-del" class="section level4"> <h4>Compute Delta (<em>DEL</em>)</h4> -<p>Compute the aspatial racial or ethnic <em>DEL</em> values (2017-2021 -5-year ACS) for Alabama, U.S.A., counties from census tracts. This -metric is based on <a href="https://doi.org/10.1017/S0022050700052980">Hoover (1941)</a> and -Duncan et al. (1961; LC:60007089). 
Multiple racial or ethnic subgroups -are available in the <code>hoover()</code> function, including:</p> +<p>Compute the racial or ethnic <em>DEL</em> values (2017-2021 5-year +ACS) for Alabama, U.S.A., counties from census tracts. This metric is +based on <a href="https://doi.org/10.1017/S0022050700052980">Hoover +(1941)</a> and Duncan et al. (1961; LC:60007089). Multiple racial or +ethnic subgroups are available in the <code>hoover()</code> function, +including:</p> <table> <colgroup> <col width="32%" /> @@ -3396,12 +3399,12 @@ <h4>Compute an index of spatial proximity (<em>SP</em>)</h4> </div> <div id="compute-racial-or-ethnic-isolation-index-xpx" class="section level4"> <h4>Compute racial or ethnic Isolation Index (<em>xPx*</em>)</h4> -<p>Compute the aspatial racial or ethnic <em>xPx*</em> values (2015-2019 -5-year ACS) for Delaware, U.S.A., census tracts from census block -groups. This metric is based on <a href="https://doi.org/10.2307/2574118">Bell (1954)</a> and adapted by -Lieberson (1981; ISBN-13:978-1-032-53884-6). Multiple racial or ethnic -subgroups are available in the <code>lieberson()</code> function, -including:</p> +<p>Compute the racial or ethnic <em>xPx*</em> values (2015-2019 5-year +ACS) for Delaware, U.S.A., census tracts from census block groups. This +metric is based on <a href="https://doi.org/10.2307/2574118">Bell +(1954)</a> and adapted by Lieberson (1981; ISBN-13:978-1-032-53884-6). +Multiple racial or ethnic subgroups are available in the +<code>lieberson()</code> function, including:</p> <table> <colgroup> <col width="32%" /> @@ -3567,11 +3570,11 @@ <h4>Compute racial or ethnic Isolation Index (<em>xPx*</em>)</h4> </div> <div id="compute-racial-or-ethnic-dissimilarity-index-d-1" class="section level4"> <h4>Compute racial or ethnic Dissimilarity Index (<em>D</em>)</h4> -<p>Compute the aspatial racial or ethnic <em>D</em> values (2006-2010 -5-year ACS) for Pennsylvania, U.S.A., counties from census tracts. 
This -metric is based on <a href="https://doi.org/10.2307/270845">James & -Taeuber (1985)</a>. Multiple racial or ethnic subgroups are available in -the <code>james_taeuber()</code> function, including:</p> +<p>Compute the racial or ethnic <em>D</em> values (2006-2010 5-year ACS) +for Pennsylvania, U.S.A., counties from census tracts. This metric is +based on <a href="https://doi.org/10.2307/270845">James & Taeuber +(1985)</a>. Multiple racial or ethnic subgroups are available in the +<code>james_taeuber()</code> function, including:</p> <table> <colgroup> <col width="32%" /> @@ -3737,7 +3740,187 @@ <h4>Compute racial or ethnic Dissimilarity Index (<em>D</em>)</h4> <span id="cb63-20"><a href="#cb63-20" tabindex="-1"></a> <span class="at">subtitle =</span> <span class="st">'Black population'</span></span> <span id="cb63-21"><a href="#cb63-21" tabindex="-1"></a> )</span></code></pre></div> <p><img src="" /><!-- --></p> -<div class="sourceCode" id="cb64"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb64-1"><a href="#cb64-1" tabindex="-1"></a><span class="fu">sessionInfo</span>()</span></code></pre></div> +</div> +<div id="compute-racial-or-ethnic-entropy-h" class="section level4"> +<h4>Compute racial or ethnic Entropy (<em>H</em>)</h4> +<p>Compute Entropy (2010-2014 5-year ACS) for Philadelphia, PA, +metropolitan area from census tracts. This metric is based on based on +Theil (1972; <a href="ISBN:978-0-444-10378-9" class="uri">ISBN:978-0-444-10378-9</a>) and <a href="https://doi.org/110.1080/0022250X.1971.9989795">Theil & +Finizza (1971)</a>. 
Multiple racial or ethnic subgroups are available in +the <code>theil()</code> function, including:</p> +<table> +<colgroup> +<col width="32%" /> +<col width="30%" /> +<col width="37%" /> +</colgroup> +<thead> +<tr class="header"> +<th>ACS table source</th> +<th>racial or ethnic subgroup</th> +<th>character for <code>subgroup</code>arguments</th> +</tr> +</thead> +<tbody> +<tr class="odd"> +<td>B03002_002</td> +<td>not Hispanic or Latino</td> +<td>NHoL</td> +</tr> +<tr class="even"> +<td>B03002_003</td> +<td>not Hispanic or Latino, white alone</td> +<td>NHoLW</td> +</tr> +<tr class="odd"> +<td>B03002_004</td> +<td>not Hispanic or Latino, Black or African American alone</td> +<td>NHoLB</td> +</tr> +<tr class="even"> +<td>B03002_005</td> +<td>not Hispanic or Latino, American Indian and Alaska Native alone</td> +<td>NHoLAIAN</td> +</tr> +<tr class="odd"> +<td>B03002_006</td> +<td>not Hispanic or Latino, Asian alone</td> +<td>NHoLA</td> +</tr> +<tr class="even"> +<td>B03002_007</td> +<td>not Hispanic or Latino, Native Hawaiian and Other Pacific Islander +alone</td> +<td>NHoLNHOPI</td> +</tr> +<tr class="odd"> +<td>B03002_008</td> +<td>not Hispanic or Latino, some other race alone</td> +<td>NHoLSOR</td> +</tr> +<tr class="even"> +<td>B03002_009</td> +<td>not Hispanic or Latino, two or more races</td> +<td>NHoLTOMR</td> +</tr> +<tr class="odd"> +<td>B03002_010</td> +<td>not Hispanic or Latino, two races including some other race</td> +<td>NHoLTRiSOR</td> +</tr> +<tr class="even"> +<td>B03002_011</td> +<td>not Hispanic or Latino, two races excluding some other race, and +three or more races</td> +<td>NHoLTReSOR</td> +</tr> +<tr class="odd"> +<td>B03002_012</td> +<td>Hispanic or Latino</td> +<td>HoL</td> +</tr> +<tr class="even"> +<td>B03002_013</td> +<td>Hispanic or Latino, white alone</td> +<td>HoLW</td> +</tr> +<tr class="odd"> +<td>B03002_014</td> +<td>Hispanic or Latino, Black or African American alone</td> +<td>HoLB</td> +</tr> +<tr class="even"> +<td>B03002_015</td> 
+<td>Hispanic or Latino, American Indian and Alaska Native alone</td> +<td>HoLAIAN</td> +</tr> +<tr class="odd"> +<td>B03002_016</td> +<td>Hispanic or Latino, Asian alone</td> +<td>HoLA</td> +</tr> +<tr class="even"> +<td>B03002_017</td> +<td>Hispanic or Latino, Native Hawaiian and other Pacific Islander +alone</td> +<td>HoLNHOPI</td> +</tr> +<tr class="odd"> +<td>B03002_018</td> +<td>Hispanic or Latino, some other race alone</td> +<td>HoLSOR</td> +</tr> +<tr class="even"> +<td>B03002_019</td> +<td>Hispanic or Latino, two or more races</td> +<td>HoLTOMR</td> +</tr> +<tr class="odd"> +<td>B03002_020</td> +<td>Hispanic or Latino, two races including some other race</td> +<td>HoLTRiSOR</td> +</tr> +<tr class="even"> +<td>B03002_021</td> +<td>Hispanic or Latino, two races excluding some other race, and three +or more races</td> +<td>HoLTReSOR</td> +</tr> +</tbody> +</table> +<p><em>H</em> is a measure of the evenness of racial or ethnic +residential segregation when comparing smaller geographical areas to +larger ones within which the smaller geographical areas are located. +<em>H</em> can range in value from 0 to 1 and represents the (weighted) +average deviation of each smaller geographical unit from the larger +geographical unit’s “entropy” or racial and ethnic diversity, which is +greatest when each group is equally represented in the larger +geographical unit. 
<em>H</em> ranges from 0, when all smaller +geographical units have the same racial or ethnic composition as the +larger geographical area (i.e., maximum integration), to a high of 1, +when all smaller geographical units contain one group only (i.e., maximum +segregation).</p> +<div class="sourceCode" id="cb64"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb64-1"><a href="#cb64-1" tabindex="-1"></a>theil2014PA <span class="ot"><-</span> <span class="fu">theil</span>(</span> +<span id="cb64-2"><a href="#cb64-2" tabindex="-1"></a> <span class="at">geo_large =</span> <span class="st">'metro'</span>,</span> +<span id="cb64-3"><a href="#cb64-3" tabindex="-1"></a> <span class="at">geo_small =</span> <span class="st">'tract'</span>,</span> +<span id="cb64-4"><a href="#cb64-4" tabindex="-1"></a> <span class="at">state =</span> <span class="fu">c</span>(<span class="st">'PA'</span>, <span class="st">'NJ'</span>, <span class="st">'DE'</span>, <span class="st">'MD'</span>, <span class="st">'OH'</span>, <span class="st">'WV'</span>, <span class="st">'NY'</span>, <span class="st">'CT'</span>),</span> +<span id="cb64-5"><a href="#cb64-5" tabindex="-1"></a> <span class="at">year =</span> <span class="dv">2014</span>,</span> +<span id="cb64-6"><a href="#cb64-6" tabindex="-1"></a> <span class="at">subgroup =</span> <span class="fu">c</span>(<span class="st">'NHoLB'</span>, <span class="st">'HoLB'</span>)</span> +<span id="cb64-7"><a href="#cb64-7" tabindex="-1"></a>)</span> +<span id="cb64-8"><a href="#cb64-8" tabindex="-1"></a></span> +<span id="cb64-9"><a href="#cb64-9" tabindex="-1"></a><span class="co"># Obtain the 2014 Metropolitan Divisions from the 'tigris' package</span></span> +<span id="cb64-10"><a href="#cb64-10" tabindex="-1"></a>metro2014 <span class="ot"><-</span> <span class="fu">metro_divisions</span>(<span class="at">year =</span> <span class="dv">2014</span>)</span> +<span id="cb64-11"><a href="#cb64-11" tabindex="-1"></a><span class="co"># 
Obtain the 2014 state from the 'tigris' package</span></span> +<span id="cb64-12"><a href="#cb64-12" tabindex="-1"></a>state2014 <span class="ot"><-</span> <span class="fu">states</span>(<span class="at">cb =</span> <span class="cn">TRUE</span>)</span> +<span id="cb64-13"><a href="#cb64-13" tabindex="-1"></a></span> +<span id="cb64-14"><a href="#cb64-14" tabindex="-1"></a><span class="co"># Join the H values to the Metropolitan Division geometries and filter for Pennsylvania</span></span> +<span id="cb64-15"><a href="#cb64-15" tabindex="-1"></a>PA2010theil <span class="ot"><-</span> metro2014 <span class="sc">%>%</span></span> +<span id="cb64-16"><a href="#cb64-16" tabindex="-1"></a> <span class="fu">left_join</span>(theil2014PA<span class="sc">$</span>h, <span class="at">by =</span> <span class="st">'GEOID'</span>) <span class="sc">%>%</span></span> +<span id="cb64-17"><a href="#cb64-17" tabindex="-1"></a> <span class="fu">filter</span>(<span class="sc">!</span><span class="fu">st_is_empty</span>(.)) <span class="sc">%>%</span></span> +<span id="cb64-18"><a href="#cb64-18" tabindex="-1"></a> <span class="fu">filter</span>(<span class="sc">!</span><span class="fu">is.na</span>(H)) <span class="sc">%>%</span></span> +<span id="cb64-19"><a href="#cb64-19" tabindex="-1"></a> <span class="fu">st_filter</span>(state2014 <span class="sc">%>%</span> <span class="fu">filter</span>(STUSPS <span class="sc">==</span> <span class="st">'PA'</span>)) <span class="sc">%>%</span></span> +<span id="cb64-20"><a href="#cb64-20" tabindex="-1"></a> <span class="fu">st_make_valid</span>()</span></code></pre></div> +<div class="sourceCode" id="cb65"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb65-1"><a href="#cb65-1" tabindex="-1"></a><span class="co"># Visualize the H values (2010-2014 5-year ACS) for Pennsylvania, U.S.A., metro divisions </span></span> +<span id="cb65-2"><a href="#cb65-2" tabindex="-1"></a><span class="fu">ggplot</span>() <span class="sc">+</span></span> +<span 
id="cb65-3"><a href="#cb65-3" tabindex="-1"></a> <span class="fu">geom_sf</span>(</span> +<span id="cb65-4"><a href="#cb65-4" tabindex="-1"></a> <span class="at">data =</span> PA2010theil,</span> +<span id="cb65-5"><a href="#cb65-5" tabindex="-1"></a> <span class="fu">aes</span>(<span class="at">fill =</span> H)</span> +<span id="cb65-6"><a href="#cb65-6" tabindex="-1"></a> ) <span class="sc">+</span></span> +<span id="cb65-7"><a href="#cb65-7" tabindex="-1"></a> <span class="fu">geom_sf</span>(</span> +<span id="cb65-8"><a href="#cb65-8" tabindex="-1"></a> <span class="at">data =</span> state2014 <span class="sc">%>%</span> <span class="fu">filter</span>(STUSPS <span class="sc">==</span> <span class="st">'PA'</span>),</span> +<span id="cb65-9"><a href="#cb65-9" tabindex="-1"></a> <span class="at">fill =</span> <span class="st">'transparent'</span>,</span> +<span id="cb65-10"><a href="#cb65-10" tabindex="-1"></a> <span class="at">color =</span> <span class="st">'black'</span>,</span> +<span id="cb65-11"><a href="#cb65-11" tabindex="-1"></a> <span class="at">size =</span> <span class="fl">0.2</span></span> +<span id="cb65-12"><a href="#cb65-12" tabindex="-1"></a> ) <span class="sc">+</span></span> +<span id="cb65-13"><a href="#cb65-13" tabindex="-1"></a> <span class="fu">theme_minimal</span>() <span class="sc">+</span></span> +<span id="cb65-14"><a href="#cb65-14" tabindex="-1"></a> <span class="fu">scale_fill_viridis_c</span>(<span class="at">limits =</span> <span class="fu">c</span>(<span class="dv">0</span>, <span class="dv">1</span>)) <span class="sc">+</span></span> +<span id="cb65-15"><a href="#cb65-15" tabindex="-1"></a> <span class="fu">labs</span>(<span class="at">fill =</span> <span class="st">'Index (Continuous)'</span>, <span class="at">caption =</span> <span class="st">'Source: U.S. 
Census ACS 2010-2014 estimates'</span>) <span class="sc">+</span></span> +<span id="cb65-16"><a href="#cb65-16" tabindex="-1"></a> <span class="fu">ggtitle</span>(</span> +<span id="cb65-17"><a href="#cb65-17" tabindex="-1"></a> <span class="st">'Entropy (Theil)</span><span class="sc">\n</span><span class="st">Census tracts to Metro Divisions in Pennsylvania'</span>,</span> +<span id="cb65-18"><a href="#cb65-18" tabindex="-1"></a> <span class="at">subtitle =</span> <span class="st">'Black population'</span></span> +<span id="cb65-19"><a href="#cb65-19" tabindex="-1"></a> )</span></code></pre></div> +<p><img src="" /><!-- --></p> +<div class="sourceCode" id="cb66"><pre class="sourceCode r"><code class="sourceCode r"><span id="cb66-1"><a href="#cb66-1" tabindex="-1"></a><span class="fu">sessionInfo</span>()</span></code></pre></div> <pre><code>## R version 4.4.1 (2024-06-14 ucrt) ## Platform: x86_64-w64-mingw32/x64 ## Running under: Windows 10 x64 (build 19045) @@ -3759,7 +3942,7 @@ <h4>Compute racial or ethnic Dissimilarity Index (<em>D</em>)</h4> ## [1] stats graphics grDevices utils datasets methods base ## ## other attached packages: -## [1] tigris_2.1 tidycensus_1.6.5 sf_1.0-16 ndi_0.1.6.9005 +## [1] tigris_2.1 tidycensus_1.6.5 sf_1.0-16 ndi_0.1.6.9006 ## [5] ggplot2_3.5.1 dplyr_1.1.4 knitr_1.48 ## ## loaded via a namespace (and not attached):