diff --git a/DESCRIPTION b/DESCRIPTION
index ca8e078..a0b6915 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,7 +1,7 @@
 Package: ndi
 Title: Neighborhood Deprivation Indices
-Version: 0.1.5
-Date: 2024-01-23
+Version: 0.1.6.9000
+Date: 2024-07-06
 Authors@R:
     c(person(given = "Ian D.",
              family = "Buller",
@@ -39,16 +39,17 @@ Description: Computes various metrics of socio-economic deprivation and disparit
     based on Bell (1954) and White (1986)
     , (8) aspatial racial/ethnic Location Quotient (LQ) based on Merton (1939)
     and Sudano et al. (2013)
-    , and (9) aspatial racial/ethnic Local
+    , (9) aspatial racial/ethnic Local
     Exposure and Isolation metric based on Bemanian & Beyer (2017)
-    . Also using data from the ACS-5 (2005-2009
-    onward), the package can retrieve the aspatial Gini Index based Gini (1921)
-    .
+    , and (10) aspatial racial/ethnic Delta based on
+    Hoover (1941) and Duncan et al. (1961; LC:60007089).
+    Also using data from the ACS-5 (2005-2009 onward), the package can retrieve the
+    aspatial Gini Index based Gini (1921) .
 License: Apache License (>= 2.0)
 Encoding: UTF-8
 LazyData: true
 Roxygen: list(markdown = TRUE)
-RoxygenNote: 7.2.3
+RoxygenNote: 7.3.2
 Depends:
     R (>= 3.5.0)
 Imports:
diff --git a/NAMESPACE b/NAMESPACE
index c648ea6..71b1c6a 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -7,6 +7,7 @@ export(bemanian_beyer)
 export(bravo)
 export(duncan)
 export(gini)
+export(hoover)
 export(krieger)
 export(messer)
 export(powell_wiley)
diff --git a/NEWS.md b/NEWS.md
index db0146a..ad665cf 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,5 +1,18 @@
 # ndi (development version)
 
+## ndi v0.1.6.9000
+
+### New Features
+* Added `hoover()` function to compute the aspatial racial/ethnic Delta (DEL) based on [Hoover (1941)](https://doi.org/10.1017/S0022050700052980) and Duncan et al. (1961; LC:60007089)
+* Thank you for the feature suggestion, [Symielle Gaston](https://orcid.org/0000-0001-9495-1592)
+
+### Updates
+* Fixed bug in `bell()`, `bemanian_beyer()`, `duncan()`, `sudano()`, and `white()` when a smaller geography contains n=0 total population, will assign a value of zero (0) in the internal calculation instead of NA
+* 'package.R' deprecated. Replaced with 'ndi-package.R'.
+* Re-formatted code and documentation throughout for consistent readability
+* Updated documentation about value range of V (White) from `{0 to 1}` to `{-Inf to Inf}`
+* Updated examples in vignette (& README) an example for `hoover()` and a larger variety of U.S. states
+
 ## ndi v0.1.5
 
 ### New Features
@@ -9,7 +22,7 @@
 * 'DescTools' is now Suggests to fix Rd cross-references NOTE
 * Fixed 'lost braces in \itemize' NOTE for `anthopolos()`, `atkinson()`, `bell()`, `bemanian_beyer()`, `bravo()`, `duncan()`, `krieger()`, `messer()`, `powell_wiley()`, `sudano()`, and `white()` functions
 * Fixed 'Moved Permanently' content by replacing the old URL with the new URL
-* Fixed citation for Slotman _et al._ (2022) in CITATION
+* Fixed citation for Slotman et al. (2022) in CITATION
 
 ## ndi v0.1.4
 
@@ -17,7 +30,7 @@
 * Added `atkinson()` function to compute the aspatial income or racial/ethnic Atkinson Index (AI) based on [Atkinson (1970)](https://doi.org/10.1016/0022-0531(70)90039-6) for specified counties/tracts 2009 onward
 * Added `bell()` function to compute the aspatial racial/ethnic Isolation Index (II) based on Shevky & Williams (1949; ISBN-13:978-0837156378) and [Bell (1954)](https://doi.org/10.2307/2574118)
 * Added `white()` function to compute the aspatial racial/ethnic Correlation Ratio (V) based on [Bell (1954)](https://doi.org/10.2307/2574118) and [White (1986)](https://doi.org/10.2307/3644339)
-* Added `sudano()` function to compute the aspatial racial/ethnic Location Quotient (LQ) based on [Merton (1939)](https://doi.org/10.2307/2084686) and [Sudano _et al._ (2013)](https://doi.org/10.1016/j.healthplace.2012.09.015)
+* Added `sudano()` function to compute the aspatial racial/ethnic Location Quotient (LQ) based on [Merton (1939)](https://doi.org/10.2307/2084686) and [Sudano et al. (2013)](https://doi.org/10.1016/j.healthplace.2012.09.015)
 * Added `bemanian_beyer()` function to compute the aspatial racial/ethnic Local Exposure and Isolation (LEx/Is) metric based on [Bemanian & Beyer (2017)](https://doi.org/10.1158/1055-9965.EPI-16-0926)
 
 ### Updates
@@ -25,7 +38,7 @@
 * Fixed bug in reverse dependency check failure for `anthopolos()` and `bravo()` functions removing `returnValue()` when data are not missing
 * Thank you, [Roger Bivand](https://github.com/rsbivand), for the catch. Relates to [ndi Issue #5](https://github.com/idblr/ndi/issues/5)
 * Updated `duncan()`, `gini()`, `krieger()`, `messer()`, and `powell_wiley()` for consistency in messaging when data are not missing
-* Updated tests for `anthopolos()` and `bravo()` if `Sys.getenv("CENSUS_API_KEY") != ""`
+* Updated tests for `anthopolos()` and `bravo()` if `Sys.getenv('CENSUS_API_KEY') != ''`
 * Added `omit_NAs` argument in `duncan()` function to choose if NA values will be included in its computation
 * In `duncan()` function, if any smaller geographic unit has zero counts the output for its larger geographic unit will be NA
 * Fixed bug in `duncan()` function for multiple `subgroup` and `subgroup_ref` selections
@@ -41,7 +54,7 @@
 * Added 'utils.R' file with internal `di_fun()` function for `duncan()` function
 
 ### Updates
-* Fixed bug in `bravo()` function where ACS-5 data (2005-2009) are from the "B15002" question and "B06009" after
+* Fixed bug in `bravo()` function where ACS-5 data (2005-2009) are from the 'B15002' question and 'B06009' after
 * Fixed bug in missingness warning for all metrics
 * `utils` is now Imports
 * Updated vignette and README with new features
@@ -53,7 +66,7 @@
 ## ndi v0.1.2
 
 ### New Features
-* Added `krieger()` function to compute the Index of Concentration at the Extremes (ICE) based on [Feldman _et al._ (2015)](https://doi.org/10.1136/jech-2015-205728) and [Krieger _et al._ (2016)](https://doi.org/10.2105/AJPH.2015.302955) for specified counties/tracts 2009 onward
+* Added `krieger()` function to compute the Index of Concentration at the Extremes (ICE) based on [Feldman et al. (2015)](https://doi.org/10.1136/jech-2015-205728) and [Krieger et al. (2016)](https://doi.org/10.2105/AJPH.2015.302955) for specified counties/tracts 2009 onward
 * Thank you for the feature suggestion, [David Berrigan](https://orcid.org/0000-0002-5333-179X)
 * Added `df` argument for the `messer()` and `powell_wiley()` functions to specify a pre-formatted data set input for the NDI computation
 * Added `round_output` argument for the `messer()` and `powell_wiley()` functions to provide raw output as the default and rounded output as optional.
@@ -64,7 +77,7 @@
 * Fixed bug in `powell_wiley()` function where the internal PCA will now run properly if only one factor has an eigenvalue above 1
 * Optimized the code to calculate missingness in all functions
 * Thank you for the suggested bug fixes, [Jacob Englert](https://github.com/jacobenglert)
-* Fixed bug in `powell_wiley()` function where "PctNoPhone" before 2015 is "DP04_0074PE" and "DP04_0075PE" after
+* Fixed bug in `powell_wiley()` function where 'PctNoPhone' before 2015 is 'DP04_0074PE' and 'DP04_0075PE' after
 * Thank you for alerting this issue, [Jessica Gleason](https://orcid.org/0000-0001-9877-7931)
 * Relaxed `year` argument in functions to include any year after 2009 or 2010 for the indices
 * Cleaned-up output formatting in functions
@@ -79,8 +92,8 @@
 ## ndi v0.1.1
 
 ### New Features
-* Added `anthopolos()` function to compute the Racial Isolation Index (RI) based on based on [Anthopolos _et al._ (2011)](https://doi.org/10.1016/j.sste.2011.06.002) for specified counties/tracts 2009 onward
-* Added `bravo()` function to compute the Educational Isolation Index (EI) based on based on [Bravo _et al._ (2021)](https://doi.org/10.3390/ijerph18179384) for specified counties/tracts 2009 onward
+* Added `anthopolos()` function to compute the Racial Isolation Index (RI) based on based on [Anthopolos et al. (2011)](https://doi.org/10.1016/j.sste.2011.06.002) for specified counties/tracts 2009 onward
+* Added `bravo()` function to compute the Educational Isolation Index (EI) based on based on [Bravo et al. (2021)](https://doi.org/10.3390/ijerph18179384) for specified counties/tracts 2009 onward
 * Added `gini()` function to retrieve the Gini Index based on [Gini (1921)](https://doi.org/10.2307/2223319) for specified counties/tracts 2009 onward
 * Thank you for the feature suggestions, [Jessica Madrigal](https://orcid.org/0000-0001-5303-5109)
 
diff --git a/R/DCtracts2020.R b/R/DCtracts2020.R
index d085901..1e466bf 100644
--- a/R/DCtracts2020.R
+++ b/R/DCtracts2020.R
@@ -32,4 +32,4 @@
 #' head(DCtracts2020)
 #'
 #' @source \url{https://github.com/idblr/ndi/blob/master/README.md}
-"DCtracts2020"
+'DCtracts2020'
diff --git a/R/anthopolos.R b/R/anthopolos.R
index fe301a1..5b7c95a 100644
--- a/R/anthopolos.R
+++ b/R/anthopolos.R
@@ -1,50 +1,50 @@
-#' Racial Isolation Index based on Anthopolos _et al._ (2011)
-#'
+#' Racial Isolation Index based on Anthopolos et al. (2011)
+#'
 #' Compute the spatial Racial Isolation Index (Anthopolos) of selected subgroup(s).
 #'
-#' @param geo Character string specifying the geography of the data either census tracts \code{geo = "tract"} (the default) or counties \code{geo = "county"}.
+#' @param geo Character string specifying the geography of the data either census tracts \code{geo = 'tract'} (the default) or counties \code{geo = 'county'}.
 #' @param year Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available.
 #' @param subgroup Character string specifying the racial/ethnic subgroup(s). See Details for available choices.
 #' @param quiet Logical. If TRUE, will display messages about potential missing census information. The default is FALSE.
 #' @param ... Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics
 #'
-#' @details This function will compute the spatial Racial Isolation Index (RI) of U.S. census tracts or counties for a specified geographical extent (e.g., the entire U.S. or a single state) based on Anthopolos _et al._ (2011) \doi{10.1016/j.sste.2011.06.002} who originally designed the metric for the racial isolation of non-Hispanic Black individuals. This function provides the computation of RI for any of the U.S. Census Bureau race/ethnicity subgroups (including Hispanic and non-Hispanic individuals).
-#'
+#' @details This function will compute the spatial Racial Isolation Index (RI) of U.S. census tracts or counties for a specified geographical extent (e.g., the entire U.S. or a single state) based on Anthopolos et al. (2011) \doi{10.1016/j.sste.2011.06.002} who originally designed the metric for the racial isolation of non-Hispanic Black individuals. This function provides the computation of RI for any of the U.S. Census Bureau race/ethnicity subgroups (including Hispanic and non-Hispanic individuals).
+#'
 #' The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the geospatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available but are available from other U.S. Census Bureau surveys. The twenty racial/ethnic subgroups (U.S. Census Bureau definitions) are:
 #' \itemize{
-#'  \item **B03002_002**: not Hispanic or Latino \code{"NHoL"}
-#'  \item **B03002_003**: not Hispanic or Latino, white alone\code{"NHoLW"}
-#'  \item **B03002_004**: not Hispanic or Latino, Black or African American alone \code{"NHoLB"}
-#'  \item **B03002_005**: not Hispanic or Latino, American Indian and Alaska Native alone \code{"NHoLAIAN"}
-#'  \item **B03002_006**: not Hispanic or Latino, Asian alone \code{"NHoLA"}
-#'  \item **B03002_007**: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"NHoLNHOPI"}
-#'  \item **B03002_008**: not Hispanic or Latino, Some other race alone \code{"NHoLSOR"}
-#'  \item **B03002_009**: not Hispanic or Latino, Two or more races \code{"NHoLTOMR"}
-#'  \item **B03002_010**: not Hispanic or Latino, Two races including Some other race \code{"NHoLTRiSOR"}
-#'  \item **B03002_011**: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"NHoLTReSOR"}
-#'  \item **B03002_012**: Hispanic or Latino \code{"HoL"}
-#'  \item **B03002_013**: Hispanic or Latino, white alone \code{"HoLW"}
-#'  \item **B03002_014**: Hispanic or Latino, Black or African American alone \code{"HoLB"}
-#'  \item **B03002_015**: Hispanic or Latino, American Indian and Alaska Native alone \code{"HoLAIAN"}
-#'  \item **B03002_016**: Hispanic or Latino, Asian alone \code{"HoLA"}
-#'  \item **B03002_017**: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"HoLNHOPI"}
-#'  \item **B03002_018**: Hispanic or Latino, Some other race alone \code{"HoLSOR"}
-#'  \item **B03002_019**: Hispanic or Latino, Two or more races \code{"HoLTOMR"}
-#'  \item **B03002_020**: Hispanic or Latino, Two races including Some other race \code{"HoLTRiSOR"}
-#'  \item **B03002_021**: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"HoLTReSOR"}
+#'  \item **B03002_002**: not Hispanic or Latino \code{'NHoL'}
+#'  \item **B03002_003**: not Hispanic or Latino, white alone\code{'NHoLW'}
+#'  \item **B03002_004**: not Hispanic or Latino, Black or African American alone \code{'NHoLB'}
+#'  \item **B03002_005**: not Hispanic or Latino, American Indian and Alaska Native alone \code{'NHoLAIAN'}
+#'  \item **B03002_006**: not Hispanic or Latino, Asian alone \code{'NHoLA'}
+#'  \item **B03002_007**: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'NHoLNHOPI'}
+#'  \item **B03002_008**: not Hispanic or Latino, Some other race alone \code{'NHoLSOR'}
+#'  \item **B03002_009**: not Hispanic or Latino, Two or more races \code{'NHoLTOMR'}
+#'  \item **B03002_010**: not Hispanic or Latino, Two races including Some other race \code{'NHoLTRiSOR'}
+#'  \item **B03002_011**: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'NHoLTReSOR'}
+#'  \item **B03002_012**: Hispanic or Latino \code{'HoL'}
+#'  \item **B03002_013**: Hispanic or Latino, white alone \code{'HoLW'}
+#'  \item **B03002_014**: Hispanic or Latino, Black or African American alone \code{'HoLB'}
+#'  \item **B03002_015**: Hispanic or Latino, American Indian and Alaska Native alone \code{'HoLAIAN'}
+#'  \item **B03002_016**: Hispanic or Latino, Asian alone \code{'HoLA'}
+#'  \item **B03002_017**: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'HoLNHOPI'}
+#'  \item **B03002_018**: Hispanic or Latino, Some other race alone \code{'HoLSOR'}
+#'  \item **B03002_019**: Hispanic or Latino, Two or more races \code{'HoLTOMR'}
+#'  \item **B03002_020**: Hispanic or Latino, Two races including Some other race \code{'HoLTRiSOR'}
+#'  \item **B03002_021**: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'HoLTReSOR'}
 #' }
-#'
+#'
 #' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. NOTE: Current version does not correct for edge effects (e.g., census geographies along the specified spatial extent border, coastline, or U.S.-Mexico / U.S.-Canada border) may have few neighboring census geographies, and RI values in these census geographies may be unstable. A stop-gap solution for the former source of edge effect is to compute the RI for neighboring census geographies (i.e., the states bordering a study area of interest) and then use the estimates of the study area of interest.
-#'
+#'
 #' A census geography (and its neighbors) that has nearly all of its population who identify with the specified race/ethnicity subgroup(s) (e.g., non-Hispanic or Latino, Black or African American alone) will have an RI value close to 1. In contrast, a census geography (and its neighbors) that has nearly none of its population who identify with the specified race/ethnicity subgroup(s) (e.g., not non-Hispanic or Latino, Black or African American alone) will have an RI value close to 0.
-#'
+#'
 #' @return An object of class 'list'. This is a named list with the following components:
-#'
+#'
 #' \describe{
 #' \item{\code{ri}}{An object of class 'tbl' for the GEOID, name, RI, and raw census values of specified census geographies.}
 #' \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute RI.}
 #' }
-#'
+#'
 #' @import dplyr
 #' @importFrom Matrix sparseMatrix
 #' @importFrom sf st_drop_geometry st_geometry st_intersects
@@ -52,161 +52,206 @@
 #' @importFrom tidycensus get_acs
 #' @importFrom tidyr pivot_longer separate
 #' @export
-#'
+#'
 #' @seealso \code{\link[tidycensus]{get_acs}} for additional arguments for geographic extent selection (i.e., \code{state} and \code{county}).
 #'
 #' @examples
 #' \dontrun{
 #' # Wrapped in \dontrun{} because these examples require a Census API key.
-#'
+#'
 #'   # Tract-level metric (2020)
-#'   anthopolos(geo = "tract", state = "GA",
-#'              year = 2020, subgroup = c("NHoLB", "HoLB"))
-#'
+#'   anthopolos(
+#'     geo = 'tract',
+#'     state = 'GA',
+#'     year = 2020,
+#'     subgroup = c('NHoLB', 'HoLB')
+#'   )
+#'
 #'   # County-level metric (2020)
-#'   anthopolos(geo = "county", state = "GA",
-#'              year = 2020, subgroup = c("NHoLB", "HoLB"))
-#'
+#'   anthopolos(
+#'     geo = 'county',
+#'     state = 'GA',
+#'     year = 2020,
+#'     subgroup = c('NHoLB', 'HoLB')
+#'   )
+#'
 #' }
-#'
-anthopolos <- function(geo = "tract", year = 2020, subgroup, quiet = FALSE, ...) {
-
-  # Check arguments
-  match.arg(geo, choices = c("county", "tract"))
-  stopifnot(is.numeric(year), year >= 2009) # all variables available 2009 onward
-  match.arg(subgroup, several.ok = TRUE,
-            choices = c("NHoL", "NHoLW", "NHoLB", "NHoLAIAN", "NHoLA", "NHoLNHOPI",
-                        "NHoLSOR", "NHoLTOMR", "NHoLTRiSOR", "NHoLTReSOR",
-                        "HoL", "HoLW", "HoLB", "HoLAIAN", "HoLA", "HoLNHOPI",
-                        "HoLSOR", "HoLTOMR", "HoLTRiSOR", "HoLTReSOR"))
-
-  # Select census variables
-  vars <- c(TotalPop = "B03002_001",
-            NHoL = "B03002_002",
-            NHoLW = "B03002_003",
-            NHoLB = "B03002_004",
-            NHoLAIAN = "B03002_005",
-            NHoLA = "B03002_006",
-            NHoLNHOPI = "B03002_007",
-            NHoLSOR = "B03002_008",
-            NHoLTOMR = "B03002_009",
-            NHoLTRiSOR = "B03002_010",
-            NHoLTReSOR = "B03002_011",
-            HoL = "B03002_012",
-            HoLW = "B03002_013",
-            HoLB = "B03002_014",
-            HoLAIAN = "B03002_015",
-            HoLA = "B03002_016",
-            HoLNHOPI = "B03002_017",
-            HoLSOR = "B03002_018",
-            HoLTOMR = "B03002_019",
-            HoLTRiSOR = "B03002_020",
-            HoLTReSOR = "B03002_021")
-
-  selected_vars <- vars[c("TotalPop", subgroup)]
-  out_names <- names(selected_vars) # save for output
-  prefix <- "subgroup"
-  suffix <- seq(1:length(subgroup))
-  names(selected_vars) <- c("TotalPop", paste(prefix, suffix, sep = ""))
-  in_names <- paste(names(selected_vars), "E", sep = "")
-
-  # Acquire RI variables and sf geometries
-  ri_data <- suppressMessages(suppressWarnings(tidycensus::get_acs(geography = geo,
-                                                                   year = year,
-                                                                   output = "wide",
-                                                                   variables = selected_vars,
-                                                                   geometry = TRUE, ...)))
-
+#'
+anthopolos <- function(geo = 'tract',
+                       year = 2020,
+                       subgroup,
+                       quiet = FALSE,
+                       ...) {
 
-  if (geo == "tract") {
+    # Check arguments
+    match.arg(geo, choices = c('county', 'tract'))
+    stopifnot(is.numeric(year), year >= 2009) # all variables available 2009 onward
+    match.arg(
+      subgroup,
+      several.ok = TRUE,
+      choices = c(
+        'NHoL',
+        'NHoLW',
+        'NHoLB',
+        'NHoLAIAN',
+        'NHoLA',
+        'NHoLNHOPI',
+        'NHoLSOR',
+        'NHoLTOMR',
+        'NHoLTRiSOR',
+        'NHoLTReSOR',
+        'HoL',
+        'HoLW',
+        'HoLB',
+        'HoLAIAN',
+        'HoLA',
+        'HoLNHOPI',
+        'HoLSOR',
+        'HoLTOMR',
+        'HoLTRiSOR',
+        'HoLTReSOR'
+      )
+    )
+
+    # Select census variables
+    vars <- c(
+      TotalPop = 'B03002_001',
+      NHoL = 'B03002_002',
+      NHoLW = 'B03002_003',
+      NHoLB = 'B03002_004',
+      NHoLAIAN = 'B03002_005',
+      NHoLA = 'B03002_006',
+      NHoLNHOPI = 'B03002_007',
+      NHoLSOR = 'B03002_008',
+      NHoLTOMR = 'B03002_009',
+      NHoLTRiSOR = 'B03002_010',
+      NHoLTReSOR = 'B03002_011',
+      HoL = 'B03002_012',
+      HoLW = 'B03002_013',
+      HoLB = 'B03002_014',
+      HoLAIAN = 'B03002_015',
+      HoLA = 'B03002_016',
+      HoLNHOPI = 'B03002_017',
+      HoLSOR = 'B03002_018',
+      HoLTOMR = 'B03002_019',
+      HoLTRiSOR = 'B03002_020',
+      HoLTReSOR = 'B03002_021'
+    )
+
+    selected_vars <- vars[c('TotalPop', subgroup)]
+    out_names <- names(selected_vars) # save for output
+    prefix <- 'subgroup'
+    suffix <- seq(1:length(subgroup))
+    names(selected_vars) <- c('TotalPop', paste(prefix, suffix, sep = ''))
+    in_names <- paste(names(selected_vars), 'E', sep = '')
+
+    # Acquire RI variables and sf geometries
+    ri_data <- suppressMessages(suppressWarnings(
+      tidycensus::get_acs(
+        geography = geo,
+        year = year,
+        output = 'wide',
+        variables = selected_vars,
+        geometry = TRUE,
+        ...
+      )
+    ))
+
+    if (geo == 'tract') {
+      ri_data <- ri_data %>%
+        tidyr::separate(NAME, into = c('tract', 'county', 'state'), sep = ',') %>%
+        dplyr::mutate(tract = gsub('[^0-9\\.]', '', tract))
+    } else {
+      ri_data <- ri_data %>%
+        tidyr::separate(NAME, into = c('county', 'state'), sep = ',')
+    }
+
     ri_data <- ri_data %>%
-      tidyr::separate(NAME, into = c("tract", "county", "state"), sep = ",") %>%
-      dplyr::mutate(tract = gsub("[^0-9\\.]","", tract))
-  } else {
-    ri_data <- ri_data %>% tidyr::separate(NAME, into = c("county", "state"), sep = ",")
-  }
-
-  ri_data <- ri_data %>%
-    dplyr::mutate(subgroup = rowSums(sf::st_drop_geometry(ri_data[ , in_names[-1]])))
-
-  # Compute RI
-  ## From Anthopolos et al. (2011) https://doi.org/10.1016/j.sste.2011.06.002
-  ## RI_{im} = (Sigma_{j∈∂_{i}} w_{ij} * T_{jm}) / (Sigma_{j∈∂_{i}} w_{ij} * T_{j})
-  ## Where:
-  ## ∂_{i} denotes the set of index units i and its neighbors
-  ## Given M mutually exclusive racial/ethnic subgroups, m indexes the subgroups of M
-  ## T_{i} denotes the total population in region i (TotalPop)
-  ## T_{im} denotes the population of the selected subgroup(s) (subgroup1, ...)
-  ## w_{ij} denotes a nXn first-order adjacency matrix, where n is the number of census geometries in the study area
-  ### and the entries of w_{ij} are set to 1 if a boundary is shared by region i and region j and zero otherwise
-  ### Entries of the main diagonal (since i∈∂_{i}, w_{ij} = w_{ii} when j = i) of w_{ij} are set to 1.5
-  ### such that the weight of the index unit, i, is larger than the weights assigned to adjacent tracts
-
-  ## Geospatial adjacency matrix (wij)
-  tmp <- sf::st_intersects(sf::st_geometry(ri_data), sparse = TRUE)
-  names(tmp) <- as.character(seq_len(nrow(ri_data)))
-  tmpL <- length(tmp)
-  tmpcounts <- unlist(Map(length, tmp))
-  tmpi <- rep(1:tmpL, tmpcounts)
-  tmpj <- unlist(tmp)
-  wij <- Matrix::sparseMatrix(i = tmpi, j = tmpj, x = 1, dims = c(tmpL, tmpL))
-  diag(wij) <- 1.5
-
-  ## Compute
-  ri_data <- sf::st_drop_geometry(ri_data) # drop geometries (can join back later)
-  RIim <- list()
-  for (i in 1:dim(wij)[1]){
-    RIim[[i]] <- sum(as.matrix(wij[i, ])*ri_data[ , "subgroup"]) / sum(as.matrix(wij[i, ])*ri_data[, "TotalPopE"])
-  }
-  ri_data$RI <- unlist(RIim)
-
-  # Warning for missingness of census characteristics
-  missingYN <- ri_data[ , in_names]
-  names(missingYN) <- out_names
-  missingYN <- missingYN %>%
-    tidyr::pivot_longer(cols = dplyr::everything(),
-                        names_to = "variable",
-                        values_to = "val") %>%
-    dplyr::group_by(variable) %>%
-    dplyr::summarise(total = dplyr::n(),
-                     n_missing = sum(is.na(val)),
-                     percent_missing = paste0(round(mean(is.na(val)) * 100, 2), " %"))
-
-  if (quiet == FALSE) {
-    # Warning for missing census data
-    if (sum(missingYN$n_missing) > 0) {
-      message("Warning: Missing census data")
+      dplyr::mutate(subgroup = rowSums(sf::st_drop_geometry(ri_data[, in_names[-1]])))
+
+    # Compute RI
+    ## From Anthopolos et al. (2011) https://doi.org/10.1016/j.sste.2011.06.002
+    ## RI_{im} = (Sigma_{j∈∂_{i}} w_{ij} * T_{jm}) / (Sigma_{j∈∂_{i}} w_{ij} * T_{j})
+    ## Where:
+    ## ∂_{i} denotes the set of index units i and its neighbors
+    ## Given M mutually exclusive racial/ethnic subgroups, m indexes the subgroups of M
+    ## T_{i} denotes the total population in region i (TotalPop)
+    ## T_{im} denotes the population of the selected subgroup(s) (subgroup1, ...)
+    ## w_{ij} denotes a nXn first-order adjacency matrix, where n is the number of census geometries in the study area
+    ### and the entries of w_{ij} are set to 1 if a boundary is shared by region i and region j and zero otherwise
+    ### Entries of the main diagonal (since i∈∂_{i}, w_{ij} = w_{ii} when j = i) of w_{ij} are set to 1.5
+    ### such that the weight of the index unit, i, is larger than the weights assigned to adjacent tracts
+
+    ## Geospatial adjacency matrix (wij)
+    tmp <- ri_data %>%
+      sf::st_geometry() %>%
+      sf::st_intersects(sparse = TRUE)
+    names(tmp) <- as.character(seq_len(nrow(ri_data)))
+    tmpL <- length(tmp)
+    tmpcounts <- unlist(Map(length, tmp))
+    tmpi <- rep(1:tmpL, tmpcounts)
+    tmpj <- unlist(tmp)
+    wij <- Matrix::sparseMatrix(
+      i = tmpi,
+      j = tmpj,
+      x = 1,
+      dims = c(tmpL, tmpL)
+    )
+    diag(wij) <- 1.5
+
+    ## Compute
+    ri_data <- ri_data %>%
+      sf::st_drop_geometry() # drop geometries (can join back later)
+    RIim <- list()
+    for (i in 1:dim(wij)[1]) {
+      RIim[[i]] <- sum(as.matrix(wij[i,]) * ri_data[, 'subgroup']) /
+        sum(as.matrix(wij[i,]) * ri_data[, 'TotalPopE'])
     }
+    ri_data$RI <- unlist(RIim)
+
+    # Warning for missingness of census characteristics
+    missingYN <- ri_data[, in_names]
+    names(missingYN) <- out_names
+    missingYN <- missingYN %>%
+      tidyr::pivot_longer(
+        cols = dplyr::everything(),
+        names_to = 'variable',
+        values_to = 'val'
+      ) %>%
+      dplyr::group_by(variable) %>%
+      dplyr::summarise(
+        total = dplyr::n(),
+        n_missing = sum(is.na(val)),
+        percent_missing = paste0(round(mean(is.na(val)) * 100, 2), ' %')
+      )
+
+    if (quiet == FALSE) {
+      # Warning for missing census data
+      if (sum(missingYN$n_missing) > 0) {
+        message('Warning: Missing census data')
+      }
+    }
+
+    # Format output
+    if (geo == 'tract') {
+      ri <- ri_data %>%
+        dplyr::select(c('GEOID', 'state', 'county', 'tract', 'RI', dplyr::all_of(in_names)))
+      names(ri) <- c('GEOID', 'state', 'county', 'tract', 'RI', out_names)
+    } else {
+      ri <- ri_data %>%
+        dplyr::select(c('GEOID', 'state', 'county', 'RI', dplyr::all_of(in_names)))
+      names(ri) <- c('GEOID', 'state', 'county', 'RI', out_names)
+    }
+
+    ri <- ri %>%
+      dplyr::mutate(
+        state = stringr::str_trim(state),
+        county = stringr::str_trim(county)
+      ) %>%
+      dplyr::arrange(GEOID) %>%
+      dplyr::as_tibble()
+
+    out <- list(ri = ri, missing = missingYN)
+
+    return(out)
   }
-
-  # Format output
-  if (geo == "tract") {
-    ri <- ri_data %>%
-      dplyr::select(c("GEOID",
-                      "state",
-                      "county",
-                      "tract",
-                      "RI",
-                      dplyr::all_of(in_names)))
-    names(ri) <- c("GEOID", "state", "county", "tract", "RI", out_names)
-  } else {
-    ri <- ri_data %>%
-      dplyr::select(c("GEOID",
-                      "state",
-                      "county",
-                      "RI",
-                      dplyr::all_of(in_names)))
-    names(ri) <- c("GEOID", "state", "county", "RI", out_names)
-  }
-
-  ri <- ri %>%
-    dplyr::mutate(state = stringr::str_trim(state),
-                  county = stringr::str_trim(county)) %>%
-    dplyr::arrange(GEOID) %>%
-    dplyr::as_tibble()
-
-  out <- list(ri = ri,
-              missing = missingYN)
-
-  return(out)
-}
diff --git a/R/atkinson.R b/R/atkinson.R
index 6185ec6..6e6f4ca 100644
--- a/R/atkinson.R
+++ b/R/atkinson.R
@@ -1,9 +1,9 @@
-#' Atkinson Index based on Atkinson (1970)
-#'
+#' Atkinson Index based on Atkinson (1970)
+#'
 #' Compute the aspatial Atkinson Index of income or selected racial/ethnic subgroup(s) and U.S. geographies.
 #'
-#' @param geo_large Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = "county"}.
-#' @param geo_small Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_large = "tract"}.
+#' @param geo_large Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = 'county'}.
+#' @param geo_small Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_large = 'tract'}.
 #' @param year Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available.
 #' @param subgroup Character string specifying the income or racial/ethnic subgroup(s) as the comparison population. See Details for available choices.
 #' @param epsilon Numerical. Shape parameter that denotes the aversion to inequality. Value must be between 0 and 1.0 (the default is 0.5).
@@ -12,47 +12,47 @@
 #' @param ... Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics
 #'
 #' @details This function will compute the aspatial Atkinson Index (AI) of income or selected racial/ethnic subgroups and U.S. geographies for a specified geographical extent (e.g., the entire U.S. or a single state) based on Atkinson (1970) \doi{10.1016/0022-0531(70)90039-6}. This function provides the computation of AI for median household income and any of the U.S. Census Bureau race/ethnicity subgroups (including Hispanic and non-Hispanic individuals).
-#'
-#' The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the aspatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available but are available from other U.S. Census Bureau surveys. When \code{subgroup = "MedHHInc"}, the metric will be computed for median household income ("B19013_001"). The twenty racial/ethnic subgroups (U.S. Census Bureau definitions) are:
+#'
+#' The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the aspatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available but are available from other U.S. Census Bureau surveys. When \code{subgroup = 'MedHHInc'}, the metric will be computed for median household income ('B19013_001'). The twenty racial/ethnic subgroups (U.S. Census Bureau definitions) are:
 #' \itemize{
-#'  \item **B03002_002**: not Hispanic or Latino \code{"NHoL"}
-#'  \item **B03002_003**: not Hispanic or Latino, white alone \code{"NHoLW"}
-#'  \item **B03002_004**: not Hispanic or Latino, Black or African American alone \code{"NHoLB"}
-#'  \item **B03002_005**: not Hispanic or Latino, American Indian and Alaska Native alone \code{"NHoLAIAN"}
-#'  \item **B03002_006**: not Hispanic or Latino, Asian alone \code{"NHoLA"}
-#'  \item **B03002_007**: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"NHoLNHOPI"}
-#'  \item **B03002_008**: not Hispanic or Latino, Some other race alone \code{"NHoLSOR"}
-#'  \item **B03002_009**: not Hispanic or Latino, Two or more races \code{"NHoLTOMR"}
-#'  \item **B03002_010**: not Hispanic or Latino, Two races including Some other race \code{"NHoLTRiSOR"}
-#'  \item **B03002_011**: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"NHoLTReSOR"}
-#'  \item **B03002_012**: Hispanic or Latino \code{"HoL"}
-#'  \item **B03002_013**: Hispanic or Latino, white alone \code{"HoLW"}
-#'  \item **B03002_014**: Hispanic or Latino, Black or African American alone \code{"HoLB"}
-#'  \item **B03002_015**: Hispanic or Latino, American Indian and Alaska Native alone \code{"HoLAIAN"}
-#'  \item **B03002_016**: Hispanic or Latino, Asian alone \code{"HoLA"}
-#'  \item **B03002_017**: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"HoLNHOPI"}
-#'  \item **B03002_018**: Hispanic or Latino, Some other race alone \code{"HoLSOR"}
-#'  \item **B03002_019**: Hispanic or Latino, Two or more races \code{"HoLTOMR"}
-#'  \item **B03002_020**: Hispanic or Latino, Two races including Some other race \code{"HoLTRiSOR"}
-#'  \item **B03002_021**: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"HoLTReSOR"}
+#'  \item **B03002_002**: not Hispanic or Latino \code{'NHoL'}
+#'  \item **B03002_003**: not Hispanic or Latino, white alone \code{'NHoLW'}
+#'  \item **B03002_004**: not Hispanic or Latino, Black or African American alone \code{'NHoLB'}
+#'  \item **B03002_005**: not Hispanic or Latino, American Indian and Alaska Native alone \code{'NHoLAIAN'}
+#'  \item **B03002_006**: not Hispanic or Latino, Asian alone \code{'NHoLA'}
+#'  \item **B03002_007**: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'NHoLNHOPI'}
+#'  \item **B03002_008**: not Hispanic or Latino, Some other race alone \code{'NHoLSOR'}
+#'  \item **B03002_009**: not Hispanic or Latino, Two or more races \code{'NHoLTOMR'}
+#'  \item **B03002_010**: not Hispanic or Latino, Two races including Some other race \code{'NHoLTRiSOR'}
+#'  \item **B03002_011**: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'NHoLTReSOR'}
+#'  \item **B03002_012**: Hispanic or Latino \code{'HoL'}
+#'  \item **B03002_013**: Hispanic or Latino, white alone \code{'HoLW'}
+#'  \item **B03002_014**: Hispanic or Latino, Black or African American alone \code{'HoLB'}
+#'  \item **B03002_015**: Hispanic or Latino, American Indian and Alaska Native alone \code{'HoLAIAN'}
+#'  \item **B03002_016**: Hispanic or Latino, Asian alone \code{'HoLA'}
+#'  \item **B03002_017**: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'HoLNHOPI'}
+#'  \item **B03002_018**: Hispanic or Latino, Some other race alone \code{'HoLSOR'}
+#'  \item **B03002_019**: Hispanic or Latino, Two or more races \code{'HoLTOMR'}
+#'  \item **B03002_020**: Hispanic or Latino, Two races including Some other race \code{'HoLTRiSOR'}
+#'  \item **B03002_021**: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'HoLTReSOR'}
 #' }
 #'
 #' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output.
-#'
+#'
 #' AI is a measure of the evenness of residential inequality (e.g., racial/ethnic segregation) when comparing smaller geographical areas to larger ones within which the smaller geographical areas are located. The AI metric can range in value from 0 to 1 with smaller values indicating lower levels of inequality (e.g., less segregation).
-#'
-#' The \code{epsilon} argument that determines how to weight the increments to inequality contributed by different proportions of the Lorenz curve. A user must explicitly decide how heavily to weight smaller geographical units at different points on the Lorenz curve (i.e., whether the index should take greater account of differences among areas of over- or under-representation). The \code{epsilon} argument must have values between 0 and 1.0. For \code{0 <= epsilon < 0.5} or less "inequality-averse," smaller geographical units with a subgroup proportion smaller than the subgroup proportion of the larger geographical unit contribute more to inequality ("over-representation"). For \code{0.5 < epsilon <= 1.0} or more "inequality-averse," smaller geographical units with a subgroup proportion larger than the subgroup proportion of the larger geographical unit contribute more to inequality ("under-representation"). If \code{epsilon = 0.5} (the default), units of over- and under-representation contribute equally to the index. See Section 2.3 of Saint-Jacques _et al._ (2020) \doi{10.48550/arXiv.2002.05819} for one method to select \code{epsilon}.
-#' -#' Larger geographies available include state \code{geo_large = "state"}, county \code{geo_large = "county"}, and census tract \code{geo_large = "tract"} levels. Smaller geographies available include, county \code{geo_small = "county"}, census tract \code{geo_small = "tract"}, and census block group \code{geo_small = "block group"} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the AI value returned is NA. -#' +#' +#' The \code{epsilon} argument that determines how to weight the increments to inequality contributed by different proportions of the Lorenz curve. A user must explicitly decide how heavily to weight smaller geographical units at different points on the Lorenz curve (i.e., whether the index should take greater account of differences among areas of over- or under-representation). The \code{epsilon} argument must have values between 0 and 1.0. For \code{0 <= epsilon < 0.5} or less 'inequality-averse,' smaller geographical units with a subgroup proportion smaller than the subgroup proportion of the larger geographical unit contribute more to inequality ('over-representation'). For \code{0.5 < epsilon <= 1.0} or more 'inequality-averse,' smaller geographical units with a subgroup proportion larger than the subgroup proportion of the larger geographical unit contribute more to inequality ('under-representation'). If \code{epsilon = 0.5} (the default), units of over- and under-representation contribute equally to the index. See Section 2.3 of Saint-Jacques et al. (2020) \doi{10.48550/arXiv.2002.05819} for one method to select \code{epsilon}. +#' +#' Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, and census tract \code{geo_large = 'tract'} levels. 
Smaller geographies available include, county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the AI value returned is NA. +#' #' @return An object of class 'list'. This is a named list with the following components: -#' +#' #' \describe{ #' \item{\code{ai}}{An object of class 'tbl' for the GEOID, name, and AI at specified larger census geographies.} #' \item{\code{ai_data}}{An object of class 'tbl' for the raw census values at specified smaller census geographies.} #' \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute AI.} #' } -#' +#' #' @import dplyr #' @importFrom sf st_drop_geometry #' @importFrom stats na.omit @@ -60,195 +60,254 @@ #' @importFrom tidyr pivot_longer separate #' @importFrom utils stack #' @export -#' +#' #' @seealso \code{\link[tidycensus]{get_acs}} for additional arguments for geographic extent selection (i.e., \code{state} and \code{county}). #' #' @examples #' \dontrun{ #' # Wrapped in \dontrun{} because these examples require a Census API key. -#' +#' #' # Atkinson Index of non-Hispanic Black populations #' ## of census tracts within Georgia, U.S.A., counties (2020) -#' atkinson(geo_large = "county", geo_small = "tract", state = "GA", -#' year = 2020, subgroup = "NHoLB") -#' +#' atkinson( +#' geo_large = 'county', +#' geo_small = 'tract', +#' state = 'GA', +#' year = 2020, +#' subgroup = 'NHoLB' +#' ) +#' #' } -#' -atkinson <- function(geo_large = "county", geo_small = "tract", year = 2020, subgroup, epsilon = 0.5, omit_NAs = TRUE, quiet = FALSE, ...) 
{ - - # Check arguments - match.arg(geo_large, choices = c("state", "county", "tract")) - match.arg(geo_small, choices = c("county", "tract", "block group")) - stopifnot(is.numeric(year), year >= 2009) # all variables available 2009 onward - match.arg(subgroup, several.ok = TRUE, - choices = c("NHoL", "NHoLW", "NHoLB", "NHoLAIAN", "NHoLA", "NHoLNHOPI", - "NHoLSOR", "NHoLTOMR", "NHoLTRiSOR", "NHoLTReSOR", - "HoL", "HoLW", "HoLB", "HoLAIAN", "HoLA", "HoLNHOPI", - "HoLSOR", "HoLTOMR", "HoLTRiSOR", "HoLTReSOR", "MedHHInc")) - stopifnot(is.numeric(epsilon), epsilon >= 0 , epsilon <= 1) # values between 0 and 1 - - # Select census variables - vars <- c(NHoL = "B03002_002", - NHoLW = "B03002_003", - NHoLB = "B03002_004", - NHoLAIAN = "B03002_005", - NHoLA = "B03002_006", - NHoLNHOPI = "B03002_007", - NHoLSOR = "B03002_008", - NHoLTOMR = "B03002_009", - NHoLTRiSOR = "B03002_010", - NHoLTReSOR = "B03002_011", - HoL = "B03002_012", - HoLW = "B03002_013", - HoLB = "B03002_014", - HoLAIAN = "B03002_015", - HoLA = "B03002_016", - HoLNHOPI = "B03002_017", - HoLSOR = "B03002_018", - HoLTOMR = "B03002_019", - HoLTRiSOR = "B03002_020", - HoLTReSOR = "B03002_021", - MedHHInc = "B19013_001") - - selected_vars <- vars[subgroup] - out_names <- names(selected_vars) # save for output - in_subgroup <- paste(subgroup, "E", sep = "") - - # Acquire AI variables and sf geometries - ai_data <- suppressMessages(suppressWarnings(tidycensus::get_acs(geography = geo_small, - year = year, - output = "wide", - variables = selected_vars, - geometry = TRUE, - keep_geo_vars = TRUE, ...))) - - # Format output - if (geo_small == "county") { - ai_data <- sf::st_drop_geometry(ai_data) %>% - tidyr::separate(NAME.y, into = c("county", "state"), sep = ",") - } - if (geo_small == "tract") { - ai_data <- sf::st_drop_geometry(ai_data) %>% - tidyr::separate(NAME.y, into = c("tract", "county", "state"), sep = ",") %>% - dplyr::mutate(tract = gsub("[^0-9\\.]", "", tract)) - } - if (geo_small == "block group") { - 
ai_data <- sf::st_drop_geometry(ai_data) %>% - tidyr::separate(NAME.y, into = c("block.group", "tract", "county", "state"), sep = ",") %>% - dplyr::mutate(tract = gsub("[^0-9\\.]", "", tract), - block.group = gsub("[^0-9\\.]", "", block.group)) - } +#' +atkinson <- function(geo_large = 'county', + geo_small = 'tract', + year = 2020, + subgroup, + epsilon = 0.5, + omit_NAs = TRUE, + quiet = FALSE, + ...) { - # Grouping IDs for AI computation - if (geo_large == "tract") { - ai_data <- ai_data %>% - dplyr::mutate(oid = paste(.$STATEFP, .$COUNTYFP, .$TRACTCE, sep = ""), - state = stringr::str_trim(state), - county = stringr::str_trim(county)) - } - if (geo_large == "county") { - ai_data <- ai_data %>% - dplyr::mutate(oid = paste(.$STATEFP, .$COUNTYFP, sep = ""), - state = stringr::str_trim(state), - county = stringr::str_trim(county)) - } - if (geo_large == "state") { - ai_data <- ai_data %>% - dplyr::mutate(oid = .$STATEFP, - state = stringr::str_trim(state)) - } - - # Count of racial/ethnic subgroup populations - ## Count of racial/ethnic subgroup population - if (length(in_subgroup) == 1) { - ai_data <- ai_data %>% - dplyr::mutate(subgroup = .[ , in_subgroup]) - } else { - ai_data <- ai_data %>% - dplyr::mutate(subgroup = rowSums(.[ , in_subgroup])) - } - - # Compute AI - ## From Atkinson (1970) https://doi.org/10.1016/0022-0531(70)90039-6 - ## A_{\epsilon}(x_{1},...,x_{n}) = \begin{Bmatrix} - ## 1 - (\frac{1}{n}\sum_{i=1}^{n}x_{i}^{1-\epsilon})^{1/(1-\epsilon)}/(\frac{1}{n}\sum_{i=1}^{n}x_{i}) & \mathrm{if\:} \epsilon \neq 1 \\ - ## 1 - (\prod_{i=1}^{n}x_{i})^{1/n}/(\frac{1}{n}\sum_{i=1}^{n}x_{i}) & \mathrm{if\:} \epsilon = 1 \\ - ## \end{Bmatrix} - ## Where the Atkinson index (A) is defined for a population subgroup count (x) of a given smaller geographical unit (i) for n smaller geographical units - ## and an inequality-aversion parameter (epsilon) - ## If denoting the Hölder mean (based on `Atkinson()` function in 'DescTools' package) by - ## 
M_{p}(x_{1},...,x_{n}) = \begin{Bmatrix} - ## (\frac{1}{n}\sum_{i=1}^{n}x_{i}^{p})^{1/p} & \mathrm{if\:} p \neq 0 \\ - ## (\prod_{i=1}^{n}x_{i})^{1/n} & \mathrm{if\:} p = 0 \\ - ## \end{Bmatrix} - ## then AI is - ## A_{\epsilon}(x_{1},...,x_{n}) = 1 - \frac{M_{1-\epsilon}(x_{1},...,x_{n})}{M_{1}(x_{1},...,x_{n})} - - ## Compute - AItmp <- ai_data %>% - split(., f = list(ai_data$oid)) %>% - lapply(., FUN = ai_fun, epsilon = epsilon, omit_NAs = omit_NAs) %>% - utils::stack(.) %>% - dplyr::mutate(AI = values, - oid = ind) %>% - dplyr::select(AI, oid) - - # Warning for missingness of census characteristics - missingYN <- as.data.frame(ai_data[ , in_subgroup]) - names(missingYN) <- out_names - missingYN <- missingYN %>% - tidyr::pivot_longer(cols = dplyr::everything(), - names_to = "variable", - values_to = "val") %>% - dplyr::group_by(variable) %>% - dplyr::summarise(total = dplyr::n(), - n_missing = sum(is.na(val)), - percent_missing = paste0(round(mean(is.na(val)) * 100, 2), " %")) - - if (quiet == FALSE) { - # Warning for missing census data - if (sum(missingYN$n_missing) > 0) { - message("Warning: Missing census data") + # Check arguments + match.arg(geo_large, choices = c('state', 'county', 'tract')) + match.arg(geo_small, choices = c('county', 'tract', 'block group')) + stopifnot(is.numeric(year), year >= 2009) # all variables available 2009 onward + match.arg( + subgroup, + several.ok = TRUE, + choices = c( + 'NHoL', + 'NHoLW', + 'NHoLB', + 'NHoLAIAN', + 'NHoLA', + 'NHoLNHOPI', + 'NHoLSOR', + 'NHoLTOMR', + 'NHoLTRiSOR', + 'NHoLTReSOR', + 'HoL', + 'HoLW', + 'HoLB', + 'HoLAIAN', + 'HoLA', + 'HoLNHOPI', + 'HoLSOR', + 'HoLTOMR', + 'HoLTRiSOR', + 'HoLTReSOR', + 'MedHHInc' + ) + ) + stopifnot(is.numeric(epsilon), epsilon >= 0 , epsilon <= 1) # values between 0 and 1 + + # Select census variables + vars <- c( + NHoL = 'B03002_002', + NHoLW = 'B03002_003', + NHoLB = 'B03002_004', + NHoLAIAN = 'B03002_005', + NHoLA = 'B03002_006', + NHoLNHOPI = 'B03002_007', + NHoLSOR = 
'B03002_008', + NHoLTOMR = 'B03002_009', + NHoLTRiSOR = 'B03002_010', + NHoLTReSOR = 'B03002_011', + HoL = 'B03002_012', + HoLW = 'B03002_013', + HoLB = 'B03002_014', + HoLAIAN = 'B03002_015', + HoLA = 'B03002_016', + HoLNHOPI = 'B03002_017', + HoLSOR = 'B03002_018', + HoLTOMR = 'B03002_019', + HoLTRiSOR = 'B03002_020', + HoLTReSOR = 'B03002_021', + MedHHInc = 'B19013_001' + ) + + selected_vars <- vars[subgroup] + out_names <- names(selected_vars) # save for output + in_subgroup <- paste(subgroup, 'E', sep = '') + + # Acquire AI variables and sf geometries + ai_data <- suppressMessages(suppressWarnings( + tidycensus::get_acs( + geography = geo_small, + year = year, + output = 'wide', + variables = selected_vars, + geometry = TRUE, + keep_geo_vars = TRUE, + ... + ) + )) + + # Format output + if (geo_small == 'county') { + ai_data <- ai_data %>% + sf::st_drop_geometry() %>% + tidyr::separate(NAME.y, into = c('county', 'state'), sep = ',') } + if (geo_small == 'tract') { + ai_data <- ai_data %>% + sf::st_drop_geometry() %>% + tidyr::separate(NAME.y, into = c('tract', 'county', 'state'), sep = ',') %>% + dplyr::mutate(tract = gsub('[^0-9\\.]', '', tract)) + } + if (geo_small == 'block group') { + ai_data <- ai_data %>% + sf::st_drop_geometry() %>% + tidyr::separate(NAME.y, into = c('block.group', 'tract', 'county', 'state'), sep = ',') %>% + dplyr::mutate( + tract = gsub('[^0-9\\.]', '', tract), + block.group = gsub('[^0-9\\.]', '', block.group) + ) + } + + # Grouping IDs for AI computation + if (geo_large == 'tract') { + ai_data <- ai_data %>% + dplyr::mutate( + oid = paste(.$STATEFP, .$COUNTYFP, .$TRACTCE, sep = ''), + state = stringr::str_trim(state), + county = stringr::str_trim(county) + ) + } + if (geo_large == 'county') { + ai_data <- ai_data %>% + dplyr::mutate( + oid = paste(.$STATEFP, .$COUNTYFP, sep = ''), + state = stringr::str_trim(state), + county = stringr::str_trim(county) + ) + } + if (geo_large == 'state') { + ai_data <- ai_data %>% + dplyr::mutate( + 
oid = .$STATEFP, + state = stringr::str_trim(state) + ) + } + + # Count of racial/ethnic subgroup populations + ## Count of racial/ethnic subgroup population + if (length(in_subgroup) == 1) { + ai_data <- ai_data %>% + dplyr::mutate(subgroup = .[, in_subgroup]) + } else { + ai_data <- ai_data %>% + dplyr::mutate(subgroup = rowSums(.[, in_subgroup])) + } + + # Compute AI + ## From Atkinson (1970) https://doi.org/10.1016/0022-0531(70)90039-6 + ## A_{\epsilon}(x_{1},...,x_{n}) = \begin{Bmatrix} + ## 1 - (\frac{1}{n}\sum_{i=1}^{n}x_{i}^{1-\epsilon})^{1/(1-\epsilon)}/(\frac{1}{n}\sum_{i=1}^{n}x_{i}) & \mathrm{if\:} \epsilon \neq 1 \\ + ## 1 - (\prod_{i=1}^{n}x_{i})^{1/n}/(\frac{1}{n}\sum_{i=1}^{n}x_{i}) & \mathrm{if\:} \epsilon = 1 \\ + ## \end{Bmatrix} + ## Where the Atkinson index (A) is defined for a population subgroup count (x) of a given smaller geographical unit (i) for n smaller geographical units + ## and an inequality-aversion parameter (epsilon) + ## If denoting the Hölder mean (based on `Atkinson()` function in 'DescTools' package) by + ## M_{p}(x_{1},...,x_{n}) = \begin{Bmatrix} + ## (\frac{1}{n}\sum_{i=1}^{n}x_{i}^{p})^{1/p} & \mathrm{if\:} p \neq 0 \\ + ## (\prod_{i=1}^{n}x_{i})^{1/n} & \mathrm{if\:} p = 0 \\ + ## \end{Bmatrix} + ## then AI is + ## A_{\epsilon}(x_{1},...,x_{n}) = 1 - \frac{M_{1-\epsilon}(x_{1},...,x_{n})}{M_{1}(x_{1},...,x_{n})} + + ## Compute + AItmp <- ai_data %>% + split(., f = list(ai_data$oid)) %>% + lapply(., FUN = ai_fun, epsilon = epsilon, omit_NAs = omit_NAs) %>% + utils::stack(.) 
%>% + dplyr::mutate( + AI = values, + oid = ind + ) %>% + dplyr::select(AI, oid) + + # Warning for missingness of census characteristics + missingYN <- as.data.frame(ai_data[, in_subgroup]) + names(missingYN) <- out_names + missingYN <- missingYN %>% + tidyr::pivot_longer( + cols = dplyr::everything(), + names_to = 'variable', + values_to = 'val' + ) %>% + dplyr::group_by(variable) %>% + dplyr::summarise( + total = dplyr::n(), + n_missing = sum(is.na(val)), + percent_missing = paste0(round(mean(is.na(val)) * 100, 2), ' %') + ) + + if (quiet == FALSE) { + # Warning for missing census data + if (sum(missingYN$n_missing) > 0) { + message('Warning: Missing census data') + } + } + + # Format output + if (geo_large == 'state') { + ai <- ai_data %>% + dplyr::left_join(AItmp, by = dplyr::join_by(oid)) %>% + dplyr::select(oid, state, AI) %>% + unique(.) %>% + dplyr::mutate(GEOID = oid) %>% + dplyr::select(GEOID, state, AI) %>% + .[.$GEOID != 'NANA',] + } + if (geo_large == 'county') { + ai <- ai_data %>% + dplyr::left_join(AItmp, by = dplyr::join_by(oid)) %>% + dplyr::select(oid, state, county, AI) %>% + unique(.) %>% + dplyr::mutate(GEOID = oid) %>% + dplyr::select(GEOID, state, county, AI) %>% + .[.$GEOID != 'NANA',] + } + if (geo_large == 'tract') { + ai <- ai_data %>% + dplyr::left_join(AItmp, by = dplyr::join_by(oid)) %>% + dplyr::select(oid, state, county, tract, AI) %>% + unique(.) %>% + dplyr::mutate(GEOID = oid) %>% + dplyr::select(GEOID, state, county, tract, AI) %>% + .[.$GEOID != 'NANA',] + } + + ai <- ai %>% + dplyr::arrange(GEOID) %>% + dplyr::as_tibble() + + ai_data <- ai_data %>% + dplyr::arrange(GEOID) %>% + dplyr::as_tibble() + + out <- list(ai = ai, ai_data = ai_data, missing = missingYN) + + return(out) } - - # Format output - if (geo_large == "state") { - ai <- merge(ai_data, AItmp) %>% - dplyr::select(oid, state, AI) %>% - unique(.) 
%>% - dplyr::mutate(GEOID = oid) %>% - dplyr::select(GEOID, state, AI) %>% - .[.$GEOID != "NANA", ] - } - if (geo_large == "county") { - ai <- merge(ai_data, AItmp) %>% - dplyr::select(oid, state, county, AI) %>% - unique(.) %>% - dplyr::mutate(GEOID = oid) %>% - dplyr::select(GEOID, state, county, AI) %>% - .[.$GEOID != "NANA", ] - } - if (geo_large == "tract") { - ai <- merge(ai_data, AItmp) %>% - dplyr::select(oid, state, county, tract, AI) %>% - unique(.) %>% - dplyr::mutate(GEOID = oid) %>% - dplyr::select(GEOID, state, county, tract, AI) %>% - .[.$GEOID != "NANA", ] - } - - ai <- ai %>% - dplyr::arrange(GEOID) %>% - dplyr::as_tibble() - - ai_data <- ai_data %>% - dplyr::arrange(GEOID) %>% - dplyr::as_tibble() - - out <- list(ai = ai, - ai_data = ai_data, - missing = missingYN) - - return(out) -} diff --git a/R/bell.R b/R/bell.R index 29d7e36..22b4cfe 100644 --- a/R/bell.R +++ b/R/bell.R @@ -1,9 +1,9 @@ -#' Isolation Index based on Shevky & Williams (1949) and Bell (1954) -#' +#' Isolation Index based on Shevky & Williams (1949) and Bell (1954) +#' #' Compute the aspatial Isolation Index (Bell) of a selected racial/ethnic subgroup(s) and U.S. geographies. #' -#' @param geo_large Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = "county"}. -#' @param geo_small Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_large = "tract"}. +#' @param geo_large Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = 'county'}. +#' @param geo_small Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_large = 'tract'}. #' @param year Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available. #' @param subgroup Character string specifying the racial/ethnic subgroup(s). 
See Details for available choices. #' @param subgroup_ixn Character string specifying the racial/ethnic subgroup(s) as the interaction population. If the same as \code{subgroup}, will compute the simple isolation of the group. See Details for available choices. @@ -12,45 +12,45 @@ #' @param ... Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics #' #' @details This function will compute the aspatial Isolation Index (II) of selected racial/ethnic subgroups and U.S. geographies for a specified geographical extent (e.g., the entire U.S. or a single state) based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and Bell (1954) \doi{10.2307/2574118}. This function provides the computation of II for any of the U.S. Census Bureau race/ethnicity subgroups (including Hispanic and non-Hispanic individuals). -#' +#' #' The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the aspatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available but are available from other U.S. Census Bureau surveys. The twenty racial/ethnic subgroups (U.S. 
Census Bureau definitions) are: #' \itemize{ -#' \item **B03002_002**: not Hispanic or Latino \code{"NHoL"} -#' \item **B03002_003**: not Hispanic or Latino, white alone \code{"NHoLW"} -#' \item **B03002_004**: not Hispanic or Latino, Black or African American alone \code{"NHoLB"} -#' \item **B03002_005**: not Hispanic or Latino, American Indian and Alaska Native alone \code{"NHoLAIAN"} -#' \item **B03002_006**: not Hispanic or Latino, Asian alone \code{"NHoLA"} -#' \item **B03002_007**: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"NHoLNHOPI"} -#' \item **B03002_008**: not Hispanic or Latino, Some other race alone \code{"NHoLSOR"} -#' \item **B03002_009**: not Hispanic or Latino, Two or more races \code{"NHoLTOMR"} -#' \item **B03002_010**: not Hispanic or Latino, Two races including Some other race \code{"NHoLTRiSOR"} -#' \item **B03002_011**: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"NHoLTReSOR"} -#' \item **B03002_012**: Hispanic or Latino \code{"HoL"} -#' \item **B03002_013**: Hispanic or Latino, white alone \code{"HoLW"} -#' \item **B03002_014**: Hispanic or Latino, Black or African American alone \code{"HoLB"} -#' \item **B03002_015**: Hispanic or Latino, American Indian and Alaska Native alone \code{"HoLAIAN"} -#' \item **B03002_016**: Hispanic or Latino, Asian alone \code{"HoLA"} -#' \item **B03002_017**: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"HoLNHOPI"} -#' \item **B03002_018**: Hispanic or Latino, Some other race alone \code{"HoLSOR"} -#' \item **B03002_019**: Hispanic or Latino, Two or more races \code{"HoLTOMR"} -#' \item **B03002_020**: Hispanic or Latino, Two races including Some other race \code{"HoLTRiSOR"} -#' \item **B03002_021**: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"HoLTReSOR"} +#' \item **B03002_002**: not Hispanic or Latino \code{'NHoL'} +#' \item **B03002_003**: not 
Hispanic or Latino, white alone \code{'NHoLW'} +#' \item **B03002_004**: not Hispanic or Latino, Black or African American alone \code{'NHoLB'} +#' \item **B03002_005**: not Hispanic or Latino, American Indian and Alaska Native alone \code{'NHoLAIAN'} +#' \item **B03002_006**: not Hispanic or Latino, Asian alone \code{'NHoLA'} +#' \item **B03002_007**: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'NHoLNHOPI'} +#' \item **B03002_008**: not Hispanic or Latino, Some other race alone \code{'NHoLSOR'} +#' \item **B03002_009**: not Hispanic or Latino, Two or more races \code{'NHoLTOMR'} +#' \item **B03002_010**: not Hispanic or Latino, Two races including Some other race \code{'NHoLTRiSOR'} +#' \item **B03002_011**: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'NHoLTReSOR'} +#' \item **B03002_012**: Hispanic or Latino \code{'HoL'} +#' \item **B03002_013**: Hispanic or Latino, white alone \code{'HoLW'} +#' \item **B03002_014**: Hispanic or Latino, Black or African American alone \code{'HoLB'} +#' \item **B03002_015**: Hispanic or Latino, American Indian and Alaska Native alone \code{'HoLAIAN'} +#' \item **B03002_016**: Hispanic or Latino, Asian alone \code{'HoLA'} +#' \item **B03002_017**: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'HoLNHOPI'} +#' \item **B03002_018**: Hispanic or Latino, Some other race alone \code{'HoLSOR'} +#' \item **B03002_019**: Hispanic or Latino, Two or more races \code{'HoLTOMR'} +#' \item **B03002_020**: Hispanic or Latino, Two races including Some other race \code{'HoLTRiSOR'} +#' \item **B03002_021**: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'HoLTReSOR'} #' } -#' +#' #' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. 
-#' +#' #' II is some measure of the probability that a member of one subgroup(s) will meet or interact with a member of another subgroup(s) with higher values signifying higher probability of interaction (less isolation). II can range in value from 0 to 1. -#' -#' Larger geographies available include state \code{geo_large = "state"}, county \code{geo_large = "county"}, and census tract \code{geo_large = "tract"} levels. Smaller geographies available include, county \code{geo_small = "county"}, census tract \code{geo_small = "tract"}, and census block group \code{geo_small = "block group"} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the II value returned is NA. -#' +#' +#' Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, and census tract \code{geo_large = 'tract'} levels. Smaller geographies available include, county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the II value returned is NA. +#' #' @return An object of class 'list'. 
This is a named list with the following components: -#' +#' #' \describe{ #' \item{\code{ii}}{An object of class 'tbl' for the GEOID, name, and II at specified larger census geographies.} #' \item{\code{ii_data}}{An object of class 'tbl' for the raw census values at specified smaller census geographies.} #' \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute II.} #' } -#' +#' #' @import dplyr #' @importFrom sf st_drop_geometry #' @importFrom stats complete.cases @@ -58,202 +58,283 @@ #' @importFrom tidyr pivot_longer separate #' @importFrom utils stack #' @export -#' +#' #' @seealso \code{\link[tidycensus]{get_acs}} for additional arguments for geographic extent selection (i.e., \code{state} and \code{county}). #' #' @examples #' \dontrun{ #' # Wrapped in \dontrun{} because these examples require a Census API key. -#' +#' #' # Isolation of non-Hispanic Black vs. non-Hispanic white populations #' ## of census tracts within Georgia, U.S.A., counties (2020) -#' bell(geo_large = "county", geo_small = "tract", state = "GA", -#' year = 2020, subgroup = "NHoLB", subgroup_ixn = "NHoLW") -#' +#' bell( +#' geo_large = 'county', +#' geo_small = 'tract', +#' state = 'GA', +#' year = 2020, +#' subgroup = 'NHoLB', +#' subgroup_ixn = 'NHoLW' +#' ) +#' #' } -#' -bell <- function(geo_large = "county", geo_small = "tract", year = 2020, subgroup, subgroup_ixn, omit_NAs = TRUE, quiet = FALSE, ...) 
{ - - # Check arguments - match.arg(geo_large, choices = c("state", "county", "tract")) - match.arg(geo_small, choices = c("county", "tract", "block group")) - stopifnot(is.numeric(year), year >= 2009) # all variables available 2009 onward - match.arg(subgroup, several.ok = TRUE, - choices = c("NHoL", "NHoLW", "NHoLB", "NHoLAIAN", "NHoLA", "NHoLNHOPI", - "NHoLSOR", "NHoLTOMR", "NHoLTRiSOR", "NHoLTReSOR", - "HoL", "HoLW", "HoLB", "HoLAIAN", "HoLA", "HoLNHOPI", - "HoLSOR", "HoLTOMR", "HoLTRiSOR", "HoLTReSOR")) - match.arg(subgroup_ixn, several.ok = TRUE, - choices = c("NHoL", "NHoLW", "NHoLB", "NHoLAIAN", "NHoLA", "NHoLNHOPI", - "NHoLSOR", "NHoLTOMR", "NHoLTRiSOR", "NHoLTReSOR", - "HoL", "HoLW", "HoLB", "HoLAIAN", "HoLA", "HoLNHOPI", - "HoLSOR", "HoLTOMR", "HoLTRiSOR", "HoLTReSOR")) - - # Select census variables - vars <- c(TotalPop = "B03002_001", - NHoL = "B03002_002", - NHoLW = "B03002_003", - NHoLB = "B03002_004", - NHoLAIAN = "B03002_005", - NHoLA = "B03002_006", - NHoLNHOPI = "B03002_007", - NHoLSOR = "B03002_008", - NHoLTOMR = "B03002_009", - NHoLTRiSOR = "B03002_010", - NHoLTReSOR = "B03002_011", - HoL = "B03002_012", - HoLW = "B03002_013", - HoLB = "B03002_014", - HoLAIAN = "B03002_015", - HoLA = "B03002_016", - HoLNHOPI = "B03002_017", - HoLSOR = "B03002_018", - HoLTOMR = "B03002_019", - HoLTRiSOR = "B03002_020", - HoLTReSOR = "B03002_021") - - selected_vars <- vars[c("TotalPop", subgroup, subgroup_ixn)] - out_names <- names(selected_vars) # save for output - in_subgroup <- paste(subgroup, "E", sep = "") - in_subgroup_ixn <- paste(subgroup_ixn, "E", sep = "") - - # Acquire II variables and sf geometries - ii_data <- suppressMessages(suppressWarnings(tidycensus::get_acs(geography = geo_small, - year = year, - output = "wide", - variables = selected_vars, - geometry = TRUE, - keep_geo_vars = TRUE, ...))) - - # Format output - if (geo_small == "county") { - ii_data <- sf::st_drop_geometry(ii_data) %>% - tidyr::separate(NAME.y, into = c("county", "state"), sep 
= ",") - } - if (geo_small == "tract") { - ii_data <- sf::st_drop_geometry(ii_data) %>% - tidyr::separate(NAME.y, into = c("tract", "county", "state"), sep = ",") %>% - dplyr::mutate(tract = gsub("[^0-9\\.]", "", tract)) - } - if (geo_small == "block group") { - ii_data <- sf::st_drop_geometry(ii_data) %>% - tidyr::separate(NAME.y, into = c("block.group", "tract", "county", "state"), sep = ",") %>% - dplyr::mutate(tract = gsub("[^0-9\\.]", "", tract), - block.group = gsub("[^0-9\\.]", "", block.group)) - } - - # Grouping IDs for II computation - if (geo_large == "tract") { - ii_data <- ii_data %>% - dplyr::mutate(oid = paste(.$STATEFP, .$COUNTYFP, .$TRACTCE, sep = ""), - state = stringr::str_trim(state), - county = stringr::str_trim(county)) - } - if (geo_large == "county") { - ii_data <- ii_data %>% - dplyr::mutate(oid = paste(.$STATEFP, .$COUNTYFP, sep = ""), - state = stringr::str_trim(state), - county = stringr::str_trim(county)) - } - if (geo_large == "state") { - ii_data <- ii_data %>% - dplyr::mutate(oid = .$STATEFP, - state = stringr::str_trim(state)) - } - - # Count of racial/ethnic subgroup populations - ## Count of racial/ethnic comparison subgroup population - if (length(in_subgroup) == 1) { - ii_data <- ii_data %>% - dplyr::mutate(subgroup = .[ , in_subgroup]) - } else { - ii_data <- ii_data %>% - dplyr::mutate(subgroup = rowSums(.[ , in_subgroup])) - } - ## Count of racial/ethnic interaction subgroup population - if (length(in_subgroup_ixn) == 1) { - ii_data <- ii_data %>% - dplyr::mutate(subgroup_ixn = .[ , in_subgroup_ixn]) - } else { - ii_data <- ii_data %>% - dplyr::mutate(subgroup_ixn = rowSums(.[ , in_subgroup_ixn])) - } - - # Compute II - ## From Bell (1954) https://doi.org/10.2307/2574118 - ## _{x}P_{y}^* = \sum_{i=1}^{k} \left ( \frac{x_{i}}{X}\right )\left ( \frac{y_{i}}{n_{i}}\right ) - ## Where for k geographical units i: - ## X denotes the total number of subgroup population in study (reference) area - ## x_{i} denotes the number of 
subgroup population X in geographical unit i - ## y_{i} denotes the number of subgroup population Y in geographical unit i - ## n_{i} denotes the total population of geographical unit i - ## If x_{i} = y_{i}, then computes the average isolation experienced by members of subgroup population X - - ## Compute - IItmp <- ii_data %>% - split(., f = list(ii_data$oid)) %>% - lapply(., FUN = ii_fun, omit_NAs = omit_NAs) %>% - utils::stack(.) %>% - dplyr::mutate(II = values, - oid = ind) %>% - dplyr::select(II, oid) - - # Warning for missingness of census characteristics - missingYN <- ii_data[ , c("TotalPopE", in_subgroup, in_subgroup_ixn)] - names(missingYN) <- out_names - missingYN <- missingYN %>% - tidyr::pivot_longer(cols = dplyr::everything(), - names_to = "variable", - values_to = "val") %>% - dplyr::group_by(variable) %>% - dplyr::summarise(total = dplyr::n(), - n_missing = sum(is.na(val)), - percent_missing = paste0(round(mean(is.na(val)) * 100, 2), " %")) +#' +bell <- function(geo_large = 'county', + geo_small = 'tract', + year = 2020, + subgroup, + subgroup_ixn, + omit_NAs = TRUE, + quiet = FALSE, + ...) 
{ - if (quiet == FALSE) { - # Warning for missing census data - if (sum(missingYN$n_missing) > 0) { - message("Warning: Missing census data") + # Check arguments + match.arg(geo_large, choices = c('state', 'county', 'tract')) + match.arg(geo_small, choices = c('county', 'tract', 'block group')) + stopifnot(is.numeric(year), year >= 2009) # all variables available 2009 onward + match.arg( + subgroup, + several.ok = TRUE, + choices = c( + 'NHoL', + 'NHoLW', + 'NHoLB', + 'NHoLAIAN', + 'NHoLA', + 'NHoLNHOPI', + 'NHoLSOR', + 'NHoLTOMR', + 'NHoLTRiSOR', + 'NHoLTReSOR', + 'HoL', + 'HoLW', + 'HoLB', + 'HoLAIAN', + 'HoLA', + 'HoLNHOPI', + 'HoLSOR', + 'HoLTOMR', + 'HoLTRiSOR', + 'HoLTReSOR' + ) + ) + match.arg( + subgroup_ixn, + several.ok = TRUE, + choices = c( + 'NHoL', + 'NHoLW', + 'NHoLB', + 'NHoLAIAN', + 'NHoLA', + 'NHoLNHOPI', + 'NHoLSOR', + 'NHoLTOMR', + 'NHoLTRiSOR', + 'NHoLTReSOR', + 'HoL', + 'HoLW', + 'HoLB', + 'HoLAIAN', + 'HoLA', + 'HoLNHOPI', + 'HoLSOR', + 'HoLTOMR', + 'HoLTRiSOR', + 'HoLTReSOR' + ) + ) + + # Select census variables + vars <- c( + TotalPop = 'B03002_001', + NHoL = 'B03002_002', + NHoLW = 'B03002_003', + NHoLB = 'B03002_004', + NHoLAIAN = 'B03002_005', + NHoLA = 'B03002_006', + NHoLNHOPI = 'B03002_007', + NHoLSOR = 'B03002_008', + NHoLTOMR = 'B03002_009', + NHoLTRiSOR = 'B03002_010', + NHoLTReSOR = 'B03002_011', + HoL = 'B03002_012', + HoLW = 'B03002_013', + HoLB = 'B03002_014', + HoLAIAN = 'B03002_015', + HoLA = 'B03002_016', + HoLNHOPI = 'B03002_017', + HoLSOR = 'B03002_018', + HoLTOMR = 'B03002_019', + HoLTRiSOR = 'B03002_020', + HoLTReSOR = 'B03002_021' + ) + + selected_vars <- vars[c('TotalPop', subgroup, subgroup_ixn)] + out_names <- names(selected_vars) # save for output + in_subgroup <- paste(subgroup, 'E', sep = '') + in_subgroup_ixn <- paste(subgroup_ixn, 'E', sep = '') + + # Acquire II variables and sf geometries + ii_data <- suppressMessages(suppressWarnings( + tidycensus::get_acs( + geography = geo_small, + year = year, + output = 
'wide', + variables = selected_vars, + geometry = TRUE, + keep_geo_vars = TRUE, + ... + ) + )) + + + # Format output + if (geo_small == 'county') { + ii_data <- ii_data %>% + sf::st_drop_geometry() %>% + tidyr::separate(NAME.y, into = c('county', 'state'), sep = ',') } + if (geo_small == 'tract') { + ii_data <- ii_data %>% + sf::st_drop_geometry() %>% + tidyr::separate(NAME.y, into = c('tract', 'county', 'state'), sep = ',') %>% + dplyr::mutate(tract = gsub('[^0-9\\.]', '', tract)) + } + if (geo_small == 'block group') { + ii_data <- ii_data %>% + sf::st_drop_geometry() %>% + tidyr::separate(NAME.y, into = c('block.group', 'tract', 'county', 'state'), sep = ',') %>% + dplyr::mutate( + tract = gsub('[^0-9\\.]', '', tract), + block.group = gsub('[^0-9\\.]', '', block.group) + ) + } + + # Grouping IDs for II computation + if (geo_large == 'tract') { + ii_data <- ii_data %>% + dplyr::mutate( + oid = paste(.$STATEFP, .$COUNTYFP, .$TRACTCE, sep = ''), + state = stringr::str_trim(state), + county = stringr::str_trim(county) + ) + } + if (geo_large == 'county') { + ii_data <- ii_data %>% + dplyr::mutate( + oid = paste(.$STATEFP, .$COUNTYFP, sep = ''), + state = stringr::str_trim(state), + county = stringr::str_trim(county) + ) + } + if (geo_large == 'state') { + ii_data <- ii_data %>% + dplyr::mutate( + oid = .$STATEFP, + state = stringr::str_trim(state) + ) + } + + # Count of racial/ethnic subgroup populations + ## Count of racial/ethnic comparison subgroup population + if (length(in_subgroup) == 1) { + ii_data <- ii_data %>% + dplyr::mutate(subgroup = .[, in_subgroup]) + } else { + ii_data <- ii_data %>% + dplyr::mutate(subgroup = rowSums(.[, in_subgroup])) + } + ## Count of racial/ethnic interaction subgroup population + if (length(in_subgroup_ixn) == 1) { + ii_data <- ii_data %>% + dplyr::mutate(subgroup_ixn = .[, in_subgroup_ixn]) + } else { + ii_data <- ii_data %>% + dplyr::mutate(subgroup_ixn = rowSums(.[, in_subgroup_ixn])) + } + + # Compute II + ## From Bell 
(1954) https://doi.org/10.2307/2574118 + ## _{x}P_{y}^* = \sum_{i=1}^{k} \left ( \frac{x_{i}}{X}\right )\left ( \frac{y_{i}}{n_{i}}\right ) + ## Where for k geographical units i: + ## X denotes the total number of subgroup population in study (reference) area + ## x_{i} denotes the number of subgroup population X in geographical unit i + ## y_{i} denotes the number of subgroup population Y in geographical unit i + ## n_{i} denotes the total population of geographical unit i + ## If x_{i} = y_{i}, then computes the average isolation experienced by members of subgroup population X + + ## Compute + IItmp <- ii_data %>% + split(., f = list(ii_data$oid)) %>% + lapply(., FUN = ii_fun, omit_NAs = omit_NAs) %>% + utils::stack(.) %>% + dplyr::mutate( + II = values, + oid = ind + ) %>% + dplyr::select(II, oid) + + # Warning for missingness of census characteristics + missingYN <- ii_data[, c('TotalPopE', in_subgroup, in_subgroup_ixn)] + names(missingYN) <- out_names + missingYN <- missingYN %>% + tidyr::pivot_longer( + cols = dplyr::everything(), + names_to = 'variable', + values_to = 'val' + ) %>% + dplyr::group_by(variable) %>% + dplyr::summarise( + total = dplyr::n(), + n_missing = sum(is.na(val)), + percent_missing = paste0(round(mean(is.na(val)) * 100, 2), ' %') + ) + + if (quiet == FALSE) { + # Warning for missing census data + if (sum(missingYN$n_missing) > 0) { + message('Warning: Missing census data') + } + } + + # Format output + if (geo_large == 'state') { + ii <- ii_data %>% + dplyr::left_join(IItmp, by = dplyr::join_by(oid)) %>% + dplyr::select(oid, state, II) %>% + unique(.) %>% + dplyr::mutate(GEOID = oid) %>% + dplyr::select(GEOID, state, II) %>% + .[.$GEOID != 'NANA',] + } + if (geo_large == 'county') { + ii <- ii_data %>% + dplyr::left_join(IItmp, by = dplyr::join_by(oid)) %>% + dplyr::select(oid, state, county, II) %>% + unique(.) 
%>% + dplyr::mutate(GEOID = oid) %>% + dplyr::select(GEOID, state, county, II) %>% + .[.$GEOID != 'NANA',] + } + if (geo_large == 'tract') { + ii <- ii_data %>% + dplyr::left_join(IItmp, by = dplyr::join_by(oid)) %>% + dplyr::select(oid, state, county, tract, II) %>% + unique(.) %>% + dplyr::mutate(GEOID = oid) %>% + dplyr::select(GEOID, state, county, tract, II) %>% + .[.$GEOID != 'NANA',] + } + + ii <- ii %>% + dplyr::arrange(GEOID) %>% + dplyr::as_tibble() + + ii_data <- ii_data %>% + dplyr::arrange(GEOID) %>% + dplyr::as_tibble() + + out <- list(ii = ii, ii_data = ii_data, missing = missingYN) + + return(out) } - - # Format output - if (geo_large == "state") { - ii <- merge(ii_data, IItmp) %>% - dplyr::select(oid, state, II) %>% - unique(.) %>% - dplyr::mutate(GEOID = oid) %>% - dplyr::select(GEOID, state, II) %>% - .[.$GEOID != "NANA", ] - } - if (geo_large == "county") { - ii <- merge(ii_data, IItmp) %>% - dplyr::select(oid, state, county, II) %>% - unique(.) %>% - dplyr::mutate(GEOID = oid) %>% - dplyr::select(GEOID, state, county, II) %>% - .[.$GEOID != "NANA", ] - } - if (geo_large == "tract") { - ii <- merge(ii_data, IItmp) %>% - dplyr::select(oid, state, county, tract, II) %>% - unique(.) %>% - dplyr::mutate(GEOID = oid) %>% - dplyr::select(GEOID, state, county, tract, II) %>% - .[.$GEOID != "NANA", ] - } - - ii <- ii %>% - dplyr::arrange(GEOID) %>% - dplyr::as_tibble() - - ii_data <- ii_data %>% - dplyr::arrange(GEOID) %>% - dplyr::as_tibble() - - out <- list(ii = ii, - ii_data = ii_data, - missing = missingYN) - - return(out) -} diff --git a/R/bemanian_beyer.R b/R/bemanian_beyer.R index b97ee28..fc3b080 100644 --- a/R/bemanian_beyer.R +++ b/R/bemanian_beyer.R @@ -1,9 +1,9 @@ #' Local Exposure and Isolation metric based on Bemanian & Beyer (2017) -#' +#' #' Compute the aspatial Local Exposure and Isolation (Bemanian & Beyer) metric of a selected racial/ethnic subgroup(s) and U.S. geographies. 
#' -#' @param geo_large Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = "county"}. -#' @param geo_small Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_large = "tract"}. +#' @param geo_large Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = 'county'}. +#' @param geo_small Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_small = 'tract'}. #' @param year Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available. #' @param subgroup Character string specifying the racial/ethnic subgroup(s). See Details for available choices. #' @param subgroup_ixn Character string specifying the racial/ethnic subgroup(s) as the interaction population. If the same as \code{subgroup}, will compute the simple isolation of the group. See Details for available choices. @@ -12,47 +12,47 @@ #' @param ... Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics #' #' @details This function will compute the aspatial Local Exposure and Isolation (LEx/Is) metric of selected racial/ethnic subgroups and U.S. geographies for a specified geographical extent (e.g., the entire U.S. or a single state) based on Bemanian & Beyer (2017) \doi{10.1158/1055-9965.EPI-16-0926}. This function provides the computation of LEx/Is for any of the U.S. Census Bureau race/ethnicity subgroups (including Hispanic and non-Hispanic individuals). -#' +#' #' The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the aspatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available but are available from other U.S.
Census Bureau surveys. The twenty racial/ethnic subgroups (U.S. Census Bureau definitions) are: #' \itemize{ -#' \item **B03002_002**: not Hispanic or Latino \code{"NHoL"} -#' \item **B03002_003**: not Hispanic or Latino, white alone \code{"NHoLW"} -#' \item **B03002_004**: not Hispanic or Latino, Black or African American alone \code{"NHoLB"} -#' \item **B03002_005**: not Hispanic or Latino, American Indian and Alaska Native alone \code{"NHoLAIAN"} -#' \item **B03002_006**: not Hispanic or Latino, Asian alone \code{"NHoLA"} -#' \item **B03002_007**: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"NHoLNHOPI"} -#' \item **B03002_008**: not Hispanic or Latino, Some other race alone \code{"NHoLSOR"} -#' \item **B03002_009**: not Hispanic or Latino, Two or more races \code{"NHoLTOMR"} -#' \item **B03002_010**: not Hispanic or Latino, Two races including Some other race \code{"NHoLTRiSOR"} -#' \item **B03002_011**: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"NHoLTReSOR"} -#' \item **B03002_012**: Hispanic or Latino \code{"HoL"} -#' \item **B03002_013**: Hispanic or Latino, white alone \code{"HoLW"} -#' \item **B03002_014**: Hispanic or Latino, Black or African American alone \code{"HoLB"} -#' \item **B03002_015**: Hispanic or Latino, American Indian and Alaska Native alone \code{"HoLAIAN"} -#' \item **B03002_016**: Hispanic or Latino, Asian alone \code{"HoLA"} -#' \item **B03002_017**: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"HoLNHOPI"} -#' \item **B03002_018**: Hispanic or Latino, Some other race alone \code{"HoLSOR"} -#' \item **B03002_019**: Hispanic or Latino, Two or more races \code{"HoLTOMR"} -#' \item **B03002_020**: Hispanic or Latino, Two races including Some other race \code{"HoLTRiSOR"} -#' \item **B03002_021**: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"HoLTReSOR"} +#' \item **B03002_002**: not 
Hispanic or Latino \code{'NHoL'} +#' \item **B03002_003**: not Hispanic or Latino, white alone \code{'NHoLW'} +#' \item **B03002_004**: not Hispanic or Latino, Black or African American alone \code{'NHoLB'} +#' \item **B03002_005**: not Hispanic or Latino, American Indian and Alaska Native alone \code{'NHoLAIAN'} +#' \item **B03002_006**: not Hispanic or Latino, Asian alone \code{'NHoLA'} +#' \item **B03002_007**: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'NHoLNHOPI'} +#' \item **B03002_008**: not Hispanic or Latino, Some other race alone \code{'NHoLSOR'} +#' \item **B03002_009**: not Hispanic or Latino, Two or more races \code{'NHoLTOMR'} +#' \item **B03002_010**: not Hispanic or Latino, Two races including Some other race \code{'NHoLTRiSOR'} +#' \item **B03002_011**: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'NHoLTReSOR'} +#' \item **B03002_012**: Hispanic or Latino \code{'HoL'} +#' \item **B03002_013**: Hispanic or Latino, white alone \code{'HoLW'} +#' \item **B03002_014**: Hispanic or Latino, Black or African American alone \code{'HoLB'} +#' \item **B03002_015**: Hispanic or Latino, American Indian and Alaska Native alone \code{'HoLAIAN'} +#' \item **B03002_016**: Hispanic or Latino, Asian alone \code{'HoLA'} +#' \item **B03002_017**: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'HoLNHOPI'} +#' \item **B03002_018**: Hispanic or Latino, Some other race alone \code{'HoLSOR'} +#' \item **B03002_019**: Hispanic or Latino, Two or more races \code{'HoLTOMR'} +#' \item **B03002_020**: Hispanic or Latino, Two races including Some other race \code{'HoLTRiSOR'} +#' \item **B03002_021**: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'HoLTReSOR'} #' } -#' +#' #' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data 
output. -#' +#' #' LEx/Is is a measure of the probability that two individuals living within a specific smaller geography (e.g., census tract) of either different (i.e., exposure) or the same (i.e., isolation) racial/ethnic subgroup(s) will interact, assuming that individuals within a smaller geography are randomly mixed. LEx/Is is standardized with a logit transformation and centered against an expected case that all races/ethnicities are evenly distributed across a larger geography. (Note: will adjust data by 0.025 if probabilities are zero, one, or undefined. The output will include a warning if adjusted. See \code{\link[car]{logit}} for additional details.) -#' +#' #' LEx/Is can range from negative infinity to infinity. If LEx/Is is zero then the estimated probability of the interaction between two people of the given subgroup(s) within a smaller geography is equal to the expected probability if the subgroup(s) were perfectly mixed in the larger geography. If LEx/Is is greater than zero then the interaction is more likely to occur within the smaller geography than in the larger geography, and if LEx/Is is less than zero then the interaction is less likely to occur within the smaller geography than in the larger geography. Note: the exponentiation of each LEx/Is metric results in the odds ratio of the specific exposure or isolation of interest in a smaller geography relative to the larger geography. -#' -#' Larger geographies available include state \code{geo_large = "state"}, county \code{geo_large = "county"}, and census tract \code{geo_large = "tract"} levels. Smaller geographies available include, county \code{geo_small = "county"}, census tract \code{geo_small = "tract"}, and census block group \code{geo_small = "block group"} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the LEx/Is value returned is NA. 
-#' +#' +#' Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, and census tract \code{geo_large = 'tract'} levels. Smaller geographies available include county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S. county contains only one census tract), then the LEx/Is value returned is NA. +#' #' @return An object of class 'list'. This is a named list with the following components: -#' +#' #' \describe{ #' \item{\code{lexis}}{An object of class 'tbl' for the GEOID, name, and LEx/Is at specified smaller census geographies.} #' \item{\code{lexis_data}}{An object of class 'tbl' for the raw census values at specified smaller census geographies.} #' \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute LEx/Is.} #' } -#' +#' #' @import dplyr #' @importFrom car logit #' @importFrom sf st_drop_geometry @@ -61,196 +61,272 @@ #' @importFrom tidyr pivot_longer separate #' @importFrom utils stack #' @export -#' +#' #' @seealso \code{\link[tidycensus]{get_acs}} for additional arguments for geographic extent selection (i.e., \code{state} and \code{county}). #' #' @examples #' \dontrun{ #' # Wrapped in \dontrun{} because these examples require a Census API key. -#' +#' #' # Isolation of non-Hispanic Black vs.
non-Hispanic white populations #' ## of census tracts within Georgia, U.S.A., counties (2020) -#' bemanian_beyer(geo_large = "county", geo_small = "tract", state = "GA", -#' year = 2020, subgroup = "NHoLB", subgroup_ixn = "NHoLW") -#' +#' bemanian_beyer( +#' geo_large = 'county', +#' geo_small = 'tract', +#' state = 'GA', +#' year = 2020, +#' subgroup = 'NHoLB', +#' subgroup_ixn = 'NHoLW' +#' ) +#' #' } -#' -bemanian_beyer <- function(geo_large = "county", geo_small = "tract", year = 2020, subgroup, subgroup_ixn, omit_NAs = TRUE, quiet = FALSE, ...) { - - # Check arguments - match.arg(geo_large, choices = c("state", "county", "tract")) - match.arg(geo_small, choices = c("county", "tract", "block group")) - stopifnot(is.numeric(year), year >= 2009) # all variables available 2009 onward - match.arg(subgroup, several.ok = TRUE, - choices = c("NHoL", "NHoLW", "NHoLB", "NHoLAIAN", "NHoLA", "NHoLNHOPI", - "NHoLSOR", "NHoLTOMR", "NHoLTRiSOR", "NHoLTReSOR", - "HoL", "HoLW", "HoLB", "HoLAIAN", "HoLA", "HoLNHOPI", - "HoLSOR", "HoLTOMR", "HoLTRiSOR", "HoLTReSOR")) - match.arg(subgroup_ixn, several.ok = TRUE, - choices = c("NHoL", "NHoLW", "NHoLB", "NHoLAIAN", "NHoLA", "NHoLNHOPI", - "NHoLSOR", "NHoLTOMR", "NHoLTRiSOR", "NHoLTReSOR", - "HoL", "HoLW", "HoLB", "HoLAIAN", "HoLA", "HoLNHOPI", - "HoLSOR", "HoLTOMR", "HoLTRiSOR", "HoLTReSOR")) - - # Select census variables - vars <- c(TotalPop = "B03002_001", - NHoL = "B03002_002", - NHoLW = "B03002_003", - NHoLB = "B03002_004", - NHoLAIAN = "B03002_005", - NHoLA = "B03002_006", - NHoLNHOPI = "B03002_007", - NHoLSOR = "B03002_008", - NHoLTOMR = "B03002_009", - NHoLTRiSOR = "B03002_010", - NHoLTReSOR = "B03002_011", - HoL = "B03002_012", - HoLW = "B03002_013", - HoLB = "B03002_014", - HoLAIAN = "B03002_015", - HoLA = "B03002_016", - HoLNHOPI = "B03002_017", - HoLSOR = "B03002_018", - HoLTOMR = "B03002_019", - HoLTRiSOR = "B03002_020", - HoLTReSOR = "B03002_021") - - selected_vars <- vars[c("TotalPop", subgroup, subgroup_ixn)] - 
out_names <- names(selected_vars) # save for output - in_subgroup <- paste(subgroup, "E", sep = "") - in_subgroup_ixn <- paste(subgroup_ixn, "E", sep = "") - - # Acquire LEx/Is variables and sf geometries - lexis_data <- suppressMessages(suppressWarnings(tidycensus::get_acs(geography = geo_small, - year = year, - output = "wide", - variables = selected_vars, - geometry = TRUE, - keep_geo_vars = TRUE, ...))) - - # Format output - if (geo_small == "county") { - lexis_data <- sf::st_drop_geometry(lexis_data) %>% - tidyr::separate(NAME.y, into = c("county", "state"), sep = ",") - } - if (geo_small == "tract") { - lexis_data <- sf::st_drop_geometry(lexis_data) %>% - tidyr::separate(NAME.y, into = c("tract", "county", "state"), sep = ",") %>% - dplyr::mutate(tract = gsub("[^0-9\\.]", "", tract)) - } - if (geo_small == "block group") { - lexis_data <- sf::st_drop_geometry(lexis_data) %>% - tidyr::separate(NAME.y, into = c("block.group", "tract", "county", "state"), sep = ",") %>% - dplyr::mutate(tract = gsub("[^0-9\\.]", "", tract), - block.group = gsub("[^0-9\\.]", "", block.group)) - } - - # Grouping IDs for LEx/Is computation - if (geo_large == "tract") { - lexis_data <- lexis_data %>% - dplyr::mutate(oid = paste(.$STATEFP, .$COUNTYFP, .$TRACTCE, sep = ""), - state = stringr::str_trim(state), - county = stringr::str_trim(county)) - } - if (geo_large == "county") { - lexis_data <- lexis_data %>% - dplyr::mutate(oid = paste(.$STATEFP, .$COUNTYFP, sep = ""), - state = stringr::str_trim(state), - county = stringr::str_trim(county)) - } - if (geo_large == "state") { - lexis_data <- lexis_data %>% - dplyr::mutate(oid = .$STATEFP, - state = stringr::str_trim(state)) - } - - # Count of racial/ethnic subgroup populations - ## Count of racial/ethnic comparison subgroup population - if (length(in_subgroup) == 1) { - lexis_data <- lexis_data %>% - dplyr::mutate(subgroup = .[ , in_subgroup]) - } else { - lexis_data <- lexis_data %>% - dplyr::mutate(subgroup = rowSums(.[ , 
in_subgroup])) - } - ## Count of racial/ethnic interaction subgroup population - if (length(in_subgroup_ixn) == 1) { - lexis_data <- lexis_data %>% - dplyr::mutate(subgroup_ixn = .[ , in_subgroup_ixn]) - } else { - lexis_data <- lexis_data %>% - dplyr::mutate(subgroup_ixn = rowSums(.[ , in_subgroup_ixn])) - } - - # Compute LEx/Is - ## From Bemanian & Beyer (2017) https://doi.org/10.1158/1055-9965.EPI-16-0926 - ## E^*_{m,n}(i) = log\left(\frac{p_{im} \times p_{in}}{1 - p_{im} \times p_{in}}\right) - log\left(\frac{P_{m} \times P_{n}}{1 - P_{m} \times P_{n}}\right) - ## Where for smaller geographical unit i: - ## p_{im} denotes the number of subgroup population m in smaller geographical unit i - ## p_{in} denotes the number of subgroup population n in smaller geographical unit i - ## P_{m} denotes the number of subgroup population m in larger geographical unit within which the smaller geographic unit i is located - ## P_{n} denotes the number of subgroup population n in larger geographical unit within which the smaller geographic unit i is located - ## If m \ne n, then computes the exposure of members of subgroup populations m and n - ## If m = n, then computes the simple isolation experienced by members of subgroup population m - - ## Compute - LExIstmp <- lexis_data %>% - split(., f = list(lexis_data$oid)) %>% - lapply(., FUN = lexis_fun, omit_NAs = omit_NAs) %>% - do.call("rbind", .) 
- - # Warning for missingness of census characteristics - missingYN <- lexis_data[ , c("TotalPopE", in_subgroup, in_subgroup_ixn)] - names(missingYN) <- out_names - missingYN <- missingYN %>% - tidyr::pivot_longer(cols = dplyr::everything(), - names_to = "variable", - values_to = "val") %>% - dplyr::group_by(variable) %>% - dplyr::summarise(total = dplyr::n(), - n_missing = sum(is.na(val)), - percent_missing = paste0(round(mean(is.na(val)) * 100, 2), " %")) +#' +bemanian_beyer <- function(geo_large = 'county', + geo_small = 'tract', + year = 2020, + subgroup, + subgroup_ixn, + omit_NAs = TRUE, + quiet = FALSE, + ...) { - if (quiet == FALSE) { - # Warning for missing census data - if (sum(missingYN$n_missing) > 0) { - message("Warning: Missing census data") + # Check arguments + match.arg(geo_large, choices = c('state', 'county', 'tract')) + match.arg(geo_small, choices = c('county', 'tract', 'block group')) + stopifnot(is.numeric(year), year >= 2009) # all variables available 2009 onward + match.arg( + subgroup, + several.ok = TRUE, + choices = c( + 'NHoL', + 'NHoLW', + 'NHoLB', + 'NHoLAIAN', + 'NHoLA', + 'NHoLNHOPI', + 'NHoLSOR', + 'NHoLTOMR', + 'NHoLTRiSOR', + 'NHoLTReSOR', + 'HoL', + 'HoLW', + 'HoLB', + 'HoLAIAN', + 'HoLA', + 'HoLNHOPI', + 'HoLSOR', + 'HoLTOMR', + 'HoLTRiSOR', + 'HoLTReSOR' + ) + ) + match.arg( + subgroup_ixn, + several.ok = TRUE, + choices = c( + 'NHoL', + 'NHoLW', + 'NHoLB', + 'NHoLAIAN', + 'NHoLA', + 'NHoLNHOPI', + 'NHoLSOR', + 'NHoLTOMR', + 'NHoLTRiSOR', + 'NHoLTReSOR', + 'HoL', + 'HoLW', + 'HoLB', + 'HoLAIAN', + 'HoLA', + 'HoLNHOPI', + 'HoLSOR', + 'HoLTOMR', + 'HoLTRiSOR', + 'HoLTReSOR' + ) + ) + + # Select census variables + vars <- c( + TotalPop = 'B03002_001', + NHoL = 'B03002_002', + NHoLW = 'B03002_003', + NHoLB = 'B03002_004', + NHoLAIAN = 'B03002_005', + NHoLA = 'B03002_006', + NHoLNHOPI = 'B03002_007', + NHoLSOR = 'B03002_008', + NHoLTOMR = 'B03002_009', + NHoLTRiSOR = 'B03002_010', + NHoLTReSOR = 'B03002_011', + HoL = 'B03002_012', 
+ HoLW = 'B03002_013', + HoLB = 'B03002_014', + HoLAIAN = 'B03002_015', + HoLA = 'B03002_016', + HoLNHOPI = 'B03002_017', + HoLSOR = 'B03002_018', + HoLTOMR = 'B03002_019', + HoLTRiSOR = 'B03002_020', + HoLTReSOR = 'B03002_021' + ) + + selected_vars <- vars[c('TotalPop', subgroup, subgroup_ixn)] + out_names <- names(selected_vars) # save for output + in_subgroup <- paste(subgroup, 'E', sep = '') + in_subgroup_ixn <- paste(subgroup_ixn, 'E', sep = '') + + # Acquire LEx/Is variables and sf geometries + lexis_data <- suppressMessages(suppressWarnings( + tidycensus::get_acs( + geography = geo_small, + year = year, + output = 'wide', + variables = selected_vars, + geometry = TRUE, + keep_geo_vars = TRUE, + ... + ) + )) + + # Format output + if (geo_small == 'county') { + lexis_data <- lexis_data %>% + sf::st_drop_geometry() %>% + tidyr::separate(NAME.y, into = c('county', 'state'), sep = ',') } - } - - # Format output - lexis <- merge(lexis_data, LExIstmp) - - if (geo_small == "state") { - lexis <- lexis %>% - dplyr::select(GEOID, state, LExIs) - } - if (geo_small == "county") { - lexis <- lexis %>% - dplyr::select(GEOID, state, county, LExIs) - } - if (geo_small == "tract") { - lexis <- lexis %>% - dplyr::select(GEOID, state, county, tract, LExIs) - } - if (geo_small == "block group") { + if (geo_small == 'tract') { + lexis_data <- lexis_data %>% + sf::st_drop_geometry() %>% + tidyr::separate(NAME.y, into = c('tract', 'county', 'state'), sep = ',') %>% + dplyr::mutate(tract = gsub('[^0-9\\.]', '', tract)) + } + if (geo_small == 'block group') { + lexis_data <- lexis_data %>% + sf::st_drop_geometry() %>% + tidyr::separate(NAME.y, into = c('block.group', 'tract', 'county', 'state'), sep = ',') %>% + dplyr::mutate( + tract = gsub('[^0-9\\.]', '', tract), + block.group = gsub('[^0-9\\.]', '', block.group) + ) + } + + # Grouping IDs for LEx/Is computation + if (geo_large == 'tract') { + lexis_data <- lexis_data %>% + dplyr::mutate( + oid = paste(.$STATEFP, .$COUNTYFP, 
.$TRACTCE, sep = ''), + state = stringr::str_trim(state), + county = stringr::str_trim(county) + ) + } + if (geo_large == 'county') { + lexis_data <- lexis_data %>% + dplyr::mutate( + oid = paste(.$STATEFP, .$COUNTYFP, sep = ''), + state = stringr::str_trim(state), + county = stringr::str_trim(county) + ) + } + if (geo_large == 'state') { + lexis_data <- lexis_data %>% + dplyr::mutate( + oid = .$STATEFP, + state = stringr::str_trim(state) + ) + } + + # Count of racial/ethnic subgroup populations + ## Count of racial/ethnic comparison subgroup population + if (length(in_subgroup) == 1) { + lexis_data <- lexis_data %>% + dplyr::mutate(subgroup = .[, in_subgroup]) + } else { + lexis_data <- lexis_data %>% + dplyr::mutate(subgroup = rowSums(.[, in_subgroup])) + } + ## Count of racial/ethnic interaction subgroup population + if (length(in_subgroup_ixn) == 1) { + lexis_data <- lexis_data %>% + dplyr::mutate(subgroup_ixn = .[, in_subgroup_ixn]) + } else { + lexis_data <- lexis_data %>% + dplyr::mutate(subgroup_ixn = rowSums(.[, in_subgroup_ixn])) + } + + # Compute LEx/Is + ## From Bemanian & Beyer (2017) https://doi.org/10.1158/1055-9965.EPI-16-0926 + ## E^*_{m,n}(i) = log\left(\frac{p_{im} \times p_{in}}{1 - p_{im} \times p_{in}}\right) - log\left(\frac{P_{m} \times P_{n}}{1 - P_{m} \times P_{n}}\right) + ## Where for smaller geographical unit i: + ## p_{im} denotes the number of subgroup population m in smaller geographical unit i + ## p_{in} denotes the number of subgroup population n in smaller geographical unit i + ## P_{m} denotes the number of subgroup population m in larger geographical unit within which the smaller geographic unit i is located + ## P_{n} denotes the number of subgroup population n in larger geographical unit within which the smaller geographic unit i is located + ## If m \ne n, then computes the exposure of members of subgroup populations m and n + ## If m = n, then computes the simple isolation experienced by members of subgroup population m + + 
## Compute + LExIstmp <- lexis_data %>% + split(., f = list(lexis_data$oid)) %>% + lapply(., FUN = lexis_fun, omit_NAs = omit_NAs) %>% + do.call('rbind', .) + + # Warning for missingness of census characteristics + missingYN <- lexis_data[, c('TotalPopE', in_subgroup, in_subgroup_ixn)] + names(missingYN) <- out_names + missingYN <- missingYN %>% + tidyr::pivot_longer( + cols = dplyr::everything(), + names_to = 'variable', + values_to = 'val' + ) %>% + dplyr::group_by(variable) %>% + dplyr::summarise( + total = dplyr::n(), + n_missing = sum(is.na(val)), + percent_missing = paste0(round(mean(is.na(val)) * 100, 2), ' %') + ) + + if (quiet == FALSE) { + # Warning for missing census data + if (sum(missingYN$n_missing) > 0) { + message('Warning: Missing census data') + } + } + + # Format output + lexis <- lexis_data %>% + dplyr::left_join(LExIstmp, by = dplyr::join_by(GEOID)) + + if (geo_small == 'state') { + lexis <- lexis %>% + dplyr::select(GEOID, state, LExIs) + } + if (geo_small == 'county') { + lexis <- lexis %>% + dplyr::select(GEOID, state, county, LExIs) + } + if (geo_small == 'tract') { + lexis <- lexis %>% + dplyr::select(GEOID, state, county, tract, LExIs) + } + if (geo_small == 'block group') { + lexis <- lexis %>% + dplyr::select(GEOID, state, county, tract, block.group, LExIs) + } + lexis <- lexis %>% - dplyr::select(GEOID, state, county, tract, block.group, LExIs) + unique(.) %>% + .[.$GEOID != 'NANA',] %>% + dplyr::arrange(GEOID) %>% + dplyr::as_tibble() + + lexis_data <- lexis_data %>% + dplyr::arrange(GEOID) %>% + dplyr::as_tibble() + + out <- list(lexis = lexis, lexis_data = lexis_data, missing = missingYN) + + return(out) } - - lexis <- lexis %>% - unique(.) 
%>% - .[.$GEOID != "NANA", ] %>% - dplyr::arrange(GEOID) %>% - dplyr::as_tibble() - - lexis_data <- lexis_data %>% - dplyr::arrange(GEOID) %>% - dplyr::as_tibble() - - out <- list(lexis = lexis, - lexis_data = lexis_data, - missing = missingYN) - - return(out) -} diff --git a/R/bravo.R b/R/bravo.R index b1a424f..46abceb 100644 --- a/R/bravo.R +++ b/R/bravo.R @@ -1,36 +1,36 @@ -#' Educational Isolation Index based on Bravo _et al._ (2021) -#' +#' Educational Isolation Index based on Bravo et al. (2021) +#' #' Compute the spatial Educational Isolation Index (Bravo) of selected educational attainment category(ies). #' -#' @param geo Character string specifying the geography of the data either census tracts \code{geo = "tract"} (the default) or counties \code{geo = "county"}. +#' @param geo Character string specifying the geography of the data either census tracts \code{geo = 'tract'} (the default) or counties \code{geo = 'county'}. #' @param year Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available. #' @param subgroup Character string specifying the educational attainment category(ies). See Details for available choices. #' @param quiet Logical. If TRUE, will display messages about potential missing census information. The default is FALSE. #' @param ... Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics #' -#' @details This function will compute the spatial Educational Isolation Index (EI) of U.S. census tracts or counties for a specified geographical extent (e.g., the entire U.S. or a single state) based on Bravo _et al._ (2021) \doi{10.3390/ijerph18179384} who originally designed the metric for the educational isolation of individual without a college degree. This function provides the computation of EI for any of the U.S. Census Bureau educational attainment levels. 
-#' -#' The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the geospatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available but are available from other U.S. Census Bureau surveys. The five educational attainment levels (U.S. Census Bureau definitions) are: +#' @details This function will compute the spatial Educational Isolation Index (EI) of U.S. census tracts or counties for a specified geographical extent (e.g., the entire U.S. or a single state) based on Bravo et al. (2021) \doi{10.3390/ijerph18179384} who originally designed the metric for the educational isolation of individuals without a college degree. This function provides the computation of EI for any of the U.S. Census Bureau educational attainment levels. +#' +#' The function uses \code{\link[tidycensus]{get_acs}} to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the geospatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available but are available from other U.S. Census Bureau surveys. The five educational attainment levels (U.S.
Census Bureau definitions) are: #' \itemize{ -#' \item **B06009_002**: Less than high school graduate \code{"LtHS"} -#' \item **B06009_003**: High school graduate (includes equivalency) \code{"HSGiE"} -#' \item **B06009_004**: Some college or associate's degree \code{"SCoAD"} -#' \item **B06009_005**: Bachelor's degree \code{"BD"} -#' \item **B06009_006**: Graduate or professional degree \code{"GoPD"} +#' \item **B06009_002**: Less than high school graduate \code{'LtHS'} +#' \item **B06009_003**: High school graduate (includes equivalency) \code{'HSGiE'} +#' \item **B06009_004**: Some college or associate's degree \code{'SCoAD'} +#' \item **B06009_005**: Bachelor's degree \code{'BD'} +#' \item **B06009_006**: Graduate or professional degree \code{'GoPD'} #' } #' Note: If \code{year = 2009}, then the ACS-5 data (2005-2009) are from the **B15002** question. -#' +#' #' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. NOTE: Current version does not correct for edge effects (e.g., census geographies along the specified spatial extent border, coastline, or U.S.-Mexico / U.S.-Canada border) may have few neighboring census geographies, and EI values in these census geographies may be unstable. A stop-gap solution for the former source of edge effect is to compute the EI for neighboring census geographies (i.e., the states bordering a study area of interest) and then use the estimates of the study area of interest. -#' +#' #' A census geography (and its neighbors) that has nearly all of its population with the specified educational attainment category (e.g., a Bachelor's degree or more) will have an EI value close to 1. In contrast, a census geography (and its neighbors) that is nearly none of its population with the specified educational attainment category (e.g., less than a Bachelor's degree) will have an EI value close to 0. 
-#' +#' #' @return An object of class 'list'. This is a named list with the following components: -#' +#' #' \describe{ #' \item{\code{ei}}{An object of class 'tbl' for the GEOID, name, EI, and raw census values of specified census geographies.} #' \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute EI.} #' } -#' +#' #' @import dplyr #' @importFrom Matrix sparseMatrix #' @importFrom sf st_drop_geometry st_geometry st_intersects @@ -39,182 +39,290 @@ #' @importFrom tidycensus get_acs #' @importFrom tidyr pivot_longer separate #' @export -#' +#' #' @seealso \code{\link[tidycensus]{get_acs}} for additional arguments for geographic extent selection (i.e., \code{state} and \code{county}). #' #' @examples #' \dontrun{ #' # Wrapped in \dontrun{} because these examples require a Census API key. -#' +#' #' # Tract-level metric (2020) -#' bravo(geo = "tract", state = "GA", -#' year = 2020, subgroup = c("LtHS", "HSGiE")) -#' +#' bravo( +#' geo = 'tract', +#' state = 'GA', +#' year = 2020, +#' subgroup = c('LtHS', 'HSGiE') +#' ) +#' #' # County-level metric (2020) -#' bravo(geo = "county", state = "GA", -#' year = 2020, subgroup = c("LtHS", "HSGiE")) -#' +#' bravo( +#' geo = 'county', +#' state = 'GA', +#' year = 2020, +#' subgroup = c('LtHS', 'HSGiE') +#' ) +#' #' } -#' -bravo <- function(geo = "tract", year = 2020, subgroup, quiet = FALSE, ...) 
{ - - # Check arguments - match.arg(geo, choices = c("county", "tract")) - stopifnot(is.numeric(year), year >= 2009) # all variables available 2009 onward - match.arg(subgroup, several.ok = TRUE, - choices = c("LtHS", "HSGiE", "SCoAD", "BD", "GoPD")) - - # Select census variables - vars <- c(TotalPop = "B06009_001", - LtHS = "B06009_002", - HSGiE = "B06009_003", - SCoAD = "B06009_004", - BD = "B06009_005", - GoPD = "B06009_006") - - selected_vars <- vars[c("TotalPop", subgroup)] +#' +bravo <- function(geo = 'tract', + year = 2020, + subgroup, + quiet = FALSE, + ...) { - if (year == 2009) { - vars <- matrix(c("TotalPop", "TotalPop", "B15002_001", - "LtHS", "mNSC", "B15002_003", - "LtHS", "mNt4G", "B15002_004", - "LtHS", "m5t6G", "B15002_005", - "LtHS", "m7t8G", "B15002_006", - "LtHS", "m9G", "B15002_007", - "LtHS", "m10G", "B15002_008", - "LtHS", "m11G", "B15002_009", - "LtHS", "m12GND", "B15002_010", - "HSGiE", "mHSGGEDoA", "B15002_011", - "SCoAD", "mSClt1Y", "B15002_012", - "SCoAD", "mSC1oMYND", "B15002_013", - "SCoAD", "mAD", "B15002_014", - "BD", "mBD", "B15002_015", - "GoPD", "mMD", "B15002_016", - "GoPD", "mPSD", "B15002_017", - "GoPD", "mDD", "B15002_018", - "LtHS", "fNSC", "B15002_020", - "LtHS", "fNt4G", "B15002_021", - "LtHS", "f5t6G", "B15002_022", - "LtHS", "f7t8G", "B15002_023", - "LtHS", "f9G", "B15002_024", - "LtHS", "f10G", "B15002_025", - "LtHS", "f11G", "B15002_026", - "LtHS", "f12GND", "B15002_027", - "HSGiE", "fHSGGEDoA", "B15002_028", - "SCoAD", "fSClt1Y", "B15002_029", - "SCoAD", "fSC1oMYND", "B15002_030", - "SCoAD", "fAD", "B15002_031", - "BD", "fBD", "B15002_032", - "GoPD", "fMD", "B15002_033", - "GoPD", "fPSD", "B15002_034", - "GoPD", "fDD", "B15002_035"), nrow = 33, ncol = 3, byrow = TRUE) + # Check arguments + match.arg(geo, choices = c('county', 'tract')) + stopifnot(is.numeric(year), year >= 2009) # all variables available 2009 onward + match.arg( + subgroup, + several.ok = TRUE, + choices = c('LtHS', 'HSGiE', 'SCoAD', 'BD', 'GoPD') + ) 
+ + # Select census variables + vars <- c( + TotalPop = 'B06009_001', + LtHS = 'B06009_002', + HSGiE = 'B06009_003', + SCoAD = 'B06009_004', + BD = 'B06009_005', + GoPD = 'B06009_006' + ) + + selected_vars <- vars[c('TotalPop', subgroup)] + + if (year == 2009) { + vars <- matrix( + c( + 'TotalPop', + 'TotalPop', + 'B15002_001', + 'LtHS', + 'mNSC', + 'B15002_003', + 'LtHS', + 'mNt4G', + 'B15002_004', + 'LtHS', + 'm5t6G', + 'B15002_005', + 'LtHS', + 'm7t8G', + 'B15002_006', + 'LtHS', + 'm9G', + 'B15002_007', + 'LtHS', + 'm10G', + 'B15002_008', + 'LtHS', + 'm11G', + 'B15002_009', + 'LtHS', + 'm12GND', + 'B15002_010', + 'HSGiE', + 'mHSGGEDoA', + 'B15002_011', + 'SCoAD', + 'mSClt1Y', + 'B15002_012', + 'SCoAD', + 'mSC1oMYND', + 'B15002_013', + 'SCoAD', + 'mAD', + 'B15002_014', + 'BD', + 'mBD', + 'B15002_015', + 'GoPD', + 'mMD', + 'B15002_016', + 'GoPD', + 'mPSD', + 'B15002_017', + 'GoPD', + 'mDD', + 'B15002_018', + 'LtHS', + 'fNSC', + 'B15002_020', + 'LtHS', + 'fNt4G', + 'B15002_021', + 'LtHS', + 'f5t6G', + 'B15002_022', + 'LtHS', + 'f7t8G', + 'B15002_023', + 'LtHS', + 'f9G', + 'B15002_024', + 'LtHS', + 'f10G', + 'B15002_025', + 'LtHS', + 'f11G', + 'B15002_026', + 'LtHS', + 'f12GND', + 'B15002_027', + 'HSGiE', + 'fHSGGEDoA', + 'B15002_028', + 'SCoAD', + 'fSClt1Y', + 'B15002_029', + 'SCoAD', + 'fSC1oMYND', + 'B15002_030', + 'SCoAD', + 'fAD', + 'B15002_031', + 'BD', + 'fBD', + 'B15002_032', + 'GoPD', + 'fMD', + 'B15002_033', + 'GoPD', + 'fPSD', + 'B15002_034', + 'GoPD', + 'fDD', + 'B15002_035' + ), + nrow = 33, + ncol = 3, + byrow = TRUE + ) + + selected_vars <- stats::setNames( + vars[vars[, 1] %in% c('TotalPop', subgroup) , 3], + vars[vars[, 1] %in% c('TotalPop', subgroup) , 2] + ) + } + + out_names <- names(selected_vars) # save for output + prefix <- 'subgroup' + suffix <- seq(1:length(selected_vars[-1])) + names(selected_vars) <- c('TotalPop', paste(prefix, suffix, sep = '')) + in_names <- paste(names(selected_vars), 'E', sep = '') + + # Acquire EI variables and sf 
geometries + ei_data <- suppressMessages(suppressWarnings( + tidycensus::get_acs( + geography = geo, + year = year, + output = 'wide', + variables = selected_vars, + geometry = TRUE, + ... + ) + )) + + if (geo == 'tract') { + ei_data <- ei_data %>% + tidyr::separate(NAME, into = c('tract', 'county', 'state'), sep = ',') %>% + dplyr::mutate(tract = gsub('[^0-9\\.]', '', tract)) + } else { + ei_data <- ei_data %>% + tidyr::separate(NAME, into = c('county', 'state'), sep = ',') + } - selected_vars <- stats::setNames(vars[ vars[ , 1] %in% c("TotalPop", subgroup) , 3], - vars[ vars[ , 1] %in% c("TotalPop", subgroup) , 2]) - } - - out_names <- names(selected_vars) # save for output - prefix <- "subgroup" - suffix <- seq(1:length(selected_vars[-1])) - names(selected_vars) <- c("TotalPop", paste(prefix, suffix, sep = "")) - in_names <- paste(names(selected_vars), "E", sep = "") - - # Acquire EI variables and sf geometries - ei_data <- suppressMessages(suppressWarnings(tidycensus::get_acs(geography = geo, - year = year, - output = "wide", - variables = selected_vars, - geometry = TRUE, ...))) - - if (geo == "tract") { ei_data <- ei_data %>% - tidyr::separate(NAME, into = c("tract", "county", "state"), sep = ",") %>% - dplyr::mutate(tract = gsub("[^0-9\\.]","", tract)) - } else { - ei_data <- ei_data %>% tidyr::separate(NAME, into = c("county", "state"), sep = ",") - } - - ei_data <- ei_data %>% - dplyr::mutate(subgroup = rowSums(sf::st_drop_geometry(ei_data[ , in_names[-1]]))) - - # Compute EI - ## From Bravo et al. (2021) https://doi.org/10.3390/ijerph18179384 - ## EI_{im} = (Sigma_{j∈∂_{i}} w_{ij} * T_{jm}) / (Sigma_{j∈∂_{i}} w_{ij} * T_{j}) - ## Where: - ## ∂_{i} denotes the set of index units i and its neighbors - ## Given M mutually exclusive subgroups of educational attainment categories, m indexes the subgroups of M - ## T_{i} denotes the total population in region i (TotalPop) - ## T_{im} denotes the population of the selected subgroup(s) (subgroup1, ...) 
- ## w_{ij} denotes a nXn first-order adjacency matrix, where n is the number of census geometries in the study area - ### and the entries of w_{ij} are set to 1 if a boundary is shared by region i and region j and zero otherwise - ### Entries of the main diagonal (since i∈∂_{i}, w_{ij} = w_{ii} when j = i) of w_{ij} are set to 1.5 - ### such that the weight of the index unit, i, is larger than the weights assigned to adjacent tracts - - ## Geospatial adjacency matrix (wij) - tmp <- sf::st_intersects(sf::st_geometry(ei_data), sparse = TRUE) - names(tmp) <- as.character(seq_len(nrow(ei_data))) - tmpL <- length(tmp) - tmpcounts <- unlist(Map(length, tmp)) - tmpi <- rep(1:tmpL, tmpcounts) - tmpj <- unlist(tmp) - wij <- Matrix::sparseMatrix(i = tmpi, j = tmpj, x = 1, dims = c(tmpL, tmpL)) - diag(wij) <- 1.5 - - ## Compute - ei_data <- sf::st_drop_geometry(ei_data) # drop geometries (can join back later) - EIim <- list() - for (i in 1:dim(wij)[1]){ - EIim[[i]] <- sum(as.matrix(wij[i, ])*ei_data[ , "subgroup"]) / sum(as.matrix(wij[i, ])*ei_data[, "TotalPopE"]) - } - ei_data$EI <- unlist(EIim) - - # Warning for missingness of census characteristics - missingYN <- ei_data[ , in_names] - names(missingYN) <- out_names - missingYN <- missingYN %>% - tidyr::pivot_longer(cols = dplyr::everything(), - names_to = "variable", - values_to = "val") %>% - dplyr::group_by(variable) %>% - dplyr::summarise(total = dplyr::n(), - n_missing = sum(is.na(val)), - percent_missing = paste0(round(mean(is.na(val)) * 100, 2), " %")) - - if (quiet == FALSE) { - # Warning for missing census data - if (sum(missingYN$n_missing) > 0) { - message("Warning: Missing census data") + dplyr::mutate(subgroup = rowSums(sf::st_drop_geometry(ei_data[, in_names[-1]]))) + + # Compute EI + ## From Bravo et al. 
(2021) https://doi.org/10.3390/ijerph18179384 + ## EI_{im} = (Sigma_{j∈∂_{i}} w_{ij} * T_{jm}) / (Sigma_{j∈∂_{i}} w_{ij} * T_{j}) + ## Where: + ## ∂_{i} denotes the set of index units i and its neighbors + ## Given M mutually exclusive subgroups of educational attainment categories, m indexes the subgroups of M + ## T_{i} denotes the total population in region i (TotalPop) + ## T_{im} denotes the population of the selected subgroup(s) (subgroup1, ...) + ## w_{ij} denotes a nXn first-order adjacency matrix, where n is the number of census geometries in the study area + ### and the entries of w_{ij} are set to 1 if a boundary is shared by region i and region j and zero otherwise + ### Entries of the main diagonal (since i∈∂_{i}, w_{ij} = w_{ii} when j = i) of w_{ij} are set to 1.5 + ### such that the weight of the index unit, i, is larger than the weights assigned to adjacent tracts + + ## Geospatial adjacency matrix (wij) + tmp <- sf::st_intersects(sf::st_geometry(ei_data), sparse = TRUE) + names(tmp) <- as.character(seq_len(nrow(ei_data))) + tmpL <- length(tmp) + tmpcounts <- unlist(Map(length, tmp)) + tmpi <- rep(1:tmpL, tmpcounts) + tmpj <- unlist(tmp) + wij <- Matrix::sparseMatrix( + i = tmpi, + j = tmpj, + x = 1, + dims = c(tmpL, tmpL) + ) + diag(wij) <- 1.5 + + ## Compute + ei_data <- ei_data %>% + sf::st_drop_geometry() # drop geometries (can join back later) + EIim <- list() + for (i in 1:dim(wij)[1]) { + EIim[[i]] <- sum(as.matrix(wij[i,]) * ei_data[, 'subgroup']) / + sum(as.matrix(wij[i,]) * ei_data[, 'TotalPopE']) } + ei_data$EI <- unlist(EIim) + + # Warning for missingness of census characteristics + missingYN <- ei_data[, in_names] + names(missingYN) <- out_names + missingYN <- missingYN %>% + tidyr::pivot_longer( + cols = dplyr::everything(), + names_to = 'variable', + values_to = 'val' + ) %>% + dplyr::group_by(variable) %>% + dplyr::summarise( + total = dplyr::n(), + n_missing = sum(is.na(val)), + percent_missing = paste0(round(mean(is.na(val)) * 
100, 2), ' %') + ) + + if (quiet == FALSE) { + # Warning for missing census data + if (sum(missingYN$n_missing) > 0) { + message('Warning: Missing census data') + } + } + + # Format output + if (geo == 'tract') { + ei <- ei_data %>% + dplyr::select(c( + 'GEOID', + 'state', + 'county', + 'tract', + 'EI', + dplyr::all_of(in_names) + )) + names(ei) <- c('GEOID', 'state', 'county', 'tract', 'EI', out_names) + } else { + ei <- ei_data %>% + dplyr::select(c('GEOID', 'state', 'county', 'EI', dplyr::all_of(in_names))) + names(ei) <- c('GEOID', 'state', 'county', 'EI', out_names) + } + + ei <- ei %>% + dplyr::mutate( + state = stringr::str_trim(state), + county = stringr::str_trim(county) + ) %>% + dplyr::arrange(GEOID) %>% + dplyr::as_tibble() + + out <- list(ei = ei, missing = missingYN) + + return(out) } - - # Format output - if (geo == "tract") { - ei <- ei_data %>% - dplyr::select(c("GEOID", - "state", - "county", - "tract", - "EI", - dplyr::all_of(in_names))) - names(ei) <- c("GEOID", "state", "county", "tract", "EI", out_names) - } else { - ei <- ei_data %>% - dplyr::select(c("GEOID", - "state", - "county", - "EI", - dplyr::all_of(in_names))) - names(ei) <- c("GEOID", "state", "county", "EI", out_names) - } - - ei <- ei %>% - dplyr::mutate(state = stringr::str_trim(state), - county = stringr::str_trim(county)) %>% - dplyr::arrange(GEOID) %>% - dplyr::as_tibble() - - out <- list(ei = ei, - missing = missingYN) - - return(out) -} diff --git a/R/duncan.R b/R/duncan.R index 8215a26..8305b92 100644 --- a/R/duncan.R +++ b/R/duncan.R @@ -1,9 +1,9 @@ -#' Dissimilarity Index based on Duncan & Duncan (1955) -#' +#' Dissimilarity Index based on Duncan & Duncan (1955) +#' #' Compute the aspatial Dissimilarity Index (Duncan & Duncan) of selected racial/ethnic subgroup(s) and U.S. geographies #' -#' @param geo_large Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = "county"}. 
-#' @param geo_small Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_large = "tract"}. +#' @param geo_large Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = 'county'}. +#' @param geo_small Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_small = 'tract'}. #' @param year Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available. #' @param subgroup Character string specifying the racial/ethnic subgroup(s) as the comparison population. See Details for available choices. #' @param subgroup_ref Character string specifying the racial/ethnic subgroup(s) as the reference population. See Details for available choices. @@ -12,45 +12,45 @@ #' @param ... Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics #' #' @details This function will compute the aspatial Dissimilarity Index (DI) of selected racial/ethnic subgroups and U.S. geographies for a specified geographical extent (e.g., the entire U.S. or a single state) based on Duncan & Duncan (1955) \doi{10.2307/2088328}. This function provides the computation of DI for any of the U.S. Census Bureau race/ethnicity subgroups (including Hispanic and non-Hispanic individuals). -#' +#' #' The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the aspatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available but are available from other U.S. Census Bureau surveys. The twenty racial/ethnic subgroups (U.S.
Census Bureau definitions) are: #' \itemize{ -#' \item **B03002_002**: not Hispanic or Latino \code{"NHoL"} -#' \item **B03002_003**: not Hispanic or Latino, white alone \code{"NHoLW"} -#' \item **B03002_004**: not Hispanic or Latino, Black or African American alone \code{"NHoLB"} -#' \item **B03002_005**: not Hispanic or Latino, American Indian and Alaska Native alone \code{"NHoLAIAN"} -#' \item **B03002_006**: not Hispanic or Latino, Asian alone \code{"NHoLA"} -#' \item **B03002_007**: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"NHoLNHOPI"} -#' \item **B03002_008**: not Hispanic or Latino, Some other race alone \code{"NHoLSOR"} -#' \item **B03002_009**: not Hispanic or Latino, Two or more races \code{"NHoLTOMR"} -#' \item **B03002_010**: not Hispanic or Latino, Two races including Some other race \code{"NHoLTRiSOR"} -#' \item **B03002_011**: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"NHoLTReSOR"} -#' \item **B03002_012**: Hispanic or Latino \code{"HoL"} -#' \item **B03002_013**: Hispanic or Latino, white alone \code{"HoLW"} -#' \item **B03002_014**: Hispanic or Latino, Black or African American alone \code{"HoLB"} -#' \item **B03002_015**: Hispanic or Latino, American Indian and Alaska Native alone \code{"HoLAIAN"} -#' \item **B03002_016**: Hispanic or Latino, Asian alone \code{"HoLA"} -#' \item **B03002_017**: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"HoLNHOPI"} -#' \item **B03002_018**: Hispanic or Latino, Some other race alone \code{"HoLSOR"} -#' \item **B03002_019**: Hispanic or Latino, Two or more races \code{"HoLTOMR"} -#' \item **B03002_020**: Hispanic or Latino, Two races including Some other race \code{"HoLTRiSOR"} -#' \item **B03002_021**: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"HoLTReSOR"} +#' \item **B03002_002**: not Hispanic or Latino \code{'NHoL'} +#' \item **B03002_003**: not 
Hispanic or Latino, white alone \code{'NHoLW'} +#' \item **B03002_004**: not Hispanic or Latino, Black or African American alone \code{'NHoLB'} +#' \item **B03002_005**: not Hispanic or Latino, American Indian and Alaska Native alone \code{'NHoLAIAN'} +#' \item **B03002_006**: not Hispanic or Latino, Asian alone \code{'NHoLA'} +#' \item **B03002_007**: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'NHoLNHOPI'} +#' \item **B03002_008**: not Hispanic or Latino, Some other race alone \code{'NHoLSOR'} +#' \item **B03002_009**: not Hispanic or Latino, Two or more races \code{'NHoLTOMR'} +#' \item **B03002_010**: not Hispanic or Latino, Two races including Some other race \code{'NHoLTRiSOR'} +#' \item **B03002_011**: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'NHoLTReSOR'} +#' \item **B03002_012**: Hispanic or Latino \code{'HoL'} +#' \item **B03002_013**: Hispanic or Latino, white alone \code{'HoLW'} +#' \item **B03002_014**: Hispanic or Latino, Black or African American alone \code{'HoLB'} +#' \item **B03002_015**: Hispanic or Latino, American Indian and Alaska Native alone \code{'HoLAIAN'} +#' \item **B03002_016**: Hispanic or Latino, Asian alone \code{'HoLA'} +#' \item **B03002_017**: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'HoLNHOPI'} +#' \item **B03002_018**: Hispanic or Latino, Some other race alone \code{'HoLSOR'} +#' \item **B03002_019**: Hispanic or Latino, Two or more races \code{'HoLTOMR'} +#' \item **B03002_020**: Hispanic or Latino, Two races including Some other race \code{'HoLTRiSOR'} +#' \item **B03002_021**: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'HoLTReSOR'} #' } -#' +#' #' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. 
-#' +#' #' DI is a measure of the evenness of racial/ethnic residential segregation when comparing smaller geographical areas to larger ones within which the smaller geographical areas are located. DI can range in value from 0 to 1 and represents the proportion of racial/ethnic subgroup members that would have to change their area of residence to achieve an even distribution within the larger geographical area under conditions of maximum segregation. -#' -#' Larger geographies available include state \code{geo_large = "state"}, county \code{geo_large = "county"}, and census tract \code{geo_large = "tract"} levels. Smaller geographies available include, county \code{geo_small = "county"}, census tract \code{geo_small = "tract"}, and census block group \code{geo_small = "block group"} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the DI value returned is NA. -#' +#' +#' Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, and census tract \code{geo_large = 'tract'} levels. Smaller geographies available include county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area comprises only one smaller geographical area (e.g., a U.S. county contains only one census tract), then the DI value returned is NA. -#' +#' #' @return An object of class 'list'.
This is a named list with the following components: -#' +#' #' \describe{ #' \item{\code{di}}{An object of class 'tbl' for the GEOID, name, and DI at specified larger census geographies.} #' \item{\code{di_data}}{An object of class 'tbl' for the raw census values at specified smaller census geographies.} #' \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute DI.} #' } -#' +#' #' @import dplyr #' @importFrom sf st_drop_geometry #' @importFrom stats complete.cases @@ -58,201 +58,281 @@ #' @importFrom tidyr pivot_longer separate #' @importFrom utils stack #' @export -#' +#' #' @seealso \code{\link[tidycensus]{get_acs}} for additional arguments for geographic extent selection (i.e., \code{state} and \code{county}). #' #' @examples #' \dontrun{ #' # Wrapped in \dontrun{} because these examples require a Census API key. -#' +#' #' # Dissimilarity Index of non-Hispanic Black vs. non-Hispanic white populations #' ## of census tracts within Georgia, U.S.A., counties (2020) -#' duncan(geo_large = "county", geo_small = "tract", state = "GA", -#' year = 2020, subgroup = "NHoLB", subgroup_ref = "NHoLW") -#' +#' duncan( +#' geo_large = 'county', +#' geo_small = 'tract', +#' state = 'GA', +#' year = 2020, +#' subgroup = 'NHoLB', +#' subgroup_ref = 'NHoLW' +#' ) +#' #' } -#' -duncan <- function(geo_large = "county", geo_small = "tract", year = 2020, subgroup, subgroup_ref, omit_NAs = TRUE, quiet = FALSE, ...) 
{ - - # Check arguments - match.arg(geo_large, choices = c("state", "county", "tract")) - match.arg(geo_small, choices = c("county", "tract", "block group")) - stopifnot(is.numeric(year), year >= 2009) # all variables available 2009 onward - match.arg(subgroup, several.ok = TRUE, - choices = c("NHoL", "NHoLW", "NHoLB", "NHoLAIAN", "NHoLA", "NHoLNHOPI", - "NHoLSOR", "NHoLTOMR", "NHoLTRiSOR", "NHoLTReSOR", - "HoL", "HoLW", "HoLB", "HoLAIAN", "HoLA", "HoLNHOPI", - "HoLSOR", "HoLTOMR", "HoLTRiSOR", "HoLTReSOR")) - match.arg(subgroup_ref, several.ok = TRUE, - choices = c("NHoL", "NHoLW", "NHoLB", "NHoLAIAN", "NHoLA", "NHoLNHOPI", - "NHoLSOR", "NHoLTOMR", "NHoLTRiSOR", "NHoLTReSOR", - "HoL", "HoLW", "HoLB", "HoLAIAN", "HoLA", "HoLNHOPI", - "HoLSOR", "HoLTOMR", "HoLTRiSOR", "HoLTReSOR")) - - # Select census variables - vars <- c(NHoL = "B03002_002", - NHoLW = "B03002_003", - NHoLB = "B03002_004", - NHoLAIAN = "B03002_005", - NHoLA = "B03002_006", - NHoLNHOPI = "B03002_007", - NHoLSOR = "B03002_008", - NHoLTOMR = "B03002_009", - NHoLTRiSOR = "B03002_010", - NHoLTReSOR = "B03002_011", - HoL = "B03002_012", - HoLW = "B03002_013", - HoLB = "B03002_014", - HoLAIAN = "B03002_015", - HoLA = "B03002_016", - HoLNHOPI = "B03002_017", - HoLSOR = "B03002_018", - HoLTOMR = "B03002_019", - HoLTRiSOR = "B03002_020", - HoLTReSOR = "B03002_021") - - selected_vars <- vars[c(subgroup, subgroup_ref)] - out_names <- names(selected_vars) # save for output - in_subgroup <- paste(subgroup, "E", sep = "") - in_subgroup_ref <- paste(subgroup_ref, "E", sep = "") - - # Acquire DI variables and sf geometries - di_data <- suppressMessages(suppressWarnings(tidycensus::get_acs(geography = geo_small, - year = year, - output = "wide", - variables = selected_vars, - geometry = TRUE, - keep_geo_vars = TRUE, ...))) - - # Format output - if (geo_small == "county") { - di_data <- sf::st_drop_geometry(di_data) %>% - tidyr::separate(NAME.y, into = c("county", "state"), sep = ",") - } - if (geo_small == "tract") 
{ - di_data <- sf::st_drop_geometry(di_data) %>% - tidyr::separate(NAME.y, into = c("tract", "county", "state"), sep = ",") %>% - dplyr::mutate(tract = gsub("[^0-9\\.]", "", tract)) - } - if (geo_small == "block group") { - di_data <- sf::st_drop_geometry(di_data) %>% - tidyr::separate(NAME.y, into = c("block.group", "tract", "county", "state"), sep = ",") %>% - dplyr::mutate(tract = gsub("[^0-9\\.]", "", tract), - block.group = gsub("[^0-9\\.]", "", block.group)) - } +#' +duncan <- function(geo_large = 'county', + geo_small = 'tract', + year = 2020, + subgroup, + subgroup_ref, + omit_NAs = TRUE, + quiet = FALSE, + ...) { - # Grouping IDs for DI computation - if (geo_large == "tract") { - di_data <- di_data %>% - dplyr::mutate(oid = paste(.$STATEFP, .$COUNTYFP, .$TRACTCE, sep = ""), - state = stringr::str_trim(state), - county = stringr::str_trim(county)) - } - if (geo_large == "county") { - di_data <- di_data %>% - dplyr::mutate(oid = paste(.$STATEFP, .$COUNTYFP, sep = ""), - state = stringr::str_trim(state), - county = stringr::str_trim(county)) - } - if (geo_large == "state") { - di_data <- di_data %>% - dplyr::mutate(oid = .$STATEFP, - state = stringr::str_trim(state)) - } - - # Count of racial/ethnic subgroup populations - ## Count of racial/ethnic comparison subgroup population - if (length(in_subgroup) == 1) { - di_data <- di_data %>% - dplyr::mutate(subgroup = .[ , in_subgroup]) - } else { - di_data <- di_data %>% - dplyr::mutate(subgroup = rowSums(.[ , in_subgroup])) - } - ## Count of racial/ethnic reference subgroup population - if (length(in_subgroup_ref) == 1) { - di_data <- di_data %>% - dplyr::mutate(subgroup_ref = .[ , in_subgroup_ref]) - } else { - di_data <- di_data %>% - dplyr::mutate(subgroup_ref = rowSums(.[ , in_subgroup_ref])) - } - - # Compute DI - ## From Duncan & Duncan (1955) https://doi.org/10.2307/2088328 - ## D_{jt} = 1/2 \sum_{i=1}^{k} | \frac{x_{ijt}}{X_{jt}}-\frac{y_{ijt}}{Y_{jt}}| - ## Where for k smaller geographies: - ## D_{jt} 
denotes the DI of larger geography j at time t - ## x_{ijt} denotes the racial/ethnic subgroup population of smaller geography i within larger geography j at time t - ## X_{jt} denotes the racial/ethnic subgroup population of larger geography j at time t - ## y_{ijt} denotes the racial/ethnic referent subgroup population of smaller geography i within larger geography j at time t - ## Y_{jt} denotes the racial/ethnic referent subgroup population of larger geography j at time t - - ## Compute - DItmp <- di_data %>% - split(., f = list(di_data$oid)) %>% - lapply(., FUN = di_fun, omit_NAs = omit_NAs) %>% - utils::stack(.) %>% - dplyr::mutate(DI = values, - oid = ind) %>% - dplyr::select(DI, oid) - - # Warning for missingness of census characteristics - missingYN <- di_data[ , c(in_subgroup, in_subgroup_ref)] - names(missingYN) <- out_names - missingYN <- missingYN %>% - tidyr::pivot_longer(cols = dplyr::everything(), - names_to = "variable", - values_to = "val") %>% - dplyr::group_by(variable) %>% - dplyr::summarise(total = dplyr::n(), - n_missing = sum(is.na(val)), - percent_missing = paste0(round(mean(is.na(val)) * 100, 2), " %")) - - if (quiet == FALSE) { - # Warning for missing census data - if (sum(missingYN$n_missing) > 0) { - message("Warning: Missing census data") + # Check arguments + match.arg(geo_large, choices = c('state', 'county', 'tract')) + match.arg(geo_small, choices = c('county', 'tract', 'block group')) + stopifnot(is.numeric(year), year >= 2009) # all variables available 2009 onward + match.arg( + subgroup, + several.ok = TRUE, + choices = c( + 'NHoL', + 'NHoLW', + 'NHoLB', + 'NHoLAIAN', + 'NHoLA', + 'NHoLNHOPI', + 'NHoLSOR', + 'NHoLTOMR', + 'NHoLTRiSOR', + 'NHoLTReSOR', + 'HoL', + 'HoLW', + 'HoLB', + 'HoLAIAN', + 'HoLA', + 'HoLNHOPI', + 'HoLSOR', + 'HoLTOMR', + 'HoLTRiSOR', + 'HoLTReSOR' + ) + ) + match.arg( + subgroup_ref, + several.ok = TRUE, + choices = c( + 'NHoL', + 'NHoLW', + 'NHoLB', + 'NHoLAIAN', + 'NHoLA', + 'NHoLNHOPI', + 'NHoLSOR', + 
'NHoLTOMR', + 'NHoLTRiSOR', + 'NHoLTReSOR', + 'HoL', + 'HoLW', + 'HoLB', + 'HoLAIAN', + 'HoLA', + 'HoLNHOPI', + 'HoLSOR', + 'HoLTOMR', + 'HoLTRiSOR', + 'HoLTReSOR' + ) + ) + + # Select census variables + vars <- c( + NHoL = 'B03002_002', + NHoLW = 'B03002_003', + NHoLB = 'B03002_004', + NHoLAIAN = 'B03002_005', + NHoLA = 'B03002_006', + NHoLNHOPI = 'B03002_007', + NHoLSOR = 'B03002_008', + NHoLTOMR = 'B03002_009', + NHoLTRiSOR = 'B03002_010', + NHoLTReSOR = 'B03002_011', + HoL = 'B03002_012', + HoLW = 'B03002_013', + HoLB = 'B03002_014', + HoLAIAN = 'B03002_015', + HoLA = 'B03002_016', + HoLNHOPI = 'B03002_017', + HoLSOR = 'B03002_018', + HoLTOMR = 'B03002_019', + HoLTRiSOR = 'B03002_020', + HoLTReSOR = 'B03002_021' + ) + + selected_vars <- vars[c(subgroup, subgroup_ref)] + out_names <- names(selected_vars) # save for output + in_subgroup <- paste(subgroup, 'E', sep = '') + in_subgroup_ref <- paste(subgroup_ref, 'E', sep = '') + + # Acquire DI variables and sf geometries + di_data <- suppressMessages(suppressWarnings( + tidycensus::get_acs( + geography = geo_small, + year = year, + output = 'wide', + variables = selected_vars, + geometry = TRUE, + keep_geo_vars = TRUE, + ... 
+ ) + )) + + # Format output + if (geo_small == 'county') { + di_data <- di_data %>% + sf::st_drop_geometry() %>% + tidyr::separate(NAME.y, into = c('county', 'state'), sep = ',') } + if (geo_small == 'tract') { + di_data <- di_data %>% + sf::st_drop_geometry() %>% + tidyr::separate(NAME.y, into = c('tract', 'county', 'state'), sep = ',') %>% + dplyr::mutate(tract = gsub('[^0-9\\.]', '', tract)) + } + if (geo_small == 'block group') { + di_data <- di_data %>% + sf::st_drop_geometry() %>% + tidyr::separate(NAME.y, into = c('block.group', 'tract', 'county', 'state'), sep = ',') %>% + dplyr::mutate( + tract = gsub('[^0-9\\.]', '', tract), + block.group = gsub('[^0-9\\.]', '', block.group) + ) + } + + # Grouping IDs for DI computation + if (geo_large == 'tract') { + di_data <- di_data %>% + dplyr::mutate( + oid = paste(.$STATEFP, .$COUNTYFP, .$TRACTCE, sep = ''), + state = stringr::str_trim(state), + county = stringr::str_trim(county) + ) + } + if (geo_large == 'county') { + di_data <- di_data %>% + dplyr::mutate( + oid = paste(.$STATEFP, .$COUNTYFP, sep = ''), + state = stringr::str_trim(state), + county = stringr::str_trim(county) + ) + } + if (geo_large == 'state') { + di_data <- di_data %>% + dplyr::mutate( + oid = .$STATEFP, + state = stringr::str_trim(state) + ) + } + + # Count of racial/ethnic subgroup populations + ## Count of racial/ethnic comparison subgroup population + if (length(in_subgroup) == 1) { + di_data <- di_data %>% + dplyr::mutate(subgroup = .[, in_subgroup]) + } else { + di_data <- di_data %>% + dplyr::mutate(subgroup = rowSums(.[, in_subgroup])) + } + ## Count of racial/ethnic reference subgroup population + if (length(in_subgroup_ref) == 1) { + di_data <- di_data %>% + dplyr::mutate(subgroup_ref = .[, in_subgroup_ref]) + } else { + di_data <- di_data %>% + dplyr::mutate(subgroup_ref = rowSums(.[, in_subgroup_ref])) + } + + # Compute DI + ## From Duncan & Duncan (1955) https://doi.org/10.2307/2088328 + ## D_{jt} = 1/2 \sum_{i=1}^{k} | 
\frac{x_{ijt}}{X_{jt}}-\frac{y_{ijt}}{Y_{jt}}| + ## Where for k smaller geographies: + ## D_{jt} denotes the DI of larger geography j at time t + ## x_{ijt} denotes the racial/ethnic subgroup population of smaller geography i within larger geography j at time t + ## X_{jt} denotes the racial/ethnic subgroup population of larger geography j at time t + ## y_{ijt} denotes the racial/ethnic referent subgroup population of smaller geography i within larger geography j at time t + ## Y_{jt} denotes the racial/ethnic referent subgroup population of larger geography j at time t + + ## Compute + DItmp <- di_data %>% + split(., f = list(di_data$oid)) %>% + lapply(., FUN = di_fun, omit_NAs = omit_NAs) %>% + utils::stack(.) %>% + dplyr::mutate( + DI = values, + oid = ind + ) %>% + dplyr::select(DI, oid) + + # Warning for missingness of census characteristics + missingYN <- di_data[, c(in_subgroup, in_subgroup_ref)] + names(missingYN) <- out_names + missingYN <- missingYN %>% + tidyr::pivot_longer( + cols = dplyr::everything(), + names_to = 'variable', + values_to = 'val' + ) %>% + dplyr::group_by(variable) %>% + dplyr::summarise( + total = dplyr::n(), + n_missing = sum(is.na(val)), + percent_missing = paste0(round(mean(is.na(val)) * 100, 2), ' %') + ) + + if (quiet == FALSE) { + # Warning for missing census data + if (sum(missingYN$n_missing) > 0) { + message('Warning: Missing census data') + } + } + + # Format output + if (geo_large == 'state') { + di <- di_data %>% + dplyr::left_join(DItmp, by = dplyr::join_by(oid)) %>% + dplyr::select(oid, state, DI) %>% + unique(.) %>% + dplyr::mutate(GEOID = oid) %>% + dplyr::select(GEOID, state, DI) %>% + .[.$GEOID != 'NANA',] + } + if (geo_large == 'county') { + di <- di_data %>% + dplyr::left_join(DItmp, by = dplyr::join_by(oid)) %>% + dplyr::select(oid, state, county, DI) %>% + unique(.) 
%>% + dplyr::mutate(GEOID = oid) %>% + dplyr::select(GEOID, state, county, DI) %>% + .[.$GEOID != 'NANA',] + } + if (geo_large == 'tract') { + di <- di_data %>% + dplyr::left_join(DItmp, by = dplyr::join_by(oid)) %>% + dplyr::select(oid, state, county, tract, DI) %>% + unique(.) %>% + dplyr::mutate(GEOID = oid) %>% + dplyr::select(GEOID, state, county, tract, DI) %>% + .[.$GEOID != 'NANA',] + } + + di <- di %>% + dplyr::arrange(GEOID) %>% + dplyr::as_tibble() + + di_data <- di_data %>% + dplyr::arrange(GEOID) %>% + dplyr::as_tibble() + + out <- list(di = di, di_data = di_data, missing = missingYN) + + return(out) } - - # Format output - if (geo_large == "state") { - di <- merge(di_data, DItmp) %>% - dplyr::select(oid, state, DI) %>% - unique(.) %>% - dplyr::mutate(GEOID = oid) %>% - dplyr::select(GEOID, state, DI) %>% - .[.$GEOID != "NANA", ] - } - if (geo_large == "county") { - di <- merge(di_data, DItmp) %>% - dplyr::select(oid, state, county, DI) %>% - unique(.) %>% - dplyr::mutate(GEOID = oid) %>% - dplyr::select(GEOID, state, county, DI) %>% - .[.$GEOID != "NANA", ] - } - if (geo_large == "tract") { - di <- merge(di_data, DItmp) %>% - dplyr::select(oid, state, county, tract, DI) %>% - unique(.) %>% - dplyr::mutate(GEOID = oid) %>% - dplyr::select(GEOID, state, county, tract, DI) %>% - .[.$GEOID != "NANA", ] - } - - di <- di %>% - dplyr::arrange(GEOID) %>% - dplyr::as_tibble() - - di_data <- di_data %>% - dplyr::arrange(GEOID) %>% - dplyr::as_tibble() - - out <- list(di = di, - di_data = di_data, - missing = missingYN) - - return(out) -} diff --git a/R/gini.R b/R/gini.R index 4268a57..09da8a2 100644 --- a/R/gini.R +++ b/R/gini.R @@ -1,107 +1,121 @@ -#' Gini Index based on Gini (1921) -#' +#' Gini Index based on Gini (1921) +#' #' Retrieve the aspatial Gini Index of income inequality. #' -#' @param geo Character string specifying the geography of the data either census tracts \code{geo = "tract"} (the default) or counties \code{geo = "county"}. 
+#' @param geo Character string specifying the geography of the data either census tracts \code{geo = 'tract'} (the default) or counties \code{geo = 'county'}. #' @param year Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available. #' @param quiet Logical. If TRUE, will display messages about potential missing census information #' @param ... Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics #' #' @details This function will retrieve the aspatial Gini Index of U.S. census tracts or counties for a specified geographical extent (e.g., the entire U.S. or a single state) based on Gini (1921) \doi{10.2307/2223319}. -#' +#' #' The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey estimates of the Gini Index for income inequality (ACS: B19083). The estimates are available for 2009 onward when ACS-5 data are available but are available from other U.S. Census Bureau surveys. -#' +#' #' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. -#' -#' According to the U.S. Census Bureau \url{https://www.census.gov/topics/income-poverty/income-inequality/about/metrics/gini-index.html}: "The Gini Index is a summary measure of income inequality. The Gini coefficient incorporates the detailed shares data into a single statistic, which summarizes the dispersion of income across the entire income distribution. The Gini coefficient ranges from 0, indicating perfect equality (where everyone receives an equal share), to 1, perfect inequality (where only one recipient or group of recipients receives all the income). The Gini is based on the difference between the Lorenz curve (the observed cumulative income distribution) and the notion of a perfectly equal income distribution." 
-#' +#' +#' According to the U.S. Census Bureau \url{https://www.census.gov/topics/income-poverty/income-inequality/about/metrics/gini-index.html}: 'The Gini Index is a summary measure of income inequality. The Gini coefficient incorporates the detailed shares data into a single statistic, which summarizes the dispersion of income across the entire income distribution. The Gini coefficient ranges from 0, indicating perfect equality (where everyone receives an equal share), to 1, perfect inequality (where only one recipient or group of recipients receives all the income). The Gini is based on the difference between the Lorenz curve (the observed cumulative income distribution) and the notion of a perfectly equal income distribution.' +#' #' @return An object of class 'list'. This is a named list with the following components: -#' +#' #' \describe{ #' \item{\code{gini}}{An object of class 'tbl' for the GEOID, name, and Gini index of specified census geographies.} #' \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for the Gini index.} #' } -#' +#' #' @import dplyr #' @importFrom stringr str_trim #' @importFrom tidycensus get_acs #' @importFrom tidyr pivot_longer separate #' @export -#' +#' #' @seealso \code{\link[tidycensus]{get_acs}} for additional arguments for geographic extent selection (i.e., \code{state} and \code{county}). #' #' @examples #' \dontrun{ #' # Wrapped in \dontrun{} because these examples require a Census API key. -#' +#' #' # Tract-level metric (2020) -#' gini(geo = "tract", state = "GA", year = 2020) -#' +#' gini(geo = 'tract', state = 'GA', year = 2020) +#' #' # County-level metric (2020) -#' gini(geo = "county", state = "GA", year = 2020) -#' +#' gini(geo = 'county', state = 'GA', year = 2020) +#' #' } -#' -gini <- function(geo = "tract", year = 2020, quiet = FALSE, ...) { +#' +gini <- function(geo = 'tract', + year = 2020, + quiet = FALSE, + ...) 
{ # Check arguments - match.arg(geo, choices = c("county", "tract")) + match.arg(geo, choices = c('county', 'tract')) stopifnot(is.numeric(year), year >= 2009) # the gini variable is available before and after 2009 but constrained for consistency with out indices (for now) # Select census variable - vars <- c(gini = "B19083_001") + vars <- c(gini = 'B19083_001') # Acquire Gini Index - gini_data <- suppressMessages(suppressWarnings(tidycensus::get_acs(geography = geo, - year = year, - output = "wide", - variables = vars, ...))) - - if (geo == "tract") { + gini_data <- suppressMessages(suppressWarnings( + tidycensus::get_acs( + geography = geo, + year = year, + output = 'wide', + variables = vars, + ... + ) + )) + + if (geo == 'tract') { gini_data <- gini_data %>% - tidyr::separate(NAME, into = c("tract", "county", "state"), sep = ",") %>% - dplyr::mutate(tract = gsub("[^0-9\\.]","", tract)) + tidyr::separate(NAME, into = c('tract', 'county', 'state'), sep = ',') %>% + dplyr::mutate(tract = gsub('[^0-9\\.]', '', tract)) } else { - gini_data <- gini_data %>% tidyr::separate(NAME, into = c("county", "state"), sep = ",") + gini_data <- gini_data %>% + tidyr::separate(NAME, into = c('county', 'state'), sep = ',') } gini_data <- gini_data %>% - dplyr::mutate(gini = giniE) + dplyr::mutate(gini = giniE) # Warning for missingness of census characteristics missingYN <- gini_data %>% dplyr::select(gini) %>% - tidyr::pivot_longer(cols = dplyr::everything(), - names_to = "variable", - values_to = "val") %>% + tidyr::pivot_longer( + cols = dplyr::everything(), + names_to = 'variable', + values_to = 'val' + ) %>% dplyr::group_by(variable) %>% - dplyr::summarise(total = dplyr::n(), - n_missing = sum(is.na(val)), - percent_missing = paste0(round(mean(is.na(val)) * 100, 2), " %")) + dplyr::summarise( + total = dplyr::n(), + n_missing = sum(is.na(val)), + percent_missing = paste0(round(mean(is.na(val)) * 100, 2), ' %') + ) if (quiet == FALSE) { # Warning for missing census data if 
(sum(missingYN$n_missing) > 0) { - message("Warning: Missing census data") + message('Warning: Missing census data') } } - if (geo == "tract") { + if (geo == 'tract') { gini <- gini_data %>% dplyr::select(GEOID, state, county, tract, gini) } else { gini <- gini_data %>% - dplyr::select(GEOID, state, county, gini) + dplyr::select(GEOID, state, county, gini) } gini <- gini %>% - dplyr::mutate(state = stringr::str_trim(state), - county = stringr::str_trim(county)) %>% + dplyr::mutate( + state = stringr::str_trim(state), + county = stringr::str_trim(county) + ) %>% dplyr::arrange(GEOID) %>% - dplyr::as_tibble() + dplyr::as_tibble() - out <- list(gini = gini, - missing = missingYN) + out <- list(gini = gini, missing = missingYN) return(out) } diff --git a/R/globals.R b/R/globals.R index b2aeb06..43ec440 100644 --- a/R/globals.R +++ b/R/globals.R @@ -1,43 +1,253 @@ -globalVariables(c("CWD", "EDU", "EMP", "FHH", "GEOID", "MedHHInc", "MedHHIncE", "MedHomeVal", "MedHomeValE", "NAME", - "NDI", "OCC", "PC1", "POV", "PUB", "PctCrwdHH_denE", "PctCrwdHH_num1E", "PctCrwdHH_num2E", - "PctCrwdHH_num3E", "PctCrwdHH_num4E", "PctCrwdHH_num5E", "PctCrwdHH_num6E", - "PctEducBchPlus", "PctEducHSPlus", "PctEducLTBch", "PctEducLTBchZ", "PctEducLTHS", - "PctEducLTHSZ", "PctEducLessThanHS_denE", "PctEducLessThanHS_numE", - "PctEduc_den25upE", "PctEduc_num25upADE", "PctEduc_num25upBDE", - "PctEduc_num25upGDE", "PctEduc_num25upHSE", "PctEduc_num25upSCE", - "PctFamBelowPov", "PctFamBelowPovE", "PctFamBelowPovZ", "PctFemHeadKids", - "PctFemHeadKidsZ", "PctFemHeadKids_denE", "PctFemHeadKids_num1E", - "PctFemHeadKids_num2E", "PctHHPov_denE", "PctHHPov_numE", "PctHHUnder30K_denE", - "PctHHUnder30K_num1E", "PctHHUnder30K_num2E", "PctHHUnder30K_num3E", - "PctHHUnder30K_num4E", "PctHHUnder30K_num5E", "PctMenMgmtBusScArti_denE", - "PctMenMgmtBusScArti_num1E", "PctMenMgmtBusScArti_num2E", "PctMgmtBusScArti", - "PctMgmtBusScArti_denE", "PctMgmtBusScArti_numE", "PctNComPlmb", "PctNComPlmbE", - 
"PctNComPlmbZ", "PctNoIDR", "PctNoIDRZ", "PctNoPhone", "PctNoPhoneE", "PctNoPhoneZ", - "PctNotOwnerOcc", "PctNotOwnerOccZ", "PctOwnerOcc", "PctOwnerOccE", "PctPubAsst", - "PctPubAsstZ", "PctPubAsst_denE", "PctPubAsst_numE", "PctRecvIDR", - "PctRecvIDR_denE", "PctRecvIDR_numE", "PctUnemp_1619FE", "PctUnemp_1619ME", - "PctUnemp_2021FE", "PctUnemp_2021ME", "PctUnemp_2224FE", "PctUnemp_2224ME", - "PctUnemp_2529FE", "PctUnemp_2529ME", "PctUnemp_4554FE", "PctUnemp_4554ME", - "PctUnemp_5559FE", "PctUnemp_5559ME", "PctUnemp_6061FE", "PctUnemp_6061ME", - "PctUnemp_6264FE", "PctUnemp_6264ME", "PctUnemp_6569FE", "PctUnemp_6569ME", - "PctUnemp_7074FE", "PctUnemp_7074ME", "PctUnemp_75upME", "PctUnemp_denE", - "PctUnemp_numE", "PctUnempl", "PctUnemplE", "PctUnemplZ", "PctWorkClass", - "PctWorkClassZ", "TotalPop", "TotalPopulationE", "U30", "county", "logMedHHInc", - "logMedHomeVal", "percent", "state", "total", "tract", "val", "variable", "giniE", - "A_edu", "A_inc", "A_wbinc", "A_wpcinc", "B100125i", "B100125iE", "B100125nhw", - "B100125nhwE", "B1015bih", "B1015bihE", "B1015i", "B1015iE", "B1015nhw", "B1015nhwE", - "B125150i", "B125150iE", "B125150nhw", "B125150nhwE", "B150200hw", "B150200i", - "B150200iE", "B150200nhw", "B150200nhwE", "B1520bih", "B1520bihE", "B1520i", "B1520iE", - "B1520nhw", "B1520nhwE", "B2025bih", "B2025bihE", "B2025i", "B2025iE", "B2025nhw", - "B2025nhwE", "B2530bih", "B2530bihE", "B2530i", "B2530iE", "B2530nhw", "B2530nhwE", - "ICE_edu", "ICE_inc", "ICE_rewb", "ICE_wbinc", "ICE_wpcinc", "NHoLB", "NHoLBE", "NHoLW", - "NHoLWE", "O200i", "O200iE", "O200nhw", "O200nhwE", "O25F10G", "O25F10GE", "O25F11G", - "O25F11GE", "O25F12GND", "O25F12GNDE", "O25F5t6G", "O25F5t6GE", "O25F7t8G", "O25F7t8GE", - "O25F9G", "O25F9GE", "O25FBD", "O25FBDE", "O25FDD", "O25FDDE", "O25FMD", "O25FMDE", "O25FNSC", - "O25FNSCE", "O25FNt4G", "O25FNt4GE", "O25FPSD", "O25FPSDE", "O25M10G", "O25M10GE", "O25M11G", - "O25M11GE", "O25M12GND", "O25M12GNDE", "O25M5t6G", "O25M5t6GE", 
"O25M7t8G", "O25M7t8GE", - "O25M9G", "O25M9GE", "O25MBD", "O25MBDE", "O25MDD", "O25MDDE", "O25MMD", "O25MMDE", "O25MNSC", - "O25MNSCE", "O25MNt4G", "O25MNt4GE", "O25MPSD", "O25MPSDE", "P_edu", "P_inc", "P_wbinc", - "P_wpcinc", "TotalPop_edu", "TotalPop_inc", "TotalPop_re", "TotalPopeduE", - "TotalPopiE", "TotalPopreE", "U10bih", "U10bihE", "U10i", "U10iE", "U10nhw", "U10nhwE", "NAME.y", - ".", "values", "ind", "oid", "block.group", "DI", "AI", "II", "V", "LQ", "LExIs")) +globalVariables( + c( + 'CWD', + 'EDU', + 'EMP', + 'FHH', + 'GEOID', + 'MedHHInc', + 'MedHHIncE', + 'MedHomeVal', + 'MedHomeValE', + 'NAME', + 'NDI', + 'OCC', + 'PC1', + 'POV', + 'PUB', + 'PctCrwdHH_denE', + 'PctCrwdHH_num1E', + 'PctCrwdHH_num2E', + 'PctCrwdHH_num3E', + 'PctCrwdHH_num4E', + 'PctCrwdHH_num5E', + 'PctCrwdHH_num6E', + 'PctEducBchPlus', + 'PctEducHSPlus', + 'PctEducLTBch', + 'PctEducLTBchZ', + 'PctEducLTHS', + 'PctEducLTHSZ', + 'PctEducLessThanHS_denE', + 'PctEducLessThanHS_numE', + 'PctEduc_den25upE', + 'PctEduc_num25upADE', + 'PctEduc_num25upBDE', + 'PctEduc_num25upGDE', + 'PctEduc_num25upHSE', + 'PctEduc_num25upSCE', + 'PctFamBelowPov', + 'PctFamBelowPovE', + 'PctFamBelowPovZ', + 'PctFemHeadKids', + 'PctFemHeadKidsZ', + 'PctFemHeadKids_denE', + 'PctFemHeadKids_num1E', + 'PctFemHeadKids_num2E', + 'PctHHPov_denE', + 'PctHHPov_numE', + 'PctHHUnder30K_denE', + 'PctHHUnder30K_num1E', + 'PctHHUnder30K_num2E', + 'PctHHUnder30K_num3E', + 'PctHHUnder30K_num4E', + 'PctHHUnder30K_num5E', + 'PctMenMgmtBusScArti_denE', + 'PctMenMgmtBusScArti_num1E', + 'PctMenMgmtBusScArti_num2E', + 'PctMgmtBusScArti', + 'PctMgmtBusScArti_denE', + 'PctMgmtBusScArti_numE', + 'PctNComPlmb', + 'PctNComPlmbE', + 'PctNComPlmbZ', + 'PctNoIDR', + 'PctNoIDRZ', + 'PctNoPhone', + 'PctNoPhoneE', + 'PctNoPhoneZ', + 'PctNotOwnerOcc', + 'PctNotOwnerOccZ', + 'PctOwnerOcc', + 'PctOwnerOccE', + 'PctPubAsst', + 'PctPubAsstZ', + 'PctPubAsst_denE', + 'PctPubAsst_numE', + 'PctRecvIDR', + 'PctRecvIDR_denE', + 'PctRecvIDR_numE', + 
'PctUnemp_1619FE', + 'PctUnemp_1619ME', + 'PctUnemp_2021FE', + 'PctUnemp_2021ME', + 'PctUnemp_2224FE', + 'PctUnemp_2224ME', + 'PctUnemp_2529FE', + 'PctUnemp_2529ME', + 'PctUnemp_4554FE', + 'PctUnemp_4554ME', + 'PctUnemp_5559FE', + 'PctUnemp_5559ME', + 'PctUnemp_6061FE', + 'PctUnemp_6061ME', + 'PctUnemp_6264FE', + 'PctUnemp_6264ME', + 'PctUnemp_6569FE', + 'PctUnemp_6569ME', + 'PctUnemp_7074FE', + 'PctUnemp_7074ME', + 'PctUnemp_75upME', + 'PctUnemp_denE', + 'PctUnemp_numE', + 'PctUnempl', + 'PctUnemplE', + 'PctUnemplZ', + 'PctWorkClass', + 'PctWorkClassZ', + 'TotalPop', + 'TotalPopulationE', + 'U30', + 'county', + 'logMedHHInc', + 'logMedHomeVal', + 'percent', + 'state', + 'total', + 'tract', + 'val', + 'variable', + 'giniE', + 'A_edu', + 'A_inc', + 'A_wbinc', + 'A_wpcinc', + 'B100125i', + 'B100125iE', + 'B100125nhw', + 'B100125nhwE', + 'B1015bih', + 'B1015bihE', + 'B1015i', + 'B1015iE', + 'B1015nhw', + 'B1015nhwE', + 'B125150i', + 'B125150iE', + 'B125150nhw', + 'B125150nhwE', + 'B150200hw', + 'B150200i', + 'B150200iE', + 'B150200nhw', + 'B150200nhwE', + 'B1520bih', + 'B1520bihE', + 'B1520i', + 'B1520iE', + 'B1520nhw', + 'B1520nhwE', + 'B2025bih', + 'B2025bihE', + 'B2025i', + 'B2025iE', + 'B2025nhw', + 'B2025nhwE', + 'B2530bih', + 'B2530bihE', + 'B2530i', + 'B2530iE', + 'B2530nhw', + 'B2530nhwE', + 'ICE_edu', + 'ICE_inc', + 'ICE_rewb', + 'ICE_wbinc', + 'ICE_wpcinc', + 'NHoLB', + 'NHoLBE', + 'NHoLW', + 'NHoLWE', + 'O200i', + 'O200iE', + 'O200nhw', + 'O200nhwE', + 'O25F10G', + 'O25F10GE', + 'O25F11G', + 'O25F11GE', + 'O25F12GND', + 'O25F12GNDE', + 'O25F5t6G', + 'O25F5t6GE', + 'O25F7t8G', + 'O25F7t8GE', + 'O25F9G', + 'O25F9GE', + 'O25FBD', + 'O25FBDE', + 'O25FDD', + 'O25FDDE', + 'O25FMD', + 'O25FMDE', + 'O25FNSC', + 'O25FNSCE', + 'O25FNt4G', + 'O25FNt4GE', + 'O25FPSD', + 'O25FPSDE', + 'O25M10G', + 'O25M10GE', + 'O25M11G', + 'O25M11GE', + 'O25M12GND', + 'O25M12GNDE', + 'O25M5t6G', + 'O25M5t6GE', + 'O25M7t8G', + 'O25M7t8GE', + 'O25M9G', + 'O25M9GE', + 'O25MBD', + 
'O25MBDE', + 'O25MDD', + 'O25MDDE', + 'O25MMD', + 'O25MMDE', + 'O25MNSC', + 'O25MNSCE', + 'O25MNt4G', + 'O25MNt4GE', + 'O25MPSD', + 'O25MPSDE', + 'P_edu', + 'P_inc', + 'P_wbinc', + 'P_wpcinc', + 'TotalPop_edu', + 'TotalPop_inc', + 'TotalPop_re', + 'TotalPopeduE', + 'TotalPopiE', + 'TotalPopreE', + 'U10bih', + 'U10bihE', + 'U10i', + 'U10iE', + 'U10nhw', + 'U10nhwE', + 'NAME.y', + '.', + 'values', + 'ind', + 'oid', + 'block.group', + 'DI', + 'AI', + 'II', + 'V', + 'LQ', + 'LExIs', + 'DEL' + ) +) diff --git a/R/hoover.R b/R/hoover.R new file mode 100644 index 0000000..e8afe1e --- /dev/null +++ b/R/hoover.R @@ -0,0 +1,292 @@ +#' Delta based on Hoover (1941) and Duncan et al. (1961) +#' +#' Compute the aspatial Delta (Hoover) of a selected racial/ethnic subgroup(s) and U.S. geographies. +#' +#' @param geo_large Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = 'county'}. +#' @param geo_small Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_small = 'tract'}. +#' @param year Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available. +#' @param subgroup Character string specifying the racial/ethnic subgroup(s). See Details for available choices. +#' @param omit_NAs Logical. If FALSE, will compute index for a larger geographical unit only if all of its smaller geographical units have values. The default is TRUE. +#' @param quiet Logical. If TRUE, will not display messages about potential missing census information. The default is FALSE. +#' @param ... Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics +#' +#' @details This function will compute the aspatial Delta (DEL) of selected racial/ethnic subgroups and U.S. geographies for a specified geographical extent (e.g., the entire U.S.
or a single state) based on Hoover (1941) \doi{10.1017/S0022050700052980} and Duncan, Cuzzort, and Duncan (1961; LC:60007089). This function provides the computation of DEL for any of the U.S. Census Bureau race/ethnicity subgroups (including Hispanic and non-Hispanic individuals). +#' +#' The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the aspatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available but are available from other U.S. Census Bureau surveys. The twenty racial/ethnic subgroups (U.S. Census Bureau definitions) are: +#' \itemize{ +#' \item **B03002_002**: not Hispanic or Latino \code{'NHoL'} +#' \item **B03002_003**: not Hispanic or Latino, white alone \code{'NHoLW'} +#' \item **B03002_004**: not Hispanic or Latino, Black or African American alone \code{'NHoLB'} +#' \item **B03002_005**: not Hispanic or Latino, American Indian and Alaska Native alone \code{'NHoLAIAN'} +#' \item **B03002_006**: not Hispanic or Latino, Asian alone \code{'NHoLA'} +#' \item **B03002_007**: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'NHoLNHOPI'} +#' \item **B03002_008**: not Hispanic or Latino, Some other race alone \code{'NHoLSOR'} +#' \item **B03002_009**: not Hispanic or Latino, Two or more races \code{'NHoLTOMR'} +#' \item **B03002_010**: not Hispanic or Latino, Two races including Some other race \code{'NHoLTRiSOR'} +#' \item **B03002_011**: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'NHoLTReSOR'} +#' \item **B03002_012**: Hispanic or Latino \code{'HoL'} +#' \item **B03002_013**: Hispanic or Latino, white alone \code{'HoLW'} +#' \item **B03002_014**: Hispanic or Latino, Black or African American alone \code{'HoLB'} +#' \item **B03002_015**: Hispanic or Latino, American Indian and Alaska Native alone \code{'HoLAIAN'} +#' \item **B03002_016**: 
Hispanic or Latino, Asian alone \code{'HoLA'} +#' \item **B03002_017**: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'HoLNHOPI'} +#' \item **B03002_018**: Hispanic or Latino, Some other race alone \code{'HoLSOR'} +#' \item **B03002_019**: Hispanic or Latino, Two or more races \code{'HoLTOMR'} +#' \item **B03002_020**: Hispanic or Latino, Two races including Some other race \code{'HoLTRiSOR'} +#' \item **B03002_021**: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'HoLTReSOR'} +#' } +#' +#' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. +#' +#' DEL is a measure of the proportion of members of a subgroup(s) residing in geographic units with above-average density of members of the subgroup(s). The index provides the proportion of a subgroup population that would have to move across geographic units to achieve a uniform density. DEL can range in value from 0 to 1. +#' +#' Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, and census tract \code{geo_large = 'tract'} levels. Smaller geographies available include county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area comprises only one smaller geographical area (e.g., a U.S. county contains only one census tract), then the DEL value returned is NA. +#' +#' @return An object of class 'list'.
This is a named list with the following components: +#' +#' \describe{ +#' \item{\code{del}}{An object of class 'tbl' for the GEOID, name, and DEL at specified larger census geographies.} +#' \item{\code{del_data}}{An object of class 'tbl' for the raw census values at specified smaller census geographies.} +#' \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute DEL.} +#' } +#' +#' @import dplyr +#' @importFrom sf st_drop_geometry +#' @importFrom stats complete.cases +#' @importFrom tidycensus get_acs +#' @importFrom tidyr pivot_longer separate +#' @importFrom utils stack +#' @export +#' +#' @seealso \code{\link[tidycensus]{get_acs}} for additional arguments for geographic extent selection (i.e., \code{state} and \code{county}). +#' +#' @examples +#' \dontrun{ +#' # Wrapped in \dontrun{} because these examples require a Census API key. +#' +#' # Delta (a measure of concentration) of non-Hispanic Black populations +#' ## of census tracts within Georgia, U.S.A., counties (2020) +#' hoover( +#' geo_large = 'county', +#' geo_small = 'tract', +#' state = 'GA', +#' year = 2020, +#' subgroup = 'NHoLB' +#' ) +#' +#' } +#' +hoover <- function(geo_large = 'county', + geo_small = 'tract', + year = 2020, + subgroup, + omit_NAs = TRUE, + quiet = FALSE, + ...)
{ + + # Check arguments + match.arg(geo_large, choices = c('state', 'county', 'tract')) + match.arg(geo_small, choices = c('county', 'tract', 'block group')) + stopifnot(is.numeric(year), year >= 2009) # all variables available 2009 onward + match.arg( + subgroup, + several.ok = TRUE, + choices = c( + 'NHoL', + 'NHoLW', + 'NHoLB', + 'NHoLAIAN', + 'NHoLA', + 'NHoLNHOPI', + 'NHoLSOR', + 'NHoLTOMR', + 'NHoLTRiSOR', + 'NHoLTReSOR', + 'HoL', + 'HoLW', + 'HoLB', + 'HoLAIAN', + 'HoLA', + 'HoLNHOPI', + 'HoLSOR', + 'HoLTOMR', + 'HoLTRiSOR', + 'HoLTReSOR' + ) + ) + + # Select census variables + vars <- c( + TotalPop = 'B03002_001', + NHoL = 'B03002_002', + NHoLW = 'B03002_003', + NHoLB = 'B03002_004', + NHoLAIAN = 'B03002_005', + NHoLA = 'B03002_006', + NHoLNHOPI = 'B03002_007', + NHoLSOR = 'B03002_008', + NHoLTOMR = 'B03002_009', + NHoLTRiSOR = 'B03002_010', + NHoLTReSOR = 'B03002_011', + HoL = 'B03002_012', + HoLW = 'B03002_013', + HoLB = 'B03002_014', + HoLAIAN = 'B03002_015', + HoLA = 'B03002_016', + HoLNHOPI = 'B03002_017', + HoLSOR = 'B03002_018', + HoLTOMR = 'B03002_019', + HoLTRiSOR = 'B03002_020', + HoLTReSOR = 'B03002_021' + ) + + selected_vars <- vars[subgroup] + out_names <- c(names(selected_vars), 'ALAND') # save for output + in_subgroup <- paste(subgroup, 'E', sep = '') + + # Acquire DEL variables and sf geometries + del_data <- suppressMessages(suppressWarnings( + tidycensus::get_acs( + geography = geo_small, + year = year, + output = 'wide', + variables = selected_vars, + geometry = TRUE, + keep_geo_vars = TRUE, + ... 
+ ) + )) + + # Format output + if (geo_small == 'county') { + del_data <- del_data %>% + sf::st_drop_geometry() %>% + tidyr::separate(NAME.y, into = c('county', 'state'), sep = ',') + } + if (geo_small == 'tract') { + del_data <- del_data %>% + sf::st_drop_geometry() %>% + tidyr::separate(NAME.y, into = c('tract', 'county', 'state'), sep = ',') %>% + dplyr::mutate(tract = gsub('[^0-9\\.]', '', tract)) + } + if (geo_small == 'block group') { + del_data <- del_data %>% + sf::st_drop_geometry() %>% + tidyr::separate(NAME.y, into = c('block.group', 'tract', 'county', 'state'), sep = ',') %>% + dplyr::mutate( + tract = gsub('[^0-9\\.]', '', tract), block.group = gsub('[^0-9\\.]', '', block.group) + ) + } + + # Grouping IDs for DEL computation + if (geo_large == 'tract') { + del_data <- del_data %>% + dplyr::mutate( + oid = paste(.$STATEFP, .$COUNTYFP, .$TRACTCE, sep = ''), + state = stringr::str_trim(state), + county = stringr::str_trim(county) + ) + } + if (geo_large == 'county') { + del_data <- del_data %>% + dplyr::mutate( + oid = paste(.$STATEFP, .$COUNTYFP, sep = ''), + state = stringr::str_trim(state), + county = stringr::str_trim(county) + ) + } + if (geo_large == 'state') { + del_data <- del_data %>% + dplyr::mutate( + oid = .$STATEFP, + state = stringr::str_trim(state) + ) + } + + # Count of racial/ethnic subgroup populations + ## Count of racial/ethnic comparison subgroup population + if (length(in_subgroup) == 1) { + del_data <- del_data %>% + dplyr::mutate(subgroup = .[ , in_subgroup]) + } else { + del_data <- del_data %>% + dplyr::mutate(subgroup = rowSums(.[ , in_subgroup])) + } + + # Compute DEL + ## From Hoover (1941) https://doi.org/10.1017/S0022050700052980 + ## 0.5\sum_{i=1}^{n}\left|\frac{x_{i}}{X}-\frac{a_{i}}{A}\right| + ## Where for n geographical units i: + ## X denotes the total number of subgroup population in study (reference) area + ## x_{i} denotes the number of subgroup population X in geographical unit i + ## A denotes the total land area in study
(reference) area (sum of all a_{i}) + ## a_{i} denotes the land area of geographical unit i + + ## Compute + DELtmp <- del_data %>% + split(., f = list(del_data$oid)) %>% + lapply(., FUN = del_fun, omit_NAs = omit_NAs) %>% + utils::stack(.) %>% + dplyr::mutate(DEL = values, oid = ind) %>% + dplyr::select(DEL, oid) + + # Warning for missingness of census characteristics + missingYN <- del_data[ , c(in_subgroup, 'ALAND')] + names(missingYN) <- out_names + missingYN <- missingYN %>% + tidyr::pivot_longer(cols = dplyr::everything(), names_to = 'variable', values_to = 'val') %>% + dplyr::group_by(variable) %>% + dplyr::summarise( + total = dplyr::n(), + n_missing = sum(is.na(val)), + percent_missing = paste0(round(mean(is.na(val)) * 100, 2), ' %') + ) + + if (quiet == FALSE) { + # Warning for missing census data + if (sum(missingYN$n_missing) > 0) { + message('Warning: Missing census data') + } + } + + # Format output + if (geo_large == 'state') { + del <- del_data %>% + dplyr::left_join(DELtmp, by = dplyr::join_by(oid)) %>% + dplyr::select(oid, state, DEL) %>% + unique(.) %>% + dplyr::mutate(GEOID = oid) %>% + dplyr::select(GEOID, state, DEL) %>% + .[.$GEOID != 'NANA', ] + } + if (geo_large == 'county') { + del <- del_data %>% + dplyr::left_join(DELtmp, by = dplyr::join_by(oid)) %>% + dplyr::select(oid, state, county, DEL) %>% + unique(.) %>% + dplyr::mutate(GEOID = oid) %>% + dplyr::select(GEOID, state, county, DEL) %>% + .[.$GEOID != 'NANA', ] + } + if (geo_large == 'tract') { + del <- del_data %>% + dplyr::left_join(DELtmp, by = dplyr::join_by(oid)) %>% + dplyr::select(oid, state, county, tract, DEL) %>% + unique(.)
%>% + dplyr::mutate(GEOID = oid) %>% + dplyr::select(GEOID, state, county, tract, DEL) %>% + .[.$GEOID != 'NANA', ] + } + + del <- del %>% + dplyr::arrange(GEOID) %>% + dplyr::as_tibble() + + del_data <- del_data %>% + dplyr::arrange(GEOID) %>% + dplyr::as_tibble() + + out <- list(del = del, del_data = del_data, missing = missingYN) + + return(out) +} diff --git a/R/krieger.R b/R/krieger.R index 6bcdd06..73f0987 100644 --- a/R/krieger.R +++ b/R/krieger.R @@ -1,22 +1,22 @@ -#' Index of Concentration at the Extremes based on Feldman _et al._ (2015) and Krieger _et al._ (2016) -#' +#' Index of Concentration at the Extremes based on Feldman et al. (2015) and Krieger et al. (2016) +#' #' Compute the aspatial Index of Concentration at the Extremes (Krieger). #' -#' @param geo Character string specifying the geography of the data either census tracts \code{geo = "tract"} (the default) or counties \code{geo = "county"}. +#' @param geo Character string specifying the geography of the data either census tracts \code{geo = 'tract'} (the default) or counties \code{geo = 'county'}. #' @param year Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available. #' @param quiet Logical. If TRUE, will display messages about potential missing census information. The default is FALSE. #' @param ... Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics #' -#' @details This function will compute three aspatial Index of Concentration at the Extremes (ICE) of U.S. census tracts or counties for a specified geographical extent (e.g., entire U.S. or a single state) based on Feldman _et al._ (2015) \doi{10.1136/jech-2015-205728} and Krieger _et al._ (2016) \doi{10.2105/AJPH.2015.302955}. The authors expanded the metric designed by Massey in a chapter of Booth & Crouter (2001) \doi{10.4324/9781410600141} who initially designed the metric for residential segregation. 
This function computes five ICE metrics: -#' -#' \itemize{ +#' @details This function will compute three aspatial Index of Concentration at the Extremes (ICE) of U.S. census tracts or counties for a specified geographical extent (e.g., entire U.S. or a single state) based on Feldman et al. (2015) \doi{10.1136/jech-2015-205728} and Krieger et al. (2016) \doi{10.2105/AJPH.2015.302955}. The authors expanded the metric designed by Massey in a chapter of Booth & Crouter (2001) \doi{10.4324/9781410600141} who initially designed the metric for residential segregation. This function computes five ICE metrics: +#' +#' \itemize{ #' \item **Income**: 80th income percentile vs. 20th income percentile #' \item **Education**: less than high school vs. four-year college degree or more #' \item **Race/Ethnicity**: white non-Hispanic vs. black non-Hispanic #' \item **Income and race/ethnicity combined**: white non-Hispanic in 80th income percentile vs. black alone (including Hispanic) in 20th income percentile #' \item **Income and race/ethnicity combined**: white non-Hispanic in 80th income percentile vs. white non-Hispanic in 20th income percentile #' } -#' +#' #' The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the geospatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available but are available from other U.S. Census Bureau surveys. 
The ACS-5 groups used in the computation of the five ICE metrics are: #' \itemize{ #' \item **B03002**: HISPANIC OR LATINO ORIGIN BY RACE @@ -25,280 +25,453 @@ #' \item **B19001B**: HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 20XX INFLATION-ADJUSTED DOLLARS) (BLACK OR AFRICAN AMERICAN ALONE HOUSEHOLDER) #' \item **B19001H**: HOUSEHOLD INCOME IN THE PAST 12 MONTHS (IN 20XX INFLATION-ADJUSTED DOLLARS) (WHITE ALONE, NOT HISPANIC OR LATINO HOUSEHOLDER) #' } -#' +#' #' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. -#' +#' #' ICE metrics can range in value from -1 (most deprived) to 1 (most privileged). A value of 0 can thus represent two possibilities: (1) none of the residents are in the most privileged or most deprived categories, or (2) an equal number of persons are in the most privileged and most deprived categories, and in both cases indicates that the area is not dominated by extreme concentrations of either of the two groups. -#' +#' #' @return An object of class 'list'. This is a named list with the following components: -#' +#' #' \describe{ #' \item{\code{ice}}{An object of class 'tbl' for the GEOID, name, ICE metrics, and raw census values of specified census geographies.} #' \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute the ICEs.} #' } -#' +#' #' @import dplyr #' @importFrom stringr str_trim #' @importFrom tidycensus get_acs #' @importFrom tidyr pivot_longer separate #' @export -#' +#' #' @seealso \code{\link[tidycensus]{get_acs}} for additional arguments for geographic extent selection (i.e., \code{state} and \code{county}). #' #' @examples #' \dontrun{ #' # Wrapped in \dontrun{} because these examples require a Census API key. 
-#' +#' #' # Tract-level metric (2020) -#' krieger(geo = "tract", state = "GA", year = 2020) -#' +#' krieger(geo = 'tract', state = 'GA', year = 2020) +#' #' # County-level metric (2020) -#' krieger(geo = "county", state = "GA", year = 2020) -#' +#' krieger(geo = 'county', state = 'GA', year = 2020) +#' #' } -#' -krieger <- function(geo = "tract", year = 2020, quiet = FALSE, ...) { - - # Check arguments - match.arg(geo, choices = c("county", "tract")) - stopifnot(is.numeric(year), year >= 2009) # all variables available 2009 onward - - # Select census variables - vars <- c(TotalPopi = "B19001_001", - TotalPopedu = "B15002_001", - TotalPopre = "B03002_001", - U10i = "B19001_002", - B1015i = "B19001_003", - B1520i = "B19001_004", - B2025i = "B19001_005", - B2530i = "B19001_006", - B100125i = "B19001_014", - B125150i = "B19001_015", - B150200i = "B19001_016", - O200i = "B19001_017", - O25MNSC = "B15002_003", - O25FNSC = "B15002_020", - O25MNt4G = "B15002_004", - O25FNt4G = "B15002_021", - O25M5t6G = "B15002_005", - O25F5t6G = "B15002_022", - O25M7t8G = "B15002_006", - O25F7t8G = "B15002_023", - O25M9G = "B15002_007", - O25F9G = "B15002_024", - O25M10G = "B15002_008", - O25F10G = "B15002_025", - O25M11G = "B15002_009", - O25F11G = "B15002_026", - O25M12GND = "B15002_010", - O25F12GND = "B15002_027", - O25MBD = "B15002_015", - O25FBD = "B15002_032", - O25MMD = "B15002_016", - O25FMD = "B15002_033", - O25MPSD = "B15002_017", - O25FPSD = "B15002_034", - O25MDD = "B15002_018", - O25FDD = "B15002_035", - NHoLW = "B03002_003", - NHoLB = "B03002_004", - U10nhw = "B19001H_002", - B1015nhw = "B19001H_003", - B1520nhw = "B19001H_004", - B2025nhw = "B19001H_005", - B2530nhw = "B19001H_006", - B100125nhw = "B19001H_014", - B125150nhw = "B19001H_015", - B150200nhw = "B19001H_016", - O200nhw = "B19001H_017", - U10bih = "B19001B_002", - B1015bih = "B19001B_003", - B1520bih = "B19001B_004", - B2025bih = "B19001B_005", - B2530bih = "B19001B_006") +#' +krieger <- function(geo = 'tract', 
+ year = 2020, + quiet = FALSE, + ...) { - # Acquire ICE variables - ice_data <- suppressMessages(suppressWarnings(tidycensus::get_acs(geography = geo, - year = year, - output = "wide", - variables = vars, ...))) - - if (geo == "tract") { + # Check arguments + match.arg(geo, choices = c('county', 'tract')) + stopifnot(is.numeric(year), year >= 2009) # all variables available 2009 onward + + # Select census variables + vars <- c( + TotalPopi = 'B19001_001', + TotalPopedu = 'B15002_001', + TotalPopre = 'B03002_001', + U10i = 'B19001_002', + B1015i = 'B19001_003', + B1520i = 'B19001_004', + B2025i = 'B19001_005', + B2530i = 'B19001_006', + B100125i = 'B19001_014', + B125150i = 'B19001_015', + B150200i = 'B19001_016', + O200i = 'B19001_017', + O25MNSC = 'B15002_003', + O25FNSC = 'B15002_020', + O25MNt4G = 'B15002_004', + O25FNt4G = 'B15002_021', + O25M5t6G = 'B15002_005', + O25F5t6G = 'B15002_022', + O25M7t8G = 'B15002_006', + O25F7t8G = 'B15002_023', + O25M9G = 'B15002_007', + O25F9G = 'B15002_024', + O25M10G = 'B15002_008', + O25F10G = 'B15002_025', + O25M11G = 'B15002_009', + O25F11G = 'B15002_026', + O25M12GND = 'B15002_010', + O25F12GND = 'B15002_027', + O25MBD = 'B15002_015', + O25FBD = 'B15002_032', + O25MMD = 'B15002_016', + O25FMD = 'B15002_033', + O25MPSD = 'B15002_017', + O25FPSD = 'B15002_034', + O25MDD = 'B15002_018', + O25FDD = 'B15002_035', + NHoLW = 'B03002_003', + NHoLB = 'B03002_004', + U10nhw = 'B19001H_002', + B1015nhw = 'B19001H_003', + B1520nhw = 'B19001H_004', + B2025nhw = 'B19001H_005', + B2530nhw = 'B19001H_006', + B100125nhw = 'B19001H_014', + B125150nhw = 'B19001H_015', + B150200nhw = 'B19001H_016', + O200nhw = 'B19001H_017', + U10bih = 'B19001B_002', + B1015bih = 'B19001B_003', + B1520bih = 'B19001B_004', + B2025bih = 'B19001B_005', + B2530bih = 'B19001B_006' + ) + + # Acquire ICE variables + ice_data <- suppressMessages(suppressWarnings( + tidycensus::get_acs( + geography = geo, + year = year, + output = 'wide', + variables = vars, + ... 
+ ) + )) + + + if (geo == 'tract') { + ice_data <- ice_data %>% + tidyr::separate(NAME, into = c('tract', 'county', 'state'), sep = ',') %>% + dplyr::mutate(tract = gsub('[^0-9\\.]', '', tract)) + } else { + ice_data <- ice_data %>% + tidyr::separate(NAME, into = c('county', 'state'), sep = ',') + } + ice_data <- ice_data %>% - tidyr::separate(NAME, into = c("tract", "county", "state"), sep = ",") %>% - dplyr::mutate(tract = gsub("[^0-9\\.]","", tract)) - } else { - ice_data <- ice_data %>% tidyr::separate(NAME, into = c("county", "state"), sep = ",") - } - - ice_data <- ice_data %>% - dplyr::mutate(TotalPop_inc = TotalPopiE, - TotalPop_edu = TotalPopeduE, - TotalPop_re = TotalPopreE, - U10i = U10iE, - B1015i = B1015iE, - B1520i = B1520iE, - B2025i = B2025iE, - B2530i = B2530iE, - B100125i = B100125iE, - B125150i = B125150iE, - B150200i = B150200iE, - O200i = O200iE, - O25MNSC = O25MNSCE, - O25FNSC = O25FNSCE, - O25MNt4G = O25MNt4GE, - O25FNt4G = O25FNt4GE, - O25M5t6G = O25M5t6GE, - O25F5t6G = O25F5t6GE, - O25M7t8G = O25M7t8GE, - O25F7t8G = O25F7t8GE, - O25M9G = O25M9GE, - O25F9G = O25F9GE, - O25M10G = O25M10GE, - O25F10G = O25F10GE, - O25M11G = O25M11GE, - O25F11G = O25F11GE, - O25M12GND = O25M12GNDE, - O25F12GND = O25F12GNDE, - O25MBD = O25MBDE, - O25FBD = O25FBDE, - O25MMD = O25MMDE, - O25FMD = O25FMDE, - O25MPSD = O25MPSDE, - O25FPSD = O25FPSDE, - O25MDD = O25MDDE, - O25FDD = O25FDDE, - NHoLW = NHoLWE, - NHoLB = NHoLBE, - U10nhw = U10nhwE, - B1015nhw = B1015nhwE, - B1520nhw = B1520nhwE, - B2025nhw = B2025nhwE, - B2530nhw = B2530nhwE, - B100125nhw = B100125nhwE, - B125150nhw = B125150nhwE, - B150200nhw = B150200nhwE, - O200nhw = O200nhwE, - U10bih = U10bihE, - B1015bih = B1015bihE, - B1520bih = B1520bihE, - B2025bih = B2025bihE, - B2530bih = B2530bihE) - - # Sum educational attainment categories - # A_{edu} = Less than high school / 12 year / GED - # P_{edu} = Four-year college degree or more - ice_data <- ice_data %>% - dplyr::mutate(A_edu = O25MBD + O25FBD + 
O25MMD + O25FMD + O25MPSD + - O25FPSD + O25MDD + O25FDD, - P_edu = O25MNSC + O25FNSC + O25MNt4G + O25FNt4G + - O25M5t6G + O25F5t6G + O25M7t8G + O25F7t8G + - O25M9G + O25F9G + O25M10G + O25F10G + - O25M11G + O25F11G + O25M12GND + O25F12GND) - - # Sum income percentile counts - ## A_income (A_{inc}) is the 80th income percentile - ## P_income (P_{inc}) is the 20th income percentile - ## Add "Total, $25,000 to $34,999" for years 2016 and after - ## Remove "Total, $100,000 to $124,999" for years 2016 and after - ## According to U.S. Census Bureau Table A-4a - ## "Selected Measures of Household Income Dispersion: 1967 to 2020" - if (year < 2016) { - ice_data <- ice_data %>% - dplyr::mutate(A_inc = B100125i + B125150i + B150200i + O200i, - P_inc = U10i + B1015i + B1520i + B2025i, - A_wbinc = B100125nhw + B125150nhw + B150200nhw + O200nhw, - P_wbinc = U10bih + B1015bih + B1520bih + B2025bih, - A_wpcinc = B100125nhw + B125150nhw + B150200nhw + O200nhw, - P_wpcinc = U10nhw + B1015nhw + B1520nhw + B2025nhw) - } else { - ice_data <- ice_data %>% - dplyr::mutate(A_inc = B125150i + B150200i + O200i, - P_inc = U10i + B1015i + B1520i + B2025i + B2530i, - A_wbinc = B125150nhw + B150200nhw + O200nhw, - P_wbinc = U10bih + B1015bih + B1520bih + B2025bih + B2530bih, - A_wpcinc = B125150nhw + B150200nhw + O200nhw, - P_wpcinc = U10nhw + B1015nhw + B1520nhw + B2025nhw + B2530nhw) - } - - # Compute ICEs - ## From Kreiger et al. 
(2016) https://doi.org/10.2105%2FAJPH.2015.302955 - ## ICE_{i} = (A_{i} - P_{i}) / T_{i} - ## Where: - ## A_{i} denotes the count within the lowest extreme (e.g., households in 20th income percentile) - ## P_{i} denotes the count within the highest extreme (e.g., households in 80th income percentile) - ## T_{i} denotes the total population in region i (TotalPop) - - ice_data <- ice_data %>% - dplyr::mutate(ICE_inc = (A_inc - P_inc) / TotalPop_inc, - ICE_edu = (A_edu - P_edu) / TotalPop_edu, - ICE_rewb = (NHoLW - NHoLB) / TotalPop_re, - ICE_wbinc = (A_wbinc - P_wbinc) / TotalPop_inc, - ICE_wpcinc = (A_wpcinc - P_wpcinc) / TotalPop_inc) - - # Warning for missingness of census characteristics - missingYN <- ice_data %>% - dplyr::select(U10i, B1015i, B1520i, B2025i, B2530i, B100125i, B125150i, - B150200i, O200i, O25MNSC, O25FNSC,O25MNt4G, O25FNt4G, - O25M5t6G, O25F5t6G, O25M7t8G, O25F7t8G, O25M9G, O25F9G, - O25M10G, O25F10G, O25M11G, O25F11G, O25M12GND, O25F12GND, - O25MBD, O25FBD, O25MMD, O25FMD, O25MPSD, O25FPSD, O25MDD, - O25FDD, NHoLW, NHoLB, U10nhw, B1015nhw, B1520nhw, - B2025nhw, B2530nhw, B100125nhw, B125150nhw, - B150200nhw, O200nhw, U10bih, B1015bih, B1520bih, B2025bih, - B2530bih, TotalPop_inc, TotalPop_edu, TotalPop_re) %>% - tidyr::pivot_longer(cols = dplyr::everything(), - names_to = "variable", - values_to = "val") %>% - dplyr::group_by(variable) %>% - dplyr::summarise(total = dplyr::n(), - n_missing = sum(is.na(val)), - percent_missing = paste0(round(mean(is.na(val)) * 100, 2), " %")) - - if (quiet == FALSE) { - # Warning for missing census data - if (sum(missingYN$n_missing) > 0) { - message("Warning: Missing census data") + dplyr::mutate( + TotalPop_inc = TotalPopiE, + TotalPop_edu = TotalPopeduE, + TotalPop_re = TotalPopreE, + U10i = U10iE, + B1015i = B1015iE, + B1520i = B1520iE, + B2025i = B2025iE, + B2530i = B2530iE, + B100125i = B100125iE, + B125150i = B125150iE, + B150200i = B150200iE, + O200i = O200iE, + O25MNSC = O25MNSCE, + O25FNSC = O25FNSCE, 
+ O25MNt4G = O25MNt4GE, + O25FNt4G = O25FNt4GE, + O25M5t6G = O25M5t6GE, + O25F5t6G = O25F5t6GE, + O25M7t8G = O25M7t8GE, + O25F7t8G = O25F7t8GE, + O25M9G = O25M9GE, + O25F9G = O25F9GE, + O25M10G = O25M10GE, + O25F10G = O25F10GE, + O25M11G = O25M11GE, + O25F11G = O25F11GE, + O25M12GND = O25M12GNDE, + O25F12GND = O25F12GNDE, + O25MBD = O25MBDE, + O25FBD = O25FBDE, + O25MMD = O25MMDE, + O25FMD = O25FMDE, + O25MPSD = O25MPSDE, + O25FPSD = O25FPSDE, + O25MDD = O25MDDE, + O25FDD = O25FDDE, + NHoLW = NHoLWE, + NHoLB = NHoLBE, + U10nhw = U10nhwE, + B1015nhw = B1015nhwE, + B1520nhw = B1520nhwE, + B2025nhw = B2025nhwE, + B2530nhw = B2530nhwE, + B100125nhw = B100125nhwE, + B125150nhw = B125150nhwE, + B150200nhw = B150200nhwE, + O200nhw = O200nhwE, + U10bih = U10bihE, + B1015bih = B1015bihE, + B1520bih = B1520bihE, + B2025bih = B2025bihE, + B2530bih = B2530bihE + ) + + # Sum educational attainment categories + # A_{edu} = Four-year college degree or more + # P_{edu} = Less than high school / 12 year / GED + ice_data <- ice_data %>% + dplyr::mutate( + A_edu = O25MBD + O25FBD + O25MMD + O25FMD + O25MPSD + O25FPSD + O25MDD + O25FDD, + P_edu = O25MNSC + O25FNSC + O25MNt4G + O25FNt4G + O25M5t6G + O25F5t6G + O25M7t8G + + O25F7t8G + O25M9G + O25F9G + O25M10G + O25F10G + O25M11G + O25F11G + O25M12GND + + O25F12GND + ) + + # Sum income percentile counts + ## A_income (A_{inc}) is the 80th income percentile + ## P_income (P_{inc}) is the 20th income percentile + ## Add 'Total, $25,000 to $34,999' for years 2016 and after + ## Remove 'Total, $100,000 to $124,999' for years 2016 and after + ## According to U.S. 
Census Bureau Table A-4a + ## 'Selected Measures of Household Income Dispersion: 1967 to 2020' + if (year < 2016) { + ice_data <- ice_data %>% + dplyr::mutate( + A_inc = B100125i + B125150i + B150200i + O200i, + P_inc = U10i + B1015i + B1520i + B2025i, + A_wbinc = B100125nhw + B125150nhw + B150200nhw + O200nhw, + P_wbinc = U10bih + B1015bih + B1520bih + B2025bih, + A_wpcinc = B100125nhw + B125150nhw + B150200nhw + O200nhw, + P_wpcinc = U10nhw + B1015nhw + B1520nhw + B2025nhw + ) + } else { + ice_data <- ice_data %>% + dplyr::mutate( + A_inc = B125150i + B150200i + O200i, + P_inc = U10i + B1015i + B1520i + B2025i + B2530i, + A_wbinc = B125150nhw + B150200nhw + O200nhw, + P_wbinc = U10bih + B1015bih + B1520bih + B2025bih + B2530bih, + A_wpcinc = B125150nhw + B150200nhw + O200nhw, + P_wpcinc = U10nhw + B1015nhw + B1520nhw + B2025nhw + B2530nhw + ) } + + # Compute ICEs + ## From Krieger et al. (2016) https://doi.org/10.2105%2FAJPH.2015.302955 + ## ICE_{i} = (A_{i} - P_{i}) / T_{i} + ## Where: + ## A_{i} denotes the count within the highest extreme (e.g., households in 80th income percentile) + ## P_{i} denotes the count within the lowest extreme (e.g., households in 20th income percentile) + ## T_{i} denotes the total population in region i (TotalPop) + + ice_data <- ice_data %>% + dplyr::mutate( + ICE_inc = (A_inc - P_inc) / TotalPop_inc, + ICE_edu = (A_edu - P_edu) / TotalPop_edu, + ICE_rewb = (NHoLW - NHoLB) / TotalPop_re, + ICE_wbinc = (A_wbinc - P_wbinc) / TotalPop_inc, + ICE_wpcinc = (A_wpcinc - P_wpcinc) / TotalPop_inc + ) + + # Warning for missingness of census characteristics + missingYN <- ice_data %>% + dplyr::select( + U10i, + B1015i, + B1520i, + B2025i, + B2530i, + B100125i, + B125150i, + B150200i, + O200i, + O25MNSC, + O25FNSC, + O25MNt4G, + O25FNt4G, + O25M5t6G, + O25F5t6G, + O25M7t8G, + O25F7t8G, + O25M9G, + O25F9G, + O25M10G, + O25F10G, + O25M11G, + O25F11G, + O25M12GND, + O25F12GND, + O25MBD, + O25FBD, + O25MMD, + O25FMD, + O25MPSD, + O25FPSD, + 
O25MDD, + O25FDD, + NHoLW, + NHoLB, + U10nhw, + B1015nhw, + B1520nhw, + B2025nhw, + B2530nhw, + B100125nhw, + B125150nhw, + B150200nhw, + O200nhw, + U10bih, + B1015bih, + B1520bih, + B2025bih, + B2530bih, + TotalPop_inc, + TotalPop_edu, + TotalPop_re + ) %>% + tidyr::pivot_longer( + cols = dplyr::everything(), + names_to = 'variable', + values_to = 'val' + ) %>% + dplyr::group_by(variable) %>% + dplyr::summarise( + total = dplyr::n(), + n_missing = sum(is.na(val)), + percent_missing = paste0(round(mean(is.na(val)) * 100, 2), ' %') + ) + + if (quiet == FALSE) { + # Warning for missing census data + if (sum(missingYN$n_missing) > 0) { + message('Warning: Missing census data') + } + } + + # Format output + if (geo == 'tract') { + ice <- ice_data %>% + dplyr::select( + GEOID, + state, + county, + tract, + ICE_inc, + ICE_edu, + ICE_rewb, + ICE_wbinc, + ICE_wpcinc, + U10i, + B1015i, + B1520i, + B2025i, + B2530i, + B100125i, + B125150i, + B150200i, + O200i, + O25MNSC, + O25FNSC, + O25MNt4G, + O25FNt4G, + O25M5t6G, + O25F5t6G, + O25M7t8G, + O25F7t8G, + O25M9G, + O25F9G, + O25M10G, + O25F10G, + O25M11G, + O25F11G, + O25M12GND, + O25F12GND, + O25MBD, + O25FBD, + O25MMD, + O25FMD, + O25MPSD, + O25FPSD, + O25MDD, + O25FDD, + NHoLW, + NHoLB, + U10nhw, + B1015nhw, + B1520nhw, + B2025nhw, + B2530nhw, + B100125nhw, + B125150nhw, + B150200nhw, + O200nhw, + U10bih, + B1015bih, + B1520bih, + B2025bih, + B2530bih, + TotalPop_inc, + TotalPop_edu, + TotalPop_re + ) + } else { + ice <- ice_data %>% + dplyr::select( + GEOID, + state, + county, + ICE_inc, + ICE_edu, + ICE_rewb, + ICE_wbinc, + ICE_wpcinc, + U10i, + B1015i, + B1520i, + B2025i, + B2530i, + B100125i, + B125150i, + B150200i, + O200i, + O25MNSC, + O25FNSC, + O25MNt4G, + O25FNt4G, + O25M5t6G, + O25F5t6G, + O25M7t8G, + O25F7t8G, + O25M9G, + O25F9G, + O25M10G, + O25F10G, + O25M11G, + O25F11G, + O25M12GND, + O25F12GND, + O25MBD, + O25FBD, + O25MMD, + O25FMD, + O25MPSD, + O25FPSD, + O25MDD, + O25FDD, + NHoLW, + NHoLB, + U10nhw, + 
B1015nhw, + B1520nhw, + B2025nhw, + B2530nhw, + B100125nhw, + B125150nhw, + B150200nhw, + O200nhw, + U10bih, + B1015bih, + B1520bih, + B2025bih, + B2530bih, + TotalPop_inc, + TotalPop_edu, + TotalPop_re + ) + } + + ice <- ice %>% + dplyr::mutate( + state = stringr::str_trim(state), + county = stringr::str_trim(county) + ) %>% + dplyr::arrange(GEOID) %>% + dplyr::as_tibble() + + out <- list(ice = ice, missing = missingYN) + + return(out) } - - # Format output - if (geo == "tract") { - ice <- ice_data %>% - dplyr::select(GEOID, state, county, tract, - ICE_inc, ICE_edu, ICE_rewb, ICE_wbinc, ICE_wpcinc, - U10i, B1015i, B1520i, B2025i, B2530i, B100125i, B125150i, - B150200i, O200i, O25MNSC, O25FNSC,O25MNt4G, O25FNt4G, - O25M5t6G, O25F5t6G, O25M7t8G, O25F7t8G, O25M9G, O25F9G, - O25M10G, O25F10G, O25M11G, O25F11G, O25M12GND, O25F12GND, - O25MBD, O25FBD, O25MMD, O25FMD, O25MPSD, O25FPSD, O25MDD, - O25FDD, NHoLW, NHoLB, U10nhw, B1015nhw, B1520nhw, - B2025nhw, B2530nhw, B100125nhw, B125150nhw, - B150200nhw, O200nhw, U10bih, B1015bih, B1520bih, B2025bih, - B2530bih, TotalPop_inc, TotalPop_edu, TotalPop_re) - } else { - ice <- ice_data %>% - dplyr::select(GEOID, state, county, - ICE_inc, ICE_edu, ICE_rewb, ICE_wbinc, ICE_wpcinc, - U10i, B1015i, B1520i, B2025i, B2530i, B100125i, B125150i, - B150200i, O200i, O25MNSC, O25FNSC,O25MNt4G, O25FNt4G, - O25M5t6G, O25F5t6G, O25M7t8G, O25F7t8G, O25M9G, O25F9G, - O25M10G, O25F10G, O25M11G, O25F11G, O25M12GND, O25F12GND, - O25MBD, O25FBD, O25MMD, O25FMD, O25MPSD, O25FPSD, O25MDD, - O25FDD, NHoLW, NHoLB, U10nhw, B1015nhw, B1520nhw, - B2025nhw, B2530nhw, B100125nhw, B125150nhw, - B150200nhw, O200nhw, U10bih, B1015bih, B1520bih, B2025bih, - B2530bih, TotalPop_inc, TotalPop_edu, TotalPop_re) - } - - ice <- ice %>% - dplyr::mutate(state = stringr::str_trim(state), - county = stringr::str_trim(county)) %>% - dplyr::arrange(GEOID) %>% - dplyr::as_tibble() - - out <- list(ice = ice, - missing = missingYN) - - return(out) -} diff --git a/R/messer.R 
b/R/messer.R index 01b71a5..614ffdd 100644 --- a/R/messer.R +++ b/R/messer.R @@ -1,8 +1,8 @@ -#' Neighborhood Deprivation Index based on Messer _et al._ (2006) +#' Neighborhood Deprivation Index based on Messer et al. (2006) #' #' Compute the aspatial Neighborhood Deprivation Index (Messer). #' -#' @param geo Character string specifying the geography of the data either census tracts \code{geo = "tract"} (the default) or counties \code{geo = "county"}. +#' @param geo Character string specifying the geography of the data either census tracts \code{geo = 'tract'} (the default) or counties \code{geo = 'county'}. #' @param year Numeric. The year to compute the estimate. The default is 2020, and the years 2010 onward are currently available. #' @param imp Logical. If TRUE, will impute missing census characteristics within the internal \code{\link[psych]{principal}}. If FALSE (the default), will not impute. #' @param quiet Logical. If TRUE, will display messages about potential missing census information and the proportion of variance explained by principal component analysis. The default is FALSE. @@ -10,7 +10,7 @@ #' @param df Optional. Pass a pre-formatted \code{'dataframe'} or \code{'tibble'} with the desired variables through the function. Bypasses the data obtained by \code{\link[tidycensus]{get_acs}}. The default is NULL. See Details below. #' @param ... Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics #' -#' @details This function will compute the aspatial Neighborhood Deprivation Index (NDI) of U.S. census tracts or counties for a specified geographical referent (e.g., US-standardized) based on Messer _et al._ (2006) \doi{10.1007/s11524-006-9094-x}. +#' @details This function will compute the aspatial Neighborhood Deprivation Index (NDI) of U.S. census tracts or counties for a specified geographical referent (e.g., US-standardized) based on Messer et al. 
(2006) \doi{10.1007/s11524-006-9094-x}. #' #' The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for computation involving a principal component analysis with the \code{\link[psych]{principal}} function. The yearly estimates are available for 2010 and after when all census characteristics became available. The eight characteristics are: #' \itemize{ @@ -27,11 +27,11 @@ #' #' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify the referent for standardizing the NDI (Messer) values. For example, if all U.S. states are specified for the \code{state} argument, then the output would be a U.S.-standardized index. #' -#' The continuous NDI (Messer) values are z-transformed, i.e., "standardized," and the categorical NDI (Messer) values are quartiles of the standardized continuous NDI (Messer) values. +#' The continuous NDI (Messer) values are z-transformed, i.e., 'standardized,' and the categorical NDI (Messer) values are quartiles of the standardized continuous NDI (Messer) values. #' #' Check if the proportion of variance explained by the first principal component is high (more than 0.5). #' -#' Users can bypass \code{\link[tidycensus]{get_acs}} by specifying a pre-formatted data frame or tibble using the \code{df} argument. This function will compute an index using the first component of a principal component analysis (PCA) with a Varimax rotation (the default for \code{\link[psych]{principal}}) and only one factor (note: PCA set-up not unspecified in Messer _et al._ (2006)). The recommended structure of the data frame or tibble is an ID (e.g., GEOID) in the first feature (column), followed by the variables of interest (in any order) and no additional information (e.g., omit state or county names from the \code{df} argument input). 
+#' Users can bypass \code{\link[tidycensus]{get_acs}} by specifying a pre-formatted data frame or tibble using the \code{df} argument. This function will compute an index using the first component of a principal component analysis (PCA) with a Varimax rotation (the default for \code{\link[psych]{principal}}) and only one factor (note: PCA set-up is unspecified in Messer et al. (2006)). The recommended structure of the data frame or tibble is an ID (e.g., GEOID) in the first feature (column), followed by the variables of interest (in any order) and no additional information (e.g., omit state or county names from the \code{df} argument input). #' #' @return An object of class 'list'. This is a named list with the following components: #' @@ -59,232 +59,333 @@ #' # Wrapped in \dontrun{} because these examples require a Census API key. #' #' # Tract-level metric (2020) -#' messer(geo = "tract", state = "GA", year = 2020) +#' messer(geo = 'tract', state = 'GA', year = 2020) #' #' # Impute NDI for tracts (2020) with missing census information (median values) -#' messer(state = "tract", "GA", year = 2020, imp = TRUE) +#' messer(geo = 'tract', state = 'GA', year = 2020, imp = TRUE) #' #' # County-level metric (2020) -#' messer(geo = "county", state = "GA", year = 2020) +#' messer(geo = 'county', state = 'GA', year = 2020) #' #' } #' -messer <- function(geo = "tract", year = 2020, imp = FALSE, quiet = FALSE, round_output = FALSE, df = NULL, ...) { +messer <- function(geo = 'tract', + year = 2020, + imp = FALSE, + quiet = FALSE, + round_output = FALSE, + df = NULL, + ...) 
{ # Check arguments - if (!is.null(df) & !inherits(df, c("tbl_df", "tbl", "data.frame"))) { stop("'df' must be class 'data.frame' or 'tbl'") } + if (!is.null(df) & + !inherits(df, c('tbl_df', 'tbl', 'data.frame'))) { + stop("'df' must be class 'data.frame' or 'tbl'") + } if (is.null(df)) { - # Check additional arguments - match.arg(geo, choices = c("county", "tract")) + match.arg(geo, choices = c('county', 'tract')) stopifnot(is.numeric(year), year >= 2010) # all variables available 2010 onward # Select census variables - vars <- c(PctMenMgmtBusScArti_num1 = "C24030_018", PctMenMgmtBusScArti_num2 = "C24030_019", - PctMenMgmtBusScArti_den = "C24030_002", - PctCrwdHH_num1 = "B25014_005", PctCrwdHH_num2 = "B25014_006", - PctCrwdHH_num3 = "B25014_007", PctCrwdHH_num4 = "B25014_011", - PctCrwdHH_num5 = "B25014_012", PctCrwdHH_num6 = "B25014_013", - PctCrwdHH_den = "B25014_001", - PctHHPov_num = "B17017_002", PctHHPov_den = "B17017_001", - PctFemHeadKids_num1 = "B25115_012", PctFemHeadKids_num2 = "B25115_025", - PctFemHeadKids_den = "B25115_001", - PctPubAsst_num = "B19058_002", PctPubAsst_den = "B19058_001", - PctHHUnder30K_num1 = "B19001_002", PctHHUnder30K_num2 = "B19001_003", - PctHHUnder30K_num3 = "B19001_004", PctHHUnder30K_num4 = "B19001_005", - PctHHUnder30K_num5 = "B19001_006", PctHHUnder30K_den = "B19001_001", - PctEducLessThanHS_num = "B06009_002", PctEducLessThanHS_den = "B06009_001", - PctUnemp_num = "B23025_005", PctUnemp_den = "B23025_003") + vars <- + c( + PctMenMgmtBusScArti_num1 = 'C24030_018', + PctMenMgmtBusScArti_num2 = 'C24030_019', + PctMenMgmtBusScArti_den = 'C24030_002', + PctCrwdHH_num1 = 'B25014_005', + PctCrwdHH_num2 = 'B25014_006', + PctCrwdHH_num3 = 'B25014_007', + PctCrwdHH_num4 = 'B25014_011', + PctCrwdHH_num5 = 'B25014_012', + PctCrwdHH_num6 = 'B25014_013', + PctCrwdHH_den = 'B25014_001', + PctHHPov_num = 'B17017_002', + PctHHPov_den = 'B17017_001', + PctFemHeadKids_num1 = 'B25115_012', + PctFemHeadKids_num2 = 'B25115_025', + 
PctFemHeadKids_den = 'B25115_001', + PctPubAsst_num = 'B19058_002', + PctPubAsst_den = 'B19058_001', + PctHHUnder30K_num1 = 'B19001_002', + PctHHUnder30K_num2 = 'B19001_003', + PctHHUnder30K_num3 = 'B19001_004', + PctHHUnder30K_num4 = 'B19001_005', + PctHHUnder30K_num5 = 'B19001_006', + PctHHUnder30K_den = 'B19001_001', + PctEducLessThanHS_num = 'B06009_002', + PctEducLessThanHS_den = 'B06009_001', + PctUnemp_num = 'B23025_005', + PctUnemp_den = 'B23025_003' + ) if (year == 2010) { # Select census variables - vars <- c(vars[-c(26,27)], PctUnemp_den = "B23001_001", - PctUnemp_1619M = "B23001_008", PctUnemp_2021M = "B23001_015", - PctUnemp_2224M = "B23001_022", PctUnemp_2529M = "B23001_029", - PctUnemp_3034M = "B23001_036", PctUnemp_3544M = "B23001_043", - PctUnemp_4554M = "B23001_050", PctUnemp_5559M = "B23001_057", - PctUnemp_6061M = "B23001_064", PctUnemp_6264M = "B23001_071", - PctUnemp_6569M = "B23001_076", PctUnemp_7074M = "B23001_081", - PctUnemp_75upM = "B23001_086", PctUnemp_1619F = "B23001_094", - PctUnemp_2021F = "B23001_101", PctUnemp_2224F = "B23001_108", - PctUnemp_2529F = "B23001_115", PctUnemp_3034F = "B23001_122", - PctUnemp_3544F = "B23001_129", PctUnemp_4554F = "B23001_136", - PctUnemp_5559F = "B23001_143", PctUnemp_6061F = "B23001_150", - PctUnemp_6264F = "B23001_157", PctUnemp_6569F = "B23001_162", - PctUnemp_7074F = "B23001_167", PctUnemp_75upF = "B23001_172") + vars <- c( + vars[-c(26, 27)], + PctUnemp_den = 'B23001_001', + PctUnemp_1619M = 'B23001_008', + PctUnemp_2021M = 'B23001_015', + PctUnemp_2224M = 'B23001_022', + PctUnemp_2529M = 'B23001_029', + PctUnemp_3034M = 'B23001_036', + PctUnemp_3544M = 'B23001_043', + PctUnemp_4554M = 'B23001_050', + PctUnemp_5559M = 'B23001_057', + PctUnemp_6061M = 'B23001_064', + PctUnemp_6264M = 'B23001_071', + PctUnemp_6569M = 'B23001_076', + PctUnemp_7074M = 'B23001_081', + PctUnemp_75upM = 'B23001_086', + PctUnemp_1619F = 'B23001_094', + PctUnemp_2021F = 'B23001_101', + PctUnemp_2224F = 'B23001_108', + 
PctUnemp_2529F = 'B23001_115', + PctUnemp_3034F = 'B23001_122', + PctUnemp_3544F = 'B23001_129', + PctUnemp_4554F = 'B23001_136', + PctUnemp_5559F = 'B23001_143', + PctUnemp_6061F = 'B23001_150', + PctUnemp_6264F = 'B23001_157', + PctUnemp_6569F = 'B23001_162', + PctUnemp_7074F = 'B23001_167', + PctUnemp_75upF = 'B23001_172' + ) # Acquire NDI variables - ndi_data <- suppressMessages(suppressWarnings(tidycensus::get_acs(geography = geo, - year = year, - output = "wide", - variables = vars, ...))) + ndi_data <- suppressMessages(suppressWarnings( + tidycensus::get_acs( + geography = geo, + year = year, + output = 'wide', + variables = vars, + ... + ) + )) - if (geo == "tract") { + if (geo == 'tract') { ndi_data <- ndi_data %>% - tidyr::separate(NAME, into = c("tract", "county", "state"), sep = ",") %>% - dplyr::mutate(tract = gsub("[^0-9\\.]","", tract)) + tidyr::separate(NAME, into = c('tract', 'county', 'state'), sep = ',') %>% + dplyr::mutate(tract = gsub('[^0-9\\.]', '', tract)) } else { - ndi_data <- ndi_data %>% tidyr::separate(NAME, into = c("county", "state"), sep = ",") + ndi_data <- + ndi_data %>% tidyr::separate(NAME, into = c('county', 'state'), sep = ',') } ndi_data <- ndi_data %>% - dplyr::mutate(OCC = (PctMenMgmtBusScArti_num1E + PctMenMgmtBusScArti_num2E) / PctMenMgmtBusScArti_denE, - CWD = (PctCrwdHH_num1E + PctCrwdHH_num2E + PctCrwdHH_num3E + - PctCrwdHH_num4E + PctCrwdHH_num5E + PctCrwdHH_num6E) / PctCrwdHH_denE, - POV = PctHHPov_numE / PctHHPov_denE, - FHH = (PctFemHeadKids_num1E + PctFemHeadKids_num2E) / PctFemHeadKids_denE, - PUB = PctPubAsst_numE / PctPubAsst_denE, - U30 = (PctHHUnder30K_num1E + PctHHUnder30K_num2E + PctHHUnder30K_num3E + - PctHHUnder30K_num4E + PctHHUnder30K_num5E) / PctHHUnder30K_denE, - EDU = PctEducLessThanHS_numE / PctEducLessThanHS_denE, - EMP = (PctUnemp_1619ME + PctUnemp_2021ME + - PctUnemp_2224ME + PctUnemp_2529ME + - PctUnemp_4554ME + PctUnemp_5559ME + - PctUnemp_6061ME + PctUnemp_6264ME + - PctUnemp_6569ME + 
PctUnemp_7074ME + - PctUnemp_75upME + PctUnemp_1619FE + - PctUnemp_2021FE + PctUnemp_2224FE + - PctUnemp_2529FE + PctUnemp_4554FE + - PctUnemp_5559FE + PctUnemp_6061FE + - PctUnemp_6264FE + PctUnemp_6569FE + - PctUnemp_7074FE + PctUnemp_75upME) / PctUnemp_denE) + dplyr::mutate( + OCC = (PctMenMgmtBusScArti_num1E + PctMenMgmtBusScArti_num2E) / PctMenMgmtBusScArti_denE, + CWD = ( + PctCrwdHH_num1E + PctCrwdHH_num2E + PctCrwdHH_num3E + PctCrwdHH_num4E + + PctCrwdHH_num5E + PctCrwdHH_num6E + ) / PctCrwdHH_denE, + POV = PctHHPov_numE / PctHHPov_denE, + FHH = (PctFemHeadKids_num1E + PctFemHeadKids_num2E) / PctFemHeadKids_denE, + PUB = PctPubAsst_numE / PctPubAsst_denE, + U30 = ( + PctHHUnder30K_num1E + PctHHUnder30K_num2E + PctHHUnder30K_num3E + PctHHUnder30K_num4E + + PctHHUnder30K_num5E + ) / PctHHUnder30K_denE, + EDU = PctEducLessThanHS_numE / PctEducLessThanHS_denE, + EMP = ( + PctUnemp_1619ME + PctUnemp_2021ME + + PctUnemp_2224ME + PctUnemp_2529ME + + PctUnemp_3034ME + PctUnemp_3544ME + + PctUnemp_4554ME + PctUnemp_5559ME + + PctUnemp_6061ME + PctUnemp_6264ME + + PctUnemp_6569ME + PctUnemp_7074ME + + PctUnemp_75upME + PctUnemp_1619FE + + PctUnemp_2021FE + PctUnemp_2224FE + + PctUnemp_2529FE + PctUnemp_3034FE + + PctUnemp_3544FE + PctUnemp_4554FE + + PctUnemp_5559FE + PctUnemp_6061FE + + PctUnemp_6264FE + PctUnemp_6569FE + + PctUnemp_7074FE + PctUnemp_75upFE + ) / PctUnemp_denE + ) } else { # Acquire NDI variables - ndi_data <- suppressMessages(suppressWarnings(tidycensus::get_acs(geography = geo, - year = year, - output = "wide", - variables = vars, ...))) - - if (geo == "tract") { + ndi_data <- suppressMessages(suppressWarnings( + tidycensus::get_acs( + geography = geo, + year = year, + output = 'wide', + variables = vars, + ... 
+ ) + )) + + if (geo == 'tract') { ndi_data <- ndi_data %>% - tidyr::separate(NAME, into = c("tract", "county", "state"), sep = ",") %>% - dplyr::mutate(tract = gsub("[^0-9\\.]","", tract)) + tidyr::separate(NAME, into = c('tract', 'county', 'state'), sep = ',') %>% + dplyr::mutate(tract = gsub('[^0-9\\.]', '', tract)) } else { - ndi_data <- ndi_data %>% tidyr::separate(NAME, into = c("county", "state"), sep = ",") + ndi_data <- + ndi_data %>% tidyr::separate(NAME, into = c('county', 'state'), sep = ',') } ndi_data <- ndi_data %>% - dplyr::mutate(OCC = (PctMenMgmtBusScArti_num1E + PctMenMgmtBusScArti_num2E) / PctMenMgmtBusScArti_denE, - CWD = (PctCrwdHH_num1E + PctCrwdHH_num2E + PctCrwdHH_num3E + - PctCrwdHH_num4E + PctCrwdHH_num5E + PctCrwdHH_num6E) / PctCrwdHH_denE, - POV = PctHHPov_numE / PctHHPov_denE, - FHH = (PctFemHeadKids_num1E + PctFemHeadKids_num2E) / PctFemHeadKids_denE, - PUB = PctPubAsst_numE / PctPubAsst_denE, - U30 = (PctHHUnder30K_num1E + PctHHUnder30K_num2E + PctHHUnder30K_num3E + - PctHHUnder30K_num4E + PctHHUnder30K_num5E) / PctHHUnder30K_denE, - EDU = PctEducLessThanHS_numE / PctEducLessThanHS_denE, - EMP = PctUnemp_numE / PctUnemp_denE) + dplyr::mutate( + OCC = (PctMenMgmtBusScArti_num1E + PctMenMgmtBusScArti_num2E) / PctMenMgmtBusScArti_denE, + CWD = ( + PctCrwdHH_num1E + PctCrwdHH_num2E + PctCrwdHH_num3E + PctCrwdHH_num4E + + PctCrwdHH_num5E + PctCrwdHH_num6E + ) / PctCrwdHH_denE, + POV = PctHHPov_numE / PctHHPov_denE, + FHH = (PctFemHeadKids_num1E + PctFemHeadKids_num2E) / PctFemHeadKids_denE, + PUB = PctPubAsst_numE / PctPubAsst_denE, + U30 = ( + PctHHUnder30K_num1E + PctHHUnder30K_num2E + PctHHUnder30K_num3E + PctHHUnder30K_num4E + + PctHHUnder30K_num5E + ) / PctHHUnder30K_denE, + EDU = PctEducLessThanHS_numE / PctEducLessThanHS_denE, + EMP = PctUnemp_numE / PctUnemp_denE + ) } # Generate NDI - ndi_data_pca <- ndi_data %>% + ndi_data_pca <- ndi_data %>% dplyr::select(OCC, CWD, POV, FHH, PUB, U30, EDU, EMP) } else { - # If inputing 
pre-formatted data: + # If inputing pre-formatted data: ndi_data <- dplyr::as_tibble(df) - ndi_data_pca <- df[ , -1] # omits the first feature (column) typically an ID (e.g., GEOID or FIPS) + # omit the first feature (column) typically an ID (e.g., GEOID or FIPS) + ndi_data_pca <- df[,-1] } # Replace infinite values as zero (typically because denominator is zero) - ndi_data_pca <- do.call(data.frame, - lapply(ndi_data_pca, - function(x) replace(x, is.infinite(x), 0))) + ndi_data_pca <- do.call( + data.frame, + lapply(ndi_data_pca, function(x) replace(x, is.infinite(x), 0)) + ) # Run principal component analysis - pca <- psych::principal(ndi_data_pca, - nfactors = 1, - n.obs = nrow(ndi_data_pca), - covar = FALSE, - scores = TRUE, - missing = imp) + pca <- psych::principal( + ndi_data_pca, + nfactors = 1, + n.obs = nrow(ndi_data_pca), + covar = FALSE, + scores = TRUE, + missing = imp + ) # Warning for missingness of census characteristics missingYN <- ndi_data_pca %>% - tidyr::pivot_longer(cols = dplyr::everything(), - names_to = "variable", - values_to = "val") %>% + tidyr::pivot_longer( + cols = dplyr::everything(), + names_to = 'variable', + values_to = 'val' + ) %>% dplyr::group_by(variable) %>% - dplyr::summarise(total = dplyr::n(), - n_missing = sum(is.na(val)), - percent_missing = paste0(round(mean(is.na(val)) * 100, 2), " %")) + dplyr::summarise( + total = dplyr::n(), + n_missing = sum(is.na(val)), + percent_missing = paste0(round(mean(is.na(val)) * 100, 2), ' %') + ) if (quiet == FALSE) { # Warning for missing census data - if (sum(missingYN$n_missing) > 0) { - message("Warning: Missing census data") + if (sum(missingYN$n_missing) > 0) { + message('Warning: Missing census data') } # Warning for proportion of variance explained by PC1 if (pca$Vaccounted[2] < 0.50) { - message("Warning: The proportion of variance explained by PC1 is less than 0.50.") + message('Warning: The proportion of variance explained by PC1 is less than 0.50.') } } # NDI quartiles 
NDIQuart <- data.frame(PC1 = pca$scores) %>% - dplyr::mutate(NDI = PC1 / pca$value[1]^2, - NDIQuart = cut(NDI, - breaks = stats::quantile(NDI, - probs = c(0, 0.25, 0.50, 0.75, 1), - na.rm = TRUE), - labels = c("1-Least deprivation", "2-BelowAvg deprivation", - "3-AboveAvg deprivation", "4-Most deprivation"), - include.lowest = TRUE), - NDIQuart = factor(replace(as.character(NDIQuart), - is.na(NDIQuart), - "9-NDI not avail"), - c(levels(NDIQuart), "9-NDI not avail"))) %>% + dplyr::mutate( + NDI = PC1 / pca$value[1] ^ 2, + NDIQuart = cut( + NDI, + breaks = stats::quantile(NDI, probs = c(0, 0.25, 0.50, 0.75, 1), na.rm = TRUE), + labels = c( + '1-Least deprivation', + '2-BelowAvg deprivation', + '3-AboveAvg deprivation', + '4-Most deprivation' + ), + include.lowest = TRUE + ), + NDIQuart = factor( + replace(as.character(NDIQuart), is.na(NDIQuart), '9-NDI not avail'), + c(levels(NDIQuart), '9-NDI not avail') + ) + ) %>% dplyr::select(NDI, NDIQuart) if (is.null(df)) { # Format output if (round_output == TRUE) { ndi <- cbind(ndi_data, NDIQuart) %>% - dplyr::mutate(OCC = round(OCC, digits = 1), - CWD = round(CWD, digits = 1), - POV = round(POV, digits = 1), - FHH = round(FHH, digits = 1), - PUB = round(PUB, digits = 1), - U30 = round(U30, digits = 1), - EDU = round(EDU, digits = 1), - EMP = round(EMP, digits = 1), - NDI = round(NDI, digits = 4)) + dplyr::mutate( + OCC = round(OCC, digits = 1), + CWD = round(CWD, digits = 1), + POV = round(POV, digits = 1), + FHH = round(FHH, digits = 1), + PUB = round(PUB, digits = 1), + U30 = round(U30, digits = 1), + EDU = round(EDU, digits = 1), + EMP = round(EMP, digits = 1), + NDI = round(NDI, digits = 4) + ) } else { ndi <- cbind(ndi_data, NDIQuart) } - if (geo == "tract") { + if (geo == 'tract') { ndi <- ndi %>% - dplyr::select(GEOID, - state, - county, - tract, - NDI, NDIQuart, - OCC, CWD, POV, FHH, PUB, U30, EDU, EMP) + dplyr::select( + GEOID, + state, + county, + tract, + NDI, + NDIQuart, + OCC, + CWD, + POV, + FHH, + PUB, + U30, 
+ EDU, + EMP + ) } else { ndi <- ndi %>% - dplyr::select(GEOID, - state, - county, - NDI, NDIQuart, - OCC, CWD, POV, FHH, PUB, U30, EDU, EMP) + dplyr::select( + GEOID, + state, + county, + NDI, + NDIQuart, + OCC, + CWD, + POV, + FHH, + PUB, + U30, + EDU, + EMP + ) } ndi <- ndi %>% - dplyr::mutate(state = stringr::str_trim(state), - county = stringr::str_trim(county)) %>% + dplyr::mutate( + state = stringr::str_trim(state), + county = stringr::str_trim(county) + ) %>% dplyr::arrange(GEOID) %>% dplyr::as_tibble() } else { - ndi <- cbind(df[ , 1], NDIQuart, df[ , 2:ncol(df)]) - ndi <- dplyr::as_tibble(ndi[order(ndi[ , 1]), ]) + ndi <- cbind(df[, 1], NDIQuart, df[, 2:ncol(df)]) + ndi <- dplyr::as_tibble(ndi[order(ndi[, 1]),]) } - out <- list(ndi = ndi, - pca = pca, - missing = missingYN) + out <- list(ndi = ndi, pca = pca, missing = missingYN) return(out) } diff --git a/R/ndi-package.R b/R/ndi-package.R new file mode 100644 index 0000000..efc4dc6 --- /dev/null +++ b/R/ndi-package.R @@ -0,0 +1,64 @@ +#' The ndi Package: Neighborhood Deprivation Indices +#' +#' Computes various metrics of socio-economic deprivation and disparity in the United States based on information available from the U.S. Census Bureau. +#' +#' @details The 'ndi' package computes various metrics of socio-economic deprivation and disparity in the United States. Some metrics are considered "spatial" because they consider the values of neighboring (i.e., adjacent) census geographies in their computation, while other metrics are "aspatial" because they only consider the value within each census geography. Two types of aspatial neighborhood deprivation indices (NDI) are available: (1) based on Messer et al. (2006) \doi{10.1007/s11524-006-9094-x} and (2) based on Andrews et al. (2020) \doi{10.1080/17445647.2020.1750066} and Slotman et al. (2022) \doi{10.1016/j.dib.2022.108002} who use variables chosen by Roux and Mair (2010) \doi{10.1111/j.1749-6632.2009.05333.x}. 
Both are a decomposition of multiple demographic characteristics from the U.S. Census Bureau American Community Survey 5-year estimates (ACS-5; 2006-2010 onward). Using data from the ACS-5 (2005-2009 onward), the package can also compute the (1) spatial Racial Isolation Index (RI) based on Anthopolos et al. (2011) \doi{10.1016/j.sste.2011.06.002}, (2) spatial Educational Isolation Index (EI) based on Bravo et al. (2021) \doi{10.3390/ijerph18179384}, (3) aspatial Index of Concentration at the Extremes (ICE) based on Feldman et al. (2015) \doi{10.1136/jech-2015-205728} and Krieger et al. (2016) \doi{10.2105/AJPH.2015.302955}, (4) aspatial racial/ethnic Dissimilarity Index based on Duncan & Duncan (1955) \doi{10.2307/2088328}, (5) aspatial income or racial/ethnic Atkinson Index based on Atkinson (1970) \doi{10.1016/0022-0531(70)90039-6}, (6) aspatial racial/ethnic Isolation Index (II) based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and Bell (1954) \doi{10.2307/2574118}, (7) aspatial racial/ethnic Correlation Ratio based on Bell (1954) \doi{10.2307/2574118} and White (1986) \doi{10.2307/3644339}, (8) aspatial racial/ethnic Location Quotient (LQ) based on Merton (1939) \doi{10.2307/2084686} and Sudano et al. (2013) \doi{10.1016/j.healthplace.2012.09.015}, (9) aspatial racial/ethnic Local Exposure and Isolation metric based on Bemanian & Beyer (2017) , and (10) aspatial racial/ethnic Delta based on Hoover (1941) and Duncan et al. (1961; LC:60007089). Also using data from the ACS-5 (2005-2009 onward), the package can retrieve the aspatial Gini Index based on Gini (1921) \doi{10.2307/2223319}. +#' +#' Key content of the 'ndi' package include:\cr +#' +#' \bold{Metrics of Socio-Economic Deprivation and Disparity} +#' +#' \code{\link{anthopolos}} Computes the spatial Racial Isolation Index (RI) based on Anthopolos (2011) \doi{10.1016/j.sste.2011.06.002}. 
+#' +#' \code{\link{atkinson}} Computes the aspatial income or racial/ethnic Atkinson Index (AI) based on Atkinson (1970) \doi{10.1016/0022-0531(70)90039-6}. +#' +#' \code{\link{bell}} Computes the aspatial racial/ethnic Isolation Index (II) based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and Bell (1954) \doi{10.2307/2574118}. +#' +#' \code{\link{bemanian_beyer}} Computes the aspatial racial/ethnic Local Exposure and Isolation (LEx/Is) metric based on Bemanian & Beyer (2017) \doi{10.1158/1055-9965.EPI-16-0926}. +#' +#' \code{\link{bravo}} Computes the spatial Educational Isolation Index (EI) based on Bravo et al. (2021) \doi{10.3390/ijerph18179384}. +#' +#' \code{\link{duncan}} Computes the aspatial racial/ethnic Dissimilarity Index (DI) based on Duncan & Duncan (1955) \doi{10.2307/2088328}. +#' +#' \code{\link{gini}} Retrieves the aspatial Gini Index based on Gini (1921) \doi{10.2307/2223319}. +#' +#' \code{\link{hoover}} Computes the aspatial racial/ethnic Delta (DEL) based on Hoover (1941) \doi{10.1017/S0022050700052980} and Duncan et al. (1961; LC:60007089). +#' +#' \code{\link{krieger}} Computes the aspatial Index of Concentration at the Extremes based on Feldman et al. (2015) \doi{10.1136/jech-2015-205728} and Krieger et al. (2016) \doi{10.2105/AJPH.2015.302955}. +#' +#' \code{\link{messer}} Computes the aspatial Neighborhood Deprivation Index (NDI) based on Messer et al. (2006) \doi{10.1007/s11524-006-9094-x}. +#' +#' \code{\link{powell_wiley}} Computes the aspatial Neighborhood Deprivation Index (NDI) based on Andrews et al. (2020) \doi{10.1080/17445647.2020.1750066} and Slotman et al. (2022) \doi{10.1016/j.dib.2022.108002} who use variables chosen by Roux and Mair (2010) \doi{10.1111/j.1749-6632.2009.05333.x}. +#' +#' \code{\link{sudano}} Computes the aspatial racial/ethnic Location Quotient (LQ) based on Merton (1939) \doi{10.2307/2084686} and Sudano et al. (2013) \doi{10.1016/j.healthplace.2012.09.015}.
+#' +#' \code{\link{white}} Computes the aspatial racial/ethnic Correlation Ratio (V) based on Bell (1954) \doi{10.2307/2574118} and White (1986) \doi{10.2307/3644339}. +#' +#' \bold{Pre-formatted U.S. Census Data} +#' +#' \code{\link{DCtracts2020}} A sample dataset containing information about U.S. Census American Community Survey 5-year estimate data for the District of Columbia census tracts (2020). The data are obtained from the \code{\link[tidycensus]{get_acs}} function and formatted for the \code{\link{messer}} and \code{\link{powell_wiley}} functions input. +#' +#' @name ndi-package +#' @aliases ndi-package ndi +#' +#' @section Dependencies: The 'ndi' package relies heavily upon \code{\link{tidycensus}} to retrieve data from the U.S. Census Bureau American Community Survey five-year estimates and the \code{\link{psych}} package for computing the neighborhood deprivation indices. The \code{\link{messer}} function builds upon code developed by Hruska et al. (2022) \doi{10.17605/OSF.IO/M2SAV} by functionalizing, adding the percent of households earning <$30,000 per year to the NDI computation, and providing the option for computing the ACS-5 2006-2010 NDI values. There is no code companion to compute NDI included in Andrews et al. (2020) \doi{10.1080/17445647.2020.1750066} or Slotman et al. (2022) \doi{10.1016/j.dib.2022.108002}, but the package author worked directly with the Slotman et al. (2022) \doi{10.1016/j.dib.2022.108002} authors to replicate their SAS code in R. The spatial metrics RI and EI rely on the \code{\link{sf}} and \code{\link{Matrix}} packages to compute the geospatial adjacency matrix between census geographies. The internal function to calculate AI is based on the \code{\link[DescTools]{Atkinson}} function. There is no code companion to compute RI, EI, DI, II, V, LQ, or LEx/Is included in Anthopolos et al. (2011) \doi{10.1016/j.sste.2011.06.002}, Bravo et al.
(2021) \doi{10.3390/ijerph18179384}, Duncan & Duncan (1955) \doi{10.2307/2088328}, Bell (1954) \doi{10.2307/2574118}, White (1986) \doi{10.2307/3644339}, Sudano et al. (2013) \doi{10.1016/j.healthplace.2012.09.015}, or Bemanian & Beyer (2017) \doi{10.1158/1055-9965.EPI-16-0926}, respectively. +#' +#' @author Ian D. Buller\cr \emph{Social & Scientific Systems, Inc., a DLH Corporation Holding Company, Bethesda, Maryland, USA (current); Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland, USA (original).} \cr +#' +#' Maintainer: I.D.B. \email{ian.buller@@alumni.emory.edu} +#' +#' @keywords internal +'_PACKAGE' + +#' @import dplyr +#' @importFrom car logit +#' @importFrom MASS ginv +#' @importFrom Matrix sparseMatrix +#' @importFrom psych alpha principal +#' @importFrom sf st_drop_geometry st_geometry st_intersects +#' @importFrom stats complete.cases cor cov2cor loadings median na.omit promax quantile sd setNames +#' @importFrom stringr str_trim +#' @importFrom tidycensus get_acs +#' @importFrom tidyr pivot_longer separate +#' @importFrom utils stack +NULL diff --git a/R/package.R b/R/package.R deleted file mode 100644 index 6113eef..0000000 --- a/R/package.R +++ /dev/null @@ -1,63 +0,0 @@ -#' The ndi Package: Neighborhood Deprivation Indices -#' -#' Computes various metrics of socio-economic deprivation and disparity in the United States based on information available from the U.S. Census Bureau. -#' -#' @details The 'ndi' package computes various metrics of socio-economic deprivation and disparity in the United States. Some metrics are considered "spatial" because they consider the values of neighboring (i.e., adjacent) census geographies in their computation, while other metrics are "aspatial" because they only consider the value within each census geography. 
Two types of aspatial neighborhood deprivation indices (NDI) are available: (1) based on Messer _et al._ (2006) \doi{10.1007/s11524-006-9094-x} and (2) based on Andrews _et al._ (2020) \doi{10.1080/17445647.2020.1750066} and Slotman _et al._ (2022) \doi{10.1016/j.dib.2022.108002} who use variables chosen by Roux and Mair (2010) \doi{10.1111/j.1749-6632.2009.05333.x}. Both are a decomposition of multiple demographic characteristics from the U.S. Census Bureau American Community Survey 5-year estimates (ACS-5; 2006-2010 onward). Using data from the ACS-5 (2005-2009 onward), the package can also compute the (1) spatial Racial Isolation Index (RI) based on Anthopolos _et al._ (2011) \doi{10.1016/j.sste.2011.06.002}, (2) spatial Educational Isolation Index (EI) based on Bravo _et al._ (2021) \doi{10.3390/ijerph18179384}, (3) aspatial Index of Concentration at the Extremes (ICE) based on Feldman _et al._ (2015) \doi{10.1136/jech-2015-205728} and Krieger _et al._ (2016) \doi{10.2105/AJPH.2015.302955}, (4) aspatial racial/ethnic Dissimilarity Index based on Duncan & Duncan (1955) \doi{10.2307/2088328}, (5) aspatial income or racial/ethnic Atkinson Index based on Atkinson (1970) \doi{10.1016/0022-0531(70)90039-6}, (6) aspatial racial/ethnic Isolation Index (II) based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and Bell (1954) \doi{10.2307/2574118}, (7) aspatial racial/ethnic Correlation Ratio based on Bell (1954) \doi{10.2307/2574118} and White (1986) \doi{10.2307/3644339}, and (8) aspatial racial/ethnic Location Quotient (LQ) based on Merton (1939) \doi{10.2307/2084686} and Sudano _et al._ (2013) \doi{10.1016/j.healthplace.2012.09.015}. Also using data from the ACS-5 (2005-2009 onward), the package can retrieve the aspatial Gini Index based on Gini (1921) \doi{10.2307/2223319}. 
-#' -#' Key content of the 'ndi' package include:\cr -#' -#' \bold{Metrics of Socio-Economic Deprivation and Disparity} -#' -#' \code{\link{anthopolos}} Computes the spatial Racial Isolation Index (RI) based on Anthopolos (2011) \doi{10.1016/j.sste.2011.06.002}. -#' -#' \code{\link{atkinson}} Computes the aspatial income or racial/ethnic Atkinson Index (AI) based on Atkinson (1970) \doi{10.1016/0022-0531(70)90039-6}. -#' -#' \code{\link{bell}} Computes the aspatial racial/ethnic Isolation Index (II) based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and Bell (1954) \doi{10.2307/2574118}. -#' -#' \code{\link{bemanian_beyer}} Computes the aspatial racial/ethnic Local Exposure and Isolation (LEx/Is) metric based on Bemanian & Beyer (2017) \doi{10.1158/1055-9965.EPI-16-0926}. -#' -#' \code{\link{bravo}} Computes the spatial Educational Isolation Index (EI) based on Bravo (2021) \doi{10.3390/ijerph18179384}. -#' -#' \code{\link{duncan}} Computes the aspatial racial/ethnic Dissimilarity Index (DI) based on Duncan & Duncan (1955) \doi{10.2307/2088328}. -#' -#' \code{\link{gini}} Retrieves the aspatial Gini Index based on Gini (1921) \doi{10.2307/2223319}. -#' -#' \code{\link{krieger}} Computes the aspatial Index of Concentration at the Extremes based on Feldman _et al._ (2015) \doi{10.1136/jech-2015-205728} and Krieger _et al._ (2016) \doi{10.2105/AJPH.2015.302955}. -#' -#' \code{\link{messer}} Computes the aspatial Neighborhood Deprivation Index (NDI) based on Messer _et al._ (2006) \doi{10.1007/s11524-006-9094-x}. -#' -#' \code{\link{powell_wiley}} Computes the aspatial Neighborhood Deprivation Index (NDI) based on Andrews _et al._ (2020) \doi{10.1080/17445647.2020.1750066} and Slotman _et al._ (2022) \doi{10.1016/j.dib.2022.108002} who use variables chosen by Roux and Mair (2010) \doi{10.1111/j.1749-6632.2009.05333.x}. 
-#' -#' \code{\link{sudano}} Computes the aspatial racial/ethnic Location Quotient (LQ) based on Merton (1939) \doi{10.2307/2084686} and Sudano _et al._ (2013) \doi{10.1016/j.healthplace.2012.09.015}. -#' -#' \code{\link{white}} Computes the aspatial racial/ethnic Correlation Ratio (V) based on Bell (1954) \doi{10.2307/2574118} and White (1986) \doi{10.2307/3644339}. -#' -#' \bold{Pre-formatted U.S. Census Data} -#' -#' \code{\link{DCtracts2020}} A sample dataset containing information about U.S. Census American Community Survey 5-year estimate data for the District of Columbia census tracts (2020). The data are obtained from the \code{\link[tidycensus]{get_acs}} function and formatted for the \code{\link{messer}} and \code{\link{powell_wiley}} functions input. -#' -#' @name ndi-package -#' @aliases ndi-package ndi -#' @docType package -#' -#' @section Dependencies: The 'ndi' package relies heavily upon \code{\link{tidycensus}} to retrieve data from the U.S. Census Bureau American Community Survey five-year estimates and the \code{\link{psych}} for computing the neighborhood deprivation indices. The \code{\link{messer}} function builds upon code developed by Hruska _et al._ (2022) \doi{10.17605/OSF.IO/M2SAV} by fictionalizing, adding the percent of households earning <$30,000 per year to the NDI computation, and providing the option for computing the ACS-5 2006-2010 NDI values. There is no code companion to compute NDI included in Andrews _et al._ (2020) \doi{10.1080/17445647.2020.1750066} or Slotman _et al._ (2022) \doi{10.1016/j.dib.2022.108002}, but the package author worked directly with the Slotman _et al._ (2022) \doi{10.1016/j.dib.2022.108002} authors to replicate their SAS code in R. The spatial metrics RI and EI rely on the \code{\link{sf}} and \code{\link{Matrix}} packages to compute the geospatial adjacency matrix between census geographies. Internal function to calculate AI is based on \code{\link[DescTools]{Atkinson}} function. 
There is no code companion to compute RI, EI, DI, II, V, LQ, or LEx/Is included in Anthopolos _et al._ (2011) \doi{10.1016/j.sste.2011.06.002}, Bravo _et al._ (2021) \doi{10.3390/ijerph18179384}, Duncan & Duncan (1955) \doi{10.2307/2088328}, Bell (1954) \doi{10.2307/2574118}, White (1986) \doi{10.2307/3644339}, Sudano _et al._ (2013) \doi{10.1016/j.healthplace.2012.09.015}, or Bemanian & Beyer (2017) \doi{10.1158/1055-9965.EPI-16-0926}, respectively. -#' -#' @author Ian D. Buller\cr \emph{Social & Scientific Systems, Inc., a division of DLH Corporation, Silver Spring, Maryland, USA (current); Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland, USA (original).} \cr -#' -#' Maintainer: I.D.B. \email{ian.buller@@alumni.emory.edu} -#' -#' @keywords package -NULL - -#' @import dplyr -#' @importFrom car logit -#' @importFrom MASS ginv -#' @importFrom Matrix sparseMatrix -#' @importFrom psych alpha principal -#' @importFrom sf st_drop_geometry st_geometry st_intersects -#' @importFrom stats complete.cases cor cov2cor loadings median na.omit promax quantile sd setNames -#' @importFrom stringr str_trim -#' @importFrom tidycensus get_acs -#' @importFrom tidyr pivot_longer separate -#' @importFrom utils stack -NULL diff --git a/R/powell_wiley.R b/R/powell_wiley.R index d765dc5..07acd46 100644 --- a/R/powell_wiley.R +++ b/R/powell_wiley.R @@ -1,17 +1,17 @@ -#' Neighborhood Deprivation Index based on Andrews _et al._ (2020) and Slotman _et al._ (2022) -#' +#' Neighborhood Deprivation Index based on Andrews et al. (2020) and Slotman et al. (2022) +#' #' Compute the aspatial Neighborhood Deprivation Index (Powell-Wiley). #' -#' @param geo Character string specifying the geography of the data either census tracts \code{geo = "tract"} (the default) or counties \code{geo = "county"}. 
+#' @param geo Character string specifying the geography of the data either census tracts \code{geo = 'tract'} (the default) or counties \code{geo = 'county'}. #' @param year Numeric. The year to compute the estimate. The default is 2020, and the years 2010 onward are currently available. -#' @param imp Logical. If TRUE, will impute missing census characteristics within the internal \code{\link[psych]{principal}} using median values of variables. If FALSE (the default), will not impute. +#' @param imp Logical. If TRUE, will impute missing census characteristics within the internal \code{\link[psych]{principal}} using median values of variables. If FALSE (the default), will not impute. #' @param quiet Logical. If TRUE, will display messages about potential missing census information, standardized Cronbach's alpha, and proportion of variance explained by principal component analysis. The default is FALSE. #' @param round_output Logical. If TRUE, will round the output of raw census and NDI values from the \code{\link[tidycensus]{get_acs}} at one and four significant digits, respectively. The default is FALSE. #' @param df Optional. Pass a pre-formatted \code{'dataframe'} or \code{'tibble'} with the desired variables through the function. Bypasses the data obtained by \code{\link[tidycensus]{get_acs}}. The default is NULL. See Details below. #' @param ... Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics #' -#' @details This function will compute the aspatial Neighborhood Deprivation Index (NDI) of U.S. census tracts or counties for a specified geographical referent (e.g., US-standardized) based on Andrews _et al._ (2020) \doi{10.1080/17445647.2020.1750066} and Slotman _et al._ (2022) \doi{10.1016/j.dib.2022.108002}. -#' +#' @details This function will compute the aspatial Neighborhood Deprivation Index (NDI) of U.S. 
census tracts or counties for a specified geographical referent (e.g., US-standardized) based on Andrews et al. (2020) \doi{10.1080/17445647.2020.1750066} and Slotman et al. (2022) \doi{10.1016/j.dib.2022.108002}. +#' #' The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for computation involving a factor analysis with the \code{\link[psych]{principal}} function. The yearly estimates are available in 2010 and after when all census characteristics became available. The thirteen characteristics chosen by Roux and Mair (2010) \doi{10.1111/j.1749-6632.2009.05333.x} are: #' \itemize{ #' \item **MedHHInc (B19013)**: median household income (dollars) @@ -28,25 +28,25 @@ #' \item **PctFamBelowPov (S1702)**: percent of families with incomes below the poverty level #' \item **PctUnempl (S2301)**: percent unemployed #' } -#' -#' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify the referent for standardizing the NDI (Powell-Wiley) values. For example, if all U.S. states are specified for the \code{state} argument, then the output would be a U.S.-standardized index. Please note: the NDI (Powell-Wiley) values will not exactly match (but will highly correlate with) those found in Andrews _et al._ (2020) \doi{10.1080/17445647.2020.1750066} and Slotman _et al._ (2022) \doi{10.1016/j.dib.2022.108002} because the two studies used a different statistical platform (i.e., SPSS and SAS, respectively) that intrinsically calculate the principal component analysis differently from R. -#' -#' The categorical NDI (Powell-Wiley) values are population-weighted quintiles of the continuous NDI (Powell-Wiley) values. -#' +#' +#' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify the referent for standardizing the NDI (Powell-Wiley) values. 
For example, if all U.S. states are specified for the \code{state} argument, then the output would be a U.S.-standardized index. Please note: the NDI (Powell-Wiley) values will not exactly match (but will highly correlate with) those found in Andrews et al. (2020) \doi{10.1080/17445647.2020.1750066} and Slotman et al. (2022) \doi{10.1016/j.dib.2022.108002} because the two studies used a different statistical platform (i.e., SPSS and SAS, respectively) that intrinsically calculate the principal component analysis differently from R. +#' +#' The categorical NDI (Powell-Wiley) values are population-weighted quintiles of the continuous NDI (Powell-Wiley) values. +#' #' Check if the proportion of variance explained by the first principal component is high (more than 0.5). -#' +#' #' Users can bypass \code{\link[tidycensus]{get_acs}} by specifying a pre-formatted data frame or tibble using the \code{df} argument. This function will compute an index using the first component of a principal component analysis (PCA) with a Promax (oblique) rotation and a minimum Eigenvalue of 1, omitting variables with absolute loading score < 0.4. The recommended structure of the data frame or tibble is an ID (e.g., GEOID) in the first feature (column), an estimate of the total population in the second feature (column), followed by the variables of interest (in any order) and no additional information (e.g., omit state or county names from the \code{df} argument input). -#' +#' #' @return An object of class 'list'. 
This is a named list with the following components: -#' +#' #' \describe{ #' \item{\code{ndi}}{An object of class 'tbl' for the GEOID, name, NDI continuous, NDI quintiles, and raw census values of specified census geographies.} #' \item{\code{pca}}{An object of class 'principal', returns the output of \code{\link[psych]{principal}} used to compute the NDI values.} #' \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute NDI.} #' \item{\code{cronbach}}{An object of class 'character' or 'numeric' for the results of the Cronbach's alpha calculation. If only one factor is computed, a message is returned. If more than one factor is computed, Cronbach's alpha is calculated and should check that it is >0.7 for respectable internal consistency between factors.} #' } -#' -#' @import dplyr +#' +#' @import dplyr #' @importFrom MASS ginv #' @importFrom psych alpha principal #' @importFrom stats complete.cases cor cov2cor loadings median promax quantile sd @@ -54,310 +54,417 @@ #' @importFrom tidycensus get_acs #' @importFrom tidyr pivot_longer separate #' @export -#' +#' #' @seealso \code{\link[tidycensus]{get_acs}} for additional arguments for geographic referent selection (i.e., \code{state} and \code{county}). #' #' @examples -#' +#' #' powell_wiley(df = DCtracts2020[ , -c(3:10)]) -#' +#' #' \dontrun{ #' # Wrapped in \dontrun{} because these examples require a Census API key. 
-#' +#' #' # Tract-level metric (2020) -#' powell_wiley(geo = "tract", state = "GA", year = 2020) +#' powell_wiley(geo = 'tract', state = 'GA', year = 2020) #' #' # Impute NDI for tracts (2020) with missing census information (median values) -#' powell_wiley(state = "tract", "GA", year = 2020, imp = TRUE) -#' +#' powell_wiley(geo = 'tract', state = 'GA', year = 2020, imp = TRUE) +#' #' # County-level metric (2020) -#' powell_wiley(geo = "county", state = "GA", year = 2020) -#' +#' powell_wiley(geo = 'county', state = 'GA', year = 2020) +#' #' } -#' -powell_wiley <- function(geo = "tract", year = 2020, imp = FALSE, quiet = FALSE, round_output = FALSE, df = NULL, ...) { - - # Check arguments - if (!is.null(df) & !inherits(df, c("tbl_df", "tbl", "data.frame"))) { stop("'df' must be class 'data.frame' or 'tbl'") } +#' +powell_wiley <- function(geo = 'tract', + year = 2020, + imp = FALSE, + quiet = FALSE, + round_output = FALSE, + df = NULL, + ...) { - if (is.null(df)) { - - # Check additional arguments - match.arg(geo, choices = c("county", "tract")) - stopifnot(is.numeric(year), year >= 2010) # all variables available 2010 onward - - # Select census variables - vars <- c(MedHHInc = "B19013_001", - PctRecvIDR_num = "B19054_002", PctRecvIDR_den = "B19054_001", - PctPubAsst_num = "B19058_002", PctPubAsst_den = "B19058_001", - MedHomeVal = "B25077_001", - PctMgmtBusScArti_num = "C24060_002", PctMgmtBusScArti_den = "C24060_001", - PctFemHeadKids_num1 = "B11005_007", PctFemHeadKids_num2 = "B11005_010", - PctFemHeadKids_den = "B11005_001", - PctOwnerOcc = "DP04_0046P", - PctNoPhone = "DP04_0075P", - PctNComPlmb = "DP04_0073P", - PctEduc_num25upHS = "S1501_C01_009", - PctEduc_num25upSC = "S1501_C01_010", - PctEduc_num25upAD = "S1501_C01_011", - PctEduc_num25upBD = "S1501_C01_012", - PctEduc_num25upGD = "S1501_C01_013", - PctEduc_den25up = "S1501_C01_006", - PctFamBelowPov = "S1702_C02_001", - PctUnempl = "S2301_C04_001", - TotalPopulation = "B01001_001") - - # Updated census
variable definition(s) - if (year < 2015){ vars <- c(vars[-13], PctNoPhone = "DP04_0074P") } - - # Acquire NDI variables - ndi_data <- suppressMessages(suppressWarnings(tidycensus::get_acs(geography = geo, - year = year, - output = "wide", - variables = vars, ...))) + # Check arguments + if (!is.null(df) & + !inherits(df, c('tbl_df', 'tbl', 'data.frame'))) { + stop("'df' must be class 'data.frame' or 'tbl'") + } - if (geo == "tract") { + if (is.null(df)) { + # Check additional arguments + match.arg(geo, choices = c('county', 'tract')) + stopifnot(is.numeric(year), year >= 2010) # all variables available 2010 onward + + # Select census variables + vars <- c( + MedHHInc = 'B19013_001', + PctRecvIDR_num = 'B19054_002', + PctRecvIDR_den = 'B19054_001', + PctPubAsst_num = 'B19058_002', + PctPubAsst_den = 'B19058_001', + MedHomeVal = 'B25077_001', + PctMgmtBusScArti_num = 'C24060_002', + PctMgmtBusScArti_den = 'C24060_001', + PctFemHeadKids_num1 = 'B11005_007', + PctFemHeadKids_num2 = 'B11005_010', + PctFemHeadKids_den = 'B11005_001', + PctOwnerOcc = 'DP04_0046P', + PctNoPhone = 'DP04_0075P', + PctNComPlmb = 'DP04_0073P', + PctEduc_num25upHS = 'S1501_C01_009', + PctEduc_num25upSC = 'S1501_C01_010', + PctEduc_num25upAD = 'S1501_C01_011', + PctEduc_num25upBD = 'S1501_C01_012', + PctEduc_num25upGD = 'S1501_C01_013', + PctEduc_den25up = 'S1501_C01_006', + PctFamBelowPov = 'S1702_C02_001', + PctUnempl = 'S2301_C04_001', + TotalPopulation = 'B01001_001' + ) + + # Updated census variable definition(s) + if (year < 2015) { + vars <- c(vars[-13], PctNoPhone = 'DP04_0074P') + } + + # Acquire NDI variables + ndi_data <- suppressMessages(suppressWarnings( + tidycensus::get_acs( + geography = geo, + year = year, + output = 'wide', + variables = vars, + ... 
+ ) + )) + + + if (geo == 'tract') { + ndi_data <- ndi_data %>% + tidyr::separate(NAME, into = c('tract', 'county', 'state'), sep = ',') %>% + dplyr::mutate(tract = gsub('[^0-9\\.]', '', tract)) + } else { + ndi_data <- ndi_data %>% + tidyr::separate(NAME, into = c('county', 'state'), sep = ',') + } + ndi_data <- ndi_data %>% - tidyr::separate(NAME, into = c("tract", "county", "state"), sep = ",") %>% - dplyr::mutate(tract = gsub("[^0-9\\.]","", tract)) + dplyr::mutate( + MedHHInc = MedHHIncE, + PctRecvIDR = PctRecvIDR_numE / PctRecvIDR_denE * 100, + PctPubAsst = PctPubAsst_numE / PctPubAsst_denE * 100, + MedHomeVal = MedHomeValE, + PctMgmtBusScArti = PctMgmtBusScArti_numE / PctMgmtBusScArti_denE * 100, + PctFemHeadKids = (PctFemHeadKids_num1E + PctFemHeadKids_num2E) / + PctFemHeadKids_denE * 100, + PctOwnerOcc = PctOwnerOccE, + PctNoPhone = PctNoPhoneE, + PctNComPlmb = PctNComPlmbE, + PctEducHSPlus = ( + PctEduc_num25upHSE + PctEduc_num25upSCE + PctEduc_num25upADE + PctEduc_num25upBDE + + PctEduc_num25upGDE + ) / PctEduc_den25upE * 100, + PctEducBchPlus = (PctEduc_num25upBDE + PctEduc_num25upGDE) / PctEduc_den25upE * 100, + PctFamBelowPov = PctFamBelowPovE, + PctUnempl = PctUnemplE, + TotalPop = TotalPopulationE + ) %>% + # Log transform median household income and median home value + # Reverse code percentages so that higher values represent more deprivation + # Round percentages to 1 decimal place + dplyr::mutate( + logMedHHInc = log(MedHHInc), + logMedHomeVal = log(MedHomeVal), + PctNoIDR = 100 - PctRecvIDR, + PctWorkClass = 100 - PctMgmtBusScArti, + PctNotOwnerOcc = 100 - PctOwnerOcc, + PctEducLTHS = 100 - PctEducHSPlus, + PctEducLTBch = 100 - PctEducBchPlus + ) %>% + # Z-standardize the percentages + dplyr::mutate( + PctNoIDRZ = scale(PctNoIDR), + PctPubAsstZ = scale(PctPubAsst), + PctWorkClassZ = scale(PctWorkClass), + PctFemHeadKidsZ = scale(PctFemHeadKids), + PctNotOwnerOccZ = scale(PctNotOwnerOcc), + PctNoPhoneZ = scale(PctNoPhone), + PctNComPlmbZ = 
scale(PctNComPlmb), + PctEducLTHSZ = scale(PctEducLTHS), + PctEducLTBchZ = scale(PctEducLTBch), + PctFamBelowPovZ = scale(PctFamBelowPov), + PctUnemplZ = scale(PctUnempl) + ) + + # generate NDI + ndi_data_pca <- ndi_data %>% + dplyr::select( + logMedHHInc, + PctNoIDRZ, + PctPubAsstZ, + logMedHomeVal, + PctWorkClassZ, + PctFemHeadKidsZ, + PctNotOwnerOccZ, + PctNoPhoneZ, + PctNComPlmbZ, + PctEducLTHSZ, + PctEducLTBchZ, + PctFamBelowPovZ, + PctUnemplZ + ) } else { - ndi_data <- ndi_data %>% tidyr::separate(NAME, into = c("county", "state"), sep = ",") + # If inputing pre-formatted data: + ## rename first and second features (columns) with name to match above + colnames(df)[1:2] <- c('GEOID', 'TotalPop') + ndi_data <- dplyr::as_tibble(df) + ## omit the first two features (columns) typically an ID (e.g., GEOID or FIPS) and TotalPop + ndi_data_pca <- ndi_data[,-c(1:2)] } + # Run a factor analysis using Promax (oblique) rotation and a minimum Eigenvalue of 1 + nfa <- eigen(stats::cor(ndi_data_pca, use = 'complete.obs')) + nfa <- sum(nfa$values > 1) # count of factors with a minimum Eigenvalue of 1 + fit <- psych::principal(ndi_data_pca, nfactors = nfa, rotate = 'none') + fit_rotate <- stats::promax(stats::loadings(fit), m = 3) - ndi_data <- ndi_data %>% - dplyr::mutate(MedHHInc = MedHHIncE, - PctRecvIDR = PctRecvIDR_numE / PctRecvIDR_denE * 100, - PctPubAsst = PctPubAsst_numE / PctPubAsst_denE * 100, - MedHomeVal = MedHomeValE, - PctMgmtBusScArti = PctMgmtBusScArti_numE / PctMgmtBusScArti_denE * 100, - PctFemHeadKids = (PctFemHeadKids_num1E + PctFemHeadKids_num2E) / PctFemHeadKids_denE * 100, - PctOwnerOcc = PctOwnerOccE, - PctNoPhone = PctNoPhoneE, - PctNComPlmb = PctNComPlmbE, - PctEducHSPlus = (PctEduc_num25upHSE + PctEduc_num25upSCE + PctEduc_num25upADE + - PctEduc_num25upBDE + PctEduc_num25upGDE) / PctEduc_den25upE * 100, - PctEducBchPlus = (PctEduc_num25upBDE + PctEduc_num25upGDE) / PctEduc_den25upE * 100, - PctFamBelowPov = PctFamBelowPovE, - PctUnempl = 
PctUnemplE, - TotalPop = TotalPopulationE) %>% - # Log transform median household income and median home value - # Reverse code percentages so that higher values represent more deprivation - # Round percentages to 1 decimal place - dplyr::mutate(logMedHHInc = log(MedHHInc), - logMedHomeVal = log(MedHomeVal), - PctNoIDR = 100 - PctRecvIDR, - PctWorkClass = 100 - PctMgmtBusScArti, - PctNotOwnerOcc = 100 - PctOwnerOcc, - PctEducLTHS = 100 - PctEducHSPlus, - PctEducLTBch = 100 - PctEducBchPlus) %>% - # Z-standardize the percentages - dplyr::mutate(PctNoIDRZ = scale(PctNoIDR), - PctPubAsstZ = scale(PctPubAsst), - PctWorkClassZ = scale(PctWorkClass), - PctFemHeadKidsZ = scale(PctFemHeadKids), - PctNotOwnerOccZ = scale(PctNotOwnerOcc), - PctNoPhoneZ = scale(PctNoPhone), - PctNComPlmbZ = scale(PctNComPlmb), - PctEducLTHSZ = scale(PctEducLTHS), - PctEducLTBchZ = scale(PctEducLTBch), - PctFamBelowPovZ = scale(PctFamBelowPov), - PctUnemplZ = scale(PctUnempl)) + # Calculate the factors using only variables with an absolute loading score > 0.4 for the first factor + ## If number of factors > 2, use structure matrix, else pattern matrix + if (nfa > 1) { + P_mat <- matrix(stats::loadings(fit_rotate), nrow = 13, ncol = nfa) + + # Structure matrix (S_mat) from under-the-hood of the psych::principal() function + rot.mat <- fit_rotate$rotmat # rotation matrix + ui <- solve(rot.mat) + Phi <- cov2cor(ui %*% t(ui)) # interfactor correlation + S_mat <- P_mat %*% Phi # pattern matrix multiplied by interfactor correlation + + } else { + P_mat <- matrix(fit_rotate, nrow = 13, ncol = 1) + Phi <- 1 + S_mat <- P_mat + } - # generate NDI - ndi_data_pca <- ndi_data %>% - dplyr::select(logMedHHInc, PctNoIDRZ, PctPubAsstZ, logMedHomeVal, PctWorkClassZ, - PctFemHeadKidsZ, PctNotOwnerOccZ, PctNoPhoneZ, PctNComPlmbZ, PctEducLTHSZ, - PctEducLTBchZ, PctFamBelowPovZ, PctUnemplZ) - } else { - # If inputing pre-formatted data: - colnames(df)[1:2] <- c("GEOID", "TotalPop") # rename first and second 
features (columns) with name to match above - ndi_data <- dplyr::as_tibble(df) - ndi_data_pca <- ndi_data[ , -c(1:2)] # omits the first two features (columns) typically an ID (e.g., GEOID or FIPS) and TotalPop - } - # Run a factor analysis using Promax (oblique) rotation and a minimum Eigenvalue of 1 - nfa <- eigen(stats::cor(ndi_data_pca, use = "complete.obs")) - nfa <- sum(nfa$values > 1) # count of factors with a minimum Eigenvalue of 1 - fit <- psych::principal(ndi_data_pca, - nfactors = nfa, - rotate = "none") - fit_rotate <- stats::promax(stats::loadings(fit), m = 3) - - # Calculate the factors using only variables with an absolute loading score > 0.4 for the first factor - ## If number of factors > 2, use structure matrix, else pattern matrix - if (nfa > 1) { - P_mat <- matrix(stats::loadings(fit_rotate), nrow = 13, ncol = nfa) + ## Variable correlation matrix (R_mat) + R_mat <- as.matrix(cor(ndi_data_pca[complete.cases(ndi_data_pca),])) - # Structure matrix (S_mat) from under-the-hood of the psych::principal() function - rot.mat <- fit_rotate$rotmat # rotation matrix - ui <- solve(rot.mat) - Phi <- cov2cor(ui %*% t(ui)) # interfactor correlation - S_mat <- P_mat %*% Phi # pattern matrix multiplied by interfactor correlation + ## standardized score coefficients or weight matrix (B_mat) + B_mat <- solve(R_mat, S_mat) - } else { - P_mat <- matrix(fit_rotate, nrow = 13, ncol = 1) - Phi <- 1 - S_mat <- P_mat - } - - ## Variable correlation matrix (R_mat) - R_mat <- as.matrix(cor(ndi_data_pca[complete.cases(ndi_data_pca), ])) - - ## standardized score coefficients or weight matrix (B_mat) - B_mat <- solve(R_mat, S_mat) - - # Additional PCA Information - fit_rotate$rotation <- "promax" - fit_rotate$Phi <- Phi - fit_rotate$Structure <- S_mat - - if (nfa > 1) { - fit_rotate$communality <- rowSums(P_mat^2) - } else { - fit_rotate$communality <- P_mat^2 - } - fit_rotate$uniqueness <- diag(R_mat) - fit_rotate$communality - - if (nfa > 1) { - vx <- colSums(P_mat^2) - } 
else { - vx <- sum(P_mat^2) - } - - vtotal <- sum(fit_rotate$communality + fit_rotate$uniqueness) - vx <- diag(Phi %*% t(P_mat) %*% P_mat) - names(vx) <- colnames(loadings) - varex <- rbind(`SS loadings` = vx) - varex <- rbind(varex, `Proportion Var` = vx/vtotal) - if (nfa > 1) { - varex <- rbind(varex, `Cumulative Var` = cumsum(vx/vtotal)) - varex <- rbind(varex, `Proportion Explained` = vx/sum(vx)) - varex <- rbind(varex, `Cumulative Proportion` = cumsum(vx/sum(vx))) - } - fit_rotate$Vaccounted <- varex - - if (imp == TRUE) { - ndi_data_scrs <- as.matrix(ndi_data_pca) - miss <- which(is.na(ndi_data_scrs), arr.ind = TRUE) - item.med <- apply(ndi_data_scrs, 2, stats::median, na.rm = TRUE) - ndi_data_scrs[miss] <- item.med[miss[, 2]] - } else { - ndi_data_scrs <- ndi_data_pca - } - - scrs <- as.matrix(scale(ndi_data_scrs[complete.cases(ndi_data_scrs), abs(S_mat[ , 1]) > 0.4 ])) %*% B_mat[abs(S_mat[ , 1]) > 0.4, 1] - - ndi_data_NA <- ndi_data[complete.cases(ndi_data_scrs), ] - ndi_data_NA$NDI <- c(scrs) - - ndi_data_NDI <- dplyr::left_join(ndi_data[ , c("GEOID", "TotalPop")], ndi_data_NA[ , c("GEOID", "NDI")], by = "GEOID") - - # Calculate Cronbach's alpha correlation coefficient among the factors and verify values are above 0.7. - if (nfa == 1) { - crnbch <- "Only one factor with minimum Eigenvalue of 1. Cannot calculate Cronbach's alpha." 
- } else { - cronbach <- suppressMessages(psych::alpha(ndi_data_pca[ , abs(S_mat[ , 1]) > 0.4 ], check.keys = TRUE, na.rm = TRUE, warnings = FALSE)) - crnbch <- cronbach$total$std.alpha - } - - # Warning for missingness of census characteristics - missingYN <- ndi_data_pca %>% - tidyr::pivot_longer(cols = dplyr::everything(), - names_to = "variable", - values_to = "val") %>% - dplyr::group_by(variable) %>% - dplyr::summarise(total = dplyr::n(), - n_missing = sum(is.na(val)), - percent_missing = paste0(round(mean(is.na(val)) * 100, 2), " %")) - - if (quiet == FALSE) { + # Additional PCA Information + fit_rotate$rotation <- 'promax' + fit_rotate$Phi <- Phi + fit_rotate$Structure <- S_mat - # Warning for missing census data - if (sum(missingYN$n_missing) > 0) { - message("Warning: Missing census data") + if (nfa > 1) { + fit_rotate$communality <- rowSums(P_mat ^ 2) + } else { + fit_rotate$communality <- P_mat ^ 2 } + fit_rotate$uniqueness <- diag(R_mat) - fit_rotate$communality - # Warning for Cronbach's alpha < 0.7 - if (cronbach$total$std.alpha < 0.7) { - message("Warning: Cronbach's alpha correlation coefficient among the factors is less than 0.7.") + if (nfa > 1) { + vx <- colSums(P_mat ^ 2) + } else { + vx <- sum(P_mat ^ 2) } - # Warning for proportion of variance explained by FA1 - if (fit_rotate$Vaccounted[2] < 0.50) { - message("Warning: The proportion of variance explained by PC1 is less than 0.50.") + vtotal <- sum(fit_rotate$communality + fit_rotate$uniqueness) + vx <- diag(Phi %*% t(P_mat) %*% P_mat) + names(vx) <- colnames(loadings) + varex <- rbind(`SS loadings` = vx) + varex <- rbind(varex, `Proportion Var` = vx / vtotal) + if (nfa > 1) { + varex <- rbind(varex, `Cumulative Var` = cumsum(vx / vtotal)) + varex <- rbind(varex, `Proportion Explained` = vx / sum(vx)) + varex <- rbind(varex, `Cumulative Proportion` = cumsum(vx / sum(vx))) } - } - - # NDI quintiles weighted by tract population - NDIQuint <- ndi_data_NDI %>% - dplyr::mutate(NDIQuint = 
cut(NDI*log(TotalPop), - breaks = stats::quantile(NDI*log(TotalPop), - probs = c(0, 0.2, 0.4, 0.6, 0.8, 1), - na.rm = TRUE), - labels = c("1-Least deprivation", "2-BelowAvg deprivation", - "3-Average deprivation","4-AboveAvg deprivation", - "5-Most deprivation"), - include.lowest = TRUE), - NDIQuint = factor(replace(as.character(NDIQuint), - is.na(NDIQuint) | is.infinite(NDIQuint), - "9-NDI not avail"), - c(levels(NDIQuint), "9-NDI not avail"))) %>% - dplyr::select(NDI, NDIQuint) - - if (is.null(df)) { - # Format output - if (round_output == TRUE) { - ndi <- cbind(ndi_data, NDIQuint) %>% - dplyr::mutate(PctRecvIDR = round(PctRecvIDR, digits = 1), - PctPubAsst = round(PctPubAsst, digits = 1), - PctMgmtBusScArti = round(PctMgmtBusScArti, digits = 1), - PctFemHeadKids = round(PctFemHeadKids, digits = 1), - PctOwnerOcc = round(PctOwnerOcc, digits = 1), - PctNoPhone = round(PctNoPhone, digits = 1), - PctNComPlmb = round(PctNComPlmb, digits = 1), - PctEducHSPlus = round(PctEducHSPlus, digits = 1), - PctEducBchPlus = round(PctEducBchPlus, digits = 1), - PctFamBelowPov = round(PctFamBelowPov, digits = 1), - PctUnempl = round(PctUnempl, digits = 1)) + fit_rotate$Vaccounted <- varex + + if (imp == TRUE) { + ndi_data_scrs <- as.matrix(ndi_data_pca) + miss <- which(is.na(ndi_data_scrs), arr.ind = TRUE) + item.med <- apply(ndi_data_scrs, 2, stats::median, na.rm = TRUE) + ndi_data_scrs[miss] <- item.med[miss[, 2]] } else { - ndi <- cbind(ndi_data, NDIQuint) + ndi_data_scrs <- ndi_data_pca } - if (geo == "tract") { - ndi <- ndi %>% - dplyr::select(GEOID, - state, - county, - tract, - NDI, NDIQuint, - MedHHInc, PctRecvIDR, PctPubAsst, MedHomeVal, PctMgmtBusScArti, - PctFemHeadKids,PctOwnerOcc, PctNoPhone, PctNComPlmb, PctEducHSPlus, - PctEducBchPlus, PctFamBelowPov, PctUnempl, TotalPop) + scrs <- as.matrix( + scale(ndi_data_scrs[complete.cases(ndi_data_scrs), abs(S_mat[, 1]) > 0.4]) + ) %*% B_mat[abs(S_mat[, 1]) > 0.4, 1] + + ndi_data_NA <- ndi_data[complete.cases(ndi_data_scrs),] 
+ ndi_data_NA$NDI <- c(scrs) + + ndi_data_NDI <- ndi_data[, c('GEOID', 'TotalPop')] %>% + dplyr::left_join(ndi_data_NA[, c('GEOID', 'NDI')], by = dplyr::join_by(GEOID)) + + # Calculate Cronbach's alpha correlation coefficient among the factors and verify values are above 0.7. + if (nfa == 1) { + crnbch <- + "Only one factor with minimum Eigenvalue of 1. Cannot calculate Cronbach's alpha." } else { + cronbach <- suppressMessages(psych::alpha( + ndi_data_pca[, abs(S_mat[, 1]) > 0.4], + check.keys = TRUE, + na.rm = TRUE, + warnings = FALSE + )) + crnbch <- cronbach$total$std.alpha + } + + # Warning for missingness of census characteristics + missingYN <- ndi_data_pca %>% + tidyr::pivot_longer( + cols = dplyr::everything(), + names_to = 'variable', + values_to = 'val' + ) %>% + dplyr::group_by(variable) %>% + dplyr::summarise( + total = dplyr::n(), + n_missing = sum(is.na(val)), + percent_missing = paste0(round(mean(is.na(val)) * 100, 2), ' %') + ) + + if (quiet == FALSE) { + # Warning for missing census data + if (sum(missingYN$n_missing) > 0) { + message('Warning: Missing census data') + } + + # Warning for Cronbach's alpha < 0.7 + if (cronbach$total$std.alpha < 0.7) { + message( + "Warning: Cronbach's alpha correlation coefficient among the factors is less than 0.7." 
+ ) + } + + # Warning for proportion of variance explained by FA1 + if (fit_rotate$Vaccounted[2] < 0.50) { + message('Warning: The proportion of variance explained by PC1 is less than 0.50.') + } + } + + # NDI quintiles weighted by tract population + NDIQuint <- ndi_data_NDI %>% + dplyr::mutate( + NDIQuint = cut( + NDI * log(TotalPop), + breaks = stats::quantile( + NDI * log(TotalPop), + probs = c(0, 0.2, 0.4, 0.6, 0.8, 1), + na.rm = TRUE + ), + labels = c( + '1-Least deprivation', + '2-BelowAvg deprivation', + '3-Average deprivation', + '4-AboveAvg deprivation', + '5-Most deprivation' + ), + include.lowest = TRUE + ), + NDIQuint = factor( + replace( + as.character(NDIQuint), + is.na(NDIQuint) | + is.infinite(NDIQuint), + '9-NDI not avail' + ), + c(levels(NDIQuint), '9-NDI not avail') + ) + ) %>% + dplyr::select(NDI, NDIQuint) + + if (is.null(df)) { + # Format output + if (round_output == TRUE) { + ndi <- cbind(ndi_data, NDIQuint) %>% + dplyr::mutate( + PctRecvIDR = round(PctRecvIDR, digits = 1), + PctPubAsst = round(PctPubAsst, digits = 1), + PctMgmtBusScArti = round(PctMgmtBusScArti, digits = 1), + PctFemHeadKids = round(PctFemHeadKids, digits = 1), + PctOwnerOcc = round(PctOwnerOcc, digits = 1), + PctNoPhone = round(PctNoPhone, digits = 1), + PctNComPlmb = round(PctNComPlmb, digits = 1), + PctEducHSPlus = round(PctEducHSPlus, digits = 1), + PctEducBchPlus = round(PctEducBchPlus, digits = 1), + PctFamBelowPov = round(PctFamBelowPov, digits = 1), + PctUnempl = round(PctUnempl, digits = 1) + ) + } else { + ndi <- cbind(ndi_data, NDIQuint) + } + + if (geo == 'tract') { + ndi <- ndi %>% + dplyr::select( + GEOID, + state, + county, + tract, + NDI, + NDIQuint, + MedHHInc, + PctRecvIDR, + PctPubAsst, + MedHomeVal, + PctMgmtBusScArti, + PctFemHeadKids, + PctOwnerOcc, + PctNoPhone, + PctNComPlmb, + PctEducHSPlus, + PctEducBchPlus, + PctFamBelowPov, + PctUnempl, + TotalPop + ) + } else { + ndi <- ndi %>% + dplyr::select( + GEOID, + state, + county, + NDI, + NDIQuint, + 
MedHHInc, + PctRecvIDR, + PctPubAsst, + MedHomeVal, + PctMgmtBusScArti, + PctFemHeadKids, + PctOwnerOcc, + PctNoPhone, + PctNComPlmb, + PctEducHSPlus, + PctEducBchPlus, + PctFamBelowPov, + PctUnempl, + TotalPop + ) + } + ndi <- ndi %>% - dplyr::select(GEOID, - state, - county, - NDI, NDIQuint, - MedHHInc, PctRecvIDR, PctPubAsst, MedHomeVal, PctMgmtBusScArti, - PctFemHeadKids,PctOwnerOcc, PctNoPhone, PctNComPlmb, PctEducHSPlus, - PctEducBchPlus, PctFamBelowPov, PctUnempl, TotalPop) + dplyr::mutate( + state = stringr::str_trim(state), + county = stringr::str_trim(county) + ) %>% + dplyr::arrange(GEOID) %>% + dplyr::as_tibble() + + } else { + ndi <- cbind(df[, 1], NDIQuint, df[, 2:ncol(df)]) + ndi <- dplyr::as_tibble(ndi[order(ndi[, 1]),]) } - ndi <- ndi %>% - dplyr::mutate(state = stringr::str_trim(state), - county = stringr::str_trim(county)) %>% - dplyr::arrange(GEOID) %>% - dplyr::as_tibble() + out <- list( + ndi = ndi, + pca = fit_rotate, + missing = missingYN, + cronbach = crnbch + ) - } else { - ndi <- cbind(df[ , 1], NDIQuint, df[ , 2:ncol(df)]) - ndi <- dplyr::as_tibble(ndi[order(ndi[ , 1]), ]) + return(out) } - - out <- list(ndi = ndi, - pca = fit_rotate, - missing = missingYN, - cronbach = crnbch) - - return(out) -} diff --git a/R/sudano.R b/R/sudano.R index cb1afca..5155ccc 100644 --- a/R/sudano.R +++ b/R/sudano.R @@ -1,55 +1,55 @@ -#' Location Quotient (LQ) based on Merton (1938) and Sudano _et al._ (2013) -#' +#' Location Quotient (LQ) based on Merton (1938) and Sudano et al. (2013) +#' #' Compute the aspatial Location Quotient (Sudano) of a selected racial/ethnic subgroup(s) and U.S. geographies. #' -#' @param geo_large Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = "county"}. -#' @param geo_small Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_large = "tract"}. 
+#' @param geo_large Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = 'county'}. +#' @param geo_small Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_small = 'tract'}. #' @param year Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available. #' @param subgroup Character string specifying the racial/ethnic subgroup(s). See Details for available choices. #' @param omit_NAs Logical. If FALSE, will compute index for a larger geographical unit only if all of its smaller geographical units have values. The default is TRUE. #' @param quiet Logical. If TRUE, will display messages about potential missing census information. The default is FALSE. #' @param ... Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics #' -#' @details This function will compute the aspatial Location Quotient (LQ) of selected racial/ethnic subgroups and U.S. geographies for a specified geographical extent (e.g., the entire U.S. or a single state) based on Merton (1939) \doi{10.2307/2084686} and Sudano _et al._ (2013) \doi{10.1016/j.healthplace.2012.09.015}. This function provides the computation of LQ for any of the U.S. Census Bureau race/ethnicity subgroups (including Hispanic and non-Hispanic individuals). -#' +#' @details This function will compute the aspatial Location Quotient (LQ) of selected racial/ethnic subgroups and U.S. geographies for a specified geographical extent (e.g., the entire U.S. or a single state) based on Merton (1939) \doi{10.2307/2084686} and Sudano et al. (2013) \doi{10.1016/j.healthplace.2012.09.015}. This function provides the computation of LQ for any of the U.S. Census Bureau race/ethnicity subgroups (including Hispanic and non-Hispanic individuals).
+#' #' The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the aspatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available but are available from other U.S. Census Bureau surveys. The twenty racial/ethnic subgroups (U.S. Census Bureau definitions) are: #' \itemize{ -#' \item **B03002_002**: not Hispanic or Latino \code{"NHoL"} -#' \item **B03002_003**: not Hispanic or Latino, white alone \code{"NHoLW"} -#' \item **B03002_004**: not Hispanic or Latino, Black or African American alone \code{"NHoLB"} -#' \item **B03002_005**: not Hispanic or Latino, American Indian and Alaska Native alone \code{"NHoLAIAN"} -#' \item **B03002_006**: not Hispanic or Latino, Asian alone \code{"NHoLA"} -#' \item **B03002_007**: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"NHoLNHOPI"} -#' \item **B03002_008**: not Hispanic or Latino, Some other race alone \code{"NHoLSOR"} -#' \item **B03002_009**: not Hispanic or Latino, Two or more races \code{"NHoLTOMR"} -#' \item **B03002_010**: not Hispanic or Latino, Two races including Some other race \code{"NHoLTRiSOR"} -#' \item **B03002_011**: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"NHoLTReSOR"} -#' \item **B03002_012**: Hispanic or Latino \code{"HoL"} -#' \item **B03002_013**: Hispanic or Latino, white alone \code{"HoLW"} -#' \item **B03002_014**: Hispanic or Latino, Black or African American alone \code{"HoLB"} -#' \item **B03002_015**: Hispanic or Latino, American Indian and Alaska Native alone \code{"HoLAIAN"} -#' \item **B03002_016**: Hispanic or Latino, Asian alone \code{"HoLA"} -#' \item **B03002_017**: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"HoLNHOPI"} -#' \item **B03002_018**: Hispanic or Latino, Some other race alone \code{"HoLSOR"} -#' \item **B03002_019**: Hispanic or 
Latino, Two or more races \code{"HoLTOMR"} -#' \item **B03002_020**: Hispanic or Latino, Two races including Some other race \code{"HoLTRiSOR"} -#' \item **B03002_021**: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"HoLTReSOR"} +#' \item **B03002_002**: not Hispanic or Latino \code{'NHoL'} +#' \item **B03002_003**: not Hispanic or Latino, white alone \code{'NHoLW'} +#' \item **B03002_004**: not Hispanic or Latino, Black or African American alone \code{'NHoLB'} +#' \item **B03002_005**: not Hispanic or Latino, American Indian and Alaska Native alone \code{'NHoLAIAN'} +#' \item **B03002_006**: not Hispanic or Latino, Asian alone \code{'NHoLA'} +#' \item **B03002_007**: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'NHoLNHOPI'} +#' \item **B03002_008**: not Hispanic or Latino, Some other race alone \code{'NHoLSOR'} +#' \item **B03002_009**: not Hispanic or Latino, Two or more races \code{'NHoLTOMR'} +#' \item **B03002_010**: not Hispanic or Latino, Two races including Some other race \code{'NHoLTRiSOR'} +#' \item **B03002_011**: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'NHoLTReSOR'} +#' \item **B03002_012**: Hispanic or Latino \code{'HoL'} +#' \item **B03002_013**: Hispanic or Latino, white alone \code{'HoLW'} +#' \item **B03002_014**: Hispanic or Latino, Black or African American alone \code{'HoLB'} +#' \item **B03002_015**: Hispanic or Latino, American Indian and Alaska Native alone \code{'HoLAIAN'} +#' \item **B03002_016**: Hispanic or Latino, Asian alone \code{'HoLA'} +#' \item **B03002_017**: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'HoLNHOPI'} +#' \item **B03002_018**: Hispanic or Latino, Some other race alone \code{'HoLSOR'} +#' \item **B03002_019**: Hispanic or Latino, Two or more races \code{'HoLTOMR'} +#' \item **B03002_020**: Hispanic or Latino, Two races including Some other race \code{'HoLTRiSOR'} 
+#' \item **B03002_021**: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'HoLTReSOR'} #' } -#' +#' #' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. -#' +#' #' LQ is some measure of relative racial homogeneity of each smaller geography within a larger geography. LQ can range in value from 0 to infinity because it is ratio of two proportions in which the numerator is the proportion of subgroup population in a smaller geography and the denominator is the proportion of subgroup population in its larger geography. For example, a smaller geography with an LQ of 5 means that the proportion of the subgroup population living in the smaller geography is five times the proportion of the subgroup population in its larger geography. -#' -#' Larger geographies available include state \code{geo_large = "state"}, county \code{geo_large = "county"}, and census tract \code{geo_large = "tract"} levels. Smaller geographies available include, county \code{geo_small = "county"}, census tract \code{geo_small = "tract"}, and census block group \code{geo_small = "block group"} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the LQ value returned is NA. -#' +#' +#' Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, and census tract \code{geo_large = 'tract'} levels. Smaller geographies available include, county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the LQ value returned is NA. +#' #' @return An object of class 'list'. 
This is a named list with the following components: -#' +#' #' \describe{ #' \item{\code{lq}}{An object of class 'tbl' for the GEOID, name, and LQ at specified smaller census geographies.} #' \item{\code{lq_data}}{An object of class 'tbl' for the raw census values at specified smaller census geographies.} #' \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute LQ.} #' } -#' +#' #' @import dplyr #' @importFrom sf st_drop_geometry #' @importFrom stats complete.cases @@ -57,111 +57,162 @@ #' @importFrom tidyr pivot_longer separate #' @importFrom utils stack #' @export -#' +#' #' @seealso \code{\link[tidycensus]{get_acs}} for additional arguments for geographic extent selection (i.e., \code{state} and \code{county}). #' #' @examples #' \dontrun{ #' # Wrapped in \dontrun{} because these examples require a Census API key. -#' +#' #' # Isolation of non-Hispanic Black populations #' ## of census tracts within Georgia, U.S.A., counties (2020) -#' sudano(geo_large = "state", geo_small = "county", state = "GA", -#' year = 2020, subgroup = "NHoLB") -#' +#' sudano( +#' geo_large = 'state', +#' geo_small = 'county', +#' state = 'GA', +#' year = 2020, +#' subgroup = 'NHoLB' +#' ) +#' #' } -#' -sudano <- function(geo_large = "county", geo_small = "tract", year = 2020, subgroup, omit_NAs = TRUE, quiet = FALSE, ...) { +#' +sudano <- function(geo_large = 'county', + geo_small = 'tract', + year = 2020, + subgroup, + omit_NAs = TRUE, + quiet = FALSE, + ...) 
{ # Check arguments - match.arg(geo_large, choices = c("state", "county", "tract")) - match.arg(geo_small, choices = c("county", "tract", "block group")) + match.arg(geo_large, choices = c('state', 'county', 'tract')) + match.arg(geo_small, choices = c('county', 'tract', 'block group')) stopifnot(is.numeric(year), year >= 2009) # all variables available 2009 onward - match.arg(subgroup, several.ok = TRUE, - choices = c("NHoL", "NHoLW", "NHoLB", "NHoLAIAN", "NHoLA", "NHoLNHOPI", - "NHoLSOR", "NHoLTOMR", "NHoLTRiSOR", "NHoLTReSOR", - "HoL", "HoLW", "HoLB", "HoLAIAN", "HoLA", "HoLNHOPI", - "HoLSOR", "HoLTOMR", "HoLTRiSOR", "HoLTReSOR")) + match.arg( + subgroup, + several.ok = TRUE, + choices = c( + 'NHoL', + 'NHoLW', + 'NHoLB', + 'NHoLAIAN', + 'NHoLA', + 'NHoLNHOPI', + 'NHoLSOR', + 'NHoLTOMR', + 'NHoLTRiSOR', + 'NHoLTReSOR', + 'HoL', + 'HoLW', + 'HoLB', + 'HoLAIAN', + 'HoLA', + 'HoLNHOPI', + 'HoLSOR', + 'HoLTOMR', + 'HoLTRiSOR', + 'HoLTReSOR' + ) + ) # Select census variables - vars <- c(TotalPop = "B03002_001", - NHoL = "B03002_002", - NHoLW = "B03002_003", - NHoLB = "B03002_004", - NHoLAIAN = "B03002_005", - NHoLA = "B03002_006", - NHoLNHOPI = "B03002_007", - NHoLSOR = "B03002_008", - NHoLTOMR = "B03002_009", - NHoLTRiSOR = "B03002_010", - NHoLTReSOR = "B03002_011", - HoL = "B03002_012", - HoLW = "B03002_013", - HoLB = "B03002_014", - HoLAIAN = "B03002_015", - HoLA = "B03002_016", - HoLNHOPI = "B03002_017", - HoLSOR = "B03002_018", - HoLTOMR = "B03002_019", - HoLTRiSOR = "B03002_020", - HoLTReSOR = "B03002_021") - - selected_vars <- vars[c("TotalPop", subgroup)] + vars <- c( + TotalPop = 'B03002_001', + NHoL = 'B03002_002', + NHoLW = 'B03002_003', + NHoLB = 'B03002_004', + NHoLAIAN = 'B03002_005', + NHoLA = 'B03002_006', + NHoLNHOPI = 'B03002_007', + NHoLSOR = 'B03002_008', + NHoLTOMR = 'B03002_009', + NHoLTRiSOR = 'B03002_010', + NHoLTReSOR = 'B03002_011', + HoL = 'B03002_012', + HoLW = 'B03002_013', + HoLB = 'B03002_014', + HoLAIAN = 'B03002_015', + HoLA = 
'B03002_016', + HoLNHOPI = 'B03002_017', + HoLSOR = 'B03002_018', + HoLTOMR = 'B03002_019', + HoLTRiSOR = 'B03002_020', + HoLTReSOR = 'B03002_021' + ) + + selected_vars <- vars[c('TotalPop', subgroup)] out_names <- names(selected_vars) # save for output - in_subgroup <- paste(subgroup, "E", sep = "") + in_subgroup <- paste(subgroup, 'E', sep = '') # Acquire LQ variables and sf geometries - lq_data <- suppressMessages(suppressWarnings(tidycensus::get_acs(geography = geo_small, - year = year, - output = "wide", - variables = selected_vars, - geometry = TRUE, - keep_geo_vars = TRUE, ...))) + lq_data <- suppressMessages(suppressWarnings( + tidycensus::get_acs( + geography = geo_small, + year = year, + output = 'wide', + variables = selected_vars, + geometry = TRUE, + keep_geo_vars = TRUE, + ... + ) + )) + # Format output - if (geo_small == "county") { - lq_data <- sf::st_drop_geometry(lq_data) %>% - tidyr::separate(NAME.y, into = c("county", "state"), sep = ",") + if (geo_small == 'county') { + lq_data <- lq_data %>% + sf::st_drop_geometry() %>% + tidyr::separate(NAME.y, into = c('county', 'state'), sep = ',') + } + if (geo_small == 'tract') { + lq_data <- lq_data %>% + sf::st_drop_geometry() %>% + tidyr::separate(NAME.y, into = c('tract', 'county', 'state'), sep = ',') %>% + dplyr::mutate(tract = gsub('[^0-9\\.]', '', tract)) + } + if (geo_small == 'block group') { + lq_data <- lq_data %>% + sf::st_drop_geometry() %>% + tidyr::separate(NAME.y, into = c('block.group', 'tract', 'county', 'state'), sep = ',') %>% + dplyr::mutate( + tract = gsub('[^0-9\\.]', '', tract), + block.group = gsub('[^0-9\\.]', '', block.group) + ) } - if (geo_small == "tract") { - lq_data <- sf::st_drop_geometry(lq_data) %>% - tidyr::separate(NAME.y, into = c("tract", "county", "state"), sep = ",") %>% - dplyr::mutate(tract = gsub("[^0-9\\.]", "", tract)) - } - if (geo_small == "block group") { - lq_data <- sf::st_drop_geometry(lq_data) %>% - tidyr::separate(NAME.y, into = c("block.group", 
"tract", "county", "state"), sep = ",") %>% - dplyr::mutate(tract = gsub("[^0-9\\.]", "", tract), - block.group = gsub("[^0-9\\.]", "", block.group)) - } # Grouping IDs for R computation - if (geo_large == "tract") { + if (geo_large == 'tract') { lq_data <- lq_data %>% - dplyr::mutate(oid = paste(.$STATEFP, .$COUNTYFP, .$TRACTCE, sep = ""), - state = stringr::str_trim(state), - county = stringr::str_trim(county)) + dplyr::mutate( + oid = paste(.$STATEFP, .$COUNTYFP, .$TRACTCE, sep = ''), + state = stringr::str_trim(state), + county = stringr::str_trim(county) + ) } - if (geo_large == "county") { + if (geo_large == 'county') { lq_data <- lq_data %>% - dplyr::mutate(oid = paste(.$STATEFP, .$COUNTYFP, sep = ""), - state = stringr::str_trim(state), - county = stringr::str_trim(county)) + dplyr::mutate( + oid = paste(.$STATEFP, .$COUNTYFP, sep = ''), + state = stringr::str_trim(state), + county = stringr::str_trim(county) + ) } - if (geo_large == "state") { + if (geo_large == 'state') { lq_data <- lq_data %>% - dplyr::mutate(oid = .$STATEFP, - state = stringr::str_trim(state)) + dplyr::mutate( + oid = .$STATEFP, + state = stringr::str_trim(state) + ) } # Count of racial/ethnic subgroup populations ## Count of racial/ethnic comparison subgroup population if (length(in_subgroup) == 1) { lq_data <- lq_data %>% - dplyr::mutate(subgroup = .[ , in_subgroup]) + dplyr::mutate(subgroup = .[, in_subgroup]) } else { lq_data <- lq_data %>% - dplyr::mutate(subgroup = rowSums(.[ , in_subgroup])) + dplyr::mutate(subgroup = rowSums(.[, in_subgroup])) } # Compute LQ @@ -174,60 +225,63 @@ sudano <- function(geo_large = "county", geo_small = "tract", year = 2020, subgr LQtmp <- lq_data %>% split(., f = list(lq_data$oid)) %>% lapply(., FUN = lq_fun, omit_NAs = omit_NAs) %>% - do.call("rbind", .) + do.call('rbind', .) 
# Warning for missingness of census characteristics - missingYN <- lq_data[ , c("TotalPopE", in_subgroup)] + missingYN <- lq_data[, c('TotalPopE', in_subgroup)] names(missingYN) <- out_names missingYN <- missingYN %>% - tidyr::pivot_longer(cols = dplyr::everything(), - names_to = "variable", - values_to = "val") %>% + tidyr::pivot_longer( + cols = dplyr::everything(), + names_to = 'variable', + values_to = 'val' + ) %>% dplyr::group_by(variable) %>% - dplyr::summarise(total = dplyr::n(), - n_missing = sum(is.na(val)), - percent_missing = paste0(round(mean(is.na(val)) * 100, 2), " %")) + dplyr::summarise( + total = dplyr::n(), + n_missing = sum(is.na(val)), + percent_missing = paste0(round(mean(is.na(val)) * 100, 2), ' %') + ) if (quiet == FALSE) { # Warning for missing census data if (sum(missingYN$n_missing) > 0) { - message("Warning: Missing census data") + message('Warning: Missing census data') } } # Format output - lq <- merge(lq_data, LQtmp) + lq <- lq_data %>% + dplyr::left_join(LQtmp, by = dplyr::join_by(GEOID)) - if (geo_small == "state") { + if (geo_small == 'state') { lq <- lq %>% dplyr::select(GEOID, state, LQ) } - if (geo_small == "county") { + if (geo_small == 'county') { lq <- lq %>% dplyr::select(GEOID, state, county, LQ) } - if (geo_small == "tract") { + if (geo_small == 'tract') { lq <- lq %>% dplyr::select(GEOID, state, county, tract, LQ) } - if (geo_small == "block group") { + if (geo_small == 'block group') { lq <- lq %>% dplyr::select(GEOID, state, county, tract, block.group, LQ) } lq <- lq %>% unique(.) 
%>% - .[.$GEOID != "NANA", ] %>% + .[.$GEOID != 'NANA',] %>% dplyr::arrange(GEOID) %>% dplyr::as_tibble() lq_data <- lq_data %>% dplyr::arrange(GEOID) %>% - dplyr::as_tibble() + dplyr::as_tibble() - out <- list(lq = lq, - lq_data = lq_data, - missing = missingYN) + out <- list(lq = lq, lq_data = lq_data, missing = missingYN) return(out) } diff --git a/R/utils.R b/R/utils.R index a7e1553..5a37b7d 100644 --- a/R/utils.R +++ b/R/utils.R @@ -1,12 +1,17 @@ # Internal function for the Dissimilarity Index (Duncan & Duncan 1955) ## Returns NA value if only one smaller geography in a larger geography di_fun <- function(x, omit_NAs) { - xx <- x[ , c("subgroup", "subgroup_ref")] + xx <- x[ , c('subgroup', 'subgroup_ref')] if (omit_NAs == TRUE) { xx <- xx[stats::complete.cases(xx), ] } if (nrow(x) < 2 || any(xx < 0) || any(is.na(xx))) { NA } else { - 1/2 * sum(abs(xx$subgroup / sum(xx$subgroup, na.rm = TRUE) - xx$subgroup_ref / sum(xx$subgroup_ref, na.rm = TRUE))) + 0.5 * sum( + abs( + xx$subgroup / sum(xx$subgroup, na.rm = TRUE) - + xx$subgroup_ref / sum(xx$subgroup_ref, na.rm = TRUE) + ), + na.rm = TRUE) } } @@ -34,24 +39,30 @@ ai_fun <- function(x, epsilon, omit_NAs) { # Internal function for the aspatial Racial Isolation Index (Bell 1954) ## Returns NA value if only one smaller geography in a larger geography ii_fun <- function(x, omit_NAs) { - xx <- x[ , c("TotalPopE", "subgroup", "subgroup_ixn")] + xx <- x[ , c('TotalPopE', 'subgroup', 'subgroup_ixn')] if (omit_NAs == TRUE) { xx <- xx[stats::complete.cases(xx), ] } if (nrow(x) < 2 || any(xx < 0) || any(is.na(xx))) { NA } else { - sum((xx$subgroup / sum(xx$subgroup, na.rm = TRUE)) * (xx$subgroup_ixn / xx$TotalPopE)) + sum( + (xx$subgroup / sum(xx$subgroup, na.rm = TRUE)) * (xx$subgroup_ixn / xx$TotalPopE), + na.rm = TRUE + ) } } # Internal function for the aspatial Correlation Ratio (White 1986) ## Returns NA value if only one smaller geography in a larger geography v_fun <- function(x, omit_NAs) { - xx <- x[ , 
c("TotalPopE", "subgroup")] + xx <- x[ , c('TotalPopE', 'subgroup')] if (omit_NAs == TRUE) { xx <- xx[stats::complete.cases(xx), ] } if (nrow(x) < 2 || any(xx < 0) || any(is.na(xx))) { NA } else { - xxx <- sum((xx$subgroup / sum(xx$subgroup, na.rm = TRUE)) * (xx$subgroup / xx$TotalPopE)) + xxx <- sum( + (xx$subgroup / sum(xx$subgroup, na.rm = TRUE)) * (xx$subgroup / xx$TotalPopE), + na.rm = TRUE + ) px <- sum(xx$subgroup, na.rm = TRUE) / sum(xx$TotalPopE, na.rm = TRUE) (xxx - px) / (1 - px) } @@ -60,12 +71,14 @@ v_fun <- function(x, omit_NAs) { # Internal function for the aspatial Location Quotient (Sudano et al. 2013) ## Returns NA value if only one smaller geography in a larger geography lq_fun <- function(x, omit_NAs) { - xx <- x[ , c("TotalPopE", "subgroup", "GEOID")] + xx <- x[ , c('TotalPopE', 'subgroup', 'GEOID')] if (omit_NAs == TRUE) { xx <- xx[stats::complete.cases(xx), ] } if (nrow(x) < 2 || any(xx < 0) || any(is.na(xx))) { NA } else { - LQ <- (xx$subgroup / xx$TotalPopE) / (sum(xx$subgroup, na.rm = TRUE) / sum(xx$TotalPopE, na.rm = TRUE)) + p_im <- xx$subgroup / xx$TotalPopE + if (anyNA(p_im)) { p_im[is.na(p_im)] <- 0 } + LQ <- p_im / (sum(xx$subgroup, na.rm = TRUE) / sum(xx$TotalPopE, na.rm = TRUE)) df <- data.frame(LQ = LQ, GEOID = xx$GEOID) return(df) } @@ -74,13 +87,15 @@ lq_fun <- function(x, omit_NAs) { # Internal function for the aspatial Local Exposure & Isolation (Bemanian & Beyer 2017) metric ## Returns NA value if only one smaller geography in a larger geography lexis_fun <- function(x, omit_NAs) { - xx <- x[ , c("TotalPopE", "subgroup", "subgroup_ixn", "GEOID")] + xx <- x[ , c('TotalPopE', 'subgroup', 'subgroup_ixn', 'GEOID')] if (omit_NAs == TRUE) { xx <- xx[stats::complete.cases(xx), ] } if (nrow(x) < 2 || any(xx < 0) || any(is.na(xx))) { NA } else { p_im <- xx$subgroup / xx$TotalPopE + if (anyNA(p_im)) { p_im[is.na(p_im)] <- 0 } p_in <- xx$subgroup_ixn / xx$TotalPopE + if (anyNA(p_in)) { p_in[is.na(p_in)] <- 0 } P_m <- 
sum(xx$subgroup, na.rm = TRUE) / sum(xx$TotalPopE, na.rm = TRUE) P_n <- sum(xx$subgroup_ixn, na.rm = TRUE) / sum(xx$TotalPopE, na.rm = TRUE) LExIs <- car::logit(p_im * p_in) - car::logit(P_m * P_n) @@ -88,3 +103,19 @@ lexis_fun <- function(x, omit_NAs) { return(df) } } + +# Internal function for the aspatial Delta (Hoover 1941) +## Returns NA value if only one smaller geography in a larger geography +del_fun <- function(x, omit_NAs) { + xx <- x[ , c('subgroup', 'ALAND')] + if (omit_NAs == TRUE) { xx <- xx[stats::complete.cases(xx), ] } + if (nrow(x) < 2 || any(xx < 0) || any(is.na(xx))) { + NA + } else { + 0.5 * sum( + abs((xx$subgroup / sum(xx$subgroup, na.rm = TRUE)) - (xx$ALAND / sum(xx$ALAND, na.rm = TRUE)) + ), + na.rm = TRUE + ) + } +} diff --git a/R/white.R b/R/white.R index b5a3505..04f4208 100644 --- a/R/white.R +++ b/R/white.R @@ -2,8 +2,8 @@ #' #' Compute the aspatial Correlation Ratio (White) of a selected racial/ethnic subgroup(s) and U.S. geographies. #' -#' @param geo_large Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = "county"}. -#' @param geo_small Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_large = "tract"}. +#' @param geo_large Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = 'county'}. +#' @param geo_small Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_small = 'tract'}. #' @param year Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available. #' @param subgroup Character string specifying the racial/ethnic subgroup(s). See Details for available choices. #' @param omit_NAs Logical. If FALSE, will compute index for a larger geographical unit only if all of its smaller geographical units have values. The default is TRUE. 
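The `na.rm = TRUE` argument added to the sum inside `v_fun()` in the `R/utils.R` hunks can be exercised on toy data. The following is a minimal sketch, not the package's implementation: the helper name `v_sketch()` and the toy values are hypothetical, and only the column names `TotalPopE` and `subgroup` and the arithmetic are taken from the diff. It shows how a smaller geography with zero total population produces `0 / 0 = NaN`, and how `na.rm = TRUE` keeps that single unit from turning the whole index into `NaN`.

```r
# Toy data (hypothetical): four census tracts in one county; the last tract
# has zero total population, so its subgroup proportion is 0 / 0 = NaN.
toy <- data.frame(
  TotalPopE = c(1000, 1000, 1000, 0),
  subgroup  = c(100, 500, 900, 0)
)

# Sketch of the Correlation Ratio (V), following the arithmetic of v_fun():
# isolation of the subgroup (xPx), adjusted for the subgroup's overall share (P).
# The na.rm argument is exposed here only to compare both behaviours.
v_sketch <- function(x, na.rm = TRUE) {
  xPx <- sum(
    (x$subgroup / sum(x$subgroup, na.rm = TRUE)) * (x$subgroup / x$TotalPopE),
    na.rm = na.rm
  )
  P <- sum(x$subgroup, na.rm = TRUE) / sum(x$TotalPopE, na.rm = TRUE)
  (xPx - P) / (1 - P)
}

v_with_fix <- v_sketch(toy, na.rm = TRUE)   # the empty tract is skipped
v_without  <- v_sketch(toy, na.rm = FALSE)  # NaN propagates through the sum

print(v_with_fix)
print(v_without)
```

With these toy values the populated tracts alone determine V (about 0.427), while the variant without `na.rm` returns `NaN`.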
@@ -14,33 +14,33 @@ #' #' The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the aspatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available but are available from other U.S. Census Bureau surveys. The twenty racial/ethnic subgroups (U.S. Census Bureau definitions) are: #' \itemize{ -#' \item **B03002_002**: not Hispanic or Latino \code{"NHoL"} -#' \item **B03002_003**: not Hispanic or Latino, white alone \code{"NHoLW"} -#' \item **B03002_004**: not Hispanic or Latino, Black or African American alone \code{"NHoLB"} -#' \item **B03002_005**: not Hispanic or Latino, American Indian and Alaska Native alone \code{"NHoLAIAN"} -#' \item **B03002_006**: not Hispanic or Latino, Asian alone \code{"NHoLA"} -#' \item **B03002_007**: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"NHoLNHOPI"} -#' \item **B03002_008**: not Hispanic or Latino, Some other race alone \code{"NHoLSOR"} -#' \item **B03002_009**: not Hispanic or Latino, Two or more races \code{"NHoLTOMR"} -#' \item **B03002_010**: not Hispanic or Latino, Two races including Some other race \code{"NHoLTRiSOR"} -#' \item **B03002_011**: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"NHoLTReSOR"} -#' \item **B03002_012**: Hispanic or Latino \code{"HoL"} -#' \item **B03002_013**: Hispanic or Latino, white alone \code{"HoLW"} -#' \item **B03002_014**: Hispanic or Latino, Black or African American alone \code{"HoLB"} -#' \item **B03002_015**: Hispanic or Latino, American Indian and Alaska Native alone \code{"HoLAIAN"} -#' \item **B03002_016**: Hispanic or Latino, Asian alone \code{"HoLA"} -#' \item **B03002_017**: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"HoLNHOPI"} -#' \item **B03002_018**: Hispanic or Latino, Some other race alone \code{"HoLSOR"} -#' \item **B03002_019**: 
Hispanic or Latino, Two or more races \code{"HoLTOMR"} -#' \item **B03002_020**: Hispanic or Latino, Two races including Some other race \code{"HoLTRiSOR"} -#' \item **B03002_021**: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"HoLTReSOR"} +#' \item **B03002_002**: not Hispanic or Latino \code{'NHoL'} +#' \item **B03002_003**: not Hispanic or Latino, white alone \code{'NHoLW'} +#' \item **B03002_004**: not Hispanic or Latino, Black or African American alone \code{'NHoLB'} +#' \item **B03002_005**: not Hispanic or Latino, American Indian and Alaska Native alone \code{'NHoLAIAN'} +#' \item **B03002_006**: not Hispanic or Latino, Asian alone \code{'NHoLA'} +#' \item **B03002_007**: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'NHoLNHOPI'} +#' \item **B03002_008**: not Hispanic or Latino, Some other race alone \code{'NHoLSOR'} +#' \item **B03002_009**: not Hispanic or Latino, Two or more races \code{'NHoLTOMR'} +#' \item **B03002_010**: not Hispanic or Latino, Two races including Some other race \code{'NHoLTRiSOR'} +#' \item **B03002_011**: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'NHoLTReSOR'} +#' \item **B03002_012**: Hispanic or Latino \code{'HoL'} +#' \item **B03002_013**: Hispanic or Latino, white alone \code{'HoLW'} +#' \item **B03002_014**: Hispanic or Latino, Black or African American alone \code{'HoLB'} +#' \item **B03002_015**: Hispanic or Latino, American Indian and Alaska Native alone \code{'HoLAIAN'} +#' \item **B03002_016**: Hispanic or Latino, Asian alone \code{'HoLA'} +#' \item **B03002_017**: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'HoLNHOPI'} +#' \item **B03002_018**: Hispanic or Latino, Some other race alone \code{'HoLSOR'} +#' \item **B03002_019**: Hispanic or Latino, Two or more races \code{'HoLTOMR'} +#' \item **B03002_020**: Hispanic or Latino, Two races including Some other race 
\code{'HoLTRiSOR'} +#' \item **B03002_021**: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'HoLTReSOR'} #' } #' #' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. #' -#' V removes the asymmetry from the Isolation Index (Bell) by controlling for the effect of population composition. The Isolation Index (Bell) is some measure of the probability that a member of one subgroup(s) will meet or interact with a member of another subgroup(s) with higher values signifying higher probability of interaction (less isolation). V can range in value from 0 to 1. +#' V removes the asymmetry from the Isolation Index (Bell) by controlling for the effect of population composition. The Isolation Index (Bell) is some measure of the probability that a member of one subgroup(s) will meet or interact with a member of another subgroup(s) with higher values signifying higher probability of interaction (less isolation). V can range in value from -Inf to Inf. #' -#' Larger geographies available include state \code{geo_large = "state"}, county \code{geo_large = "county"}, and census tract \code{geo_large = "tract"} levels. Smaller geographies available include, county \code{geo_small = "county"}, census tract \code{geo_small = "tract"}, and census block group \code{geo_small = "block group"} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the V value returned is NA. +#' Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, and census tract \code{geo_large = 'tract'} levels. Smaller geographies available include, county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. 
If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the V value returned is NA. #' #' @return An object of class 'list'. This is a named list with the following components: #' @@ -66,102 +66,149 @@ #' #' # Isolation of non-Hispanic Black populations #' ## of census tracts within Georgia, U.S.A., counties (2020) -#' white(geo_large = "county", geo_small = "tract", state = "GA", -#' year = 2020, subgroup = "NHoLB") +#' white( +#' geo_large = 'county', +#' geo_small = 'tract', +#' state = 'GA', +#' year = 2020, +#' subgroup = 'NHoLB' +#' ) #' #' } #' -white <- function(geo_large = "county", geo_small = "tract", year = 2020, subgroup, omit_NAs = TRUE, quiet = FALSE, ...) { +white <- function(geo_large = 'county', + geo_small = 'tract', + year = 2020, + subgroup, + omit_NAs = TRUE, + quiet = FALSE, + ...) { # Check arguments - match.arg(geo_large, choices = c("state", "county", "tract")) - match.arg(geo_small, choices = c("county", "tract", "block group")) + match.arg(geo_large, choices = c('state', 'county', 'tract')) + match.arg(geo_small, choices = c('county', 'tract', 'block group')) stopifnot(is.numeric(year), year >= 2009) # all variables available 2009 onward - match.arg(subgroup, several.ok = TRUE, - choices = c("NHoL", "NHoLW", "NHoLB", "NHoLAIAN", "NHoLA", "NHoLNHOPI", - "NHoLSOR", "NHoLTOMR", "NHoLTRiSOR", "NHoLTReSOR", - "HoL", "HoLW", "HoLB", "HoLAIAN", "HoLA", "HoLNHOPI", - "HoLSOR", "HoLTOMR", "HoLTRiSOR", "HoLTReSOR")) + match.arg( + subgroup, + several.ok = TRUE, + choices = c( + 'NHoL', + 'NHoLW', + 'NHoLB', + 'NHoLAIAN', + 'NHoLA', + 'NHoLNHOPI', + 'NHoLSOR', + 'NHoLTOMR', + 'NHoLTRiSOR', + 'NHoLTReSOR', + 'HoL', + 'HoLW', + 'HoLB', + 'HoLAIAN', + 'HoLA', + 'HoLNHOPI', + 'HoLSOR', + 'HoLTOMR', + 'HoLTRiSOR', + 'HoLTReSOR' + ) + ) # Select census variables - vars <- c(TotalPop = "B03002_001", - NHoL = "B03002_002", - NHoLW = "B03002_003", - NHoLB = "B03002_004", - 
NHoLAIAN = "B03002_005", - NHoLA = "B03002_006", - NHoLNHOPI = "B03002_007", - NHoLSOR = "B03002_008", - NHoLTOMR = "B03002_009", - NHoLTRiSOR = "B03002_010", - NHoLTReSOR = "B03002_011", - HoL = "B03002_012", - HoLW = "B03002_013", - HoLB = "B03002_014", - HoLAIAN = "B03002_015", - HoLA = "B03002_016", - HoLNHOPI = "B03002_017", - HoLSOR = "B03002_018", - HoLTOMR = "B03002_019", - HoLTRiSOR = "B03002_020", - HoLTReSOR = "B03002_021") - - selected_vars <- vars[c("TotalPop", subgroup)] + vars <- c( + TotalPop = 'B03002_001', + NHoL = 'B03002_002', + NHoLW = 'B03002_003', + NHoLB = 'B03002_004', + NHoLAIAN = 'B03002_005', + NHoLA = 'B03002_006', + NHoLNHOPI = 'B03002_007', + NHoLSOR = 'B03002_008', + NHoLTOMR = 'B03002_009', + NHoLTRiSOR = 'B03002_010', + NHoLTReSOR = 'B03002_011', + HoL = 'B03002_012', + HoLW = 'B03002_013', + HoLB = 'B03002_014', + HoLAIAN = 'B03002_015', + HoLA = 'B03002_016', + HoLNHOPI = 'B03002_017', + HoLSOR = 'B03002_018', + HoLTOMR = 'B03002_019', + HoLTRiSOR = 'B03002_020', + HoLTReSOR = 'B03002_021' + ) + + selected_vars <- vars[c('TotalPop', subgroup)] out_names <- names(selected_vars) # save for output - in_subgroup <- paste(subgroup, "E", sep = "") + in_subgroup <- paste(subgroup, 'E', sep = '') # Acquire V variables and sf geometries - v_data <- suppressMessages(suppressWarnings(tidycensus::get_acs(geography = geo_small, - year = year, - output = "wide", - variables = selected_vars, - geometry = TRUE, - keep_geo_vars = TRUE, ...))) + v_data <- suppressMessages(suppressWarnings( + tidycensus::get_acs( + geography = geo_small, + year = year, + output = 'wide', + variables = selected_vars, + geometry = TRUE, + keep_geo_vars = TRUE, + ... 
+ ) + )) + # Format output - if (geo_small == "county") { - v_data <- sf::st_drop_geometry(v_data) %>% - tidyr::separate(NAME.y, into = c("county", "state"), sep = ",") + if (geo_small == 'county') { + v_data <- v_data %>% + sf::st_drop_geometry() %>% + tidyr::separate(NAME.y, into = c('county', 'state'), sep = ',') } - if (geo_small == "tract") { - v_data <- sf::st_drop_geometry(v_data) %>% - tidyr::separate(NAME.y, into = c("tract", "county", "state"), sep = ",") %>% - dplyr::mutate(tract = gsub("[^0-9\\.]", "", tract)) + if (geo_small == 'tract') { + v_data <- v_data %>% + sf::st_drop_geometry() %>% + tidyr::separate(NAME.y, into = c('tract', 'county', 'state'), sep = ',') %>% + dplyr::mutate(tract = gsub('[^0-9\\.]', '', tract)) } - if (geo_small == "block group") { - v_data <- sf::st_drop_geometry(v_data) %>% - tidyr::separate(NAME.y, into = c("block.group", "tract", "county", "state"), sep = ",") %>% - dplyr::mutate(tract = gsub("[^0-9\\.]", "", tract), - block.group = gsub("[^0-9\\.]", "", block.group)) + if (geo_small == 'block group') { + v_data <- v_data %>% + sf::st_drop_geometry() %>% + tidyr::separate(NAME.y, into = c('block.group', 'tract', 'county', 'state'), sep = ',') %>% + dplyr::mutate( + tract = gsub('[^0-9\\.]', '', tract), block.group = gsub('[^0-9\\.]', '', block.group) + ) } # Grouping IDs for R computation - if (geo_large == "tract") { + if (geo_large == 'tract') { v_data <- v_data %>% - dplyr::mutate(oid = paste(.$STATEFP, .$COUNTYFP, .$TRACTCE, sep = ""), - state = stringr::str_trim(state), - county = stringr::str_trim(county)) + dplyr::mutate( + oid = paste(.$STATEFP, .$COUNTYFP, .$TRACTCE, sep = ''), + state = stringr::str_trim(state), + county = stringr::str_trim(county) + ) } - if (geo_large == "county") { + if (geo_large == 'county') { v_data <- v_data %>% - dplyr::mutate(oid = paste(.$STATEFP, .$COUNTYFP, sep = ""), - state = stringr::str_trim(state), - county = stringr::str_trim(county)) + dplyr::mutate( + oid = paste(.$STATEFP, 
.$COUNTYFP, sep = ''), + state = stringr::str_trim(state), + county = stringr::str_trim(county) + ) } - if (geo_large == "state") { + if (geo_large == 'state') { v_data <- v_data %>% - dplyr::mutate(oid = .$STATEFP, - state = stringr::str_trim(state)) + dplyr::mutate(oid = .$STATEFP, state = stringr::str_trim(state)) } # Count of racial/ethnic subgroup populations ## Count of racial/ethnic comparison subgroup population if (length(in_subgroup) == 1) { v_data <- v_data %>% - dplyr::mutate(subgroup = .[ , in_subgroup]) + dplyr::mutate(subgroup = .[, in_subgroup]) } else { v_data <- v_data %>% - dplyr::mutate(subgroup = rowSums(.[ , in_subgroup])) + dplyr::mutate(subgroup = rowSums(.[, in_subgroup])) } # Compute V or \mathit{Eta}^{2} @@ -176,53 +223,59 @@ white <- function(geo_large = "county", geo_small = "tract", year = 2020, subgro split(., f = list(v_data$oid)) %>% lapply(., FUN = v_fun, omit_NAs = omit_NAs) %>% utils::stack(.) %>% - dplyr::mutate(V = values, - oid = ind) %>% + dplyr::mutate(V = values, oid = ind) %>% dplyr::select(V, oid) # Warning for missingness of census characteristics - missingYN <- v_data[ , c("TotalPopE", in_subgroup)] + missingYN <- v_data[, c('TotalPopE', in_subgroup)] names(missingYN) <- out_names missingYN <- missingYN %>% - tidyr::pivot_longer(cols = dplyr::everything(), - names_to = "variable", - values_to = "val") %>% + tidyr::pivot_longer( + cols = dplyr::everything(), + names_to = 'variable', + values_to = 'val' + ) %>% dplyr::group_by(variable) %>% - dplyr::summarise(total = dplyr::n(), - n_missing = sum(is.na(val)), - percent_missing = paste0(round(mean(is.na(val)) * 100, 2), " %")) + dplyr::summarise( + total = dplyr::n(), + n_missing = sum(is.na(val)), + percent_missing = paste0(round(mean(is.na(val)) * 100, 2), ' %') + ) if (quiet == FALSE) { # Warning for missing census data if (sum(missingYN$n_missing) > 0) { - message("Warning: Missing census data") + message('Warning: Missing census data') } } # Format output - if 
(geo_large == "state") { - v <- merge(v_data, Vtmp) %>% + if (geo_large == 'state') { + v <- v_data %>% + dplyr::left_join(Vtmp, by = dplyr::join_by(oid)) %>% dplyr::select(oid, state, V) %>% unique(.) %>% dplyr::mutate(GEOID = oid) %>% dplyr::select(GEOID, state, V) %>% - .[.$GEOID != "NANA", ] + .[.$GEOID != 'NANA',] } - if (geo_large == "county") { - v <- merge(v_data, Vtmp) %>% + if (geo_large == 'county') { + v <- v_data %>% + dplyr::left_join(Vtmp, by = dplyr::join_by(oid)) %>% dplyr::select(oid, state, county, V) %>% unique(.) %>% dplyr::mutate(GEOID = oid) %>% dplyr::select(GEOID, state, county, V) %>% - .[.$GEOID != "NANA", ] + .[.$GEOID != 'NANA',] } - if (geo_large == "tract") { - v <- merge(v_data, Vtmp) %>% + if (geo_large == 'tract') { + v <- v_data %>% + dplyr::left_join(Vtmp, by = dplyr::join_by(oid)) %>% dplyr::select(oid, state, county, tract, V) %>% unique(.) %>% dplyr::mutate(GEOID = oid) %>% dplyr::select(GEOID, state, county, tract, V) %>% - .[.$GEOID != "NANA", ] + .[.$GEOID != 'NANA',] } v <- v %>% @@ -231,11 +284,9 @@ white <- function(geo_large = "county", geo_small = "tract", year = 2020, subgro v_data <- v_data %>% dplyr::arrange(GEOID) %>% - dplyr::as_tibble() + dplyr::as_tibble() - out <- list(v = v, - v_data = v_data, - missing = missingYN) + out <- list(v = v, v_data = v_data, missing = missingYN) return(out) } diff --git a/R/zzz.R b/R/zzz.R index e8d7e4a..1579f24 100644 --- a/R/zzz.R +++ b/R/zzz.R @@ -1,3 +1,3 @@ .onAttach <- function(...) 
{ - packageStartupMessage(paste("\nWelcome to {ndi} version ", utils::packageDescription("ndi")$Version, "\n> help(\"ndi\") # for documentation\n> citation(\"ndi\") # for how to cite\n", sep = ""), appendLF = TRUE) + packageStartupMessage(paste('\nWelcome to {ndi} version ', utils::packageDescription('ndi')$Version, '\n> help(\'ndi\') # for documentation\n> citation(\'ndi\') # for how to cite\n', sep = ''), appendLF = TRUE) } diff --git a/README.md b/README.md index d220950..694eddd 100644 --- a/README.md +++ b/README.md @@ -1,4 +1,4 @@ -ndi: Neighborhood Deprivation Indices +ndi: Neighborhood Deprivation Indices =================================================== @@ -7,83 +7,98 @@ ndi: Neighborhood Deprivation Indices -**Date repository last updated**: January 23, 2024 +**Date repository last updated**: July 06, 2024 ### Overview -The `ndi` package is a suite of `R` functions to compute various metrics of socio-economic deprivation and disparity in the United States. Some metrics are considered "spatial" because they consider the values of neighboring (i.e., adjacent) census geographies in their computation, while other metrics are "aspatial" because they only consider the value within each census geography. Two types of aspatial NDI are available: (1) based on [Messer _et al._ (2006)](https://doi.org/10.1007/s11524-006-9094-x) and (2) based on [Andrews _et al._ (2020)](https://doi.org/10.1080/17445647.2020.1750066) and [Slotman _et al._ (2022)](https://doi.org/10.1016/j.dib.2022.108002) who use variables chosen by [Roux and Mair (2010)](https://doi.org/10.1111/j.1749-6632.2009.05333.x). Both are a decomposition of various demographic characteristics from the U.S. Census Bureau American Community Survey 5-year estimates (ACS-5; 2006-2010 onward) pulled by the [tidycensus](https://CRAN.R-project.org/package=tidycensus) package. 
Using data from the ACS-5 (2005-2009 onward), the `ndi` package can also compute the (1) spatial Racial Isolation Index (RI) based on [Anthopolos _et al._ (2011)](https://doi.org/10.1016/j.sste.2011.06.002), (2) spatial Educational Isolation Index (EI) based on [Bravo _et al._ (2021)](https://doi.org/10.3390/ijerph18179384), (3) aspatial Index of Concentration at the Extremes (ICE) based on [Feldman _et al._ (2015)](https://doi.org/10.1136/jech-2015-205728) and [Krieger _et al._ (2016)](https://doi.org/10.2105/AJPH.2015.302955), (4) aspatial racial/ethnic Dissimilarity Index (DI) based on [Duncan & Duncan (1955)](https://doi.org/10.2307/2088328), (5) aspatial income or racial/ethnic Atkinson Index (DI) based on [Atkinson (1970)](https://doi.org/10.1016/0022-0531(70)90039-6), (6) aspatial racial/ethnic Isolation Index (II) based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and [Bell (1954)](https://doi.org/10.2307/2574118), (7) aspatial racial/ethnic Correlation Ratio based on [Bell (1954)](https://doi.org/10.2307/2574118) and [White (1986)](https://doi.org/10.2307/3644339), (8) aspatial racial/ethnic Location Quotient based on [Merton (1939)](https://doi.org/10.2307/2084686) and [Sudano _et al._ (2013)](https://doi.org/10.1016/j.healthplace.2012.09.015), and (9) aspatial racial/ethnic Local Exposure and Isolation metric based on [Bemanian & Beyer (2017)](https://doi.org/10.1158/1055-9965.EPI-16-0926). Also using data from the ACS-5 (2005-2009 onward), the `ndi` package can retrieve the aspatial Gini Index based on [Gini (1921)](https://doi.org/10.2307/2223319). +The `ndi` package is a suite of `R` functions to compute various metrics of socio-economic deprivation and disparity in the United States. Some metrics are considered 'spatial' because they consider the values of neighboring (i.e., adjacent) census geographies in their computation, while other metrics are 'aspatial' because they only consider the value within each census geography. 
Two types of aspatial NDI are available: (1) based on [Messer et al. (2006)](https://doi.org/10.1007/s11524-006-9094-x) and (2) based on [Andrews et al. (2020)](https://doi.org/10.1080/17445647.2020.1750066) and [Slotman et al. (2022)](https://doi.org/10.1016/j.dib.2022.108002) who use variables chosen by [Roux and Mair (2010)](https://doi.org/10.1111/j.1749-6632.2009.05333.x). Both are a decomposition of various demographic characteristics from the U.S. Census Bureau American Community Survey 5-year estimates (ACS-5; 2006-2010 onward) pulled by the [tidycensus](https://CRAN.R-project.org/package=tidycensus) package. Using data from the ACS-5 (2005-2009 onward), the `ndi` package can also compute the (1) spatial Racial Isolation Index (RI) based on [Anthopolos et al. (2011)](https://doi.org/10.1016/j.sste.2011.06.002), (2) spatial Educational Isolation Index (EI) based on [Bravo et al. (2021)](https://doi.org/10.3390/ijerph18179384), (3) aspatial Index of Concentration at the Extremes (ICE) based on [Feldman et al. (2015)](https://doi.org/10.1136/jech-2015-205728) and [Krieger et al. (2016)](https://doi.org/10.2105/AJPH.2015.302955), (4) aspatial racial/ethnic Dissimilarity Index (DI) based on [Duncan & Duncan (1955)](https://doi.org/10.2307/2088328), (5) aspatial income or racial/ethnic Atkinson Index (AI) based on [Atkinson (1970)](https://doi.org/10.1016/0022-0531(70)90039-6), (6) aspatial racial/ethnic Isolation Index (II) based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and [Bell (1954)](https://doi.org/10.2307/2574118), (7) aspatial racial/ethnic Correlation Ratio based on [Bell (1954)](https://doi.org/10.2307/2574118) and [White (1986)](https://doi.org/10.2307/3644339), (8) aspatial racial/ethnic Location Quotient based on [Merton (1939)](https://doi.org/10.2307/2084686) and [Sudano et al. 
(2013)](https://doi.org/10.1016/j.healthplace.2012.09.015), (9) aspatial racial/ethnic Local Exposure and Isolation metric based on [Bemanian & Beyer (2017)](https://doi.org/10.1158/1055-9965.EPI-16-0926), and (10) aspatial racial/ethnic Delta based on [Hoover (1941)](https://doi.org/10.1017/S0022050700052980) and Duncan et al. (1961; LC:60007089). Also using data from the ACS-5 (2005-2009 onward), the `ndi` package can retrieve the aspatial Gini Index based on [Gini (1921)](https://doi.org/10.2307/2223319). ### Installation To install the release version from CRAN: - install.packages("ndi") + install.packages('ndi') To install the development version from GitHub: - devtools::install_github("idblr/ndi") + devtools::install_github('idblr/ndi') ### Available functions --++ - + + - + + - + + - + + - + + - + + - + - + + + + + + - + + - + + - + + - + + - + -
| Function | Description |
| --- | --- |
| `anthopolos` | Compute the spatial Racial Isolation Index (RI) based on Anthopolos et al. (2011) |
| `atkinson` | Compute the aspatial Atkinson Index (AI) based on Atkinson (1970) |
| `bell` | Compute the aspatial racial/ethnic Isolation Index (II) based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and Bell (1954) |
| `bemanian_beyer` | Compute the aspatial racial/ethnic Local Exposure and Isolation (LEx/Is) metric based on Bemanian & Beyer (2017) |
| `bravo` | Compute the spatial Educational Isolation Index (EI) based on Bravo et al. (2021) |
| `duncan` | Compute the aspatial racial/ethnic Dissimilarity Index (DI) based on Duncan & Duncan (1955) |
| `gini` | Retrieve the aspatial Gini Index based on Gini (1921) |
| `hoover` | Compute the aspatial racial/ethnic Delta (DEL) based on Hoover (1941) and Duncan et al. (1961; LC:60007089) |
| `krieger` | Compute the aspatial Index of Concentration at the Extremes (ICE) based on Feldman et al. (2015) and Krieger et al. (2016) |
| `messer` | Compute the aspatial Neighborhood Deprivation Index (NDI) based on Messer et al. (2006) |
| `powell_wiley` | Compute the aspatial Neighborhood Deprivation Index (NDI) based on Andrews et al. (2020) and Slotman et al. (2022) with variables chosen by Roux and Mair (2010) |
| `sudano` | Compute the aspatial racial/ethnic Location Quotient (LQ) based on Merton (1938) and Sudano et al. (2013) |
| `white` | Compute the aspatial racial/ethnic Correlation Ratio (V) based on Bell (1954) and White (1986) |
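To make the new `hoover()` entry concrete, the Delta (DEL) arithmetic from the `del_fun()` helper added in `R/utils.R` can be reproduced on toy data. This is a minimal sketch under stated assumptions: the helper name `delta_sketch()` and the toy values are hypothetical, and only the column names `subgroup` and `ALAND` and the formula (half the sum of absolute differences between population shares and land-area shares) come from the diff. It does not call the Census API or the package itself.

```r
# Toy data (hypothetical): three smaller geographies within one larger geography.
# Column names mirror those used by del_fun() in the diff.
toy <- data.frame(
  subgroup = c(100, 300, 600), # subgroup population counts
  ALAND    = c(5, 3, 2)        # land area, in any consistent unit
)

# Delta: half the sum of absolute differences between each unit's share of
# the subgroup population and its share of the total land area.
delta_sketch <- function(x) {
  pop_share  <- x$subgroup / sum(x$subgroup, na.rm = TRUE)
  area_share <- x$ALAND / sum(x$ALAND, na.rm = TRUE)
  0.5 * sum(abs(pop_share - area_share), na.rm = TRUE)
}

del_toy <- delta_sketch(toy)

# A subgroup spread exactly in proportion to land area gives a Delta of zero.
even <- data.frame(subgroup = c(50, 30, 20), ALAND = c(5, 3, 2))
del_even <- delta_sketch(even)

print(del_toy)
print(del_even)
```

For the first toy data set the result is 0.4, which can be read as the share of the subgroup that would need to move across units for its distribution to match the distribution of land area.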
+
The repository also includes the code to create the project hexagon sticker. -

+

### Available sample dataset @@ -91,44 +106,49 @@ The repository also includes the code to create the project hexagon sticker. --++ - + + - + -
| Data | Description |
| --- | --- |
| `DCtracts2020` | A sample data set containing information about U.S. Census American Community Survey 5-year estimate data for the District of Columbia census tracts (2020). The data are obtained from the `tidycensus` package and formatted as input for the `messer()` and `powell_wiley()` functions. |
+
### Author -* **Ian D. Buller** - *Social & Scientific Systems, Inc., a division of DLH Corporation, Silver Spring, Maryland (current)* - *Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland (original)* - [GitHub](https://github.com/idblr) - [ORCID](https://orcid.org/0000-0001-9477-8582) +* **Ian D. Buller** - *Social & Scientific Systems, Inc., a DLH Corporation Holding Company, Bethesda, Maryland (current)* - *Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland (original)* - [GitHub](https://github.com/idblr) - [ORCID](https://orcid.org/0000-0001-9477-8582) See also the list of [contributors](https://github.com/idblr/ndi/graphs/contributors) who participated in this package, including: * **Jacob Englert** - *Biostatistics and Bioinformatics Doctoral Program, Laney Graduate School, Emory University, Atlanta, Georgia* - [GitHub](https://github.com/jacobenglert) +* **Jessica Gleason** - *Epidemiology Branch, Division of Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland* - [ORCID](https://orcid.org/0000-0001-9877-7931) + * **Chris Prener** - *Real World Evidence Center of Excellence, Pfizer, Inc.* - [GitHub](https://github.com/chris-prener) - [ORCID](https://orcid.org/0000-0002-4310-9888) -* **Jessica Gleason** - *Epidemiology Branch, Division of Population Health Research, Eunice Kennedy Shriver National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland* - [ORCID](https://orcid.org/0000-0001-9877-7931) +* **Davis Vaughan** - *Posit* - [GitHub](https://github.com/DavisVaughan) - [ORCID](https://orcid.org/0000-0003-4777-038X) Thank you to those who suggested additional 
metrics, including: -* **Jessica Madrigal** - *Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland* - [ORCID](https://orcid.org/0000-0001-5303-5109) - * **David Berrigan** - *Behavioral Research Program, Division of Cancer Control and Population Sciences, National Cancer Institute, National Institutes of Health, Rockville, Maryland* - [ORCID](https://orcid.org/0000-0002-5333-179X) +* **Symielle Gaston** - *Social and Environmental Determinants of Health Equity Group, Epidemiology Branch, National Institute of Environmental Health Sciences, National Institutes of Health, Research Triangle Park, North Carolina* - [ORCID](https://orcid.org/0000-0001-9495-1592) + +* **Jessica Madrigal** - *Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland* - [ORCID](https://orcid.org/0000-0001-5303-5109) + ### Getting Started * Step 1: Obtain a unique access key from the U.S. Census Bureau. Follow [this link](http://api.census.gov/data/key_signup.html) to obtain one. -* Step 2: Specify your access key in the `anthopolos()`, `atkinson()`, `bell()`, `bemanian_beyer()`, `bravo()`, `duncan()`, `gini()`, `krieger()`, `messer()`, `powell_wiley()`, `sudano()`, or `white()` functions using the internal `key` argument or by using the `census_api_key()` function from the `tidycensus` package before running the `anthopolos()`, `atkinson()`, `bell()`, `bemanian_beyer()`, `bravo()`, `duncan()`, `gini()`, `krieger()`, `messer()`, `powell_wiley()`, `sudano()`, or `white()` functions (see an example below). 
+* Step 2: Specify your access key in the `anthopolos()`, `atkinson()`, `bell()`, `bemanian_beyer()`, `bravo()`, `duncan()`, `gini()`, `hoover()`, `krieger()`, `messer()`, `powell_wiley()`, `sudano()`, or `white()` functions using the internal `key` argument or by using the `census_api_key()` function from the `tidycensus` package before running the `anthopolos()`, `atkinson()`, `bell()`, `bemanian_beyer()`, `bravo()`, `duncan()`, `gini()`, `hoover()`, `krieger()`, `messer()`, `powell_wiley()`, `sudano()`, or `white()` functions (see an example below). ### Usage @@ -139,8 +159,8 @@ Thank you to those who suggested additional metrics, including: library(ndi) library(ggplot2) -library(sf) # dependency fo the "ndi" package -library(tidycensus) # a dependency for the "ndi" package +library(sf) # a dependency of the 'ndi' package +library(tidycensus) # a dependency for the 'ndi' package library(tigris) # -------- # @@ -149,70 +169,92 @@ library(tigris) ## Access Key for census data download ### Obtain one at http://api.census.gov/data/key_signup.html -tidycensus::census_api_key("...") # INSERT YOUR OWN KEY FROM U.S. CENSUS API +census_api_key('...') # INSERT YOUR OWN KEY FROM U.S. CENSUS API # ---------------------- # # Calculate NDI (Messer) # # ---------------------- # # Compute the NDI (Messer) values (2016-2020 5-year ACS) for Washington, D.C. 
census tracts -messer2020DC <- messer(state = "DC", year = 2020) +messer2020DC <- messer(state = 'DC', year = 2020) # ------------------------------ # # Outputs from messer() function # # ------------------------------ # -# A tibble containing the identification, geographic name, NDI (Messer) values, NDI (Messer) quartiles, and raw census characteristics for each tract +# A tibble containing the identification, geographic name, NDI (Messer) values, NDI (Messer) +# quartiles, and raw census characteristics for each tract messer2020DC$ndi # The results from the principal component analysis used to compute the NDI (Messer) values messer2020DC$pca -# A tibble containing a breakdown of the missingingness of the census characteristics used to compute the NDI (Messer) values +# A tibble containing a breakdown of the missingness of the census characteristics +# used to compute the NDI (Messer) values messer2020DC$missing # -------------------------------------- # # Visualize the messer() function output # # -------------------------------------- # -# Obtain the 2020 census tracts from the "tigris" package -tract2020DC <- tigris::tracts(state = "DC", year = 2020, cb = TRUE) +# Obtain the 2020 census tracts from the 'tigris' package +tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE) # Join the NDI (Messer) values to the census tract geometry -DC2020messer <- dplyr::left_join(tract2020DC, messer2020DC$ndi, by = "GEOID") +DC2020messer <- tract2020DC %>% + left_join(messer2020DC$ndi, by = 'GEOID') # Visualize the NDI (Messer) values (2016-2020 5-year ACS) for Washington, D.C. census tracts ## Continuous Index -ggplot2::ggplot() + - ggplot2::geom_sf(data = DC2020messer, - ggplot2::aes(fill = NDI), - color = "white") + - ggplot2::theme_bw() + - ggplot2::scale_fill_viridis_c() + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. 
Census ACS 2016-2020 estimates")+ - ggplot2::ggtitle("Neighborhood Deprivation Index\nContinuous (Messer, non-imputed)", - subtitle = "Washington, D.C. tracts as the referent") +ggplot() + + geom_sf( + data = DC2020messer, + aes(fill = NDI), + color = 'white' + ) + + theme_bw() + + scale_fill_viridis_c() + + labs( + fill = 'Index (Continuous)', + caption = 'Source: U.S. Census ACS 2016-2020 estimates' + ) + + ggtitle( + 'Neighborhood Deprivation Index\nContinuous (Messer, non-imputed)', + subtitle = 'Washington, D.C. tracts as the referent' + ) ## Categorical Index (Quartiles) -### Rename "9-NDI not avail" level as NA for plotting -DC2020messer$NDIQuartNA <- factor(replace(as.character(DC2020messer$NDIQuart), - DC2020messer$NDIQuart == "9-NDI not avail", - NA), - c(levels(DC2020messer$NDIQuart)[-5], NA)) - -ggplot2::ggplot() + - ggplot2::geom_sf(data = DC2020messer, - ggplot2::aes(fill = NDIQuartNA), - color = "white") + - ggplot2::theme_bw() + - ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE), - na.value = "grey50") + - ggplot2::labs(fill = "Index (Categorical)", - caption = "Source: U.S. Census ACS 2016-2020 estimates") + - ggplot2::ggtitle("Neighborhood Deprivation Index\nQuartiles (Messer, non-imputed)", - subtitle = "Washington, D.C. tracts as the referent") +### Rename '9-NDI not avail' level as NA for plotting +DC2020messer$NDIQuartNA <- + factor( + replace( + as.character(DC2020messer$NDIQuart), + DC2020messer$NDIQuart == '9-NDI not avail', + NA + ), + c(levels(DC2020messer$NDIQuart)[-5], NA) + ) + +ggplot() + + geom_sf( + data = DC2020messer, + aes(fill = NDIQuartNA), + color = 'white' + ) + + theme_bw() + + scale_fill_viridis_d( + guide = guide_legend(reverse = TRUE), + na.value = 'grey50' + ) + + labs( + fill = 'Index (Categorical)', + caption = 'Source: U.S. Census ACS 2016-2020 estimates' + ) + + ggtitle( + 'Neighborhood Deprivation Index\nQuartiles (Messer, non-imputed)', + subtitle = 'Washington, D.C. 
tracts as the referent' + ) ``` ![](man/figures/messer1.png) ![](man/figures/messer2.png) @@ -222,66 +264,94 @@ ggplot2::ggplot() + # Calculate NDI (Powell-Wiley) # # ---------------------------- # -# Compute the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for Washington, D.C. census tracts -powell_wiley2020DC <- powell_wiley(state = "DC", year = 2020) -powell_wiley2020DCi <- powell_wiley(state = "DC", year = 2020, imp = TRUE) # impute missing values +# Compute the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for +# Washington, D.C. census tracts +powell_wiley2020DC <- powell_wiley(state = 'DC', year = 2020) +# impute missing values +powell_wiley2020DCi <- powell_wiley(state = 'DC', year = 2020, imp = TRUE) # ------------------------------------ # # Outputs from powell_wiley() function # # ------------------------------------ # -# A tibble containing the identification, geographic name, NDI (Powell-Wiley) value, and raw census characteristics for each tract +# A tibble containing the identification, geographic name, NDI (Powell-Wiley) value, and +# raw census characteristics for each tract powell_wiley2020DC$ndi -# The results from the principal component analysis used to compute the NDI (Powell-Wiley) values +# The results from the principal component analysis used to +# compute the NDI (Powell-Wiley) values powell_wiley2020DC$pca -# A tibble containing a breakdown of the missingingness of the census characteristics used to compute the NDI (Powell-Wiley) values +# A tibble containing a breakdown of the missingness of the census characteristics used to +# compute the NDI (Powell-Wiley) values powell_wiley2020DC$missing # -------------------------------------------- # # Visualize the powell_wiley() function output # # -------------------------------------------- # -# Obtain the 2020 census tracts from the "tigris" package -tract2020DC <- tigris::tracts(state = "DC", year = 2020, cb = TRUE) +# Obtain the 2020 census tracts from the 'tigris' package 
+tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE) # Join the NDI (powell_wiley) values to the census tract geometry -DC2020powell_wiley <- dplyr::left_join(tract2020DC, powell_wiley2020DC$ndi, by = "GEOID") -DC2020powell_wiley <- dplyr::left_join(DC2020powell_wiley, powell_wiley2020DCi$ndi, by = "GEOID") +DC2020powell_wiley <- tract2020DC %>% + left_join(powell_wiley2020DC$ndi, by = 'GEOID') +DC2020powell_wiley <- DC2020powell_wiley %>% + left_join(powell_wiley2020DCi$ndi, by = 'GEOID') -# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for Washington, D.C. census tracts +# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for +# Washington, D.C. census tracts ## Non-imputed missing tracts (Continuous) -ggplot2::ggplot() + - ggplot2::geom_sf(data = DC2020powell_wiley, - ggplot2::aes(fill = NDI.x), - color = "white") + - ggplot2::theme_bw() + - ggplot2::scale_fill_viridis_c() + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. Census ACS 2016-2020 estimates")+ - ggplot2::ggtitle("Neighborhood Deprivation Index\nContinuous (Powell-Wiley, non-imputed)", - subtitle = "Washington, D.C. tracts as the referent") +ggplot() + + geom_sf( + data = DC2020powell_wiley, + aes(fill = NDI.x), + color = 'white' + ) + + theme_bw() + + scale_fill_viridis_c() + + labs( + fill = 'Index (Continuous)', + caption = 'Source: U.S. Census ACS 2016-2020 estimates' + ) + + ggtitle( + 'Neighborhood Deprivation Index\nContinuous (Powell-Wiley, non-imputed)', + subtitle = 'Washington, D.C. 
tracts as the referent' + ) ## Non-imputed missing tracts (Categorical quintiles) -### Rename "9-NDI not avail" level as NA for plotting -DC2020powell_wiley$NDIQuintNA.x <- factor(replace(as.character(DC2020powell_wiley$NDIQuint.x), - DC2020powell_wiley$NDIQuint.x == "9-NDI not avail", - NA), - c(levels(DC2020powell_wiley$NDIQuint.x)[-6], NA)) - -ggplot2::ggplot() + - ggplot2::geom_sf(data = DC2020powell_wiley, - ggplot2::aes(fill = NDIQuintNA.x), - color = "white") + - ggplot2::theme_bw() + - ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE), - na.value = "grey50") + - ggplot2::labs(fill = "Index (Categorical)", - caption = "Source: U.S. Census ACS 2016-2020 estimates")+ - ggplot2::ggtitle("Neighborhood Deprivation Index\nPopulation-weighted Quintiles (Powell-Wiley, non-imputed)", - subtitle = "Washington, D.C. tracts as the referent") +### Rename '9-NDI not avail' level as NA for plotting +DC2020powell_wiley$NDIQuintNA.x <- factor( + replace( + as.character(DC2020powell_wiley$NDIQuint.x), + DC2020powell_wiley$NDIQuint.x == '9-NDI not avail', + NA + ), + c(levels(DC2020powell_wiley$NDIQuint.x)[-6], NA) +) + + +ggplot() + + geom_sf( + data = DC2020powell_wiley, + aes(fill = NDIQuintNA.x), + color = 'white' + ) + + theme_bw() + + scale_fill_viridis_d( + guide = guide_legend(reverse = TRUE), + na.value = 'grey50' + ) + + labs( + fill = 'Index (Categorical)', + caption = 'Source: U.S. Census ACS 2016-2020 estimates' + ) + + ggtitle( + 'Neighborhood Deprivation Index\n + Population-weighted Quintiles (Powell-Wiley, non-imputed)', + subtitle = 'Washington, D.C. 
tracts as the referent' + ) ``` ![](man/figures/powell_wiley1.png) @@ -289,35 +359,53 @@ ggplot2::ggplot() + ``` r ## Imputed missing tracts (Continuous) -ggplot2::ggplot() + - ggplot2::geom_sf(data = DC2020powell_wiley, - ggplot2::aes(fill = NDI.y), - color = "white") + - ggplot2::theme_bw() + - ggplot2::scale_fill_viridis_c() + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. Census ACS 2016-2020 estimates")+ - ggplot2::ggtitle("Neighborhood Deprivation Index\nContinuous (Powell-Wiley, imputed)", - subtitle = "Washington, D.C. tracts as the referent") +ggplot() + + geom_sf( + data = DC2020powell_wiley, + aes(fill = NDI.y), + color = 'white' + ) + + theme_bw() + + scale_fill_viridis_c() + + labs( + fill = 'Index (Continuous)', + caption = 'Source: U.S. Census ACS 2016-2020 estimates' + ) + + ggtitle( + 'Neighborhood Deprivation Index\nContinuous (Powell-Wiley, imputed)', + subtitle = 'Washington, D.C. tracts as the referent' + ) ## Imputed missing tracts (Categorical quintiles) -### Rename "9-NDI not avail" level as NA for plotting -DC2020powell_wiley$NDIQuintNA.y <- factor(replace(as.character(DC2020powell_wiley$NDIQuint.y), - DC2020powell_wiley$NDIQuint.y == "9-NDI not avail", - NA), - c(levels(DC2020powell_wiley$NDIQuint.y)[-6], NA)) - -ggplot2::ggplot() + - ggplot2::geom_sf(data = DC2020powell_wiley, - ggplot2::aes(fill = NDIQuintNA.y), - color = "white") + - ggplot2::theme_bw() + - ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE), - na.value = "grey50") + - ggplot2::labs(fill = "Index (Categorical)", - caption = "Source: U.S. Census ACS 2016-2020 estimates")+ - ggplot2::ggtitle("Neighborhood Deprivation Index\nPopulation-weighted Quintiles (Powell-Wiley, imputed)", - subtitle = "Washington, D.C. 
tracts as the referent") +### Rename '9-NDI not avail' level as NA for plotting +DC2020powell_wiley$NDIQuintNA.y <- factor( + replace( + as.character(DC2020powell_wiley$NDIQuint.y), + DC2020powell_wiley$NDIQuint.y == '9-NDI not avail', + NA + ), + c(levels(DC2020powell_wiley$NDIQuint.y)[-6], NA) +) + +ggplot() + + geom_sf( + data = DC2020powell_wiley, + aes(fill = NDIQuintNA.y), + color = 'white' + ) + + theme_bw() + + scale_fill_viridis_d( + guide = guide_legend(reverse = TRUE), + na.value = 'grey50' + ) + + labs( + fill = 'Index (Categorical)', + caption = 'Source: U.S. Census ACS 2016-2020 estimates' + ) + + ggtitle( + 'Neighborhood Deprivation Index\nPopulation-weighted Quintiles (Powell-Wiley, imputed)', + subtitle = 'Washington, D.C. tracts as the referent' + ) ``` ![](man/figures/powell_wiley3.png) @@ -329,10 +417,15 @@ ggplot2::ggplot() + # --------------------------- # # Merge the two NDI metrics (Messer and Powell-Wiley, imputed) -ndi2020DC <- dplyr::left_join(messer2020DC$ndi, powell_wiley2020DCi$ndi, by = "GEOID", suffix = c(".messer", ".powell_wiley")) +ndi2020DC <- messer2020DC$ndi %>% + left_join( + powell_wiley2020DCi$ndi, + by = 'GEOID', + suffix = c('.messer', '.powell_wiley') + ) -# Check the correlation the two NDI metrics (Messer and Powell-Wiley, imputed) as continuous values -cor(ndi2020DC$NDI.messer, ndi2020DC$NDI.powell_wiley, use = "complete.obs") # Pearsons r = 0.975 +# Check the correlation of two NDI metrics (Messer & Powell-Wiley, imputed) as continuous values +cor(ndi2020DC$NDI.messer, ndi2020DC$NDI.powell_wiley, use = 'complete.obs') # Pearson's r=0.975 # Check the similarity of the two NDI metrics (Messer and Powell-Wiley, imputed) as quartiles table(ndi2020DC$NDIQuart, ndi2020DC$NDIQuint) @@ -344,24 +437,31 @@ table(ndi2020DC$NDIQuart, ndi2020DC$NDIQuint) # ---------------------------- # # Gini Index based on Gini (1921) from the ACS-5 -gini2020DC <- gini(state = "DC", year = 2020) +gini2020DC <- gini(state = 'DC', year = 2020) -# 
Obtain the 2020 census tracts from the "tigris" package -tract2020DC <- tigris::tracts(state = "DC", year = 2020, cb = TRUE) +# Obtain the 2020 census tracts from the 'tigris' package +tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE) # Join the Gini Index values to the census tract geometry -gini2020DC <- dplyr::left_join(tract2020DC, gini2020DC$gini, by = "GEOID") - -ggplot2::ggplot() + - ggplot2::geom_sf(data = gini2020DC, - ggplot2::aes(fill = gini), - color = "white") + - ggplot2::theme_bw() + - ggplot2::scale_fill_viridis_c() + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. Census ACS 2016-2020 estimates")+ - ggplot2::ggtitle("Gini Index\nGrey color denotes no data", - subtitle = "Washington, D.C. tracts") +gini2020DC <- tract2020DC %>% + left_join(gini2020DC$gini, by = 'GEOID') + +ggplot() + + geom_sf( + data = gini2020DC, + aes(fill = gini), + color = 'white' + ) + + theme_bw() + + scale_fill_viridis_c() + + labs( + fill = 'Index (Continuous)', + caption = 'Source: U.S. Census ACS 2016-2020 estimates' + ) + + ggtitle( + 'Gini Index\nGrey color denotes no data', + subtitle = 'Washington, D.C. tracts' + ) ``` ![](man/figures/gini.png) @@ -373,24 +473,32 @@ ggplot2::ggplot() + # Racial Isolation Index based on Anthopolos et al. 
(2011) ## Selected subgroup: Not Hispanic or Latino, Black or African American alone -ri2020DC <- anthopolos(state = "DC", year = 2020, subgroup = "NHoLB") +ri2020DC <- anthopolos(state = 'DC', year = 2020, subgroup = 'NHoLB') -# Obtain the 2020 census tracts from the "tigris" package -tract2020DC <- tigris::tracts(state = "DC", year = 2020, cb = TRUE) +# Obtain the 2020 census tracts from the 'tigris' package +tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE) # Join the RI (Anthopolos) values to the census tract geometry -ri2020DC <- dplyr::left_join(tract2020DC, ri2020DC$ri, by = "GEOID") - -ggplot2::ggplot() + - ggplot2::geom_sf(data = ri2020DC, - ggplot2::aes(fill = RI), - color = "white") + - ggplot2::theme_bw() + - ggplot2::scale_fill_viridis_c() + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. Census ACS 2016-2020 estimates")+ - ggplot2::ggtitle("Racial Isolation Index\nNot Hispanic or Latino, Black or African American alone (Anthopolos)", - subtitle = "Washington, D.C. tracts (not corrected for edge effects)") +ri2020DC <- tract2020DC %>% + left_join(ri2020DC$ri, by = 'GEOID') + +ggplot() + + geom_sf( + data = ri2020DC, + aes(fill = RI), + color = 'white' + ) + + theme_bw() + + scale_fill_viridis_c() + + labs( + fill = 'Index (Continuous)', + caption = 'Source: U.S. Census ACS 2016-2020 estimates' + ) + + ggtitle( + 'Racial Isolation Index\n + Not Hispanic or Latino, Black or African American alone (Anthopolos)', + subtitle = 'Washington, D.C. tracts (not corrected for edge effects)' + ) ``` ![](man/figures/ri.png) @@ -402,24 +510,31 @@ ggplot2::ggplot() + # Educational Isolation Index based on Bravo et al. 
(2021) ## Selected subgroup: without four-year college degree -ei2020DC <- bravo(state = "DC", year = 2020, subgroup = c("LtHS", "HSGiE", "SCoAD")) +ei2020DC <- bravo(state = 'DC', year = 2020, subgroup = c('LtHS', 'HSGiE', 'SCoAD')) -# Obtain the 2020 census tracts from the "tigris" package -tract2020DC <- tigris::tracts(state = "DC", year = 2020, cb = TRUE) +# Obtain the 2020 census tracts from the 'tigris' package +tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE) # Join the EI (Bravo) values to the census tract geometry -ei2020DC <- dplyr::left_join(tract2020DC, ei2020DC$ei, by = "GEOID") - -ggplot2::ggplot() + - ggplot2::geom_sf(data = ei2020DC, - ggplot2::aes(fill = EI), - color = "white") + - ggplot2::theme_bw() + - ggplot2::scale_fill_viridis_c() + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. Census ACS 2016-2020 estimates")+ - ggplot2::ggtitle("Educational Isolation Index\nWithout a four-year college degree (Bravo)", - subtitle = "Washington, D.C. tracts (not corrected for edge effects)") +ei2020DC <- tract2020DC %>% + left_join(ei2020DC$ei, by = 'GEOID') + +ggplot() + + geom_sf( + data = ei2020DC, + aes(fill = EI), + color = 'white' + ) + + theme_bw() + + scale_fill_viridis_c() + + labs( + fill = 'Index (Continuous)', + caption = 'Source: U.S. Census ACS 2016-2020 estimates' + )+ + ggtitle( + 'Educational Isolation Index\nWithout a four-year college degree (Bravo)', + subtitle = 'Washington, D.C. tracts (not corrected for edge effects)' + ) ``` ![](man/figures/ei.png) @@ -429,76 +544,124 @@ ggplot2::ggplot() + # Compute aspatial Index of Concentration at the Extremes (Krieger) # # ----------------------------------------------------------------- # -# Five Indices of Concentration at the Extremes based on Feldman et al. (2015) and Krieger et al. (2016) +# Five Indices of Concentration at the Extremes based on Feldman et al. (2015) and +# Krieger et al. 
(2016) -ice2020DC <- krieger(state = "DC", year = 2020) +ice2020DC <- krieger(state = 'DC', year = 2020) -# Obtain the 2020 census tracts from the "tigris" package -tract2020DC <- tigris::tracts(state = "DC", year = 2020, cb = TRUE) +# Obtain the 2020 census tracts from the 'tigris' package +tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE) # Join the ICEs (Krieger) values to the census tract geometry -ice2020DC <- dplyr::left_join(tract2020DC, ice2020DC$ice, by = "GEOID") +ice2020DC <- tract2020DC %>% + left_join(ice2020DC$ice, by = 'GEOID') # Plot ICE for Income -ggplot2::ggplot() + - ggplot2::geom_sf(data = ice2020DC, - ggplot2::aes(fill = ICE_inc), - color = "white") + - ggplot2::theme_bw() + - ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1,1)) + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. Census ACS 2016-2020 estimates")+ - ggplot2::ggtitle("Index of Concentration at the Extremes\nIncome (Krieger)", - subtitle = "80th income percentile vs. 20th income percentile") +ggplot() + + geom_sf( + data = ice2020DC, + aes(fill = ICE_inc), + color = 'white' + ) + + theme_bw() + + scale_fill_gradient2( + low = '#998ec3', + mid = '#f7f7f7', + high = '#f1a340', + limits = c(-1, 1) + ) + + labs( + fill = 'Index (Continuous)', + caption = 'Source: U.S. Census ACS 2016-2020 estimates' + ) + + ggtitle( + 'Index of Concentration at the Extremes\nIncome (Krieger)', + subtitle = '80th income percentile vs. 20th income percentile' + ) ``` ![](man/figures/ice1.png) ```r # Plot ICE for Education -ggplot2::ggplot() + - ggplot2::geom_sf(data = ice2020DC, - ggplot2::aes(fill = ICE_edu), - color = "white") + - ggplot2::theme_bw() + - ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1,1)) + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. 
Census ACS 2016-2020 estimates")+ - ggplot2::ggtitle("Index of Concentration at the Extremes\nEducation (Krieger)", - subtitle = "less than high school vs. four-year college degree or more") +ggplot() + + geom_sf( + data = ice2020DC, + aes(fill = ICE_edu), + color = 'white' + ) + + theme_bw() + + scale_fill_gradient2( + low = '#998ec3', + mid = '#f7f7f7', + high = '#f1a340', + limits = c(-1, 1) + ) + + labs( + fill = 'Index (Continuous)', + caption = 'Source: U.S. Census ACS 2016-2020 estimates' + ) + + ggtitle( + 'Index of Concentration at the Extremes\nEducation (Krieger)', + subtitle = 'less than high school vs. four-year college degree or more' + ) ``` ![](man/figures/ice2.png) ```r # Plot ICE for Race/Ethnicity -ggplot2::ggplot() + - ggplot2::geom_sf(data = ice2020DC, - ggplot2::aes(fill = ICE_rewb), - color = "white") + - ggplot2::theme_bw() + - ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1, 1)) + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. Census ACS 2016-2020 estimates")+ - ggplot2::ggtitle("Index of Concentration at the Extremes\nRace/Ethnicity (Krieger)", - subtitle = "white non-Hispanic vs. black non-Hispanic") +ggplot() + + geom_sf( + data = ice2020DC, + aes(fill = ICE_rewb), + color = 'white' + ) + + theme_bw() + + scale_fill_gradient2( + low = '#998ec3', + mid = '#f7f7f7', + high = '#f1a340', + limits = c(-1, 1) + ) + + labs( + fill = 'Index (Continuous)', + caption = 'Source: U.S. Census ACS 2016-2020 estimates' + ) + + ggtitle( + 'Index of Concentration at the Extremes\nRace/Ethnicity (Krieger)', + subtitle = 'white non-Hispanic vs. black non-Hispanic' + ) ``` ![](man/figures/ice3.png) ``` # Plot ICE for Income and Race/Ethnicity Combined -## white non-Hispanic in 80th income percentile vs. 
black (including Hispanic) in 20th income percentile -ggplot2::ggplot() + - ggplot2::geom_sf(data = ice2020DC, - ggplot2::aes(fill = ICE_wbinc), - color = "white") + - ggplot2::theme_bw() + - ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1, 1)) + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. Census ACS 2016-2020 estimates")+ - ggplot2::ggtitle("Index of Concentration at the Extremes\nIncome and race/ethnicity combined (Krieger)", - subtitle = "white non-Hispanic in 80th income percentile vs. black (incl. Hispanic) in 20th inc. percentile") +## white non-Hispanic in 80th income percentile vs. +## black (including Hispanic) in 20th income percentile +ggplot() + + geom_sf( + data = ice2020DC, + aes(fill = ICE_wbinc), + color = 'white' + ) + + theme_bw() + + scale_fill_gradient2( + low = '#998ec3', + mid = '#f7f7f7', + high = '#f1a340', + limits = c(-1, 1) + ) + + labs( + fill = 'Index (Continuous)', + caption = 'Source: U.S. Census ACS 2016-2020 estimates' + ) + + ggtitle( + 'Index of Concentration at the Extremes\nIncome and race/ethnicity combined (Krieger)', + subtitle = 'white non-Hispanic in 80th income percentile vs. + black (incl. Hispanic) in 20th inc. percentile' + ) ``` ![](man/figures/ice4.png) @@ -506,16 +669,28 @@ ggplot2::ggplot() + ```r # Plot ICE for Income and Race/Ethnicity Combined ## white non-Hispanic in 80th income percentile vs. white non-Hispanic in 20th income percentile -ggplot2::ggplot() + - ggplot2::geom_sf(data = ice2020DC, - ggplot2::aes(fill = ICE_wpcinc), - color = "white") + - ggplot2::theme_bw() + - ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1, 1)) + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. 
Census ACS 2016-2020 estimates")+ - ggplot2::ggtitle("Index of Concentration at the Extremes\nIncome and race/ethnicity combined (Krieger)", - subtitle = "white non-Hispanic in 80th income percentile vs. white non-Hispanic in 20th income percentile") +ggplot() + + geom_sf( + data = ice2020DC, + aes(fill = ICE_wpcinc), + color = 'white' + ) + + theme_bw() + + scale_fill_gradient2( + low = '#998ec3', + mid = '#f7f7f7', + high = '#f1a340', + limits = c(-1, 1) + ) + + labs( + fill = 'Index (Continuous)', + caption = 'Source: U.S. Census ACS 2016-2020 estimates' + ) + + ggtitle( + 'Index of Concentration at the Extremes\nIncome and race/ethnicity combined (Krieger)', + subtitle = 'white non-Hispanic in 80th income percentile vs. + white non-Hispanic in 20th income percentile' + ) ``` ![](man/figures/ice5.png) @@ -530,25 +705,38 @@ ggplot2::ggplot() + ## Selected subgroup reference: Not Hispanic or Latino, white alone ## Selected large geography: census tract ## Selected small geography: census block group -di2020DC <- duncan(geo_large = "tract", geo_small = "block group", state = "DC", - year = 2020, subgroup = "NHoLB", subgroup_ref = "NHoLW") - -# Obtain the 2020 census tracts from the "tigris" package -tract2020DC <- tigris::tracts(state = "DC", year = 2020, cb = TRUE) +di2020DC <- duncan( + geo_large = 'tract', + geo_small = 'block group', + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + subgroup_ref = 'NHoLW' +) + +# Obtain the 2020 census tracts from the 'tigris' package +tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE) # Join the DI (Duncan & Duncan) values to the census tract geometry -di2020DC <- dplyr::left_join(tract2020DC, di2020DC$di, by = "GEOID") - -ggplot2::ggplot() + - ggplot2::geom_sf(data = di2020DC, - ggplot2::aes(fill = DI), - color = "white") + - ggplot2::theme_bw() + - ggplot2::scale_fill_viridis_c(limits = c(0, 1)) + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. 
Census ACS 2016-2020 estimates")+ - ggplot2::ggtitle("Dissimilarity Index (Duncan & Duncan)\nWashington, D.C. census block groups to tracts", - subtitle = "Black non-Hispanic vs. white non-Hispanic") +di2020DC <- tract2020DC %>% + left_join(di2020DC$di, by = 'GEOID') + +ggplot() + + geom_sf( + data = di2020DC, + aes(fill = DI), + color = 'white' + ) + + theme_bw() + + scale_fill_viridis_c(limits = c(0, 1)) + + labs( + fill = 'Index (Continuous)', + caption = 'Source: U.S. Census ACS 2016-2020 estimates' + ) + + ggtitle( + 'Dissimilarity Index (Duncan & Duncan)\nWashington, D.C. census block groups to tracts', + subtitle = 'Black non-Hispanic vs. white non-Hispanic' + ) ``` ![](man/figures/di.png) @@ -563,25 +751,37 @@ ggplot2::ggplot() + ## Selected large geography: census tract ## Selected small geography: census block group ## Default epsilon (0.5 or over- and under-representation contribute equally) -ai2020DC <- atkinson(geo_large = "tract", geo_small = "block group", state = "DC", - year = 2020, subgroup = "NHoLB") +ai2020DC <- atkinson( + geo_large = 'tract', + geo_small = 'block group', + state = 'DC', + year = 2020, + subgroup = 'NHoLB' +) -# Obtain the 2020 census tracts from the "tigris" package -tract2020DC <- tigris::tracts(state = "DC", year = 2020, cb = TRUE) +# Obtain the 2020 census tracts from the 'tigris' package +tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE) # Join the AI (Atkinson) values to the census tract geometry -ai2020DC <- dplyr::left_join(tract2020DC, ai2020DC$ai, by = "GEOID") - -ggplot2::ggplot() + - ggplot2::geom_sf(data = ai2020DC, - ggplot2::aes(fill = AI), - color = "white") + - ggplot2::theme_bw() + - ggplot2::scale_fill_viridis_c(limits = c(0, 1)) + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. Census ACS 2016-2020 estimates") + - ggplot2::ggtitle("Atkinson Index (Atkinson)\nWashington, D.C. 
census block groups to tracts", - subtitle = expression(paste("Black non-Hispanic (", epsilon, " = 0.5)"))) +ai2020DC <- tract2020DC %>% + left_join(ai2020DC$ai, by = 'GEOID') + +ggplot() + + geom_sf( + data = ai2020DC, + aes(fill = AI), + color = 'white' + ) + + theme_bw() + + scale_fill_viridis_c(limits = c(0, 1)) + + labs( + fill = 'Index (Continuous)', + caption = 'Source: U.S. Census ACS 2016-2020 estimates' + ) + + ggtitle( + 'Atkinson Index (Atkinson)\nWashington, D.C. census block groups to tracts', + subtitle = expression(paste('Black non-Hispanic (', epsilon, ' = 0.5)')) + ) ``` ![](man/figures/ai.png) @@ -596,25 +796,38 @@ ggplot2::ggplot() + ## Selected interaction subgroup: Not Hispanic or Latino, Black or African American alone ## Selected large geography: census tract ## Selected small geography: census block group -ii2020DC <- bell(geo_large = "tract", geo_small = "block group", state = "DC", - year = 2020, subgroup = "NHoLB", subgroup_ixn = "NHoLW") - -# Obtain the 2020 census tracts from the "tigris" package -tract2020DC <- tigris::tracts(state = "DC", year = 2020, cb = TRUE) +ii2020DC <- bell( + geo_large = 'tract', + geo_small = 'block group', + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + subgroup_ixn = 'NHoLW' +) + +# Obtain the 2020 census tracts from the 'tigris' package +tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE) # Join the II (Bell) values to the census tract geometry -ii2020DC <- dplyr::left_join(tract2020DC, ii2020DC$ii, by = "GEOID") - -ggplot2::ggplot() + - ggplot2::geom_sf(data = ii2020DC, - ggplot2::aes(fill = II), - color = "white") + - ggplot2::theme_bw() + - ggplot2::scale_fill_viridis_c(limits = c(0, 1)) + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. Census ACS 2016-2020 estimates") + - ggplot2::ggtitle("Isolation Index (Bell)\nWashington, D.C. census block groups to tracts", - subtitle = "Black non-Hispanic vs. 
white non-Hispanic") +ii2020DC <- tract2020DC %>% + left_join(ii2020DC$ii, by = 'GEOID') + +ggplot() + + geom_sf( + data = ii2020DC, + aes(fill = II), + color = 'white' + ) + + theme_bw() + + scale_fill_viridis_c(limits = c(0, 1)) + + labs( + fill = 'Index (Continuous)', + caption = 'Source: U.S. Census ACS 2016-2020 estimates' + ) + + ggtitle( + 'Isolation Index (Bell)\nWashington, D.C. census block groups to tracts', + subtitle = 'Black non-Hispanic vs. white non-Hispanic' + ) ``` ![](man/figures/ii.png) @@ -628,25 +841,37 @@ ggplot2::ggplot() + ## Selected subgroup: Not Hispanic or Latino, Black or African American alone ## Selected large geography: census tract ## Selected small geography: census block group -v2020DC <- white(geo_large = "tract", geo_small = "block group", state = "DC", - year = 2020, subgroup = "NHoLB") +v2020DC <- white( + geo_large = 'tract', + geo_small = 'block group', + state = 'DC', + year = 2020, + subgroup = 'NHoLB' +) -# Obtain the 2020 census tracts from the "tigris" package -tract2020DC <- tigris::tracts(state = "DC", year = 2020, cb = TRUE) +# Obtain the 2020 census tracts from the 'tigris' package +tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE) # Join the V (White) values to the census tract geometry -v2020DC <- dplyr::left_join(tract2020DC, v2020DC$v, by = "GEOID") - -ggplot2::ggplot() + - ggplot2::geom_sf(data = v2020DC, - ggplot2::aes(fill = V), - color = "white") + - ggplot2::theme_bw() + - ggplot2::scale_fill_viridis_c(limits = c(0, 1)) + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. Census ACS 2016-2020 estimates") + - ggplot2::ggtitle("Correlation Ratio (White)\nWashington, D.C. 
census block groups to tracts",
-                   subtitle = "Black non-Hispanic")
+v2020DC <- tract2020DC %>%
+  left_join(v2020DC$v, by = 'GEOID')
+
+ggplot() +
+  geom_sf(
+    data = v2020DC,
+    aes(fill = V),
+    color = 'white'
+  ) +
+  theme_bw() +
+  scale_fill_viridis_c(limits = c(0, 1)) +
+  labs(
+    fill = 'Index (Continuous)',
+    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
+  ) +
+  ggtitle(
+    'Correlation Ratio (White)\nWashington, D.C. census block groups to tracts',
+    subtitle = 'Black non-Hispanic'
+  )
 ```

 ![](man/figures/v.png)
@@ -660,25 +885,37 @@ ggplot2::ggplot() +
 ## Selected subgroup: Not Hispanic or Latino, Black or African American alone
 ## Selected large geography: state
 ## Selected small geography: census tract
-lq2020DC <- sudano(geo_large = "state", geo_small = "tract", state = "DC",
-                   year = 2020, subgroup = "NHoLB")
+lq2020DC <- sudano(
+  geo_large = 'state',
+  geo_small = 'tract',
+  state = 'DC',
+  year = 2020,
+  subgroup = 'NHoLB'
+)

-# Obtain the 2020 census tracts from the "tigris" package
-tract2020DC <- tigris::tracts(state = "DC", year = 2020, cb = TRUE)
+# Obtain the 2020 census tracts from the 'tigris' package
+tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE)

 # Join the LQ (Sudano) values to the census tract geometry
-lq2020DC <- dplyr::left_join(tract2020DC, lq2020DC$lq, by = "GEOID")
-
-ggplot2::ggplot() +
-  ggplot2::geom_sf(data = lq2020DC,
-                   ggplot2::aes(fill = LQ),
-                   color = "white") +
-  ggplot2::theme_bw() +
-  ggplot2::scale_fill_viridis_c() +
-  ggplot2::labs(fill = "Index (Continuous)",
-                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
-  ggplot2::ggtitle('Location Quotient (Sudano)\nWashington, D.C. census tracts vs. "state"',
-                   subtitle = "Black non-Hispanic")
+lq2020DC <- tract2020DC %>%
+  left_join(lq2020DC$lq, by = 'GEOID')
+
+ggplot() +
+  geom_sf(
+    data = lq2020DC,
+    aes(fill = LQ),
+    color = 'white'
+  ) +
+  theme_bw() +
+  scale_fill_viridis_c() +
+  labs(
+    fill = 'Index (Continuous)',
+    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
+  ) +
+  ggtitle(
+    'Location Quotient (Sudano)\nWashington, D.C. census tracts vs. "state"',
+    subtitle = 'Black non-Hispanic'
+  )
 ```

 ![](man/figures/lq.png)
@@ -693,40 +930,98 @@ ggplot2::ggplot() +
 ## Selected interaction subgroup: Not Hispanic or Latino, Black or African American alone
 ## Selected large geography: state
 ## Selected small geography: census tract
-lexis2020DC <- bemanian_beyer(geo_large = "state", geo_small = "tract", state = "DC",
-                              year = 2020, subgroup = "NHoLB", subgroup_ixn = "NHoLW")
-
-# Obtain the 2020 census tracts from the "tigris" package
-tract2020DC <- tigris::tracts(state = "DC", year = 2020, cb = TRUE)
+lexis2020DC <- bemanian_beyer(
+  geo_large = 'state',
+  geo_small = 'tract',
+  state = 'DC',
+  year = 2020,
+  subgroup = 'NHoLB',
+  subgroup_ixn = 'NHoLW'
+)
+
+# Obtain the 2020 census tracts from the 'tigris' package
+tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE)

 # Join the LEx/Is (Bemanian & Beyer) values to the census tract geometry
-lexis2020DC <- dplyr::left_join(tract2020DC, lexis2020DC$lexis, by = "GEOID")
-
-ggplot2::ggplot() +
-  ggplot2::geom_sf(data = lexis2020DC,
-                   ggplot2::aes(fill = LExIs),
-                   color = "white") +
-  ggplot2::theme_bw() +
-  ggplot2::scale_fill_viridis_c() +
-  ggplot2::labs(fill = "Index (Continuous)",
-                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
-  ggplot2::ggtitle('Local Exposure and Isolation (Bemanian & Beyer) metric\nWashington, D.C. census block groups to tracts',
-                   subtitle = "Black non-Hispanic vs. 
white non-Hispanic")
+lexis2020DC <- tract2020DC %>%
+  left_join(lexis2020DC$lexis, by = 'GEOID')
+
+ggplot() +
+  geom_sf(
+    data = lexis2020DC,
+    aes(fill = LExIs),
+    color = 'white'
+  ) +
+  theme_bw() +
+  scale_fill_viridis_c() +
+  labs(
+    fill = 'Index (Continuous)',
+    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
+  ) +
+  ggtitle(
+    'Local Exposure and Isolation (Bemanian & Beyer) metric\nWashington, D.C. census tracts vs. "state"',
+    subtitle = 'Black non-Hispanic vs. white non-Hispanic'
+  )
 ```

 ![](man/figures/lexis.png)

+```r
+# --------------------------------------------- #
+# Compute aspatial racial/ethnic Delta (Hoover) #
+# --------------------------------------------- #
+
+# Delta based on Hoover (1941) and Duncan et al. (1961)
+## Selected subgroup: Not Hispanic or Latino, Black or African American alone
+## Selected large geography: census tract
+## Selected small geography: census block group
+del2020DC <- hoover(
+  geo_large = 'tract',
+  geo_small = 'block group',
+  state = 'DC',
+  year = 2020,
+  subgroup = 'NHoLB'
+)
+
+# Obtain the 2020 census tracts from the 'tigris' package
+tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE)
+
+# Join the DEL (Hoover) values to the census tract geometry
+del2020DC <- tract2020DC %>%
+  left_join(del2020DC$del, by = 'GEOID')
+
+ggplot() +
+  geom_sf(
+    data = del2020DC,
+    aes(fill = DEL),
+    color = 'white'
+  ) +
+  theme_bw() +
+  scale_fill_viridis_c(limits = c(0, 1)) +
+  labs(
+    fill = 'Index (Continuous)',
+    caption = 'Source: U.S. Census ACS 2016-2020 estimates'
+  ) +
+  ggtitle(
+    'Delta (Hoover)\nWashington, D.C. census block groups to tracts',
+    subtitle = 'Black non-Hispanic'
+  )
+```
+
+![](man/figures/del.png)
+
 ### Funding

-This package was originally developed while the author was a postdoctoral fellow supported by the [Cancer Prevention Fellowship Program](https://cpfp.cancer.gov) at the [National Cancer Institute](https://www.cancer.gov). 
Any modifications since December 05, 2022 were made while the author was an employee of Social & Scientific Systems, Inc., a division of [DLH Corporation](https://www.dlhcorp.com). +This package was originally developed while the author was a postdoctoral fellow supported by the [Cancer Prevention Fellowship Program](https://cpfp.cancer.gov) at the [National Cancer Institute](https://www.cancer.gov). Any modifications since December 05, 2022 were made while the author was an employee of Social & Scientific Systems, Inc., a [DLH Corporation](https://www.dlhcorp.com) Holding Company. ### Acknowledgments -The `messer()` function functionalizes the code found in [Hruska _et al._ (2022)](https://doi.org/10.1016/j.janxdis.2022.102529) available on an [OSF repository](https://doi.org/10.17605/OSF.IO/M2SAV), but with percent with income less than $30K added to the computation based on [Messer _et al._ (2006)](https://doi.org/10.1007/s11524-006-9094-x). The `messer()` function also allows for the computation of NDI (Messer) for each year between 2010-2020 (when the U.S. census characteristics are available to date). There was no code companion to compute NDI (Powell-Wiley) included in [Andrews _et al._ (2020)](https://doi.org/10.1080/17445647.2020.1750066) or [Slotman _et al._ (2022)](https://doi.org/10.1016/j.dib.2022.108002), but the package author worked directly with the latter manuscript authors to replicate their `SAS` code in `R` for the `powell_wiley()` function. Please note: the NDI (Powell-Wiley) values will not exactly match (but will highly correlate with) those found in [Andrews _et al._ (2020)](https://doi.org/10.1080/17445647.2020.1750066) and [Slotman _et al._ (2022)](https://doi.org/10.1016/j.dib.2022.108002) because the two studies used a different statistical platform (i.e., `SPSS` and `SAS`, respectively) that intrinsically calculate the principal component analysis differently from `R`. 
The internal function to calculate the Atkinson Index is based on the `Atkinson()` function in the [DescTools](https://cran.r-project.org/package=DescTools) package.
+The `messer()` function functionalizes the code found in [Hruska et al. (2022)](https://doi.org/10.1016/j.janxdis.2022.102529) available on an [OSF repository](https://doi.org/10.17605/OSF.IO/M2SAV), but with percent with income less than $30K added to the computation based on [Messer et al. (2006)](https://doi.org/10.1007/s11524-006-9094-x). The `messer()` function also allows for the computation of NDI (Messer) for each year from 2010 through 2020 (the years for which the U.S. census characteristics are available to date). There was no code companion to compute NDI (Powell-Wiley) included in [Andrews et al. (2020)](https://doi.org/10.1080/17445647.2020.1750066) or [Slotman et al. (2022)](https://doi.org/10.1016/j.dib.2022.108002), but the package author worked directly with the latter manuscript authors to replicate their `SAS` code in `R` for the `powell_wiley()` function. Please note: the NDI (Powell-Wiley) values will not exactly match (but will highly correlate with) those found in [Andrews et al. (2020)](https://doi.org/10.1080/17445647.2020.1750066) and [Slotman et al. (2022)](https://doi.org/10.1016/j.dib.2022.108002) because the two studies used different statistical platforms (i.e., `SPSS` and `SAS`, respectively) that intrinsically calculate the principal component analysis differently from `R`. The internal function to calculate the Atkinson Index is based on the `Atkinson()` function in the [DescTools](https://cran.r-project.org/package=DescTools) package.

 When citing this package for publication, please follow:

-    citation("ndi")
+    citation('ndi')

 ### Questions? Feedback?

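The Delta (Hoover) example added to the README above maps one value per census tract. As a rough, hand-computed illustration of what that value represents, Delta is commonly written as DEL = 1/2 * sum_i |x_i / X - a_i / A|, where x_i and a_i are the subgroup count and land area of smaller geography i, and X and A are their totals within the larger geography. The sketch below is base R on made-up numbers; the helper name `delta_sketch()` and the toy vectors are assumptions for illustration and are not taken from the package's internal code.

```r
# Hypothetical helper (not part of 'ndi'): Delta for ONE larger geography
# x = subgroup population count in each smaller geography
# a = land area of each smaller geography (any consistent unit)
delta_sketch <- function(x, a) {
  stopifnot(length(x) == length(a), all(x >= 0), all(a > 0))
  # With no subgroup members the population shares are undefined
  if (sum(x) == 0) {
    return(NA_real_)
  }
  # Half the summed absolute gap between population share and area share
  0.5 * sum(abs(x / sum(x) - a / sum(a)))
}

# Toy data (made up): three block groups within a single tract
x <- c(100, 50, 50) # subgroup counts
a <- c(1, 1, 2)     # land areas
delta_sketch(x, a)
#> [1] 0.25
```

With these toy values the population and area shares differ by 0.25 in two of the three units, so the result is 0.25; a subgroup spread in exact proportion to land area would give 0.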
diff --git a/cran-comments.md b/cran-comments.md
index e8c6374..a80e2cc 100644
--- a/cran-comments.md
+++ b/cran-comments.md
@@ -1,10 +1,12 @@
-## This is the sixth resubmission
+## This is the seventh resubmission

 * Actions taken since previous submission:
-  * 'DescTools' is now Suggests to fix Rd cross-references NOTE
-  * Fixed 'lost braces in \itemize' NOTE for `anthopolos()`, `atkinson()`, `bell()`, `bemanian_beyer()`, `bravo()`, `duncan()`, `krieger()`, `messer()`, `powell_wiley()`, `sudano()`, and `white()` functions
-  * Fixed 'Moved Permanently' content by replacing the old URL with the new URL
-  * Fixed citation for Slotman _et al._ (2022) in CITATION
+  * Added `hoover()` function to compute the aspatial racial/ethnic Delta (DEL) based on [Hoover (1941)](https://doi.org/10.1017/S0022050700052980) and Duncan et al. (1961; LC:60007089)
+  * Fixed bug in `bell()`, `bemanian_beyer()`, `duncan()`, `sudano()`, and `white()` when a smaller geography contains n=0 total population; the functions now assign a value of zero (0) in the internal calculation instead of NA
+  * 'package.R' deprecated. Replaced with 'ndi-package.R'.
+  * Re-formatted code and documentation throughout for consistent readability
+  * Updated documentation about value range of V (White) from `{0 to 1}` to `{-Inf to Inf}`
+  * Updated examples in vignette (& README) to include an example for `hoover()` and a larger variety of U.S. 
states * Documentation for DESCRIPTION, README, NEWS, and vignette references the following DOIs, which throws a NOTE but are a valid URL: * @@ -15,10 +17,10 @@ * * -* Some tests and examples for `anthopolos()`, `atkinson()`, `bell()`, `bemanian_beyer()`, `bravo()`, `duncan()`, `gini()`, `krieger()`, `messer()`, `powell_wiley()`, `sudano()`, and `white()` functions require a Census API key so they are skipped if NULL or not run +* Some tests and examples for `anthopolos()`, `atkinson()`, `bell()`, `bemanian_beyer()`, `bravo()`, `duncan()`, `gini()`, `hoover()`, `krieger()`, `messer()`, `powell_wiley()`, `sudano()`, and `white()` functions require a Census API key so they are skipped if NULL or not run ## Test environments -* local Windows install, R 4.2.1 +* local Windows install, R 4.4.0 * win-builder, (devel, release, oldrelease) * Rhub * Fedora Linux, R-devel, clang, gfortran diff --git a/data-raw/get_DCtracts2020.R b/data-raw/get_DCtracts2020.R index cc25284..bb881c1 100644 --- a/data-raw/get_DCtracts2020.R +++ b/data-raw/get_DCtracts2020.R @@ -1,147 +1,216 @@ -# code to prepare `DCtracts2020` +# ----------------------------------------------------------------------------------------------- # +# Code to prepare `DCtracts2020` +# ----------------------------------------------------------------------------------------------- # +# +# Created by: Ian Buller, Ph.D., M.A. 
(GitHub: @idblr) +# Created on: 2022-07-23 +# +# Recently modified by: @idblr +# Recently modified on: 2024-07-06 +# +# Notes: +# A) 2024-07-06 (@idblr): Re-formatted +# ----------------------------------------------------------------------------------------------- # # ------------------ # -# Necessary packages # +# NECESSARY PACKAGES # # ------------------ # -library(dplyr) -library(tidycensus) -library(usethis) +loadedPackages <- c('dplyr', 'tidycensus', 'usethis') +suppressMessages(invisible(lapply(loadedPackages, library, character.only = TRUE))) # -------- # -# Settings # +# SETTINGS # # -------- # ## Access Key for census data download ### Obtain one at http://api.census.gov/data/key_signup.html -tidycensus::census_api_key("...") # INSERT YOUR OWN KEY FROM U.S. CENSUS API +census_api_key('...') # INSERT YOUR OWN KEY FROM U.S. CENSUS API # ---------------- # -# Data preparation # +# DATA PREPARATION # # ---------------- # # U.S. Census Bureau American Community Survey (ACS) 5-year variables ## For NDI (Messer) ### ACS-5 variables -messer_vars <- c(PctMenMgmtBusScArti_num1 = "C24030_018", PctMenMgmtBusScArti_num2 = "C24030_019", - PctMenMgmtBusScArti_den = "C24030_002", - PctCrwdHH_num1 = "B25014_005", PctCrwdHH_num2 = "B25014_006", - PctCrwdHH_num3 = "B25014_007", PctCrwdHH_num4 = "B25014_011", - PctCrwdHH_num5 = "B25014_012", PctCrwdHH_num6 = "B25014_013", - PctCrwdHH_den = "B25014_001", - PctHHPov_num = "B17017_002", PctHHPov_den = "B17017_001", - PctFemHeadKids_num1 = "B25115_012", PctFemHeadKids_num2 = "B25115_025", - PctFemHeadKids_den = "B25115_001", - PctPubAsst_num = "B19058_002", PctPubAsst_den = "B19058_001", - PctHHUnder30K_num1 = "B19001_002", PctHHUnder30K_num2 = "B19001_003", - PctHHUnder30K_num3 = "B19001_004", PctHHUnder30K_num4 = "B19001_005", - PctHHUnder30K_num5 = "B19001_006", PctHHUnder30K_den = "B19001_001", - PctEducLessThanHS_num = "B06009_002", PctEducLessThanHS_den = "B06009_001", - PctUnemp_num = "B23025_005", PctUnemp_den = 
"B23025_003") +messer_vars <- c( + PctMenMgmtBusScArti_num1 = 'C24030_018', + PctMenMgmtBusScArti_num2 = 'C24030_019', + PctMenMgmtBusScArti_den = 'C24030_002', + PctCrwdHH_num1 = 'B25014_005', + PctCrwdHH_num2 = 'B25014_006', + PctCrwdHH_num3 = 'B25014_007', + PctCrwdHH_num4 = 'B25014_011', + PctCrwdHH_num5 = 'B25014_012', + PctCrwdHH_num6 = 'B25014_013', + PctCrwdHH_den = 'B25014_001', + PctHHPov_num = 'B17017_002', + PctHHPov_den = 'B17017_001', + PctFemHeadKids_num1 = 'B25115_012', + PctFemHeadKids_num2 = 'B25115_025', + PctFemHeadKids_den = 'B25115_001', + PctPubAsst_num = 'B19058_002', + PctPubAsst_den = 'B19058_001', + PctHHUnder30K_num1 = 'B19001_002', + PctHHUnder30K_num2 = 'B19001_003', + PctHHUnder30K_num3 = 'B19001_004', + PctHHUnder30K_num4 = 'B19001_005', + PctHHUnder30K_num5 = 'B19001_006', + PctHHUnder30K_den = 'B19001_001', + PctEducLessThanHS_num = 'B06009_002', + PctEducLessThanHS_den = 'B06009_001', + PctUnemp_num = 'B23025_005', + PctUnemp_den = 'B23025_003' +) ### Obtain ACS-5 data for DC tracts in 2020 -DCtracts2020messer <- tidycensus::get_acs(geography = "tract", - year = 2020, - output = "wide", - variables = messer_vars, - state = "DC") +DCtracts2020messer <- get_acs( + geography = 'tract', + year = 2020, + output = 'wide', + variables = messer_vars, + state = 'DC' +) ### Format ACS-5 data for NDI (Messer) of DC tracts in 2020 DCtracts2020messer <- DCtracts2020messer[ , -2] # omit NAME feature (column) DCtracts2020messer <- DCtracts2020messer %>% - dplyr::mutate(OCC = (PctMenMgmtBusScArti_num1E + PctMenMgmtBusScArti_num2E) / PctMenMgmtBusScArti_denE, - CWD = (PctCrwdHH_num1E + PctCrwdHH_num2E + PctCrwdHH_num3E + - PctCrwdHH_num4E + PctCrwdHH_num5E + PctCrwdHH_num6E) / PctCrwdHH_denE, - POV = PctHHPov_numE / PctHHPov_denE, - FHH = (PctFemHeadKids_num1E + PctFemHeadKids_num2E) / PctFemHeadKids_denE, - PUB = PctPubAsst_numE / PctPubAsst_denE, - U30 = (PctHHUnder30K_num1E + PctHHUnder30K_num2E + PctHHUnder30K_num3E + - PctHHUnder30K_num4E + 
PctHHUnder30K_num5E) / PctHHUnder30K_denE, - EDU = PctEducLessThanHS_numE / PctEducLessThanHS_denE, - EMP = PctUnemp_numE / PctUnemp_denE) + mutate( + OCC = (PctMenMgmtBusScArti_num1E + PctMenMgmtBusScArti_num2E) / PctMenMgmtBusScArti_denE, + CWD = ( + PctCrwdHH_num1E + PctCrwdHH_num2E + PctCrwdHH_num3E + + PctCrwdHH_num4E + PctCrwdHH_num5E + PctCrwdHH_num6E + ) / PctCrwdHH_denE, + POV = PctHHPov_numE / PctHHPov_denE, + FHH = (PctFemHeadKids_num1E + PctFemHeadKids_num2E) / PctFemHeadKids_denE, + PUB = PctPubAsst_numE / PctPubAsst_denE, + U30 = ( + PctHHUnder30K_num1E + PctHHUnder30K_num2E + PctHHUnder30K_num3E + + PctHHUnder30K_num4E + PctHHUnder30K_num5E + ) / PctHHUnder30K_denE, + EDU = PctEducLessThanHS_numE / PctEducLessThanHS_denE, + EMP = PctUnemp_numE / PctUnemp_denE + ) ### Clean-up and format DCtracts2020messer <- DCtracts2020messer %>% - dplyr::select(GEOID, OCC, CWD, POV, FHH, PUB, U30, EDU, EMP) + select(GEOID, OCC, CWD, POV, FHH, PUB, U30, EDU, EMP) ## For NDI (Powell-Wiley) ### ACS-5 variables -powell_wiley_vars <- c(MedHHInc = "B19013_001", - PctRecvIDR_num = "B19054_002", PctRecvIDR_den = "B19054_001", - PctPubAsst_num = "B19058_002", PctPubAsst_den = "B19058_001", - MedHomeVal = "B25077_001", - PctMgmtBusScArti_num = "C24060_002", PctMgmtBusScArti_den = "C24060_001", - PctFemHeadKids_num1 = "B11005_007", PctFemHeadKids_num2 = "B11005_010", - PctFemHeadKids_den = "B11005_001", - PctOwnerOcc = "DP04_0046P", - PctNoPhone = "DP04_0075P", - PctNComPlmb = "DP04_0073P", - PctEduc_num25upHS = "S1501_C01_009", - PctEduc_num25upSC = "S1501_C01_010", - PctEduc_num25upAD = "S1501_C01_011", - PctEduc_num25upBD = "S1501_C01_012", - PctEduc_num25upGD = "S1501_C01_013", - PctEduc_den25up = "S1501_C01_006", - PctFamBelowPov = "S1702_C02_001", - PctUnempl = "S2301_C04_001", - TotalPopulation = "B01001_001") +powell_wiley_vars <- c( + MedHHInc = 'B19013_001', + PctRecvIDR_num = 'B19054_002', + PctRecvIDR_den = 'B19054_001', + PctPubAsst_num = 'B19058_002', + 
PctPubAsst_den = 'B19058_001', + MedHomeVal = 'B25077_001', + PctMgmtBusScArti_num = 'C24060_002', + PctMgmtBusScArti_den = 'C24060_001', + PctFemHeadKids_num1 = 'B11005_007', + PctFemHeadKids_num2 = 'B11005_010', + PctFemHeadKids_den = 'B11005_001', + PctOwnerOcc = 'DP04_0046P', + PctNoPhone = 'DP04_0075P', + PctNComPlmb = 'DP04_0073P', + PctEduc_num25upHS = 'S1501_C01_009', + PctEduc_num25upSC = 'S1501_C01_010', + PctEduc_num25upAD = 'S1501_C01_011', + PctEduc_num25upBD = 'S1501_C01_012', + PctEduc_num25upGD = 'S1501_C01_013', + PctEduc_den25up = 'S1501_C01_006', + PctFamBelowPov = 'S1702_C02_001', + PctUnempl = 'S2301_C04_001', + TotalPopulation = 'B01001_001' +) ### Obtain ACS-5 data for DC tracts in 2020 -DCtracts2020pw <- tidycensus::get_acs(geography = "tract", - year = 2020, - output = "wide", - variables = powell_wiley_vars, - state = "DC") +DCtracts2020pw <- get_acs( + geography = 'tract', + year = 2020, + output = 'wide', + variables = powell_wiley_vars, + state = 'DC' +) ### Format ACS-5 data for NDI (Powell-Wiley) of DC tracts in 2020 -DCtracts2020pw <- DCtracts2020pw[ , -2] # omit NAME feature (column) +DCtracts2020pw <- DCtracts2020pw[,-2] # omit NAME feature (column) DCtracts2020pw <- DCtracts2020pw %>% - dplyr::mutate(MedHHInc = MedHHIncE, - PctRecvIDR = PctRecvIDR_numE / PctRecvIDR_denE * 100, - PctPubAsst = PctPubAsst_numE / PctPubAsst_denE * 100, - MedHomeVal = MedHomeValE, - PctMgmtBusScArti = PctMgmtBusScArti_numE / PctMgmtBusScArti_denE * 100, - PctFemHeadKids = (PctFemHeadKids_num1E + PctFemHeadKids_num2E) / PctFemHeadKids_denE * 100, - PctOwnerOcc = PctOwnerOccE, - PctNoPhone = PctNoPhoneE, - PctNComPlmb = PctNComPlmbE, - PctEducHSPlus = (PctEduc_num25upHSE + PctEduc_num25upSCE + PctEduc_num25upADE + - PctEduc_num25upBDE + PctEduc_num25upGDE) / PctEduc_den25upE * 100, - PctEducBchPlus = (PctEduc_num25upBDE + PctEduc_num25upGDE) / PctEduc_den25upE * 100, - PctFamBelowPov = PctFamBelowPovE, - PctUnempl = PctUnemplE, - TotalPop = 
TotalPopulationE) %>% + mutate( + MedHHInc = MedHHIncE, + PctRecvIDR = PctRecvIDR_numE / PctRecvIDR_denE * 100, + PctPubAsst = PctPubAsst_numE / PctPubAsst_denE * 100, + MedHomeVal = MedHomeValE, + PctMgmtBusScArti = PctMgmtBusScArti_numE / PctMgmtBusScArti_denE * 100, + PctFemHeadKids = (PctFemHeadKids_num1E + PctFemHeadKids_num2E) / PctFemHeadKids_denE * 100, + PctOwnerOcc = PctOwnerOccE, + PctNoPhone = PctNoPhoneE, + PctNComPlmb = PctNComPlmbE, + PctEducHSPlus = ( + PctEduc_num25upHSE + PctEduc_num25upSCE + PctEduc_num25upADE + + PctEduc_num25upBDE + PctEduc_num25upGDE + ) / PctEduc_den25upE * 100, + PctEducBchPlus = (PctEduc_num25upBDE + PctEduc_num25upGDE) / PctEduc_den25upE * 100, + PctFamBelowPov = PctFamBelowPovE, + PctUnempl = PctUnemplE, + TotalPop = TotalPopulationE + ) %>% # Log transform median household income and median home value - # Reverse code percentages so that higher values represent more deprivation + # Reverse code percentages so that higher values represent more deprivation # Round percentages to 1 decimal place - dplyr::mutate(logMedHHInc = log(MedHHInc), - logMedHomeVal = log(MedHomeVal), - PctNoIDR = 100 - PctRecvIDR, - PctWorkClass = 100 - PctMgmtBusScArti, - PctNotOwnerOcc = 100 - PctOwnerOcc, - PctEducLTHS = 100 - PctEducHSPlus, - PctEducLTBch = 100 - PctEducBchPlus) %>% + mutate( + logMedHHInc = log(MedHHInc), + logMedHomeVal = log(MedHomeVal), + PctNoIDR = 100 - PctRecvIDR, + PctWorkClass = 100 - PctMgmtBusScArti, + PctNotOwnerOcc = 100 - PctOwnerOcc, + PctEducLTHS = 100 - PctEducHSPlus, + PctEducLTBch = 100 - PctEducBchPlus + ) %>% # Z-standardize the percentages - dplyr::mutate(PctNoIDRZ = scale(PctNoIDR), - PctPubAsstZ = scale(PctPubAsst), - PctWorkClassZ = scale(PctWorkClass), - PctFemHeadKidsZ = scale(PctFemHeadKids), - PctNotOwnerOccZ = scale(PctNotOwnerOcc), - PctNoPhoneZ = scale(PctNoPhone), - PctNComPlmbZ = scale(PctNComPlmb), - PctEducLTHSZ = scale(PctEducLTHS), - PctEducLTBchZ = scale(PctEducLTBch), - PctFamBelowPovZ = 
scale(PctFamBelowPov), - PctUnemplZ = scale(PctUnempl)) + mutate( + PctNoIDRZ = scale(PctNoIDR), + PctPubAsstZ = scale(PctPubAsst), + PctWorkClassZ = scale(PctWorkClass), + PctFemHeadKidsZ = scale(PctFemHeadKids), + PctNotOwnerOccZ = scale(PctNotOwnerOcc), + PctNoPhoneZ = scale(PctNoPhone), + PctNComPlmbZ = scale(PctNComPlmb), + PctEducLTHSZ = scale(PctEducLTHS), + PctEducLTBchZ = scale(PctEducLTBch), + PctFamBelowPovZ = scale(PctFamBelowPov), + PctUnemplZ = scale(PctUnempl) + ) ### Clean-up and format DCtracts2020pw <- DCtracts2020pw %>% - dplyr::select(GEOID, TotalPop, logMedHHInc, PctNoIDRZ, PctPubAsstZ, logMedHomeVal, PctWorkClassZ, - PctFemHeadKidsZ, PctNotOwnerOccZ, PctNoPhoneZ, PctNComPlmbZ, PctEducLTHSZ, - PctEducLTBchZ, PctFamBelowPovZ, PctUnemplZ) + select( + GEOID, + TotalPop, + logMedHHInc, + PctNoIDRZ, + PctPubAsstZ, + logMedHomeVal, + PctWorkClassZ, + PctFemHeadKidsZ, + PctNotOwnerOccZ, + PctNoPhoneZ, + PctNComPlmbZ, + PctEducLTHSZ, + PctEducLTBchZ, + PctFamBelowPovZ, + PctUnemplZ + ) -# Combine -DCtracts2020 <- dplyr::left_join(DCtracts2020messer, DCtracts2020pw, by = "GEOID") -DCtracts2020 <- DCtracts2020[ , c(1, 10, 2:9, 11:ncol(DCtracts2020))] # reorder so TotalPop is second feature (column) +# Combine +DCtracts2020 <- left_join(DCtracts2020messer, DCtracts2020pw, by = 'GEOID') +# reorder so TotalPop is second feature (column) +DCtracts2020 <- DCtracts2020[, c(1, 10, 2:9, 11:ncol(DCtracts2020))] -# Export -usethis::use_data(DCtracts2020, overwrite = TRUE) +# ---------------- # +# DATA EXPORTATION # +# ---------------- # + +use_data(DCtracts2020, overwrite = TRUE) + +# ----------------------------------------- END OF CODE ----------------------------------------- # diff --git a/dev/hex_ndi.R b/dev/hex_ndi.R index 192285a..58ee32d 100644 --- a/dev/hex_ndi.R +++ b/dev/hex_ndi.R @@ -1,72 +1,70 @@ -# ------------------------------------------------------------------------------ # -# Hexsticker for the GitHub Repository idblr/ndi -# 
------------------------------------------------------------------------------ # +# ----------------------------------------------------------------------------------------------- # +# Hexagon sticker for the GitHub Repository idblr/ndi +# ----------------------------------------------------------------------------------------------- # # # Created by: Ian Buller, Ph.D., M.A. (GitHub: @idblr) -# Created on: July 23, 2022 +# Created on: 2022-07-23 # # Recently modified by: @idblr -# Recently modified on: August 04, 2022 +# Recently modified on: 2024-07-06 # # Notes: -# A) Uses the "hexSticker" package +# A) Uses the 'hexSticker' package # B) Subplot from an example computation of tract-level NDI (Messer) for Washington, D.C. (2020) # C) Hexsticker for the GitHub Repository https://github.com/idblr/ndi -# ------------------------------------------------------------------------------ # +# ----------------------------------------------------------------------------------------------- # -############ +# -------- # # PACKAGES # -############ +# -------- # -loadedPackages <- c("hexSticker", "ndi") +loadedPackages <- c('ggplot2', 'hexSticker', 'ndi', 'tidycensus', 'tigris') suppressMessages(invisible(lapply(loadedPackages, library, character.only = TRUE))) -############ +# -------- # # SETTINGS # -############ +# -------- # ## Access Key for census data download ### Obtain one at http://api.census.gov/data/key_signup.html -tidycensus::census_api_key("...") # INSERT YOUR OWN KEY FROM U.S. CENSUS API +census_api_key('...') # INSERT YOUR OWN KEY FROM U.S. 
CENSUS API

-######################
+# ------------------ #
 # SUBPLOT GENERATION #
-######################
+# ------------------ #

 # NDI 2020
-messer2020DC <- ndi::messer(state = "DC", year = 2020, imp = TRUE)
+messer2020DC <- messer(state = 'DC', year = 2020, imp = TRUE)

 # Tracts 2020
-tract2020DC <- tigris::tracts(state = "DC", year = 2020, cb = TRUE)
+tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE)

 # Join
-DC2020messer <- merge(tract2020DC, messer2020DC$ndi, by = "GEOID")
+DC2020messer <- merge(tract2020DC, messer2020DC$ndi, by = 'GEOID')

 # Plot of tract-level NDI (Messer) for Washington, D.C. (2020)
-dcp <- ggplot2::ggplot() +
-  ggplot2::geom_sf(data = DC2020messer,
-                   ggplot2::aes(fill = NDI),
-                   color = NA,
-                   show.legend = FALSE) +
-  ggplot2::theme_void() +
-  ggplot2::theme(axis.text = ggplot2::element_blank()) +
-  ggplot2::scale_fill_viridis_c() +
-  ggplot2::labs(fill = "",
-                caption = "")+
-  ggplot2::ggtitle("", subtitle = "")
+dcp <- ggplot() +
+  geom_sf(data = DC2020messer, aes(fill = NDI), color = NA, show.legend = FALSE) +
+  theme_void() +
+  theme(axis.text = element_blank()) +
+  scale_fill_viridis_c() +
+  labs(fill = '', caption = '') +
+  ggtitle('', subtitle = '')

-#####################
-# CREATE HEXSTICKER #
-#####################
+# ---------------------- #
+# CREATE HEXAGON STICKER #
+# ---------------------- #

-s <- hexSticker::sticker(subplot = dcp,
-                         package = "ndi",
-                         p_size = 75, p_x = 0.55, p_y = 0.75, p_color = "#FDE724", # title
-                         s_x = 1.15, s_y = 1.05, s_width = 2.1, s_height = 2.1, # symbol
-                         h_fill = "#695488", # inside
-                         h_color = "#440C54", # outline
-                         dpi = 1000, # resolution
-                         filename = "man/figures/ndi.png",
-                         white_around_sticker = F)
+s <- sticker(
+  subplot = dcp,
+  package = 'ndi',
+  p_size = 75, p_x = 0.55, p_y = 0.75, p_color = '#FDE724', # title
+  s_x = 1.15, s_y = 1.05, s_width = 2.1, s_height = 2.1, # symbol
+  h_fill = '#695488', # inside
+  h_color = '#440C54', # outline
+  dpi = 1000, # resolution
+  filename = 
file.path('man', 'figures', 'ndi.png'),
+  white_around_sticker = FALSE
+)

-# -------------------------------- END OF CODE --------------------------------- #
+# ----------------------------------------- END OF CODE ----------------------------------------- #
diff --git a/inst/CITATION b/inst/CITATION
index 8a32f7c..76fb190 100755
--- a/inst/CITATION
+++ b/inst/CITATION
@@ -1,385 +1,421 @@
-bibentry(bibtype = "manual",
-         title = "ndi: Neighborhood Deprivation Indices",
-         author = c(as.person("Ian D. Buller")),
-         publisher = "The Comprehensive R Archive Network",
-         year = "2024",
-         number = "0.1.5",
-         doi = "10.5281/zenodo.6989030",
-         url = "https://cran.r-project.org/package=ndi",
+bibentry(bibtype = 'manual',
+         title = 'ndi: Neighborhood Deprivation Indices',
+         author = c(as.person('Ian D. Buller')),
+         publisher = 'The Comprehensive R Archive Network',
+         year = '2024',
+         number = '0.1.6.9000',
+         doi = '10.5281/zenodo.6989030',
+         url = 'https://cran.r-project.org/package=ndi',

          textVersion =
-           paste("Ian D. Buller (2024).",
-                 "ndi: Neighborhood Deprivation Indices.",
-                 "The Comprehensive R Archive Network.",
-                 "v0.1.5.",
-                 "DOI:10.5281/zenodo.6989030",
-                 "Accessed by: https://cran.r-project.org/package=ndi"),
+           paste('Ian D. Buller (2024).',
+                 'ndi: Neighborhood Deprivation Indices.',
+                 'The Comprehensive R Archive Network.',
+                 'v0.1.6.9000.',
+                 'DOI:10.5281/zenodo.6989030',
+                 'Accessed by: https://cran.r-project.org/package=ndi'),

-         header = "To cite ndi in publications, please use the following and include the version number and DOI:"
+         header = 'To cite ndi in publications, please use the following and include the version number and DOI:'
 )

-bibentry(bibtype = "Article",
-         title = "A spatial measure of neighborhood level racial isolation applied to low birthweight, preterm birth, and birthweight in North Carolina",
-         author = c(as.person("Rebecca Anthopolos"),
-                    as.person("Sherman A. James"),
-                    as.person("Alan E. 
Gelfand"), - as.person("Marie Lynn Miranda")), - journal = "Spatial and Spatio-temporal Epidemiology", - year = "2011", - volume = "2", - number = "4", - pages = "235--246", - doi = "10.1016/j.sste.2011.06.002", +bibentry(bibtype = 'Article', + title = 'A spatial measure of neighborhood level racial isolation applied to low birthweight, preterm birth, and birthweight in North Carolina', + author = c(as.person('Rebecca Anthopolos'), + as.person('Sherman A. James'), + as.person('Alan E. Gelfand'), + as.person('Marie Lynn Miranda')), + journal = 'Spatial and Spatio-temporal Epidemiology', + year = '2011', + volume = '2', + number = '4', + pages = '235--246', + doi = '10.1016/j.sste.2011.06.002', textVersion = - paste("Rebecca Anthopolos, Sherman A. James, Alan E. Gelfand, Marie Lynn Miranda (2011).", - "A spatial measure of neighborhood level racial isolation applied to low birthweight, preterm birth, and birthweight in North Carolina.", - "Spatial and Spatio-temporal Epidemiology, 2(4), 235-246.", - "DOI:10.1016/j.sste.2011.06.002"), + paste('Rebecca Anthopolos, Sherman A. James, Alan E. Gelfand, Marie Lynn Miranda (2011).', + 'A spatial measure of neighborhood level racial isolation applied to low birthweight, preterm birth, and birthweight in North Carolina.', + 'Spatial and Spatio-temporal Epidemiology, 2(4), 235-246.', + 'DOI:10.1016/j.sste.2011.06.002'), - header = "If you computed RI (Anthopolos) values, please also cite:" + header = 'If you computed RI (Anthopolos) values, please also cite:' ) -bibentry(bibtype = "Article", - title = "On the measurement of inequality", - author = c(as.person("Anthony B. Atkinson")), - journal = "Journal of economic theory", - year = "1970", - volume = "2", - number = "3", - pages = "244--263", - doi = "10.1016/0022-0531(70)90039-6", +bibentry(bibtype = 'Article', + title = 'On the measurement of inequality', + author = c(as.person('Anthony B. 
Atkinson')), + journal = 'Journal of economic theory', + year = '1970', + volume = '2', + number = '3', + pages = '244--263', + doi = '10.1016/0022-0531(70)90039-6', textVersion = - paste("Anthony B. Atkinson (1970).", - "On the measurement of inequality.", - "Journal of economic theory, 2(3), 244-263.", - "DOI:10.1016/0022-0531(70)90039-6"), + paste('Anthony B. Atkinson (1970).', + 'On the measurement of inequality.', + 'Journal of economic theory, 2(3), 244-263.', + 'DOI:10.1016/0022-0531(70)90039-6'), - header = "If you computed AI (Atkinson) values, please also cite:" + header = 'If you computed AI (Atkinson) values, please also cite:' ) -bibentry(bibtype = "Book", - title = "The Social Areas of Los Angeles: Analysis and Typology", - author = c(as.person("Eshref Shevky"), - as.person("Marilyn Williams")), - year = "1949", - edition = "1st edition", - city = "Los Angeles", - publisher = "John Randolph Haynes and Dora Haynes Foundation", - isbn = "978-0-837-15637-8", +bibentry(bibtype = 'Book', + title = 'The Social Areas of Los Angeles: Analysis and Typology', + author = c(as.person('Eshref Shevky'), + as.person('Marilyn Williams')), + year = '1949', + edition = '1st edition', + city = 'Los Angeles', + publisher = 'John Randolph Haynes and Dora Haynes Foundation', + isbn = '978-0-837-15637-8', textVersion = - paste("Eshref Shevky, Marilyn Williams (1949).", - "The Social Areas of Los Angeles: Analysis and Typology.", - "1st Ed.", - "Los Angeles:John Randolph Haynes and Dora Haynes Foundation.", - "ISBN-13:978-0-837-15637-8"), + paste('Eshref Shevky, Marilyn Williams (1949).', + 'The Social Areas of Los Angeles: Analysis and Typology.', + '1st Ed.', + 'Los Angeles:John Randolph Haynes and Dora Haynes Foundation.', + 'ISBN-13:978-0-837-15637-8'), - header = "If you computed II (Bell) values, please also cite (1):" + header = 'If you computed II (Bell) values, please also cite (1):' ) -bibentry(bibtype = "Article", - title = "A Probability Model for the Measurement 
of Ecological Segregation", - author = c(as.person("Wendell Bell")), - journal = "Social Forces", - year = "1954", - volume = "32", - issue = "4", - pages = "357--364", - doi = "10.2307/2574118", +bibentry(bibtype = 'Article', + title = 'A Probability Model for the Measurement of Ecological Segregation', + author = c(as.person('Wendell Bell')), + journal = 'Social Forces', + year = '1954', + volume = '32', + issue = '4', + pages = '357--364', + doi = '10.2307/2574118', textVersion = - paste("Wendell Bell (1954).", - "A Probability Model for the Measurement of Ecological Segregation.", - "Social Forces, 32(4), 357-364.", - "DOI:10.2307/2574118"), + paste('Wendell Bell (1954).', + 'A Probability Model for the Measurement of Ecological Segregation.', + 'Social Forces, 32(4), 357-364.', + 'DOI:10.2307/2574118'), - header = "And (2):" + header = 'And (2):' ) -bibentry(bibtype = "Article", - title = "Measures Matter: The Local Exposure/Isolation (LEx/Is) Metrics and Relationships between Local-Level Segregation and Breast Cancer Survival", - author = c(as.person("Amin Bemanian"), - as.person("Kirsten M.M. Beyer")), - journal = "Cancer Epidemiology, Biomarkers & Prevention", - year = "2017", - volume = "26", - issue = "4", - pages = "516--524", - doi = "10.1158/1055-9965.EPI-16-0926", +bibentry(bibtype = 'Article', + title = 'Measures Matter: The Local Exposure/Isolation (LEx/Is) Metrics and Relationships between Local-Level Segregation and Breast Cancer Survival', + author = c(as.person('Amin Bemanian'), + as.person('Kirsten M.M. Beyer')), + journal = 'Cancer Epidemiology, Biomarkers & Prevention', + year = '2017', + volume = '26', + issue = '4', + pages = '516--524', + doi = '10.1158/1055-9965.EPI-16-0926', textVersion = - paste("Amin Bemanian, Kirsten M.M. 
Beyer (2017).", - "Measures Matter: The Local Exposure/Isolation (LEx/Is) Metrics and Relationships between Local-Level Segregation and Breast Cancer Survival.", - "Cancer Epidemiology, Biomarkers & Prevention, 26(4), 516-524.", - "DOI:10.1158/1055-9965.EPI-16-0926"), + paste('Amin Bemanian, Kirsten M.M. Beyer (2017).', + 'Measures Matter: The Local Exposure/Isolation (LEx/Is) Metrics and Relationships between Local-Level Segregation and Breast Cancer Survival.', + 'Cancer Epidemiology, Biomarkers & Prevention, 26(4), 516-524.', + 'DOI:10.1158/1055-9965.EPI-16-0926'), - header = "If you computed LEx/Is (Bemanian & Beyer) values, please also cite:" + header = 'If you computed LEx/Is (Bemanian & Beyer) values, please also cite:' ) -bibentry(bibtype = "Article", - title = "Assessing Disparity Using Measures of Racial and Educational Isolation", - author = c(as.person("Mercedes A. Bravo"), - as.person("Man Chong Leong"), - as.person("Alan E. Gelfand"), - as.person("Marie Lynn Miranda")), - journal = "International Journal of Environmental Research and Public Health", - year = "2021", - volume = "18", - number = "17", - pages = "9384", - doi = "10.3390/ijerph18179384", +bibentry(bibtype = 'Article', + title = 'Assessing Disparity Using Measures of Racial and Educational Isolation', + author = c(as.person('Mercedes A. Bravo'), + as.person('Man Chong Leong'), + as.person('Alan E. Gelfand'), + as.person('Marie Lynn Miranda')), + journal = 'International Journal of Environmental Research and Public Health', + year = '2021', + volume = '18', + number = '17', + pages = '9384', + doi = '10.3390/ijerph18179384', textVersion = - paste("Mercedes A. Bravo, Man Chong Leong, Alan E. Gelfand, Marie Lynn Miranda (2021).", - "Assessing Disparity Using Measures of Racial and Educational Isolation.", - "International Journal of Environmental Research and Public Health, 18(17), 9384.", - "DOI:10.3390/ijerph18179384"), + paste('Mercedes A. Bravo, Man Chong Leong, Alan E. 
Gelfand, Marie Lynn Miranda (2021).', + 'Assessing Disparity Using Measures of Racial and Educational Isolation.', + 'International Journal of Environmental Research and Public Health, 18(17), 9384.', + 'DOI:10.3390/ijerph18179384'), - header = "If you computed EI (Bravo) values, please also cite:" + header = 'If you computed EI (Bravo) values, please also cite:' ) -bibentry(bibtype = "Article", - title = "A Methodological Analysis of Segregation Indexes", - author = c(as.person("Otis D. Duncan"), - as.person("Beverly Duncan")), - journal = "American Sociological Review", - year = "1955", - volume = "20", - number = "2", - pages = "210--217", - doi = "10.2307/2088328", +bibentry(bibtype = 'Article', + title = 'A Methodological Analysis of Segregation Indexes', + author = c(as.person('Otis D. Duncan'), + as.person('Beverly Duncan')), + journal = 'American Sociological Review', + year = '1955', + volume = '20', + number = '2', + pages = '210--217', + doi = '10.2307/2088328', textVersion = - paste("Otis D. Duncan, Beverly Duncan (1955).", - "A Methodological Analysis of Segregation Indexes.", - "American Sociological Review, 20(2), 210-217.", - "DOI:10.2307/2088328"), + paste('Otis D. 
Duncan, Beverly Duncan (1955).', + 'A Methodological Analysis of Segregation Indexes.', + 'American Sociological Review, 20(2), 210-217.', + 'DOI:10.2307/2088328'), - header = "If you computed DI (Duncan & Duncan) values, please also cite:" + header = 'If you computed DI (Duncan & Duncan) values, please also cite:' ) -bibentry(bibtype = "Article", - title = "Measurement of Inequality of Incomes", - author = c(as.person("Corrado Gini")), - journal = "The Economic Journal", - year = "1921", - volume = "31", - number = "121", - pages = "124--126", - doi = "10.2307/2223319", +bibentry(bibtype = 'Article', + title = 'Measurement of Inequality of Incomes', + author = c(as.person('Corrado Gini')), + journal = 'The Economic Journal', + year = '1921', + volume = '31', + number = '121', + pages = '124--126', + doi = '10.2307/2223319', textVersion = - paste("Corrado Gini (1921).", - "Measurement of Inequality of Incomes.", - "The Economic Journal, 31(121), 124-126.", - "DOI:10.2307/2223319"), + paste('Corrado Gini (1921).', + 'Measurement of Inequality of Incomes.', + 'The Economic Journal, 31(121), 124-126.', + 'DOI:10.2307/2223319'), - header = "If you retrieved Gini Index values, please also cite:" + header = 'If you retrieved Gini Index values, please also cite:' ) -bibentry(bibtype = "Article", - title = "Spatial social polarisation: using the Index of Concentration at the Extremes jointly for income and race/ethnicity to analyse risk of hypertension", - author = c(as.person("Justin M. Feldman"), - as.person("Pamela D. Waterman"), - as.person("Brent A. 
Coull"), - as.person("Nancy Krieger")), - journal = "Journal of Epidemiology and Community Health", - year = "2015", - volume = "69", - issue = "12", - pages = "1199--207", - doi = "10.1136/jech-2015-205728", +bibentry(bibtype = 'Article', + title = 'Spatial social polarisation: using the Index of Concentration at the Extremes jointly for income and race/ethnicity to analyse risk of hypertension', + author = c(as.person('Justin M. Feldman'), + as.person('Pamela D. Waterman'), + as.person('Brent A. Coull'), + as.person('Nancy Krieger')), + journal = 'Journal of Epidemiology and Community Health', + year = '2015', + volume = '69', + issue = '12', + pages = '1199--207', + doi = '10.1136/jech-2015-205728', textVersion = - paste("Justin M. Feldman, Pamela D. Waterman, Brent A. Coull, Nancy Krieger (2015).", - "Spatial social polarisation: using the Index of Concentration at the Extremes jointly for income and race/ethnicity to analyse risk of hypertension.", - "Journal of Epidemiology and Community Health, 69(12), 1199-207.", - "DOI:10.1136/jech-2015-205728"), + paste('Justin M. Feldman, Pamela D. Waterman, Brent A. Coull, Nancy Krieger (2015).', + 'Spatial social polarisation: using the Index of Concentration at the Extremes jointly for income and race/ethnicity to analyse risk of hypertension.', + 'Journal of Epidemiology and Community Health, 69(12), 1199-207.', + 'DOI:10.1136/jech-2015-205728'), - header = "If you computed ICE (Krieger) values, please also cite (1):" + header = 'If you computed ICE (Krieger) values, please also cite (1):' ) -bibentry(bibtype = "Article", - title = "Public Health Monitoring of Privilege and Deprivation With the Index of Concentration at the Extremes", - author = c(as.person("Nancy Krieger"), - as.person("Pamela D. 
Waterman"), - as.person("Jasmina Spasojevic"), - as.person("Wenhui Li"), - as.person("Wenhui Li"), - as.person("Gretchen Van Wye")), - journal = "American Journal of Public Health ", - year = "2016", - volume = "106", - issue = "2", - pages = "256--263", - doi = "10.2105/AJPH.2015.302955", +bibentry(bibtype = 'Article', + title = 'Public Health Monitoring of Privilege and Deprivation With the Index of Concentration at the Extremes', + author = c(as.person('Nancy Krieger'), + as.person('Pamela D. Waterman'), + as.person('Jasmina Spasojevic'), + as.person('Wenhui Li'), + as.person('Gil Maduro'), + as.person('Gretchen Van Wye')), + journal = 'American Journal of Public Health', + year = '2016', + volume = '106', + issue = '2', + pages = '256--263', + doi = '10.2105/AJPH.2015.302955', textVersion = - paste("Beth A. Slotman, David G Stinchcomb, Tiffany M. Powell-Wiley, Danielle M. Ostendorf, Brian E. Saelens, Amy A. Gorin, Shannon N. Zenk, David Berrigan (2016).", - "Public Health Monitoring of Privilege and Deprivation With the Index of Concentration at the Extremes.", - "American Journal of Public Health, 106(2), 256-263.", - "DOI:10.2105/AJPH.2015.302955"), + paste('Nancy Krieger, Pamela D. Waterman, Jasmina Spasojevic, Wenhui Li, Gil Maduro, Gretchen Van Wye (2016).', + 'Public Health Monitoring of Privilege and Deprivation With the Index of Concentration at the Extremes.', + 'American Journal of Public Health, 106(2), 256-263.', + 'DOI:10.2105/AJPH.2015.302955'), - header = "And (2):" + header = 'And (2):' ) -bibentry(bibtype = "Article", - title = "The development of a standardized neighborhood deprivation index", - author = c(as.person("Lynne C. Messer"), - as.person("Barbara A. Laraia"), - as.person("Jay S. 
Kaufman"), - as.person("Janet Eyster"), - as.person("Claudia Holzman"), - as.person("Jennifer Culhane"), - as.person("Irma Elo"), - as.person("Jessica Burke"), +bibentry(bibtype = 'Article', + title = 'The development of a standardized neighborhood deprivation index', + author = c(as.person('Lynne C. Messer'), + as.person('Barbara A. Laraia'), + as.person('Jay S. Kaufman'), + as.person('Janet Eyster'), + as.person('Claudia Holzman'), + as.person('Jennifer Culhane'), + as.person('Irma Elo'), + as.person('Jessica Burke'), as.person("Patricia O'Campo")), - journal = "Journal of Urban Health", - year = "2006", - volume = "83", - number = "6", - pages = "1041--1062", - doi = "10.1007/s11524-006-9094-x", + journal = 'Journal of Urban Health', + year = '2006', + volume = '83', + number = '6', + pages = '1041--1062', + doi = '10.1007/s11524-006-9094-x', textVersion = paste("Lynne C. Messer, Barbara A. Laraia, Jay S. Kaufman, Janet Eyster, Claudia Holzman, Jennifer Culhane, Irma Elo, Jessica Burke, Patricia O'Campo (2006).", - "The development of a standardized neighborhood deprivation index.", - "Journal of Urban Health, 83(6), 1041-1062.", - "DOI:10.1007/s11524-006-9094-x"), + 'The development of a standardized neighborhood deprivation index.', + 'Journal of Urban Health, 83(6), 1041-1062.', + 'DOI:10.1007/s11524-006-9094-x'), - header = "If you computed NDI (Messer) values, please also cite:" + header = 'If you computed NDI (Messer) values, please also cite:' ) -bibentry(bibtype = "Article", - title = "Geospatial analysis of neighborhood deprivation index (NDI) for the United States by county", - author = c(as.person("Marcus A. Andrews"), - as.person("Kosuke Tomura"), - as.person("Sophie E. Claudel"), - as.person("Samantha Xu"), - as.person("Joniqua N. Ceasar"), - as.person("Billy S. Collins"), - as.person("Steven Langerman"), - as.person("Valerie M. Mitchell"), - as.person("Yvonne Baumer"), - as.person("Tiffany M. 
Powell-Wiley")), - journal = "Journal of Maps", - year = "2020", - volume = "16", - issue = "1", - pages = "101--112", - doi = "10.1080/17445647.2020.1750066", +bibentry(bibtype = 'Article', + title = 'Geospatial analysis of neighborhood deprivation index (NDI) for the United States by county', + author = c(as.person('Marcus A. Andrews'), + as.person('Kosuke Tomura'), + as.person('Sophie E. Claudel'), + as.person('Samantha Xu'), + as.person('Joniqua N. Ceasar'), + as.person('Billy S. Collins'), + as.person('Steven Langerman'), + as.person('Valerie M. Mitchell'), + as.person('Yvonne Baumer'), + as.person('Tiffany M. Powell-Wiley')), + journal = 'Journal of Maps', + year = '2020', + volume = '16', + issue = '1', + pages = '101--112', + doi = '10.1080/17445647.2020.1750066', textVersion = - paste("Marcus A. Andrews, Kosuke Tomura, Sophie E. Claudel, Samantha Xu, Joniqua N. Ceasar, Billy S. Collins, Steven Langerman, Valerie M. Mitchell, Yvonne Baumer, Tiffany M. Powell-Wiley (2022).", - "Geospatial analysis of neighborhood deprivation index (NDI) for the United States by county.", - "Journal of Maps, 16(1), 101-112.", - "DOI:10.1080/17445647.2020.1750066"), + paste('Marcus A. Andrews, Kosuke Tomura, Sophie E. Claudel, Samantha Xu, Joniqua N. Ceasar, Billy S. Collins, Steven Langerman, Valerie M. Mitchell, Yvonne Baumer, Tiffany M. Powell-Wiley (2020).', + 'Geospatial analysis of neighborhood deprivation index (NDI) for the United States by county.', + 'Journal of Maps, 16(1), 101-112.', + 'DOI:10.1080/17445647.2020.1750066'), - header = "If you computed NDI (Powell-Wiley) values, please also cite (1):" + header = 'If you computed NDI (Powell-Wiley) values, please also cite (1):' ) -bibentry(bibtype = "Article", - title = "Environmental data and methods from the Accumulating Data to Optimally Predict Obesity Treatment (ADOPT) core measures environmental working group", - author = c(as.person("Beth A. Slotman"), - as.person("David G Stinchcomb"), - as.person("Tiffany M. 
Powell-Wiley"), - as.person("Danielle M. Ostendorf"), - as.person("Brian E. Saelens"), - as.person("Amy A. Gorin"), - as.person("Shannon N. Zenk"), - as.person("David Berrigan")), - journal = "Data in Brief", - year = "2022", - volume = "41", - pages = "108002", - doi = "10.1016/j.dib.2022.108002", +bibentry(bibtype = 'Article', + title = 'Environmental data and methods from the Accumulating Data to Optimally Predict Obesity Treatment (ADOPT) core measures environmental working group', + author = c(as.person('Beth A. Slotman'), + as.person('David G Stinchcomb'), + as.person('Tiffany M. Powell-Wiley'), + as.person('Danielle M. Ostendorf'), + as.person('Brian E. Saelens'), + as.person('Amy A. Gorin'), + as.person('Shannon N. Zenk'), + as.person('David Berrigan')), + journal = 'Data in Brief', + year = '2022', + volume = '41', + pages = '108002', + doi = '10.1016/j.dib.2022.108002', textVersion = - paste("Beth A. Slotman, David G Stinchcomb, Tiffany M. Powell-Wiley, Danielle M. Ostendorf, Brian E. Saelens, Amy A. Gorin, Shannon N. Zenk, David Berrigan (2022).", - "Environmental data and methods from the Accumulating Data to Optimally Predict Obesity Treatment (ADOPT) core measures environmental working group.", - "Data in Brief, 41, 108002.", - "DOI:10.1016/j.dib.2022.108002"), + paste('Beth A. Slotman, David G Stinchcomb, Tiffany M. Powell-Wiley, Danielle M. Ostendorf, Brian E. Saelens, Amy A. Gorin, Shannon N. Zenk, David Berrigan (2022).', + 'Environmental data and methods from the Accumulating Data to Optimally Predict Obesity Treatment (ADOPT) core measures environmental working group.', + 'Data in Brief, 41, 108002.', + 'DOI:10.1016/j.dib.2022.108002'), - header = "And (2):" + header = 'And (2):' ) -bibentry(bibtype = "Article", - title = "Social Structure and Anomie", - author = c(as.person("Robert K. 
Merton")), - journal = "American Sociological Review", - year = "1938", - volume = "3", - number = "5", - pages = "672--682", - doi = "10.2307/2084686 ", +bibentry(bibtype = 'Article', + title = 'Social Structure and Anomie', + author = c(as.person('Robert K. Merton')), + journal = 'American Sociological Review', + year = '1938', + volume = '3', + number = '5', + pages = '672--682', + doi = '10.2307/2084686 ', textVersion = - paste("Robert K. Merton (1938).", - "Social Structure and Anomie.", - "American Sociological Review, 3(5), 672-682.", - "DOI:10.2307/2084686 "), + paste('Robert K. Merton (1938).', + 'Social Structure and Anomie.', + 'American Sociological Review, 3(5), 672-682.', + 'DOI:10.2307/2084686 '), - header = "If you computed LQ (Sudano) values, please also cite (1):" + header = 'If you computed LQ (Sudano) values, please also cite (1):' ) -bibentry(bibtype = "Article", - title = "Neighborhood racial residential segregation and changes in health or death among older adults", - author = c(as.person("Joseph J. Sudano"), - as.person("Adam Perzynski"), - as.person("David W. Wong"), - as.person("Natalie Colabianchi"), - as.person("David Litaker")), - journal = "Health & Place", - year = "2013", - volume = "19", - pages = "80--88", - doi = "10.1016/j.healthplace.2012.09.015", +bibentry(bibtype = 'Article', + title = 'Neighborhood racial residential segregation and changes in health or death among older adults', + author = c(as.person('Joseph J. Sudano'), + as.person('Adam Perzynski'), + as.person('David W. Wong'), + as.person('Natalie Colabianchi'), + as.person('David Litaker')), + journal = 'Health & Place', + year = '2013', + volume = '19', + pages = '80--88', + doi = '10.1016/j.healthplace.2012.09.015', textVersion = - paste("Joseph J. Sudano, Adam Perzynski, David W. 
Wong, Natalie Colabianchi, David Litaker (2013).", - "Neighborhood racial residential segregation and changes in health or death among older adults.", - "Health & Place, 19, 80-88.", - "DOI:10.1016/j.healthplace.2012.09.015"), + paste('Joseph J. Sudano, Adam Perzynski, David W. Wong, Natalie Colabianchi, David Litaker (2013).', + 'Neighborhood racial residential segregation and changes in health or death among older adults.', + 'Health & Place, 19, 80-88.', + 'DOI:10.1016/j.healthplace.2012.09.015'), - header = "And (2):" + header = 'And (2):' ) -bibentry(bibtype = "Article", - title = "A Probability Model for the Measurement of Ecological Segregation", - author = c(as.person("Wendell Bell")), - journal = "Social Forces", - year = "1954", - volume = "32", - issue = "4", - pages = "357--364", - doi = "10.2307/2574118", +bibentry(bibtype = 'Article', + title = 'A Probability Model for the Measurement of Ecological Segregation', + author = c(as.person('Wendell Bell')), + journal = 'Social Forces', + year = '1954', + volume = '32', + issue = '4', + pages = '357--364', + doi = '10.2307/2574118', textVersion = - paste("Wendell Bell (1954).", - "A Probability Model for the Measurement of Ecological Segregation.", - "Social Forces, 32(4), 357-364.", - "DOI:10.2307/2574118"), + paste('Wendell Bell (1954).', + 'A Probability Model for the Measurement of Ecological Segregation.', + 'Social Forces, 32(4), 357-364.', + 'DOI:10.2307/2574118'), - header = "If you computed V (White) values, please also cite (1):" + header = 'If you computed V (White) values, please also cite (1):' ) -bibentry(bibtype = "Article", - title = "Segregation and Diversity Measures in Population Distribution", - author = c(as.person("Michael J. 
White")), - journal = "Population Index", - year = "1986", - volume = "52", - issue = "2", - pages = "198--221", - doi = "10.2307/3644339", +bibentry(bibtype = 'Article', + title = 'Segregation and Diversity Measures in Population Distribution', + author = c(as.person('Michael J. White')), + journal = 'Population Index', + year = '1986', + volume = '52', + issue = '2', + pages = '198--221', + doi = '10.2307/3644339', textVersion = - paste("Michael J. White (1986).", - "Segregation and Diversity Measures in Population Distribution.", - "Population Index, 52(2), 198-221.", - "DOI:10.2307/3644339"), + paste('Michael J. White (1986).', + 'Segregation and Diversity Measures in Population Distribution.', + 'Population Index, 52(2), 198-221.', + 'DOI:10.2307/3644339'), - header = "And (2):" + header = 'And (2):' +) + +bibentry(bibtype = 'Article', + title = 'Interstate Redistribution of Population, 1850-1940', + author = c(as.person('Edgar M. Hoover')), + journal = 'Journal of Economic History', + year = '1941', + volume = '1', + pages = '199--205', + doi = '10.2307/2223319', + + textVersion = + paste('Edgar M. Hoover (1941).', + 'Interstate Redistribution of Population, 1850-1940.', + 'Journal of Economic History, 1, 199-205.', + 'DOI:10.2307/2223319'), + + header = 'If you computed DEL (Hoover) values, please also cite (1):' +) + +bibentry(bibtype = 'Book', + title = 'Statistical Geography: Problems in Analyzing Area Data', + author = c(as.person('Otis D. Duncan'), + as.person('Ray P. Cuzzort'), + as.person('Beverly Duncan')), + year = '1961', + publisher = 'Free Press', + lc = '60007089', + + textVersion = + paste('Otis D. Duncan, Ray P. 
Cuzzort, & Beverly Duncan (1961).', + 'Statistical Geography: Problems in Analyzing Area Data.', + 'Free Press', + 'LC:60007089'), + + header = 'And (2):' ) diff --git a/man/anthopolos.Rd b/man/anthopolos.Rd index 26043dd..98d55e7 100644 --- a/man/anthopolos.Rd +++ b/man/anthopolos.Rd @@ -2,12 +2,12 @@ % Please edit documentation in R/anthopolos.R \name{anthopolos} \alias{anthopolos} -\title{Racial Isolation Index based on Anthopolos \emph{et al.} (2011)} +\title{Racial Isolation Index based on Anthopolos et al. (2011)} \usage{ anthopolos(geo = "tract", year = 2020, subgroup, quiet = FALSE, ...) } \arguments{ -\item{geo}{Character string specifying the geography of the data either census tracts \code{geo = "tract"} (the default) or counties \code{geo = "county"}.} +\item{geo}{Character string specifying the geography of the data either census tracts \code{geo = 'tract'} (the default) or counties \code{geo = 'county'}.} \item{year}{Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available.} @@ -29,30 +29,30 @@ An object of class 'list'. This is a named list with the following components: Compute the spatial Racial Isolation Index (Anthopolos) of selected subgroup(s). } \details{ -This function will compute the spatial Racial Isolation Index (RI) of U.S. census tracts or counties for a specified geographical extent (e.g., the entire U.S. or a single state) based on Anthopolos \emph{et al.} (2011) \doi{10.1016/j.sste.2011.06.002} who originally designed the metric for the racial isolation of non-Hispanic Black individuals. This function provides the computation of RI for any of the U.S. Census Bureau race/ethnicity subgroups (including Hispanic and non-Hispanic individuals). +This function will compute the spatial Racial Isolation Index (RI) of U.S. census tracts or counties for a specified geographical extent (e.g., the entire U.S. or a single state) based on Anthopolos et al. 
(2011) \doi{10.1016/j.sste.2011.06.002} who originally designed the metric for the racial isolation of non-Hispanic Black individuals. This function provides the computation of RI for any of the U.S. Census Bureau race/ethnicity subgroups (including Hispanic and non-Hispanic individuals). The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the geospatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available but are available from other U.S. Census Bureau surveys. The twenty racial/ethnic subgroups (U.S. Census Bureau definitions) are: \itemize{ -\item \strong{B03002_002}: not Hispanic or Latino \code{"NHoL"} -\item \strong{B03002_003}: not Hispanic or Latino, white alone\code{"NHoLW"} -\item \strong{B03002_004}: not Hispanic or Latino, Black or African American alone \code{"NHoLB"} -\item \strong{B03002_005}: not Hispanic or Latino, American Indian and Alaska Native alone \code{"NHoLAIAN"} -\item \strong{B03002_006}: not Hispanic or Latino, Asian alone \code{"NHoLA"} -\item \strong{B03002_007}: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"NHoLNHOPI"} -\item \strong{B03002_008}: not Hispanic or Latino, Some other race alone \code{"NHoLSOR"} -\item \strong{B03002_009}: not Hispanic or Latino, Two or more races \code{"NHoLTOMR"} -\item \strong{B03002_010}: not Hispanic or Latino, Two races including Some other race \code{"NHoLTRiSOR"} -\item \strong{B03002_011}: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"NHoLTReSOR"} -\item \strong{B03002_012}: Hispanic or Latino \code{"HoL"} -\item \strong{B03002_013}: Hispanic or Latino, white alone \code{"HoLW"} -\item \strong{B03002_014}: Hispanic or Latino, Black or African American alone \code{"HoLB"} -\item \strong{B03002_015}: Hispanic or Latino, American Indian and Alaska Native alone \code{"HoLAIAN"} 
-\item \strong{B03002_016}: Hispanic or Latino, Asian alone \code{"HoLA"} -\item \strong{B03002_017}: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"HoLNHOPI"} -\item \strong{B03002_018}: Hispanic or Latino, Some other race alone \code{"HoLSOR"} -\item \strong{B03002_019}: Hispanic or Latino, Two or more races \code{"HoLTOMR"} -\item \strong{B03002_020}: Hispanic or Latino, Two races including Some other race \code{"HoLTRiSOR"} -\item \strong{B03002_021}: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"HoLTReSOR"} +\item \strong{B03002_002}: not Hispanic or Latino \code{'NHoL'} +\item \strong{B03002_003}: not Hispanic or Latino, white alone\code{'NHoLW'} +\item \strong{B03002_004}: not Hispanic or Latino, Black or African American alone \code{'NHoLB'} +\item \strong{B03002_005}: not Hispanic or Latino, American Indian and Alaska Native alone \code{'NHoLAIAN'} +\item \strong{B03002_006}: not Hispanic or Latino, Asian alone \code{'NHoLA'} +\item \strong{B03002_007}: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'NHoLNHOPI'} +\item \strong{B03002_008}: not Hispanic or Latino, Some other race alone \code{'NHoLSOR'} +\item \strong{B03002_009}: not Hispanic or Latino, Two or more races \code{'NHoLTOMR'} +\item \strong{B03002_010}: not Hispanic or Latino, Two races including Some other race \code{'NHoLTRiSOR'} +\item \strong{B03002_011}: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'NHoLTReSOR'} +\item \strong{B03002_012}: Hispanic or Latino \code{'HoL'} +\item \strong{B03002_013}: Hispanic or Latino, white alone \code{'HoLW'} +\item \strong{B03002_014}: Hispanic or Latino, Black or African American alone \code{'HoLB'} +\item \strong{B03002_015}: Hispanic or Latino, American Indian and Alaska Native alone \code{'HoLAIAN'} +\item \strong{B03002_016}: Hispanic or Latino, Asian alone \code{'HoLA'} +\item \strong{B03002_017}: 
Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'HoLNHOPI'} +\item \strong{B03002_018}: Hispanic or Latino, Some other race alone \code{'HoLSOR'} +\item \strong{B03002_019}: Hispanic or Latino, Two or more races \code{'HoLTOMR'} +\item \strong{B03002_020}: Hispanic or Latino, Two races including Some other race \code{'HoLTRiSOR'} +\item \strong{B03002_021}: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'HoLTReSOR'} } Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. NOTE: Current version does not correct for edge effects (e.g., census geographies along the specified spatial extent border, coastline, or U.S.-Mexico / U.S.-Canada border) may have few neighboring census geographies, and RI values in these census geographies may be unstable. A stop-gap solution for the former source of edge effect is to compute the RI for neighboring census geographies (i.e., the states bordering a study area of interest) and then use the estimates of the study area of interest. @@ -62,15 +62,23 @@ A census geography (and its neighbors) that has nearly all of its population who \examples{ \dontrun{ # Wrapped in \dontrun{} because these examples require a Census API key. 
- + # Tract-level metric (2020) - anthopolos(geo = "tract", state = "GA", - year = 2020, subgroup = c("NHoLB", "HoLB")) - + anthopolos( + geo = 'tract', + state = 'GA', + year = 2020, + subgroup = c('NHoLB', 'HoLB') + ) + # County-level metric (2020) - anthopolos(geo = "county", state = "GA", - year = 2020, subgroup = c("NHoLB", "HoLB")) - + anthopolos( + geo = 'county', + state = 'GA', + year = 2020, + subgroup = c('NHoLB', 'HoLB') + ) + } } diff --git a/man/atkinson.Rd b/man/atkinson.Rd index 6950807..9d1f376 100644 --- a/man/atkinson.Rd +++ b/man/atkinson.Rd @@ -16,9 +16,9 @@ atkinson( ) } \arguments{ -\item{geo_large}{Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = "county"}.} +\item{geo_large}{Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = 'county'}.} -\item{geo_small}{Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_large = "tract"}.} +\item{geo_small}{Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_large = 'tract'}.} \item{year}{Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available.} @@ -47,47 +47,52 @@ Compute the aspatial Atkinson Index of income or selected racial/ethnic subgroup \details{ This function will compute the aspatial Atkinson Index (AI) of income or selected racial/ethnic subgroups and U.S. geographies for a specified geographical extent (e.g., the entire U.S. or a single state) based on Atkinson (1970) \doi{10.1016/0022-0531(70)90039-6}. This function provides the computation of AI for median household income and any of the U.S. Census Bureau race/ethnicity subgroups (including Hispanic and non-Hispanic individuals). -The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. 
Census Bureau 5-year American Community Survey characteristics used for the aspatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available but are available from other U.S. Census Bureau surveys. When \code{subgroup = "MedHHInc"}, the metric will be computed for median household income ("B19013_001"). The twenty racial/ethnic subgroups (U.S. Census Bureau definitions) are: +The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the aspatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available but are available from other U.S. Census Bureau surveys. When \code{subgroup = 'MedHHInc'}, the metric will be computed for median household income ('B19013_001'). The twenty racial/ethnic subgroups (U.S. Census Bureau definitions) are: \itemize{ -\item \strong{B03002_002}: not Hispanic or Latino \code{"NHoL"} -\item \strong{B03002_003}: not Hispanic or Latino, white alone \code{"NHoLW"} -\item \strong{B03002_004}: not Hispanic or Latino, Black or African American alone \code{"NHoLB"} -\item \strong{B03002_005}: not Hispanic or Latino, American Indian and Alaska Native alone \code{"NHoLAIAN"} -\item \strong{B03002_006}: not Hispanic or Latino, Asian alone \code{"NHoLA"} -\item \strong{B03002_007}: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"NHoLNHOPI"} -\item \strong{B03002_008}: not Hispanic or Latino, Some other race alone \code{"NHoLSOR"} -\item \strong{B03002_009}: not Hispanic or Latino, Two or more races \code{"NHoLTOMR"} -\item \strong{B03002_010}: not Hispanic or Latino, Two races including Some other race \code{"NHoLTRiSOR"} -\item \strong{B03002_011}: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"NHoLTReSOR"} -\item \strong{B03002_012}: Hispanic or Latino \code{"HoL"} -\item \strong{B03002_013}: Hispanic or 
Latino, white alone \code{"HoLW"} -\item \strong{B03002_014}: Hispanic or Latino, Black or African American alone \code{"HoLB"} -\item \strong{B03002_015}: Hispanic or Latino, American Indian and Alaska Native alone \code{"HoLAIAN"} -\item \strong{B03002_016}: Hispanic or Latino, Asian alone \code{"HoLA"} -\item \strong{B03002_017}: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"HoLNHOPI"} -\item \strong{B03002_018}: Hispanic or Latino, Some other race alone \code{"HoLSOR"} -\item \strong{B03002_019}: Hispanic or Latino, Two or more races \code{"HoLTOMR"} -\item \strong{B03002_020}: Hispanic or Latino, Two races including Some other race \code{"HoLTRiSOR"} -\item \strong{B03002_021}: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"HoLTReSOR"} +\item \strong{B03002_002}: not Hispanic or Latino \code{'NHoL'} +\item \strong{B03002_003}: not Hispanic or Latino, white alone \code{'NHoLW'} +\item \strong{B03002_004}: not Hispanic or Latino, Black or African American alone \code{'NHoLB'} +\item \strong{B03002_005}: not Hispanic or Latino, American Indian and Alaska Native alone \code{'NHoLAIAN'} +\item \strong{B03002_006}: not Hispanic or Latino, Asian alone \code{'NHoLA'} +\item \strong{B03002_007}: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'NHoLNHOPI'} +\item \strong{B03002_008}: not Hispanic or Latino, Some other race alone \code{'NHoLSOR'} +\item \strong{B03002_009}: not Hispanic or Latino, Two or more races \code{'NHoLTOMR'} +\item \strong{B03002_010}: not Hispanic or Latino, Two races including Some other race \code{'NHoLTRiSOR'} +\item \strong{B03002_011}: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'NHoLTReSOR'} +\item \strong{B03002_012}: Hispanic or Latino \code{'HoL'} +\item \strong{B03002_013}: Hispanic or Latino, white alone \code{'HoLW'} +\item \strong{B03002_014}: Hispanic or Latino, Black or African 
American alone \code{'HoLB'} +\item \strong{B03002_015}: Hispanic or Latino, American Indian and Alaska Native alone \code{'HoLAIAN'} +\item \strong{B03002_016}: Hispanic or Latino, Asian alone \code{'HoLA'} +\item \strong{B03002_017}: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'HoLNHOPI'} +\item \strong{B03002_018}: Hispanic or Latino, Some other race alone \code{'HoLSOR'} +\item \strong{B03002_019}: Hispanic or Latino, Two or more races \code{'HoLTOMR'} +\item \strong{B03002_020}: Hispanic or Latino, Two races including Some other race \code{'HoLTRiSOR'} +\item \strong{B03002_021}: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'HoLTReSOR'} } Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. AI is a measure of the evenness of residential inequality (e.g., racial/ethnic segregation) when comparing smaller geographical areas to larger ones within which the smaller geographical areas are located. The AI metric can range in value from 0 to 1 with smaller values indicating lower levels of inequality (e.g., less segregation). -The \code{epsilon} argument that determines how to weight the increments to inequality contributed by different proportions of the Lorenz curve. A user must explicitly decide how heavily to weight smaller geographical units at different points on the Lorenz curve (i.e., whether the index should take greater account of differences among areas of over- or under-representation). The \code{epsilon} argument must have values between 0 and 1.0. For \code{0 <= epsilon < 0.5} or less "inequality-averse," smaller geographical units with a subgroup proportion smaller than the subgroup proportion of the larger geographical unit contribute more to inequality ("over-representation"). 
For \code{0.5 < epsilon <= 1.0} or more "inequality-averse," smaller geographical units with a subgroup proportion larger than the subgroup proportion of the larger geographical unit contribute more to inequality ("under-representation"). If \code{epsilon = 0.5} (the default), units of over- and under-representation contribute equally to the index. See Section 2.3 of Saint-Jacques \emph{et al.} (2020) \doi{10.48550/arXiv.2002.05819} for one method to select \code{epsilon}. +The \code{epsilon} argument that determines how to weight the increments to inequality contributed by different proportions of the Lorenz curve. A user must explicitly decide how heavily to weight smaller geographical units at different points on the Lorenz curve (i.e., whether the index should take greater account of differences among areas of over- or under-representation). The \code{epsilon} argument must have values between 0 and 1.0. For \code{0 <= epsilon < 0.5} or less 'inequality-averse,' smaller geographical units with a subgroup proportion smaller than the subgroup proportion of the larger geographical unit contribute more to inequality ('over-representation'). For \code{0.5 < epsilon <= 1.0} or more 'inequality-averse,' smaller geographical units with a subgroup proportion larger than the subgroup proportion of the larger geographical unit contribute more to inequality ('under-representation'). If \code{epsilon = 0.5} (the default), units of over- and under-representation contribute equally to the index. See Section 2.3 of Saint-Jacques et al. (2020) \doi{10.48550/arXiv.2002.05819} for one method to select \code{epsilon}. -Larger geographies available include state \code{geo_large = "state"}, county \code{geo_large = "county"}, and census tract \code{geo_large = "tract"} levels. Smaller geographies available include, county \code{geo_small = "county"}, census tract \code{geo_small = "tract"}, and census block group \code{geo_small = "block group"} levels. 
If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the AI value returned is NA. +Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, and census tract \code{geo_large = 'tract'} levels. Smaller geographies available include, county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the AI value returned is NA. } \examples{ \dontrun{ # Wrapped in \dontrun{} because these examples require a Census API key. - + # Atkinson Index of non-Hispanic Black populations ## of census tracts within Georgia, U.S.A., counties (2020) - atkinson(geo_large = "county", geo_small = "tract", state = "GA", - year = 2020, subgroup = "NHoLB") - + atkinson( + geo_large = 'county', + geo_small = 'tract', + state = 'GA', + year = 2020, + subgroup = 'NHoLB' + ) + } } diff --git a/man/bell.Rd b/man/bell.Rd index a0743a4..58ee254 100644 --- a/man/bell.Rd +++ b/man/bell.Rd @@ -16,9 +16,9 @@ bell( ) } \arguments{ -\item{geo_large}{Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = "county"}.} +\item{geo_large}{Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = 'county'}.} -\item{geo_small}{Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_large = "tract"}.} +\item{geo_small}{Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_large = 'tract'}.} \item{year}{Numeric. The year to compute the estimate. 
The default is 2020, and the years 2009 onward are currently available.} @@ -49,43 +49,49 @@ This function will compute the aspatial Isolation Index (II) of selected racial/ The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the aspatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available but are available from other U.S. Census Bureau surveys. The twenty racial/ethnic subgroups (U.S. Census Bureau definitions) are: \itemize{ -\item \strong{B03002_002}: not Hispanic or Latino \code{"NHoL"} -\item \strong{B03002_003}: not Hispanic or Latino, white alone \code{"NHoLW"} -\item \strong{B03002_004}: not Hispanic or Latino, Black or African American alone \code{"NHoLB"} -\item \strong{B03002_005}: not Hispanic or Latino, American Indian and Alaska Native alone \code{"NHoLAIAN"} -\item \strong{B03002_006}: not Hispanic or Latino, Asian alone \code{"NHoLA"} -\item \strong{B03002_007}: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"NHoLNHOPI"} -\item \strong{B03002_008}: not Hispanic or Latino, Some other race alone \code{"NHoLSOR"} -\item \strong{B03002_009}: not Hispanic or Latino, Two or more races \code{"NHoLTOMR"} -\item \strong{B03002_010}: not Hispanic or Latino, Two races including Some other race \code{"NHoLTRiSOR"} -\item \strong{B03002_011}: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"NHoLTReSOR"} -\item \strong{B03002_012}: Hispanic or Latino \code{"HoL"} -\item \strong{B03002_013}: Hispanic or Latino, white alone \code{"HoLW"} -\item \strong{B03002_014}: Hispanic or Latino, Black or African American alone \code{"HoLB"} -\item \strong{B03002_015}: Hispanic or Latino, American Indian and Alaska Native alone \code{"HoLAIAN"} -\item \strong{B03002_016}: Hispanic or Latino, Asian alone \code{"HoLA"} -\item \strong{B03002_017}: Hispanic or Latino, 
Native Hawaiian and Other Pacific Islander alone \code{"HoLNHOPI"} -\item \strong{B03002_018}: Hispanic or Latino, Some other race alone \code{"HoLSOR"} -\item \strong{B03002_019}: Hispanic or Latino, Two or more races \code{"HoLTOMR"} -\item \strong{B03002_020}: Hispanic or Latino, Two races including Some other race \code{"HoLTRiSOR"} -\item \strong{B03002_021}: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"HoLTReSOR"} +\item \strong{B03002_002}: not Hispanic or Latino \code{'NHoL'} +\item \strong{B03002_003}: not Hispanic or Latino, white alone \code{'NHoLW'} +\item \strong{B03002_004}: not Hispanic or Latino, Black or African American alone \code{'NHoLB'} +\item \strong{B03002_005}: not Hispanic or Latino, American Indian and Alaska Native alone \code{'NHoLAIAN'} +\item \strong{B03002_006}: not Hispanic or Latino, Asian alone \code{'NHoLA'} +\item \strong{B03002_007}: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'NHoLNHOPI'} +\item \strong{B03002_008}: not Hispanic or Latino, Some other race alone \code{'NHoLSOR'} +\item \strong{B03002_009}: not Hispanic or Latino, Two or more races \code{'NHoLTOMR'} +\item \strong{B03002_010}: not Hispanic or Latino, Two races including Some other race \code{'NHoLTRiSOR'} +\item \strong{B03002_011}: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'NHoLTReSOR'} +\item \strong{B03002_012}: Hispanic or Latino \code{'HoL'} +\item \strong{B03002_013}: Hispanic or Latino, white alone \code{'HoLW'} +\item \strong{B03002_014}: Hispanic or Latino, Black or African American alone \code{'HoLB'} +\item \strong{B03002_015}: Hispanic or Latino, American Indian and Alaska Native alone \code{'HoLAIAN'} +\item \strong{B03002_016}: Hispanic or Latino, Asian alone \code{'HoLA'} +\item \strong{B03002_017}: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'HoLNHOPI'} +\item \strong{B03002_018}: Hispanic 
or Latino, Some other race alone \code{'HoLSOR'} +\item \strong{B03002_019}: Hispanic or Latino, Two or more races \code{'HoLTOMR'} +\item \strong{B03002_020}: Hispanic or Latino, Two races including Some other race \code{'HoLTRiSOR'} +\item \strong{B03002_021}: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'HoLTReSOR'} } Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. II is some measure of the probability that a member of one subgroup(s) will meet or interact with a member of another subgroup(s) with higher values signifying higher probability of interaction (less isolation). II can range in value from 0 to 1. -Larger geographies available include state \code{geo_large = "state"}, county \code{geo_large = "county"}, and census tract \code{geo_large = "tract"} levels. Smaller geographies available include, county \code{geo_small = "county"}, census tract \code{geo_small = "tract"}, and census block group \code{geo_small = "block group"} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the II value returned is NA. +Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, and census tract \code{geo_large = 'tract'} levels. Smaller geographies available include, county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the II value returned is NA. } \examples{ \dontrun{ # Wrapped in \dontrun{} because these examples require a Census API key. - + # Isolation of non-Hispanic Black vs. 
non-Hispanic white populations ## of census tracts within Georgia, U.S.A., counties (2020) - bell(geo_large = "county", geo_small = "tract", state = "GA", - year = 2020, subgroup = "NHoLB", subgroup_ixn = "NHoLW") - + bell( + geo_large = 'county', + geo_small = 'tract', + state = 'GA', + year = 2020, + subgroup = 'NHoLB', + subgroup_ixn = 'NHoLW' + ) + } } diff --git a/man/bemanian_beyer.Rd b/man/bemanian_beyer.Rd index 69bebf6..cf2a1c0 100644 --- a/man/bemanian_beyer.Rd +++ b/man/bemanian_beyer.Rd @@ -16,9 +16,9 @@ bemanian_beyer( ) } \arguments{ -\item{geo_large}{Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = "county"}.} +\item{geo_large}{Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = 'county'}.} -\item{geo_small}{Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_large = "tract"}.} +\item{geo_small}{Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_large = 'tract'}.} \item{year}{Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available.} @@ -49,26 +49,26 @@ This function will compute the aspatial Local Exposure and Isolation (LEx/Is) me The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the aspatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available but are available from other U.S. Census Bureau surveys. The twenty racial/ethnic subgroups (U.S. 
Census Bureau definitions) are: \itemize{ -\item \strong{B03002_002}: not Hispanic or Latino \code{"NHoL"} -\item \strong{B03002_003}: not Hispanic or Latino, white alone \code{"NHoLW"} -\item \strong{B03002_004}: not Hispanic or Latino, Black or African American alone \code{"NHoLB"} -\item \strong{B03002_005}: not Hispanic or Latino, American Indian and Alaska Native alone \code{"NHoLAIAN"} -\item \strong{B03002_006}: not Hispanic or Latino, Asian alone \code{"NHoLA"} -\item \strong{B03002_007}: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"NHoLNHOPI"} -\item \strong{B03002_008}: not Hispanic or Latino, Some other race alone \code{"NHoLSOR"} -\item \strong{B03002_009}: not Hispanic or Latino, Two or more races \code{"NHoLTOMR"} -\item \strong{B03002_010}: not Hispanic or Latino, Two races including Some other race \code{"NHoLTRiSOR"} -\item \strong{B03002_011}: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"NHoLTReSOR"} -\item \strong{B03002_012}: Hispanic or Latino \code{"HoL"} -\item \strong{B03002_013}: Hispanic or Latino, white alone \code{"HoLW"} -\item \strong{B03002_014}: Hispanic or Latino, Black or African American alone \code{"HoLB"} -\item \strong{B03002_015}: Hispanic or Latino, American Indian and Alaska Native alone \code{"HoLAIAN"} -\item \strong{B03002_016}: Hispanic or Latino, Asian alone \code{"HoLA"} -\item \strong{B03002_017}: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"HoLNHOPI"} -\item \strong{B03002_018}: Hispanic or Latino, Some other race alone \code{"HoLSOR"} -\item \strong{B03002_019}: Hispanic or Latino, Two or more races \code{"HoLTOMR"} -\item \strong{B03002_020}: Hispanic or Latino, Two races including Some other race \code{"HoLTRiSOR"} -\item \strong{B03002_021}: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"HoLTReSOR"} +\item \strong{B03002_002}: not Hispanic or Latino 
\code{'NHoL'} +\item \strong{B03002_003}: not Hispanic or Latino, white alone \code{'NHoLW'} +\item \strong{B03002_004}: not Hispanic or Latino, Black or African American alone \code{'NHoLB'} +\item \strong{B03002_005}: not Hispanic or Latino, American Indian and Alaska Native alone \code{'NHoLAIAN'} +\item \strong{B03002_006}: not Hispanic or Latino, Asian alone \code{'NHoLA'} +\item \strong{B03002_007}: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'NHoLNHOPI'} +\item \strong{B03002_008}: not Hispanic or Latino, Some other race alone \code{'NHoLSOR'} +\item \strong{B03002_009}: not Hispanic or Latino, Two or more races \code{'NHoLTOMR'} +\item \strong{B03002_010}: not Hispanic or Latino, Two races including Some other race \code{'NHoLTRiSOR'} +\item \strong{B03002_011}: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'NHoLTReSOR'} +\item \strong{B03002_012}: Hispanic or Latino \code{'HoL'} +\item \strong{B03002_013}: Hispanic or Latino, white alone \code{'HoLW'} +\item \strong{B03002_014}: Hispanic or Latino, Black or African American alone \code{'HoLB'} +\item \strong{B03002_015}: Hispanic or Latino, American Indian and Alaska Native alone \code{'HoLAIAN'} +\item \strong{B03002_016}: Hispanic or Latino, Asian alone \code{'HoLA'} +\item \strong{B03002_017}: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'HoLNHOPI'} +\item \strong{B03002_018}: Hispanic or Latino, Some other race alone \code{'HoLSOR'} +\item \strong{B03002_019}: Hispanic or Latino, Two or more races \code{'HoLTOMR'} +\item \strong{B03002_020}: Hispanic or Latino, Two races including Some other race \code{'HoLTRiSOR'} +\item \strong{B03002_021}: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'HoLTReSOR'} } Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data 
output. @@ -77,17 +77,23 @@ LEx/Is is a measure of the probability that two individuals living within a spec LEx/Is can range from negative infinity to infinity. If LEx/Is is zero then the estimated probability of the interaction between two people of the given subgroup(s) within a smaller geography is equal to the expected probability if the subgroup(s) were perfectly mixed in the larger geography. If LEx/Is is greater than zero then the interaction is more likely to occur within the smaller geography than in the larger geography, and if LEx/Is is less than zero then the interaction is less likely to occur within the smaller geography than in the larger geography. Note: the exponentiation of each LEx/Is metric results in the odds ratio of the specific exposure or isolation of interest in a smaller geography relative to the larger geography. -Larger geographies available include state \code{geo_large = "state"}, county \code{geo_large = "county"}, and census tract \code{geo_large = "tract"} levels. Smaller geographies available include, county \code{geo_small = "county"}, census tract \code{geo_small = "tract"}, and census block group \code{geo_small = "block group"} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the LEx/Is value returned is NA. +Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, and census tract \code{geo_large = 'tract'} levels. Smaller geographies available include, county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the LEx/Is value returned is NA. } \examples{ \dontrun{ # Wrapped in \dontrun{} because these examples require a Census API key. 
- + # Isolation of non-Hispanic Black vs. non-Hispanic white populations ## of census tracts within Georgia, U.S.A., counties (2020) - bemanian_beyer(geo_large = "county", geo_small = "tract", state = "GA", - year = 2020, subgroup = "NHoLB", subgroup_ixn = "NHoLW") - + bemanian_beyer( + geo_large = 'county', + geo_small = 'tract', + state = 'GA', + year = 2020, + subgroup = 'NHoLB', + subgroup_ixn = 'NHoLW' + ) + } } diff --git a/man/bravo.Rd b/man/bravo.Rd index 53727d9..396390a 100644 --- a/man/bravo.Rd +++ b/man/bravo.Rd @@ -2,12 +2,12 @@ % Please edit documentation in R/bravo.R \name{bravo} \alias{bravo} -\title{Educational Isolation Index based on Bravo \emph{et al.} (2021)} +\title{Educational Isolation Index based on Bravo et al. (2021)} \usage{ bravo(geo = "tract", year = 2020, subgroup, quiet = FALSE, ...) } \arguments{ -\item{geo}{Character string specifying the geography of the data either census tracts \code{geo = "tract"} (the default) or counties \code{geo = "county"}.} +\item{geo}{Character string specifying the geography of the data either census tracts \code{geo = 'tract'} (the default) or counties \code{geo = 'county'}.} \item{year}{Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available.} @@ -29,15 +29,15 @@ An object of class 'list'. This is a named list with the following components: Compute the spatial Educational Isolation Index (Bravo) of selected educational attainment category(ies). } \details{ -This function will compute the spatial Educational Isolation Index (EI) of U.S. census tracts or counties for a specified geographical extent (e.g., the entire U.S. or a single state) based on Bravo \emph{et al.} (2021) \doi{10.3390/ijerph18179384} who originally designed the metric for the educational isolation of individual without a college degree. This function provides the computation of EI for any of the U.S. Census Bureau educational attainment levels. 
+This function will compute the spatial Educational Isolation Index (EI) of U.S. census tracts or counties for a specified geographical extent (e.g., the entire U.S. or a single state) based on Bravo et al. (2021) \doi{10.3390/ijerph18179384} who originally designed the metric for the educational isolation of individual without a college degree. This function provides the computation of EI for any of the U.S. Census Bureau educational attainment levels. -The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the geospatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available but are available from other U.S. Census Bureau surveys. The five educational attainment levels (U.S. Census Bureau definitions) are: +The function uses the \code{\link[tidycensus]{get_acs}} to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the geospatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available but are available from other U.S. Census Bureau surveys. The five educational attainment levels (U.S. 
Census Bureau definitions) are: \itemize{ -\item \strong{B06009_002}: Less than high school graduate \code{"LtHS"} -\item \strong{B06009_003}: High school graduate (includes equivalency) \code{"HSGiE"} -\item \strong{B06009_004}: Some college or associate's degree \code{"SCoAD"} -\item \strong{B06009_005}: Bachelor's degree \code{"BD"} -\item \strong{B06009_006}: Graduate or professional degree \code{"GoPD"} +\item \strong{B06009_002}: Less than high school graduate \code{'LtHS'} +\item \strong{B06009_003}: High school graduate (includes equivalency) \code{'HSGiE'} +\item \strong{B06009_004}: Some college or associate's degree \code{'SCoAD'} +\item \strong{B06009_005}: Bachelor's degree \code{'BD'} +\item \strong{B06009_006}: Graduate or professional degree \code{'GoPD'} } Note: If \code{year = 2009}, then the ACS-5 data (2005-2009) are from the \strong{B15002} question. @@ -48,15 +48,23 @@ A census geography (and its neighbors) that has nearly all of its population wit \examples{ \dontrun{ # Wrapped in \dontrun{} because these examples require a Census API key. - + # Tract-level metric (2020) - bravo(geo = "tract", state = "GA", - year = 2020, subgroup = c("LtHS", "HSGiE")) - + bravo( + geo = 'tract', + state = 'GA', + year = 2020, + subgroup = c('LtHS', 'HSGiE') + ) + # County-level metric (2020) - bravo(geo = "county", state = "GA", - year = 2020, subgroup = c("LtHS", "HSGiE")) - + bravo( + geo = 'county', + state = 'GA', + year = 2020, + subgroup = c('LtHS', 'HSGiE') + ) + } } diff --git a/man/duncan.Rd b/man/duncan.Rd index b626abd..0c19fc5 100644 --- a/man/duncan.Rd +++ b/man/duncan.Rd @@ -16,9 +16,9 @@ duncan( ) } \arguments{ -\item{geo_large}{Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = "county"}.} +\item{geo_large}{Character string specifying the larger geographical unit of the data. 
The default is counties \code{geo_large = 'county'}.} -\item{geo_small}{Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_large = "tract"}.} +\item{geo_small}{Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_large = 'tract'}.} \item{year}{Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available.} @@ -49,43 +49,49 @@ This function will compute the aspatial Dissimilarity Index (DI) of selected rac The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the aspatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available but are available from other U.S. Census Bureau surveys. The twenty racial/ethnic subgroups (U.S. Census Bureau definitions) are: \itemize{ -\item \strong{B03002_002}: not Hispanic or Latino \code{"NHoL"} -\item \strong{B03002_003}: not Hispanic or Latino, white alone \code{"NHoLW"} -\item \strong{B03002_004}: not Hispanic or Latino, Black or African American alone \code{"NHoLB"} -\item \strong{B03002_005}: not Hispanic or Latino, American Indian and Alaska Native alone \code{"NHoLAIAN"} -\item \strong{B03002_006}: not Hispanic or Latino, Asian alone \code{"NHoLA"} -\item \strong{B03002_007}: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"NHoLNHOPI"} -\item \strong{B03002_008}: not Hispanic or Latino, Some other race alone \code{"NHoLSOR"} -\item \strong{B03002_009}: not Hispanic or Latino, Two or more races \code{"NHoLTOMR"} -\item \strong{B03002_010}: not Hispanic or Latino, Two races including Some other race \code{"NHoLTRiSOR"} -\item \strong{B03002_011}: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"NHoLTReSOR"} -\item \strong{B03002_012}: Hispanic or Latino 
\code{"HoL"} -\item \strong{B03002_013}: Hispanic or Latino, white alone \code{"HoLW"} -\item \strong{B03002_014}: Hispanic or Latino, Black or African American alone \code{"HoLB"} -\item \strong{B03002_015}: Hispanic or Latino, American Indian and Alaska Native alone \code{"HoLAIAN"} -\item \strong{B03002_016}: Hispanic or Latino, Asian alone \code{"HoLA"} -\item \strong{B03002_017}: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"HoLNHOPI"} -\item \strong{B03002_018}: Hispanic or Latino, Some other race alone \code{"HoLSOR"} -\item \strong{B03002_019}: Hispanic or Latino, Two or more races \code{"HoLTOMR"} -\item \strong{B03002_020}: Hispanic or Latino, Two races including Some other race \code{"HoLTRiSOR"} -\item \strong{B03002_021}: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"HoLTReSOR"} +\item \strong{B03002_002}: not Hispanic or Latino \code{'NHoL'} +\item \strong{B03002_003}: not Hispanic or Latino, white alone \code{'NHoLW'} +\item \strong{B03002_004}: not Hispanic or Latino, Black or African American alone \code{'NHoLB'} +\item \strong{B03002_005}: not Hispanic or Latino, American Indian and Alaska Native alone \code{'NHoLAIAN'} +\item \strong{B03002_006}: not Hispanic or Latino, Asian alone \code{'NHoLA'} +\item \strong{B03002_007}: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'NHoLNHOPI'} +\item \strong{B03002_008}: not Hispanic or Latino, Some other race alone \code{'NHoLSOR'} +\item \strong{B03002_009}: not Hispanic or Latino, Two or more races \code{'NHoLTOMR'} +\item \strong{B03002_010}: not Hispanic or Latino, Two races including Some other race \code{'NHoLTRiSOR'} +\item \strong{B03002_011}: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'NHoLTReSOR'} +\item \strong{B03002_012}: Hispanic or Latino \code{'HoL'} +\item \strong{B03002_013}: Hispanic or Latino, white alone \code{'HoLW'} +\item 
\strong{B03002_014}: Hispanic or Latino, Black or African American alone \code{'HoLB'} +\item \strong{B03002_015}: Hispanic or Latino, American Indian and Alaska Native alone \code{'HoLAIAN'} +\item \strong{B03002_016}: Hispanic or Latino, Asian alone \code{'HoLA'} +\item \strong{B03002_017}: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'HoLNHOPI'} +\item \strong{B03002_018}: Hispanic or Latino, Some other race alone \code{'HoLSOR'} +\item \strong{B03002_019}: Hispanic or Latino, Two or more races \code{'HoLTOMR'} +\item \strong{B03002_020}: Hispanic or Latino, Two races including Some other race \code{'HoLTRiSOR'} +\item \strong{B03002_021}: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'HoLTReSOR'} } Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. DI is a measure of the evenness of racial/ethnic residential segregation when comparing smaller geographical areas to larger ones within which the smaller geographical areas are located. DI can range in value from 0 to 1 and represents the proportion of racial/ethnic subgroup members that would have to change their area of residence to achieve an even distribution within the larger geographical area under conditions of maximum segregation. -Larger geographies available include state \code{geo_large = "state"}, county \code{geo_large = "county"}, and census tract \code{geo_large = "tract"} levels. Smaller geographies available include, county \code{geo_small = "county"}, census tract \code{geo_small = "tract"}, and census block group \code{geo_small = "block group"} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the DI value returned is NA. 
+Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, and census tract \code{geo_large = 'tract'} levels. Smaller geographies available include, county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the DI value returned is NA. } \examples{ \dontrun{ # Wrapped in \dontrun{} because these examples require a Census API key. - + # Dissimilarity Index of non-Hispanic Black vs. non-Hispanic white populations ## of census tracts within Georgia, U.S.A., counties (2020) - duncan(geo_large = "county", geo_small = "tract", state = "GA", - year = 2020, subgroup = "NHoLB", subgroup_ref = "NHoLW") - + duncan( + geo_large = 'county', + geo_small = 'tract', + state = 'GA', + year = 2020, + subgroup = 'NHoLB', + subgroup_ref = 'NHoLW' + ) + } } diff --git a/man/figures/del.png b/man/figures/del.png new file mode 100644 index 0000000..02ef683 Binary files /dev/null and b/man/figures/del.png differ diff --git a/man/gini.Rd b/man/gini.Rd index d9b1976..0a83558 100644 --- a/man/gini.Rd +++ b/man/gini.Rd @@ -7,7 +7,7 @@ gini(geo = "tract", year = 2020, quiet = FALSE, ...) } \arguments{ -\item{geo}{Character string specifying the geography of the data either census tracts \code{geo = "tract"} (the default) or counties \code{geo = "county"}.} +\item{geo}{Character string specifying the geography of the data either census tracts \code{geo = 'tract'} (the default) or counties \code{geo = 'county'}.} \item{year}{Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available.} @@ -33,18 +33,18 @@ The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. 
Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. -According to the U.S. Census Bureau \url{https://www.census.gov/topics/income-poverty/income-inequality/about/metrics/gini-index.html}: "The Gini Index is a summary measure of income inequality. The Gini coefficient incorporates the detailed shares data into a single statistic, which summarizes the dispersion of income across the entire income distribution. The Gini coefficient ranges from 0, indicating perfect equality (where everyone receives an equal share), to 1, perfect inequality (where only one recipient or group of recipients receives all the income). The Gini is based on the difference between the Lorenz curve (the observed cumulative income distribution) and the notion of a perfectly equal income distribution." +According to the U.S. Census Bureau \url{https://www.census.gov/topics/income-poverty/income-inequality/about/metrics/gini-index.html}: 'The Gini Index is a summary measure of income inequality. The Gini coefficient incorporates the detailed shares data into a single statistic, which summarizes the dispersion of income across the entire income distribution. The Gini coefficient ranges from 0, indicating perfect equality (where everyone receives an equal share), to 1, perfect inequality (where only one recipient or group of recipients receives all the income). The Gini is based on the difference between the Lorenz curve (the observed cumulative income distribution) and the notion of a perfectly equal income distribution.' } \examples{ \dontrun{ # Wrapped in \dontrun{} because these examples require a Census API key. 
- + # Tract-level metric (2020) - gini(geo = "tract", state = "GA", year = 2020) - + gini(geo = 'tract', state = 'GA', year = 2020) + # County-level metric (2020) - gini(geo = "county", state = "GA", year = 2020) - + gini(geo = 'county', state = 'GA', year = 2020) + } } diff --git a/man/hoover.Rd b/man/hoover.Rd new file mode 100644 index 0000000..ec9edca --- /dev/null +++ b/man/hoover.Rd @@ -0,0 +1,96 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/hoover.R +\name{hoover} +\alias{hoover} +\title{Delta based on Hoover (1941) and Duncan et al. (1961)} +\usage{ +hoover( + geo_large = "county", + geo_small = "tract", + year = 2020, + subgroup, + omit_NAs = TRUE, + quiet = FALSE, + ... +) +} +\arguments{ +\item{geo_large}{Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = 'county'}.} + +\item{geo_small}{Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_small = 'tract'}.} + +\item{year}{Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available.} + +\item{subgroup}{Character string specifying the racial/ethnic subgroup(s). See Details for available choices.} + +\item{omit_NAs}{Logical. If FALSE, will compute index for a larger geographical unit only if all of its smaller geographical units have values. The default is TRUE.} + +\item{quiet}{Logical. If TRUE, will display messages about potential missing census information. The default is FALSE.} + +\item{...}{Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics} +} +\value{ +An object of class 'list'. 
This is a named list with the following components: + +\describe{ +\item{\code{del}}{An object of class 'tbl' for the GEOID, name, and DEL at specified larger census geographies.} +\item{\code{del_data}}{An object of class 'tbl' for the raw census values at specified smaller census geographies.} +\item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute DEL.} +} +} +\description{ +Compute the aspatial Delta (Hoover) of a selected racial/ethnic subgroup(s) and U.S. geographies. +} +\details{ +This function will compute the aspatial Delta (DEL) of selected racial/ethnic subgroups and U.S. geographies for a specified geographical extent (e.g., the entire U.S. or a single state) based on Hoover (1941) \doi{10.1017/S0022050700052980} and Duncan, Cuzzort, and Duncan (1961; LC:60007089). This function provides the computation of DEL for any of the U.S. Census Bureau race/ethnicity subgroups (including Hispanic and non-Hispanic individuals). + +The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the aspatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available but are available from other U.S. Census Bureau surveys. The twenty racial/ethnic subgroups (U.S. 
Census Bureau definitions) are: +\itemize{ +\item \strong{B03002_002}: not Hispanic or Latino \code{'NHoL'} +\item \strong{B03002_003}: not Hispanic or Latino, white alone \code{'NHoLW'} +\item \strong{B03002_004}: not Hispanic or Latino, Black or African American alone \code{'NHoLB'} +\item \strong{B03002_005}: not Hispanic or Latino, American Indian and Alaska Native alone \code{'NHoLAIAN'} +\item \strong{B03002_006}: not Hispanic or Latino, Asian alone \code{'NHoLA'} +\item \strong{B03002_007}: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'NHoLNHOPI'} +\item \strong{B03002_008}: not Hispanic or Latino, Some other race alone \code{'NHoLSOR'} +\item \strong{B03002_009}: not Hispanic or Latino, Two or more races \code{'NHoLTOMR'} +\item \strong{B03002_010}: not Hispanic or Latino, Two races including Some other race \code{'NHoLTRiSOR'} +\item \strong{B03002_011}: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'NHoLTReSOR'} +\item \strong{B03002_012}: Hispanic or Latino \code{'HoL'} +\item \strong{B03002_013}: Hispanic or Latino, white alone \code{'HoLW'} +\item \strong{B03002_014}: Hispanic or Latino, Black or African American alone \code{'HoLB'} +\item \strong{B03002_015}: Hispanic or Latino, American Indian and Alaska Native alone \code{'HoLAIAN'} +\item \strong{B03002_016}: Hispanic or Latino, Asian alone \code{'HoLA'} +\item \strong{B03002_017}: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'HoLNHOPI'} +\item \strong{B03002_018}: Hispanic or Latino, Some other race alone \code{'HoLSOR'} +\item \strong{B03002_019}: Hispanic or Latino, Two or more races \code{'HoLTOMR'} +\item \strong{B03002_020}: Hispanic or Latino, Two races including Some other race \code{'HoLTRiSOR'} +\item \strong{B03002_021}: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'HoLTReSOR'} +} + +Use the internal \code{state} and \code{county} 
arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. + +DEL is a measure of the proportion of members of one subgroup(s) residing in geographic units with above-average density of members of the subgroup(s). The index provides the proportion of a subgroup population that would have to move across geographic units to achieve a uniform density. DEL can range in value from 0 to 1. + +Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, and census tract \code{geo_large = 'tract'} levels. Smaller geographies available include county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is composed of only one smaller geographical area (e.g., a U.S. county contains only one census tract), then the DEL value returned is NA. +} +\examples{ +\dontrun{ +# Wrapped in \dontrun{} because these examples require a Census API key. + + # Delta (a measure of concentration) of non-Hispanic Black populations + ## of census tracts within Georgia, U.S.A., counties (2020) + hoover( + geo_large = 'county', + geo_small = 'tract', + state = 'GA', + year = 2020, + subgroup = 'NHoLB' + ) + +} + +} +\seealso{ +\code{\link[tidycensus]{get_acs}} for additional arguments for geographic extent selection (i.e., \code{state} and \code{county}). +} diff --git a/man/krieger.Rd b/man/krieger.Rd index 2a09ec4..de13069 100644 --- a/man/krieger.Rd +++ b/man/krieger.Rd @@ -2,12 +2,12 @@ % Please edit documentation in R/krieger.R \name{krieger} \alias{krieger} -\title{Index of Concentration at the Extremes based on Feldman \emph{et al.} (2015) and Krieger \emph{et al.} (2016)} +\title{Index of Concentration at the Extremes based on Feldman et al. (2015) and Krieger et al. (2016)} \usage{ krieger(geo = "tract", year = 2020, quiet = FALSE, ...) 
} \arguments{ -\item{geo}{Character string specifying the geography of the data either census tracts \code{geo = "tract"} (the default) or counties \code{geo = "county"}.} +\item{geo}{Character string specifying the geography of the data either census tracts \code{geo = 'tract'} (the default) or counties \code{geo = 'county'}.} \item{year}{Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available.} @@ -27,7 +27,7 @@ An object of class 'list'. This is a named list with the following components: Compute the aspatial Index of Concentration at the Extremes (Krieger). } \details{ -This function will compute three aspatial Index of Concentration at the Extremes (ICE) of U.S. census tracts or counties for a specified geographical extent (e.g., entire U.S. or a single state) based on Feldman \emph{et al.} (2015) \doi{10.1136/jech-2015-205728} and Krieger \emph{et al.} (2016) \doi{10.2105/AJPH.2015.302955}. The authors expanded the metric designed by Massey in a chapter of Booth & Crouter (2001) \doi{10.4324/9781410600141} who initially designed the metric for residential segregation. This function computes five ICE metrics: +This function will compute three aspatial Index of Concentration at the Extremes (ICE) of U.S. census tracts or counties for a specified geographical extent (e.g., entire U.S. or a single state) based on Feldman et al. (2015) \doi{10.1136/jech-2015-205728} and Krieger et al. (2016) \doi{10.2105/AJPH.2015.302955}. The authors expanded the metric designed by Massey in a chapter of Booth & Crouter (2001) \doi{10.4324/9781410600141} who initially designed the metric for residential segregation. This function computes five ICE metrics: \itemize{ \item \strong{Income}: 80th income percentile vs. 20th income percentile @@ -53,13 +53,13 @@ ICE metrics can range in value from -1 (most deprived) to 1 (most privileged). 
A \examples{ \dontrun{ # Wrapped in \dontrun{} because these examples require a Census API key. - + # Tract-level metric (2020) - krieger(geo = "tract", state = "GA", year = 2020) - + krieger(geo = 'tract', state = 'GA', year = 2020) + # County-level metric (2020) - krieger(geo = "county", state = "GA", year = 2020) - + krieger(geo = 'county', state = 'GA', year = 2020) + } } diff --git a/man/messer.Rd b/man/messer.Rd index b9c4f5e..5294391 100644 --- a/man/messer.Rd +++ b/man/messer.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/messer.R \name{messer} \alias{messer} -\title{Neighborhood Deprivation Index based on Messer \emph{et al.} (2006)} +\title{Neighborhood Deprivation Index based on Messer et al. (2006)} \usage{ messer( geo = "tract", @@ -15,7 +15,7 @@ messer( ) } \arguments{ -\item{geo}{Character string specifying the geography of the data either census tracts \code{geo = "tract"} (the default) or counties \code{geo = "county"}.} +\item{geo}{Character string specifying the geography of the data either census tracts \code{geo = 'tract'} (the default) or counties \code{geo = 'county'}.} \item{year}{Numeric. The year to compute the estimate. The default is 2020, and the years 2010 onward are currently available.} @@ -42,7 +42,7 @@ An object of class 'list'. This is a named list with the following components: Compute the aspatial Neighborhood Deprivation Index (Messer). } \details{ -This function will compute the aspatial Neighborhood Deprivation Index (NDI) of U.S. census tracts or counties for a specified geographical referent (e.g., US-standardized) based on Messer \emph{et al.} (2006) \doi{10.1007/s11524-006-9094-x}. +This function will compute the aspatial Neighborhood Deprivation Index (NDI) of U.S. census tracts or counties for a specified geographical referent (e.g., US-standardized) based on Messer et al. (2006) \doi{10.1007/s11524-006-9094-x}. The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. 
Census Bureau 5-year American Community Survey characteristics used for computation involving a principal component analysis with the \code{\link[psych]{principal}} function. The yearly estimates are available for 2010 and after when all census characteristics became available. The eight characteristics are: \itemize{ @@ -59,11 +59,11 @@ The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify the referent for standardizing the NDI (Messer) values. For example, if all U.S. states are specified for the \code{state} argument, then the output would be a U.S.-standardized index. -The continuous NDI (Messer) values are z-transformed, i.e., "standardized," and the categorical NDI (Messer) values are quartiles of the standardized continuous NDI (Messer) values. +The continuous NDI (Messer) values are z-transformed, i.e., 'standardized,' and the categorical NDI (Messer) values are quartiles of the standardized continuous NDI (Messer) values. Check if the proportion of variance explained by the first principal component is high (more than 0.5). -Users can bypass \code{\link[tidycensus]{get_acs}} by specifying a pre-formatted data frame or tibble using the \code{df} argument. This function will compute an index using the first component of a principal component analysis (PCA) with a Varimax rotation (the default for \code{\link[psych]{principal}}) and only one factor (note: PCA set-up not unspecified in Messer \emph{et al.} (2006)). The recommended structure of the data frame or tibble is an ID (e.g., GEOID) in the first feature (column), followed by the variables of interest (in any order) and no additional information (e.g., omit state or county names from the \code{df} argument input). +Users can bypass \code{\link[tidycensus]{get_acs}} by specifying a pre-formatted data frame or tibble using the \code{df} argument. 
This function will compute an index using the first component of a principal component analysis (PCA) with a Varimax rotation (the default for \code{\link[psych]{principal}}) and only one factor (note: PCA set-up not specified in Messer et al. (2006)). The recommended structure of the data frame or tibble is an ID (e.g., GEOID) in the first feature (column), followed by the variables of interest (in any order) and no additional information (e.g., omit state or county names from the \code{df} argument input). } \examples{ @@ -73,13 +73,13 @@ messer(df = DCtracts2020[ , c(1, 3:10)]) # Wrapped in \dontrun{} because these examples require a Census API key. # Tract-level metric (2020) - messer(geo = "tract", state = "GA", year = 2020) + messer(geo = 'tract', state = 'GA', year = 2020) # Impute NDI for tracts (2020) with missing census information (median values) - messer(state = "tract", "GA", year = 2020, imp = TRUE) + messer(geo = 'tract', state = 'GA', year = 2020, imp = TRUE) # County-level metric (2020) - messer(geo = "county", state = "GA", year = 2020) + messer(geo = 'county', state = 'GA', year = 2020) } diff --git a/man/ndi-package.Rd b/man/ndi-package.Rd index f06f9ac..85de2d5 100644 --- a/man/ndi-package.Rd +++ b/man/ndi-package.Rd @@ -1,5 +1,5 @@ % Generated by roxygen2: do not edit by hand -% Please edit documentation in R/package.R +% Please edit documentation in R/ndi-package.R \docType{package} \name{ndi-package} \alias{ndi-package} @@ -9,7 +9,7 @@ Computes various metrics of socio-economic deprivation and disparity in the United States based on information available from the U.S. Census Bureau. } \details{ -The 'ndi' package computes various metrics of socio-economic deprivation and disparity in the United States. Some metrics are considered "spatial" because they consider the values of neighboring (i.e., adjacent) census geographies in their computation, while other metrics are "aspatial" because they only consider the value within each census geography. 
Two types of aspatial neighborhood deprivation indices (NDI) are available: (1) based on Messer \emph{et al.} (2006) \doi{10.1007/s11524-006-9094-x} and (2) based on Andrews \emph{et al.} (2020) \doi{10.1080/17445647.2020.1750066} and Slotman \emph{et al.} (2022) \doi{10.1016/j.dib.2022.108002} who use variables chosen by Roux and Mair (2010) \doi{10.1111/j.1749-6632.2009.05333.x}. Both are a decomposition of multiple demographic characteristics from the U.S. Census Bureau American Community Survey 5-year estimates (ACS-5; 2006-2010 onward). Using data from the ACS-5 (2005-2009 onward), the package can also compute the (1) spatial Racial Isolation Index (RI) based on Anthopolos \emph{et al.} (2011) \doi{10.1016/j.sste.2011.06.002}, (2) spatial Educational Isolation Index (EI) based on Bravo \emph{et al.} (2021) \doi{10.3390/ijerph18179384}, (3) aspatial Index of Concentration at the Extremes (ICE) based on Feldman \emph{et al.} (2015) \doi{10.1136/jech-2015-205728} and Krieger \emph{et al.} (2016) \doi{10.2105/AJPH.2015.302955}, (4) aspatial racial/ethnic Dissimilarity Index based on Duncan & Duncan (1955) \doi{10.2307/2088328}, (5) aspatial income or racial/ethnic Atkinson Index based on Atkinson (1970) \doi{10.1016/0022-0531(70)90039-6}, (6) aspatial racial/ethnic Isolation Index (II) based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and Bell (1954) \doi{10.2307/2574118}, (7) aspatial racial/ethnic Correlation Ratio based on Bell (1954) \doi{10.2307/2574118} and White (1986) \doi{10.2307/3644339}, and (8) aspatial racial/ethnic Location Quotient (LQ) based on Merton (1939) \doi{10.2307/2084686} and Sudano \emph{et al.} (2013) \doi{10.1016/j.healthplace.2012.09.015}. Also using data from the ACS-5 (2005-2009 onward), the package can retrieve the aspatial Gini Index based on Gini (1921) \doi{10.2307/2223319}. +The 'ndi' package computes various metrics of socio-economic deprivation and disparity in the United States. 
Some metrics are considered "spatial" because they consider the values of neighboring (i.e., adjacent) census geographies in their computation, while other metrics are "aspatial" because they only consider the value within each census geography. Two types of aspatial neighborhood deprivation indices (NDI) are available: (1) based on Messer et al. (2006) \doi{10.1007/s11524-006-9094-x} and (2) based on Andrews et al. (2020) \doi{10.1080/17445647.2020.1750066} and Slotman et al. (2022) \doi{10.1016/j.dib.2022.108002} who use variables chosen by Roux and Mair (2010) \doi{10.1111/j.1749-6632.2009.05333.x}. Both are a decomposition of multiple demographic characteristics from the U.S. Census Bureau American Community Survey 5-year estimates (ACS-5; 2006-2010 onward). Using data from the ACS-5 (2005-2009 onward), the package can also compute the (1) spatial Racial Isolation Index (RI) based on Anthopolos et al. (2011) \doi{10.1016/j.sste.2011.06.002}, (2) spatial Educational Isolation Index (EI) based on Bravo et al. (2021) \doi{10.3390/ijerph18179384}, (3) aspatial Index of Concentration at the Extremes (ICE) based on Feldman et al. (2015) \doi{10.1136/jech-2015-205728} and Krieger et al. (2016) \doi{10.2105/AJPH.2015.302955}, (4) aspatial racial/ethnic Dissimilarity Index based on Duncan & Duncan (1955) \doi{10.2307/2088328}, (5) aspatial income or racial/ethnic Atkinson Index based on Atkinson (1970) \doi{10.1016/0022-0531(70)90039-6}, (6) aspatial racial/ethnic Isolation Index (II) based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and Bell (1954) \doi{10.2307/2574118}, (7) aspatial racial/ethnic Correlation Ratio based on Bell (1954) \doi{10.2307/2574118} and White (1986) \doi{10.2307/3644339}, (8) aspatial racial/ethnic Location Quotient (LQ) based on Merton (1939) \doi{10.2307/2084686} and Sudano et al. 
(2013) \doi{10.1016/j.healthplace.2012.09.015}, (9) aspatial racial/ethnic Local Exposure and Isolation metric based on Bemanian & Beyer (2017) \doi{10.1158/1055-9965.EPI-16-0926}, and (10) aspatial racial/ethnic Delta based on Hoover (1941) \doi{10.1017/S0022050700052980} and Duncan et al. (1961; LC:60007089). Also using data from the ACS-5 (2005-2009 onward), the package can retrieve the aspatial Gini Index based on Gini (1921) \doi{10.2307/2223319}. Key content of the 'ndi' package include:\cr @@ -29,13 +29,15 @@ Key content of the 'ndi' package include:\cr \code{\link{gini}} Retrieves the aspatial Gini Index based on Gini (1921) \doi{10.2307/2223319}. -\code{\link{krieger}} Computes the aspatial Index of Concentration at the Extremes based on Feldman \emph{et al.} (2015) \doi{10.1136/jech-2015-205728} and Krieger \emph{et al.} (2016) \doi{10.2105/AJPH.2015.302955}. +\code{\link{hoover}} Computes the aspatial racial/ethnic Delta (DEL) based on Hoover (1941) \doi{10.1017/S0022050700052980} and Duncan et al. (1961; LC:60007089). -\code{\link{messer}} Computes the aspatial Neighborhood Deprivation Index (NDI) based on Messer \emph{et al.} (2006) \doi{10.1007/s11524-006-9094-x}. +\code{\link{krieger}} Computes the aspatial Index of Concentration at the Extremes based on Feldman et al. (2015) \doi{10.1136/jech-2015-205728} and Krieger et al. (2016) \doi{10.2105/AJPH.2015.302955}. -\code{\link{powell_wiley}} Computes the aspatial Neighborhood Deprivation Index (NDI) based on Andrews \emph{et al.} (2020) \doi{10.1080/17445647.2020.1750066} and Slotman \emph{et al.} (2022) \doi{10.1016/j.dib.2022.108002} who use variables chosen by Roux and Mair (2010) \doi{10.1111/j.1749-6632.2009.05333.x}. +\code{\link{messer}} Computes the aspatial Neighborhood Deprivation Index (NDI) based on Messer et al. (2006) \doi{10.1007/s11524-006-9094-x}. 
-\code{\link{sudano}} Computes the aspatial racial/ethnic Location Quotient (LQ) based on Merton (1939) \doi{10.2307/2084686} and Sudano \emph{et al.} (2013) \doi{10.1016/j.healthplace.2012.09.015}. +\code{\link{powell_wiley}} Computes the aspatial Neighborhood Deprivation Index (NDI) based on Andrews et al. (2020) \doi{10.1080/17445647.2020.1750066} and Slotman et al. (2022) \doi{10.1016/j.dib.2022.108002} who use variables chosen by Roux and Mair (2010) \doi{10.1111/j.1749-6632.2009.05333.x}. + +\code{\link{sudano}} Computes the aspatial racial/ethnic Location Quotient (LQ) based on Merton (1939) \doi{10.2307/2084686} and Sudano et al. (2013) \doi{10.1016/j.healthplace.2012.09.015}. \code{\link{white}} Computes the aspatial racial/ethnic Correlation Ratio (V) based on Bell (1954) \doi{10.2307/2574118} and White (1986) \doi{10.2307/3644339}. @@ -44,12 +46,20 @@ Key content of the 'ndi' package include:\cr \code{\link{DCtracts2020}} A sample dataset containing information about U.S. Census American Community Survey 5-year estimate data for the District of Columbia census tracts (2020). The data are obtained from the \code{\link[tidycensus]{get_acs}} function and formatted for the \code{\link{messer}} and \code{\link{powell_wiley}} functions input. } \section{Dependencies}{ - The 'ndi' package relies heavily upon \code{\link{tidycensus}} to retrieve data from the U.S. Census Bureau American Community Survey five-year estimates and the \code{\link{psych}} for computing the neighborhood deprivation indices. The \code{\link{messer}} function builds upon code developed by Hruska \emph{et al.} (2022) \doi{10.17605/OSF.IO/M2SAV} by fictionalizing, adding the percent of households earning <$30,000 per year to the NDI computation, and providing the option for computing the ACS-5 2006-2010 NDI values. 
There is no code companion to compute NDI included in Andrews \emph{et al.} (2020) \doi{10.1080/17445647.2020.1750066} or Slotman \emph{et al.} (2022) \doi{10.1016/j.dib.2022.108002}, but the package author worked directly with the Slotman \emph{et al.} (2022) \doi{10.1016/j.dib.2022.108002} authors to replicate their SAS code in R. The spatial metrics RI and EI rely on the \code{\link{sf}} and \code{\link{Matrix}} packages to compute the geospatial adjacency matrix between census geographies. Internal function to calculate AI is based on \code{\link[DescTools]{Atkinson}} function. There is no code companion to compute RI, EI, DI, II, V, LQ, or LEx/Is included in Anthopolos \emph{et al.} (2011) \doi{10.1016/j.sste.2011.06.002}, Bravo \emph{et al.} (2021) \doi{10.3390/ijerph18179384}, Duncan & Duncan (1955) \doi{10.2307/2088328}, Bell (1954) \doi{10.2307/2574118}, White (1986) \doi{10.2307/3644339}, Sudano \emph{et al.} (2013) \doi{10.1016/j.healthplace.2012.09.015}, or Bemanian & Beyer (2017) \doi{10.1158/1055-9965.EPI-16-0926}, respectively. + The 'ndi' package relies heavily upon \code{\link{tidycensus}} to retrieve data from the U.S. Census Bureau American Community Survey five-year estimates and the \code{\link{psych}} for computing the neighborhood deprivation indices. The \code{\link{messer}} function builds upon code developed by Hruska et al. (2022) \doi{10.17605/OSF.IO/M2SAV} by functionalizing, adding the percent of households earning <$30,000 per year to the NDI computation, and providing the option for computing the ACS-5 2006-2010 NDI values. There is no code companion to compute NDI included in Andrews et al. (2020) \doi{10.1080/17445647.2020.1750066} or Slotman et al. (2022) \doi{10.1016/j.dib.2022.108002}, but the package author worked directly with the Slotman et al. (2022) \doi{10.1016/j.dib.2022.108002} authors to replicate their SAS code in R. 
The spatial metrics RI and EI rely on the \code{\link{sf}} and \code{\link{Matrix}} packages to compute the geospatial adjacency matrix between census geographies. Internal function to calculate AI is based on \code{\link[DescTools]{Atkinson}} function. There is no code companion to compute RI, EI, DI, II, V, LQ, or LEx/Is included in Anthopolos et al. (2011) \doi{10.1016/j.sste.2011.06.002}, Bravo et al. (2021) \doi{10.3390/ijerph18179384}, Duncan & Duncan (1955) \doi{10.2307/2088328}, Bell (1954) \doi{10.2307/2574118}, White (1986) \doi{10.2307/3644339}, Sudano et al. (2013) \doi{10.1016/j.healthplace.2012.09.015}, or Bemanian & Beyer (2017) \doi{10.1158/1055-9965.EPI-16-0926}, respectively. +} + +\seealso{ +Useful links: +\itemize{ + \item \url{https://github.com/idblr/ndi} + \item Report bugs at \url{https://github.com/idblr/ndi/issues} } +} \author{ -Ian D. Buller\cr \emph{Social & Scientific Systems, Inc., a division of DLH Corporation, Silver Spring, Maryland, USA (current); Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland, USA (original).} \cr +Ian D. Buller\cr \emph{Social & Scientific Systems, Inc., a DLH Corporation Holding Company, Bethesda, Maryland, USA (current); Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland, USA (original).} \cr Maintainer: I.D.B. 
\email{ian.buller@alumni.emory.edu} } -\keyword{package} +\keyword{internal} diff --git a/man/powell_wiley.Rd b/man/powell_wiley.Rd index 3ca5529..1fbc20c 100644 --- a/man/powell_wiley.Rd +++ b/man/powell_wiley.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/powell_wiley.R \name{powell_wiley} \alias{powell_wiley} -\title{Neighborhood Deprivation Index based on Andrews \emph{et al.} (2020) and Slotman \emph{et al.} (2022)} +\title{Neighborhood Deprivation Index based on Andrews et al. (2020) and Slotman et al. (2022)} \usage{ powell_wiley( geo = "tract", @@ -15,7 +15,7 @@ powell_wiley( ) } \arguments{ -\item{geo}{Character string specifying the geography of the data either census tracts \code{geo = "tract"} (the default) or counties \code{geo = "county"}.} +\item{geo}{Character string specifying the geography of the data either census tracts \code{geo = 'tract'} (the default) or counties \code{geo = 'county'}.} \item{year}{Numeric. The year to compute the estimate. The default is 2020, and the years 2010 onward are currently available.} @@ -43,7 +43,7 @@ An object of class 'list'. This is a named list with the following components: Compute the aspatial Neighborhood Deprivation Index (Powell-Wiley). } \details{ -This function will compute the aspatial Neighborhood Deprivation Index (NDI) of U.S. census tracts or counties for a specified geographical referent (e.g., US-standardized) based on Andrews \emph{et al.} (2020) \doi{10.1080/17445647.2020.1750066} and Slotman \emph{et al.} (2022) \doi{10.1016/j.dib.2022.108002}. +This function will compute the aspatial Neighborhood Deprivation Index (NDI) of U.S. census tracts or counties for a specified geographical referent (e.g., US-standardized) based on Andrews et al. (2020) \doi{10.1080/17445647.2020.1750066} and Slotman et al. (2022) \doi{10.1016/j.dib.2022.108002}. The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. 
Census Bureau 5-year American Community Survey characteristics used for computation involving a factor analysis with the \code{\link[psych]{principal}} function. The yearly estimates are available in 2010 and after when all census characteristics became available. The thirteen characteristics chosen by Roux and Mair (2010) \doi{10.1111/j.1749-6632.2009.05333.x} are: \itemize{ @@ -62,7 +62,7 @@ The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. \item \strong{PctUnempl (S2301)}: percent unemployed } -Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify the referent for standardizing the NDI (Powell-Wiley) values. For example, if all U.S. states are specified for the \code{state} argument, then the output would be a U.S.-standardized index. Please note: the NDI (Powell-Wiley) values will not exactly match (but will highly correlate with) those found in Andrews \emph{et al.} (2020) \doi{10.1080/17445647.2020.1750066} and Slotman \emph{et al.} (2022) \doi{10.1016/j.dib.2022.108002} because the two studies used a different statistical platform (i.e., SPSS and SAS, respectively) that intrinsically calculate the principal component analysis differently from R. +Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify the referent for standardizing the NDI (Powell-Wiley) values. For example, if all U.S. states are specified for the \code{state} argument, then the output would be a U.S.-standardized index. Please note: the NDI (Powell-Wiley) values will not exactly match (but will highly correlate with) those found in Andrews et al. (2020) \doi{10.1080/17445647.2020.1750066} and Slotman et al. (2022) \doi{10.1016/j.dib.2022.108002} because the two studies used a different statistical platform (i.e., SPSS and SAS, respectively) that intrinsically calculate the principal component analysis differently from R. 
The categorical NDI (Powell-Wiley) values are population-weighted quintiles of the continuous NDI (Powell-Wiley) values. @@ -76,16 +76,16 @@ powell_wiley(df = DCtracts2020[ , -c(3:10)]) \dontrun{ # Wrapped in \dontrun{} because these examples require a Census API key. - + # Tract-level metric (2020) - powell_wiley(geo = "tract", state = "GA", year = 2020) + powell_wiley(geo = 'tract', state = 'GA', year = 2020) # Impute NDI for tracts (2020) with missing census information (median values) - powell_wiley(state = "tract", "GA", year = 2020, imp = TRUE) - + powell_wiley(state = 'tract', 'GA', year = 2020, imp = TRUE) + # County-level metric (2020) - powell_wiley(geo = "county", state = "GA", year = 2020) - + powell_wiley(geo = 'county', state = 'GA', year = 2020) + } } diff --git a/man/sudano.Rd b/man/sudano.Rd index 4576cc3..c98c5c2 100644 --- a/man/sudano.Rd +++ b/man/sudano.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/sudano.R \name{sudano} \alias{sudano} -\title{Location Quotient (LQ) based on Merton (1938) and Sudano \emph{et al.} (2013)} +\title{Location Quotient (LQ) based on Merton (1938) and Sudano et al. (2013)} \usage{ sudano( geo_large = "county", @@ -15,9 +15,9 @@ sudano( ) } \arguments{ -\item{geo_large}{Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = "county"}.} +\item{geo_large}{Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = 'county'}.} -\item{geo_small}{Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_large = "tract"}.} +\item{geo_small}{Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_large = 'tract'}.} \item{year}{Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available.} @@ -42,47 +42,52 @@ An object of class 'list'. 
This is a named list with the following components: Compute the aspatial Location Quotient (Sudano) of a selected racial/ethnic subgroup(s) and U.S. geographies. } \details{ -This function will compute the aspatial Location Quotient (LQ) of selected racial/ethnic subgroups and U.S. geographies for a specified geographical extent (e.g., the entire U.S. or a single state) based on Merton (1939) \doi{10.2307/2084686} and Sudano \emph{et al.} (2013) \doi{10.1016/j.healthplace.2012.09.015}. This function provides the computation of LQ for any of the U.S. Census Bureau race/ethnicity subgroups (including Hispanic and non-Hispanic individuals). +This function will compute the aspatial Location Quotient (LQ) of selected racial/ethnic subgroups and U.S. geographies for a specified geographical extent (e.g., the entire U.S. or a single state) based on Merton (1939) \doi{10.2307/2084686} and Sudano et al. (2013) \doi{10.1016/j.healthplace.2012.09.015}. This function provides the computation of LQ for any of the U.S. Census Bureau race/ethnicity subgroups (including Hispanic and non-Hispanic individuals). The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the aspatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available but are available from other U.S. Census Bureau surveys. The twenty racial/ethnic subgroups (U.S. 
Census Bureau definitions) are:
 \itemize{
-\item \strong{B03002_002}: not Hispanic or Latino \code{"NHoL"}
-\item \strong{B03002_003}: not Hispanic or Latino, white alone \code{"NHoLW"}
-\item \strong{B03002_004}: not Hispanic or Latino, Black or African American alone \code{"NHoLB"}
-\item \strong{B03002_005}: not Hispanic or Latino, American Indian and Alaska Native alone \code{"NHoLAIAN"}
-\item \strong{B03002_006}: not Hispanic or Latino, Asian alone \code{"NHoLA"}
-\item \strong{B03002_007}: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"NHoLNHOPI"}
-\item \strong{B03002_008}: not Hispanic or Latino, Some other race alone \code{"NHoLSOR"}
-\item \strong{B03002_009}: not Hispanic or Latino, Two or more races \code{"NHoLTOMR"}
-\item \strong{B03002_010}: not Hispanic or Latino, Two races including Some other race \code{"NHoLTRiSOR"}
-\item \strong{B03002_011}: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"NHoLTReSOR"}
-\item \strong{B03002_012}: Hispanic or Latino \code{"HoL"}
-\item \strong{B03002_013}: Hispanic or Latino, white alone \code{"HoLW"}
-\item \strong{B03002_014}: Hispanic or Latino, Black or African American alone \code{"HoLB"}
-\item \strong{B03002_015}: Hispanic or Latino, American Indian and Alaska Native alone \code{"HoLAIAN"}
-\item \strong{B03002_016}: Hispanic or Latino, Asian alone \code{"HoLA"}
-\item \strong{B03002_017}: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"HoLNHOPI"}
-\item \strong{B03002_018}: Hispanic or Latino, Some other race alone \code{"HoLSOR"}
-\item \strong{B03002_019}: Hispanic or Latino, Two or more races \code{"HoLTOMR"}
-\item \strong{B03002_020}: Hispanic or Latino, Two races including Some other race \code{"HoLTRiSOR"}
-\item \strong{B03002_021}: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"HoLTReSOR"}
+\item \strong{B03002_002}: not Hispanic or Latino \code{'NHoL'}
+\item \strong{B03002_003}: not Hispanic or Latino, white alone \code{'NHoLW'}
+\item \strong{B03002_004}: not Hispanic or Latino, Black or African American alone \code{'NHoLB'}
+\item \strong{B03002_005}: not Hispanic or Latino, American Indian and Alaska Native alone \code{'NHoLAIAN'}
+\item \strong{B03002_006}: not Hispanic or Latino, Asian alone \code{'NHoLA'}
+\item \strong{B03002_007}: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'NHoLNHOPI'}
+\item \strong{B03002_008}: not Hispanic or Latino, Some other race alone \code{'NHoLSOR'}
+\item \strong{B03002_009}: not Hispanic or Latino, Two or more races \code{'NHoLTOMR'}
+\item \strong{B03002_010}: not Hispanic or Latino, Two races including Some other race \code{'NHoLTRiSOR'}
+\item \strong{B03002_011}: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'NHoLTReSOR'}
+\item \strong{B03002_012}: Hispanic or Latino \code{'HoL'}
+\item \strong{B03002_013}: Hispanic or Latino, white alone \code{'HoLW'}
+\item \strong{B03002_014}: Hispanic or Latino, Black or African American alone \code{'HoLB'}
+\item \strong{B03002_015}: Hispanic or Latino, American Indian and Alaska Native alone \code{'HoLAIAN'}
+\item \strong{B03002_016}: Hispanic or Latino, Asian alone \code{'HoLA'}
+\item \strong{B03002_017}: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'HoLNHOPI'}
+\item \strong{B03002_018}: Hispanic or Latino, Some other race alone \code{'HoLSOR'}
+\item \strong{B03002_019}: Hispanic or Latino, Two or more races \code{'HoLTOMR'}
+\item \strong{B03002_020}: Hispanic or Latino, Two races including Some other race \code{'HoLTRiSOR'}
+\item \strong{B03002_021}: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'HoLTReSOR'}
 }

 Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data
output. LQ is some measure of relative racial homogeneity of each smaller geography within a larger geography. LQ can range in value from 0 to infinity because it is ratio of two proportions in which the numerator is the proportion of subgroup population in a smaller geography and the denominator is the proportion of subgroup population in its larger geography. For example, a smaller geography with an LQ of 5 means that the proportion of the subgroup population living in the smaller geography is five times the proportion of the subgroup population in its larger geography. -Larger geographies available include state \code{geo_large = "state"}, county \code{geo_large = "county"}, and census tract \code{geo_large = "tract"} levels. Smaller geographies available include, county \code{geo_small = "county"}, census tract \code{geo_small = "tract"}, and census block group \code{geo_small = "block group"} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the LQ value returned is NA. +Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, and census tract \code{geo_large = 'tract'} levels. Smaller geographies available include, county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the LQ value returned is NA. } \examples{ \dontrun{ # Wrapped in \dontrun{} because these examples require a Census API key. 
- + # Isolation of non-Hispanic Black populations ## of census tracts within Georgia, U.S.A., counties (2020) - sudano(geo_large = "state", geo_small = "county", state = "GA", - year = 2020, subgroup = "NHoLB") - + sudano( + geo_large = 'state', + geo_small = 'county', + state = 'GA', + year = 2020, + subgroup = 'NHoLB' + ) + } } diff --git a/man/white.Rd b/man/white.Rd index cdcafee..53d1097 100644 --- a/man/white.Rd +++ b/man/white.Rd @@ -15,9 +15,9 @@ white( ) } \arguments{ -\item{geo_large}{Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = "county"}.} +\item{geo_large}{Character string specifying the larger geographical unit of the data. The default is counties \code{geo_large = 'county'}.} -\item{geo_small}{Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_large = "tract"}.} +\item{geo_small}{Character string specifying the smaller geographical unit of the data. The default is census tracts \code{geo_large = 'tract'}.} \item{year}{Numeric. The year to compute the estimate. The default is 2020, and the years 2009 onward are currently available.} @@ -46,33 +46,33 @@ This function will compute the aspatial Correlation Ratio (V or \eqn{Eta^{2}}{Et The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the aspatial computation. The yearly estimates are available for 2009 onward when ACS-5 data are available but are available from other U.S. Census Bureau surveys. The twenty racial/ethnic subgroups (U.S. 
Census Bureau definitions) are:
 \itemize{
-\item \strong{B03002_002}: not Hispanic or Latino \code{"NHoL"}
-\item \strong{B03002_003}: not Hispanic or Latino, white alone \code{"NHoLW"}
-\item \strong{B03002_004}: not Hispanic or Latino, Black or African American alone \code{"NHoLB"}
-\item \strong{B03002_005}: not Hispanic or Latino, American Indian and Alaska Native alone \code{"NHoLAIAN"}
-\item \strong{B03002_006}: not Hispanic or Latino, Asian alone \code{"NHoLA"}
-\item \strong{B03002_007}: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"NHoLNHOPI"}
-\item \strong{B03002_008}: not Hispanic or Latino, Some other race alone \code{"NHoLSOR"}
-\item \strong{B03002_009}: not Hispanic or Latino, Two or more races \code{"NHoLTOMR"}
-\item \strong{B03002_010}: not Hispanic or Latino, Two races including Some other race \code{"NHoLTRiSOR"}
-\item \strong{B03002_011}: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"NHoLTReSOR"}
-\item \strong{B03002_012}: Hispanic or Latino \code{"HoL"}
-\item \strong{B03002_013}: Hispanic or Latino, white alone \code{"HoLW"}
-\item \strong{B03002_014}: Hispanic or Latino, Black or African American alone \code{"HoLB"}
-\item \strong{B03002_015}: Hispanic or Latino, American Indian and Alaska Native alone \code{"HoLAIAN"}
-\item \strong{B03002_016}: Hispanic or Latino, Asian alone \code{"HoLA"}
-\item \strong{B03002_017}: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{"HoLNHOPI"}
-\item \strong{B03002_018}: Hispanic or Latino, Some other race alone \code{"HoLSOR"}
-\item \strong{B03002_019}: Hispanic or Latino, Two or more races \code{"HoLTOMR"}
-\item \strong{B03002_020}: Hispanic or Latino, Two races including Some other race \code{"HoLTRiSOR"}
-\item \strong{B03002_021}: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{"HoLTReSOR"}
+\item \strong{B03002_002}: not Hispanic or Latino \code{'NHoL'}
+\item \strong{B03002_003}: not Hispanic or Latino, white alone \code{'NHoLW'}
+\item \strong{B03002_004}: not Hispanic or Latino, Black or African American alone \code{'NHoLB'}
+\item \strong{B03002_005}: not Hispanic or Latino, American Indian and Alaska Native alone \code{'NHoLAIAN'}
+\item \strong{B03002_006}: not Hispanic or Latino, Asian alone \code{'NHoLA'}
+\item \strong{B03002_007}: not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'NHoLNHOPI'}
+\item \strong{B03002_008}: not Hispanic or Latino, Some other race alone \code{'NHoLSOR'}
+\item \strong{B03002_009}: not Hispanic or Latino, Two or more races \code{'NHoLTOMR'}
+\item \strong{B03002_010}: not Hispanic or Latino, Two races including Some other race \code{'NHoLTRiSOR'}
+\item \strong{B03002_011}: not Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'NHoLTReSOR'}
+\item \strong{B03002_012}: Hispanic or Latino \code{'HoL'}
+\item \strong{B03002_013}: Hispanic or Latino, white alone \code{'HoLW'}
+\item \strong{B03002_014}: Hispanic or Latino, Black or African American alone \code{'HoLB'}
+\item \strong{B03002_015}: Hispanic or Latino, American Indian and Alaska Native alone \code{'HoLAIAN'}
+\item \strong{B03002_016}: Hispanic or Latino, Asian alone \code{'HoLA'}
+\item \strong{B03002_017}: Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone \code{'HoLNHOPI'}
+\item \strong{B03002_018}: Hispanic or Latino, Some other race alone \code{'HoLSOR'}
+\item \strong{B03002_019}: Hispanic or Latino, Two or more races \code{'HoLTOMR'}
+\item \strong{B03002_020}: Hispanic or Latino, Two races including Some other race \code{'HoLTRiSOR'}
+\item \strong{B03002_021}: Hispanic or Latino, Two races excluding Some other race, and three or more races \code{'HoLTReSOR'}
 }

 Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data
output. -V removes the asymmetry from the Isolation Index (Bell) by controlling for the effect of population composition. The Isolation Index (Bell) is some measure of the probability that a member of one subgroup(s) will meet or interact with a member of another subgroup(s) with higher values signifying higher probability of interaction (less isolation). V can range in value from 0 to 1. +V removes the asymmetry from the Isolation Index (Bell) by controlling for the effect of population composition. The Isolation Index (Bell) is some measure of the probability that a member of one subgroup(s) will meet or interact with a member of another subgroup(s) with higher values signifying higher probability of interaction (less isolation). V can range in value from -Inf to Inf. -Larger geographies available include state \code{geo_large = "state"}, county \code{geo_large = "county"}, and census tract \code{geo_large = "tract"} levels. Smaller geographies available include, county \code{geo_small = "county"}, census tract \code{geo_small = "tract"}, and census block group \code{geo_small = "block group"} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the V value returned is NA. +Larger geographies available include state \code{geo_large = 'state'}, county \code{geo_large = 'county'}, and census tract \code{geo_large = 'tract'} levels. Smaller geographies available include, county \code{geo_small = 'county'}, census tract \code{geo_small = 'tract'}, and census block group \code{geo_small = 'block group'} levels. If a larger geographical area is comprised of only one smaller geographical area (e.g., a U.S county contains only one census tract), then the V value returned is NA. 
} \examples{ \dontrun{ @@ -80,8 +80,13 @@ Larger geographies available include state \code{geo_large = "state"}, county \c # Isolation of non-Hispanic Black populations ## of census tracts within Georgia, U.S.A., counties (2020) - white(geo_large = "county", geo_small = "tract", state = "GA", - year = 2020, subgroup = "NHoLB") + white( + geo_large = 'county', + geo_small = 'tract', + state = 'GA', + year = 2020, + subgroup = 'NHoLB' + ) } diff --git a/tests/testthat.R b/tests/testthat.R index e78b8e8..40f13cf 100644 --- a/tests/testthat.R +++ b/tests/testthat.R @@ -1,4 +1,4 @@ library(testthat) library(ndi) -test_check("ndi") +test_check('ndi') diff --git a/tests/testthat/test-anthopolos.R b/tests/testthat/test-anthopolos.R index 5b368fb..41ba7ac 100644 --- a/tests/testthat/test-anthopolos.R +++ b/tests/testthat/test-anthopolos.R @@ -1,37 +1,68 @@ -context("anthopolos") +context('anthopolos') -####################### +# ------------------- # # anthopolos testthat # -####################### +# ------------------- # -test_that("anthopolos throws error with invalid arguments", { - +test_that('anthopolos throws error with invalid arguments', { # Unavailable geography - expect_error(anthopolos(geo = "zcta", state = "DC", year = 2020, subgroup = "NHoLB", quiet = TRUE)) + expect_error(anthopolos( + geo = 'zcta', + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + quiet = TRUE + )) # Unavailable year - expect_error(anthopolos(state = "DC", year = 2005, subgroup = "NHoLB", quiet = TRUE)) + expect_error(anthopolos( + state = 'DC', + year = 2005, + subgroup = 'NHoLB', + quiet = TRUE + )) # Unavailable subgroup - expect_error(anthopolos(state = "DC", year = 2020, subgroup = "terran", quiet = TRUE)) + expect_error(anthopolos( + state = 'DC', + year = 2020, + subgroup = 'terran', + quiet = TRUE + )) - skip_if(Sys.getenv("CENSUS_API_KEY") == "") + skip_if(Sys.getenv('CENSUS_API_KEY') == '') # Incorrect state - expect_error(anthopolos(state = "AB", year = 2020, subgroup = 
"NHoLB", quiet = TRUE)) + expect_error(anthopolos( + state = 'AB', + year = 2020, + subgroup = 'NHoLB', + quiet = TRUE + )) -} -) +}) -test_that("anthopolos works", { - - skip_if(Sys.getenv("CENSUS_API_KEY") == "") +test_that('anthopolos works', { + skip_if(Sys.getenv('CENSUS_API_KEY') == '') - expect_output(anthopolos(state = "DC", year = 2020, subgroup = c("NHoLB", "HoLB"))) + expect_output(anthopolos( + state = 'DC', + year = 2020, + subgroup = c('NHoLB', 'HoLB') + )) - expect_silent(anthopolos(state = "DC", year = 2020, subgroup = "NHoLB", quiet = TRUE)) + expect_silent(anthopolos( + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + quiet = TRUE + )) - expect_silent(anthopolos(state = "DC", year = 2020, subgroup = c("NHoLB", "HoLB"), quiet = TRUE)) + expect_silent(anthopolos( + state = 'DC', + year = 2020, + subgroup = c('NHoLB', 'HoLB'), + quiet = TRUE + )) -} -) +}) diff --git a/tests/testthat/test-atkinson.R b/tests/testthat/test-atkinson.R index 0695241..419c0a0 100644 --- a/tests/testthat/test-atkinson.R +++ b/tests/testthat/test-atkinson.R @@ -1,50 +1,86 @@ -context("atkinson") +context('atkinson') -##################### +# ----------------- # # atkinson testthat # -##################### +# ----------------- # -test_that("atkinson throws error with invalid arguments", { - +test_that('atkinson throws error with invalid arguments', { # Unavailable geography - expect_error(atkinson(geo_small = "zcta", state = "DC", year = 2020, - subgroup = "NHoLB", quiet = TRUE)) - expect_error(atkinson(geo_large = "block group", state = "DC", year = 2020, - subgroup = "NHoLB", quiet = TRUE)) + expect_error(atkinson( + geo_small = 'zcta', + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + quiet = TRUE + )) + expect_error( + atkinson( + geo_large = 'block group', + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + quiet = TRUE + ) + ) # Unavailable year - expect_error(atkinson(state = "DC", year = 2005, - subgroup = "NHoLB", quiet = TRUE)) + expect_error(atkinson( 
+ state = 'DC', + year = 2005, + subgroup = 'NHoLB', + quiet = TRUE + )) # Unavailable subgroup - expect_error(atkinson(state = "DC", year = 2020, - subgroup = "terran", quiet = TRUE)) + expect_error(atkinson( + state = 'DC', + year = 2020, + subgroup = 'terran', + quiet = TRUE + )) # Incorrect epsilon - expect_error(atkinson(state = "DC", year = 2020, - subgroup = "NHoLB", epsilon = 2, quiet = TRUE)) + expect_error(atkinson( + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + epsilon = 2, + quiet = TRUE + )) - skip_if(Sys.getenv("CENSUS_API_KEY") == "") + skip_if(Sys.getenv('CENSUS_API_KEY') == '') # Incorrect state - expect_error(atkinson(state = "AB", year = 2020, - subgroup = "NHoLB", quiet = TRUE)) + expect_error(atkinson( + state = 'AB', + year = 2020, + subgroup = 'NHoLB', + quiet = TRUE + )) -} -) +}) -test_that("atkinson works", { - - skip_if(Sys.getenv("CENSUS_API_KEY") == "") +test_that('atkinson works', { + skip_if(Sys.getenv('CENSUS_API_KEY') == '') - expect_silent(atkinson(state = "DC", year = 2020, - subgroup = c("NHoLB", "HoLB"))) + expect_silent(atkinson( + state = 'DC', + year = 2020, + subgroup = c('NHoLB', 'HoLB') + )) - expect_silent(atkinson(state = "DC", year = 2020, - subgroup = "NHoLB", quiet = TRUE)) + expect_silent(atkinson( + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + quiet = TRUE + )) - expect_silent(atkinson(state = "DC", year = 2020, - subgroup = c("NHoLB", "HoLB"), quiet = TRUE)) + expect_silent(atkinson( + state = 'DC', + year = 2020, + subgroup = c('NHoLB', 'HoLB'), + quiet = TRUE + )) -} -) +}) diff --git a/tests/testthat/test-bell.R b/tests/testthat/test-bell.R index 2f50bcc..f7bc18f 100644 --- a/tests/testthat/test-bell.R +++ b/tests/testthat/test-bell.R @@ -1,48 +1,94 @@ -context("bell") +context('bell') -################# +# ------------- # # bell testthat # -################# +# ------------- # -test_that("bell throws error with invalid arguments", { - +test_that('bell throws error with invalid arguments', { # 
Unavailable geography - expect_error(bell(geo_small = "zcta", state = "DC", year = 2020, - subgroup = "NHoLB", subgroup_ixn = "NHoLW", quiet = TRUE)) - expect_error(bell(geo_large = "block group", state = "DC", year = 2020, - subgroup = "NHoLB", subgroup_ixn = "NHoLW", quiet = TRUE)) + expect_error( + bell( + geo_small = 'zcta', + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + subgroup_ixn = 'NHoLW', + quiet = TRUE + ) + ) + expect_error( + bell( + geo_large = 'block group', + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + subgroup_ixn = 'NHoLW', + quiet = TRUE + ) + ) # Unavailable year - expect_error(bell(state = "DC", year = 2005, - subgroup = "NHoLB", subgroup_ixn = "NHoLW", quiet = TRUE)) + expect_error(bell( + state = 'DC', + year = 2005, + subgroup = 'NHoLB', + subgroup_ixn = 'NHoLW', + quiet = TRUE + )) # Unavailable subgroup - expect_error(bell(state = "DC", year = 2020, - subgroup = "terran", subgroup_ixn = "NHoLW", quiet = TRUE)) - expect_error(bell(state = "DC", year = 2020, - subgroup = "NHoLB", subgroup_ixn = "terran", quiet = TRUE)) + expect_error(bell( + state = 'DC', + year = 2020, + subgroup = 'terran', + subgroup_ixn = 'NHoLW', + quiet = TRUE + )) + expect_error(bell( + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + subgroup_ixn = 'terran', + quiet = TRUE + )) - skip_if(Sys.getenv("CENSUS_API_KEY") == "") + skip_if(Sys.getenv('CENSUS_API_KEY') == '') # Incorrect state - expect_error(bell(state = "AB", year = 2020, - subgroup = "NHoLB", subgroup_ixn = "NHoLW", quiet = TRUE)) + expect_error(bell( + state = 'AB', + year = 2020, + subgroup = 'NHoLB', + subgroup_ixn = 'NHoLW', + quiet = TRUE + )) -} -) +}) -test_that("bell works", { - - skip_if(Sys.getenv("CENSUS_API_KEY") == "") +test_that('bell works', { + skip_if(Sys.getenv('CENSUS_API_KEY') == '') - expect_silent(bell(state = "DC", year = 2020, - subgroup = c("NHoLB", "HoLB"), subgroup_ixn = c("NHoLW", "HoLW"))) + expect_silent(bell( + state = 'DC', + year = 2020, + subgroup = 
c('NHoLB', 'HoLB'), + subgroup_ixn = c('NHoLW', 'HoLW') + )) - expect_silent(bell(state = "DC", year = 2020, - subgroup = "NHoLB", subgroup_ixn = "NHoLW", quiet = TRUE)) + expect_silent(bell( + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + subgroup_ixn = 'NHoLW', + quiet = TRUE + )) - expect_silent(bell(state = "DC", year = 2020, - subgroup = c("NHoLB", "HoLB"), subgroup_ixn = c("NHoLW", "HoLW"), quiet = TRUE)) + expect_silent(bell( + state = 'DC', + year = 2020, + subgroup = c('NHoLB', 'HoLB'), + subgroup_ixn = c('NHoLW', 'HoLW'), + quiet = TRUE + )) -} -) +}) diff --git a/tests/testthat/test-bemanian_beyer.R b/tests/testthat/test-bemanian_beyer.R index 5a648ff..2cd0509 100644 --- a/tests/testthat/test-bemanian_beyer.R +++ b/tests/testthat/test-bemanian_beyer.R @@ -1,48 +1,104 @@ -context("bemanian_beyer") +context('bemanian_beyer') -########################### +# ----------------------- # # bemanian_beyer testthat # -########################### +# ----------------------- # -test_that("bemanian_beyer throws error with invalid arguments", { - +test_that('bemanian_beyer throws error with invalid arguments', { # Unavailable geography - expect_error(bemanian_beyer(geo_small = "zcta", state = "DC", year = 2020, - subgroup = "NHoLB", subgroup_ixn = "NHoLW", quiet = TRUE)) - expect_error(bemanian_beyer(geo_large = "block group", state = "DC", year = 2020, - subgroup = "NHoLB", subgroup_ixn = "NHoLW", quiet = TRUE)) + expect_error( + bemanian_beyer( + geo_small = 'zcta', + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + subgroup_ixn = 'NHoLW', + quiet = TRUE + ) + ) + expect_error( + bemanian_beyer( + geo_large = 'block group', + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + subgroup_ixn = 'NHoLW', + quiet = TRUE + ) + ) # Unavailable year - expect_error(bemanian_beyer(state = "DC", year = 2005, - subgroup = "NHoLB", subgroup_ixn = "NHoLW", quiet = TRUE)) + expect_error( + bemanian_beyer( + state = 'DC', + year = 2005, + subgroup = 'NHoLB', + 
subgroup_ixn = 'NHoLW', + quiet = TRUE + ) + ) # Unavailable subgroup - expect_error(bemanian_beyer(state = "DC", year = 2020, - subgroup = "terran", subgroup_ixn = "NHoLW", quiet = TRUE)) - expect_error(bemanian_beyer(state = "DC", year = 2020, - subgroup = "NHoLB", subgroup_ixn = "terran", quiet = TRUE)) + expect_error( + bemanian_beyer( + state = 'DC', + year = 2020, + subgroup = 'terran', + subgroup_ixn = 'NHoLW', + quiet = TRUE + ) + ) + expect_error( + bemanian_beyer( + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + subgroup_ixn = 'terran', + quiet = TRUE + ) + ) - skip_if(Sys.getenv("CENSUS_API_KEY") == "") + skip_if(Sys.getenv('CENSUS_API_KEY') == '') # Incorrect state - expect_error(bemanian_beyer(state = "AB", year = 2020, - subgroup = "NHoLB", subgroup_ixn = "NHoLW", quiet = TRUE)) + expect_error( + bemanian_beyer( + state = 'AB', + year = 2020, + subgroup = 'NHoLB', + subgroup_ixn = 'NHoLW', + quiet = TRUE + ) + ) -} -) +}) -test_that("bemanian_beyer works", { - - skip_if(Sys.getenv("CENSUS_API_KEY") == "") +test_that('bemanian_beyer works', { + skip_if(Sys.getenv('CENSUS_API_KEY') == '') - expect_warning(bemanian_beyer(state = "DC", year = 2020, - subgroup = c("NHoLB", "HoLB"), subgroup_ixn = c("NHoLW", "HoLW"))) + expect_warning(bemanian_beyer( + state = 'DC', + year = 2020, + subgroup = c('NHoLB', 'HoLB'), + subgroup_ixn = c('NHoLW', 'HoLW') + )) - expect_warning(bemanian_beyer(state = "DC", year = 2020, - subgroup = "NHoLB", subgroup_ixn = "NHoLW", quiet = TRUE)) + expect_warning( + bemanian_beyer( + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + subgroup_ixn = 'NHoLW', + quiet = TRUE + ) + ) - expect_warning(bemanian_beyer(state = "DC", year = 2020, - subgroup = c("NHoLB", "HoLB"), subgroup_ixn = c("NHoLW", "HoLW"), quiet = TRUE)) + expect_warning(bemanian_beyer( + state = 'DC', + year = 2020, + subgroup = c('NHoLB', 'HoLB'), + subgroup_ixn = c('NHoLW', 'HoLW'), + quiet = TRUE + )) -} -) +}) diff --git a/tests/testthat/test-bravo.R 
b/tests/testthat/test-bravo.R index 662e1d0..60efbe0 100644 --- a/tests/testthat/test-bravo.R +++ b/tests/testthat/test-bravo.R @@ -1,37 +1,68 @@ -context("bravo") +context('bravo') -################## +# -------------- # # bravo testthat # -################## +# -------------- # -test_that("bravo throws error with invalid arguments", { - +test_that('bravo throws error with invalid arguments', { # Unavailable geography - expect_error(bravo(geo = "zcta", state = "DC", year = 2020, subgroup = "LtHS", quiet = TRUE)) + expect_error(bravo( + geo = 'zcta', + state = 'DC', + year = 2020, + subgroup = 'LtHS', + quiet = TRUE + )) # Unavailable year - expect_error(bravo(state = "DC", year = 2005, subgroup = "LtHS", quiet = TRUE)) + expect_error(bravo( + state = 'DC', + year = 2005, + subgroup = 'LtHS', + quiet = TRUE + )) # Unavailable subgroup - expect_error(bravo(state = "DC", year = 2020, subgroup = "terran", quiet = TRUE)) + expect_error(bravo( + state = 'DC', + year = 2020, + subgroup = 'terran', + quiet = TRUE + )) - skip_if(Sys.getenv("CENSUS_API_KEY") == "") + skip_if(Sys.getenv('CENSUS_API_KEY') == '') # Incorrect state - expect_error(bravo(state = "AB", year = 2020, subgroup = "LtHS", quiet = TRUE)) + expect_error(bravo( + state = 'AB', + year = 2020, + subgroup = 'LtHS', + quiet = TRUE + )) -} -) +}) -test_that("bravo works", { - - skip_if(Sys.getenv("CENSUS_API_KEY") == "") +test_that('bravo works', { + skip_if(Sys.getenv('CENSUS_API_KEY') == '') - expect_output(bravo(state = "DC", year = 2009, subgroup = c("LtHS", "HSGiE"))) + expect_output(bravo( + state = 'DC', + year = 2009, + subgroup = c('LtHS', 'HSGiE') + )) - expect_silent(bravo(state = "DC", year = 2020, subgroup = "LtHS", quiet = TRUE)) + expect_silent(bravo( + state = 'DC', + year = 2020, + subgroup = 'LtHS', + quiet = TRUE + )) - expect_silent(bravo(state = "DC", year = 2020, subgroup = c("LtHS", "HSGiE"), quiet = TRUE)) + expect_silent(bravo( + state = 'DC', + year = 2020, + subgroup = c('LtHS', 
'HSGiE'), + quiet = TRUE + )) -} -) +}) diff --git a/tests/testthat/test-duncan.R b/tests/testthat/test-duncan.R index d895140..da49e0e 100644 --- a/tests/testthat/test-duncan.R +++ b/tests/testthat/test-duncan.R @@ -1,48 +1,104 @@ -context("duncan") +context('duncan') -################### +# --------------- # # duncan testthat # -################### +# --------------- # -test_that("duncan throws error with invalid arguments", { - +test_that('duncan throws error with invalid arguments', { # Unavailable geography - expect_error(duncan(geo_small = "zcta", state = "DC", year = 2020, - subgroup = "NHoLB", subgroup_ref = "NHoLW", quiet = TRUE)) - expect_error(duncan(geo_large = "block group", state = "DC", year = 2020, - subgroup = "NHoLB", subgroup_ref = "NHoLW", quiet = TRUE)) + expect_error( + duncan( + geo_small = 'zcta', + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + subgroup_ref = 'NHoLW', + quiet = TRUE + ) + ) + expect_error( + duncan( + geo_large = 'block group', + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + subgroup_ref = 'NHoLW', + quiet = TRUE + ) + ) # Unavailable year - expect_error(duncan(state = "DC", year = 2005, - subgroup = "NHoLB", subgroup_ref = "NHoLW", quiet = TRUE)) + expect_error( + duncan( + state = 'DC', + year = 2005, + subgroup = 'NHoLB', + subgroup_ref = 'NHoLW', + quiet = TRUE + ) + ) # Unavailable subgroup - expect_error(duncan(state = "DC", year = 2020, - subgroup = "terran", subgroup_ref = "NHoLW", quiet = TRUE)) - expect_error(duncan(state = "DC", year = 2020, - subgroup = "NHoLB", subgroup_ref = "terran", quiet = TRUE)) + expect_error( + duncan( + state = 'DC', + year = 2020, + subgroup = 'terran', + subgroup_ref = 'NHoLW', + quiet = TRUE + ) + ) + expect_error( + duncan( + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + subgroup_ref = 'terran', + quiet = TRUE + ) + ) - skip_if(Sys.getenv("CENSUS_API_KEY") == "") + skip_if(Sys.getenv('CENSUS_API_KEY') == '') # Incorrect state - expect_error(duncan(state = "AB", 
year = 2020, - subgroup = "NHoLB", subgroup_ref = "NHoLW", quiet = TRUE)) + expect_error( + duncan( + state = 'AB', + year = 2020, + subgroup = 'NHoLB', + subgroup_ref = 'NHoLW', + quiet = TRUE + ) + ) -} -) +}) -test_that("duncan works", { - - skip_if(Sys.getenv("CENSUS_API_KEY") == "") +test_that('duncan works', { + skip_if(Sys.getenv('CENSUS_API_KEY') == '') - expect_silent(duncan(state = "DC", year = 2020, - subgroup = c("NHoLB", "HoLB"), subgroup_ref = c("NHoLW", "HoLW"))) + expect_silent(duncan( + state = 'DC', + year = 2020, + subgroup = c('NHoLB', 'HoLB'), + subgroup_ref = c('NHoLW', 'HoLW') + )) - expect_silent(duncan(state = "DC", year = 2020, - subgroup = "NHoLB", subgroup_ref = "NHoLW", quiet = TRUE)) + expect_silent( + duncan( + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + subgroup_ref = 'NHoLW', + quiet = TRUE + ) + ) - expect_silent(duncan(state = "DC", year = 2020, - subgroup = c("NHoLB", "HoLB"), subgroup_ref = c("NHoLW", "HoLW"), quiet = TRUE)) + expect_silent(duncan( + state = 'DC', + year = 2020, + subgroup = c('NHoLB', 'HoLB'), + subgroup_ref = c('NHoLW', 'HoLW'), + quiet = TRUE + )) -} -) +}) diff --git a/tests/testthat/test-gini.R b/tests/testthat/test-gini.R index 957d982..e4b7b23 100644 --- a/tests/testthat/test-gini.R +++ b/tests/testthat/test-gini.R @@ -1,35 +1,49 @@ -context("gini") +context('gini') -################# +# ------------- # # gini testthat # -################# +# ------------- # -test_that("gini throws error with invalid arguments", { - +test_that('gini throws error with invalid arguments', { # Unavailable geography - expect_error(gini(geo = "zcta", state = "DC", year = 2020, quiet = TRUE)) + expect_error(gini( + geo = 'zcta', + state = 'DC', + year = 2020, + quiet = TRUE + )) # Unavailable year - expect_error(gini(state = "DC", year = 2005, quiet = TRUE)) + expect_error(gini( + state = 'DC', + year = 2005, + quiet = TRUE + )) - skip_if(Sys.getenv("CENSUS_API_KEY") == "") + skip_if(Sys.getenv('CENSUS_API_KEY') == '') 
# Incorrect state - expect_error(gini(state = "AB", year = 2020)) + expect_error(gini(state = 'AB', year = 2020)) # Unavailable geography for DC (only 1 'county' in DC so, alone, NDI cannot be computed) - expect_error(gini(geo = "county", state = "DC", year = 2009, quiet = TRUE)) - -} -) + expect_error(gini( + geo = 'county', + state = 'DC', + year = 2009, + quiet = TRUE + )) + +}) -test_that("gini works", { - - skip_if(Sys.getenv("CENSUS_API_KEY") == "") +test_that('gini works', { + skip_if(Sys.getenv('CENSUS_API_KEY') == '') - expect_message(gini(state = "DC", year = 2020)) + expect_message(gini(state = 'DC', year = 2020)) - expect_silent(gini(state = "DC", year = 2020, quiet = TRUE)) + expect_silent(gini( + state = 'DC', + year = 2020, + quiet = TRUE + )) -} -) +}) diff --git a/tests/testthat/test-hoover.R b/tests/testthat/test-hoover.R new file mode 100644 index 0000000..994f7d2 --- /dev/null +++ b/tests/testthat/test-hoover.R @@ -0,0 +1,77 @@ +context('hoover') + +# --------------- # +# hoover testthat # +# --------------- # + +test_that('hoover throws error with invalid arguments', { + # Unavailable geography + expect_error(hoover( + geo_small = 'zcta', + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + quiet = TRUE + )) + expect_error( + hoover( + geo_large = 'block group', + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + quiet = TRUE + ) + ) + + # Unavailable year + expect_error(hoover( + state = 'DC', + year = 2005, + subgroup = 'NHoLB', + quiet = TRUE + )) + + # Unavailable subgroup + expect_error(hoover( + state = 'DC', + year = 2020, + subgroup = 'terran', + quiet = TRUE + )) + + skip_if(Sys.getenv('CENSUS_API_KEY') == '') + + # Incorrect state + expect_error(hoover( + state = 'AB', + year = 2020, + subgroup = 'NHoLB', + quiet = TRUE + )) + +}) + +test_that('hoover works', { + skip_if(Sys.getenv('CENSUS_API_KEY') == '') + + expect_silent(hoover( + state = 'DC', + year = 2020, + subgroup = c('NHoLB', 'HoLB') + )) + + expect_silent(hoover( + 
state = 'DC', + year = 2020, + subgroup = 'NHoLB', + quiet = TRUE + )) + + expect_silent(hoover( + state = 'DC', + year = 2020, + subgroup = c('NHoLB', 'HoLB'), + quiet = TRUE + )) + +}) diff --git a/tests/testthat/test-krieger.R b/tests/testthat/test-krieger.R index c2727e9..5d20b63 100644 --- a/tests/testthat/test-krieger.R +++ b/tests/testthat/test-krieger.R @@ -1,32 +1,45 @@ -context("krieger") +context('krieger') -#################### +# ---------------- # # krieger testthat # -#################### +# ---------------- # -test_that(" throws error with invalid arguments", { - +test_that(' throws error with invalid arguments', { # Unavailable geography - expect_error(krieger(geo = "zcta", state = "DC", year = 2020, quiet = TRUE)) + expect_error(krieger( + geo = 'zcta', + state = 'DC', + year = 2020, + quiet = TRUE + )) # Unavailable year - expect_error(krieger(state = "DC", year = 2005, quiet = TRUE)) + expect_error(krieger( + state = 'DC', + year = 2005, + quiet = TRUE + )) - skip_if(Sys.getenv("CENSUS_API_KEY") == "") + skip_if(Sys.getenv('CENSUS_API_KEY') == '') # Incorrect state - expect_error(krieger(state = "AB", year = 2020, quiet = TRUE)) + expect_error(krieger( + state = 'AB', + year = 2020, + quiet = TRUE + )) -} -) +}) -test_that("krieger works", { - - skip_if(Sys.getenv("CENSUS_API_KEY") == "") +test_that('krieger works', { + skip_if(Sys.getenv('CENSUS_API_KEY') == '') - expect_silent(krieger(state = "DC", year = 2020)) + expect_silent(krieger(state = 'DC', year = 2020)) - expect_silent(krieger(state = "DC", year = 2020, quiet = TRUE)) + expect_silent(krieger( + state = 'DC', + year = 2020, + quiet = TRUE + )) -} -) +}) diff --git a/tests/testthat/test-messer.R b/tests/testthat/test-messer.R index 2193965..698100a 100644 --- a/tests/testthat/test-messer.R +++ b/tests/testthat/test-messer.R @@ -1,46 +1,77 @@ -context("messer") +context('messer') -################### +# --------------- # # messer testthat # -################### +# --------------- # 
-test_that("messer throws error with invalid arguments", { - +test_that('messer throws error with invalid arguments', { # Not a data.frame or tibble for `df` - expect_error(messer(df = c("a", "b", "c"))) + expect_error(messer(df = c('a', 'b', 'c'))) # Unavailable geography - expect_error(messer(geo = "zcta", state = "DC", year = 2020, quiet = TRUE)) + expect_error(messer( + geo = 'zcta', + state = 'DC', + year = 2020, + quiet = TRUE + )) # Unavailable year - expect_error(messer(state = "DC", year = 2005, quiet = TRUE)) + expect_error(messer( + state = 'DC', + year = 2005, + quiet = TRUE + )) - skip_if(Sys.getenv("CENSUS_API_KEY") == "") + skip_if(Sys.getenv('CENSUS_API_KEY') == '') # Incorrect state - expect_error(messer(state = "AB", year = 2020, quiet = TRUE)) + expect_error(messer( + state = 'AB', + year = 2020, + quiet = TRUE + )) # Unavailable geography for DC (only 1 'county' in DC so, alone, NDI cannot be computed) - expect_error(messer(geo = "county", state = "DC", year = 2009, quiet = TRUE)) + expect_error(messer( + geo = 'county', + state = 'DC', + year = 2009, + quiet = TRUE + )) -} -) +}) -test_that("messer works", { +test_that('messer works', { + expect_message(messer(df = DCtracts2020[,-c(2, 11:ncol(DCtracts2020))])) - expect_message(messer(df = DCtracts2020[, -c(2, 11:ncol(DCtracts2020))])) + skip_if(Sys.getenv('CENSUS_API_KEY') == '') - skip_if(Sys.getenv("CENSUS_API_KEY") == "") + expect_message(messer(state = 'DC', year = 2020)) - expect_message(messer(state = "DC", year = 2020)) - - expect_message(messer(state = "DC", year = 2020, round_output = TRUE)) + expect_message(messer( + state = 'DC', + year = 2020, + round_output = TRUE + )) - expect_message(messer(state = "DC", year = 2020, imp = TRUE)) + expect_message(messer( + state = 'DC', + year = 2020, + imp = TRUE + )) - expect_silent(messer(state = "DC", year = 2020, quiet = TRUE)) + expect_silent(messer( + state = 'DC', + year = 2020, + quiet = TRUE + )) - expect_silent(messer(state = "DC", 
year = 2020, imp = TRUE, quiet = TRUE)) + expect_silent(messer( + state = 'DC', + year = 2020, + imp = TRUE, + quiet = TRUE + )) -} -) +}) diff --git a/tests/testthat/test-powell_wiley.R b/tests/testthat/test-powell_wiley.R index 1ff4de4..97f509f 100644 --- a/tests/testthat/test-powell_wiley.R +++ b/tests/testthat/test-powell_wiley.R @@ -1,46 +1,77 @@ -context("powell_wiley") +context('powell_wiley') -######################### +# --------------------- # # powell_wiley testthat # -######################### +# --------------------- # -test_that("powell_wiley throws error with invalid arguments", { - +test_that('powell_wiley throws error with invalid arguments', { # Not a data.frame or tibble for `df` - expect_error(powell_wiley(df = c("a", "b", "c"))) + expect_error(powell_wiley(df = c('a', 'b', 'c'))) # Unavailable geography - expect_error(powell_wiley(geo = "zcta", state = "DC", year = 2020, quiet = TRUE)) + expect_error(powell_wiley( + geo = 'zcta', + state = 'DC', + year = 2020, + quiet = TRUE + )) # Unavailable year - expect_error(powell_wiley(state = "DC", year = 2005, quiet = TRUE)) + expect_error(powell_wiley( + state = 'DC', + year = 2005, + quiet = TRUE + )) - skip_if(Sys.getenv("CENSUS_API_KEY") == "") + skip_if(Sys.getenv('CENSUS_API_KEY') == '') # Incorrect state - expect_error(powell_wiley(state = "AB", year = 2020, quiet = TRUE)) + expect_error(powell_wiley( + state = 'AB', + year = 2020, + quiet = TRUE + )) # Unavailable geography for DC (only 1 'county' in DC so, alone, NDI cannot be computed) - expect_error(powell_wiley(geo = "county", state = "DC", year = 2009, quiet = TRUE)) - -} -) - -test_that("powell_wiley works", { + expect_error(powell_wiley( + geo = 'county', + state = 'DC', + year = 2009, + quiet = TRUE + )) - expect_message(powell_wiley(df = DCtracts2020[ , -c(3:10)])) +}) + +test_that('powell_wiley works', { + expect_message(powell_wiley(df = DCtracts2020[,-c(3:10)])) - skip_if(Sys.getenv("CENSUS_API_KEY") == "") + 
skip_if(Sys.getenv('CENSUS_API_KEY') == '') - expect_message(powell_wiley(state = "DC", year = 2020)) + expect_message(powell_wiley(state = 'DC', year = 2020)) - expect_message(powell_wiley(state = "DC", year = 2020, round_output = TRUE)) + expect_message(powell_wiley( + state = 'DC', + year = 2020, + round_output = TRUE + )) - expect_message(powell_wiley(state = "DC", year = 2020, imp = TRUE)) + expect_message(powell_wiley( + state = 'DC', + year = 2020, + imp = TRUE + )) - expect_silent(powell_wiley(state = "DC", year = 2020, quiet = TRUE)) + expect_silent(powell_wiley( + state = 'DC', + year = 2020, + quiet = TRUE + )) - expect_silent(powell_wiley(state = "DC", year = 2020, imp = TRUE, quiet = TRUE)) + expect_silent(powell_wiley( + state = 'DC', + year = 2020, + imp = TRUE, + quiet = TRUE + )) -} -) +}) diff --git a/tests/testthat/test-sudano.R b/tests/testthat/test-sudano.R index 8ecc335..c929055 100644 --- a/tests/testthat/test-sudano.R +++ b/tests/testthat/test-sudano.R @@ -1,46 +1,77 @@ -context("sudano") +context('sudano') -################### +# --------------- # # sudano testthat # -################### +# --------------- # -test_that("sudano throws error with invalid arguments", { - +test_that('sudano throws error with invalid arguments', { # Unavailable geography - expect_error(sudano(geo_small = "zcta", state = "DC", year = 2020, - subgroup = "NHoLB", quiet = TRUE)) - expect_error(sudano(geo_large = "block group", state = "DC", year = 2020, - subgroup = "NHoLB", quiet = TRUE)) + expect_error(sudano( + geo_small = 'zcta', + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + quiet = TRUE + )) + expect_error( + sudano( + geo_large = 'block group', + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + quiet = TRUE + ) + ) # Unavailable year - expect_error(sudano(state = "DC", year = 2005, - subgroup = "NHoLB", quiet = TRUE)) + expect_error(sudano( + state = 'DC', + year = 2005, + subgroup = 'NHoLB', + quiet = TRUE + )) # Unavailable subgroup - 
expect_error(sudano(state = "DC", year = 2020, - subgroup = "terran", quiet = TRUE)) + expect_error(sudano( + state = 'DC', + year = 2020, + subgroup = 'terran', + quiet = TRUE + )) - skip_if(Sys.getenv("CENSUS_API_KEY") == "") + skip_if(Sys.getenv('CENSUS_API_KEY') == '') # Incorrect state - expect_error(sudano(state = "AB", year = 2020, - subgroup = "NHoLB", quiet = TRUE)) + expect_error(sudano( + state = 'AB', + year = 2020, + subgroup = 'NHoLB', + quiet = TRUE + )) -} -) +}) -test_that("sudano works", { - - skip_if(Sys.getenv("CENSUS_API_KEY") == "") +test_that('sudano works', { + skip_if(Sys.getenv('CENSUS_API_KEY') == '') - expect_silent(sudano(state = "DC", year = 2020, - subgroup = c("NHoLB", "HoLB"))) + expect_silent(sudano( + state = 'DC', + year = 2020, + subgroup = c('NHoLB', 'HoLB') + )) - expect_silent(sudano(state = "DC", year = 2020, - subgroup = "NHoLB", quiet = TRUE)) + expect_silent(sudano( + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + quiet = TRUE + )) - expect_silent(sudano(state = "DC", year = 2020, - subgroup = c("NHoLB", "HoLB"), quiet = TRUE)) + expect_silent(sudano( + state = 'DC', + year = 2020, + subgroup = c('NHoLB', 'HoLB'), + quiet = TRUE + )) -} -) +}) diff --git a/tests/testthat/test-white.R b/tests/testthat/test-white.R index 07e618d..0e85449 100644 --- a/tests/testthat/test-white.R +++ b/tests/testthat/test-white.R @@ -1,46 +1,77 @@ -context("white") +context('white') -################## +# -------------- # # white testthat # -################## +# -------------- # -test_that("white throws error with invalid arguments", { - +test_that('white throws error with invalid arguments', { # Unavailable geography - expect_error(white(geo_small = "zcta", state = "DC", year = 2020, - subgroup = "NHoLB", quiet = TRUE)) - expect_error(white(geo_large = "block group", state = "DC", year = 2020, - subgroup = "NHoLB", quiet = TRUE)) + expect_error(white( + geo_small = 'zcta', + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + quiet = 
TRUE + )) + expect_error( + white( + geo_large = 'block group', + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + quiet = TRUE + ) + ) # Unavailable year - expect_error(white(state = "DC", year = 2005, - subgroup = "NHoLB", quiet = TRUE)) + expect_error(white( + state = 'DC', + year = 2005, + subgroup = 'NHoLB', + quiet = TRUE + )) # Unavailable subgroup - expect_error(white(state = "DC", year = 2020, - subgroup = "terran", quiet = TRUE)) + expect_error(white( + state = 'DC', + year = 2020, + subgroup = 'terran', + quiet = TRUE + )) - skip_if(Sys.getenv("CENSUS_API_KEY") == "") + skip_if(Sys.getenv('CENSUS_API_KEY') == '') # Incorrect state - expect_error(white(state = "AB", year = 2020, - subgroup = "NHoLB", quiet = TRUE)) + expect_error(white( + state = 'AB', + year = 2020, + subgroup = 'NHoLB', + quiet = TRUE + )) -} -) +}) -test_that("white works", { - - skip_if(Sys.getenv("CENSUS_API_KEY") == "") +test_that('white works', { + skip_if(Sys.getenv('CENSUS_API_KEY') == '') - expect_silent(white(state = "DC", year = 2020, - subgroup = c("NHoLB", "HoLB"))) + expect_silent(white( + state = 'DC', + year = 2020, + subgroup = c('NHoLB', 'HoLB') + )) - expect_silent(white(state = "DC", year = 2020, - subgroup = "NHoLB", quiet = TRUE)) + expect_silent(white( + state = 'DC', + year = 2020, + subgroup = 'NHoLB', + quiet = TRUE + )) - expect_silent(white(state = "DC", year = 2020, - subgroup = c("NHoLB", "HoLB"), quiet = TRUE)) + expect_silent(white( + state = 'DC', + year = 2020, + subgroup = c('NHoLB', 'HoLB'), + quiet = TRUE + )) -} -) +}) diff --git a/vignettes/vignette.Rmd b/vignettes/vignette.Rmd index 1cdfcc0..71315e2 100644 --- a/vignettes/vignette.Rmd +++ b/vignettes/vignette.Rmd @@ -1,7 +1,7 @@ --- -title: "ndi: Neighborhood Deprivation Indices" +title: 'ndi: Neighborhood Deprivation Indices' author: 'Ian D. 
Buller (GitHub: @idblr)' -date: "`r Sys.Date()`" +date: '`r Sys.Date()`' output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{ndi: Neighborhood Deprivation Indices} @@ -11,13 +11,13 @@ vignette: > ```{r setup, include = FALSE} library(knitr) -knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE, cache = FALSE, fig.show = "hold") +knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE, cache = FALSE, fig.show = 'hold') ``` Start with the necessary packages for the vignette. ```{r packages, results = 'hide'} -loadedPackages <- c("dplyr", "ggplot2", "ndi", "tidycensus", "tigris") +loadedPackages <- c('dplyr', 'ggplot2', 'ndi', 'tidycensus', 'tigris') invisible(lapply(loadedPackages, library, character.only = TRUE)) options(tigris_use_cache = TRUE) ``` @@ -25,12 +25,12 @@ options(tigris_use_cache = TRUE) Set your U.S. Census Bureau access key. Follow [this link](http://api.census.gov/data/key_signup.html) to obtain one. Specify your access key in the `messer()` or `powell_wiley()` functions using the `key` argument of the `get_acs()` function from the `tidycensus` package called within each or by using the `census_api_key()` function from the `tidycensus` package before running the `messer()` or `powell_wiley()` functions (see an example of the latter below). ```{r access_key_private, echo = FALSE} -source("../dev/private_key.R") -tidycensus::census_api_key(private_key) +source(file.path('..', 'dev', 'private_key.R')) +census_api_key(private_key) ``` ```{r access_key_public, eval = FALSE} -tidycensus::census_api_key("...") # INSERT YOUR OWN KEY FROM U.S. CENSUS API +census_api_key('...') # INSERT YOUR OWN KEY FROM U.S. 
CENSUS API ``` ### Compute NDI (Messer) @@ -49,7 +49,7 @@ Compute the NDI (Messer) values (2006-2010 5-year ACS) for Georgia, U.S.A., cens | EMP | Employment | B23001 (2010 only); B23025 (2011 onward) | Percent unemployed | ```{r messer, results = 'hide'} -messer2010GA <- ndi::messer(state = "GA", year = 2010, round_output = TRUE) +messer2010GA <- messer(state = 'GA', year = 2010, round_output = TRUE) ``` One output from the `messer()` function is a tibble containing the identification, geographic name, NDI (Messer) values, and raw census characteristics for each tract. @@ -73,106 +73,134 @@ messer2010GA$missing We can visualize the NDI (Messer) values geographically by linking them to spatial information from the `tigris` package and plotting with the `ggplot2` package suite. ```{r messer_prep, results = 'hide'} -# Obtain the 2010 counties from the "tigris" package -county2010GA <- tigris::counties(state = "GA", year = 2010, cb = TRUE) +# Obtain the 2010 counties from the 'tigris' package +county2010GA <- counties(state = 'GA', year = 2010, cb = TRUE) # Remove first 9 characters from GEOID for compatibility with tigris information county2010GA$GEOID <- substring(county2010GA$GEO_ID, 10) -# Obtain the 2010 census tracts from the "tigris" package -tract2010GA <- tigris::tracts(state = "GA", year = 2010, cb = TRUE) +# Obtain the 2010 census tracts from the 'tigris' package +tract2010GA <- tracts(state = 'GA', year = 2010, cb = TRUE) # Remove first 9 characters from GEOID for compatibility with tigris information tract2010GA$GEOID <- substring(tract2010GA$GEO_ID, 10) # Join the NDI (Messer) values to the census tract geometry -GA2010messer <- dplyr::left_join(tract2010GA, messer2010GA$ndi, by = "GEOID") +GA2010messer <- tract2010GA %>% + left_join(messer2010GA$ndi, by = 'GEOID') ``` ```{r messer_plot, fig.height = 7, fig.width = 7} # Visualize the NDI (Messer) values (2006-2010 5-year ACS) for Georgia, U.S.A., census tracts ## Continuous Index -ggplot2::ggplot() + - 
ggplot2::geom_sf(data = GA2010messer, - ggplot2::aes(fill = NDI), - size = 0.05, - color = "transparent") + - ggplot2::geom_sf(data = county2010GA, - fill = "transparent", - color = "white", - size = 0.2) + - ggplot2::theme_minimal() + - ggplot2::scale_fill_viridis_c() + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. Census ACS 2006-2010 estimates") + - ggplot2::ggtitle("Neighborhood Deprivation Index (Messer)", - subtitle = "GA census tracts as the referent") +ggplot() + + geom_sf( + data = GA2010messer, + aes(fill = NDI), + size = 0.05, + color = 'transparent' + ) + + geom_sf( + data = county2010GA, + fill = 'transparent', + color = 'white', + size = 0.2 + ) + + theme_minimal() + + scale_fill_viridis_c() + + labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') + + ggtitle( + 'Neighborhood Deprivation Index (Messer)', + subtitle = 'GA census tracts as the referent' + ) ## Categorical Index -### Rename "9-NDI not avail" level as NA for plotting -GA2010messer$NDIQuartNA <- factor(replace(as.character(GA2010messer$NDIQuart), - GA2010messer$NDIQuart == "9-NDI not avail", NA), - c(levels(GA2010messer$NDIQuart)[-5], NA)) - -ggplot2::ggplot() + - ggplot2::geom_sf(data = GA2010messer, - ggplot2::aes(fill = NDIQuartNA), - size = 0.05, - color = "transparent") + - ggplot2::geom_sf(data = county2010GA, - fill = "transparent", - color = "white", - size = 0.2) + - ggplot2::theme_minimal() + - ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE), - na.value = "grey80") + - ggplot2::labs(fill = "Index (Categorical)", - caption = "Source: U.S. 
Census ACS 2006-2010 estimates") + - ggplot2::ggtitle("Neighborhood Deprivation Index (Messer) Quartiles", - subtitle = "GA census tracts as the referent") +### Rename '9-NDI not avail' level as NA for plotting +GA2010messer$NDIQuartNA <- + factor( + replace( + as.character(GA2010messer$NDIQuart), + GA2010messer$NDIQuart == '9-NDI not avail', + NA + ), + c(levels(GA2010messer$NDIQuart)[-5], NA) + ) + +ggplot() + + geom_sf( + data = GA2010messer, + aes(fill = NDIQuartNA), + size = 0.05, + color = 'transparent' + ) + + geom_sf( + data = county2010GA, + fill = 'transparent', + color = 'white', + size = 0.2 + ) + + theme_minimal() + + scale_fill_viridis_d(guide = guide_legend(reverse = TRUE), na.value = 'grey80') + + labs(fill = 'Index (Categorical)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') + + ggtitle( + 'Neighborhood Deprivation Index (Messer) Quartiles', + subtitle = 'GA census tracts as the referent' + ) ``` The results above are at the tract level. The NDI (Messer) values can also be calculated at the county level. ```{r messer_county_prep, results = 'hide'} -messer2010GA_county <- ndi::messer(geo = "county", state = "GA", year = 2010) +messer2010GA_county <- messer(geo = 'county', state = 'GA', year = 2010) # Join the NDI (Messer) values to the county geometry -GA2010messer_county <- dplyr::left_join(county2010GA, messer2010GA_county$ndi, by = "GEOID") +GA2010messer_county <- county2010GA %>% + left_join(messer2010GA_county$ndi, by = 'GEOID') ``` ```{r messer_county_plot, fig.height = 7, fig.width = 7} # Visualize the NDI (Messer) values (2006-2010 5-year ACS) for Georgia, U.S.A., counties ## Continuous Index -ggplot2::ggplot() + - ggplot2::geom_sf(data = GA2010messer_county, - ggplot2::aes(fill = NDI), - size = 0.20, - color = "white") + - ggplot2::theme_minimal() + - ggplot2::scale_fill_viridis_c() + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. 
Census ACS 2006-2010 estimates") + - ggplot2::ggtitle("Neighborhood Deprivation Index (Messer)", - subtitle = "GA counties as the referent") +ggplot() + + geom_sf( + data = GA2010messer_county, + aes(fill = NDI), + size = 0.20, + color = 'white' + ) + + theme_minimal() + + scale_fill_viridis_c() + + labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') + + ggtitle( + 'Neighborhood Deprivation Index (Messer)', + subtitle = 'GA counties as the referent' + ) ## Categorical Index -### Rename "9-NDI not avail" level as NA for plotting -GA2010messer_county$NDIQuartNA <- factor(replace(as.character(GA2010messer_county$NDIQuart), - GA2010messer_county$NDIQuart == "9-NDI not avail", NA), - c(levels(GA2010messer_county$NDIQuart)[-5], NA)) - -ggplot2::ggplot() + - ggplot2::geom_sf(data = GA2010messer_county, - ggplot2::aes(fill = NDIQuartNA), - size = 0.20, - color = "white") + - ggplot2::theme_minimal() + - ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE), - na.value = "grey80") + - ggplot2::labs(fill = "Index (Categorical)", - caption = "Source: U.S. Census ACS 2006-2010 estimates") + - ggplot2::ggtitle("Neighborhood Deprivation Index (Messer) Quartiles", - subtitle = "GA counties as the referent") +### Rename '9-NDI not avail' level as NA for plotting +GA2010messer_county$NDIQuartNA <- + factor( + replace( + as.character(GA2010messer_county$NDIQuart), + GA2010messer_county$NDIQuart == '9-NDI not avail', + NA + ), + c(levels(GA2010messer_county$NDIQuart)[-5], NA) + ) + +ggplot() + + geom_sf( + data = GA2010messer_county, + aes(fill = NDIQuartNA), + size = 0.20, + color = 'white' + ) + + theme_minimal() + + scale_fill_viridis_d(guide = guide_legend(reverse = TRUE), na.value = 'grey80') + + labs(fill = 'Index (Categorical)', caption = 'Source: U.S. 
Census ACS 2006-2010 estimates') + + ggtitle( + 'Neighborhood Deprivation Index (Messer) Quartiles', + subtitle = 'GA counties as the referent' + ) ``` ### Compute NDI (Powell-Wiley) @@ -198,7 +226,11 @@ Compute the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for Maryland, Virgi More information about the [codebook](https://gis.cancer.gov/research/NeighDeprvIndex_Methods.pdf) and [computation](https://gis.cancer.gov/research/NeighDeprvIndex_Methods.pdf) of the NDI (Powell-Wiley) can be found on a [GIS Portal for Cancer Research](https://gis.cancer.gov/research/files.html#soc-dep) website. ```{r powell_wiley, results = 'hide'} -powell_wiley2020DMVW <- ndi::powell_wiley(state = c("DC", "MD", "VA", "WV"), year = 2020, round_output = TRUE) +powell_wiley2020DMVW <- powell_wiley( + state = c('DC', 'MD', 'VA', 'WV'), + year = 2020, + round_output = TRUE +) ``` One output from the `powell_wiley()` function is a tibble containing the identification, geographic name, NDI (Powell-Wiley) values, and raw census characteristics for each tract. @@ -228,197 +260,242 @@ powell_wiley2020DMVW$cronbach We can visualize the NDI (Powell-Wiley) values geographically by linking them to spatial information from the `tigris` package and plotting with the `ggplot2` package suite. 
```{r powell_wiley_prep, results = 'hide'} -# Obtain the 2020 counties from the "tigris" package -county2020 <- tigris::counties(cb = TRUE) -county2020DMVW <- county2020[county2020$STUSPS %in% c("DC", "MD", "VA", "WV"), ] - -# Obtain the 2020 census tracts from the "tigris" package -tract2020D <- tigris::tracts(state = "DC", year = 2020, cb = TRUE) -tract2020M <- tigris::tracts(state = "MD", year = 2020, cb = TRUE) -tract2020V <- tigris::tracts(state = "VA", year = 2020, cb = TRUE) -tract2020W <- tigris::tracts(state = "WV", year = 2020, cb = TRUE) +# Obtain the 2020 counties from the 'tigris' package +county2020 <- counties(cb = TRUE) +county2020DMVW <- county2020[county2020$STUSPS %in% c('DC', 'MD', 'VA', 'WV'), ] + +# Obtain the 2020 census tracts from the 'tigris' package +tract2020D <- tracts(state = 'DC', year = 2020, cb = TRUE) +tract2020M <- tracts(state = 'MD', year = 2020, cb = TRUE) +tract2020V <- tracts(state = 'VA', year = 2020, cb = TRUE) +tract2020W <- tracts(state = 'WV', year = 2020, cb = TRUE) tracts2020DMVW <- rbind(tract2020D, tract2020M, tract2020V, tract2020W) # Join the NDI (Powell-Wiley) values to the census tract geometry -DMVW2020pw <- dplyr::left_join(tracts2020DMVW, powell_wiley2020DMVW$ndi, by = "GEOID") +DMVW2020pw <- tracts2020DMVW %>% + left_join(powell_wiley2020DMVW$ndi, by = 'GEOID') ``` ```{r powell_wiley_plot, fig.height = 4, fig.width = 7} # Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS) ## Maryland, Virginia, Washington, D.C., and West Virginia, U.S.A., census tracts ## Continuous Index -ggplot2::ggplot() + - ggplot2::geom_sf(data = DMVW2020pw, - ggplot2::aes(fill = NDI), - color = NA) + - ggplot2::geom_sf(data = county2020DMVW, - fill = "transparent", - color = "white") + - ggplot2::theme_minimal() + - ggplot2::scale_fill_viridis_c(na.value = "grey80") + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. 
Census ACS 2016-2020 estimates")+ - ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley)", - subtitle = "DC, MD, VA, and WV tracts as the referent") +ggplot() + + geom_sf( + data = DMVW2020pw, + aes(fill = NDI), + color = NA + ) + + geom_sf( + data = county2020DMVW, + fill = 'transparent', + color = 'white' + ) + + theme_minimal() + + scale_fill_viridis_c(na.value = 'grey80') + + labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2016-2020 estimates') + + ggtitle( + 'Neighborhood Deprivation Index (Powell-Wiley)', + subtitle = 'DC, MD, VA, and WV tracts as the referent' + ) ## Categorical Index (Population-weighted quintiles) -### Rename "9-NDI not avail" level as NA for plotting -DMVW2020pw$NDIQuintNA <- factor(replace(as.character(DMVW2020pw$NDIQuint), - DMVW2020pw$NDIQuint == "9-NDI not avail", NA), - c(levels(DMVW2020pw$NDIQuint)[-6], NA)) - -ggplot2::ggplot() + - ggplot2::geom_sf(data = DMVW2020pw, - ggplot2::aes(fill = NDIQuintNA), - color = NA) + - ggplot2::geom_sf(data = county2020DMVW, - fill = "transparent", - color = "white") + - ggplot2::theme_minimal() + - ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE), - na.value = "grey80") + - ggplot2::labs(fill = "Index (Categorical)", - caption = "Source: U.S. 
Census ACS 2016-2020 estimates")+ - ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Population-weighted Quintiles", - subtitle = "DC, MD, VA, and WV tracts as the referent") +### Rename '9-NDI not avail' level as NA for plotting +DMVW2020pw$NDIQuintNA <- + factor(replace( + as.character(DMVW2020pw$NDIQuint), + DMVW2020pw$NDIQuint == '9-NDI not avail', + NA + ), + c(levels(DMVW2020pw$NDIQuint)[-6], NA)) + +ggplot() + + geom_sf(data = DMVW2020pw, aes(fill = NDIQuintNA), color = NA) + + geom_sf(data = county2020DMVW, fill = 'transparent', color = 'white') + + theme_minimal() + + scale_fill_viridis_d(guide = guide_legend(reverse = TRUE), na.value = 'grey80') + + labs(fill = 'Index (Categorical)', caption = 'Source: U.S. Census ACS 2016-2020 estimates') + + ggtitle( + 'Neighborhood Deprivation Index (Powell-Wiley) Population-weighted Quintiles', + subtitle = 'DC, MD, VA, and WV tracts as the referent' + ) ``` Like the NDI (Messer), we also compute county-level NDI (Powell-Wiley). 
```{r powell_wiley_county_prep, results = 'hide'} -# Obtain the 2020 counties from the "tigris" package -county2020DMVW <- tigris::counties(state = c("DC", "MD", "VA", "WV"), year = 2020, cb = TRUE) +# Obtain the 2020 counties from the 'tigris' package +county2020DMVW <- counties(state = c('DC', 'MD', 'VA', 'WV'), year = 2020, cb = TRUE) # NDI (Powell-Wiley) at the county level (2016-2020) -powell_wiley2020DMVW_county <- ndi::powell_wiley(geo = "county", - state = c("DC", "MD", "VA", "WV"), - year = 2020) +powell_wiley2020DMVW_county <- powell_wiley( + geo = 'county', + state = c('DC', 'MD', 'VA', 'WV'), + year = 2020 +) # Join the NDI (Powell-Wiley) values to the county geometry -DMVW2020pw_county <- dplyr::left_join(county2020DMVW, powell_wiley2020DMVW_county$ndi, by = "GEOID") +DMVW2020pw_county <- county2020DMVW %>% + left_join(powell_wiley2020DMVW_county$ndi, by = 'GEOID') ``` ```{r powell_wiley_county_plot, fig.height = 4, fig.width = 7} # Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS) ## Maryland, Virginia, Washington, D.C., and West Virginia, U.S.A., counties ## Continuous Index -ggplot2::ggplot() + - ggplot2::geom_sf(data = DMVW2020pw_county, - ggplot2::aes(fill = NDI), - size = 0.20, - color = "white") + - ggplot2::theme_minimal() + - ggplot2::scale_fill_viridis_c() + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. Census ACS 2016-2020 estimates") + - ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley)", - subtitle = "DC, MD, VA, and WV counties as the referent") +ggplot() + + geom_sf( + data = DMVW2020pw_county, + aes(fill = NDI), + size = 0.20, + color = 'white' + ) + + theme_minimal() + + scale_fill_viridis_c() + + labs(fill = 'Index (Continuous)', caption = 'Source: U.S. 
Census ACS 2016-2020 estimates') + + ggtitle( + 'Neighborhood Deprivation Index (Powell-Wiley)', + subtitle = 'DC, MD, VA, and WV counties as the referent' + ) ## Categorical Index -### Rename "9-NDI not avail" level as NA for plotting -DMVW2020pw_county$NDIQuintNA <- factor(replace(as.character(DMVW2020pw_county$NDIQuint), - DMVW2020pw_county$NDIQuint == "9-NDI not avail", NA), - c(levels(DMVW2020pw_county$NDIQuint)[-6], NA)) - -ggplot2::ggplot() + - ggplot2::geom_sf(data = DMVW2020pw_county, - ggplot2::aes(fill = NDIQuint), - size = 0.20, - color = "white") + - ggplot2::theme_minimal() + - ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE), - na.value = "grey80") + - ggplot2::labs(fill = "Index (Categorical)", - caption = "Source: U.S. Census ACS 2016-2020 estimates") + - ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Population-weighted Quintiles", - subtitle = "DC, MD, VA, and WV counties as the referent") +### Rename '9-NDI not avail' level as NA for plotting +DMVW2020pw_county$NDIQuintNA <- + factor( + replace( + as.character(DMVW2020pw_county$NDIQuint), + DMVW2020pw_county$NDIQuint == '9-NDI not avail', + NA + ), + c(levels(DMVW2020pw_county$NDIQuint)[-6], NA) + ) + +ggplot() + + geom_sf( + data = DMVW2020pw_county, + aes(fill = NDIQuint), + size = 0.20, + color = 'white' + ) + + theme_minimal() + + scale_fill_viridis_d(guide = guide_legend(reverse = TRUE), na.value = 'grey80') + + labs(fill = 'Index (Categorical)', caption = 'Source: U.S. 
Census ACS 2016-2020 estimates') + + ggtitle( + 'Neighborhood Deprivation Index (Powell-Wiley) Population-weighted Quintiles', + subtitle = 'DC, MD, VA, and WV counties as the referent' + ) ``` ### Advanced Features #### Imputing missing census variables -In the `messer()` and `powell_wiley()` functions, missing census characteristics can be imputed using the `missing` and `impute` arguments of the `pca()` function in the `psych` package called within the `messer()` and `powell_wiley()` functions. Impute values using the logical `imp` argument (currently only calls `impute = "median"` by default, which assigns the median values of each missing census variable for a geography). +In the `messer()` and `powell_wiley()` functions, missing census characteristics can be imputed using the `missing` and `impute` arguments of the `pca()` function in the `psych` package called within the `messer()` and `powell_wiley()` functions. Impute values using the logical `imp` argument (currently only calls `impute = 'median'` by default, which assigns the median values of each missing census variable for a geography). 
```{r powell_wiley_imp, results = 'hide'} -powell_wiley2020DC <- ndi::powell_wiley(state = "DC", year = 2020) # without imputation -powell_wiley2020DCi <- ndi::powell_wiley(state = "DC", year = 2020, imp = TRUE) # with imputation +powell_wiley2020DC <- powell_wiley(state = 'DC', year = 2020) # without imputation +powell_wiley2020DCi <- powell_wiley(state = 'DC', year = 2020, imp = TRUE) # with imputation table(is.na(powell_wiley2020DC$ndi$NDI)) # n=13 tracts without NDI (Powell-Wiley) values table(is.na(powell_wiley2020DCi$ndi$NDI)) # n=0 tracts without NDI (Powell-Wiley) values -# Obtain the 2020 census tracts from the "tigris" package -tract2020DC <- tigris::tracts(state = "DC", year = 2020, cb = TRUE) +# Obtain the 2020 census tracts from the 'tigris' package +tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE) # Join the NDI (Powell-Wiley) values to the census tract geometry -DC2020pw <- dplyr::left_join(tract2020DC, powell_wiley2020DC$ndi, by = "GEOID") -DC2020pw <- dplyr::left_join(DC2020pw, powell_wiley2020DCi$ndi, by = "GEOID", suffix = c("_nonimp", "_imp")) +DC2020pw <- tract2020DC %>% + left_join(powell_wiley2020DC$ndi, by = 'GEOID') +DC2020pw <- DC2020pw %>% + left_join(powell_wiley2020DCi$ndi, by = 'GEOID', suffix = c('_nonimp', '_imp')) ``` ```{r powell_wiley_imp_plot, fig.height = 7, fig.width = 7} -# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for Washington, D.C., census tracts +# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for +## Washington, D.C., census tracts ## Continuous Index -ggplot2::ggplot() + - ggplot2::geom_sf(data = DC2020pw, - ggplot2::aes(fill = NDI_nonimp), - size = 0.2, - color = "white") + - ggplot2::theme_minimal() + - ggplot2::scale_fill_viridis_c() + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. 
Census ACS 2016-2020 estimates") + - ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley), Non-Imputed", - subtitle = "DC census tracts as the referent") - -ggplot2::ggplot() + - ggplot2::geom_sf(data = DC2020pw, - ggplot2::aes(fill = NDI_imp), - size = 0.2, - color = "white") + - ggplot2::theme_minimal() + - ggplot2::scale_fill_viridis_c() + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. Census ACS 2016-2020 estimates") + - ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley), Imputed", - subtitle = "DC census tracts as the referent") +ggplot() + + geom_sf( + data = DC2020pw, + aes(fill = NDI_nonimp), + size = 0.2, + color = 'white' + ) + + theme_minimal() + + scale_fill_viridis_c() + + labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2016-2020 estimates') + + ggtitle( + 'Neighborhood Deprivation Index (Powell-Wiley), Non-Imputed', + subtitle = 'DC census tracts as the referent' + ) + +ggplot() + + geom_sf( + data = DC2020pw, + aes(fill = NDI_imp), + size = 0.2, + color = 'white' + ) + + theme_minimal() + + scale_fill_viridis_c() + + labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2016-2020 estimates') + + ggtitle( + 'Neighborhood Deprivation Index (Powell-Wiley), Imputed', + subtitle = 'DC census tracts as the referent' + ) ## Categorical Index -### Rename "9-NDI not avail" level as NA for plotting -DC2020pw$NDIQuintNA_nonimp <- factor(replace(as.character(DC2020pw$NDIQuint_nonimp), - DC2020pw$NDIQuint_nonimp == "9-NDI not avail", NA), - c(levels(DC2020pw$NDIQuint_nonimp)[-6], NA)) - -ggplot2::ggplot() + - ggplot2::geom_sf(data = DC2020pw, - ggplot2::aes(fill = NDIQuintNA_nonimp), - size = 0.2, - color = "white") + - ggplot2::theme_minimal() + - ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE), - na.value = "grey80") + - ggplot2::labs(fill = "Index (Categorical)", - caption = "Source: U.S. 
Census ACS 2016-2020 estimates") + - ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Quintiles, Non-Imputed", - subtitle = "DC census tracts as the referent") - -### Rename "9-NDI not avail" level as NA for plotting -DC2020pw$NDIQuintNA_imp <- factor(replace(as.character(DC2020pw$NDIQuint_imp), - DC2020pw$NDIQuint_imp == "9-NDI not avail", NA), - c(levels(DC2020pw$NDIQuint_imp)[-6], NA)) - -ggplot2::ggplot() + - ggplot2::geom_sf(data = DC2020pw, - ggplot2::aes(fill = NDIQuintNA_imp), - size = 0.2, - color = "white") + - ggplot2::theme_minimal() + - ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE), - na.value = "grey80") + - ggplot2::labs(fill = "Index (Categorical)", - caption = "Source: U.S. Census ACS 2016-2020 estimates") + - ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Quintiles, Imputed", - subtitle = "DC census tracts as the referent") +### Rename '9-NDI not avail' level as NA for plotting +DC2020pw$NDIQuintNA_nonimp <- + factor( + replace( + as.character(DC2020pw$NDIQuint_nonimp), + DC2020pw$NDIQuint_nonimp == '9-NDI not avail', + NA + ), + c(levels(DC2020pw$NDIQuint_nonimp)[-6], NA) + ) + +ggplot() + + geom_sf( + data = DC2020pw, + aes(fill = NDIQuintNA_nonimp), + size = 0.2, + color = 'white' + ) + + theme_minimal() + + scale_fill_viridis_d(guide = guide_legend(reverse = TRUE), na.value = 'grey80') + + labs(fill = 'Index (Categorical)', caption = 'Source: U.S. 
Census ACS 2016-2020 estimates') + + ggtitle( + 'Neighborhood Deprivation Index (Powell-Wiley) Quintiles, Non-Imputed', + subtitle = 'DC census tracts as the referent' + ) + +### Rename '9-NDI not avail' level as NA for plotting +DC2020pw$NDIQuintNA_imp <- + factor( + replace( + as.character(DC2020pw$NDIQuint_imp), + DC2020pw$NDIQuint_imp == '9-NDI not avail', + NA + ), + c(levels(DC2020pw$NDIQuint_imp)[-6], NA) + ) + +ggplot() + + geom_sf( + data = DC2020pw, + aes(fill = NDIQuintNA_imp), + size = 0.2, + color = 'white' + ) + + theme_minimal() + + scale_fill_viridis_d(guide = guide_legend(reverse = TRUE), na.value = 'grey80') + + labs(fill = 'Index (Categorical)', caption = 'Source: U.S. Census ACS 2016-2020 estimates') + + ggtitle( + 'Neighborhood Deprivation Index (Powell-Wiley) Quintiles, Imputed', + subtitle = 'DC census tracts as the referent' + ) ``` #### Assign the referent (U.S.-Standardized Metric) @@ -426,25 +503,30 @@ ggplot2::ggplot() + To conduct a contiguous US-standardized index, compute an NDI for all states as in the example below that replicates the nationally standardized NDI (Powell-Wiley) values (2013-2017 ACS-5) found in [Slotman et al. (2022)](https://doi.org/10.1016/j.dib.2022.108002) and available from a [GIS Portal for Cancer Research](https://gis.cancer.gov/research/files.html#soc-dep) website. To replicate the nationally standardized NDI (Powell-Wiley) values (2006-2010 ACS-5) found in [Andrews et al. (2020)](https://doi.org/10.1080/17445647.2020.1750066) change the `year` argument to `2010` (i.e., `year = 2010`). 
```{r national_prep, results = 'hide'} -us <- tigris::states() -n51 <- c("Commonwealth of the Northern Mariana Islands", "Guam", "American Samoa", - "Puerto Rico", "United States Virgin Islands") +us <- states() +n51 <- c( + 'Commonwealth of the Northern Mariana Islands', + 'Guam', + 'American Samoa', + 'Puerto Rico', + 'United States Virgin Islands' +) y51 <- us$STUSPS[!(us$NAME %in% n51)] start_time <- Sys.time() # record start time -powell_wiley2017US <- ndi::powell_wiley(state = y51, year = 2017) +powell_wiley2017US <- powell_wiley(state = y51, year = 2017) end_time <- Sys.time() # record end time time_srr <- end_time - start_time # Calculate run time ``` ```{r national_hist, fig.height = 7, fig.width = 7} -ggplot2::ggplot(powell_wiley2017US$ndi, - ggplot2::aes(x = NDI)) + - ggplot2::geom_histogram(color = "black", - fill = "white") + - ggplot2::theme_minimal() + - ggplot2::ggtitle("Histogram of US-standardized NDI (Powell-Wiley) values (2013-2017)", - subtitle = "U.S. census tracts as the referent (including AK, HI, and DC)") +ggplot(powell_wiley2017US$ndi, aes(x = NDI)) + + geom_histogram(color = 'black', fill = 'white') + + theme_minimal() + + ggtitle( + 'Histogram of US-standardized NDI (Powell-Wiley) values (2013-2017)', + subtitle = 'U.S. census tracts as the referent (including AK, HI, and DC)' + ) ``` The process to compute a US-standardized NDI (Powell-Wiley) took about `r round(time_srr, digits = 1)` minutes to run on a machine with the features listed at the end of the vignette. @@ -463,6 +545,7 @@ Since version v0.1.1, the `ndi` package can compute additional metrics of socio- 8. `white()` function that computes the aspatial racial/ethnic Correlation Ratio based on [Bell (1954)](https://doi.org/10.2307/2574118) and [White (1986)](https://doi.org/10.2307/3644339) 9. `sudano()` function that computes the aspatial racial/ethnic Location Quotient based on [Merton (1939)](https://doi.org/10.2307/2084686) and [Sudano et al. 
(2013)](https://doi.org/10.1016/j.healthplace.2012.09.015) 10. `bemanian_beyer()` function that computes the aspatial racial/ethnic Local Exposure and Isolation metric based on [Bemanian & Beyer (2017)](https://doi.org/10.1158/1055-9965.EPI-16-0926) +11. `hoover()` function that computes the aspatial racial/ethnic Delta based on [Hoover (1941)](https://doi.org/10.1017/S0022050700052980) and Duncan et al. (1961; LC:60007089) #### Compute Racial Isolation Index (RI) @@ -494,83 +577,99 @@ Compute the spatial RI values (2006-2010 5-year ACS) for North Carolina, U.S.A. A census geography (and its neighbors) that has nearly all of its population who identify with the specified race/ethnicity subgroup(s) (e.g., Not Hispanic or Latino, Black or African American alone) will have an RI value close to 1. In contrast, a census geography (and its neighbors) that is nearly none of its population who identify with the specified race/ethnicity subgroup(s) (e.g., not Not Hispanic or Latino, Black or African American alone) will have an RI value close to 0. 
```{r anthopolos_prep, results = 'hide'} -anthopolos2010NC <- ndi::anthopolos(state = "NC", year = 2010, subgroup = "NHoLB") +anthopolos2010NC <- anthopolos(state = 'NC', year = 2010, subgroup = 'NHoLB') -# Obtain the 2010 census tracts from the "tigris" package -tract2010NC <- tigris::tracts(state = "NC", year = 2010, cb = TRUE) +# Obtain the 2010 census tracts from the 'tigris' package +tract2010NC <- tracts(state = 'NC', year = 2010, cb = TRUE) # Remove first 9 characters from GEOID for compatibility with tigris information tract2010NC$GEOID <- substring(tract2010NC$GEO_ID, 10) -# Obtain the 2010 counties from the "tigris" package -county2010NC <- tigris::counties(state = "NC", year = 2010, cb = TRUE) +# Obtain the 2010 counties from the 'tigris' package +county2010NC <- counties(state = 'NC', year = 2010, cb = TRUE) # Join the RI values to the census tract geometry -NC2010anthopolos <- dplyr::left_join(tract2010NC, anthopolos2010NC$ri, by = "GEOID") +NC2010anthopolos <- tract2010NC %>% + left_join(anthopolos2010NC$ri, by = 'GEOID') ``` ```{r anthopolos_plot, fig.height = 4, fig.width = 7} # Visualize the RI values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts -ggplot2::ggplot() + - ggplot2::geom_sf(data = NC2010anthopolos, - ggplot2::aes(fill = RI), - size = 0.05, - color = "transparent") + - ggplot2::geom_sf(data = county2010NC, - fill = "transparent", - color = "white", - size = 0.2) + - ggplot2::theme_minimal() + - ggplot2::scale_fill_viridis_c() + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. 
Census ACS 2006-2010 estimates") + - ggplot2::ggtitle("Racial Isolation Index (Anthopolos), non-Hispanic Black", - subtitle = "NC census tracts (not corrected for edge effects)") +ggplot() + + geom_sf( + data = NC2010anthopolos, + aes(fill = RI), + size = 0.05, + color = 'transparent' + ) + + geom_sf( + data = county2010NC, + fill = 'transparent', + color = 'white', + size = 0.2 + ) + + theme_minimal() + + scale_fill_viridis_c() + + labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') + + ggtitle( + 'Racial Isolation Index (Anthopolos), non-Hispanic Black', + subtitle = 'NC census tracts (not corrected for edge effects)' + ) ``` The current version of the `ndi` package does not correct for edge effects (e.g., census geographies along the specified spatial extent border, coastline, or U.S.-Mexico / U.S.-Canada border) may have few neighboring census geographies, and RI values in these census geographies may be unstable. A stop-gap solution for the former source of edge effect is to compute the RI for neighboring census geographies (i.e., the states bordering a study area of interest) and then use the estimates of the study area of interest. ```{r anthopolos_edge_prep, results = 'hide'} # Compute RI for all census tracts in neighboring states -anthopolos2010GNSTV <- ndi::anthopolos(state = c("GA", "NC", "SC", "TN", "VA"), - year = 2010, subgroup = "NHoLB") +anthopolos2010GNSTV <- anthopolos( + state = c('GA', 'NC', 'SC', 'TN', 'VA'), + year = 2010, + subgroup = 'NHoLB' +) # Crop to only North Carolina, U.S.A. 
census tracts -anthopolos2010NCe <- anthopolos2010GNSTV$ri[anthopolos2010GNSTV$ri$GEOID %in% anthopolos2010NC$ri$GEOID, ] +anthopolos2010NCe <- anthopolos2010GNSTV$ri[anthopolos2010GNSTV$ri$GEOID %in% + anthopolos2010NC$ri$GEOID, ] -# Obtain the 2010 census tracts from the "tigris" package -tract2010NC <- tigris::tracts(state = "NC", year = 2010, cb = TRUE) +# Obtain the 2010 census tracts from the 'tigris' package +tract2010NC <- tracts(state = 'NC', year = 2010, cb = TRUE) # Remove first 9 characters from GEOID for compatibility with tigris information tract2010NC$GEOID <- substring(tract2010NC$GEO_ID, 10) -# Obtain the 2010 counties from the "tigris" package -county2010NC <- tigris::counties(state = "NC", year = 2010, cb = TRUE) +# Obtain the 2010 counties from the 'tigris' package +county2010NC <- counties(state = 'NC', year = 2010, cb = TRUE) # Join the RI values to the census tract geometry -edgeNC2010anthopolos <- dplyr::left_join(tract2010NC, anthopolos2010NCe, by = "GEOID") +edgeNC2010anthopolos <- tract2010NC %>% + left_join(anthopolos2010NCe, by = 'GEOID') ``` ```{r anthopolos_edge_plot, fig.height = 4, fig.width = 7} # Visualize the RI values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts -ggplot2::ggplot() + - ggplot2::geom_sf(data = edgeNC2010anthopolos, - ggplot2::aes(fill = RI), - size = 0.05, - color = "transparent") + - ggplot2::geom_sf(data = county2010NC, - fill = "transparent", - color = "white", - size = 0.2) + - ggplot2::theme_minimal() + - ggplot2::scale_fill_viridis_c() + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. 
Census ACS 2006-2010 estimates") + - ggplot2::ggtitle("Racial Isolation Index (Anthopolos), non-Hispanic Black", - subtitle = "NC census tracts (corrected for interstate edge effects)") +ggplot() + + geom_sf( + data = edgeNC2010anthopolos, + aes(fill = RI), + size = 0.05, + color = 'transparent' + ) + + geom_sf( + data = county2010NC, + fill = 'transparent', + color = 'white', + size = 0.2 + ) + + theme_minimal() + + scale_fill_viridis_c() + + labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') + + ggtitle( + 'Racial Isolation Index (Anthopolos), non-Hispanic Black', + subtitle = 'NC census tracts (corrected for interstate edge effects)' + ) ``` #### Compute Educational Isolation Index (EI) -Compute the spatial EI (Bravo) values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts. This metric is based on [Bravo et al. (2021)](https://doi.org/10.3390/ijerph18179384) that assessed the educational isolation of the population without a four-year college degree. Multiple educational attainment categories are available in the `bravo()` function, including: +Compute the spatial EI (Bravo) values (2006-2010 5-year ACS) for Oklahoma, U.S.A., census tracts. This metric is based on [Bravo et al. (2021)](https://doi.org/10.3390/ijerph18179384) that assessed the educational isolation of the population without a four-year college degree. Multiple educational attainment categories are available in the `bravo()` function, including: | ACS table source | educational attainment category | character for `subgroup` argument | | -------------- | ------------- | ---------------- | @@ -579,84 +678,93 @@ Compute the spatial EI (Bravo) values (2006-2010 5-year ACS) for North Carolina, | B06009_004 | some college or associate's degree | SCoAD | | B06009_005 | Bachelor's degree | BD | | B06009_006 | graduate or professional degree | GoPD | -Note: The ACS-5 data (2005-2009) uses the "B15002" question. 
+Note: The ACS-5 data (2005-2009) uses the 'B15002' question. A census geography (and its neighbors) that has nearly all of its population with the specified educational attainment category (e.g., a four-year college degree or more) will have an EI (Bravo) value close to 1. In contrast, a census geography (and its neighbors) that is nearly none of its population with the specified educational attainment category (e.g., with a four-year college degree) will have an EI (Bravo) value close to 0. ```{r bravo_prep, results = 'hide'} -bravo2010NC <- ndi::bravo(state = "NC", year = 2010, subgroup = c("LtHS", "HSGiE", "SCoAD")) +bravo2010OK <- bravo(state = 'OK', year = 2010, subgroup = c('LtHS', 'HSGiE', 'SCoAD')) -# Obtain the 2010 census tracts from the "tigris" package -tract2010NC <- tigris::tracts(state = "NC", year = 2010, cb = TRUE) +# Obtain the 2010 census tracts from the 'tigris' package +tract2010OK <- tracts(state = 'OK', year = 2010, cb = TRUE) # Remove first 9 characters from GEOID for compatibility with tigris information -tract2010NC$GEOID <- substring(tract2010NC$GEO_ID, 10) +tract2010OK$GEOID <- substring(tract2010OK$GEO_ID, 10) -# Obtain the 2010 counties from the "tigris" package -county2010NC <- tigris::counties(state = "NC", year = 2010, cb = TRUE) +# Obtain the 2010 counties from the 'tigris' package +county2010OK <- counties(state = 'OK', year = 2010, cb = TRUE) # Join the EI (Bravo) values to the census tract geometry -NC2010bravo <- dplyr::left_join(tract2010NC, bravo2010NC$ei, by = "GEOID") +OK2010bravo <- tract2010OK %>% + left_join(bravo2010OK$ei, by = 'GEOID') ``` ```{r bravo_plot, fig.height = 4, fig.width = 7} -# Visualize the EI (Bravo) values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts -ggplot2::ggplot() + - ggplot2::geom_sf(data = NC2010bravo, - ggplot2::aes(fill = EI), - size = 0.05, - color = "transparent") + - ggplot2::geom_sf(data = county2010NC, - fill = "transparent", - color = "white", - size = 0.2) + - 
ggplot2::theme_minimal() + - ggplot2::scale_fill_viridis_c() + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. Census ACS 2006-2010 estimates") + - ggplot2::ggtitle("Educational Isolation Index (Bravo), without a four-year college degree", - subtitle = "NC census tracts (not corrected for edge effects)") +# Visualize the EI (Bravo) values (2006-2010 5-year ACS) for Oklahoma, U.S.A., census tracts +ggplot() + + geom_sf( + data = OK2010bravo, + aes(fill = EI), + size = 0.05, + color = 'transparent' + ) + + geom_sf( + data = county2010OK, + fill = 'transparent', + color = 'white', + size = 0.2 + ) + + theme_minimal() + + scale_fill_viridis_c(limits = c(0, 1)) + + labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') + + ggtitle( + 'Educational Isolation Index (Bravo), without a four-year college degree', + subtitle = 'OK census tracts (not corrected for edge effects)' + ) ``` Can correct one source of edge effect in the same manner as shown for the RI metric. #### Retrieve the Gini Index -Retrieve the aspatial Gini Index values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts. This metric is based on [Gini (1921)](https://doi.org/10.2307/2223319), and the `gini()` function retrieves the estimate from the ACS-5. +Retrieve the aspatial Gini Index values (2006-2010 5-year ACS) for Massachusetts, U.S.A., census tracts. This metric is based on [Gini (1921)](https://doi.org/10.2307/2223319), and the `gini()` function retrieves the estimate from the ACS-5. -According to the [U.S. Census Bureau](https://census.gov/topics/income-poverty/income-inequality/about/metrics/gini-index.html): "The Gini Index is a summary measure of income inequality. The Gini coefficient incorporates the detailed shares data into a single statistic, which summarizes the dispersion of income across the entire income distribution. 
The Gini coefficient ranges from 0, indicating perfect equality (where everyone receives an equal share), to 1, perfect inequality (where only one recipient or group of recipients receives all the income). The Gini is based on the difference between the Lorenz curve (the observed cumulative income distribution) and the notion of a perfectly equal income distribution." +According to the [U.S. Census Bureau](https://census.gov/topics/income-poverty/income-inequality/about/metrics/gini-index.html): 'The Gini Index is a summary measure of income inequality. The Gini coefficient incorporates the detailed shares data into a single statistic, which summarizes the dispersion of income across the entire income distribution. The Gini coefficient ranges from 0, indicating perfect equality (where everyone receives an equal share), to 1, perfect inequality (where only one recipient or group of recipients receives all the income). The Gini is based on the difference between the Lorenz curve (the observed cumulative income distribution) and the notion of a perfectly equal income distribution.' 
```{r gini_prep, results = 'hide'} -gini2010NC <- ndi::gini(state = "NC", year = 2010) +gini2010MA <- gini(state = 'MA', year = 2010) -# Obtain the 2010 census tracts from the "tigris" package -tract2010NC <- tigris::tracts(state = "NC", year = 2010, cb = TRUE) +# Obtain the 2010 census tracts from the 'tigris' package +tract2010MA <- tracts(state = 'MA', year = 2010, cb = TRUE) # Remove first 9 characters from GEOID for compatibility with tigris information -tract2010NC$GEOID <- substring(tract2010NC$GEO_ID, 10) +tract2010MA$GEOID <- substring(tract2010MA$GEO_ID, 10) -# Obtain the 2010 counties from the "tigris" package -county2010NC <- tigris::counties(state = "NC", year = 2010, cb = TRUE) +# Obtain the 2010 counties from the 'tigris' package +county2010MA <- counties(state = 'MA', year = 2010, cb = TRUE) # Join the Gini Index values to the census tract geometry -NC2010gini <- dplyr::left_join(tract2010NC, gini2010NC$gini, by = "GEOID") +MA2010gini <- tract2010MA %>% + left_join(gini2010MA$gini, by = 'GEOID') ``` ```{r gini_plot, fig.height = 4, fig.width = 7} -# Visualize the Gini Index values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts -ggplot2::ggplot() + - ggplot2::geom_sf(data = NC2010gini, - ggplot2::aes(fill = gini), - size = 0.05, - color = "transparent") + - ggplot2::geom_sf(data = county2010NC, - fill = "transparent", - color = "white", - size = 0.2) + - ggplot2::theme_minimal() + - ggplot2::scale_fill_viridis_c() + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. 
Census ACS 2006-2010 estimates") + - ggplot2::ggtitle("Gini Index", - subtitle = "NC census tracts") +# Visualize the Gini Index values (2006-2010 5-year ACS) for Massachusetts, U.S.A., census tracts +ggplot() + + geom_sf( + data = MA2010gini, + aes(fill = gini), + size = 0.05, + color = 'transparent' + ) + + geom_sf( + data = county2010MA, + fill = 'transparent', + color = 'white', + size = 0.2 + ) + + theme_minimal() + + scale_fill_viridis_c() + + labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') + + ggtitle('Gini Index', subtitle = 'MA census tracts') ``` ### Index of Concentration at the Extremes (ICE) @@ -665,93 +773,135 @@ Compute the aspatial Index of Concentration at the Extremes values (2006-2010 5- | ACS table group | ICE metric | Comparison | -------------- | ------------- | ---------------- | -| B19001 | Income, "ICE_inc"| 80th income percentile vs. 20th income percentile | -| B15002 | Education, "ICE_edu"| less than high school vs. four-year college degree or more | -| B03002 | Race/Ethnicity, "ICE_rewb"| 80th income percentile vs. 20th income percentile | -| B19001 & B19001B & B19001H | Income and race/ethnicity combined, "ICE_wbinc" | white non-Hispanic in 80th income percentile vs. black alone (including Hispanic) in 20th income percentile | -| B19001 & B19001H | Income and race/ethnicity combined, "ICE_wpcinc"| white non-Hispanic in 80th income percentile vs. white non-Hispanic in 20th income percentile | +| B19001 | Income, 'ICE_inc'| 80th income percentile vs. 20th income percentile | +| B15002 | Education, 'ICE_edu'| less than high school vs. four-year college degree or more | +| B03002 | Race/Ethnicity, 'ICE_rewb'| 80th income percentile vs. 20th income percentile | +| B19001 & B19001B & B19001H | Income and race/ethnicity combined, 'ICE_wbinc' | white non-Hispanic in 80th income percentile vs. 
black alone (including Hispanic) in 20th income percentile | +| B19001 & B19001H | Income and race/ethnicity combined, 'ICE_wpcinc'| white non-Hispanic in 80th income percentile vs. white non-Hispanic in 20th income percentile | ICE metrics can range in value from −1 (most deprived) to 1 (most privileged). A value of 0 can thus represent two possibilities: (1) none of the residents are in the most privileged or most deprived categories, or (2) an equal number of persons are in the most privileged and most deprived categories, and in both cases indicates that the area is not dominated by extreme concentrations of either of the two groups. ```{r krieger_prep, results = 'hide'} -ice2020WC <- krieger(state = "MI", county = "Wayne", year = 2010) +ice2020WC <- krieger(state = 'MI', county = 'Wayne', year = 2010) -# Obtain the 2010 census tracts from the "tigris" package -tract2010WC <- tigris::tracts(state = "MI", county = "Wayne", year = 2010, cb = TRUE) +# Obtain the 2010 census tracts from the 'tigris' package +tract2010WC <- tracts(state = 'MI', county = 'Wayne', year = 2010, cb = TRUE) # Remove first 9 characters from GEOID for compatibility with tigris information tract2010WC$GEOID <- substring(tract2010WC$GEO_ID, 10) # Join the ICE values to the census tract geometry -ice2020WC <- dplyr::left_join(tract2010WC, ice2020WC$ice, by = "GEOID") +ice2020WC <- tract2010WC %>% + left_join(ice2020WC$ice, by = 'GEOID') ``` ```{r krieger_plot, fig.height = 5.5, fig.width = 7} # Plot ICE for Income -ggplot2::ggplot() + - ggplot2::geom_sf(data = ice2020WC, - ggplot2::aes(fill = ICE_inc), - color = "white", - size = 0.05) + - ggplot2::theme_bw() + - ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1,1)) + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. Census ACS 2006-2010 estimates")+ - ggplot2::ggtitle("Index of Concentration at the Extremes\nIncome (Krieger)", - subtitle = "80th income percentile vs. 
20th income percentile") +ggplot() + + geom_sf( + data = ice2020WC, + aes(fill = ICE_inc), + color = 'white', + size = 0.05 + ) + + theme_bw() + + scale_fill_gradient2( + low = '#998ec3', + mid = '#f7f7f7', + high = '#f1a340', + limits = c(-1, 1) + ) + + labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') + + ggtitle( + 'Index of Concentration at the Extremes\nIncome (Krieger)', + subtitle = '80th income percentile vs. 20th income percentile' + ) # Plot ICE for Education -ggplot2::ggplot() + - ggplot2::geom_sf(data = ice2020WC, - ggplot2::aes(fill = ICE_edu), - color = "white", - size = 0.05) + - ggplot2::theme_bw() + - ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1,1)) + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. Census ACS 2006-2010 estimates")+ - ggplot2::ggtitle("Index of Concentration at the Extremes\nEducation (Krieger)", - subtitle = "less than high school vs. four-year college degree or more") +ggplot() + + geom_sf( + data = ice2020WC, + aes(fill = ICE_edu), + color = 'white', + size = 0.05 + ) + + theme_bw() + + scale_fill_gradient2( + low = '#998ec3', + mid = '#f7f7f7', + high = '#f1a340', + limits = c(-1, 1) + ) + + labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') + + ggtitle( + 'Index of Concentration at the Extremes\nEducation (Krieger)', + subtitle = 'less than high school vs. four-year college degree or more' + ) # Plot ICE for Race/Ethnicity -ggplot2::ggplot() + - ggplot2::geom_sf(data = ice2020WC, - ggplot2::aes(fill = ICE_rewb), - color = "white", - size = 0.05) + - ggplot2::theme_bw() + - ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1, 1)) + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. 
Census ACS 2006-2010 estimates")+ - ggplot2::ggtitle("Index of Concentration at the Extremes\nRace/Ethnicity (Krieger)", - subtitle = "white non-Hispanic vs. black non-Hispanic") +ggplot() + + geom_sf( + data = ice2020WC, + aes(fill = ICE_rewb), + color = 'white', + size = 0.05 + ) + + theme_bw() + + scale_fill_gradient2( + low = '#998ec3', + mid = '#f7f7f7', + high = '#f1a340', + limits = c(-1, 1) + ) + + labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') + + ggtitle( + 'Index of Concentration at the Extremes\nRace/Ethnicity (Krieger)', + subtitle = 'white non-Hispanic vs. black non-Hispanic' + ) # Plot ICE for Income and Race/Ethnicity Combined -## white non-Hispanic in 80th income percentile vs. black (including Hispanic) in 20th income percentile -ggplot2::ggplot() + - ggplot2::geom_sf(data = ice2020WC, - ggplot2::aes(fill = ICE_wbinc), - color = "white", - size = 0.05) + - ggplot2::theme_bw() + - ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1, 1)) + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. Census ACS 2006-2010 estimates")+ - ggplot2::ggtitle("Index of Concentration at the Extremes\nIncome & race/ethnicity combined (Krieger)", - subtitle = "white non-Hispanic in 80th inc ptcl vs. black alone in 20th inc pctl") +## white non-Hispanic in 80th income percentile vs. +## black (including Hispanic) in 20th income percentile +ggplot() + + geom_sf( + data = ice2020WC, + aes(fill = ICE_wbinc), + color = 'white', + size = 0.05 + ) + + theme_bw() + + scale_fill_gradient2( + low = '#998ec3', + mid = '#f7f7f7', + high = '#f1a340', + limits = c(-1, 1) + ) + + labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') + + ggtitle( + 'Index of Concentration at the Extremes\nIncome & race/ethnicity combined (Krieger)', + subtitle = 'white non-Hispanic in 80th inc pctl vs.
black alone in 20th inc pctl' + ) # Plot ICE for Income and Race/Ethnicity Combined ## white non-Hispanic in 80th income percentile vs. white non-Hispanic in 20th income percentile -ggplot2::ggplot() + - ggplot2::geom_sf(data = ice2020WC, - ggplot2::aes(fill = ICE_wpcinc), - color = "white", - size = 0.05) + - ggplot2::theme_bw() + - ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1, 1)) + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. Census ACS 2006-2010 estimates")+ - ggplot2::ggtitle("Index of Concentration at the Extremes\nIncome & race/ethnicity combined (Krieger)", - subtitle = "white non-Hispanic (WNH) in 80th inc pctl vs. WNH in 20th inc pctl") +ggplot() + + geom_sf( + data = ice2020WC, + aes(fill = ICE_wpcinc), + color = 'white', + size = 0.05 + ) + + theme_bw() + + scale_fill_gradient2( + low = '#998ec3', + mid = '#f7f7f7', + high = '#f1a340', + limits = c(-1, 1) + ) + + labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') + + ggtitle( + 'Index of Concentration at the Extremes\nIncome & race/ethnicity combined (Krieger)', + subtitle = 'white non-Hispanic (WNH) in 80th inc pctl vs. WNH in 20th inc pctl' + ) ``` #### Compute racial/ethnic Dissimilarity Index (DI) @@ -784,40 +934,52 @@ Compute the aspatial racial/ethnic DI values (2006-2010 5-year ACS) for Pennsylv DI is a measure of the evenness of racial/ethnic residential segregation when comparing smaller geographical areas to larger ones within which the smaller geographical areas are located. DI can range in value from 0 to 1 and represents the proportion of racial/ethnic subgroup members that would have to change their area of residence to achieve an even distribution within the larger geographical area under conditions of maximum segregation. 
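To make the DI definition above concrete, here is a minimal sketch of the textbook Duncan & Duncan form, DI = 1/2 * sum(|x_i / X - y_i / Y|), applied to made-up tract counts for a single county. The helper `di_sketch()` and the counts are hypothetical and only for illustration; this is not the internal code of `duncan()`, which also retrieves the ACS data and handles edge cases.

```r
# Hypothetical tract-level counts within one county (illustrative values only)
tract_counts <- data.frame(
  subgroup     = c(100, 50, 10, 40),  # e.g., residents in the 'NHoLB' subgroup
  subgroup_ref = c(20, 80, 300, 100)  # e.g., residents in the 'NHoLW' subgroup
)

# Textbook form of the Dissimilarity Index (an assumption of this sketch):
# half the sum of absolute differences between each tract's share of the
# two subgroup populations in the larger geography
di_sketch <- function(x, y) {
  stopifnot(length(x) == length(y), sum(x) > 0, sum(y) > 0)
  0.5 * sum(abs(x / sum(x) - y / sum(y)))
}

di_example <- di_sketch(tract_counts$subgroup, tract_counts$subgroup_ref)
di_example
```

Under this form, identical tract shares for both subgroups give a value of 0, and tracts that each contain only one of the two subgroups give a value of 1.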
```{r duncan_prep, results = 'hide'} -duncan2010PA <- ndi::duncan(geo_large = "county", geo_small = "tract", state = "PA", - year = 2010, subgroup = "NHoLB", subgroup_ref = "NHoLW") - -# Obtain the 2010 census counties from the "tigris" package -county2010PA <- tigris::counties(state = "PA", year = 2010, cb = TRUE) +duncan2010PA <- duncan( + geo_large = 'county', + geo_small = 'tract', + state = 'PA', + year = 2010, + subgroup = 'NHoLB', + subgroup_ref = 'NHoLW' +) + +# Obtain the 2010 census counties from the 'tigris' package +county2010PA <- counties(state = 'PA', year = 2010, cb = TRUE) # Remove first 9 characters from GEOID for compatibility with tigris information county2010PA$GEOID <- substring(county2010PA$GEO_ID, 10) # Join the DI values to the county geometry -PA2010duncan <- dplyr::left_join(county2010PA, duncan2010PA$di, by = "GEOID") +PA2010duncan <- county2010PA %>% + left_join(duncan2010PA$di, by = 'GEOID') ``` ```{r duncan_plot, fig.height = 4, fig.width = 7} # Visualize the DI values (2006-2010 5-year ACS) for Pennsylvania, U.S.A., counties -ggplot2::ggplot() + - ggplot2::geom_sf(data = PA2010duncan, - ggplot2::aes(fill = DI), - size = 0.05, - color = "white") + - ggplot2::geom_sf(data = county2010PA, - fill = "transparent", - color = "white", - size = 0.2) + - ggplot2::theme_minimal() + - ggplot2::scale_fill_viridis_c(limits = c(0, 1)) + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. Census ACS 2006-2010 estimates") + - ggplot2::ggtitle("Dissimilarity Index (Duncan & Duncan)\nPennsylvania census tracts to counties", - subtitle = "Black non-Hispanic vs. white non-Hispanic") +ggplot() + + geom_sf( + data = PA2010duncan, + aes(fill = DI), + size = 0.05, + color = 'white' + ) + + geom_sf( + data = county2010PA, + fill = 'transparent', + color = 'white', + size = 0.2 + ) + + theme_minimal() + + scale_fill_viridis_c(limits = c(0, 1)) + + labs(fill = 'Index (Continuous)', caption = 'Source: U.S. 
Census ACS 2006-2010 estimates') + + ggtitle( + 'Dissimilarity Index (Duncan & Duncan)\nPennsylvania census tracts to counties', + subtitle = 'Black non-Hispanic vs. white non-Hispanic' + ) ``` #### Compute aspatial income or racial/ethnic Atkinson Index (AI) -Compute the aspatial income or racial/ethnic AI values (2017-2021 5-year ACS) for Kentucky, U.S.A., counties from census block groups. This metric is based on [Atkinson (1970)](https://doi.org/10.2307/2088328) that assessed the distribution of income within 12 counties but has since been adapted to study racial/ethnic segregation (see [James & Taeuber 1985](https://doi.org/10.2307/270845)). To compare median household income, specify `subgroup = "MedHHInc"` which will use the ACS-5 variable "B19013_001" in the computation. Multiple racial/ethnic subgroups are available in the `atkinson()` function, including: +Compute the aspatial income or racial/ethnic AI values (2017-2021 5-year ACS) for Kentucky, U.S.A., counties from census block groups. This metric is based on [Atkinson (1970)](https://doi.org/10.2307/2088328) that assessed the distribution of income within 12 counties but has since been adapted to study racial/ethnic segregation (see [James & Taeuber 1985](https://doi.org/10.2307/270845)). To compare median household income, specify `subgroup = 'MedHHInc'` which will use the ACS-5 variable 'B19013_001' in the computation. Multiple racial/ethnic subgroups are available in the `atkinson()` function, including: | ACS table source | racial/ethnic subgroup | character for `subgroup` argument | | -------------- | ------------- | ---------------- | @@ -844,41 +1006,53 @@ Compute the aspatial income or racial/ethnic AI values (2017-2021 5-year ACS) fo AI is a measure of the inequality and, in the context of residential race/ethnicity, segregation when comparing smaller geographical areas to larger ones within which the smaller geographical areas are located. 
AI can range in value from 0 to 1 and smaller values of the index indicate lower levels of inequality (e.g., less segregation). -AI is sensitive to the choice of `epsilon` argument or the shape parameter that determines how to weight the increments to inequality (segregation) contributed by different proportions of the Lorenz curve. A user must explicitly decide how heavily to weight smaller geographical units at different points on the Lorenz curve (i.e., whether the index should take greater account of differences among areas of over- or under-representation). The `epsilon` argument must have values between 0 and 1.0. For `0 <= epsilon < 0.5` or less "inequality-averse," smaller geographical units with a subgroup proportion smaller than the subgroup proportion of the larger geographical unit contribute more to inequality ("over-representation"). For `0.5 < epsilon <= 1.0` or more "inequality-averse," smaller geographical units with a subgroup proportion larger than the subgroup proportion of the larger geographical unit contribute more to inequality ("under-representation"). If `epsilon = 0.5` (the default), units of over- and under-representation contribute equally to the index. See Section 2.3 of [Saint-Jacques et al. (2020)](https://doi.org/10.48550/arXiv.2002.05819) for one method to select `epsilon`. We choose `epsilon = 0.67` in the example below: +AI is sensitive to the choice of `epsilon` argument or the shape parameter that determines how to weight the increments to inequality (segregation) contributed by different proportions of the Lorenz curve. A user must explicitly decide how heavily to weight smaller geographical units at different points on the Lorenz curve (i.e., whether the index should take greater account of differences among areas of over- or under-representation). The `epsilon` argument must have values between 0 and 1.0. 
For `0 <= epsilon < 0.5` or less 'inequality-averse,' smaller geographical units with a subgroup proportion smaller than the subgroup proportion of the larger geographical unit contribute more to inequality ('over-representation'). For `0.5 < epsilon <= 1.0` or more 'inequality-averse,' smaller geographical units with a subgroup proportion larger than the subgroup proportion of the larger geographical unit contribute more to inequality ('under-representation'). If `epsilon = 0.5` (the default), units of over- and under-representation contribute equally to the index. See Section 2.3 of [Saint-Jacques et al. (2020)](https://doi.org/10.48550/arXiv.2002.05819) for one method to select `epsilon`. We choose `epsilon = 0.67` in the example below: ```{r atkinson_prep, results = 'hide'} -atkinson2021KY <- ndi::atkinson(geo_large = "county", geo_small = "block group", state = "KY", - year = 2021, subgroup = "NHoLB", epsilon = 0.67) - -# Obtain the 2021 census counties from the "tigris" package -county2021KY <- tigris::counties(state = "KY", year = 2021, cb = TRUE) +atkinson2021KY <- atkinson( + geo_large = 'county', + geo_small = 'block group', + state = 'KY', + year = 2021, + subgroup = 'NHoLB', + epsilon = 0.67 +) + +# Obtain the 2021 census counties from the 'tigris' package +county2021KY <- counties(state = 'KY', year = 2021, cb = TRUE) # Join the AI values to the county geometry -KY2021atkinson <- dplyr::left_join(county2021KY, atkinson2021KY$ai, by = "GEOID") +KY2021atkinson <- county2021KY %>% + left_join(atkinson2021KY$ai, by = 'GEOID') ``` ```{r atkinson_plot, fig.height = 4, fig.width = 7} # Visualize the AI values (2017-2021 5-year ACS) for Kentucky, U.S.A., counties -ggplot2::ggplot() + - ggplot2::geom_sf(data = KY2021atkinson, - ggplot2::aes(fill = AI), - size = 0.05, - color = "white") + - ggplot2::geom_sf(data = county2021KY, - fill = "transparent", - color = "white", - size = 0.2) + - ggplot2::theme_minimal() + - ggplot2::scale_fill_viridis_c(limits = c(0, 
1)) + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. Census ACS 2017-2021 estimates") + - ggplot2::ggtitle("Atkinson Index (Atkinson)\nKentucky census block groups to counties", - subtitle = expression(paste("Black non-Hispanic (", epsilon, " = 0.67)"))) +ggplot() + + geom_sf( + data = KY2021atkinson, + aes(fill = AI), + size = 0.05, + color = 'white' + ) + + geom_sf( + data = county2021KY, + fill = 'transparent', + color = 'white', + size = 0.2 + ) + + theme_minimal() + + scale_fill_viridis_c(limits = c(0, 1)) + + labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2017-2021 estimates') + + ggtitle( + 'Atkinson Index (Atkinson)\nKentucky census block groups to counties', + subtitle = expression(paste('Black non-Hispanic (', epsilon, ' = 0.67)')) + ) ``` #### Compute racial/ethnic Isolation Index (II) -Compute the aspatial racial/ethnic II values (2017-2021 5-year ACS) for Kentucky, U.S.A., counties from census block groups. This metric is based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and adapted by [Bell (1954)](https://doi.org/10.2307/2574118). Multiple racial/ethnic subgroups are available in the `bell()` function, including: +Compute the aspatial racial/ethnic II values (2017-2021 5-year ACS) for Ohio, U.S.A., counties from census block groups. This metric is based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and adapted by [Bell (1954)](https://doi.org/10.2307/2574118). 
Multiple racial/ethnic subgroups are available in the `bell()` function, including: | ACS table source | racial/ethnic subgroup | character for `subgroup` or `subgroup_ref` argument | | -------------- | ------------- | ---------------- | @@ -906,38 +1080,50 @@ Compute the aspatial racial/ethnic II values (2017-2021 5-year ACS) for Kentucky II is a measure of the probability that a member of one subgroup(s) will meet or interact with a member of another subgroup(s) with higher values signifying higher probability of interaction (less isolation) when comparing smaller geographical areas to larger ones within which the smaller geographical areas are located. II can range in value from 0 to 1. ```{r bell_prep, results = 'hide'} -bell2021KY <- ndi::bell(geo_large = "county", geo_small = "tract", state = "KY", - year = 2021, subgroup = "NHoLB", subgroup_ixn = "NHoLW") - -# Obtain the 2021 census counties from the "tigris" package -county2021KY <- tigris::counties(state = "KY", year = 2021, cb = TRUE) +bell2021OH <- bell( + geo_large = 'county', + geo_small = 'tract', + state = 'OH', + year = 2021, + subgroup = 'NHoLB', + subgroup_ixn = 'NHoLW' +) + +# Obtain the 2021 census counties from the 'tigris' package +county2021OH <- counties(state = 'OH', year = 2021, cb = TRUE) # Join the II values to the county geometry -KY2021bell <- dplyr::left_join(county2021KY, bell2021KY$ii, by = "GEOID") +OH2021bell <- county2021OH %>% + left_join(bell2021OH$ii, by = 'GEOID') ``` -```{r bell_plot, fig.height = 4, fig.width = 7} -# Visualize the II values (2017-2021 5-year ACS) for Kentucky, U.S.A., counties -ggplot2::ggplot() + - ggplot2::geom_sf(data = KY2021bell, - ggplot2::aes(fill = II), - size = 0.05, - color = "white") + - ggplot2::geom_sf(data = county2021KY, - fill = "transparent", - color = "white", - size = 0.2) + - ggplot2::theme_minimal() + - ggplot2::scale_fill_viridis_c(limits = c(0, 1)) + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S.
Census ACS 2017-2021 estimates") + - ggplot2::ggtitle("Isolation Index (Bell)\nKentucky census tracts to counties", - subtitle = "Black non-Hispanic vs. white non-Hispanic") +```{r bell_plot, fig.height = 6, fig.width = 7} +# Visualize the II values (2017-2021 5-year ACS) for Ohio, U.S.A., counties +ggplot() + + geom_sf( + data = OH2021bell, + aes(fill = II), + size = 0.05, + color = 'white' + ) + + geom_sf( + data = county2021OH, + fill = 'transparent', + color = 'white', + size = 0.2 + ) + + theme_minimal() + + scale_fill_viridis_c(limits = c(0, 1)) + + labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2017-2021 estimates') + + ggtitle( + 'Isolation Index (Bell)\nOhio census tracts to counties', + subtitle = 'Black non-Hispanic vs. white non-Hispanic' + ) ``` #### Compute Correlation Ratio (V) -Compute the aspatial racial/ethnic V values (2017-2021 5-year ACS) for Kentucky, U.S.A., counties from census tracts. This metric is based on [Bell (1954)](https://doi.org/10.2307/2574118) and adapted by [White (1986)](https://doi.org/10.2307/3644339). Multiple racial/ethnic subgroups are available in the `white()` function, including: +Compute the aspatial racial/ethnic V values (2017-2021 5-year ACS) for South Carolina, U.S.A., counties from census tracts. This metric is based on [Bell (1954)](https://doi.org/10.2307/2574118) and adapted by [White (1986)](https://doi.org/10.2307/3644339). 
Multiple racial/ethnic subgroups are available in the `white()` function, including: | ACS table source | racial/ethnic subgroup | character for `subgroup` argument | | -------------- | ------------- | ---------------- | @@ -962,41 +1148,52 @@ Compute the aspatial racial/ethnic V values (2017-2021 5-year ACS) for Kentucky, | B03002_020 | Hispanic or Latino, two races including some other race | HoLTRiSOR | | B03002_021 | Hispanic or Latino, two races excluding some other race, and three or more races | HoLTReSOR | -V removes the asymmetry from the Isolation Index by controlling for the effect of population composition when comparing smaller geographical areas to larger ones within which the smaller geographical areas are located. The Isolation Index is some measure of the probability that a member of one subgroup(s) will meet or interact with a member of another subgroup(s) with higher values signifying higher probability of interaction (less isolation). V can range in value from 0 to 1. +V removes the asymmetry from the Isolation Index by controlling for the effect of population composition when comparing smaller geographical areas to larger ones within which the smaller geographical areas are located. The Isolation Index is some measure of the probability that a member of one subgroup(s) will meet or interact with a member of another subgroup(s) with higher values signifying higher probability of interaction (less isolation). V can range in value from -Inf to Inf. 
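As a rough illustration of how V adjusts the Isolation Index for population composition, the sketch below uses the form commonly written in the segregation literature, V = (xPx* - P) / (1 - P), where xPx* is the isolation of the subgroup and P is the subgroup's proportion in the larger geography. The helper `v_sketch()` and the tract counts are hypothetical, and treating empty tracts as contributing zero is an assumption of this sketch; the computation inside `white()` may differ in its details.

```r
# Hypothetical tract-level counts within one county (illustrative values only)
tract_counts <- data.frame(
  subgroup = c(100, 50, 10, 40),    # e.g., residents in the 'NHoLB' subgroup
  total    = c(150, 200, 400, 250)  # total population of each tract
)

# Commonly cited form of the Correlation Ratio (an assumption of this sketch)
v_sketch <- function(x, t) {
  stopifnot(length(x) == length(t), sum(x) > 0, sum(t) > sum(x))
  p <- sum(x) / sum(t)               # subgroup proportion in the larger geography
  share <- ifelse(t > 0, x / t, 0)   # subgroup proportion in each tract (0 if empty)
  iso <- sum((x / sum(x)) * share)   # isolation of the subgroup (xPx*)
  (iso - p) / (1 - p)
}

v_example <- v_sketch(tract_counts$subgroup, tract_counts$total)
v_example
```

When every tract has the same subgroup proportion as the larger geography, the numerator is zero and so is the value returned by this sketch.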
```{r white_prep, results = 'hide'} -white2021KY <- ndi::white(geo_large = "county", geo_small = "tract", state = "KY", - year = 2021, subgroup = "NHoLB") +white2021SC <- white( + geo_large = 'county', + geo_small = 'tract', + state = 'SC', + year = 2021, + subgroup = 'NHoLB' +) -# Obtain the 2021 census counties from the "tigris" package -county2021KY <- tigris::counties(state = "KY", year = 2021, cb = TRUE) +# Obtain the 2021 census counties from the 'tigris' package +county2021SC <- counties(state = 'SC', year = 2021, cb = TRUE) # Join the V values to the county geometry -KY2021white <- dplyr::left_join(county2021KY, white2021KY$v, by = "GEOID") +SC2021white <- county2021SC %>% + left_join(white2021SC$v, by = 'GEOID') ``` -```{r white_plot, fig.height = 4, fig.width = 7} -# Visualize the V values (2017-2021 5-year ACS) for Kentucky, U.S.A., counties -ggplot2::ggplot() + - ggplot2::geom_sf(data = KY2021white, - ggplot2::aes(fill = V), - size = 0.05, - color = "white") + - ggplot2::geom_sf(data = county2021KY, - fill = "transparent", - color = "white", - size = 0.2) + - ggplot2::theme_minimal() + - ggplot2::scale_fill_viridis_c(limits = c(0, 1)) + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. Census ACS 2017-2021 estimates") + - ggplot2::ggtitle("Correlation Ratio (White)\nKentucky census tracts to counties", - subtitle = "Black non-Hispanic") +```{r white_plot, fig.height = 6, fig.width = 7} +# Visualize the V values (2017-2021 5-year ACS) for South Carolina, U.S.A., counties +ggplot() + + geom_sf( + data = SC2021white, + aes(fill = V), + size = 0.05, + color = 'white' + ) + + geom_sf( + data = county2021SC, + fill = 'transparent', + color = 'white', + size = 0.2 + ) + + theme_minimal() + + scale_fill_viridis_c() + + labs(fill = 'Index (Continuous)', caption = 'Source: U.S. 
Census ACS 2017-2021 estimates') + + ggtitle( + 'Correlation Ratio (White)\nSouth Carolina census tracts to counties', + subtitle = 'Black non-Hispanic' + ) ``` #### Compute Location Quotient (LQ) -Compute the aspatial racial/ethnic LQ values (2017-2021 5-year ACS) for Kentucky, U.S.A., counties vs. the state. This metric is based on [Merton (1939)](https://doi.org/10.2307/2084686) and adapted by [Sudano et al. (2013)](https://doi.org/10.1016/j.healthplace.2012.09.015). Multiple racial/ethnic subgroups are available in the `sudano()` function, including: +Compute the aspatial racial/ethnic LQ values (2017-2021 5-year ACS) for Tennessee, U.S.A., counties vs. the state. This metric is based on [Merton (1939)](https://doi.org/10.2307/2084686) and adapted by [Sudano et al. (2013)](https://doi.org/10.1016/j.healthplace.2012.09.015). Multiple racial/ethnic subgroups are available in the `sudano()` function, including: | ACS table source | racial/ethnic subgroup | character for `subgroup` argument | | -------------- | ------------- | ---------------- | @@ -1024,37 +1221,49 @@ Compute the aspatial racial/ethnic LQ values (2017-2021 5-year ACS) for Kentucky LQ is a measure of the relative racial homogeneity of each smaller geography within a larger geography. LQ can range in value from 0 to infinity because it is the ratio of two proportions in which the numerator is the proportion of subgroup population in a smaller geography and the denominator is the proportion of subgroup population in its larger geography. For example, a smaller geography with an LQ of 5 means that the proportion of the subgroup population living in the smaller geography is five times the proportion of the subgroup population in its larger geography. Unlike the previous metrics that aggregate to the larger geography, LQ computes values for each smaller geography relative to the larger geography.
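The ratio described above can be sketched directly. The example below computes LQ for three made-up counties within one state as (x_i / t_i) / (X / T); the helper `lq_sketch()`, the placeholder GEOID values, and the counts are hypothetical and shown only to illustrate the arithmetic, not the internals of `sudano()`.

```r
# Hypothetical county-level counts within one state (illustrative values only)
county_counts <- data.frame(
  GEOID    = c('A', 'B', 'C'),    # placeholder identifiers
  subgroup = c(50, 200, 250),     # e.g., residents in the 'NHoLB' subgroup
  total    = c(1000, 1000, 3000)  # total population of each county
)

# LQ for each smaller geography: its subgroup proportion divided by the
# subgroup proportion of the larger geography
lq_sketch <- function(x, t) {
  stopifnot(length(x) == length(t), all(t > 0), sum(x) > 0)
  (x / t) / (sum(x) / sum(t))
}

county_counts$LQ <- lq_sketch(county_counts$subgroup, county_counts$total)
county_counts
```

In this toy example the subgroup makes up 10% of the state, so a county where the subgroup makes up 20% of residents has an LQ of 2, and a county at 5% has an LQ of 0.5. Unlike the sketches for the aggregated metrics, the result has one value per smaller geography.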
```{r sudano_prep, results = 'hide'} -sudano2021KY <- ndi::sudano(geo_large = "state", geo_small = "county", state = "KY", - year = 2021, subgroup = "NHoLB") +sudano2021TN <- sudano( + geo_large = 'state', + geo_small = 'county', + state = 'TN', + year = 2021, + subgroup = 'NHoLB' +) -# Obtain the 2021 census counties from the "tigris" package -county2021KY <- tigris::counties(state = "KY", year = 2021, cb = TRUE) +# Obtain the 2021 census counties from the 'tigris' package +county2021TN <- counties(state = 'TN', year = 2021, cb = TRUE) # Join the LQ values to the county geometry -KY2021sudano <- dplyr::left_join(county2021KY, sudano2021KY$lq, by = "GEOID") +TN2021sudano <- county2021TN %>% + left_join(sudano2021TN$lq, by = 'GEOID') ``` -```{r sudano_plot, fig.height = 4, fig.width = 7} -# Visualize the LQ values (2017-2021 5-year ACS) for Kentucky, U.S.A., counties -ggplot2::ggplot() + - ggplot2::geom_sf(data = KY2021sudano, - ggplot2::aes(fill = LQ), - size = 0.05, - color = "white") + - ggplot2::geom_sf(data = county2021KY, - fill = "transparent", - color = "white", - size = 0.2) + - ggplot2::theme_minimal() + - ggplot2::scale_fill_viridis_c(limits = c(0, 1)) + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. Census ACS 2017-2021 estimates") + - ggplot2::ggtitle("Location Quotient (Sudano)\nKentucky counties vs. state", - subtitle = "Black non-Hispanic") +```{r sudano_plot, fig.height = 3, fig.width = 7} +# Visualize the LQ values (2017-2021 5-year ACS) for Tennessee, U.S.A., counties +ggplot() + + geom_sf( + data = TN2021sudano, + aes(fill = LQ), + size = 0.05, + color = 'white' + ) + + geom_sf( + data = county2021TN, + fill = 'transparent', + color = 'white', + size = 0.2 + ) + + theme_minimal() + + scale_fill_viridis_c() + + labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2017-2021 estimates') + + ggtitle( + 'Location Quotient (Sudano)\nTennessee counties vs. 
state', + subtitle = 'Black non-Hispanic' + ) ``` + #### Compute Local Exposure and Isolation (LEx/Is) -Compute the aspatial racial/ethnic Local Exposure and Isolation metric (2017-2021 5-year ACS) for Kentucky, U.S.A., counties vs. the state. This metric is based on [Bemanian & Beyer (2017)](https://doi.org/10.1158/1055-9965.EPI-16-0926). Multiple racial/ethnic subgroups are available in the `bemanian_beyer()` function, including: +Compute the aspatial racial/ethnic Local Exposure and Isolation metric (2017-2021 5-year ACS) for Mississippi, U.S.A., counties vs. the state. This metric is based on [Bemanian & Beyer (2017)](https://doi.org/10.1158/1055-9965.EPI-16-0926). Multiple racial/ethnic subgroups are available in the `bemanian_beyer()` function, including: | ACS table source | racial/ethnic subgroup | character for `subgroup` argument | | -------------- | ------------- | ---------------- | @@ -1082,51 +1291,144 @@ Compute the aspatial racial/ethnic Local Exposure and Isolation metric (2017-202 LEx/Is is a measure of the probability that two individuals living within a specific smaller geography (e.g., census tract) of either different (i.e., exposure) or the same (i.e., isolation) racial/ethnic subgroup(s) will interact, assuming that individuals within a smaller geography are randomly mixed. LEx/Is is standardized with a logit transformation and centered against an expected case that all races/ethnicities are evenly distributed across a larger geography. LEx/Is can range from negative infinity to infinity. If LEx/Is is zero then the estimated probability of the interaction between two people of the given subgroup(s) within a smaller geography is equal to the expected probability if the subgroup(s) were perfectly mixed in the larger geography. 
If LEx/Is is greater than zero then the interaction is more likely to occur within the smaller geography than in the larger geography, and if LEx/Is is less than zero then the interaction is less likely to occur within the smaller geography than in the larger geography. Note: the exponentiation of each LEx/Is metric results in the odds ratio of the specific exposure or isolation of interest in a smaller geography relative to the larger geography. Similar to LQ (Sudano), LEx/Is computes values for each smaller geography relative to the larger geography. ```{r bemanian_beyer_prep, results = 'hide'} -bemanian_beyer2021KY <- ndi::bemanian_beyer(geo_large = "state", geo_small = "county", state = "KY", - year = 2021, subgroup = "NHoLB", subgroup_ixn = "NHoLW") - -# Obtain the 2021 census counties from the "tigris" package -county2021KY <- tigris::counties(state = "KY", year = 2021, cb = TRUE) +bemanian_beyer2021MS <- bemanian_beyer( + geo_large = 'state', + geo_small = 'county', + state = 'MS', + year = 2021, + subgroup = 'NHoLB', + subgroup_ixn = 'NHoLW' +) + +# Obtain the 2021 census counties from the 'tigris' package +county2021MS <- counties(state = 'MS', year = 2021, cb = TRUE) # Join the LEx/Is values to the county geometry -KY2021bemanian_beyer <- dplyr::left_join(county2021KY, bemanian_beyer2021KY$lexis, by = "GEOID") +MS2021bemanian_beyer <- county2021MS %>% + left_join(bemanian_beyer2021MS$lexis, by = 'GEOID') ``` -```{r bemanian_beyer_plot, fig.height = 4, fig.width = 7} -# Visualize the LEx/Is values (2017-2021 5-year ACS) for Kentucky, U.S.A., counties -ggplot2::ggplot() + - ggplot2::geom_sf(data = KY2021bemanian_beyer, - ggplot2::aes(fill = LExIs), - size = 0.05, - color = "white") + - ggplot2::geom_sf(data = county2021KY, - fill = "transparent", - color = "white", - size = 0.2) + - ggplot2::theme_minimal() + - ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340") + - ggplot2::labs(fill = "Index (Continuous)", - caption = 
"Source: U.S. Census ACS 2017-2021 estimates") + - ggplot2::ggtitle("Local Exposure and Isolation (Bemanian & Beyer) metric\nKentucky counties vs. state", - subtitle = "Black non-Hispanic vs. White non-Hispanic") +```{r bemanian_beyer_plot, fig.height = 7, fig.width = 6.5} +# Visualize the LEx/Is values (2017-2021 5-year ACS) for Mississippi, U.S.A., counties +ggplot() + + geom_sf( + data = MS2021bemanian_beyer, + aes(fill = LExIs), + size = 0.05, + color = 'white' + ) + + geom_sf( + data = county2021MS, + fill = 'transparent', + color = 'white', + size = 0.2 + ) + + theme_minimal() + + scale_fill_gradient2( + low = '#998ec3', + mid = '#f7f7f7', + high = '#f1a340' + ) + + labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2017-2021 estimates') + + ggtitle( + 'Local Exposure and Isolation (Bemanian & Beyer)\nMississippi counties vs. state', + subtitle = 'Black non-Hispanic vs. White non-Hispanic' + ) +``` +```{r bemanian_beyer_odds, fig.height = 7, fig.width = 6.5} +# Visualize the exponentiated LEx/Is values (2017-2021 5-year ACS) for +## Mississippi, U.S.A., counties +ggplot() + + geom_sf( + data = MS2021bemanian_beyer, + aes(fill = exp(LExIs)), + size = 0.05, + color = 'white' + ) + + geom_sf( + data = county2021MS, + fill = 'transparent', + color = 'white', + size = 0.2 + ) + + theme_minimal() + + scale_fill_viridis_c() + + labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2017-2021 estimates') + + ggtitle( + 'Odds ratio of Local Exposure and Isolation (Bemanian & Beyer)\n + Mississippi counties vs. state', + subtitle = 'Black non-Hispanic vs. 
White non-Hispanic' + ) ``` -```{r bemanian_beyer_odds, fig.height = 4, fig.width = 7} -# Visualize the exponentiated LEx/Is values (2017-2021 5-year ACS) for Kentucky, U.S.A., counties -ggplot2::ggplot() + - ggplot2::geom_sf(data = KY2021bemanian_beyer, - ggplot2::aes(fill = exp(LExIs)), - size = 0.05, - color = "white") + - ggplot2::geom_sf(data = county2021KY, - fill = "transparent", - color = "white", - size = 0.2) + - ggplot2::theme_minimal() + - ggplot2::scale_fill_viridis_c() + - ggplot2::labs(fill = "Index (Continuous)", - caption = "Source: U.S. Census ACS 2017-2021 estimates") + - ggplot2::ggtitle("Odds ratio of Local Exposure and Isolation (Bemanian & Beyer) metric\nKentucky counties vs. state", - subtitle = "Black non-Hispanic vs. White non-Hispanic") + +#### Compute Delta (DEL) + +Compute the aspatial racial/ethnic DEL values (2017-2021 5-year ACS) for Alabama, U.S.A., counties from census tracts. This metric is based on [Hoover (1941)](https://doi.org/10.1017/S0022050700052980) and Duncan et al. (1961; LC:60007089). 
Multiple racial/ethnic subgroups are available in the `hoover()` function, including: + +| ACS table source | racial/ethnic subgroup | character for `subgroup` argument | +| -------------- | ------------- | ---------------- | +| B03002_002 | not Hispanic or Latino | NHoL | +| B03002_003 | not Hispanic or Latino, white alone | NHoLW | +| B03002_004 | not Hispanic or Latino, Black or African American alone | NHoLB | +| B03002_005 | not Hispanic or Latino, American Indian and Alaska Native alone | NHoLAIAN | +| B03002_006 | not Hispanic or Latino, Asian alone | NHoLA | +| B03002_007 | not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone | NHoLNHOPI | +| B03002_008 | not Hispanic or Latino, some other race alone | NHoLSOR | +| B03002_009 | not Hispanic or Latino, two or more races | NHoLTOMR | +| B03002_010 | not Hispanic or Latino, two races including some other race | NHoLTRiSOR | +| B03002_011 | not Hispanic or Latino, two races excluding some other race, and three or more races | NHoLTReSOR | +| B03002_012 | Hispanic or Latino | HoL | +| B03002_013 | Hispanic or Latino, white alone | HoLW | +| B03002_014 | Hispanic or Latino, Black or African American alone | HoLB | +| B03002_015 | Hispanic or Latino, American Indian and Alaska Native alone | HoLAIAN | +| B03002_016 | Hispanic or Latino, Asian alone | HoLA | +| B03002_017 | Hispanic or Latino, Native Hawaiian and other Pacific Islander alone | HoLNHOPI | +| B03002_018 | Hispanic or Latino, some other race alone | HoLSOR | +| B03002_019 | Hispanic or Latino, two or more races | HoLTOMR | +| B03002_020 | Hispanic or Latino, two races including some other race | HoLTRiSOR | +| B03002_021 | Hispanic or Latino, two races excluding some other race, and three or more races | HoLTReSOR | + +DEL is a measure of the proportion of members of one subgroup(s) residing in geographic units with above average density of members of the subgroup(s). 
The index provides the proportion of a subgroup population that would have to move across geographic units to achieve a uniform density. DEL can range in value from 0 to 1. + +```{r hoover_prep, results = 'hide'} +hoover2021AL <- hoover( + geo_large = 'county', + geo_small = 'tract', + state = 'AL', + year = 2021, + subgroup = 'NHoLB' +) + +# Obtain the 2021 census counties from the 'tigris' package +county2021AL <- counties(state = 'AL', year = 2021, cb = TRUE) + +# Join the DEL values to the county geometry +AL2021hoover <- county2021AL %>% + left_join(hoover2021AL$del, by = 'GEOID') +``` + +```{r hoover_plot, fig.height = 7, fig.width = 6} +# Visualize the DEL values (2017-2021 5-year ACS) for Alabama, U.S.A., counties +ggplot() + + geom_sf( + data = AL2021hoover, + aes(fill = DEL), + size = 0.05, + color = 'white' + ) + + geom_sf( + data = county2021AL, + fill = 'transparent', + color = 'white', + size = 0.2 + ) + + theme_minimal() + + scale_fill_viridis_c(limits = c(0, 1)) + + labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2017-2021 estimates') + + ggtitle( + 'Delta (Hoover)\nAlabama census tracts to counties', + subtitle = 'Black non-Hispanic' + ) ``` ```{r system} diff --git a/vignettes/vignette.html b/vignettes/vignette.html index ac5c2d8..6d60e41 100644 --- a/vignettes/vignette.html +++ b/vignettes/vignette.html @@ -12,7 +12,7 @@ - + ndi: Neighborhood Deprivation Indices @@ -340,14 +340,14 @@

ndi: Neighborhood Deprivation Indices

Ian D. Buller (GitHub: @idblr)

-

2023-02-01

+

2024-07-06

Start with the necessary packages for the vignette.

-
loadedPackages <- c("dplyr", "ggplot2", "ndi", "tidycensus", "tigris")
-invisible(lapply(loadedPackages, library, character.only = TRUE))
-options(tigris_use_cache = TRUE)
+
loadedPackages <- c('dplyr', 'ggplot2', 'ndi', 'tidycensus', 'tigris')
+invisible(lapply(loadedPackages, library, character.only = TRUE))
+options(tigris_use_cache = TRUE)

Set your U.S. Census Bureau access key. Follow this link to obtain one. Specify your access key in the messer() or powell_wiley() functions using the key
@@ -357,7 +357,7 @@

2023-02-01

package before running the messer() or powell_wiley() functions (see an example of the latter below).

-
tidycensus::census_api_key("...") # INSERT YOUR OWN KEY FROM U.S. CENSUS API
+
census_api_key('...') # INSERT YOUR OWN KEY FROM U.S. CENSUS API

Compute NDI (Messer)

Compute the NDI (Messer) values (2006-2010 5-year ACS) for Georgia,
@@ -430,30 +430,30 @@

Compute NDI (Messer)

-
messer2010GA <- ndi::messer(state = "GA", year = 2010, round_output = TRUE)
+
messer2010GA <- messer(state = 'GA', year = 2010, round_output = TRUE)

One output from the messer() function is a tibble containing the identification, geographic name, NDI (Messer) values, and raw census characteristics for each tract.

-
messer2010GA$ndi
+
messer2010GA$ndi
## # A tibble: 1,969 × 14
-##    GEOID  state county tract     NDI NDIQu…¹   OCC   CWD   POV   FHH   PUB   U30
-##    <chr>  <chr> <chr>  <chr>   <dbl> <fct>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
-##  1 13001… Geor… Appli… 9501  -0.0075 2-Belo…     0   0     0.1   0.1   0.1   0.3
-##  2 13001… Geor… Appli… 9502   0.0458 4-Most…     0   0     0.3   0.1   0.2   0.5
-##  3 13001… Geor… Appli… 9503   0.0269 3-Abov…     0   0     0.2   0     0.2   0.4
-##  4 13001… Geor… Appli… 9504  -0.0083 2-Belo…     0   0     0.1   0     0.1   0.3
-##  5 13001… Geor… Appli… 9505   0.0231 3-Abov…     0   0     0.2   0     0.2   0.4
-##  6 13003… Geor… Atkin… 9601   0.0619 4-Most…     0   0.1   0.2   0.2   0.2   0.5
-##  7 13003… Geor… Atkin… 9602   0.0593 4-Most…     0   0.1   0.3   0.1   0.2   0.4
-##  8 13003… Geor… Atkin… 9603   0.0252 3-Abov…     0   0     0.3   0.1   0.2   0.4
-##  9 13005… Geor… Bacon… 9701   0.0061 3-Abov…     0   0     0.2   0     0.2   0.4
-## 10 13005… Geor… Bacon… 9702…  0.0121 3-Abov…     0   0     0.2   0.1   0.1   0.5
-## # … with 1,959 more rows, 2 more variables: EDU <dbl>, EMP <dbl>, and
-## #   abbreviated variable name ¹​NDIQuart
+##    GEOID state county tract     NDI NDIQuart   OCC   CWD   POV   FHH   PUB   U30
+##    <chr> <chr> <chr>  <chr>   <dbl> <fct>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
+##  1 1300… Geor… Appli… 9501  -0.0075 2-Below…     0   0     0.1   0.1   0.1   0.3
+##  2 1300… Geor… Appli… 9502   0.0458 4-Most …     0   0     0.3   0.1   0.2   0.5
+##  3 1300… Geor… Appli… 9503   0.0269 3-Above…     0   0     0.2   0     0.2   0.4
+##  4 1300… Geor… Appli… 9504  -0.0083 2-Below…     0   0     0.1   0     0.1   0.3
+##  5 1300… Geor… Appli… 9505   0.0231 3-Above…     0   0     0.2   0     0.2   0.4
+##  6 1300… Geor… Atkin… 9601   0.0619 4-Most …     0   0.1   0.2   0.2   0.2   0.5
+##  7 1300… Geor… Atkin… 9602   0.0593 4-Most …     0   0.1   0.3   0.1   0.2   0.4
+##  8 1300… Geor… Atkin… 9603   0.0252 3-Above…     0   0     0.3   0.1   0.2   0.4
+##  9 1300… Geor… Bacon… 9701   0.0061 3-Above…     0   0     0.2   0     0.2   0.4
+## 10 1300… Geor… Bacon… 9702…  0.0121 3-Above…     0   0     0.2   0.1   0.1   0.5
+## # ℹ 1,959 more rows
+## # ℹ 2 more variables: EDU <dbl>, EMP <dbl>

A second output from the messer() function contains the results of the principal component analysis used to compute the NDI (Messer) values.

-
messer2010GA$pca
+
messer2010GA$pca
## Principal Components Analysis
 ## Call: psych::principal(r = ndi_data_pca, nfactors = 1, n.obs = nrow(ndi_data_pca), 
 ##     covar = FALSE, scores = TRUE, missing = imp)
@@ -482,7 +482,7 @@ 

Compute NDI (Messer)

A third output from the messer() function is a tibble containing a breakdown of the missingness of the census characteristics used to compute the NDI (Messer) values.

-
messer2010GA$missing
+
messer2010GA$missing
## # A tibble: 8 × 4
 ##   variable total n_missing percent_missing
 ##   <chr>    <int>     <int> <chr>          
@@ -497,99 +497,127 @@ 

Compute NDI (Messer)

We can visualize the NDI (Messer) values geographically by linking them to spatial information from the tigris package and plotting with the ggplot2 package suite.

-
# Obtain the 2010 counties from the "tigris" package
-county2010GA <- tigris::counties(state = "GA", year = 2010, cb = TRUE)
-# Remove first 9 characters from GEOID for compatibility with tigris information
-county2010GA$GEOID <- substring(county2010GA$GEO_ID, 10) 
-
-# Obtain the 2010 census tracts from the "tigris" package
-tract2010GA <- tigris::tracts(state = "GA", year = 2010, cb = TRUE)
-# Remove first 9 characters from GEOID for compatibility with tigris information
-tract2010GA$GEOID <- substring(tract2010GA$GEO_ID, 10) 
-
-# Join the NDI (Messer) values to the census tract geometry
-GA2010messer <- dplyr::left_join(tract2010GA, messer2010GA$ndi, by = "GEOID")
-
# Visualize the NDI (Messer) values (2006-2010 5-year ACS) for Georgia, U.S.A., census tracts 
-## Continuous Index
-ggplot2::ggplot() + 
-  ggplot2::geom_sf(data = GA2010messer, 
-                   ggplot2::aes(fill = NDI),
-                   size = 0.05,
-                   color = "transparent") +
-   ggplot2::geom_sf(data = county2010GA,
-                   fill = "transparent", 
-                   color = "white",
-                   size = 0.2) +
-  ggplot2::theme_minimal() +
-  ggplot2::scale_fill_viridis_c() +
-  ggplot2::labs(fill = "Index (Continuous)",
-                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
-  ggplot2::ggtitle("Neighborhood Deprivation Index (Messer)",
-                   subtitle = "GA census tracts as the referent")
-
-## Categorical Index
-### Rename "9-NDI not avail" level as NA for plotting
-GA2010messer$NDIQuartNA <- factor(replace(as.character(GA2010messer$NDIQuart), 
-                                            GA2010messer$NDIQuart == "9-NDI not avail", NA),
-                                  c(levels(GA2010messer$NDIQuart)[-5], NA))
-
-ggplot2::ggplot() + 
-  ggplot2::geom_sf(data = GA2010messer, 
-                   ggplot2::aes(fill = NDIQuartNA),
-                   size = 0.05,
-                   color = "transparent") +
-   ggplot2::geom_sf(data = county2010GA,
-                   fill = "transparent", 
-                   color = "white",
-                   size = 0.2) +
-  ggplot2::theme_minimal() + 
-  ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
-                                na.value = "grey80") +
-  ggplot2::labs(fill = "Index (Categorical)",
-                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
-  ggplot2::ggtitle("Neighborhood Deprivation Index (Messer) Quartiles",
-                   subtitle = "GA census tracts as the referent")
-

+
# Obtain the 2010 counties from the 'tigris' package
+county2010GA <- counties(state = 'GA', year = 2010, cb = TRUE)
+# Remove first 9 characters from GEOID for compatibility with tigris information
+county2010GA$GEOID <- substring(county2010GA$GEO_ID, 10) 
+
+# Obtain the 2010 census tracts from the 'tigris' package
+tract2010GA <- tracts(state = 'GA', year = 2010, cb = TRUE)
+# Remove first 9 characters from GEOID for compatibility with tigris information
+tract2010GA$GEOID <- substring(tract2010GA$GEO_ID, 10) 
+
+# Join the NDI (Messer) values to the census tract geometry
+GA2010messer <- tract2010GA %>%
+  left_join(messer2010GA$ndi, by = 'GEOID')
+
# Visualize the NDI (Messer) values (2006-2010 5-year ACS) for Georgia, U.S.A., census tracts 
+## Continuous Index
+ggplot() +
+  geom_sf(
+    data = GA2010messer,
+    aes(fill = NDI),
+    size = 0.05,
+    color = 'transparent'
+  ) +
+  geom_sf(
+    data = county2010GA,
+    fill = 'transparent',
+    color = 'white',
+    size = 0.2
+  ) +
+  theme_minimal() +
+  scale_fill_viridis_c() +
+  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
+  ggtitle(
+    'Neighborhood Deprivation Index (Messer)',
+    subtitle = 'GA census tracts as the referent'
+  )
+
+## Categorical Index
+### Rename '9-NDI not avail' level as NA for plotting
+GA2010messer$NDIQuartNA <-
+  factor(
+    replace(
+      as.character(GA2010messer$NDIQuart),
+      GA2010messer$NDIQuart == '9-NDI not avail',
+      NA
+    ),
+    c(levels(GA2010messer$NDIQuart)[-5], NA)
+  )
+
+ggplot() +
+  geom_sf(
+    data = GA2010messer,
+    aes(fill = NDIQuartNA),
+    size = 0.05,
+    color = 'transparent'
+  ) +
+  geom_sf(
+    data = county2010GA,
+    fill = 'transparent',
+    color = 'white',
+    size = 0.2
+  ) +
+  theme_minimal() +
+  scale_fill_viridis_d(guide = guide_legend(reverse = TRUE), na.value = 'grey80') +
+  labs(fill = 'Index (Categorical)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
+  ggtitle(
+    'Neighborhood Deprivation Index (Messer) Quartiles',
+    subtitle = 'GA census tracts as the referent'
+  )
+

The results above are at the tract level. The NDI (Messer) values can also be calculated at the county level.

-
messer2010GA_county <- ndi::messer(geo = "county", state = "GA", year = 2010)
-
-# Join the NDI (Messer) values to the county geometry
-GA2010messer_county <- dplyr::left_join(county2010GA, messer2010GA_county$ndi, by = "GEOID")
-
# Visualize the NDI (Messer) values (2006-2010 5-year ACS) for Georgia, U.S.A., counties
-## Continuous Index
-ggplot2::ggplot() + 
-  ggplot2::geom_sf(data = GA2010messer_county, 
-                   ggplot2::aes(fill = NDI),
-                   size = 0.20,
-                   color = "white") +
-  ggplot2::theme_minimal() + 
-  ggplot2::scale_fill_viridis_c() +
-  ggplot2::labs(fill = "Index (Continuous)",
-                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
-  ggplot2::ggtitle("Neighborhood Deprivation Index (Messer)",
-                   subtitle = "GA counties as the referent")
-
-## Categorical Index
-
-### Rename "9-NDI not avail" level as NA for plotting
-GA2010messer_county$NDIQuartNA <- factor(replace(as.character(GA2010messer_county$NDIQuart), 
-                                            GA2010messer_county$NDIQuart == "9-NDI not avail", NA),
-                                         c(levels(GA2010messer_county$NDIQuart)[-5], NA))
-
-ggplot2::ggplot() + 
-  ggplot2::geom_sf(data = GA2010messer_county, 
-                   ggplot2::aes(fill = NDIQuartNA),
-                   size = 0.20,
-                   color = "white") +
-  ggplot2::theme_minimal() + 
-  ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
-                                na.value = "grey80") +
-  ggplot2::labs(fill = "Index (Categorical)",
-                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
-  ggplot2::ggtitle("Neighborhood Deprivation Index (Messer) Quartiles",
-                   subtitle = "GA counties as the referent")
-

+
messer2010GA_county <- messer(geo = 'county', state = 'GA', year = 2010)
+
+# Join the NDI (Messer) values to the county geometry
+GA2010messer_county <- county2010GA %>%
+  left_join(messer2010GA_county$ndi, by = 'GEOID')
+
# Visualize the NDI (Messer) values (2006-2010 5-year ACS) for Georgia, U.S.A., counties
+## Continuous Index
+ggplot() +
+  geom_sf(
+    data = GA2010messer_county,
+    aes(fill = NDI),
+    size = 0.20,
+    color = 'white'
+  ) +
+  theme_minimal() +
+  scale_fill_viridis_c() +
+  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
+  ggtitle(
+    'Neighborhood Deprivation Index (Messer)',
+    subtitle = 'GA counties as the referent'
+  )
+
+## Categorical Index
+
+### Rename '9-NDI not avail' level as NA for plotting
+GA2010messer_county$NDIQuartNA <-
+  factor(
+    replace(
+      as.character(GA2010messer_county$NDIQuart),
+      GA2010messer_county$NDIQuart == '9-NDI not avail',
+      NA
+    ),
+    c(levels(GA2010messer_county$NDIQuart)[-5], NA)
+  )
+
+ggplot() +
+  geom_sf(
+    data = GA2010messer_county,
+    aes(fill = NDIQuartNA),
+    size = 0.20,
+    color = 'white'
+  ) +
+  theme_minimal() +
+  scale_fill_viridis_d(guide = guide_legend(reverse = TRUE), na.value = 'grey80') +
+  labs(fill = 'Index (Categorical)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
+  ggtitle(
+    'Neighborhood Deprivation Index (Messer) Quartiles',
+    subtitle = 'GA counties as the referent'
+  )
+

Compute NDI (Powell-Wiley)

@@ -703,33 +731,37 @@

Compute NDI (Powell-Wiley)

and computation of the NDI (Powell-Wiley) can be found on a GIS Portal for Cancer Research website.

-
powell_wiley2020DMVW <- ndi::powell_wiley(state = c("DC", "MD", "VA", "WV"), year = 2020, round_output = TRUE)
+
powell_wiley2020DMVW <- powell_wiley(
+  state = c('DC', 'MD', 'VA', 'WV'),
+  year = 2020,
+  round_output = TRUE
+)

One output from the powell_wiley() function is a tibble containing the identification, geographic name, NDI (Powell-Wiley) values, and raw census characteristics for each tract.

-
powell_wiley2020DMVW$ndi
+
powell_wiley2020DMVW$ndi
## # A tibble: 4,425 × 20
-##    GEOID       state  county tract   NDI NDIQu…¹ MedHH…² PctRe…³ PctPu…⁴ MedHo…⁵
-##    <chr>       <chr>  <chr>  <chr> <dbl> <fct>     <dbl>   <dbl>   <dbl>   <dbl>
-##  1 11001000101 Distr… Distr… 1.01  -2.13 1-Leas…  187839    50.9     0.8  699100
-##  2 11001000102 Distr… Distr… 1.02  -2.46 1-Leas…  184167    52.2     0.6 1556000
-##  3 11001000201 Distr… Distr… 2.01  NA    9-NDI …      NA   NaN     NaN        NA
-##  4 11001000202 Distr… Distr… 2.02  -2.30 1-Leas…  164261    49.6     0.9 1309100
-##  5 11001000300 Distr… Distr… 3     -2.06 1-Leas…  156483    46       0.6  976500
-##  6 11001000400 Distr… Distr… 4     -2.09 1-Leas…  153397    47.8     0   1164200
-##  7 11001000501 Distr… Distr… 5.01  -2.11 1-Leas…  119911    44.5     0.8  674600
-##  8 11001000502 Distr… Distr… 5.02  -2.21 1-Leas…  153264    46.8     0.5 1012500
-##  9 11001000600 Distr… Distr… 6     -2.16 1-Leas…  154266    60.8     7.4 1109800
-## 10 11001000702 Distr… Distr… 7.02  -1.20 1-Leas…   71747    22.9     0    289900
-## # … with 4,415 more rows, 10 more variables: PctMgmtBusScArti <dbl>,
+##    GEOID       state  county tract   NDI NDIQuint MedHHInc PctRecvIDR PctPubAsst
+##    <chr>       <chr>  <chr>  <chr> <dbl> <fct>       <dbl>      <dbl>      <dbl>
+##  1 11001000101 Distr… Distr… 1.01  -2.13 1-Least…   187839       50.9        0.8
+##  2 11001000102 Distr… Distr… 1.02  -2.46 1-Least…   184167       52.2        0.6
+##  3 11001000201 Distr… Distr… 2.01  NA    9-NDI n…       NA      NaN        NaN  
+##  4 11001000202 Distr… Distr… 2.02  -2.30 1-Least…   164261       49.6        0.9
+##  5 11001000300 Distr… Distr… 3     -2.06 1-Least…   156483       46          0.6
+##  6 11001000400 Distr… Distr… 4     -2.09 1-Least…   153397       47.8        0  
+##  7 11001000501 Distr… Distr… 5.01  -2.11 1-Least…   119911       44.5        0.8
+##  8 11001000502 Distr… Distr… 5.02  -2.21 1-Least…   153264       46.8        0.5
+##  9 11001000600 Distr… Distr… 6     -2.16 1-Least…   154266       60.8        7.4
+## 10 11001000702 Distr… Distr… 7.02  -1.20 1-Least…    71747       22.9        0  
+## # ℹ 4,415 more rows
+## # ℹ 11 more variables: MedHomeVal <dbl>, PctMgmtBusScArti <dbl>,
 ## #   PctFemHeadKids <dbl>, PctOwnerOcc <dbl>, PctNoPhone <dbl>,
 ## #   PctNComPlmb <dbl>, PctEducHSPlus <dbl>, PctEducBchPlus <dbl>,
-## #   PctFamBelowPov <dbl>, PctUnempl <dbl>, TotalPop <dbl>, and abbreviated
-## #   variable names ¹​NDIQuint, ²​MedHHInc, ³​PctRecvIDR, ⁴​PctPubAsst, ⁵​MedHomeVal
+## #   PctFamBelowPov <dbl>, PctUnempl <dbl>, TotalPop <dbl>

A second output from the powell_wiley() function contains the results of the principal component analysis used to compute the NDI (Powell-Wiley) values.

-
powell_wiley2020DMVW$pca
+
powell_wiley2020DMVW$pca
## $loadings
 ## 
 ## Loadings:
@@ -806,7 +838,7 @@ 

Compute NDI (Powell-Wiley)

A third output from the powell_wiley() function is a tibble containing a breakdown of the missingness of the census characteristics used to compute the NDI (Powell-Wiley) values.

-
powell_wiley2020DMVW$missing
+
powell_wiley2020DMVW$missing
## # A tibble: 13 × 4
 ##    variable        total n_missing percent_missing
 ##    <chr>           <int>     <int> <chr>          
@@ -826,109 +858,129 @@ 

Compute NDI (Powell-Wiley)

A fourth output from the powell_wiley() function is a character string or numeric value of a standardized Cronbach’s alpha. A value greater than 0.7 is desired.

-
powell_wiley2020DMVW$cronbach
-
## [1] 0.931138
+
powell_wiley2020DMVW$cronbach
+
## [1] 0.9321693
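
For intuition about the value reported above, a standardized Cronbach's alpha can be written in terms of the number of items k and their mean inter-item correlation r̄ as α = k·r̄ / (1 + (k − 1)·r̄). The sketch below applies that formula to simulated data. The helper `standardized_alpha()` and the simulated items are hypothetical and only illustrative; this is not the code that `powell_wiley()` runs internally.

```r
# Hypothetical helper (not part of 'ndi'): standardized Cronbach's alpha
# computed from the mean off-diagonal correlation among k items
standardized_alpha <- function(x) {
  r <- cor(x, use = 'pairwise.complete.obs')
  k <- ncol(x)
  r_bar <- mean(r[lower.tri(r)])
  (k * r_bar) / (1 + (k - 1) * r_bar)
}

# Simulated items that share one latent factor (illustrative data only)
set.seed(1)
latent <- rnorm(200)
items <- sapply(1:5, function(i) latent + rnorm(200, sd = 0.5))

alpha_sim <- standardized_alpha(items)
alpha_sim > 0.7 # check against the commonly used 0.7 threshold
```

Items that correlate strongly with one another push the value toward 1, which is why a value above 0.7 is usually read as acceptable internal consistency.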

We can visualize the NDI (Powell-Wiley) values geographically by linking them to spatial information from the tigris package and plotting with the ggplot2 package suite.

-
# Obtain the 2020 counties from the "tigris" package
-county2020 <- tigris::counties(cb = TRUE)
-county2020DMVW <- county2020[county2020$STUSPS %in% c("DC", "MD", "VA", "WV"), ]
-
-# Obtain the 2020 census tracts from the "tigris" package
-tract2020D <- tigris::tracts(state = "DC", year = 2020, cb = TRUE)
-tract2020M <- tigris::tracts(state = "MD", year = 2020, cb = TRUE)
-tract2020V <- tigris::tracts(state = "VA", year = 2020, cb = TRUE)
-tract2020W <- tigris::tracts(state = "WV", year = 2020, cb = TRUE)
-tracts2020DMVW <- rbind(tract2020D, tract2020M, tract2020V, tract2020W)
-
-# Join the NDI (Powell-Wiley) values to the census tract geometry
-DMVW2020pw <- dplyr::left_join(tracts2020DMVW, powell_wiley2020DMVW$ndi, by = "GEOID")
-
# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS) 
-## Maryland, Virginia, Washington, D.C., and West Virginia, U.S.A., census tracts 
-## Continuous Index
-ggplot2::ggplot() + 
-  ggplot2::geom_sf(data = DMVW2020pw, 
-                   ggplot2::aes(fill = NDI), 
-                   color = NA) +
-  ggplot2::geom_sf(data = county2020DMVW,
-                   fill = "transparent", 
-                   color = "white") +
-  ggplot2::theme_minimal() + 
-  ggplot2::scale_fill_viridis_c(na.value = "grey80") +
-  ggplot2::labs(fill = "Index (Continuous)",
-                caption = "Source: U.S. Census ACS 2016-2020 estimates")+
-  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley)",
-                   subtitle = "DC, MD, VA, and WV tracts as the referent")
-
-## Categorical Index (Population-weighted quintiles)
-### Rename "9-NDI not avail" level as NA for plotting
-DMVW2020pw$NDIQuintNA <- factor(replace(as.character(DMVW2020pw$NDIQuint), 
-                                        DMVW2020pw$NDIQuint == "9-NDI not avail", NA),
-                                c(levels(DMVW2020pw$NDIQuint)[-6], NA))
-
-ggplot2::ggplot() + 
-  ggplot2::geom_sf(data = DMVW2020pw, 
-                   ggplot2::aes(fill = NDIQuintNA), 
-                   color = NA) +
-  ggplot2::geom_sf(data = county2020DMVW,
-                   fill = "transparent", 
-                   color = "white") +
-  ggplot2::theme_minimal() + 
-  ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
-                                na.value = "grey80") +
-  ggplot2::labs(fill = "Index (Categorical)",
-                caption = "Source: U.S. Census ACS 2016-2020 estimates")+
-  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Population-weighted Quintiles",
-                   subtitle = "DC, MD, VA, and WV tracts as the referent")
-

+
# Obtain the 2020 counties from the 'tigris' package
+county2020 <- counties(cb = TRUE)
+county2020DMVW <- county2020[county2020$STUSPS %in% c('DC', 'MD', 'VA', 'WV'), ]
+
+# Obtain the 2020 census tracts from the 'tigris' package
+tract2020D <- tracts(state = 'DC', year = 2020, cb = TRUE)
+tract2020M <- tracts(state = 'MD', year = 2020, cb = TRUE)
+tract2020V <- tracts(state = 'VA', year = 2020, cb = TRUE)
+tract2020W <- tracts(state = 'WV', year = 2020, cb = TRUE)
+tracts2020DMVW <- rbind(tract2020D, tract2020M, tract2020V, tract2020W)
+
+# Join the NDI (Powell-Wiley) values to the census tract geometry
+DMVW2020pw <- tracts2020DMVW %>%
+  left_join(powell_wiley2020DMVW$ndi, by = 'GEOID')
+
# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS) 
+## Maryland, Virginia, Washington, D.C., and West Virginia, U.S.A., census tracts 
+## Continuous Index
+ggplot() +
+  geom_sf(
+    data = DMVW2020pw,
+    aes(fill = NDI),
+    color = NA
+  ) +
+  geom_sf(
+    data = county2020DMVW,
+    fill = 'transparent',
+    color = 'white'
+  ) +
+  theme_minimal() +
+  scale_fill_viridis_c(na.value = 'grey80') +
+  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2016-2020 estimates') +
+  ggtitle(
+    'Neighborhood Deprivation Index (Powell-Wiley)',
+    subtitle = 'DC, MD, VA, and WV tracts as the referent'
+  )
+
+## Categorical Index (Population-weighted quintiles)
+### Rename '9-NDI not avail' level as NA for plotting
+DMVW2020pw$NDIQuintNA <-
+  factor(replace(
+    as.character(DMVW2020pw$NDIQuint),
+    DMVW2020pw$NDIQuint == '9-NDI not avail',
+    NA
+  ),
+  c(levels(DMVW2020pw$NDIQuint)[-6], NA))
+
+ggplot() +
+  geom_sf(data = DMVW2020pw, aes(fill = NDIQuintNA), color = NA) +
+  geom_sf(data = county2020DMVW, fill = 'transparent', color = 'white') +
+  theme_minimal() +
+  scale_fill_viridis_d(guide = guide_legend(reverse = TRUE), na.value = 'grey80') +
+  labs(fill = 'Index (Categorical)', caption = 'Source: U.S. Census ACS 2016-2020 estimates') +
+  ggtitle(
+    'Neighborhood Deprivation Index (Powell-Wiley) Population-weighted Quintiles',
+    subtitle = 'DC, MD, VA, and WV tracts as the referent'
+  )
+

As with the NDI (Messer), the NDI (Powell-Wiley) values can also be computed at the county level.

-
# Obtain the 2020 counties from the "tigris" package
-county2020DMVW <- tigris::counties(state = c("DC", "MD", "VA", "WV"), year = 2020, cb = TRUE)
-
-# NDI (Powell-Wiley) at the county level (2016-2020)
-powell_wiley2020DMVW_county <- ndi::powell_wiley(geo = "county",
-                                                 state = c("DC", "MD", "VA", "WV"),
-                                                 year = 2020)
-
-# Join the NDI (Powell-Wiley) values to the county geometry
-DMVW2020pw_county <- dplyr::left_join(county2020DMVW, powell_wiley2020DMVW_county$ndi, by = "GEOID")
-
# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS)
-## Maryland, Virginia, Washington, D.C., and West Virginia, U.S.A., counties
-## Continuous Index
-ggplot2::ggplot() + 
-  ggplot2::geom_sf(data = DMVW2020pw_county, 
-                   ggplot2::aes(fill = NDI),
-                   size = 0.20,
-                   color = "white") +
-  ggplot2::theme_minimal() + 
-  ggplot2::scale_fill_viridis_c() +
-  ggplot2::labs(fill = "Index (Continuous)",
-                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
-  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley)",
-                   subtitle = "DC, MD, VA, and WV counties as the referent")
-
-## Categorical Index
-
-### Rename "9-NDI not avail" level as NA for plotting
-DMVW2020pw_county$NDIQuintNA <- factor(replace(as.character(DMVW2020pw_county$NDIQuint), 
-                                            DMVW2020pw_county$NDIQuint == "9-NDI not avail", NA),
-                                         c(levels(DMVW2020pw_county$NDIQuint)[-6], NA))
-
-ggplot2::ggplot() + 
-  ggplot2::geom_sf(data = DMVW2020pw_county, 
-                   ggplot2::aes(fill = NDIQuint),
-                   size = 0.20,
-                   color = "white") +
-  ggplot2::theme_minimal() + 
-  ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
-                                na.value = "grey80") +
-  ggplot2::labs(fill = "Index (Categorical)",
-                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
-  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Population-weighted Quintiles",
-                   subtitle = "DC, MD, VA, and WV counties as the referent")
-

+
# Obtain the 2020 counties from the 'tigris' package
+county2020DMVW <- counties(state = c('DC', 'MD', 'VA', 'WV'), year = 2020, cb = TRUE)
+
+# NDI (Powell-Wiley) at the county level (2016-2020)
+powell_wiley2020DMVW_county <- powell_wiley(
+  geo = 'county',
+  state = c('DC', 'MD', 'VA', 'WV'),
+  year = 2020
+)
+
+# Join the NDI (Powell-Wiley) values to the county geometry
+DMVW2020pw_county <- county2020DMVW %>%
+  left_join(powell_wiley2020DMVW_county$ndi, by = 'GEOID')
+
# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS)
+## Maryland, Virginia, Washington, D.C., and West Virginia, U.S.A., counties
+## Continuous Index
+ggplot() +
+  geom_sf(
+    data = DMVW2020pw_county,
+    aes(fill = NDI),
+    size = 0.20,
+    color = 'white'
+  ) +
+  theme_minimal() +
+  scale_fill_viridis_c() +
+  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2016-2020 estimates') +
+  ggtitle(
+    'Neighborhood Deprivation Index (Powell-Wiley)',
+    subtitle = 'DC, MD, VA, and WV counties as the referent'
+  )
+
+## Categorical Index
+
+### Rename '9-NDI not avail' level as NA for plotting
+DMVW2020pw_county$NDIQuintNA <-
+  factor(
+    replace(
+      as.character(DMVW2020pw_county$NDIQuint),
+      DMVW2020pw_county$NDIQuint == '9-NDI not avail',
+      NA
+    ),
+    c(levels(DMVW2020pw_county$NDIQuint)[-6], NA)
+  )
+
+ggplot() +
+  geom_sf(
+    data = DMVW2020pw_county,
+    aes(fill = NDIQuintNA),
+    size = 0.20,
+    color = 'white'
+  ) +
+  theme_minimal() +
+  scale_fill_viridis_d(guide = guide_legend(reverse = TRUE), na.value = 'grey80') +
+  labs(fill = 'Index (Categorical)', caption = 'Source: U.S. Census ACS 2016-2020 estimates') +
+  ggtitle(
+    'Neighborhood Deprivation Index (Powell-Wiley) Population-weighted Quintiles',
+    subtitle = 'DC, MD, VA, and WV counties as the referent'
+  )
+

Advanced Features

@@ -940,84 +992,109 @@

Imputing missing census variables

pca() function in the psych package called within the messer() and powell_wiley() functions. Impute values using the logical imp argument
-(currently only calls impute = "median" by default, which
+(currently only calls impute = 'median' by default, which
assigns the median values of each missing census variable for a geography).
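
As a rough illustration of what median imputation does, the sketch below replaces each missing value with the median of its own column. The helper `impute_median()` and the toy data frame are hypothetical and exist only for this example; within the package, imputation is delegated to the psych package as described above.

```r
# Hypothetical helper (not part of 'ndi' or 'psych'):
# replace NA values in every column with that column's median
impute_median <- function(df) {
  df[] <- lapply(df, function(col) {
    col[is.na(col)] <- median(col, na.rm = TRUE)
    col
  })
  df
}

# Toy census-like variables with one missing value each (illustrative only)
toy <- data.frame(
  PctUnempl = c(4, NA, 10),
  MedHHInc = c(50000, 70000, NA)
)

toy_imp <- impute_median(toy)
toy_imp
```

Because each variable is filled with its own median, tracts with a missing characteristic still receive an index value, at the cost of pulling those tracts toward the middle of the distribution for that variable.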

-
powell_wiley2020DC <- ndi::powell_wiley(state = "DC", year = 2020) # without imputation
-powell_wiley2020DCi <- ndi::powell_wiley(state = "DC", year = 2020, imp = TRUE) # with imputation
-
-table(is.na(powell_wiley2020DC$ndi$NDI)) # n=13 tracts without NDI (Powell-Wiley) values
-table(is.na(powell_wiley2020DCi$ndi$NDI)) # n=0 tracts without NDI (Powell-Wiley) values
-
-# Obtain the 2020 census tracts from the "tigris" package
-tract2020DC <- tigris::tracts(state = "DC", year = 2020, cb = TRUE)
-
-# Join the NDI (Powell-Wiley) values to the census tract geometry
-DC2020pw <- dplyr::left_join(tract2020DC, powell_wiley2020DC$ndi, by = "GEOID")
-DC2020pw <- dplyr::left_join(DC2020pw, powell_wiley2020DCi$ndi, by = "GEOID", suffix = c("_nonimp", "_imp"))
-
# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for Washington, D.C., census tracts
-## Continuous Index
-ggplot2::ggplot() + 
-  ggplot2::geom_sf(data = DC2020pw, 
-                   ggplot2::aes(fill = NDI_nonimp),
-                   size = 0.2,
-                   color = "white") +
-  ggplot2::theme_minimal() + 
-  ggplot2::scale_fill_viridis_c() +
-  ggplot2::labs(fill = "Index (Continuous)",
-                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
-  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley), Non-Imputed",
-                   subtitle = "DC census tracts as the referent")
-
-ggplot2::ggplot() + 
-  ggplot2::geom_sf(data = DC2020pw, 
-                   ggplot2::aes(fill = NDI_imp),
-                   size = 0.2,
-                   color = "white") +
-  ggplot2::theme_minimal() + 
-  ggplot2::scale_fill_viridis_c() +
-  ggplot2::labs(fill = "Index (Continuous)",
-                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
-  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley), Imputed",
-                   subtitle = "DC census tracts as the referent")
-
-## Categorical Index
-### Rename "9-NDI not avail" level as NA for plotting
-DC2020pw$NDIQuintNA_nonimp <- factor(replace(as.character(DC2020pw$NDIQuint_nonimp), 
-                                            DC2020pw$NDIQuint_nonimp == "9-NDI not avail", NA),
-                                         c(levels(DC2020pw$NDIQuint_nonimp)[-6], NA))
-
-ggplot2::ggplot() + 
-  ggplot2::geom_sf(data = DC2020pw, 
-                   ggplot2::aes(fill = NDIQuintNA_nonimp),
-                   size = 0.2,
-                   color = "white") +
-  ggplot2::theme_minimal() + 
-  ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
-                                na.value = "grey80") +
-  ggplot2::labs(fill = "Index (Categorical)",
-                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
-  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Quintiles, Non-Imputed",
-                   subtitle = "DC census tracts as the referent")
-
-### Rename "9-NDI not avail" level as NA for plotting
-DC2020pw$NDIQuintNA_imp <- factor(replace(as.character(DC2020pw$NDIQuint_imp), 
-                                            DC2020pw$NDIQuint_imp == "9-NDI not avail", NA),
-                                      c(levels(DC2020pw$NDIQuint_imp)[-6], NA))
-
-ggplot2::ggplot() + 
-  ggplot2::geom_sf(data = DC2020pw, 
-                   ggplot2::aes(fill = NDIQuintNA_imp),
-                   size = 0.2,
-                   color = "white") +
-  ggplot2::theme_minimal() + 
-  ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
-                                na.value = "grey80") +
-  ggplot2::labs(fill = "Index (Categorical)",
-                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
-  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Quintiles, Imputed",
-                   subtitle = "DC census tracts as the referent")
-

+
powell_wiley2020DC <- powell_wiley(state = 'DC', year = 2020) # without imputation
+powell_wiley2020DCi <- powell_wiley(state = 'DC', year = 2020, imp = TRUE) # with imputation
+
+table(is.na(powell_wiley2020DC$ndi$NDI)) # n=13 tracts without NDI (Powell-Wiley) values
+table(is.na(powell_wiley2020DCi$ndi$NDI)) # n=0 tracts without NDI (Powell-Wiley) values
+
+# Obtain the 2020 census tracts from the 'tigris' package
+tract2020DC <- tracts(state = 'DC', year = 2020, cb = TRUE)
+
+# Join the NDI (Powell-Wiley) values to the census tract geometry
+DC2020pw <- tract2020DC %>%
+  left_join(powell_wiley2020DC$ndi, by = 'GEOID')
+DC2020pw <- DC2020pw %>%
+  left_join(powell_wiley2020DCi$ndi, by = 'GEOID', suffix = c('_nonimp', '_imp'))
+
# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for 
+## Washington, D.C., census tracts
+## Continuous Index
+ggplot() +
+  geom_sf(
+    data = DC2020pw,
+    aes(fill = NDI_nonimp),
+    size = 0.2,
+    color = 'white'
+  ) +
+  theme_minimal() +
+  scale_fill_viridis_c() +
+  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2016-2020 estimates') +
+  ggtitle(
+    'Neighborhood Deprivation Index (Powell-Wiley), Non-Imputed',
+    subtitle = 'DC census tracts as the referent'
+  )
+
+ggplot() +
+  geom_sf(
+    data = DC2020pw,
+    aes(fill = NDI_imp),
+    size = 0.2,
+    color = 'white'
+  ) +
+  theme_minimal() +
+  scale_fill_viridis_c() +
+  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2016-2020 estimates') +
+  ggtitle(
+    'Neighborhood Deprivation Index (Powell-Wiley), Imputed',
+    subtitle = 'DC census tracts as the referent'
+  )
+
+## Categorical Index
+### Rename '9-NDI not avail' level as NA for plotting
+DC2020pw$NDIQuintNA_nonimp <-
+  factor(
+    replace(
+      as.character(DC2020pw$NDIQuint_nonimp),
+      DC2020pw$NDIQuint_nonimp == '9-NDI not avail',
+      NA
+    ),
+    c(levels(DC2020pw$NDIQuint_nonimp)[-6], NA)
+  )
+
+ggplot() +
+  geom_sf(
+    data = DC2020pw,
+    aes(fill = NDIQuintNA_nonimp),
+    size = 0.2,
+    color = 'white'
+  ) +
+  theme_minimal() +
+  scale_fill_viridis_d(guide = guide_legend(reverse = TRUE), na.value = 'grey80') +
+  labs(fill = 'Index (Categorical)', caption = 'Source: U.S. Census ACS 2016-2020 estimates') +
+  ggtitle(
+    'Neighborhood Deprivation Index (Powell-Wiley) Quintiles, Non-Imputed',
+    subtitle = 'DC census tracts as the referent'
+  )
+
+### Rename '9-NDI not avail' level as NA for plotting
+DC2020pw$NDIQuintNA_imp <-
+  factor(
+    replace(
+      as.character(DC2020pw$NDIQuint_imp),
+      DC2020pw$NDIQuint_imp == '9-NDI not avail',
+      NA
+    ),
+    c(levels(DC2020pw$NDIQuint_imp)[-6], NA)
+  )
+
+ggplot() +
+  geom_sf(
+    data = DC2020pw,
+    aes(fill = NDIQuintNA_imp),
+    size = 0.2,
+    color = 'white'
+  ) +
+  theme_minimal() +
+  scale_fill_viridis_d(guide = guide_legend(reverse = TRUE), na.value = 'grey80') +
+  labs(fill = 'Index (Categorical)', caption = 'Source: U.S. Census ACS 2016-2020 estimates') +
+  ggtitle(
+    'Neighborhood Deprivation Index (Powell-Wiley) Quintiles, Imputed',
+    subtitle = 'DC census tracts as the referent'
+  )
+
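The recoding step used before the categorical maps can be tried on toy data without downloading anything. The following is a minimal sketch: the level names `Q1` to `Q5` and the vector `toy_quint` are placeholders invented for illustration, and only the `'9-NDI not avail'` label comes from the vignette code above.

```r
# Toy factor with six levels; only '9-NDI not avail' mirrors the vignette,
# the 'Q1'..'Q5' labels are hypothetical stand-ins for the quintile levels
toy_levels <- c('Q1', 'Q2', 'Q3', 'Q4', 'Q5', '9-NDI not avail')
toy_quint <- factor(c('Q1', '9-NDI not avail', 'Q5', 'Q3'), levels = toy_levels)

# Same pattern as the vignette: swap the sentinel label for NA, then rebuild
# the factor without the sixth level so the legend only shows real quintiles
toy_quint_na <- factor(
  replace(as.character(toy_quint), toy_quint == '9-NDI not avail', NA),
  c(levels(toy_quint)[-6], NA)
)

levels(toy_quint_na) # the sentinel level is gone
is.na(toy_quint_na)  # the second element is now missing
```

Because `factor()` excludes `NA` from its levels by default, the trailing `NA` in the levels vector has no effect on the result; the recoded values are drawn with `na.value = 'grey80'` in the plots above.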

Assign the referent (U.S.-Standardized Metric)

@@ -1029,25 +1106,30 @@

Assign the referent (U.S.-Standardized Metric)

NDI (Powell-Wiley) values (2006-2010 ACS-5) found in Andrews et al. (2020) change the year argument to 2010 (i.e., year = 2010).

-
-us <- tigris::states()
-n51 <- c("Commonwealth of the Northern Mariana Islands", "Guam", "American Samoa",
-         "Puerto Rico", "United States Virgin Islands")
-y51 <- us$STUSPS[!(us$NAME %in% n51)]
-
-start_time <- Sys.time() # record start time
-powell_wiley2017US <- ndi::powell_wiley(state = y51, year = 2017)
-end_time <- Sys.time() # record end time
-time_srr <- end_time - start_time # Calculate run time
-
-ggplot2::ggplot(powell_wiley2017US$ndi,
-                ggplot2::aes(x = NDI)) +
-  ggplot2::geom_histogram(color = "black",
-                          fill = "white") + 
-  ggplot2::theme_minimal() +
-  ggplot2::ggtitle("Histogram of US-standardized NDI (Powell-Wiley) values (2013-2017)",
-                   subtitle = "U.S. census tracts as the referent (including AK, HI, and DC)")
+
+us <- states()
+n51 <- c(
+  'Commonwealth of the Northern Mariana Islands',
+  'Guam',
+  'American Samoa',
+  'Puerto Rico',
+  'United States Virgin Islands'
+)
+y51 <- us$STUSPS[!(us$NAME %in% n51)]
+
+start_time <- Sys.time() # record start time
+powell_wiley2017US <- powell_wiley(state = y51, year = 2017)
+end_time <- Sys.time() # record end time
+time_srr <- end_time - start_time # Calculate run time
+
+ggplot(powell_wiley2017US$ndi, aes(x = NDI)) +
+  geom_histogram(color = 'black', fill = 'white') +
+  theme_minimal() +
+  ggtitle(
+    'Histogram of US-standardized NDI (Powell-Wiley) values (2013-2017)',
+    subtitle = 'U.S. census tracts as the referent (including AK, HI, and DC)'
+  )

The process to compute a US-standardized NDI (Powell-Wiley) took
-about 2.5 minutes to run on a machine with the features listed at the
+about 2.7 minutes to run on a machine with the features listed at the
end of the vignette.
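The run time quoted here comes from subtracting two `Sys.time()` values, which lets R choose the units (seconds, minutes, or hours) on its own. A minimal sketch of the same timing pattern with the units pinned to minutes is shown below; the workload is a placeholder and not a call to `powell_wiley()`.

```r
start_time <- Sys.time() # record start time

# Placeholder workload standing in for the long-running computation
invisible(sum(sqrt(seq_len(1e6))))

end_time <- Sys.time() # record end time

# difftime() with explicit units avoids R silently switching between
# seconds and minutes depending on how long the run took
elapsed <- difftime(end_time, start_time, units = 'mins')
elapsed_mins <- as.numeric(elapsed)
```

Reporting `elapsed_mins` keeps a figure such as "about 2.7 minutes" comparable from one run to the next.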

@@ -1087,6 +1169,9 @@

Additional metrics socio-economic deprivation and disparity

  • bemanian_beyer() function that computes the aspatial racial/ethnic Local Exposure and Isolation metric based on Bemanian & Beyer (2017)
+  • hoover() function that computes the aspatial
+racial/ethnic Delta based on Hoover (1941) and
+Duncan et al. (1961; LC:60007089)
  • Compute Racial Isolation Index (RI)

    @@ -1223,34 +1308,40 @@

    Compute Racial Isolation Index (RI)

    neighbors) that is nearly none of its population who identify with the specified race/ethnicity subgroup(s) (e.g., not Not Hispanic or Latino, Black or African American alone) will have an RI value close to 0.

    -
    -anthopolos2010NC <- ndi::anthopolos(state = "NC", year = 2010, subgroup = "NHoLB")
    -
    -# Obtain the 2010 census tracts from the "tigris" package
    -tract2010NC <- tigris::tracts(state = "NC", year = 2010, cb = TRUE)
    -# Remove first 9 characters from GEOID for compatibility with tigris information
    -tract2010NC$GEOID <- substring(tract2010NC$GEO_ID, 10) 
    -
    -# Obtain the 2010 counties from the "tigris" package
    -county2010NC <- tigris::counties(state = "NC", year = 2010, cb = TRUE)
    -
    -# Join the RI values to the census tract geometry
    -NC2010anthopolos <- dplyr::left_join(tract2010NC, anthopolos2010NC$ri, by = "GEOID")
    -
    -# Visualize the RI values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts
    -ggplot2::ggplot() + 
    -  ggplot2::geom_sf(data = NC2010anthopolos, 
    -                   ggplot2::aes(fill = RI),
    -                   size = 0.05,
    -                   color = "transparent") +
    -   ggplot2::geom_sf(data = county2010NC,
    -                   fill = "transparent", 
    -                   color = "white",
    -                   size = 0.2) +
    -  ggplot2::theme_minimal() +
    -  ggplot2::scale_fill_viridis_c() +
    -  ggplot2::labs(fill = "Index (Continuous)",
    -                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
    -  ggplot2::ggtitle("Racial Isolation Index (Anthopolos), non-Hispanic Black",
    -                   subtitle = "NC census tracts (not corrected for edge effects)")
    +
    +anthopolos2010NC <- anthopolos(state = 'NC', year = 2010, subgroup = 'NHoLB')
    +
    +# Obtain the 2010 census tracts from the 'tigris' package
    +tract2010NC <- tracts(state = 'NC', year = 2010, cb = TRUE)
    +# Remove first 9 characters from GEOID for compatibility with tigris information
    +tract2010NC$GEOID <- substring(tract2010NC$GEO_ID, 10) 
    +
    +# Obtain the 2010 counties from the 'tigris' package
    +county2010NC <- counties(state = 'NC', year = 2010, cb = TRUE)
    +
    +# Join the RI values to the census tract geometry
    +NC2010anthopolos <- tract2010NC %>%
    +  left_join(anthopolos2010NC$ri, by = 'GEOID')
    +
    +# Visualize the RI values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts
    +ggplot() +
    +  geom_sf(
    +    data = NC2010anthopolos,
    +    aes(fill = RI),
    +    size = 0.05,
    +    color = 'transparent'
    +  ) +
    +  geom_sf(
    +    data = county2010NC,
    +    fill = 'transparent',
    +    color = 'white',
    +    size = 0.2
    +  ) +
    +  theme_minimal() +
    +  scale_fill_viridis_c() +
    +  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
    +  ggtitle(
    +    'Racial Isolation Index (Anthopolos), non-Hispanic Black',
    +    subtitle = 'NC census tracts (not corrected for edge effects)'
    +  )

    The current version of the ndi package does not correct for edge effects (e.g., census geographies along the specified spatial
    @@ -1260,45 +1351,55 @@

    Compute Racial Isolation Index (RI)

    of edge effect is to compute the RI for neighboring census geographies (i.e., the states bordering a study area of interest) and then use the estimates of the study area of interest.

    -
    -# Compute RI for all census tracts in neighboring states
    -anthopolos2010GNSTV <- ndi::anthopolos(state = c("GA", "NC", "SC", "TN", "VA"),
    -                                     year = 2010, subgroup = "NHoLB")
    -
    -# Crop to only North Carolina, U.S.A. census tracts
    -anthopolos2010NCe <- anthopolos2010GNSTV$ri[anthopolos2010GNSTV$ri$GEOID %in% anthopolos2010NC$ri$GEOID, ]
    -
    -# Obtain the 2010 census tracts from the "tigris" package
    -tract2010NC <- tigris::tracts(state = "NC", year = 2010, cb = TRUE)
    -# Remove first 9 characters from GEOID for compatibility with tigris information
    -tract2010NC$GEOID <- substring(tract2010NC$GEO_ID, 10) 
    -
    -# Obtain the 2010 counties from the "tigris" package
    -county2010NC <- tigris::counties(state = "NC", year = 2010, cb = TRUE)
    -
    -# Join the RI values to the census tract geometry
    -edgeNC2010anthopolos <- dplyr::left_join(tract2010NC, anthopolos2010NCe, by = "GEOID")
    -
    -# Visualize the RI values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts
    -ggplot2::ggplot() + 
    -  ggplot2::geom_sf(data = edgeNC2010anthopolos, 
    -                   ggplot2::aes(fill = RI),
    -                   size = 0.05,
    -                   color = "transparent") +
    -   ggplot2::geom_sf(data = county2010NC,
    -                   fill = "transparent", 
    -                   color = "white",
    -                   size = 0.2) +
    -  ggplot2::theme_minimal() +
    -  ggplot2::scale_fill_viridis_c() +
    -  ggplot2::labs(fill = "Index (Continuous)",
    -                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
    -  ggplot2::ggtitle("Racial Isolation Index (Anthopolos), non-Hispanic Black",
    -                   subtitle = "NC census tracts (corrected for interstate edge effects)")
    +
    +# Compute RI for all census tracts in neighboring states
    +anthopolos2010GNSTV <- anthopolos(
    +  state = c('GA', 'NC', 'SC', 'TN', 'VA'),
    +  year = 2010,
    +  subgroup = 'NHoLB'
    +)
    +
    +# Crop to only North Carolina, U.S.A. census tracts
    +anthopolos2010NCe <- anthopolos2010GNSTV$ri[anthopolos2010GNSTV$ri$GEOID %in% 
    +                                              anthopolos2010NC$ri$GEOID, ]
    +
    +# Obtain the 2010 census tracts from the 'tigris' package
    +tract2010NC <- tracts(state = 'NC', year = 2010, cb = TRUE)
    +# Remove first 9 characters from GEOID for compatibility with tigris information
    +tract2010NC$GEOID <- substring(tract2010NC$GEO_ID, 10) 
    +
    +# Obtain the 2010 counties from the 'tigris' package
    +county2010NC <- counties(state = 'NC', year = 2010, cb = TRUE)
    +
    +# Join the RI values to the census tract geometry
    +edgeNC2010anthopolos <- tract2010NC %>% 
    +  left_join(anthopolos2010NCe, by = 'GEOID')
    +
    +# Visualize the RI values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts
    +ggplot() +
    +  geom_sf(
    +    data = edgeNC2010anthopolos,
    +    aes(fill = RI),
    +    size = 0.05,
    +    color = 'transparent'
    +  ) +
    +  geom_sf(
    +    data = county2010NC,
    +    fill = 'transparent',
    +    color = 'white',
    +    size = 0.2
    +  ) +
    +  theme_minimal() +
    +  scale_fill_viridis_c() +
    +  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
    +  ggtitle(
    +    'Racial Isolation Index (Anthopolos), non-Hispanic Black',
    +    subtitle = 'NC census tracts (corrected for interstate edge effects)'
    +  )
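The cropping step in the edge-effect workflow is a plain row filter on `GEOID`. A minimal sketch on toy data is shown below; the GEOIDs and RI values are made up for illustration and the data frames `ri_all` and `ri_nc_only` are hypothetical stand-ins for `anthopolos2010GNSTV$ri` and `anthopolos2010NC$ri`.

```r
# Hypothetical multi-state result: two NC tracts (prefix 37) plus
# one SC tract (prefix 45) and one VA tract (prefix 51)
ri_all <- data.frame(
  GEOID = c('37001020100', '37001020200', '45001950100', '51001090100'),
  RI = c(0.10, 0.25, 0.40, 0.05),
  stringsAsFactors = FALSE
)

# Hypothetical single-state result for the study area only
ri_nc_only <- data.frame(
  GEOID = c('37001020100', '37001020200'),
  RI = c(0.08, 0.20),
  stringsAsFactors = FALSE
)

# Keep the multi-state estimates, but only for tracts inside the study area
ri_nc_edge <- ri_all[ri_all$GEOID %in% ri_nc_only$GEOID, ]
ri_nc_edge
```

The retained rows carry the RI values computed with out-of-state neighbors included, which is the point of the correction: the same tracts, but estimates that no longer treat the state line as the edge of the population.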

    Compute Educational Isolation Index (EI)

    Compute the spatial EI (Bravo) values (2006-2010 5-year ACS) for
    -North Carolina, U.S.A., census tracts. This metric is based on Bravo et al. (2021)
    +Oklahoma, U.S.A., census tracts. This metric is based on Bravo et al. (2021)
    that assessed the educational isolation of the population without a four-year college degree. Multiple educational attainment categories are available in the bravo() function, including:

    @@ -1343,7 +1444,7 @@

    Compute Educational Isolation Index (EI)

    -Note: The ACS-5 data (2005-2009) uses the “B15002” question.
    +Note: The ACS-5 data (2005-2009) uses the ‘B15002’ question.

    A census geography (and its neighbors) that has nearly all of its population with the specified educational attainment category (e.g., a four-year college degree or more) will have an EI (Bravo) value close to
    @@ -1351,45 +1452,51 @@

    Compute Educational Isolation Index (EI)

    none of its population with the specified educational attainment category (e.g., with a four-year college degree) will have an EI (Bravo) value close to 0.

    -
    -bravo2010NC <- ndi::bravo(state = "NC", year = 2010, subgroup = c("LtHS", "HSGiE", "SCoAD"))
    -
    -# Obtain the 2010 census tracts from the "tigris" package
    -tract2010NC <- tigris::tracts(state = "NC", year = 2010, cb = TRUE)
    -# Remove first 9 characters from GEOID for compatibility with tigris information
    -tract2010NC$GEOID <- substring(tract2010NC$GEO_ID, 10) 
    -
    -# Obtain the 2010 counties from the "tigris" package
    -county2010NC <- tigris::counties(state = "NC", year = 2010, cb = TRUE)
    -
    -# Join the EI (Bravo) values to the census tract geometry
    -NC2010bravo <- dplyr::left_join(tract2010NC, bravo2010NC$ei, by = "GEOID")
    -
    -# Visualize the EI (Bravo) values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts
    -ggplot2::ggplot() + 
    -  ggplot2::geom_sf(data = NC2010bravo, 
    -                   ggplot2::aes(fill = EI),
    -                   size = 0.05,
    -                   color = "transparent") +
    -   ggplot2::geom_sf(data = county2010NC,
    -                   fill = "transparent", 
    -                   color = "white",
    -                   size = 0.2) +
    -  ggplot2::theme_minimal() +
    -  ggplot2::scale_fill_viridis_c() +
    -  ggplot2::labs(fill = "Index (Continuous)",
    -                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
    -  ggplot2::ggtitle("Educational Isolation Index (Bravo), without a four-year college degree",
    -                   subtitle = "NC census tracts (not corrected for edge effects)")
    -

    +
    +bravo2010OK <- bravo(state = 'OK', year = 2010, subgroup = c('LtHS', 'HSGiE', 'SCoAD'))
    +
    +# Obtain the 2010 census tracts from the 'tigris' package
    +tract2010OK <- tracts(state = 'OK', year = 2010, cb = TRUE)
    +# Remove first 9 characters from GEOID for compatibility with tigris information
    +tract2010OK$GEOID <- substring(tract2010OK$GEO_ID, 10) 
    +
    +# Obtain the 2010 counties from the 'tigris' package
    +county2010OK <- counties(state = 'OK', year = 2010, cb = TRUE)
    +
    +# Join the EI (Bravo) values to the census tract geometry
    +OK2010bravo <- tract2010OK %>%
    +  left_join(bravo2010OK$ei, by = 'GEOID')
    +
    +# Visualize the EI (Bravo) values (2006-2010 5-year ACS) for Oklahoma, U.S.A., census tracts
    +ggplot() +
    +  geom_sf(
    +    data = OK2010bravo,
    +    aes(fill = EI),
    +    size = 0.05,
    +    color = 'transparent'
    +  ) +
    +  geom_sf(
    +    data = county2010OK,
    +    fill = 'transparent',
    +    color = 'white',
    +    size = 0.2
    +  ) +
    +  theme_minimal() +
    +  scale_fill_viridis_c(limits = c(0, 1)) +
    +  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
    +  ggtitle(
    +    'Educational Isolation Index (Bravo), without a four-year college degree',
    +    subtitle = 'OK census tracts (not corrected for edge effects)'
    +  )
    +

    One source of edge effect can be corrected in the same manner as shown for the RI metric.

    Retrieve the Gini Index

    Retrieve the aspatial Gini Index values (2006-2010 5-year ACS) for
    -North Carolina, U.S.A., census tracts. This metric is based on Gini (1921), and the
    +Massachusetts, U.S.A., census tracts. This metric is based on Gini (1921), and the
    gini() function retrieves the estimate from the ACS-5.

    According to the U.S.
    -Census Bureau: “The Gini Index is a summary measure of income
    +Census Bureau: ‘The Gini Index is a summary measure of income
    inequality. The Gini coefficient incorporates the detailed shares data into a single statistic, which summarizes the dispersion of income across the entire income distribution. The Gini coefficient ranges from
    @@ -1397,36 +1504,39 @@

    Retrieve the Gini Index

    to 1, perfect inequality (where only one recipient or group of recipients receives all the income). The Gini is based on the difference between the Lorenz curve (the observed cumulative income distribution)
    -and the notion of a perfectly equal income distribution.”

    -
    -gini2010NC <- ndi::gini(state = "NC", year = 2010)
    -
    -# Obtain the 2010 census tracts from the "tigris" package
    -tract2010NC <- tigris::tracts(state = "NC", year = 2010, cb = TRUE)
    -# Remove first 9 characters from GEOID for compatibility with tigris information
    -tract2010NC$GEOID <- substring(tract2010NC$GEO_ID, 10) 
    -
    -# Obtain the 2010 counties from the "tigris" package
    -county2010NC <- tigris::counties(state = "NC", year = 2010, cb = TRUE)
    -
    -# Join the Gini Index values to the census tract geometry
    -NC2010gini <- dplyr::left_join(tract2010NC, gini2010NC$gini, by = "GEOID")
    -
    -# Visualize the Gini Index values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts
    -ggplot2::ggplot() + 
    -  ggplot2::geom_sf(data = NC2010gini, 
    -                   ggplot2::aes(fill = gini),
    -                   size = 0.05,
    -                   color = "transparent") +
    -   ggplot2::geom_sf(data = county2010NC,
    -                   fill = "transparent", 
    -                   color = "white",
    -                   size = 0.2) +
    -  ggplot2::theme_minimal() +
    -  ggplot2::scale_fill_viridis_c() +
    -  ggplot2::labs(fill = "Index (Continuous)",
    -                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
    -  ggplot2::ggtitle("Gini Index",
    -                   subtitle = "NC census tracts")
    -

    +and the notion of a perfectly equal income distribution.’

    +
    +gini2010MA <- gini(state = 'MA', year = 2010)
    +
    +# Obtain the 2010 census tracts from the 'tigris' package
    +tract2010MA <- tracts(state = 'MA', year = 2010, cb = TRUE)
    +# Remove first 9 characters from GEOID for compatibility with tigris information
    +tract2010MA$GEOID <- substring(tract2010MA$GEO_ID, 10) 
    +
    +# Obtain the 2010 counties from the 'tigris' package
    +county2010MA <- counties(state = 'MA', year = 2010, cb = TRUE)
    +
    +# Join the Gini Index values to the census tract geometry
    +MA2010gini <- tract2010MA %>%
    +  left_join(gini2010MA$gini, by = 'GEOID')
    +
    +# Visualize the Gini Index values (2006-2010 5-year ACS) for Massachusetts, U.S.A., census tracts
    +ggplot() +
    +  geom_sf(
    +    data = MA2010gini,
    +    aes(fill = gini),
    +    size = 0.05,
    +    color = 'transparent'
    +  ) +
    +  geom_sf(
    +    data = county2010MA,
    +    fill = 'transparent',
    +    color = 'white',
    +    size = 0.2
    +  ) +
    +  theme_minimal() +
    +  scale_fill_viridis_c() +
    +  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
    +  ggtitle('Gini Index', subtitle = 'MA census tracts')
    +
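The `gini()` function only retrieves the estimate published in the ACS-5; it does not compute the coefficient from income records. To make the Census Bureau definition quoted above concrete, the following is a minimal sketch of the textbook Lorenz-curve calculation on a toy income vector. The helper `gini_sketch()` is hypothetical, is not part of the ndi package, and is not the Census Bureau's estimation procedure.

```r
# Gini coefficient of a non-negative income vector using the standard
# rank-weighted formula: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
gini_sketch <- function(income) {
  x <- sort(income) # the Lorenz curve is built from incomes in ascending order
  n <- length(x)
  (2 * sum(seq_len(n) * x)) / (n * sum(x)) - (n + 1) / n
}

gini_sketch(c(1, 1, 1, 1)) # everyone receives an equal share: 0
gini_sketch(c(0, 0, 0, 1)) # one recipient receives everything: 0.75
```

With only four recipients, the maximum of this discrete formula is (n - 1) / n = 0.75; it approaches 1 as the number of recipients grows, matching the 'perfect inequality' end of the range described above.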

    @@ -1457,28 +1567,28 @@

    Index of Concentration at the Extremes (ICE)

    B19001
    -Income, “ICE_inc”
    +Income, ‘ICE_inc’
    80th income percentile vs. 20th income percentile
    B15002
    -Education, “ICE_edu”
    +Education, ‘ICE_edu’
    less than high school vs. four-year college degree or more
    B03002
    -Race/Ethnicity, “ICE_rewb”
    +Race/Ethnicity, ‘ICE_rewb’
    white non-Hispanic vs. black non-Hispanic
    B19001 & B19001B & B19001H
    -Income and race/ethnicity combined, “ICE_wbinc”
    +Income and race/ethnicity combined, ‘ICE_wbinc’
    white non-Hispanic in 80th income percentile vs. black alone (including Hispanic) in 20th income percentile
    B19001 & B19001H
    -Income and race/ethnicity combined, “ICE_wpcinc”
    +Income and race/ethnicity combined, ‘ICE_wpcinc’
    white non-Hispanic in 80th income percentile vs. white non-Hispanic in 20th income percentile
    @@ -1490,81 +1600,123 @@

    Index of Concentration at the Extremes (ICE)

    or (2) an equal number of persons are in the most privileged and most deprived categories, and in both cases indicates that the area is not dominated by extreme concentrations of either of the two groups.

    -
    -ice2020WC <- krieger(state = "MI", county = "Wayne", year = 2010)
    -
    -# Obtain the 2010 census tracts from the "tigris" package
    -tract2010WC <- tigris::tracts(state = "MI", county = "Wayne", year = 2010, cb = TRUE)
    -# Remove first 9 characters from GEOID for compatibility with tigris information
    -tract2010WC$GEOID <- substring(tract2010WC$GEO_ID, 10) 
    -
    -# Join the ICE values to the census tract geometry
    -ice2020WC <- dplyr::left_join(tract2010WC, ice2020WC$ice, by = "GEOID")
    -
    -# Plot ICE for Income
    -ggplot2::ggplot() + 
    -  ggplot2::geom_sf(data = ice2020WC, 
    -                   ggplot2::aes(fill = ICE_inc),
    -                   color = "white",
    -                   size = 0.05) +
    -  ggplot2::theme_bw() + 
    -  ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1,1)) +
    -  ggplot2::labs(fill = "Index (Continuous)",
    -                caption = "Source: U.S. Census ACS 2006-2010 estimates")+
    -  ggplot2::ggtitle("Index of Concentration at the Extremes\nIncome (Krieger)",
    -                   subtitle = "80th income percentile vs. 20th income percentile")
    -
    -# Plot ICE for Education
    -ggplot2::ggplot() + 
    -  ggplot2::geom_sf(data = ice2020WC, 
    -                   ggplot2::aes(fill = ICE_edu),
    -                   color = "white",
    -                   size = 0.05) +
    -  ggplot2::theme_bw() + 
    -  ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1,1)) +
    -  ggplot2::labs(fill = "Index (Continuous)",
    -                caption = "Source: U.S. Census ACS 2006-2010 estimates")+
    -  ggplot2::ggtitle("Index of Concentration at the Extremes\nEducation (Krieger)",
    -                   subtitle = "less than high school vs. four-year college degree or more")
    -
    -# Plot ICE for Race/Ethnicity
    -ggplot2::ggplot() + 
    -  ggplot2::geom_sf(data = ice2020WC, 
    -                   ggplot2::aes(fill = ICE_rewb),
    -                   color = "white",
    -                   size = 0.05) +
    -  ggplot2::theme_bw() + 
    -  ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1, 1)) +
    -  ggplot2::labs(fill = "Index (Continuous)",
    -                caption = "Source: U.S. Census ACS 2006-2010 estimates")+
    -  ggplot2::ggtitle("Index of Concentration at the Extremes\nRace/Ethnicity (Krieger)",
    -                   subtitle = "white non-Hispanic vs. black non-Hispanic")
    -
    -# Plot ICE for Income and Race/Ethnicity Combined
    -## white non-Hispanic in 80th income percentile vs. black (including Hispanic) in 20th income percentile
    -ggplot2::ggplot() + 
    -  ggplot2::geom_sf(data = ice2020WC, 
    -                   ggplot2::aes(fill = ICE_wbinc),
    -                   color = "white",
    -                   size = 0.05) +
    -  ggplot2::theme_bw() + 
    -  ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1, 1)) +
    -  ggplot2::labs(fill = "Index (Continuous)",
    -                caption = "Source: U.S. Census ACS 2006-2010 estimates")+
    -  ggplot2::ggtitle("Index of Concentration at the Extremes\nIncome & race/ethnicity combined (Krieger)",
    -                   subtitle = "white non-Hispanic in 80th inc ptcl vs. black alone in 20th inc pctl")
    -
    -# Plot ICE for Income and Race/Ethnicity Combined
    -## white non-Hispanic in 80th income percentile vs. white non-Hispanic in 20th income percentile
    -ggplot2::ggplot() + 
    -  ggplot2::geom_sf(data = ice2020WC, 
    -                   ggplot2::aes(fill = ICE_wpcinc),
    -                   color = "white",
    -                   size = 0.05) +
    -  ggplot2::theme_bw() + 
    -  ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340", limits = c(-1, 1)) +
    -  ggplot2::labs(fill = "Index (Continuous)",
    -                caption = "Source: U.S. Census ACS 2006-2010 estimates")+
    -  ggplot2::ggtitle("Index of Concentration at the Extremes\nIncome & race/ethnicity combined (Krieger)",
    -                   subtitle = "white non-Hispanic (WNH) in 80th inc pctl vs. WNH in 20th inc pctl")
    +
    +ice2020WC <- krieger(state = 'MI', county = 'Wayne', year = 2010)
    +
    +# Obtain the 2010 census tracts from the 'tigris' package
    +tract2010WC <- tracts(state = 'MI', county = 'Wayne', year = 2010, cb = TRUE)
    +# Remove first 9 characters from GEOID for compatibility with tigris information
    +tract2010WC$GEOID <- substring(tract2010WC$GEO_ID, 10) 
    +
    +# Join the ICE values to the census tract geometry
    +ice2020WC <- tract2010WC %>%
    +  left_join(ice2020WC$ice, by = 'GEOID')
    +
    +# Plot ICE for Income
    +ggplot() +
    +  geom_sf(
    +    data = ice2020WC,
    +    aes(fill = ICE_inc),
    +    color = 'white',
    +    size = 0.05
    +  ) +
    +  theme_bw() +
    +  scale_fill_gradient2(
    +    low = '#998ec3',
    +    mid = '#f7f7f7',
    +    high = '#f1a340',
    +    limits = c(-1, 1)
    +  ) +
    +  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
    +  ggtitle(
    +    'Index of Concentration at the Extremes\nIncome (Krieger)',
    +    subtitle = '80th income percentile vs. 20th income percentile'
    +  )
    +
    +# Plot ICE for Education
    +ggplot() +
    +  geom_sf(
    +    data = ice2020WC,
    +    aes(fill = ICE_edu),
    +    color = 'white',
    +    size = 0.05
    +  ) +
    +  theme_bw() +
    +  scale_fill_gradient2(
    +    low = '#998ec3',
    +    mid = '#f7f7f7',
    +    high = '#f1a340',
    +    limits = c(-1, 1)
    +  ) +
    +  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
    +  ggtitle(
    +    'Index of Concentration at the Extremes\nEducation (Krieger)',
    +    subtitle = 'less than high school vs. four-year college degree or more'
    +  )
    +
    +# Plot ICE for Race/Ethnicity
    +ggplot() +
    +  geom_sf(
    +    data = ice2020WC,
    +    aes(fill = ICE_rewb),
    +    color = 'white',
    +    size = 0.05
    +  ) +
    +  theme_bw() +
    +  scale_fill_gradient2(
    +    low = '#998ec3',
    +    mid = '#f7f7f7',
    +    high = '#f1a340',
    +    limits = c(-1, 1)
    +  ) +
    +  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
    +  ggtitle(
    +    'Index of Concentration at the Extremes\nRace/Ethnicity (Krieger)',
    +    subtitle = 'white non-Hispanic vs. black non-Hispanic'
    +  )
    +
    +# Plot ICE for Income and Race/Ethnicity Combined
    +## white non-Hispanic in 80th income percentile vs. 
    +## black (including Hispanic) in 20th income percentile
    +ggplot() +
    +  geom_sf(
    +    data = ice2020WC,
    +    aes(fill = ICE_wbinc),
    +    color = 'white',
    +    size = 0.05
    +  ) +
    +  theme_bw() +
    +  scale_fill_gradient2(
    +    low = '#998ec3',
    +    mid = '#f7f7f7',
    +    high = '#f1a340',
    +    limits = c(-1, 1)
    +  ) +
    +  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
    +  ggtitle(
    +    'Index of Concentration at the Extremes\nIncome & race/ethnicity combined (Krieger)',
    +    subtitle = 'white non-Hispanic in 80th inc pctl vs. black alone in 20th inc pctl'
    +  )
    +
    +# Plot ICE for Income and Race/Ethnicity Combined
    +## white non-Hispanic in 80th income percentile vs. white non-Hispanic in 20th income percentile
    +ggplot() +
    +  geom_sf(
    +    data = ice2020WC,
    +    aes(fill = ICE_wpcinc),
    +    color = 'white',
    +    size = 0.05
    +  ) +
    +  theme_bw() +
    +  scale_fill_gradient2(
    +    low = '#998ec3',
    +    mid = '#f7f7f7',
    +    high = '#f1a340',
    +    limits = c(-1, 1)
    +  ) +
    +  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
    +  ggtitle(
    +    'Index of Concentration at the Extremes\nIncome & race/ethnicity combined (Krieger)',
    +    subtitle = 'white non-Hispanic (WNH) in 80th inc pctl vs. WNH in 20th inc pctl'
    +  )
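The five ICE maps share one underlying calculation: the count in the most privileged category minus the count in the most deprived category, divided by the total population, which is why every fill scale is limited to -1 and 1. A minimal sketch with toy counts is shown below; `ice_sketch()` and its argument names are hypothetical and are not the internals of `krieger()`.

```r
# ICE for one area: (privileged - deprived) / total
ice_sketch <- function(privileged, deprived, total) {
  (privileged - deprived) / total
}

ice_sketch(privileged = 100, deprived = 0, total = 100) # all privileged: 1
ice_sketch(privileged = 0, deprived = 100, total = 100) # all deprived: -1
ice_sketch(privileged = 30, deprived = 30, total = 100) # equal extremes: 0
ice_sketch(privileged = 0, deprived = 0, total = 100)   # nobody at either extreme: 0
```

The last two calls are the two situations described above in which a value of 0 arises, which is why a value of 0 alone does not distinguish a mixed area from one with nobody at either extreme.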

    Compute racial/ethnic Dissimilarity Index (DI)

    @@ -1705,32 +1857,44 @@

    Compute racial/ethnic Dissimilarity Index (DI)

    subgroup members that would have to change their area of residence to achieve an even distribution within the larger geographical area under conditions of maximum segregation.

    -
    -duncan2010PA <- ndi::duncan(geo_large = "county", geo_small = "tract", state = "PA",
    -                            year = 2010, subgroup = "NHoLB", subgroup_ref = "NHoLW")
    -
    -# Obtain the 2010 census counties from the "tigris" package
    -county2010PA <- tigris::counties(state = "PA", year = 2010, cb = TRUE)
    -# Remove first 9 characters from GEOID for compatibility with tigris information
    -county2010PA$GEOID <- substring(county2010PA$GEO_ID, 10) 
    -
    -# Join the DI values to the county geometry
    -PA2010duncan <- dplyr::left_join(county2010PA, duncan2010PA$di, by = "GEOID")
    -
    -# Visualize the DI values (2006-2010 5-year ACS) for Pennsylvania, U.S.A., counties
    -ggplot2::ggplot() + 
    -  ggplot2::geom_sf(data = PA2010duncan, 
    -                   ggplot2::aes(fill = DI),
    -                   size = 0.05,
    -                   color = "white") +
    -   ggplot2::geom_sf(data = county2010PA,
    -                    fill = "transparent", 
    -                    color = "white",
    -                    size = 0.2) +
    -  ggplot2::theme_minimal() +
    -  ggplot2::scale_fill_viridis_c(limits = c(0, 1)) +
    -  ggplot2::labs(fill = "Index (Continuous)",
    -                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
    -  ggplot2::ggtitle("Dissimilarity Index (Duncan & Duncan)\nPennsylvania census tracts to counties",
    -                   subtitle = "Black non-Hispanic vs. white non-Hispanic")
    +
    +duncan2010PA <- duncan(
    +  geo_large = 'county',
    +  geo_small = 'tract',
    +  state = 'PA',
    +  year = 2010,
    +  subgroup = 'NHoLB',
    +  subgroup_ref = 'NHoLW'
    +)
    +
    +# Obtain the 2010 census counties from the 'tigris' package
    +county2010PA <- counties(state = 'PA', year = 2010, cb = TRUE)
    +# Remove first 9 characters from GEOID for compatibility with tigris information
    +county2010PA$GEOID <- substring(county2010PA$GEO_ID, 10) 
    +
    +# Join the DI values to the county geometry
    +PA2010duncan <- county2010PA %>%
    +  left_join(duncan2010PA$di, by = 'GEOID')
    +
    +# Visualize the DI values (2006-2010 5-year ACS) for Pennsylvania, U.S.A., counties
    +ggplot() +
    +  geom_sf(
    +    data = PA2010duncan,
    +    aes(fill = DI),
    +    size = 0.05,
    +    color = 'white'
    +  ) +
    +  geom_sf(
    +    data = county2010PA,
    +    fill = 'transparent',
    +    color = 'white',
    +    size = 0.2
    +  ) +
    +  theme_minimal() +
    +  scale_fill_viridis_c(limits = c(0, 1)) +
    +  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2006-2010 estimates') +
    +  ggtitle(
    +    'Dissimilarity Index (Duncan & Duncan)\nPennsylvania census tracts to counties',
    +    subtitle = 'Black non-Hispanic vs. white non-Hispanic'
    +  )
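For intuition about the values mapped here, the classic Duncan & Duncan Dissimilarity Index for one larger geography can be computed from the subgroup counts in its smaller geographies. The following is a minimal sketch on toy counts; `di_sketch()` is a hypothetical helper written for illustration and is not the implementation inside `duncan()`.

```r
# DI for one larger geography (e.g., a county) from counts in its smaller
# geographies (e.g., tracts): half the summed absolute difference between
# each tract's share of the subgroup and its share of the reference subgroup
di_sketch <- function(subgroup, subgroup_ref) {
  0.5 * sum(abs(subgroup / sum(subgroup) - subgroup_ref / sum(subgroup_ref)))
}

di_sketch(c(10, 20, 30), c(20, 40, 60)) # identical distributions: 0
di_sketch(c(10, 0), c(0, 10))           # complete separation: 1
di_sketch(c(30, 10), c(10, 30))         # partial separation: 0.5
```

In the third call, half of the subgroup members would have to move between the two tracts to match the distribution of the reference subgroup, which is the interpretation of DI given above.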

    @@ -1741,8 +1905,8 @@

    Compute aspatial income or racial/ethnic Atkinson Index (AI)

    the distribution of income within 12 counties but has since been adapted to study racial/ethnic segregation (see James & Taeuber 1985). To compare median household income, specify
    -subgroup = "MedHHInc" which will use the ACS-5 variable
    -“B19013_001” in the computation. Multiple racial/ethnic subgroups are
    +subgroup = 'MedHHInc' which will use the ACS-5 variable
    +‘B19013_001’ in the computation. Multiple racial/ethnic subgroups are
    available in the atkinson() function, including:

    @@ -1877,48 +2041,60 @@

    Compute aspatial income or racial/ethnic Atkinson Index (AI)

    (i.e., whether the index should take greater account of differences among areas of over- or under-representation). The epsilon argument must have values between 0 and 1.0. For -0 <= epsilon < 0.5 or less “inequality-averse,” +0 <= epsilon < 0.5 or less ‘inequality-averse,’ smaller geographical units with a subgroup proportion smaller than the subgroup proportion of the larger geographical unit contribute more to -inequality (“over-representation”). For -0.5 < epsilon <= 1.0 or more “inequality-averse,” +inequality (‘over-representation’). For +0.5 < epsilon <= 1.0 or more ‘inequality-averse,’ smaller geographical units with a subgroup proportion larger than the subgroup proportion of the larger geographical unit contribute more to -inequality (“under-representation”). If epsilon = 0.5 (the +inequality (‘under-representation’). If epsilon = 0.5 (the default), units of over- and under-representation contribute equally to the index. See Section 2.3 of Saint-Jacques et al. (2020) for one method to select epsilon. We choose epsilon = 0.67 in the example below:

    -
    atkinson2021KY <- ndi::atkinson(geo_large = "county", geo_small = "block group", state = "KY",
    -                                year = 2021, subgroup = "NHoLB", epsilon = 0.67)
    -
    -# Obtain the 2021 census counties from the "tigris" package
    -county2021KY <- tigris::counties(state = "KY", year = 2021, cb = TRUE)
    -
    -# Join the AI values to the county geometry
    -KY2021atkinson <- dplyr::left_join(county2021KY, atkinson2021KY$ai, by = "GEOID")
    -
    # Visualize the AI values (2017-2021 5-year ACS) for Kentucky, U.S.A., counties
    -ggplot2::ggplot() + 
    -  ggplot2::geom_sf(data = KY2021atkinson, 
    -                   ggplot2::aes(fill = AI),
    -                   size = 0.05,
    -                   color = "white") +
    -   ggplot2::geom_sf(data = county2021KY,
    -                    fill = "transparent", 
    -                    color = "white",
    -                    size = 0.2) +
    -  ggplot2::theme_minimal() +
    -  ggplot2::scale_fill_viridis_c(limits = c(0, 1)) +
    -  ggplot2::labs(fill = "Index (Continuous)",
    -                caption = "Source: U.S. Census ACS 2017-2021 estimates") +
    -  ggplot2::ggtitle("Atkinson Index (Atkinson)\nKentucky census block groups to counties",
    -                   subtitle = expression(paste("Black non-Hispanic (", epsilon, " = 0.67)")))
    +
    atkinson2021KY <- atkinson(
    +  geo_large = 'county',
    +  geo_small = 'block group',
    +  state = 'KY',
    +  year = 2021,
    +  subgroup = 'NHoLB',
    +  epsilon = 0.67
    +)
    +
    +# Obtain the 2021 census counties from the 'tigris' package
    +county2021KY <- counties(state = 'KY', year = 2021, cb = TRUE)
    +
    +# Join the AI values to the county geometry
    +KY2021atkinson <- county2021KY %>% 
    +  left_join(atkinson2021KY$ai, by = 'GEOID')
    +
    # Visualize the AI values (2017-2021 5-year ACS) for Kentucky, U.S.A., counties
    +ggplot() +
    +  geom_sf(
    +    data = KY2021atkinson,
    +    aes(fill = AI),
    +    size = 0.05,
    +    color = 'white'
    +  ) +
    +  geom_sf(
    +    data = county2021KY,
    +    fill = 'transparent',
    +    color = 'white',
    +    size = 0.2
    +  ) +
    +  theme_minimal() +
    +  scale_fill_viridis_c(limits = c(0, 1)) +
    +  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2017-2021 estimates') +
    +  ggtitle(
    +    'Atkinson Index (Atkinson)\nKentucky census block groups to counties',
    +    subtitle = expression(paste('Black non-Hispanic (', epsilon, ' = 0.67)'))
    +  )
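
To make the role of `epsilon` more concrete, the sketch below evaluates a formulation of the racial/ethnic Atkinson Index that is commonly given in the segregation-measurement literature, on hypothetical block-group counts. This is an assumption for illustration: it has not been checked against the internals of `atkinson()`, the names `bg` and `atkinson_sketch()` are made up, and this form is only defined for `epsilon` strictly below 1.

```r
# Hypothetical block-group counts within one county (illustrative values only)
bg <- data.frame(
  subgroup = c(10, 40, 5, 45),
  total    = c(100, 100, 50, 150)
)

# AI = 1 - P/(1-P) * |sum((1-p_i)^(1-e) * p_i^e * t_i) / (P*T)|^(1/(1-e))
# (assumed formulation; valid for 0 <= epsilon < 1)
atkinson_sketch <- function(x, t, epsilon = 0.5) {
  stopifnot(epsilon >= 0, epsilon < 1)
  p_i <- x / t              # subgroup proportion in each small unit
  P <- sum(x) / sum(t)      # subgroup proportion in the large unit
  inner <- sum((1 - p_i)^(1 - epsilon) * p_i^epsilon * t) / (P * sum(t))
  1 - (P / (1 - P)) * abs(inner)^(1 / (1 - epsilon))
}

# Compare the default weighting with the value used in the vignette example
ai_050 <- atkinson_sketch(bg$subgroup, bg$total, epsilon = 0.5)
ai_067 <- atkinson_sketch(bg$subgroup, bg$total, epsilon = 0.67)
c(epsilon_0.50 = ai_050, epsilon_0.67 = ai_067)
```

Under this formulation the index is 0 when every small unit has the same subgroup proportion as the large unit and 1 when the subgroup is completely separated.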

    Compute racial/ethnic Isolation Index (II)

    Compute the aspatial racial/ethnic II values (2017-2021 5-year ACS) -for Kentucky, U.S.A., counties from census block groups. This metric is +for Ohio, U.S.A., counties from census block groups. This metric is based on Shevky & Williams (1949; ISBN-13:978-0-837-15637-8) and adapted by Bell (1954). Multiple racial/ethnic subgroups are available in the @@ -2050,40 +2226,52 @@

    Compute racial/ethnic Isolation Index (II)

    isolation) when comparing smaller geographical areas to larger ones within which the smaller geographical areas are located. II can range in value from 0 to 1.

    -
    bell2021KY <- ndi::bell(geo_large = "county", geo_small = "tract", state = "KY",
    -                                year = 2021, subgroup = "NHoLB", subgroup_ixn = "NHoLW")
    -
    -# Obtain the 2021 census counties from the "tigris" package
    -county2021KY <- tigris::counties(state = "KY", year = 2021, cb = TRUE)
    -
    -# Join the II values to the county geometry
    -KY2021bell <- dplyr::left_join(county2021KY, bell2021KY$ii, by = "GEOID")
    -
    # Visualize the II values (2017-2021 5-year ACS) for Kentucky, U.S.A., counties
    -ggplot2::ggplot() + 
    -  ggplot2::geom_sf(data = KY2021bell, 
    -                   ggplot2::aes(fill = II),
    -                   size = 0.05,
    -                   color = "white") +
    -   ggplot2::geom_sf(data = county2021KY,
    -                    fill = "transparent", 
    -                    color = "white",
    -                    size = 0.2) +
    -  ggplot2::theme_minimal() +
    -  ggplot2::scale_fill_viridis_c(limits = c(0, 1)) +
    -  ggplot2::labs(fill = "Index (Continuous)",
    -                caption = "Source: U.S. Census ACS 2017-2021 estimates") +
    -  ggplot2::ggtitle("Isolation Index (Bell)\nKentucky census tracts to counties",
    -                   subtitle = "Black non-Hispanic vs. white non-Hispanic")
    -

    +
    bell2021OH <- bell(
    +  geo_large = 'county',
    +  geo_small = 'tract',
    +  state = 'OH',
    +  year = 2021,
    +  subgroup = 'NHoLB',
    +  subgroup_ixn = 'NHoLW'
    +)
    +
    +# Obtain the 2021 census counties from the 'tigris' package
    +county2021OH <- counties(state = 'OH', year = 2021, cb = TRUE)
    +
    +# Join the II values to the county geometry
    +OH2021bell <- county2021OH %>%
    +  left_join(bell2021OH$ii, by = 'GEOID')
    +
    # Visualize the II values (2017-2021 5-year ACS) for Ohio, U.S.A., counties
    +ggplot() +
    +  geom_sf(
    +    data = OH2021bell,
    +    aes(fill = II),
    +    size = 0.05,
    +    color = 'white'
    +  ) +
    +  geom_sf(
    +    data = county2021OH,
    +    fill = 'transparent',
    +    color = 'white',
    +    size = 0.2
    +  ) +
    +  theme_minimal() +
    +  scale_fill_viridis_c(limits = c(0, 1)) +
    +  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2017-2021 estimates') +
    +  ggtitle(
    +    'Isolation Index (Bell)\nOhio census tracts to counties',
    +    subtitle = 'Black non-Hispanic vs. white non-Hispanic'
    +  )
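
As a rough illustration of the quantity mapped above, the following base R sketch computes an exposure-type index of the form sum((x_i/X) * (y_i/t_i)) from hypothetical tract counts. The names `tracts` and `interaction_sketch()` are assumptions for illustration, and `bell()` itself may differ in detail. Tracts with zero total population are given a contribution of zero here, mirroring the behaviour described in the NEWS entry for v0.1.6.9000.

```r
# Hypothetical tract-level counts within one county (illustrative values only)
tracts <- data.frame(
  subgroup     = c(10, 30, 0),    # e.g., 'NHoLB'
  subgroup_ixn = c(50, 100, 0),   # e.g., 'NHoLW'
  total        = c(100, 200, 0)   # third tract has no residents
)

# Weighted average of the interaction subgroup's tract share,
# weighted by where members of the first subgroup live
interaction_sketch <- function(x, y, t) {
  share_y <- ifelse(t == 0, 0, y / t)  # avoid 0/0 for empty tracts
  sum((x / sum(x)) * share_y)
}

ii <- interaction_sketch(tracts$subgroup, tracts$subgroup_ixn, tracts$total)
ii
```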
    +

    Compute Correlation Ratio (V)

    Compute the aspatial racial/ethnic V values (2017-2021 5-year ACS) -for Kentucky, U.S.A., counties from census tracts. This metric is based -on Bell (1954) and adapted -by White (1986). Multiple -racial/ethnic subgroups are available in the white() -function, including:

    +for South Carolina, U.S.A., counties from census tracts. This metric is +based on Bell (1954) and +adapted by White (1986). +Multiple racial/ethnic subgroups are available in the +white() function, including:

    @@ -2210,38 +2398,50 @@

    Compute Correlation Ratio (V)

    located. The Isolation Index is some measure of the probability that a member of one subgroup(s) will meet or interact with a member of another subgroup(s) with higher values signifying higher probability of -interaction (less isolation). V can range in value from 0 to 1.

    -
    white2021KY <- ndi::white(geo_large = "county", geo_small = "tract", state = "KY",
    -                                year = 2021, subgroup = "NHoLB")
    -
    -# Obtain the 2021 census counties from the "tigris" package
    -county2021KY <- tigris::counties(state = "KY", year = 2021, cb = TRUE)
    -
    -# Join the V values to the county geometry
    -KY2021white <- dplyr::left_join(county2021KY, white2021KY$v, by = "GEOID")
    -
    # Visualize the V values (2017-2021 5-year ACS) for Kentucky, U.S.A., counties
    -ggplot2::ggplot() + 
    -  ggplot2::geom_sf(data = KY2021white, 
    -                   ggplot2::aes(fill = V),
    -                   size = 0.05,
    -                   color = "white") +
    -   ggplot2::geom_sf(data = county2021KY,
    -                    fill = "transparent", 
    -                    color = "white",
    -                    size = 0.2) +
    -  ggplot2::theme_minimal() +
    -  ggplot2::scale_fill_viridis_c(limits = c(0, 1)) +
    -  ggplot2::labs(fill = "Index (Continuous)",
    -                caption = "Source: U.S. Census ACS 2017-2021 estimates") +
    -  ggplot2::ggtitle("Correlation Ratio (White)\nKentucky census tracts to counties",
    -                   subtitle = "Black non-Hispanic")
    -

    +interaction (less isolation). V can range in value from -Inf to Inf.

    +
    white2021SC <- white(
    +  geo_large = 'county',
    +  geo_small = 'tract',
    +  state = 'SC',
    +  year = 2021,
    +  subgroup = 'NHoLB'
    +)
    +
    +# Obtain the 2021 census counties from the 'tigris' package
    +county2021SC <- counties(state = 'SC', year = 2021, cb = TRUE)
    +
    +# Join the V values to the county geometry
    +SC2021white <- county2021SC %>%
    +  left_join(white2021SC$v, by = 'GEOID')
    +
    # Visualize the V values (2017-2021 5-year ACS) for South Carolina, U.S.A., counties
    +ggplot() +
    +  geom_sf(
    +    data = SC2021white,
    +    aes(fill = V),
    +    size = 0.05,
    +    color = 'white'
    +  ) +
    +  geom_sf(
    +    data = county2021SC,
    +    fill = 'transparent',
    +    color = 'white',
    +    size = 0.2
    +  ) +
    +  theme_minimal() +
    +  scale_fill_viridis_c() +
    +  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2017-2021 estimates') +
    +  ggtitle(
    +    'Correlation Ratio (White)\nSouth Carolina census tracts to counties',
    +    subtitle = 'Black non-Hispanic'
    +  )
    +

    Compute Location Quotient (LQ)

    Compute the aspatial racial/ethnic LQ values (2017-2021 5-year ACS) -for Kentucky, U.S.A., counties vs. the state. This metric is based on Merton (1939) and adapted by -Sudano et +for Tennessee, U.S.A., counties vs. the state. This metric is based on +Merton (1939) and adapted +by Sudano et al. (2013). Multiple racial/ethnic subgroups are available in the sudano() function, including:

    @@ -2375,34 +2575,47 @@

    Compute Location Quotient (LQ)

    larger geography. Unlike the previous metrics that aggregate to the larger geography, LQ computes values for each smaller geography relative to the larger geography.

    -
    sudano2021KY <- ndi::sudano(geo_large = "state", geo_small = "county", state = "KY",
    -                                year = 2021, subgroup = "NHoLB")
    -
    -# Obtain the 2021 census counties from the "tigris" package
    -county2021KY <- tigris::counties(state = "KY", year = 2021, cb = TRUE)
    -
    -# Join the LQ values to the county geometry
    -KY2021sudano <- dplyr::left_join(county2021KY, sudano2021KY$lq, by = "GEOID")
    -
    # Visualize the LQ values (2017-2021 5-year ACS) for Kentucky, U.S.A., counties
    -ggplot2::ggplot() + 
    -  ggplot2::geom_sf(data = KY2021sudano, 
    -                   ggplot2::aes(fill = LQ),
    -                   size = 0.05,
    -                   color = "white") +
    -   ggplot2::geom_sf(data = county2021KY,
    -                    fill = "transparent", 
    -                    color = "white",
    -                    size = 0.2) +
    -  ggplot2::theme_minimal() +
    -  ggplot2::scale_fill_viridis_c(limits = c(0, 1)) +
    -  ggplot2::labs(fill = "Index (Continuous)",
    -                caption = "Source: U.S. Census ACS 2017-2021 estimates") +
    -  ggplot2::ggtitle("Location Quotient (Sudano)\nKentucky counties vs. state",
    -                   subtitle = "Black non-Hispanic")
    -

    -#### Compute Local Exposure and Isolation (LEx/Is)

    +
    sudano2021TN <- sudano(
    +  geo_large = 'state',
    +  geo_small = 'county',
    +  state = 'TN',
    +  year = 2021,
    +  subgroup = 'NHoLB'
    +)
    +
    +# Obtain the 2021 census counties from the 'tigris' package
    +county2021TN <- counties(state = 'TN', year = 2021, cb = TRUE)
    +
    +# Join the LQ values to the county geometry
    +TN2021sudano <- county2021TN %>%
    +  left_join(sudano2021TN$lq, by = 'GEOID')
    +
    # Visualize the LQ values (2017-2021 5-year ACS) for Tennessee, U.S.A., counties
    +ggplot() +
    +  geom_sf(
    +    data = TN2021sudano,
    +    aes(fill = LQ),
    +    size = 0.05,
    +    color = 'white'
    +  ) +
    +  geom_sf(
    +    data = county2021TN,
    +    fill = 'transparent',
    +    color = 'white',
    +    size = 0.2
    +  ) +
    +  theme_minimal() +
    +  scale_fill_viridis_c() +
    +  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2017-2021 estimates') +
    +  ggtitle(
    +    'Location Quotient (Sudano)\nTennessee counties vs. state',
    +    subtitle = 'Black non-Hispanic'
    +  )
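
Because LQ is computed per smaller geography rather than aggregated, a small worked example may help. The sketch below applies the usual location quotient, LQ_i = (x_i/t_i) / (X/T), to hypothetical county counts within one state; `counties_toy` and `lq_sketch()` are illustrative names and not part of 'ndi'.

```r
# Hypothetical county-level counts within one state (illustrative values only)
counties_toy <- data.frame(
  subgroup = c(50, 200, 10),
  total    = c(1000, 1000, 500)
)

# County subgroup proportion divided by the statewide subgroup proportion
lq_sketch <- function(x, t) {
  (x / t) / (sum(x) / sum(t))
}

lq <- lq_sketch(counties_toy$subgroup, counties_toy$total)
lq
```

Values above 1 indicate a county where the subgroup is over-represented relative to the state, so LQ is not bounded by 1. That is consistent with the updated plot calling `scale_fill_viridis_c()` without the `c(0, 1)` limits used for the other indices.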
    +

    + +
    +

    Compute Local Exposure and Isolation (LEx/Is)

    Compute the aspatial racial/ethnic Local Exposure and Isolation -metric (2017-2021 5-year ACS) for Kentucky, U.S.A., counties vs. the +metric (2017-2021 5-year ACS) for Mississippi, U.S.A., counties vs. the state. This metric is based on Bemanian & Beyer (2017). Multiple racial/ethnic subgroups are available in the bemanian_beyer() function, including:

    @@ -2546,55 +2759,248 @@

    Compute Location Quotient (LQ)

    smaller geography relative to the larger geography. Similar to LQ (Sudano), LEx/Is computes values for each smaller geography relative to the larger geography.

    -
    bemanian_beyer2021KY <- ndi::bemanian_beyer(geo_large = "state", geo_small = "county", state = "KY",
    -                                            year = 2021, subgroup = "NHoLB", subgroup_ixn = "NHoLW")
    -
    -# Obtain the 2021 census counties from the "tigris" package
    -county2021KY <- tigris::counties(state = "KY", year = 2021, cb = TRUE)
    -
    -# Join the LEx/Is values to the county geometry
    -KY2021bemanian_beyer <- dplyr::left_join(county2021KY, bemanian_beyer2021KY$lexis, by = "GEOID")
    -
    # Visualize the LEx/Is values (2017-2021 5-year ACS) for Kentucky, U.S.A., counties
    -ggplot2::ggplot() + 
    -  ggplot2::geom_sf(data = KY2021bemanian_beyer, 
    -                   ggplot2::aes(fill = LExIs),
    -                   size = 0.05,
    -                   color = "white") +
    -   ggplot2::geom_sf(data = county2021KY,
    -                    fill = "transparent", 
    -                    color = "white",
    -                    size = 0.2) +
    -  ggplot2::theme_minimal() +
    -   ggplot2::scale_fill_gradient2(low = "#998ec3", mid = "#f7f7f7", high = "#f1a340") +
    -  ggplot2::labs(fill = "Index (Continuous)",
    -                caption = "Source: U.S. Census ACS 2017-2021 estimates") +
    -  ggplot2::ggtitle("Local Exposure and Isolation (Bemanian & Beyer) metric\nKentucky counties vs. state",
    -                   subtitle = "Black non-Hispanic vs. White non-Hispanic")
    -

    -
    # Visualize the exponentiated LEx/Is values (2017-2021 5-year ACS) for Kentucky, U.S.A., counties
    -ggplot2::ggplot() + 
    -  ggplot2::geom_sf(data = KY2021bemanian_beyer, 
    -                   ggplot2::aes(fill = exp(LExIs)),
    -                   size = 0.05,
    -                   color = "white") +
    -   ggplot2::geom_sf(data = county2021KY,
    -                    fill = "transparent", 
    -                    color = "white",
    -                    size = 0.2) +
    -  ggplot2::theme_minimal() +
    -  ggplot2::scale_fill_viridis_c() +
    -  ggplot2::labs(fill = "Index (Continuous)",
    -                caption = "Source: U.S. Census ACS 2017-2021 estimates") +
    -  ggplot2::ggtitle("Odds ratio of Local Exposure and Isolation (Bemanian & Beyer) metric\nKentucky counties vs. state",
    -                   subtitle = "Black non-Hispanic vs. White non-Hispanic")
    -

    -
    sessionInfo()
    -
    ## R version 4.2.1 (2022-06-23 ucrt)
    -## Platform: x86_64-w64-mingw32/x64 (64-bit)
    +
    bemanian_beyer2021MS <- bemanian_beyer(
    +  geo_large = 'state',
    +  geo_small = 'county',
    +  state = 'MS',
    +  year = 2021,
    +  subgroup = 'NHoLB',
    +  subgroup_ixn = 'NHoLW'
    +)
    +
    +# Obtain the 2021 census counties from the 'tigris' package
    +county2021MS <- counties(state = 'MS', year = 2021, cb = TRUE)
    +
    +# Join the LEx/Is values to the county geometry
    +MS2021bemanian_beyer <- county2021MS %>%
    +  left_join(bemanian_beyer2021MS$lexis, by = 'GEOID')
    +
    # Visualize the LEx/Is values (2017-2021 5-year ACS) for Mississippi, U.S.A., counties
    +ggplot() +
    +  geom_sf(
    +    data = MS2021bemanian_beyer,
    +    aes(fill = LExIs),
    +    size = 0.05,
    +    color = 'white'
    +  ) +
    +  geom_sf(
    +    data = county2021MS,
    +    fill = 'transparent',
    +    color = 'white',
    +    size = 0.2
    +  ) +
    +  theme_minimal() +
    +  scale_fill_gradient2(
    +    low = '#998ec3',
    +    mid = '#f7f7f7',
    +    high = '#f1a340'
    +  ) +
    +  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2017-2021 estimates') +
    +  ggtitle(
    +    'Local Exposure and Isolation (Bemanian & Beyer)\nMississippi counties vs. state',
    +    subtitle = 'Black non-Hispanic vs. white non-Hispanic'
    +  )
    +

    +
    # Visualize the exponentiated LEx/Is values (2017-2021 5-year ACS) for 
    +## Mississippi, U.S.A., counties
    +ggplot() +
    +  geom_sf(
    +    data = MS2021bemanian_beyer,
    +    aes(fill = exp(LExIs)),
    +    size = 0.05,
    +    color = 'white'
    +  ) +
    +  geom_sf(
    +    data = county2021MS,
    +    fill = 'transparent',
    +    color = 'white',
    +    size = 0.2
    +  ) +
    +  theme_minimal() +
    +  scale_fill_viridis_c() +
    +  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2017-2021 estimates') +
    +  ggtitle(
    +    paste0('Odds ratio of Local Exposure and Isolation (Bemanian & Beyer)\n',
    +           'Mississippi counties vs. state'),
    +    subtitle = 'Black non-Hispanic vs. white non-Hispanic'
    +  )
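
The second plot exponentiates `LExIs` and labels the result an odds ratio, which suggests the metric is expressed on a log-odds scale. The generic sketch below shows why exponentiating a difference of logits yields an odds ratio. The probabilities are hypothetical and this is not the `bemanian_beyer()` computation itself.

```r
# Log-odds (logit) of a probability
logit <- function(p) log(p / (1 - p))

# Hypothetical local and overall probabilities (illustrative values only)
p_local   <- 0.30
p_overall <- 0.10

# A difference of logits is a log odds ratio ...
log_or <- logit(p_local) - logit(p_overall)

# ... so exponentiating it recovers the odds ratio directly
or_from_logits <- exp(log_or)
or_direct <- (p_local / (1 - p_local)) / (p_overall / (1 - p_overall))
c(from_logits = or_from_logits, direct = or_direct)
```

On this scale a value of 0 before exponentiation (1 after) means the local odds equal the overall odds, which is why the first plot uses a diverging palette and the second a sequential one.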
    +

    +
    +
    +

    Compute Delta (DEL)

    +

    Compute the aspatial racial/ethnic DEL values (2017-2021 5-year ACS) +for Alabama, U.S.A., counties from census tracts. This metric is based +on Hoover (1941) +and Duncan et al. (1961; LC:60007089). Multiple racial/ethnic subgroups +are available in the hoover() function, including:

    +
    +| ACS table source | racial/ethnic subgroup | character for subgroup argument |
    +| --- | --- | --- |
    +| B03002_002 | not Hispanic or Latino | NHoL |
    +| B03002_003 | not Hispanic or Latino, white alone | NHoLW |
    +| B03002_004 | not Hispanic or Latino, Black or African American alone | NHoLB |
    +| B03002_005 | not Hispanic or Latino, American Indian and Alaska Native alone | NHoLAIAN |
    +| B03002_006 | not Hispanic or Latino, Asian alone | NHoLA |
    +| B03002_007 | not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone | NHoLNHOPI |
    +| B03002_008 | not Hispanic or Latino, some other race alone | NHoLSOR |
    +| B03002_009 | not Hispanic or Latino, two or more races | NHoLTOMR |
    +| B03002_010 | not Hispanic or Latino, two races including some other race | NHoLTRiSOR |
    +| B03002_011 | not Hispanic or Latino, two races excluding some other race, and three or more races | NHoLTReSOR |
    +| B03002_012 | Hispanic or Latino | HoL |
    +| B03002_013 | Hispanic or Latino, white alone | HoLW |
    +| B03002_014 | Hispanic or Latino, Black or African American alone | HoLB |
    +| B03002_015 | Hispanic or Latino, American Indian and Alaska Native alone | HoLAIAN |
    +| B03002_016 | Hispanic or Latino, Asian alone | HoLA |
    +| B03002_017 | Hispanic or Latino, Native Hawaiian and other Pacific Islander alone | HoLNHOPI |
    +| B03002_018 | Hispanic or Latino, some other race alone | HoLSOR |
    +| B03002_019 | Hispanic or Latino, two or more races | HoLTOMR |
    +| B03002_020 | Hispanic or Latino, two races including some other race | HoLTRiSOR |
    +| B03002_021 | Hispanic or Latino, two races excluding some other race, and three or more races | HoLTReSOR |
    +

    DEL is a measure of the proportion of members of one subgroup(s) +residing in geographic units with above average density of members of +the subgroup(s). The index provides the proportion of a subgroup +population that would have to move across geographic units to achieve a +uniform density. DEL can range in value from 0 to 1.

    +
    hoover2021AL <- hoover(
    +  geo_large = 'county',
    +  geo_small = 'tract',
    +  state = 'AL',
    +  year = 2021,
    +  subgroup = 'NHoLB'
    +)
    +
    +# Obtain the 2021 census counties from the 'tigris' package
    +county2021AL <- counties(state = 'AL', year = 2021, cb = TRUE)
    +
    +# Join the DEL values to the county geometry
    +AL2021hoover <- county2021AL %>%
    +  left_join(hoover2021AL$del, by = 'GEOID')
    +
    # Visualize the DEL values (2017-2021 5-year ACS) for Alabama, U.S.A., counties
    +ggplot() +
    +  geom_sf(
    +    data = AL2021hoover,
    +    aes(fill = DEL),
    +    size = 0.05,
    +    color = 'white'
    +  ) +
    +  geom_sf(
    +    data = county2021AL,
    +    fill = 'transparent',
    +    color = 'white',
    +    size = 0.2
    +  ) +
    +  theme_minimal() +
    +  scale_fill_viridis_c(limits = c(0, 1)) +
    +  labs(fill = 'Index (Continuous)', caption = 'Source: U.S. Census ACS 2017-2021 estimates') +
    +  ggtitle(
    +    'Delta (Hoover)\nAlabama census tracts to counties',
    +    subtitle = 'Black non-Hispanic'
    +  )
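
The description of DEL as the share of a subgroup that would have to move to achieve uniform density corresponds to the usual form DEL = 0.5 * sum(|x_i/X - a_i/A|), where a_i is the land area of each smaller unit. The base R sketch below applies it to hypothetical tracts. The `area` column and the helper `delta_sketch()` are assumptions for illustration, and how `hoover()` obtains land area is not shown here.

```r
# Hypothetical tract-level counts and land areas within one county
# (illustrative values only; area units cancel out in the ratio)
tracts <- data.frame(
  subgroup = c(100, 50, 10, 40),
  area     = c(2, 3, 10, 5)
)

# Half the sum of absolute differences between each tract's share of the
# subgroup population and its share of the county's land area
delta_sketch <- function(x, a) {
  0.5 * sum(abs(x / sum(x) - a / sum(a)))
}

del <- delta_sketch(tracts$subgroup, tracts$area)
del
```

When the subgroup is spread in exact proportion to land area the result is 0, and it approaches 1 as the subgroup concentrates in a vanishingly small share of the area.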
    +

    +
    sessionInfo()
    +
    ## R version 4.4.0 (2024-04-24 ucrt)
    +## Platform: x86_64-w64-mingw32/x64
     ## Running under: Windows 10 x64 (build 19045)
     ## 
     ## Matrix products: default
     ## 
    +## 
     ## locale:
     ## [1] LC_COLLATE=English_United States.utf8 
     ## [2] LC_CTYPE=English_United States.utf8   
    @@ -2602,33 +3008,36 @@ 

    Compute Location Quotient (LQ)

    ## [4] LC_NUMERIC=C ## [5] LC_TIME=English_United States.utf8 ## +## time zone: America/New_York +## tzcode source: internal +## ## attached base packages: ## [1] stats graphics grDevices utils datasets methods base ## ## other attached packages: -## [1] tigris_2.0.1 tidycensus_1.3.2 ndi_0.1.4 ggplot2_3.4.0 -## [5] dplyr_1.1.0 knitr_1.42 +## [1] tigris_2.1 tidycensus_1.6.3 ndi_0.1.6.9000 ggplot2_3.5.1 +## [5] dplyr_1.1.4 knitr_1.46 ## ## loaded via a namespace (and not attached): -## [1] Rcpp_1.0.10 lattice_0.20-45 tidyr_1.3.0 class_7.3-20 -## [5] digest_0.6.31 psych_2.2.9 utf8_1.2.2 R6_2.5.1 -## [9] evaluate_0.20 e1071_1.7-12 highr_0.10 httr_1.4.4 -## [13] pillar_1.8.1 rlang_1.0.6 curl_5.0.0 uuid_1.1-0 -## [17] rstudioapi_0.14 car_3.1-1 jquerylib_0.1.4 Matrix_1.4-1 -## [21] rmarkdown_2.20 labeling_0.4.2 readr_2.1.3 stringr_1.5.0 -## [25] munsell_0.5.0 proxy_0.4-27 compiler_4.2.1 xfun_0.36 -## [29] pkgconfig_2.0.3 mnormt_2.1.1 htmltools_0.5.4 tidyselect_1.2.0 -## [33] tibble_3.1.8 viridisLite_0.4.1 fansi_1.0.4 tzdb_0.3.0 -## [37] crayon_1.5.2 withr_2.5.0 sf_1.0-9 wk_0.7.1 -## [41] MASS_7.3-58.1 rappdirs_0.3.3 grid_4.2.1 nlme_3.1-157 -## [45] jsonlite_1.8.4 gtable_0.3.1 lifecycle_1.0.3 DBI_1.1.3 -## [49] magrittr_2.0.3 units_0.8-1 scales_1.2.1 KernSmooth_2.23-20 -## [53] cli_3.6.0 stringi_1.7.12 cachem_1.0.6 carData_3.0-5 -## [57] farver_2.1.1 xml2_1.3.3 bslib_0.4.2 ellipsis_0.3.2 -## [61] generics_0.1.3 vctrs_0.5.2 s2_1.1.2 tools_4.2.1 -## [65] Cairo_1.6-0 glue_1.6.2 purrr_1.0.1 hms_1.1.2 -## [69] abind_1.4-5 parallel_4.2.1 fastmap_1.1.0 yaml_2.3.6 -## [73] colorspace_2.1-0 classInt_0.4-8 rvest_1.0.3 sass_0.4.4
    +## [1] gtable_0.3.5 xfun_0.43 bslib_0.7.0 psych_2.4.6.26 +## [5] lattice_0.22-6 tzdb_0.4.0 Cairo_1.6-2 vctrs_0.6.5 +## [9] tools_4.4.0 generics_0.1.3 curl_5.2.1 parallel_4.4.0 +## [13] tibble_3.2.1 proxy_0.4-27 fansi_1.0.6 highr_0.10 +## [17] pkgconfig_2.0.3 Matrix_1.7-0 KernSmooth_2.23-22 uuid_1.2-0 +## [21] lifecycle_1.0.4 farver_2.1.2 compiler_4.4.0 stringr_1.5.1 +## [25] munsell_0.5.1 mnormt_2.1.1 carData_3.0-5 htmltools_0.5.8.1 +## [29] class_7.3-22 sass_0.4.9 yaml_2.3.8 pillar_1.9.0 +## [33] car_3.1-2 crayon_1.5.2 jquerylib_0.1.4 tidyr_1.3.1 +## [37] MASS_7.3-60.2 classInt_0.4-10 cachem_1.0.8 wk_0.9.1 +## [41] abind_1.4-5 nlme_3.1-164 tidyselect_1.2.1 rvest_1.0.4 +## [45] digest_0.6.35 stringi_1.8.3 sf_1.0-16 purrr_1.0.2 +## [49] labeling_0.4.3 fastmap_1.1.1 grid_4.4.0 colorspace_2.1-0 +## [53] cli_3.6.3 magrittr_2.0.3 utf8_1.2.4 e1071_1.7-14 +## [57] readr_2.1.5 withr_3.0.0 scales_1.3.0 rappdirs_0.3.3 +## [61] rmarkdown_2.26 httr_1.4.7 hms_1.1.3 evaluate_0.23 +## [65] viridisLite_0.4.2 s2_1.1.6 rlang_1.1.4 Rcpp_1.0.12 +## [69] glue_1.7.0 DBI_1.2.2 xml2_1.3.6 rstudioapi_0.16.0 +## [73] jsonlite_1.8.8 R6_2.5.1 units_0.8-5