diff --git a/DESCRIPTION b/DESCRIPTION index 5de803b..85cb3de 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,7 +1,7 @@ Package: ndi Title: Neighborhood Deprivation Indices -Version: 0.1.0 -Date: 2022-08-10 +Version: 0.1.1 +Date: 2022-08-14 Authors@R: c(person(given = "Ian D.", family = "Buller", @@ -11,14 +11,24 @@ Authors@R: person(given = "NCI", role = c("cph", "fnd"))) Maintainer: Ian D. Buller -Description: Compute various neighborhood deprivation indices (NDI), including: +Description: Computes various metrics of socio-economic deprivation and disparity in + the United States. Some metrics are considered "spatial" because they + consider the values of neighboring (i.e., adjacent) census geographies in + their computation, while other metrics are "aspatial" because they only + consider the value within each census geography. Two types of aspatial + neighborhood deprivation indices (NDI) are available: including: (1) based on Messer et al. (2006) and (2) based on Andrews et al. (2020) and Slotman et al. (2022) - who uses variables chosen by Roux and Mair (2010) + who use variables chosen by Roux and Mair (2010) . Both are a decomposition of multiple demographic characteristics from the U.S. Census Bureau - American Community Survey 5-year estimates. + American Community Survey 5-year estimates (ACS-5; 2010-2020). Using data + from the ACS-5 (2009-2020), the package can also (1) compute the spatial + Racial Isolation Index (RI) based on Anthopolos et al. (2011) + s, (2) compute spatial the Educational Isolation + Index (EI) based on Bravo et al. (2021) ,and + (3) retrieve the aspatial Gini Index based on Gini (1921) . License: Apache License (>= 2.0) Encoding: UTF-8 Roxygen: list(markdown = TRUE) @@ -28,7 +38,9 @@ Depends: Imports: dplyr, MASS, + Matrix, psych, + sf, stats, stringr, tidycensus, @@ -38,7 +50,6 @@ Suggests: testthat, tigris, R.rsp, - sf, spelling VignetteBuilder: R.rsp Language: en-US diff --git a/NAMESPACE b/NAMESPACE index d1715d7..96ecd72 100644 --- a/NAMESPACE +++ b/NAMESPACE @@ -1,11 +1,18 @@ # Generated by roxygen2: do not edit by hand +export(anthopolos) +export(bravo) +export(gini) export(messer) export(powell_wiley) import(dplyr) importFrom(MASS,ginv) +importFrom(Matrix,sparseMatrix) importFrom(psych,alpha) importFrom(psych,principal) +importFrom(sf,st_drop_geometry) +importFrom(sf,st_geometry) +importFrom(sf,st_intersects) importFrom(stats,complete.cases) importFrom(stats,cor) importFrom(stats,cov2cor) diff --git a/NEWS.md b/NEWS.md index b30ab7c..58f96ed 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,5 +1,16 @@ # ndi (development version) +# ndi v0.1.1 +* Added `anthopolos()` function to compute the Racial Isolation Index (RI) based on based on [Anthopolos et al. (2011)](https://www.doi.org/10.1016/j.sste.2011.06.002) for specified counties/tracts 2009-2020 +* Added `bravo()` function to compute the Educational Isolation Index (EI) based on based on [Bravo et al. (2021)](https://www.doi.org/10.3390/ijerph18179384) for specified counties/tracts 2009-2020 +* Added `gini()` function to retrieve the Gini Index based on [Gini (1921)](https://www.doi.org/10.2307/2223319) for specified counties/tracts 2009-2020 +* `Matrix` and `sf` are now Depends +* Updated vignette and README for new features +* Fixed typos throughout documentation +* Updated Description in DESCRIPTION +* Updated 'package.R' with new details and section +* Updated CITATION with new citations for the additional metrics + # ndi v0.1.0 * Fixed invalid URL and typos in package README.md diff --git a/R/anthopolos.R b/R/anthopolos.R new file mode 100644 index 0000000..29f3c85 --- /dev/null +++ b/R/anthopolos.R @@ -0,0 +1,216 @@ +#' Racial Isolation Index based on Anthopolos et al. (2011) +#' +#' Compute the Racial Isolation Index (Anthopolos) values for selected subgroup(s). +#' +#' @param geo Character string specifying the geography of the data either census tracts \code{geo = "tract"} (the default) or counties \code{geo = "county"}. +#' @param year Numeric. The year to compute the estimate. The default is 2020 and the years between 2009 and 2020 are currently available. +#' @param subgroup Character string specifying the racial/ethnic subgroup(s). See Details for available choices. +#' @param quiet Logical. If TRUE, will display messages about potential missing census information. The default is FALSE. +#' @param ... Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics +#' +#' @details This function will compute the Racial Isolation Index (RI) of U.S. census tracts or counties for a specified geographical extent (e.g., entire U.S. or a single state) based on Anthopolos et al. (2011) \doi{10.1016/j.sste.2011.06.002} who originally designed the metric for the racial isolation of non-Hispanic Black individuals. This function provides the computation of RI for any of the U.S. Census Bureau race/ethnicity subgroups (including Hispanic and non-Hispanic individuals). +#' +#' The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the geospatial computation. The yearly estimates available for 2009 through 2020 when ACS-5 data are available but are available from other U.S. Census Bureau surveys. The twenty racial/ethnic subgroups (U.S. Census Bureau definitions) are: +#' \itemize{ +#' \item{B03002_002: }{Not Hispanic or Latino "NHoL"} +#' \item{B03002_003: }{Not Hispanic or Latino, White alone "NHoLW"} +#' \item{B03002_004: }{Not Hispanic or Latino, Black or African American alone "NHoLB"} +#' \item{B03002_005: }{Not Hispanic or Latino, American Indian and Alaska Native alone "NHoLAIAN"} +#' \item{B03002_006: }{Not Hispanic or Latino, Asian alone "NHoLA"} +#' \item{B03002_007: }{Not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone "NHoLNHOPI"} +#' \item{B03002_008: }{Not Hispanic or Latino, Some other race alone "NHoLSOR"} +#' \item{B03002_009: }{Not Hispanic or Latino, Two or more races "NHoLTOMR"} +#' \item{B03002_010: }{Not Hispanic or Latino, Two races including Some other race "NHoLTRiSOR"} +#' \item{B03002_011: }{Not Hispanic or Latino, Two races excluding Some other race, and three or more races "NHoLTReSOR"} +#' \item{B03002_012: }{Hispanic or Latino "HoL"} +#' \item{B03002_013: }{Hispanic or Latino, White alone "HoLW"} +#' \item{B03002_014: }{Hispanic or Latino, Black or African American alone "HoLB"} +#' \item{B03002_015: }{Hispanic or Latino, American Indian and Alaska Native alone "HoLAIAN"} +#' \item{B03002_016: }{Hispanic or Latino, Asian alone "HoLA"} +#' \item{B03002_017: }{Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone "HoLNHOPI"} +#' \item{B03002_018: }{Hispanic or Latino, Some other race alone "HoLSOR"} +#' \item{B03002_019: }{Hispanic or Latino, Two or more races "HoLTOMR"} +#' \item{B03002_020: }{Hispanic or Latino, Two races including Some other race "HoLTRiSOR"} +#' \item{B03002_021: }{Hispanic or Latino, Two races excluding Some other race, and three or more races "HoLTReSOR"} +#' } +#' +#' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. NOTE: Current version does not correct for edge effects (e.g., census geographies along the specified spatial extent border, coastline, or U.S.-Mexico / U.S.-Canada border) may have few neighboring census geographies and RI values in these census geographies may be unstable. A stop-gap solution for the former source of edge effect is to compute the RI for neighboring census geographies (i.e., the states bordering a study area of interest) and then use the estimates of the study area of interest. +#' +#' A census geography (and its neighbors) that has nearly all of its population who identify with the specified race/ethnicity subgroup(s) (e.g., non-Hispanic or Latino, Black or African American alone) will have an RI value that is close to 1. In contrast, a census geography (and its neighbors) that is nearly none of its population who identify with the specified race/ethnicity subgroup(s) (e.g., not non-Hispanic or Latino, Black or African American alone) will have an RI value that is close to 0. +#' +#' @return An object of class 'list'. This is a named list with the following components: +#' +#' \describe{ +#' \item{\code{ri}}{An object of class 'tbl' for the GEOID, name, RI, and raw census values of specified census geographies.} +#' \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute the RI.} +#' } +#' +#' @import dplyr +#' @importFrom Matrix sparseMatrix +#' @importFrom sf st_drop_geometry st_geometry st_intersects +#' @importFrom stringr str_trim +#' @importFrom tidycensus get_acs +#' @importFrom tidyr gather separate +#' @export +#' +#' @seealso \code{\link[tidycensus]{get_acs}} for additional arguments for geographic extent selection (i.e., \code{state} and \code{county}). +#' +#' @examples +#' \dontrun{ +#' # Wrapped in \dontrun{} because these examples require a Census API key. +#' +#' # Tract-level metric (2020) +#' anthopolos(geo = "tract", state = "GA", year = 2020, subgroup = c("NHoLB", "HoLB")) +#' +#' # County-level metric (2020) +#' anthopolos(geo = "county", state = "GA", year = 2020, subgroup = c("NHoLB", "HoLB")) +#' +#' } +#' +anthopolos <- function(geo = "tract", year = 2020, subgroup, quiet = FALSE, ...) { + + # Check arguments + match.arg(geo, choices = c("county", "tract")) + stopifnot(is.numeric(year), year %in% 2009:2020) + match.arg(subgroup, several.ok = TRUE, + choices = c("NHoL", "NHoLW", "NHoLB", "NHoLAIAN", "NHoLA", "NHoLNHOPI", + "NHoLSOR", "NHoLTOMR", "NHoLTRiSOR", "NHoLTReSOR", + "HoL", "HoLW", "HoLB", "HoLAIAN", "HoLA", "HoLNHOPI", + "HoLSOR", "HoLTOMR", "HoLTRiSOR", "HoLTReSOR")) + + # select census variables + vars <- c(TotalPop = "B03002_001", + NHoL = "B03002_002", + NHoLW = "B03002_003", + NHoLB = "B03002_004", + NHoLAIAN = "B03002_005", + NHoLA = "B03002_006", + NHoLNHOPI = "B03002_007", + NHoLSOR = "B03002_008", + NHoLTOMR = "B03002_009", + NHoLTRiSOR = "B03002_010", + NHoLTReSOR = "B03002_011", + HoL = "B03002_012", + HoLW = "B03002_013", + HoLB = "B03002_014", + HoLAIAN = "B03002_015", + HoLA = "B03002_016", + HoLNHOPI = "B03002_017", + HoLSOR = "B03002_018", + HoLTOMR = "B03002_019", + HoLTRiSOR = "B03002_020", + HoLTReSOR = "B03002_021") + + selected_vars <- vars[c("TotalPop", subgroup)] + out_names <- names(selected_vars) # save for output + prefix <- "subgroup" + suffix <- seq(1:length(subgroup)) + names(selected_vars) <- c("TotalPop", paste(prefix, suffix, sep = "")) + in_names <- paste(names(selected_vars), "E", sep = "") + + # acquire RI variables and sf geometries + ri_vars <- suppressMessages(suppressWarnings(tidycensus::get_acs(geography = geo, + year = year, + output = "wide", + variables = selected_vars, + geometry = TRUE, ...))) + + + if (geo == "tract") { + ri_vars <- ri_vars %>% + tidyr::separate(NAME, into = c("tract", "county", "state"), sep = ",") %>% + dplyr::mutate(tract = gsub("[^0-9\\.]","", tract)) + } else { + ri_vars <- ri_vars %>% tidyr::separate(NAME, into = c("county", "state"), sep = ",") + } + + ri_vars <- ri_vars %>% + dplyr::mutate(county = stringr::str_trim(county), + subgroup = rowSums(sf::st_drop_geometry(ri_vars[ , in_names[-1]]))) + + # Compute RI + ## From Anthopolos et al. (2011) https://doi.org/10.1016/j.sste.2011.06.002 + ## RI_{im} = (Sigma_{j∈∂_{i}} w_{ij} * T_{jm}) / (Sigma_{j∈∂_{i}} w_{ij} * T_{j}) + ## Where: + ## ∂_{i} denotes the set of index units i and its neighbors + ## Given M mutually exclusive racial/ethnic subgroups, m indexes the subgroups of M + ## T_{i} denotes the total population in region i (TotalPop) + ## T_{im} denotes the population of the selected subgroup(s) (subgroup1, ...) + ## w_{ij} denotes a nXn first-order adjacency matrix, where n is the number of census geometries in the study area + ### and the entries of w_{ij} are set to 1 if a boundary is shared by region i and region j and zero otherwise + ### Entries of the main diagonal (since i∈∂_{i}, w_{ij} = w_{ii} when j = i) of w_{ij} are set to 1.5 + ### such that the weight of the index unit, i, is larger than the weights assigned to adjacent tracts + + ## Geospatial adjacency matrix (wij) + tmp <- sf::st_intersects(sf::st_geometry(ri_vars), sparse = TRUE) + names(tmp) <- as.character(seq_len(nrow(ri_vars))) + tmpL <- length(tmp) + tmpcounts <- unlist(Map(length, tmp)) + tmpi <- rep(1:tmpL, tmpcounts) + tmpj <- unlist(tmp) + wij <- Matrix::sparseMatrix(i = tmpi, j = tmpj, x = 1, dims = c(tmpL, tmpL)) + diag(wij) <- 1.5 + + ## Compute + ri_vars <- sf::st_drop_geometry(ri_vars) # drop geometries (can join back later) + RIim <- list() + for (i in 1:dim(wij)[1]){ + RIim[[i]] <- sum(as.matrix(wij[i, ])*ri_vars[ , "subgroup"]) / sum(as.matrix(wij[i, ])*ri_vars[, "TotalPopE"]) + } + ri_vars$RI <- unlist(RIim) + + # warning for missingness of census characteristics + missingYN <- ri_vars %>% + dplyr::select(in_names) + names(missingYN) <- out_names + missingYN <- missingYN %>% + tidyr::gather(key = "variable", value = "val") %>% + dplyr::mutate(missing = is.na(val)) %>% + dplyr::group_by(variable) %>% + dplyr::mutate(total = n()) %>% + dplyr::group_by(variable, total, missing) %>% + dplyr::count() %>% + dplyr::mutate(percent = round(n / total * 100,2), + percent = paste0(percent," %")) %>% + dplyr::filter(missing == TRUE) + + if (quiet == FALSE) { + # warning for missing census data + if (nrow(missingYN) != 0) { + message("Warning: Missing census data") + } else { + returnValue(missingYN) + } + } + + # format output + if (geo == "tract") { + ri <- ri_vars %>% + dplyr::select(c("GEOID", + "state", + "county", + "tract", + "RI", + in_names)) + names(ri) <- c("GEOID", "state", "county", "tract", "RI", out_names) + } else { + ri <- ri_vars %>% + dplyr::select(c("GEOID", + "state", + "county", + "RI", + in_names)) + names(ri) <- c("GEOID", "state", "county", "RI", out_names) + } + + ri <- ri %>% + dplyr::mutate(county = stringr::str_trim(county), + state = stringr::str_trim(state)) %>% + dplyr::arrange(GEOID) %>% + dplyr::as_tibble() + + out <- list(ri = ri, + missing = missingYN) + + return(out) +} diff --git a/R/bravo.R b/R/bravo.R new file mode 100644 index 0000000..c1c7a72 --- /dev/null +++ b/R/bravo.R @@ -0,0 +1,183 @@ +#' Educational Isolation Index based on Bravo et al. (2021) +#' +#' Compute the Educational Isolation Index (Bravo) values for selected educational attainment category(ies). +#' +#' @param geo Character string specifying the geography of the data either census tracts \code{geo = "tract"} (the default) or counties \code{geo = "county"}. +#' @param year Numeric. The year to compute the estimate. The default is 2020 and the years between 2009 and 2020 are currently available. +#' @param subgroup Character string specifying the educational attainment category(ies). See Details for available choices. +#' @param quiet Logical. If TRUE, will display messages about potential missing census information. The default is FALSE. +#' @param ... Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics +#' +#' @details This function will compute the Educational Isolation Index (EI) of U.S. census tracts or counties for a specified geographical extent (e.g., entire U.S. or a single state) based on Bravo et al. (2021) \doi{10.3390/ijerph18179384} who originally designed the metric for the educational isolation of individual without a college degree. This function provides the computation of EI for any of the U.S. Census Bureau educational attainment levels. +#' +#' The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the geospatial computation. The yearly estimates available for 2009 through 2020 when ACS-5 data are available but are available from other U.S. Census Bureau surveys. The five educational attainment levels (U.S. Census Bureau definitions) are: +#' \itemize{ +#' \item{B06009_002: }{Less than high school graduate "LtHS"} +#' \item{B06009_003: }{High school graduate (includes equivalency) "HSGiE"} +#' \item{B06009_004: }{Some college or associate's degree "SCoAD"} +#' \item{B06009_005: }{Bachelor's degree "BD"} +#' \item{B06009_006: }{Graduate or professional degree "GoPD"} +#' } +#' +#' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. NOTE: Current version does not correct for edge effects (e.g., census geographies along the specified spatial extent border, coastline, or U.S.-Mexico / U.S.-Canada border) may have few neighboring census geographies and EI values in these census geographies may be unstable. A stop-gap solution for the former source of edge effect is to compute the EI for neighboring census geographies (i.e., the states bordering a study area of interest) and then use the estimates of the study area of interest. +#' +#' A census geography (and its neighbors) that has nearly all of its population with the specified educational attainment category (e.g., a Bachelor's degree or more) will have an EI value that is close to 1. In contrast, a census geography (and its neighbors) that is nearly none of its population with the specified educational attainment category (e.g., less than a Bachelor's degree) will have an EI value that is close to 0. +#' +#' @return An object of class 'list'. This is a named list with the following components: +#' +#' \describe{ +#' \item{\code{ei}}{An object of class 'tbl' for the GEOID, name, EI, and raw census values of specified census geographies.} +#' \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute the EI.} +#' } +#' +#' @import dplyr +#' @importFrom Matrix sparseMatrix +#' @importFrom sf st_drop_geometry st_geometry st_intersects +#' @importFrom stringr str_trim +#' @importFrom tidycensus get_acs +#' @importFrom tidyr gather separate +#' @export +#' +#' @seealso \code{\link[tidycensus]{get_acs}} for additional arguments for geographic extent selection (i.e., \code{state} and \code{county}). +#' +#' @examples +#' \dontrun{ +#' # Wrapped in \dontrun{} because these examples require a Census API key. +#' +#' # Tract-level metric (2020) +#' bravo(geo = "tract", state = "GA", year = 2020, subgroup = c("LtHS", "HSGiE")) +#' +#' # County-level metric (2020) +#' bravo(geo = "county", state = "GA", year = 2020, subgroup = c("LtHS", "HSGiE")) +#' +#' } +#' +bravo <- function(geo = "tract", year = 2020, subgroup, quiet = FALSE, ...) { + + # Check arguments + match.arg(geo, choices = c("county", "tract")) + stopifnot(is.numeric(year), year %in% 2009:2020) + match.arg(subgroup, several.ok = TRUE, + choices = c("LtHS", "HSGiE", "SCoAD", "BD", "GoPD")) + + # select census variables + vars <- c(TotalPop = "B06009_001", + LtHS = "B06009_002", + HSGiE = "B06009_003", + SCoAD = "B06009_004", + BD = "B06009_005", + GoPD = "B06009_006") + + selected_vars <- vars[c("TotalPop", subgroup)] + out_names <- names(selected_vars) # save for output + prefix <- "subgroup" + suffix <- seq(1:length(subgroup)) + names(selected_vars) <- c("TotalPop", paste(prefix, suffix, sep = "")) + in_names <- paste(names(selected_vars), "E", sep = "") + + # acquire EI variables and sf geometries + ei_vars <- suppressMessages(suppressWarnings(tidycensus::get_acs(geography = geo, + year = year, + output = "wide", + variables = selected_vars, + geometry = TRUE, ...))) + + + if (geo == "tract") { + ei_vars <- ei_vars %>% + tidyr::separate(NAME, into = c("tract", "county", "state"), sep = ",") %>% + dplyr::mutate(tract = gsub("[^0-9\\.]","", tract)) + } else { + ei_vars <- ei_vars %>% tidyr::separate(NAME, into = c("county", "state"), sep = ",") + } + + ei_vars <- ei_vars %>% + dplyr::mutate(county = stringr::str_trim(county), + subgroup = rowSums(sf::st_drop_geometry(ei_vars[ , in_names[-1]]))) + + # Compute EI + ## From Bravo et al. (2021) https://doi.org/10.3390/ijerph18179384 + ## EI_{im} = (Sigma_{j∈∂_{i}} w_{ij} * T_{jm}) / (Sigma_{j∈∂_{i}} w_{ij} * T_{j}) + ## Where: + ## ∂_{i} denotes the set of index units i and its neighbors + ## Given M mutually exclusive subgroups of educational attainment categories, m indexes the subgroups of M + ## T_{i} denotes the total population in region i (TotalPop) + ## T_{im} denotes the population of the selected subgroup(s) (subgroup1, ...) + ## w_{ij} denotes a nXn first-order adjacency matrix, where n is the number of census geometries in the study area + ### and the entries of w_{ij} are set to 1 if a boundary is shared by region i and region j and zero otherwise + ### Entries of the main diagonal (since i∈∂_{i}, w_{ij} = w_{ii} when j = i) of w_{ij} are set to 1.5 + ### such that the weight of the index unit, i, is larger than the weights assigned to adjacent tracts + + ## Geospatial adjacency matrix (wij) + tmp <- sf::st_intersects(sf::st_geometry(ei_vars), sparse = TRUE) + names(tmp) <- as.character(seq_len(nrow(ei_vars))) + tmpL <- length(tmp) + tmpcounts <- unlist(Map(length, tmp)) + tmpi <- rep(1:tmpL, tmpcounts) + tmpj <- unlist(tmp) + wij <- Matrix::sparseMatrix(i = tmpi, j = tmpj, x = 1, dims = c(tmpL, tmpL)) + diag(wij) <- 1.5 + + ## Compute + ei_vars <- sf::st_drop_geometry(ei_vars) # drop geometries (can join back later) + EIim <- list() + for (i in 1:dim(wij)[1]){ + EIim[[i]] <- sum(as.matrix(wij[i, ])*ei_vars[ , "subgroup"]) / sum(as.matrix(wij[i, ])*ei_vars[, "TotalPopE"]) + } + ei_vars$EI <- unlist(EIim) + + # warning for missingness of census characteristics + missingYN <- ei_vars %>% + dplyr::select(in_names) + names(missingYN) <- out_names + missingYN <- missingYN %>% + tidyr::gather(key = "variable", value = "val") %>% + dplyr::mutate(missing = is.na(val)) %>% + dplyr::group_by(variable) %>% + dplyr::mutate(total = n()) %>% + dplyr::group_by(variable, total, missing) %>% + dplyr::count() %>% + dplyr::mutate(percent = round(n / total * 100,2), + percent = paste0(percent," %")) %>% + dplyr::filter(missing == TRUE) + + if (quiet == FALSE) { + # warning for missing census data + if (nrow(missingYN) != 0) { + message("Warning: Missing census data") + } else { + returnValue(missingYN) + } + } + + # format output + if (geo == "tract") { + ei <- ei_vars %>% + dplyr::select(c("GEOID", + "state", + "county", + "tract", + "EI", + in_names)) + names(ei) <- c("GEOID", "state", "county", "tract", "EI", out_names) + } else { + ei <- ei_vars %>% + dplyr::select(c("GEOID", + "state", + "county", + "EI", + in_names)) + names(ei) <- c("GEOID", "state", "county", "EI", out_names) + } + + ei <- ei %>% + dplyr::mutate(county = stringr::str_trim(county), + state = stringr::str_trim(state)) %>% + dplyr::arrange(GEOID) %>% + dplyr::as_tibble() + + out <- list(ei = ei, + missing = missingYN) + + return(out) +} diff --git a/R/gini.R b/R/gini.R new file mode 100644 index 0000000..32f3bba --- /dev/null +++ b/R/gini.R @@ -0,0 +1,112 @@ +#' Gini Index based on Gini (1921) +#' +#' Retrieve the Gini Index values. +#' +#' @param geo Character string specifying the geography of the data either census tracts \code{geo = "tract"} (the default) or counties \code{geo = "county"}. +#' @param year Numeric. The year to compute the estimate. The default is 2020 and the years between 2009 and 2020 are currently available. +#' @param quiet Logical. If TRUE, will display messages about potential missing census information +#' @param ... Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics +#' +#' @details This function will retrieve the Gini Index of U.S. census tracts or counties for a specified geographical extent (e.g., entire U.S. or a single state) based on Gini (1921) \doi{10.2307/2223319}. +#' +#' The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey estimate of the Gini Index (ACS: B19083). The estimate are available for 2009 through 2020 when ACS-5 data are available but are available from other U.S. Census Bureau surveys. +#' +#' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. +#' +#' According to the U.S. Census Bureau \url{https://www.census.gov/topics/income-poverty/income-inequality/about/metrics/gini-index.html}: "The Gini Index is a summary measure of income inequality. The Gini coefficient incorporates the detailed shares data into a single statistic, which summarizes the dispersion of income across the entire income distribution. The Gini coefficient ranges from 0, indicating perfect equality (where everyone receives an equal share), to 1, perfect inequality (where only one recipient or group of recipients receives all the income). The Gini is based on the difference between the Lorenz curve (the observed cumulative income distribution) and the notion of a perfectly equal income distribution." +#' +#' @return An object of class 'list'. This is a named list with the following components: +#' +#' \describe{ +#' \item{\code{gini}}{An object of class 'tbl' for the GEOID, name, and Gini index of specified census geographies.} +#' \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for the Gini index.} +#' } +#' +#' @import dplyr +#' @importFrom stringr str_trim +#' @importFrom tidycensus get_acs +#' @importFrom tidyr gather separate +#' @export +#' +#' @seealso \code{\link[tidycensus]{get_acs}} for additional arguments for geographic extent selection (i.e., \code{state} and \code{county}). +#' +#' @examples +#' \dontrun{ +#' # Wrapped in \dontrun{} because these examples require a Census API key. +#' +#' # Tract-level metric (2020) +#' gini(geo = "tract", state = "GA", year = 2020) +#' +#' # County-level metric (2020) +#' gini(geo = "county", state = "GA", year = 2020) +#' +#' } +#' +gini <- function(geo = "tract", year = 2020, quiet = FALSE, ...) { + + # Check arguments + match.arg(geo, choices = c("county", "tract")) + stopifnot(is.numeric(year), year %in% 2009:2020) + + # select census variable + vars <- c(gini = "B19083_001") + + # acquire Gini Index + gini_vars <- suppressMessages(suppressWarnings(tidycensus::get_acs(geography = geo, + year = year, + output = "wide", + variables = vars, ...))) + + if (geo == "tract") { + gini_vars <- gini_vars %>% + tidyr::separate(NAME, into = c("tract", "county", "state"), sep = ",") %>% + dplyr::mutate(tract = gsub("[^0-9\\.]","", tract)) + } else { + gini_vars <- gini_vars %>% tidyr::separate(NAME, into = c("county", "state"), sep = ",") + } + + gini_vars <- gini_vars %>% + dplyr::mutate(gini = giniE, + county = stringr::str_trim(county)) + + # warning for missingness of census characteristics + missingYN <- gini_vars %>% + dplyr::select(gini) %>% + tidyr::gather(key = "variable", value = "val") %>% + dplyr::mutate(missing = is.na(val)) %>% + dplyr::group_by(variable) %>% + dplyr::mutate(total = n()) %>% + dplyr::group_by(variable, total, missing) %>% + dplyr::count() %>% + dplyr::mutate(percent = round(n / total * 100,2), + percent = paste0(percent," %")) %>% + dplyr::filter(missing == TRUE) + + if (quiet == FALSE) { + # warning for missing census data + if (nrow(missingYN) != 0) { + message("Warning: Missing census data") + } else { + returnValue(missingYN) + } + } + + if (geo == "tract") { + gini <- gini_vars %>% + dplyr::select(GEOID, state, county, tract, gini) + } else { + gini <- gini_vars %>% + dplyr::select(GEOID, state, county, gini) + } + + gini <- gini %>% + dplyr::mutate(county = stringr::str_trim(county), + state = stringr::str_trim(state)) %>% + dplyr::arrange(GEOID) %>% + dplyr::as_tibble() + + out <- list(gini = gini, + missing = missingYN) + + return(out) +} diff --git a/R/globals.R b/R/globals.R index 41d07f7..2bb113e 100644 --- a/R/globals.R +++ b/R/globals.R @@ -23,4 +23,4 @@ globalVariables(c("CWD", "EDU", "EMP", "FHH", "GEOID", "MedHHInc", "MedHHIncE", "PctUnemp_7074FE", "PctUnemp_7074ME", "PctUnemp_75upME", "PctUnemp_denE", "PctUnemp_numE", "PctUnempl", "PctUnemplE", "PctUnemplZ", "PctWorkClass", "PctWorkClassZ", "TotalPop", "TotalPopulationE", "U30", "county", "logMedHHInc", - "logMedHomeVal", "percent", "state", "total", "tract", "val", "variable")) + "logMedHomeVal", "percent", "state", "total", "tract", "val", "variable", "giniE")) diff --git a/R/messer.R b/R/messer.R index cfe604e..dcad0e9 100644 --- a/R/messer.R +++ b/R/messer.R @@ -32,7 +32,7 @@ #' @return An object of class 'list'. This is a named list with the following components: #' #' \describe{ -#' \item{\code{ndi}}{An object of class 'tbl' for the GEOID, name, NDI (standardized), NDI (quartile), and raw census values of specified census tracts.} +#' \item{\code{ndi}}{An object of class 'tbl' for the GEOID, name, NDI (standardized), NDI (quartile), and raw census values of specified census geographies.} #' \item{\code{pca}}{An object of class 'principal', returns the output of \code{\link[psych]{principal}} used to compute the NDI values.} #' \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute the NDI.} #' } @@ -64,7 +64,7 @@ #' messer <- function(geo = "tract", year = 2020, imp = FALSE, quiet = FALSE, ...) { - # Check arugments + # Check arguments match.arg(geo, choices = c("county", "tract")) stopifnot(is.numeric(year), year %in% 2010:2020) diff --git a/R/package.R b/R/package.R index a2ab5da..f9290bb 100644 --- a/R/package.R +++ b/R/package.R @@ -1,22 +1,28 @@ #' The ndi Package: Neighborhood Deprivation Indices #' -#' Computes various geospatial neighborhood deprivation indices in the United States +#' Computes various metrics of socio-economic deprivation and disparity in the United States based on information available from the U.S. Census Bureau. #' -#' @details The 'ndi' package computes various neighborhood deprivation indices (NDI), including: (1) based on Messer et al. (2006) \doi{10.1007/s11524-006-9094-x} and (2) based on Andrews et al. (2020) \doi{10.1080/17445647.2020.1750066} and Slotman et al. (2022) \doi{10.1016/j.dib.2022.108002} who uses variables chosen by Roux and Mair (2010)] \doi{10.1111/j.1749-6632.2009.05333.x}. Both are a decomposition of multiple demographic characteristics from the U.S. Census Bureau American Community Survey 5-year estimates. +#' @details The 'ndi' package computes various metrics of socio-economic deprivation and disparity in the United States. Some metrics are considered "spatial" because they consider the values of neighboring (i.e., adjacent) census geographies in their computation, while other metrics are "aspatial" because they only consider the value within each census geography. Two types of aspatial neighborhood deprivation indices (NDI) are available: (1) based on Messer et al. (2006) \doi{10.1007/s11524-006-9094-x} and (2) based on Andrews et al. (2020) \doi{10.1080/17445647.2020.1750066} and Slotman et al. (2022) \doi{10.1016/j.dib.2022.108002} who use variables chosen by Roux and Mair (2010) \doi{10.1111/j.1749-6632.2009.05333.x}. Both are a decomposition of multiple demographic characteristics from the U.S. Census Bureau American Community Survey 5-year estimates. (ACS-5; 2010-2020). Using data from the ACS-5 (2009-2020), the package can also (1) compute the spatial Racial Isolation Index (RI) based on Anthopolos et al. (2011) \doi{10.1016/j.sste.2011.06.002}, (2) compute the spatial Educational Isolation Index (EI) based on Bravo et al. (2021) \doi{10.3390/ijerph18179384}, and (3) retrieve the aspatial Gini Index based on Gini (1921) \doi{10.2307/2223319}. #' #' Key content of the 'ndi' package include:\cr #' -#' \bold{Neighborhood Deprivation Indices} +#' \bold{Metrics of Socio-Economic Deprivation and Disparity} #' -#' \code{\link{messer}} Computes NDI values based on Messer et al. (2006) \doi{10.1007/s11524-006-9094-x}. +#' \code{\link{anthopolos}} Computes the spatial Racial Isolation Index (RI) based on Anthopolos (2011) \doi{10.1016/j.sste.2011.06.002}. #' -#' \code{\link{powell_wiley}} Computes NDI values based on Andrews et al. (2020) \doi{10.1080/17445647.2020.1750066} and Slotman et al. (2022) \doi{10.1016/j.dib.2022.108002} who uses variables chosen by Roux and Mair (2010) \doi{10.1111/j.1749-6632.2009.05333.x}. +#' \code{\link{bravo}} Computes the spatial Educational Isolation Index (EI) based on Bravo (2021) \doi{10.3390/ijerph18179384}. +#' +#' \code{\link{gini}} Retrieves the aspatial Gini Index based on Gini (1921) \doi{10.2307/2223319}. +#' +#' \code{\link{messer}} Computes the aspatial Neighborhood Deprivation Index (NDI) based on Messer et al. (2006) \doi{10.1007/s11524-006-9094-x}. +#' +#' \code{\link{powell_wiley}} Computes the aspatial Neighborhood Deprivation Index (NDI) based on Andrews et al. (2020) \doi{10.1080/17445647.2020.1750066} and Slotman et al. (2022) \doi{10.1016/j.dib.2022.108002} who use variables chosen by Roux and Mair (2010) \doi{10.1111/j.1749-6632.2009.05333.x}. #' #' @name ndi-package #' @aliases ndi-package ndi #' @docType package #' -#' @section Dependencies: The 'ndi' package relies heavily upon \code{\link{tidycensus}} and \code{\link{psych}} for computing the neighborhood deprivation indices. The \code{\link{messer}} function builds upon code developed by Hruska et al. (2022) \doi{10.17605/OSF.IO/M2SAV} by fictionalizing, adding percent households earning <$30,000 per year to the NDI computation, and providing the option for computing the ACS-5 2006-2010 NDI values. There was no code companion to compute NDI included in Andrews et al. (2020) \doi{10.1080/17445647.2020.1750066} or Slotman et al. (2022) \doi{10.1016/j.dib.2022.108002}, but the package maintainer worked directly with the authors to replicate their SAS code in R. +#' @section Dependencies: The 'ndi' package relies heavily upon \code{\link{tidycensus}} and \code{\link{psych}} for computing the neighborhood deprivation indices. The \code{\link{messer}} function builds upon code developed by Hruska et al. (2022) \doi{10.17605/OSF.IO/M2SAV} by fictionalizing, adding percent households earning <$30,000 per year to the NDI computation, and providing the option for computing the ACS-5 2006-2010 NDI values. There is no code companion to compute NDI included in Andrews et al. (2020) \doi{10.1080/17445647.2020.1750066} or Slotman et al. (2022) \doi{10.1016/j.dib.2022.108002}, respectively, but the package maintainer worked directly with the authors to replicate their SAS code in R. The spatial metrics RI and EI rely on the \code{\link{sf}} and \code{\link{Matrix}} packages to compute the geospatial adjacency matrix between census geographies. There is no code companion to compute RI or EI included in Anthopolos et al. (2011) \doi{10.1016/j.sste.2011.06.002} or Bravo et al. (2021) \doi{10.3390/ijerph18179384}, respectively. #' #' @author Ian D. Buller\cr \emph{Occupational and Environmental Epidemiology Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, Maryland, USA.} \cr #' @@ -27,7 +33,9 @@ NULL #' @import dplyr #' @importFrom MASS ginv +#' @importFrom Matrix sparseMatrix #' @importFrom psych alpha principal +#' @importFrom sf st_drop_geometry st_geometry st_intersects #' @importFrom stats complete.cases cor cov2cor loadings median promax quantile sd #' @importFrom stringr str_trim #' @importFrom tidycensus get_acs diff --git a/R/powell_wiley.R b/R/powell_wiley.R index e247f81..a97d18d 100644 --- a/R/powell_wiley.R +++ b/R/powell_wiley.R @@ -36,7 +36,7 @@ #' @return An object of class 'list'. This is a named list with the following components: #' #' \describe{ -#' \item{\code{ndi}}{An object of class 'tbl' for the GEOID, name, NDI continuous, NDI quintiles, and raw census values of specified census tracts.} +#' \item{\code{ndi}}{An object of class 'tbl' for the GEOID, name, NDI continuous, NDI quintiles, and raw census values of specified census geographies.} #' \item{\code{pca}}{An object of class 'principal', returns the output of \code{\link[psych]{principal}} used to compute the NDI values.} #' \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute the NDI.} #' \item{\code{cronbach}}{An object of class 'character' or 'numeric' for the results of the Cronbach's alpha calculation. If only one factor is computed, a message is returned. If more than one factor is computed, the Cronbach's alpha is calculated and should check that it is >0.7 for respectable internal consistency between factors.} @@ -70,7 +70,7 @@ #' powell_wiley <- function(geo = "tract", year, imp = FALSE, quiet = FALSE, ...) { - # Check arugments + # Check arguments match.arg(geo, choices = c("county", "tract")) stopifnot(is.numeric(year), year %in% 2010:2020) diff --git a/README.md b/README.md index 1a0c6be..97bd768 100644 --- a/README.md +++ b/README.md @@ -10,11 +10,11 @@ ndi: Neighborhood Deprivation Indices -**Date repository last updated**: August 10, 2022 +**Date repository last updated**: August 14, 2022 ### Overview -The `ndi` package is a suite of `R` functions to compute various geospatial neighborhood deprivation indices (NDI) in the United States. Two types of NDI are available in the initial repository: (1) based on [Messer et al. (2006)](https://doi.org/10.1007/s11524-006-9094-x) and (2) based on [Andrews et al. (2020)](https://doi.org/10.1080/17445647.2020.1750066) and [Slotman et al. (2022)](https://doi.org/10.1016/j.dib.2022.108002) who use variables chosen by [Roux and Mair (2010)](https://doi.org/10.1111/j.1749-6632.2009.05333.x). Both are a decomposition of various demographic characteristics from the U.S. Census Bureau American Community Survey 5-year estimates pulled by the [tidycensus](https://CRAN.R-project.org/package=tidycensus) package. +The `ndi` package is a suite of `R` functions to compute various metrics of socio-economic deprivation and disparity in the United States. Some metrics are considered "spatial" because they consider the values of neighboring (i.e., adjacent) census geographies in their computation, while other metrics are "aspatial" because they only consider the value within each census geography. Two types of aspatial NDI are available: (1) based on [Messer et al. (2006)](https://doi.org/10.1007/s11524-006-9094-x) and (2) based on [Andrews et al. (2020)](https://doi.org/10.1080/17445647.2020.1750066) and [Slotman et al. (2022)](https://doi.org/10.1016/j.dib.2022.108002) who use variables chosen by [Roux and Mair (2010)](https://doi.org/10.1111/j.1749-6632.2009.05333.x). Both are a decomposition of various demographic characteristics from the U.S. Census Bureau American Community Survey 5-year estimates (ACS-5; 2010-2020) pulled by the [tidycensus](https://CRAN.R-project.org/package=tidycensus) package. Using data from the ACS-5 (2009-2020), the `ndi` package can also (1) compute the spatial Racial Isolation Index (RI) based on [Anthopolos et al. (2011)](https://www.doi.org/10.1016/j.sste.2011.06.002), (2) the spatial Educational Isolation Index (EI) based on [Bravo et al. (2021)](https://www.doi.org/10.3390/ijerph18179384), and (3) retrieve the aspatial Gini Index based on [Gini (1921)](https://www.doi.org/10.2307/2223319). ### Installation @@ -40,11 +40,20 @@ To install the development version from GitHub: +anthopolos +Compute the Racial Isolation Index (RI) based on Anthopolos et al. (2011). + +bravo +Compute the Educational Isolation Index (EI) based on Bravo et al. (2021). + +gini +Retrieve the Gini Index based on Gini (1921). + messer -Compute NDI based on Messer et al. (2006). +Compute the Neighboorhood Deprivation Index (NDI) based on Messer et al. (2006). powell_wiley -Compute NDI based on Andrews et al. (2020) and Slotman et al. (2022) with variables chosen by Roux and Mair (2010). +Compute the Neighboorhood Deprivation Index (NDI) based on Andrews et al. (2020) and Slotman et al. (2022) with variables chosen by Roux and Mair (2010). @@ -60,7 +69,7 @@ See also the list of [contributors](https://github.com/idblr/ndi/graphs/contribu ### Getting Started * Step 1: You must obtain a unique access key from the U.S. Census Bureau. Follow [this link](http://api.census.gov/data/key_signup.html) to obtain one. -* Step 2: Specify your access key in the `messer()` or `powell_wiley()` functions using the `key` argument or by using the `census_api_key()` function from the `tidycensus` package before running the `messer()` or `powell_wiley()` functions (see an example below). +* Step 2: Specify your access key in the `anthopolos()`, `bravo()`, `gini()`, `messer()`, or `powell_wiley()` functions using the internal `key` argument or by using the `census_api_key()` function from the `tidycensus` package before running the `anthopolos()`, `bravo()`, `gini()`, `messer()`, or `powell_wiley()` functions (see an example below). ### Usage @@ -72,8 +81,8 @@ See also the list of [contributors](https://github.com/idblr/ndi/graphs/contribu library(ndi) library(ggplot2) library(sf) -library(tidycensus) # a dependency for the "ndi"" package -library(tigris) # a dependency for the "ndi"" package +library(tidycensus) # a dependency for the "ndi" package +library(tigris) # a dependency for the "ndi" package # -------- # # Settings # @@ -270,6 +279,92 @@ cor(ndi2020DC$NDI.messer, ndi2020DC$NDI.powell_wiley, use = "complete.obs") # Pe table(ndi2020DC$NDIQuart, ndi2020DC$NDIQuint) ``` +``` r +# ------------------- # +# Retrieve Gini Index # +# ------------------- # + +# Gini Index based on Gini (1921) from the ACS-5 +gini2020DC <- gini(state = "DC", year = 2020) + +# Obtain the 2020 census tracts from the "tigris" package +tract2020DC <- tigris::tracts(state = "DC", year = 2020, cb = TRUE) + +# Join the Gini Index values to the census tract geometry +gini2020DC <- merge(tractsDC, gini2020DC$gini, by = "GEOID") + +ggplot2::ggplot() + + ggplot2::geom_sf(data = gini2020DC, + ggplot2::aes(fill = gini), + color = "white") + + ggplot2::theme_bw() + + ggplot2::scale_fill_viridis_c() + + ggplot2::labs(fill = "Index (Continuous)", + caption = "Source: U.S. Census ACS 2016-2020 estimates")+ + ggplot2::ggtitle("Gini Index\nGrey color denotes no data", + subtitle = "Washington, D.C. tracts") +``` + +![](man/figures/gini.png) + +``` r +# -------------------------------------------- # +# Compute Racial Isoliation Index (Anthopolos) # +# -------------------------------------------- # + +# Racial Isolation Index based on Anthopolos et al (2011) +## Selected subgroup: Not Hispanic or Latino, Black or African American alone +ri2020DC <- anthopolos(state = "DC", year = 2020, subgroup = "NHoLB") + +# Obtain the 2020 census tracts from the "tigris" package +tract2020DC <- tigris::tracts(state = "DC", year = 2020, cb = TRUE) + +# Join the RI (Anthopolos) values to the census tract geometry +ri2020DC <- merge(tractsDC, ri2020DC$ri, by = "GEOID") + +ggplot2::ggplot() + + ggplot2::geom_sf(data = ri2020DC, + ggplot2::aes(fill = RI), + color = "white") + + ggplot2::theme_bw() + + ggplot2::scale_fill_viridis_c() + + ggplot2::labs(fill = "Index (Continuous)", + caption = "Source: U.S. Census ACS 2016-2020 estimates")+ + ggplot2::ggtitle("Racial Isolation Index\nNot Hispanic or Latino, Black or African American alone (Anthopolos)", + subtitle = "Washington, D.C. tracts (not corrected for edge effects)") +``` + +![](man/figures/ri.png) + +``` r +# -------------------------------------------- # +# Compute Educational Isoliation Index (Bravo) # +# -------------------------------------------- # + +# Educational Isolation Index based on Bravo et al (2021) +## Selected subgroup: without four-year college degree +ei2020DC <- bravo(state = "DC", year = 2020, subgroup = c("LtHS", "HSGiE", "SCoAD")) + +# Obtain the 2020 census tracts from the "tigris" package +tract2020DC <- tigris::tracts(state = "DC", year = 2020, cb = TRUE) + +# Join the EI (Bravo) values to the census tract geometry +ei2020DC <- merge(tractsDC, ei2020DC$ei, by = "GEOID") + +ggplot2::ggplot() + + ggplot2::geom_sf(data = ei2020DC, + ggplot2::aes(fill = EI), + color = "white") + + ggplot2::theme_bw() + + ggplot2::scale_fill_viridis_c() + + ggplot2::labs(fill = "Index (Continuous)", + caption = "Source: U.S. Census ACS 2016-2020 estimates")+ + ggplot2::ggtitle("Educational Isolation Index\nWithout a four-year college degree (Bravo)", + subtitle = "Washington, D.C. tracts (not corrected for edge effects)") +``` + +![](man/figures/ei.png) + ### Funding This package was developed while the author was a postdoctoral fellow supported by the [Cancer Prevention Fellowship Program](https://cpfp.cancer.gov/) at the [National Cancer Institute](https://www.cancer.gov/). diff --git a/cran-comments.md b/cran-comments.md index cef2c25..ca6eaf3 100644 --- a/cran-comments.md +++ b/cran-comments.md @@ -1,12 +1,21 @@ -## This is the first resubmission +## This is the second resubmission -* Actions taken since previous submission based on feedback from Beni Altmann: - * Fixed invalid URL and typos in package README.md +* Actions taken since previous submission: +* Added `anthopolos()` function to compute the Racial Isolation Index (RI) based on based on [Anthopolos et al. (2011)](https://www.doi.org/10.1016/j.sste.2011.06.002) for specified counties/tracts 2009-2020 +* Added `bravo()` function to compute the Educational Isolation Index (EI) based on based on [Bravo et al. (2021)](https://www.doi.org/10.3390/ijerph18179384) for specified counties/tracts 2009-2020 +* Added `gini()` function to retrieve the Gini Index based on [Gini (1921)](https://www.doi.org/10.2307/2223319) for specified counties/tracts 2009-2020 +* `Matrix` and `sf` are now Depends +* Updated vignette and README for new features +* Fixed typos throughout documentation +* Updated Description in DESCRIPTION +* Updated 'package.R' with new details and section +* Updated CITATION with new citations for the additional metrics -* Documentation for DESCRIPTION and README references the following DOI, which throws a NOTE but is a valid URL: - * +* Documentation for DESCRIPTION, README, NEWS, and vignette references the following DOIs, which throws a NOTE but are a valid URL: + * https://doi.org/10.1111/j.1749-6632.2009.05333.x + * https://doi.org/10.2307/2223319 -* Some tests and examples for `messer()` and `powell_wiley()` functions require require a Census API key so they are skipped if NULL or not run +* Some tests and examples for `anthopolos()`, `bravo()`, `gini()`, `messer()` and `powell_wiley()` functions require require a Census API key so they are skipped if NULL or not run ## Test environments * local OS X install, R 4.2.1 diff --git a/inst/CITATION b/inst/CITATION index 3275d50..b335da0 100755 --- a/inst/CITATION +++ b/inst/CITATION @@ -14,6 +14,69 @@ citEntry(entry = "manual", "Accessed by: https://cran.r-project.org/package=ndi") ) +citEntry(entry = "Article", + title = "A spatial measure of neighborhood level racial isolation applied to low birthweight, preterm birth, and birthweight in North Carolina", + author = personList(as.person("Rebecca Anthopolos"), + as.person("Sherman A. James"), + as.person("Alan E. Gelfand"), + as.person("Marie Lynn Miranda")), + journal = "Spatial and Spatio-temporal Epidemiology", + year = "2011", + volume = "2", + number = "4", + pages = "235--246", + doi = "10.1016/j.sste.2011.06.002", + + textVersion = + paste("Rebecca Anthopolos, Sherman A. James, Alan E. Gelfand, Marie Lynn Miranda (2011).", + "A spatial measure of neighborhood level racial isolation applied to low birthweight, preterm birth, and birthweight in North Carolina.", + "Spatial and Spatio-temporal Epidemiology, 2(4), 235-246.", + "DOI: 10.1016/j.sste.2011.06.002"), + + header = "If you computed RI (Anthopolos) values, please also cite:" +) + +citEntry(entry = "Article", + title = "Assessing Disparity Using Measures of Racial and Educational Isolation", + author = personList(as.person("Mercedes A. Bravo"), + as.person("Man Chong Leong"), + as.person("Alan E. Gelfand"), + as.person("Marie Lynn Miranda")), + journal = "International Journal of Environmental Research and Public Health", + year = "2021", + volume = "18", + number = "17", + pages = "9384", + doi = "10.3390/ijerph18179384", + + textVersion = + paste("Mercedes A. Bravo, Man Chong Leong, Alan E. Gelfand, Marie Lynn Miranda (2021).", + "Assessing Disparity Using Measures of Racial and Educational Isolation.", + "International Journal of Environmental Research and Public Health, 18(17), 9384.", + "DOI: 10.3390/ijerph18179384"), + + header = "If you computed EI (Bravo) values, please also cite:" +) + +citEntry(entry = "Article", + title = "Measurement of Inequality of Incomes", + author = personList(as.person("Corrado Gini")), + journal = "The Economic Journal", + year = "1921", + volume = "31", + number = "121", + pages = "124-126", + doi = "10.2307/2223319", + + textVersion = + paste("Corrado Gini (1921).", + "Measurement of Inequality of Incomes.", + "The Economic Journal, 31(121), 124-126.", + "DOI: 10.2307/2223319"), + + header = "If you retrieved Gini Index values, please also cite:" +) + citEntry(entry = "Article", title = "The development of a standardized neighborhood deprivation index", author = personList(as.person("Lynne C. Messer"), diff --git a/man/anthopolos.Rd b/man/anthopolos.Rd new file mode 100644 index 0000000..56aa038 --- /dev/null +++ b/man/anthopolos.Rd @@ -0,0 +1,77 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/anthopolos.R +\name{anthopolos} +\alias{anthopolos} +\title{Racial Isolation Index based on Anthopolos et al. (2011)} +\usage{ +anthopolos(geo = "tract", year = 2020, subgroup, quiet = FALSE, ...) +} +\arguments{ +\item{geo}{Character string specifying the geography of the data either census tracts \code{geo = "tract"} (the default) or counties \code{geo = "county"}.} + +\item{year}{Numeric. The year to compute the estimate. The default is 2020 and the years between 2009 and 2020 are currently available.} + +\item{subgroup}{Character string specifying the racial/ethnic subgroup(s). See Details for available choices.} + +\item{quiet}{Logical. If TRUE, will display messages about potential missing census information. The default is FALSE.} + +\item{...}{Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics} +} +\value{ +An object of class 'list'. This is a named list with the following components: + +\describe{ +\item{\code{ri}}{An object of class 'tbl' for the GEOID, name, RI, and raw census values of specified census geographies.} +\item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute the RI.} +} +} +\description{ +Compute the Racial Isolation Index (Anthopolos) values for selected subgroup(s). +} +\details{ +This function will compute the Racial Isolation Index (RI) of U.S. census tracts or counties for a specified geographical extent (e.g., entire U.S. or a single state) based on Anthopolos et al. (2011) \doi{10.1016/j.sste.2011.06.002} who originally designed the metric for the racial isolation of non-Hispanic Black individuals. This function provides the computation of RI for any of the U.S. Census Bureau race/ethnicity subgroups (including Hispanic and non-Hispanic individuals). + +The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the geospatial computation. The yearly estimates available for 2009 through 2020 when ACS-5 data are available but are available from other U.S. Census Bureau surveys. The twenty racial/ethnic subgroups (U.S. Census Bureau definitions) are: +\itemize{ +\item{B03002_002: }{Not Hispanic or Latino "NHoL"} +\item{B03002_003: }{Not Hispanic or Latino, White alone "NHoLW"} +\item{B03002_004: }{Not Hispanic or Latino, Black or African American alone "NHoLB"} +\item{B03002_005: }{Not Hispanic or Latino, American Indian and Alaska Native alone "NHoLAIAN"} +\item{B03002_006: }{Not Hispanic or Latino, Asian alone "NHoLA"} +\item{B03002_007: }{Not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone "NHoLNHOPI"} +\item{B03002_008: }{Not Hispanic or Latino, Some other race alone "NHoLSOR"} +\item{B03002_009: }{Not Hispanic or Latino, Two or more races "NHoLTOMR"} +\item{B03002_010: }{Not Hispanic or Latino, Two races including Some other race "NHoLTRiSOR"} +\item{B03002_011: }{Not Hispanic or Latino, Two races excluding Some other race, and three or more races "NHoLTReSOR"} +\item{B03002_012: }{Hispanic or Latino "HoL"} +\item{B03002_013: }{Hispanic or Latino, White alone "HoLW"} +\item{B03002_014: }{Hispanic or Latino, Black or African American alone "HoLB"} +\item{B03002_015: }{Hispanic or Latino, American Indian and Alaska Native alone "HoLAIAN"} +\item{B03002_016: }{Hispanic or Latino, Asian alone "HoLA"} +\item{B03002_017: }{Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone "HoLNHOPI"} +\item{B03002_018: }{Hispanic or Latino, Some other race alone "HoLSOR"} +\item{B03002_019: }{Hispanic or Latino, Two or more races "HoLTOMR"} +\item{B03002_020: }{Hispanic or Latino, Two races including Some other race "HoLTRiSOR"} +\item{B03002_021: }{Hispanic or Latino, Two races excluding Some other race, and three or more races "HoLTReSOR"} +} + +Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. NOTE: Current version does not correct for edge effects (e.g., census geographies along the specified spatial extent border, coastline, or U.S.-Mexico / U.S.-Canada border) may have few neighboring census geographies and RI values in these census geographies may be unstable. A stop-gap solution for the former source of edge effect is to compute the RI for neighboring census geographies (i.e., the states bordering a study area of interest) and then use the estimates of the study area of interest. + +A census geography (and its neighbors) that has nearly all of its population who identify with the specified race/ethnicity subgroup(s) (e.g., non-Hispanic or Latino, Black or African American alone) will have an RI value that is close to 1. In contrast, a census geography (and its neighbors) that is nearly none of its population who identify with the specified race/ethnicity subgroup(s) (e.g., not non-Hispanic or Latino, Black or African American alone) will have an RI value that is close to 0. +} +\examples{ +\dontrun{ +# Wrapped in \dontrun{} because these examples require a Census API key. + + # Tract-level metric (2020) + anthopolos(geo = "tract", state = "GA", year = 2020, subgroup = c("NHoLB", "HoLB")) + + # County-level metric (2020) + anthopolos(geo = "county", state = "GA", year = 2020, subgroup = c("NHoLB", "HoLB")) + +} + +} +\seealso{ +\code{\link[tidycensus]{get_acs}} for additional arguments for geographic extent selection (i.e., \code{state} and \code{county}). +} diff --git a/man/bravo.Rd b/man/bravo.Rd new file mode 100644 index 0000000..5e3b228 --- /dev/null +++ b/man/bravo.Rd @@ -0,0 +1,62 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/bravo.R +\name{bravo} +\alias{bravo} +\title{Educational Isolation Index based on Bravo et al. (2021)} +\usage{ +bravo(geo = "tract", year = 2020, subgroup, quiet = FALSE, ...) +} +\arguments{ +\item{geo}{Character string specifying the geography of the data either census tracts \code{geo = "tract"} (the default) or counties \code{geo = "county"}.} + +\item{year}{Numeric. The year to compute the estimate. The default is 2020 and the years between 2009 and 2020 are currently available.} + +\item{subgroup}{Character string specifying the educational attainment category(ies). See Details for available choices.} + +\item{quiet}{Logical. If TRUE, will display messages about potential missing census information. The default is FALSE.} + +\item{...}{Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics} +} +\value{ +An object of class 'list'. This is a named list with the following components: + +\describe{ +\item{\code{ei}}{An object of class 'tbl' for the GEOID, name, EI, and raw census values of specified census geographies.} +\item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute the EI.} +} +} +\description{ +Compute the Educational Isolation Index (Bravo) values for selected educational attainment category(ies). +} +\details{ +This function will compute the Educational Isolation Index (EI) of U.S. census tracts or counties for a specified geographical extent (e.g., entire U.S. or a single state) based on Bravo et al. (2021) \doi{10.3390/ijerph18179384} who originally designed the metric for the educational isolation of individual without a college degree. This function provides the computation of EI for any of the U.S. Census Bureau educational attainment levels. + +The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey characteristics used for the geospatial computation. The yearly estimates available for 2009 through 2020 when ACS-5 data are available but are available from other U.S. Census Bureau surveys. The five educational attainment levels (U.S. Census Bureau definitions) are: +\itemize{ +\item{B06009_002: }{Less than high school graduate "LtHS"} +\item{B06009_003: }{High school graduate (includes equivalency) "HSGiE"} +\item{B06009_004: }{Some college or associate's degree "SCoAD"} +\item{B06009_005: }{Bachelor's degree "BD"} +\item{B06009_006: }{Graduate or professional degree "GoPD"} +} + +Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. NOTE: Current version does not correct for edge effects (e.g., census geographies along the specified spatial extent border, coastline, or U.S.-Mexico / U.S.-Canada border) may have few neighboring census geographies and EI values in these census geographies may be unstable. A stop-gap solution for the former source of edge effect is to compute the EI for neighboring census geographies (i.e., the states bordering a study area of interest) and then use the estimates of the study area of interest. + +A census geography (and its neighbors) that has nearly all of its population with the specified educational attainment category (e.g., a Bachelor's degree or more) will have an EI value that is close to 1. In contrast, a census geography (and its neighbors) that is nearly none of its population with the specified educational attainment category (e.g., less than a Bachelor's degree) will have an EI value that is close to 0. +} +\examples{ +\dontrun{ +# Wrapped in \dontrun{} because these examples require a Census API key. + + # Tract-level metric (2020) + bravo(geo = "tract", state = "GA", year = 2020, subgroup = c("LtHS", "HSGiE")) + + # County-level metric (2020) + bravo(geo = "county", state = "GA", year = 2020, subgroup = c("LtHS", "HSGiE")) + +} + +} +\seealso{ +\code{\link[tidycensus]{get_acs}} for additional arguments for geographic extent selection (i.e., \code{state} and \code{county}). +} diff --git a/man/figures/ei.png b/man/figures/ei.png new file mode 100644 index 0000000..6c87c56 Binary files /dev/null and b/man/figures/ei.png differ diff --git a/man/figures/gini.png b/man/figures/gini.png new file mode 100644 index 0000000..dd05541 Binary files /dev/null and b/man/figures/gini.png differ diff --git a/man/figures/ri.png b/man/figures/ri.png new file mode 100644 index 0000000..992b59e Binary files /dev/null and b/man/figures/ri.png differ diff --git a/man/gini.Rd b/man/gini.Rd new file mode 100644 index 0000000..c72b2fa --- /dev/null +++ b/man/gini.Rd @@ -0,0 +1,53 @@ +% Generated by roxygen2: do not edit by hand +% Please edit documentation in R/gini.R +\name{gini} +\alias{gini} +\title{Gini Index based on Gini (1921)} +\usage{ +gini(geo = "tract", year = 2020, quiet = FALSE, ...) +} +\arguments{ +\item{geo}{Character string specifying the geography of the data either census tracts \code{geo = "tract"} (the default) or counties \code{geo = "county"}.} + +\item{year}{Numeric. The year to compute the estimate. The default is 2020 and the years between 2009 and 2020 are currently available.} + +\item{quiet}{Logical. If TRUE, will display messages about potential missing census information} + +\item{...}{Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics} +} +\value{ +An object of class 'list'. This is a named list with the following components: + +\describe{ +\item{\code{gini}}{An object of class 'tbl' for the GEOID, name, and Gini index of specified census geographies.} +\item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for the Gini index.} +} +} +\description{ +Retrieve the Gini Index values. +} +\details{ +This function will retrieve the Gini Index of U.S. census tracts or counties for a specified geographical extent (e.g., entire U.S. or a single state) based on Gini (1921) \doi{10.2307/2223319}. + +The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey estimate of the Gini Index (ACS: B19083). The estimate are available for 2009 through 2020 when ACS-5 data are available but are available from other U.S. Census Bureau surveys. + +Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify geographic extent of the data output. + +According to the U.S. Census Bureau \url{https://www.census.gov/topics/income-poverty/income-inequality/about/metrics/gini-index.html}: "The Gini Index is a summary measure of income inequality. The Gini coefficient incorporates the detailed shares data into a single statistic, which summarizes the dispersion of income across the entire income distribution. The Gini coefficient ranges from 0, indicating perfect equality (where everyone receives an equal share), to 1, perfect inequality (where only one recipient or group of recipients receives all the income). The Gini is based on the difference between the Lorenz curve (the observed cumulative income distribution) and the notion of a perfectly equal income distribution." +} +\examples{ +\dontrun{ +# Wrapped in \dontrun{} because these examples require a Census API key. + + # Tract-level metric (2020) + gini(geo = "tract", state = "GA", year = 2020) + + # County-level metric (2020) + gini(geo = "county", state = "GA", year = 2020) + +} + +} +\seealso{ +\code{\link[tidycensus]{get_acs}} for additional arguments for geographic extent selection (i.e., \code{state} and \code{county}). +} diff --git a/man/messer.Rd b/man/messer.Rd index 7a5666c..c877335 100644 --- a/man/messer.Rd +++ b/man/messer.Rd @@ -21,7 +21,7 @@ messer(geo = "tract", year = 2020, imp = FALSE, quiet = FALSE, ...) An object of class 'list'. This is a named list with the following components: \describe{ -\item{\code{ndi}}{An object of class 'tbl' for the GEOID, name, NDI (standardized), NDI (quartile), and raw census values of specified census tracts.} +\item{\code{ndi}}{An object of class 'tbl' for the GEOID, name, NDI (standardized), NDI (quartile), and raw census values of specified census geographies.} \item{\code{pca}}{An object of class 'principal', returns the output of \code{\link[psych]{principal}} used to compute the NDI values.} \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute the NDI.} } diff --git a/man/ndi-package.Rd b/man/ndi-package.Rd index b2d2093..d23aa22 100644 --- a/man/ndi-package.Rd +++ b/man/ndi-package.Rd @@ -6,21 +6,27 @@ \alias{ndi} \title{The ndi Package: Neighborhood Deprivation Indices} \description{ -Computes various geospatial neighborhood deprivation indices in the United States +Computes various metrics of socio-economic deprivation and disparity in the United States based on information available from the U.S. Census Bureau. } \details{ -The 'ndi' package computes various neighborhood deprivation indices (NDI), including: (1) based on Messer et al. (2006) \doi{10.1007/s11524-006-9094-x} and (2) based on Andrews et al. (2020) \doi{10.1080/17445647.2020.1750066} and Slotman et al. (2022) \doi{10.1016/j.dib.2022.108002} who uses variables chosen by Roux and Mair (2010)] \doi{10.1111/j.1749-6632.2009.05333.x}. Both are a decomposition of multiple demographic characteristics from the U.S. Census Bureau American Community Survey 5-year estimates. +The 'ndi' package computes various metrics of socio-economic deprivation and disparity in the United States. Some metrics are considered "spatial" because they consider the values of neighboring (i.e., adjacent) census geographies in their computation, while other metrics are "aspatial" because they only consider the value within each census geography. Two types of aspatial neighborhood deprivation indices (NDI) are available: (1) based on Messer et al. (2006) \doi{10.1007/s11524-006-9094-x} and (2) based on Andrews et al. (2020) \doi{10.1080/17445647.2020.1750066} and Slotman et al. (2022) \doi{10.1016/j.dib.2022.108002} who use variables chosen by Roux and Mair (2010) \doi{10.1111/j.1749-6632.2009.05333.x}. Both are a decomposition of multiple demographic characteristics from the U.S. Census Bureau American Community Survey 5-year estimates. (ACS-5; 2010-2020). Using data from the ACS-5 (2009-2020), the package can also (1) compute the spatial Racial Isolation Index (RI) based on Anthopolos et al. (2011) \doi{10.1016/j.sste.2011.06.002}, (2) compute the spatial Educational Isolation Index (EI) based on Bravo et al. (2021) \doi{10.3390/ijerph18179384}, and (3) retrieve the aspatial Gini Index based on Gini (1921) \doi{10.2307/2223319}. Key content of the 'ndi' package include:\cr -\bold{Neighborhood Deprivation Indices} +\bold{Metrics of Socio-Economic Deprivation and Disparity} -\code{\link{messer}} Computes NDI values based on Messer et al. (2006) \doi{10.1007/s11524-006-9094-x}. +\code{\link{anthopolos}} Computes the spatial Racial Isolation Index (RI) based on Anthopolos (2011) \doi{10.1016/j.sste.2011.06.002}. -\code{\link{powell_wiley}} Computes NDI values based on Andrews et al. (2020) \doi{10.1080/17445647.2020.1750066} and Slotman et al. (2022) \doi{10.1016/j.dib.2022.108002} who uses variables chosen by Roux and Mair (2010) \doi{10.1111/j.1749-6632.2009.05333.x}. +\code{\link{bravo}} Computes the spatial Educational Isolation Index (EI) based on Bravo (2021) \doi{10.3390/ijerph18179384}. + +\code{\link{gini}} Retrieves the aspatial Gini Index based on Gini (1921) \doi{10.2307/2223319}. + +\code{\link{messer}} Computes the aspatial Neighborhood Deprivation Index (NDI) based on Messer et al. (2006) \doi{10.1007/s11524-006-9094-x}. + +\code{\link{powell_wiley}} Computes the aspatial Neighborhood Deprivation Index (NDI) based on Andrews et al. (2020) \doi{10.1080/17445647.2020.1750066} and Slotman et al. (2022) \doi{10.1016/j.dib.2022.108002} who use variables chosen by Roux and Mair (2010) \doi{10.1111/j.1749-6632.2009.05333.x}. } \section{Dependencies}{ - The 'ndi' package relies heavily upon \code{\link{tidycensus}} and \code{\link{psych}} for computing the neighborhood deprivation indices. The \code{\link{messer}} function builds upon code developed by Hruska et al. (2022) \doi{10.17605/OSF.IO/M2SAV} by fictionalizing, adding percent households earning <$30,000 per year to the NDI computation, and providing the option for computing the ACS-5 2006-2010 NDI values. There was no code companion to compute NDI included in Andrews et al. (2020) \doi{10.1080/17445647.2020.1750066} or Slotman et al. (2022) \doi{10.1016/j.dib.2022.108002}, but the package maintainer worked directly with the authors to replicate their SAS code in R. + The 'ndi' package relies heavily upon \code{\link{tidycensus}} and \code{\link{psych}} for computing the neighborhood deprivation indices. The \code{\link{messer}} function builds upon code developed by Hruska et al. (2022) \doi{10.17605/OSF.IO/M2SAV} by fictionalizing, adding percent households earning <$30,000 per year to the NDI computation, and providing the option for computing the ACS-5 2006-2010 NDI values. There is no code companion to compute NDI included in Andrews et al. (2020) \doi{10.1080/17445647.2020.1750066} or Slotman et al. (2022) \doi{10.1016/j.dib.2022.108002}, respectively, but the package maintainer worked directly with the authors to replicate their SAS code in R. The spatial metrics RI and EI rely on the \code{\link{sf}} and \code{\link{Matrix}} packages to compute the geospatial adjacency matrix between census geographies. There is no code companion to compute RI or EI included in Anthopolos et al. (2011) \doi{10.1016/j.sste.2011.06.002} or Bravo et al. (2021) \doi{10.3390/ijerph18179384}, respectively. } \author{ diff --git a/man/powell_wiley.Rd b/man/powell_wiley.Rd index afb663e..f3c2864 100644 --- a/man/powell_wiley.Rd +++ b/man/powell_wiley.Rd @@ -21,7 +21,7 @@ powell_wiley(geo = "tract", year, imp = FALSE, quiet = FALSE, ...) An object of class 'list'. This is a named list with the following components: \describe{ -\item{\code{ndi}}{An object of class 'tbl' for the GEOID, name, NDI continuous, NDI quintiles, and raw census values of specified census tracts.} +\item{\code{ndi}}{An object of class 'tbl' for the GEOID, name, NDI continuous, NDI quintiles, and raw census values of specified census geographies.} \item{\code{pca}}{An object of class 'principal', returns the output of \code{\link[psych]{principal}} used to compute the NDI values.} \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute the NDI.} \item{\code{cronbach}}{An object of class 'character' or 'numeric' for the results of the Cronbach's alpha calculation. If only one factor is computed, a message is returned. If more than one factor is computed, the Cronbach's alpha is calculated and should check that it is >0.7 for respectable internal consistency between factors.} diff --git a/tests/testthat/test-anthopolos.R b/tests/testthat/test-anthopolos.R new file mode 100644 index 0000000..8049e97 --- /dev/null +++ b/tests/testthat/test-anthopolos.R @@ -0,0 +1,40 @@ +context("anthopolos") + +####################### +# anthopolos testthat # +####################### + +test_that("anthopolos throws error with invalid arguments", { + + # Unavailable geography + expect_error(anthopolos(geo = "zcta", state = "DC", year = 2020, subgroup = "NHoLB", quiet = TRUE)) + + # Unavailable year + expect_error(anthopolos(state = "DC", year = 2005, subgroup = "NHoLB", quiet = TRUE)) + + # Unavailable subgroup + expect_error(anthopolos(state = "DC", year = 2020, subgroup = "terran", quiet = TRUE)) + + skip_if(Sys.getenv("CENSUS_API_KEY") == "") + + # Incorrect state + expect_error(anthopolos(state = "AB", year = 2020)) + + # Unavailable geography for DC (only 1 'county' in DC so, alone, NDI cannot be computed) + expect_error(anthopolos(geo = "county", state = "DC", year = 2009, quiet = TRUE)) + +} +) + +test_that("anthopolos works", { + + skip_if(Sys.getenv("CENSUS_API_KEY") == "") + + expect_message(anthopolos(state = "DC", year = 2020, subgroup = c("NHoLB", "HoLB"))) + + expect_silent(anthopolos(state = "DC", year = 2020, subgroup = "NHoLB", quiet = TRUE)) + + expect_silent(anthopolos(state = "DC", year = 2020, subgroup = c("NHoLB", "HoLB"), quiet = TRUE)) + +} +) diff --git a/tests/testthat/test-bravo.R b/tests/testthat/test-bravo.R new file mode 100644 index 0000000..204b256 --- /dev/null +++ b/tests/testthat/test-bravo.R @@ -0,0 +1,40 @@ +context("bravo") + +################## +# bravo testthat # +################## + +test_that("bravo throws error with invalid arguments", { + + # Unavailable geography + expect_error(bravo(geo = "zcta", state = "DC", year = 2020, subgroup = "LtHS", quiet = TRUE)) + + # Unavailable year + expect_error(bravo(state = "DC", year = 2005, subgroup = "LtHS", quiet = TRUE)) + + # Unavailable subgroup + expect_error(bravo(state = "DC", year = 2020, subgroup = "terran", quiet = TRUE)) + + skip_if(Sys.getenv("CENSUS_API_KEY") == "") + + # Incorrect state + expect_error(bravo(state = "AB", year = 2020)) + + # Unavailable geography for DC (only 1 'county' in DC so, alone, NDI cannot be computed) + expect_error(bravo(geo = "county", state = "DC", year = 2009, quiet = TRUE)) + +} +) + +test_that("bravo works", { + + skip_if(Sys.getenv("CENSUS_API_KEY") == "") + + expect_silent(bravo(state = "DC", year = 2020, subgroup = c("LtHS", "HSGiE"))) + + expect_silent(bravo(state = "DC", year = 2020, subgroup = "LtHS", quiet = TRUE)) + + expect_silent(bravo(state = "DC", year = 2020, subgroup = c("LtHS", "HSGiE"), quiet = TRUE)) + +} +) diff --git a/tests/testthat/test-gini.R b/tests/testthat/test-gini.R new file mode 100644 index 0000000..2116ac2 --- /dev/null +++ b/tests/testthat/test-gini.R @@ -0,0 +1,35 @@ +context("gini") + +################### +# gin i testthat # +################### + +test_that("gini throws error with invalid arguments", { + + # Unavailable geography + expect_error(gini(geo = "zcta", state = "DC", year = 2020, quiet = TRUE)) + + # Unavailable year + expect_error(gini(state = "DC", year = 2005, quiet = TRUE)) + + skip_if(Sys.getenv("CENSUS_API_KEY") == "") + + # Incorrect state + expect_error(gini(state = "AB", year = 2020)) + + # Unavailable geography for DC (only 1 'county' in DC so, alone, NDI cannot be computed) + expect_error(gini(geo = "county", state = "DC", year = 2009, quiet = TRUE)) + +} +) + +test_that("gini works", { + + skip_if(Sys.getenv("CENSUS_API_KEY") == "") + + expect_message(gini(state = "DC", year = 2020)) + + expect_silent(gini(state = "DC", year = 2020, quiet = TRUE)) + +} +) diff --git a/tests/testthat/test-messer.R b/tests/testthat/test-messer.R index ead3317..b36b952 100644 --- a/tests/testthat/test-messer.R +++ b/tests/testthat/test-messer.R @@ -7,10 +7,10 @@ context("messer") test_that("messer throws error with invalid arguments", { # Unavailable geography - expect_error(messer(geo = "zcta", state = "DC", year = 2009, quiet = TRUE)) + expect_error(messer(geo = "zcta", state = "DC", year = 2020, quiet = TRUE)) # Unavailable year - expect_error(messer(state = "DC", year = 2009, quiet = TRUE)) + expect_error(messer(state = "DC", year = 2005, quiet = TRUE)) skip_if(Sys.getenv("CENSUS_API_KEY") == "") diff --git a/tests/testthat/test-powell_wiley.R b/tests/testthat/test-powell_wiley.R index 2e3f4e8..1b6ac23 100644 --- a/tests/testthat/test-powell_wiley.R +++ b/tests/testthat/test-powell_wiley.R @@ -7,10 +7,10 @@ context("powell_wiley") test_that("powell_wiley throws error with invalid arguments", { # Unavailable geography - expect_error(powell_wiley(geo = "zcta", state = "DC", year = 2009, quiet = TRUE)) + expect_error(powell_wiley(geo = "zcta", state = "DC", year = 2020, quiet = TRUE)) # Unavailable year - expect_error(powell_wiley(state = "DC", year = 2009, quiet = TRUE)) + expect_error(powell_wiley(state = "DC", year = 2005, quiet = TRUE)) skip_if(Sys.getenv("CENSUS_API_KEY") == "") diff --git a/vignettes/vignette.Rmd b/vignettes/vignette.Rmd index db2a7af..ae7b81e 100644 --- a/vignettes/vignette.Rmd +++ b/vignettes/vignette.Rmd @@ -33,7 +33,7 @@ tidycensus::census_api_key(private_key) tidycensus::census_api_key("...") # INSERT YOUR OWN KEY FROM U.S. CENSUS API ``` -### Calculate NDI (Messer) +### Compute NDI (Messer) Compute the NDI (Messer) values (2006-2010 5-year ACS) for Georgia, U.S.A., census tracts. This metric is based on [Messer et al. (2006)](https://doi.org/10.1007/s11524-006-9094-x) with the following socio-economic status (SES) variables: @@ -102,7 +102,7 @@ ggplot2::ggplot() + ggplot2::labs(fill = "Index (Continuous)", caption = "Source: U.S. Census ACS 2006-2010 estimates") + ggplot2::ggtitle("Neighborhood Deprivation Index (Messer)", - subtitle = "Georgia, U.S.A., census tracts as the referent") + subtitle = "GA census tracts as the referent") ## Categorical Index ### Rename "9-NDI not avail" level as NA for plotting @@ -125,7 +125,7 @@ ggplot2::ggplot() + ggplot2::labs(fill = "Index (Categorical)", caption = "Source: U.S. Census ACS 2006-2010 estimates") + ggplot2::ggtitle("Neighborhood Deprivation Index (Messer) Quartiles", - subtitle = "Georgia, U.S.A., census tracts as the referent") + subtitle = "GA census tracts as the referent") ``` The results above are at the tract-level. The NDI (Messer) values can also be calculated at the county level. @@ -148,7 +148,7 @@ ggplot2::ggplot() + ggplot2::labs(fill = "Index (Continuous)", caption = "Source: U.S. Census ACS 2006-2010 estimates") + ggplot2::ggtitle("Neighborhood Deprivation Index (Messer)", - subtitle = "Georgia, U.S.A., counties as the referent") + subtitle = "GA counties as the referent") ## Categorical Index @@ -168,12 +168,12 @@ ggplot2::ggplot() + ggplot2::labs(fill = "Index (Categorical)", caption = "Source: U.S. Census ACS 2006-2010 estimates") + ggplot2::ggtitle("Neighborhood Deprivation Index (Messer) Quartiles", - subtitle = "Georgia, U.S.A., counties as the referent") + subtitle = "GA counties as the referent") ``` -### Calculate NDI (Powell-Wiley) +### Compute NDI (Powell-Wiley) -Compute the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for Maryland, Virginia, Washington, D.C., and West Virginia census tracts. This metric is based on [Andrews et al. (2020)](https://doi.org/10.1080/17445647.2020.1750066) and [Slotman et al. (2022)](https://doi.org/10.1016/j.dib.2022.108002) with socio-economic status (SES) variables chosen by [Roux and Mair (2010)](https://doi.org/10.1111/j.1749-6632.2009.05333.x): +Compute the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for Maryland, Virginia, Washington, D.C., and West Virginia, U.S.A., census tracts. This metric is based on [Andrews et al. (2020)](https://doi.org/10.1080/17445647.2020.1750066) and [Slotman et al. (2022)](https://doi.org/10.1016/j.dib.2022.108002) with socio-economic status (SES) variables chosen by [Roux and Mair (2010)](https://doi.org/10.1111/j.1749-6632.2009.05333.x): | Characteristic | SES dimension | ACS table source | Description | | -------------- | ------------- | ---------------- | ----------- | @@ -234,13 +234,13 @@ tract2020W <- tigris::tracts(state = "WV", year = 2020, cb = TRUE) tracts2020DMVW <- rbind(tract2020D, tract2020M, tract2020V, tract2020W) # Join the NDI (Powell-Wiley) values to the census tract geometry -DMVW2020powell_wiley <- merge(tracts2020DMVW, powell_wiley2020DMVW$ndi, by = "GEOID") +DMVW2020pw <- merge(tracts2020DMVW, powell_wiley2020DMVW$ndi, by = "GEOID") # Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS) -## Maryland, Virginia, Washington, D.C., and West Virginia census tracts +## Maryland, Virginia, Washington, D.C., and West Virginia, U.S.A., census tracts ## Continuous Index ggplot2::ggplot() + - ggplot2::geom_sf(data = DMVW2020powell_wiley, + ggplot2::geom_sf(data = DMVW2020pw, ggplot2::aes(fill = NDI), color = NA) + ggplot2::geom_sf(data = state2020DMVW, @@ -251,16 +251,16 @@ ggplot2::ggplot() + ggplot2::labs(fill = "Index (Continuous)", caption = "Source: U.S. Census ACS 2016-2020 estimates")+ ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley)", - subtitle = "Maryland, Virginia, Washington, D.C., and West Virginia tracts as the referent") + subtitle = "DC, MD, VA, and WV tracts as the referent") ## Categorical Index (Population-weighted quintiles) ### Rename "9-NDI not avail" level as NA for plotting -DMVW2020powell_wiley$NDIQuintNA <- factor(replace(as.character(DMVW2020powell_wiley$NDIQuint), - DMVW2020powell_wiley$NDIQuint == "9-NDI not avail", NA), - c(levels(DMVW2020powell_wiley$NDIQuint)[-6], NA)) +DMVW2020pw$NDIQuintNA <- factor(replace(as.character(DMVW2020pw$NDIQuint), + DMVW2020pw$NDIQuint == "9-NDI not avail", NA), + c(levels(DMVW2020pw$NDIQuint)[-6], NA)) ggplot2::ggplot() + - ggplot2::geom_sf(data = DMVW2020powell_wiley, + ggplot2::geom_sf(data = DMVW2020pw, ggplot2::aes(fill = NDIQuintNA), color = NA) + ggplot2::geom_sf(data = state2020DMVW, @@ -272,26 +272,28 @@ ggplot2::ggplot() + ggplot2::labs(fill = "Index (Categorical)", caption = "Source: U.S. Census ACS 2016-2020 estimates")+ ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Population-weighted Quintiles", - subtitle = "Maryland, Virginia, Washington, D.C., and West Virginia tracts as the referent") + subtitle = "DC, MD, VA, and WV tracts as the referent") ``` -Like the NDI (Messer), we also calculate county-level NDI (Powell-Wiley). +Like the NDI (Messer), we also compute county-level NDI (Powell-Wiley). ```{r powell_wiley_county, fig.height=4, fig.width=7} # Obtain the 2020 counties from the "tigris" package -county2010DMVW <- tigris::counties(state = c("DC", "MD", "VA", "WV"), year = 2020, cb = TRUE) +county2020DMVW <- tigris::counties(state = c("DC", "MD", "VA", "WV"), year = 2020, cb = TRUE) # NDI (Powell-Wiley) at the county level (2016-2020) -powell_wiley2020DMVW_county <- ndi::powell_wiley(geo = "county", state = c("DC", "MD", "VA", "WV"), year = 2020) +powell_wiley2020DMVW_county <- ndi::powell_wiley(geo = "county", + state = c("DC", "MD", "VA", "WV"), + year = 2020) # Join the NDI (Powell-Wiley) values to the county geometry -DMVW2020powell_wiley_county <- merge(county2010DMVW, powell_wiley2020DMVW_county$ndi, by = "GEOID") +DMVW2020pw_county <- merge(county2020DMVW, powell_wiley2020DMVW_county$ndi, by = "GEOID") # Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS) -## Maryland, Virginia, Washington, D.C., and West Virginia counties +## Maryland, Virginia, Washington, D.C., and West Virginia, U.S.A., counties ## Continuous Index ggplot2::ggplot() + - ggplot2::geom_sf(data = DMVW2020powell_wiley_county, + ggplot2::geom_sf(data = DMVW2020pw_county, ggplot2::aes(fill = NDI), size = 0.20, color = "white") + @@ -300,17 +302,17 @@ ggplot2::ggplot() + ggplot2::labs(fill = "Index (Continuous)", caption = "Source: U.S. Census ACS 2016-2020 estimates") + ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley)", - subtitle = "Maryland, Virginia, Washington, D.C., and West Virginia counties as the referent") + subtitle = "DC, MD, VA, and WV counties as the referent") ## Categorical Index ### Rename "9-NDI not avail" level as NA for plotting -DMVW2020powell_wiley_county$NDIQuintNA <- factor(replace(as.character(DMVW2020powell_wiley_county$NDIQuint), - DMVW2020powell_wiley_county$NDIQuint == "9-NDI not avail", NA), - c(levels(DMVW2020powell_wiley_county$NDIQuint)[-6], NA)) +DMVW2020pw_county$NDIQuintNA <- factor(replace(as.character(DMVW2020pw_county$NDIQuint), + DMVW2020pw_county$NDIQuint == "9-NDI not avail", NA), + c(levels(DMVW2020pw_county$NDIQuint)[-6], NA)) ggplot2::ggplot() + - ggplot2::geom_sf(data = DMVW2020powell_wiley_county, + ggplot2::geom_sf(data = DMVW2020pw_county, ggplot2::aes(fill = NDIQuint), size = 0.20, color = "white") + @@ -320,7 +322,7 @@ ggplot2::ggplot() + ggplot2::labs(fill = "Index (Categorical)", caption = "Source: U.S. Census ACS 2016-2020 estimates") + ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Population-weighted Quintiles", - subtitle = "Maryland, Virginia, Washington, D.C., and West Virginia counties as the referent") + subtitle = "DC, MD, VA, and WV counties as the referent") ``` ### Advanced Features @@ -333,20 +335,20 @@ In the `messer()` and `powell_wiley()` functions, missing census characteristics powell_wiley2020DC <- ndi::powell_wiley(state = "DC", year = 2020) # without imputation powell_wiley2020DCi <- ndi::powell_wiley(state = "DC", year = 2020, imp = TRUE) # with imputation -table(is.na(powell_wiley2020DC$ndi$NDI)) # n=2 tracts without NDI (Powell-Wiley) values +table(is.na(powell_wiley2020DC$ndi$NDI)) # n=13 tracts without NDI (Powell-Wiley) values table(is.na(powell_wiley2020DCi$ndi$NDI)) # n=0 tracts without NDI (Powell-Wiley) values # Obtain the 2020 census tracts from the "tigris" package tract2020DC <- tigris::tracts(state = "DC", year = 2020, cb = TRUE) # Join the NDI (Powell-Wiley) values to the census tract geometry -DC2020powell_wiley <- merge(tract2020DC, powell_wiley2020DC$ndi, by = "GEOID") -DC2020powell_wiley <- merge(DC2020powell_wiley, powell_wiley2020DCi$ndi, by = "GEOID", suffixes = c("_nonimp", "_imp")) +DC2020pw <- merge(tract2020DC, powell_wiley2020DC$ndi, by = "GEOID") +DC2020pw <- merge(DC2020pw, powell_wiley2020DCi$ndi, by = "GEOID", suffixes = c("_nonimp", "_imp")) -# Visualize the NDI (Powell-Wiley) values (2006-2010 5-year ACS) for Washington, D.C., census tracts +# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for Washington, D.C., census tracts ## Continuous Index ggplot2::ggplot() + - ggplot2::geom_sf(data = DC2020powell_wiley, + ggplot2::geom_sf(data = DC2020pw, ggplot2::aes(fill = NDI_nonimp), size = 0.2, color = "white") + @@ -355,10 +357,10 @@ ggplot2::ggplot() + ggplot2::labs(fill = "Index (Continuous)", caption = "Source: U.S. Census ACS 2016-2020 estimates") + ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley), Non-Imputed", - subtitle = "Washington, D.C., census tracts as the referent") + subtitle = "DC census tracts as the referent") ggplot2::ggplot() + - ggplot2::geom_sf(data = DC2020powell_wiley, + ggplot2::geom_sf(data = DC2020pw, ggplot2::aes(fill = NDI_imp), size = 0.2, color = "white") + @@ -367,16 +369,16 @@ ggplot2::ggplot() + ggplot2::labs(fill = "Index (Continuous)", caption = "Source: U.S. Census ACS 2016-2020 estimates") + ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley), Imputed", - subtitle = "Washington, D.C., census tracts as the referent") + subtitle = "DC census tracts as the referent") ## Categorical Index ### Rename "9-NDI not avail" level as NA for plotting -DC2020powell_wiley$NDIQuintNA_nonimp <- factor(replace(as.character(DC2020powell_wiley$NDIQuint_nonimp), - DC2020powell_wiley$NDIQuint_nonimp == "9-NDI not avail", NA), - c(levels(DC2020powell_wiley$NDIQuint_nonimp)[-6], NA)) +DC2020pw$NDIQuintNA_nonimp <- factor(replace(as.character(DC2020pw$NDIQuint_nonimp), + DC2020pw$NDIQuint_nonimp == "9-NDI not avail", NA), + c(levels(DC2020pw$NDIQuint_nonimp)[-6], NA)) ggplot2::ggplot() + - ggplot2::geom_sf(data = DC2020powell_wiley, + ggplot2::geom_sf(data = DC2020pw, ggplot2::aes(fill = NDIQuintNA_nonimp), size = 0.2, color = "white") + @@ -386,15 +388,15 @@ ggplot2::ggplot() + ggplot2::labs(fill = "Index (Categorical)", caption = "Source: U.S. Census ACS 2016-2020 estimates") + ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Quintiles, Non-Imputed", - subtitle = "Washington, D.C., census tracts as the referent") + subtitle = "DC census tracts as the referent") ### Rename "9-NDI not avail" level as NA for plotting -DC2020powell_wiley$NDIQuintNA_imp <- factor(replace(as.character(DC2020powell_wiley$NDIQuint_imp), - DC2020powell_wiley$NDIQuint_imp == "9-NDI not avail", NA), - c(levels(DC2020powell_wiley$NDIQuint_imp)[-6], NA)) +DC2020pw$NDIQuintNA_imp <- factor(replace(as.character(DC2020pw$NDIQuint_imp), + DC2020pw$NDIQuint_imp == "9-NDI not avail", NA), + c(levels(DC2020pw$NDIQuint_imp)[-6], NA)) ggplot2::ggplot() + - ggplot2::geom_sf(data = DC2020powell_wiley, + ggplot2::geom_sf(data = DC2020pw, ggplot2::aes(fill = NDIQuintNA_imp), size = 0.2, color = "white") + @@ -404,7 +406,7 @@ ggplot2::ggplot() + ggplot2::labs(fill = "Index (Categorical)", caption = "Source: U.S. Census ACS 2016-2020 estimates") + ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Quintiles, Imputed", - subtitle = "Washington, D.C., census tracts as the referent") + subtitle = "DC census tracts as the referent") ``` #### Assign the referent (U.S.-Standardized Metric) @@ -429,9 +431,203 @@ ggplot2::ggplot(powell_wiley2017US$ndi, ggplot2::aes(x = NDI)) + ggplot2::geom_histogram(color = "black", fill = "white") + ggplot2::theme_minimal() + ggplot2::ggtitle("Histogram of US-standardized NDI (Powell-Wiley) values (2013-2017)", - subtitle = "U.S. census tracts as the referent (including Alaska, Hawaii, and Washington, D.C.)") + subtitle = "U.S. census tracts as the referent (including AK, HI, and DC)") +``` +The process to compute a US-standardized NDI (Powell-Wiley) took about `r round(time_srr, digits = 1)` minutes to run on a machine with the features listed at the end of the vignette. + +### Additional metrics socio-economic deprivation and disparity + +Since version v0.1.1, the `ndi` package can compute additional metrics of socio-economic deprivation and disparity beyond neighborhood deprivation indices, including: + +1. `anthopolos()` function that computes the Racial Isolation Index (RI) based on [Anthopolos et al. (2011)](https://www.doi.org/10.1016/j.sste.2011.06.002) with data from the ACS-5. +2. `bravo()` function that computes the Educational Isolation Index (EI) based on [Bravo et al. (2021)](https://www.doi.org/10.3390/ijerph18179384) with data from the ACS-5. +3. `gini()` function that retrieves the Gini Index based on [Gini (1921)](https://www.doi.org/10.2307/2223319) from the ACS-5. + +#### Compute Racial Isolation Index (RI) + +Compute the RI (Anthopolos) values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts. This metric is based on [Anthopolos et al. (2011)](https://www.doi.org/10.1016/j.sste.2011.06.002) that assessed the racial isolation of the population that identifies as non-Hispanic or Latino, Black or African American alone. Multiple racial/ethnic subgroups are available in the `anthopolos()` function, including: + +| ACS table source | racial/ethnic subgroup | character for `subgroup` argument | +| -------------- | ------------- | ---------------- | +| B03002_002 | Not Hispanic or Latino | NHoL | +| B03002_003 | Not Hispanic or Latino, White alone | NHoLW | +| B03002_004 | Not Hispanic or Latino, Black or African American alone | NHoLB | +| B03002_005 | Not Hispanic or Latino, American Indian and Alaska Native alone | NHoLAIAN | +| B03002_006 | Not Hispanic or Latino, Asian alone | NHoLA | +| B03002_007 | Not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone | NHoLNHOPI | +| B03002_008 | Not Hispanic or Latino, Some other race alone | NHoLSOR | +| B03002_009 | Not Hispanic or Latino, Two or more races | NHoLTOMR | +| B03002_010 | Not Hispanic or Latino, Two races including Some other race | NHoLTRiSOR | +| B03002_011 | Not Hispanic or Latino, Two races excluding Some other race, and three or more races | NHoLTReSOR | +| B03002_012 | Hispanic or Latino | HoL | +| B03002_013 | Hispanic or Latino, White alone | HoLW | +| B03002_014 | Hispanic or Latino, Black or African American alone | HoLB | +| B03002_015 | Hispanic or Latino, American Indian and Alaska Native alone | HoLAIAN | +| B03002_016 | Hispanic or Latino, Asian alone | HoLA | +| B03002_017 | Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone | HoLNHOPI | +| B03002_018 | Hispanic or Latino, Some other race alone | HoLSOR | +| B03002_019 | Hispanic or Latino, Two or more races | HoLTOMR | +| B03002_020 | Hispanic or Latino, Two races including Some other race | HoLTRiSOR | +| B03002_021 | Hispanic or Latino, Two races excluding Some other race, and three or more races | HoLTReSOR | + +A census geography (and its neighbors) that has nearly all of its population who identify with the specified race/ethnicity subgroup(s) (e.g., Not Hispanic or Latino, Black or African American alone) will have an RI value that is close to 1. In contrast, a census geography (and its neighbors) that is nearly none of its population who identify with the specified race/ethnicity subgroup(s) (e.g., not Not Hispanic or Latino, Black or African American alone) will have an RI value that is close to 0. + +```{r anthopolos, fig.height=4, fig.width=7} +anthopolos2010NC <- ndi::anthopolos(state = "NC", year = 2010, subgroup = "NHoLB") + +# Obtain the 2010 census tracts from the "tigris" package +tract2010NC <- tigris::tracts(state = "NC", year = 2010, cb = TRUE) +# Remove first 9 characters from GEOID for compatibility with tigris information +tract2010NC$GEOID <- substring(tract2010NC$GEO_ID, 10) + +# Obtain the 2010 counties from the "tigris" package +county2010NC <- tigris::counties(state = "NC", year = 2010, cb = TRUE) + +# Join the RI (Anthopolos) values to the census tract geometry +NC2010anthopolos <- merge(tract2010NC, anthopolos2010NC$ri, by = "GEOID") + +# Visualize the RI (Anthopolos) values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts +ggplot2::ggplot() + + ggplot2::geom_sf(data = NC2010anthopolos, + ggplot2::aes(fill = RI), + size = 0.05, + color = "white") + + ggplot2::geom_sf(data = county2010NC, + fill = "transparent", + color = "white", + size = 0.2) + + ggplot2::theme_minimal() + + ggplot2::scale_fill_viridis_c() + + ggplot2::labs(fill = "Index (Continuous)", + caption = "Source: U.S. Census ACS 2006-2010 estimates") + + ggplot2::ggtitle("Racial Isolation Index (Anthopolos), non-Hispanic Black", + subtitle = "NC census tracts (not corrected for edge effects)") +``` + +Current version of `ndi` package does not correct for edge effects (e.g., census geographies along the specified spatial extent border, coastline, or U.S.-Mexico / U.S.-Canada border) may have few neighboring census geographies and RI values in these census geographies may be unstable. A stop-gap solution for the former source of edge effect is to compute the RI for neighboring census geographies (i.e., the states bordering a study area of interest) and then use the estimates of the study area of interest. + +```{r anthopolos_edge, fig.height=4, fig.width=7} +# Compute RI for all census tracts in neighboring states +anthopolos2010GNSTV <- ndi::anthopolos(state = c("GA", "NC", "SC", "TN", "VA"), + year = 2010, subgroup = "NHoLB") + +# Crop to only North Carolina, U.S.A. census tracts +anthopolos2010NCe <- anthopolos2010GNSTV$ri[anthopolos2010GNSTV$ri$GEOID %in% anthopolos2010NC$ri$GEOID, ] + +# Obtain the 2010 census tracts from the "tigris" package +tract2010NC <- tigris::tracts(state = "NC", year = 2010, cb = TRUE) +# Remove first 9 characters from GEOID for compatibility with tigris information +tract2010NC$GEOID <- substring(tract2010NC$GEO_ID, 10) + +# Obtain the 2010 counties from the "tigris" package +county2010NC <- tigris::counties(state = "NC", year = 2010, cb = TRUE) + +# Join the RI (Anthopolos) values to the census tract geometry +edgeNC2010anthopolos <- merge(tract2010NC, anthopolos2010NCe, by = "GEOID") + +# Visualize the RI (Anthopolos) values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts +ggplot2::ggplot() + + ggplot2::geom_sf(data = edgeNC2010anthopolos, + ggplot2::aes(fill = RI), + size = 0.05, + color = "white") + + ggplot2::geom_sf(data = county2010NC, + fill = "transparent", + color = "white", + size = 0.2) + + ggplot2::theme_minimal() + + ggplot2::scale_fill_viridis_c() + + ggplot2::labs(fill = "Index (Continuous)", + caption = "Source: U.S. Census ACS 2006-2010 estimates") + + ggplot2::ggtitle("Racial Isolation Index (Anthopolos), non-Hispanic Black", + subtitle = "NC census tracts (corrected for interstate edge effects)") +``` + +#### Compute Educational Isolation Index (EI) + +Compute the EI (Bravo) values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts. This metric is based on [Bravo et al. (2021)](https://www.doi.org/10.3390/ijerph18179384) that assessed the educational isolation of the population without a four-year college degree. Multiple educational attainment categories are available in the `bravo()` function, including: + +| ACS table source | educational attainment category | character for `subgroup` argument | +| -------------- | ------------- | ---------------- | +| B06009_002 | Less than high school graduate | LtHS | +| B06009_003 | High school graduate (includes equivalency) | HSGiE | +| B06009_004 | Some college or associate's degree | SCoAD | +| B06009_005 | Bachelor's degree | BD | +| B06009_006 | Graduate or professional degree | GoPD | + +A census geography (and its neighbors) that has nearly all of its population with the specified educational attainment category (e.g., a four-year college degree or more) will have an EI value that is close to 1. In contrast, a census geography (and its neighbors) that is nearly none of its population with the specified educational attainment category (e.g., with a four-year college degree) will have an EI value that is close to 0. + +```{r bravo, fig.height=4, fig.width=7} +bravo2010NC <- ndi::bravo(state = "NC", year = 2010, subgroup = c("LtHS", "HSGiE", "SCoAD")) + +# Obtain the 2010 census tracts from the "tigris" package +tract2010NC <- tigris::tracts(state = "NC", year = 2010, cb = TRUE) +# Remove first 9 characters from GEOID for compatibility with tigris information +tract2010NC$GEOID <- substring(tract2010NC$GEO_ID, 10) + +# Obtain the 2010 counties from the "tigris" package +county2010NC <- tigris::counties(state = "NC", year = 2010, cb = TRUE) + +# Join the RI (Bravo) values to the census tract geometry +NC2010bravo <- merge(tract2010NC, bravo2010NC$ei, by = "GEOID") + +# Visualize the RI (Bravo) values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts +ggplot2::ggplot() + + ggplot2::geom_sf(data = NC2010bravo, + ggplot2::aes(fill = EI), + size = 0.05, + color = "white") + + ggplot2::geom_sf(data = county2010NC, + fill = "transparent", + color = "white", + size = 0.2) + + ggplot2::theme_minimal() + + ggplot2::scale_fill_viridis_c() + + ggplot2::labs(fill = "Index (Continuous)", + caption = "Source: U.S. Census ACS 2006-2010 estimates") + + ggplot2::ggtitle("Educational Isolation Index (Bravo), without a four-year college degree", + subtitle = "NC census tracts (not corrected for edge effects)") +``` + +Can correct for one source of edge effect in the same manner as shown for the RI (Anthopolos) metric. + +#### Retrieve the Educational Isolation Index (EI) + +Retrieve the Gini Index values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts. This metric is based on [Gini (1921)](https://www.doi.org/10.2307/2223319) and the `gini()` function retrieve the estimate from the ACS-5. + +According to the [U.S. Census Bureau](https://www.census.gov/topics/income-poverty/income-inequality/about/metrics/gini-index.html): "The Gini Index is a summary measure of income inequality. The Gini coefficient incorporates the detailed shares data into a single statistic, which summarizes the dispersion of income across the entire income distribution. The Gini coefficient ranges from 0, indicating perfect equality (where everyone receives an equal share), to 1, perfect inequality (where only one recipient or group of recipients receives all the income). The Gini is based on the difference between the Lorenz curve (the observed cumulative income distribution) and the notion of a perfectly equal income distribution." + +```{r gini, fig.height=4, fig.width=7} +gini2010NC <- ndi::gini(state = "NC", year = 2010) + +# Obtain the 2010 census tracts from the "tigris" package +tract2010NC <- tigris::tracts(state = "NC", year = 2010, cb = TRUE) +# Remove first 9 characters from GEOID for compatibility with tigris information +tract2010NC$GEOID <- substring(tract2010NC$GEO_ID, 10) + +# Obtain the 2010 counties from the "tigris" package +county2010NC <- tigris::counties(state = "NC", year = 2010, cb = TRUE) + +# Join the Gini Index values to the census tract geometry +NC2010gini <- merge(tract2010NC, gini2010NC$gini, by = "GEOID") + +# Visualize the Gini Index values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts +ggplot2::ggplot() + + ggplot2::geom_sf(data = NC2010gini, + ggplot2::aes(fill = gini), + size = 0.05, + color = "white") + + ggplot2::geom_sf(data = county2010NC, + fill = "transparent", + color = "white", + size = 0.2) + + ggplot2::theme_minimal() + + ggplot2::scale_fill_viridis_c() + + ggplot2::labs(fill = "Index (Continuous)", + caption = "Source: U.S. Census ACS 2006-2010 estimates") + + ggplot2::ggtitle("Gini Index", + subtitle = "NC census tracts") ``` -The process to compute a US-standardized NDI (Powell-Wiley) took about `r round(time_srr, digits = 1)` minutes to run on an machine with the following features: ```{r system} sessionInfo() diff --git a/vignettes/vignette.html b/vignettes/vignette.html index 099a91e..9c9cf81 100644 --- a/vignettes/vignette.html +++ b/vignettes/vignette.html @@ -12,7 +12,7 @@ - + ndi: Neighborhood Deprivation Indices @@ -334,7 +334,7 @@

ndi: Neighborhood Deprivation Indices

Ian D. Buller (GitHub: @idblr)

-

2022-08-04

+

2022-08-14

@@ -352,8 +352,8 @@

2022-08-04

powell_wiley() functions (see an example of the latter below).

tidycensus::census_api_key("...") # INSERT YOUR OWN KEY FROM U.S. CENSUS API
-
-

Calculate NDI (Messer)

+
+

Compute NDI (Messer)

Compute the NDI (Messer) values (2006-2010 5-year ACS) for Georgia, U.S.A., census tracts. This metric is based on Messer et al. (2006) with the following socio-economic status (SES) @@ -522,7 +522,7 @@

Calculate NDI (Messer)

ggplot2::labs(fill = "Index (Continuous)", caption = "Source: U.S. Census ACS 2006-2010 estimates") + ggplot2::ggtitle("Neighborhood Deprivation Index (Messer)", - subtitle = "Georgia, U.S.A., census tracts as the referent") + subtitle = "GA census tracts as the referent") ## Categorical Index ### Rename "9-NDI not avail" level as NA for plotting @@ -545,8 +545,8 @@

Calculate NDI (Messer)

ggplot2::labs(fill = "Index (Categorical)", caption = "Source: U.S. Census ACS 2006-2010 estimates") + ggplot2::ggtitle("Neighborhood Deprivation Index (Messer) Quartiles", - subtitle = "Georgia, U.S.A., census tracts as the referent")
-

+ subtitle = "GA census tracts as the referent")
+

The results above are at the tract-level. The NDI (Messer) values can also be calculated at the county level.

messer2010GA_county <- ndi::messer(geo = "county", state = "GA", year = 2010)
@@ -566,7 +566,7 @@ 

Calculate NDI (Messer)

ggplot2::labs(fill = "Index (Continuous)", caption = "Source: U.S. Census ACS 2006-2010 estimates") + ggplot2::ggtitle("Neighborhood Deprivation Index (Messer)", - subtitle = "Georgia, U.S.A., counties as the referent") + subtitle = "GA counties as the referent") ## Categorical Index @@ -586,14 +586,14 @@

Calculate NDI (Messer)

ggplot2::labs(fill = "Index (Categorical)", caption = "Source: U.S. Census ACS 2006-2010 estimates") + ggplot2::ggtitle("Neighborhood Deprivation Index (Messer) Quartiles", - subtitle = "Georgia, U.S.A., counties as the referent")
-

+ subtitle = "GA counties as the referent") +

-
-

Calculate NDI (Powell-Wiley)

+
+

Compute NDI (Powell-Wiley)

Compute the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for -Maryland, Virginia, Washington, D.C., and West Virginia census tracts. -This metric is based on Andrews et +Maryland, Virginia, Washington, D.C., and West Virginia, U.S.A., census +tracts. This metric is based on Andrews et al. (2020) and Slotman et al. (2022) with socio-economic status (SES) variables chosen by Roux and Mair (2010):

@@ -841,13 +841,13 @@

Calculate NDI (Powell-Wiley)

tracts2020DMVW <- rbind(tract2020D, tract2020M, tract2020V, tract2020W) # Join the NDI (Powell-Wiley) values to the census tract geometry -DMVW2020powell_wiley <- merge(tracts2020DMVW, powell_wiley2020DMVW$ndi, by = "GEOID") +DMVW2020pw <- merge(tracts2020DMVW, powell_wiley2020DMVW$ndi, by = "GEOID") # Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS) -## Maryland, Virginia, Washington, D.C., and West Virginia census tracts +## Maryland, Virginia, Washington, D.C., and West Virginia, U.S.A., census tracts ## Continuous Index ggplot2::ggplot() + - ggplot2::geom_sf(data = DMVW2020powell_wiley, + ggplot2::geom_sf(data = DMVW2020pw, ggplot2::aes(fill = NDI), color = NA) + ggplot2::geom_sf(data = state2020DMVW, @@ -858,16 +858,16 @@

Calculate NDI (Powell-Wiley)

ggplot2::labs(fill = "Index (Continuous)", caption = "Source: U.S. Census ACS 2016-2020 estimates")+ ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley)", - subtitle = "Maryland, Virginia, Washington, D.C., and West Virginia tracts as the referent") + subtitle = "DC, MD, VA, and WV tracts as the referent") ## Categorical Index (Population-weighted quintiles) ### Rename "9-NDI not avail" level as NA for plotting -DMVW2020powell_wiley$NDIQuintNA <- factor(replace(as.character(DMVW2020powell_wiley$NDIQuint), - DMVW2020powell_wiley$NDIQuint == "9-NDI not avail", NA), - c(levels(DMVW2020powell_wiley$NDIQuint)[-6], NA)) +DMVW2020pw$NDIQuintNA <- factor(replace(as.character(DMVW2020pw$NDIQuint), + DMVW2020pw$NDIQuint == "9-NDI not avail", NA), + c(levels(DMVW2020pw$NDIQuint)[-6], NA)) ggplot2::ggplot() + - ggplot2::geom_sf(data = DMVW2020powell_wiley, + ggplot2::geom_sf(data = DMVW2020pw, ggplot2::aes(fill = NDIQuintNA), color = NA) + ggplot2::geom_sf(data = state2020DMVW, @@ -879,54 +879,56 @@

Calculate NDI (Powell-Wiley)

ggplot2::labs(fill = "Index (Categorical)", caption = "Source: U.S. Census ACS 2016-2020 estimates")+ ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Population-weighted Quintiles", - subtitle = "Maryland, Virginia, Washington, D.C., and West Virginia tracts as the referent")
-

-

Like the NDI (Messer), we also calculate county-level NDI + subtitle = "DC, MD, VA, and WV tracts as the referent")

+

+

Like the NDI (Messer), we also compute county-level NDI (Powell-Wiley).

# Obtain the 2020 counties from the "tigris" package
-county2010DMVW <- tigris::counties(state = c("DC", "MD", "VA", "WV"), year = 2020, cb = TRUE)
+county2020DMVW <- tigris::counties(state = c("DC", "MD", "VA", "WV"), year = 2020, cb = TRUE)
 
 # NDI (Powell-Wiley) at the county level (2016-2020)
-powell_wiley2020DMVW_county <- ndi::powell_wiley(geo = "county", state = c("DC", "MD", "VA", "WV"), year = 2020)
-
-# Join the NDI (Powell-Wiley) values to the county geometry
-DMVW2020powell_wiley_county <- merge(county2010DMVW, powell_wiley2020DMVW_county$ndi, by = "GEOID")
-
-# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS)
-## Maryland, Virginia, Washington, D.C., and West Virginia counties
-## Continuous Index
-ggplot2::ggplot() + 
-  ggplot2::geom_sf(data = DMVW2020powell_wiley_county, 
-                   ggplot2::aes(fill = NDI),
-                   size = 0.20,
-                   color = "white") +
-  ggplot2::theme_minimal() + 
-  ggplot2::scale_fill_viridis_c() +
-  ggplot2::labs(fill = "Index (Continuous)",
-                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
-  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley)",
-                   subtitle = "Maryland, Virginia, Washington, D.C., and West Virginia counties as the referent")
-
-## Categorical Index
+powell_wiley2020DMVW_county <- ndi::powell_wiley(geo = "county",
+                                                 state = c("DC", "MD", "VA", "WV"),
+                                                 year = 2020)
+
+# Join the NDI (Powell-Wiley) values to the county geometry
+DMVW2020pw_county <- merge(county2020DMVW, powell_wiley2020DMVW_county$ndi, by = "GEOID")
+
+# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS)
+## Maryland, Virginia, Washington, D.C., and West Virginia, U.S.A., counties
+## Continuous Index
+ggplot2::ggplot() + 
+  ggplot2::geom_sf(data = DMVW2020pw_county, 
+                   ggplot2::aes(fill = NDI),
+                   size = 0.20,
+                   color = "white") +
+  ggplot2::theme_minimal() + 
+  ggplot2::scale_fill_viridis_c() +
+  ggplot2::labs(fill = "Index (Continuous)",
+                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
+  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley)",
+                   subtitle = "DC, MD, VA, and WV counties as the referent")
 
-### Rename "9-NDI not avail" level as NA for plotting
-DMVW2020powell_wiley_county$NDIQuintNA <- factor(replace(as.character(DMVW2020powell_wiley_county$NDIQuint), 
-                                            DMVW2020powell_wiley_county$NDIQuint == "9-NDI not avail", NA),
-                                         c(levels(DMVW2020powell_wiley_county$NDIQuint)[-6], NA))
-
-ggplot2::ggplot() + 
-  ggplot2::geom_sf(data = DMVW2020powell_wiley_county, 
-                   ggplot2::aes(fill = NDIQuint),
-                   size = 0.20,
-                   color = "white") +
-  ggplot2::theme_minimal() + 
-  ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE),
-                                na.value = "grey80") +
-  ggplot2::labs(fill = "Index (Categorical)",
-                caption = "Source: U.S. Census ACS 2016-2020 estimates") +
-  ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Population-weighted Quintiles",
-                   subtitle = "Maryland, Virginia, Washington, D.C., and West Virginia counties as the referent")
-

+## Categorical Index + +### Rename "9-NDI not avail" level as NA for plotting +DMVW2020pw_county$NDIQuintNA <- factor(replace(as.character(DMVW2020pw_county$NDIQuint), + DMVW2020pw_county$NDIQuint == "9-NDI not avail", NA), + c(levels(DMVW2020pw_county$NDIQuint)[-6], NA)) + +ggplot2::ggplot() + + ggplot2::geom_sf(data = DMVW2020pw_county, + ggplot2::aes(fill = NDIQuint), + size = 0.20, + color = "white") + + ggplot2::theme_minimal() + + ggplot2::scale_fill_viridis_d(guide = ggplot2::guide_legend(reverse = TRUE), + na.value = "grey80") + + ggplot2::labs(fill = "Index (Categorical)", + caption = "Source: U.S. Census ACS 2016-2020 estimates") + + ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Population-weighted Quintiles", + subtitle = "DC, MD, VA, and WV counties as the referent") +

Advanced Features

@@ -944,20 +946,20 @@

Imputing missing census variables

powell_wiley2020DC <- ndi::powell_wiley(state = "DC", year = 2020) # without imputation
 powell_wiley2020DCi <- ndi::powell_wiley(state = "DC", year = 2020, imp = TRUE) # with imputation
 
-table(is.na(powell_wiley2020DC$ndi$NDI)) # n=2 tracts without NDI (Powell-Wiley) values
+table(is.na(powell_wiley2020DC$ndi$NDI)) # n=13 tracts without NDI (Powell-Wiley) values
 table(is.na(powell_wiley2020DCi$ndi$NDI)) # n=0 tracts without NDI (Powell-Wiley) values
 
 # Obtain the 2020 census tracts from the "tigris" package
 tract2020DC <- tigris::tracts(state = "DC", year = 2020, cb = TRUE)
 
 # Join the NDI (Powell-Wiley) values to the census tract geometry
-DC2020powell_wiley <- merge(tract2020DC, powell_wiley2020DC$ndi, by = "GEOID")
-DC2020powell_wiley <- merge(DC2020powell_wiley, powell_wiley2020DCi$ndi, by = "GEOID", suffixes = c("_nonimp", "_imp"))
+DC2020pw <- merge(tract2020DC, powell_wiley2020DC$ndi, by = "GEOID")
+DC2020pw <- merge(DC2020pw, powell_wiley2020DCi$ndi, by = "GEOID", suffixes = c("_nonimp", "_imp"))
 
-# Visualize the NDI (Powell-Wiley) values (2006-2010 5-year ACS) for Washington, D.C., census tracts
+# Visualize the NDI (Powell-Wiley) values (2016-2020 5-year ACS) for Washington, D.C., census tracts
 ## Continuous Index
 ggplot2::ggplot() + 
-  ggplot2::geom_sf(data = DC2020powell_wiley, 
+  ggplot2::geom_sf(data = DC2020pw, 
                    ggplot2::aes(fill = NDI_nonimp),
                    size = 0.2,
                    color = "white") +
@@ -966,10 +968,10 @@ 

Imputing missing census variables

ggplot2::labs(fill = "Index (Continuous)", caption = "Source: U.S. Census ACS 2016-2020 estimates") + ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley), Non-Imputed", - subtitle = "Washington, D.C., census tracts as the referent") + subtitle = "DC census tracts as the referent") ggplot2::ggplot() + - ggplot2::geom_sf(data = DC2020powell_wiley, + ggplot2::geom_sf(data = DC2020pw, ggplot2::aes(fill = NDI_imp), size = 0.2, color = "white") + @@ -978,16 +980,16 @@

Imputing missing census variables

ggplot2::labs(fill = "Index (Continuous)", caption = "Source: U.S. Census ACS 2016-2020 estimates") + ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley), Imputed", - subtitle = "Washington, D.C., census tracts as the referent") + subtitle = "DC census tracts as the referent") ## Categorical Index ### Rename "9-NDI not avail" level as NA for plotting -DC2020powell_wiley$NDIQuintNA_nonimp <- factor(replace(as.character(DC2020powell_wiley$NDIQuint_nonimp), - DC2020powell_wiley$NDIQuint_nonimp == "9-NDI not avail", NA), - c(levels(DC2020powell_wiley$NDIQuint_nonimp)[-6], NA)) +DC2020pw$NDIQuintNA_nonimp <- factor(replace(as.character(DC2020pw$NDIQuint_nonimp), + DC2020pw$NDIQuint_nonimp == "9-NDI not avail", NA), + c(levels(DC2020pw$NDIQuint_nonimp)[-6], NA)) ggplot2::ggplot() + - ggplot2::geom_sf(data = DC2020powell_wiley, + ggplot2::geom_sf(data = DC2020pw, ggplot2::aes(fill = NDIQuintNA_nonimp), size = 0.2, color = "white") + @@ -997,15 +999,15 @@

Imputing missing census variables

ggplot2::labs(fill = "Index (Categorical)", caption = "Source: U.S. Census ACS 2016-2020 estimates") + ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Quintiles, Non-Imputed", - subtitle = "Washington, D.C., census tracts as the referent") + subtitle = "DC census tracts as the referent") ### Rename "9-NDI not avail" level as NA for plotting -DC2020powell_wiley$NDIQuintNA_imp <- factor(replace(as.character(DC2020powell_wiley$NDIQuint_imp), - DC2020powell_wiley$NDIQuint_imp == "9-NDI not avail", NA), - c(levels(DC2020powell_wiley$NDIQuint_imp)[-6], NA)) +DC2020pw$NDIQuintNA_imp <- factor(replace(as.character(DC2020pw$NDIQuint_imp), + DC2020pw$NDIQuint_imp == "9-NDI not avail", NA), + c(levels(DC2020pw$NDIQuint_imp)[-6], NA)) ggplot2::ggplot() + - ggplot2::geom_sf(data = DC2020powell_wiley, + ggplot2::geom_sf(data = DC2020pw, ggplot2::aes(fill = NDIQuintNA_imp), size = 0.2, color = "white") + @@ -1015,8 +1017,8 @@

Imputing missing census variables

ggplot2::labs(fill = "Index (Categorical)", caption = "Source: U.S. Census ACS 2016-2020 estimates") + ggplot2::ggtitle("Neighborhood Deprivation Index (Powell-Wiley) Quintiles, Imputed", - subtitle = "Washington, D.C., census tracts as the referent")
-

+ subtitle = "DC census tracts as the referent")
+

Assign the referent (U.S.-Standardized Metric)

@@ -1160,11 +1162,374 @@

Assign the referent (U.S.-Standardized Metric)

ggplot2::geom_histogram(color = "black", fill = "white") + ggplot2::theme_minimal() + ggplot2::ggtitle("Histogram of US-standardized NDI (Powell-Wiley) values (2013-2017)", - subtitle = "U.S. census tracts as the referent (including Alaska, Hawaii, and Washington, D.C.)")
-

+ subtitle = "U.S. census tracts as the referent (including AK, HI, and DC)") +

The process to compute a US-standardized NDI (Powell-Wiley) took about -4.5 minutes to run on an machine with the following features:

-
sessionInfo()
+3.1 minutes to run on a machine with the features listed at the end of +the vignette.

+ + +
+

Additional metrics socio-economic deprivation and disparity

+

Since version v0.1.1, the ndi package can compute +additional metrics of socio-economic deprivation and disparity beyond +neighborhood deprivation indices, including:

+
    +
  1. anthopolos() function that computes the Racial +Isolation Index (RI) based on Anthopolos et +al. (2011) with data from the ACS-5.
  2. +
  3. bravo() function that computes the Educational +Isolation Index (EI) based on Bravo et +al. (2021) with data from the ACS-5.
  4. +
  5. gini() function that retrieves the Gini Index based on +Gini (1921) from the +ACS-5.
  6. +
+
+

Compute Racial Isolation Index (RI)

+

Compute the RI (Anthopolos) values (2006-2010 5-year ACS) for North +Carolina, U.S.A., census tracts. This metric is based on Anthopolos et +al. (2011) that assessed the racial isolation of the population that +identifies as non-Hispanic or Latino, Black or African American alone. +Multiple racial/ethnic subgroups are available in the +anthopolos() function, including:

+
+++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ACS table sourceracial/ethnic subgroupcharacter for subgroup argument
B03002_002Not Hispanic or LatinoNHoL
B03002_003Not Hispanic or Latino, White aloneNHoLW
B03002_004Not Hispanic or Latino, Black or African American aloneNHoLB
B03002_005Not Hispanic or Latino, American Indian and Alaska Native aloneNHoLAIAN
B03002_006Not Hispanic or Latino, Asian aloneNHoLA
B03002_007Not Hispanic or Latino, Native Hawaiian and Other Pacific Islander +aloneNHoLNHOPI
B03002_008Not Hispanic or Latino, Some other race aloneNHoLSOR
B03002_009Not Hispanic or Latino, Two or more racesNHoLTOMR
B03002_010Not Hispanic or Latino, Two races including Some other raceNHoLTRiSOR
B03002_011Not Hispanic or Latino, Two races excluding Some other race, and +three or more racesNHoLTReSOR
B03002_012Hispanic or LatinoHoL
B03002_013Hispanic or Latino, White aloneHoLW
B03002_014Hispanic or Latino, Black or African American aloneHoLB
B03002_015Hispanic or Latino, American Indian and Alaska Native aloneHoLAIAN
B03002_016Hispanic or Latino, Asian aloneHoLA
B03002_017Hispanic or Latino, Native Hawaiian and Other Pacific Islander +aloneHoLNHOPI
B03002_018Hispanic or Latino, Some other race aloneHoLSOR
B03002_019Hispanic or Latino, Two or more racesHoLTOMR
B03002_020Hispanic or Latino, Two races including Some other raceHoLTRiSOR
B03002_021Hispanic or Latino, Two races excluding Some other race, and three +or more racesHoLTReSOR
+

A census geography (and its neighbors) that has nearly all of its +population who identify with the specified race/ethnicity subgroup(s) +(e.g., Not Hispanic or Latino, Black or African American alone) will +have an RI value that is close to 1. In contrast, a census geography +(and its neighbors) that is nearly none of its population who identify +with the specified race/ethnicity subgroup(s) (e.g., not Not Hispanic or +Latino, Black or African American alone) will have an RI value that is +close to 0.

+
anthopolos2010NC <- ndi::anthopolos(state = "NC", year = 2010, subgroup = "NHoLB")
+
+# Obtain the 2010 census tracts from the "tigris" package
+tract2010NC <- tigris::tracts(state = "NC", year = 2010, cb = TRUE)
+# Remove first 9 characters from GEOID for compatibility with tigris information
+tract2010NC$GEOID <- substring(tract2010NC$GEO_ID, 10) 
+
+# Obtain the 2010 counties from the "tigris" package
+county2010NC <- tigris::counties(state = "NC", year = 2010, cb = TRUE)
+
+# Join the RI (Anthopolos) values to the census tract geometry
+NC2010anthopolos <- merge(tract2010NC, anthopolos2010NC$ri, by = "GEOID")
+
+# Visualize the RI (Anthopolos) values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts 
+ggplot2::ggplot() + 
+  ggplot2::geom_sf(data = NC2010anthopolos, 
+                   ggplot2::aes(fill = RI),
+                   size = 0.05,
+                   color = "white") +
+   ggplot2::geom_sf(data = county2010NC,
+                   fill = "transparent", 
+                   color = "white",
+                   size = 0.2) +
+  ggplot2::theme_minimal() +
+  ggplot2::scale_fill_viridis_c() +
+  ggplot2::labs(fill = "Index (Continuous)",
+                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
+  ggplot2::ggtitle("Racial Isolation Index (Anthopolos), non-Hispanic Black",
+                   subtitle = "NC census tracts (not corrected for edge effects)")
+

+

Current version of ndi package does not correct for edge +effects (e.g., census geographies along the specified spatial extent +border, coastline, or U.S.-Mexico / U.S.-Canada border) may have few +neighboring census geographies and RI values in these census geographies +may be unstable. A stop-gap solution for the former source of edge +effect is to compute the RI for neighboring census geographies (i.e., +the states bordering a study area of interest) and then use the +estimates of the study area of interest.

+
# Compute RI for all census tracts in neighboring states
+anthopolos2010GNSTV <- ndi::anthopolos(state = c("GA", "NC", "SC", "TN", "VA"),
+                                     year = 2010, subgroup = "NHoLB")
+
+# Crop to only North Carolina, U.S.A. census tracts
+anthopolos2010NCe <- anthopolos2010GNSTV$ri[anthopolos2010GNSTV$ri$GEOID %in% anthopolos2010NC$ri$GEOID, ]
+
+# Obtain the 2010 census tracts from the "tigris" package
+tract2010NC <- tigris::tracts(state = "NC", year = 2010, cb = TRUE)
+# Remove first 9 characters from GEOID for compatibility with tigris information
+tract2010NC$GEOID <- substring(tract2010NC$GEO_ID, 10) 
+
+# Obtain the 2010 counties from the "tigris" package
+county2010NC <- tigris::counties(state = "NC", year = 2010, cb = TRUE)
+
+# Join the RI (Anthopolos) values to the census tract geometry
+edgeNC2010anthopolos <- merge(tract2010NC, anthopolos2010NCe, by = "GEOID")
+
+# Visualize the RI (Anthopolos) values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts 
+ggplot2::ggplot() + 
+  ggplot2::geom_sf(data = edgeNC2010anthopolos, 
+                   ggplot2::aes(fill = RI),
+                   size = 0.05,
+                   color = "white") +
+   ggplot2::geom_sf(data = county2010NC,
+                   fill = "transparent", 
+                   color = "white",
+                   size = 0.2) +
+  ggplot2::theme_minimal() +
+  ggplot2::scale_fill_viridis_c() +
+  ggplot2::labs(fill = "Index (Continuous)",
+                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
+  ggplot2::ggtitle("Racial Isolation Index (Anthopolos), non-Hispanic Black",
+                   subtitle = "NC census tracts (corrected for interstate edge effects)")
+

+ +
+

Compute Educational Isolation Index (EI)

+

Compute the EI (Bravo) values (2006-2010 5-year ACS) for North +Carolina, U.S.A., census tracts. This metric is based on Bravo et +al. (2021) that assessed the educational isolation of the population +without a four-year college degree. Multiple educational attainment +categories are available in the bravo() function, +including:

+ +++++ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
ACS table sourceeducational attainment categorycharacter for subgroup argument
B06009_002Less than high school graduateLtHS
B06009_003High school graduate (includes equivalency)HSGiE
B06009_004Some college or associate’s degreeSCoAD
B06009_005Bachelor’s degreeBD
B06009_006Graduate or professional degreeGoPD
+

A census geography (and its neighbors) that has nearly all of its +population with the specified educational attainment category (e.g., a +four-year college degree or more) will have an EI value that is close to +1. In contrast, a census geography (and its neighbors) that is nearly +none of its population with the specified educational attainment +category (e.g., with a four-year college degree) will have an EI value +that is close to 0.

+
bravo2010NC <- ndi::bravo(state = "NC", year = 2010, subgroup = c("LtHS", "HSGiE", "SCoAD"))
+
+# Obtain the 2010 census tracts from the "tigris" package
+tract2010NC <- tigris::tracts(state = "NC", year = 2010, cb = TRUE)
+# Remove first 9 characters from GEOID for compatibility with tigris information
+tract2010NC$GEOID <- substring(tract2010NC$GEO_ID, 10) 
+
+# Obtain the 2010 counties from the "tigris" package
+county2010NC <- tigris::counties(state = "NC", year = 2010, cb = TRUE)
+
+# Join the RI (Bravo) values to the census tract geometry
+NC2010bravo <- merge(tract2010NC, bravo2010NC$ei, by = "GEOID")
+
+# Visualize the RI (Bravo) values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts 
+ggplot2::ggplot() + 
+  ggplot2::geom_sf(data = NC2010bravo, 
+                   ggplot2::aes(fill = EI),
+                   size = 0.05,
+                   color = "white") +
+   ggplot2::geom_sf(data = county2010NC,
+                   fill = "transparent", 
+                   color = "white",
+                   size = 0.2) +
+  ggplot2::theme_minimal() +
+  ggplot2::scale_fill_viridis_c() +
+  ggplot2::labs(fill = "Index (Continuous)",
+                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
+  ggplot2::ggtitle("Educational Isolation Index (Bravo), without a four-year college degree",
+                   subtitle = "NC census tracts (not corrected for edge effects)")
+

+

Can correct for one source of edge effect in the same manner as shown +for the RI (Anthopolos) metric.

+
+
+

Retrieve the Educational Isolation Index (EI)

+

Retrieve the Gini Index values (2006-2010 5-year ACS) for North +Carolina, U.S.A., census tracts. This metric is based on Gini (1921) and the +gini() function retrieve the estimate from the ACS-5.

+

According to the U.S. +Census Bureau: “The Gini Index is a summary measure of income +inequality. The Gini coefficient incorporates the detailed shares data +into a single statistic, which summarizes the dispersion of income +across the entire income distribution. The Gini coefficient ranges from +0, indicating perfect equality (where everyone receives an equal share), +to 1, perfect inequality (where only one recipient or group of +recipients receives all the income). The Gini is based on the difference +between the Lorenz curve (the observed cumulative income distribution) +and the notion of a perfectly equal income distribution.”

+
gini2010NC <- ndi::gini(state = "NC", year = 2010)
+
+# Obtain the 2010 census tracts from the "tigris" package
+tract2010NC <- tigris::tracts(state = "NC", year = 2010, cb = TRUE)
+# Remove first 9 characters from GEOID for compatibility with tigris information
+tract2010NC$GEOID <- substring(tract2010NC$GEO_ID, 10) 
+
+# Obtain the 2010 counties from the "tigris" package
+county2010NC <- tigris::counties(state = "NC", year = 2010, cb = TRUE)
+
+# Join the Gini Index values to the census tract geometry
+NC2010gini <- merge(tract2010NC, gini2010NC$gini, by = "GEOID")
+
+# Visualize the Gini Index values (2006-2010 5-year ACS) for North Carolina, U.S.A., census tracts 
+ggplot2::ggplot() + 
+  ggplot2::geom_sf(data = NC2010gini, 
+                   ggplot2::aes(fill = gini),
+                   size = 0.05,
+                   color = "white") +
+   ggplot2::geom_sf(data = county2010NC,
+                   fill = "transparent", 
+                   color = "white",
+                   size = 0.2) +
+  ggplot2::theme_minimal() +
+  ggplot2::scale_fill_viridis_c() +
+  ggplot2::labs(fill = "Index (Continuous)",
+                caption = "Source: U.S. Census ACS 2006-2010 estimates") +
+  ggplot2::ggtitle("Gini Index",
+                   subtitle = "NC census tracts")
+

+
sessionInfo()
## R version 4.2.1 (2022-06-23)
 ## Platform: x86_64-apple-darwin17.0 (64-bit)
 ## Running under: macOS Catalina 10.15.7
@@ -1180,29 +1545,29 @@ 

Assign the referent (U.S.-Standardized Metric)

## [1] stats graphics grDevices utils datasets methods base ## ## other attached packages: -## [1] tigris_1.6.1 tidycensus_1.2.2 sf_1.0-7 ndi_0.0.1 +## [1] tigris_1.6.1 tidycensus_1.2.2 sf_1.0-7 ndi_0.1.1 ## [5] ggplot2_3.3.6 knitr_1.39 ## ## loaded via a namespace (and not attached): ## [1] httr_1.4.3 sass_0.4.2 tidyr_1.2.0 jsonlite_1.8.0 -## [5] viridisLite_0.4.0 bslib_0.4.0 sp_1.5-0 highr_0.9 +## [5] viridisLite_0.4.0 bslib_0.4.0 highr_0.9 sp_1.5-0 ## [9] yaml_2.3.5 pillar_1.8.0 lattice_0.20-45 glue_1.6.2 ## [13] uuid_1.1-0 digest_0.6.29 rvest_1.0.2 colorspace_2.0-3 -## [17] htmltools_0.5.3 psych_2.2.5 pkgconfig_2.0.3 s2_1.1.0 -## [21] purrr_0.3.4 scales_1.2.0 tzdb_0.3.0 tibble_3.1.8 -## [25] proxy_0.4-27 generics_0.1.3 farver_2.1.1 ellipsis_0.3.2 -## [29] cachem_1.0.6 withr_2.5.0 cli_3.3.0 mnormt_2.1.0 -## [33] magrittr_2.0.3 crayon_1.5.1 maptools_1.1-4 evaluate_0.15 -## [37] fansi_1.0.3 nlme_3.1-158 MASS_7.3-57 xml2_1.3.3 -## [41] foreign_0.8-82 class_7.3-20 tools_4.2.1 hms_1.1.1 -## [45] lifecycle_1.0.1 stringr_1.4.0 munsell_0.5.0 compiler_4.2.1 -## [49] jquerylib_0.1.4 e1071_1.7-11 rlang_1.0.4 classInt_0.4-7 -## [53] units_0.8-0 grid_4.2.1 rstudioapi_0.13 rappdirs_0.3.3 -## [57] labeling_0.4.2 rmarkdown_2.14 wk_0.6.0 gtable_0.3.0 -## [61] DBI_1.1.3 curl_4.3.2 R6_2.5.1 dplyr_1.0.9 -## [65] rgdal_1.5-30 fastmap_1.1.0 utf8_1.2.2 KernSmooth_2.23-20 -## [69] readr_2.1.2 stringi_1.7.8 parallel_4.2.1 Rcpp_1.0.9 -## [73] vctrs_0.4.1 tidyselect_1.1.2 xfun_0.31
+## [17] htmltools_0.5.3 Matrix_1.4-1 psych_2.2.5 pkgconfig_2.0.3 +## [21] s2_1.1.0 purrr_0.3.4 scales_1.2.0 tzdb_0.3.0 +## [25] tibble_3.1.8 proxy_0.4-27 generics_0.1.3 farver_2.1.1 +## [29] ellipsis_0.3.2 cachem_1.0.6 withr_2.5.0 cli_3.3.0 +## [33] mnormt_2.1.0 magrittr_2.0.3 crayon_1.5.1 maptools_1.1-4 +## [37] evaluate_0.15 fansi_1.0.3 nlme_3.1-158 MASS_7.3-57 +## [41] xml2_1.3.3 foreign_0.8-82 class_7.3-20 tools_4.2.1 +## [45] hms_1.1.1 lifecycle_1.0.1 stringr_1.4.0 munsell_0.5.0 +## [49] compiler_4.2.1 jquerylib_0.1.4 e1071_1.7-11 rlang_1.0.4 +## [53] classInt_0.4-7 units_0.8-0 grid_4.2.1 rstudioapi_0.13 +## [57] rappdirs_0.3.3 labeling_0.4.2 rmarkdown_2.14 wk_0.6.0 +## [61] gtable_0.3.0 DBI_1.1.3 curl_4.3.2 R6_2.5.1 +## [65] dplyr_1.0.9 rgdal_1.5-30 fastmap_1.1.0 utf8_1.2.2 +## [69] KernSmooth_2.23-20 readr_2.1.2 stringi_1.7.8 parallel_4.2.1 +## [73] Rcpp_1.0.9 vctrs_0.4.1 tidyselect_1.1.2 xfun_0.31