Preparation for CRAN v0.1.1 release
* Added `anthopolos()` function to compute the Racial Isolation Index (RI) based on [Anthopolos et al. (2011)](https://www.doi.org/10.1016/j.sste.2011.06.002) for specified counties/tracts 2009-2020
* Added `bravo()` function to compute the Educational Isolation Index (EI) based on [Bravo et al. (2021)](https://www.doi.org/10.3390/ijerph18179384) for specified counties/tracts 2009-2020
* Added `gini()` function to retrieve the Gini Index based on [Gini (1921)](https://www.doi.org/10.2307/2223319) for specified counties/tracts 2009-2020
* `Matrix` and `sf` are now Imports
* Updated vignette and README for new features
* Fixed typos throughout documentation
* Updated Description in DESCRIPTION
* Updated 'package.R' with new details and section
* Updated CITATION with new citations for the additional metrics
Ian Buller, PhD, MA committed Aug 14, 2022
1 parent 8544ff3 commit 0a51633
Showing 29 changed files with 1,784 additions and 195 deletions.
23 changes: 17 additions & 6 deletions DESCRIPTION
@@ -1,7 +1,7 @@
Package: ndi
Title: Neighborhood Deprivation Indices
Version: 0.1.0
Date: 2022-08-10
Version: 0.1.1
Date: 2022-08-14
Authors@R:
c(person(given = "Ian D.",
family = "Buller",
@@ -11,14 +11,24 @@ Authors@R:
person(given = "NCI",
role = c("cph", "fnd")))
Maintainer: Ian D. Buller <[email protected]>
Description: Compute various neighborhood deprivation indices (NDI), including:
Description: Computes various metrics of socio-economic deprivation and disparity in
the United States. Some metrics are considered "spatial" because they
consider the values of neighboring (i.e., adjacent) census geographies in
their computation, while other metrics are "aspatial" because they only
consider the value within each census geography. Two types of aspatial
neighborhood deprivation indices (NDI) are available:
(1) based on Messer et al. (2006) <doi:10.1007/s11524-006-9094-x>
and (2) based on Andrews et al. (2020) <doi:10.1080/17445647.2020.1750066>
and Slotman et al. (2022) <doi:10.1016/j.dib.2022.108002>
who uses variables chosen by Roux and Mair (2010)
who use variables chosen by Roux and Mair (2010)
<doi:10.1111/j.1749-6632.2009.05333.x>. Both are a decomposition
of multiple demographic characteristics from the U.S. Census Bureau
American Community Survey 5-year estimates.
American Community Survey 5-year estimates (ACS-5; 2010-2020). Using data
from the ACS-5 (2009-2020), the package can also (1) compute the spatial
Racial Isolation Index (RI) based on Anthopolos et al. (2011)
<doi:10.1016/j.sste.2011.06.002>, (2) compute the spatial Educational Isolation
Index (EI) based on Bravo et al. (2021) <doi:10.3390/ijerph18179384>, and
(3) retrieve the aspatial Gini Index based on Gini (1921) <doi:10.2307/2223319>.
License: Apache License (>= 2.0)
Encoding: UTF-8
Roxygen: list(markdown = TRUE)
@@ -28,7 +38,9 @@ Depends:
Imports:
dplyr,
MASS,
Matrix,
psych,
sf,
stats,
stringr,
tidycensus,
@@ -38,7 +50,6 @@ Suggests:
testthat,
tigris,
R.rsp,
sf,
spelling
VignetteBuilder: R.rsp
Language: en-US
7 changes: 7 additions & 0 deletions NAMESPACE
@@ -1,11 +1,18 @@
# Generated by roxygen2: do not edit by hand

export(anthopolos)
export(bravo)
export(gini)
export(messer)
export(powell_wiley)
import(dplyr)
importFrom(MASS,ginv)
importFrom(Matrix,sparseMatrix)
importFrom(psych,alpha)
importFrom(psych,principal)
importFrom(sf,st_drop_geometry)
importFrom(sf,st_geometry)
importFrom(sf,st_intersects)
importFrom(stats,complete.cases)
importFrom(stats,cor)
importFrom(stats,cov2cor)
11 changes: 11 additions & 0 deletions NEWS.md
@@ -1,5 +1,16 @@
# ndi (development version)

# ndi v0.1.1
* Added `anthopolos()` function to compute the Racial Isolation Index (RI) based on [Anthopolos et al. (2011)](https://www.doi.org/10.1016/j.sste.2011.06.002) for specified counties/tracts 2009-2020
* Added `bravo()` function to compute the Educational Isolation Index (EI) based on [Bravo et al. (2021)](https://www.doi.org/10.3390/ijerph18179384) for specified counties/tracts 2009-2020
* Added `gini()` function to retrieve the Gini Index based on [Gini (1921)](https://www.doi.org/10.2307/2223319) for specified counties/tracts 2009-2020
* `Matrix` and `sf` are now Imports
* Updated vignette and README for new features
* Fixed typos throughout documentation
* Updated Description in DESCRIPTION
* Updated 'package.R' with new details and section
* Updated CITATION with new citations for the additional metrics

# ndi v0.1.0
* Fixed invalid URL and typos in package README.md

216 changes: 216 additions & 0 deletions R/anthopolos.R
@@ -0,0 +1,216 @@
#' Racial Isolation Index based on Anthopolos et al. (2011)
#'
#' Compute the Racial Isolation Index (Anthopolos) values for selected subgroup(s).
#'
#' @param geo Character string specifying the geography of the data either census tracts \code{geo = "tract"} (the default) or counties \code{geo = "county"}.
#' @param year Numeric. The year to compute the estimate. The default is 2020 and the years between 2009 and 2020 are currently available.
#' @param subgroup Character string specifying the racial/ethnic subgroup(s). See Details for available choices.
#' @param quiet Logical. If TRUE, will suppress messages about potential missing census information. The default is FALSE.
#' @param ... Arguments passed to \code{\link[tidycensus]{get_acs}} to select state, county, and other arguments for census characteristics
#'
#' @details This function will compute the Racial Isolation Index (RI) of U.S. census tracts or counties for a specified geographical extent (e.g., entire U.S. or a single state) based on Anthopolos et al. (2011) \doi{10.1016/j.sste.2011.06.002} who originally designed the metric for the racial isolation of non-Hispanic Black individuals. This function provides the computation of RI for any of the U.S. Census Bureau race/ethnicity subgroups (including Hispanic and non-Hispanic individuals).
#'
#' The function uses the \code{\link[tidycensus]{get_acs}} function to obtain U.S. Census Bureau 5-year American Community Survey (ACS-5) characteristics used for the geospatial computation. The yearly estimates are available for 2009 through 2020, the years for which ACS-5 data are available, although the characteristics are also available from other U.S. Census Bureau surveys. The twenty racial/ethnic subgroups (U.S. Census Bureau definitions) are:
#' \itemize{
#' \item{B03002_002: }{Not Hispanic or Latino "NHoL"}
#' \item{B03002_003: }{Not Hispanic or Latino, White alone "NHoLW"}
#' \item{B03002_004: }{Not Hispanic or Latino, Black or African American alone "NHoLB"}
#' \item{B03002_005: }{Not Hispanic or Latino, American Indian and Alaska Native alone "NHoLAIAN"}
#' \item{B03002_006: }{Not Hispanic or Latino, Asian alone "NHoLA"}
#' \item{B03002_007: }{Not Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone "NHoLNHOPI"}
#' \item{B03002_008: }{Not Hispanic or Latino, Some other race alone "NHoLSOR"}
#' \item{B03002_009: }{Not Hispanic or Latino, Two or more races "NHoLTOMR"}
#' \item{B03002_010: }{Not Hispanic or Latino, Two races including Some other race "NHoLTRiSOR"}
#' \item{B03002_011: }{Not Hispanic or Latino, Two races excluding Some other race, and three or more races "NHoLTReSOR"}
#' \item{B03002_012: }{Hispanic or Latino "HoL"}
#' \item{B03002_013: }{Hispanic or Latino, White alone "HoLW"}
#' \item{B03002_014: }{Hispanic or Latino, Black or African American alone "HoLB"}
#' \item{B03002_015: }{Hispanic or Latino, American Indian and Alaska Native alone "HoLAIAN"}
#' \item{B03002_016: }{Hispanic or Latino, Asian alone "HoLA"}
#' \item{B03002_017: }{Hispanic or Latino, Native Hawaiian and Other Pacific Islander alone "HoLNHOPI"}
#' \item{B03002_018: }{Hispanic or Latino, Some other race alone "HoLSOR"}
#' \item{B03002_019: }{Hispanic or Latino, Two or more races "HoLTOMR"}
#' \item{B03002_020: }{Hispanic or Latino, Two races including Some other race "HoLTRiSOR"}
#' \item{B03002_021: }{Hispanic or Latino, Two races excluding Some other race, and three or more races "HoLTReSOR"}
#' }
#'
#' Use the internal \code{state} and \code{county} arguments within the \code{\link[tidycensus]{get_acs}} function to specify the geographic extent of the data output. NOTE: The current version does not correct for edge effects: census geographies along the border of the specified spatial extent, a coastline, or the U.S.-Mexico / U.S.-Canada border may have few neighboring census geographies, and RI values in these census geographies may be unstable. A stop-gap solution for the first source of edge effect (the border of the specified spatial extent) is to compute the RI for neighboring census geographies (i.e., the states bordering a study area of interest) and then use the estimates of the study area of interest.
#'
#' A census geography (and its neighbors) in which nearly all of the population identifies with the specified race/ethnicity subgroup(s) (e.g., non-Hispanic or Latino, Black or African American alone) will have an RI value that is close to 1. In contrast, a census geography (and its neighbors) in which almost none of the population identifies with the specified race/ethnicity subgroup(s) (e.g., not non-Hispanic or Latino, Black or African American alone) will have an RI value that is close to 0.
#'
#' @return An object of class 'list'. This is a named list with the following components:
#'
#' \describe{
#' \item{\code{ri}}{An object of class 'tbl' for the GEOID, name, RI, and raw census values of specified census geographies.}
#' \item{\code{missing}}{An object of class 'tbl' of the count and proportion of missingness for each census variable used to compute the RI.}
#' }
#'
#' @import dplyr
#' @importFrom Matrix sparseMatrix
#' @importFrom sf st_drop_geometry st_geometry st_intersects
#' @importFrom stringr str_trim
#' @importFrom tidycensus get_acs
#' @importFrom tidyr gather separate
#' @export
#'
#' @seealso \code{\link[tidycensus]{get_acs}} for additional arguments for geographic extent selection (i.e., \code{state} and \code{county}).
#'
#' @examples
#' \dontrun{
#' # Wrapped in \dontrun{} because these examples require a Census API key.
#'
#' # Tract-level metric (2020)
#' anthopolos(geo = "tract", state = "GA", year = 2020, subgroup = c("NHoLB", "HoLB"))
#'
#' # County-level metric (2020)
#' anthopolos(geo = "county", state = "GA", year = 2020, subgroup = c("NHoLB", "HoLB"))
#'
#' }
#'
anthopolos <- function(geo = "tract", year = 2020, subgroup, quiet = FALSE, ...) {

# Check arguments
match.arg(geo, choices = c("county", "tract"))
stopifnot(is.numeric(year), year %in% 2009:2020)
match.arg(subgroup, several.ok = TRUE,
choices = c("NHoL", "NHoLW", "NHoLB", "NHoLAIAN", "NHoLA", "NHoLNHOPI",
"NHoLSOR", "NHoLTOMR", "NHoLTRiSOR", "NHoLTReSOR",
"HoL", "HoLW", "HoLB", "HoLAIAN", "HoLA", "HoLNHOPI",
"HoLSOR", "HoLTOMR", "HoLTRiSOR", "HoLTReSOR"))

# select census variables
vars <- c(TotalPop = "B03002_001",
NHoL = "B03002_002",
NHoLW = "B03002_003",
NHoLB = "B03002_004",
NHoLAIAN = "B03002_005",
NHoLA = "B03002_006",
NHoLNHOPI = "B03002_007",
NHoLSOR = "B03002_008",
NHoLTOMR = "B03002_009",
NHoLTRiSOR = "B03002_010",
NHoLTReSOR = "B03002_011",
HoL = "B03002_012",
HoLW = "B03002_013",
HoLB = "B03002_014",
HoLAIAN = "B03002_015",
HoLA = "B03002_016",
HoLNHOPI = "B03002_017",
HoLSOR = "B03002_018",
HoLTOMR = "B03002_019",
HoLTRiSOR = "B03002_020",
HoLTReSOR = "B03002_021")

selected_vars <- vars[c("TotalPop", subgroup)]
out_names <- names(selected_vars) # save for output
prefix <- "subgroup"
suffix <- seq_along(subgroup)
names(selected_vars) <- c("TotalPop", paste(prefix, suffix, sep = ""))
in_names <- paste(names(selected_vars), "E", sep = "")

# acquire RI variables and sf geometries
ri_vars <- suppressMessages(suppressWarnings(tidycensus::get_acs(geography = geo,
year = year,
output = "wide",
variables = selected_vars,
geometry = TRUE, ...)))


if (geo == "tract") {
ri_vars <- ri_vars %>%
tidyr::separate(NAME, into = c("tract", "county", "state"), sep = ",") %>%
dplyr::mutate(tract = gsub("[^0-9\\.]","", tract))
} else {
ri_vars <- ri_vars %>% tidyr::separate(NAME, into = c("county", "state"), sep = ",")
}

ri_vars <- ri_vars %>%
dplyr::mutate(county = stringr::str_trim(county),
subgroup = rowSums(sf::st_drop_geometry(ri_vars[ , in_names[-1]])))

# Compute RI
## From Anthopolos et al. (2011) https://doi.org/10.1016/j.sste.2011.06.002
## RI_{im} = (Sigma_{j∈∂_{i}} w_{ij} * T_{jm}) / (Sigma_{j∈∂_{i}} w_{ij} * T_{j})
## Where:
## ∂_{i} denotes the set consisting of index unit i and its neighbors
## Given M mutually exclusive racial/ethnic subgroups, m indexes the subgroups of M
## T_{i} denotes the total population in region i (TotalPop)
## T_{im} denotes the population of the selected subgroup(s) (subgroup1, ...)
## w_{ij} denotes an n x n first-order adjacency matrix, where n is the number of census geometries in the study area
### and the entries of w_{ij} are set to 1 if a boundary is shared by region i and region j and zero otherwise
### Entries of the main diagonal (since i∈∂_{i}, w_{ij} = w_{ii} when j = i) of w_{ij} are set to 1.5
### such that the weight of the index unit, i, is larger than the weights assigned to adjacent tracts

## Geospatial adjacency matrix (wij)
tmp <- sf::st_intersects(sf::st_geometry(ri_vars), sparse = TRUE)
names(tmp) <- as.character(seq_len(nrow(ri_vars)))
tmpL <- length(tmp)
tmpcounts <- unlist(Map(length, tmp))
tmpi <- rep(1:tmpL, tmpcounts)
tmpj <- unlist(tmp)
wij <- Matrix::sparseMatrix(i = tmpi, j = tmpj, x = 1, dims = c(tmpL, tmpL))
diag(wij) <- 1.5

## Compute
ri_vars <- sf::st_drop_geometry(ri_vars) # drop geometries (can join back later)
RIim <- list()
for (i in seq_len(nrow(wij))) {
RIim[[i]] <- sum(as.matrix(wij[i, ])*ri_vars[ , "subgroup"]) / sum(as.matrix(wij[i, ])*ri_vars[, "TotalPopE"])
}
ri_vars$RI <- unlist(RIim)

# warning for missingness of census characteristics
missingYN <- ri_vars %>%
dplyr::select(dplyr::all_of(in_names)) # all_of() avoids ambiguity when selecting with an external character vector
names(missingYN) <- out_names
missingYN <- missingYN %>%
tidyr::gather(key = "variable", value = "val") %>%
dplyr::mutate(missing = is.na(val)) %>%
dplyr::group_by(variable) %>%
dplyr::mutate(total = n()) %>%
dplyr::group_by(variable, total, missing) %>%
dplyr::count() %>%
dplyr::mutate(percent = round(n / total * 100,2),
percent = paste0(percent," %")) %>%
dplyr::filter(missing == TRUE)

if (quiet == FALSE) {
# warning for missing census data
if (nrow(missingYN) != 0) {
message("Warning: Missing census data")
} else {
returnValue(missingYN)
}
}

# format output
if (geo == "tract") {
ri <- ri_vars %>%
dplyr::select(c("GEOID",
"state",
"county",
"tract",
"RI",
in_names))
names(ri) <- c("GEOID", "state", "county", "tract", "RI", out_names)
} else {
ri <- ri_vars %>%
dplyr::select(c("GEOID",
"state",
"county",
"RI",
in_names))
names(ri) <- c("GEOID", "state", "county", "RI", out_names)
}

ri <- ri %>%
dplyr::mutate(county = stringr::str_trim(county),
state = stringr::str_trim(state)) %>%
dplyr::arrange(GEOID) %>%
dplyr::as_tibble()

out <- list(ri = ri,
missing = missingYN)

return(out)
}
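
The RI formula described in the comments of `anthopolos()` can be illustrated without any census download. The following is a minimal sketch in base R, assuming a toy study area of three units arranged in a row (unit 2 touches units 1 and 3). The names `neighbors`, `total_pop`, and `subgroup_pop` and all population counts are hypothetical and are not part of the package; the list `neighbors` only mimics the shape of the sparse result of `sf::st_intersects()`, in which each unit also intersects itself. The function above uses a loop over a sparse matrix, whereas this sketch uses a dense matrix product, which should be equivalent for small inputs.

```r
# Hypothetical neighbor list for three units in a row: 1 - 2 - 3
# (each unit lists itself, as a self-intersection would)
neighbors <- list(c(1L, 2L), c(1L, 2L, 3L), c(2L, 3L))
n <- length(neighbors)

# First-order adjacency matrix: 1 for touching units, 0 otherwise
wij <- matrix(0, nrow = n, ncol = n)
for (i in seq_len(n)) {
  wij[i, neighbors[[i]]] <- 1
}
# The index unit itself receives a larger weight of 1.5
diag(wij) <- 1.5

# Hypothetical populations (not real census values)
total_pop <- c(100, 200, 100)    # T_j
subgroup_pop <- c(100, 100, 0)   # T_jm

# RI_im = sum_j(w_ij * T_jm) / sum_j(w_ij * T_j)
ri <- as.vector(wij %*% subgroup_pop) / as.vector(wij %*% total_pop)
print(round(ri, 3))
```

In this toy layout, unit 1 (entirely in the subgroup, next to a mixed unit) gets the highest RI and unit 3 (no subgroup residents) the lowest, while unit 2 sits at 0.5 because its weighted neighborhood is half subgroup population.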