Water Insecurity Data (#798)
* - alt text for image added
- cleaning_script compiled from article
- data_dictionary added

* kept 2022 and 2023 data separate to stay under 20 MB

* Update data/curated/water-insecurity/intro.md

* Update data/curated/water-insecurity/intro.md

* Update data/curated/water-insecurity/meta.yaml

* Oops, the image needs to be saved locally.

* Accept submission

---------

Co-authored-by: Jon Harmon <[email protected]>
Co-authored-by: jonthegeek <[email protected]>
3 people authored Jan 19, 2025
1 parent 2500765 commit 4d0ef20
Showing 8 changed files with 44,953 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -34,6 +34,7 @@ Our over-arching goal for TidyTuesday is to provide real-world datasets so that
| 1 | `2025-01-07` | Bring your own data to start the year! | | |
| 2 | `2025-01-14` | [posit::conf talks](data/2025/2025-01-14/readme.md) | [posit::conf attendee portal 2023](https://reg.conf.posit.co/flow/posit/positconf23/attendee-portal/page/sessioncatalog), [posit::conf attendee portal 2024](https://reg.conf.posit.co/flow/posit/positconf24/attendee-portal/page/sessioncatalog) | [posit::conf(2025) in-person registration is now open!](https://posit.co/blog/positconf2025-in-person-registration-is-now-open/) |
| 3 | `2025-01-21` | [The History of Himalayan Mountaineering Expeditions](data/2025/2025-01-21/readme.md) | [The Himalayan Database](https://www.himalayandatabase.com/downloads.html) | [The Expedition Archives of Elizabeth Hawley](https://www.himalayandatabase.com/index.html) |
| 4 | `2025-01-28` | [Water Insecurity](data/2025/2025-01-28/readme.md) | [US Census Data from tidycensus](https://cran.r-project.org/package=tidycensus) | [Mapping water insecurity in R with tidycensus](https://waterdata.usgs.gov/blog/acs-maps/) |

***

19 changes: 19 additions & 0 deletions data/2025/2025-01-28/meta.yaml
@@ -0,0 +1,19 @@
title: "Water Insecurity"
article:
  title: "Mapping water insecurity in R with tidycensus"
  url: "https://waterdata.usgs.gov/blog/acs-maps/"
data_source:
  title: "US Census Data from tidycensus"
  url: "https://cran.r-project.org/package=tidycensus"
images:
  # Please include at least one image, and up to three images
  - file: "tidycensus-intro-banner.png"
    alt: >
      Three choropleth maps of the United States west of the Mississippi River, using 2022 U.S. Census Bureau Data, entitled Mapping water insecurity in R with tidycensus. The first choropleth is labeled Percent Hispanic, 2022, and shows the highest percentages of Hispanic people near the US-Mexico border, with scattered high percentages, such as in the state of Washington. The second choropleth is labeled Median gross rent, 2022, and shows the highest rents in California, Washington state, and Colorado. The third choropleth is labeled Average household size, 2022, and has scattered areas of large household size, with the highest averages in South Dakota, Utah, southern California, and southern Texas. The image also includes the hex logo of the tidycensus R package, with an indistinct choropleth map in shades of green.
credit:
  # We want to thank you for curating this dataset! If you do not want a
  # particular type of credit, please delete the related line.
  post: "Niha Pereira"
  bluesky: "https://bsky.app/profile/nnpereira"
  linkedin: "https://www.linkedin.com/in/niha-pereira"
  github: "https://github.com/nnpereira"
141 changes: 141 additions & 0 deletions data/2025/2025-01-28/readme.md
@@ -0,0 +1,141 @@
# Water Insecurity

This week we're exploring water insecurity data featured in the article [Mapping water insecurity in R with tidycensus](https://waterdata.usgs.gov/blog/acs-maps/)!

> Water insecurity can be influenced by a number of social vulnerability indicators—from demographic characteristics to living conditions and socioeconomic status—that vary spatially across the U.S. This blog shows how the tidycensus package for R can be used to access U.S. Census Bureau data, including the American Community Surveys, as featured in the “Unequal Access to Water” data visualization from the USGS Vizlab. It offers reproducible code examples demonstrating use of tidycensus for easy exploration and visualization of social vulnerability indicators in the Western U.S.

- How does the lack of complete indoor plumbing compare between the 2023 and 2022 Census data?
- Which counties have the greatest percentage of households lacking plumbing?
- Are there differences in indoor plumbing availability between Western U.S. and Eastern U.S. counties?

Thank you to [Niha Pereira](https://github.com/nnpereira) for curating this week's dataset.

## The Data

```r
# Option 1: tidytuesdayR package
## install.packages("tidytuesdayR")

tuesdata <- tidytuesdayR::tt_load('2025-01-28')
## OR
tuesdata <- tidytuesdayR::tt_load(2025, week = 4)

water_insecurity_2022 <- tuesdata$water_insecurity_2022
water_insecurity_2023 <- tuesdata$water_insecurity_2023

# Option 2: Read directly from GitHub

water_insecurity_2022 <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-01-28/water_insecurity_2022.csv')
water_insecurity_2023 <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2025/2025-01-28/water_insecurity_2023.csv')
```
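
As a starting point for the year-over-year question above, here is a minimal sketch of one way to compare the two tables. It uses small made-up tibbles (`toy_2022`, `toy_2023`) in place of the real data, and the names `pct_2022`, `pct_2023`, `change`, and `plumbing_change` are illustrative, not part of the dataset. It assumes that `geoid` identifies a county in both years; an inner join is used because the set of counties may differ between years.

```r
library(dplyr)

# Toy stand-ins for the two yearly tables; the values are made up for illustration.
toy_2022 <- tibble::tribble(
  ~geoid,  ~name,      ~percent_lacking_plumbing,
  "00001", "County A", 0.10,
  "00002", "County B", 0.40,
  "00003", "County C", 0.25
)
toy_2023 <- tibble::tribble(
  ~geoid,  ~name,      ~percent_lacking_plumbing,
  "00001", "County A", 0.05,
  "00002", "County B", 0.50,
  "00004", "County D", 0.30
)

# Keep only counties present in both years, then compute the year-over-year change.
plumbing_change <- toy_2022 |>
  select(geoid, name, pct_2022 = percent_lacking_plumbing) |>
  inner_join(
    toy_2023 |> select(geoid, pct_2023 = percent_lacking_plumbing),
    by = "geoid"
  ) |>
  mutate(change = pct_2023 - pct_2022) |>
  arrange(desc(change))

plumbing_change
```

Swapping `toy_2022` and `toy_2023` for `water_insecurity_2022` and `water_insecurity_2023` should give the same comparison on the real data, with counties missing from either year dropped by the join.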

## How to Participate

- [Explore the data](https://r4ds.hadley.nz/), watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about **causation** in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
- Create a visualization, a model, a [shiny app](https://shiny.posit.co/), or some other piece of data-science-related output, using R or another programming language.
- [Share your output and the code used to generate it](../../../sharing.md) on social media with the #TidyTuesday hashtag.
- [Submit your own dataset!](../../../.github/pr_instructions.md)

### Data Dictionary

# `water_insecurity_2022.csv`

|variable |class |description |
|:------------------------|:----------------|:-------------------------------------|
|geoid |character |The U.S. Census Bureau ACS county id. |
|name |character |The U.S. Census Bureau ACS county name. |
|year |character |The year of U.S. Census Bureau ACS sample. |
|geometry |sfc_MULTIPOLYGON |The county geographic boundaries. |
|total_pop |double |The total population. |
|plumbing |double |The total owner-occupied households lacking plumbing facilities. |
|percent_lacking_plumbing |double |Owner-occupied households lacking plumbing facilities as a percent of total population (`plumbing / total_pop * 100`). |

# `water_insecurity_2023.csv`

|variable |class |description |
|:------------------------|:----------------|:-------------------------------------|
|geoid |character |The U.S. Census Bureau ACS county id. |
|name |character |The U.S. Census Bureau ACS county name. |
|year |character |The year of U.S. Census Bureau ACS sample. |
|geometry |sfc_MULTIPOLYGON |The county geographic boundaries. |
|total_pop |double |The total population. |
|plumbing |double |The total owner-occupied households lacking plumbing facilities. |
|percent_lacking_plumbing |double |Owner-occupied households lacking plumbing facilities as a percent of total population (`plumbing / total_pop * 100`). |
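
The classes in the dictionary can be made explicit when reading the files, rather than left to type guessing. The sketch below is one way to do that, shown on two made-up rows passed as literal text (`csv_text` and `toy` are illustrative names, and the `geometry` column is omitted for brevity). It assumes readr 2.0 or later, where `I()` marks a string as literal data.

```r
library(readr)

# Two made-up rows in the shape of the data dictionary (geometry omitted for brevity).
csv_text <- I(
"geoid,name,year,total_pop,plumbing,percent_lacking_plumbing
00001,County A,2023,100000,50,0.05
00002,County B,2023,200000,300,0.15
")

toy <- read_csv(
  csv_text,
  col_types = cols(
    geoid = col_character(),  # keeps any leading zeros in county ids
    name = col_character(),
    year = col_character(),   # the dictionary lists year as character
    total_pop = col_double(),
    plumbing = col_double(),
    percent_lacking_plumbing = col_double()
  )
)

# Recompute the percentage from its two inputs, as the cleaning script does.
toy |>
  dplyr::mutate(recomputed = plumbing / total_pop * 100)
```

The same `col_types` specification, with an entry added for `geometry`, could be passed to the `readr::read_csv()` calls that read the real files.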

### Cleaning Script

```r
# Clean data compiled from code referenced in article (https://waterdata.usgs.gov/blog/acs-maps/).
# Code was revised to pull data for all US counties for years 2022 - 2023.

# Load packages -----
library(tidycensus)
library(sf)
library(janitor)
library(tidyverse)

# Helper functions -----
# Download ACS estimates with geometry, clean the column names,
# reproject to `proj`, and record the survey year.
get_census_data <- function(geography, var_names, year, proj, survey_var) {
  df <- get_acs(
    geography = geography,
    variables = var_names,
    year = year,
    geometry = TRUE,
    survey = survey_var) |>
    clean_names() |>
    st_transform(proj) |>
    mutate(year = year)

  return(df)
}

# Grab relevant variables - B01003_001: total population, B25049_004: households lacking plumbing----
vars <- c("B01003_001", "B25049_004")

# Pull data for 2023 and 2022 for all US counties ------
water_insecurity_2023 <- get_census_data(
  geography = 'county',
  var_names = vars,
  year = "2023",
  proj = "EPSG:5070",
  survey_var = "acs1"
) |>
  mutate(
    variable_long = case_when(
      variable == "B01003_001" ~ "total_pop",
      variable == "B25049_004" ~ "plumbing",
      .default = NA_character_
    )
  ) |>
  select(geoid, name, variable_long, estimate, geometry, year) |>
  pivot_wider(
    names_from = variable_long,
    values_from = estimate
  ) |>
  mutate(
    percent_lacking_plumbing = (plumbing / total_pop) * 100
  )

water_insecurity_2022 <- get_census_data(
  geography = 'county',
  var_names = vars,
  year = "2022",
  proj = "EPSG:5070",
  survey_var = "acs1"
) |>
  mutate(
    variable_long = case_when(
      variable == "B01003_001" ~ "total_pop",
      variable == "B25049_004" ~ "plumbing",
      .default = NA_character_
    )
  ) |>
  select(geoid, name, variable_long, estimate, geometry, year) |>
  pivot_wider(
    names_from = variable_long,
    values_from = estimate
  ) |>
  mutate(
    percent_lacking_plumbing = (plumbing / total_pop) * 100
  )
```
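
The recode, pivot, and percentage steps in the script are identical for both years, so they could be pulled into one function. The sketch below shows that idea on made-up long-format rows, so it runs without a Census API key. `widen_plumbing()` and `toy_long` are hypothetical names that do not appear in the original script, and the toy input has no `geometry` column.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: turns long ACS output (one row per county and variable)
# into one row per county with the derived percentage.
widen_plumbing <- function(acs_long) {
  acs_long |>
    mutate(
      variable_long = case_when(
        variable == "B01003_001" ~ "total_pop",
        variable == "B25049_004" ~ "plumbing",
        .default = NA_character_
      )
    ) |>
    select(geoid, name, variable_long, estimate, year) |>
    pivot_wider(names_from = variable_long, values_from = estimate) |>
    mutate(percent_lacking_plumbing = (plumbing / total_pop) * 100)
}

# Made-up long-format rows standing in for get_acs() output (no geometry column).
toy_long <- tibble::tribble(
  ~geoid,  ~name,      ~variable,    ~estimate, ~year,
  "00001", "County A", "B01003_001", 100000,    "2023",
  "00001", "County A", "B25049_004", 50,        "2023",
  "00002", "County B", "B01003_001", 200000,    "2023",
  "00002", "County B", "B25049_004", 300,       "2023"
)

widen_plumbing(toy_long)
```

To use this on the real `get_census_data()` output, `geometry` would need to be added back to the `select()` call, as in the script above.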
Binary file added data/2025/2025-01-28/tidycensus-intro-banner.png