diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
index 5c5da50c..108dcf78 100644
--- a/CONTRIBUTING.md
+++ b/CONTRIBUTING.md
@@ -1,6 +1,6 @@
# Contributing
Thank you for considering improving this project! By participating, you
-agree to abide by the [code of conduct](https://github.com/ipums/ipumsr/blob/master/CONDUCT.md).
+agree to abide by the [code of conduct](https://tech.popdata.org/ipumsr/CODE_OF_CONDUCT.html).
# Issues (Reporting a problem or suggestion)
If you've experienced a problem with the package, or have a suggestion for it,
@@ -17,6 +17,7 @@ We'll do our best to answer your question.
# Pull Requests (Making changes to the package)
We appreciate pull requests that follow these guidelines:
+
1) Make sure that tests pass (and add new ones if possible).
2) Do your best to conform to the code style of the package, currently
diff --git a/R/api_process_extract.R b/R/api_process_extract.R
index dc43f63a..2b269713 100644
--- a/R/api_process_extract.R
+++ b/R/api_process_extract.R
@@ -766,7 +766,7 @@ extract_is_completed_and_has_links.micro_extract <- function(extract) {
is_complete <- extract$status == "completed"
has_codebook <- has_url(download_links, "ddi_codebook")
- has_data <- has_url(download_links, "data")
+ has_data <- has_url(download_links, "data")
is_complete && has_codebook && has_data
}
diff --git a/R/micro_read_chunked.R b/R/micro_read_chunked.R
index 0fac526d..8f997597 100644
--- a/R/micro_read_chunked.R
+++ b/R/micro_read_chunked.R
@@ -168,25 +168,25 @@
#' # the full dataset in memory
#' if (requireNamespace("biglm")) {
#' lm_results <- read_ipums_micro_chunked(
-#' ipums_example("cps_00160.xml"),
-#' IpumsBiglmCallback$new(
-#' INCTOT ~ AGE + HEALTH, # Model formula
-#' function(x, pos) {
-#' x %>%
-#' mutate(
-#' INCTOT = lbl_na_if(
-#' INCTOT,
-#' ~ grepl("Missing|N.I.U.", .lbl)
-#' ),
-#' HEALTH = as_factor(HEALTH)
-#' )
-#' }
-#' ),
-#' chunk_size = 1000,
-#' verbose = FALSE
-#' )
+#' ipums_example("cps_00160.xml"),
+#' IpumsBiglmCallback$new(
+#' INCTOT ~ AGE + HEALTH, # Model formula
+#' function(x, pos) {
+#' x %>%
+#' mutate(
+#' INCTOT = lbl_na_if(
+#' INCTOT,
+#' ~ grepl("Missing|N.I.U.", .lbl)
+#' ),
+#' HEALTH = as_factor(HEALTH)
+#' )
+#' }
+#' ),
+#' chunk_size = 1000,
+#' verbose = FALSE
+#' )
#'
-#' summary(lm_results)
+#' summary(lm_results)
#' }
read_ipums_micro_chunked <- function(
ddi,
diff --git a/R/viewer.R b/R/viewer.R
index bf62c6d5..680e29cc 100644
--- a/R/viewer.R
+++ b/R/viewer.R
@@ -61,8 +61,10 @@ ipums_view <- function(x, out_file = NULL, launch = TRUE) {
if (is.null(out_file)) {
if (!launch) {
rlang::warn(c(
- paste0("Some operating systems may have trouble opening an HTML ",
- "file from a temporary directory."),
+ paste0(
+ "Some operating systems may have trouble opening an HTML ",
+ "file from a temporary directory."
+ ),
"i" = "Use `out_file` to specify an alternate output location."
))
}
diff --git a/README.Rmd b/README.Rmd
index f4728d6a..5b1c4da4 100644
--- a/README.Rmd
+++ b/README.Rmd
@@ -53,7 +53,7 @@ remotes::install_github("ipums/ipumsr")
## What is IPUMS?
-[IPUMS](https://www.ipums.org/mission-purpose) is the world's largest
+[IPUMS](https://www.ipums.org) is the world's largest
publicly available population database, providing census and survey data
from around the world integrated across time and space. IPUMS
integration and documentation make it easy to study change, conduct
@@ -61,7 +61,7 @@ comparative research, merge information across data types, and analyze
individuals within family and community context. Data and services are
available free of charge.
-IPUMS consists of multiple projects, or collections, that provide
+IPUMS consists of multiple projects, or *collections*, that provide
different data products.
- **Microdata** projects distribute data for individual survey units,
@@ -71,7 +71,7 @@ statistics for particular geographic units along with corresponding
GIS mapping files.
ipumsr supports different levels of functionality for each IPUMS project, as
-summarized in the following table:
+summarized in the table below.
```{r}
#| echo: false
@@ -88,90 +88,90 @@ tbl_config <- list(
list(
img = "",
proj = "IPUMS USA",
- type = "Microdata",
- desc = "U.S. Census and American Community Survey microdata (1850-present)",
- read = checkmark(),
- request = checkmark(),
+ type = "Microdata",
+ desc = "U.S. Census and American Community Survey microdata (1850-present)",
+ read = checkmark(),
+ request = checkmark(),
metadata = ""
),
list(
img = "",
- proj = "IPUMS CPS",
- type = "Microdata",
- desc = "Current Population Survey microdata including basic monthly surveys and supplements (1962-present)",
- read = checkmark(),
- request = checkmark(),
+ proj = "IPUMS CPS",
+ type = "Microdata",
+ desc = "Current Population Survey microdata including basic monthly surveys and supplements (1962-present)",
+ read = checkmark(),
+ request = checkmark(),
metadata = ""
),
list(
img = "",
proj = "IPUMS International",
- type = "Microdata",
- desc = "Census microdata covering over 100 countries, contemporary and historical",
- read = checkmark(),
- request = checkmark(),
+ type = "Microdata",
+ desc = "Census microdata covering over 100 countries, contemporary and historical",
+ read = checkmark(),
+ request = checkmark(),
metadata = ""
),
list(
img = "",
- proj = "IPUMS NHGIS",
- type = "Aggregate Data",
- desc = "Tabular U.S. Census data and GIS mapping files (1790-present)",
+ proj = "IPUMS NHGIS",
+ type = "Aggregate Data",
+ desc = "Tabular U.S. Census data and GIS mapping files (1790-present)",
read = checkmark(),
- request = checkmark(),
+ request = checkmark(),
metadata = checkmark()
),
list(
img = "",
- proj = "IPUMS IHGIS",
- type = "Aggregate Data",
- desc = "Tabular and GIS data from population, housing, and agricultural censuses around the world",
- read = "",
- request = "",
+ proj = "IPUMS IHGIS",
+ type = "Aggregate Data",
+ desc = "Tabular and GIS data from population, housing, and agricultural censuses around the world",
+ read = "",
+ request = "",
metadata = ""
),
list(
img = "",
- proj = "IPUMS Time Use",
- type = "Microdata",
- desc = "Time use microdata from the U.S. (1930-present) and thirteen other countries (1965-present)",
- read = checkmark(),
- request = "",
+ proj = "IPUMS Time Use",
+ type = "Microdata",
+ desc = "Time use microdata from the U.S. (1930-present) and thirteen other countries (1965-present)",
+ read = checkmark(),
+ request = "",
metadata = ""
),
list(
img = "",
- proj = "IPUMS Health Surveys",
- type = "Microdata",
+ proj = "IPUMS Health Surveys",
+ type = "Microdata",
desc = paste0(
"Microdata from the U.S. ",
"National Health Interview Survey (NHIS) (1963-present) and ",
"Medical Expenditure Panel Survey (MEPS) (1996-present)"
),
- read = checkmark(),
- request = "",
+ read = checkmark(),
+ request = "",
metadata = ""
),
list(
img = "",
- proj = "IPUMS Global Health",
+ proj = "IPUMS Global Health",
type = "Microdata",
desc = paste0(
"Health survey microdata for low- and middle-income countries, including ",
"harmonized data collections for Demographic and Health Surveys (DHS) ",
"and Performance Monitoring for Action (PMA) surveys"
),
- read = checkmark(),
- request = "",
+ read = checkmark(),
+ request = "",
metadata = ""
),
list(
img = "",
- proj = "IPUMS Higher Ed",
- type = "Microdata",
- desc = "Survey microdata on the science and engineering workforce in the U.S. from 1993 to 2013",
- read = checkmark(),
- request = "",
+ proj = "IPUMS Higher Ed",
+ type = "Microdata",
+ desc = "Survey microdata on the science and engineering workforce in the U.S. from 1993 to 2013",
+ read = checkmark(),
+ request = "",
metadata = ""
)
)
@@ -196,25 +196,28 @@ knitr::kable(
ipumsr uses the [IPUMS API](https://developer.ipums.org/) to submit data
requests, download data extracts, and get metadata, so the scope of
-ipumsr functionality generally corresponds to the [available API
-functionality](https://developer.ipums.org/docs/v2/apiprogram/apis/). As
+functionality generally corresponds to that [available via the API](https://developer.ipums.org/docs/v2/apiprogram/apis/). As
the IPUMS team extends the API to support more functionality for more
projects, we aim to extend ipumsr capabilities accordingly.
## Getting started
If you're new to IPUMS data, learn more about what's available through
-the [IPUMS Projects Overview](https://www.ipums.org/overview).
+the [IPUMS Projects Overview](https://www.ipums.org/overview). Then, see
+`vignette("ipums")` for an overview of how to obtain IPUMS data.
-The package vignettes are the best place to learn about what's available in
-ipumsr itself:
+The package vignettes are the best place to explore what ipumsr has to offer:
- To read IPUMS data extracts into R, see `vignette("ipums-read")`.
-- To interact with the IPUMS extract system via the IPUMS API, see
- `vignette("ipums-api")`.
+
+- To interact with the IPUMS extract and metadata system via the IPUMS API,
+ see `vignette("ipums-api")`.
+
- For additional details about microdata and NHGIS extract requests, see
`vignette("ipums-api-micro")` and `vignette("ipums-api-nhgis")`.
+
- To work with labelled values in IPUMS data, see `vignette("value-labels")`.
+
- For techniques for working with large data extracts, see
`vignette("ipums-bigdata")`.
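For readers skimming this README, a minimal sketch of the basic workflow covered in `vignette("ipums-read")` may help; the file name below is a hypothetical placeholder for a downloaded extract, not a bundled example:

```r
library(ipumsr)

# Load the DDI codebook (.xml) that accompanies a microdata extract,
# then read the data file it describes. "cps_00001.xml" is a
# placeholder name standing in for your own downloaded extract.
ddi <- read_ipums_ddi("cps_00001.xml")
cps_data <- read_ipums_micro(ddi)
```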
@@ -243,9 +246,9 @@ We greatly appreciate feedback and development contributions. Please
submit any bug reports, pull requests, or other suggestions on
[GitHub](https://github.com/ipums/ipumsr/issues). Before contributing,
please be sure to read the [Contributing
-Guidelines](https://github.com/ipums/ipumsr/blob/master/CONTRIBUTING.md)
-and the [Code of
-Conduct](https://github.com/ipums/ipumsr/blob/master/CONDUCT.md).
+Guidelines](https://tech.popdata.org/ipumsr/CONTRIBUTING.html)
+and the
+[Code of Conduct](https://tech.popdata.org/ipumsr/CODE_OF_CONDUCT.html).
If you have general questions or concerns about IPUMS data, check out
our [user forum](https://forum.ipums.org) or send an email to
diff --git a/README.md b/README.md
index a4c61cbe..59288cba 100644
--- a/README.md
+++ b/README.md
@@ -42,15 +42,15 @@ remotes::install_github("ipums/ipumsr")
## What is IPUMS?
-[IPUMS](https://www.ipums.org/mission-purpose) is the world’s largest
-publicly available population database, providing census and survey data
-from around the world integrated across time and space. IPUMS
-integration and documentation make it easy to study change, conduct
-comparative research, merge information across data types, and analyze
-individuals within family and community context. Data and services are
-available free of charge.
-
-IPUMS consists of multiple projects, or collections, that provide
+[IPUMS](https://www.ipums.org) is the world’s largest publicly available
+population database, providing census and survey data from around the
+world integrated across time and space. IPUMS integration and
+documentation make it easy to study change, conduct comparative
+research, merge information across data types, and analyze individuals
+within family and community context. Data and services are available
+free of charge.
+
+IPUMS consists of multiple projects, or *collections*, that provide
different data products.
- **Microdata** projects distribute data for individual survey units,
@@ -60,7 +60,7 @@ different data products.
GIS mapping files.
ipumsr supports different levels of functionality for each IPUMS
-project, as summarized in the following table:
+project, as summarized in the table below.
@@ -298,26 +298,31 @@ from 1993 to 2013
ipumsr uses the [IPUMS API](https://developer.ipums.org/) to submit data
requests, download data extracts, and get metadata, so the scope of
-ipumsr functionality generally corresponds to the [available API
-functionality](https://developer.ipums.org/docs/v2/apiprogram/apis/). As
-the IPUMS team extends the API to support more functionality for more
-projects, we aim to extend ipumsr capabilities accordingly.
+functionality generally corresponds to that [available via the
+API](https://developer.ipums.org/docs/v2/apiprogram/apis/). As the IPUMS
+team extends the API to support more functionality for more projects, we
+aim to extend ipumsr capabilities accordingly.
## Getting started
If you’re new to IPUMS data, learn more about what’s available through
-the [IPUMS Projects Overview](https://www.ipums.org/overview).
+the [IPUMS Projects Overview](https://www.ipums.org/overview). Then, see
+`vignette("ipums")` for an overview of how to obtain IPUMS data.
-The package vignettes are the best place to learn about what’s available
-in ipumsr itself:
+The package vignettes are the best place to explore what ipumsr has to
+offer:
- To read IPUMS data extracts into R, see `vignette("ipums-read")`.
-- To interact with the IPUMS extract system via the IPUMS API, see
- `vignette("ipums-api")`.
+
+- To interact with the IPUMS extract and metadata system via the IPUMS
+ API, see `vignette("ipums-api")`.
+
- For additional details about microdata and NHGIS extract requests, see
`vignette("ipums-api-micro")` and `vignette("ipums-api-nhgis")`.
+
- To work with labelled values in IPUMS data, see
`vignette("value-labels")`.
+
- For techniques for working with large data extracts, see
`vignette("ipums-bigdata")`.
@@ -346,9 +351,8 @@ We greatly appreciate feedback and development contributions. Please
submit any bug reports, pull requests, or other suggestions on
[GitHub](https://github.com/ipums/ipumsr/issues). Before contributing,
please be sure to read the [Contributing
-Guidelines](https://github.com/ipums/ipumsr/blob/master/CONTRIBUTING.md)
-and the [Code of
-Conduct](https://github.com/ipums/ipumsr/blob/master/CONDUCT.md).
+Guidelines](https://tech.popdata.org/ipumsr/CONTRIBUTING.html) and the
+[Code of Conduct](https://tech.popdata.org/ipumsr/CODE_OF_CONDUCT.html).
If you have general questions or concerns about IPUMS data, check out
our [user forum](https://forum.ipums.org) or send an email to
diff --git a/docs/CONDUCT.html b/docs/CONDUCT.html
deleted file mode 100644
index d3f87ab2..00000000
--- a/docs/CONDUCT.html
+++ /dev/null
@@ -1,112 +0,0 @@
-
-Contributor Code of Conduct • ipumsr
- Skip to contents
-
-
-
As contributors and maintainers of this project, we pledge to respect all people who contribute through reporting issues, posting feature requests, updating documentation, submitting pull requests or patches, and other activities.
-
We are committed to making participation in this project a harassment-free experience for everyone, regardless of level of experience, gender, gender identity and expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, or religion.
-
Examples of unacceptable behavior by participants include the use of sexual language or imagery, derogatory comments or personal attacks, trolling, public or private harassment, insults, or other unprofessional conduct.
-
Project maintainers have the right and responsibility to remove, edit, or reject comments, commits, code, wiki edits, issues, and other contributions that are not aligned to this Code of Conduct. Project maintainers who do not follow the Code of Conduct may be removed from the project team.
-
Instances of abusive, harassing, or otherwise unacceptable behavior may be reported by opening an issue or contacting one or more of the project maintainers.
Thank you for considering improving this project! By participating, you agree to abide by the code of conduct.
+
Thank you for considering improving this project! By participating, you agree to abide by the code of conduct.
Issues (Reporting a problem or suggestion)
@@ -93,8 +93,9 @@ Issues (Reporting a problem or
Pull Requests (Making changes to the package)
-
We appreciate pull requests that follow these guidelines: 1) Make sure that tests pass (and add new ones if possible).
-
Do your best to conform to the code style of the package, currently based on the tidyverse style guide. See the styler package to easily catch stylistic errors.
+
We appreciate pull requests that follow these guidelines:
+
Make sure that tests pass (and add new ones if possible).
+
Do your best to conform to the code style of the package, currently based on the tidyverse style guide. See the styler package to easily catch stylistic errors.
Please add your name and affiliation to the NOTICE.txt file.
Summarize your changes in the NEWS.md file.
diff --git a/docs/articles/cps_select_data.jpg b/docs/articles/cps_select_data.jpg
new file mode 100644
index 00000000..1b2c8879
Binary files /dev/null and b/docs/articles/cps_select_data.jpg differ
diff --git a/docs/articles/ipums-api-micro.html b/docs/articles/ipums-api-micro.html
index 5de7916b..98ff9856 100644
--- a/docs/articles/ipums-api-micro.html
+++ b/docs/articles/ipums-api-micro.html
@@ -144,8 +144,8 @@
This will add a new variable (in this case, SEX_SP) to the output
-data that will contain the sex of a person’s spouse (if no such record
-exists, the value will be 0).
+
This will add a new variable (in this case, SEX_SP) to
+the output data that will contain the sex of a person’s spouse (if no
+such record exists, the value will be 0).
Multiple attached characteristics can be requested for a single
variable:
We can use basic functions from dplyr to filter the metadata to
-those records of interest. For instance, if we wanted to find all the
-data sources related to agriculture from the 1900 Census, we could
-filter on group and description:
+
We can use basic functions from dplyr to filter the
+metadata to those records of interest. For instance, if we wanted to
+find all the data sources related to agriculture from the 1900 Census,
+we could filter on group and description:
This provides a comprehensive list of the possible specifications for
-the input data source. For instance, for the 1900_cAg dataset, we have
-66 tables to choose from, and 3 possible geographic levels:
+the input data source. For instance, for the 1900_cAg
+dataset, we have 66 tables to choose from, and 3 possible geographic
+levels:
Let’s say we’re interested in getting state-level data on the number
-of farms and their average size from the 1900_cAg dataset that we
-identified above. As we can see in the metadata, these data are
-contained in tables NT2 and NT3:
+of farms and their average size from the 1900_cAg dataset
+that we identified above. As we can see in the metadata, these data are
+contained in tables NT2 and NT3:
To request these data, we need to make an explicit dataset
specification. All datasets must be associated with a selection of
data tables and geographic levels. We can use the ds_spec()
@@ -380,8 +384,8 @@
(Dataset specifications can also include selections for
+
Dataset specifications can also include selections for
years and breakdown_values, but these are not
-available for all datasets.)
+available for all datasets.
+
+
+
Time series table specifications
+
Similarly, to make a request for time series tables, use the
tst_spec() helper. This makes a tst_spec
object containing a time series table specification.
@@ -419,7 +427,7 @@ Basic extract definitions
define_extract_nhgis(
  description = "Example time series table request",
  time_series_tables = tst_spec(
-    "CW3",
+    "CW3",
    geog_levels = c("county", "tract"),
    years = c("1990", "2000")
  )
@@ -430,10 +438,29 @@
Note that it is still possible to make invalid extract requests (for
-instance, by requesting a dataset or table that doesn’t exist). This
-kind of issue will be caught upon submission to the API, not upon the
-creation of the extract definition.
-
Shapefiles don’t have any additional specification options, and
-therefore can be requested simply by providing their names:
+instance, by requesting a dataset or data table that doesn’t exist).
+This kind of issue will be caught upon submission to the API, not upon
+the creation of the extract definition.
+
More complicated extract definitions
@@ -489,8 +506,8 @@ More complicated extract definitio
easier to generate the specifications independently before creating your
extract request object. You can quickly create multiple
ds_spec objects by iterating across the specifications you
-want to include. Here, we use purrr to do so, but you could also use a
-for loop:
+want to include. Here, we use purrr to do so, but you
+could also use a for loop:
More complicated extract definitio
# data tables and geog levels indicated above
datasets <- purrr::map(
  ds_names,
-  ~ ds_spec(
-    name = .x,
-    data_tables = tables,
-    geog_levels = geogs
-  )
+  ~ ds_spec(name = .x, data_tables = tables, geog_levels = geogs)
)

nhgis_ext <- define_extract_nhgis(
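The iteration pattern in the hunk above can be sketched as follows; the dataset names, tables, and geographic levels are hypothetical placeholders standing in for the vectors defined earlier in the article:

```r
library(ipumsr)
library(purrr)

# Hypothetical specifications shared across several datasets.
ds_names <- c("1900_cAg", "1910_cAg")
tables <- c("NT2", "NT3")
geogs <- "state"

# Build one ds_spec per dataset, reusing the same tables and geog levels.
datasets <- purrr::map(
  ds_names,
  ~ ds_spec(name = .x, data_tables = tables, geog_levels = geogs)
)

nhgis_ext <- define_extract_nhgis(
  description = "Several datasets with shared specifications",
  datasets = datasets
)
```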
diff --git a/docs/articles/ipums-api.html b/docs/articles/ipums-api.html
index 80df8a5e..4699e070 100644
--- a/docs/articles/ipums-api.html
+++ b/docs/articles/ipums-api.html
@@ -122,12 +122,14 @@ IPUMS API
The IPUMS API provides two asset types, both of which are supported
by ipumsr:
-
IPUMS extract endpoints can be used to submit
+
+IPUMS extract endpoints can be used to submit
extract requests for processing and download completed extract
-files.
-
IPUMS metadata endpoints can be used to discover
+files.
+
+IPUMS metadata endpoints can be used to discover
and explore available IPUMS data as well as retrieve codes, names, and
-other extract parameters necessary to form extract requests.
+other extract parameters necessary to form extract requests.
Use of the IPUMS API enables the adoption of a programmatic workflow
that can help users to:
Browsing for IPUMS data can be a little like grocery shopping when
you’re hungry—you show up to grab a couple things, but everything looks
-so good that you end up with an overflowing cart1. Unfortunately, this
+so good that you end up with an overflowing cart.1 Unfortunately, this
can lead to extracts so large that they don’t fit in your computer’s
memory.
If you’ve got an extract that’s too big, both the IPUMS website and
@@ -208,8 +208,9 @@
ipumsr provides two related options for reading data sources in
increments:
-
Chunked functions allow you to specify a function that
-will be called on each chunk of data as it is read in as well as how you
+
+Chunked functions allow you to specify a function that will
+be called on each chunk of data as it is read in as well as how you
would like the chunks to be combined at the end. These functions use the
readr framework
-for reading chunked data.
-
Yielded functions allow more flexibility by returning
+for reading chunked data.
+
+Yielded functions allow more flexibility by returning
control to the user between the loading of each piece of data. These
-functions are unique to ipumsr and fixed-width data.
+functions are unique to ipumsr and fixed-width data.
R has several tools that support database integration, including DBI, dbplyr, sparklyr, sparkR, bigrquery, and others. In this
-example, we’ll use RSQLite to load the data into an in-memory database.
-(We use RSQLite because it is easy to set up, but it is likely not
-efficient enough to fully resolve issues with large IPUMS data, so it
-may be wise to consider an alternative in practice.)
+
+
Importing data into the database
+
Connecting the database to R
+
+
R has several tools that support database integration, including
+DBI, dbplyr, sparklyr,
+bigrquery, and others. In this example, we’ll use
+RSQLite to load the data into an in-memory database. (We
+use RSQLite because it is easy to set up, but it is likely not efficient
+enough to fully resolve issues with large IPUMS data, so it may be wise
+to consider an alternative in practice.)
IPUMS microdata can come in either “rectangular” or “hierarchical”
-format.
+
IPUMS microdata can come in either rectangular or
+hierarchical format.
Rectangular data are transformed such that every row of data
represents the same type of record. For instance, each row will
represent a person record, and all household-level information for that
-person will be included in the same row. (This is the case for the CPS
-example above.)
+person will be included in the same row. (This is the case for
+cps_data shown in the example above.)
Hierarchical data have records of different types interspersed in a
single file. For instance, a household record will be included in its
own row followed by the person records associated with that
@@ -324,11 +324,11 @@
The long format consists of a single data.frame that
-includes rows with varying record types. In this example, some rows have
-a record type of “Household” and others have a record type of “Person”.
-Variables that do not apply to a particular record type will be filled
-with NA in rows of that record type.
+
The long format consists of a single tibble
+that includes rows with varying record types. In this example, some rows
+have a record type of “Household” and others have a record type of
+“Person”. Variables that do not apply to a particular record type will
+be filled with NA in rows of that record type.
To read data in list format, use
read_ipums_micro_list(). This function returns a list where
each element contains all the records for a given record type:
Variable metadata for NHGIS data are slightly different than those
-provided by microdata products. First, they come from a .txt codebook
-file rather than an .xml DDI file. Codebooks can still be loaded into an
-ipums_ddi object, but fields that do not apply to aggregate
-data will be empty. In general, NHGIS codebooks provide only variable
-labels and descriptions, along with citation information.
+
However, variable metadata for NHGIS data are slightly different than
+those provided by microdata products. First, they come from a .txt
+codebook file rather than an .xml DDI file. Codebooks can still be
+loaded into an ipums_ddi object, but fields that do not
+apply to aggregate data will be empty. In general, NHGIS codebooks
+provide only variable labels and descriptions, along with citation
+information.
By design, NHGIS codebooks are human-readable. To view the codebook
-contents themselves without converting to an ipums_ddi
-object, set raw = TRUE.
+
By design, NHGIS codebooks are human-readable, and it may be easier
+to interpret their contents in raw format. To view the codebook itself
+without converting to an ipums_ddi object, set
+raw = TRUE.
In the above example, read_nhgis_codebook() was able to
-identify and load the codebook file, even though the provided file path
-is the same that was provided to read_nhgis() earlier.
-However, for more complicated NHGIS extracts that include data from
-multiple data sources, the provided .zip archive will contain multiple
-codebook and data files.
+
For more complicated NHGIS extracts that include data from multiple
+data sources, the provided .zip archive will contain multiple codebook
+and data files.
You can view the files contained in an extract to determine if this
is the case:
NHGIS data are most easily handled when in .csv format.
+
NHGIS data are most easily handled in .csv format.
read_nhgis() uses readr::read_csv() to handle
the generation of column type specifications. If the guessed
specifications are incorrect, you can use the col_types
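The `col_types` override described above can be sketched as follows; the file name is a hypothetical placeholder, and the column choice assumes a geographic code that should keep its leading zeros:

```r
library(ipumsr)

# Force a geographic code column to be read as character rather than
# numeric so leading zeros survive. "nhgis0001_csv.zip" is a
# placeholder for your own NHGIS extract.
nhgis <- read_nhgis(
  "nhgis0001_csv.zip",
  col_types = readr::cols(GISJOIN = readr::col_character())
)
```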
@@ -611,12 +609,10 @@
Note that in this case numeric geographic codes are correctly loaded
-as character variables. The correct parsing of NHGIS fixed-width files
-is driven by the column parsing information contained in the .do file
-provided in the .zip archive. This contains information not only about
-column positions and data types, but also implicit decimals in the
-data.
+
The correct parsing of NHGIS fixed-width files is driven by the
+column parsing information contained in the .do file provided in the
+.zip archive. This contains information not only about column positions
+and data types, but also implicit decimals in the data.
If you no longer have access to the .do file, it is best to resubmit
and/or re-download the extract (you may also consider converting to .csv
format in the process). If you have moved the .do file, provide its file
@@ -636,25 +632,21 @@
to load spatial data from any of
-these sources (ipumsr is phasing out support for objects from the
-sp package. If you prefer to work with these objects, use
-sf::as_Spatial() to convert from sf to
-sp).
+these sources as an sf object from sf.
read_ipums_sf() also supports the loading of spatial
files within .zip archives and the file_select syntax for
-file selection (we don’t need file_select in this example
-because there is only one shapefile in this example extract).
+file selection when multiple internal files are present.
These data can then be joined to associated tabular data. To preserve
-IPUMS attributes from the tabular data used in the join, use
-anipums_shape_*_join function:
+IPUMS attributes from the tabular data used in the join, use an
+ipums_shape_*_join() function:
For NHGIS data, the join code typically corresponds to the “GISJOIN”
-variable. However, for microdata projects, the variable name used for a
-geographic level in the tabular data may differ from that in the spatial
-data. Consult the documentation and metadata for these files to identify
-the correct join columns and use the by argument to join on
-these columns.
+
For NHGIS data, the join code typically corresponds to the
+GISJOIN variable. However, for microdata projects, the
+variable name used for a geographic level in the tabular data may differ
+from that in the spatial data. Consult the documentation and metadata
+for these files to identify the correct join columns and use the
+by argument to join on these columns.
Once joined, data include both statistical and spatial information
along with the variable metadata.
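A sketch of such a join, assuming hypothetical object names for the tabular and spatial data loaded earlier:

```r
library(ipumsr)

# Join NHGIS tabular data to its boundary file on the GISJOIN code,
# preserving IPUMS variable metadata from the tabular side.
# nhgis_data and nhgis_shape are placeholder object names.
joined <- ipums_shape_inner_join(
  nhgis_data,
  nhgis_shape,
  by = "GISJOIN"
)
```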
@@ -707,11 +699,11 @@ Harmonized vs. non-harmonized data<
that geographic boundaries shift over time. IPUMS therefore provides
multiple types of spatial data:
-
Harmonized (also called “integrated” or “consistent”) files have
+
Harmonized (also called “integrated” or “consistent”) files have
been made consistent over time by combining geographies that share area
-for different time periods.
-
Non-harmonized, or year-specific, files represent geographies at
-a specific point in time.
+for different time periods.
+
Non-harmonized, or year-specific, files represent geographies at a
+specific point in time.
Furthermore, some NHGIS time series tables have been standardized
such that the statistics have been adjusted to apply to a year-specific
diff --git a/docs/articles/ipums.html b/docs/articles/ipums.html
index ad89e620..ca80e422 100644
--- a/docs/articles/ipums.html
+++ b/docs/articles/ipums.html
@@ -119,9 +119,9 @@ IPUMS API
-
This text provides an overview of how to find, request, download, and
-read IPUMS data into R. For a general introduction to IPUMS and ipumsr,
-see the ipumsr home
+
Obtaining IPUMS data
certain
-IPUMS projects, which also determines the functionality that ipumsr
-can support.
+the IPUMS website or the IPUMS API.
+ipumsr provides a set of client tools to interface with the API. Note
+that only certain
+IPUMS projects are currently supported by the IPUMS API.
Obtaining data via an IPUMS project website
-
To create a new extract request via an IPUMS project website,
-navigate to the extract interface for the IPUMS project of interest by
-clicking Select Data in the heading of the project
-website. The project extract interface allows you to explore what’s
-available, find documentation about data concepts and sources, and then
+
To create a new extract request via an IPUMS project website (e.g. IPUMS CPS), navigate to the
+extract interface for that project by clicking Select
+Data in the heading of the project website.
+
+
The project’s extract interface allows you to explore what’s
+available, find documentation about data concepts and sources, and
specify the data you’d like to download. The data selection parameters
will differ across projects; see each project’s documentation for more
-details on the available options. If you’ve never created an extract for
-the project you’re interested in, a good way to learn the basics is to
-watch a project-specific video on creating extracts hosted on the IPUMS Tutorials
+details on the available options.
+
If you’ve never created an extract for the project you’re interested
+in, a good way to learn the basics is to watch a project-specific video
+on creating extracts hosted on the IPUMS Tutorials
page.
Downloading from microdata projects
@@ -169,18 +170,18 @@
Downloading from microdata projects
button to download the data file. Then, right-click the
DDI link in the Codebook column, and select
Save Link As… (see below).
+
Note that some browsers may display different text, but there should
-be an option to download the DDI file as .xml. For instance, on Safari,
-select Download Linked File As…. For ipumsr to read the
-metadata, it is necessary to save the file in .xml format,
+be an option to download the DDI file as .xml. (For instance, on Safari,
+select Download Linked File As….) For ipumsr to read
+the metadata, you must save the file in .xml format,
not .html format.
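Once the DDI codebook and data file are saved together, reading them back is a short sketch (the filename below is illustrative, not from the vignette):

```r
library(ipumsr)

# Read the DDI codebook (.xml), then the data it describes.
# The .dat.gz data file should sit in the same directory as the DDI.
ddi <- read_ipums_ddi("cps_00001.xml")  # illustrative filename
cps_data <- read_ipums_micro(ddi, verbose = FALSE)
```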
-
Downloading from aggregate data projects
Aggregate data projects include data and metadata together in a
-single .zip archive file. To download them, simply click on the green
+single .zip archive. To download them, simply click on the green
Tables button (for tabular data) and/or GIS
Files button (for spatial boundary or location data) in the
Download Data column.
@@ -190,27 +191,53 @@
Downloading from aggregate dat
Obtaining data via the IPUMS API
Users can also create and submit extract requests within R by using
-ipumsr functions that interface with the IPUMS API. The IPUMS API
-currently supports access to the extract system for the following
+ipumsr functions that interface with the IPUMS API. The IPUMS API
+currently supports access to the extract system for certain
+IPUMS collections.
+
+
Extract support
+
+
ipumsr provides an interface to the IPUMS extract system via the
+IPUMS API for the following collections:
+
+
IPUMS USA
+
IPUMS CPS
+
IPUMS International
+
IPUMS NHGIS
+
+
+
+
Metadata support
+
+
ipumsr provides access to comprehensive metadata via the IPUMS API
+for the following collections:
+
+
IPUMS NHGIS
+
+
Users can query NHGIS metadata to explore available data when
+specifying NHGIS extract requests.
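A hedged sketch of such a metadata query, assuming a registered API key and a recent ipumsr version providing `get_metadata_nhgis()`:

```r
library(ipumsr)

# Browse NHGIS metadata via the API:
datasets <- get_metadata_nhgis("datasets")        # summary of all datasets
tsts <- get_metadata_nhgis("time_series_tables")  # all time series tables

# Drill into a single dataset; the dataset name here is illustrative:
stf1 <- get_metadata_nhgis(dataset = "1990_STF1")
stf1$data_tables
```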
+
A listing of available samples is provided for the following
collections:
-
IPUMS USA
-
IPUMS CPS
-
IPUMS International
-
IPUMS NHGIS
+
IPUMS USA
+
IPUMS CPS
+
IPUMS International
-
The IPUMS API and ipumsr also support access to IPUMS NHGIS metadata,
-so users can query NHGIS metadata in R to explore what data are
-available and specify NHGIS data requests. At this time, creating
-requests for microdata generally requires using the corresponding
-project websites to find samples and variables of interest and obtain
-their identifiers for use in R extract definitions.
+
Increased access to metadata for these projects is in progress.
+Currently, creating extract requests for these projects requires using
+the corresponding project websites to find samples and variables of
+interest and obtain their API identifiers for use in R extract
+definitions.
+
+
+
Workflow
+
Once you have identified the data you would like to request, the
-workflow for requesting and downloading data via API is straightforward.
-First, define the parameters of your extract. The available extract
+workflow for requesting and downloading data via API is
+straightforward.
+
First, define the parameters of your extract. The available extract
definition options will differ by IPUMS data collection. See the microdata API request and NHGIS API request vignettes for more
-details on defining an extract. (The NHGIS vignette also discusses how
-to access NHGIS metadata.)
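The workflow described above can be sketched end to end; sample and variable names are illustrative:

```r
library(ipumsr)

# Define -> submit -> wait -> download -> read.
ext <- define_extract_cps(
  description = "Workflow example",
  samples = "cps2019_03s",
  variables = c("AGE", "SEX")
)

submitted <- submit_extract(ext)      # sends the request to the API
ready <- wait_for_extract(submitted)  # polls until processing completes
files <- download_extract(ready)      # downloads data + DDI codebook
data <- read_ipums_micro(files, verbose = FALSE)
```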
Reading IPUMS data
-readr in two ways:
+functions expand on those provided in readr in two
+ways:
-
ipumsr anticipates standard IPUMS file structures, limiting the
-need for users to manually extract and organize their downloaded files
-before reading.
-
ipumsr uses an extract’s metadata files to automatically attach
+
ipumsr anticipates standard IPUMS file structures, limiting the need
+for users to manually extract and organize their downloaded files before
+reading.
+
ipumsr uses an extract’s metadata files to automatically attach
contextual information to the data. This allows users to easily identify
-variable names, variable descriptions, and labeled data values (from haven), which are common in
-IPUMS files.
+variable names, variable descriptions, and labeled data values (from
+haven), which are common in IPUMS files.
-
For microdata files, use the read_ipums_micro_*()
-family:
nhgis_file <- ipums_example("nhgis0972_csv.zip")
-nhgis_data<-read_nhgis(nhgis_file)
-#> Use of data from NHGIS is subject to conditions including that users should cite the data appropriately. Use command `ipums_conditions()` for more details.
-#> Rows: 71Columns: 25
-#> ──Column specification────────────────────────────────────────────────────────
-#> Delimiter: ","
-#> chr (9): GISJOIN, STUSAB, CMSA, PMSA, PMSAA, AREALAND, AREAWAT, ANPSADPI, F...
-#> dbl (13): YEAR, MSA_CMSAA, INTPTLAT, INTPTLNG, PSADC, D6Z001, D6Z002, D6Z003...
-#> lgl (3): DIVISIONA, REGIONA, STATEA
-#>
-#> ℹ Use `spec()` to retrieve the full column specification for this data.
-#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
+nhgis_data <- read_nhgis(nhgis_file, verbose = FALSE)
+head(nhgis_data)
+#> # A tibble: 6 × 25
@@ -303,6 +330,10 @@
ipumsr is primarily designed to read data produced by the IPUMS
extract system. However, IPUMS does distribute other files, often
available via direct download. In many cases, these can be loaded with
ipumsr. Otherwise, these files can likely be handled by existing data
-reading packages like readr
-(for delimited files) or haven (for Stata, SPSS, or SAS
-files).
ipumsr also provides a family of lbl_*() functions to
assist in accessing and manipulating the value-level metadata included
in IPUMS data. This allows for value labels to be incorporated into the
@@ -416,6 +452,7 @@
This indicates that the data contained in these columns are integers
but include value labels. You can use the function
is.labelled() to determine if a variable is indeed
@@ -289,14 +286,16 @@
Cautions regarding labelled
While labelled variables provide the benefits described
above, they also present challenges.
For example, you may have noticed that both of the means
-calculated above are suspect.
-
In the case of AGE_FACTOR, the values have been remapped
-during conversion and several are inconsistent with the original
-data.
-
In the case of AGE, we have considered all people over
+calculated above are suspect:
+
+
In the case of AGE_FACTOR, the values have been
+remapped during conversion and several are inconsistent with the
+original data.
+
In the case of AGE, we have considered all people over
90 to be exactly 90, and all people over 99 to be exactly
99—labelled variables don’t ensure that calculations are
-correct any more than factors do!
+correct any more than factors do!
+
Furthermore, many R functions ignore value labels or even actively
remove them from the data:
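An illustration of this label loss (not from the vignette itself): base `ifelse()` silently drops the labelled class and its value labels.

```r
library(haven)

# A small labelled vector:
x <- labelled(c(1, 2, 2, 1), c(Male = 1, Female = 2))

# ifelse() returns a plain numeric vector; the labels are gone:
x2 <- ifelse(x == 2, 99, x)

is.labelled(x)   # TRUE
is.labelled(x2)  # FALSE
```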
On the right-hand side, provide a function that returns
+
On the left-hand side, use the lbl() helper to define a
+new value-label pair.
+
On the right-hand side, provide a function that returns
TRUE for those value-label pairs that should be relabelled
-with the new value-label pair from the left-hand side.
+with the new value-label pair from the left-hand side.
The function again uses the .val and .lbl syntax mentioned above to refer to values and
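The two-sided syntax described above can be sketched with `lbl_relabel()`; the values and labels here are illustrative:

```r
library(ipumsr)
library(haven)

x <- labelled(
  c(10, 92, 95),
  c(`Age 10` = 10, `Age 92` = 92, `Age 95` = 95)
)

# Map every pair whose value exceeds 90 to a single new pair, 90 = "90+":
lbl_relabel(x, lbl(90, "90+") ~ .val > 90)
```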
@@ -728,18 +727,17 @@
The labelled
-package provides other methods for manipulating value labels, some of
-which overlap those provided by ipumsr.
-
The questionr package
-includes functions for exploring labelled variables. In
-particular, the functions describe, freq and
-lookfor all print out to console information about the
-variable using the value labels.
The labelled package provides other methods for
+manipulating value labels, some of which overlap those provided by
+ipumsr.
+
The questionr package includes functions for exploring
+labelled variables. In particular, the functions
+describe, freq and lookfor all
+print information about the variable to the console using the value
+labels.
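A short sketch of these questionr functions, assuming the package is installed:

```r
library(haven)

if (requireNamespace("questionr", quietly = TRUE)) {
  x <- labelled(c(1, 2, 2, 1), c(Male = 1, Female = 2))
  questionr::freq(x)      # frequency table displayed with value labels
  questionr::describe(x)  # compact summary that includes the labels
}
```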
Finally, the foreign and prettyR packages
don’t use the labelled class, but provide similar
functionality for handling value labels, which could be adapted for use
diff --git a/docs/bootstrap-toc.css b/docs/bootstrap-toc.css
deleted file mode 100644
index 5a859415..00000000
--- a/docs/bootstrap-toc.css
+++ /dev/null
@@ -1,60 +0,0 @@
-/*!
- * Bootstrap Table of Contents v0.4.1 (http://afeld.github.io/bootstrap-toc/)
- * Copyright 2015 Aidan Feldman
- * Licensed under MIT (https://github.com/afeld/bootstrap-toc/blob/gh-pages/LICENSE.md) */
-
-/* modified from https://github.com/twbs/bootstrap/blob/94b4076dd2efba9af71f0b18d4ee4b163aa9e0dd/docs/assets/css/src/docs.css#L548-L601 */
-
-/* All levels of nav */
-nav[data-toggle='toc'] .nav > li > a {
- display: block;
- padding: 4px 20px;
- font-size: 13px;
- font-weight: 500;
- color: #767676;
-}
-nav[data-toggle='toc'] .nav > li > a:hover,
-nav[data-toggle='toc'] .nav > li > a:focus {
- padding-left: 19px;
- color: #563d7c;
- text-decoration: none;
- background-color: transparent;
- border-left: 1px solid #563d7c;
-}
-nav[data-toggle='toc'] .nav > .active > a,
-nav[data-toggle='toc'] .nav > .active:hover > a,
-nav[data-toggle='toc'] .nav > .active:focus > a {
- padding-left: 18px;
- font-weight: bold;
- color: #563d7c;
- background-color: transparent;
- border-left: 2px solid #563d7c;
-}
-
-/* Nav: second level (shown on .active) */
-nav[data-toggle='toc'] .nav .nav {
- display: none; /* Hide by default, but at >768px, show it */
- padding-bottom: 10px;
-}
-nav[data-toggle='toc'] .nav .nav > li > a {
- padding-top: 1px;
- padding-bottom: 1px;
- padding-left: 30px;
- font-size: 12px;
- font-weight: normal;
-}
-nav[data-toggle='toc'] .nav .nav > li > a:hover,
-nav[data-toggle='toc'] .nav .nav > li > a:focus {
- padding-left: 29px;
-}
-nav[data-toggle='toc'] .nav .nav > .active > a,
-nav[data-toggle='toc'] .nav .nav > .active:hover > a,
-nav[data-toggle='toc'] .nav .nav > .active:focus > a {
- padding-left: 28px;
- font-weight: 500;
-}
-
-/* from https://github.com/twbs/bootstrap/blob/e38f066d8c203c3e032da0ff23cd2d6098ee2dd6/docs/assets/css/src/docs.css#L631-L634 */
-nav[data-toggle='toc'] .nav > .active > ul {
- display: block;
-}
diff --git a/docs/bootstrap-toc.js b/docs/bootstrap-toc.js
deleted file mode 100644
index 1cdd573b..00000000
--- a/docs/bootstrap-toc.js
+++ /dev/null
@@ -1,159 +0,0 @@
-/*!
- * Bootstrap Table of Contents v0.4.1 (http://afeld.github.io/bootstrap-toc/)
- * Copyright 2015 Aidan Feldman
- * Licensed under MIT (https://github.com/afeld/bootstrap-toc/blob/gh-pages/LICENSE.md) */
-(function() {
- 'use strict';
-
- window.Toc = {
- helpers: {
- // return all matching elements in the set, or their descendants
- findOrFilter: function($el, selector) {
- // http://danielnouri.org/notes/2011/03/14/a-jquery-find-that-also-finds-the-root-element/
- // http://stackoverflow.com/a/12731439/358804
- var $descendants = $el.find(selector);
- return $el.filter(selector).add($descendants).filter(':not([data-toc-skip])');
- },
-
- generateUniqueIdBase: function(el) {
- var text = $(el).text();
- var anchor = text.trim().toLowerCase().replace(/[^A-Za-z0-9]+/g, '-');
- return anchor || el.tagName.toLowerCase();
- },
-
- generateUniqueId: function(el) {
- var anchorBase = this.generateUniqueIdBase(el);
- for (var i = 0; ; i++) {
- var anchor = anchorBase;
- if (i > 0) {
- // add suffix
- anchor += '-' + i;
- }
- // check if ID already exists
- if (!document.getElementById(anchor)) {
- return anchor;
- }
- }
- },
-
- generateAnchor: function(el) {
- if (el.id) {
- return el.id;
- } else {
- var anchor = this.generateUniqueId(el);
- el.id = anchor;
- return anchor;
- }
- },
-
- createNavList: function() {
- return $('
IPUMS is the world’s largest publicly available population database, providing census and survey data from around the world integrated across time and space. IPUMS integration and documentation make it easy to study change, conduct comparative research, merge information across data types, and analyze individuals within family and community context. Data and services are available free of charge.
-
IPUMS consists of multiple projects, or collections, that provide different data products.
+
IPUMS is the world’s largest publicly available population database, providing census and survey data from around the world integrated across time and space. IPUMS integration and documentation make it easy to study change, conduct comparative research, merge information across data types, and analyze individuals within family and community context. Data and services are available free of charge.
+
IPUMS consists of multiple projects, or collections, that provide different data products.
Microdata projects distribute data for individual survey units, like people or households.
Aggregate data projects distribute summary tables of aggregate statistics for particular geographic units along with corresponding GIS mapping files.
-
ipumsr supports different levels of functionality for each IPUMS project, as summarized in the following table:
+
ipumsr supports different levels of functionality for each IPUMS project, as summarized in the table below.
@@ -369,19 +369,19 @@
What is IPUMS?
-
ipumsr uses the IPUMS API to submit data requests, download data extracts, and get metadata, so the scope of ipumsr functionality generally corresponds to the available API functionality. As the IPUMS team extends the API to support more functionality for more projects, we aim to extend ipumsr capabilities accordingly.
+
ipumsr uses the IPUMS API to submit data requests, download data extracts, and get metadata, so the scope of functionality generally corresponds to that available via the API. As the IPUMS team extends the API to support more functionality for more projects, we aim to extend ipumsr capabilities accordingly.
Getting started
-
If you’re new to IPUMS data, learn more about what’s available through the IPUMS Projects Overview.
-
The package vignettes are the best place to learn about what’s available in ipumsr itself:
+
If you’re new to IPUMS data, learn more about what’s available through the IPUMS Projects Overview. Then, see vignette("ipums") for an overview of how to obtain IPUMS data.
+
The package vignettes are the best place to explore what ipumsr has to offer:
The IPUMS support website also houses many project-specific R-based training exercises. However, note that some of these exercises may not be up to date with ipumsr’s current functionality.
We greatly appreciate feedback and development contributions. Please submit any bug reports, pull requests, or other suggestions on GitHub. Before contributing, please be sure to read the Contributing Guidelines and the Code of Conduct.
+
We greatly appreciate feedback and development contributions. Please submit any bug reports, pull requests, or other suggestions on GitHub. Before contributing, please be sure to read the Contributing Guidelines and the Code of Conduct.
If you have general questions or concerns about IPUMS data, check out our user forum or send an email to ipums@umn.edu.
diff --git a/docs/pkgdown.css b/docs/pkgdown.css
deleted file mode 100644
index 80ea5b83..00000000
--- a/docs/pkgdown.css
+++ /dev/null
@@ -1,384 +0,0 @@
-/* Sticky footer */
-
-/**
- * Basic idea: https://philipwalton.github.io/solved-by-flexbox/demos/sticky-footer/
- * Details: https://github.com/philipwalton/solved-by-flexbox/blob/master/assets/css/components/site.css
- *
- * .Site -> body > .container
- * .Site-content -> body > .container .row
- * .footer -> footer
- *
- * Key idea seems to be to ensure that .container and __all its parents__
- * have height set to 100%
- *
- */
-
-html, body {
- height: 100%;
-}
-
-body {
- position: relative;
-}
-
-body > .container {
- display: flex;
- height: 100%;
- flex-direction: column;
-}
-
-body > .container .row {
- flex: 1 0 auto;
-}
-
-footer {
- margin-top: 45px;
- padding: 35px 0 36px;
- border-top: 1px solid #e5e5e5;
- color: #666;
- display: flex;
- flex-shrink: 0;
-}
-footer p {
- margin-bottom: 0;
-}
-footer div {
- flex: 1;
-}
-footer .pkgdown {
- text-align: right;
-}
-footer p {
- margin-bottom: 0;
-}
-
-img.icon {
- float: right;
-}
-
-/* Ensure in-page images don't run outside their container */
-.contents img {
- max-width: 100%;
- height: auto;
-}
-
-/* Fix bug in bootstrap (only seen in firefox) */
-summary {
- display: list-item;
-}
-
-/* Typographic tweaking ---------------------------------*/
-
-.contents .page-header {
- margin-top: calc(-60px + 1em);
-}
-
-dd {
- margin-left: 3em;
-}
-
-/* Section anchors ---------------------------------*/
-
-a.anchor {
- display: none;
- margin-left: 5px;
- width: 20px;
- height: 20px;
-
- background-image: url(./link.svg);
- background-repeat: no-repeat;
- background-size: 20px 20px;
- background-position: center center;
-}
-
-h1:hover .anchor,
-h2:hover .anchor,
-h3:hover .anchor,
-h4:hover .anchor,
-h5:hover .anchor,
-h6:hover .anchor {
- display: inline-block;
-}
-
-/* Fixes for fixed navbar --------------------------*/
-
-.contents h1, .contents h2, .contents h3, .contents h4 {
- padding-top: 60px;
- margin-top: -40px;
-}
-
-/* Navbar submenu --------------------------*/
-
-.dropdown-submenu {
- position: relative;
-}
-
-.dropdown-submenu>.dropdown-menu {
- top: 0;
- left: 100%;
- margin-top: -6px;
- margin-left: -1px;
- border-radius: 0 6px 6px 6px;
-}
-
-.dropdown-submenu:hover>.dropdown-menu {
- display: block;
-}
-
-.dropdown-submenu>a:after {
- display: block;
- content: " ";
- float: right;
- width: 0;
- height: 0;
- border-color: transparent;
- border-style: solid;
- border-width: 5px 0 5px 5px;
- border-left-color: #cccccc;
- margin-top: 5px;
- margin-right: -10px;
-}
-
-.dropdown-submenu:hover>a:after {
- border-left-color: #ffffff;
-}
-
-.dropdown-submenu.pull-left {
- float: none;
-}
-
-.dropdown-submenu.pull-left>.dropdown-menu {
- left: -100%;
- margin-left: 10px;
- border-radius: 6px 0 6px 6px;
-}
-
-/* Sidebar --------------------------*/
-
-#pkgdown-sidebar {
- margin-top: 30px;
- position: -webkit-sticky;
- position: sticky;
- top: 70px;
-}
-
-#pkgdown-sidebar h2 {
- font-size: 1.5em;
- margin-top: 1em;
-}
-
-#pkgdown-sidebar h2:first-child {
- margin-top: 0;
-}
-
-#pkgdown-sidebar .list-unstyled li {
- margin-bottom: 0.5em;
-}
-
-/* bootstrap-toc tweaks ------------------------------------------------------*/
-
-/* All levels of nav */
-
-nav[data-toggle='toc'] .nav > li > a {
- padding: 4px 20px 4px 6px;
- font-size: 1.5rem;
- font-weight: 400;
- color: inherit;
-}
-
-nav[data-toggle='toc'] .nav > li > a:hover,
-nav[data-toggle='toc'] .nav > li > a:focus {
- padding-left: 5px;
- color: inherit;
- border-left: 1px solid #878787;
-}
-
-nav[data-toggle='toc'] .nav > .active > a,
-nav[data-toggle='toc'] .nav > .active:hover > a,
-nav[data-toggle='toc'] .nav > .active:focus > a {
- padding-left: 5px;
- font-size: 1.5rem;
- font-weight: 400;
- color: inherit;
- border-left: 2px solid #878787;
-}
-
-/* Nav: second level (shown on .active) */
-
-nav[data-toggle='toc'] .nav .nav {
- display: none; /* Hide by default, but at >768px, show it */
- padding-bottom: 10px;
-}
-
-nav[data-toggle='toc'] .nav .nav > li > a {
- padding-left: 16px;
- font-size: 1.35rem;
-}
-
-nav[data-toggle='toc'] .nav .nav > li > a:hover,
-nav[data-toggle='toc'] .nav .nav > li > a:focus {
- padding-left: 15px;
-}
-
-nav[data-toggle='toc'] .nav .nav > .active > a,
-nav[data-toggle='toc'] .nav .nav > .active:hover > a,
-nav[data-toggle='toc'] .nav .nav > .active:focus > a {
- padding-left: 15px;
- font-weight: 500;
- font-size: 1.35rem;
-}
-
-/* orcid ------------------------------------------------------------------- */
-
-.orcid {
- font-size: 16px;
- color: #A6CE39;
- /* margins are required by official ORCID trademark and display guidelines */
- margin-left:4px;
- margin-right:4px;
- vertical-align: middle;
-}
-
-/* Reference index & topics ----------------------------------------------- */
-
-.ref-index th {font-weight: normal;}
-
-.ref-index td {vertical-align: top; min-width: 100px}
-.ref-index .icon {width: 40px;}
-.ref-index .alias {width: 40%;}
-.ref-index-icons .alias {width: calc(40% - 40px);}
-.ref-index .title {width: 60%;}
-
-.ref-arguments th {text-align: right; padding-right: 10px;}
-.ref-arguments th, .ref-arguments td {vertical-align: top; min-width: 100px}
-.ref-arguments .name {width: 20%;}
-.ref-arguments .desc {width: 80%;}
-
-/* Nice scrolling for wide elements --------------------------------------- */
-
-table {
- display: block;
- overflow: auto;
-}
-
-/* Syntax highlighting ---------------------------------------------------- */
-
-pre, code, pre code {
- background-color: #f8f8f8;
- color: #333;
-}
-pre, pre code {
- white-space: pre-wrap;
- word-break: break-all;
- overflow-wrap: break-word;
-}
-
-pre {
- border: 1px solid #eee;
-}
-
-pre .img, pre .r-plt {
- margin: 5px 0;
-}
-
-pre .img img, pre .r-plt img {
- background-color: #fff;
-}
-
-code a, pre a {
- color: #375f84;
-}
-
-a.sourceLine:hover {
- text-decoration: none;
-}
-
-.fl {color: #1514b5;}
-.fu {color: #000000;} /* function */
-.ch,.st {color: #036a07;} /* string */
-.kw {color: #264D66;} /* keyword */
-.co {color: #888888;} /* comment */
-
-.error {font-weight: bolder;}
-.warning {font-weight: bolder;}
-
-/* Clipboard --------------------------*/
-
-.hasCopyButton {
- position: relative;
-}
-
-.btn-copy-ex {
- position: absolute;
- right: 0;
- top: 0;
- visibility: hidden;
-}
-
-.hasCopyButton:hover button.btn-copy-ex {
- visibility: visible;
-}
-
-/* headroom.js ------------------------ */
-
-.headroom {
- will-change: transform;
- transition: transform 200ms linear;
-}
-.headroom--pinned {
- transform: translateY(0%);
-}
-.headroom--unpinned {
- transform: translateY(-100%);
-}
-
-/* mark.js ----------------------------*/
-
-mark {
- background-color: rgba(255, 255, 51, 0.5);
- border-bottom: 2px solid rgba(255, 153, 51, 0.3);
- padding: 1px;
-}
-
-/* vertical spacing after htmlwidgets */
-.html-widget {
- margin-bottom: 10px;
-}
-
-/* fontawesome ------------------------ */
-
-.fab {
- font-family: "Font Awesome 5 Brands" !important;
-}
-
-/* don't display links in code chunks when printing */
-/* source: https://stackoverflow.com/a/10781533 */
-@media print {
- code a:link:after, code a:visited:after {
- content: "";
- }
-}
-
-/* Section anchors ---------------------------------
- Added in pandoc 2.11: https://github.com/jgm/pandoc-templates/commit/9904bf71
-*/
-
-div.csl-bib-body { }
-div.csl-entry {
- clear: both;
-}
-.hanging-indent div.csl-entry {
- margin-left:2em;
- text-indent:-2em;
-}
-div.csl-left-margin {
- min-width:2em;
- float:left;
-}
-div.csl-right-inline {
- margin-left:2em;
- padding-left:1em;
-}
-div.csl-indent {
- margin-left: 2em;
-}
diff --git a/docs/pkgdown.yml b/docs/pkgdown.yml
index 6187d6a6..e0970590 100644
--- a/docs/pkgdown.yml
+++ b/docs/pkgdown.yml
@@ -9,7 +9,7 @@ articles:
ipums-read: ipums-read.html
ipums: ipums.html
value-labels: value-labels.html
-last_built: 2024-02-22T19:34Z
+last_built: 2024-02-23T18:11Z
urls:
reference: http://tech.popdata.org/ipumsr/reference
article: http://tech.popdata.org/ipumsr/articles
diff --git a/docs/search.json b/docs/search.json
index e1ab4156..ebeb13b2 100644
--- a/docs/search.json
+++ b/docs/search.json
@@ -1 +1 @@
-[{"path":"http://tech.popdata.org/ipumsr/CODE_OF_CONDUCT.html","id":null,"dir":"","previous_headings":"","what":"Contributor Code of Conduct","title":"Contributor Code of Conduct","text":"contributors maintainers project, pledge respect people contribute reporting issues, posting feature requests, updating documentation, submitting pull requests patches, activities. committed making participation project harassment-free experience everyone, regardless level experience, gender, gender identity expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, religion. Examples unacceptable behavior participants include use sexual language imagery, derogatory comments personal attacks, trolling, public private harassment, insults, unprofessional conduct. Project maintainers right responsibility remove, edit, reject comments, commits, code, wiki edits, issues, contributions aligned Code Conduct. Project maintainers follow Code Conduct may removed project team. Instances abusive, harassing, otherwise unacceptable behavior may reported opening issue contacting one project maintainers. Code Conduct adapted Contributor Covenant (http:contributor-covenant.org), version 1.0.0, available http://contributor-covenant.org/version/1/0/0/","code":""},{"path":"http://tech.popdata.org/ipumsr/CONDUCT.html","id":null,"dir":"","previous_headings":"","what":"Contributor Code of Conduct","title":"Contributor Code of Conduct","text":"contributors maintainers project, pledge respect people contribute reporting issues, posting feature requests, updating documentation, submitting pull requests patches, activities. committed making participation project harassment-free experience everyone, regardless level experience, gender, gender identity expression, sexual orientation, disability, personal appearance, body size, race, ethnicity, age, religion. 
Examples unacceptable behavior participants include use sexual language imagery, derogatory comments personal attacks, trolling, public private harassment, insults, unprofessional conduct. Project maintainers right responsibility remove, edit, reject comments, commits, code, wiki edits, issues, contributions aligned Code Conduct. Project maintainers follow Code Conduct may removed project team. Instances abusive, harassing, otherwise unacceptable behavior may reported opening issue contacting one project maintainers. Code Conduct adapted Contributor Covenant (http:contributor-covenant.org), version 1.0.0, available http://contributor-covenant.org/version/1/0/0/","code":""},{"path":"http://tech.popdata.org/ipumsr/CONTRIBUTING.html","id":null,"dir":"","previous_headings":"","what":"Contributing","title":"Contributing","text":"Thank considering improving project! participating, agree abide code conduct.","code":""},{"path":"http://tech.popdata.org/ipumsr/CONTRIBUTING.html","id":"issues-reporting-a-problem-or-suggestion","dir":"","previous_headings":"","what":"Issues (Reporting a problem or suggestion)","title":"Contributing","text":"’ve experience problem package, suggestion , please post issues tab. space meant questions directly related R package, questions related specific extract may better answered via email ipums@umn.edu (don’t worry making mistake, know tough tell difference). Since extracts large files, posting minimal reproducible examples may difficult. Therefore, helpful can provide much detail problem possible including code error message, project extract , variables selected, file type, etc. ’ll best answer question.","code":""},{"path":"http://tech.popdata.org/ipumsr/CONTRIBUTING.html","id":"pull-requests-making-changes-to-the-package","dir":"","previous_headings":"","what":"Pull Requests (Making changes to the package)","title":"Contributing","text":"appreciate pull requests follow guidelines: 1) Make sure tests pass (add new ones possible). 
best conform code style package, currently based tidyverse style guide. See styler package easily catch stylistic errors. Please add name affiliation NOTICE.txt file. Summarize changes NEWS.md file.","code":""},{"path":"http://tech.popdata.org/ipumsr/CONTRIBUTING.html","id":"basics-of-pull-requests","dir":"","previous_headings":"","what":"Basics of Pull Requests","title":"Contributing","text":"’ve never worked R package , book R Packages Hadley Wickham great resource learning mechanics building R package contributing R packages github. Additionally, ’s great primer git github specifically. meantime, ’s quick step--step guide contributing project using RStudio: don’t already RStudio Git installed, can download . Fork repo (top right corner button github website). Clone repo RStudio’s toolbar: File > New Project > Verson Control > https://github.com/*YOUR_USER_NAME*/ipumsr/. Make changes local copy. Commit changes push github webiste using RStudio’s Git pane (push using green arrow). Submit pull request, selecting “compare across forks” option. Please include short message summarizing changes.","code":""},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api-micro.html","id":"supported-microdata-collections","dir":"Articles","previous_headings":"","what":"Supported microdata collections","title":"Microdata API Requests","text":"IPUMS provides several data collections classified microdata. Currently, following microdata collections supported IPUMS API (shown codes used refer ipumsr): IPUMS USA (\"usa\") IPUMS CPS (\"cps\") IPUMS International (\"ipumsi\") API support continue added collections future. See API documentation information upcoming additions API. addition microdata projects, IPUMS API also supports IPUMS NHGIS data. details obtaining IPUMS NHGIS data using ipumsr, see NHGIS-specific vignette. 
getting started, ’ll load ipumsr dplyr, helpful demo:","code":"library(ipumsr) library(dplyr)"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api-micro.html","id":"basic-ipums-microdata-concepts","dir":"Articles","previous_headings":"","what":"Basic IPUMS microdata concepts","title":"Microdata API Requests","text":"Every microdata extract definition must contain set requested samples variables. IPUMS microdata collection, sample refers distinct combination records variables. record set values describe characteristics single unit measurement (e.g. single person single household), variables define characteristics measured. single sample can contain multiple record types (e.g. person records, household records, activity records, ), correspond different units measurement. Note usage term “sample” correspond perfectly statistical sense subset individuals population. Many IPUMS samples samples statistical sense, “full-count” samples, meaning contain individuals population.","code":""},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api-micro.html","id":"ipums-microdata-metadata-forthcoming","dir":"Articles","previous_headings":"","what":"IPUMS microdata metadata (forthcoming)","title":"Microdata API Requests","text":"course, request samples variables, know codes API uses refer . samples, IPUMS API uses special codes don’t appear web-based extract builder. variables, API uses variable names appear web. IPUMS API yet provide comprehensive set metadata endpoints IPUMS microdata collections, users can use get_sample_info() function identify codes used refer specific samples communicating API. values listed name column correspond code use request sample creating extract definition submitted IPUMS API. can use basic functions dplyr filter metadata samples interest. instance, find IPUMS International samples Mexico, following: IPUMS intends add support accessing variable metadata via API future. 
, use web-based extract builder given collection find variable names availability sample. See IPUMS API documentation links extract builder microdata collection API support. Alternatively, made extract previously web interface, can use get_extract_info() identify variable names includes. See IPUMS API introduction details.","code":"cps_samps <- get_sample_info(\"cps\") head(cps_samps) #> # A tibble: 6 × 2 #> name description #> #> 1 cps1962_03s IPUMS-CPS, ASEC 1962 #> 2 cps1963_03s IPUMS-CPS, ASEC 1963 #> 3 cps1964_03s IPUMS-CPS, ASEC 1964 #> 4 cps1965_03s IPUMS-CPS, ASEC 1965 #> 5 cps1966_03s IPUMS-CPS, ASEC 1966 #> 6 cps1967_03s IPUMS-CPS, ASEC 1967 ipumsi_samps <- get_sample_info(\"ipumsi\") ipumsi_samps %>% filter(grepl(\"Mexico\", description)) #> # A tibble: 70 × 2 #> name description #> #> 1 mx1960a Mexico 1960 #> 2 mx1970a Mexico 1970 #> 3 mx1990a Mexico 1990 #> 4 mx1995a Mexico 1995 #> 5 mx2000a Mexico 2000 #> 6 mx2005a Mexico 2005 #> 7 mx2010a Mexico 2010 #> 8 mx2015a Mexico 2015 #> 9 mx2005h Mexico 2005 Q1 LFS #> 10 mx2005i Mexico 2005 Q2 LFS #> # ℹ 60 more rows"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api-micro.html","id":"defining-an-ipums-microdata-extract-request","dir":"Articles","previous_headings":"","what":"Defining an IPUMS microdata extract request","title":"Microdata API Requests","text":"IPUMS collection extract definition function used specify parameters new extract request scratch. functions take form define_extract_*(). microdata collections, : IPUMS USA: define_extract_usa() IPUMS CPS: define_extract_cps() IPUMS International: define_extract_ipumsi() define extract request, can specify data included extract indicate desired format layout. microdata collection extract definition function, uses syntax. examples vignette use multiple collections, syntax demonstrate can applied supported microdata collections. 
A simple extract definition needs to contain only the names of the samples and variables to include in the request: This produces an ipums_extract object containing the extract request specifications that is ready to be submitted to the IPUMS API. When you request a variable in your extract definition, the resulting data extract will include that variable for all requested samples where it is available. If you request a variable that is not available for any requested samples, the IPUMS API will throw an informative error when you try to submit the request. Beyond just specifying samples and variables, several additional options are available to refine the data requested in a microdata extract request.","code":"cps_ext <- define_extract_cps( description = \"Example CPS extract\", samples = c(\"cps2018_03s\", \"cps2019_03s\"), variables = c(\"AGE\", \"SEX\", \"RACE\", \"STATEFIP\") ) cps_ext #> Unsubmitted IPUMS CPS extract #> Description: Example CPS extract #> #> Samples: (2 total) cps2018_03s, cps2019_03s #> Variables: (4 total) AGE, SEX, RACE, STATEFIP"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api-micro.html","id":"detailed-variable-specifications","dir":"Articles","previous_headings":"","what":"Detailed variable specifications","title":"Microdata API Requests","text":"The IPUMS API supports several detailed specification options that can be applied to individual variables in an extract request: case selections, attached characteristics, and data quality flags. Before describing each of these options in depth, we’ll introduce the syntax used to add them to an extract definition.","code":""},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api-micro.html","id":"syntax","dir":"Articles","previous_headings":"Detailed variable specifications","what":"Syntax","title":"Microdata API Requests","text":"To add any of these options to a variable, we need to introduce the var_spec() helper function. var_spec() bundles all the selections for a given variable together into a single object (in this case, a var_spec object): To include this specification in an extract, simply provide it to the variables argument of the extract definition.
If multiple variables are included, pass a list of var_spec objects: In fact, if you investigate our original extract object from above, you’ll notice that the variables were automatically converted to var_spec objects, even though they were provided as character vectors: Thus, a var_spec object with no additional specifications will produce the default data for a given variable. That is, the following are equivalent: Because all specified variables are converted to var_spec objects, you can also pass a list where some elements are var_spec objects and others are just variable names. This is convenient when you only need detailed specifications for a subset of variables: (Samples are also converted to samp_spec objects, but because there currently aren’t any additional specifications available for samples, there is no reason to use anything other than a character vector in the samples argument.) Now that we’ve covered the basic syntax for including detailed variable specifications, we can describe the available options in more depth.","code":"var <- var_spec(\"SEX\", case_selections = \"2\") str(var) #> List of 3 #> $ name : chr \"SEX\" #> $ case_selections : chr \"2\" #> $ case_selection_type: chr \"general\" #> - attr(*, \"class\")= chr [1:3] \"var_spec\" \"ipums_spec\" \"list\" define_extract_cps( description = \"Case selection example\", samples = c(\"cps2018_03s\", \"cps2019_03s\"), variables = list( var_spec(\"SEX\", case_selections = \"2\"), var_spec(\"AGE\", attached_characteristics = \"head\") ) ) #> Unsubmitted IPUMS CPS extract #> Description: Case selection example #> #> Samples: (2 total) cps2018_03s, cps2019_03s #> Variables: (2 total) SEX, AGE str(cps_ext$variables) #> List of 4 #> $ AGE :List of 1 #> ..$ name: chr \"AGE\" #> ..- attr(*, \"class\")= chr [1:3] \"var_spec\" \"ipums_spec\" \"list\" #> $ SEX :List of 1 #> ..$ name: chr \"SEX\" #> ..- attr(*, \"class\")= chr [1:3] \"var_spec\" \"ipums_spec\" \"list\" #> $ RACE :List of 1 #> ..$ name: chr \"RACE\" #> ..- attr(*, \"class\")= chr [1:3] \"var_spec\" \"ipums_spec\" \"list\" #> $ STATEFIP:List of 1 #> ..$ name: chr \"STATEFIP\" #> ..- attr(*, \"class\")= chr [1:3] \"var_spec\" \"ipums_spec\" \"list\" define_extract_cps( description = \"Example CPS extract\", samples = \"cps2018_03s\",
variables = \"AGE\" ) define_extract_cps( description = \"Example CPS extract\", samples = \"cps2018_03s\", variables = var_spec(\"AGE\") ) define_extract_cps( description = \"Case selection example\", samples = c(\"cps2018_03s\", \"cps2019_03s\"), variables = list( var_spec(\"SEX\", case_selections = \"2\"), \"AGE\" ) ) #> Unsubmitted IPUMS CPS extract #> Description: Case selection example #> #> Samples: (2 total) cps2018_03s, cps2019_03s #> Variables: (2 total) SEX, AGE"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api-micro.html","id":"case-selections","dir":"Articles","previous_headings":"Detailed variable specifications","what":"Case selections","title":"Microdata API Requests","text":"Case selections allow us limit data records match particular value specified variable. instance, following specification indicate records value \"27\" (Minnesota) \"19\" (Iowa) variable \"STATEFIP\" included: variables versions general detailed coding schemes. default, case selections interpreted refer general codes: variables detailed versions, can also select detailed codes. instance, IPUMS USA variable RACE available general detailed versions. wanted limit extract persons identifying “Two major races”, specifying case selection \"8\". However, wanted limit extract persons identifying “White Chinese” “White Japanese”, need specify detailed codes \"811\" \"812\". include case selections detailed codes, set case_selection_type = \"detailed\": noted , IPUMS intends add support accessing variable metadata via API future, users able query variable coding schemes right R sessions. , use IPUMS web interface given collection find general detailed variable codes purposes case selection. See IPUMS API documentation relevant links. default, case selection person-level variables produces data file includes individuals match specified values specified variables. ’s also possible use case selection include matching individuals members households, using case_select_who parameter. 
The case_select_who parameter must be the same for all case selections in an extract, and is thus set at the extract level rather than the var_spec level. To include all household members of matching individuals, set case_select_who = \"households\" in the extract definition:","code":"var <- var_spec(\"STATEFIP\", case_selections = c(\"27\", \"19\")) var$case_selection_type #> [1] \"general\" # General case selection is the default var_spec(\"RACE\", case_selections = \"8\") #> $name #> [1] \"RACE\" #> #> $case_selections #> [1] \"8\" #> #> $case_selection_type #> [1] \"general\" #> #> attr(,\"class\") #> [1] \"var_spec\" \"ipums_spec\" \"list\" # For detailed case selection, change the `case_selection_type` var_spec( \"RACE\", case_selections = c(\"811\", \"812\"), case_selection_type = \"detailed\" ) #> $name #> [1] \"RACE\" #> #> $case_selections #> [1] \"811\" \"812\" #> #> $case_selection_type #> [1] \"detailed\" #> #> attr(,\"class\") #> [1] \"var_spec\" \"ipums_spec\" \"list\" define_extract_usa( description = \"Household level case selection\", samples = \"us2021a\", variables = var_spec(\"RACE\", case_selections = \"8\"), case_select_who = \"households\" ) #> Unsubmitted IPUMS USA extract #> Description: Household level case selection #> #> Samples: (1 total) us2021a #> Variables: (1 total) RACE"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api-micro.html","id":"attached-characteristics","dir":"Articles","previous_headings":"Detailed variable specifications","what":"Attached characteristics","title":"Microdata API Requests","text":"IPUMS allows users to create variables that reflect the characteristics of other household members. To do so, use the attached_characteristics argument of var_spec(). For instance, to attach the spouse’s SEX value to a record: This will add a new variable (in this case, SEX_SP) to the output data that will contain the sex of each person’s spouse (if no such record exists, the value will be 0).
Multiple attached characteristics can be attached to a single variable: Acceptable values are \"spouse\", \"mother\", \"father\", and \"head\".","code":"var_spec(\"SEX\", attached_characteristics = \"spouse\") #> $name #> [1] \"SEX\" #> #> $attached_characteristics #> [1] \"spouse\" #> #> attr(,\"class\") #> [1] \"var_spec\" \"ipums_spec\" \"list\" var_spec(\"AGE\", attached_characteristics = c(\"mother\", \"father\")) #> $name #> [1] \"AGE\" #> #> $attached_characteristics #> [1] \"mother\" \"father\" #> #> attr(,\"class\") #> [1] \"var_spec\" \"ipums_spec\" \"list\""},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api-micro.html","id":"data-quality-flags","dir":"Articles","previous_headings":"Detailed variable specifications","what":"Data quality flags","title":"Microdata API Requests","text":"Some variables in the IPUMS have been edited for missing, illegible, or inconsistent values. Data quality flags indicate which values have been edited or allocated. To include data quality flags for an individual variable, use the data_quality_flags argument of var_spec(): This will produce a new variable (QRACE) containing the data quality flag for the given variable. To add data quality flags for all variables that have them, set data_quality_flags = TRUE in the extract definition directly: Each data quality flag corresponds to one or more variables, and the codes for each flag vary based on the sample.
See the documentation for the IPUMS collection of interest for more information about data quality flag codes.","code":"var_spec(\"RACE\", data_quality_flags = TRUE) #> $name #> [1] \"RACE\" #> #> $data_quality_flags #> [1] TRUE #> #> attr(,\"class\") #> [1] \"var_spec\" \"ipums_spec\" \"list\" usa_ext <- define_extract_usa( description = \"Data quality flags\", samples = \"us2021a\", variables = list( var_spec(\"RACE\", case_selections = \"8\"), var_spec(\"AGE\") ), data_quality_flags = TRUE )"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api-micro.html","id":"data-structure-and-file-format","dir":"Articles","previous_headings":"","what":"Data structure and file format","title":"Microdata API Requests","text":"By default, microdata extract definitions request data in a rectangular structure and fixed-width file format. Rectangular data are data where only person records are included, and household-level variables are converted to person-level variables by copying the values associated with a household record onto all of its household members. To instead create a hierarchical extract, which includes separate records for households and persons, set data_structure = \"hierarchical\" in the extract definition. See the IPUMS data reading vignette for more information about loading hierarchical data into R. To request a file format other than fixed-width, adjust the data_format argument.
Note that while you can request data in a variety of formats (Stata, SPSS, etc.), ipumsr’s read_ipums_micro() function only supports fixed-width and csv files.","code":""},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api-micro.html","id":"next-steps","dir":"Articles","previous_headings":"","what":"Next steps","title":"Microdata API Requests","text":"Once you have defined an extract request, you can submit the extract for processing: The workflow for submitting and monitoring an extract request and downloading its files when complete is described in the IPUMS API introduction.","code":"usa_ext_submitted <- submit_extract(usa_ext)"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api-nhgis.html","id":"basic-ipums-nhgis-concepts","dir":"Articles","previous_headings":"","what":"Basic IPUMS NHGIS concepts","title":"NHGIS API Requests","text":"IPUMS NHGIS supports 3 main types of data products: datasets, time series tables, and shapefiles. A dataset contains a collection of data tables that each correspond to a particular tabulated summary statistic. A dataset is distinguished by the years, geographic levels, and topics that it covers. For instance, 2021 1-year data from the American Community Survey (ACS) is encapsulated in a single dataset. In other cases, a single census product is split into multiple datasets. A time series table is a longitudinal data source that links comparable statistics from multiple U.S. censuses in a single bundle. A table is comprised of one or more related time series, each of which describes a single summary statistic measured at multiple times for a given geographic level. A shapefile (or GIS file) contains geographic data for a given geographic level and year. Typically, these files are composed of polygon geometries containing the boundaries of census reporting areas.","code":""},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api-nhgis.html","id":"ipums-nhgis-metadata","dir":"Articles","previous_headings":"","what":"IPUMS NHGIS metadata","title":"NHGIS API Requests","text":"Of course, to make a request for any of these data sources, we have to know the codes that the API uses to refer to them. Fortunately, we can browse the metadata for all available IPUMS NHGIS data sources with get_metadata_nhgis().
Users can view summary metadata for all available data sources of a given data type, or detailed metadata for a specific data source by name.","code":""},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api-nhgis.html","id":"summary-metadata","dir":"Articles","previous_headings":"IPUMS NHGIS metadata","what":"Summary metadata","title":"NHGIS API Requests","text":"To see a summary of all available sources for a given data product type, use the type argument. This returns a data frame containing the available datasets, data tables, time series tables, or shapefiles. We can use basic functions from dplyr to filter the metadata to records of interest. For instance, if we wanted to find all the data sources related to agriculture from the 1900 Census, we could filter on group and description: The values listed in the name column correspond to the code you would use to request that dataset when creating an extract definition to be submitted to the IPUMS API. Similarly, for time series tables: While some metadata fields are consistent across different data types, others, like geographic_integration, are specific to time series tables: Note that for time series tables, some metadata fields are stored in list columns, where each entry is itself a data frame: To filter on these columns, we can use map_lgl() from purrr.
For instance, to find all time series tables that include data from a particular year: For more details about working with nested data frames, see the documentation for dplyr and purrr.","code":"ds <- get_metadata_nhgis(type = \"datasets\") head(ds) #> # A tibble: 6 × 4 #> name group description sequence #> #> 1 1790_cPop 1790 Census Population Data [US, States & Counties] 101 #> 2 1800_cPop 1800 Census Population Data [US, States & Counties] 201 #> 3 1810_cPop 1810 Census Population Data [US, States & Counties] 301 #> 4 1820_cPop 1820 Census Population Data [US, States & Counties] 401 #> 5 1830_cPop 1830 Census Population Data [US, States & Counties] 501 #> 6 1840_cAg 1840 Census Agriculture Data [US, States & Counties] 601 ds %>% filter( group == \"1900 Census\", grepl(\"Agriculture\", description) ) #> # A tibble: 2 × 4 #> name group description sequence #> #> 1 1900_cAg 1900 Census Agriculture Data [US, States & Counties] 1401 #> 2 1900_cPHAM 1900 Census Population, Housing, Agriculture & Manufactur… 1403 tst <- get_metadata_nhgis(\"time_series_tables\") head(tst) #> # A tibble: 6 × 7 #> name description geographic_integration sequence time_series years #> #> 1 A00 Total Population Nominal 100. #> 2 AV0 Total Population Nominal 100. #> 3 B78 Total Population Nominal 100. #> 4 CL8 Total Population Standardized to 2010 100. #> 5 A57 Persons by Urban/R… Nominal 101. #> 6 A59 Persons by Urban/R… Nominal 101. #> # ℹ 1 more variable: geog_levels tst$years[[1]] #> # A tibble: 24 × 3 #> name description sequence #> #> 1 1790 1790 1 #> 2 1800 1800 2 #> 3 1810 1810 3 #> 4 1820 1820 4 #> 5 1830 1830 5 #> 6 1840 1840 6 #> 7 1850 1850 7 #> 8 1860 1860 8 #> 9 1870 1870 12 #> 10 1880 1880 22 #> # ℹ 14 more rows tst$geog_levels[[1]] #> # A tibble: 2 × 3 #> name description sequence #> #> 1 state State 4 #> 2 county State--County 25 # Iterate over each `years` entry, identifying whether that entry # contains \"1840\" in its `name` column.
tst %>% filter(map_lgl(years, ~ \"1840\" %in% .x$name)) #> # A tibble: 2 × 7 #> name description geographic_integration sequence time_series years #> #> 1 A00 Total Population Nominal 100. #> 2 A08 Persons by Sex [2] Nominal 102. #> # ℹ 1 more variable: geog_levels "},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api-nhgis.html","id":"detailed-metadata","dir":"Articles","previous_headings":"IPUMS NHGIS metadata","what":"Detailed metadata","title":"NHGIS API Requests","text":"Once we have identified a data source of interest, we can find out more about its detailed options by providing its name to the corresponding argument of get_metadata_nhgis(): This provides a comprehensive list of the possible specifications for the input data source. For instance, for the 1900_cAg dataset, we have 66 tables to choose from, and 3 possible geographic levels: We can also get detailed metadata for an individual data table. Since data tables belong to specific datasets, both need to be specified to identify a data table: Note that the name element is the one that contains the codes used for interacting with the IPUMS API. The nhgis_code element refers to the prefix attached to individual variables in the output data, and the API will throw an error if you use it in an extract definition. For more details on interpreting the provided metadata elements, see the documentation for get_metadata_nhgis().
Now that we have identified our options, we can go ahead and define an extract request to submit to the IPUMS API.","code":"cAg_meta <- get_metadata_nhgis(dataset = \"1900_cAg\") cAg_meta$data_tables #> # A tibble: 66 × 4 #> name nhgis_code description sequence #> #> 1 NT1 AWS Total Population 1 #> 2 NT2 AW3 Number of Farms 2 #> 3 NT3 AXE Average Farm Size 3 #> 4 NT4 AXP Farm Acreage 4 #> 5 NT5 AXZ Farm Management 5 #> 6 NT6 AYA Race of Farmer 6 #> 7 NT7 AYJ Race of Farmer by Detailed Management 7 #> 8 NT8 AYK Number of Farms 8 #> 9 NT9 AYL Farms with Buildings 9 #> 10 NT10 AWT Acres of Farmland 10 #> # ℹ 56 more rows cAg_meta$geog_levels #> # A tibble: 3 × 4 #> name description has_geog_extent_selection sequence #> #> 1 nation Nation FALSE 1 #> 2 state State FALSE 4 #> 3 county State--County FALSE 25 get_metadata_nhgis(dataset = \"1900_cAg\", data_table = \"NT2\") #> $name #> [1] \"NT2\" #> #> $description #> [1] \"Number of Farms\" #> #> $universe #> [1] \"Farms\" #> #> $nhgis_code #> [1] \"AW3\" #> #> $sequence #> [1] 2 #> #> $dataset_name #> [1] \"1900_cAg\" #> #> $variables #> # A tibble: 1 × 2 #> description nhgis_code #> #> 1 Total AW3001"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api-nhgis.html","id":"defining-an-ipums-nhgis-extract-request","dir":"Articles","previous_headings":"","what":"Defining an IPUMS NHGIS extract request","title":"NHGIS API Requests","text":"To create an extract definition containing the specifications for a specific set of IPUMS NHGIS data, use define_extract_nhgis(). When you define an extract request, you can specify the data to be included in the extract and indicate the desired format and layout.","code":""},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api-nhgis.html","id":"basic-extract-definitions","dir":"Articles","previous_headings":"Defining an IPUMS NHGIS extract request","what":"Basic extract definitions","title":"NHGIS API Requests","text":"Let’s say we’re interested in getting state-level data on the number of farms and their average size from the 1900_cAg dataset identified above.
As we can see in the metadata, these data are contained in tables NT2 and NT3: To request these data, we need to make an explicit dataset specification. All datasets must be associated with a selection of data tables and geographic levels. We can use the ds_spec() helper function to specify our selections for these parameters. ds_spec() bundles all the selections for a given dataset together into a single object (in this case, a ds_spec object): This dataset specification can then be provided to the extract definition: (Dataset specifications can also include selections for years and breakdown_values, but these are not available for all datasets.) Similarly, to make a request for time series tables, use the tst_spec() helper. This makes a tst_spec object containing a time series table specification. Time series tables do not contain individual data tables, but do require a geographic level selection, and allow an optional selection of years: Attempting to define an extract without the required specifications for a given dataset or time series table will throw an error: Note that it is still possible to make invalid extract requests (for instance, by requesting a dataset or table that doesn’t exist). This kind of issue will be caught upon submission to the API, not upon creation of the extract definition. Shapefiles don’t have any additional specification options, and can therefore be requested simply by providing their names:","code":"cAg_meta$data_tables #> # A tibble: 66 × 4 #> name nhgis_code description sequence #> #> 1 NT1 AWS Total Population 1 #> 2 NT2 AW3 Number of Farms 2 #> 3 NT3 AXE Average Farm Size 3 #> 4 NT4 AXP Farm Acreage 4 #> 5 NT5 AXZ Farm Management 5 #> 6 NT6 AYA Race of Farmer 6 #> 7 NT7 AYJ Race of Farmer by Detailed Management 7 #> 8 NT8 AYK Number of Farms 8 #> 9 NT9 AYL Farms with Buildings 9 #> 10 NT10 AWT Acres of Farmland 10 #> # ℹ 56 more rows dataset <- ds_spec( \"1900_cAg\", data_tables = c(\"NT1\", \"NT2\"), geog_levels = \"state\" ) str(dataset) #> List of 3 #> $ name : chr \"1900_cAg\" #> $ data_tables: chr [1:2] \"NT1\" \"NT2\" #> $ geog_levels: chr \"state\" #> - attr(*, \"class\")= chr [1:3] \"ds_spec\" \"ipums_spec\" \"list\" nhgis_ext <- define_extract_nhgis( description = \"Example farm data in 1900\", datasets = dataset ) nhgis_ext #> Unsubmitted IPUMS NHGIS extract #> 
Description: Example farm data in 1900 #> #> Dataset: 1900_cAg #> Tables: NT1, NT2 #> Geog Levels: state define_extract_nhgis( description = \"Example time series table request\", time_series_tables = tst_spec( \"CW3\", geog_levels = c(\"county\", \"tract\"), years = c(\"1990\", \"2000\") ) ) #> Unsubmitted IPUMS NHGIS extract #> Description: Example time series table request #> #> Time Series Table: CW3 #> Geog Levels: county, tract #> Years: 1990, 2000 define_extract_nhgis( description = \"Invalid extract\", datasets = ds_spec(\"1900_STF1\", data_tables = \"NP1\") ) #> Error in `validate_ipums_extract()`: #> ! Invalid `ds_spec` specification: #> ✖ `geog_levels` must not contain missing values. define_extract_nhgis( description = \"Example shapefiles request\", shapefiles = c(\"us_county_2021_tl2021\", \"us_county_2020_tl2020\") ) #> Unsubmitted IPUMS NHGIS extract #> Description: Example shapefiles request #> #> Shapefiles: us_county_2021_tl2021, us_county_2020_tl2020"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api-nhgis.html","id":"more-complicated-extract-definitions","dir":"Articles","previous_headings":"Defining an IPUMS NHGIS extract request","what":"More complicated extract definitions","title":"NHGIS API Requests","text":"It’s possible to request data for multiple datasets (or time series tables) in a single extract definition. To do so, pass a list of ds_spec or tst_spec objects to define_extract_nhgis(): For extracts with multiple datasets or time series tables, it may be easier to generate the specifications independently before creating your extract request object. We can quickly create multiple ds_spec objects by iterating across the specifications we want to include. Here, we use purrr to do so, but you could also use a loop: This workflow also makes it easy to quickly update the specifications in the future. For instance, to add the 2017 ACS 1-year data to the extract definition above, we’d only need to add \"2017_ACS1\" to the ds_names variable. The iteration would automatically add the selected tables and geog levels to the new dataset.
(This workflow works particularly well for ACS datasets, which often have the same data table names across datasets.)","code":"define_extract_nhgis( description = \"Slightly more complicated extract request\", datasets = list( ds_spec(\"2018_ACS1\", \"B01001\", \"state\"), ds_spec(\"2019_ACS1\", \"B01001\", \"state\") ), shapefiles = c(\"us_state_2018_tl2018\", \"us_state_2019_tl2019\") ) #> Unsubmitted IPUMS NHGIS extract #> Description: Slightly more complicated extract request #> #> Dataset: 2018_ACS1 #> Tables: B01001 #> Geog Levels: state #> #> Dataset: 2019_ACS1 #> Tables: B01001 #> Geog Levels: state #> #> Shapefiles: us_state_2018_tl2018, us_state_2019_tl2019 ds_names <- c(\"2019_ACS1\", \"2018_ACS1\") tables <- c(\"B01001\", \"B01002\") geogs <- c(\"county\", \"state\") # For each dataset to include, create a specification with the # data tables and geog levels indicated above datasets <- purrr::map( ds_names, ~ ds_spec( name = .x, data_tables = tables, geog_levels = geogs ) ) nhgis_ext <- define_extract_nhgis( description = \"Slightly more complicated extract request\", datasets = datasets ) nhgis_ext #> Unsubmitted IPUMS NHGIS extract #> Description: Slightly more complicated extract request #> #> Dataset: 2019_ACS1 #> Tables: B01001, B01002 #> Geog Levels: county, state #> #> Dataset: 2018_ACS1 #> Tables: B01001, B01002 #> Geog Levels: county, state"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api-nhgis.html","id":"data-layout-and-file-format","dir":"Articles","previous_headings":"Defining an IPUMS NHGIS extract request","what":"Data layout and file format","title":"NHGIS API Requests","text":"IPUMS NHGIS extract definitions also support additional options to modify the layout and format of the extract’s resulting data files. For extracts that contain time series tables, the tst_layout argument indicates how the longitudinal data should be organized. For extracts that contain datasets with multiple breakdowns or data types, use the breakdown_and_data_type_layout argument to specify the layout of those data.
This is most common for data sources that contain both estimates and margins of error, like the ACS. File formats can be specified with the data_format argument. IPUMS NHGIS currently distributes files in csv and fixed-width format. See the documentation for define_extract_nhgis() for more details on these options.","code":""},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api-nhgis.html","id":"next-steps","dir":"Articles","previous_headings":"","what":"Next steps","title":"NHGIS API Requests","text":"Once you have defined an extract request, you can submit the extract for processing: The workflow for submitting and monitoring an extract request and downloading its files when complete is described in the IPUMS API introduction.","code":"nhgis_ext_submitted <- submit_extract(nhgis_ext)"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api.html","id":"api-availability","dir":"Articles","previous_headings":"","what":"API availability","title":"Introduction to the IPUMS API for R Users","text":"IPUMS extract support is currently available via API for the following collections: IPUMS USA IPUMS CPS IPUMS International IPUMS NHGIS Note that this support only includes data available via each collection’s extract engine. Many collections provide additional data via direct download, but these products are not supported by the IPUMS API. IPUMS metadata support is currently available via API for the following collections: IPUMS NHGIS API support will continue to be added for more collections in the future. You can check the general API availability for all IPUMS collections with ipums_data_collections(). Note that the tools in ipumsr may not necessarily support all of the functionality currently supported by the IPUMS API.
See the API documentation for information about the latest features.","code":"ipums_data_collections() #> # A tibble: 14 × 4 #> collection_name collection_type code_for_api api_support #> #> 1 IPUMS USA microdata usa TRUE #> 2 IPUMS CPS microdata cps TRUE #> 3 IPUMS International microdata ipumsi TRUE #> 4 IPUMS NHGIS aggregate data nhgis TRUE #> 5 IPUMS IHGIS aggregate data ihgis FALSE #> 6 IPUMS ATUS microdata atus FALSE #> 7 IPUMS AHTUS microdata ahtus FALSE #> 8 IPUMS MTUS microdata mtus FALSE #> 9 IPUMS DHS microdata dhs FALSE #> 10 IPUMS PMA microdata pma FALSE #> 11 IPUMS MICS microdata mics FALSE #> 12 IPUMS NHIS microdata nhis FALSE #> 13 IPUMS MEPS microdata meps FALSE #> 14 IPUMS Higher Ed microdata highered FALSE"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api.html","id":"set-key","dir":"Articles","previous_headings":"","what":"Set up your API key","title":"Introduction to the IPUMS API for R Users","text":"To interact with the IPUMS API, you’ll need to register for access with the IPUMS project you’ll be using. If you have not yet registered, you can find links to register for each API-supported IPUMS collection below: IPUMS USA IPUMS CPS IPUMS International IPUMS NHGIS Once you’re registered, you’ll be able to create an API key. By default, ipumsr API functions assume that your key is stored in the IPUMS_API_KEY environment variable. You can also provide your key directly to these functions, but storing it in an environment variable saves typing and helps prevent you from inadvertently sharing your key with others (for instance, on GitHub). You can save your API key to the IPUMS_API_KEY environment variable with set_ipums_api_key(). To save your key for use in future sessions, set save = TRUE. This will add your API key to a .Renviron file in your user home directory.
The rest of this vignette assumes you have obtained an API key and stored it in the IPUMS_API_KEY environment variable.","code":"# Save key in .Renviron for use across sessions set_ipums_api_key(\"paste-your-key-here\", save = TRUE)"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api.html","id":"define","dir":"Articles","previous_headings":"","what":"Define an extract request","title":"Introduction to the IPUMS API for R Users","text":"Each IPUMS collection has its own extract definition function that is used to specify the parameters of a new extract request from scratch. These functions take the form define_extract_*(): define_extract_usa() define_extract_cps() define_extract_ipumsi() define_extract_nhgis() When you define an extract request, you can specify the data to be included in the extract and indicate the desired format and layout. For instance, the following defines a simple IPUMS USA extract request for the AGE, SEX, RACE, STATEFIP, and MARST variables from the 2018 and 2019 American Community Survey (ACS): While the exact extract definition options vary across collections, all collections can be used with the same general workflow. For details about the available extract definition options, see the associated microdata and NHGIS vignettes. For the purposes of demonstrating the overall workflow, we will continue to work with the sample IPUMS USA extract definition created above.","code":"usa_ext_def <- define_extract_usa( description = \"USA extract for API vignette\", samples = c(\"us2018a\", \"us2019a\"), variables = c(\"AGE\", \"SEX\", \"RACE\", \"STATEFIP\", \"MARST\") ) usa_ext_def #> Unsubmitted IPUMS USA extract #> Description: USA extract for API vignette #> #> Samples: (2 total) us2018a, us2019a #> Variables: (5 total) AGE, SEX, RACE, STATEFIP, MARST"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api.html","id":"extract-request-objects","dir":"Articles","previous_headings":"Define an extract request","what":"Extract request objects","title":"Introduction to the IPUMS API for R Users","text":"The define_extract_*() functions will always produce an ipums_extract object, which can be handled by the other API functions (see ?ipums_extract). Furthermore, these objects will have a subclass for the particular collection with which they are associated.
Many of the specifications given in an extract request object can be accessed by indexing the object: ipums_extract objects also contain information about the extract request’s processing status and its assigned extract number, which serves as an identifier for the extract request. Since this extract request is still unsubmitted, it has no request number: To obtain the data requested in the extract definition, we must first submit it to the IPUMS API for processing.","code":"class(usa_ext_def) #> [1] \"usa_extract\" \"micro_extract\" \"ipums_extract\" \"list\" names(usa_ext_def$samples) #> [1] \"us2018a\" \"us2019a\" names(usa_ext_def$variables) #> [1] \"AGE\" \"SEX\" \"RACE\" \"STATEFIP\" \"MARST\" usa_ext_def$data_format #> [1] \"fixed_width\" usa_ext_def$status #> [1] \"unsubmitted\" usa_ext_def$number #> [1] NA"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api.html","id":"submit","dir":"Articles","previous_headings":"","what":"Submit an extract request","title":"Introduction to the IPUMS API for R Users","text":"To submit an extract definition, use submit_extract(). If no errors are detected in the extract definition, the submitted extract request will be returned with its assigned number and status. Storing the returned object can be useful for checking the extract request’s status later. The extract number is stored in the returned object: Note that some fields of the submitted extract may be automatically updated by the API upon submission. For instance, for microdata extracts, additional preselected variables may be added to the extract even if they weren’t specified explicitly in the extract definition.
If you forget to store the updated extract object returned by submit_extract(), you can use the get_last_extract_info() helper to request information about your most recent extract request for a given collection:","code":"usa_ext_submitted <- submit_extract(usa_ext_def) #> Successfully submitted IPUMS USA extract number 348 usa_ext_submitted$number #> [1] 348 usa_ext_submitted$status #> [1] \"queued\" names(usa_ext_submitted$variables) #> [1] \"YEAR\" \"SAMPLE\" \"SERIAL\" \"CBSERIAL\" \"HHWT\" \"CLUSTER\" #> [7] \"STATEFIP\" \"STRATA\" \"GQ\" \"PERNUM\" \"PERWT\" \"SEX\" #> [13] \"AGE\" \"MARST\" \"RACE\" usa_ext_submitted <- get_last_extract_info(\"usa\") usa_ext_submitted$number #> [1] 348"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api.html","id":"wait","dir":"Articles","previous_headings":"","what":"Wait for an extract request to complete","title":"Introduction to the IPUMS API for R Users","text":"It may take some time for the IPUMS servers to process your extract request. You can ensure that an extract has finished processing before you attempt to download its files by using wait_for_extract(). This polls the API regularly until processing has completed (by default, the polling interval increases by 10 seconds each time). It then returns an ipums_extract object containing the completed extract definition. Note that wait_for_extract() will tie up your R session until your extract is ready to download. This may be fine in a strictly programmatic workflow, but it may be frustrating when working interactively, especially for large extracts or when the IPUMS servers are busy. In these cases, you can manually check whether an extract is ready to download with is_extract_ready(). As long as this returns TRUE, you should be able to download your extract’s files. For a more detailed status check, provide the extract’s collection and number to get_extract_info(). This returns an ipums_extract object reflecting the requested extract definition along with its current status. The status of a submitted extract will be one of \"queued\", \"started\", \"produced\", \"canceled\", \"failed\", or \"completed\". Note that extracts are removed from the IPUMS servers after a set period of time (72 hours for microdata collections, 2 weeks for IPUMS NHGIS). Therefore, an extract with a \"completed\" status may still be unavailable for download. is_extract_ready() will alert you if your extract has expired and needs to be resubmitted.
Simply use submit_extract() to resubmit the extract request. Note that this will produce a new extract (with a new extract number), even if the extract definition is identical.","code":"usa_ext_complete <- wait_for_extract(usa_ext_submitted) #> Checking extract status... #> Waiting 10 seconds... #> Checking extract status... #> IPUMS USA extract 348 is ready to download. usa_ext_complete$status #> [1] \"completed\" # `download_links` should be populated if the extract is ready for download names(usa_ext_complete$download_links) #> [1] \"r_command_file\" \"basic_codebook\" \"data\" #> [4] \"stata_command_file\" \"sas_command_file\" \"spss_command_file\" #> [7] \"ddi_codebook\" is_extract_ready(usa_ext_submitted) #> [1] TRUE usa_ext_submitted <- get_extract_info(usa_ext_submitted) usa_ext_submitted$status #> [1] \"completed\""},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api.html","id":"download","dir":"Articles","previous_headings":"","what":"Download an extract","title":"Introduction to the IPUMS API for R Users","text":"Once an extract has finished processing, use download_extract() to download the extract’s data files to your local machine. It returns the path to the downloaded file(s) required to load the data into R. For microdata collections, this is the path to the DDI codebook (.xml) file, which can be used to read the associated data (contained in a .dat.gz file). For NHGIS, this is the path to a .zip archive containing the requested data files and/or shapefiles. The files produced by download_extract() can be passed directly to the reader functions provided by ipumsr. For instance, for microdata projects: If instead you’re working with an NHGIS extract, use read_nhgis() or read_ipums_sf(). 
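To make the NHGIS branch above concrete, here is a minimal sketch; `nhgis_extract` is a hypothetical completed extract request, so real file names and contents will differ, and running this requires IPUMS API access.

```r
# Sketch only: assumes `nhgis_extract` is a completed, tabular-only NHGIS
# extract request (hypothetical object; a real call needs an IPUMS API key).
zip_path <- download_extract(nhgis_extract)

# read_nhgis() can read the tabular data directly from the downloaded .zip
nhgis_data <- read_nhgis(zip_path)
```

If the extract also includes spatial data, read_ipums_sf() would be used on the shapefile path instead.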
See the associated vignette for more information about loading IPUMS data into R.","code":"# By default, downloads to your current working directory filepath <- download_extract(usa_ext_submitted) ddi <- read_ipums_ddi(filepath) micro_data <- read_ipums_micro(ddi)"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api.html","id":"recent","dir":"Articles","previous_headings":"","what":"Get info on past extracts","title":"Introduction to the IPUMS API for R Users","text":"To retrieve the definition corresponding to a particular extract, provide its collection and number to get_extract_info(). These can be provided either as a single string of the form \"collection:number\" or as a length-2 vector: c(collection, number). Several other API functions support this syntax as well. If you know you made a specific extract definition in the past, but can’t remember its exact number, you can use get_extract_history() to peruse your recent extract requests for a particular collection. By default, it returns your 10 most recent extract requests as a list of ipums_extract objects. You can adjust how many requests to retrieve with the how_many argument: Since this is a list of ipums_extract objects, you can operate on them with the API functions introduced already. You can also iterate through your extract history to find extracts with particular characteristics. For instance, we can use purrr::keep() to find all extracts that contain a certain variable or that are ready for download: Or we can use the purrr::map() family to browse certain values: If you regularly use a single IPUMS collection, you can save yourself some typing by setting that collection as your default. set_ipums_default_collection() will save a specified collection value to the IPUMS_DEFAULT_COLLECTION environment variable. 
When a default collection is set, API functions will use that collection in their requests, assuming no other collection is specified.","code":"usa_ext <- get_extract_info(\"usa:47\") # Alternatively: usa_ext <- get_extract_info(c(\"usa\", 47)) usa_ext #> Submitted IPUMS USA extract number 47 #> Description: Test extract #> #> Samples: (1 total) us2017b #> Variables: (8 total) YEAR, SAMPLE, SERIAL, CBSERIAL, HHWT, GQ, PERNUM, PERWT usa_extracts <- get_extract_history(\"usa\", how_many = 3) usa_extracts #> [[1]] #> Submitted IPUMS USA extract number 348 #> Description: USA extract for API vignette #> #> Samples: (2 total) us2018a, us2019a #> Variables: (15 total) YEAR, SAMPLE, SERIAL, CBSERIAL, HHWT, CLUSTER,... #> #> [[2]] #> Submitted IPUMS USA extract number 347 #> Description: Data from long ago #> #> Samples: (1 total) us1880a #> Variables: (12 total) YEAR, SAMPLE, SERIAL, HHWT, CLUSTER, STRATA, G... #> #> [[3]] #> Submitted IPUMS USA extract number 346 #> Description: Data from 2017 PRCS #> #> Samples: (1 total) us2017b #> Variables: (9 total) YEAR, SAMPLE, SERIAL, CBSERIAL, HHWT, GQ, PERNU... is_extract_ready(usa_extracts[[2]]) #> [1] TRUE purrr::keep(usa_extracts, ~ \"MARST\" %in% names(.x$variables)) #> [[1]] #> Submitted IPUMS USA extract number 348 #> Description: USA extract for API vignette #> #> Samples: (2 total) us2018a, us2019a #> Variables: (15 total) YEAR, SAMPLE, SERIAL, CBSERIAL, HHWT, CLUSTER,... purrr::keep(usa_extracts, is_extract_ready) #> [[1]] #> Submitted IPUMS USA extract number 348 #> Description: USA extract for API vignette #> #> Samples: (2 total) us2018a, us2019a #> Variables: (15 total) YEAR, SAMPLE, SERIAL, CBSERIAL, HHWT, CLUSTER,... #> #> [[2]] #> Submitted IPUMS USA extract number 347 #> Description: Data from long ago #> #> Samples: (1 total) us1880a #> Variables: (12 total) YEAR, SAMPLE, SERIAL, HHWT, CLUSTER, STRATA, G... 
#> #> [[3]] #> Submitted IPUMS USA extract number 346 #> Description: Data from 2017 PRCS #> #> Samples: (1 total) us2017b #> Variables: (9 total) YEAR, SAMPLE, SERIAL, CBSERIAL, HHWT, GQ, PERNU... purrr::map_chr(usa_extracts, ~ .x$description) #> [1] \"USA extract for API vignette\" \"Data from long ago\" #> [3] \"Data from 2017 PRCS\" set_ipums_default_collection(\"usa\") # Set `save = TRUE` to store across sessions # Check the default collection: Sys.getenv(\"IPUMS_DEFAULT_COLLECTION\") #> [1] \"usa\" # Most recent USA extract: usa_last <- get_last_extract_info() # Request info on extract request \"usa:10\" usa_ext_10 <- get_extract_info(10) # You can still request other collections as usual: cps_ext_10 <- get_extract_info(\"cps:10\")"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api.html","id":"share","dir":"Articles","previous_headings":"","what":"Share an extract definition","title":"Introduction to the IPUMS API for R Users","text":"One exciting feature enabled by the IPUMS API is the ability to share a standardized extract definition with other IPUMS users so that they can create an identical extract request themselves. While the terms of use for IPUMS collections prohibit the redistribution of IPUMS data, they don’t prohibit the sharing of data extract definitions. ipumsr facilitates this type of sharing with save_extract_as_json() and define_extract_from_json(), which read and write ipums_extract objects to and from a standardized JSON-formatted file. At this point, you can send usa_extract_10.json to another user to allow them to create a duplicate ipums_extract object, which they can load and submit to the API (as long as they have API access). Note that the code in the previous chunk assumes the file is saved in the current working directory. 
If it’s saved somewhere else, replace \"usa_extract_10.json\" with the full path to the file.","code":"usa_ext_10 <- get_extract_info(\"usa:10\") save_extract_as_json(usa_ext_10, file = \"usa_extract_10.json\") clone_of_usa_ext_10 <- define_extract_from_json(\"usa_extract_10.json\") usa_ext_10_resubmitted <- submit_extract(clone_of_usa_ext_10)"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api.html","id":"revise-a-previous-extract-request","dir":"Articles","previous_headings":"","what":"Revise a previous extract request","title":"Introduction to the IPUMS API for R Users","text":"Occasionally, you may want to modify an existing extract definition (e.g. to update an analysis with new data). The easiest way to do so is to add the new specifications to the define_extract_*() code that produced the original extract definition. We highly recommend saving this code somewhere it can be accessed and updated in the future. However, in some cases the original extract definition code may not exist (e.g. for an extract created using the online IPUMS extract system). In this case, the best approach is to view the extract definition with get_extract_info() and create a new extract definition (using the appropriate define_extract_*() function) that reproduces that definition along with the desired modifications. This may be a bit tedious for complex extract definitions, but it is a one-time investment that will make future updates to the extract definition much easier. Previously, we encouraged users to use the helpers add_to_extract() and remove_from_extract() for modifying extracts. We now encourage you to re-write extract definitions instead to improve reproducibility: extract definition code is always clearer and more stable when written explicitly, rather than based on an old extract number. 
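As a sketch of this rewrite-and-resubmit approach (the description, samples, and variables here are hypothetical, and a real submission requires IPUMS API access):

```r
# Hypothetical example: after inspecting an old definition with
# get_extract_info(), write an explicit definition that reproduces it with
# the desired modification (here, an added variable), then resubmit it.
revised_ext <- define_extract_usa(
  "Revised USA extract",
  samples = c("us2018a", "us2019a"),
  variables = c("AGE", "SEX", "RACE", "STATEFIP", "MARST")
)

revised_ext_submitted <- submit_extract(revised_ext)
```

Because the definition is written out explicitly, it remains reproducible even after the old extract number expires on the IPUMS servers.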
These two functions may be retired in the future.","code":""},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-api.html","id":"putting-it-all-together","dir":"Articles","previous_headings":"","what":"Putting it all together","title":"Introduction to the IPUMS API for R Users","text":"The core API functions in ipumsr are compatible with one another and can be combined into a single pipeline that requests, downloads, and reads your extract data into an R data frame: Note that for NHGIS extracts that contain both data and shapefiles, a single file will need to be selected for reading, as download_extract() will return the path to each file. For instance, if a hypothetical nhgis_extract contains both tabular and spatial data: Not only does an API workflow allow you to obtain IPUMS data without ever leaving your R environment, it also allows you to retain a reproducible record of the process. This makes it much easier to document your workflow, collaborate with other researchers, and update your analysis in the future.","code":"usa_data <- define_extract_usa( \"USA extract for API vignette\", c(\"us2018a\", \"us2019a\"), c(\"AGE\", \"SEX\", \"RACE\", \"STATEFIP\") ) %>% submit_extract() %>% wait_for_extract() %>% download_extract() %>% read_ipums_micro() nhgis_data <- download_extract(nhgis_extract) %>% purrr::pluck(\"data\") %>% # Select only the tabular data file to read read_nhgis()"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-bigdata.html","id":"setup","dir":"Articles","previous_headings":"","what":"Setup","title":"Big IPUMS Data","text":"The examples in this vignette rely on a few helpful packages. If you haven’t already installed them, you can do so with:","code":"# To run the full vignette, you'll also need the following packages. If they # aren't installed already, do so with: install.packages(\"biglm\") install.packages(\"DBI\") install.packages(\"RSQLite\") install.packages(\"dbplyr\") library(ipumsr) library(dplyr)"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-bigdata.html","id":"option-1-trade-money-for-convenience","dir":"Articles","previous_headings":"","what":"Option 1: Trade money for convenience","title":"Big IPUMS Data","text":"If you need to work with a dataset that’s too big for your RAM, the simplest option is to get more space. 
If upgrading your hardware isn’t an option, paying for a cloud service like Amazon or Microsoft Azure may be worth considering. There are guides available for using R on both Amazon and Microsoft Azure. Of course, this option isn’t feasible for all users—in that case, updates to the data used in the analysis or to the processing pipeline may be required.","code":""},{"path":[]},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-bigdata.html","id":"remove-unused-data","dir":"Articles","previous_headings":"Option 2: Reduce extract size","what":"Remove unused data","title":"Big IPUMS Data","text":"The easiest way to reduce the size of an extract is to drop unused samples and variables. This can be done through the extract interface for the specific IPUMS project you’re using or from within R using the IPUMS API (for projects that are supported). If you’re using the API, simply update your extract definition code to exclude the specifications you no longer need. Then, resubmit the extract request and download the new files. See the introduction to the IPUMS API for more information about making extract requests from ipumsr.","code":""},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-bigdata.html","id":"select-cases","dir":"Articles","previous_headings":"Option 2: Reduce extract size","what":"Select cases","title":"Big IPUMS Data","text":"For microdata projects, another good option for reducing extract size is to select only those cases that are relevant to your research question, producing an extract containing data only for a particular subset of values for a given variable. If you’re using the IPUMS API, you can use var_spec() to specify case selections for a variable in an extract definition. 
For instance, the following would produce an extract only including records for married women: If you’re using the online interface, the “Select Cases” option will be available on the last page before submitting an extract request.","code":"define_extract_usa( description = \"2013 ACS Data for Married Women\", samples = \"us2013a\", variables = list( var_spec(\"MARST\", case_selections = \"1\"), var_spec(\"SEX\", case_selections = \"2\") ) ) #> Unsubmitted IPUMS USA extract #> Description: 2013 ACS Data for Married Women #> #> Samples: (1 total) us2013a #> Variables: (2 total) MARST, SEX"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-bigdata.html","id":"use-a-sampled-subset-of-the-data","dir":"Articles","previous_headings":"Option 2: Reduce extract size","what":"Use a sampled subset of the data","title":"Big IPUMS Data","text":"Yet another option (again, only for microdata projects) is to take a random subsample of the data before producing your extract. Sampled data is not available via the IPUMS API, but you can use the “Customize Sample Size” option in the online interface to do so. This also appears on the final page before submitting an extract request. If you’ve already submitted an extract, you can click the “REVISE” link on the “Download or Revise Extracts” page to access these features and produce a new data extract.","code":""},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-bigdata.html","id":"option-3-process-the-data-in-pieces","dir":"Articles","previous_headings":"","what":"Option 3: Process the data in pieces","title":"Big IPUMS Data","text":"ipumsr provides two related options for reading data sources in increments: Chunked functions allow you to specify a function that will be called on each chunk of data as it is read in, as well as how you would like the chunks to be combined at the end. These functions use the readr framework for reading chunked data. Yielded functions allow more flexibility by returning control to the user after loading each piece of data. 
These functions are unique to ipumsr and are available only for fixed-width data.","code":""},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-bigdata.html","id":"reading-chunked-data","dir":"Articles","previous_headings":"Option 3: Process the data in pieces","what":"Reading chunked data","title":"Big IPUMS Data","text":"Use read_ipums_micro_chunked() and read_ipums_micro_list_chunked() to read data in chunks. These are analogous to the standard read_ipums_micro() and read_ipums_micro_list() functions, but allow you to specify a function that will be applied to each data chunk and to control how the results from these chunks are combined. Below, we’ll use chunking to outline solutions to three common use-cases for IPUMS data: tabulation, regression and case selection. First, we’ll load our example data. Note that we have down-sampled the data in this example for storage reasons; none of the output “results” reflected in this vignette should be considered legitimate!","code":"cps_ddi_file <- ipums_example(\"cps_00097.xml\")"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-bigdata.html","id":"chunked-tab","dir":"Articles","previous_headings":"Option 3: Process the data in pieces > Reading chunked data","what":"Chunked tabulation","title":"Big IPUMS Data","text":"Imagine we wanted to find the percent of people in the workforce grouped by their self-reported health. Since this example extract is small enough to fit in memory, we could load the full dataset with read_ipums_micro(), relabel the EMPSTAT variable into a binary variable (see vignette(\"value-labels\")), and count the people in each group. For the sake of this example, let’s imagine we can only store 1,000 rows in memory at a time. In this case, we need to use a chunked function, tabulate for each chunk, and then calculate the counts across all of the chunks. The chunked functions apply a user-defined callback function to each chunk. The callback takes two arguments: x, which represents the data contained in a given chunk, and pos, which represents the position of the chunk, expressed as the line in the input file at which the chunk starts. Generally you will only need to use x, but the callback must always take both arguments. In this case, the callback will implement the processing steps demonstrated above: Next, we need to create a callback object, which determines how we want to combine the ultimate results from each chunk. 
ipumsr provides three main types of callback objects that preserve variable metadata: IpumsDataFrameCallback combines the results from each chunk together by row binding them together. IpumsListCallback returns a list with one item per chunk containing the results for that chunk. Use this when you don’t want to (or can’t) immediately combine the results. IpumsSideEffectCallback does not return any results. Use this when your callback function is intended for its side effects (for instance, saving the results from each chunk to disk). (ipumsr also provides a fourth callback used for running linear regression models, discussed below.) In this case, we want to row-bind the data frames returned by cb_function(), so we use IpumsDataFrameCallback. Callback objects are R6 objects, but you don’t need to be familiar with R6 to use them. To initialize a callback object, simply use $new(): At this point, we’re ready to load the data in chunks. We use read_ipums_micro_chunked() to specify the callback and chunk size: We now have a data frame with counts by health and work status within each chunk. To get the full table, we just need to sum by health and work status one more time:","code":"read_ipums_micro(cps_ddi_file, verbose = FALSE) %>% mutate( HEALTH = as_factor(HEALTH), AT_WORK = as_factor( lbl_relabel( EMPSTAT, lbl(1, \"Yes\") ~ .lbl == \"At work\", lbl(0, \"No\") ~ .lbl != \"At work\" ) ) ) %>% group_by(HEALTH, AT_WORK) %>% summarize(n = n(), .groups = \"drop\") #> # A tibble: 10 × 3 #> HEALTH AT_WORK n #> #> 1 Excellent No 4055 #> 2 Excellent Yes 2900 #> 3 Very good No 3133 #> 4 Very good Yes 3371 #> 5 Good No 2480 #> 6 Good Yes 2178 #> 7 Fair No 1123 #> 8 Fair Yes 443 #> 9 Poor No 603 #> 10 Poor Yes 65 cb_function <- function(x, pos) { x %>% mutate( HEALTH = as_factor(HEALTH), AT_WORK = as_factor( lbl_relabel( EMPSTAT, lbl(1, \"Yes\") ~ .lbl == \"At work\", lbl(0, \"No\") ~ .lbl != \"At work\" ) ) ) %>% group_by(HEALTH, AT_WORK) %>% summarize(n = n(), .groups = \"drop\") } cb <- IpumsDataFrameCallback$new(cb_function) chunked_tabulations <- read_ipums_micro_chunked( cps_ddi_file, callback = cb, chunk_size = 1000, verbose = FALSE ) chunked_tabulations #> # A tibble: 209 × 3 #> HEALTH AT_WORK n #> #> 1 Excellent No 183 #> 2 Excellent Yes 147 #> 3 Very 
good No 134 #> 4 Very good Yes 217 #> 5 Good No 111 #> 6 Good Yes 105 #> 7 Fair No 53 #> 8 Fair Yes 22 #> 9 Poor No 27 #> 10 Poor Yes 1 #> # ℹ 199 more rows chunked_tabulations %>% group_by(HEALTH, AT_WORK) %>% summarize(n = sum(n), .groups = \"drop\") #> # A tibble: 10 × 3 #> HEALTH AT_WORK n #> #> 1 Excellent No 4055 #> 2 Excellent Yes 2900 #> 3 Very good No 3133 #> 4 Very good Yes 3371 #> 5 Good No 2480 #> 6 Good Yes 2178 #> 7 Fair No 1123 #> 8 Fair Yes 443 #> 9 Poor No 603 #> 10 Poor Yes 65"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-bigdata.html","id":"chunked-reg","dir":"Articles","previous_headings":"Option 3: Process the data in pieces > Reading chunked data","what":"Chunked regression","title":"Big IPUMS Data","text":"With the biglm package, it is possible to use R to perform a regression on data that is too large to store in memory all at once. The ipumsr package provides another callback designed to make this simple: IpumsBiglmCallback. As an example, we’ll conduct a regression with total hours worked (AHRSWORKT) as the outcome and age (AGE) and self-reported health (HEALTH) as predictors. (Note that this is intended as a code demonstration, so we ignore many complexities that should be addressed in real analyses.) If we were running the analysis on the full dataset, we’d first load the data and prepare the variables in the analysis for use in the model: Then, we’d provide the model formula and data to lm(): To do the same regression, but with only 1,000 rows loaded at a time, we work in a similar manner. First we make an IpumsBiglmCallback callback object. 
We provide the model formula as well as the code used to process the data before running the regression: We then read the data using read_ipums_micro_chunked(), passing the callback that we just made.","code":"data <- read_ipums_micro(cps_ddi_file, verbose = FALSE) %>% mutate( HEALTH = as_factor(HEALTH), AHRSWORKT = lbl_na_if(AHRSWORKT, ~ .lbl == \"NIU (Not in universe)\"), AT_WORK = as_factor( lbl_relabel( EMPSTAT, lbl(1, \"Yes\") ~ .lbl == \"At work\", lbl(0, \"No\") ~ .lbl != \"At work\" ) ) ) %>% filter(AT_WORK == \"Yes\") model <- lm(AHRSWORKT ~ AGE + I(AGE^2) + HEALTH, data = data) summary(model) #> #> Call: #> lm(formula = AHRSWORKT ~ AGE + I(AGE^2) + HEALTH, data = data) #> #> Residuals: #> Min 1Q Median 3Q Max #> -41.217 -4.734 -0.077 5.957 63.994 #> #> Coefficients: #> Estimate Std. Error t value Pr(>|t|) #> (Intercept) 5.2440289 1.1823985 4.435 9.31e-06 *** #> AGE 1.5868169 0.0573268 27.680 < 2e-16 *** #> I(AGE^2) -0.0170043 0.0006568 -25.888 < 2e-16 *** #> HEALTHVery good -0.2550306 0.3276759 -0.778 0.436412 #> HEALTHGood -0.9637395 0.3704123 -2.602 0.009289 ** #> HEALTHFair -3.8899430 0.6629725 -5.867 4.58e-09 *** #> HEALTHPoor -5.7597200 1.6197136 -3.556 0.000378 *** #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 
0.1 ' ' 1 #> #> Residual standard error: 12.88 on 8950 degrees of freedom #> Multiple R-squared: 0.08711, Adjusted R-squared: 0.0865 #> F-statistic: 142.3 on 6 and 8950 DF, p-value: < 2.2e-16 library(biglm) #> Loading required package: DBI biglm_cb <- IpumsBiglmCallback$new( model = AHRSWORKT ~ AGE + I(AGE^2) + HEALTH, prep = function(x, pos) { x %>% mutate( HEALTH = as_factor(HEALTH), AHRSWORKT = lbl_na_if(AHRSWORKT, ~ .lbl == \"NIU (Not in universe)\"), AT_WORK = as_factor( lbl_relabel( EMPSTAT, lbl(1, \"Yes\") ~ .lbl == \"At work\", lbl(0, \"No\") ~ .lbl != \"At work\" ) ) ) %>% filter(AT_WORK == \"Yes\") } ) chunked_model <- read_ipums_micro_chunked( cps_ddi_file, callback = biglm_cb, chunk_size = 1000, verbose = FALSE ) summary(chunked_model) #> Large data regression model: biglm(AHRSWORKT ~ AGE + I(AGE^2) + HEALTH, data, ...) #> Sample size = 8957 #> Coef (95% CI) SE p #> (Intercept) 5.2440 2.8792 7.6088 1.1824 0.0000 #> AGE 1.5868 1.4722 1.7015 0.0573 0.0000 #> I(AGE^2) -0.0170 -0.0183 -0.0157 0.0007 0.0000 #> HEALTHVery good -0.2550 -0.9104 0.4003 0.3277 0.4364 #> HEALTHGood -0.9637 -1.7046 -0.2229 0.3704 0.0093 #> HEALTHFair -3.8899 -5.2159 -2.5640 0.6630 0.0000 #> HEALTHPoor -5.7597 -8.9991 -2.5203 1.6197 0.0004"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-bigdata.html","id":"reading-yielded-data","dir":"Articles","previous_headings":"Option 3: Process the data in pieces","what":"Reading yielded data","title":"Big IPUMS Data","text":"In addition to chunked reading, ipumsr also provides the similar but more flexible “yielded” reading. read_ipums_micro_yield() and read_ipums_micro_list_yield() grant you more freedom in determining what R code to run between chunks and include the ability to have multiple files open at once. Additionally, yields are compatible with the bigglm function from biglm, which allows you to run glm models on data larger than memory. 
The downside to this greater control is that the yield API is unique to IPUMS data, so the way it works may be unfamiliar to most R users.","code":""},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-bigdata.html","id":"yielded-tabulation","dir":"Articles","previous_headings":"Option 3: Process the data in pieces > Reading yielded data","what":"Yielded tabulation","title":"Big IPUMS Data","text":"We’ll compare the yield and chunked functions by conducting the same tabulation example using yields. First, create a yield object with the function read_ipums_micro_yield(): This function returns an R6 object which contains methods for reading the data. The most important method is the yield() method, which will return n rows of data: Note that the row position in the data is stored in the object, so running the same code again will produce different rows of data: Use cur_pos to get the current position in the data file: The is_done() method tells us whether we have read the entire file yet: In preparation for the actual example, we’ll use reset() to reset back to the beginning of the data: Using yield() and is_done(), we can set up our processing pipeline. First, we create an empty placeholder tibble to store our results: Then, we iterate through the data, yielding 1,000 rows at a time and processing the results as in the chunked example. The iteration will end once we’ve finished reading the entire file.","code":"data <- read_ipums_micro_yield(cps_ddi_file, verbose = FALSE) # Return the first 10 rows of data data$yield(10) #> # A tibble: 10 × 14 #> YEAR SERIAL MONTH CPSID ASECFLAG ASECWTH FOODSTMP PERNUM CPSIDP ASECWT #> #> 1 2011 33 3 [Marc… 2.01e13 1 [ASEC] 308. 1 [No] 1 2.01e13 308. #> 2 2011 33 3 [Marc… 2.01e13 1 [ASEC] 308. 1 [No] 2 2.01e13 217. #> 3 2011 33 3 [Marc… 2.01e13 1 [ASEC] 308. 1 [No] 3 2.01e13 249. #> 4 2011 46 3 [Marc… 2.01e13 1 [ASEC] 266. 1 [No] 1 2.01e13 266. #> 5 2011 46 3 [Marc… 2.01e13 1 [ASEC] 266. 1 [No] 2 2.01e13 266. #> 6 2011 46 3 [Marc… 2.01e13 1 [ASEC] 266. 1 [No] 3 2.01e13 265. #> 7 2011 46 3 [Marc… 2.01e13 1 [ASEC] 266. 1 [No] 4 2.01e13 296. #> 8 2011 64 3 [Marc… 2.01e13 1 [ASEC] 241. 1 [No] 1 2.01e13 241. #> 9 2011 64 3 [Marc… 2.01e13 1 [ASEC] 241. 1 [No] 2 2.01e13 241. #> 10 2011 64 3 [Marc… 2.01e13 1 [ASEC] 241. 1 [No] 3 2.01e13 278. 
#> # ℹ 4 more variables: AGE , EMPSTAT , AHRSWORKT , #> # HEALTH # Return the next 10 rows of data data$yield(10) #> # A tibble: 10 × 14 #> YEAR SERIAL MONTH CPSID ASECFLAG ASECWTH FOODSTMP PERNUM CPSIDP ASECWT #> #> 1 2011 82 3 [Marc… 0 1 [ASEC] 373. 1 [No] 1 0 373. #> 2 2011 82 3 [Marc… 0 1 [ASEC] 373. 1 [No] 2 0 373. #> 3 2011 82 3 [Marc… 0 1 [ASEC] 373. 1 [No] 3 0 326. #> 4 2011 86 3 [Marc… 2.01e13 1 [ASEC] 554. 1 [No] 1 2.01e13 554. #> 5 2011 104 3 [Marc… 2.01e13 1 [ASEC] 543. 1 [No] 1 2.01e13 543. #> 6 2011 104 3 [Marc… 2.01e13 1 [ASEC] 543. 1 [No] 2 2.01e13 543. #> 7 2011 106 3 [Marc… 2.01e13 1 [ASEC] 543. 1 [No] 1 2.01e13 543. #> 8 2011 137 3 [Marc… 2.01e13 1 [ASEC] 271. 1 [No] 1 2.01e13 271. #> 9 2011 137 3 [Marc… 2.01e13 1 [ASEC] 271. 1 [No] 2 2.01e13 271. #> 10 2011 137 3 [Marc… 2.01e13 1 [ASEC] 271. 1 [No] 3 2.01e13 365. #> # ℹ 4 more variables: AGE , EMPSTAT , AHRSWORKT , #> # HEALTH data$cur_pos #> [1] 21 data$is_done() #> [1] FALSE data$reset() yield_results <- tibble( HEALTH = factor(levels = c(\"Excellent\", \"Very good\", \"Good\", \"Fair\", \"Poor\")), AT_WORK = factor(levels = c(\"No\", \"Yes\")), n = integer(0) ) while (!data$is_done()) { # Yield new data and process new <- data$yield(n = 1000) %>% mutate( HEALTH = as_factor(HEALTH), AT_WORK = as_factor( lbl_relabel( EMPSTAT, lbl(1, \"Yes\") ~ .lbl == \"At work\", lbl(0, \"No\") ~ .lbl != \"At work\" ) ) ) %>% group_by(HEALTH, AT_WORK) %>% summarize(n = n(), .groups = \"drop\") # Combine the new yield with the previously processed yields yield_results <- bind_rows(yield_results, new) %>% group_by(HEALTH, AT_WORK) %>% summarize(n = sum(n), .groups = \"drop\") } yield_results #> # A tibble: 10 × 3 #> HEALTH AT_WORK n #> #> 1 Excellent No 4055 #> 2 Excellent Yes 2900 #> 3 Very good No 3133 #> 4 Very good Yes 3371 #> 5 Good No 2480 #> 6 Good Yes 2178 #> 7 Fair No 1123 #> 8 Fair Yes 443 #> 9 Poor No 603 #> 10 Poor Yes 
65"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-bigdata.html","id":"yielded-glm-regression","dir":"Articles","previous_headings":"Option 3: Process the data in pieces > Reading yielded data","what":"Yielded GLM regression","title":"Big IPUMS Data","text":"One major benefits yielded reading chunked reading compatible GLM functions biglm, allowing use complicated models. run logistic regression, first need reset yield object previous example: Next make function takes single argument: reset. reset TRUE, resets data beginning. dictated bigglm biglm. create function, use reset() method yield object: Finally feed function model specification bigglm() function:","code":"data$reset() get_model_data <- function(reset) { if (reset) { data$reset() } else { yield <- data$yield(n = 1000) if (is.null(yield)) { return(yield) } yield %>% mutate( HEALTH = as_factor(HEALTH), WORK30PLUS = lbl_na_if(AHRSWORKT, ~ .lbl == \"NIU (Not in universe)\") >= 30, AT_WORK = as_factor( lbl_relabel( EMPSTAT, lbl(1, \"Yes\") ~ .lbl == \"At work\", lbl(0, \"No\") ~ .lbl != \"At work\" ) ) ) %>% filter(AT_WORK == \"Yes\") } } results <- bigglm( WORK30PLUS ~ AGE + I(AGE^2) + HEALTH, family = binomial(link = \"logit\"), data = get_model_data ) summary(results) #> Large data regression model: bigglm(WORK30PLUS ~ AGE + I(AGE^2) + HEALTH, family = binomial(link = \"logit\"), #> data = get_model_data) #> Sample size = 8957 #> Coef (95% CI) SE p #> (Intercept) -4.0021 -4.4297 -3.5744 0.2138 0.0000 #> AGE 0.2714 0.2498 0.2930 0.0108 0.0000 #> I(AGE^2) -0.0029 -0.0032 -0.0027 0.0001 0.0000 #> HEALTHVery good 0.0038 -0.1346 0.1423 0.0692 0.9557 #> HEALTHGood -0.1129 -0.2685 0.0426 0.0778 0.1465 #> HEALTHFair -0.6637 -0.9160 -0.4115 0.1261 0.0000 #> HEALTHPoor -0.7879 -1.3697 -0.2062 0.2909 0.0068"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-bigdata.html","id":"database","dir":"Articles","previous_headings":"","what":"Option 4: Use a database","title":"Big IPUMS Data","text":"Storing 
your data in a database is another way to work with data that can’t fit into memory as a data frame. If you have access to a database on a remote machine, you can easily select and use parts of the data for your analysis. Even databases on your own machine may provide more efficient data storage or may use your hard drive, enabling the data to be loaded into R in pieces. There are many different kinds of databases, each with their own benefits and drawbacks, and the database you choose to use will be specific to your use case. However, once you’ve chosen a database, there will be two general steps: Importing the data into the database Connecting to the database from R R has several tools that support database integration, including DBI, dbplyr, sparklyr, sparkR, bigrquery, and others. In this example, we’ll use RSQLite to load the data into an in-memory database. (We use RSQLite because it is easy to set up, but it is likely not efficient enough to fully resolve issues with large IPUMS data, so it may be wise to consider an alternative in practice.)","code":""},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-bigdata.html","id":"importing-data-into-the-database","dir":"Articles","previous_headings":"Option 4: Use a database","what":"Importing data into the database","title":"Big IPUMS Data","text":"For rectangular extracts, it is likely simplest to load the data into the database in CSV format, which is widely supported. If you’re working with a hierarchical extract (or your database software doesn’t support the CSV format), you can use an ipumsr chunked function to load the data into a database without needing to store the entire dataset in R. (For more on rectangular vs. 
hierarchical extracts, see the “Hierarchical extracts” section of vignette(\"ipums-read\").)","code":"library(DBI) library(RSQLite) # Connect to database con <- dbConnect(SQLite(), path = \":memory:\") # Load file metadata ddi <- read_ipums_ddi(cps_ddi_file) # Write data to database in chunks read_ipums_micro_chunked( ddi, readr::SideEffectChunkCallback$new( function(x, pos) { if (pos == 1) { dbWriteTable(con, \"cps\", x) } else { dbWriteTable(con, \"cps\", x, row.names = FALSE, append = TRUE) } } ), chunk_size = 1000, verbose = FALSE )"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-bigdata.html","id":"connecting-to-a-database-with-dbplyr","dir":"Articles","previous_headings":"Option 4: Use a database","what":"Connecting to a database with dbplyr","title":"Big IPUMS Data","text":"There are a variety of ways to access the data stored in a database. In this example, we use dbplyr. For more details about dbplyr, see vignette(\"dbplyr\", package = \"dbplyr\"). To run a simple query on AGE, we can use the same syntax we would use with dplyr: dbplyr shows us a nice preview of the first rows of the result of our query, but the data still exist in the database. We can use dplyr::collect() to load the full results of the query into our current R session. However, this will omit the variable metadata attached to IPUMS data, since the database doesn’t store this metadata: Instead, use ipums_collect(), which uses a provided ipums_ddi object to reattach the metadata while loading into the R environment: For more on variable metadata in IPUMS data, see vignette(\"value-labels\").","code":"example <- tbl(con, \"cps\") example %>% filter(AGE > 25) #> # Source: SQL [?? x 14] #> # Database: sqlite 3.43.2 [] #> YEAR SERIAL MONTH CPSID ASECFLAG ASECWTH FOODSTMP PERNUM CPSIDP ASECWT #> #> 1 2011 33 3 2.01e13 1 308. 1 1 2.01e13 308. #> 2 2011 33 3 2.01e13 1 308. 1 2 2.01e13 217. #> 3 2011 33 3 2.01e13 1 308. 1 3 2.01e13 249. #> 4 2011 46 3 2.01e13 1 266. 1 1 2.01e13 266. #> 5 2011 46 3 2.01e13 1 266. 1 2 2.01e13 266. #> 6 2011 46 3 2.01e13 1 266. 1 3 2.01e13 265. #> 7 2011 46 3 2.01e13 1 266. 1 4 2.01e13 296. #> 8 2011 64 3 2.01e13 1 241. 1 1 2.01e13 241. #> 9 2011 64 3 2.01e13 1 241. 
1 2 2.01e13 241. #> 10 2011 64 3 2.01e13 1 241. 1 3 2.01e13 278. #> # ℹ more rows #> # ℹ 4 more variables: AGE , EMPSTAT , AHRSWORKT , HEALTH data <- example %>% filter(AGE > 25) %>% collect() # Variable metadata is missing ipums_val_labels(data$MONTH) #> # A tibble: 0 × 2 #> # ℹ 2 variables: val , lbl data <- example %>% filter(AGE > 25) %>% ipums_collect(ddi) ipums_val_labels(data$MONTH) #> # A tibble: 12 × 2 #> val lbl #> #> 1 1 January #> 2 2 February #> 3 3 March #> 4 4 April #> 5 5 May #> 6 6 June #> 7 7 July #> 8 8 August #> 9 9 September #> 10 10 October #> 11 11 November #> 12 12 December"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-bigdata.html","id":"learning-more","dir":"Articles","previous_headings":"","what":"Learning more","title":"Big IPUMS Data","text":"Big data isn’t just a problem for IPUMS users, so there are many R resources available. See the documentation for the packages mentioned in the databases section for more information about those options. For some past blog posts and articles on the topic, see the following: Big Data in R - Part of Stephen Mooney’s EPIC: Epidemiologic Analysis Using R, June 2015 class Statistical Analysis with Open-Source R and RStudio on Amazon EMR - Markus Schmidberger on the AWS Big Data Blog Hosting RStudio Server on Azure - Colin Gillespie’s blog post on using RStudio on Azure Improving DBI: A Retrospect - Kirill Müller’s report on the R Consortium grant to improve database support in R","code":""},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-read.html","id":"ipums-extract-structure","dir":"Articles","previous_headings":"","what":"IPUMS extract structure","title":"Reading IPUMS Data","text":"IPUMS extracts are organized slightly differently for different IPUMS projects. In general, all projects provide multiple files in a data extract. The files relevant to ipumsr are: A metadata file containing information about the variables included in the extract data One or more data files, depending on the project and the specifications in the extract Both of these files are necessary to properly load the data into R. Obviously, the data files contain the actual data values to be loaded. 
However, since they are often in fixed-width format, the metadata files are required to correctly parse the data on load. Even for .csv files, the metadata file allows for the addition of contextual variable information to the loaded data. This makes it much easier to interpret the values in the data variables and effectively use them in a data processing pipeline. See the vignette on value labels for more information about working with these labels.","code":""},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-read.html","id":"reading-microdata-extracts","dir":"Articles","previous_headings":"","what":"Reading microdata extracts","title":"Reading IPUMS Data","text":"Microdata extracts typically provide their metadata in a DDI (.xml) file separate from the compressed data (.dat.gz) files. Provide the path to the DDI file to read_ipums_micro() to directly load its associated data file into R. Note that we provide the path to the DDI file, not the data file. ipumsr needs to find both the DDI and data files to read in the data; the DDI file includes the name of the data file, whereas the data file contains only the raw data. The loaded data have been parsed correctly and include variable metadata in each column. For a summary of the column contents, use ipums_var_info(): This information is also attached to specific columns. We can obtain it with attributes() or by using ipumsr helpers: While this is the most straightforward way to load microdata, it’s often advantageous to independently load the DDI file into an ipums_ddi object containing the metadata: This is because many common data processing functions have the side-effect of removing these attributes: In that case, we can always use the separate DDI as a metadata reference: Or we can even reattach the metadata, assuming the variable names still match those in the DDI:","code":"library(ipumsr) library(dplyr) # Example data cps_ddi_file <- ipums_example(\"cps_00157.xml\") cps_data <- read_ipums_micro(cps_ddi_file) head(cps_data) #> # A tibble: 6 × 8 #> YEAR SERIAL MONTH ASECWTH STATEFIP PERNUM ASECWT INCTOT #> #> 1 1962 80 3 [March] 1476. 55 [Wisconsin] 1 1476. 4883 #> 2 1962 80 3 [March] 1476. 55 [Wisconsin] 2 1471. 5800 #> 3 1962 80 3 [March] 1476. 55 [Wisconsin] 3 1579. 999999998 [Missin… #> 4 1962 82 3 [March] 1598. 27 [Minnesota] 1 1598. 14015 #> 5 1962 83 3 [March] 1707. 27 [Minnesota] 1 1707. 16552 #> 6 1962 84 3 [March] 1790. 27 [Minnesota] 1 1790.
6375 ipums_var_info(cps_data) #> # A tibble: 8 × 4 #> var_name var_label var_desc val_labels #> #> 1 YEAR Survey year \"YEAR r… #> 2 SERIAL Household serial number \"SERIAL… #> 3 MONTH Month \"MONTH … #> 4 ASECWTH Annual Social and Economic Supplement Household … \"ASECWT… #> 5 STATEFIP State (FIPS code) \"STATEF… #> 6 PERNUM Person number in sample unit \"PERNUM… #> 7 ASECWT Annual Social and Economic Supplement Weight \"ASECWT… #> 8 INCTOT Total personal income \"INCTOT… attributes(cps_data$MONTH) #> $labels #> January February March April May June July August #> 1 2 3 4 5 6 7 8 #> September October November December #> 9 10 11 12 #> #> $class #> [1] \"haven_labelled\" \"vctrs_vctr\" \"integer\" #> #> $label #> [1] \"Month\" #> #> $var_desc #> [1] \"MONTH indicates the calendar month of the CPS interview.\" ipums_val_labels(cps_data$MONTH) #> # A tibble: 12 × 2 #> val lbl #> #> 1 1 January #> 2 2 February #> 3 3 March #> 4 4 April #> 5 5 May #> 6 6 June #> 7 7 July #> 8 8 August #> 9 9 September #> 10 10 October #> 11 11 November #> 12 12 December cps_ddi <- read_ipums_ddi(cps_ddi_file) cps_ddi #> An IPUMS DDI for IPUMS CPS with 8 variables #> Extract 'cps_00157.dat' created on 2023-07-10 #> User notes: User-provided description: Reproducing cps00006 # This doesn't actually change the data... cps_data2 <- cps_data %>% mutate(MONTH = ifelse(TRUE, MONTH, MONTH)) # but removes attributes! 
ipums_val_labels(cps_data2$MONTH) #> # A tibble: 0 × 2 #> # ℹ 2 variables: val , lbl ipums_val_labels(cps_ddi, var = MONTH) #> # A tibble: 12 × 2 #> val lbl #> #> 1 1 January #> 2 2 February #> 3 3 March #> 4 4 April #> 5 5 May #> 6 6 June #> 7 7 July #> 8 8 August #> 9 9 September #> 10 10 October #> 11 11 November #> 12 12 December cps_data2 <- set_ipums_var_attributes(cps_data2, cps_ddi) ipums_val_labels(cps_data2$MONTH) #> # A tibble: 12 × 2 #> val lbl #> #> 1 1 January #> 2 2 February #> 3 3 March #> 4 4 April #> 5 5 May #> 6 6 June #> 7 7 July #> 8 8 August #> 9 9 September #> 10 10 October #> 11 11 November #> 12 12 December"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-read.html","id":"hierarchical-extracts","dir":"Articles","previous_headings":"Reading microdata extracts","what":"Hierarchical extracts","title":"Reading IPUMS Data","text":"IPUMS microdata can come in either a “rectangular” or a “hierarchical” format. Rectangular data are transformed such that every row of data represents the same type of record. For instance, each row will represent a person record, and all household-level information for that person will be included in the same row. (This is the case for the CPS example above.) In hierarchical data, records of different types are interspersed in a single file. For instance, a household record will be included in its own row, followed by the person records associated with that household. Hierarchical data can be loaded in list format or long format. read_ipums_micro() will read in the long format: The long format consists of a single data.frame that includes rows with varying record types. In this example, some rows have a record type of “Household” and others have a record type of “Person”. Variables that do not apply to a particular record type will be filled with NA in rows of that record type. To read data in list format, use read_ipums_micro_list(). This function returns a list where each element contains the records for a given record type: read_ipums_micro() and read_ipums_micro_list() also support partial loading by selecting a subset of columns or a limited number of rows.
See their documentation for more details on these options.","code":"cps_hier_ddi <- read_ipums_ddi(ipums_example(\"cps_00159.xml\")) read_ipums_micro(cps_hier_ddi) #> Use of data from IPUMS CPS is subject to conditions including that users should cite the data appropriately. Use command `ipums_conditions()` for more details. #> # A tibble: 11,053 × 9 #> RECTYPE YEAR SERIAL MONTH ASECWTH STATEFIP PERNUM ASECWT INCTOT #> #> 1 H [Househ… 1962 80 3 [Mar… 1476. 55 [Wis… NA NA NA #> 2 P [Person… 1962 80 NA NA NA 1 1476. 4.88e3 #> 3 P [Person… 1962 80 NA NA NA 2 1471. 5.8 e3 #> 4 P [Person… 1962 80 NA NA NA 3 1579. 1.00e9 [Mis… #> 5 H [Househ… 1962 82 3 [Mar… 1598. 27 [Min… NA NA NA #> 6 P [Person… 1962 82 NA NA NA 1 1598. 1.40e4 #> 7 H [Househ… 1962 83 3 [Mar… 1707. 27 [Min… NA NA NA #> 8 P [Person… 1962 83 NA NA NA 1 1707. 1.66e4 #> 9 H [Househ… 1962 84 3 [Mar… 1790. 27 [Min… NA NA NA #> 10 P [Person… 1962 84 NA NA NA 1 1790. 6.38e3 #> # ℹ 11,043 more rows read_ipums_micro_list(cps_hier_ddi) #> Use of data from IPUMS CPS is subject to conditions including that users should cite the data appropriately. Use command `ipums_conditions()` for more details. #> $HOUSEHOLD #> # A tibble: 3,385 × 6 #> RECTYPE YEAR SERIAL MONTH ASECWTH STATEFIP #> #> 1 H [Household Record] 1962 80 3 [March] 1476. 55 [Wisconsin] #> 2 H [Household Record] 1962 82 3 [March] 1598. 27 [Minnesota] #> 3 H [Household Record] 1962 83 3 [March] 1707. 27 [Minnesota] #> 4 H [Household Record] 1962 84 3 [March] 1790. 27 [Minnesota] #> 5 H [Household Record] 1962 107 3 [March] 4355. 19 [Iowa] #> 6 H [Household Record] 1962 108 3 [March] 1479. 19 [Iowa] #> 7 H [Household Record] 1962 122 3 [March] 3603. 27 [Minnesota] #> 8 H [Household Record] 1962 124 3 [March] 4104. 55 [Wisconsin] #> 9 H [Household Record] 1962 125 3 [March] 2182. 55 [Wisconsin] #> 10 H [Household Record] 1962 126 3 [March] 1826. 55 [Wisconsin] #> # ℹ 3,375 more rows #> #> $PERSON #> # A tibble: 7,668 × 6 #> RECTYPE YEAR SERIAL PERNUM ASECWT INCTOT #> #> 1 P [Person Record] 1962 80 1 1476. 4883 #> 2 P [Person Record] 1962 80 2 1471. 5800 #> 3 P [Person Record] 1962 80 3 1579. 999999998 [Missing. (1962-1964 … #> 4 P [Person Record] 1962 82 1 1598. 14015 #> 5 P [Person Record] 1962 83 1 1707. 16552 #> 6 P [Person Record] 1962 84 1 1790. 6375 #> 7 P [Person Record] 1962 107 1 4355. 999999999 [N.I.U.] #> 8 P [Person Record] 1962 107 2 1386. 0 #> 9 P [Person Record] 1962 107 3 1629. 600 #> 10 P [Person Record] 1962 107 4 1432. 999999999 [N.I.U.] #> # ℹ 7,658 more rows"},
{"path":"http://tech.popdata.org/ipumsr/articles/ipums-read.html","id":"reading-ipums-nhgis-extracts","dir":"Articles","previous_headings":"","what":"Reading IPUMS NHGIS extracts","title":"Reading IPUMS Data","text":"Unlike those from microdata projects, NHGIS extracts provide their data and metadata files bundled in a single .zip archive. read_nhgis() anticipates this structure and can read data files directly from this file without the need to manually extract the files: Like microdata extracts, these data include variable-level metadata where available: Variable metadata for NHGIS data are slightly different than those provided by microdata products. First, they come from a .txt codebook file rather than an .xml DDI file. Codebooks can still be loaded into an ipums_ddi object, but fields that do not apply to aggregate data will be empty. In general, NHGIS codebooks provide variable labels and descriptions, along with citation information. By design, NHGIS codebooks are human-readable. To view the codebook contents without converting to an ipums_ddi object, set raw = TRUE.","code":"nhgis_ex1 <- ipums_example(\"nhgis0972_csv.zip\") nhgis_data <- read_nhgis(nhgis_ex1) #> Use of data from NHGIS is subject to conditions including that users should cite the data appropriately. Use command `ipums_conditions()` for more details.
#> Rows: 71 Columns: 25 #> ── Column specification ──────────────────────────────────────────────────────── #> Delimiter: \",\" #> chr (9): GISJOIN, STUSAB, CMSA, PMSA, PMSAA, AREALAND, AREAWAT, ANPSADPI, F... #> dbl (13): YEAR, MSA_CMSAA, INTPTLAT, INTPTLNG, PSADC, D6Z001, D6Z002, D6Z003... #> lgl (3): DIVISIONA, REGIONA, STATEA #> #> ℹ Use `spec()` to retrieve the full column specification for this data. #> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. nhgis_data #> # A tibble: 71 × 25 #> GISJOIN YEAR STUSAB CMSA DIVISIONA MSA_CMSAA PMSA PMSAA REGIONA STATEA #> #> 1 G0080 1990 OH 28 NA 1692 Akron, O… 0080 NA NA #> 2 G0360 1990 CA 49 NA 4472 Anaheim-… 0360 NA NA #> 3 G0440 1990 MI 35 NA 2162 Ann Arbo… 0440 NA NA #> 4 G0620 1990 IL 14 NA 1602 Aurora--… 0620 NA NA #> 5 G0845 1990 PA 78 NA 6282 Beaver C… 0845 NA NA #> 6 G0875 1990 NJ 70 NA 5602 Bergen--… 0875 NA NA #> 7 G1120 1990 MA 07 NA 1122 Boston, … 1120 NA NA #> 8 G1125 1990 CO 34 NA 2082 Boulder-… 1125 NA NA #> 9 G1145 1990 TX 42 NA 3362 Brazoria… 1145 NA NA #> 10 G1160 1990 CT 70 NA 5602 Bridgepo… 1160 NA NA #> # ℹ 61 more rows #> # ℹ 15 more variables: AREALAND , AREAWAT , ANPSADPI , #> # FUNCSTAT , INTPTLAT , INTPTLNG , PSADC , D6Z001 , #> # D6Z002 , D6Z003 , D6Z004 , D6Z005 , D6Z006 , #> # D6Z007 , D6Z008 attributes(nhgis_data$D6Z001) #> $label #> [1] \"Total area: 1989 to March 1990\" #> #> $var_desc #> [1] \"Table D6Z: Year Structure Built (Universe: Housing Units)\" nhgis_cb <- read_nhgis_codebook(nhgis_ex1) # Most useful metadata for NHGIS is for variable labels: ipums_var_info(nhgis_cb) %>% select(var_name, var_label, var_desc) #> # A tibble: 25 × 3 #> var_name var_label var_desc #> #> 1 GISJOIN GIS Join Match Code \"\" #> 2 YEAR Data File Year \"\" #> 3 STUSAB State/US Abbreviation \"\" #> 4 CMSA Consolidated Metropolitan Statistical Area \"\" #> 5 DIVISIONA Division Code \"\" #> 6 MSA_CMSAA Metropolitan Statistical Area/Consolidated Metropolitan S… \"\" #> 7 PMSA 
Primary Metropolitan Statistical Area Name \"\" #> 8 PMSAA Primary Metropolitan Statistical Area Code \"\" #> 9 REGIONA Region Code \"\" #> 10 STATEA State Code \"\" #> # ℹ 15 more rows nhgis_cb <- read_nhgis_codebook(nhgis_ex1, raw = TRUE) cat(nhgis_cb[1:20], sep = \"\\n\") #> -------------------------------------------------------------------------------- #> Codebook for NHGIS data file 'nhgis0972_ds135_1990_pmsa' #> -------------------------------------------------------------------------------- #> #> Contents #> - Data Summary #> - Data Dictionary #> - Citation and Use #> #> Additional documentation on NHGIS data sources is available at: #> https://www.nhgis.org/documentation/tabular-data #> #> -------------------------------------------------------------------------------- #> Data Summary #> -------------------------------------------------------------------------------- #> #> Year: 1990 #> Geographic level: Consolidated Metropolitan Statistical Area--Primary Metropolitan Statistical Area #> Dataset: 1990 Census: SSTF 9 - Housing Characteristics of New Units #> NHGIS code: 1990_SSTF09"},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-read.html","id":"handling-multiple-files","dir":"Articles","previous_headings":"Reading IPUMS NHGIS extracts","what":"Handling multiple files","title":"Reading IPUMS Data","text":"In this example, read_nhgis_codebook() was able to identify and load the correct codebook file, even though we provided the same file path that we provided to read_nhgis() earlier. However, more complicated NHGIS extracts that include data from multiple data sources will provide a .zip archive containing multiple codebook and data files. We can view the files contained in an extract to determine if this is the case: In these cases, we can use the file_select argument to indicate which file to load. file_select supports most features of the tidyselect selection language. (See ?selection_language for documentation of the features supported in ipumsr.) The matching codebook will automatically be loaded and attached to the data: (If for some reason the codebook is not loaded correctly, you can load it separately with read_nhgis_codebook(), which also accepts a file_select specification.) file_select also accepts the full path or the index of the file to load:","code":"nhgis_ex2 <- ipums_example(\"nhgis0731_csv.zip\") ipums_list_files(nhgis_ex2) #> # A tibble: 2 × 2 #> type file #> #> 1 data nhgis0731_csv/nhgis0731_ds239_20185_nation.csv #> 2 data nhgis0731_csv/nhgis0731_ts_nominal_state.csv nhgis_data2 <- read_nhgis(nhgis_ex2, file_select = contains(\"nation\")) nhgis_data3 <- read_nhgis(nhgis_ex2, file_select = contains(\"ts_nominal_state\")) attributes(nhgis_data2$AJWBE001) #> $label #> [1] \"Estimates: Total\" #> #> $var_desc #> [1] \"Table AJWB: Sex by Age (Universe: Total population)\" attributes(nhgis_data3$A00AA1790) #> $label #> [1] \"1790: Persons: Total\" #> #> $var_desc #> [1] \"Table A00: Total Population\" # Match by file name read_nhgis(nhgis_ex2, file_select = \"nhgis0731_csv/nhgis0731_ds239_20185_nation.csv\") # Match first file in extract read_nhgis(nhgis_ex2, file_select = 1)"},{"path":[]},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-read.html","id":"csv-data","dir":"Articles","previous_headings":"Reading IPUMS NHGIS extracts > NHGIS data formats","what":"CSV data","title":"Reading IPUMS Data","text":"NHGIS data are most easily handled in .csv format. read_nhgis() uses readr::read_csv() to handle the generation of column type specifications. If the guessed specifications are incorrect, you can use the col_types argument to adjust them.
This is most likely to occur for columns that contain geographic codes stored as numeric values:","code":"# Convert MSA codes to character format read_nhgis( nhgis_ex1, col_types = c(MSA_CMSAA = \"c\"), verbose = FALSE ) #> # A tibble: 71 × 25 #> GISJOIN YEAR STUSAB CMSA DIVISIONA MSA_CMSAA PMSA PMSAA REGIONA STATEA #> #> 1 G0080 1990 OH 28 NA 1692 Akron, O… 0080 NA NA #> 2 G0360 1990 CA 49 NA 4472 Anaheim-… 0360 NA NA #> 3 G0440 1990 MI 35 NA 2162 Ann Arbo… 0440 NA NA #> 4 G0620 1990 IL 14 NA 1602 Aurora--… 0620 NA NA #> 5 G0845 1990 PA 78 NA 6282 Beaver C… 0845 NA NA #> 6 G0875 1990 NJ 70 NA 5602 Bergen--… 0875 NA NA #> 7 G1120 1990 MA 07 NA 1122 Boston, … 1120 NA NA #> 8 G1125 1990 CO 34 NA 2082 Boulder-… 1125 NA NA #> 9 G1145 1990 TX 42 NA 3362 Brazoria… 1145 NA NA #> 10 G1160 1990 CT 70 NA 5602 Bridgepo… 1160 NA NA #> # ℹ 61 more rows #> # ℹ 15 more variables: AREALAND , AREAWAT , ANPSADPI , #> # FUNCSTAT , INTPTLAT , INTPTLNG , PSADC , D6Z001 , #> # D6Z002 , D6Z003 , D6Z004 , D6Z005 , D6Z006 , #> # D6Z007 , D6Z008 "},{"path":"http://tech.popdata.org/ipumsr/articles/ipums-read.html","id":"fixed-width-data","dir":"Articles","previous_headings":"Reading IPUMS NHGIS extracts > NHGIS data formats","what":"Fixed-width data","title":"Reading IPUMS Data","text":"read_nhgis() also handles NHGIS files provided in fixed-width format: Note that in this case the numeric geographic codes are correctly loaded as character variables. The correct parsing of NHGIS fixed-width files is driven by the column parsing information contained in the .do file provided in the .zip archive. This contains information not only about column positions and data types, but also about implicit decimals in the data. If you no longer have access to the .do file, it is best to resubmit and/or re-download the extract (you may also consider converting it to .csv format in the process). If you have moved the .do file, provide its file path to the do_file argument to use its column parsing information. Note that unlike read_ipums_micro(), fixed-width files from NHGIS are still handled by providing the path to the data file, not a metadata file (i.e. you do not provide an ipums_ddi object to the data_file argument of read_nhgis()). This maintains syntactical consistency with loading NHGIS .csv files.","code":"nhgis_fwf <- ipums_example(\"nhgis0730_fixed.zip\") nhgis_fwf_data <- read_nhgis(nhgis_fwf, file_select = matches(\"ts_nominal\")) #> Use of data from NHGIS is subject to conditions including that users should cite the data appropriately. Use command `ipums_conditions()` for more details. #> Rows: 84 Columns: 28 #> ── Column specification ──────────────────────────────────────────────────────── #> #> chr (4): GISJOIN, STATE, STATEFP, STATENH #> dbl (24): A00AA1790, A00AA1800, A00AA1810, A00AA1820, A00AA1830, A00AA1840, ... #> #> ℹ Use `spec()` to retrieve the full column specification for this data. #> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message. nhgis_fwf_data #> # A tibble: 84 × 28 #> GISJOIN STATE STATEFP STATENH A00AA1790 A00AA1800 A00AA1810 A00AA1820 #> #> 1 G010 Alabama 01 010 NA NA NA 127901 #> 2 G020 Alaska 02 020 NA NA NA NA #> 3 G025 Alaska Terri… NA 025 NA NA NA NA #> 4 G040 Arizona 04 040 NA NA NA NA #> 5 G045 Arizona Terr… NA 045 NA NA NA NA #> 6 G050 Arkansas 05 050 NA NA NA NA #> 7 G055 Arkansas Ter… NA 055 NA NA NA 14273 #> 8 G060 California 06 060 NA NA NA NA #> 9 G080 Colorado 08 080 NA NA NA NA #> 10 G085 Colorado Ter… NA 085 NA NA NA NA #> # ℹ 74 more rows #> # ℹ 20 more variables: A00AA1830 , A00AA1840 , A00AA1850 , #> # A00AA1860 , A00AA1870 , A00AA1880 , A00AA1890 , #> # A00AA1900 , A00AA1910 , A00AA1920 , A00AA1930 , #> # A00AA1940 , A00AA1950 , A00AA1960 , A00AA1970 , #> # A00AA1980 , A00AA1990 , A00AA2000