Skip to content

R package for verifying and transforming data between nomenclature, categories or standards

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md
Notifications You must be signed in to change notification settings

cynthiahqy/xmap

Repository files navigation

xmap xmap website

The {xmap} package provides support for transformations of numeric aggregates between statistical classifications (e.g. occupation or industry categorisations) using the Crossmaps framework. Implements classes for representing transformations between a source and target classification as graph structures (i.e. Crossmaps), and methods for validating and applying crossmaps to transform data collected under the source classification into data indexed using the target classification codes.

Overview

Crossmaps encodes instructions for transforming data between statistical classifications. It is a graph structure with links (to, from, and weight_by) that associate source and target classification codes with weights for redistributing numeric mass attached to each code in a source classification. For example, a given link could specify encode that 10% of people in a source classification are employed in a particular target occupation. A collection of links between two classifications forms a crossmap graph structure, which when represented as an table of links (i.e. an edge list table) can be easily verified against conditions that are required for a valid transformation of data between the specified classifications.

Using a valid crossmap guarantees that the total mass before and after the transformation remains the same. For example, if we reclassify counts of workers by occupation, we would expect that the total number of workers across all occupation categories remains unchanged after reclassification. However, comparing totals is not always sufficient to identify mistakes in data transformation as there can be multiple ways to redistribute mass between source and target classifications while maintaining the same total. This package allows you to create, validate and apply xmap_tbl objects to perform valid and mass-preserving transformations of numeric aggregates between statistical classifications. The crossmaps workflow saves users from having to manually check code lines for implementation errors by verifying crossmaps satisfy mathematically sufficient conditions for valid transformation.

Citation and Related Papers

For more details the task abstraction underpinning the crossmap framework and some visualisations of crossmaps, see Visualising category recoding and numeric redistributions, and for more on how transformation guarantees arise from graph properities of crossmaps, see A Unified Statistical And Computational Framework For Ex-Post Harmonisation Of Aggregate Statistics.

To cite this package use:

citation("xmap")
## Warning in citation("xmap"): could not determine year for 'xmap' from package
## DESCRIPTION file

## To cite package 'xmap' in publications use:
## 
##   Huang C, Puzzello L (????). _xmap: Transforming Data Between
##   Statistical Classifications_. https://github.com/cynthiahqy/xmap,
##   https://cynthiahqy.github.io/xmap/.
## 
## A BibTeX entry for LaTeX users is
## 
##   @Manual{,
##     title = {xmap: Transforming Data Between Statistical Classifications},
##     author = {Cynthia A. Huang and Laura Puzzello},
##     note = {https://github.com/cynthiahqy/xmap, https://cynthiahqy.github.io/xmap/},
##   }

Installation

To install the latest CRAN release of xmap:

install.packages("xmap")

To install the latest development version of xmap:

remotes::install_github("cynthiahqy/xmap")

Usage

Creating crossmaps

The easiest way to create a crossmap is to coerce a dataframe (e.g. xmap::demo$abc_links) containing source codes, target codes and weights between them:

library(xmap)
demo$abc_links |>
  as_xmap_tbl(from = "lower", to = "upper", weight_by = "share")
## # A crossmap tibble: 6 × 3
## # with unique keys:  [4] lower -> [5] upper
##   .from$lower .to$upper .weight_by$share
##   <chr>       <chr>                <dbl>
## 1 a           AA                     1  
## 2 b           BB                     1  
## 3 c           BB                     1  
## 4 d           CC                     0.3
## 5 d           DD                     0.6
## 6 d           EE                     0.1

If the coercion fails, you can use diagnose_as_xmap() to identify issues:

bad_links <- demo$abc_links
bad_links[4, "share"] <- 5

diagnose_as_xmap_tbl(bad_links, from = "lower", to = "upper", weight_by = "share")
## Warning: The sum of weights on outgoing links for some source nodes are not near 1
## ℹ Fix weights or adjust `tol=`
## ℹ See `.$bad_froms` for more details

## $bad_dups
## NULL
## 
## $miss_weight_by
## NULL
## 
## $bad_froms
## # A tibble: 1 × 2
##   .from$lower .sum.weight_by
##   <chr>                <dbl>
## 1 d                      5.7

Applying crossmaps

When using a crossmap to transform data, you want to make sure that the crossmap covers all the codes present in your data. For example, if your data contained a count for the category “teacher”, but your crossmap doesn’t have any links with “teacher”, then you risk silently losing data in the transformation. Even if you wanted to remove the count for “teacher”, this should be done in the original dataset explicitly (e.g. via filtering and removing rows), rather than implictly in the transformation.

To use a suitable crossmap to transform data, you can use apply_xmap():

abc_xmap <- demo$abc_links |>
  as_xmap_tbl(from = "lower", to = "upper", weight_by = "share")
abc_data <- tibble::tibble(
  lower = unique(demo$abc_links$lower),
  count = runif(length(unique(demo$abc_links$lower)), min = 100, max = 500)
)
transformed_data <- apply_xmap(
  .data = abc_data,
  .xmap = abc_xmap,
  values_from = count
)
## Matching keys in `.data$lower` with `.xmap$.from$lower`
## ℹ To silence, set `keys_from = lower`
## totals still match!
sum(abc_data$count) == sum(transformed_data$count)
## [1] TRUE

About

R package for verifying and transforming data between nomenclature, categories or standards

Topics

Resources

License

Unknown, MIT licenses found

Licenses found

Unknown
LICENSE
MIT
LICENSE.md

Stars

Watchers

Forks

Languages