README.Rmd

---
output: github_document
---

<!-- README.md is generated from README.Rmd. Please edit that file -->

```{r, include = FALSE}
options(width = 999)

knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)

devtools::load_all()
```

# migrate <img src='man/figures/logo.png' align="right" height="200" />

<!-- badges: start -->
[![lifecycle](https://img.shields.io/badge/lifecycle-stable-green.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable)
[![CRAN status](https://www.r-pkg.org/badges/version/migrate)](https://CRAN.R-project.org/package=migrate)
[![metacran downloads](https://cranlogs.r-pkg.org/badges/migrate)](https://cran.r-project.org/package=migrate)
[![R-CMD-check](https://github.com/ketchbrookanalytics/migrate/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/ketchbrookanalytics/migrate/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->

The goal of {migrate} is to provide users with an easy set of tools for building *state transition matrices*.

<br>

![](man/figures/gt_tbl.png)

## Methodology

{migrate} provides an easy way to calculate absolute or percentage migration within a credit portfolio. The above image shows a typical credit migration matrix using the *absolute* approach; each cell in the grid represents the total balance in the portfolio at 2020-06-30 that started at the Risk Rating represented on the left-hand vertical axis and ended (at 2020-09-30) at the Risk Rating represented on the upper horizontal axis of the matrix. For example, $6.58M moved from a Risk Rating **AAA** at 2020-06-30 to a Risk Rating **AA** at 2020-09-30.

While the above, *absolute*, migration example is typically more of a reporting function, the *percentage* (or probabilistic) methodology is often more of a statistical modeling exercise, often used in credit portfolio risk management. Currently, this package only supports the simple "cohort" methodology. This estimates the probability of moving from state *i* to state *j* in a single time step, echoing a Markov process. We can visualize this in a matrix, for a credit portfolio with *N* unique, ordinal states:

![](man/figures/markov_matrix.png)

### Future Plans for {migrate}

Future development plans for this package include building functionality for the more complex **duration**/**hazard** methodology, including both the *time-homogeneous* and *non-homogeneous* implementations.

## Installation

You can install the released version of {migrate} from [CRAN](https://CRAN.R-project.org) with:

``` {r, eval = FALSE}
install.packages("migrate")
```

And the development version from [GitHub](https://github.com/) with:

``` {r, eval = FALSE}
# install.packages("devtools")
devtools::install_github("ketchbrookanalytics/migrate")
```

## Practical Usage

{migrate} currently only handles transitions between exactly two (2) timepoints. Under the hood, `migrate()` finds the earliest & latest dates in the given *time* variable, and filters out any observations where the *time* value does not match those two dates.

If you are writing a SQL query to get data to be used with `migrate()`, the query would likely look something like this:

```{r, eval = FALSE}
# -- Get the *State* risk status and *Balance* dollar amount for each ID, at two distinct dates

# SELECT ID, Date, State, Balance
# FROM my_database
# WHERE Date IN ('2020-12-31', '2021-06-30')
```

By default, `migrate()` drops observations that belong to IDs found at a single timepoint. However, users can define a *filler state* so that IDs with a single timepoint are not removed but rather migrated from or to this *filler state*. This allows for more flexible handling of such data, ensuring that no information is lost during the migration process. Check [Handle IDs with observations at a single timepoint](https://ketchbrookanalytics.github.io/migrate/articles/migrate.html#handle-ids-with-observations-at-a-single-timepoint) for more information.

## Example

First, load the package using `library()`

```{r load, eval = FALSE}
library(migrate)
```

The package has a built-in mock dataset, which can be loaded into the environment like so:

```{r data, eval = FALSE}
data("mock_credit")

head(mock_credit[order(mock_credit$customer_id), ])   # sort by 'customer_id'
```

```{r data_tbl, echo = FALSE}
head(mock_credit[order(mock_credit$customer_id), ]) |>
  knitr::kable(row.names = FALSE)
```

Note that an important feature of the `mock_credit` dataset is that there are exactly two (2) unique values in the `date` column variable; if the `time` argument passed to `migrate()` has more than two (2) unique values, the function will throw an error.

```{r dates}
unique(mock_credit$date)
```

To summarize the migration within the data, use the `migrate()` function

```{r migrate}
migrated_df <- migrate(
  data = mock_credit,
  id = customer_id,
  time = date,
  state = risk_rating,
)
head(migrated_df)
```

To create the state transition matrix, use the `build_matrix()` function
```{r matrix}
build_matrix(migrated_df)
```

Or, to do it all in one shot, use the `|>`
```{r pipe}
mock_credit |>
  migrate(
    id = customer_id,
    time = date,
    state = risk_rating,
    metric = principal_balance,
    percent = FALSE,
    verbose = FALSE
  ) |>
  build_matrix(
    state_start = risk_rating_start,
    state_end = risk_rating_end,
    metric = principal_balance
  )
```