migrate

The goal of {migrate} is to provide users with an easy set of tools for building state transition matrices.


Methodology

{migrate} provides an easy way to calculate absolute or percentage migration within a credit portfolio. In a typical credit migration matrix using the absolute approach, each cell represents the total balance in the portfolio at 2020-06-30 that started at the Risk Rating shown on the left-hand vertical axis and ended (at 2020-09-30) at the Risk Rating shown on the upper horizontal axis. For example, $6.58M moved from a Risk Rating of AAA at 2020-06-30 to a Risk Rating of AA at 2020-09-30.

While absolute migration is typically more of a reporting function, the percentage (or probabilistic) methodology is more of a statistical modeling exercise, often used in credit portfolio risk management. Currently, this package supports only the simple “cohort” methodology, which estimates the probability of moving from state i to state j in a single time step, echoing a Markov process. For a credit portfolio with N unique, ordinal states, these probabilities form an N × N matrix.
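
As a sketch of that matrix, the cohort estimate can be written as follows. The symbols P, p_ij, n_i, and n_ij are notation assumed here for illustration and do not come from the package: n_i is the count (or total of the chosen metric) in state i at the start, and n_ij is the part of it that ends in state j.

```latex
P =
\begin{pmatrix}
p_{11} & p_{12} & \cdots & p_{1N} \\
p_{21} & p_{22} & \cdots & p_{2N} \\
\vdots & \vdots & \ddots & \vdots \\
p_{N1} & p_{N2} & \cdots & p_{NN}
\end{pmatrix},
\qquad
p_{ij} = \frac{n_{ij}}{n_i},
\qquad
\sum_{j=1}^{N} p_{ij} = 1 \quad \text{for every } i \text{ with } n_i > 0
```

Row i is read as the distribution of end states for everything that started in state i.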

Future Plans for {migrate}

Future development plans for this package include building functionality for the more complex duration/hazard methodology, including both the time-homogeneous and non-homogeneous implementations.

Installation

You can install the released version of {migrate} from CRAN with:

install.packages("migrate")

And the development version from GitHub with:

# install.packages("devtools")
devtools::install_github("ketchbrookanalytics/migrate")

Practical Usage

{migrate} currently only handles transitions between exactly two (2) timepoints. Under the hood, migrate() finds the earliest and latest dates in the given time variable, and filters out any observations whose time value matches neither of those two dates.
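
If your data contains more than two dates, one option is to reduce it to the earliest and latest dates yourself before calling migrate(). The following is a minimal base R sketch; the `loans` data frame and its column names are hypothetical and not part of {migrate}.

```r
# Hypothetical data with three distinct dates (not shipped with {migrate})
loans <- data.frame(
  id = rep(c("A1", "A2"), each = 3),
  date = as.Date(rep(c("2020-03-31", "2020-06-30", "2020-09-30"), times = 2)),
  state = c("AAA", "AA", "AA", "BBB", "BBB", "BB"),
  stringsAsFactors = FALSE
)

# range() on a Date vector returns the earliest and latest dates
endpoints <- range(loans$date)

# Keep only the rows observed at one of those two dates
two_point <- loans[loans$date %in% endpoints, ]

unique(two_point$date)
```

The resulting `two_point` data frame has exactly two unique dates, which is the shape migrate() expects.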

If you are writing a SQL query to get data to be used with migrate(), the query would likely look something like this:

-- Get the *State* risk status and *Balance* dollar amount for each ID, at two distinct dates
SELECT ID, Date, State, Balance
FROM my_database
WHERE Date IN ('2020-12-31', '2021-06-30')

By default, migrate() drops observations that belong to IDs found at a single timepoint. However, users can define a filler state so that IDs with a single timepoint are not removed but rather migrated from or to this filler state. This allows for more flexible handling of such data, ensuring that no information is lost during the migration process. See “Handle IDs with observations at a single timepoint” for more information.
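
To illustrate the filler-state idea conceptually, here is a base R sketch. It does not call {migrate} and is not the package's implementation; the data frames, column names, and the "No Rating" label are all assumptions made for the example.

```r
# Hypothetical snapshots: "C3" exists only at the start, "C4" only at the end
start <- data.frame(
  id = c("C1", "C2", "C3"),
  state_start = c("A", "B", "A"),
  stringsAsFactors = FALSE
)
end <- data.frame(
  id = c("C1", "C2", "C4"),
  state_end = c("A", "A", "B"),
  stringsAsFactors = FALSE
)

# A full outer join keeps IDs that appear at only one timepoint
both <- merge(start, end, by = "id", all = TRUE)

# Replace the missing side with an assumed filler label
filler <- "No Rating"
both$state_start[is.na(both$state_start)] <- filler
both$state_end[is.na(both$state_end)] <- filler

both
```

With the filler in place, "C3" is counted as moving from A to the filler state and "C4" as moving from the filler state to B, rather than both being dropped.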

Example

First, load the package using library():

library(migrate)

The package has a built-in mock dataset, which can be loaded into the environment like so:

data("mock_credit")

head(mock_credit[order(mock_credit$customer_id), ])   # sort by 'customer_id'
customer_id    date        risk_rating  principal_balance
Customer_1001  2020-06-30  A                       915000
Customer_1001  2020-09-30  A                      1328000
Customer_1002  2020-06-30  AAA                     979000
Customer_1002  2020-09-30  AAA                     354000
Customer_1003  2020-06-30  BBB                    1400000
Customer_1003  2020-09-30  BBB                     356000

Note that an important feature of the mock_credit dataset is that there are exactly two (2) unique values in the date column variable. If the time argument passed to migrate() has more than two (2) unique values, the function will throw an error.

unique(mock_credit$date)
#> [1] "2020-06-30" "2020-09-30"

To summarize the migration within the data, use the migrate() function:

migrated_df <- migrate(
  data = mock_credit,
  id = customer_id,
  time = date,
  state = risk_rating
)
#> ℹ Migrating from 2020-06-30 to 2020-09-30
head(migrated_df)
#> # A tibble: 6 × 3
#>   risk_rating_start risk_rating_end   prop
#>   <ord>             <ord>            <dbl>
#> 1 AAA               AAA             0.774 
#> 2 AAA               AA              0.194 
#> 3 AAA               A               0.0323
#> 4 AAA               BBB             0     
#> 5 AAA               BB              0     
#> 6 AAA               B               0
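
To make the `prop` column concrete, the cohort calculation can be sketched by hand in base R. This is an illustration of the idea on hypothetical toy data (the `toy` data frame is not part of the package), and it is not how migrate() is implemented internally.

```r
# Hypothetical toy data: one row per ID with its start and end state
toy <- data.frame(
  id = paste0("C", 1:6),
  state_start = c("A", "A", "A", "A", "B", "B"),
  state_end   = c("A", "A", "A", "B", "B", "A"),
  stringsAsFactors = FALSE
)

# Count the IDs in each (start, end) pair
counts <- table(start = toy$state_start, end = toy$state_end)

# Divide each row by its total to get the share of each start state
# that ended in each end state
probs <- prop.table(counts, margin = 1)

probs
```

Three of the four IDs that started in A stayed in A, so that cell is 3/4, and each row of `probs` sums to 1.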

To create the state transition matrix, use the build_matrix() function:

build_matrix(migrated_df)
#> ℹ Using `risk_rating_start` as the 'state_start' column variable
#> ℹ Using `risk_rating_end` as the 'state_end' column variable
#> ℹ Using `prop` as the 'metric' column variable
#>             AAA         AA          A        BBB         BB          B        CCC
#> AAA 0.774193548 0.19354839 0.03225806 0.00000000 0.00000000 0.00000000 0.00000000
#> AA  0.101123596 0.66292135 0.15730337 0.07865169 0.00000000 0.00000000 0.00000000
#> A   0.008333333 0.06666667 0.72500000 0.16666667 0.03333333 0.00000000 0.00000000
#> BBB 0.000000000 0.00000000 0.11363636 0.68181818 0.14772727 0.05681818 0.00000000
#> BB  0.000000000 0.00000000 0.00000000 0.11392405 0.63291139 0.16455696 0.08860759
#> B   0.000000000 0.00000000 0.00000000 0.01388889 0.09722222 0.62500000 0.26388889
#> CCC 0.000000000 0.00000000 0.00000000 0.00000000 0.00000000 0.14285714 0.85714286
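
A quick sanity check on a percentage matrix is that every row sums to 1. The sketch below re-enters the printed values by hand as a plain base R matrix (the object name `pct` is an assumption for this example), so it runs without the package. In practice you would apply rowSums() to the object returned by build_matrix().

```r
ratings <- c("AAA", "AA", "A", "BBB", "BB", "B", "CCC")

# Values copied from the printed output above (rounded as displayed)
pct <- matrix(
  c(
    0.774193548, 0.19354839, 0.03225806, 0.00000000, 0.00000000, 0.00000000, 0.00000000,
    0.101123596, 0.66292135, 0.15730337, 0.07865169, 0.00000000, 0.00000000, 0.00000000,
    0.008333333, 0.06666667, 0.72500000, 0.16666667, 0.03333333, 0.00000000, 0.00000000,
    0.000000000, 0.00000000, 0.11363636, 0.68181818, 0.14772727, 0.05681818, 0.00000000,
    0.000000000, 0.00000000, 0.00000000, 0.11392405, 0.63291139, 0.16455696, 0.08860759,
    0.000000000, 0.00000000, 0.00000000, 0.01388889, 0.09722222, 0.62500000, 0.26388889,
    0.000000000, 0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.14285714, 0.85714286
  ),
  nrow = 7, byrow = TRUE, dimnames = list(ratings, ratings)
)

# Each row should sum to 1, up to the rounding in the printed values
row_totals <- rowSums(pct)
rows_ok <- all(abs(row_totals - 1) < 1e-6)

# The diagonal holds the share of each rating that did not migrate
stayed <- diag(pct)
```

Here `rows_ok` confirms the rows are proper probability distributions, and `stayed` gives the per-rating retention share.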

Or, to do it all in one shot, use the native pipe operator (|>):

mock_credit |>
  migrate(
    id = customer_id,
    time = date,
    state = risk_rating,
    metric = principal_balance,
    percent = FALSE,
    verbose = FALSE
  ) |>
  build_matrix(
    state_start = risk_rating_start,
    state_end = risk_rating_end,
    metric = principal_balance
  )
#>          AAA       AA        A      BBB       BB        B      CCC
#> AAA 29042000  6575000    20000        0        0        0        0
#> AA   6445000 58095000 13045000 14467000        0        0        0
#> A     804000  7898000 85330000 21015000  5829000        0        0
#> BBB        0        0 12461000 65315000 13911000  8140000        0
#> BB         0        0        0 11374000 45986000 14057000  5723000
#> B          0        0        0   413000  6700000 47402000 17132000
#> CCC        0        0        0        0        0  2094000 14843000
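
An absolute matrix like the one above can also be turned into balance-weighted shares by dividing each row by its total. The sketch below re-enters the printed values by hand (the names `abs_mat` and `pct_by_balance` are assumptions for this example). It is a plain row normalisation in base R and is not claimed to match what migrate() returns with percent = TRUE.

```r
ratings <- c("AAA", "AA", "A", "BBB", "BB", "B", "CCC")

# Values copied from the printed absolute migration matrix above
abs_mat <- matrix(
  c(
    29042000,  6575000,    20000,        0,        0,        0,        0,
     6445000, 58095000, 13045000, 14467000,        0,        0,        0,
      804000,  7898000, 85330000, 21015000,  5829000,        0,        0,
           0,        0, 12461000, 65315000, 13911000,  8140000,        0,
           0,        0,        0, 11374000, 45986000, 14057000,  5723000,
           0,        0,        0,   413000,  6700000, 47402000, 17132000,
           0,        0,        0,        0,        0,  2094000, 14843000
  ),
  nrow = 7, byrow = TRUE, dimnames = list(ratings, ratings)
)

# Divide each row by its row total to get the share of starting balance
# that ended in each rating
pct_by_balance <- prop.table(abs_mat, margin = 1)

# Share of the AAA row's balance that ended at AA
pct_by_balance["AAA", "AA"]
```

Note that balance-weighted shares generally differ from the count-based proportions shown earlier, because large balances carry more weight.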