This is the companion website for the paper "Health dynamics, life expectancy heterogeneity, and the racial gap in Social Security wealth". It hosts the health-to-health transition and survival probabilities estimated from the Health and Retirement Study (HRS) for the United States, in CSV and Excel format.
Authors: Richard Foltyn, Jonna Olsson
The contents of this repository are licensed under a Creative Commons Attribution 4.0 International License.
If you use this material in your research, please cite:
Foltyn, Richard and Jonna Olsson: "Health dynamics, life expectancy heterogeneity, and the racial gap in Social Security wealth", 2024
You can download the citation in BibTeX format.
- The directory Health-5 contains the estimates for the benchmark model using all five health states reported in the HRS.
- The directory Health-3 contains the estimates for a model with a smaller set of only three health states, where we combine the first two ("excellent" and "very good") and the last two ("fair" and "poor") states.
- The directory Health-2 contains the estimates for a model where the first three health states ("excellent", "very good", and "good") are merged into one group and the last two ("fair" and "poor") form the other group.
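If you work with raw HRS responses, the groupings above can be written as a simple mapping. The sketch below assumes HRS codes 1 (excellent) through 5 (poor) and a hypothetical numbering of the coarse states from 1 (best) upward; `MAP_H3`, `MAP_H2` and `aggregate_distribution` are illustrative names, not part of the published files. Summing shares within a group is valid for distributions only; the coarser transition probabilities are separate estimates and should be taken from their own directories.

```python
import numpy as np

# Hypothetical mappings from the five HRS self-reported health states
# (1 = excellent, 2 = very good, 3 = good, 4 = fair, 5 = poor)
# to the coarse states of Health-3 and Health-2 (numbered 1 = best).
MAP_H3 = {1: 1, 2: 1, 3: 2, 4: 3, 5: 3}
MAP_H2 = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2}


def aggregate_distribution(dist5, mapping):
    """Collapse shares over the five health states into coarser groups
    by summing the shares that fall into each group."""
    out = np.zeros(max(mapping.values()))
    for state, share in enumerate(dist5, start=1):
        out[mapping[state] - 1] += share
    return out


# Example with made-up shares (not estimates from the paper)
dist5 = np.array([0.15, 0.30, 0.30, 0.17, 0.08])
print(aggregate_distribution(dist5, MAP_H3))
print(aggregate_distribution(dist5, MAP_H2))
```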
The CSV files contain the health-to-health transition and survival probabilities for individuals aged 50 to 99. The estimates for each demographic group are stored in a separate file.
The CSV files have the following format:
- After the header row, each block of six lines corresponds to one age: the first six lines are for age 50, the next six for age 51, and so on.
- Within each block, the first five lines correspond to the initial health state: (1) excellent, (2) very good, (3) good, (4) fair, and (5) poor.
- After the two index columns (age and health), each column corresponds to one outcome one year later: the first five columns are the health states, and the last column is the probability of dying.
- The sixth line is present for completeness so that each age-specific transition matrix is 6-by-6. It represents the absorbing state of death.
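Because every age block is a 6-by-6 row-stochastic matrix, the stacked rows can be reshaped into a three-dimensional array. This sketch assumes the probability columns are already in a NumPy array of shape (n_ages * 6, 6) ordered as described above (for example, `prob` in the numpy snippet below); `to_matrices` and `row_index` are hypothetical helper names.

```python
import numpy as np

N_STATES = 6   # 5 health states + death (absorbing)
AGE_MIN = 50   # first age in the CSV files


def to_matrices(prob):
    """Reshape stacked (n_ages * 6, 6) rows into an (n_ages, 6, 6) array P,
    where P[a - AGE_MIN, i, j] is the probability of moving from state
    i to state j (0-based) between age a and a + 1."""
    n_rows, n_cols = prob.shape
    if n_cols != N_STATES or n_rows % N_STATES != 0:
        raise ValueError("expected a stack of 6-by-6 blocks")
    P = prob.reshape(-1, N_STATES, N_STATES)
    # Every row of a transition matrix should sum to one (up to rounding)
    if not np.allclose(P.sum(axis=2), 1.0, atol=1e-4):
        raise ValueError("rows do not sum to one")
    return P


def row_index(age, health):
    """0-based row of (age, health) in the stacked array;
    health is 1..5 for the health states and 6 for death."""
    return (age - AGE_MIN) * N_STATES + (health - 1)
```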
The easiest way to load the CSV files is to use the pandas library:
```python
import pandas as pd

# Create a DataFrame from the CSV file, indexed by (age, health)
df = pd.read_csv('H5_trans_prob_age50-99_nonblack_male.csv', sep=',', index_col=['age', 'health'])

# Print the first 5 rows of the DataFrame
print(df.head())
```
```
             Health1   Health2   Health3   Health4   Health5     Death
age health
50  1       0.720009  0.241329  0.030531  0.005596  0.001696  0.000839
    2       0.104946  0.733775  0.146507  0.013232  0.000663  0.000877
    3       0.012944  0.179531  0.699388  0.095823  0.009584  0.002730
    4       0.005203  0.022331  0.228457  0.649878  0.083380  0.010752
    5       0.009192  0.003818  0.022781  0.224592  0.698203  0.041414
```
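With the (age, health) MultiIndex, the block for a single age and the one-year survival probabilities can be extracted with `DataFrame.xs`. A minimal sketch, assuming `df` was created by the `read_csv` call above; the function names are illustrative. If the file includes the sixth (death) row for each age, it is returned as well and presumably has a survival probability of zero.

```python
import pandas as pd


def transition_matrix(df: pd.DataFrame, age: int) -> pd.DataFrame:
    """Rows of df for one age, indexed by initial health state."""
    return df.xs(age, level="age")


def survival_by_health(df: pd.DataFrame, age: int) -> pd.Series:
    """One-year survival probability by initial health at `age`,
    computed as 1 minus the 'Death' column."""
    return 1.0 - transition_matrix(df, age)["Death"]


# Usage, with df from the read_csv call above:
# P50 = transition_matrix(df, 50).to_numpy()
# s50 = survival_by_health(df, 50)
```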
Alternatively, plain numpy also works:
```python
import numpy as np

# Skip the header row; columns are age, health, Health1, ..., Health5, Death
data = np.loadtxt('H5_trans_prob_age50-99_nonblack_male.csv', delimiter=',', skiprows=1)

# Transition probabilities
prob = np.ascontiguousarray(data[:, 2:])
# Age corresponding to each row in prob array
age = data[:, 0].astype(int)
# Health state corresponding to each row in prob array
health = data[:, 1].astype(int)
```
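The arrays `prob` and `age` can be used to chain the annual matrices. The sketch below assumes first-order Markov transitions and that death is the last, absorbing state of each 6-by-6 block; it computes the probability of being alive at each age for someone in a given health state at the first age in the file. Summing the entries after the first gives the curtate expected number of remaining years, truncated at the last age in the file. `survival_curve` is a hypothetical helper, not necessarily how the paper computes life expectancy.

```python
import numpy as np


def survival_curve(prob, age, start_health):
    """Probability of being alive at each age, starting in health state
    `start_health` (1..5) at the first age in the file.

    prob, age: arrays as built by the np.loadtxt snippet above.
    Assumes 6-by-6 annual blocks with death as the last, absorbing state.
    """
    ages = np.unique(age)                 # sorted ages, e.g. 50..99
    dist = np.zeros(6)
    dist[start_health - 1] = 1.0          # start in the given health state
    alive = [1.0]
    for a in ages:
        P = prob[age == a]                # 6x6 annual matrix for age a
        dist = dist @ P                   # state distribution one year later
        alive.append(1.0 - dist[-1])      # share not in the death state
    return ages[0] + np.arange(len(alive)), np.array(alive)


# Usage with the arrays loaded above:
# ages, alive = survival_curve(prob, age, start_health=1)
# remaining_years = alive[1:].sum()  # curtate, truncated at the last age
```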
The next four graphs show the two-year probabilities of transitioning between the five self-reported health states conditional on survival, as well as the survival probability for each initial health state and age. The model estimates annual probabilities from biennial HRS data, so the annual estimates must be converted to two-year horizons before they can be compared with the raw data.
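One way to make that conversion is to multiply two consecutive annual matrices and then condition on survival. A minimal sketch, assuming first-order Markov transitions and the 6-by-6 layout with death as the last state; `two_year` and `conditional_on_survival` are hypothetical helper names. For an array `P` of annual matrices indexed by age minus 50, `two_year(P[a - 50], P[a - 49])` would give the transitions from age a to a + 2.

```python
import numpy as np


def two_year(P_a, P_next):
    """Two-year 6x6 transition matrix starting at age a, given the annual
    matrices for ages a and a + 1 (death last and absorbing).
    Assumes first-order Markov transitions."""
    return P_a @ P_next


def conditional_on_survival(P2):
    """Split a 6x6 matrix into (i) health-to-health probabilities
    conditional on survival and (ii) survival probabilities, for the
    five living initial states."""
    alive = P2[:5, :5]                    # end the period alive in health j
    surv = alive.sum(axis=1)              # equals 1 - P2[:5, 5]
    return alive / surv[:, None], surv
```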
The estimation is performed separately for each of the four subpopulations defined by race (black/nonblack) and gender (male/female).
Shaded areas represent bootstrapped 95% confidence intervals (not included in the data files).
The next two graphs show the annual transition probabilities, which correspond directly to the contents of the data files.
To perform simulations using the above transition and survival probabilities, an initial health distribution is required.
- We provide the empirical health distributions at ages 50-51 and 70-71 observed in the HRS, separately for the black/nonblack and male/female groups, in the file Health-5/CSV/H5_dist_health.csv (CSV) or Health-5/Excel/H5_dist_health.xlsx (Excel).
- These population shares are computed from the estimation sample using the respondent-level weights.
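Given an initial distribution and the annual matrices, a Monte Carlo simulation can draw each individual's next state from the row of their current state. The sketch below assumes an array `P` of shape (n_ages, 6, 6) holding the annual matrices in age order and shares over the five health states at the first simulated age (for example, one group's shares from `H5_dist_health.csv`; the exact column layout of that file is not assumed here). `simulate` is a hypothetical helper. To start at age 70 with the 70-71 distribution, pass `P[20:]` so that the first matrix used is the one for age 70.

```python
import numpy as np


def simulate(P, init_dist, n, seed=None):
    """Simulate annual health/survival paths for n individuals.

    P: (n_ages, 6, 6) annual transition matrices in age order, with
       states 1..5 = health and state 6 = death (absorbing).
    init_dist: shares over health states 1..5 at the first age.
    Returns an (n, n_ages + 1) integer array of states 1..6.
    """
    rng = np.random.default_rng(seed)
    n_ages = P.shape[0]
    states = np.empty((n, n_ages + 1), dtype=int)
    p0 = np.asarray(init_dist, dtype=float)
    states[:, 0] = rng.choice(5, size=n, p=p0 / p0.sum()) + 1
    for k in range(n_ages):
        cdf = np.cumsum(P[k], axis=1)            # row-wise CDFs
        cur = states[:, k] - 1                   # 0-based current states
        u = rng.random(n)
        # Inverse-CDF draw: count CDF entries <= u in the current row
        nxt = (cdf[cur] <= u[:, None]).sum(axis=1)
        nxt = np.minimum(nxt, 5)                 # guard against rounding
        nxt = np.where(cur == 5, 5, nxt)         # the dead stay dead
        states[:, k + 1] = nxt + 1
    return states
```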