---
title: "Calculating MMRates from DHS survey data"
author: "Evan Brand"
date: "4/23/2019"
output: html_document
---
## Introduction
The following is a method of calculating age-disaggregated maternal mortality rates (MMRates) from DHS survey data in R. The DHS estimates MMRates using the sisterhood method, by which individual women are asked for information about their sisters. The sisters are the units of analysis for maternal mortality calculations.
The information we'll need is:
* Sisters' dates of birth
* Deceased sisters' dates of death
* Whether deceased sisters had a pregnancy-related death
The libraries we need for data import and wrangling:
```{r message=F, warning=F}
library(tidyverse)
library(haven)
```
## Getting the data in
This example uses the DHS VI model data, which are synthetic data created to look like real DHS microdata. We need the individual recode, which contains one record for each woman interviewed. (Note: the ZZIR62FL.dta file is not included in the repo for this tutorial but can be downloaded [here](https://dhsprogram.com/data/Download-Model-Datasets.cfm).)
```{r eval=FALSE}
women = read_dta("ZZIR62FL.dta")
```
Each interviewed woman's siblings are recorded in different columns. There are twenty copies of each sibling variable in order to allow plenty of space to record data for all siblings. For example, the sibling survival status variable is `mm2` so the dataset contains columns `mm2_01`, `mm2_02`, ..., `mm2_20`.
The first step is to select the variables we'll need. Sampling weights are recorded without a decimal so they need to be divided by 1,000,000 before we use them:
```{r eval=FALSE}
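# Keep the respondent ID, sampling weight, interview date and sibling variables
mmrate_data = select(women,
                     caseid,
                     weight = v005,
                     int_date = v008,
                     starts_with("mmidx_"),
                     starts_with("mm1_"),   # sibling's sex
                     starts_with("mm2_"),   # survival status
                     starts_with("mm4_"),   # date of birth
                     starts_with("mm8_"),   # date of death
                     starts_with("mm9_"))   # timing of death relative to pregnancy

# Weights are stored without a decimal point, so rescale them
mmrate_data$weight = mmrate_data$weight / 1000000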
```
It's helpful to convert the categorical variables from labelled Stata variables to R factor variables so we can work with the level labels directly:
```{r eval=FALSE}
mmrate_data = mmrate_data %>%
  mutate_at(vars(starts_with("mm1_"),
                 starts_with("mm2_"),
                 starts_with("mm9_")),
            as_factor)
```
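To see why the conversion helps, here is a minimal sketch using a made-up labelled vector. The codes and labels below are illustrative assumptions, not values taken from the DHS recode. A labelled column holds numeric codes plus a map from labels to codes, and `as_factor` swaps each code for its label.

```r
library(haven)

# Hypothetical labelled vector mimicking a Stata categorical column;
# the codes (1, 2) and labels are illustrative only.
sex_codes <- labelled(c(1, 2, 2), labels = c(male = 1, female = 2))

# as_factor() replaces each numeric code with its label
sex_factor <- as_factor(sex_codes)
as.character(sex_factor)
```

After the conversion, filters can be written against the labels (for example `sex == "female"`) rather than against the underlying codes.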
## Reshaping the data
Now we need to reshape the data so that each observation is a sibling. We can reshape with the following pipe:
```{r eval=FALSE}
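mmrate_data = mmrate_data %>%
  # Stack every mm* column into key/value pairs
  gather(key, value, starts_with("mm")) %>%
  # Split e.g. "mm9_03" into variable "mm9" and sibling number "03"
  separate(key, c("variable", "sib_num"), "_") %>%
  # One column per variable root, one row per sibling
  spread(variable, value, convert = TRUE)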
```
`gather` gathers all the `mm` variable names into a `key` column, `separate` splits this column at the underscore into the root of the variable name (e.g., `mm9`) and the sibling number suffix, and `spread` maps the variable name roots back to columns.
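In current versions of tidyr, `gather` and `spread` are superseded by `pivot_longer` and `pivot_wider`. The sketch below shows the same reshape done in one `pivot_longer` call on a tiny made-up table. The toy column values are assumptions for illustration, and this is an alternative to the pipe above, not the method used in this tutorial.

```r
library(tidyr)

# Toy wide table: two respondents, up to two siblings each (values are made up)
wide <- data.frame(
  caseid = c("a", "b"),
  mm2_01 = c("alive", "dead"),
  mm2_02 = c("dead", NA),
  mm4_01 = c(1000, 1100),
  mm4_02 = c(1050, NA),
  stringsAsFactors = FALSE
)

# ".value" tells pivot_longer that the part of the name before "_" is the
# output column name; the part after "_" goes into sib_num.
long <- pivot_longer(
  wide,
  cols = starts_with("mm"),
  names_to = c(".value", "sib_num"),
  names_sep = "_"
)
long
```

Each row of `long` is one sibling slot, with `mm2` and `mm4` as ordinary columns. Because the columns never pass through a shared `value` column, they keep their original types.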
Next we'll give our variables more descriptive names and keep only female siblings whose survival status is known.
```{r eval=FALSE}
mmrate_data = mmrate_data %>%
  rename(sex = mm1,
         alive = mm2,
         birth_date = mm4,
         death_date = mm8,
         maternal_death = mm9) %>%
  # Keep sisters only, and drop those whose survival status is unknown
  filter(sex == "female",
         alive %in% c("alive", "dead"))
```
Here is how the data look now (first 10 rows):
```{r echo=FALSE}
mmrate_data = read.csv("mmrate_data.csv")
```
```{r}
knitr::kable(head(mmrate_data, 10))
```
## Calculating time spent in (up to) three age groups
Calculating the numerator for the MMRate is straightforward: it is the number of maternal deaths in each age group in the past seven years. Calculating the denominator, person-years of “exposure”, is more complicated. In effect, it asks the question: in the past seven years, how many years did all women collectively spend in each (five-year) age group?
It's possible for a woman to have spent time in up to three age groups in the past seven years. For example, if a woman was 36 at the time she was interviewed, she spent time in the 25-29, 30-34 and 35-39 age groups in the past seven years. In other words, she had “exposure” to each of those age groups. So we will create six variables to calculate exposure:
* `last_age_group`: a variable to store the age group a woman currently belongs to (at the time of interview) or died in if she died in the past seven years
* `mid_age_group`: a variable to store the age group below a woman’s current age group or the age group she died in
* `first_age_group`: a variable to store two age groups below a woman’s current age group or the age group she died in - i.e., the first age group she could have been exposed to.
* `expo_1`: the amount of time, in months, a woman spent in `last_age_group` in the past seven years
* `expo_2`: the amount of time, in months, a woman spent in `mid_age_group` in the past seven years
* `expo_3`: the amount of time, in months, a woman spent in `first_age_group` in the past seven years (will be zero for surviving women who only spent time in two age groups in the past seven years)
We'll record age groups as 0 for 0-4, 1 for 5-9, 2 for 10-14, etc. Since `last_age_group` is the age group of death for deceased women, we will also count number of deaths in `last_age_group`.
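As a quick check of the 36-year-old example, the sketch below walks through each of the 84 months before the interview and counts how many fall in each five-year age group. It assumes, for simplicity, that the woman turned 36 in the interview month; the variable names are illustrative.

```r
# Age at interview, in months (assumed to be exactly 36 years)
age_at_interview <- 36 * 12

# Her age in months during each of the 84 months before the interview
ages <- seq(age_at_interview - 84, age_at_interview - 1)

# Age group code: 0 for 0-4, 1 for 5-9, 2 for 10-14, etc.
groups <- floor(ages / 60)

# Count months per group and attach readable labels
months_per_group <- table(groups)
codes <- as.integer(names(months_per_group))
exposure_months <- setNames(as.vector(months_per_group),
                            paste0(codes * 5, "-", codes * 5 + 4))
exposure_months
```

Under that assumption the 84 months split into 12 months in 25-29, 60 months in 30-34 and 12 months in 35-39.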
The first step is to calculate total exposure during the last seven years. For surviving sisters aged seven or older, this is 84 months (DHS stores dates as century month codes, so we'll work in months for now). For sisters who have died, it is the number of months they were alive during the seven-year window. For sisters younger than seven, it is their age in months, though sisters younger than 15 at the time of interview/death are filtered out later anyway.
```{r}
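mmrate_data = mmrate_data %>%
  mutate(
    # End of exposure: the month before interview, or the month of death if earlier
    upper_lim = pmin(int_date - 1, death_date, na.rm = TRUE),
    # Start of exposure: 84 months before interview, or the month of birth if later
    lower_lim = pmax(int_date - 84, birth_date),
    total_exposure = upper_lim - lower_lim + 1
  ) %>%
  # Drop sisters who died before the seven-year window began
  filter(total_exposure > 0)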
```
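A small worked example may make the limits clearer. A century month code counts months since the start of 1900, i.e. `(year - 1900) * 12 + month`. The helper `cmc()` and the four sisters below are hypothetical, introduced only to illustrate the calculation.

```r
# Hypothetical helper: century month code for a given year and month
cmc <- function(year, month) (year - 1900) * 12 + month

int_date <- cmc(2019, 4)  # interview in April 2019

# Four made-up sisters: a surviving adult, one who died inside the window,
# a young child, and one who died before the window began
sisters <- data.frame(
  birth_date = c(cmc(1983, 4), cmc(1983, 4), cmc(2015, 2), cmc(1983, 4)),
  death_date = c(NA, cmc(2013, 4), NA, cmc(2010, 1))
)

sisters$upper_lim <- pmin(int_date - 1, sisters$death_date, na.rm = TRUE)
sisters$lower_lim <- pmax(int_date - 84, sisters$birth_date)
sisters$total_exposure <- sisters$upper_lim - sisters$lower_lim + 1
sisters
```

The surviving adult gets the full 84 months, the sister who died in April 2013 gets 13, the child gets 50, and the sister who died in 2010 gets a negative value, which is why the `filter(total_exposure > 0)` step is needed.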
We'll add the age group and exposure variables as follows:
* `last_age_group`: floor((`upper_lim` - `birth_date`) / 60), which evaluates to 0 for sisters aged 0-4 at the time of interview/death, 1 for sisters aged 5-9 at the time of interview/death, etc.
* `mid_age_group`: `last_age_group` - 1
* `first_age_group`: `mid_age_group` - 1
* `expo_1`: the minimum of `total_exposure` and `upper_lim` - (`birth_date` + `last_age_group` * 60) + 1. The latter is the number of months between when the woman entered her last age group and the time of interview/death. `total_exposure` is smaller than this for women who entered that age group before the seven-year window began (for example, a sister who died early in the window), so we take the minimum of the two.
* `expo_2`: the minimum of 60 and `total_exposure` - `expo_1` - either a woman spanned the entire middle age group in the past seven years, which is 60 months wide, or she didn't and her remaining exposure, `total_exposure` - `expo_1`, is less and all gets allocated to the middle group.
* `expo_3`: `total_exposure` - `expo_1` - `expo_2` - whatever remains, if anything, after exposure has been allocated to the first two age groups.
We'll also collapse the levels of `maternal_death` that are considered pregnancy-related deaths.
```{r warning=F}
mmrate_data = mmrate_data %>%
  mutate(
    last_age_group = floor((upper_lim - birth_date) / 60),
    mid_age_group = last_age_group - 1,
    first_age_group = mid_age_group - 1,
    expo_1 = pmin(total_exposure,
                  upper_lim - (birth_date + last_age_group * 60) + 1),
    expo_2 = pmin(60, total_exposure - expo_1),
    expo_3 = total_exposure - expo_1 - expo_2,
    # Merge all pregnancy-related levels into a single level
    maternal_death = fct_collapse(maternal_death,
      "pregnancy-related" = c("died while pregnant",
                              "died during delivery",
                              "since delivery",
                              "6 weeks after delivery",
                              "2 months after delivery")
    )
  )
```
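To check the allocation rules by hand, the sketch below wraps the same formulas in a small base R function and applies it to three made-up sisters. The function name `split_exposure` and all dates are hypothetical; the dates are century month codes.

```r
# Hypothetical helper applying the exposure formulas above to vectors of dates
split_exposure <- function(int_date, birth_date, death_date = NA) {
  upper_lim <- pmin(int_date - 1, death_date, na.rm = TRUE)
  lower_lim <- pmax(int_date - 84, birth_date)
  total_exposure <- upper_lim - lower_lim + 1

  last_age_group <- floor((upper_lim - birth_date) / 60)
  expo_1 <- pmin(total_exposure,
                 upper_lim - (birth_date + last_age_group * 60) + 1)
  expo_2 <- pmin(60, total_exposure - expo_1)
  expo_3 <- total_exposure - expo_1 - expo_2

  data.frame(last_age_group, expo_1, expo_2, expo_3)
}

# Sister 1: alive, turned 36 in the interview month (spans three age groups)
# Sister 2: died 13 months into the window (all exposure in one age group)
# Sister 3: alive, aged 27 at interview (spans two age groups)
res <- split_exposure(int_date = 1432,
                      birth_date = c(1000, 980, 1100),
                      death_date = c(NA, 1360, NA))
res
```

Sister 1 gets 12, 60 and 12 months, sister 2 gets 13, 0 and 0, and sister 3 gets 32, 52 and 0. In each case the three values sum to that sister's total exposure.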
Now we'll summarize weighted exposure by age group, keeping only age groups 3-9 (15-19 to 45-49). We need to summarize weighted pregnancy-related deaths by `last_age_group` since this was the age group of death for deceased sisters.
```{r}
# Exposure (in years) and deaths in the age group at interview/death
last_age_sums = mmrate_data %>%
  group_by(last_age_group) %>%
  summarize(
    total_exp_1 = sum(weight * expo_1 / 12),
    total_deaths = sum(weight * (maternal_death == "pregnancy-related"),
                       na.rm = TRUE)) %>%
  filter(last_age_group %in% 3:9)

# Exposure (in years) in the age group one below
mid_age_sums = mmrate_data %>%
  group_by(mid_age_group) %>%
  summarize(
    total_exp_2 = sum(weight * expo_2 / 12)) %>%
  filter(mid_age_group %in% 3:9)

# Exposure (in years) in the age group two below
first_age_sums = mmrate_data %>%
  group_by(first_age_group) %>%
  summarize(
    total_exp_3 = sum(weight * expo_3 / 12)) %>%
  filter(first_age_group %in% 3:9)
```
The last step is to stitch together the three exposure tables and calculate maternal mortality rates.
```{r}
# Note: the three tables are added row by row, so each must contain
# age groups 3-9 in the same order
mmrate = last_age_sums %>%
  transmute(age_group = paste(last_age_group * 5, "-",
                              (last_age_group + 1) * 5 - 1,
                              sep = ""),
            maternal_deaths = total_deaths,
            exposure_years = total_exp_1 +
              mid_age_sums$total_exp_2 +
              first_age_sums$total_exp_3,
            # Rate per 1,000 woman-years of exposure
            maternal_mortality_rate = 1000 * maternal_deaths / exposure_years)
```
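Adding the three tables row by row works when every table contains all seven age groups. With smaller samples an age group could be missing from one of them, so one possible alternative is to stack the three exposure columns into a single long table and aggregate by age group. The sketch below does this in base R on made-up data. The toy values and the column name `preg_death` are assumptions for illustration, not part of the DHS data.

```r
# Toy sibling-level data (weights and exposures are made up)
toy <- data.frame(
  weight         = c(1, 0.5, 2),
  last_age_group = c(7, 6, 4),
  expo_1         = c(12, 13, 24),
  expo_2         = c(60, 0, 60),
  expo_3         = c(12, 0, 0),
  preg_death     = c(FALSE, TRUE, FALSE)
)

# Stack weighted exposure months from the three age-group slots
long <- rbind(
  data.frame(age_group = toy$last_age_group,     months = toy$weight * toy$expo_1),
  data.frame(age_group = toy$last_age_group - 1, months = toy$weight * toy$expo_2),
  data.frame(age_group = toy$last_age_group - 2, months = toy$weight * toy$expo_3)
)
long <- long[long$age_group %in% 3:9, ]

# Weighted person-years by age group
exposure <- aggregate(months ~ age_group, data = long, FUN = sum)
exposure$exposure_years <- exposure$months / 12

# Weighted pregnancy-related deaths by age group at death
deaths <- aggregate(
  deaths ~ age_group,
  data = data.frame(age_group = toy$last_age_group,
                    deaths = toy$weight * toy$preg_death),
  FUN = sum
)

# Join on age group rather than relying on row order
result <- merge(exposure, deaths, by = "age_group", all.x = TRUE)
result$deaths[is.na(result$deaths)] <- 0
result$rate <- 1000 * result$deaths / result$exposure_years
result
```

Because the tables are matched on `age_group`, an age group with exposure but no recorded deaths simply gets a rate of zero instead of shifting the rows out of alignment.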
The final table, `mmrate`:
```{r echo=FALSE}
mmrate %>%
  mutate_if(is.numeric, round, 1) %>%
  knitr::kable()
```
This table matches the maternal mortality rate table for the model data provided by DHS. It also illustrates how maternal mortality rates fail to control for differences in fertility across groups - MMRates trail off for older age groups mostly just because pregnancies are less common in those age groups.