-
Notifications
You must be signed in to change notification settings - Fork 0
/
README.Rmd
308 lines (242 loc) · 9.6 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%",
dev = "ragg_png"
)
```
# chronogram
<!-- badges: start -->
[![Lifecycle: stable](https://img.shields.io/badge/lifecycle-stable-brightgreen.svg)](https://lifecycle.r-lib.org/articles/stages.html#stable) [![R-CMD-check](https://github.com/FrancisCrickInstitute/chronogram/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/FrancisCrickInstitute/chronogram/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->
The goal of `chronogram` is to "cast" and annotate metadata, laboratory and clinical data into a tidy-like data structure. This bridges between a LIMS / database style data warehouse and data that is ready for interrogation to test biological hypotheses.
`chronogram` was designed during the SARS-CoV-2 pandemic (2019-). However, it is pathogen, vaccine and symptoms agnostic.
------------------------------------------------------------------------
## Installation
Install the current version from [GitHub](https://github.com/):
``` r
# install.packages("devtools")
devtools::install_github("FrancisCrickInstitute/chronogram")
```
If you have not installed packages from github before, you will need to [setup your GitHub account to interact with R](https://usethis.r-lib.org/articles/git-credentials.html#practical-instructions).
------------------------------------------------------------------------
## Why should I use `chronogram`?
- To aggregate study data **regularly 🕓**, and **repetitively 🔁**. Perhaps your study has rolling recruitment, ongoing data generation or incremental analysis. Outsource that effort to `chronogram`.
- To **reproducibly aggregate** data within and **across several studies** & **users 👩💻👨💻**. Stop troubleshooting joins by hand.
- To provide a **versatile** data shape **poised** for **new or follow-up analyses** without needing re-aggregation 🛫.
***When shouldn't I use `chronogram`?***
Your study is **completed**. You have assembled a clean, de-duplicated and fully annotated data object. You have **finished all data analysis**. Congratulations! 🥳 Don't reinvent the wheel here.
------------------------------------------------------------------------
## How do I use `chronogram`?
The `chronogram` workflow can be divided into assembly, annotation and finally, filtering, windowing and selecting data for a specific analysis.
### chronogram assembly
- `cg_assemble()` combines cleaned metadata, experimental data, and a range of calendar dates into a chronogram.
- `cg_add_experiment()` allows the adding of further experiments.
Further details:
- [assembly](articles/assembly.html) vignette for a step-by-step guide, or the [quickstart](articles/chronogram.html).
- [SQL vignette](articles/SQL_assembly.html) explains `chronogram` assembly from an SQL database.
- An introduction to the [chronogram class](articles/chronogram_class.html).
### chronogram annotation
#### Annotate vaccines
```{r echo=FALSE, warning=FALSE, message=FALSE, fig.height=3}
library(chronogram)
library(dplyr)
library(ggplot2)
data(built_smallstudy)
cg <- built_smallstudy$chronogram
cg <- cg_add_experiment(cg, built_smallstudy$infections_to_add)
cg <- cg_annotate_vaccines_count(
cg,
## the prefix to the dose columns: ##
dose = dose,
## the output column name: ##
dose_counter = dose_number,
## the prefix to the date columns: ##
vaccine_date_stem = date_dose,
## use 14d to 'star' after a dose ##
intermediate_days = 14
) %>%
mutate(dose_number = factor(dose_number,
levels = c(
"0",
"1star",
"1",
"2star",
"2"
)
))
## plot over time ##
cg %>%
ggplot(
aes(
x = calendar_date,
y = elig_study_id,
fill = dose_number
)
) +
geom_tile(height = 0.5) +
scale_fill_grey(end = 0.2, start = 0.8) +
theme_bw()
```
Label each day for each participant with the number of doses they have received, including support for a lag period between the reciept of a dose and its immunological priming effect. [Annotate vaccines here](articles/annotate_vaccines.html).
#### Annotate infection episodes
```{r echo=FALSE, warning=FALSE, message=FALSE, results="none"}
library(chronogram)
library(dplyr)
library(ggplot2)
data(built_smallstudy)
cg <- built_smallstudy$chronogram
cg <- cg_add_experiment(cg, built_smallstudy$infections_to_add)
cg <- cg_annotate_episodes_find(
cg,
infection_cols = c("LFT", "PCR", "symptoms"),
infection_present = c("pos", "Post", "^severe"),
episode_days = 90
)
cg <- cg %>%
mutate(
episode_variant =
case_when(
# "is an episode" & "PCR positive" -> Delta #
(!is.na(episode_number)) & PCR == "Pos" ~ "Delta",
# "is an episode" & "PCR unavailable" -> Anc/Delta #
(!is.na(episode_number)) & PCR == "not tested" ~ "Anc/Alpha"
)
)
cg <- cg %>%
cg_annotate_episodes_fill(col_to_fill = episode_variant,
col_to_return = episode_variant_filled)
## plot over time ##
cg %>%
ggplot(
aes(
x = calendar_date,
y = elig_study_id,
fill = episode_variant_filled
)
) +
geom_tile(height = 0.1, width = 5) +
geom_point(data = . %>%
## filter to LFT results that are present ##
filter(!is.na(LFT)) %>%
## preceed with "LFT" to allow a single colour guide ##
mutate(LFT = paste("LFT", LFT)),
aes(col = LFT),
shape = "I",
size = 4,
position = position_nudge(y=0.4)) +
## repeat LFT approach for PCR
geom_point(data = . %>%
filter(!is.na(PCR)) %>%
mutate(PCR = paste("PCR", PCR)),
aes(col = PCR),
shape = "I",
size = 4,
position = position_nudge(y=0.6)) +
## repeat LFT approach for symptoms
geom_point(data = . %>%
filter(!is.na(symptoms)) %>%
mutate(symptoms = paste("symptoms", symptoms)),
aes(col = symptoms),
shape = "I",
size = 4,
position = position_nudge(y=0.2)) +
## swap the fill scale, and stop colour being included in this guide ##
scale_fill_grey(name = "infection episode",
na.translate = FALSE,
na.value = NA,
guide = guide_legend(override.aes = list(color = NA))) +
## swap the colour scale ##
scale_color_brewer(type = "qual",palette = 2,
name = "nasopharyngeal testing & symptoms") +
theme_bw() +
theme(legend.position = "right",
legend.direction = "vertical")
```
Symptoms, point-of-care tests, and laboratory tests of infection rarely occur on exactly the same study day. `chronogram` finds, fills and annotates these tests and symptoms into episodes of infection. [Annotate episodes here](articles/annotate_episodes.html).
#### Annnotate exposures
```{r echo=FALSE, warning=FALSE, message=FALSE,results="none", fig.height=7}
library(chronogram)
library(dplyr)
library(ggplot2)
library(patchwork)
cg <- cg_annotate_vaccines_count(
cg,
## the prefix to the dose columns: ##
dose = dose,
## the output column name: ##
dose_counter = dose_number,
## the prefix to the date columns: ##
vaccine_date_stem = date_dose,
## use 14d to 'star' after a dose ##
intermediate_days = 14
) %>%
mutate(dose_number = factor(dose_number,
levels = c(
"0",
"1star",
"1",
"2star",
"2"
)
))
cg_exposures <- cg %>% cg_annotate_exposures_count(
episode_number = episode_number,
dose_number = dose_number,
## we have not considered episodes of seroconversion
N_seroconversion_episode_number = NULL
)
cg_exposures <- cg_exposures %>%
mutate(
episode_variant_summarised =
episode_variant_filled
) %>%
cg_annotate_antigenic_history(
episode_number = episode_number,
dose_number = dose_number,
episode_variant_summarised = episode_variant_summarised,
ag_col = antigenic_history
)
## Plot ##
top_panel <- cg_exposures %>%
select(calendar_date,
exposure_number,
elig_study_id,
antigenic_history) %>%
ggplot(aes(
x = calendar_date, y = exposure_number,
col = elig_study_id
)) +
geom_line() +
facet_grid(antigenic_history ~ .)
swimmers_panel <- cg_plot_meta(cg_exposures,
visit = serum_Ab_S
) +
## set the axes to match top_panel ##
xlim(
min(cg_exposures$calendar_date),
max(cg_exposures$calendar_date)
) +
scale_y_discrete(limits = factor(c(3, 2, 1)))
top_panel / swimmers_panel & theme_bw() &
theme(
legend.position = "bottom",
strip.text.y = element_text(angle = 0),
strip.background = element_blank(),
panel.grid.minor = element_blank()
)
```
After annotating vaccines and infection episodes, these can be combined to [annotate exposures](articles/annotate_exposures.html) - encounters with antigen from either infection or vaccination.
------------------------------------------------------------------------
### chronogram filtering, window and select
- `dplyr::filter()` to filter a chronogram based on metadata (eg vaccine formulation)
- `cg_window_by_metadata()` to window around an event such as 14 days after each participant's vaccine
- `cg_window_by_episode()` picks a window around infection episodes
See these functions at work in our [brief primer](articles/stats.html) demonstrating how to a pass `chronogram` to a variety of statistical tests.
------------------------------------------------------------------------