R/Part_4_Outcomes.qmd

---
title: "Preoperative Atelectasis"
subtitle: "Part 4: Outcomes"
author: "Javier Mancilla Galindo"
date: "`r Sys.Date()`"
execute: 
  echo: false
  warning: false
toc: true
toc_float: true
format:
  html:
    embed-resources: true
  pdf:
    documentclass: scrartcl
editor: visual
---

\pagebreak

# Setup

#### Packages used

```{r}
#| echo: true
if (!require("pacman", quietly = TRUE)) {
  install.packages("pacman")
}


pacman::p_load(
  tidyverse, # Used for basic data handling and visualization.
  table1, #Used to add labels to variables.
  RColorBrewer, #Color palettes for data visualization. 
  gridExtra, #Used to arrange multiple ggplots in a grid.  
  grid, #Used to arrange multiple ggplots in a grid.
  mgcv, #Used to model non-linear relationships with a generalized additive model.  
  ggmosaic, #Used to create mosaic plots.   
  car, #Used to assess distribution of continuous variables (stacked Q-Q plots).
  simpleboot, boot, # Used to calculate mean atelectasis coverage and 
                   # 95%CI through bootstrapping.
  gt, #Used to present tables in html format. 
  report #Used to cite packages used in this session.
)
```

##### Session and package dependencies

```{r}
# Credits chunk of code: Alex Bossers, Utrecht University (a.bossers@uu.nl)

# remove clutter
session <- sessionInfo()
session$BLAS <- NULL
session$LAPACK <- NULL
session$loadedOnly <- NULL
# write log file
writeLines(
  capture.output(print(session, locale = FALSE)),
  paste0("sessions/",lubridate::today(), "_session_Part_4.txt")
)

session
```

Set seed (for reproducibility of bootstrapping) as the current year 2023:

```{r}
#| echo: true
seed <- 2023 
```

```{r}
#| include: false

# Create directories for sub-folders 
figfolder <- "../results/output_figures"
dir.create(figfolder, showWarnings = FALSE)
```

```{r}
#| include: false

# Load dataset  
data <- read.csv("../data/processed/atelectasis_included.csv",
                 na.strings="NA", 
                 row.names = NULL)
# Recode variables 
source("scripts/variable_names.R")

```

\pagebreak

# Outcome variable

```{r}
#| include: false

attach(data)
```

Check that the binary atelectasis variable (Yes/No) agrees with atelectasis percent (equal to 0% vs. greater than 0%):

```{r}
table(atelectasis,atelectasis_percent)
```

Yes, these do match.
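
The same cross-check can be made programmatically, so that a mismatch is flagged without having to read the table by eye. A minimal sketch on a toy data frame (the `toy` object and its values are made up for illustration; only the column names mirror the dataset):

```r
# Toy data: made-up values, column names mirror the dataset used above
toy <- data.frame(
  atelectasis = c("Yes", "No", "Yes", "No"),
  atelectasis_percent = c(2.5, 0, 10, 0)
)

# TRUE when every "Yes" has a percent > 0 and every "No" has exactly 0
consistent <- all((toy$atelectasis == "Yes") == (toy$atelectasis_percent > 0))
consistent
```

Wrapping the comparison in `stopifnot()` would halt rendering if the two variables ever disagreed.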

## Prevalence of atelectasis

```{r}
frequencies <- table(atelectasis)
percent <- round((prop.table(frequencies)*100),1)
total <- rbind(frequencies,percent)
total
```

Prevalence of atelectasis with 95% confidence interval

```{r}
prev_atelectasis <- prop.test(frequencies, correct=FALSE)
confint <- round((prev_atelectasis$conf.int*100),2)
prev_atelectasis
```

> The prevalence of atelectasis was **`r percent[1]`% (95%CI: `r confint[1]`-`r confint[2]`)**.
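
With `correct = FALSE`, the interval returned by `prop.test()` is the Wilson score interval. As a sanity check, it can be reproduced by hand. The counts below are hypothetical and only illustrate the calculation:

```r
# Hypothetical counts (not study data): 30 events out of 200 patients
x <- 30
n <- 200
p <- x / n
z <- qnorm(0.975)

# Wilson score interval computed by hand
centre <- (p + z^2 / (2 * n)) / (1 + z^2 / n)
half   <- z * sqrt(p * (1 - p) / n + z^2 / (4 * n^2)) / (1 + z^2 / n)
wilson_manual <- c(centre - half, centre + half)

# Same interval from prop.test() without continuity correction
wilson_proptest <- as.numeric(prop.test(x, n, correct = FALSE)$conf.int)

all.equal(wilson_manual, wilson_proptest)
```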

```{r}
#| include: false
rm(frequencies, percent, prev_atelectasis, total, confint)
```

## Atelectasis - obesity class

Mean expected frequency:

```{r}
mean_exp <- data %>% 
  drop_na(type_obesity, atelectasis) %>% 
  summarize(mean_expected_freq = n()/(nlevels(type_obesity)*nlevels(atelectasis)))

mean_exp
```

Frequencies:

```{r}
frequencies <- table(type_obesity, atelectasis)
frequencies
```

Percentage:

```{r}
round(prop.table(frequencies,1),4)*100
```
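
The `margin` argument of `prop.table()` decides which totals the percentages refer to: `1` gives row percentages (as above, the share of each obesity class with and without atelectasis) and `2` gives column percentages. A small sketch with a made-up table:

```r
# Made-up 3x2 table (not study data)
tab <- matrix(c(40, 10,
                30, 20,
                20, 30),
              nrow = 3, byrow = TRUE,
              dimnames = list(class = c("Class 1", "Class 2", "Class 3"),
                              atelectasis = c("No", "Yes")))

row_pct <- prop.table(tab, margin = 1) * 100  # each row sums to 100
col_pct <- prop.table(tab, margin = 2) * 100  # each column sums to 100

rowSums(row_pct)
colSums(col_pct)
```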

Mosaic Plot

```{r, fig.height=3, fig.width=5}
data %>% 
  mutate(atelectasis = fct_relevel(atelectasis, "No", "Yes")) %>%
ggplot() +
  geom_mosaic(
    aes(x = product(atelectasis,type_obesity), 
        fill=atelectasis),
    na.rm = TRUE
    ) +
  scale_fill_manual(values=c("grey95","lightsteelblue4")) +
  labs(
    y = "Atelectasis",
    x = "Obesity class"
  ) +
  theme_mosaic() + 
  theme(axis.text.x=element_text(size=rel(0.8)))
```

```{r}
chi <- chisq.test(frequencies, correct=FALSE)
chi
```

```{r}
#| include: false
rm(mean_exp,frequencies, chi)
```

#### Atelectasis location by obesity class

Mean expected frequency:

```{r}
mean_exp <- data %>% 
  drop_na(type_obesity, atelectasis_location) %>% 
  summarize(mean_expected_freq = n()/(nlevels(type_obesity)*nlevels(atelectasis_location)))

mean_exp
```

Mean expected frequency is greater than 5.0, so chi-squared without continuity correction is adequate.
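
The mean expected frequency used here is n divided by the number of cells, which equals the average of the cell-wise expected counts that `chisq.test()` stores in `$expected`. Individual cells can still fall below that average, so it may be worth looking at the smallest one as well. A sketch with a hypothetical table:

```r
# Hypothetical 3x2 table (not study data)
tab <- matrix(c(12,  8,
                 9, 11,
                 5, 15),
              nrow = 3, byrow = TRUE)

mean_expected <- sum(tab) / (nrow(tab) * ncol(tab))   # n / (rows * columns)
expected <- chisq.test(tab, correct = FALSE)$expected  # cell-wise expected counts

mean_expected
mean(expected)   # same value as mean_expected
min(expected)    # smallest expected cell count
```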

Frequencies:

```{r}
frequencies <- table(type_obesity, atelectasis_location)
frequencies
```

Percentage:

```{r}
round(prop.table(frequencies,1),4)*100
```

Mosaic Plot

```{r, fig.height=3, fig.width=6}
ggplot(data = data) +
  geom_mosaic(
    aes(x = product(type_obesity,atelectasis_location),
        fill=type_obesity),
    na.rm = TRUE
    ) +
  scale_fill_manual(values=c("seagreen1","seagreen3","seagreen4")) +
  labs(
    y = "Obesity class",
    x = "Atelectasis location"
  ) +
  theme_mosaic() + guides(fill="none")
```

```{r}
#| warning: false
chi <- chisq.test(frequencies, correct=FALSE)
chi
```

Prevalence of atelectasis with 95% confidence intervals calculated with sourced script ***Prevalence_atelectasis.R***

```{r}
#| include: false

source("scripts/Prevalence_atelectasis.R", local = knitr::knit_global())
```

> The prevalence of atelectasis was greater in higher obesity classes: class 1, n=`r atelectasis$n[3]` (`r atelectasis$prev[3]`%, 95%CI:`r atelectasis$confint[3]`); class 2, n=`r atelectasis$n[5]` (`r atelectasis$prev[5]`%, 95%CI:`r atelectasis$confint[5]`); and class 3, n=`r atelectasis$n[7]` (`r atelectasis$prev[7]`%, 95%CI:`r atelectasis$confint[7]`) (p\<0.001).

> Among patients with atelectasis, the most frequent presentation was right lung base predominance (n=`r atelectasis$Unilateral[1]`), compared to bilateral lung bases (n=`r atelectasis$Bilateral[1]`). By obesity class, the observed distribution did not differ significantly across class 1, 2, and 3 obesity (n=`r atelectasis$Unilateral[3]`, n=`r atelectasis$Unilateral[5]`, and n=`r atelectasis$Unilateral[7]`, respectively) (p=`r round(chi$p.value,3)`).

```{r}
#| include: false 
rm(mean_exp, frequencies, chi, location, atelectasis_obesity, atelectasis_total, atelectasis_total_location, atelectasis_obesity_location, atelectasis)
```

#### Atelectasis Percent

```{r}
#| include: false 
## Change 'false' for 'true' above to show plot. 

# Assess distribution if assumed to be a numeric variable:   
data %>% 
  mutate(atelectasis_percent = as.numeric(atelectasis_percent)) %>%
  ggplot(aes(x = atelectasis_percent)) +
  geom_histogram(colour = "black") +
  facet_grid(type_obesity ~ .)

```

```{r}
#| include: false 
## Change 'false' for 'true' above to show plot. 

# Distribution excluding 0:  
data %>% 
  mutate(atelectasis_percent = as.numeric(atelectasis_percent)) %>% 
  filter(atelectasis_percent >0) %>% 
  ggplot(aes(x = atelectasis_percent)) +
  geom_histogram(colour = "black") +
  facet_grid(type_obesity ~ .)

```

##### Mean atelectasis percentage

The following would be the mean atelectasis percentage coverage if a normal distribution were assumed, which is what has been done in some prior studies:

```{r}
data %>%  summarize(
    mean = mean(atelectasis_percent),
    sd = sd(atelectasis_percent)
  ) 
```

And by obesity class:

```{r}
data %>% group_by(type_obesity) %>%
  summarize(
    mean = mean(atelectasis_percent),
    sd = sd(atelectasis_percent)
  ) 
```

As is evident from these numbers, the standard deviation exceeds the mean, so an interval of mean ± SD under a normality assumption would extend into negative values, which are impossible for this variable.

Thus, bootstrapping the mean and 95%CI is expected to lead to more appropriate estimates.
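
To make the problem concrete, here is a sketch with a made-up, zero-inflated vector that loosely imitates the shape of this variable (the values are invented, not study data). The lower bound of mean ± SD drops below zero although no observation can:

```r
# Made-up zero-inflated values (not study data): most patients at 0%
x <- c(rep(0, 70), rep(2.5, 10), rep(5, 8), rep(7.5, 6), rep(10, 4), 15, 27.5)

m <- mean(x)
s <- sd(x)

# Lower limits implied by a normal-theory summary
c(mean = m, sd = s, mean_minus_sd = m - s, mean_minus_1.96sd = m - 1.96 * s)
min(x)  # the smallest possible value is 0
```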

Mean by bootstrapping for the total sample:

```{r}
set.seed(seed)
boot_atel <- one.boot(data$atelectasis_percent, mean, R=10000)
mean_boot <- mean(boot_atel$t)
mean_boot
```

Bootstrap 95% confidence intervals:

```{r}
#| warning: false
boot_ci <- boot.ci(boot_atel)
boot_ci
```

The bias-corrected and accelerated (BCa) bootstrap interval generally gives more stable intervals with better coverage, so this is the one I will report. Reassuringly, the 95%CI obtained through the different methods do not differ widely here.
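
For readers who want to see the percentile and BCa intervals side by side, a minimal sketch follows. It calls `boot::boot()` directly instead of `simpleboot::one.boot()` (a swap made only to keep the example dependent on a single package) and uses made-up values, not study data:

```r
library(boot)

# Made-up zero-inflated values (not study data)
x <- c(rep(0, 70), rep(2.5, 10), rep(5, 8), rep(7.5, 6), rep(10, 4), 15, 27.5)

set.seed(2023)
# The statistic receives the data and the resampled indices
b <- boot(x, statistic = function(d, i) mean(d[i]), R = 2000)

ci <- boot.ci(b, type = c("perc", "bca"))
ci$percent[1, 4:5]  # percentile interval: lower, upper
ci$bca[1, 4:5]      # BCa interval: lower, upper
```

Both intervals stay within the possible range of the variable, since every resampled mean is non-negative.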

Now, I will calculate this for each obesity class:

```{r}
data_class_1 <- data %>% filter(type_obesity=="Class 1 Obesity") 
data_class_2 <- data %>% filter(type_obesity=="Class 2 Obesity") 
data_class_3 <- data %>% filter(type_obesity=="Class 3 Obesity") 
```

###### Class 1

Mean:

```{r}
set.seed(seed)
boot_class1 <- one.boot(data_class_1$atelectasis_percent, mean, R=10000)
mean_boot_class1 <- mean(boot_class1$t)
mean_boot_class1
```

95% CI:

```{r}
#| warning: false
boot_ci_class1 <- boot.ci(boot_class1)
boot_ci_class1
```

###### Class 2

Mean:

```{r}
set.seed(seed)
boot_class2 <- one.boot(data_class_2$atelectasis_percent, mean, R=10000)
mean_boot_class2 <- mean(boot_class2$t)
mean_boot_class2
```

95% CI:

```{r}
#| warning: false
boot_ci_class2 <- boot.ci(boot_class2)
boot_ci_class2
```

###### Class 3

Mean:

```{r}
set.seed(seed)
boot_class3 <- one.boot(data_class_3$atelectasis_percent, mean, R=10000)
mean_boot_class3 <- mean(boot_class3$t)
mean_boot_class3
```

95% CI:

```{r}
#| warning: false
boot_ci_class3 <- boot.ci(boot_class3)
boot_ci_class3
```

> The mean atelectasis percentage coverage in the sample was `r round(mean_boot,2)`% (95%CI:`r round(boot_ci$bca[1,4],2)`-`r round(boot_ci$bca[1,5],2)`) and according to obesity categories: class 1 (`r round(mean_boot_class1,2)`%, 95%CI:`r round(boot_ci_class1$bca[1,4],2)`-`r round(boot_ci_class1$bca[1,5],2)`), class 2 (`r round(mean_boot_class2,2)`%, 95%CI:`r round(boot_ci_class2$bca[1,4],2)`-`r round(boot_ci_class2$bca[1,5],2)`), and class 3 (`r round(mean_boot_class3,2)`%, 95%CI:`r round(boot_ci_class3$bca[1,4],2)`-`r round(boot_ci_class3$bca[1,5],2)`).

What would happen if I took random subsamples of n=20 to n=30, similar to the sample sizes used in other studies, and calculated mean BMI and mean atelectasis percentage assuming normal distributions?

Random sample

```{r}
set.seed(seed)
random_sample_1 <- sample_n(data,20)

random_sample_1 %>% summarize(
  mean_BMI =  mean(BMI),
  mean_atelectasis = mean(atelectasis_percent)
  ) 
```

```{r}
set.seed(seed)
random_sample_2 <- sample_n(data,25)

random_sample_2 %>% summarize(
  mean_BMI =  mean(BMI),
  mean_atelectasis = mean(atelectasis_percent)
  ) 
```

```{r}
set.seed(seed)
random_sample_3 <- sample_n(data,30)

random_sample_3 %>% summarize(
  mean_BMI =  mean(BMI),
  mean_atelectasis = mean(atelectasis_percent)
  ) 
```
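
Three draws give only a glimpse of the sampling variability. One way to see the full spread is to repeat the subsampling many times and look at the distribution of the resulting means. The sketch below does this on a made-up vector (not study data); the number of repetitions is an arbitrary choice:

```r
# Made-up zero-inflated values (not study data)
x <- c(rep(0, 70), rep(2.5, 10), rep(5, 8), rep(7.5, 6), rep(10, 4), 15, 27.5)

set.seed(2023)
# Mean of 1000 random subsamples of n = 20, drawn without replacement
means_n20 <- replicate(1000, mean(sample(x, 20)))

range(means_n20)
quantile(means_n20, c(0.025, 0.975))
```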

```{r}
#| include: false
rm(list=setdiff(ls(pattern = "boot"), lsf.str()))
rm(list=setdiff(ls(pattern = "class"), lsf.str()))
rm(list=setdiff(ls(pattern = "sample"), lsf.str()))
```

##### Atelectasis percentage by obesity class

Now, I will continue assessing atelectasis percentage if assumed to be categorical ordinal:

Mean expected frequency:

```{r}
mean_exp <- data %>% 
  mutate(
    atelectasis_percent=factor(atelectasis_percent)) %>%
  summarize(
    mean_expected_freq = n()/(nlevels(type_obesity)*nlevels(atelectasis_percent))
    )
mean_exp
```

Mean expected frequency is greater than 5.0, so chi-squared without continuity correction is adequate.

Frequencies:

```{r}
frequencies <- table(atelectasis_percent,type_obesity)
frequencies
```

Percentage by obesity class

```{r}
prop_fig2a <- prop.table(frequencies,margin=2)
round(prop_fig2a*100,2)
```

Barplot of absolute frequencies:

```{r}
barplot(frequencies,beside=TRUE)
```

```{r}
#| warning: false
chi <- chisq.test(frequencies, correct=FALSE)
chi
```

##### Barplot of atelectasis percentage by obesity class category

```{r, fig.height=5, fig.width=8}
barplot(prop_fig2a,beside=TRUE,ylim=c(0,1),ylab="Relative frequency",
        col=brewer.pal(9,"Blues"),
        legend.text=c("0%","2.5%","5%","7.5%","10%","12.5%","15%","17.5%","27.5%"),
        space = c(0.2, 1.5)
        )
Figure2a <- recordPlot()
```

```{r}
#| include: false
rm(mean_exp,frequencies,percent,chi,Figure2a,prop_fig2a)
```

##### Smooth term?

```{r}
plot(atelectasis_percent~BMI, 
     main="Scatterplot", 
     xlab="Body mass index (kg/m²)", 
     ylab="Atelectasis percent (%)"
     )
```

Atelectasis percent seems to increase as BMI increases. However, the relationship does not appear to be linear.

```{r}
#| include: false 
## Change 'false' for 'true' above to show plot. 

# Would a smooth term be more useful to model atel_percent? Fit with loess:   
ggplot(data, aes(BMI,atelectasis_percent)) + 
  geom_point(size=0.6,color="gray40") + 
  geom_smooth(method="loess", color="darkblue") +
  ylab("Atelectasis percent (%)") + 
  xlab("Body mass index (kg/m²)")  +
  theme_bw() + 
  theme(panel.border = element_blank(),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(), 
        axis.line = element_line(colour = "black"),
        axis.text.x = element_text(size=rel(1.2)),
        axis.text.y = element_text(size=rel(1.2))
        )
```

Models evaluated with the accompanying sourced script ***nonlinear_BMI_Atelectasis.R***

```{r}
#| include: false

source("scripts/nonlinear_BMI_Atelectasis.R", local = knitr::knit_global())
```

All models are significantly better than linear. Thus, using a smooth term for BMI to predict atelectasis percent is better than modelling a linear relationship.
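
The sourced script is not shown in this document, so the following is only a sketch of what a smooth-versus-linear comparison with `mgcv` can look like. The data are simulated (flat, then rising after a threshold) and the variable names `bmi` and `atel` are placeholders:

```r
library(mgcv)

set.seed(2023)
bmi  <- runif(300, 30, 60)                          # simulated predictor
atel <- 0.5 * pmax(0, bmi - 42) + rnorm(300)        # flat, then rising after 42

m_lin    <- gam(atel ~ bmi)             # linear term only
m_smooth <- gam(atel ~ s(bmi, k = 6))   # smooth term with basis dimension k = 6

AIC(m_lin, m_smooth)                    # lower AIC is better
anova(m_lin, m_smooth, test = "F")      # approximate test of smooth vs linear
```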

Best AIC:

```{r}
data.frame(
  model = c("k=2","k=4","k=6","k=8"),
  AIC = round(c(AIC_k2,AIC_k4,AIC_k6,AIC_k8),1)
) %>% gt()
```

Regarding AIC, the greatest improvement is at k=6. Will also fit models with k=5 and k=7 to compare.

Best AIC:

```{r}
data.frame(
  model = c("k=5","k=6","k=7"),
  AIC = round(c(AIC_k5,AIC_k6,AIC_k7),1)
) %>% gt()
```

k=6 offers the lowest AIC. Will keep k=6 to model.
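
A compact way to tabulate AIC over a range of basis dimensions is to loop over `k`. This is a sketch on simulated data with placeholder names, not the code of the sourced script:

```r
library(mgcv)

set.seed(2023)
bmi  <- runif(300, 30, 60)                      # simulated predictor
atel <- 0.5 * pmax(0, bmi - 42) + rnorm(300)    # simulated outcome

ks <- 3:8
# Fit one GAM per basis dimension and collect its AIC
aic_by_k <- sapply(ks, function(k) AIC(gam(atel ~ s(bmi, k = k))))

data.frame(k = ks, AIC = round(aic_by_k, 1))
best_k <- ks[which.min(aic_by_k)]
best_k
```

The lowest AIC is a starting point only; the fitted curve should still be inspected visually before settling on a value of `k`.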

```{r}
fig1a
```

Positive non-monotonic relationship, since atelectasis percent increases with BMI only above a BMI of \~42.

Will assess Spearman's correlation again only to have a rough idea (will not report this in the paper since the relationship is not monotonic):

```{r}
spearman <- cor.test(BMI,atelectasis_percent,
                     method="spearman",
                     exact=FALSE
                     )
spearman
```

> Atelectasis percent exhibited a positive non-linear non-monotonic relationship with BMI (**Figure 1A**, rho= `r round(spearman$estimate,3)`, p\<0.001).

Note that this p-value refers to the smooth term vs linear as assessed in GAM models.
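
Spearman's rho summarises monotonic association only, which is why it is a rough guide at best when the relationship is non-monotonic. A toy illustration with invented numbers:

```r
x <- -10:10

y_mono <- exp(x / 5)  # strictly increasing in x, but not linear
y_u    <- x^2         # U-shaped: perfectly determined by x, yet non-monotonic

rho_mono <- cor(x, y_mono, method = "spearman")
rho_u    <- cor(x, y_u,    method = "spearman")

c(monotonic = rho_mono, u_shaped = rho_u)
```

The monotonic curve gives a rho of 1 despite being non-linear, whereas the U-shaped curve gives a rho of essentially 0 despite a perfect functional relationship.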

Interestingly, this figure is almost a mirror image of the previously created plot for SpO2 \~ BMI.

```{r}
#| include: false
rm(AIC_k2,AIC_k4,AIC_k5,AIC_k6,AIC_k7,AIC_k8,model,spearman,fig1a)
```

## Atelectasis - age

```{r}
boxplot(age~atelectasis,
        ylab="Age",
        xlab="Atelectasis"
        )
```

Assess distribution of age by atelectasis (yes/no):

```{r}
data_age <- data %>% group_by(atelectasis)
```

```{r}
#| include: false
## Change 'false' for 'true' above to show plot. 

#Assess distribution of age by atelectasis (yes/no):   
ggplot(data_age,aes(x = age)) +
  geom_histogram(fill = "lightsteelblue4", colour = "black") +
  facet_grid(atelectasis ~ .)
```

```{r}
#| include: false 
## Change 'false' for 'true' above to show plot. 

qqPlot(age ~ atelectasis, data=data_age)
```

Distribution near-normal, will assess mean and variance for further testing:

```{r}
mean_age <- data_age %>% 
  summarise(n=n(),
            age_mean = round(mean(age),2),
            sd = round(sd(age),2),
            variance = round(var(age),4)
            )
mean_age %>% gt()
```

Variances near-similar, but group sizes differ. Welch's t-test more suitable:

```{r}
t_test <- t.test(age ~ atelectasis, data = data_age)
t_test
```

> Age was similarly distributed among patients without atelectasis (`r round(mean_age$age_mean[2],1)`, sd:`r round(mean_age$sd[2],1)`) and those with atelectasis (`r round(mean_age$age_mean[1],1)`, sd:`r round(mean_age$sd[1],1)`) (p=`r round(t_test$p.value,3)`).
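
In R, `t.test()` performs Welch's test by default (`var.equal = FALSE`), so the call above needs no extra argument. The sketch below, on simulated ages with unequal group sizes, contrasts it with the pooled-variance version:

```r
set.seed(2023)
# Simulated ages (not study data): 40 patients in one group, 200 in the other
age <- c(rnorm(40, mean = 45, sd = 10), rnorm(200, mean = 43, sd = 11))
grp <- factor(rep(c("Yes", "No"), times = c(40, 200)))

welch  <- t.test(age ~ grp)                    # default: Welch's t-test
pooled <- t.test(age ~ grp, var.equal = TRUE)  # classical Student's t-test

welch$method
c(welch_df = unname(welch$parameter), pooled_df = unname(pooled$parameter))
```

Welch's version adjusts the degrees of freedom (generally to a non-integer value), whereas the pooled test uses n - 2.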

```{r}
#| include: false 
rm(data_age,mean_age,t_test)
```

## Atelectasis - sex

Mean expected frequency:

```{r}
mean_exp <- data %>% 
  drop_na(sex, atelectasis) %>% 
  summarize(mean_expected_freq = n()/(nlevels(sex)*nlevels(atelectasis)))

mean_exp
```

Mean expected frequency is greater than 5.0, so chi-squared without continuity correction is adequate.

Frequencies:

```{r}
frequencies <- table(sex, atelectasis)
frequencies
```

Percentage:

```{r}
percent <- round((prop.table(frequencies, 1)*100),2)
percent
```

Mosaic Plot

```{r, fig.height=3, fig.width=5}
data %>% 
  mutate(atelectasis = fct_relevel(atelectasis, "No", "Yes")) %>%
ggplot() +
  geom_mosaic(
    aes(x = product(atelectasis,sex), 
        fill=atelectasis),
    na.rm = TRUE
    ) +
  scale_fill_manual(values=c("grey95","lightsteelblue4")) +
  labs(
    y = "Atelectasis",
    x = "Sex"
  ) +
  theme_mosaic() + 
  theme(axis.text.x=element_text(size=rel(0.8)))
```

```{r}
chi <- chisq.test(frequencies, correct=FALSE)
chi
```

> There were no significant differences in atelectasis occurrence between men (`r round(percent[2,1],1)`%) and women (`r round(percent[1,1],1)`%) (p=`r round(chi$p.value,3)`).
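
The `correct` argument of `chisq.test()` only matters for 2 by 2 tables such as this one, where it toggles Yates' continuity correction. A sketch with hypothetical counts shows the size of the difference:

```r
# Hypothetical 2x2 table (not study data)
tab <- matrix(c(30, 70,
                45, 55),
              nrow = 2, byrow = TRUE,
              dimnames = list(group = c("A", "B"),
                              outcome = c("Yes", "No")))

uncorrected <- chisq.test(tab, correct = FALSE)
yates       <- chisq.test(tab, correct = TRUE)

c(uncorrected = unname(uncorrected$statistic),
  yates       = unname(yates$statistic))
c(uncorrected = uncorrected$p.value, yates = yates$p.value)
```

The corrected statistic is smaller, and its p-value larger, which is why the correction is usually reserved for tables with small expected counts.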

```{r}
#| include: false 
rm(mean_exp,frequencies,percent,chi)
```

## Atelectasis - OSA

Mean expected frequency:

```{r}
mean_exp <- data %>% 
  drop_na(sleep_apnea, atelectasis) %>% 
  summarize(mean_expected_freq = n()/(nlevels(sleep_apnea)*nlevels(atelectasis)))

mean_exp
```

Mean expected frequency is greater than 5.0, so chi-squared without continuity correction is adequate.

Frequencies:

```{r}
frequencies <- table(sleep_apnea, atelectasis)
frequencies
```

Percentage:

```{r}
percent <- round((prop.table(frequencies, 1)*100),2)
percent
```

Mosaic Plot

```{r, fig.height=3, fig.width=5}
data %>% 
  mutate(atelectasis = fct_relevel(atelectasis, "No", "Yes")) %>%
ggplot() +
  geom_mosaic(
    aes(x = product(atelectasis,sleep_apnea), 
        fill=atelectasis),
    na.rm = TRUE
    ) +
  scale_fill_manual(values=c("grey95","lightsteelblue4")) +
  labs(
    y = "Atelectasis",
    x = "Obstructive sleep apnea"
  ) +
  theme_mosaic() + 
  theme(axis.text.x=element_text(size=rel(0.8)))
```

```{r}
chi <- chisq.test(frequencies, correct=FALSE)
chi
```

> Patients with a diagnosis of obstructive sleep apnea had atelectasis more frequently (`r round(percent[2,1],1)`%) than those without the diagnosis (`r round(percent[1,1],1)`%) (p\<0.001).

```{r}
#| include: false 
rm(mean_exp,frequencies, percent, chi)
```

#### Atelectasis location by OSA

Mean expected frequency:

```{r}
mean_exp <- data %>% 
  drop_na(sleep_apnea, atelectasis_location) %>% 
  summarize(mean_expected_freq = n()/(nlevels(sleep_apnea)*nlevels(atelectasis_location)))

mean_exp
```

Mean expected frequency is greater than 5.0, so chi-squared without continuity correction is adequate.

Frequencies:

```{r}
frequencies <- table(sleep_apnea, atelectasis_location)
frequencies
```

Percentage:

```{r}
percent <- round((prop.table(frequencies, 1)*100),2)
percent
```

Mosaic Plot

```{r, fig.height=3, fig.width=5}
ggplot(data = data) +
  geom_mosaic(aes(
    x = product(atelectasis_location,sleep_apnea),
    fill=atelectasis_location),
    na.rm = TRUE
    ) +
  scale_fill_manual(values=c("seagreen1","seagreen4")) +
  labs(
    y = "Atelectasis location",
    x = "Obstructive sleep apnea"
  ) +
  theme_mosaic() + guides(fill="none")
```

```{r}
chi <- chisq.test(frequencies, correct=FALSE)
chi
```

> The location of atelectasis was not different among patients with and without OSA (p=`r round(chi$p.value,3)`).

```{r}
#| include: false 
rm(mean_exp,frequencies, percent, chi)
```

## Atelectasis - Asthma

Mean expected frequency:

```{r}
mean_exp <- data %>% 
  drop_na(asthma, atelectasis) %>% 
  summarize(mean_expected_freq = n()/(nlevels(asthma)*nlevels(atelectasis)))

mean_exp
```

Mean expected frequency is greater than 5.0, so chi-squared without continuity correction is adequate.

Frequencies:

```{r}
frequencies <- table(asthma, atelectasis)
frequencies
```

Percentage:

```{r}
percent <- round((prop.table(frequencies, 1)*100),2)
percent
```

Mosaic Plot

```{r, fig.height=3, fig.width=5}
data %>% 
  mutate(atelectasis = fct_relevel(atelectasis, "No", "Yes")) %>%
ggplot() +
  geom_mosaic(
    aes(x = product(atelectasis,asthma), 
        fill=atelectasis),
    na.rm = TRUE
    ) +
  scale_fill_manual(values=c("grey95","lightsteelblue4")) +
  labs(
    y = "Atelectasis",
    x = "Asthma"
  ) +
  theme_mosaic() + 
  theme(axis.text.x=element_text(size=rel(0.8)))
```

```{r}
chi <- chisq.test(frequencies, correct=FALSE)
chi
```

> Patients with a diagnosis of asthma did not have atelectasis more frequently (`r round(percent[2,1],1)`%) than those without the diagnosis (`r round(percent[1,1],1)`%) (p=`r round(chi$p.value,3)`).

```{r}
#| include: false 
rm(mean_exp,frequencies, percent, chi)
```

#### Atelectasis location by asthma status

Mean expected frequency:

```{r}
mean_exp <- data %>% 
  drop_na(asthma, atelectasis_location) %>% 
  summarize(mean_expected_freq = n()/(nlevels(asthma)*nlevels(atelectasis_location)))

mean_exp
```

Mean expected frequency is greater than 5.0, so chi-squared without continuity correction is adequate.

Frequencies:

```{r}
frequencies <- table(asthma, atelectasis_location)
frequencies
```

Percentage:

```{r}
percent <- round((prop.table(frequencies, 1)*100),2)
percent
```

Mosaic Plot

```{r, fig.height=3, fig.width=5}
ggplot(data = data) +
  geom_mosaic(aes(
    x = product(atelectasis_location,asthma),
    fill=atelectasis_location),
    na.rm = TRUE
    ) +
  scale_fill_manual(values=c("seagreen1","seagreen4")) +
  labs(
    y = "Atelectasis location",
    x = "Asthma"
  ) +
  theme_mosaic() + guides(fill="none")
```

```{r}
chi <- chisq.test(frequencies, correct=FALSE)
chi
```

> The location of atelectasis was not different among patients with and without asthma (p=`r round(chi$p.value,3)`).

```{r}
#| include: false 
rm(mean_exp,frequencies, percent, chi)
```

## Atelectasis - COPD

Mean expected frequency:

```{r}
mean_exp <- data %>% 
  drop_na(COPD, atelectasis) %>% 
  summarize(mean_expected_freq = n()/(nlevels(COPD)*nlevels(atelectasis)))

mean_exp
```

Mean expected frequency is greater than 5.0, so chi-squared without continuity correction is adequate.

Frequencies:

```{r}
frequencies <- table(COPD, atelectasis)
frequencies
```

Percentage:

```{r}
percent <- round((prop.table(frequencies, 1)*100),2)
percent
```

Mosaic Plot

```{r, fig.height=3, fig.width=5}
data %>% 
  mutate(atelectasis = fct_relevel(atelectasis, "No", "Yes")) %>%
ggplot() +
  geom_mosaic(
    aes(x = product(atelectasis,COPD), 
        fill=atelectasis),
    na.rm = TRUE
    ) +
  scale_fill_manual(values=c("grey95","lightsteelblue4")) +
  labs(
    y = "Atelectasis",
    x = "COPD"
  ) +
  theme_mosaic() + 
  theme(axis.text.x=element_text(size=rel(0.8)))
```

```{r}
chi <- chisq.test(frequencies, correct=FALSE)
chi
```

> Patients with a diagnosis of COPD had atelectasis more frequently (`r round(percent[2,1],1)`%) than those without the diagnosis (`r round(percent[1,1],1)`%) (p\<0.001).

```{r}
#| include: false 
rm(mean_exp,frequencies, percent, chi)
```

#### Atelectasis location by COPD

Mean expected frequency:

```{r}
mean_exp <- data %>% 
  drop_na(COPD, atelectasis_location) %>% 
  summarize(mean_expected_freq = n()/(nlevels(COPD)*nlevels(atelectasis_location)))

mean_exp
```

Mean expected frequency is greater than 5.0, so chi-squared without continuity correction is adequate.

Frequencies:

```{r}
frequencies <- table(COPD, atelectasis_location)
frequencies
```

Percentage:

```{r}
percent <- round((prop.table(frequencies, 1)*100),2)
percent
```

Mosaic Plot

```{r, fig.height=3, fig.width=5}
ggplot(data = data) +
  geom_mosaic(aes(
    x = product(atelectasis_location,COPD),
    fill=atelectasis_location),
    na.rm = TRUE
    ) +
  scale_fill_manual(values=c("seagreen1","seagreen4")) +
  labs(
    y = "Atelectasis location",
    x = "COPD"
  ) +
  theme_mosaic() + guides(fill="none")
```

```{r}
chi <- chisq.test(frequencies, correct=FALSE)
chi
```

> The location of atelectasis was not different among patients with and without COPD (p=`r round(chi$p.value,3)`).

```{r}
#| include: false 
rm(mean_exp,frequencies, percent, chi)
```

## Atelectasis - Oxygen use

Mean expected frequency:

```{r}
mean_exp <- data %>% 
  drop_na(oxygen_use, atelectasis) %>% 
  summarize(mean_expected_freq = n()/(nlevels(oxygen_use)*nlevels(atelectasis)))

mean_exp
```

Mean expected frequency is greater than 5.0, so chi-squared without continuity correction is adequate.

Frequencies:

```{r}
frequencies <- table(oxygen_use, atelectasis)
frequencies
```

Percentage:

```{r}
percent <- round((prop.table(frequencies, 1)*100),2)
percent
```

Mosaic Plot

```{r, fig.height=3, fig.width=5}
data %>% 
  mutate(atelectasis = fct_relevel(atelectasis, "No", "Yes")) %>%
ggplot() +
  geom_mosaic(
    aes(x = product(atelectasis,oxygen_use), 
        fill=atelectasis),
    na.rm = TRUE
    ) +
  scale_fill_manual(values=c("grey95","lightsteelblue4")) +
  labs(
    y = "Atelectasis",
    x = "Oxygen use at home"
  ) +
  theme_mosaic() + 
  theme(axis.text.x=element_text(size=rel(0.8)))
```

```{r}
chi <- chisq.test(frequencies, correct=FALSE)
chi
```

> Patients who used oxygen at home had atelectasis more frequently (`r round(percent[2,1],1)`%) than those who did not (`r round(percent[1,1],1)`%) (p\<0.001).

```{r}
#| include: false 
rm(mean_exp,frequencies, percent, chi)
```

#### Atelectasis location by oxygen use

Mean expected frequency:

```{r}
mean_exp <- data %>% 
  drop_na(oxygen_use, atelectasis_location) %>% 
  summarize(mean_expected_freq = n()/(nlevels(oxygen_use)*nlevels(atelectasis_location)))

mean_exp
```

Mean expected frequency is greater than 5.0, so chi-squared without continuity correction is adequate.

Frequencies:

```{r}
frequencies <- table(oxygen_use, atelectasis_location)
frequencies
```

Percentage:

```{r}
percent <- round((prop.table(frequencies, 1)*100),2)
percent
```

Mosaic Plot

```{r, fig.height=3, fig.width=5}
ggplot(data = data) +
  geom_mosaic(aes(
    x = product(atelectasis_location,oxygen_use),
    fill=atelectasis_location),
    na.rm = TRUE
    ) +
  scale_fill_manual(values=c("seagreen1","seagreen4")) +
  labs(
    y = "Atelectasis location",
    x = "Oxygen use at home"
  ) +
  theme_mosaic() + guides(fill="none")
```

```{r}
chi <- chisq.test(frequencies, correct=FALSE)
chi
```

> The location of atelectasis was not different according to oxygen use status (p=`r round(chi$p.value,3)`).

```{r}
#| include: false 
rm(mean_exp,frequencies, percent, chi)
```

## Atelectasis - SpO2

```{r}
data_spo2 <- data %>% group_by(atelectasis) 

median_spo2 <- data_spo2 %>% summarize(n = n(),
                                       spo2_median = median(spo2_VPO), 
                                       Q1 = quantile(spo2_VPO,0.25), 
                                       Q3 = quantile(spo2_VPO,0.75), 
                                       min = min(spo2_VPO), 
                                       max = max(spo2_VPO)
                                       )
median_spo2 %>% gt()
```

```{r}
boxplot(spo2_VPO ~ atelectasis,
        ylab="SpO2 (%)",
        xlab="Atelectasis"
        )
```

Distribution not normal and influential outliers. Will assess non-parametrically.

```{r}
wil <- wilcox.test(spo2_VPO ~ atelectasis, 
                   data = data_spo2, 
                   exact = FALSE
                   )
wil
```

> The median SpO2 was significantly lower in patients with atelectasis (`r round(median_spo2$spo2_median[1],1)`, IQR: `r round(median_spo2$Q1[1],1)`-`r round(median_spo2$Q3[1],1)`) compared to those without (`r round(median_spo2$spo2_median[2],1)`, IQR: `r round(median_spo2$Q1[2],1)`-`r round(median_spo2$Q3[2],1)`) (p\<0.001).
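
SpO2 is recorded in whole percentage points, so ties are unavoidable and an exact p-value cannot be computed. Setting `exact = FALSE` requests the normal approximation explicitly. A sketch on simulated, heavily tied values (not study data):

```r
set.seed(2023)
# Simulated integer SpO2 values with many ties
spo2_yes <- sample(88:96, 40, replace = TRUE)
spo2_no  <- sample(92:99, 120, replace = TRUE)

spo2 <- c(spo2_yes, spo2_no)
grp  <- factor(rep(c("Yes", "No"), times = c(40, 120)))

# Normal approximation with continuity correction; ties are handled
wil_toy <- wilcox.test(spo2 ~ grp, exact = FALSE)
wil_toy$p.value
```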

```{r}
#| include: false 
rm(data_spo2,median_spo2,wil)
```

## Atelectasis location - SpO2

```{r}
data_spo2 <- data %>% 
  group_by(atelectasis_location) %>% 
  drop_na(atelectasis_location)

median_spo2 <- data_spo2 %>% summarize(n = n(),
                                       spo2_median = median(spo2_VPO), 
                                       Q1 = quantile(spo2_VPO,0.25), 
                                       Q3 = quantile(spo2_VPO,0.75), 
                                       min = min(spo2_VPO), 
                                       max = max(spo2_VPO)
                                       )
median_spo2 %>% gt()
```

```{r}
boxplot(spo2_VPO ~ atelectasis_location,
        ylab="SpO2 (%)",
        xlab="Atelectasis location"
        )
```

Distribution not normal and likely influential outliers. Will assess non-parametrically.

```{r}
wil <- wilcox.test(spo2_VPO ~ atelectasis_location, 
                   data = data_spo2, 
                   exact = FALSE
                   )
wil
```

> The median SpO2 was significantly lower in patients with bilateral atelectasis (`r round(median_spo2$spo2_median[2],1)`, IQR: `r round(median_spo2$Q1[2],1)`-`r round(median_spo2$Q3[2],1)`) compared to those with unilateral atelectasis (`r round(median_spo2$spo2_median[1],1)`, IQR: `r round(median_spo2$Q1[1],1)`-`r round(median_spo2$Q3[1],1)`) (p=`r round(wil$p.value,3)`).

```{r}
#| include: false 
rm(data_spo2,median_spo2,wil)
```

## Atelectasis percent - SpO2

### Smooth term?

Scatterplot

```{r}
plot(spo2_VPO~atelectasis_percent, data=data, 
     main="Scatterplot", 
     xlab="Atelectasis percent (%)", 
     ylab="SpO2 (%)")
```

Decreasing SpO2 as atelectasis percent increases.

Would a smooth term be more useful to model SpO2?

```{r}
#| include: false 
## Change 'false' for 'true' above to show plot. 

# Would a smooth term be more useful to model SpO2? Fit with loess:   

ggplot(data, aes(atelectasis_percent,spo2_VPO)) + 
  geom_point(size=0.6,color="gray40") + 
  geom_smooth(method="loess", color="black") +
  ylab("SpO2 (%)") + xlab("Atelectasis percent (%)") + 
  ylim(85,100) +
  theme_bw() + 
  theme(panel.border = element_blank(), 
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black"),
        axis.text.x = element_text(size=rel(1.2)), 
        axis.text.y = element_text(size=rel(1.2))
        )
```

Models evaluated with the accompanying sourced script ***nonlinear_Atelectasis_SpO2.R***

```{r}
#| include: false

source("scripts/nonlinear_Atelectasis_SpO2.R", local = knitr::knit_global())
```

All models are significantly better than linear. Thus, using a smooth term for atelectasis percent is better than modelling a linear relationship.

Best AIC:

```{r}
data.frame(
  model = c("k=2","k=4","k=6","k=8"),
  AIC = round(c(AIC_k2,AIC_k4,AIC_k6,AIC_k8),1)
) %>% gt()
```

Regarding AIC, no model offers greater improvement in AIC than k=6. Will try a model with k=5.

Best AIC:

```{r}
data.frame(
  model = c("k=4","k=5","k=6"),
  AIC = round(c(AIC_k4,AIC_k5,AIC_k6),1)
) %>% gt()
```

There is a drop in AIC for k=5, which also offers the best k-index. However, the extra knot is explaining a clump around 12.5%, where there was only a single observation. This clump and the additional knot are therefore likely fitting noise rather than the trend in the variable. Will keep k=4, as this model offers the best visual representation of the trend across all categories.

```{r}
fig1c
```

Negative monotonic relationship, since SpO2 decreases as atelectasis percent increases. Will assess Spearman's correlation coefficient to report in paper:

```{r}
spearman <- cor.test(spo2_VPO,atelectasis_percent,
                     method="spearman",
                     exact=FALSE
                     )
spearman
```

> Atelectasis percent exhibited a negative non-linear monotonic relationship with SpO2 (**Figure 1C**, rho= `r round(spearman$estimate,3)`, p\<0.001).

### Figure 1

Created with the accompanying sourced script ***Figure1.R***

```{r}
#| include: false

source("scripts/Figure1.R", local = knitr::knit_global())
```

```{r, fig.height=8, fig.width=10}
grid.arrange(fig1a, blank_plot, fig1b, fig1c, ncol=2)
```

```{r}
#| include: false
rm(AIC_k2,AIC_k4,AIC_k5,AIC_k6,AIC_k8,model,spearman,figure1,fig1a,fig1b,fig1c,blank_plot)
```

```{r}
#| include: false
detach(data)
```

### Ordinal variable

Since there is only one participant in the 27.5% category, will collapse it with the 17.5% category for further analyses:

```{r}
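# Convert to factor, merge the 27.5% level into 17.5%, then label all levels.
datamodel <- data
datamodel$atelectasis_percent <- factor(
  datamodel$atelectasis_percent,
  levels=c(0,2.5,5,7.5,10,12.5,15,17.5,27.5)
  ) %>% 
  # Levels to collapse are passed as strings, as expected by fct_collapse().
  fct_collapse("17.5%" = c("17.5","27.5")) %>% 
  factor(labels = c("0%","2.5%","5%","7.5%","10%","12.5%","15%","17.5%"))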

table(datamodel$atelectasis_percent)
```

```{r}
#| include: false 
data_spo2 <- datamodel %>% group_by(atelectasis_percent)
```

```{r}
#| include: false 
## Change 'false' for 'true' above to show plot. 

# Assess distribution of SpO2 by atelectasis percent categories: 
ggplot(data_spo2,aes(x = spo2_VPO)) +
  geom_histogram(fill = "aquamarine", colour = "black") +
  facet_grid(atelectasis_percent ~ .)
```
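The visual assessment above could be complemented with a per-category numerical summary. The sketch below, on simulated data, reports the group size and a Shapiro-Wilk p-value for each category. The names `sim`, `group`, `spo2` and `normality` are hypothetical, and the Shapiro-Wilk test is a swapped-in illustration rather than a step of the original analysis.

```{r}
#| eval: false
#| echo: true
library(dplyr)

set.seed(2023)
# Simulated (hypothetical) SpO2 values in three unequal groups, capped at 100%.
sim <- data.frame(
  group = rep(c("0%", "2.5%", "5%"), times = c(60, 30, 10)),
  spo2 = pmin(100, c(rnorm(60, 96, 1.5), rnorm(30, 94, 2), rnorm(10, 92, 3)))
)

# Group size and Shapiro-Wilk p-value per category
# (shapiro.test() needs between 3 and 5000 observations per group).
normality <- sim %>%
  group_by(group) %>%
  summarize(n = n(), shapiro_p = shapiro.test(spo2)$p.value)

normality
```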

Distributions are not normal, group sizes differ, and there are outliers in both directions depending on the category. Thus, will proceed with non-parametric assessment:

```{r}
median_spo2 <- data_spo2 %>% 
  summarize(n = n(), 
            spo2_median = median(spo2_VPO), 
            Q1 = quantile(spo2_VPO,0.25), 
            Q3 = quantile(spo2_VPO,0.75), 
            min = min(spo2_VPO), 
            max = max(spo2_VPO)
            )
median_spo2 %>% gt()
```

```{r}
krus <- kruskal.test(spo2_VPO ~ atelectasis_percent, data = data_spo2)
krus
```

> There was a decreasing trend in median SpO2 with higher atelectasis percentage extension (p\<0.001).

```{r}
ggplot(datamodel, aes(x = atelectasis_percent, y = spo2_VPO)) + 
  geom_boxplot(width = .4, outlier.shape = NA, fill="aliceblue") +
  geom_point(size = 1.5, alpha = .3, 
             position = position_jitter(seed = 1, width = .1)) + 
  ylab("SpO2 (%)") + xlab("Atelectasis percent") + 
  ylim(85,100) +
  labs(tag="Kruskal-Wallis: p<0.001") + 
  theme_bw() + 
  theme(panel.border = element_blank(), 
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        axis.line = element_line(colour = "black"),
        axis.ticks.length.x = unit(.1, "in"),
        axis.text.x = element_text(size=rel(1.2)),
        axis.text.y = element_text(size=rel(1.2)),
        plot.tag.position = "top", 
        plot.tag = element_text(size=rel(0.9))
        )
```

```{r}
#| include: false 
rm(data_spo2,median_spo2,krus) 
```

## Relationship between OSA, obesity type and atelectasis percent:

```{r}
#| include: false 
## Change 'false' for 'true' above to show table 

# This also gives an idea of which sizes to use for next plot: 
datamodel %>% 
  group_by(atelectasis_percent, sleep_apnea, type_obesity) %>% 
  summarize(
    n = n()
  )
```

```{r}
datamodel %>% mutate(
  type_obesity = fct_recode(
    type_obesity,
    "1" = "Class 1 Obesity",
    "2" = "Class 2 Obesity",
    "3" = "Class 3 Obesity"
    )
  ) %>% 
  count(type_obesity, sleep_apnea, atelectasis_percent) %>%
  ggplot(
    aes(x = sleep_apnea,
        y = atelectasis_percent,
        color = sleep_apnea)
    ) +
  geom_point(aes(group = sleep_apnea, size = n)) +
  scale_color_manual(values=c("gray40","darkred")) +
  facet_wrap(~type_obesity, scales = "free_x",
             labeller = labeller(type_obesity = label_both)
             ) +
  scale_size(breaks = c(1, 5, 10, 20, 30, 40, 50, 60)) + 
  labs(
    y = "Atelectasis percent",
    x = "Obstructive sleep apnea"
  ) +
  theme_light()
```

Sleep apnea was more common in higher obesity classes and also at higher atelectasis percentages. Atelectasis percent increased with higher obesity class.
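The first statement can also be checked numerically with row proportions. The following is a minimal sketch on simulated data, assuming a two-level `sleep_apnea` variable and a three-level `type_obesity` variable as in the plot above. The level labels "No" and "Yes", the group sizes and the probabilities are invented for illustration.

```{r}
#| eval: false
#| echo: true
set.seed(2023)
# Simulated (hypothetical) data: sleep apnea becomes more likely in higher classes.
sim <- data.frame(
  type_obesity = rep(
    c("Class 1 Obesity", "Class 2 Obesity", "Class 3 Obesity"),
    each = 50
  )
)
sim$sleep_apnea <- ifelse(
  runif(150) < rep(c(0.2, 0.4, 0.6), each = 50), "Yes", "No"
)

# Proportion of participants with and without sleep apnea within each obesity class.
osa_by_class <- prop.table(
  table(sim$type_obesity, sim$sleep_apnea),
  margin = 1
)

round(osa_by_class, 2)
```

With `margin = 1`, each row sums to 1, so the "Yes" column can be read directly as the proportion with sleep apnea in that class.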

\pagebreak

# Package References

```{r}
#| include: false
report::cite_packages(session)
```

-   Angelo Canty, B. D. Ripley (2024). *boot: Bootstrap R (S-Plus) Functions*. R package version 1.3-30. A. C. Davison, D. V. Hinkley (1997). *Bootstrap Methods and Their Applications*. Cambridge University Press, Cambridge. ISBN 0-521-57391-2, <doi:10.1017/CBO9780511802843>.
-   Auguie B (2017). *gridExtra: Miscellaneous Functions for "Grid" Graphics*. R package version 2.3, <https://CRAN.R-project.org/package=gridExtra>.
-   Fox J, Weisberg S (2019). *An R Companion to Applied Regression*, Third edition. Sage, Thousand Oaks CA. <https://socialsciences.mcmaster.ca/jfox/Books/Companion/>.
-   Fox J, Weisberg S, Price B (2022). *carData: Companion to Applied Regression Data Sets*. R package version 3.0-5, <https://CRAN.R-project.org/package=carData>.
-   Grolemund G, Wickham H (2011). “Dates and Times Made Easy with lubridate.” *Journal of Statistical Software*, *40*(3), 1-25. <https://www.jstatsoft.org/v40/i03/>.
-   Iannone R, Cheng J, Schloerke B, Hughes E, Lauer A, Seo J (2024). *gt: Easily Create Presentation-Ready Display Tables*. R package version 0.10.1, <https://CRAN.R-project.org/package=gt>.
-   Jeppson H, Hofmann H, Cook D (2021). *ggmosaic: Mosaic Plots in the 'ggplot2' Framework*. R package version 0.3.3, <https://CRAN.R-project.org/package=ggmosaic>.
-   Makowski D, Lüdecke D, Patil I, Thériault R, Ben-Shachar M, Wiernik B (2023). “Automated Results Reporting as a Practical Tool to Improve Reproducibility and Methodological Best Practices Adoption.” *CRAN*. <https://easystats.github.io/report/>.
-   Müller K, Wickham H (2023). *tibble: Simple Data Frames*. R package version 3.2.1, <https://CRAN.R-project.org/package=tibble>.
-   Neuwirth E (2022). *RColorBrewer: ColorBrewer Palettes*. R package version 1.1-3, <https://CRAN.R-project.org/package=RColorBrewer>.
-   Peng RD (2019). *simpleboot: Simple Bootstrap Routines*. R package version 1.1-7, <https://CRAN.R-project.org/package=simpleboot>.
-   Pinheiro J, Bates D, R Core Team (2023). *nlme: Linear and Nonlinear Mixed Effects Models*. R package version 3.1-164, <https://CRAN.R-project.org/package=nlme>. Pinheiro JC, Bates DM (2000). *Mixed-Effects Models in S and S-PLUS*. Springer, New York. doi:10.1007/b98882 <https://doi.org/10.1007/b98882>.
-   R Core Team (2024). *R: A Language and Environment for Statistical Computing*. R Foundation for Statistical Computing, Vienna, Austria. <https://www.R-project.org/>.
-   Rich B (2023). *table1: Tables of Descriptive Statistics in HTML*. R package version 1.4.3, <https://CRAN.R-project.org/package=table1>.
-   Rinker TW, Kurkiewicz D (2018). *pacman: Package Management for R*. version 0.5.0, <http://github.com/trinker/pacman>.
-   Wickham H (2016). *ggplot2: Elegant Graphics for Data Analysis*. Springer-Verlag New York. ISBN 978-3-319-24277-4, <https://ggplot2.tidyverse.org>.
-   Wickham H (2023). *forcats: Tools for Working with Categorical Variables (Factors)*. R package version 1.0.0, <https://CRAN.R-project.org/package=forcats>.
-   Wickham H (2023). *stringr: Simple, Consistent Wrappers for Common String Operations*. R package version 1.5.1, <https://CRAN.R-project.org/package=stringr>.
-   Wickham H, Averick M, Bryan J, Chang W, McGowan LD, François R, Grolemund G, Hayes A, Henry L, Hester J, Kuhn M, Pedersen TL, Miller E, Bache SM, Müller K, Ooms J, Robinson D, Seidel DP, Spinu V, Takahashi K, Vaughan D, Wilke C, Woo K, Yutani H (2019). “Welcome to the tidyverse.” *Journal of Open Source Software*, *4*(43), 1686. doi:10.21105/joss.01686 <https://doi.org/10.21105/joss.01686>.
-   Wickham H, François R, Henry L, Müller K, Vaughan D (2023). *dplyr: A Grammar of Data Manipulation*. R package version 1.1.4, <https://CRAN.R-project.org/package=dplyr>.
-   Wickham H, Henry L (2023). *purrr: Functional Programming Tools*. R package version 1.0.2, <https://CRAN.R-project.org/package=purrr>.
-   Wickham H, Hester J, Bryan J (2024). *readr: Read Rectangular Text Data*. R package version 2.1.5, <https://CRAN.R-project.org/package=readr>.
-   Wickham H, Vaughan D, Girlich M (2024). *tidyr: Tidy Messy Data*. R package version 1.3.1, <https://CRAN.R-project.org/package=tidyr>.
-   Wood SN (2011). “Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models.” *Journal of the Royal Statistical Society (B)*, *73*(1), 3-36. Wood SN, Pya N, Säfken B (2016). “Smoothing parameter and model selection for general smooth models (with discussion).” *Journal of the American Statistical Association*, *111*, 1548-1575. Wood SN (2004). “Stable and efficient multiple smoothing parameter estimation for generalized additive models.” *Journal of the American Statistical Association*, *99*(467), 673-686. Wood S (2017). *Generalized Additive Models: An Introduction with R*, 2 edition. Chapman and Hall/CRC. Wood SN (2003). “Thin-plate regression splines.” *Journal of the Royal Statistical Society (B)*, *65*(1), 95-114.

```{r}
#| include: false

# Run this chunk if you wish to clear your environment and unload packages.

rm(data, datamodel, figfolder, seed, session)

pacman::p_unload(negate = TRUE)
```