---
title: "Bootstrap analysis: Determine top performers"
author: "a name like Annie Easley"
date: "`r Sys.Date()`"
output:
  html_document:
    df_print: paged
    code_fold: hide
    toc: true
    toc_float: true
---
## Introduction
To declare top performers for a DREAM challenge, we need to assess whether any methods are "tied", that is, not substantially different in performance. We determine this with a bootstrapping (sampling with replacement) approach that estimates how a submission would score in different scenarios, namely when only resampled sets of the values to be predicted are considered. Specifically, we sample with replacement from all of the submitted predictions and the gold standard, then score the resampled prediction files. We repeat this for a total of 1,000 to 10,000 samples to obtain a distribution of scores for each participant. We then calculate a Bayes factor for each method relative to the best-scoring method to see whether any of the other methods fall within a certain threshold. Smaller Bayes factors indicate more similar performance, while larger Bayes factors indicate more disparate performance. We use a Bayes factor of 3 as the cutoff to indicate a tie.
First, import packages for data manipulation and retrieve prediction data, gold standard data, and the template for use in the scoring code. Don't forget to set a seed!
```{r echo=TRUE, message=FALSE, warning=FALSE}
set.seed(98109)
```
Read in the prediction files, combine them, then bootstrap the predictions and the gold standard 1000 times to calculate 1000 scores per prediction.
```{r echo=TRUE, message=FALSE, warning=FALSE}
```
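As a minimal sketch of what this bootstrap step could look like, the example below simulates a gold standard and three submissions, resamples with replacement 1000 times, and scores every submission on each resample. The simulated data, the method names, and the use of Spearman correlation as the metric are all assumptions for illustration; the challenge's real prediction files and scoring function would take their place.

```r
# Illustrative sketch only: simulated data and a placeholder metric.
set.seed(98109)

n_samples   <- 200    # number of values to be predicted (assumed)
n_bootstrap <- 1000   # number of bootstrap iterations

# Simulated gold standard and three hypothetical submissions of varying quality.
gold <- rnorm(n_samples)
predictions <- data.frame(
  method_a = gold + rnorm(n_samples, sd = 0.5),
  method_b = gold + rnorm(n_samples, sd = 0.6),
  method_c = gold + rnorm(n_samples, sd = 2.0)
)

# Placeholder scoring function (assumption): Spearman correlation.
score_fun <- function(gold, pred) cor(gold, pred, method = "spearman")

# Each iteration draws one set of row indices and applies it to the gold
# standard and to every submission, so all methods see the same resample.
bootstrap_scores <- t(replicate(n_bootstrap, {
  idx <- sample(n_samples, replace = TRUE)
  vapply(predictions, function(p) score_fun(gold[idx], p[idx]), numeric(1))
}))

dim(bootstrap_scores)       # n_bootstrap rows, one column per method
colMeans(bootstrap_scores)  # mean bootstrapped score per method
```

Sharing the resampled indices across methods keeps the scores paired within each iteration, which is what makes the later per-iteration comparison against the top method meaningful.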
Use our `challengescoring` package to compute Bayes factors from a matrix of scores, setting `refPredIndex` to the index of the column that contains the top prediction (the reference prediction).
```{r echo=TRUE, message=FALSE, warning=FALSE}
```
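To make the idea concrete, here is a base-R sketch of one common way to turn bootstrapped scores into a Bayes factor against a reference column. It is not the `challengescoring` implementation: the helper `bayes_factor_vs_reference`, its `ref_index` argument (which plays the role of `refPredIndex`), and the simulated scores are hypothetical, and the sketch assumes that higher scores are better.

```r
# Illustrative sketch only: NOT the challengescoring implementation.
# K for a method = (# bootstraps where the reference scores higher) /
#                  (# bootstraps where the reference does not score higher).
bayes_factor_vs_reference <- function(score_matrix, ref_index) {
  # The reference column is recycled down each column of the matrix,
  # so a positive difference means the reference won that iteration.
  diffs <- score_matrix[, ref_index] - score_matrix
  k <- apply(diffs, 2, function(d) sum(d > 0) / sum(d <= 0))
  k[ref_index] <- 0  # the reference is never compared against itself
  k
}

# Simulated bootstrap scores for three hypothetical methods.
set.seed(98109)
scores <- cbind(
  top     = rnorm(1000, mean = 0.80, sd = 0.02),
  close   = rnorm(1000, mean = 0.79, sd = 0.02),
  distant = rnorm(1000, mean = 0.60, sd = 0.02)
)

bf   <- bayes_factor_vs_reference(scores, ref_index = 1)
tied <- bf < 3  # TRUE where a method falls under the tie cutoff of 3
```

A method that never beats or ties the reference gets an infinite ratio under this definition, which still compares correctly against the cutoff.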
Then draw a boxplot of all scores, coloring the boxes by Bayes factor.
```{r echo=TRUE}
```
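One possible version of this plot, using only base R graphics, is sketched below. The scores, the Bayes factor values, the category labels, and the colors are all made up for illustration; a real analysis would use the bootstrapped score matrix and the Bayes factors computed from it.

```r
# Illustrative sketch only: simulated scores and made-up Bayes factors.
set.seed(98109)
scores <- cbind(
  top     = rnorm(1000, mean = 0.80, sd = 0.02),
  close   = rnorm(1000, mean = 0.79, sd = 0.02),
  distant = rnorm(1000, mean = 0.60, sd = 0.02)
)
bf        <- c(top = 0, close = 1.8, distant = 50)
ref_index <- 1  # column holding the reference (top) prediction

# Label each method by where its Bayes factor falls relative to the cutoff of 3.
bf_category <- ifelse(bf < 3, "tied", "not tied")
bf_category[ref_index] <- "reference"

# Map each category to a fill color (arbitrary choices).
fill_colors <- c(reference = "#1b9e77", tied = "#7570b3", "not tied" = "#d95f02")
box_colors  <- unname(fill_colors[bf_category])

# One box per column of the score matrix, filled by Bayes factor category.
boxplot(scores, col = box_colors,
        ylab = "Bootstrapped score",
        main = "Bootstrapped scores by method")
legend("bottomleft", legend = names(fill_colors),
       fill = unname(fill_colors), bty = "n")
```

Binning the Bayes factors into categories, rather than using a continuous color scale, makes the tie cutoff visible at a glance.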