Move sparse datasets to development
J535D165 committed Jun 8, 2022
1 parent 5618755 commit 70847ce
Showing 2 changed files with 32 additions and 39 deletions.
32 changes: 32 additions & 0 deletions DEVELOPMENT.md
@@ -0,0 +1,32 @@
# Development

## Very sparse datasets

Very sparse or small datasets are useful for explaining some of the finer
details of the plotting subcommands in this extension. Important details are,
for example, the handling of prior knowledge and the computation of the
predicted recall in the case of random screening.

The following plot shows the result of a collection of 4 records with 3
relevant items (inclusions). The relevant items are found in the following
order:

```
[1, 1, 0, 1, 0]
```

![Recall of small dataset example](https://github.com/asreview/asreview-insights/blob/master/figures/tests_small_dataset_recall.png)

The black line is a naive estimate of the recall after every screened record
(also referred to as 'random').
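
As a rough sketch of the two curves in this plot (not the extension's own code), the recall after every screened record and the naive estimate can be computed from the label order shown above, used exactly as given. The function names `recall_curve` and `random_recall_curve` are hypothetical, and the naive estimate assumes that random screening finds relevant records at a constant rate, so the expected recall after `k` of `n` records is `k / n`.

```python
from fractions import Fraction


def recall_curve(labels):
    """Recall after each screened record, given labels in screening order."""
    n_relevant = sum(labels)
    found = 0
    curve = []
    for label in labels:
        found += label
        curve.append(Fraction(found, n_relevant))
    return curve


def random_recall_curve(n_records):
    """Expected recall under random screening (assumption: k / n after k records)."""
    return [Fraction(k, n_records) for k in range(1, n_records + 1)]


labels = [1, 1, 0, 1, 0]  # label order from the example above
simulated = recall_curve(labels)
naive = random_recall_curve(len(labels))

for k, (sim, rnd) in enumerate(zip(simulated, naive), start=1):
    print(f"screened {k}: recall={float(sim):.3f}, random={float(rnd):.3f}")
```

Exact fractions are used instead of floats so that values on very small datasets can be compared without rounding noise.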

The Work Saved over Sampling (WSS) is the difference between the recall of the
simulation and the theoretical recall of random screening.
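
The definition above can be sketched as a per-record difference. This is an illustration under stated assumptions, not the extension's implementation: `wss_curve` is a hypothetical helper, and the theoretical recall of random screening is assumed to be `k / n` after `k` of `n` records.

```python
from fractions import Fraction


def wss_curve(labels):
    """WSS after each screened record: simulated recall minus random recall."""
    n_records = len(labels)
    n_relevant = sum(labels)
    found = 0
    curve = []
    for k, label in enumerate(labels, start=1):
        found += label
        simulated_recall = Fraction(found, n_relevant)
        random_recall = Fraction(k, n_records)  # assumption: k / n
        curve.append(simulated_recall - random_recall)
    return curve


labels = [1, 1, 0, 1, 0]  # label order from the example above
for k, wss in enumerate(wss_curve(labels), start=1):
    print(f"screened {k}: WSS={float(wss):.3f}")
```

Under this assumption the WSS always returns to zero once every record has been screened, because both recall values are then 1.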

![WSS for small dataset example](https://github.com/asreview/asreview-insights/blob/master/figures/tests_small_dataset_wss.png)

The following graph shows the recall versus the WSS. This comparison is
important because it is the basis of the `WSS@95%` metric used in the
literature on active learning for systematic reviewing.

![ERF for small dataset example](https://github.com/asreview/asreview-insights/blob/master/figures/tests_small_dataset_erf.png)
39 changes: 0 additions & 39 deletions README.md
@@ -168,45 +168,6 @@ horizontal axis shows the proportion of total number of records in the
dataset. The steep increase of the ERF in the beginning of the process is
related to the steep recall curve.

### Very sparse datasets

Very sparse or small datasets are useful for explaining some of the finer
details of the plotting subcommands in this extension. Important details are,
for example, the handling of prior knowledge and the computation of the
predicted recall in the case of random screening.

The following plot shows the result of a collection of 4 records with 3
relevant items (inclusions). The relevant items are found in the following
order:

```
[1, 1, 0, 1, 0]
```

![Recall of small dataset example](https://github.com/asreview/asreview-insights/blob/master/figures/tests_small_dataset_recall.png)

The black line is a naive estimate of the recall after every screened record
(also referred to as 'random').

```
Recall (est) when screening 1 = (3 relevant records / 4 records left) / 3 = 0.25
Recall (est) when screening 2 = (1/4) * (3 relevant records / 3 records left) / 3 +
(3/4) * (2 relevant records / 3 records left) / 3 = (1/4 + 3/4*2/3) / 3 = 0.25
```
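
The arithmetic above can be cross-checked by brute force. The sketch below is only an illustration, with a hypothetical function name: it enumerates every screening order of 4 records with 3 relevant ones and averages the recall after each screened record, assuming every order is equally likely under random screening.

```python
from fractions import Fraction
from itertools import permutations


def expected_random_recall(n_records, n_relevant):
    """Average recall after each screened record over all screening orders."""
    labels = [1] * n_relevant + [0] * (n_records - n_relevant)
    # Duplicate orders are kept; they do not change the average.
    orders = list(permutations(labels))
    expected = []
    for k in range(1, n_records + 1):
        total_found = sum(sum(order[:k]) for order in orders)
        expected.append(Fraction(total_found, len(orders) * n_relevant))
    return expected


cumulative = expected_random_recall(n_records=4, n_relevant=3)
# Gain in expected recall contributed by each screened record.
gains = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]
print([float(g) for g in gains])
```

Each screened record contributes the same expected gain of 0.25, which matches both lines of the calculation.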

The Work Saved over Sampling (WSS) is the difference between the recall of the
simulation and the theoretical recall of random screening.

![WSS for small dataset example](https://github.com/asreview/asreview-insights/blob/master/figures/tests_small_dataset_wss.png)

The following graph shows the recall versus the WSS. This comparison is
important because it is the basis of the `WSS@95%` metric used in the
literature on active learning for systematic reviewing.

![ERF for small dataset example](https://github.com/asreview/asreview-insights/blob/master/figures/tests_small_dataset_erf.png)


### Plotting CLI

See `asreview plot -h` for all command line arguments.
