docs(README.md): update README with further details and rationale
znicholls committed Apr 29, 2023
1 parent 4eeec02 commit 1f494bf
Showing 2 changed files with 122 additions and 11 deletions.
42 changes: 42 additions & 0 deletions Makefile
@@ -0,0 +1,42 @@
# Makefile to help automate key steps

.DEFAULT_GOAL := help


# A helper script to get short descriptions of each target in the Makefile
define PRINT_HELP_PYSCRIPT
import re, sys

for line in sys.stdin:
    match = re.match(r'^([\$$\(\)a-zA-Z_-]+):.*?## (.*)$$', line)
    if match:
        target, help = match.groups()
        print("%-30s %s" % (target, help))
endef
export PRINT_HELP_PYSCRIPT


help: ## print short description of each target
	@python3 -c "$$PRINT_HELP_PYSCRIPT" < $(MAKEFILE_LIST)

.PHONY: checks
checks: ## run all the linting checks of the codebase
@echo "=== black docs ==="; poetry run blacken-docs --check book/notebooks/*.md || echo "--- black docs failed ---" >&2; \
echo "======"

.PHONY: docs
docs: ## build the docs (also acts as a testing step because of assertions in the notebooks)
	poetry run jupyter-book build book

.PHONY: black-docs
black-docs: ## format the notebook examples using black
	poetry run blacken-docs book/notebooks/*.md

.PHONY: check-commit-messages
check-commit-messages: ## check commit messages
	poetry run cz check --rev-range 6ecf76a3..HEAD

virtual-environment: ## update virtual environment, create a new one if it doesn't already exist
	# Put virtual environments in the project
	poetry config virtualenvs.in-project true
	poetry install
91 changes: 80 additions & 11 deletions README.md
@@ -1,22 +1,91 @@
# OpenSCM-Calibration-examples

Long-running examples using [OpenSCM-Calibration](https://github.com/openscm/OpenSCM-Calibration).

## Installation

After cloning the repository, we recommend installing with [poetry](https://python-poetry.org/).

```bash
poetry install
```

## Running the examples

All the examples can be run (and the docs built) with

```bash
poetry run jupyter-book build book
```
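
If it helps, jupyter-book writes the built HTML to `_build/html` inside the
book directory by default, so once the build finishes you can view the result
locally with something like

```bash
# open the built docs in a browser (jupyter-book's default output location);
# on Linux, use e.g. xdg-open instead of open
open book/_build/html/index.html
```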

## Rationale

For a user of [OpenSCM-Calibration](https://github.com/openscm/OpenSCM-Calibration)
(or indeed of any package), it is helpful to have examples of how it is used
in production and of the outputs it produces. However, including such examples in
the core development repository comes with a problem: it adds a very slow step
to any continuous integration, which quickly becomes really annoying for
development. A secondary problem is that you also end up with output in the
repository, which quickly bloats it (even notebooks can be megabytes in size
if they contain many plots).

The solution we use here is to house our production-style examples in a
separate repository. We don't run these examples every time we change the code
base; however, we do run them regularly and check/update them when the core
repository makes new releases.

### Further details

Even in this repository, we don't currently store outputs in the notebooks.
The reason is that outputs quickly bloat the repository, and storing them
discourages re-running, i.e. testing, the notebooks.

Having looked around, we haven't found a good solution that allows us to run
our notebooks once as part of a test, then use the resulting outputs directly
in a docs build without re-executing. [nbmake](https://github.com/treebeardtech/nbmake)
claims to support this use case, but in our experience jupyter-book didn't
recognise the nbmake output and tried to re-run the notebooks anyway. Perhaps
we were just using the combination of tools like [nbmake](https://github.com/treebeardtech/nbmake),
[jupyter-cache](https://github.com/executablebooks/jupyter-cache) and
[jupyter-book](https://github.com/executablebooks/jupyter-book) incorrectly
(one thing to keep in mind if trying to make this work is that we want the
execution time of the notebook to appear in our docs too).

Instead, we combine the docs building and testing steps. We build the docs
using jupyter-book and include assertion cells in our notebooks that
effectively act as tests: if an assertion fails, the docs build fails, so
building the docs is also our test step. We avoid polluting the notebooks
with these assertions by hiding them by default, using jupyter-book's
support for showing and hiding cells so that readers can still check them
if they wish. We like this solution because it makes clear to developers
where the assertions are (other solutions hide the assertions in the cell's
JSON, which feels like a hack to us) and allows users to look at them if
they wish, while making clear that they aren't actually necessary for the
example to run.
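
For illustration, a hidden assertion cell in one of our MyST notebooks looks
something like the following sketch (the `hide-cell` tag is the jupyter-book
mechanism we mean; the variable being checked is hypothetical):

````md
```{code-cell} ipython3
:tags: [hide-cell]

# Collapsed by default in the built docs via the "hide-cell" tag.
# If this raises, the docs build fails, so it doubles as a test.
# `calibration_result` here is a stand-in for a result from earlier cells.
assert calibration_result.success, "calibration did not converge"
```
````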

We do check the notebook formatting as a separate step. This is easy to do and
very cheap using [blacken-docs](https://github.com/adamchainz/blacken-docs).

### Really long-running notebooks

If we ever want to add really long-running notebooks to this repository
(ones that take, say, three days to run), they may simply be too slow to run
in CI. In that case, one solution could be to start tracking some of the
outputs of our notebook cache. This would leave those notebooks untested in
most cases (because the cache would be used instead of re-running them), but
it might be the best compromise where running the notebook in CI is truly
not an option. We don't have such a use case yet so we haven't implemented
this, but the current solution doesn't shut the door on really long-running
notebooks, so it seems a good choice for now.
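
If we ever do go down this road, the natural starting point would be
jupyter-book's execution cache. A sketch of what that might look like in
`book/_config.yml` (not something we have implemented or tested here):

```yaml
execute:
  # re-use cached outputs instead of re-executing notebooks on every build
  execute_notebooks: cache
  # really long-running cells need the per-cell timeout disabled
  timeout: -1
```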

## For developers

For development, we rely on [poetry](https://python-poetry.org) for all our
dependency management. For all of our other work, we use our `Makefile`.
You can read the commands out of it and run them by hand if you wish, but we
generally discourage this because it is error-prone and hand-run commands
don't pick up changes (e.g. when dependencies or the environment are
updated) the way the `Makefile` targets do.
To create your environment, run `make virtual-environment -B`.
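
The `Makefile` targets then cover the rest of the day-to-day workflow, e.g.

```bash
make checks      # run the linting checks (blacken-docs on the notebooks)
make docs        # build the docs, which also runs the notebooks' assertions
make black-docs  # auto-format the notebook examples
```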

If there are any issues, the messages from the `Makefile` should guide you
through them. If not, please raise an issue in the
[issue tracker](https://github.com/openscm/OpenSCM-Calibration_examples/issues).
