Merge pull request #77 from ctr26/docs
[init] docs
ctr26 committed Oct 9, 2024
2 parents 8eda614 + 76cd5f7 commit 74b7ffc
Showing 6 changed files with 178 additions and 0 deletions.
19 changes: 19 additions & 0 deletions .readthedocs.yaml
@@ -0,0 +1,19 @@
# Read the Docs configuration file for MkDocs projects
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details

# Required
version: 2

# Set the version of Python and other tools you might need
build:
  os: ubuntu-22.04
  tools:
    python: "3.12"

mkdocs:
  configuration: mkdocs.yml

# Optionally declare the Python requirements required to build your docs
python:
  install:
    - requirements: docs/requirements.txt
100 changes: 100 additions & 0 deletions docs/cli.md
@@ -0,0 +1,100 @@
The CLI is mostly handled by Hydra (https://hydra.cc/docs/intro/). The main commands are:

- `bie_train`: Train a model
- `bie_predict`: Predict with a model

# Training

To train a model, you can use the following command:

```bash
bie_train
```

To see all the available options, you can use the `--help` flag:

```bash
bie_train --help
```

## Data

Out of the box, `bie_train` is configured to use `torchvision.datasets.ImageFolder` to load data.
This can be overridden using Hydra's configuration system (e.g. `_target_`).
However, for most applications the stock ImageFolder class will work.
To point the model at your data, set the `recipe.data` key like so:

```bash
bie_train recipe.data=/path/to/data
```
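
To make the dotted-key syntax above concrete, here is a minimal sketch of how an override such as `recipe.data=/path/to/data` maps onto a nested configuration. This is not Hydra's parser: `apply_override` is a hypothetical helper, and the starting values in `config` are placeholders chosen for illustration.

```python
def apply_override(config: dict, override: str) -> dict:
    """Apply one `dotted.key=value` override to a nested dict, in place."""
    key, _, value = override.partition("=")
    *parents, leaf = key.split(".")
    node = config
    for name in parents:
        # Walk down the nesting, creating intermediate levels if missing.
        node = node.setdefault(name, {})
    node[leaf] = value
    return config


# Placeholder starting configuration (assumed values, for illustration only).
config = {"recipe": {"model": "placeholder_model", "data": "placeholder_dir"}}
apply_override(config, "recipe.data=/path/to/data")
print(config["recipe"]["data"])
```

Only the `data` entry changes; every other key keeps its previous value, which is why a single override on the command line is enough to retarget the dataset.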

ImageFolder uses PIL to load images, so you can use any image format that PIL supports, including jpg, png, bmp, tif, etc.

More exotic formats will require a custom dataset class, which is not covered here; realistically, you should convert your data to a more common format.
PNG, for instance, is a lossless format that loads quickly from disk due to its efficient compression.
The `bie_train` defaults tend to be sane: for instance, the data is shuffled and split into train and validation sets.

It is worth noting that ImageFolder expects the data to be organised into "classes" even though default bie_train does not use the class labels during training.
To denote these classes, you should organise your data into folders, where each folder is a class, and the images in that folder are instances of that class.
See here for more information: https://pytorch.org/vision/stable/datasets.html#imagefolder
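
As a rough illustration of the folder-per-class layout described above, the sketch below builds a throwaway directory tree and derives class indices from the sorted subdirectory names, which is the convention ImageFolder follows. The class names `nuclei` and `membrane`, the file names, and both helper functions are made up for this example; the files are empty stand-ins, not real images.

```python
import tempfile
from pathlib import Path


def make_layout(root: Path, classes: dict[str, int]) -> None:
    """Create root/<class_name>/<index>.png placeholder files."""
    for name, count in classes.items():
        class_dir = root / name
        class_dir.mkdir(parents=True, exist_ok=True)
        for i in range(count):
            (class_dir / f"{i:03d}.png").touch()  # empty stand-in file


def class_to_idx(root: Path) -> dict[str, int]:
    """Map each subdirectory name to an index, in sorted order."""
    names = sorted(p.name for p in root.iterdir() if p.is_dir())
    return {name: i for i, name in enumerate(names)}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    make_layout(root, {"nuclei": 2, "membrane": 3})
    mapping = class_to_idx(root)
    n_files = len(list(root.glob("*/*.png")))
    print(mapping, n_files)
```

If your data has no meaningful classes, a single subfolder holding every image still satisfies this layout.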

## Models

The default model backbone is a "resnet18" with a "vae" architecture for autoencoding, but you can specify a different model using the `recipe.model` key:

```bash
bie_train recipe.model=resnet50_vqvae recipe.data=/path/to/data
```

N.B. the resnet series of models expects input tensors of shape (3, 224, 224).


### Supervised vs Unsupervised models

By default the model is unsupervised, meaning the class labels are ignored during training.
However, an (experimental) supervised model can be selected by setting:

```bash
bie_train lit_model.model=_target_="bioimage_embed.lightning.torch.AutoEncoderSupervised" recipe.data=/path/to/data
```

This uses contrastive learning using the labelled data, specifically SimCLR: https://arxiv.org/abs/2002.05709

## Recipes

The major components of the training process are controlled by the "recipe" schema.
These values are also used to generate the UUID of the training run.
This means that a model can in fact resume from a crash or be retrained with the same configuration, and that multiple models can be trained in parallel using the same directory.
This is useful for hyperparameter search, or for training multiple models on the same data.
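
One way to picture a configuration-derived run identifier is sketched below. This is an assumption about the general idea, not the package's actual implementation: `recipe_id`, the namespace constant, and the recipe values are all hypothetical. The sketch serialises the recipe with sorted keys and hashes it with `uuid.uuid5`, so identical settings always map to the same ID.

```python
import json
import uuid

# Arbitrary fixed namespace, chosen for this sketch only.
NAMESPACE = uuid.UUID("12345678-1234-5678-1234-567812345678")


def recipe_id(recipe: dict) -> str:
    """Return a deterministic ID for a recipe, independent of key order."""
    canonical = json.dumps(recipe, sort_keys=True)
    return str(uuid.uuid5(NAMESPACE, canonical))


# Illustrative recipe values, not the package defaults.
a = recipe_id({"model": "resnet50_vqvae", "data": "/path/to/data"})
b = recipe_id({"data": "/path/to/data", "model": "resnet50_vqvae"})
c = recipe_id({"model": "resnet50_vqvae", "data": "/other/data"})
print(a == b, a == c)
```

Under such a scheme, rerunning with the same recipe would land in the same run directory and can resume, while any changed value yields a new ID and therefore a separate run.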

### lr_scheduler and optimizer

The `lr_scheduler` and `optimizer` settings mirror the timm library and are built using timm's `create_optimizer` and `create_scheduler`. See
https://timm.fast.ai/Optimizers
and
https://timm.fast.ai/schedulers

The default optimizer is "adamw" and the default scheduler is "cosine", along with some other hyperparameters borrowed from: https://arxiv.org/abs/2110.00476

The timm `create_*` functions receive a generic SimpleNamespace and read only the keys they need.
As a consequence, timm defines a controlled vocabulary for the hyperparameters in the recipe, which makes it possible to choose from the wide variety of optimizers and schedulers in timm.
https://timm.fast.ai
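
The "take only the keys you need" pattern can be sketched without timm itself. In the example below, `optimizer_settings` and `scheduler_settings` are hypothetical stand-ins for the factory functions, and the hyperparameter values are illustrative rather than the package defaults; the attribute names are assumed to follow timm's argument naming.

```python
from types import SimpleNamespace


def optimizer_settings(args: SimpleNamespace) -> dict:
    """Stand-in optimizer factory: reads only optimizer-related keys."""
    return {"opt": args.opt, "lr": args.lr, "weight_decay": args.weight_decay}


def scheduler_settings(args: SimpleNamespace) -> dict:
    """Stand-in scheduler factory: reads only scheduler-related keys."""
    return {"sched": args.sched, "epochs": args.epochs}


# One shared namespace carries every hyperparameter (illustrative values).
recipe = SimpleNamespace(
    opt="adamw",
    lr=1e-3,
    weight_decay=0.05,
    sched="cosine",
    epochs=100,
    data="/path/to/data",  # ignored by both stand-in factories
)

opt_cfg = optimizer_settings(recipe)
sched_cfg = scheduler_settings(recipe)
print(opt_cfg, sched_cfg)
```

Because each function ignores attributes it does not recognise, a single flat recipe can feed both without any per-function filtering.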

## Augmentation

The package includes a default augmentation, which is stored in the configuration file.
The default augmentation is written using albumentations, which is a powerful library for image augmentation.
https://albumentations.ai/docs/


The default augmentation is a simple set of augmentations that are useful for biological images; crucially, it mostly avoids RGB and non-physical augmentation effects.
It is recommended to edit the default augmentations in the configuration file rather than on the CLI, as the commands can get quite long.


## Config file

Running `bie_train` with no arguments trains a model using the default configuration. You can also specify a configuration file using the `--config` flag:

```bash
bie_train --config path/to/config.yaml
```
52 changes: 52 additions & 0 deletions docs/conf.py
@@ -0,0 +1,52 @@
# Configuration file for the Sphinx documentation builder.
#
# This file only contains a selection of the most common options. For a full
# list see the documentation:
# https://www.sphinx-doc.org/en/master/usage/configuration.html

# -- Path setup --------------------------------------------------------------

# If extensions (or modules to document with autodoc) are in another directory,
# add these directories to sys.path here. If the directory is relative to the
# documentation root, use os.path.abspath to make it absolute, like shown here.
#
# import os
# import sys
# sys.path.insert(0, os.path.abspath('.'))


# -- Project information -----------------------------------------------------

project = "Bioimage Embed"
copyright = "2024, Craig Russell"
author = "Craig Russell"


# -- General configuration ---------------------------------------------------

# Add any Sphinx extension module names here, as strings. They can be
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
# ones.
extensions = ["myst_parser"]


# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]

# List of patterns, relative to source directory, that match files and
# directories to ignore when looking for source files.
# This pattern also affects html_static_path and html_extra_path.
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]


# -- Options for HTML output -------------------------------------------------

# The theme to use for HTML and HTML Help pages. See the documentation for
# a list of builtin themes.
#
html_theme = "alabaster"

# Add any paths that contain custom static files (such as style sheets) here,
# relative to this directory. They are copied after the builtin static files,
# so a file named "default.css" will overwrite the builtin "default.css".
html_static_path = ["_static"]
Empty file added docs/library.md
Empty file.
1 change: 1 addition & 0 deletions docs/requirements.txt
@@ -0,0 +1 @@
myst-parser==4.0.0
6 changes: 6 additions & 0 deletions mkdocs.yml
@@ -0,0 +1,6 @@
site_name: "Bioimage Embed"
site_url: ""
nav:
- 'cli.md'
- 'library.md'
theme: readthedocs
