Commit
Merge pull request #77 from ctr26/docs
[init] docs
Showing 4 changed files with 184 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
# .readthedocs.yaml | ||
# Read the Docs configuration file | ||
# See https://docs.readthedocs.io/en/stable/config-file/v2.html for details | ||
|
||
# Required | ||
version: 2 | ||
|
||
# Set the OS, Python version and other tools you might need | ||
build: | ||
  os: ubuntu-22.04 | ||
  tools: | ||
    python: "3.12" | ||
    # You can also specify other tool versions: | ||
    # nodejs: "19" | ||
    # rust: "1.64" | ||
    # golang: "1.19" | ||
|
||
# Build documentation in the "docs/" directory with Sphinx | ||
sphinx: | ||
  configuration: docs/conf.py | ||
|
||
# Optionally build your docs in additional formats such as PDF and ePub | ||
# formats: | ||
# - epub | ||
|
||
# Optional but recommended, declare the Python requirements required | ||
# to build your documentation | ||
# See https://docs.readthedocs.io/en/stable/guides/reproducible-builds.html | ||
# python: | ||
#   install: | ||
#     - requirements: docs/requirements.txt |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,100 @@ | ||
The CLI is mostly handled by Hydra (https://hydra.cc/docs/intro/). The main commands are: | ||
|
||
bie_train: Train a model | ||
bie_predict: Predict with a model | ||
|
||
# Training | ||
|
||
To train a model, you can use the following command: | ||
|
||
```bash | ||
bie_train | ||
``` | ||
|
||
To see all the available options, you can use the `--help` flag: | ||
|
||
```bash | ||
bie_train --help | ||
``` | ||
|
||
## Data | ||
|
||
Out of the box, bie_train is configured to use torchvision.datasets.ImageFolder to load data. | ||
This can be overridden using Hydra's configuration system (e.g. _target_). | ||
However, for most applications the stock ImageFolder class will work. | ||
To point the model at your data, set the `recipe.data` key like so: | ||
|
||
```bash | ||
bie_train recipe.data=/path/to/data | ||
``` | ||
|
||
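As a rough illustration of what a dotted override such as `recipe.data=/path/to/data` does, the sketch below applies a `key=value` string to a nested dictionary. It is a simplified stand-in written for this page, not Hydra's actual parser; `apply_override` and the default values shown are hypothetical.

```python
from copy import deepcopy


def apply_override(config: dict, override: str) -> dict:
    """Return a copy of `config` with one `dotted.key=value` override applied."""
    dotted_key, _, value = override.partition("=")
    *parents, leaf = dotted_key.split(".")
    updated = deepcopy(config)
    node = updated
    for name in parents:
        # Walk (or create) intermediate sections such as "recipe".
        node = node.setdefault(name, {})
    node[leaf] = value
    return updated


# Hypothetical defaults, for illustration only.
defaults = {"recipe": {"model": "resnet18_vae", "data": "data"}}
config = apply_override(defaults, "recipe.data=/path/to/data")
print(config["recipe"]["data"])  # /path/to/data
```

Only the named key changes; every other default is left untouched, which is why a single override on the command line is usually enough.
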
ImageFolder uses PIL to load images, so you can use any image format that PIL supports, including jpg, png, bmp and tif. | ||
|
||
More exotic formats will require a custom dataset class, which is not covered here; realistically, you should convert your data to a more common format. | ||
PNG, for instance, is a lossless format that loads quickly from disk due to its efficient compression. | ||
The bie_train defaults tend to be sane: for instance, the data is shuffled and split into train and validation sets. | ||
|
||
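A shuffled train/validation split can be sketched in a few lines of standard-library Python. The 80/20 ratio, the fixed seed and the `split_dataset` helper are assumptions made for this example; the actual bie_train defaults may differ.

```python
import random


def split_dataset(items, val_fraction=0.2, seed=42):
    """Shuffle `items` reproducibly and split them into (train, validation)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * val_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical file names standing in for an image folder.
files = [f"image_{i:03d}.png" for i in range(10)]
train, val = split_dataset(files)
print(len(train), len(val))  # 8 2
```

Seeding a private `random.Random` instance keeps the split reproducible without touching the global random state.
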
It is worth noting that ImageFolder expects the data to be organised into "classes", even though bie_train does not use the class labels during training by default. | ||
To denote these classes, you should organise your data into folders, where each folder is a class, and the images in that folder are instances of that class. | ||
See here for more information: https://pytorch.org/vision/stable/datasets.html#imagefolder | ||
|
||
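The folder-per-class layout can be illustrated without any image data. The sketch below builds a throwaway directory tree and derives a class-to-index mapping from the sorted sub-folder names, which mirrors how ImageFolder is documented to assign labels. `find_classes` here is a simplified re-implementation for illustration, and the class names are made up.

```python
import tempfile
from pathlib import Path


def find_classes(root: Path) -> dict:
    """Map each sub-folder name of `root` to an integer label (sorted order)."""
    names = sorted(p.name for p in root.iterdir() if p.is_dir())
    return {name: index for index, name in enumerate(names)}


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical classes; each folder would hold the images of that class.
    for class_name in ("nucleus", "cytoplasm"):
        (root / class_name).mkdir()
        (root / class_name / "example.png").touch()
    class_to_idx = find_classes(root)

print(class_to_idx)  # {'cytoplasm': 0, 'nucleus': 1}
```

If your data has no meaningful classes, a single sub-folder containing all images still satisfies this layout.
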
## Models | ||
|
||
The default model backbone is a "resnet18" with a "vae" architecture for autoencoding, but you can specify a different model using the `recipe.model` flag: | ||
|
||
```bash | ||
bie_train recipe.model=resnet50_vqvae recipe.data=/path/to/data | ||
``` | ||
|
||
N.B. the resnet series of models expect the input tensor to be (3, 224, 224) in shape. | ||
|
||
|
||
### Supervised vs Unsupervised models | ||
|
||
By default the model is unsupervised, meaning the class labels are ignored during training. | ||
However, an (experimental) supervised model can be selected by setting: | ||
|
||
```bash | ||
bie_train lit_model.model=_target_="bioimage_embed.lightning.torch.AutoEncoderSupervised" recipe.data=/path/to/data | ||
``` | ||
|
||
This uses contrastive learning on the labelled data, specifically SimCLR: https://arxiv.org/abs/2002.05709 | ||
|
||
## Recipes | ||
|
||
The major components of the training process are controlled by the "recipe" schema. | ||
These values are also used to generate the UUID of the training run. | ||
This means that a model can resume from a crash or be retrained with the same configuration, and that multiple models can be trained in parallel using the same directory. | ||
This is useful for hyperparameter search, or for training multiple models on the same data. | ||
|
||
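One way to derive a stable run identifier from a recipe is to hash a canonical serialisation of its values. The sketch below uses `uuid.uuid5` over key-sorted JSON. This is an assumption about how such an ID could be built, not a description of the project's implementation, and `recipe_uuid` and the example recipes are hypothetical.

```python
import json
import uuid


def recipe_uuid(recipe: dict) -> str:
    """Return a deterministic UUID for a recipe dictionary."""
    # Sorting keys makes the ID independent of insertion order.
    canonical = json.dumps(recipe, sort_keys=True)
    return str(uuid.uuid5(uuid.NAMESPACE_OID, canonical))


recipe_a = {"model": "resnet18_vae", "data": "/path/to/data"}
recipe_b = {"data": "/path/to/data", "model": "resnet18_vae"}
recipe_c = {"model": "resnet50_vqvae", "data": "/path/to/data"}

print(recipe_uuid(recipe_a) == recipe_uuid(recipe_b))  # True
print(recipe_uuid(recipe_a) == recipe_uuid(recipe_c))  # False
```

Identical recipes map to the same identifier, so a crashed run can find its earlier output again, while a changed recipe gets a new identifier and can share the same parent directory without collisions.
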
### lr_scheduler and optimizer | ||
|
||
The lr_scheduler and optimizer mirror the timm library and are built using create_optimizer and create_scheduler. | ||
https://timm.fast.ai/Optimizers | ||
and | ||
https://timm.fast.ai/schedulerss | ||
|
||
The default optimizer is "adamw" and the default scheduler is "cosine", along with some other hyperparameters borrowed from: https://arxiv.org/abs/2110.00476 | ||
|
||
The timm create_* functions receive a generic SimpleNamespace and only take the keys they need. | ||
The consequence is that timm defines a controlled vocabulary for the hyperparameters in the recipe; this makes it possible to choose from the wide variety of optimizers and schedulers in timm. | ||
https://timm.fast.ai | ||
|
||
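The "take only the keys you need" pattern can be shown with a plain `SimpleNamespace`. The two consumer functions below are hypothetical stand-ins for timm's factories, and the field names (`opt`, `lr`, `weight_decay`, `sched`, `epochs`) and fallback values are chosen for illustration; check the timm documentation for the names it actually reads.

```python
from types import SimpleNamespace


def optimizer_settings(args: SimpleNamespace) -> dict:
    """Read only optimizer-related fields, ignoring everything else."""
    return {
        "opt": getattr(args, "opt", "adamw"),
        "lr": getattr(args, "lr", 1e-3),
        "weight_decay": getattr(args, "weight_decay", 0.0),
    }


def scheduler_settings(args: SimpleNamespace) -> dict:
    """Read only scheduler-related fields from the same namespace."""
    return {
        "sched": getattr(args, "sched", "cosine"),
        "epochs": getattr(args, "epochs", 100),
    }


# One shared recipe; each consumer picks out its own subset.
recipe = SimpleNamespace(
    opt="adamw", lr=5e-4, sched="cosine", epochs=50, data="/path/to/data"
)
print(optimizer_settings(recipe))
print(scheduler_settings(recipe))
```

Because each consumer ignores fields it does not know, unrelated keys such as `data` can live in the same namespace without causing errors.
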
## Augmentation | ||
|
||
The package includes a default augmentation, which is stored in the configuration file. | ||
The default augmentation is written using albumentations, a powerful library for image augmentation. | ||
https://albumentations.ai/docs/ | ||
|
||
|
||
The default augmentation is a simple set of augmentations that are useful for biological images; crucially, it mostly avoids RGB and non-physical augmentation effects. | ||
It is recommended to edit the default augmentations in the configuration file and not in the CLI as the commands can get quite long. | ||
|
||
|
||
## Config file | ||
|
||
Running `bie_train` without arguments trains a model using the default configuration. You can also specify a configuration file using the `--config` flag: | ||
|
||
```bash | ||
bie_train --config path/to/config.yaml | ||
``` |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,52 @@ | ||
# Configuration file for the Sphinx documentation builder. | ||
# | ||
# This file only contains a selection of the most common options. For a full | ||
# list see the documentation: | ||
# https://www.sphinx-doc.org/en/master/usage/configuration.html | ||
|
||
# -- Path setup -------------------------------------------------------------- | ||
|
||
# If extensions (or modules to document with autodoc) are in another directory, | ||
# add these directories to sys.path here. If the directory is relative to the | ||
# documentation root, use os.path.abspath to make it absolute, like shown here. | ||
# | ||
# import os | ||
# import sys | ||
# sys.path.insert(0, os.path.abspath('.')) | ||
|
||
|
||
# -- Project information ----------------------------------------------------- | ||
|
||
project = "Bioimage Embed" | ||
copyright = "2024, Craig Russell" | ||
author = "Craig Russell" | ||
|
||
|
||
# -- General configuration --------------------------------------------------- | ||
|
||
# Add any Sphinx extension module names here, as strings. They can be | ||
# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom | ||
# ones. | ||
extensions = ["myst_parser"] | ||
|
||
|
||
# Add any paths that contain templates here, relative to this directory. | ||
templates_path = ["_templates"] | ||
|
||
# List of patterns, relative to source directory, that match files and | ||
# directories to ignore when looking for source files. | ||
# This pattern also affects html_static_path and html_extra_path. | ||
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"] | ||
|
||
|
||
# -- Options for HTML output ------------------------------------------------- | ||
|
||
# The theme to use for HTML and HTML Help pages. See the documentation for | ||
# a list of builtin themes. | ||
# | ||
html_theme = "alabaster" | ||
|
||
# Add any paths that contain custom static files (such as style sheets) here, | ||
# relative to this directory. They are copied after the builtin static files, | ||
# so a file named "default.css" will overwrite the builtin "default.css". | ||
html_static_path = ["_static"] |
Empty file.