Pipeline Architecture Refinement and Enhancement #3

Merged
merged 60 commits into from
Mar 26, 2024
Changes from 52 commits
11f97b0
Thinh's suggestion: pull/2#discussion_r1532180529
MilagrosMarin Mar 20, 2024
71d72fe
Suggestion pull/2#discussion_r1532189430_
MilagrosMarin Mar 20, 2024
e77f2f2
Suggestion pull/2#discussion_r1532233183_
MilagrosMarin Mar 20, 2024
09d8009
Suggestion pull/2#discussion_r1532363392_
MilagrosMarin Mar 20, 2024
f0f60c9
suggestion pull/2#discussion_r1532363814_
MilagrosMarin Mar 20, 2024
961a77b
suggestion pull/2#discussion_r1532329205_
MilagrosMarin Mar 20, 2024
57ae481
suggestion pull/2#discussion_r1532337714_
MilagrosMarin Mar 20, 2024
2c929a7
suggestion pull/2#discussion_r1532339169_
MilagrosMarin Mar 20, 2024
22a05dc
suggestion pull/2#discussion_r1532373250_
MilagrosMarin Mar 20, 2024
42f3168
suggestion pull/2#discussion_r1532375217_
MilagrosMarin Mar 20, 2024
065d014
suggestion pull/2#discussion_r1532403101_
MilagrosMarin Mar 20, 2024
58ab1a3
suggestion pull/2#discussion_r1532421093_
MilagrosMarin Mar 20, 2024
e91afcd
suggestion pull/2#discussion_r1532435698_
MilagrosMarin Mar 20, 2024
db7c393
suggestion pull/2#discussion_r1532440194_
MilagrosMarin Mar 20, 2024
ba450ae
suggestion pull/2#discussion_r1532188097_
MilagrosMarin Mar 20, 2024
4c07e50
update `IMAGING_ROOT_DATA_DIR` to /example_data
MilagrosMarin Mar 20, 2024
fd94abb
update `citation.md`
MilagrosMarin Mar 20, 2024
105157f
kpms_pca->moseq_train & kpms_model->moseq_infer
MilagrosMarin Mar 20, 2024
d2064bc
move prefitting and fullfitting to `moseq_train`
MilagrosMarin Mar 20, 2024
96ed977
eliminate redundancy of `get_kpms_x_data_dir` func
MilagrosMarin Mar 20, 2024
b8e8eac
one activation for both of the modules
MilagrosMarin Mar 20, 2024
e24377f
suggestion pull/2#discussion_r1532921555_
MilagrosMarin Mar 20, 2024
4889c8d
refactor `make` of `Inference` 2#disc_r1532924691_
MilagrosMarin Mar 20, 2024
0b918fb
suggestion pull/2#discussion_r1532926679_
MilagrosMarin Mar 20, 2024
5c7dfed
suggestion pull/2#discussion_r1532927138_
MilagrosMarin Mar 20, 2024
1dc9506
revert git mv moseq_infer -> kpms_model
MilagrosMarin Mar 20, 2024
4151274
git mv `kpms_model` -> `moseq_infer.py`
MilagrosMarin Mar 20, 2024
d201bc4
update Dockerfile pointing to inbox and outbox
MilagrosMarin Mar 20, 2024
142ad44
from XFitting to PCAFit, PreFit & FullFit
MilagrosMarin Mar 20, 2024
52f6fa5
update `images`
MilagrosMarin Mar 20, 2024
e555e73
add import two dependencies
MilagrosMarin Mar 20, 2024
761146b
update tutorial notebook
MilagrosMarin Mar 20, 2024
09d580d
git mv `tutorial_pipeline.py` to `tests`
MilagrosMarin Mar 22, 2024
4a2464f
from previous commit
MilagrosMarin Mar 22, 2024
a58389a
major changes to `moseq_train`
MilagrosMarin Mar 22, 2024
760f745
major changes to `moseq_infer`
MilagrosMarin Mar 22, 2024
e0cbd7f
bump version and update changelog
MilagrosMarin Mar 22, 2024
0f74e35
revert mv `tutorial_pipeline.py` to `tests`
MilagrosMarin Mar 25, 2024
0bd5155
update changelog
MilagrosMarin Mar 25, 2024
35e6ea0
update `pipeline.md` with new architecture
MilagrosMarin Mar 25, 2024
3de9e01
update `tutorial.ipynb`
MilagrosMarin Mar 25, 2024
f220165
update `images`
MilagrosMarin Mar 25, 2024
7553b9c
update `moseq_train.py`
MilagrosMarin Mar 25, 2024
6c8e707
update `moseq_infer.py`
MilagrosMarin Mar 25, 2024
edaa0a9
add cite in `citation.md`
MilagrosMarin Mar 25, 2024
3ba3a49
update CHANGELOG
MilagrosMarin Mar 25, 2024
51bba6d
black formatting
MilagrosMarin Mar 25, 2024
20a1960
Update CHANGELOG.md
MilagrosMarin Mar 25, 2024
9d9f8b8
Update CHANGELOG.md
MilagrosMarin Mar 25, 2024
b04a80c
Update CHANGELOG.md
MilagrosMarin Mar 25, 2024
7e58ffe
refactor `PCAPrep` `make` function
MilagrosMarin Mar 25, 2024
441ce35
fix env variables DevContainer
MilagrosMarin Mar 25, 2024
6cb9056
black formatting `version`
MilagrosMarin Mar 25, 2024
32a8965
Merge remote-tracking branch 'origin/main'
MilagrosMarin Mar 25, 2024
9b36f16
black formatting `moseq_train`
MilagrosMarin Mar 25, 2024
f5b2a66
added suggestions from this PR
MilagrosMarin Mar 25, 2024
8485325
refactor `setup_project` code block
MilagrosMarin Mar 25, 2024
cd2e872
`kappa` type from `int` to `float`
MilagrosMarin Mar 25, 2024
1f1831c
update `tutorial.ipynb`
MilagrosMarin Mar 25, 2024
83c68d5
black formatting
MilagrosMarin Mar 25, 2024
5 changes: 3 additions & 2 deletions .devcontainer/Dockerfile
@@ -44,8 +44,9 @@ ENV DJ_HOST fakeservices.datajoint.io
ENV DJ_USER root
ENV DJ_PASS simple

ENV KPMS_ROOT_DATA_DIR /workspaces/element-moseq/example_data/inbox
ENV KPMS_ROOT_OUTPUT_DIR /workspaces/element-moseq/example_data/outbox
ENV DATA_MOUNTPOINT /workspaces/element-moseq/example_data
ENV KPMS_ROOT_DATA_DIR $DATA_MOUNTPOINT/inbox
ENV KPMS_PROCESSED_DATA_DIR $DATA_MOUNTPOINT/outbox
ENV DATABASE_PREFIX neuro_

USER vscode
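
The hunk above introduces `DATA_MOUNTPOINT` and derives the inbox and outbox directories from it. A minimal sketch of how Python code might resolve the same three variables is shown below. The function name `resolve_kpms_dirs` is hypothetical and is not part of the Element; how the Element itself reads these variables is not shown in this diff.

```python
import os
from pathlib import PurePosixPath

# Default taken from the Dockerfile hunk above.
DEFAULT_MOUNTPOINT = "/workspaces/element-moseq/example_data"


def resolve_kpms_dirs(env=None):
    """Return (inbox, outbox) container paths from environment variables.

    Hypothetical helper: falls back to <DATA_MOUNTPOINT>/inbox and
    <DATA_MOUNTPOINT>/outbox when the specific variables are unset,
    mirroring the layout declared in the Dockerfile.
    """
    env = os.environ if env is None else env
    mount = PurePosixPath(env.get("DATA_MOUNTPOINT", DEFAULT_MOUNTPOINT))
    inbox = PurePosixPath(env.get("KPMS_ROOT_DATA_DIR", str(mount / "inbox")))
    outbox = PurePosixPath(env.get("KPMS_PROCESSED_DATA_DIR", str(mount / "outbox")))
    return inbox, outbox
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to check without touching the real environment.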
4 changes: 2 additions & 2 deletions .devcontainer/devcontainer.json
@@ -6,8 +6,8 @@
"remoteEnv": {
"LOCAL_WORKSPACE_FOLDER": "${localWorkspaceFolder}"
},
"onCreateCommand": "mkdir -p ${KPMS_ROOT_DATA_DIR} && pip install -e .",
"postStartCommand": "docker volume prune -f && s3fs ${DJ_PUBLIC_S3_LOCATION} ${KPMS_ROOT_DATA_DIR} -o nonempty,multipart_size=530,endpoint=us-east-1,url=http://s3.amazonaws.com,public_bucket=1",
"onCreateCommand": "mkdir -p ${DATA_MOUNTPOINT} && pip install -e .",
"postStartCommand": "docker volume prune -f && s3fs ${DJ_PUBLIC_S3_LOCATION} ${DATA_MOUNTPOINT} -o nonempty,multipart_size=530,endpoint=us-east-1,url=http://s3.amazonaws.com,public_bucket=1",
"hostRequirements": {
"cpus": 4,
"memory": "8gb",
8 changes: 0 additions & 8 deletions .github/workflows/release.yaml
@@ -4,14 +4,6 @@ on:
jobs:
make_github_release:
uses: datajoint/.github/.github/workflows/make_github_release.yaml@main
pypi_release:
needs: make_github_release
uses: datajoint/.github/.github/workflows/pypi_release.yaml@main
secrets:
TWINE_USERNAME: ${{secrets.TWINE_USERNAME}}
TWINE_PASSWORD: ${{secrets.TWINE_PASSWORD}}
with:
UPLOAD_URL: ${{needs.make_github_release.outputs.release_upload_url}}
mkdocs_release:
uses: datajoint/.github/.github/workflows/mkdocs_release.yaml@main
permissions:
14 changes: 14 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,20 @@
Observes [Semantic Versioning](https://semver.org/spec/v2.0.0.html) standard and
[Keep a Changelog](https://keepachangelog.com/en/1.0.0/) convention.

## [0.1.1] - 2024-03-21

+ Update - Schemas and tables renaming
+ Update - Move `PreFit` and `FullFit` to `moseq_train`
+ Update - Additional attributes and data type modification from `time` to `float` for `duration` to eliminate datetime formatting code
+ Update - Code refactoring in `make` functions and enhanced path handling
+ Update - `docs`, docstrings and table definitions
+ Update - `tutorial.ipynb` according to these changes and verify full functionality with Codespaces
+ Update - pipeline `images` according to these changes
+ Fix - `Dockerfile` environment variables
+ Update - Activation of one schema with two modules by updating `tutorial_pipeline.py`
+ Update - Remove PyPI release from `release.yaml`
+ Update - README
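
The changelog entry on `duration` describes a switch from a `time` type to `float`. A minimal sketch of why that removes formatting code is shown below, assuming `duration` records an elapsed interval in seconds (the unit is an assumption; the diff does not state it). The function name `duration_seconds` is hypothetical.

```python
from datetime import datetime


def duration_seconds(start, end):
    """Elapsed time between two datetimes as a plain float (seconds).

    Storing this float directly avoids converting the interval to and
    from an HH:MM:SS representation before insertion.
    """
    return (end - start).total_seconds()
```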

## [0.1.0] - 2024-03-20

+ Add - `CHANGELOG` and version for first release
4 changes: 2 additions & 2 deletions README.md
@@ -19,7 +19,7 @@ DataJoint Elements collectively standardize and automate data collection and ana
+ Clone the repository to your computer.

```bash
git clone https://github.com/<enter_github_username>/element-moseq
git clone https://github.com/<enter_github_username>/element-moseq.git
```

+ Install with `pip`:
@@ -72,4 +72,4 @@ MYSQL_VER=8.0 docker compose -f docker-compose-db.yaml up --build -d

1. We recommend you start by navigating to the `notebooks` directory on the left panel and go through the `tutorial.ipynb` Jupyter notebook. Execute the cells in the notebook to begin your walkthrough of the tutorial.

1. Once you are done, see the options available to you in the menu in the bottom-left corner. For example, in Codespace you will have an option to `Stop Current Codespace` but when running Dev Container on your own machine the equivalent option is `Reopen folder locally`. By default, GitHub will also automatically stop the Codespace after 30 minutes of inactivity. Once the Codespace is no longer being used, we recommend deleting the Codespace.
2. Once you are done, see the options available to you in the menu in the bottom-left corner. For example, in Codespace you will have an option to `Stop Current Codespace` but when running Dev Container on your own machine the equivalent option is `Reopen folder locally`. By default, GitHub will also automatically stop the Codespace after 30 minutes of inactivity. Once the Codespace is no longer being used, we recommend deleting the Codespace.
7 changes: 6 additions & 1 deletion docs/src/citation.md
@@ -10,4 +10,9 @@ If your work uses the following resources, please cite the respective manuscript
+ [RRID:SCR_021894](https://scicrunch.org/resolver/SCR_021894)

+ Keypoint-MoSeq
+ [Manuscripts](https://www.biorxiv.org/content/10.1101/2023.03.16.532307v2.full.pdf)
+ Weinreb C, Pearl J, Lin S, Osman MAM, Zhang L, Annapragada S, Conlin E, Hoffman R,
Makowska S, Gillis WF and Jay M. Keypoint-MoSeq: parsing behavior by linking point
tracking to pose dynamics. BioRxiv. 2023 Dec 23. doi: https://doi.org/10.1101/2023.03.16.532307
+ Wiltschko AB, Johnson MJ, Iurilli G, Peterson RE, Katon JM, Pashkovski SL, Abraira VE,
Adams RP, Datta SR. Mapping sub-second structure in mouse behavior. Neuron. 2015 Dec 16;
88(6):1121-35.
4 changes: 1 addition & 3 deletions docs/src/concepts.md
@@ -23,6 +23,4 @@ Key features include:
- Loading and formatting of 2D deeplabcut keypoint tracking data for model training
- Queue management and initiation of Keypoint-MoSeq analysis across multiple sessions
- Ingestion of analysis outcomes such as PCA, AR-HMM, and Keypoint-SLDS components
- Ingestion of analysis outcomes from motion sequencing inference


- Ingestion of analysis outcomes from motion sequencing inference
3 changes: 2 additions & 1 deletion docs/src/index.md
@@ -2,7 +2,8 @@

DataJoint Element for Motion Sequencing with
[Keypoint-MoSeq](https://github.com/dattalab/keypoint-moseq){:target="_blank"},
from keypoint data extracted with [DeepLabCut](x){:target="_blank"}. DataJoint Elements collectively standardize and automate
from keypoint data extracted with [DeepLabCut](http://www.mackenziemathislab.org/deeplabcut){:target="_blank"}.
DataJoint Elements collectively standardize and automate
data collection and analysis for neuroscience experiments. Each Element is a modular
pipeline for data storage and processing with corresponding database tables that can be
combined with other Elements to assemble a fully functional pipeline.
2 changes: 1 addition & 1 deletion docs/src/partnerships.md
@@ -1,3 +1,3 @@
# Key partnerships

Element MoSeq was developed in collaboration with the [Keypoint-MoSeq developers](https://github.com/dattalab/keypoint-moseq) in Datta's Lab at Harvard Medical School to promote integration and interoperability between Keypoint-MoSeq and the DataJoint Element MoSeq.
Element MoSeq was developed in collaboration with the [Keypoint-MoSeq developers](https://github.com/dattalab/keypoint-moseq), particularly with Kai Fox from Datta's Lab at Harvard Medical School, to foster integration and interoperability between Keypoint-MoSeq and the DataJoint Element MoSeq.
55 changes: 27 additions & 28 deletions docs/src/pipeline.md
@@ -5,21 +5,21 @@ corresponding table in the database. Within the pipeline, Element MoSeq
connects to upstream Elements including Lab, Animal, Session, and Event. For more
detailed documentation on each table, see the API docs for the respective schemas.

The Element is composed of two main schemas, `kpms_pca` and `kpms_model`. The `kpms_pca` schema is designed to handle the analysis and ingestion of PCA model for formatted keypoint tracking. The `kpms_model` schema is designed to handle the analysis and ingestion of Keypoint-MoSeq's motion sequencing on video recordings.
The Element is composed of two main schemas, `moseq_train` and `moseq_infer`. The `moseq_train` schema handles the analysis and ingestion of a PCA model for formatted keypoint tracking, as well as training of the Keypoint-MoSeq model. The `moseq_infer` schema handles the analysis and ingestion of Keypoint-MoSeq's motion sequencing on video recordings, using one registered model.

## Diagrams

### `kpms_pca` module
### `moseq_train` module

- The `kpms_pca` schema is designed to handle the analysis and ingestion of a PCA model for formatted keypoint tracking.
- The `moseq_train` schema handles the analysis and ingestion of a PCA model for formatted keypoint tracking, as well as training of the Keypoint-MoSeq model.

![pipeline](https://raw.githubusercontent.com/datajoint/element-moseq/main/images/pipeline_kpms_pca.svg)
![pipeline](https://raw.githubusercontent.com/datajoint/element-moseq/main/images/pipeline_moseq_train.svg)

### `kpms_model` module
### `moseq_infer` module

- The `kpms_model` schema is designed to handle the analysis and ingestion of Keypoint-MoSeq's motion sequencing on video recordings.
- The `moseq_infer` schema is designed to handle the analysis and ingestion of Keypoint-MoSeq's motion sequencing on video recordings by using one registered model.

![pipeline](https://raw.githubusercontent.com/datajoint/element-moseq/main/images/pipeline_kpms_model.svg)
![pipeline](https://raw.githubusercontent.com/datajoint/element-moseq/main/images/pipeline_moseq_infer.svg)

## Table Descriptions

@@ -49,36 +49,35 @@ The Element is composed of two main schemas, `kpms_pca` and `kpms_model`. The `k
| --- | --- |
| Session | Unique experimental session identifier |

### `kpms_pca` schema
### `moseq_train` schema

- For further details see the [kpms_pca schema API docs](https://datajoint.com/docs/elements/element-moseq/latest/api/element_moseq/kpms_pca/)
- For further details see the [`moseq_train` schema API docs](https://datajoint.com/docs/elements/element-moseq/latest/api/element_moseq/moseq_train/)

| Table | Description |
| --- | --- |
| PoseEstimationMethod | Table to store the pose estimation methods supported by the keypoint loader of `keypoint-moseq` package. |
| KeypointSet | Table to store the keypoint data and video set directory to train the model.|
| KeypointSet.VideoFile | IDs and file paths of each video file that will be used to train the model.|
| Bodyparts | Table to store the body parts to use in the analysis.|
| KeypointSet | Store keypoint data and video set directory for model training.|
| KeypointSet.VideoFile | IDs and file paths of each video file that will be used for model training. |
| Bodyparts | Store the body parts to use in the analysis. |
| PCATask | Staging table to define the PCA task and its output directory. |
| LoadKeypointSet | Table to create the `kpms_project_output_dir`, and create and update the `config.yml` by creating a new `dj_config.yml`. |
| PCAFitting | Automated fitting of the PCA model.|
| LatentDimension | Automated computation to calculate the latent dimension as one of the autoregressive hyperparameters (`ar_hypparams`) necessary for the model fitting. |
| PCAPrep | Set up the Keypoint-MoSeq project output directory (`kpms_project_output_dir`), creating the default `config.yml` and updating it in a new `dj_config.yml`. |
| PCAFit | Fit the PCA model. |
| LatentDimension | Calculate the latent dimension as one of the autoregressive hyperparameters (`ar_hypparams`) necessary for the model fitting. |
| PreFitTask | Specify parameters for model (AR-HMM) pre-fitting. |
| PreFit | Fit the AR-HMM model. |
| FullFitTask | Specify parameters for the model full-fitting. |
| FullFit | Fit the full (Keypoint-SLDS) model. |

### `moseq_infer` schema

### `kpms_model` schema

- For further details see the [kpms_model schema API docs](https://datajoint.com/docs/elements/element-moseq/latest/api/element_moseq/kpms_model/)
- For further details see the [`moseq_infer` schema API docs](https://datajoint.com/docs/elements/element-moseq/latest/api/element_moseq/moseq_infer/)

| Table | Description |
| --- | --- |
| PreFittingTask | Table to specify the parameters for the pre-fitting (AR-HMM) of the model. |
| PreFitting | Automated computation to fit a AR-HMM model. |
| FullFittingTask | Table to specify the parameters for the full fitting of the model. The full model will generally require a lower value of kappa to yield the same target syllable durations. |
| FullFitting | Automated computation to fit the full model. |
| Model | Table to register the models. |
| Model | Register a model. |
| VideoRecording | Set of video recordings for the Keypoint-MoSeq inference. |
| VideoRecording.File | File IDs and paths associated with a given `recording_id`. |
| InferenceTask | Table to specify the model, the video set, and the output directory for the inference task. |
| Inference | This table is used to infer the model results from the checkpoint file and save them to `{output_dir}/{model_name}/{inference_output_dir}/results.h5`. |
| Inference.MotionSequence | This table is used to store the results of the model inference.|
| Inference.GridMoviesSampledInstances | This table is used to store the grid movies sampled instances.|
| PoseEstimationMethod | Pose estimation methods supported by the keypoint loader of `keypoint-moseq` package. |
| InferenceTask | Staging table to define the Inference task and its output directory. |
| Inference | Run model inference from the checkpoint file and save the results to a `results.h5` file. |
| Inference.MotionSequence | Results of the model inference. |
| Inference.GridMoviesSampledInstances | Store the sampled instances of the grid movies. |
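
The earlier `Inference` description in this hunk gives the results location as `{output_dir}/{model_name}/{inference_output_dir}/results.h5`. A minimal sketch of assembling that path is shown below. The function name `inference_results_path` and the example directory names are hypothetical, and whether the refactored `make` still uses this exact layout is not confirmed by the diff.

```python
from pathlib import PurePosixPath


def inference_results_path(output_dir, model_name, inference_output_dir):
    """Build the results path following the layout in the table description:
    {output_dir}/{model_name}/{inference_output_dir}/results.h5
    """
    return PurePosixPath(output_dir) / model_name / inference_output_dir / "results.h5"


# Example with made-up directory names.
example = inference_results_path("/outbox", "model_a", "inference_1")
```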