Merge pull request #3 from MilagrosMarin/main

Pipeline Architecture Refinement and Enhancement
datajoint · Mar 26, 2024 · 2e87c49
2 parents 1dea4e4 + 83c68d5
Showing 23 changed files with 3,073 additions and 2,967 deletions.
diff --git a/.devcontainer/Dockerfile b/.devcontainer/Dockerfile
@@ -44,8 +44,9 @@ ENV DJ_HOST fakeservices.datajoint.io
 ENV DJ_USER root
 ENV DJ_PASS simple
 
-ENV KPMS_ROOT_DATA_DIR /workspaces/element-moseq/example_data/inbox
-ENV KPMS_ROOT_OUTPUT_DIR /workspaces/element-moseq/example_data/outbox
+ENV DATA_MOUNTPOINT /workspaces/element-moseq/example_data
+ENV KPMS_ROOT_DATA_DIR $DATA_MOUNTPOINT/inbox
+ENV KPMS_PROCESSED_DATA_DIR $DATA_MOUNTPOINT/outbox
 ENV DATABASE_PREFIX neuro_
 
 USER vscode
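
The hunk above derives both data directories from a single `DATA_MOUNTPOINT`. As a minimal sketch of how Python code might resolve the same layout, assuming the three variable names and the default mount point from the Dockerfile (the helper `resolve_kpms_dirs` is hypothetical and not part of element-moseq):

```python
import os
from pathlib import PurePosixPath

# Default taken from the Dockerfile hunk above; container paths are POSIX.
DEFAULT_MOUNTPOINT = "/workspaces/element-moseq/example_data"


def resolve_kpms_dirs(env=None):
    """Return (root_data_dir, processed_data_dir) for the dev container.

    Hypothetical helper: explicit KPMS_* variables win; otherwise both
    directories fall back to <DATA_MOUNTPOINT>/inbox and /outbox.
    """
    env = os.environ if env is None else env
    mountpoint = PurePosixPath(env.get("DATA_MOUNTPOINT", DEFAULT_MOUNTPOINT))
    root = PurePosixPath(env.get("KPMS_ROOT_DATA_DIR", str(mountpoint / "inbox")))
    processed = PurePosixPath(
        env.get("KPMS_PROCESSED_DATA_DIR", str(mountpoint / "outbox"))
    )
    return root, processed


if __name__ == "__main__":
    root_dir, processed_dir = resolve_kpms_dirs()
    print(f"raw data:       {root_dir}")
    print(f"processed data: {processed_dir}")
```

Passing a plain dictionary instead of `os.environ` makes the fallback logic easy to check without touching the real environment.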

diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json
@@ -6,8 +6,8 @@
 	"remoteEnv": {
 		"LOCAL_WORKSPACE_FOLDER": "${localWorkspaceFolder}"
 	},
-	"onCreateCommand": "mkdir -p ${KPMS_ROOT_DATA_DIR} && pip install -e .",
-	"postStartCommand": "docker volume prune -f && s3fs ${DJ_PUBLIC_S3_LOCATION} ${KPMS_ROOT_DATA_DIR} -o nonempty,multipart_size=530,endpoint=us-east-1,url=http://s3.amazonaws.com,public_bucket=1",
+	"onCreateCommand": "mkdir -p ${DATA_MOUNTPOINT} && pip install -e .",
+	"postStartCommand": "docker volume prune -f && s3fs ${DJ_PUBLIC_S3_LOCATION} ${DATA_MOUNTPOINT} -o nonempty,multipart_size=530,endpoint=us-east-1,url=http://s3.amazonaws.com,public_bucket=1",
 	"hostRequirements": {
 		"cpus": 4,
 		"memory": "8gb",

diff --git a/.github/workflows/release.yaml b/.github/workflows/release.yaml
@@ -4,14 +4,6 @@ on:
 jobs:
   make_github_release:
     uses: datajoint/.github/.github/workflows/make_github_release.yaml@main
-  pypi_release:
-    needs: make_github_release
-    uses: datajoint/.github/.github/workflows/pypi_release.yaml@main
-    secrets:
-      TWINE_USERNAME: ${{secrets.TWINE_USERNAME}}
-      TWINE_PASSWORD: ${{secrets.TWINE_PASSWORD}}
-    with:
-      UPLOAD_URL: ${{needs.make_github_release.outputs.release_upload_url}}
   mkdocs_release:
     uses: datajoint/.github/.github/workflows/mkdocs_release.yaml@main
     permissions: 

diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -3,6 +3,20 @@
 Observes [Semantic Versioning](https://semver.org/spec/v2.0.0.html) standard and 
 [Keep a Changelog](https://keepachangelog.com/en/1.0.0/) convention.
 
+## [0.1.1] - 2024-03-21
+
++ Update - Schemas and tables renaming
++ Update - Move `PreFit` and `FullFit` to `moseq_train`
++ Update - Additional attributes and data type modification from `time` to `float` for `duration` to eliminate datetime formatting code
++ Update - Code refactoring in `make` functions and enhanced path handling
++ Update - `docs`, docstrings and table definitions
++ Update - `tutorial.ipynb` according to these changes and verify full functionality with Codespaces
++ Update - pipeline `images` according to these changes
++ Fix - `Dockerfile` environment variables
++ Update - Activation of one schema with two modules by updating `tutorial_pipeline.ipynb`
++ Update - remove PyPI release from `release.yaml`
++ Update - README 
+
 ## [0.1.0] - 2024-03-20
 
 + Add - `CHANGELOG` and version for first release
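
The 0.1.1 entry above changes `duration` from `time` to `float` to eliminate datetime formatting code. A minimal sketch of what computing such a value could look like, assuming the duration is measured in seconds (the unit is not stated in this diff, and `duration_seconds` is a hypothetical helper, not code from the repository):

```python
from datetime import datetime


def duration_seconds(start: datetime, end: datetime) -> float:
    """Elapsed time between two timestamps as a plain float (seconds).

    A float can be stored directly in a `float` attribute, so no
    strftime/strptime round trip is needed.
    """
    return (end - start).total_seconds()


if __name__ == "__main__":
    started = datetime.now()
    # ... model fitting would run here ...
    finished = datetime.now()
    print(duration_seconds(started, finished))
```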

diff --git a/README.md b/README.md
@@ -19,7 +19,7 @@ DataJoint Elements collectively standardize and automate data collection and ana
 + Clone the repository to your computer.
 
   ```bash
-  git clone https://github.com/<enter_github_username>/element-moseq
+  git clone https://github.com/<enter_github_username>/element-moseq.git
   ```
 
 + Install with `pip`:
@@ -72,4 +72,4 @@ MYSQL_VER=8.0 docker compose -f docker-compose-db.yaml up --build -d
 
 1. We recommend you start by navigating to the `notebooks` directory on the left panel and go through the `tutorial.ipynb` Jupyter notebook. Execute the cells in the notebook to begin your walkthrough of the tutorial.
 
-1. Once you are done, see the options available to you in the menu in the bottom-left corner. For example, in Codespace you will have an option to `Stop Current Codespace` but when running Dev Container on your own machine the equivalent option is `Reopen folder locally`. By default, GitHub will also automatically stop the Codespace after 30 minutes of inactivity.  Once the Codespace is no longer being used, we recommend deleting the Codespace.
+2. Once you are done, see the options available to you in the menu in the bottom-left corner. For example, in Codespace you will have an option to `Stop Current Codespace` but when running Dev Container on your own machine the equivalent option is `Reopen folder locally`. By default, GitHub will also automatically stop the Codespace after 30 minutes of inactivity.  Once the Codespace is no longer being used, we recommend deleting the Codespace.
diff --git a/docs/src/citation.md b/docs/src/citation.md
@@ -10,4 +10,9 @@ If your work uses the following resources, please cite the respective manuscript
   + [RRID:SCR_021894](https://scicrunch.org/resolver/SCR_021894)
 
 + Keypoint-MoSeq
-  + [Manuscripts](https://www.biorxiv.org/content/10.1101/2023.03.16.532307v2.full.pdf)
+  + Weinreb C, Pearl J, Lin S, Osman MAM, Zhang L, Annapragada S, Conlin E, Hoffman R, 
+  Makowska S, Gillis WF and Jay M. Keypoint-MoSeq: parsing behavior by linking point 
+  tracking to pose dynamics. BioRxiv. 2023 Dec 23. doi: https://doi.org/10.1101/2023.03.16.532307
+  + Wiltschko AB, Johnson MJ, Iurilli G, Peterson RE, Katon JM, Pashkovski SL, Abraira VE, 
+  Adams RP, Datta SR. Mapping sub-second structure in mouse behavior. Neuron. 2015 Dec 16;
+  88(6):1121-35.
diff --git a/docs/src/concepts.md b/docs/src/concepts.md
@@ -23,6 +23,4 @@ Key features include:
 - Loading and formatting of 2D deeplabcut keypoint tracking data for model training
 - Queue management and initiation of Keypoint-MoSeq analysis across multiple sessions
 - Ingestion of analysis outcomes such as PCA, AR-HMM, and Keypoint-SLDS components
-- Ingestion of analysis outcomes from motion sequencing inference 
-
-
+- Ingestion of analysis outcomes from motion sequencing inference
diff --git a/docs/src/index.md b/docs/src/index.md
@@ -2,7 +2,8 @@
 
 DataJoint Element for Motion Sequencing with 
 [Keypoint-MoSeq](https://github.com/dattalab/keypoint-moseq){:target="_blank"}, 
-from keypoint data extracted with [DeepLabCut](x){:target="_blank"}. DataJoint Elements collectively standardize and automate
+from keypoint data extracted with [DeepLabCut](http://www.mackenziemathislab.org/deeplabcut){:target="_blank"}. 
+DataJoint Elements collectively standardize and automate
 data collection and analysis for neuroscience experiments. Each Element is a modular
 pipeline for data storage and processing with corresponding database tables that can be
 combined with other Elements to assemble a fully functional pipeline.

diff --git a/docs/src/partnerships.md b/docs/src/partnerships.md
@@ -1,3 +1,3 @@
 # Key partnerships
 
-Element MoSeq was developed in collaboration with the [Keypoint-MoSeq developers](https://github.com/dattalab/keypoint-moseq) in Datta's Lab at Harvard Medical School to promote integration and interoperability between Keypoint-MoSeq and the DataJoint Element MoSeq.
+Element MoSeq was developed in collaboration with the [Keypoint-MoSeq developers](https://github.com/dattalab/keypoint-moseq), particularly with Kai Fox from Datta's Lab at Harvard Medical School, to foster integration and interoperability between Keypoint-MoSeq and the DataJoint Element MoSeq.
diff --git a/docs/src/pipeline.md b/docs/src/pipeline.md
@@ -5,21 +5,21 @@ corresponding table in the database.  Within the pipeline, Element MoSeq
 connects to upstream Elements including Lab, Animal, Session, and Event. For more 
 detailed documentation on each table, see the API docs for the respective schemas.
 
-The Element is composed of two main schemas, `kpms_pca` and `kpms_model`. The `kpms_pca` schema is designed to handle the analysis and ingestion of PCA model for formatted keypoint tracking. The `kpms_model` schema is designed to handle the analysis and ingestion of Keypoint-MoSeq's motion sequencing on video recordings.
+The Element is composed of two main schemas, `moseq_train` and `moseq_infer`. The `moseq_train` schema is designed to handle the analysis and ingestion of a PCA model for formatted keypoint tracking and the training of the Keypoint-MoSeq model. The `moseq_infer` schema is designed to handle the analysis and ingestion of Keypoint-MoSeq's motion sequencing on video recordings by using one registered model.
 
 ## Diagrams
 
-### `kpms_pca` module
+### `moseq_train` module
 
-- The `kpms_pca` schema is designed to handle the analysis and ingestion of a PCA model for formatted keypoint tracking.
+- The `moseq_train` schema is designed to handle the analysis and ingestion of a PCA model for formatted keypoint tracking and the training of the Keypoint-MoSeq model.
 
-     ![pipeline](https://raw.githubusercontent.com/datajoint/element-moseq/main/images/pipeline_kpms_pca.svg)
+     ![pipeline](https://raw.githubusercontent.com/datajoint/element-moseq/main/images/pipeline_moseq_train.svg)
 
-### `kpms_model` module
+### `moseq_infer` module
 
-- The `kpms_model` schema is designed to handle the analysis and ingestion of Keypoint-MoSeq's motion sequencing on video recordings.
+- The `moseq_infer` schema is designed to handle the analysis and ingestion of Keypoint-MoSeq's motion sequencing on video recordings by using one registered model.
 
-     ![pipeline](https://raw.githubusercontent.com/datajoint/element-moseq/main/images/pipeline_kpms_model.svg)
+     ![pipeline](https://raw.githubusercontent.com/datajoint/element-moseq/main/images/pipeline_moseq_infer.svg)
 
 ## Table Descriptions
 
@@ -49,36 +49,35 @@ The Element is composed of two main schemas, `kpms_pca` and `kpms_model`. The `k
 | --- | --- |
 | Session | Unique experimental session identifier |
 
-### `kpms_pca` schema
+### `moseq_train` schema
 
-- For further details see the [kpms_pca schema API docs](https://datajoint.com/docs/elements/element-moseq/latest/api/element_moseq/kpms_pca/)
+- For further details see the [`moseq_train` schema API docs](https://datajoint.com/docs/elements/element-moseq/latest/api/element_moseq/moseq_train/)
 
 | Table | Description |
 | --- | --- |
-| PoseEstimationMethod | Table to store the pose estimation methods supported by the keypoint loader of `keypoint-moseq` package. |
-| KeypointSet | Table to store the keypoint data and video set directory to train the model.|
-| KeypointSet.VideoFile | IDs and file paths of each video file that will be used to train the model.|
-| Bodyparts | Table to store the body parts to use in the analysis.|
+| KeypointSet | Store keypoint data and video set directory for model training.|
+| KeypointSet.VideoFile | IDs and file paths of each video file that will be used for model training. |
+| Bodyparts | Store the body parts to use in the analysis. |
 | PCATask | Staging table to define the PCA task and its output directory. |
-| LoadKeypointSet | Table to create the `kpms_project_output_dir`, and create and update the `config.yml` by creating a new `dj_config.yml`. |
-| PCAFitting | Automated fitting of the PCA model.|
-| LatentDimension | Automated computation to calculate the latent dimension as one of the autoregressive hyperparameters (`ar_hypparams`) necessary for the model fitting. |
+| PCAPrep | Set up the Keypoint-MoSeq project output directory (`kpms_project_output_dir`), create the default `config.yml`, and write the updated configuration to a new `dj_config.yml`. |
+| PCAFit | Fit the PCA model. |
+| LatentDimension | Calculate the latent dimension as one of the autoregressive hyperparameters (`ar_hypparams`) necessary for the model fitting. |
+| PreFitTask | Specify parameters for model (AR-HMM) pre-fitting. |
+| PreFit | Fit the AR-HMM model. |
+| FullFitTask | Specify parameters for the model full-fitting. |
+| FullFit | Fit the full (Keypoint-SLDS) model. |
 
+### `moseq_infer` schema
 
-### `kpms_model` schema
-
-- For further details see the [kpms_model schema API docs](https://datajoint.com/docs/elements/element-moseq/latest/api/element_moseq/kpms_model/)
+- For further details see the [`moseq_infer` schema API docs](https://datajoint.com/docs/elements/element-moseq/latest/api/element_moseq/moseq_infer/)
 
 | Table | Description |
 | --- | --- |
-| PreFittingTask | Table to specify the parameters for the pre-fitting (AR-HMM) of the model. |
-| PreFitting | Automated computation to fit a AR-HMM model. |
-| FullFittingTask | Table to specify the parameters for the full fitting of the model. The full model will generally require a lower value of kappa to yield the same target syllable durations. |
-| FullFitting | Automated computation to fit the full model. |
-| Model | Table to register the models. |
+| Model | Register a model. |
 | VideoRecording | Set of video recordings for the Keypoint-MoSeq inference. |
 | VideoRecording.File | File IDs and paths associated with a given `recording_id`. |
-| InferenceTask | Table to specify the model, the video set, and the output directory for the inference task. |
-| Inference | This table is used to infer the model results from the checkpoint file and save them to `{output_dir}/{model_name}/{inference_output_dir}/results.h5`. |
-| Inference.MotionSequence | This table is used to store the results of the model inference.|
-| Inference.GridMoviesSampledInstances | This table is used to store the grid movies sampled instances.|
+| PoseEstimationMethod | Pose estimation methods supported by the keypoint loader of `keypoint-moseq` package. |
+| InferenceTask | Staging table to define the Inference task and its output directory. |
+| Inference | Run model inference from the checkpoint file and save the results to a `results.h5` file. |
+| Inference.MotionSequence | Results of the model inference. |
+| Inference.GridMoviesSampledInstances | Store the sampled instances of the grid movies. |
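
The table changes above both rename tables and move some of them between schemas, so a blanket schema rename would be wrong for `PreFit`, `FullFit`, and `PoseEstimationMethod`. The sketch below encodes the old-to-new mapping as read from this diff. The helper `migrate_table_name` is hypothetical, and the `LoadKeypointSet` to `PCAPrep` pairing is an inference from the matching descriptions rather than something the diff states outright.

```python
# Old top-level table name -> (new schema, new table name), as read from the diff.
TABLE_MAP = {
    # formerly in kpms_pca
    "PoseEstimationMethod": ("moseq_infer", "PoseEstimationMethod"),
    "KeypointSet": ("moseq_train", "KeypointSet"),
    "Bodyparts": ("moseq_train", "Bodyparts"),
    "PCATask": ("moseq_train", "PCATask"),
    "LoadKeypointSet": ("moseq_train", "PCAPrep"),  # inferred pairing
    "PCAFitting": ("moseq_train", "PCAFit"),
    "LatentDimension": ("moseq_train", "LatentDimension"),
    # formerly in kpms_model
    "PreFittingTask": ("moseq_train", "PreFitTask"),
    "PreFitting": ("moseq_train", "PreFit"),
    "FullFittingTask": ("moseq_train", "FullFitTask"),
    "FullFitting": ("moseq_train", "FullFit"),
    "Model": ("moseq_infer", "Model"),
    "VideoRecording": ("moseq_infer", "VideoRecording"),
    "InferenceTask": ("moseq_infer", "InferenceTask"),
    "Inference": ("moseq_infer", "Inference"),
}

OLD_SCHEMAS = {"kpms_pca", "kpms_model"}


def migrate_table_name(old_name: str) -> str:
    """Translate 'old_schema.Table[.Part]' into its post-refactor name.

    Part tables (e.g. 'Inference.MotionSequence') follow their parent table.
    """
    schema, table, *parts = old_name.split(".")
    if schema not in OLD_SCHEMAS:
        raise ValueError(f"not a pre-refactor schema: {schema!r}")
    new_schema, new_table = TABLE_MAP[table]  # KeyError for unknown tables
    return ".".join([new_schema, new_table, *parts])


if __name__ == "__main__":
    for name in ("kpms_model.PreFittingTask", "kpms_pca.PoseEstimationMethod"):
        print(name, "->", migrate_table_name(name))
```

Keying the lookup on the table name rather than the schema is what lets moved tables land in the right module.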