Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Pipeline Architecture Refinement and Enhancement #3

Merged
merged 60 commits into from
Mar 26, 2024

Conversation

MilagrosMarin
Copy link
Contributor

@MilagrosMarin MilagrosMarin commented Mar 20, 2024

This PR incorporates feedback from previous PR#2 reviewers and implements modifications to the pipeline architecture.

Changes in pipeline:

  • Schema and table renaming:

    • Renamed kpms_pca to moseq_train
    • Renamed kpms_model to moseq_infer
    • Renamed LoadKeypointSet to PCAPrep
    • Renamed PrefittingTask to PreFitTask
    • Renamed PreFitting to PreFit
    • Renamed FullFittingTask to FullFit
    • Renamed FullFitting to FullFit
    • Renamed PCAFitting to PCAFit
  • Moved tables between schemas:

    • Moved PreFit and FullFit from moseq_infer to moseq_train
  • Additional attributes and data type modifications:

    • Added secondary attributes (frame_rate, model_desc,etc) and renamed some of them according to the previous pipeline modification ('pre_fit_duration,pre_fit_desc`, etc).
    • Modified data types for inference_duration, pre_fit_duration, and full_fit_duration from time to float and removed time formatting code
  • Code refactoring and reorganization:

    • Refactored make function of Inference and PCAPrep tables
    • Reorganized import dependencies and attribute order within table definitions for improved code readability and adherence to best practices.
    • Updated path handling and variable usage for enhanced clarity and consistency

Updates according to main changes:

  • Updated citation.md, index.md, partnerships.md, pipeline.md
  • Updated docstrings and descriptions of all tables
  • Updated tutorial.ipynb
  • Verified full functionality of the tutorial with Codespaces *
  • Updated pipeline images

Other major changes:

  • Activation of one schema with two modules by updating tutorial_pipeline.ipynb
  • Fixed Dockerfile environment variables
  • Add kpms_project_output_dir folder in the s3 bucket for tutorial purposes

Other minor changes:

  • Removed PyPi release from release.yml
  • Updated README
  • Update CHANGELOG and bump version
  • The tutorial's full functionality in Codespaces relies on mounting the outbox files from the public S3 bucket and integrating load functions for these files into the pipeline code. This is necessary because Codespaces does not provide write access to the public S3 bucket. We plan to implement this in the upcoming Pull Request.

@MilagrosMarin MilagrosMarin changed the title [WIP] New revisions Pipeline Architecture Refinement and Enhancement Mar 25, 2024
"pose_estimation_method",
)

kpms_root_inbox, kpms_root_outbox = get_kpms_root_data_dir()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no guarantee that get_kpms_root_data_dir() will return 2 directories. And there is no notion of inbox or outbox in the elements

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will modify this by using root and processed instead, as follows:

        kpms_root = moseq_infer.get_kpms_root_data_dir()
        kpms_processed = moseq_infer.get_kpms_processed_data_dir()

kpms_root_inbox, kpms_root_outbox = get_kpms_root_data_dir()
model_dir = find_full_path(
kpms_root_outbox,
(Model & "model_id = {}".format(model_id)).fetch1("model_dir"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like there may be a bug in this line, also, I'd suggest using f-string formatting

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've modified this piece of code as follows:

       kpms_root = get_kpms_root_data_dir()
        kpms_processed = get_kpms_processed_data_dir()

        model_dir = find_full_path(
            kpms_processed,
            (Model & f"model_id = {model_id}").fetch1("model_dir"),
        )

There are no errors triggered when running the tutorial locally.

fit_model,
)

_, kpms_root_outbox = moseq_infer.get_kpms_root_data_dir()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no guarantee that kpsm_root_outbox will be the 2nd element returned from the get_kpms_root_data_dir() function

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct. I've made the following alteration:

 kpms_processed = moseq_infer.get_kpms_processed_data_dir()

        kpms_project_output_dir = find_full_path(
            kpms_processed, (PCATask & key).fetch1("kpms_project_output_dir")
        )

@ttngu207
Copy link
Contributor

Thank you @MilagrosMarin, I left a few suggestions for improvement but overall this is ready to merge as the initial release

@MilagrosMarin
Copy link
Contributor Author

Thank you @MilagrosMarin, I left a few suggestions for improvement but overall this is ready to merge as the initial release

Thanks, @ttngu207 . Your feedback is greatly appreciated. I've implemented the suggested changes, and the code is now ready for merging. Please review. Thank you!

@ttngu207 ttngu207 merged commit 2e87c49 into datajoint:main Mar 26, 2024
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants