
feat: Validate dataset properties in jobs #1473

Open
wants to merge 5 commits into base: release-jobs

Conversation

sbliven
Member

@sbliven sbliven commented Oct 30, 2024

Description

Allows jobs to validate properties of the dataset.

Motivation

The validate action currently allows operators to enforce attributes of the DTO.

However, many jobs depend on jobParams.datasetList to link a job to specific datasets. This change allows validation of properties on the datasets associated with the job.

The motivating example would be checking datasetLifecycle properties for certain jobs. These are currently hard-coded for special job types. This would allow them to be applied to custom job types as well.

configVersion: v1.0
jobs:
  - jobType: custom_archive
    create:
      auth: "#datasetAccess"
      actions:
        - actionType: validate
          datasets:
            datasetlifecycle.archivable:
              const: true

Changes:

  • Add datasets property to validate actions (only in create operations) to validate linked dataset properties

Tests included

(WIP pending tests and documentation)

  • Included for each change/fix?
  • Passing?

Documentation

  • swagger documentation updated (required for API changes)
  • official documentation updated


Docs are included in SciCatProject/documentation#56

@sbliven sbliven self-assigned this Oct 30, 2024
@sbliven sbliven requested a review from despadam October 30, 2024 14:28
@sbliven
Member Author

sbliven commented Nov 3, 2024

Documentation is now added to SciCatProject/documentation#56

@sbliven
Member Author

sbliven commented Nov 5, 2024

The unit tests pass because I'm manually mocking the datasetService, so the core functionality works.

The mocha tests fail because the datasetService doesn't get injected properly by nest. I need to figure out how to properly initialize the ValidateCreateAction before merging this.

During a `create` operation, the `validate` action can take a
`datasets` option listing path/schema pairs that will be applied
to any datasets in jobParams.datasetList.

Datasets are fetched from the database during the DTO validation step
if needed.

Validation of archive/retrieve/public lifecycle properties is no longer
done automatically, but must be configured in the jobConfig file.

Details:
- Separate ValidateCreateJobAction (create) and ValidateJobAction (update)
- Fix 'Internal Server Error' from job controller if a session timed out.
  If the session times out then only public jobs will be visible.
- Update documentation for validate action
- Remove checkDatasetState from the job controller.
  This must now be configured with a validate action
- Add unit test
- Remove hard-coded job types and dataset states
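For illustration, a fuller jobConfig sketch combining the pieces described above: the `datasets` option on a create-time validate action, plus a plain DTO check on update. This is a hypothetical example; the `update` block, its `#jobOwnerUser` auth value, and the `request`/`statusCode` keys are illustrative assumptions, not taken from this PR.

```yaml
configVersion: v1.0
jobs:
  - jobType: custom_archive
    create:
      auth: "#datasetAccess"
      actions:
        - actionType: validate
          # Each key is a path into the datasets referenced by
          # jobParams.datasetList; each value is a JSON-schema fragment.
          datasets:
            datasetlifecycle.archivable:
              const: true
    update:
      auth: "#jobOwnerUser"   # illustrative auth value
      actions:
        - actionType: validate
          # illustrative DTO check; key names are assumptions
          request:
            statusCode:
              type: string
```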
@sbliven sbliven marked this pull request as ready for review December 18, 2024 10:56
@sbliven
Member Author

sbliven commented Dec 18, 2024

I rebased this onto the current release-jobs.

Tests are currently failing due to a circular dependency in the module DI. I'm looking into how to fix it.

DatasetsService is request scoped, so it can't be added on module init.
This also avoids the circular dependency from importing datasetService
during module initialization.
Mongoose documents do some Proxy shenanigans that confuse JSONPath.
They should first be converted to plain objects.
- Update error messages for archivable, retrievable, and public jobs.
  These used to be bespoke errors, but now follow the standard 'Invalid
  request' error message.
- Change dataset property validation to throw 409 errors, to be
  consistent with the previous behavior
- Add datasetList to many job DTOs that now validate it
- Set archivable/retrievable properly in test datasets
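The Proxy issue above can be reproduced without Mongoose. A minimal sketch: a Proxy that answers property reads but hides its keys defeats enumeration-based path walking (which JSONPath-style lookups rely on), while the plain underlying object walks fine. In the actual fix the conversion is Mongoose's `doc.toObject()`; here the underlying object stands in for it.

```typescript
// A proxy that serves property reads but hides keys from enumeration,
// loosely mimicking how Mongoose documents wrap their internal state.
const raw = { datasetlifecycle: { archivable: true } };
const doc: any = new Proxy(raw, {
  ownKeys: () => [],                          // enumeration sees no keys
  get: (target, prop) => (target as any)[prop],
});

// Direct access still works...
console.log(doc.datasetlifecycle.archivable); // true

// ...but key enumeration, which path-walking libraries depend on
// when resolving "datasetlifecycle.archivable", comes up empty:
console.log(Object.keys(doc)); // []

// The fix: convert the document to a plain object first
// (Mongoose's doc.toObject(); here simply the underlying object).
const plain = raw;
console.log(Object.keys(plain)); // [ 'datasetlifecycle' ]
```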
@sbliven
Member Author

sbliven commented Feb 2, 2025

Tests are fixed, so this is ready for review. Some tests changed, so please double-check that we intended the new behavior (e.g. requiring more properties in the example jobs and datasets).

Comment on lines +8 to +10

    datasets:
      "datasetlifecycle.archivable":
        const: true
Member


I have to admit that I am a bit confused by this approach. I understand that the idea is to avoid hardcoded job types and allow users to easily configure their own jobs, but isn't this now also hardcoded in the config file? And it's now more prone to errors, since it's provided by the user. I realize that for new facility-specific job types that we don't cover in the core, perhaps this is the only way. But thinking about it the other way round, why would someone define an archive job for datasets with archivable=false? Shouldn't this already be the default? Otherwise we always force users to add archivable=true for archive jobs even though it is self-explanatory.
So my suggestion would be that for archive/retrieve jobs we do the dataset lifecycle validation by default, and only ask users to provide this when they want to define a new job type. But that leads us back to hardcoded job types, I think? I am happy to discuss this further, as I am clearly confused 😅 Perhaps I am misunderstanding the approach?

Member Author

@sbliven sbliven Feb 7, 2025


why would someone define an archive job for datasets with archivable=false?

They shouldn't. This was previously enforced by scicat for the type:archive job type; now it can also be configured with custom job names.

Shouldn't this already be the default?

I could see an argument for automatically adding this validate action to any job called archive (and the equivalent for retrieve and public) as a special case, for backwards compatibility. Or at least warning if it isn't configured. I think that can be a later discussion, as I can think of several ways to implement it, most of which boil down to syntactic sugar for the syntax here.

Member

@despadam despadam left a comment


Perhaps then we could proceed with this approach for now and discuss #1692 also with Max before making a final decision. I just have one question below.

Comment on lines +196 to +202

    const filter = {
      where: {
        pid: {
          $in: datasetIds,
        },
      },
    };
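The filter above only fetches the datasets named in datasetList; per the commit notes, the configured path/schema pairs are then applied to each returned dataset. A hypothetical sketch of that second step (not the PR's code, which delegates to a real JSON-schema validator), handling only the `const` keyword from the example config:

```typescript
// Minimal stand-in for a JSON-schema fragment; only `const` is handled.
type SchemaFragment = { const?: unknown };

// Resolve a dot-separated path like "datasetlifecycle.archivable".
function getPath(obj: unknown, path: string): unknown {
  return path
    .split(".")
    .reduce<any>((o, key) => (o == null ? undefined : o[key]), obj);
}

// Apply every path/schema rule to every fetched dataset.
function validateDatasets(
  datasets: Record<string, unknown>[],
  rules: Record<string, SchemaFragment>,
): boolean {
  return datasets.every((ds) =>
    Object.entries(rules).every(
      ([path, schema]) =>
        !("const" in schema) || getPath(ds, path) === schema.const,
    ),
  );
}

// Usage with the rule from the example config:
const rules = { "datasetlifecycle.archivable": { const: true } };
console.log(validateDatasets([{ datasetlifecycle: { archivable: true } }], rules));  // true
console.log(validateDatasets([{ datasetlifecycle: { archivable: false } }], rules)); // false
```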
Member


I fail to understand how the datasetlifecycle state is being checked here. Could you provide an explanation please?
