-
Notifications
You must be signed in to change notification settings - Fork 25
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Validate dataset properties in jobs #1473
base: release-jobs
Are you sure you want to change the base?
Conversation
Documentation is now added to SciCatProject/documentation#56 |
The unit tests pass because I'm manually mocking the datasetService, so the core functionality works. The mocha tests fail because the datasetService doesn't get injected properly by nest. I need to figure out how to properly initialize the |
c689643
to
be8fe75
Compare
During a `create` operation, the `validate` action can take a `datasets` option listing path/schema pairs that will be applied to any datasets in jobParams.datasetList. Datasets are fetched from the database during the DTO validation step if needed. Validation of archive/retrieve/public lifecycle properties is no longer done automatically, but must be configured in the jobConfig file. Details: - Separate ValidateCreateJobAction (create) and ValidateJobAction (update) - Fix 'Internal Server Error' from job controller if a session timed out. If the session times out then only public jobs will be visible. - Update documentation for validate action - Remove checkDatasetState from the job controller. This must now be configured with a validate action - Add unit test - Remove hard-coded job types and dataset states
be8fe75
to
5e32d75
Compare
I rebased this onto the current release-jobs. Tests are currently failing due to a circular dependency in the module DI. I'm looking into how to fix it. |
DatasetsService is request scoped, so it can't be added on module init. This also avoids the circular dependency from import datasetService during module initialization.
Mongoose documents do some Proxy shenanigans that confuses JSONPath. They should first be converted to simple objects.
- Update error messages for archivable, retrievable, and public jobs. These used to be bespoke errors, but now follow the standard 'Invalid request' error message. - Change dataset property validation to throw 409 errors, to be consistent with the previous behavior - Add datasetList to many job DTOs that now validate it - Set archivable/retrievable properly in test datasets
Tests are fixed, so this is ready for review. Some tests changed, so maybe double check that we intended the new behavior (eg I requiring more properties in the example jobs and datasets) |
datasets: | ||
"datasetlifecycle.archivable": | ||
const: true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have to admit that I am a bit confused with this approach. I understand that the idea is to avoid hardcoded job types and allow users to easily configure their own jobs, however isn't this now also hardcoded in the config file? And also now more prone to errors since it's provided by the user. I realize that for new facility-specific job types that we don't cover in the core, perhaps this is the only way. But thinking about it the other way round, why would someone define an archive
job for datasets with archivable=false
? Shouldn't this already be the default? Otherwise we always force users to add archivable=true
for archive
jobs even though is it self-explanatory.
So my suggestion would be that for archive/retrieve
jobs we do the dataset lifecycle validation by default, and only ask users to provide this when they want to define a new job type. But that leads us back to hardcoded job types I think? I am happy to discuss this further, as I am clearly confused 😅 Perhaps I am misunderstanding the approach?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why would someone define an archive job for datasets with archivable=false?
They shouldn't. This was previously enforced by scicat for the type:archive
job type; now it can also be configured with custom job names.
Shouldn't this already be the default?
I could see an argument for automatically adding this validate action to any job called archive
(and the equivalent for retrieve
and public
as a special case, for backwards compatibility. Or at least warning if it isn't configured. I think that can be a later discussion, as I can think of several ways to implement it, most of which boil down to syntactic sugar for the syntax here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps then we could proceed with this approach for now and discuss #1692 also with Max before making a final decision. I just have one question below.
const filter = { | ||
where: { | ||
pid: { | ||
$in: datasetIds, | ||
}, | ||
}, | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fail to understand how the datasetlifecycle
state is being checked here. Could you provide an explanation please?
Description
Allows jobs to validate properties of the dataset.
Motivation
The
validate
action currently allows operators to enforce attributes of the DTO.However, many jobs depend on
jobParams.datasetLists
to link a job to certain datasets. This allows validation of properties on datasets associated with the dataset.The motivating example would be checking datasetLifecycle properties for certain jobs. These are currently hard-coded for special job types. This would allow them to be applied to custom job types as well.
Changes:
datasets
property to validate actions (only increate
operations) to validate linked dataset propertiesTests included
(WIP pending tests and documentation)
Documentation
official documentation info
Docs are included in SciCatProject/documentation#56