Add author and tools details in RO-Crate #18820
base: dev
Could also try to use `identifier` here, as that is an option in the UI too when setting an organization as a creator. For example, an ROR identifier could be used (though admittedly it's more likely for an average user to just input their institute's URL in the `url` field).
For this point I was not sure what to do. I used `identifier` for the ORCID of the creator. Should I replace it with ROR, or were you suggesting something completely different?
As per the section above (L228-L251), if the creator is a Person, then we assume the `identifier` field is an ORCID and that's all good.
In this section, if the creator is an Organization, the `identifier` field won't represent an ORCID - instead it would (probably) be an ROR identifier. However, it's less likely to be used by users, as ROR isn't as widely known yet.
So for the entity id on this line I would suggest: if the `identifier` exists, use that, otherwise use the `url`, or if neither exists, use an empty string.
And in the `properties` you can include `identifier` as well as `url` if the identifier exists in the creator_data.
Workflows can have workflow inputs. Should we add them here as well? At least the types of the workflow inputs?
That would be nice to have, yes.
This can even be done per tool (though more fiddly). There is already some capturing of the inputs and outputs as `formalParameter`s, but it seems only to happen if data files are included in the crate. There seem to be some helper functions further down the file for this purpose:
galaxy/lib/galaxy/model/store/ro_crate_utils.py
Line 359 in 39e38c9
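
As a rough illustration of what capturing workflow input types could look like without depending on data files, the sketch below builds `FormalParameter`-style entities as plain dicts. The function name `formal_parameters_for_inputs`, the assumed shape of `workflow_inputs` (a list of dicts with `label` and `type` keys), and the `#param-N` id scheme are all assumptions for illustration; it does not use or reproduce the existing helpers in `ro_crate_utils.py`.

```python
from typing import Any, Dict, List


def formal_parameters_for_inputs(workflow_inputs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Hypothetical helper: describe workflow inputs as FormalParameter-style entities."""
    entities: List[Dict[str, Any]] = []
    for index, wf_input in enumerate(workflow_inputs):
        # Fall back to a positional name when the input has no label.
        label = wf_input.get("label") or f"input_{index}"
        entity: Dict[str, Any] = {
            "@id": f"#param-{index}",  # assumed id scheme, for illustration only
            "@type": "FormalParameter",
            "name": label,
        }
        # Record the input type only when it is known.
        input_type = wf_input.get("type")
        if input_type:
            entity["additionalType"] = input_type
        entities.append(entity)
    return entities
```

The resulting dicts could then be attached to the workflow entity as its inputs, whether or not any datasets are exported with the crate.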
This is not always true. Tools could come from multiple toolsheds.
Yes, I was wondering if this part was relevant or if I should remove it.
Having a `url` for each tool would be great - but if we can't get this due to the toolbox limitations, it should be removed.
This won't work; you can't use the toolbox in this context. We can't load up the toolbox in Celery workers, as that would take too much time and memory. The best solution is to either get that data from a yet-to-be-written toolshed endpoint or load up the tool on demand, but neither of these is easy to do or fits into the context of this PR. Maybe you could focus on the other enhancements in this PR.
That's a shame that the toolbox can't be used in this way. But the most essential metadata about the tools is already captured without using the toolbox, so I think it's ok to remove this function and consider this as an enhancement which could be added in future.
It does mean we would have a greater need for the `url` on line 369, though, as without the `xrefs` there are no alternative links to find the tool (and the rest of its metadata). Still not absolutely mandatory to have `url`, but a definite best practice.
(I wondered if we could retrieve the citation/xref/EDAM metadata from something like `job_attrs.txt`, which is already part of the crate, but it's not included there. Nor is the url.)