-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add author and tools details in RO-Crate #18820
base: dev
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice. Any chance you could also add or extend tests ? 3dbe289#diff-9f3ad661ca1701467f48508ea79951efe90aa29a6d02dec3f2998d831e755bb4 contains the existing examples.
Thanks a lot for the review and help @mvdbeek ! |
@OliverWoolland could you review this? I think this solves many of the issues you had about the Workflow Run Crate from Galaxy |
TO DO in a 2nd time (maybe in a later PR but I'm just putting that here to keep it in mind):
|
"name": tool_name, | ||
"version": tool_version, | ||
"description": tool_description, | ||
"url": "https://toolshed.g2.bx.psu.edu", # URL if relevant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not always true. Tools could come from multiple toolsheds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I was wondering if this part was relevant or if I should remove it
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a url
for each tool would be great - but if we can't get this due to the toolbox limitations, it should be removed.
|
||
# Add tools used in the workflow | ||
self._add_tools(crate) | ||
self._add_steps(crate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Workflows can have workflow inputs. Should we add them here as well? At least the types of the workflow inputs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be nice to have, yes.
This can even be done per tool (though more fiddly). There is already some capturing of the inputs and outputs as as formalParameter
s, but it seems only to happen if data files are included in the crate. There seem to be some helper functions further down the file for this purpose:
def _add_step_parameter_fp(self, step: WorkflowInvocationStep, crate: ROCrate): |
Quick update from the RO-Crate side (with apologies from the long delay) - the metadata that's being added looks fine to me as I read the code, but I would like to look at an example RO-Crate generated using this PR to make sure the JSON-LD output looks like it should. I'm waiting for @pauldg or @OliverWoolland to provide me with such a crate, whoever gets there first |
Hi all, I've been trying to test these changes on a local 24.1.3 instance I have (base commit 71f5048) I've not been able to get the revisions to run for me, with errors being thrown relating to the use of the toolbox. I think I've tracked the problem down to the origin of this PR being usegalaxy-eu rather than galaxyproject. The two forks seem to be handling the toolbox a little differently. I'd be curious if others have found the same? Or if there is an obvious solution? Apologies if I have misunderstood anything! |
@OliverWoolland what kind of errors are you seeing? There should only be minor differences on .eu compared to the upstream (release_24.1...usegalaxy-eu:galaxy:release_24.1_europe) |
Thanks @martenson! This is the error I see [2024-11-15 10:43:15,368: ERROR/main] Task galaxy.prepare_invocation_download[2d38756c-b893-4a20-8ca7-80e243608cc4] raised unexpected: AttributeError("'GalaxyManagerApplication' object has no attribute '_toolbox'")
Traceback (most recent call last):
File "/galaxy/server/.venv/lib/python3.12/site-packages/celery/app/trace.py", line 453, in trace_task
R = retval = fun(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^
File "/galaxy/server/.venv/lib/python3.12/site-packages/celery/app/trace.py", line 736, in __protected_call__
return self.run(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/galaxy/server/lib/galaxy/celery/__init__.py", line 184, in wrapper
rval = app.magic_partial(func)(*args, **kwds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/galaxy/server/.venv/lib/python3.12/site-packages/lagom/wrapping.py", line 28, in _bound_func
return inner_func(*bound_args, **bound_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/galaxy/server/.venv/lib/python3.12/site-packages/lagom/wrapping.py", line 45, in _error_handling_func
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/galaxy/server/lib/galaxy/celery/tasks.py", line 392, in prepare_invocation_download
model_store_manager.prepare_invocation_download(request)
File "/galaxy/server/lib/galaxy/managers/model_stores.py", line 156, in prepare_invocation_download
with model.store.get_export_store_factory(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/galaxy/server/lib/galaxy/model/store/__init__.py", line 2481, in __exit__
self._finalize()
File "/galaxy/server/lib/galaxy/model/store/__init__.py", line 2850, in _finalize
ro_crate = self._init_crate()
^^^^^^^^^^^^^^^^^^
File "/galaxy/server/lib/galaxy/model/store/__init__.py", line 2518, in _init_crate
invocation_crate_builder = WorkflowRunCrateProfileBuilder(self)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/galaxy/server/lib/galaxy/model/store/ro_crate_utils.py", line 49, in __init__
self.toolbox = self.model_store.app.toolbox
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/galaxy/server/lib/galaxy/app.py", line 373, in toolbox
return self._toolbox
^^^^^^^^^^^^^
AttributeError: 'GalaxyManagerApplication' object has no attribute '_toolbox'. Did you mean: 'toolbox'? |
I'm not sure if that is the issue, but note that these changes are on the |
Fair enough! I'll try and cross-check myself |
I don't have a local development Galaxy set up already, so I cloned Galaxy dev, checked out this PR, then installed Galaxy from scratch. When I created a simple workflow, invoked it, and tried to export as RO-Crate, I got the same error as @OliverWoolland above.
I checked out Finally, I merged So the error appears to be somewhere in this PR, but I don't have the expertise required to understand or fix it. It's visible in CI too - https://github.com/galaxyproject/galaxy/actions/runs/11610465172/job/32407941424?pr=18820 has some 500 errors which I think come from this problem. |
Returns: | ||
dict: A dictionary containing citations, xrefs, and EDAM operations for the tool. | ||
""" | ||
tool = self.toolbox.get_tool(tool_id) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't work, you can't use the toolbox in this context. We can't load up the toolbox in celery workers, that would take too much time and memory. The best solution is to either get that data from a yet to be written toolshed endpoint or load up the tool on demand, neither of these are easy to do or fit into the context of this PR. Maybe you could focus on the other enhancements in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a shame that the toolbox can't be used in this way. But the most essential metadata about the tools is already captured without using the toolbox, so I think it's ok to remove this function and consider this as an enhancement which could be added in future.
It does mean we would have a greater need for the url
on line 369, though, as without the xrefs
there's no alternative links to find the tool (and the rest of its metadata). Still not absolutely mandatory to have url
, but a definite best practice.
(I wondered if we could retrieve the citation/xref/EDAM metadata from something like job_attrs.txt
which is already part of the crate, but it's not included there. Nor is the url.)
I had a closer look and this is what I found: The export process happens in the context of a Celery task - a background task that runs out of the main galaxy process. The "galaxy app" in this context is a Here we can see that the toolbox is set up only in the UniverseApplication (full-fledged Galaxy app): Line 775 in 455da5c
Moving this call (along with possibly other toolbox-related calls) to the parent class Maybe someone else from the @galaxyproject/wg-backend can answer this question? Edit: lol, Marius was answering at the same time here #18820 (comment) so, that would be the answer. |
Hello all, Thanks again for all your help !! |
Yes, or drop the things which require tool access from this PR. |
@Marie59 please don't be discouraged, this is good work on the whole and I'm happy to see the RO-Crate outputs being improved :) As I see it, only a small amount of lines need to be removed from this PR to take out the toolbox-dependent bits (namely the If you decide to make a new PR instead, so that the toolbox code is still visible here if someone wants to find it again, that's fine as well. In that case, please tag me in the new one so I can keep track of it! |
Not discouraged no worries ! Thanks for the encouragements ! I'll remove what need to be removed here (and keep a back up). |
Hello all ! |
Can you resolve the conflict in test/unit/data/model/test_model_store.py ? |
I don't understand the last errors I get ... |
This comment was marked as resolved.
This comment was marked as resolved.
Co-authored-by: Nicola Soranzo <[email protected]>
Co-authored-by: Nicola Soranzo <[email protected]>
@nsoranzo thank you for the help ! I was a bit lost !! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That looks great, thanks @Marie59
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a look at the output JSON-LD and I've made some suggestions based on the Provenance Run Crate profile - Galaxy RO-Crates don't strictly conform to this, but it offers some best practices that are applicable here.
From a Workflow Run Crate profile perspective (the less-strict one we actually intend to conform to), the metadata added in this PR looks good.
crate, | ||
creator_data.get( | ||
"url", "" | ||
), # Use URL as identifier if available, otherwise empty string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could also try to use identifier
here, as that is an option in the UI too when setting an organization as a creator. For example an ROR identifier could be used. (though admittedly it's more likely for an average user to just input their institute's URL in the url
field)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For this point I was not sure what to do I used identifier
for the ORCID of the creator. Should I replace it by ROR or is something completely different you were suggesting ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per the section above (L228-L251), if the creator is a Person, then we assume the identifier
field is an ORCID and that's all good.
In this section, if the creator is an Organization, the identifier
field won't represent an ORCID - instead it would (probably) be an ROR identifier. However, it's less likely to be used by users, as ROR isn't as widely known yet.
So for the entity id on this line I would suggest: if the identifier
exists, use that, otherwise use the url
, or if neither exist, use an empty string.
And in the properties
you can include identifier
as well as url
if the identifier exists in the creator_data.
"name": tool_name, | ||
"version": tool_version, | ||
"description": tool_description, | ||
"url": "https://toolshed.g2.bx.psu.edu", # URL if relevant |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a url
for each tool would be great - but if we can't get this due to the toolbox limitations, it should be removed.
|
||
# Add tools used in the workflow | ||
self._add_tools(crate) | ||
self._add_steps(crate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would be nice to have, yes.
This can even be done per tool (though more fiddly). There is already some capturing of the inputs and outputs as as formalParameter
s, but it seems only to happen if data files are included in the crate. There seem to be some helper functions further down the file for this purpose:
def _add_step_parameter_fp(self, step: WorkflowInvocationStep, crate: ROCrate): |
Thanks a lot for the review ! I tried to add as much as I could your suggestions ! I don't understand the last comment though on Is everything else good ? |
Co-authored-by: Eli Chadwick <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Revisiting my earlier comments about formalParameter
s, I think we can leave it out of this PR. It's not essential and would take a bit of fiddly work to implement, so it can be looked at another time.
My other unresolved comment here is an optional improvement, not blocking.
As such, I'm happy for this PR to be merged now if the core devs approve :) thanks again for your work on this @Marie59 !
Here we go again !
I am launching this PR on a bit of work I did to ameliorate the RO-Crate plugin.
I updated the
_add_worfklow
function. I added the information on the author of the workflow:The update checks if the workflow attribute 'creator_metadata' exist. If it exists, it writes into the ro-crate the name of the creator (if there is no name it leaves a "") same for the idenifier (by default i put the ORCID maybe it's not a good Idea and if there is no Orcid it leaves a ""). If there is no 'creator_metadata' nothing is written on the creator.
I created a function
_add_tools
that checks the different info inworkflow.steps
to add some tools details:I am not sure if I wrote that the correct way corresponding to the run crate profile (this is a bit obscure to me)
@pauldg What do you think of this ?
Thanks !!