Extend KINC workflow to ingest/output data from/in iRODS. #1

Open
clarisca opened this issue Sep 26, 2017 · 2 comments
@clarisca

We need to integrate compute/network with the iRODS data grid. To start, we will configure the KINC Pegasus workflow to ingest input data from the SciDAS iRODS deployment. We envision raw data being transferred from NCBI into iRODS using replication policies prior to workflow execution, but for now let's assume that the data is already in iRODS.

See email thread initiated by @feltus and response from Mats@ISI.

-----Original Message-----
From: Mats Rynge [mailto:[email protected]]
Sent: Thursday, September 21, 2017 12:48 PM
To: Alex Feltus [email protected]
Cc: Claris Castillo [email protected]; Ficklin, Stephen Patrick ([email protected]) [email protected]; William Poehlman ([email protected]) [email protected]
Subject: Re: iRODS and OSG-GEM/OSG-KINC Workflows

For the SciDAS project, we now have iRODS installations at WSU, RENCI (of course), and CU. We are loading it up with genomes and will be moving data into the iRODS zone from NCBI and our own file systems. Then we want to pull from iRODS into Pegasus workflows to process on OSG (+Cloudlab and Chameleon).
QUESTION: Is it possible to pull data from a remote iRODS zone directly into the Pegasus workflow at osgconnect and then run on OSG?
Yes, just put it in URL form by prepending irods:// to the path. This lets Pegasus know it is an iRODS location. Then add an "irodsPassword" to your configuration file:

https://pegasus.isi.edu/documentation/cred_staging.php#irods_cred

It has been a while since we did this implementation, so it might need some refreshing, but that should be easy.

What workflow is this for? For GEM, data would probably have to be pulled in to the submit host and split up. If you have more of a one-input-per-job setup, you can do staging bypass and have the jobs start up and pull directly from iRODS. Be careful with this approach, as you can easily end up with hundreds or thousands of clients interacting with your data store.
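
As a starting point for the "irods" work, the URL convention from the reply above (prepend irods:// to the path) could be sketched roughly as follows. This is only an illustration, not the Pegasus API: the helper name `to_irods_url`, the file names, and the zone paths are all hypothetical, and the exact URL layout Pegasus expects should be checked against the linked documentation. Credential setup (the "irodsPassword" entry) is not shown.

```python
def to_irods_url(path: str) -> str:
    """Prepend the irods:// scheme to a path, as described in the email reply.

    Hypothetical helper: the exact URL layout Pegasus expects is an assumption.
    """
    if path.startswith("irods://"):
        return path  # already marked as an iRODS location
    return "irods://" + path


# Hypothetical logical file names mapped to hypothetical iRODS paths.
inputs = {
    "sample1.fastq": "/exampleZone/home/kinc/sample1.fastq",
    "sample2.fastq": "/exampleZone/home/kinc/sample2.fastq",
}

# Build a logical-name -> physical-URL mapping for the workflow inputs.
replicas = {lfn: to_irods_url(path) for lfn, path in inputs.items()}

for lfn, url in sorted(replicas.items()):
    print(lfn, url)
```

The resulting mapping is the kind of information a replica catalog entry would carry; how it is registered with the workflow depends on the Pegasus version in use.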

@nlmills
Collaborator

nlmills commented Sep 27, 2017

@wpoehlm and I will work on these features in the "irods" branch.

@nlmills
Collaborator

nlmills commented Oct 2, 2017

iRODS client software installed in 3f8b237.
