Adjust import workflow configuration to support S3 #357

Merged: 7 commits, merged Sep 20, 2024

jimmymathews (Collaborator) commented Sep 20, 2024

  • Allows input_path for the import workflow to be an S3 URI.
  • Adds checks on the URIs to confirm that the expected files are present.
  • Adds logic so that, in the usual setups, Nextflow can access the credentials it needs to pull from S3.
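The kind of URI check described above could look roughly like the following minimal sketch. It is not the PR's code: the function names and the REQUIRED_FILES list are hypothetical, and it assumes the object keys under the bucket have already been listed (for example with boto3's list_objects_v2), so the check itself needs only the standard library.

```python
from urllib.parse import urlparse

# Hypothetical list of files a curated dataset folder is expected to contain;
# the real import workflow's requirements may differ.
REQUIRED_FILES = ("file_manifest.tsv", "samples.tsv")


def parse_s3_uri(uri: str) -> tuple[str, str]:
    """Split an s3://bucket/prefix URI into (bucket, prefix)."""
    parsed = urlparse(uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"Not an S3 URI: {uri!r}")
    prefix = parsed.path.lstrip("/")
    if prefix and not prefix.endswith("/"):
        prefix += "/"  # treat the path as a folder
    return parsed.netloc, prefix


def missing_files(uri: str, object_keys: list[str], required=REQUIRED_FILES) -> list[str]:
    """Return the required file names not found directly under the URI's prefix.

    object_keys is the list of keys in the bucket, e.g. gathered from a
    ListObjectsV2 call (assumed to have happened elsewhere).
    """
    _, prefix = parse_s3_uri(uri)
    present = {key[len(prefix):] for key in object_keys if key.startswith(prefix)}
    return [name for name in required if name not in present]
```

For instance, missing_files("s3://my-bucket/datasets/abc", ["datasets/abc/file_manifest.tsv", "datasets/abc/cells.csv"]) returns ["samples.tsv"], which a workflow could report before launching Nextflow rather than failing partway through the import.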

As a result, the Nextflow procedure for importing a curated dataset into the database works essentially unchanged when the source files come from a folder in an S3 bucket rather than from a local directory.

Note that Nextflow supports S3 "out of the box", with one subtlety: it does not support the session-specific tokens that we have been using in the shell environment. In the session-specific case, one must instead use the credentials from a user profile, and one must prevent Nextflow from trying to guess credentials from environment variables, because it then fails with a misleading error.
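One way to follow that rule is sketched below, under stated assumptions; it is not necessarily what the PR does. The sketch copies the session-specific credentials from the shell environment into a named profile in an AWS credentials file (the aws_session_token key is part of that file's standard format), then builds an environment for the Nextflow process with the credential variables removed. The profile name and function names are hypothetical, and selecting the profile through the AWS_PROFILE variable is an assumption; the aws.profile setting in the Nextflow configuration is an alternative.

```python
import configparser
from pathlib import Path

# Variables that would otherwise let Nextflow try to guess credentials itself.
CREDENTIAL_ENV_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")


def write_session_profile(env: dict, credentials_file: Path, profile: str) -> None:
    """Copy session-specific credentials from env into a named profile section."""
    config = configparser.ConfigParser()
    if credentials_file.exists():
        config.read(credentials_file)  # keep any existing profiles
    config[profile] = {
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
        "aws_session_token": env["AWS_SESSION_TOKEN"],
    }
    credentials_file.parent.mkdir(parents=True, exist_ok=True)
    with credentials_file.open("w") as handle:
        config.write(handle)


def nextflow_environment(env: dict, profile: str) -> dict:
    """Return a copy of env without credential variables, selecting a profile instead.

    Assumption: the launched process reads the profile name from AWS_PROFILE.
    """
    cleaned = {key: value for key, value in env.items() if key not in CREDENTIAL_ENV_VARS}
    cleaned["AWS_PROFILE"] = profile
    return cleaned
```

A caller would typically check for AWS_SESSION_TOKEN in dict(os.environ), write the profile to ~/.aws/credentials, and then start Nextflow with subprocess.run(..., env=nextflow_environment(env, profile)). Stripping the variables rather than merely adding the profile is the point: it keeps Nextflow from falling back to environment-variable credentials that include a session token it cannot use.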

  • Also makes a small adjustment to metadata field inference, so that some channels can be marked as "computationally generated" without affecting the primary channel annotation.

@jimmymathews jimmymathews merged commit 26de250 into main Sep 20, 2024
1 check passed
@jimmymathews jimmymathews deleted the dataimportmods branch September 20, 2024 22:20