Enhance cubi-tk sodar ingest-fastq with (regex) presets #232

Closed
Nicolai-vKuegelgen opened this issue Jun 25, 2024 · 0 comments · Fixed by #235


Is your feature request related to a problem? Please describe.
We currently have two general-purpose "ingest" functions for uploading data to SODAR landing zones in cubi-tk: cubi-tk sodar ingest (which takes single file trees and uploads each into one collection of a landing zone) and cubi-tk sodar ingest-fastq (which uses filename/pattern matching to upload any number of files to matching collections).

The --src-regex option of cubi-tk sodar ingest-fastq theoretically allows one to match and upload any file (even non-fastq files) and, combined with --remote-dir-pattern, to change the naming pattern of the collections in the landing zone (LZ). However, editing this regex correctly is complicated and not accessible to most users.
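To illustrate why the two options are coupled, here is a minimal sketch of the general mechanism: named groups captured by a source regex are substituted into a directory pattern. The regex, the pattern, and the helper `remote_dir_for` are hypothetical and are not cubi-tk's actual defaults or internals.

```python
import re

# Hypothetical values for illustration only, NOT the cubi-tk defaults.
SRC_REGEX = r"(.*/)?(?P<sample>[^/_]+)_.*\.fastq\.gz$"
REMOTE_DIR_PATTERN = "{sample}/raw_data"


def remote_dir_for(path, src_regex=SRC_REGEX, remote_dir_pattern=REMOTE_DIR_PATTERN):
    """Return the target collection for `path`, or None if the file does not match."""
    match = re.match(src_regex, path)
    if match is None:
        return None
    # Every named group in the regex becomes a placeholder usable in the pattern.
    return remote_dir_pattern.format(**match.groupdict())


print(remote_dir_for("run1/P001_S1_L001_R1_001.fastq.gz"))
print(remote_dir_for("notes.txt"))
```

Any placeholder used in the directory pattern must exist as a named group in the regex, which is one reason the two options are hard to edit independently.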

Describe the solution you'd like
Instead of having only one default regex, we should add an easily extensible set of presets, each of which can optionally also set the accompanying default for --remote-dir-pattern. Examples include:

  • Our current default, for general purpose unstructured fastq file collections
  • Additional fastq default for bcl2fastq / digestflow output, using flowcell-id instead of date to mark unique subcollections
  • Further new defaults, e.g. for ONT-LR data or similar (fast5 / pod files)
  • ...
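One possible shape for such presets, as a rough sketch: a small registry mapping a preset name to a regex plus its matching directory pattern, with explicit options still taking precedence. All names here (`IngestPreset`, `PRESETS`, `resolve`, `target_dir`, the preset keys) and both regexes are hypothetical placeholders, not the implementation that was eventually merged.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class IngestPreset:
    src_regex: str
    remote_dir_pattern: str


# Hypothetical presets; the regexes are simplified stand-ins, not cubi-tk's.
PRESETS = {
    "default": IngestPreset(
        src_regex=r"(.*/)?(?P<sample>[^/_]+)_[^/]*\.fastq\.gz$",
        remote_dir_pattern="{sample}/raw_data",
    ),
    "flowcell": IngestPreset(
        src_regex=r"(.*/)?(?P<flowcell>[^/]+)/(?P<sample>[^/_]+)_[^/]*\.fastq\.gz$",
        remote_dir_pattern="{sample}/raw_data/{flowcell}",
    ),
}


def resolve(preset_name, src_regex=None, remote_dir_pattern=None):
    """Start from a preset; explicitly given options override its values."""
    preset = PRESETS[preset_name]
    return IngestPreset(
        src_regex or preset.src_regex,
        remote_dir_pattern or preset.remote_dir_pattern,
    )


def target_dir(path, preset):
    """Return the target collection for `path`, or None if it does not match."""
    match = re.match(preset.src_regex, path)
    if match is None:
        return None
    return preset.remote_dir_pattern.format(**match.groupdict())


print(target_dir("out/HXXXXX/P001_S1_R1.fastq.gz", resolve("flowcell")))
```

Adding a new preset (for example for ONT data) would then only require a new dictionary entry, while users who need something unusual could still pass their own regex.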

Describe alternatives you've considered
Without a feature like this, using cubi-tk for SODAR upload will remain complicated and anything but self-explanatory for users outside of CUBI.

Additional context
We should probably consider renaming our current ingest functions (e.g. to ingest-tree and ingest-pattern).
