[Feature Request] I'd like to be able to do multirun over files matching a glob (ideally a glob that depends on other, non-sweeped params in the config file) #2942

Open
mmcdermott opened this issue Sep 1, 2024 · 0 comments
Labels
enhancement Enhanvement request

🚀 Feature Request

I want to be able to specify a multirun job to sweep over a list of files matching a glob.

Motivation

I use Hydra for a number of parallel data processing pipelines, and I use multirun jobs to parallelize my commands across data shards. To do this, I have had to write custom bash helpers that turn a file glob into a list of file paths in Hydra's sweep syntax, so that the sweeper recognizes it as a valid set of options. I can't use a custom resolver, because resolvers are evaluated only after the sweeper has already begun sweeping through the parameters.

Pitch

Describe the solution you'd like
Much like the range(0,N) helper for sweeping over integer values, I would like a glob(file_glob) option that I can put in a config or on the command line to sweep over all files matching that glob.

Describe alternatives you've considered
We currently implement this helper as a Python script, which we package and release via PyPI and then call from a bash script to produce the input to the Hydra program. This results in a syntax like my_hydra_app --multirun data.root=$DATA_DIR data.shard=$(expand_shards $DATA_DIR), whereas I would like to be able to write my_hydra_app --multirun data.root=$DATA_DIR data.shard=glob($DATA_DIR). Critically, I would also like to be able to put data.shard=glob($data.root) into my Hydra config, if possible, so that the sweep can be configured in the .yaml file rather than on the command line.
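For concreteness, here is a minimal sketch of what a helper like expand_shards might do. This is not the actual packaged script: the function signature, the pattern argument, and the choice to emit full paths are all assumptions made for illustration. It relies only on the fact that Hydra treats a comma-separated override value as a list of choices to sweep over in --multirun mode.

```python
import glob
import os


def expand_shards(data_dir: str, pattern: str = "*") -> str:
    """Return the files in data_dir matching pattern as a comma-separated string.

    Hypothetical stand-in for the helper described above; the real script
    may differ in arguments and output format.
    """
    paths = sorted(glob.glob(os.path.join(data_dir, pattern)))
    for path in paths:
        # Assumption: paths are passed unquoted, so a comma inside a path
        # would be misread as a separator between sweep choices.
        if "," in path:
            raise ValueError(f"Path contains a comma and cannot be swept over: {path}")
    # A comma-separated value is read as a choice sweep under --multirun.
    return ",".join(paths)
```

Printed from a small command-line wrapper, the returned string can be substituted into the override as in data.shard=$(expand_shards $DATA_DIR). Paths containing spaces or other characters that are special in Hydra's override grammar would additionally need quoting, which this sketch does not attempt.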

Are you willing to open a pull request? (See CONTRIBUTING)
Yes


@mmcdermott mmcdermott added the enhancement Enhanvement request label Sep 1, 2024
@mmcdermott mmcdermott changed the title [Feature Request] [Feature Request] I'd like to be able to do multirun over files matching a glob (ideally a glob that depends on other, non-sweeped params in the config file) Sep 1, 2024