Add with_rank to Dataset.from_generator #7213

Open
muthissar opened this issue Oct 10, 2024 · 0 comments
Labels
enhancement New feature or request

muthissar commented Oct 10, 2024

Feature request

Add with_rank to Dataset.from_generator similar to Dataset.map and Dataset.filter.

Motivation

As with Dataset.map and Dataset.filter, this is useful when creating cache files on multiple GPUs, where the rank can be used to select a GPU ID. For now, the rank can be passed through the gen_kwargs argument; however, the rank is then included when computing the fingerprint.
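To illustrate the fingerprint concern, here is a minimal sketch. It assumes a toy hash over the generator kwargs; `toy_fingerprint` is a hypothetical helper and is not the hashing that `datasets` actually uses. It only mimics the idea that everything in gen_kwargs feeds the cache key.

```python
import hashlib
import json


def toy_fingerprint(gen_kwargs: dict) -> str:
    """Toy stand-in for a cache fingerprint: hash the generator kwargs.

    Assumption: this is NOT the hashing used by `datasets`; it only mimics
    the idea that every entry of gen_kwargs contributes to the fingerprint.
    """
    payload = json.dumps(gen_kwargs, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()[:16]


# Current workaround: the rank travels inside gen_kwargs.
fp_rank0 = toy_fingerprint({"shards": ["a.txt", "b.txt"], "rank": 0})
fp_rank1 = toy_fingerprint({"shards": ["a.txt", "b.txt"], "rank": 1})

# Requested behaviour: the rank is supplied separately and never hashed.
fp_no_rank = toy_fingerprint({"shards": ["a.txt", "b.txt"]})

# Same data, but the two ranks produce two different cache keys.
print(fp_rank0 != fp_rank1)
```

With a separate with_rank mechanism, the key would depend only on the data-related kwargs, as `fp_no_rank` does here.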

Your contribution

Added #7199, which passes rank based on the job_id set by num_proc.
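For readers who want a rough picture of the idea, here is a sketch of passing the job index as the rank. It is not the code in #7199; `run_job`, `gen`, and the manual shard split are hypothetical names and assumptions made for illustration, and no `datasets` internals are used.

```python
from typing import Callable, Iterator


def run_job(
    generator: Callable[..., Iterator[dict]],
    gen_kwargs: dict,
    job_id: int,
    with_rank: bool = False,
) -> list[dict]:
    """Hypothetical per-worker call: add rank=job_id only when requested."""
    extra = {"rank": job_id} if with_rank else {}
    return list(generator(**gen_kwargs, **extra))


def gen(shards, rank):
    # A real generator might select a device here, e.g. f"cuda:{rank}".
    for shard in shards:
        yield {"shard": shard, "rank": rank}


# Two jobs, as if num_proc=2 had split the shard list in half (assumption).
shards_per_job = [["a", "b"], ["c", "d"]]
rows = [
    row
    for job_id, shards in enumerate(shards_per_job)
    for row in run_job(gen, {"shards": shards}, job_id, with_rank=True)
]
print(rows)
```

The point of the design is that the rank is injected at call time rather than stored in gen_kwargs, so it stays out of anything derived from gen_kwargs.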

muthissar added the enhancement label Oct 10, 2024