
strange behavior of split_between_processes when length of inputs is smaller than num_processes #3393

Open
hmk114 opened this issue Feb 11, 2025 · 2 comments

Comments

hmk114 commented Feb 11, 2025

Environment

  • Accelerate version: 1.3.0
  • Platform: Linux-5.15.0-107-generic-x86_64-with-glibc2.31
  • accelerate bash location: /home/user/miniconda3/envs/test/bin/accelerate
  • Python version: 3.12.0
  • Numpy version: 2.0.1
  • PyTorch version (GPU?): 2.5.1 (True)
  • PyTorch XPU available: False
  • PyTorch NPU available: False
  • PyTorch MLU available: False
  • PyTorch MUSA available: False
  • System RAM: 503.53 GB
  • GPU type: NVIDIA GeForce RTX 3090
  • Accelerate default config:
    - compute_environment: LOCAL_MACHINE
    - distributed_type: MULTI_GPU
    - mixed_precision: no
    - use_cpu: False
    - debug: False
    - num_processes: 4
    - machine_rank: 0
    - num_machines: 1
    - gpu_ids: 4,5,6,7
    - rdzv_backend: static
    - same_network: True
    - main_training_function: main
    - enable_cpu_affinity: False
    - downcast_bf16: no
    - tpu_use_cluster: False
    - tpu_use_sudo: False
    - tpu_env: []
    - dynamo_config: {'dynamo_backend': 'INDUCTOR'}

Description

Currently, when the inputs being split have a length smaller than the number of processes spawned, the last num_processes - len(inputs) processes will each receive a copy of the final element of inputs. For example, the following code snippet

from accelerate import PartialState

state = PartialState()
with state.split_between_processes(["A", "B"]) as inputs:
    print(inputs)

will output

["A"]
["B"]
["B"]
["B"]

(not necessarily in the same order) when launched using 4 processes.

I've checked the code and found this statement is responsible for the above behavior:

if start_index >= len(inputs):
    result = inputs[-1:]

It seems to be an intentional design, but wouldn't it be more reasonable for the last several processes to simply receive an empty list in this setting?
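For illustration, the per-process slicing can be sketched as a plain function. This is a simplified reimplementation of the even-split logic, not Accelerate's actual code, but it reproduces the observed output:

```python
def split_for_process(inputs, process_index, num_processes):
    """Simplified sketch of split_between_processes' even-split logic."""
    # Each process gets ceil(len(inputs) / num_processes) items.
    num_samples_per_process = -(-len(inputs) // num_processes)  # ceil division
    start_index = process_index * num_samples_per_process
    end_index = start_index + num_samples_per_process
    if start_index >= len(inputs):
        # Slice starts past the end: fall back to a copy of the last
        # element, which is exactly the duplication described above.
        return inputs[-1:]
    return inputs[start_index:end_index]

# Reproduces the output shown above for 2 inputs on 4 processes:
print([split_for_process(["A", "B"], i, 4) for i in range(4)])
# [['A'], ['B'], ['B'], ['B']]
```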

Thanks a lot if someone can give me an explanation!

@muellerzr
Collaborator

We duplicate it the same way that we duplicate the batch during training, so everything works when doing batch-wise inference. The duplicate can then just be dropped manually (it's more efficient for gather() and similar operations to keep the data on all processes the same size).
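Since the padding copies the final element, the extra entries always sit at the end of the gathered list, so dropping them manually amounts to truncating back to the original input length. A minimal sketch, using hypothetical per-item result values rather than a real distributed gather:

```python
# Hypothetical results gathered in rank order from 4 processes that each
# processed one item of inputs = ["A", "B"]; ranks 2 and 3 received the
# duplicated "B" and therefore produced duplicate results.
inputs = ["A", "B"]
gathered = ["out_A", "out_B", "out_B", "out_B"]

# The padded duplicates are at the tail, so truncating to the original
# input length recovers the real results.
results = gathered[: len(inputs)]
print(results)  # ['out_A', 'out_B']
```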

@hmk114
Author

hmk114 commented Feb 12, 2025

Thanks again for your response! I get it now, but implicitly duplicating elements even when apply_padding is set to False is still somewhat confusing. Maybe it would be better to mention this in the documentation?
