
strange behavior of split_between_processes when length of inputs is smaller than num_processes #3393

Open
hmk114 opened this issue Feb 11, 2025 · 2 comments

Comments

hmk114 commented Feb 11, 2025

Environment

  • Accelerate version: 1.3.0
  • Platform: Linux-5.15.0-107-generic-x86_64-with-glibc2.31
  • accelerate bash location: /home/user/miniconda3/envs/test/bin/accelerate
  • Python version: 3.12.0
  • Numpy version: 2.0.1
  • PyTorch version (GPU?): 2.5.1 (True)
  • PyTorch XPU available: False
  • PyTorch NPU available: False
  • PyTorch MLU available: False
  • PyTorch MUSA available: False
  • System RAM: 503.53 GB
  • GPU type: NVIDIA GeForce RTX 3090
  • Accelerate default config:
    - compute_environment: LOCAL_MACHINE
    - distributed_type: MULTI_GPU
    - mixed_precision: no
    - use_cpu: False
    - debug: False
    - num_processes: 4
    - machine_rank: 0
    - num_machines: 1
    - gpu_ids: 4,5,6,7
    - rdzv_backend: static
    - same_network: True
    - main_training_function: main
    - enable_cpu_affinity: False
    - downcast_bf16: no
    - tpu_use_cluster: False
    - tpu_use_sudo: False
    - tpu_env: []
    - dynamo_config: {'dynamo_backend': 'INDUCTOR'}

Description

Currently, when the inputs being split have a length smaller than the number of processes spawned, the last num_processes - len(inputs) processes will each receive a copy of the final element of inputs. For example, the following code snippet

from accelerate import PartialState

state = PartialState()
with state.split_between_processes(["A", "B"]) as inputs:
    print(inputs)

will output

["A"]
["B"]
["B"]
["B"]

(not necessarily in the same order) when launched using 4 processes.

I've checked the code and found this statement is responsible for the above behavior:

if start_index >= len(inputs):
    result = inputs[-1:]

It seems to be an intentional design, but wouldn't it be more reasonable for the last several processes to simply receive an empty list in this setting?
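For illustration, the per-process slicing can be sketched as a plain function. This is a simplified reimplementation of the even-split logic, not Accelerate's actual code, but it reproduces the observed output:

```python
def split_for_process(inputs, process_index, num_processes):
    """Simplified sketch of split_between_processes' even-split logic."""
    # Each process gets ceil(len(inputs) / num_processes) items.
    num_samples_per_process = -(-len(inputs) // num_processes)  # ceil division
    start_index = process_index * num_samples_per_process
    end_index = start_index + num_samples_per_process
    if start_index >= len(inputs):
        # Slice starts past the end: fall back to a copy of the last
        # element, which is exactly the duplication described above.
        return inputs[-1:]
    return inputs[start_index:end_index]

# Reproduces the output shown above for 2 inputs on 4 processes:
print([split_for_process(["A", "B"], i, 4) for i in range(4)])
# [['A'], ['B'], ['B'], ['B']]
```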

Thanks a lot if someone can give me an explanation!

@muellerzr
Collaborator

We duplicate it the same way that we duplicate the batch during training, so everything works when doing batch-wise inference. The duplicate can then just be dropped manually (it's more efficient for gather() and similar operations to keep the data on all processes the same size).
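Since the padding copies the final element, the extra entries always sit at the end of the gathered list, so dropping them manually amounts to truncating back to the original input length. A minimal sketch, using hypothetical per-item result values rather than a real distributed gather:

```python
# Hypothetical results gathered in rank order from 4 processes that each
# processed one item of inputs = ["A", "B"]; ranks 2 and 3 received the
# duplicated "B" and therefore produced duplicate results.
inputs = ["A", "B"]
gathered = ["out_A", "out_B", "out_B", "out_B"]

# The padded duplicates are at the tail, so truncating to the original
# input length recovers the real results.
results = gathered[: len(inputs)]
print(results)  # ['out_A', 'out_B']
```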

@hmk114
Author

hmk114 commented Feb 12, 2025

Thanks again for your response! I get it now, but implicitly duplicating elements even when apply_padding is set to False is still somewhat confusing. Maybe it would be better to mention this in the documentation?
