Updating transformers issue with bloom models #541

Open
loadams opened this issue Nov 1, 2024 · 0 comments
loadams commented Nov 1, 2024

Updating transformers beyond v4.43.4 breaks the CI tests in legacy mode. The bloom tests fail with:

```
FAILED test_non_persistent_deployment.py::test_single_GPU[None-50050-False-28080-fp16-1-False-False-1-True-False-ds_config0-text-generation-bigscience/bloom-560m-query3-non-persistent] - ValueError: not enough values to unpack (expected 2, got 0)
FAILED test_local_deployment.py::test_session[None-local-50050-False-28080-fp16-1-False-False-1-True-False-ds_config0-text-generation-bigscience/bloom-560m-query0] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
FAILED test_local_deployment.py::test_multi_GPU[None-local-50050-False-28080-fp16-1-False-False-1-True-False-ds_config0-text-generation-bigscience/bloom-560m-query0] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
FAILED test_local_deployment.py::test_single_GPU[None-local-50050-False-28080-fp16-1-False-False-1-True-False-ds_config0-text-generation-bigscience/bloom-560m-query3] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
FAILED test_deployment_options.py::test_meta_tensor[query0-None-bigscience/bloom-560m-local-50050-False-28080-text-generation-fp16-False-1-True-False-ds_config0-2-True] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
FAILED test_deployment_options.py::test_load_to_sys_mem[query0-None-bigscience/bloom-560m-local-50050-False-28080-text-generation-fp16-1-False-1-True-False-ds_config0-True] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
FAILED test_deployment_options.py::test_restful_api[query0-28080-None-bigscience/bloom-560m-local-50050-text-generation-fp16-1-False-False-1-True-False-ds_config0-True] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
FAILED test_deployment_options.py::test_replicas[query0-None-bigscience/bloom-560m-local-50050-False-28080-text-generation-fp16-1-False-False-True-False-ds_config0-2] - grpc.aio._call.AioRpcError: <AioRpcError of RPC that terminated with:
```

We have isolated the problem to this commit: huggingface/transformers#31445
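
That PR appears to rework how transformers handles BLOOM's `past_key_values`, while DeepSpeed's injected attention still expects the legacy per-layer `(key, value)` tuple. A hypothetical sketch of the failure mode (variable names are illustrative, not the actual DeepSpeed code; the real unpack happens in DeepSpeed's inference attention path):

```python
# If the new transformers cache handling yields an empty container here
# instead of the legacy (key, value) tuple, the unpack fails with the
# exact error seen in CI.
layer_past = ()                      # empty cache instead of (key, value)
past_key, past_value = layer_past    # ValueError: not enough values to unpack (expected 2, got 0)
```

The traceback from the failing tests: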

```
../../mii/legacy/client.py:144: in query
    return task_methods.run_inference(inference_pipeline, args, query_kwargs)
../../mii/legacy/method_table.py:101: in run_inference
    response = inference_pipeline(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/transformers/pipelines/text_generation.py:262: in __call__
    return super().__call__(text_inputs, **kwargs)
../../../venv/lib/python3.12/site-packages/transformers/pipelines/base.py:1238: in __call__
    outputs = list(final_iterator)
../../../venv/lib/python3.12/site-packages/transformers/pipelines/pt_utils.py:124: in __next__
    item = next(self.iterator)
../../../venv/lib/python3.12/site-packages/transformers/pipelines/pt_utils.py:125: in __next__
    processed = self.infer(item, **self.params)
../../../venv/lib/python3.12/site-packages/transformers/pipelines/base.py:1164: in forward
    model_outputs = self._forward(model_inputs, **forward_params)
../../../venv/lib/python3.12/site-packages/transformers/pipelines/text_generation.py:351: in _forward
    generated_sequence = self.model.generate(input_ids=input_ids, attention_mask=attention_mask, **generate_kwargs)
../../../venv/lib/python3.12/site-packages/deepspeed/inference/engine.py:631: in _generate
    return self.module.generate(*inputs, **kwargs)
../../../venv/lib/python3.12/site-packages/torch/utils/_contextlib.py:116: in decorate_context
    return func(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/transformers/generation/utils.py:2024: in generate
    result = self._sample(
../../../venv/lib/python3.12/site-packages/transformers/generation/utils.py:2982: in _sample
    outputs = self(**model_inputs, return_dict=True)
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/transformers/models/bloom/modeling_bloom.py:955: in forward
    transformer_outputs = self.transformer(
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/transformers/models/bloom/modeling_bloom.py:744: in forward
    outputs = block(
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/deepspeed/model_implementations/transformers/ds_transformer.py:162: in forward
    self.attention(input,
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1736: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/torch/nn/modules/module.py:1747: in _call_impl
    return forward_call(*args, **kwargs)
../../../venv/lib/python3.12/site-packages/deepspeed/ops/transformer/inference/ds_attention.py:168: in forward
    context_layer, key_layer, value_layer = self.compute_attention(qkv_out=qkv_out,
```
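
For reference, a minimal standalone sketch that should exercise the same code path outside of MII (untested here; assumes a CUDA GPU and that DeepSpeed kernel injection is the trigger):

```python
import torch
import deepspeed
from transformers import pipeline

# Build the same text-generation pipeline the legacy MII tests use.
pipe = pipeline(
    "text-generation",
    model="bigscience/bloom-560m",
    torch_dtype=torch.float16,
    device=0,
)

# Swap in DeepSpeed's inference kernels, as the legacy path does.
pipe.model = deepspeed.init_inference(
    pipe.model,
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)

# With transformers > v4.43.4 this is where the ValueError surfaces.
print(pipe("DeepSpeed is", max_new_tokens=20))
```

Per the description above, pinning transformers to v4.43.4 or earlier avoids the failure.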
loadams self-assigned this Nov 1, 2024