
[Issue]: Installing ROCm Flash-Attention on RHEL #69

Open
varshaprasad96 opened this issue Jul 22, 2024 · 4 comments

varshaprasad96 commented Jul 22, 2024

Problem Description

We are trying to install ROCm flash-attention on RHEL using steps similar to those mentioned in the Dockerfile, but using a RHEL/UBI9 base image (registry.access.redhat.com/ubi9:latest) instead of rocm/pytorch.

As a prerequisite, the Dockerfile installs setuptools, packaging, ninja, and torch from https://download.pytorch.org/whl/rocm6.0, as recommended on the PyTorch website and in the README of the repository.
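
For reference, the relevant part of our UBI9 setup looks roughly like this (a sketch; the pip packages and index URL are the ones named above, while the Python version and dnf package names are our own choices):

    FROM registry.access.redhat.com/ubi9:latest
    # Build prerequisites for flash-attention, mirroring the upstream Dockerfile
    RUN dnf install -y python3.11 python3.11-pip git patch \
     && python3.11 -m pip install --upgrade pip setuptools packaging ninja \
     && python3.11 -m pip install torch --index-url https://download.pytorch.org/whl/rocm6.0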

Here are the versions:
Python: 3.11
Setuptools: 71.1.0
Torch: 2.1.1

The intention is to install flash-attention successfully for ROCm 6.1.2.

However, we run into the following issues:

  1. Patching Hipify:

The hipify_python.py script has been modified in recent versions of Torch, causing the patch command to fail. The Dockerfile runs this command:

&& patch "${PYTHON_SITE_PACKAGES}/torch/utils/hipify/hipify_python.py" hipify_patch.patch

It looks like the version of hipify.py in Torch 2.1.1 does not match the expected version for ROCm 6.1.2. Could you specify which version of Torch should be used with ROCm 6.1.2 to avoid this issue?
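
In case it helps others reproduce this, the mismatch can be confirmed without modifying anything by running the patch in dry-run mode (a sketch; we resolve PYTHON_SITE_PACKAGES ourselves, the way the Dockerfile expects it to point at torch's site-packages):

    # Locate torch's site-packages and test whether the hipify patch still applies
    PYTHON_SITE_PACKAGES=$(python3 -c 'import sysconfig; print(sysconfig.get_paths()["purelib"])')
    patch --dry-run "${PYTHON_SITE_PACKAGES}/torch/utils/hipify/hipify_python.py" hipify_patch.patch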

  2. Mismatch in Version Requirements and pip install Errors

When running pip install ., after cloning the repository and setting the PYTHON_SITE_PACKAGES path, the following errors appear:

Using pip 24.1.2 from /flash-attention/myenv/lib64/python3.9/site-packages/pip (python 3.9)
Processing /flash-attention
  Running command python setup.py egg_info
  Traceback (most recent call last):
    File "<string>", line 2, in <module>
    File "<pip-setuptools-caller>", line 34, in <module>
    File "/flash-attention/setup.py", line 293, in <module>
      build_for_cuda()
    File "/flash-attention/setup.py", line 125, in build_for_cuda
      raise_if_cuda_home_none("flash_attn")
    File "/flash-attention/setup.py", line 63, in raise_if_cuda_home_none
      raise RuntimeError(
  RuntimeError: flash_attn was requested, but nvcc was not found.  Are you sure your environment has nvcc available?  If you're installing within a container from https://hub.docker.com/r/pytorch/pytorch, only images whose names contain 'devel' will provide nvcc.

Warning: Torch did not find available GPUs on this system.
If your intention is to cross-compile, this is not an error.
By default, Apex will cross-compile for Pascal (compute capabilities 6.0, 6.1, 6.2),
Volta (compute capability 7.0), Turing (compute capability 7.5),
and, if the CUDA version is >= 11.0, Ampere (compute capability 8.0).
If you wish to cross-compile for a single specific architecture,
export TORCH_CUDA_ARCH_LIST="compute capability" before running setup.py.

torch.__version__  = 2.3.1+cu121

error: subprocess-exited-with-error
× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.
full command: /flash-attention/myenv/bin/python -c '
exec(compile('"'"''"'"''"'"'
# This is <pip-setuptools-caller> -- a caller that pip uses to run setup.py
#
# - It imports setuptools before invoking setup.py, to enable projects that directly
#   import from `distutils.core` to work with newer packaging standards.
# - It provides a clear error message when setuptools is not installed.
# - It sets `sys.argv[0]` to the underlying `setup.py`, when invoking `setup.py` so
#   setuptools doesn'"'"'t think the script is `-c`. This avoids the following warning:
#     manifest_maker: standard file '"'"'-c'"'"' not found".
# - It generates a shim setup.py, for handling setup.cfg-only projects.
import os, sys, tokenize

try:
    import setuptools
except ImportError as error:
    print(
        "ERROR: Can not execute `setup.py` since setuptools is not available in "
        "the build environment.",
        file=sys.stderr,
    )
    sys.exit(1)

__file__ = %r
sys.argv[0] = __file__

if os.path.exists(__file__):
    filename = __file__
    with tokenize.open(__file__) as f:
        setup_py_code = f.read()
else:
    filename = "<auto-generated setuptools caller>"
    setup_py_code = "from setuptools import setup; setup()"

exec(compile(setup_py_code, filename, "exec"))
'"'"''"'"''"'"' % ('"'"'/flash-attention/setup.py'"'"',), "<pip-setuptools-caller>", "exec"))' egg_info --egg-base /tmp/pip-pip-egg-info-4o0z0_8v
cwd: /flash-attention/
Preparing metadata (setup.py) ... error
error: metadata-generation-failed

There are two issues here:
2.1 setuptools is reported as unavailable, even though it is present in PYTHON_SITE_PACKAGES.
2.2 nvcc is not found.

For (2.1):
Even after verifying that setuptools is available in the expected location, setting the env var, and also setting PYTHONPATH, the error persists. Is there a way to identify where the shim that pip wraps around setup.py looks for setuptools during installation? Also, are there any specific version requirements being violated?
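
One thing we plan to rule out (an assumption on our side, not something the log proves): that the interpreter pip shells out to is not the one setuptools was installed into, or that pip's build isolation is hiding site-packages. A quick sketch, using the interpreter path from the log above; the --no-build-isolation flag is what the upstream flash-attention README recommends:

    # Confirm the exact interpreter pip invokes can import setuptools
    /flash-attention/myenv/bin/python -c 'import sys, setuptools; print(sys.executable, setuptools.__version__)'
    # pip can build in an isolated environment that ignores site-packages;
    # disabling that isolation makes it use the setuptools we installed:
    /flash-attention/myenv/bin/python -m pip install . --no-build-isolation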

For (2.2):
We tried setting FLASH_ATTENTION_SKIP_CUDA_BUILD=TRUE as suggested in this issue, assuming nvcc would then not be required, but the problem persists.
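
One detail that may matter here: the log above reports torch.__version__ = 2.3.1+cu121, i.e. a CUDA wheel rather than a ROCm one, which would explain why setup.py takes the build_for_cuda() path and insists on nvcc. A quick sketch to check which backend an installed torch targets:

    # A ROCm wheel reports a "+rocmX.Y" version suffix and a non-None torch.version.hip;
    # a CUDA wheel reports "+cuXXX" and a non-None torch.version.cuda.
    python3 -c 'import torch; print(torch.__version__, torch.version.hip, torch.version.cuda)'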

TL;DR: these are the major issues we need help with:

  1. Which version of Torch should be used with ROCm 6.1.2 to ensure the right changes are made in hipify to avoid hipification of certain files?
  2. Is there a specific version of dependencies that need to be used to avoid the setup.py errors?
  3. How do we resolve the nvcc not found issue, especially when intending to use ROCm instead of CUDA?

It would be helpful if anyone could provide guidance or help in resolving these issues. Thank you!

Operating System

RHEL/UBI9

CPU

NA

GPU

AMD Instinct MI300X, AMD Instinct MI300A

ROCm Version

ROCm 6.1.0

@sancspro

Were you able to find a solution or workaround for this issue? I'm facing the same error with torch.__version__ = 2.4.0+rocm6.1.

I used to install and use flash-attn some time back with Navi32.


jiridanek commented Aug 20, 2024

Install the ROCm devel packages first (https://rocm.docs.amd.com/projects/install-on-linux/en/latest/install/install-overview.html; dnf install rocm in our case) and only install flash-attention after that.
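
On RHEL 9 that boils down to roughly this sketch (repository setup per the linked guide is omitted; the --no-build-isolation flag follows the upstream flash-attention README):

    # Register the ROCm repositories per the install-overview link above, then:
    sudo dnf install rocm
    # with the HIP toolchain present, rebuild from the cloned flash-attention checkout:
    cd /flash-attention && python3 -m pip install . --no-build-isolation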

@sancspro

Thanks for responding. I uninstalled the current ROCm package completely, rebooted, and reinstalled it, going back from 6.2.0 to 6.1.0. Now FA installs and works fine.

@ppanchad-amd

Hi @varshaprasad96. Were you able to resolve your issue? If so, please close the ticket. Thanks!
