
[BUG] Unexpected Pyarrow module structure breaks basics cuDF notebook #313

Open
Str-Gen opened this issue Oct 5, 2020 · 5 comments
Labels: bug (Something isn't working)
Str-Gen commented Oct 5, 2020

Describe the bug
Creating a fresh Anaconda environment and installing packages via the preferred command from the RAPIDS get-started page leads to an environment in which even the basics notebook in this repo (getting_started_notebooks/basics/Getting_Started_with_cuDF.ipynb) breaks on gdf = cudf.DataFrame.from_pandas(df), failing with ModuleNotFoundError: No module named 'pyarrow._cuda'.
The failing import at the bottom of the chain lives in the pyarrow source files inside the conda environment. Specifically, this statement in the pyarrow package's cuda.py causes the crash:

from pyarrow._cuda import (Context, IpcMemHandle, CudaBuffer,
                           HostBuffer, BufferReader, BufferWriter,
                           new_host_buffer,
                           serialize_record_batch, read_message,
                           read_record_batch)
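A quick way to probe for this failure mode without triggering the full cudf import chain is to ask whether the compiled `pyarrow._cuda` extension can even be located. A stdlib-only sketch (runs safely even when pyarrow is not installed at all):

```python
import importlib.util

def cuda_arrow_available() -> bool:
    # True only when the compiled pyarrow._cuda extension can be located
    # on sys.path; find_spec does not import the extension itself.
    try:
        return importlib.util.find_spec("pyarrow._cuda") is not None
    except ModuleNotFoundError:
        # Raised when the parent package "pyarrow" is missing entirely
        return False

print(cuda_arrow_available())
```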

Steps/Code to reproduce bug

  1. Create a fresh conda environment (blank)
    conda create -n [env-name]
  2. Install the packages via the suggested command on https://rapids.ai/start.html#get-rapids
    conda install -c rapidsai -c nvidia -c conda-forge -c defaults rapids=0.15 python=3.8 cudatoolkit=11.0
  3. Open the getting_started_notebooks/basics/Getting_Started_with_cuDF.ipynb with the fresh environment as kernel
  4. Try to run the first cell in which a gpu dataframe is constructed
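The failing cell boils down to a two-line round trip from pandas to cudf. A guarded sketch (cudf and pandas are only present in a RAPIDS environment, so the imports are wrapped for illustration):

```python
# Minimal version of the notebook's first cell. In the broken environment,
# "import cudf" itself raises ModuleNotFoundError: No module named 'pyarrow._cuda'.
try:
    import pandas as pd
    import cudf

    df = pd.DataFrame({"a": [0, 1, 2], "b": [0.1, 0.2, 0.3]})
    gdf = cudf.DataFrame.from_pandas(df)  # copies the frame into GPU memory
    result = type(gdf).__name__
except ModuleNotFoundError as exc:
    result = f"import failed: {exc}"

print(result)
```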

Expected behavior
Normal execution with the dataframe now loaded onto gpu memory.

Environment details:

  • Environment location: Bare-metal Arch Linux 5.8.12
  • Method of RAPIDS libraries install: Conda
  • Nvidia driver version: 455.23.04-3
  • Nvidia-utils version: 455.23.04-1

Additional context
I do not run Ubuntu 18.04 or CentOS 7; I run an Arch Linux 5.8.12 kernel, but that shouldn't matter.

conda list cudf
Name Version Build Channel
cudf 0.15.0 cuda_11.0_py38_g71cb8c0e0_0 rapidsai
cudf_kafka 0.15.0 py38_g71cb8c0e0_0 rapidsai
dask-cudf 0.15.0 py38_g71cb8c0e0_0 rapidsai
libcudf 0.15.0 cuda11.0_g71cb8c0e0_0 rapidsai
libcudf_kafka 0.15.0 g71cb8c0e0_0 rapidsai

conda list pyarrow
Name Version Build Channel
pyarrow 0.17.1 py38h1234567_11_cuda conda-forge

The pyarrow build string specifically mentions cuda, so I would expect the feature to be included in the build.

@Str-Gen Str-Gen added the bug Something isn't working label Oct 5, 2020
taureandyernv (Contributor) commented Oct 5, 2020 via email

Str-Gen (Author) commented Oct 5, 2020

I have solved it, but the process was strange:

Conda would not let me downgrade pyarrow to 0.15.0 due to numerous conflicts, foremost a requirement to downgrade Python from 3.8.x to at most 3.7.x; the pin was also marked in conflict with most of the RAPIDS packages and with cudatoolkit.

After creating another clean environment, now with Python 3.7 (3.7.8 after install), the sample notebook manages to load the dataframe onto the GPU. The downgrade from python-3.8.5 to python-3.7.8 (and consequently the py3.7 builds of all the RAPIDS components and pyarrow) works. Nonetheless I find it strange: I went into the environment's source code for pyarrow, and even though the version is still 0.17.1 and its cuda.py therefore still contains the same import, it no longer crashes.

from pyarrow._cuda import (Context, IpcMemHandle, CudaBuffer,
                           HostBuffer, BufferReader, BufferWriter,
                           new_host_buffer,
                           serialize_record_batch, read_message,
                           read_record_batch)
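The "same version, same file, different outcome" puzzle is usually a question of which copy of a package wins on sys.path. A stdlib-only way to see where an import would actually resolve from, without importing it (`json` stands in for `pyarrow` here, since pyarrow may not be installed):

```python
import importlib.util

# find_spec locates a module without executing it; spec.origin is the path
# of the file that "import <name>" would actually load.
spec = importlib.util.find_spec("json")  # substitute "pyarrow" in a real env
print(spec.origin)
```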

Final environment:

Still pyarrow 0.17.1, still 0.15.0 for the RAPIDS libraries (cudf etc.), and still cudatoolkit 11.0.221 (I did not include this in the initial post, but it was installed). Only Python changed, from 3.8.5 to 3.7.8.

Name Version Build Channel
pyarrow 0.17.1 py37h1234567_11_cuda conda-forge

Name Version Build Channel
cudf 0.15.0 cuda_11.0_py37_g71cb8c0e0_0 rapidsai
cudf_kafka 0.15.0 py37_g71cb8c0e0_0 rapidsai
dask-cudf 0.15.0 py37_g71cb8c0e0_0 rapidsai
libcudf 0.15.0 cuda11.0_g71cb8c0e0_0 rapidsai
libcudf_kafka 0.15.0 g71cb8c0e0_0 rapidsai

Name Version Build Channel
cudatoolkit 11.0.221 h6bb024c_0 nvidia
dask-cuda 0.15.0 py37_0 rapidsai

Name Version Build Channel
python 3.7.8 h6f2ec95_1_cpython conda-forge

Conclusion: it works (for now) and I can't pinpoint exactly what changed to make it work, which is frustrating.

Additional context on the downgrade attempt
Uninstalling just pyarrow with a forced uninstall (a regular uninstall would have taken 50+ dependent packages with it), followed by an attempt to install it with
conda install -c conda-forge pyarrow=0.15.0 leads to this output:

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: /
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment:

Specifications:

  • pyarrow=0.15.0 -> python[version='>=2.7,<2.8.0a0|>=3.7,<3.8.0a0|>=3.6,<3.7.0a0']

Your python: python=3.8

If python is on the left-most side of the chain, that's the version you've asked for.
When python appears to the right, that indicates that the thing on the left is somehow
not available for the python version you are constrained to. Note that conda will not
change your python version to a different minor version unless you explicitly specify
that.

The following specifications were found to be incompatible with your system:

  • feature:/linux-64::__cuda==11.1=0
  • feature:|@/linux-64::__cuda==11.1=0

Your installed version is: 11.1
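The solver's complaint can be sanity-checked by hand: per the ranges in the output above, pyarrow 0.15.0 on conda-forge was built only for Python 2.7, 3.6, and 3.7, so a 3.8 interpreter matches none of them. An illustrative checker (a toy spec parser, not conda's real solver):

```python
import re

def satisfies(version, spec):
    """Toy check of a (major, minor) version against a spec like '>=3.7,<3.8'."""
    for clause in spec.split(","):
        m = re.match(r"(>=|<=|<|>)(\d+)\.(\d+)", clause)
        op, bound = m.group(1), (int(m.group(2)), int(m.group(3)))
        ok = {">=": version >= bound, "<=": version <= bound,
              "<": version < bound, ">": version > bound}[op]
        if not ok:
            return False
    return True

# Interpreter ranges from the solver output above ("0a0" suffixes dropped)
ranges = [">=2.7,<2.8", ">=3.6,<3.7", ">=3.7,<3.8"]
print(any(satisfies((3, 8), r) for r in ranges))  # False: no 3.8 build exists
print(any(satisfies((3, 7), r) for r in ranges))  # True: 3.7 is covered
```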

taureandyernv (Contributor) commented
When I have a minute, I'll spin up a new conda env and test it out (maybe later this week). Happy you're back up and running; sad it wasn't straightforward and still remains strange. Sometimes a clean reinstall fixes these issues. I'll assign this to myself and get back to you. Please let me know if anything else isn't working as expected.

diggerdu commented

This bug still exists in 2022.04.

diggerdu commented Apr 26, 2022

@Str-Gen @taureandyernv
I inspected the import procedure of cudf with strace python -m cudf and found that the program imports gpuarrow.pyx from ~/.local/lib/pythonxxx/ first, rather than the file under conda/env/xxx/lib/pythonxxx. The import error disappears after I adjust the order of sys.path. So a simple fix: add the following two lines at the top of the Python script:

import sys
sys.path = sorted(sys.path, key=lambda s:'envs' not in s)
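This one-liner works because Python's sort is stable and False orders before True: keying on 'envs' not in s moves conda-env entries to the front while preserving relative order within each group. A small demonstration with made-up paths:

```python
# Illustrative sys.path reordering: entries containing "envs" get key False
# and are sorted ahead of user-site entries (key True); the sort is stable,
# so relative order within each group is preserved.
paths = [
    "/home/user/.local/lib/python3.7/site-packages",
    "/opt/conda/envs/rapids/lib/python3.7/site-packages",
    "/opt/conda/envs/rapids/lib/python3.7",
]
reordered = sorted(paths, key=lambda s: "envs" not in s)
print(reordered[0])  # /opt/conda/envs/rapids/lib/python3.7/site-packages
```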
