
[BUG] Unexpected Pyarrow module structure breaks basics cuDF notebook #313

Open
Str-Gen opened this issue Oct 5, 2020 · 5 comments
Labels: bug (Something isn't working)
Str-Gen commented Oct 5, 2020

Describe the bug
Creating a fresh Anaconda environment and installing packages via the preferred command from the RAPIDS get-started page leads to an environment in which even the basics notebook in this repo (getting_started_notebooks/basics/Getting_Started_with_cuDF.ipynb) breaks on gdf = cudf.DataFrame.from_pandas(df), failing with ModuleNotFoundError: No module named 'pyarrow._cuda'.
The failing import at the bottom of the chain lives in the pyarrow source files inside the conda environment. Specifically, this statement in the pyarrow package's cuda.py causes the crash:

from pyarrow._cuda import (Context, IpcMemHandle, CudaBuffer,
                           HostBuffer, BufferReader, BufferWriter,
                           new_host_buffer,
                           serialize_record_batch, read_message,
                           read_record_batch)
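A quick way to probe for this failure mode without triggering the full cudf import chain is to ask whether the compiled `pyarrow._cuda` extension can even be located. A stdlib-only sketch (runs safely even when pyarrow is not installed at all):

```python
import importlib.util

def cuda_arrow_available() -> bool:
    # True only when the compiled pyarrow._cuda extension can be located
    # on sys.path; find_spec does not import the extension itself.
    try:
        return importlib.util.find_spec("pyarrow._cuda") is not None
    except ModuleNotFoundError:
        # Raised when the parent package "pyarrow" is missing entirely
        return False

print(cuda_arrow_available())
```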

Steps/Code to reproduce bug

  1. Create a fresh conda environment (blank)
    conda create -n [env-name]
  2. Install the packages via the suggested command on https://rapids.ai/start.html#get-rapids
    conda install -c rapidsai -c nvidia -c conda-forge -c defaults rapids=0.15 python=3.8 cudatoolkit=11.0
  3. Open the getting_started_notebooks/basics/Getting_Started_with_cuDF.ipynb with the fresh environment as kernel
  4. Try to run the first cell in which a gpu dataframe is constructed
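The failing cell boils down to a two-line round trip from pandas to cudf. A guarded sketch (cudf and pandas are only present in a RAPIDS environment, so the imports are wrapped for illustration):

```python
# Minimal version of the notebook's first cell. In the broken environment,
# "import cudf" itself raises ModuleNotFoundError: No module named 'pyarrow._cuda'.
try:
    import pandas as pd
    import cudf

    df = pd.DataFrame({"a": [0, 1, 2], "b": [0.1, 0.2, 0.3]})
    gdf = cudf.DataFrame.from_pandas(df)  # copies the frame into GPU memory
    result = type(gdf).__name__
except ModuleNotFoundError as exc:
    result = f"import failed: {exc}"

print(result)
```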

Expected behavior
Normal execution with the dataframe now loaded onto gpu memory.

Environment details:

  • Environment location: Bare-metal Arch Linux 5.8.12
  • Method of RAPIDS libraries install: Conda
  • Nvidia driver version: 455.23.04-3
  • Nvidia-utils version: 455.23.04-1

Additional context
I do not run Ubuntu 18.04 or CentOS 7; I run an Arch Linux 5.8.12 kernel, but that shouldn't matter.

conda list cudf
Name Version Build Channel
cudf 0.15.0 cuda_11.0_py38_g71cb8c0e0_0 rapidsai
cudf_kafka 0.15.0 py38_g71cb8c0e0_0 rapidsai
dask-cudf 0.15.0 py38_g71cb8c0e0_0 rapidsai
libcudf 0.15.0 cuda11.0_g71cb8c0e0_0 rapidsai
libcudf_kafka 0.15.0 g71cb8c0e0_0 rapidsai

conda list pyarrow
Name Version Build Channel
pyarrow 0.17.1 py38h1234567_11_cuda conda-forge

The pyarrow build string specifically mentions cuda, so I would expect the feature to be included in the build.

@Str-Gen Str-Gen added the bug Something isn't working label Oct 5, 2020
taureandyernv (Contributor) commented Oct 5, 2020 via email

Str-Gen (Author) commented Oct 5, 2020

I have solved it, but the process was strange:

Conda would not let me downgrade pyarrow to 0.15.0 due to numerous conflicts, foremost a requirement to downgrade Python from 3.8.x to at most 3.7.x; the pin was also marked in conflict with most of the RAPIDS packages and with cudatoolkit.

After creating another clean environment, now with Python 3.7 (3.7.8 after install), the sample notebook manages to load the dataframe onto the GPU. The downgrade from python-3.8.5 to python-3.7.8 (and consequently the py3.7 builds of all the RAPIDS components and pyarrow) works. Nonetheless I find it strange: I went into the environment's source code for pyarrow, and even though the version is still 0.17.1 and its cuda.py therefore still contains the same import, it no longer crashes.

from pyarrow._cuda import (Context, IpcMemHandle, CudaBuffer,
                           HostBuffer, BufferReader, BufferWriter,
                           new_host_buffer,
                           serialize_record_batch, read_message,
                           read_record_batch)
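The "same version, same file, different outcome" puzzle is usually a question of which copy of a package wins on sys.path. A stdlib-only way to see where an import would actually resolve from, without importing it (`json` stands in for `pyarrow` here, since pyarrow may not be installed):

```python
import importlib.util

# find_spec locates a module without executing it; spec.origin is the path
# of the file that "import <name>" would actually load.
spec = importlib.util.find_spec("json")  # substitute "pyarrow" in a real env
print(spec.origin)
```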

Final environment:

Still pyarrow 0.17.1, still 0.15.0 for the RAPIDS libraries (cudf etc.), and still cudatoolkit 11.0.221 (I did not include this in the initial post, but it was installed). Only Python changed, from 3.8.5 to 3.7.8.

Name Version Build Channel
pyarrow 0.17.1 py37h1234567_11_cuda conda-forge

Name Version Build Channel
cudf 0.15.0 cuda_11.0_py37_g71cb8c0e0_0 rapidsai
cudf_kafka 0.15.0 py37_g71cb8c0e0_0 rapidsai
dask-cudf 0.15.0 py37_g71cb8c0e0_0 rapidsai
libcudf 0.15.0 cuda11.0_g71cb8c0e0_0 rapidsai
libcudf_kafka 0.15.0 g71cb8c0e0_0 rapidsai

Name Version Build Channel
cudatoolkit 11.0.221 h6bb024c_0 nvidia
dask-cuda 0.15.0 py37_0 rapidsai

Name Version Build Channel
python 3.7.8 h6f2ec95_1_cpython conda-forge

Conclusion: it works (for now) and I can't pinpoint exactly what changed to make it work, which is frustrating.

Additional context on the downgrade attempt
Uninstalling just pyarrow with a forced uninstall (a regular uninstall would have taken 50+ dependent packages with it), followed by an attempt to install it with
conda install -c conda-forge pyarrow=0.15.0 leads to this output:

Collecting package metadata (current_repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Collecting package metadata (repodata.json): done
Solving environment: failed with initial frozen solve. Retrying with flexible solve.
Solving environment: /
Found conflicts! Looking for incompatible packages.
This can take several minutes. Press CTRL-C to abort.
failed

UnsatisfiableError: The following specifications were found
to be incompatible with the existing python installation in your environment:

Specifications:

  • pyarrow=0.15.0 -> python[version='>=2.7,<2.8.0a0|>=3.7,<3.8.0a0|>=3.6,<3.7.0a0']

Your python: python=3.8

If python is on the left-most side of the chain, that's the version you've asked for.
When python appears to the right, that indicates that the thing on the left is somehow
not available for the python version you are constrained to. Note that conda will not
change your python version to a different minor version unless you explicitly specify
that.

The following specifications were found to be incompatible with your system:

  • feature:/linux-64::__cuda==11.1=0
  • feature:|@/linux-64::__cuda==11.1=0

Your installed version is: 11.1
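The solver's complaint can be sanity-checked by hand: per the ranges in the output above, pyarrow 0.15.0 on conda-forge was built only for Python 2.7, 3.6, and 3.7, so a 3.8 interpreter matches none of them. An illustrative checker (a toy spec parser, not conda's real solver):

```python
import re

def satisfies(version, spec):
    """Toy check of a (major, minor) version against a spec like '>=3.7,<3.8'."""
    for clause in spec.split(","):
        m = re.match(r"(>=|<=|<|>)(\d+)\.(\d+)", clause)
        op, bound = m.group(1), (int(m.group(2)), int(m.group(3)))
        ok = {">=": version >= bound, "<=": version <= bound,
              "<": version < bound, ">": version > bound}[op]
        if not ok:
            return False
    return True

# Interpreter ranges from the solver output above ("0a0" suffixes dropped)
ranges = [">=2.7,<2.8", ">=3.6,<3.7", ">=3.7,<3.8"]
print(any(satisfies((3, 8), r) for r in ranges))  # False: no 3.8 build exists
print(any(satisfies((3, 7), r) for r in ranges))  # True: 3.7 is covered
```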

taureandyernv (Contributor) commented
When I have a minute, I'll spin up a new conda env and test it out (maybe later this week). Happy you're back up and running; sad it wasn't straightforward and still remains strange. Sometimes a clean reinstall fixes these issues. I'll assign this to myself and get back to you. Please let me know if anything else isn't working as expected.

diggerdu commented

This bug still exists in 2022.04.

diggerdu commented Apr 26, 2022

@Str-Gen @taureandyernv
I inspected the import procedure of cudf with strace python -m cudf and found that the program imports gpuarrow.pyx from ~/.local/lib/pythonxxx/ first, rather than the file under conda/env/xxx/lib/pythonxxx. The import error disappears after I adjust the order of sys.path. So a simple fix: add the following two lines at the top of the Python script:

import sys
sys.path = sorted(sys.path, key=lambda s:'envs' not in s)
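This one-liner works because Python's sort is stable and False orders before True: keying on 'envs' not in s moves conda-env entries to the front while preserving relative order within each group. A small demonstration with made-up paths:

```python
# Illustrative sys.path reordering: entries containing "envs" get key False
# and are sorted ahead of user-site entries (key True); the sort is stable,
# so relative order within each group is preserved.
paths = [
    "/home/user/.local/lib/python3.7/site-packages",
    "/opt/conda/envs/rapids/lib/python3.7/site-packages",
    "/opt/conda/envs/rapids/lib/python3.7",
]
reordered = sorted(paths, key=lambda s: "envs" not in s)
print(reordered[0])  # /opt/conda/envs/rapids/lib/python3.7/site-packages
```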
