[BUG] Unexpected Pyarrow module structure breaks basics cuDF notebook #313
Comments
Hey! Can you downgrade pyarrow by installing `pyarrow=0.15.0` from conda-forge, try again, and report back? I just confirmed that the code from your issue works on app.blazingsql.com. I ran `!conda list` on their setup, and that's the version of pyarrow they have.
From: Str-Gen <[email protected]>
Sent: Monday, October 5, 2020 7:56 AM
To: rapidsai-community/notebooks-contrib <[email protected]>
Cc: Subscribed <[email protected]>
Subject: [rapidsai-community/notebooks-contrib] [BUG] Unexpected Pyarrow module structure breaks basics cuDF notebook (#313)
Describe the bug
Creating a fresh Anaconda environment and installing the packages with the command recommended on the RAPIDS "Get Started" page leads to an environment in which even the basics notebook in this repo (getting_started_notebooks/basics/Getting_Started_with_cuDF.ipynb) breaks when it runs
gdf = cudf.DataFrame.from_pandas(df)
which fails with ModuleNotFoundError: No module named 'pyarrow._cuda'
The failing statement at the bottom of the call chain is in pyarrow's own source files inside the conda environment.
More specifically, this statement in cuda.py in the pyarrow package causes the crash:
from pyarrow._cuda import (Context, IpcMemHandle, CudaBuffer,
                           HostBuffer, BufferReader, BufferWriter,
                           new_host_buffer,
                           serialize_record_batch, read_message,
                           read_record_batch)
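For anyone triaging the same error, a quick way to check whether the compiled submodule is visible at all, without going through cuDF, is to ask the import system for it. This is only a minimal sketch: `has_submodule` is a hypothetical helper name, and it reports whether Python can locate `pyarrow._cuda`, not whether the module would load correctly.

```python
import importlib.util


def has_submodule(package: str, submodule: str) -> bool:
    """Return True if `package.submodule` can be located by the import system.

    The submodule itself is not imported, but looking it up does import
    the parent package.
    """
    # If the parent package is missing, report False instead of raising.
    if importlib.util.find_spec(package) is None:
        return False
    return importlib.util.find_spec(f"{package}.{submodule}") is not None


if __name__ == "__main__":
    # In the broken environment described here this would be expected to print False.
    print("pyarrow._cuda present:", has_submodule("pyarrow", "_cuda"))
```

Running this with the same kernel the notebook uses matters, since a different interpreter may see a different pyarrow install.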
Steps/Code to reproduce bug
1. Create a fresh conda environment (blank)
conda create -n [env-name]
2. Install the packages via the suggested command on https://rapids.ai/start.html#get-rapids
conda install -c rapidsai -c nvidia -c conda-forge -c defaults rapids=0.15 python=3.8 cudatoolkit=11.0
3. Open the getting_started_notebooks/basics/Getting_Started_with_cuDF.ipynb with the fresh environment as kernel
4. Try to run the first cell, in which a GPU dataframe is constructed
Expected behavior
Normal execution, with the dataframe loaded into GPU memory.
Environment details:
* Environment location: Bare-metal Arch Linux 5.8.12
* Method of RAPIDS libraries install: Conda
* Nvidia driver version: 455.23.04-3
* Nvidia-utils version: 455.23.04-1
Additional context
I do not run Ubuntu 18.04 or CentOS 7. I run Arch Linux with the 5.8.12 kernel, but that shouldn't matter.
conda list cudf
Name           Version  Build                        Channel
cudf           0.15.0   cuda_11.0_py38_g71cb8c0e0_0  rapidsai
cudf_kafka     0.15.0   py38_g71cb8c0e0_0            rapidsai
dask-cudf      0.15.0   py38_g71cb8c0e0_0            rapidsai
libcudf        0.15.0   cuda11.0_g71cb8c0e0_0        rapidsai
libcudf_kafka  0.15.0   g71cb8c0e0_0                 rapidsai

conda list pyarrow
Name           Version  Build                        Channel
pyarrow        0.17.1   py38h1234567_11_cuda         conda-forge
The pyarrow build string specifically mentions cuda, so I would expect the feature to be included in the build.
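To make the build-string reasoning concrete, here is a small sketch that pulls the Python tag and a CUDA marker out of a conda build string such as the ones listed above. `describe_build` is a hypothetical helper, and it assumes the common `pyXY` naming convention; the build string is only a label chosen by the packager, so a `cuda` marker suggests, but does not prove, that the compiled `_cuda` module is present.

```python
import re


def describe_build(build: str) -> dict:
    """Extract a Python version tag and a CUDA marker from a conda build string."""
    # Assumes tags like "py38" or "py310": one major digit, then the minor digits.
    py = re.search(r"py(\d)(\d+)", build)
    return {
        "python": f"{py.group(1)}.{py.group(2)}" if py else None,
        # A plain substring test; this is a naming hint, not a feature check.
        "cuda_variant": "cuda" in build.lower(),
    }


if __name__ == "__main__":
    for build in ("py38h1234567_11_cuda", "cuda_11.0_py38_g71cb8c0e0_0"):
        print(build, describe_build(build))
```

Comparing the Python tag of each package against the interpreter actually in use is a cheap first check when a compiled submodule goes missing.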
I have solved it, but the process has been strange.

Conda would not let me downgrade pyarrow to 0.15.0 because of numerous conflicts. The foremost was a requirement to downgrade Python from 3.8.x to at most 3.7.x; the downgrade was also marked as conflicting with most of the RAPIDS packages and with cudatoolkit.

After creating another clean environment, this time with Python 3.7 (3.7.8 after install), the sample notebook manages to load the dataframe onto the GPU. The downgrade from python-3.8.5 to python-3.7.8 (and with it all the builds of the RAPIDS components and pyarrow for py3.7) seems to work together.

Nonetheless I find it strange: I went into the environment's source code for pyarrow, and even though the pyarrow version is still 0.17.1, so its cuda.py file still contains the same import, it no longer crashes.

from pyarrow._cuda import (Context, IpcMemHandle, CudaBuffer,
                           HostBuffer, BufferReader, BufferWriter,
                           new_host_buffer,
                           serialize_record_batch, read_message,
                           read_record_batch)

Final environment: still pyarrow 0.17.1, 0.15.0 for the RAPIDS libraries (cudf etc.), and still version 11.0.221 of cudatoolkit (I did not include this in the initial post, but it was the same then). Only Python changed, from 3.8.5 to 3.7.8.

Conclusion: it works (for now), and I can't pinpoint exactly what changed to make it work, which is frustrating.
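Since the pyarrow version and cuda.py are identical in both environments, one thing that could differ is the compiled binary behind `pyarrow._cuda`. The sketch below lists the files in the installed pyarrow directory that look like compiled builds of `_cuda`, so the two environments can be compared side by side. `find_extension_files` is a hypothetical helper, and this is only a diagnostic idea under the assumption that the binary is missing or mismatched; it is not a confirmed explanation of this issue.

```python
import importlib.machinery
import importlib.util
import pathlib
import sys


def find_extension_files(package_dir, stem):
    """List file names in package_dir that look like compiled builds of `stem`."""
    directory = pathlib.Path(package_dir)
    # Suffixes this interpreter accepts for extension modules (e.g. ".so", ".pyd").
    suffixes = tuple(importlib.machinery.EXTENSION_SUFFIXES)
    return sorted(
        p.name
        for p in directory.iterdir()
        if p.name.startswith(stem + ".") and p.name.endswith(suffixes)
    )


if __name__ == "__main__":
    print("python:", sys.version.split()[0])
    spec = importlib.util.find_spec("pyarrow")
    if spec is None or not spec.submodule_search_locations:
        print("pyarrow is not installed in this environment")
    else:
        pkg_dir = list(spec.submodule_search_locations)[0]
        print("pyarrow dir:", pkg_dir)
        print("_cuda binaries:", find_extension_files(pkg_dir, "_cuda"))
```

An empty list in the Python 3.8 environment and a non-empty one in the 3.7 environment would point at the package build rather than at cuDF or the notebook.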
When I have a minute, I'll spin up a new conda env and test it out (may be later this week). Happy you're back up and running, sad it wasn't straightforward and still remains strange. Sometimes a clean reinstall fixes some of these issues. I'll assign this to myself and get back to you. Please let me know if anything else isn't working as expected.
This bug still exists in 2022.04.
@Str-Gen @taureandyernv