Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-37621: [Packaging][Conda] Sync conda recipes with feedstocks #37624

Merged
merged 9 commits into from
Sep 28, 2023

Conversation

h-vetinari
Copy link
Contributor

@h-vetinari h-vetinari commented Sep 8, 2023

Syncing after the release of 13.0.0 + a couple of migrations (state as of conda-forge/arrow-cpp-feedstock#1168 & conda-forge/r-arrow-feedstock#68)

Relevant updates:

Also some further hardening of the activation scripts & clean-ups for dependencies & test skips.

@github-actions
Copy link

github-actions bot commented Sep 8, 2023

⚠️ GitHub issue #37621 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added the awaiting review Awaiting review label Sep 8, 2023
@h-vetinari
Copy link
Contributor Author

@github-actions crossbow submit -g conda

@github-actions
Copy link

github-actions bot commented Sep 8, 2023

Revision: c04c157

Submitted crossbow builds: ursacomputing/crossbow @ actions-630421cd79

Task Status
conda-clean Azure
conda-linux-aarch64-cpu-py3 Azure
conda-linux-aarch64-cpu-r42 Azure
conda-linux-aarch64-cpu-r43 Azure
conda-linux-aarch64-cuda-py3 Azure
conda-linux-ppc64le-cpu-py3 Azure
conda-linux-ppc64le-cuda-py3 Azure
conda-linux-x64-cpu-py3 Azure
conda-linux-x64-cpu-r42 Azure
conda-linux-x64-cpu-r43 Azure
conda-linux-x64-cuda-py3 Azure
conda-osx-arm64-cpu-py3 Azure
conda-osx-arm64-cpu-r42 Azure
conda-osx-arm64-cpu-r43 Azure
conda-osx-x64-cpu-py3 Azure
conda-osx-x64-cpu-r42 Azure
conda-osx-x64-cpu-r43 Azure
conda-win-x64-cpu-py3 Azure
conda-win-x64-cpu-r41 Azure
conda-win-x64-cuda-py3 Azure

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Sep 8, 2023
@h-vetinari
Copy link
Contributor Author

Something is not working out with the CUDA cross-compilation here:

[82/139] Linking CXX shared module release/_cuda.cpython-310-powerpc64le-linux-gnu.so
FAILED: release/_cuda.cpython-310-powerpc64le-linux-gnu.so 
: && $BUILD_PREFIX/bin/powerpc64le-conda-linux-gnu-c++ -fPIC -Wno-noexcept-type  -Wall -fno-semantic-interposition -fvisibility-inlines-hidden -std=c++17 -fmessage-length=0 -mcpu=power8 -mtune=power8 -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O3 -pipe -isystem $PREFIX/include -fdebug-prefix-map=/build/apache-arrow_1694161537567/work=/usr/local/src/conda/pyarrow-14.0.0.dev124 -fdebug-prefix-map=$PREFIX=/usr/local/src/conda-prefix -isystem /usr/local/cuda/targets/ppc64le-linux/include -fdiagnostics-color=always  -fno-omit-frame-pointer -Wno-unused-variable -Wno-maybe-uninitialized -O3 -DNDEBUG -O2 -ftree-vectorize  -Wl,-O2 -Wl,--sort-common -Wl,--as-needed -Wl,-z,relro -Wl,-z,now -Wl,--allow-shlib-undefined -Wl,-rpath,$PREFIX/lib -Wl,-rpath-link,$PREFIX/lib -L$PREFIX/lib -shared  -o release/_cuda.cpython-310-powerpc64le-linux-gnu.so CMakeFiles/_cuda.dir/_cuda.cpp.o  -Wl,-rpath,"\$ORIGIN"  release/libarrow_python.so  $PREFIX/lib/libarrow_cuda.so.1400.0.0  $PREFIX/lib/libarrow_dataset.so.1400.0.0  $PREFIX/lib/libarrow_acero.so.1400.0.0  $PREFIX/lib/libparquet.so.1400.0.0  $PREFIX/lib/libarrow.so.1400.0.0  /usr/local/cuda-11.2/targets/x86_64-linux/lib/stubs/libcuda.so && :
$BUILD_PREFIX/powerpc64le-conda-linux-gnu/bin/ld: /usr/local/cuda-11.2/targets/x86_64-linux/lib/stubs/libcuda.so: error adding symbols: file in wrong format
collect2: error: ld returned 1 exit status

It does work in conda-forge, and CUDA on aarch/ppc is a bit more of a niche setup, so I guess we could also drop it here. For completeness, I looked at the diff for the build scripts, but I couldn't determine something that would touch upon where the CUDA_HOME would point -- the error is pretty clearly that we're not pointing to the ppc version of libcuda.so:

$BUILD_PREFIX/powerpc64le-conda-linux-gnu/bin/ld: /usr/local/cuda-11.2/targets/x86_64-linux/lib/stubs/libcuda.so: error adding symbols: file in wrong format

@h-vetinari
Copy link
Contributor Author

Is it intended that gandiva now has a run-time dependence on libllvm?

    import pyarrow.gandiva
ImportError: libLLVM-15.so: cannot open shared object file: No such file or directory

Adding the respective host-dependence (+ respective run-export) is not hard, I'm just double-checking that this is intentional.

CC @pitrou @kou @raulcd @jorisvandenbossche

PS. This is a rare case where windows looks better than unix, only one test failure:

=========================== short test summary info ===========================
FAILED tests/parquet/test_basic.py::test_fastparquet_cross_compatibility - As...
= 1 failed, 7154 passed, 355 skipped, 12 deselected, 22 xfailed, 2 xpassed, 8 warnings in 302.94s (0:05:02) =

@kou
Copy link
Member

kou commented Sep 8, 2023

Yes. It's caused by #37412.
We use libLLVM.so by default on non-Windows. If you don't like it you can add -DARROW_LLVM_USE_SHARED=OFF to use static LLVM libraries.
(I think that libLLVM.so is better to reduce duplication.)

@h-vetinari
Copy link
Contributor Author

It does work in conda-forge, and CUDA on aarch/ppc is a bit more of a niche setup, so I guess we could also drop it here.

I started debugging this PR on conda-forge infrastructure in conda-forge/arrow-cpp-feedstock#1170, and it turns out that the problem hits us there as well. I'm not sure what changed since 13.0, but it seems something is now overriding (or not respecting) our CUDA_HOME, which is necessary to make cross-compilation work with CUDA...

@kou
Copy link
Member

kou commented Sep 13, 2023

Could you try -DCUDAToolkit_ROOT=${CUDA_HOME} instead of -DCUDA_TOOLKIT_ROOT_DIR=${CUDA_HOME}?

CUDA_TOOLKIT_ROOT_DIR is for old FindCUDA: https://cmake.org/cmake/help/latest/module/FindCUDA.html
We use FindCUDAToolkit for new CMake: https://cmake.org/cmake/help/latest/module/FindCUDAToolkit.html

@h-vetinari
Copy link
Contributor Author

Thanks for the quick response, that sounds like a very promising candidate! Rebased & retriggered conda-forge/arrow-cpp-feedstock#1170. Will sync back to this PR if passing.

@github-actions github-actions bot added Component: Python awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Sep 17, 2023
python/pyarrow/tests/test_fs.py Outdated Show resolved Hide resolved
Comment on lines +389 to +390
# currently broken
{% set tests_to_skip = tests_to_skip + " or test_fastparquet_cross_compatibility" %}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haven't raised an issue for this yet, but this is consistently failing. Not sure what's the difference between our CI and the one here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know a CI build log that shows the error for this? (the last ones will have this already skipped)
(to have an idea what is going on here)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_____________________ test_fastparquet_cross_compatibility _____________________

tempdir = PosixPath('/tmp/pytest-of-conda/pytest-0/test_fastparquet_cross_compati0')

    @pytest.mark.pandas
    @pytest.mark.fastparquet
    @pytest.mark.filterwarnings("ignore:RangeIndex:FutureWarning")
    @pytest.mark.filterwarnings("ignore:tostring:DeprecationWarning:fastparquet")
    def test_fastparquet_cross_compatibility(tempdir):
        fp = pytest.importorskip('fastparquet')
    
        df = pd.DataFrame(
            {
                "a": list("abc"),
                "b": list(range(1, 4)),
                "c": np.arange(4.0, 7.0, dtype="float64"),
                "d": [True, False, True],
                "e": pd.date_range("20130101", periods=3),
                "f": pd.Categorical(["a", "b", "a"]),
                # fastparquet writes list as BYTE_ARRAY JSON, so no roundtrip
                # "g": [[1, 2], None, [1, 2, 3]],
            }
        )
        table = pa.table(df)
    
        # Arrow -> fastparquet
        file_arrow = str(tempdir / "cross_compat_arrow.parquet")
        pq.write_table(table, file_arrow, compression=None)
    
        fp_file = fp.ParquetFile(file_arrow)
        df_fp = fp_file.to_pandas()
>       tm.assert_frame_equal(df, df_fp)

pyarrow/tests/parquet/test_basic.py:741: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
testing.pyx:55: in pandas._libs.testing.assert_almost_equal
    ???
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

>   ???
E   AssertionError: DataFrame.iloc[:, 3] (column name="d") are different
E   
E   DataFrame.iloc[:, 3] (column name="d") values are different (66.66667 %)
E   [index]: [0, 1, 2]
E   [left]:  [True, False, True]
E   [right]: [False, False, False]
E   At positional index 0, first diff: True != False

testing.pyx:173: AssertionError

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Searching for fastparquet in our code base, I assume that we actually don't have any CI build that includes it ... So we are just not running that test anywhere, whoops.

Now, the cross-compatibility is also tested in the pandas test suite, so maybe it's not too important to have it here as well. Anyway, opened an issue at #37853

Copy link
Member

@jorisvandenbossche jorisvandenbossche Sep 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Although fastparquet reading a column of boolens wrongly for a pyarrow-written file seems a quite serious issue ..
Now, similar data is used in the pandas test suite, where this is passing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising the issue. I agree it does looks potentially serious, but I hadn't looked too closely because it was just one test. The first thing would be to add testing here in CI. I think this PR is fine as is (I try to keep the test skips to an absolute minimum and remove them whenever possible; that then gets picked up by the next recipe sync anyway)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took a closer look at it (could reproduce this locally), and so actually it was good to do so, as this is a problem in fastparquet reading parquet files generated by latest pyarrow, see details at #37853 (comment)

But yes, so skipping for now is perfectly fine.

dev/tasks/conda-recipes/arrow-cpp/meta.yaml Outdated Show resolved Hide resolved
@h-vetinari
Copy link
Contributor Author

@github-actions crossbow submit -g conda

@github-actions
Copy link

Revision: d7af614

Submitted crossbow builds: ursacomputing/crossbow @ actions-20144be8d4

Task Status
conda-clean Azure
conda-linux-aarch64-cpu-py3 Azure
conda-linux-aarch64-cpu-r42 Azure
conda-linux-aarch64-cpu-r43 Azure
conda-linux-aarch64-cuda-py3 Azure
conda-linux-ppc64le-cpu-py3 Azure
conda-linux-ppc64le-cuda-py3 Azure
conda-linux-x64-cpu-py3 Azure
conda-linux-x64-cpu-r42 Azure
conda-linux-x64-cpu-r43 Azure
conda-linux-x64-cuda-py3 Azure
conda-osx-arm64-cpu-py3 Azure
conda-osx-arm64-cpu-r42 Azure
conda-osx-arm64-cpu-r43 Azure
conda-osx-x64-cpu-py3 Azure
conda-osx-x64-cpu-r42 Azure
conda-osx-x64-cpu-r43 Azure
conda-win-x64-cpu-py3 Azure
conda-win-x64-cpu-r41 Azure
conda-win-x64-cuda-py3 Azure

@github-actions
Copy link

Revision: 53a4b70

Submitted crossbow builds: ursacomputing/crossbow @ actions-80a2017be8

Task Status
conda-clean Azure
conda-linux-aarch64-cpu-py3 Azure
conda-linux-aarch64-cpu-r42 Azure
conda-linux-aarch64-cpu-r43 Azure
conda-linux-aarch64-cuda-py3 Azure
conda-linux-ppc64le-cpu-py3 Azure
conda-linux-ppc64le-cuda-py3 Azure
conda-linux-x64-cpu-py3 Azure
conda-linux-x64-cpu-r42 Azure
conda-linux-x64-cpu-r43 Azure
conda-linux-x64-cuda-py3 Azure
conda-osx-arm64-cpu-py3 Azure
conda-osx-arm64-cpu-r42 Azure
conda-osx-arm64-cpu-r43 Azure
conda-osx-x64-cpu-py3 Azure
conda-osx-x64-cpu-r42 Azure
conda-osx-x64-cpu-r43 Azure
conda-win-x64-cpu-py3 Azure
conda-win-x64-cpu-r41 Azure
conda-win-x64-cuda-py3 Azure

@h-vetinari
Copy link
Contributor Author

Failures:

linux-64

Failed during artefact upload -- irrelevant

linux-aarch64

Connection lost to agent -- irrelevant

osx-64

=================================== FAILURES ===================================
_____________ test_debug_memory_pool_disabled[system_memory_pool] ______________

pool_factory = <built-in function system_memory_pool>

    @pytest.mark.parametrize('pool_factory', supported_factories())
    def test_debug_memory_pool_disabled(pool_factory):
        res = run_debug_memory_pool(pool_factory.__name__, "")
        # The subprocess either returned successfully or was killed by a signal
        # (due to writing out of bounds), depending on the underlying allocator.
        if os.name == "posix":
            assert res.returncode <= 0
        else:
            res.check_returncode()
>       assert res.stderr == ""
E       AssertionError: assert 'libc++abi: P...ion called!\n' == ''
E         + libc++abi: Pure virtual function called!

pyarrow/tests/test_memory.py:255: AssertionError
----------------------------- Captured stderr call -----------------------------
libc++abi: Pure virtual function called!

I can skip the respective test on osx if that's what people prefer. Otherwise this PR should be ready.

@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Sep 25, 2023
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Sep 25, 2023
@h-vetinari
Copy link
Contributor Author

Anything else to do here?

@jorisvandenbossche
Copy link
Member

I don't think so, thanks for the ping, and for all the work here!

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting changes Awaiting changes labels Sep 28, 2023
@jorisvandenbossche jorisvandenbossche merged commit c9674bc into apache:main Sep 28, 2023
8 checks passed
@jorisvandenbossche jorisvandenbossche removed the awaiting merge Awaiting merge label Sep 28, 2023
@h-vetinari h-vetinari deleted the conda_sync branch September 28, 2023 11:58
@conbench-apache-arrow
Copy link

After merging your PR, Conbench analyzed the 6 benchmarking runs that have been run so far on merge-commit c9674bc.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about possible false positives for unstable benchmarks that are known to sometimes produce them.

etseidl pushed a commit to etseidl/arrow that referenced this pull request Sep 28, 2023
…apache#37624)

Syncing after the release of 13.0.0 + a couple of migrations (state as of conda-forge/arrow-cpp-feedstock#1168 & conda-forge/r-arrow-feedstock#68)

Relevant updates:
* we're not building twice for different protobuf versions anymore
* new abseil version (fixes apache#36908)
* we've finally upgraded the aws-sdk to 1.11
* the default R versions (on unix) are now 4.2 & 4.3.

Also some further hardening of the activation scripts & clean-ups for dependencies & test skips.
* Closes: apache#37621

Lead-authored-by: H. Vetinari <[email protected]>
Co-authored-by: h-vetinari <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
JerAguilon pushed a commit to JerAguilon/arrow that referenced this pull request Oct 23, 2023
…apache#37624)

Syncing after the release of 13.0.0 + a couple of migrations (state as of conda-forge/arrow-cpp-feedstock#1168 & conda-forge/r-arrow-feedstock#68)

Relevant updates:
* we're not building twice for different protobuf versions anymore
* new abseil version (fixes apache#36908)
* we've finally upgraded the aws-sdk to 1.11
* the default R versions (on unix) are now 4.2 & 4.3.

Also some further hardening of the activation scripts & clean-ups for dependencies & test skips.
* Closes: apache#37621

Lead-authored-by: H. Vetinari <[email protected]>
Co-authored-by: h-vetinari <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
JerAguilon pushed a commit to JerAguilon/arrow that referenced this pull request Oct 23, 2023
…apache#37624)

Syncing after the release of 13.0.0 + a couple of migrations (state as of conda-forge/arrow-cpp-feedstock#1168 & conda-forge/r-arrow-feedstock#68)

Relevant updates:
* we're not building twice for different protobuf versions anymore
* new abseil version (fixes apache#36908)
* we've finally upgraded the aws-sdk to 1.11
* the default R versions (on unix) are now 4.2 & 4.3.

Also some further hardening of the activation scripts & clean-ups for dependencies & test skips.
* Closes: apache#37621

Lead-authored-by: H. Vetinari <[email protected]>
Co-authored-by: h-vetinari <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
loicalleyne pushed a commit to loicalleyne/arrow that referenced this pull request Nov 13, 2023
…apache#37624)

Syncing after the release of 13.0.0 + a couple of migrations (state as of conda-forge/arrow-cpp-feedstock#1168 & conda-forge/r-arrow-feedstock#68)

Relevant updates:
* we're not building twice for different protobuf versions anymore
* new abseil version (fixes apache#36908)
* we've finally upgraded the aws-sdk to 1.11
* the default R versions (on unix) are now 4.2 & 4.3.

Also some further hardening of the activation scripts & clean-ups for dependencies & test skips.
* Closes: apache#37621

Lead-authored-by: H. Vetinari <[email protected]>
Co-authored-by: h-vetinari <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
dgreiss pushed a commit to dgreiss/arrow that referenced this pull request Feb 19, 2024
…apache#37624)

Syncing after the release of 13.0.0 + a couple of migrations (state as of conda-forge/arrow-cpp-feedstock#1168 & conda-forge/r-arrow-feedstock#68)

Relevant updates:
* we're not building twice for different protobuf versions anymore
* new abseil version (fixes apache#36908)
* we've finally upgraded the aws-sdk to 1.11
* the default R versions (on unix) are now 4.2 & 4.3.

Also some further hardening of the activation scripts & clean-ups for dependencies & test skips.
* Closes: apache#37621

Lead-authored-by: H. Vetinari <[email protected]>
Co-authored-by: h-vetinari <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Sync conda-forge recipe [Python][FlightRPC] Tests segfault on OSX in conda-forge
5 participants