Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add FutureWarning of UCX < 1.11.1 deprecation #779

Merged

Conversation

pentschev
Copy link
Member

@pentschev pentschev commented Sep 9, 2021

Besides stability, UCX 1.11.1 includes important features such as:

  • Automatic InfiniBand<->CUDA device mapping without specifying UCX_NET_DEVICES explicitly;
  • InfiniBand allocation registration cache without patching source;
  • Endpoint error handling with CUDA IPC;
  • etc.

All features above will simplify usage both with and without Dask and RAPIDS. Therefore, we would like to deprecate older versions of UCX, to be removed at a later release.

@pentschev pentschev requested a review from a team as a code owner September 9, 2021 18:26
@pentschev
Copy link
Member Author

I'm not necessarily tied to this idea, but perhaps we should start warning users of UCX < 1.11.1 deprecation? Also it doesn't need to be right now, we could delay it for 21.12 or another RAPIDS release.

@quasiben
Copy link
Member

quasiben commented Sep 9, 2021

I'm not against deprecating but I think the sentiment you raised around waiting a bit is probably correct. However, I don't know of anyone using UCX-Py who can't upgrade UCX

@pentschev
Copy link
Member Author

I'm not against deprecating but I think the sentiment you raised around waiting a bit is probably correct. However, I don't know of anyone using UCX-Py who can't upgrade UCX

It's not too much on upgrading or not, but fundamentally older UCX (such as 1.9) lacked a lot of functionality that are now implemented in UCX 1.11.1 (e.g., automatic IB device detection, IB allocation registration cache, endpoint error handling with CUDA IPC) that really simplifies things for us by a lot, and removing that code sometime in the near future would be great. Therefore, alerting users would probably be a good first step, also to hear from users who can't potentially upgrade rather sooner than later.

@jakirkham
Copy link
Member

This is just adding a warning right? Not actually dropping anything. Think it makes sense to get this in sooner rather than later. It will give us flexibility later on when deciding whether we want to drop or not. The longer we wait to communicate this to users, the longer it takes for us to actually do the deprecation, which may come at a point where things are more painful

@jakirkham
Copy link
Member

That said, it's probably worth raising this with key stakeholders first just so they are aware and we can make any changes to the deprecation schedule if needed.

@pentschev
Copy link
Member Author

cc @dantegd @cjnolet for cuML
cc @BradReesWork @rlratzel for cuGraph
cc @randerzander @beckernick @VibhuJawa @ayushdg for Dask workflows
cc @JohnZed generally for RAPIDS
cc @benjha @MattBBaker @jglaser for Summit/ORNL

@benjha
Copy link

benjha commented Sep 9, 2021

Hi all,

What do you recommend for a "quick" check ? Build ucx-py 1.11.1 in a stand alone environment and run the test suite ?

@pentschev
Copy link
Member Author

pentschev commented Sep 10, 2021

@benjha yes, it would be good to run UCX-Py and Dask-CUDA tests if you can:

# UCX-Py
git clone https://github.com/rapidsai/ucx-py
cd ucx-py
pip install .
pytest --cache-clear -vs ucp/_libs/tests/
UCX_TLS=rc,tcp,cuda_copy,cuda_ipc pytest --cache-clear -vs tests/

# Dask-CUDA
git clone https://github.com/rapidsai/dask-cuda
cd dask-cuda
pip install .
pytest -vs --cache-clear dask_cuda/tests/

EDIT: For the async UCX-Py API I've added UCX_TLS. It turns out that the shm transport causes some of the tests to hang, which is a transport that still requires some work, see #565 .

@pentschev
Copy link
Member Author

I've also checked with DLFW folks offline, and they're already running some RC version of UCX 1.11.1, so dropping compatibility code isn't a problem for them, as that only affects UCX 1.9 and older.

@jakirkham
Copy link
Member

How do we feel about merging this?

@pentschev pentschev changed the base branch from branch-0.22 to branch-0.23 November 19, 2021 15:14
@pentschev
Copy link
Member Author

Since no objections were raised in over 2 months, I'll plan to merge this later today. We'll discuss a plan to completely drop support for UCX < 1.11.1 early next year.

@pentschev
Copy link
Member Author

rernu tests

@pentschev
Copy link
Member Author

rerun tests

@pentschev
Copy link
Member Author

I'm doing some final testing, will merge this tomorrow or Monday.

@pentschev pentschev merged commit fbd6071 into rapidsai:branch-0.23 Nov 22, 2021
rapids-bot bot pushed a commit to rapidsai/dask-cuda that referenced this pull request Nov 22, 2021
After rapidsai/ucx-py#779 , this is required to run pytests when using UCX 1.9. To be removed in a future release.

Authors:
  - Peter Andreas Entschev (https://github.com/pentschev)

Approvers:
  - https://github.com/jakirkham

URL: #799
@pentschev pentschev deleted the deprecate-ucx-1.11.0-and-older branch November 24, 2021 13:16
@pentschev pentschev mentioned this pull request Jan 18, 2022
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants