-
Notifications
You must be signed in to change notification settings - Fork 491
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
NVIDIA: Adding cuPQC as a backend for ML-KEM. #2044
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Steven Reeves <[email protected]>
@praveksharma @neil-lindquist for visibility. |
@praveksharma looks like a number of github actions are failing, can you help with this, or point to what needs to be changed? |
@stevenireeves For the code formatting errors, please review this. For the basic test error (and error, actually), please open the "twisty" in the CI error report e.g. here at the "Configure" step to see the config command executed: You can run locally and see the problem: The new config variable apparently isn't initialized in all cases: |
Signed-off-by: Steven Reeves <[email protected]>
@stevenireeves I've pushed 3 commits to https://github.com/open-quantum-safe/liboqs/tree/libOQSxcuPQC, which builds off of your branch, to get it to pass CI. I can't seem to push the changes directly to your fork. Here's how to re-produce the changes:
|
…_####.c and kem/family/kem_scheme.c Signed-off-by: Steven Reeves <[email protected]>
Thank you for making these changes @stevenireeves! Since I reviewed the private PR initially I'll let this one be reviewed by reviewers other than myself. |
I talked to @praveksharma just now to understand some of the approach here, and I understand why the cupqc metadata is patched into the PQ Crystals meta.yml file (since there isn't another meta.yml to patch it into); so that makes sense. But I think a few other pieces could be done differently. I don't think the *.cu files need to be added via the patch file; couldn't they just be added directly without also being added through the patch? And if so then could we give those directories a name that doesn't include pqcrystals -- e.g., just |
@dstebila with this organizational change will we need to mess with the copy_from_upstream stuff as well? |
@stevenireeves I shall work on this and attemp to push the changes directly to your fork. I don't believe copy_from_upstream.py would need to be changed significantly. |
I have the allow edits from maintainers option selected. So should be able to make those changes. |
Signed-off-by: Pravek Sharma <[email protected]>
@dstebila handling the naming shouldn't be an issue. The *.cu file must be sourced with every run of copy_from_upstream.py under delete mode, since there is no upstream the *.cu must be sourced from within the liboqs repo - a patch is the most straightforward way of doing this. Would a separate patch file for the *.cu file be adequate? |
Hmm -- it doesn't to me: We have #2041 in the pipeline with the declared goal of removing the PQCrystals files. So this PR thus would need to be re-done after that landed, right? That doesn't seem sensible. Wouldn't it be much more sensible to have this code be contained in an upstream of its own to pull it from? As far as I can see, this is effectively a different implementation with different license terms, characteristics etc. Such split also would make responsibilities clear: NVIDIA is to support the interfacing to its library (that may very well change over time unbeknownst to OQS) and OQS is responsible for the proper operation within/integration into the OQS APIs (that may change unbeknownst to NVIDIA). Or is the intention by NVIDIA to become committed, well committers and/or maintainers to OQS @stevenireeves ? |
Nvidia maintaining it's own fork of liboqs is not what we want to do. The intention of NVIDIA is to support libOQS on portions of liboqs that utilize cuPQC as it's backend. So if there are significant changes that require alteration of the source files related to cupqc in liboqs, we will help there (so long as we have the internal support). I believe that @praveksharma is in the process of removing the cupqc metadata from PQ Crystals meta.yaml, correct me if I'm wrong. |
Oh, I see now. I knew that we have some local implementations that So I see why the patch file is adding these. |
I think these are talking about two slightly different things. Michael isn't suggesting that NVIDIA maintain a separate fork of liboqs. The way our code has been structured is that we have scripts to pull in source code from (self-contained) implementations of algorithms in other repositories, and add them to liboqs using code generation and patches. This could be done that way, but since it is also only one file that is being added, I don't think it's worth the effort of setting up a separate upstream for that. |
This is not my ask: I only (meant to :) ask whether you'd want to maintain the wrapper code around the cuPQC code in a separate project -- complete with META.yml such as for |
I added the |
Thank you for offering @SWilson4 but I think it is more prudent to create a separate git repository to store cupqc_ml-kem metadata since the pqcrystals upstream is going to be deprecated soon in any case. If things work out okay @stevenireeves or someone else from Nvidia can ownership of the repo. |
@praveksharma I am not sure what metadata you want us to store. Do you want us to write the wrapper code in an additional repository? |
…metadata to separate upstream repo Signed-off-by: Pravek Sharma <[email protected]>
I've updated the import mechanism to use https://github.com/praveksharma/cupqc-mlkem as an upstream. Ideally I would like to modify copy_from_upstream to not require ad hoc upstreams for situations such as this. Could you please review the updated PR @SWilson4 @dstebila @baentsch ? |
Thanks @praveksharma |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically LGTM, @stevenireeves @praveksharma ; just some nits (see single comments). Also the question whether it would be worth while adding build documentation and a "tier 3" entry to PLATFORMS.md (as I guess we'll never be able to test this in CI, right?)
CONFIGURE.md
Outdated
@@ -124,6 +125,13 @@ Dynamically load OpenSSL through `dlopen`. When using liboqs from other cryptogr | |||
|
|||
Only has an effect if the system supports `dlopen` and ELF binary format, such as Linux or BSD family. | |||
|
|||
### OQS_USE_CUPQC | |||
|
|||
Can be `ON` or `OFF`. When `ON`, use NVIDIA's cuPQC library where able (currently just ML-KEM). When this option is enabled, liboqs may not run correctly on machines that lack supported GPUs. To download cuPQC follow the instructions at (https://developer.nvidia.com/cupqc-download/). Detailed descriptions of the API, requirments, and installation guide are in the cuPQC documentation (https://docs.nvidia.com/cuda/cupqc/index.html). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: typo "requirements"
docs/algorithms/kem/ml_kem.md
Outdated
@@ -9,6 +9,10 @@ | |||
- **Primary Source**<a name="primary-source"></a>: | |||
- **Source**: https://github.com/pq-crystals/kyber/commit/10b478fc3cc4ff6215eb0b6a11bd758bf0929cbd with copy_from_upstream patches | |||
- **Implementation license (SPDX-Identifier)**: CC0-1.0 or Apache-2.0 | |||
- **Optimized Implementation sources**: https://github.com/pq-crystals/kyber/commit/10b478fc3cc4ff6215eb0b6a11bd758bf0929cbd with copy_from_upstream patches | |||
- **cupqc-cuda**:<a name="cupqc-cuda"></a> | |||
- **Source**: https://github.com/praveksharma/cupqc-mlkem/commit/adb8454e56979628c07b67eb7d90f9337be6dc30 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this going to move to a non-personal GH repo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is the goal @baentsch. Alternatively, I would like to modify to copy_from_upstream to store changes locally in the case of a minimal "upstream" such as this one.
@@ -38,6 +38,14 @@ upstreams: | |||
kem_meta_path: '{pretty_name_full}_META.yml' | |||
kem_scheme_path: '.' | |||
patches: [pqcrystals-ml_kem.patch] | |||
- | |||
name: cupqc | |||
git_url: https://github.com/praveksharma/cupqc-mlkem.git |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please eventually move to long-term maintained GH repo
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know we'd need to get additional hardware to run CI tests on the CUDA code, but it is possible to add a build-only job?
Signed-off-by: Pravek Sharma <[email protected]>
I'm trying to do this on a maching without a Nvidia GPU and am running into trouble. Do you know if this is possible @stevenireeves?
I think we should do this. But I'm not familiar enough GPU architectures to know which specific platform to list. @stevenireeves, given cuPQC's requirements:
is listing "NVIDIA GPU architectures 70, 75, 80, 86, 89, and 90 with a x86_64 CPU" sufficient information for those in the know? For reference here are other Tier 3 platforms from PLATFORMS.md:
|
I think if you have the NVIDIA toolkit 12.4 (even though the machine does not have a GPU) you should be able to build.
cuPQC only support Linux with x86_64 currently. We only officially support archs 70-90 but I don't see why any GPU made post 70 wouldn't. I think we could say Linux NVIDIA GPU architectures 70, 75, 80, 86, 89, and 90 with a x86_64 CPU in that list and be good. In the future we can update if we support more OS/CPUs and GPUs. |
Signed-off-by: Pravek Sharma <[email protected]>
This would require the CI images to updated with NVIDIA toolkit so I'll get started on that. |
If there's no concrete desire to list this feature in a higher support tier (who wants that?) that'd be "future work" in my eyes, @praveksharma (or at least lower priority than many other open issues) and for me not a prerequisite for an approval of this PR. |
Signed-off-by: Pravek Sharma <[email protected]>
Signed-off-by: Pravek Sharma <[email protected]>
Signed-off-by: Pravek Sharma <[email protected]>
Signed-off-by: Pravek Sharma <[email protected]>
Signed-off-by: Pravek Sharma <[email protected]>
Signed-off-by: Pravek Sharma <[email protected]>
@praveksharma looks like the build with OQS_USE_CUPQC is still failing in this PR. Although I can't see what is causing the failures. Did Neil's comments in the email help? |
@stevenireeves Yes, the comments were super helpful. I can succesfully build locally, this is mostly likely an issue with the CI config itself that I'm still troubleshooting. |
Signed-off-by: Pravek Sharma <[email protected]>
@stevenireeves cmake in CI is failing with this message:
I am unable to reproduce this error locally in a docker container provisioned from the same image used in CI. Do you know how to fix this? |
@praveksharma ah, because CI has no GPU it can't detect a default. Try adding this flag to the CMake command. Specifically for CI. |
Signed-off-by: Pravek Sharma <[email protected]>
@praveksharma looks like that CI test is still failing.
You'll need to tell CMAKE where nvcc is. |
Signed-off-by: Pravek Sharma <[email protected]>
Signed-off-by: Pravek Sharma <[email protected]>
@praveksharma looks like that did the trick. Any other comments by the reviewers? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, with the caveat that we should eventually move away from a personal repo, as noted by @baentsch above.
This PR adds the support for the NVIDIA library cuPQC to be used as the backend for ML-KEM algorithms.
cuPQC requires the use of an NVIDIA GPU to perform the PQC algorithms.