feat: Generate meta.yaml dependencies #45
Conversation
A partial example of this can be seen in rapidsai/cudf#13022
"if": { | ||
"properties": { "table": { "const": "project.optional-dependencies" } } | ||
}, | ||
"then": { | ||
"required": ["key"] | ||
}, | ||
"else": { | ||
"not": { | ||
"required": ["key"] | ||
} |
I was having trouble getting this conditional to behave correctly. According to the official JSON Schema docs it should have been sufficient to add a `"required": ["table"]` to the `if` statement, but no matter what I did the validator insisted that the `key` property was required when only the `section` key was present. I'll need to revisit this, but didn't want to block the rest of the review on a schema issue.
CC @csadorf if you have thoughts here.
So when you simply added a `"required": ["table"]` to the `"if"` block and removed the `"then"` and `"else"` blocks, it would always require it? That indeed seems very odd. Can you provide some valid/invalid examples that we can test against?
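For concreteness, here is a plain-Python sketch of the rule the schema's `if`/`then`/`else` is meant to encode, along with the kind of valid/invalid instances we could test a fixed schema against. `key_rule_ok` is a hypothetical stand-in for the validator, not the real jsonschema machinery:

```python
# Hypothetical helper encoding the intended rule: "key" is required when
# the table is project.optional-dependencies, and forbidden otherwise.
def key_rule_ok(entry):
    if entry.get("table") == "project.optional-dependencies":
        return "key" in entry
    return "key" not in entry

# Instances a correct schema should accept:
valid = [
    {"table": "project.optional-dependencies", "key": "test"},
    {"table": "project.dependencies"},
    {"section": "build"},  # only "section" present: "key" must not be required
]

# Instances a correct schema should reject:
invalid = [
    {"table": "project.optional-dependencies"},        # missing "key"
    {"table": "project.dependencies", "key": "test"},  # "key" not allowed here
]

assert all(key_rule_ok(e) for e in valid)
assert not any(key_rule_ok(e) for e in invalid)
```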
For versions that need to be in … with … If that works, then we ought to be able to generate the entire config file with dfg. Not sure if we want to go that route, or if that is in scope for this PR, though.
We should also keep in mind the alternative proposed by @jakirkham in rapidsai/rmm#1220. I've already outlined my thoughts on the pros and cons there, so just restarting that conversation now.
I like this approach a lot better.
I'll summarize our huddle about this PR here for posterity:
I had some concerns about how we'd account for these Jinja functions provided by `conda-build`.
The results of our conversation were:
- We can likely drop the use of `pin_compatible` altogether and just manually manage those version relationships wherever necessary. You mentioned we have to do this for wheels anyway.
- We'd like to continue using `pin_subpackage`, since we won't be able to emulate its `exact=True` flag functionality (which is pretty much the only flag that we use).
- We can likely drop the use of `compiler` altogether, assuming:
  - We can identify the packages that result from calling `compiler()` for different architectures
  - Conda provides us with some way to load external files with variable names (e.g. `meta_dependencies_x86_64_build.yaml` vs `meta_dependencies_aarch64_build.yaml`)
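For reference, the `pin_subpackage` usage in question looks roughly like this in a split recipe. This is an illustrative sketch, not an excerpt from a real recipe:

```yaml
# Illustrative meta.yaml fragment (package names are examples).
# pin_subpackage(..., exact=True) pins the run dependency to the exact
# version *and* build string of the sibling output built in the same recipe,
# which is the behavior we can't easily reproduce by hand.
outputs:
  - name: librmm
  - name: rmm
    requirements:
      run:
        - {{ pin_subpackage("librmm", exact=True) }}
```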
After reading about the `compiler` function, it seems like it is mostly used for cross-compiling, which we don't do (we natively compile on …
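As a sketch of what dropping `compiler` could look like under the assumptions above: the resolved package names below are what conda-forge's C compiler metapackage maps to per platform, and should be double-checked rather than taken as given.

```yaml
# Hedged sketch: replacing {{ compiler('c') }} with the concrete packages
# it is believed to resolve to per architecture (names assumed from
# conda-forge defaults), loaded via per-arch dependency files.

# meta_dependencies_x86_64_build.yaml
dependencies:
  - gcc_linux-64

# meta_dependencies_aarch64_build.yaml
dependencies:
  - gcc_linux-aarch64
```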
From these conda-forge docs, it seems that … So I'm assuming we can just replace … @jakirkham, can you corroborate all this?
Overall this seems like a rather trivial change and I would have no major objections. But I'm wondering whether it is too trivial, considering that a conda_meta file can be fairly complex.
Can you add at least one valid example input and expected output (and thus test)?
How does this mesh with tools like Grayskull?
```python
default_conda_dir = "conda/environments"
default_conda_meta_dir = "conda/recipes/"
```
Any reason to use a trailing slash here, but not for the other dirs?
extras.get("output", ""), | ||
extras.get("section", ""), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
extras.get("output", ""), | |
extras.get("section", ""), | |
extras.get("output"), | |
extras.get("section"), |
Since you are filtering on None
anyways.
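For context on the suggestion: `dict.get` with no default already returns `None` for a missing key, so the explicit `""` default only matters if empty strings need to survive a downstream `None` filter. A small sketch (the `extras` dict here is illustrative):

```python
extras = {"output": "conda_meta"}

# With an explicit default, a missing key yields "" ...
assert extras.get("section", "") == ""
# ...while with no default it yields None.
assert extras.get("section") is None

# Filtering on None, only the keys actually present survive:
values = [extras.get("output"), extras.get("section")]
filtered = [v for v in values if v is not None]
assert filtered == ["conda_meta"]
```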
```diff
@@ -55,6 +56,8 @@ def dedupe(dependencies):
     """
     deduped = sorted({dep for dep in dependencies if not isinstance(dep, dict)})
     dict_deps = defaultdict(list)
+    # The purpose of the outer loop is to support nested dependency lists such as the
+    # `pip:` list. If multiple are present, they must be internally deduped as well.
```
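A minimal sketch of the behavior that comment describes. This is a simplified stand-in for the project's `dedupe`, not its exact implementation: plain string dependencies are set-deduped, while dict entries like `pip:` lists are merged under their key and deduped together.

```python
from collections import defaultdict


def dedupe(dependencies):
    # Plain (string) dependencies are deduped with a set.
    deduped = sorted({dep for dep in dependencies if not isinstance(dep, dict)})
    # Dict entries like {"pip": [...]} may appear more than once; collect
    # their inner lists under a shared key so they can be deduped together.
    dict_deps = defaultdict(list)
    for dep in dependencies:
        if isinstance(dep, dict):
            for key, nested in dep.items():
                dict_deps[key].extend(nested)
    for key in dict_deps:
        dict_deps[key] = sorted(set(dict_deps[key]))
    if dict_deps:
        deduped.append(dict(dict_deps))
    return deduped


# Two pip: lists are merged and internally deduped:
assert dedupe(["a", {"pip": ["x", "y"]}, {"pip": ["y", "z"]}]) == [
    "a",
    {"pip": ["x", "y", "z"]},
]
```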
Unrelated?
```
Notes
-----
An empty `gridspec` dict will result in an empty dict as the single yielded value.
```
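That edge case follows naturally from `itertools.product`, which yields exactly one (empty) combination when given no iterables. A minimal sketch of a gridspec expansion; `expand_gridspec` is a hypothetical helper, not the project's code:

```python
import itertools


def expand_gridspec(gridspec):
    """Yield one dict per combination of the gridspec's value lists."""
    keys = sorted(gridspec)
    for combo in itertools.product(*(gridspec[key] for key in keys)):
        yield dict(zip(keys, combo))


# An empty gridspec yields a single empty dict:
assert list(expand_gridspec({})) == [{}]

# A populated one yields the cross product of its value lists:
assert list(expand_gridspec({"cuda": ["11.8", "12.0"], "arch": ["x86_64"]})) == [
    {"arch": "x86_64", "cuda": "11.8"},
    {"arch": "x86_64", "cuda": "12.0"},
]
```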
Unrelated?
```diff
@@ -131,6 +138,12 @@ def make_dependency_file(
             "dependencies": dependencies,
         }
     )
+    elif file_type == str(OutputTypes.CONDA_META):
+        file_contents += yaml.dump(
```
Can we assume that the header will be valid for all generated file types?
```python
elif file_type == str(OutputTypes.CONDA_META):
    file_contents += yaml.dump(
        {
            "dependencies": dependencies,
```
The "dependencies" section of a conda-meta file is called "requirements". What am I missing here?
```json
@@ -162,22 +163,16 @@
"extras": {
    "type": "object",
    "properties": {
        "section": {
```
Does the keyword "section" originate from the conda docs? I am wondering whether it would make sense to better namespace our "extras", something like "conda-meta-section" instead of just "section".
"if": { | ||
"properties": { "table": { "const": "project.optional-dependencies" } } | ||
}, | ||
"then": { | ||
"required": ["key"] | ||
}, | ||
"else": { | ||
"not": { | ||
"required": ["key"] | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So when you simply added a "required": ["table"]
to the "if"
block and removed the "then"
and "else"
blocks it would always require it? That seems indeed very odd. Can you provide some valid/invalid examples that we can test against?
To be clear, we are not generating a full recipe anymore, just simple YAML files containing dependency lists. That's basically a subset of what we already need for conda environment.yaml files (same thing, but without the channels). That's why this is so simple right now.
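To make that concrete, a sketch of the two shapes side by side (contents illustrative):

```yaml
# conda environment.yaml (what dfg already emits):
channels:
  - rapidsai
  - conda-forge
dependencies:
  - python >=3.9
  - numpy

# generated conda-meta dependency file (same shape, minus channels):
dependencies:
  - python >=3.9
  - numpy
```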
I will when I get a chance. This work has temporarily been deprioritized since we don't want to make this kind of change until after we have CUDA 12 compatible recipes, but once that's done I'll be getting back to this and will add some then.
Grayskull's goals are a bit different. IIUC, it's for generating an entire recipe given a PyPI package or a repo. However, the resulting recipe will be too simple for our purposes, since it won't encode all of the extra information we put in our recipes around things like `run_exports`, for instance, nor will it support complex features like split outputs. This PR is an approach that lets us share dependencies with the rest of our infrastructure while still having the flexibility to fill out the non-dependency parts of the recipe, which strikes the best balance for us.
So the conda-meta semantics here means "this will be needed for a conda recipe" rather than "we are generating a conda-meta file"?
Sure, but I would be surprised if some of our downstream processing tools wouldn't share at least some of the logic. Or on the flip side, maybe a Grayskull-generated recipe could be further processed. Or we add missing features to Grayskull. Maybe none of this is appropriate, but I wanted to mention it to make sure it has at least been considered.
Correct. Happy to rename if you think there is a better way to convey this.
These are good thoughts. My opinion here is that conda recipes are fairly complex configuration files, and we don't want to be in the business of dealing with them directly. dfg is purely about dependencies.

In the case of pyproject.toml files, we can modify in place largely because a package exists (tomlkit) that supports format-preserving in-place modification. Since conda recipes are a superset of YAML (they also support Jinja, etc.), we have no way to do the same without rolling our own parser. Moreover, the complexity of split recipes means that we would need a lot of complexity in dfg to handle writing to subsections that are in split outputs vs. at the top level of a recipe. As a result, we chose to go this route of generating separate files for each dependency set.

Even if we were able to modify in place, however, we still wouldn't want to modify any section aside from dependency lists. As a result, I would expect that the only overlap we have with Grayskull is in Grayskull's ability to generate the dependency sections of meta.yaml from either setup.py or pyproject.toml (I haven't looked at how it does this). I'm not sure how much work it's worth to try to find that common ground in order to share code/logic.
I'm going to close this for now. A decent bit has changed since this initial implementation. Also, depending on whether we move forward with rapidsai/build-planning#47, there may be much better ways to accomplish our goals in this PR (i.e. writing out meta.yaml from a template directly). We should still address this issue, but the code in this PR is probably no longer the best way to do so.
This PR is an alternative to #28 that is based on generating separate dependency files instead of modifying `meta.yaml` in place. It is a much simpler changeset as a result. I considered putting the data into the `conda_build_config.yaml` file instead of using separate YAML files for each dependency set, but I rejected that option because dfg is based around writing separate `files` sections for every include list, and the only way to map that to a single `conda_build_config.yaml` file is to do a lot of in-place overwriting, which we have generally decided to avoid unless absolutely necessary, i.e. for pyproject.toml (otherwise we could just do that for `meta.yaml` too). Willing to be convinced otherwise, though.

In order to support split recipes, this approach requires writing separate files for every section in every output, which may be too verbose for our liking. Curious to hear opinions there.