Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change JSONFields to more specific Fields #3653

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
30 changes: 10 additions & 20 deletions pulp_rpm/app/serializers/package.py
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test are passing, but it doesnt look right (see snippet below).

We have a clear definition of what this "json" object should be (here). Is there any reason not to create/add serializers that precisely describe its structure?

That's how it looks now with only the List specification and without knowing anything about its items:
(I've tried going a step further and using child=DictField(), but that broke stuff).

"files": [
  [
    null,
    "/tmp/",
    "duck.txt"
  ]
],
"requires": [
  [
    "cockateel",
    null,
    null,
    null,
    null,
    false
  ],
  [
    "lion",
    null,
    null,
    null,
    null,
    false
  ]
],
"provides": [
  [
    "duck",
    "EQ",
    "0",
    "0.6",
    "1",
    false
  ]
],

Copy link
Contributor

@dralley dralley Jul 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The schema is roughly speaking exactly as it is represented by createrepo_c, as that is what we use for parsing the metadata and also for publishing it.

Unfortunately that's not documented super well upstream either, but obviously you've seen the comments and here's an example w/ createrepo_c: https://github.com/rpm-software-management/createrepo_c/blob/master/examples/python/manual_repodata_parsing.py#L13C1-L35C63

It's not the ideal format for display to the user, but doing the minimum amount of work to transform the data during sync and publish was fairly important for performance. That's partly why we used JSONField also - it preserves order without needing to do sorts and joins during query-time. Coming from Pulp 2 where publish performance sucked (for different reasons but nonetheless), we were pretty sensitive to that concern.

To be honest that may or may not be an issue in practice nowadays. Ergonomically it is still a bit more painful though.

As for why not use sub-serializers - I think it was just simpler at the time, and survived from the prototype phase. If we can use sub-serializers and experience benefits from that without unacceptable drawbacks then I'd merge that patch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly do you mean by "doesn't look right"?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the info, makes sense these are not db fields.

About the benefit of sub-serializers, maybe some of the fields could benefit more and be less risky of bringing significant drawbacks, such as the config ones (repo_config and copy config).

Original file line number Diff line number Diff line change
Expand Up @@ -77,64 +77,54 @@ class PackageSerializer(SingleArtifactContentUploadSerializer, ContentChecksumSe
read_only=True,
)

changelogs = serializers.JSONField(
changelogs = serializers.ListField(
help_text=_("Changelogs that package contains"),
default="[]",
required=False,
read_only=True,
)
files = serializers.JSONField(
files = serializers.ListField(
help_text=_("Files that package contains"),
default="[]",
required=False,
read_only=True,
)

requires = serializers.JSONField(
requires = serializers.ListField(
help_text=_("Capabilities the package requires"),
default="[]",
required=False,
read_only=True,
)
provides = serializers.JSONField(
provides = serializers.ListField(
help_text=_("Capabilities the package provides"),
default="[]",
required=False,
read_only=True,
)
conflicts = serializers.JSONField(
conflicts = serializers.ListField(
help_text=_("Capabilities the package conflicts"),
default="[]",
required=False,
read_only=True,
)
obsoletes = serializers.JSONField(
obsoletes = serializers.ListField(
help_text=_("Capabilities the package obsoletes"),
default="[]",
required=False,
read_only=True,
)
suggests = serializers.JSONField(
suggests = serializers.ListField(
help_text=_("Capabilities the package suggests"),
default="[]",
required=False,
read_only=True,
)
enhances = serializers.JSONField(
enhances = serializers.ListField(
help_text=_("Capabilities the package enhances"),
default="[]",
required=False,
read_only=True,
)
recommends = serializers.JSONField(
recommends = serializers.ListField(
help_text=_("Capabilities the package recommends"),
default="[]",
required=False,
read_only=True,
)
supplements = serializers.JSONField(
supplements = serializers.ListField(
help_text=_("Capabilities the package supplements"),
default="[]",
required=False,
read_only=True,
)
Expand Down
28 changes: 24 additions & 4 deletions pulp_rpm/app/serializers/repository.py
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here I've added an example of the config on the field help, but I actually think the right approach is defining a precise serializer (in this example, its this spec), so a correct sample will be generated by Redoc/OpenApi themselves.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That schema isn't used by OpenAPI at all, fwiw. It's validated by our own code in the serializer.

Unless you just mean that's an example of the sort of thing we'd need to provide for OpenAPI.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that the serializer and its fields are used by openapi/redoc to generate the api docs page.

Notice the config: { } in Request Samples, and how it says its an Object type.

image

Original file line number Diff line number Diff line change
Expand Up @@ -139,7 +139,7 @@ class RpmRepositorySerializer(RepositorySerializer):
),
read_only=True,
)
repo_config = serializers.JSONField(
repo_config = serializers.DictField(
required=False,
help_text=_("A JSON document describing config.repo file"),
)
Expand Down Expand Up @@ -402,7 +402,7 @@ class RpmPublicationSerializer(PublicationSerializer):
),
read_only=True,
)
repo_config = serializers.JSONField(
repo_config = serializers.DictField(
required=False,
help_text=_("A JSON document describing config.repo file"),
)
Expand Down Expand Up @@ -536,13 +536,33 @@ def validate(self, data):
return data


copy_config_example = """
[
{
"source_repo_version": {{ version-href }},
"dest_repo": {{ repo-href }},
"content": [ {{ item-1.href }}, {{ item-2.href }} ]
},
{
"source_repo_version": {{ version-href }},
"dest_repo": {{ repo-href }},
"dest_base_version": 0,
"content": [ {{ item-1.href }}, {{ item-2.href }} ]
}
]
"""


class CopySerializer(ValidateFieldsMixin, serializers.Serializer):
"""
A serializer for Content Copy API.
"""

config = serializers.JSONField(
help_text=_("A JSON document describing sources, destinations, and content to be copied"),
config = serializers.ListField(
help_text=_(
"A JSON List of Objects describing sources, destinations, and content to be copied. "
f"E.g:\n```{copy_config_example}```"
),
)

dependency_solving = serializers.BooleanField(
Expand Down
Loading